Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Split several observation from a cell to create new row in my data-set.

    Hello,
    I would like to start by thanking the forum participants. I learned a lot here.
    I want to build a panel data set. To manage my data, I have to deal with the following problem.
    For the key variable, SEC, the original dataset recorded several observations in the same cell for a specific country - year. I need those observations to be in a separate row keeping all the other observations (country, year ....) the same.
    There is a unique ID for each row. It would be ideal to create another unique ID for the new rows (but that is not the main concern now)
    I have tried to split the observations into unique columns and then convert the data from wide format to long format (with the help of this discussion: https://www.statalist.org/forums/for...s-in-same-cell)
    Unfortunately, I failed. I would appreciate it if you offer any help.

    P.S: The original sector variable "SEC" was a string variable and I destringed it.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float year long(cty SEC) double ID
    2009 1 16806       814422
    2009 1  6604    653200578
    2012 1   198  84481521203
    2018 1  5595 369461773733
    2012 1  7064   8448125978
    2014 1 12184  90921701158
    2013 1  5078    369172859
    2017 1 15289  36943208613
    2016 1  5078     11955912
    2018 1 11945 369541585591
    2017 1 16217 369381271124
    2018 1 14941    304614261
    2009 1  6604    110209654
    2018 1 16360    319078768
    2019 1  5094     43106924
    2015 1 16859     38150319
    2010 1 14491  36895145493
    2013 1 16395    100368761
    2012 1  5078  36963182202
    2015 1   438   3695341102
    end
    label values cty cty
    label def cty 1 "Austria", modify
    label values SEC SEC
    label def SEC 198 "012", modify
    label def SEC 438 "013", modify
    label def SEC 5078 "214", modify
    label def SEC 5094 "214, 013", modify
    label def SEC 5595 "214, 347, 463", modify
    label def SEC 6604 "222", modify
    label def SEC 7064 "235", modify
    label def SEC 11945 "345, 333, 120, 334, 335, 342, 346, 379, 414, 341, 352, 343, 351, 353, 232, 354, 483, 389, 344, 347, 369, 282, 019, 348, 362, 392, 361, 292, 317, 319, 321, 327, 322, 325, 326, 263, 265, 261, 266, 264, 355, 267, 279, 273, 272, 281, 448
    > , 271, 295, 293, 296, 375, 373, 372, 371, 163, 382, 413, 411, 393, 412, 429, 433, 415, 421, 416, 431, 432, 439, 434, 445, 435, 444, 449, 452, 446, 443, 442, 451, 469, 472, 474, 473, 479, 476, 475, 471, 491, 492, 499, 494, 493, 482, 481, 447, 381, 465, 
    > 385", modify
    label def SEC 12184 "347", modify
    label def SEC 14491 "369, 461", modify
    label def SEC 14941 "412", modify
    label def SEC 15289 "431", modify
    label def SEC 16217 "461, 473, 474, 385", modify
    label def SEC 16360 "465", modify
    label def SEC 16395 "471", modify
    label def SEC 16806 "713", modify
    label def SEC 16859 "965", modify

  • #2
    No, you emphatically did not -destring- the original SEC variable. You -encode-d it, which just made it an unworkable mess. So the first thing to do is undo that. Then the rest is just a mattter of -split-ting it and -reshape-ing the result.

    Code:
    decode SEC, gen(str_SEC)
    drop SEC
    
    split str_SEC, parse(",") destring gen(SEC)
    drop str_SEC
    reshape long SEC, i(ID)
    Note: the resulting dat set has an additional variable _j which gives the order in which that particular value of SEC was mentioned for that ID in the original data. If you will make use of that information, I suggest renaming _j to something of mnemonic value. If not, then just -drop _j- to avoid cluttering the data set with junk.

    Comment


    • #3
      Thanks a lot. The code worked perfectly. At that time I only worked on a small part of my dataset and it was fine. Now I want to do the same analysis across the EU. Now I have more than 180,000 rows. Reshaping this data into long format seems like an impossible task. Is it a problem of dimensionality? I wait more than an hour and still no result. Is there a way to reshape such a large dataset?

      Comment


      • #4
        Yes, large data sets take a long time to reshape. There are a few user-written commands that speed up the process considerably. The one I use is -tolong-, by Rafal Racibgorski, available from SSC.

        Comment


        • #5
          Hello,
          I am dealing with a problem and I hope Statalist can help me. In my data set, I have policies that are imposed by the government at a specific time. It is shown as Inception date (IncSate) in the dataset. There is also a removal date (RmvDate) that shows when the policy is removed from being active. If a policy has inception data of 2008 and a removal date of 2012, it means the policy has been in action for 4 years. if the inception date is 2008 and the removal date is missing, it has been implemented for 13 years (since the span time is from 2008 to 2020). There is no explicit information about the years of implementation and I need to add one observation(row) for every policy that has been in implementation for n year, just like the county decided to impose the policy every year (at the end I want to collapse them to built a county-year by type of policy panel data).

          I have no idea how to count the years and have the new rows. I would appreciate it a lot if you give me a hint.



          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str14 ID str30 ImplementingJurisdiction float(IncDate RmvDate) str62 InterventionType
          "921798129" "Argentina" 17981     . "Import licensing requirement"
          "921798130" "Argentina" 17981     . "Import licensing requirement"
          "921798131" "Argentina" 17981     . "Import licensing requirement"
          "921798132" "Argentina" 17981     . "Import licensing requirement"
          "921798133" "Argentina" 17981     . "Import licensing requirement"
          "921798134" "Argentina" 17981     . "Import licensing requirement"
          "921798135" "Argentina" 17981     . "Import licensing requirement"
          "921798136" "Argentina" 17981     . "Import licensing requirement"
          "921798137" "Argentina" 17981     . "Import licensing requirement"
          "95179571"  "Ukraine"   17957 18139 "Import tariff"               
          "95179572"  "Ukraine"   17957 18139 "Import tariff"               
          "95179573"  "Ukraine"   17957 18139 "Import tariff"               
          "95179574"  "Ukraine"   17957 18139 "Import tariff"               
          "95179575"  "Ukraine"   17957 18139 "Import tariff"               
          "95179576"  "Ukraine"   17957 18139 "Import tariff"               
          "95179577"  "Ukraine"   17957 18139 "Import tariff"               
          "95179578"  "Ukraine"   17957 18139 "Import tariff"               
          "95179579"  "Ukraine"   17957 18139 "Import tariff"               
          "951795710" "Ukraine"   17957 18139 "Import tariff"               
          "951795711" "Ukraine"   17957 18139 "Import tariff"               
          "100179081" "Russia"    17908 18455 "Import tariff"               
          end
          format %td IncDate
          format %td RmvDate




          Comment


          • #6
            It is unclear what you want the final result to look like. First, there is no RmvDate for the Argentina observations. So how many years do you want to assign to that?

            Second, it seems that many of the observations have IncDate and RmvDate in the same year--so these policies were in effect only for part of one year. Is that correct?

            Third, within country, it seems that all of the observations are identical except for the ID. Do you want to retain separate information for each ID, or do you want to combine all "Import licensing requirement" into a single combined observation for the number of years these were in effect?

            Please reply by working out by hand a small example of what the final results should look like, and show that.

            Finally, your example contains only one observation of a policy that spans two years. When responding please provide a fuller example that contains some multi-year policies so that code can be checked to assure that it handles that situation correctly.

            Comment


            • #7
              Thank you very much for the prompt response. I will try to answer all the questions:
              1- when there is no RmvDate, it means the policy is in effect until the end of the time span of the dataset (2020) so I consider 13 years (times)
              2- when the inception date and removal date is the same year, I consider imposing the policy for 1 year (time)
              3- I want to build a panel date that every county-year I have how many of each trade policy is imposed


              ID year county MAST chapter count
              1 2008 Italy Migration measure 10
              2 2009 Italy Tariff measures 8
              3- 2008 Germany Migration measure 3
              4 2009 Germany Tariff measures 15



              Thus I would like a panel of data based on country- year that counts how many of each type of policy is implemented in each country-year. To this end, I need to consider a policy that was in effect from 2008 to 2015 as it is imposed every year. So it must be counted every year from 2008 to 2015.

              I constructed variables for IncYr and RmvYr as the years the policy is started and removed. I think it is easier to work with "year"





              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str14 ID str30 ImplementingJurisdiction str93 MASTchapter float(IncYr RmvYr)
              "53178921"   "Zambia"    "Tariff measures"                                                                               2008    .
              "92179811"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179812"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179813"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179814"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179815"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179816"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179817"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179818"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "92179819"   "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798110"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798111"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798112"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798113"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798114"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798115"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798116"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798117"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798118"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798119"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798120"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798121"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798122"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798123"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798124"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798125"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798126"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798127"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798128"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798129"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798130"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798131"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798132"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798133"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798134"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798135"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798136"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "921798137"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "95179571"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179572"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179573"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179574"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179575"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179576"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179577"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179578"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "95179579"   "Ukraine"   "Tariff measures"                                                                               2009 2009
              "951795710"  "Ukraine"   "Tariff measures"                                                                               2009 2009
              "951795711"  "Ukraine"   "Tariff measures"                                                                               2009 2009
              "100179081"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179082"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179083"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179084"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179085"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179086"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179087"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179088"  "Russia"    "Tariff measures"                                                                               2009 2010
              "100179089"  "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790810" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790811" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790812" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790813" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790814" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790815" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790816" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790817" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790818" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790819" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790820" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790821" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790822" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790823" "Russia"    "Tariff measures"                                                                               2009 2010
              "1001790824" "Russia"    "Tariff measures"                                                                               2009 2010
              "101180221"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180222"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180223"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180224"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180225"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180226"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180227"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180228"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "101180229"  "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802210" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802211" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802212" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802213" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802214" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802215" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802216" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802217" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802218" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802219" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802220" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802221" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802222" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802223" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802224" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802225" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802226" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              "1011802227" "Argentina" "E1 Non-automatic import-licensing procedures other than authorizations for SPS or TBT reasons" 2009    .
              end


              Comment


              • #8
                An example of the final dataset with better quality:
                ID country Year MAST Chapter count
                1 Italy 2008 Migration measure 10
                2 Italy 2009 Tariff measures 8
                3 Germany 2008 Migration measure 3
                4 Germany 2009 Tariff measures 15

                Comment


                • #9
                  Thank you.
                  Code:
                  replace RmvYr = 2020 if missing(RmvYr)
                  expand RmvYr-IncYr+1
                  by ID IncYr, sort: gen year = IncYr + _n - 1
                  
                  contract ImplementingJurisdiction year MASTchapter, freq(count)
                  order ImplementingJurisdiction year MASTchapter count
                  gen long id = _n, before(ImplementingJurisdiction)
                  Note: The last two commands are just cosmetic; the meat of the calculation is everything through the -contract- command.

                  Comment


                  • #10
                    Thanks a lot. The code worked perfectly.

                    Comment


                    • #11
                      Hello,

                      I have a problem with the names of my variables when I reshape them, my dataset looks like

                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input long id str30 ImplementingJurisdiction float year str93 MASTchapter int count
                       1 "Afghanistan" 2010 "P5 Export taxes and charges"  5
                       2 "Afghanistan" 2011 "P5 Export taxes and charges"  5
                       3 "Afghanistan" 2012 "P5 Export taxes and charges"  5
                       4 "Afghanistan" 2013 "P5 Export taxes and charges"  5
                       5 "Afghanistan" 2014 "P5 Export taxes and charges"  5
                       6 "Afghanistan" 2015 "P5 Export taxes and charges"  5
                       7 "Afghanistan" 2016 "P5 Export taxes and charges"  5
                       8 "Afghanistan" 2017 "P5 Export taxes and charges"  5
                       9 "Afghanistan" 2018 "P5 Export taxes and charges"  5
                      10 "Afghanistan" 2019 "P5 Export taxes and charges"  5
                      11 "Afghanistan" 2020 "P5 Export taxes and charges"  5
                      12 "Albania"     2009 "Tariff measures"              9
                      13 "Albania"     2010 "Tariff measures"             11
                      14 "Albania"     2011 "Tariff measures"             12
                      15 "Albania"     2012 "P7 Export subsidies"          1
                      16 "Albania"     2012 "Tariff measures"             20
                      17 "Albania"     2013 "FDI measures"                 1
                      18 "Albania"     2013 "P7 Export subsidies"          1
                      19 "Albania"     2013 "Tariff measures"             34
                      20 "Albania"     2014 "FDI measures"                 1
                      end


                      and I need it to be like this


                      Countryid year Implementingjurisdiction Exporttaxesandcharges Tariffmeasures Exportsubsidies Redmeasurestotal
                      1 2008 Afghanistan . . . . .
                      1 2009 Afghanistan . . . . .
                      1 2010 Afghanistan 5 0 0 0 5
                      1 2011 Afghanistan 5 0 0 0 5
                      1 2012 Afghanistan 5 0 0 0 5
                      1 2013 Afghanistan 5 0 0 0 5
                      1 2014 Afghanistan 5 0 0 0 5
                      1 2015 Afghanistan 5 0 0 0 5
                      1 2016 Afghanistan 5 0 0 0 5
                      1 2017 Afghanistan 5 0 0 0 5
                      1 2018 Afghanistan 5 0 0 0 5
                      1 2019 Afghanistan 5 0 0 0 5
                      1 2020 Afghanistan 5 0 0 0 5
                      1 2021 Afghanistan 0 0 0 0 0
                      1 2022 Afghanistan 0 0 0 0 0
                      1 2023 Afghanistan 0 0 0 0 0


                      I used
                      reshape wide count , i( id ) j(MASTchapter) string



                      but I always get errors regarding the MASTchapter being a string variable or because it includes space.

                      I know that what is inside the MASTchapter is large sentences, but I need them as is. I would appreciate it if you could suggest any solution to this.

                      Comment


                      • #12
                        I know that what is inside the MASTchapter is large sentences, but I need them as is.
                        No you don't, and your desired results tableau shows the solution: get rid of the spaces!

                        Code:
                        replace MASTchapter = subinstr(MASTchapter, " ", "", .)
                        reshape wide count, i(id) j(MASTchapter) string
                        rename count* *
                        Added: I don't know if your example shows the full range of values of MASTchapter. If there are other values of MASTchapter that contain characters that are illegal in variable names, you do not have to go through the tedious task of writing separate -replace- commands for each. You could, instead, in the first command use:
                        Code:
                        replace MASTchapter = strtoname(MASTchapter)
                        In fact, you might even prefer it for the situation where blanks are the only problem because it creates names that have underscore characters replacing the illegal ones. And something like Tariff_measures is easier on the eye than Tariffmeasures.
                        Last edited by Clyde Schechter; 16 Nov 2021, 19:30.

                        Comment


                        • #13
                          Thanks a lot. The code is perfect and it works for my example very well. In the main dataset for "MAST chapter", I have larger names like "A_Sanitary_and_phytosanitary_mea" (after running "strtoname" command) this is bigger than 31 characters and when I continue with your code I receive "countA_Sanitary_and_phytosanitary_mea invalid variable name" as an error. I understand that I can replace these names with short names one by one. But is there a smarter way to do it, as stata shrinks it to a workable name? My dataset is large so it is much faster.

                          Comment


                          • #14
                            Good that you are getting answers but each new question is about something other than the thread title. Please start a new thread when you have a different question.

                            Comment


                            • #15
                              Re #13:

                              A quick way to do it that will work in most circumstances is:
                              Code:
                              rename count _
                              replace MASTchapter = strtoname(substr(MASTchapter, 1, 31))
                              reshape wide _, i(id) j(MASTchapter) string
                              rename _* *
                              The circumstances in which this won't work are:

                              1. There are already variables in your data set whose names begin with _. In that case, the final -rename- command will wrongly remove the initial _ from those, potentially causing you other problems.
                              2. If there are different values of MASTchapter that start with the same first 31 characters. In that case, the -reshape- will fail because it will find repeated values of MASTchapter within id.

                              Neither of these problems is common in practice, though I have certainly run into each of them on occasion, the first more commonly than the second.

                              If you run into those problems you will have to develop some more tailored way of shortening or changing the values of MASTchapter. The binding constraint in all circumstances is that the names to be created by -reshape- can have at most 32 characters, and they must all be distinct.

                              Comment

                              Working...
                              X