Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Splitting and reshaping a variable

    Hello

    I have a question regarding splitting and reshaping a variable. My data set looks like the one in the attachment. I like to split the variable "Variable" into Cash, ROA etc. and reshape it to wide format, so that "Name" is the unique identifyer. I have tried it with the reshape formula; reshape wide variable but there I have the problem that I don't really know how to formulate it correctly. Do I have to make it in two steps ore can I just use the reshape or splitting formula?

    It should look like:
    Name Jahr Cash ROA
    ABB 1 Value Value
    ABB 2 Value Value
    ABB 3 Value Value
    ABB 4 Value Value
    CS 1 Value Value
    CS 2 Value Value
    CS 3 Value Value
    CS 4 Value Value

    Many thanks for any hint or help.

    Best regards
    Anna-Laura

  • #2
    You did not provide an example of your data. Please provide an example, preferably using the user-written command dataex from SSC. You can install it and see how it works using

    Code:
    ssc install dataex
    help dataex
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com

    Comment


    • #3
      Hello

      Heres my dataset as .dta file and a snapshot of the dataex comand output.



      Many thanks.
      Attached Files

      Comment


      • #4
        When using dataex, you are advised to copy the output shown in Stata's Results window and paste it into your Statalist post. First type in Stata's Command window

        Code:
        set more off
        to prevent the -more- message from appearing and then use dataex. If your dataset is too long, print out a few observations, e.g.

        Code:
        dataex in 1/10
        Last edited by Robert Picard; 23 Sep 2015, 08:42.

        Comment


        • #5
          Hello

          Thanks for your help. Here is a copy of my data set how iit looks like withe the dataex in 1/10 comand:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str21 Name str26 Variable byte Year double Value byte(SMIundSMIMndex CAC40Index DJIndex)
          "3MCOMPANY" "CASH"  1       . 0 0 1
          "3MCOMPANY" "CASH"  2 1447000 0 0 1
          "3MCOMPANY" "CASH"  3 1896000 0 0 1
          "3MCOMPANY" "CASH"  4 1849000 0 0 1
          "3MCOMPANY" "CASH"  5 3040000 0 0 1
          "3MCOMPANY" "CASH"  6 3377000 0 0 1
          "3MCOMPANY" "CASH"  7 2219000 0 0 1
          "3MCOMPANY" "CASH"  8 2883000 0 0 1
          "3MCOMPANY" "CASH"  9 2581000 0 0 1
          "3MCOMPANY" "CASH" 10 1897000 0 0 1
          end
          ------------------ copy up to and including the previous line ----------

          My Question is how I organize this data set that this str26 Variable which contains 5 different accounting measures over 10 years (1-10 s. byte Year Variable) for several firms is seperateted in 5 different varibables itn the wide format whith the name Cash, Total Liabilities etc.

          Many thanks.
          Anna-Laura

          Comment


          • #6
            The question does not make sense to me. On the evidence of your example your data include a string variable Variable with values like "CASH" and it is followed by numeric variables Year Value ... DJIndex. All looks fine. Look at your data in the Data Editor to see.

            Comment


            • #7
              I agree with Nick. Your data is in the ideal format for analysis. What you should look to do is to create dummies for Cash, Total Liabilities etc.

              However, if you insist on having many variables (instead of using dummies), and have the time, here is one way to proceed... and probably by far not the most efficient way!



              Code:
              use "Long Set.dta", clear
              encode  Name, gen (name)
              encode  Variable, gen (variable)
              separate  Value, by(variable)
              You can use a loop to rename the new variables using their value labels


              Code:
                          storage  display     value
              variable name   type   format      label      variable label
              ---------------------------------------------------------------------------------------------------
              Value1          long   %12.0g                 Value, variable == CASH
              Value2          long   %12.0g                 Value, variable == CURRENTASSETS
              Value3          long   %12.0g                 Value, variable == CURRENTLIABILITIES
              Value4          long   %12.0g                 Value, variable == EARNINGSBEFINTEREST&TAXES
              Value5          double %12.0g                 Value, variable == MARKETVALUE
              Value6          double %12.0g                 Value, variable == OPERATINGPROFITMARGIN
              Value7          double %12.0g                 Value, variable == ROA
              Value8          long   %12.0g                 Value, variable == TOTALASSETS
              Value9          long   %12.0g                 Value, variable == TOTALCAPITAL
              Value10         double %12.0g                 Value, variable == TOTALDEBT%TOTALCAPITAL/STD
              Code:
              foreach v of varlist  Value1-  Value10{
              local x: variable label `v'
              di "`x'"
              rename `v' `=strtoname("`x'")'
              }
              Finally generate a data set for each variable and merge. I will illustrate only for the first three variables


              Code:
              qui tab  variable, gen(var)
              drop   Name Variable Value  variable
              save master.dta, replace
              keep if var1==1
              keep  Year SMIundSMIMndex CAC40Index DJIndex name Value__variable____CASH
              sort  Year SMIundSMIMndex CAC40Index DJIndex name Value__variable____CASH
              save one, replace
              
              
              use master.dta
              keep if var2==1
              keep  Year SMIundSMIMndex CAC40Index DJIndex name  Value__variable____CURRENTASSETS
              sort  Year SMIundSMIMndex CAC40Index DJIndex name  Value__variable____CURRENTASSETS
              save two.dta, replace
              
              use master.dta
              keep if var3==1
              keep  Year SMIundSMIMndex CAC40Index DJIndex name  Value__variable____CURRENTLIABILS
              save three.dta, replace
              Once you are done, merge and you have what you want!


              Code:
              use one.dta
              merge 1:1  Year SMIundSMIMndex CAC40Index DJIndex name using "two.dta"
              drop  _merge
              merge 1:1  Year SMIundSMIMndex CAC40Index DJIndex name using "three.dta"
              drop  _merge



              Code:
              . sort name Year
              
              . list
              
                   +-------------------------------------------------------------------------------------------------+
                   | Year   SMIund~x   CAC40I~x   DJIndex                    name   Value_~H   Value__~S   Value__~L |
                   |-------------------------------------------------------------------------------------------------|
                1. |    1          0          0         1               3MCOMPANY          .     7115000     5238000 |
                2. |    2          0          0         1               3MCOMPANY    1447000     8946000     7323000 |
                3. |    3          0          0         1               3MCOMPANY    1896000     9838000     5362000 |
                4. |    4          0          0         1               3MCOMPANY    1849000     9598000     5839000 |
                5. |    5          0          0         1               3MCOMPANY    3040000    10795000     4897000 |
                   |-------------------------------------------------------------------------------------------------|
                6. |    6          0          0         1               3MCOMPANY    3377000    12215000     6089000 |
                7. |    7          0          0         1               3MCOMPANY    2219000    12240000     5441000 |
                8. |    8          0          0         1               3MCOMPANY    2883000    13630000     6200000 |
                9. |    9          0          0         1               3MCOMPANY    2581000    12733000     7498000 |
               10. |   10          0          0         1               3MCOMPANY    1897000    11765000     5998000 |
                   |-------------------------------------------------------------------------------------------------|
               11. |    1          1          0         0                  ABBLTD    4258320    18675360    15319920 |
               12. |    2          1          0         0                  ABBLTD    5199640    20955940    15098720 |
               13. |    3          1          0         0                  ABBLTD    5277704    26810734    16974910 |
               14. |    4          1          0         0                  ABBLTD    1852312    26430657    17528676 |
               15. |    5          1          0         0                  ABBLTD    1430716    26569256    15535856 |
                   |-------------------------------------------------------------------------------------------------|
               16. |    6          1          0         0                  ABBLTD    1728834    23954298    15912558 |
               17. |    7          1          0         0                  ABBLTD    1552390    22579536    15801548 |
               18. |    8          1          0         0                  ABBLTD    2550144    25931044    17661396 |
               19. |    9          1          0         0                  ABBLTD    2155702    23597525    14890775 |
               20. |   10          1          0         0                  ABBLTD    2204692    24836084    15486520 |
                   |-------------------------------------------------------------------------------------------------|
               21. |    1          0          1         0                 ACCORSA    1763000     4960000     6386000 |
               22. |    2          0          1         0                 ACCORSA    1267000     3821000     5061000 |
               23. |    3          0          1         0                 ACCORSA    1138000     3991000     5522000 |
               24. |    4          0          1         0                 ACCORSA    1253000     3960000     5432000 |
               25. |    5          0          1         0                 ACCORSA    1164000     4312000     5670000 |
                   |-------------------------------------------------------------------------------------------------|
               26. |    6          0          1         0                 ACCORSA      84000     2310000     2336000 |
               27. |    7          0          1         0                 ACCORSA      85000     2576000     2293000 |
               28. |    8          0          1         0                 ACCORSA     122000     2925000     2736000 |

              Comment


              • #8
                That was exactly my question of how I can separate the Variable "Variable" in to several different ones. As I was not sure which comad I should us, I tought about a reshaping comand rather than a create comand.

                So many thanks and have a good one.

                Comment


                • #9
                  Unless I'm missing something, this is simply a reshape to wide. The only problems are some missing values for the variable Variable and the values of Variable being too wide and containing illegal characters that can't be rolled into a variable name.

                  Code:
                  clear
                  use "Long Set.dta", clear
                  
                  * generate a shorter suffix for wide variables
                  gen suffix = substr(strtoname(Variable),1,10)
                  replace suffix = "missing" if mi(suffix)
                  
                  * check that the suffix is specific enough
                  bysort suffix (Variable): assert Variable[1] == Variable[_N]
                  drop Variable
                  
                  reshape wide Value, i(Name Year) j(suffix) string

                  Comment

                  Working...
                  X