  • Reshape specific variables

    Hello,

    I am using Stata 17.0/BE for Mac and I’m trying to reshape my data for a panel. I want the following:
    • put the names of all the _course* variables as values of a new variable (newvar1);
    • put all the values of the _course* variables into another new variable (newvar2);
    • have the values of year and stuff* repeated to match.
    This is what I have:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(id year) double stuff int stufff2 byte(_course1 _course2 _course3 _course4 _course5)
    333 2010 6  45 83 90  .  .  .
    333 2011 6  16  .  .  .  .  .
    333 2011 7 117  .  . 84 66 73
    333 2012 8 117  .  . 83  . 83
    end
    And this is what I want it to look like after the reshape:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id str8 newvar1 int year double stuff int stufff2 byte newvar2
    333 "_course1" 2010 6  45 83
    333 "_course2" 2010 6  45 90
    333 "."        2011 6  16  .
    333 "_course3" 2011 7 117 84
    333 "_course4" 2011 7 117 66
    333 "_course5" 2011 7 117 73
    333 "_course3" 2012 8 117 83
    333 "_course5" 2012 8 117 83
    end
    Thank you for your help,

    Nicolas Charette


  • #2
    Consider this:
    Code:
    reshape long _course , i(id year stuff*) j(course_num)
    
    gen newvar1 = "_course" + string(course_num)
    drop course_num
    rename _course newvar2
    drop if missing(newvar2)
    It doesn't have the third observation in your required dataset; is that a problem?
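A minimal, self-contained run of the post #2 code on the example data from post #1. The -isid- line is my addition, not part of post #2: it confirms that id, year, stuff, and stufff2 uniquely identify the rows, which reshape long requires of i(). The result has 7 observations, because the 2011 row whose _course values are all missing is removed by the final drop.

```stata
* Example data from post #1
clear
input int(id year) double stuff int stufff2 byte(_course1 _course2 _course3 _course4 _course5)
333 2010 6  45 83 90  .  .  .
333 2011 6  16  .  .  .  .  .
333 2011 7 117  .  . 84 66 73
333 2012 8 117  .  . 83  . 83
end

* reshape long needs i() to identify rows uniquely; isid stops with an error otherwise
isid id year stuff stufff2

* Post #2: one row per (original row, course number)
reshape long _course, i(id year stuff*) j(course_num)
generate newvar1 = "_course" + string(course_num)
drop course_num
rename _course newvar2
* This drops every missing course value, including the all-missing 2011 row
drop if missing(newvar2)
list, sepby(year)
```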



    • #3
      It may be easier to use the -expand- command and then apply some logic to recode the data into what you want.
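A rough sketch of the -expand- route, under my own assumptions about the "logic to recode", which the reply does not spell out: each row is copied once per nonmissing _course value (an all-missing row is kept once), and the j-th copy of a row then picks up that row's j-th nonmissing course. The helper variables nonmiss, seq, copyno, and seen are hypothetical names.

```stata
* Example data from post #1
clear
input int(id year) double stuff int stufff2 byte(_course1 _course2 _course3 _course4 _course5)
333 2010 6  45 83 90  .  .  .
333 2011 6  16  .  .  .  .  .
333 2011 7 117  .  . 84 66 73
333 2012 8 117  .  . 83  . 83
end

* Hypothetical helpers: nonmiss = nonmissing courses in the row, seq = original row
egen nonmiss = rownonmiss(_course*)
generate long seq = _n

* One copy per nonmissing course; an all-missing row is kept once
expand max(1, nonmiss)
bysort seq: generate copyno = _n

* Walk across _course1-_course5, counting the nonmissing values seen so far;
* copy number j takes the j-th nonmissing course of its row
generate str8 newvar1 = ""
generate byte newvar2 = .
generate byte seen = 0
forvalues j = 1/5 {
    replace seen = seen + !missing(_course`j')
    replace newvar1 = "_course`j'" if !missing(_course`j') & seen == copyno
    replace newvar2 = _course`j'   if !missing(_course`j') & seen == copyno
}
drop _course* nonmiss seq copyno seen
order id newvar1
list, sepby(year)
```

Compared with reshape, this avoids building rows for missing courses only to drop them, at the cost of an explicit loop over the five course variables.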



      • #4
        This builds on the code in post #2 and creates the single observation when all values of _course are missing.
        Code:
        generate seq = _n
        reshape long _course , i(seq) j(newvar1) string
        replace newvar1 = "_course" + newvar1
        rename _course newvar2
        sort id year seq
        // keep one obs when all values of newvar2 are missing
        by id year seq: egen c = count(newvar2)
        by id year seq: replace newvar1 = "" if c==0 & _n==1
        drop c seq
        drop if newvar2==. & newvar1!=""
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str8 newvar1 int(id year) double stuff int stufff2 byte newvar2
        "_course1" 333 2010 6  45 83
        "_course2" 333 2010 6  45 90
        ""         333 2011 6  16  .
        "_course3" 333 2011 7 117 84
        "_course4" 333 2011 7 117 66
        "_course5" 333 2011 7 117 73
        "_course3" 333 2012 8 117 83
        "_course5" 333 2012 8 117 83
        end
        A few comments.

        Since I needed a sequence number to identify the original observations in the long dataset, I also used it, rather than id, year, stuff, and stufff2, as the i() variable for the reshape. Also, nothing guarantees that two observations cannot share the same values of id, year, stuff, and stufff2; if they did, reshape long with i(id year stuff*) would fail.
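A small sketch of that failure mode, using hypothetical data in which two rows share id, year, stuff, and stufff2. Keyed on those variables, reshape long stops with an error and leaves the data wide; keyed on a row sequence number, as in the code above, it goes through.

```stata
* Hypothetical data: two rows share id, year, stuff, and stufff2
clear
input int(id year) double stuff int stufff2 byte(_course1 _course2)
333 2011 7 117 84 66
333 2011 7 117 80 70
end
duplicates report id year stuff stufff2

* Keyed on id year stuff*, reshape cannot tell the two rows apart and stops
capture noisily reshape long _course, i(id year stuff*) j(course_num)
local rc = _rc
display "reshape return code: `rc'"

* Keyed on a row sequence number, the reshape succeeds
generate long seq = _n
reshape long _course, i(seq) j(course_num)
list, sepby(seq)
```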

        I keep the suffixes 1 through 5 as strings (the string option of reshape) so that simple string concatenation turns them back into _course1 through _course5.
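A tiny illustration of the string option on hypothetical data with only _course1 and _course3: j() collects the suffixes as text, so concatenation rebuilds the original names directly, whereas post #2 keeps j numeric and converts it back with string().

```stata
* Hypothetical two-course example
clear
input byte(id _course1 _course3)
1 83 84
end

* With the string option, j() holds the suffixes as text: "1" and "3"
reshape long _course, i(id) j(newvar1) string
* Concatenation rebuilds the original variable names; no string() conversion needed
replace newvar1 = "_course" + newvar1
list
```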

        To create the single observation with missing values when all values of _course are missing, I count the non-missing values for each original observation; if the count is zero, I replace newvar1 with "" (Stata's string missing value) on one of its rows, and the final drop keeps that row.
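The counting step in isolation, on a hypothetical long dataset where original row 2 had no nonmissing course values. egen's count() counts nonmissing values within each by-group, so c is 0 only for the all-missing group, and blanking newvar1 on one of its rows lets that row survive the final drop.

```stata
* Hypothetical long data: seq 2 had no nonmissing course values
clear
input long seq str8 newvar1 byte newvar2
1 "_course1" 83
1 "_course2" .
2 "_course1" .
2 "_course2" .
end

* c = number of nonmissing newvar2 within each original row (1 for seq 1, 0 for seq 2)
bysort seq: egen c = count(newvar2)
* In an all-missing group, blank newvar1 on one row so it survives the drop
bysort seq (newvar1): replace newvar1 = "" if c == 0 & _n == 1
drop if missing(newvar2) & newvar1 != ""
drop c
list
```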

        Added in edit: the code below produces the same result, perhaps more simply.
        Code:
        generate seq = _n
        egen c = rownonmiss(_course*)
        reshape long _course , i(seq) j(newvar1) string
        replace newvar1 = "_course" + newvar1
        rename _course newvar2
        // keep one obs when all values of newvar2 are missing
        replace newvar1 = "" if c==0 & newvar1=="_course1"
        drop c seq
        drop if newvar2==. & newvar1!=""
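A self-contained run of this simpler version on the post #1 data, for checking that it reproduces the 8-row listing shown earlier in this post, including the single row with newvar1 empty for the all-missing 2011 observation.

```stata
* Example data from post #1
clear
input int(id year) double stuff int stufff2 byte(_course1 _course2 _course3 _course4 _course5)
333 2010 6  45 83 90  .  .  .
333 2011 6  16  .  .  .  .  .
333 2011 7 117  .  . 84 66 73
333 2012 8 117  .  . 83  . 83
end

* c is computed while the data are still wide: nonmissing courses per row
generate seq = _n
egen c = rownonmiss(_course*)
reshape long _course, i(seq) j(newvar1) string
replace newvar1 = "_course" + newvar1
rename _course newvar2
* keep one obs when all values of newvar2 are missing
replace newvar1 = "" if c == 0 & newvar1 == "_course1"
drop c seq
drop if missing(newvar2) & newvar1 != ""
list, sepby(year)
```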
        Last edited by William Lisowski; 12 Aug 2022, 09:27.



        • #5
          Thank you very much, Hemanshu! It worked. Thanks for the added code as well, William!
