Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Expanding data with start/end date to include all dates in between

    I have a dataset with information on individual id, signup & dropout year/month in the program I am evaluating.

    The data is structured in the following format: we have 2 observations by id: the first one corresponds to the year/month the individual signed up for the program & the last one the year/month the individual droped out from the program.

    I would like to transform the data so as to have 1 observation for each year/month in which the id was enrolled in the program.

    Consider this simple data example:
    Code:
    clear
    input id year month
    A  2010 2
    A  2010 7
    B  1999 11
    B  2000 3
    end
    I would be looking for the following output
    Code:
    clear
    input id year month
    A  2010 2
    A  2010 3
    A  2010 4
    A  2010 5
    A  2010 6
    A  2010 7
    B  1999 11
    B  1999 12
    B  2000 1
    B  2000 2
    B  2000 3
    end

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 id float(year month)
    "A" 2010  2
    "A" 2010  7
    "B" 1999 11
    "B" 2000  3
    end
    
    gen time= ym(year, month)
    format time %tm
    encode id, g(ID)
    xtset ID time
    tsfill
    l ID time, sepby(ID)
    Res.:

    Code:
    . l ID time, sepby(ID)
    
         +--------------+
         | ID      time |
         |--------------|
      1. |  A    2010m2 |
      2. |  A    2010m3 |
      3. |  A    2010m4 |
      4. |  A    2010m5 |
      5. |  A    2010m6 |
      6. |  A    2010m7 |
         |--------------|
      7. |  B   1999m11 |
      8. |  B   1999m12 |
      9. |  B    2000m1 |
     10. |  B    2000m2 |
     11. |  B    2000m3 |
         +--------------+
    
    .

    Comment


    • #3
      Thanks Andrew Musau !
      How could I then extract the year and month components of the time variable?
      Also, could I also "un-xtset" the data in case I wish Stata not to read it as a panel?

      Comment


      • #4
        Originally posted by Paula de Souza Leao Spinola View Post
        Thanks Andrew Musau !
        How could I then extract the year and month components of the time variable?
        Code:
        replace year= year(dofm(time))
        replace month= month(dofm(time))

        Also, could I also "un-xtset" the data in case I wish Stata not to read it as a panel?
        Code:
        xtset, clear

        Comment


        • #5
          Here's another way to do it:


          Code:
          clear
          input str1 id year month
          A  2010 2
          A  2010 7
          B  1999 11
          B  2000 3
          end
          
          gen mdate = ym(year, month)
          
          bysort id (mdate) : gen toexpand = cond(_n == 1, mdate[_N] - mdate[1], 1)
          
          expand toexpand
          
          bysort id (mdate) : replace mdate = mdate[_n-1] + 1 if inrange(_n, 2, _N-1)
          
          format mdate %tm
          
          list id mdate, sepby(id)
          
               +--------------+
               | id     mdate |
               |--------------|
            1. |  A    2010m2 |
            2. |  A    2010m3 |
            3. |  A    2010m4 |
            4. |  A    2010m5 |
            5. |  A    2010m6 |
            6. |  A    2010m7 |
               |--------------|
            7. |  B   1999m11 |
            8. |  B   1999m12 |
            9. |  B    2000m1 |
           10. |  B    2000m2 |
           11. |  B    2000m3 |
               +--------------+
          Note that the data example in #1 won't run, because as in #2 and here you need to specify that the identifier is string.

          Comment


          • #6
            This is brilliant, thanks!

            Comment

            Working...
            X