  • Egen max by cutoff age in panel data

    I have some panel data that is somewhat messy in capturing individuals' education, because of relatively high missingness in the education variable combined with the fact that respondents are fairly different ages at the onset of the survey.

    What I want to do, then, is create a variable that holds the max value of -educ- (years of education) reached by the time the respondent is age 21. I've tried a few combinations of bysort (): gen max but have run into trouble because 21 is not the max age of anyone in the survey. Here's some basic data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id int year byte(age educ)
    1 1990 16 10
    1 1991 17  .
    1 1992 18 12
    1 1993 19 12
    1 1994 20  .
    1 1995 21  .
    1 1996 22 13
    1 1997 23 14
    1 1998 24 15
    2 1990 20  .
    2 1991 21 14
    2 1992 22 14
    2 1993 23 14
    2 1994 24 14
    2 1995 25 14
    2 1996 26  .
    2 1997 27  .
    2 1998 28  .
    end

  • #2
    You should probably impute the missing values of education using one of the standard approaches. That said, since you ask for a specific algorithm, here is one way to do it. I fill in each missing value of educ with the respondent's last known education level. This relies on the fact that replace works through the observations in order, so a value filled in on one row is already available when the next row is processed. I do this on a copy of educ called educ_mutate; you could modify educ directly, but working on a copy keeps the original variable in case you need it. Then I take educ_mutate in the row where age equals 21 and copy that value to all of the respondent's rows as wanted. This assumes each respondent has a row at age 21 and at least one non-missing education value at age 21 or younger.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id int year byte(age educ)
    1 1990 16 10
    1 1991 17  .
    1 1992 18 12
    1 1993 19 12
    1 1994 20  .
    1 1995 21  .
    1 1996 22 13
    1 1997 23 14
    1 1998 24 15
    2 1990 20  .
    2 1991 21 14
    2 1992 22 14
    2 1993 23 14
    2 1994 24 14
    2 1995 25 14
    2 1996 26  .
    2 1997 27  .
    2 1998 28  .
    end
    
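    * work on a copy so the original educ is preserved
    gen educ_mutate = educ
    * carry the last known value forward within id, in age order
    bysort id (age): replace educ_mutate = educ_mutate[_n-1] if missing(educ_mutate)
    * pick up the filled-in value in the age-21 row
    bysort id: gen wanted = educ_mutate if age == 21
    * missing sorts last, so wanted[1] is the age-21 value; copy it to every row
    bysort id (wanted): replace wanted = wanted[1]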
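
    The approach above needs a row at exactly age 21. As a rough sketch of one way to relax that, the value can be taken from the last row with age <= 21 instead. The names educ_filled and last_le21 are made up for this illustration, and the drop line only simulates a respondent with no age-21 row in the example data. Like the code above, this returns the last known value by age 21, which equals the max only if educ never decreases.

    ```stata
    clear
    input byte id int year byte(age educ)
    1 1990 16 10
    1 1991 17  .
    1 1992 18 12
    1 1993 19 12
    1 1994 20  .
    1 1995 21  .
    1 1996 22 13
    1 1997 23 14
    1 1998 24 15
    2 1990 20  .
    2 1991 21 14
    2 1992 22 14
    2 1993 23 14
    2 1994 24 14
    2 1995 25 14
    2 1996 26  .
    2 1997 27  .
    2 1998 28  .
    end

    * assumption for illustration only: pretend id 1 was not surveyed at age 21
    drop if id == 1 & age == 21

    * carry the last known value forward within id, in age order
    gen educ_filled = educ
    bysort id (age): replace educ_filled = educ_filled[_n-1] if missing(educ_filled)

    * flag the last row at or before age 21 within each id
    * (at the final row age[_n+1] is missing, which Stata treats as > 21)
    bysort id (age): gen byte last_le21 = age <= 21 & (_n == _N | age[_n+1] > 21)

    * copy the value from the flagged row to every row of the same id
    egen wanted = max(cond(last_le21, educ_filled, .)), by(id)

    list, sepby(id)
    ```

    With the example data this should give 12 for id 1 (carried forward to the age-20 row) and 14 for id 2.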



    • #3
      egen can indeed help here. See especially Section 9 in https://journals.sagepub.com/doi/pdf...867X1101100210

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte id int year byte(age educ)
      1 1990 16 10
      1 1991 17  .
      1 1992 18 12
      1 1993 19 12
      1 1994 20  .
      1 1995 21  .
      1 1996 22 13
      1 1997 23 14
      1 1998 24 15
      2 1990 20  .
      2 1991 21 14
      2 1992 22 14
      2 1993 23 14
      2 1994 24 14
      2 1995 25 14
      2 1996 26  .
      2 1997 27  .
      2 1998 28  .
      end
      
      * rows with age > 21 evaluate to missing, which max() ignores
      egen wanted = max(cond(age <= 21, educ, .)), by(id)
      
      list, sepby(id)
      
           +---------------------------------+
           | id   year   age   educ   wanted |
           |---------------------------------|
        1. |  1   1990    16     10       12 |
        2. |  1   1991    17      .       12 |
        3. |  1   1992    18     12       12 |
        4. |  1   1993    19     12       12 |
        5. |  1   1994    20      .       12 |
        6. |  1   1995    21      .       12 |
        7. |  1   1996    22     13       12 |
        8. |  1   1997    23     14       12 |
        9. |  1   1998    24     15       12 |
           |---------------------------------|
       10. |  2   1990    20      .       14 |
       11. |  2   1991    21     14       14 |
       12. |  2   1992    22     14       14 |
       13. |  2   1993    23     14       14 |
       14. |  2   1994    24     14       14 |
       15. |  2   1995    25     14       14 |
       16. |  2   1996    26      .       14 |
       17. |  2   1997    27      .       14 |
       18. |  2   1998    28      .       14 |
           +---------------------------------+
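
      For readers less used to putting cond() inside egen, here is a minimal sketch of a two-step version that should give the same result on the example data. The variable names max_le21 and wanted2 are made up for this illustration. Step 1 computes the max over rows with age <= 21 only, leaving later rows missing; step 2 spreads that value to every row of the same id.

      ```stata
      clear
      input byte id int year byte(age educ)
      1 1990 16 10
      1 1991 17  .
      1 1992 18 12
      1 1993 19 12
      1 1994 20  .
      1 1995 21  .
      1 1996 22 13
      1 1997 23 14
      1 1998 24 15
      2 1990 20  .
      2 1991 21 14
      2 1992 22 14
      2 1993 23 14
      2 1994 24 14
      2 1995 25 14
      2 1996 26  .
      2 1997 27  .
      2 1998 28  .
      end

      * step 1: max of educ over rows with age <= 21; other rows are left missing
      egen max_le21 = max(educ) if age <= 21, by(id)

      * step 2: spread the result to all rows of each id
      egen wanted2 = max(max_le21), by(id)

      * one-line form, for comparison
      egen wanted = max(cond(age <= 21, educ, .)), by(id)

      * the two versions agree on this data
      assert wanted == wanted2
      ```

      The one-line form avoids the temporary variable, which is the main reason to prefer it once the idiom is familiar.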



      • #4
        Thanks both, this is exactly what I was hoping for. Daniel Schaefer I'll probably try imputation as a secondary option and see how it compares.
