  • Analysing Longitudinal Data - Aggregating waves

    I am new to Stata and am currently doing a project for my undergraduate econometrics module in economics.

    I am using data from the UKDS Understanding Society longitudinal study. Each observation carries a personal ID number identifying an individual who was surveyed multiple times (waves) across many years. There are at most 9 waves, but some individuals did not complete all 9 surveys; some responded to only 1 or 4 waves, for example.

    Based on the output of summarize for the wave variable, I found that the average number of waves responded to is 4.23 out of 9.

    I want to identify the personal ID numbers that completed fewer than the average number of waves and drop them from the data. I do not want to merge each personal ID's responses into a single observation, because this would distort the data too much.

    What code can I use in Stata to group the observations that share a personal ID number, so that I can output a count of how many waves each person completed?

    I would also like to look at how each individual's income changed over time.
    Last edited by Iylana James; 04 Nov 2024, 05:41.

  • #2
    It's difficult to answer without a data example. Assuming each observation represents an individual in a specific wave, you have a structure similar to Stata's nlswork dataset.

    Code:
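    webuse nlswork, clear
    * Keep the first 100 observations so the output stays short
    keep in 1/100
    * Each row is one person-year, so each id's frequency is its number of years observed
    tab id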
    Result:

    Code:
    . tab id
    
         NLS ID |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         12       12.00       12.00
              2 |         12       12.00       24.00
              3 |         15       15.00       39.00
              4 |         11       11.00       50.00
              5 |         11       11.00       61.00
              6 |         15       15.00       76.00
              7 |          8        8.00       84.00
              9 |         13       13.00       97.00
             10 |          3        3.00      100.00
    ------------+-----------------------------------
          Total |        100      100.00
    If we wanted to keep only individuals observed in 10 or more years, we would proceed as follows:

    Code:
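    * Within each id, _N is the number of observations for that person
    bys id: gen count=_N
    * Keep people observed in at least 10 years
    keep if count>=10
    tab id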
    Result:

    Code:
    . tab id
    
         NLS ID |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         12       13.48       13.48
              2 |         12       13.48       26.97
              3 |         15       16.85       43.82
              4 |         11       12.36       56.18
              5 |         11       12.36       68.54
              6 |         15       16.85       85.39
              9 |         13       14.61      100.00
    ------------+-----------------------------------
          Total |         89      100.00
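    To apply the rule from the question (drop anyone observed in fewer waves than the average) instead of a fixed cutoff, the average has to be taken over people, not over person-wave rows. Below is a minimal sketch on the same nlswork extract, where idcode stands in for your personal ID variable (an assumption; substitute your own name). The egen tag() function flags one row per person, so summarize ... if first averages exactly one count per person. Note that summarizing a wave-number variable across all observations gives the mean wave index, which is not the same quantity as the average number of waves per person.

    ```stata
    * Minimal sketch: keep people observed in at least the average number
    * of waves per person (nlswork stands in for the Understanding Society data)
    webuse nlswork, clear
    keep in 1/100

    * Number of observations (waves) for each person, repeated on every row
    bysort idcode: gen nwaves = _N

    * Flag one observation per person so the mean is over people, not rows
    egen first = tag(idcode)
    summarize nwaves if first

    * Drop people observed in fewer waves than the per-person average
    keep if nwaves >= r(mean)
    tab idcode
    ```

    On the 100-observation extract the per-person average is 100/9, about 11.1 observations, so ids 1, 2, 3, 6 and 9 are kept. Without the if first restriction, people with many rows would count several times and pull the threshold upward.
    
    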
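    For the second part of the question (how each person's income changes over time), one option is to declare the panel with xtset and use Stata's time-series operators. This is a sketch on nlswork under stated assumptions: ln_wage (log hourly wage) stands in for income and year for wave; with your own data, substitute your person ID (in Understanding Society this is usually pidp, but check your file), wave and income variables.

    ```stata
    * Minimal sketch: track each person's (log) wage over time in nlswork;
    * ln_wage stands in for income and year for wave
    webuse nlswork, clear
    keep in 1/100

    * Declare the panel structure: person identifier and time variable
    xtset idcode year

    * Change since the previous period; L. refers to year t-1, so the
    * result is missing when the previous year was not observed
    gen d_ln_wage = ln_wage - L.ln_wage

    * Change between each person's first and last observation
    bysort idcode (year): gen total_change = ln_wage[_N] - ln_wage[1]

    * One line per person over time, for the first few ids
    xtline ln_wage if idcode <= 3
    ```

    Because nlswork skips some years, L. returns missing across those gaps; in a long file with consecutive wave numbers it gives the wave-to-wave change directly.
    
    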
