  • How to use the first occurrence of each individual's age in panel data

    Hello,

    I have an unbalanced panel dataset consisting of pid (each person's identification number), years (from 1991 to 2013), and the age of respondents.

    I want to keep only the first or the last occurrence of age for each person, and use that person-specific age (at either the first or the last occurrence) in the analysis.

    Any suggestions, please?

    Much appreciated.

    Hiba

  • #2
    This sounds as if you just need one or both of

    Code:
    egen minage = min(age), by(pid)
    
    egen maxage = max(age), by(pid)
    
    * help egen for more
    If that is not what you want, please post an example of your data, or data with similar form, showing exactly what you would like to see in a new variable.


    • #3
      Well, I don't understand quite what you want to do, but here is how to get the first age and the last age:

      Code:
      egen youngest_age = min(age), by(pid)
      egen oldest_age = max(age), by(pid)
      and then you can figure out which of these you want to use when.

      By the way, I have reinterpreted your request as asking for the oldest and youngest ages, as opposed to the first and last recorded ages in whatever order the data set is in. If the data are not sorted chronologically, first and last might differ from youngest and oldest, and might also be missing values. If you really meant first and last, then it is:

      Code:
      gen long sort_order = _n // TO PRESERVE ORDER DURING SORTING
      by pid (sort_order), sort: gen first_age = age[1]
      by pid (sort_order), sort: gen last_age = age[_N]
      (It is unlikely that you intend this second approach.)
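
      If "first" and "last" are meant chronologically rather than in file order, a minimal sketch follows. It assumes the interview-year variable is named year (an assumption; substitute the actual name), and the new variable names are only illustrative. The second pair skips missing ages by sorting them to the end within each person.

      Code:
      * ASSUMPTION: the survey-year variable is called year
      bysort pid (year): gen first_obs_age = age[1]
      bysort pid (year): gen last_obs_age = age[_N]

      * first and last NON-MISSING age: missing ages sort after non-missing ones
      gen byte age_missing = missing(age)
      bysort pid (age_missing year): gen first_nonmiss_age = age[1]
      gen negyear = -year
      bysort pid (age_missing negyear): gen last_nonmiss_age = age[1]
      drop age_missing negyear
      Note that these commands leave the data sorted differently from the original order.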


      • #4
        Thank you Nick and Clyde,

        I meant the second approach that Clyde proposed. But looking at both sets of code, when the data are sorted chronologically both give the same value, since the youngest age for each individual is his/her first recorded value, in the first year interviewed.

        What I intend to do is divide age into subgroups (16-24, 25-39, 40-49, 50-64, and 65+), but instead of looking at the average age in the descriptive statistics, I look at the age observed in the first year.
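
        A minimal sketch of that grouping, assuming the interview-year variable is named year and that no respondent is older than 199 (both are assumptions, as are the new variable names):

        Code:
        * ASSUMPTION: the survey-year variable is called year
        bysort pid (year): gen first_age = age[1]

        * cutpoints are the left ends of each band; 200 is an arbitrary ceiling for 65+
        * ages below 16 (or missing) are set to missing by cut()
        egen age_group = cut(first_age), at(16 25 40 50 65 200) label

        * tag one observation per person so each pid is counted once
        egen one_per_pid = tag(pid)
        tabulate age_group if one_per_pid
        If a person's age is missing in their first observed year, first_age is missing too, and that person drops out of the table.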

        I have attached a picture for the result.

        Thank you for your help and sorry for the confusion.

        It was a simple approach and in my mind I made it so complex.

        Much appreciated.


