  • Help dropping observations and generating a new variable

    Hi all,

    I am having trouble with my MSc thesis. I have observations that look like this:
    PersonID Role Year
    A 1 2013
    A 1 2014
    A 1 2015
    A 2 2013
    A 2 2014
    B 1 2014
    B 1 2015
    Basically, as it currently stands, my data tells you which jobs a person held and in which years. For example, A had 2 different jobs (roles 1 and 2): role 1 in 2013-2015 and role 2 in 2013-2014. The problem is that I am not interested in the roles at all. I would like to have 1 observation per person, and I do not care which jobs/roles they had.

    What I want is 1 observation per person, with the earliest year (for A this would be 2013), and one new variable that gives the number of years the person worked (i.e. the number of observations per PersonID). It would look like this:
    PersonID Year Experience_years
    A 2013 5
    B 2014 2

    Do you think that is possible?

    Thank you very much,

    Carla

  • #2
    Documentation you would find useful would include -help system variables- and -help by-. Also, you should re-read the Statalist FAQ to learn about using -dataex- to post example data.

    Code:
    // If any person has more than one observation in the same year, _N below counts
    // observations rather than distinct years, so checking first is a good idea.
    duplicates report PersonID Year
    sort PersonID Year
    by PersonID: gen Experience_years = _N
    // The next command works because -sort- put the observation with the oldest year first within each person.
    by PersonID: keep if _n == 1
    Note that the -keep- command retains only the oldest year's data on jobs etc. for each person, but I am taking you at your word that this is what you want.
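    For comparison, here is a sketch of a one-step alternative that uses -collapse- instead of -by- with _n and _N (a different technique from the code above, not a correction of it). It assumes the variable names from the listing in #1 and assumes storage types for them (PersonID as a one-character string, Role and Year as integers); the -input- lines simply re-type that listing so the example runs on its own.

```stata
* Re-type the example listing from #1 (storage types are assumptions).
clear
input str1 PersonID byte Role int Year
"A" 1 2013
"A" 1 2014
"A" 1 2015
"A" 2 2013
"A" 2 2014
"B" 1 2014
"B" 1 2015
end

* If Experience_years should count distinct calendar years rather than
* observations (3 for A instead of 5), uncomment the next line first:
* duplicates drop PersonID Year, force

* One row per person: the earliest Year, plus the number of nonmissing
* Year values, i.e. the number of observations per PersonID.
collapse (min) Year (count) Experience_years=Year, by(PersonID)

list
```

    With this data, -list- should show A with Year 2013 and Experience_years 5, and B with Year 2014 and Experience_years 2, matching the table in #1. Unlike the -keep- approach, -collapse- drops Role entirely, which fits not caring about roles.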
