Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identifying differences in a variable within a dataset

    I have a dataset that has data that is pooled across different years.

    At the moment, I would to like be able to identify the occupational codes which are common across two years (e.g. 2004 and 2005).

    So for example, say in the year 2004 the following occupational codes are present:

    0101
    0102
    0103
    0104
    0105

    And in the year 2005, the following occupational codes are present:

    0101
    0102
    1111
    1112
    1113

    Is there a command/function in Stata which can tell me that only occupational codes "0101" and "0102" are present in both? Note that I can't use merge as I'm not using two different .DTA files - it's within the dataset.

    Thanks

  • #2
    https://www.stata.com/support/faqs/d...ions-in-group/ applies.

    Code:
    bysort code (year) ; gen diff = year[_N] != year[1] 
    
    tab code if diff
    If there were more years than two in the dataset the code could be extended

    Code:
    gen relevant = inlist(year, 2004, 2005) 
    
    bysort relevant code (year) : replace diff = year[1] != year[_N] 
    
    tab code if diff & relevant

    Comment


    • #3
      Originally posted by Nick Cox View Post
      https://www.stata.com/support/faqs/d...ions-in-group/ applies.

      Code:
      bysort code (year) ; gen diff = year[_N] != year[1]
      
      tab code if diff
      If there were more years than two in the dataset the code could be extended

      Code:
      gen relevant = inlist(year, 2004, 2005)
      
      bysort relevant code (year) : replace diff = year[1] != year[_N]
      
      tab code if diff & relevant
      Hi Nick,

      Thanks for that. Excellent. I've tried it for two years and it works! How does one extend it across say, 10 years, as that is the challenge I'm facing?

      Thanks.

      Comment


      • #4
        How you would want to extend it? You could work out e.g. the number of years in which each code appears. Indeed why don't you just cross-tabulate code and year?

        Comment


        • #5
          Originally posted by Nick Cox View Post
          How you would want to extend it? You could work out e.g. the number of years in which each code appears. Indeed why don't you just cross-tabulate code and year?
          I'd like to extend such that the same occupational code must occur in all the years (e.g. 2004, 2005, 2006, 2007, 2008 etc.), not just the first two years.
          Last edited by Chris Rooney; 20 Mar 2020, 07:57.

          Comment


          • #6
            Code:
            egen tag = tag(code year) 
            egen wanted = total(tag), by(code) 
            list code if wanted == 10 & tag, noobs

            Comment

            Working...
            X