  • #16
    Thanks for the hint to create a simplified version of the dataset. This made me realise that my intention of not making the example too complicated has backfired on me.
    There are 4 observations for every ID (every household): 1) industry in year1 for male, 2) industry in year2 for male, 3) industry in year1 for female, 4) industry in year2 for female

    Code:
    clear
    input ID year industry sex
    1 1 3 1
    1 2 10 1
    1 1 3 2
    1 2 3 2
    2 1 10 1
    2 2 3 1
    2 1 10 2
    2 2 3 2
    3 1 42 1
    3 2 42 1
    3 1 7 2
    3 2 8 2
    end
    bysort ID (year): gen var=industry[1]==3 & industry[2]==10 if sex==1
    list, sepby(ID)
    
         +----------------------------------+
         | ID   year   industry   sex   var |
         |----------------------------------|
      1. |  1      1          3     2     . |
      2. |  1      1          3     1     0 |
      3. |  1      2         10     1     0 |
      4. |  1      2          3     2     . |
         |----------------------------------|
      5. |  2      1         10     2     . |
      6. |  2      1         10     1     0 |
      7. |  2      2          3     1     0 |
      8. |  2      2          3     2     . |
         |----------------------------------|
      9. |  3      1         42     1     0 |
     10. |  3      1          7     2     . |
     11. |  3      2          8     2     . |
     12. |  3      2         42     1     0 |
         +----------------------------------+
    I assume that I would need to sort the observations differently...? Or does this problem require a completely different approach?
    Thanks a lot for your patience, I am new to Stata and barely have any coding experience (as you might have noticed).
    Any suggestion on how to solve this is much appreciated! Thanks


    • #17
      I think you need:

      Code:
      isid ID sex year, sort
      by ID sex (year): gen var = industry[1] == 3 & industry[2] == 10 if sex == 1
      Lest we get into further difficulties from other complications in the data, note that you can depend on this sort order only if ID, sex, and year uniquely identify the observations. That is why I put the -isid- command in there. If they do not, then you cannot be sure which observations will sort into the first two positions of a given ID sex group, so the operation could give wrong results. If ID, sex, and year do not uniquely identify observations in your data, then the -isid- command will halt execution, and you will have to figure out whether

      1. ID sex and year should uniquely identify observations, so you have a data error that you need to fix, or,
      2. Some additional variable(s) need to be specified to uniquely identify observations and therefore determine a unique sort order.
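
      If -isid- does halt, one way to see which observations are responsible is -duplicates-. This is a minimal sketch, assuming the variable names from #16; dup is a hypothetical name for the tag variable:

      Code:
      * report how many observations share the same ID, sex and year
      duplicates report ID sex year
      * dup (hypothetical name) = number of other observations with the same ID, sex and year
      duplicates tag ID sex year, generate(dup)
      * show only the observations that are not unique
      list ID sex year industry if dup > 0, sepby(ID)
      Looking at the tagged observations should help in deciding between cases 1 and 2 above.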


      • #18
        Thanks, Clyde!
        Indeed, when I tried to run the code you suggested in #17, I got an error message saying that the variables do not uniquely identify the observations.
        Some households in my dataset did not follow the desired pattern but had two different observations for the same sex in the same year (ID 1 and ID 3 have incorrect observations):
        Code:
        clear
        input ID year industry sex
        1 1 3 1
        1 2 10 1
        1 1 23 2
        1 1 7 2
        2 1 10 1
        2 2 3 1
        2 1 4 2
        2 2 4 2
        3 1 42 1
        3 2 42 1
        3 2 42 2
        3 2 42 2
        end
        I dropped the incorrect observations using
        Code:
        bysort ID year: drop if _N!=2
        Then the -isid- command worked perfectly.
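
        One caveat, offered as a sketch rather than a definitive fix: _N == 2 within ID and year only counts rows, so a household-year with two rows for the same sex would still pass. A stricter rule could look like this, assuming sex is coded 1 and 2 as in the example; ok is a hypothetical helper variable:

        Code:
        * keep a household-year only if it has exactly one row with sex == 1 and one with sex == 2
        bysort ID year (sex): gen byte ok = _N == 2 & sex[1] == 1 & sex[2] == 2
        keep if ok
        drop ok
        * confirm that ID, sex and year now uniquely identify the observations
        isid ID sex year, sort
        On the example data above this keeps the same observations as the original rule (only ID 2 survives).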

        Thanks a lot for your help, Nick and Clyde!


        • #19
          I have a similar problem: I would like to see whether a variable changes twice over a time period (ignoring missing values). I have a variable, gender, that takes a value of 1 or 2. I would like to create a variable that equals 1 if the value of gender changes twice over the time period for an individual.


          • #20
            I have a variable, gender, that takes value of 1 or 2. I would like to create a variable that = 1 if the value of gender changes twice over the time period for an individual.
            Is this really possible?

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(ID gender time)
            1 1 1
            1 2 2
            1 . 3
            1 2 4
            1 1 5
            1 2 6
            1 1 7
            2 2 1
            2 1 2
            2 2 3
            3 2 1
            3 2 2
            4 . 1
            4 . 2
            4 2 3
            4 2 4
            end
            
            
            gen time2= cond(missing(gender), ., time)
            bys ID (time2): gen change= sum(gender!= gender[_n-1]) if _n>1 & !missing(gender)
            bys ID (time): egen wanted = max(change)
            replace wanted= wanted>1

            Result:

            Code:
              
            . l, sepby(ID)
            
                 +----------------------------------------------+
                 | ID   gender   time   time2   change   wanted |
                 |----------------------------------------------|
              1. |  1        1      1       1        .        1 |
              2. |  1        2      2       2        1        1 |
              3. |  1        .      3       .        .        1 |
              4. |  1        2      4       4        1        1 |
              5. |  1        1      5       5        2        1 |
              6. |  1        2      6       6        3        1 |
              7. |  1        1      7       7        4        1 |
                 |----------------------------------------------|
              8. |  2        2      1       1        .        1 |
              9. |  2        1      2       2        1        1 |
             10. |  2        2      3       3        2        1 |
                 |----------------------------------------------|
             11. |  3        2      1       1        .        0 |
             12. |  3        2      2       2        0        0 |
                 |----------------------------------------------|
             13. |  4        .      1       .        .        0 |
             14. |  4        .      2       .        .        0 |
             15. |  4        2      3       3        .        0 |
             16. |  4        2      4       4        0        0 |
                 +----------------------------------------------+
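
            Two points about the last step may be worth guarding against. First, wanted > 1 flags two or more changes, not exactly two. Second, if an ID has fewer than two nonmissing values of gender, change is missing throughout, so max(change) is missing, and in Stata a missing value counts as larger than any number, so wanted > 1 would evaluate to 1. A hedged variant, assuming change has been created as above; nchanges, atleast2 and exactly2 are hypothetical names:

            Code:
            * nchanges (hypothetical name) = total number of changes per ID, missing if none could be computed
            bysort ID (time): egen nchanges = max(change)
            * 1 if gender changes at least twice, 0 otherwise (including IDs with missing nchanges)
            gen byte atleast2 = nchanges >= 2 & !missing(nchanges)
            * 1 only if gender changes exactly twice
            gen byte exactly2 = nchanges == 2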
            Last edited by Andrew Musau; 22 Mar 2019, 09:21.


            • #21
              Andrew Musau

              I agree with you that in reality, a person's actual gender changing twice over the course of any study period is going to be very rare (although nowadays children and adolescents sometimes declare themselves transgender and then desist only a short time later).

              But if Ms. O'Brien's data come from an electronic medical records database, my experience is that it would not be terribly uncommon for the recorded gender to change twice or more within a year. Such is the poor quality of electronic health data in many settings.


              • #22
                Thanks for the context, Clyde.
