  • How can I recode the variable in this long format data set with Stata?

    I want to create a new variable "signal" in this long-format data set. The rules are:
    if all status values equal 1 within id, then signal = (number of consecutive status values equal to 1) + 1
    if all status values equal 0 within id, then signal = 1
    if the first 0 occurs after several consecutive 1s within id, then signal = (number of consecutive status values equal to 1) + 1
    For example, for id==1, signal = 3 + 1 = 4
    *Simulated data for illustrative purpose.
    clear
    input byte (id status)
    1 1
    1 1
    1 1
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    2 0
    2 0
    2 0
    2 0
    2 0
    3 1
    3 1
    3 1
    3 1
    3 1
    3 1
    3 1
    4 1
    4 0
    4 0
    4 0
    end

    Thank you for your help!
    Last edited by smith Jason; 27 Mar 2022, 22:28.

  • #2
    Your conditions seem to imply that signal equals one if status is never 1 from start to end, and otherwise one plus the length of the longest spell of consecutive 1s. Presumably, you have a time variable in the dataset.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id status)
    1 1
    1 1
    1 1
    1 0
    1 0
    1 0
    1 0
    1 0
    1 0
    2 0
    2 0
    2 0
    2 0
    2 0
    3 1
    3 1
    3 1
    3 1
    3 1
    3 1
    3 1
    4 1
    4 0
    4 0
    4 0
    5 0
    5 1
    5 1
    5 0
    5 1
    5 1
    5 1
    end
    
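    * Build a within-id time index from the current row order
    gen time=_n
    bys id (time): replace time=_n
    xtset id time
    * Number each run of consecutive 1s within id
    bys id: gen spell= sum(status & l.status!=1)
    replace spell=0 if !status
    * Negative run length, so the longest run sorts first
    bys id spell: gen count=-_N if spell
    * signal is 1 if status is never 1, else longest run + 1
    bys id (count): gen signal= cond(missing(count[1])&missing(count[_N])&!status[1]&!status[_N],1,abs(count[1])+1)
    drop spell count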
    Result:

    Code:
    . sort id time
    
    . l, sepby(id)
    
         +-----------------------------+
         | id   status   time   signal |
         |-----------------------------|
      1. |  1        1      1        4 |
      2. |  1        1      2        4 |
      3. |  1        1      3        4 |
      4. |  1        0      4        4 |
      5. |  1        0      5        4 |
      6. |  1        0      6        4 |
      7. |  1        0      7        4 |
      8. |  1        0      8        4 |
      9. |  1        0      9        4 |
         |-----------------------------|
     10. |  2        0      1        1 |
     11. |  2        0      2        1 |
     12. |  2        0      3        1 |
     13. |  2        0      4        1 |
     14. |  2        0      5        1 |
         |-----------------------------|
     15. |  3        1      1        8 |
     16. |  3        1      2        8 |
     17. |  3        1      3        8 |
     18. |  3        1      4        8 |
     19. |  3        1      5        8 |
     20. |  3        1      6        8 |
     21. |  3        1      7        8 |
         |-----------------------------|
     22. |  4        1      1        2 |
     23. |  4        0      2        2 |
     24. |  4        0      3        2 |
     25. |  4        0      4        2 |
         |-----------------------------|
     26. |  5        0      1        4 |
     27. |  5        1      2        4 |
     28. |  5        1      3        4 |
     29. |  5        0      4        4 |
     30. |  5        1      5        4 |
     31. |  5        1      6        4 |
     32. |  5        1      7        4 |
         +-----------------------------+
    
    .


    • #3
      Thank you for your response!
      I think these two lines will work for this data set.
      egen signal = total(status == 1), by(id)
      replace signal = signal + 1
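
      Note that the two lines count every 1 within id, so they follow the stated rule only when all 1s form a single run at the start of each id. A minimal sketch of a variant that counts only the leading run of 1s is given here. The names obs, lead1 and signal2 are hypothetical, and the sketch assumes that the current row order within id is the intended time order.

      ```stata
      * Assumption: rows are already in the intended order within id.
      gen long obs = _n
      * lead1 stays 1 only while every status so far within id equals 1
      bysort id (obs): gen byte lead1 = sum(status != 1) == 0
      * signal2 = length of the leading run of 1s, plus 1
      egen signal2 = total(lead1), by(id)
      replace signal2 = signal2 + 1
      drop lead1 obs
      ```

      On the example data for id 1 to 4 this gives 4, 1, 8 and 2. If an id starts with 0, the leading run is empty and signal2 is 1.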
      Last edited by smith Jason; 28 Mar 2022, 11:40.


      • #4
        Originally posted by Andrew Musau
        Your conditions seem to imply that signal equals one if status is never 1 from start to end, and otherwise one plus the length of the longest spell of consecutive 1s. Presumably, you have a time variable in the dataset.

        There is no id==5 in the original data set. I understand that you added this record for illustration. However, such a pattern cannot occur in my data, because a 0 is never allowed between consecutive 1s. Anyway, thank you for your kind help!
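
        The constraint that no 1 may follow a 0 within id can be checked directly before relying on a simple count of 1s. This is only a sketch: the names obs and bad are hypothetical, and it assumes that the current row order within id is the intended time order.

        ```stata
        * Assumption: rows are already in the intended order within id.
        gen long obs = _n
        * Flag any 1 that directly follows a 0 within the same id
        bysort id (obs): gen byte bad = _n > 1 & status == 1 & status[_n-1] == 0
        * Number of offending rows; 0 means every id has all its 1s first
        count if bad
        * Show which ids break the pattern, if any
        tabulate id if bad
        drop bad obs
        ```

        If the count is 0, the total of status within id equals the length of the leading run of 1s, so counting 1s and adding 1 gives the intended signal.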
