  • Replace variable values with first meaningful value in panel study

    Dear all,
    Within each pidp, I want to replace the missing and negative values of racel_bh with the only positive value reported over the period. In the example below the replacing value is 1, but in general it ranges from 1 to 18. The variable is recorded from wave 13 onwards, but individuals can enter the panel before or after that wave. I need to make the replacement from wave 13 (inclusive) onwards, since earlier waves use a different variable.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pidp float wave byte(racel_bh)
     1073047 11  .
     1073047 12  .
     1073047 13  1
     1073047 14 -8
     1073047 15 -8
     1073047 16 -8
    10485684 13  .
    10485684 14  1
    10485684 15 -8
    10485684 16 -8
    10499284 14  1
    10499284 15 -8
    12545878 15  1
    12545878 16 -8
    12545878 17 -8
    12545878 18 -8
    68441325 13  .    
    68441325 14  .
    68441325 15  1        
    68441325 16 -8        
    68441325 17 -8    
    68441325 18 -8
    end
    label values racel_bh bm_racel_bh
    label def bm_racel_bh -8 "inapplicable", modify
    label def bm_racel_bh 1 "White British", modify
    Last edited by Pio Medolla; 06 Feb 2022, 05:43.

  • #2
    Thanks for your data example, although the command

    Code:
    label values sex a_sex 
    causes the code to fail, suggesting that you took dataex output and then decided to edit it without checking that the code would still work.


    You want to find a positive value of the race variable and spread it to all observations for the same identifier. If there is at most one distinct positive value per identifier, it doesn't matter how you find it: the minimum, maximum, mean and several other summaries would all return that value. Here I take the minimum and, as a check, the maximum too. In the data example there is no problem, but in a larger dataset it would be prudent to check that wanted == check.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pidp float wave byte(racel_bh)
     1073047 11  .
     1073047 12  .
     1073047 13  1
     1073047 14 -8
     1073047 15 -8
     1073047 16 -8
    10485684 13  .
    10485684 14  1
    10485684 15 -8
    10485684 16 -8
    10499284 14  1
    10499284 15 -8
    12545878 15  1
    12545878 16 -8
    12545878 17 -8
    12545878 18 -8
    68441325 13  .    
    68441325 14  .
    68441325 15  1        
    68441325 16 -8        
    68441325 17 -8    
    68441325 18 -8
    end
    
    label values racel_bh bm_racel_bh
    label def bm_racel_bh -8 "inapplicable", modify
    label def bm_racel_bh 1 "White British", modify
    
    egen wanted = min(cond(inrange(racel_bh, 1, .), racel_bh, .)), by(pidp)
    egen check = max(cond(inrange(racel_bh, 1, .), racel_bh, .)), by(pidp)
    
    list, sepby(pidp)
    
         +--------------------------------------------------+
         |     pidp   wave        racel_bh   wanted   check |
         |--------------------------------------------------|
      1. |  1073047     11               .        1       1 |
      2. |  1073047     12               .        1       1 |
      3. |  1073047     13   White British        1       1 |
      4. |  1073047     14    inapplicable        1       1 |
      5. |  1073047     15    inapplicable        1       1 |
      6. |  1073047     16    inapplicable        1       1 |
         |--------------------------------------------------|
      7. | 10485684     13               .        1       1 |
      8. | 10485684     14   White British        1       1 |
      9. | 10485684     15    inapplicable        1       1 |
     10. | 10485684     16    inapplicable        1       1 |
         |--------------------------------------------------|
     11. | 10499284     14   White British        1       1 |
     12. | 10499284     15    inapplicable        1       1 |
         |--------------------------------------------------|
     13. | 12545878     15   White British        1       1 |
     14. | 12545878     16    inapplicable        1       1 |
     15. | 12545878     17    inapplicable        1       1 |
     16. | 12545878     18    inapplicable        1       1 |
         |--------------------------------------------------|
     17. | 68441325     13               .        1       1 |
     18. | 68441325     14               .        1       1 |
     19. | 68441325     15   White British        1       1 |
     20. | 68441325     16    inapplicable        1       1 |
     21. | 68441325     17    inapplicable        1       1 |
     22. | 68441325     18    inapplicable        1       1 |
         +--------------------------------------------------+
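    The egen calls may look tricky, but the logic is just to focus on positive values: cond() returns racel_bh when it is 1 or more and missing otherwise, and egen's min() and max() ignore missing values, so negatives and missings play no part. It's not obvious without reading the help, but inrange(racel_bh, 1, .) is not true (it returns 0) if the first argument is missing.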

    Code:
    . di inrange(42, 1, .)
    1
    
    . di inrange(., 1, .)
    0

    Looking for values of 1 and up captures any positive integer.
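To see what the expression inside the egen calls produces, here is a small sketch that evaluates it for one identifier from the data example. The variable name pos_race is hypothetical and used only for illustration; it is not part of the solution above.

```stata
* Subset of the data example above (one identifier)
clear
input long pidp float wave byte(racel_bh)
 1073047 11  .
 1073047 12  .
 1073047 13  1
 1073047 14 -8
end

* pos_race (hypothetical helper) holds racel_bh when it is 1 or more,
* and missing otherwise, so -8 and . both become missing
generate pos_race = cond(inrange(racel_bh, 1, .), racel_bh, .)
list, noobs
```

Only the wave 13 observation keeps a value; egen's min() and max() then ignore the missings when summarizing over pidp.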


    https://www.stata-journal.com/articl...article=dm0055 says more. See Section 9.
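The question also asks for the replacement to start at wave 13. Here is a minimal sketch of that last step, assuming you want to keep the original variable untouched. It copies racel_bh with clonevar into a new variable (racel_filled is a hypothetical name). It then overwrites only missing or negative values from wave 13 onwards, and only where a positive value exists for that pidp. It uses a subset of the data example.

```stata
* Subset of the data example above
clear
input long pidp float wave byte(racel_bh)
 1073047 11  .
 1073047 12  .
 1073047 13  1
 1073047 14 -8
10485684 13  .
10485684 14  1
10485684 15 -8
end

* Positive value per identifier, as in the egen call above
egen wanted = min(cond(inrange(racel_bh, 1, .), racel_bh, .)), by(pidp)

* Copy racel_bh so the original stays as it was (racel_filled is hypothetical)
clonevar racel_filled = racel_bh

* Overwrite missing or negative values only, from wave 13 onwards,
* and only where a positive value was found for that pidp
replace racel_filled = wanted if wave >= 13 & !missing(wanted) & (missing(racel_bh) | racel_bh < 0)

list, sepby(pidp) noobs
```

Observations before wave 13 keep their original values (here missing). The !missing(wanted) condition stops an identifier with no positive value from having its -8 codes turned into system missing.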



    • #3
      Dear Nick,

      Thanks for your reply. I did modify the dataex output to provide more useful information (rather than a bunch of trivial pidps and other variables); next time I will rerun the code to make sure that it works.

      All good: using your code, I was able to extend the positive values of racel_bh (and of other variables too) to the other observations without any problem.
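To do the same for several variables, one option is to loop over them. This sketch assumes that, as for racel_bh, "meaningful" means a value of 1 or more for every variable in the list. The local macro tofill and the _pos suffix are hypothetical names. Note that this version overwrites the variables in place.

```stata
* Subset of the data example above
clear
input long pidp float wave byte(racel_bh)
10499284 14  1
10499284 15 -8
12545878 15  1
12545878 16 -8
end

* Variables to fill: only racel_bh exists in this example, so add your own
* variable names here (any others are assumptions about your dataset)
local tofill racel_bh

foreach v of local tofill {
    * positive value per pidp, as in the egen call earlier in the thread
    egen `v'_pos = min(cond(inrange(`v', 1, .), `v', .)), by(pidp)
    * overwrite only missing or negative values from wave 13 onwards
    replace `v' = `v'_pos if wave >= 13 & !missing(`v'_pos) & (missing(`v') | `v' < 0)
}

list, sepby(pidp) noobs
```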

      I also verified that check == wanted in every observation, and indeed they are equal:
      Code:
      . summ pidp if check != wanted
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
              pidp |          0
      
      . summ pidp if check == wanted
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
              pidp |    715,183    6.28e+08    4.67e+08        687   1.65e+09
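Instead of comparing summarize output, you can have Stata stop with an error if the two variables ever differ. The sketch below runs on a subset of the data example. In Stata, missing == missing is true, so identifiers with no positive value at all also pass this check.

```stata
* Subset of the data example above
clear
input long pidp float wave byte(racel_bh)
12545878 15  1
12545878 16 -8
68441325 13  .
68441325 14  .
68441325 15  1
68441325 16 -8
end

egen wanted = min(cond(inrange(racel_bh, 1, .), racel_bh, .)), by(pidp)
egen check = max(cond(inrange(racel_bh, 1, .), racel_bh, .)), by(pidp)

* number of observations where the two disagree (0 expected)
count if wanted != check
* assert exits with an error if any observation disagrees
assert wanted == check
```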
      Thanks again, Nick.



      • #4
        Thanks for the nice closure!

