
  • subtracting values from a series of variables

    I have a dataset with five variables (r11livsib to r15livsib) that record the number of living siblings reported by each participant across five waves of data collection, along with an ID variable (hhidpn). I aim to calculate the number of siblings who passed away for each participant between consecutive waves.

    However, there are two challenges:
    1. Inconsistent Reporting: Some participants reported an increase in the number of living siblings over time, which is logically inconsistent since the number of siblings can only decrease or remain the same. For example, participant hhidpn = 540244010 reported 4 living siblings in r13livsib but 8 in r15livsib.
    2. Missing Data: Some participants have missing values for one or more waves.
    Here’s what I want to do:
    1. For inconsistent entries where the number of living siblings in a later wave exceeds the earlier wave, assign a value of -888 to the calculated number of deceased siblings for that wave.
    2. For valid entries (where the number of living siblings decreases or stays the same), compute the number of deceased siblings as the difference between the two waves.
    3. For missing values, assign a code of -999 to indicate missing data.
    My goal is to generate a new set of variables that reflect the number of deceased siblings for each participant across the waves, with special codes for inconsistent or missing data.

    I have some code I wrote attached below as well.


    input long hhidpn byte(r11livsib r12livsib r13livsib r14livsib r15livsib)
    74379010 . . . . .
    203464020 . . . . .
    204640010 . . . . .
    79613010 4 . . 4 .
    910474010 . . 7 . 7
    55388010 . . . . .
    206025010 . . . . .
    502305020 . . . . .
    212090010 . . . . .
    201875010 . . . . .
    201152010 . . . . .
    82990020 . . . . .
    206281010 . . . . .
    57586030 . . . . .
    201200010 . . . . .
    200516020 . . . . .
    202811020 . . . . .
    540244010 . . 4 . 8
    211913010 . . . . .
    202212010 . . . . .
    116697020 . . . . .
    60076020 . . . . .
    47679030 . . . . .
    200792010 . . . . .
    200994020 . . . . .
    206976020 . . . . .
    11863010 . . . . .
    200861010 . . . . .
    20918020 . 11 . . .
    58401010 . . . . .
    79537040 . . . . .
    912349011 . . . . 3
    200183020 . . . . .
    207059010 . . . . .
    502138020 . . . . .
    41043010 . . . . .
    56660010 . . . . .
    205771020 . . . . .
    210031020 . . . . .
    543927020 . . . . .
    501016010 . . . . .
    45943010 . . . . .
    41901010 1 . . . .
    208830010 . . . . .
    21225010 . . . . .
    202382010 . . . . .
    45781010 . . . . .
    18157010 . . . . .
    202393010 . . . . .
    210123010 . . . . .
    206569010 . . . . .
    211374010 . . . . .
    203044020 . . . . .
    213253020 . . . . .
    205150010 . . . . .
    203639010 . . . . .
    87483010 . . . . .
    48604020 . . . . .
    206080010 . . . . .
    500086020 . 5 . . .
    174300010 . . . . .
    202424010 . . . . .
    203214010 . . . . .
    61107030 . . . . .
    208259010 . . . . .
    201612020 . . . . .
    86023020 . . . . .
    37025010 . . . . .
    55831020 . . . . .
    43284010 . . . . .
    205274010 . . . . .
    52632010 . . . . .
    54794010 . . . . .
    204370010 . . . . .
    53063040 . . . . .
    59300020 10 . . . .
    212238010 . . . . .
    38739010 2 . . . .
    212137020 1 . . . .
    544928010 . . 3 . 3
    45759020 . . . . .
    204567010 . . . . .
    207233010 . . . . .
    22731010 5 . . . .
    500651020 3 . . . .
    57210010 . . . . .
    200527010 . 1 . . .
    11787040 . . . . .
    202563020 . . . . .
    204060020 1 . . . .
    203704020 . . . . .
    205914020 . . . . .
    205014010 . . . . .
    11155010 . . . . .
    22938010 . . . . .
    211447010 . . . . .
    84778010 . . . . .
    78493040 . . . . .
    210535010 . . . . .
    212805010 . . . . .
    end




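    local siblings r11livsib r12livsib r13livsib r14livsib r15livsib

    * Collapse the extended missing codes .m, .d and .r into system missing
    foreach v of local siblings {
        recode `v' (.m=.) (.d=.) (.r=.)
    }

    * Create the four between-wave variables, initially missing
    forvalues i = 1/4 {
        gen sibling_death_wave`i' = .
    }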



    bys hhidpn: replace sibling_death_wave1 = -999 if (r11livsib == . | r12livsib == .)

    bys hhidpn :replace sibling_death_wave1 = -888 if (r12livsib > r11livsib) & (!missing(r12livsib) & !missing(r11livsib))

    bys hhidpn: replace sibling_death_wave1 = (r11livsib - r12livsib) if (sibling_death_wave1 != -888 ) & (sibling_death_wave1 != -999)



    **** Sibling deaths wave2*******


    bys hhidpn: replace sibling_death_wave2 = -999 if (r12livsib == . | r13livsib == .)

    bys hhidpn: replace sibling_death_wave2 = -888 if (r13livsib > r12livsib) & (!missing(r12livsib) & !missing(r13livsib))

    bys hhidpn: replace sibling_death_wave2 = (r12livsib - r13livsib) if (sibling_death_wave2 != -888 ) & (sibling_death_wave2 != -999)

    ****sibling deaths wave3*****


    bys hhidpn: replace sibling_death_wave3 = -999 if (r13livsib == . | r14livsib == .)

    bys hhidpn :replace sibling_death_wave3 = -888 if (r14livsib > r13livsib) & (!missing(r13livsib) & !missing(r14livsib))

    bys hhidpn: replace sibling_death_wave3 = (r13livsib - r14livsib) if (sibling_death_wave3 != -888 ) & (sibling_death_wave3 != -999)

    **** sibling death wave4*****

    bys hhidpn: replace sibling_death_wave4 = -999 if (r15livsib == . | r14livsib == .)

    bys hhidpn :replace sibling_death_wave4 = -888 if (r15livsib > r14livsib) & (!missing(r14livsib) & !missing(r15livsib))

    bys hhidpn: replace sibling_death_wave4 = (r14livsib - r15livsib) if (sibling_death_wave4 != -888 ) & (sibling_death_wave4 != -999)


  • #2
    Well, your code can be considerably simplified. But I also think you should reconsider what you're doing in two ways.
    1. Don't use magic numbers (-888, -999) to represent uncomputable or invalid results: use Stata extended missing values. If you use -888 and -999, any calculations you do with the variables you create will have to be complicated in the code, and slowed down in execution, by -if !inlist(variable, -888, -999)-. And it is likely that somewhere along the line you will mistakenly omit that, and then your resulting calculations will be wrong.
    2. In some instances, you have a certain number of living siblings reported in some round, and then the same number is reported in a later round, with one or more missing values of living siblings in between. Since you have stipulated that the number of siblings can never increase in this data, you can infer that the number of living siblings stayed the same between those observations. This can be filled in.
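    To illustrate the first point, here is a minimal sketch on a made-up four-row variable (deaths_magic and deaths are hypothetical names, not from the original data). It shows how the numeric codes leak into summary statistics unless they are excluded by hand, while extended missing values are skipped automatically:
    Code:
    clear
    input int deaths_magic
    0
    2
    -888
    -999
    end

    * Same information, stored as extended missing values instead
    generate int deaths = deaths_magic
    replace deaths = .i if deaths_magic == -888
    replace deaths = .m if deaths_magic == -999

    * The magic numbers are averaged in unless explicitly excluded
    summarize deaths_magic
    summarize deaths_magic if !inlist(deaths_magic, -888, -999)

    * Extended missing values are skipped automatically (N = 2, mean = 1)
    summarize deaths

    * The reason for missingness is still recoverable
    count if deaths == .i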
    All in all, this is what I would do with this:
    Code:
    reshape long r1@livsib, i(hhidpn) j(round)
    
    by hhidpn (round): gen last_valid = r1livsib if _n == 1
    by hhidpn (round): replace last_valid = cond(r1livsib <= last_valid[_n-1], ///
        r1livsib, last_valid[_n-1]) if _n > 1
    gsort hhidpn -round
    by hhidpn: gen next_up = r1livsib if _n == 1
    by hhidpn: replace next_up = cond(!missing(r1livsib), r1livsib, ///
        next_up[_n-1]) if _n > 1
        
    sort hhidpn round
    clonevar livsib = r1livsib
    replace livsib = last_valid if next_up == last_valid & missing(livsib)
    
    label define deceased_siblings    .i     "Inconsistent"    .m "Missing"
    * Deaths since the previous round = previous count minus current count
    by hhidpn (round): gen deceased_siblings:deceased_siblings = livsib[_n-1] - livsib
    by hhidpn (round): replace deceased_siblings = .m if missing(livsib) & _n > 1
    by hhidpn (round): replace deceased_siblings = .i if livsib > last_valid & !missing(livsib)
    The above code calculates two variables. The first, livsib, is the number of living siblings, with missing values replaced by the inferred number where this is possible. The second, deceased_siblings, is the number of siblings that died since the previous round, with extended missing values marking the cases where the data were inconsistent (.i) or missing (.m). (For the first round, this is always just the system missing value (.) because there is no previous round to serve as a reference. The same value also appears in a later round when the current count is present and consistent but the previous round's count is missing and could not be inferred.) The code also calculates intermediate variables last_valid and next_up, which are used to determine when it is possible to infer the number of living siblings.
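
    The gap-filling idea can be seen in isolation in the following sketch. It is a simplified illustration, not the code above: it uses one made-up respondent and the hypothetical names id, nsib, prev, next and filled, and it ignores the check for inconsistent increases. A missing count is filled only when the nearest reported values on both sides agree:
    Code:
    clear
    input long id byte(round nsib)
    1 1 4
    1 2 .
    1 3 .
    1 4 4
    1 5 .
    end

    * Carry the most recent reported value forward within respondent
    bysort id (round): generate prev = nsib if _n == 1
    by id (round): replace prev = cond(!missing(nsib), nsib, prev[_n-1]) if _n > 1

    * Carry the next reported value backward by reversing the order
    gsort id -round
    by id: generate next = nsib if _n == 1
    by id: replace next = cond(!missing(nsib), nsib, next[_n-1]) if _n > 1

    * Fill a gap only when both neighbours agree
    sort id round
    generate filled = cond(missing(nsib) & prev == next, prev, nsib)
    list, sepby(id)
    Rounds 2 and 3 are filled with 4 because a 4 is reported on both sides. Round 5 stays missing because nothing is reported after it.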

    You may want to leave the data in this long layout for further calculations, as most Stata data management and analysis is simpler, or only possible, in long layout. However, if you need to go back to the original arrangement of the data, retaining the newly calculated variables:
    Code:
    keep hhidpn round r1livsib livsib deceased_siblings
    reshape wide r1@livsib livsib deceased_siblings, i(hhidpn) j(round)
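
    For comparison, if the wide layout is kept throughout, the four near-identical blocks in the original post could be collapsed into one loop. This is only a sketch under stated assumptions: it uses two made-up respondents (IDs 1 and 2 are hypothetical), stores .m and .i instead of -999 and -888, and does not infer counts across gaps the way the long-layout code does:
    Code:
    clear
    input long hhidpn byte(r11livsib r12livsib r13livsib r14livsib r15livsib)
    1 5 5 4 . 4
    2 3 4 4 2 2
    end

    forvalues w = 1/4 {
        local next = `w' + 1
        * Earlier wave minus later wave = siblings lost in between
        generate sibling_death_wave`w' = r1`w'livsib - r1`next'livsib
        * Either wave missing: flag as missing
        replace sibling_death_wave`w' = .m if missing(r1`w'livsib, r1`next'livsib)
        * Later wave larger than earlier wave: flag as inconsistent
        replace sibling_death_wave`w' = .i if r1`next'livsib > r1`w'livsib ///
            & !missing(r1`w'livsib, r1`next'livsib)
    }
    list, noobs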
