Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacement of missing values (.)

    Dear all,

    As shown in my dataex, I have a panel data collected for the years 2013 and 2015. My "sex" variable (defined 1=male, and 0=female) has missed values either in the year 2013 or 2015. Example: the "sex" of an individual with an ID number "0101010160100202" is 1 during 2013, but missed (.) in 2015. Assuming an individual's gender is constant over time, I want to replace the missed value by the available, non missing "sex" value in 2013. How could I do this replacement in Stata? Thanks for helping.

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 household_id str16 individual_id float year byte sex
    "01010101601002" "0101010160100201" 2013 0
    "01010101601002" "0101010160100201" 2015 0
    "01010101601002" "0101010160100202" 2013 1
    "01010101601002" "0101010160100202" 2015 .
    "01010101601002" "0101010160100203" 2013 0
    "01010101601002" "0101010160100203" 2015 .
    "01010101601002" "0101010160100204" 2013 .
    "01010101601002" "0101010160100204" 2015 .
    "01010101601002" "0101010160100205" 2013 1
    "01010101601002" "0101010160100205" 2015 .
    "01010101601002" "0101010160100206" 2013 0
    "01010101601002" "0101010160100206" 2015 .
    "01010101601002" "0101010160100207" 2013 1
    "01010101601002" "0101010160100207" 2015 .
    "01010101601002" "0101010160100208" 2013 0
    "01010101601002" "0101010160100208" 2015 .
    "01010101601017" "0101010160101701" 2013 1
    "01010101601017" "0101010160101701" 2015 1
    "01010101601017" "0101010160101702" 2013 0
    "01010101601017" "0101010160101702" 2015 0
    "01010101601017" "0101010160101703" 2013 1
    "01010101601017" "0101010160101703" 2015 1
    "01010101601017" "0101010160101704" 2013 1
    "01010101601017" "0101010160101704" 2015 1
    "01010101601017" "0101010160101705" 2013 0
    "01010101601017" "0101010160101705" 2015 0
    "01010101601017" "0101010160101706" 2013 0
    "01010101601017" "0101010160101706" 2015 1
    "01010101601017" "0101010160101707" 2013 1
    "01010101601017" "0101010160101707" 2015 0
    "01010101601017" "0101010160101708" 2013 0
    "01010101601017" "0101010160101708" 2015 0
    "01010101601034" "0101010160103401" 2013 0
    "01010101601034" "0101010160103401" 2015 0
    "01010101601049" "0101010160104901" 2013 0
    "01010101601049" "0101010160104901" 2015 0
    "01010101601049" "0101010160104902" 2013 0
    "01010101601049" "0101010160104902" 2015 .
    "01010101601049" "0101010160104903" 2013 0
    "01010101601049" "0101010160104903" 2015 0
    "01010101601049" "0101010160104904" 2013 0
    "01010101601049" "0101010160104904" 2015 0
    "01010101601064" "0101010160106401" 2013 1
    "01010101601064" "0101010160106401" 2015 1
    "01010101601064" "0101010160106402" 2013 0
    "01010101601064" "0101010160106402" 2015 0
    "01010101601064" "0101010160106403" 2013 0
    "01010101601064" "0101010160106403" 2015 .
    "01010101601064" "0101010160106404" 2013 0
    "01010101601064" "0101010160106404" 2015 .
    "01010101601080" "0101010160108001" 2013 0
    "01010101601080" "0101010160108001" 2015 0
    "01010101601080" "0101010160108002" 2013 1
    "01010101601080" "0101010160108002" 2015 1
    "01010101601080" "0101010160108003" 2013 0
    "01010101601080" "0101010160108003" 2015 0
    "01010101601080" "0101010160108004" 2013 1
    "01010101601080" "0101010160108004" 2015 0
    "01010101601087" "0101010160108701" 2013 1
    "01010101601087" "0101010160108701" 2015 1
    "01010101601087" "0101010160108702" 2013 0
    "01010101601087" "0101010160108702" 2015 0
    "01010101601087" "0101010160108703" 2013 0
    "01010101601087" "0101010160108703" 2015 .
    "01010101601087" "0101010160108704" 2013 0
    "01010101601087" "0101010160108704" 2015 0
    "01010101601087" "0101010160108705" 2013 1
    "01010101601087" "0101010160108705" 2015 1
    "01010101601087" "0101010160108706" 2013 0
    "01010101601087" "0101010160108706" 2015 0
    "01010101601101" "0101010160110101" 2013 1
    "01010101601101" "0101010160110101" 2015 1
    "01010101601101" "0101010160110102" 2013 0
    "01010101601101" "0101010160110102" 2015 0
    "01010101601101" "0101010160110103" 2013 1
    "01010101601101" "0101010160110103" 2015 1
    "01010101601101" "0101010160110104" 2013 1
    "01010101601101" "0101010160110104" 2015 1
    "01010101601101" "0101010160110105" 2013 0
    "01010101601101" "0101010160110105" 2015 0
    "01010101601116" "0101010160111601" 2013 1
    "01010101601116" "0101010160111601" 2015 1
    "01010101601116" "0101010160111602" 2013 1
    "01010101601116" "0101010160111602" 2015 1
    "01010101601116" "0101010160111603" 2013 1
    "01010101601116" "0101010160111603" 2015 1
    "01010101601116" "0101010160111604" 2013 1
    "01010101601116" "0101010160111604" 2015 1
    "01010101601116" "0101010160111605" 2013 0
    "01010101601116" "0101010160111605" 2015 0
    "01010101601116" "0101010160111606" 2013 1
    "01010101601116" "0101010160111606" 2015 1
    "01010101601131" "0101010160113101" 2013 1
    "01010101601131" "0101010160113101" 2015 .
    "01010101601131" "0101010160113102" 2013 0
    "01010101601131" "0101010160113102" 2015 0
    "01010101601131" "0101010160113103" 2013 1
    "01010101601131" "0101010160113103" 2015 1
    "01010101601131" "0101010160113104" 2013 0
    "01010101601131" "0101010160113104" 2015 0
    end
    label values sex sex
    label def sex 0 "female", modify
    label def sex 1 "male", modify
    ------------------ copy up to and including the previous line ------------------


  • #2
    Just use ipolate with the epolate option. If you get a result that isn't 0 or 1 or missing, back off.

    Comment


    • #3
      Your example data and explanation suggest you have only two observations per person, one each for 2013 and 2015. The code below will still work even if there are a larger number of years. I have taken the extra step of verifying that when the sex variable is not missing, it remains consistent in all observations for the person. If the first command gives you an error message "assertion is false" then you have somebody in your data for whom the sex variable has changed. You should then investigate whether this is a gender-transition or, more likely, a data error.

      Code:
      by household_id individual_id (sex), sort: assert sex == sex[1] | missing(sex)
      by household_id individual_id (sex): replace sex = sex[1]
      Added: Crossed with #2, which offers a different approach.

      Comment


      • #4
        Thanks so much, dear Nick.

        Comment


        • #5
          Many thanks @ dear Clyde! The first command gave me the "assertion is false" error message as you anticipate due to the change in sex over time for some observations (most probably due to data error as trans-gender is illegal in the geography of data collection). Slight modification of your second code into "by household_id individual_id (sex), sort: replace sex = sex[1]" solved my problem. The beauty of this command is that it maintains similar "sex" across years in case "sex" value varies between 2013 and 2015 for the same individual. I am so thankful for your solution.


          Comment

          Working...
          X