Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fill in missing values for variable 'AGE' of an executive based on age values I do know from that executive

    Hi,
    In my dataset of executives I have age values (AGE) for the executives (EXECID) but not for every year (Year). Therefore, I want to fill in the age of the executives in the missing years based on the years I do know from them. I have added a screenshot of my browser to make it clear. EXECID is the ID for an executive. In the screenshot you can see that for the EXECID the previous year(s) are missing. I want Stata to fill in those based on the other age values of the same EXECID.

    Which command will let Stata know to fill in the age of the executive of the previous year with the missing value?
    And which command will let Stata know to fill in the age of the executive of the next year with a missing value?

    Thanks in advance.
    Attached Files

  • #2
    EXECID "00246" is 31 in both 2006 and 2007. You need to sort out such inconsistencies. But you can determine the birthyear and then calculate age:

    Code:
    gen birthyear= Year- AGE
    bys EXECID (birthyear): replace birthyear= birthyear[1]
    replace AGE = Year- birthyear if missing(AGE)

    Comment


    • #3
      The code Andrew Musau presents is very much like what I have used in similar circumstances.

      In my work, however, it was sometimes the case that one calculated birthyear would be an outlier - likely due to a survey response or data entry error. I found that instead of something like
      Code:
      bys EXECID (birthyear): replace birthyear= birthyear[1]
      which trusts the oldest calculated birthyear, it was fruitful to use something like
      Code:
      bys EXECID: egen mbirthyear = mode(birthyear), minmode
      which trusts the most common calculated birthyear, ignoring missing values and breaking ties in favor of the oldest of the tied values.

      Comment


      • #4
        Clever, William Lisowski!

        Comment


        • #5
          This problem arises quite often. Another solution to use ipolate age year, epolate by(id) is rarely mentioned.

          It remains true that inconsistencies need to be checked for, with

          Code:
          bysort id (year) : gen bad1 = (age - age[_n-1]) != (year - year[_n-1])  
          by id year (age) : gen bad2 = age[1] != age[_N]
          as two checks to start with
          Last edited by Nick Cox; 03 May 2022, 11:40.

          Comment


          • #6
            To add detail to my reasoning in post #3, the context is a panel survey of families that in each wave asks for each family member's age at the time of the survey administration. So depending on when in the year a birthday falls, and when in the year the survey team administers the survey, tests like the first one in post #5 will not be passed.

            In fact, the survey also directly asks for each individual's birth year as well as age. My exploration of the data found the reported birth years to be more consistent than the reported ages, and my code uses the modal reported birth year to calculate and report each individual's age in years as that attained on their birthday in the year of the current wave.

            Comment

            Working...
            X