  • Correcting inconsistency of variable "age" in panel dataset

    Hello,

    I was wondering if anyone could help me solve the following problem. I have a panel dataset with 3 waves (2014, 2016, and 2019). The problem is that the mothers' average age (in years) in the 2nd wave is smaller than their average age in the 1st wave.
    I can assume that mother's age was reported correctly in the 1st wave. In that case, if I want to add 2 years to mother's age in the 1st wave to get her age in the 2nd wave, and add 3 years to her age in the 2nd wave to get her age in the 3rd wave, what command can I use in Stata?

    Any help is greatly appreciated.
    Afroza
    Last edited by Afroza Ahammed; 29 Dec 2021, 12:20.

  • #2
    Assuming your data are in long layout, that the mother's age variable is called age and is a valid numeric variable, that a variable uniquely identifies each person, household, or whatever your unit of analysis is (I'll call it unique_id), and that there is a numeric variable called year taking the values 2014, 2016, and 2019, you can do this:

    Code:
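    * Verify that unique_id and year jointly identify observations, and sort by them
    isid unique_id year, sort
    * Spread each unit's 2014 age to all of its observations (missing if it has no 2014 record)
    by unique_id (year): egen corrected_age = max(cond(year == 2014, age, .))
    * Add the years elapsed since 2014: +2 in 2016, +5 in 2019
    replace corrected_age = corrected_age + (year - 2014)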
    Because you did not provide example data, this code is untested. It may contain typos or other errors, and may be entirely unsuited to your data if my guesses about the latter are incorrect. In the future, when asking for help with coding, always show example data, and always do that using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
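
    As a rough illustration of what that looks like in practice, and assuming the variables are named unique_id, year, and age as guessed above (these names are hypothetical, not taken from the actual data), something like this produces a postable example of the first 20 observations:

    ```stata
    * Only needed if -dataex- is not already installed (see above)
    * ssc install dataex

    * Variable names are assumptions; substitute the names in your own data
    dataex unique_id year age in 1/20
    ```

    Copy everything -dataex- prints to the Results window and paste it into the forum post between code delimiters.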

    Added: Assuming, as is conventional, at least here in the US, that people respond to age questions with their age at their preceding birthday, it is not necessarily the case that the person will be exactly 2 years older at wave 2 than she was at wave 1; that depends on where her birthday falls relative to the dates on which she responded in the two waves. The amount of error this introduces will, in general, be small, particularly if the surveys are administered at a consistent time of year. But at least be aware that this solution is imperfect.
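
    To make the birthday point concrete, here is a small sketch with an entirely hypothetical respondent (the birth and interview dates are invented for illustration; nothing in the question says the data contain such dates). Dividing by 365.25 is only an approximation to age at last birthday:

    ```stata
    * Hypothetical respondent born 15 Sep 1980
    * Wave 1 interview on 1 Jun 2014, before that year's birthday
    display floor((mdy(6, 1, 2014) - mdy(9, 15, 1980)) / 365.25)

    * Wave 2 interview on 1 Nov 2016, after that year's birthday
    display floor((mdy(11, 1, 2016) - mdy(9, 15, 1980)) / 365.25)
    ```

    For these dates the two reported ages differ by 3 rather than 2, even though the waves are labeled 2014 and 2016.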

    One other question: have you even diagnosed the problem correctly? Have you looked within person to see whether the ages are out of order? You could easily have a mean age at wave 2 that is lower than at wave 1, even with all ages accurately reported, if there is dropout between the two waves and older women were more likely to drop out than younger women. If this is what happened, you will still observe the same phenomenon with these corrected ages.
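
    A minimal sketch of such a within-person check, under the same assumptions about variable names as the code above (unique_id, year, age); the new variables age_change, year_change, inconsistent, and n_waves are hypothetical names, the 1-year tolerance is an arbitrary choice, and like the code above this is untested against real data:

    ```stata
    isid unique_id year, sort

    * Change in reported age and in calendar year since the previous observed wave
    by unique_id (year): gen age_change  = age - age[_n-1]   if _n > 1
    by unique_id (year): gen year_change = year - year[_n-1] if _n > 1

    * Flag observations whose age change differs from the elapsed years by more than 1
    gen byte inconsistent = abs(age_change - year_change) > 1 if !missing(age_change)
    tabulate year inconsistent, row

    * Count how many of the 2014 and 2016 waves each unit has with a non-missing age
    by unique_id: egen byte n_waves = total(inlist(year, 2014, 2016) & !missing(age))

    * Mean age by wave, restricted to units observed in both waves
    tabstat age if n_waves == 2 & inlist(year, 2014, 2016), by(year) statistics(mean n)
    ```

    If the restricted means rise by about 2 years between 2014 and 2016 and few observations are flagged, the lower wave 2 average more likely reflects differential dropout than misreported ages.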
    Last edited by Clyde Schechter; 29 Dec 2021, 12:53.
