I would like to generate a variable that approximates the day of birth based on the available survey data.
I have a continuous variable for age at the time the survey was conducted, the year and month of birth, the year of the survey, and three exact dates marking three moments of the survey (its start, its end, and its midpoint).
Below you can see a subsample of the dataset.
How would you proceed to estimate the day of birth?
[Please note that everything else in the dataset is anonymized, so an estimated day of birth would not allow anyone to be identified; identification would not be possible without the exact day of birth anyway.]
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float age double(yearbirth monthbirth Survey_Year Survey_Start Survey_End Survey_Middle)
15.660274 1995  9 2011 18724 18763 18743
15.928767 1991  4 2007 17198 17256 17227
16.019178 1991  4 2007 17258 17262 17260
16.019178 1991  4 2007 17258 17262 17260
16.279451 1995  1 2011 18721 18732 18726
 15.59726 1995  5 2011 18567 18627 18597
15.517808 1991 11 2007 17252 17331 17291
15.838356 1995  5 2011 18672 18699 18685
 15.59452 1991  8 2007 17198 17256 17227
15.635616 1995  8 2011 18659 18747 18703
15.750685 1991  8 2007 17272 17297 17284
 15.69315 1995  8 2011 18714 18735 18724
15.673972 1991  9 2007 17257 17317 17287
 15.91233 1995  6 2011 18724 18763 18743
15.671233 1995  7 2011 18672 18699 18685
15.517808 1991 10 2007 17258 17262 17260
 15.50685 1991 11 2007 17257 17317 17287
15.630137 1992  9 2007 17592 17683 17637
15.421918 1995 12 2011 18721 18773 18747
 15.48767 1995 12 2011 18764 18778 18771
end
format %td Survey_Start
format %td Survey_End
format %td Survey_Middle
label values yearbirth C02a
label values monthbirth OC02b
label def OC02b 1 "January", modify
label def OC02b 4 "April", modify
label def OC02b 5 "May", modify
label def OC02b 6 "June", modify
label def OC02b 7 "July", modify
label def OC02b 8 "August", modify
label def OC02b 9 "September", modify
label def OC02b 10 "October", modify
label def OC02b 11 "November", modify
label def OC02b 12 "December", modify
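As a possible starting point, here is a minimal sketch of backing out an approximate birth date from the age variable. It rests on two assumptions that the data documentation does not confirm: that age was computed as (survey date minus birth date) / 365, and that Survey_Middle is the date the age refers to. The names birth_est and day_est are made up for illustration, and the three observations are copied from the example data so the block runs on its own.

```stata
* Sketch only: assumes age = (Survey_Middle - birth date) / 365.
clear
input float age double(yearbirth monthbirth Survey_Middle)
15.660274 1995 9 18743
15.928767 1991 4 17227
16.279451 1995 1 18726
end
format %td Survey_Middle

* Convert age back to whole days and subtract from the assumed reference date.
generate double birth_est = Survey_Middle - round(age * 365)
format %td birth_est

* Keep the implied day only if the estimate falls in the reported year and month.
generate byte day_est = day(birth_est) if month(birth_est) == monthbirth & year(birth_est) == yearbirth

list age yearbirth monthbirth birth_est day_est, noobs
```

On these three observations the estimate lands on the first day of the reported birth month. That may mean age was itself derived from year and month of birth alone, in which case it would carry no extra information about the day. This is an inference from the sample, not an established fact about the dataset.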