Hi,
I have a panel dataset with 13 waves of questionnaire responses from individuals in the Netherlands.
The dependent variable, saving, is binary and measures an individual's ability to save (saving = 1 if the individual indicated an ability to save; 0 otherwise).
My panel structure and regression are shown in the code below (note that incomescaled is income divided by 1,000, because this makes the average marginal effects (AMEs) I compute later easier to interpret).
I would like to investigate missingness in my dataset. In particular, I want to see whether attrition is random or informative, i.e. whether there are systematic differences between the attriting and non-attriting samples.
I think this can be done with significance tests of missingness. As a first step, I created an indicator for whether incomescaled is missing and ran a t-test of saving across the two groups (output below). My concern is that poorer households may have been more likely to leave the sample, so that poor households end up under-represented and the sample is biased.
From the output below, can I conclude that the incomes of the attrited and non-attrited samples differ?
Code:
. xtdes

    hhid:  6, 21, ..., 89972                                 n =       2976
    year:  2004, 2005, ..., 2016                             T =         13
           Delta(year) = 1 unit
           Span(year)  = 13 periods
           (hhid*year uniquely identifies each observation)
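Note that mdesc only counts missing values in rows that exist: a household that stops responding usually has no rows at all for the later waves. Here xtdes reports n = 2976 households over T = 13 waves, i.e. at most 2,976 × 13 = 38,688 household-years, while mdesc counts 13,217 observations, so a large share of the gaps in this panel are absent rows (from attrition, late entry or skipped waves) rather than missing values. A minimal sketch, on a hypothetical three-household toy panel, of making those absent waves visible with xtdescribe and fillin:

```stata
* Hypothetical toy panel: 3 households, waves 2004-2016
clear
set obs 3
gen hhid = _n
expand 13
bysort hhid: gen year = 2003 + _n
drop if hhid == 2 & year > 2010      // household 2 attrits after 2010
drop if hhid == 3 & year == 2006     // household 3 skips one wave, then returns
xtset hhid year
xtdescribe                           // lists the response patterns (1 = observed, . = absent)

* Turn absent household-waves into explicit rows so they can be counted
fillin hhid year                     // _fillin == 1 marks a wave with no response
tab year _fillin                     // non-response per wave
```

The rows added by fillin have every other variable missing, so they drop out of any estimation command; drop them (drop if _fillin) before reusing the data if in doubt.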
Code:
. gen incomescaled = income/1000

. xtprobit saving $xlist employed retired health incomescaled risk
> selfcontrol child savingexp partner uni owner male c.age##c.age
> i.year, re vce(cluster hhid) nolog
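Because saving is the outcome of interest, one simple check of whether attrition is informative, in the spirit of Verbeek and Nijman's (1992) variable-addition tests, is to add a variable describing each household's response pattern to the model and test it. The sketch below uses simulated data and hypothetical variables (nwaves, allwaves, u, lastwave); with real data you would add the indicator to the full xtprobit specification above. A significant coefficient suggests attrition is related to saving even after conditioning on the covariates; an insignificant one is consistent with, but does not prove, ignorable attrition.

```stata
* Hypothetical simulated panel; attrition here is random by construction
clear
set seed 12345
set obs 500
gen hhid = _n
gen u = rnormal()                                   // household random effect
gen lastwave = 2016
replace lastwave = 2004 + floor(runiform()*12) if runiform() < 0.4   // ~40% leave early
expand 13
bysort hhid: gen year = 2003 + _n
drop if year > lastwave
gen incomescaled = runiform()*5
gen byte saving = (0.3*incomescaled + u + rnormal() > 1)
xtset hhid year

* Response pattern, counting only waves usable in estimation
bysort hhid: egen nwaves = total(!missing(saving, incomescaled))
gen byte allwaves = (nwaves == 13)                  // 1 = household observed in all 13 waves

* Add the response-pattern indicator to the model and test it
xtprobit saving incomescaled allwaves i.year, re vce(cluster hhid) nolog
test allwaves
```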
Code:
. mdesc saving incomescaled

    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
         saving |         266         13,217           2.01
   incomescaled |       5,759         13,217          43.57
----------------+-----------------------------------------------

. gen incomescaled_m=1 if incomescaled==.
(7,458 missing values generated)

. replace incomescaled_m=0 if incomescaled!=.
(7,458 real changes made)

. tab incomescaled_m

incomescale |
        d_m |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      7,458       56.43       56.43
          1 |      5,759       43.57      100.00
------------+-----------------------------------
      Total |     13,217      100.00

. sort incomescaled_m

. by incomescaled_m: su saving

----------------------------------------------------------------------
-> incomescaled_m = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      saving |      7,330    .4278308     .494798          0          1

----------------------------------------------------------------------
-> incomescaled_m = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      saving |      5,621    .3239637     .468028          0          1

. ttest saving, by(incomescaled_m)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |   7,330    .4278308    .0057793     .494798    .4165017    .4391599
       1 |   5,621    .3239637    .0062426     .468028    .3117258    .3362016
---------+--------------------------------------------------------------------
combined |  12,951    .3827504    .0042712    .4860769    .3743781    .3911226
---------+--------------------------------------------------------------------
    diff |            .1038671    .0085697                .0870693     .120665
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  12.1203
Ho: diff = 0                                     degrees of freedom =    12949

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
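Two points about the t-test above, with a sketch on simulated data. First, it compares saving between person-years with and without income data, which is item non-response, not attrition; it does not compare the incomes of attriters and stayers. Second, it treats the 12,951 person-years as independent although they are repeated observations on the same households, so its standard error is likely too small. regress with vce(cluster hhid) gives the same difference in means (sign reversed: mean(1) - mean(0)) with household-clustered standard errors. The attrition part flags each household's last wave before it drops out, compares first-wave income of later attriters and stayers, and models dropping out before the next wave. All variable names other than hhid, year, saving, incomescaled and incomescaled_m (inc0, lastwave, drop_next, attriter, first) are hypothetical, and skipped waves are counted as drop-outs here.

```stata
* Hypothetical simulated panel in which poorer households are more likely to drop out
clear
set seed 2016
set obs 400
gen hhid = _n
gen inc0 = runiform()*5                               // baseline income, in 1000s
gen lastwave = cond(runiform() < 0.5 - 0.08*inc0, 2004 + floor(runiform()*12), 2016)
expand 13
bysort hhid: gen year = 2003 + _n
drop if year > lastwave
gen incomescaled = inc0 + rnormal(0, 0.5)
replace incomescaled = . if runiform() < 0.3          // item non-response on income
gen byte saving = (0.3*inc0 + rnormal() > 1)
xtset hhid year

* (1) Item non-response: same comparison as the t-test, SEs clustered by household
gen byte incomescaled_m = missing(incomescaled)       // one line instead of gen + replace
regress saving i.incomescaled_m, vce(cluster hhid)

* (2) Attrition: 1 in the last observed wave before a gap or drop-out
bysort hhid (year): gen byte drop_next = (year[_n+1] != year + 1)
replace drop_next = . if year == 2016                 // no later wave to drop out of
bysort hhid: egen byte attriter = max(drop_next)      // 1 = household leaves at some point

* Do later attriters differ in first-wave income?
bysort hhid (year): gen byte first = (_n == 1)
ttest incomescaled if first, by(attriter)

* Do current income and saving predict dropping out before the next wave?
logit drop_next incomescaled saving i.year, vce(cluster hhid)
```

Either comparison only shows whether attrition is related to observed characteristics; it cannot rule out selection on unobservables.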
If this t-test approach is incorrect, could someone please suggest an alternative method?
Please let me know if further clarification is required.
Thank you