Hi,
I have a panel dataset with 13 waves of questionnaire responses from individuals in the Netherlands.
The dependent variable is binary and measures an individual's ability to save (saving=1 if the individual indicated an ability to save; 0 otherwise).
My regression is as follows (note that incomescaled is income divided by 1,000, which makes the average marginal effects (AMEs) easier to interpret at a later stage):
I would like to investigate missingness in my dataset. In particular, my aim is to see whether attrition is random or informative, that is, whether the attriting and non-attriting samples differ.
I think this can be done with significance tests on missingness, so I have run the following for incomescaled to see whether income differs significantly between the attrited and non-attrited samples (in theory, poorer households may have been more likely to leave the sample, which could bias the sample through under-representation of poor households).
From this could I conclude that there is a difference in the incomes of the attrited and non-attrited samples?
Or could someone please suggest an alternative method if the above is incorrect?
Please let me know if further clarification is required.
Thank you.
Code:
. xtdes
hhid: 6, 21, ..., 89972 n = 2976
year: 2004, 2005, ..., 2016 T = 13
Delta(year) = 1 unit
Span(year) = 13 periods
(hhid*year uniquely identifies each observation)
Code:
. gen incomescaled = income/1000
. xtprobit saving $xlist employed retired health incomescaled risk
> selfcontrol child savingexp partner uni owner male c.age##c.age
> i.year, re vce(cluster hhid) nolog
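For the AME interpretation mentioned above, a minimal sketch of the follow-up step might look like this. It assumes the command is run immediately after the xtprobit model and uses the default prediction of margins, which may differ across Stata versions, so check it before interpreting the result.

```stata
* Sketch only: run directly after the xtprobit command above.
* Average marginal effect of income on the predicted probability of saving.
* Because incomescaled = income/1000, the effect is per 1,000 units of income.
margins, dydx(incomescaled)
```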
Code:
. mdesc saving incomescaled
Variable | Missing Total Percent Missing
----------------+-----------------------------------------------
saving | 266 13,217 2.01
incomescaled | 5,759 13,217 43.57
----------------+-----------------------------------------------
.
. gen incomescaled_m=1 if incomescaled==.
(7,458 missing values generated)
.
. replace incomescaled_m=0 if incomescaled!=.
(7,458 real changes made)
.
. tab incomescaled_m
incomescale |
d_m | Freq. Percent Cum.
------------+-----------------------------------
0 | 7,458 56.43 56.43
1 | 5,759 43.57 100.00
------------+-----------------------------------
Total | 13,217 100.00
.
. sort incomescaled_m
.
. by incomescaled_m: su saving
--------------------------------------------------------------------------------------------
-> incomescaled_m = 0
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
saving | 7,330 .4278308 .494798 0 1
--------------------------------------------------------------------------------------------
-> incomescaled_m = 1
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
saving | 5,621 .3239637 .468028 0 1
.
. ttest saving, by(incomescaled_m)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 7,330 .4278308 .0057793 .494798 .4165017 .4391599
1 | 5,621 .3239637 .0062426 .468028 .3117258 .3362016
---------+--------------------------------------------------------------------
combined | 12,951 .3827504 .0042712 .4860769 .3743781 .3911226
---------+--------------------------------------------------------------------
diff | .1038671 .0085697 .0870693 .120665
------------------------------------------------------------------------------
diff = mean(0) - mean(1) t = 12.1203
Ho: diff = 0 degrees of freedom = 12949
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 1.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 0.0000
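Note that incomescaled_m above flags observations with missing income (item non-response) and the t test compares saving across that flag, so it does not directly compare income between attriting and non-attriting households. A possible way to flag attrition itself is sketched below. The names lastyear and attrit are hypothetical, and the sketch assumes that a household has a row only in the waves in which it responded and that 2016 is the final wave, as the xtdes output suggests. Households that enter the panel late are not treated separately.

```stata
* Sketch only; lastyear and attrit are hypothetical variable names.
* Assumes a household has a row only in waves where it responded.
bysort hhid: egen lastyear = max(year)
gen byte attrit = lastyear < 2016   // 1 = last observed wave is before 2016

* Mean income difference between attriting and staying households,
* with standard errors clustered on household because waves repeat.
regress incomescaled i.attrit, vce(cluster hhid)

* Does observed income predict eventual attrition, conditional on wave?
probit attrit incomescaled i.year, vce(cluster hhid)
```

Both commands use only observations with non-missing incomescaled, so the large share of missing income values would still need to be examined separately.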
