I am using the xtdidregress command in Stata 18.0 to identify the causal effect of an educational program on a treated cohort relative to a control cohort. The treated cohort was born 4 years later than the control cohort.
This is the command I used:
Code:
xtset hicid age_group
xtdidregress (learning i.treat i.post) (did), nogteffects group(hicid) time(age_group)
I am using age_group as my time variable because I can only compare the two groups across age_group, not across calendar year (because of the 4-year gap between them). The policy was implemented for the treated cohort when they were 6-7 years old. Although I have data for all observations in the first wave (i.e., the 4-5 age group), when I run the above command I receive the following output:
Code:
. xtdidregress (learning i.treat i.post) (did), nogteffects group(hicid) time(age_group)
note: 1.treat omitted because of collinearity.

Treatment and time information

Time variable: age_group
Control:       did = 0
Treatment:     did = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
       hicid |       980        906
-------------+---------------------
Time         |
     Minimum |         4          6
     Maximum |         8          8
-----------------------------------

Difference-in-differences regression                     Number of obs = 8,036
Data type: Longitudinal

                               (Std. err. adjusted for 1,886 clusters in hicid)
------------------------------------------------------------------------------
             |               Robust
    learning | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
         did |
   (1 vs 0)  |   .0979579   .0461282     2.12   0.034     .0074902    .1884255
------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates and panel effects.
Note: Treatment occurs at different times and estimation sample contains units that switch in and out of treatment.
The variables are defined as follows:
treat = all observations in the treated (i.e., younger) cohort
control = all observations in the untreated (i.e., earlier) cohort
post = 1 for all observations in both cohorts aged 6 and above, and 0 otherwise
did = post*treat
These are my problems:
1) A note appears:
Code:
note: 1.treat omitted because of collinearity.
When I checked the number of observations for treat and did (i.e., post*treat), the counts differ, so the two variables are not identical:
Code:
. tabulate treat if cohort == "B"

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      4,953      100.00      100.00
------------+-----------------------------------
      Total |      4,953      100.00

. tabulate did if cohort == "B"

        did |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        906       18.29       18.29
          1 |      4,047       81.71      100.00
------------+-----------------------------------
      Total |      4,953      100.00
That means there should not be collinearity between them, so I can't figure out why Stata omits the treat variable because of collinearity.
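One further check that might be relevant, assuming the collinearity is with the panel (hicid) effects that the output note says the estimate is adjusted for, rather than with did: by construction treat is the same for every observation of a given hicid, so it would have no within-panel variation. A minimal sketch of how I could verify this with my variable names (treat_min, treat_max and tag_treat are hypothetical helper variables, not part of my data):
Code:
```stata
* Assumes the data are already declared as a panel: xtset hicid age_group
* xtsum splits each variable's variation into between and within components
xtsum treat post did

* Smallest and largest value of treat within each unit
bysort hicid: egen byte treat_min = min(treat)
bysort hicid: egen byte treat_max = max(treat)

* Mark one observation per unit, then count units where treat changes over time
egen byte tag_treat = tag(hicid)
count if tag_treat & treat_min != treat_max
```
If the within standard deviation of treat in the xtsum output is zero and the count is zero, treat does not vary within any unit, unlike did.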
2) In the first output, under Time, the maximum for both groups is 8, which suggests that units within each group first appear at different ages. But this is not what I see when I check the data. All observations in the control group first appear at age 4, and none start at age 8. Similarly, all observations in the treated group first appear at age 6.
Is there a way to ask Stata to give me a list of the unique IDs for which the pattern described in (2) does not hold?
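To make the request concrete, here is a rough sketch of the kind of listing I have in mind. I am not sure whether the Time rows refer to the first observed age or to the first age at which did == 1, so the sketch computes both. It assumes treat is coded 0/1; first_obs_age, first_did_age, did_drop, ever_drop and tag_first are hypothetical helper variables.
Code:
```stata
* First age at which each unit is observed
bysort hicid (age_group): generate first_obs_age = age_group[1]

* First age at which did == 1 for each unit (missing if did is never 1)
bysort hicid: egen first_did_age = min(cond(did == 1, age_group, .))

* Flag units where did falls from 1 back to 0 between consecutive observed waves
bysort hicid (age_group): generate byte did_drop = (did < did[_n-1]) if _n > 1
bysort hicid: egen byte ever_drop = max(did_drop)

* Mark one observation per unit so each ID is listed once
egen byte tag_first = tag(hicid)

* Overview, then the IDs that break the expected pattern
tabulate first_obs_age treat if tag_first
list hicid first_obs_age if tag_first & treat == 0 & first_obs_age != 4
list hicid first_obs_age first_did_age if tag_first & treat == 1 & first_did_age != 6
list hicid if tag_first & ever_drop == 1
```
The last list is aimed at the note about units that switch in and out of treatment. Treated units that never have did == 1 also show up in the third list, because a missing first_did_age is not equal to 6.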