Dear Stata experts,
I need to run an OLS regression in which the dependent variable and one of the independent variables derive from the same categorical variable. Specifically, I am using data on education to see how educational attainment changes after the implementation of a reform.
The dependent variable is sec_compl, which takes value 1 if the respondent completed at least secondary education (so it includes both those who left school after completing it and those who went on to enroll in university) and 0 otherwise. The independent variables are cs_treat, a dummy equal to 1 if the respondent was exposed to the reform and 0 otherwise, and higher, a dummy equal to 1 if the respondent enrolled in university and 0 otherwise.
In other words, higher is contained in sec_compl.
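To make the nesting concrete, a quick check along these lines (only a sketch, using the variable names above) should confirm that no respondent has higher = 1 with sec_compl = 0:

```stata
* Cross-tabulate the two dummies; the cell sec_compl==0 & higher==1 should be empty
tabulate sec_compl higher, missing

* Stops with an error if any university enrollee is coded as not having completed secondary
assert sec_compl == 1 if higher == 1
```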
Here are the regression results:
reg sec_compl cs_treat higher, vce(cluster v001)

Linear regression                               Number of obs     =     35,098
                                                F(2, 3710)        =    8919.79
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3493
                                                Root MSE          =     .39472

                                 (Std. Err. adjusted for 3,711 clusters in v001)
------------------------------------------------------------------------------
             |               Robust
   sec_compl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cs_treat |   .0387245   .0042615     9.09   0.000     .0303694    .0470796
      higher |   .6062808   .0045538   133.14   0.000     .5973527     .615209
       _cons |   .3728845   .0050688    73.56   0.000     .3629466    .3828223
------------------------------------------------------------------------------
Should I interpret the coefficient on cs_treat as the effect of the reform on the probability of completing secondary school and then leaving education?
Is there another way to keep all the observations and measure the effect of the reform only on this fraction of respondents (those who complete secondary school but do not enroll in university)?
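One candidate, offered only as a sketch: build a separate outcome for exactly this group and regress it on the reform dummy alone, keeping the full sample. The name sec_only is hypothetical and does not exist in my data, and whether this is an appropriate specification is part of what I am asking.

```stata
* sec_only (hypothetical name): 1 if the respondent completed secondary school
* but did not enroll in university, 0 otherwise; missing if either input is missing
generate byte sec_only = (sec_compl == 1 & higher == 0) if !missing(sec_compl, higher)

* Linear probability model on the full sample, same clustering as above
regress sec_only cs_treat, vce(cluster v001)
```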
I hope I expressed my doubts clearly and I thank you in advance for your help!