Regress an indicator variable from another indicator variable originating from the same categorical variable

Giovanni Scotti

Join Date: Feb 2022

Posts: 2
#1


25 Feb 2022, 11:15

Dear Stata experts,

I need to run an OLS regression in which the dependent variable and one of the independent variables derive from the same categorical variable. In particular, I am using data on education to see how educational attainment changes after the implementation of a reform.
The dependent variable is sec_compl, which takes value 1 if the respondent completed at least secondary education (so it includes both those who left school after completing it and those who went on to enroll in university) and 0 otherwise. The independent variables are cs_treat, a dummy equal to 1 if the respondent was exposed to the reform and 0 if not, and higher, a dummy equal to 1 if the respondent enrolled in university and 0 otherwise.
In other words, higher is contained in sec_compl.

Here are the regression results:

reg sec_compl cs_treat higher, vce(cluster v001)

Linear regression                               Number of obs     =     35,098
                                                F(2, 3710)        =    8919.79
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3493
                                                Root MSE          =     .39472

                               (Std. Err. adjusted for 3,711 clusters in v001)
------------------------------------------------------------------------------
             |               Robust
   sec_compl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cs_treat |   .0387245   .0042615     9.09   0.000     .0303694    .0470796
      higher |   .6062808   .0045538   133.14   0.000     .5973527     .615209
       _cons |   .3728845   .0050688    73.56   0.000     .3629466    .3828223
------------------------------------------------------------------------------

Should I interpret the coefficient on cs_treat as the effect of the reform on the probability of completing secondary school and then leaving education?
Would you say there is another way to keep all the observations and still measure the effect of the reform only on this fraction of respondents (those who complete secondary school but do not enroll in university)?

I hope I expressed my doubts clearly and I thank you in advance for your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

25 Feb 2022, 11:21

Maybe I'm missing something, but this does not make sense to me. Perhaps my perspective from the United States is too limited, but at least here, it is extremely unusual to be allowed to enroll in a university if you have not completed secondary education. There are occasional exceptions, but most universities won't even consider an applicant who has not completed secondary education. Consequently, I would expect in your data that higher == 1 implies sec_compl == 1. If you were doing a logistic regression instead of a linear probability model, this would result in higher being removed from the model due to perfect prediction, and the estimation carried out only among observations with higher == 0. The linear probability model you are using is more tolerant of this situation, but still I do not understand how you would interpret this model at all.
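The perfect-prediction point above can be checked on simulated data. The following is only a minimal sketch: the data are randomly generated and hypothetical (they are not the DHS data from this thread), and only the variable names sec_compl, cs_treat, higher and edu_attainment are borrowed from the posts.

```stata
* Hypothetical simulated data, NOT the poster's DHS sample
clear
set seed 12345
set obs 1000

* Random treatment dummy and a six-level attainment variable coded 0-5
generate byte cs_treat = runiform() < 0.5
generate byte edu_attainment = floor(6 * runiform())

* Indicators built as described in the thread (no missing values here)
generate byte sec_compl = edu_attainment >= 4
generate byte higher    = edu_attainment == 5

* The cell sec_compl == 0 & higher == 1 is empty by construction
tabulate sec_compl higher
assert sec_compl == 1 if higher == 1

* logit should report that higher predicts success perfectly, then drop
* higher and the higher == 1 observations from the estimation sample
logit sec_compl cs_treat higher

* The linear probability model runs on the full sample
regress sec_compl cs_treat higher
```

Comparing the number of observations reported by logit and by regress shows which observations each estimator actually uses.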

What am I missing here?
Giovanni Scotti

Join Date: Feb 2022

Posts: 2
#3

25 Feb 2022, 12:31

Thank you for the reply!
I'm sorry, I expressed myself poorly. In these data (the DHS series for Peru) the situation is indeed the same as in the US: if you did not complete secondary education, you cannot enroll in university. My variable sec_compl is derived from the variable edu_attainment, which is coded as follows:

0 No education
1 Incomplete primary
2 Complete primary
3 Incomplete secondary
4 Complete secondary
5 Higher

From this variable I created sec_compl = 1 if edu_attainment >= 4 and higher = 1 if edu_attainment == 5 (each 0 otherwise).
Let me be more precise: I am looking at the effect of education on fertility outcomes using a reform that made secondary education compulsory. When I run reduced-form equations, I want to capture the effect of the reform on fertility outcomes for those respondents who completed secondary education but did not move on to university (in other words, the compliers).
Ultimately, I suppose that my question is: how can I look at this effect without restricting the sample to edu_attainment < 5, while at the same time controlling for intrinsic trends in fertility outcomes among those who attended university (who can be considered always-takers)?
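For readers following along, here is one way the two indicators described above could be built. This is a sketch on a small made-up dataset, not the poster's actual code; the only names taken from the thread are edu_attainment, sec_compl and higher. The `if !missing()` qualifier is an added precaution, because Stata treats a missing value as larger than any number, so `edu_attainment >= 4` would otherwise be true for missing values.

```stata
* Toy data: one row per attainment level plus one missing value (hypothetical)
clear
input byte edu_attainment
0
1
2
3
4
5
.
end

* At least complete secondary; left missing when attainment is missing
generate byte sec_compl = edu_attainment >= 4 if !missing(edu_attainment)

* Higher education only; left missing when attainment is missing
generate byte higher = edu_attainment == 5 if !missing(edu_attainment)

* Check the coding against the original categories
tabulate edu_attainment sec_compl, missing
tabulate edu_attainment higher, missing
```

With this coding, higher == 1 can occur only where sec_compl == 1.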
Thank you for your availability!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

25 Feb 2022, 13:30

You have not stated what kind of intervention you are talking about. It's relevant because the pathways by which you think it might influence fertility and influence or be influenced by other variables in your data are key to modeling this correctly. Even if I knew what the intervention is, I don't have the substantive knowledge in this area to advise you specifically. I have some intuitions about it, but they are based on US experience and might prove misleading as applied to Peru.

It seems to me you could clarify your thinking by drawing a diagram of the causal relationships among the variables you are working with. That is often quite helpful in deciding what variables to include, what variables to exclude, and in what ways to best develop a statistical model. Remember that you want to include confounders, and you want to exclude colliders and mediators from the covariates.