
  • Regress an indicator variable on another indicator variable originating from the same categorical variable

    Dear Stata experts,

    I need to run an OLS regression in which the dependent variable and one independent variable derive from the same categorical variable. Specifically, I am using data on education to see how educational attainment changes after the implementation of a reform.
    The dependent variable is sec_compl and takes value 1 if the respondent at least completed secondary education (so it comprises those who left school after completing and also those who enrolled in university) and 0 otherwise. The independent variables are cs_treat, a dummy=1 if exposed to the reform, 0 if not, and higher, a dummy=1 if the respondent enrolled in university and 0 otherwise.
    In other words, higher is contained in sec_compl.

    Here are the regression results:

    reg sec_compl cs_treat higher, vce(cluster v001)

    Linear regression                               Number of obs     =     35,098
                                                    F(2, 3710)        =    8919.79
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.3493
                                                    Root MSE          =     .39472

                                   (Std. Err. adjusted for 3,711 clusters in v001)
    ------------------------------------------------------------------------------
                 |               Robust
       sec_compl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        cs_treat |   .0387245   .0042615     9.09   0.000     .0303694    .0470796
          higher |   .6062808   .0045538   133.14   0.000     .5973527     .615209
           _cons |   .3728845   .0050688    73.56   0.000     .3629466    .3828223
    ------------------------------------------------------------------------------

    Should I interpret the coefficient on cs_treat as the effect of the reform on the probability of completing secondary school and then leaving education?
    Is there another way to keep all the observations and measure the effect of the reform only on this fraction of respondents (those who complete secondary school but do not enroll in university)?
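
    One way to frame this without dropping anyone, offered as a sketch rather than a settled answer: split sec_compl into two mutually exclusive indicators and regress each on cs_treat. The name sec_only is hypothetical and not in the data above. Because sec_compl = sec_only + higher for every respondent, the cs_treat coefficients from the last two regressions sum to the one from the first, provided all three use the same estimation sample.

    ```stata
    * sec_only is a hypothetical name: completed secondary, did not enroll in university
    generate byte sec_only = (sec_compl == 1 & higher == 0) if !missing(sec_compl, higher)

    * Reform effect on completing at least secondary (higher is not a regressor here)
    regress sec_compl cs_treat if !missing(sec_only), vce(cluster v001)

    * The same effect split into its two components, on the same estimation sample
    regress sec_only cs_treat, vce(cluster v001)
    regress higher   cs_treat if !missing(sec_only), vce(cluster v001)
    ```

    This is only an accounting decomposition of the reduced form; whether either piece has a causal reading depends on how the reform moves people between the two categories.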

    I hope I expressed my doubts clearly and I thank you in advance for your help!

  • #2
    Maybe I'm missing something, but this does not make sense to me. Perhaps my perspective from the United States is too limited, but at least here, it is extremely unusual to be allowed to enroll in a university if you have not completed secondary education. There are occasional exceptions, but most universities won't even consider an applicant who has not completed secondary education. Consequently, I would expect in your data that higher == 1 implies sec_compl == 1. If you were doing a logistic regression instead of a linear probability model, this would result in higher being removed from the model due to perfect prediction, and the estimation carried out only among observations with higher == 0. The linear probability model you are using is more tolerant of this situation, but still I do not understand how you would interpret this model at all.
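
    The nesting and the perfect-prediction behavior described above can be checked directly. A minimal sketch, assuming the variable names from the original post:

    ```stata
    * Check the nesting: every higher == 1 case should also have sec_compl == 1
    tabulate higher sec_compl, missing
    assert sec_compl == 1 if higher == 1

    * Same specification as a logit, for comparison with the linear model
    logit sec_compl cs_treat higher, vce(cluster v001)
    ```

    If the assert passes, the logit should report that higher predicts success perfectly and estimate only on the higher == 0 observations.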

    What am I missing here?


    • #3
      Thank you for the reply!
      I'm sorry, I expressed myself poorly. In these data (the DHS series for Peru) the situation is the same as in the US: if you did not complete secondary education, you cannot enroll in university. My variable sec_compl is taken from the variable edu_attainment, which is structured in the following way:

      0 No education
      1 Incomplete primary
      2 Complete primary
      3 Incomplete secondary
      4 Complete secondary
      5 Higher

      From this variable I created the variables sec_compl = 1 if edu_attainment >= 4 and higher = 1 if edu_attainment == 5.
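
      A minimal sketch of that construction, using hypothetical names with a _chk suffix so the existing variables are left untouched. The if !missing() guard matters because Stata treats missing values as larger than any number, so edu_attainment >= 4 alone would evaluate to 1 for respondents with missing attainment.

      ```stata
      * _chk names are hypothetical, chosen to avoid overwriting existing variables
      generate byte sec_compl_chk = (edu_attainment >= 4) if !missing(edu_attainment)
      generate byte higher_chk    = (edu_attainment == 5) if !missing(edu_attainment)

      * Compare against the variables already in the data
      tabulate sec_compl sec_compl_chk, missing
      tabulate higher higher_chk, missing
      ```
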
      Let me be more precise: I am looking at the effect of education on fertility outcomes using a reform that made secondary education compulsory. When I run reduced form equations, I want to capture the effect of the reform on fertility outcomes for those respondents who completed secondary education but did not move on to university (in other words the compliers).
      Ultimately, I suppose that my question is: how can I look at this effect without restricting the sample to edu_attainment<5 but at the same time controlling for intrinsic trends in fertility outcomes for those who attended university (who can be considered always takers)?
      Thank you for your availability!


      • #4
        You have not stated what kind of intervention you are talking about. It's relevant because the pathways by which you think it might influence fertility and influence or be influenced by other variables in your data are key to modeling this correctly. Even if I knew what the intervention is, I don't have the substantive knowledge in this area to advise you specifically. I have some intuitions about it, but they are based on US experience and might prove misleading as applied to Peru.

        It seems to me you could clarify your thinking by drawing a diagram of the causal relationships among the variables you are working with. That is often quite helpful in deciding what variables to include, what variables to exclude, and in what ways to best develop a statistical model. Remember that you want to include confounders, and you want to exclude colliders and mediators from the covariates.
