
  • Postestimation LPM Regression

    Hello everyone,

    I am not very experienced with Stata or with statistical modeling in general. I have used a linear probability model (LPM) for my regression.

    regress overweight income `y', cluster(idpers)

    I estimate the effect of income on the probability of being overweight using a panel dataset, and I cluster standard errors at the individual level because I have repeated annual observations for each person.
    Now I would like to evaluate the estimates and visualize them as well.
    Do you have any suggestions or recommendations on how best to do the postestimation?


  • #2
    If you want to approximate the causal effect of income on being overweight, a panel regression model with fixed effects might be superior. Assuming that both income and the overweight indicator vary within each person over time, you could estimate it as follows:

    Code:
    xtset idpers timevar
    xtreg overweight income `y', vce(robust) fe
    The advantage of this model is that all time-constant confounders (gender, place of residence, migration background, ...) are accounted for automatically. Only time-varying controls need to be included.
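A minimal sketch of how the pooled LPM and the FE model could be compared side by side. Since the thread's data are not available, it simulates a hypothetical panel (all values, and the `ability` variable, are invented for illustration) in which an unobserved, time-constant trait drives both income and overweight; `idpers` and `timevar` follow the names used above, and the controls in `y' are left out.

```stata
* Hypothetical simulated panel: 300 persons observed over 6 annual waves
clear
set seed 2024
set obs 300
generate idpers = _n
generate ability = rnormal()        // invented, unobserved time-constant confounder
expand 6
bysort idpers: generate timevar = 2010 + _n
generate income = 3000 + 1000*ability + 500*rnormal()
* In this simulation overweight depends on the confounder, not on income itself
generate overweight = runiform() < 0.30 + 0.10*ability

* Pooled LPM with standard errors clustered by person
regress overweight income, vce(cluster idpers)
estimates store pooled

* Fixed-effects LPM: the time-constant confounder is swept out by the within transformation
xtset idpers timevar
xtreg overweight income, fe vce(robust)
estimates store fe

* Compare the income coefficient across the two models
estimates table pooled fe, keep(income) b(%9.6f) se(%9.6f)
```

In this artificial setup the pooled coefficient picks up the confounder, while the FE estimate should be close to zero; with real data, a gap between the two is suggestive of time-constant confounding but not proof of it.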

    In any case, regardless of whether you use xtreg or regress, I suggest using coefplot to visualize your results, see https://repec.sowi.unibe.ch/stata/co...g-started.html
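A sketch of the visualization step, again on simulated, hypothetical data (variable names follow the thread; all values are invented). `margins` and `marginsplot` are built into Stata and show predicted probabilities across income levels; `coefplot` is community-contributed and has to be installed from SSC first, so the sketch checks for it before calling it.

```stata
* Hypothetical simulated panel (values invented for illustration)
clear
set seed 2024
set obs 300
generate idpers = _n
expand 6
generate income = 3000 + 1000*rnormal()
generate overweight = runiform() < 0.20 + 0.00003*income

regress overweight income, vce(cluster idpers)

* Predicted probability of being overweight at selected income levels
margins, at(income = (1000(1000)5000))
marginsplot

* Coefficient plot; coefplot is community-contributed (ssc install coefplot)
capture which coefplot
if _rc == 0 {
    coefplot, drop(_cons) xline(0)
}
else {
    display "coefplot not installed; run: ssc install coefplot"
}
```

With real data, the controls in `y' would be added to the regress command; margins then averages the predictions over their observed values, and coefplot's keep() option can restrict the plot to the income coefficient.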
    Best wishes

    (Stata 16.1 MP)



    • #3
      Okay, thanks, that helped me in any case. I will consider both estimation methods. Do you have any ideas on how I can diagnose my results, starting with my LPM?
      My model consists of the following variables:

      local y " household income age working hoursPw ownkid activePw food ib(freq).sex ib(freq).education ib(freq).civsta ib(freq).health ib(freq).illnes ib(freq).smoke"

      regress Overweight Income `y', cluster(idpers)

      where the local `y' holds my control variables.

      I was thinking of a residual analysis in the form of F-tests and residual plots. Would other or additional tests be more suitable?
      Furthermore, I might want to perform a cluster analysis to identify segments on which to run further regressions. Would that be a useful approach?
      I was also thinking of doing a mediation analysis in which I analyze direct and indirect effects. Do you have any ideas on the best way to do this?



      • #4
        I am not aware of any special test or diagnostic specific to an LPM; I would stick to the basic OLS regression diagnostics that any good textbook covers (e.g., Kohler/Kreuter: Data Analysis Using Stata).
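Two basic checks are particularly relevant for an LPM: whether fitted values fall outside [0, 1], and how the residuals behave against the fitted values (the LPM error variance is p(1 - p), so it is heteroskedastic by construction, which is one reason to use robust or clustered standard errors). A minimal sketch on simulated, hypothetical data (names follow the thread; values are invented):

```stata
* Hypothetical simulated panel (values invented for illustration)
clear
set seed 2024
set obs 300
generate idpers = _n
expand 6
generate income = 3000 + 1000*rnormal()
generate overweight = runiform() < 0.20 + 0.00003*income

regress overweight income, vce(cluster idpers)

* 1. Fitted values: an LPM is not bounded to the [0, 1] interval
predict phat, xb
count if phat < 0 | phat > 1
summarize phat

* 2. Residuals against fitted values
predict resid, residuals
scatter resid phat, yline(0)
```

Because the outcome is binary, the residuals lie on two parallel lines (one for each value of overweight), so this plot is less informative than for a continuous outcome; the count of out-of-range fitted values is often the more useful check.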

        Regarding your model, I am a bit confused: income appears twice, once explicitly and again in your local y. Is this intended?

        Regarding the other analysis approaches, this really depends on your research questions and what you want to find out. A mediation analysis is possible but why would you do it? Is this your primary research goal?
        Best wishes




        • #5
          No, I use household income as a control variable and estimate the effect of personal income at the individual level.
          My research objective is to show how income affects the likelihood of being overweight. Since income is likely endogenous in this area of research, other models would be more appropriate, but given the scope of my work, I lean towards simple regressions.
          I thought I could reduce/control the bias caused by endogeneity through mediation analysis.



          • #6
            Mediation analysis won't help you with this. If you have panel data available, FE regression models (xtreg) are in my opinion the best option you have to approximate causal effects for your research question. You might want to study the literature on comparable research projects and check the methods used there.
            Best wishes


