  • Linear probability model

    Hello everyone,

    I'm new to Stata and I'm trying to run a linear probability model with two fixed effects. My data are panel data, and I found a lot of topics saying I can use xtreg, reghdfe, or glm. Which one is best? Is it possible to use reghdfe? I found it the easiest way to absorb the fixed effects, since my dataset has millions of observations and reghdfe is the fastest. My code is as follows:

    Code:
    eststo: reghdfe Y l.X, a(A B) vce(robust)
    If it is correct, do I interpret it the same way as a normal regression? My dependent variable is a dummy and my independent variable is the log of a continuous variable.

    Thank you all for your time
    Last edited by Jake Naismith; 20 Jul 2020, 15:14.

  • #2
    With linear regression, you are modeling the conditional mean of Y. If Y can only take the values 0 and 1, then the mean is the proportion of 1s. The mean is the sum of the values divided by the number of values. If you add 0 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 1 + ..., then you are counting the number of 1s. If you divide the number of 1s by the number of values, you get the proportion of 1s. So a linear regression on a binary dependent variable can be interpreted as a model explaining the proportion of 1s.
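
    A minimal sketch of this point in Stata, using a made-up ten-observation variable (the name y and the data are assumptions for illustration only). The mean of a 0/1 variable, the share of 1s, and the constant from a regression with no covariates should all coincide:

    ```stata
    clear
    set obs 10
    * hypothetical binary outcome: 1 in rows 2, 3, 5 and 9, otherwise 0
    generate byte y = inlist(_n, 2, 3, 5, 9)

    * mean of y
    summarize y

    * number of 1s divided by the number of values
    count if y == 1
    display r(N) / _N

    * constant-only linear regression: the constant is the mean of y
    regress y
    display _b[_cons]
    ```

    Here 4 of the 10 values are 1, so all three numbers are 0.4.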

    Beware: many datasets use the coding 1, 2 instead of 0, 1, so you may need to recode before estimating the model. Recoding changes the constant but not the slope coefficients. If you leave the 1, 2 coding in place, it will bite when you later use something like margins to plot predicted values, because the predictions are then shifted up by 1 and can no longer be read as proportions. Even if Y is coded 0, 1, you may want to flip the categories depending on your research question. Flipping does not change the model (it only reverses the signs of the slope coefficients), but it could make the interpretation easier.
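
    As a rough illustration of the recoding point, the sketch below uses the auto dataset that ships with Stata. The variable foreign is already coded 0/1 there, so a 1/2 version (foreign12, a hypothetical name) is created only to mimic the problem:

    ```stata
    sysuse auto, clear
    * mimic a 1/2 coding of a binary variable (foreign12 is made up for this example)
    generate byte foreign12 = foreign + 1

    * recode back to 0/1 before estimating
    generate byte y = foreign12 - 1 if !missing(foreign12)

    * same slope in both regressions; the constants differ by exactly 1
    regress foreign12 mpg
    regress y mpg

    * flipping the categories reverses the sign of the slope
    generate byte domestic = 1 - y
    regress domestic mpg
    ```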
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Thank you Maarten. So, is it correct to interpret the results as follows: a 100% change in X generates a 100*β2 percentage point change in the probability of Y?

      Thanks again

      • #4
        The correct interpretation is: "If we increase x by one percent, we expect y to increase by (β/100) units of y."

        • #5
          To follow up on Chris's helpful comment: since the LPM estimates a probability, the "units" of y are really best viewed as the units of the mean function in this case. The mean function is the probability of a "success" (y = 1).

          Here is how I often report the effect. If b is the coefficient on log(X), then b/10 is the change in P(y = 1|X,Z) when X increases by 10 percent (holding Z fixed). The 10 percent figure is an approximation: it corresponds to log(X) increasing by 0.10. For example, if b = 0.75 and X increases by 10 percent, P(y = 1|X,Z) is estimated to increase by 0.075, or 7.5 percentage points.
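
          A quick check of this arithmetic in Stata, taking b = 0.75 from the example as given (the scalar name b is only for illustration). The second display uses the exact change in log(X), ln(1.10), to show how close the 10 percent approximation is:

          ```stata
          scalar b = 0.75
          * approximation: treat a 10 percent increase in X as log(X) rising by 0.10
          display b * 0.10
          * exact: a 10 percent increase in X raises log(X) by ln(1.10)
          display b * ln(1.10)
          ```

          The exact figure is about 0.0715, slightly below the approximate 0.075; the gap widens for larger percentage changes in X.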

          JW

          • #6
            Thank you Chris and Jeff, this is really helpful. I really appreciate it.
