Linear probability model

Jake Naismith

Join Date: Jul 2020

Posts: 18
#1

Linear probability model

20 Jul 2020, 14:15

Hello everyone,

I'm new to Stata and I'm trying to run a linear probability model with two fixed effects. My data are panel data, and I found a lot of topics saying I can use xtreg, reghdfe or glm. Which one is best? Is it possible to use reghdfe? I found it the easiest way to absorb the fixed effects, since my dataset has millions of observations and reghdfe is the fastest. My code is as follows:

Code:

eststo: reghdfe Y l.X, a(A B) vce(robust)

If it is correct, do I interpret it the same way as a normal regression? My dependent variable is a dummy and my independent variable is the log of a continuous variable.

Thank you all for your time

Last edited by Jake Naismith; 20 Jul 2020, 15:14.
Tags: GLM, linear probability model, reghdfe, xtreg
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#2

21 Jul 2020, 01:22

With linear regression, you are modeling the conditional mean of Y. If Y can only take the values 0 and 1, then the mean is the proportion of 1s. The mean is the sum of the values divided by the number of values. If you add 0 + 1 + 1 + 0 + 1 + 0 + 0 + 0 + 1 + ..., then you are counting the number of 1s. If you divide the number of 1s by the number of values, you get the proportion of 1s. So a linear regression on a binary dependent variable can be interpreted as a model explaining the proportion of 1s.
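A minimal sketch of that point, using the auto dataset that ships with Stata rather than the original poster's data (the variables foreign and weight are just stand-ins): the mean of a 0/1 variable equals the constant of a constant-only regression, and with a regressor the fitted values are modeled proportions of 1s.

```stata
* Illustration only: auto data, not the poster's panel
sysuse auto, clear

* foreign is coded 0/1, so its mean is the share of foreign cars
summarize foreign
display "mean of foreign = " r(mean)

* A constant-only regression reproduces that share as _cons
regress foreign
display "constant        = " _b[_cons]

* With a regressor, the fitted values are modeled proportions of 1s
regress foreign weight, vce(robust)
predict phat, xb
summarize phat
```

The last summarize is worth a look in any linear probability model, because nothing constrains the fitted values to stay between 0 and 1.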

Beware: many datasets use the coding 1, 2 instead of 0, 1, so you may need to recode before estimating the model. The 1, 2 coding changes the constant but not the coefficients, yet it will bite when you later use something like margins to plot predicted values. Even if Y is coded 0, 1, you may want to flip the categories depending on your research question. The latter does not change the model, but could make the interpretation easier.
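A short sketch of the recoding step, again on the auto data. The variables y12, y01 and y_flip are hypothetical names created here only to mimic a 1/2-coded outcome.

```stata
sysuse auto, clear

* Build a hypothetical 1/2-coded copy of the outcome to mimic such a dataset
generate byte y12 = foreign + 1

* Check the coding before estimating anything
tabulate y12

* Recode 1/2 to 0/1 and verify the result
recode y12 (1 = 0) (2 = 1), generate(y01)
assert inlist(y01, 0, 1)

* Same slope in both regressions; the constants differ by exactly 1
regress y12 weight
regress y01 weight

* Flipping the categories reverses the sign of the slope, nothing else
generate byte y_flip = 1 - y01
regress y_flip weight
```

Comparing the three coefficient tables shows that the coding choice moves the constant or the sign, not the fit of the model.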

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
2 likes
Jake Naismith

Join Date: Jul 2020

Posts: 18
#3

22 Jul 2020, 11:27

Thank you Maarten. So, is it correct to interpret the results as follows: a 100% change in X generates a 100*β2 percentage point change in the probability of Y?

Thanks again
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#4

22 Jul 2020, 17:38

The correct interpretation is: "If we increase x by one percent, we expect y to increase by (β/100) units of y."
1 like
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#5

23 Jul 2020, 07:59

To follow up on Chris's helpful comment: since the LPM estimates a probability, the "units" of y are really best viewed as the units of the mean function in this case. The mean function is the probability of a "success" (y = 1).

Here is how I often report the effect. If b is the coefficient on log(X), then b/10 is the change in P(y = 1|X,Z) when X increases by 10 percent (holding Z fixed). The 10 percent change in X is an approximation: it corresponds to log(X) increasing by 0.10. For example, if b = 0.75 and X increases by 10 percent, P(y = 1|X,Z) is estimated to increase by 0.075, or 7.5 percentage points.
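One way to get that number with a standard error is lincom after the regression. The sketch below uses the auto data as a stand-in, with lnprice playing the role of log(X) and weight the role of Z; these variables are assumptions for illustration, not the poster's model. The same post-estimation lines should carry over after reghdfe, but that is not run here.

```stata
sysuse auto, clear
generate lnprice = ln(price)

* LPM: binary outcome on a logged regressor plus one control (standing in for Z)
regress foreign lnprice weight, vce(robust)

* Approximate change in P(y = 1) for a 10 percent increase in price: b * 0.10
lincom 0.10*lnprice

* Same effect without the approximation: a 10 percent increase raises ln(price) by ln(1.10)
display _b[lnprice]*ln(1.10)

* Change in P(y = 1) for a 1 percent increase in price: b / 100
display _b[lnprice]/100
```

Since ln(1.10) is about 0.0953, the approximate and exact figures differ by roughly 5 percent of the effect, which is usually small relative to the standard error.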

JW
1 like
Jake Naismith

Join Date: Jul 2020

Posts: 18
#6

23 Jul 2020, 11:30

Thank you Chris and Jeff, this is really helpful. I really appreciate it.