  • Logistic regression vs linear probability model

    Dear community,

    For my research I'm trying to identify the determinants of a buyout using 5 variables. I read online that a logistic regression is better than a linear probability model when the dependent variable is binary. In my studies I have mostly used OLS regressions and have only briefly covered logit regressions. Therefore I have some points that I'm not sure about:
    1) I'm also looking at different sectors, so I'm running a separate regression for each of 6 sectors. Due to data availability, the count of the least frequent outcome of the dependent variable is quite small in every sector (5 in one sector, and between 10 and 30 in the others). For which kind of regression is this a bigger problem?
    2) I read online that logistic regression assumes the independent variables are linearly related to the log-odds, and that this can be tested with the Box-Tidwell test. This test requires the logarithm of each variable, and I have one variable that takes both positive and negative values. Since it is not possible to take the logarithm of a negative value, I'm not sure how to perform this test.
    3) Is it possible to include binary variables as independent variables in a logistic regression?

    Kind regards,
    Joris

  • #2
    Joris:
    1) you may want to fit a single -logit- with -i.sector- as an independent variable;
    2) read Adrian Keister's comment in Logistic regression check linearity assumption in R - Cross Validated (stackexchange.com);
    3) yes, you can include binary variables as independent variables in a logistic regression.
    Kind regards,
    Carlo
    (Stata 19.0)
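
    For point 2), one way to probe linearity in the log-odds without taking a logarithm can be sketched as follows. This is a minimal sketch on simulated data, not the procedure from the linked comment: the names buyout and x1 are hypothetical stand-ins, and the squared-term check and -linktest- are swapped in as alternatives to Box-Tidwell because they work for a variable that takes negative values.

    ```stata
    * Simulated stand-in data (hypothetical names: buyout, x1)
    clear
    set seed 2024
    set obs 1000
    generate x1 = rnormal()           // takes negative and positive values
    generate buyout = runiform() < invlogit(-1.5 + 0.8*x1)

    * Add a squared term; a clearly non-zero coefficient suggests
    * curvature in the log-odds
    logit buyout c.x1 c.x1#c.x1
    test c.x1#c.x1

    * Link test after the linear specification: a significant _hatsq
    * hints at misspecification
    logit buyout x1
    linktest
    ```

    With very few events per sector, either check will have little power, so a non-significant result is weak evidence of linearity.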


    • #3
      1. Unless you are predicting the outcome, a linear probability model (OLS) will usually give you nearly the same average effects as logit. You can check this by fitting the model with OLS, then with -logit- followed by -margins-.
      2. It's not good to have the mean of the dependent variable be very small (less than 0.05) and the count of successes be very small (if that's what you mean in 1).
      3. Don't estimate separately by sector. As Carlo suggests, include i.sector as a fixed effect. If you want different coefficients by sector, you can specify i.sector#(c.x1 c.x2 c.x3 c.x4 c.x5) i.sector.
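
      The three points above can be sketched in Stata roughly as follows. This is an illustration on simulated data under stated assumptions: buyout, x1 to x5 and sector are hypothetical names, and the data-generating coefficients are arbitrary.

      ```stata
      * Simulated stand-in data (hypothetical names and coefficients)
      clear
      set seed 12345
      set obs 1200
      generate sector = ceil(6*runiform())
      forvalues j = 1/5 {
          generate x`j' = rnormal()
      }
      generate buyout = runiform() < invlogit(-2 + 0.5*x1 - 0.3*x2 + 0.2*(sector==3))

      * Linear probability model with sector fixed effects and robust SEs
      regress buyout x1 x2 x3 x4 x5 i.sector, vce(robust)

      * Pooled logit, then average marginal effects on the probability
      * scale, to compare with the OLS coefficients above
      logit buyout x1 x2 x3 x4 x5 i.sector
      margins, dydx(x1 x2 x3 x4 x5)

      * Sector-specific slopes through interactions
      logit buyout i.sector i.sector#c.(x1 x2 x3 x4 x5)
      margins sector, dydx(x1)
      ```

      The last -margins- call reports the average marginal effect of x1 separately for each sector, which is the pooled counterpart of running one regression per sector.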
