Probit or Logit when there are concerns about non-linearity?

Theo Pollen

Join Date: Sep 2024

Posts: 5
#1

08 Oct 2024, 10:02

Hello everyone,

I'm working with a binary outcome variable and have encountered an interesting issue regarding model specification using both logit and probit regression models in Stata. I have a specific predictor variable that I've found significant in both models. Initially, the link test was significant for both the logit and probit models, indicating potential misspecification.

When I include a polynomial term of this predictor in the probit model, the link test becomes insignificant, suggesting that the non-linearity may be adequately captured. However, in the logit model, the link test remains significant even after adding the polynomial term. Interestingly, when I include only the polynomial term in the logit model (excluding the base term), the sign of its coefficient flips to positive, whereas when both the polynomial and the base term are included, both are negative. Additionally, the probit and logit adjusted R2 values are very comparable: the lowest R2 comes from the model with only the polynomial, followed by the model with only the base term, and the highest R2 comes from the model with both the base term and the polynomial. The predictor variable also appears in an interaction with another continuous variable, VIF shows no problem with multicollinearity, and all continuous variables are centered where relevant. Furthermore, I have tried many different specifications, including polynomials of other variables and interaction terms with and between other variables, but the source of the non-linearity is clearly coming from the variable I'm discussing.

Given this context, how should I interpret the significance of the link test in both models? Why might the probit model capture the relationship better with the polynomial while the logit model does not? Also, is a 10% significance level too lenient for model evaluation in this scenario? Any insights or recommendations on how to proceed, which model to select or any other relevant matters would be greatly appreciated!
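For readers unfamiliar with the mechanics, the idea behind a link test can be sketched roughly as follows: fit the model, take the fitted linear index (Stata calls it `_hat`) and its square (`_hatsq`), refit the outcome on those two terms, and look at whether the squared term is significant. The sketch below is an illustration in plain Python, not Stata's `linktest` itself. The simulated data, the coefficient values, the sample size, and the helper names `fit_logit` and `linktest_z` are all assumptions made for the example, and it covers only the logit case.

```python
import math
import random

def sigmoid(t):
    # Clip the index so math.exp cannot overflow on a large Newton step.
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logit(X, y, iters=12):
    """Hypothetical helper: Newton-Raphson logit fit.
    Returns (coefficients, approximate standard errors)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors from the information matrix of the last iteration.
    se = []
    for a in range(k):
        unit = [1.0 if j == a else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(info, unit)[a]))
    return beta, se

def linktest_z(X, y):
    """Hypothetical helper: z-statistic on the squared fitted index."""
    beta, _ = fit_logit(X, y)
    hat = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    Z = [[1.0, h, h * h] for h in hat]  # constant, _hat, _hatsq
    coef, se = fit_logit(Z, y)
    return coef[2] / se[2]

# Simulated data (an assumption): the true index is quadratic in x.
random.seed(1)
n = 2000
x = [random.uniform(-2.0, 2.0) for _ in range(n)]
y = [1 if random.random() < sigmoid(0.5 + xi - 0.8 * xi * xi) else 0 for xi in x]

z_linear = linktest_z([[1.0, xi] for xi in x], y)          # omits x squared
z_quad = linktest_z([[1.0, xi, xi * xi] for xi in x], y)   # includes x squared

print("z on _hatsq, linear specification:   ", round(z_linear, 2))
print("z on _hatsq, quadratic specification:", round(z_quad, 2))
```

In this simulated setup the squared index should come out strongly significant when the quadratic term is left out and much weaker once it is included. A significant `_hatsq` only says that something about the specification is off; by itself it does not say whether the problem is the functional form of one predictor, an omitted variable, or the link function.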

Thank you!
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#2

08 Oct 2024, 11:49

Is the variable you're considering always positive? If so, you may want to use the natural log to capture the nonlinearity in the latent variable mean. Having said that, both logit and probit should give similar results, as you are finding. The only difference between them, from what you say, seems to be in the link test with the quadratic (I'm assuming that's what you mean by polynomial) specification. Hence the suggestion about the natural log, to see how that works. I never pay too much attention to the pseudo R-squared. Furthermore, are the coefficients on both terms of the quadratic variable significant? You cannot drop variables from a specification unless a test supports that decision.
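To illustrate why logit and probit tend to give similar results, the short Python sketch below compares the standard normal CDF (the probit link) with the logistic CDF (the logit link) after rescaling the index. The scaling constant of about 1.7 is a commonly quoted rule of thumb and is treated here as an assumption, as are the grid and the cutoff at -3 used for the tail comparison.

```python
import math

def logistic_cdf(t):
    # Inverse logit link.
    return 1.0 / (1.0 + math.exp(-t))

def normal_cdf(t):
    # Inverse probit link, written with the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

SCALE = 1.702  # assumed rescaling constant between the probit and logit index

# Largest difference in predicted probability over a grid of index values.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(normal_cdf(t) - logistic_cdf(SCALE * t)) for t in grid)

# Relative difference far out in the left tail.
tail_ratio = logistic_cdf(SCALE * -3.0) / normal_cdf(-3.0)

print(f"largest gap in predicted probability: {max_gap:.4f}")
print(f"logistic / normal tail probability at t = -3: {tail_ratio:.1f}")
```

The two curves stay within about one percentage point of each other across the whole grid, which is why coefficients, fit statistics, and predicted probabilities usually look alike. The logistic distribution has heavier tails, so any differences between the two models tend to show up for observations with predicted probabilities near 0 or 1.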

As a side comment, sign switching is usually a sign of omitted variable bias, so it could be that the link test is capturing that misspecification, i.e. that you have omitted a relevant variable, rather than that the variable has a non-linear relationship with the latent variable.

Alfonso Sanchez-Penalver
George Ford

Join Date: Aug 2014

Posts: 3040
#3

08 Oct 2024, 13:51

use probit.

https://www.stata.com/manuals/rlinktest.pdf
Theo Pollen

Join Date: Sep 2024

Posts: 5
#4

09 Oct 2024, 05:13

Originally posted by Alfonso Sánchez-Peñalver (#2)

Thank you for the reply. I'm using an index variable, which (I thought) cannot validly be log-transformed, unfortunately. I will include the quadratic term, but I was wondering whether the difference in the link test was due to probit not being able to capture the non-linearity as well, or due to probit providing a better fit?
Theo Pollen

Join Date: Sep 2024

Posts: 5
#5

09 Oct 2024, 05:15

Originally posted by George Ford (#3)

Thank you for your reply. If you have the time, could you please explain why probit is preferred? I don't really understand what the linked document says, or how it relates to the choice between probit and logit.
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#6

09 Oct 2024, 11:29

Originally posted by Theo Pollen (#4)

As George Ford indicated, the explanation is in the documentation. Read it through, and you will see what he's referring to.

Alfonso Sanchez-Penalver