What kind of a binary regression model should I choose if I work with pool or panel data?

Artem Abduramanov

Join Date: Feb 2020

Posts: 5
#1

29 Feb 2020, 08:38

Hello!

I have a longitudinal dataset with observations on individuals from 2008 to 2017. I originally wanted to examine a single year, but one year contains too few observations, so I have decided to pool all the years. However, I am now unsure which kind of regression to use to estimate the relationship between the dependent and independent variables. The variable types are listed below:
Response variable: dummy
Variable of interest: dummy
Control variables: different ones (dummy, float, etc.)

1) Could you please recommend what kind of model to choose?
2) May I simply use a probit regression on the pooled data and ignore the year dimension (as I would with cross-sectional data)?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#2

29 Feb 2020, 13:58

Artem:
welcome to this forum.
If you have a panel dataset (by the way: a binomial regressand calls for -xtlogit-), why focus on one year only and throw away tons of potentially precious information?

Kind regards,
Carlo
(StataNow 19.0)
Artem Abduramanov

Join Date: Feb 2020

Posts: 5
#3

29 Feb 2020, 14:30

Dear Carlo Lazzaro,

Thank you for your reply!

There was no need to use the panel structure, as I did not have to take longitudinal effects into account. However, I am now going to work with the version that covers the whole 10-year period. What would you suggest in this case? As I understand it, you recommend logit rather than probit regression. What are the reasons?

Kind regards,
Artem
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#4

29 Feb 2020, 15:38

Artem:
-probit- and -logit- give back similar coefficients, but different standard errors. The choice between them is often driven by research field traditions and preferences. That said, in your case I would go with -xtlogit-.

Kind regards,
Carlo
(StataNow 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

01 Mar 2020, 20:34

A few comments.

1. There is nothing wrong with using pooled probit or logit for a panel data set. The use of xtprobit or xtlogit imposes a very strong conditional independence assumption. It is analogous to assuming there is no serial correlation in a linear panel data model, in which case we would never cluster our standard errors in the linear case. Since we almost always cluster our standard errors in the linear case, it is odd to suggest nonlinear procedures that assume away a feature that we see regularly in linear models.
2. Related to point (1), xtprobit and xtlogit are actually inconsistent if there is serial correlation. So the situation is much more dire than in the linear case, where serial correlation only affects inference (and efficiency).
3. As I discuss in my MIT Press book, pooled probit and logit require no restrictions on serial dependence. Consistency holds, and one must simply cluster standard errors.
4. A pooled version of correlated random effects is trivial to implement. One computes the time averages, as in the linear case considered by Mundlak, and adds them to the pooled probit estimation. One can test significance of the time averages as a kind of robust Hausman test. If the test rejects, the time averages should be left in.
5. If in (4) we use linear regression instead of probit, we would actually obtain the FE estimator. So the pooled method is definitely okay in the linear case, which suggests it is fine in the nonlinear case, too.
6. A correction to Carlo: probit and logit, whether pooled or using joint MLE (xtprobit, xtlogit), will not give similar coefficients. In fact, the logit coefficients will be uniformly larger in magnitude due to the implicit scaling. However, the average partial (marginal) effects are often very similar. The statistical significance of the APEs also tends to be similar across the two methods.
7. I would estimate a linear probability model by FE to compare with the Chamberlain-Mundlak CRE probit that I described in (4).
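
Points (3), (4), and (7) can be sketched in Stata roughly as follows. This is only an illustration under assumptions: the variable names y (binary outcome), x (dummy of interest), c1 and c2 (controls), id (individual identifier), and year are hypothetical placeholders, not names from the original dataset.

```stata
* Hypothetical names: y, x, c1, c2, id, year (replace with your own).

* (3) Pooled probit with year dummies; clustering by individual
*     allows arbitrary serial dependence within each person.
probit y i.x c1 c2 i.year, vce(cluster id)

* (4) Mundlak device: time averages of the time-varying covariates.
*     In an unbalanced panel, compute them over the estimation sample.
foreach v of varlist x c1 c2 {
    bysort id: egen double `v'_bar = mean(`v')
}

* Pooled CRE probit: add the time averages to the pooled probit.
probit y i.x c1 c2 x_bar c1_bar c2_bar i.year, vce(cluster id)

* Cluster-robust Wald test of the time averages
* (the "robust Hausman test" mentioned in point 4).
test x_bar c1_bar c2_bar

* Average partial effect of the variable of interest.
margins, dydx(x)

* (7) Linear probability model by fixed effects, for comparison.
xtset id year
xtreg y i.x c1 c2 i.year, fe vce(cluster id)
```

Any control that does not vary over time within an individual equals its own time average, so Stata will drop one of the two as collinear in the CRE probit; such variables are also absorbed by the fixed effects in the last command.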

JW
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#6

02 Mar 2020, 00:36

Jeff:
thanks for correcting me.

Kind regards,
Carlo
(StataNow 19.0)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17672
#7

02 Mar 2020, 11:35

Artem:
Jeff was obviously right in correcting me.
I should have written that the z-values of -logit- and -probit- are similar, whereas the coefficients of -logit-, as Jeff highlighted, are larger than those from -probit-.
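
One way to see this on your own data is to fit both models on the same sample and compare the raw coefficients, the z-values, and the average partial effects. The sketch below again assumes hypothetical variable names (y, x, c1, c2, id, year). As a rough rule of thumb, logit coefficients tend to be about 1.6 times the corresponding probit coefficients, while the APEs reported by -margins- are usually close.

```stata
* Hypothetical names: y, x, c1, c2, id, year (replace with your own).

* Pooled probit: note the coefficient and z-value on x,
* then the average partial effect.
probit y i.x c1 c2 i.year, vce(cluster id)
margins, dydx(x)

* Pooled logit on the same specification: the coefficient on x is on
* a different scale, but its z-value and the APE should be comparable.
logit y i.x c1 c2 i.year, vce(cluster id)
margins, dydx(x)
```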

Kind regards,
Carlo
(StataNow 19.0)