Pooled probit or Fixed Effects probit

Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#1


24 Mar 2024, 11:37

Dear professors,
Before proceeding with the coding, I have the following theoretical question:
I have a longitudinal dataset in which each country i is observed at different points in time. My dependent variable is a dummy, and I want to estimate the probability of success, say y_it = 1, conditional on a set of predictors. I also add a full set of time dummies to the equation. The problem is that for the majority of high-income countries in the sample the dependent variable always takes the value 0. In that case, the country dummy perfectly predicts the outcome variable. What happens if I use country fixed effects? Is it reasonable to use region dummies instead?
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#2

24 Mar 2024, 13:46

Unconditional fixed effects (FE) probit suffers from the incidental parameters problem (see https://gburtch.github.io/posts/2021/03/logit-ipp/ for a description). While there is no conditional FE probit estimator available, a conditional FE logit estimator exists where the fixed effects are conditioned out of the likelihood. See

Code:

help xtlogit
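To make the conditioning argument concrete, here is a sketch of the standard conditional FE logit likelihood. The notation is assumed for illustration: alpha_i is the country effect, t_t the time effects, and T_i the number of periods observed for country i. Conditioning on the number of successes n_i removes alpha_i, because the term n_i alpha_i is identical in the numerator and in every term of the denominator:

$$
P\left(y_{i1},\dots,y_{iT_i} \,\middle|\, x_i, \alpha_i, \textstyle\sum_t y_{it} = n_i\right)
= \frac{\exp\left(\sum_{t} y_{it}\,(x_{it} b + t_t)\right)}{\sum_{d \in B_i} \exp\left(\sum_{t} d_{t}\,(x_{it} b + t_t)\right)},
\qquad B_i = \Big\{ d \in \{0,1\}^{T_i} : \textstyle\sum_t d_t = n_i \Big\}
$$

$$
n_i = 0 \;\Rightarrow\; B_i = \{(0,\dots,0)\} \;\Rightarrow\; P(\cdot) = \frac{e^{0}}{e^{0}} = 1 \;\Rightarrow\; \log P(\cdot) = 0
$$

The same holds when n_i = T_i. A country whose outcome never changes therefore contributes exactly zero to the conditional log-likelihood, so removing it leaves the estimates unchanged.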

FE estimation requires that your variables vary within units (countries). If there is sufficient variation in a number of countries, then these will provide informative data for the likelihood calculation. You can disregard countries that exhibit no variation as they are uninformative. However, if you find that you are losing a significant portion of the sample, you may want to consider correlated random effects (Mundlak) estimation. For more details, refer to https://conference.iza.org/conferenc...nonlin_iza.pdf.
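A sketch of one common formulation of the correlated random effects (Mundlak) probit, with notation assumed for illustration (x̄_i is the mean of x_it over the periods observed for country i). The country effect is modelled as a function of x̄_i instead of being estimated as a separate dummy, so countries whose outcome is always zero remain in the estimation sample:

$$
\alpha_i = \psi + \bar{x}_i \xi + a_i, \qquad a_i \mid x_i \sim N(0, \sigma_a^2), \qquad \bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}
$$

$$
P(y_{it} = 1 \mid x_i, a_i) = \Phi\left(x_{it} b + t_t + \psi + \bar{x}_i \xi + a_i\right)
\quad\Rightarrow\quad
P(y_{it} = 1 \mid x_i) = \Phi\left(\frac{x_{it} b + t_t + \psi + \bar{x}_i \xi}{\sqrt{1 + \sigma_a^2}}\right)
$$

The first form can be fit as a random effects probit (in Stata, xtprobit with the re option) after adding the country means as regressors. A pooled probit on x_it, x̄_i and the time dummies instead estimates the scaled coefficients b / sqrt(1 + sigma_a^2).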
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#3

24 Mar 2024, 14:06

Dear Professor Musau, thank you very much for your prompt response.
My idea is to estimate the following pooled probit model:
P(y_it=1| x_it , t_t , region_r) = Phi (x_it b + t_t + region_r )
where x_it is a vector of predictors that vary across countries i and time t. Rather than controlling for time-invariant unobserved heterogeneity across countries i, I control for time-invariant heterogeneity across regions r (Americas, Europe, Africa, ...). I do this because for a substantial number of countries the dependent variable is always zero: for instance, it is always zero for Germany, the same holds for France, and so on.
This idea comes from the following statement I read in an article:
"An important disadvantage of a fixed effects model is the effect it has on the sample size. That is, countries for which the outcome variable is always zero are excluded from the sample, because the country dummy will perfectly predict the outcome variable in this case."
It is not entirely clear to me why countries for which the outcome variable is always zero are excluded from the sample. Could you kindly provide some intuition, so that I can decide which estimator is more appropriate before coding in Stata?
Thanks

Last edited by Frank Giaquinto; 24 Mar 2024, 14:10.
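For comparison, a sketch of the log-likelihood maximized by the pooled probit with region dummies described in this post, where r(i) (notation assumed for illustration) denotes the region of country i:

$$
\ell = \sum_{i}\sum_{t} \Big[ y_{it} \log \Phi(\eta_{it}) + (1 - y_{it}) \log\big(1 - \Phi(\eta_{it})\big) \Big],
\qquad \eta_{it} = x_{it} b + t_t + \text{region}_{r(i)}
$$

A region effect is pushed toward minus infinity only if y_it = 0 for every country in region r and every period. An all-zero country such as Germany is therefore retained as long as at least one other country in its region has y_it = 1 in some period, and its observations still help identify b.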
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#4

24 Mar 2024, 14:21

Originally posted by Frank Giaquinto

This idea comes from the following statement I read in an article:
"An important disadvantage of a fixed effects model is the effect it has on the sample size. That is, countries for which the outcome variable is always zero are excluded from the sample, because the country dummy will perfectly predict the outcome variable in this case."
It is not entirely clear to me why countries for which the outcome variable is always zero are excluded from the sample. Could you kindly provide some intuition, so that I can decide which estimator is more appropriate before coding in Stata?
Thanks

The authors make a case defending their use of pooled probit with region dummies, but this also comes at the cost of ignoring country-level unobserved heterogeneity. You cannot blame them as they need to "sell" their model to the reviewers. If you still have a large enough sample, I would still favor conditional FE logit over the authors' approach. Additionally, I would recommend CRE probit over the authors' model.
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#5

24 Mar 2024, 14:32

Thank you very much!
Frank Giaquinto

Join Date: Dec 2023

Posts: 30
#6

24 Mar 2024, 15:12

Dear Professor,
I have one final question to clarify any lingering theoretical doubts.
If I estimate:
P(y_it=1| x_it , t_t , country_i) = Phi (x_it b + t_t + country_i )
In that case, countries for which the outcome variable is always zero are excluded from the sample because the model is estimated by maximum likelihood, and the maximum likelihood estimate of the coefficient on a perfect predictor (the country dummy) is infinite, so the estimation cannot converge. Is that correct?
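A short derivation along the lines of this reasoning, for the unconditional FE probit above. Write alpha_i for the country effect country_i and eta_it = x_it b + t_t (notation assumed for illustration). For a country with y_it = 0 in every period, the log-likelihood contribution is strictly decreasing in alpha_i and approaches its supremum of 0 only as alpha_i goes to minus infinity:

$$
\ell_i(\alpha_i; b) = \sum_{t} \log\big(1 - \Phi(\eta_{it} + \alpha_i)\big), \qquad y_{it} = 0 \;\; \forall t
$$

$$
\frac{\partial \ell_i}{\partial \alpha_i} = -\sum_t \frac{\phi(\eta_{it} + \alpha_i)}{1 - \Phi(\eta_{it} + \alpha_i)} < 0,
\qquad \sup_{\alpha_i} \ell_i(\alpha_i; b) = \lim_{\alpha_i \to -\infty} \ell_i(\alpha_i; b) = 0
$$

No finite alpha_i maximizes the likelihood. Because the supremum equals 0 whatever the value of b, such a country carries no information about b once alpha_i is profiled out, which is why estimation software typically drops these observations instead of iterating toward alpha_i = minus infinity.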