
  • xtlogit, fe results in numeric overflow

    Hello all,

    I am trying to run a fixed effects regression with a binary dependent variable on an unbalanced panel of 173,000 observations and 301 groups. This results in the following error:

    xtset id

    . xtlogit Overdue30 SALES_REV_TURN CF_FREE_CASH_FLOW BS_TOT_ASSET PROF_MARGIN EBITDA RET
    > URN_ON_ASSET RETURN_ON_INV_CAPITAL CUR_RATIO CASH_RATIO TOT_DEBT_TO_TOT_ASSET TOT_DEBT
    > _TO_TOT_EQY SHORT_AND_LONG_TERM_DEBT Num_Execs age avg_board_tenure quarterend, fe

    note: multiple positive outcomes within groups encountered.
    note: 57 groups (825 obs) dropped because of all positive or
    all negative outcomes.
    note: Num_Execs omitted because of no within-group variance.
    note: avg_board_tenure omitted because of no within-group variance.
    5,971 (group size) take 727 (# positives) combinations results in numeric overflow;
    computations cannot proceed
    r(1400);

    So far I have tried clogit with the group(id) option, with the same result.

    I have also tried a regular logit regression including i.id. However, as I am using the AIC for model specification, I do not want these extra 300 dummy variables included; I imagine they will skew the AIC and result in a suboptimal model specification.

    What can I do to solve this?
    Any help would be much appreciated!

    Thanks,
    Kayleigh


  • #2
    The issue would only arise if you were comparing the AICs from -xtlogit, fe- and -logit i.id-. As long as the number of observations per identifier is large (e.g., >30), you can estimate all candidate models using unconditional fixed effects logit and directly compare their AIC values. Alternatively, 301 groups does not sound like too many, so you could check whether Stata MP is able to handle the estimation in case you are using a different flavor of Stata.
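    A minimal sketch of that comparison, with the covariate list shortened purely for illustration (-estat ic- reports the AIC after each fit):

    . * unconditional fixed effects logit: one indicator per id
    . logit Overdue30 SALES_REV_TURN CF_FREE_CASH_FLOW BS_TOT_ASSET i.id
    . estat ic

    . * refit each candidate specification the same way and compare the AIC column
    . logit Overdue30 SALES_REV_TURN CUR_RATIO CASH_RATIO i.id
    . estat ic

    As long as every candidate model is fit by the same (unconditional) method on the same estimation sample, the AIC comparison stays like-for-like.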
    Last edited by Andrew Musau; 23 Mar 2022, 13:35.



    • #3
      I think the problem is too massive for clogit or xtlogit, fe. The problem isn't the # of groups but the size of some groups. If I am reading the manual entry for clogit right:

      Ti = the number of records in the ith group. You average about 574 records per group, and at least one group has 5,971 records.

      k1i = the number of successes in the group, i.e. the number of times yi = 1. In the group the message complains about, k1i = 727.

      k2i = the number of failures in the group = Ti - k1i. For the group mentioned, k2i = 5,971 - 727 = 5,244.

      Quoting the clogit manual entry,

      If min(k1i, k2i) is small, computation time is not an issue. But if it is large—say, 100 or more—patience may be required.
      In your case, min(k1i, k2i) = 727, which is far greater than 100. It isn't that Stata takes forever to do the calculation, it can't do it, period. And, that is just one of your 301 groups. Others are probably problematic too.
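      For a rough sense of the scale, the natural log of that number of combinations can be checked directly; this is just a back-of-the-envelope calculation, not something clogit does internally:

      . * ln of "5,971 choose 727", via log-factorials
      . display lnfactorial(5971) - lnfactorial(727) - lnfactorial(5244)

      That should come out to roughly 2,200, i.e. a count with over 900 digits, which is far beyond what a double can represent.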

      I've never tried to estimate a fixed effects model with groups this big. I would try Andrew's approach of using -logit y x1 x2 x3 i.id- and see if that works. I've never tried something that big either, but maybe Stata can do it. Or, I suppose Stata/MP might be able to do it. If it does run rather than crash immediately, I suspect you will need superhuman patience to get your results.

      If by some chance you have Limdep, it claims that it can have up to 50,000 groups with its Unconditional Estimators http://www.limdep.com/features/capab...s_models_4.php. I can't vouch for it though. Limdep is a popular program but I'm generally very happy with Stata.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 18.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Hi all,

        Thank you for your input. Sadly, I am lacking a comfortable number of observations for a good chunk of the IDs here. I will try to aggregate observations to a quarterly basis and see whether this makes the groups small enough for Stata to cope. I will keep you updated and confirm whether this was successful.



        • #5
          Hello all,
          Just an update to confirm that aggregating the dataset by id and q, where q is a quarterly variable, has alleviated the problem and allowed me to run the regression. This is a good solution to the problem if you are comfortable using mean values for a period rather than individual observations.
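          For anyone who runs into the same wall, a rough sketch of this kind of aggregation (the statistics chosen in collapse and the shortened variable list are illustrative, not the exact code I ran):

          . * collapse to the id-quarter level; assumes q is a quarterly date variable
          . * (max) keeps the outcome binary, (mean) averages the covariates
          . collapse (max) Overdue30 (mean) SALES_REV_TURN CF_FREE_CASH_FLOW BS_TOT_ASSET, by(id q)
          . xtset id q
          . xtlogit Overdue30 SALES_REV_TURN CF_FREE_CASH_FLOW BS_TOT_ASSET, fe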

          Thanks for your inputs,
          Kayleigh

