Assistance Needed: Optimizing Logistic Regression Model for Unbalanced Panel Data in Stata

Michael Poitier

Join Date: Apr 2024
Posts: 1


04 Apr 2024, 09:38

Dear Stata-Community,

I am trying to set up a logistic regression model in Stata. My dataset is highly unbalanced, and the dependent variables are binary (0/1) firm-level variables. Because there may be fixed effects at both the nation and the year level, I would appreciate your advice on how to specify the model.

Here's a brief summary of my dataset:

NATION     YEAR   DEP. VAR. (0 or 1)   VAL. DEP. VAR.   IND. VAR.      VAL. IND. VAR.
Nation A   2009   Variable Dep         Value 1          Variable Ind   Value 1
Nation A   2011   Variable Dep         Value 2          Variable Ind   Value 2
Nation B   2009   Variable Dep         Value 3          Variable Ind   Value 3
Nation B   2010   Variable Dep         Value 4          Variable Ind   Value 4
Nation B   2011   Variable Dep         Value 5          Variable Ind   Value 5
Nation B   2012   Variable Dep         Value 6          Variable Ind   Value 6
Nation B   2013   Variable Dep         Value 7          Variable Ind   Value 7
Nation C   2012   Variable Dep         Value 8          Variable Ind   Value 8
Nation D   2011   Variable Dep         Value 9          Variable Ind   Value 9
Nation D   2012   Variable Dep         Value 10         Variable Ind   Value 10
Nation D   2013   Variable Dep         Value 11         Variable Ind   Value 11

Despite several attempts, I have not been able to settle on a suitable specification. The options I have tried include:

logit DEPVAR INDVAR, vce(cluster CommonIdentifier_NATION_YEAR)
logit DEPVAR INDVAR i.NATIONDUMMY i.YEARDUMMY
logit DEPVAR INDVAR i.NATIONDUMMY, vce(cluster YEAR)
logit DEPVAR INDVAR i.YEARDUMMY, vce(cluster NATION)
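A minimal sketch of these four specifications in runnable form, on a toy dataset that copies the nation-year cells of the table above with a hypothetical 50 firms per cell. DEPVAR and INDVAR are simulated here, and the names nation_id and nation_year are illustrative assumptions. Two syntax points apply in any case: Stata's vce() option needs the cluster keyword before the clustering variable, and variable names cannot contain a hyphen.

```stata
* Toy data (hypothetical): same nation-year cells as the table, 50 firms per cell
clear
set seed 12345
input str8 NATION int YEAR
"Nation A" 2009
"Nation A" 2011
"Nation B" 2009
"Nation B" 2010
"Nation B" 2011
"Nation B" 2012
"Nation B" 2013
"Nation C" 2012
"Nation D" 2011
"Nation D" 2012
"Nation D" 2013
end
expand 50
generate INDVAR = rnormal()
generate DEPVAR = runiform() < invlogit(-0.5 + 0.8*INDVAR)

* Factor variables and xtset need numeric codes, so encode the string nation
encode NATION, generate(nation_id)

* One identifier per nation-year cell, used as the cluster variable
egen nation_year = group(nation_id YEAR)

* (1) Cluster-robust standard errors by nation-year
logit DEPVAR INDVAR, vce(cluster nation_year)

* (2) Nation and year dummies through factor-variable notation
logit DEPVAR INDVAR i.nation_id i.YEAR

* (3) and (4): the cluster keyword goes in front of the clustering variable
logit DEPVAR INDVAR i.nation_id, vce(cluster YEAR)
logit DEPVAR INDVAR i.YEAR, vce(cluster nation_id)
```

With only four nations or five years, specifications (3) and (4) rest on very few clusters, and cluster-robust standard errors are generally unreliable in that situation.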

I also tried xtlogit, but declaring the panel structure with xtset failed with the following error:

xtset NATION YEAR
repeated time values within panel
r(451);
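The r(451) error means that at least one NATION-YEAR pair occurs in more than one row, which is what one would expect if each row is a firm. A minimal sketch that reproduces the error on toy data and shows two ways around it; the three-firms-per-cell layout and the identifiers nation_id, firm and firm_id are hypothetical, and the second option only makes sense if the real data contain a firm identifier (the post does not say whether they do).

```stata
* Toy data (hypothetical): three firms observed in each nation-year cell
clear
input str8 NATION int YEAR
"Nation A" 2009
"Nation A" 2011
"Nation B" 2009
end
expand 3
encode NATION, generate(nation_id)

* Count how many rows share each nation-year pair
duplicates report nation_id YEAR

* Reproduce the error without stopping the do-file
capture xtset nation_id YEAR
scalar xtset_rc = _rc
display xtset_rc    // 451, the same return code as in the post

* Option 1: declare only the panel variable, with no time variable
xtset nation_id

* Option 2: make firms the panels, so each firm-year appears exactly once
bysort nation_id YEAR: generate firm = _n
egen firm_id = group(nation_id firm)
xtset firm_id YEAR
```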

After declaring only the panel variable (xtset NATION), xtlogit with fixed effects failed as well:

xtlogit DEPVAR INDVAR, fe
note: multiple positive outcomes within groups encountered.
1,991 (group size) take 1,635 (# positives) combinations results in numeric overflow;
computations cannot proceed
r(1400);
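The "combinations" in this message refer to the number of ways of choosing which of the 1,991 observations in a group are the 1,635 positives, a count that enters the conditional (fe) logit likelihood. A back-of-the-envelope check of that count in Stata; this is a sketch that only restates the numbers in the error message and does not reproduce Stata's internal code.

```stata
* log10 of the number of ways to pick 1,635 positives out of 1,991 observations
display (lnfactorial(1991) - lnfactorial(1635) - lnfactorial(356)) / ln(10)

* log10 of the largest double Stata can store (about 308)
display log10(c(maxdouble))

* comb() returns missing here because the exact count is too large
display comb(1991, 1635)
```

The first number comes out at roughly 400, so the count is on the order of 10^400, far beyond what double precision can represent.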

I would be grateful for any guidance on how to specify the model so that I can draw sound conclusions from these data.

Thank you very much for your support.

Kind regards,

Michael


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

05 Apr 2024, 00:45

Michael:
the issue here seems to be that your dependent variable has too little within-panel variation.
Substantial variation in time-varying variables is one of the main conditions for the (conditional) -fe- estimator to work properly.
As an aside, using -panelid- dummies as predictors (as you would do to estimate -fe- with OLS) does not work with -logit-.
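A minimal sketch for inspecting how much the outcome and the regressor vary within each nation, using simulated data; the four nations, the 1,000 observations per nation and the 80% share of positive outcomes are hypothetical choices, not the poster's data.

```stata
* Toy data (hypothetical): 4 nations with 1,000 firm observations each
clear
set seed 2024
set obs 4000
generate nation_id = ceil(_n / 1000)
generate INDVAR = rnormal()
generate DEPVAR = runiform() < 0.8

xtset nation_id

* Overall, between and within percentages for each value of DEPVAR
xttab DEPVAR

* Overall, between and within standard deviations of the regressor
xtsum INDVAR

* Share of positive outcomes in each nation
tabulate nation_id DEPVAR, row
```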

Kind regards,
Carlo
(Stata 19.0)