Minimum number of observations for panel data regression

Stephen Zamore

Join Date: Nov 2016

Posts: 10
#1

16 Sep 2017, 08:18

Does anyone know whether there is a rule of thumb for the minimum number of observations in a panel data regression? If so, is there a source I can cite?
I am estimating a logit model on panel data for 18 firms observed over 9 years, giving a total of 162 observations.
However, one of the variables of interest has only 97 non-missing observations, which drastically reduces the sample for all models. In this case, I think a pooled logit model is appropriate, but a journal reviewer said we should investigate panel data techniques since we have panel data.

Best wishes, Stephen

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

16 Sep 2017, 08:41

Stephen:
see https://stats.stackexchange.com/ques...tic-regression.
That said, the reviewer is barking up the right tree in this instance, since:
- you have a limited panel dataset. However, that feature, per se, does not allow you to skip the consequences of a panel data structure (say, individual effects). Stata can help you out in this respect: if the LR test appearing at the foot of the -xtlogit- outcome table fails to reach statistical significance, you can go back to -logit- (with -cluster-ed standard errors, as your observations are not independent);
- however, the main issue with your dataset seems to rest on the missing values in one of the variables. I suppose that the reviewer advised you about that, too. In a scientific paper (especially in recent years) it is the submitters' duty to justify why they started their research with, say, 100 patients but only 80 made it into the reported regression.
Hence, I think you have to:
- identify the missingness mechanism of your data (MCAR, MAR, MNAR);
- usually, dealing with non-informative missing values (MCAR and mostly MAR) calls for multiple imputation, and Stata offers an ad hoc suite of commands for doing it. Conversely, there's no rationale for considering a pooled -logit- as a means to deal with missing data (as I gather from your post).

Kind regards,
Carlo
(Stata 19.0)
Stephen Zamore

Join Date: Nov 2016

Posts: 10
#3

16 Sep 2017, 09:34

Thank you, Carlo, for your comment. It is helpful, and I will consider it. First, the LR test is significant, so I need to explore panel techniques. As for dealing with missing values, are MCAR and MAR user-written packages that I need to install?
Stephen
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

16 Sep 2017, 09:38

Stephen:
sorry for the acronyms.
MCAR=missing completely at random; MAR=missing at random; MNAR (or NMAR)=missing not at random.
You can easily grasp the basic terminology at -help mi glossary- and the related entry in the Stata .pdf manual.
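To make the three acronyms concrete, here is a minimal simulation sketch in Python (not Stata). Everything in it is an assumption made for illustration: the variables `z` (always observed) and `x` (the one with gaps), the coefficients, and the helper name `simulate_mechanisms` are hypothetical and have nothing to do with the firm data in this thread. It compares the naive mean of the observed `x` with a mean adjusted by regressing `x` on `z` among the observed rows.

```python
import numpy as np


def logistic(v):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))


def simulate_mechanisms(n=100_000, seed=0):
    """Return {mechanism: (naive_mean, adjusted_mean)}; the true mean of x is 0."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # hypothetical, always observed
    x = 0.8 * z + rng.normal(scale=0.6, size=n)  # hypothetical, has missing values

    # Per-row probability that x is missing under each mechanism.
    p_missing = {
        "MCAR": np.full(n, 0.5),      # unrelated to any variable
        "MAR": logistic(1.5 * z),     # depends only on the observed z
        "MNAR": logistic(1.5 * x),    # depends on the unobserved x itself
    }

    results = {}
    for name, p in p_missing.items():
        observed = rng.random(n) >= p
        naive = x[observed].mean()
        # Regress x on z among observed rows, then predict x for every row.
        slope, intercept = np.polyfit(z[observed], x[observed], 1)
        adjusted = (intercept + slope * z).mean()
        results[name] = (naive, adjusted)
    return results


for name, (naive, adjusted) in simulate_mechanisms().items():
    print(f"{name}: naive mean = {naive:+.3f}, adjusted mean = {adjusted:+.3f}")
```

In this setup the naive mean is close to the truth only under MCAR. Conditioning on `z` repairs it under MAR but not under MNAR, which is why the mechanism has to be argued for before an imputation model can be trusted.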

Kind regards,
Carlo
(Stata 19.0)
Stephen Zamore

Join Date: Nov 2016

Posts: 10
#5

16 Sep 2017, 09:40

To deal with missing values, the only approach I know is replacing them with the mean, using the -egen- command.
Stephen
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#6

16 Sep 2017, 10:25

You are dealing with a reviewer who seems reasonably sophisticated statistically. I doubt very much that he or she will accept replacing the missing values with the mean. Even under the most favorable circumstances, this method will produce biased estimates of regression coefficients.
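The bias from mean substitution can be seen in a small simulation. The sketch below is in Python rather than Stata, uses a linear regression instead of a logit for simplicity, and rests on made-up assumptions: two correlated predictors `x1` and `x2` with true coefficients of 1 each, and 40% of `x1` deleted completely at random (the most favorable case). The function name `mean_imputation_demo` is hypothetical.

```python
import numpy as np


def ols(y, *cols):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mean_imputation_demo(n=50_000, seed=1):
    """Compare full-data, complete-case, and mean-imputed fits (true slopes: 1, 1)."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    x1 = 0.6 * x2 + rng.normal(scale=0.8, size=n)  # correlated with x2
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

    missing = rng.random(n) < 0.4                  # MCAR: unrelated to the data
    keep = ~missing
    # Mean substitution: fill every gap in x1 with the observed mean of x1.
    x1_filled = np.where(missing, x1[keep].mean(), x1)

    return {
        "full": ols(y, x1, x2),
        "complete_case": ols(y[keep], x1[keep], x2[keep]),
        "mean_imputed": ols(y, x1_filled, x2),
    }


for label, coef in mean_imputation_demo().items():
    print(f"{label:>13}: const = {coef[0]:+.3f}, x1 = {coef[1]:.3f}, x2 = {coef[2]:.3f}")
```

Under these assumptions the complete-case fit recovers both slopes (at the cost of a smaller sample), while the mean-imputed fit pulls the `x1` coefficient below 1 and pushes the `x2` coefficient above 1, because the filled-in values break the correlation between the two predictors.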

This situation gives you the opportunity to learn about missing data and its appropriate management. For an introduction to the issues and approaches, I recommend you Google (or use whatever your favorite search engine is) "Richard Williams missing data". The excellent Richard Williams has a series of papers on this topic on his website and, as with all of Richard's writing, they are models of clarity.

These issues are complicated enough that you may find you need to involve a professional statistician for this. Nevertheless, to even work effectively with a statistician you need to know the basics of missing data analysis.
Stephen Zamore

Join Date: Nov 2016

Posts: 10
#7

16 Sep 2017, 10:57

Carlo
Thanks again, I now understand the missingness mechanisms.
Stephen Zamore

Join Date: Nov 2016

Posts: 10
#8

16 Sep 2017, 11:04

Thank you Clyde for providing more information on missing data.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

16 Sep 2017, 11:17

Stephen:
as an aside to Clyde's wise advice (including, but by no means limited to, seeking the help of a statistician or of colleagues trained in dealing with missing data), you may want to take a look at the following online resource: http://www.missingdata.org.uk/. That website is maintained by Jonathan Bartlett, whose posts appeared from time to time on this forum until some months ago.
Among the reference textbooks on missing values, which can complement the excellent teaching notes prepared by Richard, you may find the following interesting:

Allison PD. Missing Data. Thousand Oaks, CA: SAGE Publications, 2001.

Little RJA, Rubin DB. Statistical Analysis with Missing Data, 2nd ed. Chichester: Wiley, 2002.

van Buuren S. Flexible Imputation of Missing Data. Boca Raton, FL: Chapman and Hall/CRC, 2012.

Kind regards,
Carlo
(Stata 19.0)
Stephen Zamore

Join Date: Nov 2016

Posts: 10
#10

16 Sep 2017, 11:59

Carlo:
Great! Very helpful resources.
Stephen