Panel data steps

Robert Hofland

Join Date: Jul 2017

Posts: 1
#1

29 Jul 2017, 16:28

Dear users,

First of all, I'd like to say that I am a complete beginner in Stata and econometrics, so my questions may be fairly simple, and not only related to Stata. I have only used SPSS and analyses like ANOVA and a cross-sectional regression before, and even with those I am not great. I am currently trying to extend my previous research with a cross-sectional regression to a greater period of time, to gain some practice in the methodology of panel data.

Some more detail about my study: I am attempting to study the effect of various country-specific variables (such as corruption and trade openness) on the economic growth of countries. I have now collected data over a longer period of time, and even though not every variable covers the same period (e.g. 1980-2015 for some, 1981-2015 for others), and there are some missing values, Stata tells me the data is strongly balanced. I have run the fixed and random effects regressions on the basis of a guide and, based on the significant Hausman test, I believe I have to use the fixed effects regression results.

However, aside from normal distribution of the variables (I have transformed my variables), I have not tested any of the assumptions of the analysis, as it is not clear to me which assumptions I have to test for in an analysis with panel data. I also do not know, and cannot find out, how to test these in Stata (I have tried to find out about multicollinearity and heteroskedasticity). I have searched quite a bit through Google, and often hit this forum, but replies frequently contain discussions about whether it is necessary to check for multicollinearity at all, for example, rather than a (for me) understandable tutorial on executing these tests.

Basically, my questions are the following:

1. What are the assumptions of a regression analysis with panel data (fixed effects), and how do I test them in Stata (commands, maybe interpretation)?
2. Would anyone be able to give me a fairly short step-by-step guide to the steps in a panel data regression (including these assumptions)? As far as I understand, I have to check the assumptions, run the analyses, use the Hausman test to choose between fixed and random effects, and then interpret the results. However, I am probably missing something?

Thanks for your time.

Robert Hofland
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

29 Jul 2017, 17:31

So first, there are no assumptions at all about the distributions of any of the variables in the analysis. There are assumptions about the residuals: independence and identical distribution over panels, and absence of serial correlation. But it is not typically necessary to test for these assumptions, because if you have an adequate number of panels, you can simply use the cluster robust variance (specify -vce(cluster panel_variable)- in your regression) and these problems are overcome if they are present. Given the simplicity of solving the problem, there is really no reason to test for any of these things.
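A minimal sketch of what the cluster robust variance looks like in practice. It uses the nlswork example dataset that Stata makes available through -webuse-, not the original poster's data, and the choice of outcome and regressors is purely an illustrative assumption:

```stata
* Illustrative only: nlswork is a Stata example dataset, not the poster's data
webuse nlswork, clear

* Declare the panel structure: idcode is the panel variable, year the time variable
xtset idcode year

* Fixed-effects regression with standard errors clustered on the panel variable
xtreg ln_wage age tenure, fe vce(cluster idcode)
```

With your own data you would replace idcode with your panel variable (for example, a numeric country identifier) and the regressors with your own.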

(If you have only a very small number of panels, then it is questionable whether you should be using panel data methods at all, as their statistical properties in large samples are derived assuming N >> T.)

As for multicollinearity, as often pointed out on this Forum by me and others, it is a non-problem in most situations. If it involves only variables that are included only as control variables, it affects nothing at all and can be completely ignored. If there is multicollinearity involving a variable of actual interest in the analysis, then the effect of multicollinearity is to increase the standard error of that variable's coefficient estimate. If your standard error is already small enough for practical purposes (e.g. the confidence interval for the coefficient of that variable is narrow enough that it makes no practical difference whether the true value were at one end of it or the other) then the multicollinearity is innocuous and you should not waste time and energy thinking about it. So no testing beyond an inspection of the confidence intervals of the coefficients of principal interest is needed. If you end up with a confidence interval that is wide enough to matter, however, then you have a problem. The difficulty in this situation is that there usually isn't anything that you can do about it. If the collinearity involves your variable(s) of interest and some other variable(s) that are not actually relevant to the outcome and are not needed as control variables, then you can just delete those other variable(s). But then there was no reason to have included those other variable(s) in the first place. So unless your model includes extraneous variables that really don't belong there, there is nothing you can do about multicollinearity with your existing data set. You would have to start from scratch with a different design, such as perhaps matched pairs, that would break the multicollinearity. Or, you would have to get a much larger data sample. But there would be nothing you could do to fix it with the existing data set.

In some disciplines, including econometrics, the Hausman test often serves as the basis for choosing between fixed and random effects. In interpreting your results, it is important to remember that a fixed effects model is a model of within-panel effects. The same variables may have different effects (even opposite sign) on the same outcome when viewed across panels--but the across-panel effects are not estimable in a fixed-effects model. If across-panel effects are an important part of the research question, then a fixed-effects model is not suitable, Hausman notwithstanding. (It works the other way, too. If your research question is about within-panel effects, you should avoid the random effects model, despite its enhanced efficiency, even if Hausman says it's OK, because the effects it estimates are a mixture of within- and between-panel effects, and those are not what you are after.)
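For readers who have not run the Hausman test before, a sketch of the usual sequence follows. It again uses the nlswork example dataset and an arbitrary, assumed specification. Note that -hausman- is run here on models fitted with the default (non-robust) variance estimator, which is what the command expects:

```stata
* Illustrative only: example dataset and specification are assumptions
webuse nlswork, clear
xtset idcode year

* Fit the fixed-effects model and store its results under the name "fe"
xtreg ln_wage age tenure, fe
estimates store fe

* Fit the random-effects model and store its results under the name "re"
xtreg ln_wage age tenure, re
estimates store re

* Compare the two: the consistent estimator (fe) goes first, the efficient one (re) second
hausman fe re
```

As the paragraph above stresses, the result of this test does not settle the choice on its own if the research question is specifically about within-panel or across-panel effects.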

On the other side of the scale, one of the beauties of the fixed-effects model is that any time-invariant attributes of the panels, even if unobserved and unobservable, are adjusted for automatically by virtue of the fixed effects. So omitted variable bias is only a threat to fixed-effects models with respect to time-varying attributes of the panels. Even there, if you have a time-varying attribute that is the same across all panels (e.g. a certain new policy went into effect at a certain time period and all of the panels were affected by it), including time fixed effects in the model will also automatically adjust for those effects, again, even if they are unobserved or unobservable.
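As a sketch of how time fixed effects are typically added, one can include indicator variables for the time variable with factor-variable notation. The dataset (nlswork) and the regressors are illustrative assumptions, not the original poster's model:

```stata
* Illustrative only: example dataset and regressors are assumptions
webuse nlswork, clear
xtset idcode year

* Panel fixed effects via -fe-, time fixed effects via the i.year indicators,
* and standard errors clustered on the panel variable
xtreg ln_wage tenure hours i.year, fe vce(cluster idcode)

* Joint test of whether the year indicators are all zero
testparm i.year
```

The i.year terms absorb anything that shifts the outcome for all panels in a given year, whether or not it was measured.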

In short, the amount of formal testing needed to work with these models is rather minimal. You should invest your time and energy, instead, in doing your best to assure that the model you specify is a plausible and at least somewhat realistic model of the real world process you are studying. If the model is mis-specified, it doesn't matter how many exotic tests it passes: its answers will not be correct. So focus on building a model that makes sense from a theoretical and real-world perspective.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

29 Jul 2017, 20:15

Welcome to Statalist, and to Stata.

As you're new to Stata, let me add to Clyde's statistical advice some programming advice of my own.

I'm sympathetic to you as a new user of Stata - it's a lot to absorb.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals. And, in doing so, you will also see that you cannot rely on your intuition built from other languages - SPSS in your case, SAS in mine - to intuit how to do in Stata things you are familiar with in other languages.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.
Jihad El Yaagoubi

Join Date: May 2018

Posts: 65
#4

29 May 2018, 08:33

Hello everyone,

Could you please explain this sentence (written by Clyde) to me?

"(If you have only a very small number of panels, then it is questionable whether you should be using panel data methods at all, as their statistical properties in large samples are derived assuming N >> T.)"

I ask because my study has 28 firms and 5 years, and I am using panel data.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#5

29 May 2018, 08:41

So if, for example, you had 5 firms and 28 years of data on each, the mathematical foundations on which the use of -xtreg- etc. rests would not apply well. But since you have 28 firms and 5 years of data, I think you are OK using -xtreg- from that perspective.
Jihad El Yaagoubi

Join Date: May 2018

Posts: 65
#6

29 May 2018, 09:00

Thank you very much.
Azizjon Rakhmonov

Join Date: Sep 2018

Posts: 3
#7

02 Sep 2018, 22:34

Hello everyone,

Could you give me some direction? I have panel data on 10 countries with 15 years of observations. Which method is most appropriate? The Hausman test says that the fixed effects model is appropriate. But according to the statement quoted above, "(If you have only a very small number of panels, then it is questionable whether you should be using panel data methods at all, as their statistical properties in large samples are derived assuming N >> T.)", that is not the best way. What would you suggest? Thanks in advance.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#8

03 Sep 2018, 00:27

Azizjon:
you do not provide details about the Stata command you used to perform the panel data regression you mention.
Since you have a T>N panel dataset, you should consider -xtgls-.
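A sketch of what an -xtgls- call can look like, using the Grunfeld example dataset (10 companies, 20 years) available through -webuse-. The dataset and the panels() and corr() choices are illustrative assumptions only; which error structure is appropriate depends on your data:

```stata
* Illustrative only: Grunfeld example data, 10 companies observed for 20 years
webuse grunfeld, clear
xtset company year

* Feasible GLS allowing panel-specific error variances and a common AR(1) process;
* these option choices are assumptions for the sake of the example
xtgls invest mvalue kstock, panels(heteroskedastic) corr(ar1)
```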

Kind regards,
Carlo
(Stata 19.0)