
  • Panel data issues

    Hello,

    Please, could someone help me out with the following issue?

    For my thesis, I've got panel data from 300 US companies over a timespan of 10 years. I'm doing a simple mediation analysis, so I've got a DV, an IV, a mediator, and 4 control variables. Due to missing data, only 83 groups are left to work with when all variables are included.

    What I've figured out until now is the following.

    First, determine whether the model should be pooled, fixed, or random effects. To that end, I conducted the Hausman test first. H0 was rejected, so it's not random effects.
    Then I want to determine whether it's pooled or fixed. For that, I ran the Breusch-Pagan test with xttest2, but it failed with the error message "too few observations". So I balanced the years with xtbalance and got a strongly balanced dataset, but it still doesn't work. I've also read about the rule of thumb for large N with small T (and the other way around), but I cannot find any proper reading on that.

    I'm in a real hurry with the thesis deadline coming up, and I still have a lot of work to do.

    What I would like to know is how to determine what kind of panel data I have, and which tests (with their commands) to run in order to check the OLS assumptions, if that's even possible for panel data. Then I'll perform the three-step mediation analysis from Hayes (2009). (I never had panel data in school, so this may all sound a bit rookie-like.)

    Thanks in advance! I hope I do not ask for too much at once.
    Last edited by Maxim van Reenen; 30 May 2018, 07:00.

  • #2
    Maxim:
    welcome to this forum.
    Despite my being sympathetic with tight deadlines and the like, I would urge you to grasp at least the building blocks of panel data regression from any decent textbook on panel data econometrics, in addition to the -xt- and -xtreg- related entries in the Stata .pdf manual. The fact that you weren't taught panel data econometrics at school will hardly sway your discussant.
    That said, the following toy example will give you an idea of the steps to take for selecting among -xtreg, fe-, -xtreg, re-, and pooled OLS (I assume no robust/clustered standard errors):
    Code:
    . use "http://www.stata-press.com/data/r15/nlswork.dta"
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    
    . xtreg ln_wage age i.race, re
    
    Random-effects GLS regression                   Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1026                                         min =          1
         between = 0.1032                                         avg =        6.1
         overall = 0.0945                                         max =         15
    
                                                    Wald chi2(3)      =    3242.34
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |    .018534    .000331    55.99   0.000     .0178852    .0191828
                 |
            race |
          black  |  -.1209428   .0129079    -9.37   0.000    -.1462418   -.0956439
          other  |   .0981941   .0538424     1.82   0.068    -.0073351    .2037233
                 |
           _cons |    1.15423   .0118069    97.76   0.000     1.131089    1.177371
    -------------+----------------------------------------------------------------
         sigma_u |  .36581626
         sigma_e |  .30349389
             rho |  .59231394   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . estimates store re
    
    . xttest0
    
    Breusch and Pagan Lagrangian multiplier test for random effects
    
            ln_wage[idcode,t] = Xb + u[idcode] + e[idcode,t]
    
            Estimated results:
                             |       Var     sd = sqrt(Var)
                    ---------+-----------------------------
                     ln_wage |   .2285836       .4781042
                           e |   .0921085       .3034939
                           u |   .1338215       .3658163
    
            Test:   Var(u) = 0
                                 chibar2(01) = 26748.43
                              Prob > chibar2 =   0.0000
    
    . xtreg ln_wage age i.race, fe
    note: 2.race omitted because of collinearity
    note: 3.race omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1026                                         min =          1
         between = 0.0877                                         avg =        6.1
         overall = 0.0774                                         max =         15
    
                                                    F(1,23799)        =    2720.20
    corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
                 |
            race |
          black  |          0  (omitted)
          other  |          0  (omitted)
                 |
           _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
    -------------+----------------------------------------------------------------
         sigma_u |  .40635023
         sigma_e |  .30349389
             rho |  .64192015   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
    
    . estimates store fe
    
    . hausman fe re
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |       .            re         Difference          S.E.
    -------------+----------------------------------------------------------------
             age |    .0181349      .018534       -.0003991        .0001064
    ------------------------------------------------------------------------------
                               b = consistent under Ho and Ha; obtained from xtreg
                B = inconsistent under Ha, efficient under Ho; obtained from xtreg
    
        Test:  Ho:  difference in coefficients not systematic
    
                      chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                              =       14.08
                    Prob>chi2 =      0.0002
    
    .
    Even though -xttest0- (which should be performed after -xtreg, re-) rejects the null, you cannot stop there; you should take a step further and test which specification (-fe- or -re-) fits your data better via the -hausman- test.
    In the toy example above, -fe- is the way to go (by the way, the F-test appearing as a footnote under the -xtreg, fe- outcome table is significant and tells you that -xtreg, fe- outperforms pooled OLS).
    Please note that -hausman- allows for default standard errors only.
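    If you do need robust/clustered standard errors, one common workaround (a sketch, not part of the toy example above) is the Sargan-Hansen overidentification test implemented in the community-contributed command -xtoverid- (install it once via -ssc install xtoverid-); a rejection favors -fe- over -re-:
    Code:
    . use "http://www.stata-press.com/data/r15/nlswork.dta", clear
    . xtreg ln_wage age i.race, re vce(cluster idcode)
    . xtoverid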
    As an aside, hold back from converting an unbalanced panel into a balanced one: that approach in fact skips the mandatory investigation into the reasons for the missing data (which, in turn, may or may not be ignorable); on top of that, Stata can manage both unbalanced and balanced panel datasets with no problem.
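    If you want to see how unbalanced your panel actually is before deciding anything, -xtdescribe- reports the participation pattern (same toy dataset as above):
    Code:
    . use "http://www.stata-press.com/data/r15/nlswork.dta", clear
    . xtset idcode year
    . xtdescribe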
    Last edited by Carlo Lazzaro; 30 May 2018, 08:46.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Ah thanks Carlo!
      This answer is already far more elaborate than I expected it to be.
      I'm going to implement these steps and hopefully it will work out properly.

      • #4
        One more question, regarding my calculations.

        I've run both tests for my hypotheses and found that for three of the four hypotheses the fixed-effects model is the most appropriate, while for one of them the random-effects model is more appropriate. I've tested for autocorrelation, which is not present; however, the tests do show heteroskedasticity in the data.

        Now I'm wondering: should I use GLS estimation with -panels(heteroskedastic)- instead of robust standard errors? (Or clustered? I don't know the difference.)
        And if I should use GLS, how do I incorporate the fixed effects in the GLS command? Also, how do I incorporate the random effects for the hypothesis that would normally be assessed with a logistic regression (because the dependent variable in that hypothesis is binary)?
        The sample of groups is reasonably large (N=100).

        Thanks a lot!
        Last edited by Maxim van Reenen; 05 Jun 2018, 03:32.

        • #5
          Maxim:
          - if your dependent variable is continuous and N>T, go -xtreg- (with robust/clustered standard errors if you detected heteroskedasticity and/or autocorrelation);
          - if your dependent variable is binary and you have panel data, you should go -xtlogit-.
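          A minimal sketch of the two suggestions (all variable names here are placeholders for your own):
          Code:
          * continuous DV, fixed effects, cluster-robust standard errors
          . xtreg y x1 x2, fe vce(cluster panelid)
          * binary DV, panel logit (random effects is the default)
          . xtlogit ybin x1 x2, re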
          Kind regards,
          Carlo
          (StataNow 18.5)

          • #6
            Hi Carlo,

            Thanks for your quick response.
            Could you elaborate on your suggestion, for my understanding?
            The feasible GLS method does give better, significant results, but maybe that should not drive the model choice.
            Last edited by Maxim van Reenen; 05 Jun 2018, 15:12.

            • #7
              Maxim:
              -xtgls- works for T>N panel datasets. You have an N>T panel dataset, so go -xtreg- with robust or clustered standard errors (they do the same job under -xtreg-) if you suspect or have evidence of heteroskedasticity and/or autocorrelation;
              -looking for the statistical strategy that gives significant coefficients sounds like a weak approach; I would prefer the one that gives the fairest and truest view of the data-generating process, and analyze the data accordingly.
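              You can verify yourself that robust and clustered standard errors coincide under -xtreg- (same toy dataset as above; -vce(robust)- is automatically clustered on the panel variable):
              Code:
              . use "http://www.stata-press.com/data/r15/nlswork.dta", clear
              . xtreg ln_wage age i.race, fe vce(robust)
              . xtreg ln_wage age i.race, fe vce(cluster idcode)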
              Kind regards,
              Carlo
              (StataNow 18.5)

              • #8
                Alright, thanks!
