  • Steps to conduct a panel analysis

    Dear Stata Forum,

    I have an unbalanced panel of 3,584 firm-year observations (460 firms, T = 14 at most) after omitting observations with any missing values. What I'm trying to analyze is the impact of business practices on firm performance.
    With a decent knowledge of statistics, I have been reading various posts and have come up with the following steps to perform the analysis for my paper.
    However, I'm unsure about the following points:
    Question 1) whether my steps are econometrically sound
    Question 2) what to do after I perform the last step

    I appreciate your advice.


    The following part describes the steps I took.
    1) create the initial regression model
    2) check multicollinearity in 1) with -vif- and omit the dummy variables whose VIF > 5
    3) declare the panel (a rough sketch of steps 1-3 appears below, after step 4's output)
    4) compare the pooled OLS (POLS) model and the RE model with -xttest0-. The test rejects Var(u) = 0, so the RE model is preferred to pooled OLS.
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, re
    estimates store re_model
    xttest0
    Code:
    Breusch and Pagan Lagrangian multiplier test for random effects
    
            ln_tq[co_cik,t] = Xb + u[co_cik] + e[co_cik,t]
    
            Estimated results:
                             |       Var     SD = sqrt(Var)
                    ---------+-----------------------------
                       ln_tq |   .2866833       .5354281
                           e |    .091313       .3021804
                           u |   .1865672       .4319343
    
            Test: Var(u) = 0
                                 chibar2(01) =  4595.75
                              Prob > chibar2 =   0.0000
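    For reference, a minimal sketch of steps 1)-3) (illustrative only; the variable names match the output further below, and the initial model is shown without the extra dummies mentioned in step 2):
    Code:
    * 1) initial pooled regression
    regress ln_tq x1 x2 x3 x4 x5 size re exps i.year
    * 2) variance inflation factors; drop dummies with VIF > 5
    estat vif
    * 3) declare the panel: firm identifier and year
    xtset co_cik year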
    5) compare the FE model and the RE model with -hausman-.
    The test says the FE model is appropriate.
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe
    estimates store fe_model
    hausman fe_model re_model
    Code:
    F test that all u_i=0: F(459, 3103) = 15.11                  Prob > F = 0.0000
    Code:
    Test of H0: Difference in coefficients not systematic
    
       chi2(19) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =  90.29
    Prob > chi2 = 0.0000
    (V_b-V_B is not positive definite)

    6) after the Hausman test, I invoke non-default standard errors for the RE model (-xi: xtreg, re vce(robust)-) and run -xtoverid-.
    The test rejects the null, so I go with the robust FE model.
    Code:
    xi: xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, re vce(robust)
    xtoverid
    Code:
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(co_cik)
    Sargan-Hansen statistic 134.156  Chi-sq(21)   P-value = 0.0000
    7) check heteroskedasticity in the FE model with -xttest3-
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe
    xttest3
    Code:
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (460)  =   1.4e+35
    Prob>chi2 =      0.0000
    8) check autocorrelation in the FE model with -xtserial-
    Code:
    xtserial ln_tq x1 x2 x3 x4 x5 size re exps
    Code:
    . xtserial ln_tq x1 x2 x3 x4 x5 size re exps
    
    Wooldridge test for autocorrelation in panel data
    H0: no first order autocorrelation
        F(  1,     369) =     65.771
               Prob > F =      0.0000
    9) since both the heteroskedasticity and autocorrelation tests are rejected, I invoke non-default standard errors in the FE model
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe robust
    Code:
    Fixed-effects (within) regression               Number of obs     =      3,584
    Group variable: co_cik                          Number of groups  =        460
    
    R-squared:                                      Obs per group:
         Within  = 0.1323                                         min =          1
         Between = 0.0000                                         avg =        7.8
         Overall = 0.0162                                         max =         14
    
                                                    F(21, 459)        =      17.14
    corr(u_i, Xb) = -0.1523                         Prob > F          =     0.0000
    
                                   (Std. err. adjusted for 460 clusters in co_cik)
    ------------------------------------------------------------------------------
                 |               Robust
           ln_tq | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0101627    .011002    -0.92   0.356    -.0317832    .0114579
              x2 |  -.0288111   .0102505    -2.81   0.005    -.0489549   -.0086673
              x3 |  -.0065449   .0089465    -0.73   0.465    -.0241262    .0110364
              x4 |  -.0000741   .0018472    -0.04   0.968     -.003704    .0035559
              x5 |  -.0085098   .0197696    -0.43   0.667    -.0473599    .0303404
            size |  -.0097784   .0046018    -2.12   0.034    -.0188217   -.0007352
              re |  -8.93e-09   6.30e-07    -0.01   0.989    -1.25e-06    1.23e-06
            exps |   .0000149   6.68e-06     2.23   0.026     1.76e-06     .000028
                 |
            year |
           2011  |  -.0341651   .0178455    -1.91   0.056    -.0692341     .000904
           2012  |  -.0296391   .0218184    -1.36   0.175    -.0725153    .0132372
           2013  |   .1215146   .0255803     4.75   0.000     .0712456    .1717835
           2014  |   .1738158   .0273033     6.37   0.000     .1201609    .2274708
           2015  |   .1637915   .0311645     5.26   0.000     .1025487    .2250343
           2016  |   .1830113   .0288137     6.35   0.000     .1263881    .2396345
           2017  |   .2382127   .0320691     7.43   0.000     .1751923    .3012331
           2018  |   .1821995   .0352102     5.17   0.000     .1130064    .2513926
           2019  |   .2392736   .0354988     6.74   0.000     .1695133    .3090338
           2020  |   .3039198   .0380133     8.00   0.000     .2292181    .3786215
           2021  |   .3863713   .0370685    10.42   0.000     .3135263    .4592163
           2022  |   .1891405    .039879     4.74   0.000     .1107725    .2675085
           2023  |   .1689371   .0426209     3.96   0.000     .0851809    .2526934
                 |
           _cons |   .8312923   .0506949    16.40   0.000     .7316695    .9309151
    -------------+----------------------------------------------------------------
         sigma_u |  .50856322
         sigma_e |  .30218043
             rho |  .73906807   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------



  • #2
    I don't understand the problem. At a broad level, you have a very small time dimension. How can we hope to learn something meaningful about business practices (the effects of which may be long term) from only 14 years of data?


    • #3
      Jun:
      welcome to this forum.
      Some comments about your post:
      1) as far as your question #1 is concerned, with such a large sample non-default standard errors are the way to go.
      This way:
      a) you can skip the heteroskedasticity and serial correlation tests;
      b) you can go with -xtoverid- without considering -hausman- (which, as we know, does not support non-default standard errors);
      c) here and elsewhere, multicollinearity is hardly an issue (see Chapter 23 of A Course in Econometrics, Harvard University Press, by Arthur S. Goldberger);
      d) your point 9): being pedantic, I'd rephrase it as follows: since both the heteroskedasticity and autocorrelation tests rejected the null, I invoke non-default standard errors in the FE model;
      2) as far as your question #2 is concerned:
      a) check whether the functional form of your regressand is correctly specified (that is, do the -linktest- by hand):
      Code:
      . use "https://www.stata-press.com/data/r18/nlswork.dta"
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . xtreg ln_wage c.age##c.age, re vce(cluster idcode)
      
      Random-effects GLS regression                   Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-squared:                                      Obs per group:
           Within  = 0.1087                                         min =          1
           Between = 0.1015                                         avg =        6.1
           Overall = 0.0870                                         max =         15
      
                                                      Wald chi2(2)      =    1258.33
      corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
      
                                   (Std. err. adjusted for 4,710 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               age |   .0590339   .0041049    14.38   0.000     .0509884    .0670795
                   |
       c.age#c.age |  -.0006758   .0000688    -9.83   0.000    -.0008107    -.000541
                   |
             _cons |   .5479714   .0587198     9.33   0.000     .4328826    .6630601
      -------------+----------------------------------------------------------------
           sigma_u |   .3654049
           sigma_e |  .30245467
               rho |  .59342665   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . predict fitted, xb
      (24 missing values generated)
      
      . g sq_fitted=fitted^2
      (24 missing values generated)
      
      . xtreg ln_wage fitted sq_fitted , re vce(cluster idcode)
      
      Random-effects GLS regression                   Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-squared:                                      Obs per group:
           Within  = 0.1088                                         min =          1
           Between = 0.1045                                         avg =        6.1
           Overall = 0.0887                                         max =         15
      
                                                      Wald chi2(2)      =    1316.74
      corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
      
                                   (Std. err. adjusted for 4,710 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            fitted |   2.805959   .6246598     4.49   0.000     1.581648    4.030269
         sq_fitted |  -.5516341   .1920793    -2.87   0.004    -.9281026   -.1751656
             _cons |  -1.468083   .5055433    -2.90   0.004     -2.45893   -.4772365
      -------------+----------------------------------------------------------------
           sigma_u |  .36481589
           sigma_e |  .30242516
               rho |  .59269507   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . test sq_fitted
      
       ( 1)  sq_fitted = 0
      
                 chi2(  1) =    8.25
               Prob > chi2 =    0.0041
      
      .
      As the -test- outcome rejects the null, the regression is clearly misspecified.
      b) you may want to challenge yourself with postestimation commands, such as -test- and -testparm-.
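      For instance, a minimal sketch reusing the FE model and variable names from your post #1 (purely illustrative):
      Code:
      xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe vce(cluster co_cik)
      * joint significance of the year dummies
      testparm i.year
      * a single linear restriction, e.g. equal slopes for x1 and x2
      test x1 = x2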
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        Hello Carlo,

        Thank you very much for your attentive feedback. I appreciate it!

        Hello Jared,

        Yes, you raised a valid point. Thank you!
