  • Steps to conduct a panel analysis

    Dear Stata Forum,

    I have an unbalanced panel of 3,584 firm-year observations (460 firms, T = 14 at most) after omitting observations with any missing values. What I'm trying to analyze is the impact of business practices on firm performance.
    With a decent knowledge of statistics, I have been reading various posts and have come up with the following steps to perform the analysis for my paper.
    However, I'm unsure about the following points:
    Question 1) whether my steps are econometrically sound
    Question 2) what to do after I perform the last step

    I appreciate your advice.


    The following part describes the steps I took.
    1) create the initial regression model
    2) check multicollinearity in 1) with -vif- and omit the dummy variables whose VIF > 5
    3) declare the panel (a rough sketch of steps 1-3 appears below, after step 4's output)
    4) compare the pooled OLS (POLS) model and the RE model with -xttest0-. The test rejects Var(u) = 0, so the RE model is preferred to pooled OLS.
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, re
    estimates store re_model
    xttest0
    Code:
    Breusch and Pagan Lagrangian multiplier test for random effects
    
            ln_tq[co_cik,t] = Xb + u[co_cik] + e[co_cik,t]
    
            Estimated results:
                             |       Var     SD = sqrt(Var)
                    ---------+-----------------------------
                       ln_tq |   .2866833       .5354281
                           e |    .091313       .3021804
                           u |   .1865672       .4319343
    
            Test: Var(u) = 0
                                 chibar2(01) =  4595.75
                              Prob > chibar2 =   0.0000
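    For reference, a minimal sketch of steps 1)-3) (illustrative only; the variable names match the output further below, and the initial model is shown without the extra dummies mentioned in step 2):
    Code:
    * 1) initial pooled regression
    regress ln_tq x1 x2 x3 x4 x5 size re exps i.year
    * 2) variance inflation factors; drop dummies with VIF > 5
    estat vif
    * 3) declare the panel: firm identifier and year
    xtset co_cik year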
    5) compare the FE model and the RE model with -hausman-.
    The test says the FE model is appropriate.
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe
    estimates store fe_model
    hausman fe_model re_model
    Code:
    F test that all u_i=0: F(459, 3103) = 15.11                  Prob > F = 0.0000
    Code:
    Test of H0: Difference in coefficients not systematic
    
       chi2(19) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =  90.29
    Prob > chi2 = 0.0000
    (V_b-V_B is not positive definite)

    6) after the Hausman test, I invoke non-default standard errors for the RE model (-xi: xtreg, re vce(robust)-) and run -xtoverid-.
    The test rejects the null, so I go with the robust FE model.
    Code:
    xi: xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, re vce(robust)
    xtoverid
    Code:
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(co_cik)
    Sargan-Hansen statistic 134.156  Chi-sq(21)   P-value = 0.0000
    7) check heteroskedasticity in the FE model with -xttest3-
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe
    xttest3
    Code:
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (460)  =   1.4e+35
    Prob>chi2 =      0.0000
    8) check autocorrelation in the FE model with -xtserial-
    Code:
    xtserial ln_tq x1 x2 x3 x4 x5 size re exps
    Code:
    . xtserial ln_tq x1 x2 x3 x4 x5 size re exps
    
    Wooldridge test for autocorrelation in panel data
    H0: no first order autocorrelation
        F(  1,     369) =     65.771
               Prob > F =      0.0000
    9) since both the heteroskedasticity and autocorrelation tests are rejected, I invoke non-default standard errors in the FE model
    Code:
    xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe robust
    Code:
    Fixed-effects (within) regression               Number of obs     =      3,584
    Group variable: co_cik                          Number of groups  =        460
    
    R-squared:                                      Obs per group:
         Within  = 0.1323                                         min =          1
         Between = 0.0000                                         avg =        7.8
         Overall = 0.0162                                         max =         14
    
                                                    F(21, 459)        =      17.14
    corr(u_i, Xb) = -0.1523                         Prob > F          =     0.0000
    
                                   (Std. err. adjusted for 460 clusters in co_cik)
    ------------------------------------------------------------------------------
                 |               Robust
           ln_tq | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0101627    .011002    -0.92   0.356    -.0317832    .0114579
              x2 |  -.0288111   .0102505    -2.81   0.005    -.0489549   -.0086673
              x3 |  -.0065449   .0089465    -0.73   0.465    -.0241262    .0110364
              x4 |  -.0000741   .0018472    -0.04   0.968     -.003704    .0035559
              x5 |  -.0085098   .0197696    -0.43   0.667    -.0473599    .0303404
            size |  -.0097784   .0046018    -2.12   0.034    -.0188217   -.0007352
              re |  -8.93e-09   6.30e-07    -0.01   0.989    -1.25e-06    1.23e-06
            exps |   .0000149   6.68e-06     2.23   0.026     1.76e-06     .000028
                 |
            year |
           2011  |  -.0341651   .0178455    -1.91   0.056    -.0692341     .000904
           2012  |  -.0296391   .0218184    -1.36   0.175    -.0725153    .0132372
           2013  |   .1215146   .0255803     4.75   0.000     .0712456    .1717835
           2014  |   .1738158   .0273033     6.37   0.000     .1201609    .2274708
           2015  |   .1637915   .0311645     5.26   0.000     .1025487    .2250343
           2016  |   .1830113   .0288137     6.35   0.000     .1263881    .2396345
           2017  |   .2382127   .0320691     7.43   0.000     .1751923    .3012331
           2018  |   .1821995   .0352102     5.17   0.000     .1130064    .2513926
           2019  |   .2392736   .0354988     6.74   0.000     .1695133    .3090338
           2020  |   .3039198   .0380133     8.00   0.000     .2292181    .3786215
           2021  |   .3863713   .0370685    10.42   0.000     .3135263    .4592163
           2022  |   .1891405    .039879     4.74   0.000     .1107725    .2675085
           2023  |   .1689371   .0426209     3.96   0.000     .0851809    .2526934
                 |
           _cons |   .8312923   .0506949    16.40   0.000     .7316695    .9309151
    -------------+----------------------------------------------------------------
         sigma_u |  .50856322
         sigma_e |  .30218043
             rho |  .73906807   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------



  • #2
    I don't understand the problem. At a broad level, you have a very small time dimension. How can we hope to learn something meaningful about business practices (the effects of which may be long term) from only 14 years of data?


    • #3
      Jun:
      welcome to this forum.
      Some comments about your post:
      1) as far as your question #1 is concerned, with such a large sample non-default standard errors are the way to go.
      This way:
      a) you can skip the heteroskedasticity and serial correlation tests;
      b) you can go with -xtoverid- without considering -hausman- (which, as we know, does not support non-default standard errors);
      c) here and elsewhere, multicollinearity is hardly an issue (see Chapter 23 of A Course in Econometrics, Harvard University Press, by Arthur S. Goldberger);
      d) your point 9): being pedantic, I'd rephrase it as follows: since both the heteroskedasticity and autocorrelation tests rejected the null, I invoke non-default standard errors in the FE model;
      2) as far as your question #2 is concerned:
      a) check whether the functional form of your regressand is correctly specified (that is, do the -linktest- by hand):
      Code:
      . use "https://www.stata-press.com/data/r18/nlswork.dta"
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . xtreg ln_wage c.age##c.age, re vce(cluster idcode)
      
      Random-effects GLS regression                   Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-squared:                                      Obs per group:
           Within  = 0.1087                                         min =          1
           Between = 0.1015                                         avg =        6.1
           Overall = 0.0870                                         max =         15
      
                                                      Wald chi2(2)      =    1258.33
      corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
      
                                   (Std. err. adjusted for 4,710 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               age |   .0590339   .0041049    14.38   0.000     .0509884    .0670795
                   |
       c.age#c.age |  -.0006758   .0000688    -9.83   0.000    -.0008107    -.000541
                   |
             _cons |   .5479714   .0587198     9.33   0.000     .4328826    .6630601
      -------------+----------------------------------------------------------------
           sigma_u |   .3654049
           sigma_e |  .30245467
               rho |  .59342665   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . predict fitted, xb
      (24 missing values generated)
      
      . g sq_fitted=fitted^2
      (24 missing values generated)
      
      . xtreg ln_wage fitted sq_fitted , re vce(cluster idcode)
      
      Random-effects GLS regression                   Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-squared:                                      Obs per group:
           Within  = 0.1088                                         min =          1
           Between = 0.1045                                         avg =        6.1
           Overall = 0.0887                                         max =         15
      
                                                      Wald chi2(2)      =    1316.74
      corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
      
                                   (Std. err. adjusted for 4,710 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            fitted |   2.805959   .6246598     4.49   0.000     1.581648    4.030269
         sq_fitted |  -.5516341   .1920793    -2.87   0.004    -.9281026   -.1751656
             _cons |  -1.468083   .5055433    -2.90   0.004     -2.45893   -.4772365
      -------------+----------------------------------------------------------------
           sigma_u |  .36481589
           sigma_e |  .30242516
               rho |  .59269507   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . test sq_fitted
      
       ( 1)  sq_fitted = 0
      
                 chi2(  1) =    8.25
               Prob > chi2 =    0.0041
      
      .
      As the -test- outcome rejects the null, the regression is clearly misspecified.
      b) you may want to challenge yourself with postestimation commands, such as -test- and -testparm-.
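      For instance, a minimal sketch reusing the FE model and variable names from your post #1 (purely illustrative):
      Code:
      xtreg ln_tq x1 x2 x3 x4 x5 size re exps i.year, fe vce(cluster co_cik)
      * joint significance of the year dummies
      testparm i.year
      * a single linear restriction, e.g. equal slopes for x1 and x2
      test x1 = x2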
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        Hello Carlo,

        Thank you very much for your attentive feedback. I appreciate it!

        Hello Jared,

        Yes, you raised a valid point. Thank you!
