  • Question regarding unbalanced panel data

    Hello, I am working on my undergraduate dissertation, which examines the effect of ESG on the financial performance of US energy firms using pooled OLS, fixed effects, and Arellano-Bond estimators.

    Currently, I am struggling to decide how many years to cover. After discussing it with my supervisor, who warned me that unbalanced panel data could bias dynamic panel estimators such as Arellano-Bond, I had originally decided on a 5-year period. However, I have since read Baltagi, B.H. and Chang, Y.-J. (1994) ‘Incomplete panels’, Journal of Econometrics, 62(2), pp. 67–89. doi:10.1016/0304-4076(94)90017-5, which finds that balancing the data by dropping observations worsens the performance of the estimators compared with using the entire unbalanced data set. Should I restrict the sample to a short window or use all available years? ESG coverage by year is shown below:

    Code:
    tabulate year has_esg
    
               |        has_esg
          year |         0          1 |     Total
    -----------+----------------------+----------
          2004 |        87         18 |       105 
          2005 |        90         27 |       117 
          2006 |        96         32 |       128 
          2007 |       100         33 |       133 
          2008 |        99         39 |       138 
          2009 |        97         44 |       141 
          2010 |        89         61 |       150 
          2011 |        98         67 |       165 
          2012 |        98         74 |       172 
          2013 |       104         77 |       181 
          2014 |       108         81 |       189 
          2015 |       107         86 |       193 
          2016 |       111         94 |       205 
          2017 |       101        116 |       217 
          2018 |        71        155 |       226 
          2019 |        60        170 |       230 
          2020 |        51        184 |       235 
          2021 |        27        216 |       243 
          2022 |        10        236 |       246 
          2023 |         0        247 |       247 
    -----------+----------------------+----------
         Total |     1,604      2,057 |     3,661
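
    To make the sample-window question concrete, here is a minimal sketch of how the dynamic model could be run on the full unbalanced panel rather than a balanced subsample. It assumes ID and year are the panel and time identifiers and that one lag of the dependent variable is appropriate; neither choice is settled, and the regressor list is only illustrative.

    ```stata
    * Sketch only: ID and year as panel/time identifiers are an assumption
    xtset ID year

    * Show the participation pattern, i.e. how unbalanced the panel is
    xtdescribe

    * Arellano-Bond on all available firm-years with ESG data;
    * one lag of TOBIN is an illustrative choice, not a tested one
    xtabond TOBIN ENV SOC GOV DE BETA SIZE age if has_esg == 1, lags(1) vce(robust)

    * Test for serial correlation in the first-differenced errors
    estat abond
    ```

    Running the same commands with an added condition on year would give the short-window version for comparison.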
    Also, I am planning to run all my regressions under 2 specifications, one including the control variable "R&D intensity" (RD_TR in the summary statistics table) and one without it, because it has many missing observations. It is an important variable in past literature, and I will mention its poor coverage as a limitation of my study. Is this fine?

    Code:
    tabulate year has_RD
    
               |        has_RD
          year |         0          1 |     Total
    -----------+----------------------+----------
          2004 |        76         29 |       105 
          2005 |        85         32 |       117 
          2006 |        94         34 |       128 
          2007 |        98         35 |       133 
          2008 |       100         38 |       138 
          2009 |        96         45 |       141 
          2010 |       105         45 |       150 
          2011 |       111         54 |       165 
          2012 |       118         54 |       172 
          2013 |       125         56 |       181 
          2014 |       130         59 |       189 
          2015 |       132         61 |       193 
          2016 |       141         64 |       205 
          2017 |       152         65 |       217 
          2018 |       156         70 |       226 
          2019 |       160         70 |       230 
          2020 |       164         71 |       235 
          2021 |       166         77 |       243 
          2022 |       168         78 |       246 
          2023 |       169         78 |       247 
    -----------+----------------------+----------
         Total |     2,546      1,115 |     3,661
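
    The two specifications described above could be compared along these lines. This is only a sketch: the stored-estimate names are hypothetical, and clustering on ID is an assumption. The third regression re-runs the specification without RD_TR on the smaller RD_TR sample, which may help separate the effect of adding the control from the effect of losing observations.

    ```stata
    * Sketch only: estimate names (no_rd, with_rd, no_rd_same) are hypothetical
    xtset ID year

    * Specification 1: without R&D intensity (larger sample)
    xtreg TOBIN ENV SOC GOV DE BETA SIZE age, fe vce(cluster ID)
    estimates store no_rd

    * Specification 2: with R&D intensity (sample shrinks to firms reporting RD_TR)
    xtreg TOBIN ENV SOC GOV RD_TR DE BETA SIZE age, fe vce(cluster ID)
    estimates store with_rd

    * Specification 1 again, restricted to the sample used in specification 2
    xtreg TOBIN ENV SOC GOV DE BETA SIZE age if e(sample), fe vce(cluster ID)
    estimates store no_rd_same

    * Coefficients, standard errors and sample sizes side by side
    estimates table no_rd with_rd no_rd_same, b se stats(N)
    ```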
    Furthermore, I test my models for heteroskedasticity with xttest3 in Stata 17.0. However, the xttest3 result is exactly the same after re-estimating with vce(robust), and I am unsure why. Here I am using data from 2018-2023 and have dropped all firms that did not report their CO2 emissions.

    Here are the summary statistics before and after dropping observations based on the 2 criteria mentioned above:
    Code:
    summarize TOBIN ROA ESGC ENV SOC GOV lnCO2 EMIS Target Prod RU Policy RD_TR DE BETA SIZE age
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
           TOBIN |      3,182    .9839965    .0682519   .2757434   2.460593
             ROA |      1,844    .0270842    .1612215    -4.7646     .99519
            ESGC |      2,057    36.47043    18.90526   .9054831   88.83961
             ENV |      2,057    29.20133    26.18804          0   96.92313
             SOC |      2,057    36.77819    22.12286   .4434122   94.85254
    -------------+---------------------------------------------------------
             GOV |      2,057    52.02294    23.23856   .2800454   98.42676
           lnCO2 |        996    14.13439    2.451057   1.699279   18.88316
            EMIS |      2,057    37.09499    31.56243          0   99.68553
          Target |      1,882    20.42819    35.29809          0   95.91837
            Prod |      3,661    .5605026    .4963937          0          1
    -------------+---------------------------------------------------------
              RU |      2,057    31.88244    31.62478          0   99.79839
          Policy |      3,661    .7437859    .4366011          0          1
           RD_TR |      1,115    .8232445    12.81121    -.00055   339.7368
              DE |      3,104    51.49517     29.2946          3        100
            BETA |      1,698    1.710485    1.005974  -3.574506   7.031454
    -------------+---------------------------------------------------------
            SIZE |      3,497    21.17943    2.214364   6.907755   26.63424
             age |      3,601    17.14524    18.58871          0        141
    
    . drop if has_lnCO2 == 0
    (2,665 observations deleted)
    
    . keep if year >=2018
    (336 observations deleted)
    
    . summarize TOBIN ROA ESGC ENV SOC GOV lnCO2 EMIS Target Prod RU Policy RD_TR DE BETA SIZE age
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
           TOBIN |        655    .9682978    .0600428   .4543625   1.213569
             ROA |        451    .0341375    .0863776     -.5233      .4139
            ESGC |        660     49.2028    16.77136   9.576389   88.83961
             ENV |        660    46.56343    22.21136   .2285714   96.92313
             SOC |        660    49.90482    21.10299   6.186469   94.84564
    -------------+---------------------------------------------------------
             GOV |        660    59.54962    22.04239   .2800454    96.5379
           lnCO2 |        660    13.56501    2.529294   1.699279    18.5429
            EMIS |        660    59.99171    24.13653          0    99.0625
          Target |        647    39.07833    39.55785          0   93.89313
            Prod |        660    .3075758    .4618399          0          1
    -------------+---------------------------------------------------------
              RU |        660    51.82752    27.55294          0   99.79839
          Policy |        660    .8651515    .3418207          0          1
           RD_TR |        259    .0323892    .1052471   .0000315   1.066443
              DE |        646    53.43344    25.08295          3        100
            BETA |        637    1.884157    .9794677  -.8407099   6.850461
    -------------+---------------------------------------------------------
            SIZE |        660    22.51583    1.594792   17.68055   26.63424
             age |        647    20.85781    22.66347          0        141
    Code:
    xtreg TOBIN ENV SOC GOV RD_TR DE BETA SIZE age, fe
    
    Fixed-effects (within) regression               Number of obs     =        278
    Group variable: ID                              Number of groups  =         58
    
    R-squared:                                      Obs per group:
         Within  = 0.6058                                         min =          1
         Between = 0.4386                                         avg =        4.8
         Overall = 0.5556                                         max =          8
    
                                                    F(8,212)          =      40.73
    corr(u_i, Xb) = -0.4431                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
           TOBIN | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             ENV |   -.000227   .0001361    -1.67   0.097    -.0004954    .0000413
             SOC |   .0001672   .0001384     1.21   0.228    -.0001057    .0004401
             GOV |   .0000361    .000101     0.36   0.722    -.0001631    .0002352
           RD_TR |  -.0295279    .012995    -2.27   0.024    -.0551439   -.0039119
              DE |  -.0012643   .0000784   -16.12   0.000    -.0014189   -.0011097
            BETA |  -.0040603   .0016222    -2.50   0.013    -.0072581   -.0008625
            SIZE |  -.0009862   .0036487    -0.27   0.787    -.0081785    .0062062
             age |   .0004106   .0006199     0.66   0.508    -.0008112    .0016325
           _cons |   1.058147   .0831861    12.72   0.000     .8941694    1.222125
    -------------+----------------------------------------------------------------
         sigma_u |  .02791698
         sigma_e |  .01557575
             rho |  .76260957   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(57, 212) = 5.39                     Prob > F = 0.0000
    
    . xttest3
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (58)  =    1.7e+29
    Prob>chi2 =      0.0000
    
    
    . xtreg TOBIN ENV SOC GOV RD_TR DE BETA SIZE age, fe vce(robust)
    
    Fixed-effects (within) regression               Number of obs     =        278
    Group variable: ID                              Number of groups  =         58
    
    R-squared:                                      Obs per group:
         Within  = 0.6058                                         min =          1
         Between = 0.4386                                         avg =        4.8
         Overall = 0.5556                                         max =          8
    
                                                    F(8,57)           =      33.75
    corr(u_i, Xb) = -0.4431                         Prob > F          =     0.0000
    
                                        (Std. err. adjusted for 58 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
           TOBIN | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             ENV |   -.000227   .0001707    -1.33   0.189    -.0005689    .0001148
             SOC |   .0001672   .0001588     1.05   0.297    -.0001507    .0004852
             GOV |   .0000361    .000114     0.32   0.753    -.0001922    .0002644
           RD_TR |  -.0295279   .0253025    -1.17   0.248    -.0801953    .0211395
              DE |  -.0012643   .0001046   -12.09   0.000    -.0014738   -.0010549
            BETA |  -.0040603   .0013741    -2.95   0.005     -.006812   -.0013087
            SIZE |  -.0009862   .0050542    -0.20   0.846     -.011107    .0091347
             age |   .0004106   .0006892     0.60   0.554    -.0009694    .0017907
           _cons |   1.058147    .113507     9.32   0.000     .8308535    1.285441
    -------------+----------------------------------------------------------------
         sigma_u |  .02791698
         sigma_e |  .01557575
             rho |  .76260957   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xttest3
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (58)  =    1.7e+29
    Prob>chi2 =      0.0000