Addressing Heteroskedasticity, Autocorrelation, and Endogeneity in FEM with Micro Panel Data in Stata

Doan Ngan

#1

02 Jan 2025, 06:59

Hello everyone,
I am currently working with micro panel data and have run into some issues in my analysis. To choose between fixed and random effects, I ran the Hausman test, which indicated that the Fixed Effects Model (FEM) is preferred. I then tested the FEM for heteroskedasticity and autocorrelation, and the results confirmed both. In addition, a study I am following suggests that endogeneity is likely present in the data. However, I am unsure how to test for endogeneity in this context.
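For a single coefficient, the classic Hausman statistic compares the FE and RE estimates using the difference of their variances. Below is a minimal sketch in plain Python. The function name and the input numbers are hypothetical and are not taken from this thread.

```python
import math


def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Classic Hausman statistic for one coefficient and its chi-squared(1) p-value.

    Assumes RE is fully efficient under the null, so Var(FE) - Var(RE) > 0.
    """
    diff_var = se_fe ** 2 - se_re ** 2
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive for the classic test")
    h = (b_fe - b_re) ** 2 / diff_var
    # For chi-squared with 1 df, P(X > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


# Hypothetical estimates for one regressor
h, p = hausman_scalar(b_fe=0.40, se_fe=0.10, b_re=0.20, se_re=0.06)
print(f"H = {h:.2f}, p = {p:.4f}")  # H = 6.25, p = 0.0124
```

This classic version relies on RE being efficient under the null. That assumption does not hold when the errors are heteroskedastic or serially correlated, as reported here, so a cluster-robust (Mundlak-type) FE-versus-RE test is the usual alternative.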

Given these findings, how should I address heteroskedasticity, autocorrelation, and endogeneity in the FEM? I am using Stata for my analysis. Thank you.

Last edited by Doan Ngan; 02 Jan 2025, 07:02.
George Ford

#2

02 Jan 2025, 13:18

Clustered standard errors (clustered by panel id) will address the heteroskedasticity and autocorrelation. Endogeneity will require some type of IV approach. What sort of variables are the DV and the potentially endogenous variable?
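The following is a minimal plain-Python sketch of what a fixed-effects regression with standard errors clustered by unit computes for a single regressor. It is comparable to `xtreg y x, fe vce(cluster id)` but omits Stata's finite-sample correction. The panel values and function names are hypothetical.

```python
import math
from collections import defaultdict


def within_demean(values, groups):
    """Subtract each unit's own mean (the 'within' transformation of xtreg, fe)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_slope_cluster_se(y, x, groups):
    """Within (fixed-effects) slope for one regressor and its cluster-robust SE.

    Scores x_it * e_it are summed inside each cluster before squaring, so any
    heteroskedasticity or serial correlation within a unit is allowed for.
    No finite-sample correction is applied.
    """
    yd = within_demean(y, groups)
    xd = within_demean(x, groups)
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - beta * a for a, b in zip(xd, yd)]
    cluster_score = defaultdict(float)
    for a, e, g in zip(xd, resid, groups):
        cluster_score[g] += a * e
    var_beta = sum(s * s for s in cluster_score.values()) / sxx ** 2
    return beta, math.sqrt(var_beta)


# Hypothetical panel: 3 units x 3 years, unit effects correlated with x, true slope 2
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 10.0, 11.0, 12.0]
unit_effect = {1: 10.0, 2: -5.0, 3: 3.0}
noise = [0.1, -0.2, 0.1, -0.1, 0.0, 0.1, 0.2, -0.1, -0.1]
y = [unit_effect[g] + 2.0 * xi + e for g, xi, e in zip(ids, x, noise)]
beta, se = fe_slope_cluster_se(y, x, ids)
print(round(beta, 4), round(se, 4))  # 1.9833 0.0593
```

Clustering changes only the standard errors. The coefficient is the same as in plain `xtreg, fe`, and clustering does nothing about endogeneity. Three clusters are far too few for a trustworthy clustered SE; the roughly 80 countries in this panel are a more reasonable number.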
Doan Ngan

#3

02 Jan 2025, 17:49

Originally posted by George Ford

Clustered standard errors (clustered by panel id) will address the heteroskedasticity and autocorrelation. Endogeneity will require some type of IV approach. What sort of variables are the DV and the potentially endogenous variable?

Thank you for your response.

Dependent Variable (DV): Green Investment (GI), proxied by total production of nuclear, renewable, and other energy.

Independent Variables (IVs): Information and Communications Technology (ICT), Financial Development (FD), GDP per capita, CO2 emissions, Human Capital (HC), Trade Openness, Financial Globalization, and Natural Resources Rents (NRR).

Potential Endogenous Variable: Financial Development (FD) is suspected to be endogenous due to potential reverse causality, as green investment could also influence financial development.
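The following minimal sketch shows why reverse causality biases an OLS or FE coefficient. When GI also feeds back into FD, the regressor FD picks up the GI shock `u`. The two-equation system, the coefficients `b` and `g`, and the shocks are all hypothetical. With equal-variance, uncorrelated shocks, the OLS slope equals (b + g) / (1 + g^2) instead of b.

```python
def ols_slope(y, x):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical two-equation system with feedback in both directions:
#   GI = b * FD + u    (equation of interest)
#   FD = g * GI + v    (reverse causality)
b, g = 0.5, 0.8
u = [1.0, 1.0, -1.0, -1.0]   # shocks chosen to be mean-zero
v = [1.0, -1.0, 1.0, -1.0]   # and uncorrelated with each other
# Reduced form obtained by solving the two equations jointly
GI = [(b * vi + ui) / (1 - b * g) for ui, vi in zip(u, v)]
FD = [(vi + g * ui) / (1 - b * g) for ui, vi in zip(u, v)]

print(round(ols_slope(GI, FD), 3))  # 0.793, not the structural b = 0.5
```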

My study examines the impact of ICT and financial development on green investment in highly polluted economies. The data span 2000 to 2021 and cover 88 countries.

Could you suggest appropriate instrumental variables or methodologies to test and address the endogeneity of FD in this case?
George Ford

#4

03 Jan 2025, 08:02

Everything is continuous, so it's straightforward: ivreghdfe or ivreg2. You'll need instruments to test exogeneity. I would look for variables that affect basic consumer banking, since that is far removed from green investment. It looks like the World Bank has ATMs per 100,000 adults, bank branches, etc.

Both ivreg2 and ivreghdfe can provide a test of exogeneity (add endog(FD) as an option).

I'm not convinced it's a problem. Motives for energy investment are unlike those behind ordinary financial transactions. In many countries the government funds it, and if there were some market problem with getting financing, the government would step in. That is, the dependent variable has a strong policy influence, and nuclear energy is not off-the-shelf.

Also, CO2 emissions are endogenous in that model, no?
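The ivreghdfe route suggested above (`ivreghdfe ..., absorb(id)`) absorbs the unit fixed effects and then runs 2SLS. With one endogenous regressor and one excluded instrument, that reduces to the ratio computed below. This is a minimal sketch on hypothetical data. `z` stands in for an instrument such as the banking variables mentioned above. Whether a real instrument is excluded from the GI equation is an assumption these numbers cannot check.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Within transformation: subtract each unit's mean (what absorb(id) does)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_ols_slope(y, x, groups):
    """Fixed-effects (within) OLS slope for one regressor."""
    yd, xd = demean_by(y, groups), demean_by(x, groups)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def fe_iv_slope(y, x, z, groups):
    """Just-identified FE-IV: absorb unit means, then Cov(z, y) / Cov(z, x)."""
    yd, xd, zd = (demean_by(v, groups) for v in (y, x, z))
    return sum(a * b for a, b in zip(zd, yd)) / sum(a * b for a, b in zip(zd, xd))


# Hypothetical panel: 2 units x 4 years; the true coefficient on x is 2
ids = [1, 1, 1, 1, 2, 2, 2, 2]
z = [1.0, -1.0, 1.0, -1.0] * 2          # excluded instrument
u = [1.0, 1.0, -1.0, -1.0] * 2          # structural error, uncorrelated with z
a = {1: 3.0, 2: -2.0}                   # unit effects in x
c = {1: 5.0, 2: 1.0}                    # unit effects in y
x = [a[g] + zi + ui for g, zi, ui in zip(ids, z, u)]   # x depends on u: endogenous
y = [c[g] + 2.0 * xi + ui for g, xi, ui in zip(ids, x, u)]

print(fe_ols_slope(y, x, ids))    # 2.5: fixed effects alone do not remove the bias
print(fe_iv_slope(y, x, z, ids))  # 2.0: IV recovers the coefficient
```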

Doan Ngan

#5

03 Jan 2025, 19:29

I’m not entirely sure whether variables like CO2 emissions or FD (financial development) are actually endogenous. The paper I referenced uses a 2SLS approach to address potential endogeneity but does not explicitly state which variables are treated as endogenous or which instruments were used.

That’s why I tried treating each variable as endogenous one at a time, using its own lag as the instrument. Specifically, I ran ivreg2 ..., endog() in Stata. The results suggested that log GDP (lGDP) is endogenous, although only at the 10% level (p = 0.094).

Code:

. ivreg2 GI FD ICT lCO2 lHC lTrade lFG NRR (lGDP = l.lGDP), endog(lGDP)
Warning: time variable year has 25 gap(s) in relevant range

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =     1430
                                                      F(  8,  1421) =   107.06
                                                      Prob > F      =   0.0000
Total (centered) SS     =  4878.936376                Centered R2   =   0.3763
Total (uncentered) SS   =  5435.226302                Uncentered R2 =   0.4402
Residual SS             =  3042.769701                Root MSE      =    1.459

------------------------------------------------------------------------------
          GI | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        lGDP |   .3024666   .0823188     3.67   0.000     .1411246    .4638085
          FD |  -.1702853   .3646959    -0.47   0.641    -.8850762    .5445055
         ICT |   -.114932   .0707552    -1.62   0.104    -.2536096    .0237456
        lCO2 |   .6241343   .0394167    15.83   0.000     .5468789    .7013897
         lHC |  -1.349843   .2074737    -6.51   0.000    -1.756484   -.9432023
      lTrade |  -.8112085   .0910336    -8.91   0.000    -.9896311   -.6327859
         lFG |   1.634508   .2746078     5.95   0.000     1.096287    2.172729
         NRR |  -.0135545     .00445    -3.05   0.002    -.0222763   -.0048327
       _cons |  -5.910915   1.160105    -5.10   0.000     -8.18468   -3.637151
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):        1421.020
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              2.2e+05
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           0.000
                                                 (equation exactly identified)
-endog- option:
Endogeneity test of endogenous regressors:                               2.804
                                                   Chi-sq(1) P-val =    0.0940
Regressors tested:    lGDP
------------------------------------------------------------------------------
Instrumented:         lGDP
Included instruments: FD ICT lCO2 lHC lTrade lFG NRR
Excluded instruments: L.lGDP
------------------------------------------------------------------------------
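With a single endogenous regressor, ivreg2's Cragg-Donald Wald F is essentially the non-robust first-stage F on the excluded instrument. The minimal sketch below leaves out the included controls. It shows why a persistent series regressed on its own lag produces an enormous F, like the 2.2e+05 above: the lag is nearly a copy of the regressor. The AR(1) series is simulated and is not the thread's data.

```python
import random


def first_stage_f(x, z):
    """Homoskedastic first-stage F for one excluded instrument z (intercept only).

    With a single instrument this is the squared t statistic on z.
    """
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    szz = sum((a - mz) ** 2 for a in z)
    pi = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / szz
    rss = sum(((b - mx) - pi * (a - mz)) ** 2 for a, b in zip(z, x))
    s2 = rss / (n - 2)
    return pi ** 2 * szz / s2


rng = random.Random(42)
rho, n = 0.98, 1000
series = [0.0]
for _ in range(n):
    series.append(rho * series[-1] + rng.gauss(0, 1))
x, lag_x = series[1:], series[:-1]
unrelated_z = [rng.gauss(0, 1) for _ in range(n)]

print(first_stage_f(x, lag_x))        # very large: the lag is nearly a copy of x
print(first_stage_f(x, unrelated_z))  # typically small: z carries no information about x
```

A huge F speaks only to relevance. It says nothing about whether L.lGDP is excluded from the GI equation, and the -endog- test is only as credible as that exclusion assumption. Note also that the ivreg2 call above is pooled: it has no fixed effects, and its header reports statistics consistent for homoskedasticity only.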

However, when I followed a method I found on a forum to test for endogeneity, the results indicated that lCO2 and ICT are endogenous instead.

Code:

. xtreg ICT FD lGDP lCO2 lHC lTrade lFG NRR L.ICT, fe

Fixed-effects (within) regression               Number of obs     =      1,426
Group variable: id                              Number of groups  =         79

R-squared:                                      Obs per group:
     Within  = 0.9854                                         min =          4
     Between = 0.9898                                         avg =       18.1
     Overall = 0.9869                                         max =         21

                                                F(8,1339)         =   11265.06
corr(u_i, Xb) = 0.2210                          Prob > F          =     0.0000

------------------------------------------------------------------------------
         ICT | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          FD |  -.0365899   .0497154    -0.74   0.462    -.1341186    .0609387
        lGDP |  -.0194949   .0196783    -0.99   0.322    -.0580985    .0191088
        lCO2 |   .0414793    .012239     3.39   0.001     .0174696    .0654889
         lHC |  -.0534944   .0238189    -2.25   0.025    -.1002209    -.006768
      lTrade |   .0170831   .0152012     1.12   0.261    -.0127375    .0469038
         lFG |   .0504013   .0255411     1.97   0.049     .0002964    .1005063
         NRR |   .0045693   .0006247     7.31   0.000     .0033439    .0057947
             |
         ICT |
         L1. |   .8904805   .0052027   171.16   0.000      .880274    .9006869
             |
       _cons |   .2663344   .2223763     1.20   0.231    -.1699095    .7025782
-------------+----------------------------------------------------------------
     sigma_u |   .0738328
     sigma_e |  .07114689
         rho |  .51851976   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(78, 1339) = 2.14                    Prob > F = 0.0000

. predict e1, e
(334 missing values generated)

. xtreg lCO2 ICT lGDP FD lHC lTrade lFG NRR L.lCO2, fe

Fixed-effects (within) regression               Number of obs     =      1,432
Group variable: id                              Number of groups  =         79

R-squared:                                      Obs per group:
     Within  = 0.9106                                         min =          4
     Between = 0.9959                                         avg =       18.1
     Overall = 0.9939                                         max =         21

                                                F(8,1345)         =    1713.43
corr(u_i, Xb) = 0.4964                          Prob > F          =     0.0000

------------------------------------------------------------------------------
        lCO2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         ICT |  -.0013211   .0053219    -0.25   0.804    -.0117611    .0091189
        lGDP |   .0643995   .0179453     3.59   0.000     .0291957    .0996032
          FD |  -.0519122   .0459337    -1.13   0.259    -.1420216    .0381972
         lHC |   .0250903   .0218544     1.15   0.251    -.0177822    .0679627
      lTrade |   .0129446   .0140771     0.92   0.358    -.0146708      .04056
         lFG |  -.0310768   .0235728    -1.32   0.188    -.0773201    .0151666
         NRR |   .0012009   .0005762     2.08   0.037     .0000706    .0023313
             |
        lCO2 |
         L1. |   .9462465   .0116915    80.93   0.000      .923311     .969182
             |
       _cons |   .0208739   .2054028     0.10   0.919    -.3820708    .4238186
-------------+----------------------------------------------------------------
     sigma_u |  .10734546
     sigma_e |  .06587119
         rho |  .72645335   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(78, 1345) = 2.66                    Prob > F = 0.0000

. predict e2, e
(328 missing values generated)

. xtreg GI lGDP FD lHC lTrade lFG NRR e1 e2, fe cluster(id)

Fixed-effects (within) regression               Number of obs     =      1,425
Group variable: id                              Number of groups  =         79

R-squared:                                      Obs per group:
     Within  = 0.0601                                         min =          4
     Between = 0.0555                                         avg =       18.0
     Overall = 0.0575                                         max =         21

                                                F(8,78)           =       2.27
corr(u_i, Xb) = 0.0031                          Prob > F          =     0.0306

                                    (Std. err. adjusted for 79 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          GI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        lGDP |   .3980835   .1488902     2.67   0.009     .1016658    .6945011
          FD |   -.049109   .4678615    -0.10   0.917    -.9805496    .8823317
         lHC |    .072506   .0818205     0.89   0.378    -.0903862    .2353981
      lTrade |   -.016467   .1121024    -0.15   0.884    -.2396458    .2067118
         lFG |  -.2653458   .2651809    -1.00   0.320    -.7932804    .2625889
         NRR |   .0008462   .0025865     0.33   0.744    -.0043031    .0059955
          e1 |  -.1090508   .0631722    -1.73   0.088     -.234817    .0167153
          e2 |  -.1818542    .074515    -2.44   0.017    -.3302021   -.0335063
       _cons |  -2.224472   2.000074    -1.11   0.269    -6.206314     1.75737
-------------+----------------------------------------------------------------
     sigma_u |  1.6813962
     sigma_e |  .26983173
         rho |  .97489255   (fraction of variance due to u_i)
------------------------------------------------------------------------------
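In the usual regression-based (Durbin-Wu-Hausman, or control-function) version of this test, the suspect regressor stays in the outcome equation next to its first-stage residual. The t-test on that residual is the exogeneity test. In the last command above, e1 and e2 enter but ICT and lCO2 themselves do not, which is worth checking against wherever the method came from. The minimal sketch below uses one regressor and one instrument on simulated data. All names are hypothetical, the SEs assume homoskedasticity, and in this panel one would also absorb the fixed effects and cluster by id.

```python
import math
import random


def simple_fit(y, x):
    """Intercept, slope and residuals from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope, [c - intercept - slope * a for a, c in zip(x, y)]


def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def control_function_test(y, x, z):
    """Regression-based (Durbin-Wu-Hausman) exogeneity test for one regressor x.

    Stage 1: regress x on the instrument z and keep the residual vhat.
    Stage 2: regress y on x AND vhat (x stays in the equation).
    Returns (coef on x, coef on vhat, t on vhat) with homoskedastic SEs.
    """
    _, _, vhat = simple_fit(x, z)
    n = len(y)
    yd, xd, vd = demean(y), demean(x), demean(vhat)   # removes the intercept
    sxx = sum(a * a for a in xd)
    svv = sum(a * a for a in vd)
    sxv = sum(a * b for a, b in zip(xd, vd))
    sxy = sum(a * b for a, b in zip(xd, yd))
    svy = sum(a * b for a, b in zip(vd, yd))
    det = sxx * svv - sxv ** 2                        # solve the 2x2 normal equations
    b_x = (svv * sxy - sxv * svy) / det
    b_v = (sxx * svy - sxv * sxy) / det
    resid = [c - b_x * a - b_v * b for a, b, c in zip(xd, vd, yd)]
    s2 = sum(e * e for e in resid) / (n - 3)          # intercept + 2 slopes
    se_v = math.sqrt(s2 * sxx / det)
    return b_x, b_v, b_v / se_v


# Hypothetical simulated data with a true coefficient of 2 on x
rng = random.Random(7)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + wi for zi, wi in zip(z, w)]
y_endog = [2 * xi + 0.8 * wi + 0.6 * ei for xi, wi, ei in zip(x, w, e)]  # error shares w with x
y_exog = [2 * xi + ei for xi, ei in zip(x, e)]                           # x is exogenous

print(control_function_test(y_endog, x, z))  # large |t| on vhat
print(control_function_test(y_exog, x, z))   # t on vhat usually small
```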

I’m not sure which method is correct. Can you help clarify this for me?


George Ford

#6

06 Jan 2025, 08:51

The paper uses xtabond2 with lag(lGDP) as a regressor.

One-year lags of these persistent series do not make good instruments. Use xtabond2.
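The minimal simulation below shows why a one-year lag can fail as an instrument when the errors are serially correlated, which the FE tests at the start of the thread found. If u_t is AR(1) with coefficient phi, then Cov(x_{t-1}, u_t) = phi * Var(u) is not zero, so the IV estimate stays biased. The data-generating process, names, and parameter values are hypothetical. A single long series is used instead of a panel to keep the sketch short.

```python
import random


def iv_slope(y, x, z):
    """Just-identified IV slope with an intercept: Cov(z, y) / Cov(z, x)."""
    n = len(y)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx


def simulate(n, rho_s=0.9, phi=0.8, beta=2.0, seed=1, burn_in=200):
    """Hypothetical DGP: x_t = s_t + u_t and y_t = beta * x_t + u_t.

    s_t is a persistent exogenous AR(1) component; u_t is an AR(1) error that
    also enters x_t, so x_t is endogenous and the error is serially correlated.
    """
    rng = random.Random(seed)
    s = u = 0.0
    xs, ys = [], []
    for t in range(n + burn_in):
        s = rho_s * s + rng.gauss(0, 1)
        u = phi * u + rng.gauss(0, 1)
        if t >= burn_in:
            xs.append(s + u)
            ys.append(beta * (s + u) + u)
    return xs, ys


x, y = simulate(100_000)
lag_iv = iv_slope(y[1:], x[1:], x[:-1])   # instrument x_t with its own lag x_{t-1}

# Population limit in this DGP: beta + phi*Var(u) / (rho_s*Var(s) + phi*Var(u))
var_s, var_u = 1 / (1 - 0.9 ** 2), 1 / (1 - 0.8 ** 2)
limit = 2.0 + 0.8 * var_u / (0.9 * var_s + 0.8 * var_u)
print(round(lag_iv, 2), round(limit, 2))  # both well above the true beta = 2
```

The second lag is not automatically safe either: in this setup its covariance with u_t is phi^2 * Var(u). xtabond2 uses deeper lags as GMM-style instruments and reports the Arellano-Bond AR(2) test and the Hansen test. These tests speak to whether those lags are plausible instruments.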
George Ford

#7

06 Jan 2025, 09:01

ivqregress can do the IV quantile part.
Doan Ngan

#8

06 Jan 2025, 09:20

Thank you very much for your support and guidance with my questions over the past few days. I truly appreciate the time you've taken to help me.