Ivreg2 giving negative centred r2 values and very large confidence intervals

Ross Batcher

Join Date: Mar 2020
Posts: 1

02 Mar 2020, 10:46

Hello Statalist

I am very inexperienced with Stata, and I am currently trying to test a hypothesis using countries' GDP and consumption data. A key part of testing my hypothesis involves using the ivreg2 command to estimate a key parameter. It is my first time using this command. My first two regressions came back looking as I predicted. However, my last two ivreg2 regressions come back with a negative centred R2 and very large confidence intervals, which I did not expect, so I believe I have gone wrong somewhere. I am aware that a negative centred R2 is possible and does not by itself mean that I have made a mistake. Could someone please look at my do-file and see whether I have gone wrong somewhere?

I am using Stata/IC 16.0.

Code:

// Look in the folder with the project in
cd H:\ConsumptionGdp

// Import the dataset, clear anything in memory
import delimited "H:\ConsGdp\consumptionData.csv", clear

// Install the addons needed for Instrumental variable regression
ssc install ivreg2 
ssc install ranktest 

// Sort by country and year
sort country year

// Removes unwanted countries
keep if country== "United Kingdom"

//Set year as time series data
tsset year

// creates variable for log of gdp
gen lngdp = log(gdp)
// creates variable for log of consumption
gen lncons = log(consumption)

// Dickey-Fuller test for stationarity. The null hypothesis is a unit root, so failing to reject it at the 10% critical value means the series is treated as nonstationary.
dfuller lngdp
dfuller lncons

// The data is nonstationary as you can see a trend. To deal with this, take first differences of random walk trend. For GDP:
gen dif1_lngdp = D.lngdp
gen dif2_lngdp = D2.lngdp 
//Generate lag1 and lag2 of gdp
gen lag1_lngdp = L.lngdp
gen lag2_lngdp = L2.lngdp

// Check for autocorrelation in the data, using 8 lags
corrgram lngdp, lags(8)

//regress dif of lngdp on lag of lngdp
reg D.lngdp L.lngdp, rob
//regress dif of lngdp on lag of gdp
reg D.lngdp L.lngdp, rob 

// Create a graph of it so you can see it's stationary
tsline d.lngdp

// The data is nonstationary as you can see a trend. To deal with this, take first differences of random walk trend. For consumption:
gen dif1_lncons = D.lncons
gen dif2_lncons = D2.lncons 
//Generate lag1 and lag2 of consumption
gen lag1_lncons = L.lncons
gen lag2_lncons = L2.lncons

// Check for autocorrelation in the data, using 8 lags
corrgram lncons, lags(8)

//regress dif of lncons on lag of lncons
reg D.lncons L.lncons, rob
//regress dif of lncons on lag of lncons
reg D.lncons L.lncons, rob
// Create a graph of it so you can see it's stationary
tsline d.lncons

** REGRESSIONS
//Lagged at least twice to get rid of first-order serial correlation
//Simple OLS model in levels with robust standard errors. This has an endogeneity problem (x causes y, but y may also cause x). Need to change to log values.
reg consumption gdp, rob
// regress consumption growth, then income growth, on income growth lagged from t-2 to t-4
reg d.lncons L(2/4).d.lngdp, rob
reg d.lngdp L(2/4).d.lngdp, rob
// regress consumption growth, then income growth, on income growth lagged from t-2 to t-6
reg d.lncons L(2/6).d.lngdp, rob
reg d.lngdp L(2/6).d.lngdp, rob
// regress consumption growth, then income growth, on consumption growth lagged from t-2 to t-4
reg d.lncons L(2/4).d.lncons, rob
reg d.lngdp L(2/4).d.lncons, rob
// regress consumption growth, then income growth, on consumption growth lagged from t-2 to t-6
reg d.lncons L(2/6).d.lncons, rob
reg d.lngdp L(2/6).d.lncons, rob


//Use ivreg2 to estimate the lambda values: the error term may be correlated with the change in income, so OLS cannot be used
ivreg2 consumption (gdp = L(2/4).gdp), rob
// 
ivreg2 d.lncons (d.lngdp = L(2/4).d.lngdp), rob
//
ivreg2 d.lncons (d.lngdp = L(2/6).d.lngdp), rob
//
ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons),rob
// 
ivreg2 d.lngdp (d.lncons = L(2/6).d.lncons),rob

Here is one of the regression results that looks odd to me:

Code:

. ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons),rob

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =       44
                                                      F(  1,    42) =     0.11
                                                      Prob > F      =   0.7387
Total (centered) SS     =  .0160808974                Centered R2   =  -0.4563
Total (uncentered) SS   =  .0367406497                Uncentered R2 =   0.3626
Residual SS             =  .0234181371                Root MSE      =   .02307

------------------------------------------------------------------------------
             |               Robust
     D.lngdp |      Coef.     Std. Err.        z      P>|z|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lncons |
         D1. |  -.2960821   .8615568    -0.34   0.731    -1.984702    1.392538
             |
       _cons |   .0283756   .0193331     1.47   0.142    -.0095165    .0662678
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              1.971
                                                   Chi-sq(3) P-val =    0.5785
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                0.831
                         (Kleibergen-Paap rk Wald F statistic):          0.636
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.866
                                                   Chi-sq(2) P-val =    0.6486
------------------------------------------------------------------------------
Instrumented:         D.lncons
Excluded instruments: L2D.lncons L3D.lncons L4D.lncons
------------------------------------------------------------------------------
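
As a sanity check on the output above, the two R2 figures seem to follow directly from the sums of squares reported in the header. This is only a sketch, and it rests on my assumption that ivreg2 computes each R2 as one minus the residual SS over the corresponding total SS. The symbols RSS and TSS are my own shorthand for "Residual SS" and "Total SS":

```latex
\begin{align*}
R^2_{\text{centred}}
  &= 1 - \frac{\text{RSS}}{\text{TSS}_{\text{centred}}}
   = 1 - \frac{0.0234181371}{0.0160808974}
   \approx -0.4563 \\
R^2_{\text{uncentred}}
  &= 1 - \frac{\text{RSS}}{\text{TSS}_{\text{uncentred}}}
   = 1 - \frac{0.0234181371}{0.0367406497}
   \approx 0.3626
\end{align*}
```

Both values match what ivreg2 printed, so the negative centred R2 here just reflects a residual SS that is larger than the centred total SS.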

Any help or guidance would be much appreciated. Thank you very much.
