
  • ivreg2 giving negative centred R2 values and very large confidence intervals

    Hello Statalist

    I am very inexperienced with Stata and am currently trying to test a hypothesis using countries' GDP and consumption data. A key part of testing my hypothesis involves using the ivreg2 command to estimate a key variable. It's my first time using this command, and my first two regressions came back looking as I predicted. However, my last two ivreg2 regressions come back with a negative centred R2 and very large confidence intervals, which I did not expect, and I believe I have gone wrong somewhere. I am aware that a negative centred R2 is possible and does not by itself mean that I have gone wrong. I was wondering if someone could look at my do-file and see whether I have made a mistake somewhere.
    Using Stata/IC 16.0

    Code:
    // Change to the project folder
    cd H:\ConsumptionGdp
    
    // Import the dataset, clear anything in memory
    import delimited "H:\ConsGdp\consumptionData.csv", clear
    
    // Install the addons needed for Instrumental variable regression
    ssc install ivreg2 
    ssc install ranktest 
    
    // Sort by country and year
    sort country year
    
    // Removes unwanted countries
    keep if country== "United Kingdom"
    
    //Set year as time series data
    tsset year
    
    // creates variable for log of gdp
    gen lngdp = log(gdp)
    // creates variable for log of consumption
    gen lncons = log(consumption)
    
    // Run the Dickey-Fuller test for stationarity. The null hypothesis is a unit root; it is not rejected at the 10% critical value for either series, so both are treated as nonstationary.
    dfuller lngdp
    dfuller lncons
    
    // The data is nonstationary as you can see a trend. To deal with this, take first differences of random walk trend. For GDP:
    gen dif1_lngdp = D.lngdp
    gen dif2_lngdp = D2.lngdp 
    //Generate lag1 and lag2 of gdp
    gen lag1_lngdp = L.lngdp
    gen lag2_lngdp = L2.lngdp
    
    // Check for autocorrelation in the data, using 8 lags
    corrgram lngdp, lags(8)
    
    // Regress the first difference of lngdp on its lagged level
    reg D.lngdp L.lngdp, rob
    
    // Create a graph of it so you can see it's stationary
    tsline d.lngdp
    
    // The data is nonstationary as you can see a trend. To deal with this, take first differences of random walk trend. For consumption:
    gen dif1_lncons = D.lncons
    gen dif2_lncons = D2.lncons 
    //Generate lag1 and lag2 of consumption
    gen lag1_lncons = L.lncons
    gen lag2_lncons = L2.lncons
    
    // Check for autocorrelation in the consumption data, using 8 lags
    corrgram lncons, lags(8)
    
    // Regress the first difference of lncons on its lagged level
    reg D.lncons L.lncons, rob
    // Create a graph of it so you can see it's stationary
    tsline d.lncons
    
    ** REGRESSIONS
    // Regressors are lagged at least twice to get rid of first-order serial correlation
    // Simple OLS model in levels with robust standard errors. This has an endogeneity problem (x causes y, but y may also cause x). Need to change to log values.
    reg consumption gdp, rob
    // Regress consumption growth, then income growth, on income growth lagged t-2 to t-4
    reg d.lncons L(2/4).d.lngdp, rob
    reg d.lngdp L(2/4).d.lngdp, rob
    // Regress consumption growth, then income growth, on income growth lagged t-2 to t-6
    reg d.lncons L(2/6).d.lngdp, rob
    reg d.lngdp L(2/6).d.lngdp, rob
    // Regress consumption growth, then income growth, on consumption growth lagged t-2 to t-4
    reg d.lncons L(2/4).d.lncons, rob
    reg d.lngdp L(2/4).d.lncons, rob
    // Regress consumption growth, then income growth, on consumption growth lagged t-2 to t-6
    reg d.lncons L(2/6).d.lncons, rob
    reg d.lngdp L(2/6).d.lncons, rob
    
    
    // Use ivreg2 to estimate the lambda values, as the error term may be correlated with the change in income, so OLS cannot be used
    // Levels: consumption on gdp, instrumented with gdp lagged t-2 to t-4
    ivreg2 consumption (gdp = L(2/4).gdp), rob
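    // Consumption growth on income growth, instrumented with income growth lagged t-2 to t-4
    ivreg2 d.lncons (d.lngdp = L(2/4).d.lngdp), rob
    // Same, with instruments lagged t-2 to t-6
    ivreg2 d.lncons (d.lngdp = L(2/6).d.lngdp), rob
    // Income growth on consumption growth, instrumented with consumption growth lagged t-2 to t-4
    ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons), rob
    // Same, with instruments lagged t-2 to t-6
    ivreg2 d.lngdp (d.lncons = L(2/6).d.lncons), rob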
    One of my unexpected regression results:
    Code:
    . ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons),rob
    
    IV (2SLS) estimation
    --------------------
    
    Estimates efficient for homoskedasticity only
    Statistics robust to heteroskedasticity
    
                                                          Number of obs =       44
                                                          F(  1,    42) =     0.11
                                                          Prob > F      =   0.7387
    Total (centered) SS     =  .0160808974                Centered R2   =  -0.4563
    Total (uncentered) SS   =  .0367406497                Uncentered R2 =   0.3626
    Residual SS             =  .0234181371                Root MSE      =   .02307
    
    ------------------------------------------------------------------------------
                 |               Robust
         D.lngdp |      Coef.     Std. Err.        z      P>|z|       [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          lncons |
             D1. |  -.2960821   .8615568    -0.34   0.731    -1.984702    1.392538
                 |
           _cons |   .0283756   .0193331     1.47   0.142    -.0095165    .0662678
    ------------------------------------------------------------------------------
    Underidentification test (Kleibergen-Paap rk LM statistic):              1.971
                                                       Chi-sq(3) P-val =    0.5785
    ------------------------------------------------------------------------------
    Weak identification test (Cragg-Donald Wald F statistic):                0.831
                             (Kleibergen-Paap rk Wald F statistic):          0.636
    Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                             10% maximal IV relative bias     9.08
                                             20% maximal IV relative bias     6.46
                                             30% maximal IV relative bias     5.39
                                             10% maximal IV size             22.30
                                             15% maximal IV size             12.83
                                             20% maximal IV size              9.54
                                             25% maximal IV size              7.80
    Source: Stock-Yogo (2005).  Reproduced by permission.
    NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
    ------------------------------------------------------------------------------
    Hansen J statistic (overidentification test of all instruments):         0.866
                                                       Chi-sq(2) P-val =    0.6486
    ------------------------------------------------------------------------------
    Instrumented:         D.lncons
    Excluded instruments: L2D.lncons L3D.lncons L4D.lncons
    ------------------------------------------------------------------------------
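    As a sanity check on the output above, both R2 figures can be reproduced by hand from the reported sums of squares. This is only a minimal sketch, assuming the usual definition R2 = 1 - RSS/TSS; the scalar names (rss, tss_c, tss_u, r2_c, r2_u) are my own and purely illustrative. It shows that the centred R2 is negative simply because the residual SS is larger than the centred total SS.

```stata
// Sums of squares copied from the ivreg2 output above
scalar rss   = .0234181371   // Residual SS
scalar tss_c = .0160808974   // Total (centered) SS
scalar tss_u = .0367406497   // Total (uncentered) SS

// Assumed definition: R2 = 1 - RSS/TSS
// The centered version goes negative whenever RSS exceeds the centered TSS
scalar r2_c = 1 - rss/tss_c
scalar r2_u = 1 - rss/tss_u

display "Centered R2   = " %7.4f r2_c
display "Uncentered R2 = " %7.4f r2_u
```

    If the displayed values match the Centered R2 and Uncentered R2 lines in the output, the negative figure is just arithmetic and not a sign that the command itself misbehaved.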
    Any help/guidance would be appreciated a lot, thank you very much.