IVREGHDFE with absorb as well as cluster options shows INSUFFICIENT observations and ivreghdfe with absorb showed negative Centered R2

Wei LIIU

Join Date: Mar 2022

Posts: 19
#1


23 Mar 2022, 05:41

Hi everyone,

My estimation model looks like the following:

Y_ijkt = β0 + β1 ∗ treat_j + β2 ∗ post_t + β3 ∗ treat_j ∗ post_t + β4 ∗ X_it + β5 ∗ y_ijk,t-1 + ξ_i + δ_t + ψ_k + ϵ_ijkt
where treat equals 1 for the treatment group, post equals 1 if the year is 2012 or later (the policy takes effect in 2012), and X_it are city-level time-varying variables. ξ_i is the city fixed effect, δ_t is the time fixed effect, and ψ_k is the product fixed effect.

I am running the following two regressions separately:

. ivreghdfe sales_spec did_1 treat_1 $X1list (lag_sales=lag_price), absorb(i.code i.year i.product) vce(cl province)
. ivreghdfe sales_spec did_1 treat_1 $X1list (lag_sales=lag_price), absorb(i.code i.year i.product,savefe)

However, the results are very strange:

My question is: why does the first regression report an insufficient number of observations, given that there are no singletons? As for the second regression, I do not understand why the centered R-squared is negative.
Tags: insufficient observations, ivreghdfe, negative R square
Lorenzo Bombino

Join Date: Dec 2021

Posts: 14
#2

19 Apr 2023, 02:47

Hey, I am facing a very similar situation. Did you figure out what is going on? Did you find a solution?

Thanks!
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#3

19 Apr 2023, 03:41

I do not think you should use the i. notation in the variables being absorbed.
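
As a minimal sketch of that suggestion, the command from #1 could be rewritten with plain variable names inside absorb(). The variable names are taken from #1; the thread does not confirm whether this change alone resolves the error, and it assumes that code, year, product and province are numeric variables.

```stata
* Same specification as in #1, but without the i. prefix inside absorb().
* Assumes code, year, product and province are numeric variables.
ivreghdfe sales_spec did_1 treat_1 $X1list (lag_sales = lag_price), ///
    absorb(code year product) vce(cluster province)
```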
Lorenzo Bombino

Join Date: Dec 2021

Posts: 14
#4

19 Apr 2023, 07:50

Hi!

Unfortunately, I did not include i. in my absorb option, so my issue must be due to something else.

I have panel data on which I am trying to run 2SLS with fixed effects, but an issue arises.

If I run ivreghdfe with vce(cluster var) among the options, I get the error message "insufficient observations r(2001);"
If I run ivreghdfe without vce(cluster var), I obtain a sound 2SLS estimation with a high F statistic, suggesting that the instrument is not weak. However, I get negative centered and uncentered R2.
If I run xtivreg2 with the options fe cluster(var), I do not get any error message, but I get the same negative R2 (and the same coefficients as in the previous estimation).

Since I have neither singletons nor clusters with only one observation, where could this issue come from?
Why do I get the error message "insufficient observations" or a negative R2?

Thank you very much in advance for any hint you might have!

************************************************************

. ivreghdfe right trend population pop2 crime (imm = instrument) ///
> unemployment income pov, absorb(constituency_kanton) vce(cluster constituency_kanton)
insufficient observations
r(2001);

************************************************************

. ivreghdfe right trend population pop2 crime (imm = instrument) ///
> unemployment income pov, absorb(constituency_kanton)
(MWFE estimator converged in 1 iterations)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

Number of obs = 607
F( 8, 495) = 7.63
Prob > F = 0.0000
Total (centered) SS = 2.118301826 Centered R2 = -3.3205
Total (uncentered) SS = 2.118301826 Uncentered R2 = -3.3205
Residual SS = 9.152046786 Root MSE = .136

------------------------------------------------------------------------------
right | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
imm | .0752692 .0107059 7.03 0.000 .0542346 .0963037
trend | -.0034977 .0072386 -0.48 0.629 -.01772 .0107246
population | -.0088932 .0016944 -5.25 0.000 -.0122224 -.005564
pop2 | 1.20001 .3759172 3.19 0.002 .4614201 1.9386
crime | .3478775 .480614 0.72 0.470 -.5964174 1.292172
unemployment | .4054641 .426799 0.95 0.343 -.433097 1.244025
income | -.0236863 .0132427 -1.79 0.074 -.0497051 .0023326
pov | .1717674 .2878754 0.60 0.551 -.393841 .7373759
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic): 57.718
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 52.014
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------------------------------------------------------------------------------
Instrumented: imm
Included instruments: trend population pop2 crime unemployment income pov
Excluded instruments: instrument
Partialled-out: _cons
nb: total SS, model F and R2s are after partialling-out;
any small-sample adjustments include partialled-out
variables in regressor count K
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-------------------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
---------------------+---------------------------------------|
constituency_kanton | 104 0 104 |
-------------------------------------------------------------+

************************************************************

. tsset constituency_id yr
panel variable: constituency_id (unbalanced)
time variable: yr, 1999 to 2019, but with gaps
delta: 1 unit

. xtivreg2 right trend population pop2 crime (imm = instrument) ///
> unemployment income pov, fe cluster(constituency_kanton)

FIXED EFFECTS ESTIMATION
------------------------
Number of groups = 104 Obs per group: min = 3
avg = 5.8
max = 6

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on constituency_kanton

Number of clusters (constituency_kanton) = 104 Number of obs = 607
F( 8, 103) = 16.43
Prob > F = 0.0000
Total (centered) SS = 2.118301826 Centered R2 = -3.3205
Total (uncentered) SS = 2.118301826 Uncentered R2 = -3.3205
Residual SS = 9.152046786 Root MSE = .1349

------------------------------------------------------------------------------
| Robust
right | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
imm | .0752692 .0090418 8.32 0.000 .0575476 .0929908
trend | -.0034977 .0077963 -0.45 0.654 -.0187783 .0117828
population | -.0088932 .0028699 -3.10 0.002 -.014518 -.0032683
pop2 | 1.20001 .4283309 2.80 0.005 .360497 2.039523
crime | .3478775 .722296 0.48 0.630 -1.067797 1.763552
unemployment | .4054641 .7041382 0.58 0.565 -.9746215 1.78555
income | -.0236863 .013788 -1.72 0.086 -.0507103 .0033378
pov | .1717674 .3564138 0.48 0.630 -.5267908 .8703257
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 36.758
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 52.014
(Kleibergen-Paap rk Wald F statistic): 67.219
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------------------------------------------------------------------------------
Instrumented: imm
Included instruments: trend population pop2 crime unemployment income pov
Excluded instruments: instrument
------------------------------------------------------------

Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#5

19 Apr 2023, 08:11

Maybe you can check these issues with the authors of the commands, but I would not worry about the R2, especially in an IV regression, where the standard interpretation does not apply.
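
To see where the negative value comes from: the reported R2 is 1 - RSS/TSS, and in 2SLS the residuals are computed with the actual values of the endogenous regressor rather than the first-stage fitted values, so the residual sum of squares can exceed the total sum of squares. A small sketch using the two sums of squares printed in the output in #4 (the scalar names are only illustrative):

```stata
* Sums of squares copied from the ivreghdfe output in #4
scalar tss = 2.118301826   // Total (centered) SS
scalar rss = 9.152046786   // Residual SS

* R2 = 1 - RSS/TSS; negative whenever RSS > TSS
display %7.4f (1 - scalar(rss)/scalar(tss))
```

This reproduces the -3.3205 shown in the output, so the figure is simply arithmetic on RSS and TSS rather than a sign of a failed estimation.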
Andrew Musau

Join Date: Oct 2014

Posts: 10085
#6

19 Apr 2023, 08:13

ivreghdfe is from SSC (FAQ Advice #12). Earlier versions of the command allowed string variables within -absorb()- and -vce()-, but the latest version from GitHub does not. For the $R^2$ statistics, refer to the ivreg2 documentation on how these are calculated, but typically these are not meaningful in IV regressions. See https://www.stata.com/support/faqs/s...least-squares/.

Note: Crossed with #5.
Lorenzo Bombino

Join Date: Dec 2021

Posts: 14
#7

19 Apr 2023, 08:29

Dear Joao, dear Andrew,

Thanks a lot for your help!

Indeed, encoding the variable within absorb and vce makes ivreghdfe work!

Thanks a lot!
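
For anyone landing here with the same error, a minimal sketch of the fix, assuming constituency_kanton is stored as a string variable (kanton_id is just an illustrative name for the new numeric variable):

```stata
* Check the storage type of the grouping variable (str# means string)
describe constituency_kanton

* Create a labeled numeric copy of the string variable;
* kanton_id is a hypothetical name chosen for this example
encode constituency_kanton, generate(kanton_id)

* Same specification as in #4, using the numeric variable
* in both absorb() and vce(cluster)
ivreghdfe right trend population pop2 crime (imm = instrument) ///
    unemployment income pov, absorb(kanton_id) vce(cluster kanton_id)
```

egen kanton_id = group(constituency_kanton) would be an alternative way to build the numeric identifier.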