Testing equality of coefficients from two identical instrumented regressions (ivreg2) estimated on different samples

Ellen Espin

Join Date: Feb 2019

Posts: 4
#1


08 Feb 2019, 15:07

Dear Statalist:

I am trying to make a comparison similar to that in Columns 1 and 2 of Table 6 in the paper http://www.ericzwick.com/stimulus/stimulus.pdf, which I have reproduced here:

I am running the separate models using the code:

Code:

xi: ivreg2 outcome (treatment = instrument) control1 control2 i.year i.province if small == 1 [pw=triangle], cluster(clusterid)
estimates store ModelA
xi: ivreg2 outcome (treatment = instrument) control1 control2 i.year i.province if small == 0 [pw=triangle], cluster(clusterid)
estimates store ModelB

outcome is continuous, treatment is binary, instrument is binary, and small = 1 for the bottom 3 deciles of sales (within year-province), 0 for the top 3 deciles, and missing otherwise. triangle contains triangular kernel weights.

I would like to compare the coefficients on treatment to see if they are significantly different.

1. The first part of my question is a Stata question: How do I properly code this comparison?

What I have tried:

I have found https://stats.idre.ucla.edu/stata/co...s-using-suest/, where the answer was to use suest. This does not appear to work with ivreg2; it throws errors for my triangular kernel weights and because ivreg2 doesn't cluster using vce(cluster).
(specifically my errors are "ModelA was estimated with pweights, you should re-estimate using iweights" and "ModelA was estimated with a nonstandard vce (cluster)")

Alternatively, I found https://www.stata.com/statalist/arch.../msg00487.html, which looked perfect.

Code:

gmm ///
    (eq1: outcome - {b1}*treatment - {b2}*control1 - {b3}*control2 - {b0} if small == 1) ///
    (eq2: outcome - {b1}*treatment - {b2}*control1 - {b3}*control2 - {b0} if small == 0), ///
    instruments(eq1: instrument control1 control2) ///
    instruments(eq2: instrument control1 control2) ///
    onestep winitial(unadjusted, indep)

But it seems that I am unable to use if statements within the equation definitions using gmm. My error is "could not evaluate equation 1 r(498)."

Am I correct that I cannot use these methods in my setting, or am I simply not setting them up correctly?

2. The second part of my question is more of a statistical question, but is related to my first because it concerns another way I have tried to do sub-sample analysis:

A final way I have found to do this is to estimate a pooled regression that includes interactions with my small dummy:

Code:

g treatxsmall = treatment*small
g instrumentxsmall = instrument*small
xi: ivreg2 outcome (treatment treatxsmall = instrument instrumentxsmall) small control1 control2 i.year i.province [pw = triangle] if small != ., cluster(clusterid)

Am I correct in believing that then the coefficient on treatxsmall will tell me if the difference between small firms and large firms is significant, i.e., could I report the p-value in the table as the p-value in the above table?

From looking at the replication code included with the published version of the paper, that appears to be what the authors do: they run the sub-sample regressions and report the coefficients and standard errors, then run the pooled regression and report the p-value on the interaction term between the subsample indicator and treatment. However, their code is quite advanced for my level (they appear to use nested Stata subroutines that they call repeatedly via "program define ..."), and I find it hard to work out what they are doing from the code alone. The underlying data are confidential, so I cannot easily reverse-engineer it.

I am using Stata/MP, version 14.2

Thank you for your time, Ellen

Andrew Musau

Join Date: Oct 2014
Posts: 10058

09 Feb 2019, 15:02

First, ivreg2 is from SSC (the Statalist FAQ asks you to explain where user-written commands come from). Second, a data example increases your chances of obtaining timely and helpful replies. As long as you are estimating the same regression on different samples, joint estimation is always possible, and it lets you bypass suest's nonstandard-VCE restriction. I will illustrate how to do this using ivregress, which handles factor variables, but the procedure works for ivreg2 as well.

Code:

webuse hsng2
*create 2 samples
gen group= cond(_n<26, 1, 2)

*separate regressions
ivregress 2sls rent pcturban (hsngval = faminc i.region) if group==1, cluster(division)
ivregress 2sls rent pcturban (hsngval = faminc i.region) if group==2, cluster(division)

*Joint regression (interact variables with group variable). Note with 2 constant terms, we create our own

gen cons=1
ivregress 2sls rent c.pcturban#i.group c.cons#i.group (c.hsngval#i.group = (c.faminc i.region)#i.group), nocons cluster(division)

*Now compare coefficients across groups, e.g.,
test 1.group#c.hsngval=  2.group#c.hsngval

Results:

Code:

. ivregress 2sls rent pcturban (hsngval = faminc i.region) if group==1, cluster(division)

Instrumental variables (2SLS) regression          Number of obs   =         25
                                                  Wald chi2(2)    =     190.16
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5444
                                                  Root MSE        =     27.437

                               (Std. Err. adjusted for 8 clusters in division)
------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0023382    .000492     4.75   0.000      .001374    .0033024
    pcturban |  -.3853024   1.092628    -0.35   0.724    -2.526813    1.756208
       _cons |   147.9727   50.88684     2.91   0.004     48.23637    247.7091
------------------------------------------------------------------------------
Instrumented:  hsngval
Instruments:   pcturban faminc 2.region 3.region 4.region

. 
. ivregress 2sls rent pcturban (hsngval = faminc i.region) if group==2, cluster(division)

Instrumental variables (2SLS) regression          Number of obs   =         25
                                                  Wald chi2(2)    =     179.25
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.7033
                                                  Root MSE        =     15.006

                               (Std. Err. adjusted for 9 clusters in division)
------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0018066   .0003457     5.23   0.000      .001129    .0024842
    pcturban |    .487531    .350634     1.39   0.164    -.1996991    1.174761
       _cons |   114.5466   13.02833     8.79   0.000     89.01149    140.0816
------------------------------------------------------------------------------
Instrumented:  hsngval
Instruments:   pcturban faminc 2.region 3.region 4.region


. 
. ivregress 2sls rent c.pcturban#i.group c.cons#i.group (c.hsngval#i.group = (c.faminc i.region)#i.group), nocons cluster(divi
> sion)
note: 4.region#2.group dropped due to collinearity

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(6)    =          .
                                                  Prob > chi2     =          .
                                                  R-squared       =          .
                                                  Root MSE        =     22.113

                                   (Std. Err. adjusted for 9 clusters in division)
----------------------------------------------------------------------------------
                 |               Robust
            rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
 group#c.hsngval |
              1  |   .0023382    .000492     4.75   0.000      .001374    .0033024
              2  |   .0018066   .0003457     5.23   0.000      .001129    .0024842
                 |
group#c.pcturban |
              1  |  -.3853024   1.092628    -0.35   0.724    -2.526813    1.756208
              2  |    .487531    .350634     1.39   0.164    -.1996991    1.174761
                 |
    group#c.cons |
              1  |   147.9727   50.88684     2.91   0.004     48.23637    247.7091
              2  |   114.5466   13.02833     8.79   0.000     89.01149    140.0816
----------------------------------------------------------------------------------
Instrumented:  1b.group#c.hsngval 2.group#c.hsngval
Instruments:   1b.group#c.pcturban 2.group#c.pcturban 1b.group#c.cons
               2.group#c.cons 1b.group#c.faminc 2.group#c.faminc
               1b.region#2.group 2.region#1b.group 2.region#2.group
               3.region#1b.group 3.region#2.group 4.region#1b.group

. 
. test 1.group#c.hsngval=  2.group#c.hsngval

 ( 1)  1b.group#c.hsngval - 2.group#c.hsngval = 0

           chi2(  1) =    0.75
         Prob > chi2 =    0.3864
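
For readers who want to stay with ivreg2, the same joint regression can be sketched with hand-built interaction variables. This is only a sketch under assumptions: it assumes ivreg2 and its dependency ranktest are installed from SSC, and the variable names g1, g2, and the *_g1/*_g2 and reg*_g* interactions are hypothetical names created here for illustration. It has not been checked against the ivregress output above, and with so few clusters ivreg2 may print rank warnings.

```stata
* Assumed to be installed beforehand:
*   ssc install ivreg2
*   ssc install ranktest
webuse hsng2, clear
gen group = cond(_n < 26, 1, 2)

* Group indicators (these play the role of the two constants)
gen g1 = group == 1
gen g2 = group == 2

* Interact each continuous variable with the group indicators
foreach v of varlist hsngval pcturban faminc {
    gen `v'_g1 = `v' * g1
    gen `v'_g2 = `v' * g2
}

* Region dummies (region 1 as base) interacted with the group indicators
forvalues r = 2/4 {
    gen reg`r'_g1 = (region == `r') * g1
    gen reg`r'_g2 = (region == `r') * g2
}

* Joint 2SLS regression: every coefficient is group specific
ivreg2 rent pcturban_g1 pcturban_g2 g1 g2                      ///
    (hsngval_g1 hsngval_g2 = faminc_g1 faminc_g2               ///
        reg2_g1 reg2_g2 reg3_g1 reg3_g2 reg4_g1 reg4_g2),      ///
    noconstant cluster(division)

* Wald test of equal coefficients on the endogenous regressor
test hsngval_g1 = hsngval_g2
```

In the original question, the probability weights would presumably be added as [pw=triangle], as in the sub-sample commands, and the year and province dummies would need to be interacted with the group indicators in the same way.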


Ellen Espin

Join Date: Feb 2019
Posts: 4

09 Feb 2019, 15:46

Thank you Andrew,

Please allow me to test my understanding of the joint regression:

We don't include the main effects, so this is not the equivalent of the pooled sub-sample analysis via interaction terms. Thus, we can't interpret either interaction term of hsngval with group as the magnitude and significance of the difference across groups.

Rather, we are running the equivalent of the seemingly unrelated regressions of suest: jointly estimating the two models separately. And thus we need to run the separate Chow test on the two coefficients.

Is that correct?

And this test is equivalent to running the pooled regression with an interaction term instead, taking group 1 as the reference/excluded group? In the example you gave, if I run:

Code:

webuse hsng2
gen group = cond(_n<26, 1, 2)
gen cons = 1

ivregress 2sls rent pcturban group c.pcturban#i.group c.cons#i.group (hsngval c.hsngval#i.group = faminc i.region (c.faminc i.region)#i.group), nocons cluster(division)
note: 2.group#c.cons omitted because of collinearity
note: 4.region#2.group dropped due to collinearity

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(6)    =          .
                                                  Prob > chi2     =          .
                                                  R-squared       =          .
                                                  Root MSE        =     22.113

                                   (Std. Err. adjusted for 9 clusters in division)
----------------------------------------------------------------------------------
                 |               Robust
            rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
         hsngval |   .0023382    .000492     4.75   0.000      .001374    .0033024
                 |
 group#c.hsngval |
              2  |  -.0005316   .0006137    -0.87   0.386    -.0017344    .0006713
                 |
        pcturban |  -.3853024   1.092628    -0.35   0.724    -2.526813    1.756208
           group |   57.27328   6.514167     8.79   0.000     44.50575    70.04081
                 |
group#c.pcturban |
              2  |   .8728334     1.1577     0.75   0.451    -1.396216    3.141883
                 |
    group#c.cons |
              1  |   90.69947   51.08289     1.78   0.076    -9.421159    190.8201
              2  |          0  (omitted)
----------------------------------------------------------------------------------
Instrumented:  hsngval 2.group#c.hsngval
Instruments:   pcturban group 2.group#c.pcturban 1b.group#c.cons faminc 2.region
               3.region 4.region 2.group#c.faminc 1b.region#2.group
               2.region#2.group 3.region#2.group 4.region#2.group

The interaction is indeed insignificant, telling us the same story as the Chow test, but is this general?

Thanks!
Ellen


Andrew Musau

Join Date: Oct 2014
Posts: 10058

10 Feb 2019, 04:32

"Rather, we are running the equivalent of the seemingly unrelated regressions of suest: jointly estimating the two models separately. And thus we need to run the separate Chow test on the two coefficients. Is that correct?"

Yes, that is correct.

"And this test is equivalent to running the pooled regression with an interaction term instead, taking group 1 as the reference/excluded group? In the example you gave, if I run:"

Your idea is spot on but there is a minor glitch in your implementation. We do not need a separate constant in the pooled regression with interactions. The P-values of the interaction term and Wald test must be equivalent. Here is the corrected version yielding a P-value of 0.386.

Code:

. webuse hsng2
(1980 Census housing data)

. gen group = cond(_n<26, 1, 2)

. gen cons = 1

. ivregress 2sls rent pcturban group c.pcturban#i.group (hsngval c.hsngval#i.group = faminc i.region (c.fa
> minc i.region)#i.group), cluster(division)
note: 4.region#2.group dropped due to collinearity

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(5)    =    1050.04
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.6008
                                                  Root MSE        =     22.113

                                   (Std. Err. adjusted for 9 clusters in division)
----------------------------------------------------------------------------------
                 |               Robust
            rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
         hsngval |   .0023382    .000492     4.75   0.000      .001374    .0033024
                 |
 group#c.hsngval |
              2  |  -.0005316   .0006137    -0.87   0.386    -.0017344    .0006713
                 |
        pcturban |  -.3853024   1.092628    -0.35   0.724    -2.526813    1.756208
           group |  -33.42619   52.09915    -0.64   0.521    -135.5387    68.68627
                 |
group#c.pcturban |
              2  |   .8728334     1.1577     0.75   0.451    -1.396216    3.141883
                 |
           _cons |   181.3989   102.1658     1.78   0.076    -18.84232    381.6402
----------------------------------------------------------------------------------
Instrumented:  hsngval 2.group#c.hsngval
Instruments:   pcturban group 2.group#c.pcturban faminc 2.region 3.region
               4.region 2.group#c.faminc 1b.region#2.group 2.region#2.group
               3.region#2.group

.
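
The agreement between the two p-values can be checked by hand. As a sketch (the symbols \beta_1, \beta_2, and \delta are notation introduced here, not part of the output): let \beta_g be the coefficient on hsngval in group g. The pooled regression with interactions is a linear reparameterization of the joint regression with \beta_2 = \beta_1 + \delta, and both use the same cluster-robust VCE, so the Wald statistic is unchanged:

```latex
H_0:\ \beta_1 = \beta_2 \iff \delta = 0,
\qquad
\hat\delta = 0.0018066 - 0.0023382 = -0.0005316,
\qquad
z_\delta^2 = \left(\frac{-0.0005316}{0.0006137}\right)^2 \approx 0.75 .
```

The value 0.75 is the chi2(1) statistic reported by test after the joint regression, which is why the interaction term's p-value (0.386) matches the Wald test's (0.3864).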


Ellen Espin

Join Date: Feb 2019

Posts: 4
#5

10 Feb 2019, 09:49

I understand now. Thank you for a very helpful and clear explanation, Andrew.
Ellen

Yuliya Kazakova

Join Date: Jun 2020

Posts: 4
#6

11 Jun 2020, 02:15

Dear Andrew Musau,

I am trying to solve the same issue, but instead of a continuous endogenous variable, I have a dummy variable that I want to instrument, i.e., in the example you discussed here my "hsngval" variable is binary (0 and 1). When I run the code you suggested, the results show that (1.hsngval#2.group) and (1.hsngval#1.group) are omitted because of collinearity, and I have a coefficient only for (0.hsngval#1.group). Can I interpret this coefficient as the difference between small and large firms, and its significance as the significance of that difference?

Thank you very much in advance!

Yuliya

Andrew Musau

Join Date: Oct 2014
Posts: 10058

11 Jun 2020, 04:49

For a 0/1 dummy, just treat the variable as continuous, as you cannot have both categories in the regression at the same time. Including both just results in one category being collinear with the intercepts in the joint regression. Below, notice that it does not matter whether I treat the variable as categorical with base 0 or as continuous.

Code:

webuse hsng2
*create 2 samples
gen group= cond(_n<26, 1, 2)

qui sum hsngval, d
gen hihsngval= hsngval>`r(p50)'

*separate regressions (categorical with base 0)
ivregress 2sls rent pcturban (ib0.hihsngval = faminc i.region) if group==1, cluster(division)
ivregress 2sls rent pcturban (ib0.hihsngval = faminc i.region) if group==2, cluster(division)

*separate regressions (continuous)
ivregress 2sls rent pcturban (c.hihsngval = faminc i.region) if group==1, cluster(division)
ivregress 2sls rent pcturban (c.hihsngval = faminc i.region) if group==2, cluster(division)

Results:

Code:

. *separate regressions (categorical with base 0)

. ivregress 2sls rent pcturban (ib0.hihsngval = faminc i.region) if group==1, cluster(division)

Instrumental variables (2SLS) regression          Number of obs   =         25
                                                  Wald chi2(2)    =      28.14
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1893
                                                  Root MSE        =     36.597

                               (Std. Err. adjusted for 8 clusters in division)
------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 1.hihsngval |    102.738   56.46556     1.82   0.069    -7.932433    213.4085
    pcturban |  -.7900045   1.561238    -0.51   0.613    -3.849974    2.269965
       _cons |   244.3666   88.91957     2.75   0.006      70.0874    418.6457
------------------------------------------------------------------------------
Instrumented:  1.hihsngval
Instruments:   pcturban faminc 2.region 3.region 4.region

. ivregress 2sls rent pcturban (ib0.hihsngval = faminc i.region) if group==2, cluster(division)

Instrumental variables (2SLS) regression          Number of obs   =         25
                                                  Wald chi2(2)    =     204.79
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.4365
                                                  Root MSE        =     20.679

                               (Std. Err. adjusted for 9 clusters in division)
------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 1.hihsngval |   39.42183   11.22963     3.51   0.000     17.41216     61.4315
    pcturban |   .4301993   .5332451     0.81   0.420    -.6149419     1.47534
       _cons |   181.9472   29.65598     6.14   0.000     123.8225    240.0718
------------------------------------------------------------------------------
Instrumented:  1.hihsngval
Instruments:   pcturban faminc 2.region 3.region 4.region

.
. *separate regressions (continuous)

. ivregress 2sls rent pcturban (c.hihsngval = faminc i.region) if group==1, cluster(division)

Instrumental variables (2SLS) regression          Number of obs   =         25
                                                  Wald chi2(2)    =      28.14
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1893
                                                  Root MSE        =     36.597

                               (Std. Err. adjusted for 8 clusters in division)
------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   hihsngval |    102.738   56.46556     1.82   0.069    -7.932433    213.4085
    pcturban |  -.7900045   1.561238    -0.51   0.613    -3.849974    2.269965
       _cons |   244.3666   88.91957     2.75   0.006      70.0874    418.6457
------------------------------------------------------------------------------
Instrumented:  hihsngval
Instruments:   pcturban faminc 2.region 3.region 4.region

. ivregress 2sls rent pcturban (c.hihsngval = faminc i.region) if group==2, cluster(division)

Instrumental variables (2SLS) regression          Number of obs   =         25
                                                  Wald chi2(2)    =     204.79
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.4365
                                                  Root MSE        =     20.679

                               (Std. Err. adjusted for 9 clusters in division)
------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   hihsngval |   39.42183   11.22963     3.51   0.000     17.41216     61.4315
    pcturban |   .4301993   .5332451     0.81   0.420    -.6149419     1.47534
       _cons |   181.9472   29.65598     6.14   0.000     123.8225    240.0718
------------------------------------------------------------------------------
Instrumented:  hihsngval
Instruments:   pcturban faminc 2.region 3.region 4.region
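
To carry this through to the comparison across groups, the joint regression from earlier in the thread can be rerun with the dummy entered as continuous. This is a sketch that simply substitutes hihsngval for hsngval in that specification; it has not been run here, so no output is shown.

```stata
webuse hsng2, clear
* create 2 samples
gen group = cond(_n < 26, 1, 2)

* binary endogenous variable: above-median housing value
quietly summarize hsngval, detail
gen hihsngval = hsngval > r(p50)

* own constant, so that each group gets its own intercept
gen cons = 1

* joint regression with the dummy treated as continuous (c.)
ivregress 2sls rent c.pcturban#i.group c.cons#i.group ///
    (c.hihsngval#i.group = (c.faminc i.region)#i.group), ///
    nocons cluster(division)

* compare the coefficient on the dummy across groups
test 1.group#c.hihsngval = 2.group#c.hihsngval
```

The two group#c.hihsngval coefficients should correspond to the separate-sample estimates above, and the test reports whether they differ.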

Last edited by Andrew Musau; 11 Jun 2020, 04:57.


Yuliya Kazakova

Join Date: Jun 2020

Posts: 4
#8

14 Jun 2020, 22:35

Dear Andrew Musau, thank you so much for your help!

Guest
#9

15 Jun 2020, 11:47

Andrew Musau
Hello! I have read many of your posts, but my issue has not been solved, so I am posting it here.
I am using the user-written command cmp, and I wish to conduct a Hausman test to see whether random or fixed effects is better for my study.
However, hausman gave the error that it cannot be used with pweighted data, which I guess cmp uses.
I then decided to use suest, but whenever I run it I get the error "was estimated with a nonstandard vce (robust)".
I wonder why I get this error when I did not specify anything of the sort. What is the solution?
Thanks in advance

Andrew Musau

Join Date: Oct 2014

Posts: 10058
#10

15 Jun 2020, 14:24

Are these linear fixed and random effects models? Usually, with a robust VCE, you want to use a test of overidentifying restrictions (implemented by xtoverid from SSC) to choose between random and fixed effects. However, because this will not work with cmp (SSC), you can implement it by means of an artificial regression, as illustrated in the link below. Just adapt the procedure to cmp.

https://www.statalist.org/forums/for...scoll-kraay-se

Otherwise, the author of cmp, David Roodman, may have other suggestions.
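
As a rough idea of what an artificial-regression version of the test can look like, here is a minimal Mundlak-type sketch for a linear panel model. It is an assumption about the general approach, not the code from the linked thread, and it uses xtreg on the nlswork example data rather than cmp; the names touse and m_* are hypothetical.

```stata
webuse nlswork, clear
xtset idcode year

* Restrict to complete cases so the panel means match the estimation sample
gen byte touse = !missing(ln_wage, age, tenure, hours)

* Panel means of the time-varying regressors
foreach v of varlist age tenure hours {
    egen double m_`v' = mean(`v') if touse, by(idcode)
}

* Random-effects model augmented with the panel means, cluster-robust VCE
xtreg ln_wage age tenure hours m_age m_tenure m_hours if touse, ///
    re vce(cluster idcode)

* Joint test that the panel means are irrelevant;
* rejection points towards fixed effects
test m_age m_tenure m_hours
```

The joint test on the panel means stays valid with a cluster-robust VCE, which is the situation where hausman and suest refuse to run.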
Guest
#11

16 Jun 2020, 10:37

Andrew Musau, thanks! Can you provide me with a reference for the procedure?

Andrew Musau

Join Date: Oct 2014

Posts: 10058
#12

16 Jun 2020, 10:54

You will find references at

Code:

*ssc install xtoverid
help xtoverid
Christine Kaufmann

Join Date: Jul 2020

Posts: 1
#13

29 Jul 2020, 05:15

@Andrew Musau: can I also use this, when I have the same control, but two different independent variables?
Equation 1: ivreg2 outcome1 control1 control2 i.time i.region (control3 = instrument), partial (i.time i.region) cluster(region)
Equation 2: ivreg2 outcome1 control1 control2 i.time i.region (control3 = instrument), partial (i.time i.region) cluster(region)
Andrew Musau

Join Date: Oct 2014

Posts: 10058
#14

29 Jul 2020, 05:37

Your Eq. 1 and Eq. 2 are identical, as far as I can see. From an estimation perspective, there is no difference in the treatment of control variables and independent variables. These are only terms applicable to your research question, so anything that applies to independent variables automatically applies to control variables. That is, unless you are confusing the terms.
Dennis Wajda

Join Date: Apr 2019

Posts: 7
#15

21 Jun 2021, 11:24

Hi Andrew, I'm trying to use this procedure for panel data & year fixed effects. Would I have to make an adjustment for the year fixed effects (i.e. i.group#i.year) or leave the year fixed effects as is? (i.e. i.year)
