Different Results When Running IV Regression

Anh Quynh Nguyen

Join Date: Dec 2021

Posts: 14
#1

10 Apr 2022, 15:31

Hi all,

This is my 3rd post so far on this topic and I am so lost. Thank you guys so much for having Statalist.

My Dataex is here:

Code:

clear
input float ln_earnings byte(age married race) float drink_intensity byte days_exer_week
 8.517193 44 0 1   4 0
 9.903487 45 1 1  .5 0
 10.08581 25 1 1  .5 2
11.523855 44 1 1   1 2
 9.305651 20 0 4  12 5
10.308952 20 0 1  30 5
end

I am investigating the impacts of alcohol consumption on earnings and run an IV regression:
Dependent Variable: ln_earnings

Control Variables: married, race, education

Explanatory Variables: drink_intensity (number of drinks per week) and its square drink_intensity_sq (because the relationship is quadratic, not linear)

Instruments: days_exer_week (Number of days exercise per week) and its square

The thing is, since I include the square term of the explanatory variable, I don't know how I should run the IV. So far, I have tried 2 ways and they give different results.

First way:

Code:

regress drink_intensity age age_2 age_3 age_4 i.educgrp married i.race days_exer_week if female==0
predict drink_hat, xb
generate drink_hat_sq = drink_hat*drink_hat
regress ln_earnings age age_2 age_3 age_4 i.educgrp married i.race drink_hat drink_hat_sq if female==0

Second Way:

Code:

ivregress 2sls ln_earnings age age_2 age_3 age_4 married i.race i.educgrp (c.drink_intensity##c.drink_intensity = bmi c.days_exer_week##c.days_exer_week) if female==0, first

Even though I know the first way is not ideal, it gives statistically significant results. The second way is not significant at all. They produce different coefficients as well. I don't have many options for instruments because the data is quite restricted. In that case, which one should I go for? Is it correct to run an IV regression with squared terms?
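The two approaches above differ in two respects: the instrument sets are not the same, and the first way squares the fitted value of drink_intensity instead of treating the squared term as a second endogenous regressor with its own first stage. A minimal sketch of how the comparison could be set up on equal terms follows. It assumes nhis_alcohol.dta is loaded with the variables used in the commands above; the names touse, drink_sq, exer_sq, d_hat and dsq_hat are hypothetical and chosen only for illustration.

```stata
* Sketch only: assumes the variables from the commands above exist in memory.
* touse, drink_sq, exer_sq, d_hat and dsq_hat are hypothetical names.
generate byte touse = female==0 & ///
    !missing(ln_earnings, drink_intensity, days_exer_week, age, married, race, educgrp)
generate drink_sq = drink_intensity^2
generate exer_sq  = days_exer_week^2

local controls age age_2 age_3 age_4 i.educgrp married i.race

* One first stage per endogenous regressor, same instruments in both
regress drink_intensity `controls' days_exer_week exer_sq if touse
predict d_hat if touse, xb
regress drink_sq `controls' days_exer_week exer_sq if touse
predict dsq_hat if touse, xb

* Manual second stage: point estimates only, the standard errors are not valid
regress ln_earnings `controls' d_hat dsq_hat if touse

* Same model in one step, with correct 2SLS standard errors
ivregress 2sls ln_earnings `controls' ///
    (drink_intensity drink_sq = days_exer_week exer_sq) if touse
```

On a common estimation sample the manual second stage should reproduce the -ivregress- point estimates, but its standard errors are computed from the wrong residuals, so significance should be read from the -ivregress- output only.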

Thanks a lot. Any answer is so much appreciated!! I drop my dataset here if anyone's interested in replication.

Attached Files

nhis_alcohol.dta (3.49 MB, 1 view)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

11 Apr 2022, 01:06

Anh:
some comments about your post:
1) the -linktest- outcome tells that the functional form of the regressand is correctly specified if you stop the age terms at:

Code:

c.age##c.age

2) I fail to get why you went -if female==0- instead of plugging -i.gender- in the right-hand side of your regression equation (and go -test- or -lincom- during postestimation);
3) as far as -ivregress- is concerned, you reported the first stage code only (and it does not help understanding what you're complaining about).
In addition:
a) did you detect a reverse causation-led endogeneity between -ln_earnings- and -drink-?;
b) does local taxation play any role here (please, see Kit Baum's valuable textbook
https://www.stata.com/bookstore/modern-econometrics-stata, pages 220-31).
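The check in point 1) can be sketched as follows, assuming the dataset and variable names from post #1 (drink_intensity_sq is taken to exist already, as in the poster's own commands). -estat ovtest- is added here only as a second, related specification check and was not part of the original advice.

```stata
* Sketch only: assumes nhis_alcohol.dta is in memory.
regress ln_earnings c.age##c.age i.educgrp i.race married ///
    drink_intensity drink_intensity_sq if female==0

* Link test: refits the outcome on the prediction (_hat) and its square (_hatsq).
* A non-significant _hatsq gives no evidence of functional-form misspecification.
linktest

* Re-run the model, because -linktest- replaces the estimation results
regress ln_earnings c.age##c.age i.educgrp i.race married ///
    drink_intensity drink_intensity_sq if female==0

* Ramsey RESET test using powers of the fitted values
estat ovtest
```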

Kind regards,
Carlo
(Stata 19.0)
Anh Quynh Nguyen

Join Date: Dec 2021

Posts: 14
#3

11 Apr 2022, 08:58

1) Thanks so much. I didn't know the -linktest- command before. It helps a lot. I have a follow-up question about this command. What should I do if -linktest- tells me that the functional form is not correctly specified, even when I add more age terms or get rid of some: (This is a simple OLS model, just ignore the if female==0 as I haven't figured it out)

Code:

regress ln_earnings age age_2 i.educgrp i.race married drink_intensity drink_intensity_sq if female==0
regress ln_earnings age age_2 age_3 age_4 i.educgrp i.race married drink_intensity drink_intensity_sq if female==0
regress ln_earnings age age_2 age_3 age_4 age_5 age_6 i.educgrp i.race married drink_intensity drink_intensity_sq if female==0

2) The original paper that I follow estimated 2 separate models, with different numbers of observations. That's why I separated them. I tried plugging i.gender into the code and using -lincom- as you said. For example:

Code:

regress ln_earnings age age_2 i.educgrp i.race married drink_intensity drink_intensity_sq if female==0
lincom female+drink_intensity+drink_intensity_sq+age+age_2+married

However, the coefficient returned is for the combination of all those variables, not a single coefficient for each of age, education, married and female. Can you help me fix the code to get a separate coefficient for each variable, for males and females separately?

3) The -ivregress- I ran with 'first' at the end reported both the first-stage and the IV regression results. The first stage regressed both drink_intensity and drink_intensity_sq on the instruments. I think that's why it was different from running the 2 stages separately. In that case, which one is correct? Should I pick the one that I ran manually? Or pick the -ivregress- result and interpret it as insignificant?

-- Thanks a lot for these additional comments:
a) I hadn't thought about it. However, I just ran a quick regression with ln_earnings as the independent variable and -drink- as the dependent variable, with all the other control variables. The coefficient is statistically insignificant. Is that enough to tell there is no reverse causation? I did some quick research and so far GMM seems to be the solution to test for reverse causation, but I am not sure if GMM works with cross-sectional data.

b) Yes, I definitely think so. There is quite a lot of literature using taxation as an instrument as well. However, I don't have any data on taxation. This dataset is the US National Health Interview Survey, but I don't have regional variables and hence cannot find the relevant taxation. Do you have any suggestions to overcome this?
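On the reverse-causation question in a): an insignificant coefficient in the reversed OLS regression does not by itself rule out endogeneity. One possible way to examine it, sketched here under the assumption that the instruments in the -ivregress- command from post #1 are valid, is to use the postestimation tests that come with -ivregress-. The command below simply repeats the specification from post #1.

```stata
* Sketch only: assumes nhis_alcohol.dta is in memory and the instruments are valid.
ivregress 2sls ln_earnings age age_2 age_3 age_4 married i.race i.educgrp ///
    (c.drink_intensity##c.drink_intensity = bmi c.days_exer_week##c.days_exer_week) ///
    if female==0

* Durbin and Wu-Hausman tests; H0: the instrumented regressors are exogenous
estat endogenous

* First-stage diagnostics for instrument strength
estat firststage

* Overidentification test (available because there are more instruments
* than endogenous regressors in this specification)
estat overid
```

If the first-stage statistics point to weak instruments, neither the endogeneity test nor the insignificant IV coefficients can be taken at face value.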

Again. Thank you so much!!

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

11 Apr 2022, 10:04

Anh:
1) why not use your previous model specification with -c.age##c.age- that passed the -linktest-?
2) I still think that the -if- clause is not a good idea, whereas I would plug -i.gender- in the right-hand side of your regression equation.
As far as additional combinations of coefficients are concerned, the following toy-example can hopefully shed some light:

Code:

. use "C:\Program Files\Stata17\ado\base\a\auto.dta"
(1978 automobile data)

. regress price i.foreign mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.07
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838
-------------+----------------------------------   Adj R-squared   =    0.2637
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
------------------------------------------------------------------------------

. mat list e(b)

e(b)[1,4]
            0b.          1.                      
       foreign     foreign         mpg       _cons
y1           0   1767.2922  -294.19553   11905.415


. lincom (0b.foreign + mpg)-(1.foreign + mpg)

 ( 1)  0b.foreign - 1.foreign = 0

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -1767.292    700.158    -2.52   0.014    -3163.368   -371.2169
------------------------------------------------------------------------------

.

3) the issue is: is there endogeneity or not? Oftentimes, the presence of this nuisance can be retrieved from literature.
4) as far as taxation is concerned, see the example worked out in Kit Baum's valuable textbook
https://www.stata.com/bookstore/modern-econometrics-stata, pages 220-31
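To get group-specific coefficients, as asked under point 2), one possibility is to interact the group indicator with the regressors of interest rather than to split the sample. The following is a minimal sketch on the same auto.dta toy data (loaded here with -sysuse-), with -foreign- standing in for the gender indicator and -mpg- for a continuous regressor; it is an illustration, not a specification recommended for the earnings model.

```stata
sysuse auto, clear

* Fully interacted model: intercept and mpg slope both differ by group
regress price i.foreign##c.mpg

* Slope of mpg in the base group (foreign==0)
lincom mpg

* Slope of mpg in the other group (foreign==1)
lincom mpg + 1.foreign#c.mpg

* Both group-specific slopes in one table
margins foreign, dydx(mpg)

* Test whether the slope differs between the two groups
test 1.foreign#c.mpg
```

The interaction coefficient itself is the difference between the two slopes, so -test- on it answers whether separate coefficients are needed at all.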

Last edited by Carlo Lazzaro; 11 Apr 2022, 10:11.

Kind regards,
Carlo
(Stata 19.0)


Adeel Dar

Join Date: Apr 2022

Posts: 9
#5

14 Apr 2022, 03:41

Hi Everyone,

I have a question regarding IV regression. My model contains an index (trade agreement) as the endogenous explanatory variable. Can I use a dummy as an instrument for it? Let's say that I assign the value one to all the member countries and use that dummy as an instrument for the index.