Including an interaction term vs running a regression with a portion of the dataset

Santiago Valdivieso

Join Date: Dec 2019

Posts: 37
#1

06 Mar 2022, 14:05

I ran an ordered probit regression with the following equation:

(Eq.1) Y = B0 + B1 X1 + B2 Not_Poor + Other controls

Where Y is a categorical variable, X1 is a continuous variable, and Not_Poor is a dichotomous variable that takes the value of 1 if an individual is not poor and 0 otherwise.

I got an insignificant B1. Theory suggests, however, that the relationship between X1 and Y should be stronger among the poor. To test this, I have considered two methods.

A) Keeping only the poor individuals of my dataset (i.e., running the line "keep if Not_Poor==0") and running Eq.1 again.

B) Adding an interaction term between X1 and Not_Poor as follows: Y = B0 + B1 X1 + B2 Not_Poor + B3 (Not_Poor * X1) + Controls.

In both cases, B1 is the effect of X1 on Y among the poor. However, I find that B1 is non-significant with method A and significant with method B.
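
For concreteness, here is a minimal sketch of how the two methods might be written in Stata. The names c1 and c2 are placeholders standing in for "Other controls", and the `if` qualifier is used in place of `keep if` so that the full dataset stays in memory for method B.

```stata
* Method A: ordered probit on the poor subsample only
* (Not_Poor is constant in this subsample, so it is left out)
oprobit Y X1 c1 c2 if Not_Poor == 0

* Method B: full sample with the interaction; the ## operator
* includes X1, Not_Poor, and their product
oprobit Y c.X1##i.Not_Poor c1 c2

* Wald test of B3, the coefficient on Not_Poor * X1
test 1.Not_Poor#c.X1
```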

My questions are:

i) What could explain this discrepancy?
ii) Under which circumstances should method A be preferred to method B, and vice versa?

Thank you in advance!

Last edited by Santiago Valdivieso; 06 Mar 2022, 14:43.
Tags: econometrics, interaction term
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

06 Mar 2022, 14:20

Santiago:
1) different regression specifications give back different coefficients; that is no surprise;
2) approach B is the way to go, as the interaction gives you an idea of the effect of X1 on both rich and poor, something that you cannot get with approach A.
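
As a rough sketch of point 2 (one possible way to do it, with c1 and c2 as placeholder names for the other controls), the group-specific coefficients and marginal effects can be recovered after fitting the interaction model:

```stata
* Interaction model on the full sample
oprobit Y c.X1##i.Not_Poor c1 c2

* Coefficient on X1 among the poor (Not_Poor == 0): B1
lincom X1

* Coefficient on X1 among the non-poor (Not_Poor == 1): B1 + B3
lincom X1 + 1.Not_Poor#c.X1

* Average marginal effect of X1 on the probability of the first
* category of Y, computed separately for poor and non-poor
margins Not_Poor, dydx(X1) predict(outcome(#1))
```

On point 1, one concrete difference between the two specifications is that approach B holds the coefficients on the controls and the cut points equal across poor and non-poor, while approach A estimates all of them from the poor alone. That can change both the B1 estimate and its standard error.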

Kind regards,
Carlo
(Stata 19.0)
Santiago Valdivieso

Join Date: Dec 2019

Posts: 37
#3

13 Mar 2022, 19:50

Thanks Carlo Lazzaro !