Choosing the right strategy to introduce time and geographical dummies in a regression

Imran Khan

Join Date: Sep 2017

Posts: 68
#1

22 Jan 2018, 09:21

Dear All,

I am running a regression in which I have reason to believe that the dependent variable could be affected by the end of the Cold War, i.e., 1990. I have panel data on 70 countries across different regions for 1980-2015.

Given this background, I have two questions:

Q1) Which of the following strategies is the right one to introduce the dummy variable for the Cold War?

Strategy I

Code:

reg DV IV 1990.year

Strategy II

Code:

generate CWDummy = 0
replace CWDummy = 1 if (Year>1990)
reg DV IV CWDummy

Strategy III

Code:

generate CWDummy = 0
replace CWDummy = 1 if (Year<1990)
reg DV IV CWDummy

DV stands for the dependent variable and IV represents a set of independent variables.
I am struggling to differentiate between the logic of the above three regressions, and I am getting very different results from each of them. In short, which one is most appropriate for evaluating the influence of the end of the Cold War (year 1990) on the dependent variable?

Q2) I have 70 countries in my data across 4 geographical regions, and I want to control for heterogeneity across countries. The idea behind introducing geographical dummies is threefold. First, I want to examine whether the main IV affects the DV differently in each region. Second, whether the geographical location of a country gives it favourable treatment in terms of the DV. Third, whether the geographical location of a country gives it favourable treatment in terms of the IV.

Strategy I

Code:

generate d1 = 0
generate d2 = 0
generate d3 = 0
generate d4 = 0
replace d1 = 1 if (region ==1)
replace d2 = 2 if (region ==2)
replace d3 = 3 if (region ==3)
replace d4 = 4 if (region ==4)
reg DV IV d1 d2 d3
reg DV IV d2 d3 d4

Strategy II

Code:

reg DV IV i.region

Strategy III

Code:

reg DV IV if region==1
reg DV IV if region==2
reg DV IV if region==3
reg DV IV if region==4

I dropped one dummy from Strategies I and II to avoid the dummy variable trap. My confusion is again about choosing the right strategy and interpreting it. The above three regressions produce very different results. How do I choose between them?
Moreover, what do the sign and the size of the coefficient on a dummy tell us?

Looking forward to your response.

Best regards,
Imran Khan
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

22 Jan 2018, 09:37

Imran:
life is complicated enough as it is; there is no need to create further nuisances for yourself.
Hence:
- why insist on -regress- when the -xt- commands were conceived to deal with panel datasets?
- why create categorical variables/interactions yourself when -fvvarlist- can do it for you?
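
As a minimal sketch of what factor-variable notation can do here: the names `country`, `year` and `region` are assumed variable names, `DV` is the placeholder from the original post, and `x` is a hypothetical single continuous regressor standing in for the main IV. The interaction is one possible way to ask whether the main regressor affects the DV differently across regions; it is not presented as the only valid specification.

```stata
* Declare the panel structure (country and year are assumed variable names)
xtset country year

* Region indicators: Stata omits the base level on its own,
* so no dummy has to be created or dropped by hand
xtreg DV x i.region

* Let the slope on x differ by region: main effects plus interaction
xtreg DV c.x##i.region

* Joint test that the region-specific slope shifts are all zero
testparm i.region#c.x
```

In the first model, each region coefficient is the shift in the DV relative to the omitted base region, holding the other regressors fixed. In the second, each interaction term is the difference between that region's slope on x and the base region's slope.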

Kind regards,
Carlo
(Stata 19.0)
Imran Khan

Join Date: Sep 2017

Posts: 68
#3

22 Jan 2018, 09:47

Dear Carlo,

Many thanks for your reply.

I am actually using -xtreg-; I forgot to write it here.

So, if I have understood you correctly, do you suggest sticking to the following two commands for the Cold War and geographical dummies?

Code:

xtreg DV IV 1990.year
xtreg DV IV i.region

Best regards,
Imran Khan
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

22 Jan 2018, 09:57

Imran:
assuming that -re- is the right specification for your regression model (see -hausman- in this respect), I would go with:

Code:

xtreg DV IV i.region i.CWDummy
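
A minimal sketch of how CWDummy might be built and how the -re- assumption could be checked. It assumes that the year variable is named `year` (Stata is case-sensitive), that the data are already -xtset-, and that the post-Cold War period is taken to start after 1990 (use >= if 1990 itself should count). `DV` and `IV` are the placeholders from the original post.

```stata
* Post-Cold War indicator: 1 for years after 1990, 0 otherwise
generate byte CWDummy = (year > 1990) if !missing(year)

* Check the coding against the year variable
tabulate year CWDummy

* Fixed effects; region is time-invariant, so it is left out here
* (-fe- would omit it anyway)
xtreg DV IV i.CWDummy, fe
estimates store fixed

* Random effects on the same regressors
xtreg DV IV i.CWDummy, re
estimates store random

* Hausman test: rejection speaks against the -re- specification
hausman fixed random
```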

Kind regards,
Carlo
(Stata 19.0)
Imran Khan

Join Date: Sep 2017

Posts: 68
#5

22 Jan 2018, 13:56

Dear Carlo,

Many thanks for your reply.

Can't I use the following command directly, instead of creating CWDummy?

Code:

xtreg DV IV i.region 1990.year

If not, which strategy should I follow to generate CWDummy (see Strategies II and III from the first message)?

Best regards,
Imran Khan
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#6

22 Jan 2018, 14:57

Imran:
try your last code and see whether Stata gives back what you're after.
I'm used to coding factor variables as
-i.<categoricalvariable>- (personal taste, you know...).

Kind regards,
Carlo
(Stata 19.0)
Imran Khan

Join Date: Sep 2017

Posts: 68
#7

22 Jan 2018, 15:35

Dear Carlo,

This is where the confusion arises. The following two commands give different results.

Code:

xtreg DV IV i.region 1990.year
xtreg DV IV i.region i.CWDummy

So, how do I choose between them?

Best regards,
Imran Khan

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709
#8

23 Jan 2018, 00:07

Imran:
results (which you should have posted, as you did with your code) do not differ when I apply your approach to a toy example:

Code:

use "http://www.stata-press.com/data/r15/nlswork.dta"
. g flag=1 if year==70
(26,848 missing values generated)

. replace flag=0 if year!=70
(26,848 real changes made)

. xtreg ln_wage i.flag

Random-effects GLS regression                   Number of obs     =     28,534
Group variable: idcode                          Number of groups  =      4,711

R-sq:                                           Obs per group:
     within  = 0.0086                                         min =          1
     between = 0.0180                                         avg =        6.1
     overall = 0.0077                                         max =         15

                                                Wald chi2(1)      =     244.04
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.flag |  -.1313552   .0084085   -15.62   0.000    -.1478355   -.1148749
       _cons |   1.664364   .0060783   273.82   0.000      1.65245    1.676277
-------------+----------------------------------------------------------------
     sigma_u |  .38266832
     sigma_e |  .31892099
         rho |  .59011733   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg ln_wage 70.year

Random-effects GLS regression                   Number of obs     =     28,534
Group variable: idcode                          Number of groups  =      4,711

R-sq:                                           Obs per group:
     within  = 0.0086                                         min =          1
     between = 0.0180                                         avg =        6.1
     overall = 0.0077                                         max =         15

                                                Wald chi2(1)      =     244.04
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     70.year |  -.1313552   .0084085   -15.62   0.000    -.1478355   -.1148749
       _cons |   1.664364   .0060783   273.82   0.000      1.65245    1.676277
-------------+----------------------------------------------------------------
     sigma_u |  .38266832
     sigma_e |  .31892099
         rho |  .59011733   (fraction of variance due to u_i)
------------------------------------------------------------------------------
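
One possible source of a discrepancy, sketched here on the assumption that the hand-made dummy was coded as a before/after indicator: 70.year is 1 in the single year 70 only, so it matches a hand-made dummy only when that dummy flags the same single year. A dummy coded with > or < marks a whole period and is a different regressor. The variable name `post` is hypothetical.

```stata
use "http://www.stata-press.com/data/r15/nlswork.dta", clear

* Indicator for the single year 70: this is what 70.year stands for
generate byte flag = (year == 70) if !missing(year)

* A before/after indicator is a different variable
generate byte post = (year > 70) if !missing(year)

* flag is 1 in one year only; post is 1 in every later year
tabulate year flag
tabulate year post

* Same regressor as 70.year, hence the same estimates
xtreg ln_wage i.flag

* A different regressor, so the estimates will generally differ
xtreg ln_wage i.post
```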

Kind regards,
Carlo
(Stata 19.0)


Imran Khan

Join Date: Sep 2017

Posts: 68
#9

23 Jan 2018, 14:35

Dear Carlo,

Many thanks for your reply.

I was making a mistake in generating the dummy variable. It works fine now.

Best regards,
Imran Khan.