creating dummy variables for country-specific and time-fixed effects in panel data for country analysis

Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#1

19 Dec 2022, 11:35

Hi everyone,

I am currently writing my thesis, which focuses on the relationship between poverty reduction and foreign aid (official development assistance). I am using unbalanced panel data for 21 Sub-Saharan African (SSA) countries over the period 2000-2020. My supervisor told me to add country-specific and time fixed effects, but I am having difficulty understanding how to include them in my model.
Based on the Hausman test, I should use the fixed-effects model (FEM).
My first question concerns the country fixed effects. I have created a dummy variable ("landlocked"). However, when I regress log_infmort (my dependent variable) on log_odagni (the key independent variable) and the other 5 control variables, and add "i.landlocked" or even "i.country" to account for country fixed effects, I get the message "countryid/landlocked omitted because of collinearity". Following this, my supervisor told me that I should add a dummy variable for each country, but I read that time-constant dummy variables are likely to be wiped out when one uses the "xtreg, fe" command, because this command already includes the country-specific effects. So I am confused about how I should account for country fixed effects in this case.

code:
xtreg log_infmort log_odagni log_gini log_infl log_trade log_gdppercap log_fdi i.countryid, fe vce(cluster countryid)
xtreg log_infmort log_odagni log_gini log_infl log_trade log_gdppercap log_fdi i.landlocked, fe vce(cluster countryid)

Stata says: "countryid/landlocked omitted because of collinearity".

Following those results, I was told to use the "regress i.year i.country, ro" command. What I don't understand is the difference between "xtreg, fe" and "regress, ro". Can I use the latter in a fixed-effects panel data analysis? Also, does using "fe vce(cluster countryid)" make any sense when I only have 21 countries in my dataset?
Thanks in advance!
Tags: country-specific effects, i.country, i.year, panel data, time-fixed effects
Andrew Musau

Join Date: Oct 2014

Posts: 10084
#2

19 Dec 2022, 13:22

When you xtset your data with

xtset countryid year

-xtreg, fe- already includes the country effects in the regression for you, so you just need to add the year effects:

Code:

xtreg log_infmort ..._fdi i.year, fe vce(cluster countryid)

Stata says: "countryid/landlocked omitted because of collinearity".

The variable landlocked, I presume, is an indicator of whether a country is landlocked. This feature is time-invariant (does not change over time), and since the fixed effects estimator only looks at within-variation (variation over time), then this variable will be dropped. In a sense, its effect is already accounted for by the country effects.
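
Both points can be sketched on Stata's bundled Grunfeld data, where company and year stand in for countryid and year. This is only an illustration under that assumption, and the indicator first5 is made up for the example (it is not a variable from this thread). The within estimator and an OLS regression with one dummy per panel unit should give the same slope coefficients, while a time-invariant indicator is dropped by -xtreg, fe-.

```stata
webuse grunfeld, clear
xtset company year

* Fixed effects via the within estimator, with year effects
xtreg invest mvalue kstock i.year, fe vce(cluster company)

* Same slopes from OLS with explicit firm dummies (least-squares dummy variables)
regress invest mvalue kstock i.company i.year, vce(cluster company)

* Hypothetical time-invariant indicator: constant within each firm
generate byte first5 = company <= 5
xtreg invest mvalue kstock first5 i.year, fe vce(cluster company)
* first5 should be reported as omitted because of collinearity
```

The clustered standard errors from the first two commands may differ slightly, because the two commands apply different degrees-of-freedom adjustments.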

Also, does using "fe vce(cluster countryid)" make any sense when I only have 21 countries in my dataset?

Ideally, you want to have at least 30 countries.

Last edited by Andrew Musau; 19 Dec 2022, 13:25.
Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#3

20 Dec 2022, 04:44

Thank you for your past reply Andrew!
Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#4

20 Dec 2022, 04:45

Thank you for your fast reply Andrew!
Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#5

20 Dec 2022, 09:16

Hi Andrew,
I have a few other questions.
After I add the year effects with "i.year", the results come out as significant.
Code:
xtreg log_infmort log_odagdp log_trade log_fdi gini governance infl log_gdp_pc i.year, fe
These are the results I get:

However, my issue is that when I use robust standard errors with the "fe vce(robust)" option, most of the explanatory variables turn insignificant.

What do you think is causing that, and how would you advise me to overcome this issue?

Thank you!
Andrew Musau

Join Date: Oct 2014

Posts: 10084
#6

20 Dec 2022, 09:51

I think 20 clusters is too small a number. Look at the wild bootstrap, which is recommended for estimates with clustered standard errors and few clusters. It is implemented by boottest from SSC.

Code:

ssc install boottest, replace

The help file will have some examples of how you implement this.

Code:

help boottest

Also, rescale some of your variables so that their coefficients are readable (so you don't have "0.000x"). Report raw values of the variables in thousands or millions, for example.
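
The rescaling point can be illustrated on the Grunfeld data. This is a sketch only, and the variable name mvalue_k is made up here. Dividing a regressor by 1,000 multiplies its coefficient by 1,000 and leaves the t-statistic and the fit unchanged.

```stata
webuse grunfeld, clear
xtset company year
xtreg invest mvalue kstock i.year, fe

* Express market value in thousands of its original unit
generate double mvalue_k = mvalue / 1000
xtreg invest mvalue_k kstock i.year, fe
* The coefficient on mvalue_k is 1,000 times the coefficient on mvalue
```

This applies to variables entered in levels. Changing the units of a logged variable only shifts the intercept, so its slope is unaffected.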

Last edited by Andrew Musau; 20 Dec 2022, 09:58.
Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#7

20 Dec 2022, 14:47

Based on what I read, I ran the following code: boottest log_odagdp, reps(999). This is what I got:

However, I have no clue how to interpret this result since I am not at all familiar with the bootstrapping technique. I couldn't find any clear explanation of how to interpret it.

Could you perhaps give me an example of what code with boottest would look like?
Since I only have 20 countries, is it still necessary to cluster the SE?

Thank you so much for your help.

Andrew Musau

Join Date: Oct 2014
Posts: 10084
#8

20 Dec 2022, 18:33

Since I only have 20 countries, is it still necessary to cluster the SE?

Yes, that is exactly what the wild bootstrap addresses: clustering with a small number of clusters. Here is an example using the Grunfeld dataset. I also use estout from SSC. Assuming no clustering, my regression and results would look as follows:

Code:

webuse grunfeld, clear
regress invest mvalue kstock i.company i.year
esttab ., indicate("Firm effects= *.company" "Year effects=*.year") ci starl(* 0.1 ** 0.05 *** 0.01) nocons

Res.:

Code:

. esttab ., indicate("Firm effects= *.company" "Year effects=*.year") ci starl(* 0.1 ** 0.05 *** 0.01) nocons

--------------------------------------
                                (1)  
                             invest  
--------------------------------------
mvalue                        0.118***
                     [0.0906,0.145]  

kstock                        0.358***
                      [0.313,0.403]  

Firm effects                    Yes  

Year effects                    Yes  
--------------------------------------
N                               200  
--------------------------------------
95% confidence intervals in brackets
* p<0.1, ** p<0.05, *** p<0.01

Alternatively, you could choose to report t-statistics in place of confidence intervals.

Code:

webuse grunfeld, clear
regress invest mvalue kstock i.company i.year
esttab ., indicate("Firm effects= *.company" "Year effects=*.year") t starl(* 0.1 ** 0.05 *** 0.01) nocons

Res.:

Code:

. esttab ., indicate("Firm effects= *.company" "Year effects=*.year") t starl(* 0.1 ** 0.05 *** 0.01) nocons

----------------------------
                      (1)  
                   invest  
----------------------------
mvalue              0.118***
                   (8.56)  

kstock              0.358***
                  (15.75)  

Firm effects          Yes  

Year effects          Yes  
----------------------------
N                     200  
----------------------------
t statistics in parentheses
* p<0.1, ** p<0.05, *** p<0.01

With the wild bootstrap, the coefficients stay the same. You just report the wild bootstrap t-statistic or the wild bootstrap confidence interval (the "t(9)" and "95% confidence set" lines in the output below) in place of the previous ones. With the example above:

Code:

webuse grunfeld, clear
regress invest mvalue kstock i.company i.year
*SEED SET ONLY FOR REPLICATION. REMOVE IN ACTUAL IMPLEMENTATION
set seed 12212022
foreach var in mvalue kstock{
    boottest `var', cluster(company)
}

Res.:

Code:

. foreach var in mvalue kstock{
  2.
.     boottest `var', cluster(company)
  3.
. }

Overriding estimator's cluster/robust settings with cluster(company)

Wild bootstrap-t, null imposed, 999 replications, Wald test, clustering by company, bootstrap clustering by company, Rademacher weights:
  mvalue

                            t(9) =    10.5965
                        Prob>|t| =     0.0000

95% confidence set for null hypothesis expression: [.05921, .1299]

Overriding estimator's cluster/robust settings with cluster(company)

Wild bootstrap-t, null imposed, 999 replications, Wald test, clustering by company, bootstrap clustering by company, Rademacher weights:
  kstock

                            t(9) =     7.2887
                        Prob>|t| =     0.0400

95% confidence set for null hypothesis expression: [.02991, .5805]

.

See how to do this in estout in #9 of https://www.statalist.org/forums/for...t-few-clusters, although I believe that boottest now stores these statistics after a recent update, so it is easier to access them than as shown in the link.
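
A minimal sketch of reading the stored statistics follows. The names r(p) and r(CI) are assumptions to confirm against the output of return list and the boottest help file for the installed version.

```stata
webuse grunfeld, clear
regress invest mvalue kstock i.company i.year

* Wild cluster bootstrap test of H0: coefficient on mvalue = 0
boottest mvalue, cluster(company) reps(999) nograph

* Show everything boottest left behind in r()
return list

* Assumed names; check them against the return list output above
display r(p)
matrix list r(CI)
```

On interpretation: Prob>|t| is the bootstrap p-value for the null hypothesis that the coefficient is zero, and the 95% confidence set is the range of hypothesized values that the test does not reject at the 5% level.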

Last edited by Andrew Musau; 20 Dec 2022, 19:08.


Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#9

04 Jan 2023, 15:11

Hi Andrew,

Thank you for your help on the cluster issue. My supervisor told me to continue with the clustered SE despite the very small number of clusters, so I followed his advice and kept the cluster-robust SE.

I am getting back to you again because I am facing new issues with my model.

I decided to stick with the fixed-effects model (panel data). Although a few variables appear to be significant, the within R-squared is far higher than in the literature, and I do not see where this very high value comes from. The within R-squared is 0.9249, while it should be around 0.2 based on the literature.
Since the VIF is smaller than 10, the cause should not be multicollinearity. What do you think could be leading to such a high value?

Thanks in advance!
Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#10

04 Jan 2023, 16:27

Country-specific and year fixed effects are probably the reasons why the within R-squared is so high. I am thinking about broadening my dataset to include more countries to see if the issue gets fixed, but I don't know if that's the best approach to take.

Andrew Musau

Join Date: Oct 2014
Posts: 10084

#11

04 Jan 2023, 17:16

It appears that you have infant mortality on the left-hand side and a bunch of macroeconomic variables on the right-hand side. Therefore, it should not be surprising that the \(\text{within-}R^2\) statistic is high given the strong correlation between the outcome and regressors in addition to the inclusion of country and time effects. Take the log of GDP as an example. Tell me the change in a country's wealth over time and I can make a pretty good guess about the change in its infant mortality rate. Additionally, there is a strong time effect as infant mortality has been declining for almost all countries due to advances in medical care. Expanding the sample won't help. In fact, let's take all the data the World Bank has on infant mortality and GDP (all countries, starting in 1960). Below, I use multiline from SSC to compare the trends of these two variables over time.

Code:

* Download World Bank infant mortality (SP.DYN.IMRT.IN) and reshape to country-year form
* Note: the CSV name inside the archive carries a release suffix and may differ for a later download
tempfile infm
copy "https://api.worldbank.org/v2/en/indicator/SP.DYN.IMRT.IN?downloadformat=csv" API_SP.DYN.IMRT.MA.IN_DS2_en_csv_v2_4772824.zip, replace
unzipfile API_SP.DYN.IMRT.MA.IN_DS2_en_csv_v2_4772824.zip, replace
import delimited "API_SP.DYN.IMRT.IN_DS2_en_csv_v2_4770442.csv", varnames(1) encoding(UTF-8) rowrange(6) clear
* v5-v65 hold the yearly values for 1960-2020
rename (v5-v65) infm#, addnumber(1960)
rename datasource country
keep country inf*
reshape long infm, i(country) j(year)
save `infm', replace
* Same steps for GDP (NY.GDP.MKTP.KD)
copy "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.KD?downloadformat=csv" API_NY.GDP.MKTP.KD_DS2_en_csv_v2_4770407.zip, replace
unzipfile API_NY.GDP.MKTP.KD_DS2_en_csv_v2_4770407.zip, replace
import delimited "API_NY.GDP.MKTP.KD_DS2_en_csv_v2_4770407.csv",varnames(1) encoding(UTF-8) rowrange(6) clear
rename (v5-v65) gdp#, addnumber(1960)
rename datasource country
keep country gdp*
reshape long gdp, i(country) j(year)
* Log GDP (the variable is named lngpd and is referred to by that name below)
gen lngpd= ln(gdp)
merge 1:1 country year using `infm', nogen
* Numeric country identifier for xtset
encode country, g(countryn)  

*GRAPH TRENDS
* Average both series across countries by year, then plot them against year
preserve
collapse inf gdp, by(year)
*ssc install multiline, replace
set scheme s1color
multiline inf gdp year
restore

[Figure: Graph.png. Yearly means of infant mortality (inf) and GDP (gdp) plotted against year with multiline.]

Almost an inverted mirror image. It is clear that the wealthier a country gets over time, the lower is the infant mortality. With only these two variables and taking country and year effects into account (sample: whole world 1960-2020), the \(\text{within-}R^2\) is already above 70%.

Code:

xtset countryn year
xtreg infm lngpd i.year, fe cluster(countryn)

Res.:

Code:


. xtreg infm lngpd i.year, fe cluster(countryn)

Fixed-effects (within) regression               Number of obs     =     10,364
Group variable: countryn                        Number of groups  =        239

R-sq:                                           Obs per group:
     within  = 0.7093                                         min =          8
     between = 0.0668                                         avg =       43.4
     overall = 0.2035                                         max =         61

                                                F(61,238)         =      16.02
corr(u_i, Xb)  = -0.5579                        Prob > F          =     0.0000

                             (Std. Err. adjusted for 239 clusters in countryn)
------------------------------------------------------------------------------
             |               Robust
        infm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lngpd |  -12.73475   3.199117    -3.98   0.000    -19.03695   -6.432552
             |
        year |
       1961  |   -2.21748   .3705631    -5.98   0.000    -2.947482   -1.487477
       1962  |  -4.023321   .5790093    -6.95   0.000    -5.163959   -2.882683
       1963  |  -5.720347   .8187534    -6.99   0.000    -7.333276   -4.107418
       1964  |  -6.764065   1.228427    -5.51   0.000    -9.184044   -4.344086
       1965  |  -5.502915   2.157301    -2.55   0.011    -9.752758   -1.253073
       1966  |  -7.157392   2.293465    -3.12   0.002    -11.67548   -2.639308
       1967  |  -8.484931   2.513037    -3.38   0.001    -13.43557   -3.534294
       1968  |  -10.41661    2.73544    -3.81   0.000    -15.80538   -5.027846
       1969  |  -11.58164   2.905099    -3.99   0.000    -17.30463   -5.858645
       1970  |  -13.44369   3.141233    -4.28   0.000    -19.63186   -7.255519
       1971  |  -13.89519   3.338002    -4.16   0.000    -20.47099   -7.319387
       1972  |  -15.22334   3.439078    -4.43   0.000    -21.99826   -8.448416
       1973  |  -16.89866   3.564241    -4.74   0.000    -23.92014   -9.877168
       1974  |  -18.54105   3.741073    -4.96   0.000    -25.91089    -11.1712
       1975  |  -20.68005   3.870905    -5.34   0.000    -28.30567   -13.05444
       1976  |  -22.33382   4.041255    -5.53   0.000    -30.29502   -14.37263
       1977  |  -25.16434   4.193598    -6.00   0.000    -33.42565   -16.90303
       1978  |  -26.82776   4.316486    -6.22   0.000    -35.33115   -18.32436
       1979  |  -28.42597   4.452018    -6.38   0.000    -37.19637   -19.65558
       1980  |  -29.55085   4.642306    -6.37   0.000     -38.6961   -20.40559
       1981  |  -30.90049   4.765964    -6.48   0.000    -40.28935   -21.51163
       1982  |  -32.50226   4.817566    -6.75   0.000    -41.99277   -23.01174
       1983  |  -33.96469   4.918505    -6.91   0.000    -43.65406   -24.27533
       1984  |  -35.20083   5.016136    -7.02   0.000    -45.08253   -25.31914
       1985  |  -36.20073   5.131291    -7.05   0.000    -46.30928   -26.09218
       1986  |  -37.16819   5.239843    -7.09   0.000    -47.49059    -26.8458
       1987  |  -38.12364   5.355746    -7.12   0.000    -48.67437   -27.57292
       1988  |  -38.87782   5.470494    -7.11   0.000    -49.65459   -28.10105
       1989  |  -39.79213   5.566489    -7.15   0.000    -50.75801   -28.82625
       1990  |  -39.47025   5.709229    -6.91   0.000    -50.71733   -28.22318
       1991  |  -40.40695   5.747001    -7.03   0.000    -51.72844   -29.08547
       1992  |  -41.28604   5.782534    -7.14   0.000    -52.67753   -29.89456
       1993  |  -41.91871   5.836191    -7.18   0.000     -53.4159   -30.42152
       1994  |  -42.44101   5.932459    -7.15   0.000    -54.12784   -30.75417
       1995  |  -43.63822   5.984055    -7.29   0.000     -55.4267   -31.84974
       1996  |   -44.1279   6.098038    -7.24   0.000    -56.14092   -32.11488
       1997  |  -44.78316   6.212994    -7.21   0.000    -57.02265   -32.54368
       1998  |  -45.50057   6.296598    -7.23   0.000    -57.90475   -33.09638
       1999  |  -46.40061   6.390239    -7.26   0.000    -58.98927   -33.81196
       2000  |  -47.03476   6.496424    -7.24   0.000    -59.83259   -34.23692
       2001  |  -47.96282   6.579339    -7.29   0.000      -60.924   -35.00164
       2002  |  -48.88277   6.660583    -7.34   0.000      -62.004   -35.76155
       2003  |  -49.72484   6.759084    -7.36   0.000    -63.04011   -36.40957
       2004  |  -50.28563   6.900377    -7.29   0.000    -63.87925   -36.69202
       2005  |  -50.95231   7.035003    -7.24   0.000    -64.81114   -37.09349
       2006  |  -51.42709    7.18992    -7.15   0.000     -65.5911   -37.26308
       2007  |  -51.80968   7.337568    -7.06   0.000    -66.26455   -37.35481
       2008  |  -52.23486   7.446523    -7.01   0.000    -66.90438   -37.56535
       2009  |  -53.25886    7.46135    -7.14   0.000    -67.95758   -38.56014
       2010  |  -53.45914   7.531236    -7.10   0.000    -68.29553   -38.62274
       2011  |  -54.10295    7.68767    -7.04   0.000    -69.24752   -38.95838
       2012  |  -54.46806   7.773322    -7.01   0.000    -69.78136   -39.15476
       2013  |   -54.8021    7.85836    -6.97   0.000    -70.28292   -39.32127
       2014  |  -55.06814   7.955975    -6.92   0.000    -70.74126   -39.39501
       2015  |  -55.41551   8.042443    -6.89   0.000    -71.25898   -39.57205
       2016  |  -55.68486    8.13643    -6.84   0.000    -71.71347   -39.65624
       2017  |  -55.95722   8.230028    -6.80   0.000    -72.17022   -39.74422
       2018  |   -56.1778   8.326473    -6.75   0.000     -72.5808   -39.77481
       2019  |  -56.35894   8.420748    -6.69   0.000    -72.94765   -39.77022
       2020  |   -57.5621   8.304639    -6.93   0.000    -73.92209   -41.20211
             |
       _cons |   398.9355   73.55277     5.42   0.000     254.0379    543.8331
-------------+----------------------------------------------------------------
     sigma_u |  44.304969
     sigma_e |  12.958064
         rho |  .92119949   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Last edited by Andrew Musau; 04 Jan 2023, 17:28.


Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#12

04 Jan 2023, 18:46

Thank you so much for your help!! Makes more sense now..
So since expanding the dataset would not change anything, might keeping all those macroeconomic variables in the model be problematic? If I understand correctly, would the best alternative be to select other variables?
Andrew Musau

Join Date: Oct 2014

Posts: 10084
#13

05 Jan 2023, 00:20

No. You should rely on economic theory to decide what variables to include. A high within \(R^2\) statistic in itself is not a problem. On multicollinearity, note that it does not affect the overall fit of the model, and the estimation of the coefficients on nonmulticollinear variables remains largely unaffected. So if you have extreme multicollinearity, it is only a problem if a key variable whose effect you need to estimate is involved, and the results show a confidence interval that is so wide that the estimate is not useful for practical purposes. More importantly, you should check for functional form misspecification using a RESET test.
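
One way to sketch a RESET-type check after -xtreg, fe- is to build it by hand, since -estat ovtest- is a postestimation command for -regress-. The example below uses the Grunfeld data as a stand-in, and the names xbhat, xbhat2 and xbhat3 are made up for illustration.

```stata
webuse grunfeld, clear
xtset company year
xtreg invest mvalue kstock i.year, fe vce(cluster company)

* Linear prediction from the fitted model
predict double xbhat, xb

* Add powers of the prediction and test them jointly
generate double xbhat2 = xbhat^2
generate double xbhat3 = xbhat^3
xtreg invest mvalue kstock i.year xbhat2 xbhat3, fe vce(cluster company)
test xbhat2 xbhat3
```

A rejection of the joint null suggests that the functional form may be misspecified, for example that a regressor should enter in logs or with a squared term.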
Bélise Umutesi

Join Date: Dec 2022

Posts: 13
#14

09 Jan 2023, 12:17

Ok I see
Thank you so much for your help Andrew!