Testing for Endogeneity of Dummy Variables.

Nicholas Hamer

Join Date: May 2019
Posts: 23

02 Jul 2019, 08:46

Hi all,

I am conducting some research into the impact of the World Cup on the growth rate of host countries. I have a panel of 160 countries which includes observations of variables such as GDP and GDP growth for the period 1990-2016. An example of the data is shown below. The data have been xtset.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str30 CountryName double(GDP GDPGROWTH GDPCAP) float(Pre4 Pre3 Pre2 Pre1 t Post1 Post2 Post3 Post4)
"Albania"  6178834824.387625  -9.575640169486448 1880.0413396170275 0 0 0 0 0 0 0 0 0
"Albania"  4350579530.232994  -29.58899770129115 1331.7597795490356 0 0 0 0 0 0 0 0 0
"Albania" 4037337804.6581616  -7.199999986164059  1243.390610540299 0 0 0 0 0 0 0 0 0
"Albania"  4424922235.738512   9.600000045405338 1371.0966008720366 0 0 0 0 0 0 0 0 0
"Albania"  4792190780.412167    8.29999997982695 1494.0411519659224 0 0 0 0 0 0 0 0 0
"Albania"  5429552153.841033  13.299999992363581   1703.23715591804 0 0 0 0 0 0 0 0 0
"Albania"  5923641400.259521   9.100000007716176 1869.8168233283936 0 0 0 0 0 0 0 0 0
"Albania"   5276779759.35118 -10.920000000000016  1676.082839921589 0 0 0 0 0 0 0 0 0
"Albania"  5742719412.101891   8.830000000000027  1835.596721815642 0 0 0 0 0 0 0 0 0
"Albania"  6482955944.321828  12.890000000000072 2085.3711472230657 0 0 0 0 0 0 0 0 0
"Albania" 6933521382.4521885   6.949999999999903 2244.5648362582097 0 0 0 0 0 0 0 0 0
"Albania"  7508310305.057478   8.290000000000049  2453.557463926869 0 0 0 0 0 0 0 0 0
"Albania"  7849187592.907091   4.540000000000049 2572.6522013717067 0 0 0 0 0 0 0 0 0
"Albania"  8283247666.794856    5.53000000000003 2725.0967447186936 0 0 0 0 0 0 0 0 0
"Albania"   8739654613.23524   5.509999999999863  2887.291291048561 0 0 0 0 0 0 0 0 0
"Albania"  9222957513.347157   5.530000000000086 3062.5925044163087 0 0 0 0 0 0 0 0 0
"Albania"  9767112006.634628   5.899999999999878 3263.8124001509846 0 0 0 0 0 0 0 0 0
"Albania"  10351185304.63138   5.980000000000004 3485.2276282025928 0 0 0 0 0 0 0 0 0
"Albania"  11127524202.47875   7.500000000000156  3775.479708805628 0 0 0 0 0 0 0 0 0
"Albania" 11500296263.261787    3.34999999999998  3928.342143385504 0 0 0 0 0 0 0 0 0
"Albania" 11926957254.628792  3.7099999999999227 4094.3602035923504 0 0 0 0 0 0 0 0 0
"Albania" 12231094664.621826  2.5500000000000114   4210.07700502783 0 0 0 0 0 0 0 0 0
"Albania" 12404776208.859451   1.419999999999959  4276.917643063649 0 0 0 0 0 0 0 0 0
"Albania" 12528823970.948051  1.0000000000000426  4327.608231775726 0 0 0 0 0 0 0 0 0
"Albania" 12750584155.233839  1.7700000000000529  4413.335122319529 0 0 0 0 0 0 0 0 0
"Albania" 13033647123.480019  2.2199999999999136  4524.467507924287 0 0 0 0 0 0 0 0 0
"Albania" 13470274302.116606   3.350000000000051  4683.519216507559 0 0 0 0 0 0 0 0 0
"Albania" 13986932578.969046  3.8355438446510135  4867.632464647651 0 0 0 0 0 0 0 0 0
"Algeria"  91989749446.97946   .8000005799814147  3550.032671541718 0 0 0 0 0 0 0 0 0
"Algeria"  90885871916.03632  -1.200000584390537 3422.6386182093443 0 0 0 0 0 0 0 0 0
"Algeria"  92521819701.82526  1.8000023010180257 3403.9034522240077 0 0 0 0 0 0 0 0 0
"Algeria"   90578860785.6483  -2.100000759214012 3259.8436797716563 0 0 0 0 0 0 0 0 0
"Algeria"  89763654164.59029  -.8999965488494865 3164.8985771543003 0 0 0 0 0 0 0 0 0
"Algeria"  93174668346.04738  3.7999947898763793 3223.5575604032097 0 0 0 0 0 0 0 0 0
"Algeria"  96994828322.43536   4.099998469755789  3297.863374558326 0 0 0 0 0 0 0 0 0
"Algeria"   98061771373.7004   1.099999937850555  3281.102139095419 0 0 0 0 0 0 0 0 0
"Algeria" 103062925254.28166   5.100003610502341 3397.4101977918867 0 0 0 0 0 0 0 0 0
"Algeria"  106360940460.5076  3.2000015505953456 3457.1370464975817 0 0 0 0 0 0 0 0 0
"Algeria" 110423586430.99423  3.8196784955987937 3541.0720367972917 0 0 0 0 0 0 0 0 0
"Algeria" 113745564598.73032  3.0083954661371592 3600.4372541095986 0 0 0 0 0 0 0 0 0
"Algeria" 120125920931.17308   5.609323189832736 3754.5162751500056 0 0 0 0 0 0 0 0 0
"Algeria" 128777236288.58391  7.2018722440160445  3974.175032022265 0 0 0 0 0 0 0 0 0
"Algeria" 134316749131.75137   4.301624264364293 4091.1442350797965 0 0 0 0 0 0 0 0 0
"Algeria" 142251902308.52768   5.907791268081326  4273.312751467654 0 0 0 0 0 0 0 0 0
"Algeria" 144648118986.03094  1.6844883186912796  4282.328230917478 0 0 0 0 0 0 0 0 0
"Algeria"  149526919452.5767  3.3728751543716413  4359.375747522446 0 0 0 0 0 0 0 0 0
"Algeria"   153055956405.236   2.360134861053268  4390.499632759569 0 0 0 0 0 0 0 0 0
"Algeria"  155554202821.6213  1.6322438375223243  4386.038895588909 0 0 0 0 0 0 0 0 0
"Algeria" 161207268655.39215  3.6341453533424612  4463.394674889505 0 0 0 0 0 0 0 0 0
"Algeria" 165869166838.53906  2.8918659946484837   4504.92009813206 0 0 0 0 0 0 0 0 0
"Algeria"  171466867482.1632   3.374768650687841  4564.435016789671 0 0 0 0 0 0 0 0 0
"Algeria" 176212451150.23447  2.7676388667711223  4596.219627388071 0 0 0 0 0 0 0 0 0
"Algeria" 182889354514.36853   3.789121211668231  4675.885024476667 0 0 0 0 0 0 0 0 0
"Algeria" 189772334940.90765  3.7634669578312554  4759.595241519403 0 0 0 0 0 0 0 0 0
"Algeria"  196034821993.4945  3.2999999997559684  4827.724251387318 0 0 0 0 0 0 0 0 0
"Algeria" 199171379146.41522  1.6000000005227548  4820.434063719884 0 0 0 0 0 0 0 0 0
"Andorra" 1950360346.8336878   3.781387589629162   35780.5196725988 0 0 0 0 0 0 0 0 0
"Andorra" 2000016577.2760122  2.5460028718764107 35291.711409292446 0 0 0 0 0 0 0 0 0
"Andorra"  2018600970.439821   .9292119562888956 34278.647100255075 0 0 0 0 0 0 0 0 0
"Andorra" 1997779424.1846933 -1.0314840109578967  32766.05967074008 0 0 0 0 0 0 0 0 0
"Andorra" 2045390257.5149548   2.383187690988038  32633.82512747826 0 0 0 0 0 0 0 0 0
"Andorra" 2101791877.2436512   2.757499187329728  32917.64882135711 0 0 0 0 0 0 0 0 0
"Andorra"  2199519738.003028   4.649740148750595  34175.26006841249 0 0 0 0 0 0 0 0 0
"Andorra" 2398964977.7997584   9.067672199105132  37293.28241329082 0 0 0 0 0 0 0 0 0
"Andorra" 2475606878.1163473   3.194790296058514 38595.723209696414 0 0 0 0 0 0 0 0 0
"Andorra" 2577084005.2476096   4.099080836634087 40035.482449085124 0 0 0 0 0 0 0 0 0
"Andorra"  2668012839.882807    3.52836129711109  40801.54213003222 0 0 0 0 0 0 0 0 0
"Andorra"  2789321202.452289   4.546768319706089 41420.846177696934 0 0 0 0 0 0 0 0 0
"Andorra"   2969818586.67949   6.471014670827913  42396.30239802838 0 0 0 0 0 0 0 0 0
"Andorra" 3331207491.6417465  12.168719886904597 45519.492383943405 0 0 0 0 0 0 0 0 0
"Andorra"  3585973903.044556   7.647869790219858  47032.86688847065 0 0 0 0 0 0 0 0 0
"Andorra"  3851227772.830262   7.396982715365013 48831.929359938404 0 0 0 0 0 0 0 0 0
"Andorra" 4025933064.0290008   4.536353119159912  49708.40048930129 0 0 0 0 0 0 0 0 0
"Andorra"   4027543882.80023  .04001106689084111 48710.664620299576 0 0 0 0 0 0 0 0 0
"Andorra"  3681577710.753998  -8.590003786766744  43900.95170286543 0 0 0 0 0 0 0 0 0
"Andorra" 3545703433.7638264  -3.690653509588529  41979.86590139739 0 0 0 0 0 0 0 0 0
"Andorra" 3355695364.2384105  -5.358825775333358   39736.3540626699 0 0 0 0 0 0 0 0 0
"Andorra" 3199771534.4574075  -4.646543051633373 38205.771088791866 0 0 0 0 0 0 0 0 0
"Andorra" 3148088242.8188205  -1.615218182986652  38190.58658537226 0 0 0 0 0 0 0 0 0
"Andorra" 3159158337.7785444  .35164500185076975  39104.30184901897 0 0 0 0 0 0 0 0 0
"Andorra" 3231113955.0167494  2.2776831530642028 40785.049228339616 0 0 0 0 0 0 0 0 0
"Andorra" 3258326510.4285784   .8422035183741343  41765.92035312352 0 0 0 0 0 0 0 0 0
"Andorra"  3319880351.133421   1.889124386639395  42958.55839253401 0 0 0 0 0 0 0 0 0
"Andorra" 3382068237.5248113  1.8731966159611488 43942.938186510895 0 0 0 0 0 0 0 0 0
"Angola"  31822141256.216652 -3.4500986836048355 2614.4925039045625 0 0 0 0 0 0 0 0 0
"Angola"   32137613010.45088   .9913592919288448  2560.063030537661 0 0 0 0 0 0 0 0 0
"Angola"  30261328943.397697  -5.838280728699502 2333.4765495055613 0 0 0 0 0 0 0 0 0
"Angola"  23003628103.413578 -23.983417428756297 1716.2104308705007 0 0 0 0 0 0 0 0 0
"Angola"  23311730287.409233  1.3393634369786014 1684.2152545782533 0 0 0 0 0 0 0 0 0
"Angola"  26808489836.790936  15.000000026897695 1878.7932657895108 0 0 0 0 0 0 0 0 0
"Angola"  30439530825.347652  13.544369752501368 2073.2149592902338 0 0 0 0 0 0 0 0 0
"Angola"  32653786720.212154   7.274277345367764  2164.081638131306 0 0 0 0 0 0 0 0 0
"Angola"  34185623674.956352   4.691146444574585  2204.909862849585 0 0 0 0 0 0 0 0 0
"Angola"   34931379545.11695   2.181489731623415  2190.087274328473 0 0 0 0 0 0 0 0 0
"Angola"  35998401929.875435   3.054624233721796 2189.5607527822303 0 0 0 0 0 0 0 0 0
"Angola"  37512494194.487854   4.205998553940972 2208.7915360030192 0 0 0 0 0 0 0 0 0
"Angola"   42638834053.56884  13.665686511013874  2426.431783481753 0 0 0 0 0 0 0 0 0
"Angola"   43913671240.49369  2.9898500163565274 2412.3925214334604 0 0 0 0 0 0 0 0 0
"Angola"   48723474948.03264  10.952861766437167   2582.64647618106 0 0 0 0 0 0 0 0 0
"Angola"   56046084738.63072  15.028915319377802  2866.434693689993 0 0 0 0 0 0 0 0 0
end

I have created 9 dummies, each of which corresponds to a year in a country's World Cup cycle. The dummy named t refers to the year of the World Cup, whilst the dummies Pre4, Pre3, Pre2, and Pre1 refer to the years preceding the World Cup. The dummies Post1, Post2, Post3, and Post4 refer to the years following the World Cup.

I am looking to test the exogeneity of the dummies I use. Specifically, I want to establish whether countries with high growth in the years before a World Cup host is selected are more likely to be selected as hosts.

I have seen a similar approach taken in Sterken (2006), which is described as:

"In order to test this selection bias hypothesis we estimated a binary choice (logit) model with the event dummy variables as dependent variables (taking the value 1 if a country organized an event and 0 in other cases) and lagged GDP per capita growth as determinants. There is no endogeneity of the events found for lags up to eight years."

I have tried to implement this using a logit model to regress my dummy for the World Cup year (t) on up to 12 lags of GDP growth, as below.

Code:

*1 Lag
gen lagOwnGDPGROWTH=OwnGDPGROWTH[_n-1]
*2 lags 
gen lagOwnGDPGROWTH2=OwnGDPGROWTH[_n-2]
*3 lags 
gen lagOwnGDPGROWTH3=OwnGDPGROWTH[_n-3]
*4 Lags
gen lagOwnGDPGROWTH4=OwnGDPGROWTH[_n-4]
*5 Lags
gen lagOwnGDPGROWTH5=OwnGDPGROWTH[_n-5]
*6 Lags
gen lagOwnGDPGROWTH6=OwnGDPGROWTH[_n-6]
*7 Lags
gen lagOwnGDPGROWTH7=OwnGDPGROWTH[_n-7]
*8 Lags
gen lagOwnGDPGROWTH8=OwnGDPGROWTH[_n-8]
*9 Lags
gen lagOwnGDPGROWTH9=OwnGDPGROWTH[_n-9]
*10 Lags
gen lagOwnGDPGROWTH10=OwnGDPGROWTH[_n-10]
*11 Lags
gen lagOwnGDPGROWTH11=OwnGDPGROWTH[_n-11]
*12 Lags
gen lagOwnGDPGROWTH12=OwnGDPGROWTH[_n-12]


logistic t lagOwnGDPGROWTH lagOwnGDPGROWTH2 lagOwnGDPGROWTH3 lagOwnGDPGROWTH4 lagOwnGDPGROWTH5 lagOwnGDPGROWTH6 lagOwnGDPGROWTH7 lagOwnGDPGROWTH8 lagOwnGDPGROWTH9 lagOwnGDPGROWTH10 lagOwnGDPGROWTH11 lagOwnGDPGROWTH12

This gives the output:

Code:

Logistic regression                             Number of obs     =        286
                                                LR chi2(12)       =      11.27
                                                Prob > chi2       =     0.5057
Log likelihood =  -27.24784                     Pseudo R2         =     0.1714

-----------------------------------------------------------------------------------
                t | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
  lagOwnGDPGROWTH |   .7803537   .1386373    -1.40   0.163     .5508932     1.10539
 lagOwnGDPGROWTH2 |   1.123929   .1439277     0.91   0.362      .874452     1.44458
 lagOwnGDPGROWTH3 |   1.009245   .1552665     0.06   0.952     .7465252    1.364423
 lagOwnGDPGROWTH4 |    .831256    .105247    -1.46   0.144     .6485784    1.065386
 lagOwnGDPGROWTH5 |   .6763069   .1177278    -2.25   0.025     .4808075    .9512975
 lagOwnGDPGROWTH6 |   1.157211   .1726232     0.98   0.328     .8638482      1.5502
 lagOwnGDPGROWTH7 |   1.207327   .1974243     1.15   0.249     .8762637     1.66347
 lagOwnGDPGROWTH8 |   1.110775   .1947707     0.60   0.549     .7877176    1.566323
 lagOwnGDPGROWTH9 |   .9037453    .163654    -0.56   0.576     .6337339    1.288799
lagOwnGDPGROWTH10 |   .9988944   .1671454    -0.01   0.995     .7195942    1.386601
lagOwnGDPGROWTH11 |   1.169153   .1938346     0.94   0.346     .8447922    1.618052
lagOwnGDPGROWTH12 |   .8571109   .1352739    -0.98   0.329     .6290653    1.167826
            _cons |   .0276128   .0234739    -4.22   0.000     .0052179    .1461254
-----------------------------------------------------------------------------------

I am unsure whether this is the correct approach or how to interpret the odds ratios. I also have the logit coefficients, which appear when I use outreg2 to export the tables to Word.

I hope I have been as clear as I can with my issue. Any help is much appreciated.

Nic

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

03 Jul 2019, 15:31

I don't see how year dummies would be endogenous. I also don't see that the logit done here is a test for endogeneity.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#3

03 Jul 2019, 21:34

I believe what you want to do is test for strict exogeneity of your explanatory variable, which is a dummy variable. You can't test for contemporaneous exogeneity without an instrumental variable. I'm guessing the paper by Sterken (2006) has in mind that you should see whether the explanatory variable is correlated with the past dependent variable. But you should use a more direct feedback test in the context of fixed effects estimation. In the model with GDP growth, say, as the dependent variable, put next year's or the next couple of years' dummy variables as explanatory variables, along with the current and lagged indicators. This tests directly for the kind of feedback that invalidates fixed effects. If "event" is the dummy variable, and you want to allow it to have, say, a two-year lagged effect, then here is the code. The second and third commands test for feedback.

Code:

xtreg GDP_growth event L.event L2.event i.year, fe cluster(country)
xtreg GDP_growth event L.event L2.event i.year F.event F2.event, fe cluster(country)
test F1.event F2.event
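
For reference, the second regression can be written out as one equation. This is only a sketch of the feedback test: the notation ($y_{it}$ for GDP growth, $c_i$ for the country effect, $\lambda_t$ for the year effect, $u_{it}$ for the error) is introduced here for illustration and does not come from the post.

```latex
\begin{aligned}
y_{it} &= \beta_0\,\mathrm{event}_{it} + \beta_1\,\mathrm{event}_{i,t-1} + \beta_2\,\mathrm{event}_{i,t-2} \\
       &\quad + \gamma_1\,\mathrm{event}_{i,t+1} + \gamma_2\,\mathrm{event}_{i,t+2}
        + c_i + \lambda_t + u_{it},
\qquad H_0:\ \gamma_1 = \gamma_2 = 0 .
\end{aligned}
```

Under strict exogeneity the lead terms should not matter once the current and lagged indicators are controlled for, so a rejection of $H_0$ points to feedback from growth shocks to future values of the event dummy.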
Kye Lippold

Join Date: Jun 2019

Posts: 67
#4

03 Jul 2019, 22:40

The method described (using a logit model to test whether previous GDP growth is associated with countries receiving the World Cup, rather than the World Cup increasing future GDP by itself) seems reasonable to me. It won't necessarily tell you anything causal, but you can at least see the correlation. (To put it another way, you are basically seeing if lagged GDP would make sense to include in a propensity score equation).

Your method has an issue, though--you aren't using the -xtset- structure of the data when defining your lags. So your lagOwnGDPGROWTH values for year 1990 for Andorra will be defined as the year 2016 values for Algeria, and so forth--not what you want. The right way to do it would be either with a -tsvarlist-

Code:

gen lagOwnGDPGROWTH = L.OwnGDPGROWTH

or with a -by- statement:

Code:

bysort country_id (year): gen lagOwnGDPGROWTH = OwnGDPGROWTH[_n-1]

The tsvarlist option is usually better, because you can just run the regression without needing to generate all the variables first, as follows:

Code:

logistic t L(1/12).OwnGDPGROWTH
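
To see the difference concretely, here is a minimal sketch on a made-up two-country panel. The variable names (country_id, year, growth) and the numbers are illustrative assumptions, not values from the original data.

```stata
* Toy panel: 2 hypothetical countries x 3 years
clear
input country_id year growth
1 2014 1
1 2015 2
1 2016 3
2 2014 10
2 2015 20
2 2016 30
end
xtset country_id year

* Naive lag: takes the previous row, ignoring the panel structure
gen lag_naive = growth[_n-1]

* Panel-aware lag: missing in each country's first year
gen lag_panel = L.growth

list, sepby(country_id)

* Country 2's first year borrows country 1's last value under the naive lag
assert lag_naive == 3 if country_id == 2 & year == 2014
assert missing(lag_panel) if country_id == 2 & year == 2014
```

The two versions agree everywhere except at the first observation of each country after the first, which is exactly where the naive version mixes countries.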

To interpret the odds ratios, see here: https://stats.idre.ucla.edu/stata/fa...ic-regression/

To determine if there is a relationship between all the growth lags and World Cup hosting, you want a joint test on the lags--but you can also just look at the chi2 test reported in your output. Right now, your chi2 test has p = 0.5057 > 0.05 -- meaning the lags of GDP taken together do not significantly predict World Cup hosting. (Once you fix the variables, you might get a different result).
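
As a rough illustration of the joint test, the sketch below simulates a panel in which hosting is unrelated to growth and then tests the lags together. Everything here (the variable names country_id, year, growth, host, the panel size, and the use of 3 lags rather than 12) is an assumption made for the example.

```stata
* Simulated panel: 40 hypothetical countries x 20 years
clear
set seed 12345
set obs 40
gen country_id = _n
expand 20
bysort country_id: gen year = 1996 + _n
xtset country_id year

gen growth = rnormal(3, 2)       // simulated growth rates
gen host   = runiform() < 0.10   // simulated hosting dummy, unrelated to growth

* Logit of the hosting dummy on three lags of growth
logit host L(1/3).growth

* Joint Wald test that all three lag coefficients are zero
testparm L(1/3).growth
display "Joint test p-value: " r(p)
```

When the lags are the only regressors, this Wald test addresses the same hypothesis as the LR chi2 in the output header, so the two should usually lead to the same conclusion even though the statistics are not identical.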

Last edited by Kye Lippold; 03 Jul 2019, 22:55.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

05 Jul 2019, 06:27

The problem with the proposed test, even as corrected by Kye, is that you could reject even though there is not a problem. Christopher Sims showed long ago the equivalence of Granger and Sims causality. Sims causality is essentially what I proposed, and it has the benefit of allowing country and time effects. Both are likely to be very important. The other test is not a test of Granger causality. Lagged "event" or "t" would have to be included. And country effects and time effects. But it is very difficult to sensibly estimate logit models with fixed effects and a lagged dependent variable. Just putting in country dummies need not work well.

Nicholas Hamer

Join Date: May 2019
Posts: 23

08 Jul 2019, 02:32

Thanks all for the guidance, it has been very helpful. I have attempted both suggested approaches with different levels of success.

I implemented Kye's method like this:

Code:

gen lagOwnGDPGROWTH = L.OwnGDPGROWTH

logistic t L(1/12).OwnGDPGROWTH
outreg2 using "C:\Users\nicha\OneDrive\Documents\EndoTest.xml" , replace ctitle(Logit Coeff)  excel word dec(3) nocons title(Test for Endogeneity of World Cup Dummies )

logistic t L(1/12).OwnGDPGROWTH
outreg2 using "C:\Users\nicha\OneDrive\Documents\EndoTest.xml" , append ctitle(Odds Ratio)   excel word dec(3)  nocons title(Test for Endogeneity of World Cup Dummies ) eform

This produced the following output.

Code:

logistic t L(1/12).OwnGDPGROWTH

Logistic regression                             Number of obs     =        117
                                                LR chi2(12)       =       9.95
                                                Prob > chi2       =     0.6202
Log likelihood = -8.9763132                     Pseudo R2         =     0.3566

------------------------------------------------------------------------------
           t | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
OwnGDPGROWTH |
         L1. |   .5656405   .2156382    -1.49   0.135     .2679405    1.194105
         L2. |    1.10017   .4727135     0.22   0.824     .4739382    2.553867
         L3. |   .6951029   .2785166    -0.91   0.364     .3169468    1.524445
         L4. |   1.996551   1.143956     1.21   0.228     .6494876    6.137477
         L5. |   .9015574   .2859209    -0.33   0.744     .4842196    1.678589
         L6. |   2.340874   1.569126     1.27   0.204      .629222    8.708678
         L7. |   1.634971   .9273095     0.87   0.386     .5379386    4.969213
         L8. |   1.277927   .4678695     0.67   0.503     .6235445    2.619054
         L9. |   .7404189    .343347    -0.65   0.517     .2983722     1.83737
        L10. |   .7437728   .3159831    -0.70   0.486     .3234576    1.710264
        L11. |    .711831   .2719238    -0.89   0.374     .3366764    1.505016
        L12. |   .7148379   .2736704    -0.88   0.381     .3375448    1.513853
             |
       _cons |   .0018194   .0047165    -2.43   0.015     .0000113    .2927953
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

I wanted to ensure that my interpretation of this was correct. As I currently understand, the chi2 p-value of 0.6202 > 0.05 means that together the lags of GDP growth do not significantly predict World Cup hosting.

With respect to the odds ratios, if the value is less than 1 I have interpreted this as meaning that a country with higher growth in this period is less likely to become a host. If the value is greater than 1, I have interpreted this as meaning that a country with higher growth in this period is more likely to become a host. I am unsure if this is correct. Would I be able to say, as in Sterken (2006), that my dummies are found to be exogenous for these lags?

World Cups are typically assigned around 7 years prior to the hosting of the tournament. Would it thus make sense to disregard lags 1 to 6, as these occur after the host country has already been assigned? I would then estimate the equation below. I wanted to ensure that this makes sense to do.

Code:

logistic t L(7/12).OwnGDPGROWTH

My conclusions from this analysis would be qualitatively similar to the analysis with all 12 lags included.

I had more problems with Jeff's method. Specifically, I had trouble expanding the model to incorporate the 12 lags (I am not sure if this is even possible). I have considered just testing the 7th and 8th lags although I am not sure this is suitable as I am not sure I fully understand the method. The code I attempted is as below:

Code:

xtreg OwnGDPGROWTH t L.t L2.t L3.t L4.t L5.t L6.t L7.t L8.t L9.t L10.t L11.t L12.t i.year, fe cluster(Country)
xtreg OwnGDPGROWTH t L.t L2.t L3.t L4.t L5.t L6.t L7.t L8.t L9.t L10.t L11.t L12.t i.year F.t F2.t F3.t F4.t F5.t F6.t F7.t F8.t F9.t F10.t F11.t F12.t, fe cluster(Country)
test F1.t F2.t F3.t F4.t F5.t F6.t F7.t F8.t F9.t F10.t F11.t F12.t , coef

The output I obtained is as below, with many of the constraints dropped.

Code:

( 1)  F.t = 0
 ( 2)  F2.t = 0
 ( 3)  oF3.t = 0
 ( 4)  oF4.t = 0
 ( 5)  F5.t = 0
 ( 6)  F6.t = 0
 ( 7)  oF7.t = 0
 ( 8)  oF8.t = 0
 ( 9)  oF9.t = 0
 (10)  oF10.t = 0
 (11)  oF11.t = 0
 (12)  oF12.t = 0
       Constraint 2 dropped
       Constraint 3 dropped
       Constraint 4 dropped
       Constraint 7 dropped
       Constraint 8 dropped
       Constraint 9 dropped
       Constraint 10 dropped
       Constraint 11 dropped
       Constraint 12 dropped

 F(  3,     6) = 7.7e+16
            Prob > F =    0.0000
Warning:  variance matrix is nonsymmetric or highly singular


Constrained coefficients

                                (Std. Err. adjusted for clustering on Country)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |
         --. |          0  (omitted)
         L1. |  -5.377228          .        .       .            .           .
         L2. |   3.635992          .        .       .            .           .
         L3. |          0  (omitted)
         L4. |          0  (omitted)
         L5. |  -.8743378          .        .       .            .           .
         L6. |  -2.358823          .        .       .            .           .
         L7. |          0  (omitted)
         L8. |          0  (omitted)
         L9. |  -.6177994          .        .       .            .           .
        L10. |  -3.086667          .        .       .            .           .
        L11. |          0  (omitted)
        L12. |          0  (omitted)
             |
        year |
       2004  |   3.416629          .        .       .            .           .
       2005  |   .6605102          .        .       .            .           .
             |
           t |
         F1. |   7.80e-14          .        .       .            .           .
         F2. |   2.210522          .        .       .            .           .
         F3. |          0  (omitted)
         F4. |          0  (omitted)
         F5. |   1.05e-13          .        .       .            .           .
         F6. |  -2.66e-15          .        .       .            .           .
         F7. |          0  (omitted)
         F8. |          0  (omitted)
         F9. |          0  (omitted)
        F10. |          0  (omitted)
        F11. |          0  (omitted)
        F12. |          0  (omitted)
             |
       _cons |   6.206352          .        .       .            .           .
------------------------------------------------------------------------------

Thanks again for the help. I very much appreciate any further guidance you can offer me.

Last edited by Nicholas Hamer; 08 Jul 2019, 03:23.

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#7

08 Jul 2019, 08:09

Nicholas: You've put in way too many lead values, essentially reducing the data set to a few years. I would put in, at most, two or three lead values. Notice you only have (3,6) degrees of freedom in the F test, with the "6" being particularly alarming.
Kye Lippold

Join Date: Jun 2019

Posts: 67
#8

10 Jul 2019, 11:48

Jeff: Thank you for your clarification; I hadn't seen your first post when I posted (think we were editing at the same time), but I agree that your fixed effects model with feedback is a much better way to test this question. I hadn't considered how regressing the treatment dummy on lagged outcomes is a means of testing Granger causality, but I agree, a test like this in general should include lags of the dependent variable to deal with spurious correlation from time trends (and including fixed effects would be better overall). In this case, where the treatment is a binary variable that is only "on" for one period per country, I would not expect there to be a time trend in treatment, so I would think adding lags of treatment would not change the empirical results--does that sound right to you? (To put it another way--when doing things like estimating a propensity score equation for treatment that includes lagged outcomes, I typically haven't included lagged values of the treatment dummy in the model. Would it be a good idea to do so?)

Nic: I got your private message requesting comment, so am weighing in to clarify. I think that Jeff's method is a better way to get your answer. Looking at Sterken's paper (https://www.tandfonline.com/doi/full...84740601154516), he actually implements Jeff's recommended model (a growth model with time and country fixed effects and lags and leads for feedback) for his main analysis (in Table 2). So you will want to run that model to answer your question, not the logit.

My initial response assumed you wanted to replicate the logit model of Sterken exactly (maybe it is considered relevant in your field)--but looking at it more closely, I agree that the results of this model are not very useful. Your cited source (Sterken 2006) appears to draw an invalid conclusion from the logit model. He runs the model to see whether "expected economic growth indeed might be relevant to the choice of the host country"--in other words, to try to see if there is selection in the choice of the World Cup. He finds the lags in the model are not significant, and says that indicates selection is exogenous. But that is not a correct conclusion--failing to reject the null does not mean we accept the null that the effect of prior GDP growth is zero, since the test could be of low power. In particular, in Sterken's main model (Table 2), he finds that World Cup countries have significantly lower growth in the years prior to hosting (the coefficients on t-2 and t-3 are significant and negative). Note that this directly contradicts his conclusion from the logit model that prior growth and World Cup hosting are unrelated! This could have happened because the logit model was underpowered, or because of differing growth levels in the countries studied (i.e., the issues dealt with by fixed effects). Thus, the logit model doesn't give a reliable answer for whether growth is exogenous to World Cup selection; Jeff's model with feedback does.

An important question--why do you only have 117 data points in the logit model? Given your data description, I would expect the logit model to have (160 countries) * (27 years - 12 lags) = 2400 data points (approximately). You should check that you haven't dropped a lot of data somewhere along the way. Similarly, for the fixed effects model, you only have 27 years of data--so using 12 lags and leads on each side only gives you at most 3 years of observations (2002-2004). That is why you will want to use fewer lags and leads (Sterken's paper uses 4 of each).
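
The arithmetic behind these numbers can be checked with a few display commands. This is only a sketch and assumes a balanced panel of 160 countries over 1990-2016, as described in the original post.

```stata
* Sample-size arithmetic, assuming a balanced panel
local countries = 160
local years     = 2016 - 1990 + 1     // 27 years
local lags      = 12

* Logit with 12 lags: the first 12 years of each country are lost
display "Expected logit observations: " `countries' * (`years' - `lags')

* FE model with 12 lags and 12 leads: usable years run from 1990+12 to 2016-12
display "First usable year: " 1990 + `lags'
display "Last usable year:  " 2016 - `lags'
```

With 4 lags and 4 leads instead, the same calculation gives usable years from 1994 to 2012.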

Thus, the model I would advise actually running follows Jeff's recommendation:

Code:

xtreg OwnGDPGROWTH t L(1/4).t F(1/8).t i.year , fe cluster(Country)
test F1.t F2.t F3.t F4.t , coef

If you are concerned about testing for exogeneity at the time of selection of the cup countries (7 years before the event occurs), I would run the same model but define "t" as the year when selection is announced, rather than the year the cup is held (and include enough lags of treatment to go up to the date the event is hosted). This might be more interesting for your case, because presumably the sporting event would affect GDP both through host country investment in the years before the event (building stadiums, etc.), and through increased tourism, trade, etc. during and after the event. So the relevant date of "treatment" could very well be the time of announcement. (But you may not have enough data to say anything meaningful about this).
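
One possible way to build such an announcement dummy is sketched below on a toy panel. It assumes the selection is made exactly 7 years before the tournament (the thread only says "around 7 years"), and the variable name announce, the single country, and the 2014 hosting year are all hypothetical.

```stata
* Toy panel: one hypothetical host observed 2000-2016, hosting in 2014
clear
set obs 17
gen country_id = 1
gen year = 1999 + _n
xtset country_id year
gen t = (year == 2014)

* Announcement dummy: equals 1 in the year whose 7-year lead of t is 1.
* It is missing for the last 7 years of the panel, where the lead is unobserved.
gen announce = F7.t

list year t announce if t == 1 | announce == 1
assert announce == 1 if year == 2007
```

The missing values at the end of the panel are left as they are on purpose: setting them to 0 would assume that no tournament takes place in the 7 years after the sample ends.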

All that said, if you are just trying to understand what the logit model is telling you--here are my answers.

As I currently understand, the chi2 p-value of 0.6202 > 0.05 means that together the lags of GDP growth do not significantly predict World Cup hosting.

In principle, I agree with this interpretation of the regression. But note you have huge confidence intervals on the coefficients (failing to reject the null even for an odds ratio of 0.56)--so the test has low power. With this data, I would say you don't have enough information to say whether prior GDP growth predicts World Cup hosting or not (based on this model).

With respect to the odds ratios, if the value is less than 1 I have interpreted this as meaning that a country with higher growth in this period is less likely to become a host. If the value is greater than 1, I have interpreted this as meaning that a country with higher growth in this period is more likely to become a host. I am unsure if this is correct.

I agree with your interpretation. An odds ratio above 1 means that a higher x-value is associated with higher odds, and hence a higher probability, of the dependent variable equaling 1; an odds ratio below 1 means the opposite.
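
The link between the odds ratios and the logit coefficients that outreg2 reports can be sketched with the first-lag odds ratio from the output earlier in the thread. The scalar names are illustrative.

```stata
* Odds ratio reported for the first lag of growth
scalar or_L1 = .5656405

* The logit coefficient is the natural log of the odds ratio
scalar b_L1 = ln(or_L1)
display "Logit coefficient for L1: " b_L1

* An odds ratio below 1 corresponds to a negative coefficient
assert b_L1 < 0
assert reldif(exp(b_L1), or_L1) < 1e-12

* Percentage change in the odds of hosting per one-point rise in lagged growth
display "Change in odds (%): " 100 * (or_L1 - 1)
```

Read this way, the point estimate says that a one-point rise in the previous year's growth lowers the odds of hosting by roughly 43 percent, although the wide confidence interval means this is far from precisely estimated.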

Would I be able to say, as in Sterken (2006), that my dummies are found to be exogenous for these lags?

When Sterken writes "There is no endogeneity of the events found for lags up to eight years", I believe he is simply checking if each of the coefficients in his logit are significant. By this standard, you would reach the same conclusion as he does. BUT I don't think Sterken's original conclusion is justified, as detailed above. The more defensible way to answer this question is with the lags and leads of event time in the xtreg model.

World Cups are typically assigned around 7 years prior to the hosting of the tournament. Would it thus make sense to disregard lags 1 to 6 as these occur after the host country has already been assigned?

No, I would not use fewer lags. You are concerned not just about prior year GDP influencing the World Cup selection, but about GDP trends being different in all years leading up to the event. (See my point above).

Nicholas Hamer

Join Date: May 2019
Posts: 23

11 Jul 2019, 07:29

Hi,

Thanks to both of you for the detailed responses. I feel like I should have made some things clearer in my initial question to avoid confusion. Apologies for this; I was trying to keep my query as short as possible, but I left out valuable information.

why do you only have 117 data points in the logit model?

I only have 117 observations at this point as I have reduced my sample to include only the countries that hosted the World Cup or unsuccessfully bid to host it. There was also another small error, which excluded some data points, that has now been corrected.

In my analysis, the growth rates of host countries are compared to those countries who unsuccessfully bid to host the same World Cup. The models that I use to estimate the impact of the World Cup are variations of the following, which are similar to those in Sterken (2006) and can capture impacts beyond the year of the World Cup itself, which I agree is more appropriate.

Code:

reg OwnGDPGROWTH Pre4 Pre3 Pre2 Pre1 t Post1 Post2 Post3 Post4 i.year i.Country , cluster(CountryName)

I actually estimate 9 dummies to enable me to capture the impacts beyond the year of the World Cup and do this separately in 6 different sub-samples. I think this is similar to the model you suggest with lags and leads. An example of the output for one of the sub-samples using this approach is shown below:

Code:

Linear regression                               Number of obs     =        162
                                                F(4, 5)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.2241
                                                Root MSE          =     10.807

                            (Std. Err. adjusted for 6 clusters in CountryName)
------------------------------------------------------------------------------
             |               Robust
OwnGDPGROWTH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Pre4 |   5.092806   .9437865     5.40   0.003     2.666726    7.518887
        Pre3 |   10.67591   14.62349     0.73   0.498    -26.91496    48.26679
        Pre2 |  -3.659146   4.022882    -0.91   0.405    -14.00029    6.682001
        Pre1 |  -2.408727   3.484485    -0.69   0.520    -11.36588    6.548428
           t |  -.3363159   1.072218    -0.31   0.766     -3.09254    2.419908
       Post1 |  -2.763951   1.911053    -1.45   0.208     -7.67647    2.148568
       Post2 |  -2.885869   3.492905    -0.83   0.446    -11.86467    6.092929
       Post3 |  -.0987747   1.614039    -0.06   0.954    -4.247793    4.050244
       Post4 |  -2.343121   6.311543    -0.37   0.726    -18.56746    13.88122
             |
        year |
       1992  |   27.42669   26.79865     1.02   0.353    -41.46144    96.31481
       1993  |   26.33544   26.28451     1.00   0.362    -41.23104    93.90191
       1994  |   22.63955    21.3101     1.06   0.337     -32.1398    77.41891
       1995  |   22.47377   20.99864     1.07   0.333    -31.50495    76.45249
       1996  |   22.55007   23.36053     0.97   0.379    -37.50009    82.60023
       1997  |   24.29115   24.83938     0.98   0.373    -39.56052    88.14281
       1998  |   25.51277   27.94467     0.91   0.403    -46.32128    97.34681
       1999  |   22.22362   24.53556     0.91   0.407    -40.84705    85.29428
       2000  |    21.4149   21.18292     1.01   0.358    -33.03753    75.86733
       2001  |   20.24687   21.61087     0.94   0.392    -35.30563    75.79937
       2002  |   19.63245   19.59116     1.00   0.362    -30.72822    69.99313
       2003  |     13.785    13.5528     1.02   0.356    -21.05357    48.62357
       2004  |   29.68065   28.79745     1.03   0.350    -44.34555    103.7068
       2005  |   22.40984    21.5882     1.04   0.347    -33.08439    77.90407
       2006  |   23.57809   22.50404     1.05   0.343    -34.27039    81.42657
       2007  |   22.55836   20.78301     1.09   0.327    -30.86607    75.98279
       2008  |   22.41753   22.29749     1.01   0.361    -34.89999    79.73505
       2009  |    19.4936   21.95156     0.89   0.415    -36.93468    75.92187
       2010  |   21.26696   22.37958     0.95   0.386    -36.26157    78.79548
       2011  |   20.38663   20.81378     0.98   0.372    -33.11689    73.89015
       2012  |   23.20477    23.9706     0.97   0.377    -38.41363    84.82317
       2013  |   22.06126   22.80935     0.97   0.378    -36.57205    80.69456
       2014  |   20.40287   21.15379     0.96   0.379    -33.97469    74.78043
       2015  |   20.02728   21.67411     0.92   0.398    -35.68781    75.74236
       2016  |   21.32224   23.93651     0.89   0.414    -40.20852    82.85301
       2017  |    19.1717   21.14272     0.91   0.406    -35.17739    73.52078
             |
     Country |
         11  |   .7328593   .4135862     1.77   0.137    -.3302978    1.796016
         43  |   1.171458   .4135862     2.83   0.037     .1083006    2.234615
         93  |   1.714289   .4135862     4.14   0.009      .651132    2.777446
        100  |   2.496825   .4135862     6.04   0.002     1.433668    3.559982
        207  |   .2728263   .2178162     1.25   0.266    -.2870881    .8327406
             |
       _cons |  -19.08876   21.89014    -0.87   0.423    -75.35915    37.18164
------------------------------------------------------------------------------

I appreciate that my samples here are small and the resulting confidence intervals are very large, but this is a separate issue.

My specific question here regards testing whether the assignment of host status to a country is related to a country's growth performance prior to this (e.g. 7 years prior to t). I had originally wanted to replicate Sterken's logit model but I had not realised the issues Jeff highlighted with this. Thanks for your comments explaining this approach and its limitations.

I have thus implemented what you suggested, with t defined as the year of the World Cup. This produced the following output:

Code:

Fixed-effects (within) regression               Number of obs     =        255
Group variable: Country                         Number of groups  =         17

R-sq:                                           Obs per group:
     within  = 0.1537                                         min =         15
     between = 0.1197                                         avg =       15.0
     overall = 0.1214                                         max =         15

                                                F(16,16)          =          .
corr(u_i, Xb)  = -0.0928                        Prob > F          =          .

                               (Std. Err. adjusted for 17 clusters in Country)
------------------------------------------------------------------------------
             |               Robust
OwnGDPGROWTH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |
         --. |   2.653496   1.121724     2.37   0.031     .2755478    5.031445
         L1. |   2.983787   1.756274     1.70   0.109    -.7393483    6.706923
         L2. |  -.3698165   1.615731    -0.23   0.822    -3.795014    3.055381
         L3. |   .9225178   .5290171     1.74   0.100    -.1989483    2.043984
         L4. |   .3999053   .6577751     0.61   0.552    -.9945157    1.794326
         F1. |   .4984292   .5331391     0.93   0.364    -.6317751    1.628634
         F2. |   1.474331   1.071948     1.38   0.188    -.7980982     3.74676
         F3. |   2.829454    1.34341     2.11   0.051    -.0184491    5.677357
         F4. |  -4.042115   3.565449    -1.13   0.274    -11.60053      3.5163
         F5. |   .9470954   .6841577     1.38   0.185    -.5032542    2.397445
         F6. |   1.300196   1.332141     0.98   0.344    -1.523817    4.124209
         F7. |   4.355499   1.447786     3.01   0.008      1.28633    7.424668
         F8. |   1.140999   1.277572     0.89   0.385    -1.567333    3.849331
             |
        year |
       1996  |   2.558152   1.731729     1.48   0.159    -1.112948    6.229253
       1997  |   2.492108   1.580758     1.58   0.134    -.8589502    5.843166
       1998  |   2.622827   2.500869     1.05   0.310    -2.678778    7.924433
       1999  |   1.180259   1.630144     0.72   0.480    -2.275493     4.63601
       2000  |   2.027257   1.036664     1.96   0.068    -.1703732    4.224886
       2001  |   .5364225   1.130106     0.47   0.641    -1.859295     2.93214
       2002  |  -.3211735   1.014379    -0.32   0.756    -2.471561    1.829214
       2003  |  -2.235074   2.904044    -0.77   0.453    -8.391373    3.921225
       2004  |   4.795163   2.970005     1.61   0.126    -1.500967    11.09129
       2005  |   1.654457   .9148319     1.81   0.089       -.2849    3.593814
       2006  |   3.112902   1.268141     2.45   0.026     .4245639     5.80124
       2007  |   1.888668   .9085485     2.08   0.054    -.0373689    3.814705
       2008  |   1.119909   1.216915     0.92   0.371    -1.459835    3.699653
       2009  |   -2.14663   1.237797    -1.73   0.102    -4.770642    .4773827
             |
       _cons |   1.920893   1.055881     1.82   0.088    -.3174753     4.15926
-------------+----------------------------------------------------------------
     sigma_u |  1.7600888
     sigma_e |  4.9089611
         rho |  .11391139   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test F1.t F2.t F3.t F4.t  , coef

 ( 1)  F.t = 0
 ( 2)  F2.t = 0
 ( 3)  F3.t = 0
 ( 4)  F4.t = 0

       F(  4,    16) =    1.41
            Prob > F =    0.2762
Warning:  variance matrix is nonsymmetric or highly singular


Constrained coefficients

                                (Std. Err. adjusted for clustering on Country)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |
         --. |   .6884362          .        .       .            .           .
         L1. |   4.884297          .        .       .            .           .
         L2. |  -.1435667          .        .       .            .           .
         L3. |   1.678332          .        .       .            .           .
         L4. |   .9637783          .        .       .            .           .
         F1. |   3.33e-16          .        .       .            .           .
         F2. |   6.66e-16          .        .       .            .           .
         F3. |   8.88e-16          .        .       .            .           .
         F4. |  -5.33e-15          .        .       .            .           .
         F5. |   .6150695          .        .       .            .           .
         F6. |   .0674169          .        .       .            .           .
         F7. |   4.151843          .        .       .            .           .
         F8. |   1.892883          .        .       .            .           .
             |
        year |
       1996  |   2.698131          .        .       .            .           .
       1997  |   3.540787          .        .       .            .           .
       1998  |   4.581715          .        .       .            .           .
       1999  |   1.995063          .        .       .            .           .
       2000  |   1.853367          .        .       .            .           .
       2001  |   .3306032          .        .       .            .           .
       2002  |  -1.109568          .        .       .            .           .
       2003  |  -5.296549          .        .       .            .           .
       2004  |   7.366505          .        .       .            .           .
       2005  |   1.681484          .        .       .            .           .
       2006  |   2.954729          .        .       .            .           .
       2007  |   1.684928          .        .       .            .           .
       2008  |   1.419183          .        .       .            .           .
       2009  |  -2.212433          .        .       .            .           .
             |
       _cons |   1.792721          .        .       .            .           .
------------------------------------------------------------------------------

I also attempted this by defining t (now t1) as the year of assignment (7 years prior).

Code:

xtreg OwnGDPGROWTH t1 L(1/4).t1 F(1/8).t1 i.year , fe cluster(Country)
test F1.t1 F2.t1 F3.t1 F4.t1  , coef

Code:

Fixed-effects (within) regression               Number of obs     =        255
Group variable: Country                         Number of groups  =         17

R-sq:                                           Obs per group:
     within  = 0.1467                                         min =         15
     between = 0.0839                                         avg =       15.0
     overall = 0.1301                                         max =         15

                                                F(16,16)          =          .
corr(u_i, Xb)  = -0.0139                        Prob > F          =          .

                               (Std. Err. adjusted for 17 clusters in Country)
------------------------------------------------------------------------------
             |               Robust
OwnGDPGROWTH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |
         --. |   3.693964   1.233568     2.99   0.009     1.078916    6.309012
         L1. |   .8353307   1.402383     0.60   0.560    -2.137588    3.808249
         L2. |   .4039902   .8380209     0.48   0.636    -1.372535    2.180515
         L3. |  -4.803118    3.65856    -1.31   0.208    -12.55892    2.952683
         L4. |   2.120379     1.3889     1.53   0.146     -.823958    5.064715
         F1. |   .6793627   .7083436     0.96   0.352    -.8222586    2.180984
         F2. |   .2586615   .4348103     0.59   0.560    -.6630951    1.180418
         F3. |  -.0180365   1.130893    -0.02   0.987    -2.415422    2.379349
         F4. |   .5673288   1.813355     0.31   0.758    -3.276813     4.41147
         F5. |  -1.189435   2.646489    -0.45   0.659    -6.799742    4.420872
         F6. |  -1.140072   .9780018    -1.17   0.261    -3.213343    .9331992
         F7. |   .3028983   .9057945     0.33   0.742      -1.6173    2.223097
         F8. |  -.5272299   1.274282    -0.41   0.685    -3.228588    2.174128
             |
        year |
       1996  |   2.333928   1.685105     1.39   0.185    -1.238335    5.906191
       1997  |   2.380582   1.511749     1.57   0.135     -.824183    5.585347
       1998  |   2.635867   2.402159     1.10   0.289    -2.456482    7.728217
       1999  |   1.110769   1.587329     0.70   0.494    -2.254219    4.475757
       2000  |   1.836128   1.035823     1.77   0.095    -.3597187    4.031975
       2001  |   .3751063   1.136715     0.33   0.746    -2.034621    2.784834
       2002  |  -.2144201   1.083297    -0.20   0.846    -2.510907    2.082067
       2003  |  -2.204567   2.908035    -0.76   0.459    -8.369325    3.960192
       2004  |   4.453993   2.804066     1.59   0.132    -1.490362    10.39835
       2005  |   1.453652   .9187917     1.58   0.133    -.4940992    3.401403
       2006  |   2.905801   1.236088     2.35   0.032     .2854114    5.526191
       2007  |   1.643224   .9325815     1.76   0.097     -.333761    3.620208
       2008  |    .907911   1.221445     0.74   0.468    -1.681437    3.497259
       2009  |  -2.411431   1.247395    -1.93   0.071     -5.05579    .2329278
             |
       _cons |   2.246961   1.032329     2.18   0.045     .0585219      4.4354
-------------+----------------------------------------------------------------
     sigma_u |  1.6293552
     sigma_e |  4.9292742
         rho |    .098499   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test F1.t1 F2.t1 F3.t1 F4.t1  , coef

 ( 1)  F.t1 = 0
 ( 2)  F2.t1 = 0
 ( 3)  F3.t1 = 0
 ( 4)  F4.t1 = 0

       F(  4,    16) =    0.40
            Prob > F =    0.8025
Warning:  variance matrix is nonsymmetric or highly singular


Constrained coefficients

                                (Std. Err. adjusted for clustering on Country)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |
         --. |   3.490757          .        .       .            .           .
         L1. |   1.664232          .        .       .            .           .
         L2. |   .0216596          .        .       .            .           .
         L3. |  -7.100768          .        .       .            .           .
         L4. |   2.439465          .        .       .            .           .
         F1. |   1.11e-16          .        .       .            .           .
         F2. |  -3.33e-16          .        .       .            .           .
         F3. |  -2.32e-16          .        .       .            .           .
         F4. |  -9.99e-16          .        .       .            .           .
         F5. |  -.6223671          .        .       .            .           .
         F6. |  -1.148199          .        .       .            .           .
         F7. |   .0615866          .        .       .            .           .
         F8. |  -1.314433          .        .       .            .           .
             |
        year |
       1996  |   1.954583          .        .       .            .           .
       1997  |   2.064994          .        .       .            .           .
       1998  |   1.905463          .        .       .            .           .
       1999  |   1.006008          .        .       .            .           .
       2000  |   2.095411          .        .       .            .           .
       2001  |   .2367843          .        .       .            .           .
       2002  |   .1313834          .        .       .            .           .
       2003  |  -1.096031          .        .       .            .           .
       2004  |   3.168245          .        .       .            .           .
       2005  |   1.261396          .        .       .            .           .
       2006  |   2.460325          .        .       .            .           .
       2007  |   1.440149          .        .       .            .           .
       2008  |    .241585          .        .       .            .           .
       2009  |  -3.061001          .        .       .            .           .
             |
       _cons |   2.507001          .        .       .            .           .
------------------------------------------------------------------------------

Again I am not sure how this should be interpreted. I believe the first part of this approach is similar to the one I have implemented in the first regression shown in this post. I find the test code harder to understand.

I hope I have explained my specific issue more clearly now.

Alternatively, would I be able to tackle my issue by adding additional PreX dummies to my original model (e.g. Pre7,...Pre10)? This would show whether growth in these periods is systematically higher in host countries than in the countries that unsuccessfully bid to host, which may suggest that countries with higher growth in these preceding years are chosen as hosts more often.

Thanks again for your help, it is very much appreciated.

Nic


Kye Lippold

Join Date: Jun 2019

Posts: 67
#10

11 Jul 2019, 23:22

Glad you found the advice helpful so far!

Your main model is indeed identical to the model Jeff and I suggested (but if you expand the sample to more countries, you will probably want to use the -xtreg- version, just so your output doesn't get filled with 100+ country dummies). I meant to write my model with only 4 leads, but typed 8 by mistake. So it should have been:

Code:

xtreg OwnGDPGROWTH F(4/1).t t L(1/4).t i.year , fe cluster(Country)

which is the same as your main model. (You can read -help tsvarlist- to learn why--briefly, L1.t is exactly the same as what you call "Post1", L2.t is "Post2", F1.t is "Pre1", etc. Using the lag notation just saves you the time of generating new variables for every pre/post period of interest. I also changed the order of the coefficients so you have results in the same order as your model with the pre/post dummies). Given this, hopefully it is clear that the models you already ran with F4-F8 already include additional "Pre" dummies.
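
To see the equivalence concretely, here is a minimal sketch on a made-up two-country panel. The data, the event year 2005, and the hand-built Pre1/Post1 below are assumptions for illustration only, not your dataset. It checks that F1.t and L1.t match dummies constructed by hand:

```stata
* Toy panel: 2 countries, years 2001-2010 (made-up data)
clear
set obs 20
gen Country = ceil(_n/10)
gen year    = 2001 + mod(_n - 1, 10)
xtset Country year

* Hypothetical event: country 1 hosts in 2005
gen byte t = (Country == 1 & year == 2005)

* Hand-built dummies for the year before and the year after the event
gen byte Pre1  = (Country == 1 & year == 2004)
gen byte Post1 = (Country == 1 & year == 2006)

* The lead equals Pre1 and the lag equals Post1 wherever they are defined
assert F1.t == Pre1  if !missing(F1.t)
assert L1.t == Post1 if !missing(L1.t)
```

If both assert lines pass silently, the operators and the hand-made dummies agree. The only difference is that F1.t is missing in the last year of each panel and L1.t is missing in the first year.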

My main suggestion is to use all countries, not limit your sample to only 6 countries (the ones that were hosting the world cup or bidding to host). Why? Because 1) the countries that never hosted or bid can be used to identify your time effects and country fixed effects, and 2) the decision to make a bid is likely correlated with growth. The fixed effect growth model effectively asks "is growth higher in hosting countries than would be expected, given world GDP trends and the country's average growth rate?" So you would ideally include the non-bidding countries to get your projection of what is "expected" growth for a country (your counterfactual). Doing this will make your sample much bigger and help with precision. (You could potentially exclude countries that are not in FIFA at all, under the argument that they would never have been in contention for the cup--but I would include any country that could have potentially made a bid, even if they didn't do so).

If you are concerned that countries that make bids are different from other countries in general, you can check this directly with a placebo test. Define b as a binary variable that is equal to what t would have been for each country that made a bid (e.g. if Spain was bidding for the cup in 2006, but didn't succeed, set b to 1 in 2006 for Spain). Then run the same model using b instead of t and dropping all countries that actually hosted:

Code:

xtset
bysort Country (year): egen ever_hosted = max(t)
xtreg OwnGDPGROWTH F(4/1).b b L(1/4).b i.year if ever_hosted==0, fe cluster(Country)

If you find effects of the placebo (i.e. significant coefficients on b and/or the post values), you would be concerned about selection into bidding.
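
As a rough sketch of how b and ever_hosted could be built, assuming you have (or can create) a variable holding the year of the tournament each country bid for. The toy panel and the name bid_year are hypothetical:

```stata
* Toy panel: 3 countries, years 2001-2010 (made-up data)
clear
set obs 30
gen Country = ceil(_n/10)
gen year    = 2001 + mod(_n - 1, 10)
xtset Country year

* Hypothetical setup: country 1 hosts in 2006, country 2 bid for 2006
* and lost, country 3 never bid
gen byte t   = (Country == 1 & year == 2006)
gen bid_year = 2006 if Country == 2
gen byte b   = (year == bid_year)   // 0 whenever bid_year is missing

* Flag countries that ever hosted, keeping the panel sort order intact
bysort Country (year): egen ever_hosted = max(t)

* Size of the placebo estimation sample (never-hosts only)
count if ever_hosted == 0
```

The placebo regression is then run only on the observations with ever_hosted == 0, so the actual hosts cannot contribute to the coefficients on b.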

(A different approach would be to use a staggered Diff-in-Diff incorporating the variation in timing of when the cup is hosted, as in Fadlon and Nielsen 2015: http://pages.ucsd.edu/~yfadlon/pdfs/...dresponses.pdf. But I expect you don't have enough data to implement this approach effectively).

Thinking more about the distinction between selection versus hosting (your t1 vs t variables)--effectively, you have a policy with two different periods of effect. The policy first appears as a news shock when the hosting decision is announced, and then when the cup actually happens, you have the shock of the event itself. The trouble is that you can't separate out whether differences in GDP growth prior to the event reflect differing pre-existing trends in the selected country, or anticipatory effects from knowing the event is coming. Given this, I might do the following:

1. Run the test for exogeneity on the date of selection (your variable t1). This tests whether FIFA is choosing the hosting country based on growth.

Code:

xtreg OwnGDPGROWTH F(3/1).t1 t1 i.year , fe cluster(Country)
test F1.t1 F2.t1 F3.t1 , coef

I include only 3 leads, because 4 years seems like a long time in growth terms. If you are on trend for 3 years, that is probably enough to say growth is consistent (and will give you a little more sample)--but 4 could work as well.
To understand the test output--you mainly care about the p-value of the test that all the coefficients are equal to zero (which is 0.8025 in your last example). If p < 0.05, we would reject the null that all the pre-period coefficients are zero--in other words, we reject that countries chosen as hosts have parallel GDP growth to other countries. The reason the test output looks so strange (all the dropped standard errors) is likely because you have so few observations, so Stata doesn't have much variation left to compute the standard errors. Once you increase your sample, you will likely get sensible output.
If the p-value > 0.05, then you fail to reject the null that there are parallel trends - which suggests (but does not prove) that world cup selection is exogenous.

2. Then run the main model as you already have, including the 4 leads and 4 lags to test for effects before and after the cup is hosted. If you find the "pre" variables are significant, it could indicate either a violation of parallel trends, or anticipation effects... but the evidence from the model in point 1 could help clarify what is happening.
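
A minimal sketch of step 1 on simulated data, so the shape of the output is clear before you run it on the real panel. Everything here is made up (30 countries, 20 years, countries 1-5 "selected" in 2000, growth drawn as pure noise), so the numbers mean nothing. Leaving off the coef option prints only the joint F test, without the constrained-coefficients table, and test leaves the p-value in r(p):

```stata
* Simulated panel (all values are assumptions for illustration)
clear
set seed 12345
set obs 600
gen Country = ceil(_n/20)
gen year    = 1991 + mod(_n - 1, 20)
xtset Country year

* Hypothetical selection dummy: countries 1-5 are selected in 2000
gen byte t1 = (Country <= 5 & year == 2000)

* Simulated growth with no true pre-trend
gen growth = rnormal(2, 3)

* Leads of the selection dummy plus year effects and country fixed effects
xtreg growth F(3/1).t1 t1 i.year, fe cluster(Country)

* Joint test that all three leads are zero; store the p-value
test F1.t1 F2.t1 F3.t1
scalar p_joint = r(p)
display "Joint p-value for the three leads: " p_joint
```

On your real data the same two commands apply with OwnGDPGROWTH in place of growth. A small p_joint is evidence against parallel pre-selection growth; a large one is only a failure to reject.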

Nicholas Hamer

Join Date: May 2019
Posts: 23

#11

12 Jul 2019, 04:02

Hi, thanks again for your detailed reply. It is encouraging that my model was the same as what you suggested! I will strive to use the lag notation in the future.

I have now run the test for exogeneity on the full sample of countries, as you suggested in 1.

Code:

xtreg OwnGDPGROWTH F(4/1).t1 t1  i.year , fe cluster(Country)
test F1.t1 F2.t1 F3.t1 F4.t1 , coef

The output does look more sensible in this way and is as below.

Code:

Fixed-effects (within) regression               Number of obs     =      3,793
Group variable: Country                         Number of groups  =        165

R-sq:                                           Obs per group:
     within  = 0.0658                                         min =         21
     between = 0.0037                                         avg =       23.0
     overall = 0.0565                                         max =         23

                                                F(27,164)         =    5855.11
corr(u_i, Xb)  = -0.0039                        Prob > F          =     0.0000

                              (Std. Err. adjusted for 165 clusters in Country)
------------------------------------------------------------------------------
             |               Robust
OwnGDPGROWTH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |
         F4. |   2.284326    1.81898     1.26   0.211    -1.307313    5.875965
         F3. |   1.408977   .9805844     1.44   0.153    -.5272212    3.345175
         F2. |   .7899224    .969165     0.82   0.416    -1.123727    2.703572
         F1. |   1.824204   1.192234     1.53   0.128    -.5299037    4.178311
         --. |   1.734545   .7121378     2.44   0.016     .3284045    3.140686
             |
        year |
       1992  |   .0169227   1.043822     0.02   0.987    -2.044139    2.077985
       1993  |   .5208019   .9914119     0.53   0.600    -1.436775    2.478379
       1994  |   .9075229    .991018     0.92   0.361    -1.049276    2.864322
       1995  |   2.603797   .9079204     2.87   0.005     .8110769    4.396517
       1996  |   3.318816   .9539918     3.48   0.001     1.435127    5.202506
       1997  |   3.347204   1.077689     3.11   0.002     1.219269    5.475139
       1998  |   2.245179    1.04841     2.14   0.034     .1750569    4.315301
       1999  |   2.239737   .9680196     2.31   0.022     .3283493    4.151126
       2000  |   3.051624   .8923962     3.42   0.001     1.289557    4.813692
       2001  |   2.084328   .9560553     2.18   0.031     .1965636    3.972092
       2002  |   2.192957   .8802679     2.49   0.014     .4548375    3.931076
       2003  |   2.778802   .7402921     3.75   0.000     1.317069    4.240534
       2004  |   4.524011   1.095758     4.13   0.000     2.360398    6.687624
       2005  |   3.926661   .9093815     4.32   0.000     2.131056    5.722266
       2006  |   4.498317   .9362509     4.80   0.000     2.649657    6.346977
       2007  |   4.559561   .8847155     5.15   0.000     2.812659    6.306462
       2008  |   2.762681   .9239623     2.99   0.003     .9382853    4.587077
       2009  |  -.6920835   .8815106    -0.79   0.434    -2.432657     1.04849
       2010  |   3.338225   .8711098     3.83   0.000     1.618189    5.058261
       2011  |   2.930187   .9159911     3.20   0.002     1.121531    4.738843
       2012  |    2.38536   .9294319     2.57   0.011     .5501645    4.220555
       2013  |   2.222929   .9358898     2.38   0.019     .3749822    4.070876
             |
       _cons |   .9937039   .8239187     1.21   0.230    -.6331521     2.62056
-------------+----------------------------------------------------------------
     sigma_u |  2.1830128
     sigma_e |  5.5049329
         rho |   .1358873   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test F1.t1 F2.t1 F3.t1 F4.t1 , coef

 ( 1)  F.t1 = 0
 ( 2)  F2.t1 = 0
 ( 3)  F3.t1 = 0
 ( 4)  F4.t1 = 0

       F(  4,   164) =    2.89
            Prob > F =    0.0239
Warning:  variance matrix is nonsymmetric or highly singular


Constrained coefficients

                                (Std. Err. adjusted for clustering on Country)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |
         F4. |  -8.88e-16          .        .       .            .           .
         F3. |  -2.22e-16          .        .       .            .           .
         F2. |   1.44e-15          .        .       .            .           .
         F1. |  -1.11e-15          .        .       .            .           .
         --. |   .9798687          .        .       .            .           .
             |
        year |
       1992  |   .0496838          .        .       .            .           .
       1993  |   .0892312          .        .       .            .           .
       1994  |   2.011886          .        .       .            .           .
       1995  |   2.449021          .        .       .            .           .
       1996  |   2.997502          .        .       .            .           .
       1997  |   2.783332          .        .       .            .           .
       1998  |   2.429947          .        .       .            .           .
       1999  |   2.091716          .        .       .            .           .
       2000  |   2.907634          .        .       .            .           .
       2001  |   1.696141          .        .       .            .           .
       2002  |   2.210631          .        .       .            .           .
       2003  |   2.806222          .        .       .            .           .
       2004  |   4.559728          .        .       .            .           .
       2005  |   3.874509          .        .       .            .           .
       2006  |   4.588809          .        .       .            .           .
       2007  |   4.530197          .        .       .            .           .
       2008  |   2.666386          .        .       .            .           .
       2009  |  -.7953148          .        .       .            .           .
       2010  |   3.404284          .        .       .            .           .
       2011  |   2.874236          .        .       .            .           .
       2012  |   2.354694          .        .       .            .           .
       2013  |   2.248895          .        .       .            .           .
             |
       _cons |   1.043858          .        .       .            .           .
------------------------------------------------------------------------------

As p = 0.0239 < 0.05, I believe this suggests that growth rates prior to the selection of a host country do influence the assignment, so the parallel trends assumption would be rejected. In my opinion, this would be expected as the countries that are selected to host a World Cup are not representative of the entire population of countries?

Can I interpret whether the chance of being selected is positively/negatively affected by the growth rate in the preceding period from this output?

Is it also possible for me to make valuable interpretations based on the individual coefficients, and their p-values shown in the first part of the output?

I also ran the test on the smaller sample of countries which either hosted or bid to host the World Cup. I acknowledge the problems you mention with this but I believe that this could still be useful to know for my analysis. The output for this is shown below.

Code:

Fixed-effects (within) regression               Number of obs     =        391
Group variable: Country                         Number of groups  =         17

R-sq:                                           Obs per group:
     within  = 0.0976                                         min =         23
     between = 0.0351                                         avg =       23.0
     overall = 0.0915                                         max =         23

                                                F(16,16)          =          .
corr(u_i, Xb)  = -0.0331                        Prob > F          =          .

                               (Std. Err. adjusted for 17 clusters in Country)
------------------------------------------------------------------------------
             |               Robust
OwnGDPGROWTH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |
         F4. |   6.469129   5.344997     1.21   0.244    -4.861758    17.80002
         F3. |  -.4051233   1.443056    -0.28   0.783    -3.464265    2.654018
         F2. |   .0202616   .9458773     0.02   0.983    -1.984909    2.025432
         F1. |   1.160813   1.110055     1.05   0.311    -1.192399    3.514025
         --. |   4.527132   2.608705     1.74   0.102    -1.003075    10.05734
             |
        year |
       1992  |   9.656502   8.733474     1.11   0.285    -8.857637    28.17064
       1993  |    8.77952   8.531489     1.03   0.319    -9.306429    26.86547
       1994  |   8.961711     6.9308     1.29   0.214    -5.730928    23.65435
       1995  |   6.958115   6.805376     1.02   0.322    -7.468638    21.38487
       1996  |    9.41244     7.3769     1.28   0.220     -6.22589    25.05077
       1997  |   9.323467   7.972873     1.17   0.259    -7.578268     26.2252
       1998  |   8.963248   8.679357     1.03   0.317    -9.436166    27.36266
       1999  |   7.944467   7.664559     1.04   0.315    -8.303672    24.19261
       2000  |   8.839136   6.928326     1.28   0.220    -5.848259    23.52653
       2001  |   7.284487   7.029684     1.04   0.315    -7.617778    22.18675
       2002  |   6.649712   6.425469     1.03   0.316    -6.971674     20.2711
       2003  |   4.361678   4.087364     1.07   0.302    -4.303146     13.0265
       2004  |   11.48832   9.428968     1.22   0.241    -8.500199    31.47684
       2005  |    8.45386   7.084341     1.19   0.250    -6.564273    23.47199
       2006  |   9.557364   7.292943     1.31   0.209    -5.902984    25.01771
       2007  |   8.681362   6.719467     1.29   0.215    -5.563272      22.926
       2008  |   7.919468   7.309662     1.08   0.295    -7.576324    23.41526
       2009  |   4.574753   7.257157     0.63   0.537    -10.80973    19.95924
       2010  |   8.681128   7.158668     1.21   0.243     -6.49457    23.85683
       2011  |   7.728206   7.286837     1.06   0.305    -7.719199    23.17561
       2012  |   7.884014   7.640629     1.03   0.317    -8.313396    24.08142
       2013  |   7.556006   7.297041     1.04   0.316    -7.913031    23.02504
             |
       _cons |  -4.715459   7.070428    -0.67   0.514     -19.7041    10.27318
-------------+----------------------------------------------------------------
     sigma_u |  1.3879308
     sigma_e |  7.0952851
         rho |  .03685428   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test F1.t1 F2.t1 F3.t1 F4.t1 , coef

 ( 1)  F.t1 = 0
 ( 2)  F2.t1 = 0
 ( 3)  F3.t1 = 0
 ( 4)  F4.t1 = 0

       F(  4,    16) =    0.74
            Prob > F =    0.5767
Warning:  variance matrix is nonsymmetric or highly singular


Constrained coefficients

                                (Std. Err. adjusted for clustering on Country)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          t1 |
         F4. |   4.44e-15          .        .       .            .           .
         F3. |  -8.88e-16          .        .       .            .           .
         F2. |  -1.39e-17          .        .       .            .           .
         F1. |   1.11e-15          .        .       .            .           .
         --. |   1.906011          .        .       .            .           .
             |
        year |
       1992  |   3.382928          .        .       .            .           .
       1993  |   2.192104          .        .       .            .           .
       1994  |   3.376567          .        .       .            .           .
       1995  |   1.927998          .        .       .            .           .
       1996  |   3.220271          .        .       .            .           .
       1997  |   3.096454          .        .       .            .           .
       1998  |     2.5727          .        .       .            .           .
       1999  |   1.575296          .        .       .            .           .
       2000  |   3.154047          .        .       .            .           .
       2001  |   1.578896          .        .       .            .           .
       2002  |   1.380732          .        .       .            .           .
       2003  |   .7589511          .        .       .            .           .
       2004  |   4.468466          .        .       .            .           .
       2005  |   2.993558          .        .       .            .           .
       2006  |   3.836304          .        .       .            .           .
       2007  |    3.32013          .        .       .            .           .
       2008  |   2.386979          .        .       .            .           .
       2009  |   -1.07889          .        .       .            .           .
       2010  |   3.035535          .        .       .            .           .
       2011  |   1.918516          .        .       .            .           .
       2012  |   1.879719          .        .       .            .           .
       2013  |   1.747876          .        .       .            .           .
             |
       _cons |   .9110398          .        .       .            .           .
------------------------------------------------------------------------------

Here, as p=0.5767 > 0.05, would I be able to conclude that, within this sample of countries who select into the bidding process, the parallel growth trends assumption cannot be rejected? Thus, there is not evidence of selection within this group.

Just to ensure I understand this correctly, the test (test F1.t1 F2.t1 F3.t1 F4.t1 , coef) jointly assesses whether the pre-period growth impacts are zero? This then tests whether prior growth rates influence the decision of selecting a host country? How would I correctly refer to and explain the test I have conducted?

Also, if I was to present this evidence, would it be suitable to just reference the test and discuss how this was produced or would I also need to include both/one of the tables which are produced? Does the second table provide me with any additional insight? I realise that this question is quite specific to my own thesis, so no worries if you can't offer specific advice on this.

Thanks again,

Nic

Kye Lippold

Join Date: Jun 2019

Posts: 67
#12

12 Jul 2019, 15:24

As p = 0.0239 < 0.05, I believe that this suggests that growth rates prior to selection of a host country do have an influence on the assignment of a host country, so the parallel trends assumption would be rejected. In my opinion, this would be expected as the countries that are selected to host a World Cup are not representative of the entire population of countries?

Yes, that all sounds right to me.

Can I interpret whether the chance of being selected is positively/negatively affected by the growth rate in the preceding period from this output?

The F-test just tells you if the result is significant, not the sign. But by looking at the individual coefficients, you can see that the point estimates are all positive--meaning countries that are selected have higher growth in years before the selection.

Is it also possible for me to make valuable interpretations based on the individual coefficients, and their p-values shown in the first part of the output?

The individual coefficients are valid for statements like "what is the relationship of cup selection to growth 1 year prior, holding other variables constant." But the key is that growth is strongly serially correlated over time, so you can get individually non-significant coefficients alongside a significant joint test: each coefficient is estimated holding growth in the other periods constant, which leaves little independent variation to detect. See here for some intuitive explanations: https://stats.stackexchange.com/questions/3549/why-is-it-possible-to-get-significant-f-statistic-p-001-but-non-significant-r
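
A small simulation can make this concrete. This is only a sketch on made-up data, not the World Cup panel: x1, x2 and y are hypothetical variables, with x2 built to be almost collinear with x1. In a setup like this the individual t-tests are often weak while the joint test is strong, though the exact numbers depend on the seed.

```stata
* Simulated data only: x1, x2 and y are hypothetical
clear
set seed 2019
set obs 50

gen x1 = rnormal()
gen x2 = x1 + rnormal(0, 0.05)      // x2 is nearly a copy of x1
gen y  = 0.3*x1 + 0.3*x2 + rnormal()

* Individual t-tests: each slope is estimated holding the other regressor fixed
regress y x1 x2

* Joint F-test of H0: both slopes are zero
test x1 x2
```

Comparing the P>|t| column of the regression table with the Prob > F line from -test- shows how the two kinds of test can disagree.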

Here, as p=0.5767 > 0.05, would I be able to conclude that, within this sample of countries who select into the bidding process, the parallel growth trends assumption cannot be rejected? Thus, there is not evidence of selection within this group.

I am skeptical about drawing this conclusion from your data. It is true that the test does not reject parallel trends for this subsample, but you have gotten rid of about 90% of your sample. So your standard errors are large, and the F-test is likely underpowered to detect meaningful differences. (One way to see this: your table has 27 coefficients, i.e. 5 hosting terms plus 22 year dummies, but only 17 country clusters. Your overall model F-stat is missing, which is a sign that Stata can't jointly test the whole model with so few clusters. Stata will dutifully spit out results, but they are not very meaningful with such a small sample.)
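
One way to check this kind of problem is to compare the number of clusters with the number of coefficients after estimation. The sketch below uses a simulated 17-country panel, not the thread's data. The host years and growth values are invented, and the -xtreg- line is my guess at the specification behind the output above (variable names are taken from that output).

```stata
* Hypothetical 17-country panel, 1991-2017 (simulated, not the thread's data)
clear
set seed 2019
set obs 17
gen Country = _n
expand 27
bysort Country: gen year = 1990 + _n
xtset Country year

* Invented host dummy: countries 1-4 each host once (2000, 2004, 2008, 2012)
gen t1 = (Country <= 4 & year == 1996 + 4*Country)
gen OwnGDPGROWTH = rnormal(3, 2)

* Assumed specification: current value and 4 leads of t1, plus year dummies
xtreg OwnGDPGROWTH F(0/4).t1 i.year, fe vce(cluster Country)

* Stored results to set against the number of slopes in the table
display "clusters:             " e(N_clust)
display "denominator df for F: " e(df_r)
```

With clustered standard errors the variance matrix can have rank no larger than the number of clusters minus one. A model with more slopes than that cannot be tested jointly, which is consistent with the F(16,16) and missing Prob > F in the subsample output.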

Just to ensure I understand this correctly, the test (test F1.t1 F2.t1 F3.t1 F4.t1 , coef) jointly assesses whether the pre-period growth impacts are zero? This then tests whether prior growth rates influence the decision of selecting a host country? How would I correctly refer to and explain the test I have conducted?

Yes, the -test- command is performing an F-test of the null hypothesis that the relationship of cup selection to growth is zero in all 4 pre-periods. Any econometrics textbook will explain more about this procedure. I would describe the procedure as "an F test that pre-treatment effects are jointly zero".
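
For reference, the whole procedure can be sketched as below. The data are simulated (60 hypothetical countries with invented host years), and the -xtreg- line is an assumption about how the regression was specified, since only its output appears in this thread. The -test- line is the one quoted above, without the coef option.

```stata
* Simulated panel: 60 hypothetical countries, 1991-2017
clear
set seed 12345
set obs 60
gen Country = _n
expand 27
bysort Country: gen year = 1990 + _n
xtset Country year

* Invented host dummy: countries 1-6 each host once (1998, 2000, ..., 2008)
gen t1 = (Country <= 6 & year == 1996 + 2*Country)
gen OwnGDPGROWTH = rnormal(3, 2)

* Growth on the hosting dummy and its 4 leads, with country and year effects
xtreg OwnGDPGROWTH F(0/4).t1 i.year, fe vce(cluster Country)

* H0: all four pre-period (lead) coefficients are jointly zero
test F1.t1 F2.t1 F3.t1 F4.t1
```

Since the simulated growth is pure noise, there is no true pre-trend here. The point is only to show which coefficients enter the joint hypothesis.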

Also, if I was to present this evidence, would it be suitable to just reference the test and discuss how this was produced or would I also need to include both/one of the tables which are produced? Does the second table provide me with any additional insight?

I would just describe it as an F-test, and report the p-value. No need to include the tables. The main insight of the second table is that something is off about your standard errors: likely there are too few treated countries for the model to compute the variance (i.e. a singular or near-singular covariance matrix, which is what the warning printed under the subsample test points to).
Nicholas Hamer

Join Date: May 2019

Posts: 23
#13

13 Jul 2019, 13:19

Thanks again for your reply. I think I have enough to proceed with what you suggested now using the full sample of countries.

I have one last slightly different query, relating to the treatment effects I uncover in my regression. Are the treatment effects I estimate average treatment effects (ATE) or average treatment effects on the treated (ATT)? Does this change between the full sample of countries and the sample which includes only bidding countries?

In the sample formed of only bidding countries, I had been told that the treatment effects were ATT, but I found this confusing as not all countries in this sample receive the treatment, they just self-select into the group which can be potentially treated. I am unsure if I understand the distinction between these treatment effects entirely. I have looked this up but the examples regarding ATT seem to focus on analysis in which all units have been treated (such as estimating the returns to a university degree for those who have been to university).

Or is it that I am estimating the impact on host countries, but cannot observe what would have happened to them in the absence of the World Cup? Thus the remaining countries serve as the counterfactual. This seems to lead me to think I am estimating ATTs, but again this isn't clear to me.

Thanks,

Nic
Kye Lippold

Join Date: Jun 2019

Posts: 67
#14

14 Jul 2019, 00:56

Your model is a fixed effects model, which is a generalized version of a difference-in-differences model. Thus, it estimates an ATT (and not an ATE) regardless of the sample used.

is it that I am estimating the impact on host countries, but cannot observe what would have happened to them in the absence of the World Cup. Thus the remaining countries serve as the counterfactual. This seems to lead me to think I am estimating ATTs

This logic is correct. The fixed effects method uses the non-hosting countries as the counterfactual for the host countries, so you are finding the effect on the host (treated) countries.
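
In potential-outcomes notation the distinction can be written as follows. The symbols are assumptions for illustration and do not appear elsewhere in this thread: $Y_{it}(1)$ and $Y_{it}(0)$ are the growth of country $i$ in year $t$ with and without hosting, and $D_i = 1$ for host countries.

$$
\mathrm{ATE} = E\big[\,Y_{it}(1) - Y_{it}(0)\,\big], \qquad
\mathrm{ATT} = E\big[\,Y_{it}(1) - Y_{it}(0) \mid D_i = 1\,\big]
$$

The non-hosting countries enter only through the parallel trends assumption, which lets their growth path stand in for the unobserved $Y_{it}(0)$ of the hosts:

$$
E\big[\,Y_{i,\mathrm{post}}(0) - Y_{i,\mathrm{pre}}(0) \mid D_i = 1\,\big]
= E\big[\,Y_{i,\mathrm{post}}(0) - Y_{i,\mathrm{pre}}(0) \mid D_i = 0\,\big]
$$

So a sample can contain many untreated countries, bidders or not, and the estimate is still an average over the hosts only.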

See the discussion at https://www.statalist.org/forums/for...in-differences for more detail, which includes some references that should help clarify the difference between the two types of treatment effects.
Nicholas Hamer

Join Date: May 2019

Posts: 23
#15

15 Jul 2019, 03:21

Ok, thanks for your reply. I think I understand this now.

Thanks for all the help you have offered me, it is much appreciated.

Nic