  • creating dummy variables for country-specific and time-fixed effects in panel data for country analysis

    Hi everyone,

    I am currently writing my thesis which focuses on the relationship between poverty reduction and foreign aid (official development assistance). I am using unbalanced panel data for 21 SSA countries over the period 2000-2020. I was told by my supervisor to add country-specific and time-fixed effects but I am facing difficulties understanding how to include them in my model.
    Based on the Hausman test, I should use the fixed-effects model (FEM).
    My first question concerns the country fixed effects. I have created a dummy variable ("landlocked"). However, when I regress log_infmort (my dependent variable) on log_odagni (key independent variable) and the other 5 control variables and add "i.landlocked" or even "i.countryid" to account for country fixed effects, I get the message "countryid/landlocked omitted because of collinearity". Following this, my supervisor told me that I should add a dummy variable for each country, but I read that time-constant dummy variables are wiped out when one uses the "xtreg, fe" command because this command already includes the country-specific effects. So I am confused about how I should account for country fixed effects in this case.

    code:
    xtreg log_infmort log_odagni log_gini log_infl log_trade log_gdppercap log_fdi i.countryid, fe vce(cluster countryid)
    xtreg log_infmort log_odagni log_gini log_infl log_trade log_gdppercap log_fdi i.landlocked, fe vce(cluster countryid)


    Stata reports "countryid/landlocked omitted because of collinearity".

    Following those results, I was told to use the "regress i.year i.country, ro" command. What I don't understand is the difference between "xtreg, fe" and "regress, ro". Can I use the latter in a fixed-effects panel data analysis? Also, does using "fe vce(cluster countryid)" make any sense when I only have 21 countries in my dataset?
    Thanks in advance!






  • #2
    When you xtset your data with

    xtset countryid year
    -xtreg, fe- already includes the country effects in the regression for you. So you just need to add the year effects

    Code:
    xtreg log_infmort ..._fdi i.year, fe vce(cluster countryid)
    Stata reports "countryid/landlocked omitted because of collinearity".
    The variable landlocked, I presume, is an indicator of whether a country is landlocked. This feature is time-invariant (does not change over time), and since the fixed effects estimator only looks at within-variation (variation over time), then this variable will be dropped. In a sense, its effect is already accounted for by the country effects.
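    Both points can be checked with a minimal sketch on the Grunfeld example data (not the poster's data). The indicator bigfirm below is hypothetical and constructed only for illustration. The sketch assumes that -xtreg, fe- and -regress- with a full set of unit dummies give the same slope coefficients, while the clustered standard errors can differ slightly because of the degrees-of-freedom adjustment.

    ```stata
    webuse grunfeld, clear
    xtset company year

    * Within (fixed-effects) estimator with year effects
    xtreg invest mvalue kstock i.year, fe vce(cluster company)

    * Dummy-variable version: same slopes on mvalue and kstock
    regress invest mvalue kstock i.company i.year, vce(cluster company)

    * Hypothetical time-invariant indicator, made up for illustration
    generate byte bigfirm = company <= 5

    * bigfirm does not vary within company, so it is omitted
    * because of collinearity with the company effects
    xtreg invest mvalue kstock bigfirm i.year, fe vce(cluster company)
    ```

    The last command reproduces the "omitted because of collinearity" note, which is the same thing that happens with landlocked.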

    Also, does using "fe vce(cluster countryid)" make any sense when I only have 21 countries in my dataset?
    Ideally, you want to have at least 30 countries.
    Last edited by Andrew Musau; 19 Dec 2022, 13:25.



    • #3
      Thank you for your fast reply Andrew!



      • #4
        Thank you for your fast reply Andrew!



        • #5
          Hi Andrew,
          I have a few other questions
          After I add the year effects with "i.year", the results come out as significant.
          Code:
          xtreg log_infmort log_odagdp log_trade log_fdi gini governance infl log_gdp_pc i.year, fe
          Those are the results I get :
          [Image: Capture d’écran (185).png]
          [Image: Capture d’écran (187).png]


          However, my issue is that when I use robust standard errors with "fe vce(robust)", most of the explanatory variables turn insignificant.
          [Image: Capture d’écran (188).png]
          [Image: Capture d’écran (189).png]

          What do you think is causing that, and how would you advise me to overcome this issue?

          Thank you!



          • #6
            I think 20 clusters is too few. Look at the wild bootstrap, which is recommended for inference with clustered standard errors and few clusters. It is implemented by boottest from SSC.

            Code:
            ssc install boottest, replace
            The help file will have some examples of how you implement this.

            Code:
            help boottest

            Also, rescale some of your variables so that their coefficients are readable (so you don't have "0.000x"). Report raw values of the variables in thousands or millions, for example.
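
            As a minimal sketch of the rescaling, here is an example on the auto data that ships with Stata (not the poster's data). The variable name price_k is hypothetical. Dividing a regressor by 1,000 multiplies its coefficient by 1,000 and leaves the t-statistic unchanged.

            ```stata
            sysuse auto, clear

            * Raw units: the coefficient on price is a small number with many leading zeros
            regress mpg price weight

            * Same regressor in thousands of dollars (hypothetical variable name)
            generate price_k = price / 1000
            label variable price_k "Price (thousands of USD)"

            * Coefficient on price_k is 1,000 times the one on price; t-statistics are identical
            regress mpg price_k weight
            ```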
            Last edited by Andrew Musau; 20 Dec 2022, 09:58.



            • #7

              Based on what I read, I ran the following code, "boottest log_odagdp, reps(999)", and this is what I got:
              [Image: Capture d’écran (190).png]

              However, I have no clue how to interpret this result since I am not at all familiar with the bootstrapping technique. I couldn't find any clear explanation of how to interpret it.

              Could you perhaps give me an example of what code with boottest would look like?
              Since I only have 20 countries, is it still necessary to cluster the SE?

              Thank you so much for your help.



              • #8
                Since I only have 20 countries, is it still necessary to cluster the SE?
                Yes, that is exactly what the wild bootstrap addresses: clustering with a small number of clusters. Here is an example using the Grunfeld dataset. I also use estout from SSC. Assuming no clustering, my regression and results would look as follows:

                Code:
                webuse grunfeld, clear
                regress invest mvalue kstock i.company i.year
                esttab ., indicate("Firm effects= *.company" "Year effects=*.year") ci starl(* 0.1 ** 0.05 *** 0.01) nocons
                Res.:

                Code:
                . esttab ., indicate("Firm effects= *.company" "Year effects=*.year") ci starl(* 0.1 ** 0.05 *** 0.01) nocons
                
                --------------------------------------
                                                (1)  
                                             invest  
                --------------------------------------
                mvalue                        0.118***
                                     [0.0906,0.145]  
                
                kstock                        0.358***
                                      [0.313,0.403]  
                
                Firm effects                    Yes  
                
                Year effects                    Yes  
                --------------------------------------
                N                               200  
                --------------------------------------
                95% confidence intervals in brackets
                * p<0.1, ** p<0.05, *** p<0.01
                Alternatively, you could choose to report t-statistics in place of confidence intervals.

                Code:
                webuse grunfeld, clear
                regress invest mvalue kstock i.company i.year
                esttab ., indicate("Firm effects= *.company" "Year effects=*.year") t starl(* 0.1 ** 0.05 *** 0.01) nocons
                Res.:

                Code:
                . esttab ., indicate("Firm effects= *.company" "Year effects=*.year") t starl(* 0.1 ** 0.05 *** 0.01) nocons
                
                ----------------------------
                                      (1)  
                                   invest  
                ----------------------------
                mvalue              0.118***
                                   (8.56)  
                
                kstock              0.358***
                                  (15.75)  
                
                Firm effects          Yes  
                
                Year effects          Yes  
                ----------------------------
                N                     200  
                ----------------------------
                t statistics in parentheses
                * p<0.1, ** p<0.05, *** p<0.01
                With wild bootstrap, the coefficients stay the same. You just report the Wild bootstrap-t or the Wild bootstrap-confidence intervals (highlighted below) in place of the previous. With the example above:



                Code:
                webuse grunfeld, clear
                regress invest mvalue kstock i.company i.year
                *SEED SET ONLY FOR REPLICATION. REMOVE IN ACTUAL IMPLEMENTATION
                set seed 12212022
                foreach var in mvalue kstock{
                    boottest `var', cluster(company)
                }
                Res.:

                Code:
                . foreach var in mvalue kstock{
                  2.
                .     boottest `var', cluster(company)
                  3.
                . }
                
                Overriding estimator's cluster/robust settings with cluster(company)
                
                Wild bootstrap-t, null imposed, 999 replications, Wald test, clustering by company, bootstrap clustering by company, Rademacher weights:
                  mvalue
                
                                            t(9) =    10.5965
                                        Prob>|t| =     0.0000
                
                95% confidence set for null hypothesis expression: [.05921, .1299]
                
                Overriding estimator's cluster/robust settings with cluster(company)
                
                Wild bootstrap-t, null imposed, 999 replications, Wald test, clustering by company, bootstrap clustering by company, Rademacher weights:
                  kstock
                
                                            t(9) =     7.2887
                                        Prob>|t| =     0.0400
                
                95% confidence set for null hypothesis expression: [.02991, .5805]
                
                .

                See how to do this in estout in #9 of https://www.statalist.org/forums/for...t-few-clusters, although I believe that boottest now stores these statistics after a recent update, so it is easier to access them than as shown in the link.
                Last edited by Andrew Musau; 20 Dec 2022, 19:08.



                • #9
                  Hi Andrew,

                  Thank you for your help on the cluster issue. My supervisor told me to continue with the clustered SE despite the very small number of clusters, so I followed his advice and kept the cluster-robust SE.

                  I am getting back to you, again, because I am facing new issues with my model.

                  So I decided to stick with the fixed-effects model (panel data). Although a few variables appear to be significant, the within R-squared is far higher than in the literature, and I do not see where this very high value comes from. The within R-squared is 0.9249, while it should be around 0.2 based on the literature.
                  The VIF is smaller than 10, so the cause should not be multicollinearity. What do you think could be leading to such a high value?

                  [Image: Capture d’écran (217).png]



                  Thanks in advance!



                  • #10
                    Country-specific and year fixed effects are probably the reasons why the within R-squared is so high. I am thinking about broadening my dataset to include more countries to see if the issue gets fixed, but I don't know if that's the best approach to take.



                    • #11
                      It appears that you have infant mortality on the left-hand side and a bunch of macroeconomic variables on the right-hand side. Therefore, it should not be surprising that the \(\text{within-}R^2\) statistic is high given the strong correlation between the outcome and regressors in addition to the inclusion of country and time effects. Take the log of GDP as an example. Tell me the change in a country's wealth over time and I can make a pretty good guess about the change in its infant mortality rate. Additionally, there is a strong time effect as infant mortality has been declining for almost all countries due to advances in medical care. Expanding the sample won't help. In fact, let's take all the data the World Bank has on infant mortality and GDP (all countries, starting in 1960). Below, I use multiline from SSC to compare the trends of these two variables over time.

                      Code:
                      tempfile infm
                      copy "https://api.worldbank.org/v2/en/indicator/SP.DYN.IMRT.IN?downloadformat=csv" API_SP.DYN.IMRT.MA.IN_DS2_en_csv_v2_4772824.zip, replace
                      unzipfile API_SP.DYN.IMRT.MA.IN_DS2_en_csv_v2_4772824.zip, replace
                      import delimited "API_SP.DYN.IMRT.IN_DS2_en_csv_v2_4770442.csv", varnames(1) encoding(UTF-8) rowrange(6) clear
                      rename (v5-v65) infm#, addnumber(1960)
                      rename datasource country
                      keep country inf*
                      reshape long infm, i(country) j(year)
                      save `infm', replace
                      copy "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.KD?downloadformat=csv" API_NY.GDP.MKTP.KD_DS2_en_csv_v2_4770407.zip, replace
                      unzipfile API_NY.GDP.MKTP.KD_DS2_en_csv_v2_4770407.zip, replace
                      import delimited "API_NY.GDP.MKTP.KD_DS2_en_csv_v2_4770407.csv",varnames(1) encoding(UTF-8) rowrange(6) clear
                      rename (v5-v65) gdp#, addnumber(1960)
                      rename datasource country
                      keep country gdp*
                      reshape long gdp, i(country) j(year)
                      gen lngpd= ln(gdp)
                      merge 1:1 country year using `infm', nogen
                      encode country, g(countryn)  
                      
                      *GRAPH TRENDS
                      preserve
                      collapse inf gdp, by(year)
                      *ssc install multiline, replace
                      set scheme s1color
                      multiline inf gdp year
                      restore
                      [Image: Graph.png]





                      Almost an inverted mirror image. It is clear that the wealthier a country gets over time, the lower is the infant mortality. With only these two variables and taking country and year effects into account (sample: whole world 1960-2020), the \(\text{within-}R^2\) is already above 70%.

                      Code:
                      xtset countryn year
                      xtreg infm lngpd i.year, fe cluster(countryn)

                      Res.:

                      Code:
                      
                      . xtreg infm lngpd i.year, fe cluster(countryn)
                      
                      Fixed-effects (within) regression               Number of obs     =     10,364
                      Group variable: countryn                        Number of groups  =        239
                      
                      R-sq:                                           Obs per group:
                           within  = 0.7093                                         min =          8
                           between = 0.0668                                         avg =       43.4
                           overall = 0.2035                                         max =         61
                      
                                                                      F(61,238)         =      16.02
                      corr(u_i, Xb)  = -0.5579                        Prob > F          =     0.0000
                      
                                                   (Std. Err. adjusted for 239 clusters in countryn)
                      ------------------------------------------------------------------------------
                                   |               Robust
                              infm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                             lngpd |  -12.73475   3.199117    -3.98   0.000    -19.03695   -6.432552
                                   |
                              year |
                             1961  |   -2.21748   .3705631    -5.98   0.000    -2.947482   -1.487477
                             1962  |  -4.023321   .5790093    -6.95   0.000    -5.163959   -2.882683
                             1963  |  -5.720347   .8187534    -6.99   0.000    -7.333276   -4.107418
                             1964  |  -6.764065   1.228427    -5.51   0.000    -9.184044   -4.344086
                             1965  |  -5.502915   2.157301    -2.55   0.011    -9.752758   -1.253073
                             1966  |  -7.157392   2.293465    -3.12   0.002    -11.67548   -2.639308
                             1967  |  -8.484931   2.513037    -3.38   0.001    -13.43557   -3.534294
                             1968  |  -10.41661    2.73544    -3.81   0.000    -15.80538   -5.027846
                             1969  |  -11.58164   2.905099    -3.99   0.000    -17.30463   -5.858645
                             1970  |  -13.44369   3.141233    -4.28   0.000    -19.63186   -7.255519
                             1971  |  -13.89519   3.338002    -4.16   0.000    -20.47099   -7.319387
                             1972  |  -15.22334   3.439078    -4.43   0.000    -21.99826   -8.448416
                             1973  |  -16.89866   3.564241    -4.74   0.000    -23.92014   -9.877168
                             1974  |  -18.54105   3.741073    -4.96   0.000    -25.91089    -11.1712
                             1975  |  -20.68005   3.870905    -5.34   0.000    -28.30567   -13.05444
                             1976  |  -22.33382   4.041255    -5.53   0.000    -30.29502   -14.37263
                             1977  |  -25.16434   4.193598    -6.00   0.000    -33.42565   -16.90303
                             1978  |  -26.82776   4.316486    -6.22   0.000    -35.33115   -18.32436
                             1979  |  -28.42597   4.452018    -6.38   0.000    -37.19637   -19.65558
                             1980  |  -29.55085   4.642306    -6.37   0.000     -38.6961   -20.40559
                             1981  |  -30.90049   4.765964    -6.48   0.000    -40.28935   -21.51163
                             1982  |  -32.50226   4.817566    -6.75   0.000    -41.99277   -23.01174
                             1983  |  -33.96469   4.918505    -6.91   0.000    -43.65406   -24.27533
                             1984  |  -35.20083   5.016136    -7.02   0.000    -45.08253   -25.31914
                             1985  |  -36.20073   5.131291    -7.05   0.000    -46.30928   -26.09218
                             1986  |  -37.16819   5.239843    -7.09   0.000    -47.49059    -26.8458
                             1987  |  -38.12364   5.355746    -7.12   0.000    -48.67437   -27.57292
                             1988  |  -38.87782   5.470494    -7.11   0.000    -49.65459   -28.10105
                             1989  |  -39.79213   5.566489    -7.15   0.000    -50.75801   -28.82625
                             1990  |  -39.47025   5.709229    -6.91   0.000    -50.71733   -28.22318
                             1991  |  -40.40695   5.747001    -7.03   0.000    -51.72844   -29.08547
                             1992  |  -41.28604   5.782534    -7.14   0.000    -52.67753   -29.89456
                             1993  |  -41.91871   5.836191    -7.18   0.000     -53.4159   -30.42152
                             1994  |  -42.44101   5.932459    -7.15   0.000    -54.12784   -30.75417
                             1995  |  -43.63822   5.984055    -7.29   0.000     -55.4267   -31.84974
                             1996  |   -44.1279   6.098038    -7.24   0.000    -56.14092   -32.11488
                             1997  |  -44.78316   6.212994    -7.21   0.000    -57.02265   -32.54368
                             1998  |  -45.50057   6.296598    -7.23   0.000    -57.90475   -33.09638
                             1999  |  -46.40061   6.390239    -7.26   0.000    -58.98927   -33.81196
                             2000  |  -47.03476   6.496424    -7.24   0.000    -59.83259   -34.23692
                             2001  |  -47.96282   6.579339    -7.29   0.000      -60.924   -35.00164
                             2002  |  -48.88277   6.660583    -7.34   0.000      -62.004   -35.76155
                             2003  |  -49.72484   6.759084    -7.36   0.000    -63.04011   -36.40957
                             2004  |  -50.28563   6.900377    -7.29   0.000    -63.87925   -36.69202
                             2005  |  -50.95231   7.035003    -7.24   0.000    -64.81114   -37.09349
                             2006  |  -51.42709    7.18992    -7.15   0.000     -65.5911   -37.26308
                             2007  |  -51.80968   7.337568    -7.06   0.000    -66.26455   -37.35481
                             2008  |  -52.23486   7.446523    -7.01   0.000    -66.90438   -37.56535
                             2009  |  -53.25886    7.46135    -7.14   0.000    -67.95758   -38.56014
                             2010  |  -53.45914   7.531236    -7.10   0.000    -68.29553   -38.62274
                             2011  |  -54.10295    7.68767    -7.04   0.000    -69.24752   -38.95838
                             2012  |  -54.46806   7.773322    -7.01   0.000    -69.78136   -39.15476
                             2013  |   -54.8021    7.85836    -6.97   0.000    -70.28292   -39.32127
                             2014  |  -55.06814   7.955975    -6.92   0.000    -70.74126   -39.39501
                             2015  |  -55.41551   8.042443    -6.89   0.000    -71.25898   -39.57205
                             2016  |  -55.68486    8.13643    -6.84   0.000    -71.71347   -39.65624
                             2017  |  -55.95722   8.230028    -6.80   0.000    -72.17022   -39.74422
                             2018  |   -56.1778   8.326473    -6.75   0.000     -72.5808   -39.77481
                             2019  |  -56.35894   8.420748    -6.69   0.000    -72.94765   -39.77022
                             2020  |   -57.5621   8.304639    -6.93   0.000    -73.92209   -41.20211
                                   |
                             _cons |   398.9355   73.55277     5.42   0.000     254.0379    543.8331
                      -------------+----------------------------------------------------------------
                           sigma_u |  44.304969
                           sigma_e |  12.958064
                               rho |  .92119949   (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------
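
                      To see how much of the within \(R^2\) comes from the time effects alone, one can compare the statistic with and without i.year. Below is a minimal sketch on the Grunfeld data (an assumption for illustration, not the World Bank sample above). It relies on -xtreg, fe- storing the within \(R^2\) in e(r2_w). Adding the year dummies can only raise this statistic or leave it unchanged.

                      ```stata
                      webuse grunfeld, clear
                      xtset company year

                      * Fixed effects without year effects
                      xtreg invest mvalue kstock, fe
                      display "within R2 without year effects: " %6.4f e(r2_w)

                      * Fixed effects with year effects
                      xtreg invest mvalue kstock i.year, fe
                      display "within R2 with year effects:    " %6.4f e(r2_w)
                      ```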
                      Last edited by Andrew Musau; 04 Jan 2023, 17:28.



                      • #12
                        Thank you so much for your help! It makes more sense now.
                        So since expanding the dataset would not change anything, might keeping all those macroeconomic variables in the model be problematic? If I understand correctly, would the best alternative be to select other variables?



                        • #13
                          No. You should rely on economic theory to decide which variables to include. A high within \(R^2\) statistic is not in itself a problem. On multicollinearity, note that it does not affect the overall fit of the model, and the estimates of the coefficients on the non-collinear variables remain largely unaffected. So if you have extreme multicollinearity, it is only a problem if a key variable whose effect you need to estimate is involved and the results show a confidence interval so wide that the estimate is not useful for practical purposes. More importantly, you should check for functional form misspecification using a RESET test.
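
                          A minimal sketch of the RESET test on the Grunfeld data (an assumption for illustration, not the poster's data). As far as I know, -estat ovtest- is a postestimation command for -regress- and is not available after -xtreg-, so the sketch fits the same fixed-effects model in dummy-variable form first.

                          ```stata
                          webuse grunfeld, clear

                          * Same specification as the fixed-effects model, written with explicit dummies
                          regress invest mvalue kstock i.company i.year

                          * Ramsey RESET: adds powers of the fitted values and tests them jointly;
                          * a small p-value points to functional form misspecification
                          estat ovtest
                          ```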



                          • #14
                            Ok I see
                            Thank you so much for your help Andrew!
