Fixed effects, Random effects, Mixed-effects, clustered standard errors and the difference

Sam Murgatroyd

Join Date: Oct 2023
Posts: 33

13 Jun 2024, 04:45

Hello,

I am conceptually stuck on the difference between random effects and mixed-effects models, when to use which one, and on the need for clustering standard errors in the mixed-effects setting. Most of the expositions of mixed-effects models I have seen frame the issue in terms of patients nested within doctors or students nested within schools. I am finding it hard to reconcile these examples with the country-level data that I am using and am hoping that the information I provide is clear enough that someone with a better understanding can help me resolve my confusion. The data and main variables that I am using are as follows:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str56 country float id double price_dispersion_use float TS_ce2 byte E double(coc unem) float lnGDPPC str3 region float region_id int year
"Afghanistan"          1                 15 . 4   -1.36474287509918               7.91  8.014661 "EMR" 3 2014
"Afghanistan"          1                  . . 5   -1.54035270214081             10.092  7.994392 "EMR" 3 2016
"Afghanistan"          1 13.333333333333334 . 5   -1.50288057327271             11.131  7.974823 "EMR" 3 2018
"Afghanistan"          1  11.76470588235294 . 5   -1.49369978904724              11.71  7.928968 "EMR" 3 2020
"Afghanistan"          1                  . . 5   -1.18377649784088               14.1         . "EMR" 3 2022
"Albania"              2  44.44444444444444 1 5   -.586141347885132              18.05  9.465752 "EUR" 4 2014
"Albania"              2 56.666666666666664 1 5   -.471469223499298              15.42  9.524819 "EUR" 4 2016
"Albania"              2               62.5 1 5   -.545840263366699               12.3  9.604934 "EUR" 4 2018
"Albania"              2  60.60606060606061 1 5   -.572924494743347             12.833   9.60202 "EUR" 4 2020
"Albania"              2                 60 1 5   -.407875537872314             11.629  9.756207 "EUR" 4 2022
"Algeria"              3  33.33333333333333 6 4    -.61265641450882              10.21  9.511575 "AFR" 1 2014
"Algeria"              3 35.714285714285715 . 4    -.67341673374176               10.2  9.539472 "AFR" 1 2016
"Algeria"              3                 15 . 5   -.658660113811493             12.145  9.525714 "AFR" 1 2018
"Algeria"              3                 50 . 5    -.66646021604538             14.036  9.447598 "AFR" 1 2020
"Algeria"              3  48.57142857142857 5 5   -.637929856777191             12.491 9.4796715 "AFR" 1 2022
"Andorra"              4  72.85714285714285 6 2    1.22070860862732 3.4574213637138524 11.030043 "EUR" 4 2014
"Andorra"              4  72.85714285714285 6 2    1.15955591201782 3.4620623503549632 11.067958 "EUR" 4 2016
"Andorra"              4  77.77777777777777 1 2    1.17916560173035 3.4856245150448624 11.053652 "EUR" 4 2018
"Andorra"              4  68.44993141289439 1 2    1.26600527763367 3.5770777680298496  10.91981 "EUR" 4 2020
"Andorra"              4  69.86301369863014 1 2    1.27020359039307  3.890483345078273 11.056888 "EUR" 4 2022
"Angola"               5                 75 . 2   -1.45779824256897             16.317  9.236285 "AFR" 1 2014
"Angola"               5                  . . 2   -1.48333728313446             16.577  9.147498 "AFR" 1 2016
"Angola"               5                  . . 4   -1.19925093650818             16.626   9.06262 "AFR" 1 2018
"Angola"               5                 25 2 4   -.938672542572021             16.698 8.9309025 "AFR" 1 2020
"Angola"               5                 25 . 4   -.601941287517548             14.478  8.910195 "AFR" 1 2022
"Antigua and Barbuda"  6                 75 . 2    .634897768497467 3.4574213637138524 10.143725 "AMR" 2 2014
"Antigua and Barbuda"  6                 50 . 2    .645558714866638 3.4620623503549632  10.18348 "AMR" 2 2016
"Antigua and Barbuda"  6             66.875 . 5    .236239701509476 3.4856245150448624 10.263378 "AMR" 2 2018
"Antigua and Barbuda"  6  62.05673758865249 . 5    .238533273339272 3.5770777680298496 10.073401 "AMR" 2 2020
"Antigua and Barbuda"  6 63.829787234042556 . 5    .310604453086853  3.890483345078273  10.23125 "AMR" 2 2022
"Argentina"            7 41.935483870967744 2 4   -.549443066120148               7.27  10.25563 "AMR" 2 2014
"Argentina"            7              37.75 3 4   -.298964887857437              8.085 10.240202 "AMR" 2 2016
"Argentina"            7  45.34920634920635 3 4   -.098668172955513               9.22 10.220944 "AMR" 2 2018
"Argentina"            7 18.726114649681527 3 4    -.16378065943718              11.46 10.076843 "AMR" 2 2020
"Argentina"            7 13.384615384615383 3 4   -.447030484676361              6.805   10.2083 "AMR" 2 2022
"Armenia"              8                 30 6 2   -.565155386924744             11.989  9.488754 "EUR" 4 2014
"Armenia"              8 26.666666666666668 6 2   -.659123718738556             12.625  9.530623 "EUR" 4 2016
"Armenia"              8 42.857142857142854 3 2   -.408891350030899              13.21  9.663906 "EUR" 4 2018
"Armenia"              8               47.5 1 5 -.00343869999051094              12.18  9.673404 "EUR" 4 2020
"Armenia"              8  48.23529411764706 1 5   .0280352365225554              8.588  9.857456 "EUR" 4 2022
"Australia"            9  78.93318965517241 1 4    1.84946465492249               6.08  10.90798 "WPR" 6 2014
"Australia"            9  73.84341637010677 1 4    1.77200365066528               5.71  10.92716 "WPR" 6 2016
"Australia"            9  82.34126984126985 1 4    1.76737761497498                5.3 10.947303 "WPR" 6 2018
"Australia"            9  71.02189781021899 6 4    1.63295590877533               6.46 10.938417 "WPR" 6 2020
"Australia"            9  68.45524542829644 6 4    1.76448953151703                3.7 10.987324 "WPR" 6 2022
"Austria"             10  80.61224489795919 4 4    1.46674907207489               5.67 11.040983 "EUR" 4 2014
"Austria"             10                 80 4 4    1.49696803092957               6.06 11.048753 "EUR" 4 2016
"Austria"             10                 80 4 4    1.56836605072021               4.93 11.083235 "EUR" 4 2018
"Austria"             10  82.45614035087719 4 4    1.47778916358948                5.2 11.020405 "EUR" 4 2020
"Austria"             10  68.35820895522387 4 4    1.25861942768097               4.99 11.094935 "EUR" 4 2022
"Azerbaijan"          11                 24 6 4   -1.02249026298523               4.91  9.938668 "EUR" 4 2014
"Azerbaijan"          11              56.25 1 4   -.852654457092285                  5  9.894967 "EUR" 4 2016
"Azerbaijan"          11 23.076923076923077 6 5   -.852769494056702                4.9  9.893378 "EUR" 4 2018
"Azerbaijan"          11  47.05882352941177 6 5   -1.07708406448364               7.24  9.858809 "EUR" 4 2020
"Azerbaijan"          11  55.55555555555556 6 5   -1.04057228565216               5.65  9.953777 "EUR" 4 2022
"Bahamas"             12 48.658536585365916 1 2    1.30873775482178                  . 10.381657 "AMR" 2 2014
"Bahamas"             12  40.22346368715088 1 2    1.06738793849945               12.7 10.366473 "AMR" 2 2016
"Bahamas"             12                  . . 2    1.09553563594818                 10 10.405302 "AMR" 2 2018
"Bahamas"             12  61.08949416342412 1 2    1.10620594024658             12.563 10.118558 "AMR" 2 2020
"Bahamas"             12                  . 1 2    1.25618994235992             10.089 10.401076 "AMR" 2 2022
"Bahrain"             13                 50 . 5    .273521840572357              1.147 10.890368 "EMR" 3 2014
"Bahrain"             13  33.33333333333333 . 5  -.0476647540926933              1.193 10.877423 "EMR" 3 2016
"Bahrain"             13                 40 2 5   -.176231503486633              1.198  10.88669 "EMR" 3 2018
"Bahrain"             13  34.78260869565218 3 5  -.0935939401388168              1.786 10.867227 "EMR" 3 2020
"Bahrain"             13 58.333333333333336 3 5    .139385640621185              1.339 10.944588 "EMR" 3 2022
"Bangladesh"          14 15.789473684210526 . 4   -.892129957675934              4.405  8.543592 "SEA" 5 2014
"Bangladesh"          14 22.727272727272727 . 4    -.88687801361084               4.35  8.651562 "SEA" 5 2016
"Bangladesh"          14  33.33333333333333 . 4   -.926946818828583              4.373  8.761912 "SEA" 5 2018
"Bangladesh"          14 32.142857142857146 . 4   -1.00367724895477              5.316  8.849105 "SEA" 5 2020
"Bangladesh"          14                 25 . 4    -1.0755273103714              4.271   8.96254 "SEA" 5 2022
"Barbados"            15  79.32850559578671 1 2    1.13345634937286              12.17  9.704554 "AMR" 2 2014
"Barbados"            15              81.25 1 2     1.2135511636734               8.25    9.7494 "AMR" 2 2016
"Barbados"            15  45.23433385992628 1 2    1.37191247940063               8.32  9.741538 "AMR" 2 2018
"Barbados"            15                  . . 2    1.19406688213348              9.743  9.604329 "AMR" 2 2020
"Barbados"            15  78.84615384615384 1 2    1.28457343578339              8.501  9.700481 "AMR" 2 2022
"Belarus"             16             35.625 6 4    -.23470650613308              5.908 10.187328 "EUR" 4 2014
"Belarus"             16 31.914893617021278 6 4   -.224086627364159               5.84  10.12049 "EUR" 4 2016
"Belarus"             16 30.645161290322577 6 4    -.15480200946331               4.76 10.179738 "EUR" 4 2018
"Belarus"             16  25.71428571428572 6 4   -.133964225649834               4.05 10.193598 "EUR" 4 2020
"Belarus"             16 23.958333333333332 6 4    -.57967621088028               3.57 10.185905 "EUR" 4 2022
"Belgium"             17  80.82901554404145 4 4    1.51295030117035               8.52  10.96869 "EUR" 4 2014
"Belgium"             17  81.64556962025317 4 4    1.53148806095123               7.83  10.99063 "EUR" 4 2016
"Belgium"             17  83.33333333333334 4 4    1.42942035198212               5.95 11.016062 "EUR" 4 2018
"Belgium"             17  85.29411764705883 4 4    1.44595634937286               5.55 10.974466 "EUR" 4 2020
"Belgium"             17               72.5 4 4    1.49504864215851               5.56  11.05771 "EUR" 4 2022
"Belize"              18  41.66666666666667 . 2   -.159106820821762               8.24 9.4002905 "AMR" 2 2014
"Belize"              18  41.66666666666667 1 2   -.229891732335091                  7  9.390294 "AMR" 2 2016
"Belize"              18                 40 1 2   -.169436514377594              7.899  9.343116 "AMR" 2 2018
"Belize"              18                 50 1 2   -.193349361419678             10.619  9.203805 "AMR" 2 2020
"Belize"              18 50.391644908616186 1 2   -.237028583884239              8.672  9.426018 "AMR" 2 2022
"Benin"               19                  . 2 4   -.669143795967102              1.808  8.040519 "AFR" 1 2014
"Benin"               19                 20 2 4   -.529120028018951              1.843  8.031984 "AFR" 1 2016
"Benin"               19               22.5 2 5   -.391388416290283               1.47  8.093288 "AFR" 1 2018
"Benin"               19 47.368421052631575 2 5   -.040327787399292              1.616  8.140294 "AFR" 1 2020
"Benin"               19                  . 2 5   -.124255605041981              1.476  8.215443 "AFR" 1 2022
"Bhutan"              20                  . . 4    1.30612897872925               2.63  9.345343 "SEA" 5 2014
"Bhutan"              20                  . . 4    1.09102046489716              2.747  9.469749 "SEA" 5 2016
"Bhutan"              20                  . . 4    1.59051811695099               3.35  9.533319 "SEA" 5 2018
"Bhutan"              20                  . . 4    1.61823654174805               5.03  9.467918 "SEA" 5 2020
"Bhutan"              20                  . 2 4    1.51425933837891               5.95         . "SEA" 5 2022
end
label values TS_ce2 TS_ce2_l
label def TS_ce2_l 1 "specific uniform", modify
label def TS_ce2_l 2 "adv un NO min", modify
label def TS_ce2_l 3 "adv uni WITH min", modify
label def TS_ce2_l 4 "mixed uni NO min", modify
label def TS_ce2_l 5 "mixed uni WITH min", modify
label def TS_ce2_l 6 "specific tiered", modify
label values region_id region_id_l
label def region_id_l 1 "AFR", modify
label def region_id_l 2 "AMR", modify
label def region_id_l 3 "EMR", modify
label def region_id_l 4 "EUR", modify
label def region_id_l 5 "SEA", modify
label def region_id_l 6 "WPR", modify

Apart from the region dummies, which identify different continents, none of the variables I am interested in are time-invariant by definition. So my starting point was to run a Hausman test on the model with the time-varying independent variables only, to see whether fixed effects or random effects is the consistent estimator. My understanding is that if I use the fixed effects estimator, I am still controlling for the region-specific effects; they just aren't estimable because they are absorbed into the fixed component. I ran the Hausman test (with no region dummies) as follows:

Code:

xi: xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.year, re cluster(id)
i.TS_ce2          _ITS_ce2_1-10       (naturally coded; _ITS_ce2_1 omitted)
i.year            _Iyear_2014-2022    (naturally coded; _Iyear_2014 omitted)

Random-effects GLS regression                   Number of obs     =        664
Group variable: id                              Number of groups  =        165

R-squared:                                      Obs per group:
     Within  = 0.0934                                         min =          1
     Between = 0.5087                                         avg =        4.0
     Overall = 0.4617                                         max =          5

                                                Wald chi2(15)     =    1676.20
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                   (Std. err. adjusted for 165 clusters in id)
------------------------------------------------------------------------------
             |               Robust
price_disp~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  _ITS_ce2_2 |  -14.02557   3.794631    -3.70   0.000    -21.46291   -6.588226
  _ITS_ce2_3 |  -10.88748   4.686501    -2.32   0.020    -20.07285   -1.702108
  _ITS_ce2_4 |   4.796852   3.110406     1.54   0.123    -1.299433    10.89314
  _ITS_ce2_5 |  -1.795415   4.094529    -0.44   0.661    -9.820545    6.229715
  _ITS_ce2_6 |  -9.561305   3.260805    -2.93   0.003    -15.95237   -3.170244
  _ITS_ce2_8 |  -11.07366   3.741402    -2.96   0.003    -18.40668   -3.740649
 _ITS_ce2_10 |  -21.19221   3.316523    -6.39   0.000    -27.69248   -14.69195
           E |   -.420992     1.1004    -0.38   0.702    -2.577737    1.735753
         coc |   5.599903    1.50856     3.71   0.000     2.643179    8.556627
        unem |  -.3365144   .1896276    -1.77   0.076    -.7081776    .0351489
     lnGDPPC |   4.062827   1.478944     2.75   0.006      1.16415    6.961504
 _Iyear_2016 |   1.719148   1.237929     1.39   0.165    -.7071489    4.145445
 _Iyear_2018 |   2.422449   1.561346     1.55   0.121    -.6377324     5.48263
 _Iyear_2020 |   3.136992   1.555324     2.02   0.044     .0886123    6.185371
 _Iyear_2022 |   2.683144   1.712374     1.57   0.117    -.6730476    6.039336
       _cons |   21.95015   14.74566     1.49   0.137    -6.950821    50.85112
-------------+----------------------------------------------------------------
     sigma_u |  12.909408
     sigma_e |  11.049002
         rho |   .5771861   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(id)
Sargan-Hansen statistic  79.296  Chi-sq(15)   P-value = 0.0000

Evidently, fixed effects is the way to go if I am not interested in knowing the coefficients on the region dummies. However, if I compare the mean of my dependent variable by region, I see that there are statistically significant differences, which may be interesting to investigate in the regression.

Code:

oneway price_dispersion_use region_id, bonferroni

                        Analysis of variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      132305.334      5   26461.0667     65.51     0.0000
 Within groups      324737.956    804    403.90293
------------------------------------------------------------------------
    Total            457043.29    809   564.948442

Bartlett's equal-variances test: chi2(5) =  15.7086    Prob>chi2 = 0.008

             Comparison of price_dispersion_use by group(region)
                                (Bonferroni)
Row Mean-|
Col Mean |        AFR        AMR        EMR        EUR        SEA
---------+-------------------------------------------------------
     AMR |    18.4197
         |      0.000
         |
     EMR |   -2.77116   -21.1908
         |      1.000      0.000
         |
     EUR |    28.7939    10.3743    31.5651
         |      0.000      0.000      0.000
         |
     SEA |   -3.69121   -22.1109   -.920047   -32.4851
         |      1.000      0.000      1.000      0.000
         |
     WPR |    18.4931    .073463    21.2643   -10.3008    22.1843
         |      0.000      1.000      0.000      0.000      0.000

.

Moreover, most of the variation in my variables is between countries rather than within them (see below). So even though fixed effects is the consistent estimator, my understanding is that I would be ignoring a potentially interesting perspective if I did not account for the between variation.

Code:

xtsum price_dispersion_use TS_ce2 E coc unem lnGDPPC year

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
price_~e overall |   54.1633   23.68788          4        100 |     N =     821
         between |             21.47537   7.836111   92.83242 |     n =     192
         within  |             10.13382   16.45002   99.53367 | T-bar = 4.27604
                 |                                            |
TS_ce2   overall |  3.118182   1.910972          1         10 |     N =     770
         between |              1.63939          1          7 |     n =     176
         within  |             .9785963  -1.131818   9.318182 | T-bar =   4.375
                 |                                            |
E        overall |   3.78359   1.078699          2          5 |     N =     975
         between |             1.002953          2          5 |     n =     195
         within  |             .4022524    1.38359    6.18359 |     T =       5
                 |                                            |
coc      overall | -.0858656   .9881318  -1.936706   2.402744 |     N =     963
         between |             .9822009  -1.780401   2.250184 |     n =     193
         within  |             .1345866  -.9270451   .4870617 | T-bar = 4.98964
                 |                                            |
unem     overall |  7.565258   5.748101        .11      28.84 |     N =     960
         between |             5.560955       .146    26.5954 |     n =     193
         within  |             1.455914   .2532579   15.11326 | T-bar = 4.97409
                 |                                            |
lnGDPPC  overall |  9.485856   1.149466   6.753617   11.82817 |     N =     918
         between |             1.148124   6.810744   11.80229 |     n =     185
         within  |             .0820738   9.140171   10.27743 |     T = 4.96216
                 |                                            |
year     overall |      2018   2.829879       2014       2022 |     N =     975
         between |                    0       2018       2018 |     n =     195
         within  |             2.829879       2014       2022 |     T =       5

Given this, I thought that, in addition to the fixed effects estimates, I would run an alternative specification in which I also control for the region dummies using random effects. However, I am unsure whether this should be implemented as xtreg, re cluster(id) or as a two-level multilevel model in which 'observations are clustered in countries'. The latter possibility arose when I found a study that also uses country-level data to model a situation very similar to mine. In this study, the authors used "a two-level mixed effects linear regression model".

According to the authors, country was treated as the level 2 identifier, but the explanation they provide of why and how is very thin. I have tried to implement it to understand how it differs from the RE regression with clustered standard errors. I used the Stata manual on multilevel mixed-effects models to write the code for the multilevel/mixed model shown below, but I am no closer to understanding whether treating the country (id) as the level 2 identifier in the multilevel model is similar to clustering standard errors at the country level in the random effects regression. If these approaches are not doing the same thing, I am lost as to what the difference between the two approaches below is, and I would greatly appreciate guidance on how 'treating observations as nested in countries in my dataset' differs from clustering standard errors at the country level in the non-hierarchical model (xtreg, re cluster(id)).

Code:

xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year, re cluster(id)

Random-effects GLS regression                   Number of obs     =        659
Group variable: id                              Number of groups  =        164

R-squared:                                      Obs per group:
     Within  = 0.0934                                         min =          1
     Between = 0.5542                                         avg =        4.0
     Overall = 0.4828                                         max =          5

                                                Wald chi2(20)     =    1788.60
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                          (Std. err. adjusted for 164 clusters in id)
-------------------------------------------------------------------------------------
                    |               Robust
price_dispersion_~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------+----------------------------------------------------------------
             TS_ce2 |
     adv un NO min  |  -12.40899   3.996279    -3.11   0.002    -20.24155   -4.576423
  adv uni WITH min  |  -8.338264   5.190598    -1.61   0.108    -18.51165    1.835122
  mixed uni NO min  |   4.808446   3.674408     1.31   0.191    -2.393261    12.01015
mixed uni WITH min  |  -1.597436    4.06049    -0.39   0.694     -9.55585    6.360977
   specific tiered  |  -8.308067   3.384624    -2.45   0.014    -14.94181   -1.674326
 tird adv WITH min  |  -8.823732   4.080642    -2.16   0.031    -16.82164   -.8258213
  mxd trd WITH min  |  -19.90516   3.876494    -5.13   0.000    -27.50295   -12.30737
                    |
                  E |   .2397692   1.121628     0.21   0.831    -1.958581    2.438119
                coc |   5.508633   1.577934     3.49   0.000      2.41594    8.601326
               unem |  -.2725512   .1849838    -1.47   0.141    -.6351129    .0900104
            lnGDPPC |   2.830525    1.64612     1.72   0.086    -.3958122    6.056861
                    |
          region_id |
               AMR  |   8.763069   4.040908     2.17   0.030     .8430354     16.6831
               EMR  |   -4.99006   5.627373    -0.89   0.375    -16.01951    6.039388
               EUR  |   6.148668   4.709602     1.31   0.192    -3.081982    15.37932
               SEA  |  -6.283476   6.768348    -0.93   0.353    -19.54919    6.982243
               WPR  |   6.940622   5.036833     1.38   0.168    -2.931389    16.81263
                    |
               year |
              2016  |   1.931037   1.249372     1.55   0.122    -.5176866    4.379761
              2018  |    2.56888   1.583531     1.62   0.105     -.534783    5.672544
              2020  |   3.320599   1.587323     2.09   0.036     .2095038    6.431694
              2022  |   3.046376   1.754632     1.74   0.083    -.3926392    6.485391
                    |
              _cons |   26.26522   15.42733     1.70   0.089    -3.971789    56.50223
--------------------+----------------------------------------------------------------
            sigma_u |  12.469835
            sigma_e |  11.083854
                rho |  .55864045   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------

Code:

xtmixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -2657.2751  
Iteration 1:   log likelihood = -2657.2751  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        659
Group variable: id                              Number of groups  =        164
                                                Obs per group:
                                                              min =          1
                                                              avg =        4.0
                                                              max =          5
                                                Wald chi2(20)     =     267.79
Log likelihood = -2657.2751                     Prob > chi2       =     0.0000

--------------------------------------------------------------------------------------
price_dispersion_use | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
              TS_ce2 |
      adv un NO min  |  -12.49909   2.729109    -4.58   0.000    -17.84805   -7.150137
   adv uni WITH min  |  -8.384119    3.86549    -2.17   0.030    -15.96034   -.8078988
   mixed uni NO min  |   5.015721   3.225648     1.55   0.120    -1.306432    11.33787
 mixed uni WITH min  |  -1.830149   2.964851    -0.62   0.537     -7.64115    3.980852
    specific tiered  |  -8.339198   2.554442    -3.26   0.001    -13.34581   -3.332584
  tird adv WITH min  |  -8.622508   12.77176    -0.68   0.500    -33.65469    16.40967
   mxd trd WITH min  |  -19.78572   5.321532    -3.72   0.000    -30.21573   -9.355705
                     |
                   E |   .2622187   .8506864     0.31   0.758    -1.405096    1.929533
                 coc |   5.498374   1.530339     3.59   0.000     2.498964    8.497784
                unem |  -.2666603   .1678856    -1.59   0.112      -.59571    .0623893
             lnGDPPC |   2.855437   1.628536     1.75   0.080     -.336435    6.047309
                     |
           region_id |
                AMR  |   8.716485   3.774297     2.31   0.021     1.318999    16.11397
                EMR  |  -5.048147   4.628445    -1.09   0.275    -14.11973    4.023439
                EUR  |   5.882058   4.439661     1.32   0.185    -2.819518    14.58363
                SEA  |  -6.389057   6.032405    -1.06   0.290    -18.21235     5.43424
                WPR  |   6.907721   4.138175     1.67   0.095    -1.202953    15.01839
                     |
                year |
               2016  |   1.928479   1.407243     1.37   0.171    -.8296663    4.686624
               2018  |   2.582868   1.459202     1.77   0.077    -.2771154    5.442851
               2020  |    3.31731   1.456869     2.28   0.023     .4618988    6.172721
               2022  |   3.048146   1.504786     2.03   0.043     .0988195    5.997472
                     |
               _cons |   25.99444   14.52699     1.79   0.074    -2.477937    54.46682
--------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   11.90176   .8518652      10.34396    13.69417
-----------------------------+------------------------------------------------
                sd(Residual) |   11.06389   .3550944      10.38936    11.78222
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 207.65        Prob >= chibar2 = 0.0000

Thank you!

Sam

References:

StataCorp. 2023. Stata 18 Multilevel Mixed-Effects Reference Manual. College Station, TX: Stata Press.

Last edited by Sam Murgatroyd; 13 Jun 2024, 04:50. Reason: spelling

Tags: clustered standard errors, mixed-effect, panel data, random effects

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

13 Jun 2024, 10:03

You are asking two different questions, and I will answer them separately.

1. Fixed effects vs random effects. You have followed the standard practice in economics of using a Hausman test to approach this, but then you noticed that your outcome variable shows a large amount of interesting variation by country, so you feel you don't want to miss out on being able to estimate those effects. Well, the problem here is that you should clearly identify your research questions before you begin analyzing your data, so that you can choose the analyses that answer those specific questions. If your research questions require you to estimate the country-level differences, then it doesn't matter what Hausman or any other test tells you--that cannot be done in a fixed effects model and you shouldn't use fixed effects. If, however, these country-level differences are just a matter of your personal interest but are not relevant to your research goals, then, given your Hausman test results, the fixed-effects model is better because it will give you consistent estimates of whatever variables actually are relevant to the research questions, whereas the random effects model will not. If some of your research questions revolve around the country-level differences and others do not, then you will need to estimate both models and use the results from each model for the analyses to which it is appropriate. More generally, with 2-level clustered data, if your research question involves between-cluster differences you cannot use a fixed effects model for the purpose. If it does not, then a fixed effects model will always give consistent estimates of the relevant parameters, whereas a random effects model may fail to do so. In this situation, a Hausman type test can identify the situations where the random effects model will give consistent estimates, and when that is true, the random effects model is preferred for reasons of efficiency.
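As a minimal sketch of the "estimate both models" route described above: the code below assumes the data from the -dataex- example are in memory and uses the variable names from the original post. The stored-estimate names fe and re are arbitrary labels chosen for illustration.

```stata
* Assumes the posted data are in memory.
xtset id year

* Fixed effects: the region dummies are constant within country, so they
* cannot be estimated and are left out of the command.
xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.year, fe vce(cluster id)
estimates store fe

* Random effects: the region dummies are estimable, but consistency rests
* on the assumption corr(u_i, X) = 0.
xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year, re vce(cluster id)
estimates store re

* Coefficients and standard errors side by side.
estimates table fe re, b(%9.3f) se(%9.3f) stats(N)
```

The fixed-effects column would then be used for questions about the time-varying covariates, and the random-effects column for questions about regional differences, keeping in mind that the test in #1 casts doubt on the random-effects assumption.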

2. What is the difference between the random effects model (-xtreg, re-) and the multilevel model (-xtmixed-)?
Both of these are the same underlying model: DV = constant + sum(beta_i * IndVar_i) + u_cluster + e_observation. You have probably also noticed that although the results you got from the two commands differ, they differ only slightly. The differences arise because different calculation algorithms are used to estimate the coefficients. -xtmixed- uses maximum likelihood estimation, whereas -xtreg, re- uses generalized least squares. The differences between them are always small. In fact, I would say that the differences you are getting are about as large as they ever get, at least in my experience. The advantage of the -xtreg, re- command is that it has a simpler syntax and it is not subject to convergence problems. The advantage of -xtmixed- (which, by the way, was renamed -mixed- several versions back) is that it can handle models with more than 2 levels and can also estimate random slopes. It also offers a wide array of correlation structures among the random effects and among the residuals. As your model has only 2 levels, simple independent random effect and residual correlation structure, and no random slopes, there is no particular reason to prefer one of these over the other.
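The point about estimation algorithms can be checked directly. The sketch below assumes the posted data are in memory. It fits the random-intercept model three ways: by GLS with -xtreg, re-, by maximum likelihood with -xtreg, mle-, and by maximum likelihood with -mixed-. The two maximum likelihood fits should agree closely with each other, and the GLS fit should differ only slightly from both.

```stata
* Assumes the posted data are in memory.
xtset id year

* Random-intercept model by GLS (the default estimator for -xtreg, re-).
xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year, re

* The same model by maximum likelihood, via -xtreg-.
xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year, mle

* The same model by maximum likelihood, via -mixed-, with a random
* intercept for each country.
mixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:
```

In the -mixed- output, sd(_cons) and sd(Residual) play the roles of sigma_u and sigma_e in the -xtreg- output.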
Erik Ruzek

Join Date: Oct 2017

Posts: 428
#3

13 Jun 2024, 11:32

I 100% agree with Clyde here (great answer!). The one issue he didn't address is your use of cluster(id) in your xtreg model. This adjusts the standard error calculation to give so-called cluster-robust standard errors. It is worth reading up on what these are in the Stata documentation; see help vce_option. You can obtain the same type of standard errors in mixed:

Code:

xtmixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id: , vce(cluster id)
Sam Murgatroyd

Join Date: Oct 2023

Posts: 33
#4

14 Jun 2024, 00:39

Thank you Clyde and Erik for taking the time to engage with my question and to provide very clear responses. I really appreciate your time and effort.

Something that still has me confused is this idea of a two-level model in which 'observations' are clustered in 'countries'. I am stuck in the thinking of ‘students’ clustered in ‘classes’. In the country-level panel data case, is the observation 'country-year' and is 'country (id)' the cluster? In terms of Clyde's formula:

Originally posted by Clyde Schechter

DV = constant + sum(beta_i * IndVar_i) + u_cluster + e_observation.

becomes DV = constant + sum(beta_i * IndVar_i) + u_{cluster=country} + e_{observation=country-year}

Relatedly, my understanding of clustering standard errors is that you do it to account for the correlation between years within each panel unit; but in the xtmixed framework you are accounting for that by nesting the different country-years within countries. Is clustering the standard errors by country in a two-level model not 'tautologous'? Why would you need to cluster your standard errors at the country-level when you run:

Code:

xtmixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:

Thank you again for your time!

Sam
Erik Ruzek

Join Date: Oct 2017

Posts: 428
#5

14 Jun 2024, 09:14

You are correct in your interpretation. Multilevel data can take many forms. In the cross-sectional case, it might be students within classrooms (2-level). You might also have data on students from multiple schools, in which case the nesting structure is students > classrooms > schools. Longitudinal data is another type of multilevel data, and this is exactly what you have: repeated (yearly) observations of countries (your level 2 identifier). Another example would be an educational study where you collect data on students multiple times per year. There you have repeated observations nested within students who are further nested within classrooms (a 3-level structure). It can get quite complex, especially when you start getting into cross-classifications in which you have, for example, some students who show up in two different classrooms.
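To make the nesting concrete, here is a rough sketch of how these structures map onto -mixed- syntax. The first command uses your variables; the second uses hypothetical names (score, x, classroom, student) that are not in your data and are only there to show where a third level would go.

```stata
* Your case: country-years (level 1) nested within countries (level 2)
mixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:

* Hypothetical 3-level case: repeated observations within students within classrooms
* (score, x, classroom and student are made-up variable names)
mixed score x || classroom: || student:
```

In -mixed-, the random-effects equations are listed from the highest level down, so classroom comes before student.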

As to your other question regarding standard error correction: some folks still use those with panel or multilevel data because they worry about heteroskedasticity of the residuals. See this excellent primer by Richard Williams. Note that this is pretty much the default approach among folks in the econometrics tradition, and if that is your audience, then you should probably use vce(cluster id) or vce(robust) in your mixed model.
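If you want to see how much the correction matters in your data, one option is to fit the model both ways and put the results side by side. This is only a sketch: it uses -mixed- (the current name for -xtmixed-), and the names m_default and m_cluster are arbitrary labels picked for this example.

```stata
* Model-based (default) standard errors
mixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:
estimates store m_default

* Cluster-robust standard errors, clustering on country
mixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:, vce(cluster id)
estimates store m_cluster

* Point estimates should match; only the standard errors should differ
estimates table m_default m_cluster, se
```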
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#6

14 Jun 2024, 10:20

The problem with going down this rabbit hole of "I am interested in differences between countries, therefore I will use a random effects model" is that you will likely end up fitting noise, and any results you report will be gibberish. Unless you have a randomized intervention, this approach is flawed. That is why we insist on testing whether the random effects estimator is consistent if you choose to use it. I refer you to Jeff Wooldridge's comments in the following thread: https://www.statalist.org/forums/for...nel-regression. As he states there, if your goal is a descriptive regression, then just use pooled OLS. You appear to have economic data, so this is a heads up to you. You don't want to exert too much effort on this and then get a rude awakening from the referees later on if you intend to publish your results.
Erik Ruzek

Join Date: Oct 2017

Posts: 428
#7

14 Jun 2024, 11:56

I think Andrew's take is consistent with what would happen if you tried to publish your paper in an econometrics journal. But if you are going to political science, psychology, or sociology, there is a lot more openness to random effects models. You can always introduce the country means of any within-country predictor to deal with the endogeneity problem (correlation between predictors and the error term) for the within-country predictors. That still doesn't solve the problem of potential endogeneity for level 2 predictors, but as Clyde says, if your substantive questions deal with between-country predictors, then you have to do a good job convincing reviewers that your between-country covariates adjust for confounding factors.
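A rough sketch of the country-means approach, assuming the three continuous predictors (coc, unem, lnGDPPC) are the within-country variables of interest. The variable names insample and mean_* are made up for this illustration, and the means are computed over the estimation sample only so that they are not affected by observations dropped for missing data.

```stata
* Fit the model once to identify the estimation sample
mixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:
generate byte insample = e(sample)

* Country-level means of each time-varying predictor (estimation sample only)
foreach v of varlist coc unem lnGDPPC {
    egen double mean_`v' = mean(`v') if insample, by(id)
}

* Refit with the country means added alongside the original predictors
mixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC ///
    mean_coc mean_unem mean_lnGDPPC i.region_id i.year || id:
```

With the means in the model, the coefficients on coc, unem, and lnGDPPC reflect within-country variation, while the coefficients on the mean_* variables pick up how the between-country association differs from it.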