  • Fixed effects, Random effects, Mixed-effects, clustered standard errors and the difference

    Hello,

    I am conceptually stuck on the difference between random effects and mixed-effects models, when to use which one, and on the need for clustering standard errors in the mixed-effects setting. Most of the expositions of mixed-effects models I have seen frame the issue in terms of patients nested within doctors or students nested within schools. I am finding it hard to reconcile these examples with the country-level data that I am using and am hoping that the information I provide is clear enough that someone with a better understanding can help me resolve my confusion. The data and main variables that I am using are as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str56 country float id double price_dispersion_use float TS_ce2 byte E double(coc unem) float lnGDPPC str3 region float region_id int year
    "Afghanistan"          1                 15 . 4   -1.36474287509918               7.91  8.014661 "EMR" 3 2014
    "Afghanistan"          1                  . . 5   -1.54035270214081             10.092  7.994392 "EMR" 3 2016
    "Afghanistan"          1 13.333333333333334 . 5   -1.50288057327271             11.131  7.974823 "EMR" 3 2018
    "Afghanistan"          1  11.76470588235294 . 5   -1.49369978904724              11.71  7.928968 "EMR" 3 2020
    "Afghanistan"          1                  . . 5   -1.18377649784088               14.1         . "EMR" 3 2022
    "Albania"              2  44.44444444444444 1 5   -.586141347885132              18.05  9.465752 "EUR" 4 2014
    "Albania"              2 56.666666666666664 1 5   -.471469223499298              15.42  9.524819 "EUR" 4 2016
    "Albania"              2               62.5 1 5   -.545840263366699               12.3  9.604934 "EUR" 4 2018
    "Albania"              2  60.60606060606061 1 5   -.572924494743347             12.833   9.60202 "EUR" 4 2020
    "Albania"              2                 60 1 5   -.407875537872314             11.629  9.756207 "EUR" 4 2022
    "Algeria"              3  33.33333333333333 6 4    -.61265641450882              10.21  9.511575 "AFR" 1 2014
    "Algeria"              3 35.714285714285715 . 4    -.67341673374176               10.2  9.539472 "AFR" 1 2016
    "Algeria"              3                 15 . 5   -.658660113811493             12.145  9.525714 "AFR" 1 2018
    "Algeria"              3                 50 . 5    -.66646021604538             14.036  9.447598 "AFR" 1 2020
    "Algeria"              3  48.57142857142857 5 5   -.637929856777191             12.491 9.4796715 "AFR" 1 2022
    "Andorra"              4  72.85714285714285 6 2    1.22070860862732 3.4574213637138524 11.030043 "EUR" 4 2014
    "Andorra"              4  72.85714285714285 6 2    1.15955591201782 3.4620623503549632 11.067958 "EUR" 4 2016
    "Andorra"              4  77.77777777777777 1 2    1.17916560173035 3.4856245150448624 11.053652 "EUR" 4 2018
    "Andorra"              4  68.44993141289439 1 2    1.26600527763367 3.5770777680298496  10.91981 "EUR" 4 2020
    "Andorra"              4  69.86301369863014 1 2    1.27020359039307  3.890483345078273 11.056888 "EUR" 4 2022
    "Angola"               5                 75 . 2   -1.45779824256897             16.317  9.236285 "AFR" 1 2014
    "Angola"               5                  . . 2   -1.48333728313446             16.577  9.147498 "AFR" 1 2016
    "Angola"               5                  . . 4   -1.19925093650818             16.626   9.06262 "AFR" 1 2018
    "Angola"               5                 25 2 4   -.938672542572021             16.698 8.9309025 "AFR" 1 2020
    "Angola"               5                 25 . 4   -.601941287517548             14.478  8.910195 "AFR" 1 2022
    "Antigua and Barbuda"  6                 75 . 2    .634897768497467 3.4574213637138524 10.143725 "AMR" 2 2014
    "Antigua and Barbuda"  6                 50 . 2    .645558714866638 3.4620623503549632  10.18348 "AMR" 2 2016
    "Antigua and Barbuda"  6             66.875 . 5    .236239701509476 3.4856245150448624 10.263378 "AMR" 2 2018
    "Antigua and Barbuda"  6  62.05673758865249 . 5    .238533273339272 3.5770777680298496 10.073401 "AMR" 2 2020
    "Antigua and Barbuda"  6 63.829787234042556 . 5    .310604453086853  3.890483345078273  10.23125 "AMR" 2 2022
    "Argentina"            7 41.935483870967744 2 4   -.549443066120148               7.27  10.25563 "AMR" 2 2014
    "Argentina"            7              37.75 3 4   -.298964887857437              8.085 10.240202 "AMR" 2 2016
    "Argentina"            7  45.34920634920635 3 4   -.098668172955513               9.22 10.220944 "AMR" 2 2018
    "Argentina"            7 18.726114649681527 3 4    -.16378065943718              11.46 10.076843 "AMR" 2 2020
    "Argentina"            7 13.384615384615383 3 4   -.447030484676361              6.805   10.2083 "AMR" 2 2022
    "Armenia"              8                 30 6 2   -.565155386924744             11.989  9.488754 "EUR" 4 2014
    "Armenia"              8 26.666666666666668 6 2   -.659123718738556             12.625  9.530623 "EUR" 4 2016
    "Armenia"              8 42.857142857142854 3 2   -.408891350030899              13.21  9.663906 "EUR" 4 2018
    "Armenia"              8               47.5 1 5 -.00343869999051094              12.18  9.673404 "EUR" 4 2020
    "Armenia"              8  48.23529411764706 1 5   .0280352365225554              8.588  9.857456 "EUR" 4 2022
    "Australia"            9  78.93318965517241 1 4    1.84946465492249               6.08  10.90798 "WPR" 6 2014
    "Australia"            9  73.84341637010677 1 4    1.77200365066528               5.71  10.92716 "WPR" 6 2016
    "Australia"            9  82.34126984126985 1 4    1.76737761497498                5.3 10.947303 "WPR" 6 2018
    "Australia"            9  71.02189781021899 6 4    1.63295590877533               6.46 10.938417 "WPR" 6 2020
    "Australia"            9  68.45524542829644 6 4    1.76448953151703                3.7 10.987324 "WPR" 6 2022
    "Austria"             10  80.61224489795919 4 4    1.46674907207489               5.67 11.040983 "EUR" 4 2014
    "Austria"             10                 80 4 4    1.49696803092957               6.06 11.048753 "EUR" 4 2016
    "Austria"             10                 80 4 4    1.56836605072021               4.93 11.083235 "EUR" 4 2018
    "Austria"             10  82.45614035087719 4 4    1.47778916358948                5.2 11.020405 "EUR" 4 2020
    "Austria"             10  68.35820895522387 4 4    1.25861942768097               4.99 11.094935 "EUR" 4 2022
    "Azerbaijan"          11                 24 6 4   -1.02249026298523               4.91  9.938668 "EUR" 4 2014
    "Azerbaijan"          11              56.25 1 4   -.852654457092285                  5  9.894967 "EUR" 4 2016
    "Azerbaijan"          11 23.076923076923077 6 5   -.852769494056702                4.9  9.893378 "EUR" 4 2018
    "Azerbaijan"          11  47.05882352941177 6 5   -1.07708406448364               7.24  9.858809 "EUR" 4 2020
    "Azerbaijan"          11  55.55555555555556 6 5   -1.04057228565216               5.65  9.953777 "EUR" 4 2022
    "Bahamas"             12 48.658536585365916 1 2    1.30873775482178                  . 10.381657 "AMR" 2 2014
    "Bahamas"             12  40.22346368715088 1 2    1.06738793849945               12.7 10.366473 "AMR" 2 2016
    "Bahamas"             12                  . . 2    1.09553563594818                 10 10.405302 "AMR" 2 2018
    "Bahamas"             12  61.08949416342412 1 2    1.10620594024658             12.563 10.118558 "AMR" 2 2020
    "Bahamas"             12                  . 1 2    1.25618994235992             10.089 10.401076 "AMR" 2 2022
    "Bahrain"             13                 50 . 5    .273521840572357              1.147 10.890368 "EMR" 3 2014
    "Bahrain"             13  33.33333333333333 . 5  -.0476647540926933              1.193 10.877423 "EMR" 3 2016
    "Bahrain"             13                 40 2 5   -.176231503486633              1.198  10.88669 "EMR" 3 2018
    "Bahrain"             13  34.78260869565218 3 5  -.0935939401388168              1.786 10.867227 "EMR" 3 2020
    "Bahrain"             13 58.333333333333336 3 5    .139385640621185              1.339 10.944588 "EMR" 3 2022
    "Bangladesh"          14 15.789473684210526 . 4   -.892129957675934              4.405  8.543592 "SEA" 5 2014
    "Bangladesh"          14 22.727272727272727 . 4    -.88687801361084               4.35  8.651562 "SEA" 5 2016
    "Bangladesh"          14  33.33333333333333 . 4   -.926946818828583              4.373  8.761912 "SEA" 5 2018
    "Bangladesh"          14 32.142857142857146 . 4   -1.00367724895477              5.316  8.849105 "SEA" 5 2020
    "Bangladesh"          14                 25 . 4    -1.0755273103714              4.271   8.96254 "SEA" 5 2022
    "Barbados"            15  79.32850559578671 1 2    1.13345634937286              12.17  9.704554 "AMR" 2 2014
    "Barbados"            15              81.25 1 2     1.2135511636734               8.25    9.7494 "AMR" 2 2016
    "Barbados"            15  45.23433385992628 1 2    1.37191247940063               8.32  9.741538 "AMR" 2 2018
    "Barbados"            15                  . . 2    1.19406688213348              9.743  9.604329 "AMR" 2 2020
    "Barbados"            15  78.84615384615384 1 2    1.28457343578339              8.501  9.700481 "AMR" 2 2022
    "Belarus"             16             35.625 6 4    -.23470650613308              5.908 10.187328 "EUR" 4 2014
    "Belarus"             16 31.914893617021278 6 4   -.224086627364159               5.84  10.12049 "EUR" 4 2016
    "Belarus"             16 30.645161290322577 6 4    -.15480200946331               4.76 10.179738 "EUR" 4 2018
    "Belarus"             16  25.71428571428572 6 4   -.133964225649834               4.05 10.193598 "EUR" 4 2020
    "Belarus"             16 23.958333333333332 6 4    -.57967621088028               3.57 10.185905 "EUR" 4 2022
    "Belgium"             17  80.82901554404145 4 4    1.51295030117035               8.52  10.96869 "EUR" 4 2014
    "Belgium"             17  81.64556962025317 4 4    1.53148806095123               7.83  10.99063 "EUR" 4 2016
    "Belgium"             17  83.33333333333334 4 4    1.42942035198212               5.95 11.016062 "EUR" 4 2018
    "Belgium"             17  85.29411764705883 4 4    1.44595634937286               5.55 10.974466 "EUR" 4 2020
    "Belgium"             17               72.5 4 4    1.49504864215851               5.56  11.05771 "EUR" 4 2022
    "Belize"              18  41.66666666666667 . 2   -.159106820821762               8.24 9.4002905 "AMR" 2 2014
    "Belize"              18  41.66666666666667 1 2   -.229891732335091                  7  9.390294 "AMR" 2 2016
    "Belize"              18                 40 1 2   -.169436514377594              7.899  9.343116 "AMR" 2 2018
    "Belize"              18                 50 1 2   -.193349361419678             10.619  9.203805 "AMR" 2 2020
    "Belize"              18 50.391644908616186 1 2   -.237028583884239              8.672  9.426018 "AMR" 2 2022
    "Benin"               19                  . 2 4   -.669143795967102              1.808  8.040519 "AFR" 1 2014
    "Benin"               19                 20 2 4   -.529120028018951              1.843  8.031984 "AFR" 1 2016
    "Benin"               19               22.5 2 5   -.391388416290283               1.47  8.093288 "AFR" 1 2018
    "Benin"               19 47.368421052631575 2 5   -.040327787399292              1.616  8.140294 "AFR" 1 2020
    "Benin"               19                  . 2 5   -.124255605041981              1.476  8.215443 "AFR" 1 2022
    "Bhutan"              20                  . . 4    1.30612897872925               2.63  9.345343 "SEA" 5 2014
    "Bhutan"              20                  . . 4    1.09102046489716              2.747  9.469749 "SEA" 5 2016
    "Bhutan"              20                  . . 4    1.59051811695099               3.35  9.533319 "SEA" 5 2018
    "Bhutan"              20                  . . 4    1.61823654174805               5.03  9.467918 "SEA" 5 2020
    "Bhutan"              20                  . 2 4    1.51425933837891               5.95         . "SEA" 5 2022
    end
    label values TS_ce2 TS_ce2_l
    label def TS_ce2_l 1 "specific uniform", modify
    label def TS_ce2_l 2 "adv un NO min", modify
    label def TS_ce2_l 3 "adv uni WITH min", modify
    label def TS_ce2_l 4 "mixed uni NO min", modify
    label def TS_ce2_l 5 "mixed uni WITH min", modify
    label def TS_ce2_l 6 "specific tiered", modify
    label values region_id region_id_l
    label def region_id_l 1 "AFR", modify
    label def region_id_l 2 "AMR", modify
    label def region_id_l 3 "EMR", modify
    label def region_id_l 4 "EUR", modify
    label def region_id_l 5 "SEA", modify
    label def region_id_l 6 "WPR", modify

    Apart from the region dummies, which identify broad geographic regions, none of the variables I am interested in are time-invariant by definition. So my starting point was to run a Hausman-type test on the model with only the time-varying independent variables, to see whether random effects is consistent or whether I need the fixed effects estimator. My understanding is that if I use the fixed effects estimator, I am still controlling for the region-specific effects; they just aren't estimable because they are absorbed into the country fixed effects. I ran the test (with no region dummies), using the cluster-robust Sargan-Hansen version from -xtoverid-, as follows:

    Code:
    xi: xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.year, re cluster(id)
    i.TS_ce2          _ITS_ce2_1-10       (naturally coded; _ITS_ce2_1 omitted)
    i.year            _Iyear_2014-2022    (naturally coded; _Iyear_2014 omitted)
    
    Random-effects GLS regression                   Number of obs     =        664
    Group variable: id                              Number of groups  =        165
    
    R-squared:                                      Obs per group:
         Within  = 0.0934                                         min =          1
         Between = 0.5087                                         avg =        4.0
         Overall = 0.4617                                         max =          5
    
                                                    Wald chi2(15)     =    1676.20
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
                                       (Std. err. adjusted for 165 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
    price_disp~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      _ITS_ce2_2 |  -14.02557   3.794631    -3.70   0.000    -21.46291   -6.588226
      _ITS_ce2_3 |  -10.88748   4.686501    -2.32   0.020    -20.07285   -1.702108
      _ITS_ce2_4 |   4.796852   3.110406     1.54   0.123    -1.299433    10.89314
      _ITS_ce2_5 |  -1.795415   4.094529    -0.44   0.661    -9.820545    6.229715
      _ITS_ce2_6 |  -9.561305   3.260805    -2.93   0.003    -15.95237   -3.170244
      _ITS_ce2_8 |  -11.07366   3.741402    -2.96   0.003    -18.40668   -3.740649
     _ITS_ce2_10 |  -21.19221   3.316523    -6.39   0.000    -27.69248   -14.69195
               E |   -.420992     1.1004    -0.38   0.702    -2.577737    1.735753
             coc |   5.599903    1.50856     3.71   0.000     2.643179    8.556627
            unem |  -.3365144   .1896276    -1.77   0.076    -.7081776    .0351489
         lnGDPPC |   4.062827   1.478944     2.75   0.006      1.16415    6.961504
     _Iyear_2016 |   1.719148   1.237929     1.39   0.165    -.7071489    4.145445
     _Iyear_2018 |   2.422449   1.561346     1.55   0.121    -.6377324     5.48263
     _Iyear_2020 |   3.136992   1.555324     2.02   0.044     .0886123    6.185371
     _Iyear_2022 |   2.683144   1.712374     1.57   0.117    -.6730476    6.039336
           _cons |   21.95015   14.74566     1.49   0.137    -6.950821    50.85112
    -------------+----------------------------------------------------------------
         sigma_u |  12.909408
         sigma_e |  11.049002
             rho |   .5771861   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(id)
    Sargan-Hansen statistic  79.296  Chi-sq(15)   P-value = 0.0000
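
    For comparison with the -xtoverid- result above, here is a minimal sketch of the conventional Hausman test on the same specification. It assumes the full panel is in memory, and the stored-estimate names fe and re are arbitrary. -hausman- compares an estimator that is always consistent (fixed effects) with one that is efficient under the null (random effects), so it relies on default, non-robust standard errors. Once the standard errors are clustered, that efficiency premise no longer holds. This is why a robust test such as the community-contributed -xtoverid- (available from SSC) is used with cluster(id).

    Code:
    ```stata
    * Conventional Hausman test: both fits use the default (non-robust) VCE,
    * because the test assumes random effects is efficient under the null.
    xtset id year
    xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.year, fe
    estimates store fe
    xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.year, re
    estimates store re
    * Order matters: consistent estimator first, efficient estimator second
    hausman fe re
    ```

    If Stata warns that the difference of the variance matrices is not positive definite, the sigmamore option of -hausman- is a common way to stabilize the comparison.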

    Evidently, fixed effects is the way to go if I am not interested in the coefficients on the region dummies. However, when I compare the mean of my dependent variable across regions, the differences are statistically significant, which may be worth investigating in the regression.

    Code:
    oneway price_dispersion_use region_id, bonferroni
    
                            Analysis of variance
        Source              SS         df      MS            F     Prob > F
    ------------------------------------------------------------------------
    Between groups      132305.334      5   26461.0667     65.51     0.0000
     Within groups      324737.956    804    403.90293
    ------------------------------------------------------------------------
        Total            457043.29    809   564.948442
    
    Bartlett's equal-variances test: chi2(5) =  15.7086    Prob>chi2 = 0.008
    
                 Comparison of price_dispersion_use by group(region)
                                    (Bonferroni)
    Row Mean-|
    Col Mean |        AFR        AMR        EMR        EUR        SEA
    ---------+-------------------------------------------------------
         AMR |    18.4197
             |      0.000
             |
         EMR |   -2.77116   -21.1908
             |      1.000      0.000
             |
         EUR |    28.7939    10.3743    31.5651
             |      0.000      0.000      0.000
             |
         SEA |   -3.69121   -22.1109   -.920047   -32.4851
             |      1.000      0.000      1.000      0.000
             |
         WPR |    18.4931    .073463    21.2643   -10.3008    22.1843
             |      0.000      1.000      0.000      0.000      0.000
    
    .
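
    The region means clearly differ, but a fixed-effects model cannot estimate those differences. The following sketch shows why. Write z_i for any time-invariant country characteristic (such as the region dummies) and alpha_i for the country fixed effect. Averaging each country's equation over its years and subtracting that average, which is the within transformation -xtreg, fe- applies, removes every term that does not change over t:

    ```latex
    \begin{aligned}
    y_{it} &= \mathbf{x}_{it}'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\gamma} + \alpha_i + e_{it} \\
    \bar{y}_i &= \bar{\mathbf{x}}_i'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\gamma} + \alpha_i + \bar{e}_i \\
    y_{it} - \bar{y}_i &= (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol{\beta} + (e_{it} - \bar{e}_i)
    \end{aligned}
    ```

    Because z_i'gamma cancels together with alpha_i, gamma is not identified, and -xtreg, fe- reports any region dummies added to the model as omitted because of collinearity. The region effects are still controlled for (they are absorbed into alpha_i), just not estimated. Random effects can estimate gamma only because it replaces alpha_i with a random u_i assumed to be uncorrelated with the regressors, which is the assumption the test above rejected.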

    Moreover, most of the variation in my variables is between countries rather than within them (see below). So even though fixed effects is the consistent estimator, it uses only the within-country variation, and my understanding is that I would be ignoring a potentially interesting perspective if I don't account for the between variation.

    Code:
    xtsum price_dispersion_use TS_ce2 E coc unem lnGDPPC year
    
    Variable         |      Mean   Std. dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    price_~e overall |   54.1633   23.68788          4        100 |     N =     821
             between |             21.47537   7.836111   92.83242 |     n =     192
             within  |             10.13382   16.45002   99.53367 | T-bar = 4.27604
                     |                                            |
    TS_ce2   overall |  3.118182   1.910972          1         10 |     N =     770
             between |              1.63939          1          7 |     n =     176
             within  |             .9785963  -1.131818   9.318182 | T-bar =   4.375
                     |                                            |
    E        overall |   3.78359   1.078699          2          5 |     N =     975
             between |             1.002953          2          5 |     n =     195
             within  |             .4022524    1.38359    6.18359 |     T =       5
                     |                                            |
    coc      overall | -.0858656   .9881318  -1.936706   2.402744 |     N =     963
             between |             .9822009  -1.780401   2.250184 |     n =     193
             within  |             .1345866  -.9270451   .4870617 | T-bar = 4.98964
                     |                                            |
    unem     overall |  7.565258   5.748101        .11      28.84 |     N =     960
             between |             5.560955       .146    26.5954 |     n =     193
             within  |             1.455914   .2532579   15.11326 | T-bar = 4.97409
                     |                                            |
    lnGDPPC  overall |  9.485856   1.149466   6.753617   11.82817 |     N =     918
             between |             1.148124   6.810744   11.80229 |     n =     185
             within  |             .0820738   9.140171   10.27743 |     T = 4.96216
                     |                                            |
    year     overall |      2018   2.829879       2014       2022 |     N =     975
             between |                    0       2018       2018 |     n =     195
             within  |             2.829879       2014       2022 |     T =       5
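
    The between/within split reported by -xtsum- can be reproduced by hand. Doing so makes it concrete that the fixed-effects estimator uses only the within part. Below is a minimal sketch for coc, assuming the full panel is in memory; coc_bar, coc_within, one_per_id and grand_mean are hypothetical names. It follows the definitions in the -xtsum- documentation: the between statistics come from the country means xbar_i, and the within statistics from x_it - xbar_i + xbar, where xbar is the global mean added back to keep the original scale.

    Code:
    ```stata
    * Within/between split of coc, mirroring the definitions used by -xtsum-
    * (assumes the full panel is in memory; new variable names are hypothetical)

    * Global mean of coc over all non-missing observations
    summarize coc, meanonly
    scalar grand_mean = r(mean)

    * Between component: one mean per country
    bysort id: egen double coc_bar = mean(coc)
    egen byte one_per_id = tag(id)
    summarize coc_bar if one_per_id        // SD should be close to the "between" SD

    * Within component: deviation from the country mean, recentred on the global mean
    generate double coc_within = coc - coc_bar + scalar(grand_mean)
    summarize coc_within                   // SD should be close to the "within" SD
    ```

    With the full data, the two standard deviations should come out close to the .98 (between) and .13 (within) reported for coc in the table above. The small within SD is the only variation fixed effects can use for this regressor.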

    Given this, I thought that, in addition to the fixed effects estimates, I would run an alternative specification that also controls for the region dummies using random effects. However, I am unsure whether this should be implemented as -xtreg, re cluster(id)- or as a two-level multilevel model in which 'observations are clustered in countries'. The latter option came up when I found a study that uses country-level data to model a situation very similar to mine. In this study, the authors used "a two-level mixed effects linear regression model".

    According to the authors, country was treated as the level-2 identifier, but their explanation of why and how is very thin. I have tried to implement it to understand how it differs from the RE regression with clustered standard errors. I followed the Stata manual on multilevel mixed-effects models (StataCorp 2023) to write the code for the multilevel/mixed model shown below. However, I am no closer to understanding whether treating country (id) as the level-2 identifier in the multilevel model is equivalent to clustering standard errors at the country level in the random-effects regression. If the two approaches are not doing the same thing, I am lost as to what the difference between them is. I would greatly appreciate guidance on how 'treating observations as nested in countries' differs from clustering standard errors at the country level in the non-hierarchical model (-xtreg, re cluster(id)-).

    Code:
    xtreg price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year, re cluster(id)
    
    Random-effects GLS regression                   Number of obs     =        659
    Group variable: id                              Number of groups  =        164
    
    R-squared:                                      Obs per group:
         Within  = 0.0934                                         min =          1
         Between = 0.5542                                         avg =        4.0
         Overall = 0.4828                                         max =          5
    
                                                    Wald chi2(20)     =    1788.60
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
                                              (Std. err. adjusted for 164 clusters in id)
    -------------------------------------------------------------------------------------
                        |               Robust
    price_dispersion_~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    --------------------+----------------------------------------------------------------
                 TS_ce2 |
         adv un NO min  |  -12.40899   3.996279    -3.11   0.002    -20.24155   -4.576423
      adv uni WITH min  |  -8.338264   5.190598    -1.61   0.108    -18.51165    1.835122
      mixed uni NO min  |   4.808446   3.674408     1.31   0.191    -2.393261    12.01015
    mixed uni WITH min  |  -1.597436    4.06049    -0.39   0.694     -9.55585    6.360977
       specific tiered  |  -8.308067   3.384624    -2.45   0.014    -14.94181   -1.674326
     tird adv WITH min  |  -8.823732   4.080642    -2.16   0.031    -16.82164   -.8258213
      mxd trd WITH min  |  -19.90516   3.876494    -5.13   0.000    -27.50295   -12.30737
                        |
                      E |   .2397692   1.121628     0.21   0.831    -1.958581    2.438119
                    coc |   5.508633   1.577934     3.49   0.000      2.41594    8.601326
                   unem |  -.2725512   .1849838    -1.47   0.141    -.6351129    .0900104
                lnGDPPC |   2.830525    1.64612     1.72   0.086    -.3958122    6.056861
                        |
              region_id |
                   AMR  |   8.763069   4.040908     2.17   0.030     .8430354     16.6831
                   EMR  |   -4.99006   5.627373    -0.89   0.375    -16.01951    6.039388
                   EUR  |   6.148668   4.709602     1.31   0.192    -3.081982    15.37932
                   SEA  |  -6.283476   6.768348    -0.93   0.353    -19.54919    6.982243
                   WPR  |   6.940622   5.036833     1.38   0.168    -2.931389    16.81263
                        |
                   year |
                  2016  |   1.931037   1.249372     1.55   0.122    -.5176866    4.379761
                  2018  |    2.56888   1.583531     1.62   0.105     -.534783    5.672544
                  2020  |   3.320599   1.587323     2.09   0.036     .2095038    6.431694
                  2022  |   3.046376   1.754632     1.74   0.083    -.3926392    6.485391
                        |
                  _cons |   26.26522   15.42733     1.70   0.089    -3.971789    56.50223
    --------------------+----------------------------------------------------------------
                sigma_u |  12.469835
                sigma_e |  11.083854
                    rho |  .55864045   (fraction of variance due to u_i)
    -------------------------------------------------------------------------------------

    Code:
    xtmixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:
    
    Performing EM optimization:
    
    Performing gradient-based optimization:
    
    Iteration 0:   log likelihood = -2657.2751  
    Iteration 1:   log likelihood = -2657.2751  
    
    Computing standard errors:
    
    Mixed-effects ML regression                     Number of obs     =        659
    Group variable: id                              Number of groups  =        164
                                                    Obs per group:
                                                                  min =          1
                                                                  avg =        4.0
                                                                  max =          5
                                                    Wald chi2(20)     =     267.79
    Log likelihood = -2657.2751                     Prob > chi2       =     0.0000
    
    --------------------------------------------------------------------------------------
    price_dispersion_use | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    ---------------------+----------------------------------------------------------------
                  TS_ce2 |
          adv un NO min  |  -12.49909   2.729109    -4.58   0.000    -17.84805   -7.150137
       adv uni WITH min  |  -8.384119    3.86549    -2.17   0.030    -15.96034   -.8078988
       mixed uni NO min  |   5.015721   3.225648     1.55   0.120    -1.306432    11.33787
     mixed uni WITH min  |  -1.830149   2.964851    -0.62   0.537     -7.64115    3.980852
        specific tiered  |  -8.339198   2.554442    -3.26   0.001    -13.34581   -3.332584
      tird adv WITH min  |  -8.622508   12.77176    -0.68   0.500    -33.65469    16.40967
       mxd trd WITH min  |  -19.78572   5.321532    -3.72   0.000    -30.21573   -9.355705
                         |
                       E |   .2622187   .8506864     0.31   0.758    -1.405096    1.929533
                     coc |   5.498374   1.530339     3.59   0.000     2.498964    8.497784
                    unem |  -.2666603   .1678856    -1.59   0.112      -.59571    .0623893
                 lnGDPPC |   2.855437   1.628536     1.75   0.080     -.336435    6.047309
                         |
               region_id |
                    AMR  |   8.716485   3.774297     2.31   0.021     1.318999    16.11397
                    EMR  |  -5.048147   4.628445    -1.09   0.275    -14.11973    4.023439
                    EUR  |   5.882058   4.439661     1.32   0.185    -2.819518    14.58363
                    SEA  |  -6.389057   6.032405    -1.06   0.290    -18.21235     5.43424
                    WPR  |   6.907721   4.138175     1.67   0.095    -1.202953    15.01839
                         |
                    year |
                   2016  |   1.928479   1.407243     1.37   0.171    -.8296663    4.686624
                   2018  |   2.582868   1.459202     1.77   0.077    -.2771154    5.442851
                   2020  |    3.31731   1.456869     2.28   0.023     .4618988    6.172721
                   2022  |   3.048146   1.504786     2.03   0.043     .0988195    5.997472
                         |
                   _cons |   25.99444   14.52699     1.79   0.074    -2.477937    54.46682
    --------------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    id: Identity                 |
                       sd(_cons) |   11.90176   .8518652      10.34396    13.69417
    -----------------------------+------------------------------------------------
                    sd(Residual) |   11.06389   .3550944      10.38936    11.78222
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 207.65        Prob >= chibar2 = 0.0000
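
    In both outputs, 'observations nested in countries' means the same thing: every observation from country i shares one intercept shift u_i, and that shared term makes observations from the same country correlated. Assuming u_i and e_it are independent of each other and e_it is independent across years, the implied within-country correlation can be worked out from the variance components reported above (sigma_u and sigma_e from -xtreg-, sd(_cons) and sd(Residual) from -xtmixed-):

    ```latex
    \begin{aligned}
    y_{it} &= \mathbf{x}_{it}'\boldsymbol{\beta} + u_i + e_{it},
      \qquad \operatorname{Var}(u_i) = \sigma_u^2,\; \operatorname{Var}(e_{it}) = \sigma_e^2 \\
    \rho &= \operatorname{Corr}(u_i + e_{it},\, u_i + e_{is})
          = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}, \qquad t \neq s \\
    \text{xtreg, re:}\quad \rho &= \frac{12.4698^2}{12.4698^2 + 11.0839^2}
          = \frac{155.50}{155.50 + 122.85} \approx 0.559 \\
    \text{xtmixed:}\quad \rho &= \frac{11.9018^2}{11.9018^2 + 11.0639^2}
          = \frac{141.65}{141.65 + 122.41} \approx 0.536
    \end{aligned}
    ```

    The first value reproduces the rho of .5586 that -xtreg- reports. Clustering the standard errors adds no such term to the model. It leaves the coefficients unchanged and only makes the standard errors robust to whatever correlation remains among a country's observations.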

    Thank you!

    Sam

    References:

    StataCorp. 2023. Stata 18 Multilevel Mixed-Effects Reference Manual. College Station, TX: Stata Press.



    Last edited by Sam Murgatroyd; 13 Jun 2024, 05:50. Reason: spelling

  • #2
    You are asking two different questions, and I will answer them separately.

    1. Fixed effects vs random effects. You have followed the standard practice in economics of using a Hausman test to approach this. But then you noticed that your outcome variable shows a large amount of interesting variation across countries, so you don't want to miss out on being able to estimate those effects. The problem here is that you should clearly identify your research questions before you begin analyzing your data, so that you can choose the analyses that answer those specific questions. If your research questions require you to estimate the country-level differences, then it doesn't matter what Hausman or any other test tells you: that cannot be done in a fixed effects model, and you shouldn't use fixed effects. If, however, these country-level differences are just a matter of personal interest and are not relevant to your research goals, then, given your Hausman test results, the fixed effects model is better. It will give you consistent estimates of whatever variables actually are relevant to the research questions, whereas the random effects model will not. If some of your research questions revolve around the country-level differences and others do not, then you will need to estimate both models and use the results from each model for the analyses to which it is appropriate.

    More generally, with 2-level clustered data, if your research question involves between-cluster differences, you cannot use a fixed effects model for the purpose. If it does not, then a fixed effects model will always give consistent estimates of the relevant parameters, whereas a random effects model may fail to do so. In this situation, a Hausman-type test can identify when the random effects model will give consistent estimates, and when that is true, the random effects model is preferred for reasons of efficiency.

    2. What is the difference between the random effects model (-xtreg, re-) and the multilevel model (-xtmixed-)?
    Both of these are the same underlying model: $\text{DV} = \text{constant} + \sum_i \beta_i\,\text{IndVar}_i + u_{\text{cluster}} + e_{\text{observation}}$. You have probably also noticed that although the results you got from the two commands differ, they differ only slightly. The differences arise because different algorithms are used to estimate the coefficients: -xtmixed- uses maximum likelihood estimation, whereas -xtreg, re- uses generalized least squares. The differences between them are always small. In fact, I would say that the differences you are getting are about as large as they ever get, at least in my experience. The advantage of -xtreg, re- is that its syntax is simpler and it is not subject to convergence problems. The advantage of -xtmixed- (which, by the way, was renamed -mixed- several versions back) is that it can handle models with more than 2 levels and can also estimate random slopes. It also offers a wide array of correlation structures among the random effects and among the residuals. Your model has only 2 levels, a simple independent structure for the random effect and the residuals, and no random slopes, so there is no particular reason to prefer one command over the other.
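
    The two points can also be put in formulas. This is a sketch in notation that does not appear elsewhere in the thread: $i$ indexes countries, $t$ years, $x_{it}$ are time-varying predictors, $z_i$ are time-invariant country-level predictors, and $T_i$ is the number of years observed for country $i$.

    $$y_{it} = \beta_0 + x_{it}'\beta + z_i'\delta + u_i + e_{it}$$

    The fixed-effects (within) estimator subtracts each country's mean. This removes $u_i$, but it also removes every time-invariant $z_i$, which is why between-country effects cannot be estimated that way:

    $$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (e_{it} - \bar{e}_i)$$

    The random-effects GLS estimator behind -xtreg, re- subtracts only a fraction $\theta_i$ of each mean, with the variance components replaced by their estimates:

    $$y_{it} - \theta_i\bar{y}_i = (1-\theta_i)\beta_0 + (x_{it} - \theta_i\bar{x}_i)'\beta + (1-\theta_i)z_i'\delta + (1-\theta_i)u_i + (e_{it} - \theta_i\bar{e}_i), \qquad \theta_i = 1 - \sqrt{\frac{\sigma_e^2}{T_i\sigma_u^2 + \sigma_e^2}}$$

    With $\theta_i = 0$ this is pooled OLS, and as $\theta_i \to 1$ it approaches the within estimator. For given variance components, the maximum likelihood estimate of $\beta$ is this same GLS estimate. The coefficient differences between -xtreg, re- and -mixed- therefore come mainly from the two commands' different estimates of $\sigma_u^2$ and $\sigma_e^2$.

    The Hausman test compares the two sets of estimates for the $k$ time-varying coefficients, under the null hypothesis that $x_{it}$ is uncorrelated with $u_i$:

    $$H = (\hat\beta_{FE} - \hat\beta_{RE})'\left[\widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE})\right]^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \;\overset{a}{\sim}\; \chi^2_k$$

    This classical form assumes that the random-effects estimator is fully efficient, so it is not valid when used alongside cluster-robust standard errors.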



    • #3
      I agree 100% with Clyde here (great answer!). The one issue he didn't address is your use of cluster(id) in your xtreg model. This adjusts the standard error calculation to give so-called cluster-robust standard errors. These allow the errors to be correlated in arbitrary ways within each country, while still assuming independence across countries. It is worth reading up on them in the Stata documentation (see -help vce_option-). You can obtain the same kind of standard errors in mixed:
      Code:
      xtmixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id: , vce(cluster id)
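
      For reference, the idea behind cluster-robust standard errors can be sketched in its simplest (pooled OLS) form. The notation is mine: $G$ is the number of clusters (here, countries), $X_g$ holds the design-matrix rows for cluster $g$, and $\hat{e}_g$ holds its residuals. Stata additionally multiplies the result by a small-sample adjustment factor.

      $$\widehat{V}_{\text{cluster}}(\hat\beta) = (X'X)^{-1}\left(\sum_{g=1}^{G} X_g'\hat{e}_g\hat{e}_g'X_g\right)(X'X)^{-1}$$

      Only independence across clusters is assumed. The outer product $\hat{e}_g\hat{e}_g'$ lets the residuals within a country be correlated and heteroskedastic in any pattern. For a maximum likelihood fit such as mixed, the same sandwich is built from the per-cluster score vectors $s_g$ and the Hessian $H$ of the log likelihood: $\widehat{V} = H^{-1}\left(\sum_g s_g s_g'\right)H^{-1}$. In both cases the approximation relies on the number of clusters being reasonably large.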



      • #4
        Thank you Clyde and Erik for taking the time to engage with my question and to provide very clear responses. I really appreciate your time and effort.

        Something that still has me confused is the idea of a two-level model in which 'observations' are clustered in 'countries'. I am stuck thinking in terms of 'students' clustered in 'classes'. In the country-level panel data case, is the observation the 'country-year', and is 'country (id)' the cluster? In terms of Clyde's formula:

        Originally posted by Clyde Schechter:
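        $\text{DV} = \text{constant} + \sum_i \beta_i\,\text{IndVar}_i + u_{\text{cluster}} + e_{\text{observation}}$
        becomes $\text{DV} = \text{constant} + \sum_i \beta_i\,\text{IndVar}_i + u_{\text{cluster}=\text{country}} + e_{\text{observation}=\text{country-year}}$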

        Relatedly, my understanding is that you cluster standard errors to account for the correlation between years within each panel unit. But in the xtmixed framework you are already accounting for that by nesting the country-years within countries. Isn't clustering the standard errors by country in a two-level model 'tautologous'? Why would you need to cluster your standard errors at the country level when you run:

        Code:
         xtmixed price_dispersion_use i.TS_ce2 E coc unem lnGDPPC i.region_id i.year || id:
        Thank you again for your time!

        Sam






        • #5
          You are correct in your interpretation. Multilevel data can take many forms. In the cross-sectional case, it might be students within classrooms (2-level). You might also have data on students from multiple schools, in which case the nesting structure is students > classrooms > schools. Longitudinal data is another type of multilevel data, and this is exactly what you have: repeated (yearly) observations of countries (your level-2 identifier). Another example would be an educational study where you collect data on students multiple times per year. There you have repeated observations nested within students, who are further nested within classrooms (a 3-level structure). It can get quite complex, especially when you start getting into cross-classifications in which, for example, some students show up in two different classrooms.
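
          Here is the nesting written out for the educational example above. In this sketch (my notation), $t$ indexes measurement occasions, $i$ students and $j$ classrooms, and each level contributes its own random intercept:

          $$y_{tij} = \beta_0 + x_{tij}'\beta + v_j + u_{ij} + e_{tij}$$

          Here $v_j$ is the classroom effect, $u_{ij}$ is the effect of student $i$ within classroom $j$, and $e_{tij}$ is the occasion-level residual. Your country panel is the two-level special case $y_{it} = \beta_0 + x_{it}'\beta + u_i + e_{it}$, with years $t$ nested in countries $i$. In mixed syntax, the levels are listed from the top down, e.g. || classroom: || student: (the variable names here are hypothetical).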

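          As to your other question regarding standard error correction: some folks still use cluster-robust standard errors with panel or multilevel data because they worry about heteroskedasticity of the residuals. The random intercept does account for within-country correlation, but only in one specific form, so clustering is not tautologous. vce(cluster id) keeps the standard errors valid even if that assumed structure or the constant-variance assumption is wrong. See this excellent primer by Richard Williams. Note that this is pretty much the default approach among folks in the econometrics tradition, and if that is your audience, then you should probably use vce(cluster id) or vce(robust) in your mixed model.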
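
          To make the "one specific form" concrete, here is a sketch (my notation) of the within-country covariance that the random-intercept model imposes on the composite error $u_i + e_{it}$ for a country observed in $T$ years:

          $$\Omega_i = \begin{pmatrix} \sigma_u^2+\sigma_e^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2+\sigma_e^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2+\sigma_e^2 \end{pmatrix}, \qquad \rho = \frac{\sigma_u^2}{\sigma_u^2+\sigma_e^2}$$

          Every pair of years gets the same correlation $\rho$, whether they are 2 or 8 years apart, and every observation gets the same variance. Suppose residuals from adjacent waves are more strongly correlated than residuals from distant ones, or the variance differs across countries. In that case the model-based standard errors are off, while vce(cluster id) remains valid. The alternative is to model that structure directly, for example with the residuals() option of mixed.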



          • #6
            The problem with going down this rabbit hole of "I am interested in differences between countries, therefore I will use a random effects model" is that you will likely end up fitting noise, and any results you report will be gibberish. Unless you have a randomized intervention, this approach is flawed. That is why we insist on testing whether the random effects estimator is consistent if you choose to use it. I refer you to Jeff Wooldridge's comments in the following thread: https://www.statalist.org/forums/for...nel-regression. As he states there, if your goal is a descriptive regression, then just use pooled OLS. You appear to have economic data, so consider this a heads-up. You don't want to put a lot of effort into this and then get a rude awakening from the referees later on if you intend to publish your results.



            • #7
              I think Andrew's take is consistent with what would happen if you tried to publish your paper in an econometrics journal. But if you are submitting to political science, psychology, or sociology, there is a lot more openness to random effects models. You can always introduce the country means of any within-country predictor to deal with the endogeneity problem (correlation between predictors and the error term) for the within-country predictors. That still doesn't solve the problem of potential endogeneity for level-2 predictors. As Clyde says, if your substantive questions deal with between-country predictors, then you have to do a good job of convincing reviewers that your between-country covariates adjust for confounding factors.
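
              The country-means device described above (often called the Mundlak approach) can be sketched as follows. In my notation, $\bar{x}_i$ is the country mean of each time-varying predictor and $z_i$ holds the country-level predictors:

              $$y_{it} = \beta_0 + x_{it}'\beta + \bar{x}_i'\gamma + z_i'\delta + u_i + e_{it}$$

              $\bar{x}_i$ picks up the part of $u_i$ that is correlated with the country means of the predictors. As a result, $\hat\beta$ estimates the within-country effects, and in a balanced panel it coincides exactly with the fixed-effects estimate. A test of $\gamma = 0$ is a Hausman-type test that, unlike the classical version, can be combined with vce(cluster id). As noted above, $\hat\delta$ is still consistent only if $z_i$ is uncorrelated with $u_i$.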

