Panel data regressions techniques for company data

Wout Baetens

Join Date: May 2024

Posts: 8
#1


31 May 2024, 08:00

Hello network,

I am currently writing a master's thesis on the integration of the European insurance market. I would like to measure how different countries' insurance markets react to a global shock and how country-level variables affect the outcome. I have a panel dataset of 942 life insurance companies from 18 countries over 9 years (2014-2022), with some missing values, mostly in 2022. I have tried xtreg with fixed effects as well as random effects. Both give me some results, but the F-statistic is very low.
Code used: xtreg GWP_growth GWP c4_ratio GDP_growth inflation economic_downturn_dummy
GWP = gross written premiums, and the c4 ratio is a measure of market concentration. I have tried other variables, but the F-statistic does not seem to improve. What better techniques or structures can I use to get better results?
Thanks in advance for your help!
Tags: fixed effects, panel, panel data, regression, Suggestion
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

31 May 2024, 08:28

Wout:
welcome to this forum.
As per FAQ, please share what you typed and what Stata gave you back to increase your chances of getting helpful replies. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Wout Baetens

Join Date: May 2024

Posts: 8
#3

31 May 2024, 12:27

My code: xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w marketshare GDP_growth log_population inflation c4_ratio, fe
Response:
Fixed-effects (within) regression               Number of obs     =      3,863
Group variable: company1                        Number of groups  =        523

R-squared:                                      Obs per group:
     Within  = 0.1069                                         min =          1
     Between = 0.0003                                         avg =        7.4
     Overall = 0.0015                                         max =          9

                                                F(7, 3333)        =      57.02
corr(u_i, Xb) = -0.9713                         Prob > F          =     0.0000

----------------------------------------------------------------------------------
GWP_growth_lif~w | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
log_total_reve~e |   33.41725   1.754276    19.05   0.000     29.97768    36.85681
solvency_ratio_w |   .2176293   .0400664     5.43   0.000     .1390721    .2961865
     marketshare |   77.81642   24.82696     3.13   0.002      29.1388     126.494
      GDP_growth |   .1456874   .2500398     0.58   0.560    -.3445597    .6359345
  log_population |  -52.45524   50.05718    -1.05   0.295    -150.6012    45.69067
       inflation |    .610816   .4248622     1.44   0.151    -.2222011    1.443833
        c4_ratio |  -13.78162   19.56075    -0.70   0.481    -52.13391    24.57067
           _cons |   514.8184   870.9144     0.59   0.554    -1192.763    2222.399
-----------------+----------------------------------------------------------------
         sigma_u |  111.70597
         sigma_e |  44.605294
             rho |  .86247912   (fraction of variance due to u_i)
----------------------------------------------------------------------------------
F test that all u_i=0: F(522, 3333) = 2.39                   Prob > F = 0.0000
JJ Kovach

Join Date: Feb 2018

Posts: 29
#4

31 May 2024, 14:50

It helps others read your output if you put it within CODE tags (see the # button on the ribbon).
There are two F statistics reported, which report different things. The first, under the R-squared values, is the overall model fit, which is 57 and statistically significant. The second is the last line and it has to do with the panel structure of the data. It is 2.39, which is low, but still significant.

In your panel model, you have Y(i,t) = b0 + b1*X1(i,t) + ... + u(i) + e(i,t)

The Xs and Ys are for each firm (i) at time (t) and there are firm-specific, time-invariant effects u(i) for each firm. The second F test has a null that all the u(i) for each of the firms are 0; i.e., there are no firm-specific effects in the model. The test is rejected, but with a very low F-stat, indicating there is at least one firm with a statistically significant u(i). Also, the sigma_u and sigma_e values indicate that 86% of the variation is within firms as opposed to between firms. I suspect, but feel free to verify, that due to the combination of these things, your fixed effect results will be similar to the random effects results. Additionally, I would not be surprised if a pooled OLS model yielded similar results as well.

Not sure what you mean by getting "better results". It appears there are only small differences between your panels (insurance companies), at least given the current variables in the model.
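
The "feel free to verify" suggestion above can be sketched roughly as follows. This is only an illustration: it reuses the variable names from the command in #3, takes company1 as the panel identifier from the output header, and assumes a time variable called year, which the thread never names.

```stata
* Sketch: fit FE, RE and pooled OLS on the same regressors and compare them.
* Assumption: the time variable is called year (hypothetical name).
* Run the whole block from a do-file so the local macro stays defined.
xtset company1 year

local rhs log_total_revenue_life solvency_ratio_w marketshare GDP_growth log_population inflation c4_ratio

xtreg GWP_growth_life_w `rhs', fe
estimates store fe

xtreg GWP_growth_life_w `rhs', re
estimates store re

regress GWP_growth_life_w `rhs'
estimates store pooled

* Coefficients and standard errors side by side
estimates table fe re pooled, b(%9.4f) se(%9.4f) stats(N)
```

If the three columns are close, the firm effects matter little for these coefficients; large gaps between the fe and re columns would point the other way.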

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

31 May 2024, 23:43

Wout:
as an aside to JJ's helpful reply (your output is difficult to read), you should look at the within R-sq when dealing with the -fe- estimator.
With such a large sample, you should use -robust- or -vce(cluster panelid)- standard errors.
I would also recommend checking the functional form of your regression (that is, testing whether it is correctly specified).
Basically, you have to compute the -linktest- by hand, because it is not supported by -xt- commands.
Let's show it with a toy example:

Code:

. use "https://www.stata-press.com/data/r18/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1087                                         min =          1
     Between = 0.1006                                         avg =        6.1
     Overall = 0.0865                                         max =         15

                                                F(2, 4709)        =     507.42
corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(24 missing values generated)

. g sq_fitted=fitted^2
(24 missing values generated)

. xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1092                                         min =          1
     Between = 0.1033                                         avg =        6.1
     Overall = 0.0881                                         max =         15

                                                F(2, 4709)        =     523.09
corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
   sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
       _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
-------------+----------------------------------------------------------------
     sigma_u |    .403403
     sigma_e |  .30238578
         rho |  .64025357   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1,  4709) =    4.85
            Prob > F =    0.0276

.

As expected, the outcome of the (redundant, in this case) -test- rejects the null that -sq_fitted- has no explanatory power. Therefore, the regression is (as expected again) misspecified.
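
Applied to the insurance panel, the same hand-made -linktest- might look like the sketch below. It assumes the specification posted in #3 and company1 as the panel identifier; fitted_gwp and sq_fitted_gwp are hypothetical variable names chosen for illustration.

```stata
* Sketch: hand-made linktest for the fixed-effects model from #3.
* fitted_gwp and sq_fitted_gwp are hypothetical names.
xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w marketshare GDP_growth log_population inflation c4_ratio, fe vce(cluster company1)

* Linear prediction and its square
predict double fitted_gwp, xb
generate double sq_fitted_gwp = fitted_gwp^2

* Re-estimate on the prediction and its square, then test the squared term
xtreg GWP_growth_life_w fitted_gwp sq_fitted_gwp, fe vce(cluster company1)
test sq_fitted_gwp
```

A small p-value on the squared term would suggest, as in the toy example, that the functional form needs another look.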

Kind regards,
Carlo
(Stata 19.0)


Wout Baetens

Join Date: May 2024
Posts: 8

01 Jun 2024, 07:27

Thanks to both of you for your insights. Below is the output for the RE regression first and the FE regression second. Is it fair to conclude that the bigger the company, the higher the growth? The fixed-effects regression with vce(cluster panelid) gave me a lower F-statistic.

Code:

 xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w benefits_paid_to_
> NPW_life_w GDP_growth log_population inflation c4_ratio,re

Random-effects GLS regression                   Number of obs     =      3,863
Group variable: company1                        Number of groups  =        523

R-squared:                                      Obs per group:
     Within  = 0.0692                                         min =          1
     Between = 0.0013                                         avg =        7.4
     Overall = 0.0057                                         max =          9

                                                Wald chi2(7)      =      47.95
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

----------------------------------------------------------------------------------
GWP_growth_lif~w | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
log_total_reve~e |   2.293654   .5268653     4.35   0.000     1.261017    3.326291
solvency_ratio_w |   .0478744   .0144962     3.30   0.001     .0194624    .0762864
benefits_paid_~w |  -.0059781    .001456    -4.11   0.000    -.0088319   -.0031243
      GDP_growth |    .171903   .2344935     0.73   0.464    -.2876958    .6315018
  log_population |  -1.462672   1.010944    -1.45   0.148    -3.444087     .518742
       inflation |  -.4290087   .3704173    -1.16   0.247    -1.155013    .2969958
        c4_ratio |  -10.11063   6.526214    -1.55   0.121    -22.90178    2.680512
           _cons |   11.90004   21.25739     0.56   0.576    -29.76367    53.56375
-----------------+----------------------------------------------------------------
         sigma_u |  17.849728
         sigma_e |  43.809415
             rho |  .14237275   (fraction of variance due to u_i)
----------------------------------------------------------------------------------

Code:

. xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w benefits_paid_to_NPW_life_w GDP_growth log_population inflation c4_r
> atio,fe 

Fixed-effects (within) regression               Number of obs     =      3,863
Group variable: company1                        Number of groups  =        523

R-squared:                                      Obs per group:
     Within  = 0.1385                                         min =          1
     Between = 0.0003                                         avg =        7.4
     Overall = 0.0024                                         max =          9

                                                F(7, 3333)        =      76.56
corr(u_i, Xb) = -0.9521                         Prob > F          =     0.0000

---------------------------------------------------------------------------------------------
          GWP_growth_life_w | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
----------------------------+----------------------------------------------------------------
     log_total_revenue_life |   36.62378   1.731628    21.15   0.000     33.22862    40.01894
           solvency_ratio_w |   .2232736   .0393541     5.67   0.000     .1461129    .3004343
benefits_paid_to_NPW_life_w |  -.0354331   .0030796   -11.51   0.000    -.0414711    -.029395
                 GDP_growth |   .1329537   .2454861     0.54   0.588    -.3483649    .6142724
             log_population |  -16.23394   49.27217    -0.33   0.742    -112.8407    80.37281
                  inflation |   .7954277   .4128897     1.93   0.054    -.0141152    1.604971
                   c4_ratio |  -4.406103   19.05757    -0.23   0.817    -41.77183    32.95963
                      _cons |  -148.8261   857.4759    -0.17   0.862    -1830.059    1532.406
----------------------------+----------------------------------------------------------------
                    sigma_u |  94.024101
                    sigma_e |  43.809415
                        rho |  .82162629   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------------
F test that all u_i=0: F(522, 3333) = 2.70                   Prob > F = 0.0000


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

01 Jun 2024, 10:02

Wout:
1) go -robust- standard errors;
2) test whether -re- is the way to go via the community-contributed module -xtoverid- (type -search xtoverid- and follow the instructions to install it).
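
The two steps above could be sketched as follows. The regressors are those of the -re- model posted in the previous message; installing from SSC is offered here as an alternative to the -search- route, on the assumption that the package is still hosted there.

```stata
* Sketch: RE with cluster-robust standard errors, followed by xtoverid.
* Assumption: xtoverid is available from SSC.
ssc install xtoverid

xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w benefits_paid_to_NPW_life_w GDP_growth log_population inflation c4_ratio, re vce(cluster company1)

* Must be run immediately after the -re- estimation
xtoverid
```

A rejection (small p-value) speaks against -re- and in favour of -fe-.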

Kind regards,
Carlo
(Stata 19.0)

Wout Baetens

Join Date: May 2024
Posts: 8

03 Jun 2024, 07:19

Dear Carlo, I have tried the robust standard errors you suggested and tested the RE regression with xtoverid. However, the p-value of my Sargan-Hansen test is very low. It is supposed to be above 0.10, but I don't know how to get it up. Do you have any ideas?

Code:

. xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w GDP_growth log_po
> p inflation c4_ratio2,re robust
(1 missing value generated)

Random-effects GLS regression                   Number of obs     =      6,462
Group variable: company1                        Number of groups  =        868

R-squared:                                      Obs per group:
     Within  = 0.0215                                         min =          1
     Between = 0.0007                                         avg =        7.4
     Overall = 0.0048                                         max =          9

                                                Wald chi2(6)      =      53.98
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                 (Std. err. adjusted for 868 clusters in company1)
----------------------------------------------------------------------------------
                 |               Robust
GWP_growth_lif~w | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
log_total_reve~e |   1.526335   .2983307     5.12   0.000     .9416179    2.111053
solvency_ratio_w |   .0183528   .0080698     2.27   0.023     .0025364    .0341693
      GDP_growth |   .4378522   .1663199     2.63   0.008     .1118711    .7638333
         log_pop |   -1.85549   .8487064    -2.19   0.029    -3.518924   -.1920561
       inflation |  -.3907189   .2042236    -1.91   0.056    -.7909898     .009552
       c4_ratio2 |  -9.849762   4.300189    -2.29   0.022    -18.27798   -1.421546
           _cons |   23.63569   16.96632     1.39   0.164    -9.617693    56.88908
-----------------+----------------------------------------------------------------
         sigma_u |  14.216874
         sigma_e |  39.373681
             rho |  .11533827   (fraction of variance due to u_i)
----------------------------------------------------------------------------------

. 
end of do-file

. do "C:\Users\woutb\AppData\Local\Temp\STD34c8_000000.tmp"

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(company1)
Sargan-Hansen statistic  78.108  Chi-sq(6)    P-value = 0.0000


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

03 Jun 2024, 07:32

Wout:
I do not understand the reason for your concern: the -xtoverid- outcome clearly rejects the null that -re- is the way to go.
Therefore, you should stick with -fe- and explore whether your regression is correctly specified (see my previous example).

Kind regards,
Carlo
(Stata 19.0)
Wout Baetens

Join Date: May 2024

Posts: 8
#10

03 Jun 2024, 07:38

The Hausman test also indicates that the FE model fits better.
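
For readers who want to reproduce this check, a classical Hausman comparison might be run along these lines. The commands actually used were not posted, so this is only a sketch; it reuses the regressors from the -re- model in #8 and fits both models with default (non-robust) standard errors, because -hausman- does not accept robust or clustered VCEs.

```stata
* Sketch: classical Hausman test, FE versus RE, with default standard errors.
xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w GDP_growth log_pop inflation c4_ratio2, fe
estimates store fe

xtreg GWP_growth_life_w log_total_revenue_life solvency_ratio_w GDP_growth log_pop inflation c4_ratio2, re
estimates store re

* The consistent estimator (fe) goes first, the efficient one (re) second
hausman fe re
```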
Wout Baetens

Join Date: May 2024

Posts: 8
#11

03 Jun 2024, 07:40

Okay, thank you for the clarification.

Sam Murgatroyd

Join Date: Oct 2023
Posts: 33

#12

14 Jun 2024, 03:27

Dear Carlo,

In response to your comment quoted below:

Originally posted by Carlo Lazzaro

2) test whether -re- is the way to go via the community-contributed module -xtoverid- (type -search xtoverid- and follow the instructions to install it).

If one can include time-invariant controls in an RE model, should the xtoverid test be run on the model with these time-invariant controls or without them?

As an example, I have one time-invariant dummy that represents different regional groupings. Should my xtoverid test include the region dummy or not?

Without region dummy:

Code:

 xi: xtreg price_dispersion_use i.TS_ce2 E unem lnGDPPC i.year, re cluster(id)
i.TS_ce2          _ITS_ce2_1-10       (naturally coded; _ITS_ce2_1 omitted)
i.year            _Iyear_2014-2022    (naturally coded; _Iyear_2014 omitted)

Random-effects GLS regression                   Number of obs     =        664
Group variable: id                              Number of groups  =        165

R-squared:                                      Obs per group:
     Within  = 0.0915                                         min =          1
     Between = 0.4706                                         avg =        4.0
     Overall = 0.4428                                         max =          5

                                                Wald chi2(14)     =    1547.38
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                   (Std. err. adjusted for 165 clusters in id)
------------------------------------------------------------------------------
             |               Robust
price_disp~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  _ITS_ce2_2 |  -15.40622   3.801623    -4.05   0.000    -22.85727   -7.955176
  _ITS_ce2_3 |  -12.90865   4.646619    -2.78   0.005    -22.01586   -3.801448
  _ITS_ce2_4 |   4.656375   3.178191     1.47   0.143    -1.572766    10.88552
  _ITS_ce2_5 |  -2.627148    4.08374    -0.64   0.520    -10.63113    5.376835
  _ITS_ce2_6 |  -10.30967   3.339777    -3.09   0.002    -16.85552   -3.763831
  _ITS_ce2_8 |  -13.72805    3.66635    -3.74   0.000    -20.91397   -6.542139
 _ITS_ce2_10 |  -22.86692   3.457971    -6.61   0.000    -29.64442   -16.08942
           E |   -.456633   1.104486    -0.41   0.679    -2.621386     1.70812
        unem |  -.3757004   .1861485    -2.02   0.044    -.7405448    -.010856
     lnGDPPC |   7.160454   1.115432     6.42   0.000     4.974248    9.346661
 _Iyear_2016 |   1.473102   1.239416     1.19   0.235    -.9561087    3.902312
 _Iyear_2018 |   2.016894   1.545326     1.31   0.192    -1.011889    5.045677
 _Iyear_2020 |   2.913825   1.559089     1.87   0.062    -.1419326    5.969584
 _Iyear_2022 |   2.054701   1.688859     1.22   0.224    -1.255403    5.364804
       _cons |  -6.286011   11.86555    -0.53   0.596    -29.54207    16.97004
-------------+----------------------------------------------------------------
     sigma_u |  13.232832
     sigma_e |  11.041641
         rho |  .58953776   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(id)
Sargan-Hansen statistic 225.588  Chi-sq(14)   P-value = 0.0000

With region dummy:

Code:

xi: xtreg price_dispersion_use i.TS_ce2 E unem lnGDPPC i.region_id i.year, re cluster(id)
i.TS_ce2          _ITS_ce2_1-10       (naturally coded; _ITS_ce2_1 omitted)
i.region_id       _Iregion_id_1-6     (naturally coded; _Iregion_id_1 omitted)
i.year            _Iyear_2014-2022    (naturally coded; _Iyear_2014 omitted)

Random-effects GLS regression                   Number of obs     =        664
Group variable: id                              Number of groups  =        165

R-squared:                                      Obs per group:
     Within  = 0.0918                                         min =          1
     Between = 0.5101                                         avg =        4.0
     Overall = 0.4556                                         max =          5

                                                Wald chi2(19)     =    1682.55
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                    (Std. err. adjusted for 165 clusters in id)
-------------------------------------------------------------------------------
              |               Robust
price_dispe~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
   _ITS_ce2_2 |  -13.51359   4.018199    -3.36   0.001    -21.38912   -5.638067
   _ITS_ce2_3 |  -10.14077   5.144088    -1.97   0.049      -20.223   -.0585472
   _ITS_ce2_4 |   5.253318   3.835672     1.37   0.171    -2.264461     12.7711
   _ITS_ce2_5 |  -1.953641   4.079841    -0.48   0.632    -9.949982    6.042701
   _ITS_ce2_6 |  -8.720096   3.455647    -2.52   0.012    -15.49304   -1.947152
   _ITS_ce2_8 |  -11.38548   4.023333    -2.83   0.005    -19.27107   -3.499896
  _ITS_ce2_10 |   -21.0094   4.130606    -5.09   0.000    -29.10524   -12.91356
            E |   .0301852   1.129646     0.03   0.979     -2.18388     2.24425
         unem |  -.3046913   .1797088    -1.70   0.090    -.6569141    .0475315
      lnGDPPC |   6.373911   1.309296     4.87   0.000     3.807737    8.940085
_Iregion_id_2 |   6.161709   4.126013     1.49   0.135    -1.925129    14.24855
_Iregion_id_3 |  -7.405391   5.823265    -1.27   0.203    -18.81878    4.007998
_Iregion_id_4 |   3.858108   4.960603     0.78   0.437    -5.864495    13.58071
_Iregion_id_5 |  -8.662549   6.677738    -1.30   0.195    -21.75067    4.425577
_Iregion_id_6 |   6.337647   5.404951     1.17   0.241    -4.255861    16.93116
  _Iyear_2016 |   1.585421   1.242743     1.28   0.202    -.8503104    4.021153
  _Iyear_2018 |   2.133654    1.55553     1.37   0.170    -.9151293    5.182437
  _Iyear_2020 |   3.042491   1.573126     1.93   0.053    -.0407788     6.12576
  _Iyear_2022 |    2.29141       1.72     1.33   0.183    -1.079727    5.662548
        _cons |   -4.39099   12.18405    -0.36   0.719    -28.27129     19.4893
--------------+----------------------------------------------------------------
      sigma_u |  12.989944
      sigma_e |  11.041641
          rho |  .58054322   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(id)
Sargan-Hansen statistic 142.171  Chi-sq(14)   P-value = 0.0000

From both, it is clear that fixed effects is the way to go. I also noticed that the test is Chi-sq(14) in both models, yet the Sargan-Hansen statistics differ. I don't know which version is correct.
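
One way to look into the identical degrees of freedom is to confirm that the region dummy has no within-panel variation. The sketch below uses the variable names from the commands above. A possible reading, not checked against the -xtoverid- help file, is that only the time-varying regressors enter the tested restrictions, which would explain Chi-sq(14) in both runs.

```stata
* Sketch: check whether region_id varies within panels.
xtset id year

* A within standard deviation of 0 means region_id is time-invariant
xtsum region_id

* Under -fe-, time-invariant dummies cannot be estimated and are omitted
xi: xtreg price_dispersion_use i.TS_ce2 E unem lnGDPPC i.region_id i.year, fe cluster(id)
```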

If dataex helps with answering my question, here it is:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str56 country float id double price_dispersion_use float TS_ce2 byte E double(coc unem) float lnGDPPC str3 region float region_id int year
"Afghanistan"          1                 15 . 4   -1.36474287509918               7.91  8.014661 "EMR" 3 2014
"Afghanistan"          1                  . . 5   -1.54035270214081             10.092  7.994392 "EMR" 3 2016
"Afghanistan"          1 13.333333333333334 . 5   -1.50288057327271             11.131  7.974823 "EMR" 3 2018
"Afghanistan"          1  11.76470588235294 . 5   -1.49369978904724              11.71  7.928968 "EMR" 3 2020
"Afghanistan"          1                  . . 5   -1.18377649784088               14.1         . "EMR" 3 2022
"Albania"              2  44.44444444444444 1 5   -.586141347885132              18.05  9.465752 "EUR" 4 2014
"Albania"              2 56.666666666666664 1 5   -.471469223499298              15.42  9.524819 "EUR" 4 2016
"Albania"              2               62.5 1 5   -.545840263366699               12.3  9.604934 "EUR" 4 2018
"Albania"              2  60.60606060606061 1 5   -.572924494743347             12.833   9.60202 "EUR" 4 2020
"Albania"              2                 60 1 5   -.407875537872314             11.629  9.756207 "EUR" 4 2022
"Algeria"              3  33.33333333333333 6 4    -.61265641450882              10.21  9.511575 "AFR" 1 2014
"Algeria"              3 35.714285714285715 . 4    -.67341673374176               10.2  9.539472 "AFR" 1 2016
"Algeria"              3                 15 . 5   -.658660113811493             12.145  9.525714 "AFR" 1 2018
"Algeria"              3                 50 . 5    -.66646021604538             14.036  9.447598 "AFR" 1 2020
"Algeria"              3  48.57142857142857 5 5   -.637929856777191             12.491 9.4796715 "AFR" 1 2022
"Andorra"              4  72.85714285714285 6 2    1.22070860862732 3.4574213637138524 11.030043 "EUR" 4 2014
"Andorra"              4  72.85714285714285 6 2    1.15955591201782 3.4620623503549632 11.067958 "EUR" 4 2016
"Andorra"              4  77.77777777777777 1 2    1.17916560173035 3.4856245150448624 11.053652 "EUR" 4 2018
"Andorra"              4  68.44993141289439 1 2    1.26600527763367 3.5770777680298496  10.91981 "EUR" 4 2020
"Andorra"              4  69.86301369863014 1 2    1.27020359039307  3.890483345078273 11.056888 "EUR" 4 2022
"Angola"               5                 75 . 2   -1.45779824256897             16.317  9.236285 "AFR" 1 2014
"Angola"               5                  . . 2   -1.48333728313446             16.577  9.147498 "AFR" 1 2016
"Angola"               5                  . . 4   -1.19925093650818             16.626   9.06262 "AFR" 1 2018
"Angola"               5                 25 2 4   -.938672542572021             16.698 8.9309025 "AFR" 1 2020
"Angola"               5                 25 . 4   -.601941287517548             14.478  8.910195 "AFR" 1 2022
"Antigua and Barbuda"  6                 75 . 2    .634897768497467 3.4574213637138524 10.143725 "AMR" 2 2014
"Antigua and Barbuda"  6                 50 . 2    .645558714866638 3.4620623503549632  10.18348 "AMR" 2 2016
"Antigua and Barbuda"  6             66.875 . 5    .236239701509476 3.4856245150448624 10.263378 "AMR" 2 2018
"Antigua and Barbuda"  6  62.05673758865249 . 5    .238533273339272 3.5770777680298496 10.073401 "AMR" 2 2020
"Antigua and Barbuda"  6 63.829787234042556 . 5    .310604453086853  3.890483345078273  10.23125 "AMR" 2 2022
"Argentina"            7 41.935483870967744 2 4   -.549443066120148               7.27  10.25563 "AMR" 2 2014
"Argentina"            7              37.75 3 4   -.298964887857437              8.085 10.240202 "AMR" 2 2016
"Argentina"            7  45.34920634920635 3 4   -.098668172955513               9.22 10.220944 "AMR" 2 2018
"Argentina"            7 18.726114649681527 3 4    -.16378065943718              11.46 10.076843 "AMR" 2 2020
"Argentina"            7 13.384615384615383 3 4   -.447030484676361              6.805   10.2083 "AMR" 2 2022
"Armenia"              8                 30 6 2   -.565155386924744             11.989  9.488754 "EUR" 4 2014
"Armenia"              8 26.666666666666668 6 2   -.659123718738556             12.625  9.530623 "EUR" 4 2016
"Armenia"              8 42.857142857142854 3 2   -.408891350030899              13.21  9.663906 "EUR" 4 2018
"Armenia"              8               47.5 1 5 -.00343869999051094              12.18  9.673404 "EUR" 4 2020
"Armenia"              8  48.23529411764706 1 5   .0280352365225554              8.588  9.857456 "EUR" 4 2022
"Australia"            9  78.93318965517241 1 4    1.84946465492249               6.08  10.90798 "WPR" 6 2014
"Australia"            9  73.84341637010677 1 4    1.77200365066528               5.71  10.92716 "WPR" 6 2016
"Australia"            9  82.34126984126985 1 4    1.76737761497498                5.3 10.947303 "WPR" 6 2018
"Australia"            9  71.02189781021899 6 4    1.63295590877533               6.46 10.938417 "WPR" 6 2020
"Australia"            9  68.45524542829644 6 4    1.76448953151703                3.7 10.987324 "WPR" 6 2022
"Austria"             10  80.61224489795919 4 4    1.46674907207489               5.67 11.040983 "EUR" 4 2014
"Austria"             10                 80 4 4    1.49696803092957               6.06 11.048753 "EUR" 4 2016
"Austria"             10                 80 4 4    1.56836605072021               4.93 11.083235 "EUR" 4 2018
"Austria"             10  82.45614035087719 4 4    1.47778916358948                5.2 11.020405 "EUR" 4 2020
"Austria"             10  68.35820895522387 4 4    1.25861942768097               4.99 11.094935 "EUR" 4 2022
"Azerbaijan"          11                 24 6 4   -1.02249026298523               4.91  9.938668 "EUR" 4 2014
"Azerbaijan"          11              56.25 1 4   -.852654457092285                  5  9.894967 "EUR" 4 2016
"Azerbaijan"          11 23.076923076923077 6 5   -.852769494056702                4.9  9.893378 "EUR" 4 2018
"Azerbaijan"          11  47.05882352941177 6 5   -1.07708406448364               7.24  9.858809 "EUR" 4 2020
"Azerbaijan"          11  55.55555555555556 6 5   -1.04057228565216               5.65  9.953777 "EUR" 4 2022
"Bahamas"             12 48.658536585365916 1 2    1.30873775482178                  . 10.381657 "AMR" 2 2014
"Bahamas"             12  40.22346368715088 1 2    1.06738793849945               12.7 10.366473 "AMR" 2 2016
"Bahamas"             12                  . . 2    1.09553563594818                 10 10.405302 "AMR" 2 2018
"Bahamas"             12  61.08949416342412 1 2    1.10620594024658             12.563 10.118558 "AMR" 2 2020
"Bahamas"             12                  . 1 2    1.25618994235992             10.089 10.401076 "AMR" 2 2022
"Bahrain"             13                 50 . 5    .273521840572357              1.147 10.890368 "EMR" 3 2014
"Bahrain"             13  33.33333333333333 . 5  -.0476647540926933              1.193 10.877423 "EMR" 3 2016
"Bahrain"             13                 40 2 5   -.176231503486633              1.198  10.88669 "EMR" 3 2018
"Bahrain"             13  34.78260869565218 3 5  -.0935939401388168              1.786 10.867227 "EMR" 3 2020
"Bahrain"             13 58.333333333333336 3 5    .139385640621185              1.339 10.944588 "EMR" 3 2022
"Bangladesh"          14 15.789473684210526 . 4   -.892129957675934              4.405  8.543592 "SEA" 5 2014
"Bangladesh"          14 22.727272727272727 . 4    -.88687801361084               4.35  8.651562 "SEA" 5 2016
"Bangladesh"          14  33.33333333333333 . 4   -.926946818828583              4.373  8.761912 "SEA" 5 2018
"Bangladesh"          14 32.142857142857146 . 4   -1.00367724895477              5.316  8.849105 "SEA" 5 2020
"Bangladesh"          14                 25 . 4    -1.0755273103714              4.271   8.96254 "SEA" 5 2022
"Barbados"            15  79.32850559578671 1 2    1.13345634937286              12.17  9.704554 "AMR" 2 2014
"Barbados"            15              81.25 1 2     1.2135511636734               8.25    9.7494 "AMR" 2 2016
"Barbados"            15  45.23433385992628 1 2    1.37191247940063               8.32  9.741538 "AMR" 2 2018
"Barbados"            15                  . . 2    1.19406688213348              9.743  9.604329 "AMR" 2 2020
"Barbados"            15  78.84615384615384 1 2    1.28457343578339              8.501  9.700481 "AMR" 2 2022
"Belarus"             16             35.625 6 4    -.23470650613308              5.908 10.187328 "EUR" 4 2014
"Belarus"             16 31.914893617021278 6 4   -.224086627364159               5.84  10.12049 "EUR" 4 2016
"Belarus"             16 30.645161290322577 6 4    -.15480200946331               4.76 10.179738 "EUR" 4 2018
"Belarus"             16  25.71428571428572 6 4   -.133964225649834               4.05 10.193598 "EUR" 4 2020
"Belarus"             16 23.958333333333332 6 4    -.57967621088028               3.57 10.185905 "EUR" 4 2022
"Belgium"             17  80.82901554404145 4 4    1.51295030117035               8.52  10.96869 "EUR" 4 2014
"Belgium"             17  81.64556962025317 4 4    1.53148806095123               7.83  10.99063 "EUR" 4 2016
"Belgium"             17  83.33333333333334 4 4    1.42942035198212               5.95 11.016062 "EUR" 4 2018
"Belgium"             17  85.29411764705883 4 4    1.44595634937286               5.55 10.974466 "EUR" 4 2020
"Belgium"             17               72.5 4 4    1.49504864215851               5.56  11.05771 "EUR" 4 2022
"Belize"              18  41.66666666666667 . 2   -.159106820821762               8.24 9.4002905 "AMR" 2 2014
"Belize"              18  41.66666666666667 1 2   -.229891732335091                  7  9.390294 "AMR" 2 2016
"Belize"              18                 40 1 2   -.169436514377594              7.899  9.343116 "AMR" 2 2018
"Belize"              18                 50 1 2   -.193349361419678             10.619  9.203805 "AMR" 2 2020
"Belize"              18 50.391644908616186 1 2   -.237028583884239              8.672  9.426018 "AMR" 2 2022
"Benin"               19                  . 2 4   -.669143795967102              1.808  8.040519 "AFR" 1 2014
"Benin"               19                 20 2 4   -.529120028018951              1.843  8.031984 "AFR" 1 2016
"Benin"               19               22.5 2 5   -.391388416290283               1.47  8.093288 "AFR" 1 2018
"Benin"               19 47.368421052631575 2 5   -.040327787399292              1.616  8.140294 "AFR" 1 2020
"Benin"               19                  . 2 5   -.124255605041981              1.476  8.215443 "AFR" 1 2022
"Bhutan"              20                  . . 4    1.30612897872925               2.63  9.345343 "SEA" 5 2014
"Bhutan"              20                  . . 4    1.09102046489716              2.747  9.469749 "SEA" 5 2016
"Bhutan"              20                  . . 4    1.59051811695099               3.35  9.533319 "SEA" 5 2018
"Bhutan"              20                  . . 4    1.61823654174805               5.03  9.467918 "SEA" 5 2020
"Bhutan"              20                  . 2 4    1.51425933837891               5.95         . "SEA" 5 2022
end
label values TS_ce2 TS_ce2_l
label def TS_ce2_l 1 "specific uniform", modify
label def TS_ce2_l 2 "adv un NO min", modify
label def TS_ce2_l 3 "adv uni WITH min", modify
label def TS_ce2_l 4 "mixed uni NO min", modify
label def TS_ce2_l 5 "mixed uni WITH min", modify
label def TS_ce2_l 6 "specific tiered", modify
label values region_id region_id_l
label def region_id_l 1 "AFR", modify
label def region_id_l 2 "AMR", modify
label def region_id_l 3 "EMR", modify
label def region_id_l 4 "EUR", modify
label def region_id_l 5 "SEA", modify
label def region_id_l 6 "WPR", modify

Thank you!

Sam


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#13

14 Jun 2024, 03:36

Sam:

"As an example, I have one time invariant dummy that represents different regional groupings, should my xtoverid test include the region dummy, or not?"

Short answer: yes, you should include it.

In addition, you coded two different specifications in the two regressions. Therefore, while both -xtoverid- outcomes point you towards -fe-, it is no wonder that the Sargan statistics differ.
That said, if your time-invariant predictors are nonetheless crucial for your research goal, you may want to explore The Stata Blog » Fixed effects or random effects: The Mundlak approach

Kind regards,
Carlo
(Stata 19.0)
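
For reference, the Mundlak approach mentioned above can be written as a random-effects model augmented with the panel means of the time-varying regressors. The notation below is an assumption made for illustration (it is not taken from the thread): x_it are the time-varying regressors, z_i the time-invariant ones such as the region dummy, and T_i the number of observations for panel i.

```latex
\begin{aligned}
y_{it} &= \alpha + x_{it}\beta + z_i\gamma + \bar{x}_i\theta + u_i + \varepsilon_{it},
\qquad \bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}, \\
H_0 &: \theta = 0 .
\end{aligned}
```

Rejecting H_0 suggests that the panel-level effect is correlated with the regressors, which favours -fe-; failing to reject is consistent with the random-effects assumption. Because z_i stays in the equation, its coefficient remains estimable.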

Sam Murgatroyd

Join Date: Oct 2023
Posts: 33

#14

14 Jun 2024, 04:24

Carlo Lazzaro thank you for the advice. I read the article that you shared a link to and tried to implement it.

Code:

. ** Step 1: gen means of time-varying controls:
.
. ** TS_ce2 is a categorical variable:

. bysort id: egen mean_TS_ce2 = mean(TS_ce2)
(205 missing values generated)

.
. ** E is a continuous variable:
. bysort id: egen mean_E = mean(E)
(110 missing values generated)

.
. ** unem is a continuous variable:
. bysort id: egen mean_unem = mean(unem)
(120 missing values generated)

.
. ** lnGDPPC is a continuous variable:
. bysort id: egen mean_lnGDPPC = mean(lnGDPPC)
(160 missing values generated)

.
. ** year
. bysort id: egen mean_year = mean(year)
(110 missing values generated)

.
. ** Step 2:  run regression including time-invariant region_id, all time-varying controls and their panel means:
.
. quietly xtreg price_dispersion_use region_id i.TS_ce2 E unem lnGDPPC i.year mean_TS_ce2 mean_E mean_unem mean_lnGDPPC mean_year, vce(robust)

.
. estimates store mundlak

.
. ** Step 3: do the test:
.
. test mean_TS_ce2 mean_E mean_unem mean_lnGDPPC mean_year

 ( 1)  mean_TS_ce2 = 0
 ( 2)  mean_E = 0
 ( 3)  mean_unem = 0
 ( 4)  mean_lnGDPPC = 0
 ( 5)  o.mean_year = 0
       Constraint 5 dropped

           chi2(  4) =    6.60
         Prob > chi2 =    0.1585

According to this, we fail to reject the null. This is evidence that there is no correlation between the time-invariant unobservable and my regressors; that is, the random effects assumptions are satisfied.

However, in step 3, my year mean gets omitted, and I am not sure if this is right. I also included region_id in my regression, and I am not sure if I was meant to do this (I treated region_id as being the same as x1 in the example article you shared).

Moreover, it is not clear to me whether this test is meant to supplement the Hausman test implemented with xtoverid, or to replace it. The Hausman test in my original post pointed to FE being the consistent estimator. But, if I have implemented this Mundlak test correctly, then I am told that RE is the way to go. Which test am I to follow?

Thank you!

Sam
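
A possible variant of the three steps above is sketched here; it is one way to set the test up, not the method from the blog post. It makes three assumptions: region_id is entered as a factor variable because it is categorical, the panel means are taken over the estimation sample only, and the categorical TS_ce2 is averaged through its indicator variables rather than through its numeric codes. The names insample, ts_* and m_* are hypothetical. In the data excerpt every country has a row for each of the five survey years, so the panel mean of year is plausibly the same constant for every id, which would explain why mean_year was omitted; it is left out here.

```stata
* Sketch only: variable names come from the post above;
* insample, ts_* and m_* are hypothetical helper names.

* 1. Mark the observations that the regression actually uses
quietly xtreg price_dispersion_use i.region_id i.TS_ce2 E unem lnGDPPC i.year, re
generate byte insample = e(sample)

* 2. One indicator per TS_ce2 category, so that panel means are the share
*    of waves spent in each category rather than a mean of category codes
quietly tabulate TS_ce2, generate(ts_)

* 3. Panel means computed over the estimation sample only
foreach v of varlist E unem lnGDPPC ts_* {
    bysort id: egen double m_`v' = mean(cond(insample, `v', .))
}
drop m_ts_1    // the category shares sum to one, so one of them is redundant

* 4. Random-effects regression augmented with the panel means
xtreg price_dispersion_use i.region_id i.TS_ce2 E unem lnGDPPC i.year m_* ///
    if insample, re vce(robust)

* 5. Joint test that all panel-mean coefficients are zero
testparm m_*
```

If the estimation sample is unbalanced, the panel means of the year indicators are no longer constant across countries and may need to be added in the same way.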


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#15

14 Jun 2024, 05:14

Sam:
your panel is severely unbalanced. Therefore, the Mundlak outcome is less reliable.
I would trust the -xtoverid- outcome more than the Mundlak one and go -fe-.

Kind regards,
Carlo
(Stata 19.0)
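
How unbalanced the estimation sample is can be inspected directly. The lines below are a minimal sketch under stated assumptions: id is the numeric country identifier, year takes values two years apart as in the data excerpt, and used, n_used and first are hypothetical helper names.

```stata
* Sketch only: assumes id is numeric and the waves are two years apart;
* drop delta(2) if that does not hold for the full dataset.
xtset id year, delta(2)

* Mark the observations that enter the regression
quietly xtreg price_dispersion_use i.region_id i.TS_ce2 E unem lnGDPPC i.year, re
generate byte used = e(sample)

* Participation pattern of the observations actually used
xtdescribe if used

* Distribution of usable waves per country (one row per country)
bysort id: egen byte n_used = total(used)
egen byte first = tag(id)
tabulate n_used if first
```

A table concentrated at the maximum number of waves indicates a nearly balanced sample; a wide spread indicates the kind of imbalance referred to above.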
