Beware when using estat ic after mixed ..., nostderr

Alexander Koplenig

Join Date: Jul 2014
Posts: 39

Beware when using estat ic after mixed ..., nostderr

01 Jul 2024, 05:44

Dear Statalisters,
Here's something I just learned (the hard way) about the mixed command in Stata.

I'm fitting thousands of mixed models to determine a frequentist model averaging (FMA) estimator for a set of fixed effects. Since my focus is not on the random effects, I'm using the nostderr option to speed up the process.

I was surprised to find that this makes a difference when I later use estat ic to calculate information criteria, such as Akaike's information criterion (AIC), which I need to determine FMA weights.

Here's an example from the documentation:

Code:

  use https://www.stata-press.com/data/r18/productivity, clear
 
. mixed gsp private emp hwy water other unemp || region: || state:
  Performing EM optimization ...
    Performing gradient-based optimization:
  Iteration 0:  Log likelihood =  1430.5017  
  Iteration 1:  Log likelihood =  1430.5017  
    Computing standard errors ...
    Mixed-effects ML regression                           Number of obs =      816
            Grouping information
          -------------------------------------------------------------
                          |     No. of       Observations per group
           Group variable |     groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                   region |          9         51       90.7        136
                    state |         48         17       17.0         17
          -------------------------------------------------------------
                                                          Wald chi2(6)  = 18829.06
  Log likelihood =  1430.5017                           Prob > chi2   =   0.0000
    ------------------------------------------------------------------------------
           gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
  -------------+----------------------------------------------------------------
       private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
           emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
           hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
         water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
         other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
         unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
         _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
  ------------------------------------------------------------------------------
    ------------------------------------------------------------------------------
    Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
  -----------------------------+------------------------------------------------
  region: Identity             |
                    var(_cons) |   .0014506   .0012995      .0002506    .0083957
  -----------------------------+------------------------------------------------
  state: Identity              |
                    var(_cons) |   .0062757   .0014871      .0039442    .0099855
  -----------------------------+------------------------------------------------
                 var(Residual) |   .0013461   .0000689      .0012176    .0014882
  ------------------------------------------------------------------------------
  LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
    Note: LR test is conservative and provided only for reference.


. estat ic
    Akaike's information criterion and Bayesian information criterion
    -----------------------------------------------------------------------------
         Model |          N   ll(null)  ll(model)      df        AIC        BIC
  -------------+---------------------------------------------------------------
             . |        816          .   1430.502      10  -2841.003  -2793.959
  -----------------------------------------------------------------------------
  Note: BIC uses N = number of observations. See [R] IC note.

Since AIC is calculated as -2 * log-likelihood + 2 * degrees of freedom, everything works as intended because, by default, Stata computes the number of degrees of freedom as kf + kr, where kf and kr are the number of estimated fixed-effects (here: 7) and random-effects parameters (here: 3).
So I didn't expect that the results would change if I used

Code:

nostderr

. However, this is not the case:

Code:

  . mixed gsp private emp hwy water other unemp || region: || state:, nostderr /// <-- here's the change
    Performing EM optimization ...
    Performing gradient-based optimization:
  Iteration 0:  Log likelihood =  1430.5017  
  Iteration 1:  Log likelihood =  1430.5017  
    Mixed-effects ML regression                           Number of obs =      816
            Grouping information
          -------------------------------------------------------------
                          |     No. of       Observations per group
           Group variable |     groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                   region |          9         51       90.7        136
                    state |         48         17       17.0         17
          -------------------------------------------------------------
                                                          Wald chi2(6)  = 18829.06
  Log likelihood =  1430.5017                           Prob > chi2   =   0.0000
    ------------------------------------------------------------------------------
           gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
  -------------+----------------------------------------------------------------
       private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
           emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
           hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
         water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
         other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
         unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
         _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
  ------------------------------------------------------------------------------
    ------------------------------------------------------------------------------
    Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
  -----------------------------+------------------------------------------------
  region: Identity             |
                    var(_cons) |   .0014506          .             .           .
  -----------------------------+------------------------------------------------
  state: Identity              |
                    var(_cons) |   .0062757          .             .           .
  -----------------------------+------------------------------------------------
                 var(Residual) |   .0013461          .             .           .
  ------------------------------------------------------------------------------
  LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
    Note: LR test is conservative and provided only for reference.
    

. estat ic
    Akaike's information criterion and Bayesian information criterion
    -----------------------------------------------------------------------------
         Model |          N   ll(null)  ll(model)      df        AIC        BIC
  -------------+---------------------------------------------------------------
             . |        816          .   1430.502       7  -2847.003  -2814.072
  -----------------------------------------------------------------------------
  Note: BIC uses N = number of observations. See [R] IC note.

As you can see, the number of degrees of freedom is now 7 instead of 10. One could argue that this makes sense because the standard errors for the random effects are not calculated. However, I didn't expect this to affect the computation of information criteria, so I thought it might be worth mentioning.

HTH,
Ali

Last edited by Alexander Koplenig; 01 Jul 2024, 05:46.

Tags: None

Sebastian Kripfganz

Join Date: May 2014

Posts: 2561
#2

01 Jul 2024, 06:53

It should indeed not make a difference. This seems to be a bug; you might want to report it to Stata's Technical Support.

https://www.kripfganz.de/stata/
1 like
Comment
Alexander Koplenig

Join Date: Jul 2014

Posts: 39
#3

01 Jul 2024, 07:09

Good idea. I've done so
Comment

Alexander Koplenig

Join Date: Jul 2014
Posts: 39

02 Jul 2024, 01:32

Here's the answer that I received from the Technical Support team

The command -estat ic- after -mixed- obtains the degrees of freedom using the rank of the matrix e(V). When no standard error of the variance component is computed, it cannot accurately compute the degree of freedom. It might be better that we should make this command error out when the variance components are not calculated. I have thus added this into our Request/Bug database for our developers to make further evaluation.

So that settles that. However, I don't think it's necessary or desirable to get an error when using the nostderr option. Here's a simple workaround

Code:

. mixed gsp private emp hwy water other unemp || region: || state:, nostderr

Performing EM optimization ...

Performing gradient-based optimization:
Iteration 0:  Log likelihood =  1430.5017  
Iteration 1:  Log likelihood =  1430.5017  

Mixed-effects ML regression                           Number of obs =      816

        Grouping information
        -------------------------------------------------------------
                        |     No. of       Observations per group
         Group variable |     groups    Minimum    Average    Maximum
        ----------------+--------------------------------------------
                 region |          9         51       90.7        136
                  state |         48         17       17.0         17
        -------------------------------------------------------------

                                                      Wald chi2(6)  = 18829.06
Log likelihood =  1430.5017                           Prob > chi2   =   0.0000

------------------------------------------------------------------------------
         gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
         emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
         hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
       water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
       other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
       unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
       _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
region: Identity             |
                  var(_cons) |   .0014506          .             .           .
-----------------------------+------------------------------------------------
state: Identity              |
                  var(_cons) |   .0062757          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   .0013461          .             .           .
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. *Compute AIC

. di -2*e(ll) + 2*e(k)
-2841.0034

. *Or:

. local k = e(k)

. estat ic, df(`k')

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |        816          .   1430.502      10  -2841.003  -2793.959
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] IC note.

Comment

Sebastian Kripfganz

Join Date: May 2014
Posts: 2561

02 Jul 2024, 04:07

This workaround only works if there are no omitted variables. In the case of perfect collinearity, for example, e(k) is too large:

Code:

. webuse productivity
(Public capital productivity)

. gen priv2 = private * 2

. mixed gsp private priv2 emp hwy water other unemp || region: || state:, nostderr
note: priv2 omitted because of collinearity.

Performing EM optimization ...

Performing gradient-based optimization:
Iteration 0:  Log likelihood =  1430.5017  
Iteration 1:  Log likelihood =  1430.5017  

Mixed-effects ML regression                           Number of obs =      816

        Grouping information
        -------------------------------------------------------------
                        |     No. of       Observations per group
         Group variable |     groups    Minimum    Average    Maximum
        ----------------+--------------------------------------------
                 region |          9         51       90.7        136
                  state |         48         17       17.0         17
        -------------------------------------------------------------

                                                      Wald chi2(6)  = 18829.06
Log likelihood =  1430.5017                           Prob > chi2   =   0.0000

------------------------------------------------------------------------------
         gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
       priv2 |          0  (omitted)
         emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
         hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
       water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
       other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
       unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
       _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
region: Identity             |
                  var(_cons) |   .0014506          .             .           .
-----------------------------+------------------------------------------------
state: Identity              |
                  var(_cons) |   .0062757          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   .0013461          .             .           .
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. local k = e(k)

. estat ic, df(`k')

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |        816          .   1430.502      11  -2839.003  -2787.255
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] IC note.

This is why, in general, obtaining the degrees of freedom from the rank of the variance-covariance matrix is the preferred way.

https://www.kripfganz.de/stata/

Comment

Alexander Koplenig

Join Date: Jul 2014

Posts: 39
#6

02 Jul 2024, 04:28

Interesting, and something I hadn't thought of. However, from a model selection perspective, I would think that a difference in AIC of (-2839.003 - -2841.003) = 2 between the model with the omitted collinear variable and the original one is appropriate, as the former is rightly penalised for the added but unnecessary complexity.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3804
#7

02 Jul 2024, 04:32

Now I might have never fully understood the theoretical concept behind degrees of freedom but from what I understand,

Originally posted by Alexander Koplenig View Post

One could argue that this makes sense because the standard errors for the random effects are not calculated.

is indeed a valid argument. If you estimate fewer parameters from the data, this should affect the degrees of freedom, right?* This, in turn, would be another argument for why

Originally posted by Sebastian Kripfganz View Post

in general, obtaining the degrees of freedom from the rank of the variance-covariance matrix is the preferred way.

I can see that the behavior is unexpected but I am not sure whether it is actually invalid. Conversely: if the number of estimated parameters affects the degrees of freedom, would that render the suggested workarounds (if they actually worked) invalid?

*Edit: I think I am wrong. Estimating the standard errors does not impose additional restrictions and thus should not affect the degrees of freedom.

Last edited by daniel klein; 02 Jul 2024, 04:52.
Comment
Alexander Koplenig

Join Date: Jul 2014

Posts: 39
#8

02 Jul 2024, 04:46

Given that the documentation says that

By default, the number of degrees of freedom in estat ic is calculated as k = k_f + k_r , where k_f and k_r
are the number of estimated fixed-effects and random-effects parameters, respectively

I would argue that from a model selection perspective, the AIC values of identical mixed models should be identical whether or not the -nostderr- option is used.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2561
#9

03 Jul 2024, 03:16

Originally posted by daniel klein View Post

*Edit: I think I am wrong. Estimating the standard errors does not impose additional restrictions and thus should not affect the degrees of freedom.

Yes, "estimated parameters" are the coefficients, not the standard errors.

https://www.kripfganz.de/stata/
1 like
Comment

Announcement

Beware when using estat ic after mixed ..., nostderr

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment