Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Beware when using estat ic after mixed ..., nostderr

    Dear Statalisters,
    Here's something I just learned (the hard way) about the mixed command in Stata.

    I'm fitting thousands of mixed models to determine a frequentist model averaging (FMA) estimator for a set of fixed effects. Since my focus is not on the random effects, I'm using the nostderr option to speed up the process.

    I was surprised to find that this makes a difference when I later use estat ic to calculate information criteria, such as Akaike's information criterion (AIC), which I need to determine FMA weights.

    Here's an example from the documentation:

    Code:
      use https://www.stata-press.com/data/r18/productivity, clear
     
    . mixed gsp private emp hwy water other unemp || region: || state:
      Performing EM optimization ...
        Performing gradient-based optimization:
      Iteration 0:  Log likelihood =  1430.5017  
      Iteration 1:  Log likelihood =  1430.5017  
        Computing standard errors ...
        Mixed-effects ML regression                           Number of obs =      816
                Grouping information
              -------------------------------------------------------------
                              |     No. of       Observations per group
               Group variable |     groups    Minimum    Average    Maximum
              ----------------+--------------------------------------------
                       region |          9         51       90.7        136
                        state |         48         17       17.0         17
              -------------------------------------------------------------
                                                              Wald chi2(6)  = 18829.06
      Log likelihood =  1430.5017                           Prob > chi2   =   0.0000
        ------------------------------------------------------------------------------
               gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
               emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
               hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
             water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
             other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
             unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
             _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
      ------------------------------------------------------------------------------
        ------------------------------------------------------------------------------
        Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
      -----------------------------+------------------------------------------------
      region: Identity             |
                        var(_cons) |   .0014506   .0012995      .0002506    .0083957
      -----------------------------+------------------------------------------------
      state: Identity              |
                        var(_cons) |   .0062757   .0014871      .0039442    .0099855
      -----------------------------+------------------------------------------------
                     var(Residual) |   .0013461   .0000689      .0012176    .0014882
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
        Note: LR test is conservative and provided only for reference.
    
    
    . estat ic
        Akaike's information criterion and Bayesian information criterion
        -----------------------------------------------------------------------------
             Model |          N   ll(null)  ll(model)      df        AIC        BIC
      -------------+---------------------------------------------------------------
                 . |        816          .   1430.502      10  -2841.003  -2793.959
      -----------------------------------------------------------------------------
      Note: BIC uses N = number of observations. See [R] IC note.
    Since AIC is calculated as -2 * log-likelihood + 2 * degrees of freedom, everything works as intended because, by default, Stata computes the number of degrees of freedom as kf + kr, where kf and kr are the number of estimated fixed-effects (here: 7) and random-effects parameters (here: 3).
    So I didn't expect that the results would change if I used
    Code:
    nostderr
    . However, this is not the case:
    Code:
      . mixed gsp private emp hwy water other unemp || region: || state:, nostderr /// <-- here's the change
        Performing EM optimization ...
        Performing gradient-based optimization:
      Iteration 0:  Log likelihood =  1430.5017  
      Iteration 1:  Log likelihood =  1430.5017  
        Mixed-effects ML regression                           Number of obs =      816
                Grouping information
              -------------------------------------------------------------
                              |     No. of       Observations per group
               Group variable |     groups    Minimum    Average    Maximum
              ----------------+--------------------------------------------
                       region |          9         51       90.7        136
                        state |         48         17       17.0         17
              -------------------------------------------------------------
                                                              Wald chi2(6)  = 18829.06
      Log likelihood =  1430.5017                           Prob > chi2   =   0.0000
        ------------------------------------------------------------------------------
               gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
               emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
               hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
             water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
             other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
             unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
             _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
      ------------------------------------------------------------------------------
        ------------------------------------------------------------------------------
        Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
      -----------------------------+------------------------------------------------
      region: Identity             |
                        var(_cons) |   .0014506          .             .           .
      -----------------------------+------------------------------------------------
      state: Identity              |
                        var(_cons) |   .0062757          .             .           .
      -----------------------------+------------------------------------------------
                     var(Residual) |   .0013461          .             .           .
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
        Note: LR test is conservative and provided only for reference.
        
    
    . estat ic
        Akaike's information criterion and Bayesian information criterion
        -----------------------------------------------------------------------------
             Model |          N   ll(null)  ll(model)      df        AIC        BIC
      -------------+---------------------------------------------------------------
                 . |        816          .   1430.502       7  -2847.003  -2814.072
      -----------------------------------------------------------------------------
      Note: BIC uses N = number of observations. See [R] IC note.
    As you can see, the number of degrees of freedom is now 7 instead of 10. One could argue that this makes sense because the standard errors for the random effects are not calculated. However, I didn't expect this to affect the computation of information criteria, so I thought it might be worth mentioning.


    HTH,
    Ali
    Last edited by Alexander Koplenig; 01 Jul 2024, 05:46.

  • #2
    It should indeed not make a difference. This seems to be a bug; you might want to report it to Stata's Technical Support.
    https://www.kripfganz.de/stata/

    Comment


    • #3
      Good idea. I've done so

      Comment


      • #4
        Here's the answer that I received from the Technical Support team

        The command -estat ic- after -mixed- obtains the degrees of freedom using the rank of the matrix e(V). When no standard error of the variance component is computed, it cannot accurately compute the degree of freedom. It might be better that we should make this command error out when the variance components are not calculated. I have thus added this into our Request/Bug database for our developers to make further evaluation.
        So that settles that. However, I don't think it's necessary or desirable to get an error when using the nostderr option. Here's a simple workaround

        Code:
        . mixed gsp private emp hwy water other unemp || region: || state:, nostderr
        
        Performing EM optimization ...
        
        Performing gradient-based optimization:
        Iteration 0:  Log likelihood =  1430.5017  
        Iteration 1:  Log likelihood =  1430.5017  
        
        Mixed-effects ML regression                           Number of obs =      816
        
                Grouping information
                -------------------------------------------------------------
                                |     No. of       Observations per group
                 Group variable |     groups    Minimum    Average    Maximum
                ----------------+--------------------------------------------
                         region |          9         51       90.7        136
                          state |         48         17       17.0         17
                -------------------------------------------------------------
        
                                                              Wald chi2(6)  = 18829.06
        Log likelihood =  1430.5017                           Prob > chi2   =   0.0000
        
        ------------------------------------------------------------------------------
                 gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
             private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
                 emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
                 hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
               water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
               other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
               unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
               _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
        ------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
          Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
        -----------------------------+------------------------------------------------
        region: Identity             |
                          var(_cons) |   .0014506          .             .           .
        -----------------------------+------------------------------------------------
        state: Identity              |
                          var(_cons) |   .0062757          .             .           .
        -----------------------------+------------------------------------------------
                       var(Residual) |   .0013461          .             .           .
        ------------------------------------------------------------------------------
        LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
        
        Note: LR test is conservative and provided only for reference.
        
        . *Compute AIC
        
        . di -2*e(ll) + 2*e(k)
        -2841.0034
        
        . *Or:
        
        . local k = e(k)
        
        . estat ic, df(`k')
        
        Akaike's information criterion and Bayesian information criterion
        
        -----------------------------------------------------------------------------
               Model |          N   ll(null)  ll(model)      df        AIC        BIC
        -------------+---------------------------------------------------------------
                   . |        816          .   1430.502      10  -2841.003  -2793.959
        -----------------------------------------------------------------------------
        Note: BIC uses N = number of observations. See [R] IC note.

        Comment


        • #5
          This workaround only works if there are no omitted variables. In the case of perfect collinearity, for example, e(k) is too large:

          Code:
          . webuse productivity
          (Public capital productivity)
          
          . gen priv2 = private * 2
          
          . mixed gsp private priv2 emp hwy water other unemp || region: || state:, nostderr
          note: priv2 omitted because of collinearity.
          
          Performing EM optimization ...
          
          Performing gradient-based optimization:
          Iteration 0:  Log likelihood =  1430.5017  
          Iteration 1:  Log likelihood =  1430.5017  
          
          Mixed-effects ML regression                           Number of obs =      816
          
                  Grouping information
                  -------------------------------------------------------------
                                  |     No. of       Observations per group
                   Group variable |     groups    Minimum    Average    Maximum
                  ----------------+--------------------------------------------
                           region |          9         51       90.7        136
                            state |         48         17       17.0         17
                  -------------------------------------------------------------
          
                                                                Wald chi2(6)  = 18829.06
          Log likelihood =  1430.5017                           Prob > chi2   =   0.0000
          
          ------------------------------------------------------------------------------
                   gsp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
               private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
                 priv2 |          0  (omitted)
                   emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
                   hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
                 water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
                 other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
                 unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
                 _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
          ------------------------------------------------------------------------------
          
          ------------------------------------------------------------------------------
            Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
          -----------------------------+------------------------------------------------
          region: Identity             |
                            var(_cons) |   .0014506          .             .           .
          -----------------------------+------------------------------------------------
          state: Identity              |
                            var(_cons) |   .0062757          .             .           .
          -----------------------------+------------------------------------------------
                         var(Residual) |   .0013461          .             .           .
          ------------------------------------------------------------------------------
          LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
          
          Note: LR test is conservative and provided only for reference.
          
          . local k = e(k)
          
          . estat ic, df(`k')
          
          Akaike's information criterion and Bayesian information criterion
          
          -----------------------------------------------------------------------------
                 Model |          N   ll(null)  ll(model)      df        AIC        BIC
          -------------+---------------------------------------------------------------
                     . |        816          .   1430.502      11  -2839.003  -2787.255
          -----------------------------------------------------------------------------
          Note: BIC uses N = number of observations. See [R] IC note.
          This is why, in general, obtaining the degrees of freedom from the rank of the variance-covariance matrix is the preferred way.
          https://www.kripfganz.de/stata/

          Comment


          • #6
            Interesting, and something I hadn't thought of. However, from a model selection perspective, I would think that a difference in AIC of (-2839.003 - -2841.003) = 2 between the model with the omitted collinear variable and the original one is appropriate, as the former is rightly penalised for the added but unnecessary complexity.

            Comment


            • #7
              Now I might have never fully understood the theoretical concept behind degrees of freedom but from what I understand,

              Originally posted by Alexander Koplenig View Post
              One could argue that this makes sense because the standard errors for the random effects are not calculated.
              is indeed a valid argument. If you estimate fewer parameters from the data, this should affect the degrees of freedom, right?* This, in turn, would be another argument for why

              Originally posted by Sebastian Kripfganz View Post
              in general, obtaining the degrees of freedom from the rank of the variance-covariance matrix is the preferred way.
              I can see that the behavior is unexpected but I am not sure whether it is actually invalid. Conversely: if the number of estimated parameters affects the degrees of freedom, would that render the suggested workarounds (if they actually worked) invalid?


              *Edit: I think I am wrong. Estimating the standard errors does not impose additional restrictions and thus should not affect the degrees of freedom.
              Last edited by daniel klein; 02 Jul 2024, 04:52.

              Comment


              • #8
                Given that the documentation says that
                By default, the number of degrees of freedom in estat ic is calculated as k = kf + kr , where kf and kr
                are the number of estimated fixed-effects and random-effects parameters, respectively
                I would argue that from a model selection perspective, the AIC values of identical mixed models should be identical whether or not the -nostderr- option is used.

                Comment


                • #9
                  Originally posted by daniel klein View Post
                  *Edit: I think I am wrong. Estimating the standard errors does not impose additional restrictions and thus should not affect the degrees of freedom.
                  Yes, "estimated parameters" are the coefficients, not the standard errors.
                  https://www.kripfganz.de/stata/

                  Comment

                  Working...
                  X