Understanding the BIC in Stata's estat ic

Frank Taumann

Join Date: Feb 2017
Posts: 40

13 Apr 2017, 08:41

I read here (http://www.stata.com/statalist/archi.../msg00884.html) that Stata's estat ic does not include a degree-of-freedom correction. This indeed seems to be the case. Here is an example: the BIC rises as the model gets larger, but I cannot replicate the exact numbers (the BIC output is in the P.S. so as not to clutter the question):

Code:

*create some data
sysuse auto, clear
set seed 123
gen y= rbinomial(1,0.05)

* run four different logits and calculate the BICs
logit y price
estat ic

logit y price mpg
estat ic

logit y price mpg headroom
estat ic

logit y price mpg headroom length
estat ic

However, the documentation gives this formula for the BIC: BIC = -2 lnL + k lnN. This includes a degree-of-freedom correction through k (the number of regressors). I think the discussion about which N to choose has nothing to do with my issue (please correct me if I'm wrong). If I plug the log likelihood of the last model into the formula (log likelihood = -122.75521, k = 4, N = 528), I get -2*(-122.75521) + 4*log(528) = 270.586, which is not identical to the 276.855 that estat ic returns.

It is also far off from the glm BIC:

Code:

glm binvar packpc pop income tax, nolog fam(bin)

Since I have the log likelihood, I should be able to exactly calculate the BIC corrected for model size by hand, no? There should be no magic here!

Any hint would be great!
Frank

P.S.: Here is the output for the BICs:

Code:

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        528  -123.442  -123.1639       2    250.3277   258.8659
-----------------------------------------------------------------------------


-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        528  -123.442  -123.1172       3    252.2345   265.0418
-----------------------------------------------------------------------------


-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        528  -123.442  -123.1162       4    254.2324   271.3087
-----------------------------------------------------------------------------


-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           . |        528  -123.442  -122.7552       5    255.5104   276.8559
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
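All four rows can be reproduced from ll(model) and df alone. The sketch below is plain Python arithmetic, not Stata; it assumes AIC = -2 lnL + 2k and BIC = -2 lnL + k ln N, with k taken from the df column (which counts the constant). Because ll(model) is displayed to only four decimals, the last printed digit can differ from the table by one.

```python
import math

N = 528  # "Obs" column: observations used for BIC

# (ll(model), df) copied from the four estat ic tables above.
# df counts every estimated parameter, including the constant.
models = [
    (-123.1639, 2),
    (-123.1172, 3),
    (-123.1162, 4),
    (-122.7552, 5),
]


def aic(ll, k):
    """AIC as used by estat ic: -2 lnL + 2k."""
    return -2 * ll + 2 * k


def bic(ll, k, n):
    """BIC as used by estat ic: -2 lnL + k ln N."""
    return -2 * ll + k * math.log(n)


results = [(k, aic(ll, k), bic(ll, k, N)) for ll, k in models]
for k, a, b in results:
    print(f"df={k}  AIC={a:9.4f}  BIC={b:9.4f}")
```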

And the glm output:

Code:

. glm binvar packpc pop income tax, fam(bin) link(logit)

Iteration 0:   log likelihood = -129.32473  
Iteration 1:   log likelihood = -122.79829  
Iteration 2:   log likelihood = -122.75525  
Iteration 3:   log likelihood = -122.75521  
Iteration 4:   log likelihood = -122.75521  

Generalized linear models                         No. of obs      =        528
Optimization     : ML                             Residual df     =        523
                                                  Scale parameter =          1
Deviance         =  245.5104226                   (1/df) Deviance =   .4694272
Pearson          =  526.2333414                   (1/df) Pearson  =   1.006182

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =   .4839213
Log likelihood   = -122.7552113                   BIC             =  -3033.227

------------------------------------------------------------------------------
             |                 OIM
      binvar |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      packpc |  -.0096746   .0096484    -1.00   0.316    -.0285852     .009236
         pop |  -8.73e-08   1.92e-07    -0.45   0.650    -4.64e-07    2.90e-07
      income |   3.83e-09   9.04e-09     0.42   0.671    -1.39e-08    2.15e-08
         tax |  -.0169302   .0201709    -0.84   0.401    -.0564645    .0226041
       _cons |  -.9446007   1.551501    -0.61   0.543    -3.985486    2.096284
------------------------------------------------------------------------------

.
end of do-file

Last edited by Frank Taumann; 13 Apr 2017, 08:45.


Richard Williams

Join Date: Apr 2014

Posts: 4953
#2

13 Apr 2017, 09:27

I think you have posted the wrong example. The first code uses the auto data set and then the rest of the code uses some other dataset.

There are different formulas for BIC. If you follow the glm command with an estat ic command, my guess is that that particular discrepancy will go away.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 4953
#3

13 Apr 2017, 09:36

Also, you are supposed to count the constant as an estimated parameter, so you should use k = 5, not k = 4. So, fixing your hand calculation,

. di -2*(-122.75521)+5*ln(528)
276.8559

which matches up perfectly with what Stata reported.
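The whole discrepancy comes from one parameter: moving from k = 4 to k = 5 adds exactly ln(528) to the BIC. A minimal sketch of that arithmetic in plain Python (not Stata), using the log likelihood quoted in the question:

```python
import math

ll = -122.75521  # log likelihood of the last model, as quoted in the question
N = 528


def bic(ll, k, n):
    """-2 lnL + k ln N, the formula from the estat ic documentation."""
    return -2 * ll + k * math.log(n)


bic_slopes_only = bic(ll, 4, N)  # k counts the four slope coefficients only
bic_with_cons = bic(ll, 5, N)    # k also counts _cons, as estat ic does

print(round(bic_slopes_only, 4))  # 270.5868
print(round(bic_with_cons, 4))    # 276.8559
print(round(bic_with_cons - bic_slopes_only, 4))  # 6.2691, i.e. ln(528)
```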

Richard Williams

Join Date: Apr 2014

Posts: 4953
#4

13 Apr 2017, 09:39

If you look at the methods and formulas for estat ic, it says "k is the number of parameters estimated." And the constant is an estimated parameter.


Frank Taumann

Join Date: Feb 2017
Posts: 40
#5

13 Apr 2017, 10:07

Thank you so much. Sorry about the error; here is the correct code for glm:

Code:

glm y price mpg headroom length, fam(bin) link(logit)

and indeed, that followed by

Code:

estat ic

leads to the same result. Thanks also for the hint to include the constant. For future reference, here is the relevant part of the glm help file, which explains that glm's BIC is based on Raftery (1995):

Code:

glm and binreg, ml use the following formulas to compute the values of AIC
    and BIC:

        AIC = (-2lnL + 2k)/N
 
        BIC = D2 - (N-k)ln(N)

    where lnL and D2 are the overall likelihood and the overall deviance,
    reported by glm, k is the number of parameters of the model, and N-k is
    the degrees of freedom associated with the deviance D2.  These formulas
    are from Akaike (1973) and Raftery (1995), respectively.
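As a sketch, both glm statistics in the first post can be recomputed from the numbers printed in its output with these two formulas. This is plain Python arithmetic, not Stata; it assumes k = 5 (four coefficients plus _cons), which is consistent with glm's Residual df = 528 - 5 = 523.

```python
import math

# Numbers copied from the glm output in the first post.
ll = -122.7552113       # Log likelihood
deviance = 245.5104226  # Deviance (D2 in the help-file formula)
N = 528                 # No. of obs
k = 5                   # four coefficients plus _cons (Residual df = N - k = 523)

glm_aic = (-2 * ll + 2 * k) / N             # (-2 lnL + 2k)/N
glm_bic = deviance - (N - k) * math.log(N)  # D2 - (N - k) ln(N)

print(f"AIC = {glm_aic:.7f}")  # glm reported .4839213
print(f"BIC = {glm_bic:.3f}")  # glm reported -3033.227
```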


Frank Taumann

Join Date: Feb 2017

Posts: 40
#6

13 Apr 2017, 11:10

Another addendum for future reference: for models fitted to the same sample, the original BIC and the Raftery (1995) BIC give identical differences between models; they differ only in level. For most applications it should therefore not matter much which one you use. Raftery (1995) also explicitly mentions this in the context of logit models.
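The level shift can be made explicit. For a 0/1 outcome the saturated model fits every observation perfectly, so its log likelihood is 0 and the deviance is D = -2 lnL (the glm output in the first post shows Deviance = 245.5104226 = 2 x 122.7552113). Substituting into Raftery's formula gives D - (N - k) ln N = (-2 lnL + k ln N) - N ln N, i.e. the estat ic BIC minus the constant N ln N. The plain Python sketch below assumes that 0/1 case and reuses the ll(model) and df values from the estat ic tables; the offset is constant only when every model uses the same N observations.

```python
import math

N = 528
# (ll(model), k) for the four logits in the first post; k includes the constant.
models = [(-123.1639, 2), (-123.1172, 3), (-123.1162, 4), (-122.7552, 5)]


def bic_estat(ll, k, n):
    """BIC as reported by estat ic: -2 lnL + k ln N."""
    return -2 * ll + k * math.log(n)


def bic_raftery(ll, k, n):
    """BIC as reported by glm: D - (N - k) ln N.
    Assumes a 0/1 outcome, so the saturated log likelihood is 0
    and the deviance D equals -2 lnL."""
    deviance = -2 * ll
    return deviance - (n - k) * math.log(n)


estat = [bic_estat(ll, k, N) for ll, k in models]
raftery = [bic_raftery(ll, k, N) for ll, k in models]

# The two versions differ by the same constant, N ln N, for every model ...
offsets = [e - r for e, r in zip(estat, raftery)]
print([round(o, 4) for o in offsets])

# ... so the change in BIC from one model to the next is identical.
print([round(estat[i + 1] - estat[i], 4) for i in range(3)])
print([round(raftery[i + 1] - raftery[i], 4) for i in range(3)])
```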
Richard Williams

Join Date: Apr 2014

Posts: 4953
#7

13 Apr 2017, 11:22

Yes, there are different ways to compute BIC. For model comparisons, the key thing is to be consistent in which formula you use. I discuss this in Appendix C of

http://www3.nd.edu/~rwilliam/xsoc73994/L05.pdf

Long & Freese's fitstat command (discussed in the handout above) can give results for different formulas if you want them.

