  • #16
    Discussion of sines and cosines may provoke memories of https://www.stata-journal.com/articl...article=st0116 In essence,

    1. A pair of sine and cosine terms is best considered a double act, like say ... well, insert your own favourites. https://en.wikipedia.org/wiki/Double_act

    2. One sine and one cosine term will match a cycle with peak and trough 6 months apart. With environmental data I sometimes find a need for 2 or 3 pairs. Even 3 pairs with 6 coefficients to estimate is more parsimonious than 12 predictors, one for each month (see the sketch after this list).

    3. If you have, say, a spike (up or down) in one month, sines and cosines won't work so well. This is precisely why indicator variables for each month are often needed for socio-economic data, in which December or August or any month with holiday or vacation effects may be awkward to handle.
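    A minimal sketch of generating two such sine-cosine pairs, assuming a monthly date variable mdate in Stata's %tm format (variable names here are only illustrative):
    Code:
    * assumes mdate is a monthly date in %tm format; names are illustrative
    gen m = month(dofm(mdate))
    forvalues k = 1/2 {
        gen sin`k' = sin(2*_pi*`k'*m/12)
        gen cos`k' = cos(2*_pi*`k'*m/12)
    }
    * each pair then enters the model together, e.g.
    * nbreg outcome predictor sin1 cos1 sin2 cos2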
    Last edited by Nick Cox; 02 Mar 2022, 19:27.



    • #17
      Originally posted by Nick Cox View Post
      Discussion of sines and cosines may provoke memories of https://www.stata-journal.com/articl...article=st0116 In essence,

      1. A pair of sine and cosine terms is best considered a double act, like say ... well, insert your own favourites. https://en.wikipedia.org/wiki/Double_act

      2. One sine and one cosine term will match a cycle with peak and trough 6 months apart. With environmental data I sometimes find a need for 2 or 3 pairs. Even 3 pairs with 6 coefficients to estimate is more parsimonious than 12 predictors, one for each month.

      3. If you have, say, a spike (up or down) in one month, sines and cosines won't work so well. This is precisely why indicator variables for each month are often needed for socio-economic data, in which December or August or any month with holiday or vacation effects may be awkward to handle.
      Thanks.



      • #18
        Originally posted by Clyde Schechter View Post
        That's fine. I'm not sure why you superimposed the normal curve on the histogram: there is no reason to expect the residuals to have a normal distribution in this kind of regression.
        Thanks. As far as the normality curve is concerned, isn't it preferable to see a normal distribution of residuals after fitting the model? I was under the impression that the more normal the residuals, the better the model. No? Thanks.



        • #19
          As far as the normality curve is concerned, isn't it preferable to see a normal distribution of residuals after fitting the model? I was under the impression that the more normal the residuals, the better the model. No?
          No. After ordinary least squares linear regression, in a small sample, a normal distribution supports the use of the usual normal-theory based t-statistics. Actually, even with ordinary least squares linear regression, if the sample is large, the distribution of the residuals doesn't really matter--if normal, then the use of t-statistics is "exact." But even if not normal, in a large sample, the central limit theorem will still support the use of z-statistics (and in a large sample t -> z). So with a linear regression it can be nice to see a normal residual distribution, but it's of diminishing importance as the sample size grows.

          In generalized linear models, there isn't even any theoretically interesting role for the normal distribution. The residual distributions are complicated and not easy to write down. Yes, they will generally be sort-of bell-shaped in appearance at a quick glance. But if you look at them in greater detail, they are typically not symmetrical and may have very different tails from a normal.

          Last edited by Clyde Schechter; 02 Mar 2022, 21:55.



          • #20
            Originally posted by Clyde Schechter View Post
            No. After ordinary least squares linear regression, in a small sample, a normal distribution supports the use of the usual normal-theory based t-statistics. Actually, even with ordinary least squares linear regression, if the sample is large, the distribution of the residuals doesn't really matter--if normal, then the use of t-statistics is "exact." But even if not normal, in a large sample, the central limit theorem will still support the use of z-statistics (and in a large sample t -> z). So with a linear regression it can be nice to see a normal residual distribution, but it's of diminishing importance as the sample size grows.

            In generalized linear models, there isn't even any theoretically interesting role for the normal distribution. The residual distributions are complicated and not easy to write down. Yes, they will generally be sort-of bell-shaped in appearance at a quick glance. But if you look at them in greater detail, they are typically not symmetrical and may have very different tails from a normal.
            Thanks a lot, Dr. Schechter. I have two additional questions, if you could help me understand:
            1. How can we check for autocorrelation and heteroskedasticity of the residuals after this model: -nbreg outcome predictor i.month-? Apparently, the Breusch–Godfrey and Breusch–Pagan tests could be used, but I am not sure how we can run those after the model. Any leads?
            2. Can/should we also estimate marginal effects after the model and interpret them? If so, how can that be done?
            Thanks in advance.



            • #21
              Heteroskedasticity is expected with a negative binomial model. In fact, if you didn't have it, there would be something wrong! So you don't need to test for that.

              If there is a concern about autocorrelation, rather than testing for it I would re-run the regression using -glm- instead of the -nbreg- convenience command, and use standard errors that are robust to autocorrelation. So it would probably look something like this:
              Code:
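              * -tsset- declares the time variable; vce(hac ...) requires tsset data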
              tsset t
              glm outcome i.predictor seasonal_adjustment_variable(s), link(log) family(nbinomial) vce(hac nwest)
              Can/should we also estimate marginal effects after the model and interpret them? If so, how can that be done?
              Yes. The -margins- command is somewhat complicated, and you don't say which marginal effects you are interested in here. If it's the marginal effect of the variable predictor, it would be simply
              Code:
              margins, dydx(predictor)
              to get the expected difference (average marginal effect) in the number of cases of the outcome disease between predictor = 1 and predictor = 0 times.



              • #22
                The command below shows the "Robust SE" from the NBREG model instead of the default SE.
                Code:
                nbreg outcome predictor i.month, r
                What is the difference between robust SEs and HAC Newey-West SEs?

                I created a GLM model with HAC Newey-West SEs using the command below:

                Code:
                . glm outcome predictor i.month, family(nbinomial 1) link(log) vce(hac nwest)
                The coefficients remain the same as in the NBREG model, but the SEs change. I now need to decide whether the robust SEs or the HAC Newey-West SEs should be preferred.



                • #23
                  I enclose below the output from both models (NBREG using robust SEs and GLM using HAC SEs). I notice that the point estimates of the coefficients are the same, but the SEs and 95% CI values change. However, the log likelihood of the NBREG model is better than that of the GLM model. Which do you think should be used?

                  HTML Code:
                  . nbreg outcome predictor i.month, r
                   
                  Fitting Poisson model:
                   
                  Iteration 0:   log pseudolikelihood = -897751.26 
                  Iteration 1:   log pseudolikelihood = -896984.79 
                  Iteration 2:   log pseudolikelihood = -896984.73 
                   
                  Fitting constant-only model:
                   
                  Iteration 0:   log pseudolikelihood =  -568.2126 
                  Iteration 1:   log pseudolikelihood =  -534.6042 
                  Iteration 2:   log pseudolikelihood = -534.59459 
                  Iteration 3:   log pseudolikelihood = -534.59459 
                   
                  Fitting full model:
                   
                  Iteration 0:   log pseudolikelihood = -523.04052 
                  Iteration 1:   log pseudolikelihood = -515.29188 
                  Iteration 2:   log pseudolikelihood = -514.59607 
                  Iteration 3:   log pseudolikelihood = -514.58245 
                  Iteration 4:   log pseudolikelihood = -514.58245 
                   
                  Negative binomial regression                            Number of obs =     36
                                                                          Wald chi2(12) =  47.78
                  Dispersion: mean                                        Prob > chi2   = 0.0000
                  Log pseudolikelihood = -514.58245                       Pseudo R2     = 0.0374
                   
                  ------------------------------------------------------------------------------
                               |               Robust
                       outcome | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                     predictor |  -.0193398   .0037682    -5.13   0.000    -.0267253   -.0119544
                               |
                         month |
                            2  |  -.0268557   .0261563    -1.03   0.305     -.078121    .0244097
                            3  |  -.0507621   .0685009    -0.74   0.459    -.1850215    .0834972
                            4  |  -.1700029   .1441665    -1.18   0.238    -.4525642    .1125583
                            5  |  -.2218715   .1324537    -1.68   0.094    -.4814759    .0377329
                            6  |  -.1656603   .1191594    -1.39   0.164    -.3992085    .0678879
                            7  |  -.3011788   .0991624    -3.04   0.002    -.4955335   -.1068242
                            8  |   -.330806   .1095939    -3.02   0.003    -.5456062   -.1160059
                            9  |  -.1680422   .0584821    -2.87   0.004    -.2826651   -.0534194
                           10  |   .0160934   .0729883     0.22   0.825    -.1269611    .1591479
                           11  |  -.0733032   .0826868    -0.89   0.375    -.2353664      .08876
                           12  |  -.0221653    .013235    -1.67   0.094    -.0481054    .0037748
                               |
                         _cons |   14.97863   .0254589   588.35   0.000     14.92873    15.02853
                  -------------+----------------------------------------------------------------
                      /lnalpha |  -3.754159   .2268579                     -4.198792   -3.309526
                  -------------+----------------------------------------------------------------
                         alpha |   .0234201    .005313                      .0150137    .0365335
                  ------------------------------------------------------------------------------
                  GLM Model

                  HTML Code:
                  . glm outcome predictor i.month, family(nbinomial 1) link(log) vce(hac nwest)
                  
                  Iteration 0:   log likelihood = -567.36146  
                  Iteration 1:   log likelihood = -567.35945  
                  Iteration 2:   log likelihood = -567.35945  
                  
                  Generalized linear models                         Number of obs   =         36
                  Optimization     : ML                             Residual df     =         23
                                                                    Scale parameter =          1
                  Deviance         =  .8463594756                   (1/df) Deviance =   .0367982
                  Pearson          =  .8105851992                   (1/df) Pearson  =   .0352428
                  
                  Variance function: V(u) = u+(1)u^2                [Neg. Binomial]
                  Link function    : g(u) = ln(u)                   [Log]
                  
                  HAC kernel (lags): Newey–West (34)
                                                                    AIC             =   32.24219
                  Log likelihood   = -567.3594522                   BIC             =  -81.57458
                  
                  ------------------------------------------------------------------------------
                               |                 HAC
                       outcome | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                     predictor |    -.01934   .0009781   -19.77   0.000    -.0212571   -.0174229
                               |
                         month |
                            2  |  -.0268555   .0236233    -1.14   0.256    -.0731564    .0194454
                            3  |  -.0507627   .0490326    -1.04   0.301    -.1468649    .0453394
                            4  |  -.1700044   .1101501    -1.54   0.123    -.3858946    .0458859
                            5  |  -.2218752   .1033597    -2.15   0.032    -.4244566   -.0192939
                            6  |  -.1656637   .0958607    -1.73   0.084    -.3535472    .0222197
                            7  |  -.3011812   .0756699    -3.98   0.000    -.4494915   -.1528709
                            8  |  -.3308069   .0835307    -3.96   0.000    -.4945241   -.1670897
                            9  |  -.1680428    .041486    -4.05   0.000    -.2493539   -.0867317
                           10  |   .0160927   .0546617     0.29   0.768    -.0910423    .1232276
                           11  |  -.0733048   .0627009    -1.17   0.242    -.1961964    .0495867
                           12  |  -.0221651   .0107537    -2.06   0.039     -.043242   -.0010883
                               |
                         _cons |   14.97863   .0067672  2213.40   0.000     14.96537    14.99189
                  ------------------------------------------------------------------------------
                  Thanks in advance for your comments.



                  • #24
                    Robust standard errors are not robust to autocorrelation; Newey-West standard errors are. Your observation that the coefficients agree is correct: the same model is being estimated, but different corrections are applied to the calculation of standard errors to accommodate the violation of the usual assumption underlying maximum likelihood estimation that residuals are independent.

                    I would not attach any meaning to the difference in the log-likelihood given by -glm- and by -nbreg-. Both programs are trying to maximize a likelihood function. But sometimes two different programs will approach it differently: for example one may ignore a constant multiplier that the other includes. That leads to finding the same maximizing solution, but the "log-likelihoods" reported are different. Actually, given the complexity of the true likelihood function for a negative binomial model, I would guess that no program actually maximizes it directly--it would be too computationally intensive to do that. So various monotone transformations are applied to simplify the calculations. (Indeed, even using a log-likelihood instead of the likelihood itself is such a transformation--we just tend to take it for granted because everybody does it.) So there is nothing to be learned by comparing the reported log-likelihoods of two programs.



                    • #25
                      Originally posted by Clyde Schechter View Post
                      Robust standard errors are not robust to autocorrelation; Newey-West standard errors are. Your observation that the coefficients agree is correct: the same model is being estimated, but different corrections are applied to the calculation of standard errors to accommodate the violation of the usual assumption underlying maximum likelihood estimation that residuals are independent.

                      I would not attach any meaning to the difference in the log-likelihood given by -glm- and by -nbreg-. Both programs are trying to maximize a likelihood function. But sometimes two different programs will approach it differently: for example one may ignore a constant multiplier that the other includes. That leads to finding the same maximizing solution, but the "log-likelihoods" reported are different. Actually, given the complexity of the true likelihood function for a negative binomial model, I would guess that no program actually maximizes it directly--it would be too computationally intensive to do that. So various monotone transformations are applied to simplify the calculations. (Indeed, even using a log-likelihood instead of the likelihood itself is such a transformation--we just tend to take it for granted because everybody does it.) So there is nothing to be learned by comparing the reported log-likelihoods of two programs.
                      Thanks. This is very helpful. I really appreciate it.

                      I am sorry to keep bothering you, but in the hope of learning, and so that this thread can serve as a learning resource for others like me in the future, I have yet another question.

                      I ran the -actest- command to test for autocorrelation in my outcome variable. Below is the output. I need to understand the difference between the tests/p-values in the left-hand columns (ranges of lags) and the right-hand columns (individual lags). I have tried to work out the interpretation of these, but thought I would ask for your insight to really understand. Would you mind explaining the difference between the left-hand side of this output and the right-hand side? What do I conclude from this output? And lastly, will running a -glm- with HAC SEs, as done earlier, actually handle this autocorrelation?

                      HTML Code:
                      . actest outcome, lag(12)
                      
                      Cumby-Huizinga test for autocorrelation
                        H0: disturbance is MA process up to order q
                        HA: serial correlation present at specified lags >q
                      -----------------------------------------------------------------------------
                        H0: q=0 (serially uncorrelated)        |  H0: q=specified lag-1
                        HA: s.c. present at range specified    |  HA: s.c. present at lag specified
                      -----------------------------------------+-----------------------------------
                          lags   |      chi2      df     p-val | lag |      chi2      df     p-val
                      -----------+-----------------------------+-----+-----------------------------
                         1 -  1  |     22.872      1    0.0000 |   1 |     22.872      1    0.0000
                         1 -  2  |     23.421      2    0.0000 |   2 |      5.028      1    0.0249
                         1 -  3  |     23.511      3    0.0000 |   3 |      1.519      1    0.2178
                         1 -  4  |     23.536      4    0.0001 |   4 |      0.354      1    0.5516
                         1 -  5  |     23.596      5    0.0003 |   5 |      0.078      1    0.7797
                         1 -  6  |     23.706      6    0.0006 |   6 |      0.057      1    0.8119
                         1 -  7  |     23.837      7    0.0012 |   7 |      0.033      1    0.8557
                         1 -  8  |     23.903      8    0.0024 |   8 |      0.005      1    0.9435
                         1 -  9  |     24.207      9    0.0040 |   9 |      0.032      1    0.8583
                         1 - 10  |     24.474      10   0.0064 |  10 |      0.226      1    0.6346
                         1 - 11  |     24.507      11   0.0108 |  11 |      0.535      1    0.4646
                         1 - 12  |     24.588      12   0.0169 |  12 |      0.594      1    0.4407
                      -----------------------------------------------------------------------------
                        Test requires conditional homoskedasticity



                      • #26
                        I am not familiar with -actest-. I have looked at its help file on SSC and will give you some tentative advice based on that.

                        First, you have performed the test on the outcome variable itself. That's probably not relevant. More relevant is to run it on the residuals of your regression, for that is what matters for the present purpose.

                        As for interpreting the output table, the results on the right side of the table are tests for serial correlation at the particular lag. So in your results, the chi2 statistic for serial correlation at lag 1 is 22.872. For serial correlation at lag 2 it is 5.028, etc. The results on the left side cover ranges of lags. So, looking at the first row, there we see the test for serial correlation at any lag between 1 and 1--and, unsurprisingly, the result is exactly the same as on the right side. But looking at the next row, we see a test for serial correlation at any lag between 1 and 2. That is really two tests combined (in a sense): one for lag 1 and another for lag 2. So the test for correlation somewhere in the lag 1 to lag 2 range comes out at 23.421. (Notice it has 2 df, not 1). Similarly, as we get to the bottom, the test for serial correlation at any lag between 1 and 12 comes to 24.588 with 12 df.

                        I believe that running the -glm- with -vce(hac nwest)- will deal with this autocorrelation. But to see if it does, try running -actest- on the residuals of the regression.



                        • #27
                          Originally posted by Clyde Schechter View Post
                          I am not familiar with -actest-. I have looked at its help file on SSC and will give you some tentative advice based on that.

                          First, you have performed the test on the outcome variable itself. That's probably not relevant. More relevant is to run it on the residuals of your regression, for that is what matters for the present purpose.

                          As for interpreting the output table, the results on the right side of the table are tests for serial correlation at the particular lag. So in your results, the chi2 statistic for serial correlation at lag 1 is 22.872. For serial correlation at lag 2 it is 5.028, etc. The results on the left side cover ranges of lags. So, looking at the first row, there we see the test for serial correlation at any lag between 1 and 1--and, unsurprisingly, the result is exactly the same as on the right side. But looking at the next row, we see a test for serial correlation at any lag between 1 and 2. That is really two tests combined (in a sense): one for lag 1 and another for lag 2. So the test for correlation somewhere in the lag 1 to lag 2 range comes out at 23.421. (Notice it has 2 df, not 1). Similarly, as we get to the bottom, the test for serial correlation at any lag between 1 and 12 comes to 24.588 with 12 df.

                          I believe that running the -glm- with -vce(hac nwest)- will deal with this autocorrelation. But to see if it does, try running -actest- on the residuals of the regression.
                          Thanks. What command could I use to generate residuals after the -glm- model, and should I generate Pearson residuals or deviance residuals? Subsequently, I could run -actest- on the generated residuals.



                          • #28
                            Code:
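                            * Pearson residuals from the most recently fit -glm- model, to be checked with -actest-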
                            predict residual, pearson



                            • #29
                              Originally posted by Clyde Schechter View Post
                              Code:
                              predict residual, pearson
                              I have done that, and below is the output.

                              Code:
                               predict residual, pearson
                              . actest residual, lag(12)
                              
                              Cumby-Huizinga test for autocorrelation
                                H0: disturbance is MA process up to order q
                                HA: serial correlation present at specified lags >q
                              -----------------------------------------------------------------------------
                                H0: q=0 (serially uncorrelated)        |  H0: q=specified lag-1
                                HA: s.c. present at range specified    |  HA: s.c. present at lag specified
                              -----------------------------------------+-----------------------------------
                                  lags   |      chi2      df     p-val | lag |      chi2      df     p-val
                              -----------+-----------------------------+-----+-----------------------------
                                 1 -  1  |      3.557      1    0.0593 |   1 |      3.557      1    0.0593
                                 1 -  2  |      3.560      2    0.1686 |   2 |      0.356      1    0.5507
                                 1 -  3  |      7.368      3    0.0611 |   3 |      3.508      1    0.0611
                                 1 -  4  |      7.975      4    0.0925 |   4 |      2.710      1    0.0997
                                 1 -  5  |      7.976      5    0.1576 |   5 |      0.429      1    0.5127
                                 1 -  6  |      8.689      6    0.1919 |   6 |      1.452      1    0.2282
                                 1 -  7  |      9.277      7    0.2334 |   7 |      0.384      1    0.5356
                                 1 -  8  |      9.385      8    0.3108 |   8 |      0.017      1    0.8972
                                 1 -  9  |     11.302      9    0.2556 |   9 |      0.041      1    0.8398
                                 1 - 10  |     11.306      10   0.3342 |  10 |      0.022      1    0.8824
                                 1 - 11  |     11.474      11   0.4044 |  11 |      0.037      1    0.8466
                                 1 - 12  |     11.632      12   0.4757 |  12 |      0.190      1    0.6633
                              -----------------------------------------------------------------------------
                                Test requires conditional homoskedasticity
                              Apparently there is no autocorrelation in the residuals, hence the problem is solved. Please correct me if I am wrong.

                              Can I also visualize these residuals through scatter plots, boxplots, or any other visualization tool to validate the model fit and assumptions? If so, I would appreciate it if you could guide me with commands.

                              Thanks again.



                              • #30
                                Yes, this looks fine.

                                You can graph these residuals, but I think it is hard to see what is going on when you do that. Unlike the residual plots that are sometimes done after linear regression, the residuals of a negative binomial regression are not going to be patterned in a way that is visually comprehensible. If you want to graph something, I would just plot the predicted values vs the observed values. The -predict- command will give you the predicted values, and then -graph twoway scatter- will produce the graphs. Nothing fancy or special needed here. The purpose here is not to "test assumptions" or anything like that--it's just to get a general impression of how the model looks compared with the observed values.
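                                A minimal sketch, assuming the -glm- fit above is the current estimation result (the variable name fitted is illustrative; t is the -tsset- time variable):
                                Code:
                                predict fitted, mu                   // predicted counts on the outcome scale
                                graph twoway scatter outcome fitted  // observed vs predicted
                                * or compare the two series over time:
                                graph twoway (scatter outcome t) (line fitted t, sort)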

