Inconsistent Margins with/without Offset

Fouziah Almouqati

18 Apr 2024, 08:45

Dear all,

I aim to predict the annual CT scan counts from 2015 to 2022. My dataset has one row per presentation, with the year and the CT scan count for that presentation; many other variables are omitted here for simplicity. Below is a summary of the total CT scans and the population size for each year.

Year	Counts of CT	Population size
2015	6010	59188
2016	8760	68875
2017	9036	71747
2018	10062	71373
2019	10614	76018
2020	12622	72725
2021	13350	68828
2022	12259	63612

Initially, I ran the following code, with year treated as a factor variable:

Code:

nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) irr allbaselevels

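Negative binomial regression                          Number of obs = 552,366
                                                      LR chi2(7)    = 2727.85
Dispersion: mean                                      Prob > chi2   =  0.0000
Log likelihood = -243592.72                           Pseudo R2     =  0.0056

------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
       2015  |          1  (base)
       2016  |   1.252569   .0229418    12.30   0.000     1.208402    1.298351
       2017  |   1.240314   .0225643    11.84   0.000     1.196868    1.285337
       2018  |   1.388384   .0248287    18.35   0.000     1.340563     1.43791
       2019  |   1.375061   .0243329    18.00   0.000     1.328187    1.423589
       2020  |   1.709242   .0295652    30.99   0.000     1.652267    1.768183
       2021  |   1.910185   .0328772    37.60   0.000     1.846821    1.975722
       2022  |   1.897908   .0331448    36.69   0.000     1.834045    1.963995
             |
       _cons |   .1015409   .0014213  -163.41   0.000      .098793    .1043651
-------------+----------------------------------------------------------------
    /lnalpha |    .558674   .0143598                      .5305294    .5868187
-------------+----------------------------------------------------------------
       alpha |   1.748353    .025106                      1.699832    1.798259
------------------------------------------------------------------------------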
                
margins pre_year_cat, expression(1000*predict())

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
       2015  |   101.5409   1.421312    71.44   0.000     98.75513    104.3266
       2016  |   127.1869    1.50242    84.65   0.000     124.2422    130.1316
       2017  |   125.9426   1.463519    86.05   0.000     123.0741     128.811
       2018  |   140.9777     1.5691    89.85   0.000     137.9023    144.0531
       2019  |   139.6248   1.511656    92.37   0.000      136.662    142.5876
       2020  |   173.5579   1.763705    98.41   0.000     170.1011    177.0147
       2021  |   193.9618   1.942604    99.85   0.000     190.1543    197.7692
       2022  |   192.7152   2.012535    95.76   0.000     188.7707    196.6597
------------------------------------------------------------------------------

I did a hand calculation and it matches the above results.
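
For reference, the hand calculation can be sketched as follows. This is a minimal check, assuming the values are copied from the IRR table and the summary table above; with irr, _cons is the baseline rate, so the rate per 1000 for a given year is 1000 times _cons times that year's IRR.

```stata
* Hand check of the per-1000 margins, using numbers typed in from the output above
display 1000 * .1015409              // 2015 (base year)
display 1000 * .1015409 * 1.252569   // 2016
display 1000 * .1015409 * 1.910185   // 2021

* Same 2015 rate from the raw summary table: 1000 * CT count / population
display 1000 * 6010 / 59188
```

Each result should land close to the corresponding row of the margins table, up to rounding of the displayed coefficients.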

However, I have been asked to collapse the data by year instead of keeping one row per presentation, so that each row contains the annual count of CT scans and the population size, and then to run the negative binomial regression using an offset.
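
A minimal sketch of one way to build the collapsed data set is shown here. It assumes that the yearly population is simply the number of presentation rows in that year and that total_CT_AT is never missing; the names pre1 and log_pre1 are the ones used in the commands below, but how they were actually constructed is an assumption.

```stata
* Sketch only: start from the presentation-level data (one row per presentation)
* (sum)   adds up the CT scans within each year
* (count) counts the nonmissing rows per year, used here as the population (assumption)
collapse (sum) total_CT_AT (count) pre1 = total_CT_AT, by(pre_year_cat)

* Log of the population, for use in offset()
generate log_pre1 = ln(pre1)

list pre_year_cat total_CT_AT pre1 log_pre1, noobs
```

The listed rows can then be compared with the summary table above before fitting the model.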

Code:

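nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) offset(log_pre1) irr

Negative binomial regression                          Number of obs =       8
                                                      LR chi2(6)    =   56.99
Dispersion: mean                                      Prob > chi2   =  0.0000
Log likelihood = -44.217095                           Pseudo R2     =  0.3919

------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
       2016  |   1.252568   .0209798    13.44   0.000     1.212116     1.29437
       2017  |   1.240314   .0206451    12.94   0.000     1.200503    1.281444
       2018  |   1.388383   .0226342    20.13   0.000     1.344722    1.433461
       2019  |    1.37506   .0221979    19.73   0.000     1.332234    1.419263
       2020  |   1.709241   .0267875    34.20   0.000     1.657537    1.762559
       2021  |   1.910184   .0296722    41.66   0.000     1.852904    1.969234
       2022  |   1.897908    .029886    40.69   0.000     1.840227    1.957397
             |
       _cons |   .1015409   .0013098  -177.32   0.000     .0990059    .1041408
    log_pre1 |          1  (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -20.24877          .                             .           .
-------------+----------------------------------------------------------------
       alpha |   1.61e-09          .                             .           .
------------------------------------------------------------------------------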

margins pre_year_cat, expression(1000*predict())

Expression: 1000*predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
       2015  |    7010967   90435.87    77.52   0.000      6833716     7188218
       2016  |    8781715   93826.89    93.59   0.000      8597818     8965613
       2017  |    8695798   91478.98    95.06   0.000      8516502     8875093
       2018  |    9733907   97038.72   100.31   0.000      9543715     9924100
       2019  |    9640501   93575.05   103.02   0.000      9457097     9823904
       2020  |   1.20e+07   106663.8   112.35   0.000     1.18e+07    1.22e+07
       2021  |   1.34e+07   115907.7   115.54   0.000     1.32e+07    1.36e+07
       2022  |   1.33e+07   120178.2   110.72   0.000     1.31e+07    1.35e+07
------------------------------------------------------------------------------

Why do the margins, when an offset is included, not match the hand calculation and the first approach?

I tried using exposure() (as in the code below) instead of offset(), but the issue remains the same.

Code:

nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) exposure(pre1) irr

Negative binomial regression                          Number of obs =       8
                                                      LR chi2(6)    =   56.99
Dispersion: mean                                      Prob > chi2   =  0.0000
Log likelihood = -44.217095                           Pseudo R2     =  0.3919

------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
       2015  |          1  (base)
       2016  |   1.252569   .0209799    13.45   0.000     1.212117    1.294371
       2017  |   1.240314   .0206451    12.94   0.000     1.200503    1.281445
       2018  |   1.388384   .0226342    20.13   0.000     1.344723    1.433462
       2019  |   1.375061   .0221979    19.73   0.000     1.332235    1.419263
       2020  |   1.709242   .0267875    34.20   0.000     1.657538     1.76256
       2021  |   1.910184   .0296722    41.66   0.000     1.852904    1.969235
       2022  |   1.897908    .029886    40.69   0.000     1.840227    1.957397
             |
       _cons |   .1015409   .0013098  -177.32   0.000     .0990059    .1041407
    ln(pre1) |          1  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -20.24547          .                             .           .
------------------------------------------------------------------------------

margins pre_year_cat, expression(1000*predict())

Expression: 1000*predict()

                
Delta-method
Margin   std. err.    z    P>z    [95% conf.    interval]
                
pre_year_cat 
2015      7010964   90435.83    77.52    0.000    6833713    7188215
2016      8781717   93826.91    93.59    0.000    8597820    8965615
2017      8695798   91478.98    95.06    0.000    8516502    8875093
2018      9733910   97038.74    100.31    0.000    9543717    9924102
2019      9640501   93575.05    103.02    0.000    9457097    9823905
2020     1.20e+07   106663.9    112.35    0.000    1.18e+07    1.22e+07
2021     1.34e+07   115907.7    115.54    0.000    1.32e+07    1.36e+07
2022     1.33e+07   120178.1    110.72    0.000    1.31e+07    1.35e+07
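
One check that may help narrow this down, offered as a sketch rather than a confirmed explanation: after nbreg with offset() or exposure(), predict() defaults to the predicted number of events, which includes the exposure. The large margins may therefore be the per-1000 rates multiplied by the average of pre1 over the 8 rows. The code below assumes the collapsed data and the exposure (or offset) model are in memory; the predict option ir (incidence rate, exposure set to 1) is documented for nbreg, but whether it is the quantity wanted here is an assumption.

```stata
* Check: is the 2015 margin equal to 1000 * baseline rate * mean(pre1)?
summarize pre1, meanonly
display 1000 * .1015409 * r(mean)    // compare with the 2015 row of the margins table

* Rates per 1000 with the exposure fixed at 1 instead of each row's pre1
margins pre_year_cat, expression(1000*predict(ir))
```

If the display result is close to the 2015 margin, the difference between the two approaches is one of scale (count versus rate) and not a difference in the fitted model, which would be consistent with the near-identical IRRs in the tables above.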

Any advice is appreciated.
