  • Inconsistent Margins with/without Offset

    Dear all,

    I aim to predict annual CT scan counts from 2015 to 2022. My dataset has one row per presentation, with the year and the CT scan count; many other variables are omitted here for simplicity. Below is a summary of the total CT scans and the population size per year.

    Year    Counts of CT    Population size
    2015            6010              59188
    2016            8760              68875
    2017            9036              71747
    2018           10062              71373
    2019           10614              71373
    2020           12622              72725
    2021           13350              68828
    2022           12259              63612

    Initially, I fitted a negative binomial regression on the presentation-level data, treating year as a factor variable:

    Code:
    nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) irr allbaselevels
    
    Negative binomial regression            Number of obs    = 552,366
                LR chi2(7)    = 2727.85
    Dispersion: mean            Prob > chi2    =  0.0000
    Log likelihood = -243592.72            Pseudo R2    =  0.0056
    
                    
    total_CT_AT         IRR   Std. err.    z    P>z    [95% conf.    interval]
                    
    pre_year_cat 
    2015            1  (base)
    2016     1.252569   .0229418    12.30    0.000    1.208402    1.298351
    2017     1.240314   .0225643    11.84    0.000    1.196868    1.285337
    2018     1.388384   .0248287    18.35    0.000    1.340563    1.43791
    2019     1.375061   .0243329    18.00    0.000    1.328187    1.423589
    2020     1.709242   .0295652    30.99    0.000    1.652267    1.768183
    2021     1.910185   .0328772    37.60    0.000    1.846821    1.975722
    2022     1.897908   .0331448    36.69    0.000    1.834045    1.963995
    
    _cons    .1015409   .0014213    -163.41    0.000    .098793    .1043651
                    
    /lnalpha     .558674   .0143598            .5305294    .5868187
                    
    alpha    1.748353    .025106            1.699832    1.798259
                    
    margins pre_year_cat, expression(1000*predict())
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    pre_year_cat |
           2015  |   101.5409   1.421312    71.44   0.000     98.75513    104.3266
           2016  |   127.1869    1.50242    84.65   0.000     124.2422    130.1316
           2017  |   125.9426   1.463519    86.05   0.000     123.0741     128.811
           2018  |   140.9777     1.5691    89.85   0.000     137.9023    144.0531
           2019  |   139.6248   1.511656    92.37   0.000      136.662    142.5876
           2020  |   173.5579   1.763705    98.41   0.000     170.1011    177.0147
           2021  |   193.9618   1.942604    99.85   0.000     190.1543    197.7692
           2022  |   192.7152   2.012535    95.76   0.000     188.7707    196.6597
    ------------------------------------------------------------------------------

    I did a hand calculation, and it matches the margins above.
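For reference, the hand calculation can be sketched as total CT scans divided by population size, times 1,000. The snippet below is a minimal illustration in Python (not Stata) that uses only the yearly totals from the summary table above; the variable names are illustrative. Note that the table lists the same population for 2018 and 2019, so the 2019 rate computed from it (about 148.7) does not match the reported margin of 139.6; the other seven years agree with the margins to the printed precision.

```python
# Yearly totals copied from the summary table in the post.
counts = {
    2015: 6010, 2016: 8760, 2017: 9036, 2018: 10062,
    2019: 10614, 2020: 12622, 2021: 13350, 2022: 12259,
}
population = {
    2015: 59188, 2016: 68875, 2017: 71747, 2018: 71373,
    2019: 71373, 2020: 72725, 2021: 68828, 2022: 63612,
}

# CT scans per 1,000 population for each year.
rate_per_1000 = {
    year: 1000 * counts[year] / population[year] for year in counts
}

for year, rate in rate_per_1000.items():
    print(f"{year}: {rate:.4f}")
```

The 2015 rate divided by 1,000 is also the `_cons` value on the IRR scale (.1015409), since 2015 is the base year.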

    However, I have been asked to collapse the data by year rather than keep one row per presentation, so that each row holds the annual count of CT scans and the population size, and then to run the negative binomial regression with an offset.

    Code:
    nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) offset(log_pre1) irr
    Negative binomial regression            Number of obs =      8
                LR chi2(6)    =  56.99
    Dispersion: mean            Prob > chi2   = 0.0000
    Log likelihood = -44.217095            Pseudo R2     = 0.3919
    
                
    total_CT_AT         IRR   Std. err.    z    P>z    [95% conf. interval]
                
    pre_year_cat 
    2016     1.252568   .0209798    13.44    0.000    1.212116     1.29437
    2017     1.240314   .0206451    12.94    0.000    1.200503    1.281444
    2018     1.388383   .0226342    20.13    0.000    1.344722    1.433461
    2019      1.37506   .0221979    19.73    0.000    1.332234    1.419263
    2020     1.709241   .0267875    34.20    0.000    1.657537    1.762559
    2021     1.910184   .0296722    41.66    0.000    1.852904    1.969234
    2022     1.897908    .029886    40.69    0.000    1.840227    1.957397
    
    _cons    .1015409   .0013098    -177.32    0.000    .0990059    .1041408
    log_pre1           1  (offset)
                
    /lnalpha   -20.24877          .            .           .
                
    alpha    1.61e-09          .            .           .
    
    margins pre_year_cat, expression(1000*predict())
    
    Expression: 1000*predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    pre_year_cat |
           2015  |    7010967   90435.87    77.52   0.000      6833716     7188218
           2016  |    8781715   93826.89    93.59   0.000      8597818     8965613
           2017  |    8695798   91478.98    95.06   0.000      8516502     8875093
           2018  |    9733907   97038.72   100.31   0.000      9543715     9924100
           2019  |    9640501   93575.05   103.02   0.000      9457097     9823904
           2020  |   1.20e+07   106663.8   112.35   0.000     1.18e+07    1.22e+07
           2021  |   1.34e+07   115907.7   115.54   0.000     1.32e+07    1.36e+07
           2022  |   1.33e+07   120178.2   110.72   0.000     1.31e+07    1.35e+07
    ------------------------------------------------------------------------------

    Why do the margins from the model with an offset not match the hand calculation and the first approach?
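One possible reading of these numbers, offered as a sketch rather than a definitive diagnosis: after nbreg with offset() or exposure(), predict() defaults to the expected count, which includes the exposure, so margins would average rate × population over the eight rows instead of reporting a rate. The check below (Python, not Stata) divides the offset-model margins by the per-1,000 margins from the first model, using only the years printed in full precision. It assumes the yearly populations sum to the 552,366 rows of the presentation-level data, which is an inference from the first model's "Number of obs", not something stated in the post.

```python
# Margins per 1,000 from the presentation-level model (no offset).
rate_margin = {
    2015: 101.5409, 2016: 127.1869, 2017: 125.9426,
    2018: 140.9777, 2019: 139.6248,
}
# Margins from the collapsed model with offset(log_pre1).
offset_margin = {
    2015: 7010967, 2016: 8781715, 2017: 8695798,
    2018: 9733907, 2019: 9640501,
}

# If the offset margins are rate x mean population x 1000,
# this ratio should be the same constant for every year.
ratios = {year: offset_margin[year] / rate_margin[year] for year in rate_margin}

# Assumed mean population per collapsed row: total rows / 8 years.
mean_rows_per_year = 552366 / 8

for year, ratio in ratios.items():
    print(year, round(ratio, 1), round(mean_rows_per_year, 1))
```

If that reading is right, requesting the incidence rate instead of the count, for example `margins pre_year_cat, expression(1000*predict(ir))`, may reproduce the first set of margins. `ir` is a documented predict statistic after nbreg, but this command has not been run on these data.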

    I also tried exposure() instead of offset(), as in the code below, but the issue remains.

    Code:
    nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) exposure(pre1) irr
    
    Negative binomial regression            Number of obs =      8
                LR chi2(6)    =  56.99
    Dispersion: mean            Prob > chi2   = 0.0000
    Log likelihood = -44.217095            Pseudo R2     = 0.3919
    
                
    total_CT_AT         IRR   Std. err.    z    P>z    [95% conf. interval]
                
    pre_year_cat 
    2015            1  (base)
    2016     1.252569   .0209799    13.45    0.000    1.212117    1.294371
    2017     1.240314   .0206451    12.94    0.000    1.200503    1.281445
    2018     1.388384   .0226342    20.13    0.000    1.344723    1.433462
    2019     1.375061   .0221979    19.73    0.000    1.332235    1.419263
    2020     1.709242   .0267875    34.20    0.000    1.657538     1.76256
    2021     1.910184   .0296722    41.66    0.000    1.852904    1.969235
    2022     1.897908    .029886    40.69    0.000    1.840227    1.957397
    
    _cons    .1015409   .0013098    -177.32    0.000    .0990059    .1041407
    ln(pre1)           1  (exposure)
                
    /lnalpha   -20.24547          .            .           .
    
    margins pre_year_cat, expression(1000*predict())
    
    Expression: 1000*predict()
    
                    
    Delta-method
    Margin   std. err.    z    P>z    [95% conf.    interval]
                    
    pre_year_cat 
    2015      7010964   90435.83    77.52    0.000    6833713    7188215
    2016      8781717   93826.91    93.59    0.000    8597820    8965615
    2017      8695798   91478.98    95.06    0.000    8516502    8875093
    2018      9733910   97038.74    100.31    0.000    9543717    9924102
    2019      9640501   93575.05    103.02    0.000    9457097    9823905
    2020     1.20e+07   106663.9    112.35    0.000    1.18e+07    1.22e+07
    2021     1.34e+07   115907.7    115.54    0.000    1.32e+07    1.36e+07
    2022     1.33e+07   120178.1    110.72    0.000    1.31e+07    1.35e+07
    Any advice is appreciated.