Dear all,
My question, with the details below: why do the margins results, when an offset is included, not match the hand calculation and the first approach?
Any advice is appreciated
I aim to predict the annual CT scan counts from 2015 to 2022. My dataset has one row per presentation, with the year and the CT scan count; many other variables are omitted here for simplicity. Below is a summary of the total CT scans and the population size for each year.
Year | Counts of CT | Population size |
2015 | 6010 | 59188 |
2016 | 8760 | 68875 |
2017 | 9036 | 71747 |
2018 | 10062 | 71373 |
2019 | 10614 | 71373 |
2020 | 12622 | 72725 |
2021 | 13350 | 68828 |
2022 | 12259 | 63612 |
Initially, I used the code below, with year treated as a factor variable:
Code:
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) irr allbaselevels

Negative binomial regression                          Number of obs = 552,366
                                                      LR chi2(7)    = 2727.85
Dispersion: mean                                      Prob > chi2   =  0.0000
Log likelihood = -243592.72                           Pseudo R2     =  0.0056

------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |          1  (base)
        2016 |   1.252569   .0229418    12.30   0.000     1.208402    1.298351
        2017 |   1.240314   .0225643    11.84   0.000     1.196868    1.285337
        2018 |   1.388384   .0248287    18.35   0.000     1.340563     1.43791
        2019 |   1.375061   .0243329    18.00   0.000     1.328187    1.423589
        2020 |   1.709242   .0295652    30.99   0.000     1.652267    1.768183
        2021 |   1.910185   .0328772    37.60   0.000     1.846821    1.975722
        2022 |   1.897908   .0331448    36.69   0.000     1.834045    1.963995
             |
       _cons |   .1015409   .0014213  -163.41   0.000      .098793    .1043651
-------------+----------------------------------------------------------------
    /lnalpha |    .558674   .0143598                      .5305294    .5868187
-------------+----------------------------------------------------------------
       alpha |   1.748353    .025106                      1.699832    1.798259
------------------------------------------------------------------------------

margins pre_year_cat, expression(1000*predict())

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |   101.5409   1.421312    71.44   0.000     98.75513    104.3266
        2016 |   127.1869    1.50242    84.65   0.000     124.2422    130.1316
        2017 |   125.9426   1.463519    86.05   0.000     123.0741     128.811
        2018 |   140.9777     1.5691    89.85   0.000     137.9023    144.0531
        2019 |   139.6248   1.511656    92.37   0.000      136.662    142.5876
        2020 |   173.5579   1.763705    98.41   0.000     170.1011    177.0147
        2021 |   193.9618   1.942604    99.85   0.000     190.1543    197.7692
        2022 |   192.7152   2.012535    95.76   0.000     188.7707    196.6597
------------------------------------------------------------------------------
I did a hand calculation and it matches the above results.
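For reference, the hand calculation can be sketched as follows. This is a minimal sketch, assuming the presentation-level data are in memory with the variable names used in the code above; the rate per 1,000 is the yearly mean of total_CT_AT multiplied by 1,000.

```stata
* Mean CT scans per presentation, total scans, and number of rows, by year
tabstat total_CT_AT, by(pre_year_cat) statistics(mean sum count)

* Same calculation for 2015 from the summary table: scans / population * 1000
display 1000 * 6010 / 59188
```

The display line should give a value close to the 2015 margin of 101.5409 reported above.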
However, I have been asked to collapse the data by year instead of by presentation, so that each row contains the annual count of CT scans and the population size, and then to run the negative binomial regression using an offset.
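The collapse step can be sketched roughly as follows. It assumes that the population size for each year is the number of presentation rows in that year (an assumption; if the denominator comes from another source, it would need to be merged in instead). The names pre1 and log_pre1 are the ones used in the models that follow.

```stata
preserve

* One row per year: total CT scans and number of presentations (assumed denominator)
collapse (sum) total_CT_AT (count) pre1 = total_CT_AT, by(pre_year_cat)

* Log of the denominator, for use with offset()
generate log_pre1 = ln(pre1)

list, clean noobs

* ... fit the collapsed-data models here, then return to the presentation-level data
restore
```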
Code:
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) offset(log_pre1) irr

Negative binomial regression                          Number of obs =       8
                                                      LR chi2(6)    =   56.99
Dispersion: mean                                      Prob > chi2   =  0.0000
Log likelihood = -44.217095                           Pseudo R2     =  0.3919

------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2016 |   1.252568   .0209798    13.44   0.000     1.212116     1.29437
        2017 |   1.240314   .0206451    12.94   0.000     1.200503    1.281444
        2018 |   1.388383   .0226342    20.13   0.000     1.344722    1.433461
        2019 |    1.37506   .0221979    19.73   0.000     1.332234    1.419263
        2020 |   1.709241   .0267875    34.20   0.000     1.657537    1.762559
        2021 |   1.910184   .0296722    41.66   0.000     1.852904    1.969234
        2022 |   1.897908    .029886    40.69   0.000     1.840227    1.957397
             |
       _cons |   .1015409   .0013098  -177.32   0.000     .0990059    .1041408
    log_pre1 |          1  (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -20.24877          .                             .           .
-------------+----------------------------------------------------------------
       alpha |   1.61e-09          .                             .           .
------------------------------------------------------------------------------

margins pre_year_cat, expression(1000*predict())

Expression: 1000*predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |    7010967   90435.87    77.52   0.000      6833716     7188218
        2016 |    8781715   93826.89    93.59   0.000      8597818     8965613
        2017 |    8695798   91478.98    95.06   0.000      8516502     8875093
        2018 |    9733907   97038.72   100.31   0.000      9543715     9924100
        2019 |    9640501   93575.05   103.02   0.000      9457097     9823904
        2020 |   1.20e+07   106663.8   112.35   0.000     1.18e+07    1.22e+07
        2021 |   1.34e+07   115907.7   115.54   0.000     1.32e+07    1.36e+07
        2022 |   1.33e+07   120178.2   110.72   0.000     1.31e+07    1.35e+07
------------------------------------------------------------------------------
I also tried using exposure() instead of offset(), as in the code below, but the issue remains the same.
Code:
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) exposure(pre1) irr

Negative binomial regression                          Number of obs =       8
                                                      LR chi2(6)    =   56.99
Dispersion: mean                                      Prob > chi2   =  0.0000
Log likelihood = -44.217095                           Pseudo R2     =  0.3919

------------------------------------------------------------------------------
 total_CT_AT |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |          1  (base)
        2016 |   1.252569   .0209799    13.45   0.000     1.212117    1.294371
        2017 |   1.240314   .0206451    12.94   0.000     1.200503    1.281445
        2018 |   1.388384   .0226342    20.13   0.000     1.344723    1.433462
        2019 |   1.375061   .0221979    19.73   0.000     1.332235    1.419263
        2020 |   1.709242   .0267875    34.20   0.000     1.657538     1.76256
        2021 |   1.910184   .0296722    41.66   0.000     1.852904    1.969235
        2022 |   1.897908    .029886    40.69   0.000     1.840227    1.957397
             |
       _cons |   .1015409   .0013098  -177.32   0.000     .0990059    .1041407
    ln(pre1) |          1  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -20.24547          .                             .           .
------------------------------------------------------------------------------

margins pre_year_cat, expression(1000*predict())

Expression: 1000*predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
pre_year_cat |
        2015 |    7010964   90435.83    77.52   0.000      6833713     7188215
        2016 |    8781717   93826.91    93.59   0.000      8597820     8965615
        2017 |    8695798   91478.98    95.06   0.000      8516502     8875093
        2018 |    9733910   97038.74   100.31   0.000      9543717     9924102
        2019 |    9640501   93575.05   103.02   0.000      9457097     9823905
        2020 |   1.20e+07   106663.9   112.35   0.000     1.18e+07    1.22e+07
        2021 |   1.34e+07   115907.7   115.54   0.000     1.32e+07    1.36e+07
        2022 |   1.33e+07   120178.1   110.72   0.000     1.31e+07    1.35e+07
------------------------------------------------------------------------------
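One possible reading of the mismatch, offered as a sketch rather than a confirmed diagnosis: after nbreg with offset() or exposure(), the default predict() is the expected count, exp(xb + offset), so margins averages predicted counts over the observed population sizes instead of reporting a rate. Consistent with that, the 2015 margin of about 7,010,967 is close to 1000 × 0.1015409 × (552,366 / 8), taking 552,366 (the number of observations in the first model) as the total population across the 8 years. Requesting the incidence rate, which sets exposure to 1, may therefore reproduce the first table. The ir option of predict is documented for nbreg, but whether it gives the intended quantity here is untested.

```stata
* Refit on the collapsed data (same model as above)
nbreg total_CT_AT ib(first).pre_year_cat, dispersion(mean) exposure(pre1) irr

* Predicted incidence rate (exposure set to 1), scaled to per 1,000
margins pre_year_cat, expression(1000*predict(ir))

* Arithmetic check of the 2015 margin obtained with the default predict():
* rate * average population * 1000
display 1000 * .1015409 * 552366 / 8
```

If this reading is right, the IRRs and _cons already agree between the two approaches, and only the prediction requested from margins differs.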