Poisson regression adjusted to exposure time

filipepaula

Join Date: Oct 2014

Posts: 22
#1

15 Dec 2022, 10:35

Hi all,

I am sorry if this question has more to do with statistical knowledge and not necessarily with Stata usage, but I really need some help. I hope this is appropriate.

I am trying to test the association between a continuous variable and the number of infections each patient had during the follow-up period. The follow-up period length is not the same for all patients, and this is not survival data in the sense that I just have the total number of infection events, not the time until each event.

In this example of the data, the variable <period> is the observation time in years, and <infection> is the number of infections during that time. <indvar> is the variable being tested for association, and <covar*> and <age> are covariates to be included in the regression.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id double(indvar age) byte covar2 float covar3 byte(infection period)
4 .552 54.31666666666667 1 1 18 9
5 .781 36.59722222222222 1 0 6 5
6 .672 56.56388888888889 1 0 8 9
7 .796 48.61944444444445 1 0 0 6
8 .394 53.41388888888889 0 1 3 7
9 .2725 57.013888888888886 0 0 1 1
10 .48350000000000004 53.36388888888889 1 0 4 10
11 .344 46.69444444444444 1 0 10 7
12 1.3975 42.75 0 0 1 10
13 1.1880000000000002 51.202777777777776 0 0 2 9
end

Is a Poisson regression (or a negative binomial regression) the most appropriate here? If it is, I can't figure out how to "adjust" the number of infections to the observation (at risk) period using the <poisson> or the <nbreg> commands.

Many thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#2

15 Dec 2022, 10:42

Is a Poisson regression (or a negative binomial regression) the most appropriate here?

It is almost never possible to answer a question like this. As the oft-quoted maxim of Box goes: all models are wrong but some models are useful. Poisson regression is a reasonable candidate here, and many people would use it. In reality, it is unlikely to be the most appropriate model, and it is even more unlikely that anyone will know what the most appropriate model is. Of the ones that are readily available, pre-programmed in commercial statistical packages, it is certainly at or near the top of the appropriateness list.

how to "adjust" the number of infections to the observation (at risk) period using the <poisson> or the <nbreg> commands.

That is what the exposure option is for.

Code:

poisson infection indvar age covar*, exposure(period)
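
As a minimal sketch of what the option does in practice (an illustration added for reference, reusing the -dataex- example from #1): -exposure(period)- gives the same fit as supplying ln(period) through -offset()-. The variable ln_period and the scalar names are assumptions introduced only for this sketch; -nbreg- accepts the same -exposure()- option.

```stata
* Data: the -dataex- example from #1
clear
input byte id double(indvar age) byte covar2 float covar3 byte(infection period)
4 .552 54.31666666666667 1 1 18 9
5 .781 36.59722222222222 1 0 6 5
6 .672 56.56388888888889 1 0 8 9
7 .796 48.61944444444445 1 0 0 6
8 .394 53.41388888888889 0 1 3 7
9 .2725 57.013888888888886 0 0 1 1
10 .48350000000000004 53.36388888888889 1 0 4 10
11 .344 46.69444444444444 1 0 10 7
12 1.3975 42.75 0 0 1 10
13 1.1880000000000002 51.202777777777776 0 0 2 9
end

* Rate model: follow-up time enters through -exposure()-
poisson infection indvar age covar*, exposure(period) irr
scalar b_exposure = _b[indvar]

* Same model with an explicit offset; ln_period is a name made up for this sketch
generate double ln_period = ln(period)
poisson infection indvar age covar*, offset(ln_period) irr
scalar b_offset = _b[indvar]

* The two coefficients should agree up to numerical tolerance
display "exposure(): " b_exposure "   offset(): " b_offset
```

With only ten observations the estimates themselves are not meaningful; the point is only that the two specifications coincide.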
filipepaula

Join Date: Oct 2014

Posts: 22
#3

15 Dec 2022, 11:59

Thank you Clyde for your insightful comments!
I was misinterpreting the exposure option in the help file.

Thank you so much.
Ches Zin

Join Date: Jun 2023

Posts: 20
#4

13 Nov 2023, 08:03

Dear Statalists,
I would like to ask whether the exposure needs to be specified in a Poisson regression. I am a bit confused: when I run the analysis with and without specifying the exposure, the results are very different. Below are my commands for the regression. dose50 is the number of events. I want to know the incidence rate ratio between the different egfr groups (4 groups). Each level of egfr_group4 has a different number of patients, so I included it in the exposure (the same variable as both independent variable and exposure) to normalise for the different numbers of patients. Please correct me if I have understood this wrongly. I appreciate any help.

poisson dose50 ib1.egfr_group4, vce(robust) irr
poisson dose50 ib1.egfr_group4, exposure (egfr_group4) vce(robust) irr

Thank you so much.
Clyde Schechter

Join Date: Apr 2014

Posts: 29953
#5

13 Nov 2023, 11:07

The reason that -exposure()- is syntactically optional in -poisson- is that in some data sets the value of the exposure variable would be the same for all observations. For example, in a clinical trial whose outcome is a count of events per unit time, the study design might have arranged that the amount of observation time is the same for all participants. Or in a study like yours, it might have been designed so that the number of patients in all of the groups is the same. In that situation, the estimated irr will be the same with or without the exposure variable, and the calculations are a bit simpler without it.
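
A small check of this point, as a sketch with made-up data (the variables group, events and pyears and all values are hypothetical): when the exposure is the same for every observation, only the intercept shifts by the log of that constant, and the IRRs are unchanged.

```stata
clear
input byte group int events
1 3
1 5
2 8
2 6
3 12
3 9
end

* Hypothetical design: every observation is followed for exactly 2 years
generate double pyears = 2

* Fit without an exposure variable
poisson events i.group, irr
scalar irr2_plain = exp(_b[2.group])
scalar cons_plain = _b[_cons]

* Fit with the constant exposure
poisson events i.group, exposure(pyears) irr
scalar irr2_expos = exp(_b[2.group])
scalar cons_expos = _b[_cons]

* Same IRR either way; the intercepts differ by ln(2)
display "IRR, group 2 vs 1: " irr2_plain " and " irr2_expos
display "Intercept shift:   " cons_plain - cons_expos "   ln(2) = " ln(2)
```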

You do not show, nor even describe, your data, so it is not possible for me to say what the correct approach here would be. I can tell you that having egfr_group4 as both the grouping variable and the exposure variable is almost certainly incorrect. Here's the way to think of it. An incidence rate, generically, has a numerator, which is a count of outcome events, and a denominator which is a number of observation units (people, in your case) * a duration of observation (not described in your post). Thus, incidence rates come in units like "cases per 10,000 people per year" or "motor vehicle collisions per 100000 miles per month". The exposure variable in a Poisson regression should be the denominator.
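
In symbols (a sketch added for reference; the notation $y_i$, $T_i$, $x_i$ and $\beta$ is not from the thread), with $y_i$ the event count and $T_i$ the denominator for observation $i$, the rate model can be written as

```latex
\begin{aligned}
\text{rate}_i &= \frac{E[y_i]}{T_i} = \exp(x_i'\beta) \\
\log E[y_i] &= \log T_i + x_i'\beta \\
\text{IRR}_j &= \exp(\beta_j)
\end{aligned}
```

so each exponentiated coefficient is a ratio of rates expressed per unit of $T_i$, which is why $T_i$ has to be a genuine amount of observation (people times time) rather than a group label.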

How this plays out depends on your data set. If you simply have an aggregated observation for each egfr_group4 that contains dose50, the total number of outcome events in that group, then your -exposure()- variable should be the total number of person-years (or person-months, person-days, person-weeks, whichever is most suitable for the actual duration of your study) of observation in the study. If you have a single observation per person, with the dose50 showing the total number of events experienced by that patient while under observation in your study, then the -exposure()- variable should be the duration of observation for that patient. (If you have some other data organization, then you need to re-organize it into one of these before doing your -poisson- regression.)
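
For the aggregated layout, a minimal sketch with invented numbers (the variable pyears and every value below are hypothetical, not the poster's data): one observation per group, holding the total events and the total person-years. Putting the group code itself in -exposure()- would instead treat the labels 1 to 4 as if they were amounts of observation time.

```stata
clear
input byte egfr_group4 int dose50 double pyears
1 12 40.5
2 18 35.0
3 25 30.2
4 30 22.8
end

* Denominator = person-years of observation in each group
poisson dose50 ib1.egfr_group4, exposure(pyears) irr

* With one row per group the model is saturated, so each IRR equals
* the ratio of crude rates (events / person-years) against group 1
scalar irr2_model = exp(_b[2.egfr_group4])
scalar irr2_crude = (18/35.0) / (12/40.5)
display "IRR 2 vs 1, model: " irr2_model "   crude: " irr2_crude
```

With one observation per patient, the same command would apply with pyears holding each patient's own duration of observation.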
Ches Zin

Join Date: Jun 2023

Posts: 20
#6

13 Nov 2023, 11:58

Thanks so much Clyde. It helps me to understand better.
Best

Dart Stater

Join Date: Apr 2023

Posts: 18
#7

06 Dec 2023, 23:32

In this example we analyze the influence of the stage of malignancy on the event "cancer death" (crd) over the observation period (pyr), which differs from case to case.
The hypothesis tested by fitting the regression model was "patients with a more advanced stage of cancer are more likely to die". We also try to estimate how strong this relationship is.

Code:

poisson crd i.stage, exp(pyr) ir

Iteration 0:   log likelihood = -523.99847  
Iteration 1:   log likelihood = -523.99847  

Poisson regression                                Number of obs   =        315
                                                  LR chi2(3)      =      28.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -523.99847                       Pseudo R2       =     0.0264

------------------------------------------------------------------------------
         crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stage |
          2  |   1.738581   .5106652     1.88   0.060     .9776326    3.091821
          3  |   2.706141   .7645725     3.52   0.000     1.555459    4.708065
          4  |   3.285802   .9325866     4.19   0.000     1.883869    5.731022
         pyr | (exposure)
------------------------------------------------------------------------------

Interpreting the result, we can conclude that stages 3 and 4 are statistically significant predictors of cancer death during the follow-up period. Among patients with stage 3, the incidence rate of cancer death (per year) is 2.7 times that among patients with stage 1. Among patients with stage 4, the incidence rate of death from cancer is 3.3 times that among stage 1 patients. We cannot say definitely that stage 2 is more likely to lead to cancer death than stage 1; however, looking at the 95% CI, we cannot entirely rule this out. Also, we can see that the IRR increases from stage to stage.
On this basis, we now try to test for a linear dose-effect relationship between stage and crd.

Code:

poisson crd stage if stage>0, exp(pyr) ir

Iteration 0:   log likelihood = -524.77876  
Iteration 1:   log likelihood = -524.77874  

Poisson regression                                Number of obs   =        315
                                                  LR chi2(1)      =      26.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -524.77874                       Pseudo R2       =     0.0250

------------------------------------------------------------------------------
         crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stage |   1.433167   .1017284     5.07   0.000     1.247031    1.647087
         pyr | (exposure)
------------------------------------------------------------------------------

How can we interpret this IRR? Should it be treated as the excess relative risk (ERR) per unit of stage (or, more accurately, the excess relative hazard per unit of stage)?


Dart Stater

Join Date: Apr 2023

Posts: 18
#8

07 Dec 2023, 02:25

a correction for:

Originally posted by Dart Stater

Should it be treated as excess relative risk (ERR) per unit of stage? (or, more accurate, Excess relative hazard per unit of stage?)

Should (IRR-1) be treated as ERR per unit of stage?
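
For reference, a sketch of what the fitted model implies (this is arithmetic on the output in #7, not an answer given in the thread; the symbols $s$, $k$, $\beta_0$, $\beta_1$ are introduced here). With stage entered as a single continuous term, the model is log-linear in stage, so the rate ratio per one-unit step is constant and effects multiply across steps:

```latex
\begin{aligned}
\text{rate}(s) &= \exp(\beta_0 + \beta_1 s), \qquad \text{IRR} = e^{\beta_1} = 1.433 \\
\frac{\text{rate}(s+k)}{\text{rate}(s)} &= \text{IRR}^{\,k}: \quad
  1.433^{1} = 1.433, \quad 1.433^{2} \approx 2.05, \quad 1.433^{3} \approx 2.94 \\
\text{IRR} - 1 &= 0.433 \quad \text{(proportional increase in rate per one-unit step)} \\
\text{IRR}^{\,k} - 1 &\neq k\,(\text{IRR} - 1) \quad \text{for } k > 1
\end{aligned}
```

On this reading, IRR - 1 is the excess relative rate for a single step only; an additive excess-relative-risk model of the form 1 + b*stage is a different model from the one fitted here. The implied ratios against stage 1 (1.43, 2.05, 2.94) can be set beside the categorical estimates in #7 (1.74, 2.71, 3.29).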
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#9

07 Dec 2023, 06:06

Originally posted by Dart Stater

In this suggestion, we now try to test the linear dose-effect relationship between the stage and the crd.

I don't have an answer about the IRR / ERR thing, sorry, but if your objective is "to test the linear dose-effect relationship", then I'm curious as to why you wouldn't just do the following after fitting the first model

Code:

contrast p.stage

(leaving everything in the estimation metric) in lieu of going back and fitting a different model that treats stage as continuous.

For instance, using the example in the helpfile for poisson, the tack of just fitting a categorical-as-continuous model doesn't actually "test the linear . . . relationship" in that it fails to reveal a nonlinear (quadratic) component in the "dose-effect relationship" whose presence calls into question the suitability of any such IRR-1 as ERR-per-unit-of-stage interpretation.

Code:

webuse dollhill3
poisson deaths i.smokes i.agecat, exposure(pyears) nolog

* Does "test the linear [component of the] dose-effect relationship" as well as reveal a nonlinear component:
contrast p.agecat

* Doesn't really "test" that the relationship is linear or imply that its coefficient is unambiguously interpretable:
poisson deaths i.smokes c.agecat, exposure(pyears) nolog

Dart Stater

Join Date: Apr 2023
Posts: 18

#10

07 Dec 2023, 22:52

Thank you Joseph
First, this

Code:

contrast p.stage

doesn't work

Code:

. contrast p.stage
factor variables and time-series operators not allowed

If you have an idea how to deal with that please provide me with a proper script.

Second, I may be wrong, but the initial step of any analysis of the data is a test of the linear hypothesis using 2 categories of the factor variable (yes/no) and 2 states of the outcome (yes/no), which therefore leads to a 2x2 table analysis. Thus, in analysing the 2x2 table we presume a linear relationship (possibly wrong, but the simplest approach).
Then, if we are lucky, a CI for the estimate that lies above 1.0 tells us that the linear component is strong enough and that our [linear] hypothesis was correct.
Of course, in reality, none of the models, neither linear nor linear-quadratic, can precisely describe the relationship. So, with 2x2 data, the result of the RR estimation can only be treated as "a linear relationship is found / is not found".
Otherwise (when the CI around the point estimate fails to exclude 1.0), one should try to deal with non-linearity by fitting a non-linear model; this step requires a categorized factor variable.
So I do not fully understand

in that it fails to reveal a nonlinear (quadratic) component

Obviously, it cannot help to reveal the non-linearity, so we need to check for it manually, but perhaps it needs to be clarified for me how to use contrast for this purpose.

In this example, testing the linear-quadratic model failed to show a significant result:

Code:

. poisson crd c.stage, exp(pyr) ir

Iteration 0:   log likelihood = -524.77876  
Iteration 1:   log likelihood = -524.77874  

Poisson regression                                Number of obs   =        315
                                                  LR chi2(1)      =      26.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -524.77874                       Pseudo R2       =     0.0250

------------------------------------------------------------------------------
         crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stage |   1.433167   .1017284     5.07   0.000     1.247031    1.647087
         pyr | (exposure)
------------------------------------------------------------------------------

. est sto a

. poisson crd c.stage##c.stage, exp(pyr) ir

Iteration 0:   log likelihood = -524.02375  
Iteration 1:   log likelihood = -524.02375  

Poisson regression                                Number of obs   =        315
                                                  LR chi2(2)      =      28.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -524.02375                       Pseudo R2       =     0.0264

------------------------------------------------------------------------------
         crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stage |   2.430983   1.079009     2.00   0.045     1.018521    5.802215
             |
     c.stage#|
     c.stage |   .9081364   .0721035    -1.21   0.225     .7772631    1.061046
         pyr | (exposure)
------------------------------------------------------------------------------

. est sto b

. lrtest a b

Likelihood-ratio test                                  LR chi2(1)  =      1.51
(Assumption: a nested in b)                            Prob > chi2 =    0.2191

that was not surprising, according to the results of the first model:

Code:

. poisson crd i.stage, exp(pyr) ir

Iteration 0:   log likelihood = -523.99847  
Iteration 1:   log likelihood = -523.99847  

Poisson regression                                Number of obs   =        315
                                                  LR chi2(3)      =      28.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -523.99847                       Pseudo R2       =     0.0264

------------------------------------------------------------------------------
         crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stage |
          2  |   1.738581   .5106652     1.88   0.060     .9776326    3.091821
          3  |   2.706141   .7645725     3.52   0.000     1.555459    4.708065
          4  |   3.285802   .9325866     4.19   0.000     1.883869    5.731022
         pyr | (exposure)
------------------------------------------------------------------------------
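
As a further check, offered as a sketch rather than as part of the original post: the linear model (a) is also nested in the categorical model just shown, so the two posted log likelihoods give a likelihood-ratio test of departure from linearity on 3 - 1 = 2 degrees of freedom. The scalar names are made up for this illustration; the log likelihoods are copied from the output.

```stata
* Log likelihood of the linear model: poisson crd c.stage
scalar ll_linear = -524.77874

* Log likelihood of the categorical model: poisson crd i.stage
scalar ll_categorical = -523.99847

* LR statistic and its chi-squared(2) tail probability
scalar lr_stat = 2*(ll_categorical - ll_linear)
scalar lr_p    = chi2tail(2, lr_stat)
display "LR chi2(2) = " %5.2f lr_stat "   Prob > chi2 = " %6.4f lr_p
```

This gives an LR chi2(2) of about 1.56 with a p-value of about 0.46, in line with the quadratic test in showing no clear departure from linearity. With both sets of estimates stored, -lrtest- on the stored results would perform the same comparison.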

Last edited by Dart Stater; 07 Dec 2023, 23:26.
