  • Poisson regression adjusted to exposure time

    Hi all,

    I am sorry if this question has more to do with statistical knowledge and not necessarily with Stata usage, but I really need some help. I hope this is appropriate.


    I am trying to test the association between a continuous variable and the number of infections each patient had during the follow-up period. The follow-up period length is not the same for all patients, and this is not survival data in the sense that I just have the total number of infection events, not the time until each event.

    In this example of the data, the variable <period> is the observation time in years, and <infection> is the number of infections during that time. <indvar> is the variable being tested for the association, and <covar*> and <age> are covariates to be included in the regression.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id double(indvar age) byte covar2 float covar3 byte(infection period)
     4               .552  54.31666666666667 1 1 18  9
     5               .781  36.59722222222222 1 0  6  5
     6               .672  56.56388888888889 1 0  8  9
     7               .796  48.61944444444445 1 0  0  6
     8               .394  53.41388888888889 0 1  3  7
     9              .2725 57.013888888888886 0 0  1  1
    10 .48350000000000004  53.36388888888889 1 0  4 10
    11               .344  46.69444444444444 1 0 10  7
    12             1.3975              42.75 0 0  1 10
    13 1.1880000000000002 51.202777777777776 0 0  2  9
    end

    Is a Poisson regression (or a negative binomial regression) the most appropriate here? If it is, I can't figure out how to "adjust" the number of infections to the observation (at risk) period using the <poisson> or the <nbreg> commands.

    Many thanks.



  • #2
    Is a Poisson regression (or a negative binomial regression) the most appropriate here?
    It is almost never possible to answer a question like this. As the oft-quoted maxim of Box goes: all models are wrong but some models are useful. Poisson regression is a reasonable candidate here, and many people would use it. In reality, it is unlikely to be the most appropriate model, and it is even more unlikely that anyone will know what the most appropriate model is. Of the ones that are readily available, pre-programmed in commercial statistical packages, it is certainly at or near the top of the appropriateness list.

    how to "adjust" the number of infections to the observation (at risk) period using the <poisson> or the <nbreg> commands.
    That is what the exposure option is for.
    Code:
    poisson infection indvar age covar*, exposure(period)
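
    A minimal sketch of what -exposure()- does, using the example data from #1: it enters ln(period) into the model as an offset with its coefficient fixed at 1, so the two fits below should give the same coefficients. The names lnperiod, b_exp and b_off are illustrative and not from the original posts, and covar* is written out as covar2 covar3.

    ```stata
    * Example data from #1 (dataex output)
    clear
    input byte id double(indvar age) byte covar2 float covar3 byte(infection period)
     4               .552  54.31666666666667 1 1 18  9
     5               .781  36.59722222222222 1 0  6  5
     6               .672  56.56388888888889 1 0  8  9
     7               .796  48.61944444444445 1 0  0  6
     8               .394  53.41388888888889 0 1  3  7
     9              .2725 57.013888888888886 0 0  1  1
    10 .48350000000000004  53.36388888888889 1 0  4 10
    11               .344  46.69444444444444 1 0 10  7
    12             1.3975              42.75 0 0  1 10
    13 1.1880000000000002 51.202777777777776 0 0  2  9
    end

    * Fit with the exposure option: IRRs refer to infections per year of follow-up
    poisson infection indvar age covar2 covar3, exposure(period) irr
    matrix b_exp = e(b)

    * Equivalent fit with the log of the exposure supplied as an offset
    generate double lnperiod = ln(period)
    poisson infection indvar age covar2 covar3, offset(lnperiod) irr
    matrix b_off = e(b)

    * Relative difference between the two coefficient vectors (should be about 0)
    display mreldif(b_exp, b_off)
    ```

    -nbreg- accepts the same exposure() and offset() options, so the same pattern applies if overdispersion is a concern.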



    • #3
      Thank you Clyde for your insightful comments!
      I was misinterpreting the exposure option in the help file.

      Thank you so much.



      • #4
        Dear Statalists,
        I would like to ask whether the exposure needs to be specified in a Poisson regression; I am a bit confused. When I run the analysis with and without the exposure, the results are very different. My commands are below. dose50 is the number of events, and I want the incidence rate ratio between the four eGFR groups. The groups in egfr_group4 contain different numbers of patients, so I also included egfr_group4 as the exposure (the same variable as both the independent variable and the exposure) to normalise for the different numbers of patients. Please correct me if I have misunderstood. I would appreciate any help.

        poisson dose50 ib1.egfr_group4, vce(robust) irr
        poisson dose50 ib1.egfr_group4, exposure (egfr_group4) vce(robust) irr

        Thank you so much.



        • #5
          The reason that -exposure()- is syntactically optional in -poisson- is that in some data sets the value of the exposure variable would be the same for all observations. For example, in a clinical trial whose outcome is a count of events per unit time, the study design might have arranged that the amount of observation time is the same for all participants. Or in a study like yours, it might have been designed so that the number of patients in all of the groups is the same. In that situation, the estimated irr will be the same with or without the exposure variable, and the calculations are a bit simpler without it.

          You do not show, nor even describe, your data, so it is not possible for me to say what the correct approach here would be. I can tell you that having egfr_group4 as both the grouping variable and the exposure variable is almost certainly incorrect. Here's the way to think of it. An incidence rate, generically, has a numerator, which is a count of outcome events, and a denominator, which is a number of observation units (people, in your case) multiplied by a duration of observation (not described in your post). Thus, incidence rates come in units like "cases per 10,000 people per year" or "motor vehicle collisions per 100,000 miles per month". The exposure variable in a Poisson regression should be the denominator.

          How this plays out depends on your data set. If you simply have an aggregated observation for each egfr_group4 that contains dose50, the total number of outcome events in that group, then your -exposure()- variable should be the total number of person-years (or person-months, person-days, person-weeks, whichever is most suitable for the actual duration of your study) of observation in that group. If you have a single observation per person, with dose50 showing the total number of events experienced by that patient while under observation in your study, then the -exposure()- variable should be the duration of observation for that patient. (If you have some other data organization, then you need to re-organize it into one of these before doing your -poisson- regression.)
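
          The two data layouts described above can be sketched with a small made-up data set. Everything here (the variables group, events and years, and all the values) is hypothetical and only stands in for egfr_group4, dose50 and the follow-up time that the post does not describe. With the group indicator as the only predictor, the per-person and the aggregated fits should give the same IRR.

          ```stata
          * Hypothetical per-person data: one row per patient
          clear
          input byte group byte events double years
          1 2 4.0
          1 0 2.5
          1 1 3.5
          2 3 3.0
          2 4 5.0
          2 1 2.0
          end

          * Layout 1: one observation per person, exposure = that person's follow-up
          poisson events i.group, exposure(years) irr
          matrix b_person = e(b)

          * Layout 2: one aggregated observation per group,
          * exposure = total person-years in that group
          collapse (sum) events years, by(group)
          poisson events i.group, exposure(years) irr
          matrix b_group = e(b)

          * Relative difference between the two coefficient vectors (should be about 0)
          display mreldif(b_person, b_group)
          ```

          In this toy example group 1 has 3 events in 10 person-years and group 2 has 8 events in 10 person-years, so both fits should report an IRR of 8/3 for group 2 relative to group 1.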



          • #6
            Thanks so much Clyde. It helps me to understand better.
            Best



            • #7
              In this example we analyze the influence of the stage of malignancy on the event "cancer death" (crd) over the observation period (pyr), which differs from case to case.
              The hypothesis to be tested by fitting the regression model was "patients with a more advanced stage of cancer are more likely to die". We also try to estimate how strong this relationship is.

              Code:
              poisson crd i.stage, exp(pyr) ir
              
              Iteration 0:   log likelihood = -523.99847  
              Iteration 1:   log likelihood = -523.99847  
              
              Poisson regression                                Number of obs   =        315
                                                                LR chi2(3)      =      28.43
                                                                Prob > chi2     =     0.0000
              Log likelihood = -523.99847                       Pseudo R2       =     0.0264
              
              ------------------------------------------------------------------------------
                       crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     stage |
                        2  |   1.738581   .5106652     1.88   0.060     .9776326    3.091821
                        3  |   2.706141   .7645725     3.52   0.000     1.555459    4.708065
                        4  |   3.285802   .9325866     4.19   0.000     1.883869    5.731022
                       pyr | (exposure)
              ------------------------------------------------------------------------------
              Interpreting the result, we can conclude that stages 3 and 4 are significant predictors of cancer death during the follow-up period. Among patients with stage 3 cancer, the incidence rate of cancer death (per person-year) is 2.7 times that among patients with stage 1. Among patients with stage 4, the incidence rate of cancer death is 3.3 times that among stage 1 patients. We cannot say definitively that stage 2 is more likely to lead to cancer death than stage 1; however, looking at the 95% CI (0.98 to 3.09), we cannot rule it out. We can also see that the IRR increases from stage to stage.
              Under this assumption, we now try to test the linear dose-effect relationship between the stage and the crd.

              Code:
              poisson crd stage if stage>0, exp(pyr) ir
              
              Iteration 0:   log likelihood = -524.77876  
              Iteration 1:   log likelihood = -524.77874  
              
              Poisson regression                                Number of obs   =        315
                                                                LR chi2(1)      =      26.87
                                                                Prob > chi2     =     0.0000
              Log likelihood = -524.77874                       Pseudo R2       =     0.0250
              
              ------------------------------------------------------------------------------
                       crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                     stage |   1.433167   .1017284     5.07   0.000     1.247031    1.647087
                       pyr | (exposure)
              ------------------------------------------------------------------------------
              How can we interpret this IRR? Should it be treated as the excess relative risk (ERR) per unit of stage (or, more accurately, the excess relative hazard per unit of stage)?



              • #8
                A correction for:
                Originally posted by Dart Stater
                Should it be treated as the excess relative risk (ERR) per unit of stage (or, more accurately, the excess relative hazard per unit of stage)?
                Should (IRR-1) be treated as ERR per unit of stage?
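
                As a rough illustration of what the fitted model implies (a sketch, not a full answer): -poisson- fits a log-linear model, so the per-stage IRR acts multiplicatively and the rate ratio for a difference of k stages is IRR^k. IRR - 1 is therefore the relative excess rate for a one-stage step only, and it does not add up linearly across stages. The scalar names below are illustrative; the value 1.433167 is copied from the output in #7.

                ```stata
                * Per-stage IRR reported in #7 for the model with stage entered linearly
                scalar irr_unit = 1.433167

                * Implied rate ratio, and rate ratio minus 1, for a difference of k stages
                forvalues k = 1/3 {
                    scalar irr_k = irr_unit^`k'
                    display "k = `k': implied IRR = " %6.3f irr_k ", IRR - 1 = " %6.3f (irr_k - 1)
                }
                ```

                The implied ratios are roughly 1.43, 2.05 and 2.94, which can be set beside the categorical estimates in #7 (1.74, 2.71 and 3.29 for stages 2, 3 and 4 versus stage 1).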



                • #9
                  Originally posted by Dart Stater
                  Under this assumption, we now try to test the linear dose-effect relationship between the stage and the crd.
                  I don't have an answer about the IRR / ERR thing, sorry, but if your objective is "to test the linear dose-effect relationship", then I'm curious as to why you wouldn't just do the following after fitting the first model
                  Code:
                  contrast p.stage
                  (leaving everything in the estimation metric) in lieu of going back and fitting a different model that treats stage as continuous.

                  For instance, using the example in the help file for poisson, the tack of just fitting a categorical-as-continuous model doesn't actually "test the linear . . . relationship", in that it fails to reveal a nonlinear (quadratic) component in the "dose-effect relationship". The presence of that component calls into question the suitability of any interpretation of IRR-1 as ERR per unit of stage.
                  Code:
                  webuse dollhill3
                  
                  poisson deaths i.smokes i.agecat, exposure(pyears) nolog
                  
                  * Does "test the linear [component of the] dose-effect relationship" as well as reveal a nonlinear component:
                  contrast p.agecat
                  
                  * Doesn't really "test" that the relationship is linear or imply that its coefficient is unambiguously interpretable:
                  poisson deaths i.smokes c.agecat, exposure(pyears) nolog
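
                  A complementary check, sketched here on the same help-file data, is a likelihood-ratio test of the categorical-as-continuous model against the fully categorical one. It asks whether the four separate agecat contrasts fit better than a single slope. The stored-estimate names linear and categorical are illustrative.

                  ```stata
                  webuse dollhill3, clear

                  * Fully categorical age: four parameters for the five age categories
                  quietly poisson deaths i.smokes i.agecat, exposure(pyears)
                  estimates store categorical

                  * Age category treated as continuous: one slope parameter
                  quietly poisson deaths i.smokes c.agecat, exposure(pyears)
                  estimates store linear

                  * LR test of the linear model against the categorical one (3 df)
                  lrtest linear categorical
                  ```

                  A small p-value here would point to a departure from linearity, which is the situation in which a single per-unit IRR is hard to interpret.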



                  • #10
                    Thank you Joseph
                    First, this
                    Code:

                    contrast p.stage
                    doesn't work
                    Code:
                    . contrast p.stage
                    factor variables and time-series operators not allowed
                    If you know how to deal with that, please show me the correct syntax.

                    Second, I may be wrong, but the initial step of any data analysis is a test of a linear hypothesis using a factor variable with 2 categories (yes/no) and an outcome with 2 states (yes/no), which leads to a 2x2 table analysis. Thus, in analysing the 2x2 table, we presume a linear relationship (possibly wrong, but the simplest option).
                    Then, if we are lucky, a CI for the estimate that lies above 1.0 tells us that the linear component is strong enough and that our [linear] hypothesis was correct.
                    Of course, in reality none of the models, neither linear nor linear-quadratic, can describe the relationship precisely. So, with 2x2 data, the result of the RR estimation can only be read as "the linear relationship is found / is not found".
                    Otherwise (when the CI for the point estimate includes 1.0), one should try to deal with non-linearity by fitting a non-linear model; this step requires a categorized factor variable.
                    So I do not fully understand
                    in that it fails to reveal a nonlinear (quadratic) component
                    Obviously, it cannot help to reveal the non-linearity, so we need to check for it manually, but perhaps it needs to be explained to me how to use contrast for this purpose.

                    In this example, adding a quadratic term did not significantly improve on the linear model:
                    Code:
                    . poisson crd c.stage, exp(pyr) ir
                    
                    Iteration 0:   log likelihood = -524.77876  
                    Iteration 1:   log likelihood = -524.77874  
                    
                    Poisson regression                                Number of obs   =        315
                                                                      LR chi2(1)      =      26.87
                                                                      Prob > chi2     =     0.0000
                    Log likelihood = -524.77874                       Pseudo R2       =     0.0250
                    
                    ------------------------------------------------------------------------------
                             crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           stage |   1.433167   .1017284     5.07   0.000     1.247031    1.647087
                             pyr | (exposure)
                    ------------------------------------------------------------------------------
                    
                    . est sto a
                    
                    . poisson crd c.stage##c.stage, exp(pyr) ir
                    
                    Iteration 0:   log likelihood = -524.02375  
                    Iteration 1:   log likelihood = -524.02375  
                    
                    Poisson regression                                Number of obs   =        315
                                                                      LR chi2(2)      =      28.38
                                                                      Prob > chi2     =     0.0000
                    Log likelihood = -524.02375                       Pseudo R2       =     0.0264
                    
                    ------------------------------------------------------------------------------
                             crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           stage |   2.430983   1.079009     2.00   0.045     1.018521    5.802215
                                 |
                         c.stage#|
                         c.stage |   .9081364   .0721035    -1.21   0.225     .7772631    1.061046
                             pyr | (exposure)
                    ------------------------------------------------------------------------------
                    
                    . est sto b
                    
                    . lrtest a b
                    
                    Likelihood-ratio test                                  LR chi2(1)  =      1.51
                    (Assumption: a nested in b)                            Prob > chi2 =    0.2191
                    That was not surprising, given the results of the first model:
                    Code:
                    . poisson crd i.stage, exp(pyr) ir
                    
                    Iteration 0:   log likelihood = -523.99847  
                    Iteration 1:   log likelihood = -523.99847  
                    
                    Poisson regression                                Number of obs   =        315
                                                                      LR chi2(3)      =      28.43
                                                                      Prob > chi2     =     0.0000
                    Log likelihood = -523.99847                       Pseudo R2       =     0.0264
                    
                    ------------------------------------------------------------------------------
                             crd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           stage |
                              2  |   1.738581   .5106652     1.88   0.060     .9776326    3.091821
                              3  |   2.706141   .7645725     3.52   0.000     1.555459    4.708065
                              4  |   3.285802   .9325866     4.19   0.000     1.883869    5.731022
                             pyr | (exposure)
                    ------------------------------------------------------------------------------
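
                    The likelihood-ratio comparison above can be reproduced by hand from the reported log likelihoods, and the same arithmetic gives a comparison of the linear model with the categorical one. This is only a sketch: the scalar names are illustrative, and the log likelihoods are copied from the output in this post.

                    ```stata
                    * Log likelihoods copied from the output above
                    scalar ll_linear    = -524.77874   // poisson crd c.stage
                    scalar ll_quadratic = -524.02375   // poisson crd c.stage##c.stage
                    scalar ll_factor    = -523.99847   // poisson crd i.stage

                    * Linear vs linear-quadratic (1 df): reproduces -lrtest a b-
                    scalar lr_quad = 2*(ll_quadratic - ll_linear)
                    display "LR chi2(1) = " %5.2f lr_quad "   p = " %6.4f chi2tail(1, lr_quad)

                    * Linear vs categorical stage (2 df): tests any departure from linearity
                    scalar lr_cat = 2*(ll_factor - ll_linear)
                    display "LR chi2(2) = " %5.2f lr_cat "   p = " %6.4f chi2tail(2, lr_cat)
                    ```

                    The first line matches the -lrtest- output (1.51, p = 0.2191). The second gives an LR statistic of about 1.56 on 2 df, with p of about 0.46, so neither comparison shows a significant departure from linearity in these data.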
                    Last edited by Dart Stater; 07 Dec 2023, 23:26.
