  • Month-specific incidence rates in STATA?

    Dear all,
    thanks in advance.

    I have a dataset of 900 patients followed for one year, organized into two departments (oncology, cardiology).
    Among these patients, I have information about their age, sex, admission_date, length_of_stay, discharge_date and mortality outcome at discharge (variable "outcome", which can have a value of 1 for deceased or 0 for alive).

    Is there a way in STATA to calculate month-specific incidence rates? For example: the number of deaths in March divided by person-days in March.

    I'm unsure if there's a way to account for the fact that some patients have admission days spread across multiple months.

    Some time ago, if I remember correctly, I had used stpstime, but I was in a slightly different situation, and this command doesn't seem appropriate now.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id age sex admission_date length_of_stay discharge_date outcome) str10 department
     1 54.61835 0 23364 17.857428  23381.86 1 "cardiology"
     2 45.71597 0 23331 14.076464 23345.076 0 "oncology"  
     3  58.9018 0 23027  7.090492  23034.09 0 "oncology"  
     4 58.87616 0 23184  17.30037   23201.3 0 "oncology"  
     5 63.64021 0 23240  21.30667 23261.307 1 "cardiology"
     6 57.74791 0 23065 25.892675  23090.89 0 "cardiology"
     7 70.58585 1 23278 13.036489  23291.04 1 "oncology"  
     8 53.71004 1 23060 35.847523  23095.85 0 "cardiology"
     9 58.72459 0 23030  22.15348 23052.154 1 "oncology"  
    10 39.22841 1 23078  19.79576 23097.795 0 "cardiology"
    end

  • #2
    Code:
    gen start_month = mofd(admission_date)
    gen end_month = mofd(discharge_date)
    expand end_month-start_month + 1
    by id, sort: gen mdate = dofm(start_month + _n - 1)
    format *month %tm
    format mdate %td
    
    gen admitted_days = min(floor(discharge_date), lastdayofmonth(mdate)) ///
        - max(admission_date, firstdayofmonth(mdate))
    by id (mdate), sort: replace outcome = 0 if _n < _N // CAN ONLY DIE IN LAST MONTH
        
    collapse (sum) admitted_days outcome, by(department mdate)
    gen admitted_months = admitted_days/daysinmonth(mdate)
    gen monthly_mortality_rate = outcome/admitted_months
    You don't say as much, but I have assumed you want the rates separately by department. If that's not true, just remove mention of department from the -by()- option of the -collapse- command. Similarly, if you want sex-specific mortality rates, add sex to the -by()- option of the -collapse- command.
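
    A quick check on the day counts may be worth running between the -gen admitted_days- step and the -collapse-. The sketch below assumes that a stay should contribute floor(discharge_date) - admission_date days in total; under that assumption the monthly pieces appear to come up one day short for each month boundary crossed. For id 1 in the -dataex- example (admission_date 23364, discharge_date 23381.86) the pieces are 11 and 5 days, against 17 for the whole stay. The names total_days, stay_days, shortfall, one_per_id, months_spanned and admitted_days_alt are illustrative and not part of the original code.

    ```stata
    * Sanity check (a sketch): run after -gen admitted_days-, before -collapse-.
    * Assumption: each stay should total floor(discharge_date) - admission_date days.
    by id, sort: egen double total_days = total(admitted_days)
    gen double stay_days = floor(discharge_date) - admission_date
    gen double shortfall = stay_days - total_days

    * One row per patient: days unaccounted for, by number of months spanned
    egen byte one_per_id = tag(id)
    gen int months_spanned = end_month - start_month + 1
    tabulate months_spanned shortfall if one_per_id

    * Rows with no days at risk have undefined ln(exposure) in a Poisson model
    count if admitted_days <= 0

    * A possible adjustment under the same assumption (not from the original post):
    * let each month run up to the first day of the next month.
    gen double admitted_days_alt = ///
        min(floor(discharge_date), lastdayofmonth(mdate) + 1) ///
        - max(admission_date, firstdayofmonth(mdate))
    ```

    If the cross-tabulation shows a shortfall of 0 for every patient, the original day counts already match that convention in your data and nothing needs changing.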



    • #3
      Dear Clyde Schechter

      Thanks as always for your invaluable help.

      Yes. In addition to the raw estimates of the incidence rates, I would like to compare them, to see whether the incidence is higher in either department and whether it increases in any particular month.
      So, I would need to conduct a count regression (Poisson, negative binomial).
      Moreover, I'd like to adjust for age or any other variable.
      Therefore, I should expand the "collapse" command.

      Code:
      collapse (sum) admitted_days outcome (mean) age var var var, by(department mdate)
      xi: poisson outcome i.mdate i.department age...
      I have limited data management skills in STATA and am trying to understand your code conceptually.
      I have a question: is this script correct even for patients who have admission days spanning more than 2 months? For example, admission in January, discharge in April?

      Finally, I would like to ask for your advice, if I'm not bothering you excessively.
      To answer the question "does the incidence change between departments and months?", I would need too many estimates (considering that we would also need the interaction of 12 months*department). Do you think it would make sense to settle for raw estimates, and in a multivariate model consider time as a continuous variable (possibly using splines to capture non-linearity)?

      Thank you again for your help.

      Gianfranco



      • #4
        I have a question: is this script correct even for patients who have admission days spanning more than 2 months? For example, admission in January, discharge in April?
        Yes, it will work no matter how long or short the stay, even extending into years.

        Code:
        collapse (sum) admitted_days outcome (mean) age var var var, by(department mdate)
        xi: poisson outcome i.mdate i.department age...
        I don't think I would do it this way. Using mean age within a department and month is really not the best approach here. That's for (at least) two reasons: mortality is certainly non-linear in age, and the -poisson- model adds another layer of non-linearity into the mix. I would be inclined not to -collapse- the data and instead use the individual observations in the analysis. I would either use a spline to capture the non-linearity of the relationship to age, or I might make a categorical variable out of age, using a large enough number of narrow bins that mortality rates can be expected to be roughly constant over those age ranges. Whether this last approach is practical depends on whether your data set is large enough to support analysis with a discrete variable having a large number of levels. I would use the admitted_days variable as the -exposure()- option in the poisson regression. And since we have created repeated monthly observations for each person in the study, this becomes a multi-level model. So something like this:
        Code:
        mepoisson outcome i.mdate i.age_categories i.sex etc, exposure(admitted_days) || id:
        OR
        mepoisson outcome i.mdate age_spline_variables i.sex etc, exposure(admitted_days) || id:
        I also question the idea of doing monthly estimates even in this way. Are you looking for seasonal effects that recur annually? If so, the variable you want is not mdate, but rather a different variable: -gen month_of_year = month(mdate)-. If you are not looking for seasonality, then generating a separate effect for each month is going to produce an enormous amount of output that will be difficult to interpret sensibly: there will be noisy fluctuations unless your data set is gargantuan, and no clear picture of what is going on. You are probably better off treating mdate as a continuous variable and representing its effects with a spline, or some other method of representing non-linearity, as well.
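
        The placeholders in the commands above (age_categories, age_spline_variables) could be built along the following lines. This is only a sketch: the stub age_s, the four knots and the 5-year age bands are assumptions chosen for illustration, and the knot locations are computed on the expanded person-month data, so patients with longer stays carry more weight in them.

        ```stata
        * Sketch only; age_s*, age_categories and month_of_year are illustrative names.
        * Option 1: restricted cubic spline of age (4 knots yields age_s1-age_s3)
        mkspline age_s = age, cubic nknots(4)

        * Option 2: narrow age bands (5-year bands here; the cut points are an assumption)
        egen age_categories = cut(age), at(0(5)120)

        * Calendar month (1-12), for seasonal effects that recur annually
        gen month_of_year = month(mdate)

        * Spline version of the model, with days at risk as the exposure
        mepoisson outcome i.mdate age_s1 age_s2 age_s3 i.sex, ///
            exposure(admitted_days) || id:
        ```

        Swapping i.mdate for i.month_of_year gives the seasonal version, and replacing the age_s variables with i.age_categories gives the categorical version.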

        Note, by the way, that -xi:- is pretty much obsolete. Do learn factor-variable notation by reading -help fvvarlist- and use it instead. (There are a few old commands that do not support factor-variable notation and would require the use of -xi:-, but most of those have more modern equivalents that do support factor-variable notation. Really the core of the remaining use for -xi:- is with older user-written commands that have not been updated. So don't banish -xi- from your brain altogether, but consign it to a dusty corner somewhere.)






        • #5
          Thanks, Clyde Schechter, for the stimulating discussion.

          Using mean age within a department and month is really not the best approach here. That's for (at least) two reasons: mortality is certainly non-linear in age, and the -poisson- model adds another layer of non-linearity into the mix
          I completely agree.

          Are you looking for seasonal effects that recur annually?
          No, I'm seeking specific differences between THOSE months, or, continuously, throughout THAT year.

          You are probably better off treating mdate as a continuous variable and representing its effects with a spline, or some other method of representing non-linearity, as well.
          Thank you. It strikes me as odd to use a spline on mdate, which has 13 categories, but it may just be my limited experience with splines. Nevertheless, I appreciate and will follow your suggestion.

          So, in the end, my code will be, without collapsing data:

          Code:
          gen start_month = mofd(admission_date)
          gen end_month = mofd(discharge_date)
          expand end_month-start_month + 1
          by id, sort: gen mdate = dofm(start_month + _n - 1)
          format *month %tm
          format mdate %td
          
          gen admitted_days = min(floor(discharge_date), lastdayofmonth(mdate)) ///
              - max(admission_date, firstdayofmonth(mdate))
          by id (mdate), sort: replace outcome = 0 if _n < _N
          and then the model
          Code:
          mepoisson outcome i.mdate i.age_categories i.sex, exposure(admitted_days) || id:

          I have one last question. Utilizing a Poisson regression on individual patient data will yield the incidence rate ratio (IRR). However, in your opinion, wouldn't it be more appropriate to use melogit, since my outcome, based on individual data, is a binary outcome (0/1)?

          Gianfranco
          Last edited by Gianfranco Di Gennaro; 29 Mar 2024, 17:24.



          • #6
            I didn't understand, when I wrote #4, that you had only 13 months in your data set and that you are interested in differences among those 13 specific months. In that case, and given that you have 900 patients, you can indeed go ahead with i.mdate instead of treating month as continuous.

            Concerning your last question, in #1 you wrote "Is there a way in STATA to calculate month-specific incidence rates?" Logistic regression cannot give you an incidence rate. It can give you an incidence probability, but that will fail to account for the fact that an admission may cover fractions of its first and final months. And -melogit- does not even have an -exposure()- option to put in such a variable. (And there is no sensible way it could have that.) For a proper incidence rate, you want a Poisson model.
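
            To spell out why the exposure matters, the model fitted by -mepoisson- with -exposure(admitted_days)- and a random intercept for id can be written as below. The symbols are notation introduced here for illustration: $Y_{ij}$ is the outcome for patient $i$ in month $j$, $t_{ij}$ is admitted_days, $x_{ij}$ holds the covariates, and $u_i$ is the patient-level random intercept.

            ```latex
            \log \mathrm{E}\left[ Y_{ij} \mid x_{ij}, u_i \right]
              = \log t_{ij} + \beta_0 + x_{ij}^{\top}\beta + u_i
            \quad\Longrightarrow\quad
            \lambda_{ij} = \frac{\mathrm{E}\left[ Y_{ij} \mid x_{ij}, u_i \right]}{t_{ij}}
              = \exp\left( \beta_0 + x_{ij}^{\top}\beta + u_i \right)
            ```

            Because $\log t_{ij}$ enters with its coefficient fixed at 1, $\lambda_{ij}$ is a rate per day at risk and each $\exp(\beta_k)$ is an incidence rate ratio. A logit link models a probability per observation and has no comparable place for $t_{ij}$.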



            • #7
              Dear Clyde Schechter, I ran your code on my data and it worked perfectly.
              Thanks again for your support.
              Gianfranco



              • #8
                Dear Clyde Schechter, I hope you're well.
                I wrote a post about a situation that is an extension of the problem addressed here.

                Can I ask your opinion on it? Obviously only if you can; I would hate to bother you.
                I have tried several strategies, but none of them convinces me, and I would like to do everything possible to avoid gross errors.
                The post is at the link:
                https://www.statalist.org/forums/forum/general-stata-discussion/general/1773241-estimating-hospital-mortality-with-recurring-patients-and-duplicates
