  • Poisson regression glm vs poisson

    Hello Stata users,


    I want to see whether more experienced doctors have fewer complications in their patients.




    The plan is to use Poisson regression:

    (1)


    Code:
    poisson complication vol_grp, irr exposure(number_cases)



    However, someone here on the Stata forum recommended this:




    (2)

    Code:
    glm complication i.doctor_id, family(poisson) link(log) exposure(number_cases) vce(robust)

    glm and poisson both test the same way. From what I understand, glm gives more flexibility in terms of the parameters to investigate.



    However, my question is: should I include doctor_id in the Poisson model? It is a categorical ID variable with 400 doctors, and there are around 4,000 patients.


    That means the Poisson model will give me values for each doctor, and 400 is a lot.


    My thought was to group them into high- or low-volume doctors in a categorical variable, as seen in (1).

    What are your suggestions?

  • #2
    From a statistical perspective, -glm, link(log) family(poisson)- and -poisson- are exactly the same analysis. There may be some differences in what additional kinds of weights and options are permitted, and if that matters for what you are doing, you should pick the one that allows what you need to use.
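
    A quick way to confirm the equivalence on your own data is to fit both commands with the same predictors and exposure and compare the estimates side by side. This is only a sketch: it assumes the variable names from post #1 (complication, vol_grp, number_cases), and the stored-estimate names m_poisson and m_glm are arbitrary labels chosen here for illustration.

    ```stata
    * Fit the same model with both commands (variable names taken from post #1)
    poisson complication vol_grp, exposure(number_cases)
    estimates store m_poisson

    glm complication vol_grp, family(poisson) link(log) exposure(number_cases)
    estimates store m_glm

    * Put coefficients and standard errors next to each other
    estimates table m_poisson m_glm, b(%9.6f) se(%9.6f)
    ```

    The two columns should agree up to numerical precision. If you want robust standard errors, add the same vce() option to both commands before comparing.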

    As for the modeling issues you raise, I think both i.doctor_id and vol_grp are, at best, poor proxies for the effects of volume. If you use doctor_id, you are indirectly capturing the experience of the doctor (volume), but you are also capturing other attributes of the doctor that may affect complication rates independently of volume, such as the nature and extent of his/her previous education and training, manual dexterity, age, and so on. And there is no way you can separate out all of those contributions to really pin down the role of volume. When you use a dichotomized version of volume, you are at least working almost directly with volume, but now you are discarding useful variation in volume and replacing it with noise. In particular, if you use, say, 10 prior procedures as the cutoff between high and low volume, you are saying that a doctor with 9 procedures is the equivalent of one with no prior procedures at all, but is radically different from a doctor who has done 11 procedures. If that is true, then go for it. But that seems utterly implausible to me; the world doesn't work that way. For a more extensive takedown of creating dichotomous variables out of continuous and count variables, see Frank Harrell's https://www.fharrell.com/post/errmed/#catg.

    I would create a variable showing the experience of each doctor (number of similar cases previously performed) and use that variable directly in the model.
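
    As a rough sketch of what that could look like: the code below assumes you have (or can construct) a variable holding each doctor's number of similar cases previously performed. The name prior_cases is hypothetical and does not appear in your post, and the values passed to margins are arbitrary illustration points, not recommended cutoffs.

    ```stata
    * prior_cases is a HYPOTHETICAL variable: number of similar cases
    * the doctor had performed before the cases being analyzed
    poisson complication c.prior_cases, irr exposure(number_cases) vce(robust)

    * Predicted complication rates at a few illustrative experience levels
    * (the values 0, 10, 50, 100 are arbitrary examples)
    margins, at(prior_cases=(0 10 50 100)) predict(ir)
    ```

    With the irr option, the reported ratio for prior_cases is the multiplicative change in the complication rate per additional prior case, so no doctor is forced into a "high" or "low" bin.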
