  • Interpreting modified Poisson regression

    Hi All,

    I'm hoping to check that my interpretation of modified Poisson regression models in Stata 17 makes sense.

    I have a set of binary outcomes and several continuous and categorical variables from a cross-sectional study.

    I have used Poisson regression with robust variance instead of log-binomial regression, because of the low prevalence and because age is a continuous covariate.

    My data look like this:
    Code:
    clear
    input long(sex VitD_Summer) byte AGE long(FP educ)
    2 1 71 2 2
    1 1 76 2 2
    2 1 81 2 0
    2 1 62 3 1
    2 0 47 2 1
    2 1 55 2 0
    1 0 36 2 1
    1 0 53 2 1
    1 1 26 2 2
    2 1 58 3 1
    2 0 65 2 2
    1 . 60 3 0
    1 0 36 2 1
    1 1 19 2 0
    1 0 37 2 2
    1 1 19 2 1
    1 1 51 3 2
    2 0 55 2 1
    2 1 65 2 0
    1 1 48 2 1
    1 1 34 2 2
    2 1 76 2 0
    2 1 74 2 0
    1 1 62 1 2
    1 0 37 2 0
    2 0 63 1 1
    1 0 80 1 1
    1 1 37 3 2
    1 1 83 2 1
    2 1 75 1 2
    1 1 31 2 2
    1 1 61 2 1
    1 0 75 1 2
    1 . 43 2 2
    2 1 66 1 1
    2 0 67 2 1
    1 0 56 2 1
    1 0 54 2 1
    1 1 71 2 1
    2 1 60 2 0
    2 1 73 2 0
    2 1 38 2 2
    1 0 64 2 1
    1 1 55 3 2
    2 1 46 2 1
    1 0 36 2 1
    1 1 50 4 1
    2 0 44 2 1
    2 . 73 2 0
    1 0 72 2 1
    1 1 71 2 1
    2 1 69 2 0
    1 0 82 2 1
    2 1 64 3 0
    2 1 78 1 0
    2 1 82 1 0
    1 1 19 2 2
    1 1 37 1 1
    2 1 71 2 1
    1 1 20 1 0
    1 0 20 2 1
    2 . 82 2 1
    2 1 81 2 1
    1 0 77 2 0
    2 0 62 1 1
    1 . 51 2 0
    1 1 54 3 0
    1 1 76 1 0
    1 1 84 1 0
    2 1 40 2 1
    1 0 64 1 1
    1 1 75 2 0
    1 1 46 2 1
    1 0 62 1 1
    2 1 38 2 0
    1 1 31 2 1
    2 1 59 2 1
    1 0 78 1 0
    1 1 72 2 1
    1 1 48 2 1
    1 1 45 2 2
    2 1 82 2 0
    1 1 18 2 0
    1 1 69 2 1
    1 1 39 2 2
    1 0 18 2 0
    1 1 28 2 2
    2 1 19 2 0
    2 1 28 3 0
    1 1 76 2 0
    1 1 54 2 2
    2 1 67 2 2
    2 1 88 1 0
    1 0 71 2 1
    1 0 40 2 2
    1 0 29 2 2
    2 1 60 3 0
    2 1 62 2 0
    1 1 79 1 1
    1 1 62 2 0
    end

    label values sex sex
    label def sex 1 "FEMALE", modify
    label def sex 2 "MALE", modify

    label values VitD_Summer VitD_Summer
    label def VitD_Summer 0 "Correct <10mins", modify
    label def VitD_Summer 1 "Over-estimated", modify

    label values FP FP
    label def FP 1 "FP I", modify
    label def FP 2 "FP II", modify
    label def FP 3 "FP III", modify
    label def FP 4 "FP IV", modify

    label values educ educ
    label def educ 0 "No post-school qual", modify
    label def educ 1 "VET", modify
    label def educ 2 "Higher Education", modify
    I then create a two-way table:
    Code:
     tabulate sex VitD_Summer, row
    To determine whether being female is associated with correctly identifying vitamin D requirements, I then fit modified Poisson regression models step-wise. I also rerun one model with the base level of sex switched (ib1.sex).
    Code:
    glm VitD_Summer ib2.sex, fam(poisson) link(log) vce(robust) eform
    glm VitD_Summer ib2.sex c.AGE, fam(poisson) link(log) vce(robust) eform
    glm VitD_Summer ib2.sex c.AGE i.FP, fam(poisson) link(log) vce(robust) eform
    glm VitD_Summer ib2.sex c.AGE i.FP i.educ, fam(poisson) link(log) vce(robust) eform
    glm VitD_Summer ib1.sex c.AGE i.FP, fam(poisson) link(log) vce(robust) eform
    Based on the results above, I would infer that males are more likely than females to over-estimate the vitamin D requirements in summer.
    I think I can also infer that skin types 3 (p = 0.032) and 4 (p < 0.001) and VET education (p = 0.007) affect whether females are likely to over-estimate the vitamin D requirements in summer.
    Does this make sense?

    Where I am a little confused is that when I run my model, the first level of each categorical variable is 'dropped' or not shown in the output unless I manually specify it. Is there a reason for this? Or is there a way to get both levels presented using only one line of code?

    For example, I have to run:
    Code:
     glm VitD_Summer ib2.sex, fam(poisson) link(log) vce(robust) eform
    instead of the following; otherwise I get the result for males automatically:
    Code:
     glm VitD_Summer i.sex, fam(poisson) link(log) vce(robust) eform
    Similarly, when running the Poisson regressions above, the results exclude the first level of each categorical variable.
    For example, education has three levels: no post-school qualification, VET, or higher education.
    But the model only shows VET and higher education.

    Thanks in advance and sorry if this is very basic!
    I hope I used the dataex command correctly!

  • #2
    I don’t see any reason to use Poisson regression with an exponential mean. You have a binary outcome, which means it has a Bernoulli distribution. The only thing left is to choose a functional form for the response probability. Might as well use logit. The exponential mean doesn’t make a lot of sense.
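
    As a rough sketch of that suggestion, assuming the variable names and coding from #1 (an illustration only, not output from the actual data), the logit version of the full model could be written as:
    Code:
    * logit for the binary outcome; the or option reports odds ratios
    logit VitD_Summer i.sex c.AGE i.FP i.educ, or
    * adjusted predicted probability of over-estimating, by sex
    margins sex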


    • #3
      It seems to me the question in #1 can be boiled down to why one level of a factor variable is omitted. The answer is that one level must be omitted when there is a constant in the model, to make the model identifiable. See the entry at UCLA's stats help page about "dummy coding".
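
      A minimal sketch of how the omitted level can be displayed or changed, assuming the variables from #1. The base level is still not estimated; these options only show it in the table, marked as the base, or change which level plays that role:
      Code:
      * list the base level of every factor variable in the coefficient table
      glm VitD_Summer i.sex c.AGE i.FP i.educ, fam(poisson) link(log) vce(robust) eform allbaselevels
      * pick the base level for one command with the ib#. prefix
      glm VitD_Summer ib2.sex, fam(poisson) link(log) vce(robust) eform baselevels
      * or set it for the rest of the session, so that plain i.sex uses MALE as base
      fvset base 2 sex
      glm VitD_Summer i.sex, fam(poisson) link(log) vce(robust) eform baselevels
      With a two-level variable such as sex, the coefficient for the other level is just the reciprocal comparison, so a single fit carries the same information whichever level is the base.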

      Jeff Wooldridge, the model referred to is commonly used in epidemiology and biostatistics and was introduced by Zou (2004). Using the sandwich estimator removes the reliance on the Poisson mean-variance relationship. As a result, the model can be fitted to binary outcomes with correct variance estimates, often as a means of estimating risk ratios directly instead of odds ratios.

      Of course, there are several models that can model the same kind of relationship (see e.g., Cummings 2009).
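
      For illustration only, and assuming the variables from #1, the modified Poisson fit and a log-binomial counterpart could be written as below. The poisson call with vce(robust) should give the same point estimates as the glm call in #1. The log-binomial model may fail to converge, which is one common reason for preferring the modified Poisson approach.
      Code:
      * modified Poisson: robust (sandwich) standard errors, exponentiated coefficients
      poisson VitD_Summer ib2.sex c.AGE i.FP i.educ, vce(robust) irr
      * log-binomial alternative: also estimates risk ratios, but may not converge
      glm VitD_Summer ib2.sex c.AGE i.FP i.educ, fam(binomial) link(log) eform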

      Zou, G. (2004). A modified Poisson regression approach to prospective studies with binary data. American Journal of Epidemiology, 159(7), 702-706. https://doi.org/10.1093/aje/kwh090 (PMID: 15033648)

      Cummings, P. (2009). Methods for Estimating Adjusted Risk Ratios. The Stata Journal, 9(2), 175-196. https://doi.org/10.1177/1536867X0900900201


      • #4
        Thanks Leonardo. That makes sense. I know about Poisson regression's robustness and how to obtain proper standard errors. And using the (incorrect) exponential functional form is not much different from using a linear probability model -- and at least there are no negative estimated probabilities. I guess the idea is to have a constant risk ratio, rather than averaging across the distribution of the covariates. Not sure why that's necessary, but I think I get it.
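
        To make the contrast concrete, here is a hedged sketch of the averaging approach, assuming the variables from #1 (sex coded 1 = FEMALE, 2 = MALE). The label rr_male_vs_female is just an illustrative name. It fits a logit, averages the predicted probabilities over the covariate distribution for each sex, and takes their ratio:
        Code:
        * logit model for the binary outcome
        logit VitD_Summer i.sex c.AGE i.FP i.educ
        * average predicted probability for each sex; post stores them as estimates
        margins sex, post
        * marginal risk ratio, male vs female, with a delta-method standard error
        nlcom (rr_male_vs_female: _b[2.sex] / _b[1.sex])
        Unlike the modified Poisson coefficient, this ratio is not forced to be constant across covariate values; it summarises the sample as observed.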
