  • Interpreting Poisson regression coefficients.

    Dear All,

    Recently I asked for help interpreting regression coefficients from a high-dimensional fixed-effects Poisson regression estimated with the ppmlhdfe command from SSC, and I realize I should have been clearer.

    To clarify, I am regressing monthly zip-code-level prescription drug claims on the proportional change in relative annual wellness visits (among other predictors). The "relative annual wellness visits" metric represents the percentage deviation in the rate of wellness visits from a baseline, calculated as:

    relative annual wellness visits = (rate of wellness visits at time t / rate of wellness visits at baseline) - 1

    This measure ranges from -1 (indicating a 100% decline from baseline) to 0.5 (indicating a 50% increase from baseline).
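
    For concreteness, a minimal sketch in Stata of how such a measure can be built from the formula above (awv_rate and awv_rate_base are placeholder names, not the variables in my actual data):

    Code:
    * sketch only: awv_rate and awv_rate_base are placeholder variable names
    generate rel_awv = (awv_rate / awv_rate_base) - 1   // proportional change from baseline
    summarize rel_awv                                   // in my data this runs from about -1 to 0.5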

    In my regression output, the variable _DDdd1_3_all reflects this metric. Here’s the output for reference (from my older post):

    Code:
     
     . ppmlhdfe rx_new_cum  _DDdd1_3_all unemp_rate deaths if keepc == 1 ///
     >    [w=cellsize], /*eform*/ absorb(i.monthnum i.cohort2 i.zipcode) cluster(zipcohort)
     (sampling weights assumed)
     Iteration 1:   deviance = 5.0229e+05  eps = .         iters = 3    tol = 1.0e-04  min(eta) =  -4.10  P
     Iteration 2:   deviance = 4.5978e+05  eps = 9.25e-02  iters = 3    tol = 1.0e-04  min(eta) =  -5.50
     Iteration 3:   deviance = 4.5719e+05  eps = 5.67e-03  iters = 2    tol = 1.0e-04  min(eta) =  -6.82
     Iteration 4:   deviance = 4.5711e+05  eps = 1.82e-04  iters = 2    tol = 1.0e-04  min(eta) =  -7.77
     Iteration 5:   deviance = 4.5710e+05  eps = 7.26e-06  iters = 2    tol = 1.0e-04  min(eta) =  -8.43
     Iteration 6:   deviance = 4.5710e+05  eps = 5.48e-07  iters = 2    tol = 1.0e-05  min(eta) =  -8.75
     Iteration 7:   deviance = 4.5710e+05  eps = 1.22e-08  iters = 2    tol = 1.0e-07  min(eta) =  -8.82   S
     Iteration 8:   deviance = 4.5710e+05  eps = 1.29e-11  iters = 2    tol = 1.0e-09  min(eta) =  -8.83   S O
     ------------------------------------------------------------------------------------------------------------
     (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
     Converged in 8 iterations and 18 HDFE sub-iterations (tol = 1.0e-08)

     HDFE PPML regression                              No. of obs      =    738,121
     Absorbing 3 HDFE groups                           Residual df     =     26,361
     Statistics robust to heteroskedasticity           Wald chi2(3)    =     295.83
     Deviance             =  457103.9504               Prob > chi2     =     0.0000
     Log pseudolikelihood = -3763869.632               Pseudo R2       =     0.0919

     Number of clusters (zipcohort) =    26,362
                               (Std. err. adjusted for 26,362 clusters in zipcohort)
     ---------------------------------------------------------------------------------
                     |               Robust
          rx_new_cum | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
     ----------------+----------------------------------------------------------------
        _DDdd1_3_all |   .0144727   .0038331     3.78   0.000       .00696    .0219853
          unemp_rate |   .0035381   .0017633     2.01   0.045      .000082    .0069942
              deaths |   1.07e-06   1.20e-06     0.89   0.371    -1.28e-06    3.42e-06
               _cons |  -2.417783   .0081682  -296.00   0.000    -2.433793   -2.401774
     ---------------------------------------------------------------------------------
    I believe the coefficient of 0.0144727 on _DDdd1_3_all suggests that a 0.1 increase (i.e., 10 percentage points) in the relative annual wellness visits measure corresponds to roughly a 0.145% increase in prescription drug claims, calculated as (exp(0.0144727*0.1) - 1)*100.
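
    In Stata terms, that back-of-the-envelope calculation is simply:

    Code:
    * percent change implied by a 0.1 increase in the proportional-change measure
    display (exp(0.0144727*0.1) - 1)*100    // roughly 0.145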

    What puzzles me is how small a percent change this coefficient implies for a "proportional change" variable. I may be misinterpreting it and would greatly appreciate any insights.

    Thank you!

    Sumedha

  • #2
    Here is the code and output again (sorry the copy and paste earlier did not work):
    Code:
    . ppmlhdfe rx_new_cum  _DDdd1_1_all unemp_rate deaths if keepc == 1 ///
    >    [w=cellsize], /*eform*/ absorb(i.monthnum i.cohort2 i.zipcode) cluster(zipc
    > ohort)
    (sampling weights assumed)
    Iteration 1:   deviance = 5.0229e+05  eps = .         iters = 3    tol = 1.0e-04
    >   min(eta) =  -4.10  P   
    Iteration 2:   deviance = 4.5978e+05  eps = 9.25e-02  iters = 3    tol = 1.0e-04
    >   min(eta) =  -5.50      
    Iteration 3:   deviance = 4.5719e+05  eps = 5.67e-03  iters = 2    tol = 1.0e-04
    >   min(eta) =  -6.82      
    Iteration 4:   deviance = 4.5711e+05  eps = 1.82e-04  iters = 2    tol = 1.0e-04
    >   min(eta) =  -7.77      
    Iteration 5:   deviance = 4.5710e+05  eps = 7.26e-06  iters = 2    tol = 1.0e-04
    >   min(eta) =  -8.43      
    Iteration 6:   deviance = 4.5710e+05  eps = 5.48e-07  iters = 2    tol = 1.0e-05
    >   min(eta) =  -8.75      
    Iteration 7:   deviance = 4.5710e+05  eps = 1.22e-08  iters = 2    tol = 1.0e-07
    >   min(eta) =  -8.82   S  
    Iteration 8:   deviance = 4.5710e+05  eps = 1.29e-11  iters = 2    tol = 1.0e-09
    >   min(eta) =  -8.83   S O
    --------------------------------------------------------------------------------
    > ----------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon b
    > elow tolerance)
    Converged in 8 iterations and 18 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =    738,121
    Absorbing 3 HDFE groups                           Residual df     =     26,361
    Statistics robust to heteroskedasticity           Wald chi2(3)    =     295.83
    Deviance             =  457103.9504               Prob > chi2     =     0.0000
    Log pseudolikelihood = -3763869.632               Pseudo R2       =     0.0919
    
    Number of clusters (zipcohort)=    26,362
                              (Std. err. adjusted for 26,362 clusters in zipcohort)
    -------------------------------------------------------------------------------
                  |               Robust
       rx_new_cum | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
      _DDdd1_1_all |  -.0021002    .004252    -0.49   0.621     -.010434    .0062336
       unemp_rate |   .0035381   .0017633     2.01   0.045      .000082    .0069942
           deaths |   1.07e-06   1.20e-06     0.89   0.371    -1.28e-06    3.42e-06
            _cons |  -2.417783   .0081682  -296.00   0.000    -2.433793   -2.401774
    -------------------------------------------------------------------------------



    • #3
      I think your variable of interest (relative annual wellness visits) is a proportion, varying from -1 to 1, while the variable in the post you referred to (https://www.statalist.org/forums/for...tages-and-logs) is a percent, varying from 0 to 100. Professor Wooldridge also clarified this in his reply to that post (not a proportion, but a percent). So the formula you plugged in to interpret the marginal effect of X may not apply here. Maybe you can re-scale your proportion variable to a percent by multiplying it by 100 and re-run the regression?
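
      Something along these lines, perhaps (just a sketch of what I mean, re-using your own command; _DDdd1_3_all_pct is only an illustrative name):

      Code:
      * sketch: rescale the proportion to a percent and re-estimate
      generate _DDdd1_3_all_pct = 100 * _DDdd1_3_all
      ppmlhdfe rx_new_cum _DDdd1_3_all_pct unemp_rate deaths if keepc == 1 ///
          [w=cellsize], absorb(i.monthnum i.cohort2 i.zipcode) cluster(zipcohort)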


      One more comment about Poisson regression in general: I am not sure you can interpret a Poisson regression coefficient as a uniform effect across all values of X, the way you would in OLS. I believe the coefficients from a Poisson regression can NOT be interpreted as a uniform effect across all values of the right-hand side (RHS): a one-unit change in X changes your fitted value (yhat) by yhat * (exp(coefficient) - 1), so the change in y varies with yhat, which in turn varies with the RHS values.
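
      To spell out where that formula comes from (just the usual Poisson conditional mean, nothing specific to your model):

      \[
      \hat{y} = \exp(x'\beta)
      \;\Longrightarrow\;
      \frac{\hat{y}(x_j + 1)}{\hat{y}(x_j)} = e^{\beta_j}
      \;\Longrightarrow\;
      \Delta\hat{y} = \hat{y}\,\big(e^{\beta_j} - 1\big)
      \]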


      Here's an example.

      Code:
      . use "https://stats.idre.ucla.edu/stat/stata/notes/lahigh", clear
      
      . generate female = (gender == 1)
      
      . ppmlhdfe        daysabs mathnce langnce female // you get same coefficient using "poisson" command
      Iteration 1:   deviance = 2.2884e+03  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -1.04  P  
      Iteration 2:   deviance = 2.2349e+03  eps = 2.39e-02  iters = 1    tol = 1.0e-04  min(eta) =  -1.12      
      Iteration 3:   deviance = 2.2345e+03  eps = 1.61e-04  iters = 1    tol = 1.0e-04  min(eta) =  -1.12      
      Iteration 4:   deviance = 2.2345e+03  eps = 1.09e-08  iters = 1    tol = 1.0e-04  min(eta) =  -1.12      
      Iteration 5:   deviance = 2.2345e+03  eps = 0.00e+00  iters = 1    tol = 1.0e-05  min(eta) =  -1.12   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
      
      PPML regression                                   No. of obs      =        316
                                                        Residual df     =        312
                                                        Wald chi2(3)    =      27.62
      Deviance             =  2234.546183               Prob > chi2     =     0.0000
      Log pseudolikelihood = -1547.970944               Pseudo R2       =     0.0536
      ------------------------------------------------------------------------------
                   |               Robust
           daysabs | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           mathnce |  -.0035232   .0076373    -0.46   0.645     -.018492    .0114455
           langnce |  -.0121521   .0052951    -2.29   0.022    -.0225304   -.0017739
            female |   .4009209   .1395915     2.87   0.004     .1273266    .6745153
             _cons |   2.286745   .2380629     9.61   0.000      1.82015     2.75334
      ------------------------------------------------------------------------------
      
      . scalar  beta_mathnce    =       _b[mathnce]
      
      . scalar  list    beta_mathnce
      beta_mathnce = -.00352322

      And here's the marginal effect of the mathnce variable, with all covariates held at their mean values, computed using the "margins, dydx(mathnce) atmeans" command:

      Code:
      . margins, dydx(mathnce) atmeans  
      
      Conditional marginal effects                               Number of obs = 316
      Model VCE: Robust
      
      Expression: Predicted mean of daysabs, predict()
      dy/dx wrt:  mathnce
      At: mathnce = 48.75101 (mean)
          langnce = 50.06379 (mean)
          female  = .5126582 (mean)
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           mathnce |  -.0195214   .0420781    -0.46   0.643     -.101993    .0629501
      ------------------------------------------------------------------------------
      Let me verify this by adding an observation in which all covariates are set to the mean values computed above, and then obtaining its fitted value.

      Code:
      . set obs 317
      Number of observations (_N) was 316, now 317.
      
      . replace mathnce=48.75101 in 317
      (1 real change made)
      
      . replace langnce=50.06379 in 317
      (1 real change made)
      
      . replace female=.5126582 in 317
      (1 real change made)
      
      . predict daysabs_hat // fitted value of y
      (option mu assumed; predicted mean of depvar)
      
      . summ    daysabs_hat in 317
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
       daysabs_hat |          1    5.540798           .   5.540798   5.540798
      
      . di      5.540798*(exp(beta_mathnce)-1)
      -.0194871 // nearly identical to the marginal effect computed at mean values (-.0195214).

      If you are interested in the average effect of X across all values of the covariates, I think you can use "margins, dydx(mathnce)" without any further options.
      But somehow that gives a very different result from the estimate I get using the formula you wrote, [exp(b1)-1]*100. Not sure why, though.

      Code:
      . margins, dydx(mathnce)
      
      Average marginal effects                                   Number of obs = 316
      Model VCE: Robust
      
      Expression: Predicted mean of daysabs, predict()
      dy/dx wrt:  mathnce
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           mathnce |  -.0204704   .0442754    -0.46   0.644    -.1072486    .0663079
      ------------------------------------------------------------------------------
      
      
      . di (exp(-.00352320)-1)*100
      -.35170008
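
      My guess (which I have not verified) is that the two numbers are simply in different units: [exp(b1)-1]*100 is a percent change in the fitted count, while dy/dx from margins is a change in days absent. If the at-means logic above carries over, a sketch like this should roughly reconcile them:

      Code:
      * sketch only: average the yhat*(exp(b)-1) adjustment over the estimation sample
      summarize daysabs_hat in 1/316, meanonly
      display r(mean)*(exp(beta_mathnce) - 1)   // if that logic holds, close to the -.0204704 above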
