  • Interpreting Poisson regression coefficients.

    Dear All,

    Recently I asked for help interpreting regression coefficients from a high-dimensional fixed-effects Poisson regression estimated with the ppmlhdfe command from SSC, and I realize I should have been clearer.

    To clarify, I am regressing monthly zip-code-level prescription drug claims on the proportional change in relative annual wellness visits (among other predictors). The "relative annual wellness visits" metric represents the percentage deviation in the rate of wellness visits from a baseline, calculated as:

    relative annual wellness visits = (rate of wellness visits at time t / rate of wellness visits at baseline) - 1

    This measure ranges from -1 (indicating a 100% decline from baseline) to 0.5 (indicating a 50% increase from baseline).
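
    For concreteness, a minimal sketch in Stata of how such a measure can be built from the formula above (awv_rate and awv_rate_base are placeholder names, not the variables in my actual data):

    Code:
    * sketch only: awv_rate and awv_rate_base are placeholder variable names
    generate rel_awv = (awv_rate / awv_rate_base) - 1   // proportional change from baseline
    summarize rel_awv                                   // in my data this runs from about -1 to 0.5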

    In my regression output, the variable _DDdd1_3_all reflects this metric. Here’s the output for reference (from my older post):

    Code:
     
     . ppmlhdfe rx_new_cum  _DDdd1_3_all unemp_rate deaths if keepc == 1 ///
     >    [w=cellsize], /*eform*/ absorb(i.monthnum i.cohort2 i.zipcode) cluster(zipcohort)
     (sampling weights assumed)
     Iteration 1:   deviance = 5.0229e+05  eps = .         iters = 3    tol = 1.0e-04  min(eta) =  -4.10  P
     Iteration 2:   deviance = 4.5978e+05  eps = 9.25e-02  iters = 3    tol = 1.0e-04  min(eta) =  -5.50
     Iteration 3:   deviance = 4.5719e+05  eps = 5.67e-03  iters = 2    tol = 1.0e-04  min(eta) =  -6.82
     Iteration 4:   deviance = 4.5711e+05  eps = 1.82e-04  iters = 2    tol = 1.0e-04  min(eta) =  -7.77
     Iteration 5:   deviance = 4.5710e+05  eps = 7.26e-06  iters = 2    tol = 1.0e-04  min(eta) =  -8.43
     Iteration 6:   deviance = 4.5710e+05  eps = 5.48e-07  iters = 2    tol = 1.0e-05  min(eta) =  -8.75
     Iteration 7:   deviance = 4.5710e+05  eps = 1.22e-08  iters = 2    tol = 1.0e-07  min(eta) =  -8.82   S
     Iteration 8:   deviance = 4.5710e+05  eps = 1.29e-11  iters = 2    tol = 1.0e-09  min(eta) =  -8.83   S O
     ------------------------------------------------------------------------------------------------------------
     (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
     Converged in 8 iterations and 18 HDFE sub-iterations (tol = 1.0e-08)

     HDFE PPML regression                              No. of obs      =    738,121
     Absorbing 3 HDFE groups                           Residual df     =     26,361
     Statistics robust to heteroskedasticity           Wald chi2(3)    =     295.83
     Deviance             =  457103.9504               Prob > chi2     =     0.0000
     Log pseudolikelihood = -3763869.632               Pseudo R2       =     0.0919

     Number of clusters (zipcohort) =    26,362
                               (Std. err. adjusted for 26,362 clusters in zipcohort)
     ---------------------------------------------------------------------------------
                     |               Robust
          rx_new_cum | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
     ----------------+----------------------------------------------------------------
        _DDdd1_3_all |   .0144727   .0038331     3.78   0.000       .00696    .0219853
          unemp_rate |   .0035381   .0017633     2.01   0.045      .000082    .0069942
              deaths |   1.07e-06   1.20e-06     0.89   0.371    -1.28e-06    3.42e-06
               _cons |  -2.417783   .0081682  -296.00   0.000    -2.433793   -2.401774
     ---------------------------------------------------------------------------------
    I believe the coefficient of 0.0144727 on _DDdd1_3_all suggests that a 0.1 increase (i.e., 10 percentage points) in the relative annual wellness visits measure corresponds to roughly a 0.145% increase in prescription drug claims, calculated as (exp(0.0144727*0.1) - 1)*100.
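
    In Stata terms, that back-of-the-envelope calculation is simply:

    Code:
    * percent change implied by a 0.1 increase in the proportional-change measure
    display (exp(0.0144727*0.1) - 1)*100    // roughly 0.145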

    What puzzles me is how small a percent change this coefficient implies for a "proportional change" variable. I may be misinterpreting it and would greatly appreciate any insights.

    Thank you!

    Sumedha

  • #2
    Here is the code and output again (sorry the copy and paste earlier did not work):
    Code:
    . ppmlhdfe rx_new_cum  _DDdd1_1_all unemp_rate deaths if keepc == 1 ///
    >    [w=cellsize], /*eform*/ absorb(i.monthnum i.cohort2 i.zipcode) cluster(zipc
    > ohort)
    (sampling weights assumed)
    Iteration 1:   deviance = 5.0229e+05  eps = .         iters = 3    tol = 1.0e-04
    >   min(eta) =  -4.10  P   
    Iteration 2:   deviance = 4.5978e+05  eps = 9.25e-02  iters = 3    tol = 1.0e-04
    >   min(eta) =  -5.50      
    Iteration 3:   deviance = 4.5719e+05  eps = 5.67e-03  iters = 2    tol = 1.0e-04
    >   min(eta) =  -6.82      
    Iteration 4:   deviance = 4.5711e+05  eps = 1.82e-04  iters = 2    tol = 1.0e-04
    >   min(eta) =  -7.77      
    Iteration 5:   deviance = 4.5710e+05  eps = 7.26e-06  iters = 2    tol = 1.0e-04
    >   min(eta) =  -8.43      
    Iteration 6:   deviance = 4.5710e+05  eps = 5.48e-07  iters = 2    tol = 1.0e-05
    >   min(eta) =  -8.75      
    Iteration 7:   deviance = 4.5710e+05  eps = 1.22e-08  iters = 2    tol = 1.0e-07
    >   min(eta) =  -8.82   S  
    Iteration 8:   deviance = 4.5710e+05  eps = 1.29e-11  iters = 2    tol = 1.0e-09
    >   min(eta) =  -8.83   S O
    --------------------------------------------------------------------------------
    > ----------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon b
    > elow tolerance)
    Converged in 8 iterations and 18 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =    738,121
    Absorbing 3 HDFE groups                           Residual df     =     26,361
    Statistics robust to heteroskedasticity           Wald chi2(3)    =     295.83
    Deviance             =  457103.9504               Prob > chi2     =     0.0000
    Log pseudolikelihood = -3763869.632               Pseudo R2       =     0.0919
    
    Number of clusters (zipcohort)=    26,362
                              (Std. err. adjusted for 26,362 clusters in zipcohort)
    -------------------------------------------------------------------------------
                  |               Robust
       rx_new_cum | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
      _DDdd1_1_all |  -.0021002    .004252    -0.49   0.621     -.010434    .0062336
       unemp_rate |   .0035381   .0017633     2.01   0.045      .000082    .0069942
           deaths |   1.07e-06   1.20e-06     0.89   0.371    -1.28e-06    3.42e-06
            _cons |  -2.417783   .0081682  -296.00   0.000    -2.433793   -2.401774
    -------------------------------------------------------------------------------



    • #3
      I think your variable of interest (relative annual wellness visits) is a proportion, varying from -1 to 1, while the variable in the post you referred to (https://www.statalist.org/forums/for...tages-and-logs) is a percent, varying from 0 to 100. Professor Wooldridge also clarified this in his reply to that post (not a proportion, but a percent). So the formula you plugged in to interpret the marginal effect of X may not apply here. Maybe you can re-scale your proportion variable to a percent by multiplying it by 100 and re-run the regression?
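
      Something along these lines, perhaps (just a sketch of what I mean, re-using your own command; _DDdd1_3_all_pct is only an illustrative name):

      Code:
      * sketch: rescale the proportion to a percent and re-estimate
      generate _DDdd1_3_all_pct = 100 * _DDdd1_3_all
      ppmlhdfe rx_new_cum _DDdd1_3_all_pct unemp_rate deaths if keepc == 1 ///
          [w=cellsize], absorb(i.monthnum i.cohort2 i.zipcode) cluster(zipcohort)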


      One more comment about Poisson regression in general: I am not sure you can interpret a Poisson regression coefficient as a uniform effect across all values of X, the way you would in OLS. I believe the coefficients from a Poisson regression can NOT be interpreted as a uniform effect across all values of the right-hand side (RHS): a one-unit change in X changes your fitted value (yhat) by yhat * (exp(coefficient) - 1), so the change in y varies with yhat, which in turn varies with the RHS values.
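
      To spell out where that formula comes from (just the usual Poisson conditional mean, nothing specific to your model):

      \[
      \hat{y} = \exp(x'\beta)
      \;\Longrightarrow\;
      \frac{\hat{y}(x_j + 1)}{\hat{y}(x_j)} = e^{\beta_j}
      \;\Longrightarrow\;
      \Delta\hat{y} = \hat{y}\,\big(e^{\beta_j} - 1\big)
      \]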


      Here's an example.

      Code:
      . use "https://stats.idre.ucla.edu/stat/stata/notes/lahigh", clear
      
      . generate female = (gender == 1)
      
      . ppmlhdfe        daysabs mathnce langnce female // you get same coefficient using "poisson" command
      Iteration 1:   deviance = 2.2884e+03  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -1.04  P  
      Iteration 2:   deviance = 2.2349e+03  eps = 2.39e-02  iters = 1    tol = 1.0e-04  min(eta) =  -1.12      
      Iteration 3:   deviance = 2.2345e+03  eps = 1.61e-04  iters = 1    tol = 1.0e-04  min(eta) =  -1.12      
      Iteration 4:   deviance = 2.2345e+03  eps = 1.09e-08  iters = 1    tol = 1.0e-04  min(eta) =  -1.12      
      Iteration 5:   deviance = 2.2345e+03  eps = 0.00e+00  iters = 1    tol = 1.0e-05  min(eta) =  -1.12   S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
      Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
      
      PPML regression                                   No. of obs      =        316
                                                        Residual df     =        312
                                                        Wald chi2(3)    =      27.62
      Deviance             =  2234.546183               Prob > chi2     =     0.0000
      Log pseudolikelihood = -1547.970944               Pseudo R2       =     0.0536
      ------------------------------------------------------------------------------
                   |               Robust
           daysabs | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           mathnce |  -.0035232   .0076373    -0.46   0.645     -.018492    .0114455
           langnce |  -.0121521   .0052951    -2.29   0.022    -.0225304   -.0017739
            female |   .4009209   .1395915     2.87   0.004     .1273266    .6745153
             _cons |   2.286745   .2380629     9.61   0.000      1.82015     2.75334
      ------------------------------------------------------------------------------
      
      . scalar  beta_mathnce    =       _b[mathnce]
      
      . scalar  list    beta_mathnce
      beta_mathnce = -.00352322

      And here's the marginal effect of the mathnce variable, with all covariates held at their mean values, computed using the "margins, dydx(mathnce) atmeans" command:

      Code:
      . margins, dydx(mathnce) atmeans  
      
      Conditional marginal effects                               Number of obs = 316
      Model VCE: Robust
      
      Expression: Predicted mean of daysabs, predict()
      dy/dx wrt:  mathnce
      At: mathnce = 48.75101 (mean)
          langnce = 50.06379 (mean)
          female  = .5126582 (mean)
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           mathnce |  -.0195214   .0420781    -0.46   0.643     -.101993    .0629501
      ------------------------------------------------------------------------------
      Let me verify this by adding an observation in which all covariates are set to the mean values computed above, and then obtaining its fitted value.

      Code:
      . set obs 317
      Number of observations (_N) was 316, now 317.
      
      . replace mathnce=48.75101 in 317
      (1 real change made)
      
      . replace langnce=50.06379 in 317
      (1 real change made)
      
      . replace female=.5126582 in 317
      (1 real change made)
      
      . predict daysabs_hat // fitted value of y
      (option mu assumed; predicted mean of depvar)
      
      . summ    daysabs_hat in 317
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
       daysabs_hat |          1    5.540798           .   5.540798   5.540798
      
      . di      5.540798*(exp(beta_mathnce)-1)
      -.0194871 // nearly identical to the marginal effect computed at mean values (-.0195214).

      If you are interested in the average effect of X across all values of the covariates, I think you can use "margins, dydx(mathnce)" without any further options.
      But somehow that gives a very different result from the estimate I get using the formula you wrote, [exp(b1)-1]*100. Not sure why, though.

      Code:
      . margins, dydx(mathnce)
      
      Average marginal effects                                   Number of obs = 316
      Model VCE: Robust
      
      Expression: Predicted mean of daysabs, predict()
      dy/dx wrt:  mathnce
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           mathnce |  -.0204704   .0442754    -0.46   0.644    -.1072486    .0663079
      ------------------------------------------------------------------------------
      
      
      . di (exp(-.00352320)-1)*100
      -.35170008
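
      My guess (which I have not verified) is that the two numbers are simply in different units: [exp(b1)-1]*100 is a percent change in the fitted count, while dy/dx from margins is a change in days absent. If the at-means logic above carries over, a sketch like this should roughly reconcile them:

      Code:
      * sketch only: average the yhat*(exp(b)-1) adjustment over the estimation sample
      summarize daysabs_hat in 1/316, meanonly
      display r(mean)*(exp(beta_mathnce) - 1)   // if that logic holds, close to the -.0204704 above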
