Prevalence Ratio from a Logistic Model Using Continuous Predictors?

Sepehr Hashemi

Join Date: Mar 2021
Posts: 8

01 Jun 2021, 18:30

Hello,

I have a model that predicts a smoking outcome using manually splined education exposure variables. I am specifically interested in the effect of the eduhigh variable. From my logistic model's eduhigh coefficient, I interpret that for each additional year of education beyond 11 years*, a person has 0.92 times the odds of ever smoking, compared to a person with one fewer year of education:

Code:

. qui svyset secu [pweight=wt_1992], singleunit(certainty) strata(stratum) vce(linearized)

. global basemodel_conf "c.myrs i. female i.race i.bplace c.birthyr_c i.myrs_mi c.fyrs i.fyrs_mi
> "

. svy: logistic smokeever c.edulow c.eduhigh i.edu11 $basemodel_conf if firstiw==1992
(running logistic on estimation sample)

note: 0.myrs_mi omitted because of collinearity
note: 0.fyrs_mi omitted because of collinearity

Survey: Logistic regression

Number of strata   =        52                 Number of obs     =       5,851
Number of PSUs     =       104                 Population size   =  14,556,027
                                               Design df         =          52
                                               F(  12,     41)   =       33.06
                                               Prob > F          =      0.0000

----------------------------------------------------------------------------------
                 |             Linearized
       smokeever | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          edulow |   1.062162    .038769     1.65   0.105     .9871466    1.142877
         eduhigh |   .9233532   .0196973    -3.74   0.000     .8846617    .9637369
         1.edu11 |   .5742586   .0768288    -4.15   0.000     .4390507    .7511046
            myrs |   1.025851   .0124483     2.10   0.040     1.001173    1.051137
        1.female |   .4006192   .0200339   -18.29   0.000     .3623694    .4429064
                 |
            race |
          black  |   .9421785   .1014996    -0.55   0.583     .7590149    1.169543
       hispanic  |   .7825791    .090535    -2.12   0.039     .6204532     .987069
other / missing  |   .8619439   .1529093    -0.84   0.406     .6037801    1.230493
                 |
          bplace |
 southern birth  |   .8329578   .0763799    -1.99   0.051     .6929648    1.001232
      immigrant  |   .6140813   .0722154    -4.15   0.000     .4849993    .7775183
                 |
       birthyr_c |   1.001843   .0149788     0.12   0.902     .9722321    1.032355
       0.myrs_mi |          1  (omitted)
            fyrs |   1.003104   .0113082     0.27   0.784      .980667    1.026054
       0.fyrs_mi |          1  (omitted)
           _cons |   4.826659    .657321    11.56   0.000     3.672521      6.3435
----------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

However, odds are difficult to interpret, and I would like to use a more easily interpretable measure, such as a prevalence ratio. Normally I would use the margins command to calculate average causal effects, but here the predictor I am interested in (eduhigh) is continuous, not categorical. How can I best convert the odds ratio reported from this logistic model to a probability ratio or another non-odds effect measure, given a continuous exposure variable?

Thank you for the support!

* For completeness, my education variables are as follows: "edulow" (continuous variable representing 0-11 years of education), "eduhigh" (continuous variable representing 12-17 years of education), and "edu11" (a binary discontinuity term, split at 11 years of education).

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

01 Jun 2021, 20:14

Well, given the non-linearity of the logistic model, there is no such thing as a prevalence ratio that characterizes the entire range of the data. Rather, there are infinitely many prevalence ratios, depending on the exact values of all of the variables in the model. You could pick some values to constrain the variables to, values that you consider representative or interesting, and then use -margins- with the -post- option. For example, if you wanted to look at the prevalence ratio corresponding to 12 years of education vs 13 years of education, with all other predictors constrained to their estimation sample means, you could have

Code:

margins, at(eduhigh = (12 13) edulow = 0 edu11 = 0) atmeans post

Then you can use -nlcom- to calculate the ratio between the prevalence for eduhigh = 12 and eduhigh = 13, along with its confidence interval, etc. (I'm not sure exactly what the correct way to reference those margins in -nlcom- is, but if you -matrix list e(b)- you will see what Stata calls them.)

Would that be suitable for your purposes?

It seems to me that you are focusing here on finding a way to express the results that is easy for an audience to grasp and understand. In my view, the best way to do that, really, is with a graph (or graphs) of the predicted probability vs the predictor(s) of interest.
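
To make the point about there being many prevalence ratios concrete, here is a minimal arithmetic sketch. It holds the odds ratio fixed at the 0.9233532 reported for eduhigh and converts it to a prevalence ratio at several baseline prevalences. The baselines 0.10, 0.30 and 0.80 are arbitrary illustrative values; 0.5686642 is the predicted probability that -margins- reports for eduhigh = 12 in this thread.

```stata
* Sketch only: one fixed odds ratio implies a different prevalence ratio
* at each baseline prevalence. Baselines are illustrative assumptions,
* except 0.5686642, which is the -margins- prediction for eduhigh = 12.
local or = 0.9233532                            // eduhigh odds ratio from the logistic output
foreach p0 of numlist 0.10 0.30 0.5686642 0.80 {
    local odds1 = `or' * `p0' / (1 - `p0')      // odds after one additional year
    local p1    = `odds1' / (1 + `odds1')       // convert those odds back to a probability
    display "baseline = " %9.7f `p0' "   prevalence ratio = " %6.4f `p1'/`p0'
}
```

At the 0.5686642 baseline this arithmetic should give a ratio of about 0.965, in line with the ratio of the two margins. At other baselines the same odds ratio implies a different prevalence ratio, which is why no single number covers the whole range.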

Sepehr Hashemi

Join Date: Mar 2021
Posts: 8
#3

01 Jun 2021, 23:02

Dear Clyde,

Thank you for the helpful response. That is what I have done (e.g., in the graphic below; please excuse that it covers multiple cohorts, not just 1992). I am happy that you also suggest this approach!

Adjusted predicted smokeever by schoolyears, by cohort.pdf

However, when an image is not possible, it would be so helpful to be able to include a simple statistic (like the 0.92 Odds Ratio) to describe the experience of everyone in the higher education levels. Thank you for suggesting -margins at()-. I might explore averaging the prevalence ratios/confidence intervals for every 1 year increment of increasing education from 11-17 years.

Regarding the post-estimation, do you know how one can get the confidence intervals from the lincom/nlcom command below?

Code:

. qui svyset secu [pweight=wt_1992], singleunit(certainty) strata(stratum) vce(linearized)

.
. qui global basemodel_conf "c.myrs i. female i.race i.bplace c.birthyr_c i.myrs_mi c.fyrs i.fyr
> s_mi"

. qui svy: logistic smokeever c.edulow c.eduhigh i.edu11 $basemodel_conf if firstiw==1992

.
. margins, at(eduhigh = (12 13) edulow = 0 edu11 = 0) atmeans post        // get predicted probabilities at specified education levels and mean value of covariates

Adjusted predictions

Number of strata   =        52                 Number of obs     =       5,851
Number of PSUs     =       104                 Population size   =  14,556,027
Model VCE    : Linearized                      Design df         =          52

Expression   : Pr(smokeever), predict()

1._at        : edulow          =           0
               eduhigh         =          12
               edu11           =           0
               myrs            =    9.598962 (mean)
               0.female        =    .4749568 (mean)
               1.female        =    .5250432 (mean)
               0.race          =    .8049443 (mean)
               1.race          =    .1012051 (mean)
               2.race          =    .0562611 (mean)
               3.race          =    .0375895 (mean)
               0.bplace        =    .5782837 (mean)
               1.bplace        =    .3271921 (mean)
               2.bplace        =    .0945242 (mean)
               birthyr_c       =   -1.753657 (mean)
               myrs_mi         =           0
               fyrs            =    9.354802 (mean)
               fyrs_mi         =           0

2._at        : edulow          =           0
               eduhigh         =          13
               edu11           =           0
               myrs            =    9.598962 (mean)
               0.female        =    .4749568 (mean)
               1.female        =    .5250432 (mean)
               0.race          =    .8049443 (mean)
               1.race          =    .1012051 (mean)
               2.race          =    .0562611 (mean)
               3.race          =    .0375895 (mean)
               0.bplace        =    .5782837 (mean)
               1.bplace        =    .3271921 (mean)
               2.bplace        =    .0945242 (mean)
               birthyr_c       =   -1.753657 (mean)
               myrs_mi         =           0
               fyrs            =    9.354802 (mean)
               fyrs_mi         =           0

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .5686642   .0707116     8.04   0.000      .426771    .7105575
          2  |   .5490071   .0761642     7.21   0.000     .3961725    .7018418
------------------------------------------------------------------------------

.
. matrix list e(b)

e(b)[1,2]
            1.         2.
          _at        _at
y1  .56866424  .54900714

. lincom (b[1,2]/b[1,1]) // to get the predicted probability ratio

 ( 1) = -.9654329

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .9654329          .        .       .            .           .
------------------------------------------------------------------------------

.

Thank you again,
S

Sepehr Hashemi

Join Date: Mar 2021

Posts: 8
#4

02 Jun 2021, 01:31

One quick edit: I noticed that my code in the previous post is missing the following line before the lincom command at the end:

Code:

matrix define b = e(b)

Regardless, as before, the confidence intervals are not calculated.

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#5

05 Jun 2021, 18:49

OK, my advice was not clear on how to use -nlcom- here and you misinterpreted it. When you save matrix e(b) as matrix b and then feed elements of b into -lincom-, -lincom- does not know that this matrix b was originally from e(b). -lincom- thinks you have just asked it to evaluate the ratio of two constants that happen to be in a matrix named b. So there is no standard error to calculate. For -lincom- to do standard errors, it has to know that it is working with regression coefficients. Which means you must refer to the matrix _b[]. The underscore character preceding the b cannot be omitted.

The other problem is that you can't use -lincom- to calculate a ratio of regression coefficients (though, as you saw, it is perfectly happy to calculate a ratio of what it thinks are constants). You must use -nlcom-.

So

Code:

nlcom _b[1._at] / _b[2._at]

will get you what you want.
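
For readers who want to try the -margins, post- plus -nlcom- sequence end to end, here is a minimal sketch on Stata's shipped nhanes2 example data rather than the data in this thread. The outcome highbp and the predictors age and sex are stand-ins chosen only for illustration, and the ages 40 and 41 are arbitrary. Note that the order of the two terms sets the direction of the ratio: _b[2._at] / _b[1._at] is the higher level relative to the lower one, which is the direction of the 0.965 computed earlier in the thread, and swapping the terms gives its reciprocal.

```stata
* Sketch only: uses Stata's nhanes2 example data, not the thread's data.
* highbp, age and sex are illustrative stand-ins for smokeever, eduhigh, etc.
webuse nhanes2, clear                       // example data, already svyset
svy: logistic highbp c.age i.sex            // binary outcome, continuous predictor of interest
margins, at(age = (40 41)) atmeans post     // predicted prevalence at ages 40 and 41, posted to e(b)
matrix list e(b)                            // shows the coefficient names 1._at and 2._at
nlcom (pr: _b[2._at] / _b[1._at])           // prevalence ratio, age 41 vs 40, with delta-method CI
```

Because -nlcom- reads the posted margins through _b[], it has access to their variance matrix and can report a standard error and confidence interval for the ratio.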
