  • Prevalence Ratio from a Logistic Model Using Continuous Predictors?

    Hello,

    I have a model that predicts a smoking outcome using manually splined education exposure variables. I am specifically interested in the effect of the eduhigh variable. From the eduhigh coefficient in my logistic model, I interpret that for each additional year of education beyond 11 years*, a person has 0.92 times the odds of ever smoking compared with a person with one less year of education:

    Code:
    . qui svyset secu [pweight=wt_1992], singleunit(certainty) strata(stratum) vce(linearized)
    
    . global basemodel_conf "c.myrs i.female i.race i.bplace c.birthyr_c i.myrs_mi c.fyrs i.fyrs_mi"
    
    . svy: logistic smokeever c.edulow c.eduhigh i.edu11 $basemodel_conf if firstiw==1992
    (running logistic on estimation sample)
    
    note: 0.myrs_mi omitted because of collinearity
    note: 0.fyrs_mi omitted because of collinearity
    
    Survey: Logistic regression
    
    Number of strata   =        52                 Number of obs     =       5,851
    Number of PSUs     =       104                 Population size   =  14,556,027
                                                   Design df         =          52
                                                   F(  12,     41)   =       33.06
                                                   Prob > F          =      0.0000
    
    ----------------------------------------------------------------------------------
                     |             Linearized
           smokeever | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
              edulow |   1.062162    .038769     1.65   0.105     .9871466    1.142877
             eduhigh |   .9233532   .0196973    -3.74   0.000     .8846617    .9637369
             1.edu11 |   .5742586   .0768288    -4.15   0.000     .4390507    .7511046
                myrs |   1.025851   .0124483     2.10   0.040     1.001173    1.051137
            1.female |   .4006192   .0200339   -18.29   0.000     .3623694    .4429064
                     |
                race |
              black  |   .9421785   .1014996    -0.55   0.583     .7590149    1.169543
           hispanic  |   .7825791    .090535    -2.12   0.039     .6204532     .987069
    other / missing  |   .8619439   .1529093    -0.84   0.406     .6037801    1.230493
                     |
              bplace |
     southern birth  |   .8329578   .0763799    -1.99   0.051     .6929648    1.001232
          immigrant  |   .6140813   .0722154    -4.15   0.000     .4849993    .7775183
                     |
           birthyr_c |   1.001843   .0149788     0.12   0.902     .9722321    1.032355
           0.myrs_mi |          1  (omitted)
                fyrs |   1.003104   .0113082     0.27   0.784      .980667    1.026054
           0.fyrs_mi |          1  (omitted)
               _cons |   4.826659    .657321    11.56   0.000     3.672521      6.3435
    ----------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.


    However, odds ratios are difficult to interpret, and I would like to use a more interpretable measure, such as a prevalence ratio. Normally I would use the -margins- command to calculate average causal effects, but here the predictor of interest (eduhigh) is continuous rather than categorical. How can I best convert the odds ratio from this logistic model into a prevalence ratio, or another non-odds effect measure, given a continuous exposure?

    Thank you for the support!


    * For completeness, my education variables are as follows: "edulow" (continuous, representing 0-11 years of education), "eduhigh" (continuous, representing 12-17 years of education), and "edu11" (a binary discontinuity term, split at 11 years of education).

  • #2
    Well, given the nonlinearity of the logistic model, there is no single prevalence ratio that characterizes the entire range of the data. Rather, there are infinitely many prevalence ratios, depending on the exact values of all of the variables in the model. You could pick values at which to fix the variables, values that you consider representative or interesting, and then use -margins- with the -post- option. For example, if you wanted the prevalence ratio corresponding to 12 years of education vs 13 years of education, with all other predictors fixed at their estimation sample means, you could run
    Code:
    margins, at(eduhigh = (12 13) edulow = 0 edu11 = 0) atmeans post
    Then you can use -nlcom- to calculate the ratio between the prevalence for eduhigh = 12 and eduhigh = 13, along with its confidence interval, etc. (I'm not sure exactly how to reference those margins in -nlcom-, but if you run -matrix list e(b)- you will see what Stata calls them.)
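    As a rough check on what -nlcom- should return: if a one-year increase in eduhigh multiplies the odds of ever smoking by OR, and p0 is the predicted probability at the lower value, then the prevalence ratio is PR = OR / (1 - p0 + p0*OR). It is close to OR only when p0 is near 0 and moves toward 1 as p0 grows, which is one way to see why no single prevalence ratio describes the whole range. The sketch below plugs in the eduhigh odds ratio from the first post and the predicted probability at eduhigh = 12 from post #3; the scalar names are hypothetical.

```stata
* Hypothetical scalar names. Values are copied from this thread: the eduhigh
* odds ratio (first post) and Pr(smokeever) at eduhigh = 12 (post #3)
scalar or_eduhigh = .9233532
scalar p12 = .5686642

* Multiply the odds at 12 years by the odds ratio, then convert to a probability
scalar odds13 = scalar(or_eduhigh) * scalar(p12) / (1 - scalar(p12))
scalar p13 = scalar(odds13) / (1 + scalar(odds13))

* Prevalence ratio two ways: directly, and via PR = OR / (1 - p0 + p0*OR)
display "Pr at eduhigh = 13: " scalar(p13)
display "PR, direct:         " scalar(p13) / scalar(p12)
display "PR, formula:        " scalar(or_eduhigh) / (1 - scalar(p12) + scalar(p12) * scalar(or_eduhigh))
```

    Both PR lines should print about .9654, the point estimate shown in post #3. With p0 near .57, the prevalence ratio is noticeably closer to 1 than the odds ratio of .92.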

    Would that be suitable for your purposes?

    It seems to me that you are focusing here on finding a way to express the results that is easy for an audience to grasp. In my view, the best way to do that is with a graph (or graphs) of the predicted probability vs the predictor(s) of interest.
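    A minimal sketch of such a graph, assuming the svy: logistic model from the first post is the active estimation result. The edulow = 0 and edu11 = 0 settings are copied from the example above; whether they form a coherent education profile depends on how the spline terms were coded, so adjust them as needed.

```stata
* Predicted Pr(smokeever) at eduhigh = 12, 13, ..., 17, other covariates at
* their estimation-sample means (no -post-, so the logistic model stays active)
margins, at(eduhigh = (12(1)17) edulow = 0 edu11 = 0) atmeans

* Plot the predictions with their delta-method confidence intervals
marginsplot, xtitle("eduhigh (years of education)") ytitle("Predicted Pr(ever smoked)")
```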



    • #3
      Dear Clyde,

      Thank you for the helpful response; that is what I have done (e.g., the graphic below, which covers multiple cohorts rather than just 1992). I am glad that you also suggest this approach!

      [Attachment: Adjusted predicted smokeever by schoolyears, by cohort.pdf]

      However, when an image is not possible, it would be very helpful to include a simple statistic (like the 0.92 odds ratio) that describes the experience of everyone at the higher education levels. Thank you for suggesting -margins, at()-. I might explore averaging the prevalence ratios and confidence intervals for each one-year increment of education from 11 to 17 years.
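      One way to get a prevalence ratio and confidence interval for each one-year step, sketched under the same assumptions as the -margins- call in post #2 (model from the first post active, other covariates at their means, eduhigh running from 12 to 17). The labels such as pr13_12 are hypothetical. Note that averaging the confidence limits of the individual steps does not, in general, give a confidence interval for the average; one alternative is to summarize the five steps by their geometric mean, which simplifies to (Pr at 17 / Pr at 12)^(1/5), and let -nlcom- compute its delta-method interval directly.

```stata
* Run from a do-file (/// continues a command onto the next line).
* Predicted Pr(smokeever) at eduhigh = 12,...,17; -post- makes the six
* margins the active estimates, named 1._at ... 6._at in e(b)
margins, at(eduhigh = (12(1)17) edulow = 0 edu11 = 0) atmeans post

* Year-on-year prevalence ratios (higher year over lower year),
* each with a delta-method standard error and confidence interval
nlcom (pr13_12: _b[2._at] / _b[1._at]) ///
      (pr14_13: _b[3._at] / _b[2._at]) ///
      (pr15_14: _b[4._at] / _b[3._at]) ///
      (pr16_15: _b[5._at] / _b[4._at]) ///
      (pr17_16: _b[6._at] / _b[5._at])

* Single summary: geometric mean of the five ratios above, which
* telescopes to (Pr at 17 / Pr at 12)^(1/5)
nlcom (pr_per_year: (_b[6._at] / _b[1._at])^(1/5))
```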


      Regarding the post-estimation step, do you know how to get confidence intervals from the lincom/nlcom command below?


      Code:
      . qui svyset secu [pweight=wt_1992], singleunit(certainty) strata(stratum) vce(linearized)
      
      .
      . qui global basemodel_conf "c.myrs i.female i.race i.bplace c.birthyr_c i.myrs_mi c.fyrs i.fyrs_mi"
      
      . qui svy: logistic smokeever c.edulow c.eduhigh i.edu11 $basemodel_conf if firstiw==1992
      
      .
      . margins, at(eduhigh = (12 13) edulow = 0 edu11 = 0) atmeans post        // get predicted probabilities at specified education levels and mean values of covariates
      
      Adjusted predictions
      
      Number of strata   =        52                 Number of obs     =       5,851
      Number of PSUs     =       104                 Population size   =  14,556,027
      Model VCE    : Linearized                      Design df         =          52
      
      Expression   : Pr(smokeever), predict()
      
      1._at        : edulow          =           0
                     eduhigh         =          12
                     edu11           =           0
                     myrs            =    9.598962 (mean)
                     0.female        =    .4749568 (mean)
                     1.female        =    .5250432 (mean)
                     0.race          =    .8049443 (mean)
                     1.race          =    .1012051 (mean)
                     2.race          =    .0562611 (mean)
                     3.race          =    .0375895 (mean)
                     0.bplace        =    .5782837 (mean)
                     1.bplace        =    .3271921 (mean)
                     2.bplace        =    .0945242 (mean)
                     birthyr_c       =   -1.753657 (mean)
                     myrs_mi         =           0
                     fyrs            =    9.354802 (mean)
                     fyrs_mi         =           0
      
      2._at        : edulow          =           0
                     eduhigh         =          13
                     edu11           =           0
                     myrs            =    9.598962 (mean)
                     0.female        =    .4749568 (mean)
                     1.female        =    .5250432 (mean)
                     0.race          =    .8049443 (mean)
                     1.race          =    .1012051 (mean)
                     2.race          =    .0562611 (mean)
                     3.race          =    .0375895 (mean)
                     0.bplace        =    .5782837 (mean)
                     1.bplace        =    .3271921 (mean)
                     2.bplace        =    .0945242 (mean)
                     birthyr_c       =   -1.753657 (mean)
                     myrs_mi         =           0
                     fyrs            =    9.354802 (mean)
                     fyrs_mi         =           0
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               _at |
                1  |   .5686642   .0707116     8.04   0.000      .426771    .7105575
                2  |   .5490071   .0761642     7.21   0.000     .3961725    .7018418
      ------------------------------------------------------------------------------
      
      .
      . matrix list e(b)
      
      e(b)[1,2]
                  1.         2.
                _at        _at
      y1  .56866424  .54900714
      
      . lincom (b[1,2]/b[1,1]) // to get the predicted probability ratio
      
       ( 1) = -.9654329
      
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               (1) |   .9654329          .        .       .            .           .
      ------------------------------------------------------------------------------
      
      .
      Thank you again,
      S



      • #4
        One quick correction: my code in the previous post is missing the following line, which should come before the lincom command at the end:

        Code:
        matrix define b = e(b)
        Even with that line added, the confidence intervals are still not calculated.



        • #5
          OK, my advice was not clear on how to use -nlcom- here, and you misinterpreted it. When you save matrix e(b) as matrix b and then feed elements of b into -lincom-, -lincom- does not know that matrix b originally came from e(b). -lincom- thinks you have just asked it to evaluate the ratio of two constants that happen to be stored in a matrix named b, so there is no standard error to calculate. For -lincom- to compute standard errors, it has to know that it is working with regression coefficients, which means you must refer to them as _b[]. The underscore preceding the b cannot be omitted.

          The other problem is that you can't use -lincom- to calculate a ratio of regression coefficients (though, as you saw, it is perfectly happy to calculate a ratio of what it thinks are constants). You must use -nlcom-.

          So
          Code:
          nlcom _b[1._at] / _b[2._at]
          will get you what you want.
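          Note the direction: as written, this gives the eduhigh = 12 margin over the eduhigh = 13 margin (about 1.036, the reciprocal of the .9654 in post #3). The sketch below, which assumes the posted -margins- results from post #3 are still the active estimates, flips it to the 13-versus-12 direction, adds a version whose interval is built on the log scale (which keeps both limits positive), and shows that a prevalence difference, being linear in the margins, does go through -lincom- (point estimate .5490071 - .5686642 = -.0196571). The matrix and scalar names are hypothetical, and the log-scale interval uses a normal critical value; invttail(52, .025) would use the 52 design degrees of freedom instead.

```stata
* Run from a do-file (/// continues a command onto the next line).
* Prevalence ratio, eduhigh = 13 over eduhigh = 12
nlcom (pr: _b[2._at] / _b[1._at])

* Same ratio with the CI built on the log scale, then exponentiated
nlcom (lnpr: ln(_b[2._at] / _b[1._at]))
matrix lnpr_b = r(b)
matrix lnpr_V = r(V)
scalar lnpr_hat = lnpr_b[1, 1]
scalar lnpr_se  = sqrt(lnpr_V[1, 1])
scalar zcrit    = invnormal(.975)
display "PR = " exp(scalar(lnpr_hat)) ///
    ", 95% CI " exp(scalar(lnpr_hat) - scalar(zcrit) * scalar(lnpr_se)) ///
    " to " exp(scalar(lnpr_hat) + scalar(zcrit) * scalar(lnpr_se))

* Prevalence difference (13 minus 12): linear in the margins, so -lincom- works
lincom _b[2._at] - _b[1._at]
```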
