Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Graphing Odds Ratios Across Time

    Hello All and Thanks In Advance.

    In spite of the fact that I gather that this is a relatively common practice, I've been unable to sort out the syntax for this procedure. Basically, I'd like to graphically represent change in odds ratios over time on a single graph for multiple DVs.

    More specifically, I'm wondering, for example, how do various measures of happiness (binary variables) associate with a specific predictor of happiness (x1)- net of other relevant factors- over time. For what it is worth, I have ~ 20 years in the data set. Ideally, the odds ratios displayed would have notation indicating P values as well.

    In this case, we might use three logit models:

    logit happy1 x1 (key indicator) x2 x3 x4, or
    logit happy2 x1 (key indicator) x2 x3 x4, or
    logit happy3 x1 (key indicator) x2 x3 x4, or

    So, I want to know what the odds are of a unit change in each (happy1 happy2 happy3) predicting a unit change in x1 (x1 and all happy DVs are binary), net of other IVs, over that 20 year span- with three lines (one for each DV) on the same graph.

    Apologies if the above is less than clear- but I do very much appreciate any help sorting this out.

  • #2
    Code:
    sysuse nlsw88, clear
    logit married i.collgrad##c.age i.south i.race, or
    predictnl ormarried = exp(_b[1.collgrad] + _b[1.collgrad#c.age]*age),        ///
        ci(lb_m ub_m)
    
    logit never_married i.collgrad##c.age i.south i.race, or
    predictnl ornever_married = exp(_b[1.collgrad] + _b[1.collgrad#c.age]*age) , ///
        ci(lb_n ub_n)
    
    twoway rarea lb_m ub_m age , sort ||       ///
           rarea lb_n ub_n age , sort ||       ///
           line ormarried ornever_married age, ///
                sort legend(off) yline(1)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      In general, when we select a logistic regression model, we want to get the effect size overall, not exactly across time. For example, mortality within 30 days after hospitalization.

      Considering you wish to have ORs across time, I'm just wondering why not selecting a survival analysis model, where you'd get the HRs.

      There, you surely get a "well-oiled machinery" to dovetail effect size across time.
      Best regards,

      Marcos

      Comment


      • #4
        Originally posted by Ryan LeCount View Post
        More specifically, I'm wondering, for example, how do various measures of happiness (binary variables) associate with a specific predictor of happiness (x1)- net of other relevant factors- over time. For what it is worth, I have ~ 20 years in the data set. Ideally, the odds ratios displayed would have notation indicating P values as well.

        In this case, we might use three logit models:

        logit happy1 x1 (key indicator) x2 x3 x4, or
        logit happy2 x1 (key indicator) x2 x3 x4, or
        logit happy3 x1 (key indicator) x2 x3 x4, or
        Hi Ryan. If I follow, happy1-happy3 are three distinct dichotomous measures of happiness, and the three logit models you show above are all estimated using the same annual sample of individuals. Is that right?

        What's not clear to me is whether you have a completely new (and presumably independent) set of observations for each year, or whether some individuals appear in more than one of those annual samples. I think you need to clarify that before anyone can offer good advice about how to compute and plot the ORs.

        HTH.
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)

        Comment


        • #5
          Thanks all for your input and feedback and apologies for not having been clearer. I'll try to address the issues raised:

          ...the three logit models you show above are all estimated using the same annual sample of individuals. Is that right?

          What's not clear to me is whether you have a completely new (and presumably independent) set of observations for each year, or whether some individuals appear in more than one of those annual samples. I think you need to clarify that before anyone can offer good advice about how to compute and plot the ORs.

          These data are cross-sectional with new respondents for each year. The data are rigorously sampled and very much representative, so the objective is to make sense of how strongly these measures are associated with my key IV across time, net of other factors.

          Considering you wish to have ORs across time, I'm just wondering why not selecting a survival analysis model, where you'd get the HRs.
          Thanks for this suggestion. To be honest, I am not familiar with this technique, but my sense is that it is used to evaluate the duration/scope of a single event/treatment on a stable sample across time. My data are different and so is my question- these are all cross-sectionally derived attitudinal/social contextual variables and year to year variation is often directionally variable. I'm just interested in how each of these different measures - over the course of the whole period- are associated with my key IV net of all other factors.

          Again thanks very much and I'd be happy to clarify further if it would be helpful. I'm deeply grateful for each of you who takes the time to address these questions.
          Last edited by Ryan LeCount; 26 Jun 2017, 12:33.

          Comment


          • #6
            Hi Ryan. Getting a plot of ln(OR) with 95% CI across the years is quite straightforward via margins and marginsplot. E.g., for variable happy1, this should do it:

            Code:
            * Let variable yr = Year, and treat it as categorical
            logit happy1 x1##yr x2 x3 x4, or
            * Use r. contrast operator to get ln(OR) for x1.
            margins r.x1@yr, vsquish predict(xb) contrast(nowald effects)
            marginsplot
            I think that in order to get a plot of the ORs with 95% CIs using this approach, you would have to write the contrast results from margins to a tempfile and compute the ORs and CIs yourself. The following is based on an example Clyde Schechter posted here. It also borrows from post #2 in this thread (by Maarten Buis).

            Code:
            * Create some fake data to illustrate
            clear
            set obs 1000
            generate yr = mod(_n,20)
            replace yr = 20 if yr==0
            *sort yr
            generate id = yr*1000+_n
            generate happy1 = rbinomial(1,.3)
            generate x1 = rbinomial(1,.7) // Ryan called it an "indicator"
            generate x2 = rnormal(50, 10)
            generate x3 = rnormal(50, 10)
            generate x4 = rnormal(50, 10)
            preserve // Preserve the original data set
            
            *** The actual code begins here ***
            logit happy1 x1##yr x2 x3 x4, or
            margins yr#x1, vsquish predict(xb) // Get log-odds
            marginsplot
            graph export "C:\Temp\Happy1_logOdds.png", as(png) replace
            
            * Use r. contrast operator to get ln(OR) for x1.
            * Save -margins- table to a temporary file, and then
            * compute and list the ORs with 95% CIs.
            
            tempfile temp1
            margins r.x1@yr, vsquish predict(xb) contrast(nowald effects) saving(`temp1')
            marginsplot
            graph export "C:\Temp\Happy1_logOR.png", ///
            as(png) replace
            
            use `temp1', clear  // Use the temp file
            generate OR = exp(_margin)
            generate OR_lb = exp(_ci_lb)
            generate OR_ub = exp(_ci_ub)
            generate Year = _m2 + 1994 // Replace 1994 with appropriate value
            * Table shown below shows ORs with 95% CIs.
            list Year _margin _ci_lb _ci_ub OR OR_lb OR_ub
            
            twoway rcap OR_lb OR_ub Year , sort ||       ///
                   connected OR Year, ///
                        sort legend(off) yline(1)
            
            restore // revert to original data set
            I hope this helps.
            --
            Bruce Weaver
            Email: [email protected]
            Version: Stata/MP 18.5 (Windows)

            Comment


            • #7
              I was just reviewing some things in Michael Mitchell's nice book Interpreting and Visualizing Regression Models Using Stata, and was reminded of another way you could tackle this problem. Mitchell's approach does not require a temporary file--instead, he uses matrix commands to get access to the r() matrices generated by margins. Here is a variation on the approach he shows in his book.

              Code:
              * Create some fake data to illustrate
              clear
              set obs 1000
              generate yr = mod(_n,20)
              replace yr = 20 if yr==0
              *sort yr
              generate id = yr*1000+_n
              generate happy1 = rbinomial(1,.3)
              generate x1 = rbinomial(1,.7) // Ryan called it an "indicator"
              generate x2 = rnormal(50, 10)
              generate x3 = rnormal(50, 10)
              generate x4 = rnormal(50, 10)
              * End of fake data generation
              
              * The real code begins here
              logit happy1 x1##yr x2 x3 x4, or
              margins r.x1@yr, vsquish predict(xb) contrast(nowald effects)
              
              return list // Comment out or remove this line if you like
              matrix table1 = r(table) // Transpose of original margins table
              matrix table2 = table1'  // Transpose to get desired layout
              matrix list table2 // Comment out or remove this line if you wish
              * Now extract columnns 1, 5 & 6 from that table (B, Lower, Upper)
              matrix table3 = ///
              table2[1..rowsof(table2), 1], ///
              table2[1..rowsof(table2), 5..6]
              matrix list table3 // This shows B with 95% CI -- remove if you like
              * Use mata to exponentiate the table3 macro to get ORs with CIs
              mata : st_matrix("ORs", exp(st_matrix("table3")))
              matrix colnames ORs = "OR" "Lower" "Upper"
              matrix list ORs  // list the matrix of ORs with CIs
              
              svmat ORs // Save ORs matrix to working dataset
              rename (ORs1 ORs2 ORs3) (OR Lower Upper)
              generate Year = _n + 1994 if !missing(OR) // change 1994 to correct value
              list Year OR Lower Upper if !missing(Year)
              
              twoway rcap Lower Upper Year || connected OR Year,  ///
                     sort legend(off) yline(1) ytitle(Odds Ratio & 95% CI) ///
                     xlabel(1995(5)2015) title(Odds Ratios Over Time)




              HTH.
              Attached Files
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)

              Comment


              • #8
                Many, many thanks for taking the time to lay all of this out. This is extraordinarily helpful!

                Comment


                • #9
                  Hi all,
                  I wanted to plot the odds ratio from logistic regression model (on the y-axis) over the continuous predictor variable (on the x-axis). Or, odds ratio of health outcome (presence/absence) over the level of air pollution (main predictor var). My data is cross-sectional, and I attached the types of graphs I want to make. I tried Marginsplot but that gives me the probability of the event rather than the odds ratio on the y-axis. My question could be elementary, but I have been struggling for more than 2 weeks. Examples of the graph I wanted to make found in this article (Fig 4 & 5): https://academic.oup.com/ije/article/48/4/1125/5487745
                  I appreciate any advice in advance.

                  Comment


                  • #10
                    It looks like you are mixing up the odds and the odds ratio. The odds is a measure of how likely an event is; it is the expected number of successes per failure. The odds ratio is a measure of how this likelihood of an event differs across groups; it is the ratio of two odds. I am pretty sure you want the odds and not the odds ratio, so that is what I will give you. The graphs in the article you linked to look wrong to me, e.g. the curve starting at exactly 1 looks very suspicious to me.

                    Code:
                    // select the graph scheme
                    set scheme s1mono
                    
                    // open example data
                    sysuse nlsw88, clear
                    // prepare the data
                    
                    gen byte occat = cond(occupation < 3, 1,                    ///
                                     cond(inlist(occupation,5, 6, 8, 9), 2, 3)) ///
                                     if !missing(occupation)
                    label variable occat "occupational category"
                    label define occat 1 "white collar" ///
                                       2 "skilled"      ///
                                       3 "unskilled"
                    label value occat occat
                    gen byte urban = c_city + smsa
                    label define urban 2 "central city" ///
                                       1 "suburban"     ///
                                       0 "rural"
                    label value urban urban
                    label variable urban "urbanicity"
                    
                    // estimate the model
                    logit union grade i.race i.south i.urban i.occat, or base
                    
                    // get the predicted odds
                    margins, at(race=1 south=0 urban=1 occat=2 grade=(0/18)) expression(exp(xb()))
                    
                    // make the graph
                    marginsplot, recastci(rarea) ciopts(astyle(ci)) plotopts(msymbol(i)) ///
                        ytitle(predicted odds) xlab(0(5)15) ///
                        title("The predicted odds of being in a union") ///
                        subtitle("for white women, not from the south," ///
                                 "who live in a suburban area, and do skilled labor")
                    Click image for larger version

Name:	Graph.png
Views:	1
Size:	71.1 KB
ID:	1646111


                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------

                    Comment


                    • #11
                      Many thanks, Maarten Buis for the explanation with a very relevant example. This is very helpful!

                      Comment

                      Working...
                      X