Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Forest plot with coefficients and C.I. from different logistic regressions

    Dear Statalists,

    I want to study the effect of a treatment on my outcome in different subpopulations of my sample (e.g. male-female, young-old) and I would like to represent the results in a single forest plots including odds ratio and 95% C.I. from all the univariate logistic regressions, to allow comparison. I thought about using metan command for the forest plot, but I would need to produce a dataset with variables' names, variables' categories names, odds ratios and C.I. to do it.
    However, the most I could do so far was producing a matrix with numbers indicating categories of categorical variables, or and lower and upper bound for C.I. using the following code:

    ​​​​​​foreach var in age gender region placeresidence education {
    levelsof(`var'), local(levels`var')
    foreach l of local levels`var' {
    logit y_kid_01 i.Treatchild_enc if children_018==1 & `var'==`l', or vce(cluster ResponseId)
    matrix tbl=r(table)
    matrix mytbl=nullmat(mytbl)\[`l',tbl["b","y_kid_01:1.Treatchild_enc"],tbl["ll","y_kid_01:1.Treatchild_enc"],tbl["ul","y_kid_01:1.Treatchild_enc"]]
    }
    }
    matrix colname mytbl= category or ul ll
    matrix list mytbl

    Is there a way to add a column for variable names and to label the variables' categories? Or is there a better/more efficient way to plot the graph?
    Thank you very much in advance!

  • #2
    Look at coefplot from SSC which will require you to just store the estimates using estimates store.

    Code:
    help estimates store
    Examples here: http://repec.sowi.unibe.ch/stata/coe...g-started.html

    Comment


    • #3
      Thank you Andrew! I thought about coefplot, but I am not sure the command is exactly applicable to my situation.
      Namely, I would like to have the subgroups names as y axis labels rather and the treatments (there are 2 versions) as labels rather than viceversa, as it is now, but I do not think this is doable..

      Comment


      • #4
        Can you post your current graph to illustrate what you mean?

        Comment


        • #5
          Sure, attached below you can find the graph I obtain with metan.
          It plots the odds and C.I. of the univariate regressions
          logit y_kid_01 i.Treatchild_enc if children_018==1 & `var'==`l', or vce(cluster ResponseId), where "var" are the independent variables (all factor variables) and "l" the categories. I wanted to represent treatment version 1 and treatment version 2 odds ratios in two different forest plots, so here are the effects for treatment version 1.
          The problems here are the categories, which I am not able to label, and the fact that I am not able to associate the variable names.

          Thank you again for your help and time!
          Attached Files

          Comment


          • #6
            With labeled categories, coefplot allows you to plot each categorical variable separately and label it. The following example illustrates, but you will need to adjust the code depending on the coefficients and CIs from your estimation.

            Code:
            sysuse auto, clear
            set seed 02032022
            gen depvar= runiformint(0,1)
            lab def record  0 "zero" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five"
            lab val rep78 record
            gen mpglev = 1 + (mpg >= 20) + (mpg>=25) if !mi(mpg)
            lab define mpglevels 1 "High mileage" 2 "Average" 3 "Economical"
            lab values mpglev mpglevels
            logit depvar i.rep78 i.mpglev i.foreign, or nolog
            est sto m1
            
            *SELECT X-AXIS POSITION TO SET THE MARKER LABELS
            gen xpos = 45
            
            *GRAPH
            coefplot (m1,  keep(*.rep78) aseq(Repair record)\ m1, keep(*.mpglev) aseq(Mileage) ///
            \ m1, keep(*.foreign) aseq(Origin)), drop(_cons) scheme(s1color) ///
            mlabel(cond(@b==1, "", string(@b,"%9.3f") + "   " +"["+string(@ll,"%9.3f") + ///
            " " +string(@ul,"%9.3f")+ "]" )) baselevels mlabcolor(none) mcolor(blue) ///
            ciopts(lcolor(blue)) eform xline(1) xsc(r(-5 65)) addplot(scatter @at xpos, ///
            ms(i) mlab(@mlbl) mlabcolor(black) mlabgap(3) mlabsize(small)) ///
            graphr(margin(r=10))
            Res.:

            Code:
            . logit depvar i.rep78 i.mpglev i.foreign, or nolog
            note: 1.rep78 != 0 predicts failure perfectly
                  1.rep78 dropped and 2 obs not used
            
            note: 5.rep78 omitted because of collinearity
            
            Logistic regression                             Number of obs     =         67
                                                            LR chi2(6)        =       5.88
                                                            Prob > chi2       =     0.4368
            Log likelihood = -43.433751                     Pseudo R2         =     0.0634
            
            ------------------------------------------------------------------------------
                  depvar | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   rep78 |
                    one  |          1  (empty)
                    two  |   3.940322   4.841931     1.12   0.264     .3544554     43.8028
                  three  |   2.126185   2.109243     0.76   0.447     .3042162    14.86004
                   four  |   .5850226    .512106    -0.61   0.540     .1052117     3.25298
                   five  |          1  (omitted)
                         |
                  mpglev |
                Average  |   .9527521   .6214259    -0.07   0.941      .265332    3.421135
             Economical  |    .843976   .6107955    -0.23   0.815     .2043174     3.48622
                         |
                 foreign |
                Foreign  |   3.949955   3.193417     1.70   0.089     .8098819    19.26471
                   _cons |   .4307538   .4107363    -0.88   0.377     .0664636    2.791738
            ------------------------------------------------------------------------------
            Note: _cons estimates baseline odds.
            Click image for larger version

Name:	Graph.png
Views:	1
Size:	40.9 KB
ID:	1648222

            Comment


            • #7
              In metan, you check use the option bygroup() and group the estimates according to per-specified categories.

              Comment


              • #8
                Thank you very much Andrew! But do you think is possible to have as labels the subpopulation names (e.g. "female") instead of the name of the variable's categories as labels in the graph somehow?

                Comment


                • #9
                  Thanks a lot Tiago, but I have a problem before that specifically in having a column with variables names and categories labels to use as groups..

                  Comment


                  • #10
                    Originally posted by Laura Leone View Post
                    Thank you very much Andrew! But do you think is possible to have as labels the subpopulation names (e.g. "female") instead of the name of the variable's categories as labels in the graph somehow?
                    That would be a bit complicated to code. I think you are on the right track to extract the coefficients. You can use frames to extract the coefficients and CIs from -rtable- then plot using your favorite graphing command. The following illustrates:

                    Code:
                    sysuse auto, clear
                    frame create myresults
                    frame create holding
                    tempfile holding
                    set seed 02032022
                    gen depvar= runiformint(0,1)
                    lab def record  0 "zero" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five"
                    lab val rep78 record
                    gen mpglev = 1 + (mpg >= 20) + (mpg>=25) if !mi(mpg)
                    lab define mpglevels 1 "High mileage" 2 "Average" 3 "Economical"
                    lab values mpglev mpglevels
                    logit depvar i.mpglev if foreign,  or nolog
                    
                    foreach cat in 0.foreign 1.foreign 2.rep78 3.rep78 4.rep78{
                        logit depvar i.mpglev if `cat',  or nolog
                        mat res= (r(table)[1..6, 1...])'
                        frame holding{
                                clear
                                set obs 100
                                svmat res
                                drop if missing(res1)
                                gen category="`cat'"
                                order category
                                save `holding', replace
                         }
                    
                        frame myresults: append using "`holding'"
                    }
                    frame drop holding
                    frame change myresults
                    rename  (res1- res6) (b se t pvalue ll ul)
                    gen which = ustrregexra(category, "(\d+\.)(\w+)", "$2")
                    gen coefficient=_n
                    bys which (coefficient): replace coefficient=_n
                    l, sepby(which)
                    Note that the coefficient number represents the ordering specified in your command line.

                    Res.:

                    Code:
                    .
                    . l, sepby(which)
                    
                         +---------------------------------------------------------------------------------------------------+
                         |  category          b         se           t     pvalue         ll         ul     which   coeffi~t |
                         |---------------------------------------------------------------------------------------------------|
                      1. | 0.foreign          1          .           .          .          .          .   foreign          1 |
                      2. | 0.foreign   .8571429     .55918   -.2362909   .8132069   .2386433   3.078628   foreign          2 |
                      3. | 0.foreign   .6857143   .5601333    -.461883   .6441652   .1383024   3.399825   foreign          3 |
                      4. | 0.foreign       .875   .3202172   -.3648772    .715203   .4270711   1.792734   foreign          4 |
                      5. | 1.foreign          1          .           .          .          .          .   foreign          5 |
                      6. | 1.foreign   1.333333   1.677741    .2286265   .8191592   .1132054   15.70401   foreign          6 |
                      7. | 1.foreign         .8   .8763561   -.2037013    .838587   .0934642    6.84754   foreign          7 |
                      8. | 1.foreign        1.5   1.369306    .4441648   .6569235   .2506422   8.976942   foreign          8 |
                         |---------------------------------------------------------------------------------------------------|
                      9. |   2.rep78          1          .           .          .          .          .     rep78          1 |
                     10. |   2.rep78        1.2    1.71114    .1278597     .89826   .0733517   19.63144     rep78          2 |
                     11. |   2.rep78         .4   .6572671   -.5576368   .5770924   .0159732   10.01675     rep78          3 |
                     12. |   2.rep78        2.5    2.09165    1.095177   .2734391   .4850357   12.88565     rep78          4 |
                     13. |   3.rep78          1          .           .          .          .          .     rep78          5 |
                     14. |   3.rep78   .7272727   .5707628   -.4057776   .6849061    .156198   3.386251     rep78          6 |
                     15. |   3.rep78   .7272727     .68324   -.3389771    .734627   .1153538   4.585248     rep78          7 |
                     16. |   3.rep78      1.375   .6389077    .6853476   .4931246    .553074   3.418394     rep78          8 |
                     17. |   4.rep78          1          .           .          .          .          .     rep78          9 |
                     18. |   4.rep78   2.222222   2.348277    .7556441   .4498626   .2800928   17.63084     rep78         10 |
                     19. |   4.rep78          1   1.032796   -4.02e-16          1   .1320939   7.570371     rep78         11 |
                     20. |   4.rep78         .6    .438178   -.6994768   .4842541   .1433909    2.51062     rep78         12 |
                         +---------------------------------------------------------------------------------------------------+
                    
                    .

                    Comment


                    • #11
                      In #10, the coefficients correspond to the categories, not the subpopulation so the penultimate line in the code should be:


                      Code:
                      bys category (coefficient): replace coefficient=_n
                      To give you a running count of 1 to 4 in this instance. You can subsequently label these with your variable names:

                      Code:
                      lab def varnames 1 "varname 1" 2 "varname 2" 3 "varname 3" 4 "varname 4"
                      label values coefficient varnames


                      Comment


                      • #12
                        Thank you Andrew very much, I am going to try out this code!

                        Comment


                        • #13
                          I think Andrew is probably on the right lines with his code for initial manipulation of the regression results. This sounds like it would be the most difficult part of your particular project.

                          Having done that, however, can I just point out that forestplot (from the metan package) allows you to plot data in forest-plot "style" in a very flexible way. For instance, you can re-label, add columns of text, add blank rows, change weights, box shapes, colours and so on. Depending on your exact use case and desires, either coefplot or forestplot might be the best program to use.

                          (Note: my suggestion to use forestplot rather than metan, implies that you can avoid the problems you mention above, of metan 's "default" behaviour not being what you want, and of being unsure how to proceed from there. Just go directly to the plotting!)
                          Last edited by David Fisher; 07 Feb 2022, 06:09.

                          Comment


                          • #14
                            Thanks David, I agree that flexibility is an issue with coefplot. I managed to solve the problem with metan, using the following code:

                            ​​​​​​postutil clear
                            postfile table str70 (var var_label cat_label) float (category or lb ub) using results, replace
                            local variables_of_interest Age_enc Gender_enc Region_enc Placeresidence_enc Education_enc Employment_enc Pre_existing_enc Politicalaffiliation_enc Trustgov_enc Trustadm_enc
                            foreach var of local variables_of_interest {
                            levelsof(`var'), local(levels`var')
                            foreach l of local levels`var' {
                            quietly logit Vax_kid_01 i.Treatchild_enc if children_018==1 & `var'==`l', or vce(cluster ResponseId)
                            matrix table=r(table)
                            post table ("`var'") (`"`:var label `var''"') (`"`:label (`var') `l''"') (`l') (table["b", "Vax_kid_01:1.Treatchild_enc"]) (table["ll", "Vax_kid_01:1.Treatchild_enc"]) (table["ul", "Vax_kid_01:1.Treatchild_enc"])
                            }
                            }
                            postclose table
                            use results, clear
                            list, noobs clean

                            metan or lb ub, nooverall nohet nobetween nosubgroup nosecsub forestplot(xlab(0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5)) null (1) label(namevar=cat_label) by(var_label)

                            Thank you all for the precious advice.

                            Comment

                            Working...
                            X