Forest plot with coefficients and C.I. from different logistic regressions

Laura Leone

Join Date: Feb 2022

Posts: 32
#1

Forest plot with coefficients and C.I. from different logistic regressions

03 Feb 2022, 09:30

Dear Statalists,

I want to study the effect of a treatment on my outcome in different subpopulations of my sample (e.g. male-female, young-old) and I would like to represent the results in a single forest plots including odds ratio and 95% C.I. from all the univariate logistic regressions, to allow comparison. I thought about using metan command for the forest plot, but I would need to produce a dataset with variables' names, variables' categories names, odds ratios and C.I. to do it.
However, the most I could do so far was producing a matrix with numbers indicating categories of categorical variables, or and lower and upper bound for C.I. using the following code:

foreach var in age gender region placeresidence education {
levelsof(`var'), local(levels`var')
foreach l of local levels`var' {
logit y_kid_01 i.Treatchild_enc if children_018==1 & `var'==`l', or vce(cluster ResponseId)
matrix tbl=r(table)
matrix mytbl=nullmat(mytbl)\[`l',tbl["b","y_kid_01:1.Treatchild_enc"],tbl["ll","y_kid_01:1.Treatchild_enc"],tbl["ul","y_kid_01:1.Treatchild_enc"]]
}
}
matrix colname mytbl= category or ul ll
matrix list mytbl

Is there a way to add a column for variable names and to label the variables' categories? Or is there a better/more efficient way to plot the graph?
Thank you very much in advance!
Tags: None
Andrew Musau

Join Date: Oct 2014

Posts: 9944
#2

03 Feb 2022, 09:58

Look at coefplot from SSC which will require you to just store the estimates using estimates store.

Code:

help estimates store

Examples here: http://repec.sowi.unibe.ch/stata/coe...g-started.html
Comment
Laura Leone

Join Date: Feb 2022

Posts: 32
#3

03 Feb 2022, 10:34

Thank you Andrew! I thought about coefplot, but I am not sure the command is exactly applicable to my situation.
Namely, I would like to have the subgroups names as y axis labels rather and the treatments (there are 2 versions) as labels rather than viceversa, as it is now, but I do not think this is doable..
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 9944
#4

03 Feb 2022, 10:48

Can you post your current graph to illustrate what you mean?
Comment
Laura Leone

Join Date: Feb 2022

Posts: 32
#5

03 Feb 2022, 11:04

Sure, attached below you can find the graph I obtain with metan.
It plots the odds and C.I. of the univariate regressions
logit y_kid_01 i.Treatchild_enc if children_018==1 & `var'==`l', or vce(cluster ResponseId), where "var" are the independent variables (all factor variables) and "l" the categories. I wanted to represent treatment version 1 and treatment version 2 odds ratios in two different forest plots, so here are the effects for treatment version 1.
The problems here are the categories, which I am not able to label, and the fact that I am not able to associate the variable names.

Thank you again for your help and time!
Attached Files
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9944

03 Feb 2022, 13:42

With labeled categories, coefplot allows you to plot each categorical variable separately and label it. The following example illustrates, but you will need to adjust the code depending on the coefficients and CIs from your estimation.

Code:

sysuse auto, clear
set seed 02032022
gen depvar= runiformint(0,1)
lab def record  0 "zero" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five"
lab val rep78 record
gen mpglev = 1 + (mpg >= 20) + (mpg>=25) if !mi(mpg)
lab define mpglevels 1 "High mileage" 2 "Average" 3 "Economical"
lab values mpglev mpglevels
logit depvar i.rep78 i.mpglev i.foreign, or nolog
est sto m1

*SELECT X-AXIS POSITION TO SET THE MARKER LABELS
gen xpos = 45

*GRAPH
coefplot (m1,  keep(*.rep78) aseq(Repair record)\ m1, keep(*.mpglev) aseq(Mileage) ///
\ m1, keep(*.foreign) aseq(Origin)), drop(_cons) scheme(s1color) ///
mlabel(cond(@b==1, "", string(@b,"%9.3f") + "   " +"["+string(@ll,"%9.3f") + ///
" " +string(@ul,"%9.3f")+ "]" )) baselevels mlabcolor(none) mcolor(blue) ///
ciopts(lcolor(blue)) eform xline(1) xsc(r(-5 65)) addplot(scatter @at xpos, ///
ms(i) mlab(@mlbl) mlabcolor(black) mlabgap(3) mlabsize(small)) ///
graphr(margin(r=10))

Res.:

Code:

. logit depvar i.rep78 i.mpglev i.foreign, or nolog
note: 1.rep78 != 0 predicts failure perfectly
      1.rep78 dropped and 2 obs not used

note: 5.rep78 omitted because of collinearity

Logistic regression                             Number of obs     =         67
                                                LR chi2(6)        =       5.88
                                                Prob > chi2       =     0.4368
Log likelihood = -43.433751                     Pseudo R2         =     0.0634

------------------------------------------------------------------------------
      depvar | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
        one  |          1  (empty)
        two  |   3.940322   4.841931     1.12   0.264     .3544554     43.8028
      three  |   2.126185   2.109243     0.76   0.447     .3042162    14.86004
       four  |   .5850226    .512106    -0.61   0.540     .1052117     3.25298
       five  |          1  (omitted)
             |
      mpglev |
    Average  |   .9527521   .6214259    -0.07   0.941      .265332    3.421135
 Economical  |    .843976   .6107955    -0.23   0.815     .2043174     3.48622
             |
     foreign |
    Foreign  |   3.949955   3.193417     1.70   0.089     .8098819    19.26471
       _cons |   .4307538   .4107363    -0.88   0.377     .0664636    2.791738
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

Click image for larger version

Name: Graph.png
Views: 1
Size: 40.9 KB
ID: 1648222

Comment

Tiago Pereira

Join Date: Jan 2016

Posts: 365
#7

04 Feb 2022, 09:28

In metan, you check use the option bygroup() and group the estimates according to per-specified categories.
Comment
Laura Leone

Join Date: Feb 2022

Posts: 32
#8

04 Feb 2022, 10:48

Thank you very much Andrew! But do you think is possible to have as labels the subpopulation names (e.g. "female") instead of the name of the variable's categories as labels in the graph somehow?
Comment
Laura Leone

Join Date: Feb 2022

Posts: 32
#9

04 Feb 2022, 11:03

Thanks a lot Tiago, but I have a problem before that specifically in having a column with variables names and categories labels to use as groups..
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 9944

#10

04 Feb 2022, 15:27

Originally posted by Laura Leone View Post

Thank you very much Andrew! But do you think is possible to have as labels the subpopulation names (e.g. "female") instead of the name of the variable's categories as labels in the graph somehow?

That would be a bit complicated to code. I think you are on the right track to extract the coefficients. You can use frames to extract the coefficients and CIs from -rtable- then plot using your favorite graphing command. The following illustrates:

Code:

sysuse auto, clear
frame create myresults
frame create holding
tempfile holding
set seed 02032022
gen depvar= runiformint(0,1)
lab def record  0 "zero" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five"
lab val rep78 record
gen mpglev = 1 + (mpg >= 20) + (mpg>=25) if !mi(mpg)
lab define mpglevels 1 "High mileage" 2 "Average" 3 "Economical"
lab values mpglev mpglevels
logit depvar i.mpglev if foreign,  or nolog

foreach cat in 0.foreign 1.foreign 2.rep78 3.rep78 4.rep78{
    logit depvar i.mpglev if `cat',  or nolog
    mat res= (r(table)[1..6, 1...])'
    frame holding{
            clear
            set obs 100
            svmat res
            drop if missing(res1)
            gen category="`cat'"
            order category
            save `holding', replace
     }

    frame myresults: append using "`holding'"
}
frame drop holding
frame change myresults
rename  (res1- res6) (b se t pvalue ll ul)
gen which = ustrregexra(category, "(\d+\.)(\w+)", "$2")
gen coefficient=_n
bys which (coefficient): replace coefficient=_n
l, sepby(which)

Note that the coefficient number represents the ordering specified in your command line.

Res.:

Code:

.
. l, sepby(which)

     +---------------------------------------------------------------------------------------------------+
     |  category          b         se           t     pvalue         ll         ul     which   coeffi~t |
     |---------------------------------------------------------------------------------------------------|
  1. | 0.foreign          1          .           .          .          .          .   foreign          1 |
  2. | 0.foreign   .8571429     .55918   -.2362909   .8132069   .2386433   3.078628   foreign          2 |
  3. | 0.foreign   .6857143   .5601333    -.461883   .6441652   .1383024   3.399825   foreign          3 |
  4. | 0.foreign       .875   .3202172   -.3648772    .715203   .4270711   1.792734   foreign          4 |
  5. | 1.foreign          1          .           .          .          .          .   foreign          5 |
  6. | 1.foreign   1.333333   1.677741    .2286265   .8191592   .1132054   15.70401   foreign          6 |
  7. | 1.foreign         .8   .8763561   -.2037013    .838587   .0934642    6.84754   foreign          7 |
  8. | 1.foreign        1.5   1.369306    .4441648   .6569235   .2506422   8.976942   foreign          8 |
     |---------------------------------------------------------------------------------------------------|
  9. |   2.rep78          1          .           .          .          .          .     rep78          1 |
 10. |   2.rep78        1.2    1.71114    .1278597     .89826   .0733517   19.63144     rep78          2 |
 11. |   2.rep78         .4   .6572671   -.5576368   .5770924   .0159732   10.01675     rep78          3 |
 12. |   2.rep78        2.5    2.09165    1.095177   .2734391   .4850357   12.88565     rep78          4 |
 13. |   3.rep78          1          .           .          .          .          .     rep78          5 |
 14. |   3.rep78   .7272727   .5707628   -.4057776   .6849061    .156198   3.386251     rep78          6 |
 15. |   3.rep78   .7272727     .68324   -.3389771    .734627   .1153538   4.585248     rep78          7 |
 16. |   3.rep78      1.375   .6389077    .6853476   .4931246    .553074   3.418394     rep78          8 |
 17. |   4.rep78          1          .           .          .          .          .     rep78          9 |
 18. |   4.rep78   2.222222   2.348277    .7556441   .4498626   .2800928   17.63084     rep78         10 |
 19. |   4.rep78          1   1.032796   -4.02e-16          1   .1320939   7.570371     rep78         11 |
 20. |   4.rep78         .6    .438178   -.6994768   .4842541   .1433909    2.51062     rep78         12 |
     +---------------------------------------------------------------------------------------------------+

.

Comment

Andrew Musau

Join Date: Oct 2014

Posts: 9944
#11

05 Feb 2022, 08:10

In #10, the coefficients correspond to the categories, not the subpopulation so the penultimate line in the code should be:

Code:

bys category (coefficient): replace coefficient=_n

To give you a running count of 1 to 4 in this instance. You can subsequently label these with your variable names:

Code:

lab def varnames 1 "varname 1" 2 "varname 2" 3 "varname 3" 4 "varname 4" label values coefficient varnames
Comment
Laura Leone

Join Date: Feb 2022

Posts: 32
#12

07 Feb 2022, 05:25

Thank you Andrew very much, I am going to try out this code!
Comment
David Fisher

Join Date: Apr 2014

Posts: 407
#13

07 Feb 2022, 06:06

I think Andrew is probably on the right lines with his code for initial manipulation of the regression results. This sounds like it would be the most difficult part of your particular project.

Having done that, however, can I just point out that forestplot (from the metan package) allows you to plot data in forest-plot "style" in a very flexible way. For instance, you can re-label, add columns of text, add blank rows, change weights, box shapes, colours and so on. Depending on your exact use case and desires, either coefplot or forestplot might be the best program to use.

(Note: my suggestion to use forestplot rather than metan, implies that you can avoid the problems you mention above, of metan 's "default" behaviour not being what you want, and of being unsure how to proceed from there. Just go directly to the plotting!)

Last edited by David Fisher; 07 Feb 2022, 06:09.
1 like
Comment
Laura Leone

Join Date: Feb 2022

Posts: 32
#14

08 Feb 2022, 18:51

Thanks David, I agree that flexibility is an issue with coefplot. I managed to solve the problem with metan, using the following code:

postutil clear
postfile table str70 (var var_label cat_label) float (category or lb ub) using results, replace
local variables_of_interest Age_enc Gender_enc Region_enc Placeresidence_enc Education_enc Employment_enc Pre_existing_enc Politicalaffiliation_enc Trustgov_enc Trustadm_enc
foreach var of local variables_of_interest {
levelsof(`var'), local(levels`var')
foreach l of local levels`var' {
quietly logit Vax_kid_01 i.Treatchild_enc if children_018==1 & `var'==`l', or vce(cluster ResponseId)
matrix table=r(table)
post table ("`var'") (`"`:var label `var''"') (`"`:label (`var') `l''"') (`l') (table["b", "Vax_kid_01:1.Treatchild_enc"]) (table["ll", "Vax_kid_01:1.Treatchild_enc"]) (table["ul", "Vax_kid_01:1.Treatchild_enc"])
}
}
postclose table
use results, clear
list, noobs clean

metan or lb ub, nooverall nohet nobetween nosubgroup nosecsub forestplot(xlab(0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5)) null (1) label(namevar=cat_label) by(var_label)

Thank you all for the precious advice.
Comment

Announcement