Clinical Trials and Stata: Creating graphs and computing statistics for subgroup interactions

Philip Jones

Join Date: Mar 2014

Posts: 104
#1

Clinical Trials and Stata: Creating graphs and computing statistics for subgroup interactions

07 Apr 2014, 18:22

Hello all,

I fear I may know the answer already, but I was wondering if anyone has heard of the following for Stata:

In clinical trials, it is common to analyze subgroups using a test for interaction. An example is attached here, where aspirin was compared to placebo on a composite outcome of mortality and non-fatal heart attacks. You can see that four subgroups were specified, and for each, the point estimate (and 95% CI) for each of the subgroups was compared, using a test for interaction, to the others.

Is there any automated way to define subgroups based on factor variables, compute a test for interaction, and plot the data nicely on such a graph? If not, can anyone suggest a good way of constructing such a figure in Stata?

Many thanks for your consideration!

Phil

1 Photo
Tags: clinical trials, RCT
David Fisher

Join Date: Apr 2014

Posts: 407
#2

08 Apr 2014, 04:27

Hi Phil,

I have recently developed a package of IPD meta-analysis commands, one of which is designed to do what you need. It won't give you the interaction p-values automatically, I'm afraid -- instead you'll need to test for each interaction in advance and save the p-values in macros (or similar). But it is certainly possible to have the p-values appear on the graph eventually.

The package is available from SSC and is called "ipdmetan" (ssc describe ipdmetan or ssc install ipdmetan from the command line), and the specific command you need is called ipdover.

Please feel free to contact me for help/queries if you think this command is suitable for your needs.

Many thanks,

David.
Comment
Philip Jones

Join Date: Mar 2014

Posts: 104
#3

08 Apr 2014, 05:11

David,

I will download today and try out. It sounds like exactly what I need. Thank you very much for writing it, and for bringing it to my attention.

Phil
Comment
Philip Jones

Join Date: Mar 2014

Posts: 104
#4

08 Apr 2014, 10:27

David,

I have downloaded the program, and although I'm certainly not a Stata expert, I don't consider myself a Stata novice, but I have to admit I am having a bit of difficulty getting the program to work.

Just so you know, I found a PPT presentation of yours on the web that I read, but it didn't address my problem (below). I also read the help file several times before writing this message, and tried lots of permutations of the command, but I came up empty. (I realize you state that examples are forthcoming, so my problems will probably be addressed with those files, when available.)

Let's say, for example, that I want to graph the mean and lower/upper 95% CIs of the mpg variable in the auto dataset, over the categories (2) of foreign.

My first attempt was

Code:

sysuse auto ipdover, over(foreign) : ci mpg

But I do not know how to pass which returned values I want to the ipdover program. Plus, I was very confused about the [exp_list] part of the command syntax.

Am I going about it the wrong way? Do I have to generate a dataset file with my pre-summarized variables, labelled as ES lci uci in order to get the program to work?

I apologize for my ignorance. I'm hoping it won't take you too long to point me in the right direction, as I feel I am missing something important that, once I understand it, I can move forward.

Thank you,

Phil
Comment
David Fisher

Join Date: Apr 2014

Posts: 407
#5

09 Apr 2014, 03:28

Hi Phil,

This is very useful feedback, thankyou. Because it originated from a meta-analysis program, the syntax takes a particular form which, while I don't think is limiting, may not be obvious. This is probably the part of my package that has been least tested by people other than myself.

ipdmetan and ipdover work best with e-class estimation commands (regress, logistic etc). With e-class programs there is no need for [exp_list] nor for pre-summarized variables etc.
For means, there is the command mean (singular) which is e-class. Hence, try the following:

Code:

sysuse auto, clear ipdover, over(foreign rep78) forest(nonull xlabel(10(5)40, force)) : mean mpg

(the forest() options just tidy up the plot a bit). I've added a second categorical variable, rep78, to further demonstrate how the command works.

I hope this is helpful, but please don't hesitate to ask further questions.

Thanks,

David.
Comment
Philip Jones

Join Date: Mar 2014

Posts: 104
#6

09 Apr 2014, 04:15

Ahh. That makes sense. Thank you for the clarification David. Great program which will prove very handy!
Comment
Jamie Marra

Join Date: Jan 2015

Posts: 7
#7

30 Apr 2015, 06:05

Hello: I am using IPD meta-anlaysis commands to generate a subgroup forest plot similar to the one Philip was trying to generate at the beginning of this thread. I am using ipdover in the following way:

ipdover, over(8 binary variables) over(1 subgroup variable) hr forest(nonull nooverall boxscale(0) xlabel(0.2(0.5)2, force)) : stcox i.indextype `demo' `clin' `meds2' `comorb'

where stcox is a multivariate model developed earlier and demo, clin, meds, and comorb are local macros for different varlists and indextype is a categorical variable (6 cats).

My issue is that every time I run this code, I get the following error:
[1.indextype] not found
post: above message corresponds to expression 6, variable _ES
r(111);

When I create my own dummy variable for indextype (namely indextype_1 through indextype_6) and use those in place of i.indextype in the stcox model, I get the following error instead:
no observations
r(2000);

However, when I use only one of the dummy variables like this:

ipdover, over(8 binary variables) over(1 subgroup) hr forest(nonull nooverall boxscale(0) xlabel(0.2(0.5)2, force)) : stcox indextype_3 `demo' `clin' `meds2' `comorb'

I get the graph (attached below) I'm looking for but it's obviously not correct because all of the indextype categories aren't included in the model. When I test stcox model in the full format (not within ipdover), I don't get any errors. I'm fairly new to Stata so maybe this is a very basic error I'm making but I would greatly appreciate some guidance.

Thank you,
Jamie

1 Photo

Last edited by Jamie Marra; 30 Apr 2015, 06:20.
Comment
David Fisher

Join Date: Apr 2014

Posts: 407
#8

30 Apr 2015, 10:29

Hi Jamie,

Could you give a little more detail about the structure of the plot you wish to generate? ipdover is designed to show how an effect estimate (e.g. the effect of a drug vs placebo) differs within different subgroups of the data. There are two "over()" options, so that the effect estimate may be shown within all combinations of two categorical variables.

It appears that you wish to show six different effect estimates within each subgroup, is that right? Or is it the effect of five categories each with respect to a reference category? Either way, unfortunately I suspect that yours is not a situation that ipdover can handle easily.

What your last line of code (... stcox indextype_3 ...) is doing is estimating the coefficient of indextype_3 within each combination of your "8 binary variables" and "1 subgroup", and plotting these.

There will undoubtedly be a way to achieve your aim, but it may require a bit of coding -- some combination of post and forestplot, maybe, or a series of calls to ipdover, nograph saving() then appending the saved datasets and tweaking them before running forestplot.

Could you possibly create a similar example using one of the standard Stata example datasets (e.g. auto.dta, as used by Philip earlier in this thread)? Then I could have a play around and see what might work best.

Thanks,

David.
Comment
Jamie Marra

Join Date: Jan 2015

Posts: 7
#9

30 Apr 2015, 10:58

Hi David,

That's actually very helpful. I had come to the same conclusion that I needed a series of calls but I hadn't thought to append the saved datasets and then run forestplot. In fact, I was about to abandon ipd and try manually using marginsplot! I'll try what you've suggested instead and if I run into more difficulties, I'll try to generate an auto.dta example.

Thank you kindly,
Jamie
Comment
David Fisher

Join Date: Apr 2014

Posts: 407
#10

06 Jul 2015, 03:41

For the benefit of future readers: I have posted an example of how to use repeated calls to ipdover followed by forestplot in the following post: http://www.statalist.org/forums/foru...es#post1301387
Comment
Philip Jones

Join Date: Mar 2014

Posts: 104
#11

06 Jul 2015, 03:49

Great news David. Continued thanks for your efforts!
Comment
Virginia Chiocchia

Join Date: Nov 2014

Posts: 8
#12

07 Dec 2016, 03:35

Hello,
I am using ipdover to create forest plots for subgroup interactions for the analysis of a clinical trial. I am using anova to look at the interaction between treatment (ArmRandomised) and another factor (Donor_Type) on a continuous outcome, (ln)PeakAST. My attempt is the following:

Code:

ipdover, over(ArmRandomised) over(Donor_Type) forest(nonull xlabel(0 1 2 5 10, force) astext(30)): anova lnPeakAST

And I get the forest plot below:

I would need some help with the following issues:
I don't know why I get the question marks in front of the figures in the "No. pts" column

What I would like to show in the forest plot is the mean difference (between treatment groups) in each Donor type subgroup, not each treatment effect in each subgroup as appears in the figure. A good example of what I am aiming at is the following forest plot http://bjp.rcpsych.org/content/bjprc...7/F2.large.jpg from MEEWISSE, JOHANNES B. et al. "Cortisol and post-traumatic stress disorder in adults" The BJPsych 2007, 191 (5) 387-392

I would also like, if possible, to show the mean and standard deviation of each treatment for each subgroup as displayed in this figure https://openi.nlm.nih.gov/imgs/512/3...eywords=cancer (ignoring the fact it's a meta analysis) from Barchitta M, et al. "LINE-1 hypomethylation in blood and tissue samples as an epigenetic marker for cancer risk: a systematic review and meta-analysis." PLoS ONE (2014)

I have already looked at the other posts on ipdover and forest plots for subgroup interactions but, despite they have been very helpful, they do not solve my issues above.
Any suggestions?

Many thanks,
Virginia
Comment

David Fisher

Join Date: Apr 2014
Posts: 407

#13

08 Dec 2016, 08:07

Hi Virginia,

Thanks for your message. I'll try to answer each of your issues separately:

- The question marks are, I think, the result of "special character" codes not being mapped consistently across different computer systems. They are supposed to represent non-breaking spaces. Could you try typing "ssc install ipdmetan, replace", as this should have been fixed in an update in June 2015.

- The way forward here is to think about the "estimation command", i.e. the command that follows the colon (":" symbol) in the ipdover command. If you wish to show a mean difference within each group, then your estimation command must calculate that mean difference. anova lnPeakAST will simply give you the overall mean of lnPeakAST. To find the mean difference between treatment groups, you need anova lnPeakAST ArmRandomised.

- That is possible, although it needs a little work to get there. Below is some annotated code that uses one of Stata's example datasets, available on the web, to simulate your own data:

I hope this is helpful; let me know if you have any additional questions.

Thanks,

David.

Code:

// load data (requires internet connection)
webuse lbw, clear

// this bit is not relevant to your own data;
// we are just creating an additional categorical variable to work with
recode age (min/19=1 "< 20 years") (20/max=2 ">= 20 years"), gen(agecat) label(agecat)
label var agecat "Age of mother"

// this is what you are currently doing
ipdover, over(agecat) over(smoke) ///
    forestplot(favours("Favours lower birthweight" "among smokers" # "Favours higher birthweight" "among smokers") ///
        nonull xlabel(0 1000 2000 3000 4000, force)) : anova bwt

// ...and this is what I suggest you should be doing, in my response to your second issue
ipdover, over(agecat) ///
    forestplot(favours("Favours lower birthweight" "among smokers" # "Favours higher birthweight" "among smokers")) ///
    : anova bwt smoke

// now we will work on adding the means and SDs to the plot, and also the interaction p-value
// (and also general tidying and making the graph more attractive)
gen bwt_smoke = bwt if smoke
gen bwt_nosmoke = bwt if !smoke
ipdover, over(agecat) nograph ///
    lcols((mean) bwt_smoke_mean=bwt_smoke (sd) bwt_smoke_sd=bwt_smoke (count) bwt_smoke_count=bwt_smoke %3.0f "Total" ///
        (mean) bwt_nosmoke_mean=bwt_nosmoke (sd) bwt_nosmoke_sd=bwt_nosmoke (count) bwt_nosmoke_count=bwt_nosmoke %3.0f "Total") ///
        saving(mydata, replace) ///
    : anova bwt smoke
// note the use of the "collapse"-style syntax for the "lcols" option; see "help collapse".
// we are using ipdover to generate a saved dataset which we can then manipulate before finally producing the forest plot

// obtain the interaction p-value and store it in a local macro
anova bwt smoke##agecat
local p_int = 2*normal(-abs(_b[1.smoke#2.agecat]/_se[1.smoke#2.agecat]))

// load the saved dataset
preserve
    use mydata, clear
    
    // generate the "mean (SD)" strings... for one "treatment" group (here, smokers vs non-smokers) ....
    gen bwt_smoke_str = string(bwt_smoke_mean, "%4.0f") + " (" + string(bwt_smoke_sd, "%3.0f") + ")"
    format bwt_smoke_str %-10s
    label var bwt_smoke_str "Smokers, Mean (SD)"

    // ... and for the other "treatment" group
    gen bwt_nosmoke_str = string(bwt_nosmoke_mean, "%4.0f") + " (" + string(bwt_nosmoke_sd, "%3.0f") + ")"
    format bwt_nosmoke_str %-10s
    label var bwt_nosmoke_str "Non-smokers, Mean (SD)"

    // create a string containing the interaction p-value and store it in the appropriate place
    gen interaction = "p = " + string(`p_int', "%05.3f") if _LEVEL==1
    format interaction %-11s
    label var interaction "p-value for interaction"
    
    // finally, produce the forest plot
    forestplot, lcols(bwt_smoke_str bwt_smoke_count bwt_nosmoke_str bwt_nosmoke_count) rcols(interaction) ///
        effect("Mean difference in bwt") astext(70) nowt
restore

Comment

Virginia Chiocchia

Join Date: Nov 2014

Posts: 8
#14

09 Dec 2016, 04:13

Hi David,

Thank you very much, that's very helpful.

I am trying the code you provided to add the means and SDs to the plot. Unfortunately, I get the following errors:

Code:

gen interaction = "p = " + string(`p_int', "%05.3f") if _LEVEL==1 type mismatch r(109);

and

Code:

forestplot, lcols(PeakAST_NMP_str PeakAST_NMP_count PeakAST_SCS_str PeakAST_SCS_count) rcols(interaction) /// effect("Mean difference in (ln)Peak AST") astext(70) nowt _USE not found r(111);

I am struggling to understand what they refer to. Any help?

Also, with regards to the interaction p-value, I noticed you run the anova with the interaction effect only without including main effects. However, as in my analysis I used the anova model including main effects as well as the interaction (this is how I have been thought), I used this one to obtain the interaction and store it int he macro. Do you think that's fine or does it create issues?

Hope that's clear but please let me know if you require further info.

Many thanks,
Virginia
Comment
Virginia Chiocchia

Join Date: Nov 2014

Posts: 8
#15

09 Dec 2016, 05:50

Apologies, David.
Please ignore my final comment about interaction and main effect in the anova model - I just realised the double hash mark gives both!

Please also ignore my comment about the second error ( r(111) ) - I realised I need to run the whole code from 'preserve' to 'restore' in one go for the forestplot command to work.

I am still getting the type mismatch error for the interaction p-value so for now I am only running the final forestplot without displaying the interaction p-value. However, there seems to be an issue when displaying the mean and sd for each subgroup in the forestplot - as you can see in the graph I attached, the means are all the same within the randomisation group and all the sd appear as 1. I think it's only an issue for displaying them in the graph because I checked the saved dataset in the data editor and the mean and sd for both groups look fine.

Please let me know if you need any further info.

Many thanks again for your help,
Virginia

Last edited by Virginia Chiocchia; 09 Dec 2016, 06:33.
Comment

Announcement