Adding standard error bars to a stripplot and bringing groups closer to each other

Jay Gold

Join Date: Jul 2023

Posts: 72
#1


17 Oct 2023, 20:48

I am graphing a continuous variable, let's say BMI, in two groups, let's say Black/African American and non-Black, each with a "before" and an "after" measurement. I have not yet worked out how to get both groups with their before and after values on one graph, but first I need to get at least one comparison working. Since these data are unpublished, I am using dummy data. I started with -stripplot- since it seems to have more functionality than -dotplot-.

This code works for generating the graph below:

Code:

stripplot AgeAtEnrollment , over (BlackAficanAmerican) stack centre vertical refline(lw(medium black)) reflevel(mean) xla(, noticks) xla(, nogrid) yla (,nogrid)

I tried adding standard error bars, starting with just the standard deviation for coding simplicity. I tried the -addplot- option with -rcap- but can't seem to get it to work.

Code:

ci variances BMI if BlackAficanAmerican == 1, sd
local b_lsdv = r(sd)*-1
local b_usdv = r(sd)
ci variances BMI if BlackAficanAmerican == 0, sd
local w_lsdv = r(sd)*-1
local w_usdv = r(sd)
stripplot BMI , over (BlackAficanAmerican) stack centre vertical ///
    refline(lw(medium black)) reflevel(mean) xla(, noticks) xla(, nogrid) yla (,nogrid) ///
    addplot (rcap `b_lsdv' `b_usdv' BMI if BlackAficanAmerican==1) ///
    addplot (rcap `w_lsdv' `w_usdv' BMI if BlackAficanAmerican==0)

Returns the error:

Code:

option addplot() not allowed
r(198);

Other lingering questions:
How do I bring the two groups closer together? The blank space between them is not useful.

How can I spread the dots out when they overlap, as -dotplot- does with its -center- option? In my code above, -centre- does not seem to do this.

Ultimately, I would like the "before" and "after" paired comparison within each factor (Non-black and Black) to be close to each other, but farther from the other paired comparison

Any help is greatly appreciated!!
Jay Gold

Join Date: Jul 2023

Posts: 72
#2

17 Oct 2023, 20:51

Sorry, I must have accidentally removed the graph when posting. This is the graph produced by the first section of code posted.
Attached Files
Jay Gold

Join Date: Jul 2023

Posts: 72
#3

18 Oct 2023, 06:26

Cross posted to reddit: https://www.reddit.com/r/stata/comme...tripplot_and/?
Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

18 Oct 2023, 08:54

stripplot is from SSC, as you are asked to explain (FAQ Advice #12).

Your approach is doomed as twoway rcap requires variable names and you are feeding it individual numeric values. ci variances is surely wrong here as you want confidence intervals for the means. It's fine, indeed encouraged, to use fake data if your real data are confidential or sensitive, but we need to see them to be able to use them too.

There is better news. The bar option for stripplot is already dedicated to showing confidence intervals.

stack won't make a difference if each value is unique; you need to bin first.

Here is an example of code for one continuous outcome, two binary predictors, and a need to bin before stacking. If you want mean +/- SE bars you need to adjust level() accordingly, as stripplot defaults to c(level) % confidence intervals, which itself defaults to 95.

Code:

webuse nlswork, clear
stripplot ln_wage in 1/200, over(c_city) by(collgrad) vertical bar stack width(0.1)
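To make "bin first" concrete, here is a rough sketch of binning done by hand before stacking. It assumes stripplot is installed from SSC, and it assumes that width(0.1) amounts to something like rounding to the nearest multiple of 0.1. That second point is a guess about the option's internals, so check the help file. The variable name ln_wage_bin is made up for illustration.

```stata
* Sketch only: manual binning so that tied values exist to be stacked
webuse nlswork, clear

* round to the nearest multiple of 0.1 (assumed to mimic width(0.1))
gen ln_wage_bin = round(ln_wage, 0.1)

* stack now has ties to work with, without needing width()
stripplot ln_wage_bin in 1/200, over(c_city) by(collgrad) vertical stack
```

In practice the width() option is the more convenient route, since the original variable and its label are left untouched.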
Jay Gold

Join Date: Jul 2023

Posts: 72
#5

18 Oct 2023, 09:36

Thank you Dr. Cox! Very helpful as always!

I added in the level() and mean() options. How can I add standard error (or standard deviation) bars other than by "manually" setting level(34), which is only a close approximation of the standard deviation and does not help with the SE?

Code:

stripplot AgeAtEnrollment , over (BlackAficanAmerican) vertical bar(mean() level (34)) stack width(0.5) center
Nick Cox

Join Date: Mar 2014

Posts: 35697
#6

18 Oct 2023, 09:52

What you want is not supported directly. As documented the bar() option is for confidence intervals, but level(68) is to my mind an adequate approximation for means of continuous variables +/- SE (not 34).
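As a sketch of that suggestion, the earlier nlswork example could be adapted as below. It assumes stripplot is installed from SSC and that level() is given inside bar(), as in the code in #5.

```stata
* Sketch only: requires stripplot from SSC (ssc install stripplot)
webuse nlswork, clear

* bar(level(68)) requests 68% confidence intervals for the means,
* which are roughly mean +/- 1 SE
stripplot ln_wage in 1/200, over(c_city) by(collgrad) vertical ///
    bar(level(68)) stack width(0.1)
```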

Otherwise you could as before use addplot() but what you want to plot must (in your case) exist beforehand as variables in the dataset.
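For readers who want to see what "exist beforehand as variables" could look like, here is a minimal sketch. It uses the auto data rather than the data in this thread. The variable names (mean_mpg, sd_mpg, n_mpg, se_hi, se_lo, pick) are made up for illustration. It also assumes that stripplot draws each group at the numeric value of the over() variable, so that rcap can use that variable for the x positions. That assumption should be checked against the stripplot help.

```stata
* Sketch only: requires stripplot from SSC; auto data stand in for real data
sysuse auto, clear

* group mean, SD and count as variables, constant within each group
egen mean_mpg = mean(mpg), by(foreign)
egen sd_mpg = sd(mpg), by(foreign)
egen n_mpg = count(mpg), by(foreign)

* mean +/- SE, with SE = SD / sqrt(n)
gen se_hi = mean_mpg + sd_mpg / sqrt(n_mpg)
gen se_lo = mean_mpg - sd_mpg / sqrt(n_mpg)

* tag one observation per group so each bar is drawn only once
egen pick = tag(foreign)

* assumption: groups sit at the values of foreign (0 and 1) on the x axis
stripplot mpg, over(foreign) vertical stack width(1) centre ///
    addplot(rcap se_hi se_lo foreign if pick)
```

The key difference from the code in #1 is that rcap is given variable names, not local macros holding single numbers.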
Jay Gold

Join Date: Jul 2023

Posts: 72
#7

18 Oct 2023, 11:10

Yes, I am happy to use addplot() with the example (second set of code) I used above. However, I get the error I listed above. How would I add in rcap with addplot()?

Last edited by Jay Gold; 18 Oct 2023, 11:12.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

18 Oct 2023, 11:43

#5 is already answered. The same code will yield the same error. You must create variables first for mean + SE and mean - SE.

Sorry, I am not going to suggest code because you're just creating work for yourself when stripplot has built-in code for a very good approximation to what you want.

As said, and now with emphasis, you could use addplot() -- but it's my fault if I wasn't clear in my implication that such a choice seems perverse.
Jay Gold

Join Date: Jul 2023

Posts: 72
#9

18 Oct 2023, 13:11

Dr. Cox, I will sincerely never tire of your witty responses. Thank you for helping out the Stata community so much!
Jay Gold

Join Date: Jul 2023

Posts: 72
#10

24 Aug 2024, 18:26

Hi, Dr. Cox. I am revisiting this issue I had. The level(68) worked well for adding in CI bars for +/- SE. But shouldn't level(68) be SD and not SE? Not sure I understand how 68% resulted in displaying the SE. 68% of the variation should be the SD.

Thanks!

Nick Cox

Join Date: Mar 2014
Posts: 35697

#11

25 Aug 2024, 01:23

I don't think so. For what you are talking about, CIs are mean +/- k SE of mean where k depends on the confidence level.

Here is an example showing that using a level of 68% gives close to mean + SE for the upper limit. I get 21.97 to 2 dp from both approaches. Even for a sample size of 74, SDs are much bigger.

Code:

. ci means mpg , level(68)

    Variable |        Obs        Mean    Std. err.       [68% conf. interval]
-------------+---------------------------------------------------------------
         mpg |         74     21.2973    .6725511        20.62389    21.97071

. ret li

scalars:
                  r(N) =  74
               r(mean) =  21.2972972972973
                 r(se) =  .6725510870764975
                 r(lb) =  20.62388679458369
                 r(ub) =  21.97070780001091
              r(level) =  68

macros:
             r(citype) : "normal"

. di r(mean) + r(se)
21.969848

. su mpg

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

. di r(mean) + r(sd)
27.082801

Official command dotplot offers bars that are mean +/- SD.

As always, the variability of a mean is less than the variability of the data once you have 2 or more data points.


Nick Cox

Join Date: Mar 2014

Posts: 35697
#12

25 Aug 2024, 02:45

Otherwise put, I know three common conventions for error bars

mean +/- SD (not a confidence interval)

mean +/- SE (an understated confidence interval, about 68% level if sampling distribution is normal or nearly so)

mean +/- k SE (k dependent on sample size as well as confidence level, but usually near 2 for 95% interval)
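The multiplier k in the last two conventions can be checked directly. This is a small sketch assuming the usual t-based interval for a mean with n - 1 degrees of freedom, with n = 74 chosen to match the mpg example in #11. The local macro name n is arbitrary.

```stata
* k for a two-sided interval at a given level, from the t distribution
local n = 74
display "68%: k = " invttail(`n' - 1, (100 - 68)/200)
display "95%: k = " invttail(`n' - 1, (100 - 95)/200)

* large-sample (normal) counterparts
display "68%: k = " invnormal(1 - (100 - 68)/200)
display "95%: k = " invnormal(1 - (100 - 95)/200)
```

The 68% values should come out close to 1 and the 95% values close to 2, which is why level(68) serves as a stand-in for mean +/- SE.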

Oddly, or otherwise, the history of error bars seems undocumented. Most graph types started earlier than people often say (it's easy not to know of uses in some literature somewhere and somewhen that didn't catch on or didn't spread to your own sub-sub-field), but error bars may have started later than might be guessed.
