  • Adding standard error bars to a stripplot and bringing groups closer to each other

    I am graphing a continuous variable, let's say BMI, for two groups, let's say Black/African American and non-Black, each with a "before" and an "after" value. I have not yet figured out how to get both groups with their before and after values on one graph, so first I need to figure out how to do at least one. Since these data are unpublished, I am using dummy data. I started with -stripplot- since it seems to have more functionality than -dotplot-.

    This code works for generating the graph below:
    Code:
    stripplot  AgeAtEnrollment , over (BlackAficanAmerican) stack centre vertical refline(lw(medium black)) reflevel(mean)  xla(, noticks) xla(, nogrid) yla (,nogrid)
    I would like to add standard error bars, but I am starting with standard deviation bars to keep the code simple. I tried the -addplot- option with -rcap- but can't get it to work.

    Code:
    ci variances BMI if BlackAficanAmerican == 1, sd
    local b_lsdv = r(sd)*-1
    local b_usdv = r(sd)
    ci variances BMI if BlackAficanAmerican == 0, sd
    local w_lsdv = r(sd)*-1
    local w_usdv = r(sd)
    stripplot  BMI , over (BlackAficanAmerican) stack centre vertical refline(lw(medium black)) reflevel(mean)  xla(, noticks) xla(, nogrid) yla (,nogrid) addplot (rcap `b_lsdv' `b_usdv' BMI if BlackAficanAmerican==1) addplot (rcap `w_lsdv' `w_usdv' BMI if BlackAficanAmerican==0)
    Returns the error:
    Code:
    option addplot() not allowed
    r(198);
    Other lingering questions:
    1. How do I bring the two groups closer together? The blank space is not useful.
    2. How can I spread the dots out if they are overlapping? Like how -dotplot- with the -center- option functions. In my code above -center- does not seem to do this.
    3. Ultimately, I would like the "before" and "after" paired comparison within each factor (Non-black and Black) to be close to each other, but farther from the other paired comparison
    Any help is greatly appreciated!!

  • #2
    [Attached image: Graph.jpg]

    Sorry, I must have accidentally removed the graph when posting. This is the graph produced by the first section of code posted.

    • #3
      Cross posted to reddit: https://www.reddit.com/r/stata/comme...tripplot_and/?



      • #4
        stripplot is from SSC, as you are asked to explain (FAQ Advice #12).

        Your approach is doomed as twoway rcap requires variable names and you are feeding it individual numeric values. ci variances is surely wrong here as you want confidence intervals for the means. It's fine, indeed encouraged, to use fake data if your real data are confidential or sensitive, but we need to see them to be able to use them too.

        There is better news. The bar option for stripplot is already dedicated to showing confidence intervals.

        stack won't make a difference if each value is unique; you need to bin first.

        Here is an example of code for one continuous outcome, two binary predictors, and a need to bin before stacking. If you want mean +/- SE bars you need to adjust level() accordingly, as stripplot defaults to c(level)% confidence intervals, and c(level) itself defaults to 95.

        Code:
        webuse nlswork, clear 
        
        stripplot ln_wage in 1/200, over(c_city) by(collgrad) vertical bar stack width(0.1)



        • #5
          Thank you Dr. Cox! Very helpful as always!

          I added in the level() and mean() options. How can I add standard error (or standard deviation) bars other than by "manually" setting level(34), which is only a close approximation of the standard deviation and does not help with the SE?

          Code:
          stripplot  AgeAtEnrollment , over (BlackAficanAmerican) vertical bar(mean() level (34)) stack width(0.5) center



          • #6
            What you want is not supported directly. As documented the bar() option is for confidence intervals, but level(68) is to my mind an adequate approximation for means of continuous variables +/- SE (not 34).

            Otherwise you could, as before, use addplot(), but what you want to plot must (in your case) exist beforehand as variables in the dataset.
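
            As a minimal sketch of the level(68) suggestion, the earlier nlswork example can be rerun with level(68) placed inside bar(), so that the bars approximate mean +/- SE. This assumes stripplot is installed from SSC and that level() is accepted as a suboption of bar(), as in the command in #5; the /// line continuation works in a do-file, not at the interactive prompt.

            ```stata
            * assumes stripplot is installed: ssc install stripplot
            webuse nlswork, clear

            * bars are 68% confidence intervals for the mean, roughly mean +/- SE
            stripplot ln_wage in 1/200, over(c_city) by(collgrad) vertical ///
                bar(level(68)) stack width(0.1)
            ```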



            • #7
              Yes, I am happy to use addplot() with the example (second set of code) I used above. However, I get the error I listed above. How would I add in rcap with addplot()?
              Last edited by Jay Gold; 18 Oct 2023, 11:12.



              • #8
                #5 is already answered. The same code will yield the same error. You must create variables first for mean + SE and mean - SE.

                Sorry, I am not going to suggest code because you're just creating work for yourself when stripplot has built-in code for a very good approximation to what you want.

                As said, and now with emphasis, you could use addplot() -- but it's my fault if I wasn't clear in my implication that such a choice seems perverse.
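
                For readers who still want the addplot() route, here is a minimal sketch under stated assumptions, not a recommended method. It uses the auto data rather than the poster's data; the variable names mpg_mean, mpg_sd, mpg_n, mpg_se, upper, lower and first are made up for illustration; and it relies on stripplot placing each group at the numeric value of the over() variable on the x axis.

                ```stata
                * assumes stripplot is installed: ssc install stripplot
                sysuse auto, clear

                * per-group mean and standard error of the mean, stored as variables
                egen mpg_mean = mean(mpg), by(foreign)
                egen mpg_sd   = sd(mpg), by(foreign)
                egen mpg_n    = count(mpg), by(foreign)
                generate mpg_se = mpg_sd / sqrt(mpg_n)

                * limits for mean +/- SE
                generate upper = mpg_mean + mpg_se
                generate lower = mpg_mean - mpg_se

                * flag one observation per group so each bar is drawn once
                egen first = tag(foreign)

                * rcap takes two y variables followed by one x variable
                stripplot mpg, over(foreign) vertical stack width(1) ///
                    addplot(rcap upper lower foreign if first || ///
                            scatter mpg_mean foreign if first, ms(D))
                ```

                The key point is the one made above: rcap is fed variables (upper, lower, foreign), not individual numbers held in local macros.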



                • #9
                  Dr. Cox, I will sincerely never tire of your witty responses. Thank you for helping out the Stata community so much!



                  • #10
                    Hi, Dr. Cox. I am revisiting this issue I had. The level(68) worked well for adding in CI bars for +/- SE. But shouldn't level(68) be SD and not SE? Not sure I understand how 68% resulted in displaying the SE. 68% of the variation should be the SD.

                    Thanks!



                    • #11
                      I don't think so. For what you are talking about, CIs are mean +/- k SE of mean where k depends on the confidence level.

                      Here is an example showing that using a level of 68% gives close to mean + SE for the upper limit. I get 21.97 to 2 dp from both approaches. Even for a sample size of 74 SDs are much bigger.

                      Code:
                      . ci means mpg , level(68)
                      
                          Variable |        Obs        Mean    Std. err.       [68% conf. interval]
                      -------------+---------------------------------------------------------------
                               mpg |         74     21.2973    .6725511        20.62389    21.97071
                      
                      . ret li
                      
                      scalars:
                                        r(N) =  74
                                     r(mean) =  21.2972972972973
                                       r(se) =  .6725510870764975
                                       r(lb) =  20.62388679458369
                                       r(ub) =  21.97070780001091
                                    r(level) =  68
                      
                      macros:
                                   r(citype) : "normal"
                      
                      . di r(mean) + r(se)
                      21.969848
                      
                      . su mpg
                      
                          Variable |        Obs        Mean    Std. dev.       Min        Max
                      -------------+---------------------------------------------------------
                               mpg |         74     21.2973    5.785503         12         41
                      
                      . di r(mean) + r(sd)
                      27.082801
                      Official command dotplot offers bars that are mean +/- SD.

                      As always, the variability of a mean is less than the variability of the data once you have 2 or more data points.



                      • #12
                        Otherwise put, I know three common conventions for error bars

                        mean +/- SD (not a confidence interval)

                        mean +/- SE (an understated confidence interval, about 68% level if sampling distribution is normal or nearly so)

                        mean +/- k SE (k dependent on sample size as well as confidence level, but usually near 2 for 95% interval)
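
                        To make the three conventions concrete, here is a small sketch that computes each one for mpg. It assumes the example in #11 used the auto data shipped with Stata (74 observations of mpg) and takes k from the t distribution with n - 1 degrees of freedom; the local macro names are illustrative only.

                        ```stata
                        sysuse auto, clear
                        quietly summarize mpg
                        local m  = r(mean)
                        local sd = r(sd)
                        local n  = r(N)
                        local se = `sd' / sqrt(`n')

                        * multipliers k for 68% and 95% intervals, t distribution with n - 1 df
                        local k68 = invttail(`n' - 1, (1 - 0.68) / 2)
                        local k95 = invttail(`n' - 1, (1 - 0.95) / 2)

                        display "mean +/- SD: " `m' - `sd' " to " `m' + `sd'
                        display "mean +/- SE: " `m' - `se' " to " `m' + `se'
                        display "68% CI, k = `k68': " `m' - `k68' * `se' " to " `m' + `k68' * `se'
                        display "95% CI, k = `k95': " `m' - `k95' * `se' " to " `m' + `k95' * `se'
                        ```

                        With the numbers listed in #11, (r(ub) - r(mean)) / r(se) is about 1.0013, which is why the 68% limits sit so close to mean +/- SE.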

                        Oddly, or otherwise, the history of error bars seems undocumented. Most graph types started earlier than people often say (it's easy not to know of uses in some literature somewhere and somewhen that didn't catch on or didn't spread to your own sub-sub-field), but error bars may have started later than might be guessed.

