  • When is STATA going to make graphing with error bars easy?

    I love STATA but I'm stunned they haven't made it simple to include error bars on bar graphs.
    Most if not all of the competition have easy error bars, with flexible options and effortless defaults.
    Why would STATA want to lag so far behind on something so utterly basic?
    Honestly if I had known this before investing in STATA I wouldn't even have purchased it to begin with.
    It really is a deal breaker.
    I assume they must eventually respond to this issue.
    Does anybody have any idea how long we'll have to wait?
    I'd rather climb the learning curve of some other software, maybe R or SAS, than wait years for STATA to get up to speed with this.





  • #2
    Short answer: It is there already.

    What have you tried thus far? What did you find difficult? What error bars do you want?
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      I agree with Maarten. These are mysterious statements.

      I assume that you want bar graphs for estimates and error bars showing say some multiple of each standard error or equivalently a confidence interval.

      The bigger deal here may be an issue raised often on Statalist and in other forums, namely that these are widely considered to be poor and uninformative plots, which (a) almost always show too little of the data and (b) emphasise comparisons with zero that often are of no use or interest. Dynamite plots and detonator plots are good search phrases.

      Be that as it may, here is some technique: Often, but not always, there is a two step:

      1. Prepare a results set.

      2. Plot it.

      Code:
      set scheme s1color 
      sysuse auto, clear
      statsby, by(foreign) : ci mean mpg 
      twoway bar mean foreign, xla(0 1, valuelabel tlc(none)) xsc(r(-0.2 1.2)) base(0) barw(0.3) ///
      || rspike mean ub foreign, pstyle(p1) ytitle(Miles per gallon) legend(off) yla(0(4)28, ang(h))
      There are good reasons why there isn't a single command to do this directly: it would need to cover many kinds of estimation.
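
      A minimal sketch of the same two step with the full interval drawn, assuming the variable names mean, lb and ub that statsby collects from the r() results of ci (an alternative to the code above, not a correction of it):

      Code:
      sysuse auto, clear
      * Step 1: one observation per group, holding mean, lb, ub and so on
      statsby, by(foreign) : ci mean mpg
      * Step 2: bars for the means, capped spikes from lower to upper limit
      twoway bar mean foreign, barw(0.3) base(0) ///
      || rcap lb ub foreign, pstyle(p1) xla(0 1, valuelabel) ///
      ytitle(Miles per gallon) legend(off)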

      The two step isn't the only way to do it: see also coefplot (Stata Journal), ciplot (SSC, more limited and no longer under development), etc.
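
      As a hedged illustration of the coefplot route (coefplot is user-written and must be installed first; the options chosen here are one plausible set, not the only one):

      Code:
      * ssc install coefplot
      sysuse auto, clear
      * estimate the group means, then plot them as bars with capped intervals
      mean mpg, over(foreign)
      coefplot, vertical recast(bar) barwidth(0.3) citop ciopts(recast(rcap)) ytitle(Miles per gallon)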

      See also https://www.statalist.org/forums/help#spelling on spelling "Stata" (and as a test of whether people do read the FAQ as we ask).



      • #4
        Perhaps it's because I'm a relative newbie to Stata myself, but I am somewhat sympathetic to Gregory's post. As an example, SPSS has several ways to directly plot means with various types of intervals (CI, +/- some number of SEs or SDs). Here are some examples using the same data Nick used in #3.

        Code:
        * The "legacy" GRAPH command in SPSS.
        GRAPH /ERRORBAR(CI 95)=mpg BY Foreign.
        GRAPH /ERRORBAR(STERROR 2)=mpg BY Foreign.
        GRAPH /ERRORBAR(STERROR 1)=mpg BY Foreign.
        GRAPH /ERRORBAR(STDDEV 1)=mpg BY Foreign.
        
        
        * The new-fangled "Chart Builder" allows greater flexibility (at the cost of more code).
        * Watch for the following functions:  MEANCI(), MEANSE(), MEANSD().
        
        * Plot mean & 95% CI.
        GGRAPH
          /GRAPHDATASET NAME="graphdataset" VARIABLES=Foreign MEANCI(mpg, 95)[name="MEAN_mpg"
            LOW="MEAN_mpg_LOW" HIGH="MEAN_mpg_HIGH"] MISSING=LISTWISE REPORTMISSING=NO
          /GRAPHSPEC SOURCE=INLINE.
        BEGIN GPL
          SOURCE: s=userSource(id("graphdataset"))
          DATA: Foreign=col(source(s), name("Foreign"), unit.category())
          DATA: MEAN_mpg=col(source(s), name("MEAN_mpg"))
          DATA: LOW=col(source(s), name("MEAN_mpg_LOW"))
          DATA: HIGH=col(source(s), name("MEAN_mpg_HIGH"))
          GUIDE: axis(dim(1), label("Foreign"))
          GUIDE: axis(dim(2), label("Mean mpg"))
          GUIDE: text.title(label("Simple Bar Mean of mpg by Foreign"))
          GUIDE: text.footnote(label("Error Bars: 95% CI"))
          SCALE: linear(dim(2), include(0))
          ELEMENT: interval(position(Foreign*MEAN_mpg), shape.interior(shape.square))
          ELEMENT: interval(position(region.spread.range(Foreign*(LOW+HIGH))), shape.interior(shape.ibeam))
        END GPL.
        
        * Plot mean & bars extending 2*SE below and above.
        GGRAPH
          /GRAPHDATASET NAME="graphdataset" VARIABLES=Foreign MEANSE(mpg, 2)[name="MEAN_mpg"
            LOW="MEAN_mpg_LOW" HIGH="MEAN_mpg_HIGH"] MISSING=LISTWISE REPORTMISSING=NO
          /GRAPHSPEC SOURCE=INLINE.
        BEGIN GPL
          SOURCE: s=userSource(id("graphdataset"))
          DATA: Foreign=col(source(s), name("Foreign"), unit.category())
          DATA: MEAN_mpg=col(source(s), name("MEAN_mpg"))
          DATA: LOW=col(source(s), name("MEAN_mpg_LOW"))
          DATA: HIGH=col(source(s), name("MEAN_mpg_HIGH"))
          GUIDE: axis(dim(1), label("Foreign"))
          GUIDE: axis(dim(2), label("Mean mpg"))
          GUIDE: text.title(label("Simple Bar Mean of mpg by Foreign"))
          GUIDE: text.footnote(label("Error Bars: +/- 2 SE"))
          SCALE: linear(dim(2), include(0))
          ELEMENT: interval(position(Foreign*MEAN_mpg), shape.interior(shape.square))
          ELEMENT: interval(position(region.spread.range(Foreign*(LOW+HIGH))), shape.interior(shape.ibeam))
        END GPL.
        
        * Plot mean & bars extending 1*SE below and above.
        GGRAPH
          /GRAPHDATASET NAME="graphdataset" VARIABLES=Foreign MEANSE(mpg, 1)[name="MEAN_mpg"
            LOW="MEAN_mpg_LOW" HIGH="MEAN_mpg_HIGH"] MISSING=LISTWISE REPORTMISSING=NO
          /GRAPHSPEC SOURCE=INLINE.
        BEGIN GPL
          SOURCE: s=userSource(id("graphdataset"))
          DATA: Foreign=col(source(s), name("Foreign"), unit.category())
          DATA: MEAN_mpg=col(source(s), name("MEAN_mpg"))
          DATA: LOW=col(source(s), name("MEAN_mpg_LOW"))
          DATA: HIGH=col(source(s), name("MEAN_mpg_HIGH"))
          GUIDE: axis(dim(1), label("Foreign"))
          GUIDE: axis(dim(2), label("Mean mpg"))
          GUIDE: text.title(label("Simple Bar Mean of mpg by Foreign"))
          GUIDE: text.footnote(label("Error Bars: +/- 1 SE"))
          SCALE: linear(dim(2), include(0))
          ELEMENT: interval(position(Foreign*MEAN_mpg), shape.interior(shape.square))
          ELEMENT: interval(position(region.spread.range(Foreign*(LOW+HIGH))), shape.interior(shape.ibeam))
        END GPL.
        
        
        * Plot mean & bars extending 1*SD below and above.
        GGRAPH
          /GRAPHDATASET NAME="graphdataset" VARIABLES=Foreign MEANSD(mpg, 1)[name="MEAN_mpg"
            LOW="MEAN_mpg_LOW" HIGH="MEAN_mpg_HIGH"] MISSING=LISTWISE REPORTMISSING=NO
          /GRAPHSPEC SOURCE=INLINE.
        BEGIN GPL
          SOURCE: s=userSource(id("graphdataset"))
          DATA: Foreign=col(source(s), name("Foreign"), unit.category())
          DATA: MEAN_mpg=col(source(s), name("MEAN_mpg"))
          DATA: LOW=col(source(s), name("MEAN_mpg_LOW"))
          DATA: HIGH=col(source(s), name("MEAN_mpg_HIGH"))
          GUIDE: axis(dim(1), label("Foreign"))
          GUIDE: axis(dim(2), label("Mean mpg"))
          GUIDE: text.title(label("Simple Bar Mean of mpg by Foreign"))
          GUIDE: text.footnote(label("Error Bars: +/- 1 SD"))
          SCALE: cat(dim(1), include("0", "1"))
          SCALE: linear(dim(2), include(0))
          ELEMENT: interval(position(Foreign*MEAN_mpg), shape.interior(shape.square))
          ELEMENT: interval(position(region.spread.range(Foreign*(LOW+HIGH))), shape.interior(shape.ibeam))
        END GPL.
        Note that all of these commands use the raw data.

        I'll not upload all of the graphs. Here are the plots showing means with 95% CIs. The first is from the GRAPH command, the second from GGRAPH (i.e., the Chart Builder).
        [Image: CI_plots_from_SPSS.png]
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)



        • #5
          I gather the issue is mostly related to products that promise to "create graphs automatically", without much reflection.

          Personally, I find pleasure and purpose when customizing the graphs I wish to produce in Stata.

          When comparing to R, as underlined in #1, it is quite a ladder to climb, for IMHO the commands (related to many packages) are somewhat cumbersome, at least for beginners.

          In the example below, the Stata command is just one - short - line, preceded by a "prep" command:

          Code:
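          * Load the example data, then reduce it to one mean and SD per age group.
          sysuse bplong.dta, clear
          collapse (mean) bp=bp (sd) sd=bp , by(agegrp)
          * Plot the means with bars extending one SD above and below.
          serrbar bp sd agegrp, xlabel(1 2 3) ytitle(Mean blood pressure (plus SD))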
          [Image: Graph_serrbar2.png]

          Last edited by Marcos Almeida; 20 Dec 2017, 05:23.
          Best regards,

          Marcos



          • #6
            I think Bruce's example shows that SPSS is a bit (no, a lot) simpler here. That's fine. I (we, presumably) don't mind other programs being easy to use when they are. I just care here about how easy Stata is to use and the original post I think is guilty of exaggeration on this point.

            It seems a bit untidy that you need collapse to produce sd (or semean, as Marcos could have shown) and ci to produce confidence intervals at specified confidence levels such as 95%, but dotplot, bar should be mentioned too.
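
            A rough sketch of both routes on the data Marcos used in #5. The variable name se is only illustrative; mean, lb and ub are the names statsby takes from the r() results of ci:

            Code:
            * Route 1: collapse for means and standard errors, then bars of +/- 2 SE
            sysuse bplong, clear
            collapse (mean) bp=bp (semean) se=bp, by(agegrp)
            serrbar bp se agegrp, scale(2) xlabel(1 2 3) ytitle("Mean blood pressure (+/- 2 SE)")

            * Route 2: ci via statsby for 95% confidence intervals
            sysuse bplong, clear
            statsby, by(agegrp) : ci mean bp, level(95)
            twoway rcap lb ub agegrp || scatter mean agegrp, xlabel(1 2 3) legend(off) ytitle("Mean blood pressure (95% CI)")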

            There is a larger and different issue that, on the whole, the users are ahead of the developers in programming what they want under this heading. I mentioned coefplot in #3. I could add stripplot (SSC). I suspect that the more statistically minded you are, the more you want confidence intervals. Straight estimate +/- se or estimate +/- 2 * se bars seem to be used for one or more of the following reasons: (a) tribal habit in some scientific fields; (b) they look better in understating uncertainty (especially +/- se); (c) they are approximations to more precisely calculated confidence intervals.

            I was curious about whether Marcos' example could be adapted to produce the execrable detonator or dynamite plots for consenting adults, and it can.

            Code:
            serrbar bp sd agegrp, xlabel(1 2 3) ytitle(Mean blood pressure (plus SD)) mvopts(recast(bar) barw(0.4) base(0) bfcolor(none)) yla(0(20)160)
            Incidentally, I am given as the author of serrbar in the manuals. This is mostly an example of StataCorp's generosity in attribution. There was a serrbar before me, and then I generalised it for some purpose -- I guess because someone wanted something it could not quite do -- and StataCorp (as now is) folded that back into the official code for Stata 6. Later still, StataCorp rewrote it again. User contributions like that are sticky.

            I am more a fan of the two step, results sets first and then graphics, as illustrated by Marcos and as discussed for example in http://www.stata-journal.com/sjpdf.h...iclenum=gr0045



            • #7
              Nick, as always, provided an insightful reply concerning the matter. No wonder, he guessed my thoughts right, even when remarking about the - se - I should have demonstrated as well, for I preferred to choose the SD in the example I shared.

              Actually, at first, my example presented both options, but I decided to avoid showing the SE for what I fear to be a somewhat misleading presentation in this particular case.

              The reason for this, also guessed correctly by Nick, is precisely my dislike of the SE being used in situations where the SD should be selected instead (well underlined in (a) and (b) in #6).

              I do believe we should not use SEs where SDs or CIs are better applicable, for that would be taken as "photoshopping" the results, possibly out of vested interests, who knows.

              Amazingly enough, when preparing my demonstration, I wondered for quite a while whether I could successfully - recast - the graphic as a bar graph, like the one Nick shared above. I fiddled with - recast - for a couple of minutes, but then, well, I failed. Besides, a great lesson came to mind, one I learned (in the link, at #5) - guess from whom - on the cons of the so-called dynamite plots. I became so convinced that I decided to give this potential deed (serrbar plus recast bar) a pass.

              All in all, this thread led to a rather interesting discussion, i.e., the appropriateness of the graphic shared the spotlight with its user-friendliness.
              Best regards,

              Marcos



              • #8
                Looking at SPSS's new "Chart Builder" syntax in post #4, compared to that of the Graph command, I wonder if Chart Builder wasn't created in response to user complaints along the lines of "When is SPSS going to make graphing more flexible?"



                • #9
                  William Lisowski: Leland (Lee) Wilkinson's work on "The Grammar of Graphics" is behind that.
                  http://www.springer.com/gb/book/9780387245447

                  Leland was for a time a Vice-President of SPSS.

                  The same ideas underlie ggplot2 in R.

                  My review at https://www.jstatsoft.org/article/view/v017b03 of Wilkinson's book failed completely to foresee the singular event of Hadley Wickham writing that (and what precedes and follows it). (It also misses the scope for a standard gamma distribution.)



                  • #10
                    Originally posted by William Lisowski
                    Looking at SPSS's new "Chart Builder" syntax in post #4, compared to that of the Graph command, I wonder if Chart Builder wasn't created in response to user complaints along the lines of "When is SPSS going to make graphing more flexible?"
                    You could be right, William.

                    I should add that prior to GGRAPH, SPSS introduced IGRAPH, which is now "deprecated". You can still find examples of it on the UCLA website and elsewhere though. The big advantage of both IGRAPH and GGRAPH is that they allowed the user to do things via code that could only be done via manual editing previously (e.g., adding a regression line to a scatter-plot).

                    Cheers,
                    Bruce
                    --
                    Bruce Weaver
                    Email: [email protected]
                    Version: Stata/MP 18.5 (Windows)



                    • #11
                      I think the common theme between this topic and one a few days earlier is the tension, for general-purpose statistical packages, between the conflicting goals of "make it easy to do what I need to do" and "make it possible to do whatever anyone needs to do". My hypothesized user complaint in post #8 played off the title of this topic to (weakly) make that point.

                      With that said, my thanks to Nick Cox for the reference and review. I agree with Nick's assessment that the 1980s were a golden time for statistical graphics, not the least because of the advent of powerful facilities for producing bitmapped graphics. At SAS, John Sall got a Mac and was so taken by it he developed JMP, of which I was an early adopter, not the least for its unsurpassed ability to do fully interactive, and even dynamic, graphics. It provided me tools for easily exploring data visually. Kempthorne once wrote that in the face of data from 100 subjects on 5 stimuli with 20 tests per stimulus given before and after, to get a feel for the 20,000 resulting numbers he "will do an analysis of variance on them" despite "having no idea of normal law theory with independent (mu, sigma squared) errors and all that rubbish" because "I just want to look at the numbers, get some feel for them." (My apologies for abridging a wonderful paragraph of pure Kempthorne.) JMP was the solution to that problem for my work at that time, allowing me to explore my data in a way that no graphics system that focuses on publication-quality graphics destined for two-dimensional static display could do. In a sense, they were the graphical analogue of Excel Pivot Tables, which have nothing in common with tabular presentation per se, but facilitate quick and easy summarization of multidimensional data in ways that help find the meaning in it. And in another sense, publication-quality graphics in routinized formats are the equivalent of regression results in routinized formats, with everything boiled down to a zero-to-three star representation, or to a set of boxes with a whisker sticking up from each.

                      So that's a long way around to saying that there's a whole missing dimension to the graphics debate (I don't know if Wilkinson addressed it, his publisher is not generous enough to show a table of contents): as this Statalist topic debates tradeoffs along the usability and flexibility axes, it misses the fact that we are embedded in more ways than one in Flatland, with no access to the dimension of "purpose" with "summary for publication" and "exploratory analysis" being two points on the purpose dimension.

                      He concludes, dismounting his soapbox and fading into the crowd at Speakers' Corner.



                      • #12
                        Thank you everybody for your detailed comments.
                        I really appreciate it and I didn't mean to offend anybody.
                        My primary experience drawing graphs for my job (bioinformatics) is with PRISM, which makes error bars trivial to add.
                        When you ask for a bar graph it asks you whether you want SD or SEM or whatever, and they're just that easy to add.
                        The menu-based graphing interfaces don't seem to offer that on bar graphs, when it seems like it would be so easy to do so.
                        My first impression from the posts is that STATA error bars are simple once you have considerable experience with graphing and the complex syntax.
                        Anyway I'm still working through the comments and I'll comment more later.
                        Thanks again!



                        • #13
                          Hello all,

                          I'm pleased with the output from serrbar; however, it would be nice to connect the dots on my plot. I know that "twoway connected" would do this, but I still want error bars. Is there a way to add a line connecting the means using serrbar?

                          I appreciate this post. Best, Brittany



                          • #14
                            Originally posted by Brittany Krzyzanowski
                            Hello all,

                            I'm pleased with the output from serrbar; however, it would be nice to connect the dots on my plot. I know that "twoway connected" would do this, but I still want error bars. Is there a way to add a line connecting the means using serrbar?

                            I appreciate this post. Best, Brittany
                            Welcome to Statalist! I think you're an alum of my institution. Anyway, the answer is yes:

                            Code:
                            webuse assembly
                            serrbar mean std date, scale(2) yline(195) mvopts(recast(connected))
                            With some graphing commands, you can tell the command to recast() the graph in another style. If you just add recast(connected) as an option, you won't get what you want (in fact, it doesn't even look remotely sensible). However, in the command, mvopts() designates options for the dots (that represent the means). You could change the color to hot pink with color(pink), you could make the dots as large as Stata can plot with msize(ehuge), or, more sensibly, you could tell Stata to recast that series as connected lines.
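
                            If you prefer the two step, a sketch along these lines should give a similar graph. The names lower and upper are invented here, and the multiplier 2 simply mirrors scale(2) above:

                            Code:
                            webuse assembly, clear
                            * limits at mean +/- 2 * std, matching scale(2) in the serrbar call
                            generate lower = mean - 2*std
                            generate upper = mean + 2*std
                            * capped spikes for the limits, connected markers for the means
                            twoway rcap lower upper date || connected mean date, yline(195) legend(off) ytitle("Mean")
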
                            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                            When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                            • #15
                              Thank you, Weiwen. I am just getting adjusted to Statalist. I will be sure to use the embed code feature in the future.
                              mvopts() worked perfectly to connect the lines. I'm holding off on the hot pink points for now, but I do believe it may come in handy in the future. Go Gophers! -Brittany
