  • Problems Introducing CI in Bar Chart

    Hey there, I am having a problem with -catplot- in producing descriptive statistics for two categorical variables (using Stata 15 for Windows).
    My data structure: One binary and one categorical variable, where var1 (0,1) and var2 (1, 2, 3, 4).
    I want to make something like the following bar chart, which has been made with -catplot-

    command: catplot var1, over(var2) percent(var1) asyvar recast(bar)
    [Image: Graph_example.png]

    I would now like to add 90% confidence intervals to the bars (the percentages of the categories of var2 within the two groups of var1). I was able to get the 90% confidence intervals for each category of var2 within the categories of var1 with -proportion-
    command: proportion var2, over(var1) level(90)
    Over        Proportion   Std. Error   90% CI lower   90% CI upper
    _prop_1
      var1.1    0.46666017   0.0110185    0.4485274      0.484764
      var1.2    0.5220399    0.0070236    0.5104759      0.53335803
    ...         ...          ...          ...            ...
    In the table above you can see the lower and upper 90% CI limits for the first two bars of the bar chart above.
    Can someone help me add these 90% confidence intervals to this graph? Or is it not possible to do this with the catplot command, and is there another way to do it?

    Your help is very much appreciated - thanks in advance!

  • #2
    I gather this text will be helpful to you.
    Best regards,

    Marcos



    • #3
      catplot is from SSC, as you are asked to explain (FAQ Advice #12). As used here it's a wrapper for graph bar, but there isn't any way to combine that with twoway rcap or twoway rspike, which is what you need for adding confidence intervals.

      For what you want, you probably need twoway bar and twoway rcap combined.
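
      A minimal sketch of that combination, using the auto data shipped with Stata rather than the poster's data (mpg and rep78 are chosen purely for illustration). It assumes Stata 14.1 or later for the ci means syntax: statsby collects the mean and 90% limits that ci means leaves in r(), and twoway then overlays bars and capped spikes.

      Code:
      sysuse auto, clear

      * one observation per rep78 category, with variables mean, lb and ub
      statsby, by(rep78) clear: ci means mpg, level(90)

      * bars for the means, capped spikes for the interval limits
      twoway bar mean rep78, barwidth(0.8) ///
      || rcap ub lb rep78, legend(off) ytitle("Mean mpg and 90% confidence intervals")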

      https://journals.sagepub.com/doi/pdf...867X1001000112 explains one way to approach confidence interval display.

      I would strongly recommend using evocative variable names, not the utterly colourless var1 and so forth.



      • #4
        Dear Marcos and dear Nick,
        thank you both very much for your replies. This is indeed extremely helpful!

        Best regards
        Patricia



        • #5
          I don't quite share Marcos Almeida's enthusiasm for https://stats.idre.ucla.edu/stata/fa...th-error-bars/

          It's perhaps obvious to discerning readers, but as shown there the confidence intervals do not show up well. Better to have stronger colours for the intervals and lighter colours for the bars (e.g. none at all). In fact, the much deprecated "dynamite plot" format (Google for discussions) can be avoided altogether, as can the distraction of arbitrary colours and the indirection of a legend.

          Also, as #3 already stated, you don't need such a "do-it-yourself" approach to confidence intervals.

          Here's an alternative.

          Code:
          use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear
          
          statsby, by(race ses) : ci mean write , level(90)
          
          twoway scatter race mean, yla(1/4, valuelabel tlength(0) ang(h))scheme(s1color) ///
          || rcap ub lb race, horizontal by(ses, subtitle(Mean writing scores by SES and race with 90% confidence intervals)note("") col(1) legend(off)) subtitle(, pos(9) nobox nobexpand) ytitle("") 
          [Image: meansandci.png]



          • #6
            Hey there, I have one more question about plotting the percentages and CIs.

            On my y-axis I do not want the mean of each category; I am interested in the percentage of each category within the groups of the binary variable (see graph in the first post).
            Example: With a binary variable for men (0) and women (1), what percentage of each group falls in the first, second, third and fourth category? (So that men sum to 100% and women sum to 100%.)

            Is there a way to do this with twoway bar and twoway rcap? Unfortunately everything I found referred to means and not to percentages within a group.

            Thanks again in advance for your help!

            Best regards,
            Patricia



            • #7
              Indeed. But you're still plotting means, as each percent is the mean of an indicator variable (scaled by 100, which is cosmetic here). And it's key not to mix two different problems. twoway bar etc is just for plotting and you still need to calculate the means and confidence intervals first.


              There may well be a simpler or better formulation, but this indicates some technique. (Using Jeffreys's method is partly a personal choice.)

              Code:
              sysuse auto, clear
              
              tab rep78, gen(rep78_)
              
              ci prop rep78_*
              
              gen mean = .
              gen ub = .
              gen lb = .
              gen work = .
              quietly forval j = 1/5 {
                  replace work = rep78 == `j' if rep78 < .
                  ci prop work , jeffreys  level(90)
                  replace mean = r(mean) if rep78 == `j'
                  replace ub = r(ub) if rep78 == `j'
                  replace lb = r(lb) if rep78 == `j'
              }
              
              collapse mean ub lb , by(rep78)
              
              set scheme s1color
              scatter mean rep78 || rcap ub lb rep78 , legend(off) ytitle(Percents and 90% confidence intervals) yla(0 0.1 "10" 0.2 "20" 0.3 "30" 0.4 "40" 0.5 "50", ang(h))

              Your problem is a little different but the mean of an indicator for being female gives you the mean of being male by subtraction (given your categories).

              As you're not giving a data example (FAQ Advice #12) we can't use it.
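
              For percentages of one variable within the groups of another, one possibility is to count cells and compute the limits by hand. The sketch below is hedged accordingly: foreign and rep78 from the auto data stand in for var1 and var2, n_group, p and se are made-up names, and the interval is a simple Wald (normal approximation) interval, not the Jeffreys interval used above, so the limits will differ somewhat, especially for small cells.

              Code:
              sysuse auto, clear

              * cell counts for each combination present; missing values dropped
              contract foreign rep78, nomiss

              * group totals, so that proportions sum to 1 within each group of foreign
              bysort foreign: egen n_group = total(_freq)
              generate p = _freq / n_group

              * Wald 90% limits: p +/- z * sqrt(p(1 - p)/n), z = invnormal(0.95),
              * truncated to the interval [0, 1]
              generate se = sqrt(p * (1 - p) / n_group)
              generate lb = max(0, p - invnormal(0.95) * se)
              generate ub = min(1, p + invnormal(0.95) * se)

              twoway bar p rep78, barwidth(0.8) ///
              || rcap ub lb rep78, by(foreign, legend(off) note("")) ///
              ytitle("Proportion within group and 90% limits")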
              Last edited by Nick Cox; 26 Aug 2019, 06:51.



              • #8
                This may be closer to your problem.

                Code:
                sysuse auto, clear
                
                statsby, by(rep78): ci prop foreign, level(90) 
                
                list 
                
                set scheme s1color 
                
                scatter mean rep78 || rcap ub lb rep78 , legend(off) ytitle(Percent foreign and 90% confidence intervals) yla(0 1 "100" 0.5 "50" 0.25 "25" 0.75 "75")



                • #9
                  Hello,

                  I just came across this and found it helpful to see that what I want to do can be done, but I need a bit more hand holding so would appreciate the help.

                  I have two binary independent variables ("role" = first, second; "informed" = yes, no) and a continuous dependent variable "dep".

                  I have used "graph bar" to plot the average of "dep" over(informed) over(role) to get something that looks like the first two column clusters in the graph from the first post.

                  I need to add confidence intervals, and I understand I can do twoway (graph bar) (rcap) but I do not know how to compute the rcap over two variables (informed and role) rather than one (as in the examples).

                  Could you please help me? Thanks a lot in advance!



                  • #10
                    #9 isn't really a new question as all the key points are already explained in the thread. Your use of a different dataset without a reference or data example doesn't allow illustrations with your data.

                    graph bar is useless for your purposes as you can't add confidence intervals, as already said.

                    #5 is already a worked example with one outcome and two predictors, exactly equivalent to what you want. If you strongly prefer bars to scatters, which is hard to understand, you need to call up twoway bar.

                    This code shows bar + error bar plots. It is #5 reworked to that effect.

                    Code:
                    use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear
                    
                    statsby, by(race ses) : ci mean write , level(90)
                    
                    twoway bar mean race, horizontal base(0) barw(0.8) yla(1/4, valuelabel tlength(0) ang(h)) ///
                    || rcap ub lb race, horizontal by(ses, subtitle(Mean writing scores by SES and race with 90% confidence intervals)note("") col(1) legend(off)) subtitle(, pos(9) nobox nobexpand) ytitle("")
                    It is a lousy plot compared with #5, as a lot of space and ink are expended on showing that the scores are not zero. Sometimes that is a point that needs to be made. dep is perfectly anonymous in #9, so speculation is futile on whether you need to do that. But I'd say that in my experience more than 90% of confidence interval graphs are better without bars starting at zero. The point is that usually the comparison of interest is of scores with other scores, not with zero.

                    If you need more hand-holding than this, please give a data example. https://www.statalist.org/forums/help#stata explains how to do that.



                    • #11
                      Hello,

                      Thank you very much for your help.

                      I have made progress (and I take the point you made about including more information such as my dataset and information regarding the nature of the variables).

                      I changed my variable names to PREF (No, Yes) and INFO (informed dict, informed recip). My dependent variable is a binary variable stating whether the participant was generous or not. This graph is standard for these types of analyses.

                      The graph on the left is what I need, but with confidence intervals.
                      The graph on the right is where I've got so far.

                      The INFO labels (informed dict, recip) used to be on the left hand side of the graphs, so when I move them manually, the two 'blocks' are separated.

                      For the left hand side figure, I used:
                      Code:
                      graph bar generous, over(pref) over(info) asyvars showyvars legend(off)
                      For the right hand side figure, I used the following (but did some edits by hand for the x-ticks and to move the "role" labels):
                      Code:
                      twoway bar mean pref, yla(0(0.2)0.6, valuelabel tlength(0) ang(h)) ///
                      || rcap ub lb pref, by(info, subtitle(TITLE)note("") col(2) legend(off)) subtitle(, pos(9) nobox nobexpand) ytitle("")
                      I was wondering whether I could have the INFO labels below the graphs by default, so that the space disappears. Or whether there's another way of joining the two panels on the right hand figure so that it looks closer to the one on the left?

                      Thank you

                      [Images: TARGET.JPG, CURRENT.JPG]



                      • #12
                        "I take the point you made about including more information such as my dataset"

                        Good, so please give your dataset, or an example.



                        • #13
                          This is my partial dataset:
                          Code:
                          info    pref    generous
                          Informed Dictator    Yes    0
                          Informed Recipient    No    0
                          Informed Dictator    No    0
                          Informed Recipient    No    0
                          Informed Dictator        1
                          Informed Recipient        1
                          Informed Dictator    Yes    0
                          Informed Recipient    No    0
                          Informed Dictator    Yes    0
                          Informed Recipient    No    0
                          Informed Dictator        0
                          Informed Recipient        0
                          Informed Dictator    No    1
                          Informed Recipient    No    1
                          Informed Dictator    Yes    0
                          Informed Recipient    No    0
                          Informed Dictator        0
                          Informed Recipient        0
                          Informed Dictator    Yes    0
                          Informed Recipient    Yes    0
                          Informed Dictator    Yes    1
                          Informed Recipient    Yes    1
                          Informed Dictator        0
                          Informed Recipient        0
                          Informed Dictator    Yes    0
                          Informed Recipient    No    0
                          Informed Dictator    No    0
                          Informed Recipient    No    0
                          Informed Dictator    No    0
                          Informed Recipient    No    0
                          Informed Dictator    Yes    0
                          Informed Recipient    No    0
                          And dataex says
                          input byte(info pref) float generous

                          I hope this helps!



                          • #14
                            Sorry, but I can't yet follow what is going on. dataex wouldn't produce output like that. Strings would be delimited by " ". It seems that you have four values in some observations (rows), and three in the others, but only three variables (columns) are named.



                            • #15
                              Hello,

                              Here's another attempt. Thanks for your patience!


                              Code:
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input byte(info pref) float generous
                              1 1 0
                              2 0 0
                              1 0 1
                              2 0 1
                              1 1 0
                              2 1 0
                              1 0 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 1
                              2 0 1
                              1 1 0
                              2 1 0
                              1 1 0
                              2 0 0
                              1 0 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 1
                              2 0 1
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 1 0
                              1 0 0
                              2 1 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 1 0
                              1 0 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 1 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 0 1
                              2 0 1
                              1 1 0
                              2 1 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 0 0
                              2 0 0
                              1 1 1
                              2 1 1
                              1 1 0
                              2 0 0
                              1 1 0
                              2 1 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 0 0
                              2 0 0
                              1 1 0
                              2 0 0
                              1 1 1
                              2 0 1
                              1 1 0
                              2 0 0
                              1 0 1
                              end
                              label values info info_label
                              label def info_label 1 "Informed Dictator", modify
                              label def info_label 2 "Informed Recipient", modify
                              label values pref pref_label
                              label def pref_label 0 "No", modify
                              label def pref_label 1 "Yes", modify
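
                              One possible way to get both blocks into a single panel, as asked in #11, is to drop by() and place the bars at hand-made x positions. This is only a sketch, run on the example data above: xpos and its spacing are made-up choices, and ci means gives t-based limits, which are a rough choice for a binary outcome (ci proportions would be the more natural command).

                              Code:
                              * assumes the example data above are in memory
                              statsby, by(info pref) clear: ci means generous, level(90)

                              * hypothetical x positions: 1 and 2 for info == 1, 4 and 5 for info == 2
                              generate xpos = pref + 1 + 3 * (info - 1)

                              * separate bar plots per pref give two colours; one rcap covers all four
                              twoway bar mean xpos if pref == 0, barwidth(0.8) ///
                              || bar mean xpos if pref == 1, barwidth(0.8) ///
                              || rcap ub lb xpos, lcolor(black) ///
                              xlabel(1.5 "Informed Dictator" 4.5 "Informed Recipient", noticks) ///
                              legend(order(1 "No" 2 "Yes")) xtitle("") ytitle("Proportion generous")

                              Labelling the midpoint of each pair of bars puts the INFO labels below the axis, so no gap between panels arises.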
