  • Bar graph of proportions with confidence intervals over categories

    Hi Statalist,

    I have a few binary variables (coded 0 and 1) and want to draw a bar graph of their proportions over a categorical variable z (which takes the values 1 and 2), with confidence intervals for each category.

    Here's an example of my dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(x1 x2 x4 x3 x6 x5 z)
    0 0 1 0 1 1 1
    1 1 1 1 1 0 1
    1 1 0 0 . . 1
    1 1 0 0 . . 1
    1 1 1 1 1 0 1
    1 1 0 1 . . 2
    1 1 1 1 1 1 1
    1 1 1 1 0 1 1
    1 1 1 1 1 1 2
    1 1 0 0 . . 1
    1 1 0 0 . . 1
    1 1 0 1 . . 1
    1 1 0 0 . . 1
    1 1 0 1 . . 2
    1 1 1 1 1 1 1
    1 1 0 1 . . 2
    1 1 1 1 1 0 1
    1 1 0 0 . . 2
    1 1 0 0 . . 1
    1 1 1 1 0 1 2
    1 1 1 1 1 1 2
    1 1 0 1 . . 2
    1 1 1 1 1 1 2
    0 0 1 0 1 0 2
    1 1 1 1 1 1 2
    end
    label values z z
    label def z 1 "no", modify
    label def z 2 "yes", modify
    The final graph I'm looking for is something like this:


  • #2
    I can see only an icon where you show your desired graph. Regardless, I want to flag strong arguments against what are often called dynamite, plunger or detonator plots.

    For propaganda against their use, see e.g.

    https://biostat.app.vumc.org/wiki/pu...de/Poster3.pdf

    https://simplystatistics.org/posts/2...lots-must-die/

    https://warwick.ac.uk/fac/sci/wdsi/e...es/plunger.pdf

    Here is one way to do it. Note in particular:

    1. what to do given missing values -- I made one choice

    2. how to calculate confidence intervals, particularly if means are at or near 1 or 0. I like the Jeffreys procedure, but the point is not to choose the default without strong grounds.

    3. the need for text better than x1 x2 x3 x4 x5 x6 z
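
    On point 2, here is a minimal sketch of why the interval method matters, assuming the example data from #1 are in memory. It runs each interval type offered by -ci proportions- for x4 with z == 1, restricted to observations with no missing values, where the observed proportion is 1. The choice of x4 and of that subset is only for illustration.

    ```stata
    * Assumption: x1-x6 and z from the dataex example in #1 are in memory.
    confirm variable x1 x2 x3 x4 x5 x6 z

    * Same subset, five interval types; -exact- (Clopper-Pearson) is the default.
    foreach method in exact wald wilson agresti jeffreys {
        display as text _n "interval type: `method'"
        ci proportions x4 if !missing(x1, x2, x3, x4, x5, x6) & z == 1, `method'
    }
    ```

    With an observed proportion of 1 the standard error is 0, so a Wald interval of the form p +/- z * se has zero width, while the other methods should still report a lower limit below 1.
    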



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(x1 x2 x4 x3 x6 x5 z)
    0 0 1 0 1 1 1
    1 1 1 1 1 0 1
    1 1 0 0 . . 1
    1 1 0 0 . . 1
    1 1 1 1 1 0 1
    1 1 0 1 . . 2
    1 1 1 1 1 1 1
    1 1 1 1 0 1 1
    1 1 1 1 1 1 2
    1 1 0 0 . . 1
    1 1 0 0 . . 1
    1 1 0 1 . . 1
    1 1 0 0 . . 1
    1 1 0 1 . . 2
    1 1 1 1 1 1 1
    1 1 0 1 . . 2
    1 1 1 1 1 0 1
    1 1 0 0 . . 2
    1 1 0 0 . . 1
    1 1 1 1 0 1 2
    1 1 1 1 1 1 2
    1 1 0 1 . . 2
    1 1 1 1 1 1 2
    0 0 1 0 1 0 2
    1 1 1 1 1 1 2
    end
    label values z z
    label def z 1 "no", modify
    label def z 2 "yes", modify
    
    * Flag observations with no missing values on any x variable (one choice among several)
    gen touse = !missing(x1, x2, x3, x4, x5, x6)
    
    * Use the first six observations to hold results, one row per x variable
    gen axis = _n in 1/6
    
    foreach v in mean ub lb {
        gen `v'1 = .
        gen `v'2 = .
    }
    
    forval j = 1/6 {
        * Proportion and Jeffreys interval for z == 1
        ci proportion x`j' if touse & z == 1, jeffreys
        quietly {
            replace mean1 = r(proportion) in `j'
            replace ub1 = r(ub) in `j'
            replace lb1 = r(lb) in `j'
        }
        
        * Proportion and Jeffreys interval for z == 2
        ci proportion x`j' if touse & z == 2, jeffreys
        quietly {
            replace mean2 = r(proportion) in `j'
            replace ub2 = r(ub) in `j'
            replace lb2 = r(lb) in `j'
        }
        * Build up the x axis label specification
        local call `call' `j' "x`j'"
    }
    
    * Offset the two groups slightly so that their intervals do not overlap
    gen axis1 = axis - 0.1
    gen axis2 = axis + 0.1
    
    twoway scatter mean1 axis1, mc(blue)  ms(O) || rcap lb1 ub1 axis1, lc(blue) ///
    || scatter mean2 axis2, mc(red) ms(Th) || rcap lb2 ub2 axis2, lc(red) ///
    xla(`call', tlc(none)) ytitle(means and 95% confidence intervals) ///
    note(Jeffreys procedure: choose your own) legend(order(1 "z is no" 3 "z is yes") pos(12) row(1))
    [Figure: manyci.png]

    Last edited by Nick Cox; 18 Jun 2024, 03:27.


    • #3
      Thanks Nick Cox. The mean I'm looking for is not the overall mean, but rather the mean for each category of z. Specifically, I mean the summation of x1 for z=1 divided by the total sum of z where z=1, and similarly for z=2 and the rest of the variables.


      • #4
        "not the overall mean, but rather the mean for each category of z"
        That's exactly what my code provides for each variable in turn. Did you look at the code carefully or check its results independently? Note that two displays are presented for each x variable.

        "the summation of x1 for z=1 divided by the total sum of z where z=1, and similarly for z=2 and the rest of the variables."
        I don't follow what you're saying there.

        You shouldn't want the sum of z to enter any calculation of means for the x variable. A clear counter-example is that you'd be dividing by 0 if z were coded 0 for either case. Perhaps when you say sum you mean frequency.
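
        A minimal sketch of the distinction, assuming the example data from #1 are in memory. It compares dividing the number of 1s by the frequency of z == 2 with dividing by the sum of z over the same observations. The local macro names are only illustrative.

        ```stata
        * Assumption: x1 and z from the dataex example in #1 are in memory.
        confirm variable x1 z

        quietly count if z == 2 & !missing(x1)
        local freq = r(N)                  // number of observations with z == 2

        quietly summarize z if z == 2 & !missing(x1)
        local sumz = r(sum)                // sum of z, which is 2 * frequency here

        quietly summarize x1 if z == 2
        local ones = r(sum)                // number of 1s, because x1 is coded 0/1

        display "ones / frequency (the mean) = " `ones' / `freq'
        display "ones / sum of z             = " `ones' / `sumz'
        display "mean from -summarize-       = " r(mean)
        ```

        Because z is coded 2 for this group, the sum of z is twice the frequency, so the second ratio is half the mean. Only the first ratio matches what -summarize- reports.
        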


        • #5
          Thanks Nick Cox

          I meant the frequency of each category of z. In the dataex above there are n = 25 observations in total: 14 with z = 1 and 11 with z = 2. So, I'm looking for this:

          proportion of x1 for z = 1 = (number of observations with x1 = 1 and z = 1) / 14 = 13 / 14
          proportion of x1 for z = 2 = (number of observations with x1 = 1 and z = 2) / 11 = 10 / 11

          and similarly for all the other variables.


          • #6
            Again, I think the code should do what you are asking for, or at least what I take to be standard when calculating any means.

            As already flagged in #2, you could make a different decision from mine about observations with missing values.

            Code:
            . * Example generated by -dataex-. For more info, type help dataex
            . clear
            
            . input byte(x1 x2 x4 x3 x6 x5 z)
            
                       x1        x2        x4        x3        x6        x5         z
              1. 0 0 1 0 1 1 1
              2. 1 1 1 1 1 0 1
              3. 1 1 0 0 . . 1
              4. 1 1 0 0 . . 1
              5. 1 1 1 1 1 0 1
              6. 1 1 0 1 . . 2
              7. 1 1 1 1 1 1 1
              8. 1 1 1 1 0 1 1
              9. 1 1 1 1 1 1 2
             10. 1 1 0 0 . . 1
             11. 1 1 0 0 . . 1
             12. 1 1 0 1 . . 1
             13. 1 1 0 0 . . 1
             14. 1 1 0 1 . . 2
             15. 1 1 1 1 1 1 1
             16. 1 1 0 1 . . 2
             17. 1 1 1 1 1 0 1
             18. 1 1 0 0 . . 2
             19. 1 1 0 0 . . 1
             20. 1 1 1 1 0 1 2
             21. 1 1 1 1 1 1 2
             22. 1 1 0 1 . . 2
             23. 1 1 1 1 1 1 2
             24. 0 0 1 0 1 0 2
             25. 1 1 1 1 1 1 2
             26. end
            
            . label values z z
            
            . label def z 1 "no", modify
            
            . label def z 2 "yes", modify
            
            . 
            . gen touse = !missing(x1, x2, x3, x4, x5, x6)
            
            . 
            . gen axis = _n in 1/6 
            (19 missing values generated)
            
            . 
            . foreach v in mean ub lb n { 
              2.         gen `v'1 = . 
              3.         gen `v'2 = . 
              4. }
            (25 missing values generated)
            (25 missing values generated)
            (25 missing values generated)
            (25 missing values generated)
            (25 missing values generated)
            (25 missing values generated)
            (25 missing values generated)
            (25 missing values generated)
            
            . 
            . forval j = 1/6 { 
              2.         ci proportion x`j' if touse & z == 1, jeffreys  
              3.         quietly { 
              4.                 replace n1 = r(N) in `j'
              5.                 replace mean1 = r(proportion) in `j' 
              6.                 replace ub1 = r(ub) in `j'
              7.                 replace lb1 = r(lb) in `j'
              8.         }
              9.         
            .         ci proportion x`j' if touse & z == 2, jeffreys  
             10.         quietly { 
             11.                 replace n2 = r(N) in `j'
             12.                 replace mean2 = r(proportion) in `j' 
             13.                 replace ub2 = r(ub) in `j'
             14.                 replace lb2 = r(lb) in `j'
             15.         }
             16.         local call `call' `j' "x`j'"  
             17. }
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x1 |          7    .8571429      .13226        .4992025    .9841235
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x1 |          6    .8333333    .1521452        .4419432    .9813799
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x2 |          7    .8571429      .13226        .4992025    .9841235
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x2 |          6    .8333333    .1521452        .4419432    .9813799
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x3 |          7    .8571429      .13226        .4992025    .9841235
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x3 |          6    .8333333    .1521452        .4419432    .9813799
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x4 |          7           1           0        .7075638           1
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x4 |          6           1           0        .6696111           1
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x5 |          7    .5714286    .1870439        .2345012    .8611358
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x5 |          6    .8333333    .1521452        .4419432    .9813799
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x6 |          7    .8571429      .13226        .4992025    .9841235
            
                                                                           Jeffreys      
                Variable |        Obs  Proportion    Std. err.       [95% conf. interval]
            -------------+---------------------------------------------------------------
                      x6 |          6    .8333333    .1521452        .4419432    .9813799
            
            . 
            . list n1 mean1 lb1 ub1 n2 mean2 lb2 ub2 in 1/6 
            
                 +---------------------------------------------------------------------------+
                 | n1      mean1        lb1        ub1   n2      mean2        lb2        ub2 |
                 |---------------------------------------------------------------------------|
              1. |  7   .8571429   .4992025   .9841235    6   .8333333   .4419432   .9813799 |
              2. |  7   .8571429   .4992025   .9841235    6   .8333333   .4419432   .9813799 |
              3. |  7   .8571429   .4992025   .9841235    6   .8333333   .4419432   .9813799 |
              4. |  7          1   .7075638          1    6          1   .6696111          1 |
              5. |  7   .5714286   .2345012   .8611358    6   .8333333   .4419432   .9813799 |
                 |---------------------------------------------------------------------------|
              6. |  7   .8571429   .4992025   .9841235    6   .8333333   .4419432   .9813799 |
                 +---------------------------------------------------------------------------+
            
            . 
            . su x? 
            
                Variable |        Obs        Mean    Std. dev.       Min        Max
            -------------+---------------------------------------------------------
                      x1 |         25         .92    .2768875          0          1
                      x2 |         25         .92    .2768875          0          1
                      x4 |         25         .52     .509902          0          1
                      x3 |         25         .64    .4898979          0          1
                      x6 |         13    .8461538    .3755338          0          1
            -------------+---------------------------------------------------------
                      x5 |         13    .6923077    .4803845          0          1
            
            . 
            . su x? if touse 
            
                Variable |        Obs        Mean    Std. dev.       Min        Max
            -------------+---------------------------------------------------------
                      x1 |         13    .8461538    .3755338          0          1
                      x2 |         13    .8461538    .3755338          0          1
                      x4 |         13           1           0          1          1
                      x3 |         13    .8461538    .3755338          0          1
                      x6 |         13    .8461538    .3755338          0          1
            -------------+---------------------------------------------------------
                      x5 |         13    .6923077    .4803845          0          1


            • #7
              Nick Cox - mean1 and mean2 for x4 are both equal to 1, which shouldn't be the case. Here's the tab:

              Code:
              . tab x4 z
              
                         |           z
                      x4 |        no        yes |     Total
              -----------+----------------------+----------
                       0 |         7          5 |        12 
                       1 |         7          6 |        13 
              -----------+----------------------+----------
                   Total |        14         11 |        25
              So mean1 should be 7/14 = .5 and mean2 should be 6/11 = .55


              • #8
                Here's the rest of the tabs:


                Code:
                forval i = 1/6 {
                    tab x`i' z
                }
                           |           z
                        x1 |        no        yes |     Total
                -----------+----------------------+----------
                         0 |         1          1 |         2 
                         1 |        13         10 |        23 
                -----------+----------------------+----------
                     Total |        14         11 |        25 
                         
                for x1:         
                mean1 should be 13/14 = .93 (while the calculated mean1 is different)
                mean2 should be 10/11 = .91 (the calculated mean2 is different)
                
                           |           z
                        x2 |        no        yes |     Total
                -----------+----------------------+----------
                         0 |         1          1 |         2 
                         1 |        13         10 |        23 
                -----------+----------------------+----------
                     Total |        14         11 |        25 
                
                         
                for x2:         
                mean1 should be 13/14 
                mean2 should be 10/11
                
                
                           |           z
                        x3 |        no        yes |     Total
                -----------+----------------------+----------
                         0 |         7          2 |         9 
                         1 |         7          9 |        16 
                -----------+----------------------+----------
                     Total |        14         11 |        25 
                
                         
                for x3:         
                mean1 should be 7/14
                mean2 should be 9/11
                
                           |           z
                        x4 |        no        yes |     Total
                -----------+----------------------+----------
                         0 |         7          5 |        12 
                         1 |         7          6 |        13 
                -----------+----------------------+----------
                     Total |        14         11 |        25 
                         
                         
                for x4:
                mean1 should be 7/14
                mean2 should be 6/11
                
                           |           z
                        x5 |        no        yes |     Total
                -----------+----------------------+----------
                         0 |         3          1 |         4 
                         1 |         4          5 |         9 
                -----------+----------------------+----------
                     Total |         7          6 |        13 
                         
                         
                for x5:
                mean1 should be 4/7
                mean2 should be 5/6
                
                           |           z
                        x6 |        no        yes |     Total
                -----------+----------------------+----------
                         0 |         1          1 |         2 
                         1 |         6          5 |        11 
                -----------+----------------------+----------
                     Total |         7          6 |        13 
                         
                         
                for x6:
                mean1 should be 6/7
                mean2 should be 5/6
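
                The per-group fractions listed above can be checked in one call. This is a minimal sketch, assuming the example data from #1 are in memory. By default -tabstat- uses all available observations for each variable, and its -casewise- option restricts every variable to observations with no missing values on any of them.

                ```stata
                * Assumption: x1-x6 and z from the dataex example in #1 are in memory.
                * Variables are listed explicitly because their order in the dataset
                * is x1 x2 x4 x3 x6 x5, so the range x1-x6 would skip x5.
                confirm variable x1 x2 x3 x4 x5 x6 z

                * Means and counts by z, using all available observations per variable
                tabstat x1 x2 x3 x4 x5 x6, by(z) statistics(mean n) nototal

                * Same, but only for observations complete on all six variables
                tabstat x1 x2 x3 x4 x5 x6, by(z) statistics(mean n) nototal casewise
                ```

                The first table should agree with the fractions from the cross-tabulations and the second with the results in #6. The difference between them is the treatment of missing values.
                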


                • #9
                  The difference you're highlighting was flagged at the outset in #2 and again in #6.

                  I emphasised in #2, and my code makes explicit, that I calculated results only for observations for which all x variables are not missing.

                  I also emphasised that that was a choice.

                  Again, the results in #6 show calculations both for all possible values and only for observations with all variables not missing.

                  So, you're highlighting what has already been highlighted. If you want a different choice, feel free to write different code.
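
                  A minimal sketch of one such different choice, assuming the example data from #1 are in memory: the touse condition is dropped, so each variable uses all of its own nonmissing observations. The results variable names follow #2, and -preserve- and -restore- are used only to avoid clashing with variables that may already exist.

                  ```stata
                  * Assumption: x1-x6 and z from the dataex example in #1 are in memory.
                  confirm variable x1 x2 x3 x4 x5 x6 z

                  preserve
                  keep x1 x2 x3 x4 x5 x6 z    // drop any results variables left over from #2

                  forval k = 1/2 {
                      foreach v in n mean lb ub {
                          quietly gen `v'`k' = .
                      }
                  }

                  forval j = 1/6 {
                      forval k = 1/2 {
                          * No touse condition: only missing values of the variable
                          * being analysed are excluded, not incomplete observations.
                          quietly ci proportions x`j' if z == `k', jeffreys
                          quietly replace n`k' = r(N) in `j'
                          quietly replace mean`k' = r(proportion) in `j'
                          quietly replace lb`k' = r(lb) in `j'
                          quietly replace ub`k' = r(ub) in `j'
                      }
                  }

                  * One row per x variable: counts, proportions and Jeffreys limits by z
                  list n1 mean1 lb1 ub1 n2 mean2 lb2 ub2 in 1/6

                  restore
                  ```

                  With this choice the listed proportions should match the fractions in #8, for example 13/14 and 10/11 for x1, at the cost of different variables being based on different subsets of observations.
                  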


                  • #10
                    Thanks, Nick. How can I calculate the overall proportions and add their values at the top of the graph?


                    • #11
                      I think you've had all the code ideas you need, so I am going to bail out here.


                      • #12
                        Nick, I have highlighted here what I want to add to the graph: the overall proportions.


                        • #13
                          As in #11, I have given you some ideas and have now stopped contributing.

                          If anybody else wants to step in, good for them and good news for you.

                          As in #2, the graph you've posted is not visible to me and quite possibly not visible to anyone. To be visible here, a graph image should be defined by a .png file.
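
                          For anyone who does step in, here is a rough sketch of one possible reading of the request in #10, assuming the example data from #1 are in memory. Since the desired graph is not visible, "at the top" is taken here to mean text labels above each pair of intervals, which is a guess. The variables overall, ytop and overall_txt are hypothetical names, and each variable uses all of its own nonmissing observations.

                          ```stata
                          * Assumptions: x1-x6 and z from the dataex example in #1 are in memory;
                          * overall proportions are wanted as text above each pair of intervals.
                          confirm variable x1 x2 x3 x4 x5 x6 z

                          preserve
                          keep x1 x2 x3 x4 x5 x6 z

                          quietly gen axis = _n in 1/6
                          quietly gen overall = .
                          forval k = 1/2 {
                              foreach v in mean lb ub {
                                  quietly gen `v'`k' = .
                              }
                          }

                          local call
                          forval j = 1/6 {
                              forval k = 1/2 {
                                  quietly ci proportions x`j' if z == `k', jeffreys
                                  quietly replace mean`k' = r(proportion) in `j'
                                  quietly replace lb`k' = r(lb) in `j'
                                  quietly replace ub`k' = r(ub) in `j'
                              }
                              * Overall proportion: mean of the 0/1 variable over both groups
                              summarize x`j', meanonly
                              quietly replace overall = r(mean) in `j'
                              local call `call' `j' "x`j'"
                          }

                          quietly gen axis1 = axis - 0.1
                          quietly gen axis2 = axis + 0.1
                          quietly gen ytop = 1.05 in 1/6      // vertical position of the text labels
                          quietly gen overall_txt = strofreal(overall, "%4.2f") in 1/6

                          twoway scatter mean1 axis1, mc(blue) ms(O) || rcap lb1 ub1 axis1, lc(blue) ///
                              || scatter mean2 axis2, mc(red) ms(Th) || rcap lb2 ub2 axis2, lc(red) ///
                              || scatter ytop axis, msymbol(none) mlabel(overall_txt) ///
                                 mlabposition(0) mlabcolor(black) ///
                              xla(`call', tlc(none)) yla(0(0.2)1) xtitle("") ///
                              ytitle(proportions and 95% confidence intervals) ///
                              note("Jeffreys intervals; numbers at top are overall proportions") ///
                              legend(order(1 "z is no" 3 "z is yes") pos(12) row(1))

                          restore
                          ```

                          The labels are drawn by an invisible scatter plot whose marker labels carry the formatted proportions, so moving them only requires changing ytop.
                          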
