  • Creating bar chart with multiple binary variables

    Hi, I am attempting to create a bar chart in Stata that shows proportions for several binary variables. Each is a different measure of whether or not a medical provider offered a certain service at a health visit (yes/no). I've put together a reproducible example below that is similar to my dataset, along with the code I have tried so far. Is there any way to show a bar for both yes and no for each indicator? I am also trying to relabel the variable names on the y axis, but this code yields an invalid 'ylabel' error. I'm a new Stata user and this is my first attempt at posting a reproducible example, so please let me know if this isn't clear!

    Code:
    set obs 10
    egen id = seq(), from(1) to(10)
    gen fp = cond(mod(_n, 2), 0, 1)
    gen cervical = cond(mod(_n, 4), 0, 1)
    gen immunization = cond(mod(_n, 6), 0, 1)
    lab define fp 0 "No" 1 "Yes"
    lab define cervical 0 "No" 1 "Yes"
    lab define immunization 0 "No" 1 "Yes"
    lab values fp fp 
    lab values cervical cervical
    lab values immunization immunization
    
    statplot fp cervical immunization, varopts(sort(1)) blabel(bar), ylabel(fp "Provider discussed family planning with mother" cervical "Provider offered cervical cancer screening to mother" immunization "Provider immunized baby")

  • #2
    statplot is from SSC (FAQ Advice #12). As it is a wrapper for graph hbar, you need to look at that command's options.

    Code:
    help graph hbar
    In this case, you need the -relabel()- option within -varopts()-

    Code:
    relabel(1 "Provider discussed family planning with mother" 2 "Provider offered cervical cancer screening to mother" 3 "Provider immunized baby")
    Be sure that the numbering matches the relevant bar.
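
    For completeness, here is a sketch of how the whole call might look with -relabel()- moved inside -varopts()- and the -ylabel()- option dropped. It assumes statplot is installed from SSC and that -varopts()- is passed through as over() suboptions, as its help describes. With sort(1) also in force, check the graph to confirm that each label has landed on the intended bar.

    Code:
    * ssc install statplot
    clear
    set obs 10
    gen fp = cond(mod(_n, 2), 0, 1)
    gen cervical = cond(mod(_n, 4), 0, 1)
    gen immunization = cond(mod(_n, 6), 0, 1)
    * relabel() numbers refer to the variables named in the call: 1 = fp, 2 = cervical, 3 = immunization
    statplot fp cervical immunization, varopts(sort(1) relabel(1 "Provider discussed family planning with mother" 2 "Provider offered cervical cancer screening to mother" 3 "Provider immunized baby")) blabel(bar)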


    • #3
      Another possibility here is designplot from the Stata Journal:

      Code:
      . search designplot, sj
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-19-3 gr0061_3  . . . . . . . . . . . . . . . Software update for designplot
              (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
              Q3/19   SJ 19(3):748--751
              any attempt to use the missing option of graph dot,
              graph hbar, or graph bar is now ignored and advice on
              what to do instead is shown
      
      SJ-17-3 gr0061_2  . . . . . . . . . . . . . . . Software update for designplot
              (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
              Q3/17   SJ 17(3):779
              help file updated
      
      SJ-15-2 gr0061_1  . . . . . . . . . . . . . . . Software update for designplot
              (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
              Q2/15   SJ 15(2):605--606
              bug fixed for Stata 14
      
      SJ-14-4 gr0061  Design plots for graphical summary of a response given factors
              (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
              Q4/14   SJ 14(4):975--990
              produces a graphical summary of a numeric response variable
              given one or more factors
      designplot wants an outcome variable, but we just need to feed it a variable that is constant as if that were the outcome and ignore that constant otherwise. Then we get a display of the marginal distributions of each "predictor".

      Code:
      clear
      set obs 10
      egen id = seq(), from(1) to(10)
      gen fp = cond(mod(_n, 2), 0, 1)
      gen cervical = cond(mod(_n, 4), 0, 1)
      gen immunization = cond(mod(_n, 6), 0, 1)
      lab define fp 0 "No" 1 "Yes"
      lab define cervical 0 "No" 1 "Yes"
      lab define immunization 0 "No" 1 "Yes"
      lab values fp fp
      lab values cervical cervical
      lab values immunization immunization
      
      set scheme s1color
      
      gen one = 1
      
      label var one "interesting data"
      
      designplot one fp cervical immunization, stat(count) min(1) max(1) recast(hbar) variablenames name(G1, replace)
      [Image: designplot_G1.png]

      In your real problem you likely have not 3 but more like 10 or 30 variables. As they're all binary, the means carry all the information (other than sample sizes). A little work to collapse to a dataset of means and counts allows a great deal by way of flexible plotting. For this concocted example, the sample size (# of non-missing values) is the same for all variables, but for real data that clearly isn't guaranteed.

      Showing both Yes and No frequencies for many binary variables is in practice as likely to confuse as to clarify.

      The code here isn't going to be much different for a dataset with many more variables.

      The option choices naturally aren't the only choices possible, or even the best, but are mostly given here to underline some possibilities. For a graph with table flavour, I like the table convention of putting explanatory stuff at the top, not the bottom. More on that at https://www.stata-journal.com/articl...article=gr0053

      The more variables you have, the more likely it is that a vertical display will just be a mess.

      Code:
      foreach v of varlist fp-immunization {
          local call `call' (count) count`v'=`v' (mean) mean`v'=`v'
      }
      
      collapse `call'
      
      gen id = 1
      
      reshape long count mean, i(id) j(which) string
      
      gen toshow = which + " ({it:n}=" + strofreal(count) + ")"
      
      sort mean
      
      graph dot (asis) mean, over(toshow, sort(1) descending) linetype(line) lines(lc(gs12) lw(vthin)) name(G2, replace) ysc(alt) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50") ytitle(% practising)
      
      graph hbar (asis) mean, over(toshow, sort(1) descending) bar(1, fcolor(blue*0.2) lcolor(blue))  name(G3, replace) ysc(alt) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50") ytitle(% practising)
      [Image: designplot_G2.png]

      [Image: designplot_G3.png]

      Last edited by Nick Cox; 11 Mar 2023, 02:59.


      • #4
        Thank you so much Nick Cox and Andrew Musau, this is incredibly helpful! I really appreciate your comments and am thrilled to have much better bar graphs now. On a somewhat related note, I am also struggling with how to create a pie graph of "Pregnancy danger signs experienced" from multiple binary indicators, each of which is a yes/no for whether or not a danger sign (e.g. blurred vision, severe headache) was experienced. My dataset has many different questions that are coded as multiple binary indicators, so I'm still learning how to work with this.

        Below is a slightly adjusted example dataset that is similar to what I have (although my data has some missing values in the variables and I couldn't figure out how to add those missing values to the example dataset). My thinking was to create a new categorical variable out of the multiple binary indicators using code like the below and then graph that new categorical variable, but I'm realizing that doesn't work, because of course it creates a variable with a single value for each mother in the dataset. Mothers can report multiple danger signs, so what I'm trying to make is a pie chart of all danger signs reported (a different n than the n of individuals): the % of the danger signs reported that were blurred vision, the % that were severe headache, etc.

        Code:
        clear
        set obs 10
        egen id = seq(), from(1) to(10)
        gen bleeding = cond(mod(_n, 2), 0, 1)
        gen breathingdiff = cond(mod(_n, 4), 0, 1)
        gen chestpain = cond(mod(_n, 6), 0, 1)
        gen fever = cond(mod(_n, 2), 0, 1)
        gen abdominalpain = cond(mod(_n, 4), 0, 1)
        lab define bleeding 0 "No" 1 "Yes"
        lab define breathingdiff 0 "No" 1 "Yes"
        lab define chestpain 0 "No" 1 "Yes"
        lab define fever 0 "No" 1 "Yes"
        lab define abdominalpain 0 "No" 1 "Yes"
        lab values bleeding bleeding
        lab values breathingdiff breathingdiff
        lab values chestpain chestpain
        lab values fever fever
        lab values abdominalpain abdominalpain
        
        
        gen mom_dangersign_cat = . 
        replace mom_dangersign_cat = 1 if bleeding == 1 
        replace mom_dangersign_cat = 2 if breathingdiff == 1 
        replace mom_dangersign_cat = 3 if chestpain == 1 
        replace mom_dangersign_cat = 4 if fever == 1
        replace mom_dangersign_cat = 5 if abdominalpain == 1
        
        lab define mom_dangersign_cat 1 "Vaginal bleeding (heavy or sudden increase)" 2 "Breathing difficulty" 3 "Chest pain" 4 "Fever" 5 "Severe abdominal pain" 
        lab values mom_dangersign_cat mom_dangersign_cat
        This seems like it shouldn't be hard to do and Nick's explanation above is making me think I need to create a new dataset for the graphing, but I've gotten completely stuck about how to do that and get the right denominator using these binary indicators. I'd be grateful for any ideas on how to go about this!
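
        On adding missing values to an example dataset: one simple way is to overwrite a few values with -replace-. This is a minimal sketch with two made-up indicators. The rows set to missing are arbitrary choices for illustration and are not taken from the real data.

        Code:
        clear
        set obs 10
        gen id = _n
        gen bleeding = cond(mod(_n, 2), 0, 1)
        gen fever = cond(mod(_n, 3), 0, 1)
        * set one value to missing by observation number
        replace bleeding = . in 3
        * set values to missing by condition
        replace fever = . if inlist(id, 5, 8)
        * report how many missing and non-missing values each variable now has
        misstable summarize bleeding fever
        count if !missing(bleeding)
        count if !missing(fever)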


        • #5
          You can use tabm from tab_chi on SSC to get the reduced dataset you need, at least temporarily. No pie chart is going to be easier to read or write about than a bar chart, or so I suggest. As you mention, you need to explain it's number of symptoms, not number of women being shown, and a pie chart's implications of subdividing a total here are unfortunate, apart from its other problems.


          Code:
          clear
          set obs 10
          egen id = seq(), from(1) to(10)
          gen bleeding = cond(mod(_n, 2), 0, 1)
          gen breathingdiff = cond(mod(_n, 4), 0, 1)
          gen chestpain = cond(mod(_n, 6), 0, 1)
          gen fever = cond(mod(_n, 2), 0, 1)
          gen abdominalpain = cond(mod(_n, 4), 0, 1)
          lab define bleeding 0 "No" 1 "Yes"
          lab define breathingdiff 0 "No" 1 "Yes"
          lab define chestpain 0 "No" 1 "Yes"
          lab define fever 0 "No" 1 "Yes"
          lab define abdominalpain 0 "No" 1 "Yes"
          lab values bleeding bleeding
          lab values breathingdiff breathingdiff
          lab values chestpain chestpain
          lab values fever fever
          lab values abdominalpain abdominalpain
          
          preserve
          * ssc install tab_chi 
          tabm bl-ab, replace
          label li _stack 
          label def _stack 2 "breathing difficulties" 3 "chest pain" 5 "abdominal pain", modify 
          set scheme s1color 
          graph hbar (count) if _values, over(_stack, sort(1) descending) blabel(total)
          save wantedforgraph 
          restore
          [Image: difficulties.png]

          Incidentally, although I'm pleased to see egen, seq() in action -- I wrote the first version --

          Code:
          gen id = _n
          would serve your purpose fine.


          • #6
            The same information is shown by

            Code:
            graph hbar (sum) bl-ab, blabel(total)
            but I prefer the version above.
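
            If what is wanted is each sign's share of all signs reported (the denominator raised in #4) rather than raw counts, one possibility is the (percent) statistic of graph hbar applied to the stacked data. This is a sketch only. It assumes tab_chi is installed, that tabm creates _stack and _values as in #5, and that your Stata is recent enough to accept (percent) without a variable. The ytitle() text is just a suggestion.

            Code:
            clear
            set obs 10
            gen id = _n
            gen bleeding = cond(mod(_n, 2), 0, 1)
            gen breathingdiff = cond(mod(_n, 4), 0, 1)
            gen chestpain = cond(mod(_n, 6), 0, 1)
            gen fever = cond(mod(_n, 2), 0, 1)
            gen abdominalpain = cond(mod(_n, 4), 0, 1)

            preserve
            * ssc install tab_chi
            tabm bleeding-abdominalpain, replace
            * keep only reported signs, so that the percentages are of all signs reported
            graph hbar (percent) if _values == 1, over(_stack, sort(1) descending) blabel(bar, format(%4.1f)) ytitle("% of all danger signs reported")
            restore

            The bars then add up to 100% across signs, which is not the same as the percentage of women reporting each sign.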


            • #7
              Wanting the single variable counts is one thing, but as we all know symptoms can occur together.

              Consider upsetplot and vennbar from SSC.

              https://www.statalist.org/forums/for...lable-from-ssc

              https://www.statalist.org/forums/for...lable-from-ssc

              Code:
              clear
              set obs 10
              egen id = seq(), from(1) to(10)
              gen bleeding = cond(mod(_n, 2), 0, 1)
              gen breathingdiff = cond(mod(_n, 4), 0, 1)
              gen chestpain = cond(mod(_n, 6), 0, 1)
              gen fever = cond(mod(_n, 2), 0, 1)
              gen abdominalpain = cond(mod(_n, 4), 0, 1)
              
              label var ab "abdominal pain"
              label var br "breathing difficulty"
              label var ch "chest pain"
              
              upsetplot bl-ab, labelopts(mlabel(_count) mlabpos(12)) ysc(r(. 6)) name(UP, replace)
              
              vennbar bl-ab , varlabels blabel(bar) name(VB, replace)
              The graphs for the full dataset would be more complicated (32 subsets possible) but presumably more interesting.
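
              To see which of the 32 possible combinations actually occur before drawing anything, the official command contract may help. This is a minimal sketch on the same concocted data. The frequency variable name n is an arbitrary choice.

              Code:
              clear
              set obs 10
              gen bleeding = cond(mod(_n, 2), 0, 1)
              gen breathingdiff = cond(mod(_n, 4), 0, 1)
              gen chestpain = cond(mod(_n, 6), 0, 1)
              gen fever = cond(mod(_n, 2), 0, 1)
              gen abdominalpain = cond(mod(_n, 4), 0, 1)

              preserve
              * reduce to one observation per combination of the five indicators that occurs
              contract bleeding-abdominalpain, freq(n)
              * most frequent combinations first
              gsort -n
              list, noobs
              restore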
              [Image: UP.png]

              [Image: VB.png]


              • #8
                Thank you so much Nick Cox for your time and very helpful responses! I ended up using tabm and completely agree this is clearer than a pie graph. In case it's helpful to anyone with a similar need in the future, I made one small tweak below to exclude missing values from the horizontal bar graph, as I did not want those included in the counts for the variable being graphed.

                Code:
                clear
                set obs 10
                egen id = seq(), from(1) to(10)
                gen bleeding = cond(mod(_n, 2), 0, 1)
                gen breathingdiff = cond(mod(_n, 4), 0, 1)
                gen chestpain = cond(mod(_n, 6), 0, 1)
                gen fever = cond(mod(_n, 2), 0, 1)
                gen abdominalpain = cond(mod(_n, 4), 0, 1)
                
                lab define bleeding 0 "No" 1 "Yes"
                lab define breathingdiff 0 "No" 1 "Yes"
                lab define chestpain 0 "No" 1 "Yes"
                lab define fever 0 "No" 1 "Yes"
                lab define abdominalpain 0 "No" 1 "Yes"
                lab values bleeding bleeding
                lab values breathingdiff breathingdiff
                lab values chestpain chestpain
                lab values fever fever
                lab values abdominalpain abdominalpain  
                
                preserve
                *ssc install tab_chi
                tabm bleeding-abdominalpain, replace
                label li _stack
                label def _stack 1 "Bleeding" 2 "Breathing difficulty" 3 "Chest pain" 4 "Fever" 5 "Severe abdominal pain", modify
                set scheme s1color
                graph hbar (count) if _values & !missing(_values), over(_stack, sort(1) descending) blabel(total)
                save wantedforgraph
                restore
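
                On why the extra condition matters: Stata treats any nonzero value as true, and missing values count as nonzero, so -if _values- on its own also selects observations in which _values is missing. Here is a minimal sketch with a made-up variable x that is not part of the dataset above.

                Code:
                clear
                set obs 3
                gen x = _n - 1 in 1/2
                * x is now 0, 1 and missing
                count if x
                * counts 2: the 1 and the missing value
                count if x & !missing(x)
                * counts 1
                count if x == 1
                * also counts 1: a shorter equivalent for 0/1 indicators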
                Last edited by Valerie Scott; 01 Apr 2023, 15:13.
