  • Plotting frequency of binary data (yes/no) by categorical data.

    Hello all. I am a physician conducting health outcomes research and am new to Stata. I have learned a lot from silently browsing the forum, but I have hit a roadblock and have a question of my own.

    I would like to plot the frequency of postoperative sepsis in patients, based on the wound class they had during surgery. For example, wound class is categorized as Clean, Contaminated, or Dirty, so it is a categorical variable.

    Postoperative sepsis as a complication is coded in my database as 1-No, 2-Yes.

    . tab postopsepsis woundcls

    This gives me a nice table of the frequency of sepsis based on each wound class.

    However, the command: . histogram postopsepsis, by(woundcls) produces the following chart:

    Screen Shot 2020-07-22 at 1.36.49 PM.png

    What I want instead is a single histogram showing the frequency of positive sepsis cases across operative wound class.

    Any thoughts? Thanks for your help!

  • #2
    Screen Shot 2020-07-22 at 2.09.44 PM.png

    Here's the closest I could get to what I want using the command:
    histogram postopsepsis if postopsepsis ==2, discrete frequency by(woundcls, rows(1))

    However, the perfect histogram would have wound class on the x axis as a categorical variable and sepsis frequency on the y axis, all in one graph.



    • #3
      A data example would help mightily (https://www.statalist.org/forums/help#stata), but your question is similar to any question looking at foreign and rep78 in the auto data.


      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . tab foreign rep78
      
                 |                   Repair Record 1978
        Car type |         1          2          3          4          5 |     Total
      -----------+-------------------------------------------------------+----------
        Domestic |         2          8         27          9          2 |        48 
         Foreign |         0          0          3          9          9 |        21 
      -----------+-------------------------------------------------------+----------
           Total |         2          8         30         18         11 |        69 
      
      set scheme s1color
      A basic graph of frequencies (meaning counts) follows from say

      Code:
      graph bar (count) , over(foreign) by(rep78)
      where you can swap the variables around and change many details.

      [Image: sepsis1.png]


      I tend to prefer the results given by a community-contributed command, which you must download before you can use it:

      Code:
      ssc install tabplot 
      
      tabplot foreign rep78, showval
      [Image: sepsis2.png]

      Again, you can swap the variables around and change many details.







      • #4
        Here's another approach.

        Code:
        // The following was inspired by this example:
        // https://www.stata.com/support/faqs/graphics/gph/graphdocs/bar-chart-with-multiple-bars-graphed-over-another-variable/
        
        clear *
        sysuse auto
        tabulate foreign
        * Generate an indicator for domestic status
        generate byte domestic = !foreign if !missing(foreign)
        * Check that it worked
        tabulate foreign domestic
        
        * Crosstabulate rep78 and foreign
        tab rep78 foreign
        * Variables foreign & domestic are both 1/0 indicator variables,
        * so their SUMS will show the counts we want to graph:
        graph bar (sum) domestic (sum) foreign, over(rep78)  ///
        title(# of Domestic v Foreign cars by Repair Record) ///
        ytitle(Count) ylab(0(5)30,grid)
        I'm sure there's a simple way to change "sum of domestic" and "sum of foreign" to "Domestic" and "Foreign". I could not find it quickly just now, though.
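
        One candidate, offered as a sketch rather than a definitive answer, is the legend(label()) option, which overrides the default legend text for each bar series. This assumes the domestic indicator created above is still in memory, and the label text is only illustrative.

        Code:
        * Relabel the legend keys: series 1 is (sum) domestic, series 2 is (sum) foreign
        graph bar (sum) domestic (sum) foreign, over(rep78)  ///
        title(# of Domestic v Foreign cars by Repair Record) ///
        ytitle(Count) ylab(0(5)30,grid)                      ///
        legend(label(1 "Domestic") label(2 "Foreign"))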
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)



        • #5
          There are some useful-looking examples on this page too. The -splitvallabels- command by Nick Winter and Ben Jann looks quite useful.

          Code:
          ssc describe splitvallabels
          I have to add, though, that as much as I appreciate Stata, this is one example (I think) where SPSS gets the job done in a much more straightforward manner. The following SPSS syntax uses the same auto data as in the examples posted earlier in the thread. Assuming variable and value labels have been added, it's one simple GRAPH command.

          Code:
          * Code to input the data snipped.
          
          VARIABLE LABELS
           rep78 "Repair Record in 1978"
           foreign "Domestic vs Foreign".
          VALUE LABELS
           rep78
            1 "Very low"
            2 "Low"
            3 "Medium"
            4 "Good"
            5 "Excellent" /
           foreign
            1 "Foreign" 0 "Domestic"
          .
          
          GRAPH
            /BAR(GROUPED)=COUNT BY rep78 BY foreign
            /TITLE='# of Domestic vs Foreign Cars by Repair Record'.

          [Image: SPSS-clustered-bar-chart.png]
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)



          • #6
            You can get graphs like @Bruce Weaver's easily enough in Stata too. Here most of the effort is in tweaking away from Stata's defaults towards what I assume are SPSS's defaults.

            Code:
            ssc inst catplot
            
            catplot foreign rep78, recast(bar) blabel(total) asyvars ///
                b1title("`: var label rep78'")                        ///
                bar(1, lcolor(black) fcolor(cranberry*0.8))           ///
                bar(2, lcolor(black) fcolor(blue*0.5))                ///
                yla(, ang(h)) legend(pos(1) ring(0) col(1)) name(G3)
            [Image: sepsis3.png]




            My use of catplot is down to personal convenience: I'm more familiar with its approach to plotting frequencies, proportions or percents for categorical data. Here it is just a wrapper for graph bar. The result above is given by this equivalent code. (Equivalent doesn't mean identical.)

            Code:
            graph bar (count) , over(foreign) over(rep78) asyvars     ///
                legend(pos(1) ring(0) col(1)) blabel(total)           ///
                b1title("`: var label rep78'") yla(, ang(h))          ///
                bar(1, lcolor(black) fcolor(cranberry*0.8))           ///
                bar(2, lcolor(black) fcolor(blue*0.5)) name(G4)
            [Image: sepsis4.png]



            All that said, I agree with Bruce in this sense. When graph was revised in Stata 8 (2003) graphs like these were not directly possible. In essence you had to create your own frequency or other variable first and then draw it. (A first work-around was to create a variable always equal to 1 and then ask to see sums in a bar chart.)
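
            That first work-around can be sketched roughly as follows. This is only an illustration using the auto data from earlier in the thread, and the variable name one is made up here.

            Code:
            sysuse auto, clear
            * A constant equal to 1, so that its sum within each group is the group count
            generate byte one = 1
            graph bar (sum) one, over(foreign) over(rep78) asyvars ytitle("Frequency")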

            catplot was born out of that work-around in 2003 and, depending on what you choose, it is a wrapper for graph bar, graph hbar or graph dot. The functionality of graph bar (count) came along some years later.

            In contrast, tabplot goes back to 1999 and was rewritten in 2004 as a wrapper for twoway.

            In general: Stata's official graphics for categorical data can seem limited or awkward, but people in the community have worked at extending the number of commands and their flexibility.
            Last edited by Nick Cox; 22 Jul 2020, 17:36.



            • #7
              Those last 2 plots are exactly what I'm looking for. Thank you guys for your quick replies. Very helpful. I installed the catplot add on.
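
              A minimal sketch of how that might carry over to the variables in post #1, assuming postopsepsis is coded 2 for Yes as described there (untested, since the data were not posted):

              Code:
              * Count only the positive sepsis cases, one bar per wound class
              catplot woundcls if postopsepsis == 2, recast(bar) blabel(bar)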

              Nick, I was not trying to be vague intentionally. I will keep that in mind in my posts.

              Thanks again
