  • Graphing two categorical variables

    Hello everyone,

    I'm new to the Forum and a relative beginner with Stata, so sorry if the answer to what I am looking for is too obvious. I need to create a bar chart of two categorical variables: one is Subject (Math or History) and the other is Type (A, B, C, or D). My data are structured as follows (example):


    Subject Type

    Math A
    Hist A
    Hist B
    Math C
    Hist C
    Hist C
    Hist D
    Hist A
    Hist A
    Math A
    Math A
    Math A
    Math C
    Math C
    Hist D
    Math D
    Math A

    I'm looking for a command that will visually give me the percentage of each type, by subject, in one single graph. I'm looking for something like this:
    [Image: Screen Shot 2020-08-08 at 11.45.27 PM.png]


    If I do graph bar, over(Type) over(Subject), not only do I end up with two graphs, one for each subject, but Stata also does not calculate the percentages separately for each Subject (in my database I have many more Math observations than History):

    [Image: Screen Shot 2020-08-08 at 11.57.38 PM.png]



    Using graph bar, over(Type) by(Subject), Stata gives me the relative proportions within each subject, but still in two graphs:

    [Image: Screen Shot 2020-08-09 at 12.00.31 AM.png]



    Any ideas of how to get what I need? Thanks!

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 subject str1 type
    "Math" "A"
    "Hist" "A"
    "Hist" "B"
    "Math" "C"
    "Hist" "C"
    "Hist" "C"
    "Hist" "D"
    "Hist" "A"
    "Hist" "A"
    "Math" "A"
    "Math" "A"
    "Math" "A"
    "Math" "C"
    "Math" "C"
    "Hist" "D"
    "Math" "D"
    "Math" "A"
    end
    
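    * Count the observations in each subject-type cell
    bys subject type: gen total=_N
    * Express each count as a percent of its subject's total
    bys subject: gen percent=(total/_N)*100
    * percent is constant within each cell, so the default (mean) statistic plots it unchanged
    gr bar percent,  over(subject) over(type) asyvars ///
    bargap(10) bar(1, color(red)) bar(2, color(blue)) ///
    ytitle("Percent") scheme(s1color)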
    [Image: Graph.png]



    • #3
      Your graphs appear to be for your full dataset, so using your example data inevitably produces something a bit different.

      Here are two more ways to approach this graph beyond the helpful answer from Andrew Musau. Andrew's post raises a strategic point, which is that sometimes you need to do some calculations ahead of the graph command to make it easier to get what you want. That's undoubtedly hard to know without detailed knowledge of the commands in question.

      First I used catplot from SSC, which you must install first, but which here is a wrapper for graph bar, so there is no difference of principle.

      Then I used tabplot which at the time of writing you should also download from SSC, although a longer write-up at
      https://www.stata-journal.com/articl...article=gr0066 remains germane. (Formal notification of the update is in press at Stata Journal 20(3).) tabplot is a wrapper for twoway rbar.

      I wrote both of these wrappers, but neither program knows or cares that I am fonder of tabplot and find it more useful, both for my own problems and for those that pass my way. Being able to lose the legend (kill the key) is, I believe, a feature, as legends are at best necessary evils and at worst so complicated that almost no-one can be bothered to read them in detail. I am also a fan of the idea of hybrid graphs and tables, in which a reader can focus on the graphical elements and/or on the tabulated results, as a matter of taste or importance.

      As in Andrew's post, do please note the use of dataex for data examples, as we do request (https://www.statalist.org/forums/help#stata).

      Andrew's graph is the same as my first graph, except for some cosmetic choices. The only important difference is that catplot will calculate the percents that Andrew calculates first, and I dare say there may be a way of getting graph bar to do that directly too.

      Stata makes it hard to add % to every number on an axis, and I approve.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str4 subject str1 type
      "Math" "A"
      "Hist" "A"
      "Hist" "B"
      "Math" "C"
      "Hist" "C"
      "Hist" "C"
      "Hist" "D"
      "Hist" "A"
      "Hist" "A"
      "Math" "A"
      "Math" "A"
      "Math" "A"
      "Math" "C"
      "Math" "C"
      "Hist" "D"
      "Math" "D"
      "Math" "A"
      end
      
      * catplot is here a wrapper for graph bar; install it once from SSC
      ssc inst catplot
      catplot subject type, percent(subject) recast(bar) asyvars ///
          bar(1, lcolor(blue) fcolor(blue*0.5)) bar(2, lcolor(red) fcolor(red*0.5)) ///
          yla(0(25)75, ang(h)) ytitle(%, orient(horiz)) name(danila1, replace)
      
      * tabplot is a wrapper for twoway rbar; install it once from SSC
      ssc inst tabplot
      tabplot type subject, percent(subject) separate(subject) ///
          bar1(lcolor(blue) fcolor(blue*0.5)) bar2(lcolor(red) fcolor(red*0.5)) ///
          showval subtitle(% within subject) aspect(1) name(danila2, replace)
      Last edited by Nick Cox; 09 Aug 2020, 04:37.



      • #4
        Here are the graphs.

        [Image: danila1.png]

        [Image: danila2.png]



        • #5
          "Andrew's graph is the same as my first graph, except for some cosmetic choices. The only important difference is that catplot will calculate the percents that Andrew calculates first, and I dare say there may be a way of getting graph bar to do that directly too."
          Indeed, I can think of one way using separate. However, directly with the data as is, I struggle!

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str4 subject str1 type
          "Math" "A"
          "Hist" "A"
          "Hist" "B"
          "Math" "C"
          "Hist" "C"
          "Hist" "C"
          "Hist" "D"
          "Hist" "A"
          "Hist" "A"
          "Math" "A"
          "Math" "A"
          "Math" "A"
          "Math" "C"
          "Math" "C"
          "Hist" "D"
          "Math" "D"
          "Math" "A"
          end
          
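          * graph bar needs numeric variables, so encode the string variable type
          encode type, gen(Type)
          * Create one copy of Type per subject; each copy is missing outside its own subject
          separate Type, by(subject)
          * (percent) is then computed separately for each Type? variable, i.e. within subject
          gr bar (percent) Type?,  over(Type) bargap(10) bar(1, color(red)) ///
          bar(2, color(blue)) ytitle("Percent") leg(order(1 "Hist" 2 "Math")) ///
          scheme(s1color)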
          Last edited by Andrew Musau; 09 Aug 2020, 10:03.
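
          To see why the separate approach works, it can help to inspect the variables it creates. The following is a minimal sketch using the same example data; it assumes, as the legend order in the code above suggests, that separate names the copies Type1 (Hist) and Type2 (Math). The tabulations should show the within-subject percentages that the bars display.

          ```stata
          clear
          input str4 subject str1 type
          "Math" "A"
          "Hist" "A"
          "Hist" "B"
          "Math" "C"
          "Hist" "C"
          "Hist" "C"
          "Hist" "D"
          "Hist" "A"
          "Hist" "A"
          "Math" "A"
          "Math" "A"
          "Math" "A"
          "Math" "C"
          "Math" "C"
          "Hist" "D"
          "Math" "D"
          "Math" "A"
          end

          encode type, gen(Type)
          separate Type, by(subject)

          * Each copy holds values only for its own subject (assumed naming: Type1 = Hist, Type2 = Math)
          assert missing(Type1) if subject == "Math"
          assert missing(Type2) if subject == "Hist"

          * One-way tables of each copy give the percentages within that subject
          tabulate Type1
          tabulate Type2
          ```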



          • #6
            Awesome. Thank you so much, Andrew and Nick. I struggled for hours trying to figure out how to do that. It is indeed not as intuitive as I thought it would be! And Nick, I'm definitely incorporating catplot into my future coding. Also, thanks for the tips on how to post in this forum!



            • #7
              Hi Everyone,

              I am stuck on a problem and cannot figure out how to solve it. Below is an example of my dataset. Each row represents an observation (a patient). The patients in the dataset may or may not have all the symptoms. Each presenting symptom is coded as a separate variable: facial_pain, hyposmia, anosmia, headache, etc. (0 = Absent, 1 = Present).

              I want to create a single bar graph in which each bar represents one particular symptom and all the bars are displayed next to each other.

              Thank you very much.

              Regards
              Pavan


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte(face_pain hypos anosmia headache cough)
              1 0 0 0 0
              1 0 0 1 0
              1 0 0 1 0
              1 1 0 1 1
              0 0 0 0 0
              1 0 0 1 0
              0 0 0 0 0
              end





              • #8
                If interested in #7, see

                https://www.statalist.org/forums/for...tiple-variable
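
                For readers who cannot follow the link, here is a minimal sketch of one common approach to the question in #7. It is an assumption on my part, not necessarily what the linked thread recommends: the mean of a 0/1 indicator is the proportion with the symptom, so plotting the means of the rescaled indicators gives one bar per symptom. The pct_ variable names and the legend text are hypothetical choices for illustration.

                ```stata
                clear
                input byte(face_pain hypos anosmia headache cough)
                1 0 0 0 0
                1 0 0 1 0
                1 0 0 1 0
                1 1 0 1 1
                0 0 0 0 0
                1 0 0 1 0
                0 0 0 0 0
                end

                * Rescale each 0/1 indicator so that its mean is a percent
                foreach v of varlist face_pain hypos anosmia headache cough {
                    gen pct_`v' = 100 * `v'
                }

                * One bar per symptom, side by side; the legend identifies each bar
                graph bar (mean) pct_*, ytitle("Percent of patients") ///
                    legend(order(1 "Facial pain" 2 "Hyposmia" 3 "Anosmia" 4 "Headache" 5 "Cough"))
                ```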



                • #9
                  Hello,

                  I am new to StataForum and I have a question about graph bar with categorical variables. I want to make a graph with two categorical variables, and I want to include the total for var1. The first variable (spm5) is a question from a survey and the second variable is the region the person lives in (no_standardgeo). I have this code:

                  Code:
                   graph bar (percent), over(spm5) over(no_standardgeo, lab(angle(0) labsize(vsmall))) stack asyvars percentage blabel(bar, pos(center) format(%9.0f)) ytitle(Prosent) legend(size(vsmall)) bar(1, color(ebg)) bar(2, color(ebblue)) bar(3, color(edkblue))
                  And this is what my graph looks like:
                  [Image: Graph_test.png]


                  I want to include the total values for the variable spm5 in the graph as its own bar. Is this possible in Stata? I cannot find any option to include it.



                  • #10
                    Duplicating the data is one way of adding a total category. Consider the following:

                    Code:
                    sysuse auto, clear
                    * Graph 1: the original graph, without a total category
                    graph bar (percent), over(foreign) over(rep78, lab(angle(0) labsize(vsmall))) ///
                    stack asyvars percentage blabel(bar, pos(center) format(%9.0f)) ///
                    ytitle(Prosent) legend(size(vsmall)) bar(1, color(ebg)) bar(2, color(ebblue)) ///
                    bar(3, color(edkblue)) saving(gr1, replace)
                    
                    preserve
                    * Append a copy of every observation, flagged by the indicator new
                    expand 2, gen(new)
                    * Put all the copies into one extra category, relabelled "Total" below
                    replace rep78=99 if new
                    * Graph 2: the same graph with the extra category as the sixth group
                    graph bar (percent), over(foreign) over(rep78, relabel(6 "Total") lab(angle(0) labsize(vsmall))) ///
                    stack asyvars percentage blabel(bar, pos(center) format(%9.0f)) ytitle(Prosent) legend(size(vsmall)) ///
                    bar(1, color(ebg)) bar(2, color(ebblue)) bar(3, color(edkblue)) saving(gr2, replace)
                    restore
                    
                    * Stack the two graphs for comparison
                    gr combine gr1.gph gr2.gph, col(1)
                    [Image: Graph.png]
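
                    As a check on what the duplication does, the following sketch tabulates the expanded data instead of graphing them. It assumes the auto dataset as shipped with Stata; the local macro n_original and the value label rep78_total are hypothetical names introduced only for illustration. The row percentages for the "Total" row should match the extra bar.

                    ```stata
                    sysuse auto, clear
                    count
                    local n_original = r(N)

                    * Every observation gets one copy, and all copies go into category 99
                    expand 2, gen(new)
                    replace rep78 = 99 if new

                    * The extra category therefore contains exactly one copy of the whole dataset
                    count if rep78 == 99
                    assert r(N) == `n_original'

                    * Row percentages: the split by foreign within each rep78 category, plus the total
                    label define rep78_total 99 "Total"
                    label values rep78 rep78_total
                    tabulate rep78 foreign, row nofreq
                    ```

                    Note that the copies of cars with missing rep78 also end up in the total here. If that is not wanted, one option is replace rep78 = 99 if new & !missing(rep78).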



                    • #11
                      Thank you so much, Andrew Musau! That worked perfectly!



                      • #12
                        See https://journals.sagepub.com/doi/pdf...867X1401400117 for more on Andrew Musau's technique.
