
  • Bar Graph for categorical variables

    Hello everyone!

    I was wondering if any of you could help me make a bar graph for five categorical variables. Each of them has 600 observations, and the possible values are "correct", "incorrect", and "refused to answer".

    I need to have vertical bars, one for each variable, that show the percentage of each category.

    Any suggestions?

    I have tried the following command:
    . graph bar (percent), over(f_skill_el_?) stack
    But it seems that I am including too many variables.

    I have also tried using commands such as catplot but I think the maximum number of variables is 3.

  • #2
    You will need to reshape your data and calculate the percentages if using graph. For your future posts, please familiarize yourself with the dataex command for presenting data examples (see FAQ Advice #12 for details).

    Code:
    clear
    *GENERATE DATASET
    set obs 600
    set seed 12232023
    forval i=1/6{
        gen f_skill_el_`i'= runiformint(0,1)< 1-0.`i'
        replace f_skill_el_`i'=2 in 1/`i'0
    }
    lab define f_skill 2 "Don't know" 0 "Correct" 1 "Incorrect"
    lab values f_skill* f_skill  
    
    *START HERE
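    gen obs_no=_n
    * one row per person and question; which holds the question number
    reshape long f_skill_el_, i(obs_no) j(which)
    * denominator: number of responses to each question
    bys which: gen percent=_N
    * percent of each response category within each question
    bys f_skill_el_ which: replace percent= (_N/percent)*100
    graph bar percent, over(f_skill_el_) over(which) asyvars stack ytitle("Percent")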
    [Image attachment: Graph.png]


    • #3
      catplot is from SSC (FAQ Advice #12).

      Here is some technique:

      Code:
      * sandbox, given no data example
      clear
      set obs 600
      
      set seed 2312
      
      forval j = 1/5 {
          gen f_skill_el_`j' = cond(_n < 600 / (3 * `j'), 3, cond(_n < 600 / (2 * `j'), 2, 1))
      }
      
      * you start here -- but use sensible value labels
      
      preserve
      
      stack f_skill_el_?, into(toshow) clear
      label def answer 1 "correct" 2 "incorrect" 3 "refused"
      label val toshow answer
      
      label def _stack 1 frog 2 toad 3 newt 4 dragon 5 gecko
      label val _stack _stack
      
      tab toshow _stack
      
      catplot toshow , over(_stack) asyvars l1title("") ysc(alt)
      
      restore



      • #4
        Hi Andrew and Nick!

        Thank you very much for your help! I have followed your recommendations and I still cannot manage to get what I need.

        I cannot use dataex since the data I am working with are confidential, so I will provide more details about the data I have.

        Context: we interviewed 600 people. We divided them into 4 groups regarding their marital status: single, married, divorced, and widowed. Moreover, each person had to respond to six questions about their skills.
        This is where the variables come from. Each group has six variables, one for each question.

        Group 1: single
        Question 1: f_skill_el_1
        Question 2: f_skill_el_2
        ...

        Group 2: married
        Question 1: f_skill_ma_1
        Question 2: f_skill_ma_2
        ...

        Group 3: divorced
        Question 1: f_skill_di_1
        Question 2: f_skill_di_2
        ...

        Group 4: widowed
        Question 1: f_skill_wi_1
        Question 2: f_skill_wi_2
        ...

        Each variable f_skill_??_? can have three possible values: 1 "correct", 2 "incorrect", 3 "refused to answer".


        I have already created the variables and labeled them correctly.

        Now, my task is to produce one graph for each group. Each graph must have just six vertical bars, one for each question, showing the percentage of people who responded correctly, responded incorrectly, or refused to respond (as in Andrew's graph).

        When I try to reshape as Andrew suggested, I get the error "no xij variables found".

        Thank you again for your help and please let me know if I should provide more information. This is my first time on this Forum. So I am still getting used to it.



        • #5
          FAQ Advice #12.2 (https://www.statalist.org/forums/help#stata) says

          If your dataset is confidential, then provide a fake example instead.
          A wild guess is that variables for 4 groups can be combined by something like

          Code:
          forval j = 1/6 { 
              egen f_skill_`j' = rowmax(f_skill_??_`j') 
          }
          as presumably the single people don't answer questions aimed at married, divorced, widowed; and more generally in each observation there are missing values for three out of each four variables.

          So, continuing this wild guess, you can reduce your data to the group identifier and the six variables created by the loop above.

          We just need to see the frequencies as shown by those seven variables and e.g. the groups command from the Stata Journal.

          If this guess is wrong, or you cannot follow what I am saying, and no one else gives a better answer, then as said I think we need a realistic data example.
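
          The guess above can be tried out on fake data. What follows is only a sketch under stated assumptions: the group identifier `group`, the observation identifier `obs_no`, the value label `answer`, and the simulated responses are all invented for illustration, and the real data may be laid out differently. It combines the four group-specific variables per question with rowmax(), tabulates the frequencies, and then reshapes and graphs along the lines of #2. Note that the stub passed to reshape long must match the start of the variable names exactly (here f_skill_); a stub that matches no variables is what produces "no xij variables found".

          ```stata
          * sketch only: fake data in which each person answers only their own group's items
          clear
          set obs 600
          set seed 2024
          gen byte group = ceil(4 * _n / 600)    // hypothetical group identifier, 1 to 4
          local k = 0
          foreach g in el ma di wi {
              local ++k
              forval j = 1/6 {
                  * nonmissing only for members of group `k'
                  gen byte f_skill_`g'_`j' = runiformint(1, 3) if group == `k'
              }
          }

          * combine the four group-specific variables for each question
          forval j = 1/6 {
              egen f_skill_`j' = rowmax(f_skill_??_`j')
          }
          keep group f_skill_?

          * frequencies of each response by group, one table per question
          forval j = 1/6 {
              tabulate f_skill_`j' group, column
          }

          * long layout: one row per person and question
          gen long obs_no = _n
          reshape long f_skill_, i(obs_no) j(question)
          label define answer 1 "correct" 2 "incorrect" 3 "refused to answer"
          label values f_skill_ answer

          * percent of each response within group and question, then one panel per group
          bysort group question: gen percent = _N
          bysort group question f_skill_: replace percent = 100 * _N / percent
          graph bar percent, over(f_skill_) over(question) asyvars stack ytitle("Percent") by(group, note(""))
          ```

          The wildcard f_skill_? matches only the six combined variables, because ? stands for exactly one character, so the keep statement discards the 24 group-specific originals.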



          • #6
            Hi Nick Cox ,

            Thank you for your quick answer.

            I think I still need to be more specific with what I am looking for.

            I made a very simple dataset with the following commands:

            set obs 10

            gen Group1_1 = floor(3 * runiform())
            replace Group1_1 = 99 if Group1_1 == 2

            gen Group1_2 = floor(3 * runiform())
            replace Group1_2 = 99 if Group1_2 == 2

            label define response_lbl 0 "Incorrect" ///
            1 "Correct" ///
            99 "Refuse"

            label values Group1_? response_lbl



            Now, I would like to get a graph like the one attached.

            I hope this is more clear now. If not, please let me know.

            [Image attachment: Bar graph.jpg]
            Last edited by Lisa Alejandra Kobrich; 03 Jan 2024, 15:03.



            • #7
              The implementation closely follows #2, except that you have multiple groups.

              Code:
              clear
              set obs 100
              set seed 01032024
              forval i=1/4{
                  gen Group`i'_1 = floor(3 * runiform())
                  replace Group`i'_1 = 99 if Group`i'_1 == 2
                  gen Group`i'_2 = floor(3 * runiform())
                  replace Group`i'_2 = 99 if Group`i'_2 == 2
              }
              label define response_lbl 0 "Incorrect" ///
              1 "Correct" ///
              99 "Refused to answer"
              label values Group* response_lbl
              gen long obs_no=_n
              rename Group* response*
              reshape long response, i(obs_no) j(which) string
              gen Q= "Q" + substr(which, -1, 1)
              gen group= "Group " + substr(which, 1, 1)
              drop which
              bys group Q: gen percent=_N
              bys response group Q: replace percent= (_N/percent)*100
              graph bar percent, over(response) over(Q) asyvars stack ytitle("Percent") by(group, note(""))
              [Image attachment: Graph.png]

              Last edited by Andrew Musau; 03 Jan 2024, 15:29.



              • #8
                Hi Andrew Musau

                Thank you so much! This is really helpful!!

                I just have one more question, if you do not mind. There are some missing values in my dataset because not everyone is required to respond to all the questions. How can I compute each percentage out of the number of respondents required to answer that specific question, rather than out of all respondents?



                • #9
                  This takes Andrew Musau's helpful example one step further and simulates some missing values. A good way to ignore them is to drop them before the graphics.

                  I add a display using tabplot from the Stata Journal. This dispenses with a statement of the obvious -- that percent stacked bars add to 100% -- in favour of a layout that allows more direct comparison within the display, including display of percents themselves.

                  For a quick overview of tabplot skip and skim through https://www.statalist.org/forums/for...updated-on-ssc

                  Code:
                  clear
                  set obs 100
                  set seed 01032024
                  forval i=1/4{
                      gen Group`i'_1 = floor(3 * runiform())
                      replace Group`i'_1 = 99 if Group`i'_1 == 2
                      gen Group`i'_2 = floor(3 * runiform())
                      replace Group`i'_2 = 99 if Group`i'_2 == 2
                  }
                  label define response_lbl 0 "Incorrect" ///
                  1 "Correct" ///
                  99 "Refused to answer"
                  label values Group* response_lbl
                  gen long obs_no=_n
                  rename Group* response*
                  reshape long response, i(obs_no) j(which) string
                  replace response = . if runiform() < 0.07 
                  gen Q= "Q" + substr(which, -1, 1)
                  gen group= "Group " + substr(which, 1, 1)
                  drop which
                  drop if missing(response)
                  
                  
                  bys group Q: gen percent=_N
                  bys response group Q: replace percent= (_N/percent)*100
                  graph bar percent, over(response) over(Q) asyvars stack ytitle("Percent") by(group, note("")) name(Andrew, replace)
                  
                  
                  tabplot response Q, percent(Q group) separate(response) by(group, row(1) note("")) name(Nick, replace) ytitle("") xtitle("") showval(format(%2.1f))
                  [Image attachment: Andrew.png]
                  [Image attachment: Nick.png]
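
                  If you would rather keep the rows with missing responses in the dataset, the same percentages can be had by counting only nonmissing responses in the denominator. This is a minimal sketch on a tiny made-up dataset; the variable names id and n_answered are assumptions for illustration, and there is a single group for brevity.

                  ```stata
                  * sketch only: fake long-layout data, one row per person and question
                  clear
                  input id Q response
                  1 1 1
                  2 1 0
                  3 1 .
                  4 1 1
                  1 2 0
                  2 2 .
                  3 2 .
                  4 2 1
                  end

                  * denominator: number of nonmissing responses to each question
                  bysort Q: egen n_answered = count(response)

                  * numerator: number of rows sharing this response within the question;
                  * rows with missing response get a missing percent
                  bysort Q response: gen percent = 100 * _N / n_answered if !missing(response)

                  list, sepby(Q)
                  ```

                  Here question 2 has only two nonmissing answers, so each of them counts for 50 percent rather than 25. With several groups, add the group variable to both bysort prefixes.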



                  • #10
                    Nick Cox thank you very much! It works just fine now.

                    Do you know if it would be possible to add a general title on the top of all the graphs?



                    • #11
                      Code:
                      graph bar percent, over(response) over(Q) asyvars stack ytitle("Percent") by(group, title(Whatever) note("")) name(Andrew, replace)
                      
                      
                      tabplot response Q, percent(Q group) separate(response) by(group, title(Whatever) row(1) note("")) name(Nick, replace) ytitle("") xtitle("") showval(format(%2.1f))



                      • #12
                        Oh great! Thanks!!
