  • How to plot several stacked percent bar charts side-by-side with groups of variables and subgraphs?

    Hello,

    I did a cluster analysis of categorical variables and want to plot the result in a summary graph. There are three groups of variables that contain 'dummy variables'. I'm able to plot one group of these stacked 'dummy variables' with subgraphs by cluster membership. But I want to add two more groups of variables next to the bar.
    That's the first group of variables:

    [Image: wZudJ.png]

    Code:
    graph bar a_group1 b_group1 c_group1 d_group1 e_group1 f_group1 x_group1, ///
        by(, legend(off)) xsize(6) ysize(8) aspectratio(1.2) ///
        by(clus_8_ward_gower) stack percent
    How do I add x_group2 and x_group3 so that each group is displayed as its own stacked bar, side by side, by cluster membership (see the sketch, ignoring the 'count' bar)?
    [Image: 2016-03-25 15.36.08.jpg]

    Is it possible to add a fourth variable next to the percentage-bars that displays a mean on a second scale (see whole sketch)?
    I did an extensive Google search and read the Stata documentation, but I couldn't figure out how to do it.
    If you don't know a solution, perhaps you have a better idea how to visualize clustered categorical data by groups.

    Thanks in advance, Lars
    Stacked variable group 1 by cluster membership. Sketch of what I want to plot.
    Last edited by Lars Hennsky; 25 Mar 2016, 14:25.

  • #2
    Cross-posted at http://stats.stackexchange.com/quest...cent-bar-chart (and on hold as off topic, but an interesting answer is visible as I write).

    "Lars Vegas": please see

    http://www.statalist.org/forums/help#crossposting Explicit policy on cross-posting

    http://www.statalist.org/forums/help#stata Advice on posting examples

    http://www.statalist.org/forums/help#realnames Please use full real names





    • #3
      Hello Nick,

      thank you for your advice.

      I contacted the forum administrators to change the name to my real one.

      Because I am unable to edit my original post, here is the additional information:

      I cross-posted this question to Cross Validated / Stack Exchange (http://stats.stackexchange.com/quest.../204078#204078), where a user already proposed using the package "combineplot". But I did not manage to stack the variable groups with this package.

      In the meantime I made some progress with the following code, but I still cannot stack the groups of variables.

      Code:
      graph bar (sum) variable_1 variable_2 (sum) variable_3 variable_4 ///
          (sum) variable_5 variable_6 variable_7 variable_8 variable_9 variable_10 variable_11 ///
          (sum) variable_12 variable_13, ///
          nofill percentages showyvars ///
          yvaroptions(relabel(1 group_1 2 group_1 3 group_2 4 group_2 5 group_3 6 group_3 7 group_3 8 group_3 9 group_3 10 group_3 11 group_3 12 group_4 13 group_4) ///
              label(angle(forty_five) labsize(vsmall))) ///
          by(, legend(off)) name(ward_gower_bar_11, replace) by(clus_11_ward_gower)
      [Image: red_ward_gower_bar_11_clean_labels_forum.png]

      Perhaps this clarifies my problem.



      • #4
        That helps (thanks), but a sample dataset would help more. See the second link in #2.



        • #5
          Ok thanks for the advice.
          I am using Stata 14.1 on Windows 7.
          Here is my excerpt:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int ID byte(variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 variable_7 variable_8 variable_9 variable_10 variable_11 variable_12 variable_13 clus_11_ward_gower)
             1 1 0 1 0 0 1 0 0 0 0 0 1 0  1
             2 1 0 0 1 0 1 0 0 0 0 0 1 0  2
             3 1 0 0 1 0 0 0 0 0 1 0 1 0  3
             4 1 0 1 0 0 0 1 0 0 0 0 1 0  3
             5 1 0 1 0 0 0 0 0 0 0 1 1 0  3
             6 1 0 1 0 1 0 0 0 0 0 0 0 1  6
             7 1 0 1 0 0 1 0 0 0 0 0 1 0  1
             8 1 0 0 1 1 0 0 0 0 0 0 1 0  4
             9 1 0 1 0 0 1 0 0 0 0 0 1 0  1
            10 0 1 0 1 0 1 0 0 0 0 0 1 0  9
            11 1 0 1 0 0 0 0 0 0 1 0 1 0  3
            12 1 0 0 1 0 0 0 1 0 0 0 0 1  7
            13 1 0 0 1 0 0 1 0 0 0 0 1 0  3
            14 1 0 1 0 0 0 0 0 1 0 0 1 0  3
            15 1 0 0 1 0 1 0 0 0 0 0 1 0  2
            16 1 0 1 0 0 0 0 1 0 0 0 1 0  3
            17 0 1 1 0 0 1 0 0 0 0 0 1 0 10
            18 0 1 0 1 1 0 0 0 0 0 0 1 0  9
            19 0 1 1 0 0 1 0 0 0 0 0 1 0 10
            20 1 0 1 0 1 0 0 0 0 0 0 1 0  5
            21 0 1 1 0 1 0 0 0 0 0 0 1 0 11
            22 0 1 1 0 0 0 0 1 0 0 0 1 0 11
            23 1 0 1 0 0 1 0 0 0 0 0 1 0  1
            24 1 0 1 0 0 1 0 0 0 0 0 0 1  6
            25 1 0 1 0 0 1 0 0 0 0 0 1 0  1
            26 0 1 0 1 1 0 0 0 0 0 0 0 1  7
            27 1 0 0 1 0 0 0 1 0 0 0 1 0  3
            28 1 0 0 1 0 1 0 0 0 0 0 1 0  2
            29 1 0 0 1 1 0 0 0 0 0 0 0 1  7
            30 0 1 1 0 1 0 0 0 0 0 0 1 0 11
          end



          • #6
            Thanks for the example. The size of the problem and the names of the variables seem to change from post to post! Here I am guessing at what you most want.

            My major advice is that you will find graphics a lot easier if you restructure to fewer variables.

            My minor advice is that stacking bars doesn't always help to see structure in data. You can just get a fruit salad display of many colours that has to be decoded.

            I used tabplot (SSC). Here's my code and the result.

            I'd recommend strongly that you use correspondence analysis to produce a seriation here. Unless the names of the variables have inherent meaning, it is highly likely that the original variables and the clusters can be reshuffled to produce a better order.

            Code:
            clear
            set scheme s1color
            input int ID byte(variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 variable_7 variable_8 variable_9 variable_10 variable_11 variable_12 variable_13 clus_11_ward_gower)
               1 1 0 1 0 0 1 0 0 0 0 0 1 0  1
               2 1 0 0 1 0 1 0 0 0 0 0 1 0  2
               3 1 0 0 1 0 0 0 0 0 1 0 1 0  3
               4 1 0 1 0 0 0 1 0 0 0 0 1 0  3
               5 1 0 1 0 0 0 0 0 0 0 1 1 0  3
               6 1 0 1 0 1 0 0 0 0 0 0 0 1  6
               7 1 0 1 0 0 1 0 0 0 0 0 1 0  1
               8 1 0 0 1 1 0 0 0 0 0 0 1 0  4
               9 1 0 1 0 0 1 0 0 0 0 0 1 0  1
              10 0 1 0 1 0 1 0 0 0 0 0 1 0  9
              11 1 0 1 0 0 0 0 0 0 1 0 1 0  3
              12 1 0 0 1 0 0 0 1 0 0 0 0 1  7
              13 1 0 0 1 0 0 1 0 0 0 0 1 0  3
              14 1 0 1 0 0 0 0 0 1 0 0 1 0  3
              15 1 0 0 1 0 1 0 0 0 0 0 1 0  2
              16 1 0 1 0 0 0 0 1 0 0 0 1 0  3
              17 0 1 1 0 0 1 0 0 0 0 0 1 0 10
              18 0 1 0 1 1 0 0 0 0 0 0 1 0  9
              19 0 1 1 0 0 1 0 0 0 0 0 1 0 10
              20 1 0 1 0 1 0 0 0 0 0 0 1 0  5
              21 0 1 1 0 1 0 0 0 0 0 0 1 0 11
              22 0 1 1 0 0 0 0 1 0 0 0 1 0 11
              23 1 0 1 0 0 1 0 0 0 0 0 1 0  1
              24 1 0 1 0 0 1 0 0 0 0 0 0 1  6
              25 1 0 1 0 0 1 0 0 0 0 0 1 0  1
              26 0 1 0 1 1 0 0 0 0 0 0 0 1  7
              27 1 0 0 1 0 0 0 1 0 0 0 1 0  3
              28 1 0 0 1 0 1 0 0 0 0 0 1 0  2
              29 1 0 0 1 1 0 0 0 0 0 0 0 1  7
              30 0 1 1 0 1 0 0 0 0 0 0 1 0 11
            end
            
            reshape long variable, i(ID) string
            rename variable frequency
            destring _j, ignore(_) gen(variable)
            
            * to use -tabplot- you must install it first
            * ssc inst tabplot
            
            tabplot clus variable [fw=frequency], bfcolor(none) showval(mlabsize(*0.8) mlabcolor(black))
            [Image: clusterbarplot.png]


            Here's extra code for a seriation. Experts on correspondence analysis might well quibble here, but I think the main idea is sound. Type -search labmask- in Stata to find a download location.

            Code:
            ca variable clus [fw=freq]
            predict rowscore, row(1)
            predict colscore, col(1)
            egen new_variable = group(rowscore variable)
            label var new_variable "variable"
            labmask new_variable, values(variable)
            egen new_cluster = group(colscore clus)
            label var new_cluster "clus_11_ward_gower"
            labmask new_cluster, values(clus)
            tabplot new_clus new_variable [fw=frequency], bfcolor(none) showval(mlabsize(*0.8) mlabcolor(black))
            [Image: clusterbarplot2.png]

            Last edited by Nick Cox; 29 Mar 2016, 13:10.
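
The advice in #6 to restructure to fewer variables also opens a route back to the stacked percent bars asked for in #1. What follows is only a sketch, not a tested solution from the thread. It uses six observations (IDs 1 to 5 and 10) and the first four indicators from the data in #5, and it assumes the grouping implied by the relabel() call in #3 (indicators 1-2 form group_1, indicators 3-4 form group_2). The names item, frequency, group and grouplbl are introduced here for illustration.

```stata
* Subset of the -dataex- example in #5: IDs 1-5 and 10, first four indicators only
clear
input int ID byte(variable_1 variable_2 variable_3 variable_4 clus_11_ward_gower)
 1 1 0 1 0 1
 2 1 0 0 1 2
 3 1 0 0 1 3
 4 1 0 1 0 3
 5 1 0 1 0 3
10 0 1 0 1 9
end

* Wide to long: one row per ID and indicator; item holds the indicator number
reshape long variable_, i(ID) j(item)
rename variable_ frequency

* Assumed grouping, following relabel() in #3: items 1-2 -> group_1, items 3-4 -> group_2
generate byte group = cond(item <= 2, 1, 2)
label define grouplbl 1 "group_1" 2 "group_2"
label values group grouplbl

* One bar per group, stacked by item and scaled to 100%, one panel per cluster
graph bar (sum) frequency, over(item) over(group) asyvars stack percentages ///
    by(clus_11_ward_gower)
```

With asyvars, the categories of the first over() are treated like separate yvars, so stack and percentages should act within each group's bar. Whether the resulting display is easier to read than the tabplot version is a separate question, as the caution in #6 about many-coloured stacked bars suggests.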
