  • Displaying categorical variable with a frequency of zero on a bar graph / common scale of categorical bar graphs combined

    I would like to display all 7 categories of a categorical variable in a bar graph, even those with a frequency count of 0.

    My current code produces the correct output for my data, but omits the categories with a count of 0.

    The example below combines the 11 graphs that are relevant for this dataset (`d' = 11).


    1) Is there a simple way to display categories with a frequency count of 0 in a bar graph?

    or

    2) When combining multiple graphs, as done below, is there a way to use a common scale for the categorical axis, as can be done for the y axis (i.e. ycommon)?


    Code:
    forvalues i=1(1)`d'{
        local j=`i'+1
        summ index`j', meanonly
        local z= r(sum)
        graph bar, over(s_`i'_SIC_) title("Index `i'") subtitle((n=`z'){superscript:{&alpha}}) ytitle("Percentage (%)")
        graph save SIC_`i'_percent, replace
        local SIC_percent`"`SIC_percent' "SIC_`i'_percent""'
    }
    gr combine `SIC_percent', title("title") ycommon
    gr save freq_combined_percent, replace

    I have tried adding the allcategories option (one of the group_options of graph bar), but it does not give the output I want.

    Any solutions to the above?

    Liam

  • #2
    Hi, I have the same problem with my data. I am looking at resilience scores 'before' and 'after' an intervention. I used both scores to create a categorical variable with seven categories along the x axis, which groups the scores into 'very low', 'low', 'medium' and so on. I cut both the 'before' (totalprers) and 'after' (totalpostrs) scores in this way:

    egen resrankpre= cut(totalprers), at(14,57,65,74,82,91,98)
    lab var resrankpre "Resilience score ranking"
    lab define resrankprel 14 "very low" 57 "low" 65 "on low end" 74 "moderate" 82 "moderately high" 91 "high"
    lab values resrankpre resrankprel

    egen resrankpost= cut(totalpostrs), at(14,57,65,74,82,91,98)
    lab var resrankpost "Resilience score ranking after intervention"
    lab define resrankpostlab 14 "very low" 57 "low" 65 "on low end" 74 "moderate" 82 "moderately high" 91 "high"
    lab values resrankpost resrankpostlab
    tab resrankpost

    I want to compare the proportions in each category ('very low', 'low', 'medium', 'high') before and after the intervention. I can do this, but I cannot find a way to get the categories with a count of 0 to appear in my graphs or in the tabulate output. When I create two separate graphs for 'before' and 'after', each shows bars only for the categories that have observations. I need all the categories to appear so that the data can speak for themselves.

    I created two separate graphs and saved them so that I could use Stata's graph combine command; the ycommon option then makes them comparable on the same y scale.

    graph bar, over(resrankpre) name(p2)
    graph bar, over(resrankpost) name(p3)
    graph combine p2 p3, ycommon scale(1.4)

    I have tried everything for the last six hours. I read somewhere that we ought to turn our categories with a count of 0 into 'missing data', but nobody has explained this in any detail and I can find nothing that helps.

    The attached graph shows why it is important to include all categories before and after. Before the intervention a few respondents were in the "very low" category, but afterwards there were none, so the 'after' graph starts from the "low" category instead. This makes the comparison across graphs difficult and confusing. I would ideally like to put both graphs on the same scale, but am finding it hard. Could you please help me with a) showing all categories in the graph even when they have 0 observations and b) combining the two graphs?
    Thanks very much!
    [Attached image: before and after with categories.PNG]



    • #3
      Data example please. FAQ Advice #12 explains.

      That said, your problem is easy to mimic. You have 7 possible values but they don't all occur in each variable.

      If you reshape your data, then zero bars will be shown by default in a two-variable graph. Here are two ways to approach the problem. I note that you are resorting to a small font to fit in the category labels, which are the interesting detail. I suggest using horizontal bars instead.

      For much more on tabplot see e.g. https://www.statalist.org/forums/for...updated-on-ssc and its references.

      Code:
      * sandbox to play 
      clear
      set scheme s1color 
      set obs 20 
      set seed 2803
      gen x1 = runiformint(1, 7) 
      gen x2 = runiformint(1, 7) 
      label def x 1 abysmal 2 appalling 3 adequate 4 acceptable 5 admirable 6 amazing 7 "!!!" 
      label val x1 x 
      label val x2 x 
      tab1 x? 
      
      * solutions 
      gen id = _n 
      reshape long x, i(id) j(which) 
      label val x x 
      label def which 1 before 2 after 
      label val which which 
      
      * install from Stata Journal 
      tabplot x, horiz by(which, note("")) showval 
      
      graph hbar (count),  over(which) over(x)
      [Attached image: tabplot_G1.png]
      [Attached image: tabplot_G2.png]
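
      The same reshape idea might be adapted to the variables in #2 roughly as follows. This is only a sketch: it assumes one observation per respondent and that resrankpre and resrankpost use the same codes, and the names id, resrank and when are made up for illustration.

      ```stata
      * sketch only: assumes resrankpre and resrankpost exist as defined in #2
      gen long id = _n
      * reshape long needs a common stub: resrank1 = before, resrank2 = after
      rename (resrankpre resrankpost) (resrank1 resrank2)
      reshape long resrank, i(id) j(when)
      * reattach the category labels in case reshape does not carry them over
      label values resrank resrankprel
      label define when 1 "before" 2 "after"
      label values when when
      * with two over() groups, empty combinations are drawn as zero bars unless nofill is specified
      graph hbar (count), over(when) over(resrank)
      ```

      A category that occurs in neither variable would still be absent, because graph hbar can only show values that are present somewhere in the data.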




      • #4
        Thanks very much. I was slightly confused by the syntax, but just used intuition and copied the code you wrote with my own data and it worked like a charm! It is greatly appreciated (and thanks for the observation about text size!)



        • #5
          The syntax of graph hbar can be blamed on StataCorp. I tried a slightly different syntax in catplot (SSC).



          • #6
            I think Nick's examples are great, especially when you have to create a two-variable graph.

            However, I wrote some code to show zero frequencies with a single bar graph. There must be an easier way to do this, of course. I would love to learn!

            Code:
            clear all
            
            * Essentially, I'm creating a variable that can take four possible values (terrible, adequate, acceptable and awesome).
            
            * However, as we have only five observations, there are no 'adequate' responses in this simulation.
            
            set obs 5
            set seed 42
            gen x1 = runiformint(1, 4)
            label def x 1 Terrible 2 Adequate 3 Acceptable 4 Awesome
            label val x1 x
            tab x1
            
            * I can now contract the data based on frequency and generate four separate variables that take the value of the count.
            
            * The crucial bit here is that we now have a variable that takes the value 0 for the missing 'adequate' responses.
            
            * use the full variable name x1 so the code does not rely on variable abbreviation
            contract x1, freq(count)
            gen a = 0
            gen b = 0
            gen c = 0
            gen d = 0
            replace a = count if x1 == 1
            replace b = count if x1 == 2
            replace c = count if x1 == 3
            replace d = count if x1 == 4
            
            * I can now collapse my data - so that all observations are in a single line and create my graph.
            
            collapse (sum)     a b c d
            graph bar (sum) a b c d, yvaroptions(relabel(1 "1. Terrible" 2 "2. Adequate" 3 "3. Acceptable" 4 "4. Awesome") label(labsize(small))) ascategory title(Example) blabel(bar)
            graph export example.png, replace
            
            * Of course, I can create multiple graphs by running a loop and using the preserve and restore commands.

            I hope this is useful to someone someday!
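
            One possibly shorter route, sketched here on the assumption that the possible values are known in advance (1 to 4, as in the label definition above), is to build one 0/1 indicator per value and let graph bar sum them. The indicator names is1 to is4 are made up for illustration, and the code is meant to run on the original data, in place of the contract and collapse steps.

            ```stata
            * sketch only: run after "label val x1 x", before any contract or collapse
            forvalues k = 1/4 {
                gen byte is`k' = (x1 == `k')
            }
            * the sum of each indicator is that category's count, so a category
            * that never occurs gets a bar of height 0
            graph bar (sum) is1 is2 is3 is4, ascategory blabel(bar) ///
                yvaroptions(relabel(1 "Terrible" 2 "Adequate" 3 "Acceptable" 4 "Awesome")) ///
                ytitle("Frequency")
            ```

            The /// line continuations work in a do-file; typed interactively, the graph bar command needs to go on a single line.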
