Displaying categorical variable with a frequency of zero on a bar graph / common scale of categorical bar graphs combined

Liam Toohey

Join Date: Jan 2017

Posts: 25
#1

28 Feb 2017, 18:03

I would like to display each of the 7 categories of a categorical variable in a bar graph, even when a category has a frequency count of 0.

My current code produces the correct output for my data, but excludes the categories with a count of 0.

The example of my data below contains the 11 combined graphs that are relevant for this data set (`d' = 11).

1) Is there a simple method to display categories in a bar graph that have a frequency count of 0?

or

2) Is there a method, when combining multiple graphs as done below, to use a common scale for the categorical axis, as can be done for the y axis (i.e. ycommon)?

Code:

forvalues i=1(1)`d'{
    local j=`i'+1
    summ index`j', meanonly
    local z= r(sum)
    graph bar, over(s_`i'_SIC_) title("Index `i'") subtitle((n=`z'){superscript:{&alpha}}) ytitle("Percentage (%)")
    graph save SIC_`i'_percent, replace
    local SIC_percent`"`SIC_percent' "SIC_`i'_percent""'
}
gr combine `SIC_percent', title("title") ycommon
gr save freq_combined_percent, replace

I have tried adding allcategories (one of the group_options) but cannot get the output I want.

Any solutions to the above?

Liam
Niamh Mohan

Join Date: May 2018

Posts: 2
#2

23 May 2018, 10:16

Hi, I have the same problem with my data. I am looking at resilience scores 'before' and 'after' an intervention. I used both scores to create a categorical variable of seven categories, shown along the x axis, which groups the scores into bands such as 'very low', 'low' and 'medium'. I cut the score variables for both 'before' (totalprers) and 'after' (totalpostrs):

egen resrankpre= cut(totalprers), at(14,57,65,74,82,91,98)
lab var resrankpre "Resilience score ranking"
lab define resrankprel 14 "very low" 57 "low" 65 "on low end" 74 "moderate" 82 "moderately high" 91 "high"
lab values resrankpre resrankprel

egen resrankpost= cut(totalpostrs), at(14,57,65,74,82,91,98)
lab var resrankpost "Resilience score ranking after intervention"
lab define resrankpostlab 14 "very low" 57 "low" 65 "on low end" 74 "moderate" 82 "moderately high" 91 "high"
lab values resrankpost resrankpostlab
tab resrankpost

I want to compare the proportions of those who scored 'very low/low/medium/high' before and after the intervention. I can do this, but I can't find a way of getting the categories with a count of 0 to appear in my graphs or in the tabulate output. When I create two separate graphs for 'before' and 'after', each one only shows bars for the categories that contain observations. I need all the categories to appear in order to let the data speak for itself.

I created two separate graphs and saved them in order to use Stata's graph combine command; the ycommon option then makes them comparable on the same y scale.

graph bar, over(resrankpre) name(p2)
graph bar, over(resrankpost) name(p3)
graph combine p2 p3, ycommon scale(1.4)

I have tried everything for the last six hours. I read somewhere that we ought to turn the categories with 0 into 'missing data', but nobody has explained this in any detail and I can find nothing that helps. If you see the attached graph, you can understand why it is important that I include all categories before and after. In the 'before' graph a few people were in the "very low" category, but afterwards there were none, so the 'after' graph starts from the "low" category instead. This makes the comparison across graphs difficult and confusing. I would ideally like to combine both graphs onto the same scale, but am finding it hard. Could you please help me with a) showing all categories in the graph even when there are 0 observations and b) combining the two graphs?
Thanks very much!
Nick Cox

Join Date: Mar 2014

Posts: 35405
#3

23 May 2018, 13:17

Data example please. FAQ Advice #12 explains.

That said, your problem is easy to mimic. You have 7 possible values but they don't all occur in each variable.

If you reshape your data, then zero bars will be shown by default in a two-variable graph. Here are two ways to approach the problem. I note that you are struggling with the interesting detail on what the categories are by using small font. I suggest using horizontal bars instead.

For much more on tabplot see e.g. https://www.statalist.org/forums/for...updated-on-ssc and its references.

Code:

* sandbox to play
clear
set scheme s1color
set obs 20
set seed 2803
gen x1 = runiformint(1, 7)
gen x2 = runiformint(1, 7)
label def x 1 abysmal 2 appalling 3 adequate 4 acceptable 5 admirable 6 amazing 7 "!!!"
label val x1 x
label val x2 x
tab1 x?

* solutions
gen id = _n
reshape long x, i(id) j(which)
label val x x
label def which 1 before 2 after
label val which which

* install from Stata Journal
tabplot x, horiz by(which, note("")) showval

graph hbar (count), over(which) over(x)
Niamh Mohan

Join Date: May 2018

Posts: 2
#4

25 May 2018, 08:55

Thanks very much. I was slightly confused by the syntax, but just used intuition and copied the code you wrote with my own data and it worked like a charm! It is greatly appreciated (and thanks for the observation about text size!)
Nick Cox

Join Date: Mar 2014

Posts: 35405
#5

25 May 2018, 09:37

The syntax of graph hbar can be blamed on StataCorp. I tried a slightly different syntax in catplot (SSC).

Kabira Namit

Join Date: Nov 2018
Posts: 11

04 Mar 2020, 19:37

I think Nick's examples are great, especially when you have to create a two-variable graph.

However, I wrote some code to show zero frequencies with a single bar graph. There must be an easier way to do this, of course. I would love to learn!

Code:

clear all

* Essentially, I'm creating a variable that can take four possible values (terrible, adequate, acceptable and awesome).

* However, as we have only five observations, there are no 'adequate' responses in this simulation.

set obs 5
set seed 42
gen x1 = runiformint(1, 4)
label def x 1 Terrible 2 Adequate 3 Acceptable 4 Awesome
label val x1 x
tab x1

* I can now contract the data based on frequency and generate four separate variables that take the value of the count.

* The crucial bit here is that we now have a variable that takes the value 0 for the missing 'adequate' responses.

* (x1 is spelled out in full so the code does not depend on variable abbreviation being on)
contract x1, freq(count)
gen a = 0
gen b = 0
gen c = 0
gen d = 0
replace a = count if x1 == 1
replace b = count if x1 == 2
replace c = count if x1 == 3
replace d = count if x1 == 4

* I can now collapse my data - so that all observations are in a single line and create my graph.

collapse (sum) a b c d
graph bar (sum) a b c d, yvaroptions(relabel(1 "1. Terrible" 2 "2. Adequate" 3 "3. Acceptable" 4 "4. Awesome") label(labsize(small))) ascategory title(Example) blabel(bar)
graph export example.png, replace

* Of course, I can create multiple graphs by running a loop and using the preserve and restore commands.
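
The loop mentioned in the last comment might look something like the sketch below. It is only an illustration under stated assumptions: x1 and x2 are hypothetical variables sharing the same 1 to 4 coding, the graph names g_x1 and g_x2 are made up, and the counts are taken with count rather than contract, which is a small change from the code above.

```stata
* Hypothetical data: two variables with the same four possible values
clear all
set obs 5
set seed 42
gen x1 = runiformint(1, 4)
gen x2 = runiformint(1, 4)

foreach v of varlist x1 x2 {
    preserve
    * Count each possible value directly, so an absent value gives 0
    forvalues k = 1/4 {
        quietly count if `v' == `k'
        local n`k' = r(N)
    }
    * Replace the data with a single row holding the four counts
    clear
    set obs 1
    forvalues k = 1/4 {
        gen n`k' = `n`k''
    }
    graph bar (asis) n1 n2 n3 n4, ascategory ///
        yvaroptions(relabel(1 "Terrible" 2 "Adequate" 3 "Acceptable" 4 "Awesome")) ///
        title("`v'") blabel(bar) name(g_`v', replace)
    * Bring back the original data for the next variable
    restore
}

* Every graph has the same four bars, so the category axes line up
graph combine g_x1 g_x2, ycommon
```

Because each graph is drawn from four count variables rather than from the observed values, the category axis is the same in every panel, which is the effect asked for earlier in the thread.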

I hope this is useful to someone someday!
