Creating a universal graph bar code that can be applied to datasets with different variables

Caitlin Pearson

Join Date: Jun 2021

Posts: 11
#1

29 Jun 2021, 12:48

Hello,

I am creating stacked bar graphs in Stata to display the results of testing samples for several salmonella serotypes across 6 years. My current code creates a bar graph that is close to what I want but needs a few alterations. The main one is that we are trying to write code that is universal: it should apply to different datasets and still produce the same graph. The issue is that the number and type of variables change (some datasets have more salmonella serotypes than others, and other datasets have slightly different serotypes).

The example here is a subset of the data; sometimes we have 30+ different serotypes to graph. One shortcut I found was to list the variables as <negative-hadar> instead of listing all variables (<negative typhimurium heidelburg enteritidis hadar>). But I am wondering if there is a better way to tell Stata to graph all variables in the dataset, without having to change the code each time to reflect different variables?

An example of how my data looks:

* Example generated by -dataex-. For more info, type help dataex

Code:

clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end

My code to generate the bar graph:

Code:

graph bar (asis) negative-hadar, over(year, label(labsize(vsmall))) stack ///
legend (size(small)) legend(cols(4)) ylabel(,angle(horizontal) labsize(vsmall)) ///
graphregion(color(white)) ytitle("Number of Samples", margin(medium)) ///
blabel(bar,position(center) color(white))

My graph looks like this:

I was also wondering:

- Is there a way to label only those samples that tested negative? The blabel option labels every bar and I cannot figure out a way to label only the bars that represent negative samples. Is there a way to label only a portion of the bars on the graph?

- How do I get the bar graph to not graph salmonella serotypes that have a value of 0? One way I see to do this is to change all zero values to missing values with the following code: <mvdecode _all, mv(0)> before inputting my graph code. Since bar graph does not graph missing data this works. But I was wondering if there is a better or different way to achieve this?

Thanks.

Andrew Musau

Join Date: Oct 2014
Posts: 10089

29 Jun 2021, 15:33

Thanks for the data example. You can automate the code if the datasets only include the used variables and the variables "negative" and "year" are always named as such. Deleting the bar labels can be done using the graph editor, via the undocumented command gr_edit. As far as the 0 categories go, your approach sounds good.

Code:

clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end

*EXTRACT VARIABLE NAMES, SAVE "NEGATIVE" AND "YEAR"
ds year negative, not
*NEEDED FOR THE LABELS
local vars=wordcount("`r(varlist)'")
*GRAPH
graph bar (asis) negative `r(varlist)', over(year, label(labsize(vsmall))) stack ///
legend (size(small)) legend(cols(4)) ylabel(,angle(horizontal) labsize(vsmall)) ///
graphregion(color(white)) ytitle("Number of Samples", margin(medium)) ///
blabel(bar,position(center) color(white))

*EXCLUDE LABELS
levelsof year, local(years)
local bars = wordcount("`years'")
local exclude 1
local count =`vars'+2
forval bars =1/`bars'{
    local exclude "`exclude' , `count'"
    local count = `count' + `vars'+ 1
}
forval i=1/`.Graph.plotregion1.barlabels.arrnels' {
    if !inlist(`i', `exclude'){
        gr_edit .plotregion1.barlabels[`i'].text[1]=" "
    }
}
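
The index arithmetic in the loop above can be checked on its own. This is a minimal sketch, assuming (as the code above does) that the bar labels are numbered year by year with "negative" first in each stack; the locals k, n, pos and keep are illustrative names, not part of the original code.

```stata
* Positions of the "negative" bar labels when each year has k serotype
* bars stacked on top of one "negative" bar (k + 1 labels per year).
local k = 4          // serotype variables in the example data
local n = 5          // years in the example data

local keep 1
forvalues y = 2/`n' {
    * first label of year y
    local pos = (`y' - 1) * (`k' + 1) + 1
    local keep `keep', `pos'
}

display "`keep'"              // 1, 6, 11, 16, 21
display inlist(11, `keep')    // 1: label 11 belongs to a "negative" bar
display inlist(12, `keep')    // 0: label 12 belongs to a serotype bar
```

With the example data there are 25 labels in total, and every label whose index is not in this list is blanked by gr_edit. The list is only valid while every year contributes exactly k + 1 labels.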

[Image: Graph.png]


Caitlin Pearson

Join Date: Jun 2021

Posts: 11
#3

30 Jun 2021, 06:01

Wow, thank you very much Andrew! That is exactly how I wanted it to work.

Nick Cox

Join Date: Mar 2014
Posts: 35446

30 Jun 2021, 06:34

[Image: salmonella.png]

The stacked bar design (some say divided or subdivided bar) is popular but both coders and readers struggle with datasets in which frequencies can be zero or even very small.

tabplot from the Stata Journal offers a different take. Here is some sample code with the result above. https://www.statalist.org/forums/for...updated-on-ssc gives a quick tour (originally posted before the SJ write-up).

Code:

clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end

local labels 1 negative 

*EXTRACT VARIABLE NAMES, SAVE "NEGATIVE" AND "YEAR"
ds year negative, not 

preserve 

local j = 1
foreach v in `r(varlist)' { 
    local ++j 
    local labels `labels' `j' `v'
}

label def toshow `labels', modify 

rename (negative `r(varlist)') (count=) 

reshape long count, i(year) j(type) string 

encode type, gen(toshow) label(toshow)

tabplot toshow year [fw=count], yla(1/`j', valuelabel) showval ytitle("") xtitle("") scheme(s1color) blcolor(blue) bfcolor(blue*0.2) title(Salmonella serotypes)

restore
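
The wide-to-long step in the code above can be inspected in isolation. This is a sketch under the same assumptions as the thread's example (variables named year and negative, every other variable a serotype count); it only restructures and lists the data and does not call tabplot.

```stata
clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end

* Every variable except year and negative is treated as a serotype
ds year negative, not

* Prefix all count variables with "count" so reshape can find them
rename (negative `r(varlist)') (count=)

* One row per year and result type; the old variable name goes into type
reshape long count, i(year) j(type) string

list if year == 2010, noobs
```

After the reshape there is one observation per year and result type, so the same commands work unchanged when a dataset carries a different set of serotype variables.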


Caitlin Pearson

Join Date: Jun 2021

Posts: 11
#5

13 Jul 2021, 12:54

Thanks Nick for demonstrating how this data would look in tabplot. I do agree it's much easier to extract this information! I will look into using these for sure.

Caitlin Pearson

Join Date: Jun 2021
Posts: 11

28 Jul 2021, 08:43

Hello Andrew,
When I first run code to change the zeros in the dataset to missing values, the removal of labels no longer works. See the code and graph below. I see that removing zero values disrupts the sequence by which labels are removed. Is there a way to fix this? I would like to not graph zero values; otherwise, when I have 10 years in the graph and 20+ serovars for each year, it takes a long time to remove all the zero labels by hand. Is this possible? Again, it is something I can live with, but a fix would be more efficient for sure.
Thanks,

Code:

clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end
 
mvdecode _all, mv(0)
ds year negative, not
local vars=wordcount("`r(varlist)'")
graph bar (asis) negative `r(varlist)', over(year, label(labsize(vsmall))) stack ///
legend (size(vsmall)) legend(region(lstyle(none))) legend(cols(4) colgap(5.2) symxsize(9)) ylabel(,angle(horizontal) labsize(vsmall)) ///
graphregion(color(white)) ytitle("Number of Samples", margin(medium)) ///
blabel(bar,position(center) color(white))
 
levelsof year, local(years)
local bars = wordcount("`years'")
local exclude 1
local count =`vars'+2
forval bars =1/`bars'{
    local exclude "`exclude' , `count'"
    local count = `count' + `vars'+ 1
}
forval i=1/`.Graph.plotregion1.barlabels.arrnels' {
    if !inlist(`i', `exclude'){
        gr_edit .plotregion1.barlabels[`i'].text[1]=" "
    }
}
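
If zeros are recoded at all, the recoding can be limited to the serotype columns so that year and negative are never touched. This is a minimal sketch, assuming the same variable layout as the example; the local serotypes is an illustrative name. It does not repair the label sequence described above.

```stata
clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end

* Store the serotype names in a local so later r-class commands
* cannot overwrite them
ds year negative, not
local serotypes `r(varlist)'

* Recode 0 to system missing in the serotype columns only
mvdecode `serotypes', mv(0)

list, noobs
```

Storing the names in a local rather than reusing r(varlist) directly also keeps the list available after commands such as levelsof have replaced the r() results.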


Andrew Musau

Join Date: Oct 2014

Posts: 10089
#7

28 Jul 2021, 14:18

Why do you want to remove zeros? You are graphing counts, so zero counts will not be displayed. As a matter of fact, I suggest the opposite in your related thread that seeks to keep the bar colors consistent.
https://www.statalist.org/forums/for...-on-bar-graphs