  • Creating a universal graph bar code that can be applied to datasets with different variables

    Hello,

    I am creating stacked bar graphs in Stata to display the results of testing samples for several salmonella serotypes, across 6 years. My current code creates a bar graph that is close to what I want but needs a few alterations. The main one is that we want code that is universal: it should apply to different datasets and still produce the same graph. The issue is that the number and type of variables change (some datasets have more salmonella serotypes than others, and some have slightly different serotypes).

    The example here is a subset of the data; sometimes we have 30+ different serotypes to graph. One shortcut I found was to list the variables as <negative-hadar> instead of listing all of them (<negative typhimurium heidelburg enteritidis hadar>). But I am wondering if there is a better way to tell Stata to graph all variables in the dataset, without having to change the code each time to reflect different variables?


    An example of how my data looks:

    * Example generated by -dataex-. For more info, type help dataex
    Code:
    clear
    input int year byte(negative typhimurium anatum enteritidis hadar)
    2010 20 1 1 2 0
    2011 18 2 1 2 3
    2012 36 4 0 2 3
    2013 29 0 1 0 4
    2014 23 2 0 1 2
    end

    My code to generate the bar graph:
    Code:
    graph bar (asis) negative-hadar, over(year, label(labsize(vsmall))) stack ///
    legend(size(small) cols(4)) ylabel(, angle(horizontal) labsize(vsmall)) ///
    graphregion(color(white)) ytitle("Number of Samples", margin(medium)) ///
    blabel(bar,position(center) color(white))

    My graph looks like this:
    [Image: Salmonella Bar Graph example.png]

    I was also wondering:

    - Is there a way to label only those samples that tested negative? The blabel option labels every bar, and I cannot figure out how to label only the bars that represent negative samples. Is there a way to label only a portion of the bars on the graph?

    - How do I get the bar graph to not graph salmonella serotypes that have a value of 0? One way I see to do this is to change all zero values to missing values with <mvdecode _all, mv(0)> before running my graph code. Since graph bar does not graph missing data, this works. But I was wondering if there is a better or different way to achieve this?


    Thanks.

  • #2
    Thanks for the data example. You can automate the code if the datasets only include the used variables and the variables "negative" and "year" are always named as such. Deleting the bar labels can be done using the graph editor, via the undocumented command gr_edit. As far as the 0 categories go, your approach sounds good.

    Code:
    clear
    input int year byte(negative typhimurium anatum enteritidis hadar)
    2010 20 1 1 2 0
    2011 18 2 1 2 3
    2012 36 4 0 2 3
    2013 29 0 1 0 4
    2014 23 2 0 1 2
    end
    
    *EXTRACT VARIABLE NAMES, EXCEPT "NEGATIVE" AND "YEAR"
    ds year negative, not
    local serotypes `r(varlist)'
    *NUMBER OF SEROTYPE VARIABLES, NEEDED FOR THE LABELS
    local vars = wordcount("`serotypes'")
    *GRAPH
    graph bar (asis) negative `serotypes', over(year, label(labsize(vsmall))) stack ///
    legend(size(small) cols(4)) ylabel(, angle(horizontal) labsize(vsmall)) ///
    graphregion(color(white)) ytitle("Number of Samples", margin(medium)) ///
    blabel(bar, position(center) color(white))
    
    *EXCLUDE LABELS: KEEP ONLY THE FIRST LABEL ("NEGATIVE") IN EACH YEAR
    levelsof year, local(years)
    local bars = wordcount("`years'")
    *LABEL 1 IS "NEGATIVE" FOR THE FIRST YEAR; EACH YEAR ADDS VARS + 1 LABELS
    local exclude 1
    local count = `vars' + 2
    forval b = 2/`bars' {
        local exclude "`exclude', `count'"
        local count = `count' + `vars' + 1
    }
    *BLANK EVERY LABEL THAT IS NOT IN THE LIST
    forval i = 1/`.Graph.plotregion1.barlabels.arrnels' {
        if !inlist(`i', `exclude') {
            gr_edit .plotregion1.barlabels[`i'].text[1]=" "
        }
    }
    [Image: Graph.png]
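
To make the index arithmetic in the label loop concrete, here is a minimal sketch. It assumes, as the code in this post does, that the bar labels are numbered year by year with negative first in each stack. The macro names keep and pos are illustrative only. With 4 serotype variables and 5 years, each year contributes 5 labels, so the negative labels sit at positions 1, 6, 11, 16 and 21.

```stata
* Sketch: positions of the "negative" bar labels under a fixed stride
local vars 4      // serotype variables other than negative
local years 5     // number of distinct years
local keep
forvalues y = 1/`years' {
    * first label of year y: (y - 1) full stacks of (vars + 1) labels come before it
    local pos = (`y' - 1) * (`vars' + 1) + 1
    local keep `keep' `pos'
}
display "`keep'"
```

Every position not in this list is a serotype label and gets blanked.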



    • #3
      Wow, thank you very much Andrew! That is exactly how I wanted it to work.



      • #4
        [Image: salmonella.png]


        The stacked bar design (some say divided or subdivided bar) is popular but both coders and readers struggle with datasets in which frequencies can be zero or even very small.

        tabplot from the Stata Journal offers a different take. Here is some sample code with the result above. https://www.statalist.org/forums/for...updated-on-ssc gives a quick tour (originally posted before the SJ write-up).


        Code:
        clear
        input int year byte(negative typhimurium anatum enteritidis hadar)
        2010 20 1 1 2 0
        2011 18 2 1 2 3
        2012 36 4 0 2 3
        2013 29 0 1 0 4
        2014 23 2 0 1 2
        end
        
        local labels 1 negative 
        
        *EXTRACT VARIABLE NAMES, EXCEPT "NEGATIVE" AND "YEAR"
        ds year negative, not 
        
        preserve 
        
        local j = 1
        foreach v in `r(varlist)' { 
            local ++j 
            local labels `labels' `j' `v'
        }
        
        label def toshow `labels', modify 
        
        rename (negative `r(varlist)') (count=) 
        
        reshape long count, i(year) j(type) string 
        
        encode type, gen(toshow) label(toshow)
        
        tabplot toshow year [fw=count], yla(1/`j', valuelabel) showval ytitle("") xtitle("") scheme(s1color) blcolor(blue) bfcolor(blue*0.2) title(Salmonella serotypes)
        
        restore



        • #5
          Thanks Nick for demonstrating how this data would look in tabplot. I do agree it's much easier to extract this information! I will look into using these for sure.



          • #6
            Hello Andrew,
            When I first change the zeros in the dataset to missing values, the removal of labels no longer works. See the code and graph below. I see that removing zero values disrupts the sequence by which labels are removed. Is there a way to fix this? I would like to not graph zero values; otherwise, when I have 10 years in the graph and 20+ serovars for each year, it takes a long time to remove all the zero labels by hand. Is this possible? Again, it is something I can live with, but it would be more efficient for sure.
            Thanks,

            Code:
            clear
            input int year byte(negative typhimurium anatum enteritidis hadar)
            2010 20 1 1 2 0
            2011 18 2 1 2 3
            2012 36 4 0 2 3
            2013 29 0 1 0 4
            2014 23 2 0 1 2
            end
             
            mvdecode _all, mv(0)
            ds year negative, not
            local vars=wordcount("`r(varlist)'")
            graph bar (asis) negative `r(varlist)', over(year, label(labsize(vsmall))) stack ///
            legend (size(vsmall)) legend(region(lstyle(none))) legend(cols(4) colgap(5.2) symxsize(9)) ylabel(,angle(horizontal) labsize(vsmall)) ///
            graphregion(color(white)) ytitle("Number of Samples", margin(medium)) ///
            blabel(bar,position(center) color(white))
             
            levelsof year, local(years)
            local bars = wordcount("`years'")
            local exclude 1
            local count =`vars'+2
            forval bars =1/`bars'{
                local exclude "`exclude' , `count'"
                local count = `count' + `vars'+ 1
            }
            forval i=1/`.Graph.plotregion1.barlabels.arrnels' {
                if !inlist(`i', `exclude'){
                    gr_edit .plotregion1.barlabels[`i'].text[1]=" "
                }
            }
            [Image: Salmonella bar graph with mvdecode prior.png]
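
One possible reading of why the sequence is disrupted: the removal loop assumes every year contributes the same number of bar labels. The sketch below rests on an assumption that is not verified here, namely that graph bar creates no label for a missing value, so labels are numbered over the non-missing bars only. It also assumes negative is never zero. The variables nbars and negpos are hypothetical helpers, not part of the original code.

```stata
clear
input int year byte(negative typhimurium anatum enteritidis hadar)
2010 20 1 1 2 0
2011 18 2 1 2 3
2012 36 4 0 2 3
2013 29 0 1 0 4
2014 23 2 0 1 2
end

mvdecode negative-hadar, mv(0)

* bars actually drawn in each year once zeros are missing
egen nbars = rownonmiss(negative-hadar)

* running position of the "negative" label under the assumption above
sort year
generate negpos = 1 in 1
replace negpos = negpos[_n-1] + nbars[_n-1] in 2/l

list year nbars negpos, noobs
```

For this example nbars is 4, 5, 4, 3, 4, so the negative labels would fall at 1, 5, 10, 14 and 17 rather than at the fixed stride 1, 6, 11, 16, 21 that the loop expects.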



            • #7
              Why do you want to remove zeros? You are graphing counts, so zero counts will not be displayed. As a matter of fact, I suggest the opposite in your related thread that seeks to keep the bar colors consistent.
              https://www.statalist.org/forums/for...-on-bar-graphs
