

  • histograms of categorical variables

    Hello everyone,
    This is my first post in the forum, so apologies if something is misspecified or does not follow all the posting guidelines.

    I have an issue with the following code

    twoway (histogram quality_8 if treatment == 0, discrete frequency width(0.4) start(0) color(blue%50)) ///
    (histogram quality_8 if treatment == 1, discrete frequency width(0.4) start(2) color(red%50)), ///
    xlabel(1 "Outside covered" 2 "Outside not covered" 3 "Inside covered" 4 "Inside not covered", angle(45)) ///
    xtitle("") ///
    ytitle("Frequency") ///
    legend(order(1 "Control" 2 "Treatment") position(6)) ///
    title("Distribution of quality_8 by Treatment Status") ///
    ylabel(, angle(horizontal))

    It should create a histogram for a categorical variable (quality_8) that has 5 categories, split by treatment. I chose histogram because only one category is present in the data, and graph bar does not display categories with no observations on the x axis.
    The issue is that the bars are overlapping, even if I set two different starts.

    Any way to keep the same code and have side by side bars?

    Many thanks in advance.


  • #2
    For side-by-side bars with twoway histogram you need to use an offset.

    There is no data example here, so here is some technique with an accessible dataset.

    sysuse auto, clear
    gen rep78_0 = rep78 - 0.2 if foreign == 0
    gen rep78_1 = rep78 + 0.2 if foreign == 1
    twoway histogram rep78_0, barw(0.4) color(stc1) discrete freq ///
        || histogram rep78_1, barw(0.4) color(stc2) discrete freq ///
        legend(order(1 "Domestic" 2 "Foreign")) xtitle("`: var label rep78'") name(G1, replace)
    graph bar (count) ,  asyvars over(foreign) over(rep78) b2title("`: var label rep78'") name(G2, replace)
    You may conclude that graph bar is a better choice for what you want.


    • #3
      Many thanks for the response! I agree that graph bar is a better choice for the type of graph I need. The only issue is that graph bar displays only those categories of the variable quality_8 (rep78 in your example) for which there are observations, while I need to show all categories, even those with zero frequency.

      graph bar (count) , asyvars over(treatment) over(quality_8) b2title("") name(G2, replace)

      Thanks again


      • #4
        Solutions for that are various. Here is the flavour of one:

        sysuse auto, clear
        gen rep78_0 = rep78 - 0.2 if foreign == 0
        gen rep78_1 = rep78 + 0.2 if foreign == 1
        twoway histogram rep78_0, barw(0.4) color(stc1) discrete freq xla(0 "not observed" 1/5 6 "doesn't exist") ///
            || histogram rep78_1, barw(0.4) color(stc2) discrete freq ///
            legend(order(1 "Domestic" 2 "Foreign")) xtitle("`: var label rep78'")
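
        The same offset-plus-xla() idea can be sketched with the variable names from #1. This is untested against the real data and rests on assumptions not confirmed in the thread: that quality_8 is coded 1 to 5 and that treatment is 0/1. The new variable names quality_8_0 and quality_8_1 are made up for illustration, and the fifth category is left as a bare 5 because its label is not given in #1.

        ```stata
        * Assumption: quality_8 takes values 1 to 5 and treatment is 0/1
        * Shift each group sideways so the bars sit next to each other
        gen quality_8_0 = quality_8 - 0.2 if treatment == 0
        gen quality_8_1 = quality_8 + 0.2 if treatment == 1

        * xla() names every category explicitly, so empty ones still get a tick
        twoway histogram quality_8_0, barw(0.4) color(blue%50) discrete freq ///
            || histogram quality_8_1, barw(0.4) color(red%50) discrete freq ///
            xla(1 "Outside covered" 2 "Outside not covered" 3 "Inside covered" ///
                4 "Inside not covered" 5, angle(45)) ///
            xtitle("") ytitle("Frequency") yla(, angle(horizontal)) ///
            legend(order(1 "Control" 2 "Treatment") position(6))
        ```

        The offsets of 0.2 either side and the bar width of 0.4 are chosen so that the two bars for each category touch without overlapping.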


        • #5
          Is an equivalent specification possible with graph bar? I ask because with twoway I keep having trouble getting the bars side by side rather than overlapping, while graph bar puts them side by side but I can't find an x-axis option that includes all categories.


          • #6
            I believe not and I would have recommended that if I knew of a good approach.

            As documented, the x axis is considered not to exist with graph bar.

            The only way to work around that that I can imagine is to work with weights and set up fake observations for absent categories with frequencies so minute that they don't show up as visible bars.

            sysuse auto, clear 
            contract rep78, freq(freq)
            expand 2 in L 
            replace rep78 = 6 in L 
            replace freq = 1e-6 in L 
            graph bar (sum) freq, over(rep78 ) ytitle(Frequency)
            As always, giving us a data example would let us try code closer to your real problem.

            contract quality_8 treatment if inlist(treatment, 0, 1), zero 
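
            A rough sketch of how the contract, zero idea and the tiny-frequency trick above might be combined for side-by-side bars, again with the auto data. As I read the documentation, the zero option fills in combinations of values that each occur somewhere in the data, so a category that never occurs at all (rep78 == 6 here) would still need fake rows. The local macro first and the graph name G3 are just illustrative.

            ```stata
            sysuse auto, clear

            * One row per rep78 x foreign combination, including zero counts
            contract rep78 foreign, zero nomiss freq(freq)

            * rep78 == 6 never occurs, so add one fake row per group
            local first = _N + 1
            set obs `=_N + 2'
            replace rep78 = 6 in `first'/L
            replace foreign = 0 in `first'
            replace foreign = 1 in L

            * Frequency too small to show as a visible bar
            replace freq = 1e-6 in `first'/L

            graph bar (sum) freq, asyvars over(foreign) over(rep78) ///
                ytitle("Frequency") name(G3, replace)
            ```

            With the real data, the first step would presumably be the contract quality_8 treatment line above with freq(freq) added, followed by fake rows for whichever categories of quality_8 are absent.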

