  • histograms of categorical variables

    Hello everyone,
    This is my first post in the forum, so apologies if something is misspecified or does not follow the posting guidelines.

    I have an issue with the following code

    twoway (histogram quality_8 if treatment == 0, discrete frequency width(0.4) start(0) color(blue%50)) ///
    (histogram quality_8 if treatment == 1, discrete frequency width(0.4) start(2) color(red%50)), ///
    xlabel(1 "Outside covered" 2 "Outside not covered" 3 "Inside covered" 4 "Inside not covered", angle(45)) ///
    xtitle("") ///
    ytitle("Frequency") ///
    legend(order(1 "Control" 2 "Treatment") position(6)) ///
    title("Distribution of quality_8 by Treatment Status") ///
    ylabel(, angle(horizontal))

    It should create a histogram for a categorical variable (quality_8) that has 5 categories, split by treatment. I chose histogram because only one category is present in the data, and graph bar does not display on the x axis the categories with no observations.
    The issue is that the bars overlap, even though I set two different start() values.

    Is there any way to keep the same code and have side-by-side bars?

    Many thanks in advance.

  • #2
    For side-by-side bars with twoway histogram you need to use an offset.

    There is no data example here, so here is some technique with an accessible dataset.

    Code:
    sysuse auto, clear
    
    gen rep78_0 = rep78 - 0.2 if foreign == 0
    gen rep78_1 = rep78 + 0.2 if foreign == 1
    
    twoway histogram rep78_0, barw(0.4) color(stc1) discrete freq || histogram rep78_1, barw(0.4) color(stc2) discrete freq legend(order(1 "Domestic" 2 "Foreign")) xtitle("`: var label rep78'") name(G1, replace)
    
    graph bar (count) ,  asyvars over(foreign) over(rep78) b2title("`: var label rep78'") name(G2, replace)
    You may conclude that graph bar is a better choice for what you want.

    • #3
      Many thanks for the response! I agree that graph bar is a better choice for the type of graph that I need. The only issue I am encountering is that graph bar only displays the categories of the variable "quality_8" (rep78 in your example) for which there are observations, while I need to show all categories, even if some have zero frequency.

      graph bar (count) , asyvars over(treatment) over(quality_8) b2title("") name(G2, replace)

      Thanks again

      • #4
        Solutions for that are various. Here is the flavour of one:

        Code:
        sysuse auto, clear
        
        gen rep78_0 = rep78 - 0.2 if foreign == 0
        gen rep78_1 = rep78 + 0.2 if foreign == 1
         
        twoway histogram rep78_0, barw(0.4) color(stc1) discrete freq xla(0 "not observed" 1/5 6 "doesn't exist") || histogram rep78_1, barw(0.4) color(stc2) discrete freq legend(order(1 "Domestic" 2 "Foreign")) xtitle("`: var label rep78'")
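
        As a sketch only, the same offset technique might carry over to the variables in the original question. It assumes treatment is coded 0/1 and quality_8 takes the integer codes named in the original xlabel() call; quality_8_0 and quality_8_1 are hypothetical helper variables, and the /// continuations need a do-file. Listing every category in xlabel() should keep categories with no observations on the axis.

        Code:
        * shift each group sideways so that the bars do not overlap
        * (quality_8_0 and quality_8_1 are illustrative names)
        gen quality_8_0 = quality_8 - 0.2 if treatment == 0
        gen quality_8_1 = quality_8 + 0.2 if treatment == 1

        twoway histogram quality_8_0, barw(0.4) color(blue%50) discrete freq ///
            || histogram quality_8_1, barw(0.4) color(red%50) discrete freq ///
            xlabel(1 "Outside covered" 2 "Outside not covered" 3 "Inside covered" 4 "Inside not covered", angle(45)) ///
            xtitle("") ytitle("Frequency") ///
            legend(order(1 "Control" 2 "Treatment") position(6))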

        • #5
          Is an equivalent specification possible with graph bar? I am asking because in twoway I keep having trouble putting the bars side by side rather than overlapping, while graph bar gives side-by-side bars, but I can't find the x axis specification for including all categories.

          • #6
            I believe not and I would have recommended that if I knew of a good approach.

            As documented, the x axis is considered not to exist with graph bar.

            The only workaround that I can imagine is to work with weights and set up fake observations for absent categories, with frequencies so minute that they don't show up as visible bars.

            Code:
            sysuse auto, clear 
            
            contract rep78, freq(freq)
            
            expand 2 in L 
            
            replace rep78 = 6 in L 
            replace freq = 1e-6 in L 
            
            graph bar (sum) freq, over(rep78 ) ytitle(Frequency)
            As always, giving us a data example would let us try code closer to your real problem.

            Code:
            contract quality_8 treatment if inlist(treatment, 0, 1), zero 
            dataex
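
            If graph bar is preferred, the fake-observation idea might extend to two groups along these lines. This is a sketch on the auto data, not on the original poster's: category 6 is invented purely for illustration, and the code relies on graph bar leaving a gap for combinations of group and category that have no observations, which is its default unless nofill is specified.

            Code:
            sysuse auto, clear

            * one row per observed combination, ignoring missing rep78
            contract foreign rep78, freq(freq) nomiss

            * clone the last row and turn it into a fake, effectively empty category
            expand 2 in L
            replace rep78 = 6 in L
            replace freq = 1e-6 in L

            graph bar (sum) freq, asyvars over(foreign) over(rep78) ytitle(Frequency)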
