  • Histogram plotting by another variable

    Hi,

    This is probably a very easy problem, but I have been trying to solve it for a while and can't make it work.

    I have a continuous variable (optical density of a haemorrhage = DfrontalLR) for which I want to create a histogram, which is not the problem. But the histogram should be split by another variable (haemorrhage yes/no = CTresult). Attached is the example.

    I've tried:
    twoway (hist DfrontalLR) (hist CTresult)

    hist DfrontalLR, by (CTresult) width (0.1)
    -> this

    Is there a command (I'm sure there is) that shows the two groups in the same window in different colours, as in the attached screenshot?

    Thanks!

  • #2
    It is not clear exactly what you are asking for. The attached Figure seems to superimpose histograms but with an offset in showing the bars of one group which makes occlusions just about visible. The code you tried just superimposes histograms and thus in general runs the risk of occlusion. Occlusion means that a bar in front of greater or equal size makes the bar behind invisible.

    There is an easier method, which is to superimpose bars showing frequencies for one group on bars showing frequencies for both. Then the differences in height represent the other group. This method hinges on frequencies being additive.

    Here is some simple code as a pattern.

    Code:
    sysuse auto, clear
    
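    * Red bars: all cars. Blue bars, drawn on top: foreign cars only.
    * The red left visible above each blue bar is the domestic frequency.
    twoway histogram mpg, discrete frequency bcolor(red*0.5) || histogram mpg if foreign, discrete ///
    legend(order(1 "Domestic" 2 "Foreign")) frequency bcolor(blue*0.5)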
    That design seems to me a poor choice, adding some new disadvantages to the standard disadvantage of a histogram. How it would work with your data we can't tell, assuming that they aren't the data in the attachment, but my prejudice is that there are several better methods.
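
    As a rough check of the additivity that the overlay relies on, the per-value frequencies can be computed directly. This is only a sketch on the auto data; the variable names n_all, n_foreign, n_domestic and first are made up for illustration.

    ```stata
    sysuse auto, clear

    * Frequencies at each distinct mpg value: all cars and foreign cars
    bysort mpg: generate n_all = _N
    bysort mpg: egen n_foreign = total(foreign)

    * Domestic frequency is the difference, i.e. the red left visible
    generate n_domestic = n_all - n_foreign

    * Show one row per distinct mpg value
    egen first = tag(mpg)
    list mpg n_all n_foreign n_domestic if first, noobs
    ```

    If the three columns did not add up in this way, reading the domestic group off the difference in bar heights would not work.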





    • #3
      Without any medical or biostatistical expertise, the following guesses can still be made on the basis of the attachment:

      1. Skewed distributions for the outcome.

      2. Smaller sample size for afflicted patients.

      I made up some values that I hope are plausible as a sandbox.

      Side-by-side quantile-box plots show broad features and allow checks for fine structure while not obliging any kind of binning. I used stripplot (SSC). This kind of graph has been touted before on this forum. As the name implies, the design superimposes boxes showing medians and quartiles on quantile plots. The reference lines show means; that's discretionary and other summaries or none could be used. Once you decide that there is space to show all the data -- and there really is -- then arbitrary extra rules such as plotting points distinctly if and only if they are more than 1.5 IQR from the nearer quartile are dispensable.


      Code:
      clear
      set obs `=269 + 96' 
      set seed 2803 
      gen outcome = runiform()^3.5 in 1/269 
      replace outcome = 1.5 * runiform()^2.5 in 270/L 
      gen which = _n >= 270 
      label def which 0 no 1 yes
      label val which which 
      * stripplot is from SSC; install it first with: ssc install stripplot
      stripplot outcome, over(which) cumul cumprob box refline  vertical centre
      [Figure: isabel.png]



      • #4
        To provide some help and, at the same time, to hone my Stata skills, I fiddled with Nick's command presented here.

        With small edits, I typed:
        Code:
        . twoway histogram mpg if foreign, width(2) blcolor(green) bfcolor(red) fintensity(10) frequency || histogram mpg if !foreign, width(2) ///
        barw(1) bfcolor(maroon) sort fintensity(80) blcolor(blue) frequency legend(order(1 "Foreign" 2 "Domestic") col(1) pos(1) ring(0))
        Which gives this graph:
        [Figure: Graph_overlap_hist.png]




        The excessive use of colors (e.g., on the borders) was just to explore the possibilities.

        I hope this helps.
        Best regards,

        Marcos



        • #5
          Marcos' design does make clear what is what and could be extended to densities, percents and proportions too. A cost is showing different categories very differently. For readerships who realise that the heights of the bars convey the information, that might be fine. The two bars could be of equal width and set side-by-side.
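
          As a sketch of the extension to percents, the design from #4 can be rerun with percent in place of frequency. The colours and the common start(12) are my own choices here, and each histogram is computed from its own if-sample, so the bars show percents within each group rather than percents of all cars.

          ```stata
          sysuse auto, clear

          * Same overlay idea as in #4, but with percent instead of frequency.
          * start(12) is an added assumption so that both groups share bin limits.
          twoway histogram mpg if foreign, width(2) start(12) percent ///
                     bfcolor(red*0.2) blcolor(red) || ///
                 histogram mpg if !foreign, width(2) start(12) barw(1) percent ///
                     bfcolor(blue*0.8) blcolor(blue) ///
                 legend(order(1 "Foreign" 2 "Domestic") col(1) pos(1) ring(0))
          ```

          Replacing percent by density or fraction follows the same pattern.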



          • #6
            Thank you very much both. This is very very helpful. Just one follow-up question: how would I make the two bars of equal width and set side-by-side?



            • #7
              That's still a horrible method -- but not rocket science. Here's a demo you can run:

              Code:
              sysuse auto, clear
              
              separate mpg, by(foreign) veryshortlabel 
              
              local W 1 2 4 
              local j = 1 
              
              // one graph per bin width in W
              quietly foreach w of local W { 
              
                  // each bar is half a bin wide, so that two bars fit in one bin
                  local hw = `w'/2 
                  // round down to the start of the bin; offset foreign cars by half a bin
                  replace mpg0 = `w' * floor(mpg/`w')  if !foreign 
                  replace mpg1 = `w' * floor(mpg/`w') + `w'/2 if foreign 
              
                  if `j' == 1 local ticks xtick(12/41) 
                  else if `j' == 2 local ticks xtick(12(2)42) 
                  else if `j' == 3 local ticks xtick(12(4)44) 
               
                  noisily twoway histogram mpg0, start(12) width(`hw') bfcolor(red*0.5) blcolor(red) frequency || ///
                  histogram mpg1, start(12) width(`hw') bfcolor(blue*0.5) blcolor(blue) frequency ///
                  legend(order(1 "Domestic" 2 "Foreign")) name(G`j', replace) ///
                  subtitle(bin width `w') `ticks' xla(12(4)40) 
              
                  local ++j 
              }
              The three tricks used here are

              1. Rounding beforehand.

              2. Each bar width must be half the bin width.

              3. One variable must be offset.
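
              To see tricks 1 and 3 in isolation, here is a minimal sketch for a single bin width of 2. The names bin and xpos are made up for illustration and are not used in the demo above.

              ```stata
              sysuse auto, clear

              * Bin width, fixed at 2 for this illustration
              local w 2

              * Trick 1: round each value down to the start of its bin
              generate bin = `w' * floor(mpg/`w')

              * Trick 3: shift the foreign cars by half a bin
              generate xpos = bin + cond(foreign, `w'/2, 0)

              * Trick 2 applies at plotting time: width() is set to half the bin width
              tabulate xpos foreign
              ```

              In the table, domestic cars should appear only at even values of xpos and foreign cars only at odd values, which is what lets the two sets of bars sit side by side.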



              • #8
                You may also fiddle with Stata Tip 20, written by David Harrison.

                See the example below:

                Code:
                . sysuse auto
                (1978 Automobile Data)
                
                . twoway__histogram_gen mpg if foreign ==0, frequency gen(h0 x0)
                
                . twoway__histogram_gen mpg if foreign ==1, frequency gen(h1 x1)
                
                . twoway (bar h0 x0, barw(2)) (bar h1 x1, barw(2)), legend(order(1 "Domestic" 2 "Foreign") col(1) pos(1) ring(0))
                [Figure: Graphhist.png]
                Best regards,

                Marcos



                • #9
                  Thank you very much. That makes sense. One last question, about the y-axis: instead of "any" frequency, can I show the frequency of patients that had the x-axis variable, and how would I do that? I would then have three variables: the x-axis variable, the y-axis variable, and the variable by which they are divided (foreign/domestic in your example above). Thank you!



                  • #10
                    Sorry, but I don't understand how this is a different question. Please phrase in terms of a data example, your own or a dataset that we can all read into Stata.

