  • Histogram only for non-missing values

    Dear Statalists,
    My dataset contains duration in minutes of exams in multiple subjects.
    I want to graph the frequency of each duration, separately for each subject.
    For example, when I tabulate duration if the subject is math
    Code:
    tab duration if subject == "math", mi
    I get
    Code:
    duration in |
        minutes |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             75 |    255,844       23.32       23.32
             90 |    127,314       11.61       34.93
            105 |    165,198       15.06       49.99
            120 |     66,312        6.05       56.04
            180 |    307,021       27.99       84.03
            195 |          1        0.00       84.03
            255 |          3        0.00       84.03
            360 |          1        0.00       84.03
            435 |          1        0.00       84.03
              . |    175,219       15.97      100.00
    ------------+-----------------------------------
          Total |  1,096,914      100.00
    When I use histogram for this purpose:
    Code:
    hist duration if subject =="math"
    I get this histogram
    [Image: Graph.png]

    As you can see, for someone who does not have the tabulate output above, it is very hard to tell which duration values lie around 100.
    So I want my histogram to show only the values of duration that actually occur, and to label them on the axis.
    It should look like this:

    [Image: 1.png]

    Many Thanks!

  • #2
    I believe adding the -discrete- option to your -histogram- command will produce what you want.
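
    A minimal sketch of that suggestion, assuming subject is a string variable as in #1. The frequency option and the explicit xlabel() list (taken from the tabulate output in #1) are additions for illustration, not part of the suggestion itself. Note that histogram ignores missing values of duration anyway.

    Code:
    * one bar per distinct value of duration; y axis shows counts rather than densities
    histogram duration if subject == "math", discrete frequency ///
        xlabel(75 90 105 120 180 195 255 360 435, labsize(small) angle(vertical))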


    • #3
      The combination of very high and very low frequencies is challenging here -- and I have sympathy with any position that a table serves very well.

      Otherwise here are a few ideas.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int duration long frequency
       75 255844
       90 127314
      105 165198
      120  66312
      180 307021
      195      1
      255      3
      360      1
      435      1
      end
      
      
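      * y = 0 baseline for the open-circle markers at each observed duration
      gen zero = 0
      * distinct durations, used as x-axis labels
      levelsof duration, local(levels)
      * label text: counts above 1,000 are shown in rounded thousands (e.g. 256K)
      gen toshow = cond(frequency > 1e3, strofreal(round(frequency/1000)) + "K", strofreal(frequency))
      * spikes of frequency, with the count shown above each spike
      spikeplot duration [fw=frequency],  addplot(scatter zero frequency duration, ms(Oh none) mlabel(. toshow) mlabpos(. 12)) ///
      xla(`levels', labsize(small)) yla(none) legend(off) ysc(r(. 320e3) titlegap(*5)) xsc(r(60 450)) xtitle(Duration (minutes))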
      [Image: exams.png]
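
      If you start from the raw exam-level data rather than a typed-in frequency table, one way to get the duration and frequency variables used above is -contract-. This is only a sketch: it assumes the variable names duration and subject from #1, with subject held as a string variable. The nomiss option drops observations with missing duration, and preserve/restore keeps the original data intact.

      Code:
      preserve
      * one observation per distinct non-missing duration, with its count in frequency
      contract duration if subject == "math", freq(frequency) nomiss
      list, noobs
      * ... run the graph commands above here, before the data are restored ...
      restore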


      • #4
        Thank you Nick!
