  • Get mode from a histogram

    Hi all

    Edit: I found a solution, which I post below, but unfortunately I cannot get the result into a scalar:

    I am trying to get the mode from a histogram:

    Code:
    sysuse auto, clear
    
    hist mpg,  freq addlabels
       
    [Image: Graph.png, histogram of mpg with frequency labels on the bars]
    My goal is to get the value 25.

    I found a solution, but I could not get the scalar:
    graph save Graph "mpg.gph", replace

    clear
    graph use mpg
    serset use
    sum
    Code:
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
        __000000 |         8        9.25    8.293715          1         25
        __000001 |         9           0           0          0          0
        __000002 |         9    24.88889    9.609859         12    39.1875
    
    . ret li
    
    scalars:
                      r(N) =  9
                  r(sum_w) =  9
                   r(mean) =  24.88888888888889
                    r(Var) =  92.34939236111111
                     r(sd) =  9.609859122854566
                    r(min) =  12
                    r(max) =  39.1875
                    r(sum) =  224
    Last edited by Rodrigo Badilla; 26 Feb 2021, 14:26.

  • #2
    That’s not the mode; it is the frequency of the most frequent bin, which is itself sensitive to bin width and origin.

    See hsmode from SSC for a systematic way to estimate the mode.



    • #3
      If indeed what you want is the frequency of the most frequent bin, given that the bin widths are arbitrary (at least in your example), then the following might be a start.
      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . hist mpg,  freq addlabels
      (bin=8, start=12, width=3.625)
      
      . serset use
      
      . list, clean
      
             __000000   __000001   __000002  
        1.         10          0    13.8125  
        2.         25          0    17.4375  
        3.         13          0    21.0625  
        4.         15          0    24.6875  
        5.          6          0    28.3125  
        6.          1          0    31.9375  
        7.          3          0    35.5625  
        8.          1          0    39.1875  
        9.          .          0         12  
      
      .
      This suggests to me that the most frequent bin is the second one, with 25 observations (__000000), and that the bin is centered on 17.4375 (__000002) with a width of 3.625, calculated as the difference between any pair of adjacent centers.
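
      The same reading of the serset can be done without inspecting the listing by eye. This is only a sketch: it assumes the serset variables come in the order shown above (frequency, base, bin center), and the local macro names are made up for illustration.

      ```stata
      sysuse auto, clear
      histogram mpg, freq addlabels
      serset use, clear

      * Assumption: variables appear in the order frequency, base, bin center
      unab vars : _all
      local freq : word 1 of `vars'
      local mid  : word 3 of `vars'

      * Frequency of the most frequent bin
      summarize `freq', meanonly
      local maxfreq = r(max)

      * Bin width as the difference between the first two adjacent centers
      local width = `mid'[2] - `mid'[1]

      * Show the bin(s) that attain the maximum frequency
      list `freq' `mid' if `freq' == `maxfreq', clean noobs
      display "modal-bin frequency = `maxfreq', bin width = `width'"
      ```

      Picking the variables by position avoids hard-coding names such as __000000, which are temporary names and may differ from session to session.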



      • #4
        Thanks, Nick Cox and William Lisowski, for your replies. I agree that "mode" is wrong; "the most frequent bin" is the correct term.



        • #5
          Finally I could get the frequency of the most frequent bin as a local macro:

          Code:
          sysuse auto, clear
          
          hist mpg, freq addlabels
          
          serset  use
          sum __000000
          local maximo `r(max)'
          di "`maximo'"
          25
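
          If a scalar is wanted rather than a local macro, the same r(max) can be copied into one. A minimal sketch, assuming the serset variable is named __000000 as in this thread. Note that summarize without a variable list leaves r() holding the results for the last variable only, which is why ret li in #1 showed 39.1875 rather than 25.

          ```stata
          sysuse auto, clear
          histogram mpg, freq addlabels
          serset use, clear

          * Name the variable so that r() refers to the bin frequencies
          summarize __000000, meanonly

          * Copy r(max) at once; the next r-class command overwrites r()
          scalar maxfreq = r(max)

          display maxfreq
          ```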



          • #6
            The frequency of the modal bin is necessarily sensitive to bin width and even bin start. Why is it of interest? Genuinely puzzled.

            Regardless of that, twoway__histogram_gen is arguably a better way to get what you want, and it has received its own proper puff:

            SJ-5-2 gr0014 . . . . . . . Stata tip 20: Generating histogram bin variables
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. A. Harrison
            Q2/05 SJ 5(2):280--281 (no commands)
            tip illustrating the use of twoway__histogram_gen for
            creation of complex histograms and other graphs or tables

            https://www.stata-journal.com/articl...article=gr0014
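
            For a single histogram, a call might look like the sketch below. The variable names h and x and the scalar name are arbitrary choices for illustration; start(12) and width(3.625) are simply the bins that histogram reported in #3.

            ```stata
            sysuse auto, clear

            * h receives the bin frequencies, x the bin centers
            twoway__histogram_gen mpg, freq start(12) width(3.625) gen(h x)

            * Frequency of the most frequent bin
            summarize h, meanonly
            scalar maxfreq = r(max)

            * Show the bin(s) that attain it
            list h x if h == maxfreq, clean noobs
            ```

            This works on the data in memory directly, so there is no need to draw a graph and load its serset first.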

            To pursue the problem of artefacts, I pushed harder at some defensible choices for mpg in the auto data, which ranges from 12 to 41.

            25 is just one of several answers. To take it literally would be to put more weight on the defaults of histogram than I guess even its designers would want anyone to do.

            FWIW, I note that modes from the Stata Journal has different goals here, but run

            Code:
            search sg113, entry
            if you want to see what it does.


            Code:
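            sysuse auto, clear

            * holders for the result of each combination of bin start and width
            gen result = .
            gen start = .
            gen width = .

            local i = 1

            tempvar h x
            quietly foreach s in 10 11 12 {
                foreach w in 1 2 3 4 5 {
                    * bin frequencies go in `h', bin centers in `x'
                    twoway__histogram_gen mpg, freq gen(`h' `x' `replace') w(`w') start(`s')
                    replace width = `w' in `i'
                    replace start = `s' in `i'
                    * modal frequency for this start and width
                    su `h', meanonly
                    replace result = r(max) in `i'
                    * from the second pass on, overwrite the temporary variables
                    local replace , replace
                    local ++i
                }
            }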
            
            * tabplot is from the Stata Journal 
            tabplot start width [fw=result], showval yreverse subtitle(which result would you like for modal frequency?)


            [Image: modal_frequency.png, tabplot of modal frequency by bin start and width]



            • #7
              Thanks, Nick Cox. Your point is clear to me. I will check twoway__histogram_gen.

