  • Selected summary statistics and format of "summarize" output

    Hello everyone,

    Browsing through the Stata manuals and the forum, I have not yet found a way (without user-written commands) to display only selected summary statistics for a variable. Although the summarize command gives me all the information I need, I would like a handy way to display only the statistics I really need, namely: mean, median (= 50th percentile), min, max, standard deviation, and number of observations. How would you do that?

    Secondly, is there a way to change the summarize output from exponential to decimal? The variables are already formatted (format payments %10.2fc). Is it a limitation within Stata or would it be possible to display the full decimal value in the summarize output?

    My output (with exponential values):

    Code:
    . sum payments, d f
                            Payments
    -------------------------------------------------------------
          Percentiles      Smallest
     1%        34.67           0.00
     5%       201.93           0.00
    10%       498.73           0.00       Obs               87501
    25%     2,207.84           0.00       Sum of Wgt.       87501
    
    50%     9,054.53                      Mean          52,282.45
                            Largest       Std. Dev.     300768.14
    75%    30,926.83       1.51e+07
    90%    88,559.99       1.56e+07       Variance       9.05e+10
    95%    169976.95       1.56e+07       Skewness          29.88
    99%    753000.00       1.61e+07       Kurtosis       1,277.10
    Thank you!
    Best

  • #2
    summarize has a documented format option, which you are using. In your case, there just isn't always enough space to honour it.

    Otherwise consider tabstat with options s(n mean median min max sd) and format().
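
As a minimal sketch of that suggestion, the call below spells out the abbreviated options in full. It uses Stata's shipped auto dataset, with price as a stand-in for payments (an assumption, since the original data are not available). In tabstat, median is the same statistic as p50.

```stata
* Stand-in data (assumption): the shipped auto dataset; price plays the role of payments
sysuse auto, clear

* Show only the requested statistics, each with commas and two decimals
tabstat price, statistics(n mean median min max sd) format(%10.2fc)
```

With the original data, replacing price by payments should give the same layout as the tabstat call, subject to the column width that tabstat allots to each statistic.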

    • #3
      Great, thanks for the quick reply. It's an excellent option!

      Is there any way to change the width of the value columns, as is possible for the variable-name column with varwidth(#)? Right now all the values run together...

      Code:
      . tabstat payments, s(mean median sd) format(%10.2fc)
      
          variable |      mean       p50        sd
      -------------+------------------------------
          payments |100,357.95 28,100.29509,524.16
      --------------------------------------------

      • #4
        You can use the tabstat command. This allows you to list the statistics you want to show and it includes the option to fine tune the display format.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          I think you need to write your own command. You want to see numbers in full when values range from single digits to tens of millions or more. That's a common and understandable request with financial data, but tabstat doesn't seem to allow anything beyond what you can see. In the rest of science we just change units or use scientific notation.
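
Short of writing a full command, one possible sketch is to run summarize quietly and print its stored results with display, which takes a format for each expression and so is not bound to the fixed-width summarize table. The auto dataset and price are stand-ins for the original data (an assumption); r(N), r(mean), r(p50), r(min), r(max) and r(sd) are results that summarize with the detail option leaves behind.

```stata
* Stand-in data (assumption): the shipped auto dataset; price plays the role of payments
sysuse auto, clear

* detail is needed so that the median is stored in r(p50)
quietly summarize price, detail

* Print each stored result with its own display format
display "N      " %14.0fc r(N)
display "mean   " %14.2fc r(mean)
display "median " %14.2fc r(p50)
display "min    " %14.2fc r(min)
display "max    " %14.2fc r(max)
display "sd     " %14.2fc r(sd)
```

Because every value sits on its own line, a wide format such as %14.2fc has room for figures in the tens of millions.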

          UPDATE:

          Consider tabstat with options s(n mean median min max sd) format(%14.2fc) c(v)
          Last edited by Nick Cox; 05 Nov 2014, 04:42.
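
To illustrate the UPDATE: c(v) abbreviates columns(variables), which places the variable in a column and the statistics in rows, so each value gets a line of its own and the wider format should have room to display in full. This is only a sketch on the shipped auto dataset, with price standing in for payments (an assumption).

```stata
* Stand-in data (assumption): the shipped auto dataset; price plays the role of payments
sysuse auto, clear

* Statistics in rows, one variable per column, wide comma format
tabstat price, statistics(n mean median min max sd) format(%14.2fc) columns(variables)
```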

          • #6
            Alright, I will figure something out.
            Thank you

            • #7
              See my UPDATE too.

              • #8
                I've been trying to do this for years! Thanks, Nick. Tabstat worked for me.
