  • mean and standard deviation for groups

    Hi All,

    I need to find the mean and standard deviation for each category separately. I tried bysort but couldn't make it work. Could you please help me with it?
    Here is a small sample of my data. I need to find mean and standard deviation for each id.
    Code:
    clear
    input int id int category int value int freq
    1        10        4        8
    1        11        3.69        9
    1        12        4.32        8
    1        12        4.12        7
    2        20        4.62        7
    2        21        3.56        6
    2        10        4.14        5
    2        12        4.6        7
    2        10        4.72        8
    3        21        4.32        9
    3        22        4.12        7
    3        22        4.62        6
    4        10        3.56        9
    4        21        4        9
    end
    Thanks

  • #2
    Code:
    help egen

    • #3
      Thanks Nick. I know how to use egen when I want to find the mean and standard deviation of some values, but how can I use it when I have frequencies as well? The values are weighted by frequency. Sorry! I know it should be very basic. I really appreciate your help.

      • #4
        Not sure if you intend "find" to mean a report in the Results window, or create a new variable, or something else entirely. For a report, the following seems to do the trick on your (very easy to use, thank you!) sample data.
        Code:
        . tabstat value [fweight=freq], statistics(n mean sd) by(category)
        
        Summary for variables: value
             by categories of: category
        
        category |         N      mean        sd
        ---------+------------------------------
              10 |        30       3.7  .4660916
              11 |         9         3         0
              12 |        22         4         0
              20 |         7         4         0
              21 |        24      3.75  .4423259
              22 |        13         4         0
        ---------+------------------------------
           Total |       105  3.771429  .4219265
        ----------------------------------------
        Or did you want it by id?
        Code:
        . tabstat value [fweight=freq], statistics(n mean sd) by(id)
        
        Summary for variables: value
             by categories of: id 
        
              id |         N      mean        sd
        ---------+------------------------------
               1 |        32   3.71875  .4568034
               2 |        33  3.818182  .3916747
               3 |        22         4         0
               4 |        18       3.5  .5144958
        ---------+------------------------------
           Total |       105  3.771429  .4219265
        ----------------------------------------
        Last edited by William Lisowski; 29 Sep 2015, 11:57.

        • #5
          Thanks William,

          I need to create new variables for both sd and mean.

          • #6
            You can consider an expand on the frequencies. If your dataset is very large, that may not be practical.
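A minimal sketch of the expand idea, not a definitive implementation. It uses four rows taken from the sample in #1, and declares value as float so the decimals are kept (with the int declaration in #1 they appear to be lost, judging by the output in #4). The names obsno, mean_value and sd_value are made up for illustration.

```stata
clear
input int id float value int freq
1 4    8
1 3.69 9
2 4.62 7
2 3.56 6
end

gen long obsno = _n                        // tag each original row
expand freq                                // freq copies of every row
bysort id: egen mean_value = mean(value)   // now an ordinary unweighted mean
bysort id: egen sd_value = sd(value)       // sample SD, divisor n - 1
bysort obsno: keep if _n == 1              // back to one row per original row
drop obsno
```

After expand, every row appears freq times, so the unweighted egen functions give the frequency-weighted results, and the last two lines shrink the data back to its original size.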

            • #7
              I have 11000 unique IDs and each of them has about 30 categories. The frequencies are large too.

              • #8
                That's not a big dataset by modern standards. The relevant number is the typical frequency. In your example data it is less than 10. At worst you need to expand only the variables for which you want the mean and SD. Once calculated you can collapse and merge back in.
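The expand, collapse and merge steps just described might look like the sketch below. It assumes the variable names from #1, a few of its rows with value declared as float, and grouping by id; stats, mean_value and sd_value are hypothetical names. Run it as a do-file, since the tempfile macro does not survive between interactively run lines.

```stata
clear
input int id int category float value int freq
1 10 4    8
1 11 3.69 9
2 20 4.62 7
2 21 3.56 6
end

preserve                          // keep the full dataset safe
keep id value freq                // expand only the variables that are needed
expand freq                       // freq copies of every row
collapse (mean) mean_value=value (sd) sd_value=value, by(id)
tempfile stats
save `stats'
restore                           // back to the original rows

merge m:1 id using `stats', nogenerate   // attach the group statistics
```

collapse also accepts frequency weights, so dropping the expand line and adding [fweight=freq] to the collapse command should give the same statistics without enlarging the data.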

                Of course, it can be programmed directly, but that's a waste of time if you have the memory to get Stata to do the work.
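For anyone short of memory, programming it directly could be sketched as follows. It assumes the usual frequency-weight formulas: mean = sum(freq*value)/sum(freq) and SD = sqrt(sum(freq*(value - mean)^2)/(sum(freq) - 1)). The data are four rows from #1 with value declared as float; n_w, sum_w, ss_w, mean_value and sd_value are illustrative names only.

```stata
clear
input int id float value int freq
1 4    8
1 3.69 9
2 4.62 7
2 3.56 6
end

bysort id: egen double n_w   = total(freq)            // sum of frequencies
bysort id: egen double sum_w = total(freq * value)    // weighted sum of values
gen double mean_value = sum_w / n_w                   // weighted mean per id
bysort id: egen double ss_w  = total(freq * (value - mean_value)^2)
gen double sd_value = sqrt(ss_w / (n_w - 1))          // missing if n_w is 1
drop n_w sum_w ss_w
```

Nothing is expanded here, so memory use stays at the original number of rows plus a few extra variables.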

                • #9
                  Got it. Thank you very much. I will give it a try.
