Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mean by group and set as new column

    Hello again,

    I'm getting deeper and deeper with Stata/Statistics and love it more and more. :-)
    I come from the classical programming area like c++/java/python and know how to solve problems with lists and loops, but Stata has some very useful command contructions doing complex works in just one command. That is what I am searching for here. So now to the problem:

    I got for example the following data with just few rows for showing what I mean:
    Nr GroupID Value
    1 1 5
    2 1 3
    3 2 5
    4 2 11
    Now I want to calculate the mean value by GroupID and set all to a new column like it will be shown in the next table:
    Nr GroupID Value Mean
    1 1 5 4
    2 1 3 4
    3 2 5 8
    4 2 11 8
    Is this possible in with one nice Stata command and also with other results than mean?

    In general...I try to use Stata for data manipulation and data analysis of chemical engineering process optimization.

    Greetings and thx for your help
    Christian

  • #2
    Give this a try...

    Code:
    bysort GroupID: egen Mean = mean(Value)
    Lance

    Comment


    • #3
      Give this a try...

      Code:
      bysort GroupID: egen Mean = mean(Value)
      Lance

      Comment


      • #4
        Hey,

        thank you very much as the command does what I want, except of one thing I forgot to explain:

        The data is sorted as a time series where the groupID could possibly repeat with time. And these groups should have their own mean calculation cause they are a group later in time. Let's extend my last example:
        Nr GroupID Value Mean
        1 1 5 4
        2 1 3 4
        3 2 5 8
        4 2 11 8
        5 1 30 20
        6 1 10 20
        Might be a big issue that I have forgotten to explain this. Sry for that!

        Greetings and thanks for your help.
        Christian

        Comment


        • #5
          Hey,

          for the reason no solution exist for a single command, I will post my solution with a loop. Maybe someone has code advantages for me. :-)

          Code:
          // Grouping variable X with respect to time
          gen int X_grp = 0
          forvalues num = 1/`=_N' {
              replace X_grp = X_grp[_n-1] + 1 if X[_n] != X[_n-1] & Nr[_n] == _n & _n > 1
              replace X_grp = X_grp[_n-1] if X[_n] == X[_n-1] & Nr[_n] == _n & _n > 1
          }
          The 'Nr' column is just an id to be sure that only this current line will be changed. In general the code looks very bad related to what I know in programming. But using things like incrementation variables or simple conditions just didn't work here in my editor. :-)

          Greetings and thanks for the help.
          Christian

          Comment


          • #6
            Christian: Please contact the administrators to change to an identifier that is your full real name. FAQ Advice explains.

            You don't need a loop over observations for your new problem.

            Code:
            clear 
            input Nr    GroupID    Value    Mean
            1    1    5    4
            2    1    3    4
            3    2    5    8
            4    2    11    8
            5    1    30    20
            6    1    10    20
            end 
            
            gen block = sum(GroupID != GroupID[_n-1]) 
            egen Mean2 = mean(Value), by(block) 
            
            list, sepby(block) 
            
            
                 +---------------------------------------------+
                 | Nr   GroupID   Value   Mean   block   Mean2 |
                 |---------------------------------------------|
              1. |  1         1       5      4       1       4 |
              2. |  2         1       3      4       1       4 |
                 |---------------------------------------------|
              3. |  3         2       5      8       2       8 |
              4. |  4         2      11      8       2       8 |
                 |---------------------------------------------|
              5. |  5         1      30     20       3      20 |
              6. |  6         1      10     20       3      20 |
                 +---------------------------------------------+

            Comment


            • #7
              I am curious. How does STATA handle missing values when calculating mean by group in this way? Will observations with missing values count towards the denominator?

              Comment


              • #8
                Missing values are ignored in calculating numerical
                summaries, both numerators and denominators.

                Comment


                • #9
                  Missing values are ignored in calculating numerical
                  summaries, both numerators and denominators.

                  Comment


                  • #10
                    Hello, for this same example, how do I generate the 10th percentile values instead of the mean values? Thanks!

                    Comment


                    • #11
                      there is a pctile() function for the -egen- command; see
                      Code:
                      h egen
                      and look at the pctile(exp) function in that help file

                      Comment

                      Working...
                      X