Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling missing values when summing

    Dears,

    I have this dataset, I want to fill the missing when sum of earnings for group 1 and group 2. In other words, for each year fill the missing with value calculated for the year. That is because I need the two values for each observations for each year. The code I used as follows

    egen AggEarnGroup1 = sum(earnings), by(fyear), if Group1==1
    egen AggEarnGroup3 = sum(earnings), by(fyear), if Group3==1


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long gvkey double fyear float(earnings Group1 Group3 AggEarnGroup1 AggEarnGroup3)
    1001 1980        . 1 .   159.018          .
    1001 1981        . 1 .    99.012          .
    1001 1982        . 1 . -237.5578          .
    1001 1983    1.135 1 . -169.8562          .
    1001 1984    1.138 1 . -119.1924          .
    1001 1985   2.2226 1 . -438.8768          .
    1004 1980    3.606 . .         .          .
    1004 1981    1.225 . .         .          .
    1004 1982    2.795 . .         .          .
    1004 1983    5.021 . .         .          .
    1004 1984    9.071 . .         .          .
    1004 1985   11.506 . .         .          .
    1004 1986   15.361 . .         .          .
    1004 1987   24.473 . .         .          .
    1004 1988   24.814 . .         .          .
    1004 1989   25.655 . .         .          .
    1004 1990   16.781 . .         .          .
    1004 1991     13.5 . .         .          .
    1004 1992    6.883 . .         .          .
    1004 1993    9.484 . .         .          .
    1004 1994   10.463 . .         .          .
    1004 1995   16.012 . .         .          .
    1004 1996   23.025 . .         .          .
    1004 1997   35.657 . .         .          .
    1004 1998   41.671 . .         .          .
    1004 1999   35.163 . .         .          .
    1004 2000   21.771 . .         .          .
    1004 2001   -7.339 . .         .          .
    1004 2002   -9.194 . .         .          .
    1004 2003   4.7238 . .         .          .
    1004 2004  17.2748 . .         .          .
    1004 2005  37.4988 . .         .          .
    1005 1980     .926 . .         .          .
    1005 1981    1.495 . .         .          .
    1008 1983    -.885 1 . -169.8562          .
    1008 1984    -.543 1 . -119.1924          .
    1008 1985    -.794 1 . -438.8768          .
    1011 1982     .265 . 1         . -257.15283
    1011 1983     .033 . 1         .   760.9262
    1011 1984     .065 . 1         .   668.9444
    1011 1985     .044 . 1         .   316.2338
    1011 1986    -.961 . 1         .     58.193
    1011 1987    -.126 . 1         .  1026.6685
    1011 1988    .0952 . 1         .  1170.6932
    1011 1989    -.788 . 1         .   193.2934
    1011 1990   -1.869 . 1         . -208.47455
    1011 1991  -1.5476 . 1         .  -221.4028
    1011 1992  -2.1236 . 1         . -179.92877
    1011 1993   -3.158 . 1         .    2048.11
    1011 1994   -8.868 . 1         .   3945.824
    1013 1980    3.354 . 1         .   319.3318
    1013 1981    3.542 . 1         .   245.3378
    1013 1982    4.464 . 1         . -257.15283
    1013 1983        . . 1         .   760.9262
    1013 1984    4.143 . 1         .   668.9444
    1013 1985    7.871 . 1         .   316.2338
    1013 1986   11.968 . 1         .     58.193
    1013 1987   15.263 . 1         .  1026.6685
    1013 1988   17.673 . 1         .  1170.6932
    1013 1989   16.145 . 1         .   193.2934
    1013 1990   22.903 . 1         . -208.47455
    1013 1991   22.025 . 1         .  -221.4028
    1013 1992   23.306 . 1         . -179.92877
    1013 1993   31.636 . 1         .    2048.11
    1013 1994   40.521 . 1         .   3945.824
    1013 1995  57.5344 . 1         .   5674.798
    1013 1996   87.463 . 1         .    6889.65
    1013 1997  122.457 . 1         .   6881.083
    1013 1998  146.727 . 1         .  -2.191605
    1013 1999 177.0212 . 1         .  -3606.507
    1013 2000   313.04 . 1         . -11443.117
    1013 2001   -300.4 . 1         .  -46164.83
    1013 2002  -808.04 . 1         . -8761.9375
    1013 2003    -50.6 . 1         .  13818.103
    1013 2004    36.58 . 1         .  22568.734
    1013 2005    91.62 . 1         .   25367.99
    1016 1980   1.2696 . .         .          .
    1016 1981    1.012 . .         .          .
    1016 1982    -.338 . .         .          .
    1016 1983     .815 . .         .          .
    1016 1984    2.211 . .         .          .
    1016 1985     .405 . .         .          .
    1016 1986     .987 . .         .          .
    1016 1987   -.4344 . .         .          .
    1017 1980     .478 . .         .          .
    1017 1981   -4.722 . .         .          .
    1017 1982    2.363 . .         .          .
    1017 1983    4.395 . .         .          .
    1017 1984    7.354 . .         .          .
    1017 1985   5.8088 . .         .          .
    1017 1986    3.321 . .         .          .
    1017 1987        . . .         .          .
    1017 1988  -2.9074 . .         .          .
    1017 1989   -8.987 . .         .          .
    1017 1990   1.6704 . .         .          .
    1017 1991   1.8352 . .         .          .
    1017 1992    -.736 . .         .          .
    1017 1993   1.3884 . .         .          .
    1017 1994        . . .         .          .
    1022 1980    1.357 . .         .          .
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 109124 observations
    Use the count() option to list more


  • #2
    Before I turn to your question, let me point out that your data is poorly organized. You are tracking separate variables for the two groups, when it is much more effective, in Stata, to have a single variable that takes on the distinct values 1 (for Group 1) and 3 for (Group 3). Moreover, both of your group variables are coded 1/., which is an invitation to mistakes in Stata: better to code variables as 1/0, and use . only when the group assignment is actually unknown or undefined. Once you reorganize the data, your aggregate earnings calculations become simpler:
    Code:
    //    RE-ORGANIZE DATA
    gen group = 1 if Group1 == 1
    replace group = 3 if Group3 == 1
    drop *Group*
    
    //    CALCULATE YEARLY AGGREGATE EARNINGS FOR EACH GROUP
    by group fyear, sort: egen aggregate_earnings = total(earnings)
    Now, let me turn to your question about filling in missing values. Fill them in with what? You say "for each year fill the missing with value calculated for the year," but I have no idea what this means. What value calculated for the year are you referring to?

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      Before I turn to your question, let me point out that your data is poorly organized. You are tracking separate variables for the two groups, when it is much more effective, in Stata, to have a single variable that takes on the distinct values 1 (for Group 1) and 3 for (Group 3). Moreover, both of your group variables are coded 1/., which is an invitation to mistakes in Stata: better to code variables as 1/0, and use . only when the group assignment is actually unknown or undefined. Once you reorganize the data, your aggregate earnings calculations become simpler:
      Code:
      // RE-ORGANIZE DATA
      gen group = 1 if Group1 == 1
      replace group = 3 if Group3 == 1
      drop *Group*
      
      // CALCULATE YEARLY AGGREGATE EARNINGS FOR EACH GROUP
      by group fyear, sort: egen aggregate_earnings = total(earnings)
      Now, let me turn to your question about filling in missing values. Fill them in with what? You say "for each year fill the missing with value calculated for the year," but I have no idea what this means. What value calculated for the year are you referring to?
      Thanks so much.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        Before I turn to your question, let me point out that your data is poorly organized. You are tracking separate variables for the two groups, when it is much more effective, in Stata, to have a single variable that takes on the distinct values 1 (for Group 1) and 3 for (Group 3). Moreover, both of your group variables are coded 1/., which is an invitation to mistakes in Stata: better to code variables as 1/0, and use . only when the group assignment is actually unknown or undefined. Once you reorganize the data, your aggregate earnings calculations become simpler:
        Code:
        // RE-ORGANIZE DATA
        gen group = 1 if Group1 == 1
        replace group = 3 if Group3 == 1
        drop *Group*
        
        // CALCULATE YEARLY AGGREGATE EARNINGS FOR EACH GROUP
        by group fyear, sort: egen aggregate_earnings = total(earnings)
        Now, let me turn to your question about filling in missing values. Fill them in with what? You say "for each year fill the missing with value calculated for the year," but I have no idea what this means. What value calculated for the year are you referring to?
        Sorry Clyde, may you check this? I try to calculate it

        https://www.statalist.org/forums/for...arterly-prices

        Comment


        • #5
          I saw that post earlier and passed it by because I do not understand it. I still don't after reading it and seeing the response you got from George Ford.

          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            I saw that post earlier and passed it by because I do not understand it. I still don't after reading it and seeing the response you got from George Ford.
            I response to George about what I need and try to get. In general, the results that I need can be obtained by keeping the price of 1 day and 2 day after the announcement date. I do not know how can I do it as the date I have is huge. You may look at my response, it may be clear now.

            Comment

            Working...
            X