Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cumulative sum by order and id

    Hi all,

    I am working on this dataset, in which I would like to test whether the cumulative sum of the variable 'votes_margin' for order_m = 2 and 3 is greater than order_m of 1.

    1. Therefore, for each id indicated by the variable 'ac', I would like to create a cumulative sum of the variable 'votes_margin' but only for the values of order_m of 2 and 3.
    2. Then, create a dummy indicating whether this cumulative sum of order_m values 2 and 3 is greater than or lesser than order_m of 1, for each id.

    Any suggestions would be helpful.
    I have provided the example dataset by using dataex, below for your kind verification.

    Thanks.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 state_m str30 constituency float year_m str55 candidate_m double votes_margin long ac float order_m str24 const_m
    "Andhra Pradesh" "  ACHAMPET (SC)"  2004 " DR.VAMSHI KRISHNA"           .538547933101654  2 1 "Achampet" 
    "Andhra Pradesh" "  ACHAMPET (SC)"  2004 " P.RAMULU"                  .36918625235557556  2 2 "Achampet" 
    "Andhra Pradesh" "  ACHAMPET (SC)"  2004 " YAMGONDI VENKATAIAH"      .046903301030397415  2 3 "Achampet" 
    "Andhra Pradesh" "  ACHANTA (SC)"   2004 " PEETHALA SUJATHA"           .4994221329689026  3 1 "Achanta"  
    "Andhra Pradesh" "  ACHANTA (SC)"   2004 " ANAND PRAKASH CHELLEM"     .43905702233314514  3 2 "Achanta"  
    "Andhra Pradesh" "  ACHANTA (SC)"   2004 " JOSHIP MERIPE"            .043371714651584625  3 3 "Achanta"  
    "Andhra Pradesh" "  ALAIR (SC)"     2004 " DR. KUDUDULA NAGESH"        .5575404167175293 12 1 "Alair"    
    "Andhra Pradesh" "  ALAIR (SC)"     2004 " MOTHUKUPALLY NARSIMHULU"    .3478609621524811 12 2 "Alair"    
    "Andhra Pradesh" "  ALAIR (SC)"     2004 " DR. ETIKALA PURUSHOTHAM"   .02296549640595913 12 3 "Alair"    
    "Andhra Pradesh" "  ALLAVARAM (SC)" 2004 " GOLLAPALLI SURYARAO"        .5028068423271179 14 1 "Allavaram"
    "Andhra Pradesh" "  ALLAVARAM (SC)" 2004 " PANDU SWARUPA RANI"         .4317871034145355 14 2 "Allavaram"
    "Andhra Pradesh" "  ALLAVARAM (SC)" 2004 " ETHAKOTA THUKKESWARA RAO"  .04949498176574707 14 3 "Allavaram"
    end
    label values ac ac
    label def ac 2 "  ACHAMPET (SC)", modify
    label def ac 3 "  ACHANTA (SC)", modify
    label def ac 12 "  ALAIR (SC)", modify
    label def ac 14 "  ALLAVARAM (SC)", modify

  • #2
    Hi Vignesh, I didn't see an id, so I created one. // Actually, variable ac was his id

    Code:
    egen id = group(const_m )
    bysort id (order_m): gen cum_votes = sum( votes_margin)  
    bysort id (order_m): gen less_1 = ( votes_margin < votes_margin[1]) // 1 if current candidate received fewer votes than candidate listed as #1
    bysort id (order_m): replace less_1 = 1 if _n==1
    bysort id (order_m): gen less_prior = ( votes_margin < votes_margin[_n-1])  // 1 if current votes less than votes of prior observation
    egen rank = rank( cum_votes), by(id)  // ranks candidates by voter_margin.  Could then do gen less_1 = if order_m == rank
    // NOTE: error in above line.  It should've been "egen rank = rank( votes_margin ), by(id) field" 
    // See post #4
    
    . list id order_m rank candidate_m const_m votes_margin cum_votes less_1 less_prior , sepby(id) noobs
    
      +--------------------------------------------------------------------------------------------------------+
      | id   order_m   rank                 candidate_m     const_m   votes_m~n   cum_vo~s   less_1   less_p~r |
      |--------------------------------------------------------------------------------------------------------|
      |  1         1      1           DR.VAMSHI KRISHNA    Achampet   .53854793   .5385479        1          1 |
      |  1         2      2                    P.RAMULU    Achampet   .36918625   .9077342        1          1 |
      |  1         3      3         YAMGONDI VENKATAIAH    Achampet    .0469033   .9546375        1          1 |
      |--------------------------------------------------------------------------------------------------------|
      |  2         1      1            PEETHALA SUJATHA     Achanta   .49942213   .4994221        1          1 |
      |  2         2      2       ANAND PRAKASH CHELLEM     Achanta   .43905702   .9384792        1          1 |
      |  2         3      3               JOSHIP MERIPE     Achanta   .04337171   .9818509        1          1 |
      |--------------------------------------------------------------------------------------------------------|
      |  3         1      1         DR. KUDUDULA NAGESH       Alair   .55754042   .5575404        1          1 |
      |  3         2      2     MOTHUKUPALLY NARSIMHULU       Alair   .34786096   .9054013        1          1 |
      |  3         3      3     DR. ETIKALA PURUSHOTHAM       Alair    .0229655   .9283669        1          1 |
      |--------------------------------------------------------------------------------------------------------|
      |  4         1      1         GOLLAPALLI SURYARAO   Allavaram   .50280684   .5028068        1          1 |
      |  4         2      2          PANDU SWARUPA RANI   Allavaram    .4317871   .9345939        1          1 |
      |  4         3      3    ETHAKOTA THUKKESWARA RAO   Allavaram   .04949498   .9840889        1          1 |
      +--------------------------------------------------------------------------------------------------------+
    Last edited by David Benson; 22 Feb 2019, 17:56. Reason: I made a couple of mistakes in this code. Edited to point them out. See post #4 below for more info.

    Comment


    • #3
      Hi David Benson,

      Thanks for your reply. Variable called 'ac' is the id, which I was mentioning about in the previous post.

      I just tried your command and it creates a cumulative sum of votes_margin for all the values of order_m and then ranks it again, which will produce the same values as of order_m, an already existing variable.

      What I would like to do instead is, 1. Create cumulative sum of votes_margin, only for the values 2 and 3 of order_m; 2. Create a dummy which takes on the value '1' only if the cumulative sum of the variable votes_margin of the values 2 and 3 of order_m is greater than the votes_margin of order_m of 1.

      Kind regards

      Comment


      • #4
        Hi Vignesh,

        My bad (for a few reasons)!

        1) First, I missed that you mentioned ac as id. (Just replace them in your code).

        2) Second, I thought you were trying to ensure that each candidate got fewer votes than the previous one (or the one listed as #1).

        3) Finally, I made rank as an alternative to "gen less_prior = (votes_margin < votes_margin[_n-1])" (based on my misunderstanding #2) -- but I made a mistake in it! It should have been "egen rank = rank( votes_margin ), by(id) field" (not ranked on cumulative votes, which is silly). But of course, it was not needed anyway.

        Try this instead :-)

        Code:
        bysort id (order_m): gen sum2_3 = votes_margin[2] + votes_margin[3]
        bysort id (order_m): gen greater_1 = ( sum2_3 > votes_margin[1])
        
        . list id order_m candidate_m const_m votes_margin sum2_3 greater_1 , sepby(id) abbrev(12) noobs
        
          +-------------------------------------------------------------------------------------------+
          | id   order_m                 candidate_m     const_m   votes_margin    sum2_3   greater_1 |
          |-------------------------------------------------------------------------------------------|
          |  1         1           DR.VAMSHI KRISHNA    Achampet        0.53855   0.41609           0 |
          |  1         2                    P.RAMULU    Achampet        0.36919   0.41609           0 |
          |  1         3         YAMGONDI VENKATAIAH    Achampet        0.04690   0.41609           0 |
          |-------------------------------------------------------------------------------------------|
          |  2         1            PEETHALA SUJATHA     Achanta        0.49942   0.48243           0 |
          |  2         2       ANAND PRAKASH CHELLEM     Achanta        0.43906   0.48243           0 |
          |  2         3               JOSHIP MERIPE     Achanta        0.04337   0.48243           0 |
          |-------------------------------------------------------------------------------------------|
          |  3         1         DR. KUDUDULA NAGESH       Alair        0.55754   0.37083           0 |
          |  3         2     MOTHUKUPALLY NARSIMHULU       Alair        0.34786   0.37083           0 |
          |  3         3     DR. ETIKALA PURUSHOTHAM       Alair        0.02297   0.37083           0 |
          |-------------------------------------------------------------------------------------------|
          |  4         1         GOLLAPALLI SURYARAO   Allavaram        0.50281   0.48128           0 |
          |  4         2          PANDU SWARUPA RANI   Allavaram        0.43179   0.48128           0 |
          |  4         3    ETHAKOTA THUKKESWARA RAO   Allavaram        0.04949   0.48128           0 |
          +-------------------------------------------------------------------------------------------+

        Edited to add: I found a more general way to do the sum(votes_margin) if order_m==1 or 2. See Nick Cox's code here and Stata Journal article here


        Code:
        egen sum2 = total(inrange(order_m, 2, 3) * votes_margin), by(id)
        
        . list id order_m candidate_m const_m votes_margin sum2_3 sum2 greater_1 , sepby(id) abbrev(12) noobs
        
          +------------------------------------------------------------------------------------------------------+
          | id   order_m                 candidate_m     const_m   votes_margin    sum2_3       sum2   greater_1 |
          |------------------------------------------------------------------------------------------------------|
          |  1         1           DR.VAMSHI KRISHNA    Achampet        0.53855   0.41609   .4160896           0 |
          |  1         2                    P.RAMULU    Achampet        0.36919   0.41609   .4160896           0 |
          |  1         3         YAMGONDI VENKATAIAH    Achampet        0.04690   0.41609   .4160896           0 |
          |------------------------------------------------------------------------------------------------------|
          |  2         1            PEETHALA SUJATHA     Achanta        0.49942   0.48243   .4824287           0 |
          |  2         2       ANAND PRAKASH CHELLEM     Achanta        0.43906   0.48243   .4824287           0 |
          |  2         3               JOSHIP MERIPE     Achanta        0.04337   0.48243   .4824287           0 |
          |------------------------------------------------------------------------------------------------------|
          |  3         1         DR. KUDUDULA NAGESH       Alair        0.55754   0.37083   .3708265           0 |
          |  3         2     MOTHUKUPALLY NARSIMHULU       Alair        0.34786   0.37083   .3708265           0 |
          |  3         3     DR. ETIKALA PURUSHOTHAM       Alair        0.02297   0.37083   .3708265           0 |
          |------------------------------------------------------------------------------------------------------|
          |  4         1         GOLLAPALLI SURYARAO   Allavaram        0.50281   0.48128   .4812821           0 |
          |  4         2          PANDU SWARUPA RANI   Allavaram        0.43179   0.48128   .4812821           0 |
          |  4         3    ETHAKOTA THUKKESWARA RAO   Allavaram        0.04949   0.48128   .4812821           0 |
          +------------------------------------------------------------------------------------------------------+
        Last edited by David Benson; 22 Feb 2019, 18:08.

        Comment


        • #5
          Hi Benson,

          Thanks for your help. It works perfectly.
          Now I can carry on with my work

          Regards

          Comment

          Working...
          X