  • Generating a variable which is the sum of the distinct values of a variable per group

    Hello everyone,
    I would greatly appreciate help with a command that generates a variable holding the sum of the distinct values of another variable within each group. As an example, consider the auto dataset:
    Code:
    sysuse auto, clear
    I need to create a variable, say "x", that is the sum of the distinct values of "rep78" within each category of "foreign". So x will take only two values: the sum of the distinct values of rep78 when foreign = 1, and the corresponding sum when foreign = 0.

    Thank you

  • #2
    Code:
    egen tag = tag(foreign rep78)
    egen wanted = total(tag), by(foreign)


    • #3
      Thanks a lot Øyvind Snilsberg


      • #4
        For more discussion, see https://www.stata-journal.com/articl...article=dm0042 especially p.563


        • #5
          Thanks for your replies, but I think I was not clear enough. The suggested code gives the number of distinct values of rep78, not the sum of the values themselves. I need x to be the sum of the distinct values of rep78 itself, calculated within each category of foreign. Calculating by hand, x should be 1 + 2 + 3 + 4 + 5 = 15 when foreign = 0 and 3 + 4 + 5 = 12 when foreign = 1.
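One way to check the hand calculation in #5 is to list the distinct values of rep78 within each category of foreign. The snippet below is only a sketch: the local macro names dom and imp are made up for illustration, and it relies on levelsof skipping missing values by default. If the arithmetic in #5 is right, the two lists should be 1 2 3 4 5 and 3 4 5.

```stata
sysuse auto, clear

* distinct nonmissing values of rep78 among domestic cars (foreign == 0)
levelsof rep78 if foreign == 0, local(dom)

* distinct nonmissing values of rep78 among foreign cars (foreign == 1)
levelsof rep78 if foreign == 1, local(imp)

* show both lists side by side
display "foreign = 0: `dom'"
display "foreign = 1: `imp'"
```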


          • #6
            I see. The same kind of machinery can be used.

            Code:
            sysuse auto, clear
            
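            * tag exactly one observation for each distinct (foreign, rep78) pair
            egen tag = tag(foreign rep78)
            
            * rep78 contributes only where tag == 1, so each distinct value is added once
            egen wanted = total(tag * rep78), by(foreign)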
            
            tabdisp foreign, c(wanted)
            
            ----------------------
            Car       |
            origin    |     wanted
            ----------+-----------
             Domestic |         15
              Foreign |         12
            ----------------------
            EDIT

            This is equivalent

            Code:
            sysuse auto, clear 
            
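            * keep rep78 on the first observation of each (foreign, rep78) pair; later duplicates get 0
            bysort foreign rep78 : gen wanted = rep78 * (_n == 1)
            
            * running sum within foreign; sum() treats missing values as zero
            by foreign : replace wanted = sum(wanted)
            
            * the last observation holds the group total; copy it to every observation
            by foreign : replace wanted = wanted[_N]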
            
            tabdisp foreign, c(wanted)
            Last edited by Nick Cox; 07 Dec 2022, 15:13.
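A quick way to confirm that the two approaches in #6 agree is to build both results in the same dataset and compare them. This is a minimal sketch, not part of the original post: the variable names wanted1 and wanted2 are chosen here for illustration only. assert exits with an error if the condition fails for any observation.

```stata
sysuse auto, clear

* approach 1: egen, tagging one observation per distinct (foreign, rep78) pair
egen tag = tag(foreign rep78)
egen wanted1 = total(tag * rep78), by(foreign)

* approach 2: by-group running sum, then spread the group total
bysort foreign rep78 : gen wanted2 = rep78 * (_n == 1)
by foreign : replace wanted2 = sum(wanted2)
by foreign : replace wanted2 = wanted2[_N]

* the two results should match in every observation
assert wanted1 == wanted2

tabdisp foreign, c(wanted1 wanted2)
```

Observations with missing rep78 do not disturb either result: tag() excludes them by default, total() ignores missing values, and sum() treats missing as zero.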


            • #7
              Thank you so much Nick Cox for your help


              • #8
                See also my EDIT to #6.
