  • difference value over subsamples

    Dear Statalist,

    I have two variables, y and x, where x is a categorical variable. I want the following value:

    the average of y if x==1 minus the average of y if x==0.

    I'd appreciate any insights. Thanks,

    P.S. I can do this manually with the following code:
    sum y if x==0
    sum y if x==1
    and then subtract one mean from the other with a calculator.
    But since I need these differences for many variables and subgroups, I would like a more efficient way to get them.
    Last edited by Sa Fe; 20 Dec 2021, 11:58.
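The manual steps described above can be scripted. What follows is only a minimal sketch, not code from the thread: the toy values and the second outcome `z` are made up for illustration, and `y` and `x` follow the names in the question. It relies on `summarize` leaving the mean behind in `r(mean)`.

```stata
* Toy data (made up): y and z are outcomes, x is the 0/1 grouping variable
clear
input float(y z x)
1 10 0
2 10 0
3 10 0
5  4 1
7  6 1
end

* For each outcome, store the mean for x==1, then subtract the mean for x==0
foreach v of varlist y z {
    summarize `v' if x == 1, meanonly
    scalar m1 = r(mean)
    summarize `v' if x == 0, meanonly
    display "`v': mean if x==1 minus mean if x==0 = " scalar(m1) - r(mean)
}
```

To cover more outcomes, extend the variable list after `foreach v of varlist`; an outer loop over subgroup values would handle further subsamples.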

  • #2
    Saber:
    you may want to consider:
    Code:
    . set obs 10
    Number of observations (_N) was 0, now 10.
    
    . g y=runiform()
    
    . g x=0 in 1/5
    (5 missing values generated)
    
    . replace x=1 if x==.
    (5 real changes made)
    
    . list
    
         +--------------+
         |        y   x |
         |--------------|
      1. | .3488717   0 |
      2. | .2668857   0 |
      3. | .1366463   0 |
      4. | .0285569   0 |
      5. | .8689333   0 |
         |--------------|
      6. | .3508549   1 |
      7. | .0711051   1 |
      8. |  .323368   1 |
      9. | .5551032   1 |
     10. |  .875991   1 |
         +--------------+
    
    . g wanted=y if x==1
    (5 missing values generated)
    
    . replace wanted=y*(-1) if x==0
    (5 real changes made)
    
    . list
    
         +--------------------------+
         |        y   x      wanted |
         |--------------------------|
      1. | .3488717   0   -.3488717 |
      2. | .2668857   0   -.2668857 |
      3. | .1366463   0   -.1366463 |
      4. | .0285569   0   -.0285569 |
      5. | .8689333   0   -.8689333 |
         |--------------------------|
      6. | .3508549   1    .3508549 |
      7. | .0711051   1    .0711051 |
      8. |  .323368   1     .323368 |
      9. | .5551032   1    .5551032 |
     10. |  .875991   1     .875991 |
         +--------------------------+
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Consider some variation on

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str1 group float(binary whatever)
      "a" 0 10
      "a" 1  1
      "b" 0  9
      "b" 1  2
      "c" 0  8
      "c" 1  3
      "d" 0  7
      "d" 1  4
      "e" 0  6
      "e" 1  5
      end
      
      bysort group : egen wanted = total((binary == 1) * whatever - (binary == 0) * whatever)
      
      list, sepby(group)
      
           +------------------------------------+
           | group   binary   whatever   wanted |
           |------------------------------------|
        1. |     a        0         10       -9 |
        2. |     a        1          1       -9 |
           |------------------------------------|
        3. |     b        0          9       -7 |
        4. |     b        1          2       -7 |
           |------------------------------------|
        5. |     c        0          8       -5 |
        6. |     c        1          3       -5 |
           |------------------------------------|
        7. |     d        0          7       -3 |
        8. |     d        1          4       -3 |
           |------------------------------------|
        9. |     e        0          6       -1 |
       10. |     e        1          5       -1 |
           +------------------------------------+
      The example just has one observation that is 1 and one observation that is 0 for each group, but the code isn't based on that being true.


      • #4
        Originally posted by Carlo Lazzaro
        Saber:
        . g wanted=y if x==1

        . replace wanted=y*(-1) if x==0

        Thanks Carlo,
        Actually, I wanted the difference in the average of y between subsamples of x. I edited my original post accordingly.
        Thanks,


        • #5
          Originally posted by Nick Cox
          Consider some variation on

          Code:
          bysort group : egen wanted = total((binary == 1) * whatever - (binary == 0) * whatever)

          Thanks Nick,
          But what I want is a single number: the average of y over x==1 minus the average of y over x==0.

          Thanks,
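If a single number is all that is needed, one possible route (a sketch under assumptions, not something proposed in the thread) is to regress `y` on the 0/1 indicator: with a binary regressor, the OLS slope equals the mean for `x==1` minus the mean for `x==0`. The toy data below are made up, and `x` is assumed to take only the values 0 and 1.

```stata
* Toy data (made up); x is assumed to be coded 0/1
clear
input float(y x)
1 0
2 0
3 0
5 1
7 1
end

* The coefficient on 1.x is mean(y | x==1) - mean(y | x==0)
regress y i.x
display "difference in means = " _b[1.x]

* ttest stores both group means; its displayed diff is group 0 minus group 1,
* so the sign is flipped here to match the question
ttest y, by(x)
display "difference in means = " r(mu_2) - r(mu_1)
```

Either command also reports a standard error and confidence interval for the difference, which the calculator approach does not.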


          • #6
            OK; start with

            Code:
            bysort group : egen mean1 = mean(cond(binary == 1, whatever, .))  
            
            bysort group : egen mean0 = mean(cond(binary == 0, whatever, .))
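The step that presumably follows is to subtract one group mean from the other. A minimal sketch, reusing part of the `dataex` example from #3; the variable name `diff` is an assumption chosen for illustration.

```stata
* First two groups of the example data from #3
clear
input str1 group float(binary whatever)
"a" 0 10
"a" 1  1
"b" 0  9
"b" 1  2
end

* Group-wise means of whatever for binary == 1 and binary == 0
bysort group : egen mean1 = mean(cond(binary == 1, whatever, .))
bysort group : egen mean0 = mean(cond(binary == 0, whatever, .))

* Difference in means, repeated on every observation of the group
generate diff = mean1 - mean0

list, sepby(group)
```

Dropping the `bysort group :` prefix would give the difference over the whole dataset rather than within each group.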


            • #7
              Originally posted by Nick Cox
              OK; start with

              Code:
              bysort group : egen mean1 = mean(cond(binary == 1, whatever, .))
              
              bysort group : egen mean0 = mean(cond(binary == 0, whatever, .))
              I get an error:
              invalid syntax
              r(198);


              • #8
                Saber:
                Nick's code works for me:
                Code:
                . bysort group : egen mean1 = mean(cond(binary == 1, whatever , .))
                
                . bysort group : egen mean0 = mean(cond(binary == 0, whatever, .))
                
                . list
                
                     +-------------------------------------------+
                     | group   binary   whatever   mean1   mean0 |
                     |-------------------------------------------|
                  1. |     a        0         10       1      10 |
                  2. |     a        1          1       1      10 |
                  3. |     b        0          9       2       9 |
                  4. |     b        1          2       2       9 |
                  5. |     c        0          8       3       8 |
                     |-------------------------------------------|
                  6. |     c        1          3       3       8 |
                  7. |     d        0          7       4       7 |
                  8. |     d        1          4       4       7 |
                  9. |     e        0          6       5       6 |
                 10. |     e        1          5       5       6 |
                     +-------------------------------------------+
                
                .
                Kind regards,
                Carlo
                (StataNow 18.5)


                • #9
                  Yeah, it worked now! Thanks,
