  • Drop variables if total sum is the largest

    Dear all,
    I’m working with a dataset of 396 observations and 148 variables. I’m trying to drop a variable if its total is the largest of all the variables’ totals. Each variable is a firm, and each observation is a month’s profits. I want to drop the firm with the largest sum of profits and, for a second analysis, drop the two firms with the largest sums of profits. Any suggestions on how I should approach this?

    Thank you,
    Veronica

  • #2
    The code below does something similar to what you want on a dummy dataset. You might be able to adjust it to work on your data. I'm sure this is not the most elegant solution. If you want help with your code, please see the FAQ, especially point 12.

    Code:
    clear
    input float(var1 var2 var3)
    1 2 1
    1 2 1
    1 2 1
    end
    
    xpose, clear
    egen total = rowtotal(v1-v3)
    sort total
    gen n = _n
    gen N = _N
    drop if n == N
    drop total n N
    xpose, clear
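
    Two details matter when adapting the transpose approach to real data. After xpose there is one v variable per original observation, so v1-v3 fits only the three-row dummy dataset, and a plain xpose discards the original variable names. A minimal sketch of a variant, assuming all variables are numeric and none is already named total; it relies on xpose's varname option, which stores the names in _varname so that the second transposition can restore them:

    ```stata
    * sketch: transpose, drop the row (= original variable) with the largest total
    xpose, clear varname          // original names are kept in _varname
    egen total = rowtotal(v*)     // v* picks up v1, v2, ... but not _varname
    gsort -total                  // largest total first
    drop in 1
    drop total
    xpose, clear                  // _varname is used to name the variables again
    ```

    xpose stores the transposed values in the default storage type, so very large profit figures could lose precision along the way.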

    • #3
      In essence you need to loop over the variables, keeping track of which variable seen so far has the largest sum. Here is one technique. To drop the second biggest too, drop the biggest first, and then repeat.

      Code:
      * sandbox
      clear
      set obs 10
      set seed 2803
      forval j = 1/5 {
          gen y`j' = runiformint(1, 20)
      }
      
      * you start here, except that variable names are likely to be different
      local max = -1e9
      
      foreach v of var y* {
          sum `v', meanonly
          if r(sum) > `max' {
              local max = r(sum)
              local which "`v'"
          }
      }
      
      di "`which'"
      
      * check
      
      tabstat y*, s(sum) c(s)
      
          variable |       sum
      -------------+----------
                y1 |        68
                y2 |       101
                y3 |        77
                y4 |       112
                y5 |       103
      ------------------------
      
      drop `which'
      
      tabstat y*, s(sum) c(s)
      
          variable |       sum
      -------------+----------
                y1 |        68
                y2 |       101
                y3 |        77
                y5 |       103
      ------------------------
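
      The "drop the biggest first, and then repeat" step can be wrapped in an outer loop. This is only a sketch, meant to be run on the full set of variables in place of the single pass above: the local k is a hypothetical name for how many variables to drop, the y* wildcard assumes the sandbox names, and the -1e9 starting value follows the code above (so it assumes no sum is below that).

      ```stata
      * sketch: drop the k variables with the largest sums, one per pass
      local k = 2

      forval i = 1/`k' {
          local max = -1e9
          local which

          * y* is re-expanded on each pass, so dropped variables are skipped
          foreach v of var y* {
              sum `v', meanonly
              if r(sum) > `max' {
                  local max = r(sum)
                  local which "`v'"
              }
          }

          di "dropping `which' (sum = `max')"
          drop `which'
      }
      ```

      With k set to 1 this reduces to the single pass shown above.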

      All that said, reshape long is two words of advice.
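
      As a rough illustration of the long-layout route, here is a sketch that starts from the wide sandbox data (y1-y5). The names month, firm, firmtotal, tag and firmrank are made up for the example and assumed not to exist already. reshape long needs a common stub, so real firm variables may have to be renamed first.

      ```stata
      * sketch: reshape to long, total by firm, rank, drop
      gen long month = _n
      reshape long y, i(month) j(firm)

      * total profit per firm, repeated on each of that firm's rows
      bysort firm : egen double firmtotal = total(y)

      * rank firms on their totals: tag one row per firm, rank the tagged
      * rows (1 = largest total), then copy the rank to the firm's other rows
      egen tag = tag(firm)
      egen firmrank = rank(-firmtotal) if tag, unique
      bysort firm (firmrank) : replace firmrank = firmrank[1]

      * drop the top firm; firmrank <= 2 would drop the top two
      drop if firmrank == 1

      * optional: return to the wide layout
      drop firmtotal firmrank tag
      reshape wide y, i(month) j(firm)
      ```

      The unique option breaks ties arbitrarily, so firms with exactly equal totals would need a closer look.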

      • #4
        Thank you so much, Nick. That was exactly what I was looking for. Thanks also for the advice. I will try reshaping, adding, ranking, and dropping.
        Best regards,

        Veronica
