  • Referring to multiple variables within if conditions

    Hi all

    I have two problems that are tedious to solve by typing everything out manually, so I am looking for a smarter solution. I am posting them together because they both fit under the same heading.

    In the first problem, my dataset is in long format and I have a variable 'of' that needs to take the value one if the variable 'y' takes a value other than zero in the same row as 'of' and in the 49 rows below. This could be done by typing out:

    Code:
    gen of = 0
    replace of = 1 if y!=. & y[_n+1]!=. & y[_n+2]!=.
    ..continuing the code until y[_n+49].

    In the second problem, I need to drop observations if every variable in a long set of variables, let's call them a, b, c, takes on the value 0. Again, this could be done by writing

    Code:
    drop if a==0 & b==0 & c==0
    ..continuing the code until I have listed every variable. I wonder if there is some way to avoid writing the "==0 &" part for every variable, e.g., by defining a list "a b c" and then dropping the observations if all variables in the list equal zero?


  • #2
    I believe the following will do what you want. For both requests, the right answer may depend on how you want to treat missing values (e.g., for the second request, what should happen if every one of a, b, c, ... is 0 except for one that is missing?). I presumed there are no missing values because you didn't say otherwise.

    Regarding the second question, the answer illustrates a good Stata principle: If you can't think of how to do what seems like something fairly ordinary, look at -help egen- to see if there's anything relevant.

    Code:
    clear
    // Example data for first request
    set obs 20
    gen int id = _n
    gen byte y = cond( runiform() < 0.8, 0, 1)
    //
    // First question.
    // Use a window of 3 to have something that can be checked by eye. Use 49 once tested.
    local window = 3  
    gen int of = 0
    // Count how many nonzero values occur in the window.
    forval i = 0/`window' {
       replace of = of + (y[_n+`i'] != 0)
    }
    replace of = 1 if (of > 0)
    //
    // Example data for second request
    foreach v of newlist a b c d e {
       gen byte `v' = (runiform() > 0.7)
    }
    egen sum = rowtotal(a-e)
    drop if sum == 0
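
    The caveat about missing values in the first request can be made concrete. This is only a sketch on made-up data, and the names n_nonzero, any_nonzero and all_nonzero are hypothetical. It relies on two facts: a subscript such as y[_n+1] returns missing past the last observation, and missing != 0 evaluates to true. It also shows both readings of the request, "any value in the window is nonzero" and "every value in the window is nonzero", since the thread does not settle which one is wanted.

    ```stata
    clear
    input y
    1
    1
    1
    0
    1
    end

    // Small window so the result can be checked by eye (rows _n to _n+2)
    local window = 2
    gen byte n_nonzero = 0

    // Count nonzero values in the window, excluding the missing values
    // that subscripts return when they run past the end of the data
    forval i = 0/`window' {
        replace n_nonzero = n_nonzero + (y[_n+`i'] != 0 & !missing(y[_n+`i']))
    }

    gen byte any_nonzero = (n_nonzero > 0)
    gen byte all_nonzero = (n_nonzero == `window' + 1)
    list, noobs
    ```

    With these data only the first observation has all_nonzero equal to 1. Without the !missing() guard, the last observation would also reach a count of 3, because its two out-of-range neighbours would be counted as nonzero.
    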

    • #3
      Here are different approaches to the same questions:
      Code:
      clear
      // Example data for first request
      set obs 20
      gen int id = _n
      gen byte y = runiformint(0, 1)
      //
      
      local window 3
      gen byte is_non_zero = (y != 0)
      rangestat (min) of = is_non_zero, interval(id 0 `=`window'-1')
      
      // Example data for second request
      foreach v of newlist a b c d e {
         gen byte `v' = (runiform() > 0.7)
      }
      
      ds a-e
      local vbles `r(varlist)'
      egen zero_count = anycount(`vbles'), values(0)
      drop if zero_count == `:word count `vbles''
      Notes:
      1. I have shamelessly stolen Mike Lacy's code for generating toy data, with a slight modification for the first example.
      2. -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC. In a large data set, this will be noticeably faster than the approach in #2.
      3. For the second problem, use whatever is the easiest way to create a local macro vbles that contains all and only the variables of interest. In this case, since a-e are consecutive in the data set, the a-e varlist range was the simplest choice. If yours are not so conveniently situated, you may have to specify that list of variables in the -ds- command some other way (or skip -ds- and just list them out once in the -local vbles ...- command). Once that's done, -egen- takes care of the rest. I offer this solution because the one in #2 will only work correctly if the variables are all non-negative integers. The code here will work correctly for any numeric variables.
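
      For readers who want something closer to the "define a list, then test every variable" idea in #1, the condition itself can also be built in a loop. This is a minimal sketch, not the method used above: the macro name allzero is hypothetical, and the toy data repeat the generator from the earlier posts.

      ```stata
      clear
      set obs 20
      foreach v of newlist a b c d e {
          gen byte `v' = (runiform() > 0.7)
      }

      // List the variables once, then append one "== 0" test per variable
      local vbles a b c d e
      local allzero 1
      foreach v of local vbles {
          local allzero `allzero' & `v' == 0
      }

      // Inspect the assembled condition before using it
      display "`allzero'"
      drop if `allzero'
      ```

      The assembled condition reads 1 & a == 0 & b == 0 & c == 0 & d == 0 & e == 0. Because a missing value is not equal to 0, an observation with any missing value in the list is kept under this approach.
      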

      • #4
        #2 Mike Lacy

        It may not bite for the OP's data but my inner pedant wants to point out that a row sum being zero does not imply that each term is zero, as 1 and -1 exemplify. Naturally this is not an issue for indicator variables known to be 0, 1 or even missing.
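
        A small check on made-up data illustrates the point. This is a sketch only; it reuses the variable names sum and zero_count from #2 and #3, and the four rows are invented for the illustration.

        ```stata
        clear
        input a b c
        0  0 0
        1 -1 0
        1  0 1
        0  . 0
        end

        // Approach from #2: row total, which treats missing as 0
        egen sum = rowtotal(a b c)
        // Approach from #3: number of variables exactly equal to 0
        egen zero_count = anycount(a b c), values(0)
        list, noobs

        // Row 1: sum 0, zero_count 3 (all zero under either test)
        // Row 2: sum 0, zero_count 1 (1 and -1 cancel in the total)
        // Row 3: sum 2, zero_count 1
        // Row 4: sum 0, zero_count 2 (the missing value adds 0 to the total)
        ```

        Here -drop if sum == 0- would remove rows 1, 2 and 4, whereas -drop if zero_count == 3- would remove only row 1.
        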

        • #5
          Not a pedantic point on Nick's part at all. For no good reason, I had blithely assumed all the values were non-negative.

          • #6
            Just piling on here in response to #4 and #5: this is exactly what I had in mind when I wrote "I offer this solution because the one in #2 will only work correctly if the variables are all non-negative integers. The code here will work correctly for any numeric variables."
