  • Apply if condition on each row separately

    I need to transform some variables to another structure. I am trying to do this with if/else statements that go through each of the old variables and assign the values to the new variables. I attached some mock data. The old variables all start with f and have suffixes 1 to 11; the new variables start with a and have suffixes 1 to 4.

    What do I want to achieve? I have 11 variables filled with some data (integers) and a lot of empty cells. There are no more than 4 entries for any observation, and I want to "condense" the data into another format such that for each row the 4 new variables are filled from left to right.

    What should my algorithm do, in words?
    First, the entries of the first old variable, f865_1, should be assigned. For each observation, check whether the first new variable is already filled; if not, fill it with the value of the old variable. If it is already filled, check the second new variable; if that one is still free, place the value there, and so on. This is repeated for each of the old variables.


    Code:
    foreach v of varlist f865_1 f865_2 f865_3 f865_4 ///
        f865_5 f865_6 f865_7 f865_8 f865_9 f865_10 f865_11 {
        if a865_1 == . & `v' != . {
            replace a865_1 = `v'
        }
        else if a865_2 == . & `v' != . {
            replace a865_2 = `v'
        }
        else if a865_3 == . & `v' != . {
            replace a865_3 = `v'
        }
        else if a865_4 == . & `v' != . {
            replace a865_4 = `v'
        }
    }
    However, the code I produced does not check the conditions for each observation but only once for the whole column, and therefore produces wrong results. In other programs I would probably nest another loop inside this one that goes over the observations, but I read that you should (almost) never do this in Stata. Maybe there is even a completely different and much simpler solution to my problem?

    In case my goal is not yet clear, here is an example of what I want to achieve:
    old1  old2  old3  old4  old5  old6  old7  new1  new2  new3
      12     .     .     3     7     .     .    12     3     7
       6     5     .     .     .     .     .     6     5     .
       .     .     .     .     .     .    12    12     .     .
       8     .    10     .     7     .     .     8    10     7



  • #2
    The -if- statement doesn't apply separately for each observation. It either performs the action on all the observations or on none. You are looking for the -if qualifier-. See https://www.stata.com/support/faqs/p...-if-qualifier/
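
    A minimal sketch of how the if qualifier could be used for this, run on the small old1-old7 example from the first post rather than on the real f865_* variables. The counter nfilled is a hypothetical helper that is not part of the original question: it records, row by row, how many new variables are already filled, so each replace only touches the rows whose next free slot is the k-th one.

    ```stata
    clear
    input byte(old1 old2 old3 old4 old5 old6 old7)
    12 .  . 3 7 .  .
     6 5  . . . .  .
     . .  . . . . 12
     8 . 10 . 7 .  .
    end

    * create the empty target variables
    forvalues k = 1/3 {
        generate new`k' = .
    }

    * nfilled (hypothetical helper): number of slots already used in each row
    generate byte nfilled = 0

    foreach v of varlist old1-old7 {
        * a nonmissing old value claims the next free slot in its own row
        replace nfilled = nfilled + 1 if !missing(`v')
        forvalues k = 1/3 {
            * the if qualifier is evaluated separately for every observation
            replace new`k' = `v' if nfilled == `k' & !missing(`v')
        }
    }

    drop nfilled
    list new*
    ```

    For the real data, the same pattern would presumably run over f865_1-f865_11 with four target variables. A row with more nonmissing values than there are target variables would silently lose the extras, so that assumption is worth checking first.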



    • #3
      Not quite. Used the way you have used it, the if command checks the condition only in the first observation, but we can agree that what it does is not what you want. For more, see https://www.stata.com/support/faqs/p...-if-qualifier/ (To my mind, that FAQ is backward: the supposed question is usually the answer.)
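
      A small illustration of the difference, on a made-up two-observation dataset (the variable x and its values are purely illustrative):

      ```stata
      clear
      input x
      .
      5
      end

      * the if command reads x as x[1], which is missing here,
      * so the block runs even though x is 5 in observation 2
      if x == . {
          display "condition true in observation 1"
      }

      * the if qualifier is checked observation by observation
      count if x == .
      ```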

      I haven't looked in your .dta attachment, but here is a direct assault on your problem.


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(old1 old2 old3 old4 old5 old6 old7)
      12 .  . 3 7 .  .
       6 5  . . . .  .
       . .  . . . . 12
       8 . 10 . 7 .  .
       . .  . . . .  .
      end
      
      egen new = concat(old*), p(" ")
      replace new = subinstr(new, ".", "", .)
      split new, destring
      
      l new*
      
      
           +---------------------------------+
           |        new   new1   new2   new3 |
           |---------------------------------|
        1. | 12   3 7       12      3      7 |
        2. |   6 5           6      5      . |
        3. |         12     12      .      . |
        4. | 8  10  7        8     10      7 |
        5. |                 .      .      . |
           +---------------------------------+
      That code hinges on all genuine values being integers, as you say they are.
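
      To see why, here is the same code run on a made-up row that contains a decimal value. The data are invented for illustration only. Removing every "." strips the decimal point in 1.5 together with the missing-value markers, so the value comes back as 15.

      ```stata
      clear
      input float(old1 old2 old3)
      1.5 . 2
      end

      egen new = concat(old*), p(" ")
      * this removes the decimal point as well as the missing markers
      replace new = subinstr(new, ".", "", .)
      split new, destring

      list new*
      ```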

      For future visitors: Should that not be true, this is what I would do.


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(old1 old2 old3 old4 old5 old6 old7)
      12 .  . 3 7 .  .
       6 5  . . . .  .
       . .  . . . . 12
       8 . 10 . 7 .  .
       . .  . . . .  .
      end
      
      gen id = _n
      reshape long old, i(id)
      drop if missing(old)
      bysort id (_j) : replace _j = _n
      rename old new
      reshape wide new, i(id) j(_j)
      
      l new*
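
      One side effect of the code above: an observation whose old values are all missing (the fifth one here) is dropped entirely, because none of its rows survive drop if missing(old). If such observations need to be kept, a possible variation is to sort the missing values to the end within each id instead of dropping them. This is only a sketch, and ismiss is a hypothetical helper variable.

      ```stata
      clear
      input byte(old1 old2 old3 old4 old5 old6 old7)
      12 .  . 3 7 .  .
       6 5  . . . .  .
       . .  . . . . 12
       8 . 10 . 7 .  .
       . .  . . . .  .
      end

      gen id = _n
      reshape long old, i(id)

      * ismiss (hypothetical helper): sort nonmissing values first, keeping their order
      gen byte ismiss = missing(old)
      bysort id (ismiss _j) : replace _j = _n
      drop ismiss

      rename old new
      reshape wide new, i(id) j(_j)

      * remove new variables that ended up empty in every observation
      foreach v of varlist new* {
          quietly count if !missing(`v')
          if r(N) == 0 drop `v'
      }

      list id new*
      ```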




      • #4
        Thank you! Yes, indeed, I overlooked that the if command works differently from the if qualifier. The FAQ you referenced cleared up my confusion. Also, both of your suggestions work! I particularly like the idea of temporarily transforming to long format; I'm going to keep this in mind for the future.
