  • Replacing multiple values, with the same condition

    Hi All,

    Although the -replace- command is fairly intuitive, it is sometimes a bit tricky.

    Specifically, say I have a dataset as the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(y x1 x2)
     5   .  .
    12 123 32
    32 113  1
    end
    In the above, we have data on y and two covariates, x1 and x2. In the first row, both x variables are missing. Suppose I know that when both are missing, x1 and x2 should be 5. Intuitively, I would do:

    Code:
    replace x1=5 & x2=5 if missing(x1)
    This is an illegal command (it shouldn't be, in my view). The same result can be achieved by brute force:

    Code:
    replace x1=5 if missing(x1)
    replace x2=5 if x1==5
    The problem with the second method, of course, is that it would modify the data incorrectly if x1==5 elsewhere, where it was not missing. That being said, there are two issues:
    1. Why can replace not be used for multiple replacements at once as above?
    2. If two replacements share the same condition (missing(x1) in the case above) and cannot be made simultaneously, how can the second be made when, after the first, the only condition left to identify those rows (x1==5) is not unique to them?
    Many thanks,
    CS

  • #2
    Use an auxiliary variable to store the condition:

    gen q=mi(x1) & mi(x2)
    replace x1=5 if q
    replace x2=5 if q
    drop q
    As to why -replace- can only replace one variable, it seems more "orthogonal" to keep -replace- simple for the most common case, and to let the user create a variable when needed.
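
    A minimal sketch of the same idea with a loop, in case more than two variables share the condition. It re-enters the example data from #1 so that it runs on its own; the temporary variable and the -foreach- loop are illustrative additions, not part of the suggestion above.

    ```stata
    clear
    input float(y x1 x2)
     5   .  .
    12 123 32
    32 113  1
    end
    * Flag the rows once, before anything is overwritten
    tempvar q
    generate byte `q' = missing(x1) & missing(x2)
    * Apply the same replacement to every variable in the list
    foreach v of varlist x1 x2 {
        replace `v' = 5 if `q'
    }
    ```

    Run as a do-file, the temporary variable is dropped automatically when the file finishes, so no -drop- is needed.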
    Last edited by Jean-Claude Arbaut; 11 Aug 2020, 09:44.



    • #3
      Q1 is for StataCorp except that there can be a user view too. In my opinion the & in your syntax is totally unStataish. & is a logical operator used in testing the truth of a condition. I don't recall it ever being used for stipulating compound actions. Worse, it is allowed already in replace as you would find with


      Code:
      replace x1=5 & x2==5 if missing(x1)
      Now that doesn't mean anything similar. The point is that, on your proposal, a parser would need to distinguish two quite different meanings.
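
      To see what that legal line actually does, here is a small sketch on the example data from #1. The right-hand side is read as the logical expression 5 & (x2==5), which can only be 0 or 1, so x1 is never set to 5.

      ```stata
      clear
      input float(y x1 x2)
       5   .  .
      12 123 32
      32 113  1
      end
      * Parsed as x1 = (5 & (x2 == 5)): a true/false result, stored as 1 or 0
      replace x1 = 5 & x2==5 if missing(x1)
      * In row 1, x2 is missing, so x2==5 is false and x1 becomes 0
      list
      ```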

      Still, that's cosmetic in a sense and if the question is why not allow


      Code:
      replace x1=5 x2=5 if missing(x1)


      my answer is to protect users from getting confused, but StataCorp's answer might be very different. At this point I am reminded of recode and of why I dislike it, although no one should care, and it has many fans too.

      The broad fact is that brevity and clarity can fight each other. I spent some time happily messing with a language J in which

      Code:
      mean =. +/ % #
      is an entire program and now I can remember what it means, but anything much longer often took about 2 hours to write in concise form and its meaning was usually forgotten in 2 weeks, if not 2 days. Some of these languages were written by geniuses for extraordinarily smart users, which is why they haven't conquered the world. (Even now certain languages are happily being complicated by some of their smartest users, and thereby fuelling their own eventual extinction as just too darned complicated for mortals in a hurry.)

      The problem you pose in Q2 is interesting and I can't suggest more than

      Code:
      gen bad = missing(x1)
      replace x1 = 5 if bad
      replace x2 = 5 if bad 
      although you could always write a program that accepted something like your syntax


      Code:
      csreplace x1=5 x2=5 if missing(x1) 
      if you wanted this enough. Here "you" doesn't mean that I am volunteering.
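
      For what it is worth, a rough and untested sketch of what such a program might look like. Everything here is an assumption for illustration: the name csreplace comes from the line above, each pair must be typed as var=value with no spaces around the equals sign, and there is no error checking.

      ```stata
      capture program drop csreplace
      program define csreplace
          version 14
          * "anything" holds the var=value pairs; if and in are parsed as usual
          syntax anything(equalok) [if] [in]
          * Evaluate the condition once, before any variable is changed
          marksample touse, novarlist
          foreach pair of local anything {
              * Split "x1=5" into "x1" and "=5", then drop the leading "="
              gettoken var value : pair, parse("=")
              local value = substr("`value'", 2, .)
              replace `var' = `value' if `touse'
          }
      end
      ```

      Because the condition is stored in a temporary marker before the first -replace- runs, later replacements do not depend on values that earlier ones have overwritten.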
      Last edited by Nick Cox; 11 Aug 2020, 09:57.



      • #4
        -recode- will work here:
        Code:
        recode x1 x2 (missing = 5) if missing(x1)



        • #5
          I misread. Of course, if you want to replace x1 and x2 when x1 alone is missing, it's simpler (unless you use recode) to write

          replace x2=5 if mi(x1)
          replace x1=5 if mi(x1)
          The problem arises when x1 is overwritten and you still need its values. But then an auxiliary variable is the way to go.



          • #6
            Thank you all for the responses.
            Nick Cox Thank you especially for the thoughtful answers- always something to learn in your responses.


            Best,
            CS



            • #7
              -replace- is a powerful command because it "does things dynamically in the current sort order," and this has applications such as what I call "cascading". A primitive example of "cascading" is that I can replicate anything that the running -sum- does with -replace-.

              Code:
              . set obs 10
              number of observations (_N) was 0, now 10
              
              . gen x = rnormal()
              
              . gen sumx = sum(x)
              
              . replace x = x[_n-1]+x in 2/l
              (9 real changes made)
              
              . list , sep(0)
              
                   +-----------------------+
                   |         x        sumx |
                   |-----------------------|
                1. |  1.095368    1.095368 |
                2. |  2.428017    2.428017 |
                3. |  2.067329    2.067329 |
                4. |  2.802434    2.802434 |
                5. |  1.840951    1.840951 |
                6. |  2.593081    2.593081 |
                7. |  2.126428    2.126428 |
                8. | -.3256483   -.3256482 |
                9. | -2.647654   -2.647654 |
               10. | -3.259224   -3.259224 |
                   +-----------------------+
              This great power of -replace- is also a great danger, hence I also call it "the backstabbing -replace-": it is the Stata command that catches me off guard most often, in applications where I do not want to cascade but -replace- cascades. Which brings me to the case the original poster raises: my view is that here -replace- on the original variable is fundamentally inappropriate, because no cascading is desired.
              Code:
              . gen x1new = x1
              (1 missing value generated)
              
              . replace x1new = 5 if missing(x1) & missing(x2)
              (1 real change made)
              
              . gen x2new = x2
              (1 missing value generated)
              
              . replace x2new = 5 if missing(x1) & missing(x2)
              (1 real change made)



