  • Replacing variables for a subset of dataset

    Hello! I am trying to apply a series of commands to only certain observations in my dataset, conditional on country name. For each country in my dataset, I have about 60 lines of code specific to that country. Is there a way to have a series of commands run only on a subset of the observations, conditional on a value? Rather than typing (if country=="US") on each line of the code, I would like to apply the condition to several lines at once, almost like a loop. I tried to explore the if/else commands, but those didn't seem to work (I didn't have a command following 'else', as I didn't want any edits to be made in that section of code if country!="US"). Some sample code is below.

    replace test =10 if v00 == 1 if country=="US"
    replace test =30 if v00 == 2 if country=="US"
    replace test =75 if v00 == 3 if country=="US"
    ....

    replace test =43 if v00 == 1 if country=="Canada"
    replace test =25 if v00 == 2 if country=="Canada"
    replace test =66 if v00 == 3 if country=="Canada"

    Many thanks,
    Brooke

  • #2
    The immediate problem with your code -- which is illegal -- is that the if qualifier can appear only once in commands like yours, which is tacit but not quite explicit in the help.

    So the second if should be & (or so I presume).

    Code:
    replace test =10 if v00 == 1 & country=="US"
    Otherwise the scope for looping here is, I fear, less than you would hope.

    A construct like

    Code:
    if country == "US" {
    replace test =10 if v00 == 1
    replace test =30 if v00 == 2
    replace test =75 if v00 == 3
    }
    happens to be legal code, but it means something quite different, and would in fact be worse than useless here. What it means, as far as Stata is concerned, is

    Code:
    if country[1] == "US"
    and even if, exceptionally, that is true and the commands inside the braces are executed, they won't be restricted to observations for the US.
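
    To make the difference concrete, here is a minimal sketch on a made-up four-observation dataset. The data and the extra variable test2 are hypothetical and only there for illustration. The if command is evaluated once, using the first observation, whereas the if qualifier is evaluated for every observation.

    ```stata
    clear
    input str6 country byte v00
    "Canada" 1
    "US"     1
    "US"     2
    "Canada" 2
    end
    generate test  = .
    generate test2 = .   // hypothetical second variable, for comparison only

    * if command: evaluated once, as if country[1] == "US".
    * Here country[1] is "Canada", so the whole block is skipped.
    if country == "US" {
        replace test = 10 if v00 == 1
        replace test = 30 if v00 == 2
    }

    * if qualifier: evaluated observation by observation.
    replace test2 = 10 if v00 == 1 & country == "US"
    replace test2 = 30 if v00 == 2 & country == "US"

    list
    ```

    Had a US observation happened to come first, the block would have run and would then have changed the Canadian observations too.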

    On the face of it there is no pattern to your replacement values, so the prospects for a loop otherwise look dim. The only exception I can imagine is that information like

    "US" 10 30 75
    "Canada" 43 25 66

    is in another dataset, in which case some custom code might be possible.

    The absence of an else condition isn't what's biting here.


    • #3
      The challenge here in writing a loop doesn't lie in whether the code performs the replacement for a subset of observations. The -if- qualifier already ensures this. The issue lies in the lack of a consistent pattern in the values 10, 30, and 75 corresponding to 1, 2, and 3, respectively.

      replace test =10 if v00 == 1 & country=="US"
      replace test =30 if v00 == 2 & country=="US"
      replace test =75 if v00 == 3 & country=="US"
      Where do you get these values from? If they are in a separate dataset, perhaps merge is a more efficient approach to linking the observations. See

      Code:
      help merge
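
      If the values could be put into a lookup dataset, a merge might look roughly like the sketch below. The lookup layout (one row per country and v00 value), the variable name newval, and the toy main dataset are all assumptions for illustration; only the six values come from the original post.

      ```stata
      * Lookup table: one row per country and v00 value (hypothetical layout)
      clear
      input str6 country byte v00 float newval
      "US"     1 10
      "US"     2 30
      "US"     3 75
      "Canada" 1 43
      "Canada" 2 25
      "Canada" 3 66
      end
      tempfile lookup
      save `lookup'

      * Toy main dataset standing in for the survey data
      clear
      input str6 country byte v00
      "US"     1
      "US"     3
      "Canada" 2
      "Canada" 1
      end

      * Each observation picks up the value for its country and v00
      merge m:1 country v00 using `lookup', keep(master match) nogenerate
      rename newval test
      list
      ```

      The keep(master match) option drops lookup rows that match nothing in the main data, so the number of observations in the main dataset does not change.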


      • #4
        In addition to the good advice offered so far, perhaps -recode- might be useful in creating a more compact code:

        Code:
        * copy v00 into a new variable test (assumes test does not already exist), then recode the copy
        clonevar test = v00
        recode test (1 = 10) (2 = 30) (3 = 75) if country == "US"
        recode test (1 = 43) (2 = 25) (3 = 66) if country == "Canada"

        I can see ways in which these recodes might yield to a looping approach, but as indicated by the preceding advice, the prospects for further automating the task here would depend on the form in which those country-specific values are available.
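
        One possible shape for such a loop, purely as a sketch: it assumes the values are typed into one local macro per country, in the order of v00 == 1, 2, 3, and that each country name is a single word that can form part of a macro name. The macro names vals_US and vals_Canada and the toy data are hypothetical. It uses replace rather than recode because the conditions refer to the untouched v00, so no replacement can cascade into another.

        ```stata
        clear
        input str6 country byte v00
        "US"     1
        "US"     2
        "US"     3
        "Canada" 1
        "Canada" 2
        "Canada" 3
        end
        clonevar test = v00

        * Hypothetical locals: new values per country, ordered as v00 == 1, 2, 3
        local vals_US     10 30 75
        local vals_Canada 43 25 66

        foreach c in US Canada {
            forvalues i = 1/3 {
                * pick the i-th value from this country's list
                local new : word `i' of `vals_`c''
                replace test = `new' if v00 == `i' & country == "`c'"
            }
        }
        list, sepby(country)
        ```

        The values still have to be typed once per country, so the saving is in the repeated replace lines and conditions, not in the data entry.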


        • #5
          Interesting. I dislike recode because I find it fiddly to type and to check. As some of the commands I like could be described similarly, people differ and what else is new?


          • #6
            I was acquainted with -recode- from its prominence and popularity in SPSS going back to the 1970s, whose syntax for -recode- Stata's resembles closely. I like -recode- in Stata as a compact but still transparent alternative to a series of -replace- commands. I don't use it a lot with variables involving decimals because I always have to look up how it handles endpoints on constructions such as 1.0/5.0. (The use of -egen ..., cut()- gives me similar problems.) As for checking it, I recommend that people obtain a frequency distribution on the variable before and after the recode and compare. For the user who is hazy on Boolean operators and logic, something we often see on Statalist, I would argue that errors with replacing multiple values (e.g., "replace x = 0 if x == 1 | x == 2 | x == 3") are less likely with -recode-, something of little interest to experienced programmers. However, this is mostly a matter of taste here; I just feel like -recode- deserves more notice than it gets. My apologies for a minor hijacking of the thread.
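
            As an illustration of the before-and-after check, here is a small sketch on made-up data. The dataset is hypothetical, and test is assumed not to exist beforehand. A cross-tabulation of the old variable against the recoded copy, country by country, shows each mapping in a single cell.

            ```stata
            clear
            input str6 country byte v00
            "US"     1
            "US"     2
            "US"     3
            "Canada" 1
            "Canada" 3
            end

            * recode a copy so the original stays available for checking
            clonevar test = v00
            recode test (1 = 10) (2 = 30) (3 = 75) if country == "US"
            recode test (1 = 43) (2 = 25) (3 = 66) if country == "Canada"

            * old values in rows, new values in columns, one table per country
            bysort country: tabulate v00 test, missing
            ```

            Each row of each table should have exactly one nonzero cell; anything else points to a typing error in the recode rules.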


            • #7
              Dear Nick, Andrew, and Mike,

              Thank you all for your helpful comments! These data unfortunately aren't in a dataset to merge, and I have to manually type them in from surveys, per the survey manual's instructions. I was considering putting them in a separate dataset to merge, but I'm not sure if there are any advantages to that, as it will likely take some time and could introduce some errors (possibly more), as well.

              I like the recode option and didn't yet consider that, thank you! It seemed to work on my data, especially since I need to manually type and check the data anyways.

              Thanks also for noticing my error in the sample code - that indeed was a mistake as I was editing the code here, and not the root of the issue.

              Thank you,
              Brooke
