Replacing variables for a subset of dataset

Brooke Am

Join Date: Jan 2024

Posts: 2
#1

31 Jan 2024, 08:12

Hello! I am trying to apply a series of commands to only certain observations in my dataset, conditional on country name. For each country in my dataset, I have about 60 lines of code specific to that country. Is there a way to have a series of commands run only on a subset of the observations, conditional on a value? Rather than typing (if country=="US") on each line of the code, I'd like to apply the condition to several lines at once, almost like a loop. I tried to explore the if/else commands, but those didn't seem to work (I didn't have a command following 'else', as I didn't want any edits to be made in that section of code if country!="US"). I have included some sample code below.

replace test =10 if v00 == 1 if country=="US"
replace test =30 if v00 == 2 if country=="US"
replace test =75 if v00 == 3 if country=="US"
....

replace test =43 if v00 == 1 if country=="Canada"
replace test =25 if v00 == 2 if country=="Canada"
replace test =66 if v00 == 3 if country=="Canada"

Many thanks,
Brooke
Nick Cox

Join Date: Mar 2014

Posts: 35219
#2

31 Jan 2024, 09:14

The immediate problem with your code -- which is illegal -- is that the if qualifier can appear only once in commands like yours, which is tacit but not quite explicit in the help.

So the second if should be & (or so I presume).

Code:

replace test =10 if v00 == 1 & country=="US"

Otherwise the scope for looping here is, I fear, less than you would hope.

A construct like

Code:

if country == "US" {
    replace test =10 if v00 == 1
    replace test =30 if v00 == 2
    replace test =75 if v00 == 3
}

happens to be legal code, but it means something quite different, and would in fact be worse than useless here. What it means, as far as Stata is concerned, is

Code:

if country[1] == "US"

and even if, exceptionally, that is true and the commands in the block are executed, they won't be restricted to observations for the US.
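A minimal sketch of the difference between the if command and the if qualifier, using a made-up four-observation dataset (the data are invented purely for illustration; only the variable names follow the question):

```stata
clear
input str6 country v00
"Canada" 1
"US" 1
"US" 2
"Canada" 3
end

generate test = .

* if command: evaluated once, as if written country[1] == "US".
* The first observation here is Canada, so the whole block is skipped.
if country == "US" {
    replace test = 10 if v00 == 1
}
count if !missing(test)

* if qualifier: evaluated observation by observation.
replace test = 10 if v00 == 1 & country == "US"
count if !missing(test)
```

The first count should report no changed observations and the second just one, the US observation with v00 equal to 1.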

On the face of it there is no pattern to your replacement values, so the prospects for a loop otherwise look dim. The only exception I can imagine is that information like

"USA" 10 30 75
"Canada" 43 25 66

is in another dataset, in which case some custom code might be possible.

The absence of an else condition isn't what's biting here.
Andrew Musau

Join Date: Oct 2014

Posts: 9957
#3

31 Jan 2024, 09:20

The challenge here in writing a loop doesn't lie in whether the code performs the replacement for a subset of observations. The -if- qualifier already ensures this. The issue lies in the lack of a consistent pattern in the values 10, 30, and 75 corresponding to 1, 2, and 3, respectively.

replace test =10 if v00 == 1 if country=="US"
replace test =30 if v00 == 2 if country=="US"
replace test =75 if v00 == 3 if country=="US"

Where do you get these values from? If they are in a separate dataset, perhaps merge is a more efficient approach to linking the observations. See

Code:

help merge
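A rough sketch of what the merge route could look like, assuming the replacement values have been typed once into a small lookup table keyed on country and v00. The lookup table, the stand-in main data, and the variable name newvalue are all hypothetical:

```stata
* Hypothetical lookup table: one row per country and v00 value.
clear
input str6 country v00 newvalue
"US"     1 10
"US"     2 30
"US"     3 75
"Canada" 1 43
"Canada" 2 25
"Canada" 3 66
end
tempfile lookup
save `lookup'

* Stand-in for the main dataset.
clear
input str6 country v00
"US" 2
"Canada" 3
"US" 1
"Canada" 1
end

* Many observations in the main data match one row of the lookup table.
merge m:1 country v00 using `lookup', keep(master match) nogenerate
rename newvalue test
list country v00 test
```

Run this from a do-file or in one go, since the temporary file goes away when the do-file ends. The values still have to be typed once, but they then sit in one table that can be checked against the survey manual.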
Mike Lacy

Join Date: Apr 2014

Posts: 2396
#4

31 Jan 2024, 09:51

In addition to the good advice offered so far, perhaps -recode- might be useful in creating a more compact code:

Code:

* copy v00 into test (assumes test does not already exist), then recode the copy
clonevar test = v00
recode test (1 = 10) (2 = 30) (3 = 75) if country == "US"
recode test (1 = 43) (2 = 25) (3 = 66) if country == "Canada"

I can see ways in which these recodes might yield to a looping approach, but as indicated by the preceding advice, the prospects for further automating the task here would depend on the form in which those country-specific values are available.
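One possible shape for such a loop, offered only as a sketch: it assumes the values for each country are held in a local macro in the order v00 = 1, 2, 3, and it uses -replace- rather than -recode- inside the loop. The macro names vals_US and vals_Canada and the tiny dataset are invented for illustration:

```stata
clear
input str6 country v00
"US" 1
"US" 3
"Canada" 2
end
clonevar test = v00

* Hypothetical per-country lists: new values for v00 == 1, 2, 3, in that order.
local vals_US     10 30 75
local vals_Canada 43 25 66

foreach c in US Canada {
    forvalues k = 1/3 {
        * pick the k-th value from this country's list
        local new : word `k' of `vals_`c''
        replace test = `new' if v00 == `k' & country == "`c'"
    }
}
list country v00 test
```

This would not work as written for country names containing spaces, since they are used as part of a macro name, and the values still have to be typed in once per country.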
Nick Cox

Join Date: Mar 2014

Posts: 35219
#5

31 Jan 2024, 10:19

Interesting. I dislike recode because I find it fiddly to type and to check. As some of the commands I like could be described similarly, people differ and what else is new?
Mike Lacy

Join Date: Apr 2014

Posts: 2396
#6

31 Jan 2024, 12:51

I was acquainted with -recode- from its prominence and popularity in SPSS going back to the 1970s, whose syntax for -recode- Stata's closely resembles. I like -recode- in Stata as a compact but still transparent alternative to a series of -replace- commands. I don't use it a lot with variables involving decimals because I always have to look up how it handles endpoints on constructions such as 1.0/5.0. (The use of -egen ..., cut()- gives me similar problems.) As for checking it, I recommend that people obtain a frequency distribution on the variable before and after the recode and compare. For the user who is hazy on Boolean operators and logic, something we often see on Statalist, I would argue that errors with replacing multiple values (e.g. "replace x = 0 if x == 1 | x == 2 | x == 3") are less likely with -recode-, something of little interest to experienced programmers. However, this is mostly a matter of taste here; I just feel like -recode- deserves more notice than it gets. My apologies for a minor hijacking of the thread.
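A small illustration of the multiple-value case and the before-and-after check, on invented data (the variable x_old is a name made up here to hold the original values):

```stata
clear
input x
1
2
3
4
end

* keep a copy of the original values for checking
clonevar x_old = x

* one rule maps 1, 2 and 3 to 0; other values are left alone
recode x (1 2 3 = 0)
* a -replace- equivalent would be: replace x = 0 if inlist(x, 1, 2, 3)

* cross-tabulate old against new values to check the mapping
tabulate x_old x, missing
```

The cross-tabulation shows each original value against its recoded value in a single table, which is a slight variation on comparing two separate frequency distributions.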
Brooke Am

Join Date: Jan 2024

Posts: 2
#7

31 Jan 2024, 13:54

Dear Nick, Andrew, and Mike,

Thank you all for your helpful comments! These data unfortunately aren't in a dataset to merge, and I have to manually type them in from surveys, per the survey manual's instructions. I was considering putting them in a separate dataset to merge, but I'm not sure if there are any advantages to that, as it will likely take some time and could introduce some errors (possibly more) as well.

I like the recode option and didn't yet consider that, thank you! It seemed to work on my data, especially since I need to manually type and check the data anyways.

Thanks also for noticing my error in the sample code - that indeed was a mistake as I was editing the code here, and not the root of the issue.

Thank you,
Brooke