Macros and using _all and *

rcarvalho

Join Date: Apr 2014

Posts: 6
#1

Macros and using _all and *

14 Apr 2014, 09:40

Hello all,

I am working on recoding the variables of various datasets based on their own properties. The basic rule is that if there is more than 5% missing information, then that missing information should be coded to missing (=.) instead of being allowed to remain in its original form (-99 -97).

I've built the following program which works well until the last macro. It appears that Stata is recalling the first variable as * instead of each individual variable like it did in the first macro.

* Intended to refer to all of the original variables in the dataset
local all_vars *
foreach var of local all_vars {
    cap noi recode `var' (-99=1) (-97=1) (else=0), pre(missing_)
}
*
foreach var of varlist missing_* {
    egen frac_`var'= mean(`var')
}
*
* This is where the code fails: the loop reads the first variable as "*" instead of the list of variables used earlier
foreach var of local all_vars {
    recode `var' (-99 -97=.) if frac_missing_`var'>.05
}

Any thoughts on how to avoid this error?
Joe Canner

Join Date: Mar 2014

Posts: 580
#2

14 Apr 2014, 09:47

The problem is that the first statement (local all_vars *) is not actually creating a list of variable names. Take a look at the unab command:

Code:

unab all_vars: * ...
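As a minimal sketch of the difference, assuming the auto dataset that ships with Stata (the macro name raw is only for illustration): a plain local stores the literal text, while unab expands the wildcard into the variable names that exist at that moment.

```stata
sysuse auto, clear

* A plain local stores the text "*" unchanged
local raw *
display "`raw'"

* unab expands the wildcard into the current variable names
unab all_vars : *
display "`all_vars'"

* The expanded list has one word per variable in memory
display `: word count `all_vars''
display c(k)
```

Because the names are expanded when unab runs, variables created afterwards are not part of all_vars.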
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

14 Apr 2014, 10:12

Joe has explained the main point of your bug, but there is another.

Putting some text into a macro only to take out the same text immediately afterwards is possibly entertaining, but it's usually pointless. Try to imagine e.g. getting some chocolate, putting it into a box and then taking out the chocolate immediately after. Why did you feel obliged to do that?

Code:

foreach v of var * {

is good style. * is a varlist and foreach understands varlists, so you can go straight there.
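A complete loop of this form might look like the following sketch. The toy variables x and y are made up for illustration; the loop simply counts the -99 and -97 codes in each variable.

```stata
clear
input x y
1 -99
-97 2
3 4
end

* Loop over the varlist directly, with no intermediate macro
foreach v of varlist * {
    quietly count if inlist(`v', -99, -97)
    display "`v': " r(N) " special codes"
}
```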
Comment
Joe Canner

Join Date: Mar 2014

Posts: 580
#4

14 Apr 2014, 10:19

Nick,

I was about to suggest the same thing, but I think the issue is that the user is creating additional variables in the second foreach loop, but wants to use the original set of variables for the third foreach loop.

Regards,
Joe
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

14 Apr 2014, 10:31

Indeed. So, for the first problem the use of a local is pointless, and with the third problem in mind, you need unab as well. As I said, the main point was made by you, that unab is needed to get the syntax right. My secondary point was about style.
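Putting the two points together, one possible version of the original three loops is sketched below. It assumes all variables are numeric, and the toy variables a and b are invented so that a has 10% special codes and b has 2.5%. The key step is that unab freezes the list of original names before any new variables are created.

```stata
clear
set obs 40
generate a = cond(_n <= 4, -99, _n)
generate b = cond(_n == 1, -97, _n)

* Freeze the original variable names before creating new variables
unab all_vars : *

* Indicator for the special codes, one new variable per original variable
foreach var of local all_vars {
    recode `var' (-99 -97 = 1) (else = 0), prefix(missing_)
}

* Fraction of special codes for each original variable
foreach var of local all_vars {
    egen frac_missing_`var' = mean(missing_`var')
}

* Recode to missing only where the fraction exceeds 5%
foreach var of local all_vars {
    recode `var' (-99 -97 = .) if frac_missing_`var' > .05
}
```

With these toy data, a is recoded and b keeps its -97, which is exactly the behaviour the original post describes.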
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

14 Apr 2014, 16:35

rcarvalho wrote: "I am working on recoding the variables of various datasets based on their own properties. The basic rule is that if there is more than 5% missing information, then that missing information should be coded to missing (=.) instead of being allowed to remain in its original form (-99 -97)."

I don't see the utility of this exercise. Moreover, retaining numeric codes for missing values is short-sighted: to exclude them from an analysis you will be forced to apply an if clause to many statements. Instead, use Stata's extended missing values (help missing), which you can label.

Code:

sysuse auto, clear
recode rep78 (1 = .a) (2 = .b)
label define mvlabel .a "Did not Know" .b "Refused"
label values rep78 mvlabel
tab rep78, missing

If you want to indicate that a variable has too many missing values, attach a characteristic to that variable (help char).
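A minimal sketch of the characteristic idea, assuming the auto dataset and the 5% threshold from the original post. The characteristic name too_many_missing is made up for illustration.

```stata
sysuse auto, clear

* Share of observations with a missing value in rep78
quietly count if missing(rep78)
local share = r(N) / _N

* Flag the variable if the share exceeds 5%
if `share' > .05 {
    char rep78[too_many_missing] "yes"
}

* Read the characteristic back
display "`rep78[too_many_missing]'"
char list rep78[]
```

The flag travels with the dataset when it is saved, and the data values themselves are left untouched.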

Last edited by Steve Samuels; 14 Apr 2014, 17:27.

Steve Samuels
Statistical Consulting

Stata 14.2
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

14 Apr 2014, 17:21

Steve Samuels rightly draws attention to the key question of what the code is doing, as well as the question raised in the original post.

The algorithm used as I understand it is

foreach numeric variable {
calculate the fraction of -99 and -97 values
if the fraction > 0.05 recode such values to missing
}

On the face of it this is an absurd procedure. If -97 and -99 mean two flavours of missing, they have that meaning regardless of how often they occur. I don't think you could trust any results for the variables not recoded without access to the original data so that you could do your own corrections.

Incidentally, mvdecode is a more convenient command, as it allows not only specifying different values that mean missing, but also a loop over variables.
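For illustration, a short sketch of mvdecode on toy data. The variables x and y are made up, and the mapping of -99 to .a and -97 to .b is an assumption about what the two codes should become.

```stata
clear
input x y
1 -99
-97 2
3 4
end

* Map each special code to its own extended missing value, in all variables
mvdecode _all, mv(-99 = .a \ -97 = .b)

list
```

Writing mv(-99 -97) instead would map both codes to the system missing value (.).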
Comment
rcarvalho

Join Date: Apr 2014

Posts: 6
#8

15 Apr 2014, 15:42

Hey everyone,

Thanks for all the replies and insights into the code. I am a fairly new Stata user, so I often have trouble getting my code to work properly, particularly when using macros, and that may account for my absurd procedures!

The main thing I wanted this code to do was go through a list of datasets and perform the procedure that Nick Cox outlined in the previous post. Essentially, there is a tracking study and some of the data recoding procedures changed at one point, and I need to observe said rules when cleaning the datasets so I can construct an aggregate file.

I was having trouble specifying that the initial macro look at all variables because I was using local all_vars * and this was returning an asterisk at a later macro. The unab suggestion worked really well but one of my colleagues also suggested I use:

describe *, varlist
local all_vars "`r(varlist)'"

This worked really well and the code is now up and running and going through all the 30+ datasets I need it to!
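A small sketch of this approach, assuming the auto dataset. The ds command is shown as a related option because it can restrict the list to numeric variables; the macro name num_vars is made up for illustration.

```stata
sysuse auto, clear

* describe with the varlist option leaves the variable names in r(varlist)
quietly describe, varlist
local all_vars `r(varlist)'
display "`all_vars'"

* ds can limit the list, here to numeric variables only
quietly ds, has(type numeric)
local num_vars `r(varlist)'
display "`num_vars'"
```

Restricting to numeric variables may be useful here, since the recoding rules only make sense for numeric codes such as -99 and -97.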

Thank you everyone for the assistance, and sorry for the delay in responding.
Comment
