NAs and missing values

Keshab Parajuli

Join Date: Apr 2019

Posts: 14
#1

27 Apr 2019, 02:57

I am a Stata novice, using Stata for the first time.

I have multiple string variables (0 and 1 as no and yes responses, plus a few missing values listed as NA) in the form:

Var1 Var2 Var3
0    NA   1
0    1    1
1    1    1
0    1    NA
NA   0    NA
1    0    1
1    NA   1
NA   NA   0
NA   0    0

I used a loop with encode to convert all of them to numeric, but I also need to convert the NAs in these variables (now numeric after encode) to missing values that Stata recognises. So far I have tried recode and replace to convert all NAs to ".", but when I then run count if missing(var), it does not report any missing values. So I am assuming Stata is not recognising "." as a missing value.

Any suggestions where I am going wrong will be highly appreciated.

Thanks much.
Rich Goldstein

Join Date: Mar 2014

Posts: 4438
#2

27 Apr 2019, 03:55

You should use -destring- with the "force" option rather than -encode-. If you want to continue as is, please show what "NA" was encoded as.
Keshab Parajuli

Join Date: Apr 2019

Posts: 14
#3

27 Apr 2019, 04:22

Rich, thanks for your response.

I just applied the following loop to convert the string variables to numeric:

label define noyes 0"no" 1"yes"
for each var of varlist var1 var2 var3 {
encode 'var', gen (_'var') label (noyes)
}
Nick Cox

Join Date: Mar 2014

Posts: 35405
#4

27 Apr 2019, 06:11

Your example and code are a little difficult to follow, as your variable names in #1 are inconsistent with your code in #3, and your code couldn't possibly work without correcting "for each" to foreach (no space) and ' ' to ` ' (different single quotation marks). Don't paraphrase or retype code that worked: copy and paste exactly what worked.

The main deal, however, is that you have encode the wrong way round. You have strings "0" "1" "NA" that you want to map to 0 1 missing with value labels "no" "yes". You don't have strings "no" "yes" as your label definition implies. That being so, encode will just create extra labels for your distinct values, which is not what you want. To see this, type

Code:

label list noyes

and you will see that encode has just added extra value labels.
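To make that concrete, here is a minimal sketch on a toy dataset (the single variable var1 and its three values are made up for illustration). It assumes that the label noyes is defined first, as in #3, and then passed to encode.

```stata
* Toy data: a string variable holding "0", "1" and "NA"
clear
input str2 var1
0
NA
1
end

* Define the label as in #3, then encode against it
label define noyes 0 "no" 1 "yes"
encode var1, gen(_var1) label(noyes)

* None of the strings equals "no" or "yes", so encode adds
* new entries to noyes instead of mapping to 0, 1 and missing
label list noyes

* "NA" ends up as an ordinary labelled integer, not as missing
count if missing(_var1)
```

The final count should come back as zero, which is consistent with the symptom described in #1.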

As Rich Goldstein flagged, destring is a more natural starting point. Here is a demonstration.

Code:

clear
input str2 (var1 var2 var3)
0 NA 1
0 1 1
1 1 1
0 1 NA
NA 0 NA
1 0 1
1 NA 1
NA NA 0
NA 0 0
end

destring var*, force replace
label define noyes 0 "no" 1 "yes"
foreach v of var var* {
    label val `v' noyes
}

list

     +--------------------+
     | var1   var2   var3 |
     |--------------------|
  1. |   no      .    yes |
  2. |   no    yes    yes |
  3. |  yes    yes    yes |
  4. |   no    yes      . |
  5. |    .     no      . |
     |--------------------|
  6. |  yes     no    yes |
  7. |  yes      .    yes |
  8. |    .      .     no |
  9. |    .     no     no |
     +--------------------+
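As a follow-up check on the count if missing() question in #1, the sketch below repeats the same example data and counts the missing values in each variable after destring. It is only an illustration on the nine example rows, not on the real dataset.

```stata
* Same example data as in the demonstration
clear
input str2 (var1 var2 var3)
0 NA 1
0 1 1
1 1 1
0 1 NA
NA 0 NA
1 0 1
1 NA 1
NA NA 0
NA 0 0
end

* force turns every non-numeric string ("NA") into system missing (.)
destring var*, force replace

* Each former "NA" should now be counted by missing()
foreach v of varlist var1 var2 var3 {
    count if missing(`v')
}
```

For these nine rows the counts correspond to the number of NA entries per column: 3 for var1, 3 for var2 and 2 for var3.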
Keshab Parajuli

Join Date: Apr 2019

Posts: 14
#5

27 Apr 2019, 08:30

Hi Nick, thanks much for your insights. What you have demonstrated above makes complete sense, and you were spot on when you pointed out that encode was making extra labels that I certainly did not want. But how do you get this code to work if you have many variables with almost 2000 observations each, and the variables of interest are scattered among other numeric variables? I tried my best to get multiple versions of the above to work, but to no avail. Are there any insights that you can kindly add to your input above? Thanks much.
Nick Cox

Join Date: Mar 2014

Posts: 35405
#6

27 Apr 2019, 10:00

I don't see that having more observations or more variables of the same kind would invalidate the principles behind #4. It is unfortunately no help at all to tell us that you tried multiple versions of this code -- we can't see any -- or that they didn't work -- we can't see what happened.

My only guess is that you are finding it difficult to distinguish variables that are just "0" "1" "NA" from the others.

findname can help. You must install it from the Stata Journal website: type

Code:

search findname, sj

and click on the latest download location, which at the time of writing is

Code:

dm0048_3

After that, click to install. Then

Code:

findname, all(inlist(@, "0", "1", "NA")) local(wanted)

will find the variables wanted and put their names in a local macro. Then it's the same principle:

Code:

destring `wanted', force replace
label define noyes 0 "no" 1 "yes"
foreach v of local wanted {
    label val `v' noyes
}

If that doesn't help, you'll need to provide more information.
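For a self-contained illustration of the steps in this post, here is a sketch on a made-up dataset in which the "0"/"1"/"NA" strings sit between other variables. The variable names (id, q1, age, q3, town) and the values are hypothetical, and the sketch assumes findname has already been installed as described above.

```stata
* Hypothetical mixed dataset: only q1 and q3 hold "0", "1" or "NA"
clear
input id str2 q1 age str2 q3 str5 town
1 0  34 1  "Leeds"
2 NA 51 NA "York"
3 1  28 0  "Hull"
end

* Collect the names of variables whose values are all "0", "1" or "NA"
findname, all(inlist(@, "0", "1", "NA")) local(wanted)

* Convert only those variables; "NA" becomes system missing
destring `wanted', force replace

* Attach the no/yes value label to each converted variable
label define noyes 0 "no" 1 "yes"
foreach v of local wanted {
    label values `v' noyes
}

* The other variables (id, age, town) should be left unchanged
describe
list
```

The point of the local macro is that the same list of names drives both destring and the labelling loop, so nothing has to be typed by hand however the variables are scattered.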
Keshab Parajuli

Join Date: Apr 2019

Posts: 14
#7

27 Apr 2019, 10:45

Thanks so very much, Nick, for your input. I hugely appreciate your contribution.