  • NAs and missing values

    I am a Stata novice, using Stata for the first time.

    I have multiple string variables (0 and 1 for no and yes responses, with a few missing values recorded as NA) in the form:
    Var1 Var2 Var3
    0 NA 1
    0 1 1
    1 1 1
    0 1 NA
    NA 0 NA
    1 0 1
    1 NA 1
    NA NA 0
    NA 0 0
    I used a loop with encode to convert all of them to numeric, but I also need to convert the NAs in these variables (now numeric after encode) to missing values that Stata recognises. So far I have tried recode and replace to convert all NAs to "." but when I then run count if missing(var), it does not report any missing values. So I assume Stata is not recognising "." as missing.

    Any suggestions where I am going wrong will be highly appreciated.

    Thanks much.

  • #2
    You should use -destring- with the "force" option rather than -encode-. If you want to continue as is, please show what "NA" was encoded as.


    • #3
      Rich, thanks for your response.

      I just applied the following loop to convert the string variables to numeric:

      label define noyes 0"no" 1"yes"
      for each var of varlist var1 var2 var3 {
      encode 'var', gen (_'var') label (noyes)
      }



      • #4
        Your example and code are a little difficult to follow, as your variable names in #1 are inconsistent with your code in #3, and your code couldn't possibly work without correction to foreach (no space) and ` ' (different single quotation marks). Don't paraphrase or retype code that worked: copy and paste exactly what worked.

        The main deal, however, is that you have encode the wrong way round. You have strings "0" "1" "NA" that you want to map to 0 1 missing with value labels "no" "yes". You don't have strings "no" "yes" as your label definition implies. That being so, encode will just create extra labels for your distinct values that are not what you want. To see this, type

        Code:
        label list noyes
        and you will see that encode has just added extra value labels.
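
        A minimal sketch of that behaviour, assuming var1 is still a string variable holding "0", "1" and "NA" as in #1. The names demo and check1 are made up for illustration, so that nothing already in your dataset is touched:

        Code:
        * a fresh label with the same definition as noyes
        label define demo 0 "no" 1 "yes"
        * "0", "1" and "NA" are not among the label texts "no" and "yes",
        * so encode extends the label with new entries instead of mapping to 0 and 1
        encode var1, gen(check1) label(demo)
        label list demo
        * "NA" is now just another labelled integer code, not a missing value
        count if missing(check1)

        Provided var1 contains no empty strings, the last command should report no missing values, which matches what was described in #1.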

        As Rich Goldstein flagged, destring is a more natural starting point. Here is a demonstration.

        Code:
        clear
        input str2 (var1 var2 var3)
        0    NA    1
        0    1    1
        1    1    1
        0    1    NA
        NA    0    NA
        1    0    1
        1    NA    1
        NA    NA    0
        NA    0    0
        end
        
        destring var*, force replace
        label define noyes 0 "no" 1 "yes"  
        foreach v of var var* {
           label val `v' noyes
        }
        
        list
        
             +--------------------+
             | var1   var2   var3 |
             |--------------------|
          1. |   no      .    yes |
          2. |   no    yes    yes |
          3. |  yes    yes    yes |
          4. |   no    yes      . |
          5. |    .     no      . |
             |--------------------|
          6. |  yes     no    yes |
          7. |  yes      .    yes |
          8. |    .      .     no |
          9. |    .     no     no |
             +--------------------+
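
        As a quick check on the demonstration data above, the following sketch counts the missing values that destring created from "NA". It only uses the variables var1, var2 and var3 from the example:

        Code:
        * number of observations with var1 missing: 3 in the example data
        count if missing(var1)
        * frequencies of each variable, with missing values shown as their own row
        foreach v of var var* {
           tabulate `v', missing
        }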


        • #5
          Hi Nick, thanks much for your insights. What you have demonstrated above makes complete sense, and you were spot on when you pointed out that encode was making extra labels that I certainly did not want. But how do you get this code to work if you have many variables with almost 2000 observations each, and the variables of interest are scattered among other numeric variables? I tried several versions of the code above, but to no avail. Are there any insights that you can kindly add to your input above? Thanks much.


          • #6
            I don't see that having more observations or more variables of the same kind would invalidate the principles behind #4. It is unfortunately no help at all to tell us that you tried multiple versions of this code -- we can't see any -- or that they didn't work -- we can't see what happened.

            My only guess is that you are finding it difficult to distinguish variables that are just "0" "1" "NA" from the others.

            findname can help. You must install it from the Stata Journal website: type

            Code:
            search findname, sj
            and click on the latest download location which at the time of writing is

            Code:
            dm0048_3
            After that click to install. Then

            Code:
            findname, all(inlist(@, "0", "1", "NA")) local(wanted)
            will find the variables wanted and put their names in a local macro. Then it's the same principle:

            Code:
            destring `wanted', force replace
            label define noyes 0 "no" 1 "yes"  
            foreach v of local wanted {    
                label val `v' noyes
            }
            If that doesn't help, you'll need to provide more information.
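
            If installing findname is not an option, a rough equivalent using only built-in commands might look like the sketch below. It assumes the variables wanted are exactly the string variables whose values are all "0", "1" or "NA"; the macro name strvars is made up for illustration. Run it in one go, since local macros disappear when a do-file or selection finishes.

            Code:
            * list all string variables and keep their names
            ds, has(type string)
            local strvars `r(varlist)'
            * keep only those whose every value is "0", "1" or "NA"
            local wanted
            foreach v of local strvars {
                capture assert inlist(`v', "0", "1", "NA")
                if _rc == 0 local wanted `wanted' `v'
            }
            * show which variables were picked up
            display "`wanted'"

            Note that a variable containing empty strings or any other text would be left out of wanted by this check, so it is worth looking at the displayed list before applying destring.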


            • #7
              Thanks so very much, Nick, for your input. I hugely appreciate your contribution.
