
  • Destring a variable with missing values and syntax

    Hello,

    I have a stumbling block. I have some variables that were recorded as strings because they include the value 'DK' for "don't know". Using encode works well for variables without missing values. However, some of my variables contain both DK and missing values, and when I use encode the missing values are recoded as '1' and all my other data are changed. Is there a way to encode and tell Stata to treat blanks as missing values?

    If I try to destring the variable, I get 'Q1_7: contains nonnumeric characters; no replace'. If I try to recode DK to ., I get the error 'recode only allows numeric variables r(108);'.

    Does anyone have a way to get rid of this pesky DK without changing the values of the other observations in the variable? If so, is it possible to do it for my whole dataset, or do I need to do it one variable at a time?

    Thanks!


  • #2

    Code:
    replace awkward = ".a" if trim(awkward) == "DK" 
    destring awkward, replace
    That is the idea, except that with several variables you should do the replace in a loop. So you might use some variation on


    Code:
    foreach v of var awkward* { 
          replace `v' = ".a" if trim(`v') == "DK"
    } 
    
    destring awkward*, replace
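On the question of doing this for the whole dataset: a minimal sketch, assuming that every string variable holding numbers should be converted and that "DK" is the only non-numeric code. It uses -ds- to collect the names of all string variables, recodes "DK" to the extended missing value .a in each, and then runs -destring- once over the whole list. The variable names and values are hypothetical, made up for illustration.

```stata
* Hypothetical example data: q1 and q2 hold numbers plus "DK";
* country is a genuine string variable
clear
input str4 q1 str4 q2 str10 country
"3"  "DK" "France"
"DK" ""   "Germany"
"5"  "7"  "Italy"
end

* Collect the names of all string variables in memory
ds, has(type string)
local strvars `r(varlist)'

* Turn "DK" (ignoring surrounding spaces) into the extended missing code .a
foreach v of local strvars {
    replace `v' = ".a" if trim(`v') == "DK"
}

* Convert everything that is now numeric; destring leaves variables with
* other non-numeric content (here, country) unchanged and says so
destring `strvars', replace
```

Any variable that still contains other text after the loop is reported as "contains nonnumeric characters; no replace" and left as a string, which is a useful signal that it needs a closer look.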



    • #3
      Welcome to Statalist. I'm not sure if this is what you meant, but you could consider adding the force option to -destring- and then giving DK one of the extended missing values (.a to .z):

      Code:
      clear
      input str5 x
      2
      ""
      5
      4
      DK
      ""
      6
      end
      
      destring x, gen(x2) force
      replace x2 = .d if x == "DK"
      
      tab x2, miss
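One caution with force: it turns every non-numeric value into missing, not just "DK", so a typo such as the letter O typed for a zero would vanish without warning. A minimal sketch of a check to run first, using hypothetical values; real() returns missing for anything it cannot read as a number, so the condition picks out values that are neither blank, "DK", nor numeric.

```stata
* Hypothetical data: "4O" contains the letter O instead of a zero
clear
input str5 x
"2"
""
"DK"
"4O"
end

* Values that force would silently make missing, apart from blanks and "DK"
count if missing(real(x)) & !inlist(trim(x), "", "DK")
list x if missing(real(x)) & !inlist(trim(x), "", "DK")
```

If the count is zero, force loses nothing beyond the blanks and "DK"; otherwise, fix the listed values before running -destring-.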



      • #4
        First of all: the encode command is designed for assigning numeric codes to non-numeric strings like "France", "Germany", "United States". The output of help encode instructs us:

        Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
        So, although you have not shown us any examples of your data, your description suggests that your data are numeric values apart from "DK" and missing values (empty strings, "", in Stata). If that is so, put encode out of your thoughts; it is the wrong tool for the job.
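To see why encode is the wrong tool here, a small sketch using the non-blank values from the listing in this post: encode numbers the distinct strings 1, 2, 3, ... in sorted order, so the original numbers are lost, whereas real(), which the help text recommends, keeps them and returns missing for "DK". The names Q1_7_enc and Q1_7_real are hypothetical, chosen only for illustration.

```stata
clear
input str4 Q1_7
"42"
"DK"
"666"
"93"
end

* encode assigns codes in sorted string order: "42"=1, "666"=2, "93"=3, "DK"=4
encode Q1_7, generate(Q1_7_enc)

* real() keeps the numbers; "DK" becomes system missing (.)
generate Q1_7_real = real(Q1_7)

list, nolabel
```

Note that real(), like destring's force option, turns any unreadable value into missing, so "DK" still needs to be recoded to an extended missing value if you want to tell it apart from a blank.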

        The following code may start you in a useful direction.
        Code:
        . list
        
             +------+
             | Q1_7 |
             |------|
          1. |   42 |
          2. |   DK |
          3. |  666 |
          4. |      |
          5. |   93 |
             +------+
        
        . replace Q1_7 = ".k" if Q1_7=="DK"
        (1 real change made)
        
        . destring Q1_7, replace
        Q1_7: all characters numeric; replaced as int
        (2 missing values generated)
        
        . list
        
             +------+
             | Q1_7 |
             |------|
          1. |   42 |
          2. |   .k |
          3. |  666 |
          4. |    . |
          5. |   93 |
             +------+
        
        .
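Once "DK" is stored as .k, two follow-ups may help. First, a value label can be attached to an extended missing value so that tables show it as "Don't know". Second, Stata treats all missing values, .k included, as larger than any number, so a condition like Q1_7 > 100 also picks up the missing values unless they are excluded explicitly. A minimal sketch that rebuilds the data listed above; the label name dk_lbl is an arbitrary, hypothetical choice.

```stata
clear
input str4 Q1_7
"42"
"DK"
"666"
""
"93"
end
replace Q1_7 = ".k" if Q1_7 == "DK"
destring Q1_7, replace

* Label the extended missing value .k
label define dk_lbl .k "Don't know"
label values Q1_7 dk_lbl
tab Q1_7, missing

* Missing values (. and .k) count as greater than 100 here...
count if Q1_7 > 100
* ...so exclude them explicitly
count if Q1_7 > 100 & !missing(Q1_7)
```

The first count includes 666 plus both missing values; the second counts only 666.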

