Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to remove variable that contain "NA" in wideformat

    I am currently working on income per capita per state, but the data I download is in wide format, in order to conver to short format, and in first couple year data are missing and coded as string "(NA)" but in recent year with numbers. So in order to reshape, I need to keep those in either string or numeric.

    sample data as follow:

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str20 state str8(Y1948_Q1 Y1948_Q2 Y1948_Q3 Y1948_Q4 Y1950_Q1 Y1950_Q2 Y1950_Q3 Y1950_Q4) float(Y2012_Q1 Y2012_Q2 Y2012_Q3 Y2012_Q4)
    "Alabama"    "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" 35384 35616 35197 36036
    "Alaska"     "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" 53209 53429 53064 53682
    "Arizona"    "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" 35995 36268 35946 37099
    "Arkansas"   "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" 35787 36250 35853 37256
    "California" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" "(NA)" 46905 47373 47202 49643
    end
    ------------------ copy up to and including the previous line ------------------




    I have transfered all to string then reshape and drop previous year. Code as follow:

    Code:
    foreach i of varlist _all {
        local a : variable label `i'
        local a: subinstr local a ":" "_"
        label var `i' "`a'"
    }
    
    
    
    foreach v of varlist Q* v* {
        local x: variable label `v'
        rename `v' Y`x'
        
    }
    
    tostring Y*, replace force
    
    reshape long Y, i(state) j(date) string
    drop if Y == "(NA)"
    destring Y, replace
    But I also want to know how to remove the year and quarter if it contain "(NA)" so that can reduce the process time when reshaping.

    here is what I try but retuen error code:

    Code:
    
    foreach v of varlist _all {
         replace `v' = "" if `v' == "(NA)"
        
     }
    
    (59 real changes made)
    type mismatch
    r(109);
    
    
    destring Y*, replace force
     
    foreach var of varlist _all {
         capture assert mi(`var')
        if !_rc {
            drop `var'
        }
    }

  • #2
    If it is consistent that a string variable is a string because it has "NA" entries, and if this holds true across all states:

    Code:
    quietly ds Y????_Q?, has(type string)
    drop `r(varlist)'

    Res.:

    Code:
    . l
    
         +--------------------------------------------------------+
         |      state   Y2012_Q1   Y2012_Q2   Y2012_Q3   Y2012_Q4 |
         |--------------------------------------------------------|
      1. |    Alabama      35384      35616      35197      36036 |
      2. |     Alaska      53209      53429      53064      53682 |
      3. |    Arizona      35995      36268      35946      37099 |
      4. |   Arkansas      35787      36250      35853      37256 |
      5. | California      46905      47373      47202      49643 |
         +--------------------------------------------------------+

    Comment


    • #3
      Here is another way to do it. I use findname from the Stata Journal, which is in effect a superset of ds, although some of the syntax differs in detail.

      Code:
      findname, all(@ == "(NA)")
      drop `r(varlist)'

      Comment


      • #4
        Thank you all, both of them work well

        Comment

        Working...
        X