
  • Replace all missing values with "NA" (type mismatch?)

    Hi - new to Stata.

    I've run a script in Stata but now want to move my dataset into R. R is treating all Stata missing values as 0, so I want to recode them all in Stata as "NA". I've tried:

    recode * (""=="NA")

    But I am getting the error "type mismatch". Is there a command to replace all missing values, or to change the type of all variables that contain missing values? Or is there a better way to prepare variables before moving a dataset to R?

    Thanks,
    Sarah

  • #2
    Sarah:
    welcome to this forum.
    You may want to try something along the following lines:
    Code:
    replace A=.a if A==.
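To apply the same replacement to every numeric variable at once instead of one variable (A) at a time, a minimal sketch could loop over the list returned by ds. It assumes the shipped auto.dta as demonstration data, where rep78 is the variable with missing values.

```stata
sysuse auto, clear
* Store the names of all numeric variables in r(varlist)
quietly ds, has(type numeric)
foreach v of varlist `r(varlist)' {
    * `v' == . matches system missing only; values already .a-.z are left alone
    replace `v' = .a if `v' == .
}
```

String variables (here, make) are skipped because ds only lists numeric ones.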
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      ---Crossed with Carlo: It is easier to follow Carlo's suggestion and then export---

      You are probably mixing up numeric variables with string variables. First, convert all variables to strings. Here is one way to do it:

      Code:
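      sysuse auto, clear
      *IDENTIFY NUMERIC VARIABLES AND STORE IN A LOCAL MACRO
      *(rep78 is the auto variable that actually contains missing values)
      local vars "price weight mpg rep78"
      foreach var in `vars' {
            gen s`var' = string(`var')
            drop `var'
            rename s`var' `var'
       }
      
      *REPLACE MISSING WITH "NA"
      *string() writes a numeric system missing as ".", not as ""
      foreach var in `vars' {
            replace `var' = "NA" if `var' == "."
       }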

      Note that you can use a Stata data set directly in R via the foreign package, and therefore you do not need to change the format. See this page for details.
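One caveat when going the foreign route: its read.dta reads only the Stata 5 to 12 file formats, so a file saved by a recent Stata may first need to be written in an older format. A minimal sketch, assuming auto.dta as demonstration data and a hypothetical output file name auto_v12.dta:

```stata
sysuse auto, clear
* version(12) asks saveold to write the Stata 12 .dta format
* (auto_v12.dta is a hypothetical file name)
saveold auto_v12.dta, version(12) replace
```

The haven package used in #7 and #8 reads newer .dta formats directly, so this step matters only for foreign.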
      Last edited by Andrew Musau; 28 Aug 2018, 04:11.



      • #4
        Would 'save2rda-- Export data from Stata to R in RData(RDA2) format' from http://www.radyakin.org/transfer/save2rda/ be of any use?



        • #5
          Just to explain: recode has nothing to do with string variables. That's in the help:

          recode changes the values of numeric variables according to the rules specified.
          I have to say -- sorry @Andrew Musau -- that I don't think converting all your data to strings is a good idea. Also, if #3 were a good idea, then the loop is a reinvention of tostring without any of the safety features of the latter.
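For comparison with the hand-written loop in #3, the same conversion done with tostring might look like the sketch below. It assumes auto.dta as demonstration data (rep78 holds its missing values) and that literal "NA" strings are still wanted, which this post argues against.

```stata
sysuse auto, clear
* tostring refuses a conversion that would lose information unless force is given
tostring price weight mpg rep78, replace
* numeric system missing comes out of the conversion as the string "."
foreach var of varlist price weight mpg rep78 {
    replace `var' = "NA" if `var' == "."
}
```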

          Can you document R treating system missing as zero?



          • #6
            Nick Cox: I had it in the back of my mind that there is a command that does the opposite of destring. I had just never needed to use it before.



            • #7
              It would be weird if R set missing values to zero. That's definitely not the default behavior.
              I just saved a copy of auto.dta, imported it into R with the haven package, and it behaves as it should: missings are missings.
              Maybe you could share a snippet of your data here, using dataex. See the instructions on dataex in section 12.2 of the FAQ: https://www.statalist.org/forums/help
              It is possible, for example, that your missing values are actually numeric codes with value labels attached, which could cause problems.
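If that is the case, mvdecode can turn such codes into genuine Stata missing values before export. A minimal sketch on made-up data, where the code -99 labeled "not available" is a hypothetical example of how missings might have been stored:

```stata
clear
input id score
1 12
2 -99
3 7
end
* Hypothetical coding: -99 stands for "not available"
label define scorelab -99 "not available"
label values score scorelab
* Recode -99 to the extended missing value .a
mvdecode score, mv(-99=.a)
```

As #8 shows, an extended missing value such as .a comes through as NA in R.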



              • #8
                I could not reproduce the problem mentioned in #1.

                Below is an example with "toy" data containing "n.a.", handled in Stata and then imported into R:

                Code:
                . input str10 (treat value)
                
                          treat       value
                  1. DrugA 43
                  2. DrugB 31
                  3. DrugA 56
                  4. DrugB "n.a."
                  5. end
                
                . replace value = ".a" if value == "n.a."
                
                . destring value, replace
                
                . labe define mylab .a "not available"
                
                . label values value mylab
                
                . list
                
                     +-----------------------+
                     | treat           value |
                     |-----------------------|
                  1. | DrugA              43 |
                  2. | DrugB              31 |
                  3. | DrugA              56 |
                  4. | DrugB   not available |
                     +-----------------------+
                
                . sum value
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                       value |          3    43.33333    12.50333         31         56

                I saved this toy example as mytest.dta and then, in R, imported it as a Stata file (as the object mytest):

                Code:
                > mytest1 <- as.data.frame(mytest)
                > mytest1
                  treat value
                1 DrugA    43
                2 DrugB    31
                3 DrugA    56
                4 DrugB    NA
                > summary(mytest1)
                    treat               value      
                 Length:4           Min.   :31.00  
                 Class :character   1st Qu.:37.00  
                 Mode  :character   Median :43.00  
                                    Mean   :43.33  
                                    3rd Qu.:49.50  
                                    Max.   :56.00  
                                    NA's   :1
                Hopefully that helps.
                Last edited by Marcos Almeida; 05 Feb 2020, 06:24.
                Best regards,

                Marcos

