Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing "NA" with missing

    I'm super new to using Stata and could really use some help! I have imported a csv file from R into Stata and am not sure if there is a succinct way for replacing all the NA values with missing. I have tried: replace 'var'= "." if 'var'== "NA" ....... However, I have 700 variables (many of which are string) so doing this one by one is taking way too long. Is there a better way to do this? Thank you in advance!!

  • #2
    Maybe something along the lines of the following.
    Code:
    ds , has(type string)
    foreach var of varlist `r(varlist)' {
        quietly replace `var' = "" if `var' == "NA" // Don't recommend `var' = "." (string variable's missing is "" and not ".")
    }

    Comment


    • #3
      Another a bit less elegant solution in the lines of what Joseph proposes (there must be a more elegant way to tell Stata than NA is missing, but I dont know it):

      Code:
      foreach var of varlist * {
      cap replace `var' = "" if `var'=="NA"
      }
      What Joseph did was extract the list of only string variables. Because I do not have this step but I am looping over all variables, I need to put the capture statement because otherwise my code will break at the first non-string variable.

      Comment


      • #4
        Stata doesn't attach any special meaning to the string "NA". Necessarily any single string can't capture "not available" "not applicable" "refused to answer" "test-tube dropped on floor" and many more reasons why there might be various kinds of missing or not directly informative string values.

        There is an important extra question for Olivia Silke -- to check whether should-be numeric variables have been misread as string, because of the text NA among instances of say 1, 2, 3.14159 and 42.

        If so


        Code:
        destring *, ignore("NA") 


        is indicated,

        Comment


        • #5
          Thank you Nick, Joro, and Joseph. This was so much help!!

          Comment


          • #6
            Olivia, I just recently came across your post, but Stata now has a simple way of reading NA as a missing value, at least in the case of numeric variables. This NA value arises as a matter of course in reading .csv files created by R. Using IMPORT DELIMITED with the NUMERICCOLS() option does the trick. Let's say that the third variable in dataset CENTRIC is numeric. Then Stata suggests,

            import delimited centric, numericcols(3)

            For numeric variables this converts NA to .

            There is actually an example under the IMPORT DELIMITED command that illustrates this using Stata's famous AUTO dataset.

            Comment


            • #7
              Sorry! I meant to write

              import delimited using centric, numericcols(3)

              Comment


              • #8
                James Adams The documentation does not state it explicitly (that I can find), but I strongly suspect that using the `numericcols()` option will ignore any value (not just "NA") in those column(s) that cannot be coerced to a numeric.
                This should be used with care, since there may be string data other than "NA" in said column(s).

                Comment


                • #9
                  Hi Everyone,
                  Please I also have a similar problem and I would need your assistance.
                  I have imported an excel data in Stata which contains 16 variables (2 string variables and 14 numeric variables) with some missing values (specifically "n.a." type of missing values) and all efforts to delete the "n.a." or replace them with recognized stata missing values such as "." or "" have all proved futile. I really don't know if there is a way to delete them or replace all the "n.a." with missing values.
                  In an attempt to replace them with missing values, I tried the following codes but it didn't work, I got a feedback which reads "type mismatch" :
                  foreach var of varlist * {
                  replace `var'="" if `var'=="NA"
                  }

                  foreach var of varlist * {
                  replace `var'="" if `var'=="n.a."
                  }

                  Moreover, in an attempt to delete all the observations with "n.a.", I tried the following codes but it also didn't work, I got the same feedback which reads "type mismatch" :
                  foreach var of varlist Interest_Revenue-Equity {
                  drop if `var'=="NA"
                  }

                  foreach var of varlist Interest_Revenue-Equity {
                  drop if `var'=="n.a."
                  }

                  Please is there a way to delete all the "n.a." or replace all the "n.a." with missing values???

                  Comment


                  • #10
                    Please see the second post in this thread. That error is most likely because at least one of your variables is numeric and you are trying to compare it to a string.

                    Comment


                    • #11
                      Patrick:
                      as an aside to previous helpful replies, a cautionary tale should be kept in mind before invoking -drop-, as Stata uses listwise dleteion and, as such, deletes all the observations with at least one missing values.
                      In the following example, while the (mistakenly planned) goal was to -drop- the missing value in one variable only, the Stata (as expected) deleted the entire observation:
                      Code:
                      . set obs 3
                      Number of observations (_N) was 0, now 3.
                      
                      . g name="John" in 1
                      
                      . replace name="Josh" in 3
                      
                      . g surname="White" in 1
                      
                      . replace surname="Black" in 2
                      
                      . replace surname="Brown" in 3
                      
                      . list
                      
                           +----------------+
                           | name   surname |
                           |----------------|
                        1. | John     White |
                        2. |          Black |
                        3. | Josh     Brown |
                           +----------------+
                      
                      . foreach var of varlist name-surname {
                        2. drop if `var'==""
                        3.  }
                      (1 observation deleted)
                      (0 observations deleted)
                      
                      . list
                      
                           +----------------+
                           | name   surname |
                           |----------------|
                        1. | John     White |
                        2. | Josh     Brown |
                           +----------------+
                      
                      .
                      Kind regards,
                      Carlo
                      (StataNow 18.5)

                      Comment


                      • #12
                        Hello Carlo,
                        I just tried your code below now and I still got the same feedback "type mismatch"

                        foreach var of varlist Interest_Revenue-Equity { drop if `var'==""
                        }

                        I also tried another code below now and it didn't delete the "n.a." I got several feedbacks that "0 observations deleted"

                        foreach var of varlist Interest_Revenue-Equity { drop if `var'==.
                        }

                        Specifically, it seems there is a problem with "n.a." in my dataset and Stata doesn't recognize that...

                        Comment


                        • #13
                          Patrick Donkor try this code:
                          Code:
                          foreach var of varlist Interest_Revenue-Equity {
                              capture confirm string var `var'
                              if !_rc {
                                  replace `var' == "" if `var'=="n.a."
                              }
                          }
                          Last edited by Hemanshu Kumar; 20 Nov 2022, 09:21.

                          Comment


                          • #14
                            Hemanshu Kumar, please I just tried your code now. Although I didn't get any error message after executing the command, the "n.a." were not deleted. They are still in the dataset after executing the command.

                            Comment


                            • #15
                              Patrick Donkor the only way that is possible is if the string is not exactly "n.a." -- or you are looking at a numeric variable for which certain values have been labeled as "n.a.". Avoiding these sorts of problems in troubleshooting is exactly why the Statalist FAQ advise posting a data extract using the dataex command. If you are not able to figure out the issue yourself, could you please post a data extract?

                              Comment

                              Working...
                              X