  • How to replace zeroes with missing values?

    I'm looking to change all the zeroes in a varlist to missing values. I thought I needed to convert the values to string first, so I tried tostring, but it says the varlist cannot be converted reversibly. What does this mean, and is there a better way to do this? Thanks


  • #2
    Welcome to the Stata Forum/Statalist,

    First, you need to understand why the zeros are there in the first place. Simply replacing them with missing values is not appropriate in every case.

    That said, something like this will do the task:

    Code:
    replace varname = . if varname == 0
    Best regards,

    Marcos
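
    For a whole varlist, as in the original question, the same replace can be wrapped in a loop. This is a minimal sketch on made-up data; x1, x2 and x3 are hypothetical variable names, and the loop assumes every variable in the list is numeric.

    ```stata
    clear
    set obs 10
    * hypothetical example variables, each containing some zeros
    generate x1 = mod(_n, 3)
    generate x2 = mod(_n, 2)
    generate x3 = _n - 1
    * apply the same replace to each variable in the varlist
    foreach v of varlist x1-x3 {
        replace `v' = . if `v' == 0
    }
    ```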



    • #3
      There is a dedicated command for this

      Code:
      help mvdecode
      Conversion to string first is not only unneeded, it would cause extra problems, as in your error message.

      That said, the ideal behind tostring is that you lose no information, and the test of that ideal is that real(string()) gets you back where you started. "Cannot be converted reversibly" means that this test fails for your variables.
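
      As a rough illustration of both points, here is a sketch on made-up data. The variables x1 to x3 are hypothetical. The mv(0=.z) form maps zeros to an extended missing value instead of the system missing value. The last line tries the real(string()) round trip on a value that the default display format of string() is not expected to reproduce exactly.

      ```stata
      clear
      set obs 10
      * hypothetical example variables, each containing some zeros
      generate x1 = mod(_n, 3)
      generate x2 = mod(_n, 2)
      generate x3 = mod(_n, 4)
      * change 0 to system missing (.) in several variables at once
      mvdecode x1 x2, mv(0)
      * change 0 to the extended missing value .z instead
      mvdecode x3, mv(0=.z)
      * round-trip check: a result of 0 means the original value was not recovered
      display real(string(1/3)) == 1/3
      ```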



      • #4
        To avoid confusing the 0s you want to treat as missing with other missing values (assuming they are possible), I would use .z rather than . as the missing value. E.g.,

        Code:
        clear
        set obs 100
        generate byte x1 = rbinomial(3,.5)
        * Insert a few missing values
        replace x1 = . if inlist(_n,18,51,77)
        generate byte x2 = x1
        replace x2 = .z if x2==0
        tab x1 x2, missing
        Output from the -tab- command:

        Code:
        . tab x1 x2, missing
        
                   |                           x2
                x1 |         1          2          3          .         .z |     Total
        -----------+-------------------------------------------------------+----------
                 0 |         0          0          0          0         14 |        14
                 1 |        36          0          0          0          0 |        36
                 2 |         0         37          0          0          0 |        37
                 3 |         0          0         10          0          0 |        10
                 . |         0          0          0          3          0 |         3
        -----------+-------------------------------------------------------+----------
             Total |        36         37         10          3         14 |       100

        HTH.
        --
        Bruce Weaver
        Version: Stata/MP 18.5 (Windows)



        • #5
          If zeros are irrelevant for some purpose, rather than missing in any strong sense, it is often best just to ignore them rather than change them to missing.
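
          One way to ignore zeros without touching the data is an if qualifier. This is a sketch on simulated data, with x as a hypothetical variable. Note that Stata treats missing as larger than any number, so a condition such as x > 0 needs an explicit guard against missing values.

          ```stata
          clear
          set obs 100
          * hypothetical variable with some zeros and a few missing values
          generate byte x = rbinomial(3, .5)
          replace x = . if inlist(_n, 18, 51, 77)
          * zeros stay in the data; they are simply left out of this summary
          summarize x if x != 0
          * x > 0 alone would also count missing values, so exclude them explicitly
          count if x > 0 & !missing(x)
          ```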



          • #6
            Thanks guys, I decided to go with Marcos' solution. It's what I tried first, but I wrote ' replace varname = "." ' instead of 'replace varname = . ', and it threw an error about strings. That's why I thought I had to convert the zeroes to string first.

            As for why I'm changing them to missing, it's because they were originally missing and I changed them to zero to see if they changed the regression output much (they didn't). Now I'm just changing them back.
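
            The difference between the two forms can be reproduced on a toy variable. This is a sketch with a hypothetical numeric variable x: the quoted "." is a string, so assigning it to a numeric variable produces a type mismatch, while the unquoted period is numeric missing.

            ```stata
            clear
            set obs 5
            * hypothetical numeric variable; the first observation is 0
            generate x = _n - 1
            * quoted "." is a string, so this fails; capture lets the do-file continue
            capture noisily replace x = "." if x == 0
            * a nonzero return code confirms that the command above failed
            display _rc
            * the unquoted period is numeric missing, so this works
            replace x = . if x == 0
            ```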



            • #7
              The title of this thread may attract many readers, so let's spell out some of the dangers here.

              First, be careful about calculations back and forth, even on a mechanistic level.

              If you replace missings with zeros and then change your mind and go back, you may now have mangled your data. Suppose 42 observations are initially missing and 666 initially zero. Then if you change zero to missing you now have 708 missings. Change missing to zero and you now have 708 zeros.

              Always keep a back-up of the original data.
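
              The arithmetic above can be checked with a small simulation. This sketch uses a hypothetical variable x with 42 missing values and 666 zeros, and keeps a copy with clonevar before anything is changed.

              ```stata
              clear
              set obs 1000
              * hypothetical variable: 42 missing, 666 zero, the rest 1
              generate byte x = 1
              replace x = . in 1/42
              replace x = 0 in 43/708
              * keep an untouched copy before changing anything
              clonevar x_orig = x
              * missing to zero: the 42 missings join the 666 zeros
              replace x = 0 if missing(x)
              * zero to missing: all 708 zeros become missing
              replace x = . if x == 0
              * 708 missing in the round-tripped variable, 42 in the copy
              count if missing(x)
              count if missing(x_orig)
              ```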

              Second, the interchangeability of zero and missing, or vice versa, can only rest on subject-matter knowledge. It could arise like this. Suppose there is a survey of cigarette smoking. On detailed examination it turns out that anyone in the survey who reported being a non-smoker was put down as missing on their cigarette consumption. This circumstance is one where zero is a defensible value. Then again, depending on the goals of the survey, non-smokers may be irrelevant for some or all analyses.

              Another example could be the length of coastline for each country (itself an extraordinarily problematic thing to measure, but that's another story). Some countries are landlocked and have no coast. Missing and zero could both be defensible here.

              What is certain is that Stata will take all zeros literally, meaning numerically. There is no intelligence inside Stata thinking that zero means irrelevant, or anything else. If you tell Stata that some people have a height of zero, it will believe you.
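
              A tiny sketch of what taking zero literally does to a summary, using three made-up heights in a hypothetical variable height. With the zero included the mean is (170 + 180 + 0)/3, about 116.7; with the zero set to missing in a copy, the mean is 175.

              ```stata
              clear
              * three made-up heights; the 0 stands for a value that was never measured
              input height
              170
              180
              0
              end
              * the zero is treated as a real measurement and pulls the mean down
              summarize height
              * in a copy, set the zero to missing so it is left out of the mean
              generate height2 = height
              replace height2 = . if height2 == 0
              summarize height2
              ```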

