  • How do I keep "missing values" missing?

    Hello everyone,

    I am currently working with panel data (firm, year) for my seminar paper and am preparing the data for the analysis. My problem is as follows:

    I generated the variable CETR (Cash Effective Tax Rate) with the command
    gen CETR = CF_TAXATION / PRETAX_INCOME

    The results included some negative values, some values larger than 1 and also missing values.

    Now, in an effort to control for outliers I wanted to winsorize the values for CETR to 0 and 1, i.e. if CETR has a value >1 it should be defined as 1 and if CETR<0 it should be defined as 0.
    replace CETR=0 if CETR<0
    replace CETR=1 if CETR>1

    After looking at the results, I observed that Stata had assigned the value 1 to observations where CETR was originally missing, because Stata treats missing values as positive infinity. Since I have a significant amount of missing data, this biases my results substantially. My question is therefore: how do I alter the previous commands to prevent this, i.e. how do I tell Stata to keep missing values missing in such a setting?

    Thanks in advance and kind regards,

    Lucas

  • #2
    In this

    Stata treats missing values as positive infinity
    you have the seed of the answer.
    Code:
    replace CETR=0 if CETR<0
    replace CETR=1 if CETR>1 & CETR<.
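
    To see why the extra condition works, here is a minimal sketch on made-up data (the five values are illustrative assumptions, not taken from the original dataset). Stata sorts every missing value, including the extended ones .a to .z, above all nonmissing numbers, so CETR<. is true only for real numbers.
    Code:
    * toy data: illustrative values only
    clear
    input CETR
    -0.25
    0.30
    1.70
    .
    .a
    end
    * missing is never below 0, so this line needs no extra guard
    replace CETR=0 if CETR<0
    * CETR<. excludes . and .a, which would otherwise satisfy CETR>1
    replace CETR=1 if CETR>1 & CETR<.
    list
    The second replace changes only the observation holding 1.70; the two missing observations are left untouched.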



    • #3
      There's lots of good reading to do at -help missing-, in addition to William's helpful answers. My personal preference is to use the -missing()- (or its synonym, -mi()-) function to identify missing values because I like how it reads. It's also helpful if others need to read the code but aren't as experienced when reading Stata code.



      • #4
        Leonardo's excellent advice reminds me that I typically use the missing() function for the same reasons he recommends it; in this context it would be
        Code:
        replace CETR=0 if CETR<0
        replace CETR=1 if CETR>1 & ! missing(CETR)
        In post #2 I wanted to show that you were just "epsilon" from having found the answer yourself by showing a solution that recognized the crucial fact you had already realized. That way perhaps Stata looks a little less mysterious.

        But do note from the output of help missing() (notice the parentheses, this is different from help missing) that missing() will take multiple arguments, facilitating code like
        Code:
        generate two_big = x1>1000 & x2>1000 if ! missing(x1,x2)
        which will set two_big to . if either x1 or x2 is missing, otherwise to 1 if both exceed 1000, and otherwise to 0.
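
        A minimal sketch on made-up data (x1, x2, the new variable names, and the values are illustrative assumptions) shows why the missing() test belongs in the if qualifier rather than inside the expression. A logical expression always evaluates to 0 or 1, so only the if qualifier can leave the result missing.
        Code:
        * toy data: illustrative values only
        clear
        input x1 x2
        2000 3000
        2000 500
        2000 .
        end
        * test folded into the expression: the third observation gets 0
        generate via_expr = x1>1000 & x2>1000 & ! missing(x1,x2)
        * test in the if qualifier: the third observation stays missing
        generate via_if = x1>1000 & x2>1000 if ! missing(x1,x2)
        list
        The two variables agree on the first two observations (1 and 0) and differ only where x2 is missing.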
