  • Dropping Missing Observations

    How do I drop observations with missing values on specific variables (X1, X2, X3), rather than on all variables?

    Thank you in advance

  • #2
    It's not clear to me quite what you are asking.

    If you drop an observation you drop all the values it contains on all variables.


    • #3
      Let's assume I have 7 variables, but I want to drop only observations that have 0 in all three of these variables (for example, 0 for cancer (X1), 0 for diabetes (X2), and 0 for high blood pressure (X3)); as long as the individual has any of these conditions, they remain in the sample. I don't know what command to use to drop only individuals with none of those three conditions.


      • #4
        Try:

        Code:
        drop if (cancer == 0 & diabetes == 0 & highbloodpressure == 0)
        This will drop all observations that have 0 as the value for those variables. Please note that 0 is not missing. Missing is generally expressed in Stata as a dot ".". Also, please note that the code above will drop all observations (rows) for which cancer, diabetes and highbloodpressure are 0. Other variables for those observations that might not be 0 will also be dropped.
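
        A minimal sketch on made-up data may help show which rows the command removes and that a missing value is not treated as 0. The four observations and the id variable are assumptions for illustration only:

        ```stata
        * Toy data: hypothetical values, not from the original question
        clear
        input id cancer diabetes highbloodpressure
        1 0 0 0
        2 1 0 0
        3 0 0 1
        4 . 0 0
        end

        * Drop only observations where all three conditions are coded 0
        drop if cancer == 0 & diabetes == 0 & highbloodpressure == 0

        * id 1 is removed; id 4 stays because missing (.) is not equal to 0
        list
        ```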


        • #5
          Thank you, Igor. I was referring to 0 (the value for no condition) but used the wrong terminology; thank you for pointing that out.


          • #6
            You can try:

            drop if missing(X1)
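
            To cover all three variables from the original question, missing() also accepts several arguments and returns 1 if any of them is missing. A sketch, assuming numeric variables named X1, X2, X3 plus a hypothetical extra variable X4, with toy values:

            ```stata
            * Toy data: hypothetical values for illustration
            clear
            input X1 X2 X3 X4
            1 2 3 .
            . 2 3 4
            1 . 3 4
            1 2 3 4
            end

            * Drop rows with a missing value in any of X1, X2, X3
            * (a missing value in X4 alone does not trigger the drop)
            drop if missing(X1, X2, X3)
            list
            ```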


            • #7
              It may be redundant to say, but why not just generate a tag variable? This way, you keep the full data set. The complete-case analysis can be done by adding an "if" clause. Later, if you need to perform a sensitivity analysis (whether by fate, your sense of duty, or an express order of the reviewers), you still have the means to do it. The command - gen complete_case = !missing(var1, var2, var3) - would do the trick, with further variables added to the list as needed. Last but not least, there are several models where missing data (well, MAR) are handled nicely.
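
              A sketch of the tag-variable approach, using the auto data set that ships with Stata as a stand-in; the choice of price, mpg, and rep78 is an assumption for illustration:

              ```stata
              sysuse auto, clear

              * 1 if all listed variables are nonmissing, 0 otherwise
              generate byte complete_case = !missing(price, mpg, rep78)

              * Complete-case analysis without deleting any observations
              regress price mpg rep78 if complete_case

              * The excluded observations remain available for later checks
              list make price mpg rep78 if !complete_case
              ```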
              Best regards,

              Marcos


              • #8
                To long-term Stata users, mention of "drop" usually implies use of the drop command (note the typographical distinction). If you mean "ignore", then it's simplest to use an if condition, as in #4, except don't say drop:

                Code:
                ... if (cancer == 0 & diabetes == 0 & highbloodpressure == 0)
                
                ... if !(cancer == 0 & diabetes == 0 & highbloodpressure == 0)
                Note particularly the negation.


                • #9
                  Since Jeta is a new user, let me mention that dropping observations may be useful in selecting a data set for further calculations, but it is often unnecessary when you can put equivalent conditions in statements, as Nick shows just above. Also, Stata estimation routines automatically exclude observations with missing values on any of the variables in the model. Beginners often think they need to drop observations with missing data before a regression, but Stata handles this for you.
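
                  One way to see this is to check e(sample) after estimation. A sketch, again assuming the auto data set shipped with Stata, where rep78 has some missing values:

                  ```stata
                  sysuse auto, clear

                  * No drop beforehand: regress leaves out incomplete rows itself
                  regress price mpg rep78

                  * e(sample) marks the observations the command actually used
                  count if e(sample)

                  * Observations left out of the estimation, still in memory
                  count if !e(sample)
                  ```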


                  • #10
                    Jeta Statovci, if you are also looking for a more general solution, you can count the missing values across variables within each observation as follows:
                    Code:
                    egen nmis=rmiss(*)
                    and then drop observations with missing values in a chosen number of variables.
                    I guess now there is also rmiss2 in egen
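
                    A sketch of that idea on toy data, using rowmiss(), which is the documented egen function for a row-wise count of missing values (rmiss, as far as I know, is its older name). The variable names, values, and the threshold of 2 are assumptions for illustration:

                    ```stata
                    * Toy data: hypothetical values for illustration
                    clear
                    input X1 X2 X3
                    1 2 3
                    . 2 3
                    . . 3
                    . . .
                    end

                    * Number of missing values across the listed variables, per row
                    egen nmis = rowmiss(X1 X2 X3)

                    * Hypothetical rule: drop rows missing on two or more variables
                    drop if nmis >= 2
                    list
                    ```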
