Dropping Missing Observations

Jeta Statovci

Join Date: Jun 2018

Posts: 6
#1


19 Jun 2018, 06:32

How do I drop observations with missing values on specific variables (X1, X2, X3), rather than on all variables?

Thank you in advance
Nick Cox

Join Date: Mar 2014

Posts: 35432
#2

19 Jun 2018, 06:41

It's not clear to me quite what you are asking.

If you drop an observation you drop all the values it contains on all variables.
Jeta Statovci

Join Date: Jun 2018

Posts: 6
#3

19 Jun 2018, 07:50

Let's assume I have 7 variables, but I want to drop only observations that have 0 on all three variables (for example, 0 for cancer (X1), 0 for diabetes (X2), and 0 for high blood pressure (X3)). As long as the individual has any of these conditions, they remain in the sample. I don't know which command to use to drop only individuals with none of those three conditions.
Igor Paploski

Join Date: Oct 2014

Posts: 174
#4

19 Jun 2018, 07:57

Try:

Code:

drop if (cancer == 0 & diabetes == 0 & highbloodpressure == 0)

This will drop all observations (rows) for which cancer, diabetes and highbloodpressure are all 0. Please note that 0 is not missing: missing is generally expressed in Stata as a dot ".". Also note that the values of the other variables in those observations are dropped too, even where they are not 0.
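A minimal sketch of the command from #4 on toy data. The data values are made up and only the variable names come from the thread. Running count first shows how many rows the condition matches before anything is discarded. Row 4, which has a missing value on diabetes, is kept, because . is not equal to 0.

```stata
* Toy data (values made up for illustration): three 0/1 condition indicators
clear
input id cancer diabetes highbloodpressure
1 0 0 0
2 1 0 0
3 0 0 1
4 0 . 0
end

* Check how many rows match before dropping anything (only id 1 here)
count if cancer == 0 & diabetes == 0 & highbloodpressure == 0

* Drop rows with 0 on all three indicators
drop if cancer == 0 & diabetes == 0 & highbloodpressure == 0

* id 4 survives: its missing diabetes value (.) is not equal to 0
list
```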
Jeta Statovci

Join Date: Jun 2018

Posts: 6
#5

19 Jun 2018, 08:03

Thank you, Igor. I was referring to 0 (the value for no condition) but used the wrong terminology; thank you for pointing that out.
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#6

19 Jun 2018, 08:10

You can try:

Code:

drop if missing(X1)
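For the original question about several variables, missing() accepts a comma-separated list and is true if any argument is missing. A sketch on made-up data, contrasting "missing on any of X1-X3" with "missing on all of X1-X3"; preserve and restore are used only so both variants can run on the same toy data.

```stata
* Toy data (made up): missing values scattered across X1-X3
clear
input id X1 X2 X3
1 1 2 3
2 . 2 3
3 . . .
4 4 . 6
end

preserve
* missing() with several arguments is true if ANY argument is missing,
* so this keeps only id 1
drop if missing(X1, X2, X3)
list
restore

* To drop only rows where ALL three are missing (id 3), combine with &
drop if missing(X1) & missing(X2) & missing(X3)
list
```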
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#7

19 Jun 2018, 09:33

It may be redundant to say, but why not just generate a tag variable? This way, you keep the full data set. The complete-case analysis could then be done by adding an "if" clause. Later, if you need to perform a sensitivity analysis (out of destiny, your sense of duty, or an express order from the reviewers), you still have what you need to do it. The command gen complete_case = !missing(var1, var2, var3) (listing all the relevant variables, separated by commas) would do the trick. Last but not least, there are several models where missing data (well, MAR) are handled nicely.
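A minimal sketch of the tag-variable idea on made-up data. var1-var3 follow the post, while y is a hypothetical outcome added for illustration. The built-in mark/markout pair is shown as an alternative way to build the same 0/1 indicator.

```stata
* Toy data (made up); var1-var3 as in the post, y is a hypothetical outcome
clear
input id y var1 var2 var3
1 10 1 2 3
2 12 . 2 3
3 11 1 . 3
4 15 1 2 3
5  9 4 5 6
end

* Tag complete cases instead of dropping anything
generate byte complete_case = !missing(var1, var2, var3)

* Complete-case summary via an if qualifier; all 5 rows stay in memory
summarize y if complete_case

* Built-in alternative: mark sets touse to 1 for every row, then markout
* sets it to 0 where any listed variable is missing
mark touse
markout touse var1 var2 var3
```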

Best regards,

Marcos
Nick Cox

Join Date: Mar 2014

Posts: 35432
#8

19 Jun 2018, 11:48

To long-term Stata users, mention of "drop" usually implies use of the drop command (note the typographical distinction). If you mean "ignore", then it's simplest to use an if condition, as in #4, except don't say drop:

Code:

... if (cancer == 0 & diabetes == 0 & highbloodpressure == 0)
... if !(cancer == 0 & diabetes == 0 & highbloodpressure == 0)

Note particularly the negation.
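A sketch of the negated if qualifier with an actual command, on made-up data (y is a hypothetical outcome). The second line is the same restriction rewritten by De Morgan's law. Nothing is dropped: all four rows stay in memory, and only rows with at least one condition enter the summary.

```stata
* Toy data (made up); y is a hypothetical outcome variable
clear
input id cancer diabetes highbloodpressure y
1 0 0 0 5
2 1 0 0 7
3 0 0 1 9
4 0 1 1 11
end

* Restrict the analysis, not the data: id 1 is ignored, not dropped
summarize y if !(cancer == 0 & diabetes == 0 & highbloodpressure == 0)

* Equivalent condition by De Morgan's law
summarize y if cancer != 0 | diabetes != 0 | highbloodpressure != 0
```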
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#9

20 Jun 2018, 11:00

Since Jeta is a new user, let me mention that dropping observations may be useful for selecting a data set for further calculations, but it is often unnecessary when you can put equivalent conditions in your commands, as Nick shows just above. Also, Stata estimation commands automatically exclude observations with missing values on any of the variables in the model. Beginners often think they need to drop observations with missing data before a regression, but Stata handles this for you.
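A small illustration of that automatic handling, using made-up data and hypothetical variable names y, x1 and x2. regress uses only the rows that are complete on all model variables. Afterwards, e(N) reports how many rows were used, and e(sample) marks which ones.

```stata
* Toy data (made up): rows 2 and 3 each have one missing predictor
clear
input y x1 x2
1 2 3
2 . 1
3 4 .
4 5 2
5 3 6
6 7 1
end

* No drop needed: incomplete rows are excluded from the estimation sample
regress y x1 x2
display e(N)

* Keep a record of which rows were actually used
generate byte used = e(sample)
```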
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#10

26 Apr 2021, 09:48

Jeta Statovci, if you are also looking for a more general solution, you can count the missing values in each observation as follows:

Code:

egen nmis=rmiss(*)

and then drop observations with missing values in however many variables you choose.
I guess there is now also rmiss2 for egen.
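A sketch of that approach restricted to the three variables from the original question, on made-up data. It uses rowmiss(), which I believe is the current documented name of the egen function for counting missing values across variables in each row. The threshold of two or more missing values is an arbitrary choice for illustration.

```stata
* Toy data (made up): 0, 1, 2 and 3 missing values per row
clear
input id X1 X2 X3
1 1 2 3
2 . 2 3
3 . . 3
4 . . .
end

* Count missing values across the listed variables, row by row
egen nmis = rowmiss(X1 X2 X3)
list

* Drop rows missing on two or more of the three variables
drop if nmis >= 2
```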