How to flag observations that do not have a particular value on a variable?

Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#1

How to flag observations that do not have a particular value on a variable?

12 Dec 2021, 00:06

Hi statalist,

Please consider the following sample data:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str3 var1 str6 var2 "id" "itemid" "111" "1" "111" "2" "111" "3" "112" "2" "113" "1" "113" "2" "114" "1" "114" "2" "114" "3" "115" "3" end

Here, hhs, are identified by the variable id. HHs can have different items of value, with the most valuable item having itemid=1 and items of lower value would therefore be numbered by 2, 3, etc. As can be seen, some hhs can have only one item of value, some can have more than one. Under these scheme, all hhs should have at least one item with itemid=1 (if it is their only valuable item, it would have itemid=1), and can have other items with higher values of itemid. Thus the number of observations when itemid=1 should equal the number of unique households, given by id.

However, there is some error in my data where I suspect some hhs do not have any item with itemid=1. For eg, in the sample data, id 112, and id 115 have one item each and should have had itemid=1 against each of these items. But instead, 112 has itemid 2 against it, and 115 has itemid 3 against it.

I am trying to write a code that would flag these two ids:112 and 115, which do not have 1 on their itemid but have some other number.

it would be great if anyone could suggest how to go about this problem. Please comment if I have not been able to explain the problem well and you need clarification.

Thanks,
Tags: None
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#2

12 Dec 2021, 00:44

Code:

egen flag = min(real(itemid)>1) ,by(id)
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35221

12 Dec 2021, 05:12

This is just to flag that the data example is messed up because of a basic mistake: the variables are string because the first observation containing metadata should not have been imported.

Sometimes it is best to go back and re-do the import. In this case, you can also clean up like this

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 var1 str6 var2
"id"  "itemid"
"111" "1"     
"111" "2"     
"111" "3"     
"112" "2"     
"113" "1"     
"113" "2"     
"114" "1"     
"114" "2"     
"114" "3"     
"115" "3"     
end

foreach v of var var* { 
    rename `v' `=`v'[1]'
}

drop in 1 

destring, replace 

list, sepby(id)


     +--------------+
     |  id   itemid |
     |--------------|
  1. | 111        1 |
  2. | 111        2 |
  3. | 111        3 |
     |--------------|
  4. | 112        2 |
     |--------------|
  5. | 113        1 |
  6. | 113        2 |
     |--------------|
  7. | 114        1 |
  8. | 114        2 |
  9. | 114        3 |
     |--------------|
 10. | 115        3 |
     +--------------+

Comment

Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#4

12 Dec 2021, 06:04

Originally posted by Nick Cox View Post

This is just to flag that the data example is messed up because of a basic mistake: the variables are string because the first observation containing metadata should not have been imported.

Sometimes it is best to go back and re-do the import. In this case, you can also clean up like this

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str3 var1 str6 var2 "id" "itemid" "111" "1" "111" "2" "111" "3" "112" "2" "113" "1" "113" "2" "114" "1" "114" "2" "114" "3" "115" "3" end foreach v of var var* { rename `v' `=`v'[1]' } drop in 1 destring, replace list, sepby(id) +--------------+ | id itemid | |--------------| 1. | 111 1 | 2. | 111 2 | 3. | 111 3 | |--------------| 4. | 112 2 | |--------------| 5. | 113 1 | 6. | 113 2 | |--------------| 7. | 114 1 | 8. | 114 2 | 9. | 114 3 | |--------------| 10. | 115 3 | +--------------+

Hi Nick, could you please clarify what you meant by "first observation containing metadata should not have been imported"? These id variables are strings in my original data as well. Thanks
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35221
#5

12 Dec 2021, 06:46

Your first observation

Code:

"id" "itemid"

manifestly should have been read in as variable names, not as data values.

Many users of spreadsheets and similar applications put metadata in the first row(s) of a worksheet.

So far, so good, but that approach makes no sense in Stata.

First of all, names or other text in the first observation condemn a variable to be string, which renders numeric calculations impossible.

Second, even if a variable really should be (or could be) string, the name of the variable should be ... the name of the variable.

The problem arises from an import in which an entire worksheet has been read in. You should know what happened, and even if the blame lies on whoever supplied the data, the data need cleaning up.
Comment
Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#6

12 Dec 2021, 07:45

Originally posted by Nick Cox View Post

Your first observation

Code:

"id" "itemid"

manifestly should have been read in as variable names, not as data values.

Many users of spreadsheets and similar applications put metadata in the first row(s) of a worksheet.

So far, so good, but that approach makes no sense in Stata.

First of all, names or other text in the first observation condemn a variable to be string, which renders numeric calculations impossible.

Second, even if a variable really should be (or could be) string, the name of the variable should be ... the name of the variable.

The problem arises from an import in which an entire worksheet has been read in. You should know what happened, and even if the blame lies on whoever supplied the data, the data need cleaning up.

oh understood. Thanks a lot for clarifying!
Comment
Titir Bhattacharya

Join Date: Mar 2019

Posts: 226
#7

12 Dec 2021, 07:46

Originally posted by Øyvind Snilsberg View Post

Code:

egen flag = min(real(itemid)>1) ,by(id)

Thank you! it worked perfectly
Comment

Announcement

How to flag observations that do not have a particular value on a variable?

Comment

Comment

Comment

Comment

Comment

Comment