Drop values that are more than 0.1 decimal points

Shailaja Tiwari

Join Date: Dec 2017
Posts: 73

17 Nov 2021, 23:07

Dear all,
I have a simple question but am not sure how to address it. I have the following dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double hhs_id int menu double weight
474.1  315   20
474.1 2891  585
474.1 2773 2501
474.1 2771  550
474.1 2781 1080
474.1 2772 2714
474.1  303   20
474.1 2522    3
474.1 2772   60
474.1  316   10
474.1  313 1000
474.1  303   20
474.1 2881  117
474.1 2892 2695
474.2 2891  486
474.2  621  250
474.2  303   20
474.2 2771  863
474.2 2892  405
474.2 2522    8
474.2 2781 1152
474.2 2881  706
474.2 2772 1751
end
label values menu x1_05
label def x1_05 303 "Biscuit", modify
label def x1_05 313 "Any boiled food", modify
label def x1_05 315 "Betel Leaf", modify
label def x1_05 316 "Supari", modify
label def x1_05 621 "White Sweet Potato", modify
label def x1_05 2522 "Salt (Iodine)", modify
label def x1_05 2771 "Rice/Jao", modify
label def x1_05 2772 "Rice/Jao", modify
label def x1_05 2773 "Rice/Jao", modify
label def x1_05 2781 "Panta Bhaat", modify
label def x1_05 2881 "Bhaji", modify
label def x1_05 2891 "Jhol curry", modify
label def x1_05 2892 "Jhol curry", modify

For the variable hhs_id, I want to drop the observations whose decimal part is greater than .1. For instance, in the dataex example above I want to keep values that are exactly 474.1 and drop values such as 474.2. Any advice is welcome.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17748

18 Nov 2021, 01:02

Shailaja:
I would recommend flagging, rather than -drop-ping, the observations you're not interested in, and ruling them out of statistical analyses via an -if- clause.
You can start by flagging them this way:

Code:

. g wanted=1 if hhs_id>474.1
(14 missing values generated)

. replace wanted=0 if hhs_id<=474.1
(14 real changes made)

. list

     +-----------------------------------------------+
     | hhs_id                 menu   weight   wanted |
     |-----------------------------------------------|
  1. |  474.1           Betel Leaf       20        0 |
  2. |  474.1           Jhol curry      585        0 |
  3. |  474.1             Rice/Jao     2501        0 |
  4. |  474.1             Rice/Jao      550        0 |
  5. |  474.1          Panta Bhaat     1080        0 |
     |-----------------------------------------------|
  6. |  474.1             Rice/Jao     2714        0 |
  7. |  474.1              Biscuit       20        0 |
  8. |  474.1        Salt (Iodine)        3        0 |
  9. |  474.1             Rice/Jao       60        0 |
 10. |  474.1               Supari       10        0 |
     |-----------------------------------------------|
 11. |  474.1      Any boiled food     1000        0 |
 12. |  474.1              Biscuit       20        0 |
 13. |  474.1                Bhaji      117        0 |
 14. |  474.1           Jhol curry     2695        0 |
 15. |  474.2           Jhol curry      486        1 |
     |-----------------------------------------------|
 16. |  474.2   White Sweet Potato      250        1 |
 17. |  474.2              Biscuit       20        1 |
 18. |  474.2             Rice/Jao      863        1 |
 19. |  474.2           Jhol curry      405        1 |
 20. |  474.2        Salt (Iodine)        8        1 |
     |-----------------------------------------------|
 21. |  474.2          Panta Bhaat     1152        1 |
 22. |  474.2                Bhaji      706        1 |
 23. |  474.2             Rice/Jao     1751        1 |
     +-----------------------------------------------+

.

or, coded more efficiently:

Code:

gen wanted2=cond(missing( hhs_id ), ., cond(hhs_id >474.1,1,0))

. list

     +---------------------------------------------------------+
     | hhs_id                 menu   weight   wanted   wanted2 |
     |---------------------------------------------------------|
  1. |  474.1           Betel Leaf       20        0         0 |
  2. |  474.1           Jhol curry      585        0         0 |
  3. |  474.1             Rice/Jao     2501        0         0 |
  4. |  474.1             Rice/Jao      550        0         0 |
  5. |  474.1          Panta Bhaat     1080        0         0 |
     |---------------------------------------------------------|
  6. |  474.1             Rice/Jao     2714        0         0 |
  7. |  474.1              Biscuit       20        0         0 |
  8. |  474.1        Salt (Iodine)        3        0         0 |
  9. |  474.1             Rice/Jao       60        0         0 |
 10. |  474.1               Supari       10        0         0 |
     |---------------------------------------------------------|
 11. |  474.1      Any boiled food     1000        0         0 |
 12. |  474.1              Biscuit       20        0         0 |
 13. |  474.1                Bhaji      117        0         0 |
 14. |  474.1           Jhol curry     2695        0         0 |
 15. |  474.2           Jhol curry      486        1         1 |
     |---------------------------------------------------------|
 16. |  474.2   White Sweet Potato      250        1         1 |
 17. |  474.2              Biscuit       20        1         1 |
 18. |  474.2             Rice/Jao      863        1         1 |
 19. |  474.2           Jhol curry      405        1         1 |
 20. |  474.2        Salt (Iodine)        8        1         1 |
     |---------------------------------------------------------|
 21. |  474.2          Panta Bhaat     1152        1         1 |
 22. |  474.2                Bhaji      706        1         1 |
 23. |  474.2             Rice/Jao     1751        1         1 |
     +---------------------------------------------------------+

.

Last edited by Carlo Lazzaro; 18 Nov 2021, 01:07.

Kind regards,
Carlo
(Stata 19.0)

Shailaja Tiwari

Join Date: Dec 2017
Posts: 73

18 Nov 2021, 01:20

Dear Carlo Lazzaro, thank you very much for your reply. Flagging the values with a decimal part greater than .1 would work very well for me.
However, my variable hhs_id ranges from 1 to 6503. What I shared above was a small portion of my data, so how can I adapt the code you mention above to the full variable?
I am sharing a larger portion of the data below.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double hhs_id int(menu weight)
495.1   12   28
495.1  272  150
495.1  293  140
495.1  311  155
495.1  315   12
495.1  316   10
495.1 2522   10
495.1 2771 1918
495.1 2772 3063
495.1 2881  297
495.1 2882  321
495.1 2883  370
495.1 2901  583
495.1 2902  940
495.1 2911  846
495.2   57    3
495.2  272  308
495.2  272  140
495.2  306   45
495.2  310   20
495.2  311  160
495.2 2522    6
495.2 2771 1767
495.2 2772 3285
495.2 2881  215
495.2 2891 1093
495.2 2901  415
495.2 2902  552
495.2 2903  955
495.2 2904  509
496.1   16  140
496.1  132  180
496.1  303  105
496.1  315   10
496.1  316    8
496.2 2771 4571
496.2 2811  480
496.2 2881  206
496.2 2882  506
496.2 2891  969
496.2 2901  788
496.3  272  146
496.3  284  360
496.3  303   20
497.1  315    2
497.1  316    4
497.1 2522    3
497.1 2771 1169
497.1 2772 2163
497.1 2881  439
497.2 2882  436
497.2 2901  385
497.2  298   35
497.2  305   50
497.2  312   90
end
label values menu x1_05
label def x1_05 12 "Muri/Khoi (puffed rice)", modify
label def x1_05 16 "Cerelac", modify
label def x1_05 57 "Green chili", modify
label def x1_05 132 "Milk", modify
label def x1_05 272 "Tea ?prepared", modify
label def x1_05 284 "Paes/firni/cooked firni", modify
label def x1_05 293 "Sweets", modify
label def x1_05 298 "Piaju", modify
label def x1_05 303 "Biscuit", modify
label def x1_05 305 "Patties", modify
label def x1_05 306 "Chips", modify
label def x1_05 310 "Murali", modify
label def x1_05 311 "Nimki", modify
label def x1_05 312 "Any fried food", modify
label def x1_05 315 "Betel Leaf", modify
label def x1_05 316 "Supari", modify
label def x1_05 2522 "Salt (Iodine)", modify
label def x1_05 2771 "Rice/Jao 1", modify
label def x1_05 2772 "Rice/Jao 2", modify
label def x1_05 2811 "Ruti/Parota 1", modify
label def x1_05 2881 "Bhaji 1", modify
label def x1_05 2882 "Bhaji 2", modify
label def x1_05 2883 "Bhaji 3", modify
label def x1_05 2891 "Jhol curry 1", modify
label def x1_05 2901 "Bhuna curry 1", modify
label def x1_05 2902 "Bhuna curry 2", modify
label def x1_05 2903 "Bhuna curry 3", modify
label def x1_05 2904 "Bhuna curry 4", modify
label def x1_05 2911 "Daal 1", modify

Maarten Buis

Join Date: Mar 2014

Posts: 3467
#4

18 Nov 2021, 01:27

Carlo's solution only works when all hhid's have the same "stem" of 474. I suspect that that is not the case in Shailaja's dataset. I suspect that hhid contains two bits of information: the stem and some additional info behind the decimal point (maybe wave?).

This is not the right way to store that information. The problem is that computers store numbers in binary, and .1 in binary is like 1/3 in decimal: it just cannot be stored exactly. The best thing you can do is to first split the hhid variable up into its two parts:

Code:

gen hhid_stem = floor(hhid)

// choose one of the below:
gen hhid_rest = round(mod(hhid,1)*10)   // if you are interested in only the first digit behind the decimal point
gen hhid_rest = round(mod(hhid,1)*100)  // if you are interested in only the first two digits behind the decimal point
gen hhid_rest = round(mod(hhid,1)*1000) // if you are interested in only the first three digits behind the decimal point
// etc.

I would use more informative names, but since I don't know your dataset, I don't know what these parts are supposed to mean.
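
The point about binary storage can be checked directly in Stata. The lines below are only an illustrative sketch using literal values taken from the example data; they are not part of the suggestion above.

```stata
* 0.1 cannot be stored exactly in binary, so exact comparisons can fail
display %21.18f 0.1
display 0.1 + 0.2 == 0.3                 // displays 0 (false)

* The decimal part of 474.1 is close to, but not exactly, 0.1
display mod(474.1, 1) == 0.1             // displays 0 (false)

* Scaling and rounding gives an integer that can be compared safely
display round(mod(474.1, 1)*10) == 1     // displays 1 (true)
```

This is why the split uses round() and then compares integers rather than comparing the decimal part with 0.1 directly.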

After that you can use the if condition, something like sum weight if hhid_rest == 1

You can drop that part of the dataset if you really don't want it: drop if hhid_rest > 1

Or you can keep your original data intact, but move your selection to another frame:

Code:

frame put hhid_stem menu weight if hhid_rest == 1, into(hhid1)
frame change hhid1

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
2 likes
Fei Wang

Join Date: Oct 2021

Posts: 726
#5

18 Nov 2021, 01:28

For the case of #3, it would be:

Code:

drop if (hhs_id - int(hhs_id)) > float(0.1)

Crossed with #4.
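
One reading of why float(0.1) appears in that condition, which the post itself does not spell out: hhs_id is stored as a double, and float(0.1) is very slightly larger than 0.1, so the comparison has a small built-in tolerance for the rounding error in a decimal part that is nominally .1. If flagging is preferred over dropping, the same condition can create an indicator instead. A minimal sketch, assuming the second dataex example is in memory; beyond_first is a hypothetical variable name.

```stata
* float(0.1) is 0.1 rounded to float precision, which lies slightly above 0.1
display %21.18f float(0.1)

* Same condition as the -drop- above, used to flag instead of drop
* (beyond_first is a hypothetical name; missing ids stay missing)
gen byte beyond_first = (hhs_id - int(hhs_id)) > float(0.1) if !missing(hhs_id)
tabulate beyond_first, missing
```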
1 like

Maarten Buis

Join Date: Mar 2014
Posts: 3467

18 Nov 2021, 01:35

Applying my solutions to your new data:

Code:

. gen hhs_id_stem = floor(hhs_id)

. gen hhs_id_rest = round(mod(hhs_id,1)*10)

.
. // solution 1
. sum weight if hhs_id_rest == 1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      weight |         26    502.5385    777.9512          2       3063

.
. // solution 3
. frame put hhs_id_stem menu weight if hhs_id_rest == 1, into(hhs_id_1)

. frame change hhs_id_1

. sum weight

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      weight |         26    502.5385    777.9512          2       3063

. frame change default

.
. // solution 2
. drop if hhs_id_rest > 1
(29 observations deleted)

. sum weight

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      weight |         26    502.5385    777.9512          2       3063

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#7

18 Nov 2021, 01:42

Dear Maarten Buis and Fei Wang thank you very much for the help. The solutions work very well.

Maarten Buis I will indeed try to use more informative names.
The hhs_id variable holds household identification numbers, and the decimal part (such as .1 and .2) denotes the number of households formed from the original household.

Thanks again
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17748
#8

18 Nov 2021, 01:53

Shailaja:
the next step is a loop, then.

Kind regards,
Carlo
(Stata 19.0)
Shailaja Tiwari

Join Date: Dec 2017

Posts: 73
#9

18 Nov 2021, 02:21

Thank you Carlo Lazzaro
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#10

18 Nov 2021, 04:31

Originally posted by Shailaja Tiwari

Maarten Buis I will indeed try to use more informative names.

My comment on using more informative names did not refer to your choice of names; they are fine. I referred to the names I chose for the variables I created (hhs_id_stem and hhs_id_rest). There I chose generic names because I did not know the context, and I encouraged you to choose better names than my generic ones.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35809
#11

18 Nov 2021, 05:11

I will throw another idea into the pot, which is to think in terms of the equivalent string.

But first, as explained in #7, these look like composite identifiers for individuals within households.

So, what if there are 10, 20, ... members in a household? How does Stata know the difference between 123.1 and 123.10, and so on?
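
A short sketch of both points, offered as an illustration rather than as the code the poster had in mind. It assumes the dataex example is in memory; id_str and id_suffix are hypothetical variable names, and string() uses a default display format, so the result is worth checking against the original values.

```stata
* Both literals denote the same number, so this displays 1:
* a numeric variable cannot distinguish suffix 1 from suffix 10
display 123.1 == 123.10

* String version of the identifier (id_str and id_suffix are hypothetical names)
gen id_str = string(hhs_id)

* Keep whatever follows the decimal point; empty if there is no decimal point
gen id_suffix = cond(strpos(id_str, "."), substr(id_str, strpos(id_str, ".") + 1, .), "")

list hhs_id id_str id_suffix in 1/5
```

The string route only helps if the identifier was a string to begin with (for example, in the raw file); once it has been read in as a number, a trailing zero is already lost.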