How to define low, medium and high in a variable

jahan dgk

Join Date: Jul 2017
Posts: 6


15 Jul 2017, 21:58

I want to add another variable, "group", based on the values of size, using only those observations for which dev is 1.
Those observations should be split into three groups (low, medium, high) according to their size values.
In the data below, the first three dev = 1 values of size (3, 5, 7) are low, the next three (60, 65, 85) are medium, and the last three (96, 97, 99) are high.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 size float(equiv dev)
"3"  50 1
"5"  55 1
"7"  55 1
"10" 60 0
"12"  5 0
"25" 62 0
"35" 62 0
"45" 56 0
"50"  0 0
"60" 25 1
"65" 20 1
"70"  0 0
"75"  0 0
"76"  0 0
"80" 25 0
"85" 26 1
"90" 24 0
"95" 12 0
"96" 13 1
"97" 21 1
"99" 65 0
"99" 32 1
end

New data would be like this.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 size float(dev equiv) str6 group
"3"  1 50 "low"   
"5"  1 55 "low"   
"7"  1 55 "low"   
"10" 0 60 "."     
"12" 0  5 "."     
"25" 0 62 "."     
"35" 0 62 "."     
"45" 0 56 "."     
"50" 0  0 "."     
"60" 1 25 "medium"
"65" 1 20 "medium"
"70" 0  0 ""      
"75" 0  0 ""      
"76" 0  0 ""      
"80" 0 25 ""      
"85" 1 26 "medium"
"90" 0 24 ""      
"95" 0 12 ""      
"96" 1 13 "high"  
"97" 1 21 "high"  
"99" 0 65 ""      
"99" 1 32 "high"  
end


Nick Cox

Join Date: Mar 2014
Posts: 35444

16 Jul 2017, 01:35

Destring false string variables (numbers stored as strings) before you try this.

Also, the string categories low, medium, high are inflexible. They sort as high, low, medium, which makes no sense.

Make your rules precise!

Code:

clear
input str4 size float(dev equiv) str6 group
"3"  1 50 "low"   
"5"  1 55 "low"   
"7"  1 55 "low"   
"10" 0 60 "."     
"12" 0  5 "."     
"25" 0 62 "."     
"35" 0 62 "."     
"45" 0 56 "."     
"50" 0  0 "."     
"60" 1 25 "medium"
"65" 1 20 "medium"
"70" 0  0 ""      
"75" 0  0 ""      
"76" 0  0 ""      
"80" 0 25 ""      
"85" 1 26 "medium"
"90" 0 24 ""      
"95" 0 12 ""      
"96" 1 13 "high"  
"97" 1 21 "high"  
"99" 0 65 ""      
"99" 1 32 "high"  
end

destring size, replace 

gen sizegroup = cond(missing(size), ., cond(size < 40, 1, cond(size < 80, 2, 3))) if dev == 1 

list, sepby(dev)

That isn't your rule precisely, so you must decide what is.

Code:

     +----------------------------------------+
     | size   dev   equiv    group   sizegr~p |
     |----------------------------------------|
  1. |    3     1      50      low          1 |
  2. |    5     1      55      low          1 |
  3. |    7     1      55      low          1 |
     |----------------------------------------|
  4. |   10     0      60        .          . |
  5. |   12     0       5        .          . |
  6. |   25     0      62        .          . |
  7. |   35     0      62        .          . |
  8. |   45     0      56        .          . |
  9. |   50     0       0        .          . |
     |----------------------------------------|
 10. |   60     1      25   medium          2 |
 11. |   65     1      20   medium          2 |
     |----------------------------------------|
 12. |   70     0       0                   . |
 13. |   75     0       0                   . |
 14. |   76     0       0                   . |
 15. |   80     0      25                   . |
     |----------------------------------------|
 16. |   85     1      26   medium          3 |
     |----------------------------------------|
 17. |   90     0      24                   . |
 18. |   95     0      12                   . |
     |----------------------------------------|
 19. |   96     1      13     high          3 |
 20. |   97     1      21     high          3 |
     |----------------------------------------|
 21. |   99     0      65                   . |
     |----------------------------------------|
 22. |   99     1      32     high          3 |
     +----------------------------------------+

The next step would be to attach value labels to the new variable.
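
A minimal sketch of that labelling step, on a cut-down version of the data. The label name sizelbl is an arbitrary choice, and the cutpoints 40 and 80 are simply those used in the code above, not a rule the original poster has confirmed.

Code:

clear
input size dev
 3 1
60 1
70 0
96 1
end

* Same illustrative cutpoints as in the -generate- command above
gen sizegroup = cond(missing(size), ., cond(size < 40, 1, cond(size < 80, 2, 3))) if dev == 1

* Define a value label and attach it (sizelbl is a hypothetical name)
label define sizelbl 1 "low" 2 "medium" 3 "high"
label values sizegroup sizelbl

* The text labels are displayed, but sorting still follows the codes 1, 2, 3
sort sizegroup
list

Keeping the variable numeric with labels attached avoids the alphabetical high, low, medium ordering that a string variable would give.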


jahan dgk

Join Date: Jul 2017

Posts: 6
#3

16 Jul 2017, 02:03

Thank you, Sir. I think I did not explain my question properly.

The rule is to take only the observations with dev equal to 1 and divide them into three categories by size, one third each: the top third are high, the middle third are medium, and the lowest third are low.

Suppose size takes values between 1 and 99 and there are 9 observations with dev = 1, but these 9 are not evenly spread over that range (as in the example dataset).
So what matters is not the size values themselves but the number of observations, which is 9 in the example: they are split into three equal categories, so the lowest 3 are low, the middle 3 are medium, and the highest 3 are high.
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#4

16 Jul 2017, 02:59

You could look into centile to find tertiles, but check your assumption about the niceness of your data.
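
As a rough sketch of what that could look like on the example data (size and dev only, with size already numeric): the first approach cuts at centiles estimated by centile, and the second assigns thirds by rank, which matches the "lowest 3, middle 3, highest 3" wording more literally. The variable names third_c, third_r and sizerank, the scalars cut1 and cut2, and the label name third are all made up for illustration. Ties in size among the dev == 1 observations would need a deliberate decision that this sketch does not make.

Code:

clear
input size dev
 3 1
 5 1
 7 1
10 0
12 0
25 0
35 0
45 0
50 0
60 1
65 1
70 0
75 0
76 0
80 0
85 1
90 0
95 0
96 1
97 1
99 0
99 1
end

* Approach 1: cut at the 33.33 and 66.67 centiles of size among dev == 1
centile size if dev == 1, centile(33.33 66.67)
scalar cut1 = r(c_1)
scalar cut2 = r(c_2)
gen byte third_c = cond(size <= cut1, 1, cond(size <= cut2, 2, 3)) if dev == 1 & !missing(size)

* Approach 2: rank the dev == 1 observations and split the ranks into thirds
* (the unique option breaks any ties arbitrarily)
egen sizerank = rank(size) if dev == 1, unique
count if dev == 1 & !missing(size)
gen byte third_r = ceil(3 * sizerank / r(N)) if dev == 1

* Attach the same value label to both results
label define third 1 "low" 2 "medium" 3 "high"
label values third_c third_r third

* Compare the two classifications
tabulate third_c third_r
list, sepby(dev)

xtile with the nquantiles(3) option is a more compact alternative, though how it treats ties and values that fall exactly on a boundary should be checked against your own data.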
