  • How to define low, medium and high groups in a variable

    I want to add another variable, "group", based on the values of size, and I want to include only those observations for which dev is 1.
    Size is divided into three groups (low, medium, high) on the basis of its values.
    Hence, in the given data, the first three values of size (3, 5, 7) are low, then come medium (60, 65, 85), and finally high (96, 97, 99).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 size float(equiv dev)
    "3"  50 1
    "5"  55 1
    "7"  55 1
    "10" 60 0
    "12"  5 0
    "25" 62 0
    "35" 62 0
    "45" 56 0
    "50"  0 0
    "60" 25 1
    "65" 20 1
    "70"  0 0
    "75"  0 0
    "76"  0 0
    "80" 25 0
    "85" 26 1
    "90" 24 0
    "95" 12 0
    "96" 13 1
    "97" 21 1
    "99" 65 0
    "99" 32 1
    end
    The new data would look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 size float(dev equiv) str6 group
    "3"  1 50 "low"   
    "5"  1 55 "low"   
    "7"  1 55 "low"   
    "10" 0 60 "."     
    "12" 0  5 "."     
    "25" 0 62 "."     
    "35" 0 62 "."     
    "45" 0 56 "."     
    "50" 0  0 "."     
    "60" 1 25 "medium"
    "65" 1 20 "medium"
    "70" 0  0 ""      
    "75" 0  0 ""      
    "76" 0  0 ""      
    "80" 0 25 ""      
    "85" 1 26 "medium"
    "90" 0 24 ""      
    "95" 0 12 ""      
    "96" 1 13 "high"  
    "97" 1 21 "high"  
    "99" 0 65 ""      
    "99" 1 32 "high"  
    end

  • #2
    Use destring on false string variables (numbers held as strings, like size here) before you try this.

    Also, the string categories "low", "medium" and "high" are inflexible: they sort alphabetically as high, low, medium, which makes no sense.

    Make your rules precise!

    Code:
    clear
    input str4 size float(dev equiv) str6 group
    "3"  1 50 "low"   
    "5"  1 55 "low"   
    "7"  1 55 "low"   
    "10" 0 60 "."     
    "12" 0  5 "."     
    "25" 0 62 "."     
    "35" 0 62 "."     
    "45" 0 56 "."     
    "50" 0  0 "."     
    "60" 1 25 "medium"
    "65" 1 20 "medium"
    "70" 0  0 ""      
    "75" 0  0 ""      
    "76" 0  0 ""      
    "80" 0 25 ""      
    "85" 1 26 "medium"
    "90" 0 24 ""      
    "95" 0 12 ""      
    "96" 1 13 "high"  
    "97" 1 21 "high"  
    "99" 0 65 ""      
    "99" 1 32 "high"  
    end
    
    destring size, replace 
    
    gen sizegroup = cond(missing(size), ., cond(size < 40, 1, cond(size < 80, 2, 3))) if dev == 1 
    
    list, sepby(dev)
    That isn't precisely your rule (85 ends up in group 3, not medium), so you must decide what your rule is.


    Code:
         +----------------------------------------+
         | size   dev   equiv    group   sizegr~p |
         |----------------------------------------|
      1. |    3     1      50      low          1 |
      2. |    5     1      55      low          1 |
      3. |    7     1      55      low          1 |
         |----------------------------------------|
      4. |   10     0      60        .          . |
      5. |   12     0       5        .          . |
      6. |   25     0      62        .          . |
      7. |   35     0      62        .          . |
      8. |   45     0      56        .          . |
      9. |   50     0       0        .          . |
         |----------------------------------------|
     10. |   60     1      25   medium          2 |
     11. |   65     1      20   medium          2 |
         |----------------------------------------|
     12. |   70     0       0                   . |
     13. |   75     0       0                   . |
     14. |   76     0       0                   . |
     15. |   80     0      25                   . |
         |----------------------------------------|
     16. |   85     1      26   medium          3 |
         |----------------------------------------|
     17. |   90     0      24                   . |
     18. |   95     0      12                   . |
         |----------------------------------------|
     19. |   96     1      13     high          3 |
     20. |   97     1      21     high          3 |
         |----------------------------------------|
     21. |   99     0      65                   . |
         |----------------------------------------|
     22. |   99     1      32     high          3 |
         +----------------------------------------+
    The next step would be to attach value labels to the new variable.
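A minimal sketch of that next step, assuming the example data from #1 and the illustrative cut points 40 and 80 used above (not the poster's final rule); the value-label name sizegroup is an arbitrary choice. The stored values stay 1, 2, 3, so the groups sort as low, medium, high while displaying as words.

```stata
* Example data from #1 (generated by -dataex-)
clear
input str4 size float(equiv dev)
"3"  50 1
"5"  55 1
"7"  55 1
"10" 60 0
"12"  5 0
"25" 62 0
"35" 62 0
"45" 56 0
"50"  0 0
"60" 25 1
"65" 20 1
"70"  0 0
"75"  0 0
"76"  0 0
"80" 25 0
"85" 26 1
"90" 24 0
"95" 12 0
"96" 13 1
"97" 21 1
"99" 65 0
"99" 32 1
end

* size is a false string variable: make it numeric first
destring size, replace

* cut points 40 and 80 are only illustrative, as noted above
gen sizegroup = cond(missing(size), ., cond(size < 40, 1, cond(size < 80, 2, 3))) if dev == 1

* numeric codes keep the order low < medium < high; labels supply the words
label define sizegroup 1 "low" 2 "medium" 3 "high"
label values sizegroup sizegroup

list size dev sizegroup if dev == 1
```

With the labels attached, tabulate sizegroup shows the words in the order low, medium, high.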



    • #3
      Originally posted by Nick Cox
      destring false string variables before you try this.

      Also, string categories low medium high are inflexible. They sort high low medium which makes no sense.
      The next step would be to attach value labels to the new variable.
      Thank you, sir. I don't think I stated my question properly.

      The rule is: take the observations with dev == 1 and divide them, on the basis of size, into three categories (low, medium and high) by a rule of thirds. The top third are high, the middle third are medium and the lowest third are low.

      Suppose size values are between 1 and 99 and there are 9 observations with dev = 1, but these 9 are not evenly spread over that range (as in the example dataset).
      So what matters is not the size values themselves but the total number of observations, which is 9 in the given example, and distributing them all into three categories: the lowest 3 are low, the middle 3 are medium and the top 3 are high.
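A minimal sketch of that rule of thirds by count, assuming the example data from #1; sizerank and the local macro n are names made up for this illustration. Each dev == 1 observation is ranked on size and the rank is scaled to 1, 2 or 3, so the size values matter only through their order.

```stata
* Example data from #1 (generated by -dataex-)
clear
input str4 size float(equiv dev)
"3"  50 1
"5"  55 1
"7"  55 1
"10" 60 0
"12"  5 0
"25" 62 0
"35" 62 0
"45" 56 0
"50"  0 0
"60" 25 1
"65" 20 1
"70"  0 0
"75"  0 0
"76"  0 0
"80" 25 0
"85" 26 1
"90" 24 0
"95" 12 0
"96" 13 1
"97" 21 1
"99" 65 0
"99" 32 1
end

destring size, replace

* rank size among dev == 1 only; -unique- gives ranks 1..N with ties broken arbitrarily
egen sizerank = rank(size) if dev == 1, unique

* number of dev == 1 observations (9 here)
count if dev == 1
local n = r(N)

* ranks 1..N/3 -> 1, the middle third -> 2, the top third -> 3
gen sizegroup = ceil(3 * sizerank / `n') if dev == 1

label define sizegroup 1 "low" 2 "medium" 3 "high"
label values sizegroup sizegroup

list size dev sizegroup if dev == 1
```

With 9 observations this gives three groups of 3. When the count is not a multiple of 3, the group sizes differ by at most one, and tied size values may be split across groups.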



      • #4
        You could look into centile to find tertiles, but check your assumption about the niceness of your data.
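One way to follow that suggestion, sketched under the assumption that the example data from #1 is used; c1 and c2 are arbitrary scalar names. centile estimates the 1/3 and 2/3 points of size among the dev == 1 observations, and each of those observations is then assigned by which side of the cut points it falls. With ties, or a count not divisible by 3, the groups need not be equal in size, so list the result and check it.

```stata
* Example data from #1 (generated by -dataex-)
clear
input str4 size float(equiv dev)
"3"  50 1
"5"  55 1
"7"  55 1
"10" 60 0
"12"  5 0
"25" 62 0
"35" 62 0
"45" 56 0
"50"  0 0
"60" 25 1
"65" 20 1
"70"  0 0
"75"  0 0
"76"  0 0
"80" 25 0
"85" 26 1
"90" 24 0
"95" 12 0
"96" 13 1
"97" 21 1
"99" 65 0
"99" 32 1
end

destring size, replace

* estimated tertiles of size among dev == 1 observations
centile size if dev == 1, centile(33.333 66.667)
scalar c1 = r(c_1)
scalar c2 = r(c_2)

* at or below the first tertile -> 1, up to the second -> 2, above -> 3
gen sizegroup = cond(size <= scalar(c1), 1, cond(size <= scalar(c2), 2, 3)) if dev == 1 & !missing(size)

label define sizegroup 1 "low" 2 "medium" 3 "high"
label values sizegroup sizegroup

list size dev sizegroup if dev == 1
```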

