  • Using a loop for generating different percentiles

    Dear All,

    I want to compute several percentiles (the 25th, 50th, and 75th) of my variable (exports) and then create a dummy variable for each one, indicating whether exports is at or above that percentile. The code below works, but I would like to do the same thing with a loop so that I don't need so many command lines. Here is the code that I would like to put into a loop.

    Code:
    egen p25 = pctile( exports ), p(25)
    gen above25=1 if exports>=p25
    replace above25=0 if exports<p25
    
    egen p50 = pctile( exports ), p(50)
    gen above50=1 if exports>=p50
    replace above50=0 if exports<p50
    and so on...

    I would appreciate any help with doing this in a loop, so that my code is more efficient. Many thanks.
    Last edited by Alex Boulders; 02 Nov 2016, 18:11. Reason: Loops, percentiles

  • #2
    Note that the combination of -gen- and -replace- to create a dichotomous variable is unnecessarily complicated. You should use Boolean expressions for that.

    Next, if the percentiles of interest are among the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th, then you don't need to go to -xtile- to get them. So the code is fairly succinct:

    Code:
    summ exports, detail
    foreach n of numlist 25 50 75 {
        gen byte above`n' = exports >= `r(p`n')'
    }
    The above code illustrates it with 25th, 50th, and 75th, but you can substitute your own list of interesting percentiles if they are drawn from the ones I mentioned above.
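One caveat about the Boolean approach, sketched here on the auto dataset that ships with Stata rather than on the exports data: Stata treats missing values as larger than any number, so a bare `>=` comparison codes observations with a missing value as 1. The variable names high and high_safe are made up for this illustration, and whether missing values should be excluded is an assumption about what is wanted.

```stata
sysuse auto, clear
* rep78 has some missing values, which makes the issue visible
summarize rep78, detail
local med = r(p50)
* Boolean expression: 1 if true, 0 if false, but missing rep78 also yields 1
generate byte high = rep78 >= `med'
* Adding an -if- qualifier leaves the indicator missing where rep78 is missing
generate byte high_safe = rep78 >= `med' if !missing(rep78)
tabulate high high_safe, missing
```

The same `if !missing()` qualifier could be added to the -gen- line inside the loop above if exports has missing values.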

    If you have more "exotic" percentiles in mind, then -summarize, detail- doesn't provide them. But creating the "variables" p25, p50, etc. that don't actually vary is at best a waste of memory and effort. Instead of -pctile-, you can use -_pctile- and get the percentiles returned in r(). Here's an example for the 25th, 30th, 35th, and 40th percentiles:

    Code:
    _pctile value, percentiles(25 30 35 40)
    return list // OPTIONAL, IF YOU WANT TO SEE THE PERCENTILES LISTED
    local i = 1
    foreach n of numlist 25 30 35 40 {
        gen byte above`n' = value >= `r(r`i')'
        local ++i
    }
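A possible variation on the -_pctile- loop above, again sketched on the auto dataset: keeping the list of percentiles in one local macro (here called pcts, a name chosen only for this example) means the list is typed once and the counter cannot drift out of step with it. The r(r1), r(r2), ... results are indexed by position in the list, not by percentile value.

```stata
sysuse auto, clear
* Hypothetical list of percentiles, stored once
local pcts 25 30 35 40
_pctile mpg, percentiles(`pcts')
local i = 0
foreach n of local pcts {
    * r(r1) holds the first requested percentile, r(r2) the second, and so on
    local ++i
    generate byte above`n' = mpg >= r(r`i') if !missing(mpg)
}
summarize above25 above30 above35 above40
```

Because the thresholds increase, the share of observations coded 1 can only shrink from above25 to above40.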


    • #3
      To chop into quartile bins and then create indicators (dummies in your terminology) is just two commands.

      Code:
      sysuse auto, clear
      xtile qmpg=mpg, nq(4)
      tab qmpg, gen(qmpg)
      One way to see what you did is to use groups (install it just once from SSC):

      Code:
       
      ssc inst groups 
      groups qmpg qmpg?, sepby(qmpg)
      
        +--------------------------------------------------------+
        | qmpg   qmpg1   qmpg2   qmpg3   qmpg4   Freq.   Percent |
        |--------------------------------------------------------|
        |    1       1       0       0       0      27     36.49 |
        |--------------------------------------------------------|
        |    2       0       1       0       0      11     14.86 |
        |--------------------------------------------------------|
        |    3       0       0       1       0      22     29.73 |
        |--------------------------------------------------------|
        |    4       0       0       0       1      14     18.92 |
        +--------------------------------------------------------+
      Note that in this example, and in many, many others, tied values inhibit the production of equal-sized groups. That is one of several reasons why many people here think this is a lousy method.
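The effect of ties can be checked directly. This is only a sketch using official commands on the same auto example: observations sharing an mpg value are never split across bins, which is why the bin sizes in the table above depart from the ideal 74/4 = 18.5.

```stata
sysuse auto, clear
xtile qmpg = mpg, nq(4)
* Within each distinct mpg value, the first and last bin codes agree,
* i.e. tied values always land in the same quartile bin
bysort mpg (qmpg): assert qmpg[1] == qmpg[_N]
* Frequencies per bin, to compare against equal-sized groups
tabulate qmpg
```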


      • #4
        Dear Members,

        Thank you very much for your valuable time and code suggestions. They worked and I learned more than I expected.

        Best wishes,


        • #5
          Hi Alex,

          While there might be more sophisticated ways of doing this, i.e. comments #2 and #3, if all you are looking for is to see what your code would look like inside a loop, have a look at this looped version of your code below.
          Code:
          foreach lvl in 25 50 75 {
              egen p`lvl' = pctile( exports ), p(`lvl')
              gen above`lvl'=1 if exports>=p`lvl'
              replace above`lvl'=0 if exports<p`lvl'
          }
          The first line of the loop specifies what to loop over: the values 25, 50, and 75. The body of the loop is exactly the same code as before, except that the literal values are replaced with the local macro lvl (you can name it whatever you like). What's very important is that every reference to lvl inside the loop must be enclosed in Stata's macro quotes: a left single quote ` before the name and a right single quote ' after it.
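To see what the substitution does without touching any data, the commands can be printed instead of run. This is just an illustrative sketch: -display- shows each string after the macro has been expanded.

```stata
foreach lvl in 25 50 75 {
    * Print the command text as Stata would see it after macro expansion
    display "gen above`lvl' = 1 if exports >= p`lvl'"
}
```

This prints three lines, with above25 and p25 on the first pass, above50 and p50 on the second, and above75 and p75 on the third.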

          I hope this helps.


          • #6
            Dear Mathias,

            Thanks much. I got how it works. All these comments were helpful for me to write loops.
            Regards,


            • #7
              There is some difference in interpretation of the question here, but I want to underline that Clyde's segment

              Code:
              summ exports, detail
              foreach n of numlist 25 50 75 {    
                   gen byte above`n' = exports >= `r(p`n')'
              }
              is much more direct than Matthias' code. You don't need to create a new variable for each different percentile, and you can create the indicators in one line, not two.

