  • #16
    Thanks. I think I get it now. Here is a demonstration using rangerun from SSC.

    There are lots of details, some simple, some subtle.

    Your variable is evidently not invest but data. Or -- if it is something else -- that is your call.

    Your time window appears to be -30 0, so the previous 30 days and this day. But you need a proper Stata daily date variable. Your string date variable is no use for this purpose.
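
    A minimal sketch of building such a variable, assuming a hypothetical string variable strdate that holds dates in day/month/year order. Both the name and the order are assumptions for illustration, so change "DMY" to match your own data:

    ```stata
    clear
    input str10 strdate
    "01/02/2020"
    "15/02/2020"
    "03/03/2020"
    end

    * daily() turns a string into a numeric daily date; "DMY" is the assumed order
    gen Date = daily(strdate, "DMY")

    * a display format makes the numeric dates readable; it does not change the values
    format Date %td

    list
    ```

    A numeric daily date counts days, so a window such as -30 0 then means what it says: the previous 30 days and this day.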

    There are several definitions of rank and percentile rank. See also https://www.stata.com/support/faqs/s...ing-positions/ for one perspective.

    You may not have an analogue of the panel variable company. If not, there is no by() option call.

    Code:
    webuse grunfeld, clear
    
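    capture program drop pcrank 
    
    * rangerun runs this once per window; with an upper bound of 0 the
    * current observation is the last one in the window, hence [_N]
    program pcrank 
        * i1: nonmissing values in the window below the current value
        count if invest < invest[_N] & invest < . 
        scalar i1 = r(N)
        * i2: values tied with the current value, itself included
        count if invest == invest[_N] & invest < . 
        scalar i2 = r(N)
        * n: nonmissing values in the window
        count if invest < . 
        gen n = r(N)
        gen rank = i1 + i2 
        gen pcrank = (i1 + 0.5 * i2) / n 
    end 
    
    * window: this year and the previous 9 years, separately for each company
    rangerun pcrank, use(invest) int(year -9 0) by(company)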
    
    list invest n rank pcrank if company == 1 
    
         +-------------------------------+
         | invest    n   rank     pcrank |
         |-------------------------------|
      1. |  317.6    1      1         .5 |
      2. |  391.8    2      2        .75 |
      3. |  410.6    3      3   .8333333 |
      4. |  257.7    4      1       .125 |
      5. |  330.8    5      3         .5 |
         |-------------------------------|
      6. |  461.2    6      6   .9166667 |
      7. |    512    7      7   .9285714 |
      8. |    448    8      6      .6875 |
      9. |  499.6    9      8   .8333333 |
     10. |  547.5   10     10        .95 |
         |-------------------------------|
     11. |  561.2   10     10        .95 |
     12. |  688.1   10     10        .95 |
     13. |  568.9   10      9        .85 |
     14. |  529.2   10      6        .55 |
     15. |  555.1   10      7        .65 |
         |-------------------------------|
     16. |  642.9   10      9        .85 |
     17. |  755.9   10     10        .95 |
     18. |  891.2   10     10        .95 |
     19. | 1304.4   10     10        .95 |
     20. | 1486.7   10     10        .95 |
         +-------------------------------+
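
    As a check on the listing, the counts behind observation 4 can be reproduced by hand. This sketch types in the first four invest values for company 1 from the listing above, which make up the window for 1938, and repeats the three counts outside rangerun:

    ```stata
    clear
    input invest
    317.6
    391.8
    410.6
    257.7
    end

    * values below the current (last) value
    count if invest < invest[_N] & invest < .
    scalar i1 = r(N)

    * values tied with the current value, itself included
    count if invest == invest[_N] & invest < .
    scalar i2 = r(N)

    * nonmissing values in the window
    count if invest < .
    scalar n = r(N)

    display "rank = " (i1 + i2) ", pcrank = " ((i1 + 0.5 * i2) / n)
    ```

    Here i1 is 0, i2 is 1 and n is 4, so rank is 1 and pcrank is 0.5/4 = .125, matching the fourth row.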



    • #17
      Thanks a lot for your continued help, Nick.

      I have used the following code, which seems to run, but all the observations get the same value (0.71xx).
      total_buys is the variable I want to rank. id is a unique identifier for each observation. Date is a date variable that I believe is in proper Stata format. Am I making a mistake? Thanks again!

      Code:
      program pcrank 
          count if total_buys < total_buys[_N] & total_buys < . 
          scalar i1 = r(N)
          count if total_buys == total_buys[_N] & total_buys < . 
          scalar i2 = r(N)
          count if total_buys < . 
          gen n = r(N)
          gen rank = i1 + i2 
          gen pcrank = (i1 + 0.5 * i2) / n 
      end 
      
      rangerun pcrank, use(total_buys) int(Date -30 -1) by(id)



      • #18
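        You probably don't need by(id), which restricts the ranking to observations with the same id. Since your id is unique to each observation, every group then contains just one observation. As said in #16: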

        You may not have an analogue of the panel variable company. If not, there is no by() option call.
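
        A minimal sketch of the call without by(), on made-up data. The variable names follow #17, but the values are invented and the dates are unique by construction. It assumes rangerun is installed from SSC (I believe it also needs rangestat from SSC). The window here is -30 0 as in #16, so that the current observation is the last one in the window, which is what total_buys[_N] in the program relies on:

        ```stata
        * ssc install rangestat
        * ssc install rangerun
        clear
        set obs 100
        set seed 2718
        gen id = _n                              // unique identifier, as in #17
        gen Date = mdy(1, 1, 2020) + _n - 1      // one observation per day
        format Date %td
        gen total_buys = runiformint(1, 50)      // invented values

        capture program drop pcrank
        program pcrank
            count if total_buys < total_buys[_N] & total_buys < .
            scalar i1 = r(N)
            count if total_buys == total_buys[_N] & total_buys < .
            scalar i2 = r(N)
            count if total_buys < .
            gen n = r(N)
            gen rank = i1 + i2
            gen pcrank = (i1 + 0.5 * i2) / n
        end

        * no by() option: each window draws on all observations in the date range
        rangerun pcrank, use(total_buys) int(Date -30 0)

        list Date total_buys n rank pcrank in 1/10
        ```

        With one observation per day, n should climb from 1 to 31 and then stay there, which is a quick way to confirm that the window is doing what you intend.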



        • #19
          I noticed that the codes 1 to N generated by the group() function of egen run from the smallest value to the largest. How can I change this so that 1 to N runs from the largest value to the smallest?



          • #20
            You can't do that with -egen-. But you can do it from first principles, as illustrated in the following code, which uses auto.dta:

            Code:
            sysuse auto, clear
            
            gsort -mpg
            gen `c(obs_t)' wanted = sum(mpg != mpg[_n-1] & !missing(mpg))



            • #21
              Something like this:
              Code:
              egen unwanted = group(original_var) // or whatever egen ... group() statement you want
              qui su unwanted
              local num_groups = r(max)
              gen wanted = `num_groups' - unwanted + 1



              • #22
                You could just negate a numeric variable first. If you wanted value labels in terms of the original, that can be done. Here I use labmask from the Stata Journal for the latter.

                Code:
                . sysuse auto, clear
                (1978 automobile data)
                
                . gen negmpg = -mpg
                
                . egen wanted = group(negmpg)
                
                . labmask wanted, values(mpg)
                
                . tab wanted
                
                group(negmp |
                         g) |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                         41 |          1        1.35        1.35
                         35 |          2        2.70        4.05
                         34 |          1        1.35        5.41
                         31 |          1        1.35        6.76
                         30 |          2        2.70        9.46
                         29 |          1        1.35       10.81
                         28 |          3        4.05       14.86
                         26 |          3        4.05       18.92
                         25 |          5        6.76       25.68
                         24 |          4        5.41       31.08
                         23 |          3        4.05       35.14
                         22 |          5        6.76       41.89
                         21 |          5        6.76       48.65
                         20 |          3        4.05       52.70
                         19 |          8       10.81       63.51
                         18 |          9       12.16       75.68
                         17 |          4        5.41       81.08
                         16 |          4        5.41       86.49
                         15 |          2        2.70       89.19
                         14 |          6        8.11       97.30
                         12 |          2        2.70      100.00
                ------------+-----------------------------------
                      Total |         74      100.00
                
                . tab wanted, nolabel
                
                group(negmp |
                         g) |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |          1        1.35        1.35
                          2 |          2        2.70        4.05
                          3 |          1        1.35        5.41
                          4 |          1        1.35        6.76
                          5 |          2        2.70        9.46
                          6 |          1        1.35       10.81
                          7 |          3        4.05       14.86
                          8 |          3        4.05       18.92
                          9 |          5        6.76       25.68
                         10 |          4        5.41       31.08
                         11 |          3        4.05       35.14
                         12 |          5        6.76       41.89
                         13 |          5        6.76       48.65
                         14 |          3        4.05       52.70
                         15 |          8       10.81       63.51
                         16 |          9       12.16       75.68
                         17 |          4        5.41       81.08
                         18 |          4        5.41       86.49
                         19 |          2        2.70       89.19
                         20 |          6        8.11       97.30
                         21 |          2        2.70      100.00
                ------------+-----------------------------------
                      Total |         74      100.00
                The terms small and large suggest that numeric variables are what you're thinking about.



                • #23
                  And more succinctly following #21 and #22, to reverse ranking by negating the original values:

                  Code:
                   
                   egen unwanted = group(-original_var)



                  • #24
                    Leonardo Guizzetti: No, unfortunately. That trick works with the rank() function of egen but not with group(). The argument of group() must be a varlist, a list of one or more variable names, and a negative sign is not allowed in a varlist. It is allowed with rank() because that function takes an expression, in which negative signs are perfectly legal.
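
                    A short sketch of the contrast, using auto.dta. The names rank_desc and bad are just illustrative, and capture is there only so that the expected error from group() does not stop a do-file:

                    ```stata
                    sysuse auto, clear

                    * rank() takes an expression, so negating is legal:
                    * the highest mpg gets rank 1
                    egen rank_desc = rank(-mpg)

                    * group() takes a varlist, so this call should fail;
                    * capture traps the error and leaves the return code in _rc
                    capture egen bad = group(-mpg)
                    display "return code from group(-mpg): " _rc

                    gsort -mpg
                    list mpg rank_desc in 1/5
                    ```

                    Note that rank() by default gives tied values the mean of their ranks, so it is not a drop-in substitute for group(), which gives tied values the same integer code.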



                    • #25
                      Ah you’re right. I was definitely thinking of rank. Thanks for the correction.



                      • #26
                        This sub-thread goes back only to #19.

                        FWIW, here is another way to do it. Value labels are assigned automatically.

                        myaxis is from the Stata Journal. It was previously announced as added to SSC at https://www.statalist.org/forums/for...e-or-graph-use

                        SJ-21-3 st0654 . . Speaking Stata: Ordering or ranking groups of observations
                        (help myaxis if installed) . . . . . . . . . . . . . . . . N. J. Cox
                        Q3/21 SJ 21(3):818--837
                        discusses procedures for datasets based on aggregate
                        frequencies and for datasets based on individuals and
                        introduces a new convenience command, myaxis, that handles
                        many cases directly


                        Code:
                        sysuse auto, clear 
                        myaxis wanted=mpg, sort(mean mpg) descending
