rolling percentile

Costas Antoniou

Join Date: Nov 2022

Posts: 27
#1

rolling percentile

05 Dec 2023, 08:30

Hello,
I have a panel dataset that looks like:
Id date x

I have multiple id's, and for each ID I have observations on a numeric X variable at different dates. The dates are not sequential (i.e., there are gaps between two dates) , and not always the same for all ID's.

I want to create a new variable that tells me what fraction of previous X values for the given ID are less than the current X. So basically, when the data is sorted by ID date, I want to know what fraction of X's are less than the current value of X for the specific ID.

Thank you,

Costas
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35698

05 Dec 2023, 14:24

Here's a stab at this using rangestat from SSC.

For values before the first, I take the line that there weren't any, so we can be confident that they weren't missing, but we don't know what they would have been, or at any rate we aren't entitled to report the fraction below the first value.

Also, if the present value is missing, I avoid the Stata convention that everything is less than missing in favour of a report that as we don't know the present value, we can't say what fraction of previous values were less.

Feel free to follow your own metaphysics.

Code:

webuse grunfeld, clear 

mata: 

mata clear 

real rowvector lessthan(colvector X) {
    scalar n, last, possible, actual 
    
    n = length(X); last = X[n]
    
    if (n == 1) return((0, 0, .)) 
    else {
        possible = nonmissing(X[1..n-1]); actual = sum(X[1..n-1] :< last)
        return((n-1, possible, (missing(last) ? . : actual/possible)))
    } 
}

end 

rangestat (lessthan) invest , int(year . 0) by(company)

rename (lessthan?) (n_previous n_nonmiss pr_lt)

l year invest *_* if company == 1

     +-------------------------------------------------+
     | year   invest   n_prev~s   n_nonm~s       pr_lt |
     |-------------------------------------------------|
  1. | 1935    317.6          0          0           . |
  2. | 1936    391.8          1          1           1 |
  3. | 1937    410.6          2          2           1 |
  4. | 1938    257.7          3          3           0 |
  5. | 1939    330.8          4          4          .5 |
     |-------------------------------------------------|
  6. | 1940    461.2          5          5           1 |
  7. | 1941      512          6          6           1 |
  8. | 1942      448          7          7   .71428571 |
  9. | 1943    499.6          8          8        .875 |
 10. | 1944    547.5          9          9           1 |
     |-------------------------------------------------|
 11. | 1945    561.2         10         10           1 |
 12. | 1946    688.1         11         11           1 |
 13. | 1947    568.9         12         12   .91666667 |
 14. | 1948    529.2         13         13   .69230769 |
 15. | 1949    555.1         14         14   .78571429 |
     |-------------------------------------------------|
 16. | 1950    642.9         15         15   .93333333 |
 17. | 1951    755.9         16         16           1 |
 18. | 1952    891.2         17         17           1 |
 19. | 1953   1304.4         18         18           1 |
 20. | 1954   1486.7         19         19           1 |
     +-------------------------------------------------+

Comment

Costas Antoniou

Join Date: Nov 2022

Posts: 27
#3

06 Dec 2023, 08:33

Hello Nick Cox , Thank you so much for this code, works perfectly!
with all best wishes,
Costas
Comment
Costas Antoniou

Join Date: Nov 2022

Posts: 27
#4

10 Jan 2024, 02:57

Hello Nick Cox

I am working on the code you kindly shared with me a while back, which works very well. However, my data have a feature that makes the result less than optimal.

Please see attached .dta file for an example of what my data looks like.

To remind you, I want to find what proportion of returns are less than the current value. However, I also want to include in the comparison the returns of the same day for each ID.

So, for example, for id==1 and date==02/01/2000, the reference set for company a is all the returns in the previous day for that ID (i.e., the returns of a, b and c in 01/01/2000) AND the returns of companies b and c in day 02/01/2000.

Any help with this would be greatly appreciated.

best wishes,

Costas
Attached Files

rolling2.dta (4.3 KB, 1 view)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

10 Jan 2024, 05:01

You're asked not to post .dta attachments. More importantly, that presumably is programmable, but you need quite a different approach. Someone else may want to make concrete suggestions.

Last edited by Nick Cox; 10 Jan 2024, 05:04.
Comment

Announcement