Help in generating variable that gives first missing observation after a series of minimum consecutive observations

John Carfi

Join Date: Nov 2019
Posts: 3

Help in generating variable that gives first missing observation after a series of minimum consecutive observations

10 Nov 2019, 18:23

Dear members,

I am writing with respect to the above indicated topic after looking on manuals and FAQ but not finding an exact answer.

I am working on panel data about mutual funds, with monthly information for the time dimension. As part of my filtering analysis, I have isolated panels (i.e. mutual funds) that have at least 12 months of consecutive non-missing values under a specific variable called 'Score' through the tsspell command and some additional functions.

Now, I have one dummy variable ('high_sust') that reports for each of these filtered mutual funds a constant observation each month. The issue is that I would like to replace this observation with a missing value from the month that the funds' consecutive non-missing values under the variable 'Score' terminate (provided that these consecutive observations are >= 12).

I have tried the following code, but get a syntax error, which is most likely related to the max command.

Code:

gen h_sust = high_sust
bysort SecId_n: replace h_sust = . if max(Score_cons) >= 12 & Score_cons[_n+1] == . & !missing(PercId)

I have tried other alternatives, such as:

Code:

bysort SecId_n: replace h_sust = . if Score_cons[_N] >= 12 & Score_cons[_n+1] == . & !missing(PercId)

My data looks like this (note that the SecId_n present in the code is just a fund identifier that I use for the panel data, PercId is another identifier):

Code:

* Dataex code
dataex PercId date Score Score_cons high_sust if high_sust == 5
* Example generated by -dataex-. To install: ssc install dataex
clear
input long PercId float date double Score float(Score_cons high_sust)
19 201601 46.92  1 5
19 201602 47.03  2 5
19 201603 47.19  3 5
19 201604 47.16  4 5
19 201605 46.78  5 5
19 201606 46.92  6 5
19 201607 45.98  7 5
19 201608 45.83  8 5
19 201609 45.07  9 5
19 201610 45.31 10 5
19 201611 45.68 11 5
19 201612 45.77 12 5
19 201701 45.73 13 5
19 201702 45.68 14 5
19 201703  45.7 15 5
19 201704 45.55 16 5
19 201705 45.49 17 5
19 201706 45.32 18 5
19 201707 45.14 19 5
19 201708 44.93 20 5
19 201709 44.71 21 5
19 201710  44.9 22 5
19 201711 45.03 23 5
19 201712 44.92 24 5
19 201801 45.14 25 5
19 201802 44.81 26 5
19 201803 44.82 27 5
19 201804 44.71 28 5
19 201805 44.76 29 5
19 201806 45.25 30 5
19 201807 47.09 31 5
19 201808 47.24 32 5
19 201809 47.64 33 5
19 201810 48.09 34 5
19 201811 48.06 35 5
19 201812 47.55 36 5
19 201901 47.55 37 5
19 201902 47.36 38 5
19 201903 47.26 39 5
19 201904 47.41 40 5
19 201905 47.25 41 5
19 201906 47.13 42 5
19 201907 47.48 43 5
19 201908 47.29 44 5
24 201601 46.19  1 5
24 201602 46.35  2 5
24 201603 46.33  3 5
24 201604     .  0 5
24 201605 45.29  1 5
24 201606     .  0 5
24 201607 42.54  1 5
24 201608 42.62  2 5
24 201609 42.56  3 5
24 201610 42.66  4 5
24 201611 42.72  5 5
24 201612 42.68  6 5
24 201701 42.69  7 5
24 201702 42.58  8 5
24 201703 42.56  9 5
24 201704 42.24 10 5
24 201705 41.81 11 5
24 201706 41.65 12 5
24 201707 41.66 13 5
24 201708 41.66 14 5
24 201709 41.57 15 5
24 201710 42.21 16 5
24 201711 42.48 17 5
24 201712 42.28 18 5
24 201801 42.01 19 5
24 201802  41.9 20 5
24 201803 41.78 21 5
24 201804 41.77 22 5
24 201805 41.98 23 5
24 201806 41.73 24 5
24 201807 41.42 25 5
24 201808 41.31 26 5
24 201809 41.04 27 5
24 201810 41.16 28 5
24 201811    41 29 5
24 201812 41.01 30 5
24 201901 40.96 31 5
24 201902 41.14 32 5
24 201903 41.14 33 5
24 201904 41.37 34 5
24 201905 41.82 35 5
24 201906 41.77 36 5
24 201907  41.7 37 5
24 201908  41.7 38 5
94 201601  47.8  1 5
94 201602 47.94  2 5
94 201603 48.17  3 5
94 201604 48.34  4 5
94 201605 47.84  5 5
94 201606 48.75  6 5
94 201607 48.64  7 5
94 201608 48.46  8 5
94 201609 48.28  9 5
94 201610 48.33 10 5
94 201611  48.8 11 5
94 201612 49.59 12 5
end
label values PercId PercDummyId
label def PercDummyId 19 "F000003U5F", modify
label def PercDummyId 24 "F000003YP9", modify
label def PercDummyId 94 "F00000N864", modify

However, I could not come up with a solution, so I thought I would check with the community.

Tags: None

John Carfi

Join Date: Nov 2019

Posts: 3
#2

12 Nov 2019, 08:36

Dear community,

just wanted to follow up on my previous message hoping anyone has possible suggestions for this.

In that case, I would be grateful if you could share your feedback.

Moreover, I'd be happy to give additional explanation on what I am trying to do or on my code if needed.

Thanks!

John

Last edited by John Carfi; 12 Nov 2019, 09:28.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35211
#3

12 Nov 2019, 09:05

I looked through this a couple of times and couldn't really get further than

through the tsspell command and some additional functions

where by functions you may well mean commands but -- more crucially -- you allude to code which you don't show, so that I have no idea exactly what you did.

Further, two of your commands refer to SecId_n and explaining that it is another identifier is not clear enough for me to follow. Simply, what is in the code should be in the data example.

I am puzzled to see high_sust described as a dummy variable when it has a value of 5.

I do know that tsspell (from SSC, as you are asked to explain) requires a tsset or xtset even to work at all and I could not see what that means in your case. The time variable here is not fit for Stata purposes as coding monthly dates with values like 201812 and 201901 leads to jumps of 89 from December to January and 1 otherwise, which (I surmise) can't be what you need or want. There is no sense in which tsset or xtset looking at such dates will think like a person and decode the dates as you wish them.

A better monthly date variable would be

Code:

gen mdate = ym(floor(date/100), mod(date, 100)) format mdate %tm

Also, as you guess, your use of the max() function (not command) is incorrect. Syntactically it is illegal and indeed it is also a misunderstanding of what the function does. That function needs two or more arguments and returns the maximum of whatever you feed it, in the case of two or more variables the maximum across observations. The function doesn't look at a group of values in a single variable and take the maximum over that group. Confusingly, perhaps, that is what the egen function max() does, but you can't use that on the fly.

Otherwise if you don't get a better answer than this, then I think you need to fix your date variable and then explain in terms of your original data what you want.

PS: Giovanni/John: please use your full real name consistently.

Last edited by Nick Cox; 12 Nov 2019, 09:22.
Comment
John Carfi

Join Date: Nov 2019

Posts: 3
#4

13 Nov 2019, 07:13

Hi Nick,

Thank you for your explanation - I guess I still have a lot to learn about Stata and the related best practices.

The advice on the date is helpful, as the results from running the tsspell command are now different.

I have to make some serious changes to my dataset due to some other mistakes I have noticed, so I will pause the query I posted above for now.

Thank you for your time!
Comment

Announcement

Help in generating variable that gives first missing observation after a series of minimum consecutive observations

Comment

Comment

Comment