conditional program help

Leonard Scott

Join Date: Jan 2021

Posts: 33
#1


27 Dec 2021, 15:35

Hello:
I am trying to work with the following data set and build rules that flag a case when certain conditions are met. I am hoping for assistance.
Each id is a single case that occurs during a specific duration as indicated by the timestamps in date.
I would probably run this with a bysort in front of the ultimate code, as I want to ensure cases are sorted by id and date.

I would like to ultimately know if the final conditions occurred during each case id, sorted by date.
(We could generate a new variable, say final, that marks whether the final conditions were ultimately met)
Rules to generate a "Yes" vs. "No" in the final variable are:

1. We must encounter 2 or more consecutive occurrences of "1" being in condition to yield a "Yes" in final.
2. If condition starts with consecutive 1s, then has consecutive 0s, then back to consecutive 1s, "Yes" would go in final.
3. If condition is always with 1s and never gets to a 0, then "No" would go in final.
4. If condition starts with 1s, then changes to 0s for the remaining duration, then "No" would go in final.

Overall, having the variable final is optional, if you have a better way of flagging if the conditions are met for a given id. I just suggested that as I don't know any other ways.
Please assist with suggestions as to how to work with conditions and rules like this in general and thanks!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id double date float condition
1  1.893459e+12 0
1 1.8934593e+12 0
1 1.8934596e+12 0
1 1.8934599e+12 0
1 1.8934602e+12 0
2 1.8944604e+12 0
2 1.8944607e+12 0
2  1.894461e+12 1
2 1.8944613e+12 1
2 1.8944616e+12 1
2 1.8944619e+12 0
2 1.8944622e+12 0
3  1.894707e+12 1
3 1.8947073e+12 1
3 1.8947076e+12 1
3 1.8947079e+12 0
3 1.8947082e+12 0
3 1.8947085e+12 0
4 1.9278003e+12 1
4 1.9278006e+12 1
4 1.9278009e+12 1
4 1.9278012e+12 1
4 1.9278015e+12 1
4 1.9278018e+12 1
4 1.9278021e+12 1
5 1.9279008e+12 1
5 1.9279011e+12 1
5 1.9279014e+12 0
5 1.9279017e+12 0
5  1.927902e+12 0
5 1.9279023e+12 1
5 1.9279026e+12 1
end
format %tcNN/DD/CCYY_HH:MM date
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

27 Dec 2021, 16:21

I'm not sure I fully understand your conditions. In particular, I am applying condition 2 in the least restrictive way possible: it is met if the first two observations have condition == 1, there are then at least two consecutive observations somewhere after that with condition == 0, and after that run of consecutive condition == 0's there are at least two consecutive observations with condition == 1. What happens after that is entirely irrelevant. Conditions 1, 3, and 4, on this interpretation, automatically follow if condition 2 is met.

If that is what you meant, then I believe the following code does it:

Code:

* Guard: the logic below assumes condition is strictly 0/1
assert inlist(condition, 0, 1)
* Number the runs (spells) of equal condition values within each id
by id (date), sort: gen spell_num = sum(condition != condition[_n-1])
* Length of each spell
by id spell_num (date), sort: gen duration = _N
by id: egen duration_spell_2 = max(cond(spell_num == 2, duration, .))
by id: egen duration_spell_3 = max(cond(spell_num == 3, duration, .))
label define yesno 0 "No" 1 "Yes"
* Yes only if spells 1, 2 and 3 are 1s, 0s, 1s and each lasts at least 2 observations
by id: gen final:yesno = condition[1] == 1 & duration[1] >= 2 ///
    & inrange(duration_spell_2, 2, .) & inrange(duration_spell_3, 2, .)
Leonard Scott

Join Date: Jan 2021

Posts: 33
#3

28 Dec 2021, 07:11

Thank you Clyde Schechter,
I am going to investigate this code.

I am sorry if I wasn't clear before. I will try to clarify and give some background, to see if it helps further or changes your approach.

When condition == 1, that indicates "a problem is present"; when condition == 0, "a problem is not present". These problem flags are simple initial readings.
So I am trying to add a second layer of rules that decides when a true "final" problem exists.

For a problem to truly exist and make "final==Yes":
1. There must be at minimum 2 consecutive instances of a condition==1
Furthermore:
2. If there was a problem to start with on initial readings (condition==1 initially and for some consecutive length), then problem was corrected (condition==0 for some consecutive length), then problem recurs (condition==1 now again), we will count this as a true final==Yes.
However:
3. If a problem existed from the start (condition==1) and was never resolved during our measurements (never became condition==0) then we would not count this as true and final==No.
4. If a problem existed from the start (condition==1) and we corrected the problem to condition==0, which remained for the duration, then we would not count this as true and final==No.

Does this help or change anything?
Thanks!
-L
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

28 Dec 2021, 09:18

It seems to be a restatement of what you gave in #1. I think the best thing to do now is to try the code out in your full data and hand check several cases to see if it is producing what you intend. If it isn't, post back with example data that illustrates where it is going wrong and explain how what it produces differs from what you want.
Leonard Scott

Join Date: Jan 2021

Posts: 33
#5

02 Jan 2022, 13:36

Thank you Clyde Schechter again for your help.
I have been messing with this and testing it out. I do see your rationale overall and how this can work.

Question:
If I use the same sample set originally cited in my post and test this out with the code you provided, I do find an error in flagging the rules.
id == 2 is flagging as a "No", where it should be a "Yes" in final.
It should qualify because there was initially no problem (condition==0), then there became a problem (condition==1).

I'm wondering if this occurred because the last line of your code specifies that the 1st instance of condition in each id needs to == 1? (If that's what "condition[1]==1" means)?:

Code:

by id: gen final:yesno = condition[1] == 1 & duration[1] >= 2 ///
    & inrange(duration_spell_2, 2, .) & inrange(duration_spell_3, 2, .)

If we just remove the "[1]" after condition, this seems to fix it. Do you think that is correct or am I missing something?

I also tried to see if a small modification would work:

Code:

by id: gen final2:yesno = condition == 1 & spell_num >= 2

as I believe that this also flags the ids that meet the conditions I desire.
I would then be eliminating the parts that create and use the duration and duration_spell variables, and the final code would look like this:

Code:

assert inlist(condition, 0, 1)
by id (date), sort: gen spell_num = sum(condition != condition[_n-1])
label define yesno 0 "No" 1 "Yes"
by id: gen final2:yesno = condition == 1 & spell_num >= 2

However, I am not sure if I may be leaving out necessary restrictions by skipping the duration_spell parts.
Could you help me see the difference or if I am wrong?

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#6

02 Jan 2022, 14:27

Question:
If I use the same sample set originally cited in my post and test this out with the code you provided, I do find an error in flagging the rules.
id == 2 is flagging as a "No", where it should be a "Yes" in final.
It should qualify because there was initially no problem (condition==0), then there became a problem (condition==1).

I'm wondering if this occurred because the last line of your code specifies that the 1st instance of condition in each id needs to == 1? (If that's what "condition[1]==1" means)?:

In my reading of #1 I had understood it to be a requirement that the id start out with condition == 1. That was based on condition 2, which you restated in #3 as "If there was a problem to start with on initial readings (condition==1 initially and for some consecutive length) [emphasis added]". But apparently, that was a misunderstanding. Also, I note that id == 2 first has 2 observations with condition = 0, followed by three observations of condition = 1, and then two more of condition = 0. So we never return to 1 after the 1s give way to 0s, which seems, again, to contradict your having said "then problem recurs (condition==1 now again) [emphasis added]" in #3.

However, I am not sure if I may be leaving out necessary restrictions by skipping the duration_spell parts.
Could you help me see the difference or if I am wrong?

Well, the additional parts you are deleting were there to verify "at minimum 2 consecutive instances of a condition==1 [emphasis added]". Your revision only requires a single instance of condition == 1.
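
To make that concrete, here is a minimal sketch on a made-up three-observation id (toy data assumed for illustration, not the data from #1). A single isolated condition == 1 is enough for the revised command to produce a 1, and the result varies by observation rather than being constant within the id.

```stata
clear
input float id float date float condition
1 1 0
1 2 1
1 3 0
end

* Same spell numbering as in the original code
by id (date), sort: gen spell_num = sum(condition != condition[_n-1])

* The revised rule from #5 (value label omitted for brevity)
by id: gen final2 = condition == 1 & spell_num >= 2

* The lone 1 on date 2 is flagged although it has no consecutive neighbor
list, sepby(id)
```

If an id-level answer is wanted, the flag would still need to be collapsed to one value per id, and the spell lengths checked separately.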

At this point, the only thing that is clear to me is that we have had a massive misunderstanding regarding the meaning of the conditions in #1. For that reason, I am reluctant to try to revise the code because I have no confidence that I fully understand what you want. I'm hopeful that having explained the purpose of the various parts of the code I originally offered, you can modify it successfully. I will give you one part of the modification that you might have difficulty coming up with on your own. Because you do not require that the first observation of an id have condition = 1, I would change the command that first defines spell_num to:

Code:

by id (date), sort: gen spell_num = sum((condition != condition[_n-1]) & (sum(condition) > 0))

The occurrence of condition going from 1 to 0 and back to 1 again will then be marked in the data as -spell_num[_N] >= 3- (within an id). If all you require is that there be some spell of condition = 1 (whether or not it is followed by a spell of 0 and another spell of 1) then that will show up as -spell_num[_N] >= 1- (again, within an id). You can then decide whether you care about the length of the spells and include or exclude them accordingly.
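
As a rough sketch of how those pieces might be put together, the following uses a small made-up data set rather than the data in #1. The names seen1, recurs and any_problem are hypothetical, and the spell numbering is split into two steps purely for readability; it is meant to count spells the same way as the one-line command above.

```stata
clear
input float id float date float condition
1 1 0
1 2 0
1 3 0
2 1 0
2 2 1
2 3 1
2 4 0
3 1 1
3 2 1
3 3 0
3 4 0
3 5 1
3 6 1
4 1 1
4 2 1
4 3 1
end

* seen1 turns on at the first condition == 1 within an id (hypothetical helper)
by id (date), sort: gen byte seen1 = sum(condition) > 0

* Spells are counted only once a 1 has been seen, so a leading run of 0s is spell 0
by id (date): gen spell_num = sum((condition != condition[_n-1]) & seen1)

* Length of each spell, available if spell lengths turn out to matter
by id spell_num (date), sort: gen duration = _N

* Id-level flags: 1 -> 0 -> 1 pattern, and at least one spell of 1s
by id: gen byte recurs = spell_num[_N] >= 3
by id: gen byte any_problem = spell_num[_N] >= 1

list id date condition spell_num duration recurs any_problem, sepby(id)
```

In this toy data only id 3 goes from 1 to 0 and back to 1, so it is the only id with recurs equal to 1, while ids 2, 3 and 4 all have any_problem equal to 1. Whether to add a minimum-length requirement on duration depends on the rules you settle on.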
