My survey data record the choices an interviewee made from a given list of activities. In the actual data the number of choices is large, up to 3000, while the list is limited to 40 different activities. In the example below the corresponding figures are just 99 choices (reported in time order) and 5 different activities (coded 0 to 4).
I need to find the longest repeated sequence(s) of choices. "Repeated" means the sequence is observed at least twice. In the example, the sequence "0-4-1-1-0-4" occurs (at least) twice: from time 6 to 11, and again from time 76 to 81. Its length is 6, and from manual checking it seems that no longer sequence is repeated.
I care most about the length (6 in the example), but if possible I would also like the specific pattern ("0-4-1-1-0-4"). I guess there might be several different repeated patterns of the same length (6). If so, capturing any one of them would be more than I expect. Would catching all of them be overly complicated?
With my limited Stata skills, just extracting the length (6) from this maze is truly overwhelming. I would very much appreciate it if anyone could suggest a way to do this. Many thanks!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(time choice)
 1 0
 2 2
 3 0
 4 2
 5 4
 6 0
 7 4
 8 1
 9 1
10 0
11 4
12 1
13 4
14 4
15 0
16 1
17 4
18 4
19 0
20 2
21 0
22 3
23 0
24 2
25 2
26 1
27 0
28 1
29 0
30 4
31 0
32 3
33 4
34 1
35 2
36 0
37 0
38 3
39 3
40 3
41 3
42 2
43 4
44 0
45 3
46 4
47 1
48 3
49 2
50 3
51 2
52 3
53 2
54 2
55 4
56 2
57 4
58 1
59 1
60 2
61 0
62 4
63 3
64 1
65 3
66 0
67 0
68 4
69 3
70 2
71 4
72 1
73 0
74 4
75 1
76 0
77 4
78 1
79 1
80 0
81 4
82 3
83 3
84 3
85 1
86 1
87 0
88 4
89 1
90 2
91 4
92 2
93 1
94 1
95 3
96 0
97 2
98 3
99 4
end
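To make the definition of "repeated" concrete, here is a minimal sketch of one brute-force way the search might be done in Stata. It is only an illustration under stated assumptions, not a tested solution for the full data: it assumes a single interviewee, one observation per time point with no gaps, and that overlapping occurrences count as repeats. The names pos, window, longest and repeated are made up for this sketch. The idea is to build, for each starting position, the string of the next len choices, check whether any such string appears more than once, and increase len until none does. Stopping at the first length with no repeat is safe because any repeated sequence of length L contains a repeated sequence of length L-1 (its first L-1 choices).

```stata
* Toy data so the sketch runs on its own; skip this block and
* load the real data instead (variables: time choice).
clear
input byte(time choice)
1 0
2 4
3 1
4 2
5 0
6 4
7 1
8 3
end

sort time
gen long pos = _n            // hypothetical helper: start position of each window
gen str1 window = ""         // choices pos..pos+len-1 joined by "-"
gen str1 longest = ""        // repeated windows found at the last successful length
gen byte repeated = 0
local best_len = 0
local len = 1
local more = 1
while `more' {
    sort pos                 // restore time order so [_n + k] means "k steps later"
    * extend every window by one more choice
    quietly replace window = window + cond(window == "", "", "-") + string(choice[_n + `len' - 1])
    * windows that would run past the last observation are incomplete: blank them
    quietly replace window = "" if _n + `len' - 1 > _N
    * a window is repeated if the same string starts at two or more positions
    quietly bysort window (pos): replace repeated = (_N > 1) & (window != "")
    quietly count if repeated
    if r(N) == 0 {
        local more = 0       // no repeat at this length, so none at any longer length
    }
    else {
        local best_len = `len'
        quietly replace longest = cond(repeated, window, "")
        local ++len
    }
}
display "Longest repeated sequence has length `best_len'"
sort pos
list pos longest if longest != "", noobs
```

On the toy data this reports length 3 and lists the pattern "0-4-1" at start positions 1 and 5. If several different patterns share the maximum length, all of them end up in longest. The loop sorts the data once per candidate length, which should be manageable for a few thousand observations, but very long repeats could run into Stata's string-length limits; that case is untested here.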