
  • Identify a repeating sequence in an array (Stata)

    Dear list members, I am using Time Use Surveys that record people's activities at 10-minute intervals, with one activity code per 10-minute slot. I would like to find the repeating sequence of activities in an array and the number of times it occurs. For example:
    A = [44 68 91 876 44 68 91 44 68 91 54 44 68 91]; the repeated sequence is 44 68 91, and it occurs 4 times.
    Thank you for your help
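A minimal sketch of the logic, in Python rather than Stata, assuming that "the repeating sequence" means the run of two or more codes that occurs most often without overlapping, with ties going to the longer run and then to the run that appears first. The function names `count_nonoverlapping` and `most_repeated` are hypothetical.

```python
def count_nonoverlapping(seq, pattern):
    """Count non-overlapping occurrences of pattern in seq, scanning left to right."""
    pattern = tuple(pattern)
    k = len(pattern)
    count, i = 0, 0
    while i <= len(seq) - k:
        if tuple(seq[i:i + k]) == pattern:
            count += 1
            i += k  # jump past the match so occurrences cannot overlap
        else:
            i += 1
    return count


def most_repeated(seq, min_len=2):
    """Return (pattern, count) for the most frequent pattern of length >= min_len.

    Ties in count go to the longer pattern, then to the one that appears first.
    Returns None if no pattern occurs at least twice.
    """
    n = len(seq)
    best = None
    for k in range(min_len, n // 2 + 1):
        # dict.fromkeys keeps first-occurrence order, so tie-breaking is deterministic
        for pattern in dict.fromkeys(tuple(seq[i:i + k]) for i in range(n - k + 1)):
            c = count_nonoverlapping(seq, pattern)
            if c >= 2 and (best is None or (c, k) > (best[1], len(best[0]))):
                best = (pattern, c)
    return best


A = [44, 68, 91, 876, 44, 68, 91, 44, 68, 91, 54, 44, 68, 91]
print(most_repeated(A))  # ((44, 68, 91), 4)
```

This is brute force (roughly cubic in the length of the array), which should be acceptable for one day of 10-minute slots but would need a smarter approach for much longer arrays.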

  • #2
    There are ways to do that, but the most convenient one depends on the exact structure of your data. What you gave us cannot be a Stata dataset. You can give us a short extract of your data using the dataex command; see help dataex.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Dear Maarten,

      Thank you for your help. Here is an extract of my data, generated with dataex:

      dataex serial pnum daynum ActivPeakMor

      ----------------------- copy starting from the next line -----------------------
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long serial byte(pnum daynum) float ActivPeakMor
      11011202 1 1  310
      11011202 1 1 3210
      11011202 1 1 3110
      11011202 1 1 7241
      11011202 1 1  210
      11011202 1 1 3819
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1 3310
      11011202 1 1 3210
      11011202 1 1 3210
      11011202 1 1 3210
      11011202 1 1 3210
      11011202 1 1 3110
      11011202 1 1 3110
      11011202 1 1 3110
      11011202 1 1 3110
      11011202 1 1 7259
      11011202 1 1 5140
      11011202 1 1 5140
      11011202 1 1 5140
      11011202 1 1 5140
      11011202 1 1 5140
      11011202 1 1 7259
      11011202 1 1 7259
      11011202 1 1 7259
      11011202 1 1 7259
      11011202 1 1 7259
      11011202 1 1 7259
      11011202 1 1 3110
      11011202 1 1 5140
      11011202 1 1 5140
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1 8120
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1  210
      11011202 1 1 9360
      11011202 1 1 9360
      11011202 1 1 9360
      11011202 1 1 9360
      11011202 1 1 9360
      11011202 1 1 9360
      11011202 1 1 3610
      11011202 1 1 3610
      11011202 1 1 3610
      11011202 1 1 3610
      11011202 1 1 3610
      11011202 1 1 3610
      11011202 1 1 3240
      11011202 1 2  110
      11011202 1 2  110
      11011202 1 2  110
      11011202 1 2  111
      11011202 1 2  310
      11011202 1 2 3110
      11011202 1 2 3310
      11011202 1 2 3310
      11011202 1 2  210
      11011202 1 2  210
      11011202 1 2  210
      11011202 1 2  210
      11011202 1 2 3310
      11011202 1 2 3310
      11011202 1 2 7241
      11011202 1 2 7241
      11011202 1 2 7241
      11011202 1 2 7241
      11011202 1 2 3430
      11011202 1 2 3430
      11011202 1 2 3210
      11011202 1 2 3210
      11011202 1 2 3210
      11011202 1 2 3210
      11011202 1 2  210
      11011202 1 2 5110
      11011202 1 2 3210
      11011202 1 2 3210
      11011202 1 2 5110
      11011202 1 2 5110
      11011202 1 2 5110
      11011202 1 2 5120
      11011202 1 2 5120
      11011202 1 2 5120
      11011202 1 2 5120
      11011202 1 2 9360
      11011202 1 2 3610
      11011202 1 2 3611
      11011202 1 2 3611
      end
      label values daynum daynum
      ------------------ copy up to and including the previous line ------------------

      Listed 100 out of 1008513 observations
      Use the count() option to list more


      I would like to find the repeating sequence of activities and the number of times it occurred.

      Thank you
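In the long layout above, each person-day is identified by serial, pnum and daynum, so any search has to run within those groups. A Python sketch (not Stata), assuming the rows are sorted by person-day and then by time slot; the listing has no explicit slot variable, so row order is taken as time order. `by_person_day` and `count_pattern` are hypothetical names, and the pattern 3210 followed by 3110 is only an illustration.

```python
from itertools import groupby


def by_person_day(rows):
    """Map (serial, pnum, daynum) -> list of activity codes, keeping row order.

    rows: iterable of (serial, pnum, daynum, activity) tuples in time order.
    """
    days = {}
    for key, grp in groupby(rows, key=lambda r: r[:3]):
        days.setdefault(key, []).extend(r[3] for r in grp)
    return days


def count_pattern(seq, pattern):
    """Count non-overlapping occurrences of pattern in seq."""
    pattern = tuple(pattern)
    k, i, count = len(pattern), 0, 0
    while i <= len(seq) - k:
        if tuple(seq[i:i + k]) == pattern:
            count, i = count + 1, i + k
        else:
            i += 1
    return count


# First rows of days 1 and 2 for serial 11011202, pnum 1, copied from the listing
day1 = [310, 3210, 3110, 7241, 210, 3819, 210, 210, 210, 3310, 3210, 3210, 3210, 3210, 3110]
day2 = [110, 110, 110, 111, 310, 3110, 3310, 3310]
rows = [(11011202, 1, 1, a) for a in day1] + [(11011202, 1, 2, a) for a in day2]

for key, seq in by_person_day(rows).items():
    print(key, count_pattern(seq, (3210, 3110)))
# (11011202, 1, 1) 2
# (11011202, 1, 2) 0
```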



      • #4
        Dear Maarten,

        I reshaped the data to wide format:

        clear
        input long serial byte pnum float(WorktimeDay11 WorktimeDay12 WorktimeDay13 WorktimeDay14 WorktimeDay16 WorktimeDay17 WorktimeDay18 WorktimeDay19 WorktimeDay110 WorktimeDay111 WorktimeDay112)
        11011202 1 3210 3210 3210 3210 3110 3110 3110 7259 5140 5140 5140
        11011202 1 210 210 3310 3310 7241 7241 7241 3430 3430 3210 3210
        11011202 4 310 9210 9210 9210 9210 9210 9210 2110 2110 2110 2110
        11011202 4 110 110 110 110 110 110 110 111 111 111 3110
        11011203 1 111 3110 5120 5120 5120 5120 5120 210 300 300 300
        11011203 1 210 210 210 210 300 300 300 300 390 3440 3440
        11011207 1 110 110 3620 3710 3710 3710 3710 3710 210 310 310
        11011207 1 110 110 3110 3110 3110 3110 3110 3210 3210 310 310
        11011207 2 8100 8100 3620 3710 3710 3710 3710 210 210 210 210
        11011207 2 110 310 8100 8100 8100 8100 8100 8100 8100 8100 310
        11011209 1 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11011209 1 1110 1110 1110 1110 1110 1110 1110 1110 1110 1120 1110
        11011209 2 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110 5190
        11011209 2 1110 1110 1110 1110 1110 1110 1110 1120 1120 1110 1110
        11011210 1 310 9500 5120 6120 6120 6120 6120 6120 6120 4271 4271
        11011210 1 3819 310 310 310 310 3819 3819 3819 9360 9360 3611
        11011210 2 9120 9120 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11011210 2 110 110 310 310 310 9960 9960 9360 9360 9360 9360
        11011211 1 210 210 3130 7241 310 310 8100 8100 8100 9430 4320
        11011211 1 210 210 9950 210 3710 3710 3710 3710 3710 310 9500
        11011211 2 3110 3110 310 310 9360 7241 3240 3240 3240 3240 8110
        11011211 2 9950 210 210 3130 310 6110 6110 6110 6110 7241 5140
        11011212 1 8210 310 310 310 3110 210 210 3819 3310 3310 5120
        11011212 1 3819 3819 3819 310 310 310 310 3310 3130 3210 3210
        11011212 2 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11011212 2 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11011212 3 110 110 110 110 110 110 110 111 7250 7250 7250
        11011212 3 7250 310 5310 9210 9210 2110 2110 2110 2110 2110 2110
        11011212 4 110 110 110 110 110 110 110 111 7250 7250 7250
        11011212 4 310 310 9210 9210 5190 2110 2110 2110 2110 2110 2110
        11011213 1 1110 1110 1110 1110 1110 1110 1110 9360 3610 3610 3610
        11011213 1 1110 1110 1110 1110 1110 1110 1110 9500 310 9410 4000
        11011213 3 8210 8210 8210 8210 8210 8210 7330 7330 7330 7330 7330
        11011213 3 8210 8210 8210 8210 9210 2110 2110 2110 2110 2110 2110
        11011213 4 8210 8210 8210 8210 8210 8210 8210 2120 2120 2120 2120
        11011213 4 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11011217 1 3110 3110 3130 3130 8110 8110 8110 310 310 310 310
        11011217 1 3110 3110 210 210 210 5310 5310 5310 5310 5310 310
        11011219 1 110 111 8219 8219 8219 8219 8219 310 310 310 310
        11011219 1 110 110 110 110 110 110 110 111 111 111 111
        11011220 1 110 110 110 110 110 5140 3110 7241 7241 7241 7241
        11011220 1 3310 3320 3130 3130 3210 3210 1110 1110 1110 1110 1110
        11011220 2 110 110 110 110 110 110 111 310 210 210 210
        11011220 2 9120 210 210 210 9120 9120 1110 1110 1110 1110 1110
        11011220 3 110 110 110 110 110 111 8110 8110 210 210 210
        11011220 3 310 9120 9120 9120 9120 1399 1110 1110 1110 1110 1110
        11020202 1 110 110 111 111 310 310 310 210 210 210 5120
        11020202 1 110 111 111 111 3110 3110 3110 3110 3110 3110 8210
        11020202 2 110 110 3110 3310 9960 9960 9960 3310 9960 210 9820
        11020202 2 110 110 111 111 210 210 210 310 5140 3210 3000
        11020205 1 310 310 310 3000 3000 3000 210 210 3310 3310 3710
        11020205 1 110 110 111 7241 7241 3110 210 210 310 310 310
        11020206 1 111 111 8219 8219 8311 3310 3310 210 2190 1110 3440
        11020206 1 9960 111 111 210 210 210 310 210 210 210 210
        11020206 2 7259 7259 7259 7259 9120 9120 9120 9120 9120 9120 9120
        11020206 2 110 111 9990 3110 110 310 3110 210 210 210 3320
        11020206 3 110 110 110 110 110 110 110 110 110 111 310
        11020206 3 110 110 110 110 110 110 110 110 110 111 210
        11020206 4 110 110 111 310 8221 8221 210 210 8221 310 310
        11020206 4 9970 9970 9970 9970 9970 9970 9970 9970 9970 5140 5140
        11020207 1 210 210 5120 5120 5120 5120 5120 5120 5120 9940 3710
        11020207 1 8120 8120 8120 310 310 310 310 3110 3110 210 210
        11020208 1 110 110 110 110 5310 5310 5310 5310 5310 5310 300
        11020208 1 3290 3000 3000 3000 3000 9230 9230 9230 3290 210 3000
        11020208 2 3811 3811 7250 7250 7250 7250 7250 7250 7250 7250 7250
        11020208 2 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11020208 3 110 110 110 110 110 110 110 110 110 7251 7251
        11020208 3 9210 9210 9210 2110 2110 2110 2110 2110 2110 2110 2110
        11020211 1 110 110 3430 210 210 210 3130 3310 310 310 3210
        11020211 1 110 110 110 110 110 110 3110 210 210 210 210
        11020212 1 3130 3819 3819 3819 3819 3819 8219 8219 3210 3210 3210
        11020212 1 310 310 210 210 3210 310 7241 7241 310 9210 9210
        11020212 2 3110 3110 3110 3110 310 9230 9230 9230 9130 9130 9130
        11020212 2 3110 3110 3819 3819 3819 310 310 9520 9940 9520 9520
        11020212 3 110 110 111 3210 8210 9210 2110 2110 2110 2110 2110
        11020212 3 9960 9960 9960 9960 9960 9960 9960 9960 9960 9960 9520
        11020218 1 110 110 210 210 210 210 210 3000 3000 3000 3000
        11020218 1 110 110 210 210 210 210 210 3000 3000 3000 3000
        11020301 1 3630 3630 9370 9370 210 210 210 3220 3220 3220 3220
        11020301 1 210 210 310 310 3130 1110 1110 1110 1110 1110 1110
        11020301 2 3320 3320 310 310 310 3130 8110 3310 3310 3310 3110
        11020301 2 210 210 3310 3310 310 310 9360 9360 3611 3611 9610
        11020301 3 110 110 110 110 110 110 110 111 210 210 3210
        11020301 3 310 3110 210 210 9120 9120 1399 1110 1110 1110 1110
        11020305 1 9120 1399 1110 1110 1110 9110 9110 1399 1110 1110 1110
        11020305 1 110 110 110 110 110 110 111 8210 8210 3110 3110
        11020307 1 110 110 110 110 110 110 110 390 390 210 210
        11020307 1 110 110 110 110 110 110 110 310 310 310 310
        11020310 1 3240 9120 1110 1110 1110 1110 1110 1110 1110 1110 1110
        11020310 1 110 110 310 210 8110 8110 8110 8110 8110 8110 210
        11020310 2 3440 3440 3290 310 9390 9390 3310 310 310 310 210
        11020310 2 8110 3911 3110 3110 3110 3440 3440 3440 3440 3440 3110
        11020312 1 111 310 310 210 210 210 3727 3727 3727 9360 9310
        11020312 1 110 111 111 111 111 111 310 310 310 210 210
        11020312 2 310 310 210 210 210 210 3130 3210 310 9610 9610
        11020312 2 110 111 5110 5110 5110 3430 3430 8110 8110 310 310
        11020315 1 1110 1110 1110 1110 1110 1110 310 5140 5140 5140 5140
        11020315 1 8219 9120 9120 9120 1110 1110 1110 1110 1110 1110 1110
        11020315 2 110 110 110 110 110 110 110 310 8210 4210 210
        11020315 2 110 110 110 110 110 110 110 110 110 110 110
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 100 out of 16533 observations
        Use the count() option to list more
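One pitfall with the wide layout: if the variable names are ever sorted as text, WorktimeDay110 to WorktimeDay112 land between WorktimeDay11 and WorktimeDay12, which scrambles the time order. A Python sketch (not Stata), assuming, hypothetically, that each name is the prefix WorktimeDay1 followed by a slot number; `slot_order` is an illustrative helper. Note that WorktimeDay15 does not appear in the listing.

```python
import re


def slot_order(names, prefix="WorktimeDay1"):
    """Sort wide variable names by the integer that follows prefix.

    Assumes (hypothetically) that each name is prefix + slot number,
    e.g. WorktimeDay110 -> slot 10.
    """
    pat = re.compile(re.escape(prefix) + r"(\d+)$")
    return sorted(names, key=lambda name: int(pat.match(name).group(1)))


names = ["WorktimeDay11", "WorktimeDay110", "WorktimeDay111", "WorktimeDay112",
         "WorktimeDay12", "WorktimeDay13", "WorktimeDay14", "WorktimeDay16"]
print(sorted(names))      # text order puts 110-112 right after 11
print(slot_order(names))  # slot order: 11, 12, 13, 14, 16, 110, 111, 112
```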



        • #5
          Any ideas, please?





            • #7
              Your sample array A has no consecutive duplicates, whereas your longer sample (in #4) sometimes does (for example, the consecutive 3210s in your first line).

              That needs clarifying first: what output are you looking for in such cases?
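The two readings can be compared directly. A Python sketch (not Stata), assuming that a run of identical codes in consecutive 10-minute slots counts as one episode: collapsing runs gives the episode sequence plus each episode's length in slots, and a repeated-sequence search could then run on the episode sequence instead of the raw slots. `episodes` is a hypothetical helper.

```python
from itertools import groupby


def episodes(slots):
    """Collapse consecutive duplicate codes into (code, number_of_slots) pairs."""
    return [(code, len(list(run))) for code, run in groupby(slots)]


# First wide row in #4 (serial 11011202, pnum 1)
row = [3210, 3210, 3210, 3210, 3110, 3110, 3110, 7259, 5140, 5140, 5140]
eps = episodes(row)
print([code for code, _ in eps])  # [3210, 3110, 7259, 5140]
print([n * 10 for _, n in eps])   # minutes per episode: [40, 30, 10, 30]
```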






              • #8
                Some tricky code using duplicates and egen seems to provide a solution.

                Mate, if you are still looking for an answer, please clarify what you need in the case of consecutive duplicates.
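One possible reading of the duplicates/egen idea, sketched in Python rather than Stata (this is an interpretation, not the poster's code): treat every window of k consecutive codes as its own record, then count how often each distinct window occurs, much as grouping identical records and counting within groups would. The window length k is an assumption, and these counts include overlapping windows, so a run like 3210 3210 3210 3210 yields three (3210, 3210) windows rather than two non-overlapping ones. `window_counts` is a hypothetical name.

```python
from collections import Counter


def window_counts(seq, k):
    """Count every window of k consecutive codes (overlapping windows included)."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


A = [44, 68, 91, 876, 44, 68, 91, 44, 68, 91, 54, 44, 68, 91]
print(window_counts(A, 3).most_common(1))      # [((44, 68, 91), 4)]
print(window_counts([3210, 3210, 3210, 3210], 2))  # Counter({(3210, 3210): 3})
```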
