
  • Help with sequencing a string variable

    Hello!
    I have a string variable, new, which represents 36 months of a woman's reproductive status, and includes pregnancies, births, contraceptive use, and non-use.

    Contraceptive use is indicated by a character that corresponds to a specific method. Potential values include: 1 2 3 4 5 6 7 8 9 W N L C E S.

    Pregnancies are indicated through "P" for the months pregnant and "B" for birth, "T" for termination.

    For example, here is a woman's value (read from right to left).

    2222222200000LLLLLLBPPPPPPPPP000000

    You can see she had 6 months of non-use, followed by 9 months of pregnancy and a birth. After that she used method "L" for 6 months and then did not use for 5 months (0s), and then used method "2" for 8 months.

    I am trying to create a variable that indicates how many pregnancies (uninterrupted sequences of Ps) and contraceptive use episodes she had. In this case she had 1 pregnancy and 2 use episodes. This could also be 2 new variables: one for the number of pregnancies in the period and one for the number of use sequences in the period. I think R has sequence programming, such as entropy, that can do this easily, but I am having trouble finding a similar program or thinking up a workaround in Stata.


    Many many thanks in advance!

  • #2
    You can generalize the following by changing the character being counted (here, P):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str100 status
    "2222222200000LLLLLLBPPPPPPPPP000000"
    end
    
    gen counter= ustrregexra(status, "P([^P])", "P,$1")
    replace counter= ustrregexra(counter, "([P]$)", "$1,")
    gen pregnancy= length(counter)- length(status)
    drop counter
    Result:

    Code:
    . l
    
         +------------------------------------------------+
         |                              status   pregna~y |
         |------------------------------------------------|
      1. | 2222222200000LLLLLLBPPPPPPPPP000000          1 |
         +------------------------------------------------+
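
    The same idea can be repeated for other codes. The following is a minimal sketch, assuming the status variable from the example above. It swaps the comma-marking trick for a slightly different technique: collapse each run of the code to a single character, delete everything else, and count what remains. The n_ variable names are hypothetical.

    ```stata
    clear
    input str100 status
    "2222222200000LLLLLLBPPPPPPPPP000000"
    "PPPPPPPPP000000000PPPPPPPPP00000000"
    end

    * For each code of interest: collapse every run of that code to one
    * character, remove all other characters, and count what is left.
    * The remaining length equals the number of uninterrupted runs.
    * n_P, n_L and n_2 are hypothetical names chosen for this sketch.
    foreach c in P L 2 {
        gen n_`c' = ustrlen(ustrregexra(ustrregexra(status, "`c'+", "`c'"), "[^`c']", ""))
    }
    list n_P n_L n_2
    ```

    This only works as written for codes that are not regular-expression metacharacters, which holds for the letters and digits used here.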



    • #3
      Another approach:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str100 status
      "2222222200000LLLLLLBPPPPPPPPP000000"
      "PPPPPPPPP000000000PPPPPPPPP00000000"
      end
      
      * ssc install moss 
      moss status, match("(P+)") regex 
      
      gen count = 0
      foreach v of var _match* {
          replace count = count + 1 if `v' != ""
      }
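
      The loop may be optional. The help file for moss describes a variable, named _count by default, that holds the number of matches in each observation. A sketch under that assumption (worth confirming with help moss before relying on it):

      ```stata
      * ssc install moss
      clear
      input str100 status
      "2222222200000LLLLLLBPPPPPPPPP000000"
      "PPPPPPPPP000000000PPPPPPPPP00000000"
      end

      moss status, match("(P+)") regex

      * Assumption: moss creates _count with the number of matches per observation,
      * so it should agree with the count variable built by the loop above.
      list status _count
      ```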



      • #4
        Code:
        gen status2 = ustrregexra(ustrregexra(status,"[1-9WNLCES]","C"),"(.)\1+","$1") 
        gen contraceptive = ustrlen(ustrregexra(status2,"[^C]",""))
        gen pregnancies   = ustrlen(ustrregexra(status2,"[^P]",""))
        Code:
        . list , abbrev(32)
        
             +-----------------------------------------------------------------------------+
             |                              status   status2   contraceptive   pregnancies |
             |-----------------------------------------------------------------------------|
          1. | 2222222200000LLLLLLBPPPPPPPPP000000    C0CBP0               2             1 |
          2. | PPPPPPPPP000000000PPPPPPPPP00000000      P0P0               0             2 |
             +-----------------------------------------------------------------------------+
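
        To unpack the one-liner: the inner call recodes every method character to C, the outer call collapses each run of identical characters to a single character, and the two counts are the numbers of Cs and Ps that remain. Because every method becomes C before the runs are collapsed, a direct switch from one method to another counts as a single episode. If each method spell should count separately, one possible variant collapses first and filters afterwards. This is a sketch under that assumption; the names collapsed and method_spells are hypothetical, and the second data row is made up to show a direct switch.

        ```stata
        clear
        input str100 status
        "2222222200000LLLLLLBPPPPPPPPP000000"
        "2222LLLL0000PPPPPPPPPB0000000000000"
        end

        * Collapse runs before recoding, so adjacent spells of different
        * methods (e.g. L immediately followed by 2) stay distinct.
        gen collapsed = ustrregexra(status, "(.)\1+", "$1")

        * Keep only method characters; each one left marks one spell.
        gen method_spells = ustrlen(ustrregexra(collapsed, "[^1-9WNLCES]", ""))

        list collapsed method_spells
        ```

        On the made-up second row this variant counts 2 spells, whereas the recode-then-collapse version above would count 1 episode.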



        • #5
          These are all great fixes, thank you so so much!!
