Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting number of events within a dynamic period of time by group

    Hello,

    I have a number of variables. [ id (group id), med (medication), date (date medication taken), start ( beginning of observation period), end (end of observation period) ]

    I want to generate a variable called treated that:
    * coded as 1
    - if the patient received medication1 or medication 3
    - and the date (date variable) of 3 consecutive prescriptions (either for med 1 or med 3) is within 180 days period
    - and the 180 days period must be within the observational period ( after start and before end)

    * coded as 0 otherwise

    The problem is that I am using a huge panel data and the medications received could be over a period of years and I need only to group the patients who received a continuous treatment (180 days) of medication 1 or medication 3 only. The gap between the received medication can sometimes be more than 180 days and the patient could start a continuous treatment period before completing 180 days in a previous continuous treatment trial. Also sometimes the continuous treatment period is outside the observational period.
    Some patients could have more than one continuous treatment period within the observational period. If possible I need also to count the number of those periods per id.

    Please find below an artificial sample of my data:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str4 med double(date start end)
    1 "med2" 19002 18871 19539
    1 "med3" 19002 18871 19539
    1 "med2" 19025 18871 19539
    1 "med3" 19067 18871 19539
    1 "med3" 19094 18871 19539
    1 "med1" 19188 18871 19539
    1 "med2" 19218 18871 19539
    1 "med1" 19218 18871 19539
    1 "med1" 19276 18871 19539
    1 "med2" 19449 18871 19539
    2 "med2" 20210 19967 20635
    2 "med1" 20273 19967 20635
    2 "med3" 20367 19967 20635
    2 "med1" 20430 19967 20635
    3 "med3" 20031 19602 20269
    3 "med3" 20075 19602 20269
    3 "med3" 20120 19602 20269
    3 "med2" 20157 19602 20269
    3 "med2" 20218 19602 20269
    3 "med1" 20370 19602 20269
    end
    format %td date
    format %td start
    format %td end

    Thank you

  • #2
    I am not entirely sure what you mean by "the date (date variable) of 3 consecutive prescriptions (either for med 1 or med 3) is within 180 days period." It might mean that in that 180 day period there must be 3 consecutive prescriptions for med1, or 3 prescriptions for med 3, and it could also mean that no prescriptions for med2 are allowed in that interval. But I have taken it instead to mean simply that in that 180 day period the number of prescriptions written for med1 and for med3 must add up to at least 3. That is, I allow concurrent prescribing of med2, and I allow either med1 or med3 to count, regardless of whether the other has been prescribed. I also assume that in any given observation, the 180 day period must begin at the date of that observation; otherwise put, we do not look back into the past to find a med1/med3 prescription that might count as part of the 180 day period. We only look forward 180 days. If these interpretations are not what you had in mind, post back and clarify.

    Code:
    gen ref_start = max(date, start)
    gen ref_end = min(date + 180, end)
    format ref* %td
    gen byte target_med = inlist(med, "med1", "med3")
    
    rangestat (sum) target_count = target_med, by(id) interval(date ref_start ref_end)
    gen byte wanted = (target_count >= 3)
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC. To use it, you must also install -rangejoin-, by Robert Picard, also available from SSC.

    Added: Thank you for using -dataex- on your very first post.
    Last edited by Clyde Schechter; 22 Mar 2022, 13:36.

    Comment


    • #3
      Thanks for the mention of rangestat. While rangejoin requires rangestat, the converse is not true.

      Comment


      • #4
        Quite right, Nick. Sorry I got that mixed up. Thanks for noticing that.

        Comment


        • #5
          Thanks Clyde Schechter very much. Yes, the interpretations that you mentioned are what I meant. I will try the code.

          Your help is much appreciated!

          Comment


          • #6
            @Clyde Schechter I tried the code provided and I got values for med2 within the target_count variable

            I used the following code to restrict the command to med1 and med3 only

            . rangestat (sum) target_count = target_med if (med=="med1" | med=="med3") , by(id) interval(date ref_start ref_end)

            Is that a correct approach or am I missing something?

            Comment


            • #7
              I don't understand why you are adding -if (med == "med1" | med == "med3")- to the rangestat command. What are you trying to accomplish with that modification to the code I showed in #2? It doesn't make any sense to me; at least it doesn't fit with the logic I had in mind. All it would do is replace the results shown in observations with med = "med2" with missing values. Everything else would come out the same.

              Did you try the code in #2 as written? Was there a problem with the results? When I ran it, it worked correctly with your example data, at least correct based on my understanding of what you are looking for.

              If the code in #2 is not working properly, please post back with a -dataex- example where it produces incorrect results, and point out specifically what is incorrect, and what the correct result should be.

              Added: Wait, now I understand what you're saying. Apparently you didn't want any results to be shown in those observations where med == "med2." That wasn't what I understood in your original post: a med2 observation could lead off a 180 period in which there were subsequently 3 prescriptions for med1 or med3, so I counted those. But if all you want is to eliminate those results, then your modification with if (med=="med1" | med=="med3") is correct.
              Last edited by Clyde Schechter; 22 Mar 2022, 15:44.

              Comment


              • #8
                Yes, that is correct I don't want any results to be shown for med2.

                Thank you very much for your help.

                Comment


                • #9
                  I expanded the example data to cover dates before the observation period and when I applied the code I found that in observation # 21 and # 22 within id = 4 (in red below) , though the date was before the ref_start , they are still counted as events (results are variables target count and wanted). I made a modification to the code (in red) which excluded them from the counting (results are variables target_count2 and wanted2)

                  . rangestat (sum) target_count2 = target_med if (med=="med1" | med=="med3") & (date >= ref_start & date<= ref_end), by(id) interval(date ref_start ref_end)

                  @Clyde Schechter Could you give me your insight please?


                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float id str5 med double(date start end) float(ref_start ref_end) byte target_med double target_count byte wanted double target_count2 byte wanted2
                  1 "med2" 19002 18871 19539 19002 19182 0 . 0 . 0
                  1 "med3" 19002 18871 19539 19002 19182 1 3 1 3 1
                  1 "med2" 19025 18871 19539 19025 19205 0 . 0 . 0
                  1 "med3" 19067 18871 19539 19067 19247 1 4 1 4 1
                  1 "med3" 19094 18871 19539 19094 19274 1 3 1 3 1
                  1 "med1" 19188 18871 19539 19188 19368 1 3 1 3 1
                  1 "med2" 19218 18871 19539 19218 19398 0 . 0 . 0
                  1 "med1" 19218 18871 19539 19218 19398 1 2 0 2 0
                  1 "med1" 19276 18871 19539 19276 19456 1 1 0 1 0
                  1 "med2" 19449 18871 19539 19449 19539 0 . 0 . 0
                  2 "med2" 20210 19967 20635 20210 20390 0 . 0 . 0
                  2 "med1" 20273 19967 20635 20273 20453 1 3 1 3 1
                  2 "med3" 20367 19967 20635 20367 20547 1 2 0 2 0
                  2 "med1" 20430 19967 20635 20430 20610 1 1 0 1 0
                  3 "med3" 20031 19602 20269 20031 20211 1 3 1 3 1
                  3 "med3" 20075 19602 20269 20075 20255 1 2 0 2 0
                  3 "med3" 20120 19602 20269 20120 20269 1 1 0 1 0
                  3 "med2" 20157 19602 20269 20157 20269 0 . 0 . 0
                  3 "med2" 20218 19602 20269 20218 20269 0 . 0 . 0
                  3 "med1" 20370 19602 20269 20370 20269 1 . 0 . 0
                  4 "med3" 19116 19237 19904 19237 19296 1 3 1 . 0
                  4 "med1" 19149 19237 19904 19237 19329 1 3 1 . 0

                  4 "med1" 19237 19237 19904 19237 19417 1 3 1 3 1
                  4 "med1" 19267 19237 19904 19267 19447 1 2 0 2 0
                  4 "med1" 19276 19237 19904 19276 19456 1 1 0 1 0
                  4 "med1" 19814 19237 19904 19814 19904 1 1 0 1 0
                  4 "med1" 19942 19237 19904 19942 19904 1 . 0 . 0
                  4 "med2" 20076 19237 19904 20076 19904 0 . 0 . 0
                  end
                  format %td date
                  format %td start
                  format %td end
                  format %td ref_start
                  format %td ref_end
                  Last edited by Simon Fady; 22 Mar 2022, 23:35.

                  Comment


                  • #10
                    To help, I need more clarity on the problem. There are two distinct ways in which criteria can be applied in your data, and I have been apparently assuming incorrectly about which way.

                    A criterion such as med must be "med1" or "med3" can be a restriction on a) which observations are to get a value for "wanted", or b) which observations are to be counted when calculating wanted (perhaps for results in other, earlier observations). Or it can be both. In my original code in #2, I had assumed that -inlist(med, "med1", "med3")- applied only in manner b). And the code in #2 requires that date in the observation for which wanted is being calculated be between the start and end of the observation being counted for it, but did not require it to be in that range to be counted towards wanted for a different observation. I think your change here does not do the appropriate thing because it also removes these observations from receiving a value for wanted.

                    Generally speaking, using the -if- option in -rangestat- is tricky and inadvisable. (Your use of it in #6 happened to work out OK, so I didn't bring this up back then.) The problem is that when you do that, the observations excluded by -if- are not only excluded from getting results themselves, but are also excluded from being counted for the results in other observations--which is often not what is desired. So when the desire is simply to exclude observations from getting results, it is safer to just let -rangestat- calculate those results and then you replace them with missing values later.

                    I think the following code will do what you want, though I am not certain now that I fully understand the requirements:

                    Code:
                    gen ref_start = max(date, start)
                    gen ref_end = min(date + 180, end)
                    format ref* %td
                    gen byte target_med = inlist(med, "med1", "med3") & inrange(date, start, end)
                    
                    rangestat (sum) target_count = target_med, by(id) interval(date ref_start ref_end)
                    replace target_count = . if !inlist(med, "med1", "med3")
                    gen byte wanted = (target_count >= 3) if !missing(target_count)
                    This will exclude observations where date is not between start and end from being counted towards wanted for other observations (or for itself). It still assigns values of target_count and wanted to those observations. If you don't want those values assigned, add - | !inrange(date, start, end)- to the -replace target_count = . ...- command.

                    Comment


                    • #11
                      @Clyde Schechter I am sorry for being not clear.

                      I applied the code in #10 and the counting now is what I want after adding - | !inrange(date, start, end)- to the -replace target_count = . ...- command.

                      Thank you for your help and time, much appreciated!

                      Comment

                      Working...
                      X