  • assigning a random switch date of switchers to non-switchers

    Hi everyone,

    Using data from a prospective cohort study, I want to look at patients taking 3 drugs. Among those, I would like to compare the characteristics and outcomes of those who remained on 3 drugs during follow-up (non-switchers) with those who switched to a simpler 2-drug regimen during follow-up (switchers). The "index date" (or baseline date) of switchers will be defined as the date of switching (switch_date) to the 2-drug regimen. To define an index date for non-switchers, I would like to randomly assign a switch date of switchers to each non-switcher. There are more switchers than non-switchers in the dataset. Can anyone help me with how to do this? The problem is that the follow-up durations and times of the non-switchers vary, so randomly assigning a switch date of switchers to non-switchers results in some "index dates" that fall outside the actual follow-up times of the non-switchers.

    This is how I tried:

    First, I generated a dataset of non-switchers, including a unique id (variable id), the start of follow-up (i.e., the date when the patient started using 3 drugs, variable firstmoddate), and the end of follow-up (variable lastenddate).


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double id float(firstmoddate lastenddate)
    10184 19663 23139
    10190 19663 23260
    10358 19663 23152
    10366 19663 23209
    10405 19663 21171
    10435 19663 23111
    10468 19663 23160
    10555 19663 23195
    10556 19663 23230
    10568 19663 20044
    end
    format %td firstmoddate
    format %td lastenddate
    I then assigned a random number (variable rand) to every id and saved this dataset as nonswitchers.dta:

    Code:
    set seed 20240516
    generate rand = runiformint(1, 977)  // draw from 1..977 so every value matches a switch-date row number
    
    save "nonswitchers.dta", replace
    Second, I extracted all switch dates of the switchers (n=977) to a separate dataset.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float switchdate
    20346
    21110
    20835
    21717
    19722
    19705
    20922
    21257
    20516
    23112
    end
    format %td switchdate
    I then assigned a number from 1 to 977 (variable rand) to each switch date, and merged the dataset containing the switch dates with the dataset of the non-switchers on the variable rand:

    Code:
    gen rand=_n
    
    merge 1:m rand using "nonswitchers.dta"
    
    keep if _merge==3
    The problem is that more than 30% of the switch dates that were randomly assigned to the non-switchers are outside of the follow-up time (i.e., before firstmoddate or after lastenddate). Is there a way to randomly assign a switch date of switchers to non-switchers that is within the follow-up time of the non-switchers?

    I use Stata version 16.1 on Windows.

    Thanks!

    Christine

  • #2
    The trick is to first match up each switch date with every non-switcher's follow-up interval that contains it, and then, for each non-switcher, select a switch date at random from among just those matches. For this, Robert Picard's -rangejoin- command is the perfect tool. It is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC. Here's how it works in your example data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double id float(firstmoddate lastenddate)
    10184 19663 23139
    10190 19663 23260
    10358 19663 23152
    10366 19663 23209
    10405 19663 21171
    10435 19663 23111
    10468 19663 23160
    10555 19663 23195
    10556 19663 23230
    10568 19663 20044
    end
    format %td firstmoddate
    format %td lastenddate
    tempfile non_switchers
    save `non_switchers'
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float switchdate
    20346
    21110
    20835
    21717
    19722
    19705
    20922
    21257
    20516
    23112
    end
    format %td switchdate
    tempfile switchdates
    save `switchdates'
    
    use `non_switchers', clear
    rangejoin switchdate firstmoddate lastenddate using `switchdates'
    
    set seed 20240516
    gen double shuffle = runiform()
    by id (shuffle), sort: keep if _n == 1
    rename switchdate index_date
    Notes:
    1. If there is some non-switcher whose follow-up interval does not contain any possible switch date, this code will assign a missing value to that non-switcher's index date.
    2. I saved your example data in tempfiles, but that is just for my convenience in this context. You can use your actual permanent data sets for this purpose.
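
    If it helps, here is a minimal sketch of the installation step and of a sanity check on the result. It is an illustration rather than part of the example above, and it assumes the code above has just been run, so that id, firstmoddate, lastenddate, and index_date are in memory. Only official Stata commands are used for the checks.

    Code:
    * One-time installation of the two community-contributed commands from SSC
    ssc install rangestat, replace
    ssc install rangejoin, replace
    
    * After running the code above: how many non-switchers got no index date (note 1)?
    count if missing(index_date)
    
    * Every assigned index date should lie within that person's follow-up interval;
    * -assert- stops with an error if any does not
    assert inrange(index_date, firstmoddate, lastenddate) if !missing(index_date)
    
    * There should be exactly one observation per non-switcher
    isid id
    If the -assert- and -isid- lines run silently, each non-switcher has a single index date inside their own follow-up, or a missing one when no switch date fits.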



    • #3
      Dear Clyde,

      Perfect, thank you very much, it worked!

      Christine
