
  • Expanding time-series dataset to include control days within strata of dow, month, year for a case-crossover analysis


    Hi there,

    I have a time series dataset with an ID variable and a date-of-death variable, along with information on latitude and longitude. I would like to expand the dataset to include control days (no death) that fall on the same day of the week within the same month of the same year, so that each individual serves as their own control. There should be three to four control days per case, depending on the length of the month.

    Here is an example of the dataset, which has been altered to make sure it is anonymised. There are about 1,100 observations.

    Code:
    input long n_eid double death_date float(lat lon death)
    425 22478 56.35  -3.2 1
    387 22313 53.75 -0.85 1
    218 22223 52.95  -6.6 1
    471 22146 57.15  -3.25 1
    583 22131 54.15  -.85 1
    455 22361 54.25  -2.1 1

    I calculated a stratum variable for each death date:

    Code:
    * CREATE YEAR X MONTH X DOW STRATUM VARIABLE
    gen month=month(death_date)
    gen year=year(death_date)
    gen dow=dow(death_date)
    egen stratum_YMD=group(year month dow)
    However, I am a bit stuck on how to expand the dataset to include the control dates within the same stratum for each individual. The dataset ideally would look something like:

    Code:
    input long n_eid double death_date float(lat lon death)
    425 22478 56.35  -3.2 1
    425 22464 56.35  -3.2 0
    425 22471 56.35  -3.2 0
    425 22485 56.35  -3.2 0
    425 22492 56.35  -3.2 0
    387 22313 53.75 -0.85 1
    387 22320 53.75 -0.85 0
    387 22327 53.75 -0.85 0
    387 22334 53.75 -0.85 0
    218 22223 52.95  -6.6 1
    218 22230 52.95  -6.6 0
    218 22237 52.95  -6.6 0
    218 22244 52.95  -6.6 0
    471 22146 57.15 -3.25 1
    471 22132 57.15 -3.25 0
    471 22139 57.15 -3.25 0
    471 22153 57.15 -3.25 0
    .
    .
    .
    The death variable indicates case (1) or control (0) date for that individual.

    Then I would merge my environmental exposure data into the dataset based on lat, lon and date, and fit a conditional logistic regression for a time-stratified case-crossover analysis. But first I need to expand the dataset to create the control days for each case. Any help would be appreciated.
    Last edited by Andrew Stevenson; 19 Feb 2024, 05:22.

  • #2
    The following method overgenerates candidate controls and then retains only those meeting the required conditions. As a result, it uses about twice as much memory and twice as much compute as is strictly necessary. But, assuming you have enough memory to run it, I think it is the best way to go because the code is simple and transparent.
    Code:
    clear
    input long n_eid double death_date float(lat lon death)
    425 22478 56.35  -3.2 1
    387 22313 53.75 -0.85 1
    218 22223 52.95  -6.6 1
    471 22146 57.15  -3.25 1
    583 22131 54.15  -.85 1
    455 22361 54.25  -2.1 1
    end
    format death_date %td
    
    //    CREATE CANDIDATE CONTROLS THROUGH 4 WKS BEFORE & AFTER CASE
    expand 9
    sort n_eid
    by n_eid, sort: replace death_date = death_date + (_n-6)*7 if inrange(_n, 2, 5)
    by n_eid: replace death_date = death_date + (_n-5)*7 if inrange(_n, 6, _N)
    by n_eid: replace death = 0 if _n != 1
    
    //    KEEP ONLY THOSE IN THE SAME CALENDAR MONTH
    by n_eid: keep if mofd(death_date) == mofd(death_date[1])
    
    //    SORT CHRONOLOGICALLY WITHIN ID (OPTIONAL)
    isid n_eid death_date, sort
    If this won't run in your setup due to memory limitations, post back and I'll work up code that first calculates the actual number of controls possible and creates only those in the first place. It's a bit more complicated, less transparent, and a little bit finicky to get the edge cases right.
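
    For reference, a minimal sketch of that leaner approach might look like the following. It starts from the original one-row-per-person data (before the expand 9 step) and assumes n_eid is unique at that point. The helper variables death_day, first_day, month_start, month_days and n_days, and the new variable date, are illustrative names, not part of the original data. Treat it as an illustration rather than checked production code.

    Code:
    //    ASSUMPTION: ONE ROW (ONE DEATH) PER PERSON
    isid n_eid

    //    FIRST DAY OF THE MONTH FALLING ON THE SAME WEEKDAY AS THE DEATH
    gen death_day = day(death_date)
    gen first_day = death_day - 7*floor((death_day - 1)/7)

    //    FIRST CALENDAR DAY AND LENGTH OF THE DEATH MONTH
    gen month_start = mdy(month(death_date), 1, year(death_date))
    gen month_days = day(dofm(mofd(death_date) + 1) - 1)

    //    NUMBER OF SAME-WEEKDAY DATES IN THE MONTH (4 OR 5, INCLUDING THE CASE)
    gen n_days = floor((month_days - first_day)/7) + 1

    //    CREATE EXACTLY THAT MANY ROWS AND ASSIGN THE DATES
    expand n_days
    bysort n_eid: gen date = month_start + first_day - 1 + 7*(_n - 1)
    format date %td
    replace death = (date == death_date)
    drop death_day first_day month_start month_days n_days

    For the example case on 17jul2021, first_day is 3 and n_days is 5, giving 3, 10, 17, 24 and 31 July, with death equal to 1 only on the 17th.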



    • #3
      Alternatively, generate all days in the month and year of death and keep those that fall on the same day of the week. With 1,100 observations, you are unlikely to run into memory issues.

      Code:
      clear
      input long n_eid double death_date float(lat lon death)
      425 22478 56.35  -3.2 1
      387 22313 53.75 -0.85 1
      218 22223 52.95  -6.6 1
      471 22146 57.15  -3.25 1
      583 22131 54.15  -.85 1
      455 22361 54.25  -2.1 1
      end
      
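      * one candidate row per possible day of the month (1-31)
      expand 31
      foreach period in year month dow {
          gen `period' = `period'(death_date)
      }
      * impossible dates (e.g. 31 Feb) come out missing and are dropped by keep
      bys n_eid: gen newdate = dmy(_n, month, year)
      format newdate %td
      keep if dow == dow(newdate)
      * death = 1 on the case day, 0 on control days
      replace death = death_date == newdate
      drop death_date
      Result: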

      Code:
      
      . l, sepby(n_eid)
      
           +----------------------------------------------------------------+
           | n_eid     lat     lon   death   year   month   dow     newdate |
           |----------------------------------------------------------------|
        1. |   218   52.95    -6.6       1   2020      11     3   04nov2020 |
        2. |   218   52.95    -6.6       0   2020      11     3   11nov2020 |
        3. |   218   52.95    -6.6       0   2020      11     3   18nov2020 |
        4. |   218   52.95    -6.6       0   2020      11     3   25nov2020 |
           |----------------------------------------------------------------|
        5. |   387   53.75    -.85       1   2021       2     2   02feb2021 |
        6. |   387   53.75    -.85       0   2021       2     2   09feb2021 |
        7. |   387   53.75    -.85       0   2021       2     2   16feb2021 |
        8. |   387   53.75    -.85       0   2021       2     2   23feb2021 |
           |----------------------------------------------------------------|
        9. |   425   56.35    -3.2       0   2021       7     6   03jul2021 |
       10. |   425   56.35    -3.2       0   2021       7     6   10jul2021 |
       11. |   425   56.35    -3.2       1   2021       7     6   17jul2021 |
       12. |   425   56.35    -3.2       0   2021       7     6   24jul2021 |
       13. |   425   56.35    -3.2       0   2021       7     6   31jul2021 |
           |----------------------------------------------------------------|
       14. |   455   54.25    -2.1       0   2021       3     1   01mar2021 |
       15. |   455   54.25    -2.1       0   2021       3     1   08mar2021 |
       16. |   455   54.25    -2.1       0   2021       3     1   15mar2021 |
       17. |   455   54.25    -2.1       1   2021       3     1   22mar2021 |
       18. |   455   54.25    -2.1       0   2021       3     1   29mar2021 |
           |----------------------------------------------------------------|
       19. |   471   57.15   -3.25       0   2020       8     3   05aug2020 |
       20. |   471   57.15   -3.25       0   2020       8     3   12aug2020 |
       21. |   471   57.15   -3.25       1   2020       8     3   19aug2020 |
       22. |   471   57.15   -3.25       0   2020       8     3   26aug2020 |
           |----------------------------------------------------------------|
       23. |   583   54.15    -.85       1   2020       8     2   04aug2020 |
       24. |   583   54.15    -.85       0   2020       8     2   11aug2020 |
       25. |   583   54.15    -.85       0   2020       8     2   18aug2020 |
       26. |   583   54.15    -.85       0   2020       8     2   25aug2020 |
           +----------------------------------------------------------------+
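
      As a sketch of the next steps described in #1, the expanded data could be checked, merged with the exposure data, and analysed along the following lines. The file name exposure.dta and the variable name exposure are hypothetical, and the sketch assumes that file holds one row per lat, lon and newdate, with lat and lon stored exactly as in this dataset (both float), since the merge needs exact matches on the keys.

      Code:
      * sanity checks: exactly one case day and 3-4 control days per person
      bysort n_eid: egen n_case = total(death)
      assert n_case == 1
      by n_eid: assert inrange(_N, 4, 5)
      drop n_case

      * hypothetical exposure file keyed on lat, lon and newdate
      merge m:1 lat lon newdate using exposure, keep(master match) nogenerate

      * conditional logistic regression, conditioning on the individual
      clogit death exposure, group(n_eid) or

      Because each person contributes a single death, grouping on n_eid is equivalent to grouping on the person's year, month and day-of-week stratum.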
      Last edited by Andrew Musau; 19 Feb 2024, 11:18.
