Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating date variables fulfilling given criteria from multiple occurring events per person

    Hi,
    I am working on a dataset with ICD-codes (fractures in this case) and surgical procedure codes (NCSP) from hospital records data. The ICD-codes (ICD-9 and ICD-10), date of occurrence(Date_ICD) and ID is obtained from one data set, and surgical procedure codes (NCSP), date of occurrence(Date_NCSP) and ID is appended to the ICD-code data set as illustrated in the table bellow
    One person might have several ICD- and procedure codes, each with a corresponding date.
    The goal is to identify number of occurring fractures per person based on following criteria for fractures:
    1. ICD-NCSP criteria: Occurrence of an ICD-code and a surgical procedure code (NCSP) within plus/minus 2 months from the other date.
      E.g. The participant with ID= 8 is registered with Date_ICD and Date_NCSP on 01jun2021 = fracture.
    2. ICD-ICD criteria: Occurrence of two identical occurring ICD-codes with dates within plus minus three months of the other date. E.g. ID 7 has Date_ICD 01.mar 2013 and Date_ICD 01may2013 =fracture
    An example of the data set is given below
    Number ID Date_ICD ICD9 ICD10 Date_NCSP NCSP
    1 1 01feb197 805.3 . . .
    2 2 . . . 01may2017 WBGM00
    3 3 . . . 01nov2020 WBGM00
    4 4 01oct2018 S72.0 . .
    5 4 . . . 01sep2018 .
    6 5 01mar2000 . S52.5 . .
    7 5 01mar2000 . S52.5 . .
    8 5 01feb2000 . S52.5 . .
    9 6 . . . 01may2021 NCX32
    10 6 . . . 01jun2021 NCJ65
    11 6 01apr2009 S52.5 . .
    12 6 01jun2021 . S52.5 . .
    13 6 01may2021 S52.5 . .
    14 7 01mar2013 . S52.5 . .
    15 7 01may2013 . S52.5 . .
    16 8 01feb2009 . S42.2 . .
    17 8 01jan2009 . S42.2 . .
    18 8 01jun2021 . S72.1 . .
    19 8 . . . 01jun2021 NFJ61
    20 9 01feb2009 S42.2 . .
    21 9 01jan2009 . S42.2 . .
    22 9 . . . 01jun2021 NFJ61
    23 9 01jun2021 . S72.1 . .

    I have tried to figure out how to identify number of occurring fractures per individual based on the above stated criteria for fractures, but haven’t succeeded.
    I would be very pleased if I could get some tips regarding how to solve the following problems:
    1. For each individual experiencing a fracture that fulfil the above stated NCSP+ ICD-criteria, generate a variable with the date for the occurring Date_ICD event.
    2. For the ICD+ICD criteria, generate a variable that contains the date for the first occurring ICD-code.
      For individuals with several events fulfilling the stated fracture criteria, all the corresponding dates must be included for each individual.
    The result would preferably look something like:
    ID Date_ICD_NCSP_Fx Date_ICD_ICD_fx
    1 SomeDate SomeDate
    1 SomeDate .
    2 SomeDate .
    2 . SomeDate
    3 SomeDate .
    Hopefully it is possible to understand what I am trying to convey.

    Best wishes
    Ingebjørg

  • #2
    Your data tableau gives the visual gist of the data, but it does not convey metadata that is absolutely critical to writing code for your problem. I can make some educated guesses about what those might be, but, frankly, I will spend a lot more time setting up a facsimile of your data that reflects those guesses than it will take me to solve your actual problem. And it might all be a waste of time if those educated guesses prove wrong--the code I would write won't work in your actual data.

    Please post back with a data example created using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Added: While you're at it, your data has some problems that need attention. In observation number = 5, there is an NCSP date but no NCSP code. And in observation 1, the date is clearly invalid.
    Last edited by Clyde Schechter; 05 Apr 2022, 12:51.

    Comment


    • #3
      Also, you have some observations that are pure duplicates (e.g. number 6 and 7). That is usually indicative of data errors. If they are correct data (or are not pure duplicates when other variables in your real data set are taken into account) this raises a question: if a person has two identical ICD* codes on the same date, does that count as meeting the ICD_ICD criterion, or do the two dates have to be different?)

      Comment


      • #4
        Thank you so much for responding.
        This is what I get when I run the -dataex- command:
        . dataex in 1/50

        copy starting from the next line ---------- --------- --
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float ID str7 ICD9 str6 ICD10 float Date_ICD str10 NCSP float    Date_NCSP
        1 " 805.4" ""      13546 ""           .
        2 ""       ""          . "WBGM00" 20940
        3 ""       ""          . "WBGM00" 22220
        3 ""       ""          . "JUD05"  21550
        4 ""       ""          . "JUD05"  22371
        5 ""       ""          . "JUD05"  20606
        5 ""       ""          . "8065"   13209
        6 ""       ""          . "NBX30"  20545
        7 ""       ""          . "NBB02"  17592
        7 ""       ""          . "NBB12"  16223
        8 ""       ""          . "JUD05"  21154
        9 ""       "S72.0" 21458 ""           .
        9 ""       ""          . "NFB40"  21428
        10 ""       ""          . "NFB40"  17898
        11 ""       ""          . "JUD05"  20789
        12 ""       ""          . "NFB20"  17257
        12 ""       ""          . "NFB20"  17653
        13 ""       ""          . "WBGM00" 22159
        13 ""       ""          . "WBGM00" 18232
        13 ""       ""          . "WBGM00" 18232
        13 ""       ""          . "WBGM00" 18232
        13 ""       ""          . "WBGM00" 20240
        13 ""       ""          . "WBGM00" 21731
        13 ""       ""          . "JUD05"  21550
        13 ""       ""          . "WBGM00" 21550
        13 ""       ""          . "WBGM00" 21731
        13 ""       ""          . "WBGM00" 20270
        13 ""       ""          . "WBGM00" 22067
        14 ""       ""          . "JUD05"  21124
        15 ""       ""          . "JUD05"  21701
        16 ""       ""          . "WBGM00" 19632
        17 ""       "S52.5" 14670 ""           .
        17 ""       "S52.5" 14670 ""           .
        17 ""       "S52.5" 14641 ""           .
        18 ""       ""          . "NCX32"  22401
        18 ""       ""          . "NCJ65"  22432
        18 ""       "S52.5" 17988 ""           .
        18 ""       "S52.5" 22432 ""           .
        18 ""       "S52.5" 18018 ""           .
        18 ""       "S52.5" 22432 ""           .
        18 ""       ""          . "NCJ05"  22401
        18 ""       "S52.5" 22401 ""           .
        19 ""       ""          . "WBGM00" 22281
        19 ""       ""          . "WBGM00" 22097
        19 ""       ""          . "WBGM00" 22097
        20 ""       ""          . "WBGM00" 22312
        20 ""       ""          . "JUD05"  22250
        20 ""       ""          . "JUD05"  21185
        20 ""       ""          . "JUD05"  21609
        20 ""       ""          . "JUD05"  21366
        end
        format %td Date_ICD
        format %td Date_NCSP
        copy up to and including the previous line ----- --------- --

        Listed 50 out of 108648 observations

        I'm not sure why the dates appear the way as they do in the table. In STATA they are displayed as ddmmYYYY.
        Over to your other question regarding the duplicates:
        In the original data file, dates were reported as YYYY-MM, so all events happening in one month are reported as occurring 01.mm.yyyy. Duplicate dates are intended to be in the data set and meet the ICD-ICD criterion if the dates are the same. Hope this makes things a bit clearer.

        Ingebjørg

        Comment


        • #5
          I'm not sure why the dates appear the way as they do in the table. In STATA they are displayed as ddmmYYYY.
          This is how dates are supposed to appear in -dataex-. It makes them usable. At some point you should invest some time reading the date-time sections of the PDF manuals that come with your Stata installation so you learn how dates are handled in Stata. It's a long read, but it will pay off handsomely in your real-world use of Stata. Those numbers you see are the actual way that Stata represents those dates internally. The ddmmYYYY that you see in listings or in the data Browser are just a convenience display format that Stata uses--to make it understandable to human eyes. But in doing calculations, that human-readable stuff is not usable. The purpose of -dataex- is to give some code that those who want to help you can use to create a faithful replica of your data set, including details such as whether you really have Stata date variables or just have string variables that look like dates to human eyes but are useless for calculations. (Since you have the former, the code does not have to do anything to fix that problem.)

          Since the original dates were given only to monthly precision, I have chosen to calculate a Stata internal format monthly date variable, called mdate, to be used for the calculations. This will actually work much better, because I would otherwise have to proxy your 2 and 3 month criteria with an approximate number of days. Using the monthly dates, a difference of 2 months is 2 and a difference of 3 months is 3.

          Anyway, here is some code that does what I understand you to have asked for:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float ID str7 ICD9 str6 ICD10 float Date_ICD str10 NCSP float    Date_NCSP
          1 " 805.4" ""      13546 ""           .
          2 ""       ""          . "WBGM00" 20940
          3 ""       ""          . "WBGM00" 22220
          3 ""       ""          . "JUD05"  21550
          4 ""       ""          . "JUD05"  22371
          5 ""       ""          . "JUD05"  20606
          5 ""       ""          . "8065"   13209
          6 ""       ""          . "NBX30"  20545
          7 ""       ""          . "NBB02"  17592
          7 ""       ""          . "NBB12"  16223
          8 ""       ""          . "JUD05"  21154
          9 ""       "S72.0" 21458 ""           .
          9 ""       ""          . "NFB40"  21428
          10 ""       ""          . "NFB40"  17898
          11 ""       ""          . "JUD05"  20789
          12 ""       ""          . "NFB20"  17257
          12 ""       ""          . "NFB20"  17653
          13 ""       ""          . "WBGM00" 22159
          13 ""       ""          . "WBGM00" 18232
          13 ""       ""          . "WBGM00" 18232
          13 ""       ""          . "WBGM00" 18232
          13 ""       ""          . "WBGM00" 20240
          13 ""       ""          . "WBGM00" 21731
          13 ""       ""          . "JUD05"  21550
          13 ""       ""          . "WBGM00" 21550
          13 ""       ""          . "WBGM00" 21731
          13 ""       ""          . "WBGM00" 20270
          13 ""       ""          . "WBGM00" 22067
          14 ""       ""          . "JUD05"  21124
          15 ""       ""          . "JUD05"  21701
          16 ""       ""          . "WBGM00" 19632
          17 ""       "S52.5" 14670 ""           .
          17 ""       "S52.5" 14670 ""           .
          17 ""       "S52.5" 14641 ""           .
          18 ""       ""          . "NCX32"  22401
          18 ""       ""          . "NCJ65"  22432
          18 ""       "S52.5" 17988 ""           .
          18 ""       "S52.5" 22432 ""           .
          18 ""       "S52.5" 18018 ""           .
          18 ""       "S52.5" 22432 ""           .
          18 ""       ""          . "NCJ05"  22401
          18 ""       "S52.5" 22401 ""           .
          19 ""       ""          . "WBGM00" 22281
          19 ""       ""          . "WBGM00" 22097
          19 ""       ""          . "WBGM00" 22097
          20 ""       ""          . "WBGM00" 22312
          20 ""       ""          . "JUD05"  22250
          20 ""       ""          . "JUD05"  21185
          20 ""       ""          . "JUD05"  21609
          20 ""       ""          . "JUD05"  21366
          end
          format %td Date_ICD
          format %td Date_NCSP
          
          gen Date_ICD10 = Date_ICD if !missing(ICD10)
          rename Date_ICD Date_ICD9
          replace Date_ICD9 = . if missing(ICD9)
          
          rename (ICD* NCSP) code_=
          
          gen long obs_no = _n
          reshape long code Date, i(obs_no) j(code_type) string
          gen int mdate = mofd(Date) // GET MONTHLY DATE
          format mdate %tm
          
          assert missing(code) == missing(Date)
          drop if missing(code)
          
          by ID code (mdate), sort: gen byte ICD_ICD = inlist(code_type, "_ICD9", "_ICD10") ///
              & mdate[_n+1] - mdate <= 3
          by ID (mdate), sort: egen Date_ICD_ICD_fx = min(cond(ICD_ICD, mdate, .))
          format Date_ICD_ICD_fx %tm
          by ID (ICD_ICD), sort: replace ICD_ICD = ICD_ICD[_N]
          sort obs_no
          
          
              
          gen NCSP_Date_2 = mdate if code_type == "_NCSP"
          gen lower = mdate - 2 if code_type != "_NCSP"
          gen upper = mdate + 2 if code_type != "_NCSP"
          replace lower = 1 if code_type == "_NCSP"
          replace upper = -1 if code_type == "_NCSP"
          format NCSP_Date_2 lower upper %td
          
          preserve
          keep if !missing(NCSP_Date_2)
          keep ID NCSP_Date_2
          duplicates drop
          tempfile NCSP_obs
          save `NCSP_obs'
          restore
          
          drop NCSP_Date_2
          
          rangejoin NCSP_Date_2 lower upper using `NCSP_obs', by(ID)
          replace NCSP_Date_2 = . if code_type == "_NCSP"
          by ID (obs_no), sort: gen ICD_NCSP = !missing(NCSP_Date_2)
          by ID: egen Date_ICD_NCSP_fx = min(cond(ICD_NCSP, mdate, .))
          format Date_ICD_NCSP_fx %tm
          by ID (ICD_NCSP), sort: replace ICD_NCSP = ICD_NCSP[_N]
          drop NCSP_Date_2 lower upper
          duplicates drop
          sort obs_no
          Note: -rangejoin- is written by Robert Picard, and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

          Comment


          • #6
            Thank you so much for your help, the codes worked out perfectly. I will read the date-time section PDF as you recommended.

            Ingebjørg

            Comment

            Working...
            X