Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate Hospital Length of Stay-longest duration between 2 dates per episode per member

    Hello Statalist,

    I am trying to find the longest duration between 2 dates per episode per member.

    Background: I have hospital claims data and I am ultimately trying to assess average lengths of stay in hospital. The problem is that multiple lines of claims can be generated for each episode of stay for a member. I would like to identify the longest of these claims for each given episode and use that for my average length of stay calculation.

    The variables I have are member id (mem_ckey), starting date of service (from_dt) and ending date of service (to_dt).

    What I am trying to generate is the following:

    The 2 variables I need to generate are length of stay (LOS) and an indicator of whether the stay was the longest stay among a set of overlapping stays (longest). The LOS is relatively easy, but the longest variable I am not so sure about. May I please get some help?

    mem_ckey from_dt to_dt LOS longest
    1 13APR2007:00:00:00.000000 16APR2007:00:00:00.000000 3 1
    1 13APR2007:00:00:00.000000 14APR2007:00:00:00.000000 1 0
    1 10MAY2007:00:00:00.000000 19MAY2007:00:00:00.000000 9 1
    2 28FEB2013:00:00:00.000000 04MAR2013:00:00:00.000000 5 0
    2 27FEB2013:00:00:00.000000 04MAR2013:00:00:00.000000 6 1
    2 28FEB2013:00:00:00.000000 03MAR2013:00:00:00.000000 4 0
    Thanks!
    Last edited by Karl Yesler; 19 Feb 2015, 11:21.
    I am using Stata SE x64 ver 13.1 with Win 7 x64 and with 8 GB of ram.

  • #2
    There is an ambiguity in your description, that is not illuminated by your sample data. What if there are two or more different observations that are all tied for longest LOS? Do you want to mark them all as "longest," or just pick one? If the latter, which one?

    For now, I'll just assume you want to mark them all as longest. This code should do that:

    Code:
    by mem_ckey (LOS), sort: gen byte longest = (LOS == LOS[_N])

    NOTE: This code will not work correctly if there are any observations for which LOS is missing.

    Comment


    • #3
      Hi Clyde,

      Thanks for pointing out the ambiguity. If there is a tie (which implicitly means they have the same from_dt and to_dt) I would like to just choose one amongst the ties to mark as longest=1

      However, I would like to point out that the suggested code would does not quite work since it would give me the longest LOS per member but I am interested in the longest LOS per episode per member. That is, for each episode, what is the claim line with the longest LOS? An episode is defined as one in which there is any overlap in the from_dt and to_dt. So for the sample data I provided for mem_ckey==1, that member has 2 episodes, (lines 1 and 2 are 1 episode since they have overlapping from_dt and to_dt). Therefore I would mark line 1 and 3 as longest=1.

      What would be the requisite code in this situation?

      Thanks!
      Last edited by Karl Yesler; 19 Feb 2015, 11:45.
      I am using Stata SE x64 ver 13.1 with Win 7 x64 and with 8 GB of ram.

      Comment


      • #4
        I realize that I didn't read your post carefully enough, and the code in my previous response was based on my erroneous assumption that all of the observations within each mem_ckey group constituted a single set of overlapping claims. Since they do not, you first need to identify runs of overlapping claims. So something like this:

        Code:
        by mem_ckey (from_dt to_dt), sort: gen byte claim_group = ///
            !inrange(from_dt, from_dt[_n-1], to_dt[_n-1])
        replace claim_group = sum(claim_group)
        gen LOS = to_dt - from_dt
        by claim_group (LOS), sort: gen byte longest = (LOS == LOS[_N])
        As before, this code will flag as "longest" all the observations that are tied for longest among a run of overlapping claims. If you only want to flag one, you need to identify a criterion for which one. And, also as before, it will not work properly if there are any missing LOS values.

        Finally, I have a deeper question for you. It seems you are trying to use the longest claim period as a proxy for the actual LOS. That seems wrong to me. Wouldn't a better proxy be the duration from the earliest start date to the latest end date of a group of overlapping claims?

        Comment


        • #5
          Karl,

          I wrote my #4 before seeing your #3, so I didn't respond to selecting one. If you want to "just choose one amongst the ties," I interpret that to mean you don't care which one, and you don't care whether it will be the same one each time you run the program with the same data. In that case:

          Code:
          by mem_ckey (from_dt to_dt), sort: gen byte claim_group = ///
              !inrange(from_dt, from_dt[_n-1], to_dt[_n-1])
          replace claim_group = sum(claim_group)
          gen LOS = to_dt - from_dt
          by claim_group (LOS), sort: gen byte longest = (_n==_N)
          By the way, a tie does not necessarily mean the same from and to dates. If one claim runs from Feb 1 to Feb 19 and another runs from Feb2 to Feb20, they are tied in length and overlap.
          Last edited by Clyde Schechter; 19 Feb 2015, 11:55.

          Comment


          • #6
            Karl - Without knowing your analytic objective, let me ask if you really want to select the longest of the overlapping lengths of stay, or would it make more sense to generate the LOS as the difference between the earliest from_dt and the latest to_dt among the episodes in a single claim_group (adopting Clyde's nomenclature)?

            Clyde - I do not think your code will handle the following case:
            Code:
            1apr2014 10apr2014
            2apr2014 3apr2014
            5apr2014 6apr2014
            But I don't immediately see how to do better. As a friend of mine once said, I don't have recommendations, only counterexamples.

            Comment


            • #7
              Hi Clyde,

              Thanks for your insight.

              Regarding your point: "I have a deeper question for you. It seems you are trying to use the longest claim period as a proxy for the actual LOS. That seems wrong to me. Wouldn't a better proxy be the duration from the earliest start date to the latest end date of a group of overlapping claims?"

              This is somethign I am considering but again, as a relative Stata novice, I am unsure how to code. I assume that in the "gen LOS" I need to specify to consider the duration from the earliest start date to the latest end date of a group of overlapping claims? If so, how might I code that?

              Thank you very much for your help!

              I am using Stata SE x64 ver 13.1 with Win 7 x64 and with 8 GB of ram.

              Comment


              • #8
                iam:

                You are right, my code will not handle that example. And it's not immediately obvious to me how to deal with it. The property of being a non-overlapping group is apparently something that cannot be defined "locally" by just looking at consecutive pairs of claims. Oh, my! I will ponder for a while and see if I come up with anything.

                @ Karl:
                Sorry--this is harder than I first thought.
                Last edited by Clyde Schechter; 19 Feb 2015, 12:46.

                Comment


                • #9
                  The problem of overlapping hospital dates shows up from time to time on Statalist (here's an example). Nick has a good article on how to approach the problem.

                  SJ-13-1 dm0068 . . . . . Stata tip 114: Expand paired dates to pairs of dates
                  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
                  Q1/13 SJ 13(1):217--219 (no commands)
                  tip on using expand to deal with paired dates

                  http://www.stata-journal.com/article...article=dm0068

                  Here's an example that identifies stays using the data provided and then some

                  Code:
                  input mem_ckey str10 (from_dt to_dt)
                  1    13APR2007    16APR2007
                  1    13APR2007    14APR2007
                  1    10MAY2007    19MAY2007
                  2    28FEB2013    04MAR2013
                  2    27FEB2013    04MAR2013
                  2    28FEB2013    03MAR2013
                  3 1apr2014 10apr2014
                  3 2apr2014 3apr2014
                  3 5apr2014 6apr2014
                  4 1apr2014 10apr2014
                  4 5apr2014 15apr2014
                  4 20apr2014 20apr2014
                  end
                  
                  * get data in shape
                  gen date1 = date(from_dt,"DMY")
                  gen date2 = date(to_dt,"DMY")
                  format %td date*
                  gen obs = _n
                  
                  * reshape to long with obs 1 = admission; obs 2 = discharge
                  reshape long date, i(obs) j(event)
                  
                  * define and inout increment for admission and discharge events
                  bysort obs (event): gen inout = cond(_n==1,1,-1)
                  
                  * for each patient, sort event by date; for multiple events on the same day,
                  * put admissions before discharge
                  gsort mem_ckey date -event
                  by mem_ckey: gen eventsum = sum(inout)
                  by mem_ckey: gen stay = sum(_n == 1 | eventsum[_n-1] == 0)
                  
                  * the length of stay
                  sort mem_ckey stay date
                  by mem_ckey stay: gen staystart = date[1]
                  by mem_ckey stay: gen stayend = date[_N]
                  by mem_ckey stay: gen los = stayend - staystart + 1
                  format %td staystart stayend
                  
                  * restore data to original format
                  bysort mem_ckey obs (event): keep if _n == 1
                  list, sepby(mem_ckey) noobs
                  Last edited by Robert Picard; 19 Feb 2015, 13:22. Reason: fixed link to Stata Journal article

                  Comment


                  • #10
                    Very nice, Robert. Thanks.

                    Comment


                    • #11
                      Clyde, Robert, and William (and Nick for the article),

                      Thank you all for your insightful comments. The reference and code referenced by Robert worked perfectly.

                      Thanks again!
                      I am using Stata SE x64 ver 13.1 with Win 7 x64 and with 8 GB of ram.

                      Comment

                      Working...
                      X