Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining multiple observations into one patient

    Hi,

    I have a data set in which there are multiple observations (i.e. indicated by time of arrival) for the same patient, and right now, my data set has observations (for the same patient) on different rows. I would like the observations to be on the same row. Every patient may have a different number of observations (i.e., patient 1 has 3 observations, patient 2 has 2 observations, patient 3 has 5 observations, etc). There are some duplicate rows. However, there are some observations (i.e., indicated by time of arrival) may have different chief complaints and/or ICD codes. For example:

    ID Date/Time of Arrival Chief Complaint ICD
    1 11/18 10:00 am Chest pain X10
    1 11/18 10:00 am Back ache X10
    1 11/20 11:00 am Headache X11
    2 11/13 10:00 am Chest pain X12

    1) How do I get rid of duplicate rows?
    2) How do I transform multiple observations into one patient row (accounting for the fact that there may be different chief complaints - for the same visit time -- on different rows)?

    Thanks!!

    Tasha

  • #2
    What you post is not from a Stata data set. It can't be, because several of your "variable names" are illegal. You really need to have your data imported into Stata before you begin.

    Since you don't have a Stata data set, and therefore can't provide real example data (the table you show is not sufficient for crafting code), I will just answer your questions in general terms:

    1. -duplicates drop- will eliminate duplicate observations (not "rows.")
    2. The -reshape- command will do this. -help reshape-. That said, you will probably regret doing this. Most of Stata's data management and analysis commands work best with precisely the long layout that your table suggests you have. Putting it into wide layout, one observation per patient, is only useful for a small number of exceptional commands. So think twice, and then some, before doing this; it's usually a bad idea to arrange the data that way.

    Please read the Forum FAQ for excellent advice about the ways to show example data, code, and results that are helpful to those who want to help you. Pay particular attention to FAQ #12.

    If you need specific coding help at this point, do post back, using the -dataex- command to give example data from your actual Stata data set.

    Comment


    • #3
      Hi Natasha,

      I took a pass at what I think you want. However, let me echo Clyde's comment that for any regressions or any other analysis that you do with the data, you will probably want the data in the long format (like you have it now). It is also very easy in long formet to calculate the number of visits a patient had, average time between visits, whether they returned within 30 days, etc.

      So there is a helpful YouTube video on reshaping long to wide here and here.

      Code:
      * I built a toy dataset like yours to test the code
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte id double date str13 chief_complaint str3 icd
      1 21124 "Back Ache"     "X10"
      1 21124 "Chest Pain"    "X10"
      1 21128 "Headache "     "X11"
      1 21154 "Diabetes"      "E08"
      2 21094 "Back Ache"     "X10"
      2 21095 "Chest Pain"    "X10"
      3 21158 "Headache "     "X11"
      4 21094 "Diabetes"      "E08"
      4 21094 "Influenza"     "J09"
      4 21095 "Headache "     "X11"
      5 21154 "Heart attack"  "I25"
      5 21168 "Scarlet fever" "A38"
      end
      format %td date
      Code:
      * When you paste dates like 11/18/2017 into Stata from Excel, it treats the date like a string
      * This converts it to Stata date format
      gen double date2 = date( date, "MDY")
      format %td date2
      ssc install placevar  // in case you don't have it
      placevar date2, after(date)
      drop date
      rename date2 date
      
      * This actually reshapes the data to wide format
      duplicates drop
      bysort id: gen visit = _n
      reshape wide date chief_complaint icd, i(id) j(visit)
      So this took the data from long format (like this)
      Click image for larger version

Name:	Stata - ICD long.png
Views:	1
Size:	16.8 KB
ID:	1470125




      And converted it to this (which may be hard to read as a screenshot)
      Click image for larger version

Name:	Stata - ICD wide.png
Views:	1
Size:	15.7 KB
ID:	1470126

      Comment


      • #4
        Also, this is unrelated to your data management question, but since you are using ICD codes, it turns out that Stata has a lot of commands (both built in and user written) regarding ICD codes. Just type in “search ICD” in Stata or Help > Search > ICD to take a look at them.

        Comment


        • #5
          Hi,

          Thank you Clyde and David for your recommendations. As you can tell, I am very new to STATA, so I'm still learning how to even input data and post on the FAQ page. I will try the strategies you listed.

          Thanks so much!

          Tasha

          Comment


          • #6
            Hi Clyde and David,

            Thank you both for your help! Since this last post, I have looked at more Stata videos and the Statalist the FAQ forum. That being said, I am still interested in reformatting my data from long to wide format and I am still having trouble. My goal is to have one line per patient. For example, I would like, the data to look like:
            id1 Arrivaldate1 chiefcomplaint1v1 chiefcomplaint2v1 chiefcomplaint3v3 Arrivaldate2 chiefcomplaint1v2 chiefcomplaint2v2 chiefcomplaint3v3
            I have data management questions that I believe need to be answered/fixed before I can reformat the data set to get to online line per patient.

            Question 1) I tried to create a visit number variable, using this code:
            [code] list, sepby(id1)
            bysort id1 (Arrivaltime): generate visit=_n
            [code]

            But my result looked like this:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str10 id1 float(Arrivaltime visit)
            "B20389899" 20083  1
            "B20389899" 20083  2
            "B20389899" 20083  3
            "B20389899" 20083  4
            "B20389899" 20092  5
            "B20389899" 20092  6
            "B20389899" 20092  7
            "B20389899" 20092  8
            "B20377710" 20400  1
            "B20377710" 20400  2
            "B20377710" 20800  3
            "B20377710" 20800  4
            "B20377710" 20800  5
            "B20377710" 20800  6
            "B20377710" 20800  7
            "B20377710" 20800  8
            "B20377710" 20800  9
            "B20377710" 20800 10
            "B20377710" 20800 11
            "B20377710" 20800 12
            "B20377710" 20800 13
            "B20377710" 20800 14
            "B20377710" 20800 15
            "B20377710" 20800 16
            "B20377710" 20800 17
            "B20377710" 20800 18
            "B20377710" 20800 19
            "B20377710" 20800 20
            "B20377710" 20800 21
            "B20377710" 20800 22
            "B20377710" 20800 23
            "B20377710" 20800 24
            "B20377710" 20800 25
            "B20377710" 20800 26
            "B20377710" 20999 27
            "B20377710" 20999 28
            "B20377710" 20999 29
            "B20377710" 20999 30
            "B20377710" 21000 31
            "B20377710" 21000 32
            "B20377710" 21000 33
            "B20377710" 21000 34
            "B20377710" 21000 35
            "B20377710" 21000 36
            "B20377710" 21000 37
            "B20377710" 21000 38
            "B20377710" 21000 39
            "B20377710" 21000 40
            "B20377710" 21000 41
            "B20377710" 21000 42
            "B20377710" 21000 43
            "B20377710" 21000 44
            "B20377710" 21000 45
            "B20377710" 21000 46
            "B20377710" 21000 47
            "B20377710" 21000 48
            "B20377710" 21000 49
            "B20377710" 21000 50
            "B20377710" 21000 51
            "B20377710" 21000 52
            "B20377710" 21000 53
            "B20377710" 21000 54
            end
            format %tdNN/DD/CCYY Arrivaltime
            I am interested in recreating a variable such that when the arrival dates are the same for one id, that they are all coded as the same visit. For example, instead of how the data looks above example, I would like the data to look like this:
            "B20389899" 20083 1
            "B20389899" 20083 1
            "B20389899" 20083 1
            "B20389899" 20083 1
            "B20389899" 20092 2
            "B20389899" 20092 2
            "B20389899" 20092 2
            "B20389899" 20092 2
            "B20377710" 20400 1
            "B20377710" 20400 1
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2
            "B20377710" 20800 2

            2) I would like to have one line per patient. For example, I would like the data to look like this:
            ID1 Arrivaldate1 Abusescreeningq1v1 Abusescreeningresponse1v1 Abusescreeningq2v1 Absuesccreeningquestion2v1 chiefcomplaint1v1 chiefcomplaint2v1 chiefcomplaint3v3 Arrivaldate2 Abusescreeningq1v2 Abusescreeningresponse1v2 Abusescreeningq2v2 Absuesccreeningquestion2v2 chiefcomplaint1v2 chiefcomplaint2v2 chiefcomplaint3v3.

            PART A: I have two variables correspond (Abusescreenq and Abusescreeningresponse) that I would like to be on the same line. Right now the data looks like this:
            . dataex id1 Arrivaltime Abusescreeningq Abusescreeningresponse in 1/10

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str10 id1 float Arrivaltime long(Abusescreeningq Abusescreeningresponse)
            "B20389899" 20083 2 1
            "B20389899" 20083 1 1
            "B20389899" 20083 1 1
            "B20389899" 20083 2 1
            "B20377710" 20092 1 1
            "B20377710" 20092 2 1
            "B20377710" 20092 2 1
            "B20377710" 20092 1 1
            "B20377710" 20400 2 1
            "B20377710" 20400 1 1
            end
            format %tdNN/DD/CCYY Arrivaltime
            label values Abusescreeningq Abusescreeningresponse
            label values Abusescreeningresponse Abusescreeningresponse
            label def Abusescreeningresponse 1 "0", modify
            label def Abusescreeningresponse 2 "1 (Comment)", modify
            Instead, I would like the data to be on the same line. For example, I would like the data to look like below (without deleting other data):

            Code:
            input str10 id1 float Arrivaltime long(Abusescreeningq1 Abusescreeningresponse1 Abusescreeningq2 Abusescreeningresponse2 Abusescreeningq1date2 Abusescreeningresponse1date2 Abusescreeningq2date2 Abusescreeningresponse2date2)
            "B20389899" 20074 2 1 1 1
            "B20377710" 20098 2 1 1 1 20411 2 1 1 1
            end
            PART B: Similar to the question above, I would like responses to be on the same line for the same patient. However, I have several variables (such as diagnosis, chief complaint, tobaccouser, cigarette smoker) which occur on multiple lines for the same date. Additionally, some variables may have different responses for the same date and same patient (i.e., chest pain and sore throat for the variable Chiefcomplaint), which I assume will need to be recreated into separate variables (Chiefcomplaint1 Chiefcomplaint2).

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str10 id1 float Arrivaltime byte acuitylevel str42 icd str1 primarydx long Chiefcomplaint byte(ivdruguser sexuall1active)
            "B20389899" 20083 4 "R68.89"   "Y" 493 . .
            "B20389899" 20083 4 "R68.89"   "Y" 493 . .
            "B20389899" 20083 4 "R68.89"   "Y" 105 . .
            "B20389899" 20083 4 "R68.89"   "Y" 105 . .
            "B20389899" 20092 4 "Y04.1XXA" "Y"  50 . .
            "B20389899" 20092 4 "S00.81XA" "N"  50 . .
            "B20389899" 20092 4 "Y04.1XXA" "Y"  50 . .
            "B20389899" 20092 4 "S00.81XA" "N"  50 . .
            "B20377710" 20400 2 "E13.10"   "N" 541 . .
            "B20377710" 20400 2 "E13.10"   "N" 541 . .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 .
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 3
            "B20377710" 20800 3 "A08.4"    "Y"   4 0 2
            "B20377710" 20999 2 "R10.9"    "Y" 282 . .
            "B20377710" 20999 2 "R10.9"    "Y" 215 . .
            "B20377710" 20999 2 "R10.9"    "Y" 215 . .
            "B20377710" 20999 2 "R10.9"    "Y" 282 . .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 .
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 3
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20377710" 21000 2 "E11.9"    "Y" 282 0 2
            "B20399999" 20001 2 "T30.0"    "Y"  88 . .
            "B20399999" 20001 2 "T30.0"    "Y"  88 . .
            "B20367672" 20777 4 "IMO0002"  "Y" 321 . .
            "B20367672" 20777 4 "IMO0002"  "Y" 321 . .
            "B20367672" 20098 4 "A08.4"    "Y" 143 . .
            "B20367672" 20098 4 "A08.4"    "Y" 172 . .
            "B20367672" 20098 4 "A08.4"    "Y" 172 . .
            "B20367672" 20098 4 "A08.4"    "Y" 143 . .
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 1
            "E100137091" 20131 2 "R45.851"  "Y" 138 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 3
            "E100137091" 20131 2 "R45.851"  "Y" 505 0 1
            end
            format %tdNN/DD/CCYY Arrivaltime
            label values Chiefcomplaint Chiefcomplaint
            label def Chiefcomplaint 4 "ABDOMINAL PAIN", modify
            label def Chiefcomplaint 50 "ASSAULT VICTIM", modify
            label def Chiefcomplaint 88 "BURN", modify
            label def Chiefcomplaint 105 "CHEST PAIN", modify
            label def Chiefcomplaint 138 "DEPRESSION", modify
            label def Chiefcomplaint 143 "DIARRHEA", modify
            label def Chiefcomplaint 172 "EMESIS", modify
            label def Chiefcomplaint 215 "FLANK PAIN", modify
            label def Chiefcomplaint 282 "HYPERGLYCEMIA", modify
            label def Chiefcomplaint 321 "LACERATION", modify
            label def Chiefcomplaint 493 "SORE THROAT", modify
            label def Chiefcomplaint 505 "SUICIDAL IDEATION", modify
            label def Chiefcomplaint 541 "URINARY FREQUENCY", modify
            Thank you so much for your help. I appreciate it!

            Comment


            • #7
              I am interested in recreating a variable such that when the arrival dates are the same for one id, that they are all coded as the same visit. For example, instead of how the data looks above example, I would like the data to look like this:
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str10 id1 float Arrivaltime
              "B20389899" 20083
              "B20389899" 20083
              "B20389899" 20083
              "B20389899" 20083
              "B20389899" 20092
              "B20389899" 20092
              "B20389899" 20092
              "B20389899" 20092
              "B20377710" 20400
              "B20377710" 20400
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20800
              "B20377710" 20999
              "B20377710" 20999
              "B20377710" 20999
              "B20377710" 20999
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              "B20377710" 21000
              end
              format %tdNN/DD/CCYY Arrivaltime
              
              by id1 (Arrivaltime), sort: gen visit = sum(Arrivaltime != Arrivaltime[_n-1])
              As for Part A, it does not seem possible to do what you are asking here. Reshaping the data involves re-arranging it, but the total amount of information is conserved. Your long version has more data than your wide version. For the same person and visit date you have more observations in the long format than there are variables in the wide format. So there needs to be some process first that discards the data you don't want to keep, or something else is wrong.

              Let's get that resolved before proceeding farther.

              Comment

              Working...
              X