Hybrid long-wide. Help with transforming dataset

Jesper Eriksson

Join Date: Oct 2016
Posts: 98

Hybrid long-wide. Help with transforming dataset

24 Jun 2020, 09:05

Hi,

I recently got a dataset of 600 patients admitted to the ICU. It contains info regarding sex, age, height etc (data that occurs once and does not change). Patients stay in the ICU for different lengths of time, they also take different blood samples. It contains more than 1500 different types of samples.

My problem is that the dataset I received is a hybrid between long and wide format, and I can not get my head around how to solve it and make the set (preferably) long, or at least consistent.

Variables:
pat_id - anonymized patient identification number
AdmissionTime - Time patient was admitted to the ICU
DischargeTime - Time patient was discharged from the ICU
Sex - Sex of patient
Age - Age at admission
Dead - Dead during stay
DeathDate - Date of death
sample_datetime - Date and time when blood sample was taken
sample_type - Type of blood sample (ex hemoglobin, creatinine, potassium and so on)
sample_acute - 1 if sample was ordered acute, 0 if ordered with normal analysing time
sample_value - Value of the blood sample
sample_unit - The unit of the blood sample (milligrams/ml, mmol/L and so on)
sample_normal_ref - The normal reference range of the blood sample value

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pat_id double(AdmissionTime DischargeTime) str1 Sex str2 Age byte Dead int DeathDate double sample_datetime str105 sample_type byte sample_acute str24 sample_value str10 sample_unit str13 sample_normal_ref
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-St.bik ABL/PNA"     0 "26"   "mmol/L"     "22-27"    
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pO2 ABL/PNA"        0 "9,0"  "kPa"        "8,0-13,0" 
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-MetHb ABL/PNA"      0 "0,8"  "%"          "<1,5"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Glukos ABL/PNA"     0 "8,5"  "mmol/L"     "4,0-6,0"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-FO2Hb ABL/PNA"      0 "94"   "%"          ">93"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Ca,fri ABL/PNA"     0 "1,10" "mmol/L"     "1,02-1,31"
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Natrium ABL/PNA"    0 "137"  "mmol/L"     "137-145"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Laktat ABL/PNA"     0 "1,7"  "mmol/L"     "0,5-2,3"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "Syrgastillf% ABL/PNA"  0 "85"   "%"          ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pCO2 ABL/PNA"       0 "3,6"  "kPa"        "4,6-6,0"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-AnjGap ABL/PNA"     0 "6,8"  "mmol/L"     "8-16"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Klorid ABL/PNA"     0 "107"  "mmol/L"     "98-107"   
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Krea ABL/PNA"       0 "71"   "mikromol/L" "<100"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-BE(st) ABL/PNA"     0 "1"    "mmol/L"     "-3-3"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Kalium ABL/PNA"     0 "3,9"  "mmol/L"     "3,5-4,6"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pH ABL/PNA"         0 "7,54" ""           "7,35-7,45"
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-COHb ABL/PNA"       0 "0,1"  "%"          "<2,0"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-sO2 ABL/PNA"        0 "95"   "%"          ">95"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "Oxygentillförsel PNA" 0 "0,0"  "L/min"      ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Hb ABL/PNA"         0 "151"  "g/L"        "134-170"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "X-LD"                  1 "9,9"  "mikrokat/L" ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-LD"                  1 "10,4" "mikrokat/L" "<4,3"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Eosinofila granulo"  1 "<0,1" "x10(9)/L"   "0,0-0,5"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-CRP"                 1 "167"  "mg/L"       "<3"       
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Troponin T"          1 "15"   "nanog/L"    "<15"      
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-ASAT"                1 "2,07" "mikrokat/L" "<0,76"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Hemoglobin"          1 "155"  "g/L"        "134-170"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "Erc(B)-MCH"            1 "30"   "pg"         "27-33"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Monocyter"           1 "0,1"  "x10(9)/L"   "0,2-0,8"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-PK(INR)"             1 "1,2"  "INR"        "<1,2"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Pankreasamylas"      1 "0,77" "mikrokat/L" "0,15-1,10"
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-ALAT"                1 "1,04" "mikrokat/L" "<1,1"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Erytrocyter"         1 "5,1"  "x10(12)/L"  "4,2-5,7"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "Erc(B)-MCV"            1 "87"   "fL"         "82-98"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Lymfocyter"          1 "0,8"  "x10(9)/L"   "1,1-3,5"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Calcium"             1 "2,16" "mmol/L"     "2,15-2,50"
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-GT"                  1 "1,9"  "mikrokat/L" "<2,0"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-EVF"                 1 "0,44" ""           "0,39-0,50"
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Neutrofila granulo"  1 "6,7"  "x10(9)/L"   "1,6-5,9"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-APT-tid"             1 "30"   "s"          "20-30"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Albumin"             1 "22"   "g/L"        "34-45"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Trombocyter"         1 "254"  "x10(9)/L"   "145-348"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Basofila granulo"    1 "<0,1" "x10(9)/L"   "0,0-0,1"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Leukocyter"          1 "7,6"  "x10(9)/L"   "3,5-8,8"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Bilirubin"           1 "13"   "mikromol/L" "<26"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "Oxygentillförsel PNA" 0 "16,0" "L/min"      ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-Natrium ABL/PNA"    0 "137"  "mmol/L"     "137-145"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-sO2 ABL/PNA"        0 "91"   "%"          ">95"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-Hb ABL/PNA"         0 "152"  "g/L"        "134-170"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-AnjGap ABL/PNA"     0 "7,1"  "mmol/L"     "8-16"     
end
format %tcnn/dd/ccYY_hh:MM AdmissionTime
format %tcnn/dd/ccYY_hh:MM DischargeTime
format %tdnn/dd/CCYY DeathDate
format %tcnn/dd/ccYY_hh:MM sample_datetime
label values pat_id pat_id

When I have cleaned sample that was missing values, non-interesting sample-types and more I have 640k observations left, 416 different patients. patients are staying in the ICU from 2 days up until 60+ days. All patients have different samples taken and at different times. As you can see the problem is that the sample_* variables are listed one by one as observations.

What I would like:
Short answer is something that is easier to work with. Long or wide format.

Maybe something like this:
1. Long format
2. One day (days since admission) on each row. "day" == 1 corresponding to the first calender day admitted. day == 2 the second day and so on. Up until day 60+ for a couple of patients. That would probably mean that a lot of patients, who did not spend 60+ days in the ICU would have missing on all sample_* variables for the days after they were discharged. Fine by me.
3. One sample is one variable. Ex "aB-St.bik ABL/PNA" is one, "aB-pO2 ABL/PNA" the next. 671 different samples types.
If several samples of the same type was taken on the same day I would like the second to be named "aB-St.bik ABL/PNA_2nd" and the third sample of the same type and day be named "aB-St.bik ABL/PNA_3rd" and so on.
4. Corresponding values for each sample on respective observation (i.e. day).

Maybe something like this:

pat_id

day

AdmissionTime

DischargeTime

Sex

Age

Dead

Deathdate

aB_Stbik_ABL/PNA_1st

aB_Stbik_ABL/PNA_1st_datetime

aB_Stbik_ABL/PNA_1st_acute

aB_Stbik_ABL/PNA_1st_unit

aB_Stbik_ABL/PNA_2nd

aB_Stbik_ABL/PNA_2nd_datetime

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 12:08

mmol/L

4/11/2020 14:10

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/16/2020 17:08

mmol/L

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

I hope I could make myself understood. Other suggestions how to arrange my dataset is more than welcome.

All the best,

Jesper Eriksson

Tags: None

Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

24 Jun 2020, 12:45

I'd say that the best data layout will substantially depend on what questions you want to ask of the data, and what particular statistical procedures you want to perform, so what follows might be off base. Some thoughts from you what you want out of this data might completely change my thinking.

Leaving that aside: I'd start off by making a data set in which each observation is a person-day. I would go "wide" with respect to "samples" on a given day. I'm confused by one thing you say: "That would probably mean that a lot of patients, who did not spend 60+ days in the ICU would have missing on all sample_* variables for the days after they were discharged." I don't see why there would need to be any observations for a person *after* they were discharged because I don't see any variables whose values change after discharge. I would just have data records for every day on which a measurable event occurred in the ICU, that is, no data records after the time of death or discharge. So, I would not think that your thinking about missing values would apply.

I don't think your idea of naming *variables* by the type of sample likely to be workable because the sample type values are relatively long strings containing characters that can't be part of a variable *name* (e.g., "." or "-"). Further, there are so many different kinds of samples that this would create huge number of sample variables that are irrelevant to most individuals. In lieu of that, I'd suggest that in going to a format that is wide with respect to repeated samples, you could order and name the samples by type and time, so that, for example, if two B_Hemoglobin samples were analyzed, the first one in time would appear in one sample variable and the second one in the next. I also suspect that there is not any particular purpose for which the naming you want is going to be necessary or useful, but I might be completely wrong.

Finally: While your data example is helpful and remarkably complex, it's only for one individual, so I may not have thought of some subtleties that would occur had I seen more individuals.

Code:

compress // save space and mess // Day in ICU numbered sequentially from 1. gen int ICUday = 1+ dofc(sample_datetime) - dofc(AdmissionTime) order ICUday, after(AdmissionTime) // Number samples sequentially by type and secondarily by time. bysort pat_id ICUday (sample_type sample_datetime): gen int sequence = _n // Each observation will be a patient day, with samples going wide reshape wide sample_*, i(pat_id ICUday) j(sequence) // I like to have the time-invariant variables first. order pat_id AdmissionTime-DeathDate
Comment
Jesper Eriksson

Join Date: Oct 2016

Posts: 98
#3

25 Jun 2020, 00:10

Originally posted by Mike Lacy View Post

I'd say that the best data layout will substantially depend on what questions you want to ask of the data, and what particular statistical procedures you want to perform, so what follows might be off base. Some thoughts from you what you want out of this data might completely change my thinking.

Leaving that aside: I'd start off by making a data set in which each observation is a person-day. I would go "wide" with respect to "samples" on a given day. I'm confused by one thing you say: "That would probably mean that a lot of patients, who did not spend 60+ days in the ICU would have missing on all sample_* variables for the days after they were discharged." I don't see why there would need to be any observations for a person *after* they were discharged because I don't see any variables whose values change after discharge. I would just have data records for every day on which a measurable event occurred in the ICU, that is, no data records after the time of death or discharge. So, I would not think that your thinking about missing values would apply.

I don't think your idea of naming *variables* by the type of sample likely to be workable because the sample type values are relatively long strings containing characters that can't be part of a variable *name* (e.g., "." or "-"). Further, there are so many different kinds of samples that this would create huge number of sample variables that are irrelevant to most individuals. In lieu of that, I'd suggest that in going to a format that is wide with respect to repeated samples, you could order and name the samples by type and time, so that, for example, if two B_Hemoglobin samples were analyzed, the first one in time would appear in one sample variable and the second one in the next. I also suspect that there is not any particular purpose for which the naming you want is going to be necessary or useful, but I might be completely wrong.

Finally: While your data example is helpful and remarkably complex, it's only for one individual, so I may not have thought of some subtleties that would occur had I seen more individuals.

Code:

compress // save space and mess // Day in ICU numbered sequentially from 1. gen int ICUday = 1+ dofc(sample_datetime) - dofc(AdmissionTime) order ICUday, after(AdmissionTime) // Number samples sequentially by type and secondarily by time. bysort pat_id ICUday (sample_type sample_datetime): gen int sequence = _n // Each observation will be a patient day, with samples going wide reshape wide sample_*, i(pat_id ICUday) j(sequence) // I like to have the time-invariant variables first. order pat_id AdmissionTime-DeathDate

Hi Mike, thanks for your reply.
I should of course have told you all what I am planning to do. I am going to perform longitudinal clustering, based on several variables. At the moment not sure which ones. I have to try several before I can close in on a final model.
In my analysis it is important that the samples are taken in approximately the same time. Therefore I would like to sort the samples by day, I am using a finite mixture models (traj plugin for stata is one example), i need the samples ordered in time with approximately the same time-intervals. So your suggestion, however appealing, would mean that in my wide-format dataset I can not tell if for example B_Hemoglobin_2 is sampled on day 1 or 2 or even 10. For one patient it could have been sampled on day 1 and for another on day 10. All depending on how many samples and when they were taken. Some patients (the sickest ones) have a lot of samples taken repeatedly the first critical days. The more stable ones have fewer.

Hence my efforts to sort my samples in a way (long or wide, does not matter) that means that all samples named B_Hemoglobin_1st_1 is taken on day 1st day and is the first sample that day. B_Hemoglobin_2nd_1 is the second sample taken on day 1, B_Hemoglobin_1st_2 is the first on the second day and so forth. In this way i could know which samples are taken on a specific day and compare those between patients. If i am numbering the samples sequentially i have no way of knowing if two samples are sampled on the same day. It will be very hard for me to use the datetime-stamp for respective sample in my analysis later.

What i meant by: "That would probably mean that a lot of patients, who did not spend 60+ days in the ICU would have missing on all sample_* variables for the days after they were discharged." was that if i would arrange my dataset in long format like the table in my example, a patient who were discharged on day 60 would have samples taken most days, in contrast to a patient who were discharged on day 2 who wouldn't have any samples after day 2 (naturally...). However if i am going long, all patients would indeed have days up to the maximum number of days someone was admitted (63 days if i am not mistaken). I guess i can not have a long format with some patients sampled for 2 days and some others sampled for 63 days? All patients need to have the same number of days in long format, right? This is not a big problem however, if patients are right-censored due to discharge (or death).

The names of the string variables are a problem, I agree. I do not know how to tackle that easily. I guess some kind of loop to remove non-allowable characters and to shorten the names.

I will also end up with a lot of samples variabled, that is certainly correct. I am not sure if this is feasible actually.

So to summarize, i need some way of knowing (by variable name) if the samples are taken on the same day. That would mean that I could have several observations each person-day (if a patient have the same type of sample analysed in the same day).

I will paste another patient in dataex after this post.

Again, thanks for your reply. Much appreciated.

/Jesper
Comment

Jesper Eriksson

Join Date: Oct 2016
Posts: 98

25 Jun 2020, 00:38

Patient number 4 (randomly taken). first 50 lines:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pat_id double(AdmissionTime DischargeTime) str1 Sex str2 Age byte Dead int DeathDate double sample_datetime str105 sample_type byte sample_acute str24 sample_value str10 sample_unit str13 sample_normal_ref
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-EVF ABL/PNA"       0 "0,42"         ""           "0,35-0,46"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Hb ABL/PNA"        0 "138"          "g/L"        "117-153"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Glukos ABL/PNA"    0 "8,1"          "mmol/L"     "4,0-6,0"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-pCO2 ABL/PNA"      0 "7,1"          "kPa"        "4,6-6,0"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-BE(a) ABL/PNA"     0 "1"            "mmol/L"     "-3-3"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Ca,fri ABL/PNA"    0 "1,22"         "mmol/L"     "1,02-1,31"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-pH ABL/PNA"        0 "7,33"         ""           "7,35-7,45"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Kalium ABL/PNA"    0 "3,8"          "mmol/L"     "3,5-4,6"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-COHb ABL/PNA"      0 "0,7"          "%"          "<2,0"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Klorid ABL/PNA"    0 "91"           "mmol/L"     "98-107"       
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-AnjGap ABL/PNA"    0 "9,4"          "mmol/L"     "8-16"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-sO2 ABL/PNA"       0 "15"           "%"          ">95"          
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Natrium ABL/PNA"   0 "128"          "mmol/L"     "137-145"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-pO2 ABL/PNA"       0 "2,0"          "kPa"        "8,0-13,0"     
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-St.bik ABL/PNA"    0 "23"           "mmol/L"     "22-27"        
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902141059999.9998 "aB-Laktat ABL/PNA"    0 "2,1"          "mmol/L"     "0,5-2,3"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "B-Leukocyter"         1 "28,7"         "x10(9)/L"   "3,5-8,8"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "B-Hemoglobin"         1 "125"          "g/L"        "117-153"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "S-S 100 B"            1 "28"           "mikrog/L"   "<0,11"        
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-Natrium"            1 "127"          "mmol/L"     "137-145"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-Fibrinogen (koag)"  1 "2,7"          "g/L"        "2,0-4,2"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "B-Erytrocyter"        1 "4,0"          "x10(12)/L"  "3,9-5,2"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-Kalium"             1 "3,4"          "mmol/L"     "3,5-4,6"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-PK(INR)"            1 "1,1"          "INR"        "<1,2"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "Erc(B)-MCH"           1 "31"           "pg"         "27-33"        
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "Pt-eGFR(Krea)relativ" 1 "74"           "mL/min/1.7" ">60"          
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "Erc(B)-MCV"           1 "88"           "fL"         "82-98"        
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-Kreatinin"          1 "57"           "mikromol/L" "<90"          
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "B-EVF"                1 "0,36"         ""           "0,35-0,46"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "S-Etanol"             1 "Ej påvisbar" "mmol/L"     "Ej påvisbart"
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-APT-tid"            1 "28"           "s"          "20-30"        
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "B-Trombocyter"        1 "307"          "x10(9)/L"   "165-387"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1.9021425e+12 "P-Calcium"            1 "2,30"         "mmol/L"     "2,15-2,50"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-pCO2 ABL/PNA"      0 "5,4"          "kPa"        "4,6-6,0"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Hb ABL/PNA"        0 "109"          "g/L"        "117-153"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Laktat ABL/PNA"    0 "0,8"          "mmol/L"     "0,5-2,3"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-pO2 ABL/PNA"       0 "14,7"         "kPa"        "8,0-13,0"     
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Glukos ABL/PNA"    0 "7,9"          "mmol/L"     "4,0-6,0"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Kalium ABL/PNA"    0 "3,2"          "mmol/L"     "3,5-4,6"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-EVF ABL/PNA"       0 "0,34"         ""           "0,35-0,46"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-COHb ABL/PNA"      0 "0,5"          "%"          "<2,0"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Ca,fri ABL/PNA"    0 "1,22"         "mmol/L"     "1,02-1,31"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-BE(a) ABL/PNA"     0 "-2"           "mmol/L"     "-3-3"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Natrium ABL/PNA"   0 "125"          "mmol/L"     "137-145"      
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-AnjGap ABL/PNA"    0 "7,6"          "mmol/L"     "8-16"         
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-Klorid ABL/PNA"    0 "94"           "mmol/L"     "98-107"       
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-sO2 ABL/PNA"       0 "98"           "%"          ">95"          
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-St.bik ABL/PNA"    0 "23"           "mmol/L"     "22-27"        
4 1903447800000.0002 1903548180000 "F" "78" 1 22035 1902144539999.9998 "aB-pH ABL/PNA"        0 "7,36"         ""           "7,35-7,45"    
4 1903447800000.0002 1903548180000 "F" "78" 1 22035      1902154020000 "S-Prokalcitonin"      1 "0,37"         "mikrog/L"   "<0,5"         
end
format %tcnn/dd/ccYY_hh:MM AdmissionTime
format %tcnn/dd/ccYY_hh:MM DischargeTime
format %tdnn/dd/CCYY DeathDate
format %tcnn/dd/ccYY_hh:MM sample_datetime
label values pat_id pat_id

Comment

Jesper Eriksson

Join Date: Oct 2016
Posts: 98

25 Jun 2020, 00:45

50 random observations:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pat_id double(AdmissionTime DischargeTime) str1 Sex str2 Age byte Dead int DeathDate double sample_datetime str105 sample_type byte sample_acute str24 sample_value str10 sample_unit str13 sample_normal_ref
 26 1903984199999.9998                  . "F" "74" 0     . 1906408380000.0002 "aB-Glukos ABL/PNA"    0 "13,3" "mmol/L"     "4,0-6,0"  
 46 1904254199999.9998 1904753280000.0002 "F" "72" 0     .      1904362860000 "P-GT"                 1 "1,1"  "mikrokat/L" "<1,3"     
 49      1.9033164e+12 1906133999999.9998 "M" "71" 0     . 1905593160000.0002 "aB-AnjGap ABL/PNA"    0 "7,7"  "mmol/L"     "8-16"     
 52      1903330260000 1905170400000.0002 "M" "70" 0     . 1903396379999.9998 "aB-EVF ABL/PNA"       0 "0,37" ""           "0,39-0,50"
 53      1.9014717e+12 1901624399999.9998 "M" "70" 0     . 1901426460000.0002 "B-Monocyter"          0 "0,5"  "x10(9)/L"   "0,2-0,8"  
 78 1902570900000.0002                  . "M" "67" 0     . 1905566940000.0002 "Pt-eGFR(Krea)relativ" 1 ">90"  "mL/min/1.7" ">60"      
 86      1902693660000 1906729139999.9998 "M" "66" 0     . 1903558739999.9998 "aB-EVF ABL/PNA"       0 "0,28" ""           "0,39-0,50"
101      1902268440000      1903702920000 "M" "65" 0     . 1903212900000.0002 "aB-pH ABL/PNA"        0 "7,42" ""           "7,35-7,45"
105      1.9040022e+12 1905184680000.0002 "M" "65" 0     .      1904749680000 "aB-Natrium ABL/PNA"   0 "145"  "mmol/L"     "137-145"  
106      1.9018233e+12 1904824800000.0002 "M" "65" 0     .      1902376380000 "P-Albumin"            1 "13"   "g/L"        "36-45"    
107      1.9016214e+12       1.903122e+12 "F" "65" 1 22026      1902777780000 "aB-AnjGap ABL/PNA"    0 "9,3"  "mmol/L"     "8-16"     
109 1902415800000.0002 1903631639999.9998 "F" "64" 0     . 1903656480000.0002 "aB-BE(a) ABL/PNA"     0 "3"    "mmol/L"     "-3-3"     
114 1902184560000.0002 1905169499999.9998 "M" "64" 0     .      1.9022871e+12 "B-Erytrocyter"        1 "4,4"  "x10(12)/L"  "4,2-5,7"  
120 1901710799999.9998 1905603599999.9998 "M" "64" 0     . 1905787199999.9998 "vB-Laktat ABL/PNA"    0 "1,0"  "mmol/L"     "0,5-2,3"  
123 1903842000000.0002                  . "M" "64" 0     .      1903530480000 "aB-BE(st) ABL/PNA"    0 "2"    "mmol/L"     "-3-3"     
139 1900685340000.0002 1902420299999.9998 "M" "62" 1 22019 1902029399999.9998 "P-Fibrinogen (koag)"  1 "5,8"  "g/L"        "2,0-4,2"  
139 1900685340000.0002 1902420299999.9998 "M" "62" 1 22019      1902114540000 "Pt-eGFR(Krea)relativ" 1 ">90"  "mL/min/1.7" ">60"      
146 1902837299999.9998 1905510600000.0002 "F" "62" 0     . 1904264039999.9998 "aB-Kalium ABL/PNA"    0 "3,9"  "mmol/L"     "3,5-4,6"  
146 1902837299999.9998 1905510600000.0002 "F" "62" 0     . 1904709660000.0002 "aB-EVF ABL/PNA"       0 "0,25" ""           "0,35-0,46"
147 1901225399999.9998 1902400380000.0002 "M" "62" 0     . 1901657759999.9998 "aB-pCO2 ABL/PNA"      0 "6,5"  "kPa"        "4,6-6,0"  
155      1901182980000 1901818799999.9998 "M" "62" 0     . 1901827620000.0002 "aB-Laktat ABL/PNA"    0 "0,8"  "mmol/L"     "0,5-2,3"  
162 1903517459999.9998                  . "M" "61" 0     . 1906396859999.9998 "aB-pCO2 ABL/PNA"      0 "4,0"  "kPa"        "4,6-6,0"  
168      1901030460000      1.9028364e+12 "M" "61" 0     . 1902160680000.0002 "aB-Hb ABL/PNA"        0 "114"  "g/L"        "134-170"  
168      1901030460000      1.9028364e+12 "M" "61" 0     .      1902679020000 "aB-AnjGap ABL/PNA"    0 "3,9"  "mmol/L"     "8-16"     
184 1902278700000.0002 1902709799999.9998 "M" "60" 1 22022 1902279480000.0002 "fP-Triglycerid"       1 "1,1"  "mmol/L"     "0,45-2,6" 
223      1902412320000 1904051879999.9998 "M" "58" 0     .      1902366180000 "B-EVF"                1 "0,39" ""           "0,39-0,50"
228      1.9013994e+12      1.9055088e+12 "M" "57" 0     . 1904461499999.9998 "P-Heparin,LMWH(aFXa)" 1 "0,40" "kIE/L"      ""         
239 1903902540000.0002 1906116599999.9998 "M" "57" 0     . 1904708399999.9998 "P-ALP (Alk fosfatas)" 1 "1,0"  "mikrokat/L" "0,7-1,9"  
239 1903902540000.0002 1906116599999.9998 "M" "57" 0     .      1906123080000 "aB-BE(a) ABL/PNA"     0 "-1"   "mmol/L"     "-3-3"     
252 1905474900000.0002      1.9061082e+12 "F" "56" 0     . 1905676739999.9998 "B-Trombocyter"        1 "225"  "x10(9)/L"   "165-387"  
253      1.9054386e+12 1905792000000.0002 "M" "56" 0     .      1905657660000 "P-LD"                 1 "4,8"  "mikrokat/L" "<3,5"     
255 1902421200000.0002 1904484839999.9998 "M" "55" 0     .      1902579120000 "aB-Laktat ABL/PNA"    0 "0,7"  "mmol/L"     "0,5-2,3"  
264 1903958280000.0002 1905762899999.9998 "M" "55" 0     .      1905315180000 "aB-Natrium ABL/PNA"   0 "137"  "mmol/L"     "137-145"  
268      1904680080000                  . "M" "54" 0     . 1905538799999.9998 "P-Heparin,LMWH(aFXa)" 1 "0,49" "kIE/L"      ""         
271 1902095939999.9998 1903103999999.9998 "M" "54" 0     .      1902892680000 "S-Prokalcitonin"      1 "0,74" "mikrog/L"   "<0,5"     
279 1901741400000.0002 1904112180000.0002 "M" "54" 0     .      1903928160000 "B-Erytrocyter"        1 "2,7"  "x10(12)/L"  "4,2-5,7"  
282 1902601799999.9998 1905177599999.9998 "M" "53" 0     . 1903447499999.9998 "aB-Natrium ABL/PNA"   0 "150"  "mmol/L"     "137-145"  
294 1901372939999.9998 1905867000000.0002 "M" "52" 0     .      1902200940000 "Erc(B)-MCV"           1 "92"   "fL"         "82-98"    
294 1901372939999.9998 1905867000000.0002 "M" "52" 0     . 1906005840000.0002 "P-Cystatin C"         1 "3,53" "mg/L"       "0,68-1,25"
311 1900528199999.9998 1901835599999.9998 "M" "51" 0     . 1901382719999.9998 "aB-pH ABL/PNA"        0 "7,38" ""           "7,35-7,45"
322 1902959759999.9998 1906465800000.0002 "M" "51" 0     .      1904879880000 "S-Zink"               1 "15"   "mikromol/L" "10-17"    
334      1.9025172e+12      1.9050033e+12 "M" "49" 0     .      1.9039023e+12 "aB-EVF ABL/PNA"       0 "0,26" ""           "0,39-0,50"
334      1.9025172e+12      1.9050033e+12 "M" "49" 0     . 1904204819999.9998 "aB-AnjGap ABL/PNA"    0 "11,7" "mmol/L"     "8-16"     
338      1899762840000 1901374200000.0002 "M" "48" 0     . 1901513099999.9998 "Erc(B)-MCV"           1 "94"   "fL"         "82-98"    
342 1901183339999.9998      1.9016862e+12 "M" "47" 1 22012      1901601720000 "P-ASAT"               1 "1,20" "mikrokat/L" "<0,76"    
360      1901345820000      1.9024848e+12 "F" "45" 0     .      1901673480000 "aB-Klorid ABL/PNA"    0 "99"   "mmol/L"     "98-107"   
361 1901328539999.9998 1902219900000.0002 "M" "45" 1 22016      1901752740000 "aB-sO2 ABL/PNA"       0 "96"   "%"          ">95"      
361 1901328539999.9998 1902219900000.0002 "M" "45" 1 22016 1901839799999.9998 "aB-Ca,fri ABL/PNA"    0 "1,16" "mmol/L"     "1,02-1,31"
371      1.9031064e+12 1904641800000.0002 "M" "42" 0     . 1904495459999.9998 "aB-MetHb ABL/PNA"     0 "0,6"  "%"          "<1,5"     
409 1901640959999.9998      1904734080000 "F" "24" 0     .      1902439080000 "aB-sO2 ABL/PNA"       0 "91"   "%"          ">95"      
end
format %tcnn/dd/ccYY_hh:MM AdmissionTime
format %tcnn/dd/ccYY_hh:MM DischargeTime
format %tdnn/dd/CCYY DeathDate
format %tcnn/dd/ccYY_hh:MM sample_datetime
label values pat_id pat_id

Comment

Mike Lacy

Join Date: Apr 2014

Posts: 2404
#6

25 Jun 2020, 14:22

OK, I understand your situation somewhat better, enough to know that I probably don't know enough to be helpful.

A couple of thoughts you might use, and then I'll probably have to step aside:
1) No, all patients do not need to have the same number of days or samples in long format. There's absolutely nothing wrong (as far as Stata is concerned) with having (say) 10 samples from one patient, and 500 samples from another. In fact, that sort of thing is typical in many long form data sets.
2) It sounds like you want your data to be long with respect to day, and then wide with respect to the sequential order of the sample within the day. For the wide manipulation, you'd need to make an ICUday variable as I previously suggested, and then create a sequential integer out of sample_datetime.
3) Perhaps using -encode- to make your sample_type variable into something numeric would be helpful, as it gets around the naming problem.
Comment

Announcement

Hybrid long-wide. Help with transforming dataset

Comment

Comment

Comment

Comment

Comment