Note
Click here to download the full example code
04. Daily Aggregation of MIMIC-III ICU Antibiotic Therapy Data
- Format MIMIC therapy (all) 
This Python script processes and transforms antibiotic treatment data from the ICU_diagnoses_antibiotics.csv file, likely derived from the MIMIC-III dataset. It begins by loading the data, parsing starttime and stoptime columns, and cleaning antibiotic names. The core of the script reshapes the data from a record of treatment intervals into a daily summary. It achieves this by creating a date range for each antibiotic administration and then “exploding” these ranges into individual daily records. Finally, it groups the data by patient stay and date to produce a final series showing the unique list of antibiotics administered each day.
16 # Generic libraries
17 import pandas as pd
18
19 # Show in terminal
20 TERMINAL = False
First, lets load and do some basic formatting on the data.
25 # -----------------------------
26 # Constants
27 # -----------------------------
28 # Path
29 path = './data/mimic-therapy/ICU_diagnoses_antibiotics.csv'
30
31 # -----------------------------
32 # Load data
33 # -----------------------------
34 # Read data
35 data = pd.read_csv(path)
36
37 # Keep only useful columns
38 data = data[['subject_id',
39              'hadm_id',
40              'stay_id',
41              'icd_code',
42              'antibiotic',
43              'route',
44              'starttime',
45              'stoptime']]
46
47 # .. note::  Converting datetime manually because the parse_dates
48 #            value in read_csv returns the value unaltered as an object
49 #            type when the the conversion is not possible, and this
50 #            triggers errors while accessing using .dt.
51 # Explicitly convert columns to datetime, coercing errors to NaT
52 data['starttime'] = pd.to_datetime(data['starttime'], dayfirst=True, errors='coerce')
53 data['stoptime'] = pd.to_datetime(data['stoptime'], dayfirst=True, errors='coerce')
54
55 # Handle any rows where dates could not be parsed (Optional)
56 data.dropna(subset=['starttime', 'stoptime'], inplace=True)
57
58 # Reformat (time info and str)
59 data.starttime = data.starttime.dt.date
60 data.stoptime = data.stoptime.dt.date
61 data.antibiotic = data.antibiotic \
62     .str.lower() \
63     .str.strip()
64
65 # Show
66 if TERMINAL:
67     print("\nData:")
68     print(data)
69 data
Lets transform the data
Note
You might need to add NaNs for missing days per patient.
The other sample included in this repository for a single patient
04. Transforming Patient Therapy Intervals to a Daily Summary
achieves this by using the following code: aux = aux.asfreq('1D')
Note it needs to be applied per patient!
 81 # -----------------------------
 82 # Transform data
 83 # -----------------------------
 84 # .. note: The closed parameter indicates whether to include
 85 #          the first and/or last samples. None will keep both,
 86 #          left will keep only start date and right will keep
 87 #          also the right date.
 88 # Create column with date range
 89 data['startdate'] = data.apply(lambda x:
 90     pd.date_range(start=x['starttime'],
 91                   end=x['stoptime'],
 92                   inclusive='left',         # ignoring right
 93                   freq='D') ,axis=1)
 94
 95 # Explode such column
 96 data = data.explode('startdate')
 97
 98 # Groupby
 99 groupby = ['subject_id',
100            'hadm_id',
101            'stay_id',
102            'startdate']
103
104 # Create daily therapies
105 aux = data.groupby(groupby) \
106     .apply(lambda x: sorted(x.antibiotic \
107         .unique().tolist()))
Lets see the formatted data
112 # Show
113 if TERMINAL:
114     print("\nFormatted:")
115     print(aux)
116 aux
Out:
subject_id  hadm_id   stay_id   startdate
10007818    22987108  32359580  2146-06-30                [meropenem]
                                2146-07-01                [meropenem]
                                2146-07-02    [meropenem, vancomycin]
                                2146-07-03    [meropenem, vancomycin]
                                2146-07-04    [meropenem, vancomycin]
                                                       ...
19997367    20617667  35616526  2126-10-31              [ceftriaxone]
                                2126-11-01              [ceftriaxone]
                                2126-11-02              [ceftriaxone]
                                2126-11-03              [ceftriaxone]
                                2126-11-04              [ceftriaxone]
Length: 143037, dtype: object
Lets count the number of days
122 # Show
123 if TERMINAL:
124     print("\nTherapies (number of days)")
125     print(aux.value_counts())
126 aux.value_counts()
Out:
[vancomycin]                                                               15480
[cefepime, vancomycin]                                                     14006
[piperacillin-tazobactam, vancomycin]                                       8398
[cefepime]                                                                  7935
[meropenem]                                                                 5345
                                                                           ...
[sulfameth/trimethoprim]                                                       1
[cefepime, ceftriaxone, metronidazole (flagyl)]                                1
[linezolid, meropenem, metronidazole (flagyl), piperacillin-tazobactam]        1
[ceftaroline, meropenem, vancomycin]                                           1
[gentamicin, linezolid, meropenem]                                             1
Name: count, Length: 477, dtype: int64
Total running time of the script: ( 0 minutes 13.875 seconds)