Note
Click here to download the full example code
04. Transforming Patient Therapy Intervals to a Daily Summary
Previous title: 04. Format MIMIC therapy (one)
This script demonstrates how to process time-interval data, a common format in medical records like the MIMIC-III dataset. It starts with a table where each row represents an antibiotic therapy with a specific start and end date.
The main goal is to transform this interval-based data into a daily summary. This is achieved by using pd.date_range to generate all dates within each therapy period and then using the explode() method to create a distinct row for each day. The final output is a count of how many days each unique combination of therapies was administered to the patient.
18 # Generic libraries
19 import pandas as pd
20
21 # Show in terminal
22 TERMINAL = False
First, lets load and do some basic formatting on the data.
27 # -----------------------------
28 # Constants
29 # -----------------------------
30 # Path
31 path = './data/mimic-therapy/One_patient_condensed_10656173.csv'
32 path = './data/mimic-therapy/One_patient_condensed_11803145.csv'
33
34 # -----------------------------
35 # Load data
36 # -----------------------------
37 # Read data
38 data = pd.read_csv(path,
39     dayfirst=True,
40     parse_dates=['starttime',
41                  'stoptime'])
42
43 # Keep only useful columns
44 data = data[['subject_id',
45              'antibiotic',
46              'route',
47              'starttime',
48              'stoptime']]
49
50 # Reformat (ignore time information)
51 data.starttime = data.starttime.dt.date
52 data.stoptime = data.stoptime.dt.date
53
54 # Show
55 if TERMINAL:
56     print("\nData:")
57     print(data)
58 data
Lets transform the data
63 # -----------------------------
64 # Transform data
65 # -----------------------------
66 # .. note: The closed parameter indicates whether to include
67 #          the first and/or last samples. None will keep both,
68 #          left will keep only start date and right will keep
69 #          also the right date.
70 # Create column with date range
71 data['startdate'] = data.apply(lambda x:
72     pd.date_range(start=x['starttime'],
73                   end=x['stoptime'],
74                   inclusive='left',         # ignoring right
75                   freq='D') ,axis=1)
76
77 # Explode such column
78 data = data.explode('startdate')
79
80 # Create daily therapies
81 aux = data.groupby('startdate') \
82     .apply(lambda x: sorted(x.antibiotic \
83         .str.lower().str.strip().unique().tolist()))
84
85 # Include missing days
86 aux = aux.asfreq('1D')
Lets see the formatted data
91 # Show
92 if TERMINAL:
93     print("\nFormatted:")
94     print(aux)
95 aux
Out:
startdate
2159-01-20                [piperacillin-tazobactam, vancomycin]
2159-01-21                [piperacillin-tazobactam, vancomycin]
2159-01-22                [piperacillin-tazobactam, vancomycin]
2159-01-23                [piperacillin-tazobactam, vancomycin]
2159-01-24                            [piperacillin-tazobactam]
2159-01-25                                          [meropenem]
2159-01-26                              [meropenem, vancomycin]
2159-01-27                              [meropenem, vancomycin]
2159-01-28    [azithromycin, meropenem, tobramycin sulfate, ...
2159-01-29    [azithromycin, meropenem, metronidazole (flagyl)]
2159-01-30      [meropenem, metronidazole (flagyl), vancomycin]
2159-01-31                  [meropenem, metronidazole (flagyl)]
2159-02-01                              [meropenem, vancomycin]
2159-02-02                                                  NaN
2159-02-03                              [meropenem, vancomycin]
Freq: D, dtype: object
Lets count the number of days
100 # Show
101 if TERMINAL:
102     print("\nTherapies (number of days)")
103     print(aux.value_counts())
104 aux.value_counts()
Out:
[piperacillin-tazobactam, vancomycin]                        4
[meropenem, vancomycin]                                      4
[piperacillin-tazobactam]                                    1
[meropenem]                                                  1
[azithromycin, meropenem, tobramycin sulfate, vancomycin]    1
[azithromycin, meropenem, metronidazole (flagyl)]            1
[meropenem, metronidazole (flagyl), vancomycin]              1
[meropenem, metronidazole (flagyl)]                          1
Name: count, dtype: int64
Total running time of the script: ( 0 minutes 0.027 seconds)