04. Format MIMIC therapy (all)
Description…
# Generic libraries
import pandas as pd

# Show in terminal
TERMINAL = False
First, let's load the data and apply some basic formatting.
# -----------------------------
# Constants
# -----------------------------
# Path
path = './data/mimic-therapy/ICU_diagnoses_antibiotics.csv'

# -----------------------------
# Load data
# -----------------------------
# Read data
data = pd.read_csv(path,
    dayfirst=True,
    parse_dates=['starttime',
                 'stoptime'])

# Keep only useful columns
data = data[['subject_id',
             'hadm_id',
             'stay_id',
             'icd_code',
             'antibiotic',
             'route',
             'starttime',
             'stoptime']]

# Reformat (time info and str)
data.starttime = data.starttime.dt.date
data.stoptime = data.stoptime.dt.date
data.antibiotic = data.antibiotic \
    .str.lower() \
    .str.strip()

# Show
if TERMINAL:
    print("\nData:")
    print(data)
data
Let's transform the data so that there is one record per patient, stay and day.
Note
You might need to add NaNs for missing days per patient.
The other sample included in this repository, which handles a single patient
(04. Format MIMIC therapy (one)), achieves this with the following code:
aux = aux.asfreq('1D')
Note that it needs to be applied per patient!
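As a rough illustration of what "applied per patient" could look like, the sketch below groups a small, made-up table by patient and calls asfreq('1D') inside each group. The toy data, the `therapy` column (plain strings rather than lists, for simplicity) and the variable names are assumptions for this example only; they are not taken from the MIMIC files or from the single-patient sample.

```python
import pandas as pd

# Hypothetical daily therapies for two patients, each with one missing day.
toy = pd.DataFrame({
    'subject_id': [1, 1, 2, 2],
    'startdate': pd.to_datetime(['2020-01-01', '2020-01-03',
                                 '2020-01-10', '2020-01-12']),
    'therapy': ['vancomycin', 'cefepime', 'meropenem', 'meropenem'],
})

# Reindex each patient separately to a daily frequency, so that a gap
# in one patient is filled with NaN without touching the other patient.
filled = toy.set_index('startdate') \
    .groupby('subject_id')['therapy'] \
    .apply(lambda s: s.asfreq('1D'))

print(filled)
```

Calling asfreq on the whole table at once would not work here, because the date index is only regular (and unique) within a single patient.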
# -----------------------------
# Transform data
# -----------------------------
# .. note: The inclusive parameter (named closed before pandas 1.4,
#          and removed under that name in pandas 2.0) indicates
#          whether to include the first and/or last samples.
#          'both' keeps both dates, 'left' keeps the start date
#          but drops the stop date, and 'right' keeps the stop
#          date but drops the start date.
# Create column with date range
data['startdate'] = data.apply(lambda x:
    pd.date_range(start=x['starttime'],
                  end=x['stoptime'],
                  inclusive='left', # ignoring right
                  freq='D'), axis=1)

# Explode such column
data = data.explode('startdate')

# Groupby
groupby = ['subject_id',
           'hadm_id',
           'stay_id',
           'startdate']

# Create daily therapies
aux = data.groupby(groupby) \
    .apply(lambda x: sorted(x.antibiotic \
        .unique().tolist()))
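To make the expand-and-group step easier to follow, here is a minimal sketch on two made-up prescriptions for one hypothetical patient. It is not the code used above: to stay independent of the pandas version, the stop date is dropped by slicing the range with [:-1] instead of passing the closed/inclusive keyword, which gives the same left-inclusive result for whole-day timestamps. All values and names are illustrative assumptions.

```python
import pandas as pd

# Two overlapping, made-up prescriptions for a single patient.
toy = pd.DataFrame({
    'subject_id': [1, 1],
    'antibiotic': ['vancomycin', 'cefepime'],
    'starttime': pd.to_datetime(['2020-01-01', '2020-01-02']),
    'stoptime': pd.to_datetime(['2020-01-03', '2020-01-05']),
})

# Build the list of treated days per prescription. The [:-1] slice
# drops the stop date, i.e. the range is left-inclusive.
toy['startdate'] = pd.Series(
    [list(pd.date_range(start=s, end=e, freq='D')[:-1])
     for s, e in zip(toy['starttime'], toy['stoptime'])],
    index=toy.index)

# One row per prescription and day.
toy = toy.explode('startdate')

# Sorted list of distinct antibiotics given on each day.
daily = toy.groupby(['subject_id', 'startdate'])['antibiotic'] \
    .apply(lambda x: sorted(x.unique().tolist()))

print(daily)
```

On the day where both prescriptions overlap, the resulting list holds both antibiotics; the stop dates themselves do not appear in the index.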
Let's see the formatted data.
# Show
if TERMINAL:
    print("\nFormatted:")
    print(aux)
aux
Out:
subject_id  hadm_id   stay_id   startdate
10004733    27411876  39635619  2174-12-04    [piperacillin-tazobactam, vancomycin]
                                2174-12-05    [piperacillin-tazobactam, vancomycin]
                                2174-12-06    [piperacillin-tazobactam, vancomycin]
                                2174-12-07                [piperacillin-tazobactam]
                                2174-12-08    [piperacillin-tazobactam, vancomycin]
                                                              ...
19997367    20617667  35616526  2126-05-06                            [ceftriaxone]
                                2126-05-07                            [ceftriaxone]
                                2126-05-08                            [ceftriaxone]
                                2126-05-09                            [ceftriaxone]
                                2126-05-10                            [ceftriaxone]
Length: 22936, dtype: object
Let's count the number of days on which each therapy (combination of antibiotics) was given.
# Show
if TERMINAL:
    print("\nTherapies (number of days)")
    print(aux.value_counts())
aux.value_counts()
Out:
[cefepime, vancomycin] 2630
[cefepime] 1802
[vancomycin] 1732
[piperacillin-tazobactam, vancomycin] 1490
[meropenem] 1378
...
[cefepime, gentamicin, gentamicin sulfate, vancomycin] 1
[ciprofloxacin, ciprofloxacin iv, vancomycin] 1
[doxycycline hyclate, metronidazole (flagyl)] 1
[cefepime, ciprofloxacin iv, metronidazole (flagyl), sulfamethoxazole-trimethoprim, vancomycin] 1
[ceftolozane-tazobactam, ceftolozane-tazobactam *nf*] 1
Length: 707, dtype: int64
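Because the daily therapies are stored as Python lists, which are not hashable, counting them directly may fail depending on the pandas version. One possible workaround, shown here as a sketch on made-up data rather than as part of the original script, is to join each list into a single string before counting. The separator and the variable names are assumptions for this example.

```python
import pandas as pd

# Made-up daily therapies, stored as sorted lists like in aux.
aux_toy = pd.Series([['cefepime', 'vancomycin'],
                     ['cefepime'],
                     ['cefepime', 'vancomycin']])

# Join each list into one string so that the values are hashable.
labels = aux_toy.map(', '.join)

# Number of days per therapy.
counts = labels.value_counts()

print(counts)
```

Since each list is already sorted, the same combination of antibiotics always maps to the same label.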