04. Format MIMIC therapy (all)

Description…

# Generic libraries
import pandas as pd

# Show in terminal
TERMINAL = False

First, let's load the data and apply some basic formatting.

# -----------------------------
# Constants
# -----------------------------
# Path
path = './data/mimic-therapy/ICU_diagnoses_antibiotics.csv'

# -----------------------------
# Load data
# -----------------------------
# Read data
data = pd.read_csv(path,
    dayfirst=True,
    parse_dates=['starttime',
                 'stoptime'])

# Keep only useful columns
data = data[['subject_id',
             'hadm_id',
             'stay_id',
             'icd_code',
             'antibiotic',
             'route',
             'starttime',
             'stoptime']]

# Reformat (time info and str)
data.starttime = data.starttime.dt.date
data.stoptime = data.stoptime.dt.date
data.antibiotic = data.antibiotic \
    .str.lower() \
    .str.strip()

# Show
if TERMINAL:
    print("\nData:")
    print(data)
data
       subject_id  hadm_id   stay_id   icd_code  antibiotic                     route  starttime   stoptime
0      10656173    25778760  30001555  J95851    ceftriaxone                    IV     2177-09-27  2177-10-02
1      10656173    25778760  37985659  J95851    cefepime                       IV     2177-09-19  2177-09-23
2      10656173    25778760  37985659  J95851    cefepime                       IV     2177-09-11  2177-09-12
3      10656173    25778760  37985659  J95851    cefepime                       IV     2177-09-10  2177-09-11
4      10656173    25778760  37985659  J95851    cefepime                       IV     2177-09-12  2177-09-12
...    ...         ...       ...       ...       ...                            ...    ...         ...
21541  15689523    23914765  39918058  99731     metronidazole (flagyl)         IV     2159-07-13  2159-07-25
21542  15689523    23914765  39918058  99731     metronidazole (flagyl)         IV     2159-06-27  2159-06-27
21543  15689523    23914765  39918058  99731     sulfamethoxazole-trimethoprim  IV     2159-06-25  2159-06-26
21544  15689523    23914765  39918058  99731     sulfamethoxazole-trimethoprim  IV     2159-06-26  2159-07-02
21545  15689523    23914765  39918058  99731     sulfamethoxazole-trimethoprim  IV     2159-06-29  2159-06-29

21546 rows × 8 columns



Let's transform the data into daily therapies.

Note

You might need to add NaNs for the days on which a patient received no antibiotic. The single-patient example included in this repository, "04. Format MIMIC therapy (one)", achieves this with the following code: aux = aux.asfreq('1D')

Note that it needs to be applied per patient!
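
One possible way to apply asfreq per patient is sketched below. It is not taken from this script: the toy series, the helper dictionary parts and the variable filled are illustrative assumptions, and the grouping is done per stay (subject_id, hadm_id, stay_id) to mirror the index built later in this example.

```python
import pandas as pd

# Hypothetical daily therapies with one missing day (2020-01-03)
keys = ['subject_id', 'hadm_id', 'stay_id']
index = pd.MultiIndex.from_tuples([
    (1, 10, 100, pd.Timestamp('2020-01-01')),
    (1, 10, 100, pd.Timestamp('2020-01-02')),
    (1, 10, 100, pd.Timestamp('2020-01-04')),
    (2, 20, 200, pd.Timestamp('2020-01-01')),
], names=keys + ['startdate'])
aux = pd.Series([['cefepime'],
                 ['cefepime', 'vancomycin'],
                 ['vancomycin'],
                 ['meropenem']], index=index)

# Reindex each stay separately to a daily frequency. Dropping the
# id levels leaves a DatetimeIndex, which is what asfreq requires.
parts = {
    key: group.droplevel(keys).asfreq('1D')
    for key, group in aux.groupby(level=keys)
}

# Put the pieces back together (missing days are now NaN)
filled = pd.concat(parts, names=keys)
print(filled)
```

Calling asfreq on the whole series at once would not work, since the index is not a plain DatetimeIndex and the date ranges of different patients are unrelated.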

# -----------------------------
# Transform data
# -----------------------------
# .. note: The inclusive parameter indicates which endpoints to
#          keep: 'both' keeps the start and end dates, 'left'
#          drops the end date and 'right' drops the start date.
#          In pandas < 1.4 this parameter is named closed (and
#          None is used instead of 'both').
# Create column with date range
data['startdate'] = data.apply(lambda x:
    pd.date_range(start=x['starttime'],
                  end=x['stoptime'],
                  inclusive='left',      # ignoring right
                  freq='D'), axis=1)

# Explode such column
data = data.explode('startdate')

# Groupby
groupby = ['subject_id',
           'hadm_id',
           'stay_id',
           'startdate']

# Create daily therapies
aux = data.groupby(groupby) \
    .apply(lambda x: sorted(x.antibiotic \
        .unique().tolist()))
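
To make the effect of the left-closed range concrete, here is a minimal sketch on made-up data (the toy frame and its values are assumptions, not MIMIC records). Slicing off the last element of the range is used as a version-independent stand-in for the left-closed option. Note that a prescription that starts and stops on the same day produces an empty range, so it contributes no day at all.

```python
import pandas as pd

# Hypothetical prescriptions for a single stay
toy = pd.DataFrame({
    'stay_id': [1, 1, 1],
    'antibiotic': ['cefepime', 'vancomycin', 'meropenem'],
    'starttime': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
    'stoptime': pd.to_datetime(['2020-01-03', '2020-01-03', '2020-01-03']),
})

# Left-closed daily range (start date kept, stop date dropped)
toy['startdate'] = [
    list(pd.date_range(start, stop, freq='D')[:-1])
    for start, stop in zip(toy['starttime'], toy['stoptime'])
]

# One row per (prescription, day); empty ranges become a missing date
exploded = toy.explode('startdate')

# Daily therapies; rows with a missing date are dropped by groupby
daily = exploded.groupby(['stay_id', 'startdate'])['antibiotic'] \
    .apply(lambda s: sorted(s.unique().tolist()))
print(daily)
```

In this sketch meropenem never appears in the daily therapies, because its start and stop dates coincide.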

Let's see the formatted data.

# Show
if TERMINAL:
    print("\nFormatted:")
    print(aux)
aux

Out:

subject_id  hadm_id   stay_id   startdate
10004733    27411876  39635619  2174-12-04    [piperacillin-tazobactam, vancomycin]
                                2174-12-05    [piperacillin-tazobactam, vancomycin]
                                2174-12-06    [piperacillin-tazobactam, vancomycin]
                                2174-12-07                [piperacillin-tazobactam]
                                2174-12-08    [piperacillin-tazobactam, vancomycin]
                                                              ...
19997367    20617667  35616526  2126-05-06                            [ceftriaxone]
                                2126-05-07                            [ceftriaxone]
                                2126-05-08                            [ceftriaxone]
                                2126-05-09                            [ceftriaxone]
                                2126-05-10                            [ceftriaxone]
Length: 22936, dtype: object

Let's count the number of days on which each therapy (combination of antibiotics) was given.

# Show
if TERMINAL:
    print("\nTherapies (number of days)")
    print(aux.value_counts())
aux.value_counts()

Out:

[cefepime, vancomycin]                                                                             2630
[cefepime]                                                                                         1802
[vancomycin]                                                                                       1732
[piperacillin-tazobactam, vancomycin]                                                              1490
[meropenem]                                                                                        1378
                                                                                                   ...
[cefepime, gentamicin, gentamicin sulfate, vancomycin]                                                1
[ciprofloxacin, ciprofloxacin iv, vancomycin]                                                         1
[doxycycline hyclate, metronidazole (flagyl)]                                                         1
[cefepime, ciprofloxacin iv, metronidazole (flagyl), sulfamethoxazole-trimethoprim, vancomycin]       1
[ceftolozane-tazobactam, ceftolozane-tazobactam *nf*]                                                 1
Length: 707, dtype: int64
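
Lists are not hashable, so depending on the pandas version, counting list-valued entries directly may fail. A minimal workaround, assuming every entry is a list of strings, is to join each therapy into a single label before counting. The toy series and the name counts below are illustrative only.

```python
import pandas as pd

# Hypothetical daily therapies (one list of antibiotics per day)
aux = pd.Series([['cefepime', 'vancomycin'],
                 ['cefepime'],
                 ['cefepime', 'vancomycin']])

# Join each sorted list into a hashable string label, then count
counts = aux.map(', '.join).value_counts()
print(counts)
```

Because the lists were sorted when the daily therapies were built, the same combination always maps to the same label.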

Total running time of the script: ( 0 minutes 5.921 seconds)
