
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "_examples\pandas\plot_format04_therapy_all.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr__examples_pandas_plot_format04_therapy_all.py:


04. Daily Aggregation of MIMIC-III ICU Antibiotic Therapy Data
===============================================================

04. Format MIMIC therapy (all)

This script processes antibiotic treatment data from the
``ICU_diagnoses_antibiotics.csv`` file, likely derived from the
MIMIC-III dataset. It loads the data, parses the ``starttime`` and
``stoptime`` columns, and cleans the antibiotic names. The core step
reshapes the data from treatment intervals into a daily summary: it
creates a date range for each antibiotic administration and then
"explodes" these ranges into individual daily records. Finally, it
groups the records by patient stay and date to produce a series
listing the unique antibiotics administered each day.

.. GENERATED FROM PYTHON SOURCE LINES 16-22

.. code-block:: default
    :lineno-start: 16

    # Generic libraries
    import pandas as pd

    # Show in terminal
    TERMINAL = False

.. GENERATED FROM PYTHON SOURCE LINES 23-24

First, let's load the data and apply some basic formatting.

.. GENERATED FROM PYTHON SOURCE LINES 24-71

.. code-block:: default
    :lineno-start: 25

    # -----------------------------
    # Constants
    # -----------------------------
    # Path
    path = './data/mimic-therapy/ICU_diagnoses_antibiotics.csv'

    # -----------------------------
    # Load data
    # -----------------------------
    # Read data
    data = pd.read_csv(path)

    # Keep only useful columns
    data = data[['subject_id',
                 'hadm_id',
                 'stay_id',
                 'icd_code',
                 'antibiotic',
                 'route',
                 'starttime',
                 'stoptime']]

    # .. note:: The datetime columns are converted manually because
    #           the parse_dates option of read_csv returns the column
    #           unaltered, as object dtype, when the conversion is
    #           not possible, and this triggers errors when
    #           accessing it through .dt.

    # Explicitly convert columns to datetime, coercing errors to NaT
    data['starttime'] = pd.to_datetime(data['starttime'],
        dayfirst=True, errors='coerce')
    data['stoptime'] = pd.to_datetime(data['stoptime'],
        dayfirst=True, errors='coerce')

    # Drop any rows where dates could not be parsed (optional)
    data.dropna(subset=['starttime', 'stoptime'], inplace=True)

    # Reformat (drop time info, normalise antibiotic strings)
    data.starttime = data.starttime.dt.date
    data.stoptime = data.stoptime.dt.date
    data.antibiotic = data.antibiotic \
        .str.lower() \
        .str.strip()

    # Show
    if TERMINAL:
        print("\nData:")
        print(data)
    data


.. code-block:: none

           subject_id   hadm_id   stay_id  icd_code                     antibiotic  route   starttime    stoptime
    0        10656173  25778760  30001555    J95851                    ceftriaxone     IV  2177-09-27  2177-02-10
    2        10656173  25778760  37985659    J95851                       cefepime     IV  2177-09-11  2177-12-09
    3        10656173  25778760  37985659    J95851                       cefepime     IV  2177-09-10  2177-11-09
    4        10656173  25778760  37985659    J95851                       cefepime     IV  2177-09-12  2177-12-09
    8        10656173  25778760  37985659    J95851                     vancomycin     IV  2177-09-11  2177-12-09
    ...           ...       ...       ...       ...                            ...    ...         ...         ...
    21526    15689523  23914765  39918058     99731                      meropenem     IV  2159-07-04  2159-05-07
    21528    15689523  23914765  39918058     99731                      meropenem     IV  2159-07-03  2159-04-07
    21532    15689523  23914765  39918058     99731                     vancomycin     IV  2159-07-03  2159-04-07
    21540    15689523  23914765  39918058     99731                   levofloxacin     IV  2159-06-29  2159-02-07
    21544    15689523  23914765  39918058     99731  sulfamethoxazole-trimethoprim     IV  2159-06-26  2159-02-07

8474 rows × 8 columns
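
Several rows in the preview above have a ``stoptime`` earlier than the
``starttime`` (for example the first row, 2177-09-27 to 2177-02-10). For
such rows ``pd.date_range`` returns an empty index, so they yield no daily
records in the later reshaping. The sketch below uses made-up strings to
show how ``errors='coerce'`` turns unparseable values into ``NaT`` and how
inverted intervals could be flagged. The toy values and the ``inverted``
column are hypothetical, the explicit ``format`` argument is an assumption
that differs from the script's ``dayfirst=True``, and the example does not
determine why the real rows are inverted.

.. code-block:: python

    import pandas as pd

    # Toy values (hypothetical, not taken from the MIMIC file).
    toy = pd.DataFrame({
        'starttime': ['2177-09-27', 'not a date', '2177-09-10'],
        'stoptime':  ['2177-09-29', '2177-09-30', '2177-09-08'],
    })

    # Unparseable strings become NaT instead of raising an error.
    for col in ['starttime', 'stoptime']:
        toy[col] = pd.to_datetime(toy[col], format='%Y-%m-%d',
                                  errors='coerce')

    # Number of rows with at least one date that failed to parse.
    n_unparsed = int(toy[['starttime', 'stoptime']].isna().any(axis=1).sum())

    # Keep parseable rows and flag intervals that end before they start.
    valid = toy.dropna(subset=['starttime', 'stoptime']).copy()
    valid['inverted'] = valid['stoptime'] < valid['starttime']

    # An inverted interval produces an empty date range.
    empty_range = pd.date_range(start='2177-09-10', end='2177-09-08',
                                inclusive='left', freq='D')

Counting ``valid['inverted'].sum()`` on the real data before reshaping
would show how many treatment records are affected.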



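
The reshaping described in the introduction, from treatment intervals to
one record per day, can be illustrated on a small toy frame. This is a
minimal sketch rather than the script's exact code: the ids and dates are
hypothetical, the date ranges are stored as plain lists, and the grouping
is applied to the ``antibiotic`` column only.

.. code-block:: python

    import pandas as pd

    # Toy interval records (hypothetical ids and dates).
    toy = pd.DataFrame({
        'stay_id': [1, 1, 2],
        'antibiotic': ['meropenem', 'vancomycin', 'cefepime'],
        'starttime': pd.to_datetime(['2146-06-30', '2146-07-02', '2146-07-01']),
        'stoptime': pd.to_datetime(['2146-07-03', '2146-07-04', '2146-07-02']),
    })

    # One list of days per record; inclusive='left' keeps the start
    # day and leaves out the stop day.
    toy['startdate'] = [
        list(pd.date_range(start=s, end=e, inclusive='left', freq='D'))
        for s, e in zip(toy['starttime'], toy['stoptime'])
    ]

    # One row per (record, day).
    daily = toy.explode('startdate')

    # Sorted list of unique antibiotics per stay and day.
    per_day = (
        daily.groupby(['stay_id', 'startdate'])['antibiotic']
             .apply(lambda s: sorted(s.unique().tolist()))
    )

In this toy frame, stay 1 receives both antibiotics on 2146-07-02, so that
day maps to a two-element list, while 2146-07-04 does not appear because
the stop day is excluded.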
.. GENERATED FROM PYTHON SOURCE LINES 72-80

Let's transform the data.

.. note:: You might need to add ``NaNs`` for missing days per patient. The
          other sample included in this repository for a single patient,
          :ref:`sphx_glr__examples_pandas_plot_format04_therapy_one.py`,
          achieves this with the following code:

             ``aux = aux.asfreq('1D')``

          Note that it must be applied per patient.

.. GENERATED FROM PYTHON SOURCE LINES 80-109

.. code-block:: default
    :lineno-start: 81

    # -----------------------------
    # Transform data
    # -----------------------------
    # .. note: The inclusive parameter (named closed before
    #          pandas 1.4) indicates which endpoints to include:
    #          'both' keeps both, 'left' keeps the start date
    #          and drops the end date, and 'right' drops the
    #          start date and keeps the end date.

    # Create column with date range
    data['startdate'] = data.apply(lambda x:
        pd.date_range(start=x['starttime'],
                      end=x['stoptime'],
                      inclusive='left',         # ignoring right
                      freq='D'), axis=1)

    # Explode such column
    data = data.explode('startdate')

    # Groupby
    groupby = ['subject_id', 'hadm_id', 'stay_id', 'startdate']

    # Create daily therapies
    aux = data.groupby(groupby) \
        .apply(lambda x: sorted(x.antibiotic \
            .unique().tolist()))

.. GENERATED FROM PYTHON SOURCE LINES 110-111

Let's see the formatted data.

.. GENERATED FROM PYTHON SOURCE LINES 111-119

.. code-block:: default
    :lineno-start: 112

    # Show
    if TERMINAL:
        print("\nFormatted:")
        print(aux)
    aux

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    subject_id  hadm_id   stay_id   startdate
    10007818    22987108  32359580  2146-06-30                [meropenem]
                                    2146-07-01                [meropenem]
                                    2146-07-02    [meropenem, vancomycin]
                                    2146-07-03    [meropenem, vancomycin]
                                    2146-07-04    [meropenem, vancomycin]
                                                           ...
    19997367    20617667  35616526  2126-10-31              [ceftriaxone]
                                    2126-11-01              [ceftriaxone]
                                    2126-11-02              [ceftriaxone]
                                    2126-11-03              [ceftriaxone]
                                    2126-11-04              [ceftriaxone]
    Length: 143037, dtype: object

.. GENERATED FROM PYTHON SOURCE LINES 120-121

Let's count the number of days for each combination of antibiotics.

.. GENERATED FROM PYTHON SOURCE LINES 121-129

.. code-block:: default
    :lineno-start: 122

    # Show
    if TERMINAL:
        print("\nTherapies (number of days)")
        print(aux.value_counts())
    aux.value_counts()

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [vancomycin]                                                               15480
    [cefepime, vancomycin]                                                     14006
    [piperacillin-tazobactam, vancomycin]                                       8398
    [cefepime]                                                                  7935
    [meropenem]                                                                 5345
                                                                               ...
    [sulfameth/trimethoprim]                                                       1
    [cefepime, ceftriaxone, metronidazole (flagyl)]                                1
    [linezolid, meropenem, metronidazole (flagyl), piperacillin-tazobactam]        1
    [ceftaroline, meropenem, vancomycin]                                           1
    [gentamicin, linezolid, meropenem]                                             1
    Name: count, Length: 477, dtype: int64

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 13.875 seconds)

.. _sphx_glr_download__examples_pandas_plot_format04_therapy_all.py:

.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_format04_therapy_all.py `

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_format04_therapy_all.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_