`DRI` - Example using `MIMIC`

In healthcare, analyzing drug resistance in antimicrobial use is crucial for understanding and combating the growing problem of antibiotic resistance. One example is the Drug Resistance Index, or DRI. In this example, we compute this index using MIMIC, a comprehensive, widely used and freely available healthcare database that contains de-identified electronic health records of over 60,000 intensive care unit patients. Within MIMIC, researchers have access to rich information, including patient demographics, clinical notes, laboratory results, and medication records. This dataset provides the data needed to compute the drug resistance index, namely susceptibility test results and prescription information.

Note

In MIMIC, the deidentification process for structured data required the removal of dates. In particular, dates were shifted into the future by a random offset for each individual patient in a consistent manner to preserve intervals, resulting in stays which occur sometime between the years 2100 and 2200. Time of day, day of the week, and approximate seasonality were conserved during date shifting.

 # Define sphinx gallery configuration
 # sphinx_gallery_thumbnail_number = 2

 # Libraries
 import sys
 import warnings
 import numpy as np
 import pandas as pd
 import seaborn as sns
 import matplotlib as mpl

 from pathlib import Path

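 # Detect whether the script is run from a file (__file__ is
 # defined) or from an interactive session (NameError is raised).
 try:
     __file__
     TERMINAL = True
 except NameError:
     TERMINAL = False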

 # -------------------------
 # Configuration
 # -------------------------
 # Params
 rc = {
     'axes.linewidth': 0.5,
     'axes.labelsize': 9,
     'axes.titlesize': 11,
     'xtick.labelsize': 7,
     'ytick.labelsize': 7,
 }

 # Configure seaborn style
 sns.set_theme(style="white", rc=rc)

 # Configure warnings
 warnings.filterwarnings("ignore",
     category=pd.errors.DtypeWarning)

 # -------------------------------------------------------
 # Constants
 # -------------------------------------------------------
 # Rename columns for susceptibility
 rename_susceptibility = {
     'chartdate': 'DATE',
     'micro_specimen_id': 'LAB_NUMBER',
     'spec_type_desc': 'SPECIMEN',
     'org_name': 'MICROORGANISM',
     'ab_name': 'ANTIMICROBIAL',
     'interpretation': 'SENSITIVITY'
 }

 # Rename columns for prescriptions
 rename_prescriptions = {
     'drug': 'DRUG'
 }

First, we need to load the susceptibility test data.

 # -----------------------------
 # Load susceptibility test data
 # -----------------------------
 # Helper
 subset = rename_susceptibility.values()

 # Load data
 path = Path('../../pyamr/datasets/mimic')
 data1 = pd.read_csv(path / 'susceptibility.csv')

 # Rename columns
 data1 = data1.rename(columns=rename_susceptibility)

 # Format data
 data1 = data1[subset]
 data1 = data1.dropna(subset=subset, how='any')
 data1.DATE = pd.to_datetime(data1.DATE)
 data1.SENSITIVITY = data1.SENSITIVITY.replace({
     'S': 'sensitive',
     'R': 'resistant',
     'I': 'intermediate',
     'P': 'pass'
 })

 data1.head(5)

	DATE	LAB_NUMBER	SPECIMEN	MICROORGANISM	ANTIMICROBIAL	SENSITIVITY
2	2171-06-10	3685553	URINE	ESCHERICHIA COLI	AMPICILLIN	sensitive
3	2171-06-10	3685553	URINE	ESCHERICHIA COLI	CEFAZOLIN	sensitive
4	2171-06-10	3685553	URINE	ESCHERICHIA COLI	TRIMETHOPRIM/SULFA	sensitive
5	2171-06-10	3685553	URINE	ESCHERICHIA COLI	NITROFURANTOIN	sensitive
6	2171-06-10	3685553	URINE	ESCHERICHIA COLI	GENTAMICIN	sensitive

Let's also load the prescription data (overall usage data could be used instead).

 # ----------------------------
 # Load prescriptions
 # ----------------------------
 # Load prescription data (limited to first nrows).
 data2 = pd.read_csv(path / 'prescriptions.csv', nrows=100000)
 data2 = data2.rename(columns=rename_prescriptions)

 # Format data
 data2.DRUG = data2.DRUG.str.upper()
 data2['DATE'] = pd.to_datetime(data2.starttime)
 data2 = data2.dropna(subset=['DATE'], how='any')

 # .. note:: We are only keeping those DRUGS which have
 #           the exact name of the antimicrobial tested
 #           in the susceptibility test record. There are
 #           also brand names that could/should be
 #           included

 # Filter
 data2 = data2[data2.DRUG.isin(data1.ANTIMICROBIAL.unique())]

 print(data2)

       subject_id   hadm_id  pharmacy_id            starttime             stoptime drug_type          DRUG      gsn  ...  form_rx dose_val_rx dose_unit_rx form_val_disp form_unit_disp doses_per_24_hrs  route                DATE
     17868682  25218370     81568564  2167-05-24 20:00:00  2167-05-26 02:00:00      MAIN     CEFAZOLIN   068632  ...      NaN           2            g             1            BAG              3.0     IV 2167-05-24 20:00:00
     17868682  24052239     59994266  2167-06-20 23:00:00  2167-06-22 11:00:00      MAIN    VANCOMYCIN   043952  ...      NaN        1000           mg           200             mL              2.0     IV 2167-06-20 23:00:00
     17868682  24052239     39104292  2167-06-22 20:00:00  2167-06-25 12:00:00      MAIN    VANCOMYCIN   009331  ...      NaN        1500           mg             3           VIAL              2.0     IV 2167-06-22 20:00:00
     17868682  24052239      1791946  2167-06-24 20:00:00  2167-06-25 12:00:00      MAIN    VANCOMYCIN   009331  ...      NaN        1500           mg             3           VIAL              2.0     IV 2167-06-24 20:00:00
     12315540  24554730     25748315  2172-08-15 11:00:00  2172-08-16 20:00:00      MAIN  LEVOFLOXACIN   029928  ...      NaN         750           mg           1.5            TAB              0.0     PO 2172-08-15 11:00:00
...           ...       ...          ...                  ...                  ...       ...           ...      ...  ...      ...         ...          ...           ...            ...              ...    ...                 ...
  15276693  21600683     24768410  2141-08-21 08:00:00  2141-08-26 18:00:00      MAIN  LEVOFLOXACIN  46771.0  ...      NaN         750           mg             1            TAB              0.0  PO/NG 2141-08-21 08:00:00
  10819462  26263456     61192290  2181-12-01 16:00:00  2181-12-02 15:00:00      MAIN   CLINDAMYCIN   9344.0  ...      NaN         600           mg             4             mL              3.0     IV 2181-12-01 16:00:00
  17442082  26038207     88544744  2135-08-07 15:00:00  2135-08-08 14:00:00      MAIN      CEFEPIME  24095.0  ...      NaN           2            g             1           VIAL              1.0     IV 2135-08-07 15:00:00
  17442082  20010297     74705027  2135-09-10 07:00:00  2135-09-10 11:00:00      MAIN   CEFTRIAXONE   9162.0  ...      NaN           1           gm             1           VIAL              1.0     IV 2135-09-10 07:00:00
  19103067  28239677     20279428  2124-06-02 04:00:00  2124-06-02 22:00:00      MAIN   CLINDAMYCIN   9339.0  ...      NaN         300           mg             2            CAP              3.0  PO/NG 2124-06-02 04:00:00

[2655 rows x 18 columns]
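The filter above keeps only those drugs whose name exactly matches an antimicrobial tested in the susceptibility records. One possible way to also capture brand names is to map them to generic names before filtering. The following is a minimal sketch on toy data; the `brand_to_generic` mapping and the `toy_prescriptions` frame are hypothetical illustrations and are not part of MIMIC or `pyamr`.

```python
import pandas as pd

# Hypothetical mapping from brand names to generic names
# (illustrative only; a real analysis would need a curated list).
brand_to_generic = {
    'ANCEF': 'CEFAZOLIN',
    'LEVAQUIN': 'LEVOFLOXACIN',
    'CIPRO': 'CIPROFLOXACIN',
}

# Toy prescriptions and toy list of tested antimicrobials
toy_prescriptions = pd.DataFrame({
    'DRUG': ['Ancef', 'CEFAZOLIN', 'Levaquin', 'HEPARIN']
})
tested = pd.Series(['CEFAZOLIN', 'LEVOFLOXACIN', 'VANCOMYCIN'])

# Normalise: upper case, trim, then replace brand names by generic names
drug = toy_prescriptions.DRUG.str.upper().str.strip()
toy_prescriptions['DRUG'] = drug.replace(brand_to_generic)

# Keep only those drugs tested in the susceptibility records
matched = toy_prescriptions[toy_prescriptions.DRUG.isin(tested.unique())]

print(matched.DRUG.tolist())
```

With this mapping, the two brand-name rows are retained under their generic names, while the non-antimicrobial row is still dropped.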

Let's rename the variables.

 # Rename variables
 susceptibility, prescriptions = data1, data2

 # Show
 if TERMINAL:
     print("\nSusceptibility:")
     print(susceptibility.head(10))
     print("\nPrescriptions:")
     print(prescriptions.head(10))

Now we need to create a summary table containing, for each antimicrobial, the resistance value (computed using SARI) and the usage (computed manually from the prescriptions). This summary table is a required preliminary step to compute the DRI.
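For reference, the DRI is commonly defined as a usage-weighted average of resistance. The formula below is a sketch under that assumption; the symbols are notation introduced here for illustration, with $\rho_{k,t}$ the resistance to antimicrobial $k$ in period $t$ (the `sari` column in this example) and $u_{k,t}$ its usage (the `use` column).

```latex
\mathrm{DRI}_t = \sum_{k} \rho_{k,t} \, q_{k,t},
\qquad
q_{k,t} = \frac{u_{k,t}}{\sum_{j} u_{j,t}}
```

Both quantities are needed per antimicrobial and period, which is why the susceptibility and prescription data are combined into a single table.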

 # ------------------------
 # Compute summary table
 # ------------------------
 # Libraries
 from pyamr.core.sari import SARI

 # Create sari instance
 sari = SARI(groupby=[
     susceptibility.DATE.dt.year,
     #'SPECIMEN',
     'MICROORGANISM',
     'ANTIMICROBIAL',
     'SENSITIVITY']
 )

 # Compute susceptibility summary table
 smmry1 = sari.compute(susceptibility,
     return_frequencies=True)

 # .. note:: We are counting the number of rows as an indicator
 #           of prescriptions. However, it would be possible to
 #           sum the doses (with caution due to units, ...)

 # Compute prescriptions summary table.
 smmry2 = prescriptions \
     .groupby(by=[prescriptions.DATE.dt.year, 'DRUG']) \
     .DRUG.count().rename('use')
     #.DOSE.sum().rename('use')
 smmry2.index.names = ['DATE', 'ANTIMICROBIAL']

 # Combine both summary tables
 smmry = smmry1.reset_index().merge(
     smmry2.reset_index(), how='inner',
     left_on=['DATE', 'ANTIMICROBIAL'],
     right_on=['DATE', 'ANTIMICROBIAL']
 )

 # Show
 if TERMINAL:
     print("\nSummary:")
     print(smmry)

 smmry

	DATE	MICROORGANISM	ANTIMICROBIAL	intermediate	pass	resistant	sensitive	freq	sari	use
0	2110	BETA STREPTOCOCCUS GROUP B	ERYTHROMYCIN	0.0	0.0	1.0	0.0	1.0	1.00	3
1	2110	CORYNEBACTERIUM UREALYTICUM SP. NOV.	ERYTHROMYCIN	0.0	0.0	1.0	0.0	1.0	1.00	3
2	2110	POSITIVE FOR GROUP B BETA STREPTOCOCCI	ERYTHROMYCIN	0.0	0.0	1.0	0.0	1.0	1.00	3
3	2110	POSITIVE FOR METHICILLIN RESISTANT S...	ERYTHROMYCIN	0.0	0.0	1.0	0.0	1.0	1.00	3
4	2110	STAPH AUREUS COAG +	ERYTHROMYCIN	0.0	0.0	39.0	21.0	60.0	0.65	3
...	...	...	...	...	...	...	...	...	...	...
9933	2210	STAPHYLOCOCCUS, COAGULASE NEGATIVE	VANCOMYCIN	0.0	0.0	0.0	1.0	1.0	0.00	5
9934	2210	ESCHERICHIA COLI	MEROPENEM	0.0	0.0	0.0	9.0	9.0	0.00	5
9935	2210	KLEBSIELLA PNEUMONIAE	MEROPENEM	0.0	0.0	0.0	4.0	4.0	0.00	5
9936	2210	PSEUDOMONAS AERUGINOSA	MEROPENEM	0.0	0.0	0.0	2.0	2.0	0.00	5
9937	2210	SERRATIA MARCESCENS	MEROPENEM	0.0	0.0	0.0	1.0	1.0	0.00	5

9938 rows × 10 columns
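The `use` column above counts prescription rows. As the note in the code suggests, doses could be summed instead, provided the units are harmonised first. The following is a minimal sketch on toy data, assuming the `dose_val_rx` and `dose_unit_rx` columns shown in the prescriptions output; the `to_mg` conversion table is hypothetical and covers only the mass units that appear in that output.

```python
import pandas as pd

# Toy prescriptions mimicking the columns shown earlier
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2167-05-24', '2167-06-20',
                            '2167-06-22', '2135-09-10']),
    'DRUG': ['CEFAZOLIN', 'VANCOMYCIN', 'VANCOMYCIN', 'CEFTRIAXONE'],
    'dose_val_rx': ['2', '1000', '1500', '1'],
    'dose_unit_rx': ['g', 'mg', 'mg', 'gm'],
})

# Hypothetical conversion factors to milligrams
to_mg = {'mg': 1.0, 'g': 1000.0, 'gm': 1000.0}

# Convert doses to numbers; values that cannot be parsed become NaN
value = pd.to_numeric(toy.dose_val_rx, errors='coerce')

# Units missing from the conversion table also become NaN
factor = toy.dose_unit_rx.str.lower().map(to_mg)
toy['DOSE'] = value * factor

# Sum doses per year and drug (NaN doses are skipped by sum)
use = toy.groupby(by=[toy.DATE.dt.year, 'DRUG']) \
    .DOSE.sum().rename('use')
use.index.names = ['DATE', 'ANTIMICROBIAL']

print(use)
```

Note that totals in milligrams are not directly comparable across drugs with different typical doses, so this sketch only addresses the unit issue.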

Let's compute the DRI.

 # -------------------------
 # Compute DRI
 # -------------------------
 # Libraries
 from pyamr.core.dri import DRI

 # Instance
 obj = DRI(
     column_resistance='sari',
     column_usage='use'
 )

 # Compute overall DRI
 dri1 = obj.compute(smmry,
     groupby=['DATE'],
     return_usage=True)

 # Compute DRI by organism
 dri2 = obj.compute(smmry,
     groupby=['DATE', 'MICROORGANISM'],
     return_usage=True)


 if TERMINAL:
     print("DRI overall:")
     print(dri1)
     print("DRI by microorganism:")
     print(dri2)

 dri1

	use_period	dri
DATE
2110	30.0	0.6817
2111	262.0	0.2516
2112	196.0	0.1040
2113	159.0	0.1330
2114	64.0	0.0576
...	...	...
2206	303.0	0.1102
2207	135.0	0.2612
2208	36.0	0.2141
2209	192.0	0.2178
2210	75.0	0.0370

101 rows × 2 columns

 dri2

		use_period	dri
DATE	MICROORGANISM
2110	BETA STREPTOCOCCUS GROUP B	3.0	1.00
	CORYNEBACTERIUM UREALYTICUM SP. NOV.	3.0	1.00
	POSITIVE FOR GROUP B BETA STREPTOCOCCI	3.0	1.00
	POSITIVE FOR METHICILLIN RESISTANT STAPH AUREUS	3.0	1.00
	STAPH AUREUS COAG +	3.0	0.65
...	...	...	...
2210	PSEUDOMONAS AERUGINOSA	5.0	0.00
	SERRATIA MARCESCENS	10.0	0.00
	STAPH AUREUS COAG +	5.0	0.00
	STAPHYLOCOCCUS LUGDUNENSIS	5.0	0.00
	STAPHYLOCOCCUS, COAGULASE NEGATIVE	5.0	0.00

4125 rows × 2 columns

Let's visualise the overall DRI.

 # --------------------------------------------
 # Plot
 # --------------------------------------------
 # Libraries
 import matplotlib.pyplot as plt
 import seaborn as sns

 # Display using relplot
 g = sns.relplot(data=dri1.reset_index(), x='DATE', y='dri',
     height=2, aspect=3.0, kind='line',
     linewidth=2, markersize=0, marker='o'
 )

 plt.tight_layout()

Let's visualise the microorganism-wise DRI.

 # --------------------------------------------
 # Format
 # --------------------------------------------
 # Copy results
 aux = dri2.copy(deep=True)

 # Combine with summary
 aux = aux.merge(smmry, how='left',
     left_on=['DATE', 'MICROORGANISM'],
     right_on=['DATE', 'MICROORGANISM'])

 # Find the microorganisms with the most samples
 top = aux.groupby(by='MICROORGANISM') \
     .freq.sum().sort_values(ascending=False) \
     .head(4)

 # Filter by top microorganisms
 aux = aux[aux.MICROORGANISM.isin(top.index)]

 # --------------------------------------------
 # Plot
 # --------------------------------------------
 # Display
 g = sns.relplot(data=aux,
     x='DATE', y='dri', hue='MICROORGANISM',
     row='MICROORGANISM', palette='rocket',
     #style='event', col='region', palette='palette',
     height=1.5, aspect=4.0, kind='line', legend=False,
     linewidth=2, markersize=0, marker='o')

 """
 # Iterate over each subplot to customize further
 for title, ax in g.axes_dict.items():
     ax.text(1., .85, title, transform=ax.transAxes,
         fontsize=9, fontweight="normal",
         horizontalalignment='right')
 """
 # Configure
 g.tight_layout()
 g.set_titles("{row_name}")
 #g.set_titles("")

 # Show
 plt.show()

Figure: DRI by year for the four microorganisms with the most samples, one panel each (STAPH AUREUS COAG +, ENTEROCOCCUS SP., ESCHERICHIA COLI, KLEBSIELLA PNEUMONIAE).

The top microorganisms are:

 top

MICROORGANISM
ESCHERICHIA COLI         101050.0
STAPH AUREUS COAG +       28165.0
KLEBSIELLA PNEUMONIAE     28094.0
ENTEROCOCCUS SP.          10585.0
Name: freq, dtype: float64

The results look as follows:

 aux.rename(columns={
     'intermediate': 'I',
     'sensitive': 'S',
     'resistant': 'R',
     'pass': 'P'
 }).round(decimals=2)

	DATE	MICROORGANISM	use_period	dri	ANTIMICROBIAL	I	P	R	S	freq	sari	use
4	2110	STAPH AUREUS COAG +	3.0	0.65	ERYTHROMYCIN	0.0	0.0	39.0	21.0	60.0	0.65	3
32	2111	ENTEROCOCCUS SP.	7.0	0.35	VANCOMYCIN	0.0	0.0	23.0	43.0	66.0	0.35	7
33	2111	ESCHERICHIA COLI	5.0	0.23	CEFTRIAXONE	1.0	0.0	25.0	107.0	133.0	0.20	2
34	2111	ESCHERICHIA COLI	5.0	0.23	CEFAZOLIN	1.0	0.0	32.0	100.0	133.0	0.25	3
38	2111	KLEBSIELLA PNEUMONIAE	5.0	0.04	CEFTRIAXONE	0.0	0.0	1.0	25.0	26.0	0.04	2
...	...	...	...	...	...	...	...	...	...	...	...	...
9928	2210	ESCHERICHIA COLI	10.0	0.28	CEFTRIAXONE	0.0	0.0	5.0	4.0	9.0	0.56	5
9929	2210	ESCHERICHIA COLI	10.0	0.28	MEROPENEM	0.0	0.0	0.0	9.0	9.0	0.00	5
9930	2210	KLEBSIELLA PNEUMONIAE	10.0	0.00	CEFTRIAXONE	0.0	0.0	0.0	4.0	4.0	0.00	5
9931	2210	KLEBSIELLA PNEUMONIAE	10.0	0.00	MEROPENEM	0.0	0.0	0.0	4.0	4.0	0.00	5
9935	2210	STAPH AUREUS COAG +	5.0	0.00	VANCOMYCIN	0.0	0.0	0.0	1.0	1.0	0.00	5

1108 rows × 12 columns
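The rows above allow the `sari` and `dri` columns to be checked by hand. The sketch below uses the two ESCHERICHIA COLI rows for 2111 and assumes, consistently with the displayed values, that intermediate isolates count as resistant in SARI and that the DRI is the usage-weighted mean of `sari`. It is a re-computation with plain pandas for illustration, not the `pyamr` implementation.

```python
import pandas as pd

# ESCHERICHIA COLI rows for 2111, copied from the table above
rows = pd.DataFrame({
    'ANTIMICROBIAL': ['CEFTRIAXONE', 'CEFAZOLIN'],
    'I': [1.0, 1.0],
    'R': [25.0, 32.0],
    'S': [107.0, 100.0],
    'use': [2, 3],
})

# SARI, assuming intermediate isolates count as resistant
rows['sari'] = (rows.R + rows.I) / (rows.R + rows.I + rows.S)

# DRI as the usage-weighted mean of the resistance
dri = (rows.sari * rows.use).sum() / rows.use.sum()

print(rows.sari.round(2).tolist())
print(round(dri, 2))
```

Rounded to two decimals, this gives 0.20 and 0.25 for `sari` and 0.23 for `dri`, which matches the values shown for these rows.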

Total running time of the script: ( 0 minutes 12.707 seconds)
