pyamr.datasets.microbiology package
Submodules
pyamr.datasets.microbiology.create_quickimport module
Functions:
- create_registry(data[, keyword, keep]): Creates a registry from data.
- pyamr.datasets.microbiology.create_quickimport.create_registry(data, keyword=None, keep=None)
Creates a registry from data.
- Parameters:
data (pd.DataFrame) – The data from which the registry is created.
keyword (string) – The column-name prefix. All columns whose names start with this keyword are kept and used for the registry.
keep (list) – The list of columns to keep for the registry.
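The parameter descriptions above explain how columns are selected for the registry. A minimal sketch of that selection follows, assuming the registry is simply the de-duplicated subset of the selected columns (an assumption: the documentation does not say what else `create_registry` does). The function name `select_registry_columns` and the example columns are hypothetical and are not part of pyamr.

```python
import pandas as pd


def select_registry_columns(data, keyword=None, keep=None):
    """Hypothetical sketch, not pyamr's implementation.

    Keeps the columns listed in ``keep`` plus every column whose name
    starts with ``keyword``, then drops duplicated rows.
    """
    columns = list(keep) if keep is not None else []
    if keyword is not None:
        # Add prefixed columns that were not already requested via ``keep``.
        columns += [c for c in data.columns
                    if str(c).startswith(keyword) and c not in columns]
    # Assumption: a registry holds one row per unique combination of values.
    return data[columns].drop_duplicates().reset_index(drop=True)


# Hypothetical example data.
data = pd.DataFrame({
    'microorganism_code': ['ECOL', 'ECOL', 'SAUR'],
    'microorganism_name': ['escherichia coli', 'escherichia coli',
                           'staphylococcus aureus'],
    'specimen': ['urine', 'blood', 'wound'],
})
registry = select_registry_columns(data, keyword='microorganism_')
print(registry)
```

Because `specimen` is not selected, the two *E. coli* rows collapse into one registry entry; passing `keep=['specimen']` as well would keep all three rows.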
pyamr.datasets.microbiology.create_susceptibility module
Functions:
- create_antimicrobials_lookup_table(abxs): Creates the lookup table for the antimicrobials.
- create_microorganisms_lookup_table(orgs): Creates the lookup table for the organisms.
- pyamr.datasets.microbiology.create_susceptibility.create_antimicrobials_lookup_table(abxs)
Creates the lookup table for the antimicrobials.
This method combines the information in the antibiotics DataFrame with the information in the default antimicrobials registry to create a unique lookup table for the data.
- Parameters:
abxs (pd.DataFrame) – The DataFrame with … The DataFrame must contain the following columns:
- Returns:
Lookup table DataFrame with the following columns:
- Return type:
pd.DataFrame
- pyamr.datasets.microbiology.create_susceptibility.create_microorganisms_lookup_table(orgs)
Creates the lookup table for the organisms.
This method combines the information in the organisms DataFrame with the information in the default microorganisms registry to create a unique lookup table for the data.
- Parameters:
orgs (pd.DataFrame) – The DataFrame with the organism genus and species for which the lookup table should be created. The DataFrame must contain the following columns:
microorganism_name, genus, species
- Returns:
Lookup table DataFrame with the following columns:
'domain', 'phylum', 'class', 'order', 'family', 'genus', 'species', 'acronym', 'exists_in_registry', 'gram_stain', 'microorganism_code', 'microorganism_name'
- Return type:
pd.DataFrame
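The description above says the lookup table combines the input organisms with the default microorganisms registry and records whether each entry exists there. A minimal sketch of that idea uses a left merge on genus and species with pandas' `indicator` option. The tiny in-memory `REGISTRY`, the `build_lookup` name and the choice of merge keys are all hypothetical; pyamr's actual matching logic is not documented here.

```python
import pandas as pd

# Hypothetical stand-in for the default microorganisms registry.
REGISTRY = pd.DataFrame({
    'genus': ['escherichia', 'staphylococcus'],
    'species': ['coli', 'aureus'],
    'gram_stain': ['n', 'p'],
    'microorganism_code': ['ECOL', 'SAUR'],
})


def build_lookup(orgs, registry=REGISTRY):
    """Hypothetical sketch: left-join unique organisms onto a registry."""
    # Keep one row per organism so the lookup table is unique.
    unique = orgs[['microorganism_name', 'genus', 'species']].drop_duplicates()
    merged = unique.merge(registry, on=['genus', 'species'],
                          how='left', indicator=True)
    # '_merge' is 'both' when the (genus, species) pair was found.
    merged['exists_in_registry'] = merged['_merge'] == 'both'
    return merged.drop(columns='_merge').reset_index(drop=True)


orgs = pd.DataFrame({
    'microorganism_name': ['escherichia coli', 'klebsiella pneumoniae',
                           'escherichia coli'],
    'genus': ['escherichia', 'klebsiella', 'escherichia'],
    'species': ['coli', 'pneumoniae', 'coli'],
})
lookup = build_lookup(orgs)
print(lookup[['microorganism_name', 'microorganism_code', 'exists_in_registry']])
```

A left merge keeps organisms that are missing from the registry (their registry columns are NaN), which is what makes an `exists_in_registry` flag meaningful.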
pyamr.datasets.microbiology.quickimport_create module
pyamr.datasets.microbiology.quickimport_run module
Functions:
pyamr.datasets.microbiology.test module
import pandas as pd
# -------------------------------------------------
# Test to_csv date_format
# -------------------------------------------------
# Create DataFrame
a = pd.DataFrame()

# Create dates
a['dates'] = ['23/01/2015 19:37',
              '23/01/2015 20:08']

# Format dates (strings are day-first: dd/mm/yyyy HH:MM)
a['dates'] = pd.to_datetime(a['dates'], format='%d/%m/%Y %H:%M')

# Save
print("\nDF:")
print(a)
print("Saving...")
#a.to_csv('test-v0.1.csv')
#a.to_csv('test-v0.2.csv', date_format='%Y-%m-%d %H:%M:%S')
# -------------------------------------------------
# Test cleaning / replacing
# -------------------------------------------------
# Create DataFrame
a = pd.DataFrame()

# Create regexp map
REGEX_MAP = {
    r'\([^)]*\)': '',                       # Remove everything between ().
    r'species': '',                         # Rename species for next regexp
    r'sp(\.)?(\s|$)+': ' ',                 # Remove sp from word.
    r'strep(\.|\s|$)': 'streptococcus ',    # Complete
    r'staph(\.|\s|$)': 'staphylococcus ',   # Complete
    r'\s+': ' '                             # Remove duplicated spaces.
}

# Create data
a['spaces'] = [' in between ', ' sides ', 'end ', ' start', None]
a['species'] = [' sp.', ' sp', 'sp ', ' sp. ', 'species']
a['occus'] = ['haemolytic streptococcus',
              'haemolytic strep',
              'haemolytic strep.',
              'haemolytic strep. aureus',
              'haemolytic strep aureus']
a['occus2'] = ['strep.aureus',
               'staph.aureus',
               'methicillin resistant staph.aureus',
               'feo strepococcus',
               'streptococcus feo']

# Cleaned
cleaned = a.copy(deep=True)
cleaned = cleaned.replace(regex=REGEX_MAP)
cleaned = cleaned.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

# Show
print("-" * 80)
print("\nOriginal")
print(a)
print("\nCleaned")
print(cleaned)
# -------------------------------------------------
# Haemolytic
# -------------------------------------------------
# .. note: https://regex101.com/r/KFXCCM/1
# -------------------------------------------------
# Test genus at the beginning
# -------------------------------------------------
# Import regular expressions
import re

# Import function
from pyamr.datasets.clean import word_to_start

# Species examples
series = pd.Series(['is viridians, enteroccocus en',
                    'is viridians, enteroccocus',
                    'viridians enteroccocus',
                    'enterococcus viridians',
                    'non haemolytic feo enteroccocus',
                    'non-haemolytic enteroccocus',
                    'non haemolytic enteroccocus',
                    'vancomycin resistant enteroccocus',
                    None])

# Corrected
corrected = series.apply(word_to_start, w='enteroccocus')

# Show
print("-" * 80)
print("\nRaw")
print(series)
print("\nCorrected")
print(corrected)
# -------------------------------------------------
# Test hyphen
# -------------------------------------------------
# Import
from pyamr.datasets.clean import hyphen_before

# Haemolytic examples
series = pd.Series(['non haemolytic something',
                    ' non haemolytic something',
                    ' when beta haemolytic something',
                    ' whn gamma haemolytic',
                    'non haemolytic'])

# Correct it
corrected = series.apply(hyphen_before, w='haemolytic')

# Show
print("-" * 80)
print("\nRaw:")
print(series)
print("\nCorrected:")
print(corrected)
# -------------------------------------------------
# Full test
# -------------------------------------------------
# In this section, we do a full test, especially for
# those examples that have proven to be problematic in
# the final microorganisms.csv outcome.
# Import clean common
from pyamr.datasets.clean import clean_common
from pyamr.datasets.registries import _clean_microorganism

# Create dataframe
df = pd.DataFrame()

# Add microorganism names.
df['microorganism_name'] = [
    'non haemolytic streptococcus aureus',
    'this non haemolytic streptococcus',
    'beta-haemolytic streptococcus group a',
    'beta-haemolytic streptococcus group b',
    'beta-haemolytic streptococcus group c',
    'beta-haemolytic streptococcus group c/g',
    'beta-haemolytic streptococcus group g',
    'this is haemolyticus',
    'perestreptococcus',
    'Coagulase negative staphylococcus',
    'Methicillin Resistant Staph.aureus',
    'mixed streptococcus alpha-haemolytic',
    'non-coliform lactose fermenting',
    'Non-haemolytic streptococcus',
    'Non-lactose Fermenting Coliform',
    'Vancomycin Resistant Enterococcus',
    ' non- lactose fermenting coliform',
    'non-lactose fermenting coliform',
    'mixed lactose fermenting coliform',
    'paenibacillus sp',
    'paenibacillus sp.',
    'paenibacillus sp..',
    'escherichia coli o157',
    '* mrsa * isolated',
    'streptococcus milleri group',
    'aspergillus fumigatus'
]

# Any alterations
df['microorganism_name'] = df['microorganism_name'].str.upper()

# Clean
#aux = clean_common(df.copy(deep=True))
aux = _clean_microorganism(df['microorganism_name'])

# Show
print("-" * 80)
print("\nData:")
print(df)
print("\nCorrected:")
print(aux)
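The regular-expression map in the cleaning test only works if its patterns run in order: the completion patterns append a trailing space that the final whitespace pattern then collapses. A minimal sketch of that sequential behaviour on plain strings with Python's `re` module, using a subset of those patterns, is shown below. The `PATTERNS` list and the `clean_name` helper are hypothetical and not part of pyamr; the sketch assumes the dictionary entries are applied in order, as it does here, rather than reproducing exactly what `DataFrame.replace` does internally.

```python
import re

# Subset of the cleaning patterns, applied in this order.
PATTERNS = [
    (r'\([^)]*\)', ''),                    # Remove everything between ().
    (r'strep(\.|\s|$)', 'streptococcus '),  # Complete abbreviated strep.
    (r'staph(\.|\s|$)', 'staphylococcus '), # Complete abbreviated staph.
    (r'\s+', ' '),                         # Collapse repeated whitespace.
]


def clean_name(name):
    """Hypothetical helper: apply each pattern in sequence, then strip."""
    for pattern, replacement in PATTERNS:
        name = re.sub(pattern, replacement, name)
    return name.strip()


for raw in ['haemolytic strep. aureus', 'staph.aureus (mrsa)']:
    print(repr(raw), '->', repr(clean_name(raw)))
```

Running the whitespace pattern last matters: if it ran first, `'strep. aureus'` would become `'streptococcus  aureus'` with a double space that nothing removes.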