pyamr.datasets package
Subpackages
- pyamr.datasets.microbiology package
Submodules
pyamr.datasets.clean module
Functions:
- clean_basic: Performs the basic cleaning.
- clean_clwsql008: Performs cleaning for clwsql008 data.
- clean_clwsql008_old: This method cleans microbiology data from clwsql008.
- clean_common: This method cleans the microbiology data.
- Final formatting...
- clean_legacy: This method cleans microbiology data from legacy.
- clean_legacy_old: This method cleans microbiology data from legacy.
- This method....
- This method...
- hyphen_before: Ensures hyphen between words is correct.
- string_replace: This method corrects the strings.
- word_to_start: Moves the word within the string.
- pyamr.datasets.clean.clean_basic(data)[source]
Performs the basic cleaning.
Convert everything to lowercase
Strip leading and trailing spaces
Collapse duplicate spaces (regexp)
Remove duplicates
- Parameters:
data (pd.DataFrame) – The data to clean.
- Returns:
The cleaned data
- Return type:
pd.DataFrame
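The four steps listed for clean_basic can be sketched with plain pandas. This is a minimal illustration, not pyamr's implementation: the function name basic_clean_sketch is hypothetical, and it assumes that "remove duplicates" means dropping duplicated rows and that only text columns are cleaned.

```python
import pandas as pd


def basic_clean_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the steps listed above (not pyamr's code)."""
    out = data.copy()
    for col in out.columns:
        # Assumption: only text columns are cleaned; others pass through.
        if pd.api.types.is_string_dtype(out[col]):
            out[col] = (
                out[col]
                .str.lower()                           # everything to lowercase
                .str.strip()                           # strip begin/end spaces
                .str.replace(r"\s+", " ", regex=True)  # collapse repeated spaces
            )
    # Assumption: "remove duplicates" means dropping duplicated rows.
    return out.drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame({
    "microorganism_name": ["  Escherichia   COLI ", "escherichia coli"],
    "sensitivity_name": ["Resistant", "resistant "],
})
cleaned = basic_clean_sketch(raw)
```

In this example the two input rows become identical after normalisation, so only one row is left in cleaned.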
- pyamr.datasets.clean.clean_clwsql008(data, clean_microorganism=True)[source]
Performs cleaning for clwsql008 data.
Rename columns
Apply basic cleaning (clean_basic)
Correct issue with sensitivities
Correct issue with date_received
- Parameters:
data (pd.DataFrame) – The data to clean.
- Returns:
The cleaned data
- Return type:
pd.DataFrame
- pyamr.datasets.clean.clean_clwsql008_old(data, verbose=10)[source]
This method cleans microbiology data from clwsql008.
- Parameters:
data (pd.DataFrame) – The dataframe with the data
- Returns:
The cleaned dataframe.
- Return type:
pd.DataFrame
- pyamr.datasets.clean.clean_common(data, verbose=10)[source]
This method cleans the microbiology data.
- It assumes the following columns are imputed:
date_received, date_outcome, microorganism_code, microorganism_name (required = True), antimicrobial_code, antimicrobial_name, method_code, method_name, sensitivity_code, sensitivity_name
- Parameters:
data (pd.DataFrame) – The dataframe to clean
- Returns:
The cleaned dataframe
- Return type:
pd.DataFrame
- pyamr.datasets.clean.clean_legacy(data, clean_microorganism=True, verbose=10)[source]
This method cleans microbiology data from legacy.
Rename columns
Apply basic cleaning (clean_basic)
Add sensitivity code
Correct specimen issue
- Parameters:
data (pd.DataFrame) – The data to clean
- Returns:
The cleaned data
- Return type:
pd.DataFrame
- pyamr.datasets.clean.clean_legacy_old(data, verbose=10)[source]
This method cleans microbiology data from legacy.
- Parameters:
data (pd.DataFrame) – The dataframe with the data
- Returns:
The cleaned dataframe.
- Return type:
pd.DataFrame
- pyamr.datasets.clean.hyphen_before(x, w)[source]
Ensures hyphen between words is correct.
- Parameters:
x (string) – The string to format
w (string) – The word that should be preceded by a hyphen.
- Returns:
The formatted string
- Return type:
string
- pyamr.datasets.clean.string_replace(series, remove={})[source]
This method corrects the strings.
- Parameters:
series
remove
- pyamr.datasets.clean.word_to_start(x, w, pos='start', verbose=0)[source]
Moves the word within the string.
- Parameters:
x (string) – The string to format
w (string) – The word to relocate within the string.
pos (string, default start) – The position to insert the word. The possible options are start (at the beginning) or end (at the end) of the string.
verbose (int) – Level of verbosity
- Returns:
Formatted string
- Return type:
string
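A rough sketch of the relocation described by the pos parameter follows. The helper move_word_sketch is hypothetical and is not pyamr's code. It assumes words are separated by whitespace, that every occurrence of the word is removed before it is re-inserted once, and that strings that do not contain the word are returned unchanged.

```python
def move_word_sketch(x: str, w: str, pos: str = "start") -> str:
    """Hypothetical illustration of relocating a word (not pyamr's code)."""
    if pos not in ("start", "end"):
        raise ValueError("pos must be 'start' or 'end'")
    words = x.split()
    if w not in words:
        # Assumption: strings without the word are returned unchanged.
        return x
    # Drop every occurrence of the word, then re-insert it once.
    rest = [t for t in words if t != w]
    if pos == "start":
        return " ".join([w] + rest)
    return " ".join(rest + [w])


moved_start = move_word_sketch("streptococcus beta haemolytic", "haemolytic")
moved_end = move_word_sketch("haemolytic streptococcus", "haemolytic", pos="end")
```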
pyamr.datasets.load module
Functions:
- fixture: Load fixtures
- load_data_mimic: This method loads the susceptibility data.
- load_data_nhs: This method loads the susceptibility data.
- load_microbiology_folder: This method loads the susceptibility data.
- load_registry_antimicrobials: This method returns the antimicrobials registry
- This method returns the microorganisms registry
- This method returns sample data (Anonymised)
- This method creates a hard-coded time series.
- pyamr.datasets.load.fixture(name, **kwargs)[source]
Load fixtures
- Parameters:
name (string) – The name of the file within the fixtures folder.
- Return type:
pd.DataFrame
- pyamr.datasets.load.load_data_mimic(folder='susceptibility-v0.0.1', **kwargs)[source]
This method loads the susceptibility data.
- pyamr.datasets.load.load_data_nhs(folder='susceptibility-v0.0.2', **kwargs)[source]
This method loads the susceptibility data.
- pyamr.datasets.load.load_microbiology_folder(path, folder, glob_pattern='susceptibility-*.csv', **kwargs)[source]
This method loads the susceptibility data.
Note
It assumes that all the susceptibility data is stored in csv files whose names start with ‘susceptibility’. In addition, it assumes that the additional information is available in files named ‘antimicrobials.csv’ and ‘microorganisms.csv’.
- Parameters:
path (string) – The path where the folder is located.
folder (string) – Name of the folder with the data.
kwargs – Arguments to pass to pd.read_csv
- Returns:
susceptibility – The susceptibility test data
db_abxs – The registry with the antimicrobials
db_orgs – The registry with the microorganisms
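The file layout described in the note can be read with pathlib and pandas along these lines. This is a sketch under assumptions, not pyamr's loader: load_folder_sketch is a hypothetical name, all matching susceptibility files are assumed to be concatenated into one DataFrame, and the same kwargs are assumed to be passed to every pd.read_csv call.

```python
from pathlib import Path

import pandas as pd


def load_folder_sketch(path, folder, glob_pattern="susceptibility-*.csv", **kwargs):
    """Hypothetical loader following the note above (not pyamr's code)."""
    base = Path(path) / folder
    # Collect every susceptibility file matching the pattern.
    files = sorted(base.glob(glob_pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {glob_pattern!r} in {base}")
    # Assumption: all matching files share columns and are concatenated.
    susceptibility = pd.concat(
        [pd.read_csv(f, **kwargs) for f in files], ignore_index=True
    )
    # The registries are expected next to the susceptibility files.
    db_abxs = pd.read_csv(base / "antimicrobials.csv", **kwargs)
    db_orgs = pd.read_csv(base / "microorganisms.csv", **kwargs)
    return susceptibility, db_abxs, db_orgs
```

A call such as load_folder_sketch("./data", "susceptibility-v0.0.2") would return the three DataFrames in the order listed under Returns; the "./data" path is only an example.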
- pyamr.datasets.load.load_registry_antimicrobials()[source]
This method returns the antimicrobials registry
pyamr.datasets.registries module
Classes:
- AntimicrobialRegistry: Registry for antimicrobials
- MicroorganismRegistry: Registry for microorganisms
- Registry: This is basically a lookup table.
Functions:
- acronym_series: This method...
- acronym_series_unique: Computes unique acronyms.
- clean_specimen: srs = right -> end sls = left -> end
- create_registry: Creates registry from data.
- This method...
- class pyamr.datasets.registries.AntimicrobialRegistry(keyword='', order=['id', 'name', 'code', 'description', 'original'], subset=['name', 'original'], fclean={})[source]
Bases: Registry
Registry for antimicrobials
Methods:
- combine(dataframe[, on]): Combines an external dataframe with the registry.
Attributes:
- reg = None
- class pyamr.datasets.registries.MicroorganismRegistry(**kwargs)[source]
Bases: Registry
Registry for microorganisms
Methods:
- combine(dataframe[, on]): Combines an external dataframe with the registry.
- uuid()
Attributes:
- reg = None
- taxonomy = ['domain', 'class', 'order', 'family', 'genus', 'species', 'subspecies']
- class pyamr.datasets.registries.Registry(keyword='', order=['id', 'name', 'code', 'description', 'original'], subset=['name', 'original'], fclean={})[source]
Bases: object
This is basically a lookup table.
Methods:
- clean(series)
- combine(data)
- fit(data): This method...
- fit_transform(data, **kwargs): Fits and transforms
- getr([prepend]): Returns the registry DataFrame
- replace(series[, key, value]): This method...
- transform(data[, replace, include_id]): Transform data
Attributes:
- FCLEAN = {}
- ORDER = ['id', 'name', 'code', 'description', 'original']
- REG = None
- RENAME_COLUMNS = {}
- SUBSET = ['name', 'original']
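To illustrate what "basically a lookup table" with fit and transform methods might look like, here is a deliberately small sketch. LookupSketch is a hypothetical class and does not reproduce the Registry implementation: it assumes the table has only id and name columns, that ids are assigned in order of first appearance, and that values unseen during fit map to NaN.

```python
import pandas as pd


class LookupSketch:
    """Hypothetical lookup table with fit/transform (not pyamr's Registry)."""

    def fit(self, series: pd.Series) -> "LookupSketch":
        # One row per unique value; ids follow the order of first appearance.
        names = pd.unique(series.dropna())
        self.reg = pd.DataFrame({"id": range(len(names)), "name": names})
        return self

    def transform(self, series: pd.Series) -> pd.Series:
        # Map each value to its id; values not seen in fit become NaN.
        mapping = self.reg.set_index("name")["id"]
        return series.map(mapping)


names = pd.Series(["escherichia coli", "klebsiella", "escherichia coli"])
ids = LookupSketch().fit(names).transform(names)
```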
- pyamr.datasets.registries.acronym_series(series, unique_acronyms=False, **kwargs)[source]
This method…
- Parameters:
series (pd.Series) – The series with the names to convert into acronyms.
unique_acronyms
exclude_acronyms
split_n (int)
verbose (int) – Level of verbosity
loops_strategy (function) – The function that indicates which length combinations should be used on each iteration. By default it uses _loops_strategy_lengths, which splits the series in two and returns all possible lengths from (4, 4) to (max_len, max_len). The function passed as loops_strategy takes a series (x) and returns a list (array of lengths).
kwgs_split (dict) – The parameters to pass to the split function
kwgs_acronym (dict) – The parameters to pass to the acronym function.
- Returns:
The acronym series
- Return type:
pd.Series
- pyamr.datasets.registries.acronym_series_unique(series, split_n=1, exclude_acronyms=[], loops_strategy=None, verbose=10, kwgs_acronym={})[source]
Computes unique acronyms.
- pyamr.datasets.registries.clean_specimen(series)[source]
srs = right -> end sls = left -> end
- Parameters:
series –
- Returns:
- pyamr.datasets.registries.create_registry(data, keyword=None, keep=None)[source]
Creates registry from data.
- Parameters:
data (pd.DataFrame) – The data
keyword (string) – The keyword for the columns. All columns starting with such keyword will be kept and used for the registry.
keep (list) – The list of columns to keep for the registry
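One possible reading of the keyword and keep parameters is sketched below. The function create_registry_sketch is hypothetical and is not pyamr's implementation. It assumes that the selected columns are the keyword-prefixed ones plus those in keep, and that the registry holds one row per unique combination of values.

```python
import pandas as pd


def create_registry_sketch(data, keyword=None, keep=None):
    """Hypothetical reading of the parameters above (not pyamr's code)."""
    columns = []
    if keyword is not None:
        # Keep every column whose name starts with the keyword.
        columns += [c for c in data.columns if c.startswith(keyword)]
    if keep is not None:
        # Add explicitly requested columns, skipping any already selected.
        columns += [c for c in keep if c not in columns]
    # Assumption: a registry holds one row per unique combination of values.
    return data[columns].drop_duplicates().reset_index(drop=True)


data = pd.DataFrame({
    "antimicrobial_code": ["AMOX", "AMOX", "CIPR"],
    "antimicrobial_name": ["amoxicillin", "amoxicillin", "ciprofloxacin"],
    "sensitivity_code": ["R", "S", "S"],
})
registry = create_registry_sketch(data, keyword="antimicrobial")
```

Here registry keeps the two antimicrobial columns and collapses the three test rows into two registry rows.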