pyamr.datasets package

Subpackages

pyamr.datasets.microbiology package

Submodules

pyamr.datasets.clean module

Functions:

`clean_basic`(data)	Performs the basic cleaning.
`clean_clwsql008`(data[, clean_microorganism])	Performs cleaning for clwsql008 data
`clean_clwsql008_old`(data[, verbose])	This method cleans microbiology data from clwsql008.
`clean_common`(data[, verbose])	This method cleans the microbiology data.
`clean_format`(data)	Final formatting...
`clean_legacy`(data[, clean_microorganism, ...])	This method cleans microbiology data from legacy.
`clean_legacy_old`(data[, verbose])	This method cleans microbiology data from legacy.
`clean_microorganism`(data)	This method....
`clean_mimic`(data)	This method...
`hyphen_before`(x, w)	Ensures hyphen between words is correct.
`invert`(d)
`string_replace`(series[, remove])	This method corrects the strings.
`word_to_start`(x, w[, pos, verbose])	Moves the word within the string.

pyamr.datasets.clean.clean_basic(data)[source]

Performs the basic cleaning.

Everything to lowercase
Remove spaces begin/end (strip)
Remove duplicate spaces (regexp)
Remove duplicates

Parameters: data (pd.DataFrame) – The data to clean.
Returns: The cleaned data
Return type: pd.DataFrame
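
The four steps listed for `clean_basic` can be illustrated with a minimal pandas sketch. This is not the pyamr implementation: the function name `basic_clean_sketch`, the choice to touch only string columns, and the sample column names are assumptions made for illustration.

```python
import pandas as pd


def basic_clean_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Illustrative stand-in for the listed steps (hypothetical, not pyamr code)."""
    out = data.copy()
    for col in out.columns:
        # Assumption: only columns made up entirely of strings are normalised.
        if not pd.api.types.is_string_dtype(out[col]):
            continue
        out[col] = (
            out[col]
            .str.lower()                           # everything to lowercase
            .str.strip()                           # remove leading/trailing spaces
            .str.replace(r"\s+", " ", regex=True)  # collapse repeated spaces
        )
    # Remove rows that became exact duplicates after normalisation.
    return out.drop_duplicates().reset_index(drop=True)


# Hypothetical sample data.
raw = pd.DataFrame(
    {
        "microorganism_name": ["  Escherichia   Coli ", "escherichia coli", "Klebsiella"],
        "sensitivity_name": ["Resistant", "resistant ", "Sensitive"],
    }
)
cleaned = basic_clean_sketch(raw)
print(cleaned)
```

The first two rows differ only in case and spacing, so they collapse into one row once the text has been normalised.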

pyamr.datasets.clean.clean_clwsql008(data, clean_microorganism=True)[source]

Performs cleaning for clwsql008 data

rename columns
clean basic
correct issue with sensitivities
correct issue with date_received

Parameters: data (pd.DataFrame) – The data to clean.
Returns: The cleaned data
Return type: pd.DataFrame

pyamr.datasets.clean.clean_clwsql008_old(data, verbose=10)[source]

This method cleans microbiology data from clwsql008.

Parameters: data (pd.DataFrame) – The dataframe with the data
Returns: The cleaned dataframe.
Return type: pd.DataFrame

pyamr.datasets.clean.clean_common(data, verbose=10)[source]

This method cleans the microbiology data.

It assumes the following columns are imputed:

date_received
date_outcome
microorganism_code
microorganism_name (required = True)
antimicrobial_code
antimicrobial_name
method_code
method_name
sensitivity_code
sensitivity_name

Parameters: data (pd.DataFrame) – The dataframe to clean
Returns: The cleaned dataframe
Return type: pd.DataFrame
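
Because `clean_common` assumes a fixed set of columns, it can be useful to check a dataframe before cleaning it. The helper below is a hypothetical sketch (the name `missing_columns` is not part of pyamr); it only compares column names against the list given in the description above.

```python
# Column names taken from the description of clean_common.
EXPECTED_COLUMNS = [
    "date_received", "date_outcome",
    "microorganism_code", "microorganism_name",
    "antimicrobial_code", "antimicrobial_name",
    "method_code", "method_name",
    "sensitivity_code", "sensitivity_name",
]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent (hypothetical helper)."""
    present = set(columns)
    return [name for name in expected if name not in present]


# With a DataFrame this would be called as missing_columns(df.columns).
missing = missing_columns(["date_received", "microorganism_name", "antimicrobial_name"])
print(missing)
```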

pyamr.datasets.clean.clean_format(data)[source]: Final formatting…

pyamr.datasets.clean.clean_legacy(data, clean_microorganism=True, verbose=10)[source]

This method cleans microbiology data from legacy.

Rename columns
clean basic
Add sensitivity code
Correct specimen issue

Parameters: data (pd.DataFrame) – The data to clean
Returns: The cleaned data
Return type: pd.DataFrame

pyamr.datasets.clean.clean_legacy_old(data, verbose=10)[source]

This method cleans microbiology data from legacy.

Parameters: data (pd.DataFrame) – The dataframe with the data
Returns: The cleaned dataframe.
Return type: pd.DataFrame

pyamr.datasets.clean.clean_microorganism(data)[source]: This method….

pyamr.datasets.clean.clean_mimic(data)[source]: This method…

pyamr.datasets.clean.hyphen_before(x, w)[source]

Ensures hyphen between words is correct.

Parameters:

x (string) – The string to format
w (string) – The word that should be preceded by a hyphen.

Returns:

The formatted string

Return type:

string
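
The documentation does not spell out what a "correct" hyphen is. One plausible reading is that any run of spaces or hyphens directly before `w` is normalised to a single hyphen. The sketch below implements that reading only; the function name `hyphen_before_sketch` and the exact rule are assumptions, not the pyamr implementation.

```python
import re


def hyphen_before_sketch(x: str, w: str) -> str:
    """Join `w` to the preceding word with exactly one hyphen (assumed rule)."""
    # Match spaces and/or hyphens that sit between some preceding text
    # and the word `w`; the lookarounds leave the words themselves untouched.
    pattern = r"(?<=\S)[\s-]+(?=" + re.escape(w) + r"\b)"
    return re.sub(pattern, "-", x)


print(hyphen_before_sketch("beta haemolytic streptococcus", "haemolytic"))
print(hyphen_before_sketch("beta - haemolytic", "haemolytic"))
```

Under this rule a string in which `w` is the first word is returned unchanged, because there is no preceding word to attach it to.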

pyamr.datasets.clean.invert(d)[source]

pyamr.datasets.clean.string_replace(series, remove={})[source]

This method corrects the strings.

Parameters:

series
remove

pyamr.datasets.clean.word_to_start(x, w, pos='start', verbose=0)[source]

Moves the word within the string.

Parameters:

x (string) – The string to format
w (word) – The word to relocate within the string.
pos (string, default start) – The position to insert the word. The possible options are start (at the beginning) or end (at the end) of the string.
verbose (int) – Level of verbosity

Returns:

Formatted string

Return type:

string
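
The behaviour described for `word_to_start` (relocating one word to the start or the end of a string) can be sketched in plain Python. The function `move_word_sketch` is hypothetical; how the real function treats repeated words, punctuation or the `verbose` flag is not documented here, so the choices below are assumptions.

```python
def move_word_sketch(x: str, w: str, pos: str = "start") -> str:
    """Move the word `w` to the start or end of `x` (illustrative only)."""
    words = x.split()
    if w not in words:
        # Assumption: strings that do not contain the word are left as they are.
        return x
    rest = [token for token in words if token != w]
    if pos == "start":
        return " ".join([w] + rest)
    if pos == "end":
        return " ".join(rest + [w])
    raise ValueError("pos must be 'start' or 'end'")


print(move_word_sketch("coliform lactose fermenting", "lactose"))
print(move_word_sketch("coliform lactose fermenting", "lactose", pos="end"))
```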

pyamr.datasets.load module

Functions:

`fixture`(name, **kwargs)	Load fixtures
`load_data_mimic`([folder])	This method loads the susceptibility data.
`load_data_nhs`([folder])	This method loads the susceptibility data.
`load_microbiology_folder`(path, folder[, ...])	This method loads the susceptibility data.
`load_registry_antimicrobials`()	This method returns the antimicrobials registry
`load_registry_microorganisms`()	This method returns the microorganisms registry
`make_susceptibility`()	This method returns sample data (Anonymised)
`make_timeseries`()	This method creates a hard-coded time series.

pyamr.datasets.load.fixture(name, **kwargs)[source]

Load fixtures

Parameters: name (string) – The name of the file within the fixtures folder.
Return type: pd.DataFrame

pyamr.datasets.load.load_data_mimic(folder='susceptibility-v0.0.1', **kwargs)[source]: This method loads the susceptibility data.

pyamr.datasets.load.load_data_nhs(folder='susceptibility-v0.0.2', **kwargs)[source]: This method loads the susceptibility data.

pyamr.datasets.load.load_microbiology_folder(path, folder, glob_pattern='susceptibility-*.csv', **kwargs)[source]

This method loads the susceptibility data.

Note

It assumes all the susceptibility data is stored in CSV files whose names start with ‘susceptibility’. In addition, it assumes that the additional information is available in files named ‘antimicrobials.csv’ and ‘microorganisms.csv’.

Parameters:

path (string) – The path where the folder is located.
folder (string) – Name of the folder with the data.
kwargs – Arguments to pass to pd.read_csv

Returns:

susceptibility – The susceptibility test data
db_abxs – The registry with the antimicrobials
db_orgs – The registry with the microorganisms
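
The folder layout described in the note can be sketched as follows. This is an illustration under stated assumptions, not the pyamr code: the function name `load_folder_sketch`, the concatenation of all matching files, the decision to forward `kwargs` only to the susceptibility files, and the sample file contents are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_folder_sketch(path, folder, glob_pattern="susceptibility-*.csv", **kwargs):
    """Load susceptibility files plus the two registry files (illustrative)."""
    root = Path(path) / folder
    files = sorted(root.glob(glob_pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {glob_pattern!r} in {root}")
    # Assumption: all matching files share columns and are stacked row-wise.
    frames = [pd.read_csv(f, **kwargs) for f in files]
    susceptibility = pd.concat(frames, ignore_index=True)
    db_abxs = pd.read_csv(root / "antimicrobials.csv")
    db_orgs = pd.read_csv(root / "microorganisms.csv")
    return susceptibility, db_abxs, db_orgs


# Build a throwaway folder with made-up content to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo"
    demo.mkdir()
    (demo / "susceptibility-2020.csv").write_text("antimicrobial_code,sensitivity\nAAMI,resistant\n")
    (demo / "susceptibility-2021.csv").write_text("antimicrobial_code,sensitivity\nACIP,sensitive\n")
    (demo / "antimicrobials.csv").write_text("antimicrobial_code,name\nAAMI,amikacin\n")
    (demo / "microorganisms.csv").write_text("microorganism_code,name\nECOL,escherichia coli\n")
    susceptibility, db_abxs, db_orgs = load_folder_sketch(tmp, "demo")

print(susceptibility.shape, db_abxs.shape, db_orgs.shape)
```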

pyamr.datasets.load.load_registry_antimicrobials()[source]: This method returns the antimicrobials registry

pyamr.datasets.load.load_registry_microorganisms()[source]: This method returns the microorganisms registry

pyamr.datasets.load.make_susceptibility()[source]: This method returns sample data (Anonymised)

pyamr.datasets.load.make_timeseries()[source]

This method creates a hard-coded time series.

Returns: The x values, the y values and the frequencies.
Return type: x, y, f

pyamr.datasets.registries module

Classes:

`AntimicrobialRegistry`([keyword, order, ...])	Registry for antimicrobials
`MicroorganismRegistry`(**kwargs)	Registry for microorganisms
`Registry`([keyword, order, subset, fclean])	This is basically a lookup table.

Functions:

`acronym_series`(series[, unique_acronyms])	This method...
`acronym_series_unique`(series[, split_n, ...])	Computes unique acronyms.
`clean_specimen`(series)	srs = right -> end sls = left -> end
`create_registry`(data[, keyword, keep])	Creates registry from data.
`invert`(d)
`length_exceptions`(x)	This method...

class pyamr.datasets.registries.AntimicrobialRegistry(keyword='', order=['id', 'name', 'code', 'description', 'original'], subset=['name', 'original'], fclean={})[source]

Bases: Registry

Registry for antimicrobials

Methods:

combine(dataframe[, on])

Combines an external dataframe with the registry.

Attributes:

reg

combine(dataframe, on='name')[source]: Combines an external dataframe with the registry.

reg = None

class pyamr.datasets.registries.MicroorganismRegistry(**kwargs)[source]

Bases: Registry

Registry for microorganisms

Methods:

`binomial_name`()
`combine`(dataframe[, on])	Combines an external dataframe with the registry.
`uuid`()

Attributes:

`reg`
`taxonomy`

binomial_name()[source]

combine(dataframe, on='name')[source]: Combines an external dataframe with the registry.

reg = None

taxonomy = ['domain', 'class', 'order', 'family', 'genus', 'species', 'subspecies']

uuid()[source]

class pyamr.datasets.registries.Registry(keyword='', order=['id', 'name', 'code', 'description', 'original'], subset=['name', 'original'], fclean={})[source]

Bases: object

This is basically a lookup table.

Attributes:

`FCLEAN`
`ORDER`
`REG`
`RENAME_COLUMNS`
`SUBSET`

Methods:

`clean`(series)
`combine`(data)
`fit`(data)	This method...
`fit_transform`(data, **kwargs)	Fits and transforms
`getr`([prepend])	Returns the registry DataFrame
`replace`(series[, key, value])	This method...
`transform`(data[, replace, include_id])	Transform data

FCLEAN = {}

ORDER = ['id', 'name', 'code', 'description', 'original']

REG = None

RENAME_COLUMNS = {}

SUBSET = ['name', 'original']

clean(series)[source]

combine(data)[source]

fit(data)[source]

This method…

Parameters: data (pd.DataFrame) – The DataFrame is expected to contain the code, the name and the description, especially the name. What happens when the others are missing still needs to be considered.

fit_transform(data, **kwargs)[source]: Fits and transforms

getr(prepend=False)[source]: Returns the registry DataFrame

replace(series, key='original', value='name')[source]: This method…

transform(data, replace={}, include_id=True)[source]

Transform data

Parameters:

data (pd.DataFrame) – The data to transform.
replace
include_id

Returns:

The data transformed

Return type:

pd.DataFrame
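
Since `Registry` is described as a lookup table with the default columns `id`, `name`, `code`, `description` and `original`, the idea behind `replace(series, key='original', value='name')` can be sketched with plain pandas. The function `replace_sketch`, the sample rows and the codes are hypothetical, and keeping unmatched values unchanged is an assumption rather than documented behaviour.

```python
import pandas as pd

# Hypothetical registry table using the default column names.
reg = pd.DataFrame(
    {
        "id": [0, 1],
        "name": ["escherichia coli", "staphylococcus aureus"],
        "code": ["ECOL", "SAUR"],
        "original": ["e. coli", "staph aureus"],
    }
)


def replace_sketch(series, reg, key="original", value="name"):
    """Map values found in column `key` to the matching `value` entries."""
    lookup = dict(zip(reg[key], reg[value]))
    # Assumption: values without an entry in the table are kept unchanged.
    return series.map(lookup).fillna(series)


raw = pd.Series(["e. coli", "staph aureus", "unknown organism"])
replaced = replace_sketch(raw, reg)
print(replaced.tolist())
```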

pyamr.datasets.registries.acronym_series(series, unique_acronyms=False, **kwargs)[source]

This method…

Parameters:

series (pd.Series) – The series with the names to convert into acronyms.
unique_acronyms
exclude_acronyms
split_n (int)
verbose (int) – Level of verbosity
loops_strategy (function) – The function that indicates which length combinations should be used on each iteration. By default it uses _loops_strategy_lengths, which splits the series in two and returns all possible lengths from (4, 4) to (max_len, max_len). The function passed as loops_strategy takes the series (x) as its parameter and returns a list (array of lengths).
kwgs_split (dict) – The parameters to pass to the split function
kwgs_acronym (dict) – The parameters to pass to the acronym function.

Returns:

The acronym series

Return type:

pd.Series

pyamr.datasets.registries.acronym_series_unique(series, split_n=1, exclude_acronyms=[], loops_strategy=None, verbose=10, kwgs_acronym={})[source]: Computes unique acronyms.

pyamr.datasets.registries.clean_specimen(series)[source]

srs = right -> end sls = left -> end

Parameters: series
Returns:

pyamr.datasets.registries.create_registry(data, keyword=None, keep=None)[source]

Creates registry from data.

Parameters:

data (pd.DataFrame) – The data
keyword (string) – The keyword for the columns. All columns starting with such keyword will be kept and used for the registry.
keep (list) – The list of columns to keep for the registry
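
The column selection described for `create_registry` can be sketched as below. The function `create_registry_sketch` is hypothetical: how `keyword` and `keep` combine, and whether duplicate rows are dropped, are assumptions, and the sample column names are made up.

```python
import pandas as pd


def create_registry_sketch(data, keyword=None, keep=None):
    """Build a small registry from selected columns (illustrative only)."""
    columns = []
    if keyword is not None:
        # Keep every column whose name starts with the keyword.
        columns += [c for c in data.columns if c.startswith(keyword)]
    if keep is not None:
        # Assumption: explicitly requested columns are added to the selection.
        columns += [c for c in keep if c not in columns]
    if not columns:
        raise ValueError("no columns selected; pass keyword and/or keep")
    # Assumption: a registry holds one row per distinct combination.
    return data[columns].drop_duplicates().reset_index(drop=True)


data = pd.DataFrame(
    {
        "antimicrobial_code": ["AAMI", "AAMI", "ACIP"],
        "antimicrobial_name": ["amikacin", "amikacin", "ciprofloxacin"],
        "sensitivity": ["resistant", "sensitive", "sensitive"],
    }
)
registry = create_registry_sketch(data, keyword="antimicrobial")
print(registry)
```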

pyamr.datasets.registries.invert(d)[source]

pyamr.datasets.registries.length_exceptions(x)[source]: This method…
