pyamr.datasets package

Subpackages

Submodules

pyamr.datasets.clean module

Functions:

clean_basic(data)

Performs the basic cleaning.

clean_clwsql008(data[, clean_microorganism])

Performs cleaning for clwsql008 data

clean_clwsql008_old(data[, verbose])

This method cleans microbiology data from clwsql008.

clean_common(data[, verbose])

This method cleans the microbiology data.

clean_format(data)

Final formatting...

clean_legacy(data[, clean_microorganism, ...])

This method cleans microbiology data from legacy.

clean_legacy_old(data[, verbose])

This method cleans microbiology data from legacy.

clean_microorganism(data)

This method....

clean_mimic(data)

This method...

hyphen_before(x, w)

Ensures hyphen between words is correct.

invert(d)

string_replace(series[, remove])

This method corrects the strings.

word_to_start(x, w[, pos, verbose])

Moves the word within the string.

pyamr.datasets.clean.clean_basic(data)[source]

Performs the basic cleaning.

  1. Convert everything to lowercase

  2. Remove leading and trailing spaces (strip)

  3. Collapse duplicate spaces (regexp)

  4. Remove duplicates

Parameters:

data (pd.DataFrame) – The data to clean.

Returns:

The cleaned data

Return type:

pd.DataFrame
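The four steps listed for clean_basic can be sketched with plain pandas. This is a minimal illustration and not the pyamr implementation: the helper name basic_clean_sketch, the choice to touch only string columns, and the sample column values are all assumptions.

```python
import pandas as pd


def basic_clean_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Illustrative only; not the pyamr implementation."""
    out = data.copy()
    for col in out.columns:
        # Assumption: only text columns are normalised.
        if not pd.api.types.is_string_dtype(out[col]):
            continue
        out[col] = (
            out[col]
            .str.lower()                           # 1. lowercase
            .str.strip()                           # 2. strip leading/trailing spaces
            .str.replace(r"\s+", " ", regex=True)  # 3. collapse repeated whitespace
        )
    # 4. drop rows that became identical after normalisation
    return out.drop_duplicates().reset_index(drop=True)


# Made-up sample data.
raw = pd.DataFrame({
    "microorganism_name": ["  Escherichia   Coli ", "escherichia coli", "Klebsiella"],
    "sensitivity_name": ["Resistant", "resistant ", "Sensitive"],
})
cleaned = basic_clean_sketch(raw)
```

In this sample the first two rows become identical once normalised, so the last step leaves two rows.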

pyamr.datasets.clean.clean_clwsql008(data, clean_microorganism=True)[source]

Performs cleaning for clwsql008 data

  1. Rename columns

  2. Apply the basic cleaning

  3. Correct issue with sensitivities

  4. Correct issue with date_received

Parameters:

data (pd.DataFrame) – The data to clean.

Returns:

The cleaned data

Return type:

pd.DataFrame

pyamr.datasets.clean.clean_clwsql008_old(data, verbose=10)[source]

This method cleans microbiology data from clwsql008.

Parameters:

data (pd.DataFrame) – The dataframe with the data

Returns:

The cleaned dataframe.

Return type:

pd.DataFrame

pyamr.datasets.clean.clean_common(data, verbose=10)[source]

This method cleans the microbiology data.

It assumes the following columns are present in the input:

  • date_received

  • date_outcome

  • microorganism_code

  • microorganism_name (required = True)

  • antimicrobial_code

  • antimicrobial_name

  • method_code

  • method_name

  • sensitivity_code

  • sensitivity_name
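Before calling a cleaner that assumes these columns, it can help to check which ones are absent. The sketch below is one possible pre-flight check: the helper missing_columns and the constant EXPECTED_COLUMNS are hypothetical and not part of pyamr; only the column names come from the list above.

```python
import pandas as pd

# Column names copied from the list above.
EXPECTED_COLUMNS = [
    "date_received", "date_outcome",
    "microorganism_code", "microorganism_name",
    "antimicrobial_code", "antimicrobial_name",
    "method_code", "method_name",
    "sensitivity_code", "sensitivity_name",
]


def missing_columns(data: pd.DataFrame) -> list:
    """Return the expected columns that are absent from the DataFrame."""
    return [c for c in EXPECTED_COLUMNS if c not in data.columns]


# An empty frame with only three of the expected columns.
sample = pd.DataFrame(columns=["date_received", "microorganism_name", "sensitivity_code"])
absent = missing_columns(sample)
```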

Parameters:

data (pd.DataFrame) – The dataframe to clean

Returns:

The cleaned dataframe

Return type:

pd.DataFrame

pyamr.datasets.clean.clean_format(data)[source]

Final formatting…

pyamr.datasets.clean.clean_legacy(data, clean_microorganism=True, verbose=10)[source]

This method cleans microbiology data from legacy.

  1. Rename columns

  2. clean basic

  3. Add sensitivity code

  4. Correct specimen issue

Parameters:

data (pd.DataFrame) – The data to clean

Returns:

The cleaned data

Return type:

pd.DataFrame

pyamr.datasets.clean.clean_legacy_old(data, verbose=10)[source]

This method cleans microbiology data from legacy.

Parameters:

data (pd.DataFrame) – The dataframe with the data

Returns:

The cleaned dataframe.

Return type:

pd.DataFrame

pyamr.datasets.clean.clean_microorganism(data)[source]

This method….

pyamr.datasets.clean.clean_mimic(data)[source]

This method…

pyamr.datasets.clean.hyphen_before(x, w)[source]

Ensures hyphen between words is correct.

Parameters:
  • x (string) – The string to format

  • w (string) – The word preceded by hyphen.

Returns:

The formatted string

Return type:

string

pyamr.datasets.clean.invert(d)[source]

pyamr.datasets.clean.string_replace(series, remove={})[source]

This method corrects the strings.

Parameters:
  • series

  • remove

pyamr.datasets.clean.word_to_start(x, w, pos='start', verbose=0)[source]

Moves the word within the string.

Parameters:
  • x (string) – The string to format

  • w (word) – The word to relocate within the string.

  • pos (string, default start) – The position to insert the word. The possible options are start (at the beginning) or end (at the end) of the string.

  • verbose (int) – Level of verbosity

Returns:

Formatted string

Return type:

string
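As a rough illustration of what relocating a word could look like, the sketch below removes the word and re-attaches it at the chosen end. It is not the pyamr implementation: the helper name move_word_sketch, the whole-word matching, and the handling of a missing word are assumptions.

```python
import re


def move_word_sketch(x: str, w: str, pos: str = "start") -> str:
    """Illustrative only; not the pyamr implementation."""
    if pos not in ("start", "end"):
        raise ValueError("pos must be 'start' or 'end'")
    pattern = r"\b" + re.escape(w) + r"\b"
    if not re.search(pattern, x):
        return x  # assumption: strings without the word are returned unchanged
    # Remove the word, then tidy up the whitespace left behind.
    rest = re.sub(pattern, " ", x)
    rest = re.sub(r"\s+", " ", rest).strip()
    if pos == "start":
        return f"{w} {rest}".strip()
    return f"{rest} {w}".strip()


at_start = move_word_sketch("haemolytic streptococcus beta", "beta")
at_end = move_word_sketch("beta haemolytic streptococcus", "beta", pos="end")
```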

pyamr.datasets.load module

Functions:

fixture(name, **kwargs)

Load fixtures

load_data_mimic([folder])

This method loads the susceptibility data.

load_data_nhs([folder])

This method loads the susceptibility data.

load_microbiology_folder(path, folder[, ...])

This method loads the susceptibility data.

load_registry_antimicrobials()

This method returns the antimicrobials registry

load_registry_microorganisms()

This method returns the microorganisms registry

make_susceptibility()

This method returns sample data (Anonymised)

make_timeseries()

This method creates a hard-coded time series.

pyamr.datasets.load.fixture(name, **kwargs)[source]

Load fixtures

Parameters:

name (string) – The name of the file within the fixtures folder.

Return type:

pd.DataFrame

pyamr.datasets.load.load_data_mimic(folder='susceptibility-v0.0.1', **kwargs)[source]

This method loads the susceptibility data.

pyamr.datasets.load.load_data_nhs(folder='susceptibility-v0.0.2', **kwargs)[source]

This method loads the susceptibility data.

pyamr.datasets.load.load_microbiology_folder(path, folder, glob_pattern='susceptibility-*.csv', **kwargs)[source]

This method loads the susceptibility data.

Note

It assumes all the susceptibility data is stored in CSV files whose names start with ‘susceptibility’. In addition, it assumes that the additional information is available in files named ‘antimicrobials.csv’ and ‘microorganisms.csv’.

Parameters:
  • path (string) – The path where the folder is located.

  • folder (string) – Name of the folder with the data.

  • kwargs – Arguments to pass to pd.read_csv

Returns:

  • susceptibility – The susceptibility test data

  • db_abxs – The registry with the antimicrobials

  • db_orgs – The registry with the microorganisms
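The folder layout described in the note can be exercised with a small stand-in loader. This sketch assumes the matching susceptibility files are simply concatenated and that both registry files sit in the same folder; the function load_folder_sketch, the demo file names and the CSV columns are made up for illustration and are not pyamr's own.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_folder_sketch(path, folder, glob_pattern="susceptibility-*.csv", **kwargs):
    """Illustrative only; not the pyamr implementation."""
    root = Path(path) / folder
    files = sorted(root.glob(glob_pattern))
    if not files:
        raise FileNotFoundError(f"no files match {glob_pattern!r} in {root}")
    # Assumption: all matching susceptibility files are concatenated.
    frames = [pd.read_csv(f, **kwargs) for f in files]
    susceptibility = pd.concat(frames, ignore_index=True)
    # The two registry files are expected next to the susceptibility files.
    db_abxs = pd.read_csv(root / "antimicrobials.csv", **kwargs)
    db_orgs = pd.read_csv(root / "microorganisms.csv", **kwargs)
    return susceptibility, db_abxs, db_orgs


# Build a throwaway folder so the sketch can be run end to end.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo"
    demo.mkdir()
    (demo / "susceptibility-2020.csv").write_text(
        "antimicrobial_code,sensitivity\nAMOX,resistant\n")
    (demo / "susceptibility-2021.csv").write_text(
        "antimicrobial_code,sensitivity\nCIPR,sensitive\n")
    (demo / "antimicrobials.csv").write_text(
        "antimicrobial_code,name\nAMOX,amoxicillin\nCIPR,ciprofloxacin\n")
    (demo / "microorganisms.csv").write_text(
        "microorganism_code,name\nECOL,escherichia coli\n")
    susceptibility, db_abxs, db_orgs = load_folder_sketch(tmp, "demo")
```

Extra keyword arguments are forwarded to pd.read_csv, mirroring the kwargs parameter documented above.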

pyamr.datasets.load.load_registry_antimicrobials()[source]

This method returns the antimicrobials registry

pyamr.datasets.load.load_registry_microorganisms()[source]

This method returns the microorganisms registry

pyamr.datasets.load.make_susceptibility()[source]

This method returns sample data (Anonymised)

pyamr.datasets.load.make_timeseries()[source]

This method creates a hard-coded time series.

Returns:

The x values, the y values and the frequencies.

Return type:

x, y, f

pyamr.datasets.registries module

Classes:

AntimicrobialRegistry([keyword, order, ...])

Registry for antimicrobials

MicroorganismRegistry(**kwargs)

Registry for microorganisms

Registry([keyword, order, subset, fclean])

This is basically a lookup table.

Functions:

acronym_series(series[, unique_acronyms])

This method...

acronym_series_unique(series[, split_n, ...])

Computes unique acronyms.

clean_specimen(series)

srs = right -> end sls = left -> end

create_registry(data[, keyword, keep])

Creates registry from data.

invert(d)

length_exceptions(x)

This method...

class pyamr.datasets.registries.AntimicrobialRegistry(keyword='', order=['id', 'name', 'code', 'description', 'original'], subset=['name', 'original'], fclean={})[source]

Bases: Registry

Registry for antimicrobials

Methods:

combine(dataframe[, on])

Combines an external dataframe with the registry.

Attributes:

reg

combine(dataframe, on='name')[source]

Combines an external dataframe with the registry.
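The docstring does not say how the two tables are combined. One plausible reading, given the on='name' default, is a join on the shared column. The sketch below uses a pandas left join under that assumption; the registry and external tables are made-up stand-ins and the join direction is a guess.

```python
import pandas as pd

# Hypothetical registry contents; the real registry columns may differ.
registry = pd.DataFrame({
    "name": ["amoxicillin", "ciprofloxacin"],
    "code": ["AMOX", "CIPR"],
})
# Hypothetical external table keyed by the same 'name' column.
external = pd.DataFrame({
    "name": ["amoxicillin", "ciprofloxacin", "vancomycin"],
    "category": ["penicillins", "fluoroquinolones", "glycopeptides"],
})

# A left join keeps every registry row and adds the external columns.
combined = registry.merge(external, how="left", on="name")
```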

reg = None

class pyamr.datasets.registries.MicroorganismRegistry(**kwargs)[source]

Bases: Registry

Registry for microorganisms

Methods:

binomial_name()

combine(dataframe[, on])

Combines an external dataframe with the registry.

uuid()

Attributes:

reg

taxonomy

binomial_name()[source]

combine(dataframe, on='name')[source]

Combines an external dataframe with the registry.

reg = None

taxonomy = ['domain', 'class', 'order', 'family', 'genus', 'species', 'subspecies']

uuid()[source]

class pyamr.datasets.registries.Registry(keyword='', order=['id', 'name', 'code', 'description', 'original'], subset=['name', 'original'], fclean={})[source]

Bases: object

This is basically a lookup table.

Attributes:

FCLEAN

ORDER

REG

RENAME_COLUMNS

SUBSET

Methods:

clean(series)

combine(data)

fit(data)

This method...

fit_transform(data, **kwargs)

Fits and transforms

getr([prepend])

Returns the registry DataFrame

replace(series[, key, value])

This method...

transform(data[, replace, include_id])

Transform data

FCLEAN = {}

ORDER = ['id', 'name', 'code', 'description', 'original']

REG = None

RENAME_COLUMNS = {}

SUBSET = ['name', 'original']

clean(series)[source]

combine(data)[source]

fit(data)[source]

This method…

Parameters:

data (pd.DataFrame) – The DataFrame is expected to contain the code, the name and the description; the name is the most important. What happens when the others are missing still needs to be considered.

fit_transform(data, **kwargs)[source]

Fits and transforms

getr(prepend=False)[source]

Returns the registry DataFrame

replace(series, key='original', value='name')[source]

This method…
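Since the registry is described as a lookup table, a replacement from the key column to the value column might look like the sketch below. It is not the pyamr implementation: the function replace_sketch, the sample registry rows, and the decision to leave unmatched values untouched are assumptions.

```python
import pandas as pd

# Hypothetical registry rows mapping raw labels to canonical names.
registry = pd.DataFrame({
    "original": ["E. COLI", "Esch. coli", "KLEB PNEUMONIAE"],
    "name": ["escherichia coli", "escherichia coli", "klebsiella pneumoniae"],
})


def replace_sketch(series, registry, key="original", value="name"):
    """Illustrative only; not the pyamr implementation."""
    lookup = registry.set_index(key)[value].to_dict()
    # Assumption: values without a registry entry are left as they are.
    return series.map(lookup).fillna(series)


raw = pd.Series(["E. COLI", "KLEB PNEUMONIAE", "unknown bug"])
mapped = replace_sketch(raw, registry)
```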

transform(data, replace={}, include_id=True)[source]

Transform data

Parameters:
  • data (pd.DataFrame) – The data to transform.

  • replace

  • include_id

Returns:

The data transformed

Return type:

pd.DataFrame

pyamr.datasets.registries.acronym_series(series, unique_acronyms=False, **kwargs)[source]

This method…

Parameters:
  • series (pd.Series) – The series with the names to convert into acronyms.

  • unique_acronyms

  • exclude_acronyms

  • split_n (int)

  • verbose (int) – Level of verbosity

  • loops_strategy (function) – The function that indicates which length combinations should be used on each iteration. By default it uses _loops_strategy_lengths, which splits the series in two and returns all possible lengths from (4, 4) up to (max_len, max_len). The signature of the function to pass as loops_strategy is as follows:

    param x:

    series

    return:

    list (array of lengths)

  • kwgs_split (dict) – The parameters to pass to the split function

  • kwgs_acronym (dict) – The parameters to pass to the acronym function.

Returns:

The acronym series

Return type:

pd.Series

pyamr.datasets.registries.acronym_series_unique(series, split_n=1, exclude_acronyms=[], loops_strategy=None, verbose=10, kwgs_acronym={})[source]

Computes unique acronyms.

pyamr.datasets.registries.clean_specimen(series)[source]

srs = right -> end sls = left -> end

Parameters:

series

Returns:

pyamr.datasets.registries.create_registry(data, keyword=None, keep=None)[source]

Creates registry from data.

Parameters:
  • data (pd.DataFrame) – The data

  • keyword (string) – The keyword for the columns. All columns starting with such keyword will be kept and used for the registry.

  • keep (list) – The list of columns to keep for the registry
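The keyword rule (keep every column whose name starts with the keyword) can be illustrated as follows. This is a sketch under assumptions, not the pyamr implementation: the helper create_registry_sketch is hypothetical, and reducing the result to one row per distinct combination is a guess at what a registry would contain.

```python
import pandas as pd


def create_registry_sketch(data, keyword=None, keep=None):
    """Illustrative only; not the pyamr implementation."""
    columns = []
    if keyword is not None:
        # Columns whose name starts with the keyword.
        columns += [c for c in data.columns if c.startswith(keyword)]
    if keep is not None:
        # Explicitly requested columns, without repeating any.
        columns += [c for c in keep if c not in columns]
    # Assumption: the registry holds one row per distinct combination.
    return data[columns].drop_duplicates().reset_index(drop=True)


# Made-up sample data.
data = pd.DataFrame({
    "antimicrobial_code": ["AMOX", "AMOX", "CIPR"],
    "antimicrobial_name": ["amoxicillin", "amoxicillin", "ciprofloxacin"],
    "sensitivity_name": ["resistant", "sensitive", "sensitive"],
})
registry = create_registry_sketch(data, keyword="antimicrobial")
```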

pyamr.datasets.registries.invert(d)[source]

pyamr.datasets.registries.length_exceptions(x)[source]

This method…

Module contents