Advanced

Building the Registries

A registry is a centralized database or record-keeping system that stores and manages information about a specific subject. In our scenario, we maintain registries of microorganisms and antimicrobials, which include details such as the taxonomy (genus, species), the Gram stain type (positive or negative), the category (e.g. penicillins) and so on.

To create the registries, we provide the following bash script:

pyamr/datasets/microbiology/registry/automated_run.sh
# ------------------------------
# Create Microorganisms Registry
# ------------------------------
# Create gram_stain database
cd microorganisms/gram_stain/
python script.py

# Create taxonomy database
cd ../../
cd microorganisms/taxonomy/
python script.py

# Create microorganism registry
cd ../../
cd microorganisms/
python script.py

# ------------------------------
# Create Antimicrobials Registry
# ------------------------------
# Create categories database
cd ../
cd antimicrobials/categories/
python script.py

# Create antimicrobials registry
cd ../../
cd antimicrobials/
python script.py

$SHELL

which can be run as follows:

./automated_run.sh

Each step of the script is explained below.
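On platforms without bash, the same sequence could be driven from Python. The following is a minimal sketch, not part of the library: the names STEPS, build_commands and run_all are hypothetical, and the folder order is copied from automated_run.sh.

```python
import subprocess
import sys
from pathlib import Path

# Build order mirrored from automated_run.sh. Paths are relative to the
# registry folder that contains the bash script.
STEPS = [
    "microorganisms/gram_stain",
    "microorganisms/taxonomy",
    "microorganisms",
    "antimicrobials/categories",
    "antimicrobials",
]


def build_commands(root):
    """Return one (working_directory, command) pair per step."""
    root = Path(root)
    return [(root / step, [sys.executable, "script.py"]) for step in STEPS]


def run_all(root):
    """Run every script.py in order, stopping at the first failure."""
    for cwd, command in build_commands(root):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_all("pyamr/datasets/microbiology/registry") from the repository root would then execute the five scripts in the same order as the bash version.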

Microorganisms

The folder structure is shown below. Note that each folder contains at least a script.py file and a db_xxx.csv file. The former is a Python script with the code needed to format, validate and finally create the corresponding database. This database is saved as db_xxx.csv and is used at a later stage to create the final registry.

microorganisms
    |- gram_stain
        |- db_gram_stain.csv
        |- gram_negative.txt      // external resource
        |- gram_positive.txt      // external resource
        |- script.py
    |- subspecies
        |- db_groups.csv
        |- script.py
    |- taxonomy
        |- db_taxonomy.csv
        |- bac120_taxonomy.tsv    // external resource
        |- script.py
    |- uuids
        |- db_codes.csv
        |- script.py
    |- script.py                  // generates registry_microorganisms.csv

The main script.py is used to generate the final registry_microorganisms.csv.

  • Generating the Gram stain database

    In this script we create a database with the Gram stain information. To do so, we downloaded the lists of genera and species belonging to each group from Wikipedia and saved them in two files, gram_negative.txt and gram_positive.txt, which contain the information from FGN and FGP respectively. These files look as follows:

    A
    Acetic acid bacteria
    Acidaminococcus
    Acinetobacter baumannii
    Acinetobacter guerrae
    

    Then, we combined them into db_gram_stain.csv using script.py:

    genus,species,gram_stain
    Acetic,acid bacteria,n
    Achromobacter,,n
    Acidaminococcus,,n
    Acinetobacter,baumannii,n
    
  • Generating the Taxonomy database

    In this script we create a database with the taxonomy information. To do so, we downloaded the information from FBT and saved it as bac120_taxonomy.tsv. We then applied some minor amendments, corrections and checks in script.py and saved the result into a single .csv file named db_taxonomy.csv.

    domain,phylum,class,order,family,genus,species
    Bacteria,Proteobacteria,Gammaproteobacteria,Enterobacterales,Enterobacteriaceae,Escherichia,flexneri
    Bacteria,Proteobacteria,Gammaproteobacteria,Enterobacterales,Enterobacteriaceae,Salmonella,enterica
    Bacteria,Firmicutes,Bacilli,Staphylococcales,Staphylococcaceae,Staphylococcus,aureus
    Bacteria,Firmicutes,Bacilli,Lactobacillales,Streptococcaceae,Streptococcus,pneumoniae
    
  • Generating the uuids database

    In this script we create a database with the unique codes that we want to assign to each microorganism species.

    genus,species,subspecies,code
    Achromobacter,xylosoxidans,,AXYL
    Achromobacter,,,SACHRO
    Acinetobacter,baumannii,,ABAU
    Acinetobacter,lwoffi,,ALWO
    
  • Generating the subspecies database

    In this script we create a database with any additional category that we want to include. For example, we can see the categories subspecies, group, coagulase production, haemolysis and host.

    genus,species,subspecies,group,coagulase_production,haemolysis,host
    Staphylococcus,aureus,,A,positive,,
    Staphylococcus,borealis,,A,negative,,
    Staphylococcus,capitis,,A,negative,,
    Staphylococcus,epidermis,,A,negative,,
    

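As an illustration of the Gram stain step described above, the two text files could be combined roughly as follows. This is a sketch under stated assumptions rather than the code in script.py: the function name parse_names is hypothetical, single-letter lines are assumed to be alphabetical headings, the "p" label for Gram-positive entries is assumed by analogy with the "n" shown above, and the Gram-positive entry is a made-up sample line.

```python
import csv
import io


def parse_names(lines, gram_stain):
    """Turn 'Genus species' lines into (genus, species, gram_stain) rows."""
    rows = []
    for line in lines:
        name = line.strip()
        # Skip blank lines and single-letter alphabetical headings ("A").
        if len(name) <= 1:
            continue
        # Everything after the first space is treated as the species.
        genus, _, species = name.partition(" ")
        rows.append((genus, species, gram_stain))
    return rows


# Lines as they appear in gram_negative.txt (excerpt from above).
negative = ["A", "Acetic acid bacteria", "Acidaminococcus",
            "Acinetobacter baumannii"]
# Hypothetical excerpt of gram_positive.txt, for illustration only.
positive = ["B", "Bacillus subtilis"]

rows = sorted(set(parse_names(negative, "n") + parse_names(positive, "p")))

# Write the combined rows in the db_gram_stain.csv layout.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["genus", "species", "gram_stain"])
writer.writerows(rows)
print(buffer.getvalue())
```

Splitting on the first space reproduces the "Acetic,acid bacteria,n" row shown above, which suggests that group names are not treated differently from species names.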
The final registry looks as follows:

domain,phylum,class,order,family,genus,species,acronym,gram_stain
Bacteria,Armatimonadota,Abditibacteria,Abditibacteriales,Abditibacteriaceae,Abditibacterium,utsteinense,ABDI_UTST,
Bacteria,Armatimonadota,Abditibacteria,Abditibacteriales,Abditibacteriaceae,Abditibacterium,,ABDITIBACTERIUM,
Bacteria,Firmicutes,Bacilli,Lactobacillales,Aerococcaceae,Abiotrophia,defectiva,ABIO_DEFE,
Bacteria,Firmicutes,Bacilli,Lactobacillales,Aerococcaceae,Abiotrophia,,ABIOTROPHIA,
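The rows above suggest how the main script might assemble the registry: each species row is followed by a genus-level row, the acronym appears to combine the first four letters of the genus and species, and the Gram stain is left blank when no match exists. The sketch below reproduces that pattern. It is inferred from the sample rows only; make_acronym and build_registry are hypothetical names, and the Gram stain entry is an illustrative value, not taken from db_gram_stain.csv.

```python
import csv
import io

TAXONOMY = """domain,phylum,class,order,family,genus,species
Bacteria,Firmicutes,Bacilli,Lactobacillales,Aerococcaceae,Abiotrophia,defectiva
Bacteria,Firmicutes,Bacilli,Staphylococcales,Staphylococcaceae,Staphylococcus,aureus
"""

# Illustrative Gram stain lookup keyed by (genus, species).
GRAM_STAIN = {("Staphylococcus", "aureus"): "p"}


def make_acronym(genus, species):
    """Acronym pattern inferred from the sample registry rows."""
    if not species:
        return genus.upper()
    return f"{genus[:4]}_{species[:4]}".upper()


def build_registry(taxonomy_rows, gram_stain):
    """Add a genus-level row per genus, then attach acronym and Gram stain."""
    registry = []
    seen_genera = set()
    for row in taxonomy_rows:
        entries = [row]
        if row["genus"] not in seen_genera:
            seen_genera.add(row["genus"])
            entries.append({**row, "species": ""})
        for entry in entries:
            key = (entry["genus"], entry["species"])
            registry.append({
                **entry,
                "acronym": make_acronym(*key),
                # Behaves like a left join: unmatched rows stay blank.
                "gram_stain": gram_stain.get(key, ""),
            })
    return registry


taxonomy_rows = list(csv.DictReader(io.StringIO(TAXONOMY)))
registry = build_registry(taxonomy_rows, GRAM_STAIN)
for row in registry:
    print(",".join(row.values()))
```

Note that these acronyms differ from the codes stored in the uuids database (for example AXYL), so the two should not be assumed interchangeable.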

Antimicrobials

The folder structure is shown below. Note that each folder contains at least a script.py file and a db_xxx.csv file. The former is a Python script with the code needed to format, validate and finally create the corresponding database. This database is saved as db_xxx.csv and is used at a later stage to create the final registry.

antimicrobials
    |- categories
        |- db_categories.csv
        |- category_aminoglycosides.txt      // external resource
        |- category_carbapenems.txt          // external resource
        |- category_oxazolidinones.txt       // external resource
        |- category_penicillins.txt          // external resource
        |- category_quinolones.txt           // external resource
        |- category_tetracyclines.txt        // external resource
        |- script.py
    |- subspecies
        |- db_groups.csv
        |- script.py
    |- taxonomy
        |- db_taxonomy.csv
        |- bac120_taxonomy.tsv    // external resource
        |- script.py
    |- uuids
        |- db_codes.csv
        |- script.py
    |- script.py                  // generates registry_antimicrobials.csv

The main script.py is used to generate the final registry_antimicrobials.csv.

  • Generating the categories database

    In this script we create a database with the main categories. The database has been created manually and the other category_xxxx.txt files are just for reference.

    name,category
    Amikacin,Aminoglycosides
    Amoxycillin,Aminopenicillins
    Amp c markers,
    Amphotericin,
    

The final registry looks as follows:

name,category,acronym
Amikacin,Aminoglycosides,AMIK
Amoxycillin,Aminopenicillins,AMOX
Amp c markers,,AMP_C
Amphotericin,,AMPH
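Since the scripts are meant to validate the databases as well as create them, a quick consistency check on the resulting registry can be useful. The following sketch, which is not taken from script.py, loads the rows shown above, looks for duplicated acronyms and lists the entries that still lack a category.

```python
import csv
import io
from collections import Counter

REGISTRY = """name,category,acronym
Amikacin,Aminoglycosides,AMIK
Amoxycillin,Aminopenicillins,AMOX
Amp c markers,,AMP_C
Amphotericin,,AMPH
"""

rows = list(csv.DictReader(io.StringIO(REGISTRY)))

# Acronyms act as identifiers, so each one should appear only once.
counts = Counter(row["acronym"] for row in rows)
duplicates = sorted(code for code, n in counts.items() if n > 1)

# Entries whose category column was left empty.
uncategorised = [row["name"] for row in rows if not row["category"]]

# Lookup table from acronym to the full registry row.
by_acronym = {row["acronym"]: row for row in rows}

print(duplicates)                        # []
print(uncategorised)                     # ['Amp c markers', 'Amphotericin']
print(by_acronym["AMIK"]["category"])    # Aminoglycosides
```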

What fixtures are available?

In the context of software testing, a fixture refers to a set of predefined data or conditions that are used to create a known and stable starting point for tests. Fixtures provide the necessary context for executing tests and can include data, objects, configurations, or any other elements required for the test to run successfully.

List of available fixtures

Path                      Description                                                  Categories
------------------------  -----------------------------------------------------------  ----------
fixtures_01.csv           Empty
fixtures_02.csv           Does not exist
fixtures_03.csv           Susceptibility test records data created manually.
fixtures_04.csv           Summary table necessary to compute ASAI.
fixtures_05.csv           Susceptibility test records data created manually for ACSI.
fixtures_06.csv           Susceptibility test records data created manually for ACSI.
fixtures_07.csv
fixture_antibiogram.csv
fixture_spectrum.csv
fixture_surveillance
cddep/summary.csv
cddep/susceptibility.csv
cddep/prescriptions.csv
cddep/outcome.csv
lancet/mmc2_MIS.csv
lancet/mmc2_CMIS.csv
lancet/mmc2_MARKOV.csv
nhs                       Incomplete!
mimic                     Incomplete!

Fixture 01

Fixture 02

Fixture 03

DATE,SPECIMEN,MICROORGANISM,ANTIMICROBIAL,SENSITIVITY
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,resistant
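To show how such a fixture gives a test a known starting point, the sketch below computes the fraction of resistant records for each (specimen, microorganism, antimicrobial) combination in the four rows shown above. It uses only the standard library and a hypothetical helper named resistance_rates; the library's own resistance index may handle other outcomes, such as intermediate results, differently.

```python
import csv
import io
from collections import Counter

FIXTURE_03 = """DATE,SPECIMEN,MICROORGANISM,ANTIMICROBIAL,SENSITIVITY
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,resistant
"""


def resistance_rates(rows):
    """Fraction of resistant records per specimen/organism/drug triple."""
    counts = {}
    for row in rows:
        key = (row["SPECIMEN"], row["MICROORGANISM"], row["ANTIMICROBIAL"])
        counts.setdefault(key, Counter())[row["SENSITIVITY"]] += 1
    return {key: c["resistant"] / sum(c.values())
            for key, c in counts.items()}


rows = list(csv.DictReader(io.StringIO(FIXTURE_03)))
rates = resistance_rates(rows)
print(rates[("BLDCUL", "ECOL", "AAUG")])  # 0.25
```

Because the input is fixed, a test can assert the exact value: one resistant record out of four.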

Fixture 04

GENUS,SPECIE,ANTIBIOTIC,GRAM,RESISTANCE,FREQUENCY,THRESHOLD,W_SPECIE,W_GENUS
Staphylococcus,coagulase negative,ANTIBIOTIC_1,P,0.88,1,0.20,0.1,0.3333333333333
Staphylococcus,epidermidis,ANTIBIOTIC_1,P,0.11,1,0.20,0.1,0.3333333333333
Staphylococcus,haemolyticus,ANTIBIOTIC_1,P,0.32,1,0.20,0.1,0.3333333333333
Staphylococcus,lugdumensis,ANTIBIOTIC_1,P,0.45,1,0.20,0.1,0.3333333333333
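The columns of this table hint at how a spectrum score could be derived from it. The sketch below assumes that a species counts as covered when RESISTANCE does not exceed THRESHOLD, and that each covered species contributes W_GENUS multiplied by W_SPECIE to the score of its antibiotic. This formulation and the function name spectrum_scores are assumptions made for illustration, and should be checked against the library's ASAI implementation.

```python
import csv
import io
from collections import defaultdict

FIXTURE_04 = """GENUS,SPECIE,ANTIBIOTIC,GRAM,RESISTANCE,FREQUENCY,THRESHOLD,W_SPECIE,W_GENUS
Staphylococcus,coagulase negative,ANTIBIOTIC_1,P,0.88,1,0.20,0.1,0.3333333333333
Staphylococcus,epidermidis,ANTIBIOTIC_1,P,0.11,1,0.20,0.1,0.3333333333333
Staphylococcus,haemolyticus,ANTIBIOTIC_1,P,0.32,1,0.20,0.1,0.3333333333333
Staphylococcus,lugdumensis,ANTIBIOTIC_1,P,0.45,1,0.20,0.1,0.3333333333333
"""


def spectrum_scores(rows):
    """Sum W_GENUS * W_SPECIE over covered species, per antibiotic."""
    scores = defaultdict(float)
    for row in rows:
        # Assumed rule: covered when resistance is at or below the threshold.
        covered = float(row["RESISTANCE"]) <= float(row["THRESHOLD"])
        weight = float(row["W_GENUS"]) * float(row["W_SPECIE"])
        scores[row["ANTIBIOTIC"]] += weight if covered else 0.0
    return dict(scores)


rows = list(csv.DictReader(io.StringIO(FIXTURE_04)))
scores = spectrum_scores(rows)
print(scores)
```

With the four rows shown, only the epidermidis row falls below the 0.20 threshold, so ANTIBIOTIC_1 receives a single contribution of about 0.033. The full fixture contains more rows, so the complete score will differ.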

Fixture 05

DATE,LAB_NUMBER,SPECIMEN,MICROORGANISM,ANTIMICROBIAL,SENSITIVITY
2021-01-01,lab1,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,lab1,BLDCUL,ECOL,ATAZ,sensitive
2021-01-01,lab1,BLDCUL,ECOL,ACAZ,sensitive
2021-01-01,lab1,BLDCUL,ECOL,ACIP,resistant

Fixture 06

DATE,LAB_NUMBER,SPECIMEN,MICROORGANISM,ANTIMICROBIAL,SENSITIVITY
2021-01-01,lab1,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,lab1,BLDCUL,ECOL,ACIP,sensitive
2021-01-01,lab2,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,lab2,BLDCUL,ECOL,ACIP,resistant
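Fixtures 05 and 06 add a LAB_NUMBER column, which makes it possible to tell which antimicrobials were tested on the same isolate. The sketch below groups the rows shown above by isolate and counts the joint outcomes for each pair of antimicrobials, which is the kind of contingency table a collateral sensitivity index would start from. The helper pair_counts is hypothetical and the index itself is not computed here.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

FIXTURE_06 = """DATE,LAB_NUMBER,SPECIMEN,MICROORGANISM,ANTIMICROBIAL,SENSITIVITY
2021-01-01,lab1,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,lab1,BLDCUL,ECOL,ACIP,sensitive
2021-01-01,lab2,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,lab2,BLDCUL,ECOL,ACIP,resistant
"""


def pair_counts(rows):
    """Count joint outcomes for every antimicrobial pair on each isolate."""
    # Collect the outcome of each antimicrobial per isolate.
    isolates = defaultdict(dict)
    for row in rows:
        key = (row["DATE"], row["LAB_NUMBER"],
               row["SPECIMEN"], row["MICROORGANISM"])
        isolates[key][row["ANTIMICROBIAL"]] = row["SENSITIVITY"]

    # Tally the outcome pairs, with antimicrobials in alphabetical order.
    counts = defaultdict(Counter)
    for results in isolates.values():
        for a, b in combinations(sorted(results), 2):
            counts[(a, b)][(results[a], results[b])] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(FIXTURE_06)))
counts = pair_counts(rows)
for outcome, n in counts[("AAUG", "ACIP")].items():
    print(outcome, n)
```

For the pair (AAUG, ACIP) this yields one isolate that is sensitive to both and one that is sensitive to AAUG but resistant to ACIP.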

Fixture 07

DATE,SPECIMEN,MICROORGANISM,ANTIMICROBIAL,SENSITIVITY
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,sensitive
2021-01-01,BLDCUL,ECOL,AAUG,resistant

CDDEP - How to compute DRI?

LANCET - How to compute ACSI?

MIMIC

NHS