Installation
Creating the virtual environment
A virtual environment keeps the dependencies required by different projects separate by creating an isolated Python environment for each of them. It is one of the most important tools that Python developers use.
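In recent versions of Python (3.3 or later) we can use venv, which is included in the standard library, so there is nothing to install with pip. If you have several versions of Python installed you might need to use python3 or py instead of python. On some Linux distributions (e.g. Debian or Ubuntu) venv is packaged separately as python3-venv.
python -m venv <environment-name> # Create environment
Otherwise, use the standard virtualenv (Linux-based systems):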
which python # where is python
pip install virtualenv # Install virtualenv
virtualenv -p <python-path> <environment-name> # create virtualenv
Let’s activate the environment
source <environment-name>/bin/activate # activate environment
To deactivate the environment just type
deactivate # deactivate environment
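To confirm from within Python that the environment is active, one option is to compare sys.prefix with sys.base_prefix, which differ inside an environment created by venv (and by recent versions of virtualenv). A minimal sketch; the function name in_virtualenv is illustrative and not part of the template.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter :", sys.executable)
    print("Virtual env :", in_virtualenv())
```

Running it before and after activating the environment should show the interpreter path changing to one inside <environment-name>.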
Warning
This might be slightly different on Windows systems, where the activation script is usually <environment-name>\Scripts\activate. In addition, it is also possible to configure the virtual environment using the Python IDE PyCharm, for which students can get a free licence.
Creating repository from template
Repository template: https://github.com/bahp/fyp-pypkg-template
Open the previous URL and create your own repository using it as a template. For this, click on the green “Use this template” button. Please use the following conventions:
name: fyp<year>-<imperial_username> (e.g. fyp2020-bahp)
include a brief description of the project
keep it public to be able to use GitHub Pages
tick include all branches
Once you have created the repository, it should appear on your repository list. Note that the url to clone the repository is under the green “code” button and it should look like https://github.com/bahp/fyp2020-bahp.git but replacing bahp with your github username and fyp2020-bahp with your repository name.
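As a small illustration of the naming convention and the clone URL described above, the sketch below builds both strings. The helper names repository_name and clone_url are hypothetical; only the patterns fyp<year>-<imperial_username> and https://github.com/<username>/<repository>.git come from the text.

```python
def repository_name(year: int, username: str) -> str:
    """Build the repository name following fyp<year>-<imperial_username>."""
    return "fyp{0}-{1}".format(year, username)


def clone_url(github_username: str, repository: str) -> str:
    """Build the HTTPS clone URL for a GitHub repository."""
    return "https://github.com/{0}/{1}.git".format(github_username, repository)


if __name__ == "__main__":
    name = repository_name(2020, "bahp")
    print(name)                     # fyp2020-bahp
    print(clone_url("bahp", name))  # https://github.com/bahp/fyp2020-bahp.git
```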
Let’s double check that GitHub Pages is enabled. For this, click ‘Settings’, scroll down to the GitHub Pages section and select: (i) gh-pages as the source branch and (ii) docs as the source folder.
Set up the repository locally
On your computer, open the terminal and create your repository folder:
$ mkdir fyp2020-bahp # fyp<yyyy>-<imperial-username>
Move inside the folder:
$ cd fyp2020-bahp
Clone each branch in a different folder:
$ git clone -b main https://github.com/bahp/fyp2020-bahp.git
$ mv <repository_name> main
$ git clone -b gh-pages https://github.com/bahp/fyp2020-bahp.git
$ mv <repository_name> gh-pages
Your repository is now ready!
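The clone and rename steps above can also be merged, since git clone accepts the target directory as its last argument. The sketch below only builds and prints the commands; clone_all actually runs them and therefore needs git and network access. The function names are illustrative, and the URL is the example one from the text.

```python
import subprocess


def clone_commands(url, branches=("main", "gh-pages")):
    """Return one 'git clone' command per branch, each into its own folder."""
    # 'git clone -b <branch> <url> <directory>' checks out the branch
    # directly into <directory>, so no separate 'mv' step is needed.
    return [["git", "clone", "-b", branch, url, branch] for branch in branches]


def clone_all(url, cwd="."):
    """Run the clone commands inside cwd (requires git and network access)."""
    for command in clone_commands(url):
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    # Example URL taken from the text; replace it with your own repository.
    for command in clone_commands("https://github.com/bahp/fyp2020-bahp.git"):
        print(" ".join(command))
```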
The main branch contains all the source files, while the gh-pages branch is used only to host the documentation in HTML. A brief summary of the contents is shown below:
gh-pages
    |- docs
        - documentation
main
    |- docs
        |- build
        |- source
            |- conf.py      # config - sphinx documentation
            |- index.rst    # index - sphinx documentation
        |- make.bat
        |- Makefile         # run to create documentation
    |- examples
    |- pkgname              # your library
        |- core             # contains your pkg core classes
        |- tests            # contains your pkg tests - pytest
        |- utils            # contains your pkg utils
Installing your pkg in editable mode
During your project you will find that some behaviour can be encapsulated in either classes or methods. This will save you repeating code in every single script you write. These classes and/or methods will be part of your library. You can find an example in pkgname.
It is recommended to install the package in editable (develop) mode. This puts a link to your code (actually *.pth files) into the Python installation, so that your package is installed but any changes to the source take effect immediately. This way all your scripts can import your package the usual way.
First, ensure that the repository is on your local machine (we just did this in the previous section):
git clone https://github.com/<username>/<reponame>.git
Let’s install the requirements. Move to the folder where requirements.txt is and install all the required libraries as shown in the statement below. If any library is still missing afterwards, just install it using pip.
python -m pip install -r requirements.txt # Install all the requirements
Note
Note that you will need to add the libraries you depend on to requirements.txt so other people can start using your package quickly. For example, if you use scikit-learn then include scikit-learn in requirements.txt.
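One way to spot libraries that are listed in requirements.txt but not yet installed is to query the installed distributions with importlib.metadata (Python 3.8 or later). This is only a rough sketch: the helper names are hypothetical and the parsing covers plain lines such as scikit-learn>=1.0, not every syntax pip accepts.

```python
import pathlib
import re
from importlib import metadata


def requirement_names(lines):
    """Extract distribution names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(lines):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = pathlib.Path("requirements.txt")
    lines = path.read_text().splitlines() if path.exists() else []
    print("Missing:", missing_requirements(lines))
```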
Move to the directory where setup.py is. Please note that although setup.py is a Python script, it is not recommended to install the package by executing that file with python directly. Instead, let’s use the package manager pip.
Warning
Feel free to change your package name if you want. However, note that to make things work you will need to make the appropriate changes in the existing files setup.cfg and plot_greetings_01.py.
python -m pip install --editable . # Install in editable mode
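To check that the editable install worked, you can ask Python where it would import the package from. After an editable install the reported file is expected to live inside your cloned main folder rather than in site-packages. A minimal sketch; package_location is an illustrative helper and pkgname is the placeholder name used in the template.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # 'pkgname' is the placeholder package name used in the template.
    print(package_location("pkgname"))
```

If this prints None, the package is not visible to the active interpreter; check that the virtual environment is activated and repeat the installation.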
Read more about packages
Generating documentation
Note
To generate the API documentation automatically from docstrings, look at the Sphinx extensions napoleon and autodoc. In general, the numpy documentation style is used throughout the code.
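For reference, a docstring in the numpy style looks roughly like the one below. The function greet is a made-up example and is not part of the template.

```python
def greet(name, punctuation="!"):
    """Build a greeting for the given name.

    Parameters
    ----------
    name : str
        Name of the person to greet.
    punctuation : str, optional
        Character(s) appended to the greeting. Default is ``"!"``.

    Returns
    -------
    str
        The greeting, e.g. ``"Hello, Ada!"``.

    Examples
    --------
    >>> greet("Ada")
    'Hello, Ada!'
    """
    return "Hello, {0}{1}".format(name, punctuation)
```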
Let’s use Sphinx to generate the documentation. First, you will need to install sphinx, sphinx-gallery, sphinx-rtd-theme and matplotlib. Note that they might have already been installed through requirements.txt.
Let’s install the required libraries.
python -m pip install sphinx # Install sphinx
python -m pip install sphinx-gallery # Install sphinx-gallery for examples
python -m pip install sphinx-rtd-theme # Install sphinx-rtd-theme CSS
python -m pip install matplotlib # Install matplotlib for plot examples
Then go to the docs folder within main and generate the API docs:
sphinx-apidoc -o ./source/_apidoc ../pkgname
Now let’s generate the documentation:
make github
Note that make github is defined within the Makefile and it is equivalent to:
make clean html
cp -a _build/html/. ../../gh-pages/docs
These commands first generate the Sphinx documentation in HTML and then copy the contents of the html folder into the gh-pages branch. You can see what the documentation looks like locally by opening the gh-pages/docs/index.html file. If you move to the gh-pages branch and push all the changes, the documentation will also be available online thanks to GitHub Pages. You can access it through your repository page (see Environments / GitHub Pages / Active).
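On systems where cp is not available (for instance a plain Windows command prompt), the copy step can be approximated in Python with shutil.copytree. This is a sketch under the assumption that the built HTML is in _build/html as in the Makefile excerpt above; publish_html is a hypothetical helper and is not defined in the template.

```python
import shutil
from pathlib import Path


def publish_html(build_dir, target_dir):
    """Copy the contents of build_dir into target_dir, like 'cp -a src/. dst'."""
    build_dir, target_dir = Path(build_dir), Path(target_dir)
    if not build_dir.is_dir():
        raise FileNotFoundError("No built documentation at {0}".format(build_dir))
    # dirs_exist_ok (Python 3.8+) lets the copy merge into an existing folder.
    shutil.copytree(build_dir, target_dir, dirs_exist_ok=True)
    return target_dir


if __name__ == "__main__":
    # Paths taken from the Makefile excerpt above; run from main/docs.
    source = Path("_build/html")
    if source.is_dir():
        publish_html(source, Path("../../gh-pages/docs"))
```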
Note that in order to edit the documentation you need to create .rst files and include these newly created files in the index.rst document. An example is shown in docs/source/tutorials/setup.rst.
In addition, you can create and document python scripts that will be automatically
included in the documentation (gallery examples) using sphinx-gallery. Remember
to include the folder(s) containing the scripts in the variable sphinx_gallery_conf
in the conf.py file as shown below for tutorial.
# Sort key used below to order the examples by file name
from sphinx_gallery.sorting import FileNameSortKey

# Configuration for sphinx_gallery
sphinx_gallery_conf = {
    # path to your example scripts
    'examples_dirs': ['../../examples/tutorial'],
    # path to where to save gallery generated output
    'gallery_dirs': ['../source/_examples/tutorial'],
    # Other
    'line_numbers': True,
    'download_all_examples': False,
    'within_subsection_order': FileNameSortKey
}
Also remember to include the automatically generated .rst file in the docs/index.rst file.
.. toctree::
   :maxdepth: 2
   :caption: Example Galleries
   :hidden:

   _examples/tutorial/index
To include the output of the script (e.g. graph or console output) in the documentation, remember to prefix the script file name with plot (e.g. plot_sample_01.py). You can find the following examples in examples/tutorial:
plot_greetings_01.py: script using your pkgname package.
plot_sample_01.py: script just including all the code.
plot_sample_02.py: script documenting steps within the code.
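For orientation, a gallery script generally has the shape sketched below: a module docstring whose reStructuredText title becomes the example heading, followed by code cells separated by comment blocks that start with # %%. The content is invented for illustration (it is not one of the template files) and it avoids matplotlib so that it runs anywhere.

```python
"""
Sample greeting
===============

Minimal gallery script: the title above becomes the example heading and
this paragraph its introduction.
"""

# %%
# Comment blocks that start with ``# %%`` are rendered as text between
# the code cells.

def make_greeting(name):
    """Return a short greeting (hypothetical helper for this sketch)."""
    return "Hello, {0}!".format(name)

# %%
# Anything printed is captured and shown below the cell when the file
# name starts with ``plot``.

print(make_greeting("world"))
```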
Running tests
Just go to the main folder and run:
pytest
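pytest collects files named test_*.py (or *_test.py) and, inside them, functions whose names start with test_. A minimal sketch of such a file is shown below; the file name test_sample.py and the add function are placeholders standing in for code from your own package.

```python
# Hypothetical contents of pkgname/tests/test_sample.py


def add(a, b):
    """Toy function standing in for something from your package."""
    return a + b


def test_add_integers():
    # pytest reports a failure when a plain assert does not hold.
    assert add(2, 3) == 5


def test_add_is_commutative():
    assert add(1, 4) == add(4, 1)
```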
Read more about pytest
Now it is time to start coding!
I would recommend starting with quick and dirty scripts in the scripts folder.
After some time coding, you might identify that part of your implementation could be encapsulated in a number of classes and methods, or that some methods are being called very often. In that case, include those methods in your pkgname so you can import them easily and therefore don’t end up copying and pasting code all the time.
Once you have some results, create an example, document it and keep it clean. For example, if you have trained a model and plotted some graphs to evaluate its performance, create a file in the gallery, document it properly within the code and include the folder in sphinx-gallery. This will make it easier to review the code and results, and you will have everything ready to include later in your report!
Where to store the data?
Option I: When the datasets used in the examples are completely different, we can include the datasets directly within the example folder as shown in the folder structure below.
examples
    |- tutorial
    |- yourexample
        |- datasets                 # put here any data
            |- pathology.csv
        |- outputs                  # put here any outcomes
            |- datasets
                |- pathology_fmt.csv
        format_pathology.py
        do_something.py
        do_something_else.py
Option II: If your examples always use the same data (which is probably the case) you could include the data in the main folder so it looks something like this.
datasets
    |- microbiology.csv
    |- pathology.csv
examples
    |- tutorial
    |- yourexample
        |- outputs                  # put here any outcomes
            |- datasets
                |- pathology_fmt.csv
                |- microbiology_fmt.csv
        format_pathology.py         # loads datasets/pathology.csv saves in outputs
        format_microbiology.py      # loads datasets/microbiology.csv saves in outputs
        plot_roc_and_cfmatrix.py    # loads outputs/datasets/....
        plot_temporal_analysis.py   # loads outputs/datasets/...
Also, if you create various examples, the portion of the code that loads and saves data might become a bit repetitive. However, this is usually not included in the package. Thus you could use the code below so that the paths are not absolute but relative to the file you are running:
# Libraries
import pathlib

# -------------------------------
# Create configuration from data
# -------------------------------
# Current path (folder containing the running script)
curr_path = pathlib.Path(__file__).parent.absolute()

# Folder with the raw data
path_data = '{0}/../../datasets/'.format(curr_path)

# Path with fixed data
path_micro = '{0}/outputs/datasets/{1}'.format(
    curr_path, 'microbiology_fmt.csv')
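The same idea can be written with pathlib's / operator instead of string formatting, wrapped in a small function so that every example script reuses it. This is a sketch assuming the Option II layout, with the shared datasets folder two levels above the example folder; example_paths is a hypothetical helper name.

```python
from pathlib import Path


def example_paths(script_file):
    """Return raw-data and output folders relative to an example script."""
    base = Path(script_file).resolve().parent
    return {
        # Raw data shared by all examples (assumed two levels above the
        # example folder, as in the Option II layout).
        "data": (base / ".." / ".." / "datasets").resolve(),
        # Formatted data produced by the example itself.
        "outputs": base / "outputs" / "datasets",
    }
```

Inside an example script you would call example_paths(__file__) and then build file paths such as paths["data"] / "microbiology.csv" or paths["outputs"] / "microbiology_fmt.csv".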
These are just suggestions. If you feel more comfortable following another folder structure and/or approach, feel free to do so!
Happy coding!