6. Using Python on ARCHER

Python is supported on ARCHER both for running intensive parallel jobs and also as an analysis tool. This chapter describes how to use Python in either of these scenarios.

The Python environments on ARCHER contain the most commonly used modules. If you wish to install additional Python modules, we recommend that you install a local copy of Miniconda within your own directories to manage this. This is described in more detail below.

When you log onto ARCHER, no Python module is loaded by default. You will need to load either the anaconda or the python-compute module to access the functionality described below. Running python without loading a module first will result in your using the operating system default Python.
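
A quick way to confirm which interpreter a session is actually using is to ask Python itself. The following is a minimal sketch; the function name describe_interpreter is illustrative and is not part of any ARCHER tooling.

```python
import sys


def describe_interpreter():
    """Return the path and version of the running Python interpreter."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


if __name__ == "__main__":
    info = describe_interpreter()
    # A path under a system directory such as /usr/bin suggests that no
    # Python module has been loaded. This is an assumption about typical
    # Linux layouts, not an ARCHER-specific guarantee.
    print("Interpreter: {}".format(info["executable"]))
    print("Version:     {}".format(info["version"]))
```

Running this before and after loading a module should show the interpreter path change.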

6.1 Deciding which modules to use

The first step in using Python on ARCHER is deciding which is the most appropriate set of modules to use. The easiest way to decide is to consider which nodes you will be running on:

  • If you are going to run Python on the login nodes or the PP nodes then you should use the Anaconda distribution.
  • If you are running Python on the compute nodes (via aprun) then you should use the native distribution.

6.2 Python for data analysis: the Anaconda distributions

For serial data analysis, we provide modules containing the Anaconda scientific Python distribution.

Note: the anaconda modules will not work on the ARCHER compute nodes. You should use the native Python modules documented below for running on the compute nodes. If you require Anaconda on the compute nodes, we provide anaconda-compute modules.

There are two Anaconda distributions installed on ARCHER for the login and PP nodes:

  • anaconda/python2 - Anaconda with Python 2.7
  • anaconda/python3 - Anaconda with Python 3.6

To load the Anaconda Python environment you should use:

module load anaconda

Because python2 is the default, this adds the Anaconda Python 2 environment to your session. To load the Python 3 version, you need to specify the full module name:

module load anaconda/python3

Full details on the Anaconda distributions can be found on the Continuum website.

6.2.1 Packages included in Anaconda distributions

You can list the packages currently available in the distribution you have loaded with the command conda list:

user@eslogin001:~> module load anaconda
user@eslogin001:~> conda list
# packages in environment at /home/y07/y07/cse/anaconda/python2:
#
_license                  1.1                      py27_1  
alabaster                 0.7.10                   py27_0  
anaconda                  custom                   py27_0  
anaconda-client           1.6.3                    py27_0  
anaconda-navigator        1.2.3                    py27_0  
anaconda-project          0.6.0                    py27_0  
argcomplete               1.0.0                    py27_1  
asn1crypto                0.22.0                   py27_0  
astroid                   1.4.9                    py27_0
...
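
As a complement to reading the conda list output, you can check from within Python whether the packages your script needs can be imported. This is a minimal sketch; the helper name missing_packages is hypothetical.

```python
import importlib


def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is not available in the current environment.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The package names here are only examples.
    print(missing_packages(["numpy", "scipy", "matplotlib"]))
```

An empty list means every named package is available in the loaded distribution. Note that the import name of a package can differ from the name shown by conda list.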

6.2.2 Adding packages to the Anaconda distribution

Adding packages to the central Anaconda distribution cannot be done by users. If you wish to have additional packages, we recommend installing your own local version of Miniconda and adding the packages you need. This approach is described in Custom Environments below.

6.3 Python for HPC: the native distributions

When you are using Python on the compute nodes you should use the native Python modules.

You do this by loading the "python-compute" module in your job submission script with:

module load python-compute

Note: there are versions of "python-compute" available for both Python 2 and Python 3.

6.3.1 Python packages for native distributions

Unlike the Anaconda distributions, the native Python distribution does not have performance packages such as numpy built in, because these are compiled from source against the Cray or Intel libraries.

If you wish to use these packages then you need to add them to your environment separately after you have loaded the "python-compute" module.

All of the module names for these packages are prepended with "pc-" (Python compute) to make them easier to identify. For example, the numpy modules are:

user@eslogin001:~> module avail pc-numpy

------------------------- /opt/modules/packages-archer -------------------------
pc-numpy/1.9.2-libsci(default)  pc-numpy/1.9.2-mkl  pc-numpy/1.9.2-mkl-python3

To see a full list of modules available for native Python use:

module avail pc-

You can view packages installed within the Python distribution itself (i.e. available without loading further system modules) with the command pip list:

user@eslogin002:~> module load python-compute
user@eslogin002:~> pip list
apache-libcloud (0.15.1)
Biggus (0.8.0)
biopython (1.64)
cf-python (1.0.3)
colorama (0.3.1)
Cython (0.21.1)
...
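
After loading python-compute together with one of the pc- modules, it can be useful to confirm which numpy installation Python actually picks up. The sketch below assumes nothing beyond numpy's standard attributes; the function name numpy_details is illustrative.

```python
def numpy_details():
    """Return (version, install path) for numpy, or None if it is unavailable."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.__version__, numpy.__file__


if __name__ == "__main__":
    details = numpy_details()
    if details is None:
        print("numpy is not available; has a pc-numpy module been loaded?")
    else:
        print("numpy {} loaded from {}".format(*details))
        import numpy
        # Prints the BLAS/LAPACK libraries numpy was built against.
        numpy.show_config()
```

The reported install path should indicate whether numpy comes from a system module or from somewhere else on your PYTHONPATH.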

6.3.2 Adding packages to the native distribution

If you require additional supporting packages that are not performance-critical you can add these to a custom Python virtual environment, which you should create using the virtualenv command that becomes available after loading one of the python-compute modules:

user@eslogin002:~> module load python-compute
user@eslogin002:~> virtualenv --system-site-packages myEnv
New python executable in /home/z01/z01/user/myEnv/bin/python
Installing setuptools, pip, wheel...done.

Note the --system-site-packages flag, which is required on ARCHER.

This environment can then be activated as follows:

user@eslogin002:~> source ~/myEnv/bin/activate
(myEnv) user@eslogin002:~>
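
Besides the (myEnv) prefix in the prompt, you can confirm from within Python that the virtual environment is active. This is a minimal sketch using standard sys attributes; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the venv module and
    # newer virtualenv releases set sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("Virtual environment active: {}".format(in_virtualenv()))
    print("Environment prefix: {}".format(sys.prefix))
```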

Additional packages can then be installed into this Python virtual environment using pip:

(myEnv) user@eslogin002:~> pip install --user newPackage

If you wish to compile performance-critical packages for the compute nodes then please contact the ARCHER Helpdesk in the first instance for advice.

6.3.3 Anaconda on the compute nodes

Although the Anaconda distribution is not optimised for the ARCHER compute nodes and will provide inferior performance compared to the native distribution, its flexibility means that it may be of some use to ARCHER users on compute nodes.

The standard anaconda modules will not work on the ARCHER compute nodes as they are installed on the /home file system, which is not accessible from the compute nodes.

If you wish to use the Anaconda environment on the ARCHER compute nodes, you must load the anaconda-compute module (usually in your job submission script). For example:

module load anaconda-compute

loads the Python 2 version of the anaconda module for use on the ARCHER compute nodes.

As for the standard anaconda modules, there is a Python 3 version available too. The anaconda-compute modules have exactly the same packages installed as the standard anaconda packages.

For Python 3, you need to unload the xalt module, and use aprun -b. For example:

module unload xalt
module load anaconda-compute/python3
aprun -b -n 1 python -c 'print("Hello")'
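
For anything longer than a one-line command, it is easier to launch a script file. The following minimal sketch reports the Python version and the host it runs on, which helps confirm that the job really executed on a compute node. The function name node_report and any file name you save it under are illustrative only.

```python
import socket
import sys


def node_report():
    """Return a one-line summary of the Python version and host name."""
    return "Python {}.{} running on {}".format(
        sys.version_info[0], sys.version_info[1], socket.gethostname()
    )


if __name__ == "__main__":
    print(node_report())
```

Because /home is not accessible from the compute nodes, a script launched with aprun would need to be stored on a file system that the compute nodes can see.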

6.4 Custom Environments

To set up a custom Python environment including packages that are not in the central installation, the simplest approach is to install Miniconda locally in your own directories.

6.4.1 Installing Miniconda

Note: If you wish to use Python on the compute nodes then you must install Miniconda in your /work directories as these are the only ones visible on the compute nodes.

First, you should download Miniconda. You can use wget on ARCHER to do this, for example:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

You can find links to the various Miniconda versions on the Miniconda website.

For ARCHER, you should use the Linux 64-bit (bash installer).

Once you have downloaded the installer, you can run it. For example:

user@eslogin008:~> bash Miniconda3-latest-Linux-x86_64.sh 

Welcome to Miniconda3 4.3.31

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>> 
====================================
Miniconda End User License Agreement
====================================

---snip---


Do you accept the license terms? [yes|no]
[no] >>> yes

Miniconda3 will now be installed into this location:
/home/t01/t01/user/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/t01/t01/user/miniconda3] >>> 
PREFIX=/home/t01/t01/user/miniconda3
installing: python-3.6.3-h6c0c0dc_5 ...
installing: ca-certificates-2017.08.26-h1d4fec5_0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-h7cc24e2_2 ...
installing: libstdcxx-ng-7.2.0-h7a57d05_2 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.0-h9df7e31_2 ...
installing: openssl-1.0.2n-hb7f436b_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.3-h55aa19d_2 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-ha838bed_2 ...
installing: libedit-3.1-heed3624_0 ...
installing: readline-7.0-ha6073c6_4 ...
installing: sqlite-3.20.1-hb898158_2 ...
installing: asn1crypto-0.23.0-py36h4639342_0 ...
installing: certifi-2017.11.5-py36hf29ccca_0 ...
installing: chardet-3.0.4-py36h0f667ec_1 ...
installing: idna-2.6-py36h82fb2a8_1 ...
installing: pycosat-0.6.3-py36h0a5515d_0 ...
installing: pycparser-2.18-py36hf9f622e_1 ...
installing: pysocks-1.6.7-py36hd97a5b1_1 ...
installing: ruamel_yaml-0.11.14-py36ha2fb22d_2 ...
installing: six-1.11.0-py36h372c433_1 ...
installing: cffi-1.11.2-py36h2825082_0 ...
installing: setuptools-36.5.0-py36he42e2e1_0 ...
installing: cryptography-2.1.4-py36hd09be54_0 ...
installing: wheel-0.30.0-py36hfd4bba0_1 ...
installing: pip-9.0.1-py36h6c6f9ce_4 ...
installing: pyopenssl-17.5.0-py36h20ba746_0 ...
installing: urllib3-1.22-py36hbe7ace6_0 ...
installing: requests-2.18.4-py36he2e5f8d_1 ...
installing: conda-4.3.31-py36_0 ...
installation finished.
WARNING:
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python interpreter
    in Miniconda3: /home/t01/t01/user/miniconda3
Do you wish the installer to prepend the Miniconda3 install location
to PATH in your /home/t01/t01/user/.bashrc ? [yes|no]
[no] >>> 

You may wish to edit your .bashrc to prepend the Miniconda3 install location to PATH:

export PATH=/home/t01/t01/user/miniconda3/bin:$PATH

Thank you for installing Miniconda3!

Miniconda is now installed in your local directories but we still need to set up a way to access it correctly. There are a number of ways to do this.

  • If you are always going to be using this Python environment on ARCHER and do not wish to use any other Python environment, you can follow the advice of the Miniconda installer and add a line to your .bashrc file.
  • You can export PATH every time you wish to use your local install by using the bash command export PATH=/home/t01/t01/user/miniconda3/bin:$PATH (using the correct PATH as specified by the installer). This will become tedious if you use the environment often!
  • You can create an alias in your .bashrc file to set the path. For example, adding the line alias condasetup='export PATH=/home/t01/t01/user/miniconda3/bin:$PATH' would allow you to use the command condasetup to initialise the Miniconda environment. The single quotes ensure that $PATH is expanded when the alias is run rather than when it is defined.
  • You could also create a modulefile to provide a way to initialise the environment using module load ... as we do for our Anaconda environments. Please contact the helpdesk if you want help to do this.
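
Whichever of the approaches above you choose, it is worth checking that the commands you run really resolve to your local installation. The sketch below uses shutil.which from the Python 3 standard library; the function name resolves_under is hypothetical, and the ~/miniconda3 prefix is an assumption based on the installer's default location.

```python
import os
import shutil


def resolves_under(command, prefix, path=None):
    """Return True if `command`, as found on PATH, lives under `prefix`."""
    found = shutil.which(command, path=path)
    if found is None:
        return False
    # Resolve symlinks so that the comparison is made on real locations.
    real_prefix = os.path.realpath(prefix)
    return os.path.realpath(found).startswith(real_prefix + os.sep)


if __name__ == "__main__":
    # Assumption: Miniconda was installed in ~/miniconda3, the installer's
    # default; adjust this if you chose a different location.
    prefix = os.path.expanduser("~/miniconda3")
    for command in ("python", "conda"):
        print("{}: {}".format(command, resolves_under(command, prefix)))
```

If either command reports False, PATH has probably not been updated in the current shell.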

6.4.2 Installing packages into Miniconda

Once you have installed Miniconda and set up your environment to access it, you can then add whatever packages you wish to the installation using the conda install ... command. For example:

user@eslogin001:~> conda install numpy
Fetching package metadata ...............
Solving package specifications: .

Package plan for installation in environment /home/t01/t01/user/miniconda3:

The following NEW packages will be INSTALLED:

    blas:        1.1-openblas                  conda-forge
    libgfortran: 3.0.0-1                                  
    numpy:       1.14.0-py36_blas_openblas_200 conda-forge [blas_openblas]
    openblas:    0.2.20-7                      conda-forge

The following packages will be UPDATED:

    conda:       4.3.31-py36_0                             --> 4.3.33-py36_0 conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    conda-env:   2.6.0-h36134e3_1                          --> 2.6.0-0       conda-forge

Proceed ([y]/n)? y

conda-env-2.6. 100% |########################################################################| Time: 0:00:00  33.71 kB/s
libgfortran-3. 100% |########################################################################| Time: 0:00:00   7.85 MB/s
openblas-0.2.2 100% |########################################################################| Time: 0:00:03   4.84 MB/s
blas-1.1-openb 100% |########################################################################| Time: 0:00:00   1.33 MB/s
numpy-1.14.0-p 100% |########################################################################| Time: 0:00:01   5.00 MB/s
conda-4.3.33-p 100% |########################################################################| Time: 0:00:00   5.71 MB/s

Here we see that the numpy package has been installed in the local environment:

user@eslogin001:~> conda list
# packages in environment at /home/t01/t01/user/miniconda3:
#
asn1crypto                0.23.0           py36h4639342_0  
blas                      1.1                    openblas    conda-forge
ca-certificates           2017.08.26           h1d4fec5_0  
certifi                   2017.11.5        py36hf29ccca_0  
cffi                      1.11.2           py36h2825082_0  
chardet                   3.0.4            py36h0f667ec_1  
conda                     4.3.33                   py36_0    conda-forge
conda-env                 2.6.0                         0    conda-forge
cryptography              2.1.4            py36hd09be54_0  
idna                      2.6              py36h82fb2a8_1  
libedit                   3.1                  heed3624_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 7.2.0                h7cc24e2_2  
libgfortran               3.0.0                         1  
libstdcxx-ng              7.2.0                h7a57d05_2  
ncurses                   6.0                  h9df7e31_2  
numpy                     1.14.0          py36_blas_openblas_200  [blas_openblas]  conda-forge
openblas                  0.2.20                        7    conda-forge
openssl                   1.0.2n               hb7f436b_0  
pip                       9.0.1            py36h6c6f9ce_4  
pycosat                   0.6.3            py36h0a5515d_0  
pycparser                 2.18             py36hf9f622e_1  
pyopenssl                 17.5.0           py36h20ba746_0  
pysocks                   1.6.7            py36hd97a5b1_1  
python                    3.6.3                h6c0c0dc_5  
readline                  7.0                  ha6073c6_4  
requests                  2.18.4           py36he2e5f8d_1  
ruamel_yaml               0.11.14          py36ha2fb22d_2  
setuptools                36.5.0           py36he42e2e1_0  
six                       1.11.0           py36h372c433_1  
sqlite                    3.20.1               hb898158_2  
tk                        8.6.7                hc745277_3  
urllib3                   1.22             py36hbe7ace6_0  
wheel                     0.30.0           py36hfd4bba0_1  
xz                        5.2.3                h55aa19d_2  
yaml                      0.1.7                had09818_2  
zlib                      1.2.11               ha838bed_2  

Please note that for some package installations it may also be necessary to specify a channel such as conda-forge. For example, the following command installs the pygobject package:

user@eslogin001:~> conda install -c conda-forge pygobject 

6.4.3 Custom Python environments with MPI

Miniconda does not provide an mpi4py installation that is compatible with the ARCHER interconnect, so you cannot use Miniconda to provide an environment for this. In this case you will need to build on the centrally-installed python-compute environment by compiling the packages required yourself and modifying PYTHONPATH.

If you need help to do this, then please contact the helpdesk.
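
Once an mpi4py build is in place, a short test script can confirm that it imports and that ranks are assigned as expected. This is a minimal sketch that assumes mpi4py is importable in your environment; it uses only the standard MPI.COMM_WORLD, Get_rank and Get_size calls, and the function name mpi_rank_and_size is illustrative.

```python
def mpi_rank_and_size():
    """Return (rank, size) from MPI.COMM_WORLD, or None without mpi4py."""
    try:
        # Importing MPI initialises the MPI library.
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    result = mpi_rank_and_size()
    if result is None:
        print("mpi4py could not be imported; check PYTHONPATH")
    else:
        print("Rank {} of {}".format(*result))
```

When launched with several processes, each process should report a different rank and the same total size.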