6. Using Python on ARCHER

Python is supported on ARCHER both for running intensive parallel jobs and as an analysis tool. This chapter describes how to use Python in either of these scenarios and provides advice on getting the best performance out of the Python programs you run.

When you log onto ARCHER, no Python module is loaded by default. You will need to load either the anaconda or the python-compute module to access the functionality described below. Running python without loading a module first will use the operating system default Python.
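As a quick check, the which command shows which interpreter is currently first in your path (the exact paths shown here are illustrative):

user@eslogin001:~> which python
/usr/bin/python
user@eslogin001:~> module load anaconda
user@eslogin001:~> which python
/home/y07/y07/cse/anaconda/2.2.0-python2/bin/python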

6.1 Deciding which modules to use

The first step in using Python on ARCHER is deciding which is the most appropriate set of modules to use. The easiest way to decide is to consider which nodes you will be running on:

  • If you are going to run Python on the login nodes or the PP nodes then you should use the Anaconda distribution.
  • If you are running Python on the compute nodes (via aprun) then you should use the native distribution.

6.2 Python for data analysis: the Anaconda distributions

For serial data analysis, we provide modules containing the Anaconda scientific Python distribution.

Note: the anaconda modules will not work on the ARCHER compute nodes. You should use the native Python modules documented below for running on the compute nodes. If you require Anaconda on the compute nodes, we provide anaconda-compute modules.

There are two Anaconda distributions installed on ARCHER for the login and PP nodes:

  • anaconda/2.2.0-python2 - Anaconda 2.2.0 with Python 2.7
  • anaconda/2.2.0-python3 - Anaconda 2.2.0 with Python 3.4

To load the Anaconda Python environment you should use:

module load anaconda

As 2.2.0-python2 is the default, this will add the Anaconda Python 2 environment to your session. To load the Python 3 version you need to specify the module name in full:

module load anaconda/2.2.0-python3

Full details on the Anaconda distributions can be found on the Continuum website.

6.2.1 Packages included in Anaconda distributions

You can list the packages currently available in the distribution you have loaded with the command conda list:

user@eslogin001:~> module load anaconda
user@eslogin001:~> conda list
# packages in environment at /home/y07/y07/cse/anaconda/2.2.0-python2:
#
_license                  1.1                      py27_0  
abstract-rendering        0.5.1                np19py27_0  
anaconda                  2.2.0                np19py27_0  
argcomplete               0.8.4                    py27_0  
astropy                   1.0.1                np19py27_0  
basemap                   1.0.7                np19py27_0
...
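If you only want to check for a particular package, conda list also accepts a package name (or regular expression) as a filter. For example, to check which numpy build is provided:

user@eslogin001:~> conda list numpy

The output lists any installed packages whose names match, along with their versions.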

6.2.2 Adding packages to the Anaconda distribution

The simplest way to add packages to the Anaconda distribution for your own use is to create a Virtual Environment as described below and use pip to add additional packages.

6.3 Python for HPC: the native distributions

When you are using Python on the compute nodes you should use the native Python modules.

You do this by loading the "python-compute" module in your job submission script with:

module load python-compute

Note: there are versions of "python-compute" available for both Python 2 and Python 3.
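As an illustration, a minimal PBS job script using the native distribution might look like the sketch below; the job name, budget code and script name (my_script.py) are placeholders that you should replace with your own:

#!/bin/bash --login
#PBS -N python_job
#PBS -l select=1
#PBS -l walltime=0:20:0
#PBS -A budget

# Move to the directory the job was submitted from (which should be on /work)
cd $PBS_O_WORKDIR

# Load the native Python distribution
module load python-compute

# Run the script on all 24 cores of the node
aprun -n 24 python my_script.py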

6.3.1 Python packages for native distributions

Unlike the Anaconda distributions, the native Python distribution does not have the performance packages such as numpy built in; these are instead compiled from source against the Cray or Intel libraries and provided as separate modules.

If you wish to use these packages then you need to add them to your environment separately after you have loaded the "python-compute" module.

All of the module names for these packages are prefixed with "pc-" (Python compute) to make them easier to identify. For example, the numpy modules are:

user@eslogin001:~> module avail pc-numpy

------------------------- /opt/modules/packages-archer -------------------------
pc-numpy/1.8.0                 pc-numpy/1.9.2-libsci
pc-numpy/1.8.0-libsci(default) pc-numpy/1.9.2-mkl
pc-numpy/1.8.0-mkl             pc-numpy/1.9.2-mkl-python3
pc-numpy/1.8.0-python3

To see a full list of modules available for native Python use:

module avail pc-
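For example, to make the default numpy build available alongside the native distribution you would load the corresponding pc- module after python-compute:

module load python-compute
module load pc-numpy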

You can view packages installed within the Python distribution itself (i.e. available without loading further system modules) with the command pip list:

user@eslogin002:~> module load python-compute
user@eslogin002:~> pip list
apache-libcloud (0.15.1)
backports.ssl-match-hostname (3.4.0.2)
Biggus (0.8.0)
biopython (1.64)
cf-python (1.0.3)
colorama (0.3.1)
Cython (0.21.1)
...

6.3.2 Adding packages to the native distribution

The simplest way to add packages to the native distribution yourself is to create a Virtual Environment as described below and use pip to add them.

If you wish to compile performance packages for the compute nodes then please contact the ARCHER Helpdesk in the first instance for advice.

6.3.3 Anaconda on the compute nodes

Although the Anaconda distribution is not optimised for the ARCHER compute nodes and will provide inferior performance compared to the native distribution, its flexibility means that it may still be of use to ARCHER users on the compute nodes.

The standard anaconda modules will not work on the ARCHER compute nodes as they are installed on the /home file system, which is not accessible from the compute nodes.

If you wish to use the Anaconda environment on the ARCHER compute nodes, you must load the anaconda-compute module (usually in your job submission script). For example:

module load anaconda-compute

This would load the default Python 2 version of the Anaconda module for use on the ARCHER compute nodes.

As with the standard anaconda modules, a Python 3 version is also available. The anaconda-compute modules have exactly the same packages installed as the standard anaconda modules.
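In a job script the relevant lines follow the same pattern as the sketch shown earlier, for example (my_script.py is again a placeholder; here it is run as a single serial process):

module load anaconda-compute
aprun -n 1 python my_script.py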

6.4 Virtual Environments

The virtualenv tool is installed and supported for both the Anaconda (login and PP nodes) and native Python (compute node) distributions on ARCHER. This enables users to create their own Python environments with a custom set of packages. This has several advantages:

  • Package installations can be more easily managed
  • Multiple environments can be set up and easily switched between, facilitating work on multiple projects
  • Snapshots or environment "freezes" can be shared with colleagues to reproduce the conditions for a simulation

Full details on virtual environments can be found in the Python documentation.

6.4.1 Basic Use

Note: if you wish to use virtual environments for Python code running on the compute nodes then the virtualenv directory you create must be on the /work file system.

A virtual environment named venv can be created as follows:

user@eslogin008:~/test> module load anaconda      # (or module load python-compute)
user@eslogin008:~/test> virtualenv venv
New python executable in venv/bin/python
Installing setuptools............done.
Installing pip...............done.

This creates a venv folder in the current working directory encapsulating the environment. You can switch to the environment with the command:

user@eslogin008:~/test> source venv/bin/activate
(venv)user@eslogin008:~/test>

which will prefix your command prompt with the environment name, here (venv).

Installing new packages can be easily accomplished with the pip tool. For example:

(venv)user@eslogin008:~/test> pip install memory_profiler
Downloading/unpacking memory-profiler
  Downloading memory_profiler-0.33.tar.gz
  Running setup.py egg_info for package memory-profiler

Installing collected packages: memory-profiler
  Running setup.py install for memory-profiler
    changing mode of build/scripts-2.7/mprof from 644 to 755

    changing mode of /home1/z01/z01/user/test/venv/bin/mprof to 755
Successfully installed memory-profiler
Cleaning up...

Here we can see that the new memory_profiler module has been installed in our local virtual environment:

(venv)user@eslogin008:~/test> python
Python 2.7.6 (default, Mar 10 2014, 14:13:45)
[GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import memory_profiler
>>> import inspect
>>> inspect.getfile(memory_profiler)
'/home1/z01/z01/user/test/venv/lib/python2.7/site-packages/memory_profiler.pyc'
>>>

To switch back to the normal environment, simply use the command deactivate as demonstrated below:

(venv)user@eslogin008:~/test> deactivate
user@eslogin008:~/test> python
Python 2.7.6 (default, Mar 10 2014, 14:13:45)
[GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import memory_profiler
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named memory_profiler
>>>

Note: by default, your virtual environment will not include packages that are available centrally. Running virtualenv with the --system-site-packages option will give access to the global Python packages.
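If the environment is to be used on the compute nodes, remember that it must be created under /work (see the note above). The activation step is then done inside your job submission script before the aprun call; a minimal sketch, with illustrative paths:

module load python-compute
source /work/z01/z01/user/test/venv/bin/activate
aprun -n 24 python my_script.py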

6.4.2 Environment Freeze

The current state of the environment can be output, or "frozen", to a file to enable yourself or another to easily recreate it. This can help ensure a consistent set of conditions when running simulations.

The command to create a snapshot is pip freeze and can be used as follows:

(venv)user@eslogin008:~/test> pip freeze
Cython==0.21.1
PyYAML==3.11
memory-profiler==0.33
mpi4py==1.3.1
wsgiref==0.1.2
(venv)user@eslogin008:~/test> pip freeze > venv_snapshot.txt

The venv_snapshot.txt file can then be used to recreate the environment as follows:

(venv)user@eslogin008:~/test> pip install -r venv_snapshot.txt
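For example, a colleague could reproduce the environment in a fresh virtual environment of their own (the directory and environment names here are illustrative):

user@eslogin008:~/project> module load python-compute
user@eslogin008:~/project> virtualenv venv2
user@eslogin008:~/project> source venv2/bin/activate
(venv2)user@eslogin008:~/project> pip install -r venv_snapshot.txt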