6. Using Python on ARCHER

Python is supported on ARCHER both for running intensive parallel jobs and also as an analysis tool. This chapter describes how to use Python in either of these scenarios and also provides advice on getting the best performance out of the Python programs you are running.

When you log in to ARCHER, no Python module is loaded by default. You will need to load either the anaconda or the python-compute module to access the functionality described below. Running python without loading a module first means you will get the operating system's default Python.

6.1 Deciding which modules to use

The first step in using Python on ARCHER is deciding which is the most appropriate set of modules to use. The easiest way to decide is to consider which nodes you will be running on:

  • If you are going to run Python on the login nodes or the PP nodes then you should use the Anaconda distribution.
  • If you are running Python on the compute nodes (via aprun) then you should use the native distribution.

6.2 Python for data analysis: the Anaconda distributions

For serial data analysis, we provide modules containing the Anaconda scientific Python distribution.

Note: the anaconda modules will not work on the ARCHER compute nodes. You should use the native Python modules documented below for running on the compute nodes. If you require Anaconda on the compute nodes, we provide anaconda-compute modules.

There are two Anaconda distributions installed on ARCHER for the login and PP nodes:

  • anaconda/2.2.0-python2 - Anaconda 2.2.0 with Python 2.7
  • anaconda/2.2.0-python3 - Anaconda 2.2.0 with Python 3.4

To load the Anaconda Python environment you should use:

module load anaconda

As 2.2.0-python2 is the default, this adds the Anaconda Python 2 environment to your session. To load the Python 3 version you need to specify the module name in full:

module load anaconda/2.2.0-python3
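
Whichever module you load, you can confirm which interpreter is active from within Python itself. The following is a generic check (not specific to ARCHER) that reports the version and the path of the interpreter in use:

```python
import sys

# Report the interpreter version, e.g. to confirm that the
# expected Python 2 or Python 3 module has been loaded
print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))

# Path to the interpreter actually being run
print(sys.executable)
```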

Full details on the Anaconda distributions can be found on the Continuum website at:

6.2.1 Packages included in Anaconda distributions

You can list the packages currently available in the distribution you have loaded with the command conda list:

user@eslogin001:~> module load anaconda
user@eslogin001:~> conda list
# packages in environment at /home/y07/y07/cse/anaconda/python2:
#
_license                  1.1                      py27_1  
alabaster                 0.7.10                   py27_0  
anaconda                  custom                   py27_0  
anaconda-client           1.6.3                    py27_0  
anaconda-navigator        1.2.3                    py27_0  
anaconda-project          0.6.0                    py27_0  
argcomplete               1.0.0                    py27_1  
asn1crypto                0.22.0                   py27_0  
astroid                   1.4.9                    py27_0
...

6.2.2 Adding packages to the Anaconda distribution

The simplest way to add packages to the Anaconda distribution for your own use is to create a Custom Environment as described below and use conda to add additional packages.

6.3 Python for HPC: the native distributions

When you are using Python on the compute nodes you should use the native Python modules.

You do this by loading the "python-compute" module in your job submission script with:

module load python-compute

Note: there are versions of "python-compute" available for both Python 2 and Python 3.
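
In a batch job, the module load goes in your submission script before the aprun launch. The sketch below shows the general shape of such a script; the job name, budget code, walltime, core count and script name are placeholders that you should replace with your own values:

```shell
#!/bin/bash --login
#PBS -N python_job
#PBS -l select=1
#PBS -l walltime=0:20:0
#PBS -A budget            # replace with your own budget code

# Move to the directory the job was submitted from (on /work)
cd $PBS_O_WORKDIR

module load python-compute

# Launch the script on the compute nodes
aprun -n 24 python my_script.py
```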

6.3.1 Python packages for native distributions

Unlike the Anaconda distributions, the native Python distribution does not have performance packages such as numpy built in, as these are compiled from source against the Cray or Intel libraries.

If you wish to use these packages then you need to add them to your environment separately after you have loaded the "python-compute" module.

All of the module names for these packages are prefixed with "pc-" (Python compute) to make them easier to identify. For example, the numpy modules are:

user@eslogin001:~> module avail pc-numpy

------------------------- /opt/modules/packages-archer -------------------------
pc-numpy/1.9.2-libsci(default)  pc-numpy/1.9.2-mkl  pc-numpy/1.9.2-mkl-python3
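
Once one of these modules is loaded, numpy is importable as usual. The short script below is a generic sanity check (not ARCHER-specific) that confirms which numpy installation is being picked up and that basic linear algebra works:

```python
import numpy as np

# Confirm which numpy installation is actually being imported
print(np.__file__)

# A small computation to check the library is functional
a = np.arange(6.0).reshape(2, 3)
b = a.dot(a.T)          # 2x2 matrix product
print(b)
```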

To see a full list of modules available for native Python use:

module avail pc-

You can view packages installed within the Python distribution itself (i.e. available without loading further system modules) with the command pip list:

user@eslogin002:~> module load python-compute
user@eslogin002:~> pip list
apache-libcloud (0.15.1)
Biggus (0.8.0)
biopython (1.64)
cf-python (1.0.3)
colorama (0.3.1)
Cython (0.21.1)
...
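
The same information is also available programmatically, for instance via the pkg_resources module that ships with setuptools (a generic approach, not specific to the ARCHER installation):

```python
import pkg_resources

# List every distribution visible to the current interpreter,
# equivalent in spirit to `pip list`
installed = sorted(
    (dist.project_name, dist.version)
    for dist in pkg_resources.working_set
)
for name, version in installed:
    print("%s (%s)" % (name, version))
```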

6.3.2 Adding packages to the native distribution

The simplest way to add packages to the native distribution yourself is to create a Virtual Environment as described in Section 6.4.3, using pip to add them.

If you wish to compile performance packages for the compute nodes then please contact the ARCHER Helpdesk in the first instance for advice.

6.3.3 Anaconda on the compute nodes

Although the Anaconda distribution is not optimised for the ARCHER compute nodes, and will give inferior performance compared to the native distribution, its flexibility means that it may still be useful to ARCHER users on the compute nodes.

The standard anaconda modules will not work on the ARCHER compute nodes as they are installed on the /home file system, which is not accessible from the compute nodes.

If you wish to use the Anaconda environment on the ARCHER compute nodes, you must load the anaconda-compute module (usually in your job submission script). For example:

module load anaconda-compute

loads the Python 2 version of the anaconda module for use on the ARCHER compute nodes.

As with the standard anaconda modules, there is also a Python 3 version available. The anaconda-compute modules have exactly the same packages installed as the standard anaconda modules.

6.4 Custom Environments

The conda tool is provided by the anaconda (login and PP nodes) and anaconda-compute (compute nodes) modules on ARCHER. This tool enables users to create their own Python environments with a custom set of packages. This has several advantages:

  • Package installations can be more easily managed
  • Multiple environments can be set up and easily switched between, facilitating work on multiple projects
  • Snapshots or environment "freezes" can be shared with colleagues to reproduce the conditions for a simulation

Full details on managing custom environments can be found in the conda documentation at:

6.4.1 Basic use

Note: if you wish to use custom environments for Python code running on the compute nodes then the custom environment directory you create must be on the /work file system.

A custom environment named myenv can be created as follows:

user@eslogin008:~> module load anaconda      # (or module load anaconda-compute)
user@eslogin008:~> conda create --name myenv
Fetching package metadata .........
Solving package specifications: 
Package plan for installation in environment /home/z01/z01/user/.conda/envs/myenv:

Proceed ([y]/n)?y

This creates a myenv folder within the home directory that encapsulates the environment. You can also run conda create with the --prefix option to install the environment to a specific folder. Once created, you can switch to the environment with the command:

user@eslogin008:~> source activate myenv
(myenv)user@eslogin008:~>

which will prefix your command prompt with the environment name, here (myenv).

Installing new packages can be easily accomplished with the conda tool. For example:

(myenv)user@eslogin008:~> conda install memory_profiler
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/z01/z01/user/.conda/envs/myenv:

The following NEW packages will be INSTALLED:

    certifi:         2016.2.28-py27_0
    memory_profiler: 0.43-py27_0     
    openssl:         1.0.2l-0        
    pip:             9.0.1-py27_1    
    psutil:          5.2.2-py27_0    
    python:          2.7.13-0        
    readline:        6.2-2           
    setuptools:      36.4.0-py27_1   
    sqlite:          3.13.0-0        
    tk:              8.5.18-0        
    wheel:           0.29.0-py27_0   
    zlib:            1.2.11-0

Proceed ([y]/n)?y

Here we see the new memory profiling module has been installed in the local custom environment:

(myenv)user@eslogin008:~> conda list
# packages in environment at /home/z01/z01/user/.conda/envs/myenv:
#
certifi                   2016.2.28                py27_0  
memory_profiler           0.43                     py27_0  
openssl                   1.0.2l                        0  
pip                       9.0.1                    py27_1  
psutil                    5.2.2                    py27_0  
python                    2.7.13                        0  
readline                  6.2                           2  
setuptools                36.4.0                   py27_1  
sqlite                    3.13.0                        0  
tk                        8.5.18                        0  
wheel                     0.29.0                   py27_0  
zlib                      1.2.11                        0
(myenv)user@eslogin008:~>
(myenv)user@eslogin008:~> python
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>>
>>> import memory_profiler
>>> import inspect
>>> inspect.getfile(memory_profiler)
'/home/z01/z01/user/.conda/envs/myenv/lib/python2.7/site-packages/memory_profiler.pyc'
>>> exit()

Please note, for some package installations it may also be necessary to specify a channel such as conda-forge. For example, the following command installs the pygobject module.

(myenv)user@eslogin008:~> conda install -c conda-forge pygobject 

To switch back to the normal environment, simply use the command source deactivate as demonstrated below:

(myenv)user@eslogin008:~> source deactivate
user@eslogin008:~> python
Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import memory_profiler
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named memory_profiler
>>> exit()

6.4.2 Exporting custom environments

The current state of an environment can be exported to a file so that you or others can easily recreate it. This can help ensure a consistent set of conditions when running simulations.

The following command creates a snapshot of the custom environment.

(myenv)user@eslogin008:~> conda env export > myenv_snapshot.yml

The myenv_snapshot.yml file can then be used to recreate that environment.

user2@eslogin008:~> conda env create -f myenv_snapshot.yml
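
The exported file is plain YAML. Its exact contents depend on your environment, but it has the general shape sketched below (the package names and versions here are illustrative only):

```yaml
name: myenv
channels:
  - defaults
dependencies:
  - python=2.7.13
  - pip=9.0.1
  - memory_profiler=0.43
```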

6.4.3 Virtual environments on the compute nodes

Note: if you wish to use virtual environments for Python code running on the compute nodes then the virtualenv directory you create must be on the /work file system.

When using the compute nodes, loading the anaconda-compute module allows you to manage custom environments with the conda command as described above. However, if you require the best possible performance you should use the python-compute module instead. This module uses different tools for setting up virtual environments: virtualenv and pip instead of conda.

The following text explains how to manipulate virtual environments using virtualenv and pip.

A virtual environment named venv can be created as follows:

user@eslogin008:/work/z01/z01/user> module load python-compute
user@eslogin008:/work/z01/z01/user> virtualenv venv
New python executable in venv/bin/python
Installing setuptools............done.
Installing pip...............done.

This creates a venv folder in the current working directory encapsulating the environment. You can switch to the environment with the command:

user@eslogin008:/work/z01/z01/user> source venv/bin/activate
(venv)user@eslogin008:/work/z01/z01/user>

which will prefix your command prompt with the environment name, here (venv).

Installing new packages can be easily accomplished with the pip tool. For example:

(venv)user@eslogin008:/work/z01/z01/user> pip install memory_profiler
Downloading/unpacking memory-profiler
  Downloading memory_profiler-0.33.tar.gz
  Running setup.py egg_info for package memory-profiler

Installing collected packages: memory-profiler
  Running setup.py install for memory-profiler
    changing mode of build/scripts-2.7/mprof from 644 to 755

    changing mode of /fs3/z01/z01/user/venv/bin/mprof to 755
Successfully installed memory-profiler
Cleaning up...

Here we see the new memory profiling module has been installed in our local virtual environment:

(venv)user@eslogin008:/work/z01/z01/user> python
Python 2.7.6 (default, Mar 10 2014, 14:13:45)
[GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import memory_profiler
>>> import inspect
>>> inspect.getfile(memory_profiler)
'/fs3/z01/z01/user/venv/lib/python2.7/site-packages/memory_profiler.pyc'
>>> exit()

To switch back to the normal environment, simply use the command deactivate as demonstrated below:

(venv)user@eslogin008:/work/z01/z01/user> deactivate

Note: by default, your virtual environment will not include packages that are available centrally. Running virtualenv with the --system-site-packages option will give access to the global Python packages.

As with conda, the current state of the environment can be output, or "frozen", to a file so that you or others can easily recreate it. This can help ensure a consistent set of conditions when running simulations.

The command to create a snapshot is pip freeze and can be used as follows.

(venv)user@eslogin008:/work/z01/z01/user> pip freeze
Cython==0.21.1
PyYAML==3.11
memory-profiler==0.47
mpi4py==1.3.1
wsgiref==0.1.2
(venv)user@eslogin008:/work/z01/z01/user> pip freeze > venv_snapshot.txt

The venv_snapshot.txt file can then be used to recreate the environment.

(venv)user2@eslogin008:/work/z01/z01/user2> pip install -r venv_snapshot.txt

Full details on managing virtual environments with pip can be found at: