6. Using Python on ARCHER
- 6.1 Deciding which modules to use
- 6.2 Python for data analysis: the Anaconda distributions
- 6.3 Python for HPC: the native distributions
- 6.4 Virtual Environments
Python is supported on ARCHER both for running intensive parallel jobs and as an analysis tool. This chapter describes how to use Python in either of these scenarios, and provides advice on getting the best performance out of the Python programs you run.
When you log onto ARCHER, no Python module is loaded by default. You will need to load either the anaconda or the python-compute module to access the functionality described below. Running python without loading a module first means you will be using the operating-system default Python.
6.1 Deciding which modules to use
The first step in using Python on ARCHER is deciding which is the most appropriate set of modules to use. The easiest way to decide is to consider which nodes you will be running on:
- If you are going to run Python on the login nodes or the PP nodes then you should use the Anaconda distribution.
- If you are running Python on the compute nodes (via aprun) then you should use the native distribution.
6.2 Python for data analysis: the Anaconda distributions
For serial data analysis, we provide modules containing the Anaconda scientific Python distribution.
Note: the anaconda modules will not work on the ARCHER compute nodes; use the native Python modules documented below for running on the compute nodes. If you require Anaconda on the compute nodes, we provide anaconda-compute modules.
There are two Anaconda distributions installed on ARCHER for the login and PP nodes:
- anaconda/2.2.0-python2 - Anaconda 2.2.0 with Python 2.7
- anaconda/2.2.0-python3 - Anaconda 2.2.0 with Python 3.4
To load the Anaconda Python environment you should use:
module load anaconda
As 2.2.0-python2 is the default, this adds the Anaconda Python 2 environment to your session. To load the Python 3 version you need to specify the module name in full:
module load anaconda/2.2.0-python3
Full details on the Anaconda distributions can be found on the Continuum website.
6.2.1 Packages included in Anaconda distributions
You can list the packages currently available in the distribution you have loaded with the command conda list:
user@eslogin001:~> module load anaconda
user@eslogin001:~> conda list
# packages in environment at /home/y07/y07/cse/anaconda/2.2.0-python2:
#
_license                  1.1                      py27_0
abstract-rendering        0.5.1               np19py27_0
anaconda                  2.2.0               np19py27_0
argcomplete               0.8.4                    py27_0
astropy                   1.0.1               np19py27_0
basemap                   1.0.7               np19py27_0
...
6.2.2 Adding packages to the Anaconda distribution
The simplest way to add packages to the Anaconda distribution for your own use is to create a Virtual Environment as described below and use pip to add additional packages.
6.3 Python for HPC: the native distributions
When you are using Python on the compute nodes you should use the native Python modules.
You do this by loading the "python-compute" module in your job submission script with:
module load python-compute
Note: there are versions of "python-compute" available for both Python 2 and Python 3.
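As a sketch, a job submission script using the native distribution might look like the following. The budget code, resource requests, and script name are placeholders (assumptions), not values from this guide; consult the job submission documentation for the correct options for your project.

```shell
#!/bin/bash --login
# Illustrative PBS job script fragment -- the budget code (z01-budget),
# resource requests and script name (my_script.py) are all placeholders.
#PBS -l select=1
#PBS -l walltime=0:20:0
#PBS -A z01-budget

# Move to the directory the job was submitted from
cd $PBS_O_WORKDIR

# Load the native Python distribution and an optimised numpy build
module load python-compute
module load pc-numpy

# Launch on the compute nodes (ARCHER nodes have 24 cores)
aprun -n 24 python my_script.py
```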
6.3.1 Python packages for native distributions
Unlike the Anaconda distributions, the native Python distribution does not have performance packages such as numpy built in; instead, these are compiled from source against the Cray or Intel libraries and provided as separate modules.
If you wish to use these packages then you need to add them to your environment separately after you have loaded the "python-compute" module.
All of the module names for these packages are prepended with "pc-" (Python compute) to make them easier to identify. For example, the numpy modules are:
user@eslogin001:~> module avail pc-numpy
------------------------- /opt/modules/packages-archer -------------------------
pc-numpy/1.8.0                  pc-numpy/1.9.2-libsci
pc-numpy/1.8.0-libsci(default)  pc-numpy/1.9.2-mkl
pc-numpy/1.8.0-mkl              pc-numpy/1.9.2-mkl-python3
pc-numpy/1.8.0-python3
To see a full list of modules available for native Python use:
module avail pc-
You can view packages installed within the Python distribution itself (i.e. available without loading further system modules) with the command pip list:
user@eslogin002:~> module load python-compute
user@eslogin002:~> pip list
apache-libcloud (0.15.1)
backports.ssl-match-hostname (184.108.40.206)
Biggus (0.8.0)
biopython (1.64)
cf-python (1.0.3)
colorama (0.3.1)
Cython (0.21.1)
...
6.3.2 Adding packages to the native distribution
The simplest way to add packages to the native distribution yourself is to create a Virtual Environment as described below and use pip to add them.
If you wish to compile performance packages for the compute nodes then please contact the ARCHER Helpdesk in the first instance for advice.
6.3.3 Anaconda on the compute nodes
Although the Anaconda distribution is not optimised for the ARCHER compute nodes and will provide inferior performance compared to the native distribution, its flexibility means that it may still be of use to ARCHER users on the compute nodes.
The standard anaconda modules will not work on the ARCHER compute nodes as they are installed on the /home file system, which is not accessible from the compute nodes.
If you wish to use the Anaconda environment on the ARCHER compute nodes, you must load the anaconda-compute module (usually in your job submission script). For example:
module load anaconda-compute
This loads the Python 2 version of the Anaconda distribution for use on the ARCHER compute nodes.
As with the standard anaconda modules, a Python 3 version is also available. The anaconda-compute modules have exactly the same packages installed as the standard anaconda modules.
6.4 Virtual Environments
The virtualenv tool is installed and supported for both the Anaconda (login and PP nodes) and native Python (compute node) distributions on ARCHER. This enables users to create their own Python environments with a custom set of packages. This has several advantages:
- Package installations can be more easily managed
- Multiple environments can be set up and easily switched between, facilitating work on multiple projects
- Snapshots or environment "freezes" can be shared with colleagues to reproduce the conditions for a simulation
Full details on virtual environments can be found in the Python documentation.
6.4.1 Basic Use
Note: if you wish to use virtual environments for Python code running on the compute nodes then the virtualenv directory you create must be on the /work file system.
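For compute-node use, a common pattern is to create the environment once on /work from a login node, then activate it inside the job submission script. The sketch below illustrates this; the project paths, script name, and core count are placeholders (assumptions), not real ARCHER project values.

```shell
# One-off setup on a login node -- the /work path below is a placeholder.
module load python-compute
virtualenv /work/z01/z01/user/my-venv
# (add --system-site-packages to also see the centrally installed packages)

# Then, inside the job submission script:
module load python-compute
source /work/z01/z01/user/my-venv/bin/activate
aprun -n 24 python my_script.py
```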
A virtual environment named venv can be created as follows:
user@eslogin008:~/test> module load anaconda   # (or module load python-compute)
user@eslogin008:~/test> virtualenv venv
New python executable in venv/bin/python
Installing setuptools............done.
Installing pip...............done.
This creates a venv folder in the current working directory encapsulating the environment. You can switch to the environment with the command:
user@eslogin008:~/test> source venv/bin/activate
(venv)user@eslogin008:~/test>
which will prefix your command prompt with the environment name, here (venv).
Installing new packages can be easily accomplished with the pip tool. For example:
(venv)user@eslogin008:~/test> pip install memory_profiler
Downloading/unpacking memory-profiler
  Downloading memory_profiler-0.33.tar.gz
  Running setup.py egg_info for package memory-profiler
Installing collected packages: memory-profiler
  Running setup.py install for memory-profiler
    changing mode of build/scripts-2.7/mprof from 644 to 755
    changing mode of /home1/z01/z01/user/test/venv/bin/mprof to 755
Successfully installed memory-profiler
Cleaning up...
Here we see the new memory profiling module has been installed in our local virtual environment:
(venv)user@eslogin008:~/test> python
Python 2.7.6 (default, Mar 10 2014, 14:13:45) [GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import memory_profiler
>>> import inspect
>>> inspect.getfile(memory_profiler)
'/home1/z01/z01/user/test/venv/lib/python2.7/site-packages/memory_profiler.pyc'
>>>
To switch back to the normal environment, simply use the command deactivate as demonstrated below:
(venv)user@eslogin008:~/test> deactivate
user@eslogin008:~/test> python
Python 2.7.6 (default, Mar 10 2014, 14:13:45) [GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import memory_profiler
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named memory_profiler
>>>
Note: by default, your virtual environment will not include packages that are available centrally. Running virtualenv with the option --system-site-packages will give access to the global Python packages.
6.4.2 Environment Freeze
The current state of the environment can be output, or "frozen", to a file to enable you or others to easily recreate it. This helps ensure a consistent set of conditions when running simulations.
The command to create a snapshot is pip freeze and can be used as follows:
(venv)user@eslogin008:~/test> pip freeze
Cython==0.21.1
PyYAML==3.11
memory-profiler==0.33
mpi4py==1.3.1
wsgiref==0.1.2
(venv)user@eslogin008:~/test> pip freeze > venv_snapshot.txt
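The snapshot is plain text in pip's requirements format, one package==version entry per line. As a minimal illustration of that format (the helper function below is hypothetical, not part of pip), such a file can be parsed like this:

```python
# Hypothetical helper: parse "pip freeze" output (requirements format)
# into a dict mapping package name to pinned version.
def parse_freeze(text):
    pinned = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, version = line.partition("==")
        if sep:  # only keep exact pins such as "PyYAML==3.11"
            pinned[name] = version
    return pinned

snapshot = """Cython==0.21.1
PyYAML==3.11
memory-profiler==0.33
"""
print(parse_freeze(snapshot))
```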
The venv_snapshot.txt file can then be used to recreate the environment as follows:
(venv)user@eslogin008:~/test> pip install -r venv_snapshot.txt