6. Using Python on ARCHER
- 6.1 Deciding which modules to use
- 6.2 Python for data analysis: the Anaconda distributions
- 6.3 Python for HPC: the native distributions
- 6.4 Custom environments
Python is supported on ARCHER both for running intensive parallel jobs and as an analysis tool. This chapter describes how to use Python in each of these scenarios.
The Python environments on ARCHER contain the most commonly used modules. If you wish to install additional Python modules, we recommend that you install a local copy of Miniconda within your own directories to manage them. This is described in more detail below.
When you log onto ARCHER, no Python module is loaded by default. You will need to load either the anaconda or the python-compute module to access the functionality described below. Running python without loading a module first will result in your using the operating system default Python.
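A quick way to confirm which interpreter your session has picked up is to ask Python itself. The following is a minimal sketch using only the standard library; the function name describe_interpreter is illustrative and is not part of any ARCHER tooling.

```python
import sys


def describe_interpreter():
    """Return the path, version and prefix of the interpreter running this code."""
    return {
        "executable": sys.executable,
        "version": "{0}.{1}.{2}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in ("executable", "version", "prefix"):
        # Single-argument print() behaves the same under Python 2 and Python 3.
        print("{0}: {1}".format(key, info[key]))
```

If the reported executable is not inside an anaconda or python-compute installation, it is likely that no Python module has been loaded and you are using the operating system default.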
6.1 Deciding which modules to use
The first step in using Python on ARCHER is deciding which is the most appropriate set of modules to use. The easiest way to decide is to consider which nodes you will be running on:
- If you are going to run Python on the login nodes or the PP nodes then you should use the Anaconda distribution.
- If you are running Python on the compute nodes (via aprun) then you should use the native distribution.
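The rule above can be written down as a small lookup. This is only a sketch: the node-type labels "login", "pp" and "compute" are names chosen here for illustration, and the function returns the module family rather than a full module name.

```python
def recommended_module(node_type):
    """Map a node type to the Python module family suggested in this section."""
    node_type = node_type.strip().lower()
    if node_type in ("login", "pp"):
        # Login and PP nodes: the Anaconda distribution.
        return "anaconda"
    if node_type == "compute":
        # Compute nodes (jobs launched via aprun): the native distribution.
        return "python-compute"
    raise ValueError("unknown node type: {0!r}".format(node_type))


if __name__ == "__main__":
    for node in ("login", "pp", "compute"):
        print("{0}: module load {1}".format(node, recommended_module(node)))
```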
6.2 Python for data analysis: the Anaconda distributions
For serial data analysis, we provide modules containing the Anaconda scientific Python distribution.
Note: the anaconda modules will not work on the ARCHER compute nodes. You should use the native Python modules documented below for running on the compute nodes. If you require Anaconda on the compute nodes, we provide separate anaconda-compute modules (see Section 6.3.3).
There are two Anaconda distributions installed on ARCHER for the login and PP nodes:
- anaconda/python2 - Anaconda with Python 2.7
- anaconda/python3 - Anaconda with Python 3.6
To load the Anaconda Python environment you should use:
module load anaconda
As python2 is the default, this adds the Anaconda Python 2 environment to your session. To load the Python 3 version you need to fully specify the module name:
module load anaconda/python3
Full details on the Anaconda distributions can be found on the Continuum website.
6.2.1 Packages included in Anaconda distributions
You can list the packages currently available in the distribution you have loaded with the command conda list:
user@eslogin001:~> module load anaconda
user@eslogin001:~> conda list
# packages in environment at /home/y07/y07/cse/anaconda/python2:
#
_license 1.1 py27_1
alabaster 0.7.10 py27_0
anaconda custom py27_0
anaconda-client 1.6.3 py27_0
anaconda-navigator 1.2.3 py27_0
anaconda-project 0.6.0 py27_0
argcomplete 1.0.0 py27_1
asn1crypto 0.22.0 py27_0
astroid 1.4.9 py27_0
...
6.2.2 Adding packages to the Anaconda distribution
Users cannot add packages to the central Anaconda distribution. If you wish to have additional packages, we recommend installing your own local version of Miniconda and adding the packages you need. This approach is described in Custom environments (Section 6.4) below.
6.3 Python for HPC: the native distributions
When you are using Python on the compute nodes you should use the native Python modules.
You do this by loading the python-compute module in your job submission script with:
Python 2
module load python-compute
aprun -b -n 1 python test.py
Python 3
module load python-compute/3.6.0_gcc6.1.0
aprun -b -n 1 python3 test.py
The Python 3 module switches to the GNU programming environment with gcc 6.1.0. With this module the Python and pip commands are python3 and pip3, not python and pip.
Note: there is a previous version of python-compute for Python 3 (python-compute/3.4.3) which is still available.
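The job script lines above run a file called test.py, whose contents are not given here. Purely as an illustration, a minimal test.py might report which node and which Python version it ran under; node_report is a hypothetical helper name, and the script is written to run under either the python or the python3 command.

```python
import platform
import sys


def node_report():
    """Build a one-line summary of the Python version and the host it runs on."""
    version = "{0}.{1}.{2}".format(*sys.version_info[:3])
    return "Python {0} on {1}".format(version, platform.node())


if __name__ == "__main__":
    # Single-argument print() behaves the same under Python 2 and Python 3.
    print(node_report())
```

Running this through aprun is a simple way to check that the interpreter you expect is the one used on the compute node.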
6.3.1 Python packages for native distributions
Unlike the Anaconda distributions, the native Python distribution does not have performance packages such as numpy built in. These packages are provided separately because they are compiled from source against the Cray or Intel libraries.
If you wish to use these packages, you need to add them to your environment separately after you have loaded the python-compute module.
Python 2
All of the module names for these packages are prefixed with "pc-" (Python compute) to make them easier to identify. For example, the numpy modules are:
user@eslogin001:~> module avail pc-numpy

------------------------- /opt/modules/packages-archer -------------------------
pc-numpy/1.9.2-libsci(default) pc-numpy/1.9.2-mkl pc-numpy/1.9.2-mkl-python3
To see a full list of modules available for native Python 2 use:
module avail pc-
You can view packages installed within the Python 2 distribution itself (i.e. available without loading further system modules) with the command pip list:
user@eslogin002:~> module load python-compute
user@eslogin002:~> pip list
apache-libcloud (0.15.1)
Biggus (0.8.0)
biopython (1.64)
cf-python (1.0.3)
colorama (0.3.1)
Cython (0.21.1)
...
Python 3
The Python 3 module is like a programming environment: after it is loaded, three further modules are available and can be loaded: numpy, scipy, and mpi4py.
You can view packages installed within the Python 3 distribution itself (i.e. available without loading further system modules) with the command pip3 list.
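Once the additional modules are loaded, you may want to confirm from within Python that the packages can actually be imported. The sketch below tries to import each named package and reports its version; package_versions is a hypothetical helper, and it assumes each package exposes a __version__ attribute (it reports "unknown" otherwise).

```python
import importlib


def package_versions(names):
    """Return {name: version string, or None if the package cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = package_versions(["numpy", "scipy", "mpi4py"])
    for name in sorted(report):
        print("{0}: {1}".format(name, report[name] or "not available"))
```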
6.3.2 Adding packages to the native distribution
If you require additional supporting packages that are not performance-critical you can add these to a custom Python virtual environment, which you should create using the virtualenv command that becomes available after loading one of the python-compute modules:
user@eslogin002:~> module load python-compute
user@eslogin002:~> virtualenv --system-site-packages myEnv
New python executable in /home/z01/z01/user/myEnv/bin/python
Installing setuptools, pip, wheel...done.
Note the --system-site-packages flag, which is required on ARCHER.
This environment can then be activated as follows:
user@eslogin002:~> source ~/myEnv/bin/activate
(myEnv) user@eslogin002:~>
Additional packages can then be installed into this Python virtual environment using pip:
(myEnv) user@eslogin002:~> pip install --user newPackage
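If you are unsure whether a script is running inside an activated virtual environment, Python can tell you. This is a minimal sketch using standard attributes of the sys module; the function name in_virtual_environment is illustrative only.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older releases of the virtualenv tool set sys.real_prefix; venv and
    # newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtual environment active: {0}".format(in_virtual_environment()))
    print("sys.prefix: {0}".format(sys.prefix))
```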
If you wish to compile performance-critical packages for the compute nodes then please contact the ARCHER Helpdesk in the first instance for advice.
6.3.3 Anaconda on the compute nodes
Although the Anaconda distribution is not optimised for the ARCHER compute nodes and will provide inferior performance compared to the native distribution, its flexibility means that it may be of some use to ARCHER users on compute nodes.
The standard anaconda modules will not work on the ARCHER compute nodes as they are installed on the /home file system, which is not accessible from the compute nodes.
If you wish to use the Anaconda environment on the ARCHER compute nodes, you must load the anaconda-compute module (usually in your job submission script). For example:
module load anaconda-compute
loads the Python 2 version of the anaconda module for use on the ARCHER compute nodes.
As with the standard anaconda modules, there is a Python 3 version available too. The anaconda-compute modules have exactly the same packages installed as the standard anaconda modules.
For Python 3, you need to unload the xalt module, and use aprun -b. For example:
module unload xalt
module load anaconda-compute/python3
aprun -b -n 1 python -c 'print("Hello")'
6.4 Custom environments
To set up a custom Python environment including packages that are not in the central installation, the simplest approach is to install Miniconda locally in your own directories.
6.4.1 Installing Miniconda
Note: If you wish to use Python on the compute nodes then you must install Miniconda in your /work directories as these are the only ones visible on the compute nodes.
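Before installing, it can be worth checking that your intended install location really is on the file system the compute nodes can see. The following sketch does a purely textual check; it assumes that file system is mounted at /work, and the example path in it is hypothetical. It does not resolve symbolic links.

```python
from pathlib import PurePosixPath

# Assumption: the file system visible from the compute nodes is mounted at /work.
WORK_ROOT = PurePosixPath("/work")


def visible_from_compute_nodes(path):
    """Return True if an absolute path lies somewhere below WORK_ROOT."""
    candidate = PurePosixPath(path)
    return candidate.is_absolute() and WORK_ROOT in candidate.parents


if __name__ == "__main__":
    # Hypothetical install prefix, used only to demonstrate the check.
    prefix = "/work/t01/t01/user/miniconda3"
    print("{0}: {1}".format(prefix, visible_from_compute_nodes(prefix)))
```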
First, you should download Miniconda. You can use wget on ARCHER to do this, for example:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
Note: at present there seems to be a bug in the installer for the current version of Miniconda.
The workaround that we have found to be successful is to install an older version and, once that has installed, to update it.
For example:
wget https://repo.continuum.io/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh
Run the installer (see below), then update:
conda update --all
You can find links to the various Miniconda versions on the Miniconda website.
For ARCHER, you should use the Linux 64-bit (bash installer).
Once you have downloaded the installer, you can run it. For example:
user@eslogin008:~> bash Miniconda3-latest-Linux-x86_64.sh

Welcome to Miniconda3 4.3.31

In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue
>>>
====================================
Miniconda End User License Agreement
====================================
---snip---
Do you accept the license terms? [yes|no]
[no] >>> yes

Miniconda3 will now be installed into this location:
/home/t01/t01/user/miniconda3

- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below

[/home/t01/t01/user/miniconda3] >>>
PREFIX=/home/t01/t01/user/miniconda3
installing: python-3.6.3-h6c0c0dc_5 ...
installing: ca-certificates-2017.08.26-h1d4fec5_0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-h7cc24e2_2 ...
installing: libstdcxx-ng-7.2.0-h7a57d05_2 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.0-h9df7e31_2 ...
installing: openssl-1.0.2n-hb7f436b_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.3-h55aa19d_2 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-ha838bed_2 ...
installing: libedit-3.1-heed3624_0 ...
installing: readline-7.0-ha6073c6_4 ...
installing: sqlite-3.20.1-hb898158_2 ...
installing: asn1crypto-0.23.0-py36h4639342_0 ...
installing: certifi-2017.11.5-py36hf29ccca_0 ...
installing: chardet-3.0.4-py36h0f667ec_1 ...
installing: idna-2.6-py36h82fb2a8_1 ...
installing: pycosat-0.6.3-py36h0a5515d_0 ...
installing: pycparser-2.18-py36hf9f622e_1 ...
installing: pysocks-1.6.7-py36hd97a5b1_1 ...
installing: ruamel_yaml-0.11.14-py36ha2fb22d_2 ...
installing: six-1.11.0-py36h372c433_1 ...
installing: cffi-1.11.2-py36h2825082_0 ...
installing: setuptools-36.5.0-py36he42e2e1_0 ...
installing: cryptography-2.1.4-py36hd09be54_0 ...
installing: wheel-0.30.0-py36hfd4bba0_1 ...
installing: pip-9.0.1-py36h6c6f9ce_4 ...
installing: pyopenssl-17.5.0-py36h20ba746_0 ...
installing: urllib3-1.22-py36hbe7ace6_0 ...
installing: requests-2.18.4-py36he2e5f8d_1 ...
installing: conda-4.3.31-py36_0 ...
installation finished.
WARNING:
    You currently have a PYTHONPATH environment variable set.
    This may cause unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to directories of packages that are compatible with the Python interpreter in Miniconda3: /home/t01/t01/user/miniconda3
Do you wish the installer to prepend the Miniconda3 install location to PATH in your /home/t01/t01/user/.bashrc ? [yes|no]
[no] >>>

You may wish to edit your .bashrc to prepend the Miniconda3 install location to PATH:

export PATH=/home/t01/t01/user/miniconda3/bin:$PATH

Thank you for installing Miniconda3!
Miniconda is now installed in your local directories, but we still need to set up a way to access it correctly. There are a number of ways to do this.
- If you are always going to be using this Python environment on ARCHER and do not wish to use any other Python environment, you can follow the advice of the Miniconda installer and add a line to your .bashrc file.
- You can export PATH every time you wish to use your local install by using the bash command export PATH=/home/t01/t01/user/miniconda3/bin:$PATH (using the correct PATH as specified by the installer). This will become tedious if you use the environment often!
- You can create an alias in your .bashrc file to set the path. For example, adding the line alias condasetup='export PATH=/home/t01/t01/user/miniconda3/bin:$PATH' would allow you to use the command condasetup to initialise the Miniconda environment. The single quotes ensure that $PATH is expanded when the alias is run rather than when .bashrc is read.
- You could also create a modulefile to provide a way to initialise the environment using module load ... as we do for our Anaconda environments. Please contact the helpdesk if you want help to do this.
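Whichever option you choose, the effect is that the Miniconda bin directory comes first on PATH. The sketch below illustrates that idea and shows one way to check which executable a command name would resolve to; prepend_to_path and resolve_command are hypothetical helper names, and the install location is the one shown in the installer transcript above.

```python
import os
import shutil


def prepend_to_path(directory, path=None):
    """Return a PATH string with directory placed first, without duplicating it."""
    current = os.environ.get("PATH", "") if path is None else path
    entries = [e for e in current.split(os.pathsep) if e and e != directory]
    return os.pathsep.join([directory] + entries)


def resolve_command(command, path):
    """Return the full path that command would resolve to under the given PATH."""
    return shutil.which(command, path=path)


if __name__ == "__main__":
    # Install location taken from the installer transcript; adjust to your own.
    new_path = prepend_to_path("/home/t01/t01/user/miniconda3/bin")
    print(resolve_command("python", new_path))
```

If the printed path is not inside your Miniconda directory, another Python earlier on PATH (or no Miniconda install at that location) is the likely cause.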
6.4.2 Installing packages into Miniconda
Once you have installed Miniconda and set up your environment to access it, you can then add whatever packages you wish to the installation using the conda install ... command. For example:
user@eslogin001:~> conda install numpy
Fetching package metadata ...............
Solving package specifications: .

Package plan for installation in environment /home/t01/t01/user/miniconda3:

The following NEW packages will be INSTALLED:

    blas: 1.1-openblas conda-forge
    libgfortran: 3.0.0-1
    numpy: 1.14.0-py36_blas_openblas_200 conda-forge [blas_openblas]
    openblas: 0.2.20-7 conda-forge

The following packages will be UPDATED:

    conda: 4.3.31-py36_0 --> 4.3.33-py36_0 conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    conda-env: 2.6.0-h36134e3_1 --> 2.6.0-0 conda-forge

Proceed ([y]/n)? y

conda-env-2.6. 100% |########################################################################| Time: 0:00:00 33.71 kB/s
libgfortran-3. 100% |########################################################################| Time: 0:00:00 7.85 MB/s
openblas-0.2.2 100% |########################################################################| Time: 0:00:03 4.84 MB/s
blas-1.1-openb 100% |########################################################################| Time: 0:00:00 1.33 MB/s
numpy-1.14.0-p 100% |########################################################################| Time: 0:00:01 5.00 MB/s
conda-4.3.33-p 100% |########################################################################| Time: 0:00:00 5.71 MB/s
Here we see that the numpy package has been installed in the local environment:
user@eslogin001:~> conda list
# packages in environment at /home/t01/t01/user/miniconda3:
#
asn1crypto 0.23.0 py36h4639342_0
blas 1.1 openblas conda-forge
ca-certificates 2017.08.26 h1d4fec5_0
certifi 2017.11.5 py36hf29ccca_0
cffi 1.11.2 py36h2825082_0
chardet 3.0.4 py36h0f667ec_1
conda 4.3.33 py36_0 conda-forge
conda-env 2.6.0 0 conda-forge
cryptography 2.1.4 py36hd09be54_0
idna 2.6 py36h82fb2a8_1
libedit 3.1 heed3624_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 7.2.0 h7cc24e2_2
libgfortran 3.0.0 1
libstdcxx-ng 7.2.0 h7a57d05_2
ncurses 6.0 h9df7e31_2
numpy 1.14.0 py36_blas_openblas_200 [blas_openblas] conda-forge
openblas 0.2.20 7 conda-forge
openssl 1.0.2n hb7f436b_0
pip 9.0.1 py36h6c6f9ce_4
pycosat 0.6.3 py36h0a5515d_0
pycparser 2.18 py36hf9f622e_1
pyopenssl 17.5.0 py36h20ba746_0
pysocks 1.6.7 py36hd97a5b1_1
python 3.6.3 h6c0c0dc_5
readline 7.0 ha6073c6_4
requests 2.18.4 py36he2e5f8d_1
ruamel_yaml 0.11.14 py36ha2fb22d_2
setuptools 36.5.0 py36he42e2e1_0
six 1.11.0 py36h372c433_1
sqlite 3.20.1 hb898158_2
tk 8.6.7 hc745277_3
urllib3 1.22 py36hbe7ace6_0
wheel 0.30.0 py36hfd4bba0_1
xz 5.2.3 h55aa19d_2
yaml 0.1.7 had09818_2
zlib 1.2.11 ha838bed_2
Please note that for some package installations it may also be necessary to specify a channel such as conda-forge. For example, the following command installs the pygobject package:
user@eslogin001:~> conda install -c conda-forge pygobject
6.4.3 Custom Python environments with MPI
Miniconda does not provide an mpi4py installation that is compatible with the ARCHER interconnect, so you cannot use Miniconda to provide an environment for this. In this case you will need to build on the centrally-installed python-compute environment by compiling the packages required yourself and modifying PYTHONPATH.
If you need help to do this, then please contact the helpdesk.
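When PYTHONPATH has been modified, it can be useful to confirm which copy of a package Python would actually import, for example to check that mpi4py comes from the location you intended. The following is a minimal sketch using the standard importlib machinery; module_origin is a hypothetical helper name and it only handles top-level module names.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("mpi4py", "numpy"):
        print("{0}: {1}".format(name, module_origin(name) or "not found"))
```

This locates the package without importing it, so it does not initialise MPI.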