5. UK-RDF Data Analytic Cluster (DAC)

The UK-RDF data analytic cluster (DAC) is designed to allow users to run compute, memory, or IO intensive analyses on data hosted on the service.

To support this, all nodes on the DAC have high-bandwidth, direct InfiniBand connections to the UK-RDF disks. This means that much higher IO data rates are potentially available compared to, for example, accessing the UK-RDF disks via the ARCHER login and post-processing nodes.

5.1 Accessing the Data Analytic Cluster

Users should log in to the cluster frontend at login.rdf.ac.uk using their usual username and password (ARCHER users can use their ARCHER username and password). For example:

ssh [userID]@login.rdf.ac.uk

If you wish to display graphics on your desktop from programs running on the cluster then you should add the -X option to ssh:

ssh -X [userID]@login.rdf.ac.uk

5.2 System Software

All nodes run the CentOS 6 operating system.

The Torque batch scheduling system is used.

5.3 OMP_NUM_THREADS

The OMP_NUM_THREADS environment variable should be set to 1 on the login node, and set appropriately in job scripts. You can add the line

export OMP_NUM_THREADS=1

to your ~/.bashrc file to ensure the default value is 1. (For other shells, use the corresponding command.)
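
As a minimal sketch of setting the value "appropriately" inside a job script, the fragment below sets the thread count to match the number of cores requested from the batch system. The value 8 and the executable name my_openmp_program.x are assumptions made for illustration; they do not come from this guide.

```shell
# Job-script fragment: match the thread count to the cores requested,
# e.g. with "#PBS -l ncpus=8". The value 8 is an illustrative assumption.
threads=8
export OMP_NUM_THREADS="$threads"
echo "OMP_NUM_THREADS is set to ${OMP_NUM_THREADS}"

# ./my_openmp_program.x   # hypothetical OpenMP executable, uncomment to run
```

Keeping the number in a single variable makes it easier to change it together with the ncpus request.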

5.4 Compiling Code

5.4.1 Serial Compilation

The GCC compilers are available on the cluster:

gcc - C compiler
gfortran - Fortran compiler
g++ - C++ compiler

5.4.2 MPI Compilation

The cluster also supports MPI parallel programs. To compile MPI programs you first need to load the MPI module with:

module load openmpi-x86_64

Once this has been done the compilers are available as:

mpicc - C compiler
mpif90 - Fortran compiler
mpic++ - C++ compiler
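
To check that the MPI compiler wrappers work, a short test along the following lines may help. This is only a sketch: the file names hello_mpi.c and hello_mpi.x are arbitrary choices, and the compile step is skipped if mpicc is not on your PATH (for example because the module has not been loaded).

```shell
# Write a minimal MPI test program (the file name is an arbitrary choice).
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only if the MPI wrapper is available, i.e. the module is loaded.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -o hello_mpi.x hello_mpi.c
else
    echo "mpicc not found: run 'module load openmpi-x86_64' first"
fi
```

The resulting executable would then be launched from a job script with mpiexec rather than run directly on the login node.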

5.4.3 Build Tools

Both make and cmake are available by default to all users.

5.5 Running Jobs

The UK-RDF cluster runs the Torque batch scheduling system. The basic commands are:

qsub - Submit jobs to the queue
qstat - Query the current queue status
qdel - Remove jobs from the queue
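
The three commands are typically used together: qsub prints an identifier for the new job, which can then be passed to qstat or qdel. The sketch below assumes a job script called submit.pbs in the current directory; the helper function job_number is a hypothetical name introduced here, and it relies on Torque identifiers having the form "number.server".

```shell
# Strip the server suffix from a Torque job identifier such as "12345.servername".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Only attempt a submission where the Torque commands and the script exist.
if command -v qsub >/dev/null 2>&1 && [ -f submit.pbs ]; then
    job_id=$(qsub submit.pbs)      # qsub prints the identifier of the new job
    echo "Submitted job $(job_number "$job_id")"
    qstat "$job_id"                # show the status of this job only
    # qdel "$job_id"               # uncomment to remove the job from the queue
fi
```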

5.5.1 Example job submission scripts: standard compute nodes

All of the examples below would be submitted using qsub (assuming that the script has been saved in a file called "submit.pbs"):

qsub submit.pbs

Note: the maximum number of cores that can be requested for MPI jobs on standard compute nodes is 40.
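
If you want to catch an oversized request before submitting, a check along these lines is one option. It is a sketch only: the function names are hypothetical, and it assumes the ncpus request is written on its own "#PBS -l ncpus=N" line, as in the examples in this section.

```shell
# Print the ncpus value requested in a job script (prints nothing if none is found).
requested_ncpus() {
    sed -n 's/^#PBS[[:space:]]*-l[[:space:]]*ncpus=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed if the script requests at most the number of cores given as $2.
within_limit() {
    n=$(requested_ncpus "$1")
    [ -n "$n" ] && [ "$n" -le "$2" ]
}

# Example use against a script called submit.pbs, if one exists.
if [ -f submit.pbs ]; then
    if within_limit submit.pbs 40; then
        echo "ncpus request is within the 40-core limit"
    else
        echo "ncpus request is missing or exceeds 40 cores" >&2
    fi
fi
```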

An example submission script for running a serial job on the standard compute nodes is:

#!/bin/bash --login

#PBS -N my_test_job
#PBS -l ncpus=1
#PBS -l walltime=0:10:0

# Replace "t01" with your account code
#PBS -A t01

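# Change to the directory that the job was submitted from
cd "$PBS_O_WORKDIR"

# gzip needs -r to descend into a directory; each file inside is compressed in place
gzip -r my_big_directory/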

Example for running an MPI job on standard compute nodes:

#!/bin/bash --login

#PBS -N my_mpi_test_job
#PBS -l ncpus=8
#PBS -l walltime=0:10:0

# Replace "t01" with your account code
#PBS -A t01

# Make sure any symbolic links are resolved to an absolute path
export PBS_O_WORKDIR=$(readlink -f "$PBS_O_WORKDIR")

# Change to the directory that the job was submitted from
cd "$PBS_O_WORKDIR"

module load openmpi-x86_64

mpiexec -n 8 ./my_mpi_program.x

5.5.2 Example job submission scripts: high memory compute nodes

All of the examples below would be submitted using qsub (assuming that the script has been saved in a file called "submit.pbs"). You can use one of two high memory queues:

qsub -q hm03 submit.pbs

qsub -q hm04 submit.pbs

Note: the maximum number of cores that can be requested for MPI jobs on high memory compute nodes is 64.

An example submission script for running a serial job on the high memory compute nodes is:

#!/bin/bash --login

#PBS -N my_test_job
#PBS -l ncpus=1
#PBS -l walltime=0:10:0

# Replace "t01" with your account code
#PBS -A t01

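# Change to the directory that the job was submitted from
cd "$PBS_O_WORKDIR"

# gzip needs -r to descend into a directory; each file inside is compressed in place
gzip -r my_big_directory/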

Example for running an MPI job on high memory compute nodes:

#!/bin/bash --login

#PBS -N my_mpi_test_job
#PBS -l ncpus=8
#PBS -l walltime=0:10:0

# Replace "t01" with your account code
#PBS -A t01

# Make sure any symbolic links are resolved to an absolute path
export PBS_O_WORKDIR=$(readlink -f "$PBS_O_WORKDIR")

# Change to the directory that the job was submitted from
cd "$PBS_O_WORKDIR"

module load openmpi-x86_64

mpiexec -n 8 ./my_mpi_program.x

5.5.3 Interactive Jobs

It is also possible to run interactive jobs on the RDF cluster. This is done by specifying the required resources with the qsub command and adding flags to run interactively: -I requests an interactive session, -V exports your current environment variables to the job, and -X enables X11 forwarding. Below is an example of an interactive job submission that uses 8 cores for 1 hour on the standard RDF nodes (you should replace 't01' with your budget code):

qsub -IVX -lwalltime=1:00:00,ncpus=8 -A t01

The following is an example of the same job using a high memory node (again, you should replace 't01' with your budget code):

qsub -IVX -lwalltime=1:00:00,ncpus=8 -q hm03 -A t01

5.6 Using Python

Python 2.* is available through the Anaconda scientific Python distribution. To access it you should load the anaconda module:

module load anaconda

A list of the default packages installed in the distribution can be found on the web at:

Anaconda Package Documentation

We have also installed some additional packages. The full list of packages can be displayed with the commands:

module load anaconda
conda list
# packages in environment at /general/y12/y12/cserdf/anaconda/2.2.0-python2:
#
_license                  1.1                      py27_0  
abstract-rendering        0.5.1                np19py27_0  
anaconda                  2.2.0                np19py27_0  
argcomplete               0.8.4                    py27_0 
...
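
Because the full list is long, it can be convenient to test for a single package. The sketch below does this by filtering the output of conda list; the helper name has_package is hypothetical, and numpy is used purely as an example package name.

```shell
# Succeed if the package named in $1 appears in "conda list" output read from stdin.
has_package() {
    awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}

# Run the check only where conda is available (after "module load anaconda").
if command -v conda >/dev/null 2>&1; then
    if conda list | has_package numpy; then
        echo "numpy is installed"
    else
        echo "numpy is not installed"
    fi
fi
```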

5.6.1 Python 3

There is also an Anaconda distribution based on Python 3. To load this use:

module load anaconda/2.2.0-python3

The packages installed can be queried in the same way as described above for Python 2.

5.7 Installed Software

5.7.1 Paraview

Two versions of paraview are installed on the RDF: the normal paraview, which includes the paraview GUI, and a parallel paraview with MPI enabled (but no GUI).

The normal paraview (with the paraview GUI), which will work on both the compute nodes and login nodes, is available by loading the paraview module:

module load paraview

However, this version of paraview does not have an MPI-enabled pvserver or pvbatch, so if you are using paraview for intensive rendering or processing jobs you probably want to use the parallel version of paraview.

The version of paraview that has been built with MPI can be loaded using the paraview-parallel module:

module load paraview-parallel

This version of paraview does not have the GUI installed and has been compiled to work on the compute nodes (rather than the login nodes).

If you want to use pvserver and control it from the paraview GUI you can do this by running an interactive job, loading the paraview-parallel module and setting up pvserver, as this example shows:

-bash-4.1$ hostname
rdf-comp-ns10
-bash-4.1$ qsub -IXV -lwalltime=3:00:00,ncpus=16
-bash-4.1$ module load paraview-parallel
-bash-4.1$ mpirun -np 16 pvserver --mpi --use-offscreen-rendering --reverse-connection --server-port=11112 --client-host=rdf-comp-ns10

The above assumes you have already set up a server connection in the paraview GUI to listen on port 11112. You can change the port number as required, provided it matches between pvserver and the paraview GUI.

5.7.1.1 portfwd for remote connectivity

Whilst the above instructions allow users running the paraview client on the RDF login nodes to attach to pvserver running on the RDF compute nodes, it is also possible to connect a paraview client on a remote machine to pvserver on the RDF compute nodes using the portfwd utility.

portfwd has been installed as a module on the RDF and can be loaded as follows:

module load portfwd

We are using a modified version of portfwd that has been changed to provide more information to users and to ensure that it only runs when the user is active, not when the user has logged off or their session has ended. portfwd runs as a background process, so once launched it will run without occupying your terminal. To run portfwd launch it like this:

-bash-4.1$ portfwd -c config.cfg
portfwd started

The config.cfg in the above command is the path to the configuration file that you create to set up portfwd for the particular application you are running. The configuration file shown below allows a paraview client that is trying to connect to a paraview server on localhost port 11111 to reach a pvserver instance that has been configured to connect to port 11112 on the RDF compute nodes. The config.cfg file contains the following line:

        tcp { 11112 { => 127.0.0.1:11111 } }

This will set up portfwd to connect the pvserver listening on port 11112 to the paraview client on the localhost, which connects on port 11111. For this to work you need a port-forwarding SSH connection (also known as an SSH tunnel) that forwards port 11111 on the localhost to port 11111 on the RDF login node. If you are connecting via SSH on the command line you can use a command like this to create such a tunnel (replacing username with your username on the RDF):

localhost$ ssh -R 11111:localhost:11111 username@login.rdf.ac.uk
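
The portfwd side of this setup, run on the RDF login node, can be sketched as follows. The script writes the one-line configuration shown above to config.cfg in the current directory and starts portfwd only if the command is available; the file location is an assumption, and the port numbers should be changed to match your own pvserver and client settings.

```shell
# Write the single-line portfwd configuration to config.cfg.
cat > config.cfg <<'EOF'
tcp { 11112 { => 127.0.0.1:11111 } }
EOF

# Start portfwd only where it is available (after "module load portfwd").
if command -v portfwd >/dev/null 2>&1; then
    portfwd -c config.cfg
else
    echo "portfwd not found: run 'module load portfwd' first"
fi
```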

5.7.2 VisIt

The VisIt visualisation tool is available by loading the visit module:

module load visit

5.7.3 R

The R statistical and data analysis package is installed. You can access it with the command R:

-bash-4.1$ R

R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

5.7.4 NetCDF

The serial NetCDF libraries and headers are installed on the system.

To compile a C program that uses NetCDF use, for example,

gcc netcdf_program.c -lnetcdf

The Fortran module is installed in /usr/lib64/gfortran/modules.

To compile a Fortran program that uses NetCDF use, for example,

gfortran -I/usr/lib64/gfortran/modules netcdf_program.F90 -lnetcdff -lnetcdf
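
A quick way to confirm that the C library links correctly is to build a program that only prints the library version. The following is a sketch under stated assumptions: the file names are arbitrary, and the check for /usr/include/netcdf.h assumes the header lives in the default include directory, which this guide does not state explicitly.

```shell
# Write a tiny C program that reports the version of the linked NetCDF library.
cat > netcdf_version.c <<'EOF'
#include <netcdf.h>
#include <stdio.h>

int main(void)
{
    /* nc_inq_libvers() returns the version string of the NetCDF library. */
    printf("NetCDF library version: %s\n", nc_inq_libvers());
    return 0;
}
EOF

# Compile and run only if gcc and the header (assumed location) are present.
if command -v gcc >/dev/null 2>&1 && [ -f /usr/include/netcdf.h ]; then
    gcc -o netcdf_version.x netcdf_version.c -lnetcdf && ./netcdf_version.x \
        || echo "compilation or execution failed"
else
    echo "gcc or /usr/include/netcdf.h not found: skipping compilation"
fi
```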

Note that a number of NCO tools (e.g., ncrcat, ncks) are also installed in /usr/bin. A separate version of the NCO tools is available via the module system:

module load nco

which includes more recent additions such as ncremap. This module includes relevant dependencies on ESMF executables such as ESMF_Regrid. Note that both NCO and ESMF modules have been compiled to support OpenMP, but not MPI.

5.7.6 HDF5

The serial HDF5 shared libraries and headers are installed on the system. If you are using MPI the parallel HDF5 libraries and headers will also be available. Using the HDF5 compiler wrappers is recommended.

To compile a serial C program that uses HDF5 use, for example,

h5cc -shlib hdf5_program.c

To compile an OpenMPI C program that uses parallel HDF5 use, for example,

h5pcc -shlib hdf5_program.c

To compile a serial Fortran program that uses HDF5 use, for example,

h5fc -shlib hdf5_program.F90

To compile an OpenMPI Fortran program that uses parallel HDF5 use, for example,

h5pfc -shlib hdf5_program.F90

5.7.7 FFTW 3

The FFTW 3 libraries and headers are installed on the system in locations that the compilers search by default.

Libraries are installed in /usr/lib64.

Headers are installed in /usr/include.

5.7.8 BLAS and LAPACK

The BLAS and LAPACK linear algebra libraries and headers are installed on the system and should be included by default by the compilers.