Course Descriptions
This section contains details on all the training courses offered by the ARCHER service.
To find out the dates and locations of upcoming courses, please see the Training pages.
If you would like us to run one of the courses described below at a particular location, or would like training not specifically covered by one of these courses, please contact us via the ARCHER Helpdesk.
Course levels
We have classified the ARCHER courses into 3 levels:
- Introductory: Requiring no substantial programming skills or knowledge of HPC; these courses only assume basic computer literacy.
- Intermediate: These require some existing knowledge, for example the ability to program in C or Fortran, or experience of running parallel applications on HPC systems.
- Advanced: These require an existing knowledge of parallel programming.
Possible routes through the available training courses
These 'paths' are only a few possible suggestions. Only the specific 'Advanced' courses have recommended prerequisite courses, and many of the courses, such as Data Carpentry, Software Carpentry and Practical Software Development, are useful to and accessible by anyone, whether users or developers.
Please do not hesitate to contact the ARCHER Helpdesk if you would like any advice on the suitability and availability of courses.
Outline Course Descriptions
Introductory (level 1) courses
Data Carpentry
In many domains of research, the rapid generation of large amounts of data is fundamentally changing how research is done. The deluge of data presents great opportunities, but also many challenges in managing, analysing and sharing data. Data Carpentry aims to teach the skills that will enable researchers to be more effective and productive. The course is designed for learners with little to no prior knowledge of programming, shell scripting, or command line tools.
Hands-on Introduction to High Performance Computing
High-performance computing (HPC) is a fundamental technology used in
solving scientific problems. Many of the grand challenges of science
depend on simulations and models run on HPC facilities to make
progress, for example: protein folding, the search for the Higgs boson
and developing nuclear fusion.
The course runs for 2 days. The first day covers the basic
concepts underlying the drivers for HPC development, HPC hardware,
software, programming models and applications. The second day will
provide an opportunity for more practical experience, information on
performance and the future of HPC. This foundation will give you the
ability to appreciate the relevance of HPC in your field and also
equip you with the tools to start making effective use of HPC
facilities yourself.
The course is delivered using a mixture of lectures and hands-on
sessions and has a very practical focus. During the hands-on sessions
you will get the chance to use ARCHER with HPC experts available to
answer your questions and provide insight.
HPC Carpentry
This course provides an introduction to High Performance Computing (HPC). After completing this course, participants will:
- Understand motivations for using HPC in research
- Understand how HPC systems are put together to achieve performance and how they differ from desktops/laptops
- Know how to connect to remote HPC systems and transfer data
- Know how to use a scheduler to work on a shared system
- Be able to use the Bash command line on remote systems
- Be able to use software modules to access different HPC software
- Be able to work effectively on a remote shared resource
Introduction to Modern Fortran
This course provides an introduction to Modern Fortran, which contains
many powerful features that make it a suitable language for
programming scientific, engineering and numerical applications.
Fortran 90/95 is a modern and efficient general purpose programming
language, particularly suited to numeric and scientific
computation. The language offers advanced array support, and is
complemented by a wealth of numerical libraries. Many large scale
computing facilities offer heavily optimised Fortran compilers, making
Fortran suitable for the most demanding computational tasks.
Topics include: fundamentals, program control, input and output,
variables, procedures, modules, arrays.
Scientific Computing
This course covers the fundamental concepts of numerical simulation, and how
modern parallel supercomputers are used in computational science.
At the end of the course, attendees should be able to:
- explain the motivation for the use of parallel supercomputers in computational science
- describe the main models of parallel programming and propose parallelisation methods for standard problems
- understand the way real numbers are stored on a computer and the way that this affects the accuracy of results (see the sketch after this list)
- explain why random numbers are used in many simulations
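As a taste of the floating-point issues covered, here is a minimal C sketch (illustrative, not taken from the course materials): 0.1 has no exact binary floating-point representation, so repeatedly adding it in single precision accumulates a visible rounding error.

    #include <stdio.h>

    int main(void)
    {
        /* 0.1 cannot be represented exactly in binary floating point,
           so each addition introduces a small rounding error. */
        float sum = 0.0f;
        for (int i = 0; i < 1000; i++) {
            sum += 0.1f;
        }
        printf("1000 additions of 0.1f give %.6f (exact answer: 100)\n", sum);

        /* Double precision stores more significand bits, so the same
           value is stored more accurately. */
        printf("0.1 stored as float : %.17g\n", (double)0.1f);
        printf("0.1 stored as double: %.17g\n", 0.1);
        return 0;
    }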
Software Carpentry
Software Carpentry's goal is to help scientists and engineers become more productive by teaching them basic computing skills like program design, version control, testing, and task automation. In this two-day workshop, short tutorials will alternate with hands-on practical exercises. Participants will be encouraged both to help one another, and to apply what they have learned to their own research problems during and between sessions.
Intermediate (level 2) courses
Data Analytics with High Performance Computing
Data Analytics, Data Science and Big Data are just a few of the many terms used in business and academic research, all referring to the manipulation, processing and analysis of data. Fundamentally, these are all concerned with the extraction of knowledge from data that can be used for competitive advantage or to provide scientific insight. In recent years, this area has undergone a revolution in which HPC has been a key driver. This course provides an overview of data science and the analytical techniques that form its basis, as well as exploring how HPC provides the power that has driven their adoption. The course will cover: key data analytical techniques, such as classification, optimisation, and unsupervised learning; key parallel patterns, such as MapReduce, for implementing analytical techniques; relevant HPC and data infrastructures; and case studies from academia and business.
GPU Programming with CUDA
Graphics Processing Units (GPUs) were originally developed for computer gaming and other graphical tasks, but for many years have been exploited for general purpose computing across a number of areas. They offer advantages over traditional CPUs because they have greater computational capability, and use high-bandwidth memory systems (where memory bandwidth is the main bottleneck for many scientific applications). This introductory course will describe GPUs, and the advantages they offer. It will teach participants how to get started with programming GPUs, which cannot be used in isolation but as "accelerators" in conjunction with CPUs, and how to get good performance. The course focuses on NVIDIA GPUs, and the CUDA programming language (an extension to C/C++ or Fortran). Hands-on practical sessions are included.
Introduction to Spark for Data Scientists
Apache Spark is an open-source framework for cluster computing, ideal for large-scale parallel data processing, that is designed for performance and ease-of-use. It is faster and simpler to use than Hadoop MapReduce, providing a rich set of APIs in Python, Java and Scala.
This hands-on course will cover: an introduction to Spark; map, filter and reduce; running on a Spark cluster; key-value pairs; correlations; logistic regression; decision trees; and k-means.
LAMMPS Workshop
LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a widely-used classical molecular dynamics (MD) code. This C++ code is easy to use, incredibly versatile, and parallelised to run efficiently on systems ranging from small personal computers to CPU, GPU, and hybrid CPU-GPU HPC clusters. As of 2018, LAMMPS has been used, to some degree, in over 14,000 publications in fields as varied as chemistry, physics, materials science, and granular and lubricated-granular flow.
The first session will be an introduction to setting up and running an MD simulation using LAMMPS. We will begin by running a simulation of a Lennard-Jones fluid before delving deeper into how simulations can be set up and run in LAMMPS. In the second session, we will discuss how to download and install LAMMPS, with a more in-depth discussion of the various packages LAMMPS offers and how to use them efficiently.
Modern C++ for Computational Scientists
With the recent revisions to the C++ language and standard library, the ways it is now being used are quite different. Used well, these features enable the programmer to write elegant, reusable and portable code that runs efficiently on a variety of architectures.
However, it is still a very large and complex tool. This course will cover a minimal set of features to allow an experienced non-C++ programmer to get to grips with the language. These include: overloading, templates, containers, iterators, lambdas and standard algorithms. We will also briefly cover several important libraries for numerical computing.
Object-Oriented Programming with Fortran
This course provides an introduction to Object-Oriented Programming
(OOP) with Fortran. Fortran is often used for scientific applications,
but applications are mainly developed using the standard procedural
programming techniques that Fortran was initially designed for.
OOP is a programming methodology designed to enable safe and reusable
programming, coupling procedures with the data they operate on in
classes and using them as objects. Although OOP is more commonly
associated with large programs developed in industry, many scientific
applications also become very large and long-lived, and could
therefore benefit from such programming techniques to make
development, maintenance, and extension of the code simpler and safer.
Whilst Fortran is generally viewed as a procedural programming
language, there are features in the most recent versions of the Fortran
standards (90, 95, and 2003) that enable development in OOP or
OOP-like ways. We will introduce these language features and explore
how they can be used in scientific applications.
Message-Passing Programming with MPI
The world's largest supercomputers are used almost exclusively to run
applications which are parallelised using Message Passing. This course
covers all the basic knowledge required to write parallel programs
using this programming model, and is directly applicable to almost
every parallel computer architecture.
Parallel programming by definition involves co-operation between
processors to solve a common problem. The programmer has to define the
tasks that will be executed by the processors, and also how these
tasks are to synchronise and exchange data with one another. In the
message-passing model the tasks are separate processes that
communicate and synchronise by explicitly sending each other
messages. All these parallel operations are performed via calls to
some message-passing interface that is entirely responsible for
interfacing with the physical communication network linking the actual
processors together. This course uses the de facto standard for
message passing, the Message Passing Interface (MPI). It covers
point-to-point communication, non-blocking operations, derived
datatypes, virtual topologies, collective communication and general
design issues.
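As an illustration of this model (a minimal sketch, not taken from the course materials), the following C program passes messages around a ring: each process explicitly sends its rank to the next process and receives from the previous one.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, next, prev, recvbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        next = (rank + 1) % size;
        prev = (rank + size - 1) % size;

        /* Combined send and receive avoids the deadlock that two
           blocking calls issued in the wrong order could cause. */
        MPI_Sendrecv(&rank, 1, MPI_INT, next, 0,
                     &recvbuf, 1, MPI_INT, prev, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("Rank %d received %d from rank %d\n", rank, recvbuf, prev);

        MPI_Finalize();
        return 0;
    }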
The course is taught using a variety of methods including formal
lectures, practical exercises, programming examples and informal
tutorial discussions. This enables lecture material to be supported by
the tutored practical sessions in order to reinforce the key concepts.
Parallel Design Patterns
If you were given a serial problem, conceptually, how would you go about splitting it up into many different parts that could run concurrently on the latest parallel computers?
The good news is that you don't need to reinvent the wheel. Instead, there are many different approaches (called parallel patterns) that have been developed by the community and can be used in a variety of situations. These patterns apply equally well regardless of whether your problem is computational or data-driven.
Understanding and being able to apply these patterns also helps in getting to grips with existing parallel codes and optimising poorly performing computation and data codes. Whilst the lectures take a top down approach, focusing on the patterns themselves, the practical exercises give the opportunity to explore the concepts by implementing pattern-based solutions to problems using common HPC technologies.
The parallel patterns (known as a pattern language) that we cover are split into two categories.
The closest to the problem area (and most abstract) are parallel algorithm strategy patterns and include:
- Task Parallelism
- Recursive Splitting
- Geometric Decomposition
- Pipeline
- Discrete Event
- Actors
The other category of patterns is closer to the implementation and drives how the programmer should structure their code and data. These are implementation strategy patterns, and include:
- Master/Worker
- Loop Parallelism
- Fork/Join
- Shared Data and Queues
- Active Messaging
Patterns are described at an abstract level, and we will also discuss enhancements that can improve performance and scalability, at the cost of code complexity. Practical implementations of these patterns are explored in depth in the hands-on exercises; a small sketch of the Master/Worker pattern is given at the end of this section.
Programming exercises use C and Fortran, with MPI and OpenMP.
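As an illustration (a sketch under assumptions, not taken from the course materials), the Master/Worker pattern might be implemented in C with MPI as below; the unit of work (squaring an integer), the task count and the tag values are all hypothetical.

    #include <mpi.h>
    #include <stdio.h>

    #define NTASKS  100   /* hypothetical number of independent tasks */
    #define WORKTAG 1
    #define STOPTAG 2

    /* Hypothetical unit of work: square the task index. */
    static int do_task(int task) { return task * task; }

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Master: seed every worker with one task, then hand out
               further tasks as results come back. Assumes NTASKS is at
               least the number of workers. */
            int task = 0, result;
            MPI_Status status;

            for (int w = 1; w < size; w++) {
                MPI_Send(&task, 1, MPI_INT, w, WORKTAG, MPI_COMM_WORLD);
                task++;
            }
            for (int received = 0; received < NTASKS; received++) {
                MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                if (task < NTASKS) {
                    MPI_Send(&task, 1, MPI_INT, status.MPI_SOURCE, WORKTAG,
                             MPI_COMM_WORLD);
                    task++;
                } else {
                    MPI_Send(&task, 1, MPI_INT, status.MPI_SOURCE, STOPTAG,
                             MPI_COMM_WORLD);
                }
            }
        } else {
            /* Worker: process tasks until the stop tag arrives. */
            int task, result;
            MPI_Status status;

            for (;;) {
                MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD,
                         &status);
                if (status.MPI_TAG == STOPTAG) break;
                result = do_task(task);
                MPI_Send(&result, 1, MPI_INT, 0, WORKTAG, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }

Note how load balancing emerges naturally from this structure: faster workers simply return for more tasks.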
Practical Software Development
Writing code is just part of developing effective software - how do
you get the best from working with others as part of a software team,
incorporating existing work into your own, contributing back and
producing code and software suites for others?
Software development comprises a range of activities including writing
code, requirements analysis, testing and product evaluation. This
course introduces how software development projects can be approached
to achieve high-quality software products. We introduce important
ideas for both academic and industrial development such as software
sustainability, testing, adapting processes and communications. The
course will introduce practical skills important for use in developing
software for research and industrial purposes.
Scientific Programming with Python
This course is aimed at programmers with basic Python knowledge seeking to learn how to use Python for scientific computing. We will introduce Python's fundamental scientific libraries such as NumPy, SciPy and Matplotlib. We will also introduce how to interface Python with Fortran and C codes, and outline how to implement message-passing in Python with mpi4py.
Shared Memory Programming with OpenMP
Almost all modern computers now have a shared-memory architecture with
multiple CPUs connected to the same physical memory, for example
multicore laptops or large multi-processor compute servers. This
course covers OpenMP, the industry standard for shared-memory
programming, which enables serial programs to be parallelised easily
using compiler directives. Users of desktop machines can use OpenMP on
its own to improve program performance by running on multiple cores;
users of parallel supercomputers can use OpenMP in conjunction with
MPI to better exploit the shared-memory capabilities of the compute
nodes.
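For example (a minimal sketch, not taken from the course materials), a single directive is enough to parallelise a serial loop; the reduction clause tells the compiler to combine each thread's partial sum safely.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        /* The directive shares the loop iterations among threads; the
           serial code is otherwise unchanged. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++) {
            sum += 1.0 / ((double)(i + 1) * (double)(i + 1));
        }

        printf("Sum of 1/i^2 for i = 1..%d: %f (pi^2/6 is about 1.644934)\n",
               n, sum);
        printf("Ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }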
This course will cover an introduction to the fundamental concepts of
the shared variables model, followed by the syntax and semantics of
OpenMP and how it can be used to parallelise real programs. Hands-on
practical programming exercises make up a significant, and integral,
part of this course.
Threaded Programming
This is an extended three-day version of the standard OpenMP course, covering additional material related to performance optimisation on modern multicore systems. It is delivered in conjunction with the Centre for Doctoral Training in Next-Generation Computational Modelling at the University of Southampton.
Advanced (level 3) courses
Performance Analysis Workshop
Current and future supercomputing architectures face a dramatic growth
of parallelism and heterogeneity on multiple levels. As a result, it
is almost impossible for code developers to predict which parts of
their code will perform well, which development decisions impact
scalability, which choice of data structures is reasonable for a
specific architecture, and so on. Most decisions are based upon experience,
intuition and a limited understanding of the code's performance.
To get a better understanding of code performance and to guide
performance engineering, it is essential for computational scientists
and engineers to conduct measurements in order to study code
performance in detail. Performance analysis tools, a generalisation of
the classic profiler, are the best tools to obtain this
insight. However, they themselves require a certain level of
understanding, experience and expertise to be used productively which
adds to the complexity of the underlying problem. This workshop
introduces several performance analysis tools and provides hands-on
training on how to use them in practice on large-scale HPC
applications.
Advanced MPI
This course is aimed at programmers seeking to deepen their understanding of MPI and explore some of its more recent and advanced features. We cover topics including hybrid MPI/OpenMP, communicator management, neighbourhood collectives, single-sided MPI and the new MPI memory model. We also look at performance aspects such as which MPI routines to use for scalability, overlapping communication and calculation, and MPI internal implementation issues.
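As an illustration of the hybrid MPI/OpenMP topic (a minimal sketch, not taken from the course materials), MPI_Init_thread requests a level of thread support and reports what the library actually provides:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int provided, rank;

        /* MPI_THREAD_FUNNELED: only the main thread will make MPI
           calls. The other levels are SINGLE, SERIALIZED and MULTIPLE. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (provided < MPI_THREAD_FUNNELED && rank == 0) {
            printf("Warning: requested thread support is not available\n");
        }

        #pragma omp parallel
        {
            /* Threads compute here; under FUNNELED, only the master
               thread may communicate. */
            #pragma omp master
            printf("Rank %d is running %d OpenMP threads\n",
                   rank, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }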
Advanced OpenMP
This course is aimed at programmers seeking to deepen their understanding of OpenMP and explore some of its more recent and advanced features. We cover topics including nested parallelism, OpenMP tasks, the OpenMP memory model, performance tuning, hybrid OpenMP + MPI, OpenMP implementations, and upcoming features in OpenMP 4.0.
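As a flavour of the tasking topic (a minimal sketch, not taken from the course materials), each recursive call below becomes an OpenMP task, with taskwait joining the child tasks; the cut-off value is an arbitrary choice to avoid creating very small tasks.

    #include <stdio.h>

    /* Naive recursive Fibonacci, parallelised with OpenMP tasks. */
    static long fib(int n)
    {
        long x, y;
        if (n < 2) return n;
        if (n < 20) return fib(n - 1) + fib(n - 2);  /* serial cut-off */

        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }

    int main(void)
    {
        long result = 0;

        #pragma omp parallel
        {
            /* One thread creates the root task; the whole team
               executes the tasks it spawns. */
            #pragma omp single
            result = fib(30);
        }
        printf("fib(30) = %ld\n", result);
        return 0;
    }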
Efficient Parallel IO
One of the greatest challenges to running parallel applications on
large numbers of processors is how to handle file IO: standard IO
routines are not designed with parallelism in mind. Parallel file
systems such as Lustre are optimised for large data transfers, and
performance can be far from optimal if many files are opened at once.
The IO part of the MPI standard gives programmers access to efficient
parallel IO in a portable fashion. However, there are a large number
of different routines available and some can be difficult to use in
practice. Despite its apparent complexity, MPI-IO adopts a very
straightforward high-level model. If used correctly, almost all the
complexities of aggregating data from multiple processes can be dealt
with automatically by the library.
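For example (a minimal sketch, not taken from the course materials; the filename and data are hypothetical), every process can write its own block of a single shared file with one collective call:

    #include <mpi.h>

    #define N 1000  /* elements written by each process */

    int main(int argc, char *argv[])
    {
        int rank, buf[N];
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < N; i++) buf[i] = rank;  /* dummy data */

        /* All processes open the same file together. */
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank writes at its own offset; the collective form
           (_all) lets the library aggregate the transfers. */
        offset = (MPI_Offset)rank * N * sizeof(int);
        MPI_File_write_at_all(fh, offset, buf, N, MPI_INT,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }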
The first day of the course will cover the MPI-IO standard, developing
IO routines for a regular domain decomposition example. It will also
briefly cover higher-level standards such as HDF5 and NetCDF. The
second day will concentrate on how to use the
Lustre file system for best performance. Case studies from real codes will also be presented.
Although the course mainly uses the MPI-IO library and the Lustre parallel filesystem for
specific examples, most of the IO concepts and performance
considerations are applicable to almost any parallel system.
Efficient use of ARCHER and the Knights Landing Processor
Delivered in collaboration with the Cray Centre of Excellence, this course covers the system-specific features of the ARCHER service over three days. It is focused on hardware and software issues specific to the Cray XC30 and XC40, and includes a detailed overview of the Intel Xeon processors, the Aries network interconnect and Cray-provided systems software and performance tools. It is ideal both for users familiar with existing Cray supercomputers and for those porting from alternative platforms. The course will also cover how to exploit the potential of the Knights Landing (KNL) processor, including hands-on sessions on the ARCHER KNL system.
Programming the Manycore Knights Landing Processor
The Knights Landing (KNL) processor is Intel's most recent release of the Many Integrated Core design. This course will focus on the architecture of the KNL processor, including details of its new high-bandwidth memory and a discussion of how best to use all of the 250-plus possible threads of execution in real applications. How to use the Intel compiler and associated tools to achieve good performance, for example to exploit vectorisation, will be explained. It will also cover issues relating to using the Xeon Phi as part of a larger HPC system.
Single-Node Performance Optimisation
This course covers techniques for improving the performance of
parallel applications by optimising the code that runs within each node.
Modern HPC systems such as ARCHER are being constructed using
increasingly powerful nodes, with larger and larger numbers of cores
and enhanced vector capabilities. To extract maximum performance from
applications, it is therefore necessary to understand, and be able to
overcome, on-node performance bottlenecks.
This course will cover the main features of modern HPC nodes,
including multiple cores, vector floating point units, deep cache
hierarchies, and NUMA memory systems. We will cover techniques for
efficient programming of these features, using batch processing
options and compiler options as well as hand tuning of code. The
course will also contain an introduction to the use of Cray
performance analysis tools.
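As one small example of such hand tuning (a sketch, not taken from the course materials), the restrict qualifier asserts that the arrays do not alias, which is often what permits the compiler to vectorise a loop:

    #include <stdio.h>

    #define N 10000

    /* Without restrict, the compiler must assume a, b and c might
       overlap in memory and may refuse to vectorise the loop. */
    void triad(double * restrict a, const double * restrict b,
               const double * restrict c, double scalar)
    {
        for (int i = 0; i < N; i++) {
            a[i] = b[i] + scalar * c[i];
        }
    }

    int main(void)
    {
        static double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }
        triad(a, b, c, 3.0);
        printf("a[1] = %f\n", a[1]);  /* 1 + 3 * 2 = 7 */
        return 0;
    }

Compiler vectorisation reports, available in most HPC compilers at high optimisation levels, show whether the transformation actually took place.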
Single-sided Communications
Partitioned Global Address Space (PGAS) languages such as Unified Parallel C (UPC) and Fortran Coarrays have been the subject of much attention in recent years, in particular due to the exascale challenge. There is a widespread belief that existing message-passing approaches such as MPI will not scale to this level due to issues such as memory consumption and synchronisation overheads. PGAS approaches offer a potential solution as they provide direct access to remote memory. This reduces the need for temporary memory buffers, and may allow for reduced synchronisation and hence improved message latencies. This course covers how the PGAS model is implemented in C (via UPC) and Fortran (via coarrays), and also how to use the OpenSHMEM library for single-sided communication.
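As an illustration of the single-sided style (a minimal sketch using the OpenSHMEM C interface, not taken from the course materials), each processing element below writes directly into a symmetric variable on its neighbour, with no matching receive:

    #include <stdio.h>
    #include <shmem.h>

    int main(void)
    {
        static long dest;  /* symmetric: same address on every PE */
        long src;

        shmem_init();
        int me   = shmem_my_pe();
        int npes = shmem_n_pes();

        /* Put our PE number into 'dest' on the next PE in a ring;
           the target PE takes no part in the transfer. */
        src = (long)me;
        shmem_long_put(&dest, &src, 1, (me + 1) % npes);

        /* Synchronise so that all puts have completed before reading. */
        shmem_barrier_all();

        printf("PE %d received %ld from PE %d\n",
               me, dest, (me + npes - 1) % npes);

        shmem_finalize();
        return 0;
    }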