7. Debugging

Note that the usefulness and accuracy of the information a debugger reports depend on your compilation options. If optimisation is switched on, the line numbers listed in the debugging information may not correspond to the statements in your source code file. When debugging, we always recommend compiling with optimisation switched off and the -g flag enabled to obtain the most accurate information.

You may want to use an interactive session whilst debugging, in which case you are advised to also consult the section in the user guide on interactive jobs and productivity tips.

7.1 Available Debuggers

ARCHER has Cray ATP, STAT, DDT and lgdb installed.

7.2 Cray ATP

Cray ATP (Abnormal Termination Processing) is a tool that monitors your application and, in the event of an abnormal termination, collates the failure information from all the running processes into files for analysis.

With ATP enabled, in the event of abnormal termination, all of the stacktraces are gathered from the dying processes, analysed and collated into a single file called atpMergedBT.dot. In addition the stacktrace from the first process to die (hence the probable cause for the failure) is delivered to stderr.

The atpMergedBT.dot file can be viewed using the stat-view command that is accessible by loading the stat module.

7.2.1 ATP Example

To enable ATP, load the atp module in your job submission script and set the ATP_ENABLED environment variable to 1, i.e. include the following commands in your (bash) job submission script:

module load atp
export ATP_ENABLED=1

and then run your job using aprun as usual. Once your application has terminated abnormally you need to log into the service while exporting the X display back to your local machine (you must have an X server running locally) with:

ssh -Y username@archer.ac.uk

Load the stat module with:

module add stat

and view the merged stacktrace with:

stat-view atpMergedBT.dot

The stderr from your job should also contain useful information that has been processed by ATP.

Please note, Cray ATP should only be used in circumstances when the application code has been forcibly aborted, such as a segmentation fault. Aborts initiated from within the application code itself will not be captured by Cray ATP and so no atpMergedBT.dot file will be generated.
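
As a minimal sketch of a complete ATP-enabled job, the commands above might sit in a submission script like the one below. The job name, resource request, walltime, budget code (x01), core count and executable name (my_exe) are illustrative assumptions rather than values from this guide; only the module load, ATP_ENABLED and aprun steps come from the text above. The snippet writes the script to a hypothetical file, atp_job.pbs, so that it can be inspected before submission.

```shell
# Write a hypothetical ATP-enabled job script to atp_job.pbs.
# All #PBS values, the budget code and my_exe are placeholders to adapt.
cat > atp_job.pbs <<'EOF'
#!/bin/bash --login
#PBS -N atp_debug
#PBS -l select=1
#PBS -l walltime=00:10:00
#PBS -A x01

# Run from the directory the job was submitted from (should be on /work)
cd "$PBS_O_WORKDIR"

# Enable Abnormal Termination Processing before launching the application
module load atp
export ATP_ENABLED=1

aprun -n 24 ./my_exe
EOF
```

The generated script would then be submitted with qsub atp_job.pbs and, if the application terminates abnormally, the merged stacktrace can be viewed as described above.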

7.3 STAT

The Stack Trace Analysis Tool (STAT) is a cross-platform debugging tool from the University of Wisconsin-Madison. ATP is based on the same technology as STAT; both are designed to gather and merge stack traces from a running application's parallel processes. STAT can be useful when an application seems to be deadlocked or stuck, i.e. it does not crash but does not progress as expected, and it has been designed to scale to a very large number of processes. Full information on STAT, including use cases, is available at the STAT website.

STAT will attach to a running program and query that program to find out where all the processes in that program currently are. It will then process that data and produce a graph displaying the unique process locations (i.e. where all the processes in the running program currently are). To make this easily understandable it collates together all processes that are in the same place providing only unique program locations for display.

7.3.1 STAT Example

The ARCHER CSE team has published a YouTube video demonstrating STAT.

To use the STAT tool you need to run an interactive job. To do this, we recommend using

qsub -I 

In particular, do not use the '-V' option: it exports the login environment to the interactive job, which can cause problems when connecting STAT to the running job. Add '-X' if you will use 'stat-view' from within the interactive job.

Once you have launched your interactive job and navigated to the /work directory where you will run your code, load the STAT module as follows:

module load stat

Then launch your job as normal, but run it as a background task. For example, the following command runs an executable called my_exe using 512 processes; the & symbol runs the application in the background:

aprun -n 512 -N 24 ./my_exe &

Now you need to find the process ID (PID) of the job you have just launched. Use the following command to do this:

ps

This should present you with a set of text that looks something like this:

  PID TTY          TIME CMD
21704 pts/0    00:00:00 bash
21868 pts/0    00:00:00 aprun
21871 pts/0    00:00:00 aprun
21879 pts/0    00:00:00 aprun
21884 pts/0    00:00:00 ps

When your application has reached the point at which it hangs, issue the following command (replacing PID with the process ID of the second aprun task listed by the ps command above):

stat-cl PID
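
The PID selection described above can also be scripted. The sketch below assumes the ps output has the same layout as the listing shown earlier (PID in the first column, command name in the last); the function name second_aprun_pid is hypothetical and is not part of STAT.

```shell
# Read ps output on stdin and print the PID of the second aprun entry.
# Assumes the default ps layout: PID first, command name last.
second_aprun_pid() {
    awk '$NF == "aprun" { count++; if (count == 2) { print $1; exit } }'
}

# Possible use inside the interactive job (not run here):
#   pid=$(ps | second_aprun_pid)
#   stat-cl "$pid"
```

If fewer than two aprun entries are listed, the function prints nothing, so it is worth checking that the result is non-empty before passing it to stat-cl.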

Once STAT has finished working you can kill your aprun job using the following command (again replacing PID as you did for the STAT command):

kill -9 PID

Now you can view the result that STAT has produced using the following command (exe is replaced with the name of the executable you ran):

stat-view stat_results/exe.0000/exe.0000.3D.dot

This should produce a graph displaying all the different places in the program that the parallel processes were at when you queried them. If you have problems viewing the graph it is likely you have not exported your X display when you logged into ARCHER or when you submitted your interactive job. Viewing the graph does not need to be done through an interactive job so you can quit the interactive job at this point and view the graph from the normal ARCHER login nodes.
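
If STAT has been run several times, it can be convenient to locate the most recent graph file automatically. The sketch below assumes that results follow the stat_results/exe.NNNN/exe.NNNN.3D.dot pattern shown above and that the numeric suffix increases with each run (an assumption, not something this guide states); latest_stat_dot is a hypothetical helper name.

```shell
# Print the last .3D.dot file (in sorted name order) under a STAT results directory.
# Assumes higher-numbered result directories correspond to later runs.
latest_stat_dot() {
    base="${1:-stat_results}"
    latest=""
    for f in "$base"/*/*.3D.dot; do
        # Skip the unexpanded pattern when nothing matches
        if [ -e "$f" ]; then
            latest="$f"
        fi
    done
    if [ -z "$latest" ]; then
        echo "no .3D.dot files under $base" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}

# Possible use on a login node with X forwarding (not run here):
#   stat-view "$(latest_stat_dot)"
```

Because the files are picked in sorted name order, results from differently named executables in the same directory would be interleaved alphabetically, so the helper is best used in a directory holding results for a single executable.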

7.4 DDT Debugger (Arm Forge)

DDT is a debugging tool for scalar, multi-threaded and large-scale parallel applications.

For more information on using DDT, see the latest version of the User Guide, and check the current default version of DDT on ARCHER.

7.4.1 Download and install the remote client

The recommended way to use DDT on ARCHER is to install the free Allinea Forge remote client on your workstation or laptop using these instructions.

Once you have installed the remote client, the instructions below describe how to compile and debug a simple executable.

7.4.2 Compile the code for debugging

To compile the code to be debugged you should install the source code on the /work filesystem and compile the executable into a location on /work to ensure that the running job can access all of the required files.

You will also usually want to specify the -O0 option to turn off all code optimisation (as this can produce a mismatch between source code line numbers and debugging information) and -g to include debugging information in the compiled executable.

For example, using the simple MPI code from the ARCHER Quick Start Guide we would compile with:

auser@eslogin01:/work/x01/x01/auser> ftn -O0 -g -o hello_world.x hello_world.f90
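
When the same code is built both for debugging and for production, the flag choice can be kept in one place. The following is a small sketch only: the compile_flags helper and its "debug" mode name are hypothetical, and the -O2 used for non-debug builds is an illustrative assumption rather than a recommendation from this guide.

```shell
# Print compiler flags: debug builds get -O0 -g, as recommended above.
# The -O2 used for non-debug builds is only an illustrative assumption.
compile_flags() {
    if [ "${1:-debug}" = "debug" ]; then
        echo "-O0 -g"
    else
        echo "-O2"
    fi
}

# Print (rather than run) the compile command for a debug build
echo "ftn $(compile_flags debug) -o hello_world.x hello_world.f90"
```

Printing the command first makes it easy to confirm that a debugging build really does use -O0 and -g before it is run on a login node.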

7.4.3 Set up the debugger to submit jobs to ARCHER

We must now tell the remote client how to submit jobs to the ARCHER job submission system. You should only need to configure this once and the client will remember for future debugging sessions.

On the main DDT interface, click "Options" and on the dialog box that appears, select "Job Submission" from the list on the left. Ensure that the settings are set up as illustrated below and click "OK":

(The path to the Submission template file is /home/y07/y07/cse/forge/19.0.1/templates/archer.qtf.)

(Screenshot: the Job Submission settings, showing the Submission template file, Submit command, Cancel command and Display command fields.)

7.4.4 Run your debugging session on your program

Now that everything is configured we can debug our program. On the main DDT interface click "Run". This will bring up a dialog where you can specify the path to your executable and other options such as the number of processors to use and the walltime for the job. An example of the dialog is shown below with dummy values completed for the executable name and the working directory. For our small example we are using a single node (24 cores) and running for just 10 minutes (so we can use the "short" queue).

Note: to use the short queue your job must have a maximum run time of 20 minutes. If you wish to run for longer you should remove the queue specification so that you run in the standard ARCHER queue.

Once all the options have been set up you can submit your debugging session to the ARCHER queues by clicking "Submit".

A dialog showing the ARCHER queue will appear while the tool waits for your job to start. Note: you may see the warning message below which may be safely ignored.

pbs_iff: cannot connect to host
pbs_iff: cannot connect to host
No Permission.
qstat: cannot connect to server sdb (errno=15007)

Once the job starts, a dialog will appear while the debugger connects to your running processes.

Finally, the debugging interface will appear, allowing you to interactively debug your program.

7.4.5 Finishing your debugging session

To finish the debugging session, just quit the remote client on your workstation or laptop; DDT will ensure that the session is cleaned up properly.

7.4.6 Using DDT directly on the compute nodes

If you intend to use DDT directly on the compute nodes instead of using the remote client, you will need to load the forge module before compiling and linking your program, and before executing your program on the compute nodes:

module load forge

The User Guide gives instructions on how to compile and execute your program, and the command

ddt -help

lists the options for the ddt command. Please contact the ARCHER helpdesk for assistance with using DDT directly on the compute nodes.

7.4.7 Memory debugging of statically-linked programs

When using memory debugging with statically-linked programs, the debugging version of the malloc library needs to be included when the program is linked.

Load the forge module

module load forge

and add the following arguments to the command line when linking the program with the compiler wrapper

-L $ALLINEA_TOOLS_DIR/lib/64 -Wl,--whole-archive -ldmallocthcxx -Wl,--no-whole-archive -Wl,--allow-multiple-definition

The standard malloc library is usually linked by the compiler wrapper. This is replaced with the debugging version. Using whole-archive ensures that any libraries automatically loaded by the compiler wrapper use the debugging version. Using allow-multiple-definition ensures that the standard malloc library is ignored.
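
To avoid retyping the link arguments, they can be wrapped in a small shell function. This is a sketch under the assumption that ALLINEA_TOOLS_DIR is set once the forge module is loaded, as the arguments above imply; memdebug_link_flags is a hypothetical helper name.

```shell
# Print the extra link arguments for memory debugging of static executables.
# Relies on ALLINEA_TOOLS_DIR, assumed to be set by "module load forge".
memdebug_link_flags() {
    if [ -z "${ALLINEA_TOOLS_DIR:-}" ]; then
        echo "ALLINEA_TOOLS_DIR is not set; load the forge module first" >&2
        return 1
    fi
    printf '%s\n' "-L $ALLINEA_TOOLS_DIR/lib/64 -Wl,--whole-archive -ldmallocthcxx -Wl,--no-whole-archive -Wl,--allow-multiple-definition"
}

# Possible link step with hypothetical file names (not run here):
#   ftn -O0 -g -o my_prog.x my_prog.o $(memdebug_link_flags)
```

The command substitution is deliberately left unquoted in the link step so that the shell splits the output into separate arguments for the compiler wrapper.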