Quick Start Guide
This guide runs through the process of getting an ARCHER account, logging in, compiling a simple program and running your first job.
New Access to ARCHER Arrangements
This provides a step by step guide to getting on to ARCHER following the recent security incident and subsequent changes to the security and logon procedures.
Read the new Access to ARCHER guide....
File systems and manipulating data
ARCHER has a number of different file systems mounted and understanding the difference between them is crucial to being able to use the system. In particular, transferring and moving data often requires a bit of thought in advance to ensure that the data is secure and in a useful form.
ARCHER file systems are:
- /home: backed up for disaster recovery purposes only, data recovery for accidental deletion is not supported. NFS, available on login and service nodes.
- /work: not backed-up. Lustre, available on login, service and compute nodes.
- UK-RDF: backed up for disaster recovery purposes only, data recovery for accidental deletion is not supported. GPFS, available on login nodes (and serial nodes).
Top tips for managing data on ARCHER:
- Do not generate huge (>1000) numbers of files in a single directory
- Archive directories or large numbers of files before moving them between file systems (e.g. using tar)
- When using tar or rsync between file systems mounted on ARCHER avoid using the compression options as these slow operations down (as file system bandwidth is generally better than throttling by CPU performance by using compression).
- Think about automating the combination and transfer of multiple files output by software on ARCHER to the UK-RDF. The Data Management Guide linked below provides examples of how to automatically verify the integrity of an archive and examples of how to do this.
Much of the performance difference on transferring data is due to numbers of files involved in the transfer. You should ensure that your work flow is set up so that you do not generate huge (>1000) numbers of files in a single directory
Information on best practice in managing you data is available in our Data Management Guide:
Write your first program on ARCHER
Open a text file on the system using your favourite text editor called "hello_world.f90". For example, using vi:
auser@eslogin01:~> vi hello_world.f90
Now copy and paste the source code below into the file and save it.
! Example Hello World program program hello_world use mpi implicit none ! Set up the variables integer :: irank, nrank integer :: iout, ierr character(len=5) :: cirank character(len=30) :: filename ! Initialize MPI and get my rank and total call mpi_init(ierr) call mpi_comm_rank(MPI_COMM_WORLD, irank, ierr) call mpi_comm_size(MPI_COMM_WORLD, nrank, ierr) ! Set the filename from this process and open for writing write(cirank, "(I0.5)") irank filename = "output"//cirank//".out" iout = 10 + irank open(unit=iout, file=filename, form="FORMATTED") ! Write the output write(iout,'(A,I5,A,I5)') "Hello from ", irank, " of ", nrank ! Close the output file and finalize MPI close(iout) call mpi_finalize(ierr) end program hello_world
Compile your first program
Now use the Fortran compiler wrapper command "ftn" to compile the code:
auser@eslogin01:~> ftn -o hello_world.x hello_world.f90
Note: for C programs you would use the "cc" command and for C++ programs you would use the "CC" command.
More information on compilers on ARCHER is available in the User Guide.
Create a job submission script
To run a program on the ARCHER compute nodes you need to write a job submission script that tells the system how many compute nodes you want to reserve and for how long. You also need to use the "aprun" command to tell ARCHER how to place the parallel processes and threads onto the cores you have reserved.
More information on job submission and process/thread placement on ARCHER is available in the User Guide.
Parallel jobs on ARCHER should be run from the /work filesystem as /home is not mounted on the compute nodes - you will see a chdir error if you try to run a job from the /home filesystem.
Create a job submission script called "submit.pbs" in your space on the work filesystem using your favourite text editor. For example, using vi:
auser@eslogin01:~> cd /work/z01/z01/auser auser@eslogin01:/work/z01/z01/auser> vi submit.pbs
(You will need to use your project code and username to get to the correct directory. i.e. replace the "z01" above with your project code and replace the username "auser" with your ARCHER username.
Paste the following text into your job submission script, replacing ENTER_YOUR_BUDGET_CODE_HERE with your budget code e.g. e99-ham.
#!/bin/bash --login #PBS -N hello_world #PBS -l select=1 #PBS -l walltime=0:5:0 #PBS -A ENTER_YOUR_BUDGET_CODE_HERE # This shifts to the directory that you submitted the job from cd $PBS_O_WORKDIR aprun -n 24 $HOME/hello_world.x cat output*.out > helloworld.out
The bolt job submission script creation tool can be used to automatically create job submission scripts with the correct options and parameters. See:
You can also use the checkScript command to check any job scripts you have written for correctness before you submit them. See:
Submit your job to the queue
You submit your job to the job submission using the "qsub" command:
auser@eslogin01:/work/z01/z01/auser> qsub submit.pbs 72136.sdb
The value retuned is your Job ID.
Monitoring your job
You use the "qstat" command to examine jobs in the queue. Use:
auser@eslogin01:/work/z01/z01/auser> qstat -u $USER
To list all the jobs you have in the queue. PBS will also provide an estimate of the start time for any queued jobs that the system is actively scheduling for by adding the "-T" option:
auser@eslogin01:/work/z01/z01/auser> qstat -Tu $USER
Note: the majority of jobs will not have an estimated start time as the system will be aiming to schedule them in an opportunistic manner (i.e. as soon as resources become available).
To see more details about the queued job, Use:
auser@eslogin01:/work/z01/z01/auser> qstat -f $JOBID
If your job does not enter a running state in the queues, this option may be useful as it contains a Comment field which may explain the reason why.
You can use the checkQueue utility to access information on all your jobs quickly, see:
Checking the output from the job
The job submission script above should write the output to a file called "helloworld.out", you can check this with the "cat" command. If the job was successful you should see output that looks something like:
auser@eslogin01:/work/z01/z01/auser> cat helloworld.out Hello from 0 of 24 Hello from 1 of 24 Hello from 2 of 24 Hello from 3 of 24 Hello from 4 of 24 Hello from 5 of 24 Hello from 6 of 24 Hello from 7 of 24 Hello from 8 of 24 Hello from 9 of 24 Hello from 10 of 24 Hello from 11 of 24 Hello from 12 of 24 Hello from 13 of 24 Hello from 14 of 24 Hello from 15 of 24 Hello from 16 of 24 Hello from 17 of 24 Hello from 18 of 24 Hello from 19 of 24 Hello from 20 of 24 Hello from 21 of 24 Hello from 22 of 24 Hello from 23 of 24
If something has gone wrong, you will find any error messages in the file "hello_world.e[jobID]".
Acknowledging ARCHER
You should use the following phrase to acknowledge ARCHER in all reseach outputs that have used the facility:
This work used the ARCHER UK National Supercomputing Service (http://www.archer.ac.uk).
You should also tag outputs with the keyword ARCHER whenever possible.
Useful Links
Links to other documentation you may find useful:
- ARCHER User Guide - Covers basic use of ARCHER: e.g. compilation, running jobs and using Python
- ARCHER Best Practice Guide - Covers optimisation, debugging, performance monitoring and other advanced topics.
- UK-RDF User Guide - Covers using the UK Research Data Facility including the Data Analytic Cluster and the Data Transfer Nodes.