2. Connecting and Data Transfer
- 2.1 Interactive access
- 2.2 Non-interactive SSH access
- 2.3 Transferring data to and from ARCHER
- 2.4 Making access more convenient using the SSH configuration file
On the ARCHER system, interactive access can be achieved via SSH, either directly from a command-line terminal or using an SSH client. In addition, data can be transferred to and from the ARCHER system using scp from the command line or by using a file transfer client.
To access ARCHER, you need to use two credentials: your password and an SSH key pair protected by a passphrase. You can find more detailed instructions on how to set up your credentials to access ARCHER from Windows, macOS and Linux in the "Logging on to ARCHER" guide.
In the rest of this section, we cover the basic connection and data transfer methods, along with presenting some performance considerations. Finally, we briefly introduce the SSH config file which can make access a bit more convenient.
Note: whenever this section talks about connecting to or transferring data to/from ARCHER, you will be required to use both your password and your SSH key.
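If you do not already have an SSH key pair, you can generate one on your local machine with the standard ssh-keygen tool. The command below is a minimal illustration; the key type and size shown are assumptions, so follow the "Logging on to ARCHER" guide for the recommended settings:
ssh-keygen -t rsa -b 4096
When prompted, choose a strong passphrase; this is the passphrase that protects your SSH key pair.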
2.1 Interactive access
To log into ARCHER you should use the "login.archer.ac.uk" address:
ssh [userID]@login.archer.ac.uk
Windows users will need an SSH client. If you have a modern version of Windows 10, you can use the SSH client included with PowerShell. We also find the following tools useful for connecting from Windows:
Note on using MobaXterm on ARCHER:
When you log in to a system using MobaXterm, it also tries to set up an SFTP session. On ARCHER this cannot happen automatically, so you will be prompted for the SFTP browser password.
Two possible solutions are:
- Turn off the SFTP browser
- Re-enter your password when prompted
2.1.1 Interactive access to Post Processing nodes
The ARCHER post processing (PP) nodes are provided for compute-, memory-, or data-intensive operations that do not require access to the parallel compute nodes. The /home, /work and RDF filesystems are all mounted on the PP nodes.
The PP nodes can be accessed in two ways:
- Via the serial queues: see the description in the Post Processing Jobs section of this User Guide.
- Via direct interactive access: this is described below.
To connect to the PP nodes you must first be logged in to the ARCHER login nodes as described above. Once on the ARCHER login nodes you connect to one of the two PP nodes using the ssh command and one of the following host names:
- espp1
- espp2
For example, to connect to the espp1 PP node you would first log in to ARCHER and then use the command:
ssh espp1
You will be prompted for a password - this is your ARCHER password.
If you wish to export the PP node display back to your local workstation (for example, if you are using an application with a GUI), you should first log into ARCHER using the "-X" (or "-Y") option to ssh, and then add the same option to the ssh command you use to access the PP nodes. For example:
ssh -X espp1
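A complete session that exports the display from espp1 back to your local workstation might therefore look like the following (a sketch; replace [userID] with your ARCHER username):
ssh -X [userID]@login.archer.ac.uk
ssh -X espp1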
Note: compiling programs for the PP nodes is a slightly different procedure from compiling for the compute nodes; see the Compiling for Post Processing nodes section in this User Guide. You can, however, use the PP nodes to perform long compilations for the compute nodes using the standard compilation commands outlined in this User Guide.
Note: you must be logged in to the ARCHER login nodes (or the MOM/job launcher nodes via a batch job) before you can log in to the PP nodes.
2.2 Non-interactive SSH access
Disconnected tty/terminal processes on ARCHER are terminated, which means that non-interactive SSH sessions (for example, a single command run remotely over ssh) will also be killed. To avoid this, use the -t flag to request a pseudo-terminal (you may need to specify it more than once, depending on how you are starting ssh). For example:
ssh -t [userID]@login.archer.ac.uk 'echo start; sleep 60; echo finish'
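If ssh is started without a controlling terminal (for example from within a script or a cron job), a single -t may not be enough; passing the flag twice forces pseudo-terminal allocation regardless:
ssh -t -t [userID]@login.archer.ac.uk 'echo start; sleep 60; echo finish'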
2.3 Transferring data to and from ARCHER
The ARCHER systems are connected to the outside world via the UK academic SuperJANET5 network.
The simplest way of transferring data to and from ARCHER is using the scp command. Here are some examples:
Example 1:
The following command, run on your local system, will transfer the file source.f to your home directory on the ARCHER system.
scp ./source.f [username]@login.archer.ac.uk:
Example 2:
The following command will copy the file input_data.tar.gz from your local system to the RUN5 sub-directory in your work directory on ARCHER.
scp ./input_data.tar.gz \
    [username]@login.archer.ac.uk:/work/[project]/[group]/[username]/RUN5
Example 3:
The following command will copy the sub-directory RESULTS from your work directory to the current directory on your local system (note the use of the -r option).
scp -r [username]@login.archer.ac.uk:/work/[project]/[group]/[username]/RESULTS ./
Example 4:
If your local system supports ssh logins, you may also run the scp commands from the ARCHER system. For example, the same transfer of the RESULTS sub-directory on ARCHER to the home directory of your local system could also be accomplished by running the following command on the ARCHER system:
scp -r /work/[project]/[group]/[username]/RESULTS \
    local_username@machine_name.institution.ac.uk:~
2.3.1 Performance considerations
ARCHER is capable of generating data at a rate far greater than the rate at which it can be downloaded over SuperJANET5. In practice, only a portion of the data generated on ARCHER is expected to need transferring back to users' institutions - the rest will be, for example, intermediate or checkpoint files required for subsequent runs. However, it is still essential that all users try to transfer data to and from ARCHER as efficiently as possible. The most obvious ways to do this are:
- Only transfer those files that are required for subsequent analysis, visualisation and/or archiving. Do you really need to download those intermediate results or checkpoint files? Probably not.
- Combine lots of small files into a single tar file, to reduce the overheads associated with initiating data transfers (see the example after this list).
- Compress data before sending it, e.g. using gzip or bzip2.
- Consider doing any pre- or post-processing calculations on ARCHER. Long-running pre- or post-processing calculations should be run via the batch queue system, rather than on the login nodes. Such pre- or post-processing codes could be serial or OpenMP parallel applications running on a single node, though if the amount of data to be processed is large, an MPI application able to use multiple nodes may be preferable.
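For example, to bundle and compress a results directory on ARCHER before downloading it, you might run something like the following on ARCHER (a sketch; the directory and archive names are placeholders) and then transfer the single archive using scp as shown above:
cd /work/[project]/[group]/[username]
tar -czf RESULTS.tar.gz RESULTS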
Note that the performance of data transfers between ARCHER and your local institution may differ depending upon whether the transfer command is run on ARCHER or on your local system.
2.4 Making access more convenient using the SSH configuration file
Typing in the full command to log in to or transfer data to ARCHER can become tedious, as it often has to be repeated many times. You can use the SSH configuration file, usually located on your local machine at "~/.ssh/config", to make things a bit more convenient.
Each remote site (or group of sites) can have an entry in this file which may look something like:
Host archer
    HostName login.archer.ac.uk
    User username
(remember to replace "username" with your actual username!).
The Host archer line defines a short name for the entry. In this case, instead of typing ssh [username]@login.archer.ac.uk to access the ARCHER login nodes, you can simply type ssh archer. The remaining lines define the options for the "archer" host.
- HostName login.archer.ac.uk - defines the full address of the host
- User username - defines the username to use by default for this host (replace "username" with your own username on the remote host)
Now you can use SSH to access ARCHER without needing to enter your username or the full hostname every time:
-bash-4.1$ ssh archer
You can set up as many of these entries as you need in your local configuration file. Other options are available. See the ssh_config man page (or "man ssh_config" on any machine with SSH installed) for a description of the SSH configuration file. You may find the "IdentityFile" option useful if you have to manage multiple SSH key pairs for different systems as this allows you to specify which SSH key to use for each system.
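For example, if you keep a separate key pair just for ARCHER, an entry using IdentityFile might look something like the following (the key file name here is only an illustration):
Host archer
    HostName login.archer.ac.uk
    User username
    IdentityFile ~/.ssh/id_rsa_archer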