Data Management Plans
We strongly recommend that you give some thought to how you use the various data-storage facilities that are part of the ARCHER service. This will not only allow you to use the machine more effectively but also help to ensure that your valuable data is protected.
The ARCHER/RDF service, like many HPC systems, has a complex structure. There are three types of file-system available on the system:
- Home file systems
- Work file systems
- Archive file systems
Each type of file-system has different characteristics and is suitable for different types of use.
There are also many different types of node:
- Login nodes
- Compute nodes
- Serial batch/post-processing nodes
- Data transfer nodes
Each type of node sees a different combination of the file-system types.
Home file systems
There are four independent home file-systems. Every project has an allocation on one of the four. You don't need to know which one your project uses, as your project's space can always be accessed via the path /home/project-code. Each home file-system is approximately 60TB in size and is implemented using standard Network Attached Storage (NAS) technology. This means that these disks are not particularly high performance but are well suited to standard operations like compilation and file editing. These file-systems are visible from the ARCHER login nodes and the serial-batch/post-processing nodes.
The home file systems are fully backed up. Full backups are taken weekly, with incremental backups taken on each day in between. These file-systems are therefore a good location to keep source code, copies of scripts and compiled binaries. Small amounts of important data can also be copied here for safe keeping, though the file-systems are not fast enough to host large data-sets.
Backups are kept for disaster recovery purposes only. If you have accidentally lost data from a backed-up file-system and have no other way of recovering it, contact us as quickly as possible, but be aware that we may be unable to assist.
Work file systems
There are three independent work file-systems:
- /fs2 1.5PB
- /fs3 1.5PB
- /fs4 1.8PB
Every project has an allocation on one of the three. You don't need to know which one your project uses, as your project's space can always be accessed via the path /work/project-code.
These are high-performance parallel file-systems built using Lustre. They are designed to support data in large files. Performance is likely to be worse for data stored in large numbers of small files.
These are the only file systems that the compute nodes can see, so all data read or written by the compute nodes has to live here.
The work file-systems are scratch file-systems, so no backups take place. You should not rely on these file-systems for long-term storage. Ideally, these file-systems should only contain data that is either:
- Actively in use
- Recently generated and in the process of being saved elsewhere
- Being made ready for upcoming work
In practice it may be convenient to keep copies of data-sets in work that you know will be needed at a later date. However, make sure that important data is always backed up elsewhere and that your work would not be hugely impacted by the loss of these copies. Large data sets can be copied to the archive file systems or transferred off the ARCHER service entirely.
If you have data on work that you are not going to need in the future, please delete it. This will help to ensure that, in the event of any problems with the file-system, maintenance operations will not be slowed down by the need to recover irrelevant data.
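One way to find candidates for clean-up is to list files that have not been modified for some time. The following is a minimal sketch only: the function name find_stale_files and the age threshold are illustrative assumptions, not part of the ARCHER service, and the script only reports files; it never deletes anything.

```python
import os
import stat
import time


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_in_bytes) for regular files under root whose
    modification time is older than max_age_days, largest first.

    Hypothetical helper for illustration; it only reports, never deletes.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Ignore symbolic links and other non-regular files.
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                stale.append((path, info.st_size))
    # Put the largest files first, as they free the most space.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

For example, find_stale_files("/work/project-code", 90) would list files under the project's work space that have not been modified in the last 90 days; the 90-day figure is an arbitrary choice. Review the list by hand before deleting anything, and note that walking a very large directory tree can itself take some time.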
Archive file systems
There are three archive file-systems.
Most projects will have an allocation on one of these file-systems. If you have a requirement to use these file-systems and your project does not have an allocation (or the allocation is insufficient) please contact the helpdesk.
The file-systems are provided by the RDF but are directly mounted by the ARCHER login nodes and serial-batch/post-processing nodes as well as the RDF data-mover nodes and analysis cluster.
The archive file-systems are parallel file-systems built using GPFS. They are intended as a safe location for large data-sets, so backups to an off-site tape library are performed daily. Backups of deleted files are retained for 180 days.
As with any parallel file-system, large data files are handled more efficiently than large numbers of small data files. If your data consists of a large number of related files, you should consider packing them into larger archive files for long-term storage. This will also make it easier to manage your data, as the collection can be treated as a single object.
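Packing a directory of related files into a single archive file can be sketched with Python's standard tarfile module. This is an illustration under stated assumptions rather than a recommended ARCHER tool: the function name pack_directory is hypothetical, and gzip compression is just one possible choice.

```python
import tarfile
from pathlib import Path


def pack_directory(source_dir, archive_path):
    """Pack everything under source_dir into one gzip-compressed tar file.

    Returns the number of regular files stored in the archive.
    Hypothetical helper for illustration only.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")

    # Store paths relative to the parent directory so that the archive
    # unpacks into a single top-level directory named after the source.
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source, arcname=source.name)

    # Re-open the archive to confirm it is readable and count its files
    # before any of the original files are removed.
    with tarfile.open(archive_path, "r:gz") as archive:
        return sum(1 for member in archive.getmembers() if member.isfile())
```

Write the archive to a location outside the directory being packed so that it does not end up including itself. Comparing the returned file count with the number of files in the source directory gives a basic sanity check before the originals are deleted.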
Understanding RDF file system quotas
On the RDF, group allocations are implemented using GPFS "file-sets". This means that the quota and usage are what you would expect them to be: the amount of data held within the directory tree. Files outside these directory trees don't count towards the group totals.
However, the user quotas are still standard Unix file quotas, and the usage value is the sum of all of the files owned by the user, wherever they are within the /nerc, /epsrc or /general file-systems. The user quotas will only sum to the group value if all of the users concerned only have access to a single file-set.
To be clear, every file counts towards both a user quota and a group quota (or a file-set in the case of the RDF file-systems) but one is a limit on the space taken by files owned by a specific user, the other is a limit on the space taken by files belonging to a group (or directory tree).
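The two kinds of accounting can be illustrated with a small model. This is only a sketch of the bookkeeping described above, not how GPFS implements quotas; the user names, file-set names and sizes are invented for the example.

```python
from collections import defaultdict


def usage_by_owner_and_fileset(files):
    """Sum file sizes per owning user and per file-set.

    files is an iterable of (owner, fileset, size) tuples. Every file is
    counted twice: once for its owner and once for its file-set.
    """
    user_usage = defaultdict(int)
    fileset_usage = defaultdict(int)
    for owner, fileset, size in files:
        user_usage[owner] += size
        fileset_usage[fileset] += size
    return dict(user_usage), dict(fileset_usage)


# Invented example data: alice owns files in two file-sets, bob in one.
EXAMPLE_FILES = [
    ("alice", "e01", 40),
    ("alice", "n02", 25),
    ("bob", "e01", 10),
]
```

Calling usage_by_owner_and_fileset(EXAMPLE_FILES) gives user totals of 65 for alice and 10 for bob, and file-set totals of 50 for e01 and 25 for n02. The users of e01 together own 75 units of data, yet the e01 file-set holds only 50, because alice's usage also includes her files in n02.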
The group/file-set quotas used by a project are constrained such that they can never total more than the overall disk space allocated to the project. However, there is no such limit on the user quotas. Most projects choose not to set user quotas at all, leaving every user quota without limit.
To avoid confusion, we recommend that projects either use a single group together with user quotas, or use group quotas only.