Data Management: IO, Transfer and Storage

Whether you're undertaking very data-intensive computations or working with large input or output files that need to be transferred on or off an HPC machine, a good understanding of data management best practices will help you to make the most of available resources.


This two-day course falls into two parts:

The first part will cover best practices for working with your files on ARCHER. It will describe ARCHER's file systems and the relationship between ARCHER and the RDF, and introduce GridFTP as a mechanism for moving large amounts of data on and off ARCHER and the RDF. We'll then introduce file formats that have associated libraries for high-performance I/O (such as HDF5 and NetCDF).

The second part will cover alternative data storage and transmission techniques. It will introduce both traditional "SQL" databases and more modern "NoSQL" databases that are particularly suited to some big data applications. Finally, we'll cover some of the file formats widely used for working with data on the Internet, such as XML and JSON.

This course is free to all academics.


Attendees are expected to have experience of using desktop computers, but no programming, Linux or HPC experience is necessary.


Wednesday 28th January

  • 09:00 : Registration
  • 09:20 : Welcome & Introduction
  • 09:30 : File systems on ARCHER and the RDF. Moving data between file systems, and on and off ARCHER and the RDF.
  • 11:00 : Coffee
  • 11:15 : Walkthrough: Moving data. Demonstrations of using GridFTP and related data transfer techniques.
  • 12:30 : Lunch
  • 13:30 : I/O libraries & formats: HDF5
  • 14:30 : Practical (HDF5)
  • 15:30 : Coffee
  • 15:45 : I/O libraries & formats: NetCDF
  • 16:45 : Close

Thursday 29th January

  • 09:30 : Data infrastructure hardware: file systems, hierarchical storage, data-intensive computing
  • 10:30 : Coffee
  • 10:45 : Intro to Relational Databases
  • 11:30 : Practical: SQL
  • 12:30 : Lunch
  • 13:30 : Intro to NoSQL databases
  • 14:30 : Practical: MongoDB
  • 15:30 : Coffee
  • 15:45 : XML & JSON
  • 16:45 : Close

Course Materials

The course will be held in Room 3305, JCMB, King's Buildings, Edinburgh.


