Data Management: IO, Transfer and Storage

Whether you're running data-intensive computations or working with large input or output files that need to be transferred on or off an HPC machine, a good understanding of data management best practices will help you make the most of the available resources.

Details

This two-day course falls into two parts:

The first part will cover best practices for working with your files on ARCHER. It will describe ARCHER's file systems and the relationship between ARCHER and the RDF, and introduce GridFTP as a mechanism for moving large amounts of data on and off both systems. We'll then introduce file formats that have associated libraries for high-performance I/O, such as HDF5 and NetCDF.

The second part will cover alternative data storage and transmission techniques. It will introduce both traditional "SQL" databases and more modern "NoSQL" databases, which are particularly suited to some big data applications. Finally, we'll cover some of the file formats widely used for data on the Internet, such as XML and JSON.

This course is free to all academics.

Pre-requisites

Attendees are expected to have experience of using desktop computers, but no programming, Linux or HPC experience is necessary.

Timetable

Wednesday 28th January

  • 09:00 : Registration
  • 09:20 : Welcome & Introduction
  • 09:30 : File systems on ARCHER and the RDF. Moving data between file systems, and on and off ARCHER and the RDF.
  • 11:00 : Coffee
  • 11:15 : Walkthrough: Moving data. Demonstrations of using GridFTP and related data transfer techniques.
  • 12:30 : Lunch
  • 13:30 : I/O libraries & formats: HDF5
  • 14:30 : Practical (HDF5)
  • 15:30 : Coffee
  • 15:45 : I/O libraries & formats: NetCDF
  • 16:45 : Close

Thursday 29th January

  • 09:30 : Data infrastructure hardware: file systems, hierarchical storage, data-intensive computing
  • 10:30 : Coffee
  • 10:45 : Intro to Relational Databases
  • 11:30 : Practical: SQL
  • 12:30 : Lunch
  • 13:30 : Intro to NoSQL databases
  • 14:30 : Practical: MongoDB
  • 15:30 : Coffee
  • 15:45 : XML & JSON
  • 16:45 : Close

Course Materials

Slides and exercise material for this course will be available soon.

Location

The course will be held in Room 3305, JCMB, King's Buildings, Edinburgh.

Registration

Please use the PRACE website to register for ARCHER courses.

Questions?

If you have any questions please contact the ARCHER Helpdesk.