Optimizing the I/O performance of OpenFOAM for massively parallel high-fidelity CFD simulations
eCSE07-15
Key Personnel
PI/Co-Is: Dr. Wes Armour (PI), Dr. Neil Ashton (Co-PI), Dr. Ian Bush (Co-I) - University of Oxford
Technical: Dr. Stef Salvini - University of Oxford
Relevant documents
eCSE Technical Report: Optimizing the I/O performance of OpenFOAM for massively parallel high-fidelity CFD simulations
Project summary
Project description
OpenFOAM is a comprehensive, flexible, widely used package for CFD computation. Its flexibility allows users to plan and implement their own problems starting from a "semi-symbolical" definition, choosing items such as discretization and solution methods but avoiding the details of their implementation. A range of templates, solutions, codes etc. are available to enable users to define their problem and solve it.
Whilst OpenFOAM has many strengths, its code design has resulted in some fundamental weaknesses for I/O:
1. Parallelism is not optimal. OpenFOAM uses MPI, but MPI is linked internally by the executable at run time rather than at link time. This limits which MPI facilities can be used and where in the code they can be used. Advanced parallel libraries such as PETSc (ANL) cannot currently be used, which means linear solvers have to be coded directly.
2. The OpenFOAM I/O system has several characteristics that make it less than ideal for HPC systems:
a. For a typical case, each of the N_P processors creates and writes one file for each of the N_F variables (e.g., pressure, velocity, faces, cells etc.) at each of the N_T checkpoints. Thus the number of files created is F = N_T × N_P × N_F.
For example, a large ARCHER-scale problem (10,000 cores) with N_T = 100, N_P = 10,000 and N_F = 20 would generate 20 million files. This would be, to say the least, impractical in a non-dedicated large-scale HPC environment: it would overrun the current limit of 500,00 files in the ARCHER /work directory, and it would make file transfer and general handling inefficient.
b. Object types such as scalars, vectors, tensors and matrices are defined in OpenFOAM, but at I/O time they are accessible only as collections of individual data items, stripped of any metadata. For example, when writing a vector, the length of the vector is not available (this was confirmed by the code developers).
c. Each individual entry (number) is output independently and directly to an external file. This is very inefficient: no pipelining is possible and full cache lines cannot be used, which increases memory traffic.
d. The mechanism for input data is even more cumbersome: each element is read in character by character and then decoded.
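The file-count estimate in item (a) and the write-granularity concern in item (c) can be illustrated with a short sketch. This is not OpenFOAM code: the function names and the CountingSink class are hypothetical, and counting write calls on an in-memory buffer is only a stand-in for the real cost of many small I/O operations.

```python
import io
import struct


def checkpoint_file_count(n_t: int, n_p: int, n_f: int) -> int:
    """Files created when every processor writes one file per variable
    at every checkpoint: F = N_T * N_P * N_F."""
    return n_t * n_p * n_f


class CountingSink(io.BytesIO):
    """In-memory sink that counts write() calls (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def write_per_entry(values, sink):
    """One write call per number, mimicking entry-by-entry output."""
    for v in values:
        sink.write(struct.pack("<d", v))


def write_batched(values, sink):
    """Pack the whole array once and hand it over in a single write call."""
    sink.write(struct.pack(f"<{len(values)}d", *values))


# The ARCHER-scale example from the text.
total_files = checkpoint_file_count(n_t=100, n_p=10_000, n_f=20)

# Same data, two write strategies.
values = [float(i) for i in range(1000)]
per_entry, batched = CountingSink(), CountingSink()
write_per_entry(values, per_entry)
write_batched(values, batched)

print(total_files, per_entry.calls, batched.calls)
```

Both strategies produce identical bytes; only the number of calls into the output layer differs, which is the overhead item (c) describes.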
Achievement of objectives
The aim of this project was to prototype the introduction of modern, parallel, large-scale I/O into OpenFOAM (OF). We proposed the following requirements:
- Minimum modifications to the code, no changes in users' interface, data, etc.
- Default behavior exactly as in standard OF, with a runtime choice for users between standard OF files, OF files plus HDF5, or HDF5 only.
- Reduce the number of output files, while maintaining the ability to cleanly remove time dumps already stored (for example, time dumps outside a "storage window").
- Simplify users' editing in the case of restart.
- Clean compilation using the standard OF mechanisms.
- Testing and benchmarking.
All of these objectives have been achieved, as demonstrated for a representative test case. The number of files has been reduced from tens of thousands (number of processors × number of variables) to either a single file or, if processor bunching is used, a small number of files (explained in detail in the technical report). This represents a very useful addition to the code for OpenFOAM users.
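As a rough illustration of how bunching could change the file count, the sketch below assumes that processors are grouped into fixed-size bunches and that each bunch writes one shared file holding all variables. That layout is an assumption made for this example only; the actual scheme is described in the technical report. The names files_standard, files_with_bunching and bunch_size are hypothetical.

```python
import math


def files_standard(n_p: int, n_f: int) -> int:
    """Standard layout: one file per processor per variable."""
    return n_p * n_f


def files_with_bunching(n_p: int, bunch_size: int) -> int:
    """Assumed layout: one shared file per bunch of processors."""
    if bunch_size < 1:
        raise ValueError("bunch_size must be at least 1")
    return math.ceil(n_p / bunch_size)


n_p, n_f = 10_000, 20
print(files_standard(n_p, n_f))            # per-checkpoint count, standard layout
print(files_with_bunching(n_p, n_p))       # all processors in one bunch
print(files_with_bunching(n_p, 1024))      # several bunches
```

Under these assumptions a single bunch covering all processors yields one file, while smaller bunches trade a few more files for less contention on each file.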
At present the HDF5 addition carries an extra computational cost at each checkpoint. However, we believe that for the vast majority of cases the benefit of fewer files outweighs this cost, because the time saved when transferring and compressing a small number of larger files exceeds the extra time spent writing each checkpoint. Future work will aim to reduce the additional computational effort of the HDF5 checkpoint.
Summary of the Software
OpenFOAM is an open-source C++ toolbox that is most commonly used for CFD and is used by thousands of academics and companies. It supports arbitrary polyhedral unstructured grids and contains its own Cartesian-prismatic grid generator, snappyHexMesh. A range of solvers is available, from steady-state SIMPLE-type approaches to unsteady PISO schemes, which allow the use of various turbulence modelling approaches, e.g. RANS, DES, LES, DNS. For this work version 4.1 is used, and all modifications are released as open source.
To achieve the objectives of this project, six files were modified (in line with a minimum-disruption policy): OFstream.C and OFstream.H; OSstream.C, OSstreamI.C and OSstream.H; and regIOobjectWrite.C.
A GitHub repository has been created containing both the code and instructions for its use; it can be found at: https://github.com/stefsal/OeRC_OpenFOAM_HDF5 .