Enabling large-scale microphysics in the MONC weather model
eCSE05-012
Key Personnel
PI/Co-Is: Dr. Paul Connolly - University of Manchester, Dr. Michele Weiland - EPCC, University of Edinburgh, Dr. Adrian Hill - Met Office
Technical: Dr. Nick Brown - EPCC, University of Edinburgh
Relevant Documents
eCSE Technical Report: Enabling large-scale microphysics and optimising solver performance in MONC
Project summary
Thirty years ago, the resolution at which atmospheric flows could be modelled was on the order of 100 km. Today, resolutions of 10 km for global operational models, and 1 km for regional models, are available. With increased resolution comes increased accuracy, but challenges remain. Even at these higher resolutions, the fundamental fluid motions of clouds and turbulent flows remain at the subgrid scale.
In order for models to represent and account for the interaction of these small-scale flows with the larger-scale meteorology, physically based parametrizations are developed. Large Eddy Simulation (LES) models are a key tool for understanding the fundamental physics of these flows.
The Met Office/NERC Cloud model (MONC) is a highly scalable and flexible LES model. It has been developed in a collaboration between EPCC and the Met Office, with the support of the Joint Weather and Climate Research Programme (JWCRP) and NERC. MONC is capable of simulating clouds and other turbulent flows at resolutions of tens of metres on very large domains. The model is used to simulate a wide variety of atmospheric flows, such as dry boundary layers, fog, stratocumulus, and deep moist convection. Each of these requires its own particular configuration, using varying levels of complexity or different numerical implementations.
This project focused on a number of computational areas, including optimising the iterative solver which is used to solve the pressure equation. The traditional method used here inherently limits the scalability of the code. By integrating with a common and widely available solver toolkit, the user gains significant choice over which solver to use and how to configure it. We have also optimised the CASIM microphysics scheme, which is used to model moisture interactions; our work has more than doubled the performance of this code on CPUs. As part of this work, CASIM has also been ported to GPUs and KNL, to see whether these novel architectures could help to optimise performance.
A scalable and flexible model is good, but the key thing scientists require is to know the diagnostic state of the system. The complex nature of many MONC simulations leads to a wealth of interesting and insightful diagnostics, which could potentially be used to understand the underlying physics of cloud behaviour. To access this information, the scalable computational core of the model needs to interact with a scalable and efficient diagnostics system.
The existing in-situ approach for analysing data and calculating these diagnostics exhibited a number of shortcomings, both in terms of performance and features. We have significantly developed and extended the support for data analytics, which has been demonstrated on over 36,000 cores.
The result of this work is a number of significant improvements to the model which not only provide significant new functionality to current users on ARCHER, but also future-proof the model and identify avenues for further development.
Impact
The overarching aim of the development of MONC is to provide the UK atmospheric research community with a highly scalable atmospheric process research model, which is used to develop and test future parametrisations for numerical weather and climate prediction models. The eCSE funding has been a fundamental part of this development, since it enabled the optimisation of the base microphysics scheme and the pressure solver, and the development of the data analytics, which have led to a stable version of MONC. The resulting latest stable revision of MONC is a central tool in several NERC projects. As it stands there are around 40 users of MONC, of which 30 are not Met Office employees. Further, there are 7 PhD projects that are based on MONC.
Another important scientific impact of this work has been on the computer science side. In-situ data analytics is a field which is still fairly new and encompasses many challenges. We have developed a number of novel techniques and solutions, as described in more detail in the technical report. We have also studied the use of accelerators (both GPUs and KNL) for CASIM to understand the applicability of a code like this to these architectures. CASIM was unusual because, unlike many GPU codes, it was not possible to isolate a few simple computationally intensive loops, so instead we had to offload the entirety of the scheme to GPUs (via OpenACC). It was interesting to understand the suitability of doing this, both in terms of the maturity of technological support and the performance of this approach.
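To give a flavour of what offloading a whole scheme rather than isolated loops looks like, the sketch below is a minimal illustration in C (CASIM itself is Fortran, and the routine, field names and sizes here are placeholders, not the actual scheme): the entire per-column routine is compiled for the device with `acc routine`, and the loop over columns that calls it is offloaded in one go.

```c
/* Minimal sketch (in C; CASIM itself is Fortran) of offloading an entire
 * routine with OpenACC rather than isolating individual loops.
 * microphysics_column() and the field layout are hypothetical. */
#include <stdlib.h>

#define NZ   128    /* hypothetical number of vertical levels */
#define NCOL 4096   /* hypothetical number of columns         */

/* "acc routine" compiles a device version of the whole function, so it can
 * be called from inside an offloaded loop. */
#pragma acc routine seq
static void microphysics_column(double *q, double *theta, int nz)
{
    /* stand-in for the full microphysics calculation over one column */
    for (int k = 0; k < nz; k++) {
        q[k]     *= 0.99;              /* placeholder moisture update        */
        theta[k] += 1.0e-3 * q[k];     /* placeholder temperature feedback   */
    }
}

int main(void)
{
    double *q     = malloc(NCOL * NZ * sizeof *q);
    double *theta = malloc(NCOL * NZ * sizeof *theta);
    for (int i = 0; i < NCOL * NZ; i++) { q[i] = 1.0e-3; theta[i] = 300.0; }

    /* Copy the fields to the device once and run each column's call to the
     * offloaded routine in parallel. */
    #pragma acc parallel loop copy(q[0:NCOL*NZ], theta[0:NCOL*NZ])
    for (int c = 0; c < NCOL; c++)
        microphysics_column(&q[c * NZ], &theta[c * NZ], NZ);

    free(q);
    free(theta);
    return 0;
}
```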
Achievement of objectives
The project objectives were to:
Objective 1: Optimise the new iterative pressure solver
We tackled this slightly differently than we had initially intended. We had initially envisaged developing an optimal pre-conditioner for the problem. However, we instead decided that it would be far more useful and future-proof to integrate PETSc into MONC, with PETSc solving the pressure terms iteratively. A major benefit of this integration is that the user gains access to all of PETSc's pre-conditioners and solvers for their specific problem.
As planned, we have run the MONC test cases at scale and determined, for each common test case (of which specific runs are often a modification), the optimal iterative solver configuration to use, as well as when the FFT solver should be used instead.
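The sketch below illustrates the style of this kind of integration: a minimal C program against the PETSc KSP interface in which a simple tridiagonal system stands in for the pressure equation (the real MONC integration is Fortran and far more involved). The key point is KSPSetFromOptions(), which lets the solver and pre-conditioner be chosen from the run-time options database rather than being fixed at compile time.

```c
/* Minimal PETSc KSP sketch: a stand-in tridiagonal "pressure" system whose
 * solver and pre-conditioner are selected at run time via PETSc options. */
#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PetscInt n = 100, i, Istart, Iend;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Assemble a simple 1-D Laplacian as an illustrative operator. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* Right-hand side and solution vectors. */
    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    /* KSPSetFromOptions() is what gives the user the full choice of PETSc
     * solvers and pre-conditioners without recompiling. */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    VecDestroy(&x);
    VecDestroy(&b);
    MatDestroy(&A);
    PetscFinalize();
    return 0;
}
```

Different configurations can then be compared per test case purely from the command line, for example -ksp_type cg -pc_type gamg against -ksp_type gmres -pc_type bjacobi.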
Objective 2: Optimise the new Cloud AeroSol Interacting Microphysics Scheme (CASIM)
We have optimised CASIM and significantly reduced the runtime of the code by undertaking a significant refactoring, optimising memory allocation and reordering loops for better caching behaviour. The initial metric of “CASIM will increase the runtime by no more than a factor of 2” has been met, as described in the technical report, and this work has more than doubled the performance of the microphysics scheme. We have also investigated the execution and tuning of CASIM on the ARCHER KNLs.
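As an illustration of the kind of refactoring involved, the sketch below (in C with purely hypothetical array names and sizes; CASIM itself is Fortran, where the contiguous index is the first rather than the last) contrasts a version that repeatedly allocates a work buffer and strides through memory with one that hoists the allocation out of the loop and walks the contiguous index in the innermost loop.

```c
/* Minimal sketch of two optimisations of the kind described above: hoisting
 * allocation out of a hot loop and ordering loops for unit-stride access.
 * Array names and sizes are illustrative only. */
#include <stdlib.h>

#define NX 256
#define NY 256

/* Before: a work buffer allocated and freed on every outer iteration, and
 * the inner loop striding through memory. */
static void update_slow(double *field)
{
    for (int j = 0; j < NY; j++) {
        double *work = malloc(NX * sizeof *work);   /* repeated allocation */
        for (int i = 0; i < NX; i++)
            work[i] = field[i * NY + j] * 2.0;      /* strided access      */
        for (int i = 0; i < NX; i++)
            field[i * NY + j] = work[i];
        free(work);
    }
}

/* After: the buffer is allocated once by the caller and reused, and the
 * innermost loop runs over the contiguous (last, in C) index. */
static void update_fast(double *field, double *work)
{
    for (int i = 0; i < NX; i++) {
        for (int j = 0; j < NY; j++)
            work[j] = field[i * NY + j] * 2.0;      /* unit-stride access  */
        for (int j = 0; j < NY; j++)
            field[i * NY + j] = work[j];
    }
}

int main(void)
{
    double *field = calloc(NX * NY, sizeof *field);
    double *work  = malloc(NY * sizeof *work);      /* allocated once      */
    update_slow(field);
    update_fast(field, work);
    free(field);
    free(work);
    return 0;
}
```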
Additional objective: Optimise and further develop data analytics
After we started on the project it was clear that one of the blockers to running MONC on ARCHER was the limited maturity of the data analytics. Diagnostic (analysed) values are calculated from the raw fields in-situ, by sharing the cores of a processor between computation and analytics. In addition to the work originally planned, we have developed numerous improvements to the analytics, including significantly improving performance, adding the ability to checkpoint-restart the analytics, fixing numerous correctness bugs, providing the ability to write multiple concurrent files with different analytics, and supporting the efficient handling of raw (prognostic) fields so that our approach can hide the cost of IO.
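A minimal sketch of this style of core sharing is shown below (in C; the ratio of analytics to compute ranks and the behaviour of each group are illustrative assumptions, not the actual MONC implementation): the world communicator is split so that some ranks act as data-analytics servers while the rest run the computational core of the model.

```c
/* Minimal sketch of splitting ranks into computation and in-situ analytics
 * groups; the 1-in-16 split and the roles shown are illustrative only. */
#include <mpi.h>
#include <stdio.h>

#define RANKS_PER_SERVER 16   /* hypothetical: 1 analytics rank per 16 ranks */

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Every RANKS_PER_SERVER-th rank becomes a data-analytics server; the
     * remaining ranks form the computational core. */
    int is_analytics = (world_rank % RANKS_PER_SERVER == 0);

    MPI_Comm role_comm;
    MPI_Comm_split(MPI_COMM_WORLD, is_analytics, world_rank, &role_comm);

    if (is_analytics) {
        /* An analytics rank would sit in a receive loop, reducing raw fields
         * into diagnostics and writing files, hiding the cost of IO. */
        printf("rank %d of %d: data analytics server\n", world_rank, world_size);
    } else {
        /* A compute rank would run the timestep and post non-blocking sends
         * of its raw fields to its analytics server. */
        printf("rank %d of %d: computational core\n", world_rank, world_size);
    }

    MPI_Comm_free(&role_comm);
    MPI_Finalize();
    return 0;
}
```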
Summary of the software
The software is licensed under BSD and hosted on the Met Office science repository. This repository is used because the Met Office is the code owner and it is their preferred code hosting and management service. Most importantly, it fits in with their simulation workflow and tools, such as fcm and Rose. Access to the Met Office science repository is possible for those who are from a recognised partner organisation (such as a UK academic organisation) and have a Met Office sponsor, which is often the code owner (Adrian Hill or Ben Shipway).
Due to the BSD licence of the software, a copy could also be hosted on an external service, such as GitHub, but we have avoided this because the science repository is a hub for current modifications being made by the community and so we do not want these to get out of sync. We have developed full installation and execution instructions on the MONC science repository wiki, as well as an introductory two-day course (involving both lectures and practicals) which has been run twice; the materials are available on the wiki.