Adding Multiscale Models of DNA to LAMMPS

eCSE05-10

Key Personnel

PI/Co-I: Dr Oliver Henrich - University of Edinburgh, Dr Thomas Ouldridge - Imperial College London, Dr Davide Marenduzzo - University of Edinburgh

Technical: Dr Oliver Henrich - University of Edinburgh

Relevant documents

eCSE Technical Report: Adding Multiscale Models of DNA to LAMMPS

Project summary

DNA modelling has been an important field in biophysics for decades. Traditionally, most of the available simulation techniques have worked at the atomistic level of detail. Recently, research effort at a coarser level of description has grown rapidly. Coarse-grained (CG) DNA modelling is indispensable for modelling DNA on timescales in the microsecond range and beyond, or when very long DNA strands (tens to hundreds of kilo base pairs) have to be considered. This is important, for instance, for the dynamics of DNA supercoiling, of genomic DNA loops and of chromatin or chromosome fragments.

A small number of very promising CG DNA models have emerged to date. These models, however, are often based on bespoke software developed by individual research groups, which creates significant entry barriers and restricts their use to a small user community. On the other hand, a suitable platform for CG simulation of DNA has emerged through the popular and powerful Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) for molecular dynamics.

During this project, oxDNA, a CG model for DNA and RNA, was ported into the LAMMPS code. The oxDNA model has been developed in the groups of Ard Louis and Jonathan Doye at the University of Oxford. Until now, this model was only available as bespoke, standalone software. Through the efficient parallelisation of LAMMPS it is now possible to run oxDNA in parallel on multi-core, multi-processor and distributed-memory architectures, extending its capabilities to unprecedented time and length scales. The largest system that could be studied with oxDNA was previously limited by the size of system that can be fitted onto a single GPU. It is worth mentioning that the GPU-accelerated version of the standalone code can only achieve speedups of up to a factor of 30 compared to single-core performance.

The results of the scaling tests and performance analysis are very encouraging and demonstrate that LAMMPS is capable of tackling extremely large problems, well beyond what could be reached with the standalone versions. The scaling analysis of the benchmarks indicates that the performance of the GPU-enabled version can easily be matched on a single ARCHER node running an MPI-only implementation with 24 MPI tasks.

We also implemented new Langevin-type integrators for rigid bodies, as the range of accurate rigid-body integrators in LAMMPS was previously limited. The DOT-C integrator offers additional advantages over the existing standard LAMMPS rigid-body integrators for Langevin dynamics. At the cost of a small additional overhead, it shows improved stability and allows considerably larger timestep sizes.

We also consider this project a starting point for multiscale modelling of DNA. This could be achieved by combining different CG and atomistic models in a single simulation. Through its extensibility and excellent performance, LAMMPS appears ideally suited for such an undertaking.

Enabled science and impact

The impact of this project is fundamental both from a scientific and a technological point of view. The new LAMMPS implementation of the oxDNA model will enable researchers to address fundamental open questions on unprecedented time and length scales.

We envisage that it will be particularly useful for three key areas. Firstly, many examples of DNA nanotechnology involve large systems containing many thousands of nucleotides, such as structures built from DNA origami, DNA bricks or compound tiles. Allowing oxDNA to harness efficient multi-processor CPU-architectures will be extremely helpful in this regard. With the current single-CPU-single-GPU version of oxDNA we can only address the structural and mechanical properties of single origami. The new code could be used to study the structure and mechanics of even larger multi-origami nanostructures, e.g. multi-origami polyhedra.

A second important field of application is biological systems. This includes the structure and dynamics of protein-DNA systems, DNA supercoiling and RNA hybridisation as well as DNA-RNA interactions. In all these cases the DNA strands that need to be studied are often tens or hundreds of kilo base pairs long. For instance, the new code will allow the first ever study of the dynamics of supercoiling DNA domains, each of which is about 100 kbp in size. It could also address the dynamics and some aspects of the assembly of multi-nucleosome chromatin fibres made up of DNA wrapped around histone octamers on linear or supercoiled DNA templates. Each nucleosome comprises about 200 bp, so a chain of a hundred or more nucleosomes (about 20 kbp or more) is already well beyond the current capabilities of the standalone code.

Another category of systems is smaller, on the scale of up to a few thousand nucleotides. Slow transitions opposed by large free-energy barriers are hard to measure for these systems. Sampling can be enhanced by techniques such as Forward Flux Sampling (FFS). FFS is a naturally parallel technique, because it involves running many independent simulation trajectories. Efficient parallelisation on CPU architectures will allow the ideal number of cores (depending on system size) to be devoted to each trajectory, whilst running many trajectories in parallel. One of the current bottlenecks is that, for more complicated processes, each individual simulation, which has to either reach the next interface or go back to the first interface, can take a long time to run. This will be circumvented with the new code if each of the simulations is a parallel rather than a serial job. Example studies might include the assembly of DNA polyhedra or the detailed investigation of blunt-ended strand displacement as a function of temperature.
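The stage structure of FFS described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the oxDNA or LAMMPS workflow: the "system" is a one-dimensional biased random walker, the interface positions and step probability are arbitrary illustrative values, and the function names are hypothetical. Because the walker's state is just its position, every trial in a stage starts from the same point; a real FFS calculation would restart from stored configurations that reached the previous interface, and would also measure the initial flux through the first interface, which is omitted here.

```python
import random


def run_trial(start, lower, upper, p_forward, rng):
    """Run one trajectory until it reaches `upper` (success)
    or falls back to `lower` (failure)."""
    x = start
    while lower < x < upper:
        x += 1 if rng.random() < p_forward else -1
    return x >= upper


def ffs_crossing_probability(interfaces, p_forward, trials_per_stage, seed=0):
    """Estimate the probability of reaching the last interface before
    returning to the first one, as a product of per-stage success fractions."""
    rng = random.Random(seed)
    first = interfaces[0]
    probability = 1.0
    # Each stage starts on one interface and targets the next one.
    for start, target in zip(interfaces[1:-1], interfaces[2:]):
        # The trials within a stage are independent of each other,
        # which is what makes the method naturally parallel.
        successes = sum(
            run_trial(start, first, target, p_forward, rng)
            for _ in range(trials_per_stage)
        )
        probability *= successes / trials_per_stage
    return probability


# Illustrative parameters only (assumptions, not taken from the project).
estimate = ffs_crossing_probability(
    interfaces=[0, 2, 4, 6], p_forward=0.4, trials_per_stage=20000
)
print(f"estimated crossing probability: {estimate:.4f}")
```

In this sketch each call to run_trial stands in for one simulation trajectory. In the setting described above, each such trajectory would itself be a parallel job, with many of them running side by side.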

The wider scientific benefit of this project goes beyond the enhanced capabilities of oxDNA and the science that can be directly addressed through it. CG DNA modelling currently suffers from a lack of systematic comparison between the different CG models and between CG and atomistic models. This forms a formidable roadblock in current DNA research. Without a common basis on which these models can be compared, further progress in the field of DNA modelling is likely to come to a halt. This project can form a starting point for multiscale modelling of DNA by combining different CG DNA models in a single simulation, with the LAMMPS code as the underpinning computational engine. This will be of great value for the large community of theoretical and experimental physicists, chemists, engineers and biologists interested in DNA and chromatin modelling, genetics and nanotechnology.

Achievement of objectives

1. Implement the oxDNA coarse-grained (CG) DNA model for single- and double-stranded DNA into the LAMMPS code, test and validate the implementation.

This objective has been achieved.

2. Adapt oxDNA utility software to produce input topology for LAMMPS and handle LAMMPS output.

This objective has been partly achieved. We converted setup tools for single- and double-stranded DNA from native oxDNA to LAMMPS format. The conversion of analysis tools is still in progress.

3. Release the software under the GNU General Public License (GPL) and distribute the software as a LAMMPS USER-package.

This objective has been achieved. The source code is also available from our repository at CCPForge (see below).

4. Create a starting point for a systematisation and comparison of CG DNA models through a set of standardised test problems and structural and dynamical trajectory information.

This objective has been achieved. A set of simple test cases consisting of a single-stranded helix, a duplex and an array of duplexes is provided with the software distribution.

5. Additional objective, not part of the original proposal:
Implement new rigid-body integrators for the NVE ensemble and Langevin dynamics.

This objective has been achieved.

Summary of the Software

The software is open source and distributed under the GNU General Public License (GPL). On ARCHER, it is available as a module and can be loaded with:

module load lammps/oxdna

The source code is distributed as the LAMMPS USER-package CGDNA and is part of the official LAMMPS download. It is also available from our main repository at CCPForge (https://ccpforge.cse.rl.ac.uk/gf) under the project name Coarse-Grained DNA Simulation (cgdna). You can request to join the project for full access, which includes permission to browse the repository and commit changes. Anonymous access is also provided via Subversion:

svn checkout https://ccpforge.cse.rl.ac.uk/svn/cgdna

To compile, either install the package in your LAMMPS source directory via:

make yes-USER-CGDNA

or copy all source code from /cgdna/trunk/oxdna/src into your LAMMPS directory and compile as usual.

There are no restrictions on the version of LAMMPS, but you also need to install the standard packages MOLECULE and ASPHERE by issuing:

make yes-molecule yes-asphere

Templates for input and data files as well as setup tools for single and double-stranded DNA or arrays of DNA duplexes can be found in /cgdna/trunk/oxdna/util