SciPost Submission Page
RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles
by Nicholas P. Bailey, Trond S. Ingebrigtsen, Jesper Schmidt Hansen, Arno A. Veldhorst, Lasse Bøhling, Claire A. Lemarchand, Andreas E. Olsen, Andreas K. Bacher, Lorenzo Costigliola, Ulf R. Pedersen, Heine Larsen, Jeppe C. Dyre, Thomas B. Schrøder
This is not the latest submitted version.
This Submission thread is now published as SciPost Phys. 3, 038 (2017)
As Contributors: Nicholas Bailey · Lorenzo Costigliola
arXiv Link: https://arxiv.org/abs/1506.05094v3
Date submitted: 2017-09-25 02:00
Submitted by: Bailey, Nicholas
Submitted to: SciPost Physics
RUMD is a general-purpose, high-performance molecular dynamics (MD) simulation package running on graphics processing units (GPUs). RUMD addresses the challenge of utilizing the many-core nature of modern GPU hardware when simulating small to medium system sizes (roughly from a few thousand up to a hundred thousand particles). Its performance is comparable to that of other GPU-MD codes at large system sizes and substantially better at smaller sizes. RUMD is open-source and consists of a library written in C++ and the CUDA extension to C, an easy-to-use Python interface, and a set of tools for set-up and post-simulation data analysis. The paper describes RUMD's main features, optimizations and performance benchmarks.
Submission & Refereeing History
Published as SciPost Phys. 3, 038 (2017)
Reports on this Submission
Report 2 by Daniele Coslovich on 2017-10-27 (Contributed Report)
- Cite as: Daniele Coslovich, Report on arXiv:1506.05094v3, delivered 2017-10-27, doi: 10.21468/SciPost.Report.269
Strengths:
1- The paper convincingly demonstrates the efficiency of the RUMD simulation package for small and medium-sized particle systems.
2- It provides a succinct but complete description of the code internals.
Weaknesses:
No major weakness.
The manuscript describes the RUMD simulation package, a well-tested molecular dynamics code running entirely on graphics processing units (GPUs). It distinguishes itself from related software by its very good performance for small and medium system sizes, down to a few thousand particles and even fewer. This is achieved by dynamically adjusting, at the beginning of the simulation, the number of threads per particle involved in the force calculation. The paper succinctly describes this and other low-level optimizations that enable very good efficiency for both small/medium and large system sizes. The direct comparison between RUMD and the well-established LAMMPS package for a standard Lennard-Jones benchmark nicely demonstrates the effectiveness of the approach. I have used RUMD in a few of my own projects and can confirm that it runs efficiently on small systems, making it an ideal building block for more complex simulation strategies.
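Why extra threads per particle help small systems can be seen with back-of-the-envelope arithmetic. The Python sketch that follows assumes an idealized GPU with 2688 cores (the core count quoted in Report 1) and ignores the oversubscription a real GPU needs to hide memory latency. The function names and the power-of-two rule are hypothetical illustrations, not RUMD's actual selection logic.

```python
def min_threads_per_particle(n_particles: int, n_cores: int, max_tp: int = 32) -> int:
    """Smallest power-of-two threads-per-particle (tp) giving at least one
    thread per core. Hypothetical heuristic for illustration only."""
    tp = 1
    while n_particles * tp < n_cores and tp < max_tp:
        tp *= 2
    return tp


def core_utilization(n_particles: int, tp: int, n_cores: int) -> float:
    """Upper bound on the fraction of cores that have any thread to run."""
    return min(1.0, n_particles * tp / n_cores)


if __name__ == "__main__":
    n_cores = 2688  # core count quoted in Report 1's benchmark discussion
    for n in (512, 1000, 4096, 32768):
        tp = min_threads_per_particle(n, n_cores)
        print(f"N={n}: tp={tp}, "
              f"utilization {core_utilization(n, 1, n_cores):.2f} -> "
              f"{core_utilization(n, tp, n_cores):.2f}")
```

For N = 1000 this picks tp = 4: with one thread per particle at most about 37% of the 2688 cores have work, whereas four threads per particle give 4000 threads. A real code would tune tp empirically rather than by this counting rule, since memory latency and block scheduling also matter.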
The paper is carefully written, with a sufficient level of detail about the code internals. Its publication in a peer-reviewed journal is long overdue, and it is great to see it submitted to SciPost. I definitely recommend its publication.
Requested changes:
- Table 4: "Lennard-Jones" (not "Lennard-Jone").
Anonymous Report 1 on 2017-10-22 (Invited Report)
- Cite as: Anonymous, Report on arXiv:1506.05094v3, delivered 2017-10-22, doi: 10.21468/SciPost.Report.267
Strengths:
1) Introduces an efficient package for MD simulations on GPUs that does not waste hardware resources for small to medium-sized systems. In particular, this makes it possible to simulate systems with 10^3-10^4 particles over very long time scales, e.g. to investigate glass-forming liquids.
2) Nice overview of the features offered by the package and presentation of the main optimization strategies.
Weaknesses:
1) Even though the manuscript reads well, it may seem too technical to the non-specialist reader.
The manuscript submitted by Bailey et al. reports on a molecular dynamics package, RUMD, optimized to use GPU resources efficiently not only for the usual goal of simulating large systems, but also for smaller systems down to a few thousand particles. Such an objective may sound counter-intuitive, but the authors justify its relevance well, putting forward three main reasons for targeting such performance: 1) simulating long time scales rather than large systems (this is particularly relevant for glass-forming liquids, a field in which the authors are well-known experts, and can be readily applied using currently available hardware and the RUMD package); 2) serving as a building block for multi-GPU simulations (mentioned as a perspective not dealt with in this manuscript); 3) anticipating hardware developments with increasing numbers of cores, from which the present work should benefit automatically.
Even though quite technical, the manuscript reads well. It does not describe the basics of GPU programming, but rather focuses on the features that make RUMD particularly efficient for small systems, in particular the optimization strategies that limit under-utilization of the hardware. Of particular interest is the use (and description in the appendix) of the autotuner, which optimizes the relevant parameters for a given system: the neighbor-list algorithm, the associated skin size, and the distribution of neighbor-list generation and force calculation among GPU threads. The performance of RUMD for Lennard-Jones and charged Lennard-Jones fluids is demonstrated and compared with various implementations of the LAMMPS code. Compared with the latter, RUMD achieves better performance on GPUs for small to medium-sized systems while keeping similar performance for large systems.
I recommend this manuscript for publication after the authors have addressed the following points.
1) The performance is compared with that of the LAMMPS code, both its GPU implementations running on the same number of cores (2688) and a pure CPU version running on 12 Xeon cores, for which lower performance is found. Could the authors explain this choice of CPU reference, given that the number of cores is so different?
2) In the same spirit, the authors mention several existing GPU-based MD codes in the introduction, but no comparison is then provided to measure the relative performance of RUMD. Even though such a numerical comparison may be beyond the scope of the present work, could they explain which specific features of RUMD led them to develop a new code rather than use existing ones? In addition, it might be useful to state explicitly in the introduction why GPUs are attractive for MD in general, so that non-specialist readers can also appreciate this point.
3) A list of available features is provided. The authors also mention plans to implement Ewald summation. I believe this is an important point, since this approach is more standard than the truncated-shifted/Wolf method. Could the authors explain how this could be achieved? Another important feature (which I believe is not yet included, although I might have missed it) is constraints on bonds and angles, as required, e.g., by the most widely used water models.
4) On page 6, the authors refer to "devices of compute capability at least 3.5". This is unclear to me. Could the authors explain what this means?
5) The algorithm presented in Figure 3 (and in the text in general) discusses only pair interactions. Is there a different implementation for interactions involving more than two bodies? In the same figure, the line referring to double counting involves the variable my_f.w. Could the authors explain what this is?
6) On page 9, "aviable" presumably should read "available".
7) On page 11, could the authors explain in more detail how the choice of cell size leads to the 58% estimate?
8) In Table 2 one observes a non-monotonic decrease of the parameter pb in the range N = 32768-262144 (associated with a decrease in the skin, but at constant tp = 1). Could the authors comment on this trend?
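Item 5 asks what my_f.w refers to. Offered only as a plausible reading, not as a description of RUMD's kernel: in CUDA a `float4` has components x, y, z and w, and GPU MD codes commonly accumulate the force in x, y, z and the particle's potential energy in w. If each particle loops over all of its neighbors, every pair is visited from both ends, so each side adds only half of the pair energy to avoid double counting. The Python sketch below mimics that pattern with a hypothetical four-component accumulator and a plain truncated (unshifted) Lennard-Jones potential.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]


def lj_pair(r2: float, eps: float = 1.0, sigma: float = 1.0) -> Tuple[float, float]:
    """Lennard-Jones pair energy u(r) and the scalar f(r)/r, given r^2."""
    s6 = (sigma * sigma / r2) ** 3
    u = 4.0 * eps * (s6 * s6 - s6)
    f_over_r = 24.0 * eps * (2.0 * s6 * s6 - s6) / r2
    return u, f_over_r


def forces_full_list(pos: List[Vec], rc: float = 2.5) -> Tuple[List[List[float]], float]:
    """Each particle loops over ALL other particles, so every pair is visited
    twice. Component 3 of each accumulator plays the role of a float4 'w'
    field holding per-particle potential energy (an assumed convention)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0, 0.0] for _ in range(n)]  # fx, fy, fz, "w" energy
    rc2 = rc * rc
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[i][k] - pos[j][k] for k in range(3)]  # r_i - r_j
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            if r2 >= rc2:
                continue
            u, f_over_r = lj_pair(r2)
            for k in range(3):
                acc[i][k] += f_over_r * d[k]  # force on i due to j
            acc[i][3] += 0.5 * u  # half the pair energy: j adds the other half
    total_energy = sum(a[3] for a in acc)
    return acc, total_energy
```

Summing the per-particle "w" values then reproduces the total pair energy exactly once, while each particle's force is still computed without writing to any other particle's accumulator, which avoids write conflicts between GPU threads.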