GPU-accelerated Gibbs ensemble Monte Carlo simulations of Lennard-Jonesium
Introduction
Over the last 50 years, Monte Carlo simulations have been used extensively to provide fundamental insight regarding the relationship between atomic-level interactions and macro-scale phenomena. In the late 1980s through the 1990s, a proliferation of new algorithms, including Gibbs ensemble [1], [2], configurational-bias [3], [4], [5], [6], [7], Gibbs–Duhem integration [8], and histogram-reweighting techniques [9], [10], [11], [12], simplified greatly the determination of vapor–liquid coexistence curves. The Gibbs ensemble method is particularly versatile; in addition to vapor–liquid coexistence, this method has been used to determine adsorption equilibria in single- [13], [14], [15], [16] and multi-component systems [17], [18], membrane equilibria, and solid–fluid [19] and solid–vapor [20], [21] equilibria. Unlike molecular dynamics, for which highly optimized parallel codes are widely available to the public [22], [23], [24], [25], [26], most Gibbs ensemble Monte Carlo (GEMC) calculations have been performed with codes that execute on a single CPU core. This limits calculations to relatively small system sizes, and precludes the use of GEMC for the simulation of equilibria in most biological systems.
There have been numerous attempts to parallelize Monte Carlo simulations based on three core algorithms: embarrassingly parallel replication, the farm algorithm, and domain decomposition. The embarrassingly parallel approach is the simplest to implement; a replica of the system is placed on each processor core, and each instance runs independently without any communication between processors. For systems with short equilibration periods, parallel efficiencies near 100% are possible. However, for systems with long equilibration periods, this method quickly becomes inefficient. In the farm algorithm [27], also known as energetic decomposition, the energy calculation is distributed over the available processing cores. Some efforts have shown improved performance with this method [28], but the farm algorithm has been avoided due to communications overhead noted in early comparative studies between energetic decomposition and embarrassingly parallel algorithms [29]. Domain decomposition is an example of a modified Markov chain algorithm [30] and leverages the fact that in large systems, particles outside the range of interactions of other particles may be moved simultaneously. The simulation box is split into cells, and the interactions within each cell are calculated on a single CPU core. The amount of inter-processor communication is reduced compared to the farm method. In early work, domain decomposition was dismissed as inefficient, since its performance was worse than the embarrassingly parallel method [31]. More recently, however, it has been shown that by using a sequential updating algorithm the parallel efficiency of domain decomposition may be improved significantly [32], [33], [34]. Additional modified Markov chain algorithms have been proposed in which displacement of all particles, or a subset of particles, is attempted simultaneously [35]. These methods are referred to as hybrid Monte Carlo methods, because new trial locations are determined from particle velocities, as in Newtonian-physics-based molecular dynamics simulations. The efficiency of hybrid Monte Carlo methods tends to be dictated by the phase [36] and type of atoms or molecules studied [37].
In addition to general methods of parallelizing atomistic Monte Carlo simulations, some specialized ensemble-specific techniques have been demonstrated. Chen and Hirtzel proposed a macro state Markov chain model (MSMCM) for parallelization of simulations in the grand canonical ensemble [38], [39], [40]. Spatial updating algorithms, combined with sequential updating and domain decomposition, have been used to improve the efficiency of parallel Monte Carlo simulations in the grand canonical ensemble [34]. Parallel grand canonical Monte Carlo simulations have been combined with molecular dynamics for the simulation of diffusion in porous materials [41], [42]. Additional parallelization efforts have focused on the configurational-bias algorithm, which is crucial to achieving reasonable acceptance rates for molecule transfers in grand canonical and Gibbs ensemble Monte Carlo simulations. In order to improve the success rate for growing chain molecules in a dense system, Esselink et al. proposed evaluating multiple trial locations for the first bead in parallel [43]. This methodology for particle exchange was combined with hybrid Monte Carlo moves for particle displacement in Gibbs ensemble Monte Carlo simulations, where a parallel efficiency of approximately 80% was observed [36], [43].
In recent years the field of parallel computing has shifted its focus from the CPU to a new kind of hardware, the graphics processing unit (GPU), which was originally created to satisfy the demand for increasing realism in virtual video game environments [44]. GPUs have outpaced CPUs in terms of computational power per unit of electrical power consumed and computational power per unit of hardware cost. GPUs typically devote more transistors to fast parallel math processing than CPUs, at the expense of flexibility. Compared to the relatively small collection of complex processing cores found in modern CPUs, GPUs are composed of a collection of simpler processors (streaming multiprocessors, or SMs, each composed of dozens of shader-core subunits) capable of massively parallel math operations. GPUs have several types of volatile storage available for simulation variables: DRAM, constant cache memory, global cache memory, shared memory and registers. DRAM has the highest latency, while registers offer the lowest latency. However, the DRAM bank has significantly larger capacity than the registers or caches; hence, part of the challenge in developing efficient algorithms for the GPU is optimizing variable distribution and memory throughput.
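As a minimal CUDA sketch of how these memory spaces are declared (all identifiers here are hypothetical illustrations, not taken from the authors' code): small, read-only potential parameters fit naturally in constant memory, per-block scratch data in shared memory, scalar locals in registers, and large coordinate arrays in global DRAM.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch of the CUDA memory spaces described above.
__constant__ float d_epsilon;   // well depth: constant memory, cached on chip
__constant__ float d_sigma;     // particle diameter: constant memory

__global__ void memorySpacesDemo(const float* __restrict__ x,  // global DRAM
                                 float* out, int n)
{
    __shared__ float tile[256];          // shared memory: one tile per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float r = (i < n) ? x[i] : 0.0f;     // register-resident local variable
    tile[threadIdx.x] = r * d_sigma;     // stage through low-latency shared memory
    __syncthreads();
    if (i < n) out[i] = tile[threadIdx.x] * d_epsilon;
}
```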
Given the inherent hardware differences between multi-core CPUs and GPUs, the creation of optimized GPU algorithms requires a fundamental reexamination of parallelization methods. Some methods that appear less viable for CPU parallelization may perform well on the GPU, and the converse may also be true. Additionally, there is an opportunity to develop new algorithms and techniques that were not previously considered, owing to the architectural differences between single-instruction multiple-data (SIMD) devices and traditional CPU-based multiple-instruction multiple-data (MIMD) clusters and supercomputers. Despite relatively mature GPU-enabled molecular dynamics simulation engines [25], [45], [46], [47], [48], [49], there has been significantly less exploration of Monte Carlo simulations on the GPU. As with molecular dynamics, general-purpose GPU (GPGPU) computing is expected to bring sizeable benefits to Monte Carlo simulations. Early work has focused on simple systems, due to the relative complexity of writing optimal GPU code. Simulations of two- and three-dimensional Ising models were recently accelerated using single [50] and multiple GPUs [51]. GPU-driven canonical ensemble simulations of hard spheres [52] have also been published. Canonical ensemble simulations of methane in a zeolite framework have been performed on the GPU for small systems [53]. This code was later expanded to perform GPU-accelerated grand canonical Monte Carlo simulations of methane and CO2 in a zeolite [54], [55], [56].
In this work, an implementation of Gibbs ensemble Monte Carlo on the GPU is presented. A straightforward implementation of the farm algorithm is used to parallelize the pair-wise energy calculations for the three distinct move types: particle displacement, particle swap and volume exchange. The choice of the farm algorithm is motivated by the fact that for complex molecules, such as those found in biomolecular simulations, long equilibration periods are required for adequate sampling of phase space, which reduces significantly the efficiency of embarrassingly parallel methods. Simulations of biomolecular systems present an additional problem of system size: while Monte Carlo simulations for phase equilibria and physical-property prediction are performed typically for systems of 5000 atoms or fewer, typical biomolecular simulations contain tens to hundreds of thousands of atoms. Therefore, it is important to have an implementation that scales well with system size. As a demonstration, calculations are performed for systems of Lennard-Jones beads ranging in size from 512 to 131,072 particles. While the properties of the Lennard-Jones fluid may be studied easily with a serial code, the wealth of data in the literature makes this the ideal system for the evaluation of new methodologies. Calculated energies, pressures, and saturated liquid and vapor densities are presented for a range of temperatures and compared to literature data for validation. Detailed assessments of the performance of the GPU-accelerated GEMC code relative to an equivalently designed and optimized serial code are presented for a range of GPUs and CPUs. Additional data are presented for simulations in the canonical ensemble.
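To make the farm-style energetic decomposition concrete, the following is a minimal CUDA sketch of the energy evaluation for a trial displacement: each thread accumulates the truncated Lennard-Jones interactions of the moved particle with a strided subset of partners, and a shared-memory reduction produces the total. This is an illustrative sketch under stated assumptions (reduced units with σ = ε = 1, a cubic periodic box, single precision, one block of 256 threads), not the authors' implementation; all identifiers are hypothetical.

```cuda
#include <cuda_runtime.h>

__constant__ float c_rcutSq;   // squared cutoff distance (reduced units)
__constant__ float c_boxLen;   // edge length of the cubic simulation box

// Truncated LJ pair energy in reduced units (sigma = epsilon = 1),
// with the minimum-image convention applied to each displacement.
__device__ float ljPairEnergy(float dx, float dy, float dz)
{
    dx -= c_boxLen * rintf(dx / c_boxLen);
    dy -= c_boxLen * rintf(dy / c_boxLen);
    dz -= c_boxLen * rintf(dz / c_boxLen);
    float r2 = dx * dx + dy * dy + dz * dz;
    if (r2 >= c_rcutSq) return 0.0f;
    float inv6 = 1.0f / (r2 * r2 * r2);      // (1/r)^6
    return 4.0f * inv6 * (inv6 - 1.0f);      // 4[(1/r)^12 - (1/r)^6]
}

// Farm decomposition for one displacement move: the interactions of
// particle `moved` at its trial coordinates are spread across the
// threads of a single 256-thread block, then summed by tree reduction.
__global__ void movedParticleEnergy(const float3* pos, int n, int moved,
                                    float3 trial, float* energyOut)
{
    __shared__ float partial[256];
    float e = 0.0f;
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        if (j != moved)
            e += ljPairEnergy(trial.x - pos[j].x,
                              trial.y - pos[j].y,
                              trial.z - pos[j].z);
    partial[threadIdx.x] = e;
    __syncthreads();
    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) *energyOut = partial[0];
}
```

The old-configuration energy would be computed the same way using the current coordinates of the moved particle, with the host applying the Metropolis criterion to the resulting ΔU; volume and swap moves require analogous all-pairs kernels.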
GPU implementation
The Gibbs ensemble methodology utilizes two simulation boxes, each representing a region of fluid deep within one of the coexisting phases (vapor or liquid) [1]. The three criteria for equilibrium, equality of temperature, pressure and chemical potential between the phases, are satisfied through three types of moves: particle displacement within a phase, volume exchange and particle swaps between phases. Acceptance probabilities for each move are defined as follows.
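The snippet truncates before the expressions; for reference, a sketch of the standard acceptance criteria for the constant-NVT Gibbs ensemble [1], with $\beta = 1/k_B T$ and $\Delta U$ the total change in configurational energy, is:

```latex
% Standard Gibbs-ensemble acceptance probabilities (after Ref. [1]).
% P_disp: particle displacement within one box; P_vol: exchange of a
% volume increment dV between boxes I and II; P_swap: transfer of a
% particle from box II into box I.
\begin{align}
  P_{\mathrm{disp}} &= \min\!\left[1,\; e^{-\beta\,\Delta U}\right]\\
  P_{\mathrm{vol}}  &= \min\!\left[1,\;
      \left(\frac{V_{\mathrm{I}}+\Delta V}{V_{\mathrm{I}}}\right)^{N_{\mathrm{I}}}
      \left(\frac{V_{\mathrm{II}}-\Delta V}{V_{\mathrm{II}}}\right)^{N_{\mathrm{II}}}
      e^{-\beta\,\Delta U}\right]\\
  P_{\mathrm{swap}} &= \min\!\left[1,\;
      \frac{N_{\mathrm{II}}\,V_{\mathrm{I}}}{(N_{\mathrm{I}}+1)\,V_{\mathrm{II}}}\,
      e^{-\beta\,\Delta U}\right]
\end{align}
```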
Simulation details
In this work, interactions between particles are governed by the ubiquitous Lennard-Jones potential
$$U(r_{ij}) = 4\epsilon\left[\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right]$$
where $U(r_{ij})$ is the configurational energy of the pair interaction, $\epsilon$ is the well depth, $\sigma$ is the particle diameter and $r_{ij}$ is the distance between particles $i$ and $j$. Reduced quantities were defined in the standard way, e.g. $T^{*} = k_B T/\epsilon$, $\rho^{*} = \rho\sigma^{3}$, $P^{*} = P\sigma^{3}/\epsilon$ and $U^{*} = U/\epsilon$.
Simulations were performed for both the truncated and tail-corrected Lennard-Jones potentials. Analytical tail corrections were applied to the configurational energy and pressure.
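The snippet truncates mid-sentence here; for reference, the standard analytical long-range corrections for a Lennard-Jones potential truncated at $r_c$, assuming a uniform fluid ($g(r) = 1$) beyond the cutoff, take the form:

```latex
% Standard LJ long-range (tail) corrections, assuming g(r) = 1 beyond
% the cutoff r_c; U_tail/N is the per-particle energy correction and
% P_tail the pressure correction.
\begin{align}
  \frac{U_{\mathrm{tail}}}{N} &= \frac{8\pi\rho\epsilon\sigma^{3}}{3}
      \left[\frac{1}{3}\left(\frac{\sigma}{r_c}\right)^{9}
          - \left(\frac{\sigma}{r_c}\right)^{3}\right]\\
  P_{\mathrm{tail}} &= \frac{16\pi\rho^{2}\epsilon\sigma^{3}}{3}
      \left[\frac{2}{3}\left(\frac{\sigma}{r_c}\right)^{9}
          - \left(\frac{\sigma}{r_c}\right)^{3}\right]
\end{align}
```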
Results and discussion
The GPU-accelerated Gibbs ensemble Monte Carlo engine was used to determine vapor–liquid coexistence curves for the truncated and long-range-corrected Lennard-Jones potentials for systems containing 10,000 atoms. These data are shown in Fig. 3 and show excellent agreement with the original Gibbs ensemble results of Panagiotopoulos for systems containing 500 atoms [1], as well as with more modern approaches using grand canonical Monte Carlo simulations coupled with histogram-reweighting methods [57].
Conclusions
In this work, an implementation of Gibbs ensemble Monte Carlo on the GPU has been demonstrated. Parallelization of the routines that calculate pair-wise energies was performed using the method of energetic decomposition, where each pair interaction was calculated by a particular thread. This method is naturally well suited to the architecture of the GPU, but has limitations. For small system sizes, it was found that overhead from memory transfers between system RAM and the GPU led to diminished performance gains.
Acknowledgments
Financial support from the National Science Foundation (NSF CBET-0730786, OCI-1148168) and the Wayne State University Research Enhancement Program is gratefully acknowledged.
References (63)
- et al., J. Colloid Interface Sci. (2007)
- et al., Fluid Phase Equilib. (1995)
- J. Comput. Phys. (1995)
- et al., J. Comput. Phys. (2008)
- Comput. Phys. Comm. (2000)
- et al., Chem. Eng. Sci. (1994)
- et al., Sep. Technol. (1994)
- et al., Comput. Phys. Comm. (2011)
- et al., J. Comput. Phys. (2011)
- et al., Comput. Phys. Comm. (2011)
- Adapting a message-driven parallel application to GPU-accelerated clusters
- J. Comput. Phys.
- Comput. Phys. Comm.
- Fluid Phase Equilib.
- Mol. Phys.
- Mol. Phys.
- Mol. Phys.
- J. Phys. Chem. B
- Mol. Phys.
- Mol. Phys.
- Macromolecules
- Mol. Phys.
- Phys. Rev. Lett.
- Mol. Phys.
- J. Appl. Phys.
- Phys. Rev. Lett.
- Mol. Phys.
- Mol. Phys.
- Langmuir
- Langmuir
- Mol. Simul.