GPU-accelerated Gibbs ensemble Monte Carlo simulations of Lennard-Jonesium

https://doi.org/10.1016/j.cpc.2013.06.020

Abstract

This work describes an implementation of canonical and Gibbs ensemble Monte Carlo simulations on graphics processing units (GPUs). The pair-wise energy calculations, which consume the majority of the computational effort, are parallelized using the energetic decomposition algorithm. While energetic decomposition is relatively inefficient for traditional CPU-bound codes, the algorithm is ideally suited to the architecture of the GPU. The performance of the CPU and GPU codes is assessed for a variety of CPU and GPU combinations for systems containing between 512 and 131,072 particles. For a system of 131,072 particles, the GPU-enabled canonical and Gibbs ensemble codes were 10.3 and 29.1 times faster (GTX 480 GPU vs. i5-2500K CPU), respectively, than an optimized serial CPU-bound code. Due to the overhead of memory transfers from system RAM to the GPU, the CPU code was slightly faster than the GPU code for simulations containing fewer than 600 particles. The critical temperature T_c* = 1.312(2) and critical density ρ_c* = 0.316(3) were determined for the tail-corrected Lennard-Jones potential from simulations of 10,000-particle systems, and found to be in exact agreement with prior mixed-field finite-size scaling calculations [J.J. Potoff, A.Z. Panagiotopoulos, J. Chem. Phys. 109 (1998) 10914].

Introduction

Over the last 50 years, Monte Carlo simulations have been used extensively to provide fundamental insight regarding the relationship between atomic-level interactions and macro-scale phenomena. In the late 1980s through the 1990s, a proliferation of new algorithms, including Gibbs ensemble [1], [2], configurational-bias [3], [4], [5], [6], [7], Gibbs–Duhem integration [8], and histogram-reweighting techniques [9], [10], [11], [12], greatly simplified the determination of vapor–liquid coexistence curves. The Gibbs ensemble method is particularly versatile; in addition to vapor–liquid coexistence, it has been used to determine adsorption equilibria in single- [13], [14], [15], [16] and multi-component systems [17], [18], membrane equilibria, and solid–fluid [19] and solid–vapor [20], [21] equilibria. Unlike molecular dynamics, for which highly optimized parallel codes are widely available to the public [22], [23], [24], [25], [26], most Gibbs ensemble Monte Carlo (GEMC) calculations have been performed with codes that execute on a single CPU core. This limits calculations to relatively small system sizes (<5000 atoms) and precludes the use of GEMC for the simulation of equilibria in most biological systems.

There have been numerous attempts to parallelize Monte Carlo simulations based on three core algorithms: embarrassingly parallel, farm, and domain decomposition. The embarrassingly parallel approach is the simplest to implement; a replica of the system is placed on each processor core, and each instance runs independently without any communication between processors. For systems with short equilibration periods, parallel efficiencies near 100% are possible; for systems with long equilibration periods, however, this method quickly becomes inefficient. In the farm algorithm [27], also known as energetic decomposition, the energy calculation is distributed over the available processing cores. Some efforts have shown improved performance with this method [28], but the farm algorithm has generally been avoided due to the communications overhead noted in early comparative studies between energetic decomposition and embarrassingly parallel algorithms [29]. Domain decomposition is an example of a modified Markov chain algorithm [30], and leverages the fact that in large systems, particles outside the range of interaction of other particles may be moved simultaneously. The simulation box is split into cells, and the interactions within each cell are calculated on a single CPU core, which reduces the amount of inter-processor communication relative to the farm method. In early work, domain decomposition was dismissed as inefficient, since its performance was worse than that of the embarrassingly parallel method [31]. More recently, however, it has been shown that a sequential updating algorithm can improve the parallel efficiency of domain decomposition significantly [32], [33], [34]. Additional modified Markov chain algorithms have been proposed in which displacement of all particles, or a subset of particles, is attempted simultaneously [35]. These methods are referred to as hybrid Monte Carlo methods because new trial locations are generated from particle velocities, as in Newtonian-physics-based molecular dynamics simulations. The efficiency of hybrid Monte Carlo methods tends to be dictated by the phase [36] and the type of atoms or molecules studied [37].
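To make the farm (energetic decomposition) approach concrete, the sketch below shows how the energy change of a single-particle displacement could be evaluated on a GPU: each thread computes one pair interaction involving the moved particle, and a shared-memory reduction accumulates the partial sums within each block. This is an illustrative CUDA kernel written for this discussion, not the authors' code; the kernel name, argument list and launch conventions are hypothetical.

    // Hypothetical sketch of energetic (farm) decomposition for one displacement
    // move: thread j computes the pair energy between the trial position and
    // particle j; a shared-memory reduction accumulates per-block partial sums.
    // Assumes blockDim.x is a power of two and dynamic shared memory of
    // blockDim.x * sizeof(double) bytes is supplied at launch.
    __global__ void pairEnergyWithParticle(const double3 *pos, int n, int moved,
                                           double3 trial, double sigma2,
                                           double epsilon, double rcut2,
                                           double3 box, double *blockSums)
    {
        extern __shared__ double cache[];
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        double u = 0.0;

        if (j < n && j != moved) {
            // minimum-image separation between the trial position and particle j
            double dx = trial.x - pos[j].x;  dx -= box.x * rint(dx / box.x);
            double dy = trial.y - pos[j].y;  dy -= box.y * rint(dy / box.y);
            double dz = trial.z - pos[j].z;  dz -= box.z * rint(dz / box.z);
            double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < rcut2) {
                double s6 = sigma2 / r2;
                s6 = s6 * s6 * s6;                       // (sigma/r)^6
                u = 4.0 * epsilon * (s6 * s6 - s6);      // Lennard-Jones pair energy
            }
        }

        // block-level reduction of the partial energies
        cache[threadIdx.x] = u;
        __syncthreads();
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (threadIdx.x < stride)
                cache[threadIdx.x] += cache[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0)
            blockSums[blockIdx.x] = cache[0];
    }

On the host side, the per-block sums would be reduced to obtain the interaction energy of the trial position, the same kernel (or a cached value) supplies the energy at the old position, and the Metropolis criterion decides acceptance; volume-exchange and swap moves can reuse the same pair-energy machinery.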

In addition to general methods of parallelizing atomistic Monte Carlo simulations, some specialized ensemble-specific techniques have been demonstrated. Chen and Hirtzel proposed a macro-state Markov chain model (MSMCM) for parallelization of simulations in the grand canonical ensemble [38], [39], [40]. Spatial updating algorithms, combined with sequential updating and domain decomposition, have been used to improve the efficiency of parallel Monte Carlo simulations in the grand canonical ensemble [34]. Parallel grand canonical Monte Carlo simulations have also been combined with molecular dynamics for the simulation of diffusion in porous materials [41], [42]. Additional parallelization efforts have focused on the configurational-bias algorithm, which is crucial to achieving reasonable acceptance rates for molecule transfers in grand canonical and Gibbs ensemble Monte Carlo simulations. To improve the success rate for growing chain molecules in a dense system, Esselink et al. proposed evaluating multiple trial locations for the first bead in parallel [43]. This methodology for particle exchange was combined with hybrid Monte Carlo moves for particle displacement in Gibbs ensemble Monte Carlo, where a parallel efficiency of approximately 80% was observed [36], [43].

In recent years the field of parallel computing has shifted its focus from the CPU to a new kind of hardware, the graphics processing unit (GPU), which was originally created to satisfy the demand for increasing realism in virtual video game environments [44]. GPUs have outpaced CPUs in terms of computational power per unit of electrical power consumed and per unit of hardware cost. GPUs typically devote more transistors to fast parallel math processing than CPUs, at the expense of flexibility. Compared to the relatively small collection of complex processing cores found in a modern CPU, GPUs are composed of a collection of simpler processors (streaming multiprocessors, or "SMs", each composed of dozens of shader core subunits) capable of massively parallel math operations. GPUs have several types of volatile storage available for simulation variables: DRAM, constant cache memory, global cache memory, shared memory and registers. DRAM has the highest latency, while registers offer the lowest latency. However, the DRAM bank has significantly larger capacity than the registers or caches; hence, part of the challenge in developing efficient GPU algorithms is optimizing how variables are distributed among these memory spaces and how memory throughput is used.
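As a small illustration of this memory hierarchy, a sketch (with assumed, hypothetical names, not the authors' implementation) might place small read-only quantities such as the Lennard-Jones parameters and cutoff in constant memory, where they are cached and broadcast cheaply to every thread, while the large coordinate arrays reside in global DRAM:

    // Hypothetical sketch of variable placement on the GPU: small, read-only
    // simulation parameters live in constant memory; large coordinate arrays
    // live in global DRAM; per-block scratch space would use shared memory.
    #include <cuda_runtime.h>

    struct SimParams {
        double sigma2;    // sigma squared
        double epsilon;   // well depth
        double rcut2;     // cutoff radius squared
    };

    __constant__ SimParams d_params;   // constant memory, read by every thread

    int main()
    {
        SimParams h_params = {1.0, 1.0, 2.5 * 2.5};
        cudaMemcpyToSymbol(d_params, &h_params, sizeof(SimParams));

        const int n = 131072;
        double3 *d_pos;                                // global DRAM: coordinates
        cudaMalloc(&d_pos, n * sizeof(double3));
        // ... copy coordinates to the device, launch energy kernels, etc.
        cudaFree(d_pos);
        return 0;
    }

Per-block intermediate results, such as the partial energy sums in the reduction sketched earlier, would be staged in shared memory.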

Given the inherent hardware differences between multi-core CPUs and GPUs, the creation of optimized GPU algorithms requires a fundamental reexamination of parallelization methods. Some methods that appear less viable for CPU parallelization may perform well on the GPU, and the converse may also be true. Additionally, there is an opportunity to develop new algorithms and techniques that were not previously considered or postulated, owing to the architectural differences between single-instruction multiple-data (SIMD) devices and the traditional CPU-based multiple-instruction multiple-data (MIMD) clusters and supercomputers. Despite relatively mature GPU-enabled molecular dynamics simulation engines [25], [45], [46], [47], [48], [49], there has been significantly less exploration of Monte Carlo simulations on the GPU. As with molecular dynamics, GPGPU computing is expected to bring sizeable benefits to Monte Carlo simulations. Early work has focused on simple systems, due to the relative complexity of writing optimal GPU code. 2D and 3D simulations of Ising models were recently accelerated using single [50] and multiple GPUs [51]. GPU-driven canonical ensemble simulations of hard spheres have also been published [52]. Canonical ensemble simulations of methane in a zeolite framework have been performed on the GPU for small systems (N_methane ≤ 128 particles) [53]. This code was later expanded to perform GPU-accelerated grand canonical Monte Carlo simulations of methane and CO2 in a zeolite [54], [55], [56].

In this work, an implementation of Gibbs ensemble Monte Carlo on the GPU is presented. A straightforward implementation of the farm algorithm is used to parallelize the pair-wise energy calculations for the three distinct move types: particle displacement, particle swap and volume exchange. The choice of the farm algorithm is motivated by the fact that complex molecules, such as those found in biomolecular simulations, require long equilibration periods for adequate sampling of phase space, which significantly reduces the efficiency of embarrassingly parallel methods. Simulations of biomolecular systems present an additional problem of system size. While Monte Carlo simulations for phase equilibria and physical property prediction are typically performed for systems of 5000 atoms or fewer, typical biomolecular simulations contain tens to hundreds of thousands of atoms. Therefore, it is important to have an implementation that scales well with system size. As a demonstration, calculations are performed for systems of Lennard-Jones beads ranging in size from 512 to 131,072 particles. While the properties of the Lennard-Jones fluid may easily be studied with a serial code, the wealth of data in the literature makes this an ideal system for the evaluation of new methodologies. Calculated energies, pressures, and saturated liquid and vapor densities are presented for a range of temperatures and compared to literature data for validation. Detailed assessments of the performance of the GPU-accelerated GEMC code relative to an equivalently designed and optimized serial code are presented for a range of GPUs and CPUs. Additional data are presented for simulations in the canonical ensemble.
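A host-side driver for such a code might alternate between the three move types with fixed probabilities, as sketched below; the move mix and the attempt* functions are hypothetical placeholders for routines that would launch the GPU energy kernels, not a transcription of the authors' driver.

    // Hypothetical host-side sketch of one Gibbs ensemble MC cycle selecting
    // among the three move types; the attempt* functions are placeholders for
    // routines that launch the GPU pair-energy kernels and apply the
    // appropriate acceptance criterion.
    #include <random>

    static void attemptDisplacement(std::mt19937 &)   { /* displace one particle */ }
    static void attemptVolumeExchange(std::mt19937 &) { /* exchange volume       */ }
    static void attemptSwap(std::mt19937 &)           { /* transfer one particle */ }

    void runCycle(std::mt19937 &rng, int movesPerCycle)
    {
        // Illustrative move mix: mostly displacements, occasional volume and swap moves.
        const double pVolume = 0.01, pSwap = 0.10;
        std::uniform_real_distribution<double> pick(0.0, 1.0);

        for (int m = 0; m < movesPerCycle; ++m) {
            double x = pick(rng);
            if (x < pVolume)
                attemptVolumeExchange(rng);
            else if (x < pVolume + pSwap)
                attemptSwap(rng);
            else
                attemptDisplacement(rng);
        }
    }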

Section snippets

GPU implementation

The Gibbs ensemble methodology utilizes two simulation boxes, each representing a region of fluid deep within its respective phase (vapor, liquid) [1]. The three criteria for equilibrium, equality of temperature, pressure and chemical potential between the phases, are satisfied through three types of moves: particle displacement within a phase, volume exchange, and particle swaps between phases. Acceptance probabilities are defined for each move; for a particle displacement from old configuration o to new configuration n,

acc(o → n) = e^(−β[U_new − U_old])
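The snippet above is truncated; for reference, the standard Gibbs ensemble acceptance criteria (in the form introduced by Panagiotopoulos [1]) are reproduced below as the textbook expressions, not a transcription of the authors' equations. Here ΔU_I and ΔU_II are the energy changes in boxes I and II, N_I and N_II the particle numbers, and V_I and V_II the box volumes.

    % particle displacement within one box
    \mathrm{acc}(o \to n) = \min\left[1,\; e^{-\beta \Delta U}\right]

    % volume exchange: \Delta V transferred from box II to box I at constant total volume
    \mathrm{acc}(o \to n) = \min\left[1,\;
        e^{-\beta(\Delta U_{\mathrm{I}} + \Delta U_{\mathrm{II}})}
        \left(\frac{V_{\mathrm{I}} + \Delta V}{V_{\mathrm{I}}}\right)^{N_{\mathrm{I}}}
        \left(\frac{V_{\mathrm{II}} - \Delta V}{V_{\mathrm{II}}}\right)^{N_{\mathrm{II}}}\right]

    % particle swap: transfer of one particle from box II to box I
    \mathrm{acc}(o \to n) = \min\left[1,\;
        \frac{N_{\mathrm{II}}\, V_{\mathrm{I}}}{(N_{\mathrm{I}} + 1)\, V_{\mathrm{II}}}\,
        e^{-\beta(\Delta U_{\mathrm{I}} + \Delta U_{\mathrm{II}})}\right]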

Simulation details

In this work, interactions between particles are governed by the ubiquitous Lennard-Jones potential

U(r_ij) = 4ε_ij [ (σ_ij/r_ij)^12 − (σ_ij/r_ij)^6 ]

where U is the configurational energy of the pair interaction, ε_ij is the well depth, σ_ij is the particle diameter and r_ij is the distance between particles i and j. Reduced quantities were defined as T* = k_B T/ε, ρ* = Nσ^3/V, U* = U/ε, and p* = pσ^3/ε.
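In reduced units (σ = ε = 1) the truncated pair energy takes a particularly simple form; the device function below is a minimal sketch written for illustration (the function name and cutoff handling are assumptions, not the authors' code):

    // Minimal sketch of the Lennard-Jones pair energy in reduced units
    // (sigma = epsilon = 1) for a potential truncated at r_c; r2 and rcut2
    // are the squared separation and squared cutoff in reduced units.
    __host__ __device__ inline double ljEnergyReduced(double r2, double rcut2)
    {
        if (r2 >= rcut2) return 0.0;
        double inv6 = 1.0 / (r2 * r2 * r2);        // (1/r*)^6
        return 4.0 * (inv6 * inv6 - inv6);         // U* = 4[(1/r*)^12 - (1/r*)^6]
    }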

Simulations were performed for both the truncated and tail-corrected Lennard-Jones potential. Analytical tail corrections were
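The snippet ends mid-sentence here; for reference, the standard analytical tail corrections for a Lennard-Jones fluid truncated at r_c (the usual textbook expressions, which may differ in presentation from the equations omitted from this excerpt) are, in reduced units,

    \frac{U^{*}_{\mathrm{tail}}}{N} = \frac{8}{3}\pi\rho^{*}
        \left[\frac{1}{3}\left(\frac{1}{r_c^{*}}\right)^{9} - \left(\frac{1}{r_c^{*}}\right)^{3}\right],
    \qquad
    p^{*}_{\mathrm{tail}} = \frac{16}{3}\pi\rho^{*2}
        \left[\frac{2}{3}\left(\frac{1}{r_c^{*}}\right)^{9} - \left(\frac{1}{r_c^{*}}\right)^{3}\right]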

Results and discussion

The GPU-accelerated Gibbs ensemble Monte Carlo engine was used to determine vapor–liquid coexistence curves for the truncated and long-range-corrected Lennard-Jones potential for systems containing 10,000 atoms. These data are shown in Fig. 3 and are in excellent agreement with the original Gibbs ensemble results of Panagiotopoulos for systems containing 500 atoms [1], as well as with more modern approaches using grand canonical Monte Carlo simulations coupled with histogram-reweighting methods [57]

Conclusions

In this work, an implementation of Gibbs ensemble Monte Carlo on the GPU has been demonstrated. Parallelization of the routines that calculate pair-wise energies was performed using the method of energetic decomposition, in which each pair interaction is calculated by a particular thread. This method is naturally well suited to the architecture of the GPU, but has limitations. For small system sizes, N < 600 particles, it was found that overhead from memory transfers between system RAM and the GPU led to

Acknowledgments

Financial support from the National Science Foundation (NSF CBET-0730786, OCI-1148168) and the Wayne State University Research Enhancement Program is gratefully acknowledged.

References (63)

  • X. Peng et al., J. Colloid Interface Sci. (2007)
  • T. Okayama et al., Fluid Phase Equilib. (1995)
  • S. Plimpton, J. Comput. Phys. (1995)
  • J.A. Anderson et al., J. Comput. Phys. (2008)
  • G.S. Heffelfinger, Comput. Phys. Comm. (2000)
  • F.A. Brotz et al., Chem. Eng. Sci. (1994)
  • A. Chen et al., Sep. Technol. (1994)
  • T.D. Nguyen et al., Comput. Phys. Comm. (2011)
  • C.L. Phillips et al., J. Comput. Phys. (2011)
  • W.M. Brown et al., Comput. Phys. Comm. (2011)
  • J.C. Phillips et al., Adapting a message-driven parallel application to GPU-accelerated clusters
  • T. Preis et al., J. Comput. Phys. (2009)
  • B. Block et al., Comput. Phys. Comm. (2010)
  • W. Shi et al., Fluid Phase Equilib. (2001)
  • A.Z. Panagiotopoulos, Mol. Phys. (1987)
  • B. Smit et al., Mol. Phys. (1989)
  • J.J. de Pablo et al., Mol. Phys. (1993)
  • M.G. Martin et al., J. Phys. Chem. B (1999)
  • J.I. Siepmann et al., Mol. Phys. (1992)
  • T.J.H. Vlugt et al., Mol. Phys. (1998)
  • C.D. Wick et al., Macromolecules (2000)
  • D.A. Kofke, Mol. Phys. (1993)
  • A.M. Ferrenberg et al., Phys. Rev. Lett. (1988)
  • J.J. Potoff et al., Mol. Phys. (1999)
  • A.M. Ferrenberg et al., J. Appl. Phys. (1991)
  • A.M. Ferrenberg et al., Phys. Rev. Lett. (1989)
  • A.Z. Panagiotopoulos, Mol. Phys. (1987)
  • S.C. McGrother et al., Mol. Phys. (1999)
  • C. Lastoskie et al., Langmuir (1993)
  • J.W. Jiang et al., Langmuir (2003)
  • M.B. Sweatman et al., Mol. Simul. (2004)