Performance Analysis of Different Computational Architectures: Molecular Dynamics in Application to Protein Assemblies, Illustrated by Microtubule and Electron Transfer Proteins

All-atom molecular dynamics simulation is a computationally challenging but powerful approach for studying conformational changes and interactions of biomolecules and their assemblies of various kinds. In modern molecular dynamics studies, the number of simulated particles typically ranges from thousands to tens of millions, while the simulated timescales span nanoseconds to microseconds. For cost and computational efficiency, it is important to determine the optimal computer hardware for simulations of biomolecular systems of different sizes and timescales. Here we compare the performance and scalability of 17 commercially available computational architectures, using molecular dynamics simulations of water and two different protein systems in the GROMACS-5 package as computing benchmarks. We report typical single-node performance of various combinations of modern CPUs and GPUs, as well as multiple-node performance of the “Lomonosov-2” supercomputer, in molecular dynamics simulations of different protein systems in nanoseconds per day. These data can serve as practical guidelines for selecting optimal computer hardware for various molecular dynamics simulation tasks.


Introduction
Molecular dynamics (MD) is a powerful method for studying the conformational dynamics and interactions of biomolecules, including protein assemblies. Because of the large number of particles that make up protein systems and the many computational steps usually required to achieve meaningful results, MD simulations of proteins represent a major computational challenge. Therefore, identification and use of optimal hardware for high-efficiency calculations is important. In this work we systematically compare the MD simulation performance of multiple currently available computer architectures, using two types of biomolecular systems as computing benchmarks: (i) water boxes of different sizes and (ii) two protein systems. The selected protein systems include different assemblies of tubulins, the building blocks of microtubules [5], and a photosynthetic electron-transfer complex of the plastocyanin and cytochrome f proteins [4].

Methods
All-atom explicit-solvent MD was used in all tests. Calculations were performed with the GROMACS-5 software package [3], which supports parallel computing on hybrid CPU/GPU architectures, using the CHARMM27 force field. All benchmarks were run for 15 minutes. The TIP3P water model was employed. The protein structures were obtained from the Protein Data Bank: the higher-plant plastocyanin–cytochrome f complex (PDB ID 2PCF) and the GMPCPP-bound tubulin structure (PDB ID 3J6E). The size of the simulation cell was chosen such that the distance from the protein surface to the nearest box boundary was no less than 2 nm. The particle mesh Ewald method was used for long-range electrostatics. All-bond P-LINCS constraints and mass rescaling were applied to the tested protein systems. Both the Coulomb and Lennard-Jones cut-offs were set to 1.25 nm. Specifications of the MD systems used for benchmarking are summarized in Tab. 1.
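The key non-bonded and constraint settings described above correspond to the following GROMACS .mdp fragment (a minimal sketch covering only the parameters stated in this section; all other run options are left to the defaults of a given study):

```
; Long-range electrostatics: particle mesh Ewald
coulombtype          = PME
rcoulomb             = 1.25      ; Coulomb cut-off, nm
rvdw                 = 1.25      ; Lennard-Jones cut-off, nm
; All-bond constraints via (P-)LINCS
constraints          = all-bonds
constraint-algorithm = lincs
```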

Results and Discussion
To begin with, we used our first benchmark, the water box, to examine MD simulation performance as a function of the number of particles in the molecular system. We conducted MD simulations of water boxes of various sizes using 17 different single-node computer systems with various CPU/GPU architectures. The data summarized in Tab. 2 show that an increase in MD system size leads to a disproportionate decrease in performance; however, the relative extent of the decrease is almost the same across the different computer systems. Not surprisingly, the 2×Intel Xeon E5-2695 with four Tesla K80 GPUs shows the highest performance for all the tested MD systems. However, the Intel Core i7-5930K with a GTX 980 has the best performance-to-price ratio of all the hardware configurations we tested, consistent with the conclusions of a previous study [2].
To further address the question of scalability, we used our second type of computing benchmark and established how the performance of the "Lomonosov-2" supercomputer in MD simulations depends on the number of compute nodes used. As expected, for all three tested protein systems the performance grew with the number of supercomputer nodes (Tab. 3). The relative rate of that growth did not depend significantly on the type and size of the biological system, and it slowed down gradually, roughly following Amdahl's law.
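The gradual saturation of multi-node scaling can be illustrated with Amdahl's law, which bounds the speedup on n nodes by the serial fraction of the workload. The sketch below uses a hypothetical serial fraction for illustration only, not a value fitted to the data in Tab. 3:

```python
# Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n),
# where s is the serial (non-parallelizable) fraction of the work.

def amdahl_speedup(n_nodes: int, serial_fraction: float) -> float:
    """Ideal speedup on n_nodes for a workload with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

# Hypothetical serial fraction of 5% (an assumption for illustration):
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(amdahl_speedup(n, 0.05), 2))
```

Note that as n grows, the speedup approaches the ceiling 1/s, which is why adding more nodes yields diminishing returns.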

Conclusion
Our comparative performance analysis suggests that for relatively small biomolecular systems, below 100,000 atoms, such as the complex of the plastocyanin and cytochrome f proteins, it is quite practical to use "personal supercomputers", i.e., single-node workstations with a video accelerator. Such a computer can deliver 100 ns/day in molecular dynamics calculations of a small biomolecular system of about ten thousand atoms. For larger biomolecular systems, such as a fragment of a microtubule or a part of a biological membrane with protein complexes, "personal supercomputers" are not currently fast enough, with typical performance of only several ns/day. Therefore, for large systems the use of modern supercomputers with hybrid architecture, such as "Lomonosov-1" or "Lomonosov-2", is imperative [1]. By employing dozens of supercomputer nodes, such hardware systems can accelerate calculations by an order of magnitude, providing up to 22 ns/day in GROMACS-5 MD simulations of a system of more than one million particles.
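As a practical illustration of these throughput figures, one can estimate the wall-clock time needed to reach a target simulated timescale at a given ns/day rate (a simple sketch; the two example rates are the figures quoted above):

```python
def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a given throughput."""
    return target_ns / ns_per_day

# A 1-microsecond trajectory of a small system on a single-node
# workstation at ~100 ns/day:
print(days_to_simulate(1000.0, 100.0), "days")

# The same trajectory for a >1M-particle system at ~22 ns/day
# on dozens of supercomputer nodes:
print(round(days_to_simulate(1000.0, 22.0), 1), "days")
```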