Coarse-grained molecular dynamics simulations of biomolecules

Coarse-grained molecular dynamics (CGMD) simulations are increasingly used to analyze the behavior of biological systems. When used appropriately, CGMD can simulate molecular systems several hundred times faster than detailed all-atom molecular dynamics simulations, with similar accuracy. CGMD parameters have been proposed for lipids, proteins, nucleic acids, and some artificial materials such as carbon nanotubes. Here we briefly discuss how to configure a CGMD system and the types of analyses and perturbations that can be performed in CGMD simulations. We also describe specific examples of how CGMD simulations have been applied in various situations, together with the experimental results used to validate the simulation results. CGMD simulations can help resolve problems in a wide range of biological systems.


Introduction
Richard P. Feynman stated, "Everything we know is only some kind of approximation, because we know that we do not know all the laws as yet." In the same spirit, simulations attempt to reveal nature's secrets by applying approximations of nature's laws. As the name suggests, coarse-grained molecular dynamics (CGMD) simulates the behavior of atoms and molecules by means of bold approximations.
Nevertheless, when used effectively, CGMD simulations can explain the physicochemical nature of a biological system, or even predict behavior that may be impossible to observe experimentally. Because of this advantage, the number of studies using CGMD simulations has steadily increased in recent years (Figure 1). Although molecular systems can be simulated on the basis of quantum chemistry [1], molecular dynamics usually employs classical Newtonian physics to simulate molecular motions. The forces between atoms are derived from bonded interactions, including 2-, 3-, and 4-body terms (bond stretching, angle bending, and dihedral torsion), and from non-bonded interactions such as van der Waals and electrostatic interactions. All-atom molecular dynamics (AAMD) calculates the motion of every atom, including hydrogen atoms, and thereby simulates the behavior of a system with considerable accuracy. However, its high computational cost limits the spatial and temporal scales accessible to AAMD simulations. In contrast, CGMD substantially reduces this cost by replacing groups of atoms with single, larger particles (coarse-grained atoms), which reduces the degrees of freedom of the system. This makes the energy function smoother and permits larger time steps (e.g., 30 fs) than those used in AAMD (e.g., 2 fs). With an appropriately configured CGMD system, far more sampling can be obtained for the same computational cost, with only a limited loss of accuracy.
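To make the bonded (2-, 3-, and 4-body) and non-bonded terms concrete, the sketch below evaluates the textbook functional forms used by most classical force fields: harmonic bonds and angles, a periodic dihedral, a 12-6 Lennard-Jones term, and a Coulomb term screened by a relative dielectric constant (Martini, for example, uses a relative dielectric constant of 15 with its standard water model). Units follow the GROMACS convention (nm, kJ/mol, elementary charges); the parameter values in the usage lines are illustrative assumptions, not taken from any published force field.

```python
import math

# 1/(4*pi*eps0) in GROMACS-style units: kJ mol^-1 nm e^-2
COULOMB_CONST = 138.935


def bond_energy(r, r0, kb):
    """2-body term: harmonic bond stretching, 0.5*kb*(r - r0)^2."""
    return 0.5 * kb * (r - r0) ** 2


def angle_energy(theta, theta0, ka):
    """3-body term: harmonic angle bending (angles in radians)."""
    return 0.5 * ka * (theta - theta0) ** 2


def dihedral_energy(phi, phi0, kd, n):
    """4-body term: periodic torsion, kd*(1 + cos(n*phi - phi0))."""
    return kd * (1.0 + math.cos(n * phi - phi0))


def lennard_jones(r, sigma, epsilon):
    """Non-bonded 12-6 Lennard-Jones (van der Waals) interaction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj, eps_r=1.0):
    """Non-bonded Coulomb interaction screened by a relative dielectric constant."""
    return COULOMB_CONST * qi * qj / (eps_r * r)


# Illustrative values only (nm, kJ/mol); the LJ minimum lies at r = 2^(1/6)*sigma.
sigma, epsilon = 0.47, 5.0
r_min = 2 ** (1 / 6) * sigma
print(round(lennard_jones(r_min, sigma, epsilon), 6))  # -5.0
```

Coarse-graining keeps these functional forms but applies them to fewer, larger particles, which is where the reduction in degrees of freedom comes from.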
In this review, we discuss the variations, potential, and limitations of CGMD simulations.
Strategic design of the molecular system of interest and the application of appropriate analyses are the keys to success with CGMD simulations. The validity of CGMD simulations can be supported by comparing simulation results with experimental evidence. Here, we describe specific examples of CGMD applications and the methods used to correlate simulation results with experimental results.

Configuring a CGMD system
In essence, coarse-graining a molecular system is a way to describe the behavior of molecules more simply by discarding less essential effects. Historically, the Gō model, a simple model in which amino acid residues were placed on a two-dimensional square lattice, was introduced to investigate protein folding [2]. The Gō model takes into account only those interactions between residues that are present in the native protein structure. This strikingly simple model was remarkably successful in describing the folding mechanisms of real proteins. It inspired many researchers and, after various modifications, is still in use.
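The core idea of the lattice Gō model, that only native contacts are rewarded, can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: a chain on a 2D square lattice, a uniform contact energy epsilon, and native contacts taken from a hypothetical "native" conformation. It is not the parameterization of the original study.

```python
from itertools import combinations


def lattice_contacts(coords):
    """Non-bonded pairs (i, j), j - i > 1, that occupy adjacent sites on a 2D square lattice."""
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i > 1:
            (xi, yi), (xj, yj) = coords[i], coords[j]
            if abs(xi - xj) + abs(yi - yj) == 1:
                contacts.add((i, j))
    return contacts


def is_valid_chain(coords):
    """Self-avoiding walk check: no site reused, consecutive residues are lattice neighbors."""
    if len(set(coords)) != len(coords):
        return False
    return all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 for a, b in zip(coords, coords[1:]))


def go_energy(coords, native_contacts, epsilon=1.0):
    """Go-model energy: -epsilon per native contact formed; non-native contacts count as zero."""
    if not is_valid_chain(coords):
        raise ValueError("conformation is not a self-avoiding lattice chain")
    return -epsilon * len(lattice_contacts(coords) & native_contacts)


# Hypothetical 6-residue hairpin used as the "native" structure.
native_coords = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
extended = [(i, 0) for i in range(6)]
native = lattice_contacts(native_coords)
print(sorted(native))                     # [(0, 5), (1, 4)]
print(go_energy(native_coords, native))   # -2.0
```

Because non-native contacts contribute nothing, the native structure is by construction the global energy minimum, which is what makes the model so tractable for folding studies.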
There are countless ways to construct CGMD models, and they vary according to the nature of the molecular system under study. For example, solvent molecules such as water need not be treated explicitly when the target molecular event is DNA self-assembly [3].
In contrast, solvent molecules, ions, and electrostatic interactions between charged molecules must be incorporated when the process of interest is the translocation of cationic nanoparticles through a lipid bilayer [4]. When proteins are the molecules of interest, the number and types of coarse-grained atoms used to represent each amino acid side chain are crucial factors that define a CGMD model [5].
In more complex CGMD models, four to six coarse-grained atoms are allocated to each amino acid residue, and some models explicitly incorporate hydrogen bonding [6]. In contrast, in a huge molecular system such as a virus capsid, only one pseudo atom may be assigned to an entire protein subunit. Some CGMD models are tailored to individual studies, whereas others are generalized for various molecular systems. Parameterizing the interactions between atoms (the forcefield) requires expertise in physics and chemistry, so it is particularly difficult for researchers outside these fields to construct a universal CGMD model. However, some generalized CGMD models are available, including Martini [7,8] and NAMD-CG builder [9,10]. Among these, the Martini forcefield has recently gained popularity [4,11-24]. Because this CGMD model runs in the AAMD software package Gromacs [25] with a forcefield modified for coarse-grained atoms, the computational algorithms used for CGMD simulation are exactly the same as those used for AAMD.
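Assigning coarse-grained atoms amounts to grouping atoms and placing one bead per group. Martini maps, on average, about four heavy atoms to one bead. The sketch below assumes a hand-written mapping (the bead names and atom indices are hypothetical) and places each bead at the mass-weighted center of its atoms; in practice, dedicated mapping tools distributed with the force field perform this step for proteins and lipids.

```python
import numpy as np


def map_to_beads(positions, masses, mapping):
    """Return bead coordinates as the mass-weighted center of each atom group.

    positions: (n_atoms, 3) array in nm
    masses:    (n_atoms,) array in amu
    mapping:   dict bead_name -> list of atom indices (hypothetical mapping)
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    beads = {}
    for name, idx in mapping.items():
        m = masses[idx]
        beads[name] = (m[:, None] * positions[idx]).sum(axis=0) / m.sum()
    return beads


# Eight carbon atoms of a straight alkyl segment, 0.15 nm apart, mapped 4-to-1.
carbons = np.array([[0.15 * i, 0.0, 0.0] for i in range(8)])
masses = np.full(8, 12.011)
beads = map_to_beads(carbons, masses, {"C1": [0, 1, 2, 3], "C2": [4, 5, 6, 7]})
print(np.round(beads["C1"], 3), np.round(beads["C2"], 3))  # [0.225 0. 0.] [0.825 0. 0.]
```

Hydrogens are usually folded into their heavy atoms before mapping; including or excluding them only changes the masses passed in.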
The Martini forcefield incorporates preset parameters for general phospholipids and proteins.
In principle, CGMD simulations can be run for lipid membranes and/or proteins with known three-dimensional structures. In practice, however, running them requires familiarity with several steps, including constructing the molecular system and minimizing its potential energy. Applications of the Martini forcefield have increased since additional coarse-grained parameters were proposed for lipids, including cardiolipin [26] and glycolipids [22]; carbohydrates such as monosaccharides, disaccharides, and oligosaccharides [27]; DNA [23]; and artificial carbon nanotubes [19]. Compared with CGMD models that must be reparameterized for each study and whose targets are restricted to specific types of molecules (e.g., peptides), Martini's ready-to-use and broadly applicable parameter set is very attractive.
In addition, the Martini forcefield can simulate the dielectric responses of solvent molecules.
Coarse-grained water atoms in the Martini forcefield were originally apolar, as in several other CGMD models. A recently proposed polarizable water model enabled simulations of realistic phenomena, such as electroporation of a lipid bilayer by a voltage applied across the membrane [28].

What can be done with CGMD simulations
Analysis methods and perturbations similar to those used in AAMD can be applied during a CGMD simulation. Basic quantities in a CGMD analysis include indicators of structural change such as root mean square deviation (RMSD) and root mean square fluctuation (RMSF), tilt angles of lipids and proteins, area per lipid within a lipid membrane, and diffusion constants of molecules.
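The basic quantities above are straightforward to compute from bead coordinates. The sketch below assumes the frames are already superimposed on the reference (for example by a least-squares fit such as the Kabsch algorithm, which is not shown) and estimates the diffusion constant from the Einstein relation MSD(t) ≈ 2·d·D·t, with d = 3 for bulk diffusion or d = 2 for lateral diffusion in a membrane. Function names are illustrative, not those of any analysis package.

```python
import numpy as np


def rmsd(coords, ref):
    """RMSD between two (n, 3) coordinate sets; assumes they are already superimposed."""
    diff = np.asarray(coords, dtype=float) - np.asarray(ref, dtype=float)
    return np.sqrt((diff ** 2).sum(axis=1).mean())


def rmsf(traj):
    """Per-particle RMSF about the time-averaged position; traj has shape (n_frames, n, 3)."""
    traj = np.asarray(traj, dtype=float)
    dev = traj - traj.mean(axis=0)
    return np.sqrt((dev ** 2).sum(axis=2).mean(axis=0))


def diffusion_constant(times, msd, dim=3):
    """Einstein relation: MSD(t) ~ 2*dim*D*t, so D = slope / (2*dim)."""
    slope, _ = np.polyfit(times, msd, 1)
    return slope / (2 * dim)


ref = np.zeros((5, 3))
print(round(rmsd(ref + [0.1, 0.0, 0.0], ref), 6))   # 0.1
```

For lateral lipid diffusion, pass only the in-plane MSD and `dim=2`.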
Local pressure profiles [29], binding free energies, and potentials of mean force can also be analyzed, as can the local electrostatic potential [4]. These energetic quantities are particularly important for large molecular systems, such as those involving protein-protein interactions.
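One simple way to obtain a potential of mean force (PMF) along a coordinate is Boltzmann inversion of its sampled distribution, W(x) = -k_B T ln P(x) + const. The sketch below assumes unbiased, well-converged sampling of a one-dimensional coordinate. For a radial distance in 3D, a Jacobian (r²) correction would be needed, and barrier regions usually require enhanced sampling (e.g., umbrella sampling with reweighting) rather than plain histogramming.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant, kJ mol^-1 K^-1


def pmf_from_samples(samples, bins, temperature=300.0):
    """W(x) = -kB*T*ln P(x), shifted so its minimum is zero; empty bins become +inf."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        w = -KB * temperature * np.log(hist)
    w -= w[np.isfinite(w)].min()
    return centers, w


# A bin visited 10x less often lies kB*T*ln(10) (about 5.7 kJ/mol at 300 K) higher.
samples = [0.5] * 100 + [1.5] * 10
centers, w = pmf_from_samples(samples, bins=[0.0, 1.0, 2.0])
print(np.round(w, 2))  # [0.   5.74]
```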
Regarding perturbations, the position of every atom can be controlled during a CGMD simulation. Position restraints and applied external forces are used in steered molecular dynamics.
These features are used for protein and ligand docking. Amino acid residues can be replaced (mutated) in proteins, which is useful for comparing simulation results with experimental results. In addition to temperature and pressure, the surface tension of a lipid bilayer can be controlled during a CGMD simulation, which allows membrane proteins to be subjected to stretch.
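Position restraints and constant-velocity pulling in steered molecular dynamics both reduce to harmonic springs. The sketch below shows the two force expressions and integrates a single particle pulled by a moving spring under overdamped dynamics without thermal noise, in arbitrary units. All names and parameter values are hypothetical; MD packages such as GROMACS provide this functionality through their pull code.

```python
def position_restraint_force(x, x_ref, k):
    """Harmonic position restraint pulling a particle back toward x_ref."""
    return -k * (x - x_ref)


def pulling_force(x, t, x0, v, k):
    """Constant-velocity SMD: a spring of stiffness k whose anchor moves from x0 at speed v."""
    return k * (x0 + v * t - x)


def overdamped_pull(k, v, gamma, dt, n_steps, x_start=0.0):
    """Integrate gamma*dx/dt = F_spring (no thermal noise) with explicit Euler steps."""
    x, t, work = x_start, 0.0, 0.0
    for _ in range(n_steps):
        f = pulling_force(x, t, x_start, v, k)
        work += f * v * dt          # work done by the moving spring: F * v * dt
        x += f / gamma * dt
        t += dt
    return x, t, work


# At steady state the particle trails the anchor by gamma*v/k.
x, t, work = overdamped_pull(k=10.0, v=1.0, gamma=1.0, dt=0.001, n_steps=5000)
print(round(t - x, 3))  # 0.1
```

The accumulated work is the quantity typically used in SMD analyses to estimate free-energy differences along the pulling coordinate.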
Membrane potential, the difference in electric potential between the inside and outside of a cell, is important for certain cellular functions. This voltage difference can be mimicked in a CGMD simulation by placing two parallel lipid bilayers in the simulation box, which creates two compartments, and then adding appropriate types and numbers of ions to each compartment so that their net charges differ.
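How large an ion imbalance is needed can be estimated by treating the bilayer as a parallel-plate capacitor, ΔV = Q / (C_m A). The sketch below assumes the commonly quoted specific capacitance of about 1 µF/cm² for a biological membrane; the capacitance of a simulated (especially coarse-grained) bilayer may differ, and the estimate ignores the details of the periodic double-bilayer geometry.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C


def membrane_potential(charge_imbalance, area_nm2, capacitance_uF_per_cm2=1.0):
    """Transmembrane voltage (V) from a net charge imbalance (in e) across a patch of area A."""
    c_m = capacitance_uF_per_cm2 * 1e-2      # uF/cm^2 -> F/m^2
    area_m2 = area_nm2 * 1e-18               # nm^2 -> m^2
    return charge_imbalance * E_CHARGE / (c_m * area_m2)


def ions_for_potential(voltage, area_nm2, capacitance_uF_per_cm2=1.0):
    """Inverse relation: net charge (in e) needed to produce a given voltage."""
    c_m = capacitance_uF_per_cm2 * 1e-2
    return voltage * c_m * area_nm2 * 1e-18 / E_CHARGE


# A single net charge across a 10 x 10 nm patch already gives ~160 mV.
print(f"{membrane_potential(1, 100.0) * 1e3:.0f} mV")  # 160 mV
```

The small patch areas typical of simulations therefore need only a few excess ions per compartment to reach physiological voltages.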
pH effects can also be mimicked in a CGMD simulation. The method of altering the pH of a molecular system by modifying the protonation states of titratable sites, originally proposed for AAMD simulations [30], can also be used in CGMD simulations [26,31]. pH alters the protonation states of aspartate, glutamate, and histidine residues in proteins [32] and can thereby change protein activity.
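The link between pH and protonation state follows the Henderson-Hasselbalch relation, f_protonated = 1 / (1 + 10^(pH - pKa)). The sketch below uses approximate model-compound side-chain pKa values (an assumption; the local protein environment can shift them substantially) and assigns each residue its dominant state, as in a fixed-protonation setup. Constant-pH methods instead let the states change during the simulation.

```python
# Approximate model-compound side-chain pKa values (assumption: real proteins
# shift these depending on the local environment, sometimes by several units).
MODEL_PKA = {"ASP": 3.9, "GLU": 4.2, "HIS": 6.0}


def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable site that carries the proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fixed_protonation_states(ph, residue_names):
    """Assign each residue its dominant state at this pH (True = protonated)."""
    return [(name, protonated_fraction(ph, MODEL_PKA[name]) > 0.5) for name in residue_names]


print(fixed_protonation_states(5.0, ["ASP", "GLU", "HIS"]))
# [('ASP', False), ('GLU', False), ('HIS', True)]
```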
Another feature that makes CGMD simulations very powerful is "reverse graining", the reconstruction of all-atom geometry from coarse-grained geometry. Reverse graining can be achieved by simulated annealing, which efficiently searches for a low-energy atomic configuration consistent with the coarse-grained structure [33,34]. Reverse graining enables a two-step strategy: first observe large conformational changes in a molecular system by CGMD, and then analyze the detailed molecular interactions by AAMD.
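The simulated-annealing step can be illustrated with a toy problem: reconstruct four atoms belonging to one coarse-grained bead so that their centroid lies on the bead and consecutive bonds are close to 0.153 nm. The restraint energy, force constants, and cooling schedule below are hypothetical choices for illustration. Published reverse-graining procedures use full all-atom force fields and more elaborate restraints; only the Metropolis acceptance with gradual cooling is the general technique.

```python
import math
import random

import numpy as np


def restraint_energy(atoms, bead, b0=0.153, k_com=1000.0, k_bond=1000.0):
    """Toy target: keep the atoms' centroid on the bead and bonded distances near b0 (nm)."""
    com_term = k_com * np.sum((atoms.mean(axis=0) - bead) ** 2)
    bonds = np.linalg.norm(np.diff(atoms, axis=0), axis=1)
    return com_term + k_bond * np.sum((bonds - b0) ** 2)


def anneal(atoms, bead, t_start=1.0, t_end=1e-4, n_steps=20000, step=0.02, seed=0):
    """Metropolis simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    atoms = np.array(atoms, dtype=float)
    energy = restraint_energy(atoms, bead)
    best, best_e = atoms.copy(), energy
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    temp = t_start
    for _ in range(n_steps):
        trial = atoms.copy()
        i = rng.randrange(len(atoms))                        # perturb one atom at a time
        trial[i] += [rng.uniform(-step, step) for _ in range(3)]
        e_trial = restraint_energy(trial, bead)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= energy or rng.random() < math.exp(-(e_trial - energy) / temp):
            atoms, energy = trial, e_trial
            if energy < best_e:
                best, best_e = atoms.copy(), energy
        temp *= cooling
    return best, best_e


bead = np.array([0.1, 0.1, 0.1])
start = np.zeros((4, 3))                 # all atoms initially stacked at the origin
best, best_e = anneal(start, bead)
print(restraint_energy(start, bead) > best_e)  # True
```

The high starting temperature lets the atoms escape poor initial arrangements, and the slow cooling then settles them into a low-energy geometry.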

Advantages and disadvantages
If a molecular system is appropriately constructed, high-speed CGMD simulations can achieve nearly the same accuracy as AAMD simulations. An example of a CGMD simulation of a human immunodeficiency virus (HIV) membrane protein is shown in Figure 2.
CGMD predictions of protein-protein binding free energies have been reported to be more than 500 times faster than AAMD predictions, with similar accuracy [13]. Although CGMD simulations have the advantage of speed, coarse graining also brings some disadvantages.
One disadvantage of the Martini CGMD model is that changes in the secondary structures of proteins are beyond the scope of the simulations: if alpha-helix parameters are assigned to the amino acid residues of a protein, these parameters do not change throughout the simulation. This can affect the accuracy of CGMD simulations.
Stark et al. reported that the strength of the protein-protein interaction between lysozyme and chymotrypsinogen estimated by CGMD simulation was greater than the experimentally derived value.
They attributed this discrepancy to intrinsic problems in the CGMD parameters and proposed that down-scaling the van der Waals parameters could resolve it [11]. The key to a successful CGMD simulation is to set up the molecular system of interest with sufficient knowledge of these issues.

Application of CGMD
Although vast computational power now enables the more accurate AAMD to simulate larger molecular systems for longer times [35,36], CGMD simulations remain well suited to handling large molecular systems, such as a membrane tether consisting of 4 million particles [37] or a fully solvated protein in a liposome [38], and to sampling the configuration space of target molecules efficiently because of their calculation speed. Figure 3 shows an example of a conformational change in a mechanosensitive ion channel in response to membrane tension; here, a CGMD simulation reproduces the dynamic actions of a protein and helps explain experimental observations [21].
Analysis of these simulation results leads to a better explanation of the experimentally observed phenomena based on the theories of physics and chemistry.

Simulations of large molecular systems
Protein folding is a problem that is scientifically interesting [39,40] as well as medically important, given that protein misfolding causes several diseases, such as Parkinson's disease [41]. Using CGMD simulations, Borgia et al. demonstrated that similarity between the amino acid sequences of adjacent domains induces misfolding in multidomain proteins [42]. They used a Gō-like model in which each amino acid residue was represented as a single unified atom, and either an attractive or a repulsive interaction was defined between any two residues [43]. Water molecules and ions were not included, drastically decreasing the degrees of freedom of their simulation system. This is a practical choice for efficiently sampling protein structures within the vast conformational space explored in protein folding studies.
Protein aggregation, which also causes certain diseases, including Alzheimer's disease, prion diseases, and type II diabetes, is related to protein folding in that it can be regarded as an alternative folding pathway [44]. Aggregation begins with the nucleation of proteins.
This nucleation process has been elucidated using coarse-grained simulations, including CGMD.
Nguyen and Hall reported that an amorphous aggregate of peptides was formed first, after which an ordered nucleus was formed, resulting in the lateral addition of β-sheets [45].
In their CGMD model (designated PRIME), four pseudo atoms were mapped to each amino acid residue, hydrogen bonding was incorporated, the solvent was modeled implicitly, and discontinuous potential energy functions were employed [46]. Because the PRIME model allowed changes in protein secondary structure, it was suitable for studying protein aggregation, during which proteins undergo dynamic structural changes. The discontinuous potential energy functions and implicit solvent reduced the computational time and made it possible to sample a vast conformational space.
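Discontinuous potentials such as hard spheres and square wells are cheap because particles move ballistically between the distances at which the potential jumps; a simulation then only needs to predict when the next such event occurs. The sketch below shows a generic square-well pair potential and the event-time calculation, obtained by solving |dr + dv·t| = d for t. The parameter names are hypothetical, and this is not PRIME's actual parameter set.

```python
import math


def square_well(r, sigma, well_width, epsilon):
    """Discontinuous pair potential: hard core below sigma, attractive well out to well_width."""
    if r < sigma:
        return math.inf          # hard-sphere overlap is forbidden
    if r < well_width:
        return -epsilon          # constant attraction inside the well
    return 0.0                   # no interaction beyond the well


def time_to_distance(dr, dv, d):
    """Earliest t > 0 at which |dr + dv*t| == d under ballistic motion (None if never)."""
    a = sum(x * x for x in dv)
    b = 2.0 * sum(x * y for x, y in zip(dr, dv))
    c = sum(x * x for x in dr) - d * d
    if a == 0:
        return None                          # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None                          # trajectories never reach this separation
    sq = math.sqrt(disc)
    for t in sorted(((-b - sq) / (2 * a), (-b + sq) / (2 * a))):
        if t > 1e-12:
            return t
    return None


# Two particles 1.0 apart approaching at unit relative speed hit a 0.5 core at t = 0.5.
print(time_to_distance((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 0.5))  # 0.5
```

In a full discontinuous MD code, the earliest such event over all pairs is processed, velocities are updated by the collision rule, and the system is advanced to the next event.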
Some viruses, including HIV, pose a threat to the lives of their hosts, so unraveling the processes involved in virus proliferation, including capsid formation, is important. In a huge molecular system such as a virus capsid, which typically comprises hundreds of proteins, one CGMD atom represents a large number of amino acid residues (e.g., a protein subunit), and solvents are not included. The definition of the CGMD atoms (i.e., the CG sites) can be deduced from the structural requirements of virus capsids [51], but it can also be derived from AAMD simulations based on X-ray crystallographic structures, which are used to parameterize constants such as Lennard-Jones potential parameters [48]. The effects of polyion length on the formation of a huge virus capsid can be explained using highly coarse-grained models [50], in which electrostatic interactions are naturally incorporated. In addition, because the effects of pH and salt concentration on capsid formation are thought to be non-negligible, future CGMD models should incorporate them for more detailed simulations.
CGMD simulations are also used for protein structure prediction. Although membrane proteins are important pharmacological targets, the three-dimensional structures of most of them remain unknown. Bucher et al. constructed a structural model of phospholipase A2 (PLA2) by combining homology modeling and CGMD simulations [18]. PLA2 is a membrane protein that releases fatty acids by hydrolyzing phospholipids. PLA2 enzymes play important roles in intracellular signal transduction and in inflammatory processes associated with Alzheimer's disease, hypertensive heart failure, neurological diseases, and cancer.
The appropriate localization of a protein within a lipid bilayer is important for its function. Inserting a protein into a lipid bilayer is computationally expensive with AAMD simulations, whereas CGMD simulations can handle this process very efficiently. After the stabilization of a protein within a lipid bilayer has been simulated by CGMD, the coarse-grained geometry can be reverse grained to obtain all-atom geometry, and detailed analyses, including of hydrogen bonding, can then be performed using AAMD simulations.
Cholesterol is a crucial component of mammalian cells, as it determines the structural, thermodynamic, and mechanical properties of lipid membranes [52]. Flip-flop, the exchange of cholesterol molecules between the two leaflets of a lipid bilayer, is important for efficient cholesterol trafficking. This relatively slow event (estimated half-time of <1 s) can be simulated using the Martini CGMD forcefield [53].

Pharmacological/toxicological applications
CGMD is being applied to in silico drug design. Lewis et al. used CGMD simulations to search for the optimal structure of an antiatherogenic agent [16]. Interactions between oxidized low-density lipoproteins (LDL) and scavenger receptors (SR) on the surfaces of macrophages are important during arterial stiffening. Amphiphilic macromolecules (AMs) are known to competitively inhibit the interactions between oxidized LDL and SR [16].
Because AMs can adopt highly variable structures, synthesizing them chemically and testing their actions experimentally is expensive. Lewis et al. therefore systematically generated different AM structures and estimated their actions using CGMD simulations. Using quantitative structure-activity relationships (QSAR) [54], a method for predicting the biological effects of a substance from its chemical structure, they estimated the antiatherogenic effects of AMs from the structures obtained in their CGMD simulations. AMs with high QSAR index values were experimentally confirmed to be effective: they interacted strongly with SR and reduced the uptake of oxidized LDL by macrophages.
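In its simplest form, a QSAR model is a regression of measured activity on structural descriptors. The sketch below fits an ordinary least-squares linear model; the descriptors are hypothetical placeholders (for example, quantities extracted from CGMD trajectories such as contact counts or exposed surface area), and the data are synthetic. It does not reproduce the model used by Lewis et al.

```python
import numpy as np


def design_matrix(descriptors):
    """Prepend a column of ones so the model has an intercept term."""
    d = np.asarray(descriptors, dtype=float)
    return np.column_stack([np.ones(len(d)), d])


def fit_qsar(descriptors, activities):
    """Ordinary least-squares QSAR model: activity ~ w0 + sum_k w_k * descriptor_k."""
    coef, *_ = np.linalg.lstsq(design_matrix(descriptors),
                               np.asarray(activities, dtype=float), rcond=None)
    return coef


def predict_qsar(coef, descriptors):
    """Predicted activities for new descriptor rows."""
    return design_matrix(descriptors) @ coef


# Synthetic data generated from activity = 1 + 2*d1 - 0.5*d2 (hypothetical descriptors).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * a - 0.5 * b for a, b in X]
coef = fit_qsar(X, y)
print(np.round(coef, 3))  # [ 1.   2.  -0.5]
```

Candidates predicted to be most active would then be prioritized for synthesis and experimental testing.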
CGMD is well suited to exploring the feature space of many candidate agents. However, even with CGMD, which can track the dynamics of a molecular system on relatively long time scales, direct observation of a chemical reaction (AM-SR interactions in the example above) is usually impossible. Therefore, ingenious methods such as QSAR are key to applying CGMD to in silico drug design.
If the target of an agent is located inside a cell, the agent must be able to pass through the lipid bilayer of the cell membrane. Cationic nanoparticles (NPs) and cell-penetrating peptides (CPPs) have been shown to be useful for delivering agents into cells, and determining the optimal size and chemical composition of a carrier is crucial for developing a drug delivery system (DDS). Using CGMD simulations, Lin et al. suggested that the shapes of NPs and CPPs are important for their translocation through a lipid bilayer [4]. Carbon nanotubes are also used for DDS; Lee et al. used CGMD to analyze the processes involved in incorporating carbon nanotubes into a lipid bilayer [19].
For gene therapy, DNA must be transferred into cells. Lipofection is a method for transferring DNA by coating it with lipids. Khalid et al. analyzed the interactions between DNA and lipids using CGMD [23].

Protein-protein interactions
A major application of CGMD is the analysis of protein-protein interactions. The association and dissociation dynamics of an antibody (immunoglobulin G, IgG) and its affinity ligand, Staphylococcus aureus protein A (SpA), were analyzed using CGMD [21]. The binding status of IgG and SpA was evaluated using RMSD, an index of conformational change, and the potential energy between the two proteins. In addition, the detailed contribution of each amino acid residue to the potential energy could be obtained. Liu et al. reported that the dissociation dynamics in response to changes in pH consisted of four phases, based on an analysis of hydrophobic interactions and electrostatic interactions between charged residues [21]. To describe such protein-protein interactions properly, the effects of pH and Coulomb forces should be incorporated into the CGMD model, which in this case was the Martini forcefield.
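A per-residue breakdown of the interaction energy can be sketched by summing pairwise terms between the beads of one protein and those of its partner, grouped by residue. The example below covers only the electrostatic part. It assumes a relative dielectric constant of 15 (the value used with standard, non-polarizable Martini water) and a plain 1.2 nm cutoff; real simulations modify the potential near the cutoff. The residue labels and coordinates are hypothetical, and this is not the analysis pipeline of Liu et al.

```python
import numpy as np

COULOMB_CONST = 138.935   # kJ mol^-1 nm e^-2
EPS_R = 15.0              # relative dielectric constant (standard Martini water)


def per_residue_coulomb(pos_a, q_a, res_a, pos_b, q_b, cutoff=1.2):
    """Coulomb energy between beads of protein A and protein B, summed per residue of A."""
    pos_a, pos_b = np.asarray(pos_a, float), np.asarray(pos_b, float)
    d = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=2)  # (n_a, n_b) distances
    pair = COULOMB_CONST * np.outer(q_a, q_b) / (EPS_R * d)
    pair[d > cutoff] = 0.0                                              # plain cutoff
    totals = {}
    for res, e in zip(res_a, pair.sum(axis=1)):
        totals[res] = totals.get(res, 0.0) + e
    return totals


# Hypothetical beads: a lysine side chain near an acidic bead of the partner, a glutamate far away.
totals = per_residue_coulomb(
    pos_a=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], q_a=[1.0, -1.0], res_a=["LYS12", "GLU40"],
    pos_b=[[0.5, 0.0, 0.0]], q_b=[-1.0],
)
print({k: round(v, 2) for k, v in totals.items()})  # {'LYS12': -18.52, 'GLU40': 0.0}
```

Residues with large negative totals are the ones contributing most to binding; repeating the calculation at different protonation states shows which contributions are pH sensitive.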
Oligomerization and self-aggregation of proteins are ideal targets for CGMD simulations because these molecular systems are relatively large and chemical interactions play important roles within them. G protein-coupled receptors (GPCRs), including the photoreceptor rhodopsins and cardiac adrenergic receptors, are important pharmacological targets. The functions of these proteins are significantly modified by the formation of oligomers within a lipid bilayer. Mondal et al. found that hydrophobic mismatch between a protein and the surrounding lipids was crucial for the oligomerization of β-adrenergic receptors [15]. Another example is Ras, a protein associated with the cancerous transformation of cells, which forms clusters in a structure-dependent manner [17].

Experimental Validation of CGMD Simulations
We now discuss methods for comparing simulation results with experimental results, roughly categorized into structure-oriented and dynamics-oriented properties. Length is a simple structural metric of a molecule. López et al. compared the simulated thickness of a glycolipid membrane with the experimentally measured value [22]. Khalid et al. compared the persistence length (an index of polymer stiffness) of coarse-grained duplex DNA obtained from simulations with the experimental value [23].
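For a worm-like chain, the persistence length L_p describes how quickly the correlation between tangent vectors decays along the contour, ⟨cos θ(s)⟩ = exp(-s / L_p); for double-stranded DNA it is about 50 nm under physiological salt conditions. The sketch below estimates L_p from the bead coordinates of a single chain by fitting the logarithm of the tangent correlation. It assumes roughly uniform bond lengths and uses only short separations, where the correlation is well sampled; the function name is illustrative.

```python
import numpy as np


def persistence_length(coords, max_sep=None):
    """Estimate Lp from <cos theta(s)> = exp(-s / Lp) along one chain of bead positions (nm)."""
    coords = np.asarray(coords, dtype=float)
    bonds = np.diff(coords, axis=0)
    lengths = np.linalg.norm(bonds, axis=1)
    b = lengths.mean()                          # assumes roughly uniform bond lengths
    t = bonds / lengths[:, None]                # unit tangent vectors
    max_sep = max_sep or len(t) // 2
    seps, corr = [], []
    for k in range(1, max_sep):
        c = np.mean(np.sum(t[:-k] * t[k:], axis=1))
        if c <= 0:                              # log undefined; stop at noisy tail
            break
        seps.append(k * b)
        corr.append(c)
    slope, _ = np.polyfit(seps, np.log(corr), 1)
    return -1.0 / slope
```

For a freely rotating chain with a fixed bending angle θ between consecutive bonds of length b, the expected value is L_p = -b / ln(cos θ), which provides a convenient check of the estimator on synthetic chains.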
The area per lipid is often used to analyze the properties of lipid membranes [22,23,26]. Lipid molecules can assume various phases, including liposomes, micelles, bilayer sheets, and hexagonal tubes, depending on environmental conditions such as pH. The hexagonal phase can be observed experimentally by freeze-fracture electron microscopy or X-ray diffraction. Dahlberg et al. compared the phase preferences and hexagonal spacing of lipid molecules obtained from simulations with experimental values [26]. Such comparisons of the structural patterns of molecular assemblies are common and can be applied to other cases, such as analyzing the pore diameters of ion channels and the distances between domains or molecules.
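The area per lipid mentioned above is usually computed from the lateral box dimensions of a flat, periodic bilayer. The sketch below assumes the lipids are split evenly between the two leaflets and ignores the area occupied by any embedded proteins, which would otherwise have to be subtracted.

```python
def area_per_lipid(box_x, box_y, n_lipids, n_leaflets=2):
    """Area per lipid (nm^2) for a flat bilayer spanning the periodic box laterally."""
    return box_x * box_y / (n_lipids / n_leaflets)


# 128 lipids (64 per leaflet) in a 6.4 x 6.4 nm box.
print(round(area_per_lipid(6.4, 6.4, 128), 3))  # 0.64
```

Averaging this quantity over an equilibrated trajectory gives the value compared with experimental estimates.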
In addition to structural properties, dynamic properties are often used to compare simulation and experimental results. For phase transitions of lipid molecules, the transition temperature is another property used for comparisons [26]. Khalid et al. confirmed that the dependence of DNA chain spacing on cationic lipid concentration agreed well with experimental results [23]. Another sophisticated way to validate CGMD simulations is to analyze the solvent-accessible surfaces of proteins [18], which can be measured by deuterium exchange mass spectrometry and compared with CGMD results.

Conclusions
Certain biomolecular details are difficult to reveal by means other than CGMD simulations. CGMD is a powerful tool for simulating molecular systems on large spatial and temporal scales in which chemical interactions are crucial. Targets of CGMD simulations include protein-protein interactions (including self-assembly and oligomerization), protein-lipid interactions, and even the effects of artificial substances, such as carbon nanotubes, on biological systems. CGMD simulation results can be validated by comparing the structural and dynamic properties of a molecular system with experimental values. Uniform CGMD methods for handling lipids, proteins, and nucleic acids are under development.
As discussed in this article, the applicable targets of CGMD simulations are rapidly expanding. However, CGMD alone is not sufficient to provide scientific and biological insights into the extensive molecular systems within a cell. A possible direction for the evolution of CGMD simulations is seamless coupling with methods for larger scales, such as the finite element method (FEM) based on continuum mechanics, and for smaller scales, such as AAMD. Approaches to linking simulations at different scales are currently in progress [55-57]. Combining CGMD and FEM is expected to be particularly useful for analyzing the mechanical properties of cellular components, such as the extracellular matrix, and their implications for cellular functions.
Combining CGMD and AAMD will exploit their complementary strengths: the accuracy of AAMD and the speed of CGMD. In conclusion, CGMD simulations will contribute to resolving problems in more varied situations than ever before.

Figure 1. Increase in the number of CGMD-related papers. PubMed was used to search for the number of articles published per year that included "coarse grained" and "molecular dynamics".

Figure 2. Example of a CGMD simulation. This molecular system consisted of one HIV-1 viral protein U (Vpu, PDB id: 2JPX), 30 POPEs, 30 POPCs, 30 DPPEs, 30 DPPCs, 30 galactosylceramides, 30 cholesterols, and 3000 CG water atoms. The total number of CG atoms was 5178. Lipids and water particles were initially placed randomly within a 12 × 12 × 12 nm box, after which the self-assembly of lipid molecules was observed. The position of the Vpu protein was constrained. The size of each picture corresponds to the size of the periodic boundary box. In the top views from 0.2 to 1.0 µs, water atoms are omitted for clarity. The 1-µs simulation was performed on a personal laptop computer (2.1 GHz single core) with a computation time of about 9 h.

Figure 3. Opening of a bacterial mechanosensitive channel of large conductance (MscL, PDB accession number: 2OAR) in response to lipid bilayer stretch, obtained by CGMD simulations. An MscL channel is embedded in a lipid bilayer consisting of POPC lipids, which are shown in stick representation. The simulation box is filled with CG water atoms. After stretch was applied, the channel's pore enlarged and the bilayer thickness decreased. Left: before applying tension; right: 4 µs after applying tension.