Generalized-ensemble molecular dynamics and Monte Carlo algorithms beyond the limit of the multicanonical algorithm

I review two new generalized-ensemble algorithms for molecular dynamics and Monte Carlo simulations of biomolecules, that is, the multibaric–multithermal algorithm and the partial multicanonical algorithm. In the multibaric–multithermal algorithm, two-dimensional random walks not only in the potential-energy space but also in the volume space are realized. One can discuss the temperature dependence and pressure dependence of biomolecules with this algorithm. The partial multicanonical simulation samples a wide range of only an important part of potential energy, so that one can concentrate the effort to determine a multicanonical weight factor only on the important energy terms. This algorithm has higher sampling efficiency than the multicanonical and canonical algorithms.


Introduction
Biomolecules such as proteins have complicated free energy surfaces with many local minima. Conventional molecular dynamics (MD) and Monte Carlo (MC) simulations in physical ensembles, such as the canonical [1][2][3][4] and isobaricisothermal [5,6] ensemble, tend to get trapped in these localminimum states. One of the powerful techniques to avoid this difficulty is generalized-ensemble algorithms [7][8][9][10][11], such as the multicanonical algorithm [12][13][14][15]. In the multicanonical ensemble, a free one-dimensional random walk is realized in the potential-energy space and a simulation does not get trapped in free-energy-minimum states.
However, because the multicanonical simulation is performed in a fixed volume, neither the pressure dependence nor the temperature dependence at certain pressure can be investigated as in experiments. In order to overcome this difficulty, I proposed recently the multibaric-multithermal algorithm [16][17][18][19][20][21][22]. In this ensemble, two-dimensional random walks are realized both in the potential-energy space and in the volume space, so that the temperature and pressure dependence can be discussed.
Another problem of the multicanonical algorithm is that the determination of the multicanonical weight factor to give a flat distribution becomes difficult in a large system. In order to alleviate this difficulty, I proposed the partial multicanonical algorithm [23,24]. The partial multicanonical simulation samples a wide range of a part of the potential energy terms, which is necessary to sample the conformational space widely. Thus, one can concentrate the effort to determine a multicanonical weight factor only on the important energy terms, so that the partial multicanonical algorithm has a higher sampling efficiency than the multicanonical and canonical algorithms.
In this review article, I explain the multibaricmultithermal algorithm and partial multicanonical algorithm. Applications of these algorithms to an alanine dipeptide in explicit water are also presented.
The multicanonical algorithm is briefly reviewed in section 2. In section 3, the multibaric-multithermal algorithm is explained. The partial multicanonical algorithm is described in section 4. Section 5 is devoted to conclusions.

The multicanonical algorithm
In the canonical ensemble, the distribution P NVT (E) of total potential energy E is given by where n(E) is the density of states, β 0 is the inverse of the product of the Boltzmann constant k B and absolute temperature T 0 , at which simulations are performed. This ensemble has a bell-shaped distribution in the potential-energy space.
In the multicanonical ensemble [12][13][14][15], every state is sampled with a weight factor W muca (E) ≡ exp{−β 0 E muca (E)} so that a uniform distribution of E may be obtained, where E muca (E) and W muca (E) are referred to as the multicanonical energy and the multicanonical weight factor, respectively. The difference between E muca and E is written is characteristic of the multicanonical simulation. The case of δ E(E) = 0 gives the regular canonical ensemble. The multicanonical simulation covers a wide range of E and escapes from local-minimum free-energy states. The expectation value of physical quantity A at temperature T is obtained by reweighting techniques [25] from where . . . muca is the multicanonical ensemble average. Because of the random walks in the potential-energy space, we can calculate physical quantities in a wide range of T . The multicanonical weight factor W muca (E) is not a priori known and has to be determined, for example, by the iterations of short simulations [26][27][28] and the Wang-Landau techniques [29].

The multibaric-multithermal algorithm
In the isobaric-isothermal ensemble [5,6], the distribution P NPT (E, V ) of potential energy E and volume V is given by where n(E, V ) is the density of states as a function of E and V and H is the 'enthalpy' (without the kinetic energy contributions): H = E + P 0 V . Here, P 0 is the pressure at which simulations are performed. This ensemble has bell-shaped distributions both in the potential-energy space and in the volume space, as shown in figure 1(a). In order to obtain the isobaric-isothermal ensemble, the combination of the Nosé thermostat [1,2] and the Andersen barostat [5] is frequently employed.
In the multibaric-multithermal ensemble [16][17][18][19][20], every state is sampled with a weight factor W mbt (E, V ) ≡ exp {−β 0 H mbt (E, V )} so that a uniform distribution of both E and V , as shown in figure 1(b), may be obtained, Here, W mbt (E, V ) and H mbt are referred to as the multibaricmultithermal weight factor and the multibaric-multithermal enthalpy, respectively. The difference between H mbt and H is written is therefore characteristic of the multibaric-multithermal simulation. The case of δ H (E, V ) = 0 gives the regular isobaric-isothermal ensemble. The isobaric-isothermal Hamiltonian H mbt for a system consisting of N atoms based on the Nosé thermostat [1,2] and the Andersen barostat [5] is given by The variabler i is the coordinate scaled by the length of the simulation box:r i = V −1/3 r i (r i is the real coordinate). We have introduced a simplified notation for the set of the scaled coordinates,r ≡ {r 1 ,r 2 , . . . ,r N }. The variablep i is the conjugate momentum forr i . The real momentum p i is related to the virtual momentump i by p i =p i /V 1/3 s, where s is the additional degree of freedom for the Nosé thermostat. The variable P V and P s are the conjugate momenta for V and s, respectively. The real-time interval t is associated with the virtual time interval t by t = t /s. The constant m i is the mass of atom i. The constants W and Q are the artificial 'mass' related to V and s, respectively. In the case of a system consisting of N atoms, g equals 3N if the time development is performed in real time t, or g equals 3N + 1 if the time development is performed in virtual time t . The equations of motion in real time are given from the Hamiltonian in equation (6) bẏ where the dot (·) stands for the real-time derivative d/dt and F i stands for the force acting on atom i. Performing the MD simulation by the equations of motion in equations (7)-(12), the multibaric-multithermal distribution P mbt (E, V ) in equation (5) is obtained.
the Hamiltonian in equation (6) and the equations of motion in equations (7)-(12) become those for the regular isobaric-isothermal MD simulation of the Nosé-Andersen formulation.
In order to perform a multibaric-multithermal MC simulation, we employ the Metropolis criterion onr i and V . The trial moves fromr i tor i and is generated by uniform random numbers. The multibaric-multithermal enthalpy is consequently changed from H mbt (r, V ) to H mbt ≡ H mbt (r , V ) by the trial moves. The trial moves are now accepted with the probability (13) The multibaric-multithermal probability distribution P mbt (E, V ) is obtained by this scheme.
After an optimal weight factor W mbt (E,V ) is determined-for example, by the iterations of short simulations [26][27][28] and by the Wang-Landau techniques [29]-a long production run is performed for data collection. The reweighting techniques [25] are used for the results of the production run to calculate the isobaric-isothermal-ensemble φ ψ averages. The expectation value of a physical quantity A at T and P is then obtained by where . . . mbt is the multibaric-multithermal ensemble average. Because of the random walks both in the potential-energy space and in the volume space, we can calculate physical quantities in wide ranges of T and P. I introduce here an application of the multibaricmultithermal MD algorithm to a system consisting of one alanine dipeptide molecule ((S)-2-(acetylamino)-N-methylpropanamide) and 63 water molecules [22]. The AMBER parm96 force field [30] was used for the alanine dipeptide molecule and the TIP3P rigid-body [31] model was employed for the water molecules. The electrostatic potential was calculated by the Ewald method. The time step was taken to be t = 0.5 fs. The temperature was set as T 0 = 600 K and the pressure was set as P 0 = 300 MPa. The MD simulations were performed with the symplectic combination [32] of the Nosé-Poincaré thermostat [33,34], the Andersen barostat [5] and the symplectic quaternion scheme [35]. In order to obtain an optimal multibaric-multithermal weight factor W mbt (E, V ), two-dimensional versions of both the iterative method [26][27][28] and the Wang-Landau techniques [29] for multicanonical weight-factor determination were employed. An optimal weight factor was obtained after these preliminary simulations in total for 2.61 ns in the AMBER parm96 simulation. Two sets of the initial values of the alanine-dipeptide backbone dihedral angles were prepared for the data-production runs. The initial condition 1 was φ = ψ = 180 • and the initial condition 2 was φ = 180 • and ψ = 0 • . The initial conformation 1 is shown in figure 2. We then performed a long multibaric-multithermal MD simulation for data collection. The simulation time was 20 ns for each initial condition. The isobaric-isothermal MD simulations of the same system were also performed for comparisons. Figure 3 shows the contour maps of the distributions P (φ, ψ) at T = 240 K and P = 0.1 MPa by the MD simulations started from the initial condition 1. The distribution P (φ, ψ) by the multibaric-multithermal MD were obtained using the reweighting techniques. The isobaric-isothermal distribution P (φ, ψ) has only four peaks of the α R , α P , P II and C 5 states, as shown in figure 3(a). On the other hand, the multibaric-multithermal  MD simulation provides correct distributions P (φ, ψ) with all the six peaks of the α R , α P , P II , C 5 , α L , and C ax 7 states, as shown in figure 3(b). These results mean that the multibaric-multithermal simulation can get over the free energy barriers and has a high sampling efficiency.
The volume under the surface P (φ, ψ) around each peak corresponds to the population W of each state. The ratio of each state population W against the P II state population was calculated as a function of the inverse of temperature 1/T at the constant pressure of P = 0.1 MPa, as shown in figure 4(a) and as a function of the inverse of pressure P at the constant temperature of T = 298 K, as shown in figure 4(b). The difference in partial molar enthalpy H and the difference in partial molar volume V can be obtained from the gradient in figure 4(a) and that in figure 4(b), respectively. Partial molar enthalpy and partial molar volume are important quantities in solution chemistry, which describes the change in the ratio of each state population due to temperature and pressure changes. As the temperature increases, the relative population W C 5 /W P II of the C 5 state increases, as shown in figure 4(a). This indicates that the enthalpy for the C 5 state is higher than that of the P II state. The difference in partial molar enthalpy H and the difference in partial molar volume V of the C 5 state from that of the P II state, for example, are calculated from the derivative of W C 5 /W P II by where R is the gas constant: R = 8.3145 J (mol K ) −1 . The values of H and V are listed in tables 1 and 2, respectively. The differences H and V of the C 5 state and H of the α R state are consistent in the multibaric-multithermal MD and the Raman spectroscopy experimental data [37]. The value of V of the α R state, however, is smaller than that of the Raman spectroscopy.   Table 1. Differences H/(kJ mol −1 ) in partial molar enthalpy of the C 5 and α R states from that of the P II state calculated by the multibaric-multithermal (MUBATH) MD simulations. Raman experimental data are taken from [37].
State MUBATH MD Raman exp. Table 2. Differences V /(cm 3 mol −1 ) in partial molar volume of the C 5 and α R states from that of the P II state calculated by the multibaric-multithermal (MUBATH) MD simulations. Raman experimental data are taken from [37].
State MUBATH MD Raman exp.
The multicanonical simulation can escape from local-minimum free-energy states but cannot change volume. The conventional isobaric-isothermal MD simulation can change volume but is trapped in a local free energy minimum. Therefore, H and V cannot be calculated by these simulation algorithms. The multibaric-multithermal algorithm has the advantages of both algorithms. This is the first simulational work to calculate H and V of biomolecules.

The partial multicanonical algorithm
Although the multicanonical algorithm is a powerful technique, the determination of the multicanonical weight factor is time consuming for a large system. In order to alleviate the effort to determine the weight factor, the total potential energy is divided into two terms in the partial multicanonical algorithm [23,24], where E 1 is a part of the potential energy which is important to sample a wide range of the conformational space. The partial potential energy E 1 can be taken in any manner, in principle. The electrostatic, Lennard Jones and torsion-angle energy terms between solute atoms may be included in a biomolecular system. The second term E 2 is the rest of the potential energy. The partial multicanonical algorithm covers a wide range of E 1 . The partial multicanonical simulation is performed with a weight factor W pmuca (E 1 , where E pmuca (E 1 , E 2 ) and W pmuca (E 1 , E 2 ) are referred to as the partial multicanonical energy and the partial multicanonical weight factor, respectively. The difference between E pmuca and E depends only on E 1 and is written as The difference δ E 1 (E 1 ) is characteristic of the partial multicanonical simulation. If we define here a distribution P pmuca (E 1 ) of E 1 and the density of states n(E 1 ) of E 1 as equation (17) is written as where . . . pmuca is the partial multicanonical ensemble average. Note that A NVT can be calculated correctly at a temperature T only in the vicinity of T 0 , although A NVT can be calculated in a wide range of T in the multicanonical algorithm. This is because a limited range of E 2 is sampled with the normal Boltzmann weight factor exp(−β 0 E 2 ) in partial multicanonical algorithm.
The scheme of the partial multicanonical algorithm includes the canonical and the multicanonical algorithms. In the case of E 1 = 0 and E 2 = E, this algorithm becomes the regular canonical algorithm. In the case of E 1 = E and E 2 = 0, the multicanonical algorithm is obtained.
In order to perform a partial multicanonical MD simulation, a constant-temperature MD method is employed. The partial multicanonical Hamiltonian based on the Nosé thermostat [1,2] is given by where p i is the conjugate momentum for r i . The real momentum p i is related to the virtual momentum p i by p i = p i /s. In the Nosé-Hoover formalism [1][2][3], a variable ς ≡ṡ/s is introduced and the equations of motion are given from the Hamiltonian in equation (23) bẏ Equation (25) is obtained from regular canonical-ensemble equations of motion by modifying force from to For a partial multicanonical MC simulation, we perform Metropolis sampling on the coordinates r. The trial moves from r i to r i are generated by uniform random numbers. The partial multicanonical energy is consequently changed from E pmuca (r) to E pmuca ≡ E pmuca (r ) by these trial moves. The trial moves are accepted by the Metropolis criterion [4] with the probability The partial multicanonical probability distribution P pmuca (E 1 ) is obtained by this scheme. The partial multicanonical MD algorithm was applied to a system consisting of one alanine dipeptide molecule and 67 water molecules [23]. The total potential energy is the sum of the potential energy between solute atoms E slt-slt , that of the solute-solvent interaction E slt-svn , and that between solvent atoms E svn-svn , as follows: Each term for the AMBER force field [30] is given by where the subscripts bond, angle, torsion, imp, elec and LJ stand for the bond-length, bond-angle, torsion-angle, improper torsion-angle, electrostatic and Lennard Jones potential energy, respectively. Although these terms may vary for different force fields, the partial multicanonical algorithm can be applied in essentially the same way. Even if we sample wide ranges of bond-length, bond-angle and improper torsion-angle energy in E slt-slt , the structure of the solute molecule is not expected to be changed very much. Sampling a wide range of the solvent energy E svn-svn would not change the solute structure very much, either. Sampling a wide range of the solute-solvent potential energy E slt-svn may sample a wide range of the interface area between the solute molecule and solvent molecules. The solute molecule of the alanine dipeptide in this simulation, however, is small and this effect was not taken into account in this simulation. Finally, the partial potential energy E 1 was here taken to be the sum of the torsion-angle, electrostatic and Lennard Johnes energy between the solute molecules as follows: and E 2 was the rest of the potential energy, (35) Note that this choice of E 1 is not always the best. The best choice of E 1 should depend on the system and the force field. In order to obtain an optimal partial multicanonical weight factor W pmuca (E 1 , E 2 ), both the iterative method [26][27][28] and the Wang-Landau techniques [29] were employed. An optimal weight factor was obtained after these preliminary simulations for 2 ns in total. I then performed a long partial multicanonical MD simulation for data collection for 6 ns at T 0 = 300 K.
The probability distributions of E and E 1 are shown in figure 5. The distribution of E in the multicanonical simulation was much wider than those in the other two simulations. The partial multicanonical algorithm sampled a much wider range of E 1 . These generalized-ensemble MD simulations actually sampled a wide range of the potential energy, which and only which are predesignated to be sampled widely.
The time series of φ is shown in figure 6. The canonical MD simulation did not make a complete rotation, as shown in figure 6(a). It covered only 65.6% of the whole range of φ. There seem to be potential energy barriers around φ = 0 • and φ = 120 • , which cannot be overcome by the canonical simulation (these energy barriers correspond to the boundaries between the local-minimum states). This means that the canonical simulation was trapped in local-minimum free-energy states. The multicanonical MD simulation also did not make a complete rotation and sampled 82.8% of the whole range of φ, as shown in figure 6(b). It overcame the potential energy barrier around φ = 0 • but did not overcome that around φ = 120 • . The partial multicanonical MD simulation, on the other hand, overcame both energy barriers and covered the whole range of φ, as shown in figure 6(c). The backbone dihedral angle φ underwent a complete rotation (covering 360 • ) six times (six turns) in this MD simulation for 6 ns.
Note that these results do not mean that the multicanonical algorithm cannot sample the whole range of φ. If a better weight factor W muca (E) were obtained with the longer preliminary simulations, the whole range of φ could be sampled also in the multicanonical ensemble. These results mean that the partial multicanonical algorithm shows higher sampling efficiency than the multicanonical algorithm if the weight factor is determined with the same effort.
The probability distributions P NVT (φ, ψ) of φ and ψ are shown in figure 7. In order to obtain the distribution P NVT (φ, ψ) in the canonical ensemble from the multicanonical simulation and that from the partial   Figure 7. Contour maps of the logarithms of the probability distributions log P NVT (φ, ψ) at T = 300 K obtained (a) by the canonical MD simulation, (b) by the reweighting techniques from the results of the multicanonical MD simulation and (c) by the reweighting techniques from the results of the partial multicanonical MD simulation. multicanonical simulation, the reweighting techniques in equation (3) and those in equation (22) were used, respectively. The probability distributions P NVT (φ, ψ) by the canonical simulation show four peaks of the P II , C 5 , α R , and α P states, as shown in figure 7(a). However, they do not have any peaks in the range of 0 • < φ < 120 • . The distribution P NVT (φ, ψ) by the multicanonical MD simulation has five peaks of the P II , C 5 , α R , α P and α L states, as shown in figure 7(b). The partial multicanonical MD simulation also sampled the C ax 7 states in addition to the five peaks that were sampled in the multicanonical simulation, as shown in figure 7(c). In the present simulations, only the partial multicanonical simulation sampled all of the six states of the alanine dipeptide.

Conclusions
I have reviewed two new generalized-ensemble MD and MC algorithms: the multibaric-multithermal and partial multicanonical algorithms. These generalized-ensemble algorithms allow the simulation to escape from localminimum free-energy states and sample many conformations.
The multibaric-multithermal simulation performs a two-dimensional random walk both in the potential-energy space and in the volume space so that one can obtain various isobaric-isothermal ensemble averages at different temperatures and pressures from only one simulation run. From the temperature and pressure dependences of an alanine dipeptide in explicit water, the differences in the partial molar enthalpy H and the partial molar volume V were calculated between the P II state and other states. This is the first work to calculate H and V by molecular dynamics simulations. The multibaric-multithermal algorithm will thus be a powerful simulation technique to study the temperature and pressure dependences of biomolecules like proteins.
In the partial multicanonical algorithm, the simulation covers a wide range of only important potential energy, which is necessary to investigate a wide range of the conformational space. Thus, one can concentrate the effort for the weight factor determination on the important energy terms. The MD simulation of an alanine dipeptide showed that the multicanonical simulation showed higher sampling efficiency than the canonical simulation. The partial multicanonical algorithm will thus be of great use for the protein folding problem. This idea, in which a simulation covers a wide range of only important potential energy terms, can be utilized not only for the multicanonical algorithm but also for multidimensional extensions of the multicanonical algorithm, such as the multibaric-multithermal [16][17][18][19][20][21][22] and multicanonical-multioverlap [38] algorithms.