Superposition-Enhanced Estimation of Optimal Temperature Spacings for Parallel Tempering Simulations

Effective parallel tempering simulations rely crucially on a properly chosen sequence of temperatures. While it is desirable to achieve a uniform exchange acceptance rate across neighboring replicas, finding a set of temperatures that achieves this end is often a difficult task, in particular for systems undergoing phase transitions. Here we present a method for determination of optimal replica spacings, which is based upon knowledge of local minima in the potential energy landscape. Working within the harmonic superposition approximation, we derive an analytic expression for the parallel tempering acceptance rate as a function of the replica temperatures. For a particular system and a given database of minima, we show how this expression can be used to determine optimal temperatures that achieve a desired uniform acceptance rate. We test our strategy for two atomic clusters that exhibit broken ergodicity, demonstrating that our method achieves uniform acceptance as well as significant efficiency gains.


INTRODUCTION
Effective equilibrium sampling of complex systems remains a major challenge in computational thermodynamics. Systems exhibiting broken ergodicity, reliant on rare event fluctuations, or otherwise locally "trapped", are characterized by long equilibration times due to large (free) energy barriers separating metastable states. This problem is particularly acute for systems undergoing strong first-order-like phase transitions, such as transitions corresponding to protein crystallization, crystal polymorphism, and other changes in morphology for condensed-matter systems. 1−3 Parallel tempering (PT) 3−5 has emerged as a powerful method to analyze such complex systems. Here, several copies of a given system are simulated at different temperatures, with occasional moves that attempt to swap configurations of neighboring replicas. By virtue of these exchange moves, the low-temperature replicas gain access to larger thermal fluctuations of the high-temperature replicas, facilitating quicker transitions between metastable states.
The effectiveness of a given PT simulation depends crucially upon the sequence of temperatures that define the replicas. The temperature spacing should be small enough to produce a reasonable acceptance rate between neighboring replicas, yet large enough so that computational resources are not wasted on an unnecessarily large number of replicas. Generally, a uniform acceptance rate is desirable across all neighboring replica pairs. 6−11 This condition ensures that the progression of replica trajectories among a ladder of temperatures is uniform, providing an equilibration of the combined replica system that is not hindered by bottlenecks along the temperature axis.
In general it is a difficult problem to find a temperature progression that achieves a uniform acceptance rate, often requiring an iterative or adaptive approach before production runs. 12−14 However, theoretical progress has also been made under certain assumptions about the thermodynamic behavior of the system of interest. Kofke has shown that a geometric progression of temperatures leads to a uniform acceptance rate for systems with constant heat capacity. 7,8 Today the geometric spacing is a standard rule of thumb, considered a best guess if nothing else is known about the system of interest. Yet it is not always optimal. Indeed, systems undergoing phase transitions will generally have a significant heat capacity peak, in which case a constant heat capacity approximation is ovbiously inaccurate, and hence a temperature progression given by eq 1 is certainly not optimal (see Figure 1). Other methods for determination of optimal PT temperatures have focused on uniform acceptance for solvated protein systems, 15 optimizing replica diffusion through iterative approaches, 16 or have made temperature a dynamical parameter. 17−20 In the following we will be concerned with obtaining a uniform acceptance profile, although the method we describe can be used to optimize according to other criteria.
In the present contribution we present a method for determining optimal PT temperatures, which can be especially useful for systems undergoing phase transitions. The method relies upon knowledge of the competing phases, represented as configurational basins on the potential energy landscape. Using the harmonic superposition approximation (HSA), 2 in which the landscape is approximated by harmonic wells at the various minima, we approximate thermodynamic quantities, such as the energy density of states, heat capacities, and the various free energies of the competing phases. With the approximate distributions obtained by the HSA we can estimate the parallel tempering acceptance probability, using information contained solely within a database of configurational local minima. To arrive at our result, we subdivide the configuration space into separate basins of attraction on the potential energy landscape and express the PT acceptance rate of a given replica pair as a weighted sum over contributions from individual pairs of minima (eq 2). In this way, the replica pair acceptance rates are determined explicitly from the relative weights of the various competing phases, along with their associated conditional acceptance probabilities. Using the known analytic energy distributions within the HSA, we derive an analytic expression, eq 10, for the conditional acceptance rates. The overall PT rate is then obtained by weighting each of these terms by the HSA local free energies of the individual wells. With this expression at our disposal, we find an optimized temperature progression via a simple algorithm, described in Section 3, which matches each neighboring pair of temperatures to a target acceptance rate.
Our method exploits the HSA to represent the landscape. Because the HSA becomes exact in the limit of zero temperature, we expect our method to be ideal for transitions at low temperatures compared to the barrier heights between phases. Indeed, low-temperature solid−solid phase transitions of many atomic clusters are very accurately described by the HSA. 2 While the HSA is in practice never an exact approximation, it is clear why this method could more accurately predict thermodynamics of systems undergoing phase transitions: While a constant heat capacity assumption is certainly unjustifiable for an arbitrary system, at a relatively low temperature it is reasonable to assume that, within a certain basin of attraction, the system samples an energy landscape that is roughly harmonic, to a first approximation.
While our method relies upon knowledge of competing minima, finding these structures is computationally inexpensive compared to characterizing the thermodynamics. 21,22 Because the end result of such techniques is not thermodynamics but a list of minima, the dynamics does not require detailed balance to be satisfied. Software such as GMIN 23 and its python-based counterpart pele 24 employ global optimization tools of energy landscape exploration and provide rapid, efficient, and accurate determination of low-lying minima. We also note that recent developments in thermodynamic sampling algorithms (which do satisfy detailed balance) have exploited the HSA to obtain more accurate thermodynamic results using information contained in a database of minima. Examples include parallel tempering with an auxiliary "reservoir" replica, which samples the HSA distribution, 25,26 superposition-enhanced nested sampling, which draws samples from the HSA, 27 and the hybrid basin-sampling method. 21 Similar methods such as smart-darting, 28 based upon smart-walking, 29 employ similar ideas with collective movesets that hop between minima.
The remainder of the paper is organized as follows. In Section 2 we derive an expression for the PT acceptance probability within the HSA approximation. Our central result, eq 10, allows us to estimate PT acceptance rates solely from a database of minima. Following this estimation, in Section 3 we describe how this expression can be exploited to determine a set of temperatures which achieve a uniform acceptance rate. In Section 4 we then test this prescription on two systems exhibiting low-temperature solid−solid phase transitions, namely Lennard-Jones clusters with 31 and 75 particles, LJ 31 and LJ 75 , showing that our method can achieve uniform acceptance rates across phase transitions and perform more efficiently than simulations using standard temperature progressions. Finally, in Section 5, we discuss the conditions under which the current strategy can be most effective and useful.

DERIVATION OF ACCEPTANCE PROBABILITY EXPRESSION
Consider a PT simulation for a system described by a potential energy V(x), where x is a point in the system's configuration space. Let A and B be two replicas in this simulation, at temperatures T A and T B , respectively, which sample the system canonically, so that ρ A (x) ∝ e −V/kT A etc. In the following we assume T B > T A . We desire an expression for the average PT acceptance rate, ⟨P acc ⟩, between our pair of replicas (A, B). To begin our analysis we divide the configuration space into a finite number of wells, indexed by w. Here a point x in configuration space belongs to well w if it lies in the corresponding basin of attraction for a given local minimum, defined by steepestdescent pathways which converge to the minimum in question. 2,30 Without loss of generality we can write any configuration space average as a weighted sum over conditional averages. Hence we can express the average PT acceptance probability of this pair as where p w i = Z w i /Z i is the equilibrium occupation probability for well w in replica i and P̅ acc is the average acceptance probability, conditioned upon A and B residing in wells w and w′, respectively.
We desire an explicit expression for ⟨P acc ⟩ in terms of T A and T B , via eq 2. While the well weights p w i can be approximated by the corresponding HSA well free energies (see Section 3), more challenging is the conditional acceptance, P̅ acc (w, w′). The central result of this section is an analytic expression for P̅ acc (w, w′), eq 10, which we now derive.
We begin by writing the conditional acceptance rate as a configuration-space average: x ∈ w and 0 otherwise. Using properties of the min function it can easily be shown 31 that eq 3 can be rewritten as 4) or alternatively, in terms of energy distributions p A and p B : The distribution p i (ϵ|w) corresponds to the canonical energy distribution of replica i (at temperature T i ) when restricted to well w.
Our expression for the conditional acceptance probability, eq 5, is exact, but the energy distributions p i (ϵ|w) are unknown. In the following analysis we approximate these quantities by the (known) HSA distributions, described by a gamma distribution. For replica A, for instance, we have where ϵ w is the potential energy of the local minimum associated with well w, Γ is the gamma function, and 2κ is the number of configurational degrees of freedom. Note that this distribution only depends on the energy, ϵ w , of the minimum in question and not the shape (curvature) of the well. In the limit κ → ∞ this distribution approaches a Gaussian, described by cumulants: Assuming this Gaussian limit has been reached (since 2κ is the number of degrees of freedom, this is not a severe assumption), we can evaluate eq 5 as In going from the first to second line, we used the definition of the error function and exploited the fact that it is an odd function. Equation 8 is an integral of a Gaussian convoluted with an error function, which is where Δμ ww′ = μ B (w′) − μ A (w) and r ww′ = (σ A (w) 2 + σ B (w′) 2 ) 1/2 . Substituting in the cumulants for our replicas, eq 7, we arrive at our final expression: Here γ = T B /T A , ΔT = T B − T A , and Δϵ ww′ = ϵ w′ − ϵ w is the potential energy difference between the local minima corresponding to wells w and w′.
It is instructive to consider limiting cases of eq 10. For w = w′, we have Δϵ ww′ = 0, and our expression reduces to the previous result derived by Kofke for constant C V systems with Gaussian energy distributions (see eq 13 of refs 8 and 7). Here we find that γ = T B /T A alone controls the acceptance rate, validating a geometric temperature progression. However, if w ≠ w′, then the acceptance deviates from "geometric behavior" by a factor Δϵ ww′ /κΔT, which becomes significant when the energy splitting between minima is comparable to the difference in average internal energy of the two replicas. We expect such deviations to appear near a phase transition, when multiple wells contribute to the thermodynamics.

OPTIMIZING THE REPLICA SPACING
With eq 10 in hand, we can now estimate the full acceptance probability ⟨P acc ⟩, in terms of T A and T B . The occupation probabilities p w i , which weight the individual contributions P̅ acc (w,w′), are conveniently expressed in terms of their free energies f w . For a replica at inverse temperature β, we have Within the HSA, the free energy is given by with local minimum potential energy ϵ w and well entropy Here the entropy is specified by two standard features of the well: (1) the geometric mean vibrational frequency, v ̅ w , which describes the curvature of the minimum; and (2) the number of distinguishable permutation-inversion isomers, n w , determined by the point group associated with the given minimum. 2 Equations 2, 10, and 12−14 provide an explicit expression for our acceptance probability, which we write as (15) and depends upon the database of minima. Using this expression, we now optimize a sequence of M temperatures, as follows. First, a target acceptance rate p acc * is chosen, and an initial temperature T 1 is specified. A sequence of temperatues T 2 , ..., T M is then built from the bottom up: given a temperature T i , T i+1 is determined from the condition: This is a standard optimization problem, which can be solved in various ways. 2,32 Hence we generate a sequence of M temperatures which, within the HSA approximation, each have a uniform acceptance of p acc * and which relies only on the database of minima.

PERFORMANCE ON TEST SYSTEMS
In this section we evaluate the effectiveness of our method by applying it to two model systems, clusters of Lennard-Jones particles, interacting via the pairwise potential: These clusters can exhibit strong first-order transitions at low temperatures 2,25,26,33−35 and as such are prototypical examples of complex systems exhibiting broken ergodicity. Hence they have been used extensively for benchmark studies of thermodynamic sampling methods. 2,21,25−27 In what follows we will focus on LJ 31 and LJ 75 , clusters of 31 and 75 particles, respectively. Both of these systems exhibit low-temperature solid−solid transitions indicated by pronounced peaks in the heat capacity profile (Figure 1), making them ideal candidates for testing our method.
To evaluate the effectiveness of our strategy we perform two PT simulations for each of our LJ systems: one simulation employing an HSA-optimized set of temperatures, and the other using a standard geometric temperature progression. Because this approach is intended to be useful near lowtemperature phase transitions, we consider a temperature range that spans the peak in the heat capacity profile (Figure 1). For each of our test systems this range was achieved using M = 12 replicas. The precise HSA-optimized sequence was determined via the optimization scheme described in Section 3, using a relatively cautious 14,36 target acceptance rate of p acc * = 0.22. The corresponding reference simulations employed a temperature sequence defined by the recursion T i = γT i−1 , where the constant γ was chosen to achieve the acceptance rate p acc * = 0.22 for the lowest temperature pair.
In order to uniquely specify the temperature sequence, we must also specify initial conditions from which the optimization scheme and geometric recursion begin. Rather than (arbitrarily) choosing an initial temperature T 1 , we instead require an intermediate replica R ≈ M/2 to be located at the peak of the C V curve, i.e., T R = T*. In this way, the temperature progression of the HSA and reference simulations coincide exactly at replica R, located at the C V peak. For our simulations we chose R = 5 and 6 for LJ 31 and LJ 75 , respectively. This choice of initial condition is convenient for the purposes of the advanced simulation strategy which we describe below, in addition to being less arbitrary than a value of T 1 chosen ad hoc. In Figure  2 we compare the temperature set obtained from HSA optimization to the reference geometric progression for the case of LJ 31 . The corresponding plot for LJ 75 is qualitatively similar.
To ameliorate poor PT convergence due to broken ergodicity, we employed an advanced simulation technique, reservoir replica exchange, 25,26 whereby the PT simulation employs an additional "reservoir replica," which samples the HSA approximation to the underlying landscape. Because the reservoir replica is an efficient sampler, it provides an ersatz high-temperature replica: its frequent transitions between metastable states provide a channel for rapid PT equilibration when coupled to the sequence of M "physical" replicas. By employing this method, we were able to obtain converged thermodynamics without the need for additional replicas with temperatures approaching melting. For our simulations, the reservoir replica was coupled to replica R at the C V peak (T R = T*), shown elsewhere to be a good choice for these clusters. 26 Since replica R coincides for our HSA and reference simulations, the reservoir method couples to the same distribution in both the HSA and reference simulations, reducing any systematic discrepancies due to this auxiliary sampling.
We now present the results of our PT simulations, focusing first on the acceptance rate. Figure 3 displays the acceptance profile of LJ 31 for both the reference and HSA-enhanced simulations. The reference simulation exhibits a roughly constant profile near the terminal temperatures but displays a noticeable dip in the vicinity of the phase transition. The HSAenhanced simulations, however, display a profile that is much more uniform across the phase transition. Our simulations of LJ 75 show similar behavior: Figure 4 indicates that the reference simulations of LJ 75 fail to achieve a uniform profile near the phase transition, whereas with the HSA-optimized set this dip is absent. From these results it is clear that HSA-optimized temperature set can achieve a more uniform acceptance profile than standard geometric progression.   31 : The average replica exchange acceptance probability for pair (T i , T i+1 ) is plotted vs temperature index i, for temperature spacings chosen geometrically (empty black circles) and chosen by HSA-optimization (filled blue circles). The dip in the geometric case coincides with the LJ 31 heat capacity peak (Figure 1). At higher temperatures both profiles deviate from uniform behavior, as the HSA becomes less accurate. We have seen above that our strategy provides a more uniform acceptance profile near a C V peak. We now investigate whether it can also lead to gains in simulation efficiency. To characterize convergence of our PT simulation, we need a measure that captures the dynamics of our entire M replica system. This measure can be achieved by following the replica trajectories as they evolve between the set of M temperatures. By "replica trajectory" we refer to a time series trace of the temperature index k, where discrete jumps k → k ± 1 occur by successful replica exchanges with neighboring temperatures. By projecting onto k, the dynamics of a given replica trajectory r are represented as a random walker in this discrete space, with transitions in k related to the PT acceptance profile. The entire M replica system will be converged once each of our M trajectories has exchanged many times between the terminal temperatures (k = 1 and k = M), and the time scale under which these transitions occur will give an indication of the convergence condition for our PT simulation.

Journal of Chemical Theory and Computation
To characterize convergence from our replica trajectories, we use a technique developed by Doll et al. 37 For each trajectory r, let p r (t) = (p 1 r , ..., p M r ) denote the vector of observed occupation probabilities up to time t. That is, element p k r (t) is the fraction of simulation time trajectory r was observed in temperature index k up to time t. Since we have M trajectories, the dynamics of our simulation are represented by M such vectors, p 1 , ..., p M . Importantly, we know the converged limit of p r : As t → ∞ each temperature is equally occupied, 37 so p r → p eq = (1/M, ..., 1/ M) for each r. The approach of p r (t) to p eq will determine the overall simulation convergence. To analyze this convergence we define an occupation-based entropy: 37 (18) By following the approach of S (r) to its limiting value, S eq = ln(M), we can straightforwardly probe convergence of our M replica system. Below, we report values averaged over the M replica trajectories: S(t) = (1/M)∑ r S (r) (t).
In Figures 5 and 6 we display S(t) for LJ 31 and LJ 75 , respectively. For both systems, we clearly see that S approaches S eq quicker with HSA-optimized temperatures compared to the reference simulations, indicating that HSA-optimized spacings provide more rapid equilibration.
As a second test of convergence, we directly calculate a time scale associated with the temperature index k for our replica trajectories. For a given replica trajectory, we estimate a characteristic relaxation time associated with k as is the autocorrelation function of k and σ k 2 is its variance. In Table 1 we display our estimates of τ for simulations with HSAoptimized and geometric temperature sequences. The values are reported as averages over the M replica trajectories, with individual τ values obtained from block averaging. 38 As shown     75 , on the other hand, are more difficult to discern: with HSAoptimization, LJ 75 only receives a marginal benefit as judged by the convergence time scales. The fact that τ HSA ≈ τ geom despite a more uniform acceptance profile (Figure 4) may indicate that the bottleneck for equilibration is not captured simply by a dip in the acceptance profile, but rather is determined by rare structural fluctuations within a given replica acting on much longer time scales. Indeed, it is known that low-temperature simulations of LJ 75 are extremely difficult to converge, even when using parallel tempering. 26 In this case, the convergence time scales reported for LJ 75 can only be considered lower limits to equilibration.

DISCUSSION AND CONCLUSION
We have applied our optimization technique to systems undergoing first-order-like phase transitions, for which there is currently no successful analytic theory for PT temperature spacings. The results above clearly show that our optimized temperature sequence can provide PT results with (1) a more uniform acceptance profile and (2) faster convergence. A standard geometric progression of temperatures applied to such a situation leads to a bottleneck in the acceptance profile, which ultimately slows system equilibration and hinders replica mixing. This failure is due to the breakdown of the assumption of constant heat capacity, resulting from a shift of metastability from one phase to another. Our approach, on the other hand, can be very effective in this regime. The method is based upon the HSA, an approximation of the underlying canonical distribution that is able to predict changes of state and other nontrivial thermodynamic behavior. By utilizing the HSA, our temperature optimization scheme is sensitive to changes in populations of various phases that occur near such transitions and hence can tolerate a nonuniform heat capacity.
Although our method requires no particular assumptions about the heat capacity profile, it does, importantly, rely on the HSA to faithfully represent the underlying canonical distribution. This representation is necessary in order to accurately estimate the thermodynamic behavior of the system, in particular its acceptance probability profile. Because the HSA is accurate only for low temperatures (and only exact at T = 0), our method will generically be most reliable for systems in or near the solid phase. We therefore intend to use it for suitable applications, such as crystal polymorphs, crystal−quasicrystal coexistence, and other systems undergoing structural rearrangements.
In a similar vein, our method also requires a database of configurational minima of the potential energy landscape, which must first be found before the HSA can be constructed and utilized. Although the total number of minima can often be large, fortunately our method only requires those minima that are important for the thermodynamics. For the systems we are considering (solids at reasonably low temperatures), this number is relatively small. Furthermore, finding these relevant minima can be computationally inexpensive compared to equilibrium thermodynamic sampling as a whole; 22,21 primarily because minima searching techniques need not satisfy detailed balance and hence permit a wider range of step-taking routines. By investing a relatively minor amount of time in global optimization (minutes in the case of our test systems), we obtain quicker convergence of our (much) longer PT simulations thanks to the HSA-optimized temperatures. Basin-hopping 33,39,40 is one efficient strategy for minima searching and exploration of the potential energy landscape and is currently available in our group software. 23,24 Finally, we mention that our method can be extended to optimize temperatures according to other criteria as well. Here temperatures were selected to achieve a uniform acceptance profile. Others, however, argue instead that temperature spacings should minimize replica round-trip times. 16 To achieve this, Katzgraber et al. use an iterative procedure that optimizes the flow of replica trajectories between terminal temperatures, achieving sizable efficiency gains over a uniform acceptance strategy. 16 The tools developed here, in particular the estimates for acceptance rates, could be used to optimize in a similar way. This analysis is in progress.