Canonical free-energy barrier of particle and polymer cluster formation

A common approach to study nucleation rates is the estimation of free-energy barriers. This usually requires knowledge about the shape of the forming droplet, a task that becomes notoriously difficult in macromolecular setups starting with a proper definition of the cluster boundary. Here we demonstrate a shape-free determination of the free energy for temperature-driven cluster formation in particle as well as polymer systems. Combined with rigorous results on equilibrium droplet formation, this allows for a well-defined finite-size scaling analysis of the effective interfacial free energy at a fixed density. We first verify the theoretical predictions for the formation of a liquid droplet in a supersaturated particle gas by generalized-ensemble Monte Carlo simulations of a Lennard-Jones system. Going one step further, we then generalize this approach to cluster formation in a dilute polymer solution. Our results suggest an analogy with particle condensation, when the macromolecules are interpreted as extended particles.

T he formation of equilibrium droplets from a supersaturated gas is a long-standing subject of interest, being an essential phase transition in nature [1][2][3] . More importantly, the underlying mechanism is relevant for a multitude of nucleation-like processes from statistical mechanics to material science. These include crystallization in colloidal suspensions 4,5 and cluster formation in protein solutions 5,6 , as well as domain formation in so-called phase-change materials [7][8][9] and glassy solids 10 . It is even connected to field theory 11 and nuclear reactions 12 . The formal framework of free-energy calculations is straightforward, for example, in terms of reaction coordinates in phase space, but the application in computer simulations is diverse with an ongoing demand for further methodological developments 13 . A seminal application was the parameter-free estimate of crystal nucleation rates from equilibrium free-energy barriers 4 . It seems natural that the estimation of nucleation barriers becomes increasingly difficult when considering more complex systems, such as polymer or protein solutions 5 .
The rate of nucleation R is related by classical nucleation theory 1,3 to the free-energy cost bDF of a nucleus on top of the nucleation barrier: with the inverse temperature b ¼ 1/k B T and the Boltzmann constant k B . The kinetic prefactor k includes the kinetic details of the nucleation process, such as diffusion and nucleus-attachment rates. The free-energy barrier may be related to the suppression in the equilibrium probability distribution. Physically relevant barriers for liquid-vapour condensation are supposed to be in the range of 20k B T to 100k B T (see, for example, ref. 14). The typical setup for the study of free-energy barriers is at fixed temperature by variation of the density or degree of supersaturation or directly in the grand canonical ensemble. The barrier is then associated with the suppression in the droplet-size probability distribution 4 and is clearly temperature dependent 15,16 . This usually requires estimating the droplet size and interface, a task that introduces systematic uncertainties and strongly depends on the droplet definition. Instead, the freeenergy barrier may be directly related to the volume of the critical nucleus and the pressure difference 17 , exploiting a thorough understanding of the underlying phenomenon in a clever way. In this context, the problem can be reduced to conformational phase space, knowing that canonical expectation values typically do not depend on the kinetic energy.
In the following, we address the question of how to easily obtain dependable results on nucleation barriers without invoking elaborate thermodynamic reasoning or estimating nucleus shapes. This opens the door to more complex systems with nucleationlike mechanisms such as self-assembly and aggregation, where the nucleus shapes are a priori unknown. Importantly, we consider a setup at fixed density with varying temperature-an intuitive approach from a condensed matter perspective. We focus on aggregation of polymers in a dilute setup [18][19][20] guided by the canonical case of droplet formation in a particle gas. For the canonical case, we analyse a two-dimensional freeenergy landscape and identify the energy as a suitable reaction coordinate. This allows us to formulate the problem in the microcanonical ensemble of either fixed total energy E or fixed potential energy E p . The first is the usual textbook definition, while the latter has been frequently applied in recent computer simulation studies. This enables us to directly discuss the effect of kinetic energy when changing between the two formulations E2E p . If kinetic energy matters, only the first one allows a direct physical interpretation.

Results
Droplet formation free-energy barrier. We begin with the paradigm of nucleation and dissolution, the equilibrium droplet formation in a supersaturated particle gas [21][22][23][24][25][26][27][28][29] . The first-order condensation-evaporation transition separates a homogeneous gas phase from an inhomogeneous phase, where a single macroscopic droplet of size N D is in equilibrium with the remaining vapour [21][22][23][24][25][26] . In fact, the probability for intermediatesized droplets was shown to vanish 23,24 . In the vicinity of the transition, the energy-dominated inhomogeneous condensed phase coexists with the entropy-dominated homogeneous gas phase. A transition between both phases may only occur by energy variation upon nucleation or dissolution. In reality, this of course refers to the total energy E. For systems where the momentum phase space may be integrated out explicitly (see Methods section), the problem simplifies in terms of computability. We hence begin with an illustration in the potential energy formulation (denoted by a hat, for example,F) as a direct result of computer simulations before we go over to comparing both energy approaches. Figure 1a shows the free-energy landscape bF E p ; N D À Á of droplet condensation-evaporation of Lennard-Jones particles (see Methods section) at the finite-size transition temperature.
is a generalization of the (conformational) density of states to the two-dimensional E p -N D reaction coordinate space. For E p fixed, bF E p ; N D À Á resembles a parabola with a single local minimum. The resulting transition path is shown as a black line and its relative maximum along this path (around E 0 p ) is an estimate of the transition free-energy barrier. Instead, however, one may consider the projection along the droplet size 4 or equivalently along the energy, connected to the corresponding probability distributions bFðN D Þ¼ À lnP b ðN D Þ and bFðE p Þ¼ À lnP b ðE p Þ.
We follow the latter approach and derive the free-energy barrier from the suppression of transition states in the canonical energy probability distributions (see Methods section) for both reaction coordinates E and E p . The probability distributions are shown in Fig. 1b,d and are clearly asymmetric, with a narrow peak for the gas phase and a broad peak for the droplet phase. Methodologically, both ensembles are analogous so that we limit in the following the notation to the case of total energy E. At the equal-height inverse temperature b eqh , P b eqh (E) has two peaks at E ± of equal height and in between a minimum at E 0 . The resulting free-energy barrier is bDF ¼ ln(P b eqh (E ± )/P b eqh (E 0 )). Equivalently, one may perform the analysis entirely in the microcanonical frame 30 and consider the enclosed area by the microcanonical inverse temperature b(E) and the canonical inverse temperature (see Methods section) bDF¼ shown in Fig. 1c,e. Demanding areas of equal size yields the equal-area inverse temperature b eqa , which is in fact identical to b eqh (ref. 31). We notice that the energy probability distributions P b (E) and P b E p À Á are related by a convolution involving the Maxwell-Boltzmann distribution P MB (x) as P b ðEÞ¼P b ÂP MB À Á (E) (see Methods section). This in turn corresponds to a physical smoothing, which diminishes the ratio between maxima and minimum. As a consequence, we expect a lower barrier in the total energy formulation due to the kinetic contribution.
Finite-size scaling of free-energy barrier. In the following, we discuss the free-energy barrier of droplet formation in a particle gas and a dilute polymer solution as a function of system size. We here extend the notion of droplet formation to the formation of clusters or aggregates in polymeric systems. In particular, we consider linear bead-spring polymers (see Methods section), each consisting of 13 monomers. The resulting polymer cluster or aggregate is coexisting with a polymer 'gas', see Fig. 2, a first sign for the analogy to particle droplet formation.
The free-energy barrier is commonly assumed to be proportional to the occurring interface. Here the surface of the droplet @V D separates the liquid droplet from the surrounding gas and consequently we expect bDF ¼ s@V D , with the interface tension s. For any non-fractal shape, the surface area is related to the droplet volume V D as @V D pV 2=3 D . Since nucleation shows no sign of critical behaviour, this is a physically valid assumption. However, at the condensation-evaporation transition V D itself does not scale trivially with system size V. At a fixed temperature, general arguments exploiting the equivalence to the Ising model imply that droplet formation is triggered by insertion of particles until a single macroscopic droplet of size V D pV 3/4 coexists with the surrounding vapour [23][24][25][26] . This result may be translated to a fixed-density setup using Taylor expansions, where directly at the finite-size transition temperature the analogue scaling V D pN 3/4 was verified for both lattice and off-lattice particle models 29 .
Putting everything together and introducing an effective interfacial free energy t eff then yields to leading order bDFpt eff N 1/2 . It is common for the study of interface tensions to consider logarithmic corrections [32][33][34][35] , dating back to early field theoretic results 11 . The physical origin are translational invariance as well as capillary waves at the interface. Altogether we use for our final scaling ansatz where a and c are constants. This is the leading-order exponent in equation (1). Neglecting the prefactor k for now, we obtain from equation (3) to leading order the rate of equilibrium droplet formation as RpN a e À t eff N 1=2 . Thus for increasing system size the probability that a single macroscopic droplet forms decreases exponentially. Figure 3a shows the free-energy barrier for droplet formation in a particle gas as a function of system size for both reaction coordinates E and E p , obtained via equation (2). Both estimates yield barriers up to about 42k B T, showing at close sight an almost constant shift. Only the total energy E includes the kinetic contribution, which is here reflected in a smaller barrier, whereas the interfacial free energies are expected to be identical, t eff ¼t eff . In fact, fits to equation (3) yield t eff ¼ 0.939(4) andt eff ¼0:935ð4Þ, each for NZN min ¼ 320 with goodness-of-fit parameter QE0.3, for reaction coordinate E and E p , respectively. This remains consistent within error bars under variation of N min A[224, 1,280]. The accordance of data and fit is demonstrated in the inset. In order to test the significance of the logarithmic corrections, we  ARTICLE considered in addition a restricted fit to bDF ¼ t eff N 1/2 þ c. We obtain t eff ¼ 0.973(2) andt eff ¼0:977ð2Þ for N min ¼ 768 with QE0.2 and QE0.1, respectively. However, the estimate of t eff gradually decreases with increasing N min . Thus the most probable scenario remain small logarithmic corrections competing with a constant shift. The leading scaling behaviour was also observed in ref. 17 directly as a linear function of the droplet surface. The advantage of the present approach is that it avoids difficulties and uncertainties coming from the 'correct' identification of the cluster surface.
For the cluster formation in a dilute polymer solution shown in Fig. 3b, the situation remains qualitatively similar. Fits to equation (3) yield t eff ¼ 1.06(3) andt eff ¼1:03ð3Þ, each for N min ¼ 16 with QE0.2, for reaction coordinate E and E p , respectively. Note that this is on the same scale as for droplet formation of particles in Fig. 3a. Compared with the particle case, the system sizes are, however, much smaller, making quantitative predictions for the polymer case less reliable. Still, the overall behaviour supports the hypothesis that cluster formation in a dilute polymer solution shows a strong analogy to droplet formation in a particle gas.
We observe that considering the kinetic contribution results in a shifted barrier. For the considered examples and relevant system sizes, the shift is about bDF À bDF % 0:5 . . . 1, that is, of the order Oð1Þ. This additive contribution, while much smaller than the leading behaviour, leads to a multiplicative relation between the nucleation rates Rpe À bDF Ee À ðbDF À 1Þ % 3e À bDF p3R. Neglecting the momentum phase space thus underestimates the rates. In common situations, however, the deviations between experiment and simulations are of the order of several magnitudes, such that the effect of the kinetic contribution may be considered subleading.
Finite-size scaling of transition temperature. The evaluation of the free-energy barrier via equal areas provides us with a definition of the finite-size transition temperature. At fixed density, we showed for the condensation-evaporation transition 29 that the transition temperature, obtained from specific-heat peaks, scales as inverse power of the critical droplet radius R D pV 1=3 D pN 1/4 . The same is expected for all other transition temperature definitions. Figure 4 shows the equal-area definition together with a fit including higher-order corrections of the form for cluster formation in both particle gas and polymer solution. In the case of particle condensation, least-square fits for N min ¼ 192 yield b 0 ¼ 1.436(2) andb 0 ¼1:436ð2Þ each with QE0.5, for reaction coordinate E and E p , respectively. The excellent fit results show that the empirical, yet physically motivated, higher-order corrections describe the finite-size scaling very well. In addition, the strong finite-size deviations open a possibility to study finitesize scaling in experiments on the nanoscale (for a conversion, see Methods section). The finite-size transition temperature of the largest system (N ¼ 2,048) is b eqa $ 1:61948ð2Þ, which still deviates from the thermodynamic limit by O(10%).
It is worth noting that typical canonical finite-size transition temperatures, for example, the peak location of the specific heat, do not depend on the kinetic contribution to the partition function. In the canonical expectation values, the kinetic prefactor simply cancels. Here, however, we observe a finite-size difference in the transition temperature depending on whether we consider the kinetic contribution or not. Of course, the thermodynamic limit coincides. This is illustrated in the inset of Fig. 4a with a finite-size scaling of the transition temperature differencê b eqa ðNÞ À b eqa ðNÞ. It shows a prominent power law scaling of the form $ N À 3=4 , which interestingly is the same scaling as the inverse transition droplet volume. The finite-size difference arises from the convolution of an asymmetric energy probability distribution with the Maxwell-Boltzmann distribution, compare Fig. 1 and equation (8) in Methods section, which manifests in the geometric differences of the microcanonical inverse temperature and the enclosed areas in equation (2). It appears that correlations between the ensemble definitions account for compatible leading-order scaling corrections in equation (4), which further supports this ansatz and explains the observed difference. For polymer aggregation in Fig. 4b, fits of equation (4) yield b 0 ¼ 0.64(2) andb 0 ¼0:64ð2Þ for N min ¼ 14 (guided by the inset) each with QE0.8, for reaction coordinate E and E p , respectively. Qualitatively, the fit ansatz describes the data already well when including the smallest system sizes. Also the finite-size ensemble deviation in the inset shows a clear N À 3/4 trend for NZ14. Again, this is an indication for the analogy between cluster formation in polymer solutions and droplet formation in a particle gas.

Discussion
We have presented a shape-free approach to the estimation of canonical free-energy barriers in equilibrium droplet formation. The finite-size scaling is dominated by the predicted N 1/2 behaviour but we identified additional logarithmic corrections from precise numerical estimates. Somewhat surprisingly, the absolute free-energy barrier is sensitive to the consideration of the kinetic contribution. It is well known that the restriction to the conformational phase space does not influence finite-size transition points determined from canonical expectation values. These are evaluated on the level of the canonical partition function. The free-energy barrier and the associated equal-height or equal-area transition temperature, however, are determined from the energy probability distribution. Here the two formulations are related by a convolution with the Maxwell-Boltzmann distribution, which explains the finite-size differences. At the same time, the probability distributions are the integrands of the respective partition functions. In the end, this boils down to the trivial fact that equality of integrals does not imply equality of the integrands. This may become relevant once theoretical predictions and experimental measurements become precise enough. Still, a restriction to the conformational phase space retains intensive parameters in the thermodynamic limit. As a numerical advantage, considering the total energy as reaction coordinate leads to less fluctuations in the microcanonical partition function and canonical probability distribution, since the underlying convolution is a smoothing procedure of physical origin.
We provided evidence that the derived finite-size scaling of canonical droplet formation is applicable to cluster formation in dilute polymer solutions as well, despite the a priori non-trivial shape of the polymer cluster. This is a clear indication of an analogy between particle condensation and polymer aggregation. In particular, we showed that polymer clusters are in equilibrium with non-attached (free) polymers: an inhomogeneous or mixed phase of aggregate and solute polymers. This is intuitively clear when the polymers are interpreted as extended particles. The leading-order corrections then follow from the interplay between energy minimization by forming a local cluster and entropy maximization by retaining freely movable constituents. Of course, additional corrections should follow from the explicit geometry and internal behaviour of the constituents.
It is expected that the energy remains a suitable reaction coordinate for general nucleation-like mechanisms. In this case, generalized ensemble methods may unfold their full power. Moreover, our approach at fixed density provides the possibility for experiments to perform heating-cooling studies in order to probe transition rates. The presented results for polymer aggregation suggest that this approach may be generalized to studies of protein cluster formation 6 . In a wider scope, it may also find potential application for temperature-driven selfassembly 36,37 , crystallization in phase-change materials 7-9 and glassy solids 10 , dislocation nucleation 38 or the study of surface nanobubbles and nanodroplets 39 . Of course, experimental observations commonly include the formation of multiple clusters. Reasons for this include heterogeneities or impurities acting as nucleation seeds. We suppose that this leads to a local quasiequilibrium on the respective length scales. Here a proper combination of the canonical droplet formation with the effect of nucleation seeds 40 seems to be a fruitful approach to a systematic understanding. With further developments, simulations may provide reliable estimates for finite-size systems and meet experiments on the nanometer scale.

Methods
Microcanonical ensembles. Recently, there has been some ambiguity with the definition of a 'microcanonical ensemble' in computer simulations 41 . This is a crucial aspect relevant for physical interpretations that appears to be unwittingly softened in the past decade. The microcanonical ensemble (NVE) describes an isolated system in which the number of constituents N, the volume V and the total energy E are conserved. Here the transfer of potential energy E p into kinetic energy E k and vice versa is a valid and relevant mechanism, where E ¼ E k þ E p . The microcanonical (Boltzmann) entropy is defined as S(E) ¼ k B ln G(E), with the partition function GðEÞ¼ where Dx denotes the integration over state space and Dp over momentum space.
The other common definition is the conformational microcanonical ensemble (NVE p ), describing instead a system with fixed potential energy E p . The conformational microcanonical entropy isŜ is the density of states or the conformational microcanonical partition function. The consequences are drastic: a (physical) interpretation of energy transfer from potential to kinetic energy is no longer valid. This is natural for spin systems, where a kinetic contribution is not defined in the first place (but may be exploited for numerical purposes 42 ). On the contrary, it is particularly relevant for situations in soft condensed matter, for example, for particles and polymers, where interpretations of energy transfer become natural. However, there are good reasons for this choice: O(E p ) is a fundamental property of statistical mechanics. It encodes the full information about the conformational space and allows for identification of (structural) phase transitions 30,31,43,44 . Furthermore, it may be exploited for reweighting techniques and flat-histogram Monte Carlo methods 18,45,46 .
The relation between G(E) and O(E p ) is given by a convolution with the kinetic energy contribution (see, for example, ref. 47): If momenta and positions are independent, one may separate the kinetic energy contribution E k ¼ P i p 2 i =2m, explicitly perform the momentum integration and obtain for N particles in three dimensions 48 GðEÞ¼ ð2pmÞ where G( 3N 2 ) is the Gamma function. We define O(E p ) ¼ 0 8 E p oE p,min in order to extend the integral over the full energy range ( À N, N). In this way, the total energy surface entropy S(E) appears as a (weighted) potential energy volume entropy. Notice that, since all O(E p )Z0, the classical NVE entropy increases with E and the microcanonical inverse temperature k B b(E) ¼ @S(E)/@E cannot become negative, opposed to its conformational counterpart k Bb E p À Á ¼ @Ŝ E p À Á =@E p . This may be an interesting aspect for a recent debate on the correct definition of entropy when connected with the phenomenological thermodynamic entropy, for example, in refs 49-51 and references therein.
Numerically, we determine the microcanonical inverse temperatures as follows. In the conformational microcanonical ensemble, we have direct access to an estimate of ln O(E p ) (see below) such thatb(E p ) is obtained by a numerical five-point derivative. In the full microcanonical ensemble, we may estimate the inverse temperature in terms of microcanonical expectation values for N where the explicit prefactor in equation (5) cancels. Then we may express Canonical ensembles. The canonical ensemble is defined in terms of the partition function Z b ¼ R R Dx Dp e À bE , where each phase-space point is weighted with the Boltzmann factor according to the total energy. Again, the kinetic part may be explicitly integrated for generic systems. Each degree of freedom contributes with a Gaussian integral, and one obtains for N particles, whereẐ b ¼ R Dx e À bEp is the partition function of the conformational canonical ensemble. Both partition functions may be expressed as integrals in terms of the respective energies, namely, The corresponding canonical energy probability distributions are defined as We may now relate the two energy probability distributions by starting with the definition of P b (E) and inserting equations 5 and 7: We identify the latter part of the integrand as the N-particle Maxwell-Boltzmann distribution P MB (x) and may write equation (8) as a convolution Lennard-Jones particles. We consider a system of Lennard-Jones particles in a dimensionless periodic box of length L with fixed density r ¼ N/L 3 ¼ 10 À 2 . Mutual avoidance and short-range attraction are modelled by the 12-6 Lennard-Jones potential V LJ r ð Þ¼ 4E s=r ð Þ 12 À s=r ð Þ 6 Â Ã with E ¼ 1 and s ¼ 2 À 1/6 such that r min ¼ 1. For computational efficiency, the potential is cutoff at r c ¼ 2.5s and shifted by À V LJ (r c ). System sizes range up to N ¼ 2,048, which is competitive with state-of-the-art Molecular Dynamics simulations such as well-tempered metadynamics 52 . For the chosen density r ¼ 10 À 2 , a system with 2,048 particles requires a box of linear dimension L 0 E59 r 0 min ¼ 59 Â 2 1/6 s 0 . For argon, s 0 E3.4 Å such that L 0 E22.5 nm is on the nanoscale. Of course, for a comparison to an experimental setup one should include both the explicit geometric constraints and the full Lennard-Jones potential.
Bead-spring polymers. The considered dilute polymer solution is modelled by N linear bead-spring polymers, consisting of 13 monomers each, again in a dimensionless periodic box with monomer density r ¼ 13N/L 3 ¼ 10 À 2 . Bonds are modelled between neighbouring monomers by the FENE potential V FENE (r) ¼ À (KR 2 /2)ln[1 À (r À r 0 ) 2 /R 2 ] with K ¼ 40, R ¼ 0.3 and r 0 ¼ 0.7. Non-bonded monomers interact with the same Lennard-Jones potential as above but with s ¼ r 0 2 À 1/6 such that r min ¼ r 0 [18][19][20] . The total number of monomers is 13N, which yields 3 Â 13N total momentum degrees of freedom in equation (5) and successive relations. The bounded bond length [r 0 À R, r 0 þ R] from the FENE potential formally introduces constraints on these degrees of freedom. However, for practical applications in ordinary temperature ranges this effect is negligible and reweighting to the full microcanonical and canonical ensemble is feasible 41 .
Multicanonical Monte Carlo simulations. Parallel multicanonical Monte Carlo simulations 53-58 allow us to efficiently sample the suppressed states, by iteratively adapting an auxiliary weight function W(E p ) to yield a flat histogram H(E p ). The final weight function is related to the density of states up to a multiplicative factor: O(E p )pH(E p )/W(E p ). This gives direct access to microcanonical estimates and canonical expectation values at any temperature. Using equation (5), this even provides an estimate of G(E). Monte Carlo updates for the particle case include short-and long-range particle displacements. For updates of the polymer configurations, we employed local single-monomer shifts, bond-rotation and double-bridging moves, as well as long-range polymer displacements We measure the conformational (potential) energy E p and the number of particles in the largest cluster N D as in ref. 29. Error bars are obtained by the Jackknife method 59,60 .
Data availability. The data that support the findings of this study are available from the corresponding author upon request. The computer code required to generate the data as well as the analysis scripts that lead to our conclusions are available from the corresponding author upon reasonable request.