Kinetic constraints on self-assembly into closed supramolecular structures

Many biological and synthetic systems exploit self-assembly to generate highly intricate closed supramolecular architectures, ranging from self-assembling cages to viral capsids. The fundamental design principles that control the structural determinants of the resulting assemblies are increasingly well-understood, but much less is known about the kinetics of such assembly phenomena and it remains a key challenge to elucidate how these systems can be engineered to assemble in an efficient manner and avoid kinetic trapping. We show here that simple scaling laws emerge from a set of kinetic equations describing the self-assembly of identical building blocks into closed supramolecular structures and that this scaling behavior provides general rules that determine efficient assembly in these systems. Using this framework, we uncover the existence of a narrow range of parameter space that supports efficient self-assembly and reveal that nature capitalizes on this behavior to direct the reliable assembly of viral capsids on biologically relevant timescales.

analysis of experimental data, general dynamic constraints on the microscopic rate constants that control efficient supramolecular self-assembly in such systems.

Results and Discussion
Fundamental kinetic equations. The self-assembly of molecular building blocks into closed target structures may be captured by the following set of kinetic equations for the concentration f(t, j) of intermediates of size j, known as the assembly line model ( Fig. 1(a)) 19,20 : where N is the number of subunits in the target structure and m(t) is the concentration of free subunits in solution, as determined by conservation of the total subunit concentration The terms on the first line of Eq. (1) describe the growth of assembly intermediates through the addition of individual subunits with rate constant k + . The term k m t ( ) n n c describes the initial nucleation step 24 as the spontaneous formation of the smallest growth-competent intermediate from the interaction of n c subunits with rate constant k n . Thus, the parameter n c corresponds to the reaction order of the nucleation step and, in the simplest scenario, can be thought of as the size of the smallest stable assembly intermediate (n c is the analogous quantity to the critical nucleus size in classical nucleation theory); intermediates with size j < n c are unstable and quickly dissociate back to free subunits, such that their concentration can be assumed to be negligible. Finally, the last equation of (1) describes the end step of the assembly line as the closure of an intermediate of size N − 1 into the final structure. Note that in the limit of infinite N, Eq. (1) recover the kinetic equations commonly used to describe filamentous protein assembly processes [14][15][16] . Note that, in Eq. (1) we have assumed size-independent rate constants; this assumption was primarily made for minimizing the number of model parameters to avoid over-fitting in the analysis of kinetic data, but our framework can be extended straightforwardly to take this effect into account. Moreover, Eq. (1) are deterministic and neglect therefore the potential effect of statistical number fluctuations. Such fluctuations are often negligible in reactions in bulk, but can become dominant in reactions under volume confinement 25 . We also note that Eq. (1) assume that assembly results in a single closed capsid geometry. Application to reactions that yield multiple capsid morphologies, such as the CCMV systems studied in refs 26,27 , require a more complex master equation.
Integrated rate law for assembly kinetics. To obtain an integrated rate law to Eqs (1) and (2), we make use of the perturbative renormalization group (RG) 28 , a general mathematical technique for constructing approximative solutions to nonlinear differential equations. In our case, the applicability of this method relies on the observation that the dimensionless ratio ε ≡ c is a small parameter, where m(0) is the initial subunit concentration. In order to take advantage formally of the smallness of ε, it is convenient to rewrite Eq. (1) in dimensionless form ( , )/ (0). Solving Eq. (3) perturbatively yields, after some algebra, the following result for the time-varying subunit concentration: . While Eq. (4) is accurate for short times, we observe the emergence of a divergent term ε τ − N ( ) at later times, which prevents this linearized early-time solution from being valid over the full time course of the reaction. At a fundamental level, this divergence emerges due to our ignorance about the system's behavior in the future; in fact, while the information about the initial concentration of subunits is sufficient for describing the system dynamics over short timescales, at later times the lack of information about the way in which ρ varies with changing timescale is what causes Eq. (4) to depart from the true solution. Perturbative RG provides a systematic method for dealing with this undesired divergence and hence obtain a global approximation valid for the duration of the whole reaction. Note that this procedure mirrors very closely the conventional RG approaches of quantum field theory and condensed matter physics. In these theories, we are interested in describing how a certain quantity of interest, such as the charge or the mass of an electron, is renormalized as we vary the observation scale (e.g. momentum or energy scale in quantum field theory). The missing information about the large-scale (e.g. high-energy) behavior of the system is packed into so-called counter terms, which are constructed in order to cancel the divergencies in the theory. In our case, the analogous quantity to the electron charge or mass of quantum field theory is the initial concentration of monomers and the RG procedure should yield renormalized values for this quantity at different time scales. Following the conventional work-flow of RG, we start by introducing an arbitrary time scale σ which we will vary between the initial time 0 and the observation time τ and then allow for a σ-dependence of the initial subunit concentration by writing ρ ρ σ εδρ σ = + ( ) ( ), where ρ(σ) is the renormalized subunit concentration (at scale σ) and δρ σ ( ) is a counter term. The counter term δρ σ ρ σ = N ( ) n c removes the divergent term in Eq. (4) and so we arrive at the following renormalized expansion where  stands for regular terms. As a next step in the RG framework, we require µ σ ∂ ∂ = / 0 since σ is arbitrary. Doing so, we arrive at the following RG equation By solving Eq. (6) and substituting in Eq. (5) as σ τ → we obtain the uniformly valid solutions Finally, using conservation of total subunit concentration and transforming back to real time t we arrive at the following integrated rate law for the concentration of closed target structures: This solution shows overall good agreement with the numerical evaluation of Eqs (1) and (2) (Fig. 1(b) and see Supplementary Material for a discussion on the accuracy of Eq. (9) as a function of ε). Moreover, we note that within the first-order RG approximation discussed here the kinetic trace for capsid formation is systematically underestimated by the analytical solution. This is because the function ρ(t) obtained by solving the first-order RG equation decays faster than the true solution. These errors can be reduced by applying the RG method to higher orders in ε.
General characteristics of assembly kinetics. Using the integrated rate law, Eq. (9), we are now in the position to derive, from first principles, a number of relationships characterizing the time course of the assembly reaction. According to Eq. (9), the time evolution of the concentration of target structures demonstrates the characteristic sigmoidal shape defined by an initial lag phase followed by a phase of rapid growth and final asymptotic approach to the plateau 20  Note that r max is given by the product of the rate of rate-limiting nucleation step and the Poissonian probability of observing the minimal number N − n c of elongation steps required to complete the assembly structure. A key prediction of Eq. (12) is the emergence of a power-law scaling of the maximal growth rate with initial subunit concentration γ r m(0) max . Because the scaling exponent γ solely depends on the nature of the nucleation step, γ = n c , the critical nucleus size can be determined from the slope of a log-log plot of r max vs m(0). Thus, as in many other areas of science 29,30 , scaling laws emerge in the context of supramolecular assembly as a general property that connects macroscopic data with the physical nature of the underlying microscopic processes through the value of the scaling exponent.
Equation (9) implies that the median assembly time t 1/2 , defined by the condition is the time needed to consume half the free subunits after the assembly line is set up, assuming that each nucleation event leads to the target structure through the eventual consumption of N subunits (see Supplementary  Information). This result shows that the t 1/2 is given as a sum of two distinct contributions, one originating from t max , the time necessary to form a quasi-steady state of intermediates, and the other from t nuc + t max , the time to nucleate a sufficient amount of intermediates that mature into the final structure through the the chain reactions of the assembly line. The former contribution to t 1/2 depends only on the efficiency of the elongation reactions in the assembly line, while the latter is governed by nucleation events. Crucially, the relative importance of these two contributions to the median assembly time is determined by the parameter ε = − + k m k (0) / n n 2 c . This quantitywhich measures the ratio of the rates of nucleation and elongation-naturally emerges from our theoretical framework as the key parameter controlling the assembly kinetics. In general, large values of ε correspond to a kinetic trap, whereby subunits are significantly depleted by nucleating too quickly, leaving less material to complete the assembly of target structures. By contrast, when ε is small, few nuclei are formed and the assembly yield is low for relevant time scales. The crossover between these two regimes occurs when t nuc = 2t max . Using the results above, this criterion can be formulated as a condition on the parameter ε as: When ε > ε c , the system is susceptible to kinetic traps, whereas when ε < ε c the assembly is inefficient. According to this criterion, successful assembly is the result of a delicate balance between the necessity of forming appreciable amounts of target structures and the danger of being kinetically trapped. Controlling the relative importance of nucleation and elongation processes provides therefore a high degree of intrinsic regulation of self-assembly 20 . We note that ε c decreases with increasing size N of the target geometry as N −2 . This behavior follows intuition because larger target structures impose stronger constraints on the time available for nucleation, t nuc ~ 1/N, while the time required for producing the quasi-steady state assembly line, t max ~ N, is inevitably longer for larger N.
Kinetic analysis of experimental data. Through the analysis of experimental kinetic data, we now demonstrate that the theoretical framework provided by Eq. (9) is capable of describing macroscopic features of supra-molecular self-assembly into closed topologies in terms of microscopic rate constants. We took a representative example and considered kinetic data of the formation of icosahedral viral capsids. Since the current version of our theory only considers empty capsid assembly, we limit our comparison to in vitro experiments on the assembly of purified capsid proteins; i.e., the systems do not include viral genomes, other viral proteins, or host factors. Previous studies modeling viral capsid assembly kinetics using master equations 19,20,[31][32][33][34][35][36] , continuum models [37][38][39] or molecular dynamics simulations [40][41][42][43][44][45][46][47][48][49] have led to important insights into the system characteristics, yet it remains a key challenge to elucidate the general physical principles underlying capsid assembly. First, we consider the assembly kinetics of Human Hepatitis B Virus (HBV) 19 , a representative icosahedral virus comprised predominantly of N = 120 subunits. Figure 1(c) shows the time evolution of HBV capsid concentration, as monitored by light scattering intensity, fit globally to the integrated rate law Eq. (9) with fixed n c = 3 (as determined from the scaling of maximal growth rate, Fig. 2(b)), yielding rate constants of = . ± . × + k 3 32 0 15 10 5 M −1 s −1 and = . ± . × k 1 6 0 9 10 n 6 M −2 s −1 . The global nature of the fit demonstrates the consistent agreement between Eq. (9) and the full time courses observed in the experiment over a wide range of initial subunit concentrations, including the characteristic sigmoidal shape of kinetic traces. We note that the entire data set could be fitted to Eq. (9) using just two global rate constants and one concentration-dependent plateau parameter for each kinetic curve that accounts for the constant of proportionality between the measured light scattering signal and the capsid concentration. In the SI, we provide also fitting to HBV assembly data under reducing conditions at 37 °C and pH 7.5 from ref. 50 . The fits to these higher temperature data, however, are less accurate, which could arise due to late stage intermediates 52 and protein interconversion between assembly-active and assembly-inactive conformations 53,54 . Next, we consider kinetic data for the formation of Human Papillomavirus (HPV, Fig. 1(d)) 21 and Brome Mosaic Virus (BMV, Fig. 1(e)) capsids 22  . The scatter in the data for the inflection time is due to increased experimental noise in the kinetic profiles close to the initial point of the reaction. Moreover, Fig. 2(b) illustrates how the relevant value for the reaction order for the nucleation step, n c , can be determined from the analysis of the maximal growth rate as a function of total subunit concentration. Through the analysis of the maximal growth rate, it is therefore possible to fix the value of n c necessary for fitting kinetic traces. We note that the value of n c for BMV was set to 3 as reported previously in the literature 22 ; this was done because the corresponding dataset has too few points for confident fitting. We also note that similar scaling laws have been previously obtained approximately by assuming an ad hoc separation between nucleation and growth processes 39,55 .
The availability of microscopic rate constants enabled by the present analysis allows mechanistic comparisons to be made between the assembly of different virus capsid systems. Interestingly, while the absolute values of the rate constants obtained from the fitting of experimental data vary over several orders of magnitude ( Fig. 1(f)), we observe that the parameter ε takes similar values across the different data sets: ε = . ± . × − 5 0 0 4 10 5 ( µ = m(0) 10 M) for HBV, ε = . ± . × − 8 0 0 7 10 5 ( µ = m(0) 1 M) for HPV and ε = . ± . × − 1 7 0 3 10 5 ( µ = m(0) 10 M) for BMV. Moreover, these values fall in the same order of magnitude as the theoretical predictions for ε c : . × − 5 5 10 5 (HBV), . × − 1 0 10 4 (HPV) and . × − 9 5 10 5 (BMV). This illustrates how the apparently distinct viral systems studied in this work are characterized by a similar balance of the relative rates of elongation and nucleation to achieve successful assembly. By contrast, for filamentous protein self-assembly 14-16 the long-time average length of aggregates 〈 〉 L becomes ε 〈 〉L 1/ c . As linear systems such as actin 51 are required by their biological function to be long, they should have low measured ε so as to maximize efficiency. This prediction is in agreement with what is observed in Fig. 2(c).

Conclusions
In conclusion, although it is of both fundamental and practical interest to identify and characterize the kinetic constraints governing supra-molecular self-assembly into closed target structures, this understanding has proved challenging to achieve in practice. Here, we have demonstrated how the availability of integrated rate laws to the underlying kinetic equations illuminates the dynamic design criteria that characterize the efficiency of such processes. We showed that efficient assembly only occurs in a narrow range of parameter space. By applying this kinetic analysis to experimental data of icosahedral viral capsid assembly we demonstrated that these structures occupy this narrow region of parameter space corresponding to efficient assembly. The reaction order for nucleation, n c , is obtained from the scaling behavior of r max . The data are for HBV and HPV. Note that discontinuities in the experimental kinetic traces for HPV assembly are responsible for inaccuracies in determining r max for data at higher initial concentrations. (c) Balance between elongation and nucleation in the viral systems studied in this work. Green solid line corresponds to ε = 4.9 × 10 −5 , green dashed lines correspond to ε c (Eq. (14)) for the various viruses. BMV data denoted by circles, HPV by squares, HBV by hexagons and HBV assembly data obtained at 37 °C and pH 7.5 from 50 (see Supplementary Information) by triangles. Blue data are from actin polymerization measurements in magnesium (stars) or in calcium (diamonds) from 51 . Viral images reproduced from 23 with permission.