Entropic control of particle sizes during viral self-assembly

Morphological diversity is observed across all families of viruses. Yet these supramolecular assemblies are most often produced spontaneously, through complex molecular self-assembly scenarios. Modeling these phenomena remains a challenging problem within the emerging field of Physical Virology. We present in this work a theoretical analysis that highlights the particular role of configuration entropy in the control of the viral particle size distribution. Specializing this model to retroviruses like HIV-1, we predict a new mechanism of entropic control of both RNA uptake into the viral particle and the particle size distribution. Evidence of this peculiar behavior has recently been reported experimentally.


I. INTRODUCTION
Viruses rely mainly on molecular self-assembly to perpetuate their life cycle. Spontaneous self-assembly is indeed a powerful yet passive way of structuring solutions of molecules at a length scale intermediate between the nanoscopic scale of the molecule itself and the microscopic scale of cells. Viruses are composed of proteins, nucleic acids, and, in the case of enveloped viruses, lipids. All these components have to be orchestrated through self-assembly in order to produce the regular morphologies observed across different viral families [1]. The precise modeling of these phenomena is generally challenging. In particular, the question of how self-assembly is regulated or controlled remains largely open for viruses with complex life cycles.
In the case of non-enveloped viruses, the genome is protected by a protein shell called a capsid, in which the proteins are arranged according to icosahedral symmetry [2]. This symmetry imposes restrictions on the size distribution of the viral particles that can be self-assembled. The spontaneous curvature of protein layers has recently been put forward, using continuous models of the capsid, as another plausible mechanism for controlling size polydispersity [3]. In the present work, we emphasize the particular role of the genome in the regulation of the particle size distribution. This is especially important for viruses that have multipartite genomes. Indeed, for such viruses the configuration entropy of all the molecules constituting the virus is highly relevant, and its balance with the enthalpic contribution to self-assembly leads to specific phenomena such as the entropic control of the particle size distribution discussed here. Zandi and Van der Schoot recently discussed, within a similar formalism, the interplay between the electrostatic forces driving the co-assembly of proteins and RNA and their relative stoichiometry [4]. Following their work, we extend the modeling of viral self-assembly in order to describe the competition between different particle sizes and different RNA contents. Our analysis identifies several roles of entropic origin for the genome: (i) the genome facilitates the self-assembly of viral proteins by lowering the onset of particle formation and by increasing the effective free energy gain per protein upon particle formation; (ii) in mixtures of viral and cellular RNAs, viral RNAs are preferentially co-packaged within viral particles on entropic grounds; (iii) the uptake of viral genome shifts the particle size distribution towards smaller particles and reduces polydispersity. This paper is organized as follows.
In the first part, we present the classical thermodynamic framework used to describe micellization phenomena, in the case of monodisperse protein self-assembly. The entropic role of the genome in the self-assembly is then investigated. The second part describes the influence of viral and cellular RNA uptake on the self-assembly process. The entropic uptake of viral RNA and the entropic control of size polydispersity found using the model are discussed with respect to recent experiments performed on HIV-1.

II. VIRAL SELF-ASSEMBLY AND THE ENTROPIC ROLE OF THE GENOME
arXiv:1302.4724v1 [q-bio.BM] 19 Feb 2013
A. Classical description of pure protein self-assembly using micellization thermodynamics
We consider in this section identical proteins that have a spontaneous tendency to self-assemble into a set of different aggregates. Knowing the input concentration of proteins $\phi_0$, we investigate the equilibrium partitioning of proteins into the different aggregates. Each aggregate is made of $p$ proteins, and the equilibrium concentration of these aggregates is written as $c_p$. The gain in free energy for the formation of one aggregate of size $p$ is $kT F_p$, where $k$ is the Boltzmann constant and $T$ the temperature of the system. The reference free energy of a single protein is $kT F_1$. The Gibbs free energy of the solution of proteins is written as
$$G = kT\,V \sum_{p \geq 1} \left[ c_p \left( \ln(c_p v_0) - 1 \right) + c_p F_p \right], \tag{1}$$
where $V$ is the volume of the solution and $v_0$ is a reference volume, interpreted as the cell volume used to compute the configuration entropy. For each aggregate type, there is a translational entropy term $kT\,V c_p \left( \ln(c_p v_0) - 1 \right)$ and an energetic gain term for the formation of the aggregate, $kT\,V c_p F_p$. As described below, it is the balance between these entropic and enthalpic contributions that sets the precise size distribution. This Gibbs free energy implicitly assumes that long-range interactions between aggregates are negligible. At equilibrium, the size distribution $c_p$ minimizes the Gibbs free energy under the global constraint of mass conservation
$$\phi_0 = \sum_{p \geq 1} p\, c_p. \tag{2}$$
This constraint can be taken into account through a Lagrange multiplier $\mu$, expressed here in units of $kT$, which is interpreted as the chemical potential of individual proteins. The equilibrium conditions are written as
$$c_p v_0 = e^{p\mu - F_p}, \tag{3}$$
$$\phi_0 = \frac{1}{v_0} \sum_{p \geq 1} p\, e^{p\mu - F_p}. \tag{4}$$
The first equation is simply the law of mass action for the aggregate of size $p$; applied to $p = 1$, it gives $\mu = \ln(c_1 v_0) + F_1$. Using the notation $\Delta G_p \equiv F_p - pF_1 \equiv p g_p$, one can find the equilibrium partition of proteins among the different aggregates by solving the following non-linear equation in $c_1$
$$\phi_0 = c_1 + \frac{1}{v_0} \sum_{p > 1} p \left( c_1 v_0\, e^{-g_p} \right)^{p}, \tag{5}$$
and by plugging the solution into the law of mass action, Eq. 3.
In order to address the question of the dominance of a given population of particles with respect to another one, we restrict the model to a bimodal size distribution: the product of the self-assembly is either a small particle with $p_1$ proteins or a large particle with $p_2$ proteins. The equilibrium concentration of un-aggregated proteins $c_1$ is now given by
$$\phi_0 = c_1 + \frac{p_1}{v_0} \left( c_1 v_0\, e^{-g_1} \right)^{p_1} + \frac{p_2}{v_0} \left( c_1 v_0\, e^{-g_2} \right)^{p_2}, \tag{6}$$
where $g_1$ and $g_2$ are the free energy gains per protein for small and large particles, respectively. Figure 1a shows the numerical solution of the previous equation for a representative set of parameters mimicking a protein titration experiment. This set of parameters was chosen because the representation of the results is particularly clear in this case. We checked however that the results described below do not depend strongly on the precise choice of the parameters. In the particular case where the free energy gains per protein for small particles $g_1$ and for large particles $g_2$ are equal, any imbalance between the populations of small and large particles is directly attributed to a purely entropic effect. For convenience, this scenario is called the "non-selective enthalpy" (NSE) scenario.
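As an illustration of how the bimodal partition can be obtained numerically, the following minimal Python sketch solves the mass-conservation equation for the free-protein concentration by bisection. It assumes the law of mass action in the form $c_{p_i} v_0 = (c_1 v_0 e^{-g_i})^{p_i}$ and works with the dimensionless variables $x = c_1 v_0$ and $\Phi = \phi_0 v_0$. The function names and the default parameter values ($p_1 = 20$, $p_2 = 40$, $g_1 = g_2 = -2$) are hypothetical choices made for illustration; they are not the parameters of figure 1.

```python
import math


def aggregate_conc(x, p, g):
    """Dimensionless concentration c_p*v0 of aggregates made of p proteins.

    Assumed law of mass action: c_p*v0 = (x * exp(-g))**p, where x = c_1*v0
    and g is the free energy gain per protein in units of kT (negative g
    favors assembly). Evaluated in log space to avoid overflow.
    """
    if x <= 0.0:
        return 0.0
    log_c = p * (math.log(x) - g)
    return math.inf if log_c > 700.0 else math.exp(log_c)


def total_protein(x, sizes, gains):
    """Free proteins plus proteins stored in all aggregates (dimensionless)."""
    return x + sum(p * aggregate_conc(x, p, g) for p, g in zip(sizes, gains))


def solve_free_protein(phi, sizes, gains, iterations=200):
    """Bisection for x such that total_protein(x) equals phi.

    total_protein is increasing in x, is 0 at x = 0 and is at least phi
    at x = phi, so the root is bracketed by [0, phi].
    """
    lo, hi = 0.0, phi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_protein(mid, sizes, gains) < phi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bimodal_partition(phi, p1=20, p2=40, g1=-2.0, g2=-2.0):
    """Return (free proteins, small particles, large particles), all times v0."""
    x = solve_free_protein(phi, (p1, p2), (g1, g2))
    return x, aggregate_conc(x, p1, g1), aggregate_conc(x, p2, g2)
```

Calling `bimodal_partition(phi)` for a range of `phi` values mimics a protein titration. Bisection is used here only because the left-hand side is monotonic in $x$, which makes the bracket trivial; any one-dimensional root finder would do.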
For low initial concentrations of proteins, the entropy of individual proteins cannot be balanced by the free energy gain per protein $g_i$, and no self-assembly occurs. Once the so-called critical micellar concentration (CMC) is reached, protein association starts. This threshold is roughly estimated by Eq. 7. Once aggregation sets in, there are more small particles than large particles. This is a purely entropic effect, as already mentioned: a larger number of smaller particles can be formed at a fixed concentration of proteins.
FIG. 1. (b) Ratio $c_{p_1}/c_{p_2}$ as a function of initial concentration. The curves correspond to increasing difference $g_1 - g_2 > 0$ (favoring larger particles) between enthalpic gains, following the large black arrows. The orange dashed line represents the limit between small particle dominance (blue rectangle) and large particle dominance (green rectangle). The entropic selection reduces as the enthalpic gain is increased. Parameters are identical to (a), except for $g_1$
It is possible to relax the NSE model by choosing distinct free energy gains per protein, $g_1 > g_2$. In this case, the enthalpic contribution to self-assembly tends to favor larger particles, and it therefore counterbalances the entropic selection mechanism illustrated in figure 1a. After a little algebra, it is possible to find an exact relationship (Eq. 8) between the initial protein concentration $\phi_0$ and the ratio between the equilibrium numbers of small and large particles, $\alpha = c_{p_1}/c_{p_2}$. The solution of this equation, $\alpha = f(\phi_0)$, is shown in figure 1b for different values of $g_1 - g_2$. A progressive loss of entropic selection in favor of enthalpic selection is observed. As a consequence, the entropic selection of small particles relies on the NSE assumption and is expected to be observed only under weak enthalpic size selectivity.
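The threshold and the ratio discussed above can be made explicit under the assumption that the law of mass action reads $c_{p_i} v_0 = (c_1 v_0 e^{-g_i})^{p_i}$. The following is a sketch derived from that assumption; it is not necessarily the authors' exact form of Eqs. 7 and 8.

```latex
% Assumed law of mass action for the two particle sizes (i = 1, 2):
%   c_{p_i} v_0 = (c_1 v_0 e^{-g_i})^{p_i}
\begin{align*}
% Rough CMC estimate: aggregates become appreciable when c_1 v_0 e^{-g_i} approaches 1
\phi_{\mathrm{CMC}} &\sim \frac{e^{g_i}}{v_0}, \\
% Ratio of small to large particles, and its inversion for c_1
\alpha \equiv \frac{c_{p_1}}{c_{p_2}} &= (c_1 v_0)^{p_1 - p_2}\, e^{p_2 g_2 - p_1 g_1}
\quad\Longrightarrow\quad
c_1 v_0 = \left( \alpha\, e^{p_1 g_1 - p_2 g_2} \right)^{1/(p_1 - p_2)}, \\
% Mass conservation rewritten with alpha
\phi_0 &= c_1 + \left( p_1 \alpha + p_2 \right) c_{p_2},
\qquad
c_{p_2} = \frac{1}{v_0} \left( c_1 v_0\, e^{-g_2} \right)^{p_2}.
\end{align*}
```

Substituting the expression of $c_1 v_0$ into the last line gives $\phi_0$ as an explicit function of $\alpha$. In the NSE case $g_1 = g_2 = g$, the ratio reduces to $\alpha = (c_1 v_0 e^{-g})^{p_1 - p_2}$, which is larger than 1 as long as $c_1 v_0 e^{-g} < 1$ because $p_1 < p_2$: small particles are then the more numerous.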

B. Entropic role of monodisperse RNA upon protein self-assembly
The previous calculation is useful for illustrating the role of entropy in the partitioning of proteins among different particles. However, many viruses require the presence of their genome in order to initiate or complete their assembly.
Without specifying the precise structure of the capsid with its inner genome (single-stranded RNA in most cases), it is possible to generalize the previous approach in order to predict the influence of a monodisperse genome on the viral particle self-assembly. This generalization has been partly done in reference [4], and therefore the results of this section are similar to those obtained earlier.
The first important feature to be incorporated into this generalized model is the stoichiometry of viral particles. Indeed, several works have recently pointed out a linear relation of electrostatic origin between the number of proteins in the capsid and the total number of nucleotides in the genome [4][5][6][7]. Following these works, we assume that each particle made of $p_i$ proteins contains $m_i$ RNA molecules such that $p_i = K m_i$, where $K$ is a constant depending in particular on the length of the RNA. Assuming that the initial concentrations of proteins and RNAs are respectively $\phi_0$ and $\phi_r$, the equilibrium equations for self-assembly are easily written down (Eqs. 9, 10 and 11), where $c_r$ is the concentration of free RNAs. In order to highlight the role of the genome in the self-assembly process, we restrict the products of self-assembly to two distinct sizes of capsid, made of $p_1$ and $p_2$ proteins. Furthermore, each of these particles $p_{1,2}$ can contain either no RNA or $m_{1,2}$ RNAs. There are therefore four types of particles within this model (two particle sizes and two RNA contents each), a feature that goes beyond the two-particle treatment performed in reference [4]. This particular configuration allows us to show two main features of the presence of RNA during self-assembly. The first is the enhancement of the entropic selection of smaller particles containing RNA, even if RNA uptake occurs without extra free energy gain. The second is the lowering of the CMC for particle assembly. The first feature is illustrated in figure 2. For clarity of notation, we denote the free energy gain per protein in the presence and in the absence of RNA by $g_i^{(P+R)}$ and $g_i^{(P)}$, respectively. In the case where RNA does not bring any extra gain in free energy upon its uptake into the viral particle, we have $g_i^{(P+R)} = g_i^{(P)}$, and the results of figure 2 show that small particles containing RNA are more numerous than larger particles, regardless of their RNA content.
This shows that the presence of the genome in the solution is a key factor affecting the relative populations of particles.
Rewriting the equilibrium equations, this can be understood as an effective increase in the free energy gain per protein. Indeed, Eqs. 9, 10 and 11 applied to the four types of particles can be written in terms of an effective free energy gain per protein $g_i^{\mathrm{eff}}$, where $g_i^{(0)}$ is the free energy gain per protein for pure protein self-assembly into a particle of size $p_i$ and $\delta g_i$ is the extra free energy gain per protein brought by the presence of RNA. This last term contains in particular both the contribution of the RNA entropy within the particle and the specificity of RNA-protein interactions. Their contributions to the size selection phenomena discussed within our formalism are expected to produce similar effects. The relative contribution of RNA entropy and RNA-protein interactions is however expected to be more model-dependent, and therefore goes beyond the scope of the present work. The expression of $g_i^{\mathrm{eff}}$ shows that even without intrinsic extra free energy gain, $\delta g_i = 0$, the entropy of RNA effectively increases the free energy gain, $g_i^{\mathrm{eff}}$ becoming more negative.
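For reference, one way of writing the equilibrium conditions with RNA and the resulting effective free energy is sketched below. It is obtained by adding a translational entropy term for free RNA to the Gibbs free energy of the previous section and minimizing under protein and RNA conservation. The particle index $i$ and the symbol $c_i$ for the concentration of particles of type $i$ are notational assumptions, and the expressions are a reconstruction from the definitions in the text, not necessarily the authors' exact Eqs. 9 to 11.

```latex
% Particle type i contains p_i proteins and m_i RNAs (m_i = 0 or m_i = p_i / K).
% g_i is its free energy gain per protein: g_i^{(0)} without RNA,
% g_i^{(0)} + \delta g_i with RNA.
\begin{align*}
% Law of mass action
c_i v_0 &= (c_1 v_0)^{p_i}\, (c_r v_0)^{m_i}\, e^{-p_i g_i}, \\
% Protein conservation
\phi_0 &= c_1 + \sum_i p_i\, c_i, \\
% RNA conservation
\phi_r &= c_r + \sum_i m_i\, c_i, \\
% RNA-containing particles, using m_i = p_i / K
c_i v_0 &= \left( c_1 v_0\, e^{-g_i^{\mathrm{eff}}} \right)^{p_i},
\qquad
g_i^{\mathrm{eff}} = g_i^{(0)} + \delta g_i - \frac{1}{K} \ln(c_r v_0).
\end{align*}
```

Within this sketch, the RNA-containing particles obey the same law of mass action as pure protein particles, with $g_i^{(0)}$ replaced by $g_i^{\mathrm{eff}}$. The logarithmic term makes $g_i^{\mathrm{eff}}$ more negative than $g_i^{(0)}$ at $\delta g_i = 0$ whenever $c_r v_0 > 1$.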
The second important feature of protein self-assembly in the presence of the genome is a shift of the CMC for particle assembly as compared to pure protein self-assembly. Indeed, the effective free energy gain per protein $g_i^{\mathrm{eff}}$ contributes to the shift of the CMC, according to the previous estimate of the CMC in Eq. 7. This is clearly illustrated in figure 3, which compares the two types of self-assembly: the self-assembly of proteins in the presence of RNA starts at a lower protein concentration than the self-assembly of pure proteins. Note that for the parameters of figure 3, RNA uptake is assumed to reduce the free energy per protein ($\delta g_i < 0$), thereby further reducing $g_i^{\mathrm{eff}}$.
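The lowering of the onset of assembly by RNA can be explored numerically with a small two-species solver. The Python sketch below assumes a law of mass action of the form $c_i v_0 = (c_1 v_0 e^{-g_i})^{p_i} (c_r v_0)^{m_i}$ together with protein and RNA conservation, and solves the two coupled equations by nested bisection in the dimensionless variables $x = c_1 v_0$ and $r = c_r v_0$. The function names, the tuple layout `(p, m, g)` and all numerical values are hypothetical and are not the parameters of figure 3.

```python
import math


def particle_conc(x, r, p, m, g):
    """Dimensionless concentration of a particle with p proteins and m RNAs.

    Assumed law of mass action: c*v0 = (x * exp(-g))**p * r**m, with
    x = c_1*v0 (free proteins) and r = c_r*v0 (free RNAs), g in units of kT.
    """
    if x <= 0.0 or (m > 0 and r <= 0.0):
        return 0.0
    log_c = p * (math.log(x) - g) + (m * math.log(r) if m > 0 else 0.0)
    return math.inf if log_c > 700.0 else math.exp(log_c)


def bisect_increasing(f, target, hi, iterations=200):
    """Solve f(z) = target on [0, hi] for a non-decreasing function f."""
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def free_rna(x, phi_r, particles):
    """Free RNA r satisfying RNA conservation at fixed free protein x."""
    def total_rna(r):
        return r + sum(m * particle_conc(x, r, p, m, g)
                       for p, m, g in particles if m > 0)
    return bisect_increasing(total_rna, phi_r, phi_r)


def solve_partition(phi, phi_r, particles):
    """Return (x, r) satisfying both conservation laws.

    particles is a list of (p, m, g) tuples, one per allowed particle type.
    """
    def total_protein(x):
        r = free_rna(x, phi_r, particles)
        return x + sum(p * particle_conc(x, r, p, m, g)
                       for p, m, g in particles)
    x = bisect_increasing(total_protein, phi, phi)
    return x, free_rna(x, phi_r, particles)
```

Comparing `solve_partition(phi, 0.0, pure)` with `solve_partition(phi, phi_r, pure + with_rna)` at the same `phi`, where `with_rna` lists the RNA-containing counterparts of the particles in `pure`, shows how much more protein is assembled when RNA is available. The nested bisection relies on the total protein content increasing with $x$ once $r$ has been eliminated, which is expected from the convexity of the free energy but is treated here as an assumption.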

A. Entropic selection of viral RNAs
The uptake of the viral genome during capsid self-assembly occurs through both electrostatic and specific interactions. The former are largely responsible for the linear relation between protein numbers and nucleotides observed in virus databases. On the other hand, virologists have identified in many viral genomes a specific sequence that has a stronger affinity for viral proteins than expected from electrostatics-based predictions [8][9][10]. This sequence is called a packaging signal (PSI or ψ). As a consequence, many viruses may contain both viral RNA bearing the ψ sequence and non-viral cellular RNAs. This has indeed been observed for several viruses, and in particular for retroviruses like HIV-1 [11,12]. We anticipate in this case that the entropy of viral and cellular RNA will be highly relevant in determining the size distribution of particles and their RNA content. These phenomena can be described within the framework of micellization thermodynamics, similarly to the discussion of the previous sections.
We consider in this section a mixture of proteins, monodisperse viral RNAs and cellular RNAs, of respective initial concentrations $\phi_0$, $\phi_{rv}$ and $\phi_{rc}$. The main difference between the two types of RNA within our simple model is their length: viral RNAs are usually longer than cellular RNAs [13]. Since each particle has the ability to contain both RNAs, we use for the particle with index $i$ a generalized linear relation between the numbers of proteins $p_i$, viral RNAs $n_i$ and cellular RNAs $m_i$, such that
$$p_i = K_v n_i + K_c m_i.$$
In particular, the ratio $K_v/K_c$ scales like the ratio of RNA lengths. The equilibrium equations describing self-assembly generalize those of the previous sections, with $c_{rv}$ and $c_{rc}$ denoting respectively the concentrations of free viral and cellular RNAs. These non-linear equations do not have general analytical solutions and may lead to a large variety of molecule partitionings among particles. Instead, it is possible to infer the influence of multiple RNAs inside the particles by restricting the final products of self-assembly.
In particular, imposing particles of the same size but with different RNA contents as the final products of self-assembly allows us to address the question of the preferential uptake of multiple RNAs as a function of the RNA partitioning. We therefore restrict the analysis in this section to two final products of self-assembly: particles made of $p_1$ proteins and containing $n_1$ large viral RNAs and $m_1$ small cellular RNAs, and particles made of $p_2 = p_1$ proteins and containing only $m_2$ small cellular RNAs and no large viral RNAs. Typical results for a titration of viral RNA at fixed concentrations of proteins and cellular RNAs are shown in figure 4a. There exists a threshold above which the uptake of viral RNA is systematically favorable. Interestingly, due to the length difference between RNAs, the threshold concentration is much smaller than the actual cellular RNA concentration. This can be understood qualitatively by the following entropic argument. Replacing several small cellular RNAs by a few longer viral RNAs, while maintaining the constant number of nucleotides required for a given capsid size, effectively reduces the number of small cellular RNAs per particle. Therefore a larger number of particles with a reduced number of cellular RNAs can be made at constant cellular RNA concentration, and this is entropically favorable.
Our results show that the length difference between viral and cellular RNAs tends to favor viral RNA uptake based solely on entropic considerations. It is remarkable that, without any sequence specificity, the spontaneous tendency of viral self-assembly is the uptake of the longer genome. The packaging signal ψ adds another contribution to the preference for viral RNA, but according to the previous entropic argument a strong signal is not necessary. Not surprisingly, this entropic preference for large RNAs in particles disappears as the length difference between viral and cellular RNAs is reduced (data not shown).
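The entropic argument can be illustrated with a short calculation. The sketch below assumes a law of mass action generalized to two RNA species and equal free energy gains per protein for both particle types (no sequence specificity). The labels $A$ and $B$ for the two products are introduced here for convenience, and the result is expressed in terms of the free RNA concentrations $c_{rv}$ and $c_{rc}$; relating it to the titrated concentration $\phi_{rv}$ still requires solving the conservation equations.

```latex
% A: p_1 proteins, n_1 viral RNAs, m_1 cellular RNAs
% B: p_1 proteins, m_2 cellular RNAs only
% Assumed law of mass action, with the same gain per protein g for A and B:
\begin{align*}
c_A v_0 &= (c_1 v_0)^{p_1}\, (c_{rv} v_0)^{n_1}\, (c_{rc} v_0)^{m_1}\, e^{-p_1 g}, \\
c_B v_0 &= (c_1 v_0)^{p_1}\, (c_{rc} v_0)^{m_2}\, e^{-p_1 g}, \\
% Same capsid size for both products fixes the RNA stoichiometry
K_v n_1 + K_c m_1 &= K_c m_2 = p_1
\quad\Longrightarrow\quad
m_2 - m_1 = \frac{K_v}{K_c}\, n_1, \\
% Ratio of the two populations
\frac{c_A}{c_B} &= \left[ \frac{c_{rv} v_0}{(c_{rc} v_0)^{K_v / K_c}} \right]^{n_1}.
\end{align*}
```

Under these assumptions, particles containing viral RNA dominate as soon as $c_{rv} v_0 > (c_{rc} v_0)^{K_v/K_c}$. For $K_v/K_c > 1$ and $c_{rc} v_0 < 1$, this threshold lies below $c_{rc} v_0$ and decreases rapidly as the length ratio increases, while it tends to $c_{rc} v_0$ when the two RNAs have the same length.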

B. Entropic control of size distribution
By restricting the final products of self-assembly to a different set of particles, it is possible to go beyond simple bimodal products of self-assembly and to describe the influence of the mixture of viral and cellular RNAs on the size distribution of viral particles. To investigate this case, we assume two families of particles: particles A, which have discrete sizes distributed evenly around a central value $p_0$ with a width $H \times \delta$, and particles B, which have the same discrete size distribution but a different RNA content. A second index is introduced in order to label particles within each family, $\{A^{(i)}\}$ and $\{B^{(i)}\}$. The particles $A^{(i)}$ contain n
A typical numerical solution for these equations is shown in figure 5a. The energetic parameters $g_i$ of the calculation were chosen in order to favor the central number of proteins $p_0$. As a consequence, the size distribution at low viral RNA concentration has a peak around $p_0$ of enthalpic origin. Interestingly, titration of viral RNA leads to a shift of the most favorable size towards smaller sizes. This is attributed to the entropic effects associated with the difference in length between viral and cellular RNAs, similarly to the particular case of entropic selection discussed in the previous section. Moreover, since these entropic effects tend to favor smaller particles, the size polydispersity of this discrete set of particles is reduced. Notice that both the peak shift and the reduction of polydispersity saturate, as is explicitly seen in figure 5b.

IV. DISCUSSION
We presented in the previous sections an analysis of the influence of the viral genome on the self-assembly of proteins into capsids, using the framework of micellization thermodynamics. Focusing on the specific effects associated with the entropy of partitioning all molecules (proteins and RNAs) among various particles, we identified several relevant features. The first is that, within a self-assembly scenario without inherent strong size selection of enthalpic origin, entropy favors smaller particles. This is easily understood, as more particles of smaller size can be made at a constant number of proteins, which is entropically favorable. This observation is of central importance, since the presence of the viral genome essentially enhances this preference for smaller particles.
More precisely, the presence of monodisperse RNAs during viral self-assembly has been shown to enhance the preference for smaller particles and to shift the CMC for viral self-assembly towards lower protein concentrations. In the case where both viral and cellular RNAs are taken up by viral particles, viral RNAs, which have been shown to be longer than most cellular RNAs [13], are preferentially chosen for the self-assembly. Moreover, we showed that the size distribution of viral particles is shifted towards smaller particles and that the size polydispersity is accordingly reduced.
Most of the findings described above rely on the assumption of weak size selection of enthalpic origin. This assumption is certainly debatable in the case of most icosahedral viruses, but it is likely to be realistic in the case of retroviruses like HIV-1. Indeed, the size distribution of HIV-1 has been shown to be quite broad, reflecting the absence of a strong size-selection mechanism, whatever its precise origin [14]. As a consequence, we might expect some of the results of the present work to be applicable to HIV-1. We were recently able to address this question experimentally, using viral particles produced within cells and quantifying their size distribution by Atomic Force Microscopy imaging [14]. Remarkably, we found that viral particles grown in the presence of the viral genome were statistically smaller than particles grown in its absence, and that the size polydispersity was also reduced, in qualitative agreement with the predictions of our model. Similarly, evidence of the entropic selection of large genomes was observed in studies quantifying the amount of RNA within HIV-1 particles [11,12]: in the absence of viral genome, a small number of large RNAs (typically one or two) were observed within viruses.
Interestingly, recent observations on members of the Paramyxoviridae family, such as the Newcastle Disease Virus (NDV), are also qualitatively explained by the entropic features highlighted in our work [15]: it was observed in this case that a majority of infectious VLPs are small and contain a single genome, while a minority are large and contain multiple genomes. The qualitative observation of effects predicted by the entropy of partitioning of molecules during self-assembly therefore shows unambiguously that entropy contributes to the control of the viral particle size distribution.
The authors would like to thank the Fondation Simone et Cino Del Duca from the Institut de France for initial funding that allowed us to launch this project. This work was also partially supported by the CNRS program entitled "PIR: Interface physique, biologie et chimie".