A modelling paradigm for RNA virus assembly

Highlights
• Multiple, common dispersed packaging signals (PSs) promote RNA virus assembly.
• PS-mediated assembly solves a viral equivalent of Levinthal’s Paradox.
• RNA PSs occur in many viral families, including Picornaviruses and Hepatitis B virus.
• PSs encode an assembly manual that can be repurposed for VLP production.


Introduction
The formation of a viral protein container encapsulating a virus' genomic cargo is a prerequisite for the successful propagation of a viral infection. A better understanding of this process can therefore be exploited for therapy, either via the development of antiviral strategies inhibiting assembly, or the repurposing of the self-assembly process for the design of gene vectors and vaccines.
The initial focus in the study of virion assembly was directed towards in vitro studies of capsid self-assembly in the absence of other viral components. Models developed in tandem with such experiments provided an understanding of the kinetics [1–3] and thermodynamics [4,5] of spontaneous capsid self-assembly, and of the roles of protein-protein interactions in defining quasiequivalent capsid geometries [6,7]. They also elucidated the local rules underpinning coat protein (CP) self-association during capsid formation [8,9]. Many viruses, especially double-stranded DNA viruses, assemble their capsids first and then package their genomes via an ATP-driven packaging motor. The protein-centric models, with the addition of scaffolding proteins in the case of larger capsid shells, are therefore an adequate context in which to study capsid assembly in these cases. By contrast, single-stranded RNA viruses, the largest family of viruses and one that contains many important human pathogens, package their genomes during capsid assembly in a co-assembly process. For these viruses, capsid assembly has to be modelled in tandem with genome packaging. An important aspect of virus assembly in the presence of genomic RNA is the need for genome compaction [10], and several groups have made important contributions to the modelling of this aspect of virus assembly [11,12,13,14]. The impact on the assembly process of non-specific electrostatic interactions between genomic RNAs and CP [15–19], and of the stiffness of the RNA molecule [20], has been analysed. It has also been shown that the secondary structure of the RNA molecules plays an essential role in determining capsid morphology in the self-assembly of Cowpea Chlorotic Mottle Virus (CCMV)-like particles [21]. The roles of genomic RNA have also been studied in the assembly of helical viruses [22].
Moreover, molecular dynamics simulations of capsid assembly, both in the absence and presence of different types of cargoes, have made important contributions to our understanding of virus assembly [23,24]. Indeed, viral capsids can be assembled in vitro around different types of cargoes, including anions [25–27]. The models presented here go one step further. Instead of viewing viral genomes as passive passengers with at most non-specific electrostatic contributions to the assembly process, they demonstrate the consequences of the cooperative action of multiple, sequence-specific contacts between genomic RNA and CP.
repressor (TR) [28], a stem-loop in the genomic RNA known to function also as a packaging signal. This observation suggests that the contributions from genomic RNA to the assembly process are significant and therefore cannot be neglected in the assembly models.
There is only one copy of TR in the MS2 genome. Binding of TR to the CP dimer triggers a conformational switch from the symmetric dimer, the dominant form in solution, to its asymmetric conformation [29] that is needed in a 2:1 ratio for the construction of the capsid (Figure 1a). Normal mode analysis has revealed the structural features of TR that are required for this allosteric effect [30,31], demonstrating that many other, multiple dispersed, stem-loops in the MS2 genome could trigger the same effect [32]. This has resulted in the packaging signal (PS) hypothesis: Multiple dispersed secondary structure elements in the genomic RNA, with CP recognition features akin to those of the known high affinity PS, also trigger conformational changes of the CP dimer to its asymmetric conformation. These multiple dispersed sites have been called PSs, in analogy to the high affinity PS with which they share their characteristic feature for CP recognition. In the case of MS2, assembly mediated by these multiple dispersed PSs is also known as the dimer-switching model [33]. In other viruses, PSs can play a number of different roles in promoting capsid formation [35,36,45]. However, these different scenarios all share the same basic mechanism of PS-mediated assembly, in which multiple dispersed sites in the (pre)genomic viral RNA with affinity for CP promote efficient formation of a viral capsid with the correct geometry.

A mathematical model of PS-mediated assembly
In order to investigate how such multiple dispersed PS sites mediate capsid assembly, we developed a  Figure 2b): pentamers associate with, and disassociate from, PSs on the genomic RNA with rates depending on CP:PS affinity. As the precise nucleotide sequences of the PSs vary around their shared recognition motif, their affinities for CP can be distinct. In our model, they fall into three categories, weak (from 0 to −4 kcal/M), intermediate (from −4 kcal/M to −8 kcal/M), and strong (from −8 kcal/M to −12 kcal/M), reflecting affinities seen in MS2 [40,41]. If two pentamers are bound to adjacent PSs, they form (or subsequently break) CP-CP interactions with rates determined by the free energy of the CP:CP bonds, chosen to be −2.5 kcal/M following estimates in Ref. [4]. This model allows us to study the determinants of PS-mediated assembly in a scenario of reduced computational complexity.
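The link between binding free energies and the rates used in such a model can be illustrated with a minimal sketch. It is not the implementation behind the results reviewed here: the function names, the on-rate `k_on`, the temperature (about 298 K) and the handling of the category boundaries are all assumptions made for illustration, and the free energies quoted above are treated as kcal/mol. The sketch only encodes detailed balance, k_on / k_off = exp(−ΔG / RT).

```python
import math

# Thermal energy RT in kcal/mol at about 298 K (assumed temperature).
RT = 0.593

def ps_category(dg):
    """Classify a PS by its CP:PS binding free energy dg (kcal/mol).

    The ranges follow the three categories named in the text; which side
    of each boundary a value falls on is an assumption of this sketch.
    """
    if not -12.0 <= dg <= 0.0:
        raise ValueError("free energy outside the modelled range")
    if dg > -4.0:
        return "weak"
    if dg > -8.0:
        return "intermediate"
    return "strong"

def off_rate(k_on, dg):
    """Dissociation rate implied by detailed balance.

    k_on is a hypothetical association rate; the more negative dg is,
    the smaller the resulting off-rate (the longer the contact persists).
    """
    return k_on * math.exp(dg / RT)

# CP:CP bond energy quoted in the text, used here with a unit on-rate.
cp_cp_break_rate = off_rate(1.0, -2.5)
```

In a stochastic simulation these rates would be the propensities of the binding and unbinding events; a strong PS at −10 kcal/mol dissociates many orders of magnitude more slowly than a weak one at −2 kcal/mol for the same on-rate.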

A systems approach is key
Assembly against a backdrop of cellular competitor RNAs (in a 1:300 ratio consistent with experimental studies) [38] reveals relatively low yields of viral particles compared with an abundance of misencapsidated particles (Figure 3), implying that in this simple form the model would not account for the assembly efficiency expected in vivo. This suggests that a key feature of the assembly process in vivo is missing in the model. Bacteriophage Qβ assembly has been studied by Eigen and collaborators [42], demonstrating that CP concentration gradually builds up while virion assembly is taking place, a phenomenon known as the protein ramp. Therefore, instead of adding the entire aliquot of CP (corresponding to the number of CP needed to fully encapsulate all viral RNAs in the simulation) at the start, a protein ramp was built into the model that reflects the gradual build-up of CP concentration, as is the case in a viral infection in vivo. Under these conditions, the model outcome reflects the observed in vivo behaviour for MS2 and other single-stranded RNA viruses [43,44], with viral particles now being the dominant species at the end of the simulation.
These results enable an important biological conclusion. They imply that the cooperative action of the PSs in enhancing assembly efficiency is best observed in experiments that are carried out under the conditions of the protein ramp, that is, a CP titration, explaining perhaps why PSs have long been missed by in vitro experiments. Indeed, experiments carried out in the context of a protein ramp reveal the hallmarks of PS-mediated assembly in a model virus, demonstrating that both the spacing between PSs and their recognition motifs impact on virion assembly [45].
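The difference between adding all CP at the start and adding it gradually can be sketched as a simple release schedule. The shape of the ramp used in the model is given only graphically, so the linear form below, the function name `cp_released` and its parameters are assumptions chosen for illustration.

```python
def cp_released(t, total_cp, ramp_time):
    """Cumulative number of CP units available at time t.

    A linear ramp is assumed here purely for illustration. Setting
    ramp_time to zero reproduces the no-ramp scenario, in which the
    entire aliquot is present from the start.
    """
    if ramp_time <= 0:
        return total_cp
    # Clamp the elapsed fraction of the ramp to the interval [0, 1].
    fraction = min(max(t / ramp_time, 0.0), 1.0)
    return int(total_cp * fraction)
```

With, say, `total_cp = 1200` and `ramp_time = 100`, half of the protein is available at `t = 50`; in a simulation, only the CP released so far would be allowed to take part in binding events.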

A solution to a viral-equivalent of Levinthal's Paradox
The model also reveals the mechanism by which viruses efficiently navigate the landscape of possible assembly intermediates [38]. In protein folding, the ensemble of potential folding pathways of an amino acid sequence into its native conformation is so complex that a random exploration of different options would take longer than the known age of the universe. Despite this, proteins fold within biologically meaningful timeframes. This apparent contradiction is known as Levinthal's Paradox, and it is resolved by the observation that protein chains do not sample all possible conformations on their way to their folded state. Similarly, the number of geometrically distinct ways in which a viral capsid can be built from CP is vast, yet virus assembly must have evolved strategies to bias assembly to the most efficient assembly pathways in order to sustain a productive infection against host defence mechanisms. Our model of PS-mediated assembly demonstrates how multiple dispersed PSs with varying affinities for CP can achieve this under the condition of the protein ramp (Figure 3). In particular, variations in PS affinities for CP across the genomic sequence result in nucleation of assembly at specific sites, as opposed to nonlocalised nucleation across the full length of the RNA genome in the absence of the protein ramp, that is, PSs impact on nucleation behaviour. Only a small number of distinct assembly pathways from the ensemble of geometrically possible ones are actually realized during PS-mediated assembly, and these are characterized by assembly intermediates that deviate only minimally from those maximising CP:CP contacts. This demonstrates that the PS distribution mitigates the combinatorial complexity of the assembly process. In short, it solves a virus-equivalent to Levinthal's Paradox in protein folding.
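The scale of the protein-folding version of the paradox can be made concrete with a back-of-the-envelope calculation. The numbers below are the usual textbook illustration rather than values taken from the work reviewed here: a chain of 100 residues, three conformations per residue, and a sampling rate of 10^13 conformations per second are all assumptions.

```python
# Illustrative assumptions (not taken from the reviewed models).
RESIDUES = 100
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10

# Number of chain conformations under these assumptions.
conformations = STATES_PER_RESIDUE ** RESIDUES

# Time needed to visit every conformation once by random search.
search_years = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

# How many ages of the universe an exhaustive search would take.
ratio = search_years / AGE_OF_UNIVERSE_YEARS
```

Under these assumptions an exhaustive search would exceed the age of the universe by many orders of magnitude, which is why any efficient process, whether folding or capsid assembly, must restrict itself to a small subset of pathways.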
76 Virus structure and expression

Current Opinion in Virology
A modelling paradigm for packaging signal-mediated assembly.

Hamiltonian paths analysis
Different assembly scenarios can be encoded by geometric book-keeping devices that capture the order in which PSs make contact with CP during virus assembly. In particular, by connecting PS binding sites on the capsid interior in the order in which the corresponding PS:CP contacts are made, a connected string is obtained that provides a geometric representation of the assembly pathway. Superposition of all possible such strings results in a polyhedral shape with vertices at the PS binding sites at the capsid's interior surface, and edges connecting vertices on neighbouring capsomers. From a mathematical point of view, each individual string corresponds to a Hamiltonian path on this polyhedron, that is, a connected path visiting every polyhedral vertex precisely once. They do not represent, however, the exact location of the viral genome, which can make excursions into the capsid interior (Figure 4a). The (local) geometric properties of these paths can be classified for different types of capsid geometries. These local properties of the paths (as illustrated in Figure 4b for MS2) can then be used, in combination with a bioinformatics search for potential PS
A modelling paradigm for RNA virus assembly Twarock et al. 77
Figure caption: The cooperative effects of PS distributions can only be observed in the presence of the protein ramp. (a) Differences in the PS affinity distributions for different RNAs, that is, different bead configurations in the mathematical model, result in differences in particle yield. The spectrum of different particle yields over 30 000 random RNAs is shown, with the best (RNA1) and worst (RNA2) performing RNA shown to the right. Cellular RNAs are modelled by strings of low affinity PSs (red beads). (b) In a viral infection, protein is synthesized while capsid assembly already takes place, a phenomenon known as the protein ramp. It is modelled via gradual addition of CP according to the graph shown. (c) The assembly of virus and malformed particles in the absence (left) and presence (right) of the protein ramp reveals the importance of the protein ramp for virion yield. In particular, in the presence of the protein ramp, assembly of RNAs (shown here for RNA1) is more efficient than in its absence, where malformed species deplete the protein resource.
(see red rhombs in Figure 4c based on Ref. [32]), which agrees well with an asymmetric EM reconstruction of MS2 at 8.7 Å resolution [49]. Moreover, all PSs identified in a subsequent EM reconstruction at 3.6 Å resolution [50] had previously been identified via our Hamiltonian Path Analysis method (Figure 4d). This demonstrates the utility of mathematical tools in identifying salient features in the organization of a packaged viral genome.
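The notion of a Hamiltonian path on the polyhedron of PS binding sites can be illustrated with a small search. The sketch assumes, for illustration only, a shell of 12 pentamers with one binding site each, so that neighbouring sites form the icosahedral graph (each site has five neighbours); the vertex numbering and all function names are hypothetical, and the backtracking search is a generic technique rather than the Hamiltonian Path Analysis method itself.

```python
def icosahedral_graph():
    """Adjacency of 12 sites: two poles (0, 11) and two rings of five."""
    adj = {v: set() for v in range(12)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for k in range(5):
        upper, lower = 1 + k, 6 + k
        link(0, upper)                   # top pole to upper ring
        link(11, lower)                  # bottom pole to lower ring
        link(upper, 1 + (k + 1) % 5)     # neighbours within the upper ring
        link(lower, 6 + (k + 1) % 5)     # neighbours within the lower ring
        link(upper, lower)               # the two links between the rings
        link(upper, 6 + (k + 1) % 5)
    return adj

def hamiltonian_path(adj, start):
    """Return one path visiting every vertex exactly once, or None."""
    path, visited = [start], {start}

    def extend():
        if len(path) == len(adj):
            return True
        for nxt in sorted(adj[path[-1]]):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if extend():
                    return True
                visited.remove(nxt)      # backtrack
                path.pop()
        return False

    return path if extend() else None

def is_hamiltonian_path(adj, path):
    """Check that path uses only edges and visits each vertex once."""
    return (len(path) == len(adj) and set(path) == set(adj)
            and all(b in adj[a] for a, b in zip(path, path[1:])))

sites = icosahedral_graph()
example_path = hamiltonian_path(sites, 0)
```

Each such path is one candidate ordering of PS:CP contacts; constraints derived from the geometry or from the data are what reduce the many possible paths to the few that are consistent with observation.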

Conclusions
Modelling of PS-mediated assembly demonstrates the distinct advantages of PSs for efficient capsid formation. As PS-mediated assembly confers fitness advantages to viral particles assembling via this mechanism, it is likely that it is widespread in nature. The discovery of PSs in a number of viral families infecting different hosts, including humans, supports this hypothesis. Even Hepatitis B virus, a DNA virus, has been shown to have packaging signals in its pregenomic RNA that impact on capsid geometry by biasing assembly towards formation of T = 4 shells [36]. It is likely that multiple dispersed PSs will be discovered in many more viral systems over the next decade, for example, in the alphaviruses [51]. Similar assembly mechanisms may even occur more widely in nature, for example in the assembly of repurposed Gag-like proteins [52] with roles in intercellular RNA transfer across synaptic boutons [53].
The models of PS-mediated assembly have provided mechanistic insights that could not have been obtained via experiment alone. They revealed that hallmarks of PS-mediated assembly can only be observed in the context of scenarios reflecting in vivo infections, and demonstrated the importance of the PS affinity distribution for efficient capsid formation. The Hamiltonian path approach has moreover served as a tool for the identification of PSs [32]. The discovery of PS-mediated assembly has opened up novel opportunities for antiviral therapy, for example, via small molecular weight compounds blocking either the PS or CP sites of the PS:CP interactions. The modelling paradigm reviewed here provides a basis for the study of viral infections and viral evolution, and such models have been constructed in order to study the merits of different antiviral strategies
The two references above describe the neuronal gene Arc, which exhibits homologies with the retroviral Gag protein. The gene encodes a protein that forms viral capsid-like structures, which specifically transfer Arc mRNAs. The analysis shows that assembly of these containers is more efficient in the presence of the mRNA, suggesting that there could be secondary structure elements in the mRNA that function similarly to PSs during container assembly. This implies that the mechanism of PS-mediated assembly may occur more widely in biological systems beyond virology.

54. Bingham RJ, Dykeman EC, Twarock R: RNA virus evolution via a quasispecies-based model reveals a drug target with a high barrier to resistance. Viruses 2017, 9.
The virus assembly model reviewed here has been coupled with a model of viral replication in order to describe a viral infection at the scale of an individual infected cell. This has been coupled with a quasispecies model of viral evolution in the context of a viral infection. Application of this model to a chronic Hepatitis C infection demonstrates that drug strategies targeting PSs are less likely to trigger drug resistance than conventional forms of therapy.

55. Dykeman EC: A model for viral assembly around an explicit RNA sequence generates an implicit fitness landscape. Biophys J 2017, 113:506-516.
The dodecahedral model is expanded to include an explicit nucleotide sequence, where packaging efficiency depends on the ability of the RNA to fold locally into PSs. This enables the mutation of the RNA primary structure, thus allowing exploration of the effects of such mutations on viral assembly. The study suggests that viruses rely on degenerate PSs to ensure mutational resilience.

56. Patel N, Wroblewski E, Leonov G, Phillips SEV, Tuma R, Twarock R, Stockley PG: Rewriting nature's assembly manual for a ssRNA virus. Proc Natl Acad Sci 2017, 114:201706951.
The knowledge of the PSs of STNV is used to create a synthetic nucleotide fragment which assembles more efficiently than WT, and even outcompetes the WT in a head-to-head competition. This demonstrates that the PS-encoded virus assembly instruction manual can be optimised and repurposed for the synthesis of virus-like particles, with potential applications as gene vectors or in vaccinology.