Efficient support of virus-like particle assembly by the HIV-1 packaging signal

The principal structural component of a retrovirus particle is the Gag protein. Retroviral genomic RNAs contain a 'packaging signal' ('Ψ') and are packaged in virus particles with very high selectivity. However, if no genomic RNA is present, Gag assembles into particles containing cellular mRNA molecules. The mechanism by which genomic RNA is normally selected during virus assembly is not understood. We previously reported (Comas-Garcia et al., 2017) that at physiological ionic strength, recombinant HIV-1 Gag binds with similar affinities to RNAs with or without Ψ, and proposed that genomic RNA is selectively packaged because binding to Ψ initiates particle assembly more efficiently than other RNAs. We now present data directly supporting this hypothesis. We also show that one or more short stretches of unpaired G residues are important elements of Ψ; Ψ may not be localized to a single structural element, but is probably distributed over >100 bases.


Introduction
A retrovirus particle is assembled from ~1500-3000 molecules of the Gag protein, together with RNA (Vogt and Simon, 1999), as well as smaller amounts of other viral and cellular proteins and a surrounding lipid bilayer. In a cell infected with wild-type virus, the vast majority of the released particles contain the genomic RNA (gRNA) of the virus, despite the fact that this RNA is only a minor species in the virus-producing cell (Chen et al., 2009). The selection of the gRNA for encapsidation depends upon the presence in this RNA of the 'packaging signal' or 'Ψ', a region of ~200 or more bases near the 5' end of the viral RNA (Aldovini and Young, 1990; Berkowitz et al., 1996; Comas-Garcia et al., 2016; D'Souza and Summers, 2005). However, the nature of Ψ and the mechanism of selective packaging of gRNA are not well understood as yet.
In mammalian cells expressing Gag in the absence of Ψ-containing RNA, the protein assembles into virus-like particles (VLPs) structurally indistinguishable from immature virions; these particles contain roughly the same amount of RNA as wild-type particles, but this RNA is a nearly random sample of cellular mRNA molecules (Rulli et al., 2007). Similarly, recombinant Gag protein can assemble into VLPs in a defined system in vitro; while this assembly requires the presence of RNA (or DNA), virtually any single-stranded nucleic acid can support assembly under these conditions (Campbell et al., 2001; Campbell and Rein, 1999).
In an effort to understand the selective packaging of Ψ-containing RNA, we recently measured the affinity of recombinant HIV-1 Gag protein (lacking the p6 domain at its C-terminus) for different RNAs (Comas-Garcia et al., 2017). We found that the protein has similar, very high affinities for all the RNAs tested when assayed at near-physiological ionic strengths. However, further examination showed that this affinity is the sum of both specific and non-specific interactions. Non-specific binding could be selectively reduced by mutating specific residues in the protein; or by adding a vast excess of an irrelevant competitor RNA; or simply by raising the ionic strength in the assay. When the binding measurements were modified in any of these ways, a strong specific interaction with Ψ could be detected. The salt-resistance of the binding of Gag to Ψ had previously been observed, using somewhat different techniques, by Webb et al. (2013).
To explain how Ψ-containing RNAs are selectively packaged, despite the fact that Gag binds any RNA tightly at physiological ionic strength and any RNA can support assembly, we proposed that binding to Ψ leads to initiation of assembly more efficiently than binding to other RNAs (Comas-Garcia et al., 2016; Nikolaitchik et al., 2013). We now present in vitro data that lend strong support to this hypothesis. This work also includes a preliminary characterization of the RNA sequences that are specifically bound by Gag under the modified assay conditions described above.
Gag has been suggested to bind specifically to several distinct sites in the 5' region of HIV-1 RNA (Lever, 2007). These include an internal loop and surrounding bases in stem-loop 1, the locus of the 'kissing interaction' where dimerization of gRNA is initiated (Abd El-Wahab et al., 2014); stem-loop 2 (Amarasinghe et al., 2000); stem-loop 3 (sometimes called 'Ψ') (De Guzman et al., 1998); and a series of very short unpaired stretches, each with one or more unpaired G residues, collectively termed the 'Nucleocapsid Interaction Domain' (Wilkinson et al., 2008). We tested several of these possibilities by introducing mutations into a 'Ψ' construct and testing the binding of Gag under different conditions.

Results and discussion
One important unresolved question is the exact sequence(s) that define Ψ. We measured binding affinities using, where not specified otherwise, the methodologies described earlier (Comas-Garcia et al., 2017), except that the RNAs were 401 bases in length rather than 190 nt. These RNAs begin at either nt 150 or nt 200 (see Figure 1A) and were labeled at their 3' ends with Cy5. As indicated in the Figure, stem-loop 1 (SL1) or stem-loop 3 (SL3) was deleted in some constructs, and in the MBSM construct the unpaired G and C residues listed in Materials and methods (Wilkinson et al., 2008) were replaced with A's. We also noted that these RNAs contain a run of unpaired G and C residues (nt 442-459) that may well be paired in full-length RNA, but not in our 401-base RNAs. To test the possibility that these bases contribute to specific binding of Gag to the transcripts, we also mutated these residues to A's, both in the otherwise wild-type construct beginning at nt 200 (creating the 'GC loop mutant') and in the MBSM; the latter construct is designated 'MBSM second generation'. In all cases, removal of bases by deletion was compensated by extending the 3' end of the RNA, so that all the RNAs were 401 bases long. As a negative control RNA, we produced the reverse complement of Ψ, that is, RNA complementary to nt 200-600.
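The two sequence manipulations described in this paragraph (replacing selected residues with A, and producing a reverse-complement control) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the ten-base toy sequence and the two mutated positions are hypothetical and are not taken from pNL4-3; only the idea of numbering residues from the first nucleotide of the construct follows the text.

```python
# Illustrative helpers; names and the toy sequence below are hypothetical.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the RNA strand complementary to `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def substitute_with_a(rna, positions, first_nt):
    """Replace the residues at the given numbered positions with A.

    `first_nt` is the number assigned to the first base of `rna`
    (for example 200 for a construct that begins at nt 200).
    """
    bases = list(rna)
    for pos in positions:
        index = pos - first_nt
        if not 0 <= index < len(bases):
            raise ValueError(f"position {pos} lies outside the construct")
        bases[index] = "A"
    return "".join(bases)


# Toy construct (NOT the HIV-1 sequence), numbered from nt 200.
toy = "GGCGACUGGU"
mutant = substitute_with_a(toy, [200, 202], first_nt=200)
control = reverse_complement(toy)
print(mutant)   # AGAGACUGGU
print(control)  # ACCAGUCGCC
```

Keeping the construct numbering explicit (`first_nt`) avoids off-by-one errors when the same list of positions is applied to transcripts that begin at different nucleotides.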
To test the effects of these changes upon the specific and non-specific binding by Gag, we titrated Gag into these RNAs, monitoring binding by the quenching of the fluorophore as described (Comas-Garcia et al., 2017), either in binding buffer (containing 0.2 M NaCl), or in binding buffer with a 50-fold excess by mass of yeast tRNA, or in binding buffer containing 0.4 M NaCl. Results of these assays are shown in Figures 1B, C and D, respectively. It is evident that Gag binds all the tested RNAs well in binding buffer. However, addition of yeast tRNA (Figure 1C) or raising the ionic strength in the assay (Figure 1D) strongly depressed binding to both iterations of the MBSM RNA, while deleting either SL1 or SL3 did not. Binding to the reverse complement RNA was drastically reduced under both of these conditions. These results show that the specific binding of Gag to Ψ, detected in these assays, depends upon some or all of the clusters of unpaired G residues called the Nucleocapsid Interaction Domain (Wilkinson et al., 2008), but neither stem-loop 1 nor stem-loop 3 is crucial for this binding (Figure 1C and D). A similar mutant has been reported to be deficient in selective packaging in vivo (Keane et al., 2015).
As discussed above, we have proposed that genomic RNA is selectively packaged because binding to Ψ is particularly efficient at initiating VLP assembly (Comas-Garcia et al., 2016). Thus, it was of interest to assess the abilities of the different RNAs to support VLP assembly. For these experiments we focused on the Ψ RNA that starts at nt 200, the MBSM second generation RNA and the Reverse Complement RNA. Also, for these experiments on particle assembly, we used Gag protein lacking most of the matrix (MA) domain, as well as p6 (Δ16-99 Gag): we have previously reported that Δp6 Gag assembles into VLPs with radii of curvature drastically different from those of authentic virions (Campbell and Rein, 1999), indicating that they are quite different in overall structure from authentic immature particles.
We first compared the binding to RNA of Δ16-99 Gag with that of Gag. Our previous measurements monitored RNA-binding using the ability of Gag to quench the Cy5 fluorophore on the RNA (Comas-Garcia et al., 2017). However, we found that Δ16-99 Gag does not quench the fluorophore; evidently, the quenching involves the MA domain. Therefore, we used microscale thermophoresis (MST) for monitoring binding by this protein. As shown in Figure 2A, MST and quenching measurements give very similar results for the binding of Δp6 Gag to the Ψ RNA that starts at nt 200; at 0.15 M NaCl the K_D values for MST and FCS were 14 and 17 nM, respectively, while at 0.45 M NaCl they were 256 and 302 nM. In all cases the Hill coefficient was greater than 1.0. These data show that MST is able to recapitulate our original FCS results (Comas-Garcia et al., 2017). MST data are presented in more detail in Figure 2-figure supplement 1 and Table 1. Interestingly, Δ16-99 Gag bound relatively weakly to all 3 RNAs at 0.5 M NaCl (see Table 1). The implications of this result are now under further investigation.
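The agreement between the two methods can be checked with simple arithmetic on the K_D values quoted above. The sketch below also includes a generic Hill binding isotherm to show how K_D and the Hill coefficient enter a fraction-bound calculation; the isotherm form and the example Hill coefficient of 1.5 are assumptions for illustration, since the text states only that the coefficient exceeded 1.0.

```python
# K_D values (nM) quoted in the text for Δp6 Gag binding to Ψ RNA.
KD_NM = {
    "MST": {"0.15 M": 14.0, "0.45 M": 256.0},
    "FCS": {"0.15 M": 17.0, "0.45 M": 302.0},
}


def salt_fold_change(method):
    """Ratio of K_D at 0.45 M NaCl to K_D at 0.15 M NaCl."""
    return KD_NM[method]["0.45 M"] / KD_NM[method]["0.15 M"]


def hill_fraction_bound(protein_nm, kd_nm, hill_n):
    """Generic Hill isotherm (assumed form): P^n / (K_D^n + P^n)."""
    pn = protein_nm ** hill_n
    return pn / (kd_nm ** hill_n + pn)


for method in ("MST", "FCS"):
    print(method, round(salt_fold_change(method), 1))

# With any Hill coefficient, half of the RNA is bound when [protein] = K_D.
# The coefficient 1.5 is a hypothetical value chosen only for illustration.
print(hill_fraction_bound(14.0, 14.0, 1.5))
```

Both methods report a roughly 18-fold weakening of binding on raising the salt concentration, which is the sense in which MST recapitulates the earlier measurements.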
We wished to quantitatively compare the different RNAs with respect to their ability to support assembly. It was important that all of the RNA be bound by the Δ16-99 Gag protein in these experiments, so that any differences observed represent differences in support of assembly, not differences in the extent of binding. Figure 2B shows the results of MST binding assays in a buffer closely resembling that used in assembly experiments, yielding K_D values of 226, 382, and 568 nM for Ψ (beginning at nt 200), MBSM second generation (Gen), and Reverse Complement (Rev Comp) RNAs, respectively. Specifically, it is evident that nearly all of each RNA is bound at 1-2 µM Δ16-99 Gag, significantly below the levels used in the assembly experiments (see Figure 3 below).
Finally, we compared different RNAs with respect to their ability to support assembly of Δ16-99 Gag into VLPs. Different amounts of Δ16-99 Gag were added to 61 nM solutions of the Cy5-labeled RNAs; VLP assembly was monitored by the shift of the RNA into large, rapidly sedimenting structures, and was confirmed by negative-stain electron microscopy (Figure 4). Although well-formed VLPs were visible in all the reactions (see insets in the Figure), a variety of other structures were also observed, particularly in the Ψ and MBSM samples. The mixtures were layered onto sucrose gradients and centrifuged at 76,000 × g for 14 hr. Fractions were collected and assayed for both Cy5 fluorescence and Δ16-99 Gag protein content (p24 CA signal). Results of this experiment for Ψ, MBSM second Gen, and Rev Comp RNAs are shown in Figure 3A-C. In each panel, the black line is the sedimentation profile of the free RNA. In Figure 3A, the free Ψ RNA is a single peak centered on fraction 6. Addition of 3 µM Δ16-99 Gag (red curve) shifts the majority of this RNA to fraction 8, with a significant tail extending nearly to the bottom of the gradient. When 7.5 µM or higher concentrations of Δ16-99 Gag are added, nearly all the RNA is shifted to a broad peak centered around fraction 13. Qualitatively similar results were obtained with MBSM second Gen (Figure 3B) and Rev Comp (Figure 3C) RNAs.
We also determined the distribution of the Δ16-99 Gag protein in these gradients, by performing immunoblotting on dot blots of aliquots of the gradient fractions (Figure 5). We found that in all cases, the vast majority of the protein remained near the top of the gradient (fractions 2-4), and the presence of 61 nM RNA had little or no significant effect upon the distribution of the protein. The fact that the overall protein profile was not significantly affected by the presence of the RNA is not surprising, as the protein was in at least 50-fold molar excess over the RNA in these gradients.
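As a quick arithmetic check on the molar excess of protein over RNA, the ratio can be computed directly from the concentrations given above. The sketch reads the Gag concentrations as micromolar, an assumption that is consistent with the stated 50-fold excess over 61 nM RNA; the function name is illustrative.

```python
RNA_NM = 61.0  # RNA concentration in the assembly reactions (nM), from the text


def molar_excess(gag_um, rna_nm=RNA_NM):
    """Molar ratio of Gag to RNA; Gag is given in µM (assumed), RNA in nM."""
    return gag_um * 1000.0 / rna_nm


for gag_um in (3.0, 7.5):
    print(f"{gag_um} µM Gag: {molar_excess(gag_um):.0f}-fold excess")
```

Under this reading, the lowest protein concentration used corresponds to about a 49-fold excess, and the higher concentrations to correspondingly larger ratios, so the RNA-bound protein is a small share of the total at every point in the titration.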
In order to quantitatively assess the level of VLP assembly in each of the reactions, we summed the amount of RNA between fractions 10 and 18. The results of this analysis are shown in Figure 6. These data were fitted by the non-linear least-squares Levenberg-Marquardt method to an equation in which x is the protein concentration, Y(x) is the fraction of RNA in the bottom half of the tube, and n is a fitting parameter. This equation is analogous to the Hill cooperative model for macromolecular association. Solving for these values yielded the results shown in Table 2.
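The quantification and fitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the fitted equation is not reproduced in this text, so the sketch assumes the standard Hill form Y(x) = x^n / (K^n + x^n), where the half-maximal concentration K is an assumed second parameter. The function names are hypothetical, and the data are synthetic values generated from the model itself, not measurements.

```python
import math


def fraction_in_bottom(profile, first=10, last=18):
    """Share of the total signal found in fractions first..last (1-based)."""
    return sum(profile[first - 1:last]) / sum(profile)


def hill(x, k_half, n):
    """Assumed Hill-type form: Y(x) = x**n / (k_half**n + x**n), for x > 0."""
    xn = x ** n
    return xn / (k_half ** n + xn)


def fit_hill(xs, ys, k0, n0, max_iter=200, tol=1e-15):
    """Levenberg-Marquardt least-squares fit of (k_half, n)."""
    k, n, lam = k0, n0, 1e-3  # lam is the damping factor
    res = [hill(x, k, n) - y for x, y in zip(xs, ys)]
    cost = sum(r * r for r in res)
    for _ in range(max_iter):
        if cost < tol:
            break
        # Analytic Jacobian: dY/dk = -(n/k) Y (1 - Y); dY/dn = Y (1 - Y) ln(x/k)
        jk, jn = [], []
        for x in xs:
            y = hill(x, k, n)
            jk.append(-(n / k) * y * (1.0 - y))
            jn.append(y * (1.0 - y) * math.log(x / k))
        a11 = sum(v * v for v in jk)
        a22 = sum(v * v for v in jn)
        a12 = sum(u * v for u, v in zip(jk, jn))
        g1 = sum(u * r for u, r in zip(jk, res))
        g2 = sum(v * r for v, r in zip(jn, res))
        # Solve the damped 2x2 normal equations (J^T J + lam * diag) d = -J^T r
        d11, d22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        if det == 0.0:
            break
        k_try = k - (d22 * g1 - a12 * g2) / det
        n_try = n - (d11 * g2 - a12 * g1) / det
        if k_try <= 0.0 or n_try <= 0.0:
            lam *= 10.0  # step left the valid region: damp more strongly
            continue
        res_try = [hill(x, k_try, n_try) - y for x, y in zip(xs, ys)]
        cost_try = sum(r * r for r in res_try)
        if cost_try < cost:
            gain = cost - cost_try
            k, n, res, cost = k_try, n_try, res_try, cost_try
            lam = max(lam / 10.0, 1e-12)  # accept step and relax damping
            if gain < tol:
                break
        else:
            lam *= 10.0  # reject step and damp more strongly
    return k, n


# Synthetic, noise-free example (hypothetical concentrations and parameters).
xs = [3.0, 5.0, 7.5, 11.25, 15.0, 20.0]
ys = [hill(x, 8.0, 3.0) for x in xs]
k_fit, n_fit = fit_hill(xs, ys, k0=10.0, n0=2.0)
print(round(k_fit, 3), round(n_fit, 3))
```

In practice a library routine (for example a Levenberg-Marquardt solver from a numerical package) would normally replace the hand-written loop; it is spelled out here only to show what the damping step does.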
The results reveal a striking difference between Ψ RNA and either MBSM second Gen or Rev Comp RNA: particularly at the lower protein levels, Ψ supports assembly far more efficiently than the other RNAs. For example, at 11.25 µM Δ16-99 Gag, approximately 4/5 of the Ψ RNA has been shifted into the bottom half of the gradient, while only about half of the MBSM second Gen or Rev Comp RNA has undergone a similar shift.
These results are in complete concordance with our hypothesis that binding to a packaging signal nucleates assembly with particularly high efficiency (Comas-Garcia et al., 2016; Nikolaitchik et al., 2013). Simulations by Perlmutter and Hagan (2015) also demonstrate the quantitative plausibility of this hypothesis. The observation that, when Gag is limiting, there is more assembly on Ψ than on other RNAs has also been made in vivo (Dilley et al., 2017); our finding that the same result is obtained in a defined in vitro system shows that this is a direct reflection of the interactions between Gag and the RNAs, and that other cellular components do not drive this phenomenon to any significant degree.
The second important finding presented here is that the unpaired guanines within the first few hundred bases of HIV-1 RNA make a major contribution to the specific interactions between Gag and Ψ, as manifested in direct binding assays (Figure 1). In fact, the contribution of these clusters of unpaired bases is far more important than that of either SL1 or SL3. Somewhat similar data have been reported by Webb et al. (2013). Furthermore, these unpaired bases are critical for efficient VLP assembly, under conditions in which the protein binds equally well to all the RNAs tested (Figures 3 and 6). Altogether, these results support our hypothesis that Ψ promotes selective packaging of the HIV-1 genomic RNA by virtue of its distinctive efficiency in promoting particle assembly. The data suggest that binding to Ψ reduces the activation energy of the assembly process. We believe that this phenomenon explains the selective packaging of gRNA, in preference to other, cellular RNAs, into virions in infected cells. Experiments to identify a hypothetical nucleating complex are now under way.

Materials and methods
Except where otherwise specified, all procedures were as previously described (Comas-Garcia et al., 2017).
RNAs were produced by in vitro transcription of linearized plasmids containing the T7 promoter. All transcripts were 401 nucleotides long unless indicated otherwise and were ultimately derived from the pNL4-3 molecular clone of HIV-1. Numbering begins with the first nucleotide in the R region, equivalent to nt 454 in the DNA sequence. Specifically, HIV-1 Ψ 150 represents nucleotides 150-550; HIV-1 Ψ 200 contains nt 200-600; ΔSL1 contains nt 150-180 and 280-650; ΔSL3 contains nt 150-305 and 405-650; first-generation MBSM was derived from HIV-1 Ψ 200 by replacement of G224, G226, G240, C243, G241, G270, G272, G273, C274, G275, G289, G290, G292, G310, C312, G318, G320, G328, and G329 with adenines (Wilkinson et al., 2008). The RNA transcribed from this HIV-1 Ψ 200-derived plasmid would still contain a highly GC-rich sequence which would quite possibly be unpaired. To eliminate this potential source of unpaired G residues, we also generated the MBSM second generation, in which the first-generation MBSM was modified by replacing G442, G443, G444, C445, G448, C449, G451, G452, G453, G455, C456, and G459 with adenines. This latter series of changes was also produced in HIV-1 Ψ 200, yielding the 'HIV-1 GC loop' plasmid. In some experiments, the negative strand complementary to the HIV-1 Ψ 200 RNA ('Reverse Complement') was produced by transcribing a plasmid in which the T7 promoter was at the 3' end, rather than the 5' end, of the HIV-1 Ψ 200 insert. The inserts in all plasmids were completely verified by sequencing.
MST measurements were performed in premium coated capillaries on a Monolith NT.115 instrument according to the manufacturer's instructions (Nanotemper Technologies GmbH). Samples were incubated for 20 min at 22 °C after loading into measuring capillaries. All experiments were done with temperature control set to 22 °C. LED power was 90% for Ψ and second generation MBSM RNAs and 50% for Reverse Complement RNA. MST power was 20% for all measurements, with a 5 s fluorescence read before the MST laser was switched on, 20 s with the MST laser on, and a 5 s fluorescence read after the MST laser was switched off.