Exploring the Landscape of the PP7 Virus-like Particle for Peptide Display

Self-assembling virus-like particles (VLPs) can tolerate a wide range of genetic and chemical manipulation of their capsid proteins to display a foreign molecule polyvalently. We previously reported the successful incorporation of foreign peptide sequences in the junction loop and onto the C-terminus of PP7 dimer VLPs, as these regions are accessible for surface display on assembled capsids. Here, we report the implementation of a library-based approach to test the assembly tolerance of PP7 dimer capsid proteins to insertions or terminal extensions of randomized 15-mer peptide sequences. By performing two iterative rounds of assembly-based selection, we evaluated the degree of favorability of all 20 amino acids at each of the 15 randomized positions. Deep sequencing analysis revealed a distinct preference for the inclusion of hydrophilic peptides and negatively charged amino acids (Asp and Glu) and the exclusion of positively charged peptides and bulky and hydrophobic amino acid residues (Trp, Phe, Tyr, and Cys). Within the libraries tested here, we identified 4,000 to 22,000 unique 15-mer peptide sequences that can successfully be displayed on the surface of the PP7 dimer capsid. Overall, the use of small initial libraries consisting of no more than a few million members yielded a significantly larger number of unique and assembly-competent VLP sequences than have been previously characterized for this class of nucleoprotein particle.


Library expression and VLP purification
50 µL of E. cloni EXPRESS BL21(DE3) electrocompetent cells (Biosearch Technologies) were transformed with 400 ng of plasmid library by electroporation. After transformation, 10 µL of the transformed cell suspension was serially diluted and plated to determine transformation efficiency, while the rest was used to start an overnight culture, of which 5-10 mL was sampled the next morning for isolation of the transformed plasmid library. The remainder of the overnight culture was used to inoculate fresh 2YT medium, supplemented with the appropriate antibiotics, at a 1:50 ratio. The culture was incubated at 37 °C with shaking (250 rpm) until an OD600 of 0.8-1.0 was reached. VLP protein expression was induced by adding IPTG to a final concentration of 1 mM, and the expression culture was kept at room temperature with shaking (250 rpm) overnight. Cells were harvested by centrifugation at 6,000 rpm, and the cell pellets were processed immediately or stored at -80 °C. The cell pellet from a 500 mL culture was resuspended in 80-100 mL of potassium phosphate buffer (0.1 M, pH 7.0) and sonicated at 60-65% power for 15 min with 5-second bursts separated by 5-second intervals. The lysate was cleared by centrifugation, and protein was precipitated with 0.265 g/mL ammonium sulfate at 4 °C for about 3 hours. The crude protein library precipitate was collected by centrifugation at 13,000 rpm for 10 minutes, and the protein pellet was resuspended in potassium phosphate buffer (0.1 M, pH 7.0) and kept at 4 °C with mild shaking to allow complete dissolution. Lipids and membrane proteins in the crude library sample were removed by organic extraction with 1:1 n-butanol:chloroform. After recovery of the top aqueous layer containing the VLP library, a sucrose density gradient (10-40% w/v) was used to further purify the library sample. The PP7 VLP library fractions were collected and pelleted by ultracentrifugation at 68,000 rpm (Beckman Type 70 Ti rotor) for 2 hours. The library protein pellet was dissolved in potassium phosphate buffer (0.1 M, pH 7.0) and filtered through a 0.22 µm filter.
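The plating step and the ammonium sulfate addition described above both reduce to simple arithmetic. The following is a minimal sketch, not the authors' worksheet: the colony count, dilution factor, plated volume, and recovery volume are hypothetical illustration values, and only the 400 ng of DNA and the 0.265 g/mL ammonium sulfate loading come from the procedure.

```python
def transformation_efficiency(colonies, dilution_factor, plated_ul, total_ul, dna_ng):
    """Colony-forming units (CFU) per microgram of plasmid DNA.

    colonies        -- colonies counted on one plate (hypothetical input)
    dilution_factor -- fold dilution of the plated sample (hypothetical input)
    plated_ul       -- volume of diluted sample spread on the plate, in µL
    total_ul        -- total volume of the transformed cell suspension, in µL
    dna_ng          -- plasmid DNA used for the transformation, in ng
    """
    cfu_per_ul = colonies * dilution_factor / plated_ul
    total_cfu = cfu_per_ul * total_ul
    return total_cfu / (dna_ng / 1000.0)


def ammonium_sulfate_grams(lysate_ml, grams_per_ml=0.265):
    """Mass of ammonium sulfate to add to a cleared lysate of the given volume."""
    return lysate_ml * grams_per_ml


# Hypothetical plate: 150 colonies from 100 µL of a 10^4-fold dilution,
# assuming a 1000 µL recovery volume and the 400 ng of library DNA stated above.
eff = transformation_efficiency(150, 1e4, 100, 1000, 400)
print(f"{eff:.2e} CFU/µg")

# Salt needed for the 80-100 mL resuspension range given in the procedure.
print(ammonium_sulfate_grams(80), ammonium_sulfate_grams(100))
```

The total CFU figure (before dividing by the DNA mass) is also the upper bound on the number of distinct library members that can be carried into expression.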

PP7 particle analysis
PP7 protein concentration was measured using the Coomassie Plus Protein Reagent (Pierce) with bovine serum albumin as the standard. Molecular weight distributions of assembled PP7 VLP libraries were assessed by LC-MS (ESI-ToF, Agilent), although we do not assume that all library members are detectable by this technique. Accurate mass measurements on individual VLP sequences were made on the same instrument. For all mass spectrometry analyses, VLP samples were first denatured by incubation of 10 µL of 0.5 mg/mL VLP solution with a mixture of 2.5 µL of 1 M DTT and 2.5 µL of 1 M urea for 5 minutes at room temperature, followed by dilution (90 µL water) and centrifugation (13,000 rpm for 2 minutes) to remove aggregates. Each sample (5 µL, corresponding to approximately 250 ng VLP) was introduced to the mass analyzer after passage through a C3 reversed-phase HPLC column (0.5 mL/min; step elution profile = 98/2 H2O/MeCN for 2.5 min; 20/80 H2O/MeCN for 4.5 min; column wash with 100% MeCN; all elution solvents containing 0.1% formic acid).
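The amount of protein injected per LC-MS run can be checked from the volumes given above. This is a minimal sketch that assumes no protein is lost when aggregates are removed by centrifugation; the function name is illustrative only.

```python
def injected_mass_ng(stock_mg_per_ml, stock_ul, additive_ul, diluent_ul, injection_ul):
    """Mass of protein (ng) in one injection after denaturation and dilution.

    Assumes volumes are additive and that centrifugation removes no protein.
    """
    total_ng = stock_mg_per_ml * stock_ul * 1000.0  # mg/mL equals µg/µL; µg -> ng
    final_ul = stock_ul + additive_ul + diluent_ul
    return total_ng / final_ul * injection_ul


# 10 µL of 0.5 mg/mL VLP + 2.5 µL DTT + 2.5 µL urea + 90 µL water, 5 µL injected.
load = injected_mass_ng(0.5, 10, 2.5 + 2.5, 90, 5)
print(round(load), "ng")
```

Counting the 5 µL of DTT and urea in the final volume gives a slightly lower figure than the nominal 250 ng, which corresponds to treating the final volume as 100 µL.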
Particle sizes were assessed by DLS, TEM, and SEC. All libraries were diluted to 0.1 mg/mL and assessed with a Dynapro DLS plate reader (Wyatt Technologies) in a 384-well plate. TEM samples were prepared by diluting each of the VLP libraries to 0.05-0.1 mg/mL with sterile water. Grids were prepared by applying 6 µL of diluted VLP sample to the carbon surface of 300 mesh lacey formvar/carbon TEM grids (Ted Pella Inc.) and incubating for 90 seconds. Grids were blotted against a Kimwipe, and excess salts were removed by laying the carbon surface of the grid on a 1.0 mL water drop for 5-10 seconds and repeating once more. The grids were blotted against a Kimwipe again, and negative staining was performed by applying 5 µL of a 2% w/v uranyl acetate stain solution onto the grid for 45 seconds. The grids were blotted and air-dried for 5-10 minutes at room temperature. TEM images were acquired with a Hitachi HT-7700 operating at 120 kV. Size-exclusion chromatography was performed on a Superose-6 column (0.4 mL/min flow rate with 0.1 M potassium phosphate buffer, pH 7.0). The protein content of library samples was determined by dilution to 1.0 mg/mL and analysis with a Bioanalyzer 2100 Protein 80 microfluidics chip (Agilent).

CryoEM sample preparation, data collection and data processing
R1.2/1.3 Quantifoil (carbon foil) holey grids (EMS, 300 mesh) were plasma-cleaned using a Solarus plasma system (Gatan) in an oxygen-argon environment at 15 W power (30 s for C-term NNS samples; 7 s for loop NNS samples). VLPs were diluted to 1 mg/mL, and a 3 μL aliquot of sample was applied to the foil side of the grid. Grids were plunge-frozen in liquid ethane on a Leica EM GP automatic plunge freezer (Leica Microsystems) after blotting (1 s for C-term NNS; 3 s for loop NNS). (The indicated differences in these procedural parameters derived from the routine procedures of different users rather than from differences in sample properties.) Frozen grids were clipped in AutoGrid rings and stored in liquid nitrogen until data collection.
C-term NNS data were acquired at the New York Structural Biology Center (NYSBC) on a Krios 1 instrument (dataset 19nov18k) with the following parameters: Gatan K2 camera in counting mode, 70 e⁻/Å² total dose at the specimen level, 10 s exposure fractionated into 50 frames, 1-2 µm nominal defocus range, and 1.073 Å calibrated pixel size. A total of 2631 movies were acquired using Leginon¹ for automatic data acquisition. Loop NNS data were acquired at NYSBC on a Krios 2 instrument equipped with a Gatan Bioquantum energy filter (dataset 21feb14a) with the following parameters: Gatan K3 camera in counting mode (1x binning), 52 e⁻/Å² total dose, 2 s exposure fractionated into 50 frames, energy filter slit width 15 eV, 1-2 µm nominal defocus range, and 1.076 Å calibrated pixel size. A total of 1026 movies were acquired using Leginon.
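The acquisition parameters above imply a dose per frame and an approximate dose rate at the detector for each dataset. The sketch below derives both from the stated total dose, frame count, exposure time, and pixel size; it assumes the dose was delivered uniformly over the exposure and that the calibrated pixel size corresponds to one physical detector pixel, neither of which is stated explicitly in the text.

```python
def dose_per_frame(total_dose_e_per_A2, n_frames):
    """Electron dose per movie frame in e-/Å², assuming uniform fractionation."""
    return total_dose_e_per_A2 / n_frames


def dose_rate_e_per_px_s(total_dose_e_per_A2, pixel_size_A, exposure_s):
    """Approximate dose rate in e-/pixel/s for a square pixel of the given size."""
    return total_dose_e_per_A2 * pixel_size_A ** 2 / exposure_s


# Values taken from the two datasets described above.
datasets = {
    "C-term NNS (K2)": {"dose": 70.0, "frames": 50, "exposure": 10.0, "pixel": 1.073},
    "loop NNS (K3)": {"dose": 52.0, "frames": 50, "exposure": 2.0, "pixel": 1.076},
}

for name, d in datasets.items():
    per_frame = dose_per_frame(d["dose"], d["frames"])
    rate = dose_rate_e_per_px_s(d["dose"], d["pixel"], d["exposure"])
    print(f"{name}: {per_frame:.2f} e-/Å²/frame, {rate:.1f} e-/px/s")
```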

Supplementary Tables
1 -0.13 0.07 -0.13 -0.14 -0.21 0.04 -0.19 0.12 0.29 0.05 0.03 0.04 -0.10 0.15 0.07 0.12
2 -0.11 0.00 -0.16 -0.26 -0.03 -0.04 -0.05 0.14 0.09 -0.03 -0.04

Table S7. Log2 (fold change) in frequency of each amino acid within the 15-mer positions from the isolated G2 loop NNS VLP library compared to its corresponding transformed plasmid library (Figure S13a). Minimum value in this table = -3.42 (10.7-fold reduced representation in the selected cDNA library relative to the starting transformed library); maximum value = 2.12 (4.4-fold enhanced representation in the selected cDNA library).

Table S8. Log2 (fold change) in frequency of each amino acid within the 15-mer positions from the isolated G2 C-term NNS VLP library compared to its corresponding transformed plasmid library (Figure S13b). Minimum value in this table = -5.02 (32.4-fold reduced representation in the selected cDNA library relative to the starting transformed library); maximum value = 2.22 (4.7-fold enhanced representation in the selected cDNA library).

Table S9. Log2 (fold change) in frequency of each amino acid within the 15-mer positions from the isolated G2 loop VNS VLP library compared to its corresponding transformed plasmid library (Figure S13c). Minimum value in this table = -0.77 (1.7-fold reduced representation in the selected cDNA library relative to the starting transformed library); maximum value = 1.39 (2.6-fold enhanced representation in the selected cDNA library).
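The parenthetical fold changes in the captions of Tables S7-S9 follow from the log2 values by exponentiation. A short sketch of that conversion is given here; the function name is illustrative, and small discrepancies in the last digit can arise because the tabulated log2 values are themselves rounded.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change to (fold, direction).

    The fold is reported as a magnitude >= 1, so a log2 value of -1
    reads as "2-fold depleted" and +1 as "2-fold enriched".
    """
    factor = round(2.0 ** abs(log2_fc), 1)
    if log2_fc > 0:
        direction = "enriched"
    elif log2_fc < 0:
        direction = "depleted"
    else:
        direction = "unchanged"
    return factor, direction


# Extreme values quoted in the table captions.
for value in (-3.42, -5.02, 2.22, -0.77, 1.39):
    print(value, fold_change(value))
```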
Figure S1. Sanger sequencing traces of the cloned plasmid libraries, showing successful cloning of the 15NNS and 15VNS stretch in the loop or C-term of the PP7 dimer coat protein. A = green, T = red, C = blue, G = black.

Figure S2. Library cloning strategy. (a, b) Plasmids encoding sfGFP are used as the initial vector since coat protein (CP) variants containing this motif are incapable of assembly into VLPs, as shown by the sucrose gradient (c). This strategy eliminates interference from any trace amount of plasmid template carryover in the downstream VLP library analysis.

Figure S3. Characterization of particles isolated from generation 1 (G1) NNS libraries. (a) Hydrodynamic radii of VLP libraries assessed by dynamic light scattering (DLS). (b) Size-exclusion chromatography (SEC) elution profile of VLP libraries; the 260 nm signal monitors the encapsulated nucleic acid, which co-migrates with the assembled protein particles monitored by the 280 nm signal.

Figure S4. Characterization of particles isolated from generation 2 (G2) NNS libraries. Panels a-c describe the G2 loop library; panels d-f describe the G2 C-term library. (a, d) CP mass spectrometry trace of VLP libraries. (b) Hydrodynamic radius determined by DLS. (c, f) SEC profile and (e) TEM image of assembled VLPs.

Figure S5. Characterization of particles isolated from generation 2 (G2) VNS libraries. Panels a-b describe the G2 loop library; panels c-d describe the G2 C-term extension library. (a, c) Hydrodynamic radius determined by DLS. (b, d) SEC profile of assembled VLP libraries.

Figure S6. Length distribution of peptides encoded within the G1 loop (left) and C-term (right) NNS transformed libraries.

Figure S7. Assessment of bias in the G1 transformed plasmid libraries compared to theoretical codon frequency in the respective NNS or VNS codons. Heat maps represent log2 of the abundance ratio (cDNA library / transformed library) of each amino acid at each position. (a) G1 loop NNS, (b) G1 C-term NNS, (c) G1 loop VNS, and (d) G1 C-term VNS libraries.
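The theoretical amino acid frequencies for the NNS and VNS schemes can be enumerated directly from the standard genetic code. The sketch below assumes an equimolar nucleotide mixture at every degenerate position (N = A/C/G/T, V = A/C/G, S = C/G), which is the idealized case and not necessarily what a given oligonucleotide synthesis delivers.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC degenerate nucleotide codes used by the two library designs.
DEGENERATE = {"N": "ACGT", "V": "ACG", "S": "CG"}


def theoretical_frequencies(scheme):
    """Expected frequency of each amino acid (and '*' for stop) per codon."""
    codons = ["".join(p) for p in product(*(DEGENERATE[s] for s in scheme))]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


nns = theoretical_frequencies("NNS")  # 32 codons: 20 amino acids plus TAG stop
vns = theoretical_frequencies("VNS")  # 24 codons: no stop, no F, Y, W, or C

# Probability that a 15-codon NNS insert contains no TAG stop codon.
p_full_length = (1 - nns["*"]) ** 15

print(sorted(nns.items()))
print(sorted(vns.items()))
print(round(p_full_length, 3))
```

Under these assumptions, NNS encodes a single stop codon (TAG) among its 32 codons, which is the source of the truncated peptides analyzed elsewhere in this document, whereas VNS excludes the stop codon along with Phe, Tyr, Trp, and Cys.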

Figure S8. (a) Total number of unique 15-mer sequences with frequency ≥ 2 successfully displayed on the PP7 dimer VLPs. (b) Average frequency of unique 15-mer sequences successfully displayed on the PP7 dimer VLPs. T1 = generation 1 transformation library (possible sequences expressed in E. coli), C1 = generation 1 cDNA library (particles recovered and sequenced), T2 and C2 = transformation and cDNA libraries for generation 2, respectively.
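Counting unique 15-mers that appear at least twice, and their average frequency, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input is taken to be a list of already translated peptide reads, "frequency" is interpreted as the read count of a sequence, and the function name is hypothetical.

```python
from collections import Counter


def summarize_library(peptides, min_count=2, length=15):
    """Return (number of unique sequences, mean read count per sequence).

    Only peptides of exactly `length` residues are considered, and only
    sequences observed at least `min_count` times are kept.
    """
    counts = Counter(p for p in peptides if len(p) == length)
    kept = {seq: n for seq, n in counts.items() if n >= min_count}
    n_unique = len(kept)
    mean_count = sum(kept.values()) / n_unique if n_unique else 0.0
    return n_unique, mean_count


# Toy reads: two sequences pass the threshold, one singleton and one
# truncated peptide are discarded.
reads = ["A" * 15] * 3 + ["D" * 15] * 2 + ["E" * 15] + ["DE"]
print(summarize_library(reads))
```

Requiring at least two observations is a simple guard against counting sequencing errors as distinct library members.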

Figure S9. Favorability of amino acids within the G1 cDNA libraries in relation to hydrophobicity. The y-axis values represent the average log2 of the abundance ratio (cDNA library / transformed library) of a given amino acid across positions 1-15 of the extension or loop insert; the error bars represent the global standard deviation for each amino acid that appears within one of the variable peptide segments.

Figure S10. Calculated properties of peptides in the loops or C-termini of the indicated first-generation PP7 dimer VLP libraries, comparing all of the sequences available for expression (transformed plasmid library) to those of the assembled, isolated, and sequenced particles (cDNA library). (a) Hydropathy evaluated by GRAVY score distribution. (b) Charge at pH 7.4.
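The two peptide properties named in this caption can be computed per sequence as sketched below. GRAVY is the mean Kyte-Doolittle hydropathy of the residues. The charge calculation uses the Henderson-Hasselbalch relation with approximate textbook side-chain pKa values and ignores the termini; the pKa set and the treatment of termini are assumptions, since the source does not state how charge was calculated.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain pKa values (assumed; other published sets differ slightly).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def side_chain_charge(peptide, ph=7.4):
    """Net side-chain charge at the given pH (termini not included)."""
    charge = 0.0
    for aa in peptide:
        if aa in PKA_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


# Hypothetical 15-mer rich in acidic and polar residues.
example = "DEGSDTEEGSDNSEQ"
print(round(gravy(example), 2), round(side_chain_charge(example), 2))
```

Applying both functions to every peptide in the transformed and cDNA libraries would give the paired distributions that the caption describes.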

Figure S11. Favorability, measured by log2 of the abundance ratio (cDNA library / transformed library): replot of Figure 4 with the color scale matching the full range of data within each panel. (a) Generation 1 (G1) loop NNS, (b) G1 C-term NNS, (c) G1 loop VNS, (d) G1 C-term VNS. Amino acids are grouped by their properties, indicated by the color of the single-letter code (hydrophobic = black, polar/uncharged = blue, basic and positively charged = green, acidic and negatively charged = red, aromatic = purple, and thiol-containing = pink).
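The favorability metric used in these heat maps, log2 of the abundance ratio of each amino acid at each position, can be sketched as below. The sketch assumes that abundance means the fraction of reads carrying a given residue at a given position and that every read is weighted equally; the function names are illustrative and the toy peptides are two residues long for brevity.

```python
import math
from collections import Counter


def position_frequencies(peptides, length=15):
    """Per-position amino acid frequencies for peptides of the given length."""
    totals = [Counter() for _ in range(length)]
    for pep in peptides:
        if len(pep) != length:
            continue  # skip truncated or otherwise off-length peptides
        for pos, aa in enumerate(pep):
            totals[pos][aa] += 1
    freqs = []
    for counter in totals:
        n = sum(counter.values())
        freqs.append({aa: k / n for aa, k in counter.items()} if n else {})
    return freqs


def favorability(cdna_peptides, transformed_peptides, length=15):
    """log2(cDNA frequency / transformed frequency) per position and residue.

    A residue present in the transformed library but absent from the cDNA
    library at a position is reported as None rather than minus infinity.
    """
    selected = position_frequencies(cdna_peptides, length)
    reference = position_frequencies(transformed_peptides, length)
    table = []
    for sel, ref in zip(selected, reference):
        row = {}
        for aa, f_ref in ref.items():
            f_sel = sel.get(aa, 0.0)
            row[aa] = math.log2(f_sel / f_ref) if f_sel > 0 else None
        table.append(row)
    return table


# Toy two-residue example: D is enriched and K depleted at position 1.
transformed = ["DK", "KD", "DD", "KK"]
cdna = ["DK", "DD", "DD", "KD"]
toy = favorability(cdna, transformed, length=2)
print(toy)
```

Returning None for absent residues mirrors the way an omitted amino acid has to be marked separately in a heat map, since its log2 ratio is undefined.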

Figure S12. Favorability (a, b) and count (c, d) of truncated peptide extensions appended to PP7 dimer and PP7 monomer capsids, as the result of random TAG stop codon appearance in the G1 C-term NNS and loop NNS libraries, respectively. Favorability is measured by log2 of the abundance ratio (cDNA library / transformed library) of each amino acid at each position. The favorability heat plots represent the combined results for 5- to 14-mers (a, b; top panels), exclusively 5-mers (a, b; middle panels), and exclusively 10-mers (a, b; bottom panels). The latter two are included to show that specifically sized extensions give results similar to those of the pooled analyses. Crossed-out cells reflect the omission of the indicated amino acid at that position in the cDNA library. (c, d) The number of total and unique peptide sequences (appearing at least twice) of particles isolated from each library.

Figure S13. Three replicates of the G1 C-term NNS library compared side by side. Library replicate 1 was constructed from a nucleotide library purchased from Eurofins, whereas the nucleotide libraries in replicates 2 and 3 were purchased from IDT. Replicates 1 and 2 were sequenced by Illumina MiSeq, while replicate 3 was sequenced by Illumina MiniSeq. (a) Assembled particle yields (from 0.5 L of E. coli culture) and the number of total and unique sequences (appearing at least twice) of particles isolated from each library. (b) Heat map showing the log2 fold bias of each amino acid at various positions of the 15-mer C-terminal extension. (c) The length of full 15-mers vs the truncated peptides sequenced in each replicate. (d) Change in the hydropathy (left) and charge (right) from the G1 transformed DNA library (T1) and the cDNA generated from assembled VLP variants (C1). (e) Calculated properties of peptides in replicate 3 of the plasmid and cDNA libraries in this experiment, showing hydropathy evaluated by GRAVY score (left) and calculated peptide charge at pH 7.4 (right).

Figure S14. Calculated properties of peptides in the loops or C-termini of the indicated second-generation PP7 dimer VLP libraries, comparing all of the sequences available for expression (transformed plasmid library) to those of the assembled, isolated, and sequenced particles (cDNA library). (a) Hydropathy evaluated by GRAVY score distribution. (b) Charge at pH 7.4.

Figure S16. Favorability of amino acids within the G2 cDNA libraries in relation to hydrophobicity. The y-axis values represent the average log2 of the abundance ratio (cDNA library / transformed library) of a given amino acid across positions 1-15 of the extension or loop insert; the error bars represent the standard deviation within these positions.

Figure S17. Cryogenic electron microscopy analysis of particle libraries displaying randomized 15NNS sequences. (a) Analysis of 263,024 particles from the G1 loop NNS library, identifying icosahedral particles with both T=3 and T=4 symmetries. (b) T=4 class map of the G1 loop NNS library (left) with the surface density of the displayed randomized loop peptide highlighted (right). (c) Analysis of 34,016 particles in the G1 C-term NNS library containing well-ordered T=4 icosahedral symmetry. (d) T=4 class map of the G1 C-term NNS library (left) with the surface density of the displayed randomized peptide highlighted (right). "Junk" refers to protein aggregates with no discernible structure.

Table S1. Sequences of oligonucleotides used for cloning, amplification, and NGS preparation.

Table S2. Validation of assembled VLP variants from the NNS and VNS libraries.
* The purified VLPs were partially insoluble, and only the concentration of the soluble fraction was measured.