Selective reversed-phase high-performance liquid chromatography method for the determination of intact SARS-CoV-2 spike protein

Protein-based vaccines are playing an increasingly important role in the COVID-19 pandemic. As late-stage clinical data are finalized and released, the number of protein-based vaccines expected to enter the market will increase significantly. Most protein-based COVID-19 vaccines are based on the SARS-CoV-2 spike protein (S-protein), which plays a major role in viral attachment to human cells and infection. As a result, in order to develop and manufacture quality vaccines consistently, it is imperative to have access to selective and efficient methods for the bioanalytical assessment of S-protein. In this study, samples of recombinant S-protein (hexS-protein) and commercial S-protein were used to develop a selective reversed-phase HPLC (RP-HPLC) method that enabled elution of the intact S-protein monomer as a single peak on a wide-pore, C8-bonded chromatographic column. The S-protein subunits, S1 and S2, were clearly separated from intact S-protein and identified. The results of this study set the foundation for reversed-phase HPLC method development and analysis for selective and efficient separation of S-protein monomer from its subunits.


Introduction
Within a year of the first COVID-19 cases, vaccines were developed rapidly to combat the public health crisis. First-generation vaccines were based on nucleic acids and non-replicating viral vectors, which allowed novel technologies to debut effective vaccine candidates at the global scale [1, 2]. However, as the SARS-CoV-2 virus spread rapidly, the need for global immunization coverage increased dramatically [3]. As a result, conventional protein-based vaccines are expected to find a strong foothold in logistically difficult regions as the pandemic develops [4]. The advantages of these vaccines include a strong history of triggering safe and robust immune responses [5, 6] along with significant flexibility in storage and transportation.
The vast majority of protein-based COVID-19 vaccines use full-length or truncated forms of the SARS-CoV-2 virus spike protein (S-protein) as the primary antigenic component [5, 7]. The S-protein is a heavily glycosylated fusion protein (140 kDa by amino acid sequence) that homotrimerizes on the virus surface and is critical for membrane fusion and host cell receptor binding [8]. During infection, the S-protein is activated by proteolytic cleavage into its S1 and S2 subunits by host-cell proteases [9]. Importantly, viral surveillance shows that mutations in the S-protein have the highest impact on transmissibility and immune escape [10, 11].
To date, a growing number of protein-based COVID-19 vaccines are at the late stages of clinical trials and seeking approval [7]. One highly anticipated candidate is the NVX-CoV2373 protein subunit vaccine, which contains a cleavage-resistant full-length S-protein stabilized in the prefusion conformation [12]. Another vaccine in late-stage development, SCB-2019, also uses the full-length S-protein but maintains the wild-type protease cleavage site, making it prone to cleavage by host-cell proteases during manufacturing [13]. Given the diversity of protein-based COVID-19 vaccines, the complexity of the S-protein, and the potential for variants in future vaccine formulations, robust analytical techniques will be needed to separate S-protein species and assess the critical quality attributes that ensure product consistency and comparability throughout the product lifecycle.
Reversed-phase HPLC (RP-HPLC) is an established technique for the analysis and quality assessment of proteins. Compared to other techniques such as SDS-PAGE, the advantages of RP-HPLC in resolving power, selectivity, and sensitivity are significant, especially for complex and heterogeneous biotherapeutics. Antigens can be separated not only by sequence but also by post-translational modifications. As a result, RP-HPLC is a highly reproducible analytical tool widely used for a variety of biotherapeutics, such as purified protein, virus-like particles, and whole viruses [14][15][16]. However, as of this writing, only one brief application note with limited detail on S-protein analysis by RP-HPLC exists [17], and no published methods have characterized intact S-protein together with its major subunits, S1 and S2.
To address this gap, we developed a selective, repeatable, and versatile RP-HPLC method to characterize S-protein. Using an in-house purified, full-length S-protein variant called HexaPro S-protein (hexS-protein) and a variety of commercial S-protein products, we identified clear and reproducible peak patterns associated with full-length S-protein as well as its S1 and S2 subunit cleavage products. Finally, a preliminary method qualification is presented for additional direction in biotherapeutic applications.
Protein size and purity were confirmed by gel electrophoresis. Final samples were dialyzed against buffer (1.0 M Phosphate Buffer Solution pH 7.4 from Sigma-Aldrich diluted to 5 mM phosphate, supplemented with 50 mM sodium chloride) at room temperature and stored at 4 °C or −80 °C.

HPLC instrumentation and chemicals
All chemicals used for HPLC assays were analytical reagent grade. Sodium chloride, sodium phosphate dibasic, polysorbate-80, and trifluoroacetic acid (TFA) were purchased from Sigma-Aldrich (St. Louis, MO, USA). Acetonitrile (ACN) and 2-propanol were purchased from Merck KGaA (Darmstadt, Germany). Distilled water (dH2O) was deionized on a Nanopure Diamond™ system (Barnstead International, Dubuque, IA, USA). Commercial reference S-proteins and related subunits were purchased from Sino Biological Inc. and are summarized in Table 1.
The HPLC system consisted of a Waters Alliance 2695 chromatograph equipped with a column heater and an auto-sampler with a sample cooling device. Fluorescence and UV detection was accomplished by coupling in series a Waters 2475 Multichannel Fluorescence Detector (8 μL flow cell, excitation wavelength λex = 280 nm and emission wavelength λem = 335 nm) and a Waters 2998 UV-VIS photodiode array detector (Waters, QC, Canada). Data acquisition and integration were performed with Empower 3 Chromatography Data Software.
Additional details on LC-MS/MS analysis are provided in Supplemental Information.

RP-HPLC analysis
The chromatographic column used for the optimized protocol was an Aeris Widepore XB-C8, 150 mm × 4.6 mm, 3.6 μm particles, 200 Å pore size (Phenomenex, Torrance, CA, USA). The optimized separation was carried out at 40 °C with an AB gradient elution of 45 min at a flow rate of 1.0 mL/min (Gradient I, Table 2). Eluent A was 0.1% (v/v) aqueous TFA and eluent B was 0.08% (v/v) TFA in 50% ACN and 50% 2-propanol. Prior to analysis, two zero injections (no sample injected) were performed with Gradient I. The injection volume was 20 μL at 50 μg/mL for full-length S-protein material and 10 μL at 50 μg/mL for comS1 and comS2.
For linearity assessment with polysorbate-80, final solutions were diluted to 0.05% w/v polysorbate-80 using a 1% w/v stock solution.
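The polysorbate-80 dilution above follows the usual C1·V1 = C2·V2 relation. The short Python sketch below illustrates the arithmetic only; the function name and the 1000 μL final volume are assumptions chosen for illustration and are not taken from the study.

```python
def stock_volume_needed(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock required so that final_volume contains final_conc (C1*V1 = C2*V2).

    Concentrations must share one unit (here % w/v); the result is returned
    in the same unit as final_volume.
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("cannot dilute to a concentration above the stock")
    return final_conc * final_volume / stock_conc


# 0.05% w/v final from a 1% w/v stock (values from the text);
# the 1000 uL final volume is a hypothetical example.
volume_ul = stock_volume_needed(final_conc=0.05, final_volume=1000.0, stock_conc=1.0)
dilution_factor = 1.0 / 0.05  # 20-fold dilution of the stock
print(f"Add about {volume_ul:.0f} uL of stock per 1000 uL final volume "
      f"({dilution_factor:.0f}-fold dilution)")
```

Because the stock is added to the protein sample, the protein concentration is also reduced slightly by the added volume, which would need to be accounted for when preparing calibration levels.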

Spike protein material used for method development
To generate vaccine-relevant protein material for RP-HPLC method development, a modified full-length S-protein variant called hexS-protein was purified from FreeStyle293-F cells [18]. hexS-protein is one of the earliest published S-protein constructs available, and many manufacturers have incorporated its features into protein-based COVID-19 vaccines such as NVX-CoV2373 (Supplemental Figure S1) [5, 12]. Some features enhance immunogenicity, as originally established for a related betacoronavirus [20], while other features improve production yields [18].
As a quality check to confirm that in-house purified glycosylated hexS-protein formed the expected trimeric structure, samples were analyzed by SE-HPLC using a TSKgel G4000SWXL column, appropriate for fractionating biomolecules up to 7000 kDa. A comparison to MW standards showed that the main hexS-protein peak eluted at an apparent MW (MWapp) of 700 kDa (Fig. 1A), consistent with a previous SE-HPLC report on the size of trimeric glycosylated S-protein determined with an Xbridge BEH200 column [21]. It is not clear whether the small group of unresolved peaks eluting before the major peak corresponds to higher-order aggregates or alternate S-protein conformers, since both have been observed as earlier-eluting species by SE-HPLC [21, 22]. However, column parameters in this study, such as particle size, could contribute to resolution loss between species of similar hydrodynamic radius. Overall, the observed size of the main hexS-protein peak was consistent with the trimeric form.
In addition to in-house purified hexS-protein, commercial products available at the time of this study were used to represent vaccines closer to the wild-type S-protein sequence. These include two full-length wild-type S-proteins (comS-protein A and comS-protein B) along with the S1 and S2 subunits alone (comS1 and comS2, respectively) (Table 1).
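An apparent MW is commonly estimated in SE-HPLC by fitting log10(MW) of the standards against retention time and interpolating the sample peak. The Python sketch below illustrates that general approach; it is not the calculation used in this study, and the standards and retention times shown are hypothetical placeholders rather than measured values.

```python
import math


def fit_sec_calibration(standards):
    """Least-squares fit of log10(MW) = intercept + slope * retention_time.

    `standards` is a list of (retention_time_min, mw_kda) pairs.
    """
    if len(standards) < 2:
        raise ValueError("at least two standards are required")
    times = [t for t, _ in standards]
    log_mw = [math.log10(mw) for _, mw in standards]
    n = len(standards)
    mean_t = sum(times) / n
    mean_l = sum(log_mw) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, log_mw))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_t
    return slope, intercept


def apparent_mw_kda(retention_time_min, slope, intercept):
    """Interpolate the apparent MW (kDa) of a peak from the fitted calibration."""
    return 10 ** (intercept + slope * retention_time_min)


# Hypothetical MW standards (retention time in min, MW in kDa); placeholders only.
hypothetical_standards = [(7.0, 670.0), (9.0, 158.0), (10.5, 44.0), (11.5, 17.0)]
slope, intercept = fit_sec_calibration(hypothetical_standards)
print(f"Apparent MW at 7.0 min: {apparent_mw_kda(7.0, slope, intercept):.0f} kDa")
```

The log-linear relation holds only within the fractionation range of the column, so peaks eluting near the void volume (such as possible aggregates) cannot be sized reliably this way.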

Method development and optimization
Under the organic and acidic solvent conditions of RP-HPLC, proteins are typically denatured and higher-order species dissociated. As a result, RP-HPLC analysis will characterize the physicochemical features of monomers (140 kDa for hexS-protein).
Method development using hexS-protein and commercial S-protein products began with a number of RP-HPLC columns established for bioanalytical characterization of viral membrane proteins, such as hemagglutinin (Fig. 1B) [16]. Under low TFA conditions (0.04% TFA in Eluent A and 0.03% TFA in Eluent B), two MICRA® HPLC NPS-ODSI columns (33 mm × 4.6 mm with 1.5 μm particles, and 100 mm × 4.6 mm with 3 μm particles) were first tested with a separation gradient from 40% to 65% Eluent B over 25 min. Both showed some peak resolution and acceptable back pressure, with the 3 μm particle column allowing for higher sample loading (Fig. 1B, top trace). Later, these columns were replaced with the Aeris Widepore XB-C8 core-shell silica column (150 mm × 4.6 mm, 3.6 μm particles, 200 Å), which is also suitable for separating large hydrophobic proteins. This provided even lower back pressure, significantly improved near-baseline resolution between peaks, and comparable sample loading capacity (Fig. 1B, bottom trace). An Aeris Widepore XB-C18 column was also tested and showed similar performance to the C8 version, except for the co-elution of an apparent S-protein peak with TFA impurities at 6 min (Fig. 1B, middle trace). Due to this minor difference in performance, the Aeris Widepore XB-C8 column was chosen for further evaluation. Adjusting the column temperature from 55 °C to 40 °C had no notable impact on sample recovery or resolution. Interestingly, with a newly purchased Aeris Widepore XB-C8 column and the gradient elution profile used in Fig. 1B (40% to 65% B over 25 min), TFA concentrations needed to be increased to 0.1% in Eluent A and 0.08% in Eluent B for similarly good peak resolution (Fig. 2A and B, high TFA traces). In contrast, the lower TFA concentrations used in development (0.04% TFA in Eluent A and 0.03% TFA in Eluent B) produced significant peak tailing and reduced resolution (Fig. 2A and B, low TFA traces).
Later, the gradient was extended (35% to 65% Eluent B over 30 min) to prevent the coelution of S1 peaks in comS-proteins with TFA impurity peaks (Table 2, Gradient I). These results were reproducible across multiple recently (2021) manufactured columns. The difference in performance between these newly purchased columns and the older columns used in method development could be due to unknown impurities introduced in the column's usage or manufacturing history, which emphasizes the importance of interlaboratory studies for robustness assessment across equipment and environments.
In all studies, both native fluorescence (ex/em: 280 nm/335 nm) and absorbance (210-400 nm) were monitored, which showed comparable peak retention patterns using Gradient I (Fig. 2C). Native fluorescence detection of aromatic residues, primarily tryptophan, is advantageous for low-abundance samples due to its sensitivity [23][24][25]. Since the hexS-protein sequence contains 12 tryptophan residues, fluorescence detection allowed for significantly lower injection volumes and improved resolution.
The final method (Gradient I, using the Aeris Widepore XB-C8 column) showed good peak resolution and was highly reproducible across independent sample preparations (Supplemental Figure S2). Analysis of purified hexS-protein showed elution of a single major peak at a retention time of 26.7 min (Fig. 3, second trace from bottom), with a corresponding absorbance signal at 280 nm. Confirmation of the S-protein identity (60% sequence coverage with 46 unique peptide sequences identified) was obtained by MS/MS analysis of the collected HPLC peak (Supplemental Figure S3).
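The two linear segments described above (40% to 65% B over 25 min, and 35% to 65% B over 30 min) have the same slope of 1% B per minute; the extension lowers the starting composition rather than flattening the gradient. The Python sketch below illustrates this for a single linear segment. It assumes time zero is the start of the segment and does not reproduce the full 45 min timetable of Table 2; the function names are illustrative.

```python
def gradient_slope(start_b: float, end_b: float, duration_min: float) -> float:
    """Slope of a linear gradient segment in %B per minute."""
    if duration_min <= 0:
        raise ValueError("duration must be positive")
    return (end_b - start_b) / duration_min


def percent_b(t_min: float, start_b: float = 35.0, end_b: float = 65.0,
              duration_min: float = 30.0) -> float:
    """%B at time t within one linear segment (clamped to the segment ends).

    Assumption: t = 0 is the start of the linear segment, not of the run.
    """
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * fraction


def solvent_composition(pct_b: float) -> dict:
    """Approximate bulk composition, given Eluent B is 50% ACN / 50% 2-propanol.

    The small TFA fractions in both eluents are ignored here.
    """
    return {
        "aqueous_%": 100.0 - pct_b,
        "acetonitrile_%": pct_b * 0.5,
        "2-propanol_%": pct_b * 0.5,
    }


print(gradient_slope(40.0, 65.0, 25.0))   # development gradient
print(gradient_slope(35.0, 65.0, 30.0))   # extended gradient (Gradient I segment)
print(solvent_composition(percent_b(15.0)))
```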

Separation of S1, S2, and full-length S-protein
A potential quality issue for COVID-19 vaccines, such as SCB-2019, is uncontrolled cleavage during production [26]. Since both comS-protein A and comS-protein B retain the native S1/S2 protease cleavage site, their susceptibility to cleavage is increased [9], which can be assessed by RP-HPLC analysis with Gradient I. Indeed, chromatograms of comS-protein A and comS-protein B showed the presence of multiple peaks (Fig. 3, top two traces). These were identified by comparison to comS1 and comS2 subunit products (Fig. 3, third and fourth traces from the top, respectively). Both comS-protein A and comS-protein B showed a peak eluting at around 29.4 min, which corresponded to the comS2 subunit retention time of 29.4 min. However, only comS-protein A showed a significant peak at 12.3 min, which corresponded roughly to the comS1 subunit retention time of 11.7 min. In contrast, no S1 cleavage product could be detected in the comS-protein B product, despite significant S2 presence. While this is not a quantitative analysis, which would require a calibration curve with a carefully selected reference standard, the relative difference in S1/S2 fluorescence between comS-protein A and B cannot be explained solely by aromatic residue content. These results suggest significant differences in subunit content between products, possibly caused by expression or purification conditions impacting degradation or recovery.
Fig. 3. RP-HPLC chromatograms with Gradient I comparing hexS-protein to commercial products: full-length S-protein (comS-protein A and comS-protein B), the S1 subunit (comS1), and the S2 subunit (comS2). The zero injection contained no sample; the solvent gradient is shown by the dashed line.
It was noted that the full-length species of comS-protein A and comS-protein B both eluted at 27.7 min, which was shifted relative to full-length hexS-protein at 26.7 min (Fig. 3, second trace from bottom), likely due to sequence or glycosylation differences. An extraneous peak at 1.5 min in comS-protein B (Fig. 3, top trace) was determined to be non-proteinaceous based on its lack of absorbance. All other S-protein and subunit peak identities were confirmed by MS/MS analysis of the corresponding fractions collected from HPLC (Supplemental Figure S4).
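The retention-time comparison used in this section can be sketched as a nearest-reference lookup. In the Python example below, the reference retention times are those reported above; the 0.7 min matching tolerance and the function name are assumptions for illustration, and in the study itself peak identities were confirmed by MS/MS rather than by retention time alone.

```python
# Reference retention times (min) reported in the text for Gradient I.
REFERENCE_RT_MIN = {
    "S1 subunit (comS1)": 11.7,
    "full-length S-protein (commercial)": 27.7,
    "S2 subunit (comS2)": 29.4,
}


def assign_peak(rt_min: float, references=REFERENCE_RT_MIN, tolerance_min: float = 0.7):
    """Return the nearest reference species, or None if outside the tolerance.

    The 0.7 min tolerance is a hypothetical value chosen so that the 12.3 min
    peak matches the 11.7 min comS1 reference, as described in the text.
    """
    name, ref_rt = min(references.items(), key=lambda item: abs(item[1] - rt_min))
    return name if abs(ref_rt - rt_min) <= tolerance_min else None


# Peaks described for comS-protein A, plus the 1.5 min extraneous peak of comS-protein B.
for rt in (1.5, 12.3, 27.7, 29.4):
    print(rt, "->", assign_peak(rt))
```

A fixed tolerance is a simplification: the 1.0 min shift between commercial and hexS full-length species shows that sequence and glycosylation differences can move a peak further than the tolerance used here.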

Preliminary method qualification
In a preliminary qualification assessment, the method's linearity, limit of detection (LOD), limit of quantification (LOQ), and intra-assay precision were assessed.
Using hexS-protein, we confirmed linearity between 50 and 100 μg/mL using fluorescence (R² = 0.995, 20 μL injections). This range was significantly improved with the addition of 0.05% w/v polysorbate-80, a common ingredient in the final formulation of vaccines such as NVX-CoV2373, which likely prevented nonspecific protein adsorption to vial surfaces. With the polysorbate-80 detergent, method linearity was determined between 2.5 and 100 μg/mL using fluorescence (R² = 1.000, 20 μL injections) with no impact on the retention time or shape of the hexS-protein peak. In both linearity assessments, with and without polysorbate-80, 3 injections were analyzed for each of 6 concentrations.
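A linearity assessment of this design (6 concentrations, 3 injections each) is typically evaluated by ordinary least-squares regression of peak area on concentration. The Python sketch below shows one way to compute the slope, intercept, and R²; the concentration levels and peak areas are hypothetical placeholders, since the individual levels and raw areas are not listed here, and the original integration was performed in Empower 3.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, with R^2."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have equal length of at least 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical triplicate peak areas (arbitrary units) at assumed levels in ug/mL.
# These numbers are placeholders for illustration, not measured data.
hypothetical_areas = {
    2.5: [24.0, 26.0, 25.0],
    5.0: [51.0, 49.0, 50.0],
    10.0: [99.0, 102.0, 100.0],
    25.0: [252.0, 248.0, 251.0],
    50.0: [498.0, 503.0, 500.0],
    100.0: [1004.0, 997.0, 1001.0],
}
conc = [c for c, reps in hypothetical_areas.items() for _ in reps]
area = [a for reps in hypothetical_areas.values() for a in reps]
slope, intercept, r2 = linear_fit_r2(conc, area)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

Fitting all individual injections rather than per-level means keeps the replicate scatter in the R² value, which gives a more conservative picture of linearity.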
The LOD, defined as the lowest protein amount injected to give a signal-to-noise ratio of 3:1, was determined with hexS-protein at 25 ng for fluorescence detection and 500 ng for absorbance detection at 280 nm. The LOQ, defined as the lowest protein amount injected to give a signal-to-noise ratio of 10:1, was determined with hexS-protein at 50 ng for fluorescence detection and 1 μg for absorbance detection at 280 nm. These results show that fluorescence detection is significantly advantageous for low-concentration S-protein samples.
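The signal-to-noise definitions above can be expressed as a simple classification, and the injected masses can be converted to sample concentrations for a given injection volume. The Python sketch below is illustrative only: the noise estimate is taken as a single baseline value, whereas instrument software may use a different noise convention, and applying the 20 μL injection volume from the linearity assessment to the LOD and LOQ is an assumption.

```python
def signal_to_noise(peak_height: float, baseline_noise: float) -> float:
    """Simple S/N ratio; how baseline_noise is measured is an assumption here."""
    if baseline_noise <= 0:
        raise ValueError("baseline noise must be positive")
    return peak_height / baseline_noise


def classify_peak(sn: float, lod_ratio: float = 3.0, loq_ratio: float = 10.0) -> str:
    """Classify a peak against the 3:1 (LOD) and 10:1 (LOQ) criteria in the text."""
    if sn >= loq_ratio:
        return "quantifiable"
    if sn >= lod_ratio:
        return "detectable"
    return "not detected"


def mass_to_concentration_ug_per_ml(mass_ng: float, injection_ul: float) -> float:
    """Convert injected mass to sample concentration (1 ng/uL equals 1 ug/mL)."""
    if injection_ul <= 0:
        raise ValueError("injection volume must be positive")
    return mass_ng / injection_ul


# Fluorescence LOD (25 ng) and LOQ (50 ng) from the text, assuming 20 uL injections.
print(mass_to_concentration_ug_per_ml(25.0, 20.0))  # LOD as ug/mL
print(mass_to_concentration_ug_per_ml(50.0, 20.0))  # LOQ as ug/mL
```

Under the 20 μL assumption, the 50 ng fluorescence LOQ corresponds to 2.5 μg/mL, which coincides with the lower end of the linear range reported with polysorbate-80.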

Conclusions
Protein-based vaccines, including those based on whole viruses and recombinant protein, are some of the most well-established and robust vaccine types [5]. Overall, their advantages in efficacy and stability are significant, especially for areas without access to the stable cold-chain distribution system required for first-generation mRNA vaccines [27] and for future endemic scenarios where addressing local needs will be a priority.
In this work, we show that RP-HPLC is a useful tool for assessing vaccine-relevant S-protein quality and consistency. The separation and identification of full-length S-protein from its major cleavage subunits, S1 and S2, was consistently demonstrated for a variety of products. Preliminary method qualification suggests that the RP-HPLC method could be easily adapted to meet the stringent quality requirements of drug assessment. The observations reported here, such as the impact of TFA conditions and polysorbate-80 addition, will be informative for adapting the method to different systems.
Historically, vaccines and their quality assays were developed and standardized in parallel, slowly over decades [28] . However, in the COVID-19 era, a variety of modern techniques will need to be established rapidly. As a bioanalytical technique, RP-HPLC is one such tool that provides the rapid results, reagent flexibility, and method adaptability well-suited for vaccine analysis.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.