Production of full-length SARS-CoV-2 nucleocapsid protein from Escherichia coli optimized by native hydrophobic interaction chromatography hyphenated to multi-angle light scattering detection

The nucleocapsid protein (NP) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for several steps of the viral life cycle, and is abundantly expressed during infection, making it an ideal diagnostic target protein. This protein has a strong tendency for dimerization and interaction with nucleic acids. For the first time, high titers of NP were expressed in E. coli with a CASPON tag, using a growth-decoupled protein expression system. Purification was accomplished by nuclease treatment of the cell homogenate and a sequence of downstream processing (DSP) steps. An analytical method consisting of native hydrophobic interaction chromatography hyphenated to multi-angle light scattering detection (HIC-MALS) was established for in-process control, in particular, to monitor product fragmentation and multimerization throughout the purification process. 730 mg purified NP per liter of fermentation could be produced by the optimized process, corresponding to a yield of 77% after cell lysis. The HIC-MALS method was used to demonstrate that the NP product can be produced with a purity of 95%. The molecular mass of the main NP fraction is consistent with dimerized protein, as was verified by a complementary native size-exclusion separation (SEC)-MALS analysis. Peptide mapping mass spectrometry and a host cell-specific enzyme-linked immunosorbent assay confirmed the high product purity, and the presence of a minor endogenous chaperone explained the residual impurities. The optimized HIC-MALS method enables monitoring of the product purity while simultaneously assessing its molecular mass, providing orthogonal information complementary to established SEC-MALS methods. Enhanced resolving power can be achieved over SEC, attributed to the extended variables to tune selectivity in HIC mode.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) nucleocapsid protein (NP) is an attractive diagnostic marker for coronavirus disease 2019 (COVID-19) [1][2][3] since it exhibits a lower mutation rate and is abundantly expressed during infection [4][5][6][7]. Vaccines against SARS-CoV-2 are almost exclusively based on the virus spike protein [8]. Hence, diagnostic markers based on NP are essential to discriminate between infection- and vaccination-conferred immunity [9]. This protein has a strong tendency for dimerization and interaction with DNA when expressed in a bacterial host [10]. Moreover, coronaviral nucleoproteins are known to undergo concentration-dependent multimerization [3,11,12] and fragmentation [10]. In fact, most preparations of coronaviral nucleocapsid proteins deal with either the N-terminal or the C-terminal domain separately [13,14]. When full-length NP is produced, the sizes usually correspond to various multimerization states [12,13] with wide size distributions, suggesting a heterogeneous product [12,13,15]. Therefore, a fast in-process control method is needed to monitor product purity and molecular mass, especially during downstream processing (DSP) development. This method can be used to assess the impact of buffer optimization and the use of additives to reduce protein aggregation. While polyacrylamide gel electrophoresis is commonly used for monitoring product quality, the employed staining and detection methods suffer from a limited linear dynamic range, which makes it less suitable for quantitative protein analysis [16]. Native high-performance liquid chromatography methods, such as hydrophobic interaction chromatography (HIC), hold promise for a rapid screening method with high resolving power [17][18][19]. Unfortunately, traditional HIC analysis with UV detection does not allow for monitoring the molecular mass of the separated analytes.
A powerful analytical method to characterize protein mass in native conditions is to use multi-angle light scattering (MALS) detection, as was demonstrated already for ion exchange chromatography, size exclusion chromatography and field flow fractionation [20][21][22]. Combining HIC with MALS allows for separation by hydrophobicity while simultaneously obtaining information about the molecular mass and relative quantity of the analytes.
Although NP is post-translationally modified by phosphorylation [23,24] and potentially glycosylation [25] in its native host, it can be expressed in E. coli as a soluble protein with high titers without compromising antigenicity and diagnostic performance in serological assays [9]. The E. coli strains BL21 (DE3) and the recently developed enGenes-X-press [26] are both based on T7 expression [27]; with the latter, recombinant protein synthesis can be decoupled from cell growth so that metabolic resources are used exclusively for synthesis of the protein of interest. Microbial expression is often combined with fusion of the target protein to an affinity tag to simplify downstream process development. A hexa-histidine (6H) tag is commonly used in combination with immobilized metal affinity chromatography (IMAC) because of the wide availability of resins and the easy capture of target proteins, even from crude solutions. These tags can be removed when the native target protein is required, and several enzymes are available to perform this task, such as Tobacco etch virus protease, thrombin or the newly developed circularly permuted caspase-2 (cpCasp2) [28,29]. While most proteases are either too unspecific or leave residual amino acids on the N-terminus, cpCasp2 combines high specificity with the ability to fully remove the affinity CASPON tag, regardless of the N-terminus of the target protein. The CASPON tag combines a 6H affinity tag, a cleavage site and a bacteriophage T7 based solubility tag [29,30].
A robust biotechnological production platform, including protein expression and DSP, is essential for the reliable recombinant production of SARS-CoV-2 NP in sufficient quantity and with acceptable product quality. To ensure the absence of host cell-derived proteins that could otherwise lead to erroneous results in antibody tests, additional unit operations are usually performed after capture, such as ion exchange or hydrophobic interaction chromatography. HIC is a potent purification method to separate highly similar proteins, such as protein variants or fragments [31,32]. This purification step also aids in removal of host cell proteins still present after IMAC [33].
In this paper, we report the successful expression and purification of recombinant SARS-CoV-2 NP from E. coli. A growth-decoupled E. coli fermentation strategy in which NP was expressed as a fusion protein allowed for high titers [26]. A dedicated DSP strategy was identified to produce NP with high yield and product purity, using the CASPON technology for fusion-protein production [29,30]. A HIC-MALS method was developed to monitor the purity of the NP end product and simultaneously assess its molecular mass, allowing the different DSP strategies to be evaluated and optimized. Several complementary analytical techniques, including MS/MS peptide mapping and SEC-MALS, were performed to confirm the qualitative and quantitative data derived from HIC-MALS.

Chemicals and reagents
All chemicals were purchased from Sigma-Aldrich (St. Louis, MO, USA). The buffers for analytical HPLC-MALS analysis were prepared using HQ-H2O (18.2 MΩ cm), filtered through a 0.1 μm filter and degassed prior to use. A variant of T7AC-6H-cpCasp2 was used for the enzymatic tag removal [29].

Expression of the SARS-CoV-2 nucleocapsid protein
For selection of the best expression system regarding soluble volumetric titer, shake flask experiments were performed. Cells from research cell banks of the different constructs were inoculated in semisynthetic medium [36] and grown on an orbital shaker at 200 rpm at 30 °C. Expression was induced at a cell density of OD600 = 1 by addition of 0.5 mM IPTG and additionally 100 mM arabinose for the enGenes-X-press strains.
For lab-scale production of NP, growth-decoupled recombinant protein production was performed according to Stargardt et al. [26]. Cells were grown in fed-batch mode in a 1.0 L (0.5 L batch volume, 0.5 L feed) DASGIP® Parallel Bioreactor System (Eppendorf AG, Hamburg, DE) equipped with standard probes (pH, pDO). The pH was maintained at 7.0 ± 0.05; the temperature was maintained at 37 ± 0.5 °C during the batch phase and decreased to 30 ± 0.5 °C at the beginning of the feed phase. The dissolved oxygen (O2) level was stabilized at > 30%. The composition of the media and the pre-culture were described elsewhere [26,36]. All media components were added in relation to the grams of calculated cell dry mass (CDM) to be produced; for this calculation, the yield coefficient YX/S of 0.3 g/g specific for BL21 (DE3) was used. Feeding was initiated when the culture, grown to 4 g/L CDM in 0.5 L batch medium, entered the stationary phase. The carbon-limited fed-batch regime was divided into three separate phases. An exponential substrate feed providing a constant growth rate of 0.13 h⁻¹ was selected for the first 15 h, followed by two linear feed profiles at 0.4 g medium/min and 0.245 g medium/min for 4 and 15 h, respectively, resulting in a final CDM of about 30 g/L in 1.2 L. NP production was induced at feed hour 19 by the addition of 0.1 mM IPTG and 100 mM arabinose. For off-line analysis (OD600, CDM, product), samples were withdrawn from the bioreactor prior to induction and at two-hourly intervals after induction. To describe cell growth, OD600 and CDM were determined according to Cserjan-Puschmann et al. [37]. For determination of NP titers by SDS-PAGE, purified NP at concentrations of 75 μg/mL, 50 μg/mL and 25 μg/mL was used as standards to generate calibration curves via linear regression.
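The exponential feed phase described above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' feed controller: it assumes ideal carbon-limited growth at the stated constant specific growth rate (0.13 per hour), starts from the 4 g/L CDM in 0.5 L batch given in the text, uses the stated yield coefficient of 0.3 g/g, and neglects maintenance demand. The function names are hypothetical.

```python
import math


def biomass_after_exponential_feed(x0_g: float, mu_per_h: float, t_h: float) -> float:
    """Total cell dry mass (g) after t_h hours at a constant specific growth rate."""
    return x0_g * math.exp(mu_per_h * t_h)


def substrate_feed_rate_g_per_h(x_g: float, mu_per_h: float, yield_xs: float) -> float:
    """Substrate mass flow (g/h) that sustains growth rate mu at biomass x_g.

    Maintenance demand is neglected (assumption of this sketch).
    """
    return mu_per_h * x_g / yield_xs


# Values from the text: 4 g/L CDM in 0.5 L batch, mu = 0.13 1/h, Y_X/S = 0.3 g/g
x0 = 4.0 * 0.5  # g CDM at the start of feeding
x15 = biomass_after_exponential_feed(x0, 0.13, 15.0)
feed_start = substrate_feed_rate_g_per_h(x0, 0.13, 0.3)

print(f"CDM after 15 h of exponential feed: {x15:.1f} g")
print(f"Substrate demand at feed start: {feed_start:.2f} g/h")
```

The two subsequent linear feed phases are not modeled here, because growth under a linear feed no longer follows a constant specific growth rate.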

Purification of SARS-CoV-2 nucleocapsid protein
The cells were harvested by centrifugation (Beckman Avanti JXN-26 with JLA-10.500 rotor, Krefeld, Germany). Following the harvest of the cells, 20 g of cell wet mass was solubilized in 250 mL of lysis buffer (50 mM NaPO4, 450 mM NaCl, pH 7.4). Cell lysis was performed using high-pressure homogenization (Panda PLUS 2000, Gea, Düsseldorf, Germany) for two cycles with a first and second stage pressure of 1000 and 100 bar, respectively. The lysed cells were clarified by centrifugation and filtration (0.22 μm). The centrifugation parameters differed between DSP strategies. Moreover, starting from strategy DSP #2, a nuclease treatment was included. Denarase was acquired from c-LEcta GmbH (Leipzig, Germany) and Salt Active Nuclease High Quality was acquired from ArcticZymes Technologies ASA (Tromsø, Norway). Centrifugation settings and nuclease digest parameters can be found in Table S1.
All chromatographic purification steps were performed on an ÄKTA Pure 25 (Cytiva, Austria). The outlet was monitored at 254, 280 and 320 nm. The compositions of the mobile phases can be found in Table S2. All columns were packed in Tricorn columns with 10 mm internal diameter, with varying length depending on the column volume (CV), which can be found in Table S2. The stationary phase resins were acquired from Cytiva, Bio-Works Technologies (Uppsala, Sweden) and Tosoh Corporation (Griesheim, Germany). For the capture step, the clarified cell lysis supernatant was loaded on an equilibrated WorkBeads 40 Ni NTA column. The residence time for the whole capture chromatography run was kept constant at 2 min. After loading, the column was washed with equilibration buffer for 10 CV to remove weakly bound impurities and CASPON-NP was eluted using a linear gradient to elution buffer in 10 CV. The subsequent unit operations differed for the five DSP strategies.
For DSP #1 the elution fraction of IMAC capture was buffer exchanged to remove imidazole using 15 mL Amicon 10 kDa ultrafiltration/diafiltration (UF/DF) units. This was done to ensure binding of impurities in the subtractive immobilized metal affinity chromatography (sIMAC) intermediate purification step. A protease digestion was used to remove the affinity fusion-tag and obtain native NP. CASPON-NP with a concentration of 1 mg/mL was incubated with 0.035 mg/mL T7AC-6H-cpCasp2 variant (100:1 M/M) for 2 h at room temperature. The protein solution was subsequently loaded onto the equilibrated sIMAC column at a residence time of 2.5 min. Native NP was collected in the flow-through fraction, while remaining CASPON-NP, enzyme, previously co-purified host cell proteins and free tag were bound to the column and removed. The sIMAC flow-through fraction was directly loaded to a subtractive anion exchange chromatography (sAEX) column at a residence time of 2 min. NP was collected from the flow-through fraction.
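Because the methods specify residence times rather than flow rates, the pump setting for each column follows from the column volume. The sketch below shows that conversion for a 10 mm i.d. Tricorn column. The 5 mL column volume is a hypothetical example (the actual volumes are listed in Table S2, which is not reproduced here), and the function names are illustrative.

```python
import math


def flow_rate_ml_per_min(column_volume_ml: float, residence_time_min: float) -> float:
    """Volumetric flow rate that gives the requested residence time."""
    return column_volume_ml / residence_time_min


def bed_height_cm(column_volume_ml: float, inner_diameter_mm: float = 10.0) -> float:
    """Bed height of a cylindrical packed bed with the given volume and diameter."""
    radius_cm = inner_diameter_mm / 10.0 / 2.0
    return column_volume_ml / (math.pi * radius_cm ** 2)


# Hypothetical 5 mL capture column operated at the 2 min residence time from the text
cv_ml = 5.0
flow = flow_rate_ml_per_min(cv_ml, 2.0)
height = bed_height_cm(cv_ml)

print(f"Flow rate: {flow:.2f} mL/min, bed height: {height:.2f} cm")
```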
For DSP #2, the enzymatic tag removal conditions were changed to 5 mg/mL CASPON-NP with 0.07 mg/mL T7AC-6H-cpCasp2 variant (50:1 M/M) incubated overnight at 4 °C. The sIMAC step was performed as in DSP #1 and the sAEX polishing step was omitted.
DSP #3 was performed the same as DSP #2, but with an additional cation exchange chromatography (CEX) step performed after sIMAC. The sIMAC flow-through fraction was loaded to an equilibrated SP Sepharose FF column at a residence time of 2 min and eluted at a residence time of 5 min using a 5 CV linear gradient.
DSP #4 added an additional HIC polishing step to DSP #3. The CEX eluate was conditioned by adding ammonium sulfate to a final concentration of 900 mM. The conditioned CEX eluate was loaded to an equilibrated Butyl Toyopearl 650-M column at a residence time of 3 min. NP was eluted using a 10 CV linear gradient at a residence time of 3 min.
For DSP #5, the whole process was streamlined and unit operations were performed in a different order. A HIC step was used for removing imidazole after the capture step, instead of UF/DF. For this, the IMAC eluate was conditioned by adding ammonium sulfate to a final concentration of 720 mM while keeping the CASPON-NP concentration under 1.5 mg/mL to avoid precipitation. The conditioned eluate was loaded to an equilibrated Butyl Sepharose HP column at a residence time of 2 min. The residence time was increased to 5 min during elution, for which a linear gradient from 0 to 60% B in 9 CV followed by 60-100% B in 2 CV was used. The HIC eluate with a CASPON-NP concentration of 5-7 mg/mL was incubated with T7AC-6H-cpCasp2 variant at a 50:1 molar ratio for 3 h at room temperature. The digest was conditioned to 10 mM imidazole and 150 mM NaCl for the sIMAC polishing step, which was performed on a smaller column at a residence time of 2.5 min.
The final product fractions of all DSP strategies were buffer exchanged into PBS using UF/DF. For DSP #5 this UF/DF formulation step was performed using tangential flow filtration on an ÄKTA Flux using a Pellicon 3 Ultracel 10 kDa membrane (Merck Millipore). For calculation of DSP yield, a cell lysis and clarification yield of 28% was estimated.
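The conditioning step before HIC involves two coupled constraints: reaching 720 mM ammonium sulfate and keeping CASPON-NP below 1.5 mg/mL. A minimal sketch of the underlying dilution arithmetic follows. It assumes that a concentrated ammonium sulfate stock is added to a salt-free eluate and that volumes are additive, which is only an approximation for concentrated salt solutions. The 3 M stock, the 100 mL eluate volume and the 1.9 mg/mL starting protein concentration are hypothetical values chosen for illustration.

```python
def stock_volume_to_add_ml(sample_ml: float, target_molar: float, stock_molar: float) -> float:
    """Stock volume needed to reach target_molar, assuming additive volumes."""
    if not 0.0 < target_molar < stock_molar:
        raise ValueError("target must be positive and below the stock concentration")
    return sample_ml * target_molar / (stock_molar - target_molar)


def diluted_concentration(conc: float, sample_ml: float, added_ml: float) -> float:
    """Protein concentration after adding added_ml of protein-free stock."""
    return conc * sample_ml / (sample_ml + added_ml)


# Hypothetical example: 100 mL eluate at 1.9 mg/mL, 3 M ammonium sulfate stock
added = stock_volume_to_add_ml(100.0, 0.72, 3.0)
protein_after = diluted_concentration(1.9, 100.0, added)

print(f"Add {added:.1f} mL stock; protein after conditioning: {protein_after:.2f} mg/mL")
print("Below 1.5 mg/mL limit:", protein_after < 1.5)
```

If the conditioned protein concentration would exceed the limit, the eluate would need to be diluted with buffer before the salt is added.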

Quantification of dsDNA, endotoxin, and host cell protein content
The analytical assays for host cell protein (HCP) determination via ELISA, dsDNA quantification via PicoGreen assay and endotoxin quantification via recombinant Factor C assay were performed as previously described by Sauer et al. [38].

Gel electrophoresis
NP samples were qualitatively analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) using NuPAGE 4-12% Bis-Tris Precast Gels (Thermo Fisher Scientific). NuPAGE MES running buffer (Thermo Fisher Scientific) was used to prepare the gel and 15 μL of sample was loaded in the appropriate wells. The precast gel was run at 200 V for 45 min. SeeBlue Plus 2 (Thermo Fisher Scientific) was used as a protein ladder.

Analytical hydrophobic interaction chromatography and multi-angle light scattering detection
All the HIC-HPLC measurements were performed using an Agilent 1260 Infinity II series instrument equipped with a Quat Pump, Multisampler, and VWD detector (set at 280 nm). Multi-angle light scattering data was acquired from a DAWN 8 MALS detector (Wyatt Technology, Santa Barbara, CA, USA). OpenLAB CDS ChemStation edition software (Agilent Technologies, Santa Clara, CA, USA) and ASTRA 7 software were used for the data interpretation. The NP samples were analyzed by using a dn/dc value of 0.185 mL/g and a UV extinction coefficient of 0.962 mL/mg⋅cm as input for the MW calculation.
A MAbPac HIC-10, 4.6 mm i.d. × 250 mm column length (5 μm particles, 1000 Å pore diameter, 4.15 mL CV) was acquired from Thermo Fisher Scientific to perform analytical HIC-HPLC measurements of the NP samples. The auto-sampler was set at 4 °C, 30 μL sample was injected, and the mobile-phase flow rate was set at 1 mL/min. A 10 min linear gradient was applied from 100% 1.2 M ammonium sulfate containing 0.1 M phosphate buffer (pH 7) to 0.1 M phosphate buffer (pH 7). Next, a 4 min washing step at 100% 0.1 M phosphate buffer was applied to ensure that all bound impurities were removed, followed by 10 min column equilibration at the gradient starting mobile-phase conditions.
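The gradient program and the stated column volume can be written down compactly. The sketch below reproduces the programmed ammonium sulfate concentration as a function of time (10 min linear gradient, 4 min wash, 10 min re-equilibration) and checks the 4.15 mL column volume from the column dimensions. It describes the pump program only: system dwell volume is ignored and the step back to starting conditions at 14 min is assumed to be instantaneous. The function names are illustrative.

```python
import math


def ammonium_sulfate_molar(t_min: float, start_molar: float = 1.2,
                           gradient_min: float = 10.0, wash_min: float = 4.0,
                           equil_min: float = 10.0) -> float:
    """Programmed ammonium sulfate concentration (M) at time t_min of the run."""
    total = gradient_min + wash_min + equil_min
    if t_min < 0.0 or t_min > total:
        raise ValueError("time outside the programmed run")
    if t_min <= gradient_min:
        # Linear decrease from the starting concentration to salt-free buffer
        return start_molar * (1.0 - t_min / gradient_min)
    if t_min < gradient_min + wash_min:
        # Wash at 100% phosphate buffer without ammonium sulfate
        return 0.0
    # Re-equilibration at gradient starting conditions
    return start_molar


def empty_column_volume_ml(inner_diameter_mm: float, length_mm: float) -> float:
    """Geometric volume of a cylindrical column in mL."""
    radius_cm = inner_diameter_mm / 20.0
    return math.pi * radius_cm ** 2 * (length_mm / 10.0)


cv = empty_column_volume_ml(4.6, 250.0)
print(f"Column volume: {cv:.2f} mL")
for t in (0, 5, 10, 12, 14, 24):
    print(f"t = {t:>2} min: {ammonium_sulfate_molar(t):.2f} M")
```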

Analytical size exclusion chromatography and multi-angle light scattering-refractive index detection
A Dionex UltiMate 3000 RSLC system (Thermo Fisher Scientific, Germering, Germany) was used to perform the SEC-HPLC experiments. The instrument consisted of a membrane degasser, a Dionex Ultimate LPG-3400SD pump module, a WPS-3000 TSL analytical split-loop well plate autosampler with a 100 μL sample loop installed, and a DAD-3000 diode array detector equipped with a 10 μL analytical flow cell. UV detection was carried out at λ = 280, 260 and 254 nm and the data collection rate was set at 50 Hz with a response time of 0.1 s. Viper MP35N fittings (Thermo Fisher Scientific, Germering, Germany) were used to make the fluidic connections. MALS data was acquired using a DAWN HELEOS 18-angle detector (Wyatt Technology, Santa Barbara, CA, USA), which was coupled to an Optilab refractive index (RI) detector (Wyatt Technology, Santa Barbara, CA, USA). Chromeleon 7.2 Chromatography Data System (Thermo Fisher Scientific, Germering, Germany) and ASTRA V software were used for data collection. The NP samples were analyzed by using a dn/dc value of 0.185 mL/g and a UV extinction coefficient of 0.962 mL/mg⋅cm as input for the MW calculation.
A Superdex 200 Increase 10/300 GL, 10 mm i.d. × 300 mm column length (GE Healthcare, Uppsala, Sweden) column was used for the SEC experiments. The auto-sampler was set at 4 °C, 90 μL of sample was injected, and the mobile-phase flow rate was set at 0.5 mL/min. A 60 min isocratic analysis was performed using 0.1 M phosphate buffer (pH 7) containing 0.3 M sodium chloride as mobile phase.

Peptide mapping using reversed-phase-mass spectrometry (RPLC-MS)
Purified NP sample was digested in solution. The proteins were S-alkylated with iodoacetamide and digested with LysC/GluC (Roche/Promega) or Chymotrypsin (Roche). The digested samples were analyzed by RPLC-MS using an UltiMate 3000 system (Thermo Fisher Scientific, Germering, Germany) hyphenated to a QTOF MS (Bruker maXis 4G, Bruker) equipped with the standard ESI source in positive ion, DDA mode. MS-scans were recorded (range: 150-2200 Da) and the six highest peaks were selected for fragmentation. Instrument calibration was performed using ESI calibration mixture (Agilent).
A BioBasic C18 column, 0.32 mm i.d. × 150 mm (5 μm particles) acquired from Thermo Scientific was used for RPLC-MS analysis. 80 mM ammonium formate buffer was used as the aqueous solvent and 80:20 (v/v) acetonitrile:water as B solvent. 2 μg of NP tryptic digest was injected. A linear gradient was applied from 5% B to 40% B in 30 min, followed by a 6 min gradient from 40% B to 95% B to facilitate the elution of large peptides, at a flow rate of 6 μL/min. The analysis files were converted (using Data Analysis, Bruker) to mgf files, which are suitable for performing an MS/MS ion search with MASCOT. The files were searched against a database containing the target sequences, HCPs and contaminants.

Expression of SARS-CoV-2 NP and optimization of the downstream processing protocol
An overview of the different steps for developing a DSP strategy based on the CASPON process with stable quality attributes for diagnostic applications can be found in Fig. 1A. Different host strain and affinity fusion tag combinations with and without signal peptides for periplasmic or cytoplasmic expression were designed. The N-terminally fused CASPON tag, which contains a 6H-tag, can be removed with a specific protease to generate an authentic N-terminus [29,30], whereas the C-terminally fused 6H-tag is not removable. Evaluation at the level of shake flask expression identified enGenes-X-press V2 [26,35] (pET30acer-CASPON-NP) as the most promising candidate for high-level soluble NP expression (Figure S1). This candidate was therefore further scaled up to 1 L fed-batch cultivations, using the enGenes-X-press process that was optimized by Stargardt et al. [26]. The process was run in quadruplicate and showed reproducibly high NP titers (3.7 ± 0.3 g/L; n = 4) and reproducible biomass concentration (33.7 ± 0.9 g/L; n = 4) (Fig. 1B).
An overview of the applied chromatography purification steps can be found in Fig. 1C. Screening of the process binding and elution conditions, as well as comparison of different chromatographic media, was performed by SDS-PAGE (Fig. 2). Based on the primary sequence, the NP product has a theoretical molecular weight of 49.9 kDa before and 45.6 kDa after removal of the CASPON tag. Full-length NP was not detected when using strategy DSP #1, as the final DSP product fraction (black rectangle in Fig. 2A) consisted mainly of an NP fragment with a mass of ~28 kDa, which was confirmed by LC-MS/MS after in-gel digestion (data not shown). The process yield of DSP #1 was very low (Table 1), with most of the losses already occurring during capture (Fig. 2A, IMAC FT and Wash fractions), where most of the NP did not bind. The calculated dynamic binding capacities, even when altering the residence time, were not consistent with a 49.9 kDa protein. It was hypothesized that these losses were due to diffusional hindrance, where NP bound to large nucleotides is prevented from entering the pores of the stationary phase.
To reduce the size of the nucleotides available for NP binding, the capture load was treated with a nuclease in DSP #2. This reduced the losses during capture chromatography (Fig. 2B, IMAC FT fractions), confirming the diffusional hindrance hypothesis. The final product also had a higher ratio of full-length NP to fragment (Fig. 2B, sIMAC FT fraction). To further improve the full-length content, several polishing steps were tested. DSP #3 included a CEX polishing step to remove low pI HCPs. This removed proteins >50 kDa (compare Fig. 2B sIMAC FT and Fig. 2C, Final fraction) but did not remove the fragments with a molecular mass between 20 and 40 kDa. An additional HIC step was used in DSP #4, which resulted in a highly pure final product (Fig. 2D, Final fraction), but lowered the process yield (Table 1).
In order to reduce the number of unit operations and increase process yield, HIC purification was used as the intermediate purification step and sIMAC as polishing step in DSP #5 (Fig. 2E). The ion exchange steps were omitted, since the nucleotides removed in these steps were of no concern for the intended use of NP as an antigen. This resulted in the purest NP product, providing a process yield of 77% after cell lysis. Since a large share of the DSP losses of NP can be attributed to fragmentation and aggregation, it is possible that the specific order of unit operations in DSP #5 has helped to mitigate this issue. Specifically, by very quickly removing aggregates and fragments in the HIC step, nucleation points for further deterioration could be eliminated. The purity of NP regarding dsDNA, HCP and endotoxin was quantified for DSP #5 at different process steps (Table 2). The protein of interest already has a very low HCP concentration after the capture step, which was further reduced during the subsequent processing steps (Table 2). In all five DSP strategies, the CASPON enzymatic tag removal step could be performed in ~3 h with yields ranging between 70% and 90%.

Quality assessment of the purified NP product using HIC-MALS
The process purity could not be assessed by the SDS-PAGE analysis; hence, a native analytical HIC-HPLC screening method was established to assess the purity of the different DSP approaches. The salt concentration at the start of the gradient was optimized to separate the impurities eluting between 5 and 10 min (Fig. 3A DSP #1) from the main NP product peak (eluting just before 12 min). The starting concentration of ammonium sulfate was lowered from 2 M to 1.2 M to reduce the total run time. This also improves the resolution of the late-eluting hydrophobic compounds (i.e., the main NP peak), which is a known effect in HIC [39]. The pump was set at 1 mL/min and a 10 min gradient was selected as the best-compromise condition between sample throughput and adequate resolution for the purified NP product resulting from the different DSP strategies. The purity of NP product (defined by the main peak) based on the HIC method is reported in Table 1. The recovery of the method was verified by comparing the peak area (λ = 280 nm UV signal) of an injected NP sample in column bypass with an experiment where the sample is injected on the column using elution buffer. A recovery of 97% for 5 μL injected sample and 98% when injecting 50 μL was observed.
The unbound flow-through fraction (eluting at 3 min) is mainly attributed to oligo- and polynucleotides, based on the high A254/A280 ratio at the peak maximum of 1.5 for DSP #5. The main NP peak, in contrast, has an A254/A280 ratio of 0.4, indicative of proteins. The lower-molecular-mass species around 28 kDa, which appear in the fractions marked by the black box in the SDS-PAGE (Fig. 2A-C), are also observable in the chromatograms for DSP #1, DSP #2, and DSP #3. A very pure NP product peak (purity >70%) is observed in all DSP protocols except for DSP #1. The same product quality could be obtained when comparing DSP #4 with DSP #5, even though the latter is simpler. The NP peak features a small shoulder of co-eluting species that comprises about 20% of the peak area. When extending the gradient time to 60 min, the co-eluting analytes could be better separated from the main target peak (Fig. 3B). The fact that this high-resolution gradient separation could still not fully resolve these closely eluting species indicates that the analytes exhibit very similar binding characteristics and hence also similar hydrophobicity. The shoulder seen in Fig. 3B could consist of a hetero-dimer of full-length NP with a fragment of NP. Putative fragments are likely to originate from fragmentation processes in the serine/arginine-rich region between the N-terminal and C-terminal domains (NTD and CTD, respectively) [10]. The NTD of NP has an aliphatic index of 45.8-50.2, depending on the length of the fragment (185-203 amino acids), whereas the CTD of NP has an aliphatic index between 54.4 and 58.9 (216-234 amino acids). This is very similar to the aliphatic index of full-length NP (52.53), which can explain the similar HIC retention behavior.
In order to identify the main peak, the HIC method was coupled to an on-line multi-angle light scattering (MALS) detector. Bovine Serum Albumin (BSA) was used as a test protein to optimize the HIC-MALS set-up and to normalize the MALS detectors for accurate molecular mass calculation (Figure S2). NP product from DSP #4 and DSP #5 was analyzed using HIC-MALS, and the main peak has a weight-average molar mass of 100.0 and 99.2 kDa, respectively (Fig. 4). This value is consistent with the molar mass of two monomeric NP entities and therefore we assume that mainly dimerized NP is present in the final purified sample solution.
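The purity and recovery figures described above are simple ratios of integrated UV peak areas. A minimal sketch of both calculations follows. The peak areas are hypothetical placeholders, not measured values, and which peaks are included in the total area is an assumption of this example rather than a detail given in the text.

```python
def main_peak_purity_percent(peak_areas: dict[str, float], main_peak: str) -> float:
    """Area percent of the main peak relative to all integrated peaks."""
    total = sum(peak_areas.values())
    if total <= 0.0:
        raise ValueError("total peak area must be positive")
    return 100.0 * peak_areas[main_peak] / total


def recovery_percent(area_on_column: float, area_bypass: float) -> float:
    """Recovery as the on-column peak area relative to the column-bypass area."""
    return 100.0 * area_on_column / area_bypass


# Hypothetical integrated areas at 280 nm (arbitrary units)
areas = {"flow-through": 20.0, "early impurities": 30.0, "main NP peak": 950.0}
purity = main_peak_purity_percent(areas, "main NP peak")
recovery = recovery_percent(area_on_column=970.0, area_bypass=1000.0)

print(f"Main peak purity: {purity:.1f}%")
print(f"Recovery: {recovery:.1f}%")
```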
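For readers who wish to reproduce aliphatic index values such as those quoted above, the sketch below implements the commonly used definition by Ikai, which is also the one reported by ExPASy ProtParam: the mole percent of alanine plus 2.9 times that of valine plus 3.9 times the sum of isoleucine and leucine. The text does not state which tool was used, so treating this as the definition behind the reported values is an assumption. The input sequence is a toy string, not an NP domain.

```python
from collections import Counter


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index (Ikai definition) from a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * counts[residue] / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy sequence: 20% Ala, 10% Val, 10% Ile, 10% Leu, 50% Gly
toy = "AAVILGGGGG"
print(f"Aliphatic index of toy sequence: {aliphatic_index(toy):.1f}")
```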
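The dimer assignment rests on a simple ratio between the measured weight-average molar mass and the theoretical monomer mass. The short sketch below makes that arithmetic explicit, using the 45.6 kDa mass of tag-free NP and the two HIC-MALS values from the text. Rounding the ratio to the nearest integer is a simplification of this example and ignores measurement uncertainty and any bound nucleotides.

```python
def oligomeric_ratio(measured_kda: float, monomer_kda: float) -> float:
    """Ratio of the measured molar mass to the theoretical monomer mass."""
    return measured_kda / monomer_kda


MONOMER_KDA = 45.6  # theoretical mass of NP after CASPON tag removal (from the text)
measured = {"DSP #4": 100.0, "DSP #5": 99.2}  # HIC-MALS main peak values (from the text)

ratios = {name: oligomeric_ratio(mass, MONOMER_KDA) for name, mass in measured.items()}
for name, ratio in ratios.items():
    print(f"{name}: ratio = {ratio:.2f}, nearest integer state = {round(ratio)}")
```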

Native SEC-MALS-RI of the NP product
In order to verify the newly established HIC-MALS results, we additionally performed native SEC-MALS experiments. The MALS detector was calibrated in toluene for absolute mass measurements and the model protein BSA was used for normalization of the different detectors (angles) inside the MALS detector. The optimized SEC-MALS-RI method was then used to analyze NP from DSP #5 (Fig. 5). The main peak has a molecular mass of 99.9 kDa when using RI as a concentration source. Some lower molecular mass impurities eluting at 30 min could also be detected, which are likely to be the same as the non-binding fraction in HIC and are associated with nucleotides present in the sample. These are also faintly visible in the final product fraction in SDS-PAGE (Fig. 2E, around 28 kDa). The SEC-MALS data supports the MW analysis from the HIC-MALS method, showing an excellent degree of overlap between the results obtained from both methods. The SEC elution profile is comparable with HIC; however, SEC results in a poorer resolution between the main peak and the co-eluted fraction. Although the mobile-phase flow rate and column selection were optimized to achieve the best resolution, this drawback inherently makes high-resolution screening with SEC-MALS very challenging.

RPLC-MS peptide mapping of NP
A tryptic digest of the purified NP sample from DSP #5 was analyzed using RPLC-MS, applying a data-independent database search. The LC-MS total ion chromatogram (TIC) obtained from the mass spectrometer with associated peak assignment of the peptides can be seen in Figure S3A. The MS/MS data was searched against a protein sequence database by MASCOT including the host (E. coli) and the target sequence. The sequence coverage map (Figure S3B) shows the identified peptides (Table S3) based on color coding. The sequences which are highlighted in red are identified by MS/MS. The grey lines represent the peptides using greyscale shades to indicate the intensities of the precursor ions. The matched b- and y-ions are shown as red squares. N-glycosylation sites are highlighted in yellow. Based on the coverage of the C- and N-terminus, it can be concluded that the whole protein had been expressed. NP product from DSP #5 features nucleocapsid protein from human SARS-CoV-2 with high sequence coverage, supporting previous results obtained from native chromatography. MS/MS analysis also revealed the minor presence of chaperone protein DnaK from E. coli, with a molecular weight of 69.1 kDa. The final product fraction in Fig. 2E depicts a faint band with a MW > 62 kDa, which is likely to be associated with DnaK.
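Sequence coverage and terminal coverage, as used in the argument above, can be computed directly from the list of identified peptides. The following is a minimal sketch of that bookkeeping using exact string matching. It is not the MASCOT scoring procedure, and the protein and peptide strings are toy examples rather than the identified peptides from Table S3.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> tuple[float, bool, bool]:
    """Return (percent coverage, N-terminus covered, C-terminus covered)."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including repeated matches
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    percent = 100.0 * sum(covered) / len(protein)
    return percent, covered[0], covered[-1]


# Toy example: 20-residue sequence with two identified peptides
toy_protein = "MSDNGPQNQRNAPRITFGGP"
toy_peptides = ["MSDNGPQNQR", "ITFGGP"]
coverage, n_term, c_term = sequence_coverage(toy_protein, toy_peptides)

print(f"Coverage: {coverage:.1f}%, N-terminus: {n_term}, C-terminus: {c_term}")
```

Coverage of both termini supports expression of the full-length chain, but it does not by itself exclude the co-presence of fragments, which is why the native chromatographic methods remain necessary.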

Conclusions
HIC-MALS was developed for in-process analysis of a protein with intrinsic tendency for multimerization and fragmentation, and was applied to evaluate product quality, which allowed the optimization of the downstream processing workflow. The HIC-MALS method is a powerful analytical tool that was used for evaluation of the end product resulting from a biotechnological production platform that features growth-decoupled expression in the E. coli enGenes-X-press strain, and the CASPON platform process [29,30]. Using this technology, it was shown that nucleocapsid protein from SARS-CoV-2 with high process yield (730 mg of purified NP per liter fermentation) and purity (95%) could be reliably produced after optimization of the downstream processing. Complementary characterization tools, including bioanalytical assays, SDS-PAGE, native SEC-MALS, and peptide mapping RPLC-MS, were used to verify the results from the newly established HIC-MALS method. The NP product mainly contains dimerized protein (100 kDa) and has a co-eluting fraction which is likely associated with NP fragments that were also detected in SDS-PAGE.
Fig. 3. (A) HIC analysis of the final product from the five different DSP strategies, revealing the differences in purity resulting from the purification process. The chromatograms are normalized to the highest peak in the run. The red dotted line shows the applied gradient profile, and the ammonium sulfate concentration is shown on the secondary vertical axis to the right. (B) High-resolution HIC separation of DSP #5 (recorded at 280 nm) using a 60 min gradient time, revealing that species with a similar hydrophobicity are co-eluting with the main peak associated with NP.
In contrast with SEC, which has limited resolution, the developed HIC method has the advantage that the resolving power can be improved by tuning the mobile-phase composition and gradient time. Combined with the molecular-mass information from MALS detection, HIC-MALS has proven to be a powerful tool to guide process development of proteins of recombinant origin under native conditions. Extended information can be obtained by online coupling of a refractive index detector with MALS, provided that proper calibration and baseline subtraction are performed. With this native analytical method, the availability of high-quality antigens for further research and diagnostic purposes can be accelerated, especially in highly demanding times, such as the current COVID-19 pandemic.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.