Development of a generic reversed-phase liquid chromatography method for protein quantification using analytical quality-by-design principles

Biopharmaceutical drug substances are generally produced using fermentation technology and are subsequently purified in the downstream process. For the determination of critical quality attributes (CQAs), such as target protein titer and purity, monitoring tools are required before quality control analysis. We herein present a novel reversed phase liquid chromatography (RPLC) method, which enables facile and robust protein quantification during upstream and downstream processing of proteins produced intracellularly in E. coli. The overall goal was to develop a fast, robust and mass spectrometry compatible method which can baseline resolve and quantify each protein of interest. Method development consisted of three steps, oriented on an Analytical Quality by Design (AQbD) workflow: (i) the stationary phase as primary parameter was chosen based on state-of-the-art technology, thus minimizing on-column protein adsorption and providing high efficiency, (ii) secondary parameters (i.e. gradient conditions and column temperature) were optimized by chromatographic modeling, and (iii) the established Method Operable Design Region (MODR) was challenged and confirmed during robustness testing, performed in-silico and experimentally by a Design of Experiment (DoE) based approach. Finally, we validated the RPLC method for pivotal validation parameters (i.e. linearity, limit of quantification, and repeatability) and compared it for protein quantification against a well-established analytical methodology. The outcome of this study is (i) a protocol for RPLC development using an AQbD principle for new method generation and (ii) a highly versatile RPLC method, suited for quick and straightforward recombinant protein titer measurement and applicable to the detection of a broad range of proteins. © 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Abbreviations: CQAs, critical quality attributes; RPLC, reversed phase liquid chromatography; AQbD, Analytical Quality by Design; MODR, Method Operable Design Region; DoE, Design of experiment; KPI, key performance indicator; E. coli, Escherichia coli; IB, inclusion body; SDS-PAGE, sodium-dodecyl-sulfate polyacrylamide gel electrophoresis; LC, liquid chromatography; CE, capillary electrophoresis; MS, mass spectrometry; HIC, hydrophobic interaction chromatography; SEC, size exclusion chromatography; IEX, ion-exchange chromatography; mAbs, monoclonal antibodies; ADCs, antibody-drug conjugates; TFA, trifluoroacetic acid; SPP, superficially porous particles; N-pro, N-pro fusion protein; GFP, green fluorescent protein; CH3H, Chalcone 3-Hydroxylase; HRP, Horseradish peroxidase; BSA, Bovine Serum Albumin; MW, molecular weight; GRAVY, grand average of hydropathicity; FLD, fluorescence detector; UV, ultraviolet; EP, European Pharmacopeia; tG, gradient time; ATP, analytical target profile; R², measure of fit; R² adj., adjusted measure of fit; Q², predictability; LOQ, limit of quantification; RSD, relative standard deviation.
∗ Corresponding author. E-mail address: julian.kopp@tuwien.ac.at (J. Kopp).
https://doi.org/10.1016/j.jpba.2020.113412
0731-7085/© 2020 The Authors. Published by Elsevier B.V.


Introduction
A panoply of commercially available biopharmaceuticals are manufactured as recombinant proteins using microbial or mammalian hosts [1]. To monitor the production of biopharmaceutically active compounds, product titer is a pivotal key performance indicator (KPI) [2]. Since it is necessary to react promptly to process changes, facile sample preparation and subsequent rapid analytics are of high interest [3]. The gram-negative bacterium Escherichia coli (E. coli) is still today's expression host of choice for production of approximately 30 % of recombinantly produced pharmaceuticals in industry [1]. Quantification of target proteins is challenging in E. coli, as recombinant proteins are generally expressed in the cell interior (either in soluble form or as inclusion bodies, IBs). The necessary cell disruption releases a high amount of host cell-associated impurities and thus leads to a complex sample matrix for analysis.
Today, a plethora of analytical technologies exists for the analysis of (recombinant) proteins. Favored techniques are sodium-dodecyl-sulfate polyacrylamide gel electrophoresis (SDS-PAGE), which can be further used in western blotting to detect low protein concentrations, and capillary electrophoresis [4]. Samples derived from microbial hosts often suffer from impurities, favoring target protein detection via size using SDS-PAGE analysis [5]. Different stains, such as silver and Coomassie blue, can be employed for protein staining of gels. Silver staining is more sensitive; however, Coomassie blue SDS-PAGE is used more frequently as it is easier to apply, cheaper and much faster to process than silver staining [5]. Still, compared to liquid chromatography (LC), SDS-PAGE analysis is more time-consuming and lacks accurate quantitation [6]. Capillary electrophoresis (CE) is another tool for the qualitative and quantitative analysis of biopharmaceuticals, and proteins in general [7]. CE is considered a powerful separation technique in terms of selectivity and efficiency, and can further be coupled with mass spectrometry (MS) for protein identification and characterization [8,9]. However, it has drawbacks such as limited instrument ruggedness and non-straightforward method development.
In the last decade, LC has emerged as the technique of choice for characterization and quantitation of (recombinant) proteins [10-13], due to facile sample preparation, technically mature instrumentation and major advancements in stationary phase chemistry. Hydrophobic interaction chromatography (HIC) [14], size exclusion chromatography (SEC) [15] and ion-exchange chromatography (IEX) [11] enable analysis of proteins under native (non-denaturing) conditions, but are limited in their applicability. SEC is mainly carried out for quantitation of protein aggregates and IEX for separation of protein charge variants. HIC (on analytical scale) has its main application in separation and characterization of large biomolecules, such as monoclonal antibodies (mAbs) and antibody-drug conjugates (ADCs) [16]. However, HIC exhibits drawbacks such as slow mass transfer, resulting in broad peaks, and the necessity of high mobile phase salt concentrations, which impedes hyphenation with MS.
Reversed phase liquid chromatography (RPLC) has become the predominant technique for analysis of proteins due to its versatility, robustness, high efficiency and MS compatibility [17,18]. In contrast to the aforementioned chromatographic modes, RPLC is a denaturing technique which often requires harsh conditions (e.g. high temperature and low pH) in order to eliminate irreversible on-column protein adsorption and improve peak shape. However, the advent of novel stationary phases, including wide-pore superficially porous particles (SPP) and special stationary phase coatings for suppression of silanol interactions, enabled higher peak efficiencies and application of milder elution conditions [19,20]. Recently, a wide-pore SPP column with a high coverage phenyl bonding was introduced to the market, which addresses the last obstacles in RPLC protein analysis (i.e. on-column adsorption of large biomolecules, poor selectivity for closely related proteins) [21].
In this study, we established a widely applicable, robust, and MS-compatible RPLC method for on-line process analytics and quality control of recombinant proteins with pronounced differences in their physicochemical properties. A state-of-the-art column was used for systematic method development based on Analytical Quality by Design (AQbD) principles. The requirements for the method were defined in the analytical target profile (ATP), followed by investigation of critical method parameters and their effects on the monitored responses using a Design of Experiment (DoE) approach [22]. Following the AQbD concept, the method operable design region (MODR) was assessed, including retention modeling [23]. Robustness testing was performed in-silico and experimentally to verify the set MODR, again applying a DoE concept. Finally, the method performance was compared for protein quantification in a case study against one of the benchmark technologies (i.e. SDS-PAGE).

Chemicals and samples
Ultra-pure water was obtained from a Milli-Q system from Merck Millipore (Burlington, USA). Acetonitrile (HPLC grade) and trifluoroacetic acid (TFA, >99.9 %) were purchased from Carl Roth (Karlsruhe, Germany). All proteins implemented within this study are described below. Further information regarding cultivation and purification of the proteins can be found in Supplementary section A.

N-pro fusion protein (N-pro)
N-pro fusion protein was produced in E. coli BL21(DE3) with the pET30a plasmid system (kanamycin resistance). The target protein was linked to an N-pro fusion protein used for purification. Details on protein structure and purification steps cannot be given for proprietary reasons.

eGFP (GFP)
GFP was produced in E. coli BL21(DE3) (Life Technologies, Carlsbad, CA, USA), transformed with a pET21a(+) vector (ampicillin resistance) carrying the gene coding for enhanced green fluorescent protein (eGFP). Samples derived from fed-batch fermentation (as described in [24]) were disrupted and centrifuged to obtain the soluble fraction. The protein solution was filtered through pores < 0.2 µm and was further purified by immobilized metal affinity chromatography (IMAC), binding to the expressed His-tag of the protein. Samples were desalted and stored in aqueous buffer composed of 20 mM TRIS, 100 mM NaCl, pH = 7.5 containing 10 % glycerol for RPLC analysis.

Protein physicochemical properties
Physical properties and number of disulfide bonds were calculated using different software tools and are summarized in Table 1. Molecular weight (MW) and grand average of hydropathicity (GRAVY) were calculated using Expasy ProtParam [25], as these features can be accessed free of charge given the amino acid composition of the target protein. Disulfide bonds were predicted using the DiANNA 1.1 webserver by F. Ferre and C. Clote [26].

Equipment and software
Measurements were performed using a Dionex UltiMate 3000 system with a quaternary solvent delivery pump, an auto-sampler, and a UV and a fluorescence (FL) detector (Thermo Fisher, Waltham, MA, USA). The dwell volume was determined as Vd = 0.90 mL. Data were acquired using either the UV detector or the FL detector. Data acquisition and instrument control were carried out with Chromeleon 7.2 software (Thermo Fisher). Calculations and data transfer were performed with Excel (Microsoft). Resolution was calculated according to the European Pharmacopeia (EP). Retention and resolution modeling was performed with DryLab® software (Version 4.3.3, Molnár-Institute, Berlin, Germany). DoE and statistical evaluations were carried out with the software MODDE (MODDE 12.1, Umetrics, Sweden).

Chromatographic conditions
The HPLC was equipped with a BioResolve RP mAb Polyphenyl column (dimensions 100 × 3 mm, particle size 2.7 µm), which is designed for mAb and large protein analyses (Waters Corporation, MA, USA). In order to prolong column lifetime, a pre-column (3.9 × 5 mm, 2.7 µm) was used. The mobile phase was composed of ultrapure water (=MQ, eluent A) and acetonitrile (eluent B), both supplemented with 0.10 % (v/v) trifluoroacetic acid. The injection volume was set to 2.0 µL. The flow rate and the column temperature of the final method were 0.4 mL/min and 70 °C, respectively. Re-equilibration was conducted for six minutes, which corresponds to approximately five column volumes. UV detection was performed at 214 and 280 nm; for FL detection, the excitation wavelength was set to 280 nm and the emission wavelength to 360 nm. All detectors were connected in series, monitoring UV and FLD chromatograms at once, although only FLD chromatograms were used for evaluation. Sample concentration was 1.5 mg/mL for BSA. Via BSA calibration, the sample nominal concentration was determined to be 2.5 mg/mL for N-pro, 3.5 mg/mL for GFP, 0.5 mg/mL for HRP and 2.0 mg/mL for CH3H. Absolute signal intensities in the FLD chromatograms differ, as the content of aromatic amino acids varies strongly between the employed target proteins.
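The statement that six minutes of re-equilibration corresponds to roughly five column volumes can be checked with a short calculation. The sketch below is illustrative only: the column dimensions, flow rate and time are taken from the text, whereas the total porosity of 0.65 is an assumed value for a packed column and is not stated in the source.

```python
import math


def geometric_column_volume_ml(length_mm: float, inner_diameter_mm: float) -> float:
    """Empty-tube volume of a cylindrical column in mL (1 mL = 1000 mm^3)."""
    radius_mm = inner_diameter_mm / 2.0
    return math.pi * radius_mm ** 2 * length_mm / 1000.0


def column_volumes_flushed(time_min: float, flow_ml_min: float,
                           length_mm: float, inner_diameter_mm: float,
                           total_porosity: float) -> float:
    """Number of column (void) volumes delivered during a given time."""
    delivered_ml = time_min * flow_ml_min
    void_volume_ml = geometric_column_volume_ml(length_mm, inner_diameter_mm) * total_porosity
    return delivered_ml / void_volume_ml


# From the text: 100 x 3 mm column, 0.4 mL/min, 6 min re-equilibration.
# ASSUMPTION: total porosity of 0.65 (not given in the source).
n_volumes = column_volumes_flushed(6.0, 0.4, 100.0, 3.0, total_porosity=0.65)
print(f"Re-equilibration corresponds to about {n_volumes:.1f} column volumes")
```

With a different assumed porosity the result shifts accordingly, so the figure should be read as a plausibility check rather than an exact value.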

Design of experiment of input runs for software modeling
For the chromatographic modeling using DryLab, a 2D model was chosen in order to optimize gradient time (tG) and column temperature (T). The input runs followed a 2 × 2 full factorial design, applying a linear gradient (from 25 % B to 60 % B) with a short (tG of 8 min) and a long (tG of 24 min) gradient time at column temperatures of 50 °C and 78 °C (the concept is shown in Fig. 1a). The flow rate for the input runs was set to 0.8 mL/min. All runs were carried out separately for each protein and performed in duplicate. The mean value of the retention time was used for modeling in DryLab (the raw data are given in the Supplementary data section B, Table B1).
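The 2 × 2 input design described above can be enumerated with a few lines of code. This is a minimal sketch for illustration; the function and key names are hypothetical, and only the factor levels and gradient range come from the text.

```python
from itertools import product


def full_factorial(levels: dict) -> list:
    """All combinations of the given factor levels, one dict per run."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def gradient_slope(start_pct_b: float, end_pct_b: float, tg_min: float) -> float:
    """Gradient steepness in % B per minute for a linear gradient."""
    return (end_pct_b - start_pct_b) / tg_min


# Factor levels from the text: tG of 8 and 24 min, T of 50 and 78 degrees C.
design = full_factorial({"tG_min": [8, 24], "T_celsius": [50, 78]})

for run in design:
    slope = gradient_slope(25, 60, run["tG_min"])  # 25 % B to 60 % B
    print(run, f"slope = {slope:.2f} %B/min")
```

The three-fold difference in gradient time translates into a three-fold difference in gradient steepness, which is what gives the retention model its leverage along the tG axis.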

Protein assay determination (quantification)
For SDS-PAGE analysis, BSA was mixed with 2-fold concentrated Laemmli solution to obtain single-concentrated Laemmli buffer in the final dilution. Samples were then heated at 95 °C for 10 min. 10 µL of each sample were loaded onto pre-cast SDS gels containing 15 wells (8-16 %, Bio-Rad, Hercules, CA, USA). Gels were run in a Mini-PROTEAN Tetra System (Bio-Rad, Hercules, CA, USA) for about 60 min at 140 V and stained with Coomassie Blue. The protein bands were evaluated densitometrically using the software ImageLab (Bio-Rad, Hercules, CA, USA).
For HPLC measurement of the soluble proteins, the sample was used directly after cell disruption and centrifugation for GFP, and after a refolding step for HRP [24]. IB pellet samples were prepared according to Ref. [24] and subsequently solubilized using a buffer solution of 7.5 M guanidine hydrochloride, 62 mM TRIS, and 125 mM 1,4-Dithioerythritol (DTT) at pH = 8 (all chemicals purchased from Carl Roth, Karlsruhe, Germany). All samples were filtered through 0.2 µm syringe filters purchased from Carl Roth (Karlsruhe, Germany) prior to analysis.

Validation
As BSA was stored in an aqueous solution, dilutions of the commercially purchased standard were prepared with ultra-pure water (Milli-Q, Merck Millipore, Burlington, USA). Solutions for LOQ and linearity determination had the following concentrations: 5, 10, 50, 100, 250, 500, 1000, 1500, and 2000 µg/mL. LOQ was determined experimentally by a six-fold injection followed by evaluation of the relative standard deviation of peak areas and the S/N ratio. Linearity was evaluated in the range from the determined LOQ to 2.0 mg/mL (= concentration of the commercially available reference standard) at nine concentration levels.

Results and discussion
In order to develop a widely applicable LC method for protein assay determination, five model proteins were chosen which cover a broad range of protein size and hydrophobicity, as hydrophobicity is known to have a major influence on RPLC retention. The GRAVY index, describing the hydrophobicity of the target proteins, is calculated from their amino acid composition (listed in Table 1): proteins with a GRAVY index close to 0 are comparatively hydrophobic, whereas the more negative the GRAVY index, the more hydrophilic the protein [25]. In biotechnology, GFP is often utilized as a biosensor or as a marker protein due to its easy detection by auto-fluorescence [2]. BSA is often used for calibration purposes in biotechnology if a purified reference substance is lacking [24]. HRP, which is used for enzymatic cancer treatment, was recombinantly expressed as an inclusion body but analyzed in its native form after refolding within this study. To show the applicability of the developed method independent of the expression form, the two further analytes (N-pro fusion protein and CH3H) were analyzed in their incorrectly folded form (IB). All samples were evaluated separately, as they were (i) derived from different purification steps (stated in Section 2.1 for each given protein) and (ii) stored in different buffer solutions (stated in Section 2.5).
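The GRAVY index referred to above is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length, which is how ProtParam reports it. The following sketch reproduces that arithmetic; the example sequences are hypothetical short peptides and not the proteins used in this study.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical peptides, for illustration only.
for name, seq in [("hydrophobic stretch", "AILVF"), ("charged stretch", "KDEL")]:
    print(f"{name}: GRAVY = {gravy(seq):.3f}")
```

More negative values indicate a more hydrophilic sequence, which is the direction of the scale used when comparing the model proteins in Table 1.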

Method development workflow and selection of primary factors
The analytical target profile (ATP) was defined as a robust and sensitive method with baseline separation between neighboring target peaks (i.e. a critical resolution greater than 1.5). As samples derived from microbial cultivation may present a highly complex sample matrix, the focus of this study was to provide a generic RPLC method for protein quantification that is independent of the state of purification and the buffer solution it requires. To verify this approach, target proteins were chosen to cover a broad range of hydrophobicity and to vary in their degree of purity: both crude samples (i.e. CH3H) and highly purified proteins (i.e. HRP) were evaluated. MS compatibility was additionally included in the target profile in order to make the developed method applicable for further elucidation. The first step was the selection of appropriate method factors (parameters) for the stationary and mobile phase system (stationary phase and mobile phase bulk solvents are considered primary method factors in "AQbD language" [21,27]). Considering the recent advancements in stationary phase chemistry for RPLC protein analytics, a small set of columns is commercially available that minimize on-column protein adsorption, either through special end-capping of the silica surface or through the use of an organic polymer as support material [17,21]. Further, high peak efficiency can be achieved with smaller particle sizes and/or superficially porous particles [19,21]. For these reasons, a BioResolve RP mAb Polyphenyl column was chosen.
Apart from the stationary phase chemistry, the type of organic modifier and the mobile phase pH are also considered primary factors [20,27]. As primary factors are known to have a higher impact on selectivity than secondary factors (e.g. column temperature), their appropriate choice is highly important [21,27]. In contrast to RPLC analyses of small organic molecules, acetonitrile (an aprotic solvent) is most often applied as bulk mobile phase in protein RPLC [17]. The application of protic solvents (e.g. methanol) often causes protein adsorption; hence they are avoided or only added in small portions to acetonitrile in order to improve selectivity. For the sake of simplicity and in order to develop an easy-to-use method, acetonitrile was chosen as the only organic modifier. Hence, purified water and acetonitrile were selected as mobile phase bulk solvents, and TFA was used as additive to suppress secondary interactions. As a "semi-volatile" additive, TFA also enables hyphenation with MS or aerosol-forming universal detectors (i.e. charged aerosol or evaporative light scattering detectors), although detector sensitivity is compromised. Gradient time (tG) and column temperature (T) have been described to significantly enhance protein separation in RPLC [17] and were thus chosen as secondary factors. High temperature set-points were chosen to (i) decrease the chance of non-specific protein adsorption and (ii) favor thermodynamic behavior for chromatographic interaction. Since TFA was applied for the aforementioned reasons, a systematic variation and optimization of pH was omitted.

DryLab assisted modeling of secondary method factors
Gradient time and column temperature were optimized using a chromatographic modeling software (i.e. DryLab [28,29]). The calculation of "ideal" parameters for the flow rate, gradient and column temperature required retention time data obtained from input runs. In this study, a 2 × 2 full factorial design was applied (illustrated in Fig. 1a). tG was varied three-fold and T was set to 50 °C for the lower limit and to 78 °C for the upper limit (Fig. 1). These limits were chosen because temperatures below 50 °C may cause irreversible adsorption of proteins on the column, whereas 80 °C was the technical limit of the employed HPLC system (thus 78 °C was chosen to allow flexibility during robustness experiments). The changes in retention time are displayed for HRP as an example in Fig. 1b. Based on the obtained retention times and peak widths (at 50 % peak height) for each condition and protein, a chromatographic model was created in DryLab. GFP and BSA (compare Fig. 2b and c) were identified as the most critical peak pair due to their similar hydrophobicity [25]. By iteratively modifying the method parameters in-silico, a constant flow rate of 0.4 mL/min, a column temperature of 70 °C and the gradient program shown in Table 2 were determined, thus establishing the method operable design region (MODR). Fig. 2a depicts the predicted chromatogram. In order to prove the validity of the DryLab model and its prediction, a verification run was carried out in the lab (Fig. 2b). As can be deduced by comparing the predicted with the real chromatogram, the retention times and peak widths are well comparable. Thus, the DryLab model is considered valid and can be used for further studies.
The critical resolution, defined as the smallest calculated resolution for all peak pairs, is displayed as a heat map as a function of column temperature and gradient time (Fig. 2c). This helps in selecting the optimal combination of column temperature and gradient time while simultaneously providing an estimate for the robustness of the selected method factors. In Fig. 2c, the red to orange area corresponds to combinations of T and tG that lead to a critical resolution greater than 2.0. The cross marks the tG and T of the optimized method. Temperatures below 70 °C could facilitate unspecific protein adsorption and were therefore neglected. The calculated chromatographic space around our set point is predicted to be characterized in major parts by a critical resolution > 2.0 and is thus considered robust while simultaneously minimizing the run time. Thus, the predicted method operable design region is equal to the area colored in red-orange in Fig. 2c.

Table 3. List of all factors altered, including their set-points and their variance, as well as the variance deviation for the in-silico calculation. Columns: Factor; Set-point + variance; delta.
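The critical resolution used throughout this section can be computed directly from retention times and peak widths at half height. The sketch below uses the European Pharmacopoeia expression Rs = 1.18 (tR2 - tR1) / (wh1 + wh2) and evaluates adjacent peak pairs only; the peak names and numbers are hypothetical and do not stem from this study.

```python
def ep_resolution(t_r1: float, w_h1: float, t_r2: float, w_h2: float) -> float:
    """Resolution from retention times and peak widths at half height (EP formula)."""
    return 1.18 * (t_r2 - t_r1) / (w_h1 + w_h2)


def critical_resolution(peaks: list) -> tuple:
    """Return (first peak, second peak, Rs) for the worst-resolved adjacent pair.

    `peaks` is a list of (name, retention_time_min, half_height_width_min).
    """
    if len(peaks) < 2:
        raise ValueError("at least two peaks are required")
    ordered = sorted(peaks, key=lambda p: p[1])  # sort by retention time
    pairs = zip(ordered, ordered[1:])
    scored = [(a[0], b[0], ep_resolution(a[1], a[2], b[1], b[2])) for a, b in pairs]
    return min(scored, key=lambda item: item[2])


# Hypothetical peak table (name, tR in min, width at half height in min).
example_peaks = [("A", 5.0, 0.10), ("B", 5.4, 0.10), ("C", 7.0, 0.12)]
first, second, rs = critical_resolution(example_peaks)
print(f"critical pair: {first}/{second}, Rs = {rs:.2f}")
```

Evaluating this quantity over a grid of tG and T values, with retention times supplied by a retention model, is what yields a resolution map of the kind shown in Fig. 2c.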

In-silico and experimental robustness studies

In-silico robustness
Robustness evaluates the capacity of an analytical method to remain unaffected by small, but deliberate changes of method factors (parameters) [30]. DryLab offers the possibility of in-silico robustness studies [28]. Nine factors in total (compare to Table 3) were varied at three levels (central point, upper and lower level) and the critical resolution was calculated for each factor combination.
A total of 13122 operating conditions were simulated and tested for their influence on the critical resolution. Remarkably, the in-silico calculation showed that 94.1 % of all factor combinations were above the set resolution of 2.0, indicating a highly robust method. Moreover, the in-silico robustness study allowed the identification of the method factors with the highest impact on the critical resolution (compare to Supplementary data B, Figure B1). The factor with the highest influence was found to be the gradient time. Further factors with a high impact on the critical resolution were identified throughout the simulation and hence chosen for experimental robustness determination as described in Section 3.3.2.

Experimental determination of robustness
As the critical peak pair (i.e. BSA and GFP) elutes during the first step gradient (compare to Fig. 2b and c), only tG of the first gradient was investigated. Further, flow rate and temperature also exhibited a significant impact in the in-silico simulation and were therefore selected for the experimental robustness study. The influence of ion-pairing effects on selectivity and peak shape caused by the amount of TFA in the mobile phase cannot be predicted by the modeling software. Consequently, the TFA concentration in mobile phases A and B was selected as the fourth factor for the experimental robustness study.
The four chosen factors (tG of the first gradient, flow rate, column temperature and TFA concentration in the mobile phase) were varied based on a response surface (central composite) design of experiment, as illustrated in Fig. 3a. The model was evaluated for linear, quadratic and interaction terms using the software MODDE. Centre-points were conducted as six-fold repeated experiments. The response of the model was evaluated for the critical resolution between neighboring peaks. Factors were varied experimentally around their set-points, e.g. the duration of the first gradient from 7.5 to 8.5 min. The model was evaluated for its statistical terms, namely the measure of fit (R²) and the adjusted measure of fit (R² adj.), to validate the model's variation of response, adjusted for degrees of freedom. Furthermore, the model predictability Q² was evaluated to test the capability of the model to predict new data, as shown in Table 4.
High R² values (> 0.9) indicated that the input data fit the model well. High Q² values (> 0.775), evaluated with MODDE, showed that new data would fit the model properly and that the experimental error was low. To avoid any data-specific model responses, the difference between R² and Q² should be as small as possible; it was found to be smaller than 0.184 for the stated data set. The difference between R² and R² adj. indicates that the experimental setup within this study was chosen appropriately. Experiments conducted for all model proteins are given in Supplementary Table B2, whereas the resolution of the critical peak pair BSA-GFP is discussed in more detail within this chapter, as these target proteins resulted in the smallest resolution (compare to Fig. 2b).
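The relationship between R², R² adj. and Q² can be illustrated with a minimal sketch. It assumes a single-factor straight-line model and leave-one-out cross-validation for Q²; the cross-validation scheme actually used by MODDE is not described in the text and may differ. The flow rate and resolution values are hypothetical, not data from this study.

```python
def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_statistics(x: list, y: list, n_terms: int = 1) -> tuple:
    """Return (R2, adjusted R2, Q2) with Q2 from leave-one-out prediction errors."""
    n = len(y)
    mean_y = sum(y) / n
    slope, intercept = fit_line(x, y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0  # predicted residual sum of squares
    for i in range(n):
        s, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # refit without point i
        press += (y[i] - (b + s * x[i])) ** 2
    r2 = 1.0 - ss_resid / ss_total
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1)
    q2 = 1.0 - press / ss_total
    return r2, r2_adj, q2


# Hypothetical data: flow rate (mL/min) versus critical resolution.
flow = [0.35, 0.375, 0.40, 0.425, 0.45]
resolution = [2.60, 2.45, 2.30, 2.20, 2.00]
r2, r2_adj, q2 = fit_statistics(flow, resolution)
print(f"R2 = {r2:.3f}, R2 adj. = {r2_adj:.3f}, Q2 = {q2:.3f}")
```

Because each left-out point is predicted by a model that never saw it, Q² cannot exceed R² for a least-squares fit, and a large gap between the two hints at a model that describes the training data better than it predicts new data.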
The DoE was evaluated by the resulting contour and interaction plots as illustrated in Fig. 3b and 3c. Even though column temperatures below 70 °C were found to increase the critical resolution, 70 °C was considered a compromise in order to avoid non-specific protein adsorption effects occurring at lower temperatures. Similar results were obtained for the TFA concentration in the mobile phases: a lower TFA concentration was found to decrease the critical resolution, whereas a higher TFA concentration would improve the critical resolution but would, on the other hand, detrimentally impact sensitivity in mass spectrometry. A lower flow rate would improve the critical resolution but simultaneously increase the run time (see Fig. 3c). Therefore, a flow rate of 0.4 mL/min was determined as optimum. A short gradient time seemed to be beneficial; however, the interaction plot in Fig. 3c showed that this was only true for smaller flow rates, which impedes a fast run time. At flow rates above 0.45 mL/min, long gradient times gave a greater critical resolution than short gradient times.
No method factor combination resulted in a critical resolution lower than 1.5; hence, the selected optima were confirmed experimentally and the method can be considered robust for method variations of ±3 °C column temperature, ±0.02 % TFA concentration in the mobile phase, ±0.5 min gradient time and ±0.05 mL/min flow rate.
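The run list of a four-factor central composite design with six centre points, as used in this section, can be generated as sketched below. Several points are assumptions made for illustration: a face-centred design (axial distance alpha = 1) is used because the text does not state the axial distance, and the coded levels are mapped to real settings using the set-points and the variations quoted above (8.0 ± 0.5 min is inferred from the stated range of 7.5 to 8.5 min).

```python
from itertools import product


def central_composite(n_factors: int, alpha: float = 1.0, n_center: int = 6) -> list:
    """Coded central composite design: factorial corners, axial points, centre points."""
    corners = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for level in (-alpha, alpha):
            point = [0.0] * n_factors
            point[i] = level  # one factor at its axial level, all others at centre
            axial.append(point)
    center = [[0.0] * n_factors for _ in range(n_center)]
    return corners + axial + center


def decode(coded_run: list, centers: list, half_ranges: list) -> list:
    """Convert coded levels (-1, 0, +1) into real factor settings."""
    return [c + x * h for x, c, h in zip(coded_run, centers, half_ranges)]


# Factor order: tG of first gradient (min), flow (mL/min), T (deg C), TFA (% v/v).
CENTERS = [8.0, 0.40, 70.0, 0.10]
HALF_RANGES = [0.5, 0.05, 3.0, 0.02]

# ASSUMPTION: face-centred design (alpha = 1); not specified in the source.
design = central_composite(n_factors=4, alpha=1.0, n_center=6)
print(f"{len(design)} runs in total")
for coded in design[:3]:
    print(coded, "->", [round(v, 3) for v in decode(coded, CENTERS, HALF_RANGES)])
```

With these settings the design comprises 16 corner runs, 8 axial runs and 6 centre-point replicates, and the replicated centre point provides the estimate of pure experimental error needed for the model statistics.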

Instrument precision and repeatability
Instrument precision was tested via six consecutive injections of the same protein solution at sample nominal concentration (compare to Section 2.3). A relative standard deviation (RSD) of at most 2.0 % for peak area and retention time was set as acceptance criterion for all five proteins. For instrument precision experiments conducted with BSA, an injection of a blank solution (7.5 M guanidine-HCl solution) was required to prevent sample carryover from one run to the next. The RSDs of the peak areas and the retention times were evaluated and fulfilled the criterion of not more than 2.0 % (compare to Supplementary Table C1).
In order to evaluate the impact of sample preparation, a repeatability study was performed. As BSA was commercially purchased, whereas other target proteins were derived from fermentation and subsequent purification, BSA was applied as model protein for the repeatability experiments. RSD for peak area and retention time below 2.0 % could be achieved throughout a six-fold sample preparation and subsequent analysis (compare to Supplementary Table  C2).
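The precision and repeatability criteria above reduce to a relative standard deviation check on six values. A minimal sketch follows; the peak areas are hypothetical and serve only to show the calculation against the 2.0 % acceptance limit stated in the text.

```python
from statistics import mean, stdev


def rsd_percent(values: list) -> float:
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    return 100.0 * stdev(values) / mean(values)


def meets_criterion(values: list, limit_percent: float = 2.0) -> bool:
    """True if the RSD does not exceed the acceptance limit."""
    return rsd_percent(values) <= limit_percent


# Hypothetical peak areas from six injections (arbitrary units).
peak_areas = [1012.0, 1005.0, 998.0, 1009.0, 1001.0, 1007.0]
print(f"RSD = {rsd_percent(peak_areas):.2f} %, accepted: {meets_criterion(peak_areas)}")
```

The same function applies to retention times, and to the six-fold sample preparation of the repeatability study, where the scatter additionally contains the contribution of sample handling.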

Linearity and limit of quantitation
In biotechnology research and development, model proteins are often quantified using BSA as a "generic" reference standard in case no isolated material of the target protein is available [24]. For this reason, we applied BSA for evaluation of the limit of quantitation (LOQ) and linearity. An LOQ of 0.005 mg/mL could be established (RSD < 15 %, S/N ratio > 10). The method was found to be linear and precise over three orders of magnitude (correlation coefficient R² = 0.999, residual standard deviation = 1.7 %, prognosis interval P = 95 % for the y-intercept includes the zero point). The calibration curve is shown in Fig. 4a and further details on the linearity evaluation are given in Table 5.
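The linearity evaluation described above can be sketched as an ordinary least-squares calibration with a check whether the 95 % interval of the y-intercept includes zero. The concentration levels are those listed in the validation section; the peak areas are hypothetical, and the tabulated t value of 2.365 applies only to nine levels (seven degrees of freedom).

```python
import math


def linear_calibration(conc: list, area: list, t_value: float = 2.365) -> dict:
    """Least-squares calibration line with R2 and a 95 % interval for the intercept.

    t_value is the two-sided 95 % Student t quantile for n - 2 degrees of
    freedom; the default of 2.365 is valid for n = 9 calibration levels.
    """
    n = len(conc)
    mean_x, mean_y = sum(conc) / n, sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area)) / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(conc, area))
    ss_total = sum((y - mean_y) ** 2 for y in area)
    resid_sd = math.sqrt(ss_resid / (n - 2))
    se_intercept = resid_sd * math.sqrt(1.0 / n + mean_x ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_resid / ss_total,
        "residual_sd": resid_sd,
        "intercept_ci": (intercept - t_value * se_intercept,
                         intercept + t_value * se_intercept),
    }


def quantify(area: float, slope: float, intercept: float) -> float:
    """Back-calculate a concentration from a measured peak area."""
    return (area - intercept) / slope


# Concentration levels from the text (ug/mL); peak areas are HYPOTHETICAL.
levels = [5, 10, 50, 100, 250, 500, 1000, 1500, 2000]
areas = [0.11, 0.19, 1.02, 1.98, 5.03, 9.96, 20.10, 29.90, 40.05]

fit = linear_calibration(levels, areas)
low, high = fit["intercept_ci"]
print(f"R2 = {fit['r2']:.5f}, intercept interval includes zero: {low <= 0.0 <= high}")
print(f"area 10.0 -> {quantify(10.0, fit['slope'], fit['intercept']):.1f} ug/mL")
```

An intercept interval that includes zero supports using the line for quantification down to the lowest level without a significant constant offset.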

Comparison of the developed RPLC method with SDS-PAGE analysis
Even though analysis of microbial proteins is commonly performed with SDS-PAGE, proper LC analytics should be superior regarding accuracy and analysis time [5]. To verify this hypothesis, a standard calibration curve established with (i) SDS-PAGE analysis (using Coomassie blue staining) was compared to (ii) a calibration curve measured with the herein developed RPLC method. Method attributes such as analysis time, LOQ, correlation coefficient, y-intercept and RSD are compared in Table 5.
The standard calibration curves (compare to Table 5 and Fig. 4a and b) show that higher accuracy and a decrease in analysis time can be achieved using the established RPLC method. Due to method limitations, low protein concentrations could not be evaluated with gel-based quantification, as indicated in Fig. 4b. As integration of SDS-PAGE-derived bands is challenging and needs to be performed visually (being highly operator dependent), high deviations in evaluation can be observed, which is in accordance with literature [6]. To test whether host cell impurities would impact the resolution of target proteins, crude samples were compared with purified ones (see Supplementary data section D, Figure D1). The respective target peaks could be separated from their impurities, thus eliminating the need for SDS-PAGE-derived analysis throughout any process step.
Summarizing the comparison of both methods, RPLC measurements outperform SDS-PAGE in terms of analysis speed (18 min versus 150 min) and limit of quantification (LOQ 0.005 mg/mL versus 0.1 mg/mL). This makes RPLC an attractive technique for pharmaceutical up- and downstream development, where fast and accurate responses are a prerequisite for screening conditions that potentially increase recovery yields and/or purity of the final product.

Fig. 4. Calibration of BSA standards with a confidence interval set for 95 % comparing a) the developed RPLC method with b) a state-of-the-art SDS-PAGE using Coomassie blue staining.

Table 5. Parameters for the evaluation of the two compared analytical methods, using LOQ, correlation coefficient, y-intercept, and residual standard deviation referred to nominal concentration.

Conclusion
A generic RPLC method for quantification of commonly applied model proteins expressed by the microbial host E. coli was developed. The method development workflow applied analytical quality-by-design principles by using retention modeling and establishing a method operable design region. The MODR was challenged by both in-silico and experimental robustness experiments, using a multivariate experimental design and analysis approach. Method performance of the established RPLC method was compared to an SDS-PAGE method, which is still a benchmark technology for the analysis of microbially derived samples. Results indicate the superiority of the RPLC method regarding analysis time, ease of use, linear range and LOQ.
In addition, the herein described AQbD workflow can be used as a generic method development protocol for RPLC analysis of proteins. Furthermore, the presented method has found direct utilization in the analysis of the chosen target proteins, indicating (i) its applicability for the analysis of a broad range of proteins with different hydrophobicities and (ii) its independence of the applied protein purification procedures and the buffer solutions they require. Finally, the established RPLC method is also MS-compatible, thus enabling further opportunities for mass elucidation.