Biomass soft sensor for a Pichia pastoris fed-batch process based on phase detection and hybrid modeling

A common control strategy for the production of recombinant proteins in Pichia pastoris using the alcohol oxidase 1 (AOX1) promoter is to separate the bioprocess into two main phases: biomass generation on glycerol and protein production via methanol induction. This study reports the establishment of a soft sensor for the prediction of biomass concentration that adapts automatically to these distinct phases. A hybrid approach combining mechanistic (carbon balance) and data-driven modeling (multiple linear regression) is used for this purpose. The model parameters are dynamically adapted according to the current process phase using a multilevel phase detection algorithm. This algorithm is based on the online data of CO₂ in the off-gas (absolute value and first derivative) and cumulative base feed. The evaluation of the model resulted in a mean relative prediction error of 5.52% and R² of .96 for the entire process. The resulting model was implemented as a soft sensor for the online monitoring of the P. pastoris bioprocess. The soft sensor can be used for quality control and as input to process control systems, for example, for methanol control.

maintenance (Kano & Fujiwara, 2012). For these reasons, biomass is in many cases not measured online at all.
Because the direct measurement of biomass is often not feasible, soft sensors can be used for predicting it. Soft sensors consist of computational models or algorithms that allow the prediction of target values, such as biomass concentration, via continuously measured secondary variables, such as exhaust gas concentrations, dissolved oxygen (DO), and flow rates (Luttmann et al., 2012).
Various modeling techniques have been proposed for developing soft sensors, the majority of which are based on mechanistic or data-driven approaches. An overview of soft sensors and the selection of appropriate modeling techniques for online bioreactor state estimation has been presented elsewhere (Zhang, 2009). Mechanistic modeling approaches include, for example, differential balancing systems, which describe the material and energy conversions at the cellular level, as well as mass and energy balances (Jenzsch, Gnoth, Kleinschmidt, Simutis, & Lübbert, 2007). Data-driven approaches include, among others, artificial neural networks (ANN; Gonzaga, Meleiro, Kiang, & Maciel Filho, 2009) and methods from the field of multivariate statistical process control (Kadlec, Gabrys, & Strandt, 2009), such as principal component regression and partial least squares regression. In hybrid modeling, mechanistic and data-driven modeling approaches are combined, as reviewed by Kalos, Kordon, Smits, and Werkmeister (2003) and Solle et al. (2017).
The main challenges in the development of soft sensors are as follows: control of model complexity (overfitting vs. underfitting) (Kordon, Smits, Kalos, & Jordaan, 2003); limited amount of data sets or data points (Fortuna, Graziani, & Xibilia, 2009); outliers resulting from, for example, sensor faults (Zhang, 2009); adaption mechanisms for model maintenance (Bakirov, Gabrys, & Fay, 2017); input variable selection; reliability of soft sensors; and changes in process characteristics and operating conditions (Kano & Fujiwara, 2012). In addition, a specific challenge arises in soft sensor development for P. pastoris bioprocesses given its distinct process phases, as described previously: The underlying principles of prediction models for biomass are related to the inherent biological relationships between online measured variables and biomass (Chen, Nguang, Li, & Chen, 2004); thus, the soft sensor needs to be adaptive to the current process phase to give accurate prediction results throughout the entire process.
In this study, an adaptive soft sensor for biomass concentration was developed. The novelty of this study is that the soft sensor changes its model coefficients according to the current process phase (batch, transition, or fed-batch phase) of the P. pastoris bioprocess. The soft sensor's underlying prediction model is based on a hybrid of mechanistic and data-driven approaches. The mechanistic part comprises mass balancing of carbon using methanol and CO₂ fluxes. The outcome of this mechanistic model, namely the generation rate of total organic carbon inside the bioreactor, is fed into a data-driven model that in turn leads to an online prediction of biomass concentration. The adaptability of the soft sensor to the distinct process phases is ensured by automatic and reliable detection of glycerol depletion based on online process variables, namely, CO₂ in the off-gas (absolute value and first derivative) and cumulative base feed. The soft sensor's model coefficients switch automatically depending on the current process phase and thus give accurate biomass predictions throughout the entire process. Finally, the soft sensor was implemented in a real-time capable system to enable online biomass monitoring.

| Fed-batch cultivation in bioreactor
The shake flask culture was used to inoculate the main culture in the bioreactor Biostat® Cplus (Sartorius AG, Goettingen, Germany) with working and total volumes of 15 and 42 L, respectively. The main culture medium was FM22. Pressure, pH, temperature, and dissolved oxygen were controlled to 500 mbar, 5, 30°C, and 40%, respectively. NH₄OH was used as nitrogen source and to set and maintain a pH of 5. A dissolved oxygen minimum of 40% was maintained by cascade control using variable stirrer speed (300-600 min⁻¹) and air flow rate (20-40 L·min⁻¹).
The end of the batch phase, that is, the depletion of glycerol, was indicated online by a characteristic peak in the off-gas CO₂ concentration. The complete depletion of glycerol was verified offline via HPLC analysis (data not shown). After a short transition phase, which prevents the potential repression of the AOX1 promoter by glycerol residues from the preceding batch phase, the culture was induced with methanol. The methanol feed was supplemented with 12 ml·L⁻¹ PTM4 stock solution. Methanol concentration was controlled via a fuzzy logic controller to 4.5 g·L⁻¹. This controller uses methanol concentration as the main input and the feed rate of methanol as output. The general concept of fuzzy logic controllers is described, for example, in Birle, Hussein, and Becker (2013

| Determination of dry cell weight
Dry cell weight was determined in triplicate by centrifugation of 2 ml cell suspension in previously weighed centrifuge tubes, followed by discarding the supernatant and drying the cell pellet to a constant weight at 80°C. Samples for the determination of dry cell weight were taken using the BaychroMAT® autosampler (Bayer AG, Leverkusen, Germany) with a minimum sampling interval of 2 hr.

| Data management
The digital control unit (DCU) of the Biostat® bioreactor (Sartorius AG) was used for primary process control (pressure, pH, temperature, and dissolved oxygen) and signal recording. SIMATIC SIPAT (Siemens AG, Munich, Germany) was used for data management and to store the process (online) and laboratory (offline) data in a central

| RESULTS AND DISCUSSION
This study aims to develop a soft sensor for the prediction of biomass concentration that provides accurate online predictions for a multiphase process (batch, transition, and fed-batch phase) with two different carbon sources (glycerol and methanol). The general concept of the hybrid-model-based soft sensor presented here consists of two main levels: The first level comprises a phase detection algorithm to differentiate online among batch, transition, and fed-batch phase; the second level consists of a hybrid-model-based prediction equation that automatically adjusts the model parameters based on the current process phase (batch, transition, or fed-batch phase). For the development of the first and second levels, nine and six data sets, respectively, were used. Only the latter six data sets had a fed-batch phase with control of methanol concentration and therefore can be compared with each other.
The hybrid model uses a carbon balance as the mechanistic part. The result of the carbon balance is fed into a data-driven part to provide accurate prediction of the biomass concentration. The information-bearing model inputs that were used in this study to predict biomass concentration are cumulative methanol and base feed as well as concentrations of off-gas CO₂ and methanol. Figure 1 shows the time course of the relevant model inputs of the soft sensor for an exemplary process run. This process run is used as an illustrative example throughout the following sections. In this case, the batch phase ends at 39.6 hr, followed by a transition phase that lasts for 6.9 hr, and a fed-batch phase that starts at 46.5 hr. In the batch phase, glycerol is metabolized and biomass is generated.
The presence of the transition phase prevents the potential repression of the AOX1 promoter by glycerol residues from the preceding batch phase. In the transition phase, no significant increase (due to the absence of carbon sources) or decrease of biomass concentration was observable. In the fed-batch phase, methanol is fed into the bioreactor via a pump for the first time. Subsequently, methanol concentration is controlled to a setpoint of 4.5 g·L⁻¹ via a fuzzy logic controller. This process run shows control errors such as high initial overshoot and an increasing deviation of the measured methanol concentration from the setpoint in the subsequent time course. Base (NH₄OH, 5 M) is fed into the bioreactor via a pump and is used to maintain pH at 5.0. The cumulative base feed represents the degree of metabolic activity, that is, substrate depletion. This variable shows high collinearity to the biomass concentration (see later in Figure 6).

FIGURE 1 Time course of the relevant model inputs of the soft sensor for an exemplary process run, namely, cumulative feed volume of methanol, V_meth, and base, V_base, as well as concentrations of CO₂ in the off-gas, σ_CO2, and methanol, c_meth. For this exemplary process, the batch phase ends at 39.6 hr and the fed-batch phase starts at 46.5 hr
In the batch phase, the off-gas CO₂ signal increases almost continuously until the end of this phase. At that point, the signal drops abruptly and, except for minor fluctuations, begins to rise again only upon methanol induction. After methanol induction, the cells need to adapt to the metabolization of methanol.
3.1 | Multilevel process phase detection

3.1.1 | General concept of process phase detection

This algorithm step aims to differentiate among the three distinct process phases, which are listed in Table 1 together with their process data characteristics regarding process phase detection. The detection of the end of the batch phase is primarily based on the off-gas CO₂ signal. The metabolization of glycerol together with an increasing cell concentration leads to an almost continuous increase in CO₂ emission during the batch phase. When glycerol is depleted, the off-gas CO₂ signal drops abruptly, as shown in Figure 1 (here at 39.6 hr). The relationship between the CO₂ drop and substrate consumption is shown and discussed in detail in Munch et al. (2020). This abrupt drop is the main sign of the end of the batch phase and is hereinafter referred to as trigger 3. To increase robustness of the phase detection algorithm, two additional trigger conditions upstream of trigger 3 were implemented, namely the exceeding of absolute values for cumulative base feed (trigger 1) and off-gas CO₂ concentration (trigger 2).
The output of the algorithm for process phase detection is a binary value indicating whether the end of the batch phase has been reached (1 = true) or not (0 = false) together with the corresponding timestamp.
Variable inputs to the algorithm consist of the signals for cumulative base feed (V_base) for trigger 1, the absolute off-gas CO₂ concentration (σ_CO2) for trigger 2, and the timewise derivative of the off-gas CO₂ concentration (dσ_CO2/dt) for trigger 3. Trigger 3 is active and can be initiated only when triggers 1 and 2 have been initiated, that is, when both are "true." The end of the batch phase is indicated when all three triggers are "true."
The process variable V_base represents the cumulative metabolic activity regarding the consumption of the carbon source. Because the batch process starts with a glycerol concentration of 40 g·L⁻¹, the total volume of base fed into the bioreactor by the end of the batch phase is limited by the stoichiometry of glycerol metabolization. Trigger 1 is therefore initiated when a defined threshold for V_base is exceeded. In the transition phase, the variable V_base remains constant because cells do not grow.
Similar to V_base, the process variable σ_CO2 is strongly related to biomass growth and substrate consumption. During exponential growth, σ_CO2 increases almost continuously until the end of the batch phase. Trigger 2 is therefore initiated when a defined threshold for σ_CO2 is exceeded. This trigger is implemented so that natural fluctuations in σ_CO2, which can statistically occur in biological systems (see Figure 1), and sensor faults impede the functionality of the process phase detection as little as possible. Trigger 2 thus slightly increases robustness of the phase detection algorithm.
Figure 2 shows the functioning of trigger 3 in terms of dσ_CO2/dt for an exemplary process run. The value of dσ_CO2/dt falls below the threshold uniquely at the end of the batch phase (here at 39.6 hr). A median filtering step was implemented before and after the derivation step to decrease noise of the variables σ_CO2 and dσ_CO2/dt, respectively.
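The trigger cascade described above can be sketched in a few lines of Python. This is an illustrative sketch and not the implementation used in the study: the function names, the trailing (causal) form of the median filter, the finite-difference derivative, the use of the filtered CO₂ signal for trigger 2, and all threshold values passed in are assumptions.

```python
from statistics import median


def median_filter(values, window=10):
    """Trailing median over the last `window` readings (assumed causal form)."""
    return [median(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]


def detect_batch_end(v_base, co2, dt, thr_base, thr_co2, thr_dco2, window=10):
    """Return the first sample index at which all three triggers are true.

    Returns None if the end of the batch phase is never detected.
    """
    co2_f = median_filter(co2, window)               # filter before derivation
    dco2 = [0.0] + [(co2_f[i] - co2_f[i - 1]) / dt   # finite-difference derivative
                    for i in range(1, len(co2_f))]
    dco2_f = median_filter(dco2, window)             # filter after derivation

    trigger1 = trigger2 = False
    for i in range(len(co2)):
        trigger1 = trigger1 or v_base[i] > thr_base  # cumulative base feed exceeded
        trigger2 = trigger2 or co2_f[i] > thr_co2    # absolute off-gas CO2 exceeded
        # Trigger 3 is evaluated only once triggers 1 and 2 have latched.
        if trigger1 and trigger2 and dco2_f[i] < thr_dco2:
            return i
    return None
```

Latching triggers 1 and 2 once they have fired keeps early fluctuations of the derivative from being mistaken for glycerol depletion, which is the stated purpose of the upstream triggers.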

| Threshold definition
The thresholds for triggers 1, 2, and 3 were calculated as shown in (1), where threshold_i is the threshold for the trigger variable i used for process phase detection, with i ∈ {V_base, σ_CO2, dσ_CO2/dt}; mean_i and std_i are the arithmetic mean and standard deviation, respectively, of the variable i at the end of the batch phase. For this purpose, the end of the batch phase was defined as the time at the minimum of dσ_CO2/dt. SF is a constant safety factor of 3 that is implemented to avoid false-positive detections of the end of the batch phase and thus to increase the robustness of the multilevel detection algorithm.
For illustration and comparison of the three triggers, Figure 3 shows the results (mean ± standard deviation) for the trigger variables normalized to the corresponding threshold. The resulting threshold values together with the mean and standard deviation are summarized in Table 2. These threshold values were implemented in SIMULINK to automatically detect the end of the batch phase and therefore to select the appropriate model coefficients for the biomass soft sensor shown in the following.
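A threshold built from the mean, the standard deviation, and the safety factor can be sketched as follows. The direction in which the safety margin is applied is an assumption here, not taken from (1): the margin is placed below the mean for the two triggers that fire when a value is exceeded, and above the mean for the derivative trigger that fires when the value falls below the threshold. The use of the sample standard deviation and the function name are likewise assumptions.

```python
from statistics import mean, stdev

SF = 3  # safety factor as stated in the text


def threshold(values_at_batch_end, direction):
    """Threshold from historical values of one trigger variable.

    direction = -1 places the threshold below the mean (assumed for
    triggers 1 and 2); direction = +1 places it above the mean (assumed
    for trigger 3, whose values at the batch end are negative).
    """
    m = mean(values_at_batch_end)
    s = stdev(values_at_batch_end)  # sample standard deviation (assumption)
    return m + direction * SF * s
```

With this convention, each trigger fires somewhat before the typical value at the batch end is reached, which is consistent with the detection occurring slightly ahead of the minimum of dσ_CO2/dt.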

| Mass balance for carbon
The underlying principle of the mechanistic modeling part is mass balancing of carbon. The boundary for the balancing is the bioreactor system: Carbon is fed into the bioreactor in the form of methanol (fed-batch phase) and leaves the boundary in the form of CO₂. The remaining carbon is in the form of either glycerol or methanol or is bound in cells as well as extracellular organic acids and proteins.

TABLE 1 Main characteristics of the three distinct process phases (batch, transition, and fed-batch phase) regarding process phase detection
The following sections show how the timewise rates of off-gas CO₂ and methanol are calculated. These rates are then balanced to enable calculation of the formation rate of total organic carbon (TOC) that remains bound in cells as well as extracellular organic acids and proteins. To determine the cumulative amount of TOC online, this rate needs to be multiplied by the total liquid volume and numerically integrated. This cumulative amount of TOC is used in the subsequent data-driven modeling part to predict biomass concentration (Figure 4).

| Calculation of liquid volume
To calculate the total liquid volume, all feeds and removals (sampling) need to be considered. The total reactor volume V_total is calculated as in (2), where V_start is the start volume after inoculation; V_base, V_meth, and V_afoam are the cumulative volumes of base, methanol, and antifoam, respectively, fed into the bioreactor; V_samples is the cumulative volume of samples automatically taken via the BaychroMAT® autosampler:

$$V_{\mathrm{total}} = V_{\mathrm{start}} + V_{\mathrm{base}} + V_{\mathrm{meth}} + V_{\mathrm{afoam}} - V_{\mathrm{samples}} \qquad (2)$$

| Calculation of carbon dioxide emission rate
The calculation of the carbon dioxide emission rate r_CO2 in (3) is adapted from Takors (2013), where Q_air is the air flow rate, p is the pressure, R is the universal gas constant (8.314 × 10⁻² L·bar·mol⁻¹·K⁻¹), T is the temperature, σ_CO2 and σ_O2 are the concentrations of carbon dioxide and oxygen, respectively, and the indices α and ω represent the gas inlet and outlet of the bioreactor, respectively:
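As an illustration of how these quantities can be combined, the following Python sketch uses a common inert-gas-balance form of the carbon dioxide emission rate. Whether it agrees term by term with (3) is an assumption; the division by the liquid volume (giving a volumetric rate), the use of mole fractions for the gas concentrations, the units, and the function and parameter names are assumptions as well.

```python
R = 8.314e-2  # universal gas constant in L·bar·mol⁻¹·K⁻¹, as given in the text


def co2_emission_rate(q_air, p, temp_k, v_liquid,
                      x_co2_in, x_o2_in, x_co2_out, x_o2_out):
    """Volumetric CO2 emission rate in mol·L⁻¹·h⁻¹ (sketch).

    q_air: inlet air flow in L·h⁻¹, p: pressure in bar, temp_k: temperature in K,
    v_liquid: liquid volume in L, x_*: mole fractions at inlet (in) and outlet (out).
    """
    molar_flow_in = q_air * p / (R * temp_k)          # ideal gas law, mol·h⁻¹
    # Inert gas balance: corrects the outlet flow for the change in gas composition.
    inert_ratio = (1.0 - x_o2_in - x_co2_in) / (1.0 - x_o2_out - x_co2_out)
    return molar_flow_in / v_liquid * (x_co2_out * inert_ratio - x_co2_in)
```

The inert-gas correction matters because the outlet gas flow is usually not measured; without it, any difference between oxygen uptake and carbon dioxide release would bias the rate.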

| Calculation of methanol reaction rate
As described above, errors in the methanol control, such as an initial overshoot or a deviation of the measured methanol concentration to the setpoint (Figure 1)

FIGURE 2 … for an exemplary process run. A median filter (window size = ten sensor readings) is implemented before and after the derivation step to handle noisy sensor readings. The characteristic negative peak (here at 39.6 hr) is the main indicator for the depletion of the batch phase substrate (glycerol) and thus the end of the batch phase. This landmark is used to initiate the start of the transition and fed-batch phase, respectively

FIGURE 3 Triggers for the multilevel detection of the end of the batch phase (i.e., depletion of glycerol). Only when defined values for base, V_base, and absolute off-gas CO₂ concentration, σ_CO2, are reached for the first time, the last trigger, the timewise derivative of the off-gas CO₂ concentration, dσ_CO2/dt, is active. The three thresholds are defined based on the calculation of the mean and standard deviation for each of the three variables at the end of the batch phase as well as a safety factor. The diagram shows normalized absolute variable values; error bars correspond to the normalized standard deviation (n = 9)

TABLE 2 Threshold, mean, and standard deviation for the three trigger variables V_base, σ_CO2, and dσ_CO2/dt at the end of the batch phase (n = 9)

| Calculation of formation rate of total organic carbon
TOC refers to all carbon inside the bioreactor system that is bound in the substrate (glycerol or methanol) and cells as well as extracellular organic acids and proteins. The formation rate of TOC, r_TOC, is not directly measured by reference analysis but calculated as follows by balancing the methanol reaction rate r_meth and the carbon dioxide emission rate r_CO2:

$$r_{\mathrm{TOC}} = r_{\mathrm{meth}} - r_{\mathrm{CO2}} \qquad (6)$$

In the batch phase, r_meth = 0 and no glycerol is fed into the bioreactor; therefore, the carbon balance in this phase is r_TOC = −r_CO2.

| Development of hybrid-model-based soft sensor
3.3.1 | Combination of mechanistic and data-driven parts in a hybrid model

The output of the mechanistic part (mass balance for carbon), r_TOC, is fed together with V_base into the data-driven part. The data-driven part comprises a numerical integration step for r_TOC to obtain the cumulative amount of total organic carbon, TOC, and a multiple linear regression (MLR) step. MLR was chosen as regression method because the prediction model uses only the two inputs TOC and V_base.
Using TOC only for biomass prediction leads to acceptable prediction results (data not shown). However, the concentrations of dissolved carbon dioxide (H₂CO₃) as well as extracellular proteins (c_P) and organic acids, which in most cases cannot be measured online, distort the biomass prediction. The prediction model for biomass is therefore complemented by adding information about acids in the medium. The process variable with most information about acids in the medium is the cumulative base feed, V_base. Because c_P ≪ c_X, the extracellular protein concentration is neglected for biomass prediction.
TOC is calculated as follows by multiplication with V_total and numeric integration from the beginning of the process run (t_0) to the current time (t):

$$\mathrm{TOC}(t) = \int_{t_0}^{t} r_{\mathrm{TOC}}(\tau)\, V_{\mathrm{total}}(\tau)\, \mathrm{d}\tau \qquad (7)$$

When in sum (up to t) more carbon passed the bioreactor boundary to the outside than to the inside, TOC has a negative value.
The time course of TOC is illustrated together with r_meth and r_CO2 in Figure 5 for an exemplary process run. In the batch phase (Figure 5a), the only carbon passing through the bioreactor boundary is CO₂. Therefore, TOC has a negative gradient. In the fed-batch phase (Figure 5b), methanol is fed to the bioreactor, resulting in a net positive gradient for TOC.
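The volume calculation and the cumulative integration can be sketched in Python as follows. The sketch assumes the balance r_TOC = r_meth − r_CO2, which reduces to the batch-phase relation r_TOC = −r_CO2 stated in the text, and it assumes the trapezoidal rule as the numerical integration scheme; the function names and the sampled-list interface are hypothetical.

```python
def total_volume(v_start, v_base, v_meth, v_afoam, v_samples):
    """Total liquid volume: start volume plus all feeds minus removed samples."""
    return v_start + v_base + v_meth + v_afoam - v_samples


def cumulative_toc(t, r_meth, r_co2, v_total):
    """Cumulative TOC at each time point by trapezoidal integration.

    t, r_meth, r_co2, v_total are equally long lists of sampled values;
    the rates are volumetric, so they are multiplied by the liquid volume.
    """
    toc = [0.0]
    for k in range(1, len(t)):
        f_prev = (r_meth[k - 1] - r_co2[k - 1]) * v_total[k - 1]
        f_curr = (r_meth[k] - r_co2[k]) * v_total[k]
        toc.append(toc[-1] + 0.5 * (f_prev + f_curr) * (t[k] - t[k - 1]))
    return toc
```

For a batch phase with r_meth = 0 and a constant r_CO2, the returned values decrease linearly, which reproduces the negative gradient of TOC described for Figure 5a.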
The soft sensor uses three distinct sets of model coefficients, one for each of the batch, transition, and fed-batch phases. For model calibration via MLR in the batch phase, TOC and V_base are used as inputs and the biomass amount X (determined offline as dry cell weight) as output.

FIGURE 4 Simplified representation of the hybrid-model-based soft sensor for biomass concentration c_X. The methanol reaction rate r_meth and the carbon dioxide emission rate r_CO2 are fed to the mechanistic model; carbon balancing is here used to calculate the formation rate of total organic carbon, r_TOC. The subsequent data-driven model uses the numerical integration of r_TOC, namely, TOC, together with the cumulative base feed, V_base, as inputs to calculate the amount of biomass X. Finally, X is divided by the total liquid volume inside the bioreactor, V_total, to calculate the biomass concentration c_X. Both the data-driven and the mechanistic parts can be carried out online
The prediction equation is formulated as follows, where b_0, b_1, and b_2 are the model coefficients:

$$X = b_0 + b_1 \cdot \mathrm{TOC} + b_2 \cdot V_{\mathrm{base}} \qquad (8)$$

In the transition phase, no significant cell growth or decline was observed, so b_0 was set to the value of X at the end of the batch phase (X_batchend) and b_1 and b_2 were set to 0. In the fed-batch phase, b_0 was set to X_batchend and b_1 and b_2 were determined analogously to the methods used in the batch phase.
The regression step in (8) refers to the biomass amount in the total liquid volume inside the bioreactor, V_total. To determine the biomass concentration c_X, the biomass amount X is divided by V_total, as in the following equation:

$$c_X = \frac{X}{V_{\mathrm{total}}}$$
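The phase-dependent switching of coefficients can be sketched in Python as follows. The function name, the dictionary layout, and the phase labels are hypothetical, and no coefficient values from the study are used. The sketch further assumes that, in the fed-batch phase, the inputs TOC and V_base are counted from the start of that phase so that the prediction continues from X_batchend without a jump; the text does not state this explicitly.

```python
def predict_biomass_concentration(phase, toc, v_base, v_total,
                                  coeffs, x_batch_end=None):
    """Biomass concentration from the linear model with phase-specific coefficients.

    coeffs["batch"] holds (b0, b1, b2); coeffs["fed_batch"] holds (b1, b2),
    because b0 is replaced by the biomass amount at the end of the batch phase.
    """
    if phase == "batch":
        b0, b1, b2 = coeffs["batch"]
    elif phase == "transition":
        b0, b1, b2 = x_batch_end, 0.0, 0.0   # no growth or decline assumed
    elif phase == "fed_batch":
        b1, b2 = coeffs["fed_batch"]
        b0 = x_batch_end
    else:
        raise ValueError(f"unknown phase: {phase}")
    x = b0 + b1 * toc + b2 * v_base          # biomass amount X
    return x / v_total                       # biomass concentration c_X
```

In an online setting, the phase argument would be supplied by the phase detection step, and x_batch_end would be stored once at the moment the end of the batch phase is detected.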

| Cross-validation approach for model calibration and validation
The model was calibrated and validated by a batch-wise cross-validation approach. The six data sets used for developing the bio-

The use of separate subsets for internal and external (holdout) validation (OECD, 2014) does not appear to be practicable because the total number of data sets that are available for model calibration and validation is too small (n = 6).

FIGURE 5 Illustration of the carbon balance for (a) the batch and (b) fed-batch phase for an exemplary process run. The carbon dioxide emission rate, r_CO2, and, in the fed-batch phase, additionally the methanol reaction rate, r_meth, are used to calculate the formation rate of total organic carbon, r_TOC, as in (6). Multiplication of r_TOC with V_total and numeric integration as in (7) result in the cumulative amount of total organic carbon, TOC

FIGURE 6 Online prediction of biomass concentration c_X during batch, transition, and fed-batch phase for an exemplary process run using the hybrid-model-based soft sensor. Both the batch and the fed-batch phase start with a lag phase after which cells grow exponentially (batch phase) or linearly (fed-batch phase). The two dashed, gray lines indicate the switches from batch to transition phase (39.6 hr) and from transition to fed-batch phase (46.5 hr), respectively
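A batch-wise cross-validation over six data sets can be enumerated as in the following Python sketch. The results refer to 15 combinations of n = 6 data sets; holding out two data sets at a time yields exactly 15 splits, but whether the study used four data sets for calibration and two for validation (or the reverse) is an assumption. The definitions of RMSE and R² below are the common ones and may differ in detail from those used in the study; all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def calibration_validation_splits(n_sets=6, n_validation=2):
    """All (calibration ids, validation ids) pairs for whole data sets."""
    ids = range(n_sets)
    return [(tuple(i for i in ids if i not in val), val)
            for val in combinations(ids, n_validation)]


def rmse(observed, predicted):
    """Root mean square error between offline and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Splitting by whole process runs rather than by individual samples keeps strongly autocorrelated points of one run from appearing in both calibration and validation.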

| Online prediction of biomass using the multilevel phase detection
The multilevel phase detection algorithm resulted in a 100% correct hit rate for the detection of the end of the batch phase. On average, the phase end was detected 2.56 measurements (corresponding to 77 s) before the minimum of dσ_CO2/dt, which was defined as the end of the batch phase, was reached.
The arithmetic means for R², RMSE, and NRMSE are calculated using the abovementioned 15 combinations of n = 6 data sets. The mean R² for the batch and fed-batch phases is .97 and .95, respectively; the mean R² for the entire process is .96. The mean RMSE for the batch and fed-batch phase is 1.14 and 5.05 g·L⁻¹, respectively; the mean RMSE for the entire process is 3.57 g·L⁻¹, which results in a mean NRMSE of 5.52%. The results for the model coefficients b_0, b_1, and b_2 in (8) are listed in Table 3. As described above, these model coefficients are used to determine the biomass amount X, which needs to be divided by V_total to calculate the biomass concentration c_X. V_total varies be- and is on average V_total = 12.56 L (n = 6) at the end of the cultivation. In the batch phase, the intercept b_0 describes the initial biomass from inoculation. As mentioned above, b_0 was replaced in the transition and fed-batch phase by X_batchend, which has a mean of 253.47 g (n = 6). The model coefficient for TOC, b_1, is negative in the batch phase because here the carbon balance in (6) is simplified to r_TOC = −r_CO2 (boundary for the balancing is the bioreactor system) and thus TOC in (7) decreases with increasing CO₂ emission and biomass, respectively. In the fed-batch phase, in which methanol is fed to the bioreactor, TOC correlates positively with X.
The model coefficient for V_base, b_2, is positive for both the batch and fed-batch phase. In the fed-batch phase, b_2 is more than 50% higher than in the batch phase, which means that, per unit of biomass formed, more than 50% more base is necessary to maintain the pH setpoint on glycerol compared to methanol. The soft sensor's model coefficients switch automatically depending on the current process phase. The differences in the model coefficients b_1 and b_2 between the individual process phases indicate the necessity for the adaptation of model coefficients with changing process phases.
The accuracy of the estimates of the model coefficients is given by the corresponding 95% confidence intervals, CI_.95 (Table 3). None of the CI_.95 contains the value zero, which is considered to be a primary indication that the model inputs are to a certain degree significant to the model output, biomass. The width of CI_.95 relative to the absolute value of the model coefficient is a further indicator for the quality of the regression and hence for the uncertainty of the soft sensor model (Fernandes et al., 2012). For b_0, the ratio of the width of CI_.95 to the absolute value of the model coefficient is 55%; for b_1, the ratio is 72% and 60% for the batch and fed-batch phase, respectively; for b_2, the ratio is 11% and 9% for the batch and fed-batch phase, respectively.
The contribution of the model coefficients b_0, b_1, and b_2 to the prediction of X is illustrated in Figure 7 for an exemplary process run. Here, each model coefficient's contribution was determined by disassembling the linear combination in (8) and dividing each model coefficient's prediction by the total model prediction. As expected, the contribution of b_0 starts with an initial value of 100% at the process start and decreases as the contributions of b_1 and b_2 increase. Until the end of the batch and fed-batch phase, the contribution of b_0 falls to values of 1.72% and 0.57%, respectively.
Since b_0 is replaced in the transition and fed-batch phase by X_batchend, the contributions to X_batchend (18.99% for b_1 and 79.29% for b_2) are used as offset for the contributions of b_1 and b_2 throughout the latter process phases. The contribution of b_1 initially rises to a maximum of 36.41% approximately at the end of the lag phase and reaches contributions of 18.99% and 18.26%, respectively, at the end of the batch and fed-batch phase. The contribution of b_2 starts to rise when base is first fed to the bioreactor (see Figure 1) and reaches values of 79.29% and 81.18%, respectively, at the end of the batch and fed-batch phase. It can be concluded from these results that, approximately after the end of the lag phase, V_base has a higher impact on biomass prediction than TOC. This result is consistent with the apparent high collinearity of V_base (see Figure 1) and c_X (see Figure 6).

TABLE 3 Results for model coefficients b_0, b_1, and b_2 in (8) and the corresponding 95% confidence intervals, CI_.95. In the transition and fed-batch phase, b_0 is set to the value of X at the end of the batch phase (X_batchend)

FIGURE 7 Contribution of model coefficients b_0, b_1, and b_2 to the prediction of biomass amount X during batch, transition, and fed-batch phase for an exemplary process run. The two dashed, gray lines indicate the switches from batch to transition phase (39.6 hr) and from transition to fed-batch phase (46.5 hr), respectively. The soft sensor updates its model coefficients automatically for the three distinct process phases
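The decomposition of a prediction into the contributions of the individual terms can be sketched in Python as follows. The function name is hypothetical, the input values in any call are placeholders rather than coefficients from the study, and the offset handling used for the transition and fed-batch phases is not reproduced.

```python
def term_contributions(b0, b1, b2, toc, v_base):
    """Percentage contribution of each term of the linear model to X."""
    terms = {"b0": b0, "b1": b1 * toc, "b2": b2 * v_base}
    total = sum(terms.values())              # total model prediction X
    return {name: 100.0 * value / total for name, value in terms.items()}
```

For example, a call with a negative b_1 and a negative TOC, as occurs in the batch phase, yields a positive contribution of the TOC term, and the three percentages always add up to 100.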

| CONCLUSIONS
As mentioned at the beginning of this paper, several challenges can arise when attempting to develop soft sensors. One of these is specific to P. pastoris bioprocesses with distinct process phases such as batch, transition, and fed-batch phase. The underlying principles of prediction models for biomass are related to the inherent biological relationships (Chen et al., 2004), which differ depending on the substrate used in the specific process phase. The fundamental differences in the metabolism of different carbon sources have a visible impact on CO₂ emission and the consumption of pH correction agent (see Figure 1), which are two of the main model inputs used in this study.
For multiphase processes with more than one substrate, this means that the probability of finding a single model that captures the information necessary for prediction of biomass is rather low.
This study demonstrates the application of a multilevel phase detection algorithm to determine the end of the batch phase (glycerol depletion) online. In every tested case, the algorithm provided the correct end time of the batch phase. The detection of this end time was used to trigger the transition phase and the subsequent methanol induction. The knowledge about the significantly reduced CO₂ emission that comes with glycerol depletion was effectively utilized. Specifically, the stoichiometric restrictions concerning the cumulative amount of supplied base (trigger 1) and emitted CO₂ (trigger 2) were used to increase robustness of the third trigger (timewise derivative of the off-gas CO₂ signal). The output of the phase detection algorithm was used to switch the parameters of the prediction model online. The prediction model was calibrated offline using a hybrid-model-based approach. The output of the mechanistic part (carbon balance) is fed to the data-driven part (MLR) to provide an accurate prediction of the biomass concentration. The process runs were conducted under the same operating conditions (initial glycerol concentration, constant setpoints for methanol, pH, dissolved oxygen, temperature, and pressure). However, the process runs and corresponding data sets used in this study were subject to variance of initial biomass concentration, which in turn resulted from the variability of the preculture. Further, errors in the methanol control, such as an initial overshoot or a deviation of the measured methanol concentration from the setpoint (Figure 1), occurred and additionally increased the variance between the data sets. Despite this variance between the data sets used, model evaluation resulted in a mean relative prediction error of 5.52% and R² of .96 for the entire process. These two evaluation criteria are of similar magnitude to those of other biomass soft sensors for P. pastoris fed-batch processes (Beiroti, Aghasadeghi, Hosseini, & Norouzian, 2019; Crowley, Arnold, Wood, Harvey, & McNeil, 2005; Fazenda et al., 2013; Surribas, Geissler, et al., 2006). In the approach presented here, however, the soft sensor is adaptable online to the different process phases, and no cost-intensive spectroscopic measurement system is necessary. The robustness of the soft sensor with regard to different process conditions (e.g., variation of methanol, pH, and temperature setpoint) was not in the scope of this study. These investigations are the subject of future research.
The main constraint of the presented soft sensor is that the prediction in the transition and fed-batch phase is directly dependent on the prediction result in the batch phase. This is due to the passing on of the biomass prediction at the end of the batch phase (X batchend ) as a start value for the prediction models of the subsequent phases.
The effect of error propagation can be visualized by considering the slight decrease of R² and increase of prediction error between batch and fed-batch phase. It should further be noted that the carbon balance in the individual phases depends on constant ratios of biomass formation, CO₂ emission, and, in the fed-batch phase, methanol metabolization. Longer periods of substrate limitation or metabolite inhibition would impede an accurate biomass prediction if these scenarios are not included in the data sets used for model calibration.
Knowledge-based relationships were combined with data-driven methodology in this study. No general statement can be made here about whether mechanistic, data-driven, or hybrid approaches are superior because the choice is strongly dependent on the available process knowledge and measurement systems (offline/online) as well as the number of data sets and data points (Solle et al., 2017).
However, in this study, the use of a hybrid approach appears to be suitable because it benefits from both of its components. This is due, on the one hand, to the availability of the necessary online measurement systems for capturing the information relevant for modeling biomass (off-gas CO₂, methanol, cumulative base feed) and, on the other hand, to the relatively small number of data sets (six) for model calibration and validation.
The transferability of the developed phase-dependent soft sensor to other fed-batch cultivations with different P. pastoris strains, control strategies, media, and process parameters must be investigated in future research. It is supposed that the presented