Narrow Operating Space Based on the Inversion of Latent Structures Model for Glycosylation Process

N-linked glycan distribution plays a significant role in the generation of therapeutic proteins. It is challenging to determine the operating conditions when developing a new biopharmaceutical product with the desired glycan distribution. Glycosylation is a highly complex nonlinear system, and a reliable first-principles model is difficult to develop because it relies heavily on experimentation. Our goal is to develop a nonlinear data-driven model and, based on this model, to find an appropriate operating space, comprising combinations of the process input variables, that ensures the desired product quality. A methodology is proposed based on the inversion of a nonlinear latent variable model (locality preserving projection to latent structures, LPPLS) to identify a subspace of the knowledge space. The normal operating points of the input variables are designed by LPPLS inversion, and the range of operating conditions is expanded around these points through a simultaneous prediction uncertainty analysis of the forward and inverse models. Finally, the operating space designated by LPPLS inversion is applied to a benchmark glycosylation model.


I. INTRODUCTION
Monoclonal antibodies, as end products of glycosylation, play an important role in biomedical science, since the products of the glycosylation process are directly associated with the therapeutic effects of the related proteins. Existing literature has demonstrated that the efficacy, in vivo activity, and immunogenicity of therapeutic proteins are affected by the structure of these end products [1]- [3]. The glycosylation reaction is an intricate, nonlinear process described by many differential equations [4]- [7]. Obtaining desired glycan distributions through proper operating conditions is therefore challenging for biopharmaceutical manufacturers.
The US Food and Drug Administration (FDA) has initiated Quality by Design (QbD), a new approach to drug development and production based on mathematical tools [8]. Finding a suitable range of operating conditions that assures the critical quality attributes requires a thorough understanding of the relationship between product quality and the input variables. To obtain the desired end products, it is crucial to guide glycosylation process development with insight into the tight interconnection between manufacturing process conditions and product quality [9]- [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Salman Ahmed.
The space of operating conditions can be determined through an appropriate model relating the process variables to product quality. A first-principles model is an effective way to predict quality or performance adequately in some chemical reactions [12]. It provides the kinetic order and reaction rate constants (prospective reaction conditions) and optimizes the desired outcome with an appropriate understanding of physical significance. Pantelides et al. pointed out that first-principles models are a reliable method for describing complex and nonlinear systems and are very useful for calculating acceptable ranges based on the assessment [13]. Design of experiments (DOE) studies are conducted to gain physiological insight and determine the acceptable input variable ranges [14]. Rathore and Winkle used the DOE method to identify the parameters for process characterization [15]. Chatzizacharia and Hatziavramidis performed a comparative study in which several different methods were used to find the operating conditions for various data features [16]. Schoberer systematically assessed the controllability of different types of specific glycoforms and glycans in the glycosylation process through a statistical design of experiments scheme and Analysis of Variance (ANOVA) [17].
Mathematical modeling has been encouraged by regulatory agencies to support the implementation of QbD paradigms [18]. A first-principles model provides a clear description of the process mechanism. However, it relies heavily on experiments, which makes the development of reliable first-principles models difficult for the pharmaceutical industry. Moreover, the experimental effort required by a fractional factorial design rises significantly when the number of inputs is large. Therefore, modeling in QbD is trending toward data-based rather than experience-based or experiment-based approaches. It benefits from many kinds of data modeling methods, such as multivariate statistical analysis, machine learning, and artificial intelligence, which systematically extract useful information from the data of previously developed products.
Latent variable models, including principal component analysis (PCA) [19] and projection to latent structures (PLS) [20], are tools specifically designed to analyze large amounts of correlated data. They optimally extract the underlying factors acting on a system to describe the relationship between the quality outputs and the inputs. Latent variable model inversion was first presented by Jaeckle and MacGregor to determine operating conditions [21]. A general framework for latent variable model inversion for the design and manufacturing of new products was subsequently given [22]. By inverting a latent variable model, one obtains a set of inputs that enables a process to produce an assigned output. If the model were accurate enough to be unaffected by uncertainty, it could be inverted directly to find an appropriate operating point without changing the product quality. However, since all latent variable models are subject to uncertainty, that uncertainty is inevitably back-propagated to the designed inputs and to the designated operating space when the models are inverted [23], [24]. For example, Facco et al. used PLS model inversion to locate the operating space inside the given knowledge space under the uncertainty of the PLS model predictions [25].
Unlike processes in other manufacturing industries, glycosylation is a non-template-driven cellular process that is exceedingly high-dimensional, complex, and nonlinear. Because mechanistic knowledge is insufficient to develop a high-fidelity mathematical model of glycosylation, a data-based model is a good choice for the development of new drugs. Traditional PLS is a linear modeling technique with limited representational power when process variables are related in a strongly nonlinear way. Several nonlinear PLS methods exist, such as Kernel Partial Least Squares (KPLS) [26], Quadratic Partial Least Squares (QPLS) [27], and Neural Network Partial Least Squares (NNPLS) [28], but these approaches suffer, to varying degrees, from high computational complexity. Locally weighted regression, which goes back to the seminal work of Cleveland [29], [30], is a good solution for dealing with nonlinearity, and it has been elaborated into many other forms. For example, [30] proposed an incremental form of locally weighted projection regression in which PLS projection was incorporated on locally weighted inputs. Similarly, motivated by the distinct advantage of locality preserving projection (LPP) in nonlinear dimensionality reduction, two nonlinear PLS techniques were proposed, namely Global and Local Partial Least Squares (QGLPLS) [31] and Locality Preserving Partial Least Squares (LPPLS) [32]. In particular, LPPLS uses the LPP technique in place of principal component extraction in PLS, which makes a nonlinear PLS model simpler to build and easier to apply in actual processes.
In this study, we restrict our attention to the intracellular process of glycosylation at the micro-scale, before tackling the practical problem of finding appropriate macro-scale process variables to control glycosylation [33], [34]. Here, the LPPLS method is used to provide insight into the internal nonlinear interaction between the inputs (bio-enzymes) and the outputs (various glycan classes or specific glycoforms) of the glycosylation process. By inverting the LPPLS model, one can determine appropriate process operating conditions that yield an assigned pharmaceutical product. In the existing literature, the operating space is found using only the null space and the uncertainty of the forward model's output prediction. Different from current approaches, we locate a narrow operating space inside the given knowledge space under the prediction uncertainty of both the forward and the inverse LPPLS models. When the output uncertainty is back-propagated to the calculated inputs of the inverse model, it doubles the prediction error and enlarges the experimental space. Computing the input uncertainty directly from the inverse model brackets the operating space more tightly than the null space approach.
The highlights of our work are as follows. (1) A methodology is presented to determine the operating space for a new biopharmaceutical product manufactured by the in vivo protein glycosylation process. The methodology relies on a latent variable model built from historical databases of previously developed products that are similar to the new one, whereas existing approaches for glycosylation are based on first-principles models and analysis of variance methods.
(2) A new application of PLS and LPPLS model inversion is proposed to interpret the nonlinear relationship between the critical product attributes and the critical operating variables, and LPPLS model inversion is derived to obtain the operating space. Because glycosylation is an exceedingly high-dimensional, complex, and nonlinear cellular process, the LPPLS model shows higher prediction precision than the traditional PLS model.
(3) The operating space is determined under the prediction uncertainty of both the forward and inverse LPPLS models. Because the proposed method avoids back-propagating the output uncertainty to the calculated inputs of the inverse model, the input uncertainty obtained directly from the inverse model narrows the operating space relative to the traditional null space approach.
VOLUME 8, 2020
The proposed LPPLS inversion methodology is applied to an illustrative glycosylation model that is faithfully represented by the model of Krambeck and Betenbaugh (KB2005 model) [5]. The designed protein qualities are characterized by eight glycan classes as well as three specific glycoforms typically found in recombinant biologics produced with Chinese Hamster Ovary (CHO) cells. Nine specific glycosylation enzymes are selected as the input variables because they have direct, indirect, and interactive effects on the final glycosylation products. The operating space based on the LPPLS inversion model is narrow and reliable, and the statistics of the protein qualities obtained under the designed operating conditions in this operating space are very close to the desired qualities.

II. MATHEMATICAL BACKGROUND
PLS is an effective statistical tool for analyzing highly correlated data: it reduces the vast information into a few meaningful quantities and interprets the implicit relationships. However, because PLS extracts the maximum correlation between the quality variables and the process variables (the global structural information), it performs poorly in systems with strong local nonlinearity. LPP is a recently developed dimensionality reduction algorithm [35] that converts a global nonlinear problem into a combination of multiple local linear problems by introducing local structural information. LPPLS, a statistical model combining LPP and PLS, uses the LPP technique in place of the role that PCA plays in PLS [32]. LPPLS has excellent nonlinear modeling capability, so it is suitable for analyzing the highly nonlinear and intricate glycosylation process. Here, we use LPPLS to extract the maximum relevant information between the glycosylation enzymes and the glycan outputs of the glycosylation process.

A. THE LPPLS MODEL
Normalization linearly maps the original data into the range [0, 1]: $X_{norm} = (X - X_{min})/(X_{max} - X_{min})$, where $X_{norm}$ is the normalized data set, X is the original data, and $X_{max}$ and $X_{min}$ are the maximum and minimum values of the original data set. This method places all variables on an equal scale.
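As a minimal sketch of this normalization, assuming it is applied column by column (one scaling per process variable), the following Python/NumPy code maps data to [0, 1] and keeps the per-variable minima and ranges so that scaled quantities, such as an inverted X_new, can be mapped back to physical units. The function names are illustrative, and the guard for constant columns is an added assumption.

```python
import numpy as np


def minmax_normalize(X):
    """Column-wise min-max scaling: X_norm = (X - X_min) / (X_max - X_min)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    # Assumption: constant columns are left at 0 instead of dividing by zero.
    x_range = np.where(x_range == 0, 1.0, x_range)
    return (X - x_min) / x_range, x_min, x_range


def minmax_denormalize(X_norm, x_min, x_range):
    """Map scaled values (e.g. an inverted X_new) back to the original units."""
    return np.asarray(X_norm, dtype=float) * x_range + x_min


X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [2.0, 30.0]])
X_norm, x_min, x_range = minmax_normalize(X)
print(X_norm)
```

Keeping x_min and x_range is what allows inputs computed in the scaled space to be reported in the original enzyme units.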
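Consider two normalized data sets, X (n observations of the input variables) and Y (the corresponding n observations of the output variables). The problem is to map X and Y to low-dimensional subspace vectors by a projection transformation.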
LPPLS is a multivariate regression technique used to correlate inputs and responses. It can solve nonlinear problems by local linear techniques. The LPPLS model relies on the ability of LPP to preserve local structure, while maintaining the PLS property of maximally extracting relevant information.
Local features are extracted from the normalized X and Y: the principal components of the input space X and the output space Y are extracted simultaneously by LPP to preserve local characteristics. Both $S_x$ and $S_y$ are n × n adjacency weighting matrices, whose element $S_{ij}$ denotes the neighborhood relationship between $x_i$ and $x_j$ (or $y_i$ and $y_j$), and $\delta_x$ is the neighbor parameter.
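The adjacency weighting and the locality-preserving projection can be illustrated with the standard LPP formulation, sketched below under stated assumptions: a symmetric k-nearest-neighbor graph with heat-kernel weights $\exp(-\|x_i - x_j\|^2/\delta)$, the graph Laplacian L = D − S, and the generalized eigenproblem $X^T L X a = \lambda X^T D X a$, whose eigenvectors with the smallest eigenvalues give the projection directions. The exact definitions of S_x, S_y, and the neighbor parameter in LPPLS may differ; k, delta, and the small ridge term are hypothetical choices.

```python
import numpy as np
from scipy.linalg import eigh


def heat_kernel_adjacency(X, delta, k=5):
    """Symmetric k-NN graph with heat-kernel weights exp(-||x_i - x_j||^2 / delta).

    Assumption: standard LPP weighting; the S_x / S_y of LPPLS may be defined differently.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    n = X.shape[0]
    S = np.zeros((n, n))
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    for i in range(n):
        S[i, nn[i]] = np.exp(-d2[i, nn[i]] / delta)
    return np.maximum(S, S.T)  # i ~ j if either is among the other's neighbours


def lpp(X, delta, n_components=2, k=5, ridge=1e-9):
    """Locality preserving projection: solve X^T L X a = lam X^T D X a."""
    X = np.asarray(X, dtype=float)
    S = heat_kernel_adjacency(X, delta, k)
    D = np.diag(S.sum(axis=1))  # degree matrix
    L = D - S                   # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])  # small ridge keeps B positive definite
    _, vecs = eigh(A, B)         # generalized symmetric problem, eigenvalues ascending
    W = vecs[:, :n_components]   # smallest eigenvalues best preserve neighbourhoods
    return X @ W, W, S


rng = np.random.default_rng(0)
X = rng.random((50, 9))  # e.g. 50 normalized samples of 9 inputs (synthetic)
T, W, S = lpp(X, delta=1.0, n_components=2)
print(T.shape)
```

Choosing the smallest rather than the largest eigenvalues is what distinguishes LPP from PCA: it favors directions along which neighboring samples stay close after projection.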
A number of latent variables $T = (t_1, \cdots, t_A)$ are defined to represent a low-dimensional space through the LPPLS locality preserving projections. The neighboring mappings $X_L$ and $Y_L$ of the scaled and mean-centered data are decomposed as follows:
$X_L = T P^{T} + \bar{X}, \quad Y_L = T Q^{T} + \bar{Y}$,
where $T = (t_1, \cdots, t_A)$ contains the latent variables, and $P = [p_1, p_2, \cdots, p_A]$ and $Q = [q_1, q_2, \cdots, q_A]$ are the loading matrices for $X_L$ and $Y_L$, respectively. The score matrix T can be computed directly from eq. (7), where $T_0$ is the core matrix of the PLS method, $X = S^{-1/2}\tilde{X}$, and $\bar{X}$ and $\bar{Y}$ represent the residual spaces of the input and output, respectively.
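To make the outer decomposition concrete, the sketch below uses ordinary linear PLS1 (NIPALS, single output) on mean-centered data and returns T, P, q, and the residuals, so that $X = TP^T + \bar{X}$ and $y = Tq + \bar{y}$ hold exactly. This is a stand-in, not the LPPLS algorithm: LPPLS decomposes the locality-mapped X_L and Y_L in the same outer form but extracts the scores differently.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Linear PLS1 via NIPALS on mean-centred data: X = T P^T + E, y = T q + f."""
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).reshape(-1).copy()
    n, m = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    W = np.zeros((m, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)   # weight vector
        t = X @ w                # score vector
        tt = t @ t
        p = X.T @ t / tt         # X loading
        q[a] = y @ t / tt        # y loading
        X -= np.outer(t, p)      # deflate X
        y -= q[a] * t            # deflate y
        T[:, a], P[:, a], W[:, a] = t, p, w
    return T, P, W, q, X, y      # the last two are the residuals E and f


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 9))
X -= X.mean(axis=0)
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=100)
y -= y.mean()
T, P, W, q, E, f = pls1_nipals(X, y, n_components=2)
print(T.shape, P.shape, q)
```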

B. FORWARD MODEL UNCERTAINTY
The predictions of the LPPLS model are subject to mismatch arising from model calibration. The mean squared error of prediction is used to characterize the prediction uncertainty. The confidence interval (CI) on the observed output $y_{obs}$ based on the LPPLS forward model is defined as
$y_{obs} = \hat{y}_{obs} \pm t_{N-d,\,\delta/2}\, s$,
where $\hat{y}_{obs}$ is the estimate of $y_{obs}$, N is the number of LPPLS model calibration samples, d is the number of degrees of freedom of the model, and δ is the significance level for the CI. The standard deviation s can be calculated as
$s = SE\sqrt{1 + h_{obs}}$,
where SE is the standard error of calibration and $h_{obs}$ is the leverage of the observation:
$SE = \sqrt{\sum_{n=1}^{N} (y_n - \hat{y}_n)^2 / (N - d)}, \quad h_{obs} = \mathbf{t}_{obs}^{T} (T^{T} T)^{-1} \mathbf{t}_{obs}$,
where $y_n$ is the n-th measured output, $\hat{y}_n$ is the n-th estimated output of the model calibration dataset, $\mathbf{t}_{obs}$ is the score vector of the observation, and T is the calibration score matrix.
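A minimal sketch of the forward-model CI follows, assuming the common form $\hat{y}_{obs} \pm t_{N-d,\delta/2}\,SE\sqrt{1+h_{obs}}$ with the leverage computed from the calibration scores. The function name and arguments are hypothetical; SciPy's `stats.t.ppf` supplies the t quantile, and the toy numbers are made up.

```python
import numpy as np
from scipy import stats


def forward_prediction_ci(y_cal, y_cal_hat, T_cal, t_obs, y_obs_hat, d, delta=0.05):
    """100(1 - delta)% CI: y_obs_hat +/- t_{N-d, delta/2} * SE * sqrt(1 + h_obs).

    Assumptions: leverage h_obs = t_obs^T (T^T T)^{-1} t_obs on calibration scores
    T_cal, and d = model degrees of freedom (e.g. the number of latent variables).
    """
    y_cal = np.asarray(y_cal, dtype=float)
    y_cal_hat = np.asarray(y_cal_hat, dtype=float)
    N = y_cal.size
    SE = np.sqrt(np.sum((y_cal - y_cal_hat) ** 2) / (N - d))  # standard error of calibration
    t_obs = np.asarray(t_obs, dtype=float)
    h_obs = t_obs @ np.linalg.solve(T_cal.T @ T_cal, t_obs)    # leverage of the observation
    s = SE * np.sqrt(1.0 + h_obs)
    t_crit = stats.t.ppf(1.0 - delta / 2.0, N - d)
    return y_obs_hat - t_crit * s, y_obs_hat + t_crit * s, s


# Toy calibration set with 2 latent variables (made-up numbers)
T_cal = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.5, -1.0]])
y_cal = np.array([1.0, 2.0, 3.1, -0.4, -1.6])
y_cal_hat = T_cal @ np.array([1.0, 2.0])
lo, hi, s = forward_prediction_ci(y_cal, y_cal_hat, T_cal, [0.2, 0.3], 0.8, d=2)
print(lo, hi)
```

Observations far from the calibration scores have a larger leverage and therefore a wider interval, which is why the CI is evaluated per observation rather than once for the whole model.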

III. NARROWING THE OPERATING SPACE BASED ON LPPLS INVERSION
A. LPPLS MODEL INVERSION
Process design based on latent variable models was introduced by Jaeckle and MacGregor [21], [37]. In general, latent variable models are used to predict outputs from the relationships established between variables [36]. Conversely, latent variable regression model inversion provides helpful guidance for developing new products: the model is built from existing data and used to assist product and process design [21]. In its inverse form, the model is used to calculate an unknown input combination X_new and to specify a feasible target region for the operating conditions that yields a desired output y_des. The LPPLS relationship between input and output is given by Eqs. (9) and (10), where $P^T$ represents the correlation matrix between input and output.
Here, products with a single quality attribute are analyzed. Suppose the relationship between y and X in the LPPLS model is given by Eqs. (9) and (10). Assuming that the desired response y_des is completely defined, LPPLS model inversion can be applied to calculate the operating conditions X_new corresponding to the desired quality y_des.
Consider the following two cases:
• The dimension of the new operating conditions X_new is the same as that of y. In this case, X_new is obtained by directly inverting the model.
• The dimension of X_new is larger than that of y. The inversion then leads to an underdetermined system of equations, which may have an infinite number of solutions. The (rank(X) − rank(y))-dimensional null space is expressed through G and a free parameter vector λ, where G are the left singular vectors of $S_A$ in Eq. (20) associated with the rank(X) − rank(y) zero singular values. Selecting λ provides a set of possible conditions within the range of past operating conditions.
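For a single quality variable, the direct inversion and the null space can be sketched as follows, assuming a linear latent model in scaled units with y = t q (q holds the A output loadings) and x = t Pᵀ. The minimum-norm score that reproduces y_des is t_new = y_des q/(qᵀq), and the remaining right singular vectors of q (treated as a 1 × A matrix) span the directions along which the prediction does not change. This is a sketch in the spirit of Jaeckle and MacGregor's inversion; the construction of G from S_A in the text may be arranged differently, and the loadings below are made-up numbers.

```python
import numpy as np


def invert_single_y(q, P, y_des):
    """Direct inversion for one quality variable of a latent model y = t @ q, x = t @ P.T.

    Returns the minimum-norm score t_new (t_new @ q == y_des), its inputs x_new,
    and G, whose rows span the score null space (directions with zero effect on y).
    """
    q = np.asarray(q, dtype=float)
    t_new = y_des * q / (q @ q)
    _, _, Vt = np.linalg.svd(q[None, :])  # q as a 1 x A matrix of rank 1
    G = Vt[1:]                            # right singular vectors beyond the rank
    return t_new, t_new @ P.T, G


def null_space_inputs(t_new, G, P, lambdas):
    """Scores and inputs along a one-dimensional null space: t = t_new + lambda * g."""
    t_null = t_new[None, :] + np.outer(lambdas, G[0])
    return t_null, t_null @ P.T


rng = np.random.default_rng(2)
P = rng.normal(size=(9, 2))      # hypothetical X loadings, 9 inputs, A = 2
q = np.array([0.8, -0.5])        # hypothetical y loadings
t_new, x_new, G = invert_single_y(q, P, y_des=0.3)
lambdas = np.linspace(-0.4, 0.4, 9)
t_null, X_cand = null_space_inputs(t_new, G, P, lambdas)
print(np.round(t_null @ q, 12))  # every point predicts y_des
```

Sweeping λ over [−0.4, 0.4], the range used in the case study, moves the score along the null space while the predicted quality stays at y_des; only the implied inputs change.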

B. UNCERTAINTY OF LPPLS MODEL INVERSION
The LPPLS inverse model gives the possible input space X_new, which may differ from the actual value X corresponding to y_des in the real process. The difference comes from uncertainty in the model. The prediction uncertainty of the forward model has been explored by estimating the standard deviation s of the prediction error [23]. This involves two steps: first, an estimate of the standard deviation s of the prediction error is obtained; then, assuming that the estimation error follows a t-distribution, a confidence interval is established. However, it is unreasonable to focus solely on the uncertainty of the output y rather than on the input space X, because the modeling uncertainty is doubled, and the error increased, after back-propagation to the inputs. It is therefore necessary to calculate the input uncertainty directly from the inverse model.
The input variance can be calculated from Eq. (17); note that the main sources of y_des uncertainty are the parameters $S_x$, $S_y$, P, and Q estimated during model calibration. To facilitate the calculation of the variance in Eq. (21), the coefficient of variation is defined as
$CV(x) = \sqrt{\mathrm{Var}(x)} / E(x)$,
where E(x) is the expectation; that is, CV(x) is the standard deviation of the random variable rescaled to unit mean. The uncertainty of the X space comes from the model parameters and is calculated from the training data X and y. The standard deviation s of the prediction error is then determined by the errors of the model parameters. According to the uncertainty theory proposed in [38], the estimated error follows a t-distribution, and the 100 × (1 − δ)% confidence interval (CI) on X_new is calculated in Eq. (48) [39].
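The analytical propagation through the model parameters is not reproduced here. As an assumption-laden substitute, the sketch below estimates the spread of the inverted inputs empirically: it refits a simple linear model y ≈ Xb on bootstrap resamples of the calibration data, inverts each fit with the minimum-norm solution x_new = y_des b/(bᵀb), and forms a t-based CI per input from the bootstrap standard deviation. The linear model, the bootstrap, and the N − m degrees of freedom are illustrative choices, not the LPPLS formulas; the CV helper follows the definition given in the text.

```python
import numpy as np
from scipy import stats


def coefficient_of_variation(samples, axis=0):
    """Sample estimate of CV(x) = sqrt(Var(x)) / E(x)."""
    samples = np.asarray(samples, dtype=float)
    return samples.std(axis=axis, ddof=1) / samples.mean(axis=axis)


def bootstrap_input_ci(X, y, y_des, n_boot=500, delta=0.05, seed=0):
    """Empirical 100(1 - delta)% CI on inverted inputs from parameter uncertainty.

    Assumption: a linear model y ~ X b (refitted on bootstrap resamples) stands in
    for LPPLS; each fit is inverted with the minimum-norm x = y_des * b / (b @ b).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, m = X.shape
    x_boot = np.empty((n_boot, m))
    for r in range(n_boot):
        idx = rng.integers(0, N, size=N)  # resample calibration rows with replacement
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        x_boot[r] = y_des * b / (b @ b)
    b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    x_new = y_des * b_full / (b_full @ b_full)
    s = x_boot.std(axis=0, ddof=1)           # per-input standard deviation
    t_crit = stats.t.ppf(1.0 - delta / 2.0, N - m)
    return x_new, x_new - t_crit * s, x_new + t_crit * s, s


rng = np.random.default_rng(3)
X = rng.normal(size=(100, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=100)
x_new, x_lo, x_hi, s = bootstrap_input_ci(X, y, y_des=1.0, n_boot=200)
print(np.column_stack([x_lo, x_new, x_hi]).round(3))
```

The point of the sketch is that the interval is placed on the inputs themselves, rather than on the output and then back-propagated.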

C. DETERMINE THE OPERATING SPACE
Suppose that X_new is a series of inputs that can achieve the desired quality y_des in a complex pharmaceutical process, where X_new may consist of several different input combinations. If the model were accurate enough to have no uncertainty, moving the score vector T_new obtained by model inversion along the null space would not affect product quality, and the operating space corresponding to the product y_des would be identified as the null space. However, latent variable models are affected by prediction uncertainty, which also affects the null space calculation itself.
The experimental effort could be significantly reduced if the experimental domain were restricted to the subspace within which the operating space is likely to lie. Facco et al. [25] considered that the prediction uncertainty of the forward model is back-propagated to the calculated inputs, which enlarges the operating space to a region around the null space. Although this operating space is conveniently narrower than the knowledge space (the entire space of knowledge from the available historical dataset), it still involves a large amount of computation and a large number of possible input combinations.
We propose a new method that calculates the uncertainty of the inputs directly from the inverse model. When considering input uncertainty, we use the same y_des and project it onto T_new in the null space. This narrows the operating space to the segment of the null space related to y_des and can improve experimental efficiency.
The narrowed operating space is obtained by the following steps, adapted from [25].
(1) Evaluate the uncertainty of y_des according to the method of Facco et al. [25]. The significance level is δ = 0.05, which corresponds to a confidence level of 95%. The probability density function of the t-distribution centered on y_des is shown in Figure 1(a).
(2) The score t_new corresponding to y_des, projected onto the space of the first two latent variables, is represented by a small triangle. The null spaces are calculated and expanded into the red region by the 95% CI for y_des, which is highlighted in Figure 1(b).
(3) The 95% confidence interval on X_new is shown in Figure 1(c); the confidence interval of the inputs lies along the null space.
(4) The narrowed operating space lies in the intersection between the null-space region obtained from the 95% CI of y_des (from the forward model) and the null-space region obtained directly from the 95% CI of X_new (from the inverse model).
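The intersection in step (4) can be sketched on a grid of the first two latent variables, assuming the forward-model region is the band |t q − y_des| ≤ hw_y around the null space and the inverse-model region is the box of input CIs mapped back through x = t Pᵀ. All names and numbers are hypothetical; in practice hw_y and the input bounds would come from the forward and inverse uncertainty analyses.

```python
import numpy as np


def narrowed_operating_space(t_new, q, P, y_des, hw_y, x_lo, x_hi, span=1.0, n=201):
    """Grid the first two latent variables and keep score points that lie in BOTH
    (a) the forward-model band |t @ q - y_des| <= hw_y around the null space and
    (b) the inverse-model box x_lo <= t @ P.T <= x_hi on every input.
    """
    g1 = np.linspace(t_new[0] - span, t_new[0] + span, n)
    g2 = np.linspace(t_new[1] - span, t_new[1] + span, n)
    T1, T2 = np.meshgrid(g1, g2)
    T = np.column_stack([T1.ravel(), T2.ravel()])  # candidate score vectors
    in_forward = np.abs(T @ q - y_des) <= hw_y
    X_cand = T @ P.T                                # candidate inputs
    in_inverse = np.all((X_cand >= x_lo) & (X_cand <= x_hi), axis=1)
    keep = in_forward & in_inverse
    return T[keep], X_cand[keep]


q = np.array([1.0, 0.5])   # hypothetical y loadings
P = np.eye(2)               # hypothetical loadings: inputs equal scores
y_des = 1.0
t_new = y_des * q / (q @ q)
T_keep, X_keep = narrowed_operating_space(
    t_new, q, P, y_des, hw_y=0.1, x_lo=t_new - 0.2, x_hi=t_new + 0.2)
print(len(T_keep), "grid points in the narrowed space")
```

Either region alone is larger; keeping only points that satisfy both conditions is what narrows the space to a short segment around the null space near t_new.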

IV. CASE STUDY
A. MATHEMATICAL MODEL OF GLYCOSYLATION PROCESS
Glycosylation is a post-translational modification process, at both the cellular and protein levels, in which a carbohydrate glycan is added to a protein in the endoplasmic reticulum and Golgi apparatus of a cell. Glycoproteins migrate through the four compartments of the Golgi in sequence and encounter various enzymes that generate different specific glycoforms. Significant efforts have been made to understand the roles [41] and mathematical models [4]- [7] of glycosylation. Various oligosaccharide precursors are attached to the glycosylation sites on the glycoprotein surface through the action of glycosylation enzymes. The sequential enzymatic reactions of the glycosylation process are described by a reaction network with a set of reaction rules. The mathematical model used to test our method was previously developed by Krambeck and Betenbaugh (KB2005 model) [5]. This model accounts for the effects of 11 known enzymes and 4 donor substrate concentrations. The complete KB2005 reaction network consists of 20 reaction rules and generates 22,871 glycosylation reactions and 7,565 oligosaccharide structures [5]. The first few generations of the reaction network are illustrated in Figure 2; the complete network would be unwieldy to depict graphically. To improve the computational efficiency of the modeling and data analysis in section 3, the KB2005 model is simplified by deleting several unnecessary enzymes, leaving 365 reaction networks and 1460 enzyme reactions. The calibration and validation matrices (the historical data) are simulated in MATLAB. The historical data set includes 100 observations of nine input glycosylation enzymes (ManI, ManII, FucT, GnTI, GnTII, GnTIV, GnTV, GalT, SiaT) and of one response variable (one of the glycan class outputs S0, S1, S3, G1, G2, G3, G4, F1, or one of the specific glycoforms A1F, A2G1F, A2G2S1F).
Each Golgi compartment was modeled as a well-mixed reactor. At steady state, the glycoforms satisfy the following equation [5]:
$P_{ij} = P_{i,j-1} + \tau_j r_{ij}$ (27)
where $P_{ij}$ is the concentration of glycoform i in Golgi compartment j (j = 1, ..., 4) and $\tau_j$ is the residence time of compartment j. For the simplified model, i = 1, ..., 365. $r_{ij}$ is the rate of production of glycoform i in compartment j; $K_m$, $k_f$, and $K_{md}$ are constants representing enzyme kinetic parameters; UDP-S is the concentration of nucleotide sugar donors; and $E_t$ is the enzyme concentration. These nonlinear equations are solved to obtain the concentration of each glycoform i in the four Golgi compartments.
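The compartment recursion in Eq. (27) can be illustrated on a deliberately tiny, hypothetical network: one enzyme converting species A into B with Michaelis–Menten kinetics in four compartments in series. Each compartment's steady state is the root of P_j − P_{j−1} − τ_j r(P_j) = 0, solved here with SciPy's `fsolve`. This is not the KB2005 rate law (which also involves K_md and the UDP-S donor concentration); the kinetic values and per-compartment enzyme levels are made up.

```python
import numpy as np
from scipy.optimize import fsolve


def toy_rates(P, kf, Et, Km):
    """Net formation rates for a hypothetical A -> B step (Michaelis-Menten)."""
    v = kf * Et * P[0] / (Km + P[0])
    return np.array([-v, v])


def golgi_steady_state(P0, tau, kf, Et, Km):
    """Solve P_j = P_{j-1} + tau_j * r(P_j) for compartments j = 1..len(tau) in series."""
    P_in = np.asarray(P0, dtype=float)
    profiles = []
    for tau_j, Et_j in zip(tau, Et):
        # Default arguments freeze this compartment's inlet, residence time and enzyme level.
        def residual(P, P_in=P_in, tau_j=tau_j, Et_j=Et_j):
            return P - P_in - tau_j * toy_rates(P, kf, Et_j, Km)
        P_in = fsolve(residual, P_in)  # start from the inlet composition
        profiles.append(P_in)
    return np.array(profiles)  # one row per compartment


profiles = golgi_steady_state([1.0, 0.0], tau=[1.0] * 4, kf=1.0, Et=[0.5] * 4, Km=0.5)
print(profiles.round(4))
```

Because the outlet of compartment j is the inlet of compartment j + 1, the equations can be solved sequentially rather than as one large system; with hundreds of species the same structure applies, only the rate vector grows.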
Designing operating conditions is challenging because of the complex nature of glycosylation. Krambeck and Betenbaugh [5] developed a first-principles model to gain a better understanding of the whole glycosylation process, but the model has the disadvantage of having too many parameters in a fairly complex network. In 2014, Amand et al. [34] systematically and judiciously assessed the controllability of various classes of glycans and of specific glycoforms in the glycosylation process using a statistical design of experiments (DOE) scheme. Reference [33] establishes a comprehensive strategy for effective glycosylation control, in which the glycan distribution is directed to the desired state by designing appropriate manipulated variables. These studies showed that the enzymes are the critical conditions, and that the process should operate within a designated space of these conditions to yield a product with the desired quality characteristics.
Instead of relying heavily on experiments, we use latent variable model inversion to determine the operating space, that is, the range of operating conditions for a new specific product.

B. PLS AND LPPLS MODEL COMPARISON
Latent variable models are first built from the calibration data sets to evaluate the relationship between input and output. The number of latent variables is determined by a scree test, so that the model explains a sufficiently large fraction of the variance of both the product quality and the input variables [40]. Using 2 latent variables, the model explains 96.1% of the variance of y.
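The quantity behind this choice of the number of latent variables can be computed as the cumulative fraction of y variance explained after each component. The sketch below does so with linear PLS1 (NIPALS) as a stand-in for LPPLS, on synthetic data; plotting the result against the number of components gives the curve from which an elbow would be read.

```python
import numpy as np


def pls1_cumulative_r2y(X, y, max_components):
    """Cumulative fraction of y variance explained by 1..max_components PLS1 components."""
    X = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    y = np.asarray(y, dtype=float) - np.mean(y)
    ss_total = y @ y
    r2 = []
    for _ in range(max_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        X = X - np.outer(t, X.T @ t / tt)  # deflate X with loading p = X^T t / (t^T t)
        y = y - (y @ t / tt) * t           # deflate y
        r2.append(1.0 - (y @ y) / ss_total)
    return np.array(r2)


rng = np.random.default_rng(4)
X = rng.normal(size=(100, 9))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=100)  # synthetic, mildly nonlinear
r2 = pls1_cumulative_r2y(X, y, max_components=5)
print(np.round(r2, 3))
```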
In the LPPLS model, the parameter δ adjusts the adjacency matrices $S_x$ and $S_y$, and this adjustment affects the predictive ability of the model. Here, δ = 0.1 is determined from the historical data by trial and error. The relative error is calculated as $(y - y_{pre})/y$, where $y_{pre}$ is the predicted value of y. Table 1 shows the performance of the PLS and LPPLS models on the calibration and validation data, testing the effect of the operating variables x on the glycosylation responses. The mean validation SSE over these outputs is reduced from 11.64 with PLS to 9.56 with LPPLS, which indicates that the calibration and validation accuracy of LPPLS is higher than that of PLS. This shows that the LPPLS method fits the nonlinear relationship between the inputs and outputs well, with a lower SSE for nearly every glycan or glycoform response.
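The two error measures used in this comparison are simple to compute; the sketch below shows them on made-up numbers (not the values of Table 1).

```python
import numpy as np


def relative_error(y, y_pre):
    """Element-wise relative error (y - y_pre) / y."""
    y = np.asarray(y, dtype=float)
    return (y - np.asarray(y_pre, dtype=float)) / y


def sse(y, y_pre):
    """Sum of squared errors between measured and predicted responses."""
    r = np.asarray(y, dtype=float) - np.asarray(y_pre, dtype=float)
    return float(r @ r)


y_val = np.array([20.0, 22.0, 25.0])     # hypothetical validation responses
y_pls = np.array([21.0, 21.0, 26.0])     # hypothetical PLS predictions
y_lppls = np.array([20.5, 21.5, 25.5])   # hypothetical LPPLS predictions
print(sse(y_val, y_pls), sse(y_val, y_lppls))  # 3.0 0.75
print(relative_error(y_val, y_lppls))
```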
Figures 3 and 4 show the predictive ability of the PLS and LPPLS models for the glycan response S0 of the glycosylation process. The comparison shows that LPPLS generalizes better than PLS in dealing with this nonlinear system.

C. MODEL INVERSION AND RESULT
In this section, the operating conditions obtained from the inverse models are presented. We use model inversion to find the X_new related to the desired output, and we compare the two methods through the relative error between y_des and the new output y_new (the result of simulating the mechanistic glycosylation model with X_new). Table 2 gives the desired glycosylation output y_des and the values of y_new corresponding to the operating conditions calculated from the PLS and LPPLS models, respectively. The relative error is calculated as $error = (y_{new} - y_{des})/y_{des}$. Table 2 clearly demonstrates that the values of y_new obtained with the operating conditions X_new from both PLS and LPPLS are similar to y_des, and that y_new from LPPLS is closer to y_des than that from PLS. This result indicates that LPPLS is more precise in searching for proper operating conditions. PLS may be limited when process variables are associated in a strongly nonlinear manner, and LPPLS appears more suitable for the glycosylation reaction process because of its locality-preserving property. Our previous work [32] explores the advantages of LPPLS modeling for nonlinear problems and compares its performance with other methods.

D. THE OPERATING SPACE BASED ON PLS MODEL AND LPPLS MODEL
To obtain the operating conditions corresponding to the desired product quality, we design the space of operating conditions directly from the model uncertainty, with the significance level of the confidence interval set to δ = 0.05. Table 3 gives the corresponding operating conditions and their confidence intervals for the desired product quality. It can be [...] 1.5], which brackets a fraction of the entire glycosylation experiment space. The designated space can greatly reduce the cost of the experiments.
As discussed previously, LPPLS appears to be a more appropriate way to find operating conditions. When comparing the average standard deviations obtained with PLS and LPPLS in Table 3, it is worth noting that the values of the average standard deviation s are related to the glycosylation mechanism: they reflect that different enzymes influence different products to different degrees. The smaller the value of s, the stronger the correlation with the product.
The average standard deviation for SiaT, which is tightly bound up with the output (glycan class S0), lies at a low value of 0.03 with PLS, and that of LPPLS (0.02) is slightly lower owing to its higher modeling accuracy. SiaT only provides sialic acid residues, which are the most important part of the S glycans during glycosylation; this is consistent with the glycosylation machinery. The low value of s indicates that SiaT is a sensitive input enzyme associated with S0. Furthermore, for enzymes not required for the production of S0, such as FucT and GalT, s is also improved in LPPLS by expanding their CIs to larger regions, from [2.38, 2.72] and [0.56, 0.84] to [1.96, 3.15] and [0.29, 0.92], respectively.
The dataset contains 100 observations of 9 input enzymes and one response variable S0. Using 2 latent variables, the model explains 96.1% of the variance of y for both PLS and LPPLS. The confidence limits of the historical data are also represented in the latent space of the scores as an ellipsoid. Since A = 2 > k = 1, a null space exists. Figure 5 shows a graphical representation of this one-dimensional null space in the space of the first two latent variables: circles represent the historical data, and the dashed ellipse is the 95% confidence region. Assuming that λ ranges from −0.4 to 0.4, the null space associated with each score vector is calculated, resulting in the blue line in Figure 5. The small green triangle indicates the score vector T_new of the operating conditions. The process operating conditions X_real = X_new + X_null produce a product with the same quality characteristics y_des, since moving along the null space does not affect the desired output. Engineering judgment can then be used to select reasonable conditions among those given by different values of λ, for example to determine the most economical operation.
In Figure 5, the red line represents the designated space of T_new, which is bracketed by the model uncertainty given in Section III-B at the significance level δ = 0.05. This operating space lies within the null space and occupies only a narrow part of it. In addition, the boundaries of the operating space X obtained with the LPPLS inverse model are shown in Table 3. Compared with the null space in Figure 5, the operating space significantly reduces the experimental effort required to achieve the desired output.
The narrowed design space lies in the intersection between the null spaces calculated from the prediction uncertainty on y_des and from the uncertainty of X_new, when both the inversion uncertainty of the inputs and the prediction uncertainty of the output are analyzed simultaneously. The uncertainty on y_des is estimated at δ = 0.05 as described in [24]. Figure 6(a) shows that the bracketed design space of PLS associated with each score lies in the red area, whereas the design space of LPPLS (red region in Figure 6(b)) is clearly narrower. Figure 7 compares the predicted values of y obtained with the PLS and LPPLS inversion forms. Each y_pred is the mean of the values calculated from a series of X_new selected from the operating space of PLS or LPPLS. Because different operating conditions X_new corresponding to the same y_des yield different values of y_pred, multiple sets of operating conditions are selected to obtain statistically meaningful results. In this experiment, we chose 4 product qualities of S0 (y_des = 22, 23, 24, 25) and 10 sets of operating conditions for each. For example, the X_new related to S0_des = 21 is calculated by model inversion and expanded to an operating space under model uncertainty; we then chose 10 X_new in this operating space and obtained 10 different values of y_pred. The y_pred values obtained from LPPLS inversion are much closer to the desired values than those from PLS inversion. These results show that LPPLS is more accurate than PLS, which indicates that the nonlinear method of searching the operating space of the glycosylation process is effective.
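The evaluation protocol described above (several targets, ten sampled operating points per target, averaging the simulated outputs) can be sketched as follows. `sample_space` and `simulate` are hypothetical placeholders for the PLS/LPPLS operating space and the KB2005 simulator; the toy space in the example keeps x1 + x2 = y_des by moving along a null direction, so its relative errors are zero by construction.

```python
import numpy as np


def evaluate_operating_space(y_des_list, sample_space, simulate, n_points=10, seed=0):
    """For each target y_des, draw n_points conditions from its operating space,
    simulate each one, and report (y_des, mean prediction, relative error).
    """
    rng = np.random.default_rng(seed)
    results = []
    for y_des in y_des_list:
        X_new = sample_space(y_des, rng, n_points)
        y_pred = np.array([simulate(x) for x in X_new])
        mean_pred = y_pred.mean()
        results.append((y_des, mean_pred, (mean_pred - y_des) / y_des))
    return results


def toy_space(y_des, rng, n):
    """Hypothetical 2-input space: x1 + x2 = y_des, moved along the null direction [1, -1]."""
    lam = rng.uniform(-0.4, 0.4, size=n)
    return np.column_stack([y_des / 2 + lam, y_des / 2 - lam])


results = evaluate_operating_space([22, 23, 24, 25], toy_space, simulate=np.sum)
for y_des, mean_pred, rel_err in results:
    print(y_des, round(float(mean_pred), 6), round(float(rel_err), 6))
```

With a real simulator, the relative errors would no longer vanish, and comparing them between the PLS and LPPLS operating spaces is what the comparison in Figure 7 summarizes.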

V. CONCLUSION
In this article, we have developed a methodology based on the inversion of a locality preserving projection latent variable model (LPPLS) to determine the process operating conditions for desired quality specifications. The confidence interval based on the inverse model is used to design the space of the manipulated variables for the required quality. LPPLS inherits the advantage of LPP for systems with strong local nonlinearity, which helps describe the glycosylation process and obtain desired glycan distributions. In the simulated cases, the results show that the experimental space specified by the proposed method is effective and much narrower than the space obtained from PLS. Compared with PLS, LPPLS provides a more accurate model with a lower estimation error; moreover, products with a specific quality obtained from the LPPLS operating space by model inversion are much closer to the desired values than those from PLS. Both PLS and LPPLS have been applied to the glycosylation process; however, the window of operating conditions from LPPLS appears more appropriate for refining the conditions in such a nonlinear process. Furthermore, this study has only considered the case in which the desired glycan class is determined by a single quality attribute. Extending this methodology to multivariate glycan groups is an area for further research.
JING WANG (Member, IEEE) received the B.Eng. degree in industry automation and the Ph.D. degree in control theory and control engineering from Northeastern University, in 1994 and 1998, respectively.
She was a Professor with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China, from 1999 to 2020. She is currently a Professor with the School of Electrical and Control Engineering, North China University of Technology, Beijing. Her research interests include advanced control, process monitoring, fault detection and diagnosis, and their applications.
He is currently a Professor with the College of Information Science and Technology, Beijing University of Chemical Technology. His research interests include stochastic distribution control, fault detection and diagnosis, variable structure control, and their applications.
TONGLAI XUE received the Ph.D. degree in environmental science and engineering from the Beijing University of Technology, in 2016.
He is currently a Senior Engineer with the School of Electrical and Control Engineering, North China University of Technology, Beijing, China. His research interests include advanced control, process control, and data mining.