Towards rational glyco-engineering in CHO: from data to predictive models

Metabolic modeling strives to develop approaches that are robust and highly predictive. To achieve this, various model designs, including hybrid models, and parameter estimation methods that define the type and number of parameters used in the model are adapted. Accurate input data play an important role, so selecting experimental methods that provide input data of the required precision with low measurement error is crucial. For the biopharmaceutically relevant protein glycosylation, the most prominent available models are kinetic models, which are able to capture the dynamic nature of protein N-glycosylation. In this review we focus on how to choose the most suitable model for a specific research question, as well as on parameters and considerations to take into account before planning relevant experiments.


Introduction
High productivity and correct glycosylation are two main goals of the pharmaceutical industry in the production of biotherapeutics. Glycosylation is one of the most important quality parameters. It needs to be carefully monitored and regulated because of (i) its impact on biological activity, in vivo half-life and immunogenicity, all of which define the safety and efficacy of the product, and (ii) the non-template-driven, complex network of the glycosylation machinery [1,2].
The most widely used expression systems for biotherapeutics are Chinese hamster ovary (CHO) cell lines [3]. CHO cells are able to produce human-like glycosylation patterns, which make therapeutic proteins more compatible with and bioactive within the human body [4]. Despite being commonly used, CHO cells still struggle with instability caused by mechanisms at the genomic, transcriptomic and proteomic levels [5], and high productivity often results in simplified glycosylation due to overload of the cellular glycosylation machinery [6,7,8]. Moreover, non-engineered CHO cell lines are not able to produce some human glycosylation structures such as α-2,6-sialylation [9] and α-1,3/4-fucosylation [10]. Rarely, they can produce glycosylation structures not present in humans, such as N-glycolylneuraminic acid (Neu5Gc) [11] and galactose-α1,3-galactose [12], which can induce an immune response in humans. Nevertheless, after careful selection of suitable subclones, CHO is still the most adequate expression system [3]. In comparison to human cell lines, it is safer to use due to its lower susceptibility to human viruses. Compared to other available mammalian cell lines, the CHO glycosylation machinery produces the most human-like glycosylation profiles, which is, as mentioned above, important for immuno-compatibility within the human body. Non-mammalian cell lines that lack the required glycosylation ability are thus mainly used for the production of non-glycosylated products [4].
Glycosylation, the attachment of sugar moieties to proteins, is a posttranslational modification which takes place in the endoplasmic reticulum (ER) and the Golgi apparatus (GA). First, a pre-made complex attached to dolichol is transferred to the nascent polypeptide. Subsequently, after removal of some sugar moieties, ER- and GA-resident glycosyltransferases attach additional sugar residues from nucleotide sugars (NSs) to the growing glycans [13]. The glycosylation pattern of proteins is influenced by the producer cell line [14,15], the culture conditions [16,17,18] and the structure and amino-acid sequence of the protein [19,20,21]. For example, therapeutically important antibodies, such as IgG1, bear relatively simple biantennary glycans at a conserved N-glycosylation site at Asn297 in each Fc region [22,23], while other proteins, such as erythropoietin, have several glycosylation sites and also bear more complex structures with up to four antennae [24].
Glycoform heterogeneity at the time of harvest, called "temporal dynamics", is influenced by the kinetics of the metabolic reactions involved in glycosylation, which in turn are primarily determined by changes in the availability of nucleotide sugar donors and enzyme co-factors, the levels and activity of glycosylation enzymes, and by-products of the central energy metabolism [25]. As protein glycosylation is a complex, multi-step process taking place in several cellular compartments that the protein reaches sequentially, this non-template mechanism can in itself lead to considerable variability. Consistent glycosylation between production batches is important to meet the safety specifications of the product [16], as changes in glycosylation influence its pharmaceutical properties [26].
Protein glycosylation can be manipulated with different glycoengineering approaches that include genetic engineering of cell lines [27,28] and modifications of cell culture media [18,29,30,31] and process parameters [32]; these approaches are also reviewed in Sha et al. [23]. Due to the complexity of the glycosylation machinery, these interventions do not always produce the desired results. It is therefore important to understand the mechanisms that connect intracellular factors associated with glycan biosynthesis pathways to the consequences of glycoengineering, media composition and process parameters. For this, further mechanistic insight into the cell is required to explore intracellular changes [23].
Considering the complexity of the glycosylation process and the many parameters that play a role, it is difficult to investigate the process by experimental work alone, and this is where mathematical models offer clear advantages and insight. They complement experimental investigations in addressing the effects of cellular or process changes induced to increase recombinant protein productivity or maximize desired glycoform fractions within the glycoform population [33].
Here we are interested in the impact of process conditions on glycosylation. For this, we need to build a model that couples the glycosylation process with a process model and thus allows us to understand the multiple interactions and feedback loops between the two. Each of these "modules" can be modelled differently and has different data requirements. We will address three major topics, namely (i) an overview of the recent developments in glycosylation modelling and glyco-engineering with a focus on kinetic modelling, (ii) experimental methods to measure relevant parameters required for valid predictions, and (iii) available data sources for these parameters. The focus is to enable the choice of the model most suitable for a specific research question, but also the planning of relevant experiments.

Recent advances in glyco-engineering and -modelling
The first glycosylation model was established in 1995 by Shelikoff et al. [34]. Since then and with the increase in therapeutic protein production, the interest in glycosylation modelling has grown in academia and industry [18,27,28,29,30,31,32,35,36,37]. These models are listed chronologically in Figure 1, distinguished by the respective modeling approach, kinetic or stoichiometric.
Several deficiencies still constrain the models' usability: (i) the mechanisms of glycosylation are not yet fully understood [7], (ii) a large number of the kinetic and transport parameters used for developing the models are not fully known and may well be species-specific, which remains unclear for lack of precise experimental data [38], and (iii) after simulations and sensitivity analyses, the predictions have to be compared to multiple experiments to confirm their predictive power [39,40].

Modelling approaches
Mathematical modelling can take different approaches depending on the system under investigation and the scope of the system. In general, mathematical modelling can be divided into stoichiometric and kinetic modelling, each of which has its own strengths and weaknesses. In brief, stoichiometric modelling assumes steady state and can be used at genome scale, whereas kinetic modelling can describe dynamic changes, but can only be performed at a smaller scale, for example representing separate cellular processes. For a detailed description of the modelling approaches we refer the reader to recent review papers on metabolic modelling [41,42,43].
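To make this distinction concrete, the following minimal Python sketch contrasts the two approaches on a toy linear pathway (uptake -> A -> B -> secretion). The pathway, the function names and all parameter values are hypothetical illustrations and are not taken from any of the cited models. The kinetic version integrates Michaelis-Menten rate laws over time with a simple explicit Euler scheme, whereas the stoichiometric version only uses the steady-state mass balance, which for a linear chain forces every flux to equal the uptake flux.

```python
# Toy pathway (hypothetical): uptake -> A -> B -> secretion

def michaelis_menten(vmax, km, s):
    """Michaelis-Menten rate law for substrate concentration s."""
    return vmax * s / (km + s)


def simulate_kinetic(a0, b0, v_in, vmax1, km1, vmax2, km2, dt, steps):
    """Kinetic view: integrate dA/dt and dB/dt with explicit Euler steps."""
    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for i in range(1, steps + 1):
        v1 = michaelis_menten(vmax1, km1, a)  # A -> B
        v2 = michaelis_menten(vmax2, km2, b)  # B -> secretion
        a += dt * (v_in - v1)
        b += dt * (v1 - v2)
        trajectory.append((i * dt, a, b))
    return trajectory


def steady_state_fluxes(v_in):
    """Stoichiometric view: S v = 0 for a linear chain gives v1 = v2 = v_in.

    No kinetic parameters are needed, but no time course is obtained.
    """
    return {"v1": v_in, "v2": v_in}


if __name__ == "__main__":
    # All values below are arbitrary illustration values.
    traj = simulate_kinetic(0.0, 0.0, 0.5, 1.0, 1.0, 2.0, 0.5, 0.01, 5000)
    t_end, a_end, b_end = traj[-1]
    print("kinetic end state:", t_end, a_end, b_end)
    print("steady-state fluxes:", steady_state_fluxes(0.5))
```

At long simulation times the kinetic fluxes approach the steady-state solution, but only the kinetic model describes how the intermediates evolve on the way there, at the cost of four additional kinetic parameters for a pathway of just two reactions.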
As glycosylation is a dynamic process that changes over time, the best way to describe it is with kinetic modelling. This type of model requires kinetic parameters which add to the complexity, so that it is not suitable for large genome-scale models [41]. To develop a kinetic model with good predictive capability, reliable and precise input data on essential elements are required, including (i) a defined model structure and reaction network, (ii) parameters for kinetic reactions, (iii) an estimation of unknown parameters and (iv) experimentally obtained mass balances [42]. A schematic representation of the process of building a model is shown in Figure 2. The first step is to define a model structure with a detailed cellular network and the metabolic pathway under investigation (Figure 2a). The defined model is then fed with input data from experimental work and kinetic parameters obtained from the literature (Figure 2b). The next step is to train the model using different approaches to adapt the quality of and the precise parameters of the input data (Figure 2c) […]
[…] and global alterations in cell metabolism [31]. They further expanded their GFA model and improved its computational efficiency, which allowed them to identify process parameters that could contribute to the changes in the intracellular IgG glycosylation network. However, the improved GFA still could not resolve the dynamic enzyme-specific changes [46].
The selection of the most suitable model will depend on the predictions we want to make as well as on the level of detail required to describe the glycosylation process. The majority of glycosylation models developed in recent years took kinetic modelling approaches due to their ability to describe dynamic changes in the process. From the above examples one can see that kinetic modelling is not yet fully optimized to provide precise predictions, yet it is still the preferred way to describe the dynamics of glycosylation. As kinetic modelling brings in additional complexity, a promising way forward is to construct hybrid models and to precisely evaluate which parts of the process need to be described in a dynamic manner.

How to reduce variation originating from model design
In the design of the model and its corresponding metabolic network, a number of parameters are included; however, this number does not necessarily correlate with the predictive power of the model. The number of parameters can be reduced by parameter estimation and sensitivity analysis, which evaluates parameters for their impact on the model prediction and can thus point to inessential parameters. Parameters with minor influence on the model can then be excluded without altering the output of the model [47]. Another approach to avoid overparameterization is to assume constant values for parameters throughout the culture, to not estimate intracellular concentrations of metabolites [29] or, in the case of NSs, to assume that their concentrations are equal in the cytosol and the GA [32]. Without precise input data and measurements, however, the validity of such assumptions can only be estimated. Selecting appropriate parameter combinations by excluding the ones with lower accuracy can lower the variation of model predictions [27]. Moreover, smoothing data makes the model more robust in case of errors in experimental measurements or outliers and helps to reduce the effect of noise coming from the experimental data [31].
All of the mentioned steps should be included in model training to decide on crucial and precise input parameters and to avoid overly complex models.
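The sensitivity-based screening of parameters described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure of any cited study: the toy model (the fraction of a glycan that is galactosylated after a residence time in one compartment), the parameter names and all values are hypothetical, and the sensitivities are local, normalized, central finite-difference estimates that are only valid near the chosen parameter set.

```python
import math


def galactosylated_fraction(p):
    """Hypothetical toy model: first-order conversion during residence time tau.

    The rate combines a Michaelis-Menten term in the donor (udp_gal) with a
    small background term k_bg that is included only to illustrate screening.
    """
    rate = p["kcat"] * p["enzyme"] * p["udp_gal"] / (p["km"] + p["udp_gal"])
    rate += p["k_bg"]
    return 1.0 - math.exp(-rate * p["tau"])


def normalized_sensitivities(model, params, rel_step=1e-4):
    """Local sensitivities (dy/dp) * (p / y) by central finite differences.

    Assumes every parameter value and the model output are non-zero.
    """
    y0 = model(params)
    result = {}
    for name, value in params.items():
        h = rel_step * value
        up = dict(params)
        up[name] = value + h
        down = dict(params)
        down[name] = value - h
        derivative = (model(up) - model(down)) / (2.0 * h)
        result[name] = derivative * value / y0
    return result


def negligible_parameters(sensitivities, threshold=0.01):
    """Names of parameters whose absolute normalized sensitivity is below threshold."""
    return sorted(n for n, s in sensitivities.items() if abs(s) < threshold)


if __name__ == "__main__":
    # Arbitrary illustration values, not measured data.
    params = {"kcat": 1.0, "enzyme": 1.0, "udp_gal": 0.5,
              "km": 0.5, "k_bg": 1e-4, "tau": 1.0}
    sens = normalized_sensitivities(galactosylated_fraction, params)
    for name in sorted(sens):
        print(name, round(sens[name], 4))
    print("candidates to fix or drop:", negligible_parameters(sens))
```

A parameter flagged in this way is only a candidate for removal or for fixing at a literature value; because the analysis is local, the ranking may change at other operating points, so the reduced model should be checked against the full one.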

Building blocks of glycosylation models and things to consider
The models described in the previous section require experimental data input to adequately represent cell line or process specific performance, which typically includes cell culture parameters, i.e. viable cell density and viability, the consumption and secretion rates of metabolites, productivity of the selected protein and the glycoprofile of the product [18,31,36,37]. Some more complex models also require the quantification of nucleotides and NSs as precursors for glycosylation [29,30,32] and/or gene expression of glycosylation enzymes and transporter proteins involved in glycosylation [27,32]. In the following we look into the methods available to analyze these parameters.
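As an illustration of how the consumption and secretion rates listed above are commonly derived from routine culture data, the minimal Python sketch below computes a cell-specific rate from viable cell density and metabolite concentration. It assumes a batch culture with constant volume and no feeding, uses the trapezoidal integral of viable cell density (IVCD), and all function names and numbers are hypothetical; fed-batch or perfusion data would additionally require volume and feed corrections.

```python
def integral_viable_cell_density(times_h, vcd_cells_per_ml):
    """Trapezoidal integral of viable cell density, in cells * h / mL."""
    total = 0.0
    for i in range(1, len(times_h)):
        mean_vcd = 0.5 * (vcd_cells_per_ml[i] + vcd_cells_per_ml[i - 1])
        total += mean_vcd * (times_h[i] - times_h[i - 1])
    return total


def specific_rate_pmol_per_cell_h(times_h, vcd_cells_per_ml, conc_mmol_per_l):
    """Average cell-specific rate over the sampled interval.

    Negative values indicate consumption, positive values secretion.
    Unit conversion: 1 mmol/L = 1e6 pmol/mL.
    """
    delta_pmol_per_ml = (conc_mmol_per_l[-1] - conc_mmol_per_l[0]) * 1e6
    ivcd = integral_viable_cell_density(times_h, vcd_cells_per_ml)
    return delta_pmol_per_ml / ivcd


if __name__ == "__main__":
    # Made-up example values for illustration only.
    times = [0.0, 24.0, 48.0]       # h
    vcd = [1.0e6, 2.0e6, 4.0e6]     # viable cells / mL
    glucose = [30.0, 27.0, 21.0]    # mmol / L
    q_glc = specific_rate_pmol_per_cell_h(times, vcd, glucose)
    print("specific glucose rate [pmol/(cell h)]:", q_glc)
```

With these made-up numbers the IVCD is 1.08e8 cells h/mL and the resulting specific glucose uptake rate is about -0.083 pmol per cell per hour.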

Measuring concentrations of nucleotides and nucleotide sugar donors
NSs are involved in glycosylation as donors of the sugar moieties which are attached at the glycosylation site [50]. Whereas some of the models involve experimental measurement of NSs, others, e.g. the study by Zhang et al. [18], did not include such data. The authors argued, based on previous research [51,52], that it is still challenging to accurately determine the concentrations of NSs and nucleotides. Furthermore, they argued that including this kind of data makes model simulations more challenging due to the higher number of unknown parameters [31]. Another option is the approach taken by Sha et al., who used flux balance analysis (FBA) to provide NS fluxes as an input for the kinetic model to predict N-glycosylation [36]. Despite the analytical challenge, many models contain NS measurements [27,29,30,32,35,37].
There are different extraction methods available for which the reviews by […] high selectivity and specificity. Although it is a powerful method that is able to detect unusual NSs, MS has its drawbacks. It is not compatible with all separation methods, and it faces problems with distinguishing isomers, with byproducts formed during derivatization, with low-volatility reagents and with stable ions that can be formed during the separation step. UV-based detection methods can overcome these obstacles, but can only provide rapid and accurate quantification of well-characterized NSs [17,53,54].
Besides the aforementioned methods, it is also possible to measure NSs with capillary electrophoresis [55], IP-RP (ion-pair reversed-phase) HPLC [56] and FACE (Fluorophore-Assisted Carbohydrate Electrophoresis) [57]. Although many methods have been described, the analysis of NSs and nucleotides remains a challenging task that is still limited in its precision and reliability. NS and nucleotide data were nevertheless used in some recent models [29,30,32] due to their ability to connect the cellular metabolism with glycosylation.

Kinetic data
Glycosylation models typically also include kinetic data extracted from the literature. This adds connectivity between the experimental data obtained and is important for better predictive capability of the models [41]. The main challenge of building a kinetic model is the need to include a significant number of unknown kinetic parameters that are mostly system-specific but undetermined, which creates a challenge for parameter estimation and model simulation. On the other hand, modelling with kinetic data might provide mechanistic insight into the dynamic changes of enzyme-dependent factors [31,46]. Kinetic models [18,27,28,29,30,32,37,45] […]
Databases available to obtain data for kinetic and stoichiometric models are BiGG (genome-scale models), KEGG (genes, enzymes, reactions, pathways), MetaCyc (enzymes, pathways), BioModels (established models from the literature) and BRENDA (kinetic parameters for enzymes) (Figure 2b) [41]. However, only a small portion of the parameters in kinetic models is obtained via experimental work for a specific cell line or cultivation system, and most of these data were collected in older research work using different cell lines. Therefore, before using these parameters in a CHO-relevant context, they should be re-estimated and adjusted for the metabolic profile of the selected cell line [41]. Another flaw, even of experimentally obtained enzymatic parameters, is that the values are measured in in vitro studies, which are not completely representative of enzyme behaviour under in vivo conditions [58]. In the case of unknown parameters, proficient knowledge of parameter estimation techniques is required, for which several approaches are available [29,39,59,60].
Although this was done on the example of biomass and flux predictions, we suppose that it would have the same or similar effect on glycosylation models and it should therefore be taken into consideration when planning the experiments to provide input data for glycosylation models.
Currently, it is difficult to conclude which data should be included. On the one hand, a precise description of the cellular environment requires an extensive number of parameters that are possibly based on imprecise data, which adds complexity to the model and does not necessarily improve its precision. On the other hand, a simpler model with fewer parameters might fail to describe important connections of the process and therefore cannot reach the desired precision. Therefore one should carefully evaluate the data required and the precision of the methods available, include only the essential parameters and, in the case of experimental measurements of the glycosylation profile and NSs, counteract the inaccuracy of analysis methods by appropriate sampling schemes and statistical tools [49,53,54] and suitable extraction steps [17,26,48,62,63]. One possible improvement would be to generate a database of kinetic parameters relevant for CHO cell lines that would also be able to describe enzyme characteristics under in vivo conditions. We also encourage the search for innovations in the field which offer more reliable measurements, as well as the application of established approaches from other expression systems.
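The re-estimation of literature-derived kinetic parameters for a specific cell line, as recommended above, can be illustrated with a minimal least-squares sketch. It is not the estimation method of any cited study: it fits a single Michaelis-Menten rate law to synthetic, noise-free data, all names and values are hypothetical, and it assumes that the error is unimodal in Km within the search bracket. For each candidate Km the best Vmax follows in closed form, so only Km is searched numerically with a golden-section search.

```python
def saturation_terms(km, substrate):
    """Michaelis-Menten saturation s / (km + s) for each substrate level."""
    return [s / (km + s) for s in substrate]


def best_vmax(km, substrate, rates):
    """Closed-form least-squares Vmax for a fixed Km (model is linear in Vmax)."""
    x = saturation_terms(km, substrate)
    return sum(xi * vi for xi, vi in zip(x, rates)) / sum(xi * xi for xi in x)


def sum_squared_error(km, substrate, rates):
    """Sum of squared residuals at a given Km, using the best Vmax for that Km."""
    vmax = best_vmax(km, substrate, rates)
    x = saturation_terms(km, substrate)
    return sum((vi - vmax * xi) ** 2 for xi, vi in zip(x, rates))


def fit_michaelis_menten(substrate, rates, km_low, km_high, iterations=100):
    """Golden-section search on Km inside [km_low, km_high].

    Assumes the error has a single minimum within the bracket.
    """
    ratio = (5 ** 0.5 - 1) / 2
    a, b = km_low, km_high
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sum_squared_error(c, substrate, rates) < sum_squared_error(d, substrate, rates):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = 0.5 * (a + b)
    return km, best_vmax(km, substrate, rates)


if __name__ == "__main__":
    # Synthetic data generated from assumed "true" values Km = 0.4, Vmax = 2.0.
    substrate = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
    rates = [2.0 * s / (0.4 + s) for s in substrate]
    km_fit, vmax_fit = fit_michaelis_menten(substrate, rates, 0.05, 5.0)
    print("fitted Km and Vmax:", km_fit, vmax_fit)
```

In a real glycosylation model many parameters are estimated simultaneously from noisy data, so a global or multi-start optimizer combined with an identifiability check would typically replace this one-dimensional search.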

Conclusions
Metabolic models require detailed knowledge and a significant amount of precise input data, and their predictions are not error-free. The effect of these drawbacks can be minimized by advances in modelling designs and, among other approaches, by generating input data with minimum error. Recent publications in the field present numerous models that are able to predict modifications in N-glycosylation in response to genetic engineering of the glycosylation pathway, changes in media supplementation, such as carbon source and glycosylation precursors, and changes in culture conditions. These advances now enable applications to improve control of glycosylation during a process, to find possible targets for genetic engineering of the glycosylation pathway, to study the activity of single glycosylation enzymes and enzyme cascades, and to optimize both process parameters and media.

Acknowledgments
This work was supported by the COMET center "acib: Next Generation