An Approach for Model Reduction of Biochemical Networks

Biochemical networks are not only complex but also extremely large.The dynamic biological model of great complexity resulting in a large number of parameters is a main difficulty for optimization and control processes. In practice, it is highly desirable to further simplify the structure of biological models for the sake of reducing computational cost or simplification for the task of system analysis. This paper considers the S-system model used for describing the response of biochemical networks. By introducing the technique of singular value decomposition (SVD), we are able to identify the major state variables and parameters and eliminate unimportant metabolites and the corresponding signal transduction pathways. The model reduction by multiobjective analysis integrates the criteria of reactive weight, sensitivity, and flux analyses to obtain a reduced model in a systematic way. The resultant model is closed to the originalmodel in performance but with a simpler structure. Representative numerical examples are illustrated to prove feasibility of the proposed method.


Introduction
Metabolic models are usually with extremely high complexity with their system model because those networks are entangled by numerous molecules and reaction pathways [1][2][3].The complexity of full-scale metabolic models is commonly cognized as a major obstacle for their effective use in computational systems biology.Direct analysis of that kind of systems based on the mathematical models is usually inefficient from the viewpoint of computational cost.To cope with the difficulty, one appropriate approach is to simplify the model before conducting the analysis that is, configuration of biological systems is simplified to reduce its actual size [4][5][6].
In the mathematics, the stoichiometric matrix is commonly applied to express the framework of biochemical networks, which is built from the stoichiometric coefficients of the reactions.The S-system model is a generalized representation of the dynamic behaviors of genetic networks or metabolic networks.Most results commence from considering the mathematical model of biochemical networks by Ssystem architecture [7,8].The model consists of a number of particular features of biochemical systems as well as with observation from their behaviors.The biochemical reactions of the S-system models in heterogeneous media show fractal kinetic orders [9].
The technique of SVD has been used in the field of engineering for model simplification and data compression.In biochemical territory, the stoichiometric matrix can be disassembled into eigen-reaction (corresponding to the left singular vector), singular value, and eigen-connectivity (corresponding to the right singular vector) [10][11][12][13].SVD can be utilized as a useful mathematical framework for processing and modeling genome-wide expression data, in which both of the mathematical variables and operations may be assigned to specific biological meanings [12,13].It is thus expected that the characteristics and information of the state variables can be peeked via the decomposed network structure.
Sensitivity analysis combines the branch of mathematic and engineering responses.It shows how a biochemical system responds to short-term perturbations [3].In general, the analysis is quantified via principal component analysis (PCA) [3,11,14], by which the upper bound of the tolerated parameter variations related to the system matrix of the biochemical network at the steady state can be obtained.In [14], the tolerance of the parameter uncertainty was estimated under which the system stability is preserved.This paper proposes a systematic model reduction method based on the SVD which considers three criteria (SVD, flux, and sensitivity analyses) to simplify complex rational rate expressions of the biochemical networks, such as enzymatic reactions.Our objective is to obtain biochemically meaningful reduced models, which reproduce the dynamic properties of the original model.The advantage of this method is that it is algorithmic and does not significantly rely on input in the form of biochemical knowledge.It is easy to apply and that allows for user-specified reductions of individual rate expressions in a full-order model.The approach, however, is better at preserving biochemical properties, as we show through some representative case studies.Availability endorses it as an alternative to achieve the classical engineering objective of improved parameter identifiability without losing the demand for preserving biochemical interpretation.

S-System Model.
A typical S-system model consists of a number of specific features of biochemical systems [3].It provides easy systemic observations while compared with other biological modelling techniques.To explain, consider the reactive relation of influx  +  ∈ R and efflux  −  ∈ R represented, respectively, by where   ,  = 1, . . .,  are dependent variables such as intermediate metabolites and products,   ,  =  + 1, . . .,  +  are independent variables such as substrates and enzymes,   is metabolic concentration, the positive numbers   and   refer to rate constants, and   and ℎ  are kinetic order variables.
Positive kinetic orders indicate activating influences while negative values mean inhibition.If the kinetic order   (ℎ  ) equals zero, then the network will not contain the pathway from   to   .This model can then be casted into a canonical matrix form regardless of the degree of its nonlinearity: where () ∈ R  is the concentration of reactants,  ∈ R ×2 denotes the stoichiometric matrix, and (⋅) ∈ R 2 is a function vector.

Stoichiometric Matrix and SVD Analysis.
The stoichiometric matrix is commonly utilized to describe the structure of a reaction network and SVD representation of the stoichiometric matrix can be used to analyze network properties.It is particularly a useful way to obtain the basic information about four fundamental subspaces of the matrix.
If rank() = , then  can be decomposed by SVD as where  ∈ R × and  ∈ R 2×2 are unimodular matrices, Σ = [diag( 1 ,  2 , . . .,   ) 0 × ] ∈ R ×2 is the matrix consisting of singular values (  ) and null entries, the elements of  are stoichiometric coefficients, and the elements of  are reaction participation numbers in model.
To further investigate the effect of the flux rate on the biochemical network, the matrix    is diagonalized as where  = [V , ] ∈ R 2×2 and Λ = diag( 1 ,  2 , . . .,   , 0, . . ., 0) ∈ R 2×2 with   =  2  are the eigenvector matrix and eigenvalue matrix, respectively.From the representation of (4), it is clear that  involves reaction participation numbers and Λ consists of the square of singular values.Relative importance of the th reaction can thus be measured via the weighted parameter   defined as where   and V , have been defined in (4).

Sensitivity Analysis.
Sensitivity analysis is introduced to explore the relationship between concentration and reactive rate.It helps to predict how the system performance is influenced by changes in key network parameters.In (1), if the biochemical network is at steady state, then we have When the system remains close to the steady state, sensitivity analysis provides the result that a relative change in a rate constant affects the steady-state concentration of the metabolite.The idea provides perception of the structure of S-system and can be employed in the evaluation of typical biochemical experiments.Consider the sensitivity of rate constants as where    = −   in the steady state; ,  = 1, 2, . . .,  by [3].Define which can be decomposed as where  = [ , ] ∈ R × is the similarity transformation matrix and  = diag( 1 ,  2 , . . .,   ).
The sensitivity coefficient can be obtained via (8).A good sensitivity analysis can distinguish each path by identifying relative importance of the parameters in the model.To this aim, define the relative importance of the th reaction as and let where    ∈ R equals to    ∈ R by    in (12).The factor   quantitatively weights the parametric effects on the network features in (13).Therefore, sensitivity can be a measurement for quantitatively evaluating how fast the responses of any system components change with parameter variations.

Flux Analysis.
In biochemical networks, the reactive input and output refer to the flux   .Biological metabolic networks have more fluxes (reactions) in the system than metabolites.We calculate the values of fluxes to get additional constraints for the metabolic network.For each metabolite one can obtain reactive rates by using the flux analysis.The flux consists of two patterns: influx and efflux.It can be further separated into activating and inhibition influences in the S-system.The result of conversion of the compounds is affected by the magnitude of flux rate.To compare each flux, we calculate, for each average flux   ∈ R, as where   and  0 are, respectively, the terminal and initial times for simulation;   represents the input flux of the (( + 1)/2)th state when  is odd; otherwise, it represents the output flux of the (/2)th state when  is even.From (14), one can compute the average flux rate on which each metabolic accumulation and degradation are quantified.From the calculated results, the relative magnitude of flux rate for each reaction pathway is perceived.

Model Reduction.
Relative importance of each subobjective is not only subjective but also uneasily evaluated in a fair manner.The previous indices are thus integrated into a single objective for convenience of joint evaluation.Before integration, all objective functions are normalized.For   , consider  min ≤   ≤  max with 0 ≤ (  − min )/( max − min ) ≤ 1, where  min and  max are, respectively, the lower and upper limits of   .Similar treatments are applied for   and   .Let where   is the weight of the th subobjective;   and   are defined, respectively, in ( 13) and ( 14).Define where  = 1, 2, . . ., 2.The exponent  ∈ [1, ∞) is used to reflect the general vector length.If  = 1, the weighting represents the relative importance among each subobjective.
Next, consider combination of the three objective functions by finding the cost value Ŷ ∈ R. It can help to distinguish important and unimportant parts: where  serves as a threshold and  is a normalized factor to normalize the value of .If Ŷ < , Ŷ is categorized as the unimportant part and others as the important part.The unimportant part implies that there is weak or no connection between two states.Referring to the system model (1), if the parameters   and ℎ  are treated to zero, then, the network structure is reduced accordingly.The selection of  is case dependent which is closely related to the network characteristics.One can determine an appropriate value of  based on the simulation results.The principle for selection of  is that, its value is such that (i) there are identifiable unimportant parts; (ii) the responses of the reduced model are not far away from the original model.After performing model reduction, one needs to ensure that there is no significant discrepancy between the original and reduced networks.We compute the residual error   ∈ R via where    () denotes concentration of the reduced network at the sampling time instant .If the error concentration is larger than the original system concentration, then the reduced system is unfeasible and could not be reduced.Furthermore, define When  → 0 + , it refers to the ideal situation and there is no residual error.On the other hand, when  → 1, it refers to the worst situation.Under the situation, the model should not be reduced.

Simulation Results
Data analysis of large reaction networks is usually time consuming due to higher computational cost.A variety of reduced models have been developed for the purpose [5,14].An efficient way to analyze a large-sized mathematical model is to decompose it into a smaller one which is easier to manage and analyze, analogous to "divide-and-conquer" method.However, the reduced module should be combined to produce the same set of inputs and outputs as those in the original model.Simply put, precision, and fidelity should not give in simplicity.

Case 1: Tricarboxylic Acid Cycle.
In Figure 1 One is referred to [3] for the definitions of all state variables depicted above.Figure 2 illustrates the result of reactive weight, sensitivity analysis, flux rates, and compromise solution.The entries of the matrices Λ and  are given below: from which we obtain the maximal value of Ŷ to be 0.0534, with which, 19 pathways and one metabolite could be eliminated based on the proposed method while setting  = 0.0534 in Tables 1 and 2 or the dashed part of Figure 1.
The reduced dynamic model is given by Note that  7 has been removed from the original model.The effect of model reduction can be confirmed from observation of Figure 3.It is seen that the responses of concentrations for the original model and reduction model are quite consistent.With regard to the computational aspect, we conducted the simulation study using MATLAB code for Runge-Kutta 4th order on a PC featured with a 2.33 GHz microprocessor (Q8200).For this case, expenditure of the      computation time for the original and reduction models is, respectively, 14.8 and 13.6 seconds.The latter saves the computation cost up to 9%.

Case 2: Purine Metabolism.
In Figure 4, the concentration (dependent variable) possesses 16 species.We illustrate the procedure with the dynamics of PRPP, which is affected by 6 reactions.The synthesizing reaction has R5P as substrate, which is explicitly included as an independent variable, and is modulated by numerous inhibitors.One is PRPP itself, and four others of relevance are AMP, ADP, GDP, and GTP.One uses the aggregated pools  4 and  8 as substitutes and adjust the kinetic orders accordingly, because the latter are not represented individually in the model [3]: For the definitions of all state variables, see [3].Simulation results of the network are illustrated in Figures 5 and 6.The entries of the matrices Λ and  are Λ = diag (296.47,264. As in the previous case, it can be observed from Tables 3 and 4 that 10 pathways and 3 metabolites are ignorable when choosing  = 0.0003.This makes it clear that three input fluxes and three output fluxes (six UMPs) are ignorable.Ignoring the six UMPs is equivalent to eliminating three state variables  6 ,  14 , and  16 .The six dashed lines in Figure 4 show the pathways to be ignorable after performing the analysis.The corresponding reduced model is given by    (external) (internal)

Glucose
Glucose,  1 The computational costs for the original model and reduction model are, respectively, 1 and 0.9 seconds.The reduced model saves the computation cost up to 10%.Note that this case possesses simpler interconnected structure than that of Case 1, therefore, the computation time is far less than that in Case 1; We summarize in Figure 7 the residual errors between the original and reduced models for the two cases above.Again, the result confirms the proposed approach.

Case 3: Saccharomyces cerevisiae Network.
Organisms use glucose to produce ethanol, glycerol, glycogen, and trehalose in the yeast fermentation pathway.For the map illustrated in Figure 8, it has an important point of regulation that external glucose is transported into the cell.Glucose-6-phosphate (G6P) inhibits the step.Internal glucose is phosphorylated to G6P by hexokinase and G6P flow into distinct pathways.Here, one only considers two terms.The first one is toward ethanol, and the next one is toward the polysaccharides glycogen and trehalose.This is insignificant for the flow into the oxidative and nonoxidative pentose pathways.The process is omitted under the experimental conditions of interest.The omission allows less kinetic information to establish the model.Thus, it is a major modeling step.However, it exhibits a drawback, because it has conditions to limit model predictions that the pentose branches can be ignored in comparison with the ethanol branch.In mathematical complexity, the gained reduction is weighed with the limitation in scope.
G6P and fructose-6-phosphate can be converted into each other.Thus, G6P and fructose-6-phosphate are considered to be at equilibrium.In the next step, fructose-6phosphate is phosphorylated by phosphofructokinase.If the enzyme fructose-1,6-diphosphate is active, it will lead to a futile cycle between fructose-6-phosphate and fructose-1, 6-diphosphate.However, this enzyme is typically inactive under the experimental conditions of interest.And the model cannot include the cycle.Glycerol or phosphoenolpyruvate is produced by fructose-1,6-diphosphate.Finally, the production of ethanol is catalyzed by pyruvate kinase.In addition to these major reaction pathways, there are a number of reactions between the ADP and ATP procedures.The model can be mathematically expressed as follows [3]: In Figure 9, some important messages from the graphs were discerned.The reactions of PEP were eliminated from the simulated result.However, the result obtained is not satisfactory as expected.To explore the cause of failure, we conduct a comparison among examples to identify the differences.One can use the value of average residual  to determine suitability of the model simplification.For this case,  is 2.8417.Obviously, it is too large to endorse suitability of the method for model simplification.
3.4.Case 4: Dictyostelium Network. Figure 10 illustrates pulses of cAMP produced by ACA.ACA is activated by the binding of extracellular cAMP to the surface receptor CAR1.The protein kinase ERK2 is activated to transmit the signal to ACA via ligand-bound CAR1.The activation of ACA is independent of ERK2 in an alternate circuit.If cAMP is accumulated internally, then the protein kinase PKA is activated via binding to the regulatory subunit of PKA.PKA inhibits ERK2 and the cAMP phosphodiesterase REG is activated by phosphorylating it.REG A is activated by a protein phosphatase.The process makes internal cAMP hydrolyze via REG A. When REGA hydrolyzes the internal cAMP, its regulatory subunit inactivates PKA activity.And CAR1 is returned to its high-affinity state by protein phosphatase.The phosphodiesterase PDE degrades secreted cAMP, before it diffuses between cells.The double horizontal line expresses the membrane surface of the cell [16,17].The mathematical model is given by ẋ 1 = 2 Figure 11 shows the results for the respective analyses of this case in which the average residual error value is 0.9488.For Cases 1-4, the values of  are 7.16 × 10 −2 , 1.041 × 10 −5 , 2.8417, and 0.9488, respectively.It is quite obvious that the value of  in Cases 3 and 4 is large than that in other cases.Thus, the model cannot be reduced further.It is quite obvious that not every biochemical network can be simplified in its structural model.However, one can quantify the mathematical model of the metabolic network while comparing relative importance of each reaction pathway.

Conclusions
This paper proposes a multiobjective analysis approach based on the SVD to identify the major characteristics in biochemical networks and performs model reduction from its original model.We use the S-system model to capture Computational Biology Journal the dynamic behaviours of the biochemical networks and integrate the criteria of reactive weight, sensitivity, and flux analyses into the multiobjective function to reduce the unimportant states and related parameters.It is demonstrated that the simplified model is achievable and explainable with information obtained via the systematic approach.Several representative examples have been tested and verified which endorse applicability of the presented approach.Despite preliminary success of the demonstrative cases, most biochemical reactions in real world may not be that simple as those presented here.However, the proposed approach does provide an option to simplify the task of model reduction for complex biochemical networks.

Figure 3 :
Figure 3: Dynamic responses of the noisy states in Case 1: (a) original model and (b) simplified model (  as original concentration,   as reduced concentration).

Figure 4 :
Figure 4: Graph of purine metabolite in which curved arrows enter or leave the pathway corresponding to purine ring and ribose moieties.

Figure 6 :
Figure 6: Dynamic responses of the noisy states in Case 2 where   denotes the original concentration and   denotes the reduced concentration: (a) original model (in blue line) and (b) simplified model (in green line).

2 Figure 7 :
Figure 7: Residual errors of model reduction for Cases 1 and 2.

Figure 8 :
Figure 8: Map of anaerobic fermentation pathway in Saccharomyces cerevisiae.

Table 1 :
Result of model reduction by  in Case 1 (IMP as important part, UMP as unimportant part).

Table 3 :
Result of model reduction by  in Case 2 (IMP as important part, UMP as unimportant part).