Parameter estimation in modulated, unbranched reaction chains within biochemical systems

doi:10.1016/j.compbiolchem.2005.08.001

Computational Biology and Chemistry

Volume 29, Issue 5, October 2005, Pages 309-318

https://doi.org/10.1016/j.compbiolchem.2005.08.001

Abstract

Modern biology is increasingly developing techniques for measuring time series of global gene expression and of many proteins or metabolites simultaneously. These data contain valuable information on the dynamics of cells, which must be extracted by computational means. Given a suitable mathematical model, this extraction is in principle a straightforward regression task, but the complexity and nonlinearity of the differential equations that describe biological systems cause severe difficulties when the systems are of realistic size. We propose a method of stepwise regression that can be applied effectively to linear portions of pathways. The method may be combined with other estimation methods and either directly yields reasonable parameter estimates or at least provides appropriate start values for subsequent nonlinear search algorithms. We illustrate the method with the analysis of in vivo NMR data describing the dynamics of glycolytic metabolites in Lactococcus lactis.

Introduction

Mathematical models of biochemical systems have traditionally been constructed from the bottom up. Individual enzyme-catalyzed steps were formulated as rate laws, and their parameters were estimated from in vitro experiments, for instance, with Lineweaver-Burk plots (e.g., Segel, 1991). Subsequently, all rate functions were integrated into a comprehensive model describing the dynamics of selected metabolite pools of interest. The integrative behavior was compared with observed system responses where these were available, which was rarely the case, and discrepancies were resolved through secondary adjustments of parameter values.
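
The classical double-reciprocal (Lineweaver-Burk) estimation mentioned above can be sketched in a few lines. This is an illustration only: the substrate concentrations, the "true" values Vmax = 10 and Km = 2, and the helper names are invented for the example. Rearranging v = Vmax S/(Km + S) gives 1/v = (Km/Vmax)(1/S) + 1/Vmax, so a straight-line fit of 1/v against 1/S yields Vmax from the intercept and Km from the slope.

```python
"""Lineweaver-Burk (double-reciprocal) estimation of Vmax and Km.

Illustrative sketch: the data are synthetic, generated from assumed values
Vmax = 10 and Km = 2 (arbitrary units).
"""


def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def lineweaver_burk(substrate, rates):
    """Fit 1/v = (Km/Vmax) * (1/S) + 1/Vmax and return (vmax, km)."""
    slope, intercept = fit_line([1.0 / s for s in substrate],
                                [1.0 / v for v in rates])
    vmax = 1.0 / intercept      # intercept = 1/Vmax
    return vmax, slope * vmax   # slope = Km/Vmax


if __name__ == "__main__":
    substrate = [0.5, 1.0, 2.0, 4.0, 8.0]
    rates = [michaelis_menten(s, 10.0, 2.0) for s in substrate]
    vmax_hat, km_hat = lineweaver_burk(substrate, rates)
    print(f"Vmax = {vmax_hat:.3f}, Km = {km_hat:.3f}")
```

With noise-free synthetic data the fit recovers the assumed values. With real measurements, the reciprocal transform gives the low-substrate points (large 1/S and 1/v) disproportionate weight, a well-known weakness of this plot.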

Modern biology is generating new classes of data that permit a complementary approach. These data are of high density, capturing simultaneously the expression of hundreds or thousands of genes or proteins, or the concentrations of many metabolites. As a direct extension, the same experimental techniques allow the construction of time series of these measurements. Such time series are particularly valuable, because they illuminate the dynamics of cellular responses in unprecedented ways, and they have the potential to offer truly novel insights into the functioning of biological systems, provided that efficacious methods are available for their analysis. Indeed, several groups around the world have begun establishing methods that estimate model parameters from high-density time series (e.g., Voit and Savageau, 1982, Voit and Sands, 1996, Voit, 2000, Maki et al., 2002, Almeida and Voit, 2003, Kikuchi et al., 2003, Voit and Almeida, 2003, Voit and Almeida, 2004, Veflingstad et al., 2004, Kimura et al., 2005, Tsai and Wang, 2005, Naval et al., 2005; see also Vance et al., 2002, Torralba et al., 2003).

The estimation of system parameters from time series data is in principle a straightforward regression task, but its implementation faces severe challenges. One class of issues pertains to the choice of the mathematical model that is supposed to capture the observed dynamics. There are no a priori rules or laws that identify one specific type of model as optimal for a given biological system. Thus, one has to make assumptions and either use functions that have worked well in the past or employ generic representations that are supported by mathematical theory. The former approach might use modulated Michaelis-Menten or Hill functions, while the latter might use power-law representations. The second class of issues is technical in nature and includes, for instance, the lack of convergence of nonlinear estimation algorithms, which is aggravated by the combinatorial explosion in the number of parameters to be estimated as the system grows in size.
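
To make the contrast between the two kinds of representation concrete, the sketch below approximates a Michaelis-Menten rate law by a single power-law term alpha * S^f around an operating point S0. The kinetic order f is the logarithmic slope d ln v / d ln S at S0, which for Michaelis-Menten kinetics equals Km/(Km + S0); alpha is then chosen so that both functions agree at S0. The parameter values are assumptions chosen for illustration.

```python
"""Power-law approximation of a Michaelis-Menten rate law at an operating point.

Sketch with assumed values Vmax = 10, Km = 2 and operating point S0 = 2.
"""


def michaelis_menten(s, vmax, km):
    """Rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def power_law_approximation(vmax, km, s0):
    """Return (alpha, f) so that alpha * S**f matches v(S) and its log-slope at s0.

    The kinetic order f = d ln v / d ln S at s0 equals Km / (Km + s0) for
    Michaelis-Menten kinetics; alpha then makes the two rates agree at s0.
    """
    f = km / (km + s0)
    alpha = michaelis_menten(s0, vmax, km) / s0 ** f
    return alpha, f


if __name__ == "__main__":
    vmax, km, s0 = 10.0, 2.0, 2.0
    alpha, f = power_law_approximation(vmax, km, s0)
    print(f"alpha = {alpha:.4f}, f = {f:.4f}")
    for s in (1.0, 2.0, 4.0):
        print(f"S = {s}: Michaelis-Menten {michaelis_menten(s, vmax, km):.3f}, "
              f"power law {alpha * s ** f:.3f}")
```

Because ln v is concave in ln S for Michaelis-Menten kinetics, this tangent power law lies on or above the true rate everywhere and is accurate only near S0.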

In this article, we discuss a technique that, in combination with existing methods, helps tame the combinatorial explosion in the size of the parameter search space. The technique deals efficiently with linear portions of complex pathways, which may or may not be regulated by activators or inhibitors. The method offers no advantage over other methods at branch points, but we show how linear parts may be separated out, thus making the proposed technique applicable. This is important, because many parts of biochemical systems are indeed linear. As just one example, the conversion of PRPP into IMP consists of an unbranched chain of 10 steps, the first of which is subject to several inhibitory signals (cf. Stanbury et al., 1983: Chapter 50).
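
One way to write the model class in question, assuming power-law rate terms in the style of Biochemical Systems Theory (the symbols are chosen here for illustration), is sketched below for an unbranched chain whose steps may be modulated. Once the influx to a pool is known, the balance equation isolates its efflux, and a logarithmic transform makes the efflux parameters estimable by linear regression; that efflux then serves as the known influx of the next pool. The recursion needs a known starting point, for instance a measured input flux V_0 or an equation that contains only one unknown term.

```latex
% Unbranched chain X_0 -> X_1 -> ... -> X_n with power-law rate terms
\frac{dX_i}{dt} = V_{i-1} - V_i, \qquad
V_i = \beta_i \, X_i^{h_i} \prod_{k} Z_k^{g_{ik}}, \qquad i = 1, \dots, n
% Z_k: modulators (g_{ik} < 0 for an inhibitor, g_{ik} > 0 for an activator)
% With slopes S_i(t_j) \approx dX_i/dt at sampling times t_j and a known influx:
V_i(t_j) = V_{i-1}(t_j) - S_i(t_j)
% Taking logarithms gives an equation that is linear in the unknowns:
\ln V_i(t_j) = \ln \beta_i + h_i \ln X_i(t_j) + \sum_k g_{ik} \ln Z_k(t_j)
```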

The method begins with the decoupling of the system equations, as proposed in the recent literature (Voit and Almeida, 2004; see also Voit and Savageau, 1982, Volgin et al., 2003, Clements et al., 2004); we review this method in the next section. Decoupling makes it possible to estimate the parameters of the entire system one equation at a time. Specifically, the measured data are treated essentially as forcing functions for the equation presently under consideration. Upon decoupling, and using power-law representations for the underlying model, we show that linear segments of pathways may be estimated with extremely efficient linear regression techniques.
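
A stepwise regression of this kind for a short unbranched chain can be sketched as follows. The toy model, its parameter values, the constant input V0 and all function names are assumptions made for illustration; this is not the authors' implementation and not the L. lactis model. Synthetic "measurements" are generated by Runge-Kutta integration, and the estimator sees only the sampled time courses: slopes replace derivatives, the known input isolates the efflux of the first pool, a log-linear regression (including the inhibitory modulator X2) estimates its parameters, and that efflux is passed on as the influx of the next pool.

```python
"""Stepwise ('peeling') linear regression for a modulated, unbranched chain.

Illustrative sketch only.  Assumed toy model (hypothetical parameter values):
    dX1/dt = V0 - V1,   V1 = b1 * X1**h1 * X2**g12   (X2 inhibits step 1)
    dX2/dt = V1 - V2,   V2 = b2 * X2**h2
with a known constant input flux V0.  TRUE is used only to generate data.
"""
import math

TRUE = {"b1": 1.0, "h1": 0.5, "g12": -0.3, "b2": 0.8, "h2": 0.7}
V0 = 2.0


def rhs(x1, x2):
    """Right-hand sides of the toy chain under the TRUE parameters."""
    v1 = TRUE["b1"] * x1 ** TRUE["h1"] * x2 ** TRUE["g12"]
    v2 = TRUE["b2"] * x2 ** TRUE["h2"]
    return V0 - v1, v1 - v2


def simulate(x1=1.0, x2=1.0, dt=0.01, steps=1500):
    """Classical RK4; returns the sampled time courses and the step size."""
    xs1, xs2 = [x1], [x2]
    for _ in range(steps):
        k1 = rhs(x1, x2)
        k2 = rhs(x1 + dt / 2 * k1[0], x2 + dt / 2 * k1[1])
        k3 = rhs(x1 + dt / 2 * k2[0], x2 + dt / 2 * k2[1])
        k4 = rhs(x1 + dt * k3[0], x2 + dt * k3[1])
        x1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        x2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        xs1.append(x1)
        xs2.append(x2)
    return xs1, xs2, dt


def central_slopes(xs, dt):
    """Central-difference slopes at interior points (endpoints dropped)."""
    return [(xs[i + 1] - xs[i - 1]) / (2 * dt) for i in range(1, len(xs) - 1)]


def ols(design, response):
    """Least squares via the normal equations and Gaussian elimination."""
    p = len(design[0])
    a = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))  # partial pivoting
        a[col], a[piv], b[col], b[piv] = a[piv], a[col], b[piv], b[col]
        for row in range(col + 1, p):
            m = a[row][col] / a[col][col]
            a[row] = [u - m * w for u, w in zip(a[row], a[col])]
            b[row] -= m * b[col]
    coef = [0.0] * p
    for row in reversed(range(p)):  # back substitution
        tail = sum(a[row][c] * coef[c] for c in range(row + 1, p))
        coef[row] = (b[row] - tail) / a[row][row]
    return coef


def peel(xs1, xs2, dt):
    """Estimate both steps by successive log-linear regressions."""
    s1, s2 = central_slopes(xs1, dt), central_slopes(xs2, dt)
    x1, x2 = xs1[1:-1], xs2[1:-1]  # align data with the interior slopes
    # Step 1: efflux V1 = V0 - dX1/dt;  ln V1 = ln b1 + h1 ln X1 + g12 ln X2
    v1 = [V0 - s for s in s1]
    c1 = ols([[1.0, math.log(u), math.log(w)] for u, w in zip(x1, x2)],
             [math.log(v) for v in v1])
    # Step 2: V1 is now known, so V2 = V1 - dX2/dt;  ln V2 = ln b2 + h2 ln X2
    v2 = [v - s for v, s in zip(v1, s2)]
    c2 = ols([[1.0, math.log(w)] for w in x2], [math.log(v) for v in v2])
    return {"b1": math.exp(c1[0]), "h1": c1[1], "g12": c1[2],
            "b2": math.exp(c2[0]), "h2": c2[1]}


if __name__ == "__main__":
    estimates = peel(*simulate())
    for name, value in estimates.items():
        print(f"{name}: estimated {value:.4f}, true {TRUE[name]}")
```

Every step is an ordinary least-squares problem, so no differential equation is solved during estimation. With real data the slope estimates become the weak point, and their errors propagate down the chain because each step inherits the flux computed in the previous one.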

We illustrate the method with time series data that Neves et al. (1999) obtained from Lactococcus lactis using in vivo nuclear magnetic resonance.


Decoupling

Essentially all models of biochemical systems are based on nonlinear ordinary differential equations (ODEs). If time series measurements are available for all system variables of interest, then it should in principle be possible to use a nonlinear search algorithm that solves the equations at each iteration and ultimately yields optimal parameter values. Practical experience shows that such an approach is not feasible for realistically sized problems (cf. CPU time information in Kikuchi et
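
The difference between integrating the full system and working with decoupled equations can be sketched for a single equation. In the decoupled formulation, the derivative is replaced by slopes estimated from the data, and the measured time course of every other variable is inserted directly, so evaluating a candidate parameter set takes one pass over the data instead of a numerical integration. The example is hypothetical: X1(t) = e^(-t) and X2(t) = t e^(-t) are chosen because they satisfy dX2/dt = a X1^g - b X2^h exactly for a = b = g = h = 1, and the three-point difference formulas are standard textbook choices, not necessarily those used by the authors.

```python
"""Decoupled objective for one equation, evaluated without integrating ODEs.

Hypothetical example: X1(t) = exp(-t) and X2(t) = t * exp(-t) satisfy
    dX2/dt = a * X1**g - b * X2**h   with a = b = g = h = 1.
X1 acts as a forcing function: its sampled values are plugged in directly.
"""
import math


def three_point_slopes(t, x):
    """Second-order finite differences on a uniform grid, one-sided at the ends."""
    h = t[1] - t[0]
    inner = [(x[i + 1] - x[i - 1]) / (2 * h) for i in range(1, len(x) - 1)]
    first = (-3 * x[0] + 4 * x[1] - x[2]) / (2 * h)
    last = (3 * x[-1] - 4 * x[-2] + x[-3]) / (2 * h)
    return [first] + inner + [last]


def decoupled_sse(params, x1, x2, slopes2):
    """Sum of squared differences between data slopes and the model right-hand side."""
    a, g, b, h = params
    return sum((s - (a * u ** g - b * v ** h)) ** 2
               for u, v, s in zip(x1, x2, slopes2))


t = [0.1 * (i + 1) for i in range(50)]  # uniform grid from 0.1 to 5.0
x1 = [math.exp(-ti) for ti in t]
x2 = [ti * math.exp(-ti) for ti in t]
slopes2 = three_point_slopes(t, x2)

if __name__ == "__main__":
    print("SSE at (1, 1, 1, 1):  ", decoupled_sse((1.0, 1.0, 1.0, 1.0), x1, x2, slopes2))
    print("SSE at (1.5, 1, 1, 1):", decoupled_sse((1.5, 1.0, 1.0, 1.0), x1, x2, slopes2))
```

In a full decoupled estimation, a function like decoupled_sse would be handed to a nonlinear optimizer separately for each equation; each evaluation is a single loop over the data points.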

Illustration example

L. lactis is an industrially important microorganism that plays an essential role in the production of fermented milk, cheese, yogurt, meat, bread, vegetables and wine. Through its production of lactic acid, polysaccharides and CO₂, and through the digestion of casein, it effectively preserves foods and improves the flavor, texture, color, and keeping qualities of fermented products.

For our illustration, we use in vivo time series data on some key metabolites of glycolysis in

Results

In all practical cases, linear regression yields a unique solution. However, to test the reliability of the results, we perturbed the data with random noise, drawn uniformly from a 10% range about each smoothed experimental data point. As an alternative, we could have developed a bootstrapping or jackknife scheme with the measured data, but because of the small number of measurements, experimental variability appeared to be better represented by our approach.
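
A reliability check of this kind can be sketched as a small Monte Carlo loop: perturb each smoothed data point with uniform random noise, re-estimate, and inspect the spread of the estimates. Everything here is an assumption made for illustration: the single-pool model dX/dt = -b X^h with b = 0.5 and h = 1, the unit sampling interval, and the reading of a "10% range" as a multiplicative factor drawn from [1 - r, 1 + r] with r = 0.05 (r = 0.10 would correspond to plus or minus 10%).

```python
"""Monte Carlo check of estimate reliability under uniform noise.

Illustrative sketch with hypothetical data: X(t) = 5 * exp(-0.5 t) sampled at
t = 0, 1, ..., 10, i.e. dX/dt = -b * X**h with b = 0.5 and h = 1.
"""
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate(xs, dt):
    """Regress ln(-dX/dt) (central differences) on ln X; return (b, h)."""
    slopes = [(xs[i + 1] - xs[i - 1]) / (2 * dt) for i in range(1, len(xs) - 1)]
    h, ln_b = fit_line([math.log(x) for x in xs[1:-1]],
                       [math.log(-s) for s in slopes])
    return math.exp(ln_b), h


def perturb(xs, r, rng):
    """Multiply each point by a factor drawn uniformly from [1 - r, 1 + r]."""
    return [x * rng.uniform(1 - r, 1 + r) for x in xs]


def monte_carlo(xs, dt, r=0.05, replicates=200, seed=1):
    """Return ((mean b, sd b), (mean h, sd h)) over noisy replicates."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    fits = [estimate(perturb(xs, r, rng), dt) for _ in range(replicates)]
    bs, hs = zip(*fits)
    return ((statistics.mean(bs), statistics.stdev(bs)),
            (statistics.mean(hs), statistics.stdev(hs)))


smoothed = [5.0 * math.exp(-0.5 * t) for t in range(11)]

if __name__ == "__main__":
    (b_mean, b_sd), (h_mean, h_sd) = monte_carlo(smoothed, dt=1.0)
    print(f"b: {b_mean:.3f} +/- {b_sd:.3f}   h: {h_mean:.3f} +/- {h_sd:.3f}")
```

Even without noise, the coarse central differences bias b upward to sinh(0.5), about 0.521, a reminder that slope estimation is itself a source of error alongside the measurement noise.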

It may be

Discussion

Any parameter estimation in biochemical systems faces the challenge of nonlinearity with all its computational problems, and one must expect that no single method will be sufficient for all estimation tasks. We have demonstrated in the past that decoupling significantly simplifies these estimation tasks, because large networks can be analyzed one metabolite at a time and this analysis can be executed sequentially or in parallel (Voit and Almeida, 2004). Nonetheless, the combinatorial explosion

Acknowledgments

This work was supported in part by a Quantitative Systems Biotechnology grant (BES-0120288; E.O. Voit, PI) from the National Science Foundation, a National Heart, Lung and Blood Institute Proteomics Initiative (Contract N01-HV-28181; D. Knapp, PI), and an endowment from the Georgia Research Alliance. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring institutions.

References (36)

F. Alvarez-Vasquez et al.
Integration of kinetic information on yeast sphingolipid metabolism in dynamical pathway models
J. Theor. Biol.
(2004)
R. Curto et al.
Mathematical models of purine metabolism in man
Math. Biosci.
(1998)
J.L. Galazzo et al.
Fermentation pathway kinetics and metabolic flux control in suspended and immobilized Saccharomyces cerevisiae
Enzyme Microb. Technol.
(1990)
A.R. Neves et al.
Is the glycolytic flux in Lactococcus lactis primarily controlled by the redox charge? Kinetics of NAD(+) and NADH pools determined in vivo by ¹³C NMR
J. Biol. Chem.
(2002)
P.J. Sands et al.
Flux-based estimation of parameters in S-systems
Ecol. Model.
(1996)
M.A. Savageau
Biochemical systems analysis. 1. Some mathematical properties of the rate law for the component enzymatic reactions
J. Theor. Biol.
(1969)
M.A. Savageau
Biochemical systems analysis. 2. The steady-state solutions for an n-pool system using a power law approximation
J. Theor. Biol.
(1969)
E.O. Voit et al.
Modeling forest growth. I. Canonical approach
Ecol. Model.
(1996)
V.M. Volgin et al.
Finite difference method of simulation of nonsteady-state ion transfer in electrochemical systems with allowance for migration
Comput. Biol. Chem.
(2003)
K.R. Albe et al.
Cellular concentrations of enzymes and their substrates
J. Theor. Biol.
(1989)
J.S. Almeida et al.
Neural-network-based parameter estimation in complex biomedical systems
Genome Inform.
(2003)
R.L. Burden et al.
Numerical Analysis
(1993)
J.C. Clements et al.
Activation dynamics in anisotropic cardiac tissue via decoupling
Ann. Biomed. Eng.
(2004)
A.E.N. Ferreira
PLAS:...
(1996)
M.H. Hoefnagel et al.
Metabolic engineering of lactic acid bacteria, the combined approach: kinetic modeling, metabolic control and experimental analysis
Microbiology
(2002)
S. Kimura et al.
Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm
Bioinformatics
(2005)
S. Kikuchi et al.
Dynamic modeling of genetic networks using genetic algorithm and S-system
Bioinformatics
(2003)
Y. Maki et al.
Inference of genetic network using the expression profile time course data of mouse P19 cells
Genome Inform.
(2002)

Cited by (21)

Optimizing ethanol production selectivity
2011, Mathematical and Computer Modelling
Lactococcus lactis metabolizes glucose homofermentatively to lactate. However, after disruption of the gene coding for lactate dehydrogenase (LDH), a key enzyme in NAD⁺ regeneration, the glycolytic flux shifts from homolactic to mixed-acid fermentation with the redirection of pyruvate towards production of formate, acetate, ethanol and CO₂. A mathematical model of the pyruvate metabolism pathway that enhances ethanol production was developed from in vivo Nuclear Magnetic Resonance (NMR) time-series measurements that describe the dynamics of the metabolites in L. lactis. Both Michaelis–Menten and S-system models capture the observed in vivo dynamics of the glycolysis pathway in L. lactis, while prior models describe only the in vitro dynamics. The models provide insight into the maximization of selectivity of ethanol with respect to acetate and CO₂ as undesired products in multiple reactions. High concentrations of NADH and acetyl-CoA and low concentrations of pyruvate and NAD appear to maximize ethanol selectivity.
Computational Challenges in Systems Biology
2010, Systems Biomedicine
This chapter examines the challenges and some of the recent advances in computational systems biology. Research in computational systems biology has moved beyond interaction networks based simply on clustering and correlation. There are two paradigms in computational systems biology: the iterative cycle of biochemical model—mathematical model—computational model, and integration of novel data and legacy knowledge to develop context-specific biochemical, mathematical, and computational models. Challenges in building biochemical models include the complexity of proteomic states and interactions, integration of diverse data to infer biochemical interactions, and the temporal state of biochemical models. Challenges in building mathematical models include incorporating statistical/probabilistic information into analytical models, using qualitative constraints in mathematical models, and incomplete knowledge and coarse-graining. Challenges in computational modeling include the absence of knowledge about model parameters such as rate constants, local versus global concentrations of species and multiple scales of distance and time, and variation among different cell types and subpopulation variability, or variability among biological repeats. Advanced research in coarse graining will pave the way for progress in the development of multiscale multidomain modeling that can connect fundamental research in network biology to clinical research.
Computational Challenges in Systems Biology
2009, Systems Biomedicine: Concepts and Perspectives
This chapter examines the challenges and some of the recent advances in computational systems biology. Research in computational systems biology has moved beyond interaction networks based simply on clustering and correlation. There are two paradigms in computational systems biology: the iterative cycle of biochemical model—mathematical model—computational model, and integration of novel data and legacy knowledge to develop context-specific biochemical, mathematical, and computational models. Challenges in building biochemical models include the complexity of proteomic states and interactions, integration of diverse data to infer biochemical interactions, and the temporal state of biochemical models. Challenges in building mathematical models include incorporating statistical/probabilistic information into analytical models, using qualitative constraints in mathematical models, and incomplete knowledge and coarse-graining. Challenges in computational modeling include the absence of knowledge about model parameters such as rate constants, local versus global concentrations of species and multiple scales of distance and time, and variation among different cell types and subpopulation variability, or variability among biological repeats. Advanced research in coarse graining will pave the way for progress in the development of multiscale multidomain modeling that can connect fundamental research in network biology to clinical research.
Recent developments in parameter estimation and structure identification of biochemical and genomic systems
2009, Mathematical Biosciences
Citation excerpt:
Some methods that aim to reduce the parameter search space using BST formalisms are described in Section 4.3 [83,155,156]. For linear parts of pathways, a technique of ‘peeling’ terms [202] can be applied to models in BST to convert the nonlinear parameter estimation task into a series of linear regression tasks. Specifically, beginning with an equation that contains only one unknown power-law term, the differentials are substituted by slopes and the parameters of the unknown terms are estimated by linear regression.
The organization, regulation and dynamical responses of biological systems are in many cases too complex to allow intuitive predictions and require the support of mathematical modeling for quantitative assessments and a reliable understanding of system functioning. All steps of constructing mathematical models for biological systems are challenging, but arguably the most difficult task among them is the estimation of model parameters and the identification of the structure and regulation of the underlying biological networks. Recent advancements in modern high-throughput techniques have been allowing the generation of time series data that characterize the dynamics of genomic, proteomic, metabolic, and physiological responses and enable us, at least in principle, to tackle estimation and identification tasks using ‘top-down’ or ‘inverse’ approaches. While the rewards of a successful inverse estimation or identification are great, the process of extracting structural and regulatory information is technically difficult. The challenges can generally be categorized into four areas, namely, issues related to the data, the model, the mathematical structure of the system, and the optimization and support algorithms.
Many recent articles have addressed inverse problems within the modeling framework of Biochemical Systems Theory (BST). BST was chosen for these tasks because of its unique structural flexibility and the fact that the structure and regulation of a biological system are mapped essentially one-to-one onto the parameters of the describing model. The proposed methods mainly focused on various optimization algorithms, but also on support techniques, including methods for circumventing the time consuming numerical integration of systems of differential equations, smoothing overly noisy data, estimating slopes of time series, reducing the complexity of the inference task, and constraining the parameter search space. Other methods targeted issues of data preprocessing, detection and amelioration of model redundancy, and model-free or model-based structure identification.
The total number of proposed methods and their applications has by now exceeded one hundred, which makes it difficult for the newcomer, as well as the expert, to gain a comprehensive overview of available algorithmic options and limitations. To facilitate the entry into the field of inverse modeling within BST and related modeling areas, the article presented here reviews the field and proposes an operational ‘work-flow’ that guides the user through the estimation process, identifies possibly problematic steps, and suggests corresponding solutions based on the specific characteristics of the various available algorithms. The article concludes with a discussion of the present state of the art and with a description of open questions.
Inverse problems of biological systems using multi-objective optimization
2008, Journal of the Chinese Institute of Chemical Engineers
Mathematical modeling for dynamic biological systems is a central theme in systems biology. There are still many challenges in using time-course data to obtain an inverse problem of nonlinear dynamic biological systems. In this study, a multi-objective optimization technique is introduced to determine kinetic parameter values of biochemical reaction systems. The multi-objective parameter estimation was converted into the minimax problem through the satisfying trade-off method. The aspiration value was assigned as the minimum solution to the corresponding single objective estimation. The aim of this trade-off estimation was to obtain a compromised result by simultaneously minimizing both concentration and slope error criteria. Hybrid differential evolution was applied to solve the minimax problem and to yield a global estimation.
Dynamic simulation of an in vitro multi-enzyme system
2007, FEBS Letters
Parameters often are tuned with metabolite concentration time series data to build a dynamic model of metabolism. However, such tuning may reduce the extrapolation ability (generalization capability) of the model. In this study, we determined detailed kinetic parameters of three purified Escherichia coli glycolytic enzymes using the initial velocity method for individual enzymes; i.e., the parameters were determined independently from metabolite concentration time series data. The metabolite concentration time series calculated by the model using the parameters matched the experimental data obtained in an actual multi-enzyme system consisting of the three purified E. coli glycolytic enzymes. Thus, the results indicate that kinetic parameters can be determined without using an undesirable tuning process.

¹: Present address: BACTER Institute, Room 6615 Biochemistry Addition, 433 Babcock Drive, Madison, WI 53706, USA.
