Parameter estimation in modulated, unbranched reaction chains within biochemical systems

https://doi.org/10.1016/j.compbiolchem.2005.08.001

Abstract

Modern biology is rapidly developing techniques for measuring time series of global gene expression and of many proteins or metabolites simultaneously. These data contain valuable information on the dynamics of cells, which has to be extracted by computational means. Given a suitable mathematical model, this extraction is in principle a straightforward regression task, but the complexity and nonlinearity of the differential equations that describe biological systems cause severe difficulties when the systems are of realistic size. We propose a method of stepwise regression that can be applied effectively to linear portions of pathways. The method may be combined with other estimation methods and either directly yields reasonable parameter estimates or at least provides appropriate start values for subsequent nonlinear search algorithms. We illustrate the method with the analysis of in vivo NMR data describing the dynamics of glycolytic metabolites in Lactococcus lactis.

Introduction

Mathematical models of biochemical systems have traditionally been constructed from the bottom up. Individual enzyme-catalyzed steps were formulated as rate laws and their parameters were estimated from experimental results in vitro, for instance, in the form of Lineweaver-Burk plots (e.g., Segel, 1991). Subsequently all rate functions were integrated into a comprehensive model describing the dynamics of select metabolite pools of interest. The integrative behavior was compared to observed system responses, if these were available, which was rarely the case, and discrepancies were resolved through secondary adjustments of parameter values.

Modern biology is generating new classes of data that permit a complementary approach. These data are of high density, capturing simultaneously the expression of hundreds or thousands of genes or proteins, or the concentrations of many metabolites. As a direct extension, the same experimental techniques allow the construction of time series of these measurements. These time series are especially valuable because they illuminate the dynamics of cellular responses in unprecedented ways. Such data have the potential to offer truly novel insights into the functioning of biological systems, provided that effective methods are available for their analysis. Indeed, several groups around the world have begun establishing methods that estimate model parameters from high-density time series (e.g., Voit and Savageau, 1982; Voit and Sands, 1996; Voit, 2000; Maki et al., 2002; Almeida and Voit, 2003; Kikuchi et al., 2003; Voit and Almeida, 2003; Voit and Almeida, 2004; Veflingstad et al., 2004; Kimura et al., 2005; Tsai and Wang, 2005; Naval et al., 2005; see also Vance et al., 2002; Torralba et al., 2003).

The estimation of system parameters from time series data is in principle a straightforward regression task, which, however, is obstructed by severe challenges in implementation. One class of issues pertains to the choice of the mathematical model that is supposed to capture the observed dynamics. Clearly, there are no a priori rules or laws that identify one specific type of model as optimal for a given biological system. Thus, one has to make assumptions and either use functions that have worked well in the past or employ generic representations that are supported by mathematical theory. The former approach might use modulated Michaelis-Menten or Hill functions, while the latter might use power-law representations. The second class of issues is of a technical nature and includes, for instance, the lack of convergence of nonlinear estimation algorithms, which is aggravated by the combinatorial explosion in the number of parameters to be estimated as the system grows in size.
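For orientation, a power-law term of the kind mentioned above is commonly written as a rate constant multiplied by a product of variables raised to real-valued exponents. The symbols below (v_i, \gamma_i, f_{ij}, X_j) are notation assumed here for illustration and are not taken from this text. Taking logarithms shows why such terms are attractive for estimation: the term becomes linear in its unknown parameters.

```latex
v_i = \gamma_i \prod_{j=1}^{n} X_j^{f_{ij}}
\quad\Longrightarrow\quad
\ln v_i = \ln \gamma_i + \sum_{j=1}^{n} f_{ij} \, \ln X_j
```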

In this article, we discuss a technique that, in combination with existing methods, helps tame the combinatorial explosion in the size of the parameter search space. This technique deals efficiently with linear, that is, unbranched, portions of complex pathways, which may or may not be regulated by activators or inhibitors. The method does not provide advantages over other methods at branch points, but we show how linear parts may be separated out so that the proposed technique becomes applicable. This is important, because many parts of biochemical systems are indeed linear. As just one example, the conversion of PRPP into IMP consists of an unbranched chain of 10 steps, the first of which is subject to several inhibitory signals (cf. Stanbury et al., 1983: Chapter 50).

The method begins with the decoupling of the system equations, as proposed in the recent literature (Voit and Almeida, 2004; see also Voit and Savageau, 1982; Volgin et al., 2003; Clements et al., 2004); we review this method in the next section. Decoupling makes it possible to estimate the parameters of the entire system one equation at a time. Specifically, one treats the measured data as forcing functions for the equation presently under consideration. Upon decoupling, and using power-law representations for the underlying model, we show that the parameters of linear segments of pathways may be estimated with linear regression techniques, which are extremely efficient.
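The idea of estimating a chain one equation at a time can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-step chain, the parameter values, the helper names (`chain_rhs`, `simulate`, `fit_power_law`) and the noise-free synthetic data are all hypothetical. Derivatives are replaced by slopes estimated from the data, and each power-law term is then fitted by ordinary linear regression in logarithmic coordinates.

```python
import numpy as np

def chain_rhs(x, b1, h1, b2, h2):
    """Hypothetical unbranched chain X1 -> X2 -> (out) with power-law fluxes."""
    v1 = b1 * x[0] ** h1
    v2 = b2 * x[1] ** h2
    return np.array([-v1, v1 - v2])

def simulate(x0, t, params):
    """Fixed-step RK4, used here only to generate synthetic 'measurements'."""
    xs = np.empty((len(t), len(x0)))
    xs[0] = x0
    for i in range(len(t) - 1):
        dt = t[i + 1] - t[i]
        x = xs[i]
        k1 = chain_rhs(x, *params)
        k2 = chain_rhs(x + 0.5 * dt * k1, *params)
        k3 = chain_rhs(x + 0.5 * dt * k2, *params)
        k4 = chain_rhs(x + dt * k3, *params)
        xs[i + 1] = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return xs

def fit_power_law(flux, x):
    """Least-squares fit of ln(flux) = ln(rate) + order * ln(x)."""
    order, log_rate = np.polyfit(np.log(x), np.log(flux), 1)
    return np.exp(log_rate), order

true_params = (1.0, 0.8, 0.5, 0.6)  # b1, h1, b2, h2 (made-up values)
t = np.linspace(0.0, 3.0, 301)
data = simulate(np.array([10.0, 1.0]), t, true_params)
x1, x2 = data[:, 0], data[:, 1]

# Decoupling: replace each derivative by a slope estimated from the data.
s1 = np.gradient(x1, t, edge_order=2)
s2 = np.gradient(x2, t, edge_order=2)

# Step 1: the X1 equation contains a single unknown term, -b1 * X1**h1.
b1_hat, h1_hat = fit_power_law(-s1, x1)

# Step 2: with the first flux now known, the X2 equation has one unknown term left.
v1_hat = b1_hat * x1 ** h1_hat
b2_hat, h2_hat = fit_power_law(v1_hat - s2, x2)

print(b1_hat, h1_hat, b2_hat, h2_hat)
```

With real measurements the data would first have to be smoothed, because slopes computed from noisy points are unreliable, and any flux estimate that turns out non-positive would need special handling before taking logarithms.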

We illustrate the method with time series data that Neves et al. (1999) obtained from Lactococcus lactis with methods of in vivo nuclear magnetic resonance.

Section snippets

Decoupling

Essentially all models of biochemical systems are based on nonlinear ordinary differential equations (ODEs). If time series measurements are available for all system variables of interest, then it should be possible in principle to use a nonlinear search algorithm, which solves the equations at each iteration and ultimately yields optimal parameter values. Practical experience shows that such an approach is not feasible for realistically sized problems (cf. CPU time information in Kikuchi et

Illustration example

L. lactis is an industrially important microorganism that plays an essential role in the production of fermented milk, cheese, yogurt, meat, bread, vegetables and wine. Through its production of lactic acid, polysaccharides and CO2, and the digestion of casein, it preserves foods effectively and improves the flavor, texture, color, and keeping qualities of fermented products.

For our illustration, we use in vivo time series data on some key metabolites of glycolysis in

Results

In all practical cases, linear regression yields a unique solution. However, to test the reliability of the results, we perturbed the data with random noise, drawn uniformly from a 10% range about each smoothed experimental data point. As an alternative, we could have developed a bootstrapping or jackknife scheme with the measured data, but because of the small number of measurements, experimental variability appeared to be better represented by our approach.
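A reliability test of this kind can be sketched as follows. Several details are assumptions made for illustration only: the "10% range" is interpreted as a multiplicative factor between 0.95 and 1.05, the noise is applied to both the concentrations and the flux values entering the regression, and the smoothed data are a hypothetical example generated from dX/dt = -b X^h with b = 1 and h = 0.5. The sketch repeats the log-linear regression on many perturbed copies of the data and summarizes the spread of the estimates.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical smoothed data for dX/dt = -b * X**h with b = 1, h = 0.5.
t = np.linspace(0.0, 6.0, 25)
x_smooth = (4.0 - 0.5 * t) ** 2
flux_smooth = np.sqrt(x_smooth)  # equals -dX/dt for this example

def perturb(values, width=0.10):
    """Scale each value by a factor drawn uniformly from 1 +/- width/2 (assumed reading)."""
    factors = rng.uniform(1.0 - width / 2, 1.0 + width / 2, size=values.shape)
    return values * factors

def fit_power_law(flux, x):
    """Least-squares fit of ln(flux) = ln(rate) + order * ln(x)."""
    order, log_rate = np.polyfit(np.log(x), np.log(flux), 1)
    return np.exp(log_rate), order

# Repeat the regression on many independently perturbed data sets.
estimates = np.array([
    fit_power_law(perturb(flux_smooth), perturb(x_smooth))
    for _ in range(500)
])
mean_b, mean_h = estimates.mean(axis=0)
sd_b, sd_h = estimates.std(axis=0)

print(f"rate constant: {mean_b:.3f} +/- {sd_b:.3f}")
print(f"kinetic order: {mean_h:.3f} +/- {sd_h:.3f}")
```

The standard deviations give a rough indication of how sensitive each estimate is to measurement variability of the assumed size.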

It may be

Discussion

Any parameter estimation in biochemical systems faces the challenge of nonlinearity with all its computational problems, and one must expect that no single method will be sufficient for all estimation tasks. We have demonstrated in the past that decoupling significantly simplifies these estimation tasks, because large networks can be analyzed one metabolite at a time and this analysis can be executed sequentially or in parallel (Voit and Almeida, 2004). Nonetheless, the combinatorial explosion

Acknowledgments

This work was supported in part by a Quantitative Systems Biotechnology grant (BES-0120288; E.O. Voit, PI) from the National Science Foundation, a National Heart, Lung and Blood Institute Proteomics Initiative (Contract N01-HV-28181; D. Knapp, PI), and an endowment from the Georgia Research Alliance. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring institutions.

References (36)

  • J.S. Almeida et al. Neural-network-based parameter estimation in complex biomedical systems. Genome Inform. (2003)
  • R.L. Burden et al. Numerical Analysis (1993)
  • J.C. Clements et al. Activation dynamics in anisotropic cardiac tissue via decoupling. Ann. Biomed. Eng. (2004)
  • Ferreira, A.E.N., 1996. PLAS:...
  • M.H. Hoefnagel et al. Metabolic engineering of lactic acid bacteria, the combined approach: kinetic modeling, metabolic control and experimental analysis. Microbiology (2002)
  • S. Kimura et al. Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics (2005)
  • S. Kikuchi et al. Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics (2003)
  • Y. Maki et al. Inference of genetic network using the expression profile time course data of mouse P19 cells. Genome Inform. (2002)

Cited by (21)

  • Optimizing ethanol production selectivity. Mathematical and Computer Modelling (2011)
  • Computational Challenges in Systems Biology. Systems Biomedicine: Concepts and Perspectives (2009)
  • Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences (2009)
    Citation excerpt: Some methods that aim to reduce the parameter search space using BST formalisms are described in Section 4.3 [83,155,156]. For linear parts of pathways, a technique of ‘peeling’ terms [202] can be applied to models in BST to convert the nonlinear parameter estimation task into a series of linear regression tasks. Specifically, beginning with an equation that contains only one unknown power-law term, the differentials are substituted by slopes and the parameters of the unknown terms are estimated by linear regression.
  • Inverse problems of biological systems using multi-objective optimization. Journal of the Chinese Institute of Chemical Engineers (2008)

1 Present address: BACTER Institute, Room 6615 Biochemistry Addition, 433 Babcock Drive, Madison, WI 53706, USA.
