Metabolic reconstruction using shortest paths

https://doi.org/10.1016/S0928-4869(00)00006-9

Abstract

This paper introduces a graph-oriented representation of metabolism and shows how to apply a shortest-path algorithm to reconstruct metabolic pathways. Our metabolic model is constructed from the molecular structures of compounds and the reaction formulas of enzymes, and its output is the set of all logically possible pathways composed of the input reactions. We also show how to integrate putative reactions into the model.

Introduction

The human genome project has altered the trend of biological simulation. The whole human genome will be completely sequenced within a few years, and the complete genomes of more than 20 bacterial species are already available. With this abundant genetic information, genome research is becoming ever more systematic. What is required is a good framework for reorganizing biological knowledge and for reconstructing models from the perspective of systems science.

Traditional biological simulation can be categorized as a top-down approach, because it is tuned to reproduce observed laboratory data in a goal-oriented manner, and because its modeling method sometimes has no corresponding biological basis. Suppose that a certain cellular rhythm can be modeled by a differential equation. It remains unknown, however, which cellular components are responsible for this effect: it may be a single protein, or the result of interactions among multiple proteins. This disadvantage of the top-down approach becomes clearer when we ask 'What other possible models exist for achieving the same result?'

What is necessary is the systematic reconstruction of biological models in a bottom-up fashion. In this systematic approach, all possible consequences of the observed laboratory data are considered and checked for their validity. In other words, we need systematic computation that covers every possible model, including those that might be overlooked by humans. The systematic method will soon become the mainstream of biological modeling.

Thus, in biological simulation, computers are better used to search for possible models than to output the optimal behavior of a single fixed model. The traversal of possible models is called model search, and it will become a necessary step in future genome research.

This paper focuses on how to find metabolic pathways from qualitative input knowledge, i.e., chemical structures and reaction formulas. The outline of the paper is as follows. Section 2 explains metabolism, our target biological system. Section 3 introduces the modeling principles, that is, how to represent metabolism with a graph. The actual implementation of the system is shown in Section 4. Preliminary results of our computer program are given in Section 5. The discussion and related work are in Section 6.

Section snippets

Background

Among the various cellular mechanisms, bacterial metabolism is the best target for biological simulation, because (1) basic metabolism is well conserved across species, and because (2) it opens up new research areas, such as drug synthesis or toxin degradation using genetically engineered bacteria.

Metabolism is a network of chemical reactions, mostly catalyzed by enzymes. For example, the sugar and protein in our food are digested into water and carbon dioxide, and the obtained energy is

Atomic level representation

Information on the structure of chemicals is essential for pathway reconstruction, both to check the correctness of pathways and to compute the similarity of compounds. For example, the following pathway from citrate to malonate is not correct, because the compounds only exchange their coenzyme A (CoA) moiety. To avoid this mistake, each reaction should be interpreted in terms of the molecular structures of the compounds involved.
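The idea of interpreting reactions at the level of molecular structure can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions: the compounds A to D, the reactions R1 to R4, and the atom positions are hypothetical toy data, not entries from the AMR database; breadth-first search stands in for whatever shortest-path routine the actual system uses; and Python is chosen for brevity although AMR itself is written in C++. Each node is a (compound, atom position) pair, and an edge records that a reaction carries that atom from a substrate to a product.

```python
from collections import deque

# Hypothetical toy data (an assumption for illustration only): each reaction
# lists which atom of a substrate ends up as which atom of a product.
ATOM_MAPPINGS = {
    "R1": [(("A", 1), ("B", 1)), (("A", 2), ("B", 2))],
    "R2": [(("B", 1), ("C", 1))],
    "R3": [(("B", 2), ("D", 1))],
    "R4": [(("A", 2), ("D", 1))],
}


def build_atom_graph(mappings):
    """Turn reaction atom mappings into an adjacency list over atom nodes."""
    graph = {}
    for reaction, pairs in mappings.items():
        for src, dst in pairs:
            graph.setdefault(src, []).append((dst, reaction))
    return graph


def shortest_atom_path(graph, start, goal):
    """Breadth-first search over atom nodes.

    Returns the list of reaction names on a shortest path that carries the
    atom `start` to the atom `goal`, or None if the atom never arrives.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            steps = []
            while parent[node] is not None:
                node, reaction = parent[node]
                steps.append(reaction)
            return steps[::-1]
        for nxt, reaction in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, reaction)
                queue.append(nxt)
    return None


graph = build_atom_graph(ATOM_MAPPINGS)
print(shortest_atom_path(graph, ("A", 2), ("D", 1)))
print(shortest_atom_path(graph, ("A", 1), ("D", 1)))
```

In this toy network, compound A is connected to compound D at the compound level (A to B by R1, then B to D by R3), yet atom 1 of A is never transferred to D, so the second query finds no path. A compound-level graph would accept that route; the atom-level graph rejects it.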

Moreover, the structures should be considered at the atomic level, not at

System implementation

The simulation system for metabolic reconstruction is called AMR (automated metabolic reconstruction). AMR consists of a database of compounds and enzymatic reactions, and an inference engine for computing pathways. The system is written in C++ with the LEDA algorithmic library [14].

Results

Currently, over 700 compounds and 200 reactions are registered in the database, and the system can efficiently compute logically possible pathways. In essence, its output reproduces the tracer experiments of biochemistry, and it can find tracing results such as 'a carbon of acetyl CoA can be excised already in the first round of the TCA cycle' (at 2-oxoglutarate dehydrogenase in Fig. 5).

Fig. 6 shows the reproduction of glycolysis as another example. The shortest glycolysis, ?( glucose

Hypothetical enzymes

The discussion so far has dealt only with known reactions. To cope with unknown enzymes, the metabolic graph should be augmented with additional edges. We summarized the major enzymatic reactions into 16 basic types in Table 1. These edges serve as hypotheses, and their appearance in selected subgraphs suggests that yet-unknown enzymes may exist. The hypothetical reactions are best used only when no pathway consisting solely of known reactions exists. The use of
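The fallback rule described in this section, using hypothetical reactions only when no pathway of known reactions exists, can be sketched in Python as follows. This is an illustrative sketch, not the AMR implementation: the compounds X, Y, Z, W, the enzyme labels E1 and E2, the hypothetical edge labels H1 and H2, and the penalty weight of 5 on hypothetical edges are all assumptions made for the example, and Dijkstra's algorithm is used as one standard shortest-path routine.

```python
import heapq


def dijkstra(edges, start, goal):
    """Shortest path in a weighted graph.

    `edges` maps a node to a list of (neighbor, weight, label) triples.
    Returns (total_weight, [labels along the path]) or None if unreachable.
    """
    dist = {start: 0}
    parent = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node == goal:
            path = []
            while node != start:
                node, label = parent[node]
                path.append(label)
            return d, path[::-1]
        for nxt, weight, label in edges.get(node, []):
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                parent[nxt] = (node, label)
                heapq.heappush(heap, (nd, nxt))
    return None


def merge(known, hypothetical):
    """Augment the known graph with hypothetical edges (inputs are not modified)."""
    merged = {node: list(es) for node, es in known.items()}
    for node, es in hypothetical.items():
        merged.setdefault(node, []).extend(es)
    return merged


def find_pathway(known, hypothetical, start, goal):
    """Search known reactions first; fall back to hypotheses only if needed.

    Returns (path_labels, used_hypotheses); path_labels is None if no pathway
    exists even with hypothetical edges.
    """
    result = dijkstra(known, start, goal)
    if result is not None:
        return result[1], False
    result = dijkstra(merge(known, hypothetical), start, goal)
    if result is None:
        return None, False
    return result[1], True


# Hypothetical toy data: known edges cost 1, hypothetical edges cost 5
# (the penalty value is an assumption, chosen so known steps are preferred).
KNOWN = {"X": [("Y", 1, "E1")], "Y": [("Z", 1, "E2")]}
HYPOTHETICAL = {"X": [("Z", 5, "H2")], "Z": [("W", 5, "H1")]}

print(find_pathway(KNOWN, HYPOTHETICAL, "X", "Z"))
print(find_pathway(KNOWN, HYPOTHETICAL, "X", "W"))
```

The first query is answered from known reactions alone, so the hypothetical shortcut H2 is never considered. The second query has no known pathway, so the search is repeated on the augmented graph, and the returned flag marks the result as depending on a hypothesis.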

Conclusions

No other biological simulation has placed as much emphasis on hypotheses and model search as our work does: traditional simulation has been concerned with 'what' can be achieved in the modeled system by describing 'how' each component works. Our graph representation of metabolism aims at the reverse process. Its motivation is to investigate 'how' a result is achieved by describing 'what' is known or experimentally observed.

This paper introduced the representation of metabolism using a graph. Its unique

Acknowledgements

The author thanks Prof. Morris, Prof. Green (U of Washington), Prof. Akutsu (U of Tokyo), and Dr. Dandekar (EMBL Heidelberg) for their continuous help and encouragement.

Cited by (54)

  • Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-Path

    2014, Metabolic Engineering
    Citation Excerpt :

    Computational approaches for the prediction of non-native pathways exist, but are limited in their design and scope. Different approaches have been implemented for pathway prediction (Arita, 2000; Carbonell et al., 2011; Cho et al., 2010; Dale et al., 2010; Greene et al., 1999; Hatzimanikatis et al., 2005; Heath et al., 2010; Hou et al., 2003; McShan et al., 2003; Pharkya et al., 2004), where increasing attention has been focused mainly on retrosynthetic algorithms (Carbonell et al., 2011; Cho et al., 2010; Henry et al., 2010; Yim et al., 2011) based on Biochemical Reaction Operators (BROs). In these analyses, BROs are used to go from a target compound to a predefined set of metabolites in an iterative backward search.

  • Computational biotechnology: Prediction of competitive substrate inhibition of enzymes by buffer compounds with protein-ligand docking

    2012, Journal of Biotechnology
    Citation Excerpt :

    Optimization of enzyme activity in vitro gains importance nowadays with the increasing number of multi-enzyme synthetic pathways being developed for biotechnological production. With new computational methods predicting synthetic pathways for chemical production (Arita, 2000; Cho et al., 2010; Li et al., 2004; McShan et al., 2003; Wu et al., 2011), composing synthetic pathways is significantly supported and will gain more importance in the next decades. Multi-enzyme pathways easily grow to more than ten enzymes (Findrik and Vasić-rački, 2009; Santacoloma et al., 2011).

  • Prediction of metabolic pathways from genome-scale metabolic networks

    2011, BioSystems
    Citation Excerpt :

    However, since it has a low degree (it is involved only in a few reactions), the path finding algorithm may traverse it even in a weighted graph, thus predicting an invalid pathway. One solution to this problem is to trace the atoms of the compound(s) of interest through the metabolic network, an approach first introduced by Arita (2000, 2003) and then applied in various path finding tools (Rahman et al., 2004; Blum and Kohlbacher, 2008; Pitkänen et al., 2009). In our case, there is an important drawback to this approach: it is not suited for path finding between reactions.

  • Path finding approaches and metabolic pathways

    2009, Discrete Applied Mathematics
    Citation Excerpt :

    Such findings are reflected in, for example, standard textbooks such as Nelson and Cox [19], or online resources such as BioCyc (http://biocyc.org/). Here, the terminology is not unique, and different authors describe these pathways using different terms, e.g. annotated pathway [4,5]; consensus pathway [2]; experimentally elucidated pathway [10]; experimentally determined pathway [3]; reference pathway (Kegg Pathway Database, http://www.genome.ad.jp/kegg/pathway.html). Of course, it is important to note that from the physiological viewpoint, metabolic pathways do not operate in isolation, and within an organism many different pathways work together to produce an overall global flux (reaction/compound) distribution.

  • Finding the k Shortest Simple Paths: Time and Space Trade-offs

    2023, ACM Journal of Experimental Algorithmics