Elsevier

Biosystems

Volume 84, Issue 2, May 2006, Pages 115-123

Dynamical properties of model gene networks and implications for the inverse problem

https://doi.org/10.1016/j.biosystems.2005.09.010

Abstract

We study the inverse problem, or the “reverse-engineering” problem, for two abstract models of gene expression dynamics: discrete-time Boolean networks and continuous-time switching networks. Formally, the inverse problem is similar for both types of networks: for each gene, its regulators and its Boolean dynamics function must be identified. However, differences in the dynamical properties of these two types of networks affect the amount of data that is necessary for solving the inverse problem. We derive estimates for the average amounts of time series data required to solve the inverse problem for randomly generated Boolean and continuous-time switching networks. We also derive a lower bound on the amount of data needed that holds for both types of networks. We find that the amount of data required is logarithmic in the number of genes for Boolean networks, matching the general lower bound and previous theory, but is superlinear in the number of genes for continuous-time switching networks. We also find that the amount of data needed scales as $2^K$, where K is the number of regulators per gene, rather than $2^{2K}$, as previous theory suggests.

Introduction

Abstract models of genetic networks, such as Boolean networks and continuous-time switching networks, have been proposed as conceptual models for helping us understand the behavior of real genetic networks (Glass, 1975, Kauffman, 1969, Kauffman, 1993). These formalisms, or generalizations of them, have been used to model such systems as the sporulation network in Bacillus subtilis (de Jong et al., 2004a), the gap gene network of Drosophila melanogaster (Sanchez and Thieffry, 2001), and the segment polarity network in the same organism (Albert and Othmer, 2003). Although these models omit many of the details of the real chemical interactions, they serve as useful syntheses of the often-distributed biological knowledge about these networks, they allow testing of hypotheses about network behavior, and they sometimes lead to new biological hypotheses.

General properties of model networks, particularly randomly generated networks, have been studied in an effort to understand basic principles behind the functioning of real networks. For example, Kauffman has equated different cell types in real organisms with different fixed points or cycles in the dynamics of model networks (Kauffman, 1969, Kauffman, 1993). He and others have studied how the number and period of attractors in random networks depends on network size, topology, and how the dynamics functions are chosen (Bagley and Glass, 1996, Bastolla and Parisi, 1997, Bilke and Sjunnesson, 2001, Glass and Hill, 1998, Kauffman, 1969, Kauffman, 1993, Kauffman et al., 2003, Raeymaekers, 2002, Samuelsson and Troein, 2003, Shmulevich and Kauffman, 2004, Socolar and Kauffman, 2003).

The increasing availability of quantitative gene expression data has kindled the hope of automatically inferring regulatory relationships in real gene networks. There have been some successes (e.g. Jaeger et al., 2004a, Jaeger et al., 2004b, Reinitz and Sharp, 1995), but there is not yet a standard methodology for doing so. Much remains to be understood about the problem. What is the computational complexity of the problem? What algorithms work best? How much data is needed? How should data be collected? Are there fundamental limits on what can be inferred from expression data alone?

Analyses of the inverse problem for Boolean and continuous-time switching networks have begun to provide theoretical answers to these questions. Liang et al. (1998) were the first to propose a solution to the inverse problem for Boolean networks. Later, Akutsu et al. (1999) and Ideker et al. (2000) described alternative solutions. Perkins et al. (2004) described solutions to the inverse problem for continuous-time switching networks. The approaches proposed for Boolean networks can also be applied to continuous-time switching networks, though the methods of Liang et al. (1998) and Akutsu et al. (1999) in particular require significantly more computation than the method described in Perkins et al. (2004).

We focus on the sample complexity of the inverse problem: how much data is needed to identify the network? In particular, we study the problem for Boolean and continuous-time switching networks of N genes in which each gene has precisely K regulators. Akutsu et al. (1999) studied this problem for Boolean networks under the assumption that the data comprises uniformly randomly sampled states of the network. They proved that the amount of data needed scales as log N and as $2^{2K}$. The $2^{2K}$ term is disheartening, because it suggests that it will take enormous amounts of data to identify densely connected networks. However, the log N dependence is encouraging because it suggests that network size per se is not a very important factor.

We consider solving the inverse problem based on time series data, although, as we argue in Section 4, time series data from randomly generated Boolean networks behave in many respects as randomly sampled data. In Section 5, we show that, regardless of how the data is generated, solving the inverse problem for Boolean networks or continuous-time switching networks requires at least $\frac{1}{2}\left(2^K + K\left(\log_2(N-K) - \log_2 K\right)\right)$ samples. We then derive new estimates for the expected amount of data required. It turns out that differences in the dynamical properties of these networks, examined in Section 4, have a significant impact. In Section 5, we estimate the expected sample complexity for Boolean networks as $O(K 2^K \log N)$ and for continuous-time switching networks as $O(2^K N \log N)$. These estimates are supported by simulation experiments, reported in Section 6.

Section snippets

Boolean networks and continuous-time switching networks

Boolean networks, as introduced by Kauffman (1969, 1993), are a discrete-time model of gene expression dynamics. Each of N genes has a Boolean level of expression as a function of time, denoted by $X_i(t) \in \{0,1\}$, where $i \in \{1, 2, \ldots, N\}$ and $t \in \{0, 1, 2, \ldots\}$. Each gene i has $K_i$ regulators, denoted $r_{i1}, \ldots, r_{iK_i}$. Each gene also has a regulation function, or dynamics function, $f_i : \{0,1\}^{K_i} \to \{0,1\}$. The dynamics of gene i is given by $X_i(t+1) = f_i\left(X_{r_{i1}}(t), \ldots, X_{r_{iK_i}}(t)\right)$. Our analysis and simulations focus on
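The update rule above can be illustrated with a minimal Python sketch. The snippet does not show how the random networks are generated, so the choices here (K distinct regulators drawn uniformly for each gene, possibly including the gene itself, and a uniformly random truth table) are assumptions made for illustration, and the function names are hypothetical.

```python
import random

def random_boolean_network(n_genes, k, rng):
    """Draw k distinct regulators and a random truth table for every gene.

    Assumption: uniform sampling of regulators and table entries; the
    article's own generation procedure is not visible in this snippet.
    """
    regulators = [rng.sample(range(n_genes), k) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n_genes)]
    return regulators, tables

def step(state, regulators, tables):
    """Synchronous update X_i(t+1) = f_i(X_{r_i1}(t), ..., X_{r_iK}(t))."""
    new_state = []
    for regs, table in zip(regulators, tables):
        index = 0
        for r in regs:
            # Pack the regulator values into a truth-table row index.
            index = (index << 1) | state[r]
        new_state.append(table[index])
    return tuple(new_state)

def simulate(initial_state, regulators, tables, n_steps):
    """Return the time series X(0), X(1), ..., X(n_steps)."""
    states = [tuple(initial_state)]
    for _ in range(n_steps):
        states.append(step(states[-1], regulators, tables))
    return states
```

Storing each dynamics function as a table of $2^K$ bits makes the count of possible functions per gene explicit: there is one function for every way of filling the table.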

The inverse problem

Given one or more time series generated by a Boolean network or a continuous-time switching network, the inverse problem is to identify the network generating the data. For simplicity of exposition we assume, in the Boolean network case, a single time series of length T + 1. Thus, the data is a sequence, {X(0), X(1), …, X(T)}, where X(t) is a vector of the Boolean states of the genes at time t. We say this sequence comprises T samples of the network dynamics, because it includes T transitions
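To make the Boolean case concrete, here is a hedged sketch of one simple way to search for regulators by exhaustive consistency checking. It is not presented as the method of any of the cited papers: it tries every set of K genes and keeps those under which the observed transitions of the target gene define a function. The name `consistent_regulator_sets` is hypothetical.

```python
from itertools import combinations

def consistent_regulator_sets(series, gene, k):
    """List every k-subset of genes that could explain `gene`'s transitions.

    `series` is a list of state tuples X(0), ..., X(T). A candidate set is
    kept when no regulator input pattern is followed by two different
    values of `gene`. Each result pairs the set with the partial truth
    table observed so far.
    """
    n_genes = len(series[0])
    candidates = []
    for regs in combinations(range(n_genes), k):
        seen = {}  # regulator input pattern -> observed next value of `gene`
        ok = True
        for now, nxt in zip(series, series[1:]):
            pattern = tuple(now[r] for r in regs)
            if seen.setdefault(pattern, nxt[gene]) != nxt[gene]:
                ok = False  # same inputs, different outputs: not a function
                break
        if ok:
            candidates.append((regs, seen))
    return candidates
```

With too little data, several candidate sets typically survive. In this sketch, the inverse problem for a gene counts as solved once only the true regulator set remains and every row of its truth table has been observed.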

Dynamical properties of Boolean networks and continuous-time switching networks

Consider a Boolean network of N genes and K regulators per gene, generated randomly as described in Section 2. A Boolean network is a deterministic dynamical model with a finite number of possible states. Thus, the asymptotic behavior of the network is to reach a fixed point of the dynamics or to reach a repeating cycle of states, where a cycle can be between 2 and $2^N$ states long. It has been observed that when K = 1 or 2, a typical network rapidly reaches a fixed point or short-period cycle (
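Because the state space is finite and the dynamics deterministic, the attractor reached from a given initial state can be found by iterating until a state repeats. The following is a minimal sketch; `find_attractor` and its `step` argument (any function mapping a state tuple to the next state) are hypothetical names introduced for illustration.

```python
def find_attractor(initial_state, step):
    """Iterate a deterministic map until a state repeats.

    Returns (transient_length, cycle_length); a cycle length of 1 is a
    fixed point. Termination is guaranteed because there are finitely
    many states.
    """
    first_seen = {}  # state -> time at which it was first visited
    state = tuple(initial_state)
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = tuple(step(state))
        t += 1
    transient = first_seen[state]
    return transient, t - transient
```

For example, a three-gene ring in which the last gene is the negation of the first, `lambda s: (s[1], s[2], 1 - s[0])`, started from `(0, 0, 0)`, returns to its starting state after six steps with no transient. The dictionary of visited states can hold up to $2^N$ entries, so this approach is practical only for small N or short transients.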

The amount of data needed to solve the inverse problem

How much data is needed to solve the inverse problem for a Boolean or continuous-time switching network of N genes, each having K regulators? A simple lower bound can be derived based on the number of possible networks. There are $\binom{N}{K}$ possible sets of regulators for each gene in a Boolean network, and $\binom{N-1}{K}$ for a continuous-time switching network. In either case, this is more than $\left(\frac{N-K}{K}\right)^K$. There are also $2^{2^K}$ possible dynamics functions for each gene. Thus, the total number of networks on N genes with K
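The per-gene counting step can be written out as follows. This is a sketch under a stated assumption: the garbled bound in the snippet is read as $((N-K)/K)^K$, with minus signs lost in extraction. It does not reproduce the rest of the article's derivation, which is not visible here.

```latex
% Assumption: N > K, and the snippet's bound is ((N-K)/K)^K.
% Each factor of the product is at least (N-K)/K.
\[
  \binom{N}{K} \;\ge\; \binom{N-1}{K}
  \;=\; \prod_{i=0}^{K-1} \frac{N-1-i}{K-i}
  \;\ge\; \left(\frac{N-K}{K}\right)^{K}.
\]
% Combining with the 2^{2^K} Boolean functions of K inputs, the number of
% bits needed to single out one gene's regulator set and function is at least
\[
  \log_2\!\left[\left(\frac{N-K}{K}\right)^{K} 2^{2^{K}}\right]
  \;=\; 2^{K} + K\bigl(\log_2 (N-K) - \log_2 K\bigr).
\]
```

The $2^K$ term comes from specifying the truth table and the $K \log_2$ term from choosing the regulators, which is why the bound grows exponentially in K but only logarithmically in N.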

Simulation experiments

We performed simulation experiments to test the bound and estimates of the previous section. In the first experiment, we tested the sample complexity of the inverse problem for Boolean and continuous-time switching networks with K = 5 regulators per gene and number of genes N∈{10, 15, 20, …, 50}. For each choice of N we randomly generated 10 networks, each comprising regulator sets and regulation functions for each gene. Each network was simulated as a Boolean network and as a continuous-time

Discussion

We have observed that the dynamics of randomly generated Boolean and continuous-time switching networks have markedly different statistical characteristics, which lead to different estimates for the amount of data needed to solve the inverse problem. As did Akutsu et al. (1999), we observed that the number of samples needed for Boolean network identification scales as log N, where N is the number of genes in the network. The log N dependence comes from the assumption that the data is independently

Acknowledgements

This work was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. This material is based upon work supported by the National Science Foundation under a grant awarded in 2002. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References (30)

  • T. Akutsu et al. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model.
  • M. Aldana et al. A natural class of robust networks. Proc. Natl. Acad. Sci. U.S.A. (2003)
  • S. Bilke et al. Stability of the Kauffman model. Phys. Rev. E (2001)
  • A.K. Chandra et al. The electrical resistance of a graph captures its commute and cover times. Comput. Complexity (1997)
  • H. de Jong et al. Qualitative simulation of the initiation of sporulation in Bacillus subtilis. Bull. Math. Biol. (2004)