Dynamical properties of model gene networks and implications for the inverse problem

doi:10.1016/j.biosystems.2005.09.010

Biosystems

Volume 84, Issue 2, May 2006, Pages 115-123

https://doi.org/10.1016/j.biosystems.2005.09.010

Abstract

We study the inverse problem, or the “reverse-engineering” problem, for two abstract models of gene expression dynamics, discrete-time Boolean networks and continuous-time switching networks. Formally, the inverse problem is similar for both types of networks: for each gene, its regulators and its Boolean dynamics function must be identified. However, differences in the dynamical properties of these two types of networks affect the amount of data that is necessary for solving the inverse problem. We derive estimates for the average amounts of time series data required to solve the inverse problem for randomly generated Boolean and continuous-time switching networks. We also derive a lower bound on the amount of data needed that holds for both types of networks. We find that the amount of data required is logarithmic in the number of genes for Boolean networks, matching the general lower bound and previous theory, but is superlinear in the number of genes for continuous-time switching networks. We also find that the amount of data needed scales as $2^{K}$, where K is the number of regulators per gene, rather than $2^{2K}$, as previous theory suggests.

Introduction

Abstract models of genetic networks, such as Boolean networks and continuous-time switching networks, have been proposed as conceptual models for helping us understand the behavior of real genetic networks (Glass, 1975, Kauffman, 1969, Kauffman, 1993). These formalisms, or generalizations of them, have been used to model such systems as the sporulation network in Bacillus subtilis (de Jong et al., 2004a), the gap gene network of Drosophila melanogaster (Sanchez and Thieffry, 2001), and the segment polarity network in the same organism (Albert and Othmer, 2003). Although these models omit many of the details of the real chemical interactions, they serve as useful syntheses of the often-distributed biological knowledge about these networks, they allow testing of hypotheses about network behavior, and they sometimes lead to new biological hypotheses.

General properties of model networks, particularly randomly generated networks, have been studied in an effort to understand basic principles behind the functioning of real networks. For example, Kauffman has equated different cell types in real organisms with different fixed points or cycles in the dynamics of model networks (Kauffman, 1969, Kauffman, 1993). He and others have studied how the number and period of attractors in random networks depends on network size, topology, and how the dynamics functions are chosen (Bagley and Glass, 1996, Bastolla and Parisi, 1997, Bilke and Sjunnesson, 2001, Glass and Hill, 1998, Kauffman, 1969, Kauffman, 1993, Kauffman et al., 2003, Raeymaekers, 2002, Samuelsson and Troein, 2003, Shmulevich and Kauffman, 2004, Socolar and Kauffman, 2003).

The increasing availability of quantitative gene expression data has kindled the hope of automatically inferring regulatory relationships in real gene networks. There have been some successes (e.g. Jaeger et al., 2004a, Jaeger et al., 2004b, Reinitz and Sharp, 1995), but there is not yet a standard methodology for doing so. Much remains to be understood about the problem. What is the computational complexity of the problem? What algorithms work best? How much data is needed? How should data be collected? Are there fundamental limits on what can be inferred from expression data alone?

Analyses of the inverse problem for Boolean and continuous-time switching networks have begun to provide theoretical answers to these questions. Liang et al. (1998) were the first to propose a solution to the inverse problem for Boolean networks. Later, Akutsu et al. (1999) and Ideker et al. (2000) described alternative solutions. Perkins et al. (2004) described solutions to the inverse problem for continuous-time switching networks. The approaches proposed for Boolean networks can also be applied to continuous-time switching networks, though the methods of Liang et al. (1998) and Akutsu et al. (1999) in particular require significantly more computation than the method described in Perkins et al. (2004).

We focus on the sample complexity of the inverse problem, that is, how much data is needed to identify the network. In particular, we study the problem for Boolean and continuous-time switching networks of N genes in which each gene has precisely K regulators. Akutsu et al. (1999) studied this problem for Boolean networks under the assumption that the data comprises uniformly randomly sampled states of the network. They proved that the amount of data needed scales as $\log N$ and as $2^{2K}$. The $2^{2K}$ term is disheartening, because it suggests that it will take enormous amounts of data to identify densely connected networks. However, the $\log N$ dependence is encouraging because it suggests that network size per se is not a very important factor.

We consider solving the inverse problem based on time series data, although, as we argue in Section 4, time series data from randomly generated Boolean networks behave in many respects as randomly sampled data. In Section 5, we show that, regardless of how the data is generated, solving the inverse problem for Boolean networks or continuous-time switching networks requires at least $\frac{1}{2}\left(2^{K} + K\left(\log_{2}(N - K) - \log_{2} K\right)\right)$ samples. We then derive new estimates for the expected amount of data required. It turns out that differences in the dynamical properties of these networks, examined in Section 4, have a significant impact. In Section 5, we estimate the expected sample complexity for Boolean networks as $O(K 2^{K} \log N)$ and for continuous-time switching networks as $O(2^{K} N \log N)$. These estimates are supported by simulation experiments, reported in Section 6.
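To give a feel for the magnitudes involved, the lower bound quoted above can be evaluated directly. The following is a minimal sketch; the function name `lower_bound_samples` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
import math


def lower_bound_samples(n_genes: int, k: int) -> float:
    """Evaluate (1/2) * (2^K + K * (log2(N - K) - log2(K))) as stated in the text.

    n_genes is N and k is K; the helper name is illustrative only.
    """
    if not (1 <= k < n_genes):
        raise ValueError("need 1 <= K < N")
    return 0.5 * (2 ** k + k * (math.log2(n_genes - k) - math.log2(k)))


if __name__ == "__main__":
    # The bound grows only logarithmically in N for fixed K.
    for n in (10, 100, 1000):
        print(n, round(lower_bound_samples(n, 5), 2))
```

For fixed K, the printed values grow slowly with N, which is the logarithmic dependence discussed above; the $2^{K}$ term dominates for small networks.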

Section snippets

Boolean networks and continuous-time switching networks

Boolean networks, as introduced by Kauffman (1969, 1993), are a discrete-time model of gene expression dynamics. Each of N genes has a Boolean level of expression as a function of time, denoted by X_i(t)∈{0,1}, where i∈{1, 2, …, N} and t∈{0, 1, 2, …}. Each gene i has K_i regulators, denoted $r_{i}^{1}, \dots, r_{i}^{K_{i}}$. Each gene also has a regulation function, or dynamics function, $f_{i} : \{0, 1\}^{K_{i}} \to \{0, 1\}$. The dynamics of gene i is given by $X_{i}(t + 1) = f_{i}(X_{r_{i}^{1}}(t), \dots, X_{r_{i}^{K_{i}}}(t))$. Our analysis and simulations focus on
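The synchronous update rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' simulation code: the snippet does not show how the paper generates random networks, so the sketch assumes that each gene draws K distinct regulators uniformly at random and a truth table with independent fair-coin entries. All function names are hypothetical.

```python
import random


def random_network(n_genes, k, rng):
    """Assumed generator: K distinct regulators and a uniform random truth table per gene."""
    regulators = [rng.sample(range(n_genes), k) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n_genes)]
    return regulators, tables


def step(state, regulators, tables):
    """Synchronous update: X_i(t+1) = f_i(X_{r_i^1}(t), ..., X_{r_i^K}(t))."""
    nxt = []
    for regs, table in zip(regulators, tables):
        index = 0
        for r in regs:
            # Pack the regulator values into a truth-table row index.
            index = (index << 1) | state[r]
        nxt.append(table[index])
    return tuple(nxt)


def simulate(n_genes, k, t_steps, seed=0):
    """Return the network and a time series X(0), ..., X(T) from a random start."""
    rng = random.Random(seed)
    regulators, tables = random_network(n_genes, k, rng)
    state = tuple(rng.randint(0, 1) for _ in range(n_genes))
    series = [state]
    for _ in range(t_steps):
        state = step(state, regulators, tables)
        series.append(state)
    return regulators, tables, series


if __name__ == "__main__":
    _, _, series = simulate(n_genes=8, k=2, t_steps=5, seed=1)
    for t, x in enumerate(series):
        print(t, x)
```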

The inverse problem

Given one or more time series generated by a Boolean network or a continuous-time switching network, the inverse problem is to identify the network generating the data. For simplicity of exposition we assume, in the Boolean network case, a single time series of length T + 1. Thus, the data is a sequence, {X(0), X(1), …, X(T)}, where X(t) is a vector of the Boolean states of the genes at time t. We say this sequence comprises T samples of the network dynamics, because it includes T transitions
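One naive way to make the Boolean inverse problem concrete is an exhaustive consistency search: for a given gene, try every candidate set of K regulators and keep those for which no two transitions show the same regulator values followed by different outputs. The sketch below is such a brute-force illustration, not the algorithm of any of the cited papers; the function name and the three-gene example series are assumptions introduced here.

```python
from itertools import combinations


def consistent_regulator_sets(series, gene, k):
    """Return (regulator_set, partial_truth_table) pairs consistent with the data.

    series is [X(0), ..., X(T)], each X(t) a tuple of 0/1 values.
    """
    n_genes = len(series[0])
    results = []
    for regs in combinations(range(n_genes), k):
        table = {}
        ok = True
        for before, after in zip(series, series[1:]):
            key = tuple(before[r] for r in regs)
            out = after[gene]
            # A conflict means this regulator set cannot explain the data.
            if table.setdefault(key, out) != out:
                ok = False
                break
        if ok:
            results.append((regs, table))
    return results


# Hand-built example: X0' = X1, X1' = X2, X2' = NOT X0, started at (0, 0, 0).
example_series = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0),
]

if __name__ == "__main__":
    for gene in range(3):
        print(gene, consistent_regulator_sets(example_series, gene, 1))
```

With too few transitions, several regulator sets typically remain consistent; the sample complexity question is how long the series must be before only the true set survives.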

Dynamical properties of Boolean networks and continuous-time switching networks

Consider a Boolean network of N genes and K regulators per gene, generated randomly as described in Section 2. A Boolean network is a deterministic dynamical model with a finite number of possible states. Thus, the asymptotic behavior of the network is to reach a fixed point of the dynamics or to reach a repeating cycle of states, where a cycle can be between 2 and 2^N states long. It has been observed that when K = 1 or 2, a typical network rapidly reaches a fixed point or short-period cycle (
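Because the state space is finite and the update is deterministic, the attractor reached from any initial state can be found by iterating until a state repeats. The following minimal sketch records the time at which each state was first visited; `find_attractor` and the two toy update rules are hypothetical names and examples introduced for illustration.

```python
def find_attractor(update, state):
    """Iterate a deterministic update until a state repeats.

    Returns (transient_length, period); period 1 means a fixed point.
    """
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = update(state)
        t += 1
    transient = first_seen[state]
    return transient, t - transient


if __name__ == "__main__":
    # Two-gene toy rule: X0' = X0 AND X1, X1' = X0.
    print(find_attractor(lambda s: (s[0] & s[1], s[0]), (1, 0)))
    # Three-gene toy rule: X0' = X1, X1' = X2, X2' = NOT X0.
    print(find_attractor(lambda s: (s[1], s[2], 1 - s[0]), (0, 0, 0)))
```

Once a trajectory enters its attractor, further transitions only repeat states already seen, so they contribute no new information for identifying the network.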

The amount of data needed to solve the inverse problem

How much data is needed to solve the inverse problem for a Boolean or continuous-time switching network of N genes, each having K regulators? A simple lower bound can be derived based on the number of possible networks. There are $\binom{N}{K}$ possible sets of regulators for each gene in a Boolean network, and $\binom{N - 1}{K}$ for a continuous-time switching network. In either case, this is more than $\left(\frac{N - K}{K}\right)^{K}$. There are also $2^{2^{K}}$ possible dynamics functions for each gene. Thus, the total number of networks on N genes with K
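The counts quoted above are easy to check numerically. The sketch below assumes that each gene's regulator set and dynamics function are chosen independently of the other genes, so that the Boolean network count is the per-gene count raised to the power N; that product is arithmetic added here for illustration, since the snippet is cut off before the paper states its own total. The function names are hypothetical.

```python
import math


def log2_network_count(n_genes, k):
    """log2 of the number of Boolean networks, assuming independent choices per gene.

    Each gene has C(N, K) regulator sets and 2^(2^K) dynamics functions.
    """
    per_gene_bits = math.log2(math.comb(n_genes, k)) + 2 ** k
    return n_genes * per_gene_bits


def regulator_set_bound_holds(n_genes, k):
    """Check C(N, K) > ((N - K) / K)^K for the Boolean network case."""
    return math.comb(n_genes, k) > ((n_genes - k) / k) ** k


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(n, regulator_set_bound_holds(n, 5), round(log2_network_count(n, 5), 1))
```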

Simulation experiments

We performed simulation experiments to test the bound and estimates of the previous section. In the first experiment, we tested the sample complexity of the inverse problem for Boolean and continuous-time switching networks with K = 5 regulators per gene and number of genes N∈{10, 15, 20, …, 50}. For each choice of N we randomly generated 10 networks, each comprising regulator sets and regulation functions for each gene. Each network was simulated as a Boolean network and as a continuous-time

Discussion

We have observed that the dynamics of randomly generated Boolean and continuous-time switching networks have very different statistical characteristics, which lead to different estimates for the amount of data needed to solve the inverse problem. As did Akutsu et al. (1999), we observed that the number of samples needed for Boolean network identification scales as log N, where N is the number of genes in the network. The log N dependence comes from the assumption that the data is independently

Acknowledgements

This work was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. This material is based upon work supported by the National Science Foundation under a grant awarded in 2002. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References (30)

R. Albert et al. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theor. Biol. (2003)
R.J. Bagley et al. Counting and classifying attractors in high dimensional dynamical systems. J. Theor. Biol. (1996)
U. Bastolla et al. A numerical study of the critical line of Kauffman networks. J. Theor. Biol. (1997)
H. de Jong et al. Qualitative simulation of genetic regulatory networks using piecewise-linear models. Bull. Math. Biol. (2004)
S.A. Kauffman. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. (1969)
C. Oosawa et al. Effects of alternative connectivity on behavior of randomly constructed Boolean networks. Physica D (2002)
T.J. Perkins et al. Inferring models of gene expression dynamics. J. Theor. Biol. (2004)
L. Raeymaekers. Dynamics of Boolean networks controlled by biologically meaningful functions. J. Theor. Biol. (2002)
J. Reinitz et al. Mechanism of eve stripe formation. Mech. Dev. (1995)
L. Sanchez et al. A logical analysis of the gap gene system. J. Theor. Biol. (2001)
T. Akutsu et al. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model
M. Aldana et al. A natural class of robust networks. Proc. Natl. Acad. Sci. U.S.A. (2003)
S. Bilke et al. Stability of the Kauffman model. Phys. Rev. E (2001)
A.K. Chandra et al. The electrical resistance of a graph captures its commute and cover times. Comput. Complexity (1997)
H. de Jong et al. Qualitative simulation of the initiation of sporulation in Bacillus subtilis. Bull. Math. Biol. (2004)

