Dynamical properties of model gene networks and implications for the inverse problem
Introduction
Abstract models of genetic networks, such as Boolean networks and continuous-time switching networks, have been proposed as conceptual models for helping us understand the behavior of real genetic networks (Glass, 1975, Kauffman, 1969, Kauffman, 1993). These formalisms, or generalizations of them, have been used to model such systems as the sporulation network in Bacillus subtilis (de Jong et al., 2004a), the gap gene network of Drosophila melanogaster (Sanchez and Thieffry, 2001), and the segment polarity network in the same organism (Albert and Othmer, 2003). Although these models omit many of the details of the real chemical interactions, they serve as useful syntheses of the often-distributed biological knowledge about these networks, they allow testing of hypotheses about network behavior, and they sometimes lead to new biological hypotheses.
General properties of model networks, particularly randomly generated networks, have been studied in an effort to understand basic principles behind the functioning of real networks. For example, Kauffman has equated different cell types in real organisms with different fixed points or cycles in the dynamics of model networks (Kauffman, 1969, Kauffman, 1993). He and others have studied how the number and period of attractors in random networks depend on network size, topology, and the way the dynamics functions are chosen (Bagley and Glass, 1996, Bastolla and Parisi, 1997, Bilke and Sjunnesson, 2001, Glass and Hill, 1998, Kauffman, 1969, Kauffman, 1993, Kauffman et al., 2003, Raeymaekers, 2002, Samuelsson and Troein, 2003, Shmulevich and Kauffman, 2004, Socolar and Kauffman, 2003).
The increasing availability of quantitative gene expression data has kindled the hope of automatically inferring regulatory relationships in real gene networks. There have been some successes (e.g. Jaeger et al., 2004a, Jaeger et al., 2004b, Reinitz and Sharp, 1995), but there is not yet a standard methodology for doing so. Much remains to be understood about the problem. What is the computational complexity of the problem? What algorithms work best? How much data is needed? How should data be collected? Are there fundamental limits on what can be inferred from expression data alone?
Analyses of the inverse problem for Boolean and continuous-time switching networks have begun to provide theoretical answers to these questions. Liang et al. (1998) were the first to propose a solution to the inverse problem for Boolean networks. Later, Akutsu et al. (1999) and Ideker et al. (2000) described alternative solutions. Perkins et al. (2004) described solutions to the inverse problem for continuous-time switching networks. The approaches proposed for Boolean networks can also be applied to continuous-time switching networks, though the methods of Liang et al. (1998) and Akutsu et al. (1999) in particular require significantly more computation than the method described in Perkins et al. (2004).
We focus on the sample complexity of the inverse problem: how much data is needed to identify the network? In particular, we study the problem for Boolean and continuous-time switching networks of N genes in which each gene has precisely K regulators. Akutsu et al. (1999) studied this problem for Boolean networks under the assumption that the data comprises uniformly randomly sampled states of the network. They proved that the amount of data needed scales as $\log N$ and as $2^{2K}$. The $2^{2K}$ term is disheartening, because it suggests that it will take enormous amounts of data to identify densely connected networks. However, the $\log N$ dependence is encouraging because it suggests that network size per se is not a very important factor.
We consider solving the inverse problem based on time series data, although, as we argue in Section 4, time series data from randomly generated Boolean networks behave in many respects like randomly sampled data. In Section 5, we show that, regardless of how the data is generated, there is a lower bound on the number of samples required to solve the inverse problem for Boolean networks or continuous-time switching networks. We then derive new estimates for the expected amount of data required. It turns out that differences in the dynamical properties of these networks, examined in Section 4, have a significant impact. In Section 5, we estimate the expected sample complexity for Boolean networks as $O(K 2^K \log N)$ and for continuous-time switching networks as $O(2^K N \log N)$. These estimates are supported by simulation experiments, reported in Section 6.
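To get a feel for how differently the two estimates grow, the following sketch evaluates them with the unknown constant factors dropped. It assumes the estimates read as K·2^K·log N for Boolean networks and 2^K·N·log N for continuous-time switching networks; the function names are illustrative, and the numbers are only meaningful relative to one another.

```python
import math


def boolean_estimate(n_genes: int, k: int) -> float:
    """K * 2^K * log N: the Boolean-network estimate, constant factor dropped."""
    return k * 2**k * math.log(n_genes)


def switching_estimate(n_genes: int, k: int) -> float:
    """2^K * N * log N: the switching-network estimate, constant factor dropped."""
    return 2**k * n_genes * math.log(n_genes)


if __name__ == "__main__":
    # Fix K and let N grow: the ratio of the two estimates is N / K.
    for n in (10, 50, 250):
        b = boolean_estimate(n, 5)
        s = switching_estimate(n, 5)
        print(f"N={n:4d}  Boolean~{b:10.0f}  switching~{s:10.0f}  ratio={s / b:.1f}")
```

Under these readings the ratio of the two expressions is simply N/K, so the gap widens linearly as the network grows while K stays fixed.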
Section snippets
Boolean networks and continuous-time switching networks
Boolean networks, as introduced by Kauffman (1969, 1993), are a discrete-time model of gene expression dynamics. Each of N genes has a Boolean level of expression as a function of time, denoted by $X_i(t) \in \{0, 1\}$, where $i \in \{1, 2, \ldots, N\}$ and $t \in \{0, 1, 2, \ldots\}$. Each gene i has $K_i$ regulators and a regulation function, or dynamics function, which determines the dynamics of gene i from the states of its regulators. Our analysis and simulations focus on
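A minimal Python sketch of a synchronous Boolean network of this kind is given here. The function names, the choice of K distinct regulators drawn uniformly for each gene (self-regulation allowed), and the uniformly random truth tables are illustrative assumptions, not details taken from the text.

```python
import random


def random_boolean_network(n_genes, k, rng):
    """Draw K distinct regulators and a random truth table for every gene."""
    regulators = [rng.sample(range(n_genes), k) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2**k)] for _ in range(n_genes)]
    return regulators, tables


def step(state, regulators, tables):
    """Update all genes at once; each gene reads its regulators' current states."""
    nxt = []
    for regs, table in zip(regulators, tables):
        index = 0
        for r in regs:
            # Pack the regulator states into a truth-table row index.
            index = (index << 1) | state[r]
        nxt.append(table[index])
    return tuple(nxt)


def simulate(initial, regulators, tables, n_steps):
    """Return the time series X(0), X(1), ..., X(n_steps)."""
    series = [tuple(initial)]
    for _ in range(n_steps):
        series.append(step(series[-1], regulators, tables))
    return series


if __name__ == "__main__":
    rng = random.Random(0)
    regs, tabs = random_boolean_network(n_genes=6, k=2, rng=rng)
    for x in simulate((0, 1, 0, 1, 0, 1), regs, tabs, n_steps=5):
        print(x)
```

Storing each regulation function as a flat truth table of length 2^K keeps the update a single lookup per gene, at the cost of memory that grows exponentially in K.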
The inverse problem
Given one or more time series generated by a Boolean network or a continuous-time switching network, the inverse problem is to identify the network generating the data. For simplicity of exposition we assume, in the Boolean network case, a single time series of length T + 1. Thus, the data is a sequence, {X(0), X(1), …, X(T)}, where X(t) is a vector of the Boolean states of the genes at time t. We say this sequence comprises T samples of the network dynamics, because it includes T transitions
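One simple way to see what identification from a time series involves is a brute-force consistency check. The sketch below is an illustration, not necessarily the procedure of any of the cited methods: for one gene, it lists every set of K candidate regulators for which no two transitions show the same regulator states followed by different next states of that gene. The function name and the toy three-gene ring used in the example are assumptions made for illustration.

```python
from itertools import combinations


def consistent_regulator_sets(series, gene, k):
    """Return every K-subset of genes that could explain `gene`'s transitions."""
    n_genes = len(series[0])
    found = []
    for regs in combinations(range(n_genes), k):
        seen = {}  # regulator states -> next state of `gene` observed so far
        ok = True
        for now, nxt in zip(series, series[1:]):
            key = tuple(now[r] for r in regs)
            # A conflict means no Boolean function of `regs` fits the data.
            if seen.setdefault(key, nxt[gene]) != nxt[gene]:
                ok = False
                break
        if ok:
            found.append(regs)
    return found


if __name__ == "__main__":
    # Toy ring: X0 <- X1, X1 <- X2, X2 <- NOT X0, started from (0, 0, 0).
    series = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1),
              (1, 1, 0), (1, 0, 0), (0, 0, 0)]
    print(consistent_regulator_sets(series, gene=0, k=1))      # full series
    print(consistent_regulator_sets(series[:2], gene=0, k=1))  # one transition
```

With the full series only the true regulator of gene 0 survives, whereas a single transition leaves every candidate consistent, which is the sense in which the number of samples limits what can be identified.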
Dynamical properties of Boolean networks and continuous-time switching networks
Consider a Boolean network of N genes and K regulators per gene, generated randomly as described in Section 2. A Boolean network is a deterministic dynamical model with a finite number of possible states. Thus, the asymptotic behavior of the network is to reach a fixed point of the dynamics or to reach a repeating cycle of states, where a cycle can be between 2 and $2^N$ states long. It has been observed that when K = 1 or 2, a typical network rapidly reaches a fixed point or short-period cycle (
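Because the state space is finite and the update is deterministic, the attractor reached from a given initial state can be found by iterating until a state repeats. The following sketch does this for any update function on hashable states; the function name and the toy three-gene update rule are illustrative assumptions.

```python
def find_attractor(initial, update):
    """Iterate `update` until a state repeats; return (transient length, period)."""
    first_seen = {}  # state -> time step at which it was first visited
    state = tuple(initial)
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = update(state)
        t += 1
    transient = first_seen[state]
    return transient, t - transient


if __name__ == "__main__":
    # Toy ring: X0 <- X1, X1 <- X2, X2 <- NOT X0.
    def ring(s):
        return (s[1], s[2], 1 - s[0])

    print(find_attractor((0, 0, 0), ring))  # (0, 6): a cycle of six states
    print(find_attractor((0, 1, 0), ring))  # (0, 2): a cycle of two states
```

A period of 1 corresponds to a fixed point. The dictionary of visited states can grow as large as the state space, so this direct approach is only practical for small N or for networks that settle quickly.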
The amount of data needed to solve the inverse problem
How much data is needed to solve the inverse problem for a Boolean or continuous-time switching network of N genes, each having K regulators? A simple lower bound can be derived based on the number of possible networks. The number of possible sets of regulators for each gene is given by one expression for a Boolean network and by another for a continuous-time switching network; in either case, it exceeds a common lower bound. There is also a fixed number of possible dynamics functions for each gene. Thus, the total number of networks on N genes with K
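A counting argument of this general kind can be sketched numerically. The version below is a naive illustration under stated assumptions, not the bound derived in the text: it assumes any K of the N genes may regulate a gene (C(N, K) regulator sets), that every Boolean function of K inputs is allowed (2^(2^K) functions), and that each observed transition reveals at most N bits. Distinguishing all such networks then needs at least log2 C(N, K) + 2^K transitions. The function name is hypothetical.

```python
import math


def counting_lower_bound(n_genes, k):
    """Transitions needed before N bits per transition can index every network."""
    regulator_sets = math.comb(n_genes, k)  # assumed: any K of the N genes
    log2_functions = 2**k                   # log2 of the 2^(2^K) truth tables
    # Networks: (regulator_sets * 2^(2^K))^N, so log2 is N times the per-gene
    # term. Dividing by the N bits per transition leaves the per-gene term.
    return math.ceil(math.log2(regulator_sets) + log2_functions)


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(n, counting_lower_bound(n, 5))
```

The count treats networks that differ only in unused inputs as distinct, so it overstates the number of distinguishable behaviors; it is meant only to show why a term exponential in K and a term logarithmic in N both appear.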
Simulation experiments
We performed simulation experiments to test the bound and estimates of the previous section. In the first experiment, we tested the sample complexity of the inverse problem for Boolean and continuous-time switching networks with K = 5 regulators per gene and number of genes N∈{10, 15, 20, …, 50}. For each choice of N we randomly generated 10 networks, each comprising regulator sets and regulation functions for each gene. Each network was simulated as a Boolean network and as a continuous-time
Discussion
We have observed that the dynamics of randomly generated Boolean and continuous-time switching networks have markedly different statistical characteristics, which lead to different estimates for the amount of data needed to solve the inverse problem. As did Akutsu et al. (1999), we observed that the number of samples needed for Boolean network identification scales as log N, where N is the number of genes in the network. The log N dependence comes from the assumption that the data is independently
Acknowledgements
This work was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. This material is based upon work supported by the National Science Foundation under a grant awarded in 2002. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References (30)
The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theor. Biol. (2003)
Counting and classifying attractors in high dimensional dynamical systems. J. Theor. Biol. (1996)
A numerical study of the critical line of Kauffman networks. J. Theor. Biol. (1997)
Qualitative simulation of genetic regulatory networks using piecewise-linear models. Bull. Math. Biol. (2004)
Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. (1969)
Effects of alternative connectivity on behavior of randomly constructed Boolean networks. Physica D (2002)
Inferring models of gene expression dynamics. J. Theor. Biol. (2004)
Dynamics of Boolean networks controlled by biologically meaningful functions. J. Theor. Biol. (2002)
Mechanism of eve stripe formation. Mech. Dev. (1995)
A logical analysis of the gap gene system. J. Theor. Biol. (2001)
Identification of genetic networks from a small number of gene expression patterns under the Boolean network model
A natural class of robust networks. Proc. Natl. Acad. Sci. U.S.A.
Stability of the Kauffman model. Phys. Rev. E
The electrical resistance of a graph captures its commute and cover times. Comput. Complexity
Qualitative simulation of the initiation of sporulation in Bacillus subtilis. Bull. Math. Biol.