Boolean regulatory network reconstruction using literature-based knowledge with a genetic algorithm optimization method

Background Prior knowledge networks (PKNs) provide a framework for the development of computational biological models, including Boolean models of regulatory networks which are the focus of this work. PKNs are created by a painstaking process of literature curation, and generally describe all relevant regulatory interactions identified using a variety of experimental conditions and systems, such as specific cell types or tissues. Certain of these regulatory interactions may not occur in all biological contexts of interest, and their presence may dramatically change the dynamical behaviour of the resulting computational model, hindering the elucidation of the underlying mechanisms and reducing the usefulness of model predictions. Methods are therefore required to generate optimized contextual network models from generic PKNs. Results We developed a new approach to generate and optimize Boolean networks, based on a given PKN. Using a genetic algorithm, a model network is built as a sub-network of the PKN and trained against experimental data to reproduce the experimentally observed behaviour in terms of attractors and the transitions that occur between them under specific perturbations. The resulting model network is therefore contextualized to the experimental conditions and constitutes a dynamical Boolean model closer to the observed biological process used to train the model than the original PKN. Such a model can then be interrogated to simulate response under perturbation, to detect stable states and their properties, to get insights into the underlying mechanisms and to generate new testable hypotheses. Conclusions Generic PKNs attempt to synthesize knowledge of all interactions occurring in a biological process of interest, irrespective of the specific biological context. This limits their usefulness as a basis for the development of context-specific, predictive dynamical Boolean models. 
The optimization method presented in this article produces specific, contextualized models from generic PKNs. These contextualized models have improved utility for hypothesis generation and experimental design. The general applicability of this methodological approach makes it suitable for a variety of biological systems and of general interest for biological and medical research. Our method was implemented in the software optimusqual, available online at http://www.vital-it.ch/software/optimusqual/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1287-z) contains supplementary material, which is available to authorized users.

The stochastic method is based on the following algorithm used to find the attractors of a network with perturbation P, starting from a specified set of initial states. In addition to the network, the perturbation and the set of initial states, this algorithm takes as input a number of paths N_paths, a maximal number of iterations N_max and a number of iterations N_stored used to estimate the approximate state transition graph G. We describe the algorithm for asynchronous updates, but it can be trivially generalized to synchronous updates: 1. Randomly choose one network state x_init from the set of initial states. Set x = x_init and clear the approximate state transition graph G.
2. Perform (N_max − N_stored) iterations of a random walk to reach a state closer to an attractor. At each iteration of the random walk, evaluate all successors {x_0, x_1, ...} of state x using asynchronous updates and the constraints given by the perturbation P (node states fixed to 0 or 1). If the set of successor states is empty, then x is a steady state: add an edge x → x to the approximate state transition graph G, stop the exploration of this path and go to step (4). Otherwise, randomly choose the next state x ∈ {x_0, x_1, ...} (with uniform probability) and continue the random walk.
3. Perform a depth first search of the state transition graph, starting from the last state obtained during the random walk (denoted x_root). At each iteration, store the current state x and all transitions to successor states {x → x_0, x → x_1, ...} as edges in the approximate state transition graph G. If the set of successor states is empty, then x is a steady state: add an edge x → x to the approximate state transition graph G, stop the exploration of this path and go to step (4). Stop the depth first search and go to (4) whenever all states downstream of x_root have been discovered or the number of iterations reaches N_stored.
4. The resulting graph G approximates the exact state transition graph in the limit of long term evolution of the network. Attractors of the network are obtained as the set of terminal strongly connected components [2] of G that contain at least one feedback loop. This constraint on the presence of feedback loops is used to distinguish between a strongly connected component with a single node without feedback loops, which corresponds to an unfinished path (not necessarily an attractor), and a single node with a feedback loop, which corresponds to a steady state. If an attractor was found, keep the information on the initial state x_init, which is necessary to recover the reachability from initial states to attractors.
5. Store the attractors found and the corresponding initial state x_init. Repeat this procedure N_paths times from step (1). Output all attractors found, together with information on reachability from initial states to attractors.
The attractor reachability graph of the network can easily be found by applying this algorithm to all transitions from perturbation P_n to perturbation P_m, using the attractors found with perturbation P_n as initial states, and evaluating attractors and reachability from initial states for the network with perturbation P_m. If a perturbation P_n does not have any incoming transition from other perturbations, the initial states are taken as the set of all network states compatible with perturbation P_n and with the constraint that network nodes without regulators are fixed to state 0.
The advantage of this method is that it outputs only exact attractors of the network. The disadvantage is that not all attractors will necessarily be found. As a consequence, when f_T (see below and in the main text) is evaluated using the approximate attractor reachability graph obtained with this method, it will be larger than or equal to the exact f_T.
In this work, we used N_paths = 100, N_max = 10 N_nodes and N_stored = N_nodes, with N_nodes the number of nodes in the network.
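The stochastic search above can be sketched in Python. This is a minimal illustration under stated assumptions, not the optimusqual implementation: a network is encoded as one update function per node acting on state tuples, and for brevity only steady states (empty successor set) are recorded; cyclic attractors would additionally require extracting the terminal strongly connected components with a feedback loop from the stored graph G, as described in step (4).

```python
import random

def async_successors(state, update_fns, fixed):
    """All asynchronous successors of `state` that differ from it in
    exactly one node (an empty result means `state` is a steady state).
    `fixed` holds the indices of perturbed nodes, which are never updated."""
    succs = []
    for i, f in enumerate(update_fns):
        if i in fixed:
            continue  # node clamped by the perturbation P
        if f(state) != state[i]:
            succs.append(state[:i] + (f(state),) + state[i + 1:])
    return succs

def stochastic_attractor_search(update_fns, initial_states, fixed=None,
                                n_paths=100, n_max=None, n_stored=None,
                                rng=random):
    """Sketch of steps (1)-(5): random walk of n_max - n_stored steps,
    then a depth first search bounded by n_stored stored states.
    Returns the set of steady states found."""
    n = len(update_fns)
    fixed = fixed or set()
    n_max = n_max if n_max is not None else 10 * n
    n_stored = n_stored if n_stored is not None else n
    steady = set()
    for _ in range(n_paths):
        x = rng.choice(initial_states)          # step (1)
        for _ in range(n_max - n_stored):       # step (2): random walk
            succs = async_successors(x, update_fns, fixed)
            if not succs:
                steady.add(x)
                break
            x = rng.choice(succs)
        else:                                   # step (3): bounded DFS
            stack, seen = [x], set()
            while stack and len(seen) < n_stored:
                s = stack.pop()
                if s in seen:
                    continue
                seen.add(s)
                succs = async_successors(s, update_fns, fixed)
                if not succs:
                    steady.add(s)
                stack.extend(succs)
    return steady
```

For a two-node mutual-activation toy network (node 0 copies node 1 and vice versa), repeated paths from the unstable state (1, 0) recover both steady states (0, 0) and (1, 1).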

Fitness function
For each network, we define a multi-dimensional fitness function f = (f_T, N_ess.nodes, N_node, N_edges). Its components are defined as: 1. f_T measures how well a given model network reproduces experiments from the training set. In the following, the training set is assumed to be given in the form of a graph, with each node defined by a perturbation P of the system (any combination of nodes with states fixed to 0 or 1) and a corresponding observation O (a list of node states measured at equilibrium after the perturbation). The set of nodes is denoted as {(P_0, O_0), (P_1, O_1), ...}, where P_n and O_n are the perturbation and observation corresponding to node n. Edges in the training set graph correspond to transitions between stable phenotypes. An edge (P_n, O_n) → (P_m, O_m) means that under perturbation P_n, the system stabilized into a phenotype characterized by observation O_n and, after perturbation by P_m, the system stabilized into a phenotype characterized by observation O_m. Given a network and a training set graph, f_T is evaluated as follows: • Using the stochastic search described above, evaluate the attractor reachability graph of the network for all perturbations and transitions given in the training set graph. More precisely, for each node (P_n, O_n) of the training set graph, the attractor reachability graph contains one node per attractor found after perturbation P_n. For clarity, we denote by {A_n,0, A_n,1, ...} the set of attractors of the network obtained with perturbation P_n. For each edge (P_n, O_n) → (P_m, O_m) in the training set graph, the attractor reachability graph contains edges connecting attractors {A_n,0, A_n,1, ...} to attractors {A_m,0, A_m,1, ...}. That is, an edge A_n,k → A_m,p will be in the attractor reachability graph if and only if at least one state in attractor A_n,k obtained with perturbation P_n stabilizes into attractor A_m,p after perturbation P_m.
• For each attractor A_n,k obtained with perturbation P_n, measure its Manhattan distance to the observed phenotype O_n as d_n,k = Σ_i |x_i^(n,k) − O_n,i|, where the sum runs over all nodes i appearing in observation O_n and x_i^(n,k) is the average state of the network node i in attractor A_n,k (if attractor A_n,k contains more than one state, x_i^(n,k) is the average node state over all states in A_n,k). If a node appears in observation O_n but is not in the network, it is considered as isolated and its state is set to x_i^(n,k) = 0. An illustration is given in Figure S1.
• Next, we want to consider all subgraphs of the attractor reachability graph which have the same structure as the training set graph. More precisely, we consider all subgraphs g satisfying the following conditions: – For each node (P_n, O_n) of the training set graph, the subgraph has exactly one node (chosen among the nodes {A_n,0, A_n,1, ...}). – For each edge (P_n, O_n) → (P_m, O_m) in the training set graph, the subgraph has exactly one edge (chosen among the edges of the attractor reachability graph connecting nodes {A_n,0, A_n,1, ...} to nodes {A_m,0, A_m,1, ...}). Note that this is not an isomorphism, since one node or edge from the attractor reachability graph may correspond to more than one node or edge in the training set graph. For each subgraph g satisfying these conditions, evaluate the total distance to the training set as the sum of distances d_n,k over all nodes (attractors) in the subgraph: d_g = Σ_(n,k)∈V d_n,k, where V denotes the set of nodes in g. An illustration is given in Figure S2. Among all possible subgraphs, find the one with minimal total distance to the training set: d = min_g d_g. In the following, we denote by g_best the subgraph that minimizes d_g.
• Finally, f_T is obtained by normalizing this minimal total distance d by the total number of observations in the training set: f_T = d / Σ_n |O_n|, where |O_n| is the number of node states in observation O_n. With this normalization, f_T is between 0 (best) and 1 (worst) and can be interpreted as the fraction of observations in the training set that are not reproduced correctly by the network. Notes: • Our choice of using only subgraphs of the attractor reachability graph with the same structure as the training set graph has some important consequences. Firstly, if observation O_n obtained with perturbation P_n is linked to observation O_m obtained with perturbation P_m in the training set graph (edge (P_n, O_n) → (P_m, O_m)), the attractor associated to O_n with perturbation P_n must stabilize into the attractor associated to O_m after perturbation P_m. This reachability condition is strictly enforced, i.e. two attractors which are not connected in the attractor reachability graph will not be considered for the evaluation of f_T, even if they are closer to their corresponding observations O_n and O_m (smaller distances d_n,k and d_m,p). Secondly, one node (P_n, O_n) in the training set graph must correspond to exactly one attractor of the network after perturbation P_n; that is, if a node (P_n, O_n) in the training set graph has two outgoing edges (P_n, O_n) → (P_m1, O_m1) and (P_n, O_n) → (P_m2, O_m2), then the same attractor of the network obtained with perturbation P_n and associated to O_n must stabilize into the attractor associated to O_m1 after perturbation P_m1 and into the attractor associated to O_m2 after perturbation P_m2.
• Since the attractor reachability graph of the network is obtained by a stochastic search, it may result in an incomplete graph. That is, for a node (P_n, O_n) of the training set graph, the stochastic search may not be able to find any attractor. Similarly, for an edge (P_n, O_n) → (P_m, O_m) in the training set graph, the stochastic search may not be able to find any transition from attractors obtained with perturbation P_n to attractors obtained with P_m. If the attractor reachability graph is incomplete, it may not be possible to find any subgraph with the same structure as the training set graph. In this case, f_T is flagged as invalid.
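Assuming the attractor reachability graph and the distances d_n,k have already been computed, the subgraph search underlying f_T can be sketched by brute force. The dictionary-based encoding and function names below are illustrative assumptions, not the actual implementation: one attractor is chosen per training-set node, choices whose induced edges are missing from the reachability graph are discarded, and the minimal normalized total distance is returned (None standing in for an invalid f_T).

```python
from itertools import product

def f_T(train_nodes, train_edges, attractors, distances, reach_edges, n_obs):
    """Brute-force sketch of the f_T evaluation.
    train_nodes: perturbation ids of the training set graph.
    train_edges: (n, m) pairs, the training set graph edges.
    attractors[n]: attractor indices found with perturbation n.
    distances[(n, k)]: Manhattan distance d_{n,k} to observation O_n.
    reach_edges: set of ((n, k), (m, p)) attractor reachability edges.
    n_obs: total number of observations, used for normalization."""
    best = None
    # one candidate subgraph g per combination of attractor choices
    for choice in product(*(attractors[n] for n in train_nodes)):
        pick = dict(zip(train_nodes, choice))
        # reachability strictly enforced: every training edge must exist
        if all(((n, pick[n]), (m, pick[m])) in reach_edges
               for n, m in train_edges):
            d_g = sum(distances[(n, pick[n])] for n in train_nodes)
            best = d_g if best is None else min(best, d_g)
    return None if best is None else best / n_obs
```

In a toy case with two training nodes n → m, two attractors for n and one for m, only the attractor of n that actually reaches the attractor of m is eligible, even when the other attractor has a smaller distance.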
2. N_ess.nodes is the number of essential nodes present in the network. The set of essential nodes is a predefined set of nodes that should be included in the final model network.
3. N_node is the number of nodes in the network.
4. N_edges is the number of edges in the network.
Multi-dimensional fitness functions are compared using lexicographical ordering: f < f′ if and only if, at the first component in which f and f′ differ, the component of f is smaller than the corresponding component of f′.
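As an illustration, Python tuples already compare lexicographically, so a fitness vector can be encoded directly as a tuple. The sign convention below (negating the essential-node count so that smaller is uniformly better across all components) is an assumption for the sketch, not taken from the article.

```python
def fitness_key(f_T, n_ess, n_nodes, n_edges):
    """Lexicographically comparable fitness vector. Tuples compare
    element by element, so sorting a population by this key orders
    replicas from best (smallest) to worst. Negating n_ess is an
    assumed convention to reward networks with more essential nodes."""
    return (f_T, -n_ess, n_nodes, n_edges)

# f_T dominates: a better training-set fit wins regardless of size
replicas = [fitness_key(0.2, 5, 30, 40), fitness_key(0.1, 3, 50, 80)]
best = sorted(replicas)[0]
```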

Genetic algorithm
The genetic algorithm is implemented as summarized here, with details provided in the following paragraphs: 1. Start with a population of N_r empty model networks, also called replicas in the following (unless specified otherwise, we used N_r = 50).

2. Create a new generation of replicas, starting from an empty population:
• Add the n_1 best replicas from the last generation (we used n_1 = 0.1 N_r).
• Randomly choose n_2 replicas, each with probability linearly decreasing from the best to the worst replica. Mutate each of these replicas and add them to the new generation (we used n_2 = N_r).
• Randomly choose n_3 pairs of replicas, with each replica chosen with probability linearly decreasing from the best to the worst replica. For each pair, generate two new replicas by cross-over and add them to the new generation (we used n_3 = 0.3 N_r).
Note that this new generation has n_1 + n_2 + 2 n_3 replicas.
3. Evaluate the fitness function for each replica. If a replica has an invalid f_T (due to a problem with the stochastic method used to estimate the attractor reachability graph), it is removed from the population. Sort the replicas by increasing fitness function, and keep only the first N_r replicas.
4. If the best value of the fitness function did not improve during the last 10 iterations, output the best replica and stop. Otherwise, go to step 2.
Note that the PKN can contain interactions combining multiple input nodes in a Boolean expression with AND and NOT operators (e.g. gene1 AND NOT gene2 → gene3). In the following, each of these Boolean expressions is considered as one entity (hyperedge), like any other edge, and is added to or removed from the network as a whole.
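Steps 1-4 of the genetic algorithm can be sketched as a generic loop. This is a toy illustration under stated assumptions: replicas are plain values rather than networks, fitness returns a lexicographically comparable tuple (None for an invalid f_T), and the mutate/crossover callables are supplied by the caller; none of the names below come from the optimusqual code.

```python
import random

def linear_rank_choice(sorted_pop, rng):
    """Pick one replica with probability decreasing linearly from the
    best (index 0, weight n) to the worst (weight 1)."""
    n = len(sorted_pop)
    return rng.choices(sorted_pop, weights=[n - i for i in range(n)], k=1)[0]

def evolve(init_pop, fitness, mutate, crossover, n_r=50, patience=10,
           rng=None):
    """Sketch of the evolutionary loop: elitism (n_1 best), n_2 mutants,
    n_3 cross-over pairs; invalid replicas (fitness None) are dropped;
    stop after `patience` generations without improvement of the best."""
    rng = rng or random.Random()
    n1, n2, n3 = max(1, int(0.1 * n_r)), n_r, max(1, int(0.3 * n_r))
    pop = sorted((r for r in init_pop if fitness(r) is not None),
                 key=fitness)[:n_r]
    best, stale = fitness(pop[0]), 0
    while stale < patience:
        new = pop[:n1]                                        # elitism
        new += [mutate(linear_rank_choice(pop, rng), rng) for _ in range(n2)]
        for _ in range(n3):                                   # cross-over
            a, b = (linear_rank_choice(pop, rng) for _ in range(2))
            new += list(crossover(a, b, rng))
        pop = sorted((r for r in new if fitness(r) is not None),
                     key=fitness)[:n_r]
        f = fitness(pop[0])
        best, stale = (f, 0) if f < best else (best, stale + 1)
    return pop[0]
```

Because the elite replicas are carried over unchanged, the best fitness in the population never worsens from one generation to the next.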

Mutation:
Replicas can undergo three types of mutation: • Addition/removal of edges (priority p_0 = 10): randomly remove up to 5 edges from the replica and add up to 5 edges taken from one of these lists: – Edges in the PKN but not in the replica (probability 0.8).
– Edges in the PKN but not in the replica that connect at least one essential node not yet connected to the network (probability 0.2). This mutation should help to increase the number of essential nodes.
• Addition of paths (priority p_1 = 5): Create a list of possible paths in the following way. For each attractor appearing in the subgraph g_best (see the description of the f_T evaluation), split the network nodes into two groups G_1 and G_2 by comparing each node state (averaged over all states in the attractor) to the state expected from the training set. Group G_1 contains nodes that behave as expected or do not appear in the training set, while group G_2 contains the nodes that do not behave as expected. For each pair of nodes (n_1, n_2) ∈ G_1 × G_2, add (n_1, n_2, s) to the list of possible paths, with s = 1 if the state of n_2 is lower (respectively higher) than expected and the state of n_1 is higher (respectively lower) than 0.5, and s = −1 if the state of n_2 is lower (respectively higher) than expected and the state of n_1 is lower (respectively higher) than 0.5.
Randomly choose one element (n_1, n_2, s) from the list of possible paths (with uniform probability). Randomly choose one path (with uniform probability) among all elementary paths in the PKN connecting n_1 to n_2 with sign s and with length smaller than or equal to the length of the shortest path from n_1 to n_2 in the PKN plus 4. Add this path to the network. To search for elementary paths in the PKN, we use a slightly modified version of the algorithm proposed by Klamt and von Kamp [3] (Supplementary Information: algorithm 4).
• Remove node (priority p_2 = 1): Randomly remove one non-essential node from the network.
The type of mutation is randomly chosen with probability p_i / Σ_j p_j. After the mutation, isolated nodes (with neither incoming nor outgoing edges) are removed from the network.
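The mutation machinery can be sketched as follows. The edge-set encoding and function names are illustrative assumptions (hyperedges would simply be single entries of the edge set, per the note above); the priority-weighted draw implements p_i / Σ_j p_j directly.

```python
import random

def choose_mutation(rng):
    """Draw a mutation type with probability p_i / sum_j p_j,
    using the priorities p_0 = 10, p_1 = 5, p_2 = 1."""
    types = ["edges", "path", "remove_node"]
    priorities = [10, 5, 1]
    return rng.choices(types, weights=priorities, k=1)[0]

def mutate_edges(replica_edges, pkn_edges, rng):
    """Sketch of the edge addition/removal mutation: remove up to 5
    edges from the replica, then add up to 5 edges drawn from the PKN
    edges not yet in the replica (the essential-node-biased list and
    the removal of isolated nodes are omitted for brevity)."""
    edges = set(replica_edges)
    for e in rng.sample(list(edges), min(len(edges), rng.randint(0, 5))):
        edges.discard(e)
    candidates = list(set(pkn_edges) - edges)
    for e in rng.sample(candidates, min(len(candidates), rng.randint(0, 5))):
        edges.add(e)
    return edges
```

Since additions are drawn only from the PKN, a mutated replica always remains a sub-network of the PKN.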

Cross-over:
Given two replicas, each edge appearing in at least one of the replicas is exchanged between the replicas with probability 0.5. When edges are exchanged, nodes are added if needed.
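This cross-over can be sketched with replicas encoded as edge sets (an assumed representation for illustration): membership of each edge in the union of the two parents is swapped with probability 0.5.

```python
import random

def crossover(edges_a, edges_b, rng):
    """Uniform cross-over sketch: every edge present in at least one
    parent is exchanged between the two children with probability 0.5.
    Nodes are implied by the edges kept, so no explicit node bookkeeping
    is needed in this encoding."""
    child_a, child_b = set(), set()
    for e in set(edges_a) | set(edges_b):
        in_a, in_b = e in edges_a, e in edges_b
        if rng.random() < 0.5:
            in_a, in_b = in_b, in_a  # exchange this edge
        if in_a:
            child_a.add(e)
        if in_b:
            child_b.add(e)
    return child_a, child_b
```

Edges shared by both parents stay in both children, and the union of the children always equals the union of the parents, so cross-over reshuffles material without creating or losing edges.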

Evaluation of the network optimization method
Comparing a model network to the gold standard
To compare predictions of a model network and the gold standard network, a score s_all is defined in the following way: 1. Starting from each attractor of the gold standard network, we use boolSim [1] to evaluate the attractors reached by both the model network and the gold standard network after all single node perturbations of essential nodes (node state fixed to 0, node state fixed to 1, as well as the unperturbed network). For an initial attractor a of the gold standard network and a single node perturbation p, we denote by y_i^(a,p) and ỹ_i^(a,p) the average state of node i in the attractors reached by the gold standard network and by the model network, respectively. 2. For a given initial attractor a and perturbation p, we measure the Manhattan distance between the average states reached by the gold standard network and the model network: ∆_a,p = Σ_i |y_i^(a,p) − ỹ_i^(a,p)|, where the sum runs over all essential nodes. If a node is missing in the model network, it is considered as isolated and its state is set to ỹ_i^(a,p) = 0.
3. The total distance ∆ between gold standard and model network predictions is then obtained as the sum of the distances obtained for all initial attractors and all perturbations: ∆ = Σ_a,p ∆_a,p. This distance can then be transformed into a score s_all = 1 − ∆ / (M (2 N_ess.nodes + 1) N_ess.nodes), where M is the number of attractors of the unperturbed gold standard network, N_ess.nodes is the number of essential nodes and (2 N_ess.nodes + 1) is the number of single node perturbations (N_ess.nodes perturbations with node state fixed to 0, N_ess.nodes perturbations with node state fixed to 1 and 1 perturbation corresponding to the unperturbed network).
Using this normalization, the resulting score s_all is between 0 (worst) and 1 (best). The score s_all is interpreted as a measure of the predictive power of the model network.
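Assuming the average reached states have already been computed (e.g. with boolSim), the score can be sketched as follows. The dictionary encoding is an assumption for illustration; the denominator is the maximal possible total distance M (2 N_ess.nodes + 1) N_ess.nodes, which maps ∆ into the stated 0-1 range.

```python
def s_all(gold_states, model_states, n_ess, n_attractors):
    """Sketch of the gold-standard comparison score. `gold_states` and
    `model_states` map (initial attractor, perturbation) pairs to lists
    of average essential-node states in [0, 1]; missing model nodes are
    assumed to be encoded as state 0 already. Returns 1 - Delta over
    its maximal value, so 1 means identical predictions."""
    delta = sum(abs(g - m)
                for key, gold in gold_states.items()
                for g, m in zip(gold, model_states[key]))
    max_delta = n_attractors * (2 * n_ess + 1) * n_ess
    return 1.0 - delta / max_delta
```

For example, with 2 essential nodes, 1 initial attractor and thus 5 perturbations, a single node mismatch in one perturbation costs 1 out of a maximal distance of 10, giving a score of 0.9.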

Combined predictions: variance and error
In the main paper, we summarize the predictions of multiple model networks by measuring their averages and variances, and we compare variances with errors. In this section, we define these quantities more precisely. As in the previous section, we consider the average states reached by the model networks and the gold standard network after all single node perturbations of essential nodes, starting from each attractor of the gold standard network. The same notations as above are used, except that, since we have multiple model networks, the average state of node i in the attractors reached by the n-th model network is denoted by y_i^(n,a,p). Note that exact attractor reachability graphs are used here (evaluated with boolSim). For each initial attractor a, perturbation p and node i, we measure the average (µ