Identification of control targets in Boolean molecular network models via computational algebra

Background Many problems in biomedicine and other areas of the life sciences can be characterized as control problems, with the goal of finding strategies to change a disease or otherwise undesirable state of a biological system into another, more desirable, state through an intervention, such as a drug or other therapeutic treatment. The identification of such strategies is typically based on a mathematical model of the process to be altered through targeted control inputs. This paper focuses on processes at the molecular level that determine the state of an individual cell, involving signaling or gene regulation. The mathematical model type considered is that of Boolean networks. The potential control targets can be represented by a set of nodes and edges that can be manipulated to produce a desired effect on the system. Results This paper presents a method for the identification of potential intervention targets in Boolean molecular network models using algebraic techniques. The approach exploits an algebraic representation of Boolean networks to encode the control candidates in the network wiring diagram as the solutions of a system of polynomials equations, and then uses computational algebra techniques to find such controllers. The control methods in this paper are validated through the identification of combinatorial interventions in the signaling pathways of previously reported control targets in two well studied systems, a p53-mdm2 network and a blood T cell lymphocyte granular leukemia survival signaling network. Supplementary data is available online and our code in Macaulay2 and Matlab are available via http://www.ms.uky.edu/~dmu228/ControlAlg. Conclusions This paper presents a novel method for the identification of intervention targets in Boolean network models. The results in this paper show that the proposed methods are useful and efficient for moderately large networks. Electronic supplementary material The online version of this article (doi:10.1186/s12918-016-0332-x) contains supplementary material, which is available to authorized users.


Introduction
The dynamics of gene regulatory networks (GRNs) have been studied using different modeling frameworks, with the goal of building computational models of GRNs to get new insights into important cellular processes, e.g., the cell cycle, [34,20], oscillations in the p53-mdm2 system, [9,3,4], the phage-lambda system, [15,52,24], or the T cell large granular lymphocyte (T-LGL) leukemia network, [53,27].A generally difficult problem is to design control policies to achieve desired dynamics in GRNs.This is particularly important in the control of cancer cells, [4,43,19,44,8] and cell fate reprogramming, [33,46].Thus, developing tools to control mathematical models of GRNs are key to the design of experimental control policies.
There is a rich theory for the control of continuous models such as systems of differential equations, [13,29].Discrete models such as Boolean networks (BN) have been proposed to study GRNs.In a BN, the genes of a GRN are represented by a set of nodes that can take on only two possible states (ON or OFF).Time is discrete, and the state of a gene at the next time step is determined by a Boolean function over a subset of nodes of the BN.The dependence of a gene on the state of another gene is graphically represented by a directed edge.BN models are widely used in molecular and systems biology to capture coarse-grained dynamics of a variety of regulatory networks and have been shown to provide a good approximation of the dynamics of continuous processes, [1,17,28,2,7,41,27,24,38,40].However, control methods for discrete models are still in their infancy, compared to the theory for continuous models.
In this work, we propose a framework to study the control of BNs.Therapeutic interventions are modeled by manipulating the wiring diagram of a BN accordingly.For example, a gene knockout is modeled by fixing the value of one node of the BN to OFF.Similarly, deleting directed edges of a BN models the action of a drug that inactivates the interaction between two gene products.The control problem consists in finding the sequence of actions (node and edge deletions) to get the BN out of an undesirable state, and drive it towards a desirable state.Undesirable states may represent a disease condition of a cell such as, for example, a highly proliferative state of a cancer cell, and a desirable state may represent cell death.Thus, a therapeutic intervention could aim at inducing the death of aberrant tumor cells.
Several approaches to this problem have been developed in recent years.For example, the optimal control techniques developed in [47,48,50,49] take as input a list of possible control nodes.Other approaches for the identification of intervention targets include the use of stable motifs in the network to induce the system to move towards a desirable state, [51].In [26] integer programming was used to find a possible set of nodes to control the state of BNs representing tumor and normal cells.Optimization techniques were used in [32] to determine possible node manipulations for BNs.There are also search algorithms based on genetic and greedy algorithms described in [45,42,25].
The idea behind our approach is to encode the problem of finding control candidates as a problem of solving a system of polynomial equations.Then we can use computational algebra techniques to solve the system.This approach takes advantage of the rich algorithmic theory of computer algebra (e.g.Gröbner basis techniques) and the theoretical foundations of algebraic geometry with software available for computations (e.g., Macaulay2 [10]).The output of our method is a set of candidate actions with the capacity to induce the GRN towards desirable states.The method also has the ability of identifying combinatorial interventions in the network which are sometimes more effective as will be shown in the results section.The algebraic view of discrete models has been previously used for the development of computational tools to determine the attractors of BNs [37,39,11,35], and also for inferring network structure from dynamics [6,36,14,18].

Boolean Networks
A Boolean network can be defined as a dynamical system that is discrete in time as well as in variable states.More formally, consider a collection x 1 , . . ., x n of variables, each of which can take on values in the binary set {0, 1}.A Boolean network in the variables x 1 , . . ., x n is a function where each coordinate function f i : {0, 1} n → {0, 1} is a Boolean function on a subset of {x 1 , . . ., x n } which represents how the future value of the i-th variable depends on the present values of the other variables.If the set {0, 1} is given the structure of a finite field with standard addition and multiplication, F 2 = {0, 1},

F F
Figure 1: Description of the algebraic approach of identification of control targets.A. A BN model of a molecular network.The control variables are the entries of u.B. In the absence of control policies (u = 0), the attractor landscape (L F ) can have undesirable attractors.C. The goal is to choose control values that give a desired attractor landscape, L * .D. To use the algebraic approach we first find the polynomial representation of the BN (see Section 2.2).E. The next step is to set up the desired attractor landscape as a system of equations that the BN has to satisfy, L F (x,u) = L * (see Section 2.3).F. Solving the equation L F (x,u) = L * for u will provide the control values to achieve the desired landscape (see Section 2.4).This approach not only finds individual control policies (u = A, single node; u = B, single edge), but also combinatorial control policies (u = C, two nodes; u = D, two edges).In a combinatorial control policy, the desired attractor landscape is achieved by the combination of two or more entries of u.
then the functions f i : F n 2 → F 2 can be represented as polynomials over F 2 , and the dynamical system F = (f 1 , . . ., f n ) : F n 2 → F n 2 becomes a polynomial dynamical system, see [39], which gives access to a range of mathematical tools for the analysis of F.
Given a Boolean network F = (f 1 , . . ., f n ), a directed graph W with n nodes x 1 , . . ., x n is associated with F. There is a directed edge in W from x j to x i if x j appears in f i .Notice that the presence of the interaction x j → x i implies that the regulatory function f i depends on x j , say f i (x k 1 , . . ., x j , . . ., x km ) with x j ∈ {x k 1 , . . ., x km }.In the context of a molecular network model, W represents the wiring diagram of the network.
The dynamical properties of a Boolean network are given by the difference equation x(t + 1) = F(x(t)); that is, the dynamics is generated by iteration of F.More precisely, the dynamics of F is given by the state space graph S, defined as the graph with vertices in F n 2 = {0, 1} n , which has an edge from x ∈ F n 2 to y ∈ F n 2 if and only if y = F(x).The states x ∈ F n 2 where the system will get stabilized are of particular importance.These special points of the state space are called attractors of a Boolean network and these may include steady states (fixed points), where F(x) = x, and cycles, where F k (x) = x for some integer k > 1. Attractors in Boolean network modeling might represent cell types [16] or cellular states such as apoptosis, proliferation, or cell senescence [12,30].Identifying the attractors of a system is an important step towards the control of that system.
A Boolean network with control is given by a function F : where U is a set that denotes all possible controls (Fig. 1A).Given a control u in U , the dynamics are given by x(t + 1) = F(x(t), u).
A schematic of our approach is given in Figure 1.Here we define and explain all the steps in more detail.

Control Actions: edge and node manipulations
This paper considers two types of control actions: 1. Deletion of edges and 2. Deletion (or constant expression) of nodes.The motivation for considering these actions is that they represent the common interventions that prevent a regulation from happening.For instance, an edge deletion can be achieved by the use of therapeutic drugs that target that specific gene interaction and node deletions represent the blocking of effects of products of genes associated to these nodes; see [4].
We consider a BN F = (f 1 , . . ., f n ) : F n 2 → F n 2 and show how to encode edge and node controls by That is, the case of no control coincides with the original BN.
Definition 2.1 (Edge Control).Consider the edge x i → x j in the wiring diagram W. The function encodes the control of the edge x i → x j , since for each possible value of u i,j ∈ F 2 we have the following control settings: That is, the control is not active.
• When u i,j = 1, F j (x, 1) = f j (x 1 , . . ., x i = 0, . . ., x n ).In this case, the control is active, and the action represents the removal of the edge x i → x j .
Definition 2.1 can easily be extended for the control of many edges, so that we obtain F : , where e is the number of edges in the wiring diagram.Each coordinate, u j i , of u in F(x, u) encodes the control of an edge x i → x j .Definition 2.2 (Node Control).Consider the node x i in the wiring diagram W. The function encodes the control (knock-out or constant expression) of the node x i , since for each possible value of (u i , v i ) ∈ F 2 2 we have the following control settings: That is, the control is not active.
• For u − i = 1, u + i = 0, F j (x, 1, 0) = 0.This action represents the knock out of the node x j .
• For u − i = 1, u + i = 1, F j (x, 1, 1) = f j (x t 1 , . . ., x tm ) + 1.This action changes the Boolean function to its negative value and might not be a relevant case of control.Definition 2.2 can easily be extended for the control of all nodes, so that we obtain F : encodes the control of a node (knock-out for s = 0, and constant expression for s = 1).

Control targets in Boolean networks.
We consider a BN with control F : F n 2 ×U → F n 2 , and denote by F the BN with no control (F(x, 0) = F(x)).

Generating new steady states
Suppose that y 0 = (y 01 , . . ., y 0n ) ∈ F n 2 is a desirable cell state (for instance, it could represent the state of cell senescence) but is not a fixed point, i.e., F(y 0 ) = y 0 .The problem is then to choose a control u such that F(y 0 , u) = y 0 .We now show how this can be achieved in our framework.
After encoding our BN with control as a polynomial system 2), we consider the system of polynomial equations in the u parameters:

Destroying existing steady states, or, in general, blocking transitions
Suppose that x 0 ∈ F n 2 is an undesirable steady state of F(x), that is, F(x 0 ) = x 0 (for instance, it could represent a tumor proliferative cell state that needs to be avoided).The goal here is to find a set of controls such that F(x 0 , u) = x 0 .More generally, we may want to avoid a transition between two states x 0 and z 0 .That is, we want to find controls such that F(x 0 , u) = z 0 .To solve this problem consider the following equation, (F 1 (x 0 , u) − z 01 + 1) . . .
Equation 4 defines a system of polynomial equations in the u parameters.It can be shown that F(x 0 , u) = z 0 if and only if Equation 4 is satisfied.

Blocking regions in the state space
We now consider the case where we want the dynamics to avoid certain regions.For example, if a particular value of a variable, x k = a ∈ F 2 , triggers an undesirable pathway, or is the signature of an abnormal cell, then we want all steady states of the system to satisfy x k = a.In this case, we consider the systems of equations Note that in contrast to Sections 2.3.1 and 2.3.2,we are now using variables for x instead of specific values.Since the steady states with x k = a are to be avoided, we want to find controls u for which Equation 5 has no solution.
Figure 2: The p53-mdm2 network adapted from [4].See Table 1 for the impact of the removal of these edges in the dynamics of the network.

Identifying control targets
In each case of Section 2.3 we obtained a system of equations (or a single equation) that we need to solve to find the appropriate controls.This can be done using computational algebra tools.For instance, we can compute the Gröbner basisof the ideal associated to Equation 3, see [5], As we will show in Examples 3.1-3.2,the computation of a Gröbner basis allows us to read out all controls for which the system of equations has a solution.Furthermore, our algebraic approach can detect combinatorial control actions (control by the synergistic combination of more than one action).

Results
We test our control methods using two published models where potential intervention targets were identified along with experimental validations.In the first example we focus on control with edge deletions while for the second we use control with node deletions.
Example 3.1.P53-mdm2 network.A Boolean network model for the signaling response of DNA damage in a p53 network was developed in [4]; see Figure 2. The tumor supressor protein p53 can trigger either cell cycle arrest or apoptosis in response to DNA damage in normal cells.The authors of [4] extended their analysis to a breast cancer cell line, MCF7, with the purpose of identifying potential therapeutic interventions in the network for the cancer cell case.Thus, in this example we will focus on the cancer cell model where PTEN and p14ARf are inactive (fixed to zero) and cyclinG is always active (fixed to 1), see Table 5 of [4].This system can be represented as a discrete dynamical system F = (f 1 , . . ., f 16 ) : with 16 nodes and 50 edges.We represent the nodes by  7. Edge in red represents the transition target to destroy the limit cycle.
The polynomial functions for this network are listed in the supporting material.In the presence of DNA damage this network has a single limit cycle representing the state of cell cycle arrest, where p53 and p21 are oscillating; see Figure 3. Suppose that we want to find out which edges we could manipulate in order to destroy this limit cycle and to lead the whole system into a different attractor, one that represents a desirable cell state.In this case, let us consider a state that represents cell death, y 0 = (1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1), (8) where x 16 = caspase is active.In order to make y 0 a steady state with the approach described in the Methods section, we form the following system of polynomial equations, The solutions for the system of equations 9 are given by the nonzero generators of the ideal associated to System 9, given in Equation 10, {u 2,5 + 1, u 12,8 (u 1,8 + 1), u 6,6 (u 3,6 + 1), u 2,2 (u 3,2 + 1), u 1,2 (u 3,2 + 1), u 1,1 u 9,1 , u 9,12 u 12,12 (u 6,12 + 1)} (10) There are 945 solutions for Equation 10.Each solution gives a controller that guarantees the desired steady state, but each controller may have a different impact on the dynamics of the network.Table 1 (middle row) shows one of the controllers from Equation 10.In the Table S1 of the supporting material, we list the top ten control combinations from Equation 10, that is, those that maximized the basin size of y 0 as well as the ten sets that give the smallest basin for y 0 ; see Table S2.Furthermore, one could aim to destroy the limit cycle in Figure 3.For instance, if we target one of the transitions within this limit cycle, let us take the transition x 5 → x 6 marked in red in Figure 3. Blocking this transition gives additional controllers that help to stabilize the system in the desired fixed point; see the last row of Table 1.
The deletion of the edges mdm2 → p53 and p53 → Wip1 were identified by [4] by deleting one edge at a time and checking if the modified system had the desired attractor.[4] reported that the combinatorial action of these controllers increased to more that 50% the basin of attraction of the desired fixed point which they also validated experimentally.However, doing this type of search to find all possible combination of controls is infeasible.Since for each edge we have 2 possible actions (control or no control), and there are 50 edges, then there are 2 50 networks to be analyzed in total.In contrast, the computational algebra approach allows us to obtain the parameter combinations that guarantee the desired steady state of the system, see Table 1.
Example 3.2.T-LGL network.A Boolean network model for the blood cancer large granular lymphocyte (T-LGL) leukemia was developed in [53] and further analyzed in [27,51].T-LGL leukemia is characterized by escaping cell death through abnormal mechanisms, which are insensitive to Fas-induced apoptosis, [53].This network has 60 nodes but was reduced to a subnetwork of 16 nodes for steady state analysis, see Figure 4.Here we use the 16-nodes network to identify potential control targets.We represent the nodes by

Controllers applied
Ref.
Basin size of y 0 mdm2 → p53 [4] 35581 (54.29%) A control set that 39856 (60.82%)M dm2 → p21 forces y 0 to be M dm2 → p53 a fixed point, from p21 → Caspase Equation 10. mdm2 → p53 A control set to make 65536 (100%) and for blocking p21 → Caspase the transition in red Table 1: Difference of impact in the combinatorial action of edge deletions.Control edges that increase the basin of attraction of cell death represented by y 0 in Equation 8.There are 2 16 = 65536 possible states.The number in parentheses is the ratio between the basin size and the total number of states.
Figure 4: Reduced T-LGL network adapted from [27].The negative edges from Apoptosis were omitted for simplicity.
The polynomial rules for this system are listed in System 11, This system has three steady states, one that represents the normal state, x 0 = 0000000000000001, where Caspase and Apoptosis is ON, and 0001101000101110, 0011101000101110, the disease states in which Caspase and Apoptosis is OFF.
To find the potential node deletions (or constant expressions) that will block the disease states we use Eq. 5, Equation 12 encodes all parameters for which there is a steady state where Caspase and Apoptosis is OFF.Thus, we are interested in the parameters for which this system of equations has no solution.Since for each node we have 3 possible actions (no control, node deletion, constant expression), there are 3 16 networks to be analyzed in total.Thus, even for this small network, exhaustive search is computationally challenging.
On the other hand, computational algebra allows us to obtain the parameter combinations that guarantee that the disease states are not fixed points of the system (see the supplementary material for details).The parameter combinations (enclosed in brackets, where entries not shown are equal to zero) are Thus we obtain that the constant expression of Ceramide, DISC, Caspase, or BID, or the deletion of MCL1 or S1P, will guarantee that the disease states are not steady states of the system (Table 2).These controls could also be found by trying one control at a time as was done in [27].Importantly, our computational algebra approach shows that there are two additional control policies that consist of a combination of different controls.It can be shown that neither Fas, sFas, FLIP individually can eliminate the disease states, but the deletion of FLIP combined with the constant expression of Fas or the deletion Solution Control targets

Attractor
Basin size of sFas will work (Table 2).Furthermore, those are the only combinations that guarantee the disease states will not be attractors.
We note that, in the worst case, computing the Gröbner basis for a system of polynomial equations has a complexity of doubly exponential in the number of solutions.However, for the type of networks discussed in this paper, namely, biological networks where most of the nodes are regulated by only a small subset of the other nodes, Gröbner basis can be computed in a reasonable time, see [11].

Discussion
The design of control policies for gene regulatory networks is an important challenge in systems biology.The method described here exploits the interplay between the structure and the dynamics of the network to identify potential control interventions that will drive the system towards desired dynamics.The formulation of the problem as that of finding all solutions to a system of polynomial equations provides an alternative to exhaustive search of all possible combinations of interventions, which often is not feasible.
One shortcoming of the method is that, in its current form, it requires the Boolean network model to be updated synchronously, with deterministic dynamics.While steady states of the network do not depend on the update order used, general limit cycles and attractor basins do, however.Thus, some of the methods described here might not be applicable in a stochastic setting.For instance, if the dynamics is generated using the asynchronous update or more generally using the stochastic setting in [24] or in [31] then encoding the controllers for blocking transitions will need have a different set up than the one described here.However, the method for producing a new steady state is still valid for all variants of stochastic updates because the system will have the same steady state.Another shortcoming is that the control methods described in this paper were developed for Boolean networks only.However, many published discrete models of GRNs are multistate discrete models.These methods could potentially be extended to the general setting, where the network variables might attain more than two states, [23,22,39].The control targets identified by the algebraic techniques described in this paper could then be used for further analysis of the system, such as for studying reachability [21], or for designing optimal control policies in a stochastic setting [47,48,50,49].
One important challenge in the process of identification of control targets in a network is to develop methods that can identify controllers that can guarantee global reachability of a desired steady state.This important problem is not addressed in this paper, and remains to be studied in the future.Finally, it is worth pointing out that the methods in this paper might produce large numbers of given control strategies, which all give the desired result, but many of these might be biologically meaningless or infeasible as actual interventions.Eliminating those might be quite challenging.

Conclusions
This paper presents a novel approach to the identification of potential interventions in Boolean molecular networks.The methods use the theoretical foundations of algebraic geometry to encode the structure of a network by a set of polynomials and then, with the use of computer algebra techniques, find a set of nodes and edges to perform interventions in silico.The methods were tested using two published models where dynamic network interventions were identified, the p53-mdm2 system and the T-LGL leukemia model.It was shown that the methods in this paper can identify the controllers that were already reported and also find new potential targets.Some of these new control targets are combinatorial in nature and might result in more efficient strategies as was shown in the results section using the change in the basins of the system as a measure of efficiency.

Figure 3 :
Figure 3: States of limit cycle representing cell cycle arrest in the p53 model.The order of the vector entries follows the indexing in Equation7.Edge in red represents the transition target to destroy the limit cycle.

Table 2 :
Control nodes for the reduced T-LGL network.The last two rows represent combinatorial actions of two nodes.All attractors are steady states, and the basin sizes include the steady states themselves.Notice that node x 6 = Apoptosis is a conceptual node in this model, thus it is not a relevant solution for network control.