Identification of Boolean Network Models From Time Series Data Incorporating Prior Knowledge

Motivation: Mathematical models take an important place in science and engineering. A model can help scientists to explain dynamic behavior of a system and to understand the functionality of system components. Since length of a time series and number of replicates is limited by the cost of experiments, Boolean networks as a structurally simple and parameter-free logical model for gene regulatory networks have attracted interests of many scientists. In order to fit into the biological contexts and to lower the data requirements, biological prior knowledge is taken into consideration during the inference procedure. In the literature, the existing identification approaches can only deal with a subset of possible types of prior knowledge. Results: We propose a new approach to identify Boolean networks from time series data incorporating prior knowledge, such as partial network structure, canalizing property, positive and negative unateness. Using vector form of Boolean variables and applying a generalized matrix multiplication called the semi-tensor product (STP), each Boolean function can be equivalently converted into a matrix expression. Based on this, the identification problem is reformulated as an integer linear programming problem to reveal the system matrix of Boolean model in a computationally efficient way, whose dynamics are consistent with the important dynamics captured in the data. By using prior knowledge the number of candidate functions can be reduced during the inference. Hence, identification incorporating prior knowledge is especially suitable for the case of small size time series data and data without sufficient stimuli. The proposed approach is illustrated with the help of a biological model of the network of oxidative stress response. Conclusions: The combination of efficient reformulation of the identification problem with the possibility to incorporate various types of prior knowledge enables the application of computational model inference to systems with limited amount of time series data. The general applicability of this methodological approach makes it suitable for a variety of biological systems and of general interest for biological and medical research.


INTRODUCTION
Boolean networks (BNs) are discrete-time systems, whose variables can take only two possible values (i.e., 0 and 1). Since Stuart Kaufman firstly introduced BNs in Kauffman (1969) for qualitative description of gene regulatory interactions, BNs have attracted great attention from many scientists and several results have been proposed, for instance, analysis (Albert and Barabási, 2000) and control (Fauré et al., 2006). An overview can be found in Wang et al. (2012) and a database for established models and compatible tools has been introduced (Naldi et al., 2015).
Mathematical models are important to explain dynamic behavior of a system and to understand the functionality of system components (Grieb et al., 2015) and can help scientists to design model-based targeted therapy and diagnosis (Fumia and Martins, 2013). Hence, the inference of models capturing the relevant behavior of the system is an important topic. The inference can be based on the connection of known biochemical reactions, like BN model for the yeast cell cycle in Davidich and Bornholdt (2008), or on experimental data, if the latter is the case it is also called the identification problem. One of the first approaches to identify a BN was REVEAL which is based on mutual information (Liang et al., 1998). In Akutsu et al. (1999) a similar but less complex approach is presented. Both cannot handle errors in the dataset which was solved in Lähdesmäki et al. (2003). The modeled quantities are not Boolean in the experimental data and need to be binarized first. For the binarization several approaches can be found in the literature ranging from mixture model based clustering (Zhou et al., 2003) to more complex methods where the significance of a jump in the time series is estimated in Hopfensitz et al. (2012). A comparison of some identification and binarization approaches and their combinations can be found in Berestovsky and Nakhleh (2013). Most identification approaches are based on previously binarized data, but there also exist approaches directly based on continuous data (e.g., Karlebach and Shamir, 2012). In Higa et al. (2011) the data is considered as given constraint and the set of systems fulfilling the constraints is searched. This approach was then further improved by reducing the sensitivity to noise in Ouyang et al. (2014). An example of recent research is the identification of Boolean models for transient dynamics after perturbations from time course data with answer set programming (Ostrowski et al., 2016). A BN can simply be extended to a Boolean control network (BCN) by considering manipulated external stimuli as control signal of the network. Recently, a powerful tool called semitensor product (STP) of matrices has been proposed in Cheng (2001), which can convert the dynamics of BCNs into a model where all information of the dynamics and the structure of the BCN is contained in two matrices (Cheng et al., 2011a). Using the STP based matrix description of BCN several approaches for identifying BCN have been proposed (Cheng and Zhao, 2011;Fornasini and Valcher, 2014;Zhang et al., 2017a).
However, in general, in order to identify the dynamical model of a BCN from its input and output data, a huge number of data is required (Cheng and Zhao, 2011;Cheng et al., 2011b). Though, in practice, data size is limited by the cost of experiments (Geier et al., 2007). In order to reduce the search space and improve the accuracy of the model, the benefit of biological prior knowledge should be taken into consideration. Cheng and Zhao (2011) pointed out that, if the network graph is known, then the data required can be reduced considerably. In the literature there are several approaches to include different types of prior knowledge. For example the known network structure and known steady state activity is considered in Videla et al. (2015). Moreover, two common properties of the Boolean function, canalizing and unateness, can be further utilized according to Breindl et al. (2013) and Faisal et al. (2010). A Boolean function is canalizing, if a variable takes on a certain "canalizing" value, then the output of the boolean function is always the same (Waddington, 1942). Different from canalizing function, an unate function has monotonic properties, which in biology indicates that a gene acts exclusively as an inducer or as an inhibitor for the expression of another gene (Porreca et al., 2010). The prior knowledge is used in different ways either by introducing additional constraints in the optimization (Breindl et al., 2013), or reducing the number of parameters in the optimization (Cheng and Zhao, 2011). In Dorier et al. (2016) and Terfve et al. (2012) genetic algorithms are used to handle the complexity problem of large networks while satisfying prior knowledge network graphs as constraints. However, these approaches to handle prior knowledge are not compatible and the advantages of different types of prior knowledge can not be combined. In the approach proposed in this paper, all different types of prior knowledge can be utilized simultaneously and it can additionally handle hypotheses for interactions, which could be used for researcher bias free distinction between alternative hypotheses. Furthermore existing approaches can not handle the case that at some time instances some measurement values are missing, which cannot be avoided in practice due to the limitation of measuring techniques like mass spectrometry-based proteomics.
In this paper, we consider the identification problem of BCNs utilizing biological prior knowledge. A part of the results was presented at the 56th IEEE Conference on Decision and Control in Melbourne (Zhang et al., 2017b). However, the BCN model considered in Zhang et al. (2017b) contains a general output equation. By applying prediction error method (PEM), a highdimensional BCN (i.e., 2 n × 2 n+m ) cannot be avoided. Different from that, although the handling of unmeasurable processes is considered in this paper, the proposed approach leads to a low-dimensional matrix for PEM. Besides, more prior biological knowledge is considered in the paper, like potential interactions, known attractors and limit cycles. Moreover, it is discussed how to deal with alternative hypotheses for interactions and missing measurement points. The main contributions of this paper are as follows: • A suitable way to handle the prior knowledge such as known network graph, hypotheses for interactions, canalizing and unateness properties or attractor is introduced. For this purpose the BCN is described by two matrices with unknown parameters as entries. If possible, some parameters are inferred directly. Otherwise, relationships between the parameters are set up. • An approach to deal with the identification of BCNs, in particular, from noisy measurements and missing data points is proposed. The identification problem of BCNs is formulated as a nonlinear pseudo-Boolean optimization, which can be equivalently transformed into a linear binary optimization problem and then solved efficiently.
The remainder of the paper is organized as follows. Section 2 introduces some fundamental definitions and notations. In Section 3, the identification problem of BCNs addressed in this paper will be formulated. Section 4 introduces a way to utilize prior knowledge in identification procedure. The formulation of identification problem of BCNs as an integer linear programming problem is derived and an example is given in Section 5 to illustrate the approach. Finally, a short discussion on the advantages and limitations of the proposed approach is given in Section 6.

PRELIMINARIES
In this part, we list some necessary notations, which will be used in the subsequent sections.
The concept of the semi-tensor product of matrices (STP) has been introduced by Cheng et al. (2011a). The STP of two matrices A ∈ R m×n and B ∈ R p×q is defined as where ⊗ is the Kronecker product and l = lcm{n, p} is the least common multiple of n and p. The following property of the STP will be used in the subsequent sections. m,n] is the swap matrix (Cheng et al., 2011a).
So the order of two vectors which are multiplied can be altered by multiplying a suitable matrix from the left, this is also called the pseudo-commutativity of the STP. In the following parts the symbol ⋉ will be omitted.

PROBLEM FORMULATION
System identification is the determination of a model describing the dynamic behavior of a system based on measured data and known perturbations. In the context of Boolean modeling it is assumed that the transient behavior of the system can be qualitatively described by a finite number of Boolean states and that the interaction of these states can be described by Boolean functions. The perturbations are inputs to the system and cause transient behavior of the interacting states in the system. A measured time series of inputs and states form together the data basis for the identification. Depending on the system which is to be modeled, the states might represent the activity of genes or the abundance of proteins and the perturbations could be a stress like heat or oxygen or a chemical substance. In the following the identification process will be formulated as mathematical optimization problem. Therefore the mathematical model of a BCN needs to be defined first. A Boolean control network (BCN) can be described by the following equations (Cheng and Qi, 2010): . . .
are, respectively, the state vector, input vector at time t, f i are logic functions. At the discrete time instances t the state variables are updated synchronously according to the logic functions f i . As shown in Cheng and Qi (2010), a vector form of Boolean variable X i , i = 1, 2, · · · , n can be simply expressed as According to Cheng and Qi (2010), (2) can be equivalently represented in a vector form: where S i ∈ L 2×2 n+m , i = 1, 2, · · · , n are logical matrices. Multiplying all Equations in (4) together, there is where L ∈ L 2 n ×2 n+m is a logical matrix and Col i (L) = ⋉ n j=1 Col i (S j ), i = 1, 2, · · · , 2 n+m . A polynomial P ml : R k →R with k variables {θ 1 , θ 2 , · · ·, θ k } is called multi-linear polynomial, if its degree in each variable is at most 1 (Alon et al., 1991). So, a multi-linear polynomial can be generally expressed as where c, c i , c I α ∈ R for I α ⊂ V = {1, 2, · · · , k} and the set I α has a cardinality of at least 2, i.e., |I α | ≥ 2, α = 1, 2, · · · , q.
Generally, the identification problem of BCNs can be described as reconstruction of Boolean functions f i , i = 1, 2, · · · , n that explain the experimental data as well as possible. Because of equivalent representation of a Boolean function by a logical matrix, the identification problem is reformulated as searching for logical matrices S i ∈ L 2×2 n+m , i = 1, 2, · · · , n based on the input and measurement state data.
Note that any logical matrix in L 2 a ×2 b can be expressed by multi-linear polynomials in a binary parameter vector θ of dimension a · 2 b . For example, any logical matrix in L 4×8 can be expressed by a binary parameter vector θ = [θ 1 θ 2 · · · θ 16 ] T as where the superscript T denotes the transpose. In this way, each realization of the binary parameter vector θ ∈ D a2 b corresponds to a unique logical matrix. It is straightforward to equivalently convert this logical matrix into readable logical equations. Based on this, the objective of the paper is to find a binary parameter vector θ, such that dynamic behavior of the BCN (5) is consistent with the important dynamics captured in the observed inputstate data.

INCORPORATION OF PRIOR KNOWLEDGE
In this section, we shall show how to utilize known network graph, potential interactions, canalizing and unateness properties and attractors in the identification procedure.

Known or Potential Interactions
Often some or all interaction partners are known in a biological system which is subject of identification. This knowledge can come from databases or can be constructed based knowledge about the underlying biochemical reactions. In some cases a known signaling network is to be complemented and different hypothesis for potential interactions shall be evaluated. If all interaction partners and the direction of the interactions are known, the underlying directed network graph of the BN is known.
In graph theory, a directed graph can be denoted by G = {V, E}, where V is a finite set of nodes and E ⊂ V × V is a finite set of edges (Bollobas, 2012) Cheng et al. (2011a), a BCN can be represented by a directed graph, where each gene is considered as a node. If there is an edge from X i → X j , then X j is affected by X i .
Assume that a directed graph for a BCN G = {V, E} is known. Then we have the following result. Lemma 2. If the node X i is affected by w nodes, then 2 w binary parameters are enough to describe the corresponding logical matrix S i .
Proof: As the node x i is affected by w nodes, then the Boolean function can be represented in a vector form as where the matrix S i is a logical matrix of dimension 2 × 2 w . Recall that the logical matrix S i is a matrix containing only columns belonging to 2 (Cheng et al., 2011a). Hence, 2 w binary parameters are enough for the description of the logical matrix S i .
An example is given below to express logical matrices of a BCN with a known network graph with the help of binary parameters.
Example 1. Consider a BCN as follows.
where the network graph of the BCN is shown in Figure 1 (Cheng and Zhao, 2011). According to Cheng and Qi (2010), the algebraic form of the BCN is obtained, where the logical matrices S 1 , S 2 ∈ L 2×4 can be expressed by the binary parameter vector θ = [θ 1 θ 2 · · · θ 8 ] T in the following form: Potential interactions can be treated in the same way as known interactions as long as all of them could potentially be simultaneously true. If there are two alternative hypotheses and the question is which fits better to the data, then this can be done by introducing a constraint on the parameters θ .
Example 2. Assume that X 1 is influenced either by X 2 or by U 1 , this could be ensured by imposing the constraint

Canalizing Boolean Functions
The concept of "canalizing" values in Boolean functions was introduced in developmental biology in 1940s (Waddington, 1942). The idea is, that one input is dominant and if it takes a certain value it determines the output. After that, in order to explain the phenomenon that absence of repressor or high levels of allolactose assures the operator cannot bind repressor in lac operon of the bacterium Escherichia coli, Kauffman applied this concept to BN modeling of gene regulatory networks (Kauffman, 1974). Canalizing functions are defined as follows.
, 2, · · · , n} and a Boolean function g(X 1 , · · · , X i−1 , X i+1 , · · · , X n ) and a, b ∈ D, such that where a is called the canalizing value for the variable X i and b is the canalizing output value (Kauffman, 1974).
Based on Definition 1, this prior knowledge can be translated into imposing a specified value in the corresponding logical matrix. Assume that the logical matrix for the canalizing function (10) is denoted as S and the canalizing value a and canalizing output b can, respectively, be expressed in a vector form as δ 2−a 2 and δ 2−b 2 . Then, we can get the following result. Theorem 1. Given a canalizing function (10). The corresponding logical matrix S ∈ L 2×2 n satisfies where W [2,2 i−1 ] is the swap matrix.
Let's take an example to illustrate the result of Theorem 1.
Example 3. Consider the BCN (7). Assume that the Boolean function f 1 is a canalizing function in x 2 for a canalizing value δ 2 2 and the corresponding canalizing output is δ 1 2 . Due to the canalizing property, the logical matrix S 1 can be reduced to It can be checked that S 1 u 1 δ 2 2 = δ 1 2 , no matter whether u 1 = δ 1 2 or u 1 = δ 2 2 . Note that the logical matrix S 1 contains only two binary parameters (i.e., θ 1 and θ 3 ). It shows that using canalizing property can reduce the number of binary parameters.
As an important subclass of canalizing function, k-canalizing function is defined as follows.
It is necessary to point out that different from the approaches proposed in Breindl et al. (2013) and Faisal et al. (2010), some binary parameters can be directly inferred, no matter which canalizing value the canalizing variable takes.
Moreover, due to f 2 (0, 1) = 0, there is Remark 1. Theorem 1 implies that considering canalizing property of a Boolean function, the corresponding logical matrix can be expressed with fewer binary parameters. For instance, if a Boolean function f (X 1 , X 2 , · · · , X n ) is a k-canalizing function, then 2 n−k different binary parameters are enough to represent the corresponding logical matrix.

Unate Boolean Functions
The behavior of some substances or genes are well studied and it is known that they act as suppressing or activating in all reactions they are involved. If they always act inhibiting they have the so called negative unateness property. For the case that a quantity exclusively induces the expression of another gene or substance it has the positive unateness property (Porreca et al., 2010). For the mathematical modeling of the unatess properties let us consider another important type of Boolean functions, which is called the unate function (Breindl et al., 2013).
Example 5. Consider the Boolean function x 1 = f 1 (x 2 ), this function f 1 is defined by two unknown parameters θ 1 and θ 2 . Assume that the Boolean function f 1 is a unate function with respect to x 2 , which satisfies (13). As the first step, the matrix S 1 δ 1 2 and S 1 δ 2 2 are calculated, which yields Then, the inequality constraint is

Known Attractors or Limit Cycles
When the BCN is not perturbed for a sufficiently long time it reaches the steady state. The steady state of a BCN can be exactly one state (i.e., attractor) or a fix cycle of some states (i.e., limit cycle). Attractors or limit cycles are assumed to determine the phenotype in the cell differentiation (Huang and Ingber, 2000). The experimental setup to measure the steady state of a system is simpler and measurements are easier to reproduce compared with transient dynamics. As a result, the steady state of the BN is often already known when the perturbation experiments for identification of the transient behavior are carried out. This knowledge can be utilized as follows. An attractor corresponds to a self loop in the reachability graph. For a given input combination this fixes one specific coulumn in the matrix L. For the constant input u(t) = δ i 2 m and the constant state x(t) = δ j 2 n the k-th column is known to be Col k (L) = δ j 2 n with k = (i−1)2 n +j. A limit cycle can be analyzed in a similar manner. For the given state sequence of the limit cycle of length T and the constant input u(t) = δ i 2 m one can calculate T columns of L. For each time instant t of the cycle the actual state x(t) = δ j 2 n and the next state x(t + 1) = δ w 2 n is known. The information of this known transition is used by setting the k-th column to Col k (L) = δ w 2 n with k = (i − 1)2 n + j.

IDENTIFICATION APPROACH
In this part, the identification problem of BCNs will be studied. At first, it will be shown that the identification problem can be reformulated as a nonlinear pseudo-Boolean optimization problem by applying the idea of the prediction error method.
The pseudo-Boolean optimization can be transformed into an equivalent linear binary integer programming problem that can be solved more efficiently. Then, we give a way to deal with missing measurement values. Finally, we discuss how dependencies between measured substances can be handled.

Optimization Problem
The prediction error method (PEM) is one of the most widely used identification methods (Isermann and Münchhof, 2011). The basic idea behind this method is to choose parameters to make the difference between a prediction based on the model and the measured values as small as possible. As the PEM minimizes the prediction error in the identified system, errors in the data set due to noise need no special treatment. Obviously the more noise is expected in the data set the more data should be acquired for identification of a reliable model. Before applying PEM, it is necessary to specify a measure of prediction error. In information theory, the Hamming distance d(X, Y) between two vectors X, Y ∈ D n is defined as the number of positions, in which the entries differ (Hamming, 1950).
As each entry in the vectors X and Y belongs to the Boolean domain {0, 1}, (19) can be equivalently written as Furthermore, let x i , y i be, respectively, the vector form of [X] i and [Y] i . Then, it is straightforward to get Based on this, the Hamming distance d(X, Y) can be rewritten as Assume that the observed input and state data is {(U(t), X(t)), t = 0, 1, · · · , T}. The vector form of the input data {U 1 (t), U 2 (t), · · · , U m (t)} and state data {X 1 (t), X 2 (t), · · · , X n (t)} are denoted, respectively, as u 1 (t), u 2 (t), · · · , u m (t) and x 1 (t), x 2 (t), · · · , x n (t). Since the logical matrix S i for the state variable X i can be represented by the parameter vector θ , we simply denote them as S i (θ ). Suppose that the state variable X i can be influenced by the variables X j 1 , X j 2 , · · · , X j k . According to (5), it is easy to get expression of the predictionx i (θ, t): Recalling (21) and (22), the PEM method will estimate the binary parameters by minimizing the prediction error, i.e., Furthermore, the optimization problem (24) can be equivalently rewritten as which is actually equivalent to Next, it will be shown that the optimization problem (25) can be formulated as a pseudo-Boolean optimization (i.e., optimization of pseudo-Boolean functions). A pseudo-Boolean function is a mapping from a finite number of Boolean variables to a real number and can be uniquely represented by a multi-linear polynomial (Boros and Hammer, 2002). As mentioned before, any logical matrix can be expressed by a multi-linear polynomial. After calculation, the term can be represented by a multivariate polynomial.
where c, c Q β ∈ R for Q β ⊂ V = {1, 2, · · · , k} and the factor r Q β ,j , ∀β, j is a natural number. In addition, using the property of Boolean variables θ r i = θ i , ∀r ∈ Z + , the multivariate polynomial (26) is easily transformed into a multilinear polynomial. Consequently, the term T t=0 n i=1 x T i (t) · x i (θ , t) can be described by a multi-linear polynomial (6) and the optimization problem (25) is transformed into a pseudo-Boolean optimization problem So far, several different ways to handle the nonlinear pseudo-Boolean optimization problems (27) exist, such as reduction to an equivalent linear or quadratic binary programming problem, branch-and-bound method, linear approximations (Boros and Hammer, 2002;Crama and Rodrí-guez-Heck, 2017). As the linear programming relaxation of an integer linear program can be solved efficiently and based on the solution integer solutions can be found, in this paper we consider "linearization", so that nonlinear binary optimization can be reduced to integer linear program (Crama and Rodrí-guez-Heck, 2017). The key is to introduce auxiliary Boolean variables z = [z 1 z 2 · · · ] T to replace the nonlinear monomial j∈I α θ j in (6) by means of the AND-expression z α = j∈I α θ j . Simultaneously to satisfy the AND-expression, linear inequalities as constraints are considered to get feasible value of the nonlinear monomial j∈I α θ j . Finally, an optimization problem equivalent to (27) is obtained as The constraints in the optimization problem in (27) can be complemented by additional constraints representing the prior knowledge of alternative hypotheses or unateness as shown in Section 4.1 and Section 4.3, respectively.
Remark 2. It is important to note that minimizing or maximizing a pseudo-Boolean function is known to be N P-hard (Crama and Rodrí-guez-Heck, 2017). However, Breindl et al. (2013) shows that the optimization problem (28) can be solved using a relaxed problem, i.e., linear programming solver based on the simplex method, which requires less computational effort than mixed integer linear program. The relaxed problem delivers an integer as optimal solution, which is also an optimal solution of the optimization problem (28).

Handling of Large Scale Networks
With modern measurement techniques it is possible to quantify a huge amount of substances simultaneously. A Boolean network which describes the observed interactions is then also of large scale. But the number of substances which are direct relevant for the regulation of certain substance is usually limited, in other words the connectivity inside the network is bounded. For instance, as pointed out by Arnone and Davidson (1997), the connectivity is bounded by 8. Without prior knowledge the complexity of the algorithm is O = 2 n+m as all state and input combinations have to be considered as potential regulators for all states, even though only some of them are relevant in the end. This would limit the applicability of the approach to rather small networks. If one has hypotheses about potential interaction partners and the number of potential regulators per state is limited by a set of k variables, then the complexity of the algorithm is O = 2 k , as the regulative functions for each state can be inferred separately. The hypotheses for the interaction partners are not necessarily based on prior-knowledge, but could also be computed based on the data set. In Margolin et al. (2006) an approach is presented, which is based on the information theoretic concept of mutual information ranking and the restriction to pairwise interactions that leads to a very good scaling with big data sets.

Handling of Missing Measurement Values
Dependent on the measurement technique it is sometimes not possible to measure all states at all time instances and the missing values must be handled in the data analysis. There are approaches in the literature to compute an imputation e.g., for microarrays in Gan et al. (2006) and gel-based proteomics in Albrecht et al. (2010). These approaches are based on interpolation or heuristics. An alternative is to use a data analysis approach which can deal with incomplete data matrices.
A missing measurement value can be estimated during the identification by adding additional binary parameters in the identification process. Because of vector expression of states, all possible states belong to the set 2 n . In this way, n binary parameters are enough for vector expression of a completely unknown state at time k. For example, if n = 2, then we can generally express the unknown state as Furthermore, as the states of the system are known partially, then the number of binary parameters can be reduced accordingly. So for each missing value one parameter is added to the optimization and the imputation for this value is calculated which fits best to the other dynamic behavior of the system.

Handling of Unmeasurable Processes
In some systems post transcriptional protein-protein interactions induce dependencies between the measured abundances similar to the transcriptional regulation. This leads to the situation that the transcriptional regulation can not be observed directly and the identification procedure needs to be adapted accordingly (Geier et al., 2007). The dependencies between the states and the measured outputs can be included in boolean models easily by adding Boolean functions mapping from the actual stats X(t) to the measured outputs Y(t): where [Y(t) = Y 1 (t) Y 2 (t) . . . Y p (t)] T ∈ D p is the output vector at time t , h i are logic functions. All structural information on the logic functions can be expressed with a logical matrix H which can be derived analogous to Equations (2-5). All approaches presented in this paper can be extended for the BN model with output mapping. As additional logic functions are to be identified, additional unknown parameters are added and these parameters cannot be separately identified from the parameters of the regulative functions, which impacts the computational burden drastically (Zhang et al., 2017b) .

Influence of Noise
In real world experiments measurement noise is unavoidable. With a sophisticated binarization method the influence of additive noise can often be suppressed (Hopfensitz et al., 2012). But noise can still lead to wrong binarized values in some cases and consequently errors in the input to the identification method  cannot be totally avoided. As the presented approach is based on an optimization, the network which optimally fits to the observed data is found. Inconsistent transitions caused by noise in the data set can be handled directly and lead to an identification result with a non-zero prediction error. If, due to noise, the observed transitions would lead to an identification result which is contradictory to prior knowledge, the identification approach ignores these transitions directly.
Example 6. Consider the BCN for oxidative stress response pathways with the PI3-Kinase-Akt pathway given in Sridharan et al. (2012).
(33) The information about the steady state is used as described in Section 4.4 to determine one parameter in each matrix, which reduces the number of unknown variables to 34. In the next, we apply the proposed approach to identify the model of the BCN from the given input-state data. Solving the optimization problem (28), in total, 31 unknown binary parameters can be determined. The identification result is depicted in Figure 3B and the identified matrices are as follows, It can be seen that the logical matrices of the Boolean functions for X 5 and X 6 can not be uniquely determined. Combined with an additional information about activating or suppressing properties of the states, for instance, X 4 and X 5 are, respectively, activator to X 5 and X 6 , the complete model can be uniquely reconstructed. The canalizing property of X 4 and X 5 can be utilized as described in Section 4.2. If this information is not available, one could conduct additional experiments with different stimuli and combine the data to have full reconstruction of the model as depicted in Figure 3C.

DISCUSSION
The proposed method facilitates the incorporation of various types of prior knowledge. The optimization problem can be solved by efficient linear programming solvers. By using the simplex method one can guarantee to find the network which optimally fits to the observed data. In comparison, the genetic algorithms based approaches may not guarantee the optimal solution. The proposed method is developed for synchronous Boolean networks. It can be applied to large scale networks, if the connectivity of the network to be identified is limited with aid of prior knowledge or application of information theory.
In future we plan to investigate data-based approaches to infer the connections in large networks and automated partitioning into smaller subsystems (e.g., with an adapted approach from discrete event systems like Saives et al., 2018). We also work on a new method for the binarization based on the idea that the qualitative system behavior before and after the binarization shall be the same.

AUTHOR CONTRIBUTIONS
TL, ZZ, and PZ conception and design of research. TL and ZZ performed simulation and analyzed data. TL, ZZ, and PZ interpreted simulation results. TL and ZZ prepared figures. TL, ZZ, and PZ drafted manuscript and approved final version of manuscript.

FUNDING
This work is supported by the Federal State of Rhineland-Palatinate, Germany in the framework of the project Complex Data Analysis in Life Sciences and Biotechnology (BioComp).