Systems biology: the big picture.

Genomics, proteomics, and metabolomics have all vastly advanced our understanding of human biology and disease. But the functioning of even a simple system such as a single yeast cell or bacterium is much more complicated than the sum of its genes or proteins or metabolites; it’s the activity of all those components and their relationships to one another that add up to a living organism. Recognizing that complexity, the emerging field of systems biology attempts to harness the power of mathematics, engineering, and computer science to analyze and integrate data from all the “omics” and ultimately create working models of entire biological systems. 
 
“Traditionally, scientists—toxicologists included—have relied on a reductionist approach to biology,” says William Suk, director of the NIEHS Center for Risk and Integrated Sciences. Even now, many studies examine complex systems by looking at cellular components in isolation. For instance, a common experiment involves using DNA microarrays to observe the effect of a chemical exposure on thousands of genes at once. This technique can quickly tell a scientist which genes may be vulnerable to that exposure. But a systems biology approach would attempt to model not only the chemical’s effect on gene expression but also how that expression will affect protein function, and in turn how the exposure will affect cell signaling. “There’s nothing wrong with what we’ve been doing,” Suk says. “But systems biology is going to take it to another level.”


INTRODUCTION
Regulation of gene expression is one of the most important processes in the living cell, transmitting static information encoded in the DNA sequence into functional protein molecules, which consequently control most of the cellular processes. The regulation of gene expression depends on the recognition of specific promoter sequences by transcriptional regulatory proteins which allows binding of RNA polymerase and initiates transcription. The transcriptional programs are modified as cell progresses through development or through a reaction to changing environmental conditions.
Developments in microarray technology have permitted the recording of changes in gene expression over time during the cell cycle or other developmental processes. As the regulation of transcription is a dynamic process, analysis of the time series of changes in RNA amounts during the cell cycle can lead to the discovery of causal relations between genes and their regulators. Since, the gene expression data are the result of network interactions between regulators and target genes, it is reasonable to trace the interaction networks from microarray gene expression data. As mRNA levels are the result of the action of such networks, it should be possible to reverse engineer the network architecture from the microarray data.
Cell cycle control has been intensively studied in the budding yeast Saccharomyces cerevisiae and large transcriptomic databases of the alteration of RNA synthesis during the cell cycle have been created. Genome-wide microarray gene expression data relevant to the yeast cell cycle have been collected in a parallel manner (1,2). The data were analyzed using a variety of clustering methods (3,4) for the identification of cell cycle, or co-ordinately controlled genes. A singular value decomposition (5-7) was used to model the gene expression data. Instead of grouping genes, according to the similarity of their gene expression patterns, transcriptional regulatory networks were found by genome-wide location analysis identifying which transcriptional regulators bind to which promoters (8,9). Potential transcriptional regulatory networks were identified independently in the work of Lee et al. (10).
Several procedures for inference of transcriptional regulatory networks from experimental microarray data were published in recent years (11)(12)(13). The methods identified upstream regulatory genes by modeling the regulatory process using differential equation models of gene expression control. The goal was to fit a generalized linear model using a set of putative regulators to estimate the transcription pattern of a specific target gene. Alternatively, Woolf and Wang (14) used fuzzy logic for the prediction of transcriptional regulators. Nachman et al. (15) used a kinetic model in connection with dynamic Bayesian networks to infer the structure of transcriptional regulatory networks and the regulators from gene expression time series. One potentially useful approach, introduced by Bar-Joseph (16), combines genomic information with gene expression data analysis. This approach was extended recently by Wang et al. and Makita et al. who combined the analysis of gene expression data with promoter sequence analysis (17,18), or the sigma factor binding sequence motif (18). Such a combined approach has the advantage of the introduction of additional information independent of the gene expression data.
In this paper, we present an alternative method for the prediction of target gene regulators based on a nonlinear differential equation model of gene expression (19). In the beginning, a set of all potential regulators is selected. For a selected set of target genes, the procedure iteratively picks individual genes from the pool of possible regulators and applies the model to fit the gene expression profile of the target gene using the expression profile of the candidate regulatory gene. This procedure is repeated for all target genes and all possible regulators. Those regulators that are able to model the target gene expression profile correctly are considered to be the true regulators.
The procedure was applied to 40 target genes of S.cerevisiae transcriptomic data (2,11). A pool of 184 candidates for potential regulators was selected by combination from the previous reports of Lee (10), Chen (12), Chen (11), and the database of yeast transcriptional regulators YEASTRACT (http://www.yeastract.com). The data were also analyzed using more common linear model. The results of the presented algorithm were compared with the data computed using the linear model, a generalized linear model published recently by Chen (11), all the results were verified by comparison with independent experimental data collected in YEASTRACT. Results show that the method is capable of correctly identifying regulators and predicting their function as activators or repressors.

Dynamic model of transcriptional control
The presented model results from our previously published work (19)(20)(21) on the dynamic simulation of genetic networks. The model is derived by assuming the recursive action of regulators on the target over time. The model assumes that the regulatory effect on the expression of a particular gene can be expressed as a combinatorial action of its regulators. The target gene expression level z at time t + dt can be derived from the expression levels of regulators (y j ) at time t and the regulatory weights (w j ) of all genes controlling the target gene. Thus g, a regulatory effect for a particular gene, is g % X j w j y j À b‚ 1 where m is the number of regulators controlling the gene. The parameter b represents transcription initiation delay or an unspecific bias caused by regulatory effects associated with gene expression but independent of the particular regulator. Let the rate of expression of a target gene (dz/dt) be given by the regulatory effects of other genes r and the effect of degradation x. The degradation effect is modeled by the kinetic equation of a first order chemical reaction x ¼ k.z. The function r represents the regulatory effect g (Equation 1) of regulators j ¼ 1, 2, . . . , m, transformed by a sigmoidal transfer function for m regulators. The whole model for the control of target gene expression z has the form: The constant k 2 represents the rate constant of degradation of the target gene product, and k 1 is its maximal rate of expression. Here, we consider the case of only one transcriptional factor. In such a case Equation 3 simplifies to where y is approximated with a polynomial of degree n y % a 0 þ a 1 t þ a 2 t 2 þ ::: þ a n t n : 5 Coefficients {a 0 , . . , a n } are computed from the experimental gene expression profile using a least squares minimization procedure. The polynomial fit is used as an approximation of an underlying 'true' expression profile, which is obscured by experimental errors. It is assumed that the weight of the experimental error is the same for all points of the measurement. In principle other approximations can be used [the topic of modeling time series measurements in microarray experiments was discussed in the paper of Bar-Joseph (22)]. This simplified version of the model (Equation 4) was used for the identification of the target gene regulators throughout this paper. The degree n of the polynomial must be chosen so that it reflects the rate of changes in gene expression in the given experiment and must be chosen for each experiment individually. In the case of the yeast cell cycle analyzed here, the degree n ¼ 6.
Using expression profiles Z {z(t t )} of the target and Y {y(t t )} of the regulator genes measured at time points t = t t , t = 1,2, ..., Q, we search for all the gene profiles Y 2 { Y i , i =1,2, ..., m} (the pool of all m potential regulators) that minimize the mean square error function: where {z c (t t )} denotes the reconstructed profile of z(t) 2 Z at time points t ¼ t t , t ¼ 1,2, . . . , Q for Q data points, computed using the model (Equation 4).
The problem now becomes an optimization problem, where the expression profiles of Z and Y are supplied to estimate parameters w, b, k 1 , k 2 of the model (Equation 4) that minimize the error function (Equation 6).
For comparison we applied to the same data more commonly used linear model of the form and computed the parameters d i (i ¼ 1 . . 2) by minimizing error function 6.

Computational algorithm
The method aims to select a set of potential regulators of a particular target gene by estimating the expression profile of the target gene. The method searches for possible regulators from a pool of transcriptional regulators using least squares minimization and the model Equation The procedure was repeated 100 times for each combination of regulator-target with randomly selected initial parameter values at the beginning of the optimization procedure. The parameter set giving the smallest value of the error function was then selected.
The optimization was performed using standard Levenberg-Marquardt procedure, Equation 4 was solved numerically using Runge-Kutta procedure (ode45 function in MATLAB). All computations were performed in the MATLAB environment.
For comparison of the results we identified potential regulators using the linear model (Equation 7) and the scheme given by points 1-7, with the linear model replacing Equation 4 in the step 4.

Dataset selection
To evaluate the performance of our model, we chose the eukaryotic cell cycle dataset published by Spellman et al. (2). This dataset records changes in gene expression measured as amounts of mRNA using microarrays at 18 timepoints over two cell cycle periods. The chip contained 6178 open reading frames. Using multivariate methods, Spellman identified 800 genes whose expression was associated with the cell cycle. Nevertheless, the real number of regulators controlling the cell cycle is much smaller. For the identification of yeast cell cycle regulators, we selected a pool of 184 potential regulator genes by combining data from previously published papers (10)(11)(12) and the YEASTRACT database. In order to enable comparison with previously published data, we chose 40 target genes, the same as those analyzed in the paper of Chen et al. (11).

Inference of regulators
The procedure was applied to 40 yeast cell cycle regulated target genes and 184 potential regulators. The data were in the form of a log base 2 of the ratio between the actual value of the mRNA amount divided by the value of a standard which was the same for all time points. Therefore, after exponentiation the whole time series was just scaled by the value of the standard. Before application of the algorithm the data were exponentiated to a power of 2. The least squares minimization procedure was applied to each target gene for all potential regulators.
It can be assumed that a least squares best fit of the polynomial of degree n to the target gene expression profile z p , is an approximation of the unknown real profile, which is obscured by the experimental noise, inaccuracies and natural biochemical and physiological fluctuations [any other fitting function can be used. Choosing a different function will only change the approximation and consequently the selection of the threshold values (see below), but not the principle]. If repeated measurements of expression values are available, estimation of the overall error by means of a polynomial fit (Equation 8) can be replaced by a statistical model. We chose the polynomial representation as it has widely been accepted as an approximation of unknown functions, and as the data used here were available only as averages. Its deviation from the experimental data are given by Finding the most probable regulator for the given target means finding a regulator profile, which models best the target profile using the model (Equation 4) and minimizing E (Equation 6). It can be assumed that the fit of the model to the target regulator profile is at least as good as the fit given by Equation 8, i.e. the deviation E (Equation 6) must be less than or equal to the deviation E 1 (Equation 9). Therefore we selected those regulators for which E was less than or equal to E 1 . In addition, if we look at the plot of values of E for all potential regulators of a given target sorted in increasing order of E (Supplementary Figure 1), we can see that, in most cases, several first potential regulators have E noticeably smaller than the rest (an edge can be seen on the bar graph). This means that those regulators fit the target gene profile even better than the others (we call them 'best regulators'). A summary of 'correct identification' of the regulators for all targets is given in Table 1. Correct identification means that a gene identified as a regulator for a given target was also annotated as the regulator in the independent YEASTRACT database. It is necessary to emphasize that the YEASTRACT database represents current knowledge, which is already quite comprehensive but still far from complete. Therefore, all comparisons with this database have limited information value. If the regulator was found in YEASTRACT than it was confirmed that the regulator was identified independently elsewhere. If the predicted regulator does not match with the YEASTRACT we can say only that according to the current state of knowledge the gene has not been identified as regulator in other studies. Expression profiles of the target gene, the best fitting regulator and the reconstructed target gene profiles for 12 cell cycle regulated genes are shown in Figure 1. The remaining profiles are shown in Supplementary Figure 2. Table 1 shows that the regulators selected as 'best' were identified correctly as regulators for only 35% of the targets. However, the average false positive (FP) rate, defined as the ratio between regulators identified as FPs and the total number of potential regulators, was very low (average ¼ 1.1%, data not shown). The FP rate defined here represents the number of predicted regulators which were not found in the YESTRACT database. When the criterion (E < E 1 ) for selection of the regulators was used, the probability of correct identification slightly increased (to 37.5% see Table 1) but the FP rate was doubled. Slight modification of the criterion to E < 1.1 * E 1 led to rapid increase of the probability of correct detection (see Table 1) but the FP rate increased as well. This trend continued when the criterion was softened to E < 1.2 * E 1 . Close inspection of FP rates showed that most of the increases was caused by only a few targets, namely YOR323C, YJL155C, YDR285W and YAL018C, whose profiles could be modeled by almost any regulator. The result was a dramatic increase in the FP rate. All of these profiles exhibited very high . 'min(m)' means the position of the first correctly found regulator in the list of regulators for the given target, sorted according to the value of E. % found-percentage of targets for which the regulators were correctly assigned. Correctly found regulators are defined as those which were also identified as regulators in the independent database of yeast regulators-YEASTRACT. fluctuations (see their profiles in Supplementary Figure 2) and therefore had very high E 1 . Moreover, the profiles were flat. Consequently, many regulator genes satisfied the criterion (E < E 1 ). The results for such target profiles are very difficult to evaluate and, in real situations, such genes should be excluded from the evaluation. A complete list of the 40 target genes and the predicted regulators with E < E 1 is given in Supplementary Table 2  together with the values of model parameters (w, b, k 1 , k 2 , Equation 4) and the specificity Sp of prediction. The specificity of prediction is defined here as Sp ¼ (N À FP)/N, where N represents number of potential regulators and FP was defined above. This measure is used to indicate a relative improvement in number of experiments necessary for experimental verification of the results of the algorithm compared with the whole pool of the potential regulators.
When the regulators for each target were sorted according to increasing E and the position of the first correctly identified regulator was recorded, the column 'min(m)' of Table 1 was obtained. This column shows that most regulators for a given target were correctly found among the first 5 regulators in the sorted list. For 36 targets out of 40 (90%), a correct regulator was found among the first 10 regulators in the sorted list. If the first 21 regulators in the sorted list for each target were considered, all targets got at least one correctly assigned regulator. This observation greatly simplifies experimental verification. That means that if the 10 regulators with the smallest E are first selected, 90% of the targets get at least one correctly assigned regulator. If we recall that the pool of all potential regulators considered in this paper comprises 184 genes, the algorithm restricts the amount of putative regulators by almost 20-fold.
All identified regulators (the regulators with the smallest E) could also be identified as activators or repressors according to the sign of the weight w. Predictions made by the algorithm were compared with the data in the YEASTRACT database and 77.8% of the identified regulators were also correctly identified as activators or repressors (Supplementary Table 1). If for a specific target no regulator satisfied the above mentioned criteria, it was concluded that the given target did not have a regulator in the pool of regulators. The list of predicted regulators for all targets used in this paper, together with the values of parameters of Equation 4, is given in Supplementary Tables 1 and 2.
In comparison with Chen et al. (11), who analyzed a similar dataset, only one regulator was predicted by both methods. That means that both methods identified, in substantial part, different sets of regulators. As the results from Chen et al.
were not compared with an independent data source, it cannot be evaluated whether their method gives more reliable results.
Some of the differences between the known and predicted functions could be caused by incomplete information in the YEASTRACT database. Other inaccuracies were caused by relatively high experimental noise which cannot be avoided. In the results of this paper we kept these profiles, but in the real world situation, such profiles should be excluded from evaluation. As a least squares minimization procedure was used to compute the target gene expression profile, there is always a risk that the procedure can get stuck in a local minimum of the parametric space and the optimal solution will not be reached. In such cases the real regulator is not found. This was partially avoided by running the least squares minimization procedure several times for different initial values of parameters and selecting the sets of parameters that gave the best solutions. We repeated the procedure 100 times for each pair target/regulator with randomly generated initial values and selected the parameter set which gave the best approximation of the target gene profile.

Comparison with linear model
In order to compare the results of our algorithm we processed the same data with more commonly used linear model, here given by Equation 7. Results are summarized in Table 1 and Figure 2. In both cases, the potential regulators were sorted according to increasing value of E (Equation 6) and the position of the first correctly identified regulator (as correct regulators were defined those which matched with the YEASTRACT database) in the sorted list for both models was recorded [columns Min(m) and Min(m) lin Table 1]. Figure 2A and B show histograms of distributions of these values. It can be seen that the histogram for the linear model, is broader and reaches higher values of Min(m) than the histogram for the nonlinear model.
If columns E (nolin,lin), representing the error function E (Equation 6) for the first correctly identified genes are compared, it is apparent that the goodness of fit for the nonlinear model (Equation 4) is one order of magnitude better than the linear fit. If we accept that the error of measurement in the microarray experiment is 10% of the value, and if it is required that the error of fit is within these 10%, an average threshold of E can be calculated. If 95% of all regulators has to lie within the 10% interval E ¼ 0.0268. If the regulators with the value of E higher than this threshold are excluded, then for the nonlinear model 9 genes out of 40 are excluded, but for the linear model 37 out of 40 have to be excluded. It can be concluded that the nonlinear model presented here gives markedly better results than more common linear model.
The best fitting regulators also identified in YEASTRACT were compared with the predictions made by the nonlinear model and by the predictions made by Chen et al. (11) (Supplementary Table 1 To summarize, our study provides not only a mathematical background for transcriptional regulation, but also makes correct predictions of regulators. The sign of the parameter w of the model function suggests whether the regulator acts as an activator or a repressor of the target gene. Time courses of expression of the target genes estimated by the procedure fit well with the observed ones (see Figures 1 and 2). The model can be extended to determine cooperative and competitive regulation by more than one regulator. Such extension increases the computational complexity in a combinatorial way and becomes computationally unfeasible. Currently, we are working on optimization of the computational procedure to simplify the algorithm and increase its speed.

DISCUSSION
Dynamic modeling of biological systems including genetic networks and cell regulatory networks has been around for a long time [for a review see (23)]. Some of the models successfully simulated simple real systems as the genetic network of lambda phage (20,(24)(25)(26). Yeast cell cycle gene expression data were analyzed by various clustering methods (3,(27)(28)(29) or by linear algebra (6,7,30). An alternative method of gene location analysis for identification of transcription factors was performed in the work of Lee and Iyer (10,31). A comprehensive list of yeast transcriptional regulators is maintained in the YEASTRACT database (http://www. yeastract.com). Recently, several papers on the mathematical modeling of transcriptional regulation in yeast using microarray time series (11)(12)(13)15), and a fuzzy logic approach for identification of triplets of target/activator/repressor (14) were published.
Here, we present a novel algorithm based on a more general concept of a simulated model of genetic networks (19). The model assumes that the kinetic profile of a target gene results from the activity of a particular regulator, which binds the gene upstream and initiates its transcription. From a modeling point of view it means that it is possible to generate the target gene expression profile from the gene expression profile of the regulator using the model and its parameters.
In order to find the correct set of parameters, the difference between the measured target gene expression profile and the profile computed from the expression profile of the regulator has to be minimized. Therefore the search for an appropriate regulator becomes a least squares minimization problem where all possible regulators are tested one by one to determine whether they can model the target gene expression profile. The regulators fitting the target profile best are then selected. In this paper, we show that the model is capable of correctly identifying regulators of yeast cell cycleassociated target genes and their functions (activator or repressor). In comparison with other previously published models which assume linear dependence between the time course of the regulator and the target gene expression profiles, our model assumes a nonlinear dependence given by Equation 2, which conforms better to the observed dynamic behavior of the transcriptional regulatory systems. Here we have focused on a direct one to one relation between the target and the regulator, but the model in principle can incorporate any number of regulators affecting expression of the target gene (see Equation 3).
For more complicated and indirect control processes comprising, e.g. a cascade of regulatory events, more complex interactions would have to be considered. The general model given by Equation 3, extended to all genes of the network, would still remain valid. The algorithm presented here models all combinations of the target/regulator and selects from the results those regulators, which give the best predictions. We then obtain complete information about the ability of all regulators to model each target gene profile. From this set the regulator, which models best the target is selected. Our approach is very comprehensive but on the other hand limits its extendibility to just a small number of regulators due to the high computational requirements. To systematically model all possible interactions controlling one target in a more complex regulatory network would require a large computational load. Fortunately, the algorithm can easily be parallelized and the comprehensiveness of the results might justify the use of parallel computation for full identification of such complex networks.
The presented algorithm was compared with a linear model and it was proven that the nonlinear model used here gives markedly better results than the linear model both in the sense correct identification of regulators and the goodness of fit of the computed target gene expression profile.
Previously published models [Chen et al. (11)], which iteratively incorporate more regulators in each step to improve the fit of the modeled profile to the experimentally measured target gene expression profile, suffer from principal inaccuracy; they improve the fit but do not follow the mechanism of gene expression. In reality, all members of the network perform at the same moment and the influence of each member is given by the value of the parameters associated with them. Therefore, hierarchical selection of regulators according to their capability of improving the fit of the model does not guarantee correct identification of the regulators.
Comparison of Chen et al's predictions (11) with the predictions made by the linear and nonlinear models showed that all three methods gave different result set of genes. Best fit of the regulator profile is obtained by the Chen's algorithm but at the cost of prediction of non-documented regulators for the given target. Our nonlinear algorithm selected well genes documented in YEASTRACT database and also provided reasonable goodness of fit of the target gene expression profile. The linear model (Equation 7) performed worse giving lowest fit and lowest prediction ability (see Table 1 for comparison between linear and nonlinear model).
Probably, the only correct approach would be to create particular models for all known types of transcriptional regulation. Then all possible regulators should be tested for all models and an appropriate model and the genes which best approximate the target gene profile under the particular model would be identified. Nachman et al. (15) took this approach by suggesting a model of gene expression based on Michaelis-Menten kinetics and dynamic Bayesian networks and retrieving the best network structure from gene expression time series data. Similar to Chen et al. (11) they added new regulators during the network structure reconstruction to improve the match between the predicted and known behavior of the system. In contrast with Chen et al., the search was not mechanistic but built a dedicated network structure that approached a probable model of transcriptional regulation.
Instead of attempting to model more complicated regulatory processes with the high risk of incorrect prediction, we focused here on the simplest case but with a reliable outcome.
A drawback of all published algorithms for inference of transcriptional regulatory networks, including this one, is that the candidate regulators are selected from the pool of potential regulators defined independently, usually by sequence similarity analysis, or by other genome annotation methods. If the regulator is not identified, it inevitably escapes identification by the modeling approach. The less characterized the genome of an organism is, the higher the probability of this type of error.
The procedure in principle can use not only the pool of potential regulators, but all genes immobilized on the chip. But by using this approach, the risk of identifying FPs increases. Also, other genes can have expression profiles satisfying the minimization criterion without having anything to do with regulation. Those FPs would eventually have to be sorted out anyway using some independent information. This is, in principle, the same as the a priori selection of potential regulators.
Also, when deciding that a target gene is not controlled by any of the regulators in the predefined set, the target could be controlled by some of the regulators in the set, but this effect could be masked by the inhibition of the activity of the regulator by some other regulator that is not included in the set. Such genes then would not be identified.
Transcriptional regulation is mediated by means of proteins whose active form quite often arises from post-translational modification, e.g. phosphorylation. Such changes cannot be recorded by microarrays, which measure concentration of mRNA. In the modeling using microarray data it is presumed that the regulator protein concentration profile copies the expression profile of mRNA. This assumption is not always valid and cannot be simply predicted just from the microarray data. From this point of view, the use of proteomic data instead of the microarray data in the modeling of transcriptional control seems to be more appropriate.
In general there is no best model. Only a group of models suitable for particular cases have been, or will be, designed. Each of them has particular properties more suitable for particular cases. Relevant results can be obtained by application of several approaches and by critical analysis of the results. One such model is presented in this communication. Results indicate that the model presented here captures the behavior of transcriptional regulation with good accuracy and the predicted regulators well match with documented function obtained independently. Therefore, we believe that our algorithm will be useful for interpretation of gene expression time series.
Although applied to microarray data of S.cerevisiae, the algorithm can be used for the data of any other organism, and can be analyzed using an experimental design similar to this case. In the future, the model will be developed so that it will be able to identify more complex transcriptional regulatory interactions.

CONCLUSIONS
The goal of the presented algorithm is not to recover all possible transcriptional regulatory interaction in a genetic network, but rather to focus on comprehensive analysis of the influence of all possible regulators of a given target gene and to recover basic transcriptional regulations with high confidence. Instead of trying to model a large genetic network with high probability of incorrect results we decomposed such a network into elementary interactions where a single regulator acts as an activator or repressor. The algorithm is capable of correct identification of such interactions with a higher accuracy than previously published algorithms. The algorithm not only selects the most probable regulator of a given gene but also provides information about the ability of all potential regulators to control the target gene, thus giving the possibility of investigating the role of other regulators and decreasing the probability of misidentification. The approach presented here is based on the correct physicochemical background of the system and avoids building of the control network on the basis of improvement of the fit to the experimental data, which can lead to high rate of misidentifications which are difficult to discover. The complex transcriptional control network can in principle be decomposed into the elementary networks, such as the one presented here and the final network can be assembled form them. The hurdle in this approach is the combinatorial increase in the number of necessary computations that grows with the size of the network. For large scale networks, this can lead to an unrealistic number of computations. This suggests improvements in the speed of the algorithm and incorporation of independent information as genome-wide location data, DNA sequence information and targeted biochemical and molecular biological experiments, which can substantially reduce the number of combinations necessary to perform the optimization. The algorithm presented here allows for such extensions. We plan to investigate these possibilities on a model organism and extend the algorithm to the identification of more complex transcriptional regulatory networks.