Plato's Cave Algorithm: Inferring Functional Signaling Networks from Early Gene Expression Shadows

Improving the ability to reverse engineer biochemical networks is a major goal of systems biology. Lesions in signaling networks lead to alterations in gene expression, which in principle should allow network reconstruction. However, the information about the activity levels of signaling proteins conveyed in overall gene expression is limited by the complexity of gene expression dynamics and of regulatory network topology. Two observations provide the basis for overcoming this limitation: a. genes induced without de-novo protein synthesis (early genes) show a linear accumulation of product in the first hour after the change in the cell's state; b. The signaling components in the network largely function in the linear range of their stimulus-response curves. Therefore, unlike most genes or most time points, expression profiles of early genes at an early time point provide direct biochemical assays that represent the activity levels of upstream signaling components. Such expression data provide the basis for an efficient algorithm (Plato's Cave algorithm; PLACA) to reverse engineer functional signaling networks. Unlike conventional reverse engineering algorithms that use steady state values, PLACA uses stimulated early gene expression measurements associated with systematic perturbations of signaling components, without measuring the signaling components themselves. Besides the reverse engineered network, PLACA also identifies the genes detecting the functional interaction, thereby facilitating validation of the predicted functional network. Using simulated datasets, the algorithm is shown to be robust to experimental noise. Using experimental data obtained from gonadotropes, PLACA reverse engineered the interaction network of six perturbed signaling components. The network recapitulated many known interactions and identified novel functional interactions that were validated by further experiment. PLACA uses the results of experiments that are feasible for any signaling network to predict the functional topology of the network and to identify novel relationships.

When multiple experiments are performed using similar perturbations of the same signaling components, the results from these experiments can be used to further refine the interaction network, by choosing only interactions which were inferred from a large majority (or all) of the experiments, and disregarding the rest. Given enough experiments, this method ensures that only correct interactions are chosen, and random ones caused by over-fitting noise are removed.
To determine how many experiments should be considered we must determine the statistical significance of the results obtained by such a cutoff. Given L experiments from which we derive L networks, we create many (10 6 ) sets of L random networks, each set containing the same number of positive and negative interactions as the networks derived from the experimental data. For each set we count how many interactions have the same sign in C out of the L experiments (C ≤ L). We thus create the probability distribution of obataining at least n agreeing interactions in at least C of the experiments. Finally, we examine the number of agreeing interactions for every value of C in the networks derived from the experimental data, and choose the value that gives the highest statistical significance, namely the value that is most unlikely randomly given that value of C. In the experimental results used in this paper this analysis resulted in using 4 out of the 5 experiments, and obtaining 11 interactions. The probability that 5 similar random networks (with the same number of positive and negative interactions as the experimental data) will results in 11 interactions in at least 4 out of the 5 networks is less than 0.002.

II. SYNTHETIC MODEL DESCRIPTION AND METHODS
We constructed a synthetic model that consists of four signaling components and ten early genes. The signaling components are simulated as proteins that have active and inactive forms, and can switch between them. A schematic representation of this regulatory network is presented in Fig. 3A in the main text, in which signaling components are denoted by S j (j = 1..4), and early genes by G n (n = 1..10). The steady state values of this network were obtained both for the unperturbed network and for the network when one of the signaling components is perturbed, using an ode solver in Matlab 7.7.0.471 (R2008b). Perturbation were performed by changing either the activation rate or inactivation rate of the signaling components by 5-10%, which simulates adding a kinase inhibitor or a protease inhibitor, respectively. To these steady state values we added Gaussian noise with a mean of zero and a standard deviation which is some percentage of the mean, which defines the signal to noise ratio (SNR).
Below we introduce the ordinary differential equations used to simulate the synthetic network. We denote the concentration of the mRNA of gene n by m n (n = 1..10), the active and inactive forms of signaling component j by x ja and x ji , respectively (j = 1..4). The rate constant for activation and inactivation of signaling component j are a xj and i xj . The rates at which molecules are generated (translated or transcribed) and degraded are denoted by g or d, respectively, with the molecule name as subscript. The activation and repression of the genes by the signaling components is done using a Michaelis-Menten like function, where we assume a single promoter site that can be bound by a single transcription factor. The rate constant for the activation and repression of gene n by signaling component j is given by k axjgn or k rxjgn , respectively. Finally, the ratio between the basal and maximal gene production for gene n is denoted by r n Translation rate constants: g x1 = 0.5, g x2 = 0.5, g x3 = 1, g x4 = 2. Transcription rate constants: g m1 = 0.8, g m2 = 0.85, g m3 = 0.65, g m4 = 0.9, g m5 = 0.7, g m6 = 0.6, g m7 = 0.8, g m8 = 0.95, g m9 = 0.7, g m10 = 0.65. Michaelis-Menten constants for strength of ativation and repression: k ax1g1 = 0.025, k ax2g1 = 0.0125, k rx2g2 = 0.0125, k ax1g3 = 0.025, k rx4g3 = 0.011, k ax2g4 = 0.0125, k rx1g4 = 0.025, k rx4g4 = 0.0111, k ax1g5 = 0.025, k rx4g5 = 0.01111, k ax2g6 = 0.0125, k rx3g6 = 0.06667, k ax3g7 = 0.0666667, k rx4g7 = 0.0111111, k ax1g8 = 0.025, k ax3g8 = 0.0666667, k ax3g9 = 0.066667, k rx4g10 = 0.0111111. Basal gene activity ratios: r 1 = 0.1, r 3 = 0.1, r 4 = 0.1, r 5 = 0.1, r 6 = 0.1, r 7 = 0.1, r 8 = 0.1, r 9 = 0.2. Note that the basal gene activity levels are only needed for activated genes, and therefore r 2 and r 10 are not defined.

III. NOISE ANALYSIS METHODS
We applied PLACA to the synthetic network with various levels of signal to noise ratios (SNRs), in order to examine the differences between the inferred networks at different levels of noise. To quantify the similarity between two networks that are inferred by PLACA, we denote the vector holding the interaction coefficients of one network by X (holding n(n − 1) interaction coefficients, where n is the number of components in the network), and the second network by Y . Assuming that each coefficient is independent, the similarity between the two networks is given by the normalized Euclidean inner product of the two vectors, which is defined as (S19) Supporting Fig. S1A shows the mean similarity score as a function of the SNR, when applied to the coefficient values (denoted score by value, open circles), and when only the sign of the coefficients were considered (denoted score by sign, full circles). The results were obtained from 10 3 experiments, and similarity scores were evaluated between each pair, giving approximately 5 · 10 5 scores for each SNR value. The mean score is above 0.8 for SNR values larger than 20, and falls rapidly as the noise levels increase. Supporting Fig. S1A also shows the score for networks obtained after simulating five experiments and using majority rule filtering, when the scoring is done by value (squares) and by sign (full squares). It can be seen that using multiple experiments makes the method extremely robust to noise, with a similarity score of more than 0.8 even for SNR levels approaching 5.
Supporting Fig. S1B-E are the functional interaction networks reconstructed by PLACA after simulating five experiments, and leaving only interaction inferred by 3/5 of the experiments. The networks are derived from simulations with SNR values of 100, 20, 10, and 5, respectively, and show that PLACA correctly infers the functional interaction of the toy network for low noise levels. As the noise levels increase, interactions are removed, but erroneous interactions are seldom introduced. Even at low SNR values, PLACA inferrs a functional network that shows high semblance to the original biochemical network.