Introduction

Transcriptional networks have been studied in relation to recurrent gene expressions patterns, which have been interpreted previously as cell types1. In the context of network structure, network motifs2 and a human transcriptional network among 119 transcription factors (TFs)3 have been reported. Hierarchical organization of modularity was described in E. coli metabolic networks4. Additionally, network dynamics have been examined based on relations between network motifs and dynamics5 and coordination of signalling and transcriptional responses have been observed6. Another approach, co-expression analysis, has been used to study functional gene modules7,8,9,10. Ruan et al. proposed gene modules related to a subtype of human lymphoma and to yeast telomere integrity based on co-expression analyses7. Remondini et al. reported a relationship between co-expression and the cascade of MYC-activated genes in rat8. Honkela et al. attempted to identify the targets of transcriptional factors (TFs) based on ordinary differential equation models9,10. However, so far, no system-wide structure involving the transition of expression patterns has been reported in transcriptional networks.

Here, we reveal a system-wide structure in a human transcriptional network based on co-expression analyses of temporal expression profiles. Briefly, our approach was: (i) eliminate irrelevant TFs by filtering TFs based on covariance of temporal expression profiles; (ii) identify interactions connecting the filtered TFs based on goodness-of-fit and slope ratio information using a co-expression model; (iii) divide the filtered TFs based on the goodness-of-fit to the co-expression model; (iv) infer a system-wide structure in the identified interactions based on statistical significance of the interactions between two classes; and (v) simulate expression pattern transitions based on a transcriptional regulatory model deduced from the system-wide structure. We applied a proven index11 to step (i) and a proven co-expression model12,13 to steps (ii) and (iii), to ensure that the approach was reliable and that the predicted structure was convincing. We deduced a system-wide, ladder-like transcription factor cluster structure and validated predicted recurrent pattern transitions by state transition simulations.

Results

We divided 2,247 TFs selected from the Genome Network Platform (http://genomenetwork.nig.ac.jp/index_e.html) into two groups, 1,619 TFs relevant to the transcriptional network and 628 TFs that were not relevant, based on the SUMCOV11 index in which covariance was calculated between temporal expression profiles of the TFs (see Methods, Supplementary Fig. S1 and TF_class_sumcov.xls at http://debe-db.nirs.go.jp/nw/ for details). Interactions connecting the filtered TFs were identified based on information provided by the co-expression model13 (see FltdTF.zip at http://debe-db.nirs.go.jp/nw/ for details). To identify interactions, we first selected the threshold of the goodness-of-fit to the co-expression model as 0.7, which retained almost all of the filtered TFs (99% = 1,606/1,619). Threshold values higher than 0.7 decreased considerably the number of TFs that remained (see Supplementary Fig. S2), even though the discarded TFs had been identified as relevant in the filtering stage. Next, we calculated the slope ratio (see Supplementary Fig. S3) and assigned a slope ratio threshold of 0.15, which is the same as the slope ratio threshold used in a previous study13. Consequently, 80,540 interactions that satisfied the goodness-of-fit (>0.7) and slope ratio (<0.15) criteria, were identified. These interactions connected 1,601 of the 1,619 relevant TFs (99% = 1,601/1,619) (Fig. 1).

Figure 1
figure 1

Transcriptional network of the filtered transcription factors.

All the interactions satisfy two criteria, coefficient of determination r2 > 0.7 and the slope ratio <0.15. Red indicates promotive interaction; blue indicates inhibitory interaction. The network was drawn with Cytoscape33 using a force-directed layout. TFs are coloured according to their assigned classes (lower left corner).

To classify the TFs, we used an approach that differed from those used in previous studies14,15,16 where genes were grouped into clusters based on the expression profiles of the genes. In the present study, the genes were grouped into clusters based on the goodness-of-fit of the interaction; i.e., we grouped together two TFs that similarly interacted with third-party TFs (see Methods and TF_class_sumcov.xls, FltdTF.zip and ClstView.zip at http://debe-db.nirs.go.jp/nw/ for details). As a result, four TF clusters were identified in the goodness-of-fit matrix (Fig. 2). The promotive (red) and inhibitory (blue) regulation patterns in the matrix for each cluster (Fig. 2) indicated that two types of TFs existed in each cluster, implying that further clustering was required. Therefore, we conducted a second clustering using a conventional clustering method, k-means clustering with k = 2, based on the temporal expression profiles. Four TF clusters composed of two types of classes were identified; one roughly showing an upward trend, A1, B1, C1 and D1 and the other showing a downward trend, A2, B2, C2 and D2 (Fig. 3a).

Figure 2
figure 2

Goodness-of-fit matrices for the 1,619 filtered TFs in the transcriptional network.

Each element (i, j) of a matrix indicates the goodness-of-fit, measured by the coefficient of determination r2, by the depth of the colour. Red indicates bij > 0, where bij is the slope coefficient13 between the temporal expression profiles of i-th and j-th TF, suggesting promotive regulation; blue indicates bij < 0, suggesting inhibitory regulation. Letters on the dendrograms indicate the branch from which the TFs in the corresponding cluster split off. The heat-maps (right and below the matrices) show the temporal expression profiles of the TFs.

Figure 3
figure 3

Identification of a system-wide transcriptional network structure.

(a) Normalized temporal expression profiles of TFs in eight classes. Grey lines indicate the temporal profiles of the TF; black lines indicate the representative profile for each class defined as a series of medians. The length of the bar adjacent to each graph indicates the calculated similarity ratio between the unit step function and the representative profile (see Supplementary Fig. S4). The number of TFs assigned to each class is shown above each of the graphs. (b) Distributions of identified interactions between the eight TF classes. The two panels on the left show the numbers of identified interactions between the TFs in the corresponding classes; the upper and lower panel show the numbers of promotive and inhibitory interactions, respectively. The two panels on the right show the Z-scores based on the test statistic in the two-sample test for equality of proportions (see Methods for details); the upper and lower panels show the Z-scores for the promotive and inhibitory interactions, respectively. The p0 in the test statistic is the probability that an interaction with r2 > goodness-of-fit threshold and slope ratio < slope ratio threshold is observed by chance (the formula is given in Methods) and given as p0 = 2,180/6282 = 0.0055. The pink and blue shading indicates statistically significance as p < 0.005 (one-sided probability). The values (i-th row, j-th column) are for the interactions for which directionality13 was assigned such that the TFs in a class in the i-th row → the TFs in a class in the j-th column. (c) System-wide ladder-like structure of statistically significant inter-class interactions. External inputs were suggested to be supplied to classes B1 and B2.

The identified classes were associated with interclass interactions as follows. The 80,540 interactions in the network were identified as promotive or inhibitory (see NM.cys at http://debe-db.nirs.go.jp/nw/ for details) and anchored to a combination of the two classes that included source TFs and sink TFs (Fig. 3b, left panels). A two-sample test17 for equality of proportions of the interactions identified 19 promotive and 17 inhibitory inter-class interactions as statistically significant (coloured elements in Fig. 3b, right panels). When the eight TF classes were connected with the statistically significant inter-class interactions, a system-wide structure that looked like two channels bridged by interfaces was revealed (Fig. 3c). We call this a system-wide, ladder-like transcription factor cluster structure. The TF-level view (Fig. 1) can be described as a force-directed-layout18 of the network, where the TFs are positioned by their connections based on mutual relationships. This view shows that the TFs in cluster B, C and D are aggregated among themselves in the order B, C and D, while the TFs in cluster A are positioned in peripheral parts of the network. The TF-level and class-level (Fig. 3c) views were consistent with each other.

The similarity of temporal profiles was evaluated between a representative profile of each class and a unit step function that modelled the external input by the phorbol myristate acetate (PMA) that was applied at the beginning of the experiments and supplied continuously over the entire experimental time course (see Methods and Supplementary Fig. S4 for details). The calculated similarities are indicated by the lengths of the vertical bars in Fig. 3a. The B1 and B2 classes in the two channels in the system-wide structure (Fig. 3c) both showed the highest similarities among the classes and the further away a class was from B1 and B2, the lower the similarities of the class became. It is possible to speculate that the temporal profile of an external input will be deformed as the external input is processed in a channel. Based on this idea, the positioning of the classes (Fig. 3c) is reasonable because the similarity of a class with the step function decreased the further removed the class was from B1 and B2 (Fig. 3a).

Computer simulations based on Boolean functions1 were performed to validate to proposal that the states transitioned and recurred to attractors towards which patterns of gene activity converged (Fig. 4). To simulate the PMA treatment, we supplied step functions, 0 → 1 to class B1 and 1 → 0 to class B2, as external inputs. The simulation results revealed eight basins of attraction that contained multiple states (for which the attracters were indicated as α1 to α8 in Fig. 4) and 16 singleton basins of attraction that contained one state (indicated as α9 to α24 in Fig. 4). Together, the 24 basins demonstrate multistability of the system-wide, ladder-like transcription factor cluster structure. Some of attractors showed expression patterns that were reminiscent of some cell types. Attractor α1, with an “ALL OFF” state, might suggest cell death (Fig. 4, upper left). Attractors α6, α7 and α8 (Fig. 4, bottom) mimicked the final expression pattern, determined by quantitative real-time reverse-transcription polymerase chain reaction (qRT-PCR) during human THP-1 cell differentiation under PMA stimulation19, in which the representative profiles of B1, C1 and D1 were upregulated and the representative profiles of B2, C2 and D2 were downregulated compared with their initial expression levels (Fig. 3a). Attractors α9 and α10 (Fig. 4, middle left) mimicked the initial expression patterns in which the representative profiles of B1, C1 and D1 were low while the representative profiles of B2, C2 and D2 were high compared with their final expression levels (Fig. 3a). When the external inputs were supplied, the trajectories moved out of attractors, α9 and α10 and into the basin of attractors, α6 and α7. After a trajectory moved into the basin of the α6 and α7 attractors, it converged there and did not return, even when the external inputs were de-actuated. This result demonstrated the irreversibility of the expression patterns, which is similar to the irreversibility of cell types after stem cell differentiation20. These results suggested that the trajectories mimic expression pattern transitions that occur when human THP-1 myelomonocytic leukaemia cells cease proliferation, become adherent and differentiate into mature monocyte- and macrophage-like phenotypes under PMA stimulation.

Figure 4
figure 4

Computer simulations of state transitions in the system-wide, ladder-like transcription factor cluster structure.

The states of the network structure are shown in the checkerboards, where the upper row, from left to right, indicates the state of the A1, B1, C1 and D1 classes and the lower row indicates the state of the A2, B2, C2 and D2 classes. Solid arrows indicate state transitions where no step functions (described below) were supplied. Dashed arrows indicate state transitions where step functions (0 → 1 to class B1 and 1 → 0 to class B2) were supplied to mimic THP-1 cell differentiation under PMA stimulation. The ordinal numbers indicate state transitions after supplying the step functions, which model the external PMA input. Orange checkerboards indicate a final state when the enforced expression was added to the class B2 containing the MYC reprogramming factor under the condition without supplying the step functions which model the external PMA input. Αi indicates an attractor of basin.

The behaviour of a Yamanaka factor essential to dedifferentiate committed adult cells into a stem cell-like state21, MYC, was interpreted based on the system-wide, ladder-like transcription factor cluster structure. Our clustering approach positioned the stem cell reprogramming TF into the system-wide, ladder-like transcription factor cluster structure. MYC was positioned in class B2 as a hub node having promotive interactions with 34 downstream nodes (TFs) positioned in class B2 (see TF_class_sumcov.xls and NM.cys at http://debe-db.nirs.go.jp/nw/ for details). The simulation results showed that enforced expression of class B2 containing MYC maintained the state at a high level either in a basin or in a singleton basin (Fig. 4). These simulation results suggested that MYC activate the TFs that are closely associated, thereby maintaining the energy potential at a high level and keeping it in an undifferentiated proliferative state. MYC is notable among the Yamanaka factors21, as demonstrated in the effective generation of the induced pluripotent stem cells (iPSCs) through the control of histone acetylation22.

Our simulations were performed on a class level so that identical expression levels were assigned for TFs in a class. TF-level views of the expression patterns also showed that the simulation results mimicked actual expression patterns well (Fig. 5). The 0th and 1st states in the simulation results (Fig. 5a) mimicked high level expression over the network in the actual expression patterns from 0 h to 6 h after starting PMA treatment (Fig. 5b). The 2nd to 5th states in the simulation results (Fig. 5a) mimicked the spreading and aggregating areas of high level expression in the right-hand side of the network in the actual expression patterns from 12 h to 96 h (Fig. 5b). These observations further validated our system-wide network model.

Figure 5
figure 5

TF-level views of the simulated and actual expression pattern transitions in the transcriptional network.

(a) Simulated expression patterns of the 0th to 5th states after supplying the step functions, 0 → 1 to class B1 and 1 → 0 to class B2, which modelled the external input by PMA. In these images the averaged expression levels between the two trajectories (from α9 to α6 and from α10 to α7 (Fig. 4)) are displayed. (b) Actual expression patterns 0 to 96 hours after starting PMA treatment. The coloured bar on the right hand side indicates the expression level of the TFs in both panels.

Discussion

Representative structure models for transcriptional networks have been reported previously; for example, (i) a recurrent network in which multiple TFs mutually coordinated their activity changes19; (ii) master regulator positioning at the top of fixed regulatory hierarchies23; (iii) a combination of models (i) and (ii) with switching from model (i) to (ii) as time passed24; and (iv) a three-layered model in which each layer included multiple TFs and intra-layer connections3. The system-wide, ladder-like transcription factor cluster structure revealed in the present study, is novel in that two module series were found to interact competitively.

The simulations based on the transcriptional regulatory model deduced from the system-wide, ladder-like transcription factor cluster structure indicated the validity of the network structure. The simulations successfully mimicked expression pattern transitions when human THP-1 myelomonocytic leukaemia cells ceased proliferation and differentiated under PMA stimulation (Fig. 4). Further, based on the transcriptional regulatory model from the system-wide, ladder-like transcription factor cluster structure, the behaviour of MYC during dedifferentiation could be interpreted, indicating that the factor may be essential for iPSCs. Importantly, the system-wide structure on which the transcriptional regulatory model was based is a novel concept (Fig. 3c).

A two phase model has been reported for the reprogramming process that induces iPSCs24. In this model, an early probabilistic phase of gene activation was followed by a later deterministic phase. Our simulation results can be interpreted equivalently; i.e., the diverse expression patterns in the early phase eventually converged into a fixed pattern of an attractor that recurred (Fig. 4). The simulations also suggested validity of the system-wide, ladder-like transcription factor cluster structure for the reprogramming process, because the results (Fig. 4) matched the interpretation based on the two phase model.

Additionally, the system-wide, ladder-like transcription factor cluster structure exhibited some of the properties of a self-organizing system25, namely, positive feedback and multiple interactions as well as a pattern at the global level that arose from numerous interactions among lower-level components (Fig. 1, 3c). The reproduction of the expression patterns in the transcriptional network could be interpreted based on the paradigm of self-organization; i.e., positive feedback by mutual inhibitions between two channels in the system-wide, ladder-like transcription factor cluster structure (Fig. 3c) autocatalytically amplified fluctuations of TF expressions (Fig. 3a) and external input by the induction of MYC triggered and maintained the process. As a result, the transcriptional network was pushed farther from equilibrium and reached high potential states that were maintained by the enforced expression (orange checkerboards in Fig. 4).

Furthermore, the system-wide structure described here can be linked with complex networks. The scale-free topology criterion can be defined as the coefficient of determination for a power-law distribution ~ck−γ, where k is the number of interactions per node and γ is the degree of decay26. For the predicted network (Fig. 1), the criterion was calculated as 0.67 (see Supplementary Fig. S5), which indicated that the predicted network had a near scale-free property. Γ was calculated to be 1.44 in the predicted network, similar to the value obtained in previous study based on co-expression analyses (γ = 1.19 in human27). This similarity further confirmed our inference of the system-wide, ladder-like transcription factor cluster structure, despite the structure being unique. Most importantly, the above observations imply that the approach described in this study uncovered an ordered structure that differed significantly from the homogeneous appearance (Fig. 3c) in the nearly scale-free network (Fig. 1), which had promised a homogeneous appearance.

Methods

Expression analysis

The expression profiles of 2,315 human TFs measured in a genome-wide dynamics analysis of a THP-1 cell line over a time course of growth, arrest and differentiation were collected from the Genome Network Platform (http://genomenetwork.nig.ac.jp/index_e.html). qRT-PCR was performed using primer pairs generated automatically from a single exon of each target gene. To obtain high quality expression data, the reliability and specificity of each primer pair were confirmed in a preparatory experiment. Temporal expression profiles were observed at 0, 1, 2, 4, 6, 12, 24, 48, 72 and 96 hours after starting PMA treatment of human THP-1 cell during differentiation across two biological replicates. Human THP-1 myelomonocytic leukaemia cells cease proliferation, become adherent and differentiate into a mature monocyte-and macrophage-like phenotype by stimulation with PMA28,29. By eliminating expression data with suspected measurement errors, 2,247 TFs were selected from the 2,315 available TFs. Of the 2,247 TFs, 1,350 (60%) were common to another independently developed dataset of 1,962 human TFs30. The averaged expression levels (copy numbers) of the two biological replicates were used for further analyses.

Filtering TFs

The SUMCOV11 (sum of covariance) criterion was used to eliminate single TFs that were irrelevant to the transcriptional network. This criterion is based on the covariance matrix of the gene expressions assuming that there are groups of genes that are correlated among themselves, while being uncorrelated with the other groups. The SUMCOV for each TF was calculated using the averaged expression levels to identify the relevant genes. We implemented k-means clustering with k = 2 based on the logarithm of the SUMCOV provided by Gene Cluster 3.031, to form a relevant and irrelevant group of genes. The cluster that included the TFs with the highest average criterion value was taken to represent the relevant group of genes for the network. The cluster that included the remaining TFs (representing the irrelevant group of genes) was used in a null-hypothesis to test significant interaction between the classes (see the “Inferring system-wide network structure” section in Methods for details).

Identification of interaction

The temporal profiles were fitted to a linear co-expression model13, xj = aij + bijxi, that represented an interaction between the i-th and j-th TFs, where xi indicates the expression level of i-th TF. The coefficient of determination (r2) was calculated to determine the goodness-of-fit of the equation. The nodes connected by interactions with r2 values higher than the goodness-of-fit thresholds were investigated. We selected a goodness-of-fit threshold that left the highest numbers of filtered TFs in the network. Directionality of an interaction was assigned so that a small expression change in the ‘source’ gene was associated with a large change in the ‘sink’ gene. We chose a slope ratio threshold that imposed directionality on the interaction so that the ratio between the magnitudes of the slope coefficients (smallest/largest) in the linear model13 was less than the threshold.

Clustering TFs

The filtered TFs were divided by the co-expression model into clusters based on a series of rij2 values, ri12, …, rij2, …, rin2, where rij2 indicates the r2 of a model in which the i-th TF regulates the j-th TF. Based on this method, if two TFs regulate all other TFs identically, the distance between the two rij2 series will be zero. It is important to note that r2 is an absolute value that allows two interactions to be determined as identical based on their goodness-of-fit to the co-expression model, even if the effect of the interaction indicated by the sign of the slope coefficient bij is different. Situations similar to this can be seen in the checkered patterns in Fig. 2.

Hierarchical clustering was performed using Gene Cluster 3.031 with the Euclidean distance between the rij2 series and the centroid linkage method. Clusters were visualized using Java Treeview32. The TF clusters were divided into two classes based on the temporal profiles of the TFs assigned to the clusters. For this clustering, we implemented k-means clustering with k = 2 of the normalized expression levels (between zero and one) with the Euclidean distance31. Suffix ‘1’ was assigned to the classes where the representative profiles showed an upward trend and suffix ‘2’ to was assigned to the classes where the profiles showed a downward trend.

Inferring system-wide network structure

The effect of an interaction was decided based on the sign of the slope coefficient (bij), which was estimated based on the co-expression model: promotive if bij >0 and inhibitory if bij <0. A two-sample test for equality of proportions was used to evaluate the statistical significance of the proportion (p1) of the interactions in the total number of potential ni × nj interactions between the classes, where ni and nj indicate the number of TFs contained in the classes. The null hypothesis was that p1 = p0, where p0 is the probability that an interaction, with r2 > goodness-of-fit threshold and slope ratio < slope ratio threshold, is observed by chance. The p0 was calculated based on the potential interactions between the TFs identified as single TFs (see OtherTF.zip at http://debe-db.nirs.go.jp/nw/ for details) that were irrelevant to the transcriptional network in the filtering analysis as (number of interactions, with r2 > goodness-of-fit threshold and slope ratio < slope ratio threshold, between single TFs)/(number of single TFs)2. The test statistic for the two-sample test for equality of proportions17 was: , where and m and n are the number of potential interactions between TFs in the corresponding two classes and between TFs identified as single TFs, respectively. Additionally, we investigated the similarity between the representative profile of each class and the step function that modelled external input by PMA applied at the beginning of the experiments and supplied continuously over the entire time course (see Supplementary Fig. S4).

State transition simulation

The network state transition was simulated based on Boolean functions (modified from Kauffman1). In the simulation, the states were presented as binary values (zero or one) at the class level. The significant interclass interactions were used to model transcriptional regulations in the network structure (see Supplementary Fig. S6). In the simulation of THP-1 cell differentiation under PMA stimulation, step functions (0 → 1 and 1 → 0) were provided to the B1 and B2 classes, respectively, as external inputs modelling the PMA stimulation.