The Role of Structural Disorder in the Rewiring of Protein Interactions through Evolution

Structurally disordered regions play a key role in protein-protein interaction networks and the evolution of highly connected proteins, enabling the molecular mechanisms for multiple binding. However, the role of protein disorder in the evolution of interaction networks has only been investigated through the analysis of individual proteins, making it impossible to distinguish its specific impact in the (re)shaping of their interaction environments. Now, the availability of large interactomes for several model organisms permits exploration of the role of disorder in protein interaction networks not only at the level of the interacting proteins but of the interactions themselves. By comparing the interactomes of human, fly, and yeast, we discovered that, despite being much more abundant, disordered interactions are significantly less conserved than their ordered counterparts. Furthermore, our analyses provide evidence that this happens not only because disordered proteins are less conserved but also because they display a higher capacity to rewire their interaction neighborhood through evolution. Overall, our results support the hypothesis that conservation of disorder gives a clear evolutionary advantage, facilitating the change of interaction partners during evolution. Moreover, this mechanism is not exclusive to a few anecdotal cases but a global feature present in the interactome networks of entire organisms.

The classical dogma of structural biology maintains that proteins need to adopt a specific, well defined three-dimensional structure to perform their functions. However, there is growing evidence suggesting that this paradigm does not hold for a large fraction of the proteome in higher organisms. Indeed, there are many proteins that lack a defined stable secondary or tertiary structure in solution and yet play crucial cellular roles (1-3). It is now clear that these intrinsically unstructured or disordered proteins (IDPs) might provide some interesting functional advantages over their more structured counterparts. For instance, because they often offer a large and flexible interaction surface area, IDPs are ideal candidates to mediate in signaling cascades where specific interactions with fast association/dissociation rates are required (4,5). However, maintaining a pool of IDPs is not easy for the cell machinery because they tend to aggregate and are more prone to proteolysis, and their overexpression is often damaging (6,7). Thus, it is not surprising that they are subject to tight regulation from transcript synthesis to protein degradation (2).
From an evolutionary perspective, disordered regions have been classified as fast evolving (8), because they show a different pattern of accepted point mutations and present higher rates of insertions and deletions (9,10). However, IDPs will not accept and incorporate any random mutation, and it has been recently shown that long unstructured stretches in proteins are significantly more conserved than their flanking ordered regions or loops in well structured proteins (11,12). It is thus clear that there is selective pressure to maintain structural disorder. Indeed, there is a direct relationship between the level of protein disorder and the tree of life (11,13), suggesting that disorder itself is a major evolutionary tool to increase organism complexity, from bacteria to higher eukaryotes.
With an increasing level of species complexity often comes the implementation of novel regulatory mechanisms, many of which are imprinted as control circuits in protein interaction networks (14). Accordingly, the role of structural disorder has been largely studied in the context of such networks. Interactions between IDPs are common in interactome networks (15) and, in particular, among network hubs (i.e., highly connected proteins) because disorder provides a solution for a protein to interact with many structurally different partners (8, 16-19). The binding mechanisms mediated by disorder encompass increased flexibility and adaptability to multiple interaction interfaces, a facilitated regulation via diverse post-translational modifications and an increased interaction interface exposed by disordered regions (20). These characteristics, found to be particularly present in hub proteins, have also been proposed to be crucial in the evolution of protein interaction networks by improving the fitness of such proteins for binding through becoming longer and acquiring more disordered regions (16). Furthermore, it was shown that hubs, and particularly singlish-interface hubs (i.e., those that bind all their partners using only one or two interaction interfaces (8)) evolve at a faster rate because of their disordered regions being subject to fewer evolutionary constraints (8). However, although very inspiring, all of these studies have investigated the role of disorder in the evolution of interaction networks only through the analysis of single proteins.
Additionally, the last decade has seen the emergence of studies devoted to analyzing interactome evolution from a systemic perspective. For instance, several works have tried to estimate the rate of interaction gain and loss among different organisms (21-23), and different models have been proposed on how this may shape the structure of interactomes and influence their global properties (24-29). All of these advances have been recently reviewed and summarized by Levy and Pereira-Leal (30). In this context, it is our opinion that much remains to be done to clarify which part of the observed effects of structural disorder is related to the evolution of the proteins as individual biological entities and which can be attributed entirely to their role in (re)shaping their interaction environment inside the network.
In this work, we make use of existing large scale interaction data to assess systematically and quantitatively the role of structural disorder in network evolution and rewiring. In particular, we build and compare interactome networks for human, fly, and yeast and classify the interactions into ordered and disordered. We then study whether the two types of interactions are equally conserved through evolution or if, on the contrary, either type is preferred to change the cellular repertoire of interactions, regardless of the level of conservation of their constituent proteins (Fig. 1).

EXPERIMENTAL PROCEDURES
Collection of High Confidence Proteomes-We collected high confidence sets of proteins for yeast (Saccharomyces cerevisiae) and fly (Drosophila melanogaster) through the mapping of the Saccharomyces Genome Database (http://downloads.yeastgenome.org/, January 2010) and FlyBase (http://www.flybase.org, March 2010) (44) proteins onto UniProt (45) and removed all pseudogenes and dubious ORFs. For human, we collected all UniProt sequences belonging to Homo sapiens. In all three cases, we only considered UniProt entries with evidence at the protein or transcript level and discarded fragments. We then added all splice variants available in Swissprot and removed redundancy by clustering sequences using UniRef100 (45). The final high confidence, nonredundant proteomes contain 5,747 sequences for yeast, 18,547 for fly, and 59,898 for human (see Table I).

FIG. 1. Conservation of interactions involving disordered proteins.
We compared the interactomes of human, fly, and yeast to identify conserved subnetworks and tested for enrichment/depletion of interactions involving ordered (green, yellow, and blue) and disordered proteins (red). A, A′, and A″, as well as B, B′, and B″, are examples of orthologous proteins among the three species and illustrate how disordered interactions can change during evolution.
Codes, using the mapping file provided by UniProt. We remapped the different interactors through UniRef100 to remove redundancy and discarded those interactions involving proteins not present in our high confidence proteomes, as well as those that could not be traced back to their original publications. We then added interactions found in three-dimensional structures (in biological units to avoid crystal contacts), downloaded from the Protein Data Bank (http://www.pdb.org) (36). We considered two amino acid chains to be interacting if they had at least five residue-residue contacts (i.e., hydrogen bonds, disulfide and salt bridges, and van der Waals interactions), not considering obvious physical clashes (i.e., pairs of atoms closer than the sum of their average covalent radii plus 0.5 Å).
Determination of Orthology Relationships-We determined orthology relationships between human, fly, and yeast proteins using proteome-wide reciprocal BLAST searches (52). To remove spurious hits, we required an E-value of <10⁻¹⁰ and considered only hits in the top 10 of the BLASTP output. This resulted in 11,430 yeast/fly, 17,534 yeast/human, and 75,415 fly/human orthologous pairs. To test the robustness of our findings with respect to the orthology assignments, we repeated all of the analyses using Inparanoid assignments (53,54), both many-to-many and one-to-one relationships (supplemental Table 1), obtaining identical trends (supplemental Fig. 2).
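The reciprocal-search criterion can be illustrated with a minimal Python sketch. It assumes that BLASTP hits have already been parsed into per-query lists sorted from best to worst; the data layout, the function names, and the identifiers in the usage note are hypothetical, and only the two thresholds (E-value below 10⁻¹⁰, top 10 hits) come from the text. Treating every mutually recovered pair as orthologous, which yields many-to-many relationships, is one possible reading of the procedure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: query id -> list of (subject id, E-value), best hit first.
Hits = Dict[str, List[Tuple[str, float]]]

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text
TOP_N = 10              # only the top 10 hits per query are considered


def filtered_hits(hits: Hits) -> Dict[str, Set[str]]:
    """Keep, per query, the subjects among the top N hits that pass the E-value cutoff."""
    return {
        query: {subj for subj, evalue in ranked[:TOP_N] if evalue < E_VALUE_CUTOFF}
        for query, ranked in hits.items()
    }


def reciprocal_pairs(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs in which each protein recovers the other in its filtered hits."""
    forward = filtered_hits(a_vs_b)
    backward = filtered_hits(b_vs_a)
    return {
        (a, b)
        for a, subjects in forward.items()
        for b in subjects
        if a in backward.get(b, set())
    }
```

For example, with forward hits `{"Y1": [("F1", 1e-50), ("F2", 1e-5)], "Y2": [("F2", 1e-30)]}` and reverse hits `{"F1": [("Y1", 1e-45)], "F2": [("Y1", 1e-20)]}`, only the pair `("Y1", "F1")` satisfies both directions.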
Estimation of Likely Conserved Interactions-We estimated evolutionary distances between homologous proteins as the number of amino acid substitutions per site (d) calculated from the fraction of identical residues (q) using the general equation derived by Grishin (55) that accounts for substitution rate variations both between different types of amino acids and between different sites.
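The equation itself is not reproduced in this text. As a minimal sketch, assuming the form q = ln(1 + 2d)/(2d) that is commonly quoted for the model with rate variation among both sites and amino acid types (an assumption that should be checked against reference 55), d can be recovered from q numerically. The sketch uses Newton's method, which is our choice for illustration and not necessarily the iteration scheme used in the study; the starting point and the tolerance of 10⁻¹⁰ follow the description in the text.

```python
import math


def grishin_distance(q: float, tol: float = 1e-10, max_iter: int = 1000) -> float:
    """Estimate substitutions per site d from the fraction of identical residues q.

    Assumed equation (not reproduced in the text): q = ln(1 + 2d) / (2d).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if q == 1.0:
        return 0.0  # identical sequences: no substitutions
    d = 1.0 / q - 1.0  # starting point: rate variation among sites only
    for _ in range(max_iter):
        f = math.log1p(2.0 * d) - 2.0 * q * d      # residual of ln(1 + 2d) = 2qd
        f_prime = 2.0 / (1.0 + 2.0 * d) - 2.0 * q  # derivative with respect to d
        d_next = d - f / f_prime                   # Newton update
        if abs(d_next - d) < tol:
            return d_next
        d = d_next
    raise RuntimeError("iteration did not converge")
```

Under this assumed equation, the estimate is always at least as large as the starting value 1/q − 1, because the additional rate variation among amino acid types implies more hidden substitutions for the same observed identity.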
We solved this equation numerically by iteration, using d = (1/q) − 1, which allows for the substitution rate to vary only among sites, as the starting point, until the difference between subsequent estimates of d was smaller than 10⁻¹⁰ (default parameter). Then, for each pair of homologs (A/A′, B/B′) that interact in at least one of the interactomes, we calculated the probability of the respective interaction being conserved as the posterior probability of interaction conservation given the difference Δd_{A/A′, B/B′} between the evolutionary distances of A and A′, and B and B′.

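$$P(C \mid \Delta d) = \frac{P(\Delta d \mid C)\,P(C)}{P(\Delta d \mid C)\,P(C) + P(\Delta d \mid N)\,P(N)}$$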
This calculation is based on the likelihood ratio of observing the respective Δd under a conservation model C (all pairs of homologs with a conserved interaction) and a null model N (10⁶ random pairs of homologs). We calculated the posterior probability using Bayes' theorem, with the prior probability set such that the pair of homologous proteins (X/X′, Y/Y′) with the highest likelihood ratio is assigned an interaction conservation probability of 0.9 (default parameter). Likelihood ratios were smoothed using monotone regression (Pool Adjacent Violators Algorithm, PAVA (56)). We accepted all predicted interactions with a posterior probability greater than 0.5.
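The calibration of the prior and the acceptance rule can be sketched in Python using the odds form of Bayes' theorem. This is an illustrative sketch only: the function names are hypothetical, the likelihood ratios are assumed to be already estimated and smoothed (the PAVA step is omitted), and only the values 0.9 and 0.5 are taken from the text.

```python
from typing import List

TARGET_POSTERIOR = 0.9  # posterior assigned to the highest likelihood ratio (text default)
ACCEPT_THRESHOLD = 0.5  # predictions above this posterior are accepted


def calibrated_prior_odds(likelihood_ratios: List[float]) -> float:
    """Prior odds P(C)/P(N) such that the largest ratio maps to TARGET_POSTERIOR."""
    target_odds = TARGET_POSTERIOR / (1.0 - TARGET_POSTERIOR)
    return target_odds / max(likelihood_ratios)


def posterior(likelihood_ratio: float, prior_odds: float) -> float:
    """Bayes' theorem in odds form: posterior odds = likelihood ratio * prior odds."""
    odds = likelihood_ratio * prior_odds
    return odds / (1.0 + odds)


def accepted(likelihood_ratios: List[float]) -> List[bool]:
    """Flag the candidate interactions whose posterior exceeds the acceptance threshold."""
    prior_odds = calibrated_prior_odds(likelihood_ratios)
    return [posterior(lr, prior_odds) > ACCEPT_THRESHOLD for lr in likelihood_ratios]
```

With hypothetical likelihood ratios of 0.2, 1.0, and 4.0, the calibrated prior odds are 9/4, so the posteriors are approximately 0.31, 0.69, and 0.9, and the last two candidates would be accepted.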
Computation of Disorder Enrichments/Depletions-To assess whether disordered interactions are more or less conserved than ordered interactions, we computed their enrichment/depletion in the interactome networks of the given species (Fig. 2). For this, we calculated the ratio of the fraction of disordered interactions in the two species (B + D) that are conserved in one of the species (B) and the fraction of ordered interactions (A + C) that are also conserved (A), followed by log2 transformation to get a symmetrical range of values.
We checked the statistical significance of the enrichments by applying a two-sided Fisher's exact test with a standard p value threshold of 0.05. For clarity, we transformed the log2 enrichments into fold enrichments and use negative numbers for depletions so that, for instance, a log2 enrichment of 2 becomes a 4-fold enrichment, whereas a log2 depletion of −2 corresponds to a ratio of 0.25 (i.e., 4-fold depletion indicated by −4).
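The enrichment measure and the significance test can be written out as a short Python sketch. The variable names follow the counts A, B, C, and D defined above; the function names are hypothetical, and the pure-Python Fisher's exact test is a textbook implementation given for illustration, not the software used in the study.

```python
import math
from math import comb


def log2_enrichment(a: int, b: int, c: int, d: int) -> float:
    """log2 of [B / (B + D)] / [A / (A + C)].

    a, c: conserved / non-conserved ordered interactions
    b, d: conserved / non-conserved disordered interactions
    """
    return math.log2((b / (b + d)) / (a / (a + c)))


def signed_fold(log2_value: float) -> float:
    """Fold change, reported as a negative number for depletions."""
    return 2.0 ** log2_value if log2_value >= 0 else -(2.0 ** -log2_value)


def fisher_exact_two_sided(x11: int, x12: int, x21: int, x22: int) -> float:
    """Two-sided Fisher's exact test p value for the table [[x11, x12], [x21, x22]]."""
    row1, col1, n = x11 + x12, x11 + x21, x11 + x12 + x21 + x22
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(x11)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

As a made-up example, A = 30, C = 70, B = 15, and D = 185 give conserved fractions of 0.30 for ordered and 0.075 for disordered interactions, hence a log2 enrichment of −2, reported as −4. The test would then be called as `fisher_exact_two_sided(30, 70, 15, 185)`. For tables with tens of thousands of interactions, a library routine such as `scipy.stats.fisher_exact` is likely to be considerably faster than this exhaustive sum.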

RESULTS AND DISCUSSION
To carry out our comparative analysis, the first step was to collect and assemble a high confidence set of protein sequences for human (H. sapiens), fly (D. melanogaster), and yeast (S. cerevisiae) and predict unstructured stretches in these proteins. We did this by means of the disorder predictor VSL2 (31,32), which is one of the most precise predictors, according to the results of the last Critical Assessment of Structure Prediction (CASP) rounds (33,34), with an accuracy of ~81% (32). We then classified each protein as ordered or disordered if more than 70% of its residues were predicted to be structured or unstructured, respectively. We applied such a strict filter to avoid potential artifacts caused by mispredictions. All of the remaining proteins were classified as partially disordered or uncertain. To ensure the robustness of the classification, we repeated the analysis with a different disorder prediction program, DISOPRED2 (14), obtaining very similar results (data not shown). Table I reports the number of proteins included in this study for each species as well as their ordered/disordered structural classification. The fractions of ordered and disordered proteins are fairly similar in the three species considered (~30% for human and fly and 20% for yeast) and agree well with current estimates (14).
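The 70% rule can be stated compactly in Python. This is a sketch under the assumption that the predictor output has been reduced to one boolean flag per residue; the function name and labels are hypothetical. Integer arithmetic is used so that a protein with exactly 70% of its residues in one state falls into the partial class, as "more than 70%" implies.

```python
from typing import Sequence


def classify_protein(disordered_flags: Sequence[bool]) -> str:
    """Label a protein from per-residue disorder predictions (True = disordered)."""
    n = len(disordered_flags)
    if n == 0:
        raise ValueError("empty sequence")
    n_disordered = sum(disordered_flags)
    if 10 * n_disordered > 7 * n:        # more than 70% unstructured residues
        return "disordered"
    if 10 * (n - n_disordered) > 7 * n:  # more than 70% structured residues
        return "ordered"
    return "partial"                     # partially disordered or uncertain
```

For instance, a 10-residue toy sequence with 8 disordered residues is labeled "disordered", one with 1 disordered residue is labeled "ordered", and one with 5 or exactly 7 disordered residues is labeled "partial".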
Whole Interactome Comparisons-After collecting the different proteomes and classifying the proteins as ordered/disordered, we built the interactome networks for each organism and compared their level of conservation among the different species. We retrieved binary protein-protein interactions for the three model organisms from the main data repositories, and made the set nonredundant, as described under "Experimental Procedures" and summarized in Fig. 2. It is worth noting that we deliberately decided not to include experiments reporting multiprotein complexes to avoid errors in the assignment of direct physical interactions between their components and only considered interactions reported by binary detection methods (35). In total, we compiled over 20,000 interactions for each species involving between 5,000 and 8,600 proteins. The percentages of fully ordered and disordered proteins present in the interactome networks range from 26 to 33% and from 21 to 33%, respectively, and they are consistent with the composition observed in full proteomes (Table I). To verify that the strict levels of >70% ordered/disordered residues were comparable between whole proteins and interaction interfaces and discard the possibility that the interactions are mediated by the remaining 30% of disordered/ordered residues, we analyzed all the 970 human, fly, and yeast interactions for which there is a structure available in the Protein Data Bank (36) and found that, for disordered interactions (structured upon association), the interface contains between 64 and 72% of disordered residues on average, whereas the figure drops to 22-24% for ordered interactions. These numbers are fully compatible with our thresholds and show that whenever an interaction is classified as ordered/disordered, the classification is also consistent in the interfaces.
To compare the species interactomes and assess the level of conservation of the interactions, we first inferred orthology relationships between the proteins in each organism and overlaid the networks. We consider an interaction between the two proteins A and B to be conserved if it is also observed between the corresponding orthologs A′ and B′ in the other species. By performing pairwise alignments of the collected binary interactomes for the three organisms, we discovered that interactions involving at least one disordered protein (i.e., disordered interactions) are significantly less conserved than interactions between ordered proteins (i.e., ordered interactions), as determined by enrichment analysis (see "Experimental Procedures"), and this holds for all organism pairs (p values in the range 1.91 × 10⁻⁵⁶ to 2.74 × 10⁻¹⁴, two-sided Fisher's exact test). In particular, we found disordered interactions to be between 3.3- and 9.6-fold less conserved than ordered ones (Fig. 3A).
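The two definitions used in this comparison, conservation of an interaction and its ordered/disordered class, can be sketched in Python. The data structures are assumptions made for illustration: interactions are stored as unordered pairs of identifiers, orthologs as a mapping from each protein to a set of counterparts in the other species, and protein labels as the strings "ordered", "disordered", and "partial". All names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Interaction = FrozenSet[str]  # unordered pair of protein identifiers


def is_conserved(
    a: str,
    b: str,
    orthologs: Dict[str, Set[str]],
    other_interactome: Set[Interaction],
) -> bool:
    """True if some ortholog pair (a', b') of (a, b) interacts in the other species."""
    return any(
        frozenset((a2, b2)) in other_interactome
        for a2 in orthologs.get(a, set())
        for b2 in orthologs.get(b, set())
    )


def classify_interaction(label_a: str, label_b: str) -> str:
    """Disordered: at least one disordered partner; ordered: both partners ordered."""
    if "disordered" in (label_a, label_b):
        return "disordered"
    if label_a == label_b == "ordered":
        return "ordered"
    return "partial"
```

Counting conserved and non-conserved interactions within the ordered and disordered classes then yields the four counts A, B, C, and D on which the enrichment analysis operates.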
Surprisingly, although there is no change in the fraction of ordered/disordered proteins in whole proteomes and interactomes, the figures change considerably when we consider not individual proteins but protein-protein interactions. In this case, we see how the fraction of ordered interactions drops to ~6-8%, whereas the disordered ones account for 45-61% of the interactions (Table I). However, even if disordered interactions are the main components of interactome networks, if we analyze the interactions that are conserved across different species, our results show that disordered interactions are unequivocally and consistently depleted, and ordered interactions are by far more conserved.
Accounting for Undetected or False Interactions-Although the initial results show a clear trend, we need to consider several factors that could potentially affect the analysis. For instance, it is well known that current interactomes are incomplete, and only a small fraction of the total number of estimated interactions have already been identified (37). The high fraction of missing interactions has been attributed to two main causes: on the one hand, the sampling of the interaction space for the different model organisms is incomplete, and on the other, the discovery methodologies available cannot detect all types of interactions (38). In practical terms, this means that we could wrongly classify an interaction as nonconserved only because it has not been studied or detected in one of the two species compared. We tried to address the bias introduced by the different proteome sampling by considering only those proteins that had been included in each study. However, unfortunately, this information is not available for most experiments because they only report positive interactions. Nevertheless, we could check whether there is a bias toward ordered or disordered proteins being more sampled in the different interactomes that could affect our results. Accordingly, we computed the fraction of ordered/disordered proteins in the proteome/interactome of each organism and calculated the ratio for each species comparison (supplemental Fig. 1). We found that although there are some minor differences, they are unlikely to affect our conclusions because they occur in both directions (i.e., either ordered or disordered proteins being slightly oversampled). Finally, it is worth noting that, to the best of our knowledge, this type of structural information was never used to design the many interaction discovery experiments that are included in our study and is thus unlikely to cause any serious bias.

FIG. 2. General strategy employed in the analysis of conservation of ordered/disordered interactions.
After construction of binary interactomes for human, fly, and yeast, we predicted disordered residues in the three species proteomes using VSL2 (31,32) and classified the interactions into ordered (i.e., interactions between two ordered proteins), disordered (i.e., interactions involving at least one disordered protein), and partially disordered (see main text for details). We then performed pairwise alignments of the binary interactomes to identify conserved and not conserved interactions. Finally, we computed the enrichment/depletion of conserved ordered/disordered interactions in the interactome networks.

FIG. 3. Conservation of disordered interactions in interactome networks.
The values report fold enrichments (depletions, because they are all negative) of conserved disordered interactions in the interactome networks of the three model organisms considered. The numbers refer to the interactome of the organisms in the rows when comparing them to the organisms in the columns for complete interactomes (A), complete interactomes including likely conserved interactions (B), high confidence interactomes including likely conserved interactions (C), and only subnetworks with proteins present in both species (D) (see the main text for details). The degree of enrichment/depletion is shown in different blue tones, from light to dark. All of the depletions were statistically significant with a standard p value threshold of 0.05 (two-sided Fisher's exact test).
To compensate for the lack of detection sensitivity of the different interaction discovery methods, we implemented a procedure to predict likely conserved interactions in the interactome alignment procedure. Our strategy is founded on the concept of "interologs" (i.e., orthologous pairs of interacting proteins) (39) and explicitly incorporates evolutionary considerations. In particular, we exploit the observation that interacting proteins evolve at rates significantly closer than expected by chance (40) (even within the same functional module (41)) to predict the probabilities for likely conserved interactions based on the difference of the evolutionary distances between the protein pairs involved in the interactions (see "Experimental Procedures"). The inclusion of likely conserved interactions notably increased the number of interactions considered by adding up to 4,822 interactions in the human versus fly comparison (Table I). We then repeated the above enrichment analysis and consistently observed a significant depletion of conserved interactions among disordered interactions with respect to ordered ones (p values in the range 2.07 × 10⁻¹²⁸ to 1.58 × 10⁻²⁵, two-sided Fisher's exact test), with disordered interactions still being 1.6-9.3-fold less conserved than ordered ones (Fig. 3B).
We also sought to exclude any potential artifact from our conclusions coming from false positives in the networks. Accordingly, we built high confidence networks by considering only those interactions reported by at least two independent publications or extracted from biological units in crystal structures. This significantly reduced the number of interactions considered to 14,337, 5,006, and 5,668 for human, fly, and yeast, respectively. We repeated the enrichment analysis and, again, found a 1.5-8.5-fold significant depletion of conserved disordered interactions (p values in the range 3.1 × 10⁻¹⁰² to 3.3 × 10⁻⁵, two-sided Fisher's exact test; Fig. 3C).
The results of this second set of enrichment analyses, where we deal with missing and dubious interactions, are very consistent with what we observed in complete interactome networks, showing a clear depletion in the conservation of disordered interactions. In addition, they also support the notion that, even if incomplete and error prone, current versions of interactomes already contain enough quality information to permit the analysis of their global and emerging properties (42).
Evolutionary Pressure on Network Rewiring-It has been repeatedly reported that well structured proteins are, in general, more conserved than disordered ones (43). We thus need to assess to what extent our observation that disordered interactions are significantly less conserved in interactome networks is due to the lower level of conservation of the individual interacting proteins or whether evolution is indeed acting on the interactions themselves: that is, to check whether interactions are not conserved because one of the interacting orthologs in the other species is missing or not detectable or because there is real rewiring of interactions among pairs of proteins that have orthologs in the other organism. Accordingly, we built interactome subnetworks consisting of those interactions whose protein components have detectable orthologs in the other organism. Note that because we are using conserved pools of proteins, interactome subnetworks for each organism now vary depending on the species that they are compared with. As expected, the number of interactions decreased considerably, ranging from 9,873 for the human-fly comparison to 2,642 in the case of fly-yeast (Table I). We then reran the enrichment analysis and again observed the same trend: a 1.4-2.8-fold depletion of conserved disordered interactions (Fig. 3D), with p values in the range 3.13 × 10⁻¹⁴ to 2.58 × 10⁻³, in a two-sided Fisher's exact test. In this case, we found a considerable decrease of the depletion of conserved disordered interactions observed when comparing the human and fly interactomes to yeast, from 7.2- and 9.6-fold depletion to 1.4- and 1.9-fold, respectively. This effect is not observed when taking the yeast interactome as a template and comparing it with human and fly, and it can be attributed to the nonexistence of yeast orthologs for many human and fly proteins, whereas most yeast proteins have counterparts in the other species.
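The restriction to proteins with detectable orthologs amounts to a simple filter, sketched below in Python. The representation of interactions as unordered pairs and of orthologs as a mapping to sets of counterparts is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Set

Interaction = FrozenSet[str]  # unordered pair of protein identifiers


def ortholog_subnetwork(
    interactome: Set[Interaction],
    orthologs: Dict[str, Set[str]],
) -> Set[Interaction]:
    """Keep interactions whose partners all have at least one ortholog in the other species."""
    return {
        pair
        for pair in interactome
        if all(orthologs.get(protein) for protein in pair)
    }
```

Because the ortholog mapping differs for each species pair, applying the filter to the same interactome with different mappings produces different subnetworks, which is why the subnetwork for each organism depends on the species it is compared with.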
This final analysis reveals that, even accounting for the different levels of protein conservation by considering only common subnetworks, there is a clear depletion of conserved disordered interactions in interactomes. This suggests a selective pressure acting directly on the interactions, independent of that acting on the individual partners, that might have a direct effect on the rewiring of cell networks.
Concluding Remarks-Through the comprehensive and quantitative analysis of the conservation of protein-protein interactions across human, fly, and yeast, we have found that disordered interactions are significantly less conserved than those between well ordered interactors. By compensating for several caveats in current interactomes, we have shown that disordered interactions are much less conserved than their ordered counterparts not only because of the higher evolutionary rates of unstructured proteins but also because of their ability to rewire their interaction neighborhood by changing partners during evolution. Moreover, analyzing disordered interactions of known three-dimensional structure, likely involving disorder to order transitions upon binding, we found evidence that disorder plays a direct role at the interaction interface between proteins. Our findings complement previous reports showing that proteins exploit structural disorder to adapt their binding interfaces to several different interactors (20) and that disorder actively participates in the evolution of hubs, providing the molecular basis for their large interaction capacity (8,16).
Overall, our results support the hypothesis that conservation of disorder gives a clear evolutionary advantage, facilitating the change of interaction partners during evolution. Moreover, this mechanism is not exclusive to a few anecdotal cases but a global feature present in the interactome networks of entire organisms.