Protein Complex Evolution Does Not Involve Extensive Network Rewiring

The formation of proteins into stable protein complexes plays a fundamental role in the operation of the cell. The study of the degree of evolutionary conservation of protein complexes between species and the evolution of protein-protein interactions has been hampered by lack of comprehensive coverage of the high-throughput (HTP) technologies that measure the interactome. We show that new high-throughput datasets on protein co-purification in yeast have a substantially lower false negative rate than previous datasets when compared to known complexes. These datasets are therefore more suitable to estimate the conservation of protein complex membership than hitherto possible. We perform comparative genomics between curated protein complexes from human and the HTP data in Saccharomyces cerevisiae to study the evolution of co-complex memberships. This analysis revealed that out of the 5,960 protein pairs that are part of the same complex in human, 2,216 are absent because both proteins lack an ortholog in S. cerevisiae, while for 1,828 the co-complex membership is disrupted because one of the two proteins lacks an ortholog. For the remaining 1,916 protein pairs, only 10% were never co-purified in the large-scale experiments. This implies a conservation level of co-complex membership of 90% when the genes coding for the protein pairs that participate in the same protein complex are also conserved. We conclude that the evolutionary dynamics of protein complexes are, by and large, not the result of network rewiring (i.e. acquisition or loss of co-complex memberships), but mainly due to genomic acquisition or loss of genes coding for subunits. We thus reveal evidence for the tight interrelation of genomic and network evolution.
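The bookkeeping behind these counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the toy identifiers and the dictionary-based orthology map are hypothetical, and orthology is simplified to a single yeast ortholog per human protein.

```python
from collections import Counter


def classify_pairs(human_pairs, ortholog_of):
    """Count human co-complex pairs by how many members (0, 1 or 2)
    have a yeast ortholog in the (hypothetical) ortholog_of map."""
    counts = Counter()
    for a, b in human_pairs:
        n_with_ortholog = (a in ortholog_of) + (b in ortholog_of)
        counts[n_with_ortholog] += 1
    return counts


def conservation_rate(human_pairs, ortholog_of, yeast_copurified):
    """Among pairs where both proteins have an ortholog, return the
    fraction whose yeast orthologs were co-purified at least once."""
    conserved = total = 0
    for a, b in human_pairs:
        if a in ortholog_of and b in ortholog_of:
            total += 1
            yeast_pair = frozenset((ortholog_of[a], ortholog_of[b]))
            conserved += yeast_pair in yeast_copurified
    return conserved / total if total else float("nan")


# Toy data, invented purely for illustration.
human_pairs = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "E")]
ortholog_of = {"A": "yA", "B": "yB", "E": "yE"}
yeast_copurified = {frozenset(("yA", "yB"))}

toy_counts = classify_pairs(human_pairs, ortholog_of)
toy_rate = conservation_rate(human_pairs, ortholog_of, yeast_copurified)

# Consistency check on the counts reported in the text.
reported = {"both_lack_ortholog": 2216, "one_lacks_ortholog": 1828,
            "both_have_ortholog": 1916}
reported_total = sum(reported.values())
share_both_present = reported["both_have_ortholog"] / reported_total
```

The three reported categories add up to the 5,960 human co-complex pairs, and the 1,916 pairs with both orthologs present correspond to roughly 32% of that total.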


Ewing et al. HTP IP-HTMS dataset used to calculate conservation between human and yeast interactome
We chose Reactome as our human reference set for calculating the conservation of co-complex membership because it is manually curated and based on expert opinion, and is therefore likely to contain fewer errors. A new human CoIP dataset by Ewing et al. has since become available, and here we repeat the same calculations with Reactome replaced by this dataset.
The authors state that interactions with a confidence score of 0.3 or higher should be regarded as high confidence. With higher cut-off values we see a steady rise in conservation (87% for >= 0.5 against the intersection dataset), but the total number of conserved protein pairs drops sharply. Even without a cut-off, the number of conserved protein pairs in Ewing is considerably lower than in Reactome, and the conservation calculated from it is therefore less representative.
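The trade-off between cut-off and sample size can be illustrated with a short sketch. The function name and the scored pairs below are hypothetical toy values, not data from Ewing et al.; the sketch only assumes that each pair with two yeast orthologs carries a confidence score and a flag saying whether the orthologs co-purify.

```python
def conservation_by_cutoff(scored_pairs, cutoffs):
    """For each cut-off, keep pairs with score >= cut-off and report
    (cut-off, number of pairs kept, fraction of kept pairs conserved)."""
    rows = []
    for cutoff in cutoffs:
        kept = [conserved for score, conserved in scored_pairs
                if score >= cutoff]
        n = len(kept)
        rate = sum(kept) / n if n else None
        rows.append((cutoff, n, rate))
    return rows


# Toy (score, conserved) values, invented for illustration only.
toy_scored_pairs = [(0.1, False), (0.2, True), (0.35, True),
                    (0.4, False), (0.6, True), (0.9, True)]

cutoff_rows = conservation_by_cutoff(toy_scored_pairs, [0.0, 0.3, 0.5])
for cutoff, n, rate in cutoff_rows:
    print(f"cut-off {cutoff}: {n} pairs, conservation {rate}")
```

In this toy example the conservation estimate rises as the cut-off increases while the number of pairs it is based on shrinks, which is the pattern described above.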
Ewing shows a much lower preservation of orthologs of protein pairs than Reactome (11% and 32%, respectively). Ewing et al. state explicitly that they based their bait selection on human disease association. Ewing therefore does not represent the basal conserved eukaryotic machinery as well as Reactome does, which would account for the low conservation of protein pairs.
We also performed our analysis with another orthology definition. We used InParanoid [1] to calculate orthology between human sequences from the UniProt database and yeast sequences from SGD. InParanoid is a script that uses BLAST to detect homology and calculates orthologs while taking into account the existence of paralogs and in-paralogs. We used the standard settings for InParanoid. Below is a table, like table 2 in the publication but based on the InParanoid orthology. The InParanoid orthology results in slightly higher conservation and more conserved protein pairs. We consider the Ensembl orthology more advanced, as it is based on reciprocal match, phylogenetic tree construction and tree reconciliation. We therefore used the Ensembl definition rather than InParanoid in our main analysis.

Errors in orthology, complex definition and neofunctionalisation
Of the 167 non-interactions found using Reactome and the Inclusive dataset, 139 appear to be potential false negatives. The remaining 28 non-conserved interactions consist of errors in orthology of one gene (5 interactions), incorrect assignment of two proteins to a complex in Reactome (10 interactions) and possible neofunctionalisation after duplication in human (3 proteins, 13 interactions).
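As a quick arithmetic check, the breakdown above can be tallied as follows. The variable names are illustrative only; the numbers are those given in the text.

```python
# Counts taken from the text above.
non_interactions = 167
potential_false_negatives = 139

# Categories of the remaining non-conserved interactions.
remaining = {
    "orthology error (one gene)": 5,
    "incorrect complex assignment in Reactome": 10,
    "possible neofunctionalisation": 13,
}

remaining_total = sum(remaining.values())

# The three categories should account for every non-interaction
# that is not a potential false negative.
tally_is_consistent = (
    non_interactions - potential_false_negatives == remaining_total
)

# Shares of the 167 non-interactions, in percent.
false_negative_share = 100 * potential_false_negatives / non_interactions
remaining_share = 100 * remaining_total / non_interactions
```

On these numbers, about 83% of the non-interactions are potential false negatives and about 17% fall into the three remaining categories.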
Five protein pairs do not show an interaction because of an incorrect orthology assignment in Ensembl. The human protein TF2H4 [Swiss-Prot:Q92759] is annotated as orthologous to VAS1 [SGD:YGR094W] and is present in five conserved protein pairs in Reactome. We could not confirm any homology between these proteins (let alone orthology), and the annotation makes it unlikely as well: TF2H4 is a subunit of the Transcription Factor IIH complex, whereas VAS1 is a valyl-tRNA synthetase. [SGD:YHR156C] is its ortholog in yeast. LIN1 has been implicated in linking chromatin modification and the cohesin complex to the spliceosome complex [3]. However, the similarity between CD2B2 and LIN1 is weak, and the two have very different functions. CD2B2 is involved in immunity and binds to antibodies, whereas LIN1 is a non-essential component of the U5 snRNP. CD2B2 and the spliceosome are mentioned together in an article by Monos et al. [4] because an antibody raised against CD2B2 also reacted with the spliceosomal Sm B/B' proteins. The experimental link between SMC1 alpha and the spliceosome is weak, and it can therefore be argued that SMC1 is not part of the spliceosome complex.
We identified 13 protein pairs that could be new interactions. Each of these  [5]. Its yeast ortholog PBP2/HEK1 [SGD:YBR233W] is involved in the regulation of telomere position effect and telomere length [6]. However, PCBP1 is not the only ortholog of PBP2: fourteen human proteins are orthologs of PBP2. These are active in different processes, and some of them still perform the ancestral function [7]. So whereas PBP2 functions solely in the regulation of telomere position effect and telomere length in yeast, the human PCBP family of in-paralogs has gained many other functions and interaction partners after several rounds of duplication in the course of evolution (neofunctionalisation of in-paralogs).
The human PABP2 is a poly(A)-binding protein and is part of the "3' end cleaved, ligated exon containing complex" in the nucleus according to Reactome. Its ortholog in yeast, SGN1 [SGD:YIR001C], is a poorly characterized poly(A)-binding protein that localizes to the cytoplasm and not to the nucleus [8]. Hence some degree of functional differentiation took place in either human or yeast.