Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics
Long indels are disordered: A study of disorder and indels in homologous eukaryotic proteins☆
Highlights
► Disordered residues are more frequent among indels than among aligned residues.
► Disordered residues are particularly common in longer indels.
► Ordered non-terminal indels are short.
► The longest indels, ordered and disordered, occur toward the termini of the proteins.
Introduction
A number of different genetic mechanisms cause mutations in coding genes, ranging in size from point mutations, through insertions and deletions (indels) of a few residues, to rearrangements of protein domains and fusion of entire genes. In general, mutations occur at random but are under selective pressure. One general result of this is that residues in the core of a protein are more likely to be maintained through evolution than those on the surface of the protein [1]. Further, short indel events are more likely to occur in loops than in secondary structure elements.
Short indels occur by, for instance, DNA replication slippage during replication or repair [2]. Longer extensions can occur through the conversion of 3′ UTRs into coding regions [3] and through cassette duplications of protein domain repeats, a feature that is particularly common in higher eukaryotes [4]. Novel coding regions may also be created through tandem repetitions of short nucleotide sequences (microsatellites) within the coding region [5].
As some regions of a protein are less crucial to its function than others, it is safe to assume that indels within some regions are less likely to be deleterious than indels in other regions. Short indels that become fixed in the population preferentially occur in solvent-accessible loop regions [6]. Longer indel events involve the insertion or deletion of entire protein domains, primarily at the N- and C-termini of proteins [7] but also, when it comes to repeated domains, within the central parts of a protein [7]. The selective pressure acting on these longer indel events is less well understood. However, in the case of repeated proteins it is clear that the duplication of particular domain combinations is strongly favored [8]. The large length variation caused by indels of several protein repeat domains affects the binding properties of the proteins, i.e. longer indel events are often associated with functional changes [9].
During the last decade it has become evident that while most proteins contain folded domains, and indeed most proteins contain more than one domain [10], some proteins are partially or even fully disordered [11], [12], [13]. These sequences are characterized by two primary features: (i) a low level of hydrophobicity, which precludes the formation of a stable globular core; and (ii) a high net charge, which favors an extended structural state due to electrostatic repulsion [14]. As a result of these properties, intrinsically disordered proteins are, in general, more expanded under native conditions than foldable proteins [15].
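As a rough illustration of the two features named above, the following sketch computes a mean hydropathy and a mean net charge per residue for a sequence. It is not taken from the paper: the function names, the use of the Kyte-Doolittle scale rescaled to the 0..1 range, and the simplified charge count (K/R as +1, D/E as -1, histidine and termini ignored) are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled so that 0 is the most
    hydrophilic residue (R) and 1 the most hydrophobic (I)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq]
    return sum(scaled) / len(scaled)


def mean_net_charge(seq):
    """Absolute net charge per residue under a simplified model:
    K and R count as +1, D and E as -1 (assumption: His and termini ignored)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)
```

Under this simplified model, a sequence that combines a low `mean_hydropathy` with a high `mean_net_charge` shows the composition described above; where to draw a boundary between the two classes is left open here.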
One important observation concerning intrinsically disordered regions is that they are far less common in prokaryotes than in eukaryotes [16], suggesting that disorder could be a component required for higher complexity [17], although another possible reason for this finding is the compactness that characterizes prokaryotic genomes [18]. Intrinsically disordered regions are in general fast evolving, but there are also examples of highly conserved intrinsically disordered regions [14], [19]. Further, many intrinsically disordered regions are important for binding [13], and intrinsically disordered regions are a common feature of the hubs in the protein–protein interaction network of Saccharomyces cerevisiae [20], [21].
Here, we present an investigation into insertions and deletions within disordered regions. We show that indels, here defined as regions that are aligned against gaps, contain many more disordered residues than aligned positions. Further, the longer the indel, the more likely it is to be disordered. Finally, among the proteins where the disordered region is at least as conserved as the ordered region, we find an overrepresentation of proteins that are involved in processes related to translation.
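The definition of an indel as a region aligned against gaps can be made concrete with a minimal sketch. It assumes a pairwise alignment given as two equal-length strings with `-` for gaps, plus one boolean disorder flag per ungapped residue of each sequence (for example, from a thresholded predictor score). The function names and the input layout are hypothetical and are not the authors' pipeline.

```python
def classify_columns(aln_a, aln_b, dis_a, dis_b):
    """Split per-residue disorder flags into aligned positions and indel runs.

    Returns (aligned_flags, indel_runs):
      aligned_flags: flags of residues in columns where neither sequence is gapped
      indel_runs:    one list of flags per maximal run of columns in which
                     the same sequence is gapped
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    i = j = 0                      # positions in the ungapped sequences
    aligned_flags, indel_runs = [], []
    prev_state = None
    for ca, cb in zip(aln_a, aln_b):
        if ca == "-" and cb == "-":
            continue               # ignore columns that are gaps in both
        if ca != "-" and cb != "-":
            aligned_flags += [dis_a[i], dis_b[j]]
            i, j = i + 1, j + 1
            state = "aligned"
        elif cb == "-":            # residue of A facing a gap in B
            state = "gap_in_b"
            if prev_state != state:
                indel_runs.append([])
            indel_runs[-1].append(dis_a[i])
            i += 1
        else:                      # residue of B facing a gap in A
            state = "gap_in_a"
            if prev_state != state:
                indel_runs.append([])
            indel_runs[-1].append(dis_b[j])
            j += 1
        prev_state = state
    return aligned_flags, indel_runs


def disordered_fraction(flags):
    """Fraction of True flags; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0
```

Comparing `disordered_fraction(aligned_flags)` with the fraction over all indel residues gives the first comparison described above, and grouping `indel_runs` by `len(run)` gives the length dependence.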
Results and discussion
We have applied two disorder predictors, IUPred [22] and DISOPRED, to analyze the evolutionary patterns of disordered residues, in particular with respect to indels. There are many flavors of protein disorder [13], [23]. For instance, short and long disordered regions appear to perform different functional roles, where the short disordered regions often serve as loops in otherwise structurally ordered proteins [16]. Such regions are less conserved than their structured surroundings [24], whereas
Conclusion
Here, we have studied homologous proteins from C. elegans and D. melanogaster, as well as homologous fungal proteins, with regard to the disorder content of indels. Because distantly related proteins, and disordered proteins in particular, are difficult to align even with state-of-the-art HMM–HMM alignment methods, the results should be regarded with a measure of caution. However, given that the results remain essentially the same irrespective of disorder prediction method and dataset used,
Orthologous protein pairs
Orthologous protein pairs between C. elegans and D. melanogaster were retrieved from pre-computed homology clusters from InParanoid (version 7) [35]. Additionally, an evolutionary distance filter was applied (Tree-Puzzle [36] distance ≤ 4) to avoid inclusion of non-homologs. In total, 3,736 protein pairs were included. Orthologous protein pairs between Saccharomyces cerevisiae and five other fungal species (Candida albicans, Candida glabrata, Debaryomyces hansenii, Kluyveromyces lactis and
Acknowledgements
This work was supported by grants from the Swedish Research Council (VR-NT 2009-5072, VR-M 2010-3555), SSF, the Foundation for Strategic Research, Science for Life Laboratory. The EU 6th Framework Program is gratefully acknowledged for support to the GeneFun project, contract no: LSHG-CT-2004-503567 and the 7th framework through the EDICT project, contract no: FP7-HEALTH-F4-2007-201924. Funding for SL was provided by BILS, Bioinformatics Infrastructure for Life Science.
References (43)
- et al. Domain rearrangements in protein evolution. J. Mol. Biol. (2005)
- et al. Nebulin: a study of protein repeat evolution. J. Mol. Biol. (2010)
- et al. Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. J. Mol. Biol. (2005)
- et al. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol. (1999)
- et al. Function and structure of inherently disordered proteins. Curr. Opin. Struct. Biol. (2008)
- et al. Loopy proteins appear conserved in evolution. J. Mol. Biol. (2002)
- et al. Protein disorder–a breakthrough invention of evolution? Curr. Opin. Struct. Biol. (2011)
- Protein-length distributions for the three domains of life. Trends Genet. (2000)
- et al. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J. Mol. Biol. (2005)
- Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (1975)
- Structural diversity of domain superfamilies in the CATH database. J. Mol. Biol.
- Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol.
- Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol.
- Quality assessment of protein model-structures using evolutionary conservation. Bioinformatics
- Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol.
- The conversion of 3′ UTRs into coding regions. Mol. Biol. Evol.
- Expansion of protein domain repeats. PLoS Comp. Biol.
- Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome. Comp. Funct. Genomics
- Systematic analysis of short internal indels and their impact on protein folding. BMC Struct. Biol.
- Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins: Struct. Funct. Bioinform.
- Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins
☆ This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
1 Contributed equally.