The Relation between Indel Length and Functional Divergence: A Formal Study

  • Conference paper
Algorithms in Bioinformatics (WABI 2008)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Abstract

Although insertions and deletions (indels) are a common type of evolutionary sequence variation, their origins and functional consequences are not yet comprehensively understood. There is evidence, on the one hand, that classical alignment procedures only roughly reflect the evolutionary processes and, on the other hand, that indels cause structural changes in the proteins’ surfaces.

We first demonstrate how to identify alignment gaps that have been introduced by evolution to a statistically significant degree, by means of a novel, sound statistical framework based on pair hidden Markov models (HMMs). Second, we examine paralogous protein pairs in E. coli, obtained by computing classical global alignments. Distinguishing between indel and non-indel pairs according to our novel statistics revealed that, despite having the same sequence identity, indel pairs are significantly less functionally similar than non-indel pairs, as measured by recently suggested GO-based functional distances. This suggests that indels cause more severe functional changes than other types of sequence variation and that indel statistics should additionally be taken into account when assessing functional similarity between paralogous protein pairs.
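The pair-HMM framework itself is not described on this page, so the following is only a rough illustration of what it can mean for a gap to be statistically significant. It assumes a much simpler null model than the paper's: every alignment column is independently a gap column with probability `p_gap`, and a gap of length `k` in an alignment of `n` columns is judged by how likely a run of at least `k` consecutive gap columns would be by chance. The function name and parameters are hypothetical and are not taken from the paper.

```python
def prob_gap_run_at_least(n: int, k: int, p_gap: float) -> float:
    """P(at least one run of >= k consecutive gap columns among n columns).

    Assumption (not the paper's model): columns are i.i.d., each being a
    gap column with probability p_gap.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    # state[j] = probability that no run of length k has occurred so far
    # and the current trailing run of gap columns has length exactly j.
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability that a run of length k has already occurred
    for _ in range(n):
        new_state = [0.0] * k
        # A non-gap column resets the trailing run to length 0.
        new_state[0] = sum(state) * (1.0 - p_gap)
        # A gap column extends the trailing run by one.
        for j in range(k - 1):
            new_state[j + 1] = state[j] * p_gap
        # Extending a run of length k - 1 completes a run of length k.
        hit += state[k - 1] * p_gap
        state = new_state
    return hit


if __name__ == "__main__":
    # Chance of seeing a gap of length >= 5 in 300 columns under the toy null.
    print(prob_gap_run_at_least(n=300, k=5, p_gap=0.05))
```

Under this toy null, a small returned probability would mark the observed gap as unlikely to be an alignment artifact. The choice of `p_gap` and the independence assumption are the weak points of the sketch, which is presumably why a model-based framework such as a pair HMM is preferable in practice.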

Editor information

Keith A. Crandall Jens Lagergren

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Salari, R., Schönhuth, A., Hormozdiari, F., Cherkasov, A., Sahinalp, S.C. (2008). The Relation between Indel Length and Functional Divergence: A Formal Study. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_28

  • DOI: https://doi.org/10.1007/978-3-540-87361-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87360-0

  • Online ISBN: 978-3-540-87361-7

  • eBook Packages: Computer Science (R0)
