A method for detecting distant evolutionary relationships between protein or nucleic acid sequences in the presence of deletions or insertions

Journal of Molecular Evolution

Summary

A method for detecting homology between two protein or nucleic acid sequences which require insertions or deletions for optimum alignment has been devised for use with a computer. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random), a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected since their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions.
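The procedure summarised above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's program: the identity scoring (1 for a match, 0 otherwise), the counting of each unpaired internal residue as one gap, the free overhangs at the sequence ends, and the names `gapped_scores` and `optimum_match_probability` are all introduced here for illustration. Percentage levels are expressed as fractions between 0 and 1.

```python
import random
from bisect import bisect_right


def gapped_scores(a, b, max_gaps):
    """Return [best score with at most k gaps for k = 0..max_gaps].

    Assumptions (not from the paper): identity scoring, one gap per
    unpaired internal residue, overhanging ends cost nothing.
    """
    n, m = len(a), len(b)
    prev = None  # dynamic-programming table for k - 1 gaps
    best = []
    for _ in range(max_gaps + 1):
        cur = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = cur[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else 0)
                if prev is not None:
                    # Spend one gap by skipping a residue in a or in b.
                    s = max(s, prev[i - 1][j], prev[i][j - 1])
                cur[i][j] = s
        # One sequence must be fully consumed; the other's tail is free.
        best.append(max(max(row[m] for row in cur), max(cur[n])))
        prev = cur
    return best


def optimum_match_probability(a, b, max_gaps=3, trials=200, seed=0):
    """Return (optimum level of the real pair, estimated chance probability)."""
    rng = random.Random(seed)
    real = gapped_scores(a, b, max_gaps)
    rand = []
    for _ in range(trials):
        ra, rb = list(a), list(b)
        rng.shuffle(ra)  # scrambling preserves composition
        rng.shuffle(rb)
        rand.append(gapped_scores(ra, rb, max_gaps))
    # Random score distribution for each number of gaps, sorted for bisect.
    by_gaps = [sorted(r[k] for r in rand) for k in range(max_gaps + 1)]

    def optimum_level(scores):
        # Fraction of random comparisons scoring strictly higher, per gap
        # count; the smallest such fraction marks the optimum alignment.
        return min((trials - bisect_right(col, s)) / trials
                   for col, s in zip(by_gaps, scores))

    real_level = optimum_level(real)
    below = sum(1 for r in rand if optimum_level(r) < real_level)
    return real_level, below / trials
```

A call such as `optimum_match_probability(seq_a, seq_b, max_gaps=3, trials=500)` returns the real pair's optimum level and the fraction of scrambled pairs whose own optimum level lies below it. Because identity scores are discrete, ties are common; a graded residue-similarity table, as the phrase "similarity of the aligned residues" suggests, would presumably give a finer distribution.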

Application of this ‘optimum match’ method of sequence comparison shows that the sensitivity of the ‘maximum match’ method of Needleman and Wunsch (1970) decreases quite dramatically with sequence comparisons which require only a few gaps for a reasonable alignment, or when sequences differ greatly in length. The ‘maximum match’ method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The ‘constrained match’ method of Sankoff and Cedergren (1973) is seen to be misleading since large increments in the alignment score from added gaps do not necessarily result in a high total alignment score required to demonstrate sequence homology.
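For contrast, a global alignment in the style of the 'maximum match' approach can be sketched as below. This is an illustrative simplification, not the exact formulation of Needleman and Wunsch (1970) or of Barker and Dayhoff (1972): the identity scoring, the fixed per-residue `gap_penalty`, and the function name `maximum_match` are assumptions made here. The sketch yields a single score with the number of gaps fixed implicitly by the penalty, whereas the 'optimum match' method assesses each number of gaps against its own random distribution.

```python
def maximum_match(a, b, gap_penalty=1.0):
    """Global alignment score: +1 per identity, -gap_penalty per unpaired
    residue, end gaps included (all scoring choices are assumptions)."""
    n, m = len(a), len(b)
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [-gap_penalty * j for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [-gap_penalty * i] + [0.0] * m
        for j in range(1, m + 1):
            match = 1.0 if a[i - 1] == b[j - 1] else 0.0
            cur[j] = max(prev[j - 1] + match,       # pair a[i-1] with b[j-1]
                         prev[j] - gap_penalty,     # a[i-1] left unpaired
                         cur[j - 1] - gap_penalty)  # b[j-1] left unpaired
        prev = cur
    return prev[m]
```

In this simplified scheme every residue of length difference between the two sequences incurs an unavoidable penalty, so the score of a global alignment depends on the relative lengths as well as on residue similarity.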

References

  • Ambler, R.P., Bartsch, R.G. (1975). Nature 253, 285–288

  • Barker, W.C., Dayhoff, M.O. (1972). In: Atlas of Protein Sequence and Structure (Dayhoff, M.O., ed.), vol. 5, pp. 101–110. National Biomedical Research Foundation, Washington, USA

  • Haën, C. de, Swanson, E., Teller, D.C. (1976). J. Mol. Biol. 106, 639–661

  • Dickerson, R.E. (1971). J. Mol. Biol. 57, 1–15

  • Dickerson, R.E. (1972). Scientific American, vol. 226, No. 4, pp. 58–72

  • Fitch, W.M. (1966). J. Mol. Biol. 16, 9–16

  • Fitch, W.M. (1970). J. Mol. Biol. 49, 1–14

  • Haber, J.E., Koshland, D.E. (1970). J. Mol. Biol. 50, 617–639

  • Johnson, N.L., Nixon, E., Amos, P.E. (1963). Biometrika 50, 459–498

  • Mathews, F.S., Levine, M., Argos, P. (1972). J. Mol. Biol. 64, 449–464

  • McLachlan, A.D. (1971). J. Mol. Biol. 61, 409–424

  • Needleman, S.B., Wunsch, C.D. (1970). J. Mol. Biol. 48, 443–453

  • Ozols, J., Strittmatter, P. (1967). Proc. Nat. Acad. Sci. Wash. 58, 264–267

  • Pettigrew, G.W. (1974). Biochem. J. 139, 449–459

  • Rossmann, M.G., Argos, P. (1975). J. Biol. Chem. 250, 7525–7532

  • Sankoff, D., Cedergren, R.J. (1973). J. Mol. Biol. 77, 159–164

Cite this article

Elleman, T.C. A method for detecting distant evolutionary relationships between protein or nucleic acid sequences in the presence of deletions or insertions. J Mol Evol 11, 143–161 (1978). https://doi.org/10.1007/BF01733890

  • DOI: https://doi.org/10.1007/BF01733890
