A Sequence Property Approach to Searching Protein Databases

https://doi.org/10.1006/jmbi.1995.0442

Abstract

Currently available sequence alignment programs are generally not capable of detecting functional and structural homologs in the twilight zone of sequence similarity, i.e. when sequence identity falls below about 25%. Here we attempt to detect such weak similarities with an approach based on a notion of protein sequence similarity radically different from that used in sequence alignment. The approach defines protein sequence dissimilarity (or distance) as a weighted sum of differences in compositional properties such as singlet and doublet amino acid composition, molecular weight, and isoelectric point (protein property search, or PropSearch).
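The distance described above can be illustrated with a short sketch. It computes singlet composition (20 fractions), doublet composition (400 dipeptide fractions) and molecular weight for a sequence, then takes a weighted sum of absolute differences. Several parts are assumptions, not the published method: the use of absolute differences, the hypothetical names `properties`, `WEIGHTS` and `propsearch_distance`, and the placeholder weights (uniform, with mass scaled to kDa). Isoelectric point is left out because computing it needs a pKa table and a root search.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Average residue masses in daltons; one water is added per chain.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def properties(seq):
    """Compositional properties of one sequence (assumes only the 20 standard letters)."""
    seq = seq.upper()
    n = len(seq)
    singles = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    n_pairs = max(n - 1, 1)  # avoid division by zero for length-1 sequences
    props = {"comp_" + aa: singles[aa] / n for aa in AMINO_ACIDS}
    props.update({"di_" + dp: pairs[dp] / n_pairs for dp in DIPEPTIDES})
    props["mol_weight"] = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return props


# Hypothetical weights, one per property. Mass is scaled to kDa so it does not
# swamp the fractional composition terms. These are NOT PropSearch's fitted weights.
WEIGHTS = {"comp_" + aa: 1.0 for aa in AMINO_ACIDS}
WEIGHTS.update({"di_" + dp: 1.0 for dp in DIPEPTIDES})
WEIGHTS["mol_weight"] = 1.0 / 1000.0


def propsearch_distance(p, q, weights):
    """Weighted sum of absolute property differences (smaller means more similar)."""
    return sum(w * abs(p[k] - q[k]) for k, w in weights.items())


d = propsearch_distance(properties("MKTAYIAK"), properties("MKSAYLAK"), WEIGHTS)
```

Because every feature is a fraction or a global quantity, two sequences of different lengths can be compared directly, and no residue-to-residue correspondence (alignment) is ever computed.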

With PropSearch, a database can be queried either with a single sequence or with several sequences merged into an “average” sequence that reflects the average composition of a protein family.
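Merging a family into an “average” sequence can be sketched by averaging the members' composition vectors. This is a minimal sketch under stated assumptions: only singlet composition is used, every member counts equally regardless of length, and `composition` and `family_average` are hypothetical names. The paper's merging procedure may combine members differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid in one sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def family_average(seqs):
    """Mean composition over family members, each member weighted equally."""
    vectors = [composition(s) for s in seqs]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


family = ["MKVLAAGIC", "MKVLSAGVC", "MRILAAGLC"]
profile = family_average(family)  # 20 averaged fractions, usable as a query
```

Averaging per-member vectors keeps one long member from dominating the profile; pooling all residues of the family into one count would instead weight members by length.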

First, we show that members of structural protein families have a low mutual PropSearch distance when the weights are optimized to discriminate maximally between structural families. Second, we demonstrate the results of database searches using the PropSearch method. Such searches are very rapid when scanning a preprocessed database and do not require alignments.
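The speed of such searches follows from the fact that the expensive step, computing property vectors, happens once per database entry; a query then costs one vector comparison per entry and no alignment. The sketch below assumes singlet composition only, a weighted sum of absolute differences, and uniform placeholder weights (PropSearch's weights were optimized to separate structural families; these are not). `preprocess`, `weighted_distance` and `search` are hypothetical names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid in one sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def preprocess(database):
    """Compute the property vector once for every (identifier, sequence) entry."""
    return [(ident, composition(seq)) for ident, seq in database]


def weighted_distance(u, v, weights):
    """Weighted sum of absolute differences between two property vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def search(query, index, weights, top_k=5):
    """Rank preprocessed entries by distance to the query; no alignment involved."""
    q = composition(query)
    scored = [(weighted_distance(q, vec, weights), ident) for ident, vec in index]
    scored.sort()
    return scored[:top_k]


index = preprocess([("a", "AAAA"), ("b", "ACDE"), ("c", "WWWW")])
hits = search("AAAA", index, [1.0] * len(AMINO_ACIDS), top_k=2)
```

Here `hits` lists the two nearest entries as `(distance, identifier)` pairs, nearest first. In a real deployment the index would be built once and stored, so each new query only pays for the linear scan.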

In cases where conventional alignment tools fail to detect similarities, PropSearch can be used to generate hypotheses about possible structural or functional relationships between a new sequence and sequences in the database.

Cited by (160)

  • δ-Carbonic Anhydrases: Structure, Distribution, and Potential Roles

    2015, Carbonic Anhydrases as Biocatalysts: From Theory to Medical and Industrial Applications
  • Distant plant homologues: Don't throw out the baby

    2012, Trends in Plant Science
    Citation excerpt:

    Thus, protein structure modelling here has led to a major breakthrough that was not possible using sequence similarity detection programs. PROPSEARCH (http://abcis.cbs.cnrs.fr/propsearch/ [23]) is a slightly older homology program that is still very useful in detecting remote protein homologies. It neglects the order of amino acids and uses amino acid composition instead.