Panning for Genes—A Visual Strategy for Identifying Novel Gene Orthologs and Paralogs

Jacques D. Retief¹, Kevin R. Lynch², and William R. Pearson²,³

¹Information Technology and Communications–Academic Computing Health Sciences, and ²Department of Biochemistry, University of Virginia, Charlottesville, Virginia 22908 USA

Abstract

We have developed a rapid visual method for identifying novel members of gene families. Starting with an evolutionary tree, 20–50 protein query sequences for a gene family are selected from different branches of the tree. These query sequences are used to search the GenBank and expressed sequence tag (EST) DNA databases and their nightly updates using the tfastx3 or tfasty3 programs. The results of all 20–50 searches are collated and re-sorted to highlight EST or genomic sequences that share significant similarity with the query sequences. The statistical significance of each DNA/protein alignment is plotted, highlighting the portion of the query sequence that is present in the database sequence and the percent identity in the aligned region. The collated results for database sequences are linked using the WWW to the underlying scores and alignments; these links can also be used to perform additional searches to characterize the novel sequence further. With traditional “deep” scoring matrices (BLOSUM50) one can search for previously unrecognized families of large protein superfamilies. Alternatively, by using query sequences and EST libraries from the same species (e.g., human or mouse) together with “shallow” scoring matrices and filters that remove high-identity sequences, one can highlight new paralogs of previously described subfamilies. Using query sequences from the glutathione transferase superfamily, we identified two novel mammalian glutathione transferase families that were recognized previously only in plants. Using query sequences from known mammalian glutathione transferase subfamilies, we identified new candidate paralogs from the mouse class-mu, class-pi, and class-theta families.
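
The collation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Hit` record, the `collate_hits` function, and the default thresholds are hypothetical names and values chosen for the example, and the hits are assumed to have already been parsed from the search output. Hits from all query searches are grouped by database sequence, database sequences that match any query at or above an identity ceiling are optionally dropped (one possible reading of the "filters that remove high-identity sequences"), and the remaining sequences are sorted by their most significant match.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """One query/database alignment (hypothetical record layout)."""
    query: str        # protein query sequence name
    subject: str      # EST or genomic database sequence name
    evalue: float     # statistical significance of the alignment
    identity: float   # percent identity in the aligned region
    q_start: int      # first aligned residue of the query
    q_end: int        # last aligned residue of the query


def collate_hits(
    hits: Iterable[Hit],
    max_evalue: float = 1e-3,              # assumed significance cutoff
    max_identity: Optional[float] = None,  # assumed identity ceiling, in percent
) -> List[Tuple[str, Hit, List[Hit]]]:
    """Group significant hits by database sequence and sort by best E-value.

    Returns (subject, best_hit, all_hits_sorted_by_evalue) tuples.
    """
    by_subject = defaultdict(list)
    for hit in hits:
        if hit.evalue <= max_evalue:
            by_subject[hit.subject].append(hit)

    rows = []
    for subject, subject_hits in by_subject.items():
        # In "paralog" mode, skip database sequences that are nearly
        # identical to some query: they are probably already-known genes.
        if max_identity is not None and any(
            h.identity >= max_identity for h in subject_hits
        ):
            continue
        ordered = sorted(subject_hits, key=lambda h: h.evalue)
        rows.append((subject, ordered[0], ordered))

    # Most significant database sequences first.
    rows.sort(key=lambda row: row[1].evalue)
    return rows
```

Each returned row keeps the query coordinates and percent identity of every hit, which is the information needed to draw the kind of per-sequence plot described in the abstract.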

Footnotes

  • 3 Corresponding author.

  • E-MAIL wrp{at}virginia.edu; FAX (804) 924-5069.

    • Received December 7, 1998.
    • Accepted February 9, 1999.