Genomics

Volume 46, Issue 1, 15 November 1997, Pages 37-45

Regular Article
A Tool for Analyzing and Annotating Genomic Sequences

https://doi.org/10.1006/geno.1997.4984

Abstract

We describe a tool for analyzing and annotating large genomic sequences containing introns. The analysis and annotation tool (AAT) includes two sets of programs, one for comparing the query sequence with a protein database and the other for comparing the query with a cDNA database. Each set contains a fast database search program and a rigorous alignment program. The database search program quickly identifies regions of the query sequence that are similar to a database sequence. The alignment program then constructs an optimal alignment of each such region with the database sequence and reports the coordinates of exons in the query sequence. Pairwise alignments of the query sequence with protein and cDNA database sequences are combined into multiple sequence alignments, which provide a view of all protein and cDNA sequences matching a query region. On a data set of 570 DNA sequences, AAT identified 94% of coding nucleotides correctly and 74% of exons exactly. Results of analyzing a human BAC sequence with AAT are also presented. AAT reduces the labor-intensive work of locating the exons of the query sequence and improves the process of defining intron–exon boundaries by using the wealth of available protein and cDNA data.
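The two accuracy figures quoted above (coding nucleotides identified correctly, and exons identified exactly) can be illustrated with a small sketch. The abstract does not give the precise metric definitions, so the code below assumes one common convention: nucleotide-level accuracy is the fraction of annotated coding positions covered by a predicted exon, and exon-level accuracy is the fraction of annotated exons whose start and end are both reproduced. The function names and the example coordinates are hypothetical and are not taken from AAT.

```python
from typing import List, Set, Tuple

# An exon as 1-based inclusive (start, end) coordinates on the query sequence.
# This coordinate convention is an assumption made for the illustration.
Exon = Tuple[int, int]


def covered_positions(exons: List[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by the given exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_sensitivity(predicted: List[Exon], annotated: List[Exon]) -> float:
    """Fraction of annotated coding positions that fall inside a predicted exon."""
    annotated_positions = covered_positions(annotated)
    if not annotated_positions:
        return 0.0
    shared = annotated_positions & covered_positions(predicted)
    return len(shared) / len(annotated_positions)


def exact_exon_fraction(predicted: List[Exon], annotated: List[Exon]) -> float:
    """Fraction of annotated exons whose two boundaries are both predicted exactly."""
    annotated_set = set(annotated)
    if not annotated_set:
        return 0.0
    return len(annotated_set & set(predicted)) / len(annotated_set)


# Hypothetical example: the second predicted exon starts 10 bases too late.
annotated = [(101, 200), (301, 400), (501, 560)]
predicted = [(101, 200), (311, 400), (501, 560)]

nt_score = nucleotide_sensitivity(predicted, annotated)  # 250 of 260 positions
exon_score = exact_exon_fraction(predicted, annotated)   # 2 of 3 exons
print(f"coding nucleotides recovered: {nt_score:.1%}")
print(f"exons matched exactly: {exon_score:.1%}")
```

In this toy case a single misplaced boundary costs little at the nucleotide level but removes a whole exon from the exact-match count, which is one reason the exon-level figure is typically the lower of the two.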

Cited by (177)

  • Assembly of a hybrid mangrove, Bruguiera hainesii, and its two ancestral contributors, Bruguiera cylindrica and Bruguiera gymnorhiza

    2022, Genomics
    Citation Excerpt :

    For each species, the RNA-seq data was mapped to the genome assembly using GMAP [18] and a transcriptome assembly was performed using PASA v2.4.1 [17]. Protein sequences were obtained from public databases for Oryza sativa, Mimulus guttatus, Sesamum indicum, Populus trichocarpa and Eucalyptus grandis and aligned to the genomes using AAT [19]. The ab initio prediction program Augustus v3.3.3 [20] was trained using the PASA2 alignment assembly [17].

  • Phylogenomics, divergence time estimation and trait evolution provide a new look into the Gracilariales (Rhodophyta)

    2021, Molecular Phylogenetics and Evolution
    Citation Excerpt :

    Genome assemblies were aligned to the protein database using Diamond blastx in the --more-sensitive mode. Splice-aware alignments were made using the AAT pipeline r03052011 (Huang et al., 1997) for contig/protein pairs identified by Diamond v0.8.17 (Buchfink et al., 2015). Ab initio gene predictions were made with Augustus v3.2.2 (Stanke et al., 2004), using the built-in training for Galdieria Merola.

J. Setlow, Ed.

1 To whom correspondence should be addressed at Department of Computer Science, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931. Telephone: (906) 487-2123. Fax: (906) 487-2283. E-mail: [email protected].