Abstract
The advent of automated DNA sequencing techniques has led to an explosive growth in the number and length of DNA sequences obtained from different organisms. While this has resulted in a large accumulation of data in the DNA databases, it has also called for the development of suitable techniques for rapid viewing and analysis of the data. Over the last few years several methods have been proposed that address these issues and represent a DNA sequence in a compact graphical form in one, two or three dimensions that can be expanded as necessary to help visualize the patterns in gene sequences and aid in-depth analysis. Graphical techniques have been found useful for highlighting local and global base dominances, identifying regions of extensive repetitive sequences, differentiating between coding and non-coding regions, and indicating evolutionary divergences. Analysis with graphical methods has also provided insights into new structures in DNA sequences, such as fractals and long-range correlations, and some measures have been developed that help quantify the visual patterns.
This review presents a comprehensive study of the graphical representation methods and their applications in viewing and analysing long DNA sequences. It evaluates the merits of each method from a practical viewpoint and indicates the domains in which each is applicable. A discussion of the comparative merits and demerits of the various methods and of possible future developments is also included.
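As a rough illustration of what a two-dimensional graphical representation of the kind mentioned above can look like, the sketch below maps a DNA sequence to a walk on a plane, moving one unit step per base. The base-to-direction assignment in `STEPS`, the function name `dna_walk_2d` and the choice to skip ambiguous symbols are all assumptions made for this example; they are not taken from the review, and the individual methods it surveys may use different conventions.

```python
from typing import Dict, List, Tuple

# Hypothetical base-to-direction assignment, chosen only for illustration.
STEPS: Dict[str, Tuple[int, int]] = {
    "A": (-1, 0),  # one step in the negative x direction
    "G": (1, 0),   # one step in the positive x direction
    "C": (0, 1),   # one step in the positive y direction
    "T": (0, -1),  # one step in the negative y direction
}


def dna_walk_2d(sequence: str) -> List[Tuple[int, int]]:
    """Return the list of (x, y) points visited by a unit-step walk.

    The walk starts at the origin. Symbols other than A, C, G, T
    (for example N) are skipped, which is an assumption of this sketch.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        if base not in STEPS:
            continue  # ignore ambiguous or unknown symbols
        dx, dy = STEPS[base]
        x += dx
        y += dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    # A G-rich stretch drifts along +x; the end point summarises
    # the net excess of G over A and of C over T.
    for point in dna_walk_2d("GGCA"):
        print(point)
```

Plotting the returned points in order gives a compact curve whose overall drift reflects the base dominance of the sequence under this particular assignment; a different assignment would produce a different curve for the same sequence.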
Cite this article
Roy, A., Raychaudhury, C. & Nandy, A. Novel techniques of graphical representation and analysis of DNA sequences—A review. J. Biosci. 23, 55–71 (1998). https://doi.org/10.1007/BF02728525
DOI: https://doi.org/10.1007/BF02728525