Novel techniques of graphical representation and analysis of DNA sequences—A review

Journal of Biosciences

Abstract

The advent of automated DNA sequencing techniques has led to an explosive growth in the number and length of DNA sequences obtained from different organisms. While this has resulted in a large accumulation of data in the DNA databases, it has also called for the development of suitable techniques for rapid viewing and analysis of the data. Over the last few years several methods have been proposed that address these issues by representing a DNA sequence in a compact graphical form in one, two or three dimensions, which can be expanded as necessary to help visualize the patterns in gene sequences and aid in-depth analysis. Graphical techniques have been found useful for highlighting local and global base dominances, identifying regions of extensive repetitive sequences, differentiating between coding and non-coding regions, and indicating evolutionary divergences. Analyses with graphical methods have also provided insights into new structures in DNA sequences, such as fractals and long-range correlations, and some measures have been developed that help quantify the visual patterns.

This review presents a comprehensive study of the graphical representation methods and their applications in viewing and analysing long DNA sequences. It evaluates the merits of each method from a practical viewpoint, with prescriptions on the domain of applicability of each. A discussion of the comparative merits and demerits of the various methods and of possible future developments has also been included.
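As a rough illustration of what a compact two-dimensional representation can look like, the sketch below maps each base to a unit step on a plane and records the resulting path. The axis assignment in `STEPS` and the function name `dna_walk` are assumptions made for this example only; the abstract does not specify any particular mapping, and the methods covered by the review differ in their conventions.

```python
from typing import Dict, List, Tuple

# Hypothetical axis assignment, chosen only for illustration:
# purines (A, G) move along x, pyrimidines (C, T) move along y.
STEPS: Dict[str, Tuple[int, int]] = {
    "A": (-1, 0),
    "G": (1, 0),
    "C": (0, 1),
    "T": (0, -1),
}


def dna_walk(sequence: str) -> List[Tuple[int, int]]:
    """Return the list of plane coordinates visited by the sequence."""
    x, y = 0, 0
    path = [(x, y)]  # the walk starts at the origin
    for base in sequence.upper():
        if base not in STEPS:
            continue  # skip ambiguous symbols such as N
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = dna_walk("GGCATTGC")
    print(path[-1])  # end point of the walk
```

With this assignment the end point of the walk equals (count of G minus count of A, count of C minus count of T), so a drift of the curve in one direction reflects a dominance of the corresponding base over that stretch of the sequence.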



Author information

Corresponding author

Correspondence to A. Nandy.

About this article

Cite this article

Roy, A., Raychaudhury, C. & Nandy, A. Novel techniques of graphical representation and analysis of DNA sequences—A review. J. Biosci. 23, 55–71 (1998). https://doi.org/10.1007/BF02728525

  • DOI: https://doi.org/10.1007/BF02728525
