On the analysis of large-scale genomic structures

  • Original Article
Cell Biochemistry and Biophysics

Abstract

We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize the large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extent of noncoding segments (“junk DNA”) in genomic organization, and the connection between the distribution of coding segments and chromatin condensation in higher eukaryotes. The following sequences, taken from GenBank, were analyzed: the complete genome of Xanthomonas campestris, the complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome 17 around the BRCA1 gene. The results are compared with random and periodic sequences and with sequences generated by simple and generalized fractal Cantor sets.
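
As a rough illustration of one quantity named in the abstract, the sketch below estimates a box-counting fractal dimension for points generated by the simple middle-third Cantor set. It is not the authors' algorithm: the function names `cantor_positions` and `box_counting_dimension`, the integer grid, and the choice of box sizes are assumptions made for this example only. In a genomic setting the positions could, hypothetically, be the coordinates of coding segments along a sequence.

```python
import math


def cantor_positions(level):
    """Sites kept by the middle-third Cantor construction on a grid of 3**level sites."""
    positions = [0]
    for _ in range(level):
        # Each kept site splits into three; keep only the left and right thirds.
        positions = [3 * p + d for p in positions for d in (0, 2)]
    return positions


def box_counting_dimension(positions, length, base=3):
    """Least-squares slope of log N(box) against log(length / box).

    N(box) is the number of boxes of the given size holding at least one
    position. Box sizes are powers of `base` below `length` (an
    illustrative choice, not taken from the article).
    """
    xs, ys = [], []
    box = 1
    while box < length:
        occupied = {p // box for p in positions}
        xs.append(math.log(length / box))
        ys.append(math.log(len(occupied)))
        box *= base
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


if __name__ == "__main__":
    level = 8
    cantor_dim = box_counting_dimension(cantor_positions(level), 3 ** level)
    filled_dim = box_counting_dimension(list(range(3 ** level)), 3 ** level)
    print(f"Cantor set: {cantor_dim:.4f}  (log 2 / log 3 = {math.log(2) / math.log(3):.4f})")
    print(f"Fully occupied sequence: {filled_dim:.4f}")
```

For the Cantor points the fitted slope should match log 2 / log 3, while a fully occupied sequence gives a dimension of 1. This is the kind of contrast the abstract draws between Cantor-set sequences and other reference sequences.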



Author information

Corresponding author

Correspondence to Nestor Norio Oiwa.

About this article

Cite this article

Oiwa, N.N., Goldman, C. On the analysis of large-scale genomic structures. Cell Biochem Biophys 42, 145–165 (2005). https://doi.org/10.1385/CBB:42:2:145

  • DOI: https://doi.org/10.1385/CBB:42:2:145
