Periodic power spectrum with applications in detection of latent periodicities in DNA sequences

Journal of Mathematical Biology

Abstract

Periodic elements play important roles in genomic structure and function, yet some complex periodic elements in genomes are difficult to detect with conventional methods such as digital signal processing and statistical analysis. We propose a periodic power spectrum (PPS) method for analyzing periodicities in DNA sequences. The PPS method uses the periodic nucleotide distributions of a DNA sequence and directly calculates the power spectrum at specific periodicities. The magnitude of the PPS reflects the strength of the signal at periodic positions. Compared with the Fourier transform, the PPS method avoids spectral leakage and reduces the high background noise seen in the Fourier power spectrum. Thus, the PPS method can effectively capture hidden periodicities in DNA sequences. Combined with a sliding window, the PPS method can precisely locate periodic regions in DNA sequences. We apply the PPS method to detect hidden periodicities in different genome elements, including exons, microsatellite DNA sequences, and whole genomes. The results show that the PPS method can minimize the impact of spectral leakage and thus capture true hidden periodicities in genomes. In addition, performance tests indicate that the PPS method is more effective and efficient than the fast Fourier transform. The computational complexity of the PPS algorithm is \(\mathrm{O}(N)\). Therefore, the PPS method may have a broad range of applications in genomic analysis. MATLAB programs implementing the PPS method are available from MATLAB Central (http://www.mathworks.com/matlabcentral/fileexchange/55298).
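The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. It assumes that the "periodic nucleotide distribution" at period p is the count of each nucleotide at each phase (position mod p), and that the power at period p is the squared magnitude of the first Fourier coefficient of that count vector, summed over the four nucleotides. The function names `periodic_counts`, `pps_sketch`, and `sliding_pps` are hypothetical, and the paper's exact definition and normalization may differ.

```python
import cmath


def periodic_counts(seq, p):
    """Count each nucleotide at each phase (position mod p). One O(N) pass."""
    counts = {nt: [0] * p for nt in "ACGT"}
    for i, nt in enumerate(seq.upper()):
        if nt in counts:  # skip ambiguous symbols such as N
            counts[nt][i % p] += 1
    return counts


def pps_sketch(seq, p):
    """Illustrative power at period p (assumed form, not the paper's formula).

    For each nucleotide, take the length-p phase-count vector f and compute
    |sum_j f[j] * exp(-2*pi*i*j/p)|^2, then sum over A, C, G, T.
    """
    roots = [cmath.exp(-2j * cmath.pi * j / p) for j in range(p)]
    total = 0.0
    for f in periodic_counts(seq, p).values():
        coeff = sum(c * w for c, w in zip(f, roots))
        total += abs(coeff) ** 2
    return total


def sliding_pps(seq, p, window, step):
    """Evaluate the sketch in sliding windows; returns (start, power) pairs."""
    return [
        (start, pps_sketch(seq[start:start + window], p))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    repeat = "ACG" * 20
    for period in (2, 3, 4, 5):
        print(period, round(pps_sketch(repeat, period), 6))
```

Under these assumptions, a perfect ACG repeat gives a large value at period 3 and a value near zero at period 4, because the phase counts are uniform at period 4. Counting takes one pass over the sequence, which is consistent with the linear cost stated in the abstract for a single period.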

Acknowledgments

We are grateful to Professor Donald Morrison for helpful discussions, and to the anonymous reviewers for their valuable comments. We thank Emily Yin from Fremd High School for proofreading the paper.

Author information

Corresponding author

Correspondence to Changchuan Yin.

About this article

Cite this article

Yin, C., Wang, J. Periodic power spectrum with applications in detection of latent periodicities in DNA sequences. J. Math. Biol. 73, 1053–1079 (2016). https://doi.org/10.1007/s00285-016-0982-8

  • DOI: https://doi.org/10.1007/s00285-016-0982-8
