Abstract
Periodic elements play important roles in genomic structure and function, yet some complex periodic elements in genomes are difficult to detect with conventional methods such as digital signal processing and statistical analysis. We propose a periodic power spectrum (PPS) method for analyzing periodicities in DNA sequences. The PPS method uses the periodic nucleotide distributions of a DNA sequence and directly calculates power spectra at specific periodicities. The magnitude of a PPS reflects the strength of the signal at periodic positions. Compared with the Fourier transform, the PPS method avoids spectral leakage and reduces the high background noise seen in the Fourier power spectrum. Thus, the PPS method can effectively capture hidden periodicities in DNA sequences. With a sliding window approach, the PPS method can also precisely locate periodic regions in DNA sequences. We apply the PPS method to detect hidden periodicities in different genome elements, including exons, microsatellite DNA sequences, and whole genomes. The results show that the PPS method can minimize the impact of spectral leakage and thus capture true hidden periodicities in genomes. In addition, performance tests indicate that the PPS method is more effective and efficient than the fast Fourier transform. The computational complexity of the PPS algorithm is \(\mathrm{O}(N)\). The PPS method may therefore have a broad range of applications in genomic analysis. The MATLAB programs implementing the PPS method are available from MATLAB Central (http://www.mathworks.com/matlabcentral/fileexchange/55298).
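The abstract describes the PPS only in words, so the following Python sketch is one plausible reading of it and not the authors' MATLAB implementation. It assumes that the "periodic nucleotide distribution" for a period p is the count of each nucleotide at every position class i mod p, and that the spectrum at p is the squared magnitude of the first Fourier coefficient of that length-p count vector, summed over the four nucleotides. The function names (`periodic_distribution`, `periodic_power`, `sliding_periodic_power`) and the `window` and `step` parameters are hypothetical.

```python
import cmath

NUCLEOTIDES = "ACGT"


def periodic_distribution(seq, period):
    """Count each nucleotide in every position class k = i mod period.

    Assumption: this is what the abstract calls the periodic nucleotide
    distribution. Symbols other than A, C, G, T are ignored.
    """
    counts = {nt: [0] * period for nt in NUCLEOTIDES}
    for i, nt in enumerate(seq.upper()):
        if nt in counts:
            counts[nt][i % period] += 1
    return counts


def periodic_power(seq, period):
    """Assumed spectrum value at one periodicity.

    For each nucleotide, take the first Fourier coefficient of its
    length-`period` count vector, then sum the squared magnitudes.
    The counting pass is linear in the sequence length.
    """
    counts = periodic_distribution(seq, period)
    w = cmath.exp(-2j * cmath.pi / period)  # primitive root of unity
    total = 0.0
    for nt in NUCLEOTIDES:
        coeff = sum(c * w ** k for k, c in enumerate(counts[nt]))
        total += abs(coeff) ** 2
    return total


def sliding_periodic_power(seq, period, window, step=1):
    """Evaluate periodic_power on successive windows of the sequence.

    Returns a list of (window_start, power) pairs; windows that would
    run past the end of the sequence are skipped.
    """
    return [
        (start, periodic_power(seq[start:start + window], period))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Under this sketch's definition, a perfect repeat such as "ATG" repeated 20 times gives a value of 1200 at period 3 and a value of essentially zero at period 4, because the counts are then spread evenly over the four position classes. Scanning a range of periods and windows with `sliding_periodic_power` is a simple way to see where a periodic signal is concentrated, though the normalization used in the paper may differ.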
Acknowledgments
We are grateful to Professor Donald Morrison for helpful discussions and to the anonymous reviewers for their valuable comments. We thank Emily Yin of Fremd High School for proofreading the paper.
About this article
Cite this article
Yin, C., Wang, J. Periodic power spectrum with applications in detection of latent periodicities in DNA sequences. J. Math. Biol. 73, 1053–1079 (2016). https://doi.org/10.1007/s00285-016-0982-8
DOI: https://doi.org/10.1007/s00285-016-0982-8