Abstract
Space complexity is a central concern in DNA sequence alignment. Memory saving under a pushdown automaton (PDA) can help reduce the space occupied in computer memory. In the proposed process, an anchor seed (AS) is selected from the given data set of nucleotide base pairs for local sequence alignment. A quick splitting technique separates the AS from the full set of DNA genome segments. The selected AS is placed in the PDA’s input unit, and the whole DNA genome segments are placed on the PDA’s stack. The AS from the input unit is then matched against the DNA genome segments on the stack. Matches, mismatches and indels of nucleotides are popped from the stack under the control unit of the PDA. Each POP operation frees the memory cell occupied by the corresponding nucleotide base pair.
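The stack-based matching described above can be illustrated with a minimal Python sketch. It is not the authors' MSuPDA implementation. The function name `pda_style_match`, the use of `-` as a gap (indel) symbol, and the position-by-position comparison are all assumptions made for illustration. The sketch only shows the idea that each compared nucleotide is popped, so the stack shrinks as matching proceeds.

```python
def pda_style_match(anchor_seed, segment):
    """Compare an anchor seed (input unit) with a genome segment (stack).

    Hypothetical illustration: every comparison pops one symbol, so the
    stack shrinks while the seed is being read.
    """
    # Push the segment so that its first base ends up on top of the stack.
    stack = list(reversed(segment))
    counts = {"match": 0, "mismatch": 0, "indel": 0}

    for symbol in anchor_seed:
        if not stack:
            break  # segment exhausted before the seed was fully read
        top = stack.pop()  # POP removes the symbol from the stack
        if symbol == "-" or top == "-":
            counts["indel"] += 1  # assumed gap symbol on either side
        elif symbol == top:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1

    # Second value: number of symbols still held on the stack.
    return counts, len(stack)


if __name__ == "__main__":
    print(pda_style_match("ACGT", "ACCT-G"))
```

In this sketch the memory held by the stack decreases by one cell per comparison, which is the property the abstract attributes to the POP operation. A real aligner would also need a scoring scheme and a rule for choosing where the seed is anchored, neither of which is specified here.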
Cite this article
Khan, M.I., Kamal, M.S. & Chowdhury, L. MSuPDA: A Memory Efficient Algorithm for Sequence Alignment. Interdiscip Sci Comput Life Sci 8, 84–94 (2016). https://doi.org/10.1007/s12539-015-0275-8
DOI: https://doi.org/10.1007/s12539-015-0275-8