Abstract
DNA tandem repeats (TRs), and in particular variable number of tandem repeat (VNTR) loci, can have functional effects on gene regulation and disease mechanisms and are useful in forensic studies. The need to quickly analyze high volumes of sequencing data for TRs and VNTRs has motivated the search for a more efficient sequence alignment algorithm for tandem repeats. Alignment of a pattern to a sequence that may contain zero or more tandem copies of the pattern can be accomplished using wraparound dynamic programming (WDP). This paper presents the use of Single Instruction, Multiple Data (SIMD) computer instructions, together with a parallel scan, to accelerate WDP, extending earlier SIMD algorithms for global alignment. The SIMD data types and intrinsics store data in 128-bit computer words partitioned into sixteen 1-byte blocks; operations are performed on the bytes separately and simultaneously. We allow either single values for match and mismatch or a substitution scoring scheme that assigns a potentially different substitution weight to every pair of alphabet characters. For indels, we allow either a simple linear gap penalty or an affine gap penalty. Benchmarking demonstrated that SIMD tandem alignment runs over 3 times faster than standard wraparound dynamic programming.
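As a rough illustration of the scalar baseline that the abstract calls standard wraparound dynamic programming, the following is a minimal Python sketch. It is not the authors' SIMD algorithm and not their code. It assumes single match and mismatch values, a linear gap penalty, a free choice of starting position within the pattern, and the usual two-pass treatment of each row. The function name `wdp_score` and the default scores are hypothetical, chosen only for this example.

```python
def wdp_score(sequence, pattern, match=2, mismatch=-1, gap=-2):
    """Best score for aligning all of `sequence` to tandem copies of `pattern`.

    Illustrative sketch only: linear gap penalty, single match/mismatch
    values, and the alignment may begin and end at any pattern position.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if gap >= 0:
        raise ValueError("gap penalty must be negative for two passes to suffice")

    m = len(pattern)
    # Row 0: nothing of the sequence consumed yet; starting anywhere is free.
    prev = [0] * m

    for ch in sequence:
        cur = [0] * m

        # Pass 1: ordinary left-to-right fill. The diagonal move into
        # column 0 wraps around to the last column of the previous row
        # (prev[-1] is prev[m - 1] in Python).
        for j in range(m):
            sub = match if ch == pattern[j] else mismatch
            best = max(prev[j - 1] + sub,   # align ch with pattern[j]
                       prev[j] + gap)       # ch aligned with a gap
            if j > 0:
                best = max(best, cur[j - 1] + gap)  # skip pattern[j]
            cur[j] = best

        # Pass 2: let a gap run wrap from the last column back to column 0,
        # then propagate any improvement left to right once more.
        cur[0] = max(cur[0], cur[m - 1] + gap)
        for j in range(1, m):
            cur[j] = max(cur[j], cur[j - 1] + gap)

        prev = cur

    # The alignment may end at any position within the pattern.
    return max(prev)
```

With the default scores, `wdp_score("ACGACGACG", "ACG")` returns 18 (three exact copies, nine matches), and `wdp_score("ACGCGACG", "ACG")` returns 14 (eight matches and one skipped pattern character, where the skip crosses the wraparound boundary). Because the gap penalty is negative, a gap run that travels around the pattern more than once can never improve a score, which is why the second pass is enough to resolve the cyclic dependency within a row.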
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Loving, J., Scaduto, J.P., Benson, G. (2017). An SIMD Algorithm for Wraparound Tandem Alignment. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_13
DOI: https://doi.org/10.1007/978-3-319-59575-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59574-0
Online ISBN: 978-3-319-59575-7
eBook Packages: Computer Science, Computer Science (R0)