Abstract
The minimum error correction (MEC) model is an important computational model for determining haplotype information from sequencing data, a task also known as single individual single nucleotide polymorphism (SNP) haplotyping, haplotype reconstruction, or haplotype assembly. Because the MEC model is NP-hard, a fast and accurate enumeration algorithm is proposed for solving it. The presented algorithm reconstructs the SNP sites of a pair of haplotypes one after another. For the SNP site being reconstructed, it enumerates the two possible SNP values, (0 1)^T and (1 0)^T, and chooses the one with more support from the SNP fragments covering that site. The presented algorithm was compared experimentally with the FAHR, Fast Hare, and DGS algorithms. The results show that our algorithm achieves a higher reconstruction rate than the other three algorithms.
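The site-by-site enumeration described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define how "support" is computed, so the sketch assumes that a fragment covering the current site votes according to the haplotype it agrees with more on the sites already reconstructed, and that ties are broken in favour of (0 1)^T. The function names `flip`, `reconstruct`, and `mec_score`, and the encoding of fragments as strings over `0`, `1`, and `-` (gap), are hypothetical choices made for illustration.

```python
def flip(allele):
    """Return the complementary allele ("0" <-> "1")."""
    return "1" if allele == "0" else "0"


def reconstruct(fragments):
    """Greedy site-by-site reconstruction of a complementary haplotype pair.

    Assumption (not from the paper): a fragment supports the column value
    that matches the haplotype it agrees with more on earlier sites.
    """
    n_sites = len(fragments[0])
    h1 = []  # h2 is kept implicitly as the complement of h1
    for j in range(n_sites):
        # support["0"] backs the column (0 1)^T, support["1"] backs (1 0)^T
        support = {"0": 0, "1": 0}
        for frag in fragments:
            if frag[j] == "-":
                continue  # fragment does not cover this site
            agree = disagree = 0
            for k in range(j):
                if frag[k] == "-":
                    continue
                if frag[k] == h1[k]:
                    agree += 1
                else:
                    disagree += 1
            if agree > disagree:        # fragment resembles h1
                support[frag[j]] += 1
            elif disagree > agree:      # fragment resembles h2
                support[flip(frag[j])] += 1
            # on a tie the fragment carries no information and does not vote
        h1.append("0" if support["0"] >= support["1"] else "1")
    h1 = "".join(h1)
    h2 = "".join(flip(a) for a in h1)
    return h1, h2


def mec_score(fragments, h1, h2):
    """Number of fragment entries to correct so each fragment fits h1 or h2."""
    def dist(frag, hap):
        return sum(1 for f, h in zip(frag, hap) if f != "-" and f != h)
    return sum(min(dist(f, h1), dist(f, h2)) for f in fragments)


# Toy input: five error-free fragments plus one with an error at the last site.
frags = ["011-", "-110", "100-", "-001", "01-0", "0111"]
h1, h2 = reconstruct(frags)
print(h1, h2)                    # 0110 1001
print(mec_score(frags, h1, h2))  # 1
```

Because each site is decided once and never revisited, a sketch of this kind is fast but offers no optimality guarantee for the MEC objective.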
Acknowledgments
The authors are grateful to the anonymous referees for their helpful comments. This research is supported by the National Natural Science Foundation of China under Grant No. 61363035 and No. 61502111, the Guangxi Natural Science Foundation under Grant No. 2015GXNSFAA139288, No. 2013GXNSFBA019263 and No. 2012GXNSFAA053219, the Research Fund of Guangxi Key Lab of Multisource Information Mining & Security No. 14-A-03-02 and No. 15-A-03-02, the “Bagui Scholar” Project Special Funds, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, X., Wu, J., Li, L. (2016). Haplotyping a Diploid Single Individual with a Fast and Accurate Enumeration Algorithm. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_40
DOI: https://doi.org/10.1007/978-3-319-42291-6_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42290-9
Online ISBN: 978-3-319-42291-6
eBook Packages: Computer Science (R0)