Synonyms
Comparison of sequences from human microbiome projects using sequence clustering methods
Definition
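Sequence clustering is a computational method that groups similar sequences into families. Pooling the sequences from multiple samples of human microbiome projects or other metagenomic projects and clustering them together provides an effective way to compare those samples, because each resulting cluster records how many sequences every sample contributed.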
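The idea in the definition can be illustrated with a minimal sketch. It assumes a toy ungapped identity measure and a greedy incremental strategy (longest sequence first, each sequence joins the first cluster whose representative it matches), loosely in the spirit of tools such as CD-HIT; it is not the implementation of any particular program. The function names, the sample labels, the reads, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def identity(a, b):
    """Toy identity: matching positions divided by the longer length.

    Assumption: sequences are compared position by position without gaps;
    real tools use alignment or word-based filters instead.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(reads, threshold):
    """Cluster pooled (sample, sequence) pairs greedily.

    Sequences are visited longest first; each one joins the first cluster
    whose representative meets the identity threshold, otherwise it
    becomes the representative of a new cluster.
    """
    clusters = []
    for sample, seq in sorted(reads, key=lambda r: len(r[1]), reverse=True):
        for cluster in clusters:
            if identity(seq, cluster["rep"]) >= threshold:
                cluster["members"].append((sample, seq))
                break
        else:
            clusters.append({"rep": seq, "members": [(sample, seq)]})
    return clusters


def abundance_table(clusters):
    """For each cluster, count how many member sequences came from each sample."""
    return [Counter(sample for sample, _ in c["members"]) for c in clusters]


# Hypothetical reads pooled from two samples, "A" and "B".
reads = [
    ("A", "ACGTACGTAC"),
    ("A", "ACGTACGTAT"),
    ("B", "ACGTACGTAC"),
    ("B", "TTTTGGGGCC"),
    ("B", "TTTTGGGGCA"),
]

clusters = greedy_cluster(reads, threshold=0.85)
table = abundance_table(clusters)

for cluster, counts in zip(clusters, table):
    print(cluster["rep"], dict(counts))
```

In this toy input the first cluster contains sequences from both samples while the second contains sequences from sample B only, which is the kind of per-cluster difference a clustering-based comparison is meant to surface without a reference database.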
Introduction
Numerous human microbiome projects and other metagenomic projects have sequenced many microbiome samples using high-throughput sequencing platforms. One of the key goals of these projects is to compare samples, or groups of samples, according to their composition and abundance profiles by taxon, gene, function, and pathway. These profiles are often calculated by comparing the sequences against various reference databases. However, reference-based methods cannot analyze sequences that have no match in those databases, and such novel sequences are frequently found in large numbers in metagenomic samples.
Clustering analysis is a data mining and classification method that assigns similar...
Copyright information
© 2015 Springer Science+Business Media New York
About this entry
Cite this entry
Niu, B., Wu, S., Li, W. (2015). Clustering-Based HMP Sequence Comparison. In: Highlander, S.K., Rodriguez-Valera, F., White, B.A. (eds) Encyclopedia of Metagenomics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7475-4_90
DOI: https://doi.org/10.1007/978-1-4899-7475-4_90
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7474-7
Online ISBN: 978-1-4899-7475-4
eBook Packages: Biomedical and Life Sciences; Reference Module Biomedical and Life Sciences