Clustering-Based HMP Sequence Comparison

Niu, Beifang; Wu, Sitao; Li, Weizhong

doi:10.1007/978-1-4899-7475-4_90

Synonyms

Comparison of sequences from human microbiome projects using sequence clustering methods

Definition

Sequence clustering is a computational method that groups similar sequences into families. Clustering the pooled sequences from multiple samples of human microbiome projects or other metagenomic projects provides an effective way to compare those samples.
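The definition above can be illustrated with a minimal sketch. It assumes a greedy incremental strategy, in the spirit of tools such as CD-HIT, and is not the authors' implementation. The function names (`similarity`, `greedy_cluster`, `abundance_profiles`), the sample labels, the toy reads, and the 0.9 threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for a real alignment-based identity score.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1] (assumption: real tools use alignment identity)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(reads, threshold=0.9):
    """Greedy incremental clustering of (sample_id, sequence) pairs.

    Sequences are processed longest first. Each one joins the first cluster
    whose representative it matches at or above `threshold`; otherwise it
    starts a new cluster and becomes that cluster's representative.
    """
    clusters = []  # each entry: {"rep": str, "members": [(sample_id, seq), ...]}
    for sample_id, seq in sorted(reads, key=lambda r: len(r[1]), reverse=True):
        for cluster in clusters:
            if similarity(seq, cluster["rep"]) >= threshold:
                cluster["members"].append((sample_id, seq))
                break
        else:
            # No existing representative was similar enough.
            clusters.append({"rep": seq, "members": [(sample_id, seq)]})
    return clusters


def abundance_profiles(clusters):
    """Count, for every sample, how many of its sequences fall in each cluster."""
    profiles = {}
    for idx, cluster in enumerate(clusters):
        for sample_id, _ in cluster["members"]:
            profiles.setdefault(sample_id, Counter())[idx] += 1
    return profiles


# Toy reads pooled from two hypothetical samples.
reads = [
    ("sample_A", "ACGTACGTACGTACGTACGT"),
    ("sample_A", "ACGTACGTACGTACGTACGA"),
    ("sample_B", "ACGTACGTACGTACGTACGT"),
    ("sample_B", "TTTTGGGGCCCCAAAATTTT"),
]
clusters = greedy_cluster(reads, threshold=0.9)
profiles = abundance_profiles(clusters)
for sample_id, counts in sorted(profiles.items()):
    print(sample_id, dict(counts))
```

Because every sample is clustered in the same pooled run, the per-sample counts share one set of cluster indices and can be compared directly, with no reference database involved.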

Introduction

Numerous human microbiome projects and other metagenomic projects have sequenced large numbers of microbiome samples using high-throughput sequencing platforms. One of the key goals of these projects is to compare samples, or groups of samples, according to their composition and abundance profiles by taxon, gene, function, and pathway. These profiles are often calculated by comparing the sequences against various reference databases. However, reference-based methods cannot analyze the large number of novel sequences that are frequently found in metagenomic samples.

Clustering analysis is a data mining and classification method that assigns similar...

Author information

Authors and Affiliations

Center for Research in Biological Systems (CRBS), University of California, 9500 Gilman Drive MC0446, La Jolla, CA, 92093, USA
Beifang Niu, Sitao Wu & Weizhong Li

Corresponding author

Correspondence to Beifang Niu.

Editor information

Editors and Affiliations

Genomic Medicine, J. Craig Venter Institute, La Jolla, CA, USA
Sarah K. Highlander
Universidad Miguel Hernandez, Campus San Juan, San Juan, Alicante, Spain
Francisco Rodriguez-Valera
The Institute for Genomic Biology Department of Animal Sciences & Pathobiology Division of Nutritional Sciences, University of Illinois, Urbana, IL, USA
Bryan A. White

Cite this entry

Niu, B., Wu, S., Li, W. (2015). Clustering-Based HMP Sequence Comparison. In: Highlander, S.K., Rodriguez-Valera, F., White, B.A. (eds) Encyclopedia of Metagenomics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7475-4_90

DOI: https://doi.org/10.1007/978-1-4899-7475-4_90
Published: 04 January 2015
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7474-7
Online ISBN: 978-1-4899-7475-4
eBook Packages: Biomedical and Life Sciences; Reference Module Biomedical and Life Sciences