Clustering-Based HMP Sequence Comparison

  • Reference work entry
Encyclopedia of Metagenomics

Synonyms

Comparison of sequences from human microbiome projects using sequence clustering methods

Definition

Sequence clustering is a computational method that groups similar sequences into families. Clustering the pooled sequences from multiple samples, whether from human microbiome projects or other metagenomic projects, provides an effective way to compare those samples.
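The idea in this definition can be illustrated with a minimal sketch. It is not the method of any particular tool: it assumes a simple greedy incremental scheme, uses Python's `difflib.SequenceMatcher` ratio as a rough stand-in for alignment-based sequence identity, and runs on made-up toy sequences. The function names, the sample names, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; an assumed stand-in for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering over pooled sequences.

    Sequences are visited longest first. Each one joins the first cluster
    whose representative it matches at or above the threshold; otherwise
    it becomes the representative of a new cluster.
    """
    reps = []        # representative sequence of each cluster
    assignment = {}  # sequence id -> cluster index
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for seq_id, seq in ordered:
        for idx, rep in enumerate(reps):
            if identity(seq, rep) >= threshold:
                assignment[seq_id] = idx
                break
        else:
            reps.append(seq)
            assignment[seq_id] = len(reps) - 1
    return reps, assignment


def cluster_profiles(assignment, sample_of):
    """Count, per sample, how many of its sequences fall in each cluster."""
    profiles = defaultdict(Counter)
    for seq_id, cluster in assignment.items():
        profiles[sample_of[seq_id]][cluster] += 1
    return profiles


# Hypothetical toy data: sequence id -> sequence, sequence id -> sample.
seqs = {
    "a1": "ACGTACGTACGTACGTACGT",
    "a2": "ACGTACGTACGTACGTACGA",
    "b1": "ACGTACGTACGTACGTACGT",
    "b2": "TTTTGGGGCCCCAAAATTTT",
}
sample_of = {"a1": "sampleA", "a2": "sampleA", "b1": "sampleB", "b2": "sampleB"}

reps, assignment = greedy_cluster(seqs, threshold=0.9)
profiles = cluster_profiles(assignment, sample_of)

for sample, counts in sorted(profiles.items()):
    print(sample, dict(counts))
```

Because the sequences from all samples are clustered together, each cluster acts as a shared feature: the per-sample counts form abundance profiles that can be compared directly, without a reference database.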

Introduction

Numerous human microbiome projects and other metagenomic projects have sequenced many microbiome samples using high-throughput sequencing platforms. One of the key goals of these projects is to compare samples or groups of samples according to their composition and abundance profiles by taxon, gene, function, and pathway. These profiles are often calculated by comparing the sequences against various reference databases. However, reference-based methods cannot analyze the many novel sequences, absent from those databases, that are frequently found in metagenomic samples.

Clustering analysis is a data mining and classification method that assigns similar...

Corresponding author

Correspondence to Beifang Niu.

Copyright information

© 2015 Springer Science+Business Media New York

About this entry

Cite this entry

Niu, B., Wu, S., Li, W. (2015). Clustering-Based HMP Sequence Comparison. In: Highlander, S.K., Rodriguez-Valera, F., White, B.A. (eds) Encyclopedia of Metagenomics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7475-4_90
