Using populations of human and microbial genomes for organism detection in metagenomes

  Jonathan E. Allen²

  ¹Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, USA;
  ²Global Security Computer Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA;
  ³Instituto de Física Corpuscular, CSIC-UVEG, E-46980 Valencia, Spain

  Corresponding author: allen99{at}llnl.gov

Abstract

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) is a powerful option when other, targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of a research tool. Accurately separating known from unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering out nonbiological components left over from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available, stored in a database optimized for efficient metagenomic search, by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow further analysis, lead to inaccurately labeled microbial taxa, and ultimately raise privacy concerns as more human genome data are collected.
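The screening problem the abstract describes (finding human reads left in a metagenomic sample, then asking what share of samples exceed a human-read threshold such as 10%) can be illustrated with a minimal sketch. It assumes exact k-mer matching against a set built from human reference sequences; this is a swapped-in illustrative technique, not necessarily the database or classifier used in this study. The function names, the value of k, the 50% per-read hit cutoff, and the toy sequences are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Illustrative sketch only: exact k-mer matching is an assumed technique,
# not necessarily the classifier used in the study. Strand (reverse
# complement) is ignored for brevity.


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_db(references: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers from the reference sequences into one set."""
    db: Set[str] = set()
    for ref in references:
        db.update(kmers(ref, k))
    return db


def is_human_read(read: str, human_db: Set[str], k: int,
                  min_hit_fraction: float = 0.5) -> bool:
    """Call a read human if enough of its k-mers occur in human_db.

    min_hit_fraction is a hypothetical cutoff chosen for illustration.
    """
    read_kmers = list(kmers(read, k))
    if not read_kmers:  # read shorter than k: no evidence either way
        return False
    hits = sum(1 for km in read_kmers if km in human_db)
    return hits / len(read_kmers) >= min_hit_fraction


def human_fraction(reads: List[str], human_db: Set[str], k: int) -> float:
    """Fraction of a sample's reads classified as human."""
    if not reads:
        return 0.0
    return sum(is_human_read(r, human_db, k) for r in reads) / len(reads)


def fraction_of_samples_over(samples: Dict[str, List[str]],
                             human_db: Set[str], k: int,
                             threshold: float = 0.10) -> float:
    """Fraction of samples whose human-read fraction is >= threshold."""
    if not samples:
        return 0.0
    flagged = sum(
        1 for reads in samples.values()
        if human_fraction(reads, human_db, k) >= threshold
    )
    return flagged / len(samples)


if __name__ == "__main__":
    K = 4  # toy value; real classifiers use much longer k-mers
    human_db = build_kmer_db(["ACGTACGTGG"], K)  # hypothetical "human" reference
    samples = {
        "sample_1": ["ACGTACG", "TTTTTTT", "TTTTTTT", "TTTTTTT"],
        "sample_2": ["TTTTTTT"] * 10,
    }
    for name, reads in samples.items():
        print(name, human_fraction(reads, human_db, K))
    print("samples >= 10% human:", fraction_of_samples_over(samples, human_db, K))
```

In this toy run, one of four reads in `sample_1` matches the hypothetical human reference (25%), so that sample is flagged and half of the samples exceed the 10% threshold. With real data, k would typically be tens of bases and the k-mer collection would need a memory-efficient structure; a plain Python set is used here only for clarity.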

Footnotes

  • [Supplemental material is available for this article.]

  • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.184879.114.

  • Freely available online through the Genome Research Open Access option.

  • Received September 24, 2014.
  • Accepted April 28, 2015.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
