Published November 22, 2017 | Version 1.0
Thesis Open

Computational Methods for Taxonomic Annotation and Genome Reconstruction in Metagenomics

  • 1. Heinrich-Heine-Universität Düsseldorf

Contributors

  • 1. Heinrich-Heine-Universität Düsseldorf

Description

Abstract

Microbial communities can be found almost everywhere, from biogas reactors and deep-sea vents to the surfaces of plant leaves and roots and the human body, which hosts a plethora of foreign cells in its digestive system. These communities may consist of thousands upon thousands of microorganisms, including bacteria, archaea, algae and fungi, which coexist within their habitats but cannot simply be cultivated and studied because of their complex mutual dependencies and environmental requirements. Metagenomics is a field dedicated to the genetic analysis of such communities. The genes of their members enable their survival, for instance by making nutrients accessible, by neutralizing toxic compounds or by allowing symbiosis with other organisms. Through the use of nucleotide sequencing technologies, this genetic diversity can be explored and rendered usable, for instance in the form of new antibiotics or as enzymes in biotechnology. Apart from their considerable economic potential, metagenomic approaches lead to a fundamentally improved understanding of the microbial processes on earth.

With current technology, it is not directly possible to sequence contiguous genomes from microbial communities. Instead, short sequences, called reads, are produced, which need to be assembled into genes and longer genome sequences using computer programs. Depending on the size and complexity of the metagenome, this task can be very difficult. This thesis describes two methods for assigning metagenomic sequences to taxonomic groups or genomes. The results can be used to analyze the genes, and the corresponding proteins and functions, within their phylogenetic and genetic context to gain better insight into the functioning of individual organisms and the microbial community.

Our first method, taxator-tk, assigns nucleotide sequences from metagenomes to corresponding taxa and addresses two challenges: the precise prediction of taxa and the application to datasets that are constantly growing due to the rapid progress in DNA sequencing. Since annotation methods such as taxator-tk, which require similarity to known genomes, spend a considerable part of their runtime on sequence comparison, our algorithm exploits the underlying phylogenetic structure of similar gene sequences to calculate the taxonomic assignment efficiently. The same phylogenetic principles are used to achieve a high assignment precision.
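As a rough illustration of similarity-based taxonomic assignment in general, the following sketch assigns a query sequence to the lowest common ancestor of the taxa of its similar reference sequences. This is a simplified, generic technique and not the taxator-tk algorithm itself; the function name `lowest_common_ancestor` and the example lineages are assumptions made for illustration only.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of root-to-leaf taxonomic lineages.

    Hypothetical helper: each lineage lists taxon names from the root
    downwards, e.g. ["Bacteria", "Proteobacteria", ...].
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so ranks are compared level by level.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break
    return shared


# Illustrative lineages of three reference sequences similar to one query.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Salmonella"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionales", "Vibrio"],
]
assignment = lowest_common_ancestor(hits)
print(assignment[-1] if assignment else "unassigned")
```

Assigning to a higher rank when the similar references disagree trades resolution for precision, which is the general motivation behind conservative taxonomic assignment.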

The second method in this thesis helps researchers to reconstruct individual genomes. It is a statistical classification model for metagenome data, for which we outline several direct and follow-up applications. These include classification of nucleotide sequences to individual genomes, de-novo calculation of genome clusters in metagenomes, in-silico sample enrichment for genomes and quality checking of reconstructed genomes. We published the method as a software library named MGLEX for integration into other programs to enable the efficient use of the data for reconstructing genomes in different scenarios.
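To make the idea of classifying nucleotide sequences to individual genomes more concrete, here is a minimal sketch of maximum-likelihood classification based on k-mer composition alone. It does not use the MGLEX library or reproduce its model; all function names (`train_profile`, `log_likelihood`, `classify`), the choice of k = 2, the pseudocount and the toy sequences are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict


def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a nucleotide sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train_profile(seq: str, k: int = 2, pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate smoothed k-mer probabilities (a multinomial model) for one genome."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(seq, k)
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def log_likelihood(seq: str, profile: Dict[str, float], k: int = 2) -> float:
    """Log-likelihood of a sequence's k-mers under a genome profile.

    k-mers containing symbols other than A, C, G, T are ignored.
    """
    return sum(n * math.log(profile[m])
               for m, n in kmer_counts(seq, k).items() if m in profile)


def classify(seq: str, profiles: Dict[str, Dict[str, float]], k: int = 2) -> str:
    """Return the name of the genome profile with the highest likelihood."""
    return max(profiles, key=lambda name: log_likelihood(seq, profiles[name], k))


# Toy "genomes" with very different composition (illustrative only).
profiles = {
    "genome_a": train_profile("AT" * 50),
    "genome_b": train_profile("GC" * 50),
}
print(classify("ATATATTA", profiles))
print(classify("GCGCGGCC", profiles))
```

The same likelihood values could in principle also be compared across genome bins to flag sequences that fit their assigned genome poorly, which relates to the quality-checking application mentioned above.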

Presumably, metagenomics will continue to play an important role in microbial research, and may partially obviate the sequencing of cloned strain genomes. This trend is supported by the rapid development of DNA sequencing technologies, which is progressing towards faster sequencing and longer reads. The presented methods supplement the existing set of bioinformatics tools for acquiring knowledge from metagenomes. By reducing metagenomes to individual genomes, one can apply traditional algorithms from genomics, for instance to reconstruct metabolic pathways, and one can link data from transcriptomic and proteomic experiments. Therefore, there is much interest in genome reconstruction methods, like the ones presented in this thesis.

Notes

Text and graphics of the main thesis are released under Creative Commons CC-BY-SA 4.0, but the article reproductions in the appendix are subject to the rights of the corresponding publishers (most are open access).

Files

thesis_jdroege.pdf

Size: 13.3 MB
md5:80f7c43ae08b89673bd94a735b7c839d