Authors: Yuki Endo 1; Fubito Toyama 1; Chikafumi Chiba 2; Hiroshi Mori 1 and Kenji Shoji 1
Affiliations: 1 Utsunomiya University, Japan; 2 University of Tsukuba, Japan
Keyword(s): Bioinformatics, Next Generation Sequencing, de novo Assembly.
Related Ontology Subjects/Areas/Topics: Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Databases and Data Management; Genomics and Proteomics; Next Generation Sequencing; Sequence Analysis
Abstract:
Sequencing the whole genomes of various species has many applications, not only in understanding biological
systems but also in medicine, pharmacy, and agriculture. In recent years, high-throughput
next-generation sequencing technologies have dramatically reduced the time and cost of whole-genome sequencing.
These technologies provide ultrahigh throughput at a lower per-unit data cost. However, the
data are generated from very short fragments of DNA, so algorithms for merging these fragments
are essential. Merging the fragments without using a reference dataset is called
de novo assembly. Many algorithms for de novo assembly have been proposed in recent years. Velvet and
SOAPdenovo2 are well-known assemblers that perform well in terms of memory and
time consumption. However, their memory consumption increases dramatically as the size of the input
grows. Therefore, an alternative algorithm with low memory usage is needed. In this paper, we
propose a de novo assembly algorithm with lower memory usage. The proposed method adopts the memory-efficient
DSK (disk streaming of k-mers) to count k-mers. Moreover, the memory used for
constructing the de Bruijn graph is reduced by not keeping edge information in the graph. In our experiment using
human chromosome 14, the average maximum memory consumption of the proposed method was approximately
7.5–8.8% of that of the popular assemblers.
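
The two memory-saving ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the CRC32-based partitioning, the re-scanning of reads in place of on-disk partition files, and the greedy walk are all assumptions made for illustration, and reverse complements are ignored. It shows (a) counting k-mers one partition at a time, in the spirit of DSK, and (b) a de Bruijn graph that stores only nodes and infers edges by testing the four possible one-base extensions.

```python
import zlib
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers_partitioned(reads, k, n_parts=4):
    """Count k-mers one partition at a time (hypothetical helper).

    Only the k-mers whose hash falls in the current partition are held in
    memory. A disk-based counter would write each partition to a file;
    here the reads are simply re-scanned for each partition.
    """
    for part in range(n_parts):
        counts = Counter(
            km
            for read in reads
            for km in kmers(read, k)
            if zlib.crc32(km.encode()) % n_parts == part
        )
        yield from counts.items()


def successors(node, nodes):
    """Infer outgoing edges without storing them.

    An edge node -> node[1:] + b exists exactly when that extension is
    itself a node, so membership tests replace an explicit edge list.
    """
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]


def extend_right(start, nodes):
    """Greedily follow unique successors and return the spelled sequence."""
    seq, node, seen = start, start, {start}
    while True:
        nxt = successors(node, nodes)
        # Stop at a dead end, a branch, or a cycle.
        if len(nxt) != 1 or nxt[0] in seen:
            return seq
        node = nxt[0]
        seen.add(node)
        seq += node[-1]
```

For example, with the reads `ACGTAC` and `GTACGG` and k = 4, the node set is the keys of `dict(count_kmers_partitioned(reads, 4))`, and `extend_right("ACGT", nodes)` walks until the first branch. The trade-off in this sketch is that each edge lookup costs up to four set membership tests, in exchange for storing no edge data at all.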