App-SpaM: phylogenetic placement of short reads without sequence alignment

Abstract

Motivation: Phylogenetic placement is the task of placing a query sequence of unknown taxonomic origin into a given phylogenetic tree of a set of reference sequences. A major field of application of such methods is, for example, the taxonomic identification of reads in metabarcoding or metagenomic studies. Several approaches to phylogenetic placement have been proposed in recent years. The most accurate of them requires a multiple sequence alignment of the references as input. However, calculating multiple alignments is not only time-consuming but also limits the applicability of these approaches.

Results: Herein, we propose Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM), an efficient algorithm for the phylogenetic placement of short sequencing reads on a tree of a set of reference sequences. App-SpaM produces results of high quality that are on a par with the best available approaches to phylogenetic placement, while our software is two orders of magnitude faster than these existing methods. Our approach neither requires a multiple alignment of the reference sequences nor alignments of the queries to the references. This enables App-SpaM to perform phylogenetic placement on a broad variety of datasets.

Availability and implementation: The source code of App-SpaM is freely available on Github at https://github.com/matthiasblanke/App-SpaM together with detailed instructions for installation and settings. App-SpaM is furthermore available as a Conda package on the Bioconda channel.

Contact: matthias.blanke@biologie.uni-goettingen.de

Supplementary information: Supplementary data are available at Bioinformatics Advances online.


Overview
In the main paper, we describe a novel approach for phylogenetic placement of short reads called Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM). Our approach finds a suitable position for a query read sequence Q in a reference tree T. More precisely, for a given set of reference sequences and a binary rooted tree T with a one-to-one mapping between the sequences and the leaves of T, we describe different methods to find an edge e_Q of T where a query read Q can be inserted, based on our previously published Filtered Spaced Word Matches (FSWM) approach [8], see also [7,6,13]. App-SpaM inserts a new node into e_Q, splitting e_Q into two new edges, and adds another new edge that connects this newly inserted node with a new leaf labelled with the query Q. The main paper describes methods to find a suitable edge e_Q, but does not explain how the lengths of the newly generated edges are defined.

The methods section of this supplementary material recapitulates our methods to find an edge e_Q where a query read Q is inserted into a reference tree T. In addition, we explain how the lengths of the newly generated edges are defined, see Subsec. 1.1. In the main paper, we use the Placement Evaluation WOrkflows (PEWO) [9] to evaluate the accuracy and run time of App-SpaM in most analyses. We discuss the parameters and accuracy metrics used for this evaluation in Subsec. 1.2. Additional information about the data sets used is given in Subsec. 1.3. The evaluations on unassembled references, however, were not performed with PEWO, because PEWO does not support unassembled reference sequences. Instead, we used our own simplified evaluation pipeline, which is described in Subsec. 1.4. We also present additional information about the pattern design in Subsec. 1.5.

This supplementary also contains additional results for App-SpaM and comprehensive reports on the accuracy of all programs to which we compared App-SpaM in our main paper: RAPPAS [10], EPA [3], EPA-ng [2], Pplacer [12], and APPLES [1]. Section 3 shows additional detailed information for the evaluations carried out in the main manuscript. This includes detailed statistics of the boxplots for the general accuracy evaluation in Subsec. 3.1, additional boxplots for different read lengths and their statistics in Subsec. 3.2, additional results for the accuracy of all programs when different parameters are used in Subsec. 3.3, a table of the memory requirements in Subsec. 3.4, and run time results for the large run time showcase on the tara-3748 data set in Subsec. 3.5. Lastly, Sec. 4 shows additional results for the accuracy when query reads are simulated with the software ART [5] (in comparison to the simpler read simulation procedure used in PEWO).

Inserting a Query Read into the Reference Tree
For an edge e in an edge-weighted tree T, let l(e) denote the length ('weight') of e. For a query sequence Q, we first select an edge e_Q in the reference tree T and insert a new internal node into this edge, thereby splitting e_Q into two new edges e_1 and e_2 with l(e_1) + l(e_2) = l(e_Q). Then, we add a new leaf that is labelled with Q, together with a new pendant edge e_Q' that connects this new leaf with the newly generated internal node. Finally, a length l(e_Q') is assigned to this newly generated edge. To find a suitable edge e_Q for a query sequence Q, and to assign lengths to the newly generated edges, we first apply our approach FSWM to compare Q with each reference sequence S. We then use either the phylogenetic distance d(Q, S) between Q and S as calculated by FSWM, or the number s(Q, S) of spaced-word matches between Q and S found by FSWM with scores larger than some threshold t.
We implemented the following four approaches to find an edge e_Q where the query Q is inserted into the tree T (a code sketch of the corresponding edge-length assignment is given at the end of this section):

MIN-DIST - In this approach, we first select the reference sequence S that minimizes the distance d(Q, S) over all reference sequences, and we define e_Q to be the edge in T that is adjacent to the leaf labelled with S. If multiple references have the same smallest distance to Q, one of them is chosen randomly. As explained above, we split e_Q into two new edges e_1 and e_2. Let e_1 be the new edge that is adjacent to the leaf labelled with S. We distinguish the following two situations: (A) If d(Q, S)/2 < l(e_Q), the lengths of e_1 and e_Q' are set to l(e_Q') = l(e_1) = d(Q, S)/2, and the length of e_2 is set to l(e_Q) − l(e_1).
SpaM-COUNT - This works like the previous approach, but instead of selecting the reference sequence S that minimizes the distance to Q, we select the reference S that maximizes the number s(Q, S) of spaced-word matches with score > t between S and Q. The lengths of the new edges are calculated as in MIN-DIST.
LCA-DIST - Here, we identify the two reference sequences S_1 and S_2 with the lowest distances d(Q, S_1) and d(Q, S_2) to Q. Let v be the lowest common ancestor in T of the two leaves that are labelled with S_1 and S_2, respectively. The edge e_Q is then defined as the edge in T that connects v with its parental node. Let l(S_1, v) be the sum of the edge lengths on the path between S_1 and v, and l(S_2, v) accordingly. We define d̂(Q) = (d(Q, S_1) + d(Q, S_2)) / 2 as the average distance between Q and the two chosen references, and d̂(v) = (l(S_1, v) + l(S_2, v)) / 2 as the average distance from the internal node v to the two chosen references in the tree T. To determine the new edge lengths of e_1 and e_Q', we distinguish three situations: If d̂(Q) ≤ d̂(v), we set l(e_1) = 0, l(e_2) = l(e_Q), and the length of the pendant edge is set to l(e_Q') = 0. Otherwise, if (d̂(Q) − d̂(v)) / 2 < l(e_Q), the lengths of e_1 and e_Q' are set to l(e_1) = l(e_Q') = (d̂(Q) − d̂(v)) / 2, and l(e_2) is set to l(e_Q) − l(e_1).
LCA-COUNT - This is the same as the previous approach, but instead of using the two reference sequences S_1 and S_2 that minimize the distance to Q, we select the two references S_1 and S_2 with the maximal number of spaced-word matches to Q with scores larger than t. We then calculate the lengths of the newly generated edges as in LCA-DIST.
In addition to these four variants of App-SpaM, we used the distances d(Q, S) calculated by FSWM as input for the program APPLES [1].
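The following minimal Python sketch illustrates the edge-length assignment for case (A) of MIN-DIST. The dictionary-based tree representation and all identifiers are ours and serve only as an illustration of the formulas above, not as a description of App-SpaM's actual implementation; the other heuristics assign lengths analogously.

```python
def insert_query_min_dist(edge_lengths, e_q, d_q_s):
    """Split the selected leaf edge e_q and attach the query Q, case (A): d(Q,S)/2 < l(e_Q).

    edge_lengths maps edge names to branch lengths; all names are illustrative."""
    l_eq = edge_lengths[e_q]
    assert d_q_s / 2 < l_eq, "only case (A) is sketched here"
    edge_lengths["e1"] = d_q_s / 2           # new edge adjacent to the closest reference S
    edge_lengths["e2"] = l_eq - d_q_s / 2    # remaining part of the split edge e_Q
    edge_lengths["e_Q_prime"] = d_q_s / 2    # pendant edge leading to the new leaf labelled Q
    del edge_lengths[e_q]                    # e_Q has been replaced by e1 and e2
    return edge_lengths

print(insert_query_min_dist({"edge_above_S": 0.75}, "edge_above_S", 0.5))
# {'e1': 0.25, 'e2': 0.5, 'e_Q_prime': 0.25}
```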

Evaluation using PEWO
For the accuracy evaluation, we used the Pruning-based ACcuracy evaluation (PAC) workflow in PEWO. PEWO provides two accuracy measures to evaluate phylogenetic placement methods, the node distance (ND) and the expected node distance (e-ND). In short, ND is the number of nodes between the position where the query Q is placed into the tree T by the method that is being evaluated and the 'correct' position of Q, see [9] for details. Note that some methods do not assign a single position in T to a query Q, but output several possible positions that are weighted in some way. For likelihood-based methods, these weightings correspond to the calculated likelihood values normalized to 1 and are referred to as likelihood weight ratios. In this case, ND measures only the distance between the first proposed placement, i.e. the one with the largest weight, and the 'correct' position of Q. As an alternative, PEWO offers the so-called e-ND metric to evaluate multiple weighted placement positions. Here, for a single query, the number of nodes between every proposed placement and the 'correct' branch is taken into account, weighted by the corresponding placement weight.
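As a small illustration of how the two metrics differ, the following sketch computes ND and e-ND for a single query from a list of weighted placements. The placements and weights are made up; the sketch only reflects our reading of the PEWO definitions.

```python
# Toy illustration of ND vs. e-ND for a single query (numbers are made up).
# Each entry: (node distance of a proposed placement to the 'correct' edge,
#              likelihood weight ratio of that placement).
placements = [(0, 0.7), (2, 0.2), (3, 0.1)]

nd = max(placements, key=lambda p: p[1])[0]                # top-weighted placement only
e_nd = sum(dist * weight for dist, weight in placements)   # weighted over all placements
print(nd, e_nd)  # 0 and 0.7
```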
The current version of App-SpaM outputs a single position in T for each query Q. The same is true for APPLES. Therefore, we used the ND metric for the evaluations shown in the paper. Because the e-ND metric can represent placement uncertainty, e-ND values are typically smaller than ND values. We show comprehensive results with all ND and e-ND values for all programs across all data sets in Subsec. 3.3. These results also include the accuracy for all parameters that were tested during the evaluation. In general, while App-SpaM cannot compute likelihood values, it could in principle also derive multiple placement positions for a single query Q based on the number of spaced-word matches to all references and the calculated distances. With our proposed placement heuristics, this might be especially reasonable when several reference sequences have a very similar number of spaced-word matches or a very similar phylogenetic distance to a given query Q. However, specifying such multiple locations and weighting them appropriately requires extensive additional tests that will take us some time to perform.
In the main paper, we showed one accuracy value (ND) for all programs and data sets. This value is for the default parameters of each program; these parameters are shown in Table 1.

Benchmark Datasets
Each data set in PEWO [9] consists of a multiple sequence alignment of the reference sequences and a phylogenetic tree comprising exactly these references. Reads are always simulated automatically by PEWO by splitting pruned reference sequences into shorter reads (with the exception of the unassembled reference sequence simulations, see Subsec. 1.4). The table given in the main manuscript contains the number and length of the reference sequences in the data sets from PEWO, together with the length of the simulated reads that we used. The origin of these data sets is given below. With regard to ultrametricity, we also report how much the distances between the root and the closest and the farthest leaf differ: let d_c be the distance between the root and the closest leaf and d_f the distance between the root and the farthest leaf; we report the ratio d_f/d_c.

epa-218 One of the data sets used for the evaluation of EPA [3], consisting of 218 sequences of the small sub-unit rRNA, with d_f/d_c = 37.76. It was kindly supplied by Alexandros Stamatakis, as were the next two data sets. We are not aware of these three data sets being available online.
epa-628 One of the data sets used for the evaluation of EPA [3] consisting of 628 fungal DNA sequences with

Evaluation Unassembled Reads: Own Pipeline
All evaluations, except those with unassembled reference sequences, were carried out with the PEWO framework to ensure reliability, correctness, and easy reproducibility. However, PEWO has limitations: First, it does not simulate sequencing errors for the query reads. We added this feature for additional tests, see Sec. 4. Second, it is limited in its applicability and does not support any input type for the reference sequences other than a multiple sequence alignment of the references. In order to evaluate the performance of App-SpaM on unassembled reference sequences, we implemented a simpler version of the PEWO PAC workflow as follows: First, we simulated reads of a defined coverage from the input reference sequences with the program ART [5]. We used coverages of 4, 2, 1, 0.5, 0.25, 0.125, 0.0625, and 0.03125. The resulting 'bags of reads' constitute the reference sequences for the experiments. Then, we use a leave-one-out procedure already used by pplacer [12]: A single reference (bag of reads) is pruned from the reference data set. The remaining bags of reads and the accordingly pruned backbone tree are used as the reference data set. All reads from the pruned reference are placed onto the pruned backbone tree, and the accuracy is measured with the ND metric. The average ND over all queries constitutes the overall ND for this pruning event. The procedure is then repeated once for every reference sequence. The average over all pruning events for a given coverage is recorded as the accuracy for this coverage. The procedure is repeated for all coverages.
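To make the averaging scheme explicit, the following sketch shows how the per-coverage accuracy is aggregated from per-query node distances. The node-distance values and reference names are invented for illustration only and do not come from our experiments.

```python
# Toy sketch of the averaging used in our leave-one-out evaluation.
nd_per_query = {            # coverage -> pruned reference -> ND of each of its placed reads
    0.5:  {"refA": [0, 1, 2], "refB": [1, 1]},
    0.25: {"refA": [2, 3],    "refB": [0, 4, 2]},
}

for coverage, prunings in nd_per_query.items():
    per_pruning = [sum(nds) / len(nds) for nds in prunings.values()]  # average ND per pruning event
    overall = sum(per_pruning) / len(per_pruning)                     # average over all pruning events
    print(f"coverage {coverage}: average node distance {overall:.2f}")
```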

Evolutionary Models: Jukes-Cantor
Currently, App-SpaM uses the Jukes-Cantor model to estimate the number of substitutions that occurred, based on the Hamming distance calculated from the don't-care positions of all filtered spaced-word matches. Preliminary experiments showed little to no improvement in accuracy when using more sophisticated models such as K80 or K81. However, too few tests were carried out to strongly support this observation, nor did we test more parameter-rich models such as GTR. In general, we plan to test the influence of more complex evolutionary models in combination with the filtered spaced-word matches approach, but this is beyond the scope of this work.
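For reference, the Jukes-Cantor correction applied to the mismatch fraction p observed at the don't-care positions is d = -3/4 · ln(1 - 4p/3). The following sketch shows this calculation; the function name and example numbers are ours and only illustrate the standard formula.

```python
import math

def jukes_cantor_distance(mismatches: int, dont_care_positions: int) -> float:
    """Jukes-Cantor corrected distance from the mismatch fraction p observed
    at the don't-care positions of all filtered spaced-word matches."""
    p = mismatches / dont_care_positions
    if p >= 0.75:                      # correction is undefined for p >= 3/4
        return float("inf")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

# e.g. 120 mismatches over 1000 compared don't-care positions
print(jukes_cantor_distance(120, 1000))  # ~0.131 substitutions per site
```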

Pattern Design
The default parameters for the pattern set used in App-SpaM are |P| = 1 (a single pattern is used), with weight w = 12 and 32 don't-care positions. The weight w = 12 has already worked well in FSWM [8] and Read-SpaM [6], and we observe the same for App-SpaM. The influence of using more patterns simultaneously, of using different randomly optimized pattern sets, and of the weight is shown in several figures in this supplementary, specifically Fig. 1, Fig. 2, Fig. 3, Fig. 4, and Fig. 5. The number of don't-care positions is more difficult to choose and depends on the use case: FSWM uses 100 don't-care positions to compare whole genomes. There, a large number of don't-care positions helps to differentiate between the random background peak and the homologous peak of spaced-word matches. In Read-SpaM, the default number of don't-care positions is reduced to 60. This choice was made because of the short reads that are compared: a larger number of don't-care positions results in a lower number of possible spaced-word matches. As an example, for a read length of 150: a pattern length of 112 as in FSWM results in only 39 possible spaced-word matches for a given read; reducing the pattern length to 72 already doubles the number of possible spaced-word matches to 79. In practice, the increased number of spaced-word matches per read improved the results. In App-SpaM, the number of don't-care positions was decreased again, to 32. Preliminary results showed that the results are rather stable for patterns of these lengths. This comes with the trade-off that background and homologous peaks can potentially overlap more strongly. However, based on these preliminary results, we made the design decision to store all don't-care positions in a single 64-bit integer. This considerably simplifies the implementation and yields faster computations. On the negative side, the number of don't-care positions cannot be raised above 32, even if this would improve the results. It is also possible to use multiple patterns within App-SpaM. The sets of patterns are generated with the rasbhari [4] software, which minimizes the variance of the number of pattern-based matches between queries and references by evaluating the overlap complexity of the patterns in the set. For this, rasbhari uses a hill-climbing method to iteratively improve the pattern set; we refer to the rasbhari paper for more details.
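The following sketch illustrates how spaced words are extracted from a read with a binary pattern and why longer patterns leave fewer possible spaced-word matches per read (len(read) − len(pattern) + 1 positions). The toy pattern and sequence are ours and much shorter than the patterns actually used by App-SpaM.

```python
# Minimal sketch of spaced-word extraction with a single binary pattern.
pattern = "1101001"          # '1' = match position, '0' = don't-care position
read = "ACGTACGTACGT"

def spaced_words(seq: str, pat: str):
    """Yield (matched symbols, don't-care symbols) for every pattern offset."""
    for i in range(len(seq) - len(pat) + 1):           # possible matches: len(seq) - len(pat) + 1
        window = seq[i:i + len(pat)]
        matched = "".join(c for c, p in zip(window, pat) if p == "1")
        dont_care = "".join(c for c, p in zip(window, pat) if p == "0")
        yield matched, dont_care

print(len(list(spaced_words(read, pattern))))  # 12 - 7 + 1 = 6 spaced words
```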

Additional Results -App-SpaM
App-SpaM has 5 different placement heuristics, as well as the ability to forward the calculated distances to APPLES to perform the placement. Additionally, the weight w of the used pattern(s) could potentially influence the placement accuracy. Generally, we strongly recommend using App-SpaM with its default settings, that is, the heuristic SpaM-4 with a single pattern of weight 12. In all our evaluations, these settings resulted in robust and accurate results. We noticed some instances, specifically when using unassembled references, where lower weights yielded improved results. We also performed an extensive evaluation to examine the accuracy of App-SpaM with respect to the five heuristics and different weights on a number of unrelated data sets. Here, we show for two data sets (bac-150 and hiv-104) how the placement accuracy varies for pattern weights w ∈ {8, 12, 16}, for all five heuristics (with X ∈ {2, 4} for SpaM-X), and for App-SpaM distances used as input for APPLES. Similar results, in a simplified form, for all data sets are also shown in Subsec. 3.3.

App-SpaM Parameters
[Figure panels: bac-150 data set, read length 150; hiv-104 data set, read length 500.] The heuristics and pattern weights were modified according to the annotated x-axis. For each heuristic, three different boxes are shown for w ∈ {8, 10, 12} (colors always with decreasing luminosity from w = 8 to w = 12).

Additional Results -Pattern Robustness
App-SpaM uses a set of patterns, denoted as P. By default, the size of this pattern set is |P| = 1; however, it is possible to use multiple patterns. This feature exists mainly for our internal tests. Generally, P for a given weight w and number of don't-care positions is chosen by optimizing the overlap complexity of the pattern set with the software rasbhari [4]. We evaluated the robustness of App-SpaM with respect to different generated pattern sets and the number of patterns that are used simultaneously. Shown here are the results for the data sets bac-150 and bv-797 and the two heuristics LCA-COUNT and SpaM-4 of App-SpaM. Generally, we do not see any pronounced dependence of the accuracy on either the number of patterns or the particular pattern set that is used.

Relation Between Difficulty of Pruning and Achieved Node Distance
In PEWO, for every pruning event, a random node within the tree is chosen and all references below this node are removed. Thus, the resulting prunings can vary: from a single pruned leaf, to a few references, to large or very large clades of the reference tree. A case can therefore be made that prunings vary with respect to their 'difficulty'. However, it is unclear how this 'difficulty' can be described reasonably, or, in other words, which property of a pruning makes it more or less difficult. One possibility could be that prunings where the correct branch is located deep within the tree, i.e. more closely towards the root (and possibly no closely related reference remains in the tree), are more difficult. However, not only the size of the pruned clade could be an indication of the 'difficulty' of a pruning, but also how far away the remaining reference sequences are from the pruned sequences within the tree. Here, we look at two measures for the difficulty of a pruning: The first proxy for the pruning difficulty is the difference in branch lengths between the unpruned and the pruned reference tree. A large difference in branch lengths indicates that long branches or many sequences were pruned; a small difference indicates that few and/or short branches were pruned. Second, we look at the height of the correct placement branch within the tree. Here, the height of the correct branch is defined as the number of nodes on the longest path towards any leaf below it (including the leaf node). Thus, the minimal height of 1 indicates that the correct placement branch is directly above a leaf, while large values indicate that the correct placement branch is located more towards the root of the tree. We performed experiments to examine the relation between these two measures of pruning 'difficulty' and the achieved placement accuracy of all programs on three data sets: bac-150, neotrop-512, and bv-797. We also report Spearman correlation coefficients and p-values between the placement accuracy and each of the two measures. With a significance level of α = 0.05, we observe the following: For the first measure (difference in branch lengths), App-SpaM shows a significant positive correlation on all three data sets. This correlation is also present for all other programs on data set neotrop-512 and for all programs except RAPPAS and APPLES on data set bv-797. For bac-150, only EPA shows the positive correlation. For the second measure, the correlations between the height of the correct branch and the accuracy are not significant for any program on two of the data sets. For the last data set (bv-797), positive correlations are significant for all programs except APPLES and EPA. However, for all experiments, note the limited sample size of 100 prunings. In general, we expect to observe a correlation between a measure that represents the 'difficulty' of a pruning and the accuracy achieved on the prunings. However, what exactly constitutes a 'difficult' pruning can also depend on the placement software used, as different input data sets might pose different demands on the software.

Figure 6: Relation between the difference in branch lengths of pruning and ground truth (x-axis) and the performance of the programs (y-axis). Every dot corresponds to a single pruning event and shows the average node distance of the respective program.
The x-axis of each plot corresponds to the difference in branch lengths between the original and the pruned tree; we regard this as a proxy for the difficulty of the pruning experiment. Read lengths were always fixed at 150. Spearman correlation coefficients and p-values are given for all programs.

Figure 10: Relation between the difference in branch lengths of pruning and ground truth (x-axis) and the performance of the programs (y-axis). Every dot corresponds to a single pruning event and shows the average node distance of the respective program. The x-axis of each plot corresponds to the difference in branch lengths between the original and the pruned tree; we regard this as a proxy for the difficulty of the pruning experiment. Read lengths were always fixed at 150. Spearman correlation coefficients and p-values are given for all programs.
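To make the second difficulty measure concrete, the following sketch computes the height of a branch as defined above for a small toy tree. The tree representation and node names are ours and purely illustrative.

```python
# Minimal sketch of the 'height' of a placement branch as defined above
# (tree given as child lists; the branch above a node has the height of that node).
children = {"root": ["v", "D"], "v": ["A", "w"], "w": ["B", "C"]}

def height(node):
    """Number of nodes on the longest path down to a leaf, counting the leaf (leaves have height 1)."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return 1 + max(height(k) for k in kids)

print(height("A"), height("v"))  # 1 and 3: the branch above 'v' lies two nodes above its deepest leaf
```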

Boxplot Statistics
The main manuscript shows the accuracy results of all six programs (appspam, pplacer, epa, epang, rappas, apples) on eight data sets as box plots in a single figure. The following tables show the corresponding exact values of the box plots (sample size (fixed at n = 100), mean, standard deviation (std), minimal (min) value, maximal (max) value, and quartiles) for all programs and data sets. For every data set, the method with the lowest average value according to these tables is highlighted with a red star in the corresponding figure.
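The statistics listed in these tables (count, mean, std, min, quartiles, max) correspond to what, for example, pandas' describe() reports. The following sketch shows this on invented node-distance values; the column names and numbers are ours, not the actual PEWO output.

```python
import pandas as pd

# Toy node distances for two programs across five prunings (values are made up).
nd = pd.DataFrame({
    "appspam": [0, 1, 2, 0, 3],
    "epang":   [1, 1, 4, 0, 2],
})
print(nd.describe())  # count, mean, std, min, quartiles, max per program
```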

Accuracy with Different Parameters
The accuracy results in the main manuscript show the average node distance of every program for a specific combination of program parameters (the default ones), see Subsec. 1.2. We also used only ND throughout the paper, to ensure comparability between all programs with respect to the same accuracy metric. Here, we show additional results for all programs with different parameters, as well as the e-ND for all programs that support it. Every bar corresponds to the average accuracy across all pruning events. The accuracy for a single pruning event is given as the average over all placed query reads. All parameters on the x-axis labels are abbreviated in accordance with PEWO:

App-SpaM
  mode     placement heuristic (mindist, spamcount, lcadist, spamx)
  w        weight of the patterns
  pattern  number of patterns to use

APPLES
  meth     least-squares method to use
  crit     placement criterion (least-squares phylogenetic placement, minimum evolution, or hybrid)

RAPPAS
  k        size of the phylo-k-mers in the database; larger values are more accurate but slower
  o        probability threshold for RAPPAS
  red      reduction: gap/non-gap ratio above which a site of the alignment is ignored
  ar       software for ancestral state reconstruction

Pplacer (uses a two-step placement heuristic called the "baseball" heuristic)
  ms       max-strikes
  sb       strike-box
  mp       max-pitches

EPA-ng
  h        heuristic used by EPA-ng; 1 is fastest, 2 is slower but more accurate

EPA
  g        proportion of top-scoring branches for which the full optimization is computed

Table 2: Parameter abbreviations used in the x-axis labels of all following plots.

Memory Usage
We performed a run time and memory usage evaluation using PEWO's resources workflow. The reported run times are shown in the main manuscript. Here, we show the results for the memory usage for the two data sets CPU-652 and CPU-512. All shown results are taken from the pss-max column of the results reported by PEWO. PSS measures the proportional set size, which gives a reasonable estimate of the total memory usage (shared libraries between processes are only counted once, in contrast to RSS). For more detailed information we refer to the extensive PEWO manual on Github. The memory usage of App-SpaM is mainly dominated by two data structures whose sizes directly depend on the number and lengths of the input references and queries. First, the storage of the spaced words: each spaced word is stored in 28 bytes. This includes all nucleotides at the match and don't-care positions, as well as the originating sequence and the position within that sequence. Thus, for input files that encode every symbol in a single byte, App-SpaM's memory footprint for storing all spaced words can be estimated as 28 times the size of the input files. Here, only the input of the references is relevant, since queries are read and processed in small batches. Second, a structure that stores statistics (the number of spaced words, mismatches, etc.) between all references and the currently handled queries. With n input references and m simultaneously handled queries, this structure needs roughly 16 · n · m bytes.
For short reference sequences, such as single marker genes, the second structure will dominate the memory footprint of App-SpaM. For long reference sequences the storage of the spaced words dominates the overall memory usage.
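Combining the two estimates above, a rough back-of-the-envelope calculation of App-SpaM's memory footprint can be sketched as follows. The function name and example numbers are ours; the 28-bytes-per-spaced-word and 16 · n · m figures are those given above.

```python
def estimate_memory_bytes(reference_fasta_bytes: int, n_references: int, query_batch_size: int) -> int:
    """Rough estimate of App-SpaM's two dominant data structures."""
    spaced_words = 28 * reference_fasta_bytes          # roughly one spaced word per reference position
    statistics = 16 * n_references * query_batch_size  # per-reference statistics for the current query batch
    return spaced_words + statistics

# e.g. a 5 MB reference file with 150 references and batches of 2000 queries
print(estimate_memory_bytes(5_000_000, 150, 2000) / 1e6, "MB")  # ~144.8 MB
```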

Run Time Large Showcase
Besides the results shown in the main manuscript, we performed run time evaluations for a large number of query reads on the tara-3748 data set. Here, we simulated 3748 · r reads for r ∈ {1, 10, 100, 1000, 10000}, i.e. r reads for each of the 3748 reference sequences of the data set, resulting in up to 37,480,000 query reads. We used pattern weights of 12 and 16 and always used 30 threads. All experiments were carried out twice. We measured the placement time needed by App-SpaM with the time command on Linux. We report the results for real and user time. Real is the wall-clock time, i.e. the actual time that elapsed during the execution of the program; this includes not only the time of the process itself, but also potentially time spent by the system or other processes. User is the CPU time spent on App-SpaM, summed across all 30 cores.

Weight = 12

real    user     weight  threads  r      block
209     378      12      30       1      2000
208     375      12      30       1      2000
387     4398     12      30       10     2000
400     4240     12      30       10     2000
1693    41161    12      30       100    2000
1749    41957    12      30       100    2000
14649   400296   12      30       1000   2000
16083   440223   12      30       1000   2000
43607   1110812  12      30       10000  2000
41118   1097684  12      30       10000  2000

Weight = 16

real    user     weight  threads  r      block
160     301      16      30       1      2000
166     294      16      30       1      2000
343     3097     16      30       10     2000
304     3097     16      30       10     2000
1442    33963    16      30       100    2000
1437    32883    16      30       100    2000
11654   324631   16      30       1000   2000
11323   312966   16      30       1000   2000
35181   843120   16      30       10000  2000
36089   865720   16      30       10000  2000

Additional Results - ART

We extended the PEWO PAC workflow by using the program ART [5], which simulates sequencing reads for common sequencing platforms. In the PAC workflow, PEWO generates query reads by splitting the pruned reference sequences into non-overlapping segments of the specified query length and removes all resulting queries that are shorter than 50 bp. Instead, we performed additional accuracy evaluations with Illumina query reads simulated by ART. ART uses a realistic model to simulate sequencing errors for the query reads. For these test runs, we again used the three data sets bac-150, hiv-104, and neotrop-512 with 50 prunings each. In every pruning run, and for every removed reference sequence, we used ART to simulate 50 query reads of length 150 bp with the Illumina HiSeq 2500 error profile and default parameters. We could not find the explicit error profiles used in this case, but we expect that they were estimated similarly to other profiles given in the paper. Thus, we expect the reads to have roughly 0.0011 nucleotide substitutions per sequence position on average. The default insertion rate is 0.00009 and the default deletion rate is 0.00011. The results that we obtained for query reads simulated with ART are shown in Fig. 25. Note that, unlike the default version of PEWO, the program ART simulates sequencing errors. As can be seen in the figure, the performance of App-SpaM with the placement heuristic SpaM-4 is hardly affected by the introduced sequencing errors. All other programs show a significant drop in their placement accuracy if ART is used to simulate reads, compared to the reads simulated by PEWO by default.
We are unsure about the reason for these pronounced differences. In general, we think that App-SpaM is only slightly influenced by substitution errors, due to the design of the spaced words. This is supported by the figure, which shows only a small decrease in accuracy from PEWO reads to ART reads. However, this raises the question of why all other programs show such a large drop in accuracy. A possible explanation for the different test results on simulated reads with and without sequencing errors is as follows: Our spaced-word approach is generally less affected by nucleotide mismatches than methods that rely on exact word matches or on exact alignments. Also, PEWO uses the program hmmalign to align the queries to the reference MSA; in the presence of simulated errors, the HMM-profile-based alignments might be imprecise and hence lead to inaccurate placement results. We hope that PEWO will support the simulation of sequencing errors in the query reads with ART or a similar program by default in the future. More thorough tests are needed to verify or refute the results presented here.