In Silico Whole Genome Sequencer and Analyzer (iWGS): a Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

KEYWORDS genome sequencing; high-throughput sequencing; de novo assembly; experimental design; simulation; nonmodel organism
Whole genome sequences are rich sources of information about organisms that are superbly useful for addressing a wide variety of evolutionary questions, such as measuring mutation rates (Kumar and Subramanian 2002), characterizing the genomic basis of adaptation (Roux et al. 2014), and building the tree of life (Rokas et al. 2003; Salichos and Rokas 2013). Until now, however, organismal diversity has been highly unevenly covered, and most sequenced genomes correspond to model organisms, organisms of medical or economic importance, or ones that have relatively small and simple genomes (Reddy et al. 2015). The rapid advance of DNA sequencing technologies has dramatically reduced the labor and cost required for genome sequencing, which is evidenced by the burst of large-scale genome projects in recent years that includes, for example, the 1000 Fungal Genomes (1KFG) Project (Grigoriev et al. 2011), the Yeast 1000 Plus (Y1000+) Project (Hittinger et al. 2015), the Insect 5K Project (Robinson et al. 2011), and the Genome 10K Project (Genome 10K Community of Scientists 2009). Some of these projects have already begun to fuel important discoveries in evolution and other fields (Zhang et al. 2014). Equally importantly, high-throughput DNA sequencing has made it possible for single investigators to perform de novo genome sequencing in virtually any organism they are interested in (Rokas and Abbot 2009). Such sequencing efforts may target various organisms with a large diversity of genome architectures. Therefore, to achieve optimal results, the choice of sequencing strategy [i.e., the combination of sequencing technology (e.g., Illumina or Pacific Biosciences, hereafter PacBio), sequencing assay (e.g., paired-end or mate-pair), and other variables, such as sequencing depth] and assembly protocols (e.g., assemblers and the associated parameters) should ideally be tailored to the characteristics of a given genome, such as size and GC/repeat content (Nagarajan and Pop 2013).
The vast majority of de novo sequenced genomes have been generated using the Illumina technology, either solely or in combination with other technologies (Reddy et al. 2015). This is largely due to the Illumina technology's ability to quickly generate tens to hundreds of millions of highly accurate short sequence reads of up to 300 bases per run at very low per base cost (Glenn 2011). Additionally, the Illumina technology offers two powerful sequencing assays, paired-end (PE) and mate-pair (MP), which generate sequence read pairs that span short (hundreds of base-pairs) and relatively long (thousands of base-pairs) genomic regions, respectively. Mixing multiple PE and MP libraries with different insert sizes allows for highly flexible sequencing strategies, and several state-of-the-art assembly algorithms have been developed that exploit all these advantages. For instance, the de novo genome assembler ALLPATHS-LG can generate high quality draft assemblies for mammalian-size genomes using only Illumina short-read data by including both MP and overlapping PE libraries (Gnerre et al. 2011). On its own, however, the Illumina technology performs less well for more complex genomes, mainly due to the short lengths of Illumina sequence reads and the technology's bias against certain genomic regions (e.g., GC-rich regions) (Ross et al. 2013).
The PacBio technology generates sequence reads that are substantially longer and have much less sequencing bias, albeit at the cost of a substantially lower per-read accuracy; the average read length increases to above 10 kb with the latest chemistry but displays only 87% accuracy (Koren and Phillippy 2015). Thus, this technology is particularly useful for the sequencing of complex genomes (Koren and Phillippy 2015). Recent developments in both sequencing chemistry and assembly algorithms have enabled PacBio-only de novo assembly for microbial genomes (Koren et al. 2013), but the high sequence coverage required for this approach remains cost-prohibitive for large eukaryotic genomes. Nevertheless, in combination with more affordable Illumina short-read data, PacBio long reads, even at low coverage, can lead to significantly improved assemblies (Utturkar et al. 2014; McIlwain et al. 2016).
De novo genome sequencing projects are further complicated by the large array of assembly software tools, which differ in many aspects, such as algorithmic design, supported/required data types, and computational efficiency (Nagarajan and Pop 2013; Simpson and Pop 2015). Systematic evaluations of assembly programs show that no single assembler is the best across all circumstances; rather, an assembler's performance critically depends on genome complexity and the sequencing strategy adopted (Earl et al. 2011; Bradnam et al. 2013). Moreover, many assemblers use adjustable parameters (e.g., the k-mer size for de Bruijn assemblers), the values of which can critically affect the assembly quality. In practice, such parameters are often selected intuitively or through the time-consuming process of testing multiple values.
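As a toy illustration of why the k-mer size matters, the sketch below counts the distinct k-mers that occur more than once in a sequence; repeated k-mers are what create ambiguous branches in a de Bruijn graph, and they become rarer as k grows past the repeat length. The function name `repeated_kmers` and the example sequence are hypothetical and are not taken from any assembler.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int) -> int:
    """Count the distinct k-mers that occur more than once in seq."""
    if k <= 0 or k > len(seq):
        return 0
    # Slide a window of width k over the sequence and tally every k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for n in counts.values() if n > 1)


if __name__ == "__main__":
    # Hypothetical sequence made of a 4-base unit repeated in tandem.
    toy = "ATGCATGC"
    for k in (3, 4, 5):
        print(k, repeated_kmers(toy, k))
```

Real assemblers must also weigh the opposite pressure: a larger k requires longer error-free overlaps between reads, which is one reason the choice is usually made with a dedicated tool rather than by eye.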
The great number of possible ways to combine sequencing technologies, assays, and assembly algorithms poses a great challenge for the experimental design and data analysis in de novo genome sequencing projects, which in turn can sometimes lead to poor quality or downright incorrect assemblies (Denton et al. 2014). As a consequence, several pipelines have been developed to automate specific steps in the process; for example, the recently developed iMetAMOS (Koren et al. 2014) and RAMPART (Mapleson et al. 2015) have been specifically designed to automate genome assembly. However, as de novo genome sequencing is increasingly adopted by single investigator laboratories, there is an urgent need for streamlined approaches that enable investigators to not only efficiently generate high-quality draft genome assemblies but also to predict (via simulation) and identify the most suitable design(s) [i.e., the most suitable combination(s) of sequencing strategy and assembly protocol] currently available for a specific genome.
To address this need, we have developed an automated pipeline for the design and execution of de novo genome sequencing projects that we name iWGS (in silico Whole Genome Sequencer and Analyzer). To approximate the performance of different sequencing strategies and assembly protocols, iWGS simulates high-throughput genome sequencing on user-provided reference genomes (e.g., genomes that closely represent the characteristics of the real targets), facilitating the identification of optimal experimental designs. iWGS allows users to experiment with various combinations of sequencing technologies, assays, assembly tools, and relevant parameters in a single run. iWGS is also designed to work with real data and can be used as a convenient tool for automated selection of the best assembly or genome assembler. Finally, using three case studies, each one focused on specific challenges frequently encountered in de novo genome sequencing studies (e.g., high repeat content and biased nucleotide composition), we illustrate how iWGS can be applied to guide the design and analysis of de novo genome sequencing studies.

RESULTS
The design of iWGS
iWGS encompasses all major steps of a typical de novo genome sequencing study, including the generation of sequence reads, data quality control, de novo assembly, and evaluation of assemblies (Figure 1).
Simulation: iWGS uses the realistic high-throughput sequencing (HTS) read simulators ART, pIRS (Hu et al. 2012), and PBSIM (Ono et al. 2013) to generate Illumina and PacBio sequence reads from a given user-specified genome. These programs can simulate all popular data types, including Illumina PE and MP sequence reads, as well as PacBio continuous long sequence reads. The distributions of read quality and read length are easily adjustable for both Illumina and PacBio data. Furthermore, these simulators mimic sequencing errors and nucleotide composition biases in real data by using empirical profiles of these artifacts, which can be easily customized to stay current with upgrades in sequencing technologies. For instance, we have created a quality-score frequency profile learned from sequence reads generated by the latest PacBio chemistry to better reflect the improved sequence read accuracy. This simulation step can be omitted when the goal is the analysis of real data. Alternatively, users may choose to perform only the simulation and use the simulated data for other analyses.
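When planning a simulated library, the sequencing depth translates directly into a number of reads to generate. The following is a minimal sketch of that arithmetic, assuming a fixed (or mean) read length; the function `reads_for_depth` is a hypothetical helper and is not part of iWGS or of the simulators named above.

```python
import math


def reads_for_depth(genome_size: int, depth: float, read_length: int,
                    paired: bool = False) -> int:
    """Number of reads (or read pairs, if paired=True) needed for a target depth.

    Depth is the total number of sequenced bases divided by the genome size,
    so the total number of bases required is genome_size * depth.
    """
    if genome_size <= 0 or read_length <= 0 or depth < 0:
        raise ValueError("genome_size and read_length must be positive, depth non-negative")
    # A read pair contributes two reads' worth of bases.
    bases_per_unit = read_length * (2 if paired else 1)
    return math.ceil(genome_size * depth / bases_per_unit)


if __name__ == "__main__":
    # Hypothetical example: 1 Mb genome, 50x depth, 100 bp paired-end reads.
    print(reads_for_depth(1_000_000, 50, 100, paired=True))
```

For long-read data, where read lengths vary widely, the same formula applies with the mean read length, and the result should be read as an approximation.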
Quality control: HTS data generated by all technologies contain errors and artifacts, which may sometimes substantially compromise the quality of the assembly (Zhou and Rokas 2014). Therefore, iWGS includes an optional step to perform preprocessing of the data, including trimming of low-quality bases, removal of adapter contaminations, and correction of sequencing errors. Since some assemblers [e.g., ALLPATHS-LG (Ribeiro et al. 2012)] have their own preprocessing modules, iWGS automatically determines for each assembly protocol whether to use the original or the processed data.
Assembly: iWGS supports a selection of popular de novo assemblers (Bradnam et al. 2013; Magoc et al. 2013). These supported assemblers allow users to carry out de novo assembly using only Illumina short-read data (e.g., SOAPdenovo2) and only PacBio long-read data (e.g., Canu and Falcon), or to perform hybrid assembly that uses both (e.g., SPAdes and DBG2OLC). To achieve the best possible results while avoiding the computationally expensive process of testing multiple combinations of parameters, iWGS takes advantage of successful assembly recipes (i.e., recommended settings for each assembler) established in studies such as Assemblathon 2 (Bradnam et al. 2013) and GAGE-B (Magoc et al. 2013), and uses KmerGenie to determine the optimal k-mer size (Chikhi and Medvedev 2014). In addition, assemblies generated from different underlying data and/or assembly algorithms can be merged using Metassembler (Wences and Schatz 2015) to achieve a potentially better final assembly.
Evaluation: iWGS uses QUAST (Gurevich et al. 2013) to evaluate all generated assemblies. In addition to providing basic statistics like N50 (the largest contig/scaffold size wherein half of the total assembly size is contained in contigs/scaffolds no shorter than this value), QUAST compares each assembly against the reference genome (in the case of simulations) and generates a number of highly informative quality metrics, such as misassemblies, assembled sequences not present in the reference (and vice versa), and genes recovered in the assembly if the reference genome is annotated. At the end, iWGS ranks all assemblies based on selected metrics in the QUAST report using a previously described weighting strategy (Abbas et al. 2014). This ranking, along with the detailed QUAST report, helps users to identify the best overall assembly, as well as the corresponding combination of sequencing strategy and assembly protocol. REAPR, which utilizes the sequence data itself for assembly evaluation, is also implemented to better suit real data analysis (Hunt et al. 2013).
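The N50 definition given above can be made concrete with a short sketch. This is an illustrative stand-alone function, not the code used by QUAST or iWGS, and the name `n50` is chosen here for convenience.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L together
    contain at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths (in kb) for a toy assembly.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A "corrected" N50, as reported when a reference genome is available, applies the same calculation after contigs or scaffolds have been broken at detected assembly errors.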
iWGS is designed with flexibility and ease-of-use in mind to allow users to readily examine various experimental designs; each data set may be used multiple times in different assembly protocols, and each assembler may be run repeatedly with different input data sets. Multiple sequencing strategies and assembly protocols can be specified in a straightforward fashion in a single configuration file; only a few parameters are required for each strategy/protocol, while other settings (e.g., quality profiles for read simulation) are globally shared across strategies/protocols of the same type. Alternatively, advanced users can opt to customize the strategies/protocols so that, for example, each sequencing data set is simulated with different quality settings. Furthermore, iWGS rigorously checks the configurations for issues such as the compatibility between sequencing strategies and assembly protocols.
Figure 1 The four steps of the iWGS workflow: (1) data simulation (optional); (2) preprocessing (optional); (3) de novo assembly; and (4) assembly evaluation. iWGS supports both Illumina short reads and PacBio long reads, and a wide selection of assemblers to enable de novo assembly using either or both types of data. Users can start the analysis by simulating data drawn from a reference genome assembly or, alternatively, use real sequencing data as input and skip the simulation step. iWGS, in silico Whole Genome Sequencer and Analyzer; MP, mate-pair; PacBio, Pacific Biosciences; PE, paired-end.
iWGS is a lightweight pipeline written in Perl. The source code, detailed documentation, and example test sets are freely available at https://github.com/zhouxiaofan1983/iWGS. Like many other bioinformatics pipelines, iWGS inevitably relies on a number of third-party software tools to carry out individual analyses such as data simulation and genome assembly. However, most of the tools, including at least one for each of the four major steps described above, either have precompiled executables or can be compiled locally with ease. For the convenience of users, we also include in the package scripts to automate the acquisition and installation of most software dependencies. Users can also customize the selection of tools to install according to their own needs and computational environments.

Case studies
To demonstrate the use of iWGS and provide examples of its utility, we developed three case studies where iWGS was used to guide the selection of sequencing strategy for genomes representing a wide range of sizes and complexity levels (Supplemental Material, Table S1). The competing strategies were selected to enable both Illumina-only and PacBio-only assemblies, as well as hybrid assembly of the two data types (Table 1). To examine the effectiveness of the simulation step of our approach, we also analyzed real sequencing data that largely match our simulation settings.
Case study I (repeat-content issue): We first compared the sequencing of two fungi, Zymoseptoria tritici (synonym: Mycosphaerella graminicola) and Pseudocercospora fijiensis (synonym: M. fijiensis) (Ohm et al. 2012), which both belong to the class Dothideomycetes yet have dramatically different repeat contents; the estimated repeat contents are 15 and 50% for the two genomes, respectively. Our simulations showed that, while good quality assemblies can be obtained for Z. tritici using either data type, the PacBio-only assembly for Ps. fijiensis vastly outperforms assemblies based on Illumina data alone or in combination with low-coverage PacBio data (Figure 2). The results are consistent with the notion that PacBio long reads are particularly powerful in resolving repeats (Koren et al. 2013). We then further tested if these results are informative for guiding the sequencing of another highly repetitive Dothideomycetes genome, Cenococcum geophilum, which has a repeat content of 76% (http://genome.jgi.doe.gov/Cenge3). For C. geophilum, the PacBio-only assembly was again found to be the best, while the hybrid assembly using DBG2OLC and the Illumina-only assembly using ALLPATHS-LG were next in rank (Figure 2 and Table S2), nicely recapitulating the results of Ps. fijiensis. We also performed meta-assembly of Illumina-only assemblies ILMN1 to ILMN7 (Table 1) on all three genomes using Metassembler. While the meta-assembly approach substantially improved the assembly continuity for Z. tritici, no improvement was observed for Ps. fijiensis and C. geophilum (Figure 2 and Table S2). These results suggest that the use of iWGS would provide critical information to help end users choose a successful sequencing strategy for highly repetitive genomes that share similar characteristics. Importantly, since simulated assemblies are recoverable, the likely impact of the different assembly strategies on genes, gene families, or pathways of interest could also be examined in detail.
Case study II (GC-content and mtDNA assembly issue): We next examined the de novo assembly of mitochondrial genomes from whole genome sequencing data of Saccharomyces cerevisiae (Mewes et al. 1997; Foury et al. 1998). Yeast mitochondrial genomes are valuable resources for evolutionary and functional studies (Freel et al. 2015), yet the acquisition of finished mitochondrial genome assemblies is not trivial because of their very low GC-content (17%). We simulated a genome sequencing experiment using the nuclear and mitochondrial genomes of S. cerevisiae. We tested two ratios of nuclear to mitochondrial genome copy numbers representing low (1:50) and high (1:200) mitochondrial contents, respectively (Solieri 2010). iWGS analysis showed that the S. cerevisiae mitochondrial genome was fully recovered at both low and high mitochondrial contents using Illumina data (Table 2). Consistent with recent observations made during the assembly of the S. eubayanus genome, only certain assemblers performed well; for example, ALLPATHS-LG performed surprisingly poorly, while SPAdes performed quite well (Baker et al. 2015). Importantly, the complete mitochondrial genome can be obtained as a single contig using only Illumina or only PacBio data, or using both data types (Table 2). Similarly, both Illumina and PacBio data resulted in good quality assemblies of the nuclear genome (Table S2). At the same time, different assemblers exhibited widely different performances even with the same input data (Table 2).
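To see why the nuclear-to-mitochondrial copy number ratio matters for assembly, it helps to estimate what share of the sequenced bases is expected to come from mtDNA. The sketch below assumes reads are sampled uniformly from all DNA molecules; the function `mito_read_fraction` is a hypothetical helper, and the genome sizes used in the example are approximate values supplied here for illustration, not figures from this study.

```python
def mito_read_fraction(nuclear_size: int, mito_size: int, mito_copies: float) -> float:
    """Expected fraction of sequenced bases that derive from mtDNA.

    Assumes uniform sampling and `mito_copies` mitochondrial genome copies per
    nuclear genome copy (e.g., 50 for a 1:50 nuclear-to-mitochondrial ratio).
    """
    mito_bases = mito_size * mito_copies
    return mito_bases / (nuclear_size + mito_bases)


if __name__ == "__main__":
    # Approximate, assumed sizes for S. cerevisiae: ~12.1 Mb nuclear, ~86 kb mtDNA.
    nuclear, mito = 12_100_000, 86_000
    for copies in (50, 200):
        print(f"1:{copies} -> {mito_read_fraction(nuclear, mito, copies):.2f}")
```

Under these assumptions the mitochondrial genome is sequenced far more deeply than the nuclear genome, which may be one reason assemblers tuned to a single expected coverage behave so differently on it.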
Case study III (genomic architecture issue): Lastly, we applied iWGS to three model eukaryotic genomes from different kingdoms and with different genomic architectures. Specifically, we analyzed Drosophila melanogaster (Adams et al. 2000) and Arabidopsis thaliana (Arabidopsis Genome Initiative 2000), which are medium-sized animal and plant genomes, respectively, as well as Plasmodium falciparum 3D7 (Gardner et al. 2002), a smaller protist genome with extremely low GC-content. For all three genomes, the best assembly was generated by using only PacBio data (Table 3). In D. melanogaster and A. thaliana, several Illumina-only assemblies were of relatively high quality (i.e., corrected scaffold N50 ≥ 100 kb; Table S2), among which the best two were generated by ALLPATHS-LG and DISCOVAR (Table 3). However, all Illumina-only assemblies of Pl. falciparum 3D7 had considerably lower corrected scaffold N50 values, except for DISCOVAR, whose sequencing strategy is unique in requiring a PE library with a limited insert size.
To examine how well the simulation-based predictions made by iWGS are supported by empirical data, we collected four real genome sequencing data sets from a previous study of Pl. falciparum IT [one overlapping 100 bp PE library, one overlapping 250 bp PE library, one MP library, and one PacBio library from Otto et al. (2014); Table S1] that were a good match to our simulated data sets, and ran the same set of assembly protocols. The best assembly was again generated by PacBio data alone, and the assemblies generated by ALLPATHS-LG, DISCOVAR (both are Illumina-only), and DBG2OLC (hybrid) were ranked next, while all other Illumina-only assembly protocols performed poorly (Table 3 and Table S2). The results are largely consistent with our simulation study, suggesting that our simulation-based approach is indeed informative.

Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

DISCUSSION
The design and analysis of de novo genome sequencing experiments is not trivial. On the design front, one has to balance the complexity of the target genome, the strengths and weaknesses of each sequencing technology, and, importantly, the cost. Analysis is also challenging, as one is faced with multiple different algorithms and dozens of parameters. Although substantial efforts have been made to benchmark different approaches for genome assembly (Earl et al. 2011; Salzberg et al. 2012; Bradnam et al. 2013; Magoc et al. 2013), much less attention has been paid to investigating start-to-finish optimal sequencing strategies for a given genome [see Chakraborty et al. (2016) for one example].
Figure 2 Performance comparison of five representative experimental designs on three Dothideomycetes genomes. The five designs shown include three Illumina-only designs (ILMN2: ALLPATHS-LG, META: Metassembler, and ILMN8: DISCOVAR), the best performing PacBio-only design (PACB2: Canu), and the best performing hybrid design (HYBR2: DBG2OLC) for each genome. The statistics on the assembled fraction of the reference genome, scaffold N50, and largest scaffold size are all after correction for assembly errors using the reference genome as reported by QUAST in GAGE mode. By default, QUAST (in GAGE mode) corrects contigs/scaffolds by breaking them at assembly errors larger than 5 bp. Scaffold N50 and largest scaffold size are shown in log10 scale.
iWGS is an automated tool that allows users to explicitly compare alternative experimental designs by using simulated sequencing data, even allowing users to estimate costs when these are known for the generation of each data type. We have illustrated the utility of iWGS in several case studies on mitochondrial and nuclear genomes with varying levels of complexity. For instance, our simulations suggest that Illumina-only sequencing strategies may be economical choices for the sequencing of relatively simple genomes (e.g., Z. tritici; Table S2), whereas PacBio data would be highly desirable for genomes of greater complexity (e.g., Ps. fijiensis, C. geophilum, and Pl. falciparum). Although not done here, iWGS could also be used to evaluate different combinations of sequencing assays (e.g., PE and MP libraries), read quality, read lengths, and sequencing depths. Empirical studies of both short- and long-read data have shown that these parameters are critical determinants of the quality of de novo genome assemblies (Utturkar et al. 2014; Chakraborty et al. 2016).
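The cost comparison mentioned above reduces to simple arithmetic once a per-gigabase price is known for each data type. The sketch below is one way to tabulate it; the function `design_cost`, the library labels, and all prices are hypothetical placeholders, and this is not a feature of iWGS itself.

```python
def design_cost(genome_size, libraries, cost_per_gb):
    """Estimate the data-generation cost of one sequencing design.

    genome_size: genome size in bases.
    libraries:   list of (data_type, depth) tuples, one per library.
    cost_per_gb: dict mapping data_type to cost per gigabase of sequence.
    """
    total = 0.0
    for data_type, depth in libraries:
        # Bases needed for this library, expressed in gigabases.
        gigabases = genome_size * depth / 1e9
        total += gigabases * cost_per_gb[data_type]
    return total


if __name__ == "__main__":
    # Placeholder prices in arbitrary currency units per gigabase (assumed).
    prices = {"illumina": 2.0, "pacbio": 10.0}
    designs = {
        "illumina_only": [("illumina", 100)],
        "hybrid": [("illumina", 50), ("pacbio", 10)],
        "pacbio_only": [("pacbio", 50)],
    }
    for name, libs in designs.items():
        print(name, design_cost(40_000_000, libs, prices))
```

Pairing such estimates with the assembly rankings lets a user ask whether the best-ranked design is worth its additional cost over the runner-up.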
One key function of iWGS is the use of simulation data generated from a related reference genome to inform the experimental design for organisms lacking genomic data. A similar concept was previously used to evaluate sequencing strategies for cacao by using the rice genome as the reference (Haiminen et al. 2011). In principle, one could apply iWGS on one or more related reference genomes that resemble the characteristics (e.g., genome size, repeat content, and sequence composition) of the sequencing target. However, if such a reference genome is lacking, one solution is to start with a closely related reference genome and tune it toward the target (e.g., adjust GC- and repeat contents) by using third-party tools that simulate genome-wide evolution (Arenas and Posada 2014) before running iWGS. Alternatively, one may simply use iWGS with reference genomes that are of comparable complexity (e.g., similar in size and repeat content) regardless of the evolutionary relatedness. As suggested by previous studies, these factors not only influence the difficulty of genome assembly, but can also be excellent predictors of the assembly quality (Lee et al. 2014). Therefore, iWGS could also be informative in evaluating the performance of alternative experimental designs on genomes with similar characteristics to the sequencing target.
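When screening candidate proxy genomes, two of the characteristics named above, genome size and sequence composition, can be summarized directly from a FASTA file. The following is a minimal sketch; `fasta_stats` is a hypothetical helper, and estimating repeat content would require a dedicated tool that is not shown here.

```python
def fasta_stats(lines):
    """Return (total_length, gc_fraction) for an iterable of FASTA lines.

    total_length counts every sequence character; gc_fraction is computed over
    unambiguous bases only (A, C, G, T), so runs of N do not dilute it.
    """
    total = gc = at = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        seq = line.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    gc_fraction = gc / (gc + at) if (gc + at) else 0.0
    return total, gc_fraction


if __name__ == "__main__":
    # Toy records; for a real file, pass an open file handle instead,
    # e.g. open("reference.fasta") (hypothetical file name).
    toy = [">chr1", "ACGTACGT", "GGCCNNNN", ">chr2", "ATATATAT"]
    size, gc = fasta_stats(toy)
    print(size, round(gc, 3))
```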
Other important features of iWGS include the support for both Illumina short, and PacBio long, sequence reads and, correspondingly, a wide selection of software tools compatible with these data types, as well as the ability to analyze real data. In comparison, the support for third generation sequencing data is relatively limited in iMetAMOS and currently lacking in RAMPART. Given the increasing importance of long sequence reads in de novo genome assembly, iWGS aims to allow users to fully exploit the strength of long-read data and explore alternative ways of data analysis. Along these lines, several further developments can be envisioned. First, support for additional sequencing technologies, such as Oxford Nanopore, can be added as technologies become commercially available. In fact, the Celera Assembler, Canu, and SPAdes assemblers, which are supported by iWGS, can already utilize nanopore reads (Bankevich et al. 2012; Berlin et al. 2015). Similarly, realistic simulation of nanopore data will be possible once the patterns of errors and biases are better characterized using real data. Second, iWGS will continue to expand its functionality to achieve better assemblies. For instance, a number of assembly polishing tools can be integrated in iWGS to improve the quality of the final output, including Pilon (Walker et al. 2014), Quiver (Chin et al. 2013), and Nanopolish (Loman et al. 2015), which use Illumina, PacBio, and nanopore data, respectively. In addition, iWGS currently uses Metassembler for meta-assembly; in the future, other meta-assembly tools that support assemblies based on PacBio data alone, such as quickmerge (Chakraborty et al. 2016), could be added. Lastly, it would be beneficial to enable users to add new software tools to iWGS in order to stay up-to-date with the rapid advances in genome assembly and other aspects of HTS data analysis. We intend to provide periodic updates, and the expert user can edit iWGS on their own. In summary, iWGS is a flexible, expandable, and easy-to-use pipeline that will aid in the design and execution of genome assembly experiments across the tree of life.
Table 2
1:50 (low mitochondrial content) | ILMN1, ILMN6, ILMN8, PACB2, HYBR1, HYBR2 | ILMN7 | ILMN2, ILMN4, ILMN5, PACB1, PACB3 | ILMN3
1:200 (high mitochondrial content) | PACB2, HYBR1, HYBR2 | ILMN6, ILMN7 | ILMN1, ILMN8, PACB1, PACB3 | ILMN2, ILMN3, ILMN4, ILMN5
a The de novo assembly generated by each strategy was compared against the reference mitochondrial genome of S. cerevisiae using both QUAST and BLASTN. Unless a single contig was found to represent the complete mitochondrial genome, the assembled fraction of mitochondrial genome was determined based on the number of "missing reference bases" reported by QUAST, and further confirmed by the BLASTN result.
Table 3
a The statistics for simulation-based analysis of D. melanogaster, A. thaliana, and Pl. falciparum 3D7 are after correction for assembly errors using the reference genome, as reported by QUAST in GAGE mode. By default, QUAST (in GAGE mode) corrects contigs/scaffolds by breaking them at assembly errors larger than 5 bp. The statistics for real data based analysis of Pl. falciparum IT are calculated from the original de novo assemblies.