OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes

Abstract Organellar (plastid and mitochondrial) genomes play an important role in resolving phylogenetic relationships, and next-generation sequencing technologies have led to a burst in their availability. The ongoing massive sequencing efforts require software tools for routine assembly and annotation of organellar genomes as well as their display as physical maps. OrganellarGenomeDRAW (OGDRAW) has become the standard tool to draw graphical maps of plastid and mitochondrial genomes. Here, we present a new version of OGDRAW equipped with a new front end. Besides several new features, OGDRAW now has access to a local copy of the organelle genome database of the NCBI RefSeq project. Together with batch processing of (multi-)GenBank files, this enables the user to easily visualize large sets of organellar genomes spanning entire taxonomic clades. The new OGDRAW server can be accessed at https://chlorobox.mpimp-golm.mpg.de/OGDraw.html.


INTRODUCTION
Organellar genomes display relatively conserved gene contents, are usually transmitted uniparentally (most often maternally) and are, therefore, excluded from sexual recombination. In most taxonomic groups, the mitochondrial and plastid genomes are small and occur in many copies per cell, which makes them convenient and cheap targets of sequencing projects (1-3). Moreover, these properties make organellar genomes extremely informative in resolving taxonomic relationships, and it can be expected that the enormous increase in published organellar genome sequences will continue for the foreseeable future (2,3). As next-generation sequencing technologies led to a massive increase in available sequence information, they also pushed forward technology development in the area of organellar genome assembly and annotation (1,3). There are many assembly methods and pipelines [e.g., GetOrganelle (4) or IOGA (5)], and we are currently aware of no fewer than twelve (semi-)automatic annotation tools for organellar genomes (6-15, https://git.metabarcoding.org/org-asm/org-annotate, http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl). These range from specialized applications such as MITOS (11), which were designed for a subset of organellar genomes and whose output requires little to no quality control or manual curation, to GeSeq, a flexible tool that allows the annotation of essentially any organellar genome (15). In addition, command line tools such as Plann (13) that can be incorporated into assembly pipelines are available, as well as annotation tools that have implemented downstream data processing (e.g. for phylogenetic analyses), such as Verdant (14).
The large diversity of sequencing and annotation software for organellar genomes is in stark contrast to the very small number of tools suitable to visualize finalized genome records. With the exception of metazoan mitochondrial genomes (16), most organellar genomes are too large for standard plasmid drawing programs and hence difficult to display graphically. Before the launch of OrganellarGenomeDRAW (OGDRAW) in 2007 (17) and GenomeVx in 2008 (18), organelle genome maps were often drawn manually, lacked a homogeneous design and were inconsistent in feature display. These shortcomings were overcome by the two programs, and OGDRAW quickly became the standard in the field. It is currently the only tool that is widely used to generate graphical maps of organellar genomes. As of January 2019, OGDRAW has received >1000 citations (Google Scholar), and the OGDRAW server generates ∼120 maps per day. When the underlying operating system of the original OGDRAW server became outdated (CentOS v6.7, end of life May 2017), we decided to incorporate OGDRAW into the software toolbox CHLOROBOX (https://chlorobox.mpimp-golm.mpg.de) that has been developed at the Max Planck Institute of Molecular Plant Physiology in Potsdam-Golm (MPI-MP). CHLOROBOX offers software applications for the analyses of (mainly plant-derived) nucleic acid and protein sequences. As described in detail below, in the course of moving OGDRAW to its new environment, we added several new features to the program (in its new version 1.3.1) and fixed known bugs.

Program description
OGDRAW converts annotations in GenBank format to graphical maps. The input file must be a GenBank flat file (https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html), whereas the output file can be generated in different file formats. OGDRAW can produce vector graphics or bitmap graphics, the latter at a range of resolutions. Maps of both circular and linear genomes can be drawn. Coding regions and other feature-bearing regions of organellar genomes (and other DNA molecules such as plasmids) are visualized, and gene expression data can be displayed. The program can also display cut sites of restriction enzymes. For details on the basic functionality and implementation of OGDRAW, the interested reader is referred to the publications describing OGDRAW v1.0 in 2007 (17) and v1.2 in 2013 (19).
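To illustrate the kind of information a map-drawing tool reads from a GenBank flat file, the following minimal Python sketch extracts name, coordinates and strand of simple 'gene' features from a feature table. It is not the OGDRAW implementation: the function name `parse_genes` and the demonstration record `SAMPLE` are hypothetical, and the sketch handles only plain `start..end` and `complement(start..end)` locations, not joins or multi-line locations.

```python
import re

# Hypothetical miniature record, for demonstration only.
SAMPLE = """\
LOCUS       DEMO                     120 bp    DNA     circular PLN 01-JAN-2019
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="Hypothetical plant"
     gene            10..40
                     /gene="psbA"
     gene            complement(60..100)
                     /gene="rbcL"
ORIGIN
//
"""

SIMPLE_LOCATION = re.compile(r"(complement\()?(\d+)\.\.(\d+)\)?")


def parse_genes(text):
    """Return (name, start, end, strand) tuples for simple 'gene' features."""
    genes = []
    in_features = False
    current = None  # (start, end, strand) of a gene awaiting its /gene qualifier
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            break  # the next top-level keyword (e.g. ORIGIN) ends the table
        stripped = line.strip()
        if len(line) > 5 and line[5] != " ":
            # Feature keys are indented by five spaces; qualifiers by more.
            key, _, location = stripped.partition(" ")
            current = None
            match = SIMPLE_LOCATION.fullmatch(location.strip())
            if key == "gene" and match:
                strand = "-" if match.group(1) else "+"
                current = (int(match.group(2)), int(match.group(3)), strand)
        elif current and stripped.startswith("/gene="):
            name = stripped.split("=", 1)[1].strip('"')
            genes.append((name, *current))
            current = None
    return genes


print(parse_genes(SAMPLE))
```

For production use, an established GenBank parser is preferable, since real records contain compound locations and qualifiers that wrap over several lines.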

The new front end
We equipped OGDRAW with a new front end that, from a technical perspective, greatly increases its user friendliness. As the previous version was a server-side rendered implementation, the user had to click through several pages, including file upload, parameter input and result download. Comparing different input parameters was a cumbersome task, since for every job, the whole process needed to be restarted. By contrast, the new version of OGDRAW represents a state-of-the-art single-page application (SPA) with asynchronous client-server communication. Among other useful features, the site now caches previously set parameters. This facilitates a much easier comparison of outputs, for example, when comparing maps of the same organellar genomes with different resolutions.

The new GUI
The graphical user interface (GUI) of OGDRAW v1.3.1 was adjusted to the CHLOROBOX design and consists of three columns (Figures 1 and 2). In the first column (I), the user can select the mode (standard or transcript; box Ia) and upload the required GenBank files. Genome conformation (circular or linear) and sequence source (plastid, mitochondrial, or other) are automatically extracted from the GenBank entry (box Ib). Here, the user can also select the 'tidy up' option. Upon selecting this option, OGDRAW will ignore very long gene names that likely represent annotation errors. In addition, it will reformat many gene names that do not meet nomenclature conventions [for details, see (17,19)].
In the second column (II), depending on the mode chosen, the user can upload either a custom configuration as an XML file (Figure 1, box IIa) or gene expression datasets (Figure 2, box IIa). Customizing a configuration file offers the possibility to display features that are not included in the standard configuration of OGDRAW such as CDS, mRNA or misc_feature. A custom configuration file further allows modifying the colour of a gene and/or the name of the gene product to be displayed in the map (17,19). In the 'Genes and Features' box (IIb), the user can select/deselect gene or feature classes (standard mode) or single genes (transcript mode), to be displayed or hidden in the map. In the box underneath (box IIc), detection methods for the inverted repeat (IR) regions present in most chloroplast genomes can be chosen. The last box (IId) of column II allows selection of restriction enzymes whose recognition sites are to be displayed in the map (allowing generation of a combined physical and restriction map). In column III, 'Map Options' (box IIIa) and 'Output Options' (box IIIb) can be specified. 'Map Options' enable, for example, inclusion of a graph of the GC content (available for circular maps) or the possibility to zoom into a specific region of the organellar genome, an option available for linear maps. In the transcript mode, the colour code for up- and down-regulated transcripts can be adjusted. With the 'Output Options', the user can choose between various kinds of vector and bitmap output file formats, and also specify the resolution of bitmap files. The 'Action' box (IIIc) submits and resets jobs, but also allows loading example jobs that include demonstration files that can be downloaded for inspection by the user. In the 'Results' box (IIId), the output files such as the physical map (Output-Graph) can be downloaded individually by clicking on the respective symbol. By clicking the floppy disk symbol, all result and input files can be conveniently downloaded as a single zip archive.

Figure caption (partial): Genes inside the circle are transcribed clockwise, genes outside the circle counterclockwise. The circle inside the GC content graph marks the 50% threshold. Note that in contrast to earlier versions of OGDRAW (right box), intron-containing genes are now marked by an asterisk (*) and introns are no longer directly drawn into the genes. Please also note that the inverted repeat A (IR A) is now designated as the right one on the map. For details see text.
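Displaying restriction sites in a map requires locating the recognition sequence of each selected enzyme in the genome sequence. The following Python sketch shows one simple way to do this, including sites that span the start/end junction of a circular molecule. It is an illustration only, not the OGDRAW code: the function name `find_sites` is hypothetical, and the sketch assumes a palindromic recognition sequence (such as GAATTC for EcoRI), so that searching one strand is sufficient.

```python
def find_sites(seq, site, circular=True):
    """Return 1-based start positions of a recognition sequence in seq.

    Assumes a palindromic site, so only the given strand is searched.
    For circular molecules, the start of the sequence is appended to the
    end so that sites spanning the junction are found as well.
    """
    seq = seq.upper()
    site = site.upper()
    search = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    index = search.find(site)
    while index != -1:
        positions.append(index + 1)  # convert to 1-based coordinates
        index = search.find(site, index + 1)
    return positions


# Hypothetical 18-bp demonstration sequence with one internal EcoRI site
# and one site that spans the junction of the circular molecule.
demo = "TTCAAAGAATTCAAAGAA"
print(find_sites(demo, "GAATTC", circular=True))
print(find_sites(demo, "GAATTC", circular=False))
```

Non-palindromic recognition sequences would additionally require a search with the reverse complement of the site.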

The new features
With respect to the previous version v1.2, [...] the NCBI record) common names, the NCBI RefSeq accession number, or combinations thereof. (ii) On the new OGDRAW server, the user can upload and process several individual as well as multi-GenBank files. In combination with selection of the GenBank files from the local NCBI RefSeq database, the large numbers of available organelle genomes for entire taxonomic clades can be visualized easily and very quickly. These large datasets can be downloaded as a single zip file (see above). (iii) As new output options, modern vector graphic formats such as SVG and PDF were implemented. (iv) OGDRAW v1.3.1 in its default parameters follows the convention to label intron-containing genes in organelle genomes with an asterisk (*). This optional feature can, however, be deselected in the 'Map Options' box. If the user then selects 'intron' in the 'Genes and Features' box, the default parameters of previous OGDRAW versions are applied in that introns are directly drawn into genes as an empty box (Figure 3). (v) In agreement with current annotation practice, operons can be displayed in the map as polycistronic transcription units, if annotated in the GenBank entry with the feature keys 'prim_transcript' (17) and (as a newly included feature key) 'operon'. (vi) The D-loop of metazoan mitochondrial genomes is now drawn by default (Figure 4). (vii) Genes or features that span start and end of submitted linear sequences are now displayed correctly by OGDRAW. While this feature is of minor relevance to the visualization of finalized annotations of organellar genomes (but also see Figure 4), its implementation became necessary to improve communication between OGDRAW and our organelle genome annotation pipeline GeSeq (15).
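Batch processing of multi-GenBank files, as mentioned under (ii), relies on the fact that each record in a GenBank flat file is terminated by a line containing only `//`. The following Python sketch shows how such a file could be split into individual records before each one is drawn. The function name `split_records` and the demonstration string are hypothetical, and the sketch makes no claim about how OGDRAW itself handles this step.

```python
def split_records(text):
    """Split multi-GenBank text into a list of single-record strings.

    Each record ends with a line consisting of '//'. Blank lines between
    records are skipped; trailing text without a terminator is ignored.
    """
    records = []
    current = []
    for line in text.splitlines():
        if not current and not line.strip():
            continue  # skip blank lines between records
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    return records


# Hypothetical two-record input, reduced to the bare minimum.
multi = "LOCUS       REC1\nORIGIN\n//\n\nLOCUS       REC2\nORIGIN\n//\n"
for record in split_records(multi):
    print(record.splitlines()[0])
```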

Bug fixes
A number of bugs were fixed in OGDRAW v1.3.1 and the most important ones are listed below: (i) We revised the calculation of the stretch factor of linear maps. Previously, this feature did not work for many GenBank files. (ii) OGDRAW now accepts GenBank files that contain an N or other IUPAC characters that are different from the four standard nucleotides A, T, G and C (20). Earlier versions of OGDRAW produced incorrect maps from such sequences. (iii) With respect to v1.2, the new version of OGDRAW transposes inverted repeats A and B in chloroplast genomes (IR A and IR B ; Figure 3). By default, IR A is now designated as the right repeat in the map, since nucleotide number 1 (set at ∼3 o'clock by OGDRAW) is usually annotated as the first base of the large single copy region (LSC) flanked by IR A (21,22).
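As an illustration of bug fix (ii), the following Python sketch checks whether a sequence consists only of IUPAC nucleotide codes and reports the positions of characters other than A, T, G and C. The names `IUPAC_DNA`, `is_valid_iupac` and `non_standard_positions` are hypothetical and chosen for this example; the sketch does not reproduce how OGDRAW treats such characters internally.

```python
# IUPAC nucleotide codes for DNA: four standard bases plus ambiguity codes.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")
STANDARD = set("ACGT")


def is_valid_iupac(seq):
    """Return True if every character of seq is an IUPAC DNA code."""
    return set(seq.upper()) <= IUPAC_DNA


def non_standard_positions(seq):
    """Return (1-based position, character) pairs for non-ACGT characters."""
    return [
        (index + 1, base)
        for index, base in enumerate(seq.upper())
        if base not in STANDARD
    ]


print(is_valid_iupac("ACNGTR"))
print(non_standard_positions("ACNGTR"))
```

A sequence that passes `is_valid_iupac` but yields a non-empty list from `non_standard_positions` contains ambiguity codes that a drawing tool has to tolerate rather than reject.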

CONCLUSION
For more than a decade, OGDRAW has provided the community with a user-friendly application to draw maps of organellar genomes. The program has become the standard in the field. The new version presented here (OGDRAW v1.3.1) provides improved functionality and versatility, and further increases user friendliness.