Galactic Circos: User-friendly Circos plots within the Galaxy platform

Abstract
Background: Circos is a popular, highly flexible software package for the circular visualization of complex datasets. While especially popular in the field of genomic analysis, Circos enables graphing of any analytical data, including data from other scientific domains and non-scientific data. This high degree of flexibility also comes with a high degree of complexity, which may present an obstacle for researchers not trained in programming or the UNIX command line. The Galaxy platform provides a user-friendly browser-based graphical interface incorporating a broad range of “wrapped” command-line tools to facilitate accessibility.
Findings: We have developed a Galaxy wrapper for Circos, thus combining the power of Circos with the accessibility and ease of use of the Galaxy platform. The combination substantially simplifies the specification and configuration of Circos plots for end users while retaining the power to produce publication-quality visualizations of complex multidimensional datasets.
Conclusions: Galactic Circos enables the creation of publication-ready Circos plots using only a web browser, via the Galaxy platform. Users may download the full set of Circos configuration files of their plots for further manual customization. This version of Circos is available as an open-source installable application from the Galaxy ToolShed, and its use is explained in a training manual hosted by the Galaxy Training Network.


Findings
Background
The Circos visualization tool [19] is widely used in the biological scientific community and is especially popular in scientific publications. Circos has over 4000 citations, and its plots have appeared on the covers of several leading scientific journals [8]. Its popularity is due in large part to its great flexibility: Circos offers a wide range of visualisation options, and every aspect of a Circos plot can be tweaked and customized to the user's wishes. While originally created for the visualisation of genomic data, Circos makes no a priori assumptions about the format or domain of the input data; this is illustrated by its use in a wide range of applications, from genomics research to visualisations of car sales, urban planning, and even presidential debates [9].
With Circos's great flexibility also come a high degree of complexity and a steep learning curve; as a result, its use is often limited to expert users who are experienced with programming and the UNIX command line.
The Galaxy platform [11] aims to provide a user-friendly interface to command-line tools, empowering domain experts to run powerful analysis and visualization tools without any programming experience. Galaxy offers a wide range of tools for a variety of application domains and is widely used in the biological scientific community (8900+ citations, 7500+ tools [10,5]). Galaxy also automates the installation of tools and all their dependencies, removing another hurdle for research scientists. Our tool combines the power of Circos with the user-friendliness of the Galaxy interface to greatly increase the accessibility of Circos and simplify the creation of publication-ready plots for scientific data.
Custom Galaxy tools for producing Circos plots have been written previously [18]; however, these tools are not generic but are tailored to the specific use case at hand, so a new Galaxy tool has to be created whenever a new plot type is needed. Galactic Circos aims to be a generic tool capable of creating any Circos plot regardless of data domain.

Results
The Galactic Circos tool changes the way users specify the configuration of a Circos plot. Instead of writing a number of configuration files, users only need to select plot options from a web interface and datasets from their analysis history (Figure 1). Because Circos plot specifications can be quite complex, the tool interface is subdivided into several collapsible sections, each corresponding to a different Circos configuration option, to improve usability. Parameters are preconfigured with sensible default values, so that basic plots can be generated with minimal configuration.
We demonstrate the utility of the Galactic Circos tool by recreating one of the more advanced examples from the Circos online tutorials, the microbial genome lesson [2] (Figure 2). This plot displays multiple tracks of different types (text, histogram, tiles), has a customized ideogram, and uses rules to colour data points according to their value.
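Value-dependent colouring of this kind is expressed in Circos as a `<rules>` block inside a track definition, where each rule pairs a condition with the parameters to apply when it matches. The sketch below renders such a block from a list of conditions; the helper name `circos_rules` is hypothetical, and we assume the `var(value)` condition syntax of recent Circos releases.

```python
def circos_rules(rules):
    """Render (condition, {parameter: value}) pairs as a Circos <rules> block.

    Hypothetical helper: it only formats text and does not validate the
    conditions, which Circos evaluates itself when drawing the track.
    """
    lines = ["<rules>"]
    for condition, params in rules:
        lines.append("<rule>")
        lines.append(f"condition = {condition}")
        for key, value in params.items():
            lines.append(f"{key} = {value}")
        lines.append("</rule>")
    lines.append("</rules>")
    return "\n".join(lines)


# Colour high values red and low values blue (thresholds are illustrative).
block = circos_rules([
    ("var(value) > 0.75", {"color": "red"}),
    ("var(value) < 0.25", {"color": "blue"}),
])
print(block)
```

The resulting text would be placed inside a `<plot>` block of the track whose data points should be recoloured.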
In a second example (Figure 3), we replicate within Galaxy the cover image of the Nature issue [7] dedicated to the ENCODE project [14]. This cover featured a Circos plot, which is also available as part of the official Circos tutorials [1].
These two examples showcase a variety of track types (histograms, scatterplots, highlights, tiles, text) and configurations (ticks, rules, ideogram customizations), illustrating the feature-completeness of Galactic Circos.
Figure 2: Here we reproduce one of the more complex tutorials from the Circos documentation. The top-left half of the image is produced by the configuration provided by the Circos tutorial, while the bottom-right half is produced entirely in Galaxy. While some options used in the original tutorial cannot be used directly (e.g. unrestricted Perl code), they can be recreated equivalently in the tool interface. Some options in the tool interface are likewise restricted; for example, Galactic Circos offers a color picker with a limited palette, which explains the differences in colour. However, our tool offers the ability to download the full Circos configuration folder, allowing advanced users to tweak the colour (or other) parameters manually and rebuild the image locally. https://usegalaxy.eu/u/helena-rasche/h/circos-microbe-tutorial

Workflow Summarisation
Visualisations in the Galaxy framework are usually implemented as interactive JavaScript components, but these plots cannot be created automatically in workflows. Individual plotting tools exist as Galaxy tools; however, these are less common and generally less flexible, as tool authors must trade off development time against feature support. Drawing on previous experience building single-purpose Circos plotting tools (e.g. as in Figure 4), we invested significant development time in making an extremely generic tool, so that researchers can use Galactic Circos in their workflows. This enables the creation of human-readable summaries of large analysis workflows, similar to the non-genomics-focused iReport [17]. Galactic Circos was born from precisely this use case, and therefore aims to condense the outputs of complex analysis pipelines, such as the workflows required in cancer genomics, allowing bioinformaticians to produce a single image summarising all of their relevant outputs in an easily digestible manner.

Supporting Tools
Circos requires input datasets to adhere to its own specific file format. To facilitate the conversion of data to this format, we have developed several supporting Galaxy conversion tools. These tools allow users to convert their datasets from a variety of common genomics formats, such as (big)Wig files, interval files, and MAF/Stockholm alignments. Furthermore, the existing Galaxy ecosystem provides a wide array of tabular data manipulation tools that can be leveraged to transform any tabular or text file into the format accepted by Circos.
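Circos data tracks are whitespace-separated lines of `chr start end value`, where the chromosome names must match the IDs declared in the karyotype. As a minimal sketch, assuming bedGraph input (a plain-text relative of the Wig formats above), conversion mostly amounts to dropping header lines and renaming chromosomes. The function `bedgraph_to_circos` and its `chrom_map` argument are hypothetical and are not the converters shipped with Galactic Circos; coordinates are passed through unchanged, without adjusting for BED's 0-based convention.

```python
def bedgraph_to_circos(lines, chrom_map=None):
    """Convert bedGraph records into Circos data-track lines (chr start end value).

    chrom_map optionally renames chromosomes (e.g. "chr1" -> "hs1") so they
    match the karyotype IDs used by the plot. Hypothetical helper.
    """
    chrom_map = chrom_map or {}
    out = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and bedGraph track/browser/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        chrom = chrom_map.get(chrom, chrom)
        out.append(f"{chrom} {int(start)} {int(end)} {float(value):g}")
    return out


records = ["track type=bedGraph", "chr1 0 100 2.5", "chr2 10 20 3"]
print(bedgraph_to_circos(records, {"chr1": "hs1", "chr2": "hs2"}))
```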
To demonstrate the utility of these supporting tools, we show a real-world example of a plot built from common genomics datasets. This example recreates a plot from a published paper demonstrating chromothripsis in the VCaP prostate cancer cell line [12]. The input datasets originate from a variety of sources, including a structural variants file (converted to a Circos links track), copy number and B-allele frequency tracks obtained from Affymetrix SNP array data, and a SNP density track generated from a VCF file. Using a combination of the supporting tools included in the Galactic Circos package and the generic file manipulation tools present in Galaxy, we were able to convert these various datasets to Circos-compatible formats without leaving Galaxy, and reproduced the Circos plot from the publication (Figure 4).
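A SNP density track of the kind described above can be derived by counting VCF records in fixed-size genomic windows. The sketch below is a hypothetical stand-in (`snp_density` and the default window size are our assumptions, not the tool used for Figure 4): it reads the first two VCF columns (CHROM and 1-based POS) and emits one Circos histogram line per non-empty window.

```python
from collections import Counter


def snp_density(vcf_lines, bin_size=1_000_000):
    """Count VCF records per fixed-size window and emit Circos histogram lines.

    Hypothetical helper: windows are [start, start + bin_size - 1] in 0-based
    offsets; empty windows are omitted rather than written as zero.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip VCF meta-information and header lines
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])  # POS is 1-based in VCF
        counts[(chrom, (pos - 1) // bin_size)] += 1
    out = []
    for (chrom, b), n in sorted(counts.items()):
        start = b * bin_size
        out.append(f"{chrom} {start} {start + bin_size - 1} {n}")
    return out


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t10\t.\tA\tG",
    "chr1\t999\t.\tC\tT",
    "chr1\t1500\t.\tG\tA",
]
print(snp_density(vcf, bin_size=1000))
```

Sorting by chromosome name is lexicographic; Circos places tracks by karyotype ID, so the order of lines in the file does not affect the plot.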
Once data has been reformatted for Circos, it can either be used immediately or be further processed. Circos includes a tool suite for post-processing and downsampling of data, which can improve plot clarity and processing speed. We additionally included a number of these post-processing tools in Galaxy, notably the link bundling and binning tools used in Figure 5.
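The idea behind link binning can be illustrated with a simplified re-implementation. This is not the Circos binning script, whose options and behaviour differ; it is a hypothetical `bin_links` function that reads Circos link lines (`chr1 start1 end1 chr2 start2 end2`), assigns each link end to a fixed-size window by its midpoint, and counts ends per window to produce a histogram track.

```python
from collections import Counter


def bin_links(link_lines, bin_size):
    """Count link ends per fixed-size window, yielding Circos histogram lines.

    Each input line uses the Circos link format
    'chr1 start1 end1 chr2 start2 end2 [options]'; both ends are counted,
    each in the window containing its midpoint. Hypothetical helper.
    """
    counts = Counter()
    for line in link_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete link record
        for chrom, start, end in (fields[0:3], fields[3:6]):
            mid = (int(start) + int(end)) // 2
            counts[(chrom, mid // bin_size)] += 1
    return [
        f"{chrom} {b * bin_size} {(b + 1) * bin_size - 1} {n}"
        for (chrom, b), n in sorted(counts.items())
    ]


links = ["chrA 100 200 chrB 5000 5100", "chrA 150 250 chrB 5050 5150"]
print(bin_links(links, 1000))
```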
Finally, while Circos is widely used for the visualization of genomic data, and many of its parameter names have a distinctly biological feel, the tool does not impose any restrictions on the type of input data and can display non-biological data just as easily [9]. To show that our tool retains this flexibility, we recreated the presidential debate plot included in the Circos tutorials, which in turn was based on a plot that appeared in a New York Times article [6]. A plot comparison can be seen in Figure 6.
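Non-genomic data fits Circos once categories are treated as ideograms and relationships as links. The sketch below shows one way to do this for a table of (source, target, amount) flows; it is an illustrative assumption, not the procedure used for Figure 6, and the category names in the example are invented. Each category becomes a karyotype entry (`chr - ID LABEL START END COLOR`) whose length is the total amount touching it, and each flow becomes a ribbon whose ends occupy consecutive, non-overlapping intervals.

```python
from collections import defaultdict


def table_to_circos(flows, color="grey"):
    """Turn (source, target, amount) flows into Circos karyotype and link lines.

    Hypothetical helper. Category names are used as ideogram IDs, so they
    must not contain whitespace.
    """
    totals = defaultdict(int)
    for src, dst, amount in flows:
        totals[src] += amount
        totals[dst] += amount
    karyotype = [f"chr - {c} {c} 0 {totals[c]} {color}" for c in sorted(totals)]

    cursor = defaultdict(int)  # next free position on each ideogram
    links = []
    for src, dst, amount in flows:
        s0 = cursor[src]
        cursor[src] += amount
        d0 = cursor[dst]
        cursor[dst] += amount
        links.append(f"{src} {s0} {s0 + amount} {dst} {d0} {d0 + amount}")
    return karyotype, links


karyotype, links = table_to_circos([("economy", "jobs", 3), ("jobs", "health", 2)])
print("\n".join(karyotype))
print("\n".join(links))
```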

Implementation
The tool leverages Galaxy's ability to write templated files directly to disk using the configuration from the tool form, and then runs Circos directly on these templated configuration files.
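Galaxy tool definitions write such files using Cheetah templates. Purely as an illustration of the templating idea, the sketch below uses Python's `string.Template` to render a minimal `circos.conf` from a dictionary standing in for the tool-form values. The dictionary keys, default values, and `render_conf` are hypothetical and do not reflect the actual parameters of the Galactic Circos wrapper.

```python
from string import Template

# Minimal top-level Circos configuration; the etc/ includes refer to the
# default configuration files distributed with Circos.
CIRCOS_CONF = Template("""\
karyotype = $karyotype

<ideogram>
<spacing>
default = ${spacing}r
</spacing>
radius    = ${radius}r
thickness = ${thickness}p
fill      = yes
</ideogram>

<plots>
$plots
</plots>

<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
""")

# One <plot> block per data track selected in the (hypothetical) form.
PLOT = Template("""\
<plot>
type = $type
file = $file
r0   = ${r0}r
r1   = ${r1}r
</plot>""")


def render_conf(form):
    """Fill the templates from form values, using defaults for missing keys."""
    plots = "\n".join(PLOT.substitute(p) for p in form["plots"])
    return CIRCOS_CONF.substitute(
        karyotype=form["karyotype"],
        spacing=form.get("spacing", 0.005),
        radius=form.get("radius", 0.9),
        thickness=form.get("thickness", 20),
        plots=plots,
    )


form = {
    "karyotype": "data/karyotype.txt",
    "plots": [{"type": "histogram", "file": "data/hist.txt", "r0": 0.7, "r1": 0.85}],
}
print(render_conf(form))
```

The rendered text could then be written to `circos.conf` and passed to `circos -conf circos.conf`, assuming the data paths it references exist.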
Installation of the Circos tool and its dependencies is handled by the Galaxy platform, which supports different dependency management frameworks, including Conda and containers. All dependencies, including Circos itself, are available from the Bioconda Conda channel [15] and as virtualised containers.
Figure 4: … While the input data originated from a range of standard and nonstandard genomic file formats, conversion to Circos-formatted files was possible using the plethora of file manipulation tools already integrated into Galaxy and the set of supporting conversion tools included in the Galactic Circos package. In the second image we produce Circos plots per chromosome, leveraging Galaxy's ability to map a tool execution across a collection of input datasets, in this case each karyotype in a separate input file. The images are reduced and placed together in a montage using further Galaxy tools. https://usegalaxy.eu/u/helena-rasche/h/circos-cancer-genomics--chromothripsis, https://usegalaxy.eu/u/helena-rasche/h/circos-multiplot
Figure 5: … with different thresholds. The inner link track was generated directly from a MAF file output by LastZ [20]. This file was processed by Circos' bundling tool in Galaxy in order to decrease the number of links, a process usually done to reduce visual noise and increase efficiency. The outer track demonstrates the link binning script, which generates a histogram, in this case of the number of links to each position in the genomic region.

File Format Converters
To facilitate interoperability with upstream tools and workflows, we provide a set of file format converters that, together with the many tools already available in Galaxy, support conversion from a range of common data formats (e.g. VCF, MAF/Stockholm, BED/GFF3, BigWig). These tools produce files that are ready to be used as input to the Galactic Circos tool. Additionally, the applicable subset of circos-utils was included in Galaxy to provide Circos-friendly data reshaping tools.
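As an example of what such a converter does, the sketch below turns GFF3 features of one type into a Circos tile track, whose lines are simply `chr start end`. The function `gff3_to_tiles` is hypothetical and is not the converter distributed with Galactic Circos; it reads only the seqid, type, start, and end columns of the nine-column GFF3 layout and stops at an embedded `##FASTA` section.

```python
def gff3_to_tiles(gff_lines, feature_type="gene"):
    """Convert GFF3 features of one type into Circos tile-track lines.

    Hypothetical helper: outputs 'seqid start end' for every feature whose
    type column matches feature_type.
    """
    out = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        out.append(f"{cols[0]} {int(cols[3])} {int(cols[4])}")
    return out


gff = [
    "##gff-version 3",
    "ctg1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1",
    "ctg1\tsrc\tCDS\t120\t480\t.\t+\t0\tParent=g1",
    "##FASTA",
    ">ctg1",
]
print(gff3_to_tiles(gff))
```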

Circos Configuration Export
While Galactic Circos aims to offer the full range of Circos functionality, some manual tweaking of the Circos configuration files may still be desired. To this end, our tool also outputs the full set of configuration files needed to recreate the plot on the command line, allowing easy access to any features not exposed in the Galaxy wrapper.

Training Materials
Our tool greatly simplifies the creation of Circos plots, but the large number of options offered by Circos requires good documentation and explanation to be useful to end users. Circos offers a collection of tutorials designed to familiarize users with its various features [3]. In a similar fashion, we have created a set of Galaxy tutorials that teach users how to use Circos within Galaxy. These tutorials are available from the Galaxy training materials website [13].

Reproducible and Reusable Plots
To enable readers to examine the complete parameter settings and recreate the example plots given here, Galaxy histories for all figures shown in this work have been made publicly available on the European Galaxy server (see Availability section).

Future Work
While we have aimed to make our tool as feature-complete as possible, some of Circos' functionality is not currently exposed in the Galaxy tool. We intend to extend our tool to include these features, including but not limited to support for scaling subsections of the plot and generating HTML image maps.

Funding
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 825775.

Author's Contributions
HR and SH contributed equally to the tool development, documentation, and writing of the manuscript.