GraphBio: A shiny web app to easily perform popular visualization analysis for omics data

Background: Massive amounts of omics data are produced and usually require sophisticated visualization analysis. These analyses often require programming skills, which are difficult for experimental biologists to acquire. Thus, more user-friendly tools are urgently needed. Methods and Results: Herein, we present GraphBio, a Shiny web app to easily perform visualization analysis for omics data. GraphBio provides 15 popular visualization analysis methods, including heatmaps, volcano plots, MA plots, network plots, dot plots, chord plots, pie plots, four quadrant diagrams, Venn diagrams, cumulative distribution curves, principal component analysis (PCA), survival analysis, receiver operating characteristic (ROC) analysis, correlation analysis, and text cluster analysis. It enables experimental biologists without programming skills to easily perform popular visualization analysis and get publication-ready figures. Conclusion: GraphBio, as an online web application, is freely available at http://www.graphbio1.com/en/ (English version) and http://www.graphbio1.com/ (Chinese version). The source code of GraphBio is available at https://github.com/databio2022/GraphBio.


Introduction
With the advance of high-throughput techniques (Goodwin et al., 2016), more and more researchers have started to depict molecular profiles in a systematic manner (Cancer Genome Atlas, 2012; Yan et al., 2015; ... et al., 2022). Massive amounts of omics data are produced and usually require sophisticated visualization analysis. For example, gene expression studies frequently use heatmaps, volcano plots, and MA plots to characterize expression changes across thousands of genes (Conesa et al., 2016). Moreover, principal component analysis (PCA) and correlation analysis are widely used to estimate similarity or dissimilarity between samples or groups. Although these methods are popular in omics research, they are usually published as R packages, such as ggplot2 (Wickham, 2016), pheatmap for drawing heatmaps, GOplot for drawing chord plots of Gene Ontology (GO) analysis results (Walter et al., 2015), FactoMineR for performing PCA (Lê et al., 2008), and pROC for drawing receiver operating characteristic (ROC) curves (Robin et al., 2011), which require experimental biologists to have good programming skills.
Herein, we developed an online web application called GraphBio using the Shiny framework in R. In comparison to other web tools, GraphBio focuses specifically on generating publication-ready plots easily and rapidly rather than on data preprocessing and computing. Users can prepare the data to be visualized in Excel by following the reference example files provided by GraphBio. The default figures are generally suitable for publication and need fine-tuning only in some cases. We anticipate that GraphBio will become a good research tool for experimental biologists and advance new scientific discoveries.

FIGURE 1
Some representative visualization results from GraphBio. (A) Gene expression heatmap. Rows represent genes, and columns represent samples. Yellow represents upregulation, and blue represents downregulation. (B) The volcano plot shows significantly differentially expressed genes. The x-axis represents log2-transformed fold changes, and the y-axis represents −log10(FDR). FDR, false discovery rate. Red represents upregulation, blue represents downregulation, and yellow represents genes that are not statistically significant. Some genes of interest are marked in purple. (C) The MA plot shows significantly differentially expressed genes. The x-axis represents mean expression values of genes, and the y-axis represents log2-transformed fold changes. Red represents upregulation, blue represents downregulation, and yellow represents genes that are not statistically significant. Some genes of interest are marked in purple. (D) The network plot shows a group of expression-related genes for a target gene. The correlation values are calculated using Pearson correlation analysis. Red represents positive correlation, and blue represents negative correlation. (E) The dot plot shows some biological processes of interest. The x-axis represents the gene ratio. The point size represents gene numbers. Color represents significance. (F) Pie plot. (G) PCA analysis. (H) The four-quadrant diagram shows the overlapped genes between RNA-seq and m6A-seq data. The x-axis represents log2-transformed fold changes of m6A-seq data, and the y-axis represents log2-transformed fold changes of RNA-seq data. Significant genes are marked in four different colors. (I) Pearson correlation analysis between two variables. (J) Cumulative distribution curves. The Kolmogorov-Smirnov test is used for comparing two samples. (K) Survival curves. The log-rank test is used for comparing two samples.

Overview of GraphBio
GraphBio provides 15 visualization analysis modules, including heatmaps, volcano plots, MA plots, network plots, dot plots, chord plots, pie plots, four quadrant diagrams, Venn diagrams, cumulative distribution curves, PCA, survival analysis, ROC analysis, correlation analysis, and text cluster analysis. Some representative visualization results are shown in Figure 1. GraphBio supports four input file formats: csv, txt, xls, and xlsx. Notably, csv files cannot be encoded in UTF-8, and txt files must be tab-separated. All plots can be downloaded in PDF, PNG, JPEG, or TIFF format with customizable size and resolution. Each module has a "run example" button that lets users learn its features quickly without preparing any data. Users can change the default parameter values and observe how the example figures change. Clicking the "view example file" button displays part of the example data, which helps users prepare their own data files. In addition, a "Help Center" section serves as a user manual for GraphBio.

Example 1. Heatmap for gene expression profiles.
A heatmap visualizes the values in the cells of a data matrix using a color gradient, and it is frequently used in omics data analysis. In GraphBio, the "Heatmap" module requires a gene expression matrix file as input. Our example data include 20 genes and 10 samples. A well-prepared csv file was uploaded, and an ideal plot was then generated automatically (Figure 1A). We can easily adjust the plot by changing the colors, the clustering methods, and the presentation of gene and sample names as needed. Notably, the module provides six popular color presets, which are commonly used in published papers, and a colorblind-friendly preset is selected by default.
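To make the idea of a color gradient concrete, the following is a minimal Python sketch of how expression values could be mapped to colors between blue (low) and yellow (high), the two end colors named in the Figure 1A caption. It is an illustration only and not GraphBio's implementation, which is written in R; the function names, the linear interpolation, and the toy matrix are all assumptions.

```python
def value_to_color(value, vmin, vmax, low=(0, 0, 255), high=(255, 255, 0)):
    """Linearly interpolate an RGB hex color between `low` and `high`."""
    if vmax == vmin:
        t = 0.5  # flat matrix: use the midpoint color
    else:
        # Clamp to [0, 1] so out-of-range values saturate at the end colors.
        t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def matrix_to_colors(matrix):
    """Map every cell of a genes-by-samples matrix to a hex color."""
    values = [v for row in matrix for v in row]
    vmin, vmax = min(values), max(values)
    return [[value_to_color(v, vmin, vmax) for v in row] for row in matrix]


# Toy matrix: 2 genes (rows) x 3 samples (columns); the values are made up.
expression = [[-2.0, 0.0, 2.0],
              [1.0, -1.0, 0.0]]
colors = matrix_to_colors(expression)
```

In this sketch the smallest value in the matrix maps to pure blue and the largest to pure yellow. A color preset, as offered by the module, would amount to a different choice of end colors.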
Example 2. Volcano plots for differential expression analysis.
Volcano plots depict the relationship between the significance and the fold change of differentially expressed genes; genes in the upper-left and upper-right corners are generally the most interesting to biologists. The "Volcano plot" module of GraphBio requires an input file with four columns: geneID, log2(fold change), significance (p or padj values), and label. The "label" column specifies the genes to be highlighted on the figure. The example result is shown in Figure 1B. The numbers of upregulated and downregulated genes are summarized on the plot. We can also customize the colors, fold-change threshold, significance threshold, point size, and label size.
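The logic behind a volcano plot can be sketched in a few lines of Python. The example below reads a four-column table, computes the plot coordinates, and counts upregulated and downregulated genes. It is a sketch under stated assumptions rather than GraphBio's own code: the header names, the cutoffs (|log2 fold change| ≥ 1 and padj < 0.05), and the data rows are hypothetical, and a padj of exactly 0 would need special handling before taking the logarithm.

```python
import csv
import io
import math
from collections import Counter

# Made-up rows in the four-column layout; the header names are hypothetical.
EXAMPLE_CSV = """geneID,log2FoldChange,padj,label
GeneA,2.5,0.001,GeneA
GeneB,-1.8,0.01,
GeneC,0.3,0.2,
GeneD,1.2,0.5,
"""


def classify(log2fc, padj, fc_cutoff=1.0, sig_cutoff=0.05):
    """Assign a gene to 'up', 'down', or 'ns' (not significant)."""
    if padj < sig_cutoff and log2fc >= fc_cutoff:
        return "up"
    if padj < sig_cutoff and log2fc <= -fc_cutoff:
        return "down"
    return "ns"


def volcano_points(text):
    """Return one dict per gene with plot coordinates and status."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        log2fc = float(row["log2FoldChange"])
        padj = float(row["padj"])
        points.append({
            "gene": row["geneID"],
            "x": log2fc,                      # x-axis: log2 fold change
            "y": -math.log10(padj),           # y-axis: -log10(significance)
            "status": classify(log2fc, padj),
            "highlight": bool(row["label"]),  # non-empty label: annotate gene
        })
    return points


points = volcano_points(EXAMPLE_CSV)
counts = Counter(p["status"] for p in points)
```

Here `counts` holds the per-group gene numbers of the kind summarized on the plot, and raising `fc_cutoff` or lowering `sig_cutoff` corresponds to tightening the thresholds in the module.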
Example 3. Four quadrant diagrams for differential expression analysis between two omics data sets.
Four quadrant diagrams are generally used to compare two omics data sets, such as RNA-seq and m6A-seq. As a demonstration, we used differentially expressed genes from RNA-seq data and differential peaks from m6A-seq data. The input file has six columns: geneID, log2FoldChanges (RNA-seq), significance (p or padj values, RNA-seq), log2FoldChanges (m6A-seq), significance (p or padj values, m6A-seq), and label. The "label" column specifies the genes to be highlighted on the figure. The resulting figure is shown in Figure 1H. Four groups of genes are highlighted in different colors, and the corresponding gene numbers are also summarized. We can also adjust the significance threshold, fold-change threshold, point size, label size, and colors.
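The grouping behind a four quadrant diagram can be illustrated with a short Python sketch. It assigns each gene to one of four groups according to the direction of change in the two data sets. This is not GraphBio's implementation: the requirement that a gene pass both cutoffs in both data sets, the cutoff values, the group names, and the data rows are assumptions made for illustration.

```python
from collections import Counter


def quadrant(rna_lfc, rna_p, m6a_lfc, m6a_p, fc_cutoff=1.0, sig_cutoff=0.05):
    """Return a quadrant label, or 'not significant' if a cutoff is missed."""
    rna_ok = rna_p < sig_cutoff and abs(rna_lfc) >= fc_cutoff
    m6a_ok = m6a_p < sig_cutoff and abs(m6a_lfc) >= fc_cutoff
    if not (rna_ok and m6a_ok):
        return "not significant"
    m6a_dir = "m6A up" if m6a_lfc > 0 else "m6A down"  # x-axis direction
    rna_dir = "RNA up" if rna_lfc > 0 else "RNA down"  # y-axis direction
    return f"{m6a_dir} / {rna_dir}"


# Made-up rows: (geneID, RNA log2FC, RNA padj, m6A log2FC, m6A padj).
rows = [
    ("G1", 2.0, 0.001, 1.5, 0.01),
    ("G2", -1.5, 0.01, 2.0, 0.001),
    ("G3", 1.2, 0.02, -1.8, 0.03),
    ("G4", -2.2, 0.001, -1.1, 0.04),
    ("G5", 0.2, 0.6, 1.5, 0.01),
]

groups = {gene: quadrant(r_lfc, r_p, m_lfc, m_p)
          for gene, r_lfc, r_p, m_lfc, m_p in rows}
group_counts = Counter(groups.values())
```

Each of the four significant groups would be drawn in its own color, and `group_counts` gives gene numbers per group of the kind summarized on the figure.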

Conclusion
In this article, we introduce GraphBio, an online web application for omics data visualization. It includes 15 popular visualization analysis methods, such as heatmaps, volcano plots, and MA plots. Experimental biologists can easily perform online analysis and get publication-ready plots by visiting http://www.graphbio1.com/en/ (English version) or http://www.graphbio1.com/ (Chinese version) in any web browser, such as Google Chrome or Microsoft Edge. In the future, we will continue integrating more popular visualization analysis methods into GraphBio and provide more support to the research community.

Data availability statement
The original contributions presented in the study are publicly available. These data can be found at: https://github.com/databio2022/GraphBio.