MutRank: an R shiny web-application for exploratory targeted mutual rank-based coexpression analyses integrated with user-provided supporting information

The rapid assignment of genotypes to phenotypes has historically been a challenging process. The discovery of genes encoding biosynthetic pathway enzymes for defined plant specialized metabolites has been informed and accelerated by the detection of gene clusters. Unfortunately, biosynthetic pathway genes are commonly dispersed across chromosomes or reside in gene clusters that provide little predictive value. More reliably, the transcript abundance of genes underlying biochemical pathways for plant specialized metabolites displays significant coregulation. By rapidly identifying highly coexpressed transcripts, it is possible to efficiently narrow candidate genes encoding pathway enzymes and more easily predict both functions and functional associations. Mutual Rank (MR)-based coexpression analyses in plants accurately demonstrate functional associations for many specialized metabolic pathways; however, despite the clear predictive value of MR analyses, the approach is seldom used to drive new pathway discoveries. Moreover, many coexpression databases aid in the prediction of both functional associations and gene functions, but lack customizability for refined hypothesis testing. To facilitate and speed flexible MR-based hypothesis testing, we developed MutRank, an R Shiny web-application for coexpression analyses. MutRank runs on personal computers and provides an intuitive graphical user interface with multiple customizable features that integrate user-provided data and supporting information. Tabular and graphical outputs facilitate the rapid analysis of both unbiased and user-defined coexpression results that accelerate gene function predictions. We highlight the recent utility of MR analyses for functional predictions and discoveries in defining two maize terpenoid antibiotic pathways.
Beyond applications in biosynthetic pathway discovery, MutRank provides a simple, customizable and user-friendly interface to enable coexpression analyses relating to a breadth of plant biology inquiries. Data and code are available at GitHub: https://github.com/eporetsky/MutRank.


Introduction
Transcriptomic data can uncover complex biological processes in part through the improved understanding of gene coexpression patterns. Mutual Rank (MR), the geometric mean of the ranked Pearson's Correlation Coefficients (PCCs) between a pair of genes, was shown to be a better indicator of functional associations than PCC and to produce more robust results when using raw data. It was demonstrated that MR analyses of transcripts should be favored in the prediction of pathway gene functions and serve as a springboard for hypothesis testing and validation. We developed an R Shiny web-application, termed MutRank, to facilitate user control over both targeted and non-targeted MR-based coexpression analyses for rapid hypothesis testing. In addition to identifying highly coexpressed genes in any user-provided expression dataset, MutRank automatically integrates supporting information such as gene annotations, differential-expression data, predicted domains and assigned GO terms, and provides useful tabular and graphical outputs as a foundation for empirical hypothesis testing. The goal of MutRank is to provide simple, customizable and readily accessible tools to speed research progress in connecting metabolic phenotypes to genotypes for the purpose of understanding biological roles.
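To make the MR definition concrete, the following is a minimal Python sketch, not MutRank's own R code. It ranks genes by PCC in both directions and takes the geometric mean of the two ranks. The gene names and expression values are made up, and excluding each gene from its own ranking is an assumption of this sketch.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pcc_ranks(expr):
    """For each gene, rank every other gene by descending PCC (1 = most correlated)."""
    ranks = {}
    for g in expr:
        others = [h for h in expr if h != g]  # assumption: a gene is not ranked against itself
        others.sort(key=lambda h: pearson(expr[g], expr[h]), reverse=True)
        ranks[g] = {h: i + 1 for i, h in enumerate(others)}
    return ranks

def mutual_rank(ranks, a, b):
    """MR(a, b): geometric mean of the rank of b for a and the rank of a for b."""
    return math.sqrt(ranks[a][b] * ranks[b][a])

# Hypothetical toy data: four genes measured across five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [5.0, 4.0, 3.5, 2.0, 1.0],
    "geneD": [2.0, 1.0, 4.0, 3.0, 6.0],
}
ranks = pcc_ranks(expr)
mr_ab = mutual_rank(ranks, "geneA", "geneB")  # each is the other's top hit
mr_ac = mutual_rank(ranks, "geneA", "geneC")  # anticorrelated pair, higher MR
```

Because MR combines the ranks from both directions, it is symmetric, and a low MR requires each gene to rank highly in the other's list rather than in only one of them.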

R Dependencies
MutRank will automatically install the packages listed below when starting the app. The app was tested on Windows, Linux and macOS with the listed library versions.

Data Input Tab
The Data Input tab is the first tab of the MutRank app, in which users can load their expression data and supporting information for MR-based coexpression analyses (Fig. 1). The only requirement to conduct MR-based coexpression analyses is the expression data. Additional user-provided supporting information will be automatically integrated with the coexpression results. When the MutRank app starts, each data folder is parsed to find all files with the expected filename extensions, and these files are listed in their relevant dropdown menus. A short delay is expected when loading large expression files, but a short text output containing the table dimensions will update once the expression data file is loaded (Fig. 1A). Non-expressed genes (zero sum expression) are automatically filtered to prevent error messages. User-provided supporting information includes gene annotations (Fig. 1B), gene symbols (Fig. 1C), differential expression data (Fig. 1D), custom categories (Fig. 1E), Pfam protein domain annotations (Fig. 1F) and the Gene Ontology (GO) database file along with the GO assignments (Fig. 1G). By default MutRank starts by loading the example files, but this can be changed by pressing the "Remember Selected Files" button.

Figure 1: Data Input Tab Screenshot - In the main panel users can load (D) differential expression data, (E) custom categories, (F) Pfam protein domain annotations and (G) the GO database file (notice that in this instance "GO-basic.obo" was used instead of the default "goslim_plant.obo") along with the GO assignments. The "Remember Selected Files" button can be used to change the default files MutRank loads on start (H).
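The zero-sum filter described above can be sketched as follows in Python (an illustration only, not MutRank's R code). A gene with all-zero expression has zero variance, so its PCC with any other gene is undefined, which is presumably the kind of error the filter avoids. The table layout and gene IDs here are hypothetical.

```python
def filter_nonexpressed(expr):
    """Drop genes whose expression sums to zero across all samples."""
    return {gene: values for gene, values in expr.items() if sum(values) != 0}

# Hypothetical expression table: gene ID -> expression values per sample.
expr = {
    "gene1": [0.0, 0.0, 0.0],
    "gene2": [3.2, 0.0, 1.5],
    "gene3": [0.0, 7.1, 0.0],
}
filtered = filter_nonexpressed(expr)

# Table dimensions (genes x samples), analogous to the text output shown after loading.
dims = (len(filtered), len(next(iter(filtered.values()))))
```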

Mutual Rank Tab
Once the expression data and supporting information are loaded, the MR-based coexpression analyses can start (Fig. 2). First, the user should select one of three possible reference gene methods: (1) single reference gene, (2) compound reference gene or (3) reference gene list, and then insert a reference gene or gene list (Fig. 2A). The compound reference gene method creates a new compound reference gene from the calculated average, sum, maximum or minimum expression values of the reference gene list. The reference gene list method calculates the MR values between the genes in the list, using the first gene in the list as the primary reference gene. Genes in a list can be separated by a tab, new line, vertical tab, space or comma. By default MutRank will find the 200 most highly coexpressed genes using Pearson's Correlation Coefficient (PCC) values (Fig. 2B) to generate the list of genes for which MR values will be calculated. This practical trade-off between whole-genome and targeted coexpression analyses allows MutRank to complete the analysis rapidly and to run on the resources of most personal computers. MR values will be calculated after pressing the "Calculate MR Values" button (Fig. 2C). Additional settings allow the user to format the coexpression results and to integrate supporting information (Fig. 2D). The final results will be presented in the MR-based coexpression table in the main panel (Fig. 2E), which can be downloaded as a tsv file using the 'Download Table' button (Fig. 2E).
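The reference gene handling above can be illustrated with a short Python sketch (not MutRank's R code). It splits a pasted gene list on the listed separators, builds a compound reference profile, and preselects the genes with the highest PCC to that profile. All function names, gene IDs and values are hypothetical.

```python
import math
import re
import statistics

def parse_gene_list(text):
    """Split a pasted gene list on tabs, new lines, vertical tabs, spaces and commas."""
    return [g for g in re.split(r"[\t\n\v ,]+", text) if g]

def compound_reference(expr, genes, method="average"):
    """Combine the listed genes into one expression profile, sample by sample."""
    combine = {"average": statistics.mean, "sum": sum,
               "maximum": max, "minimum": min}[method]
    return [combine(sample) for sample in zip(*(expr[g] for g in genes))]

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_coexpressed(expr, reference, n=200):
    """Return the n genes with the highest PCC to the reference profile."""
    ranked = sorted(expr, key=lambda g: pearson(reference, expr[g]), reverse=True)
    return ranked[:n]

# Hypothetical toy data: four genes measured across four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
genes = parse_gene_list("g1, g2\tg3")
compound = compound_reference(expr, genes[:2], method="average")
candidates = top_coexpressed(expr, compound, n=3)
```

Restricting the MR calculation to a preselected candidate set keeps the number of pairwise rankings small, which is the trade-off the paragraph above describes.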

Heatmap Tab
The MR-based coexpression table generated in the Mutual Rank tab can be used to generate a heatmap graphical output in the Heatmap tab (Fig. 3). The heatmap displays at most 25 genes to keep it legible. We have included a few options that allow users to modify the heatmap figure, including the number of genes to include in the heatmap (Fig. 3A), the maximum MR value to be included as text within the heatmap (Fig. 3B), the text size (Fig. 3C) and an option to convert gene IDs to gene symbols, when applicable (Fig. 3D). The red-to-white color gradient represents MR values between 1 and 100, with all values higher than 100 set to white. The heatmap presented in the main panel (Fig. 3E) can be downloaded as a PNG file using the 'Download Heatmap' button (Fig. 3F).

Figure 4: Network Tab Screenshot - In the side panel users can select how many of the top coexpressed genes to include in the network (A), the MR threshold to connect gene vertices with an edge (B), the size of the text labels (C), whether to convert the shape of the reference gene vertex to a star (D) and whether to convert the gene IDs of each node to gene symbols, when applicable (E). Differential expression Log2 fold-change values can be integrated by selecting one of the columns from the data to change the color of the gene nodes (F). Custom categories can be integrated by changing the vertex shape (G). In the MR-based coexpression network (H), gene annotations can be accessed by pressing on any of the gene nodes to trigger a pop-up text message (I). The gradient scale (J) was added manually after the screenshot was taken.
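As a rough illustration of the two display rules described above, the Python sketch below (not MutRank's R code) maps MR values onto a red-to-white gradient and selects network edges by an MR threshold. The linear interpolation and the inclusive threshold comparison are assumptions of this sketch, and the MR values are made up.

```python
def mr_to_hex(mr, lo=1.0, hi=100.0):
    """Map an MR value to a red-to-white hex color; MR values above hi are white."""
    frac = (min(max(mr, lo), hi) - lo) / (hi - lo)  # 0 at MR = 1, 1 at MR >= 100
    level = round(255 * frac)                       # shared green/blue channel intensity
    return "#ff{0:02x}{0:02x}".format(level)

def network_edges(mr_table, threshold):
    """Connect two genes with an edge when their MR is at or below the threshold."""
    return sorted(pair for pair, mr in mr_table.items() if mr <= threshold)

# Hypothetical pairwise MR values.
mr_table = {("g1", "g2"): 1.0, ("g1", "g3"): 12.2, ("g2", "g3"): 140.0}
colors = {pair: mr_to_hex(mr) for pair, mr in mr_table.items()}
edges = network_edges(mr_table, threshold=50)
```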

Enrichment Tab
The MR-based coexpression table generated in the Mutual Rank tab can be tested for Gene Ontology (GO) enrichment. We apply the hypergeometric test with the selected GO database to calculate the P-values for GO term enrichment (Fig. 5). In the side panel users can select the MR threshold that will be used to include genes for the enrichment analysis (Fig. 5A). Users can also choose to include in the final table, for each GO term, the non-adjusted P-values, the values used for the hypergeometric test and the list of genes included in the analysis (Fig. 5B). The column names for the values used in the hypergeometric test are: "N" - Number of genes in the GO annotation files; "M" - Number of genes annotated with the specific GO term; "n" - Number of included genes from the coexpression table; "m" - Number of included genes from the coexpression table that are annotated with the specific GO term. Users can also select which method (holm, hochberg, hommel, bonferroni, BH or BY) to use to adjust the P-values for multiple testing, including false-discovery rate (FDR) control (Fig. 5C). The GO enrichment table presented in the main panel (Fig. 5D) can be downloaded as a PNG file using the 'Download Table' button (Fig. 5E).
Figure 5: Enrichment Tab Screenshot - In the side panel users can select the MR threshold that will be used to include genes for the enrichment analysis (A). Users can choose to include in the final table, for each GO term, the non-adjusted P-values, the values used for the hypergeometric test and the list of genes included in the analysis (B). Users can also select which method to use to adjust the P-values for false-discovery rate (FDR) (C). The GO enrichment table presented in the main panel (D) can be downloaded as a PNG file using the 'Download Table' button (E).
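The enrichment calculation can be sketched in Python as follows (an illustration, not MutRank's R code), using the same N, M, n and m symbols as the table columns. The one-sided upper-tail test is an assumption of this sketch. The listed adjustment methods match the method names of R's `p.adjust`, and the BH function below follows that function's logic, although whether MutRank calls `p.adjust` is also an assumption. The counts and P-values are made up.

```python
from math import comb

def hypergeom_pvalue(N, M, n, m):
    """P(X >= m) when n genes are drawn from N, of which M carry the GO term."""
    upper = min(M, n)
    tail = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, upper + 1))
    return tail / comb(N, n)

def adjust_bh(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # rank of this P-value in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 20 annotated genes, 5 with the GO term,
# 4 genes pass the MR threshold, 2 of those carry the term.
p = hypergeom_pvalue(N=20, M=5, n=4, m=2)

# Hypothetical raw P-values for four GO terms.
adj = adjust_bh([0.01, 0.04, 0.03, 0.005])
```

Reporting N, M, n and m alongside each P-value, as the optional table columns do, lets a reader recompute any single test by hand.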

Credits
MutRank was conceived by Elly Poretsky and Alisa Huffaker and implemented by Elly Poretsky. We are grateful to Eric A. Schmelz for providing helpful comments in the process of developing MutRank and on the corresponding manuscript. The MutRank web-application was made possible with R, RStudio, Shiny and the additional dependencies mentioned in this manual.

License
MutRank is available under the terms of the Creative Commons Attribution-NonCommercial License (CC BY-NC 3.0).