PaintOmics 4: new tools for the integrative analysis of multi-omics datasets supported by multiple pathway databases

Abstract
PaintOmics is a web server for the integrative analysis and visualisation of multi-omics datasets using biological pathway maps. PaintOmics 4 has several notable updates that improve and extend analyses. Three pathway databases are now supported: KEGG, Reactome and MapMan, providing more comprehensive pathway knowledge for animals and plants. New metabolite analysis methods fill gaps in traditional pathway-based enrichment methods. The metabolite hub analysis selects compounds with a high number of significant genes in their neighbouring network, suggesting regulation by gene expression changes. The metabolite class activity analysis tests the hypothesis that a metabolite class has a higher-than-expected proportion of significant elements, indicating that these compounds are regulated in the experiment. Finally, PaintOmics 4 includes a regulatory omics module to analyse the contribution of trans-regulatory layers (e.g. microRNAs, transcription factors and RNA-binding proteins) to pathway regulation. We demonstrate the performance of PaintOmics 4 on mouse and plant data to highlight how these new analysis features provide novel insights into regulatory biology. PaintOmics 4 is available at https://paintomics.org/.


INTRODUCTION
Multi-omics approaches have become popular in the study of a wide range of biological domains, and multi-omics datasets are now commonly generated by individual investigators as well as by large consortia (1)(2)(3). Moreover, the number and diversity of measured omics modalities have increased: whereas earlier studies combined at most two or three omics platforms, more recent studies often combine genomics, transcriptomics, methylomics, chromatin accessibility, proteomics, metabolomics and/or lipidomics in a single design (4)(5)(6). Multiple approaches have been proposed for the integrative analysis of these data (see (7,8) for reviews). Current methods can be broadly divided into three groups: detecting biomarkers, classifying samples, and inferring functional relationships between molecular layers (9). Regardless of the aim of these analyses, the subsequent biological interpretation of the results is frequently difficult and time-consuming, as it requires human interaction and comprehension. Biological interpretation of multi-omics models is a grand challenge faced by investigators interrogating multi-omics data (9).
W552 Nucleic Acids Research, 2022, Vol. 50, Web Server issue
Three major strategies may be applied for the interpretation of multi-omics data. Overrepresentation and enrichment analyses are widely used in genomics and transcriptomics analyses (10). These methodologies have been adapted to the different omics data types (11)(12)(13), as well as to their integrative analysis (14)(15)(16). Enrichment methods are powerful tools to identify which biological processes are regulated in a given condition, but they are limited by the annotation vocabulary, the lack of comprehensive annotations and the absence of mechanistic insight within and across omics layers.
An alternative approach is to use multi-partite networks, where nodes depict molecular entities and edges indicate regulatory or covariance information that can be extracted from the integrative statistical analysis (17). A variety of network (graphical) resources (e.g. Cytoscape (18), 3Omics (19), VisANT (20), OmicsAnalyst (21)) may be used to visualise relationships among biomolecules. Unfortunately, networks are frequently too large, and their interpretability is limited by the lack of biological context. A recent related method is multiSLIDE, which visualizes interconnected molecular features in heatmaps of multi-omics datasets (22).
A third option is to leverage existing biological knowledge represented in pathway maps to project multi-omics data and visualise them in a highly interpretable format. Examples are KaPPA-View 4 (23), MapMan4 (24) and PaintOmics 3 (14). Some tools combine several of these options. For instance, Cytoscape, OmicsAnalyst and PaintOmics 3 also support enrichment analysis.
Pathway-based visualization methods also have limitations. First, they lack the flexibility to incorporate significant features suggested by the statistical analysis when these are absent from the available map. Moreover, interpretation is limited by the pathway boundaries decided by curators and by the amount and identity of the molecular features captured in the maps (25). Some of the measured features may not have a pathway location, and some pathway components might not be measured. Additionally, some pathway elements might correspond to multiple measured features, such as a protein complex or a gene family. The limitations of pathway methods are particularly evident in untargeted and semi-targeted metabolomics data, since many measured compounds are unidentified, or only identified as members of a large class of similar compounds (e.g. lipids), and are therefore not present on current maps. Consequently, pathway-based enrichment methods perform poorly on metabolomics data, and other strategies, such as metabolite-class enrichment (26), might be more appropriate. Here we present PaintOmics 4, a substantial expansion of the PaintOmics 3 web server for pathway-based multi-omics data analysis (14,27), that addresses some of these current limitations in pathway definitions, metabolomics data integration and visualisation across molecular layers. An overview of the new functionalities is provided in Supplementary Figure S1.

Implementation of Reactome and MapMan pathway databases into PaintOmics 4
In addition to KEGG, PaintOmics 4 adds Reactome and MapMan to the list of pathway databases supported by the application (Supplementary Figure S2) (24,28). The new pathway data were incorporated into the existing PaintOmics MongoDB database, where pathways are classified into categories and features are mapped to the lowest pathway level. All PaintOmics data structures were modified to include a 'source database' field. As a result, 904 new pathways distributed into 29 categories from Reactome and 25 new pathways in two categories from MapMan were added to PaintOmics 4. Queries to the integrated database now proceed in batches to accommodate the larger number of entities to be searched. PaintOmics represents multi-omics data on pathways by overlaying feature expression and intensity values on their box positions in the pathway map using a colour scheme. To display multi-omics data on Reactome pathways, the node coordinates available in the pathway XML files were used. Since MapMan BIN coordinates are approximate, all MapMan pathway images were manually inspected and, when required, the XML files were edited to ensure correct painting of the data. Currently …
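The colour overlay described above can be illustrated with a diverging scale centred at zero. The sketch below is a minimal Python illustration, not the PaintOmics implementation: the blue-white-red palette, the use of one symmetric range per omics layer and the function names (`value_to_colour`, `paint_feature_values`) are assumptions chosen for clarity.

```python
def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as a hex colour string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def value_to_colour(value, max_abs,
                    low=(0, 0, 255), mid=(255, 255, 255), high=(255, 0, 0)):
    """Map a value onto a diverging blue-white-red scale centred at zero.

    Values are clipped to [-max_abs, max_abs]; here max_abs is assumed to be
    the largest absolute value within one omics layer (hypothetical choice).
    """
    if max_abs <= 0:
        return to_hex(mid)
    t = max(-1.0, min(1.0, value / max_abs))  # rescale to [-1, 1]
    end = high if t >= 0 else low
    frac = abs(t)
    # Linear interpolation between the neutral colour and the chosen extreme
    rgb = tuple(round(m + (e - m) * frac) for m, e in zip(mid, end))
    return to_hex(rgb)


def paint_feature_values(values):
    """Colour every value of one omics layer using a shared symmetric range."""
    max_abs = max((abs(v) for v in values), default=0.0)
    return [value_to_colour(v, max_abs) for v in values]


print(paint_feature_values([-2.0, 0.0, 1.0, 2.0]))
# ['#0000ff', '#ffffff', '#ff8080', '#ff0000']
```

A symmetric range keeps "no change" neutral (white) and makes colour intensity comparable across nodes of the same layer; the range and palette actually used by PaintOmics may differ.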

Regulatory Omics functionality
PaintOmics 3 provided the Regulatory Omics option, designed to upload data on features, such as microRNA-seq, that act as regulators of gene expression. PaintOmics 4 extends this functionality to accept any type of trans-acting element operating on genes, transcripts or proteins, and includes filtering functions to extract meaningful regulatory relationships. In addition to microRNAs, regulators such as transcription factors (TF) and splicing factors (SF) detected by RNA-seq, or RNA-binding proteins identified by CLIP-seq, can be analysed with this option. The Regulatory Omics option takes a trans-regulatory feature data matrix with expression or activity values for the regulators in the conditions of the study. The regulator-gene/protein mapping file is provided by the user, together with an optional list of significant differentially expressed regulators. PaintOmics 4 filtering options include thresholds for positive or negative correlation to select the expected regulatory relationships. Applying these criteria, regulatory features are mapped to their target features and to the corresponding pathways. A pathway enrichment score is calculated based either on the number of regulators mapping to each pathway or on the number of regulated genes present in the pathway. Enriched pathways for the Regulatory Omics modality represent biological processes that are significantly impacted by that regulatory layer.

Novel metabolomics interpretation methods
Metabolite hub analysis. One of the goals of multi-omics studies that combine metabolomics and gene expression or proteomics data is to associate changes in metabolite levels with the regulation of the enzymes that may contribute to metabolite turnover. PaintOmics 4 leverages pathway information to identify metabolites that have a high proportion of differentially expressed features in their close network. Two tests were implemented (Figure 1). First, a binomial test evaluates, for each differentially expressed metabolite (DEM), the hypothesis that, given the overall proportion p0 of differentially expressed genes (DEG) in the dataset, the proportion of DEG linked to the metabolite is significantly higher than p0. The hub analysis evaluates genes directly connected to the metabolite in the network (step 1) or genes associated with it through up to 3 intermediate nodes (steps 2 to 4). P-values are corrected for multiple testing (29). Second, the distribution of the percentage of neighbouring DEG across all metabolite nodes of the overall metabolic network is computed, and the percentile position of each measured metabolite in this distribution is calculated. Metabolites with a high percentile value have a higher proportion of connected DEG than the majority of the metabolites in the database.
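A compact sketch of both hub tests, assuming the metabolic network is given as an undirected adjacency dictionary containing metabolite and gene nodes, and assuming the Benjamini-Hochberg procedure as the multiple-testing correction (reference (29) may describe a different method). The `steps` parameter counts edges from the metabolite, so `steps=1` corresponds to directly connected genes and `steps=4` to paths through up to 3 intermediate nodes; all function names are hypothetical.

```python
from collections import deque
from math import comb


def neighbour_genes(graph, start, steps, genes):
    """Return the gene nodes reachable from `start` within `steps` edges (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, depth = queue.popleft()
        if depth == steps:
            continue  # do not expand beyond the requested distance
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                if nxt in genes:
                    found.add(nxt)
                queue.append((nxt, depth + 1))
    return found


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hub_binomial_test(graph, dems, genes, degs, steps=1):
    """Binomial hub test per differentially expressed metabolite.

    genes and degs are sets; returns {metabolite: (k, n, p, adjusted_p)}.
    """
    p0 = len(degs) / len(genes)  # overall proportion of DEG
    rows = []
    for met in dems:
        neigh = neighbour_genes(graph, met, steps, genes)
        k = len(neigh & degs)
        rows.append((met, k, len(neigh), binom_sf(k, len(neigh), p0)))
    qvalues = bh_adjust([row[3] for row in rows])
    return {met: (k, n, p, q) for (met, k, n, p), q in zip(rows, qvalues)}


def percentile_position(value, background):
    """Percentage of background metabolites whose DEG proportion is <= value."""
    return 100.0 * sum(b <= value for b in background) / len(background)
```

The binomial test asks whether a metabolite's neighbourhood is unusually rich in DEG relative to the dataset-wide rate, while the percentile ranks it against all metabolites in the network; the two answers can differ when neighbourhoods are small.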

Metabolite class activity.
To test whether a metabolite class is regulated, PaintOmics 4 implements a metabolite class activity analysis tool, in which a binomial test assesses the hypothesis that the proportion of significant compounds in a given measured metabolite class is higher than a user-defined threshold. If the user does not define an activity threshold, PaintOmics 4 uses the average percentage of significant metabolites as the threshold for the null hypothesis. P-values are corrected for multiple testing (29). These novel metabolomics analysis tools are provided in a separate tab of the main PaintOmics results panel, which includes hyperlinks to facilitate navigation between metabolite data, neighbouring genes and metabolic pathways (Figure 1).
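The class-level test can be sketched in the same spirit. In this hypothetical Python example, metabolite classes are given as a dictionary of measured compounds, and the default null proportion is taken as the overall fraction of significant compounds among all measured metabolites, which is one reading of "average percentage of significant metabolites". The multiple-testing correction step is omitted for brevity, and the function names are illustrative only.

```python
from math import comb


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def class_activity(classes, significant, threshold=None):
    """Binomial test of 'more significant compounds than expected' per class.

    classes: dict mapping class name -> list of measured metabolites.
    significant: collection of significant metabolites.
    threshold: null proportion; defaults to the overall fraction of
               significant metabolites among all measured ones (assumption).
    """
    significant = set(significant)
    measured = {m for members in classes.values() for m in members}
    if threshold is None:
        threshold = len(significant & measured) / len(measured)
    results = {}
    for name, members in classes.items():
        n = len(members)
        k = sum(m in significant for m in members)
        results[name] = (k, n, binom_sf(k, n, threshold))
    return results
```

Because the test is one-sided, a class only scores as active when it holds more significant compounds than the null proportion predicts; a user-supplied threshold makes the test stricter or looser independently of the rest of the dataset.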

Metagenes for nodes and pathways
PaintOmics displays omics data on pathway maps by colouring the node position of each omics feature according to its experimental value. When a node contains multiple features, e.g. MapMan BINs, the map layout may not be able to accommodate the amount of data. To address this problem, metagenes are computed for pathway nodes with more than four matching features (30), resulting in a compressed representation of omics data in complex nodes that fits the available space on the map. Note that when one node contains features with different profiles, the analysis might return multiple metagenes for the node, one per profile type.
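One simple way to obtain one metagene per profile type is to group a node's features by correlation and average their standardised profiles. The Python sketch below uses a greedy correlation-based grouping and an arbitrary threshold `min_r=0.7`; both are assumptions for illustration and are not necessarily the method of reference (30). Function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(profile):
    """Standardise a profile; constant profiles become all zeros."""
    mu, sd = mean(profile), pstdev(profile)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in profile]


def pearson_from_z(za, zb):
    """Pearson correlation of two population-standardised profiles."""
    return sum(a * b for a, b in zip(za, zb)) / len(za)


def node_metagenes(profiles, min_features=5, min_r=0.7):
    """Summarise a crowded node by one metagene per profile type.

    profiles: dict feature -> list of values across conditions.
    Returns None when the node has fewer than `min_features` features,
    i.e. when the features can still be painted individually.
    """
    if len(profiles) < min_features:
        return None
    clusters = []  # each cluster is a list of z-scored profiles; [0] is the seed
    for values in profiles.values():
        z = zscore(values)
        for cluster in clusters:
            if pearson_from_z(z, cluster[0]) >= min_r:
                cluster.append(z)
                break
        else:  # no similar cluster found: start a new profile type
            clusters.append([z])
    # Metagene = element-wise mean of the standardised profiles in a cluster
    return [[mean(column) for column in zip(*cluster)] for cluster in clusters]
```

The greedy grouping depends on input order, so it is only a sketch; a more robust implementation might cluster features hierarchically or summarise each cluster by its first principal component.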

Use case datasets
PaintOmics 4 functionalities were demonstrated using two different multi-omics datasets. The first is the STATegra dataset, which collects multi-omics data for a mouse B-cell differentiation process triggered by the expression of the Ikaros TF (4).
The second dataset corresponds to an Arabidopsis study that evaluated the root transcriptional and metabolic profiles of a BRL3-overexpressing mutant under drought conditions. BRL3 is a vascular-enriched member of the brassinosteroid receptor family, which was proposed to confer drought tolerance (31).

STATegra data analysis with PaintOmics 4 reveals novel regulatory events during B-cell differentiation
We used the STATegra multi-omics dataset describing the differentiation of the murine B3 cell line from a proliferating pre-BI state to a differentiated pre-BII state (4) to demonstrate PaintOmics 4 functionalities. RNA-seq, microRNA-seq, DNase-seq, metabolomics and TF data were available. The dataset contains temporal data for 13 123 genes, 469 microRNAs, 10 272 DNase-seq regions, 320 TFs and 60 metabolites, of which 5224, 172, 5099, 180 and 40 features, respectively, showed significant changes along the differentiation time course (4). The five STATegra omics modalities were analysed in PaintOmics 4 with both KEGG and Reactome selected as pathway databases. KEGG disease and organismal systems pathways were excluded from the analysis. Data mapped to a total of 169 KEGG and 439 Reactome pathways, of which 14 and 11, respectively, were found significant by the Fisher combined P-value method (32), which jointly considers all omics modalities. The full list of significant pathways is provided in Supplementary Table S1.
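Fisher's method combines k independent P-values (here, one per omics modality for a given pathway) into the statistic X = -2 Σ ln p_i, which under the null hypothesis follows a chi-square distribution with 2k degrees of freedom. Because the degrees of freedom are always even, the upper-tail probability has the closed form exp(-X/2) Σ_{j=0}^{k-1} (X/2)^j / j!, so a minimal Python sketch needs only the standard library. The function names are hypothetical, and the sketch assumes all P-values lie in (0, 1] and are independent, as Fisher's method requires.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return exp(-half) * total


def fisher_combined(pvalues):
    """Fisher's combined P-value for independent P-values in (0, 1]."""
    statistic = -2.0 * sum(log(p) for p in pvalues)
    return chi2_sf_even_df(statistic, 2 * len(pvalues))


# Hypothetical per-omics P-values for one pathway
print(fisher_combined([0.04, 0.20, 0.01]))
```

With a single P-value the method returns that P-value unchanged, and several moderately small P-values can combine into a strongly significant result, which is why a pathway may be significant jointly without being significant in any single omics layer.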

PaintOmics 4 indicates a multi-layered control of B-cell differentiation.
We first analysed the overall patterns of pathway changes across molecular layers using the pathway network analysis in PaintOmics 4, focusing on our Regulatory Omics types, microRNAs and TFs (Supplementary Figure S3). This tool revealed that pathways change during B-cell differentiation according to two or three patterns. For gene expression, most metabolic and genetic information processing pathways were downregulated, while signalling pathways showed both up- and downregulation trends. microRNAs associated with genes in metabolic and genetic information processing pathways tended to be upregulated at late time points (pre-BII stage), while those associated with signalling pathways were upregulated at early time points (pre-BI stage). Finally, TF regulation showed the opposite behaviour: TFs that bind genes of metabolic and genetic information processing pathways were downregulated as differentiation progressed, whereas those that bind signalling pathway genes were upregulated. These results indicate a highly coordinated control of biological pathways during B-cell differentiation, characterized by the transcriptional activation of signalling pathways and the downregulation of metabolic activities, with transcriptional (TF) and post-transcriptional (microRNA) mechanisms contributing to this program.
KEGG and Reactome complemented each other in the analysis of STATegra multi-omics data. The analysis of enriched KEGG and Reactome pathways revealed commonalities and differences between the two resources (Supplementary Table S1). Both databases reported enrichment of glucose, amino acid and nucleotide metabolic processes, and of p53 signalling. However, most significant signalling pathways did not coincide, possibly due to different pathway definitions in KEGG and Reactome. The combined analysis revealed many of the known processes operating during the differentiation of the hematopoietic and immune cell lineages, e.g. Interleukin-2 family signalling, Interferon gamma signalling and RAF-independent MAPK1/3 signalling for Reactome, and JAK-STAT signalling, FOXO signalling and Hippo signalling for KEGG (33). Reactome, but not KEGG, identified the RET signalling pathway as enriched (combined P-value = 0.029). RET is a receptor tyrosine kinase essential for embryonic development (34) that has also been found to be expressed in hematopoietic tissues, suggesting a role in the development of the immune system. Specifically, RET induces the expression of chemokines and cytokines, and downregulates chemokine/cytokine receptors (35). In the B3 cell differentiation process, we found a strong upregulation of RET and other associated membrane receptors (Figure 2A and B). A concordant regulation by many microRNAs and transcription factors was also found (Figure 2C). Interestingly, another component of the RET signalling pathway is the DOK protein, represented by three family members (1, 3 and 4) in our data. These genes were downregulated at the transition between the pre-BI and pre-BII stages, together with their only significant TF (STAT4: ENSMUSG00000062939), whereas the associated microRNA (mir-188-3p) was strongly upregulated. STAT4 is a key TF of the immune cell lineage (36), and mir-188-3p has been reported to regulate cell proliferation (37).
Whether these different regulatory relationships in the RET signalling pathway represent specific contributions to the differentiation of B-cells remains to be investigated. Our results showcase the power of PaintOmics 4 pathway-based multi-omics analysis to present and dissect a diversity of multi-layered regulatory relationships.

PaintOmics 4 novel metabolomics analysis tools highlight metabolite roles in B-cell differentiation.
Pathway enrichment analysis based on metabolomics data did not detect any significant pathways, possibly due to the limited number of metabolites in this dataset. However, the Metabolite Class Activity analysis identified amino acids as significant and, marginally, also carboxylic acids (Figure 3A). Accordingly, most amino acids had higher values at early time points (Figure 3B), which is consistent with the highly proliferative state of the pre-BI stage, where protein synthesis is highly active (38). Moreover, the metabolite hub analysis of the STATegra data highlighted a number of compounds as having a high proportion of DEG in their proximal network, among them three polyamines: spermidine, putrescine and spermine (Figure 3C). These metabolites have higher levels at the pre-BI stage and decrease as cells differentiate towards pre-BII (Figure 3D). Neighbouring genes for these metabolites included Srm (spermidine synthase), Sms (spermine synthase) and Amd1 (S-adenosylmethionine decarboxylase proenzyme 1), which were downregulated during differentiation (Figure 3E). Ikaros triggers pre-B-cell differentiation through repression of the c-Myc transcription factor (39), which is known to regulate the expression of polyamine synthesis genes such as Srm, Sms and Amd1 (40). Therefore, repression of c-Myc is consistent with the observed downregulation of these genes and of the three polyamines in the STATegra dataset. Moreover, spermidine, putrescine and spermine have an established role in cell proliferation (41) and are likely to play a key role in T-cell and B-cell differentiation (42). The PaintOmics 4 analysis highlights the polyamine …

PaintOmics 4 analysis of Arabidopsis drought response leverages MapMan and KEGG pathways for novel pathway insights
We used the Arabidopsis BRL3ox study (31) to showcase the utility of PaintOmics 4 for the interpretation of multi-omics data from plants. This study evaluated the response to drought conditions of a mutant overexpressing BRL3, a plant brassinosteroid receptor. Root RNA-seq and metabolomics data after 5 days of drought treatment, together with a list of differentially expressed features, were available.
We ran PaintOmics 4 on the BRL3ox data using the KEGG and MapMan databases. While KEGG is a general pathway database, MapMan is tailored to plants and contains a more detailed representation of plant-specific pathways. A total of 18 and 8 enriched pathways were found for the KEGG and MapMan databases, respectively (combined adjusted P-value < 0.05) (Figure 4A, Supplementary Table S2). KEGG results indicated enrichment of multiple metabolic pathways (e.g. Phenylpropanoid biosynthesis, Biosynthesis of secondary metabolites and Brassinosteroid metabolism), as well as signalling pathways, including the MAPK signalling pathway, ABC transporters and the general Plant hormone signal transduction pathway. MapMan, in contrast, returned an enrichment picture that highlighted the role of specific hormones (synthesis of jasmonic acid, GABA, abscisic acid), secondary metabolites (flavonoids, chorismate, polyamines) and the synthesis of lignin in the BRL3ox response to drought, complementing the biological interpretation provided by KEGG. Many of these processes were discussed in the original publication, supporting the robustness of the PaintOmics 4 analysis. Here, we discuss two novel results not identified in the previous study.
The Synthesis of Chorismate pathway was found enriched by the PaintOmics 4 MapMan analysis. Visual inspection of this pathway indicated a general downregulation in the BRL3ox mutant under drought conditions (Figure 4B). This pathway catalyses the formation of chorismate, the final product of the shikimate pathway and a branch-point metabolite used for the synthesis of aromatic amino acids, p-aminobenzoic acid, folate and other cyclic metabolites such as ubiquinone (43). Under abiotic stress conditions, plants activate the synthesis of aromatic compounds through the shikimate pathway, improving salt stress tolerance but not causing oxidative or drought stress (44). However, some aromatic compounds, such as m-tyrosine, inhibit the growth of many plant species by slowing down root development, and high tryptophan levels have been reported to inhibit root growth (45). Interestingly, both tyrosine and tryptophan were downregulated in BRL3ox plants compared to WT plants under drought conditions (31). We speculate that BRL3 overexpression results in downregulation of the chorismate pathway and of aromatic compound synthesis under drought, improving drought tolerance without growth arrest. Another significant MapMan pathway was Synthesis of lignin, which shows upregulation in treated BRL3ox plants compared to the WT (Figure 4C). This pathway is represented in the MapMan database by 113 different genes distributed into 12 reactions, which implies that multiple steps in the pathway are associated with a large number of genes. For example, the last steps of the pathway represent the conversion of coumaryl, coniferyl and sinapyl aldehydes into their corresponding alcohols, catalysed by the cinnamyl alcohol dehydrogenase protein family, with 44 associated genes, which would be difficult to display individually on the map. The PaintOmics metagene function calculated two major upregulated trends for this node, providing a more interpretable representation of the reaction (Figure 4C).
Lignin is known to play an important role in improving plants' drought resistance through water transport and mechanical support (46). The PaintOmics 4 analysis suggests that the coordinated upregulation of gene families catalysing monolignol synthesis is part of the BRL3ox mechanism of drought resistance.

DISCUSSION
While multi-omics studies have increased in number, scope, sample size and diversity of measured omics modalities, and a wide range of integrative statistical data analysis tools have been proposed, the biological interpretation of these data is still a major challenge. PaintOmics 4 addresses this problem by projecting processed values of multi-omics features onto pathway maps. However, the success of a pathway analysis strategy depends on the amount and identity of the features captured in the database and on their distribution among pathway definitions. Different biological pathway tools focus on different types of organisms, cellular processes, and types of reactions, thereby offering different but complementary views of the biology. Our use cases showed that, by implementing different databases in PaintOmics 4, complementary information about the experiment is gained, and interpretation of the data is improved. To the best of our knowledge, PaintOmics 4 is the only tool to combine these three pathway databases under one analysis.
Another distinctive aspect of PaintOmics 4 is its versatility in representing virtually any omics modality on pathways. This implies that omics layers that act as direct or indirect regulators of pathway activities, such as epigenetic marks or microRNAs, can be interpreted from the perspective of the pathways that they impact. PaintOmics 4 not only provides enrichment analysis for these layers, but further improves interpretability by showing the relationship between the regulatory feature and the regulated gene on the pathway maps. Importantly, and in contrast to other methods that do accept regulatory omics data for pathway views (21), PaintOmics 4 displays multiple layers of regulatory information simultaneously on the pathway representation, both globally and for each pathway node, allowing for different levels of granularity in the analysis. This unique functionality implies that both cis and trans regulatory relationships can be directly linked to the mechanistic representation of the biological process captured by the pathway, thereby facilitating the understanding of the multi-layer component of the multi-omics study. Analysis and interpretation of metabolomics experiments are particularly challenging because pathways usually represent only a fraction of the measured metabolites accurately (e.g. in lipidomics, many different compounds might be represented by the same entity in the pathway) and/or many metabolites present in the metabolic network are not measured in the metabolomics experiment. Moreover, a priori hypotheses of metabolite relevance for the study may dictate the type of metabolomics assay to be run (our STATegra and BRL3ox use cases are examples of this). In such cases, a large fraction of the measured compounds may show significant changes, which undermines any enrichment analysis strategy.
To evaluate nonetheless whether the targeted metabolite types are affected in the experiment, we introduced the Metabolite Class Activity analysis, which tests the hypothesis that a measured metabolite class has a high proportion of compounds with significant changes. Another type of question that multi-omics studies involving metabolomics may pose is the link between a metabolite change and the regulation of the expression of the genes, proteins or other compounds that could modify the metabolite's levels. This is relevant, for example, when looking for metabolite biomarkers or targets of metabolic control. This question can be addressed by a gene-metabolite bipartite network analysis (47) or by a flux balance analysis (FBA) strategy (48). However, gene-metabolite correlation networks usually lack the pathway context, and FBA returns information on fluxes rather than compounds, requires a complex mathematical formulation, and still has limited adaptations to multi-omics data (49). In PaintOmics 4, we propose a simple approach based on the analysis of the local metabolite network to identify the proportion of differentially expressed features. Applied to our STATegra dataset, this method identified a number of metabolites displaying hub-like properties (spermidine, spermine, citric acid, etc.) and complemented the limited metabolomics enrichment analysis results. As the hub analysis panel also provides links to the involved features and pathways, the user has all the information at hand to navigate and interpret the data.
In summary, PaintOmics 4 is a web server for the multilayered biological interpretation of multi-omics data that includes a wealth of resources for a comprehensive and interactive analysis. Future developments will address the growing applications of the multi-omics paradigm to assist precision medicine and single-cell analyses.