Imaging-AMARETTO: An Imaging Genomics Software Tool to Interrogate Multiomics Networks for Relevance to Radiography and Histopathology Imaging Biomarkers of Clinical Outcomes.

PURPOSE
The availability of increasing volumes of multiomics, imaging, and clinical data in complex diseases such as cancer opens opportunities for the formulation and development of computational imaging genomics methods that can link multiomics, imaging, and clinical data.


METHODS
Here, we present the Imaging-AMARETTO algorithms and software tools to systematically interrogate regulatory networks derived from multiomics data within and across related patient studies for their relevance to radiography and histopathology imaging features predicting clinical outcomes.


RESULTS
To demonstrate its utility, we applied Imaging-AMARETTO to integrate three patient studies of brain tumors, specifically, multiomics with radiography imaging data from The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) and low-grade glioma (LGG) cohorts and transcriptomics with histopathology imaging data from the Ivy Glioblastoma Atlas Project (IvyGAP) GBM cohort. Our results show that Imaging-AMARETTO recapitulates known key drivers of tumor-associated microglia and macrophage mechanisms, mediated by STAT3, AHR, and CCR2, and neurodevelopmental and stemness mechanisms, mediated by OLIG2. Imaging-AMARETTO provides interpretation of their underlying molecular mechanisms in light of imaging biomarkers of clinical outcomes and uncovers novel master drivers, THBS1 and MAP2, that establish relationships across these distinct mechanisms.


CONCLUSION
Our network-based imaging genomics tools serve as hypothesis generators that facilitate the interrogation of known and uncovering of novel hypotheses for follow-up with experimental validation studies. We anticipate that our Imaging-AMARETTO imaging genomics tools will be useful to the community of biomedical researchers for applications to similar studies of cancer and other complex diseases with available multiomics, imaging, and clinical data.


INTRODUCTION
Major collaborative initiatives have unleashed a myriad of multiomics, clinical, and imaging data for large patient cohorts in studies of cancer, such as multiomics and clinical data from The Cancer Genome Atlas (TCGA) 1 and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) 2 and radiographic and histopathology imaging data from The Cancer Imaging Archive (TCIA). 3 For example, the brain tumor section of TCGA provides multiomics profiles, including RNA sequencing (RNA-seq), DNA copy number variation, and DNA methylation data for approximately 500 patients with glioblastoma multiforme (GBM) 4,5 and approximately 500 patients with low-grade glioma (LGG). 6 The TCIA Visually AcceSAble Rembrandt Images (VASARI) [7][8][9] project curated a feature set of approximately 30 magnetic resonance imaging (MRI)-derived features on the basis of specialists' review that is available for approximately 200 patients with GBM and approximately 180 patients with LGG.
A trade-off exists between the number of patients included in a data set and the depth of analysis that has moved to increasing levels of refinement, ranging from studying tissues, to cell populations, to single-cell [10][11][12] sequencing. For example, the Ivy Glioblastoma Atlas Project (IvyGAP) [13][14][15] provides 270 transcriptomic profiles refined through histopathology imaging and annotated by a consensus of histopathologists for studying anatomic structures and cancer stem cells for a subset of approximately 30 patients from the TCGA GBM cohort.
In parallel, quantitative imaging provides tools that are capable of processing large volumes of radiography and histopathology images, such as deep convolutional neural networks. The promising field of radiogenomics is based on the idea that entities at different scales, such as molecules, cells, and tissues, are linked to one another and, therefore, may be modeled as a whole. 16 Studies have shown that quantitative image features extracted from radiography imaging data are associated and predictive of gene expression patterns from tissues of matched tumors. [17][18][19][20] Recent efforts further expand this work to link multiomics data with both radiography and histopathology imaging, 21,22 toward developing methods for imaging genomics.
These large archives of multimodal and multiscale data sources provide complementary insights into the mechanistic basis of cancer, toward better diagnosis and treatment, and open unprecedented modeling opportunities to link multiomics data with clinical and imaging phenotypes. As a solution to imaging genomics, we introduce the Imaging-AMARETTO software tools to systematically interrogate networks derived from multiomics data for relevance to imaging biomarkers of clinical outcomes. We demonstrate the utility of these imaging genomics tools by integrating three patient brain tumor databases, including the TCGA GBM and LGG and the IvyGAP GBM cohorts. We uncover known and novel drivers of tumor-associated microglia and macrophage mechanisms, and neurodevelopmental and stemness mechanisms, with interpretation of the underlying molecular mechanisms in light of imaging biomarkers of clinical outcomes.

The *AMARETTO Software Architecture
The *AMARETTO framework (Fig 1) provides tools for network-based fusion of multiomics, clinical, and imaging data within and across multiple patient studies of cancer. Specifically, this framework offers modular and complementary solutions to multimodal and multiscale aspects of network-based modeling within and across studies of cancer through the AMARETTO and Community-AMARETTO algorithms, respectively. In this work, we present an imaging genomics software toolbox that comprises the newly formulated Imaging-AMARETTO and Imaging-Community-AMARETTO algorithms that together facilitate interpretation of patient-derived multiomics networks for their relevance to radiography and histopathology imaging biomarkers of clinical outcomes. Resources of the *AMARETTO 23 software toolbox are available through GitHub, Bioconductor, R Jupyter Notebook, GenePattern, GenomeSpace, and GenePattern Notebook.

Imaging-AMARETTO From a User's Perspective
The workflow for multiomics, clinical, and imaging data fusion includes utilities to learn networks from individual patient cohorts using Imaging-AMARETTO and to link networks across multiple related patient cohorts using Imaging-Community-AMARETTO. Together, these workflows allow users to integrate patient tumor-derived multiomics or transcriptional profiles with clinical and molecular characteristics and radiography and histopathology imaging features within and across related patient cohorts. The Imaging-AMARETTO source code in R is available from GitHub. 24 An R Jupyter Notebook that CONTEXT Key Objective Imaging-AMARETTO provides software tools for imaging genomics through multiomics, clinical, and imaging data fusion within and across multiple patient studies of cancer, toward better diagnostic and prognostic models of cancer. Knowledge Generated Our network-based imaging genomics tools serve as powerful hypothesis generators that facilitate the testing of known hypotheses and uncovering of novel hypotheses for follow-up with experimental validation studies. Our case study that integrated multiple studies of brain cancer illustrates how Imaging-AMARETTO can be used for imaging diagnostics and prognostics by interrogating multimodal and multiscale networks for imaging biomarkers to identify their clinically relevant underlying molecular mechanisms.

Relevance
We anticipate that our Imaging-AMARETTO tools for network-based fusion of multiomics, clinical, and imaging data will directly lead to better diagnostic and prognostic models of cancer. In addition, our tools for network biology and medicine will open new avenues for drug discovery by integrating pharmacogenomics data into these networks, toward better therapeutics of cancer.
provides stepwise guidelines for running the source code directly from GitHub for its application to brain cancer is available from Google Colaboratory. 25 Imaging-AMARETTO supports multiple workflows: a patient cohort with only transcriptional profiles and a patient cohort with multiomics profiles. When only RNA-seq data are available for a cohort, a predefined list of candidate driver genes is required, which can be selected or uploaded by the user. (2) the Community-AMARETTO algorithm learns communities or subnetworks of regulatory circuits that are shared or distinct across networks derived from multiple studies of cancer (eg, across the TCGA GBM, IvyGAP GBM, and TCGA LGG cohorts); and (3) the Imaging-AMARETTO and Imaging-Community-AMARETTO algorithms associate these circuits (AMARETTO) and subnetworks (Community-AMARETTO) to clinical, molecular, and imaging-derived biomarkers by mapping radiography and histopathology imaging data onto the networks and assessing their clinical relevance for imaging diagnostics.
analysis. Networks of regulatory modules are inferred from RNA-seq data using an iterative optimization procedure. The algorithm [34][35][36][37] starts with an initialization step that clusters the genes into modules of co-expressed target genes. For each of these modules, we learn the regulatory programs as a linear combination of candidate driver genes that best predict their target genes' expression using Elastic Net-regularized regression. 38 Target genes are then reassigned to the regulatory programs that best explain their gene expression levels as estimated by the predictive power of the regulatory programs' respective regularized regression models when predicting the target genes' expression. [34][35][36][37] The algorithm iterates over these two steps until convergence. This analysis generates a network of regulatory modules, defined as a group of target genes collectively activated or repressed by their associated drivers.
Imaging-Community-AMARETTO: Linking Imaging-AMARETTO Networks Across Cohorts To compare and integrate networks of regulatory modules across multiple cohorts, the user can submit two or more Imaging-AMARETTO networks and optionally add known networks as collections of signatures to guide subnetwork learning and interpretation, such as immune cell (CIBERSORT 39,40 ) and stemness [41][42][43] signatures. The algorithm 34 creates a module map of all pairwise comparisons between modules across multiple networks to assess the extent of overlapping genes between all pairs of modules (−log 10 P value, hypergeometric test). This module map is partitioned using an edge betweenness community detection algorithm (Girvan-Newman 44 ) that groups the modules into subnetworks or communities across the multiple networks. These communities represent shared behavior across two or more cohorts, and modules not assigned to communities are reported as distinct behavior specific for each cohort. This analysis generates subnetworks or communities of regulatory modules that are shared or distinct across multiple Imaging-AMARETTO networks derived from multiple cohorts and further refines shared and distinct behavior of modules with respect to their specific drivers.

Downstream Utility for Interpreting Clinical and Experimental Outcomes
We developed several downstream utilities that facilitate interpretation of the Imaging-AMARETTO networks, including functional characterization, driver validation, clinical correlation, and imaging association. To functionally characterize modules and communities, we provide signatures from known gene sets databases (MSigDB) that can be augmented with user-defined signatures, such as immune cell, 39,40 stromal cell, 45 and stemness [41][42][43] signatures. Regulatory modules and communities are assessed for enrichment in these known functional categories (hypergeometric test).
To validate the predicted drivers as regulators of their targets in modules and communities, we assess whether activator or repressor drivers have a direct or indirect impact on their targets using experimental genetic perturbation data. To characterize regulatory modules for clinical outcomes and molecular biomarkers, the user can submit phenotypes known for all or subsets of samples and specify the statistical hypothesis tests to use for each phenotype. Examples of clinical and molecular phenotypes include survival data, molecular subclasses (eg, mesenchymal, proneural, or classical GBM, 51 astrocytoma or oligodendroglioma LGG) and biomarkers (eg, IDH mutation, EGFR amplification, MGMT methylation status). Our implementation supports survival analysis using Cox proportional hazards regression, nominal two-class and multiclass analysis using the Wilcoxon rank sum and Kruskal-Wallis tests, and continuous or ordinal analysis using the Pearson linear and Spearman rank correlation tests. These clinical and molecular phenotype associations are assessed for each of the regulatory modules in individual cohorts and combined for communities across cohorts.
Finally, to interpret regulatory modules for relevance to radiography or histopathology imaging features, associations with these imaging features can be assessed. Examples of radiography and histopathology phenotypes include the 30 TCIA VASARI MRI features defined by expert consensus for the TCGA GBM and LGG cohorts, the IvyGAP 13-15 histopathology imaging features characterizing RNA-seq samples refined for anatomic structures and cancer stem cells defined by expert consensus for the IvyGAP GBM cohort, and radiography and histopathology imaging features derived using quantitative imaging methods. 20,[52][53][54] Users are provided with all results in the form of hypertext markup language (HTML) reports that are generated in an automated manner for individual cohorts using Imaging-AMARETTO and multiple cohorts using Imaging-Community-AMARETTO. These reports include searchable tables within and across modules and communities, including statistics (ie, coefficients, P values, false discovery rate [FDR] values) for functional enrichment, driver validation, clinical and molecular biomarkers, and radiography and histopathology imaging features. These reports also include heat map visualizations for modules (Figs 2-6) and graph visualizations for communities (Appendix Fig A1). Source code is also provided to convert Imaging-AMARETTO and Imaging-Community-AMARETTO networks for depositing networks in the NDEx 55 network database, taking advantage of its interactive features.

RESULTS
To demonstrate its utility, we applied the Imaging-AMARETTO workflow to three studies of brain tumors: Disease progression in glioma is characterized by infiltration of resident microglia and peripheral macrophages in the tumor microenvironment and by pervasive infiltration of tumor cells in the healthy surroundings of the tumor. 58 Understanding microglia and macrophage physiology and its complex interactions with tumor cells can elucidate their roles in glioma progression and uncover potentially interesting druggable targets.
Our results show that Imaging-AMARETTO captures these hallmarks of glioma, for example, key drivers of tumorassociated microglia and macrophage mechanisms 59 mediated by STAT3, AHR, and CCR2, and neurodevelopmental and stemness mechanisms 60 that involve OLIG2. Our findings recapitulate recent discoveries 59,60 and provide interpretation of the molecular mechanisms in light of imaging biomarkers of clinical outcomes. Of note, Imaging-Community-AMARETTO also uncovers novel key master drivers that are shared by these distinct key mechanisms.

Imaging-AMARETTO Deciphers Clinical Relevance of Multiomics Modules of Key Driver Mechanisms
Of clinical relevance, higher expression levels of STAT3, AHR, and CCR2 modules (Figs 2, 3, and 6) are associated with shorter survival in GBM and LGG, and these modules also distinguish between molecular subclasses of GBM and LGG. In GBM, the mesenchymal subclass is represented by higher expression of these modules compared with the classical and proneural subclasses. In LGG, the astrocytoma subtype is characterized by higher expression of these modules compared with the oligodendroglioma subtype.
Diametrically opposed, higher expression levels of OLIG2 modules (Figs 4-6) are associated with better survival in GBM and LGG, and these modules also distinguish between molecular subclasses of GBM and LGG but in the opposite direction. In GBM the classical and proneural subclasses are represented by higher expression of these modules compared with the mesenchymal subclass. In LGG, the oligodendroglioma subtype is characterized by higher expression of these modules compared with the astrocytoma subtype.

Imaging-AMARETTO Deciphers Histopathology Imaging Biomarkers of Key Driver Mechanisms
Histopathology imaging features of anatomic structures show that higher expression of STAT3, AHR, and CCR2 modules (Fig 3) distinguishes between samples derived from the cellular tumor compared with those from leading edge and infiltrating tumor regions. Higher expression of OLIG2 modules distinguishes infiltrating tumor from cellular tumor samples.
Features representative of cancer stem cells show that higher expression of STAT3, AHR, and CCR2 modules (Fig 3) distinguishes cancer stem-cell samples from their non-stem cancer cell counterparts. This observation is consistent across the distinct substructures of the cellular tumor, including hyperplastic blood vessels, microvascular proliferation, perinecrotic zone, and pseudopalisading cells around necrosis. Diametrically opposed, higher expression of OLIG2 modules distinguishes non-stem cancer cells from cancer stem cells consistently across these microvascular and necrosis substructures.

Imaging-AMARETTO Deciphers Radiography Imaging Biomarkers of Key Driver Mechanisms
Radiographic image features of STAT3, AHR, and CCR2 modules (Figs 2 and 6) are highly consistent across GBM and LGG. Higher expression is associated with a higher proportion of enhancing tumor, lower proportion of nonenhancing tumor, and less cortical involvement. These modules also distinguish between measures of thickness of enhancing margin in both GBM and LGG. In GBM these STAT3, AHR, and CCR2 modules show higher expression in association with eloquent cortex, while in LGG, they show higher expression in association with enhancement intensity.
Features of OLIG2 modules (Figs 4-6) are also consistent across GBM and LGG and diametrically opposed to those of STAT3, AHR, and CCR2. In both GBM and LGG, higher expression is associated with a higher proportion of nonenhancing tumor and lower proportion of enhancing tumor.

Imaging-Community-AMARETTO Uncovers Known Key and Novel Master Drivers Linking Mechanisms
Recent discoveries of STAT3, AHR, and CCR2 as drivers of tumor-associated microglia and macrophage mechanisms 59 are captured by modules in communities 1 and 5: TCGA LGG module 125 (Fig 2) shows hypomethylation of STAT3 as activator driver of AHR, with higher expression associated with shorter survival and astrocytoma LGG, and IvyGAP GBM module 64 (Fig 3) shows that higher expression of AHR and CCR2 is associated with the presence of cancer stem cells and microvascular substructures and suggests as novel activator driver, THBS1, that plays important roles in macrophage infiltration and angiogenesis, 61 vascularization, 62 and tumorigenesis 62 in glioma. OLIG2 as a driver of neurodevelopmental and stemness mechanisms 60,63,64 is captured by modules in community 2: (1) TCGA LGG module 91 (Fig 4) shows hypomethylation of OLIG2 as activator driver of this module, with higher expression associated with better survival, oligodendrocyte LGG, and IDH1 wild-type status; (2) TCGA GBM module 75 ( Fig 5) is driven by OLIG2, with higher expression associated with proneural and classical versus mesenchymal GBM, and suggests as novel repressor driver THBS1 and as novel activator driver hypomethylation of neuronal marker MAP2 that plays important roles in microtubuleassociated neurogenesis 65,66 and reduces invasiveness 67 and stemness 68 in glioma; and (3) TCGA GBM module 98 (Fig 6) shows CCR2 and OLIG2 co-acting as activator and repressor drivers, respectively, highlighting their diametrically opposed behavior, with higher CCR2 and lower OLIG2 expression associated with mesenchymal versus proneural and classical GBM and suggesting as novel repressor driver hypomethylation of MAP2, consistent with observations in TCGA GBM module 75 (Fig 5).
Using knockdown experiments of THBS1 from LINCS/ CMAP, we confirmed that THBS1 acts as activator and repressor of its targets in IvyGAP GBM module 64 (Fig 3) and TCGA GBM module 75 (Fig 5), respectively.
Thus, Imaging-Community-AMARETTO (Appendix Fig A1) identified THBS1 and MAP2 as novel master drivers across the three STAT3, AHR, CCR2, and OLIG2 communities that provide new insights into how these distinct key mechanisms are linked in glioma. Interesting avenues for further exploration with experimental validation studies include testing novel hypotheses of THBS1 and MAP2 as master regulators of shared mechanisms that involve macrophage infiltration, vascularization, tumorigenesis, invasion, stemness, and neurogenesis in glioma.

DISCUSSION
We developed the Imaging-AMARETTO algorithms and software tools for imaging genomics to facilitate systematic interrogation of regulatory networks derived from multiomics data within and across related patient studies for their relevance to radiography and histopathology imaging features that predict clinical outcomes. We demonstrated its utility through application to three patient studies of brain tumors, including multiomics and radiography imaging data from the TCGA GBM and LGG studies and transcriptional and histopathology imaging data from the Ivy-GAP GBM study.
Our results show that Imaging-AMARETTO recapitulates known key drivers of tumor-associated microglia and macrophage mechanisms (STAT3, AHR, and CCR2) and neurodevelopmental and stemness mechanisms (OLIG2). Imaging-AMARETTO provides interpretation of the underlying molecular mechanisms in light of imaging biomarkers of clinical outcomes, and Imaging-Community-AMARETTO also uncovered novel master drivers THBS1 and MAP2 that establish relationships across these distinct mechanisms.
Of note, the querying of the Imaging-AMARETTO networks for modules whose elevated expression is inversely associated with proportions of enhancing tumor and cancer stem cells on radiography and histopathology imaging, respectively, shows that these modules are putatively coregulated by activator drivers OLIG2 and MAP2 and repressor drivers STAT3, AHR, CCR2, and THBS1. Thus, we hypothesize that restoration of the function of OLIG2 and MAP2 and attenuation of the expression of STAT3, AHR, CCR2, and THBS1 potentially shift their target genes' expression to more benign functional states associated with better survival in GBM and LGG.

C2
FIG A1. Imaging-Community-AMARETTO identifies known drivers AHR, STAT3, CCR2, and OLIG2 and uncovers novel master drivers THBS1 and MAP2 that link distinct key mechanisms that underlie glioma. In this Imaging-Community-AMARETTO graph, the nodes represent the regulatory modules or circuits that are learned from the three studies of brain cancer (The Cancer Genome Atlas [TCGA] glioblastoma multiforme [GBM], Ivy Glioblastoma Project [IvyGAP] GBM, and TCGA low-grade glioma [LGG]; node sizes are scaled by the number of driver and target genes in the modules), the edges represent the extent of overlapping genes between the modules across the three cohorts (edge thickness is scaled with the significance of the overlapping genes between modules), and the clouds represent how the modules across the three cohorts are grouped into the communities or subnetworks that are learned using the Girvan-Newman edge betweenness community detection algorithm. Imaging-Community-AMARETTO organized modules regulated by known drivers of tumor-associated microglia and macrophage mechanisms STAT3, AHR, and CCR2 and neurodevelopmental and stemness mechanism OLIG2 into three communities. Community 5 (C5) links STAT3, AHR, and CCR2, and C1 links AHR and CCR2 as activators of shared modules. Modules regulated by OLIG2 are represented in C2 with OLIG2 as activator of its modules. Of note, C2 also contains a module that links OLIG2 and CCR2 co-acting as repressor and activator, respectively (Fig 6). Imaging-Community-AMARETTO also uncovered THBS1 and MAP2 as novel master drivers across the three STAT3, AHR, CCR2, and OLIG2 communities that provide new insights into how these distinct key mechanisms are linked in glioma. THBS1 is an activator driver of three modules in C1 and C5, and a repressor driver of three modules in C2. MAP2 is a repressor driver of three modules in C1 and C5 and an activator driver of six modules in C2 (except repressor of TCGA GBM module 98). Taken together, C1 links AHR and THBS1 (TCGA GBM module 79). C2 links OLIG2, MAP2, and THBS1 (TCGA GBM module 75; Fig 5); OLIG2 and MAP2 (IvyGAP GBM module 38, TCGA GBM module 61); CCR2, OLIG2, and MAP2 (TCGA GBM module 98; Fig 6). C5 links CCR2, AHR, and THBS1 (IvyGAP GBM module 64; Fig 3) and AHR and STAT3 (TCGA LGG module 125; Fig 2). Using genetic knockdown experiments of THBS1 from LINCS/Connectivity Map, we confirmed that THBS1 acts as activator and repressor drivers of its targets in IvyGAP GBM module 64 (Fig 3) and TCGA GBM module 75 (Fig 5), respectively (http://portals.broadinstitute.org/ pochetlab/JCO_CCI_Imaging-AMARETTO/Imaging-AMARETTO_HTML_Report_TCGA-GBM_IVYGAP-GBM_TCGA-LGG/; www.ndexbio.org/#/network/16820740-d7ea-11e9-bb65-0ac135e8bacf).