ABSTRACT
Characterizing cancer is a delicate challenge, as it requires deciphering complex biological interactions within the tumor microenvironment. Histology images and molecular profiling of tumors are often available in clinical trials and can be leveraged to understand these interactions. However, despite recent advances in representing multimodal data for weakly supervised tasks in the medical domain, numerous challenges persist in achieving a coherent and interpretable fusion of whole slide images and multi-omics data. Each modality operates at a distinct biological level, introducing substantial correlations both between and within data sources. To address these challenges, we propose a deep-learning-based approach that represents multimodal data for precision medicine in a readily interpretable manner. Beyond outperforming state-of-the-art methods across multiple test cases, our approach delivers robust results and extracts scores characterizing the activity of each modality and their interactions at the pathway and gene levels. The strength of our method lies in its capacity to unravel pathway activation through multimodal relationships and to extend enrichment analysis to spatial data for supervised tasks. We demonstrate the efficiency and robustness of its predictive capacity and interpretation scores through an extensive evaluation on multiple TCGA datasets and validation cohorts, underscoring its value in advancing our understanding of cancer. The method is publicly available on GitHub.
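To make the general setting concrete, the sketch below illustrates one common way such a fusion could be wired: attention-based pooling of pre-extracted whole-slide-image tile embeddings combined with an encoded multi-omics vector, where the tile attention weights double as slide-level interpretability scores. This is a minimal illustrative assumption, not the paper's architecture; all module names, dimensions, and design choices here are hypothetical.

```python
# Hypothetical sketch: attention-pooled WSI tile embeddings fused with a
# bulk multi-omics profile. Names and dimensions are illustrative only.
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    def __init__(self, tile_dim=1024, omics_dim=200, hidden=256, n_classes=2):
        super().__init__()
        # Attention pooling over tiles (multiple-instance learning style).
        self.attn = nn.Sequential(
            nn.Linear(tile_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.wsi_proj = nn.Linear(tile_dim, hidden)
        # Encoder for the omics vector (e.g., gene- or pathway-level scores).
        self.omics_enc = nn.Sequential(nn.Linear(omics_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, tiles, omics):
        # tiles: (n_tiles, tile_dim); omics: (omics_dim,)
        weights = torch.softmax(self.attn(tiles), dim=0)  # (n_tiles, 1)
        slide = (weights * tiles).sum(dim=0)              # pooled slide embedding
        fused = torch.cat([self.wsi_proj(slide), self.omics_enc(omics)])
        # Return class logits plus per-tile attention as a spatial score.
        return self.head(fused), weights


model = MultimodalFusion()
logits, tile_scores = model(torch.randn(500, 1024), torch.randn(200))
```

In such a design, the per-tile attention weights provide a spatial map of which regions drive the prediction, while the contribution of each omics input could be probed with standard attribution methods; how the actual method derives its pathway- and gene-level interaction scores is detailed in the manuscript.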
Competing Interest Statement
The authors have declared no competing interest.