Identification of neoplasm-specific signatures of miRNA interactions by employing a systems biology approach

MicroRNAs represent major regulatory components of the disease epigenome and they constitute powerful biomarkers for the accurate diagnosis and prognosis of various diseases, including cancers. The advent of high-throughput technologies facilitated the generation of a vast amount of miRNA-cancer association data. Computational approaches have been utilized widely to effectively analyze and interpret these data towards the identification of miRNA signatures for diverse types of cancers. Herein, a novel computational workflow was applied to discover core sets of miRNA interactions for the major groups of neoplastic diseases by employing network-based methods. To this end, miRNA-cancer association data from four comprehensive publicly available resources were utilized for constructing miRNA-centered networks for each major group of neoplasms. The corresponding miRNA-miRNA interactions were inferred based on shared functionally related target genes. The topological attributes of the generated networks were investigated in order to detect clusters of highly interconnected miRNAs that form core modules in each network. Those modules that exhibited the highest degree of mutual exclusivity were selected from each graph. In this way, neoplasm-specific miRNA modules were identified that could represent potential signatures for the corresponding diseases.


INTRODUCTION
are endogenous, small (∼22 nucleotides in length), non-proteincoding RNAs that play a critical role in regulating the expression of target mRNAs at the post-transcriptional level by complementary base-pairing of the so-called miRNA ''seed'' region with the 3 UTR region of the target (Bartel, 2004;Kehl et al., 2017). In this way, miRNAs reduce the stability of the target mRNA and/or inhibit its translation, thereby downregulating the expression of the corresponding gene (Ambros, 2004).
Bioinformatics approaches have been widely used to identify and predict miRNA diagnostic signatures for diverse types/subtypes of cancers (Nunez Lopez et al., 2018;Rehman et al., 2019;Shams et al., 2020;Yang, Zhang & Yang, 2019). However, despite the wealth of studies on cancer-associated miRNAs and the identification of miRNAs that could serve as potential diagnostic biomarkers, these studies have largely focused on certain types/subtypes of cancers. Of note, these miRNA signatures ignore mutual exclusivity, i.e., miRNAs used to detect a particular type/subtype of cancer and not another. To the best of the authors' knowledge, identification of neoplasm-specific miRNAs that are detected exclusively in a particular type/subtype of cancer is lacking. Therefore, there is an emerging need for identifying core miRNA signatures for characterizing diverse cancers. The findings of this study could be considered towards reducing the cases of spurious cancer diagnosis and improving neoplasm-specific detection by the application of panels of miRNAs exclusive and unique to the major types of cancers.
In the present study, we have employed an integrated bioinformatics methodology for the identification of core sets of miRNAs specific to a particular neoplastic disease, by exploiting relevant data from publicly available resources. Only highly connected miRNA modules displaying significant mutual exclusivity were selected for each group of neoplasms. The component miRNAs of those modules could potentially constitute biomarkers for distinguishing different types of cancers, which could complement and update the existing ones

MATERIALS & METHODS
A graphical presentation of the overall workflow of this study is depicted in Fig. 1. All analyses were performed in the R statistical computing environment v.4.2.0 (https://www.r-project.org, accessed on 10 January 2022), unless otherwise stated.

Acquisition of miRNA-neoplasm associations
The miRNA and neoplasm associations were retrieved from four comprehensive publicly available databases: (a) miR2Disease (Jiang et al., 2009)

Nomenclature
In order to maintain a consistent nomenclature for the neoplasm names from the four databases, the disease terms were annotated according to the International Classification of Diseases and Related Health Problems (ICD), revision 11 (ICD-11) (Pickett & Anderson, 2018). ICD is maintained by the World Health Organization (WHO) and represents the global, comprehensive reference for the collection, processing and hierarchical classification of diseases, which can be used accurately and effectively to investigate the relationships among different diseases. NCBI's Medical Subject Headings (MeSH) (Lipscomb, 2000), a comprehensive, hierarchically-organized vocabulary for subject indexing, cataloguing and searching biomedical and life sciences-related information, was used for assigning the neoplastic diseases to broader groups. Official gene symbols were assigned to gene names according to the HUGO Gene Nomenclature Committee (HGNC) (Tweedie et al., 2021); the official miRNA names in miRBase (Kozomara, Birgaoanu & Griffiths-Jones, 2019) were used in this study.

Prediction of miRNA target genes
The corresponding gene targets of the miRNAs under investigation were identified. To this end, both the experimentally supported and predicted mRNA targets of the neoplasmassociated miRNAs were retrieved with the usage of the following state-of-the-art software tools: (i) microT_CDS has implemented the target prediction algorithm DIANA-microT-CDS (Paraskevopoulou et al., 2013); (ii) TargetScan searches for the presence of conserved sites that match the seed region of each miRNA (Agarwal et al., 2015); (iii) miRDB predicts functional miRNA gene targets by machine learning modeling of target gene expression data (Chen & Wang, 2020;Liu & Wang, 2019); (iv) PicTar relies on an algorithm for the identification of common microRNA targets (Krek et al., 2005); (v) miRanda, which is available online as part of the miRanda-mirSVR tool, depends on evolutionarily conserved miRNA-targets (Enright et al., 2003). To enhance the robustness of the prediction accuracy, only those target genes predicted by more than three tools were included in the study.

Integration of miRNA-miRNA interactions
The miRNA-miRNA pairwise functional relationships were assembled and inferred from two diverse sources, subsequently followed by merging and removal of duplicate miRNA-miRNA interactions.
GOSemSim (Yu et al., 2010), utilizes information-theoretic measurement to find similarities between genes based on Gene Ontology (GO) (Ashburner et al., 2000) terms. Thus, GOSemSim assigns a function to genes according to shared GO annotations. A miRNA-miRNA matrix was created based on the degree of functional similarity of their target genes; the higher the degree of gene relatedness, the higher the association score of the respective miRNAs.
In addition, miRNA-miRNA pairwise relationships were obtained from MIRGOFs (Yang et al., 2018), which measures functional similarities for miRNAs based on the GO annotation of their corresponding target genes by adopting two novel features: (i) it utilizes a new GO semantic similarity metric which considers both common ancestors and descendants of GO terms; (ii) it computes similarity between GO sets in an asymmetric manner, and weights each GO term by its statistical significance.
Only those miRNA pairwise interactions, from GoSemSim and MIRGOFs, with weight score ≥ 0.6 (where ''1'' is the highest) were selected to ensure robustness.

Construction of miRNA-neoplasm networks
Firstly, miRNA-neoplasm networks were constructed for each group of neoplasms, where the links were assigned two weights, one from MIRGOFs (Yang et al., 2018) and the second one from the calculated miRNA matrix based on GOSemSim.
The topological features of the generated networks were investigated through Cytoscape (v.3.9.1) (Shannon et al., 2003), a software platform for network manipulation and visualization. Arena3Dweb, an interactive, web-based tool for the visualization of multilayer networks in tree-dimensional space was also used for network depiction (Karatzas et al., 2021).

Identification of mutually exclusive cancer-relevant miRNA modules
To achieve the maximum possible mutual exclusivity, first, unique miRNAs were detected in each neoplasm. Second, the igraph package in R was used to calculate the node degree distribution (Barabasi, Gulbahce & Loscalzo, 2011) and clustering coefficient (i.e., an estimate of the tendency of the nodes in a graph to cluster together) (Watts & Strogatz, 1998) in each miRNA-neoplasm network (a total of thirty graphs). Then, the degree distribution was applied and a simultaneous comparison was performed between the 30 networks in order to find modules of miRNAs that are included in one neoplasm and not in the others. An in-house R script was used in order to handle the complexity of the datasets. Every dataset was included in a hash and the mutually exclusive modules were detected.

RESULTS
Functionally linked diseases tend to share common epigenetic mechanisms to ensure the coordinated regulation of their causative genes (Zinani, Keseroglu & Ozbudak, 2022). MiRNAs cooperate towards gene co-targeting and co-regulation in various neoplastic diseases, and hence, tend to form densely connected communities in the disease regulome (Bryan et al., 2014). Pivoted on this premise, the aim of this study was to explore the miRNA-miRNA interaction networks for diverse groups of neoplastic diseases, based on the functional relations of their corresponding target genes, towards the discovery of core sets of miRNAs per cancer that could represent potential signatures for each cancer type. To this end, this work sought to identify tightly interconnected modules per neoplasm with the highest possible degree of mutual exclusivity based on network topological properties.

Neoplasm classification
All neoplasms were classified into thirty major groups using first the ICD-11 standard hierarchy for medical terms and then the MeSH vocabulary for assigning those terms to broader classes (Table 1). For example, the ICD terms 'adenocarcinoma of lung', 'follicular carcinoma of thyroid gland', and 'adenoid cystic carcinoma' were classified under the broader MeSH term 'adenocarcinoma'. The ambiguous terms were removed from the subsequent stages of the analysis.

Construction of miRNA-miRNA graphs per neoplasm
The miRNA-neoplasm bipartite graphs were constructed by relying entirely on experimentally known miRNA-cancer associations assembled and integrated from four diverse resources.
The miRNA-miRNA interactions in the corresponding thirty monopartite networks were inferred from two different sources, directly and indirectly. In the former case, the pairwise miRNA associations were obtained from miRGOFS which includes miRNA-miRNA similarities computed with respect to the functional relationships of their target genes according to shared GO terms. In the latter case, miRNA targets were first detected through the consensus output of more than three miRNA-gene interactions databases, which include both experimentally supported and predicted entries. Then, GOSemSim was deployed, where the similarities for each pair of miRNAs were computed based on the functional annotation of their target genes according to GO; the more semantically related two GO terms, the higher the degree of association of the corresponding genes. The miRNA-miRNA pairs from both sources were combined and the duplicates were removed; a total of 2604 unique miRNAs were detected across the thirty neoplasms (Table S1).
All thirty networks are very dense with an average estimated clustering coefficient ∼0.89 (where '1' is the highest value). This indicates that the nodes in the given graphs exhibit a high tendency to cluster together. Despite the stringent cutoff values, the number of significant links is quite large, and as such, no visually discernible modules could be detected in any of the networks. Based on topological analyses, all graphs have a random degree distribution, where the node degrees (i.e., the number of connections of a given node to neighboring nodes) follow a Poisson distribution, unlike the biological networks which are scale-free and are dominated by few 'hubs' (i.e., highly connected nodes) (Barabasi, 2009;Barabasi, Gulbahce & Loscalzo, 2011;Goh et al., 2007;Kontou et al., 2016). This is probably attributed to the ''promiscuous'' nature of miRNAs to potentially target numerous genes, resulting in this way to a high number of corresponding miRNA connections. In order to infer less dense and more robust networks, only those links with weights above the 90th percentile per network were selected. In this way, the highest scoring miRNA-miRNA interactions were used for network construction and subsequent module detection. After pruning, there were 1,893 unique miRNAs across neoplasms, with the number of miRNAs per neoplasm ranging from 39 (endocrine gland neoplasm) to 1,821 (colorectal neoplasms). Consequently, the number of the corresponding pairwise miRNA-miRNA associations was reduced significantly (Table 1 and Table S2). Noteworthy, this is observed in a more pronounced way in the more populated graphs, since the number of potential miRNA connections increases exponentially with an increase in the number of member miRNAs. As it is shown in Fig. 2, the endocrine gland neoplasm miRNA network becomes less dense with a reduced number of nodes (miRNAs) and connections (miRNA

Detection of neoplasm-specific miRNA modules
The goal of this study was to detect distinct subgraphs within the neoplasm graphs which cannot be contained by another subgraph. These subgraphs, which represent local clusters or modules, are considered semiautonomous and therefore are of high biological importance (Pelaez & Carthew, 2012). The top scoring mutually exclusive modules per neoplastic disease were selected based on the criteria that they exhibited significant mutual exclusivity and their constituent miRNAs were more highly connected to each other than to other miRNAs in the entire graph.
Modules of miRNA interactions that displayed a statistically significant level of mutual exclusivity were detected in seventeen neoplasms (Table 2). A single miRNA-miRNA pair (i.e., hsa-miR-1296-3p and hsa-miR-6748-5p) was detected in bladder cancer, which was excluded from the analysis. In the other twelve neoplasms under study, no distinguishable modules could be detected. The seventeen modules were comprised of three to twentythree miRNAs (Table 2). Of note, only a few miRNAs were found in up to five cancer modules (hsa-miR-4273, hsa-miR-4422, hsa-miR-4720-3p and hsa-miR-5002-3p); this is because mutual exclusivity was a prerequisite. Among them were modules consisting of a relatively large number of miRNAs, such as the esophageal (23), colorectal neoplasms (20), and breast neoplasms (17), and also modules with a moderate number of component miRNAs including the liver (8) and lung neoplasms (7). Cliques of only three miRNAs were detected in the sarcoma and squamous cell carcinomas (Fig. 3). In addition, PubMed (https://pubmed.ncbi.nlm.nih.gov/) was searched with each of the neoplasms and the core miRNA components of the corresponding modules (Table 2). Nonetheless, in spite of extensive searches, any reported combinations of any two miRNAs that compose the neoplasm-specific modules could not be detected (Table 2).

DISCUSSION
MicroRNAs constitute major components of the epigenetic regulome of diseases including cancers. Previous studies have focused on the identification of disease-specific miRNAinteraction signatures. Nalluri and colleagues (2017) introduced miRsig, a network inference method for identifying preserved miRNA-miRNA interactions across different types of cancers using miRNA expression profiles (Nalluri et al., 2017). In another study, cancer-specific miRNAs were detected in miRNA target-deregulated networks built on the differentially expression profiles of miRNAs and mRNAs (Xu et al., 2011). Song et al. (2015) identified co-regulating miRNAs in lung cancer based on shared functionally linked target genes. However, these studies have not taken into consideration the mutual exclusivity relations between the neoplastic diseases under investigation.
In the present study, a priori miRNA-cancer relationships were collected from four comprehensive databases which contain different categories of association data, i.e., differential expression profiles, circulating biomarkers, genome-wide association studies (GWAS), knockdown studies, etc. This sort of 'pluralistic' data would enable the extraction of the maximum amount of available cancer-associated miRNA information, in order to predict and prioritize pairwise miRNA interactions per neoplasm towards the identification of distinct groups of miRNAs. The neoplasm terms were first homogenized, in order to eliminate any duplicate entries. Then, miRNA connections were inferred by relying on the shared functionality of their target genes. Based on the premise that miRNAs contribute to the post -transcriptional gene regulation in a synergistic manner (Shi et al., 2013), by co-ordinately regulating target genes, we generated miRNA-oriented networks of the top-ranking miRNA pairs, and sought to discover modules of tightly connected miRNAs per neoplasm.
Herein, mutual exclusivity was included as a prerequisite in order to detect sets of densely connected miRNAs that could represent the signature component for a certain type of cancer and not for another. This approach would be particularly useful for the accurate detection of specific types of cancers. Likewise, in a prominent study by Ciriello et al. (2012), mutual exclusivity analysis of diverse genomic alterations was performed in order to identify oncogenic pathway modules.
The environmental context, that is, the connection patterns of the individual miRNAs, is very important, since in different niches (e.g., normal versus diseased state) the same miRNA may behave differently, including its differential expression status and connections. Therefore, both the miRNAs per se and the connection patterns among the component miRNAs of a module should be taken into consideration when extracting miRNA signatures. This is suggestive of the cooperativity between the respective miRNAs, so as to exert their regulatory effect on their targets.
In this study, miRNA modules were identified in seventeen neoplastic disease groups, composed of three to twenty-three members. The unique miRNA hsa-miR-3689c was found Table 2 Mutually exclusive miRNA modules per neoplasm.

CONCLUSION
Taken together, in the present study, a novel computational pipeline is provided, which could likely update and complement previous studies on cancer-relevant miRNAs. The predicted miRNA-miRNA interactions found in the modules of the neoplastic diseases could pave the way for the rational design of experimental studies directed towards the elucidation of these novel pairings. These miRNA-interaction signatures unique to each type of cancer could be possibly utilized in the clinical setting, along with the biomarkers that are currently being applied, for enhancing diagnostic accuracy in differentiating between cancer types; thereby, a panel of miRNAs could be used for the precise detection of a particular neoplastic disease and not another. The overall methodology proposed in this study could be also extrapolated to other datasets for the diagnosis and prognosis of human diseases/disorders.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the project ''ELIXIR-GR: The Greek Research Infrastructure for Data Management and Analysis in Life Sciences'', Grant Number (MIS) 5002780. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.