Network Medicine‐Based Analysis of Association Between Gynecological Cancers and Metabolic and Hormonal Disorders

Different metabolic and hormonal disorders such as type 2 diabetes mellitus (T2DM), obesity, and polycystic ovary syndrome (PCOS) have a tangible socio-economic impact, and their prevalence among women is steadily increasing. There is clinical evidence that these conditions are related to the manifestation and poor prognosis of different gynecological cancers. The relationship between metabolic and hormonal disorders and gynecological cancers is complex, and gene-level association studies are needed to find markers and predict risk factors. In the current work, we selected the metabolic disorders T2DM and obesity, the hormonal disorder PCOS, and 4 different gynecological cancers: endometrial, uterine, cervical, and triple negative breast cancer (TNBC). The gene lists were downloaded from the DisGeNET database (v 6.0). The protein interaction network was constructed using HIPPIE (v 2.2), and shared proteins were identified. The molecular comorbidity index and the Jaccard coefficient (degree of similarity) between the diseases were determined. Pathway enrichment analysis was done using ReactomePA, and significant modules (clusters in a network) of the constructed network were analyzed using the MCODE plugin of Cytoscape. Comorbid conditions such as PCOS-obesity were found to increase the risk of ovarian and triple negative breast cancers, whereas PCOS alone has the highest contribution to endometrial cancer. The different gynecological cancers were found to be differentially related to the metabolic/hormonal disorders and comorbid conditions.


Introduction
Different gynecological cancers such as ovarian, cervical, and endometrial cancers are steadily increasing in different countries. In the USA alone, statistics showed 2.5 lakh (about 250,000) deaths due to these cancers during 2003-2015 [1]. The age-standardized incidence rate of cervical cancer showed an upward trend over the years in India, Japan, the UK, Brazil, and other countries [2]. Metabolic disorders such as type 2 diabetes mellitus (T2DM) and obesity have been clinically associated with multiple complications, including cancer. The prevalence of obesity (defined as a body mass index ≥30 kg/m²) has steadily increased over time in both men and women [3]. Clinically, obesity is linked to low-grade but persistent inflammation, which is in turn linked to cancer [4,5]. Associations between obesity and different gynecologic cancers such as ovarian, cervical, endometrial, and breast cancer have been identified, and obese patients have poorer clinical outcomes [6,7]. Type 2 diabetes is closely linked to obesity, as most patients with T2DM are obese and obesity increases the risk of diabetes [3]. Different clinical observations suggest that T2DM is associated with an increased risk of gynecologic cancers such as endometrial, cervical, and ovarian cancer [8]. T2DM causes inflammation, insulin resistance, and modulation of the female steroid hormone estrogen, which result in endometrial cancer [9]. Although a significant correlation between cervical cancer and T2DM is yet to be established, higher plasma levels of insulin-like growth factor 1 were found in cervical cancer patients [10]. A patient cohort analysis identified a significant correlation between T2DM and poor prognosis of ovarian cancer: overall survival and progression-free survival were shorter for patients with T2DM than for non-diabetic patients [11].
Polycystic ovary syndrome (PCOS) is a hormonal and metabolic disorder that affects women of reproductive age. PCOS is generally characterized by amenorrhea, ovulatory dysfunction, hyperandrogenism, etc., and its metabolic dysfunction includes insulin resistance-related complications and obesity [12]. Several clinical studies suggest that PCOS patients have a higher risk of gynecological cancer. A thickened endometrium and continuous exposure to estrogen increase the risk of endometrial cancer in PCOS patients [13]. There are conflicting reports about the association between PCOS and ovarian cancer. Some reports suggest that longer exposure to androgens in PCOS results in ovarian cancer [14,15], whereas others found no association between PCOS and ovarian cancer [16,17]. However, Barry et al. found that when older women with a history of PCOS were included in the study, the risk of ovarian cancer increased significantly [16]. Although no direct link has been established between PCOS and cervical cancer, prolonged estrogen exposure increases the chance of human papillomavirus (HPV) infection [18]. Several other factors influence gynecologic cancers, such as smoking, diet, estrogen therapy, and HPV infection. Different studies suggest that smoking increases the risk of ovarian cancer [19], breast cancer [20], and cervical cancer [21,22]. A high-fat diet is related to a higher endometrial cancer risk [23], whereas a Mediterranean diet is associated with a lower endometrial cancer risk [24]. A positive association between TNBC and animal protein intake has also been observed [25].
Triple negative breast cancer (TNBC) is a subset of breast cancer characterized by the absence of estrogen receptor, progesterone receptor, and ErbB-2 (HER2) expression. It is a highly aggressive form of cancer that is more prevalent in younger women, and its poor prognosis and high recurrence rate make it challenging to treat [26]. In population-based analyses, TNBC is more prevalent among African-American and Asian populations [27]. Different reports suggest that TNBC is rapidly increasing among the younger population in India, where it comprises nearly 31% of all breast cancer cases [26][27][28].
Gene network analysis has been used to identify molecular markers in cancer cells. Several network models, such as implication networks, Boolean networks, and Pearson's correlation networks, have been used to identify significant biomarkers [29]. Gene-gene network analysis has also shown how gene expression patterns change during breast cancer progression, and the authors identified specific pathways, such as glycerophospholipid metabolism and h-Epf, that are specifically related to metastasis progression [30]. Network analysis of lung adenocarcinoma revealed its scale-free and "small-world" network properties [31].
In the current work, we collected the lists of genes for T2DM, obesity, PCOS, TNBC, endometrial carcinoma, epithelial ovarian cancer, cervical cancer, and uterine corpus cancer from the DisGeNET database (v 7.0). DisGeNET is a publicly available gene-disease association database [32] that has been used in multiple studies [33][34][35]. In the present article, we investigated the molecular correlation between PCOS, obesity, T2DM, and gynecological cancers, both individually and in comorbid conditions. We also identified key genes with important functions in the shared components between the metabolic/hormonal disorders and the gynecological cancers.

Materials and Methods
This study involves data extraction, data analysis, diseasome construction, network construction, module analysis, and pathway enrichment.

Diseasome Construction
The association between diseases can be represented as a network called a diseasome. It is constructed based on the proteins shared between the diseases or on interactions between the disease-associated proteins in the interactome [36] (Fig. 1). Information regarding the interactions between disease-associated proteins was obtained from the Human Integrated Protein-Protein Interaction rEference (HIPPIE v2.2) [37] (http://cbdm.mdc-berlin.de/tools/hippie/information.php). Using the disease-gene associations and the human interactome data from HIPPIE (filtered to a confidence value > 0.5), protein-protein interaction (PPI) networks between the diseases were constructed.
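The diseasome rule above (link two diseases if they share proteins or if their proteins interact in the confidence-filtered interactome) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the tuple layout `(protein_a, protein_b, confidence)` is an assumption rather than the actual HIPPIE file format, the disease-gene sets are toy stand-ins for DisGeNET downloads, and the helper names are hypothetical.

```python
from itertools import combinations

# Toy HIPPIE-style records (protein_a, protein_b, confidence). The tuple layout
# is an assumption for illustration, not the real HIPPIE file format.
INTERACTIONS = [
    ("EGFR", "ESR1", 0.82),
    ("ESR1", "AR", 0.73),
    ("TP53", "AKT1", 0.91),
    ("IRS1", "PTEN", 0.42),  # falls below the 0.5 cutoff and is dropped
]

# Toy disease-gene sets standing in for DisGeNET downloads (hypothetical).
DISEASE_GENES = {
    "T2DM": {"IRS1", "AKT1", "ESR1"},
    "PCOS": {"AR", "IRS1"},
    "TNBC": {"EGFR", "TP53"},
}


def filtered_interactome(records, cutoff=0.5):
    """Return undirected edges whose confidence is strictly above `cutoff`."""
    return {frozenset((a, b)) for a, b, score in records if score > cutoff and a != b}


def crosses(edge, genes1, genes2):
    """True if the edge joins a protein of one disease to a protein of the other."""
    a, b = tuple(edge)
    return (a in genes1 and b in genes2) or (b in genes1 and a in genes2)


def diseasome_edges(disease_genes, edges):
    """Link two diseases if they share proteins or their proteins interact.

    Returns {(disease1, disease2): (n_shared_proteins, n_cross_interactions)}.
    """
    links = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        g1, g2 = disease_genes[d1], disease_genes[d2]
        shared = len(g1 & g2)
        cross = sum(1 for e in edges if crosses(e, g1, g2))
        if shared or cross:
            links[(d1, d2)] = (shared, cross)
    return links


edges = filtered_interactome(INTERACTIONS)
print(diseasome_edges(DISEASE_GENES, edges))
# {('PCOS', 'T2DM'): (1, 1), ('T2DM', 'TNBC'): (0, 2)}
```

In this toy input PCOS and TNBC stay unlinked because they share no protein and none of their proteins interact after filtering.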
More extensively characterized diseases tend to share more genes (proteins; genes and proteins are used interchangeably in this report) simply because more genes are known for them. To remove this bias, the strength of association between two diseases in the diseasome is estimated using the molecular comorbidity index (MCI) [33,39,40], which is normalized by the total number of proteins associated with both diseases. The MCI between two diseases dis1 and dis2 can be calculated using the following definition:
$$\mathrm{MCI}_{dis1,dis2} = \frac{\left|(\mathrm{Proteins}_{dis1} \cap \mathrm{Proteins}_{dis2}) \cup \mathrm{Proteins}_{dis1 \to dis2} \cup \mathrm{Proteins}_{dis2 \to dis1}\right|}{\left|\mathrm{Proteins}_{dis1} \cup \mathrm{Proteins}_{dis2}\right|} \qquad (1)$$
where "Proteins_dis1" and "Proteins_dis2" are the proteins associated with disease 1 and disease 2, respectively, "Proteins_dis1→dis2" are the proteins associated with disease 1 that interact with proteins associated with disease 2, and vice versa for "Proteins_dis2→dis1". "∩" (intersection operator) denotes the proteins common to both diseases and "∪" (union operator) denotes the total proteins associated with both diseases. The numerator and denominator are written within vertical bars to represent their cardinality (number of elements). Visual graphical representations of the diseasome network and disease PPIs were created using Cytoscape [41].
Fig. 1 Construction of the diseasome using gene-disease association and interactome data [38]. Gene-disease association data were obtained from DisGeNET. Protein-protein interaction data obtained from HIPPIE reveal associations between the gene products related to different diseases.
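The MCI can be computed directly from the two disease protein sets and the interaction list. The sketch below assumes the bounded form MCI = |(P1 ∩ P2) ∪ P1→2 ∪ P2→1| / |P1 ∪ P2|, in which a protein that is both shared and interacting is counted once; the function names and toy proteins are hypothetical, and if the cited definition sums the three cardinalities instead of taking their union, only the `covered` expression would change.

```python
def cross_interactors(src, dst, edges):
    """Proteins in `src` with at least one interaction partner in `dst`."""
    hits = set()
    for a, b in edges:
        if a in src and b in dst:
            hits.add(a)
        if b in src and a in dst:
            hits.add(b)
    return hits


def mci(proteins1, proteins2, edges):
    """Molecular comorbidity index under the assumed bounded (union) definition."""
    union = proteins1 | proteins2
    if not union:
        return 0.0
    covered = (
        (proteins1 & proteins2)                          # shared proteins
        | cross_interactors(proteins1, proteins2, edges)  # dis1 -> dis2
        | cross_interactors(proteins2, proteins1, edges)  # dis2 -> dis1
    )
    return len(covered) / len(union)


# Toy example: D is shared; A and C interact with disease-2 proteins; E and F with disease-1 proteins.
p1 = {"A", "B", "C", "D"}
p2 = {"D", "E", "F"}
ppi = [("A", "E"), ("B", "X"), ("F", "C")]
print(round(mci(p1, p2, ppi), 3))  # 0.833 (5 of 6 proteins covered; B is not)
```

Because the denominator is the union of both protein sets, a disease pair with few shared genes can still score highly when most of its proteins interact across the pair, which is the situation discussed for PCOS and TNBC.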

Functional Enrichment Analysis of Disease Proteins
Pathway enrichment analysis of the gene list of each disease against the Reactome [42] database was performed to find the most significant pathways involved in disease pathogenesis. ReactomePA [43] is an R package that uses the hypergeometric distribution to perform a statistical over-representation test of the input gene list against the pathway annotations in the database, yielding a list of significantly enriched pathways. A cutoff of q value < 0.05 (the false discovery rate (FDR) measure employed by ReactomePA) was used to shortlist the enriched pathways. For this analysis, besides the individual metabolic/hormonal conditions, three comorbid conditions (the coexistence of more than one disease condition in an individual) were also taken into account: T2DM-obesity, obesity-PCOS, and PCOS-T2DM. Gene lists for these comorbid conditions were obtained from the PPI networks between the corresponding disorders built using Cytoscape, based on the assumption that the interaction network represents the actual comorbid condition in an individual.
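ReactomePA performs the over-representation test internally; the following plain-Python sketch only illustrates the idea. It computes the upper-tail hypergeometric probability of observing at least k list genes in a pathway and then applies Benjamini-Hochberg adjustment, which is used here as a stand-in for the q value reported by ReactomePA (the two FDR estimates are not identical). The universe size, pathway sets, and function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, universe, pathway_size, list_size):
    """P(X >= k): chance of at least k pathway genes in a random list of `list_size` genes."""
    upper = min(pathway_size, list_size)
    hits = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe, list_size)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(gene_list, pathways, universe_size, cutoff=0.05):
    """Return (adjusted p, pathway) pairs below `cutoff`, most significant first."""
    genes = set(gene_list)
    names, pvals = [], []
    for name, members in pathways.items():
        k = len(genes & members)
        if k == 0:
            continue  # only pathways overlapping the list are tested
        names.append(name)
        pvals.append(hypergeom_sf(k, universe_size, len(members), len(genes)))
    adjusted = benjamini_hochberg(pvals)
    return sorted((adj, name) for name, adj in zip(names, adjusted) if adj < cutoff)


# Hypothetical example: a 5-gene list tested against two pathways in a 100-gene universe.
genes = {f"g{i}" for i in range(1, 6)}
pathways = {
    "Pathway A": set(genes),                              # contains all 5 list genes
    "Pathway B": {"g1"} | {f"x{i}" for i in range(49)},  # 50 members, 1 overlap
}
for q, name in enrich(genes, pathways, universe_size=100):
    print(f"{name}: adjusted p = {q:.2e}")
```

Only Pathway A passes the 0.05 cutoff here: a single hit in a 50-gene pathway is unsurprising in a 100-gene universe, whereas all five list genes landing in a 5-gene pathway is very unlikely by chance.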
As a measure of the degree of similarity between diseases, the Jaccard coefficient (JC) was used to quantify the pathways commonly shared among the disease conditions [33,40].
The Jaccard coefficient between two diseases dis1 and dis2 is given by the following definition:
$$\mathrm{JC}_{dis1,dis2} = \frac{\left|\mathrm{Pathways}_{dis1} \cap \mathrm{Pathways}_{dis2}\right|}{\left|\mathrm{Pathways}_{dis1} \cup \mathrm{Pathways}_{dis2}\right|} \qquad (2)$$
where "Pathways_dis1" denotes the pathways enriched in disease condition 1 and "Pathways_dis2" denotes the pathways enriched in disease condition 2. "∩" (intersection operator) denotes the pathways common to both diseases and "∪" (union operator) denotes the total pathways associated with both diseases. The numerator and denominator are written within vertical bars to represent their cardinality (number of elements). The resulting JC values were visualized as a heatmap created using the ComplexHeatmap [42] package in R.
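Given the enriched-pathway set of each condition, the pairwise JC matrix that feeds the heatmap and hierarchical clustering can be computed as in the sketch below. The pathway labels are illustrative placeholders, not results from this study, and `jaccard` and `jc_matrix` are hypothetical helper names.

```python
def jaccard(set1, set2):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 0.0


def jc_matrix(pathways_by_condition):
    """Pairwise JC matrix; rows and columns follow the sorted condition names."""
    names = sorted(pathways_by_condition)
    matrix = [
        [jaccard(pathways_by_condition[r], pathways_by_condition[c]) for c in names]
        for r in names
    ]
    return names, matrix


# Hypothetical enriched-pathway sets, for illustration only.
enriched = {
    "T2DM": {"IL-4/IL-13 signaling", "FOXO-mediated transcription", "Insulin receptor signaling"},
    "PCOS": {"FOXO-mediated transcription", "Estrogen signaling"},
    "Endometrial": {"Estrogen signaling", "FOXO-mediated transcription", "IL-4/IL-13 signaling"},
}
names, jc = jc_matrix(enriched)
for name, row in zip(names, jc):
    print(name, [round(v, 2) for v in row])
```

The matrix is symmetric with ones on the diagonal, so it can be handed directly to a heatmap or clustering routine (ComplexHeatmap in the study).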

Module Analysis of Comorbidity Network
Modules are one of the fundamental elements of network theory. They are regions of a network whose nodes are densely interlinked with one another (topological modules). In a biological network, such a module implies a group of genes or proteins that have similar functions or contribute to a common pathway (functional modules), and some of these pathways may lead to a disease phenotype (disease modules) [36]. Modules in a network can be identified using network clustering algorithms. MCODE [44], a Cytoscape plugin, was used to identify the modules present in the PPI networks of the comorbid conditions (T2DM-obesity, obesity-PCOS, and PCOS-T2DM).
All parameters were left at their defaults in the MCODE analysis panel. The top modules, based on rank and number of nodes, were selected in each of the three comorbid networks and further analyzed using the Cytoscape StringApp [45] for functional enrichment of the nodes (proteins) in the modules. The results of this functional enrichment analysis were restricted to those obtained from the KEGG [46] and Reactome databases to understand the relevance of these modules in the comorbid disease networks to cancers.

Variable Strength of Association Between Metabolic Disorders and Gynecological Cancers
MCI is an important parameter for understanding the strength of association between two diseases or pathological conditions, and the number of shared genes also indicates the association between two diseases. Obesity and ovarian cancer showed the strongest association (MCI 0.8), and ovarian cancer also shares the highest number of genes with obesity among all 4 types of cancer. Conversely, TNBC shares the lowest number of genes with obesity and also has the lowest MCI among the 4 cancer types (0.73) (Fig. 2A). T2DM showed a similar trend, with ovarian cancer the highest in both MCI (0.79) and number of shared genes and TNBC the lowest in both MCI (0.73) and number of shared genes (Fig. 2C). However, the relationship between the number of shared genes and MCI is different for PCOS. Although TNBC shares the lowest number of genes with PCOS, it has the highest MCI (0.79) among all 4 types of cancer, followed by endometrial cancer (0.77) (Fig. 2B). This is because the number of PCOS and TNBC proteins that interact with each other is high despite the low number of shared genes (Eq. 1), and the total number of genes associated with the two diseases is low (Fig. 2D). This corroborates the hypothesized link between PCOS and TNBC [47,48]. The diseasome constructed in Fig. 2D gives an overview of the interactions between the different diseases; its construction is described in the "Materials and Methods" section. T2DM and obesity showed a stronger association with each other (MCI 0.82) than with PCOS. Ovarian and cervical cancer showed a strong association (MCI 0.83), and overall, all 4 types of cancer were strongly associated with each other, with MCI > 0.8 for every pair.

Comorbid Condition Has Higher Number of Shared Pathways
The Jaccard coefficient (Eq. 2) represents the relationship between two diseases as the number of shared pathways out of the total number of pathways. We calculated the JC between T2DM, obesity, PCOS, ovarian cancer, endometrial cancer, cervical cancer, TNBC, and the comorbid conditions T2DM-obesity, T2DM-PCOS, and obesity-PCOS. Overall, the risk of TNBC is considerably high for T2DM and obesity alone, but it increases nearly 2.2-fold in the obesity-PCOS and T2DM-PCOS comorbid conditions (Fig. 3B). For cervical cancer, the risk is high for T2DM and obesity alone and does not increase significantly for the T2DM-obesity comorbid condition. For PCOS alone it was not significant, but in the comorbid conditions (T2DM-PCOS and obesity-PCOS) it increased more than 2.2-fold. For endometrial cancer, the Jaccard coefficients suggest that all three conditions are moderate risk factors, although PCOS alone has a higher JC for endometrial cancer than for the other cancers, which supports the clinical findings [17,49]. In the comorbid condition, the risk factor increases 1.7-fold when obesity-PCOS is compared with PCOS alone. For ovarian cancer (Fig. 3B), the risk is high for T2DM and obesity alone but lower for PCOS, and it increases in the comorbid conditions. The JC for T2DM-obesity is higher than for T2DM or obesity alone, but the increase is modest (1.24-fold compared with T2DM). However, PCOS-obesity and T2DM-PCOS show a significant increase compared with PCOS alone (2.46- and 2.48-fold, respectively) (Fig. 3B). These results indicate that PCOS associated with T2DM or obesity significantly increases the risk of gynecological cancers. This observation can be attributed to the combined effects of chronic inflammation in obesity and T2DM and altered sex steroid metabolism in PCOS, which together contribute to the higher risk of cancers [50]. Overall, the obesity-PCOS and T2DM-PCOS comorbidities had higher JC values towards cancer than the T2DM-obesity comorbidity.
As discussed above, this might be due to the combined effect of the dysfunctional metabolism caused by obesity and T2DM and the dysregulation of sex steroids caused by PCOS. Besides the hub genes identified previously for the individual metabolic conditions in relation to cancer, the modules of these two comorbidities also have the androgen receptor (AR) as a hub gene, indicating the significance of sex steroid-related signaling in these comorbidities and how it might lead to cancer progression. Hierarchical clustering of the JC values resulted in 3 separate clusters (Fig. 3A): the first comprised T2DM, obesity, T2DM-obesity, obesity-PCOS, and T2DM-PCOS; the second comprised the cancers alone; and the third, separate cluster contained PCOS alone. Transitivity (T) and the clustering coefficient (C) are two important parameters of protein-protein interaction networks. They measure how tightly the nodes of a network cluster together and thus reflect its modularity, and the modularity of the comorbid networks is a good indication of network robustness. We calculated T and C using the igraph package in R (Figs. S8, S9, and S10 and Table S2) and found that the variation in clustering coefficients was similar among the comorbid networks.
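The study computed T and C with igraph in R; as a language-neutral illustration, the two quantities can be written out in a few lines of Python. Transitivity is the fraction of connected triples that close into triangles, and the average clustering coefficient is the mean of each node's local coefficient. This sketch counts nodes with fewer than two neighbours as zero, whereas igraph's `transitivity()` lets the caller choose the treatment via its `isolates` argument, so values may differ for networks with many low-degree nodes. The helper names and the toy network are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def closed_pairs(adj, node):
    """Number of neighbour pairs of `node` that are themselves connected."""
    return sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])


def local_clustering(adj, node):
    k = len(adj[node])
    if k < 2:
        return 0.0  # convention used here: low-degree nodes contribute zero
    return 2 * closed_pairs(adj, node) / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient (C)."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj) if adj else 0.0


def transitivity(adj):
    """Global transitivity (T): closed triples / connected triples."""
    closed = sum(closed_pairs(adj, n) for n in adj)  # equals 3 x number of triangles
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0


# Toy network: triangle A-B-C plus a pendant node D attached to C.
g = adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(round(transitivity(g), 3), round(average_clustering(g), 3))  # 0.6 0.583
```

Even in this four-node example the network-wide transitivity (0.6) and the node-averaged coefficient (about 0.58) differ, because the average weights every node equally while transitivity weights nodes by their number of neighbour pairs.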

Pathway Analysis Revealed Common Pathways and Hub Genes
Pathway enrichment analysis is a method for evaluating shared pathways between comorbid diseases [51]. Comorbidity analysis evaluates the pathological properties of two different diseases and how they are interrelated, and pathway enrichment for comorbidity analysis helps us understand the molecular mechanisms and pathways common to the diseases [52]. Comorbidity analysis also helps to estimate the risk factors shared between diseases, and many studies have revealed comorbidities through pathway enrichment and other analyses [38,52,53]. The protein-protein interaction (PPI) networks were constructed using Cytoscape; interaction networks between the three metabolic/hormonal disorders and the 4 cancers were constructed to calculate the MCI. These visual representations of the disease interaction networks do not include all the genes (proteins) associated with a disease, but only those that interact with the proteins of the other disease in the pair, showing how these diseases are connected at the molecular level (Fig. 4). The networks were drawn with the "Edge-weighted Spring Embedded layout" based on the edge betweenness of the network. Nodes are colored by disease type, and node labels denote the Entrez gene ID.
Fig. 3 Jaccard coefficient analysis. A Hierarchical clustering of the cancers and all the conditions mentioned: the cancers form one cluster, and the comorbid conditions along with T2DM and obesity form a separate cluster. Shared pathways between PCOS and the cancers are low. B Jaccard coefficients of the different cancers against T2DM, obesity, PCOS, and the comorbid conditions. Overall, JC is higher for the comorbid conditions.
Furthermore, the PPI networks were analyzed for the hub genes and/or proteins (nodes with the highest degrees) that are common across the cancer comorbidities of the three metabolic/hormonal disorders (Table 2). Epidermal growth factor receptor (EGFR/HER1/ERBB1) and the estrogen receptors (ESR1 and ESR2) were identified as hub genes across all the metabolic/hormonal disorder-cancer comorbidities. Tumor protein p53 (TP53) was a significant hub gene in both the T2DM- and obesity-related cancer comorbidities. Apart from these, several other genes strongly associated with many different types of cancer, including AKT1, BRCA1, and MYC, were found as hub genes in the obesity-cancer condition. The literature indicates that these key genes/proteins play important regulatory roles in the development and progression of these diseases by taking part in major intracellular signaling pathways, and they may therefore act as molecular bridges in these comorbidities.
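Hub detection by degree, and the intersection of hubs across comorbidity networks, can be sketched as follows. The edge lists are toy data chosen only to exercise the code (they are not the networks summarized in Table 2), the top-n cutoff is an assumed parameter, the helper names are hypothetical, and ties are broken alphabetically to keep the output deterministic.

```python
from collections import Counter


def top_hubs(edges, n=3):
    """Return the n highest-degree nodes (ties broken alphabetically)."""
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}  # drop loops and duplicates
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]


def common_hubs(networks, n=3):
    """Genes that are top-n hubs in every network of `networks`."""
    hub_sets = [set(top_hubs(edges, n)) for edges in networks.values()]
    return set.intersection(*hub_sets) if hub_sets else set()


# Toy comorbidity networks (hypothetical edges, for illustration only).
networks = {
    "T2DM-cancer": [("EGFR", "ESR1"), ("EGFR", "TP53"), ("EGFR", "AKT1"),
                    ("ESR1", "TP53"), ("ESR1", "IRS1")],
    "PCOS-cancer": [("AR", "ESR1"), ("AR", "EGFR"), ("ESR1", "EGFR"),
                    ("ESR2", "ESR1"), ("AR", "IRS1")],
}
for name, edges in networks.items():
    print(name, top_hubs(edges))
print(sorted(common_hubs(networks)))  # ['EGFR', 'ESR1']
```

Degree is the simplest centrality; other measures such as betweenness could rank hubs differently, so the choice of n and of centrality should be reported alongside the hub list.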
EGFR dysfunction is common in many cancers, including breast, lung, colon, cervical, and ovarian cancers. In our study, EGFR was a common hub gene in all three disease conditions (Table 2), which supports previous findings on the role of EGFR in TNBC [54], cervical [55], endometrial [56], and ovarian cancers [57,58], where EGFR inhibitors have been used. Estrogen receptors (ER) α and β (encoded by the ESR1 and ESR2 genes, respectively) were also identified as hub genes. There is strong epidemiological evidence for endometrial cancer among women with underlying conditions such as PCOS and obesity, and among the gynecological organs the endometrium is highly sensitive to estrogen signaling. This is also reflected in the present study, in which PCOS as an individual condition showed its highest JC value with endometrial cancer among the four cancer types.
The functional enrichment results of the disease-gene lists from ReactomePA were visualized as bar graphs showing the top ten pathways enriched for each disease condition (Supplementary file, Figs. S1-S7), using the enrichplot package in R (https://github.com/GuangchuangYu/enrichplot). The number of commonly enriched pathways among the different metabolic/hormonal conditions and cancers was calculated and is shown in Table 3. Pathways such as interleukin signaling, FOXO-mediated transcription regulation, and estrogen signaling were the most significantly enriched pathways common to all the conditions (Fig. S1). This list also includes leptin signaling; leptin is a significant biomarker for breast and endometrial cancer and is related to obesity and multiple mitogenic pathways (Fig. S1). We also identified PI3K/AKT, receptor tyrosine kinase (RTK), and MAP kinase signaling, as well as signaling mediated by different growth factor receptors (including IGF-1, ERBB, and different growth hormones) (Fig. S1). Taken together, we identified multiple cellular events, such as cytokine-mediated inflammation, regulation of gene expression, development of insulin resistance, and steroid hormone metabolism, that could lead to the cancers in the aforementioned hormonal/metabolic conditions.

Module Analysis and Pathway Enrichment of Comorbid Condition Linked to Cancer-Related Pathways
To understand the higher JC values of the hormonal/metabolic comorbid conditions, we identified key modules in the comorbid networks using MCODE. MCODE analysis of the comorbidity networks (Fig. 5) in Cytoscape resulted in a number of modules, from which the top three modules were selected, taking into account both module rank and number of nodes. These top modules were then separated from the parent network and visualized as subnetworks (Fig. 5). Table S1 contains the information on the modules obtained for all three comorbidity networks. Functional enrichment of the modules of the T2DM-obesity network resulted in a list of KEGG pathways. The most enriched KEGG pathways obtained from the module enrichment include breast, colorectal, pancreatic, endometrial, HPV-associated cervical, and prostate cancers. The enrichment analysis also yielded a number of molecular signaling pathways, such as PI3K/AKT signaling, estrogen signaling, insulin resistance, endocrine resistance, adipokine signaling, and mTOR signaling, that are highly relevant and known to contribute to the pathogenesis of the cancers in this study. Each module was also analyzed for the hub genes with the highest degree centralities. EGFR was the major hub gene in module 1; ESR2 was present in module 1, although it was not among the genes with the highest degree centrality. Likewise, analysis of modules 2 and 4 revealed ESR1 and TP53 as their major hub genes. This agrees with the hub genes identified from the previous analysis of the whole disease networks between T2DM and obesity and the cancers.
In the T2DM-PCOS comorbid network, several types of cancer, including small cell and non-small cell lung cancer, prostate cancer, breast cancer, endometrial cancer, liver cancer, bladder cancer, pancreatic cancer, colorectal cancer, cervical cancer, and ovarian cancer, were enriched across the three modules (Fig. 5b). In addition, a large number of signaling pathways that could contribute to cancer initiation and progression were enriched. In module 1, EGFR, ESR1, and AR were the major hub genes in the network. In module 4, ESR2 had the highest degree centrality. Although module 3 did not contain any of the genes identified in the previous analysis, its hub genes, including PIK3R1, CDKN1A, and IRS1, are important genes in pathways that could lead to cancer progression. Endocrine resistance, JAK/STAT signaling, PI3K/AKT and AGE-RAGE signaling, p53 signaling, ErbB signaling, insulin resistance, T2DM, apoptosis, adipokine signaling, and estrogen signaling were some of the major molecular pathways enriched across the modules.
In the obesity-PCOS network (Fig. 5c), hub gene analysis of the modules showed that AR and IRS1 in module 1 and EGFR in module 2 were the major genes. Although ESR2 and TP53, which were identified as important genes in the metabolic disorder-cancer comorbidities, were present in modules 1 and 2, respectively, they were not among the genes with the highest degree centrality. The pathways enriched for these modules were mostly the same as those enriched for the obesity-PCOS comorbidity network, including several intracellular signaling pathways such as PI3K/AKT, MAPK, and FOXO. KEGG enrichment of the modules also showed most of the cancers that were enriched for the other two metabolic comorbidities. Module 3, however, seems to make a minor contribution in terms of cancer: its enrichment analysis showed some cancer-relevant pathways, but not as strongly as the other two modules, and it did not contain any of the hub genes identified previously. However, PTEN, a gene known to contribute to several intracellular cancer-related pathways, was the node with the highest degree centrality in this module.

Conclusions
Despite numerous clinical observations and meta-analyses showing an increasing prevalence of gynecological cancers in women with metabolic/hormonal disorders such as T2DM, obesity, and PCOS, and the common use of metabolic drugs such as metformin in treating these cancers, very few studies have shown the molecular interlinks through which these disorders contribute to cancer pathogenesis and prognosis, especially for ovarian, endometrial, uterine, and triple negative breast cancers. Besides showing the molecular relationship between the metabolic disorders and those cancer types, some analyses in this study also showed strong relationships with other major and deadly cancers, including but not limited to cancers of the pancreas, colon, liver, and lung. Pathway enrichment data indicated the involvement of interleukin pathways, especially IL-4 and IL-13, in all the disorders and related cancers. IL-4 and IL-13 are believed to promote epithelial carcinogenesis through their receptors in different types of cancer, including pancreatic cancer [60]. IL-4-dependent initiation of the PI3K/AKT pathway [60] is observed in PCOS and all 4 gynecological cancers (Figs. S3-S7). IL-4 secreted by cancer cells is believed to play a key role in the M1 to M2 transformation of tumor-associated macrophages [60]. There are multiple common pathways between the hormonal/metabolic disorders and the gynecological cancers (Table S1) that are pro-inflammatory or induce mitogenic signals. Separate independent studies are necessary to further examine the relationship between the metabolic disorders studied here and these other cancer types. The findings of this in silico study shed light on some of the major and hidden molecular relationships among the diseases, which will be useful for disease diagnosis and categorization and, in the process, for symptom mitigation, treatment, and cure development.
Although this study highlighted some potentially important genes and pathways that could contribute to metabolic disorder-cancer comorbidities, further experimental validation is needed, both at the molecular level and in the respective populations, which will also be valuable for understanding these comorbidities in detail.