Common and specific gene signatures among three different endometriosis subtypes

Aims To identify the common and specific molecular mechanisms of three well-defined subtypes of endometriosis (EMs): ovarian endometriosis (OE), peritoneal endometriosis (PE), and deep infiltrating endometriosis (DIE). Methods Four microarray datasets: GSE7305 and GSE7307 for OE, E-MTAB-694 for PE, and GSE25628 for DIE were downloaded from public databases and conducted to compare ectopic lesions (EC) with eutopic endometrium (EU) from EMs patients. Differentially expressed genes (DEGs) identified by limma package were divided into two parts: common DEGs among three subtypes and specific DEGs in each subtype, both of which were subsequently performed with the Kyoto Encyclopedia of Genes (KEGG) pathway enrichment analysis. The protein-protein interaction (PPI) network was constructed by common DEGs and five hub genes were screened out from the PPI network. Besides, these five hub genes together with selected interested pathway-related genes were further validated in an independent OE RNA-sequencing dataset GSE105764. Results A total of 54 EC samples from three EMs subtypes (OE, PE, DIE) and 58 EU samples were analyzed, from which we obtained 148 common DEGs among three subtypes, and 729 specific DEGs in OE, 777 specific DEGs in PE and 36 specific DEGs in DIE. The most enriched pathway of 148 shared DEGs was arachidonic acid (AA) metabolism, in which most genes were up-regulated in EC, indicating inflammation was the most common pathogenesis of three subtypes. Besides, five hub genes AURKB, RRM2, DTL, CCNB1, CCNB2 identified from the PPI network constructed by 148 shared DEGs were all associated with cell cycle and mitosis, and down-regulated in EC, suggesting a slow and controlled proliferation in ectopic lesions. The KEGG pathway analysis of specific DEGs in each subtype revealed that abnormal ovarian steroidogenesis was a prominent feature in OE; OE and DIE seems to be at more risk of malignant development since both of their specific DEGs were enriched in the pathways in cancer, though enriched genes were different, while PE tended to be more associated with dysregulated peritoneal immune and inflammatory microenvironment. Conclusion By integrated bioinformatic analysis, we explored common and specific molecular signatures among different subtypes of endometriosis: activated arachidonic acid (AA) metabolism-related inflammatory process and a slow and controlled proliferation in ectopic lesions were common features in OE, PE and DIE; OE and DIE seemed to be at more risk of malignant development while PE tended to be more associated with dysregulated peritoneal immune and inflammatory microenvironment, all of which could deepen our perception of endometriosis.


INTRODUCTION
, characterized by the growth of endometrium-type tissue outside the uterine cavity, is a common and usually chronic (long-term) inflammatory disorder, affecting 5-10% of women in their reproductive years (Zondervan et al., 2018). EMs is also considered as a phenotypically heterogeneous condition not only due to diverse symptoms, such as infertility, pelvic pain or dysmenorrhea but also different lesion locations, predominantly but not exclusively, in the pelvic compartment (Vercellini et al., 2014). Since the classic retrograde menstruation hypothesis (Sampson, 1927) that during menstrual uterine contractions, endometrial fragments via trans-tubal reflux flowed to implant onto the peritoneum and abdominal organs could not explain the fact that 76-90% of women experienced retrograde menstruation but not all of these women suffered from EMs (Blumenkrantz et al., 1981), there must exist other mechanisms facilitating the development of EMs. On the other hand, as early as 1997, Nisolle and Donnez provided morphological and histochemical evidence indicating that three main subtypes of endometriosis: ovarian endometriosis (OE), peritoneal endometriosis (PE), and deep infiltrating endometriosis (DIE), should be considered different entities, though they shared the histologic features of endometrial glands and stroma (Nisolle & Donnez, 1997). Thus, investigating the common and specific mechanisms among different EMs subtypes may provide new insight into the pathogenesis of endometriosis. However, due to the limited information as well as samples available from single cohorts, few integrative analyses of EMs subtypes were conducted. Therefore, in this article, we analyzed the microarray datasets GSE7305 and GSE7307 of OE, E-MTAB-694 of PE, and GSE25628 of DIE to obtain common differentially expressed genes (DEGs) among three EMs subtypes along with specific DEGs in each subtype by comparing ectopic lesions (EC) with eutopic endometrium (EU) from EMs patients. Then, the Kyoto Encyclopedia of Genes (KEGG), protein-protein interaction (PPI) network and validation analysis were performed to analyze these common and specific DEGs (Fig. 1). Overall, all results were combined to promote further understanding of different EMs subtypes and reveal a more thorough landscape of EMs.

Data resources
The search for endometriosis-related microarray datasets was conducted in two public databases: Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) and  Besides, all EC and EU samples in GSE7305 and GSE25628, as well as most in E-MTAB-694, were paired, meaning they were obtained from the same patients. The information on patients' clinical characteristics in E-MTAB-694 for PE and GSE25628 for DIE were provided in the Tables S1 and S2 according to the original articles (Crispi et al., 2013;Sohler et al., 2013). However, the clinical information of patients was unavailable in the two datasets for OE: GSE7305 (not provided by the original article (Hever et al., 2007)) and GSE7307 (not originally published). Since our analysis did not involve any experiment on humans or animals directly, ethical approval was not necessary.

Data preprocessing
The raw data CEL files and group information of four datasets were downloaded from the GEO and Array-Express database. Firstly, the CEL files of GSE7305 and GSE7307 of subtype OE were combined into an integrated dataset, which was subsequently named the 'merged OE dataset'. Then, CEL files of this 'merged OE dataset' as well as the individual PE dataset E-MTAB-694 and DIE dataset GSE25628 were read by 'Affy' R package (Gautier et al., 2004), and further processed by the robust multi-array average (RMA) method for background correction, normalization, and expression calculation (Irizarry et al., 2003). Moreover, the 'ComBat' function in the SVA R package (Leek et al., 2012) was utilized to adjust the batch effects in the merged OE dataset, and then the principal component analysis (PCA) was applied to assess the performance of the batch effect adjustment. Also, the individual PE and DIE datasets were performed with PCA analysis to visualize sample distributions. Furthermore, probes were annotated according to the annotation files provided by Affymetrix official website (http://www.affymetrix.com/), and unmatched probes were abandoned.

Identification of DEGs
After pretreatment, R package limma (Ritchie et al., 2015) was applied to filtrate DEGs of these three datasets: the merged OE dataset of GSE7305 and GSE7307, the individual PE dataset E-MTAB-694 and DIE dataset GSE25628, respectively, with the cutoff criteria: |log 2 fold change (FC)|>1 and p-value < 0.05. Additionally, the intersection analysis was conducted among three sets of DEGs.

KEGG pathway enrichment analysis
To explain common pathogenesis among three EMs subtypes (OE, PE, and DIE), KEGG pathway enrichment analysis of shared DEGs among the merged OE dataset of GSE7305 and GSE7307, the individual PE dataset E-MTAB-694 and DIE dataset GSE25628 was performed by DAVID (http://david.abcc.ncifcrf.gov/), an online tool for comprehensive functional annotation of genes and proteins. Furthermore, to explore specific pathogenesis of these three subtypes, specific DEGs in each dataset were also conducted with KEGG pathway enrichment analyses by DAVID. A threshold of p-value < 0.05 was utilized to filter all KEGG pathways, which were also ranked by the p-value.

Validation of interested pathway-related genes in GSE105764
Considering all training datasets were comprised of microarray data, we intended to validate interested pathways related genes, for example, the most enriched pathway of common DEGs or representative pathways in the certain subtype, in RNA-sequencing datasets. However, due to the lack of PE and DIE RNA-sequencing datasets in searching results, we only performed the validation analysis in an OE RNA-sequencing dataset GSE105764 from the GEO database, which contained 8 paired EC and EU tissue samples based on GPL20301 (Illumina HiSeq 4000) (Zhao et al., 2018). The raw read counts were calculated by the DEseq2 R package (Love, Huber & Anders, 2014) to obtain the DEGs. Additionally, we also listed the expression of selected genes in the previously analyzed results of microarray datasets as a comparison.

PPI networks construction and identification of hub DEGs
The identified common DEGs among three EMs subtypes were uploaded to the online database STRING (http://string-db.org/; version 11.0) to explore their interactions at the protein level, with an interaction score > 0.4 as the cutoff value. Afterward, the PPI network was visualized in software Cytoscape (version 3.6.1) and analyzed by the 'Degree method' in plugin Cytohubba to identify the top 5 hub nodes in the network.

Validation analysis of hub DEGs in GSE105764
We also performed the validation analysis of the top 5 hub DEGs identified from the PPI network constructed of 148 common DEGs among three EMs subtypes in the OE RNA-sequencing dataset GSE105764. Additionally, we also listed the expression of these hub genes in the previously analyzed results of microarray datasets as a comparison.

Data preprocessing and Identification of DEGs
After combing the expression data, the merged OE dataset of GSE7305 and GSE7307 included 33 EU and 28 ovarian EC samples. Besides, the individual PE dataset E-MTAB-694 contained 17 EU and 18 peritoneal EC samples, and the individual DIE dataset GSE25628 comprised 8 EU and 8 deep infiltrating EC samples. After normalization of raw data in these datasets, boxplots were depicted, showing even mean values of gene expression in each sample. (Fig. S1). Moreover, the principal component analysis (PCA) was performed to examine the differences between EC and EU groups (Figs. 2A, 2B and 2C). DEGs with log 2 FC > 1 were shown in red dots; DEGs with log 2 FC < −1 were in green dots (P < 0.05  Subsequently, analyzed by the R package limma (Ritchie et al., 2015) with the threshold of |log 2 FC| >1 and p-value < 0.05, a total of 744 upregulated as well as 626 downregulated DEGs were obtained from the merged OE dataset of GSE7305 and GSE7307, along with 731 upregulated and 772 downregulated DEGs from the individual PE dataset E-MTAB-694, and 268 upregulated and 77 downregulated DEGs from the individual DIE dataset GSE25628. Volcano plots were depicted for the visualization of DEGs in each dataset (Figs. 2D, 2E and 2F). The intersection analysis indicated that 148 genes were common to three datasets, 729 DEGs specific to the merged OE dataset, 777 DEGs specific to the individual PE dataset and only 36 DEGs specific to the individual DIE dataset owing to relatively fewer probes in this dataset, as shown in the Venn diagram (Fig. 3A).

KEGG enrichment analysis of common and specific DEGs
Based on the KEGG analysis from the online tool DAVID, 148 common DEGs were mainly enriched in the pathways of arachidonic acid (AA) metabolism, vascular smooth muscle contraction, drug metabolism-cytochrome P450, complement and coagulation cascades, tyrosine metabolism, according to the p-value ranking (Fig. 3B).
Likewise analyzed by DAVID, KEGG analysis showed that the top5 significantly enriched pathways of 729 specific DEGs in the merged OE dataset of GSE7305 and GSE7307 were staphylococcus aureus infection, PI3K-Akt signaling pathway, ovarian steroidogenesis, Rap1 signaling pathway, and cell adhesion molecules (CAMs) (Fig. 3C);  the top5 significantly enriched pathways of 777 specific DEGs in the individual PE dataset E-MTAB-694 were cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), leukocyte trans-endothelial migration, regulation of lipolysis in adipocytes, and natural killer cell-mediated cytotoxicity (Fig. 3D); 36 specific DEGs in the individual DIE dataset GSE25628 were only enriched in two pathways: pathways in cancer and amoebiasis (Fig. 3E). Furthermore, the above-mentioned significant enriched pathways related genes were detailed in Table 2.

Validation of interested pathway-related genes in GSE105764
Since endometriosis was defined as an inflammatory disorder (Bulun et al., 2019) and the top 1 enriched pathway of 148 common DEGs was the arachidonic acid (AA) metabolism pathway, a key pro-inflammatory pathway (Kuehl & Egan, 1980), we validated this pathway-related genes: GGT5, CYP2J2, PTGIS and PLA2G2A in the OE RNA-sequencing dataset GSE105764. The consistency results between microarray training datasets and the RNA-sequencing validation dataset showed that most of these genes were found to be significantly higher in EC tissue than EU endometrium, except for CYP2J2 (Table 3). Moreover, besides the common pathogenetic pathways, each different subtype of EMs might possess its peculiar molecular signature. Since the validation dataset GSE105764 was an OE dataset and the inflammation in EMs were driven by estradiol (Bulun et al., 2019), we chose ovarian steroidogenesis-one representative pathway in subtype OE related genes to validate. The validation results showed that most genes in this pathway were significantly overexpressed in EC in subtype OE, almost consistent with that in the OE microarray training dataset, whereas with no significant difference between EU and EC in the individual PIE and DIE dataset (Table 4), which suggested ectopic tissue specificity and focal prominent aberrant hormonal environment in OE.
We also noticed that both specific DEGs in OE and DIE were enriched in the pathways in cancer, though enriched genes were different. The enriched genes in the 'pathways in cancer' in DIE were COL4A3, COL4A6, PTGER3, RAD51, F2R, while OE seems to be associated with 'pathways in cancer' mainly by PI3K-Akt signaling pathway-related genes, such as FN1, GNG2, KIT, PTGS2, PDGFRA, LPAR1, CCND1, ITGA6, FGFR3, VEGFA, PIK3R1, FGFR2, MET, indicated by both microarray training datasets and RNA-sequencing validation dataset (Tables S3 and S4).

PPI network construction and hub DEGs identification
By analyzing 148 common DEGs in database STRING, a PPI network with 99 nodes and 178 edges was constructed and then visualized in Cytoscape (Fig. 4A). Furthermore, the top 5 hub nodes: CCNB1, CCNB2, RRM2, DTL, AURKB were identified from this PPI network by using the Degree method in plugin Cytohubba (Fig. 4B).

Validation analysis of hub DEGs in GSE105764
As shown in Table 5, the expression trend of top 5 hub DEGs (CCNB1, CCNB2, RRM2, DTL, AURKB) was consistent in three subtypes-all under-expressed in ectopic lesions compared to eutopic endometrium, which was also further validated in the OE RNAsequencing dataset GSE105764.

DISCUSSION
Endometriosis (EMs) was considered as a heterogeneous disease and its subtypes: ovarian endometriosis (OE), peritoneal endometriosis (PE), and deep infiltrating endometriosis (DIE) were likely to possess different aetiologies, which might require different diagnostic markers as well as treatments (Nisolle & Donnez, 1997;Gordts, Koninckx & Brosens, 2017). However, it is rare to see that all these three EMs subtypes were simultaneously included and analyzed in one study. To our best of knowledge, our article represents the first endeavor to analyze the common and specific molecular signatures among OE, PE, and DIE in a bioinformatic way based on effective microarray or RNA-sequencing datasets.
In this article, we conducted an integrative analysis of EMs mRNA microarray datasets belonging to three different subtypes: GSE7305 and GSE7307 of OE, E-MTAB-694 of PE, and GSE25628 of DIE. By analyzing DEGs between EU and EC samples, we found that Table 3 The expression of the arachidonic acid metabolism pathway-related genes in microarray training datasets and the OE RNA-sequencing validation dataset GSE105764.

Gene Description OE RNA-sequencing dataset GSE105764
The merged OE dataset of GSE7305 and GSE7307

The individual PE dataset E-MTAB-694
The individual DIE dataset GSE25628    148 shared DEGs were common to three EMs subtypes, which were mainly involved in the pathways of arachidonic acid (AA) metabolism, vascular smooth muscle contraction, drug metabolism-cytochrome P450, complement and coagulation cascades, tyrosine metabolism revealed by KEGG enrichment analysis. Moreover, both the microarray training datasets and validation OE RNA-sequencing dataset showed that most of the genes in the top 1 enriched pathway-arachidonic acid (AA) metabolism pathway were activated in the EC tissues, except for CYP2J2. In the arachidonic acid (AA) metabolism pathway, under the action of bioactive enzymes, substrate AA was catalyzed to generate endogenous eicosanoids such as prostaglandins (PGs), leukotrienes (LTs) and epoxyeicosatrienoic acids (EETs), which acted as mediators of various inflammatory disorders (Kuehl & Egan, 1980), including endometriosis (Benedetto, 1989;Monsivais et al., 2012;Sacco et al., 2012). PLA2G2A, a member of the phospholipase A2 family (PLA2) to catalyze the release of AA from membrane phospholipids, had been reported over-expressed in OE (Kocbek et al., 2015), DIE (Carrarelli et al., 2016) andPE (Lousse et al., 2010) ectopic lesions compared to normal or eutopic endometrium. The increased levels of PLA2G2A in endometriotic tissue would be responsible for providing AA for further PGs biosynthesis, such as PGE2, thus potentiating survival, migration, and invasion of endometriotic cells (Banu et al., 2008). The overexpression of PTGIS, functioning to convert prostaglandin H2 (PGH2) to prostaglandin I2 (PGI2), whose higher expression in the peritoneal fluid was observed in patients with endometriosis (Ylikorkala et al., 1984), had also been found a previous study (Monsivais et al., 2012). Under hypoxic conditions, PTGIS was reported to promote VEGF expression in human lung fibroblasts by producing PGI2 (Wang et al., 2013), whose enhanced production in ovarian endometrial cyst seemed to be associated with dysmenorrhea in endometriosis patients (Koike et al., 1992). GGT5 was responsible for converting leukotriene C4 (LTC4) to leukotriene D4 (LTD4), both of which were found with an increment of concentration in menstrual blood from patients with primary dysmenorrhea (Rees et al., 1987) and the highly selective LTD4 receptor antagonist had an inhibiting effect on endometriotic implant growth in rat endometriosis model (Kiykac Altinbas et al., 2015). However, our analysis also revealed that in the ectopic lesions, CYP2J2 was under-expressed, an enzyme of cytochrome P450 (CYP) superfamily to metabolize AA into EETs, which exerted an anti-inflammatory effect by various mechanisms (Shahabi et al., 2014). Taken together, the activated arachidonic acid (AA) metabolism and its corresponding products seemed to promote inflammation in ectopic lesions, thus facilitating the development of EMs.
Interestingly, the 5 hub nodes RRM2, AURKB, DTL, CCNB1, CCNB2 identified from the PPI network constructed by148 shared DEGs were all associated with cell cycle and mitosis, and down-regulated in the ectopic lesions in our analysis while always up-regulated in cancer tissues (Wang et al., 1997;Kolesar et al., 2009;Takashima et al., 2014;Kobayashi et al., 2015;Chieffi, 2018), thus promoting excessive proliferation, which suggested limited and controlled proliferative activity in EMs endometriotic lesions, distinct from cancerous proliferation feature. For example, RRM2, one of the subunits of ribonucleotide reductase complex providing precursors indispensable for DNA synthesis (Engström et al., 1985), was reported lowly expressed in EC compared to the EU in another genome-wide microarray study (Zafrakas et al., 2008), while its overexpression would enhance tumor angiogenesis and growth in multiple cancers (Zhang et al., 2009;Rahman et al., 2013). AURKB, a chromosomal passenger protein ensuring correct chromosome alignment and segregation in the mitosis (Kelly et al., 2010), was found low-expressed in OE ectopic tissues (Calcagno et al., 2011) whereas high-expressed in many tumors causing cell aneuploidy division (Giet, Petretti & Prigent, 2005). DTL, known as denticleless E3 ubiquitin-protein ligase homolog, was responsible for mediating the polyubiquitination and subsequent degradation of multiple regulators to ensure proper cell cycle progression (Higa et al., 2006), though seldom studied in EMs. However, the cyclin CCNB1, key regulator as well as CCNB2 in cell cycle controlling G2/M transition (Gong & Ferrell, 2010), were found up-regulated in ectopic tissue compared to eutopic endometrium in Tang et al. study (Tang et al., 2009), which was inconsistent with our analysis. This inconsistency might be explained by different detection methods and the intrinsic heterogeneity of EMs. Moreover, the phenomenon that most ectopic lesions contained sparse epithelial cells, which were abundant in eutopic endometrium and available to proliferate at a rapid speed, and mainly of stromal cells, whose multiplication rate was relatively mild, might result in the slow growth in ectopic tissue (Bulun et al., 2019).
Moreover, besides the common pathogenesis, we also explored the peculiar molecular signature in each subtype of EMs by the KEGG analysis of specific DEGs in different subtypes. For instance, most genes in the pathway of ovarian steroidogenesis were significantly up-regulated in EC in subtype OE while with no significant difference between EU and EC in the PE and DIE, which suggested ectopic tissue specificity and focal prominent aberrant hormonal environment in OE. Furthermore, the most up-regulated gene STAR in the pathway was proved to be correlated with the severity of OE and PE (Tian et al., 2009). That OE was regarded as an indicator of more severe pelvic and intestinal disease (Redwine, 1999) might be explained by this overactive ovarian steroidogenesis in OE since the steroid hormone was primarily produced in the ovarian and then reached endometriotic lesions via blood circulation and follicular fluid at the time of ovulation.
We also noticed that both specific DEGs in OE and DIE enriched in 'pathways in cancer', though enriched genes were different: DIE was associated with the risk of malignant transformation by enriched genes COL4A3, COL4A6, RAD51, F2R, PTGER3, while OE mainly by those PI3K pathway-related genes, such as FN1, GNG2, KIT, PTGS2, PDGFRA, LPAR1, CCND1, ITGA6, FGFR3, VEGFA, PIK3R1, FGFR2, MET. However, PE was more likely related to dysregulated peritoneal immune and inflammatory microenvironment, indicated by its specific DEGs enriched pathways, such as maladjusted cytokine-cytokine receptor interaction, leukocyte trans-endothelial migration, and natural killer cell-mediated cytotoxicity. Whether PE could be regarded as more 'superficial' or 'slight' disease primarily affected by peritoneal fluid factors, than OE and DIE mainly influenced by blood or ovarian factors (Koninckx, Kennedy & Barlow, 1998), still needs further investigation. Hence, the landscape of specific signatures in each subtype were not clear enough in our analysis due to a relatively small number of available microarray datasets as well as limited information provided by these datasets, especially the DIE dataset based on a platform with fewer probes, which would be modified by obtaining more RNA-sequencing datasets including all three subtypes and considerable samples in future analysis.
However, our analysis still has some limitations. Firstly, due to the lack of RNAsequencing datasets for PE and DIE, validation analysis was only performed in subtype OE. Secondly, since the uterine endometrium undergoes molecular signature changes throughout the menstrual cycle (Talbi et al., 2006), different cycle phases might be one of the confounding factors: DIE samples were all collected during the proliferative phase, all PE samples and OE samples in GSE7305 were collected during either proliferative or secretory phase, while cycle phases of OE samples in GSE7307 were unknown. Comparing the gene expression in each subdivided cycle phase respectively would provide an improved perspective. Additionally, the contamination of normal tissue sometimes was unavoidable during the sampling process, which in a way would lead to potentially controversial results. For example, in our results, the significant overexpression of ovarian steroidogenesis related genes in OE while not in PE and DIE ectopic tissues may be explained by the intrinsic differences among subtypes and/or the mixture of normal ovarian tissues. This needs further laboratory investigation in the future.

CONCLUSION
Through integrated bioinformatics analysis, we found that inflammation, especially arachidonic acid (AA) metabolism-related inflammatory process was the most common pathogenesis of OE, PE, and DIE. Besides, a slow and controlled proliferation in ectopic lesions was commonplace in these three EMs subtypes. Meanwhile, abnormal ovarian steroidogenesis was a prominent feature in OE; OE and DIE seemed to be at more risk of malignant development while PE tended to be more associated with dysregulated peritoneal immune and inflammatory microenvironment. All these findings could deepen our understanding of the common and specific molecular events in different subtypes in endometriosis.