Bioinformatical analysis identifies PDLIM3 as a potential biomarker associated with immune infiltration in patients with endometriosis

Background Endometriosis is a chronic systemic disease, whose classic symptoms are pelvic pain and infertility. This disease seriously reduces the life quality of patients. The pathogenesis, recognition and treatment of endometriosis is still unclear, and cannot be over emphasized. The aim of our study was to investigate the potential biomarker of endometriosis for the mechanism and treatment. Methods Using GSE11691, GSE23339 and GSE5108 datasets, differentially expressed genes (DEGs) were identified between endometriosis and normal samples. The functions of DEGs were reflected by the analysis of gene ontology (GO), pathway enrichment and gene set enrichment analysis (GSEA). The LASSO regression model was performed to identify candidate biomarkers. The receiver operating characteristic curve (ROC) was used to evaluate discriminatory ability of candidate biomarkers. The predictive value of the markers in endometriosis were further validated in the GSE120103 dataset. Then, the expression level of biomarkers was detected by qRT-PCR and Western blot. Finally, the relationship between candidate biomarker expression and immune infiltration was estimated using CIBERSORT. Results A total of 42 genes were identified, which were mainly involved in cytokine–cytokine receptor interaction, systemic lupus erythematosus and chemokine signaling pathway. We confirmed PDLIM3 was a specific biomarker in endometriosis (AUC = 0.955) and validated in the GSE120103 dataset (AUC = 0.836). The mRNA and protein expression level of PDLIM3 in endometriosis tissue was significantly higher than normal. Immune cell infiltration analysis revealed that PDLIM3 was correlated with M2 macrophages, neutrophils, CD4+ memory resting T cells, gamma delta T cells, M1 Macrophages, resting mast cells, follicular helper T cells, activated NK cells, CD8+ T cells, regulatory T cells (Tregs), naive B cells, plasma cells and resting NK cells.


INTRODUCTION
It affects nearly 10% of women who are of reproductive age. The most common symptoms are pelvic pain and infertility. Nearly 70% of teens with pelvic pain are later diagnosed with endometriosis (Yeung et al., 2011). Endometriosis seriously reduces the life quality of patients, increases incidence of depression and causes a heavy burden on the direct and indirect healthcare (Agarwal et al., 2019). The gold standard for endometriosis diagnosis is invasive laparoscopy examination. It is difficult to recognize because of patients' nonspecific descriptions of symptoms, especially at early stages. Despite its prevalence, the average time from symptom onset to diagnosis is 5 to 10 years (Hudelist et al., 2012;Greene et al., 2009). The new opinion about diagnosis of endometriosis is emphasize disease symptoms and their origins rather than lesion presence or absence (Yeung et al., 2011). Specific circulating biomarkers, such as miR-125b-5p, miR-150-5p, miR-61, have been studied as promising early clinical diagnostic tools, although functional researches are needed to confirm their effect on this disease. Currently, medical therapy and surgical resection are the gold standard treatment for endometriosis (Vercellini et al., 2014) while both are associated with high recurrence rates and limited by high occurrence of side effects. The mechanisms underlying the pathogenesis and development of endometriosis will open the door to a new approach to treatment.
Various hypotheses have been proposed to explain the underlying molecular mechanisms of endometriosis, consisting of the endometrial implantation theory of Sampson (Kruitwagen et al., 1991), eutopic endometrium determinism, stem cell factors (Burney & Giudice, 2012), epigenetic factors, immunodeficiency as well as immune and inflammatory factors (Ahn et al., 2015).
Recent clinical and molecular studies have suggested that innate properties or acquired properties of the endometrium and impaired immune homeostasis are systems of significance in explaining the mechanism of endometriosis (Burney & Giudice, 2012). Immune imbalance is associated with increased implantation, proliferation, and angiogenesis of ectopic endometrium. Previous studies have shown that immune dysfunction happens in peritoneal fluid of endometriosis, which may contribute to the progression of endometriosis. Functional changes in immune components have been indicated, consisting of natural killer (NK) cells, monocytes/macrophages, B cells, T lymphocytes, and cytokines (Zou et al., 2021). Although numerous studies have described occurrence of immune abnormalities in endometriosis, the role of the immune system is not clearly understood. Hence, it is essential to explore mechanism of immune dysfunction in endometriosis, which can contribute to elucidate the role of immune system in pathogenesis of endometriosis and generate applicable insights to develop preventive and control strategies, original non-invasive diagnostic methods and targeted therapeutics.
In the present study, we downloaded three microarray datasets (GSE11691, GSE23339, GSE5108) from the Gene Expression Omnibus (GEO) database to detect the potential biomarkers for further analysis. We identified candidate genes, explore immune mechanism and key signaling pathways regulating occurrence, development and progression of endometriosis through bioinformatics method. In conclusion, these findings provide novel insights into understanding the underlying immune mechanisms and seeking for potential molecular treatment.

Gene expression profile
We searched the associated gene expression profiles of endometriosis in NCBI-GEO (https://www.ncbi.nlm.nih.gov/geo/) (Barrett et al., 2005), using the keywords "endometriosis" and "Homo sapiens". Three gene expression datasets containing microarray data from normal endometrial tissues and ectopic endometrial tissues, GSE11691 (Hull et al., 2008), GSE23339 (Hawkins et al., 2011) andGSE5108 (Eyster et al., 2007), were selected for further analysis. The GSE11691 microarray data consisted of nine normal human endometrium and nine ectopic human endometrium (two from the peritoneal wall, seven from the broad ligament) and was generated using the GPL96 platform Affymetrix Human Genome U133A Array. The GSE23339 microarray data consisted of nine normal human endometrium and 10 ectopic human endometrium (all from the endometriotic cysts of the ovaries) and was generated using the GPL6102 Illumina human-6 v2.0 expression beadchip. The GSE5108 microarray data consisted of 11 normal human endometrium and 11 ectopic human endometrium (five from the peritoneal wall, six from the endometriotic cysts of the ovaries) and was generated using the GPL2895 GE Healthcare/Amersham Biosciences CodeLink Human Whole Genome Bioarray. In addition, the GSE120103 (Bhat et al., 2019) dataset, containing 18 endometriosis (all from the endometriotic cysts of the ovaries) and 18 control samples, was used as the validation cohort using the GPL6480 platform Agilent-014850 Whole Human Genome Microarray 4 × 44K G4112F. GSE12768 (Ritchie et al., 2015) dataset, containing two endometriosis (both from the endometriotic cysts of the ovaries) and two control samples, was used as another validation cohort using the GPL7304 Institut Cochin HG18 60mer expression array 47K.

Identification of differentially expressed genes (DEGs)
The raw data of three datasets, GSE11691, GSE23339 and GSE5108, was merged and removed batch effects by "sva" package (http://www.bioconductor.org/) (Ritchie et al., 2015). Then, the "limma" package (http://www.bioconductor.org/) was used to attain DEGs between endometriosis samples and normal samples. The raw data of GSE12768 was analyzed separately to acquired the DEGs list, and compared with the list of three datasets in order to verify the reliability of our analysis. The significant DEGs were identified according to adjusted P value < 0.05 and |log fold change (FC)| > 2.

Functional and pathway enrichment analysis of DEGs
To reflect gene functions, gene ontology (GO) (http://geneontology.org/) (Thomas, 2017) biological processes were shown in three aspects: biological processes (BP), cellular component (CC) and molecular function (MF). And the analysis process was achieved by "clusterProfiler" package (Wu et al., 2021;Yu et al., 2012) in the present study, using the cut-off criterion P value < 0.05. ToppGene (https://toppgene.cchmc.org/enrichment.jsp) (Chen et al., 2009) is a free tool for uncovering the biological significance behind a cluster of genes. The REACTOME (https://reactome.org/) (Jassal et al., 2020) pathway enrichment analysis, was exactly performed using this tool. A P value < 0.05 was considered as the threshold points.
Moreover, to get more information of mechanism behind DEGs, gene set enrichment analysis (GSEA) was employed to explore the potential feature between the endometriosis and control groups. The "c2.cp.kegg.v7.4.symbols.gmt" from the Molecular Signatures Database (MSigDB) was downloaded and used as the reference gene set. An adjusted P < 0.05 and false discovery rate (FDR) < 0.05 were considered as significant.
Protein-protein interaction (PPI) networks construction STRING (https://string-db.org/) (Szklarczyk et al., 2015) is a tool designed to achieve a comprehensive and objective global network and visualize PPI information. Proteins associated with DEGs were chosen based on information in STRING (confidence score > 0.4), and then PPI networks were constructed using Cytoscape software (http://cytoscape. org/) (Shannon et al., 2003).

Screening of specific gene biomarkers
The least absolute shrinkage and selection operator (LASSO), a regression analysis algorithm, was applied to evaluate the value of DEGs in this study. The "glmnet" package in R did the above analysis and returned candidate potential genes. And the expression levels of candidate biomarkers were further tested in the GSE120103 dataset.

Receiver operating characteristic (ROC) curve analysis
To test the predictive accuracy of the identified biomarkers, we generated ROC curves using "pROC" package (Robin et al., 2011) based on the candidate genes and compare the predictive value of candidate genes by the area under the ROC curve (AUC). And then, the predictive value of these genes was also validated in the GSE120103 dataset.

Relationship between identified genes and infiltrating immune cells in endometriosis
We used the CIBERSORT (https://cibersortx.stanford.edu/) (Newman et al., 2015;Robin et al., 2011) to calculate immune cell infiltrations of endometriosis and control tissues. Then, we applied "corrplot" and "vioplot" R package to visualize the results and explore correlation of infiltrating immune cells. In addition, the relation of the candidate gene markers with the levels of infiltrating immune cells was calculated and visualized by "ggpubr" package.

Protein extraction and western blot
Normal endometrium tissues and ovarian endometriosis tissues were lysed in RIPA lysis buffer (PC101; EpiZyme, Shanghai, China) with protease and phosphatase inhibitor cocktail. The protein concentration was detected by a BCA protein assay. The same amounts of total protein (32 µg) were loaded and separated onto 12.5% SDS-PAGE and was then transferred to PVDF membranes. After blocking with protein-free rapid blocking buffer (PS108P; EpiZyme, Shanghai, China) for 0.5 h at room temperature, the membranes were incubated with primary antibodies at 4 C overnight. The primary antibodies rabbit anti-PDLIM3 antibody (1:2,000; Servicebio) was used and rabbit antiβ-Actin (1:5,000; ABclonal) served as the internal control.
The membranes were probed with secondary antibodies (1:3,000, Servicebio) for 1.5 h at room temperature and three washes in TBST were processed to remove excess secondary antibody. We then used an ECL western blotting kit (NCM Biotech, Suzhou, China) to visualize and image the targeted protein bands.

Statistical analysis
Most statistical calculations were conducted using R (version 4.1.1). The results of qRT-PCR and western blot were shown as the mean ± SD and GraphPad Prism 8.0 (GraphPad Software, San Diego, CA, USA) were used for statistical analysis. All the comparisons between two groups were analyzed using Student's t-test for normally distributed variables. If the variables distribution was abnormal, the Mann-Whitney U-test would be used. The relationship between the expression of biomarkers and infiltrating immune cells was analyzed using Spearman's rank correlation analysis. Two-sided with P < 0.05 (95% confidence interval) and false discovery rate (FDR) < 0.05 were considered as statistically significant.

Identification of DEGs in endometriosis
Data from a total of 30 endometriosis samples (14 peritoneal endometriosis samples, 16 ovarian endometriosis samples) and 29 control endometrium samples from three GEO datasets (GSE11691, GSE23339 and GSE5108) were analyzed. The DEGs were calculated using the "limma" package after removing the batch effects. We obtained a total of 42 DEGs, including upregulated 23 genes and downregulated 19 genes (Table 1). All DEGs were shown in the heatmap and volcano map (Fig. 1). In addition, the DEGs list from GSE12768 dataset was displayed in Table S1.

GO and pathway enrichment of DEGs in endometriosis
The above list of DEGs was used to evaluate if they were enriched into specific GO or reactome pathways. GO analysis identified that the DEGs were significantly enriched in BP, including humoral immune response, complement activation, alternative pathway, antimicrobial humoral response, and so on (Table 2). In terms of MF, DEGs were mainly enriched in Toll-like receptor binding, G protein-coupled receptor binding, long-chain fatty acid binding, and chemokine receptor binding (Table 2). Moreover, CC demonstrated that filamentous actin, lipid droplet, sperm flagellum and actin filament were the most significantly enriched GO term ( . REACTOME pathway enrichment analysis was also used to obtain the signaling pathways for different genes. In this study, these DEGs were mainly involved in the regulation of complement cascade, suprofen pathway, indomethacin pathway and mefenamic acid pathway (Table 3).

PPI networks construction and validation of predictive feature biomarkers
We uploaded a list of proteins associated with DEGs, removed the disconnected nodes and then acquired a PPI network using Cytoscape software (Fig. 3A). There were 26 nodes and 50 edges on this network. Furthermore, the LASSO algorithm was used to narrowed down the DEGs, leading to the identification of seven variables as specific biomarkers for endometriosis (Fig. 3B). The seven genes were FMO1, PDLIM3, FAM129A, CLDN11, C3, TMPRSS4 and DEFB1. And then, we used ROC curve to estimate the predictive value of genes. As shown in Fig. 3C, the ability of PDLIM3 in discriminating endometriosis from the control samples demonstrated a favorable diagnostic value, with an AUC of 0.955 (95% CI [0.894-0.997]). Moreover, this result was confirmed in the GSE120103 dataset with an AUC of 0.836 (95% CI [0.691-0.948]). Other six genes showed no significant difference in the GSE120103 dataset (Fig. 3D).
At the same time, in order to generate more accurate and reliable results, the GSE120103 dataset was employed to verify the expression levels of the seven features. The expression levels of PDLIM3 in endometriosis tissue were notably higher than those in the normal tissue (Fig. 4A). There was no significant difference between the two groups in other six gene expression (Fig. 4A). Therefore, PDLIM3 became one of the potential specific biomarkers in endometriosis.

QRT-PCR and western blot
To validate the expression of PDLIM3, qRT-PCR and western blot were used to measure transcriptional and protein expression in both ovarian endometriosis and normal eutopic endometrium tissue from endometriosis-free women. We found PDLIM3 had higher expression level in endometriosis group than normal control (qRT-PCR: n = 6 vs. 6; Western blot: n = 7 vs. 4) (Figs. 4B and 4C). The patients of endometriosis ranged in age from 26 to 43 years (average age 32 years). The BMI was 21.76 ± 2.42 (mean ± SD). And, their rASRM endometriosis stage was III or IV. Normal group samples were from non-endometriosis patients, ranged in age from 31 to 41 years (average age 34 years). The BMI was 21.23 ± 1.21. There was no significant different in their general characteristics.

DISCUSSION
The most widly accepted theory about pathogenesis of endometriosis is retrograde menstruation and endometrial implantation reported by Sampson et al. However, it cannot perfectly explain all aspects of endometriosis. More evidence determined a combination of genetic or epigenetic alterations leads to endometriosis (Saha et al., 2015;Dalsgaard et al., 2013;Bulun et al., 2019;Manolio et al., 2009). Therefore, it is critical to acquire a clear understanding of the molecular mechanisms underlying endometriosis if we are to improve the diagnosis and treatment of endometriosis. The main purpose of our study was to detect the potential biomarker of endometriosis for the mechanism and treatment.
In this investigation, we compared three public datasets and a total of 42 DEGs were identified, including 23 upregulated genes and 19 downregulated genes. Another DEGs list of the validation set included most of 42 DEGs, which strengthen current results. The results of enrichment analyses suggested that gene function enriched by DEGs were mainly associated with immune system, such as humoral immune response, complement activation, alternative pathway, antimicrobial humoral response, cytokine-cytokine receptor interaction, systemic lupus erythematosus, chemokine signaling pathway, and  regulation of complement cascade. Endometriosis is a disease with prolonged inflammatory response. The presence of ectopic tissues in peritoneal cavity, without clear mechanism, reaches a state of balance with a high level of proinflammatory cytokines and immune cells (Li et al., 2020). This evidence is consistent with our findings, proving that the results in the present study are accurate, as well as confirming that the immune response plays a vital role in endometriosis.
Based on a machine-learning algorithm, seven variables were identified from 42 DEGs and then verified their diagnosis value by ROC curve. Finally, PDLIM3 was screened out due to it is highly expressed in ectopic endometrium and has a favorable diagnostic value in all datasets. There were totally 14 peritoneal endometriosis samples, 16 ovarian , which mostly accounted for our training data and totally accounted for our validating data. Therefore, we believed four datasets used in our study could effectively reflect the clinical proportions of endometriosis of different types. On the other hand, indeed, the origin of samples, which was similar but not identical, between meta-analysis and validation could partly explain the no significant different gene expression for six of the seven genes initially identified. PDLIM3, also known as ALP (actin-associated LIM protein), is a member of the PDZ-LIM domain family (Xia et al., 1997). It is mainly contained a N-terminal PDZ domain and a C-terminal LIM domain. Previous studies revealed PDLIM3 was mainly expressed in skeletal muscle (Passier, Richardson & Olson, 2000) and could regulate skeletal muscle development (Pomiès et al., 2007;Ohsawa et al., 2011). Lak et al. (2021 identified PDLIM3 was one of the rhabdomyosarcoma markers and associated with clinical outcome. Wang et al. (2019) found that polymorphisms in PDLIM3 gene increased risk of idiopathic dilated cardiomyopathy. Other studies indicated PDLIM3 was also correlated to hypertrophic cardiomyopathy (Lopes et al., 2015;Bagnall, Yeates & Semsarian, 2010), atrial fibrillation (Vad et al., 2020), myofibrillar myopathy (Williams et al., 2021), the prognosis of hepatocellular carcinoma (Wu et al., 2021), early regional metastasis of tongue cancer (Lee et al., 2021), muscle-invasive bladder urothelial carcinoma (Feng et al., 2020) and medulloblastoma (Shou et al., 2015). It is the first time that PDLIM3 has been linked to endometriosis. We believe it will play an essential role in better understanding endometriosis mechanism and acting a potential target of therapy.
Furthermore, we used CIBERSORT to assess the types of immune cell infiltration in endometriosis and normal samples. A variety of immune cell subtypes were potentially related to the occurrence and development of endometriosis. We performed correlation analysis between PDLIM3 and immune cells, PDLIM3 was found to be correlated with M2 macrophages, neutrophils, CD4+ memory resting T cells, gamma delta T cells, M1 Macrophages, resting mast cells, follicular helper T cells, activated NK cells, CD8+ T cells, Tregs cells, naive B cells, plasma cells and resting NK cells. And, we analyzed the immune microenvironment of validation dataset. The tissue compositions and sample size of the meta-analysis and GSE120103 were different, which may be able to explain the different results. Notably, these two datasets both showed the specificity of the proportion of Tregs cells. Previous studies indicated the alteration of Tregs cell and Th17 cell may lead to the occurrence and development of ectopic endometrium implantation and accelerate Endometriosis is more than a pelvic disease. Changes in multiple systems, such as cardiovascular system, immune system and nervous system, have been found in patients with endometriosis (Taylor, Kotlyar & Flores, 2021). The shifts in circulating immune cell populations as well as the presence of proinflammatory cytokines create a widespread inflammatory environment. A precise control over various types of immune cells is demanded to achieve a safe and effective treatment on endometriosis. The novel biomarkers of endometriosis correlated with the magnitude of immune cell infiltration may be targets of therapy or predictors of therapeutic response.
The limitations of this study should also be acknowledged. First, the study was retrospective; thus, important clinical information was unavailable. Second, immune cell infiltration and the functions of PDLIM3 in endometriosis were inferred by bioinformatics analysis, and future prospective studies with larger sample sizes should be employed to validate our findings. Third, the biomarker PDLIM3 in this study is derived from endometriosis tissue, which indicates that this biomarker is not suitable to be used for early clinical diagnosis. Further study should be performed to confirm the function of PDLIM3 in endometriosis.

CONCLUSIONS
In a word, our results strongly suggest that PDLIM3 plays a previously unidentified but critical role in pathogenesis and immune environment of endometriosis. Better understanding of PDLIM3 could have a highly beneficial impact on the clinical treatment and outcomes of endometriosis patients.