Discovery of non-invasive biomarkers for the diagnosis of endometriosis

Background Endometriosis is a common gynaecological disorder affecting 5–10% of women of reproductive age who often experience chronic pelvic pain and infertility. Definitive diagnosis is through laparoscopy, exposing patients to potentially serious complications, and is often delayed. Non-invasive biomarkers are urgently required to accelerate diagnosis and for triaging potential patients for surgery.
Methods This retrospective case control biomarker discovery and validation study used quantitative 2D-difference gel electrophoresis and tandem mass tagging–liquid chromatography–tandem mass spectrometry for protein expression profiling of eutopic and ectopic endometrial tissue samples collected from 28 cases of endometriosis and 18 control patients undergoing surgery for investigation of chronic pelvic pain without endometriosis or prophylactic surgery. Samples were further sub-grouped by menstrual cycle phase. Selected differentially expressed candidate markers (LUM, CPM, TNC, TPM2 and PAEP) were verified by ELISA in a set of 87 serum samples collected from the same and additional women. Previously reported biomarkers (CA125, sICAM1, FST, VEGF, MCP1, MIF and IL1R2) were also validated and diagnostic performance of markers and combinations established.
Results Cycle phase and endometriosis-associated proteomic changes were identified in eutopic tissue from over 1400 identified gene products, yielding potential biomarker candidates. Bioinformatics analysis revealed enrichment of adhesion/extracellular matrix proteins and progesterone signalling. The best single marker for discriminating endometriosis from controls remained CA125 (AUC = 0.63), with the best cross-validated multimarker models improving the AUC to 0.71–0.81, depending upon menstrual cycle phase and control group.
Conclusions We have identified menstrual cycle- and endometriosis-associated protein changes linked to various cellular processes that are potential biomarkers and that provide insight into the biology of endometriosis. Our data indicate that the markers tested, whilst not useful alone, have improved diagnostic accuracy when used in combination and demonstrate menstrual cycle specificity. Tissue heterogeneity and blood contamination are likely to have hindered biomarker discovery, whilst a small sample size precludes accurate determination of performance by cycle phase. Independent validation of these biomarker panels in a larger cohort is however warranted, and if successful, they may have clinical utility in triaging patients for surgery.
Electronic supplementary material The online version of this article (10.1186/s12014-019-9235-3) contains supplementary material, which is available to authorized users.


Introduction
Endometriosis is estimated to affect 5-10% of women of reproductive age. Although some women with endometriosis may be asymptomatic, it can cause significant pain symptoms and infertility. The diagnosis of endometriosis can be difficult to achieve. Various studies worldwide have documented a delay of 7-11 years from appearance of symptoms until diagnosis [1][2][3]. This delay is partly due to the fact that an invasive laparoscopy is needed to establish the diagnosis in some women. Imaging techniques are sensitive for diagnosis of ovarian endometriomas and deep endometriosis in experienced hands, but are less accurate for diagnosing other forms of endometriosis [4].
While efforts have been made to develop blood-based diagnostic tests for endometriosis, few reported markers have been independently validated and none are in clinical use. As a result, definitive diagnosis of endometriosis remains surgical, requiring laparoscopy under general anaesthetic, exposing patients to potentially serious complications. Difficulty in establishing a diagnosis leads to patients spending years on visits to their doctors and unnecessary investigations, adding to the significant economic burden of the disease [5]. While early stage endometriosis is not without consequence, delay in diagnosis may allow disease progression with reduced treatment efficiency and poorer outcomes. There is clearly an urgent and unmet need for a minimally invasive diagnostic test for endometriosis, though clinically useful blood-borne markers have yet to be found [6,7]. The aim of a non-invasive test of endometriosis should be to expedite clinical diagnosis by achieving a high positive predictive value (> 90%) or to triage patients for laparoscopic investigation for conclusive pathological diagnosis. A lower sensitivity may be tolerated for the latter if specificity is high (> 90%). Ideally, a test would also need to function independently of menstrual cycle phase. Early diagnosis of endometriosis may allow preventative measures that delay progression of the disease such as use of the progesterone only pill or Mirena coil. Women hoping to conceive or those who are refractory to medical management can be put forward for surgical treatment earlier.
The first aim of our study was to identify potential biomarkers through proteomic profiling of eutopic and ectopic endometrial tissue specimens taken from women with a confirmed diagnosis of endometriosis and relevant controls undergoing exploratory surgery for chronic pelvic pain without endometriosis or prophylactic surgery due to strong familial risk of cancer. The second aim was to test these potential biomarkers in sera from a larger case control set of women, and additionally validate biomarkers reported in the literature [4].

Materials and methods

Patient recruitment
Patients and samples were sourced from the University College London Hospital Gynaecology Department following ethical approval and informed consent. Patients were those referred by their general practitioners or other clinicians for investigation of pelvic pain or for diagnosis and/or treatment of endometriosis. Those who agreed to undergo laparoscopic surgery for investigation and treatment of endometriosis, pelvic pain or bilateral salpingo-oophorectomy for strong family history of breast and ovarian cancer were approached and recruited. Inclusion criteria were as follows: 'endometriosis cases' were defined as women diagnosed with endometriosis at laparoscopy and confirmed histologically; 'controls with pain' were defined as symptomatic women with pelvic pain of unknown cause or chronic pelvic inflammatory disease without surgical evidence of endometriosis; 'controls without pain' were regularly cycling women with no known disease undergoing bilateral tubal ligation and/or prophylactic bilateral salpingo-oophorectomy due to familial risk of breast and ovarian cancer and with no visual evidence of endometriosis at laparoscopy. The following women were excluded from the study: post-menopausal women, women with a positive pregnancy test or unknown pregnancy status on day of surgery, those with other benign conditions or malignancies (particularly patients with fibroids and/or cancer were excluded as these conditions may compromise the integrity of the endometrium), women on any hormonal medication < 3 months prior to surgery and those whose surgical findings and pathological reports were inconsistent. Cycle phase was determined by a triple approach to ensure accuracy: chronologically, by histological dating and by sex steroid hormone determination. Women with unconfirmed menstrual cycle stage were excluded.
Additional patient data was collected including age, fertility history, treatment history (oral contraceptive and GnRH analogue use), menstrual cycle phase, pain history, histopathology findings and anatomic characteristics of disease lesions. All patient records were handled per NHS confidentiality practices. Samples were anonymised and sequentially numbered. A lab coding system and database were developed for recording anonymised patient and sample information.

Sample collection
Endometrial tissue biopsies (eutopic endometrium) were obtained by Pipelle cannula or curettage from women undergoing laparoscopy. Ectopic endometrial tissue (endometriosis) samples were obtained from the same women at laparoscopy. Superficial lesions and deep infiltrating endometriosis samples only were used in this study, the rationale being that imaging techniques are sensitive enough for diagnosis of ovarian endometriomas, but are less accurate for diagnosing endometriosis elsewhere in the pelvis.
A portion of each tissue specimen was fixed in 10% buffered formalin for histological examination, whilst remaining tissue was washed in sterile phosphate buffered saline to remove excess blood. Tissues were dried using lint-free paper, transferred into weighed Eppendorf tubes and snap-frozen in liquid nitrogen. Samples were transported to the lab and stored at − 80 °C. Cycle stage for each patient was determined by endometrial dating by an experienced histopathologist without prior knowledge of sample group according to [8]. Endometriotic lesions were confirmed histologically. Samples were excluded where there was insufficient tissue (< 20 mg wet weight). 10 mL of blood was also collected from each patient before surgery by venepuncture into two vacutainer gel tubes. 5 mL of blood was used for determination of oestradiol, progesterone, CA125 and CRP levels using standard assays. The remaining sample was allowed to clot at room temperature for 1 h, centrifuged at 3000g for 10 min at 4 °C, the serum collected, aliquoted and stored at − 80 °C. In total, 87 women met the inclusion criteria who had donated a serum sample; 45 with endometriosis, 21 controls with pain and no evidence of endometriosis and 21 controls with no pain and no disease (Table 1).

Protein extraction, quality assessment, immunodepletion and pooling
Each snap-frozen tissue sample was weighed and homogenised by grinding in liquid nitrogen into a fine powder.
Ground tissue was lysed in 2D lysis buffer (8 M urea, 4% w/v CHAPS and 10 mM Tris-HCl pH 8.3) using a ratio of 5 μL of lysis buffer per 1 mg of tissue at room temperature to extract protein. Samples were sonicated and centrifuged at 3000g for 10 min at 4 °C to pellet insoluble material. The supernatant was collected and protein concentration determined using a Bradford microtitre plate assay (Pierce) using a bovine serum albumin standard curve. All samples were normalised to the same (lowest) protein concentration using 2D lysis buffer. To assess quality of the samples, 10 μg of total protein from each sample was run on NuPAGE® Novex® 4-12% Bis-Tris 1.5 mm, pre-cast SDS-PAGE gels (Invitrogen) alongside 10 μg of total protein from human serum. Gels were stained with colloidal Coomassie Blue (Instant Blue gel stain; Expedion) and imaged on a GS-800 densitometer (BioRad). Samples were excluded if they consisted largely of serum proteins (i.e. barely visible tissue protein bands) and/or had low protein staining above 20 kDa. Equal protein amounts from each sample were then pooled into 6 groups according to tissue type, case control status and menstrual cycle phase. These groups were: eutopic tissue from endometriosis cases in the secretory (ES; n = 19) or proliferative phase (EP; n = 9), eutopic tissue from patients with chronic pelvic pain with no evidence of endometriosis at laparoscopy in the secretory phase (PS; n = 7), eutopic tissue from asymptomatic controls scheduled for risk-reducing surgery with no evidence of disease at laparoscopy in the secretory (CS; n = 6) or proliferative phase (CP; n = 7) and ectopic tissue from endometriosis cases in the secretory phase (EcS; n = 11). There were no eutopic samples available for the pain group in proliferative phase or eligible ectopic samples from endometriosis patients in proliferative phase.
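The normalisation of all lysates to the lowest measured protein concentration follows from the dilution relation C1·V1 = C2·V2. The following is a minimal sketch of that arithmetic only; the function names, sample names, concentrations and volumes are hypothetical and are not taken from the study.

```python
def buffer_to_add(conc_mg_per_ml, volume_ul, target_mg_per_ml):
    """Volume of lysis buffer (in µL) that dilutes a lysate to the target concentration."""
    if target_mg_per_ml <= 0 or target_mg_per_ml > conc_mg_per_ml:
        raise ValueError("target must be positive and no higher than the sample concentration")
    # C1 * V1 = C2 * V2, so the final volume is V2 = C1 * V1 / C2
    final_volume_ul = conc_mg_per_ml * volume_ul / target_mg_per_ml
    return final_volume_ul - volume_ul


def normalise_to_lowest(samples):
    """Map {name: (concentration mg/mL, volume µL)} to {name: buffer µL to add}."""
    target = min(conc for conc, _ in samples.values())
    return {name: buffer_to_add(conc, vol, target) for name, (conc, vol) in samples.items()}


# Hypothetical lysates: (Bradford concentration in mg/mL, available volume in µL)
lysates = {"A": (8.0, 100.0), "B": (4.0, 100.0), "C": (6.0, 50.0)}
additions = normalise_to_lowest(lysates)
for name, extra in additions.items():
    print(f"lysate {name}: add {extra:.1f} µL of 2D lysis buffer")
```

The most dilute lysate receives no buffer, and every other lysate is brought down to its concentration.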
To improve proteomic coverage, immunodepletion of the 12 most abundant serum proteins was carried out on the pooled samples using Protein Purify 12 (PP12) Human Serum Protein Immunodepletion resin (R&D). Briefly, 500 μL of PP12 immunodepletion slurry was incubated with diluted endometrial tissue lysate (500 μg protein) on a rotary shaker for 30 min. Unbound material was recovered using Spin-X Filter units by centrifugation, the filtrates were concentrated to 25 μL using 5 kDa molecular weight cut-off Vivaspin columns and protein concentration determined using the Bradford method.

2D-DIGE profiling of endometrial tissue lysates
Immunodepleted samples (80 μg total protein) were labelled differentially in triplicate with NHS-cyanine dyes (GE Healthcare) at a dye to protein ratio of 6 pmol/μg protein on ice for 30 min in the dark. Cy2 dye was used to label an internal standard pool prepared by mixing equal amounts of protein from all pools. Reactions were quenched by adding a 20-fold molar excess of L-lysine to dye and incubating on ice for 10 min in the dark. Pairs of differentially labelled samples (Cy3 and Cy5) and Cy2-labelled standard were mixed appropriately and reduced by addition of dithiothreitol (DTT) (65 mM final concentration) and carrier ampholines and pharmalytes added to a final concentration of 2% v/v with 1 μL of 2% bromophenol blue. Samples were then run on 9 2D-gels according to our standard procedures [9]. Briefly, Immobiline IPG strips (24 cm; pH 3-10 NL) (GE Healthcare) were rehydrated with labelled samples overnight and isoelectric focusing performed for 80 kVh at 16 °C. Strips were equilibrated with DTT reduction and iodoacetamide (IAM) alkylation steps, transferred onto 10% SDS-PAGE bonded gels cast between low fluorescence glass plates, overlaid with agarose and run for 16 h at 2.2 W per gel. Images were acquired by scanning gels on a Typhoon 9400 multi-wavelength fluorescence imager (GE Healthcare) and analysed using DeCyder Software V6 (GE Healthcare), calculating standardised spot abundances (using the Cy2-labelled standard pool) for all matched spots across the 9 gel images. Differences in spot abundances between conditions were filtered by specifying a 1.5 threshold of average fold-change with P < 0.05 (Student's t test). Pick lists of spots of interest were created and exported to an Ettan spot picking robot (GE Healthcare) for excision from the same colloidal Coomassie Blue post-stained gels. with a 60 min linear gradient of 10-50% buffer B (100% ACN + 0.1% (v/v) FA) at a flow rate of 300 nL/min.
Tandem MS was performed in the data-dependent mode to automatically switch between MS (full ion scan) and MS/MS (fragment ion scan) acquisition. Survey full scan MS spectra (m/z 390-1700) were acquired in the orbitrap with a resolution of 60,000 at m/z 400. The most intense ions (top 6 per survey scan) were sequentially isolated for fragmentation in the linear ion-trap by collision induced dissociation and dynamically excluded for 60 s. Acquired mass spectra were processed using Mascot Distiller version 2.5 (Matrix Science Ltd) and searched against the SwissProt database. The following parameters and search filters were used: MS tolerance was 10 ppm, MS/MS tolerance was 0.5 Da, two missed cleavages were allowed, carbamidomethylation of cysteines was set as a fixed modification, and methionine oxidation, acetylation (protein N-term) and deamidation (asparagine and glutamine) were set as variable modifications. Protein identifications were accepted where there were two unique peptides of score > 20 at a Mascot significance threshold of P < 0.05. Protein identifications were matched to specific spots in DeCyder with experimental molecular weights checked against theoretical values.
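To illustrate what the 10 ppm precursor tolerance and the two-peptide acceptance rule mean in practice, a minimal sketch is given below. It is not Mascot's implementation: the function names and the example m/z values and peptide scores are hypothetical, and the Mascot significance threshold (P < 0.05) is assumed to have been applied upstream by the search engine.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed precursor in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the precursor falls inside the MS tolerance window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def protein_accepted(peptide_scores, min_peptides=2, min_score=20.0):
    """peptide_scores maps each unique peptide sequence to its ion score.

    Mirrors the stated rule of at least two unique peptides with score > 20.
    """
    passing = [score for score in peptide_scores.values() if score > min_score]
    return len(passing) >= min_peptides


# Hypothetical precursor at a theoretical m/z of 500.0000
print(round(ppm_error(500.004, 500.0), 2))   # about 8 ppm, inside the window
print(within_tolerance(500.006, 500.0))      # about 12 ppm, outside the window

# Hypothetical identification supported by three unique peptides
example_scores = {"PEPTIDEA": 35.0, "PEPTIDEB": 22.5, "PEPTIDEC": 12.0}
print(protein_accepted(example_scores))
```

At m/z 500, a 10 ppm window corresponds to ± 0.005 m/z units, which is why the second precursor is rejected.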

Profiling of endometrial tissues using a TMT-based 3D-LC-MS/MS strategy
A 6-plex tandem mass tagging (TMT) approach with 3-dimensional peptide separation [strong anion exchange (SAX) chromatography, off-line high pH RPLC and low pH nano-RPLC coupled to MS/MS] was applied to the 6 tissue lysate pools described above to maximise quantitative proteomic coverage. Immunodepleted tissue lysate pools (100 μg each) were re-suspended in 100 mM triethylammonium bicarbonate (TEAB), pH 8.5 and 0.1% SDS, reduced with 1 mM tris(2-carboxyethyl)phosphine for 1 h at 55 °C and alkylated with 7.5 mM IAM for 1 h at RT in the dark. Samples were digested overnight at 37 °C using 4 µg of modified porcine trypsin. Samples were then labelled with TMT reagents (Thermo Fisher). Briefly, each digested pool was labelled with 0.8 mg of one of six TMT reagents (re-suspended in 41 μL of ACN) for 1 h at RT as follows: CS-TMT126, CP-TMT127, PS-TMT128, EcS-TMT129, ES-TMT130 and EP-TMT131. Samples were then incubated with 0.25% hydroxylamine for 30 min at RT to quench the reaction. The six TMT-labelled samples were then combined. SDS was removed using detergent removal spin columns (Pierce) as per the manufacturer's instructions, the samples desalted using 1 cc Oasis HLB cartridges (Waters) as per the manufacturer's instructions, dried in a SpeedVac and re-suspended in 300 μL of 100 mM TEAB pH 8.5. For SAX chromatography, DEAE ceramic HyperD F slurry (300 μL; Pall Corporation) was added to a spin filter unit and centrifuged at 3000 rpm for 2 min to remove storage buffer, then washed with 200 μL of 1 M NaCl in 100 mM TEAB pH 8.5, then three times with 200 µL of 200 mM TEAB pH 8.5 and then equilibrated by washing with 100 mM TEAB pH 8.5, followed by centrifugation at 3000 rpm for 2 min to remove the supernatant. The pooled sample was then incubated with the resin on a rotary shaker for 5 min. Un-bound peptides were removed by centrifugation followed by washing with 200 µL of 100 mM TEAB and the two fractions combined as the flow-through. 
Bound peptides were then sequentially eluted using 200 μL × 2 of increasing salt concentration buffer; 100 mM TEAB plus 400 mM NaCl, 600 mM NaCl and 1 M NaCl. The two eluates at each salt concentration were combined and desalted using 1 cc Oasis HLB cartridges, vacuum dried and stored at − 20 °C. For high pH RP-LC, peptide fractions were re-suspended in 45 μL of 20 mM ammonium formate pH 8.5 and fractionated on a Poroshell 300SB-C18 (5 μm, 2.1 × 75 mm) column using an Agilent 1100 series microflow pump. Briefly, 40 µL of re-suspended peptides were injected onto the column in 20 mM ammonium formate pH 8.5 and peptides fractionated using a 0-45% linear gradient of 20 mM ammonium formate pH 8.5 in 80% acetonitrile at a flow rate of 200 µL/min for 55 min. 30 fractions were collected for each of the 4 SAX fractions (total 120 fractions). Each fraction was dried, re-suspended in 200 μL of 0.1% FA and re-dried prior to MS analysis.
LC-MS/MS was carried out essentially as described above, using a 90 min linear gradient of 10-50% buffer B, with data-dependent acquisition of the top 3 ions for fragmentation by both CID and HCD (collision energy 40%) for optimal reporter ion measurement. For HCD, product ions were detected in the orbitrap at a resolution of 7500. Proteome Discoverer version 2.4 was used for protein identification and quantification using Mascot for searching the SwissProt database. Search parameters were as described above, except that only one missed cleavage was allowed. A co-isolation threshold of 25% was set in the quantification method to limit the recording of reporter ion ratios from multiple peptides. Protein groups with a ratio above 1.5 or below 0.67 for each comparison were considered to be differentially expressed.
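The differential expression rule for TMT reporter ion ratios can be sketched as a simple classifier. The thresholds are those stated above (0.67 is approximately 1/1.5, so the window is close to symmetric on a log scale); the function name and the labels are illustrative assumptions, and the example ratios are the CPM values reported in this article.

```python
from math import log2


def classify_ratio(ratio, up=1.5, down=0.67):
    """Label a protein group ratio as 'up', 'down' or 'unchanged'."""
    if ratio <= 0:
        raise ValueError("expression ratios must be positive")
    if ratio > up:
        return "up"
    if ratio < down:
        return "down"
    return "unchanged"


# CPM ratios reported for the three eutopic tissue comparisons
cpm_ratios = {"ES/CS": 1.62, "ES/PS": 2.53, "EP/CP": 2.45}
cpm_calls = {name: classify_ratio(value) for name, value in cpm_ratios.items()}

for name, value in cpm_ratios.items():
    print(f"{name}: ratio {value:.2f} (log2 {log2(value):+.2f}) -> {cpm_calls[name]}")
```

Because each pool was measured once per TMT channel, this rule is a fold-change cut-off only and carries no estimate of variance between individual samples.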

Gene ontology and pathway enrichment analysis
Differentially expressed proteins were imported into WebGestalt [10] and each clinical group analysed separately for enrichment of GO biological process, molecular function and cellular component, GO Slim terms, protein interaction networks, KEGG pathways and disease association. Significantly enriched terms were identified using a hypergeometric test with a Benjamini-Hochberg (BH) correction at a significance of P < 0.05. The top 10 GO terms with the most significant P values were reported. In each comparison, the protein lists were analysed separately as up- or down-regulated proteins.
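The enrichment statistics can be sketched in a few lines. This is a generic illustration of a hypergeometric over-representation test followed by BH adjustment, not WebGestalt's own code; the reference set, the counts and the P values below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted values stay monotonic
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical term: 3 of 10 reference proteins annotated, 2 found among 4 regulated proteins
p_term = hypergeom_upper_tail(k=2, K=3, n=4, N=10)
print(round(p_term, 4))

# Hypothetical raw P values for four terms
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(a, 4) for a in adj])
print([a < 0.05 for a in adj])
```

In this toy example only the first term remains below 0.05 after adjustment, which shows how correction for multiple testing can remove terms that look significant in isolation.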

Selection and verification of candidate biomarkers
For selection of proteins for verification, a biomarker scoring system was devised based on fold-change, TMT reporter ion ratio count, variability, number of unique peptide sequences, whether they were possible serum contaminants and their membership in 1 of 5 expression clusters. Clustering was performed using Graphical Proteomics Data Explorer (GProX) based on reporter ion ratios for the different comparisons with a higher score given to candidates whose expressions differed between endometriosis and both control groups and in both cycle phases. Selection was also weighted based on prior knowledge of function, proteins known to be secreted and those for which commercial detection reagents were available. LUM, TPM2, CPM, PAEP and TNC were selected from the discovery profiling work.
Promising candidate markers reported in the literature (sICAM1, MCP1, MIF, IL1R2, VEGF and FST) were also tested using commercial ELISA kits. Assays were tested for reproducibility and technical sensitivity to ensure the protein of interest could be accurately detected in serum samples. Optimal dilutions and intra-assay CVs for the assays are shown in Additional file 1: Table S1. Assays were carried out according to the manufacturers' instructions using the complete study set.

Statistical analysis
Data analysis was carried out using GraphPad Prism software version 5.0.1 and the R statistical software environment. For samples where the protein of interest was determined to be below the limit of detection of the assay, the value of the lowest standard for that assay was used. The Shapiro-Wilk test was applied to test data distribution with an unpaired t test used to compare groups for normally distributed data and the Mann-Whitney test used for non-normally distributed data. For each clinical group, the data was analysed independently of cycle phase and then by cycle phase to assess menstrual cycle dependency. A P value of < 0.05 was considered significant. To determine the diagnostic performance of each biomarker, ROC curve analysis was performed with the area under the curve (AUC) reported for each comparison [endometriosis (E) vs. no-pain control (C) and endometriosis vs. pain control (P)]. Multivariate analysis was carried out to assess the diagnostic performance of combined marker panels using logistic regression, reporting cross-validated (leave-one-out) AUC and sensitivity at fixed specificity.
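The single-marker performance measures can be illustrated with a short sketch. The analyses in this study were run in GraphPad Prism and R; the Python below is a stand-in written for illustration only, it does not reproduce the logistic regression or leave-one-out steps, and all marker values are hypothetical.

```python
def substitute_lod(values, lowest_standard):
    """Replace readings below the limit of detection (None) with the lowest standard."""
    return [lowest_standard if v is None else v for v in values]


def auc(cases, controls):
    """AUC as the probability that a random case exceeds a random control (ties count 0.5)."""
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(cases, controls, specificity=0.80):
    """Sensitivity at the lowest cut-off reaching the requested specificity.

    A sample is called positive when its value is strictly above the cut-off.
    """
    for cutoff in sorted(set(cases) | set(controls)):
        spec = sum(c <= cutoff for c in controls) / len(controls)
        if spec >= specificity:
            return sum(c > cutoff for c in cases) / len(cases)
    return 0.0


# Hypothetical serum concentrations (arbitrary units)
cases = [12, 35, 40, 18, 55]
controls = [10, 15, 22, 8, 30]
marker_auc = auc(cases, controls)
marker_sens = sensitivity_at_specificity(cases, controls, 0.80)
print(marker_auc, marker_sens)

# Hypothetical readings with two values below the detection limit
print(substitute_lod([None, 3.2, None], lowest_standard=1.5))
```

Substituting the lowest standard for undetectable readings keeps those samples in rank-based comparisons, at the cost of creating ties at the bottom of the range.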

Results

Discovery proteomic profiling by 2D-DIGE
In total, 122 pre-menopausal women were consented to the study with 87 women meeting the inclusion criteria. The set was divided into cases who were diagnosed with endometriosis (n = 45) and two control groups comprising women with pelvic pain (n = 21) or no known disease and no pain (n = 21). Eutopic and ectopic endometrial tissue specimens from these women were lysed and first quality-assessed by SDS-PAGE with colloidal Coomassie Blue staining. It was evident that some samples were heavily contaminated with blood proteins, impairing the ability to visualise tissue-derived proteins at that protein load, or had low protein staining overall (see Additional file 1: Figure S1). These tissue samples were subsequently excluded from further analysis. The remaining samples were pooled (based on equal protein amount) into six groups by clinical group and cycle phase with 6-20 tissue samples pooled per group (designated as CS, CP, PS, ES, EP and EcS; see "Materials and methods" section). With blood protein contamination in mind, immunodepletion of the 12 most abundant serum proteins was undertaken to improve tissue protein coverage (Additional file 1: Figure S1). Contamination was significant, as immunodepletion resulted in protein yields of 20-25% of starting material, although the approach did reveal lower abundance (tissue-derived) proteins.
2D-DIGE profiling was undertaken, analysing the 6 pools in triplicate for differential protein expression. Merged fluorescent gel images are shown in Additional file 1: Figure S2. Differential expression was assessed using DeCyder image analysis software, where spot matching was performed and standardised spot abundances calculated with reference to a Cy2-labelled standard pool (equal mix of all samples) run on all gels. Spot abundances were compared between clinical groups revealing 72 protein spots matching on all 9 gel images that displayed a > 1.5-fold change in standardised abundance (P < 0.05) between one or more clinical groups. Of these, 52 were detectable as well-defined colloidal Coomassie Blue-stained spots. These spots were excised, digested with trypsin and analysed by LC-MS/MS, resulting in 130 confident protein identifications (see Additional file 1: Table S2).
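The spot filter (> 1.5-fold change with P < 0.05 by Student's t test) can be sketched for the triplicate design as follows. This is not DeCyder's calculation, which may work on log-transformed standardised abundances; the function names and abundance values are hypothetical. To stay dependency-free, the sketch compares the t statistic with the two-sided 5% critical value for 4 degrees of freedom (2.776), which is valid only for three replicates per group.

```python
from statistics import mean, variance

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t, 4 degrees of freedom


def pooled_t(a, b):
    """Student's t statistic for two groups assuming equal variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if sp2 == 0:
        raise ValueError("zero variance in both groups; t statistic undefined")
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def spot_is_changed(a, b, min_fold=1.5):
    """True if the spot passes both the fold-change and the significance filter."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("critical value is only valid for triplicate versus triplicate")
    ratio = mean(a) / mean(b)
    fold_ok = ratio > min_fold or ratio < 1 / min_fold
    return fold_ok and abs(pooled_t(a, b)) > T_CRIT_DF4


# Hypothetical standardised abundances for three spots (group 1, group 2)
consistent = ([1.9, 2.1, 2.0], [1.0, 1.1, 0.9])    # 2-fold, tight replicates
noisy = ([1.2, 2.8, 2.0], [1.0, 1.6, 0.4])         # 2-fold, scattered replicates
small = ([1.20, 1.21, 1.19], [1.0, 1.01, 0.99])    # 1.2-fold, tight replicates

for label, (a, b) in {"consistent": consistent, "noisy": noisy, "small": small}.items():
    print(label, spot_is_changed(a, b))
```

Only the first spot passes: the second fails on significance despite a 2-fold difference in means, and the third fails on fold change despite very reproducible replicates.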
Since multiple proteins were identified in many of the spots, it was not possible to assign which protein was differentially expressed with absolute certainty. Therefore, the most likely protein was assigned based on the number of matched peptides and concordance of theoretical and experimental molecular weights. Differentially expressed proteins comprised cytoskeletal proteins, metabolic enzymes, extracellular matrix proteins, muscle proteins, those involved in protein folding and blood proteins, including haemoglobin. A simple biomarker score was used to prioritise candidates for further testing and was based on proteins displaying the same directionality of differential expression between endometriosis and both control groups and in ectopic versus eutopic tissue and that did not differ between control groups (PS/CS). High-scoring proteins included lumican (LUM) and tropomyosin β chain (TPM2). Expression of LUM (identified in 3 spots) was higher in the secretory phase endometriosis group compared to both control groups (e.g. spot 708; ES/CS = 1.86, P = 0.005; ES/PS = 2.25, P = 0.004) and in ectopic compared to eutopic tissue (EcS/ES = 2.09, P = 0.008), but not in proliferative phase samples (Additional file 1: Table S2). Similarly, expression of two proteoforms of TPM2 was higher in ectopic versus eutopic tissue and in secretory phase endometriosis compared to control groups [e.g. spot 1548 (EcS/ES = 4.15, P = 0.0002; ES/CS = 3.15, P < 0.0001; ES/PS = 2.36, P = 0.0057)]. These differences suggest menstrual cycle dependency. LUM and TPM2 were selected for further testing as serum biomarkers.

Discovery proteomic profiling by 3D-LC-MS/MS with TMT labelling
To improve depth of coverage for candidate biomarker discovery, immunodepleted tissue pools were subjected to tryptic digestion, 6-plex TMT labelling and extensive peptide fractionation (120 fractions) using SAX, high pH RPLC and low pH nano RPLC linked on-line to tandem MS. Results are presented in Additional file 2. A total of 1581 protein groups were identified of which 1433 (91%) had quantitative information across all clinical groups. To gain insight into the possible functional consequences of altered protein expression, a functional enrichment analysis was undertaken using GO biological process and KEGG pathway terms for the differentially expressed proteins (> 1.5-fold). The results were somewhat ambiguous. Analysis of proteins up-regulated in eutopic tissue from endometriosis patients revealed enrichment of macromolecular complex subunit organization, mRNA metabolic process, protein complex disassembly, translational termination, aromatic/cyclic compound metabolic process, nitrogen compound metabolic process, catabolic process and acute inflammatory response, with only enrichment of mRNA metabolic process shared for all three comparisons (Table 2). Down-regulated proteins were also enriched for macromolecular complex subunit organization, mRNA metabolic process, protein complex disassembly and acute inflammatory response, with no enrichment common to the three comparisons. Comparing ectopic and eutopic tissue, nitrogen compound metabolic process, response to wound healing and cytoskeleton organisation were enriched for up-regulated proteins, whilst mRNA metabolic process and catabolic process were enriched for down-regulated proteins.
KEGG pathway mapping revealed enrichment of genes involved in metabolic pathways, ribosome, proteasome, spliceosome and notably, regulation of the actin cytoskeleton, focal adhesions and extracellular matrix (ECM)-receptor interactions, although there was no obvious pattern to the enrichment across the comparisons (Table 2). Disease association enrichment was also ambiguous, although there was a trend of down-regulated gene products linked with carcinoma and neoplasm invasiveness. This is perhaps not surprising since endometriosis shares some features with cancer, such as local and distant invasion, attachment and damage to affected tissues. Numerous muscle-related proteins were identified that were highly expressed in ectopic versus eutopic tissue, including TPM1, 2, 3 and 4, MYLK, MYL6 and 9, PDLIM7, CNN1, CALD1 and TAGLN, suggestive of significant amounts of muscle tissue in the ectopic samples (Additional file 2 and Table 3). ECM proteins including FN1, LUM, COL1A2, COL6A1, COL6A3, COL14A1, PRELP, OGN, DCN, BGN, FMOD and MFAP4, were also highly expressed in ectopic tissue. Protein groups were scored for biomarker potential with carboxypeptidase M (CPM) the highest scoring, displaying increased endometrial expression in endometriosis versus both control groups and in both cycle phases (ES/CS = 1.62, ES/PS = 2.53, EP/CP = 2.45) (Table 3). Notably, LUM and TPM2 were also identified with increased expression in ectopic tissue, similar to that observed by 2D-DIGE profiling, although they were not altered appreciably in the eutopic tissue comparisons.

Table 3 Selection of proteins with biomarker score, protein score, numbers of unique peptides and peptide spectrum matches (PSMs) and ratios of expression for the different tissue comparisons. Proteins in italics are of particular note with several selected for serum testing.

Testing candidates as serum biomarkers of endometriosis
From the tissue profiling, TNC, CPM, TPM2, LUM and PAEP were selected for further testing as serological markers using samples from 87 women (control n = 21; pain control n = 21; endometriosis n = 45) (Table 1). Biomarker candidates reported in the literature (sICAM1, MCP1, MIF, IL1R2, VEGF and FST) were also tested alongside CA125, progesterone, oestradiol and CRP. Commercial assays were first assessed using a test pool of all samples to determine optimal sample dilutions and reproducibility (Additional file 1: Table S1) and then run on the full set. Serum measurements were correlated with clinical group and cycle phase. Only CA125 and sICAM1 were significantly elevated (P < 0.05) when comparing endometriosis to both control groups considering all cycle phases (Table 4A). Areas under the ROC curve for the endometriosis versus pain groups were 0.713 for CA125 and 0.722 for sICAM1. TNC was significantly elevated in the pain group versus the control and endometriosis groups. In the secretory phase, CA125 elevation maintained significance when comparing endometriosis to both control groups, whereas sICAM1 was significantly raised only when comparing endometriosis and pain controls (Table 4B). Elevation of VEGF became significant between endometriosis and non-pain controls in secretory phase, whilst FST (lower in endometriosis) was the only candidate found to be significant when comparing groups in the proliferative phase. Together, these data suggest cycle dependency in the serum levels of some of the candidate markers. Multi-marker logistic regression models were generated to assess if candidates would complement one another to improve classification. Generally, cross-validated models for discriminating the endometriosis and pain groups (all phases) performed similarly to those discriminating the endometriosis and non-pain groups with sensitivities of 62-67% at 80% specificity (Table 5).
sICAM1 featured in the best models for both groups, whilst CA125 was only included in models for discriminating endometriosis from the non-pain group. The best performing model [sICAM1, FST, TNC] for discriminating endometriosis from pain controls (E vs. P; all phases) gave a sensitivity of 67% at 80% specificity. When considering menstrual phase, for which both control groups were pooled, the best model [sICAM1, FST, oestradiol] gave 77% sensitivity at 80% specificity for detecting endometriosis in the proliferative phase, whilst the best model for secretory phase samples [CA125, MIF, PAEP] gave a sensitivity of 65% at 80% specificity. This again highlights cycle-dependency in the performance of the biomarker panels.

Discussion
The aim of this study was a tissue-based discovery of potential new biomarkers of endometriosis with translation to serum-based tests aimed at non-invasive diagnosis. Promising biomarkers reported in the literature were also assessed. The main challenges experienced in the discovery were the heterogeneity of tissue samples and variable blood contamination. This was evidenced by the presence of variable levels of abundant fibro-muscular and serum proteins across the sample set, which will have compromised the quality of the analysis. Whilst pooling would average out some of this heterogeneity, it is possible that outlier samples skewed the data, leading to a higher false discovery rate. Despite this, a number of differentially expressed proteins were identified as potential biomarkers; these were assayed in serum samples and contributed to multivariate models that could discriminate endometriosis from either or both control groups with reasonable accuracy.
Notably, we did not find CA125 or any of the previously reported candidate biomarkers using the proteomic approaches described. CA125 exists at relatively low abundance and is an extremely large (up to 4 MDa), heterogeneously glycosylated protein. These properties would make sampling of CA125 using these methods unlikely: the protein is not likely to resolve well on 2D gels due to its large size and heterogeneity, whilst its heavy modification may hinder efficient tryptic digestion and decrease mass matching in database searches.
Similarly, we postulate that the other candidates (sICAM1, MCP1, MIF, IL1R2, VEGF and FST) exist at relatively low abundance in tissue, reducing the chance of their identification. This highlights that coverage of the proteome is still somewhat limited despite the multidimensional approach used in our methodology.
Tissue profiling identified numerous differentially expressed proteins implicated in the implantation of endometriotic tissue beyond the endometrium and/or involved in disease progression. Although the enrichment analysis was somewhat ambiguous, multiple proteins involved in cytoskeletal and ECM organisation and cell-matrix interactions were enriched. This supports findings from previous studies [11][12][13][14]. Adhesion to, and invasion of, the peritoneal ECM by retrograde-shed endometrial cells is one of the vital stages in implantation of ectopic endometrial cells and it is tempting to speculate that the adhesion/ECM proteins identified herein play important roles in this process. Lumican (LUM) has a role in cell migration and proliferation during embryonic development, tissue repair and tumour growth through regulation of matrix metalloprotease activity [15] and collagen fibrillogenesis [16]. In the endometrium, LUM expression was reported to increase in the secretory phase [17], in agreement with our findings. We also found multiple collagens and other ECM glycoproteins and integrin ligands to be overexpressed in endometriotic samples, and particularly in ectopic versus eutopic tissue. One such protein, tenascin C (TNC), was reported to be regulated across the menstrual cycle and aberrantly in endometriosis [18][19][20]. Our data suggest that its cyclic expression is lost in the endometrium of endometriosis patients, and whilst we did not observe TNC overexpression in ectopic lesions, its increased expression may promote adhesion and invasion of endometrial cells at ectopic sites. Tropomyosins TPM1 and TPM2 play a role in muscle contraction, motility, maintenance of cell shape and cell-matrix interactions through stabilisation of actin filaments. Expression of TPM2 was higher in ectopic versus eutopic tissue and increased in secretory phase eutopic tissue from endometriosis patients compared to both control groups.
This may point to its involvement in mediating cellular structural changes that allow movement and invasion of endometrial cells. The similarly expressed smooth muscle actin-binding protein TAGLN may also be involved in this process, in support of previous findings [21]. Increased expression of TPM1, TPM2 and TAGLN may be the result of a higher smooth muscle content of deep endometriotic lesions. Another selected candidate was CPM, a GPI-anchored extracellular peptidase that functions in processing of peptide hormones, chemokines and growth factors. It has a reported role in inflammation and stem cell mobilisation and has been shown to be up-regulated in endometrial epithelial cells during the proliferative phase [22]. The increased expression in endometriosis observed herein may support the inflammatory response, although its lower expression in ectopic lesions is somewhat at odds with this.
Of particular note was the differential expression of the progesterone receptor (PGR). Several genes found to be dysregulated in the endometrium of endometriosis patients are known progesterone targets (Foxo1a, Mig6 and Cyp26a1) and their overall pattern of expression suggested prolongation of the proliferative phase [23]. Indeed, incomplete transition of the endometrium from the proliferative to the secretory phase is a hallmark of endometriosis that has been attributed to progesterone resistance [24]. Our finding of reduced PGR expression in ectopic tissue has been previously reported as a possible mechanism of progesterone resistance [25]. However, whether the higher expression of PGR observed in eutopic endometrium from endometriosis patients is involved in progesterone resistance, or is a response to it, is unclear. In partial agreement with the literature [26][27][28], we also showed reduced expression of PGRMC1 and the progesterone target gene PAEP in endometriotic lesions; changes that may also contribute to progesterone resistance in ectopic lesions.
Table 5 Performance of cross-validated multi-marker models for discriminating endometriosis from control groups. Models were generated by logistic regression using up to 4 candidates with cross-validation by leave-one-out. The best performing models (by sensitivity) and area under the ROC curve (AUC) are reported for each comparison at fixed specificities of 0.90 or 0.80. E = endometriosis; C = no pain controls; P = pain controls. Control groups were pooled (C + P) for some of the analyses.
CA125 was identified as the best single marker for discriminating endometriosis from controls and featured prominently in the best-performing multimarker models. It is noteworthy that CA125 is currently the best single serum marker for ovarian cancer, although its median level in the endometriosis samples fell below the clinical threshold of 35 U/mL used for ovarian cancer detection. CA125 has been investigated extensively as a circulating marker of endometriosis, although it lacks diagnostic accuracy when used alone. Our data support this notion, with serum CA125 giving 40% sensitivity at 90% specificity in our cohort. Its performance was also cycle-dependent, being better at discriminating secretory phase samples. Cyclic differences in CA125 levels in endometriosis have been reported previously [29][30][31][32], although the diagnostic benefit of taking cycle stage into account is unclear. Soluble ICAM1 has also been investigated as a circulating marker, with conflicting reports on its usefulness as a biomarker [33][34][35][36][37]. Our data do not support its use alone as a diagnostic marker; however, its inclusion in our best classification models suggests that it has value when combined with other markers, particularly CA125, FST and TNC. Previous studies have tested panels of serum or plasma biomarkers, including those investigated here. Kocbek et al.
reported a model using the ratio of leptin to glycodelin/PAEP and age, with 83.6% sensitivity and 83.8% specificity for distinguishing ovarian endometriosis from controls independently of cycle phase [38]. Although also tested in this study, ICAM1, VEGF, CRP and MCP1 were not included in the reported best models, whilst CA125 was not assessed. In another study, 28 plasma proteins were assessed in a large patient cohort, with an independently validated model comprising CA125, annexin V, VEGF and sICAM1/or PAEP giving 81-90% sensitivity and 63-81% specificity for detecting endometriosis in the menstrual phase [39]. Other reported models including CA125, MCP1 and/or MIF showed reasonable diagnostic accuracies [40,41]. Whilst MCP1 added little to our models, MIF was in the best models for discriminating endometriosis from both control groups, particularly for the secretory phase. Our data thus corroborate CA125, sICAM1, PAEP, MIF and FST as potentially useful diagnostic markers when combined in multivariate models.
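The Table 5 legend describes logistic regression models assessed by leave-one-out cross-validation. The following Python sketch shows the general idea under stated assumptions: a plain gradient-descent logistic regression (not the software used in this study), two hypothetical markers, and invented values. Each sample is scored by a model trained on all other samples, with feature scaling estimated from the training samples only, so that no held-out sample influences its own prediction.

```python
import math


def predict_proba(x, w, b):
    """Logistic model probability for one sample (z clamped to avoid overflow)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def column_stats(X):
    """Per-column mean and standard deviation (SD of 0 is replaced by 1)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    sds = []
    for col, m in zip(zip(*X), means):
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        sds.append(sd if sd > 0 else 1.0)
    return means, sds


def loo_scores(X, y):
    """Leave-one-out: score each sample with a model trained on the others."""
    scores = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        means, sds = column_stats(X_train)  # scaling from training fold only

        def scale(row):
            return [(v - m) / s for v, m, s in zip(row, means, sds)]

        w, b = fit_logistic([scale(r) for r in X_train], y_train)
        scores.append(predict_proba(scale(X[i]), w, b))
    return scores


# Hypothetical two-marker data (arbitrary units); NOT data from this study.
X_demo = [[10, 5], [12, 6], [15, 4], [18, 7], [20, 5], [25, 8],      # controls
          [22, 9], [30, 10], [35, 8], [40, 12], [28, 11], [45, 9]]   # cases
y_demo = [0] * 6 + [1] * 6

demo_scores = loo_scores(X_demo, y_demo)
for label, score in zip(y_demo, demo_scores):
    print(label, round(score, 3))
```

The resulting out-of-fold scores can then be used for ROC analysis. Note that when the marker combination itself is chosen by comparing many candidate models on the same cross-validated scores, the reported performance of the selected model can still be optimistic.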

Conclusions
In conclusion, we have identified molecular changes associated with endometriosis in eutopic and ectopic tissue and have derived non-invasive, cycle phase-specific diagnostic models for endometriosis with respectable performance characteristics that are similar to, if not better than, those reported previously. Our study has some weaknesses. Firstly, models were tested only by cross-validation on the same dataset, with some evidence of overfitting, and subgrouping by cycle phase has left the study underpowered. Thus, validation of these models in a larger independent cohort is necessary and should include samples from patients with other gynaecological conditions presenting with similar symptoms, particularly gynaecological malignancies. This would allow better assessment of the specificity of the biomarker panels. Secondly, our best biomarker model for discriminating endometriosis from the more relevant pain control group [sICAM1, FST, TNC] provided a sensitivity of 67% at 80% specificity, and so its usefulness as a triage test for guiding surgery is debatable. However, taking cycle phase into account, one model provided 61.5% sensitivity at 90% specificity, and so might be acceptable for triaging. Thirdly, whilst we show improved model performance when taking cycle phase into account (particularly for the proliferative phase), this type of testing may be difficult to implement in clinical practice should the models be validated.