Study Design and Workflow
The study design and workflow for the discovery and validation of methylation markers are shown in Additional file 1: Fig. S1. First, we conducted in silico analysis of the TCGA-CESC and TCGA-UCEC array datasets to identify candidate markers, which we technically validated in external 450K datasets. Next, we confirmed these findings by testing the selected markers with QM-MSP on tissues and cervical smears from the U.S., Vietnam, and S. Africa (Fig. 1). Finally, we evaluated the association of methylation with gene expression and with HPV status.
Marker Discovery
The TCGA-CESC and UCEC array datasets were used to interrogate 307 cervical cancers and 48 normal tissues in order to identify markers of cervical carcinoma (Fig. 2). Principal component analysis revealed clear separation between cervical carcinoma and normal tissues (Fig. 2A). For discovery of cervical tumor-specific markers, the array probes were serially filtered in several steps as described in Additional file 1: Fig. S1. The 14 top candidate cervical cancer markers were evaluated as shown in the histogram of cumulative β-methylation in the tumors (Fig. 2B). To narrow these 14 candidates to the 5 CpG probes selected in this study, we refined probe selection for high specificity by eliminating the nine probes that had β-methylation values higher than 0.05 units among the 48 normal samples. The data were thus reduced to 5 CpG probes that clearly distinguished SCC and adenocarcinoma from normal tissues (P < 0.0001, Mann-Whitney; Fig. 2C). Descriptive statistics for the 5 markers are provided in Additional file 1: Table S1.
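The normal-tissue specificity filter can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes β values are held in a plain dictionary keyed by hypothetical probe IDs, and it keeps a probe only if no normal sample exceeds 0.05.

```python
# Hypothetical sketch of the normal-tissue specificity filter (not the study's code).
NORMAL_BETA_MAX = 0.05  # probes with beta > 0.05 in any normal sample are dropped


def filter_specific_probes(normal_betas, cutoff=NORMAL_BETA_MAX):
    """Return the IDs of probes whose beta values in normal samples never exceed cutoff.

    normal_betas: dict mapping a probe ID to a list of beta values (0 to 1),
    one per normal sample.
    """
    return [probe for probe, betas in normal_betas.items() if max(betas) <= cutoff]


if __name__ == "__main__":
    # Toy values, not study data: probe_B exceeds 0.05 in one normal sample.
    toy = {
        "probe_A": [0.01, 0.02, 0.03],
        "probe_B": [0.02, 0.07, 0.01],
        "probe_C": [0.00, 0.05, 0.04],
    }
    print(filter_specific_probes(toy))  # ['probe_A', 'probe_C']
```

A probe sitting exactly at 0.05 is kept, matching the wording "higher than 0.05" for exclusion.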
In the TCGA databases examined, cumulative methylation (CM) of the 5-marker panel was significantly higher in squamous cell carcinoma (SCC, N = 254) and in the rarer histological type, adenocarcinoma (AC, N = 53), than in normal tissues in the TCGA-CESC database. CM-5 levels did not differ significantly between SCC and AC (P = 0.1319; Fig. 2C), suggesting that the marker panel could be broadly useful across histological types of cervical carcinoma. The selected probes recognize CpG sites in FMN2, EDNRB, ZNF671, TBXT and MOS, as summarized in Additional file 1: Table S1; Additional file 2: Table S2 provides the probe index (ID), gene name, location and function.
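As a minimal sketch of how CM-5 and a two-group comparison might be computed, the code below assumes that cumulative methylation is the sum of the five probe β values in a sample (an assumption made here for illustration) and implements a two-sided Mann-Whitney U test with the normal approximation, without tie or continuity correction. All function names are hypothetical.

```python
import math


def cumulative_methylation(betas):
    """CM for one sample, assumed here to be the sum of its panel beta values."""
    return sum(betas)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation, no tie correction.

    Returns (U statistic for group x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


if __name__ == "__main__":
    # Toy CM-5 values, not study data.
    print(cumulative_methylation([0.6, 0.5, 0.7, 0.4, 0.6]))
    scc = [2.8, 2.9, 3.1, 2.5]
    ac = [2.7, 3.0, 2.4]
    u, p = mann_whitney_u(scc, ac)
    print(f"U = {u}, two-sided P = {p:.3f}")
```

For real analyses, `scipy.stats.mannwhitneyu` offers exact and tie-corrected p-values; the hand-rolled version only shows the mechanics.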
The methylation profile of the 5-marker panel observed in TCGA-CESC was examined in three other publicly available datasets (Additional file 1: Fig. S2). In two of them, GSE68339 [20] and GSE21168 [21], the selected markers showed high levels of tumor-specific methylation and low levels of methylation in normal tissues. In GSE143752 [28], cervical intraepithelial neoplasia showed intermediate levels of methylation that increased with grade.
Association between methylation of the five CpG markers and gene expression
CpG methylation can lead to gene silencing and subsequent loss of tumor suppressor function [8]. To determine whether gain of methylation in the 5-marker panel was related to loss of gene expression, we plotted TCGA-CESC methylation (Additional file 1: Fig. S3A) and RNA-seq expression data (Additional file 1: Fig. S3B) for the same CESC cases. Methylation of all 5 CpG markers was significantly higher in tumor than in normal tissue (Additional file 1: Fig. S3A). By RNA-seq, ZNF671, EDNRB, and FMN2 showed high expression in normal tissue and low expression in tumors, whereas TBXT and MOS showed low expression in both normal and tumor samples (Additional file 1: Fig. S3B). Thus, tumor hypermethylation was accompanied by loss of expression for ZNF671, EDNRB, and FMN2, but not for TBXT and MOS.
Analytical validation of the 5-marker panel in FFPE tissue from the U.S., S. Africa and Vietnam
To determine whether the 5-marker panel identified by in silico analysis of array datasets would perform equally well in a laboratory test, we performed analytical validation of the markers using the quantitative methylation-specific PCR method, QM-MSP. The assay was performed on clinical FFPE tissue samples (N = 293). Samples from the U.S. (N = 63; Fig. 3A) were macrodissected, while whole tissue sections were used for samples from S. Africa (N = 69; Fig. 3B) and Vietnam (N = 120; Fig. 3C). In tissue from the U.S., cumulative methylation of the 5 markers (CM-5) distinguished between normal and SCC with 100% sensitivity and 91% specificity (ROC AUC = 1.000, 95% CI 1.000 to 1.000, P = 0.0001) and between normal and CIN3 with 100% sensitivity (95% CI 82.41% to 100.0%) and 91% specificity (ROC AUC = 1.000, 95% CI 1.000 to 1.000, P < 0.0001) (Fig. 3A). Considered individually, each marker showed significantly higher methylation in both SCC and CIN3 than in normal tissue (P < 0.0001; Additional file 1: Fig. S4, Additional file 2: Table S3). In tissue from S. Africa, CM-5 distinguished between normal and SCC with 100% sensitivity at 95.65% specificity (ROC AUC = 1.000, 95% CI 1.000 to 1.000, P < 0.0001) and between normal and CIN2/3 with 78.26% sensitivity at 95.65% specificity (ROC AUC = 0.928, 95% CI 0.851 to 1.000, P < 0.0001) (Fig. 3B). In tissue from Vietnam, CM-5 distinguished between normal and SCC with 100% sensitivity at 93.33% specificity (ROC AUC = 1.000, 95% CI 1.000 to 1.000, P < 0.0001) and between normal and CIN2/3 with 54.84% sensitivity at 93.33% specificity (ROC AUC = 0.793, 95% CI 0.681 to 0.905, P = 0.0001). In contrast, CIN1 methylation did not differ significantly from normal (Mann-Whitney P = 0.259) (Fig. 3C).
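Sensitivity and specificity at a CM-5 cutoff, and a confidence interval for such a proportion, can be sketched as below. The calling rule (positive when CM-5 exceeds the cutoff) and the choice of the Wilson score interval are assumptions made for illustration; the section does not state which interval method was used. As an example of how the Wilson interval behaves at 100%, 18 positives out of 18 cases gives a 95% interval of about 82.41% to 100%.

```python
import math


def sensitivity_specificity(cases, controls, cutoff):
    """Sensitivity and specificity when a sample is called positive if CM-5 > cutoff."""
    true_pos = sum(1 for v in cases if v > cutoff)
    true_neg = sum(1 for v in controls if v <= cutoff)
    return true_pos / len(cases), true_neg / len(controls)


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (two-sided 95% by default)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Toy CM-5 values, not study data.
    sens, spec = sensitivity_specificity([5.1, 6.3, 7.0], [0.2, 0.8, 5.5], cutoff=4.0)
    print(sens, spec)
    low, high = wilson_interval(18, 18)
    print(f"{low:.2%} to {high:.2%}")  # 82.41% to 100.00%
```

Unlike the simple Wald interval, the Wilson interval does not collapse to zero width when the observed proportion is 0% or 100%, which is why it is a common choice for small diagnostic studies.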
In summary, in cervical tissues obtained from three regions of the world, SCC was detected with 100% sensitivity and high specificity, as was CIN3 in the U.S. samples, although sensitivity for CIN2/3 was lower in the samples from S. Africa and Vietnam. In each geographic region, methylation increased progressively with the severity of dysplasia.
Assay validation in cervical smears from the U.S., S. Africa, and Vietnam
Cervical smears are routinely collected in the screening setting to obtain cells from the cervix and vagina for cytological analysis and early detection of precancerous lesions. The potential clinical utility of the 5-marker panel for detecting CIN3+ disease in cervical smears was evaluated by QM-MSP in a total of 244 cervical samples from the U.S., Vietnam, and S. Africa. Data from all three countries were pooled to assess CM-5 in SCC, HSIL, LSIL and normal samples (Fig. 4). The histogram (Fig. 4A) and box plot (Fig. 4B) showed that CM-5 increased progressively with higher grades of neoplasia. The assay distinguished between normal and SCC with 86.84% sensitivity and 95.35% specificity (ROC AUC = 0.925, 95% CI 0.878 to 0.974, P < 0.0001), and between normal and HSIL with 73.77% sensitivity and 95.35% specificity (ROC AUC = 0.907, 95% CI 0.851 to 0.964, P < 0.0001).
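One common way to fix an operating point, such as the roughly 95% specificity reported for the pooled smears, is to place the cutoff at the corresponding quantile of the normal samples and then read off sensitivity. The sketch below assumes that rule; it is hypothetical and not necessarily how the study chose its thresholds.

```python
import math


def cutoff_for_specificity(controls, target=0.95):
    """Smallest observed control value such that at least `target` of the
    controls are <= it (and are therefore called negative)."""
    ordered = sorted(controls)
    # Number of controls that must fall at or below the cutoff; the epsilon guards
    # against floating-point products such as 0.95 * n landing just above an integer.
    k = max(1, math.ceil(target * len(ordered) - 1e-9))
    return ordered[k - 1]


def operating_point(cases, controls, target=0.95):
    """Cutoff at the target specificity, plus the sensitivity and specificity it yields."""
    cutoff = cutoff_for_specificity(controls, target)
    sensitivity = sum(1 for v in cases if v > cutoff) / len(cases)
    specificity = sum(1 for v in controls if v <= cutoff) / len(controls)
    return cutoff, sensitivity, specificity


if __name__ == "__main__":
    normals = [i / 100 for i in range(20)]  # toy CM-5 values for 20 normal smears
    hsil = [0.05, 0.15, 0.25, 0.40, 0.90]   # toy CM-5 values for 5 HSIL smears
    print(operating_point(hsil, normals))   # (0.18, 0.6, 0.95)
```

Ties among normal values at the cutoff can push the achieved specificity above the target, which is why the function reports the specificity it actually obtains.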
Considered individually, each of the five markers distinguished HSIL from normal in the U.S. sample set (N = 77) (Additional file 1: Fig. S5). As shown in the histograms, cervical smears with HSIL (N = 38) had higher levels of methylation than normal smears (N = 38). For individual genes, ROC AUC ranged from 0.861 (95% CI 0.776 to 0.947, P < 0.0001) to 0.933 (95% CI 0.875 to 0.991, P < 0.0001) (Additional file 1: Fig. S5). Thus, each of the five markers individually discriminated HSIL from normal cervical smears.
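Per-marker ROC AUC values like those above can be illustrated with the rank-based (Mann-Whitney) estimate of the AUC: the probability that a randomly chosen HSIL smear has higher methylation than a randomly chosen normal smear, counting ties as one half. The data layout (one dictionary per sample keyed by gene name) and the function names are hypothetical choices for this sketch.

```python
def roc_auc(cases, controls):
    """Empirical ROC AUC = P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for c in cases:
        for ctrl in controls:
            if c > ctrl:
                wins += 1.0
            elif c == ctrl:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def per_marker_auc(case_samples, control_samples, markers):
    """AUC for each marker separately; each sample is a dict of gene -> methylation."""
    return {
        m: roc_auc([s[m] for s in case_samples], [s[m] for s in control_samples])
        for m in markers
    }


if __name__ == "__main__":
    markers = ["FMN2", "EDNRB", "ZNF671", "TBXT", "MOS"]
    # Toy per-sample methylation values, not study data.
    hsil = [dict(zip(markers, v)) for v in ([0.30, 0.20, 0.40, 0.10, 0.25],
                                              [0.50, 0.10, 0.35, 0.20, 0.05])]
    normal = [dict(zip(markers, v)) for v in ([0.02, 0.15, 0.01, 0.00, 0.05],
                                                [0.01, 0.00, 0.03, 0.20, 0.01])]
    for gene, auc in per_marker_auc(hsil, normal, markers).items():
        print(f"{gene}: AUC = {auc:.3f}")
```

The double loop is quadratic in sample size, which is fine for a few hundred smears; the same value can be obtained from the Mann-Whitney U statistic as U / (n_cases * n_controls).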
Analysis of the 5-marker panel in paired tissue and cervical smears from the same individuals
We performed a pairwise comparison between tissue and cervical smear from the same individual to evaluate whether the methylation results agreed between the two sample types. The QM-MSP results of 92 samples from Vietnam were re-analyzed using the data presented in Fig. 3C (tissue) and Fig. 4 (cervical smears). Histograms show CM-5 in tissue and cervical smears of patients diagnosed with SCC, HSIL, LSIL and benign lesions (Fig. 5). Agreement in CM-5 between pairs was 23/25 for SCC (Fig. 5A), 17/25 for HSIL (Fig. 5B), 20/21 for LSIL (Fig. 5C) and 23/23 for benign lesions (Fig. 5D). In SCC, discordance was observed in 2 pairs in which the tissue was positive while the smear was negative (Fig. 5A). In HSIL, discordance was observed in 4 pairs in which the smear was positive while the tissue was negative, and in 4 pairs in which the tissue was positive while the smear was negative (Fig. 5B). In LSIL, methylation was consistently low in both tissues and smears (19/21) (Fig. 5C). There were two outliers: in one, methylation was very high in both tissue and smear, while in the other, the tissue was positive while the smear was negative. Strikingly, in all 23 pairs of benign tissues and smears, methylation was below the threshold for normal (Fig. 5D). We concluded that, with few exceptions, cervical smears provided a good reflection of the histopathology of the tissue.
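The pairwise agreement counts can be tallied as sketched below, under the simplifying assumption that tissue and smear are each called positive when CM-5 exceeds a single shared cutoff; the study may use different cutoffs for each sample type, and the function name is hypothetical.

```python
def pair_concordance(pairs, cutoff):
    """Tally tissue/smear agreement for paired samples from the same patient.

    pairs: list of (tissue_cm5, smear_cm5) tuples.
    A sample is called positive when its CM-5 is above `cutoff` (assumed rule).
    """
    counts = {"concordant": 0, "tissue_only": 0, "smear_only": 0}
    for tissue, smear in pairs:
        tissue_pos, smear_pos = tissue > cutoff, smear > cutoff
        if tissue_pos == smear_pos:
            counts["concordant"] += 1   # both positive or both negative
        elif tissue_pos:
            counts["tissue_only"] += 1  # tissue positive, smear negative
        else:
            counts["smear_only"] += 1   # smear positive, tissue negative
    return counts


if __name__ == "__main__":
    # Toy paired CM-5 values, not study data.
    pairs = [(4.2, 3.9), (3.5, 0.4), (0.2, 0.1), (0.3, 2.6)]
    print(pair_concordance(pairs, cutoff=1.0))
    # {'concordant': 2, 'tissue_only': 1, 'smear_only': 1}
```

Keeping the two discordance directions separate mirrors the text, which distinguishes smear-positive/tissue-negative pairs from tissue-positive/smear-negative pairs.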
Additional file 1: Table S4 details the sources of the tissue and smear samples used in this study, including how many were tissue/smear pairs from the same patient. Available demographic data are presented for patient samples from the U.S. (Additional file 1: Table S5) and Vietnam (Additional file 2: Table S6).
The 5-marker panel is methylated in both human papillomavirus (HPV)-positive and HPV-negative cervical cancer
The majority of cervical carcinomas are HPV-positive, and HPV testing is therefore recommended throughout the world to screen for cervical cancer [29, 30]. Under these circumstances, the 3–10% of carcinomas that are negative for all HPV types currently tested can be missed [31]. To determine whether the 5-marker panel detects cervical carcinoma in both HPV-positive and HPV-negative cases, we analyzed the TCGA-CESC/UCEC and GSE68339 [20] databases, correlating β-methylation levels of the 5-CpG marker panel with HPV status (Fig. 6). HPV-negative carcinomas represented 5.5% (17 of 307 cases) of the TCGA-CESC dataset (Fig. 6A) and 7.5% (20 of 268 samples) of the GSE68339 dataset (Fig. 6B, Additional file 1: Table S7). As shown in the histogram and box plot of the TCGA datasets, HPV-negative samples had significantly higher cumulative β-methylation of the 5-marker panel than normal cervix and uterus (N = 48; P < 0.0001). Among the HPV-negative TCGA-CESC samples, 71% (12/17) of tumors were hypermethylated (Fig. 6A). In the GSE68339 dataset of 268 cancers, where HPV status was determined by a qPCR assay, 95% (19/20) of HPV-negative carcinomas were hypermethylated relative to the normal samples in the TCGA-CESC dataset (Fig. 6B). Interestingly, in both datasets, HPV-negative samples had significantly lower methylation than HPV-positive samples (P < 0.0001). Although the numbers were small, the results suggest that the 5-marker panel detects both HPV-positive and HPV-negative samples with similar sensitivity.
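The section does not define the threshold used to call a tumor "hypermethylated" relative to normal samples. Purely as an assumed example, the sketch below uses the mean plus 2 standard deviations of the normal CM values and reports how many tumors lie above it; both the rule and the function names are hypothetical.

```python
import statistics


def hypermethylation_threshold(normal_cm, n_sd=2.0):
    """Assumed rule: mean + n_sd * sample SD of the normal reference CM values.

    statistics.stdev needs at least two normal samples.
    """
    return statistics.mean(normal_cm) + n_sd * statistics.stdev(normal_cm)


def fraction_hypermethylated(tumour_cm, normal_cm, n_sd=2.0):
    """Return (number above threshold, number of tumors, fraction above threshold)."""
    threshold = hypermethylation_threshold(normal_cm, n_sd)
    hits = sum(1 for v in tumour_cm if v > threshold)
    return hits, len(tumour_cm), hits / len(tumour_cm)


if __name__ == "__main__":
    # Toy CM values, not study data.
    normal_cm = [0.10, 0.12, 0.08, 0.11, 0.09]
    hpv_negative_cm = [0.9, 0.05, 1.4, 0.6]
    print(fraction_hypermethylated(hpv_negative_cm, normal_cm))  # (3, 4, 0.75)
```

Other reasonable choices, such as the maximum normal value or a cutoff fixed at a target specificity, would give different fractions, so the rule should be stated alongside any reported percentage.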
Correlation of age with methylation
It is well established that human aging is associated with characteristic changes in DNA methylation throughout the genome [32–34]. To estimate the size of this effect on our markers, we fit linear regression models of DNA methylation level against age (Additional file 1: Fig. S6). Methylation of the 5-marker signature generally increased with patient age (r = 0.053 to 0.096, P = 0.604 to 0.781), although a negative correlation was observed in the samples from the U.S. (r = −0.493, P = 0.123); none of these associations reached statistical significance. Importantly, the changes in methylation with age were small in absolute terms and unlikely to lead to misclassification. Thus, we concluded that CM-5 methylation was not significantly correlated with age in either the TCGA-CESC and UCEC normal datasets or the QM-MSP data from our study samples.