Coupling of serum CK20 and hyper-methylated CLIP4 as a promising biomarker for colorectal cancer diagnosis: from bioinformatics screening to clinical validation

Colorectal cancer (CRC) is one of the most common and lethal malignancies. Minimally invasive and precise biomarkers are urgently needed for the early diagnosis of CRC. Through bioinformatics analysis of 395 CRC tissues and 63 CRC cell lines, CK18, CK20, de-methylated HPDL and hyper-methylated CLIP4 were identified as candidate serum biomarkers. A training cohort of 60 CRC patients, 30 colorectal adenoma (CA) patients and 33 healthy controls and a validation cohort of 60 CRC patients, 30 CA patients and 30 healthy controls were then enrolled. In the training cohort, enzyme-linked immunosorbent assay (ELISA) showed that serum CK18 and CK20 were both significantly higher in CRC and CA. CK18 diagnosed CRC with 46.67% sensitivity and 87.3% specificity; CK20 diagnosed CRC with 28.33% sensitivity and 90.47% specificity. Methylation-specific PCR (MSP) showed that de-methylated HPDL and hyper-methylated CLIP4 were detected significantly more often in CRC and CA. De-methylated HPDL diagnosed CRC with 36.67% sensitivity and 93.65% specificity, and hyper-methylated CLIP4 with 73.33% sensitivity and 84.13% specificity. Combined analysis of the candidate markers showed that CK20/hyper-methylated CLIP4 diagnosed CRC with 91.67% sensitivity and 82.54% specificity. In the validation cohort, CK20 diagnosed CRC with 36.7% sensitivity and 88.3% specificity, hyper-methylated CLIP4 with 80% sensitivity and 85% specificity, and CK20/hyper-methylated CLIP4 with 95% sensitivity and 81.7% specificity. Compared with previously reported serum biomarkers, CK20/hyper-methylated CLIP4 has the potential to serve as a new, effective and precise diagnostic biomarker for CRC.


INTRODUCTION
Colorectal cancer (CRC) is one of the most common malignancies worldwide and has a high mortality rate. The incidence of CRC has risen in recent years, and CRC is now the third most common cancer among males and the second most common cancer among females [1]. CRC development is a complex multistep process involving gradual progression from adenomatous polyps to adenomas and then to malignant carcinomas [2]. Clinically, CRC is difficult to diagnose early because patients usually do not present with symptoms such as colorectal bleeding or anemia until later stages, and survival decreases as the stage at diagnosis increases. Early detection and rapid diagnosis are therefore important for CRC screening and treatment. Serum contains secretory proteins and cell-free DNA (cfDNA) derived from cells throughout the body, and could therefore be a useful material for CRC screening.
Aberrant DNA methylation has been shown to be an early event in the development of CRC [7] and can be detected in cfDNA, making it a useful biomarker for the early detection of CRC [8,9]. Various tumor suppressor genes have emerged as potential blood-based methylation markers for CRC, including APC, MGMT, hMLH1, HLTF, ALX4, NGFR, TMEFF2, NEUROG1, SFRP2, VIM, RASSF2A, WIF1, RUNX3 and SEPT9, with sensitivities ranging from 34% to 90% and specificities from 69% to 100% [10,11].
With vast amounts of CRC transcriptomic and DNA methylomic data being continuously generated and publicly accessible, bioinformatics can be used to screen for CRC diagnostic biomarkers specifically and systematically. In this study, The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx) [12], Cancer Cell Line Encyclopedia (CCLE) [13], Gene Expression Profiling Interactive Analysis (GEPIA) [14], Human Protein Atlas (HPA) [15], UCSC Genome Browser [16], UALCAN [17] and MEXPRESS [18] were used to screen for genes encoding specific secretory proteins, de-methylated overexpressed genes and hyper-methylated underexpressed genes in CRC tissues and cell lines. These candidate biomarkers were then detected in CRC cell lines and in clinical serum samples from CRC patients, colorectal adenoma (CA) patients and healthy controls, and their relationships with clinicopathologic parameters and their value as CRC diagnostic markers were analyzed.

MATERIALS AND METHODS
Bioinformatics analysis
mRNA expression data for 395 CRC patients were downloaded from the TCGA database. Differentially expressed genes (DEGs) between CRC tissues and normal colorectal tissues were identified with the "limma" package and filtered using the thresholds |log2FC| > 1 and P < 0.01. Genes specifically overexpressed or underexpressed in CRC tissues were verified with GEPIA, genes overexpressed in CRC cell lines were selected with CCLE, and genes encoding secretory proteins were screened with HPA. The methylation status of candidate genes in CRC tissues and their CpG island locations were checked with UALCAN, MEXPRESS and UCSC.
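The threshold step above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the differential-expression statistics themselves come from limma (an R package), and the `DEGResult` record, the `filter_degs` function and the gene names below are hypothetical. In limma's `topTable` output the corresponding columns are `logFC` and `P.Value` (or `adj.P.Val`); the text does not say whether raw or adjusted P values were used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DEGResult:
    """One row of differential-expression output (hypothetical record type)."""
    gene: str
    log2fc: float   # log2 fold change, CRC vs normal tissue
    p_value: float  # P value from the differential-expression test


def filter_degs(results: List[DEGResult],
                lfc_threshold: float = 1.0,
                p_threshold: float = 0.01) -> Tuple[List[str], List[str]]:
    """Apply |log2FC| > lfc_threshold and P < p_threshold, then split the
    surviving genes into overexpressed (log2FC > 0) and underexpressed (log2FC < 0)."""
    up: List[str] = []
    down: List[str] = []
    for r in results:
        if abs(r.log2fc) > lfc_threshold and r.p_value < p_threshold:
            (up if r.log2fc > 0 else down).append(r.gene)
    return up, down


# Hypothetical rows, for illustration only.
rows = [
    DEGResult("GENE_A", 2.3, 1e-5),   # passes both thresholds, overexpressed
    DEGResult("GENE_B", -1.8, 1e-3),  # passes both thresholds, underexpressed
    DEGResult("GENE_C", 0.7, 1e-6),   # fold change too small
    DEGResult("GENE_D", 3.0, 0.05),   # P value too large
    DEGResult("GENE_E", 1.0, 1e-4),   # |log2FC| must be strictly greater than 1
]
up, down = filter_degs(rows)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

The overexpressed and underexpressed lists would then feed the separate secretory-protein, de-methylation and hyper-methylation screens described above.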

Clinical specimens
Serum and tissue samples were obtained with informed consent from the First People's Hospital of Yunnan Province and the Third People's Hospital of Yunnan Province, comprising a training cohort (60 CRC, 30 CA and 33 healthy controls) and a validation cohort (60 CRC, 30 CA and 30 healthy controls). The diagnosis of CRC was confirmed by endoscopy and pathological biopsy. None of the patients had received radiotherapy, chemotherapy or surgery before blood samples were collected. In addition, one placental sample was used as a control to test the methylation status of HPDL and CLIP4.

Cell culture and treatment
Seven human CRC cell lines (HT29, HCT116, SW480, SW620, RKO, DLD-1 and LOVO) and one normal colon cell line (CCD841CON) were obtained from the cell bank of the Chinese Academy of Sciences (Shanghai, China). All cell lines were cultured in DMEM containing 10% fetal bovine serum (BI) and 100 IU/ml penicillin and streptomycin (Gibco) and maintained at 37°C in a humidified incubator with 5% CO2. For de-methylation treatment, cells were incubated with 10 µM 5-aza-2'-deoxycytidine (Sigma, USA) for 3 days, with the medium changed daily.

Quantitative PCR (Q-PCR) and Real-time PCR (RT-PCR)
The mRNA expression of candidate genes was analyzed by Q-PCR and RT-PCR. Total RNA was extracted with a Tissue Total RNA Isolation Kit (TSINGKE, China), and cDNA was synthesized with a PrimerScript™ RT Reagent Kit (TSINGKE, China). Q-PCR was performed with 2 × Taq PCR Master Mix (TIANGEN, China). Real-time PCR was performed with EvaGreen 2 × qPCR MasterMix (Takara, Japan) on a CFX96™ Real-Time PCR System (Bio-Rad, USA). The PCR conditions were as follows: predenaturation at 94°C for 1.5 min; 30 cycles of denaturation at 94°C for 10 s, annealing at 60°C for 20 s and extension at 72°C for 30 s; and a final extension at 72°C for 1 min. Primer sequences (10 µM), annealing temperatures and product sizes are listed in Table 1. The expression of the assayed genes was normalized to GAPDH.
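Normalization to a reference gene such as GAPDH is often done with the comparative Ct (2^−ΔΔCt) method. The text does not state which method was used, so the sketch below assumes the Livak 2^−ΔΔCt approach with near-100% and equal amplification efficiencies for target and reference; the function name `relative_expression` and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene by the Livak 2^-ddCt method (assumed).

    The reference gene (here GAPDH) normalizes each sample; the control
    sample (e.g. CCD841CON, or untreated cells in a DAC experiment) is the
    calibrator. Assumes near-100% and equal amplification efficiencies.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # delta Ct in the sample
    d_ct_control = ct_target_control - ct_ref_control   # delta Ct in the calibrator
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target reaches threshold 3 cycles earlier
# (relative to GAPDH) in a CRC cell line than in the normal colon line,
# which corresponds to 2^3 = 8-fold higher expression.
fold = relative_expression(24.0, 18.0, 27.0, 18.0)
print(fold)  # 8.0
```

The same function covers the DAC experiments by using untreated cells of the same line as the calibrator.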

Methylation-specific PCR (MSP)
The methylation status of HPDL and CLIP4 was detected by methylation-specific PCR using bisulfite-modified DNA as template, following a previously described protocol [19]. Methylation-specific (M) and unmethylation-specific (U) primer sequences (10 µM), annealing temperatures and product sizes are listed in Table 1. PCR products were evaluated by electrophoresis on ethidium bromide (EB)-stained 2% agarose gels. A sample was scored as de-methylated for HPDL when a visible band was detected only with the unmethylation-specific primers, and as hyper-methylated for CLIP4 when a visible band was detected with the methylation-specific primers. All samples were amplified twice to confirm the results.
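The two gel-scoring rules above can be written as small predicates. This is a sketch under stated assumptions: each reaction is reduced to a pair of booleans (band with M primers, band with U primers), the function names are hypothetical, and the rule for combining the two amplifications (positive only if both replicates agree) is an assumption, since the text only says samples were amplified twice.

```python
from typing import Callable, Iterable, Tuple

# One MSP readout for a sample: (band with M primers, band with U primers).
Bands = Tuple[bool, bool]


def hpdl_demethylated(m_band: bool, u_band: bool) -> bool:
    """Scoring rule stated for HPDL: a band with the U primers only."""
    return u_band and not m_band


def clip4_hypermethylated(m_band: bool, u_band: bool) -> bool:
    """Scoring rule stated for CLIP4: a band with the M primers.
    The U lane is ignored here (it may also be positive, e.g. from non-tumor DNA)."""
    return m_band


def call_sample(rule: Callable[[bool, bool], bool],
                replicates: Iterable[Bands]) -> bool:
    """Assumption: a sample is called positive only if every replicate is positive."""
    calls = [rule(m, u) for m, u in replicates]
    return bool(calls) and all(calls)


# Hypothetical gel readouts from two amplifications of the same serum sample.
print(call_sample(clip4_hypermethylated, [(True, True), (True, False)]))  # True
print(call_sample(hpdl_demethylated, [(False, True), (True, True)]))      # False
```

Note the asymmetry between the rules: HPDL requires the absence of an M band, whereas CLIP4 is scored on the presence of an M band regardless of the U lane.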

Statistical analysis
Differences in CEA, CK18, CK20, MUC13, CK8 and EPCAM among the study groups were compared using nonparametric tests. Correlations between CK18, CK20, de-methylated HPDL or hyper-methylated CLIP4 and clinicopathologic parameters were evaluated with the chi-square test or Fisher's exact test.
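For a 2 × 2 table (marker positive/negative versus two levels of a clinicopathologic parameter), the choice between the two tests can be sketched as below. The text does not state how the test was chosen; this sketch assumes the common rule of thumb of switching to Fisher's exact test when any expected cell count is below 5, and uses the Pearson chi-square without continuity correction. Function names and example tables are hypothetical, not study data.

```python
import math
from typing import List, Tuple

Table = List[List[int]]  # 2 x 2 counts: rows = marker +/-, columns = parameter level


def expected_counts(t: Table) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(map(sum, t))
    rows = [sum(r) for r in t]
    cols = [t[0][j] + t[1][j] for j in range(2)]
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def pearson_chi_square(t: Table) -> Tuple[float, float]:
    """Pearson chi-square without continuity correction (df = 1)."""
    e = expected_counts(t)
    chi2 = sum((t[i][j] - e[i][j]) ** 2 / e[i][j] for i in range(2) for j in range(2))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_two_sided(t: Table) -> float:
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the observed one."""
    (a, b), (c, d) = t
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9)))


def association_p_value(t: Table) -> Tuple[str, float]:
    """Assumed rule of thumb: Fisher's exact test if any expected count < 5."""
    if min(min(row) for row in expected_counts(t)) < 5:
        return "fisher", fisher_exact_two_sided(t)
    return "chi-square", pearson_chi_square(t)[1]


print(association_p_value([[3, 1], [1, 3]]))      # fisher, p ≈ 0.486
print(association_p_value([[20, 10], [10, 20]]))  # chi-square, p ≈ 0.0098
```

Small subgroups, such as CA patients split by a clinicopathologic parameter, are where the exact test typically takes over.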
The diagnostic validity of each parameter was evaluated by its sensitivity and specificity. All statistical analyses were performed with SPSS 19.0 (SPSS Inc., USA).
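Sensitivity is the fraction of CRC patients who test positive, and specificity is the fraction of non-CRC subjects who test negative. The sketch below assumes the non-CRC group for specificity includes both CA patients and healthy controls (30 + 33 = 63 in the training cohort); with the serum counts given in the Results, this assumption reproduces the reported training-cohort values for hyper-methylated CLIP4 and de-methylated HPDL. Function names are illustrative.

```python
def sensitivity(true_positives: int, n_crc: int) -> float:
    """Proportion of CRC patients who test positive."""
    return true_positives / n_crc


def specificity(true_negatives: int, n_non_crc: int) -> float:
    """Proportion of non-CRC subjects (assumed: CA + healthy) who test negative."""
    return true_negatives / n_non_crc


# Training-cohort counts stated in the Results: CLIP4 hyper-methylated in
# 44/60 CRC, 10/30 CA and 0/33 healthy sera; HPDL de-methylated in
# 22/60 CRC, 4/30 CA and 0/33 healthy sera.
N_CRC, N_CA, N_HEALTHY = 60, 30, 33
markers = {"hyper-methylated CLIP4": (44, 10, 0), "de-methylated HPDL": (22, 4, 0)}
for name, (pos_crc, pos_ca, pos_healthy) in markers.items():
    n_non_crc = N_CA + N_HEALTHY
    true_neg = n_non_crc - pos_ca - pos_healthy
    print(f"{name}: sensitivity {sensitivity(pos_crc, N_CRC):.2%}, "
          f"specificity {specificity(true_neg, n_non_crc):.2%}")
# hyper-methylated CLIP4: sensitivity 73.33%, specificity 84.13%
# de-methylated HPDL: sensitivity 36.67%, specificity 93.65%
```

Including CA patients in the non-CRC group means that marker positivity in adenomas lowers the reported specificity, even though such positivity may be desirable for early screening.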

RESULTS
De-methylated HPDL was observed in CRC and CA serum
In general, DNA de-methylation can lead to genomic instability and high expression of oncogenes. Based on the bioinformatics analysis, UCSC showed that 19 of the 74 specifically overexpressed genes possessed CpG islands in their promoter or first exon region (Supplementary Table 4; Figure 3A). Measuring the expression of these 19 genes in 7 CRC cell lines and CCD841CON revealed that HPDL, LGR5, ASCL2, KCNE3, HNF4G, KRT8, KRT18, SLC12A2 and FERMT1 were significantly overexpressed in CRC cell lines (Figure 3B). To test the relationship between DNA methylation status and the expression of these genes in CRC, the expression of the 9 genes was measured in 7 CRC cell lines treated with 5-aza-2'-deoxycytidine (DAC). As shown in Figure 3C and 3D, HPDL, KRT8, KRT18, FERMT1 and SLC12A2 were upregulated in CRC cell lines after DAC treatment. MSP primers for these 5 genes were designed according to their CpG island regions, and their methylation status was tested in CRC cell lines and a normal colon cell line.
The results revealed that only HPDL was more de-methylated in CRC cell lines (especially SW620) than in CCD841CON (Figure 4A and 4B). GEPIA and CCLE showed that HPDL was highly expressed in CRC tissues and CRC cell lines, and MEXPRESS revealed de-methylated HPDL regions (probe IDs: cg13951491 and cg16593917) in CRC tissues compared with normal tissues (Figure 5). These results indicate that HPDL overexpression in CRC may be driven by DNA de-methylation.
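The narrowing from overexpressed genes to HPDL is a sequence of set intersections. The sketch below encodes only the gene sets named in the text (the full lists of 74 overexpressed genes and 19 CpG-island genes are not given here, so the funnel starts at the 9 genes from Figure 3B); the `screen` helper is hypothetical.

```python
from typing import List, Set, Tuple

# Gene sets as named in the text (Figure 3B-3D and Figure 4A-4B).
overexpressed_in_crc_lines = {"HPDL", "LGR5", "ASCL2", "KCNE3", "HNF4G",
                              "KRT8", "KRT18", "SLC12A2", "FERMT1"}
upregulated_by_dac = {"HPDL", "KRT8", "KRT18", "FERMT1", "SLC12A2"}
demethylated_by_msp = {"HPDL"}


def screen(*stages: Set[str]) -> Tuple[Set[str], List[int]]:
    """Intersect successive filter stages, recording how many genes survive each."""
    survivors = set(stages[0])
    counts = [len(survivors)]
    for stage in stages[1:]:
        survivors &= stage
        counts.append(len(survivors))
    return survivors, counts


candidates, counts = screen(overexpressed_in_crc_lines, upregulated_by_dac,
                            demethylated_by_msp)
print(sorted(candidates), counts)  # ['HPDL'] [9, 5, 1]
```

The same helper applies to the hyper-methylation arm (underexpressed genes, then DAC-induced genes, then MSP-confirmed hyper-methylation), which ends at CLIP4.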
Because serum contains DNA derived from lysed tumor cells, the methylation status of HPDL was detected in the serum of 60 CRC patients, 30 CA patients and 33 healthy controls. As shown in Figure 4C, HPDL de-methylation was detectable in CRC and CA patients but not in healthy controls. The frequency of de-methylated serum HPDL was 36.7% (22/60) in CRC patients and 13.3% (4/30) in CA patients (Figure 4D). Additionally, representative cases (20 CRC patients, 10 CA patients and 10 healthy controls) were selected to compare HPDL methylation status in serum and in normal or tumor colorectal tissue from the same individual. HPDL methylation status in serum was largely consistent with that in CRC tissues (Figure 4C).
Table 6), we found that 9 genes showed underexpression and promoter hyper-methylation in CRC tissue. UCSC showed that 8 of these genes possessed CpG islands located in their promoters (Figure 6A). Measuring the expression of the 8 genes in 7 CRC cell lines and CCD841CON revealed that CLIP4, GARA1 and UCHL1 were underexpressed in CRC cell lines relative to the normal colon cell line (Figure 6B). Q-PCR and RT-PCR showed that CLIP4 and UCHL1 were upregulated in CRC cell lines after DAC treatment (Figure 6C and 6D). MSP primers were designed according to the CpG islands in the promoters, and the methylation status of these 2 genes was tested in CRC cell lines and a normal colon cell line. CLIP4 was significantly hyper-methylated in CRC cell lines and completely de-methylated in the normal colon cell line (Figure 7A and 7B). GEPIA and UALCAN also indicated that CLIP4 was underexpressed and hyper-methylated in CRC tissue (Figure 8). Detection of CLIP4 methylation status in serum from 60 CRC patients, 30 CA patients and 33 healthy controls showed that CLIP4 hyper-methylation was detectable in CRC and CA but not in healthy serum (Figure 7C).
Statistical analysis showed that the hyper-methylation frequency of serum CLIP4 was 73.3% (44/60) in CRC patients and 33.3% (10/30) in CA patients (Figure 7D). Furthermore, representative cases (20 CRC patients, 10 CA patients and 10 healthy controls) were chosen to compare CLIP4 methylation status in serum and in normal or tumor colorectal tissue from the same individual. CLIP4 methylation status in serum was completely consistent with that in CRC tissue (Figure 7C).
Characteristics of the patients and controls are summarized in Table 2.
The relationships between CK18, CK20 or HPDL, CLIP4 methylation status and various clinicopathologic parameters in CRC patients are summarized in Table 3.
In the training cohort, CK18 was significantly correlated with TNM stage, differentiation grade, CEA and CA19-9 (all P < 0.05). CK20 was significantly correlated with tumor size and CA19-9 (P < 0.05). De-methylated HPDL was significantly associated with tumor size, CEA and CA19-9 (P < 0.05).
Hyper-methylated CLIP4 was significantly associated with differentiation grade and CEA (P < 0.05) in CRC patients. In the validation cohort, CK20 was significantly correlated with tumor location and CA19-9 (P < 0.05). Considering sensitivity and specificity (Table 4), CK20/hyper-methylated CLIP4 was a potential diagnostic biomarker for CRC.
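The text does not define how the CK20/hyper-methylated CLIP4 panel is scored. A rule where the panel is positive if either marker is positive (parallel testing) is consistent with the combined sensitivity rising and specificity falling relative to CLIP4 alone, and with the Discussion's proposal to follow up individuals positive for serum CK20 or hyper-methylated CLIP4; it is treated here as an assumption. The per-subject calls below are hypothetical toy data, not study data.

```python
from typing import List, Tuple


def parallel_or(ck20_pos: List[bool], clip4_pos: List[bool]) -> List[bool]:
    """Assumed panel rule: positive if either marker is positive."""
    return [a or b for a, b in zip(ck20_pos, clip4_pos)]


def sens_spec(calls: List[bool], is_crc: List[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary calls against CRC status."""
    tp = sum(c and y for c, y in zip(calls, is_crc))
    tn = sum((not c) and (not y) for c, y in zip(calls, is_crc))
    n_pos = sum(is_crc)
    return tp / n_pos, tn / (len(is_crc) - n_pos)


# Hypothetical toy cohort: 4 CRC patients followed by 4 non-CRC subjects.
is_crc = [True] * 4 + [False] * 4
ck20 = [True, False, False, False, False, True, False, False]
clip4 = [False, True, True, False, True, False, False, False]
panel = parallel_or(ck20, clip4)
for name, calls in [("CK20", ck20), ("CLIP4", clip4), ("panel", panel)]:
    print(name, sens_spec(calls, is_crc))
# CK20 (0.25, 0.75)
# CLIP4 (0.5, 0.75)
# panel (0.75, 0.5)
```

Under an OR rule, sensitivity can only rise and specificity can only fall relative to each single marker; the gain is largest when the two markers detect different patients, which may be one reason the combination outperforms either marker alone.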

DISCUSSION
Cytokeratins are a conserved group of proteins that form the cytoplasmic cytoskeleton of epithelial cells and tissues. Cytokeratin 20 (CK20) is a type I cytokeratin and a prominent component of the intestinal epithelium. CK20 expression is confined to gastrointestinal epithelium, urothelium and Merkel cells of the epidermis, as well as malignancies originating from these sites [20]. Imai et al. reported that CK20 expression in tumor tissues was an independent prognostic factor in poorly differentiated adenocarcinoma of the colon and rectum [21]. As one of the most investigated markers for the detection of circulating CRC cells, CK20 mRNA in serum is widely tested by RT-PCR to predict recurrence and poor prognosis of CRC [22][23][24][25][26][27][28][29]. However, the efficacy of serum CK20 protein as a biomarker for early CRC screening and diagnosis remains unclear. In this study, serum CK20 protein diagnosed CRC with 28.33% sensitivity and 90.47% specificity in the training cohort and 36.7% sensitivity and 88.3% specificity in the validation cohort. We also found elevated CK20 levels in 16.67% of CA patients in the training cohort, suggesting that CK20 has diagnostic potential for early CRC screening.
CLIP4, a member of the CAP-Gly domain-containing linker protein (CLIP) family involved in binding microtubule plus ends, has been implicated in immune response-related biological processes and in cell migration and viability in certain cancer metastases [30]. Hyper-methylation of CLIP4 has shown diagnostic potential for CRC in serum [31]. Jensen et al. reported that hyper-methylated CLIP4 could distinguish serum from CRC patients and healthy controls (area under the curve, 0.88) [32]. By testing methylation status in serum, we found that hyper-methylated CLIP4 detected CRC with 73.33% sensitivity and 84.13% specificity in the training cohort and 80% sensitivity and 85% specificity in the validation cohort. We also detected hyper-methylated CLIP4 in 33.3% of CA patients but in none of the healthy controls, implying that serum CLIP4 hyper-methylation could be used for early CRC screening.
Because of the highly heterogeneous nature of CRC, a single tumor marker is unlikely to become a stand-alone diagnostic test, as its sensitivity and/or specificity is commonly insufficient. A panel of tumor markers detected with different methods is a potentially more effective approach to CRC diagnosis. With systematic bioinformatics screening and clinical verification, our study showed that the combination of serum CK20 and hyper-methylated CLIP4 was a novel and effective biomarker for CRC diagnosis, with 91.67% sensitivity and 82.54% specificity in the training cohort and 95% sensitivity and 81.7% specificity in the validation cohort. This was more sensitive than hyper-methylated CLIP4 alone in stool specimens (90.3% sensitivity, 88.4% specificity) [33]. Compared with previously reported serum CRC biomarkers, CK20/hyper-methylated CLIP4 was more sensitive than CEA/MMP-7/TIMP-1 (sensitivity 70.3%, specificity 91.3%) [34], RUNX3/SFRP1/CEA (sensitivity 84.71%) [35], LRG1/EGFR/ITIH4/HPX/SOD3 (sensitivity over 70%, specificity 89%) [36], anti-SLP2/-p53/-SEC61B/-PLSCR1 (sensitivity 64.1%, specificity 80%) [37] and miR-203a-3p/miR-145-5p/miR-375-3p/miR-200c-3p (sensitivity 81.52%, specificity 73.33%) [38], comparable in sensitivity to miR-144-3p/miR-425-5p/miR-1260b (sensitivity 93.8%, specificity 91.3%) [39], and less sensitive and specific than CCL20/IL-17A (sensitivity 96.1%, specificity 96.5%) [40]. Elevated CCL20 and IL-17A levels may reflect an inflammatory condition, which can increase the false-positive fraction (FPF) of CRC detection [40]. In comparison, CRC cells
This study has several limitations; it should be regarded as preliminary research, and future studies should address the following issues. First, CRCs can be classified by primary tumor location. Left-sided colon cancer (LCC), which includes rectal cancer, and right-sided colon cancer (RCC) differ in pathogenesis, molecular characteristics, incidence and prognosis.
In LCC, chromosomal instability has been detected in approximately 75% of cases, compared with more than 30% of RCCs [41]. With increased chromosomal instability, LCC has been associated with more frequent overexpression of the epidermal growth factor receptor (EGFR) ligands EREG and AREG, as well as EGFR, HER2, VEGF-1 and COX-2 [42]. Hypermutation is more prevalent in RCC, which has been associated with more frequent RAS and phosphoinositide 3-kinase pathway mutations, BRAF mutations and TGFβR2 mutations. CpG island methylator phenotype-high (CIMP-high) and microsatellite instability-high (MSI-H) subtypes have also been detected in RCC [43]. In our validation cohort, elevated CK20 levels were significantly correlated with colonic rather than rectal tumor location. Whether CK20 expression in tumor tissues and serum CK20 levels differ between LCC and RCC, and whether serum CK20 could distinguish LCC from RCC, therefore need further study. Second, serum CK20 mRNA is a biomarker of circulating CRC cells, but whether serum CK20 protein originates from circulating CRC cells or from CRC tumor tissue urgently needs to be determined. For serum CK20 protein-positive patients, serum CK20 mRNA should therefore be detected, and CK20 protein in CRC tumor tissues should be examined by immunohistochemistry (IHC). Third, bioinformatics and DNA methylomics studies have shown that breast and gastric cancer tissues also harbor hyper-methylated CLIP4 [44][45][46], but the diagnostic value of serum hyper-methylated CLIP4 for breast and gastric cancer has not yet been reported. A study involving several cancer types should therefore be conducted to verify the specificity of hyper-methylated CLIP4 and CK20/hyper-methylated CLIP4 for CRC diagnosis. Fourth, in clinical serum validation, only the combination of CK20 and hyper-methylated CLIP4 showed high sensitivity and specificity for CRC diagnosis, for reasons that remain unclear.
The biological functions of CK20 and CLIP4 in CRC and the relationship between them should therefore be further explored. In addition, our study was performed on a limited number of CRC patients (only 120 were enrolled) from two centers. A multicenter study covering a large population from different regions should be conducted to avoid overestimating the sensitivity and specificity of serum CK20/hyper-methylated CLIP4. Finally, although none of the CRC patients had received radiotherapy, chemotherapy or surgery before blood collection, all had already been clinically diagnosed by endoscopy and pathological biopsy, and serum biomarkers are likely to be more readily detectable in clinical than in subclinical patients. A large number of blood samples should therefore be collected from a health examination center and tested for serum CK20/hyper-methylated CLIP4, and individuals positive for serum CK20 or hyper-methylated CLIP4 should then be examined by endoscopy and pathological biopsy to verify the ability of serum CK20/hyper-methylated CLIP4 to diagnose CRC.