Genome-scale analysis identifies NEK2, DLGAP5 and ECT2 as promising diagnostic and prognostic biomarkers in human lung cancer

This study aims to identify promising biomarkers for the early detection of lung cancer and for evaluating the prognosis of lung cancer patients. Genome-wide mRNA expression data obtained from the Gene Expression Omnibus (GSE19188, GSE18842 and GSE40791), including 231 primary tumor samples and 210 normal samples, were used to discover differentially expressed genes (DEGs). NEK2, DLGAP5 and ECT2 were found to be highly expressed in tumor samples. These results were experimentally confirmed by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). The elevated expression of the three candidate genes was also validated using the Cancer Genome Atlas (TCGA) datasets, which consist of 349 tumor and 58 normal tissues. Furthermore, we performed receiver operating characteristic (ROC) analysis to assess the diagnostic value of these lung cancer biomarkers, and the results suggested that NEK2, DLGAP5 and ECT2 expression levels could robustly distinguish lung cancer patients from normal subjects. Finally, Kaplan-Meier analysis revealed that elevated NEK2, DLGAP5 and ECT2 expression was negatively correlated with both overall survival (OS) and relapse-free survival (RFS). Taken together, these findings indicate that these three genes might be used as promising biomarkers for the early detection of lung cancer, as well as for predicting the prognosis of lung cancer patients.

In the present study, we identified differentially expressed genes that were common among several expression profiles. We selected the target genes from among the 100 differentially expressed genes based on their biological function. According to the literature, NIMA-related kinase 2 (NEK2), disc large (Drosophila) homolog-associated protein 5 (DLGAP5) and epithelial cell transforming 2 (ECT2) are three specific mitosis-associated genes. Among the DEGs identified in this study, CCNB1, CCNB2, CDKN2A, BUB1, BUB1B and TTK are also involved in the cell cycle. Deregulated expression of mitosis-related factors, which drive chromosomal segregation during cell division, is frequently observed in cancer. The results of high-throughput screening were confirmed by qRT-PCR and further validated in the TCGA datasets. The expression levels of NEK2, DLGAP5 and ECT2 were significantly higher in lung cancer patients than in normal subjects. In addition, we explored the diagnostic and prognostic value of the three genes in lung cancer. ROC analyses showed that NEK2, DLGAP5 and ECT2 levels could robustly distinguish lung cancer patients from normal subjects, with high AUC, specificity and sensitivity values. Elevated expression of each of NEK2, DLGAP5 and ECT2 was significantly associated with reduced survival and increased risk of recurrence. Taken together, our findings reveal that NEK2, DLGAP5 and ECT2 might be used as promising biomarkers for the early detection of lung cancer, as well as for predicting the prognosis of lung cancer patients.

Results
Identification of DEGs between tumor tissues and normal lung tissues. In our study, three expression profiles (GSE19188, GSE18842, GSE40791) were used to identify DEGs between tumors and normal lung tissues. Genes with corrected P-values <0.05 and absolute fold changes >4 were considered DEGs. The results showed that 131 genes were up-regulated in GSE19188, 316 genes were up-regulated in GSE18842, and 309 genes were up-regulated in GSE40791 (Figure S1A-C). We then performed an overlap analysis of the DEGs and found that a total of 100 genes were significantly up-regulated in all three lung cancer datasets (Figure S1D, Table S2). Increased expression of NEK2, DLGAP5 and ECT2 in lung cancer was identified in all three GEO datasets. An unpaired t-test was applied to compare the two groups (tumor vs normal), and P-values of less than 0.05 were considered statistically significant (Fig. 1A-C). Importantly, these three genes play an important role in mitosis. Thus, in this study, we focused on NEK2, DLGAP5 and ECT2, three critical mitotic genes.
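The selection rule described above (Benjamini-Hochberg corrected P < 0.05, fold change > 4, then overlap across datasets) can be sketched as follows. This is a minimal illustration and not the authors' LIMMA pipeline: the function names, the toy p-values and the log2 fold changes are all hypothetical, and the sketch assumes that an up-regulated fold change > 4 corresponds to a log2 fold change > 2.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def upregulated_genes(stats, p_cutoff=0.05, fc_cutoff=4.0):
    """stats maps gene -> (raw p-value, log2 fold change, tumor vs normal)."""
    genes = list(stats)
    adjusted = bh_adjust([stats[g][0] for g in genes])
    log2_cutoff = math.log2(fc_cutoff)
    return {
        g for g, q in zip(genes, adjusted)
        if q < p_cutoff and stats[g][1] > log2_cutoff
    }

# Hypothetical per-dataset statistics (illustrative values only).
dataset_a = {"NEK2": (1e-6, 2.8), "ECT2": (2e-5, 2.3),
             "GENE_X": (0.03, 2.5), "GENE_Y": (0.4, 0.2)}
dataset_b = {"NEK2": (1e-8, 3.1), "ECT2": (5e-4, 2.1),
             "GENE_X": (0.2, 1.0), "GENE_Y": (0.01, 2.6)}

# Overlap analysis: genes up-regulated in every dataset.
common_up = upregulated_genes(dataset_a) & upregulated_genes(dataset_b)
print(sorted(common_up))
```

Only genes that pass both thresholds in every dataset survive the set intersection, which mirrors how the 100 common up-regulated genes were obtained from the three GEO datasets.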

Independent validation. To confirm these results, we examined the selected DEGs in an independent set of 56 paired tumor and normal lung tissues. The clinical characteristics of this cohort are summarized in Table 1. NEK2, DLGAP5 and ECT2 expression levels were significantly elevated in tumor tissues compared with normal lung tissues (Fig. 2A-C). Because this cohort was limited to a small number of patients, we expanded the sample size for further validation by using TCGA datasets. A total of 349 lung cancer and 58 normal tissue samples were selected. The expression patterns of NEK2, DLGAP5 and ECT2 were similar to those in our training cohort, with significant differences in expression between tumor and normal tissues (Fig. 3A,C,E), suggesting that the differential expression of these three genes is a common feature of lung cancer. Moreover, NEK2, DLGAP5 and ECT2 expression levels differed between TNM stages, with significantly higher levels in stage II-IV patients than in stage I patients (Fig. 3B,D,F).
Correlation between the three biomarkers and clinicopathologic variables. The associations between DEG expression and clinicopathological characteristics are presented in Table 2. The TCGA dataset was used for the correlation analyses. NEK2 expression was significantly associated with age (P = 0.027), gender (P < 0.001), clinical stage (P = 0.033), pathologic T stage (P < 0.001) and therapy outcome (P = 0.004). Elevated DLGAP5 expression was significantly correlated with all six clinicopathologic variables. No significant association was observed between ECT2 expression and patient age or clinical stage. High ECT2 expression in lung cancer was significantly associated with gender (P = 0.002), new tumor event (P = 0.026), pathologic T stage (P = 0.002) and therapy outcome (P = 0.012). These results suggest that expression changes in NEK2, DLGAP5 and ECT2 may play a vital role in lung cancer progression.
Diagnostic value of NEK2, DLGAP5 and ECT2 in lung cancer. Subsequently, ROC analysis was performed to assess the diagnostic value of NEK2, DLGAP5 and ECT2 as biomarkers for detecting lung cancer. For NEK2, the AUCs for distinguishing tumor from normal groups were significant in all four lung cancer datasets, with the following values: AUC GSE19188 = 0.927 (sensitivity: 0.923, specificity: 0.890), AUC GSE18842 = 1 (sensitivity: 1, specificity: 1), AUC GSE40791 = 0.967 (sensitivity: 0.910, specificity: 0.926) and AUC TCGA = 0.977 (sensitivity: 0.983, specificity: 0.873) (Fig. 4A, Table 3). Similarly, ROC analyses showed that DLGAP5 and ECT2 levels could also robustly distinguish lung cancer patients from normal subjects, with high AUC, specificity and sensitivity values (Fig. 4B-C, Table 3). Furthermore, to exclude the influence of primary clinical factors (age, gender, clinical stage, smoking history) on target gene performance, we constructed prediction models including (Model 1) or excluding (Model 2) the target gene. Model 1 includes the clinical factors and the target gene. Model 2 includes only the clinical factors. The comparisons between these models are shown in Table S3 and Fig. 4D-F. Model 2 performed worse than Model 1, which suggests that the target genes are important for maintaining model performance. Collectively, our results suggest that NEK2, DLGAP5 and ECT2 could be suitable biomarkers for lung cancer diagnosis.
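For readers unfamiliar with how an AUC, sensitivity and specificity are derived from expression values, the following sketch shows one way to compute them. It is illustrative only: the expression values are invented, the function names are hypothetical, and the cut-off is chosen by maximizing Youden's J, which is an assumption because the source does not state how its cut-offs were selected.

```python
def roc_auc(tumor, normal):
    """AUC as the probability that a random tumor value exceeds a random
    normal value, with ties counted as one half."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))

def best_cutoff(tumor, normal):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.
    Samples with expression >= cutoff are called tumor."""
    best = None
    for cutoff in sorted(set(tumor) | set(normal)):
        sensitivity = sum(t >= cutoff for t in tumor) / len(tumor)
        specificity = sum(n < cutoff for n in normal) / len(normal)
        j = sensitivity + specificity - 1
        # Strict comparison keeps the lowest cutoff among ties.
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1:]

# Hypothetical expression values (illustrative only).
tumor_expr = [5.1, 6.3, 7.0, 4.2, 6.8]
normal_expr = [3.0, 3.9, 4.5, 2.8, 3.3]

auc = roc_auc(tumor_expr, normal_expr)
cutoff, sens, spec = best_cutoff(tumor_expr, normal_expr)
print(f"AUC={auc:.2f} cutoff={cutoff} sensitivity={sens} specificity={spec}")
```

An AUC of 1, as reported for GSE18842, corresponds to the case where every tumor value exceeds every normal value.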
Prognostic value of NEK2, DLGAP5 and ECT2 in lung cancer. To assess the prognostic value of NEK2, DLGAP5 and ECT2 as biomarkers for lung cancer, we investigated the association between the expression level of each target and survival using Kaplan-Meier analysis, with the log-rank test applied to 349 lung cancer patients. The Cox proportional hazards regression model was also used to evaluate the predictive value of NEK2, DLGAP5 and ECT2 mRNA levels in lung cancer patients. Two types of survival outcomes were considered. Overall survival (OS) was defined as the time between the date of surgery and the date of death or last follow-up, and relapse-free survival (RFS) was defined as the period from surgery to recurrence or last follow-up.
In this study, the TCGA dataset was used for prognostic analyses. Expression levels were divided into two categories at the median: levels above the median were classified as high expression, and levels below the median as low expression. Overall, patients with low NEK2 levels had significantly longer OS (P = 0.009; Fig. 5A) and RFS (P = 0.006; Fig. 5B) than those with high NEK2 levels. The median OS was 72.5 months in the NEK2 low-expression group and 39 months in the NEK2 high-expression group. The median RFS was 73.9 months in the NEK2 low-expression group and 25.7 months in the NEK2 high-expression group. Similarly, DLGAP5 expression was significantly related to OS (P = 0.001; Fig. 5C) and RFS (P = 0.003; Fig. 5D) of lung cancer patients. The median OS in the low and high DLGAP5 expression groups was 59.7 months and 35.8 months, respectively. The median RFS in the low and high DLGAP5 expression groups was 68.2 months and 25.7 months, respectively. These figures indicate that higher DLGAP5 expression correlated with a worse prognosis and earlier recurrence. Elevated expression of ECT2 was also significantly associated with reduced survival (P = 0.007; Fig. 5E) and increased risk of recurrence (P = 0.005; Fig. 5F). The median OS in the low and high ECT2 expression groups was 59.7 months and 41.2 months, respectively. The median RFS in the low and high ECT2 expression groups was 68.2 months and 25.7 months, respectively. Taken together, high expression of each of these three genes was significantly associated with reduced survival and increased risk of recurrence. Univariate and multivariate analyses were carried out to evaluate the target genes and other factors using a Cox proportional hazards regression model. The results showed that the expression of each target gene was significantly correlated with the prognosis of lung cancer patients (Table 4).
(C) Identification of mRNA expression of ECT2 in three datasets, respectively. ***P < 0.001; **P < 0.01; *P < 0.05.
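The median split, Kaplan-Meier median and log-rank comparison described above can be illustrated with a small self-contained sketch. The data are invented, the function names are hypothetical, and samples exactly at the median are assigned to the low group, which is an assumption since the source only defines values above and below the median. This is not the SPSS/Prism workflow used in the study.

```python
import math
from fractions import Fraction

def median_split(expression):
    """Label each sample 'high' if above the cohort median, else 'low'."""
    ordered = sorted(expression)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return ["high" if x > median else "low" for x in expression]

def km_median(times, events):
    """Kaplan-Meier median: first event time at which S(t) <= 0.5,
    or None if the curve never drops that far."""
    survival = Fraction(1)  # exact arithmetic avoids rounding at 0.5
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= Fraction(at_risk - deaths, at_risk)
        if survival <= Fraction(1, 2):
            return t
    return None

def logrank_p(times, events, groups):
    """Two-group log-rank test; p-value from chi-square with 1 df."""
    first = sorted(set(groups))[0]
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == first)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == first)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi2, 1 df

# Hypothetical cohort: expression, follow-up in months, event flag (1 = death).
expression = [1.2, 3.4, 2.2, 5.1, 0.9, 4.8, 2.9, 3.9]
months = [60, 20, 72, 15, 80, 30, 55, 25]
died = [1, 1, 0, 1, 0, 1, 1, 1]

groups = median_split(expression)
low = [i for i, g in enumerate(groups) if g == "low"]
high = [i for i, g in enumerate(groups) if g == "high"]
median_low = km_median([months[i] for i in low], [died[i] for i in low])
median_high = km_median([months[i] for i in high], [died[i] for i in high])
p_value = logrank_p(months, died, groups)
print(median_low, median_high, p_value)
```

A Cox proportional hazards fit, as used for Table 4, requires iterative likelihood maximization and is best left to an established statistics package rather than a sketch like this.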

Discussion
Lung cancer remains the most common cause of cancer-related death worldwide 1 . The high mortality among patients with lung cancer is mainly due to the absence of an effective screening strategy to identify lung cancer at an early stage 10 . Current screening strategies for lung cancer include conventional radiography, sputum cytology, and more recently, low-dose computed tomography (LDCT). LDCT screening can significantly improve early diagnosis and reduce lung cancer mortality. However, the false-positive rate of LDCT screening is high, which can lead to harm from unnecessary workups of benign nodules 11,12 . For many decades, cytotoxic chemotherapy was the most effective treatment to improve overall survival and quality of life in these patients, despite its many drawbacks 13 . At the same time, researchers made substantial efforts towards the development of molecular targeted agents 14 . Systematic clinical studies and basic research on lung cancer have improved survival; however, the long-term outcomes of lung cancer patients remain poor. Thus, it is necessary to identify new biomarkers to improve the diagnosis and prognosis of lung cancer.
NEK2 is a serine/threonine kinase that is involved in the regulation of centrosome duplication and spindle assembly during mitosis 15,16 . Dysregulation of these processes causes chromosome instability (CIN) and aneuploidy, which are hallmark changes in many tumors 17,18 . NEK2 exists in three alternative splice isoforms: NEK2A, NEK2B and NEK2C 19 . NEK2 overexpression has been observed in several human cancers. Increased expression of NEK2 has been reported to be involved in tumor progression and is associated with poor prognosis in pancreatic ductal adenocarcinoma 20 , prostate cancer 21 and colon cancer 22 . However, the association between the expression level of NEK2 and the early diagnosis of lung cancer remains to be rigorously and systematically evaluated. ECT2 is a BRCT-containing protein whose function has been best studied in cytokinesis. He et al. 23 showed that ECT2 localizes to chromatin and DNA damage foci-like structures, and that it facilitates PIKK-mediated phosphorylation of p53 on Ser15, the execution of apoptosis, and the activation of S and G2/M checkpoints. Luo et al. 24 showed that elevated expression of ECT2 predicts an unfavorable prognosis in patients with colorectal cancer. Another potential predictor of lung cancer diagnosis and prognosis is DLGAP5. DLGAP5 is a mitotic spindle protein that promotes the formation of tubulin polymers, resulting in tubulin sheets around the ends of the microtubules 25 . DLGAP5 contains a guanylate-kinase-associated protein (GKAP) domain that is conserved among various species. This domain is also found in many eukaryotic signaling proteins, suggesting that DLGAP5 may have important biological functions as a signaling molecule 26 . DLGAP5 is involved in cancer formation and progression, suggesting that the gene and its product may be potential therapeutic targets 27 . NEK2, DLGAP5 and ECT2 are mitosis-associated genes that play an important role in tumorigenesis.
These genes have previously been reported to be involved in lung cancer development. Clustering of a genome-scale co-expression network revealed lung adenocarcinoma modules, a few of which contain genes such as DLGAP5 and BIRC5 that play a crucial role in cell cycle progression 28 . Das et al. 29 uncovered a novel role for Nek2 in promoting tumorigenesis by regulating an axis of metastasis and cell survival. Ect2 regulates rRNA synthesis through a PKCι-Ect2-Rac1-NPM signaling axis that is required for lung tumorigenesis 30 . It is of great clinical significance to explore the value of these three genes for early diagnosis and prognosis. Several previous studies have examined the association between gene overexpression and poor prognosis in lung cancer. Zhong et al. 31 [...] survival time compared to those with low expression for all stages. Landi et al. 32 showed that mitotic genes (NEK2 and TTK) known to be involved in cancer development are induced by smoking and affect survival. Schneider et al. 33 found that the expression of the mitosis-associated genes AURKA, DLGAP5, TPX2, KIF11 and CKAP5 is associated with the prognosis of NSCLC patients. ECT2 overexpression may be a useful index for the application of adjuvant therapy to lung cancer patients who are likely to have a poor clinical outcome 34,35 . However, some genes identified as having prognostic implications in one cohort may be difficult to verify in other cohorts. High reliability and reproducibility of microarray technology in identifying target genes are therefore essential for its application in discovering clinical biomarkers. Microarray technology has substantially enhanced the search for biomarkers for cancer diagnosis and prognosis.
Table 3. ROC curve analyses using NEK2/DLGAP5/ECT2 for distinguishing patients with lung cancer from normal control subjects.
In this study, we identified and validated the expression of NEK2, DLGAP5 and ECT2 in multiple lung cancer datasets, and the results showed that the expression levels of these three genes were significantly higher in lung cancer patients than in normal subjects. Importantly, the expression levels of the three candidate genes were significantly associated with clinicopathologic variables. Furthermore, we revealed the diagnostic and prognostic value of the candidate genes. These cancer biomarkers could be used for early detection, disease monitoring and risk assessment. However, this study has some limitations. We examined the expression of the target genes only in tissue samples. Because the ultimate goal of a biomarker is specific, early and non-invasive diagnosis and post-therapy monitoring of cancer, body fluids (plasma, urine and sputum) are considered a more appropriate biological material. In the future, we will also measure the expression of these biomarkers in body fluid samples.
Taken together, these findings indicate that NEK2, DLGAP5 and ECT2 overexpression might be used as promising biomarkers for the diagnosis and prognosis of lung cancer. These genes may also serve as potential therapeutic targets in lung cancer. More work is needed to elucidate the function of these three candidate genes and their roles in tumorigenesis.

Materials and Methods
Patients and tissue samples. Fifty-six patients from Xiangya Hospital (Changsha, China) were included in this study. All patients provided written informed consent. Experiments and procedures were performed in accordance with the Helsinki Declaration of 1975 and were approved by the Ethics Committee of Xiangya School of Medicine, Central South University. Tumor and matched distant (>5 cm) normal lung tissue samples were collected from NSCLC patients who underwent resection for primary lung cancer. All fresh tissues were frozen in liquid nitrogen immediately after resection and stored at −80 °C. Their basic clinical characteristics are summarized in Table 1.
Lung cancer gene expression datasets. Three lung cancer datasets (GSE19188, GSE18842, GSE40791) generated on the Affymetrix platform, together with the corresponding clinical information of the lung cancer patients, were retrieved from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). GSE19188 includes 91 tumors and 65 adjacent normal lung tissues, GSE18842 includes 46 tumors and 45 controls, and GSE40791 contains 94 tumors and 100 non-tumor tissues.
Validation datasets were acquired from the Cancer Genome Atlas (TCGA) data portal (http://tcga-data.nci.nih.gov). This dataset contains 349 adenocarcinomas and 58 non-tumor tissues with both mRNA expression data and clinical information available for performing the receiver operating characteristic (ROC) analysis, survival analysis and correlation analysis. The aim of this study was to identify promising biomarkers for the early detection of lung cancer and to evaluate the prognosis of lung cancer patients. The latest version of the TCGA LUAD dataset includes 571 samples (513 tumors and 58 normal tissues). Two recurrent tumor samples, 28 samples lacking OS data, 133 samples lacking RFS data and 1 sample lacking clinical stage data were removed, leaving 349 adenocarcinoma samples (primary tumor) and 58 non-tumor samples. Detailed clinical information on the patients used in this study is shown in Table 2.
Table 4. Univariate and multivariate Cox regression analyses for overall survival and recurrence-free survival.
mRNA expression profiling using microarrays. Raw [...]
that, the Linear Models for Microarray Data (LIMMA) package in R was used to calculate the probability of probes being differentially expressed between cases and controls 37 . P value correction was performed using the Benjamini-Hochberg (BH) FDR from the package in R. Corrected P-values <0.05 and absolute fold changes >4 were used to identify significant DEGs. All data analyses were performed using R (http://www.r-project.org/, version 2.15.0) and Bioconductor 38 . Visualization of the DEGs, including the heat map, volcano plot and Venn diagram, was achieved using the gplots, lattice and venn diagram packages in R, respectively.

Quantitative reverse transcription-polymerase chain reaction (qRT-PCR). [...] The threshold cycle value (Ct) of each product was determined and normalized against that of the internal control GAPDH. The differences in mRNA expression levels were compared by t-test using SPSS 18.0 (SPSS Inc, Chicago, Illinois, USA). P-values of less than 0.05 were considered statistically significant.
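As an illustration of normalizing a target gene's Ct against the GAPDH internal control, the sketch below computes a ΔCt per sample and then a relative fold change. The 2^(-ΔΔCt) step is the commonly used comparative Ct method and is an assumption here, since the source only states that Ct values were normalized against GAPDH. The function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_gapdh):
    """Ct of the target gene normalized against the GAPDH internal control."""
    return ct_target - ct_gapdh

def relative_expression(tumor_target, tumor_gapdh, normal_target, normal_gapdh):
    """Tumor-vs-normal fold change by the comparative Ct method (assumed)."""
    ddct = (delta_ct(tumor_target, tumor_gapdh)
            - delta_ct(normal_target, normal_gapdh))
    return 2 ** (-ddct)

# Hypothetical Ct values for one paired tumor/normal sample.
fold = relative_expression(tumor_target=24.0, tumor_gapdh=18.0,
                           normal_target=27.0, normal_gapdh=18.0)
print(fold)
```

A lower ΔCt in the tumor sample than in its matched normal sample yields a fold change above 1, that is, higher relative expression in the tumor.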
Statistical analysis. SPSS version 18.0 (SPSS Inc, Chicago, IL) and GraphPad Prism 5.0 (San Diego, CA) were used for statistical analysis. Student's t-test was applied for comparisons between two groups. ROC curves were used to assess the diagnostic value of each marker 39 . The area under the curve (AUC) was computed for each ROC curve, and 95% confidence intervals (CI) were estimated by bootstrapping with 1,000 iterations. Survival analysis was carried out using the Kaplan-Meier method and the log-rank test. The Cox proportional hazards regression model was applied to perform univariate and multivariate analyses. P-values of less than 0.05 were considered statistically significant.
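The bootstrap confidence interval for the AUC mentioned above can be sketched as a percentile bootstrap with 1,000 resamples. This is a minimal illustration under stated assumptions: tumor and normal samples are resampled separately (stratified), the interval is the simple percentile interval, and the data, seed and function names are hypothetical. The source does not specify which bootstrap variant was used.

```python
import random

def roc_auc(tumor, normal):
    """AUC as the fraction of tumor/normal pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))

def bootstrap_auc_ci(tumor, normal, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each group with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    aucs = []
    for _ in range(n_boot):
        tumor_sample = rng.choices(tumor, k=len(tumor))
        normal_sample = rng.choices(normal, k=len(normal))
        aucs.append(roc_auc(tumor_sample, normal_sample))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical expression values (illustrative only).
tumor_expr = [5.1, 6.3, 7.0, 4.2, 6.8]
normal_expr = [3.0, 3.9, 4.5, 2.8, 3.3]

ci_low, ci_high = bootstrap_auc_ci(tumor_expr, normal_expr)
print(roc_auc(tumor_expr, normal_expr), ci_low, ci_high)
```

With cohorts as small as this toy example the interval is wide; with several hundred samples per dataset, as in the study, the resampled AUCs concentrate much more tightly around the point estimate.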