Diagnosis value of aberrantly expressed microRNA profiles in lung squamous cell carcinoma: a study based on the Cancer Genome Atlas

Background Lung cancer is considered as one of the most frequent and deadly cancers with high mortality all around the world. It is critical to find new biomarkers for early diagnosis of lung cancer, especially lung squamous cell carcinoma (LUSC). The Cancer Genome Atlas (TCGA) is a database which provides both cancer and clinical information. This study is a comprehensive analysis of a novel diagnostic biomarker for LUSC, based on TCGA. Methods and Results The present study investigated LUSC-specific key microRNAs (miRNAs) from large-scale samples in TCGA. According to exclusion criteria and inclusion criteria, the expression profiles of miRNAs with related clinical information of 332 LUSC patients were obtained. Most aberrantly expressed miRNAs were identified between tumor and normal samples. Forty-two LUSC-specific intersection miRNAs (fold change >2, p < 0.05) were obtained by an integrative computational method, among them six miRNAs were found to be aberrantly expressed concerning characteristics of patients (gender, lymphatic metastasis, patient outcome assessment) through Student t-test. Five miRNAs correlated with overall survival (log-rank p < 0.05) were obtained through the univariate Cox proportional hazards regression model and Mantel–Haenszel test. Then, five miRNAs were randomly selected to validate the expression in 47 LUSC patient tissues using quantitative real-time polymerase chain reaction. The results showed that the test findings were consistent with the TCGA findings. Also, the diagnostic value of the specific key miRNAs was determined by areas under receiver operating characteristic curves. Finally, 577 interaction mRNAs as the targets of 42 LUSC-specific intersection miRNAs were selected for further bioinformatics analysis. Conclusion This study indicates that this novel microRNA expression signature may be a useful biomarker of the diagnosis for LUSC patients, based on bioinformatics analysis.


TCGA dataset and sample information
Up to February 12, 2017, a total of 504 LUSC samples with related information were obtained from the TCGA database. According to the inclusion criteria: (1) histologic diagnosis was LUSC; (2) data of samples were available on expression of genes and characteristics of patients. For the exclusion criteria: (1) first histologic diagnosis was not LUAD; (2) patients suffered from other malignant neoplasms except LUSC; and (3) overall survival more than five years. At last, this study embraced 332 eligible patients.
Among these 332 LUSC, related information of LUSC tissue samples was from 293 patients and adjacent non-tumorous information were from 39 subjects. In addition, personal comprehensive sources including related clinical information and RNA expression data on LUSC patients were also obtained from the Data Coordinating Center. Among these 293 cases, 105 LUSC subjects had lymphatic metastases and 188 LUSC subjects had no lymphatic metastases. Moreover, according to the staging system of the Union for International Cancer Control (UICC), well or moderately differentiated LUSC (stage I-II) were 243 cases, and poorly differentiated LUSC (stage III-IV) were the remaining 50 cases. Because the data were provided by the TCGA, there was no need for approval of the Ethics Committee. This study fully met the guideline of the NIH TCGA human subject protection and data access policies.
In addition, 47 LUSC tissues samples were obtained from the Nanjing Chest Hospital of Southeast University. These samples including tumor tissues and adjacent non-tumor tissues were obtained immediately after surgical resection from patients who do not undergo preoperative radiotherapy or chemotherapy. These patients consented in advance and signed informed consent forms. Tissues were stored in RNA later (Ambion, Austin, TX, USA) at -80 C until further use. This study was approved by the ethics committee of Zhongda Hospital Southeast University.

Differential analysis of expressed miRNAs in LUSC samples
The LUSC RNA expression data (level 3) of patients were collected from the TCGA Data Portal. The TCGA database provided the normalized expression profile data of RNA sequencing including mRNAs and miRNAs by RNASeqV2 and Illumina HiSeq 2000 miRNA sequencing platforms (Illumina Inc., Hayward, CA, USA), respectively.
Then, abnormally expressed miRNAs were compared in level 3 (fold changes >2, p < 0.05), including LUSC tumor samples vs. adjacent non-tumorous lung samples, lymphatic metastases of LUSC samples vs. non-lymphatic metastases of LUSC samples, and I-II stage vs. III-IV stage, respectively. Figure 1 shows the flow chart for bioinformatics analysis.

Correlation analysis between LUSC-specific intersection miRNAs, characteristics of patients and overall survival
To explore the connection between LUSC-specific intersection miRNAs and clinical information, related miRNAs were found, using the comparative analysis of LUSC miRNA sequencing data in TCGA. The characteristics of patients was divided into five parts including gender, tumor grade, TNM stage, lymph node metastasis, and patient outcome assessment. Subsequently, Student t-test was used to evaluate the association between the LUSC-specific miRNAs and personal information using IBM SPSS Statistics Version 21.
Furthermore, to correlate specific intersection miRNAs with patients' prognosis characteristics, the univariate Cox proportional hazards regression model and Mantel-Haenszel test were used to explore the association between specific intersection miRNAs and LUSC patient survival. At the same time, the overall survival (OS) curves were evaluated. LUSC-specific intersection miRNAs associated with OS were defined two independent subclasses by using hazard ratios (HRs), whose cutoff was that a positive signature showed HR < 1 and hazardous signature had HR > 1.

Total RNA extraction and qRT-PCR verification
Total RNA was extracted tissue samples using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the instructions. Concentration and integrity of all extracted RNA were evaluated to be satisfactory.
A two-step protocol of reverse transcription reactions: First, 1 mg of RNA samples were pre-denatured (5 min at 65 C and held at 4 C). Then the 9 ml mixture including 2 ml 5 Â RT buffer, 0.5 ml RT Enzyme Mix, 0.5 ml miRNA-specific stem-loop RT primers and 6 ml RNase-free water were added in the 1 mg pre-denatured RNA (37 C Â 15 min, 98 C Â 5 min and subsequently held at 4 C). Real-time PCR was then performed by Thunderbird SYBR qPCR Mix (QPS-201; TOYOBO, Osaka, Japan) according to the manufacturer's protocol. The PCR reaction components comprised 1 ml cDNA, 5 ml Thunderbird SYBR qPCR Mix, 0.3 ml PCR primers (RiboBio, Guangzhou, China) and 3.4 ml RNase-free water. Then a two-step protocol [95 C Â 1 min; 40 cycles of (95 C Â 15 s, 60 C Â 30 s, 72 C Â 30 s)] was undertaken in on StepOnePlus PCR System (Applied Biosystems, Waltham, MA, USA). Results were normalized to the expression of U6. The results were expressed as mean ± SD. Each sample expression was calculated by the 2 -ÁÁCt method (ÁCt = Ct miRNAs -Ct U6 and ÁÁCt = ÁCt tumor tissues -ÁCt adjacent non-tumor tissues ). Paired t-test was applied to for comparison between tumor tissues and the adjacent non-tumor lung tissues. In all cases, differences with p < 0.05 were considered to be statistically significant.

ROC curve analysis of specific key miRNAs
According to the outcome of 332 patients, receiver operating characteristic (ROC) curve was also applied to evaluate the diagnosis value of the specific key miRNAs of LUSC.

The prediction of microRNA target genes
It is known that mRNAs are regulated by corresponding miRNAs. So, to find the differential display of targeted mRNAs, samples were quartered similarly as abnormally expressed miRNAs, whose flow chart of bioinformatics analysis is presented in Fig. 1. In addition, interaction mRNA of LUSC-specific intersection miRNAs was predicted by using the Targetscan and the miRbase (Hsu et al., 2014). Last, according to the results of previous two steps, the mRNAs of the microRNA target genes were obtained.

Differential expression of key mRNA function
To understand the potential biological processes and pathways of aberrant expression of intersection mRNAs, bioinformatics resources from database for annotation, visualization, and integrated discovery (DAVID) were used. We were only interested in the significant level (the enrichment score >2 and p < 0.05) of gene ontology (GO) biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to analyze the potential role of these key mRNAs. Moreover, to further explore the relationship and function of intersection mRNAs, the PPI network was analyzed using the protein-protein interaction (PPI) network via STRING (http://string-db.org; Version 10.5).

Significantly specific miRNAs in LUSC
In this study, 332 LUSC patients were selected from TCGA database. Then the expression of 332 LUSC and adjacent normal tissue miRNAs were compared to identify significantly differential miRNAs through a certain criterion (fold change !2 or 0.5, and p < 0.05) (Romero-Cordoba et al., 2012;Xu et al., 2014;Zhu et al., 2015).
Then 97 LUSC-associated abnormally expressed miRNAs were identified between 293 LUSC tumor tissue samples and 39 adjacent non-tumor tissues. Further, these 97 miRNAs between tumor stage and lymphatic metastasis were analyzed. And aberrantly expressed miRNAs were selected from four groups: (1) I-II stage (non-lymphatic metastases) LUSC tissues and adjacent non-tumorous lung tissue; (2) III-IV stage (non-lymphatic metastases) LUSC tissues and adjacent non-tumorous lung tissue; (3) I-II stage (lymph node metastasis) LUSC tissues and adjacent non-tumorous lung tissues; and (4) III-IV stage (lymph node metastasis) LUSC tissues and adjacent non-tumorous lung tissues. At last, we found 70, 52, 76 and 84 aberrantly expressed miRNAs in these four groups respectively (fold change >2, p < 0.05). To further enhance data reliability for bioinformatics analysis, 42 aberrantly expressed miRNAs in the intersection of these four groups were selected. Among these 42 intersection miRNAs, 17 miRNAs were upregulated and 25 miRNAs were downregulated (Figs. 2 and 3; Table 1).

Association between LUSC-specific miRNAs and characteristics of patients
After analyzing the correlation between expression of the 42 LUSC-specific intersection miRNAs and their related characteristics of patients (race, gender, age, TNM stage, lymphatic metastases, and patient outcome status in the TCGA database), six specific miRNAs were significantly abnormal in clinical information (p < 0.05, Table 2).

Prediction of target genes
In 474 LUSC samples from TCGA database, 18,633 mRNAs were obtained from the TCGA database. According to the cutoffs that was fold change !2 or 0.5, and p < 0.05,  significantly aberrantly expressed mRNAs were identified between 293 LUSC tumor tissue samples and 39 adjacent non-tumor tissues. In total, 14,689 LUSC-associated abnormally expressed mRNAs were selected to be further compared between tumor stage and lymphatic metastasis. 3,681 aberrantly expressed mRNAs were selected from comparisons of I-II stage (non-lymph node metastasis) LUSC specimens and adjacent non-cancerous lung tissues, 3,848 from comparisons of I-II stage (lymphatic metastasis) with adjacent non-cancerous lung tissues, 3,275 from comparisons of III-IV stage Table 2 The correlations between LUSC-specific intersection miRNAs and characteristics of patients.
Based on the above steps, the interaction mRNAs that appeared simultaneously in previous results were selected. It is shown that 2,950 aberrantly expressed mRNAs in the first step contained all 577 interaction mRNAs in the second step, which meant that the 577 mRNAs were the target genes.

Function analysis
Furthermore, functions of LUSC-specific intersection miRNAs were predicted by interaction mRNAs with DAVID bioinformation.
Eighty-one pathways were indicated by KEGG pathway analysis, and 433 GO terms (p < 0.05 and enrichment >2) were identified by analyzing the enrichment of these genes. Moreover, KEGG pathway analysis showed the most significant networks were Staphylococcus aureus infection (path ID: 05150) and cell cycle (path ID: 04110) (Fig. 8). The result suggested that five of the top 20 pathways were tumorous pathways including cell cycle, cell adhesion molecules (CAMs), cytokine-cytokine receptor interaction, Rap1 signaling pathway, and pathways in cancer. There were other tumor-related pathways such as cAMP signaling pathway, metabolic pathways, phagosome, PI3K-Akt signaling pathway, miRNAs in cancer, small cell lung cancer, and hippo signaling pathway in the other significantly differentially altered pathways. The highest enriched GO terms were mitotic cell cycle (GO: 0000278) and GO 0007165 (GO: 0007268) (Fig. 8). The relationship and function of 577 mRNAs were revealed in the PPI network (Fig. 9).

DISCUSSION
Lung squamous cell carcinoma is one of the most common subtypes of lung cancer, remaining large cancer-related death in China (Peng et al., 2015). It is reported that lung cancer may be caused by many factors, including environment (smoky coal, radon) (Kim et al., 2014;Torres-Duran et al., 2015), life style (smoking, diets) (Gnagnarella et al., 2013;Yang, 2011), human papillomavirus (Hsu et al., 2013), chronic pulmonary infection (it can be caused by S. aureus) (Mulanovich, Mulanovich & Rolston, 2008;Parker & Prince, 2012), genetic factors (family history)  and so on, which may be the risk factors of LUSC. The development of individualized accurate diagnosis and therapy have been faced with huge challenges due to immense difference in multiple parameters such as molecular, pathology, surgery, and radiology among LUSC patients (Pavel & Vasile, 2016). Diagnosis, medical treatment, surgery and prognosis of LUSC have been drawn much attention and mushroom growth, but the five-year overall survival is still unsatisfactory (Liu, Chen & Yang, 2017). According to past decade reports, miRNAs can serve as potential non-invasive biomarkers for cancer, more effective and accurate (Lin et al., 2014;Su et al., 2012;Yoshino et al., 2013;Youssef et al., 2011). There is growing evidence that miRNAs play a key role in the development and progression of lung cancer, as crucial biomarkers for early diagnosis, pathological classification, clinical treatment and prediction of outcome for lung cancer (Melkamu et al., 2010;Tian et al., 2016).
In this study, aberrantly expressed miRNAs in LUSC were identified from the TCGA database. Based on miRNA sequencing profiles in TCGA, we explored the relationship between LUSC miRNAs and different characteristics of patients (gender, tumor grade, TNM staging system, lymph node metastasis, and patient outcome assessment). The association between LUSC miRNAs and overall survival was also analyzed by OS curves. Then the target genes of abnormally expressed miRNAs were selected by related analysis. At last, we further predicted gene function and biological pathway, analyzed the protein-protein interaction.
After abnormally expressed miRNAs from GCTA database compared in level 3, 42 aberrantly expressed miRNAs were obtained, including 17 upregulated and 25 downregulated miRNAs. Among them, some have been observed in LUSC. Tan et al. (2011) reported that downregulation of hsa-miR-486-5p could distinguish LUSC from other lung cancers. Jiang et al. (2013) also reported that expressions of miR-205-5p in LUSC were significantly higher than in lung adenocarcinoma samples, which could be beneficial to the precise diagnosis. These results showed that their findings were consistent with the findings of the present study. However, a few of these abnormally expressed miRNAs have not been verified. We wish that the best expectation is to establish 100% concordance between the conventional diagnosis and miRNA-based methods. Through exploring the associations between the 42 LUSC-specific key miRNAs and characteristics of patients (race, gender, age, TNM stage, lymphatic metastasis, and patient outcome), six miRNAs were found to be related to clinical information, including gender, lymphatic metastasis, and patient outcome assessment. Among them hsa-miR-511-5p has not been reported in any cancer. Zhang et al. (2017) found that downregulation of miR-30c-2-3p and miR-30a-3p could increase the risk of lung cancer using conditional logistic regression analysis, serving as non-invasive biomarkers for lung cancer diagnosis (Cazzoli et al., 2013;Jin et al., 2017). Wang et al. (2017) reported that miR-629-3p might serve as a new biomarker and potential therapeutic target for lung metastases of breast cancer, which was in accord with the association between miR-629-3p and lymphatic metastasis in the present study. There is no study on miR-130b-5p in LUSC but in breast cancer and ovarian cancer (Chang et al., 2015;Wang et al., 2014). Mitra et al. (2014) claimed that miR-130b-3p could be the biomarker in non-small cell lung cancer by the network consisting of miRNAs, transcription factors and predicted target genes. The results in the present study and related reports support the signature of six miRNAs in predicting LUSC. To explore the associations between 42 LUSC-specific key miRNAs and patients' survival were analyzed, five miRNAs were found to be related to LUSC overall survival. Some related reports has reported that miR-139-5p (Hiroki et al., 2010;Yonemori et al., 2016), miR-139-3p Yonemori et al., 2016), miR-326 (Wang et al., 2013), miR-486-5p (Madhavan et al., 2016) were related to overall survival in some cancers, but there is no study about miR-30d-3p. To validate the analyzed results from TCGA data, 5 of the 42 LUSC-specific key miRNAs (miR-205-5p, miR-30a-3p, miR-30a-5p, miR-30c-2-3p and miR-30d-5p) were randomly selected and measured using qRT-PCR. The results indicated that TCGA analysis and qRT-PCR results from 47 LUSC patients were in 100% agreement.
Then ROC was also used to determine the five specific key miRNAs as the diagnosis value of LUSC detection. In the future, above 11 miRNAs will be the priority in the lung cancer studies, using lung cancer tissues and blood samples to examine the relationship between lung cancer and the environment to identify risk factors.
In order to further understand the function of LUSC-specific intersection miRNAs, their target genes were obtained. We analyzed the enrichment and pathways of these target mRNAs by DAVID. After related analysis, 577 mRNAs were obtained as the target genes and bioinformatic analyzed, revealing that some tumor-related genes were significant in cancers and protein-protein interaction. Similarly, many genes of them have reported on LUSC Schlensog et al., 2016;Sun et al., 2017).
Huang, Yang & Cai (2015) also used the TCGA to find microRNA let-7a and miR-31 as novel candidate key roles in LUSC, however, the present study not only found key miRNAs by larger samples and more analytical methods, but also verified the reliability of TCGA findings by small sample. The greatest merit of the study is that it opens up a new method to predict and identify related miRNAs of LUSC. It is based on large-scale public data platform, reliable and representative. The findings of TCGA can play a guiding role for LUSC diagnosis and prognosis assessment. If the new biomarkers of LUSC is further confirmed, these miRNAs will be applied to the clinic for non-invasive diagnosis.
Nevertheless, a few limitations may exist in the present study. First, TCGA LUSC dataset had relatively high censored rate, having the influence on related analysis. Second, the present study only used small sample to verify. The diagnostic value of these miRNA biomarkers should be validated by a large number of proof tests or in the independent long-term cohort.

CONCLUSION
In conclusion, the present study identified LUSC-specific miRNAs as potential diagnosis biomarkers for LUSC patients. These LUSC-specific miRNAs can be further validated using independent large-sample-size cohorts, and future functional studies are necessary to explore the underlying mechanisms of these LUSC-specific miRNAs.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The present study was supported by the National Natural Science Foundation of China (81472939 and 81673132), and the Fundamental Research Funds for the central universities and innovative Research project for postgraduates in Colleges of Jiangsu province. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.