Differential mRNA expression profiling of oral squamous cell carcinoma by high-throughput RNA sequencing

Abstract Differentially expressed genes are thought to regulate the development and progression of oral squamous cell carcinomas (OSCC). The purpose of this study was to screen differentially expressed mRNAs in OSCC and matched paraneoplastic normal tissues, and to explore the intrinsic mechanism of OSCC development and progression. We obtained the differentially expressed mRNA expression profiles in 10 pairs of fresh-frozen OSCC tissue specimens and matched paraneoplastic normal tissue specimens by high-throughput RNA sequencing. By using Gene Ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, the functional significance of the differentially expressed genes were analyzed. We identified 1,120 significantly up-regulated mRNAs and 178 significantly down-regulated mRNAs in OSCC, compared to normal tissue. The differentially expressed mRNAs were involved in 20 biological processes and 68 signal pathways. Compared to adjacent normal tissue, the expression of MAGEA11 was up-regulated; TCHH was down-regulated. These findings were verified by real-time PCR. These differentially expressed mRNAs may function as oncogenes or tumor suppressors in the development and progression of OSCC. This study provides novel insights into OSCC. However, further work is needed to determine if these differentially expressed mRNAs have potential roles as diagnostic biomarkers and candidate therapeutic targets for OSCC.


Introduction
Oral squamous cell carcinoma (OSCC) is the sixth most common cancer worldwide. It is the cause of over 400,000 cancer related deaths each year, with 80% of deaths occurring in developing countries [1] . The inci-OSCC has increased by approximately 21% [3] . OSCC has a very poor prognosis due to its invasive nature, and despite advances in treatment and diagnostics, the five-year survival rate is below 50% and has not improved in the last three decades [2] . In common with other cancers, the development and progression of OSCC is a multistep process with the accumulation of genetic and epigenetic changes [4] . It is generally accepted that many of cancer-associated gene changes originate with the gain, loss, or mutation of genetic information [5] . Therefore, studying these genetic changes improves understanding of the molecular basis of cancer in order to deliver potential diagnostic or outcome markers, and therapeutic targets for cancer patients.
Recent studies have indicated that changes of mRNA expression levels in OSCC is associated with tumor development, maintenance, and progression [6][7] . As a consequence, inducing or silencing mRNA expression of these oncogenes or anti-oncogenes has been used as a method to control and treat neoplasms [8][9] . In addition, mRNAs could potentially be used as biomarkers of OSCC or other oral cancers [10][11] .
In large-scale clinical studies, genome-wide gene expression profiling can advance the discovery of cancer biomarkers and therapeutic targets. The human transcriptome is highly complex; the human genome encodes 20,000-25,000 genes. In contrast, because each gene can produce multiple transcripts and, thereby, multiple proteins, the proteome is many times larger and diverse in its functions [12] . Therefore, efficient sequencing technologies are urgently required.
High-throughput RNA sequencing (RNA-Seq) and the rapid development of new technologies is providing unprecedented opportunities for interrogating the human transcriptome [13] . This has led to next generation sequencing (NGS) applications, such as Illumina/Solexa, Applied Biosystems/SOLiD, and Roche/454 FLX [14] . Compared to traditional techniques, NGS offers greatly improved dynamic ranges and specificity for transcriptome analyses, while sample throughput is continuously increasing and costs are being reduced [15] . Although there have been published reports on the successful construction of genome-wide mRNA expression profiles in other types of cancer, highlighting the power and capabilities of high-throughput sequencing techniques, there have been relatively few relating to OSCC [16][17] . In this study, we exploited high-throughput RNA-Seq technology to construct comprehensive mRNA expression profiles of OSCC and matched paraneoplastic normal tissues. We used the information to explore and predict the molecular functions and biological pathways of the differentially expressed mRNAs. Our results have provided a basis for identifying further molecular markers for the diagnosis and treatment of OSCC.

Tissue samples
All the samples used in this study were selected from OSCC patients treated at Affiliated Stomatological Hospital of Nanjing Medical University between November 2011 and February 2012. Written informed consent was obtained from all patients and the study protocol was approved by the Institutional Review Board of Affiliated Stomatological Hospital of Nanjing Medical University. Ten pairs of OSCC tissue and matched paraneoplastic normal tissue specimens, which had been identified by pathologists, were collected and freshfrozen immediately after surgery, and stored at -80uC. The proportion of cancer cells in a tissue section was greater than 80%. None of the patients had received preoperative chemotherapy, radiotherapy, or any other anticancer treatment prior to surgery. Clinical characteristics of the patients are given in Table 1.

Illumina Solexa sequencing
Total RNA was isolated from the 10 paired freshfrozen OSCC tissue and matched paraneoplastic normal tissue specimens using TRIzolH (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. The integrity of the RNA was evaluated using the Agilent 2100 BioAnalyzer (Agilent Technologies, Palo Alto, CA, US). The mRNA was purified from extracted total RNA using Oligo (dT) magnetic beads adsorption, and converted into double-stranded cDNA by reverse transcription. NlaIII was used to digest the cDNA and the Illumina adaptor 1 was ligated. MmeI was used to digest the CATG site and the Illumina adaptor 2 was ligated to the 39 end. Primer GX1 and GX2 were used for PCR. The DNA was purified, and sequencing was performed by the Illumina Cluster Station and Genome Analyzer (Illumina Inc., CA, USA) at Beijing Genomics Institute, Shenzhen, according to the manufacturer's protocol.

Data processing
Sequencing-received raw image data were transformed by base calling into sequence data (termed raw data or raw reads). Dirty tags were filtered through data processing steps. Clean tags were obtained by sequence quality assessment; saturation analysis of sequencing, and experimental repeatability analysis. Subsequently, all clean tags were mapped to the reference sequences and only one-bp mismatch was accepted. Clean tags mapped to reference sequences from multiple genes were filtered and the remaining clean tags were designated as unambiguous clean tags. The number of unambiguous clean tags for each gene was calculated and normalized to the number of transcripts per million (TPM) of clean tags [18][19] .

Screening of differentially expressed genes
Differentially expressed genes (DEGs) between OSCC samples and their matched paraneoplastic normal samples were identified by a rigorous algorithm [20] . False discovery rate (FDR) was used as a statistical method to determine the threshold P-value by multiple testing and analyzed by manipulating the FDR value [21] . The levels of mRNA expressions in the paired samples were calculated by log 2 Ratio. For this study, we applied an FDR <0.001 and the absolute value of log 2 Ratio > 1 as the threshold to judge the significance of gene expression difference.

Gene Ontology functional enrichment analysis for DEGs
For gene expression profiling analysis, Gene Ontology (GO) enrichment analyses of functional significance were performed by hypergeometric testing to map all differentially expressed genes to GO terms (each GO term belongs to a type of ontology in the GO database; http://www.geneontology.org). Searches were carried out for significantly enriched GO terms in DEGs, compared to the genome background. The P-value was corrected by Bonferroni and a corrected P-value of 0.05 was used as the threshold to judge the differential gene expression of significantly enriched GO terms.

Pathway enrichment analysis of DEGs
The Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg) [22] was used for pathway enrichment analysis of significance by using the hypergeometric test to map all DEGs and search for pathways significantly enriched in DEGs, compared to the genome background. We defined a Q-value of 0.05 as the threshold to judge the differential gene expression of significantly enriched pathways.

Real-time PCR
We selected one of the most up-regulated mRNAs (MAGEA11) and one of the most down-regulated mRNAs (TCHH) for confirmation against 15 other paired OSCC tissue and adjacent normal tissue samples. SnRNA U6 was used as the endogenous control. Specific primer sets for MAGEA11, TCHH and U6 were obtained from Genecopoeia, Guangzhou, China.
For each mRNA amplification, a total reaction volume of 20 mL containing 26SYBR Green Real-time PCR master mix, mRNA primers, the cDNA template, and diethyl pyrocarbonate (DEPC)-treated water was used. The PCR procedure was as follows: 95 uC for 5 minutes; 40615 seconds cycles at 95 uC; 60 uC for 30 seconds; and 72 uC for 30 seconds. To quantify changes in gene expression, relative mRNA expressions were calculated according to the formula; 2 -DDCt [19] .

Statistical analysis
Data was expressed as mean ¡ SEM. Statistical analysis was performed using SPSS v17.0 software and Student's t-test was used to analyze the OSCC tissue and adjacent normal tissue samples. P,0.05 was considered statistically significant.

Differentially expressed mRNAs
By constructing the mRNA expression profile of OSCC and analyzing the sequence data, we identified 1,120 significantly up-regulated mRNAs and 178 significantly down-regulated mRNAs. These included MAGEA11 (log 2 Ratio510.252665), which was upregulated .1,000-fold, and TCHH (log 2 Ratio5 -10.759888), which was down-regulated .1,700-fold. The 10 most up-regulated and down-regulated mRNAs are shown in Table 2.
GO functional enrichment analysis: cellular components Through GO enrichment analysis, we determined that the differentially expressed mRNAs were significantly enriched in the following cellular components: extracellular region; extracellular matrix; proteinaceous extracellular matrix; collagen; cytoplasm; endoplasmic reticulum; macromolecular compounds; fiber collagen ( Table 3).

GO functional enrichment analysis: molecular functions
The results of GO molecular function analysis showed that the differentially expressed mRNAs were significantly enriched in the following molecular functions: endopeptidase activity; growth factor binding; peptidase activity; protein binding, peptidase activity, acting on Lamino acid peptides; oxidoreductase activity; and cytokine activity ( Table 4). Protein binding was found to be the function with the highest level of enrichment with 353 differentially expressed mRNAs.
To validate the results of deep sequencing, the differential expression of MAGEA11 and TCHH were verified by real-time PCR against 15 paired OSCC and adjacent normal tissues. The results showed that MAGEA11 expression was up-regulated in the tumor samples. The mean expression of MAGEA11 in OSCC tissues (3.0518 ¡ 1.5700) was significantly higher than in normal tissues (P,0.05; Fig. 1A); TCHH expression was down-regulated in the tumor samples. The mean expression of TCHH in OSCC tissues (0.5388 ¡ 0.1700) was significantly lower than normal tissues (P,0.05; Fig. 1B).
GO functional enrichment analysis: biological processes GO biological process analysis showed that the differentially expressed mRNAs were significantly  enriched in a total of 20 biological processes, including: extracellular structure organization, immune system process, and response to wounding ( Table 5).
The processes found to be most involved were the immune response, anatomical structure development, response to organic substance, developmental process, and response to stimulus.

Pathway enrichment analysis
The differentially expressed genes in the OSCC samples and matched paraneoplastic normal samples were involved in a total of 215 signaling pathways. Of these, 68 signaling pathways were significantly enriched (Q-value < 0.05), e.g., extracellular matrix (ECM)-receptor interaction, rheumatoid arthritis, and focal adhesion. Table 6 shows the 10 most significantly enriched pathways.

Discussion
By utilizing high-throughput RNA-Seq technology, we have constructed genome-wide mRNA expression profiles in OSCC and detected 1,120 up-regulated mRNAs and 178 down-regulated mRNAs in OSCC tissue, compared with matched paraneoplastic normal tissue. These findings suggest that these aberrantly expressed mRNAs may play roles in OSCC development and progression. The most significantly up-regulated mRNAs was MAGEA11, which has been reported to be oncogenic. Bai et al. showed that high expression of MAGEA11 increased the androgen receptor (AR)dependent cell proliferation and promoted the development of prostate cancer [23] . Lian et al. reported that MAGEA11 expression was more frequent in breast cancer tissue, compared to tumor-free breast tissue, suggesting that MAGEA11 may be involved in estrogen receptor (ER)-dependent cell proliferation and could act as a potential prognostic factor of poor outcome in breast cancer patients [24] . Tsaia et al. reported differential expression of MAGEA11 in non-small cell lung cancer (NSCLC) [25] . In contrast, the role of THCC, which was the most significantly down-regulated mRNA detected in our study, is unclear. In common with previous reports on mRNAs in OSCC, similar results were obtained in our study: MMP1 (log 2 Ratio51.038711) and MMP3 (log 2 Ratio53.488456) have been reported as highly expressed in OSCC, and put forward as robust diagnostic biomarkers [26] . PLAU (log 2 Ratio51.925736) has been implicated in enhanced cell proliferation and migration and, has been proposed as a prognostic marker of relapse-free survival of OSCC, together with its receptor uPAR [27] . Sapkota et al. reported that S100A14 (log 2 Ratio5 -1.595579) was progressively down-regulated in vitro during the transition from normal to dysplastic stages of cancer in OSCC progression. In addition, an inverse correlation between mRNA expression levels of MMP1 and MMP9 with S100A14 was found in 19 cases of OSCC [28] . Other differentially expressed mRNAs identified in our study have been also been reported in different cancers, including AZGP1 (log 2 Ratio5 8.392317), COL1A1 (log 2 Ratio52.460066), SPARC (log 2 Ratio51.866150), COL4A1 (log 2 Ratio5 1.487532), ISG15 (log 2 Ratio51.434671), RUNX3   [29][30][31] . Gene Ontology (GO) is an international standardized gene functional classification system which offers a dynamic-updated vocabulary to comprehensively describe properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component, and biological process. Every GO term belongs to one of these ontologies [32] . Because genes often cooperate with each other to perform their biological functions, pathway analyses facilitate further understanding genes and their roles [33] . In this study, the results of GO functional enrichment analyses indicated that differentially expressed mRNAs in OSCC: were significantly enriched in cellular components, including extracellular region, extracellular matrix, macromolecular complex and cytoplasm; involved molecular functions that included endopeptidase activity, growth factor binding, peptidase activity; and may be involved in biological processes which are  closely related to the development and progression of cancer, including immune system process, cell migration, cell motility [34][35] . These results suggest that these differentially expressed mRNAs play an important role in the tumorigenesis and development of OSCC. The results have provided an experimental basis for further research of these differentially expressed mRNAs in OSCC.

ΔΔ ΔΔ
As shown in Table 6, the differentially expressed genes (DEGs) were enriched in 215 pathways, of which 68 were found to be significantly enriched by KEGG pathway enrichment analysis. These included ECM-receptor pathway interaction and focal adhesion pathways, which have both been previously reported in OSCC [29][30][31] . In addition, the results of our study support previous studies which reported that mTOR affects the progression of OSCC, suggesting that the mTOR signaling pathway may play a role in OSCC development [36][37] .
To our knowledge, our study presents the first genomewide profiling of mRNAs of OSCC by high-throughput RNA-Seq. We utilized this data to identify the mRNAs that were significantly up-regulated or down-regulated in OSCC tissue compared to matched paraneoplastic normal tissue. By exploring their GO ontologies, our findings suggest that these differentially expressed mRNAs may act as oncogenes or tumor suppressors in the development and progression of OSCC. Our data has provided novel insights into cancer biology. However, further studies are required to determine their potential roles as diagnostic biomarkers of candidate therapeutic targets for OSCC.