A primary tumor gene expression signature identifies a crucial role played by tumor stroma myofibroblasts in lymph node involvement in oral squamous cell carcinoma

Oral squamous cell carcinoma (OSCC) is the most common oral and pharyngeal cancer, and is responsible of approximately 3% of cancers in men and 2% in women in the Western World, with increasing incidence rates in developing countries. Early detection by screening is necessary to prevent fatal disease because early, curable lesions are rarely symptomatic. The overall 5-yr survival rate is approximately 50% when surgery, radiation, or both are employed as treatment options, but lymph node involvement greatly influences this estimate, by decreasing the survival rate by about 50%. Here, we aimed at finding genetic signatures associated with lymph node metastasis in OSCC patients. We addressed this issue by whole transcriptome analysis through microarray expression profiling of a set of OSSC specimens of patients without lymph node involvement (10 patients, mean age ± SD 61.2±13.8, male 7, female 3) and with lymph node involvement (11 patients, mean age ± SD 62.1±15.1, male 8, female 3). We evidenced a gene expression signature associated to muscle contraction-related genes in specimens obtained from OSCC patients with lymph node involvement. This gene signature suggests the presence of myofibroblasts in tumor stoma of patients with lymph node involvement and emphasizes the decisive role played by myofibroblasts probably through their secretome in determining OSCC invasiveness.


INTRODUCTION
Oral squamous cell carcinoma (OSCC) represents more than 95% of the carcinomas of the oral cavity and the most common form of head and neck cancer, corresponding to the sixth commonest cancer in males, the tenth commonest site of cancer in females, and a major health problem as well as a leading cause of death in developing countries [1]. In spite of the progress in diagnosis and treatment of other malignant tumors, OSCC www.impactjournals.com/oncotarget/ Oncotarget, 2017, Vol. 8, (No. 62), pp: 104913-104927 Research Paper www.impactjournals.com/oncotarget is characterized by dreadfully insufficient therapeutic approaches and meager rates of response to treatment that lead to limited 5-year survival rate not exceeding 55%, mainly caused by locally aggressive tumor phenotypes, and the survival rate diminishes by about 50% in the presence of lymph node involvement [2,3]. Tumor biological behavior or patients' outcome is not adequately predicted by clinic-pathological characteristics and staging systems taking into account anatomical parameters, for example the Tumor-Nodes-Metastasis (TNM) classification. A crucial role in the anatomical changes, molecular alterations and pathway derangements that support the deregulation of physiological processes causing onset and progression of malignant neoplasms is played by the tumor microenvironment [4][5][6][7]. In particular, cancer related death is caused principally by metastatic progression, and the interplay between neoplastic lesions and host tissue, in addition to the mutual influence among cancer cells and the niche created by the microenvironment encompassing the tumor-associated stroma, drives the progression of cancer from onset to metastatization [8]. The process of tumor progression is driven by chemo/cytokines and growth factors produced through autocrine or paracrine secretion by elements of the cancer tissue and tumor microenvironment exerting their effects on neighboring cells [9]. Importantly, in the framework of malignant neoplasms the same factor may act as a tumor suppressor if greatly expressed in tumor tissue, or as an oncogene if greatly expressed in the tumorassociated stroma [10]. In this way, the microenvironment enclosing and sustaining the tumor tissue can impact malignant behavior of a developing neoplastic lesion and lead to metastatic dissemination [11]. This interplay may impact significantly OSCC onset and progression, as well as patients' outcome and response to therapy, and there is urgent need for a better knowledge of pathophysiological mechanisms and molecular events involved in oral squamous cell carcinogenesis to advance prognostic stratification and clinical management of OSCC patients.
A number of candidate genes involved in apoptosis, cell cycle and cell proliferation have been found associated with OSCC progression [12][13][14][15]. Anyway, it is not possible to predict the biological behavior of OSCC on the basis of the differential expression of a single gene. Transcriptome-wide analysis performed by means of highthroughput gene expression profiling techniques permits the evaluation of expression levels of thousands of genes and allows to identify gene expression signatures to screen patient subsets and stratify prognostic classes [16,17]. Besides, better understanding of the processes involved in carcinogenesis and metastatization was afforded by gene expression profiling using microarray hybridization, which rather than pinpointing the expression of a small number of genes makes available genomic-scale expression outlines, consenting the investigation of genetic expression variability in the background of broader genetic contexts and identification of specific biochemical pathways that might be targeted by new therapeutic agents. A number of studies have addressed this issue in OSCC patients and identified gene signatures comprising functional gene groups correlated with aggressiveness and invasiveness. The data allow patient subsets clustering and corroborate the evidence that primary OSCC with and without metastatic lymph node dissemination are hallmarked by measurable differences in gene expression, able to predict the node status and overall prognosis of an independent set of patients based on the gene expression of the primary tumor [18][19][20][21][22][23]. Anyway, none of these results highlighted the role of tumor microenvironment, implying the need for further studies pinpointing processes in both the tumor and nontumor cells.
We planned to evaluate a series of OSCC specimens obtained from patients with and without lymph node involvement by means of gene expression microarray technology and both a qualitative and a quantitative enrichment analysis. Our analyses identified a significant enrichment of muscle contraction-related genes on the side of lymph node-positive OSCC, in respect to specimens obtained from lymph node-negative patients, suggesting a crucial role played by myofibroblasts present in tumor stroma in driving invasive behavior.

RESULTS
The 21 individuals under examination, with and without lymph node involvement, clustered in an unsupervised fashion by their gene expression profiles, as from the PCA and heatmap plots in Figure 1. A total of 41 genes were differentially expressed when fixing the fold change barrier to ±2 at a nominally significant p-value (p < 0.05) and remained significant after correcting for multiple testing (FDR < 0.05): 19 genes were expressed at higher levels and 22 gene were expressed at a lower level in tumor specimens from OSCC patients with lymph node involvement (Table 1). Differentially regulated genes in the primary tumor of patients with OSCC with and without lymph node involvement are rendered as heat map and PCA in Figure 1. The most striking functional difference between samples obtained from lymph nodenegative and lymph node-positive patients consisted in the significant enrichment of muscle contraction-related genes in favor of lymph node-positive patients. Muscle contraction (GO:0006936, enrichment ratio = 12.62, adjusted p-value = 1.94e-05) resulted to be the most significant Biological Process, according to WebGestalt analysis. CHRNA1, CKMT2, DSC2, MYBPC2, MYOT, NEB, SMPX, and TNNC1 genes represented this category, and were found to be highly expressed in lymph nodepositive patients. Other significant categories, indirectly associated to muscle contraction, were: contractile fiber (GO:0043292, adjusted p-value = 2.66e-06); sarcomere (GO:0030017, adjusted p-value = 1.04e-05); myofibril www.impactjournals.com/oncotarget  , and not by up-regulated genes in specimens from lymph node-negative patients, thereby indicating that they characterize exclusively the lymph node-positive sub-set of patients. DAVID confirmed these results, by ranking a cluster containing muscle contraction process with the highest value (enrichment score = 3.12, GO:0006936, FDR=0,04). Still, highly expressed genes in specimens from lymph node-negative patients were not assigned to any significant functional cluster. Similar results were achieved even if lowering the fold change barrier to ±1.5; DMD, MURC, MYL3, MYLPF, MYOT, MYOZ2, NEB, RYR1, SYNM, TNNC1, TNNI1, TNNI2, TPM2, TPM3, and TRIM54 represented significantly the abovementioned ontological terms and were almost exclusively expressed in specimens from lymph nodepositive patients. WebGestalt and GeneCodis3 agreed on the enrichment of a myocyte enhancer factor recognition site, recurrent in five genes of our dataset (adjusted p-value = 0.0007): the HMEF2_Q6 binding site. Four out of these genes (CKMT2, SMPX, SOX2 and TNNC1) were associated to the GO terms: muscle contraction, sarcomere and myofibril, and were highly expressed in samples from lymph node positive patients. Only one gene, ENO3, was associated to the HMEF2_Q6 binding site, still being down regulated in the lymph node positive patients' dataset. Significant co-occurrences of HMEF2_Q6 sites in four of the previously mentioned genes (CKMT2, ENO3, SMPX and TNNC1) were also confirmed by GeneCodis3 (adjusted p-value = 2.26e-05).
The expression of muscle tissue associated genes in samples from lymph node positive patients but not in OSCC cell lines suggests that such gene signature may derive from myofibroblasts, which are absent in normal oral mucosa and constitute the OSCC stroma [29].
We also studied the expression profile of the 22 genes over-expressed in lymph node negative patients' samples. We noticed that two relevant genes: DSC2 (encoding desmocollin 2, involved in cell adhesion, GO:0007155), and KRT17 (encoding keratin 17, involved in morphogenesis of an epithelium, GO:0002009; signal transduction GO:0007165; epidermis development, GO:0008544; positive regulation of cell growth, GO:0030307) are highly expressed in oral squamous cell carcinoma cell line, while they are down regulated in specimens from patients with lymph node involvement. KRT17 expression level is confirmed to be elevate in most OSCC cell lines (BHY, BICR 1, BICR 7, HN, as evidenced in RNAseq (ArrayExpress ID: "E-MTAB-2706") and array public experiments (E-GEOD-30784, log2FC=4.1 for "oral squamous cell carcinoma vs normal samples", p-value=2.19E -38 ). Interestingly, under-expression of KRT17 gene within OSCC samples without lymph node involvement was previously reported [30].
Interestingly, also the NEFL gene (encoding the light chain neurofilament protein and involved in cell death, GO:0008219; intermediate filament organization, GO:0045109) was found to be under-expressed in OSCC specimens with lymph node involvement: it seems congruent with its role in the promotion of apoptosis and prevention of cancer invasion, as previously reported in head and neck squamous cell carcinoma cell lines [31]. Among the other, down-regulated genes in samples obtained from lymph node positive patients, we found that IGFBP2 (encoding insulin-like growth factor

DISCUSSION
The numerous constituents of the tumor stroma participate in different ways realizing a complex system that decisively impinges on cancer behavior and disease outcome, but on the other hand could represent targets for therapeutic approaches [32]. In this regard, an important role is played by cancer associated fibroblasts, found in the nearness in the developing and advancing cancer and able to promote tumor development and metastasis through secretion of chemokines. A bidirectional crosstalk between tumor cells and cancer associated fibroblasts drives discharge by cancer cells of factors augmenting the ability of the fibroblasts to secrete an assortment of tumor-promoting chemokines, which subsequently impact on the malignant behavior of tumor cells upholding their proliferation, migration, and invasiveness [33]. The chemokines secreted by cancer associated fibroblasts impinge on the tumor microenvironment as well, stimulating neoangiogenesis and in some instances an important presence of cancer-supporting macrophages in tumors [34]. Tumorigenicity is propped up by a number of proteins secreted by cancer associated fibroblasts and, in turn, factors secreted by cancer cells induce secretion of these proteins, exerting considerable effects on the ability of fibroblasts to promote tumor growth [35,36]. This interplay seems to play a crucial role in OSCC where the crosstalk between carcinoma associated fibroblasts and OSCC cells influences phenotype transition of cancer cells as a prerequisite for tumor progression, conditionating patients' outcome and therapies efficacy, and activated fibroblasts and in particular myobroblasts are often found in the stroma of OSCC, representing an important risk factor of patient's shortened survival [37][38][39]. Myofibroblasts in the stroma of OSCC may influence proliferation and invasion, resulting in more aggressive tumor, and tumor-associated myofibroblasts produce higher levels of growth factors compared to tumor-associated fibroblasts, and induce in vitro www.impactjournals.com/oncotarget increased production of matrix metalloproteinases (MMP) accompanying invasion of OSCC cells [38]. In agreement with these evidences we found a significant enrichment of muscle contraction-related genes in the specimens obtained from lymph node-positive patients, and in particular this category was enriched by the genes TNNC1, CKMT2, SMPX, MYOT, CHRNA1, MYBPC2, NEB, found upregulated, and DSC2, found downregulated.
Similarly, other categories found enriched by upregulated genes in specimens from lymph node-positive patients only were contractile fiber, sarcomere, myofibril, which are in relationship with the immunoistochemical evidence of myofibroblast occurrence in tumor stoma of node positive patients and support the prominent role for myofibroblasts in OSCC invasiveness [40,41]. In addition, bioinformatics tools for functional genomic, proteomic and large-scale genetic studies agreed on the enrichment of a myocyte enhancer factor recognition site, the HMEF2_Q6 binding site, recurrent in five genes of our dataset. Four out of these genes (TNNC1, CKMT2, SMPX and SOX2) were associated to the GO terms: muscle contraction, sarcomere and myofibril, and were highly expressed in samples from lymph node positive patients. The most important activated stromal Figure 2: Ingenuity pathway analysis. Embryonic Development, Organ Development, Organ Morphology network illustrated as a graph according to differentially expressed genes. A node represents a differentially expressed gene. The node color is associated with expression level. Red indicates that the gene is down-regulated while green indicates that the gene is up-regulated. Grey indicates that the gene expression level is unpredicted. www.impactjournals.com/oncotarget fibroblast phenotype is the myofibroblast, and the presence of myofibroblasts in tumour stroma is notably associated with a poor prognosis of OSCC patients [39]. Among the up-regulated genes we did not find ACTA-2, encoding α-smooth muscle actin (α-SMA), the most commonly used myofibroblast molecular marker, probably in relation to the differentiation process. Transdifferentiation of myofibroblasts is induced in the course of oral carcinogenesis and in particular in the invasive stage of OSCC irrespective of the epithelial cell differentiation. The protomyofibroblast, whose stress fibers contain only β-and γ-cytoplasmic actins, is a stable phenotype in vitro, but is an intermediate step in most in vivo conditions where it may progress, but not necessarily always, into the appearance of the differentiated myofibroblast, which is characterized by de novo expression of α-SMA [42].
Rearding DSC2, this gene encodes desmocollin 2, a calcium-dependent glycoprotein member of the cadherin superfamily, which in epithelial cells makes up the adhesive proteins of the desmosome cellcell junction and is required for cell adhesion and desmosome formation. The down-regulated expression of DSC2 in specimens from patients with lymph node involvement in concert with the down-regulation of KRT17, encoding keratin 17, which are normally greatly expressed in healthy mucosa, represent a further hallmark of invasive OSCC. Regarding to SOX2, a putative cancer stem-like cell/cancer-initiating cell marker for several human malignancies, it was found down-regulated in the specimens obtained from lymph node positive OSCC. In previous studies its immunohistochemical staining has been correlated with poor overall, cancer-specific and disease-free survivals illustrated as a graph according to differentially expressed genes. A node represents a differentially expressed gene. The node color is associated with expression level. Red indicates that the gene is down-regulated while green indicates that the gene is up-regulated. Grey indicates that the gene expression level is unpredicted. www.impactjournals.com/oncotarget in patients with histologically node-negative tongue localized OSCC [43] and with lymph node metastasis in OSCC when the immunohistochemistry analysis evidenced a diffuse staining pattern [44]. Among the genes found down-regulated in the specimens obtained from lymph node positive patients, the NQO1 gene encodes the phase II drug-metabolizing enzyme NAD(P)H:quinine oxidoreductase 1 (NQO1) and is considered significant for susceptibility to general carcinogenesis, and the genomic mutation in NQO1 is frequently found in cancer of colon, bladder, lung and pediatric leukemia. In addition, the genomic polymorphism in NQO1 is related to OSCC susceptibility, representing a genetic risk factor for OSCC carcinogenesis [45]. Also EPGN was found down-regulated in the specimens obtained from lymph node positive patients, and interestingly this gene was found upregulated in relapsed or metastatic head and neck squamous cell cancer patients treated with firstline cetuximab and platinum therapy who presented long progression-free survival [46]. In OSCC specimens obtained from lymph node positive subjects we found   downregulation of SERPINB11 and MMP10, and our data are in agreement with reports showing downregulated expression of SERPIN genes located on chromosome 18q21 in OSCC [47], and the reliability of MMPs as molecular markers for monitoring progression from normal tissue to dysplasia and OSCC more than to invasive behavior of tumor cells [48].
On the other hand, in lymph node positive OSCC patients we found upregulation of SERPINA3, the gene encoding the protease inhibitor alpha 1-antichymotrypsin. This finding is in agreement with the association of SERPINA3 up-regulation with poorer survival in patients with HLA-positive cervical carcinoma [49], and with disseminated disease in patients affected by colorectal adenocarcinoma [50].
In conclusion, we performed a transcriptome-wide analysis on specimens obtained from lymph node-positive and lymph node-negative OSCC patients by means of high-throughput gene expression profiling techniques. A qualitative and quantitative enrichment analysis evidenced a gene expression signature associated with spread of cancer to cervical lymph nodes. This category is related to muscle contraction-related genes, suggesting the presence of myofibroblasts in tumor stroma of patients with lymph node metastasis. This gene signature found in the specimens obtained from lymph node-positive patients corroborates results of immunoistochemical studies performed in OSCC patients evidencing positivity of myofibroblast markers associated to invasiveness [51][52][53][54]. This evidence underscores the crucial role played by myofibroblasts probably through their secretome in determining OSCC invasiveness. In addition, this gene signature corroborates the reliability of myofibroblastic stroma to predict OSCC mortality and the importance of molecular classification to sort patients with aggressive cancers, when used for prognostic application in alternative or in association to disease staging through TNM system.

Patients
Tumor tissue specimens were collected from 21 patients (mean age ± SD 61.56 ±13.95 yrs, age range 47-75 yrs, male 15, female 6,) affected by OSCC during surgical intervention, 10 patients without lymph node involvement (mean age ± SD 61.2±13.8 yrs, age range 47-72 yrs, male 7, female 3) and 11 patients with lymph node involvement (mean age ± SD 62.1±15.1 yrs, age range 48-75 yrs, male 8, female 3). Written informed consent to a protocol reviewed and approved by the Institutional Ethical Committee was obtained from patients to use their surgical specimens for research purposes. Criteria of inclusion were: diagnosis of OSCC, no previous radiotherapy, no previous chemotherapy; criteria of exclusion were: previous treatments, infective diseases or salivary gland diseases. Primary treatment was based on surgery and local resection was governed by T categories. Part of each sample was frozen in liquid nitrogen and stored at -80°C until RNA extraction, and the remaining part was used for histopathological analysis. Patients' characteristics are described in Table 2.

RNA extraction from fresh frozen tissue and first-strand cDNA synthesis
About 150 to 200 mg of fresh frozen tissues were sampled, total RNA was obtained by phenol extraction (TRIzol Reagent, Invitrogen Corporation, Carlsbad, CA, USA), and subsequently purified by column chromatography (RNeasy Mini Kit, Qiagen, Valencia, CA, USA). The amount of extracted RNA was checked by Nano Drop Spectrophotometer whereas RNA integrity was monitored using Agilent 2100 Bioanalyzer after subsequent digestion by DNaseI. Next, 1.0 mg of total RNA was reversed transcribed using the High-Capacity cDNA Archive Kit following the manufacturer's instructions (Applied Biosystems, Applera, Foster City, CA, USA).

DNA microarray assays
Whole genome gene expression profiling experiments was performed by using Affymetrix GeneChip1 Human Gene 1.0 ST Arrays. Samples were prepared for Affymetrix GeneChip1 Human Gene 1.0 ST Array System that interrogate 28,869 well annotated genes by using an average of 26 probes per gene; 100 ng of total RNA were processed according to the GeneChip Whole Transcript (WT) Sense Target Labeling Assay as provided by the manufacturer (Affymetrix). A random priming method was used to generate cDNA from all RNA transcripts present in a sample. Each DNA fragment was end-labelled with biotin using terminal deoxynucleotidyl transferase (TdT) before being hybridized on a GeneChip hybridization Oven 640. Following hybridization and post-hybridization washes, the arrays were scanned using the Affymetrix GeneChip Scanner 3000 7G to generate the raw data (CEL file). The QC steps of the experiment were performed using Expression Console software (Affymetrix, Santa Clara, CA).

Statistics
Statistical analysis of gene expression profiles was performed by using Partek Genomic Suite ver. 6.6 (Partek Inc., St. Louis, MO). Raw probe intensity values were background corrected by the Robust Multi-array Average (RMA) algorithm, quantile normalized and log transformed. Principal Component Analysis (PCA) was performed, as a quality control procedure, to verify that the division of the 21 individuals into N+ (lymph node involvement) and N-(without lymph node involvement) groups may be supported by a significant difference in gene expression profiles. Differentially expressed genes between 10 OSSC specimens of N-patients and 11 N+ patients were obtained by two-way ANOVA. Significant genes were those exhibiting at least 2-folds difference in gene expression between the two groups and FDR-corrected p values < 0.05.

Gene set enrichment analysis
We have identified functional classes of differentially expressed genes or association with disease phenotypes by gene set enrichment analysis. We have compared the list of significant genes with the Gene Ontology FAT category of biological processes, molecular functions and cellular component, through DAVID ver. 6.8 [24,25], WebGestalt (update 2013) [26] and GeneCodis3 [27]. Genes were first analyzed together, and then in subgroups made of 19 up and 22 down regulated genes only. DAVID was used to identify significant clusters of functional annotations of pathways and biological processes, while WebGestalt and GeneCodis3 were run against a number of Transcription Factor Binding Sites (TFBS) repositories with the aim to capture the putative regulatory sites.

Functional network analysis
Functional networks of differentially expressed genes were inferred by Ingenuity Pathway Analysis (IPA spring 2017 release; QIAGEN, Redwood City, CA; www.qiagen.com/ingenuity). The entire inferential procedure worked on the Ingenuity Knowledge Base, which is a large structured collection of observations in various experimental contexts with nearly 6 million findings manually curated from the biomedical literature or integrated from third-party databases. Resulting networks were ranked according to their degree of relevance to the differentially expressed genes. The network scores are based on the hypergeometric distribution and were calculated with the right-tailed Fisher's Exact Test. Genes wired by the resulting networks were colored in green if up-regulated in N-(lymph node negative), red if up-regulated in N+ (lymph node positive) and white if not significantly deregulated into our datasets.