Transcriptome Analysis Identifies ALCAM Overexpression as a Prognosis Biomarker in Laryngeal Squamous Cell Carcinoma

Background: Laryngeal squamous cell carcinoma (LSCC) is one of the most incident tumors in the world, especially in developing countries such as Brazil. Unlike other tumors, LSCC prognosis has not improved during the past four decades. Therefore, the objective of this study was to identify biomarkers that can predict the prognosis of LSCC patients. Results: Transcriptome analysis identified 287 overexpressed genes in LSCC in comparison to adjacent mucosa. Among these, 24 genes associated with prognosis formed a gene-pattern signature. Bayesian clustering of both the Brazilian and The Cancer Genome Atlas (TCGA) data revealed clusters of samples with significant differences in prognosis, and an expression panel of three genes (ALCAM, GBP6, and ME1) was able to distinguish patients with worse prognosis with an accuracy of 97%. Survival analyses with TCGA data highlighted ALCAM gene expression as an independent prognostic factor for LSCC. This was further confirmed through immunohistochemistry, using a validation set of Brazilian patients. ALCAM expression was not associated with prognosis for other head and neck tumor sites. Conclusion: ALCAM overexpression seems to be an independent prognostic biomarker for LSCC patients.


Introduction
Laryngeal squamous cell carcinoma (LSCC) is a disease with high incidence and mortality [1], affecting mainly the male population of middle- and low-income countries, such as Brazil, which has the fourth highest incidence of this disease in the world [2,3]. LSCC diagnosis and treatment are multidisciplinary and employ different procedures. Nevertheless, 60% of patients present with advanced disease, and LSCC is one of the few tumors with decreasing five-year survival rates over the past 40 years [4,5]. Consequently, there is a need to improve therapy response.
The application of molecular biomarkers in diagnosis, prognosis, and treatment choice was essential to reduce the mortality rates of prostate, breast, colorectal, and lung cancer [6][7][8][9]. Recently, The Cancer Genome Atlas (TCGA) consortium published a comprehensive molecular study on head and neck squamous cell carcinoma (HNSCC) [10]. However, several of the molecular alterations were not site-specific, and they were not analyzed with regard to clinicopathological features [11]. More comprehensive molecular analyses concerning LSCC prognosis are lacking [5], and the available studies are gene-specific or small gene panel analyses. At the somatic level, mutations in CDKN2A and TP53, and copy number alterations in CDKN2A, PIK3CA, and HER2 were associated with worse survival rates of patients with LSCC. Besides, analysis of mutations in CDKN2A and TP53 in laryngeal dysplasia could predict lesions that would develop into a tumor [12][13][14]. In addition, hypermethylation of the CDKN2A gene body was associated with better locoregional control after surgery [15], and LMX1B hypermethylation was associated with worse overall and disease-free survival rates [16]. Additionally, the expression of long noncoding RNAs, such as CCAT1, DGR5, H19, and HOTAIR, was also associated with LSCC prognosis and diagnosis [17]. The signature of claudin expression, specifically claudin 1, 3, 7, and 8, was associated with early diagnosis and metastasis identification and related to prognosis [18], whereas the analysis of tumor-associated immune cell components, especially CD3, CD4, CD8, CD68, and CD163 positive cells, was described as useful to predict the response to immune checkpoint inhibitor therapy [19].
Therefore, the objective of this study was to identify biomarkers associated with LSCC prognosis. To this end, we carried out a transcriptomic analysis and validated the identified genes both in our own validation set of samples and in the TCGA database, revealing ALCAM overexpression as an independent prognostic factor for LSCC patients.

Identifying Molecular Prognostic Biomarker for LSCC Patients
Transcriptome analysis revealed 725 differentially expressed genes (DEG), 287 overexpressed and 438 underexpressed, in LSCC when compared to nonmalignant surrounding mucosa (NSM) (Table S1). These DEGs were related to cell signaling pathways associated with neoplastic progression, such as cell-extracellular matrix interaction, focal adhesion, PI3K/AKT, and small cell lung cancer-associated pathway.
Among the 287 overexpressed genes, 24 were associated with prognosis (log-rank p-value < 0.05), forming a candidate prognostic gene-pattern expression signature panel for LSCC (Table 1). The expression values of this prognostic gene-pattern signature were then used to cluster the LSCC samples of the investigation set, resulting in two groups with significant differences in prognosis. Cluster 1 contained samples from patients who presented better prognosis (median survival 129.4 months) than patients grouped in Cluster 2 (median survival 14.10 months) (p = 0.0002, Hazard Ratio (HR) = 45.41, 95% confidence interval (CI) = 6.19−333.0) (Figure 1A). Applying this gene set to TCGA data, three clusters of LSCC samples were observed. Cluster 3 showed a five-year survival rate of 36.3%, a significantly worse prognosis than samples from Clusters 1 and 2, which had a five-year survival rate of 65.4% (Figure 1B).
The ROC (receiver operating characteristic) curve analysis with all 24 genes applied to the TCGA validation set revealed an area under the curve (AUC) of 1.0 for detecting LSCC patients with worse prognosis, and the same result was observed when using a minimal subset of 12 genes (ADH7, ALCAM, CYP2C19, GBP6, LYPD6B, TPD52L1, ODC1, BTBD11, PTGR1, ME1, C12ORF75, and ACVR1). Further reducing the number of genes in the panel revealed that with three genes (ALCAM, GBP6, and ME1) we could reach an accuracy of 0.97 (sensitivity of 94.7% and specificity of 93.1%). Further reduction in the number of genes caused a significant loss of accuracy (Figure 1C,D).
Aiming to understand possible reasons behind the overexpression of ALCAM, we also analyzed ALCAM somatic alterations in the LSCC TCGA dataset. ALCAM was amplified in 11.8% of LSCC samples, and amplification was associated with expression levels (p = 0.018). Only one sample showed a missense mutation, with unknown biological significance (Figure S1).

High ALCAM Protein Levels Were Also Associated with Worse LSCC Prognosis
To validate the association between ALCAM overexpression and LSCC prognosis, we evaluated ALCAM protein expression by immunohistochemistry in 44 LSCC samples (Figure 3A-L). This analysis showed that 12 tumors (27.3%) had no ALCAM expression, while 32 (72.7%) presented positive ALCAM immunostaining. The median percentage of positively stained cells was 20%, and this value was used as the cut-off for classifying samples as having low or high ALCAM levels. Thus, among the positive tumors, eight (25%) presented low ALCAM levels and 24 (75%) presented high expression. ALCAM immunostaining was restricted to cell membranes and presented a direct correlation with ALCAM gene expression (r = 0.37, p = 0.029, 95% CI = 0.03-0.63) (Figure S2). ALCAM protein immunohistochemical analysis confirmed the worse prognosis associated with ALCAM gene overexpression. Patients with high ALCAM levels presented a lower median survival time (30.7 months) than those whose tumors showed low or negative ALCAM levels (137.9 months), and a high ALCAM protein level was also an independent prognostic factor for LSCC (p = 0.04, HR = 2.31, 95% CI = 1.03-5.28) (Figure 3M). No association was observed between ALCAM protein levels and LSCC clinicopathological characteristics.

Discussion
LSCC is one of the few tumors that have presented decreasing overall survival rates during the past decades. Therefore, in this manuscript we developed a gene-expression panel that is strongly associated with the prognosis of LSCC patients. Among the 24 genes that made up the panel, ALCAM gene and protein expression was shown to be an independent prognostic factor for LSCC.
The activated leukocyte cell adhesion molecule (ALCAM) gene is located at human chromosome 3q13.11 [20] and encodes a transmembrane glycoprotein that acts in cell-cell adhesion, through either homotypic (ALCAM-ALCAM) or heterotypic (ALCAM-CD6) interactions between adjacent cells [21]. ALCAM expression is detectable in a variety of tissues and cells under certain spatial and temporal controls during development [14]. In homeostasis, homotypic ALCAM interactions can modulate interactions between epithelial and endothelial cells and neuronal guidance, while the ALCAM-CD6 heterotypic interaction shows physiological relevance in antigen presentation in immune cell adhesion [22][23][24][25]. Several studies show a role for CD6 as a co-stimulatory molecule in T-cell activation [26,27], and the ALCAM-CD6 interaction was described as pivotal for antigen presentation [28,29]. Interestingly, it was observed that the molecule I/F8 scFv induces ALCAM internalization, and the conjugate of I/F8 scFv and the saporin immunotoxin efficiently and selectively kills ALCAM-positive tumor cells [30]. Additionally, vaccine-induced cytotoxic T-lymphocytes can recognize an epitope expressed by ALCAM, and this could be useful as a novel mechanism of induction of potent tumor-specific cellular responses by mimotopes of tumor-associated carbohydrate antigens [31].
ALCAM seems to characterize cancer stem cells (CSC) in some tumors. It was highly expressed in the intestinal stem cell niche, with an association to intestinal carcinoma progression, including benign and metastatic tumors [32]. A subpopulation of non-small cell lung cancer (NSCLC) cells triple-positive for EPCAM, ALCAM, and CD44 possessed CSC characteristics, including high proliferation, greater clonogenicity, ability for self-renewal through spheroid formation, and chemoresistance [33]. Recently, ALCAM-E3 ligase-mediated degradation was associated with the regulation of CSC features in HNSCC cells [34]. Besides, ALCAM membrane expression was considered a CSC marker in OCSCC-derived cell lines [35]. We plan to carry out in vitro analyses with LSCC cell lines to try to understand the role of ALCAM overexpression in LSCC prognosis.
The long arm of chromosome 3, which harbors the ALCAM gene, is a classically amplified genomic region in squamous carcinomas, particularly in esophageal squamous cell carcinoma and HNSCC [10,11]. However, only 12% of LSCC samples presented ALCAM copy number gain/amplification associated with its overexpression, suggesting that further mechanisms, such as DNA methylation, already shown to be associated with ALCAM overexpression in breast tumors [36], may also be associated with this deregulation in LSCC.
Although our data showed an association of ALCAM gene expression with prognosis only for LSCC among the HNSCC sites, our study was the only one that evaluated this marker exclusively in the larynx. In other HNSCC studies, ALCAM protein overexpression, evaluated by immunohistochemistry, has already been reported as an independent prognostic factor for OCSCC, associated with the sonic hedgehog signaling pathway [37] or Epidermal Growth Factor Receptor (EGFR) activation [38], in the Chinese population. Similarly, the ALCAM protein level was described as a potential biomarker for predicting tumor behavior and prognosis of salivary gland tumors in Iranian patients [39]. Recently, Clauditz et al. [40] evaluated ALCAM protein expression in HNSCC, including LSCC samples, combining laryngeal and hypopharyngeal tumors in the same group of samples. Their results were discordant with our findings: ALCAM expression was mainly cytoplasmic and was not associated with the prognosis of LSCC patients.
Recently, studies have proposed the potential use of gene-expression signatures to assess the prognosis of LSCC patients, employing both protein-coding and noncoding genes. Concerning protein-coding genes, a panel of 26 hypoxia-related genes was associated with the improvement of hypoxia-modifying treatment in laryngeal cancer [41], and the expression of 18 inflammation-associated genes was able to distinguish LSCC samples according to prognosis with an AUC of 0.61 [42]. A panel of two long noncoding RNAs was also associated with LSCC prognosis, presenting an AUC of 0.69 [43]. The limited number of studies that propose biomarkers for LSCC prognosis is reflected in the fact that only one clinical trial is recruiting LSCC patients according to a biomarker, aiming to block the PD1/PD-L1 interaction.
Although our transcriptomic analysis was conducted in a limited number of samples, it was one of the few studies that evaluated LSCC exclusively, separate from the large HNSCC group. Moreover, our data demonstrated the prognostic value of the gene panel and of ALCAM in both Brazilian and TCGA samples. Besides the prognostic value of ALCAM expression measurement, our study identified a valuable prognostic gene-expression signature with high power to discriminate LSCC samples according to patient outcome (12-gene signature AUC 1.00, 3-gene signature accuracy 0.97), which could inform treatment choice and patient monitoring, with the aim of improving treatment response.

LSCC Samples
A total of 44 LSCC and paired nonmalignant surrounding mucosa (NSM, histopathologically adjacent normal mucosa, 3 cm from tumor borders) samples were collected from 2008 to 2014 by the Head and Neck Surgical Division of the Instituto Nacional de Câncer (INCA, Rio de Janeiro, Brazil) from patients who had not undergone chemo- or radiotherapy. Histopathological profiling was performed by the Pathology Department of INCA. All patients signed an informed consent form, and the project was approved by the institution's Ethics Committee.
From this set of LSCC and NSM samples, 14 LSCC and 12 NSM samples were randomly selected for transcriptome analysis (investigation set of samples). The first validation set, used to confirm gene expression by qPCR and immunohistochemistry, comprised samples from all 44 patients. A second validation set was composed of the Head and Neck provisional data from the TCGA consortium [10]. TCGA data were analyzed with the web-based software cBioPortal [44,45]. Patients' clinical and pathological features are described in Table 3.

LSCC Gene-Expression Profiling
RNA of all samples was isolated from frozen tissue with the RNeasy Mini Kit (Qiagen, Inc., Hilden, Germany). RNA of the investigation set of samples was converted to complementary DNA (cDNA) with the WT Expression Kit, biotinylated, and applied to the GeneChip Human Exon 1.0 ST array (Affymetrix, Inc., Santa Clara, CA, USA), as previously described [46]. The raw data were normalized in the Expression Console software (Affymetrix) using the robust multi-array average (RMA) method. Subsequent analysis of gene expression was carried out in the R environment, using the limma package, available from the Bioconductor project, to obtain quantitative expression levels for coding genes. Differentially expressed genes (DEG) were classified by the following criteria: p < 0.05 and an absolute fold-change cutoff of 2.0. Microarray data are available at the Gene Expression Omnibus Accession Browser (accession number GSE143224) [47][48][49].
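The DEG criteria above can be illustrated with a minimal Python sketch. This is not the limma workflow used in the study. It only shows the filtering step, assuming that each gene already has a log2 fold-change (tumor versus NSM) and a p-value, and that the cutoffs are applied as p < 0.05 and fold-change of at least 2.0 in either direction. The function name `classify_deg` and the gene names and values are hypothetical.

```python
import math


def classify_deg(log2_fc, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label one gene as 'over', 'under', or None (not a DEG).

    Assumption: log2_fc is log2(tumor / NSM), so a 2.0-fold change
    in either direction corresponds to |log2_fc| >= log2(2.0) = 1.
    """
    if p_value >= p_cutoff:
        return None
    threshold = math.log2(fc_cutoff)
    if log2_fc >= threshold:
        return "over"
    if log2_fc <= -threshold:
        return "under"
    return None


# Hypothetical toy values: (log2 fold-change, p-value) per gene.
genes = {
    "GENE_A": (1.8, 0.001),   # large increase, significant
    "GENE_B": (-1.2, 0.02),   # large decrease, significant
    "GENE_C": (0.4, 0.001),   # significant but below the fold-change cutoff
    "GENE_D": (2.5, 0.30),    # large increase, not significant
}
calls = {gene: classify_deg(fc, p) for gene, (fc, p) in genes.items()}
```

Whether the boundary values are inclusive, and whether p is raw or adjusted for multiple testing, is not stated in the text and is treated here as an assumption.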

Prognostic Gene-Pattern Signature
The prognostic value of all overexpressed genes was evaluated using the microarray data generated in this study. For this purpose, we analyzed the expression of each gene regarding its association with the patient's prognosis in the investigation set of samples. Higher and lower gene expression were defined using the median expression value as the cut-off. Genes with log-rank p-value < 0.05 were used for the Bayesian hierarchical clustering of both the investigation and validation sets of samples, followed by survival analysis between clusters of samples. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC), as well as the sensitivity and specificity values, were used to assess the feasibility of using messenger RNA (mRNA) expression levels as prognostic biomarkers for LSCC patients. Initially, all prognosis-associated genes were included in the ROC curve analysis. Genes were then removed from the ROC curve analysis following the backward stepwise method according to each gene's individual AUC value, which resulted in a ROC curve with the lowest number of genes that retained a significantly high AUC. Survival analyses, Bayesian clustering, and ROC curve analyses were conducted in R using the survival, BHC, and Epi packages, respectively [50][51][52].
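The ROC quantities named above can be made concrete with a small Python sketch. It is not the Epi-based analysis of the study. It assumes a single expression score per patient and two known outcome groups, computes the AUC as the probability that a poor-prognosis patient scores higher than a good-prognosis patient (ties counted as one half), and computes sensitivity and specificity at one cutoff. All names and values are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney formulation.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Call a sample positive when its score is >= cutoff."""
    true_pos = sum(s >= cutoff for s in scores_pos)
    true_neg = sum(s < cutoff for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical expression scores (arbitrary units).
poor_prognosis = [7.9, 8.4, 6.1, 9.0]
good_prognosis = [5.2, 6.5, 4.8, 5.9]

area = auc(poor_prognosis, good_prognosis)
sens, spec = sensitivity_specificity(poor_prognosis, good_prognosis, cutoff=6.0)
```

In a backward stepwise scheme such as the one described, a function like `auc` would be evaluated per gene to decide which gene to drop next; how the multi-gene score was combined in the study is not specified here.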

Gene-Expression Validation by Quantitative PCR
ALCAM expression was assessed by RT-qPCR. The cDNA of 44 paired LSCC and NSM samples was synthesized with SuperScript II™ Reverse Transcriptase (Invitrogen®), and quantitative PCR was carried out with the Quantifast SYBR Green PCR kit (Qiagen) in a Rotor-Gene 6000 thermal cycler (Qiagen).

Immunohistochemistry Analysis
Immunohistochemistry (IHC) was performed on 3-µm paraffin sections of all 44 LSCC cases. For ALCAM antigen retrieval, sections were incubated in a steam oven while submerged in a trilogy buffer solution (Cell Marque) for 30 min at 98 °C. Sections were then incubated with the primary monoclonal antibodies against ALCAM (Sigma, St. Louis, MO, USA, HPA010926, working dilution 1:1000) for at least 12 h. Formalin-fixed paraffin-embedded (FFPE) prostate carcinoma samples served as the positive staining control. As the negative control, the primary antibody was replaced by the diluent solution. The detection system used was the NovoLink™ Max Polymer Detection System (Leica Biosystems, Wetzlar, Germany), following the protocol described by the manufacturer, using diaminobenzidine as substrate (Dako). Sections were counterstained with Harris' hematoxylin. Cases were scored as positive when at least 1% of epithelial cells were stained. LSCC samples were categorized as having low or high ALCAM protein expression using the median number of positive cells as the cut-off. Samples with a percentage of positive epithelial cells lower than the median value were classified as low ALCAM tumors, and samples with a percentage equal to or higher than the median value were classified as high ALCAM tumors.
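The scoring rule above can be sketched in Python as follows. This is an illustration only, with one explicit assumption that the text does not settle: the median cut-off is computed over positive cases (at least 1% stained cells). The function name `classify_alcam` and the percentages are hypothetical.

```python
from statistics import median


def classify_alcam(percent_positive):
    """Classify tumors as 'negative', 'low', or 'high' ALCAM.

    percent_positive: percentage of stained epithelial cells per tumor.
    Assumption: the median cut-off is taken over positive cases only.
    """
    positive = [p for p in percent_positive if p >= 1.0]
    cutoff = median(positive)
    labels = []
    for p in percent_positive:
        if p < 1.0:
            labels.append("negative")   # fewer than 1% stained cells
        elif p < cutoff:
            labels.append("low")        # below the median
        else:
            labels.append("high")       # equal to or above the median
    return cutoff, labels


# Hypothetical staining percentages for seven tumors.
cutoff, labels = classify_alcam([0.0, 0.5, 5.0, 10.0, 20.0, 40.0, 60.0])
```

For the survival comparison, negative and low tumors would then be pooled and compared against high tumors.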

ALCAM Somatic Alterations in LSCC
The frequencies of ALCAM copy number alterations (CNA) and single nucleotide variants (SNV) were evaluated in the LSCC samples of TCGA using the cBioPortal software. CNA were derived from DNA microarray data applying the GISTIC 2.0 protocol [54], and SNV were derived from whole-exome sequencing.

Statistical Analyses
Differences in gene expression were evaluated using the Kruskal-Wallis test, followed by Dunn's multiple comparison test. Spearman's rank correlation was used to assess the correlation between gene and protein expression. These analyses were performed with GraphPad Prism 5 software. In survival analyses using TCGA data, univariate analysis was performed with the Kaplan-Meier method and the log-rank test. Variables with p < 0.2 were selected for multivariate analysis. Finally, Cox regression was applied with the stepwise forward method [55]. The R environment with the survival package was used for survival analyses. The same survival analysis protocol was applied to the immunohistochemistry data.
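The univariate step described above (Kaplan-Meier estimation followed by a log-rank test) can be illustrated with a self-contained Python sketch. It does not reproduce the R survival package used in the study. It assumes two groups, follow-up times in months, and an event flag (1 = death, 0 = censored). All names and data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value), 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(x >= t for x in times_a)   # at risk in group A
        n_b = sum(x >= t for x in times_b)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical follow-up data (months; 1 = death, 0 = censored).
high_t, high_e = [6, 12, 14, 20], [1, 1, 1, 0]
low_t, low_e = [30, 45, 60, 80], [0, 1, 0, 0]

curve_high = kaplan_meier(high_t, high_e)
chi2, p_value = logrank_test(high_t, high_e, low_t, low_e)
```

Variables passing the p < 0.2 screen would then go into the multivariate Cox model, which is beyond the scope of this sketch.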

Conclusions
ALCAM gene and protein expression seem to be an independent prognostic biomarker for LSCC patients.