Secreted protein markers in oral squamous cell carcinoma (OSCC)

Oral squamous cell carcinoma (OSCC) is a main cause of oral cancer mortality and morbidity in central south Asia. To improve the clinical outcome of OSCC patients, detection markers are needed, which are preferably non-invasive and thus independent of a tissue biopsy. In the present study, we aimed to identify robust candidate protein biomarkers for non-invasive OSCC diagnosis. To this end, we measured the global protein profiles of OSCC tissue lysates to matched normal adjacent mucosa samples (n = 14) and the secretomes of nine HNSCC cell lines using LC–MS/MS-based proteomics. A total of 5123 tissue proteins were identified, of which 205 were robustly up- regulated (p-value < 0.01, fold change > + 2) in OSCC-tissues compared to normal adjacent tissues. The biological process “Secretion” was highly enriched in this set of proteins. Other upregulated biological pathways included “Unfolded Protein Response”, “Spliceosomal complex assembly”, “Protein localization to endosome” and “Interferon Gamma Response”. Transcription factor analysis implicated Creb3L1, ESRRA, YY, ELF2, STAT1 and XBP as potential regulators. Of the 205 upregulated tissue proteins, 132 were identified in the cancer cell line secretomes, underscoring their potential use as non-invasive biofluid markers. To further prioritize our candidate markers for non-invasive OSCC detection, we integrated our data with public biofluid datasets including OSCC saliva, yielding 25 candidate markers for further study. We identified several key proteins and processes that are associated with OSCC tissues, underscoring the importance of altered secretion. Cancer-associated OSCC secretome proteins present in saliva have potential to be used as novel non-invasive biomarkers for the diagnosis of OSCC.


Background
Head and neck squamous cell carcinoma (HNSCC) arises from the mucosal epithelium of the oral cavity, nasopharynx, oropharynx, hypopharynx (and larynx. Squamous cell carcinoma is the most common histological type of head and neck cancer, accounting for 90% of all head and neck malignancies. Oral squamous cell carcinoma (OSCC) and oropharyngeal squamous cell carcinoma (OPSCC) are the most common HNSCC [1]. OSCC is the 6th most common epithelial malignancy world-wide and associated with high morbidity and mortality. Therefore, OSCC is a global growing concern for public health [2].
The main risk factors related to HNSCC development are use of smokeless tobacco products, chewing/smoking tobacco, genetic predisposition, alcohol consumption and/or infections with human papillomavirus (HPV). In Pakistan, Taiwan, India and China several social behaviors, such as chewing raw tobacco, gutka, paan (betel quid), chaliya (areca nut), manpuri and niswar are among common addictions [3]. Initial symptoms are often mild, causing a delay in presentation. Diagnostic care includes Graphical Abstract Table 1 Clinical information of the study subjects clinical examination of the oral mucosa and histological evaluation of tissue biopsies. However, these methods generally lead to a late diagnosis of OSCC. Therefore it is important to develop new non-invasive diagnostic tools that can detect the presence of OSCC at an earlier stage preferably in saliva or oral rinses.
To identify new biomarkers that can improve the diagnosis of OSCC, analysis of global protein expression and secretion using mass spectrometry has gained increasing interest [4]. Global identification of altered expression of proteins in OSCC tissues may provide an unbiased approach to find vital biological pathways, leading to improved understanding of the primary molecular mechanism responsible for disease development and progression. Non-invasive markers may be best identified using biofluid analysis, yet plasma proteomics has been hampered, mainly because of the vast complexity and great dynamic range of proteins in blood. To overcome this challenge, cancer cell and tumor tissue secretome analysis have emerged as alternative approaches to identify candidate tumor serological markers [5,6].
Recently, the power of clinical proteomics was shown in a large scale analysis of 66 HNSCC tumors [7]. Comparison to paired normal adjacent tissues revealed profound alterations in multiple biological processes such as downregulation of platelet degranulation, acute inflammatory response, fatty acid metabolic process and muscle system process and upregulation of protein hydroxylation, leukocyte migration, cell chemotaxis, and angiogenesis [7].
In this study, we set out to identify "non-invasive" protein biomarker candidates that can be detected in oral cancer-relevant biofluids. To this end, we employed mass spectrometry-based proteomics, to examine protein expression profiles in OSCC and adjacent normal tissue of 14 patients along with analysis of secretomes of nine HNSCC cell lines. To further prioritize our candidate markers for non-invasive OSCC detection, we integrated our data with public biofluid datasets: salivary proteome healthy vs. OSCC dataset by Chu et al. [8], Human salivary proteome by Sivadasan et al. [9] and available Normal Saliva Proteome database (https:// saliv arypr oteome. nidcr. nih. gov/). Moreover, we also describe oral cancer associated biological pathways and TFs involved in regulating those proteins to gain understandings of differential biology of secretion. or with history of current or earlier chemotherapy or radiotherapy were excluded from the study. All tumors of OSCC patients included in the study were HPV negative. Tumors were TNM staged according to AJCC recommendations. After surgery tissue specimens (tumor and adjacent non-tumor tissue) were immediately frozen in liquid nitrogen and stored at − 80 °C for future use. Non-tumor tissues were cancer cell-free and around 80% cancer cells were present in stored tumor tissues as revealed by pathological evaluation. The histology information showing cases of moderately differentiated SCC and well-differentiated SCC in (Additional file 1: Fig. S1). The tumor composition consisted of ~ 80% cancerous and 20% non-cancerous cell presence ensures a high likelihood of selecting proteins originating from cancer cells. To further confirm this, we generated secretome data from a panel of cancer cell lines and focused our attention on the overlapping proteins. Clinical information of patients was collected from the hospital files including medical history and demographic data (Table 1).

Tissue lysis and protein extraction
As mentioned previously [10], 200 mg of the tumor tissue and control tissue samples were homogenized with liquid nitrogen in a pestle with mortar, suspended in modified chilled lysis buffer, vortexed for 1 h and centrifuged for 90 min at 1400 rpm at 4 °C (Eppendorf Centrifuge, 5417R). The resultant supernatant was aliquoted and stored at − 80 °C. Bradford assay [11] with bovine serum albumin (BSA) as reference was used for estimating protein concentration in tissue lysates.

Cell line secretome collection
Conditioned medium from cell lines was collected and processed as described before [17]. Briefly, cells were seeded in T175 flasks and cultured until ~ 70% confluence. Cells were subsequently washed three times with PBS and incubated for 24 h in serum-free medium prior to harvest of the conditioned medium. Following centrifugation (5 min at 500 × g) and passing of the supernatant over a 0.45 μm filter for removal of cell debris, the medium was concentrated to ~ 200 μl using an Amicon 3 kDa filter (Merck Millipore). Protein concentration determination was performed using BCA Protein Assay Kit (Pierce, Thermo Scientific). Following addition of LDS Sample Buffer (Invitrogen) containing 100 mM dithiothreitol, 30 µg of secreted protein lysates were separated using gel electrophoresis.

Protein gel electrophoresis of tissue lysates and cell line secretomes and in-gel digestion
Prior to mass-spectrometry analysis, proteins were fractionated by gel electrophoresis followed by coomassie blue staining. Gel electrophoresis and in-gel digestion were done as described earlier [17]. Tissue lysates (25 µl) were separated on a 12% SDS-PAGE acrylamide gel (BioRad, Hercules, CA), at 100 V until the sample had entered just into the running gel. The cell line secretomes were separated until the bromophenol-blue dye-front reached the end of the running gel. Following electrophoresis, gels were fixed in 3% phosphoric acid/50% ethanol solution, stained with Coomassie brilliant blue G250 and then washed twice in 50 mM ABC, twice in 50 mM ABC/50%ACN and once in 50 mM ABC. Gels bands (tissue lysate) were cut into ~ 1 mm 3 cubes. The lane for each secretome was cut into 5 fractions and each fraction was cut into ~ 1 mm 3 cubes. Gel cubes were reduced in 10 mM DTT/50 mM ABC for 1 h at 56 °C and alkylated in 50 mM iodoacetamide for 45 min at room temperature in the dark. After washing in 50 mM ABC/50% ACN, gel cubes were dried in vacuum centrifuge for 10 min at 50 °C. Following gel cube rehydration by trypsin solution (Promega, 6.25 ng/mL in 50 mM ABC), the gel cubes were covered with 50 mM ABC and incubated overnight at 25 °C. Peptides were isolated from the gel cubes with 5% FA/50% ACN (twice) and 1% formic acid (FA) (once). In a vacuum centrifuge at 60 °C the extracts were concentrated before LC-MS/MS after which volumes were adjusted to 50 μl with 0.05% FA into LC autosampler vials after filtering through a spin filter of 0.45 μm [18].

NanoLC-MS/MS proteomic analysis
NanoLC-MS/MS measurement was performed as described previously [19]. Peptides were separated using an Ultimate 3000 nanoLC-MS/MS system (Dionex LC-Packings, Amsterdam, The Netherlands) equipped with a 40 cm × 75 μm ID fused silica column custom packed with 1.9 μm 120 Å ReproSil Pur C18 aqua (Dr Maisch GMBH, Ammerbuch-Entringen, Germany). After injection, peptides were trapped at 10 μl/min on a 10 mm × 100 μm ID trap column packed with 5 μm 120 Å ReproSil Pur C18 aqua in buffer A (buffer A: 0.1% formic acid in MQ; buffer B: 80% ACN + 0.1% formic acid in MQ) and separated at 300 nl/min in a 10-40% buffer B gradient in 90 min (130 min inject-to-inject) at 35 °C. Eluting peptides were ionized at a potential of + 2 kVa into a Q Exactive mass spectrometer (Thermo Fisher, Bremen, Germany). Intact peptide masses were measured at resolution 70.000 (at m/z 200) in the orbitrap using an AGC target value of 3 × 10 6 charges. The top 10 peptide signals (charge-states 2 + and higher) were submitted to MS/MS In the HCD (higher-energy collision) cell (1.6 m/z isolation width, 25% normalized collision energy) using an AGC target value of 1 × 10 6 charges an underfill ratio of 0.5% and a maxIT of 60 ms at resolution 17.500 (at m/z 200). Dynamic exclusion was applied with a repeat count of 1 and an exclusion time of 30 s. Each biological sample was injected twice and the LC-MS raw files for each sample were combined in the database search.

Protein quantitation and differential analysis
Spectral counting quantification of proteins was used, that is the sum of all MS/MS spectra for every detected protein [21]. Spectral counts for known proteins in a sample were normalized to the sum of spectral counts for that sample and subsequently multiplied by the mean of the sum for all samples. This procedure gives the relative spectral count contribution of a protein to all spectral counts in the sample. The normalized spectral counts were used to calculate the ratio of different biological sample groups. In that manner, we were able to correct for loading differences between samples. Differential analysis of proteins between samples was done using dedicated statistics by the beta-binominal test [22] using the R package countdata, which considers within-and between-sample variations. Cluster analyses of differentially expressed proteins were performed using hierarchical clustering in R, in which the protein abundances were normalized to zero mean and unit variance. For protein clustering, the Euclidean distance was used. The mass

Data mining, visualization & network analysis
The "protein-protein interaction network" analysis of up-regulated proteins in OSCC was performed using the STRING tool, version 10.5 [24] with medium stringency and default settings. The edges represent protein-protein interactions and nodes denote proteins based on different levels of evidence collected by String, and also visualized using MCL-cluster plugin of Cytoscape version 3.4.1 [25]. Gene ontology analysis was performed using the ClueGO app to get biological processes overrepresented and proteins involved (32) with cut-off 0.01 p-value. We used WebGestalt tool to do the GSEA. It was performed using a pre-ranked protein list as input. All proteins in the dataset were assigned a rank based on the differential expression analysis statistics (-log10 p-value multiplied with the sign of the fold-change) [26] and choose geneontology (biological process noRedundant) under functional database category to identify GO-biological-processes enriched for high and low expressed proteins. Expression plot was made in R. TFs which are responsible for regulating the expression of many of genes that mediate their biological activities like induction of cell-cycle arrest and apoptosis were identified using iRegulon [27]. The set criteria for motif enrichment analysis was as follows: FDR on motif similarity ≤ 0.001, identity between orthologous genes ≥ 0.0, and TF motifs with normalized enrichment score (NES) & gt; 3. The ranking preference for Motif collection was fixed to a putative regulatory region of 20 kb centered on TSS (7 species) and 10 K (9713 PWMs) was selected for the analysis. Thereafter, TF-target pairs were found based on the databases like TRANSFAC comprised in the iRegulon plugin. To predict non-classically and classically secreted proteins SignalP 5.0/SecretomeP 2.0 Server were used, whereby the signal peptide's presence in the sequence of protein was used to categorize it as classically secreted and the threshold of NN score ≥ 0.6 and no existence of a signal peptide to categorize it as non-classically secreted proteins [28].

Protein profiling of OSCC tissue vs NAT lysates
To identify altered biology and candidate biomarkers for OSCC, we performed mass spectrometry-based proteomics of cancer and adjacent normal tissues of 14 patients. For optimal label-free protein comparison, we ensured that equal protein amounts were loaded on the gels (Additional file 1: Fig. S2). In total, 5123 proteins were identified (Additional file 2: Table S1). About 3104 proteins were identified per sample. We used a paired beta-binomial test to identify the differentially expressed proteins. We used unadjusted p-values since the protein ranking does not change between adjusted and unadjusted values. To guard from multiple testing issues, we performed extra filtering using a fold change cutoff to identify candidates for further exploration. In addition, as the proteins do not act independently, the selected proteins were subsequently subjected to further network analysis. Finally, we used overlap analysis of multiple datasets to pinpoint biomarker candidates of high confidence and highest interest.

Up-regulated protein networks associated with OSCC
To study which proteins and pathways were associated with OSCC tissues, we performed differential analysis. A total of 205 upregulated proteins and 94 downregulated   Table S2) were significantly differentially expressed (p < 0.01, twofold up/down). First, we performed GSEA on the proteins identified in either the mucosa and OSCC samples to investigate the most prominent biological processes. This analysis revealed that secretion was the most significant pathway enriched in OSCC and myogenesis the pathway mostly reduced in tumor samples (or relatively enriched in mucosa samples) (Fig. 1). The latter likely indicates that the mucosa samples contained a relatively high number of muscle cells when compared to the OSCC samples. This relates to the different tissue architecture of either normal mucosa or tumor tissue and is hard to circumvent. For this reason we focused our analyses on the upregulated proteins, which are also easier to detect as biomarkers.
To gain further insight into the tumor biology, we generated a protein-protein interaction network for the 205 upregulated proteins in OSSC and found high connectivity with several tightly connected clusters (Fig. 2a, b, Additional file 4: Table S3). The 6 major clusters were enriched for proteins involved in secretory pathway (Cluster 1), spliceosomal complex assembly (Cluster 2), protein localization to endosome (Clusters 3), tRNA aminoacylation for protein translation (cluster 4), immunity (cluster 5) and protein biosynthesis (cluster 6).
Cancer cells are known to exhibit altered protein secretion to create a favorable tumor microenvironment [29]. The relative upregulation of the secretory pathway may indicate that OSCC tissues are secreting more proteins when compared to the paired normal tissues. These secreted proteins may also be suited to be exploited as non-invasive biomarkers, since they can be secreted into biofluids, including blood or saliva. In addition, the enrichment of biosynthesis pathways of aminoacyl tRNA may be explained by the active metabolism of cancer cells. Aminoacyl tRNA synthetases (ARSs) are vital enzymes that connect amino acids to their related tRNA. Beyond ARSs central role in protein translation, current studies have discovered others functions and expression levels may be associated to the prognosis of cancer [30,31]. For example, in cancer cells stimulated by TNF, human lysyl-tRNA synthetase (KARS) is secreted that triggers proinflammatory signaling in immune cells [32].

Tumor-promoting transcription factors in oral cancer
To predict the coordinated regulation of the proteins involved in the differential protein secretion and other biological processes associated with the highly connected protein clusters 1 to 6 ( Fig. 2a), we analyzed whether specific transcription factors can be identified that regulate these proteins (Table 3). Transcription factor binding motifs were predicted in Cytoscape using the iRegulon plugin (version 1.3) [27]. We used proteins from cluster 1 to 6 as input proteins for motif and track search in the cis-regulatory control elements of genes of those proteins.
We found that Creb3/Creb3L1, ESRRA, YY, ELF2, STAT and XBP1 were the central transcription factors that could explain the highly connected upregulated cluster-proteins 1 to 6 respectively as shown in Fig. 2a, b and Table 3. These TFs were previously reported to be upregulated in oral cancer [33][34][35][36] [37][38][39][40]. Targeting these TFs may alter the expression of many proteins. One of these TFs, STAT is known as the vital transcription factor regulating genes of the Type I interferon pathway/viral process. In our study, we found that expression of STAT1 was increased in OSCC tumor samples and secreted in HNSCC cell line secretome as well (Table 4).

OSCC tissue protein candidates predicted to be secreted
To further explore the process of secretion, we determined which of the deregulated tissue proteins are predicted to have a signal peptide using the SignalP database [28]. To gain more insight in the presumed subcellular localization of the proteins, the annotated subcellular localization was retrieved from the IPA database (Ingenuity ® Systems www. ingen uity. com). Interestingly, when analyzing the 205 differentially expressed proteins (p < 0.01) of OSCC tumor tissue vs. paired normal adjacent tissue 38% of the proteins were predicted to be secreted, either via classical or non-classical secretion mechanisms (annotated in Additional file 11. Table S10).
Overall, the proportion of proteins from the various subcellular origins was comparable between the OSCC tissues vs. NATs samples except for the proportion of cytoplasmic proteins̴ (~ 50%) that was much higher in OSCC tissues.

Protein profiling of HNSCC cell line secretomes and comparison to OSCC tissue
Biological annotation revealed that the secretome pathway was highly enriched in OSCC tissues compared to normal mucosa. However, the differences in tissue architecture might have impacted these data. To further explore the potential of the secretome and annotate OSCC-associated proteins as candidate biofluid markers, we performed proteomics on the secretomes of HNSCC cell lines. We selected a diverse panel of 9 HPV-negative HNSCC cell lines to capture the complex tumor biology as much as possible. Cancer-associated proteins that are released may be more likely to be detectable in body fluids such as saliva [41]. Using an in-depth workflow based on gel fractionation coupled to nanoLC-MS/MS [42], we profiled the secretomes of HNSCC cell lines (Table 2). Prior to mass-spectrometry analysis, input quantities were checked by Coomassie blue-stained SDS-gel (Additional file 1: Fig. S3). A total of 4472 cancer cell secretome proteins were identified (Additional file 5: Table S4), of which on average ~ 3500 proteins per sample. Of these, 1724 secretome proteins were identified (> 5 average counts) in all HNSCC secretomes (Additional file 6: Table S5). Overlap analysis of these robustly identified secretome proteins with the differential OSCC tissue proteins revealed 132 promising proteins as candidate non-invasive markers for HPV-negative HNSCCs. Underscoring their value as potential OSCC marker, 131 proteins were also cancer-associated in the large-scale proteomics analysis of OSCC tissue recently reported by Huang et al. 2021 (Additional file 7: Table S6) [7]. In our list of most overexpressed proteins in OSCC tissue or HNSCC secretomes, transferrin receptor (TFRC) was the most differentially expressed protein in tumors as compared to normal mucosa. Human Protein Atlas data expression of TFRC is high in HNSCC and it is also found abundant across different cancers, indicating that this is a common protein involved in multiple cancers. TFRC plays an essential role in the cellular uptake of iron. TFRC is related to lysosomes/endosomes, which can be secreted into saliva and blood (Additional file 1: Fig. S4a-c). In previous studies, TFRC expression rate in OSCC was found to be substantially higher than in dysplasia [43]. Interestingly, functional analysis in vitro and in vivo showed that an anti-TFRC antibody that blocks the interaction between transferrin and TFRC and consequently inhibits iron uptake, lead to the suppression of cell growth and induced apoptosis via iron deprivation [43]. Altogether the results of us and others suggest that TFRC may serve as a promising, cancer biology-linked biomarker for noninvasive diagnostics in OSCC.
Strikingly, only 23% of the 132 overlapping OSCC tissue and HNSCC secretome candidates had a predicted signal peptide in their sequence (Fig. 3a-d, Additional file 8: Table S7). The remaining 16% of the proteins did not contain a classical signal peptide but were predicted to be secreted via non-classical routes, as predicted by the SecretomeP algorithm, which is based on other sequence-derived features and their subcellular localization [28]. Remarkably, among the proteins significantly more abundant in the HNSCC cell lines secretomes, the proportion of nuclear proteins was 30%. The biological background of this observation seems an enigma, but these secreted proteins might also serve as non-invasive biomarkers.

Promising candidate biomarker for distinct clinical applications
Non-invasive detection of OSCC might improve diagnosis of oral cancer. Therefore, we further annotated the 132 promising proteins from our OSCC tissue proteome (p < 0.01; > 2 upregulated) that are likely to be secreted (identified in all 9 cancer cell line secretomes with an average abundance of > 5 counts), for their potential use as non-invasive biomarker. To this end, we explored detectability in public proteomics datasets; salivary proteome data of normal mucosa vs. OSCC datasets [8], human salivary proteome [9] and the Normal Saliva Proteome database (https:// saliv arypr oteome. nidcr. nih. gov/) (Additional file 9: Table S8). Importantly, 106 out of our 132 candidates (Fig. 4, Additional file 10: Table S9), have been identified as  [8], human salivary proteome [9] and Normal saliva proteome (https:// saliv arypr oteome. nidcr. nih. gov/) potential OSCC biofluid markers, with 25 top candidates detected in all 3 studies (Table 4). Therefore, these proteins may have the potential to be further exploited for developing a non-invasive biomarker test for the detection or prognosis of OSCC. Of these, THBS2, LGALS3BP and DNAJB11 were potentially useful salivary markers for the detection of OSCC as reported previously [41,[44][45][46].

Conclusions
In conclusion, using comparative analysis of matched OSCC and paired normal mucosa tissues we identified proteins with different expression levels that form highly connected biological networks associated with a variety of distinct functional pathways, which may contribute to pathogenesis. Bioinformatics and overlap with cancer cell secretome and public saliva data identified an interesting set of OSCC proteins involved in secretory pathways and potential non-invasive biomarkers to explore for their ability to detect OSCC. Protein TFRC, the protein most abundantly expressed in OSCC cancer tissues, and identified in all cancer cell lines secretomes, is an especially promising candidate non-invasive biomarker that was previously not reported and merits further evaluation for its potential for the detection of OSCC.