Identification of potential biomarkers of head and neck squamous cell carcinoma using iTRAQ based quantitative proteomic approach

Head and neck squamous cell carcinoma (HNSCC) is one of the most common cancers in India. Despite improvements in treatment strategy, the survival rates of HNSCC patients remain poor. Thus, it is necessary to identify biomarkers that can be used for early detection of the disease. In this study, we employed iTRAQ-based quantitative mass spectrometry analysis to identify dysregulated proteins in a panel of HNSCC cell lines. We identified 2468 proteins, of which 496 were dysregulated in at least two out of three HNSCC cell lines compared to immortalized normal oral keratinocytes. We detected increased expression of replication protein A1 (RPA1) and heat shock protein family H (Hsp110) member 1 (HSPH1) in HNSCC cell lines compared to control. The differentially expressed proteins were further validated using parallel reaction monitoring (PRM) and western blot analysis in HNSCC cell lines. Immunohistochemistry-based validation using HNSCC tissue microarrays revealed overexpression of RPA1 and HSPH1 in 15.7% and 32.2% of the tested cases, respectively. Our study illustrates quantitative proteomics as a robust approach for identification of potential HNSCC biomarkers. The proteomic data have been submitted to the ProteomeXchange Consortium (http://www.proteomecentral.proteomexchange.org) via the PRIDE public data repository and are accessible using the data identifier PXD009241.

Value of the data
This data set provides insights into proteomic alterations in HNSCC cell lines compared to a normal oral keratinocyte cell line, and validates candidate biomarkers using PRM-based targeted proteomics and IHC.
The data provide a useful resource for studying proteins altered in HNSCC and will aid in the identification of early detection biomarkers.

Data
The data represent iTRAQ-based quantitative proteomic analysis of three HNSCC cell lines (CAL 27, FaDu and JHU-O28) compared to immortalized normal oral keratinocytes (OKF6/TERT1). A representative workflow of the study is depicted in Supplementary Fig. 1A. The study led to the identification of 2468 proteins, among which 496 proteins were dysregulated (fold change ≥ 2) in at least two out of three HNSCC cell lines. The complete list of proteins and peptides identified in this study is provided in Supplementary Table 1. Overexpression of RPA1 and HSPH1 identified in HNSCC cell lines in comparison to OKF6/TERT1 was validated using PRM and western blot. Further, the relative abundance of RPA1 and HSPH1 was assessed in HNSCC primary tissue compared to normal head and neck tissue samples using IHC.

Sample preparation for LC-MS/MS analysis
Sample preparation for mass spectrometry analysis was performed as described previously [3]. Each cell line was grown to 70% confluency. The cells were then maintained in serum-free growth media for 12 h prior to harvesting for protein extraction. Cells were lysed using lysis buffer (2% SDS, 5 mM sodium fluoride, 1 mM β-glycerophosphate and 1 mM sodium orthovanadate in 50 mM triethyl ammonium bicarbonate (TEABC)). Protein concentration was determined using the bicinchoninic acid (BCA) assay (Pierce, Waltham, MA) [4]. Filter-assisted sample preparation (FASP) was used to reduce the SDS concentration in the sample as described earlier [5]. Briefly, equal amounts of protein from each cell line were reduced using tris(2-carboxyethyl)phosphine (TCEP) at 60°C for 1 h and alkylated using methyl methanethiosulfonate (MMTS) for 20 min at room temperature. The samples were then digested using trypsin (Promega, San Luis Obispo, CA) overnight at 37°C.

iTRAQ labeling and SCX fractionation
Peptides from each cell line were labelled using the iTRAQ 8-plex kit (AB SCIEX, Washington, D.C.) in two technical replicates as per the manufacturer's instructions. OKF6/TERT1 was labelled using iTRAQ labels 113 and 114, CAL 27 with 115 and 116, FaDu with 117 and 118, and JHU-O28 with 119 and 121. The iTRAQ-labelled samples were pooled, dried and subjected to SCX fractionation, as described earlier [6]. Briefly, the samples were reconstituted in 25% acetonitrile and adjusted to pH 2.8 using orthophosphoric acid. Sample fractionation was carried out on polysulfoethyl A column

LC-MS/MS
iTRAQ based LC-MS/MS analysis was carried out using LTQ-Orbitrap Velos mass spectrometer (Thermo Scientific, Bremen, Germany) interfaced with Easy-nLC II system (Thermo Scientific, Bremen, Germany). Samples were loaded on a trap column (75 μm × 2 cm, Magic C18Aq, 5 μm, 100Å) and washed using Solvent A (0.1% formic acid) at a flow rate of 3 μl/min. Samples were then resolved on an analytical column (75 μm × 12 cm, Magic C18 Aq, 3 μm, 100Å) at 350 nl/min flow rate using a linear gradient of 7-30% of Solvent B (0.1% formic acid in 95% acetonitrile) over 75 min. MS and MS/MS scans were at a mass resolution of 60,000 at 400 m/z and 15,000 at 400 m/z, respectively. Full MS scans were acquired in m/z range of 350-1800 m/z. For each cycle, twenty most abundant precursor ions with charge state ≥ 2 were sequentially isolated. Higher energy collision dissociation was used as the activation method with 40% normalized collision energy. Isolation width used was 2 m/z. Precursor ions with single charge or unassigned charges were rejected.

Data analysis
Proteome Discoverer (Version 1.4) software suite (Thermo Scientific, Bremen, Germany) was used for protein identification and quantification. Mass spectrometry data were queried against the NCBI Human RefSeq protein database (Version 81) using the Sequest and Mascot (version 2.4) search algorithms. Oxidation of methionine was set as a dynamic modification in the search parameters, while static modifications included carbamidomethylation at cysteine and iTRAQ 8-plex modification at the N-terminus of the peptide and at lysine. Trypsin was specified as the protease with one missed cleavage allowed. The precursor mass tolerance was set to 10 ppm and the fragment mass tolerance was set to 0.1 Da. Decoy database searches were carried out to calculate the false discovery rate (FDR). A 1% FDR cut-off at the PSM, peptide and protein levels was applied for protein identification. The protein abundance ratios in HNSCC cell lines were obtained as follows: CAL 27 (115+116)/OKF6/TERT1 (113+114), FaDu (117+118)/OKF6/TERT1 (113+114) and JHU-O28 (119+121)/OKF6/TERT1 (113+114). The representative spectra of two proteins, RPA1 and HSPH1, are provided in Supplementary Fig. 1B.
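The ratio calculation above can be illustrated with a minimal Python sketch. It is not the Proteome Discoverer implementation; the function names and the reporter ion intensities are hypothetical. It also assumes that "dysregulated" covers both a ratio of at least 2 and a ratio of at most 0.5, and that the two cell lines must change in the same direction, neither of which is stated explicitly in the text.

```python
from typing import Dict

# Reporter ion channels as assigned in the labeling scheme.
CONTROL = (113, 114)  # OKF6/TERT1
CELL_LINES = {
    "CAL 27": (115, 116),
    "FaDu": (117, 118),
    "JHU-O28": (119, 121),
}


def abundance_ratios(intensities: Dict[int, float]) -> Dict[str, float]:
    """Return (sum of cell-line channels) / (sum of control channels) per cell line."""
    control = sum(intensities[ch] for ch in CONTROL)
    if control <= 0:
        raise ValueError("control reporter intensity must be positive")
    return {
        name: sum(intensities[ch] for ch in channels) / control
        for name, channels in CELL_LINES.items()
    }


def is_dysregulated(ratios: Dict[str, float], fold: float = 2.0, min_lines: int = 2) -> bool:
    """Assumed rule: >= fold up, or <= 1/fold down, in at least min_lines cell lines."""
    up = sum(r >= fold for r in ratios.values())
    down = sum(r <= 1.0 / fold for r in ratios.values())
    return up >= min_lines or down >= min_lines


# Hypothetical reporter intensities for a single protein (illustration only).
example = {113: 100.0, 114: 100.0, 115: 250.0, 116: 250.0,
           117: 200.0, 118: 220.0, 119: 100.0, 121: 90.0}
example_ratios = abundance_ratios(example)
print(example_ratios, is_dysregulated(example_ratios))
```

In this invented example, the protein passes the filter because two of the three cell lines exceed the 2-fold threshold.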

Parallel reaction monitoring (PRM)
Select proteins identified from the mass spectrometry-based proteomic analysis were validated using PRM on a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). The full scan event was carried out over a mass range of 120.0 to 1240.0 m/z with an ion injection time of 120 ms and a resolution of 17,500. The PRM scans were carried out in MS2 mode with 1 microscan at a resolution of 17,500. The target AGC was set to 2 × 10⁵ with an isolation window of 0.7 m/z. The data were acquired using an isolation offset of 0.2 m/z with a first fixed mass of 120.0 m/z. The normalized collision energy was set to 27%. The data were acquired in triplicate and analysed using Skyline software (version 3.5) [7] (Fig. 1A, Table 1). Statistical significance (p-value) for the fold changes was calculated using a two-sample t-test.
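As an illustration of the two-sample t-test on triplicate PRM measurements, the following Python sketch uses only the standard library. The text does not say whether equal variances were assumed; this sketch assumes the pooled-variance (Student) form. The function names are hypothetical, the peak areas are invented for illustration, and the p-value is obtained by numerically integrating the t density rather than by the method used in the study.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, degrees of freedom, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    if se == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t, df)


# Invented triplicate peak areas (arbitrary units), for illustration only.
tumor_line = [4.0, 5.0, 6.0]
control_line = [1.0, 2.0, 3.0]
print(student_t_test(tumor_line, control_line))
```

With three replicates per group the test has 4 degrees of freedom, so modest fold changes with noisy replicates may not reach significance.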

Immunohistochemistry (IHC)
Immunohistochemistry on commercially procured tissue microarrays (TMA) was performed as described previously [3]. HNSCC TMAs were purchased from US Biomax (Cat # HN242a and HN483; US Biomax Inc., Derwood, MD). Briefly, the formalin-fixed paraffin-embedded tissue sections were deparaffinised, followed by antigen retrieval using citrate buffer (0.01 M trisodium citrate buffer, pH 6) for 20 min. To inhibit the activity of endogenous peroxidases, the sections were quenched with blocking solution (methanol:water mixed in the ratio of 3:1) for 20 min. The sections were then washed using phosphate buffered saline and treated with 5% goat serum to block non-specific binding of antibodies. The sections were incubated with mouse monoclonal anti-RPA1 antibody (Cat # sc-48425; Santa Cruz Biotechnology) and polyclonal rabbit anti-HSPH1 antibody (Cat # HPA028675; Sigma) overnight at 4°C. Following this, the sections were treated with secondary antibody conjugated with horseradish peroxidase (Bangalore Genei, India) before the addition of DAB chromogen (Dako, Glostrup, Denmark). Once the colored substrate was formed, the reaction was quenched using water and the sections were counterstained using hematoxylin solution. The tissue sections were examined under the microscope by two expert pathologists independently and scored based on the expression of proteins in the tissue sections. The tissue expression of RPA1 and HSPH1 was scored on a scale of 0-3, with 0 representing absence of staining, +1 representing weak staining, +2 representing moderate staining, and +3 representing intense staining (Fig. 1C; Tables 2A and 2B). A two-tailed Chi-square test was carried out to evaluate the statistical significance (p-value) of RPA1 and HSPH1 expression in HNSCC.
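A Chi-square test of this kind can be sketched in Python for a 2 × 2 table of staining categories against tissue type. The grouping of IHC scores into two categories, the counts, and the function name are all hypothetical and are not taken from Tables 2A and 2B. The sketch assumes no continuity correction and uses the fact that, for one degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson Chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts, for illustration only:
# rows = tumor / normal tissue, columns = high staining / low or no staining.
chi2, p = chi_square_2x2(20, 10, 10, 20)
print(chi2, p)
```

When any expected cell count is small, Fisher's exact test is often preferred over the Chi-square approximation; the text does not state whether that applied here.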