Quantitative Shotgun Proteomics Unveils Candidate Novel Esophageal Adenocarcinoma (EAC)-specific Proteins

Esophageal cancer is the eighth most common cancer worldwide and the majority of patients have systemic disease at presentation. Esophageal adenocarcinoma (OAC), the predominant subtype in western countries, is largely resistant to current chemotherapy regimens. Selective markers are needed to enhance clinical staging and to allow targeted therapies yet there are minimal proteomic data on this cancer type. After histological review, lysates from OAC and matched normal esophageal and gastric samples from seven patients were subjected to LC MS/MS after tandem mass tag labeling and OFFGEL fractionation. Patient matched samples of OAC, normal esophagus, normal stomach, lymph node metastases and uninvolved lymph nodes were used from an additional 115 patients for verification of expression by immunohistochemistry (IHC). Over six thousand proteins were identified and quantified across samples. Quantitative reproducibility was excellent between technical replicates and a moderate correlation was seen across samples with the same histology. The quantitative accuracy was verified across the dynamic range for seven proteins by immunohistochemistry (IHC) on the originating tissues. Multiple novel tumor-specific candidates are proposed and EPCAM was verified by IHC. This shotgun proteomic study of OAC used a comparative quantitative approach to reveal proteins highly expressed in specific tissue types. Novel tumor-specific proteins are proposed and EPCAM was demonstrated to be specifically overexpressed in primary tumors and lymph node metastases compared with surrounding normal tissues. This candidate and others proposed in this study could be developed as tumor-specific targets for novel clinical staging and therapeutic approaches.

Esophageal cancer is the sixth leading cause of cancer death worldwide (1) and esophageal adenocarcinoma (EAC) has become the predominant histological subtype in western countries (2,3). In the UK, 95% of patients diagnosed with EAC will die from metastatic disease and the majority are resistant, at presentation, to current platinum-based chemotherapy regimens (4-6).
EAC is frequently associated with both lymphatic and distant metastases yet current staging modalities including computed tomography (CT), positron emission tomography (PET) and endoscopic ultrasound (EUS) are limited in both sensitivity and specificity (5). Surgical resection only benefits patients with localized disease and carries a 40% risk of major morbidity and 2-3% risk of perioperative mortality (7,8). The development of accurate noninvasive imaging markers of EAC would enhance clinical staging by allowing the specific detection of locoregional and distant metastases, enabling treatment stratification (9).
The normal squamous epithelium-lined esophagus is vulnerable to toxic insult from the esophageal lumen. Indeed, chronic reflux of gastric acid and bile is thought to underlie the development of columnar metaplasia, "Barrett's esophagus", the precursor lesion of EAC (10). Although the exact molecular mechanisms of Barrett's development and esophageal carcinogenesis remain obscure, the detection and treatment of EAC at an early stage offers the prospect of long term cure with over 80% of patients undergoing surgery for stage I esophageal cancer surviving 5 years (11).
Intriguingly many of the genetic mutations present in EAC have also been demonstrated in nondysplastic Barrett's epithelium raising the possibility that a transcriptional change such as splicing or RNA-editing, or a post-translational modification is responsible for transformation (12). If such a biomarker could be identified this would offer the possibility of earlier diagnosis and more effective treatment.
Characterizing the proteomic changes associated with EAC may also allow novel therapies to be designed. Tumor-specific proteins have been exploited as immunotherapeutic targets in other cancer types by engendering a host response to the cancer (13), in some cases leading to durable responses (14).
To date, no specific markers of EAC have been identified. To identify candidate proteins de novo, expression must be measured using untargeted proteomic methods.
Quantitative proteomic methods have now been applied across many cancer tissues. Most previous proteomic studies in EAC, however, have only identified a small number of dysregulated proteins limiting the comparisons that can be made between studies or with other cancers (summarized in Table I). Only one of these previous studies employed a quantitative shotgun proteomic strategy. The authors compared pooled biopsies of EAC, normal esophagus, gastric adenocarcinoma and normal gastric tissue and identified 972 proteins. Although no EAC-specific protein was identified, neutrophil defensin 1, an antimicrobial peptide found in neutrophil granules, was overexpressed in both cancer types relative to normal tissue (15). This may reflect the inflammatory environment associated with these cancers.
Comparisons between EAC and normal squamous epithelium in published work reveal many dysregulated proteins, some associated with glandular differentiation and some with carcinogenesis. Glandular-associated proteins may be expressed in gastric and intestinal epithelium and may not represent tractable targets for therapy because toxicity from intestinal epithelial damage would be expected. Including columnar epithelium-lined gastric tissue along with squamous and EAC tissue may enable the discrimination of proteins that reflect glandular differentiation from those driving carcinogenesis.
Multitissue proteomic profiling has been applied across mouse tissues with relative quantitation using a super-SILAC approach (16). In that study, snap-frozen biopsies from 28 tissue types were subjected to shotgun proteomics with a spike-in, heavy-labeled mixture of all tissues obtained from the SILAC mouse. By comparing the relative expression of proteins across tissues, tissue-specific expression could be highlighted. The esophagus was not included in this profiling effort although gastrointestinal tissues with columnar epithelia showed similar expression patterns (16). This comparative approach has also been employed in a large proteomic study of 30 human tissues by label-free quantification, and again tissue-specific expression patterns were identified (17).
This biomarker discovery study therefore used a quantitative shotgun proteomic strategy to evaluate protein expression in EAC and adjacent matched normal squamous and gastric tissues from seven patients. By quantifying the relative expression between EAC and normal esophagus and EAC and normal stomach, proteins aberrantly expressed in EAC were identified. The accuracy of this approach was confirmed by immunohistochemistry for multiple candidates and a potential tumor biomarker was verified in a cohort of 115 patients with resected EAC and matched normal and metastatic tissues.

EXPERIMENTAL PROCEDURES
Experimental Design and Statistical Rationale-Fresh frozen biopsies representing macroscopically normal esophagus, normal stomach and esophageal adenocarcinoma tissue were prospectively collected from resection specimens from seven patients undergoing neoadjuvant chemotherapy and attempted curative surgery for locally advanced esophageal and esophagogastric junctional cancer at the Royal Infirmary of Edinburgh between 2010 and 2012. Local institutional ethical and research and development approvals were in place (REC references 06/S1101/16 and 10/S1402/33) (R&D ID 2006/W/PA/ 01). All patients gave informed consent and participants and their donated samples were de-identified at the time of recruitment. Patients were selected for relative clinical homogeneity with respect to known prognostic variables including lymphatic metastasis and tumor differentiation (18). The clinical characteristics of the cohort are presented in Table II.
At the commencement of this study, no shotgun proteomic data were available for esophageal adenocarcinoma tissue to inform a power calculation for sample size determination. The sample number was therefore based on previous esophageal discovery-phase proteomic studies or studies in similar tissue types (19-22), and the availability of high quality clinical material.
Because the small sample size carries a risk of false positives, proposed tumor-specific proteins identified by mass spectrometry were additionally verified by immunohistochemistry (IHC) using cores from archival tumors and matched normal and metastatic tissues from an independent cohort of 115 patients with esophageal or esophagogastric junction (EGJ) adenocarcinoma (clinical characteristics in supplemental Table S1).
Sample Processing-The sample processing workflow is summarized in Fig. 1. Fresh tissue biopsies were snap frozen within 30 min of tumor extirpation and maintained in liquid nitrogen or on dry ice until lysis. Frozen sections from each biopsy were reviewed by a consultant histopathologist to confirm the histological diagnosis and, for tumor biopsies, a minimum of 50% tumor cellularity.
The published Filter-Aided Sample Preparation (FASP) method was adapted for protein extraction and tryptic digestion from esophagogastric tissue (23).
Biopsies between 30 mg and 60 mg in weight were maintained on dry ice until rapid disruption at room temperature (RT) in low-binding micro-centrifuge tubes containing 1 mm ceramic beads (Matrix D, MP Bio, Santa Ana, CA) by rapid shaking in a bench-top homogenizer (FastPrep-24, MP Bio) for 40 s at 6 m/s. Homogenates were dissolved in FASP lysis buffer (4% w/v SDS, 100 mM Tris/HCl, 100 mM DTT, pH 7.6), mixed for 20 min at RT and sonicated at maximum amplitude, for 30 s, on ice, using a needle sonicator (Bioruptor, Diagenode, Liège, Belgium). Sonicated lysates were heated for 5 min at 95°C and clarified by centrifugation at 14,000g for 5 min at 20°C before buffer-exchange as per the published FASP protocol (23). Trypsinization was performed off-column, overnight at 37°C at a ratio of 100:1 (lysate protein mass : trypsin mass) using sequencing grade modified trypsin (Promega, Madison, WI) as per manufacturer's instructions. Protein concentration was determined using a modified Lowry procedure as per the manufacturer's recommendations (RC-DC, Bio-Rad, Hercules, CA).
Isobaric Labeling and Fractionation of Tryptic Peptides-Tryptic peptides from each tissue sample were independently labeled with one of the 6 Tandem Mass Tag (TMT) reagents (Thermo Scientific, San Jose, CA) in technical duplicate per manufacturer's instructions (Fig. 1). Labeled peptides from a single patient (6 reporter ions) were pooled, desalted on a Macro SpinColumn C18 (Harvard Apparatus, Holliston, MA) and separated into 24 fractions by OFFGEL electrophoresis as previously described (24).
For ultraperformance liquid chromatography (UPLC), peptides were separated using a variable solvent gradient created by a combination of 0.1% (v/v) formic acid (FA) in dH2O (solvent A) and 0.1% (v/v) FA in acetonitrile (solvent B). The gradient was run as follows: 0-1 min, 95% (A) and 5% (B); 1-56 min, 65% (A) and 35% (B); 66-76 min, 20% (A) and 80% (B), using a flow rate of 220 nL/min.
Mass Spectrometry-Peptides were analyzed in positive ion mode after electrospray ionisation on an LTQ-Orbitrap Velos mass spectrometer (Thermo Scientific, San Jose, CA). For MS survey scans, the Orbitrap (OT) resolution was set to 60,000 and the ion population was set to 5 × 10^5 with an m/z window from 400 to 2000. A maximum of three precursor ions with the greatest peak intensities were selected for both collision-induced dissociation (CID) and high-energy C-trap dissociation (HCD) in the LTQ with analysis in the OT. For fragment ion analysis in the LTQ, the ion population was set to 7 × 10^3 (isolation width of 2 m/z) whereas for detection in the OT, the ion population was set to 2 × 10^5 (isolation width of 2.5 m/z), with resolution of 7500, first mass at m/z = 100, and maximum injection time of 750 ms. The normalized collision energies were set to 35% for CID and 60% for HCD.
Protein Identification-Protein identifications were made using the Easyprot platform (v2.3 build 720, Swiss Institute of Bioinformatics) (25). Thermo RAW files were converted to peak lists using ReAdW (version 4.3.1, ThermoFinnigan) and CID and HCD spectra were merged for simultaneous identification and quantification as previously described (26). Peaklist files were searched against the Uniprot human reference proteome (release 09/01/2013, containing 87,613 entries) using Phenyx® (version 2.6.1, GeneBio) (27) with a precursor ion tolerance of 10 parts per million and a fragment ion tolerance of 0.6 Da. Variable peptide modifications included TMT-modified N termini and lysines (additional 229.1629 Da) and oxidized methionines, with carbamidomethylation of cysteines set as a fixed modification. Trypsin was selected as the digestion enzyme, with one potential missed cleavage permitted. A minimum of a single tryptic terminus, a peptide length of 6 amino acids and a z-score of 4 were required.
All data sets were searched separately, once using a forward and once using a reversed protein database. The peptide false discovery rate was set to 1%. A single unique peptide was accepted for protein identification. Identified peptide sequences, scores, precursor m/z, corresponding proteins, protein coverage and raw data files have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (28) with the dataset identifier PXD004962.
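The forward and reversed searches described above can be converted into a score cutoff in several ways, and the text does not state which calculation Easyprot applies. As a minimal sketch, assuming the FDR at a cutoff is estimated as the number of reversed-database hits divided by the number of forward-database hits scoring at or above that cutoff, the most permissive cutoff meeting a 1% FDR could be found as follows. The function name and the toy scores are hypothetical.

```python
def score_cutoff_for_fdr(forward_scores, reversed_scores, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    Assumption: FDR(cutoff) = (# reversed hits >= cutoff) / (# forward hits >= cutoff).
    Returns None when no cutoff satisfies the requested FDR.
    """
    # Candidate cutoffs are the observed forward scores, tried from lowest to
    # highest so that the first passing cutoff is the most permissive one.
    for cutoff in sorted(set(forward_scores)):
        n_forward = sum(score >= cutoff for score in forward_scores)
        n_reversed = sum(score >= cutoff for score in reversed_scores)
        if n_forward > 0 and n_reversed / n_forward <= max_fdr:
            return cutoff
    return None


# Toy example: 200 forward hits with scores 1..200 and five low-scoring reversed hits.
forward = list(range(1, 201))
reversed_hits = [1, 2, 3, 4, 5]
cutoff = score_cutoff_for_fdr(forward, reversed_hits)
```

In this toy case a cutoff of 4 still admits two reversed hits against 197 forward hits (just over 1%), so the first acceptable cutoff is 5.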
Protein Quantitation-For relative protein quantitation, the TMT reporter ion intensities were extracted for each peptide. An isotopic purity correction was performed within Easyprot for each reporter based on the isotopic distribution of the sixplex-TMT reporters provided by the manufacturer.
The ratios of peptide expression between EAC and normal esophagus (TvE) and EAC and normal gastric epithelium (TvG) were calculated for each peptide as the ratio of 126/128 reporter ion intensities and 126/130 reporter intensities respectively. This was repeated for 127/129 (TvE) and 127/131 (TvG) reporter ions as a technical replicate. The geometric mean peptide TvE and TvG ratios were calculated for each protein to derive an estimate of the relative protein expression between patient-matched tissue types while limiting the skew introduced by outlier ratios (29). Because of the ambiguity from potentially shared peptides, all protein isoforms were grouped under the parent protein identifier (Uniprot accession). The variance of each ratio and the number of peptides contributing to the mean were used for subsequent significance calculations.
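The ratio calculation described above can be illustrated with a short sketch. It assumes that each peptide is represented as a mapping from TMT reporter channel to intensity, that the geometric mean is computed as the mean of log2 ratios, and that peptides with a zero reporter intensity in either channel are skipped; these representation choices and all function names are illustrative assumptions and are not taken from Easyprot.

```python
import math


def log2_ratios(peptides, numerator, denominator):
    """log2(numerator/denominator) reporter ratio for each usable peptide.

    Assumption: peptides lacking signal in either channel are skipped,
    because a ratio cannot be calculated for them.
    """
    return [
        math.log2(p[numerator] / p[denominator])
        for p in peptides
        if p[numerator] > 0 and p[denominator] > 0
    ]


def summarize_protein(ratios_log2):
    """Geometric mean ratio, log2-scale sample variance and peptide count."""
    n = len(ratios_log2)
    if n == 0:
        return None
    mean_log2 = sum(ratios_log2) / n
    variance = (
        sum((r - mean_log2) ** 2 for r in ratios_log2) / (n - 1) if n > 1 else None
    )
    return {
        "geometric_mean_ratio": 2 ** mean_log2,
        "log2_variance": variance,
        "n_peptides": n,
    }


# Hypothetical reporter intensities for three peptides of one protein.
peptides = [
    {"126": 200.0, "128": 100.0, "130": 400.0},
    {"126": 800.0, "128": 100.0, "130": 400.0},
    {"126": 500.0, "128": 0.0, "130": 250.0},  # no esophagus signal: no TvE ratio
]
tve = summarize_protein(log2_ratios(peptides, "126", "128"))  # EAC vs esophagus
tvg = summarize_protein(log2_ratios(peptides, "126", "130"))  # EAC vs stomach
```

Here the two usable TvE ratios (2 and 8) give a geometric mean of 4, whereas their arithmetic mean would be 5, which shows how the geometric mean damps the influence of the larger ratio.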
Deriving a Mean Expression Ratio across Replicates-A mean expression value was derived across pooled technical and biological replicates by a meta-analysis approach using a fixed-effect model (30). The inverse of the variance in peptide reporter ratios was used to weight the contribution of protein expression from a replicate to the mean. In this manner, replicates with many peptides detected with similar relative tissue expression (low variance) for a protein contribute a greater proportion to the mean expression value (see supplementary Methods). The unweighted arithmetic mean and variance of the ratios across experiments was calculated for proteins identified by a single unique peptide in multiple replicates. The log2-transformed protein expression ratios (TvE and TvG) were median normalized to account for systematic errors during TMT labeling such as minor variations in protein loading.
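The weighting scheme described above follows the general form of an inverse-variance, fixed-effect pooled estimate. The sketch below shows that general form together with median normalization; the exact variance term used by the authors is defined in their supplementary Methods, so the per-replicate variances here are simply taken as given, and the function names and numbers are hypothetical.

```python
import statistics


def fixed_effect_mean(estimates, variances):
    """Inverse-variance weighted mean of per-replicate log2 ratios.

    Returns (pooled_estimate, pooled_variance), where each weight is 1/variance
    and the pooled variance is the reciprocal of the summed weights.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("estimates and variances must be non-empty and equal in length")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    return pooled, 1.0 / total_weight


def median_normalize(log2_ratios):
    """Shift log2 ratios so that their median is zero."""
    center = statistics.median(log2_ratios)
    return [r - center for r in log2_ratios]


# Two replicates of one protein: the low-variance replicate dominates the mean.
pooled, pooled_var = fixed_effect_mean([1.0, 3.0], [0.5, 1.5])
normalized = median_normalize([0.5, 1.5, 2.5])
```

With these toy values the pooled estimate is 1.5 rather than the unweighted 2.0, because the first replicate carries three times the weight of the second.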
Statistical Tests-The intertissue ratios (TvE and TvG) were compared with technical replicates (TvT, EvE and TvT, GvG) using Welch's modified t test (31) as the variances between them differed significantly (32); Fligner-Killeen test p < 10^-10, supplemental Fig. S1. Welch's t test was used to test the hypothesis that the relative protein expression between different tissues was not different from relative protein expression between technical replicates with "n" defined as the number of peptides contributing to the mean ratio (see supplementary Methods) (33). All p values were corrected using the Benjamini-Yekutieli method to control for multiple hypothesis testing (34). Significance was defined as a false discovery rate (FDR)-corrected p < 0.05 and p values were two-tailed. Relative quantitation and significance are provided for all identified proteins (uploaded supplementary file; All_quantitation.xlsx).
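The two statistical steps named above have standard textbook forms, sketched here using only the Python standard library. The first function computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from summary statistics; converting these to a p value requires the t distribution and is omitted. The second computes Benjamini-Yekutieli adjusted p values. How the authors defined the mean, variance and n for each comparison is given in their supplementary Methods, so the inputs below are hypothetical.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of each mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df


def benjamini_yekutieli(p_values):
    """Benjamini-Yekutieli adjusted p values, returned in the input order."""
    m = len(p_values)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


t_stat, df = welch_t(2.0, 4.0, 4, 0.0, 4.0, 4)
adjusted = benjamini_yekutieli([0.5, 0.001])
```

The harmonic factor makes this correction more conservative than Benjamini-Hochberg, which is the price of remaining valid under arbitrary dependence between tests.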
Immunohistochemistry-For verification of relative protein expression, 4 μm sections were cut from formalin-fixed, paraffin-embedded (FFPE) blocks derived from the same resection specimens used to collect the fresh tissue for this study and representing normal esophagus, normal stomach and EAC. Sections were subjected to immunohistochemical staining using standard techniques as previously described (35). Staining conditions were optimized for each antibody, included a no primary antibody control for each protein and are detailed in supplemental Table S2.
Western Blotting and RNA Interference-Lysates from each of the esophageal cell types growing under basal conditions were resolved by SDS-PAGE (20 μg protein per lane, 12% SDS gel) as previously described (37). Western blots were probed with primary antibodies directed against ARHGDIB (ab88317, Abcam, Cambridge, UK, 1/500 overnight at 4°C) or β-Actin (AC-15, Sigma, Dorset, UK, 1/5000, 2 h at RT) followed by secondary incubation with rabbit anti-mouse or goat anti-rabbit antibodies conjugated to either IRDye™ 680RD or IRDye™ 800CW (Li-Cor Biosciences, Lincoln, NE, 1/2000, 2 h at RT protected from light). Blots were imaged using the Odyssey SA system (Li-Cor Biosciences) as per manufacturer's recommendations.
For siRNA experiments, OE33 cells were transfected with vector, nontargeting scrambled sequence siRNA (siGENOME, Non-Targeting siRNA#3, Dharmacon, Lafayette, CO) or siRNA to ARHGDIB (siGENOME, SMARTPool, Dharmacon) as previously described (37). Cells were harvested 72 h after transfection and lysates resolved by Western blotting as described above.

RESULTS
A total of 6349 proteins were identified and quantified across all samples corresponding to 4772 unique Entrez GeneIDs with 744 proteins quantified in both replicates from all seven samples. The protein identifications per patient are shown in Table III.
Reproducibility of Quantitation-The reproducibility of quantification of protein expression was assessed by comparing technical replicates. Expression levels were highly correlated between technical replicates from the same patient's tissues (Fig. 2A, 2B, median Pearson correlation coefficient (PCC) = 0.9811, p < 0.001). There was also a moderate correlation between different patients (biological replicates) when TvE ratios were considered (Fig. 2C, median PCC = 0.555, p < 0.001), demonstrating concordance of the protein expression from histologically similar tissues within a relatively clinically homogeneous patient cohort. As expected, there was no significant correlation between TvE and TvG ratios across patients (Fig. 2C, median PCC = 0.0115, p > 0.05), underscoring the diversity in protein expression of the tissues studied.
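The reproducibility comparison above rests on the Pearson correlation coefficient between paired log2 ratios, summarized as a median across experiments. A minimal sketch of that calculation follows; the input values are hypothetical and the function name is illustrative.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 ratios for the same proteins in two technical replicates.
replicate_1 = [1.0, 2.0, 3.0]
replicate_2 = [2.0, 4.0, 6.0]
pcc = pearson(replicate_1, replicate_2)

# Summarizing several per-experiment coefficients by their median.
median_pcc = statistics.median([0.97, 0.98, 0.99])
```

Only proteins quantified in both replicates can contribute to such a coefficient, so the paired lists must be built from the shared protein identifiers.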
A Map of Protein Expression across Esophagogastric Tissues-Protein expression was quantified by the ratio of TMT reporter ion intensities. Ratios were not, however, calculable for proteins exclusively expressed in one tissue and these proteins may have been excluded from our analysis despite their biological importance. Several peptides were manually identified with no reporter ion expression from one or two tissue types. In the context of other unique peptides identified from the same protein, however, no proteins with entirely tissue-specific expression could be identified in this study.
FIG. 2. Correlation of technical replicates. A, Median-normalized log2 ratios are displayed for two technical replicates of samples from patient 44. TvE = log2 ratio of expression between EAC and normal esophagus. TvG = log2 ratio of expression between EAC and normal gastric tissue. B, Box and whisker plot summarizing the median (line within the box), 25th and 75th percentiles (box limits) and 5th and 95th percentiles (whiskers) for the Pearson's correlation coefficients (PCC) for technical replicates across all experiments (n = 14). C, Heatmap representation of PCC between technical replicates across all experiments. Unsupervised hierarchical clustering was performed using an agglomerative complete linkage method to generate the similarity dendrogram (left).
For the 4181 proteins identified by more than one peptide in more than one replicate, and the 2154 proteins identified by more than one peptide in a single replicate or a single peptide in more than one replicate, a mean expression value and variance were derived (uploaded supplementary file; All_quantitation.xlsx). Those 14 proteins identified by a single peptide and only in a single replicate were considered low confidence identifications and were excluded from further analysis. The expression ratios for the 3082 proteins significantly dysregulated between tissues (FDR-corrected p < 0.05 for either TvG or TvE ratios) were used to produce a two-dimensional protein expression map with vectors of TvG expression on the x axis and TvE on the y axis (Fig. 3). Proteins overexpressed in EAC would be expected to have both high TvE and TvG ratios and therefore be identified in the upper right quadrant of the protein expression plot, discrete from other nonspecifically expressed proteins. It was proposed that other proteins with tissue-specific expression patterns would also be closely associated on the plot. To test this hypothesis, proteins with an established expression pattern were considered. Gastric Intrinsic Factor (GIF) and Mucin 5AC are known to be specifically highly expressed in gastric epithelium (38,39) and were associated on the protein expression map and demonstrated similar expression in EAC and normal esophageal tissue but high expression in gastric epithelium (Fig. 3). Similarly, Keratins 4, 5, and 14 are known to be highly expressed in esophageal squamous epithelium (40,41) and were all clustered together. It was possible that the tissue expression pattern of other proteins could be inferred from their location on this map.
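The reading of the expression map described above, with TvG on the x axis and TvE on the y axis, can be expressed as a simple quadrant lookup. This is only an illustrative sketch: the cutoff of zero on the log2 scale, the labels and the example values are assumptions and are not thresholds used in the study.

```python
def map_quadrant(tve_log2, tvg_log2, threshold=0.0):
    """Label a protein by its position on the TvG (x) versus TvE (y) map.

    Assumption: a log2 ratio above `threshold` counts as higher in EAC.
    """
    up_vs_esophagus = tve_log2 > threshold
    up_vs_stomach = tvg_log2 > threshold
    if up_vs_esophagus and up_vs_stomach:
        return "upper right: higher in EAC than both normal tissues"
    if up_vs_esophagus:
        return "upper left: higher in EAC than esophagus only"
    if up_vs_stomach:
        return "lower right: higher in EAC than stomach only"
    return "lower left: not higher in EAC than either normal tissue"


# Hypothetical log2 ratios (TvE, TvG) for three proteins.
examples = {
    "tumor_candidate": (2.5, 1.8),
    "glandular_like": (2.0, -0.5),
    "squamous_like": (-3.0, 0.4),
}
labels = {name: map_quadrant(tve, tvg) for name, (tve, tvg) in examples.items()}
```

A fixed cutoff is a simplification; the clustering of known gastric and squamous proteins described above relies on relative position on the map rather than on a single threshold.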
To test this, a further two proteins predicted to be upregulated in squamous tissue (heat shock protein family B (small) member 1; HSPB1 and transglutaminase 3; TGM3), two proteins upregulated in tumor but also generally expressed (SAM and HD domain containing deoxynucleoside triphosphate triphosphohydrolase 1; SAMHD1 and Rho GDP dissociation inhibitor beta; ARHGDIB), two proteins upregulated in both tumor and gastric tissue (anterior gradient 2, protein disulfide isomerase family member; AGR2 and heat shock protein family A (Hsp70) member 5; HSPA5) and one protein predicted to be specifically highly expressed in tumor (epithelial cell adhesion molecule; EPCAM) (supplemental Table S3) were selected for verification by immunohistochemistry (IHC).
Verification of Protein-expression Differences Across EAC and Patient-matched Normal Tissues-Sections from the FFPE tissue blocks from the original resection specimens used to derive the fresh tissue samples for proteomic analysis were subjected to IHC (Fig. 4). All five of the proteins predicted to be upregulated in EAC compared with normal esophagus (SAMHD1, ARHGDIB, AGR2, HSPA5, EPCAM) showed higher expression in the tumor sections compared with squamous epithelium. Similarly, the proteins predicted to be expressed preferentially in squamous epithelium, HSPB1 and TGM3, showed the highest expression in normal squamous esophageal tissue (supplemental Table S3).
Although false positive homogeneous cytoplasmic staining was observed in all or scattered basal crypt epithelial cells in most gastric tissue sections (Fig. 4; No primary antibody) (42), both HSPA5 and AGR2 demonstrated true gastric epithelial staining as expected and were overexpressed in EAC compared with squamous epithelium. SAMHD1 showed more widespread nuclear expression across tissues but was mildly upregulated in EAC cells, as predicted. In contrast, EPCAM was very highly expressed in tumor cells with only moderate staining in basal epithelial squamous cells and gastric epithelial cells. Although overall expression of ARHGDIB was indeed higher in EAC compared with normal squamous and gastric tissue, this staining was observed in stromal cells, most likely lymphocytes, rather than epithelial-derived tumor cells.
FIG. 3 (caption fragment). Selected proteins, labeled by their official gene names, are highlighted in green for proteins known to be specifically expressed in gastric tissue, purple for proteins known to be expressed in squamous tissue and red for proteins selected for further validation by IHC.
These findings support the accuracy of the quantitative proteomic approach for each of the seven candidates selected. As expected from the proteomic data, EPCAM demonstrated the greatest specificity for tumor cells.
To determine the specificity of EPCAM for EAC cells compared with surrounding normal tissues, protein expression was determined by IHC using a tissue microarray consisting of normal gastric tissue, normal squamous tissue, uninvolved lymph nodes, involved lymph nodes and primary tumor samples from resection specimens from 115 patients who had undergone surgical resection for EAC (Fig. 5). EPCAM was expressed at low levels in basal squamous epithelial cells and low to moderate levels in gastric epithelium. In contrast, EPCAM was highly expressed in EAC and was expressed at higher levels than the median normal gastric or normal esophageal epithelial expression in 98% of tumors. This high specificity was demonstrated in metastatic lesions as well as primary tumors with no EPCAM expression detectable in normal lymph nodes but high expression in 93% of lymph node metastases.
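The comparison of tumor staining against median normal-tissue staining can be sketched as below. This assumes that a tumor counts as higher only when its histoscore exceeds both the median gastric and the median esophageal score, which is one possible reading of the text; the scores and the function name are hypothetical.

```python
import statistics


def fraction_above_normal(tumor_scores, gastric_scores, esophageal_scores):
    """Fraction of tumors scoring above both median normal-tissue scores.

    Assumption: the reference is the larger of the two normal-tissue medians.
    """
    reference = max(
        statistics.median(gastric_scores),
        statistics.median(esophageal_scores),
    )
    n_above = sum(score > reference for score in tumor_scores)
    return n_above / len(tumor_scores)


# Hypothetical histoscores (0-300 scale assumed for illustration).
tumors = [200, 250, 50]
gastric = [100, 120, 80]
esophagus = [10, 20, 30]
fraction = fraction_above_normal(tumors, gastric, esophagus)

# The 85 of 87 tumors reported in the Discussion corresponds to about 98%.
reported_fraction = 85 / 87
```

For the toy scores, two of three tumors exceed the gastric median of 100, giving a fraction of 2/3.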
This proteomic strategy has therefore demonstrated the ability to detect relatively specific markers of EAC. Those proteins predicted to be highly expressed in EAC compared with surrounding normal tissues, including several novel candidate therapeutic targets, are presented in more detail in Fig. 6.

DISCUSSION
There is a need to identify selective markers of esophageal adenocarcinoma for early diagnosis, to enhance clinical staging and direct novel therapies. This shotgun proteomic study compared protein expression from matched esophageal adenocarcinoma, normal esophagus and normal gastric samples from seven patients and provides quantitative data on protein expression for over 6000 proteins across these tissues. A comparative analysis approach was employed to select tumor-specific proteins and this method was verified to be accurate by IHC. Multiple novel tumor-specific proteins are proposed and EPCAM was demonstrated to be specifically overexpressed in primary tumors and lymph node metastases.
Incomplete proteome coverage remains a significant limitation of all proteomic studies, however, and only 744 proteins (12%) were detected in all technical replicates across the seven patients' tissues. Despite this intrinsic limitation of proteomic studies, the technical reproducibility of protein quantitation was very high and a strategy was developed to identify significantly dysregulated proteins.
The combination of TvE and TvG ratios for each quantified protein allowed the generation of a 2D expression map. The relative quantitative accuracy of this approach was confirmed first by evaluation of proteins with a well validated expression profile (GIF, Muc5AC, Keratin 4,5,14) and then subsequently by IHC for 7 further proteins with varied expression profiles (ARHGDIB, SAMHD1, AGR2, HSPA5, EPCAM, TGM3, HSPB1). In each case the observed staining pattern mirrored the expected expression from the proteomic data (supplemental Table S3). This provides confidence in the predicted expression for proteins with an unknown pattern.
Both TGM3 and HSPB1 were found to be expressed at high levels in squamous epithelium compared with EAC, as expected from our proteomic data and as reported in previous work (43). In contrast, AGR2 was found to be expressed in gastric epithelium and EAC with no expression observed in squamous epithelia. Similarly, HSPA5 was found to be expressed in both EAC and gastric epithelium. These findings agree with previous reports (44 -48). HSPA5 is thought to play a key role in the regulation of the unfolded-protein response and this expression pattern may reflect an increased protein chaperone demand in secretory cells (48,49). Indeed, AGR2 has a function in protein homeostasis and secretion (50). SAMHD1 exhibited nuclear expression in both epithelial and nonepithelial cells with the highest expression in EAC. It has a reported role in restricting HIV replication and modulating the immune response in T cells (51) but there is limited data for its role in cancer and this would be worthy of further study.
ARHGDIB was overexpressed in EAC sections compared with normal squamous and normal gastric tissue. The ARHGDIB positive cells appeared, however, to be lymphocytes rather than epithelial-derived tumor cells. This expression pattern has been observed for ARHGDIB with a different antibody (47) and a multitissue study suggested expression was restricted to hematopoietic cells (52).
Divergent roles for ARHGDIB have subsequently been proposed in the literature with some evidence for a role in the suppression of metastasis in bladder cancer (53) and in contrast a proinvasive role in gastric cancer (54). One previous proteomic study reported ARHGDIB overexpression in EAC compared with Barrett's epithelium at both the mRNA and protein level (22). In contrast to our study, the previous work presented cytoplasmic and membrane staining in epithelial cells with minimal stromal positivity in EAC sections. The specificity of the rabbit anti-ARHGDIB antibody used in that study was not demonstrated in the manuscript. In contrast the antibody in this study identifies a protein of the predicted mass (48 kDa) in a panel of esophageal cell lines by Western blotting and is specific for that protein as confirmed by siRNA (supplemental Fig. S2). The variable expression of ARHGDIB noted in the panel of esophageal cell lines may reflect context-dependent regulation. Staining across a larger number of esophageal tumors could establish greater confidence over the cell-specific expression profile.
The cancer antigen EPCAM was predicted to be highly expressed in EAC cells compared with surrounding normal tissues and this was indeed observed. This has previously been demonstrated for several cancer types, including esophageal (55,56). EPCAM is a cell adhesion molecule that is highly expressed on the cell membrane and may possess a signaling role through the regulation of cell proliferation via a cleaved intracellular domain (57). Because of its high specificity for malignancy, EPCAM-based assays have been developed to detect circulating tumor cells although these are less sensitive for mesenchymal tumors (58).
The expression of EPCAM was examined in detail by IHC in an independent cohort of 115 esophageal adenocarcinomas with matched normal and metastatic tissues. EPCAM was highly expressed in the clear majority of EACs with higher EPCAM histoscores observed in 85/87 EACs than the median gastric or esophageal scores. EPCAM was also highly specifically expressed in lymph node metastases compared with surrounding normal lymph nodes, raising the possibility that it could be exploited to enhance clinical staging using novel techniques (59).
These data agree with previous work demonstrating overexpression of EPCAM in EAC compared with surrounding normal tissues (56). A further study identified disseminated tumor cells from bone marrow and lymph nodes in patients with esophageal cancer and, although the primary tumors predominantly expressed high levels of EPCAM, EPCAM expression was only observed in 37% of disseminated tumor cells from bone marrow aspirates (60). A reduction in EPCAM expression by RNA interference increased migration in vitro and the authors proposed that EPCAM expression reduced during the process of invasion as cells adopted a more mesenchymal phenotype. This may have clinical implications if anti-EPCAM therapies are to be considered.
Samples were obtained from six men and one woman (Table II) reflecting the 4-fold greater prevalence of EAC in men (18). Although this bias could limit the applicability of this study to women with EAC, no differences were noted in the expression patterns of EPCAM across tissue types from men (n = 97) and women (n = 18) in the TMA.
EAC exhibits a high frequency of DNA mutation and mutant proteins are highly likely to be tumor specific (12). A limitation of this study, however, is that mass spectra were searched against a protein database containing only wildtype proteins, so mutant proteins could not be identified. An alternative strategy is to generate mutant protein databases using tumor genome sequencing data, either specific to the patient's tumor or using commonly identified variants, and to search mass spectra against these (61). This proteogenomic approach has the potential to reveal tumor-specific proteins; however, significant technical challenges remain in controlling the protein database size and false-discovery rate (62).
Potentially because of this limitation, no entirely tumor-specific proteins could be identified in this study. Importantly, however, a group of proteins highly expressed in tumors relative to surrounding normal tissues was proposed (Fig. 6). Immunotherapeutic trials are already underway in other cancer types with agents directed against several of these including EPCAM (63), glycoprotein A33; GPA33 (64,65), mucin 1, cell surface associated; MUC1 (66) and melanoma antigen; MAGE (67) proteins. If the expression of these can be validated to be specific to EAC cells over surrounding tissues, there would be a compelling rationale to expand trials to include patients with EAC and to develop specific imaging tools to these targets to enhance clinical staging. These