Sorbitol Dehydrogenase Overexpression and Other Aspects of Dysregulated Protein Expression in Human Precancerous Colorectal Neoplasms: A Quantitative Proteomics Study

Colorectal adenomas are cancer precursor lesions of the large bowel. A multitude of genomic and epigenomic changes have been documented in these preinvasive lesions, but their impact on the protein effectors of biological function has not been comprehensively explored. Using shotgun quantitative MS, we exhaustively investigated the proteome of 30 colorectal adenomas and paired samples of normal mucosa. Total protein extracts were prepared from these tissues (prospectively collected during colonoscopy) and from normal (HCEC) and cancerous (SW480, SW620, Caco2, HT29, CX1) colon epithelial cell lines. Peptides were labeled with isobaric tags (iTRAQ 8-plex), separated via OFFGEL electrophoresis, and analyzed by means of LC-MS/MS. Nonredundant protein families (4325 in tissues, 2017 in cell lines) were identified and quantified. Principal component analysis of the results clearly distinguished adenomas from normal mucosal samples and cancer cell lines from HCEC cells. Two hundred and twelve proteins displayed significant adenoma-related expression changes (q-value < 0.02, mean fold change versus normal mucosa ±1.4), which correlated (r = 0.74) with similar changes previously identified by our group at the transcriptome level. Fifty-one (∼25%) proteins displayed directionally similar expression changes in colorectal cancer cells (versus HCEC cells) and were therefore attributed to the epithelial component of adenomas. Although benign, adenomas already exhibited cancer-associated proteomic changes: 69 (91%) of the 76 protein up-regulations identified in these lesions have already been reported in cancers. One of the most striking changes involved sorbitol dehydrogenase, a key enzyme in the polyol pathway. Validation studies revealed dramatically increased sorbitol dehydrogenase concentrations and activity in adenomas and cancer cell lines, along with important changes in the expression of other enzymes in the same (AKR1B1) and related (KHK) pathways. 
Dysregulated polyol metabolism might represent a novel facet of metabolome remodeling associated with tumorigenesis.

Colorectal cancer ranks third among the world's high-incidence cancers and is a leading cause of cancer-related death among older adults (1,2). In the United States alone, projections for 2013 include 102,480 new cases and 50,830 deaths (2). Carcinogenesis in the large bowel begins with the transformation of the epithelial cells lining the gut. Molecular alterations, mainly involving the WNT signaling pathway, render these cells hyperproliferative, and they form benign adenomatous tumors. The neoplasms are initially noninvasive (3,4), and the vast majority remain that way. But as genetic and epigenetic anomalies continue to accumulate, the tumor cells' capacity for invasion and destruction of surrounding tissues increases. At some point, this process drives certain adenomas into the realm of frank malignancy, transforming them into adenocarcinomas.
Early diagnosis of colorectal tumors has been greatly facilitated by screening methods based on fecal analysis or colonoscopy, but both approaches have limitations (5-9). Better understanding of the molecular mechanisms underlying large bowel tumorigenesis could improve our chances of detecting these lesions in the adenomatous or localized adenocarcinomatous stage, when the chances of successful treatment are greater. Promising results for the detection and validation of potential cancer biomarkers are emerging from proteomic studies of cancer development (10). Relative to older gel-electrophoresis-based approaches, shotgun proteomics methods, particularly those that include pre-MS OFFGEL electrophoretic peptide fractionation (11), enhance the sensitivity, robustness, and reproducibility of these studies (12) and expand the proteome coverage to include proteins that are less abundantly expressed (13-16). Furthermore, with the aid of isobaric-tag peptide-labeling strategies, MS can also be used for the relative quantification of protein expression levels within a series of multiple human tissue samples (12, 17-19).
Thus far, only a few MS-based proteomics studies have examined human colorectal adenomas (reviewed in Refs. 9 and 20). We therefore decided to explore the proteome of a relatively large series of these precancerous lesions (each with a paired sample of normal colon mucosa) using quantitative shotgun MS with the widely used iTRAQ peptide labeling technique (21,22) and OFFGEL fractionation. Adenoma-related protein expression variations specific to the epithelial compartments of these lesions were identified with a novel approach, which involved comparing the human tissue proteome with that of colon epithelial cell lines. The results of these studies revealed several protein expression changes previously documented only in advanced colorectal cancers. They also disclosed several novel changes with potentially important roles in early-stage large bowel tumorigenesis, including the marked up-regulation of a key enzyme in the polyol pathway.

EXPERIMENTAL PROCEDURES
Human Tissue Samples and Cell Lines-Human colorectal tissues were prospectively collected from patients undergoing colonoscopy at the Istituti Ospitalieri of Cremona, Italy. Approval was obtained from the local ethics committee, and tissues were used in accordance with the Declaration of Helsinki. Each donor provided written informed consent to sample collection, data analysis, and publication of the findings. Each patient was assigned a sequential number to protect confidentiality. The series comprised 30 colorectal adenomas, each with a paired sample of normal mucosa from the same colon segment, >2 cm from the lesion. Tissues were collected endoscopically, promptly frozen in liquid nitrogen, and stored at −80°C.
Five colorectal cancer cell lines (HT29, Caco2, CX1, SW480, and SW620) were obtained from the Zurich Cancer Network's Cell Line Repository. All had been recently purchased from the American Type Culture Collection (Teddington, UK) and were certified as free of mycoplasma infection. We cultured Caco2 and CX1 cells in Dulbecco's modified Eagle's medium; HT29 cells in McCoy's medium; and SW480 and SW620 cells in RPMI 1640 medium supplemented with 10% fetal bovine serum, L-glutamine, and 1% penicillin-streptomycin (Sigma, St. Louis, MO). The recently established line of immortalized human colon epithelial cells (HCEC) was obtained from J. W. Shay and grown as described elsewhere (23).
Protein Extraction from Tissues and Cell Lines-For MS studies, frozen tissue samples were quickly weighed and homogenized on ice (1 min of grinding, 1 min on ice, 1 min of grinding) in a Wheaton glass borosilicate grinder containing a solution of 100 mM triethylammonium bicarbonate (Sigma, St. Louis, MO), 1X Complete EDTA-free Protease Inhibitor Mixture (Roche, Mannheim, Germany), 1 M urea, 5 mM β-glycerophosphate disodium salt hydrate, 1 mM sodium orthovanadate, and 5 mM sodium fluoride (Sigma). The efficiency of cell lysis was confirmed microscopically. The homogenates were then sonicated with a Bioruptor (Diagenode, Denville, NJ) (high power, five 10-s/10-s on/off cycles) and centrifuged (16,000g for 5 min at 4°C). The supernatant containing the proteins was collected and stored at −80°C.
Cells (grown to >80% confluence in 15-cm² dishes) were washed in PBS, covered with 250 µl of the buffer used for tissue sample homogenization (see above), detached from the dish with a cell scraper, and homogenized (25 passages through a 25-gauge needle). The efficiency of cell lysis was confirmed microscopically. Sonication and centrifugation were repeated as described above, and the protein concentration was determined via Bradford assay. Prior to MS analysis, a 5-µg sample of each protein extract was subjected to one-dimensional gel electrophoresis on a 12% polyacrylamide gel to assess protein integrity and the reproducibility of the extraction protocol. The entire proteomic workflow, from tissue/cell processing to statistical analysis, is summarized in Fig. 1 and described in detail in the next five subsections.
For sorbitol dehydrogenase (SORD) assays (see below), >80% confluent cells were washed in PBS and covered with a solution consisting of 100 mM triethanolamine (Sigma) and 1X Complete EDTA-free Protease Inhibitor Mixture (Roche). (A simple buffer was used to reduce the risk of introducing substances that might inhibit enzyme activity into the extract.) Cells were then scraped and homogenized with 25 passages through a 25-gauge needle. Tissue samples were weighed and homogenized in a Wheaton glass borosilicate grinder containing the same buffer. After centrifugation (16,000g, 4°C, 5 min), the supernatant was aliquoted and stored at −80°C. Protein concentration was measured via Bradford assay.
Protein Digestion and iTRAQ 8-plex Labeling-iTRAQ 8-plex experiments were performed to analyze tissue extracts (10 experiments) and cell-line extracts (1 experiment) (Fig. 1). Labeling efficiency and relative quantitation accuracy were assessed with the aid of two reference protein extract mixtures: one for tissue samples (pooled extracts from three normal tissues and three adenomas) and one for cell lines (pooled aliquots of each of the six cell line extracts). Fifty micrograms of protein per sample were used for each iTRAQ channel. Tryptic digestion (10% w/w, sequencing-grade modified trypsin, Promega, Madison, WI) and iTRAQ 8-plex labeling (AB Sciex, Framingham, MA) were performed according to the manufacturers' instructions (2.5-h incubation of samples with iTRAQ labels). For tissue experiments, two iTRAQ labels, 113 and 114, were chosen for the reference mixture, and labels 115/116, 117/118, and 119/121 were used for the three pairs of normal/adenomatous tissues included in each experiment. For the cell line experiment, labels 113 and 114 were used for the reference mixture, and labels 115-121 represented HCEC, HT29, Caco2, CX1, SW480, and SW620 cells, respectively ( Fig. 1). After iTRAQ labeling, the samples (for each experiment) were combined, desalted on 500-mg SepPak C18 columns (Millipore, Billerica, MA), dried in a SpeedVac concentrator (Thermo Scientific), and subjected to peptide fractionation.
OFFGEL Electrophoresis-Peptide fractionation was performed according to the manufacturer's protocols with an Agilent 3100 OFFGEL fractionator and 12-well OFFGEL kit (both from Agilent Technologies, Santa Clara, CA). Briefly, samples were resolubilized in 1.8 ml of 1X OFFGEL peptide stock solution containing carrier ampholytes (pH range 3-10), loaded into the wells (150 µl per well), and focused until 20 kVh was reached with a maximum current of 50 µA. For each experiment, 12 fractions were collected. A 15-µl aliquot of each fraction was acidified with 1.5 µl of a 50% acetonitrile/1% trifluoroacetic acid solution, desalted using ZipTip C18 (Millipore, Billerica, MA), dried, resolubilized in 15 µl of a 0.1% formic acid/3% acetonitrile solution, and analyzed by MS.
Full-scan MS spectra (300-1700 m/z) were acquired at a resolution setting of 30,000 at 400 m/z after accumulation to a target value of 1 × 10⁶. For the eight most intense signals per cycle above a threshold of 1000, both collision-induced dissociation and higher-energy collisional dissociation spectra were acquired in a data-dependent manner (Fig. 1). Collision-induced dissociation scans were recorded in the ion trap (settings: normalized collision energy, 35; maximum injection time, 50 ms; automatic gain control, 1 × 10⁴ ions). For the higher-energy collisional dissociation scans, spectra were recorded at a resolution setting of 7500 at 400 m/z (normalized collision energy, 52; maximum injection time, 125 ms; automatic gain control, 5 × 10⁴ ions). Charge state screening was enabled, and singly charged ions were rejected. Precursor masses previously selected for MS/MS were excluded from further selection for 60 s, with an exclusion window of 10 ppm. The maximum number of entries in the exclusion list was set at 500. All samples were analyzed in duplicate, and precursors selected in the first run were excluded from fragmentation in the second run; this exclusion list was set on a time window of 4 min and a mass width of 10 ppm. Spectra were acquired using internal lock mass calibration on m/z 429.088735 and 445.120025.
Peak List Generation and Database Search-As depicted in Fig. 1, Mascot Distiller 2.4.3.3 (Matrix Science, Boston, MA) was used to generate Mascot generic format peak lists. De-isotoping and peak picking were not performed between 112.5 and 121.5 m/z (the range containing iTRAQ reporter ions), and the higher-energy collisional dissociation and collision-induced dissociation spectra were merged by summing. For each of the 11 experiments, the corresponding 24 Mascot generic format peak lists were concatenated and searched, with the aid of Mascot Server 2.3.02 (Matrix Science), against a forward UniProtKB/Swiss-Prot database for human proteins concatenated to a reversed decoyed FASTA database. The concatenated database contained a total of 147,438 proteins with accessions in Gene Ontology-compatible format and 260 common MS contaminants (NCBI taxonomy I.D. 9606, released December 13, 2011).
Methylthio (C) and iTRAQ 8-plex labeling of peptide N termini and lysine residues were set as fixed modifications; variable modifications were methionine oxidation and iTRAQ 8-plex labeling of tyrosine. We used the iTRAQ 8-plex-vs114 quantitation method (Applied Biosystems, Zug, Switzerland). The isotope and impurity correction factors used for each iTRAQ label were those provided by the manufacturer. Precursor and fragment tolerances were set at 10 ppm and 0.8 Da, respectively. Enzyme specificity was set to trypsin, allowing up to one missed cleavage. Using Mascot internal export scripts, we converted Mascot DAT files into XML files and parsed them with in-house scripts to report peptide sequences, scores, and the intensities of the individual reporter ion channels. Confidently identified and quantified peptides were selected with the following filters: rank 1 (best spectrum assignment); ion score >15; and presence of iTRAQ intensity values for reporter channels 113 and 114.
Quantification of Relative Protein Abundance-(These steps are described in the boxes of the lower half of Fig. 1.) Peptide reporter channel intensities were summed for each protein individually using R scripts. Ratios were built from the summed channels (113/114 to 121/114) for all proteins identified in each iTRAQ experiment. False discovery rates (FDRs) (24) were determined at the spectrum, peptide, and protein levels. The results of individual experiments were then merged into one matrix, which was used for statistical analysis in R and Perseus (Version 1.2.7.4). All proteins identified on the basis of the same peptide(s) were grouped into a family, identified by a unique protein family number, and each family was represented only once in the statistical analysis. The ratio of each reporter channel intensity to that of channel 114 was log2-transformed and normalized to the median of its channel (set at 0), yielding ratios that followed a Gaussian distribution. The paired t test was used to compare the expression of a given protein in each adenoma with that found in the corresponding sample of normal mucosa. To correct for multiple comparisons, the FDR was controlled with the Benjamini-Hochberg procedure. The average protein expression fold change in adenomas, relative to the normal mucosa, was then calculated. For this, the effect of the reference standard (114) was removed from the median-normalized ratios of each paired adenoma-normal mucosa sample to compute the adenoma-versus-normal mucosa ratio per protein (deconvoluted fold change, (116/114)/(115/114) = 116/115), and the mean fold change per protein was calculated across all tissue pairs. The Mascot emPAI values for all proteins were included in the XML exports of each experiment, and the mean Mascot emPAI value was then calculated for each protein.
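The ratio handling described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' R scripts: the column order in `CHANNELS`, the pairing in `PAIRS`, the function names, and the use of `np.nanmedian` to tolerate missing values are all hypothetical choices. It shows why the reference channel cancels out: in log2 space, (116/114)/(115/114) becomes a subtraction.

```python
import numpy as np

# Hypothetical column order of one tissue experiment (iTRAQ 8-plex reporter channels)
CHANNELS = [113, 114, 115, 116, 117, 118, 119, 121]
# Normal/adenoma channel pairs used in each tissue experiment (per Fig. 1)
PAIRS = [(115, 116), (117, 118), (119, 121)]


def median_normalized_log2_ratios(intensities, ref=114):
    """Return log2(channel / ref) per protein, shifted so each channel's median is 0.

    intensities: (n_proteins, 8) array of per-protein summed reporter intensities,
    with columns ordered as in CHANNELS.
    """
    intensities = np.asarray(intensities, dtype=float)
    ref_col = intensities[:, [CHANNELS.index(ref)]]  # keep 2-D for broadcasting
    log_ratios = np.log2(intensities / ref_col)
    # Median-center every channel; nanmedian ignores proteins missing in a channel
    return log_ratios - np.nanmedian(log_ratios, axis=0)


def deconvoluted_log2_fc(norm):
    """Per-protein log2(adenoma/normal) for each pair, i.e. log2[(A/114)/(N/114)].

    Returns an (n_proteins, n_pairs) array.
    """
    cols = [norm[:, CHANNELS.index(a)] - norm[:, CHANNELS.index(n)] for n, a in PAIRS]
    return np.column_stack(cols)


# Toy example: protein 0 is 4-fold higher in adenoma channel 116 than in normal 115
x = np.full((5, 8), 1000.0)
x[0, CHANNELS.index(116)] = 4000.0
fc = deconvoluted_log2_fc(median_normalized_log2_ratios(x))
```

In the toy example, `fc[:, 0]` is log2(4) = 2 for protein 0 and 0 elsewhere. Once experiments are merged, a per-protein mean across all pairs could be taken with `np.nanmean(..., axis=1)`; note that median centering assumes most proteins are unchanged in each channel.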
Functional Annotation of Proteins-Gene Ontology (GO) annotations and GO terms for proteins in the UniProt/Swiss-Prot database were sourced from the European Bioinformatics Institute. The Scaffold program (Version 3) was used to identify the cellular localizations and biological processes most represented in lists of proteins quantified in tissues and cell lines. The topGO Bioconductor software package in R was used to identify and screen for GO biological process categories displaying enrichment for proteins that were differentially regulated in adenomas (versus normal mucosa) (25). First, we prepared a "universe" comprising all the proteins quantified in our study, each matched to GO terms and annotations. This served as the "background." The "foreground" consisted of the list of significantly dysregulated proteins. The most significant GO terms were scored with the Eliminating Genes (elim) method (25).
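The per-term test behind this kind of enrichment analysis can be sketched as a one-sided Fisher's exact (hypergeometric upper-tail) test comparing the foreground with the universe. This is an assumption-laden sketch: the source does not state which test statistic was paired with topGO's elim method, and the elim step itself (iteratively removing genes already counted for more specific significant terms) is not reproduced here. The function name and arguments are hypothetical.

```python
from math import comb


def go_term_enrichment_p(universe_size, term_size, foreground_size, overlap):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value for one GO term.

    universe_size:   all quantified proteins with GO annotation (the "background")
    term_size:       proteins in the universe annotated to this term
    foreground_size: significantly dysregulated proteins (the "foreground")
    overlap:         dysregulated proteins annotated to this term
    """
    max_k = min(term_size, foreground_size)
    # Probability of observing `overlap` or more annotated proteins by chance
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, foreground_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / comb(universe_size, foreground_size)
```

For example, with a universe of 10 proteins, a term covering 5 of them, and a foreground of 5 that contains all 5 annotated proteins, the p-value is 1/252. Python integers are arbitrary-precision, so the exact binomial coefficients remain valid for universes of a few thousand proteins.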
Measurement of SORD Activity-Total protein was extracted from cell lines and tissues as described above. SORD catalyzes the reversible interconversion of D-sorbitol and D-fructose, with β-NAD(H) as cofactor.
SORD activity was quantified by continuous spectrophotometric rate determination of β-NAD⁺ formation, measured as the decrease in A340 as β-NADH is oxidized (temperature 25°C, pH 7.6, light path 1 cm), in a Cary 50 Scan UV-visible spectrophotometer using the Cary Kinetics Application (both from Varian Inc., Palo Alto, CA) (28). The final reagent concentrations in a 1-ml cuvette were as follows: 78.33 mM triethanolamine, 183 mM D-fructose, 0.21 mM β-NADH, and 0.033% (w/v) BSA. Absorbance recording was started when the enzyme was added. One unit of enzyme activity was defined as the amount of enzyme that converts 1.0 µmol of D-fructose to D-sorbitol per minute at pH 7.6 and 25°C. A mixture of reagents plus recombinant SORD served as the positive control; negative controls consisted of the same reagent mixture without recombinant SORD, with recombinant SORD but no D-fructose, or with recombinant SORD but no β-NADH.
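The unit definition above translates an A340 slope into activity through the Beer-Lambert law. The sketch below assumes the commonly used NADH molar absorptivity at 340 nm (6.22 mM⁻¹ cm⁻¹) and a 1:1 stoichiometry between NADH oxidized and D-fructose reduced; the enzyme (extract) volume default of 0.1 ml and the function names are hypothetical, since the source does not report them.

```python
# Assumed molar absorptivity of NADH at 340 nm, in mM^-1 cm^-1
NADH_EPSILON_340 = 6.22


def sord_units_per_ml(dA340_per_min_sample, dA340_per_min_blank,
                      total_volume_ml=1.0, enzyme_volume_ml=0.1, light_path_cm=1.0):
    """SORD activity (units per ml of extract) from the fall in A340.

    1 unit = 1 umol D-fructose reduced to D-sorbitol per minute; NADH is consumed
    1:1, so umol NADH oxidized per minute equals umol fructose converted.
    enzyme_volume_ml is a hypothetical default, not a value from the source.
    """
    # A340 decreases as NADH is oxidized; subtract the blank's drift
    net_decrease = dA340_per_min_blank - dA340_per_min_sample
    mM_per_min = net_decrease / (NADH_EPSILON_340 * light_path_cm)  # Beer-Lambert
    umol_per_min = mM_per_min * total_volume_ml  # mM x ml = umol
    return umol_per_min / enzyme_volume_ml


def specific_activity(units_per_ml, protein_mg_per_ml):
    """Units per mg of total protein (protein measured by Bradford assay)."""
    return units_per_ml / protein_mg_per_ml
```

For instance, a net A340 decrease of 0.0622 per minute in a 1-ml cuvette containing 0.1 ml of extract corresponds to 0.1 units/ml; dividing by the extract's protein concentration gives units per mg, which is how adenoma and normal mucosa extracts of different protein content could be compared.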
Extraction and Quantification of Intracellular Metabolites by Targeted Gas Chromatography Coupled with MS-Frozen tissue (50 to 100 mg) was homogenized in 250 µl of ice-cold 80% methanol using a glass borosilicate grinder from Wheaton (Rockdale, UK). The homogenate was microscopically examined to ensure that it was cell free, and then it was transferred to Eppendorf vials and left on ice for 15 min to ensure efficient protein precipitation. After centrifugation (15,000g for 3 min at 0°C), the supernatant was snap-frozen and stored at −80°C, and the protein content of the pellet was determined via the Bradford method.
The derivatized metabolites and internal standards were subjected to GC-TOF-MS (GC 7890A, Agilent Technologies, Santa Clara, CA; GCT Premier Micromass, Waters, Manchester, UK) on an Rxi-5Sil MS Integra-Guard column (length, 30 m; internal diameter, 0.25 mm; film thickness, 0.25 µm; Restek, Bellefonte, PA). One microliter of each derivatized sample was injected in splitless mode on a baffled glass liner and transferred to the capillary column by rapid heating of the liner from 50°C to 250°C at a rate of 12°C/s. Metabolites were separated with helium as the carrier gas at a flow rate of 1 ml/min; after an initial hold time of 2 min, a temperature gradient from 80°C to 320°C (rate = 8°C/min) was applied. The TOF-MS was set to acquire centroided standard electron ionization mass spectra over a range of 50 to 600 m/z at a rate of three spectra per second. The GC-MS transfer line was heated to 280°C. Dynamic range enhancement was activated. C6ClF5 was used as the lock mass compound.
The MassLynx and QuanLynx programs (Waters, UK) were used to review and analyze the acquired data. The absolute concentrations of D-sorbitol, α- and β-D-fructose, and α- and β-D-glucose were calculated on the basis of the ratio of the intensity of specific fragments originating from the unlabeled compound to that of the added labeled analog (internal standard). These concentrations were used to estimate intracellular levels per milligram of tissue (adenoma versus normal mucosa). The relative concentration of lactate was estimated from the ratio of the intensity of specific fragments originating from the unlabeled compound to that of the added ¹³C₁-sorbitol (internal standard).
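The internal-standard calculation above is a form of isotope dilution, which can be sketched as follows. Assumptions: the labeled analog is added in a known amount (`standard_nmol`, not reported in this section), the unlabeled and labeled fragments have equal response unless a `response_factor` is supplied, and anomer peaks (for example α- and β-D-fructose) are combined by summing their areas. Names and units are hypothetical; this is not the QuanLynx calculation itself.

```python
def isotope_dilution_nmol_per_mg(analyte_areas, standard_areas, standard_nmol,
                                 tissue_mg, response_factor=1.0):
    """Analyte amount (nmol) per mg tissue from labeled internal-standard ratios.

    analyte_areas / standard_areas: fragment-ion intensities of the unlabeled
    compound and its labeled analog, one entry per peak (e.g. alpha and beta
    anomers), summed before the ratio is taken.
    response_factor: assumed (analyte/standard) signal per unit amount; 1.0 means
    equal response, which is the usual premise of isotope dilution.
    """
    ratio = sum(analyte_areas) / sum(standard_areas)
    return ratio * standard_nmol / (response_factor * tissue_mg)
```

For example, anomer areas of 300 and 100 against labeled-standard areas of 150 and 50, with 5 nmol of standard added to 50 mg of tissue, give 0.2 nmol/mg.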

RESULTS
Proteomic Analysis of Human Colorectal Tissues and Colon Cell Lines-We used a quantitative MS-based discovery strategy to explore the proteome of human colorectal tissues and colon cell lines (normal and neoplastic). The characteristics of the precancerous colorectal lesions are listed in Table I. Protein extracts from these tumors and their paired samples of normal mucosa (60 samples total) were analyzed using iTRAQ LC-MS/MS and the workflow described in Fig. 1. Including the reference sample mix, which was analyzed twice (channels 113 and 114) in each experiment, allowed us to control for technical variability across the 10 experiments on tissue samples. OFFGEL electrophoresis was used to obtain highly reproducible, pI-based, in-solution separation of pooled iTRAQ-labeled peptides. Furthermore, for relative quantification of proteins using iTRAQ reporter ions, we adopted a stringent FDR for peptide-spectrum matches (PSMs), and peptides were selected for protein quantification only if their reporter ions were quantified in both reference channels (113 and 114). The dataset generated with this approach was large and complex, but we developed a simplified analytical method that allowed us to work with and merge the large data files generated after MS/MS (Fig. 1). High-resolution MS/MS spectra acquired on the LTQ-Orbitrap Velos spectrometer after duplicate analysis of OFFGEL tissue sample fractions produced a total of 240 raw files (10 experiments, 120 fractions, 2 replicates). A total of 37,184 unique tryptic peptides (FDR = 0.9%) were confidently identified and quantified from 285,929 unique PSMs (FDR = 0.2%) (Table II, supplemental Table S1). Ten thousand four hundred and fifty-two proteins (FDR = 1.5%) were assembled from the quantified peptides. Proteins that were indistinguishable by MS/MS (i.e. two or more proteins identified on the basis of the same peptide sequence; see "Experimental Procedures" for details) were represented as a single family.
The result was a total of 4325 nonredundant protein families, two-thirds (2865, 66%) of which were relatively quantified in at least 9 normal mucosa-adenoma pairs, and 1072 (25%) in all 30 pairs (Table II, supplemental Table S1).
To verify the efficiency of iTRAQ peptide labeling, we repeated the database search with methylthio (C) set as a fixed modification and iTRAQ 8-plex (N-term), iTRAQ 8-plex (K), iTRAQ 8-plex (Y), and oxidation (M) set as variable modifications. (All other search parameters were unchanged.) The assigned PSMs were filtered as described in "Experimental Procedures," and the average iTRAQ labeling efficiency achieved in each of the 10 tissue experiments was 96% (supplemental Table S2). To assess the efficacy of including a standard sample mix as a reference for normalization, we compared combined Gaussian plots of the log2 protein ratios of normal mucosa or adenoma samples with those of the respective reference channel in each experiment (e.g. 115/114 versus 113/114 for normal tissues, 116/114 versus 113/114 for adenomas; see Fig. 1). The ratios displayed normal distributions in all channels. For the reference channel (113/114), log2 ratios were largely centered on 0, whereas the distributions of adenoma and normal channel log2 ratios were broader and not always centered on 0 (data not shown).
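The text does not spell out how labeling efficiency was computed from the variable-modification search. A common definition, assumed here, is the fraction of labelable sites (peptide N termini and lysines) that carry the iTRAQ 8-plex modification among the filtered PSMs; tyrosine labeling is treated as a side reaction and left out. The `PSM` record and its fields are hypothetical, not a Mascot export format.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical minimal record of one filtered PSM from the variable-mod search."""
    n_term_labeled: bool  # iTRAQ 8-plex (N-term) observed on this peptide
    lys_total: int        # number of lysine residues in the peptide
    lys_labeled: int      # lysines carrying iTRAQ 8-plex (K)


def labeling_efficiency(psms):
    """Fraction of labelable sites (N termini + lysines) carrying the iTRAQ tag."""
    labelable = sum(1 + p.lys_total for p in psms)
    labeled = sum(int(p.n_term_labeled) + p.lys_labeled for p in psms)
    return labeled / labelable
```

With this definition, three PSMs offering six labelable sites of which five are tagged give an efficiency of 5/6 (about 83%); the 96% reported above would correspond to roughly 24 of every 25 sites being labeled.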
Sample complexity is a common problem in the analysis of proteomic data from human colorectal tissues. It stems in part from contamination of the epithelial cell proteome by proteins from stromal cells, which were inevitably present in our specimens even though the endoscopic tissue sampling procedure we used yielded superficial specimens with consistently high epithelial content. Microdissection can be used to isolate subpopulations of cells, but it can diminish the quantity and quality of the proteins, rendering them suboptimal for some types of proteomic analysis. To avoid this problem, we adopted a novel strategy for preliminary identification of the proteomic alterations that were most likely to involve the epithelial-cell component of the adenomas. The proteomic profiles of the colon tissues were compared with those of six colon epithelial cell lines (five colon cancer cell lines plus HCEC cells (23)). Changes in expression levels observed in adenomas (i.e. up-regulation or down-regulation with respect to normal mucosal levels) were presumed to be epithelial-cell-specific if similar changes were found in the colon cancer cell lines (relative to HCEC cells). After OFFGEL fractionation, duplicate MS analysis of iTRAQ-labeled peptides (24 fractions) from the six cell lines was performed on an LTQ-Orbitrap Velos mass spectrometer, and 11,266 peptides (FDR = 0.5%) were confidently identified and quantified from 27,922 unique PSMs (FDR = 0.4%) (Table II, supplemental Table S3). A total of 2017 nonredundant protein families (FDR = 1.1%) were identified and relatively quantified in cell lines; 1957 (97%) were present in all six cell lines (Table II). In the iTRAQ experiment with cell lines, the peptide labeling efficiency was 95% (supplemental Table S2).

TABLE I
Characteristics of the precancerous colorectal lesions
Patient | Age | Sex | Site | Size | Macroscopic type | Pit pattern | Histology | Dysplasiaᶜ | No. of lesionsᵈ | No. of previously excised lesions
1 | 77 | M | S | 25 | IIa+IIc | IIIs-IIIL | TA | LGD | 1 | 0
2 | 73 | F | A | 25 | IIa+IIc | IIIs-IIIL | TA | LGD | 1 | 2
3 | 59 | M | T | 30 | IIa+IIc | IIIs+IIIL | TA | LGD | 1 | 0
4 | 73 | F | R | 50 | Is | IV | VA | LGD | 1 | 0
5 | 74 | M | R | 40 | Is | IV | VA | HGD | 2 | 1
6 | 77 | M | C | 25 | IIa | IIIL | VA | LGD | 1 | 0
7 | 80 | M | A | 40 | IIa | IIIL | TVA | LGD | 1 | -
15 | - | - | A | 15 | Is | IV | TVA | LGD | 7 | 0
16 | 85 | F | S | 25 | Is+IIa | IV | TA | LGD | 1 | 1
17 | 66 | M | A | 30 | IIa | IIIL | TA | HGD | 2 | 0
18 | 72 | M | A | 30 | Is | IV | TVA | HGD | 2 | 0
19 | 71 | M | S | 30 | IIa | IIIL | TVA | LGD | 2 | 0
20 | 59 | M | R | 60 | Is | IV-Vi | TVA | HGD | 1 | 0
21 | 78 | M | A | 50 | Is | IV-Vi | TA | LGD | 1 | 0
22 | 75 | M | R | 25 | Is | IV-Vn | TVA | HGD | 6 | 0
23 | 73 | F | D | 25 | Is | IV | TA | LGD | 1 | 0
24 | 69 | F | R | 90 | Is+IIa | IV | TVA | LGD | 1 | 0
25 | 75 | M | T | 18 | IIa | IIIL | TA | LGD | 1 | 0
26 | 61 | M | A | 40 | Is+IIa | IV | TVA | LGD | 20 | 0
27 | 76 | M | S | 30 | Is | IV-Vi | TA | HGD | 1 | 0
28 | 78 | F | R | 60 | IIa+Is | IV | TVA | LGD | 1 | 1
29 | 89 | M | R | 30 | Is | IV | TA | LGD | 3 | 0
30 | 75 | M | A | 50 | Is | IV-Vn | TVA | HGD | - | -
ᶜ Low-grade versus high-grade dysplasia as defined by the World Health Organization classification of tumors of the digestive system at the editorial and consensus conference in Lyon, France, November 6-9, 1999.
ᵈ This number includes the lesion analyzed in our proteomic study.
Relative Quantification of the Proteomes of Colorectal Tissues and Cell Lines-The concentrations of proteins expressed in human tissues span 10 orders of magnitude. We chose not to deplete our protein samples of high-abundance proteins (e.g. albumin, IgG) because, given the number of tissue samples being analyzed, additional sample preparation steps were considered potential sources of confounding variability (30). Instead, each of the 10 pooled iTRAQ-labeled samples (10 experiments) was separated into 12 fractions based on the isoelectric point of the peptides, reducing sample complexity and limiting the risk of bias toward the more abundant proteins.
The expression levels of the 4325 nonredundant protein families we were able to relatively quantify in colorectal tissues spanned 4 orders of magnitude, as deduced from the protein Mascot emPAI value (used as a proxy for the emPAI value (31) to estimate protein concentrations) (Fig. 2A). Thirty percent (1304/4325) of these families were relatively quantified on the basis of more than one unique peptide. At the top of this list were the large proteins AHNAK, DYNC1H1, DSP, and FCGBP (Fig. 2B). In colon epithelial cell lines, 1174 of the 2017 protein families were relatively quantified with more than one unique peptide.
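The abundance estimate used above rests on the exponentially modified protein abundance index. The sketch below uses its standard published form, emPAI = 10^(N_observed/N_observable) − 1, with molar percentages obtained by normalizing to the sum over all proteins. Mascot computes its own variant of this index (its count of observable peptides depends on Mascot's settings), so the function names and inputs here are illustrative assumptions rather than Mascot's internals.

```python
def empai(n_observed, n_observable):
    """emPAI = 10 ** (N_observed / N_observable) - 1 for one protein."""
    if n_observable <= 0:
        raise ValueError("protein must have at least one observable peptide")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(empai_values):
    """Approximate protein content (mol%) from a list of emPAI values."""
    total = sum(empai_values)
    return [100 * v / total for v in empai_values]
```

Because the index grows exponentially with the fraction of observable peptides actually detected, a protein seen with 2 of 4 observable peptides scores about 2.16, whereas one with no detected peptides scores 0. This compression is why emPAI values spanning roughly 4 orders of magnitude can be read as an approximate proxy for relative concentration.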
Gene Ontology annotation in Scaffold was used to identify the subcellular localizations of these protein families and the biological processes in which they are involved. The GO categories represented in the tissue and cell line proteomes were fairly similar. In the cell line proteome, however, the categories generally contained fewer proteins, as the total number of proteins detected in these cells was smaller than that in the tissues (Fig. 2C). Cytoplasmic and organelle- or membrane-associated proteins were the most highly represented categories in our extracts, but nuclear proteins were also readily identified, indicating that our protein extraction procedure was not strongly biased toward a few cell compartments. The most highly represented biological processes in the tissue proteome were metabolic or biosynthetic processes, whereas cell component organization and developmental processes predominated in the cell line proteome (Fig. 2D). Stromal contamination was probably responsible for the greater representation of immune system processes in the tissue proteome (relative to that of the cell lines).
Log2 expression levels of the protein families identified in all tissues (n = 1072) and cell lines (n = 1957) (Table II) were subjected to principal component analysis, which easily distinguished the adenomas from the normal mucosa samples (Fig. 3A) and the five colon cancer cell lines from the immortalized normal colon epithelial cell line HCEC (Fig. 3B). The cancer cell lines were also segregated into three distinct groups reflecting their patient origins (Fig. 3B). When principal component analysis was performed on the expression intensity values of the 1496 nonredundant proteins expressed and quantified in all tissues and cell lines (i.e. those representing the intersection of the tissue (n = 10,452) and cell line (n = 5056) protein sets reported in Table II), colon cancer cell lines clustered with adenomas, whereas HCEC cells were closer to the normal mucosa samples (Fig. 3C).
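Principal component analysis of a complete-case matrix (only protein families quantified in every sample, as above) can be sketched with a singular value decomposition in NumPy. The source does not say which implementation was used, so the function below is an assumption-based illustration: rows are samples, columns are log2 protein expression values, and each protein is mean-centered before decomposition.

```python
import numpy as np


def pca_scores(log2_matrix, n_components=2):
    """PCA of a samples x proteins matrix of log2 values via SVD.

    Pass only proteins quantified in every sample (no missing values).
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(log2_matrix, dtype=float)
    Xc = X - X.mean(axis=0)  # center each protein across samples
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    var = S ** 2
    return scores, var[:n_components] / var.sum()


# Toy data: two groups of three samples differing mainly in the first protein
X = np.array([[2, 0, 0], [2, 0.1, 0], [2, 0, 0.1],
              [-2, 0, 0], [-2, 0.1, 0], [-2, 0, 0.1]], dtype=float)
scores, evr = pca_scores(X)
```

In the toy example the first component captures nearly all the variance and places the two groups on opposite sides of zero, which is the kind of separation seen between adenomas and normal mucosa in Fig. 3A. The sign of a principal component is arbitrary, so only the grouping, not the direction, is meaningful.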
As a quality control measure, data for the 60 tissue samples (1072 protein families) were subjected to hierarchical clustering analysis. As shown in supplemental Fig. S1, three main clusters emerged: one consisting almost exclusively of normal mucosa samples, a second containing mainly adenomas, and a third that included both tissue types. The 18 samples in the third cluster (nine adenoma-normal mucosa pairs) formed three subclusters, which corresponded to 3 of the 10 experiments for which trypsin digestion, iTRAQ labeling, and LC-MS/MS were performed on the same day. These findings were suggestive of an experimental bias. Indeed, when these 18 potentially substandard samples were included in subsequent statistical analyses, they diminished the stringency of our threshold and increased the error margin for false identification. We therefore excluded these samples from the analyses described in the following section.
Proteins Displaying Dysregulated Expression in Colorectal Adenomas and Colon Cell Lines-To identify proteins with significantly altered expression in adenomas (relative to normal mucosa), we analyzed data on the proteins quantified in the remaining 21 tissue pairs. The experimentally derived protein fold-change threshold defining differential expression was based on comparison of the distributions of average log2 intensity ratios in the reference standard (113 versus 114, seven experiments) and in patient samples (adenoma versus normal, seven experiments). The average ratios in the reference sample were centered on 1 (i.e. a log2 ratio of 0), whereas the average fold-change ratios for the tissue samples displayed wider variance (supplemental Fig. S2). Seventeen percent of the log2 ratios for the tissue samples exceeded ±0.5 on the log2 scale (i.e. a linear fold change of at least 1.4 in either direction, since 2^0.5 ≈ 1.41), as opposed to only 5% of those for the reference samples. For each protein, a paired t test was used to compare the intensity ratios in normal and adenomatous samples (i.e. normal/114 versus adenoma/114). After adjustment for multiple comparisons (Benjamini-Hochberg method), we selected a stringent q value cutoff of ≤0.02.
The 212 proteins that satisfied this criterion and presented a mean expression fold change of ±1.4 (log2 ±0.5) or greater in magnitude were classified as significantly dysregulated in adenomas. They included 76 with up-regulated expression and 136 with down-regulated expression in the tumor samples (Table III). When protein abundance iTRAQ ratios for these 212 proteins were plotted on a heat map, adenomas and normal mucosa samples formed two distinct clusters (Fig. 4A). As shown in Fig. 4B, tissue expression levels for the 212 dysregulated proteins showed good correlation (r = 0.74, p < 0.001, 95% confidence interval = 0.67-0.79) with those of mRNAs for the same genes (measured by our group in another set of colorectal adenomas) (26). Table IV lists the biological processes that were overrepresented in this set of proteins.

FIG. 2. Protein coverage with iTRAQ shotgun analysis in colorectal tissues. A, analysis of Mascot emPAI values (used as a proxy for protein abundance) revealed a dynamic range of protein abundance in tissues that spanned 4 orders of magnitude (y-axis) and corresponded with known abundance estimates for various proteins in these tissues. Examples of high/moderate-abundance proteins (e.g. ACTB, FABP5, CHGA) and a low-abundance protein (POLR3A) relatively quantified in our samples are highlighted at their mean Mascot emPAI values. B, distribution of reported abundance ranges for the proteins with at least one unique peptide identified in tissues, and the high-molecular-weight proteins with the greatest number of unique peptides identified. C and D, subcellular localizations of the proteins identified in colorectal tissues and cell lines (C) and the biological processes in which these proteins are involved (D). This analysis was performed using Scaffold and Gene Ontology annotations (see "Experimental Procedures").
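The protein-mRNA correlation above is reported with a 95% confidence interval. One standard way to obtain such an interval is the Fisher z-transformation, sketched below with the standard library. This section does not say which method the authors used, or exactly how many protein-mRNA pairs entered the calculation, so both are assumptions here. `pearson_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation coefficient.

    Fisher z-transformation: z = atanh(r) is roughly normal with standard
    error 1 / sqrt(n - 3); the interval is built on the z scale and mapped
    back to the r scale with tanh.
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

Assuming n = 212 pairs, `pearson_ci(0.74, 212)` returns roughly (0.672, 0.795), close to the interval reported above. The interval is asymmetric around r because tanh compresses values near 1.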
At the top of this list was xenobiotic metabolism, a process already linked with adenoma formation on the basis of enrichment studies of transcriptomic datasets conducted by our group (32). Three of the dysregulated proteins involved in this process (CYP2S1, NQO1, and GSTP1) displayed up-regulated expression in adenomas, but most were characterized by tumor-related down-regulation (ADH1B, ADH1C/ADH1A, UGT1A9/UGT1A6, UGT1A1/UGT1A4/UGT1A3/UGT1A5, UGT1A7/UGT1A8, UGDH, MAOA, SULT1A3/SULT1A1, PAPSS1/PAPSS2, UGP2). Network-building analysis revealed that all these proteins were linked by subnetworks controlled by cancer-associated transcription factors, such as SP1 or, less frequently, MYC, HIF1A, or TP53 (supplemental Fig. S3). As noted in Table IV, a very similar picture emerged when GO enrichment was also analyzed in a larger set of 621 dysregulated proteins selected with less stringent criteria (q value cutoff ≤ 0.2; average log2 fold change ≥ ±0.5).
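GO overrepresentation of the kind summarized in Table IV is often scored with a one-sided hypergeometric test. The Table IV footnotes refer to "elim" p values, presumably from the graph-aware elim algorithm (implemented, for example, in the Bioconductor package topGO). That algorithm additionally down-weights genes already explained by more specific GO terms. The sketch below shows only the plain (classic) test. The choice of background universe (e.g. all quantified protein families) and the helper name are assumptions.

```python
from math import comb


def overrepresentation_p(k, n_selected, n_annotated, n_universe):
    """One-sided hypergeometric p value, P(X >= k), for GO term enrichment.

    k:            selected proteins annotated to the term
    n_selected:   size of the selected (dysregulated) set
    n_annotated:  proteins in the universe annotated to the term
    n_universe:   all quantified proteins used as background
    """
    upper = min(n_selected, n_annotated)
    hits = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic until this single division
    return hits / comb(n_universe, n_selected)
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for a universe of several thousand proteins. Many such p values would then still need multiple-testing control.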
The expression levels of 111 (52%) of the 212 proteins that were differentially expressed in adenomas were also quantified in cell lines (those shown in bold in Table III and referred to hereinafter as the "epithelial cell signature" proteins). Almost half (n = 51, 46%) showed directionally similar tumor-related dysregulation in both analyses. Because cell line studies were conducted with only one noncancerous line, these findings obviously require further validation. They suggest, however, that these 51 proteins are indeed expressed in the epithelial cells of normal colorectal tissues and that their expression is dysregulated in the epithelial cells of adenomas.

Table III footnotes:
a Two or more accession numbers: proteins from the same family or isoforms from the same gene. Boldface numbers indicate "epithelial cell signature" proteins (see text).
b Designated candidate cancer biomarkers in the Human Protein Atlas database.
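The concordance count above can be sketched as a sign comparison between the tissue and cell-line fold changes. Here "directionally similar" is taken to mean "same sign". Whether an additional magnitude threshold was applied to the cell-line data is not stated in this section, so that simplification, the input format, and the helper name are assumptions.

```python
def directional_concordance(tissue_log2fc, cell_log2fc):
    """Compare the sign of tumor-related changes in tissues and cell lines.

    Both arguments map protein identifiers to mean log2 fold changes
    (adenoma vs. normal mucosa; cancer lines vs. HCEC). Returns the proteins
    quantified in both datasets and the subset changing in the same direction.
    """
    shared = sorted(set(tissue_log2fc) & set(cell_log2fc))
    concordant = [
        p for p in shared
        if tissue_log2fc[p] * cell_log2fc[p] > 0  # same sign, neither zero
    ]
    return shared, concordant
```

With the counts reported above, 111 of the 212 dysregulated proteins (about 52%) were shared between the datasets, and 51 of those 111 (about 46%) were concordant.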
Up-regulation of SORD Expression and Activity in Colorectal Adenomas and Cancer Cell Lines-Sorbitol dehydrogenase (SORD), a key enzyme in the polyol pathway, was one of the most significantly up-regulated proteins in our colorectal adenomas (based on q values) (Table III). Because its increased expression could have metabolic consequences with potential effects on tumorigenesis, we performed Western blotting and immunostaining studies to validate this finding. The reliability of the anti-SORD antibody we had chosen was first tested on protein extracts from the six colorectal epithelial cell lines (Fig. 5A). The tumor-related log2 fold changes detected with Western blotting were substantially greater than those documented with iTRAQ (2 to 6 versus 0.4 to 1, respectively) (Fig. 5B). This was not surprising, as iTRAQ has been reported to compress ratios and thus underestimate differences in protein abundance (33). Nevertheless, the relative SORD levels across the cell lines were fully consistent between the two methods. As for the 21 adenomas, the elevated SORD expression documented in these tumors by iTRAQ (Fig. 5C) showed good correlation with the increased SORD mRNA levels we had previously found in 42 other lesions of this type (26) (Fig. 5D). Western blot analysis of four randomly selected adenoma-normal mucosa pairs from the present series revealed obvious up-regulation of SORD expression in all four tumors, although the magnitude of the increase varied (Fig. 5E).
SORD activity was then assayed (see "Experimental Procedures") to determine whether it corresponded to the enzyme expression levels reported above. As shown in Fig. 5F, the results of cell line assays were fully consistent with the Western blotting data: SORD activity was seven times higher in HT29 than in HCEC cells, and a more limited up-regulation was found in SW480. A high correlation between enzyme activity and protein level was also documented for three randomly selected adenoma-normal mucosa pairs (Fig. 5F).
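The assay protocol is described in "Experimental Procedures" and is not reproduced in this section. The sketch below assumes a standard spectrophotometric format that follows NADH at 340 nm (sorbitol + NAD+ → fructose + NADH). It uses the NADH molar absorptivity of 6.22 mM⁻¹ cm⁻¹ and converts an absorbance slope into specific activity via the Beer-Lambert law. The function name and units are illustrative choices.

```python
NADH_EPSILON_340 = 6.22  # mM^-1 cm^-1, molar absorptivity of NADH at 340 nm


def specific_activity(delta_a340_per_min, reaction_volume_ml, protein_mg, path_cm=1.0):
    """SORD specific activity in nmol NADH per minute per mg protein.

    Assumes the assay follows NADH at 340 nm (sorbitol + NAD+ -> fructose + NADH).
    The absolute slope is used, so the reverse (NADH-consuming) direction also
    works; blank correction is assumed to have been applied to the slope already.
    """
    rate_mm_per_min = abs(delta_a340_per_min) / (NADH_EPSILON_340 * path_cm)  # Beer-Lambert
    nmol_per_min = rate_mm_per_min * reaction_volume_ml * 1000.0  # mM x mL = umol; x1000 -> nmol
    return nmol_per_min / protein_mg
```

The fold difference between two extracts (e.g. HT29 versus HCEC) is then the ratio of their specific activities. Normalizing to total protein makes extracts of different concentration comparable.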
MS and Western blotting findings were further validated with immunostaining studies, as shown in Fig. 6. Cytoplasmic SORD staining was evident in the colon cancer cell line HT29 but was weaker or even absent in normal epithelial HCEC cells (Figs. 6A and 6B). As for colorectal tissues, SORD cytoplasmic expression was limited to the bottom of the normal epithelial crypts (Figs. 6C, 6D, and 6E), but its expression was markedly increased in adenomatous and cancerous glands (Figs. 6F-6I). These findings suggest that SORD is likely to be expressed in proliferating cells, although it was largely absent in HCECs, which undergo regular proliferation in vitro. Furthermore, nuclear localization of SORD was noted in some adenomatous crypts (supplemental Figs. S4A and S4C), and the cells in question were almost always negative for the well-known proliferation marker Ki-67 (supplemental Figs. S4B and S4D). This mutually exclusive staining pattern was also observed in normal crypts of the ileum, where SORD, interestingly, appeared to be expressed in the nuclei of putative stem cells (supplemental Figs. S4E and S4F).
Polyol Pathway Enzyme Expression and Metabolite Levels in Cell Lines and Tissues-We then examined the state of the polyol pathway (supplemental Fig. S5A) in colorectal cell lines and tissues. As shown in supplemental Fig. S5B, immunoblot studies revealed decreased AKR1B1 expression in HT29 cells (versus HCEC cells) and adenomas (versus corresponding normal mucosal samples), whereas SORD and KHK expression were up-regulated in tumor cells and tissues. As for the metabolites (supplemental Fig. S5C), D-glucose levels were significantly decreased in adenomas. Less dramatic changes were observed in the levels of D-sorbitol and D-fructose, both of which tended to decrease in tumor tissues.

DISCUSSION

Although a number of proteomic studies have comparatively analyzed different types of colorectal tissues, precancerous lesions have been considered in only three (21,34,35), and in two of these (21,34), the number of adenomas analyzed was very small (≤4). The study by Lam et al. (35) is the only one that compared protein expression in a relatively large number (n = 20) of paired adenoma and normal mucosa samples. They used two-dimensional gel electrophoresis to resolve over 1000 proteins in the two tissue groups, and those displaying differential expression were then analyzed with MALDI-TOF/TOF-MS. MS/MS validation pinpointed four proteins (ANXA3, S100A11, EIF-5A1, and S100P) whose expression in adenomas was significantly increased. The combination of iTRAQ 8-plex peptide labeling, OFFGEL fractionation, and MS allowed us to quantitatively compare protein expression in 30 colorectal adenomas and paired samples of normal mucosa and to investigate low-abundance proteins that cannot be evaluated with proteomics based on two-dimensional gel electrophoresis.
All in all, 4325 nonredundant protein families were quantified in our colorectal tissues (25% of which were identified in all 60 samples) (Table II), and the 212 proteins we flagged as significantly dysregulated in adenomas included three of the four proteins identified by Lam et al. (up-regulation of the fourth, EIF-5A1, failed to meet our stringent criterion for significance) (Table III).
The cell types in which these proteomic changes occur are of obvious interest, as colorectal cancer arises from the epithelial component of the colorectal mucosa. Although our findings are preliminary and will naturally require validation in future studies, 51 of the 212 proteins listed in Table III were "epithelial cell signature" proteins and showed directionally similar expression changes in colon cancer cell lines versus HCEC cells. It therefore seems likely that their dysregulated expression in adenomas is a feature of neoplastic transformation of colorectal epithelial cells. However, epithelial-stromal cell interactions can also play important roles in tumorigenesis (20). Our approach also allowed us to identify 101 proteins displaying adenoma-related dysregulation that were probably of nonepithelial origin (Table III). These proteins were mainly involved in immune-related processes (immune response, complement activation, T-cell co-stimulation), which are usually not represented in colon epithelial cell lines. Their expression changes are likely to have important effects on the microenvironment of an epithelial-cell tumor.

Table IV footnotes:
… (25). Only processes with an elim p value < 1.0E-04 are shown.
e Processes that were also among the top 12 processes displaying enrichment in a larger set of 621 dysregulated proteins selected with less stringent criteria (q value ≤ 0.2; average log2 fold change ≥ ±0.5; see "Results" section for details).
f Processes that shared a common GO ancestor (immune system process) with the process displaying the most significant enrichment in the larger set.
Our search for potential biomarkers of early-stage colorectal tumorigenesis focused exclusively on the 76 proteins whose expression was significantly up-regulated in adenomas. According to the Human Protein Atlas database (36), 69 (91%) of these have cancer-related features, and 16 of the 69 are already classified as candidate cancer biomarkers (Table III). The Human Protein Atlas database contains information on protein expression in normal and cancer tissues, but not in tissues regarded as precancerous. The overlap between our findings and those of the Human Protein Atlas suggests that many of the protein expression changes identified thus far in colorectal adenocarcinomas are probably already detectable in the benign precursors of these lesions. Supplemental Fig. S6 shows the expression profiles of the 10 proteins that were most markedly up-regulated in adenomas. This group included two of the four proteins identified by Lam et al. (35) as significantly overexpressed in adenomas. Annexin A3 (ANXA3), for example, is at the top of our list (based on q values) (Table III). An angiogenic factor that induces VEGF production via the HIF-1 pathway (37), ANXA3 belongs to a family of calcium-dependent, phospholipid-binding proteins involved in diverse biological processes, including signal transduction, inflammatory responses, membrane organization, and the regulation of cellular growth (38,39). Dysregulated annexin expression is also a common feature of colorectal cancer (39) and of most other cancers (40). S100A11 expression was also increased in these tumors, which is consistent with earlier reports (41). The cytosolic S100 proteins interact directly with the N-terminal regions of annexins (38,42), and, like the annexins, they have diverse intracellular and extracellular functional roles (43).
Among the other top 10 proteins displaying adenoma-related up-regulation were LDHA and LDHB. The two isoforms were not quantified separately, but LDHA is presumably responsible for the increased expression observed in our adenomas: LDHB expression is in most cases epigenetically silenced in colon cancer cells (44,45), whereas LDHA is overexpressed, and its activity is maintained via the oncogenic tyrosine kinase FGFR1 (46). LDHA is a key player in the reversible conversion of pyruvate to lactate during aerobic glycolysis, a typical feature of cancer cell metabolism first described by Warburg (47). The sodium- and potassium-coupled chloride cotransporter SLC12A2 is expressed on the basolateral membrane of the normal colon epithelium, where its recruitment and activation are regulated by calcium and cAMP. Loss of SLC12A2 leads to impaired chloride secretion in the intestine (48,49), but to our knowledge, there are no published data linking this protein to colon cancer. The fifth markedly overexpressed protein was SET, one of the five proteins that make up the inhibitor of acetyltransferases (INHAT) complex. Two other INHAT components, APEX1 and ANP32A/ANP32B, were also up-regulated in adenomas (albeit to a lesser extent than SET) (Table III). These changes are noteworthy because INHAT binds directly to histones, preventing their acetylation by histone acetyltransferases (50-52), and loss of histone acetylation is a crucial step in gene silencing (53,54). Thus far, the role of INHAT in cancer has not been widely investigated, but overexpression of its components has been observed in serous epithelial ovarian cancer (55). The up-regulated expression of PPA1/PPA2 in our adenomas might play various roles in colorectal tumorigenesis, as these proteins are key players in the synthesis of fatty acids, nucleotides, amino acids, and other essential molecules (56).
The phosphoprotein nucleolin, an essential protein for proliferating cells (57), appears to regulate several steps in the biogenesis of ribosomes, including transcription, ribosome assembly, and the processing of precursor ribosomal RNA (58-60), all of which might be instrumental in adenoma growth. As for OCIAD2, strong immunoreactivity for this protein has been reported in early-stage adenocarcinomas of the lung and in ovarian cancers (61-63), but there are no published data linking it to colorectal tumorigenesis. In contrast, the secreted protein REG4, which promotes mitosis and enhances the motility and invasiveness of colon cancer cells, is strongly expressed in these cells and in the serum of patients with colorectal cancer (64-66).
The final protein characterized by marked adenoma-related up-regulation was SORD, a key enzyme in the polyol metabolic pathway. It was selected for validation studies because, although aberrant polyol pathway activity has been implicated in diabetic complications (67-70) and myocardial ischemia (71), the role of SORD in tumorigenesis was completely unknown. During the execution of this study, however, up-regulated SORD expression was reported in prostate cancer (72) and in colorectal adenomas (21), and these findings strengthened our resolve to characterize this phenomenon in colorectal tumorigenesis.

FIG. 5. Significantly up-regulated SORD expression and activity in colorectal cell lines and adenomas. A, tumor-related up-regulation of sorbitol dehydrogenase (SORD) in colon cancer cell lines was confirmed with Western blotting. The SORD dysregulation trend was identical to that observed with iTRAQ-based MS/MS, although when immunoblot results were quantified (B), the log2 fold changes were more than five times greater than those documented in the iTRAQ study. C, SORD protein expression (iTRAQ analysis) in 21 normal mucosa-adenoma tissue pairs. D, SORD mRNA expression in 42 other normal mucosa-adenoma pairs from a previous study by our group (26). Error bars indicate means and 95% confidence intervals. E, Western blots showing tumor-related up-regulation of SORD expression in four randomly selected adenoma (A)/normal mucosa (N) tissue pairs of the 21 shown in panel C (see Table I for sample descriptions). F, SORD activity also displayed tumor-related up-regulation in cell lines (HT29 and SW480 versus HCEC cells) and tissues (adenomas versus normal mucosa). Columns show mean enzyme activity measured in at least two replicates; error bars indicate standard deviations. The Western blot beneath the graph shows SORD levels measured in the extracts used for the enzyme activity assays.
Up-regulated SORD expression and activity in adenomas (Fig. 5) would enhance the production of fructose (see schematic of supplemental Fig. S5A), thereby increasing the generation of triose sugars and diacylglycerol (intermediates in the glycolytic and lipid signaling pathways, respectively). Fructose is also several times more effective than glucose in promoting intracellular nonenzymatic glycation (73-75), and advanced glycation end products may contribute to the vascular complications of diabetes and other pathologic conditions (67, 76-78). Whether these fructose-driven metabolic events play a role in the development of adenomas is unclear, but the polyol pathway appeared to be very active in the adenomatous cells we examined. This activity was also reflected in the concomitant increase in the expression of KHK (supplemental Fig. S5B), the enzyme that catalyzes the conversion of fructose to fructose-1-phosphate downstream of the polyol pathway.

FIG. 6. … Consistent with proteomic data, SORD expression was (A) negligible or absent in HCECs but (B) clearly expressed in the cytoplasm of HT29 cells. C, in normal colorectal mucosa, SORD expression was limited to the lower portion of the epithelial crypts, where stem cells and highly proliferating cells are located. Higher-magnification views show staining at (D) the base versus (E) the mouth of colonic crypts. F and G, SORD expression was markedly increased in adenomatous glands (red arrowheads) relative to normal crypts (green arrowheads). H and I, abundant expression of SORD in a large adenoma and in a cancer, respectively.
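The enzyme steps discussed above can be laid out as a small lookup table and traced from glucose. The table also records the direction of change reported in supplemental Fig. S5B. The last step, cleavage of fructose-1-phosphate by aldolase B (ALDOB) into the triose precursors, is standard biochemistry and was not examined in this study. The data-structure layout and helper name are hypothetical.

```python
# Polyol pathway steps discussed in the text, plus one textbook step (ALDOB)
# that was not measured in this study.
# Each entry: substrate -> (enzyme, product, cofactor change)
POLYOL_STEPS = {
    "D-glucose": ("AKR1B1", "D-sorbitol", "NADPH -> NADP+"),
    "D-sorbitol": ("SORD", "D-fructose", "NAD+ -> NADH"),
    "D-fructose": ("KHK", "fructose-1-phosphate", "ATP -> ADP"),
    "fructose-1-phosphate": ("ALDOB", "DHAP + glyceraldehyde", "none"),
}

# Direction of tumor-related expression change reported in supplemental Fig. S5B
OBSERVED_CHANGE = {"AKR1B1": "down", "SORD": "up", "KHK": "up"}


def trace_pathway(start, steps=POLYOL_STEPS):
    """Follow the linear chain of reactions from `start` until no step applies."""
    route = []
    metabolite = start
    while metabolite in steps:
        enzyme, product, cofactors = steps[metabolite]
        route.append((metabolite, enzyme, product, cofactors))
        metabolite = product
    return route


for substrate, enzyme, product, cofactors in trace_pathway("D-glucose"):
    change = OBSERVED_CHANGE.get(enzyme, "not measured")
    print(f"{substrate} --{enzyme} ({cofactors}; {change})--> {product}")
```

Laid out this way, the observed pattern (reduced AKR1B1, increased SORD and KHK) maps directly onto consecutive steps of the route from glucose to triose precursors.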
The effects of these enzymatic changes on sorbitol and fructose concentrations in adenomas need to be investigated in larger tissue series, but our preliminary data suggest that the levels of both are slightly decreased in these lesions (supplemental Fig. S5C). In contrast, our adenomas exhibited dramatically reduced concentrations of glucose, the initial substrate of the polyol pathway (supplemental Fig. S5C). Adenoma-related down-regulation was also noted in the expression of AKR1B1, the enzyme that converts glucose to sorbitol (supplemental Fig. S5B). Exploitation of the polyol pathway to divert carbon from glucose to other energy intermediates might provide adenomatous cells with a selective advantage over normal cells. This pathway might prove to be another route of tumor-related glucose consumption in addition to the well-known glycolytic and pentose phosphate pathways (supplemental Fig. S5A). Advanced cancer cells consume glucose at a much higher rate than normal cells, and much of their energy is generated by aerobic glycolysis rather than by oxidative phosphorylation of glucose in the mitochondria (i.e. the Warburg effect) (79). The predominantly glycolytic phenotype of cancer cells results in low glucose levels and high concentrations of lactate (47,80,81). The relative concentrations of lactate in the three adenomas we tested were significantly greater than those found in matched samples of normal mucosa (supplemental Fig. S5C), suggesting that the Warburg effect is already evident in precancerous colorectal lesions. Studies involving metabolic flux analysis to monitor the fate of isotopic tracers in in vitro and in vivo systems would provide further insight into the biological roles of the polyol pathway in tumorigenesis.

Further information on selected PSMs, peptides, and corresponding assembled proteins can be found in supplemental Tables S1 (tissues) and S3 (cell lines). Supplemental Figs. S7 through S17 show spectra for the proteins identified with a single peptide (listed in supplemental Tables S1 and S3).