Combined Proteomics and Transcriptomics Identifies Carboxypeptidase B1 and Nuclear Factor κB (NF-κB) Associated Proteins as Putative Biomarkers of Metastasis in Low Grade Breast Cancer

Current prognostic factors are insufficient for precise risk discrimination in breast cancer patients with low grade tumors, which, contrary to their theoretical prognosis, occasionally form early lymph node metastases. To identify markers for this group of patients, we applied iTRAQ-2DLC-MS/MS proteomics to 24 lymph node positive and 24 lymph node negative grade 1 luminal A primary breast tumors. Another group of 48 high grade tumors (luminal B, triple negative, and Her-2 subtypes) was also analyzed to investigate marker specificity for grade 1 luminal A tumors. Of the 4405 proteins identified (FDR < 5%), the top 65 differentially expressed proteins, together with 30 previously identified and control markers, were also analyzed at the transcript level. Increased levels of carboxypeptidase B1 (CPB1), PDZ and LIM domain protein 2 (PDLIM2), and ring finger protein 25 (RNF25) were associated specifically with lymph node positive grade 1 tumors, whereas stathmin 1 (STMN1) and thymosin beta 10 (TMSB10) were associated with an aggressive tumor phenotype in high grade tumors as well, at both the protein and transcript levels. For CPB1, these differences were also observed by immunohistochemical analysis on tissue microarrays. Up-regulation of putative biomarkers in lymph node positive (versus negative) luminal A tumors was validated by gene expression analysis of an independent published data set (n = 343) for CPB1 (p = 0.00155), PDLIM2 (p = 0.02027), and RELA (p = 0.00015). Moreover, statistically significant associations with patient survival were identified in another public data set (n = 1678). Our findings indicate unique pro-metastatic mechanisms in grade 1 tumors that can include up-regulation of CPB1, activation of the NF-κB pathway, and changes in cell survival and the cytoskeleton.
These putative biomarkers have the potential to identify the specific minor subpopulation of breast cancer patients with low grade tumors who are at higher than expected risk of recurrence, who would benefit from more intensive follow-up, and who may require more personalized therapy.

Breast cancer is the most common form of cancer in women worldwide, and distant metastases are the main cause of patient mortality. Cancer emerges as a consequence of multiple genetic aberrations, whereas metastatic characteristics may be predisposed or acquired during disease development and are governed by a number of genetic and biochemical mechanisms (1,2). In clinical practice, both traditional and molecular prognostic markers are used for risk-group discrimination and determination of metastatic potential. Traditional prognostic markers in breast cancer include age at diagnosis, tumor size and grade, lymph node status, and presence of distant metastasis. Tumor size is a potent prognostic factor, with larger tumors having a higher probability of metastatic behavior. More differentiated tumors (e.g. grade 1) generally have low dissemination potential, whereas less differentiated, more proliferative high grade tumors (e.g. grade 3) form metastases much more frequently. Low grade breast tumor cells spread predominantly via lymph vessels, so lymph nodes are the first site of tumor cell dissemination prior to eventual spread into distant organs such as lung or bone (3). Molecular prognostic markers include hormonal receptors (estrogen receptor (ER)¹ and progesterone receptor (PR)), the Her-2/neu receptor, and expression panels such as Oncotype DX and MammaPrint. The American Society of Clinical Oncology (ASCO) has also recommended urokinase plasminogen activator (PLAU) and plasminogen activator inhibitor 1 (SERPINE1) as indicators of metastatic potential in breast cancer (4,5); however, their use in clinical practice has not been generally accepted (4).
Currently available markers are not sufficient for precise risk-group or individual assessment specifically in low grade luminal A tumors. The general prognosis of these tumors is very favorable, so patients are treated with less aggressive adjuvant therapy and no chemotherapy. However, a small percentage of these tumors develop early lymph node metastases. The molecular mechanism of this phenomenon is not known, and current clinical practice lacks the means to predict its occurrence. New knowledge is thus essential for finding biomarkers that can identify high risk individuals within the predominantly low risk population of patients with low grade breast cancers. These high risk patients should then receive more intensive follow-up and could be considered for more aggressive therapy; this is not currently possible because of the detrimental effects of such therapy on the majority of patients, who would not benefit from it. In addition, understanding the mechanisms of metastasis of low grade breast cancer may lead to the identification of new therapeutic targets.
Shotgun proteomics with isobaric tags for relative and absolute quantification (iTRAQ) is an established approach for quantification of proteins related to cancer metastasis (6,7). Moreover, recent developments in multidimensional liquid chromatography and mass spectrometry, including the FT-Orbitrap detector technology, have significantly advanced the discovery proteomics field (8). We used this untargeted quantitative approach to identify proteins correlating with lymph node metastasis in low grade breast cancer. A complementary targeted transcriptomics study was performed on the same sample set to identify those proteins whose levels correlated with gene expression. Combining protein and transcript level profiles allowed us to interrogate large independent patient data sets to validate the candidates and to assess their impact on survival. In addition, we compared G1 and G3 tumors with and without metastasis to determine whether the metastatic process is the same or different between these groups, providing fundamental information on the mechanisms of tumor progression in different tumor grades.

EXPERIMENTAL PROCEDURES
Tissue Procurement and Patient Characteristics-Patient informed consent forms and tissue procurement procedures were approved by the ethics committee of the Masaryk Memorial Cancer Institute (MMCI) (see supplemental File 1 for ethics documents). Tissues were frozen in liquid nitrogen within 20 min after surgical removal and stored at −180°C in the tissue bank of MMCI. A complementary formalin-fixed, paraffin-embedded tissue block was available for each sample for histological evaluation and immunohistochemical (IHC) analysis. A set of 96 preoperatively untreated breast carcinomas of 11-20 mm maximum diameter (pT1c) was selected for analysis. The sample set characteristics are shown in detail in supplemental File 1, and the study design is summarized in Fig. 1. The sample set included 48 grade 1 tumors positive for both ER and PR, without HER2 amplification. Because grade 1 tumors with lymph node metastases at the time of operation are quite rare, the 24 node positive cases were selected from a total collection of about 4000 carcinoma cases. The matching node negative pT1c grade 1 tumors with identical ER, PR, and HER2 profiles were selected randomly from the total collection.
To investigate similarities and differences in metastasis biomarkers between high and low grade breast cancers, a second sample set of 48 pT1c grade 3 carcinomas was collected, 24 of them node positive and 24 node negative. These were selected to ensure representation of the following immunophenotypes: triple negative (n = 16); tumors with HER2 amplification (n = 16), of which 8 were ER positive and 8 ER negative; and ER positive tumors without HER2 amplification (n = 16). Within each immunophenotype, half of the tumors were with and half without metastases.
For all samples, frozen tumor tissue was cut into two pieces used for (1) RNA isolation, confirmation of RNA integrity, and qRT-PCR analysis on TaqMan Low Density Arrays, and (2) protein isolation for iTRAQ-2DLC-MS/MS analysis. The same 96 tumors were used for proteomics, transcriptomics, and IHC analyses. An independent set of 64 additional grade 1 luminal A breast carcinomas was used for IHC validation of CPB1 protein levels. These samples were preoperatively untreated, ER positive, PR positive, HER2 negative tumors of 11-30 mm maximum diameter (T1c-T2); 43 were lymph node negative and 21 lymph node positive.
RNA Isolation, RNA Integrity Control and Reverse Transcription-Tissue was homogenized in a MM301 mechanical homogenizer (Retsch, Haan, Germany) using a metal ball for 2 × 2 min at 25 s⁻¹ in 600 μl of RLT buffer (Qiagen, Hilden, Germany) with 1% β-mercaptoethanol, and total RNA was isolated using the RNeasy Mini Kit (Qiagen) following the manufacturer's protocol. RNA was eluted with 30 μl of RNase-free water, quantified at 260 nm using a NanoDrop ND-1000 (Thermo Scientific, Wilmington, DE), and quality checked by measurement of the RNA integrity number (RIN) on an Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany). Samples that did not pass the RNA quality criterion (RIN > 7) were excluded and replaced, for the whole study (transcriptomics, proteomics, and IHC), by other tissues with the same clinicopathological characteristics. The final sample set is presented in supplemental File 1. Isolated RNA was stored at −80°C. Total RNA (0.9 μg) was reverse transcribed using H Minus M-MuLV Reverse Transcriptase (Fermentas, Vilnius, Lithuania) with random hexamer primers (Fermentas) according to the manufacturer's protocol.

¹ The abbreviations used are: ER, estrogen receptor; ASCO, American Society of Clinical Oncology; CPB1, carboxypeptidase B1 protein encoded by CPB1 gene; ESR1, estrogen receptor alpha protein encoded by ESR1 gene; FCH, fold change; G1, grade 1; G3, grade 3; HER2, Her2/neu receptor encoded by ERBB2 gene; iTRAQ, isobaric tags for relative and absolute quantification; LA, luminal A subtype breast tumors; MMCI, Masaryk Memorial Cancer Institute; N0, lymph node negative tumors; N1-2, lymph node positive tumors; PDLIM2, PDZ and LIM domain protein 2 encoded by PDLIM2 gene; PLAU, urokinase plasminogen activator encoded by PLAU gene; qRT-PCR, quantitative real-time polymerase chain reaction; PR, progesterone receptor protein encoded by PGR gene; RELA, NF-κB transcription factor p65 protein encoded by RELA gene; RNF25, ring finger protein 25 protein encoded by RNF25 gene; SERPINE1, plasminogen activator inhibitor 1 (also known as PAI-1) encoded by SERPINE1 gene; STMN1, stathmin 1 protein encoded by STMN1 gene; TN, triple negative subtype breast tumors; TRAF3IP2, TRAF3-interacting protein 2 encoded by TRAF3IP2 gene; TMSB10, thymosin beta 10 protein encoded by TMSB10 gene; YWHAH, 14-3-3 protein encoded by YWHAH gene.
Proteomics Sample Preparation, Pooling, Digestion, and Labeling-One hundred and fifty microliters of lysis solution (0.5 M triethylammonium bicarbonate, pH 8.5; 0.05% w/v SDS) was added to each tissue, which was then homogenized in a mechanical homogenizer (Retsch Technology, Haan, Germany) using a metal ball for 2 × 2 min at 25 s⁻¹. The homogenates were subjected to needle sonication (Bandelin 2200 Ultrasonic homogenizer, Bandelin, Germany; 30 × 0.1 s pulses at 50 W), kept on ice for 1 h, and centrifuged at 14,000 × g for 20 min at 4°C, and the total protein in the supernatant was quantified using a modified Bradford assay (Bio-Rad, Hercules, CA) according to the manufacturer's instructions.
To match the sample set size to the sample capacity of the iTRAQ-2DLC-MS/MS approach, samples were pooled as outlined in Fig. 1, with details presented in supplemental File 1: 25 μg (total protein) aliquots of tumor lysates from four patients with the same clinicopathological characteristics (tumor grade, lymph node status, ER and HER2 status) were pooled together. The pooled lysates, containing 100 μg of protein in 20 μl of 0.5 M triethylammonium bicarbonate, pH 8.5, and 0.05% w/v SDS, were subjected to reduction of cysteine S-S bridges by the addition of 2 μl of 50 mM tris(2-carboxyethyl)phosphine (TCEP), followed by incubation for 1 h at 60°C. Cysteines were blocked by adding 1 μl of 200 mM methyl methanethiosulfonate in isopropanol and incubating for 10 min at room temperature (RT). For trypsin digestion, 6 μl of freshly prepared trypsin (Roche, Mannheim, Germany) solution (500 ng/μl) was added and the samples were incubated for 12 h at RT (protein/trypsin ratio ~1:30). Labeling with iTRAQ eight-plex reagents (AB SCIEX, Darmstadt, Germany) was performed at RT for 2 h according to the manufacturer's instructions (see Fig. 1 for the design of the three iTRAQ experiments and supplemental File 1 for the tissue specimens involved). The samples in each eight-plex were then mixed and concentrated in a centrifugal vacuum concentrator to a final volume of 100 μl.
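The pooling and digestion quantities above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the constant and function names are hypothetical, the values come from the text, and it assumes that all eight channels of each iTRAQ experiment carried sample pools (the text does not state whether a reference channel was used).

```python
# Values taken from the Methods text; constant names are hypothetical.
TUMORS = 96                  # 48 grade 1 + 48 grade 3 carcinomas
TUMORS_PER_POOL = 4          # same grade, node status, ER and HER2 status
PROTEIN_PER_TUMOR_UG = 25    # ug total protein contributed by each lysate
CHANNELS_PER_EXPERIMENT = 8  # iTRAQ eight-plex (assumes every channel holds a pool)
TRYPSIN_VOLUME_UL = 6        # ul of trypsin solution added per pool
TRYPSIN_CONC_NG_PER_UL = 500 # ng/ul trypsin stock


def pooling_summary():
    """Derive pool count, experiment count, and protein/trypsin ratio."""
    pools = TUMORS // TUMORS_PER_POOL
    experiments = pools / CHANNELS_PER_EXPERIMENT
    protein_per_pool_ug = TUMORS_PER_POOL * PROTEIN_PER_TUMOR_UG
    trypsin_ug = TRYPSIN_VOLUME_UL * TRYPSIN_CONC_NG_PER_UL / 1000  # ng -> ug
    protein_to_trypsin = protein_per_pool_ug / trypsin_ug
    return {
        "pools": pools,
        "itraq_experiments": experiments,
        "protein_per_pool_ug": protein_per_pool_ug,
        "trypsin_ug": trypsin_ug,
        "protein_to_trypsin": protein_to_trypsin,
    }


if __name__ == "__main__":
    print(pooling_summary())
```

Under these assumptions, 96 tumors give 24 pools and therefore three eight-plex experiments, and 3 μg of trypsin per 100 μg pool corresponds to a ratio of roughly 1:33, consistent with the stated ~1:30.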
ZIC-HILIC Peptide Fractionation-A Dionex P680 HPLC pump with a PDA-100 photodiode array detector and a SeQuant ZIC-pHILIC column (150 × 4.6 mm, 5 μm) with a corresponding precolumn (20 × 2.1 mm, 5 μm) were used for peptide fractionation. Mobile phase A comprised 0.1% ammonium hydroxide in acetonitrile. Mobile phase B comprised 10 mM ammonium formate in water adjusted with ammonium hydroxide to pH 10. The mixture of iTRAQ-labeled peptides (100 μl) was diluted with 100 μl of ZIC-HILIC mobile phase A and centrifuged at 14,000 × g for 10 min at RT. The sample injection volume was 190 μl. Separation conditions were: isocratic 10% B for 10 min; gradient up to 40% B over 40 min; gradient up to 100% B over 40 min; isocratic 100% B for 10 min; and gradient down to 10% B over 5 min. The flow rate was 0.4 ml/min, the column temperature 30°C, and UV detection was at 280 and 254 nm. Thirteen to fourteen fractions were collected from each fractionation and dried in a centrifugal vacuum concentrator.
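The stepwise gradient program above can be expressed as a piecewise-linear function of time, which makes the total run length and solvent consumption explicit. This is a hypothetical Python helper for illustration, not part of the instrument software; it assumes strictly linear ramps between the stated set points.

```python
from typing import List, Tuple

# (duration in min, start %B, end %B), transcribed from the separation conditions
GRADIENT: List[Tuple[float, float, float]] = [
    (10, 10.0, 10.0),    # isocratic 10% B
    (40, 10.0, 40.0),    # linear ramp to 40% B
    (40, 40.0, 100.0),   # linear ramp to 100% B
    (10, 100.0, 100.0),  # isocratic 100% B
    (5, 100.0, 10.0),    # return to 10% B
]


def percent_b(t_min: float) -> float:
    """Return %B at time t (min), interpolated linearly within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the program")


def run_length_min() -> float:
    """Total program length in minutes."""
    return sum(duration for duration, _, _ in GRADIENT)


def mobile_phase_volume_ml(flow_ml_per_min: float = 0.4) -> float:
    """Mobile phase consumed over one full program at a constant flow rate."""
    return run_length_min() * flow_ml_per_min
```

Summing the segments gives 105 min per fractionation; at 0.4 ml/min this corresponds to about 42 ml of mobile phase.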
Protein identification and quantification were based on accurate precursor mass measurements and high resolution/accurate mass fragmentation data. The mass spectrometer was operated in positive ion mode using a data-dependent "Top 10" method. In each cycle, a full scan spectrum was acquired in the Orbitrap Velos (Thermo Fisher Scientific) (m/z range 400-2000) at a target value of 1 × 10⁶ ions (two microscans) with resolution r = 30,000 at m/z 400, followed by higher energy collision dissociation (HCD) of the 10 most intense ions with a target value of 5 × 10⁴ ions (one microscan). Fragment ions were measured in the Orbitrap mass analyzer with resolution r = 7,500 at m/z 400. The 'lock mass' function was enabled for the MS mode, with the background ion at m/z 445.1200 used as the lock mass ion. General mass spectrometric conditions were as follows: spray voltage, 1.75 kV; no sheath or auxiliary gas flow; S-lens, 60%. FT preview mode was disabled, and charge state screening and rejection of singly charged ions were enabled. The ion selection threshold was 500 counts for MS2, the isolation width 1.2 Th, and the HCD normalized collision energy 42. Dynamic exclusion was employed: a ±5 ppm window around the selected m/z was excluded for 30 s.
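Several settings above (the dynamic exclusion window, and the 5 ppm precursor tolerance used in the database search) are expressed in parts per million. A minimal Python sketch of the conversion, with hypothetical function names, shows how such a window translates into absolute m/z units; for the lock mass ion at m/z 445.1200, ±5 ppm corresponds to about ±0.0022 m/z.

```python
from typing import Tuple


def ppm_to_mz(mz: float, ppm: float) -> float:
    """Absolute m/z half-width corresponding to a tolerance given in ppm."""
    return mz * ppm / 1e6


def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Lower and upper m/z bounds of a +/- ppm window around mz."""
    delta = ppm_to_mz(mz, ppm)
    return mz - delta, mz + delta


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error of an observed m/z, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def is_excluded(candidate_mz: float, excluded_mz: float, ppm: float = 5.0) -> bool:
    """True if a candidate precursor falls inside the exclusion window of a
    precursor that was already selected (time limit not modeled here)."""
    low, high = ppm_window(excluded_mz, ppm)
    return low <= candidate_mz <= high
```

The 30 s duration of the exclusion is deliberately not modeled; the sketch only covers the m/z part of the rule.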
Proteomics Data Analysis-Protein identification and quantification in the iTRAQ experiments were performed with Proteome Discoverer™ version 1.1 software (Thermo Fisher Scientific, Bremen, Germany) using the Mascot database search algorithm (Mascot server version 2.2.4, Matrix Science, London, UK). The data analysis parameters were as follows. Spectrum properties filter: peptide mass range 800-7000 Da. Peak filters: S/N = 3. Input data: protein database SwissProt (version 2010_04); enzyme name Trypsin (cleaving polypeptides at the carboxyl side of lysine or arginine except when either is followed by proline); maximum missed cleavage sites 2; taxonomy Homo sapiens (20279 human protein entries were searched in total). Peptide scoring options: peptide cut-off score 10 (Proteome Discoverer default). Protein scoring options: MudPIT scoring used; protein relevance threshold 20. Decoy database search: true; target FDR 0.05 (as calculated by Proteome Discoverer). Tolerances: 5 ppm precursor mass tolerance and 0.8 Da fragment mass tolerance. Modifications: dynamic (variable) phosphorylation (STY), oxidation (M), deamidation (NQ), and acetylation (K); static (fixed) iTRAQ eight-plex (K, N-term) and methylthio (C). Quantitation method: iTRAQ eight-plex (Thermo Scientific Instruments). Protein quantification was based on unique peptides (supplemental File 2) with at least three quantitative ratios, using the statistical analysis described below. The protein grouping function was disabled.
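The trypsin specificity rule used in the search (cleavage C-terminal to K or R unless followed by P, at most two missed cleavages) can be sketched as an in silico digest. This Python sketch is not how Mascot is implemented; the function names and the toy sequence are hypothetical, and it ignores the peptide mass filter and all modifications.

```python
from typing import List


def cleavage_sites(sequence: str) -> List[int]:
    """Positions after which trypsin cuts: C-terminal to K/R, but not before P."""
    return [
        i + 1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]


def tryptic_peptides(sequence: str, max_missed: int = 2) -> List[str]:
    """All peptides containing up to `max_missed` uncleaved internal sites."""
    bounds = [0] + cleavage_sites(sequence) + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Each step further to the right skips one more cleavage site
        for end in range(start + 1, min(start + 2 + max_missed, len(bounds))):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    # K before P is not cleaved; R-C and K-D bonds are
    print(tryptic_peptides("AKPGRCKD", max_missed=0))
```

Allowing two missed cleavages enlarges the candidate set from three to six peptides for this toy sequence, which illustrates why the missed-cleavage setting affects search space size.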
Statistical Analysis of Proteomics Data-Loess and global median normalization were used to process the proteomics data. Data were log2-transformed and analyzed at both the peptide and protein level. Statistical significance of observed fold-change ratios was determined by a one-sample t test, and p values were adjusted for multiple hypothesis testing by the Benjamini-Hochberg procedure. To select differentially expressed proteins for further validation, three criteria were applied in parallel: (1) fold change higher than 1.2 for up-regulation or lower than 0.8 for down-regulation; (2) lower limit of the fold change confidence interval above 1.0 for up-regulation, or upper limit below 1.0 for down-regulation; and (3) FDR-adjusted p value < 0.05 for proteins with a high or medium number of observations (N > 10), or < 0.1 for proteins with a low number of observations (N ≤ 10). The rationale for these criteria was to filter out false positive protein level changes caused by (1) technical method variability of up to 20% (criterion 1), (2) inconsistent protein levels across biological replicates (criteria 2 and 3), and (3) variability in the levels of the various peptides representing a single protein (criteria 2 and 3). The higher FDR-adjusted p value threshold (0.1) was allowed so as not to filter out proteins whose quantification relied on low numbers of peptides; this threshold was chosen based on the HER2/ERBB2 protein change between HER2-positive and HER2-negative carcinomas (P04626, n = 4, FCH = 5.27, p = 0.099; supplemental File 4, "HER2+ versus HER2-" sheet, and supplemental File 5), where HER2 tumor positivity/negativity was independently determined by IHC. All calculations were performed in R 2.10 (9) using packages from Bioconductor (www.bioconductor.org).
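The multiple-testing correction and the three parallel selection criteria can be summarized in a short Python sketch. It is not the authors' R/Bioconductor code: `benjamini_hochberg` reimplements the standard step-up adjustment, `is_candidate` is a hypothetical helper, and the sketch assumes fold changes and confidence limits are already on the linear (not log2) scale and reads the N ≤ 10 threshold as p < 0.1.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_candidate(fold_change: float, ci_low: float, ci_high: float,
                 p_adj: float, n_obs: int) -> bool:
    """Apply the fold-change, confidence-interval, and FDR criteria together."""
    up = fold_change > 1.2 and ci_low > 1.0      # criteria 1 and 2, up
    down = fold_change < 0.8 and ci_high < 1.0   # criteria 1 and 2, down
    alpha = 0.05 if n_obs > 10 else 0.1          # criterion 3
    return (up or down) and p_adj < alpha
```

For example, the HER2 change quoted above (FCH = 5.27, p = 0.099, n = 4) passes with any confidence interval whose lower limit exceeds 1.0, but the same p value would fail if the protein had more than ten observations.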
qRT-PCR Gene Expression Analysis Using Low Density Arrays-Gene expression analysis of 95 genes (see supplemental File 6 for detailed information on the TaqMan assays used) in 96 primary breast cancer tissues was performed using Low Density Arrays (MicroFluidic Card System, 384 qRT-PCR reactions/card) on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA). RNA isolated from each individual tissue piece was analyzed in three analytical replicates run independently on three different cards. A 100 μl sample mix containing 200 ng of cDNA and 1× TaqMan Universal PCR Master Mix UNG was used in each loading reservoir. All analysis parameters were provided electronically by the manufacturer together with the custom MicroFluidic Cards. Gene expression was evaluated by the comparative CT method of relative quantification using RQ Manager 1.2 software (Applied Biosystems), with 18S rRNA as an endogenous control. The baseline was set manually to 0.25 for all samples, and ΔCT values were exported for external statistical evaluation.
Statistical Analysis of qRT-PCR Data-The comparative Ct method (10) was used to calculate ΔΔCt, the relative quantity r = 2^(−ΔΔCt), and their standard deviations. Discrepancies in the number of observations for each gene were inspected by Fisher's exact test. Ct values were then compared by a two-sample t test, and results were considered significant when p ≤ 0.05.
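A worked example of the comparative Ct calculation, r = 2^(−ΔΔCt), with 18S rRNA as the endogenous control. In this Python sketch the Ct values are invented for illustration, analytical replicates are simply averaged, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], reference_cts: Sequence[float]) -> float:
    """dCt = mean Ct(target) - mean Ct(18S reference), averaging replicates."""
    return mean(target_cts) - mean(reference_cts)


def relative_quantity(dct_group: float, dct_baseline: float) -> float:
    """r = 2^(-ddCt), where ddCt = dCt(group) - dCt(baseline group)."""
    ddct = dct_group - dct_baseline
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values (one value per card)
dct_node_positive = delta_ct([24.1, 24.0, 23.9], [10.0, 10.1, 9.9])
dct_node_negative = delta_ct([26.0, 26.1, 25.9], [10.0, 9.9, 10.1])
r = relative_quantity(dct_node_positive, dct_node_negative)
```

Here the target gene reaches threshold two cycles earlier (relative to 18S) in the node positive sample, so ΔΔCt = −2 and r = 4, i.e. roughly fourfold higher expression under the usual assumption of near-100% amplification efficiency.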
Correlation Analysis and Hierarchical Clustering-Hierarchical clustering used the Fisher Z-transformed Pearson correlation as the distance measure between samples, combined with the average linkage criterion. Spearman correlation coefficients were computed for each pair of samples in the qRT-PCR study. Additionally, the qRT-PCR profiles of the samples pooled in the proteomic experiments were averaged, and Spearman correlation coefficients for the combined proteomics/qRT-PCR profiles were calculated.
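The correlation steps above can be sketched in plain Python: Spearman's rho as the Pearson correlation of average ranks, the Fisher Z transform, and gene-by-gene averaging of the four patient profiles that form one proteomic pool. The function names are hypothetical, and because the text does not state exactly how Z values were converted into a clustering distance, the sketch stops at the transformation.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_z(r: float) -> float:
    """Fisher Z transform z = atanh(r); undefined for |r| = 1."""
    return math.atanh(r)


def average_pool(patient_profiles: List[List[float]]) -> List[float]:
    """Average qRT-PCR profiles (one list per patient) gene by gene for one pool."""
    return [mean(values) for values in zip(*patient_profiles)]
```

A pooled transcript profile built with `average_pool` from the four patients of one pool can then be compared with that pool's protein log2 ratios using `spearman`.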
Immunohistochemistry (IHC) on Tissue Microarrays (TMA)-IHC was used to evaluate protein levels and staining patterns of selected putative biomarkers in TMAs prepared from formaldehyde-fixed, paraffin-embedded blocks taken in parallel with the frozen tissue. Antibodies and protocols are presented in supplemental File 9. Antibody dilutions were optimized on a test TMA of breast carcinomas from the same sample set. Paraffin sections were placed on Superfrost Plus slides (Gerhard Menzel GmbH, Braunschweig, Germany), deparaffinized in three changes of xylene (5 min each), rehydrated in 95%, 70%, and 50% ethanol (5 min each), and brought to distilled water. After endogenous peroxidase was blocked (3% hydrogen peroxide in phosphate buffered saline (PBS), pH 7.5, for 15 min), antigen retrieval was performed in 10 mM sodium citrate buffer, pH 6.0, for 30 min at 95°C. Primary antibodies diluted in DAKO antibody diluent (DAKO, Glostrup, Denmark) were incubated overnight at 4°C, followed by detection with the DAKO EnVision™/peroxidase kit (cat. No. K4007 for anti-mouse and K4011 for anti-rabbit secondary antibody, DAKO) and counterstaining with Gill's hematoxylin (Sigma-Aldrich, St. Louis, MO). All slides were scored by the same pathologist, who was blinded to other data. Associations between the staining intensity of each selected protein and clinicopathological characteristics were assessed by the Wilcoxon rank sum test for continuous variables and by Pearson's chi-squared test for categorical variables (Fisher's exact test was used when the number of observations was less than 5). All statistical tests were two-sided. All analyses were performed in R (9).
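For small categorical comparisons, such as staining category versus lymph node status in a 2×2 table, Fisher's exact test sums hypergeometric probabilities over all tables with the same margins. The Python sketch below is a generic two-sided implementation with a hypothetical function name, not the R routine used in the study; it follows the common convention of summing all tables no more probable than the observed one.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)
```

For instance, a hypothetical table with 5 of 5 node positive tumors and 0 of 5 node negative tumors showing strong staining gives p = 2/252 ≈ 0.008.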

Analysis of Gene Expression and Connection of Gene Expression With Patient Survival in Independent Published Sample Sets-The publicly available gene expression data set SUPERTAM_HGU133A, including data from four studies (all on the Affymetrix Human Genome U133A platform, 856 samples in total), was downloaded in the log2 normalized form used in (11). Samples were classified into four breast cancer subtypes using the subtype classification model based on gene modules, SCMOD2 (12), resulting in 348 luminal A samples. Information on lymph node status was available for 343 of these cases (n = 76 node positive, n = 267 node negative). The association between gene expression and lymph node status within luminal A samples was assessed by the Wilcoxon rank sum test (also known as the Mann-Whitney test).
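The Wilcoxon rank sum (Mann-Whitney) comparison of expression between node positive and node negative luminal A samples can be sketched as follows. This hypothetical Python helper uses the large-sample normal approximation with a tie correction and no continuity correction; R's `wilcox.test` applies a continuity correction by default (and exact p values for small samples without ties), so its p values may differ slightly.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided p) using the tie-corrected normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for a block of ties
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # Variance of U with the standard correction for tied ranks
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_x - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided
```

In this setting `x` would hold expression values of one probe set in the 76 node positive samples and `y` those of the 267 node negative samples; the sketch fails with a division by zero if all values are identical.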
Survival analysis was performed using Kaplan-Meier Plotter (http://kmplot.com) for both relapse-free survival (RFS) and distant metastasis-free survival (DMFS), involving a microarray data set of 4142 breast cancer tissues (2014 database version) (13). For each gene, the population was split at the upper quartile of expression (chosen based on the approximate proportion of lymph node positive patients in the luminal A grade 1 group), and a 15-year follow-up threshold was applied. Each gene was represented by a user-defined probe set; Affymetrix IDs were as follows: 201783_s_at and 209878_s_at (RELA), 201020_at (YWHAH). The following target groups of patients were analyzed: luminal A (data from 1678 patients were available for analysis of RFS and from 918 patients for DMFS), luminal A restricted to grade 1 (n = 228 for RFS and n = 140 for DMFS), luminal A restricted to N0 patients (n = 933 for RFS and n = 546 for DMFS), luminal A restricted to grade 1 N0 patients (n = 159 for RFS and n = 109 for DMFS), and luminal B (n = 989 for RFS and n = 360 for DMFS).
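The survival comparison was run in Kaplan-Meier Plotter; the Python sketch below only illustrates the underlying steps (upper-quartile split, 15-year administrative censoring, product-limit estimate). Function names are hypothetical, `statistics.quantiles` uses its default "exclusive" interpolation, which may not match the cutoff computed by the web tool, and no log-rank test is included.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def upper_quartile_groups(expression: Sequence[float]) -> List[bool]:
    """True for samples with expression above the 75th percentile ('high' group)."""
    q3 = quantiles(expression, n=4)[2]
    return [value > q3 for value in expression]


def censor_at(times: Sequence[float], events: Sequence[int],
              horizon: float = 15.0) -> Tuple[List[float], List[int]]:
    """Administratively censor follow-up at `horizon` years (event=1, censored=0)."""
    out_times, out_events = [], []
    for t, e in zip(times, events):
        if t > horizon:
            out_times.append(horizon)
            out_events.append(0)
        else:
            out_times.append(t)
            out_events.append(e)
    return out_times, out_events


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns [(event_time, survival_probability), ...]."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied follow-up times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

In use, `upper_quartile_groups` would split one probe set's expression values, each group's follow-up would pass through `censor_at`, and the two resulting `kaplan_meier` curves would be compared.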

RESULTS
Untargeted Proteomics Screening-To identify metastasis-related proteins in low grade breast cancer, we performed a high-resolution proteomics discovery study on a set of 48 clinicopathologically well characterized small primary grade 1 luminal A (ER+, PR+, HER2-) breast tumors: 24 lymph node positive and 24 lymph node negative. A similar matched set of 48 high grade tumors was used to investigate the selectivity of proteins for the low grade luminal A tumor group and to gain potential insight into common versus distinct metastatic processes during progression of low and high grade tumors and of different breast cancer subtypes. The workflow of the proteomic experiment together with all follow-up studies is shown in Fig. 1. A total of 4405 proteins were identified based on at least one tryptic peptide (FDR < 0.05). Protein and peptide identification data, together with peptide fractionation chromatograms, are available in supplemental File 2. Peptide spectra are accessible through MS-Viewer at http://prospector2.ucsf.edu/prospector/cgi-bin/msform.cgi?form=msviewer, search key access dd7asd2je5. The mass spectrometry raw proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange. The quantitative data at the peptide and protein level are presented in supplemental Files 3 and 4, respectively, where detailed comparisons of protein levels between the key breast tumor characteristics (grade, lymph node status, estrogen and HER2 receptors) are available. Forty-two proteins whose levels correlated either positively or negatively with lymph node metastasis were selected for further verification. An additional 23 proteins that are connected to metastasis according to the literature and that exhibited dysregulation in other comparisons (Table I) were also included in the verification process to investigate their correlation with low grade cancer metastasis.
Targeted Transcriptomics-To further elucidate the mechanisms of protein alterations in low grade breast cancer, we designed a custom TaqMan Low Density Array (MicroFluidic card). The arrays were made to monitor (1) 65 gene transcripts selected on the basis of the observed changes at the proteome level (shown in Table I), and (2) an additional 30 genes that were either related to pro-metastatic mechanisms according to the literature but not detected at the protein level, or that served as internal controls of the sample set design (see list in supplemental File 6). Some genes on the array are also members of the Oncotype DX (15) (MMP11, ESR1, PGR, ERBB2, ACTB) and MammaPrint (16) (TGFB1, STMN1, and MMP9) gene expression panels. Complete results of the transcriptomics experiment are presented in supplemental File 6.
Connecting Proteomics and Transcriptomics Data-Targets that exhibited statistically significant changes at both the protein and transcript levels in lymph node positive versus negative grade 1 tumors were then selected. This group included carboxypeptidase B1 (CPB1), PDZ and LIM domain protein 2 (PDLIM2), ring finger protein 25 (RNF25), NF-κB transcription factor p65 (RELA), 14-3-3 (YWHAH), stathmin 1 (STMN1), and thymosin beta 10 (TMSB10). TRAF3-interacting protein 2 (TRAF3IP2) was up-regulated in lymph node positive versus negative tumors regardless of grade, and integrin beta-1 (ITGB1) was up-regulated in grade 3 but not grade 1 tumors with metastasis (see Tables I to III, in bold). Table II also summarizes genes that differed at both the transcript and protein levels between other subgroups of breast tumors included in the study, which were used to investigate the clinicopathological selectivity of the identified targets.
Protein levels and gene expression were also analyzed together using hierarchical clustering (Fig. 2 and supplemental Files 7A-7J). Three clusters of gene products related to lymph node metastasis in grade 1 tumors were revealed (Fig. 2). G1 metastasis related cluster 1 includes CPB1, PDLIM2, and RNF25, and neighbors a large cluster of estrogen receptor related gene products composed of ESR1, PGR, and anterior gradient proteins 2 and 3 (AGR2 and AGR3). The second lymph node positivity related cluster in grade 1 tumors (G1 metastasis related cluster 2 in Fig. 2) contains STMN1, TMSB10, and ITGB1, together with the other metastasis-related genes EPCAM, KISS1, and MTA1, and neighbors a cluster containing PLAU and SERPINE1. RELA and YWHAH belong to G1 metastasis related cluster 3, together with the other metastasis-associated genes plasminogen activator inhibitor 1 RNA-binding protein (SERBP1) and gelsolin (GSN) (Fig. 2).
Immunohistochemistry-IHC staining was performed for the top targets from the previous analysis for which specific, IHC-compatible antibodies were available: CPB1, RNF25, STMN1, ITGB1, and YWHAH. Data confirming antibody specificity, including the effects of siRNA silencing on protein levels by Western blot and protein profiles in breast cancer cell lines, are available in supplemental File 8. Representative IHC images of CPB1 staining in low grade breast tumors are shown in Fig. 3; images for the other proteins are in supplemental File 9. Although IHC is inherently semiquantitative, the data confirmed key trends observed in the proteomics and transcriptomics data: (1) up-regulation of CPB1 in lymph node positive versus lymph node negative grade 1 tumors, and (2) up-regulation of STMN1 in grade 3 versus grade 1 tumors. Data are available in Table III.

Evaluation of Clinicopathological Selectivity of Gene Products Correlating with Metastasis of Low Grade Tumors-Analysis of clinicopathological selectivity within breast cancer subtypes was enabled by including grade 1 and grade 3 tumors in the same study design, and was based on agreement between the proteomics, transcriptomics, and IHC data. Table III shows the clinicopathological selectivity of proteins/mRNAs that were up-regulated in lymph node positive versus lymph node negative grade 1 tissues: CPB1, RNF25, PDLIM2, STMN1, TMSB10, RELA, YWHAH, and TRAF3IP2. Up-regulation of these proteins and corresponding transcripts in lymph node positive tissues was observed in grade 1 tumors (G1:N1-2/N0 in Table III) but not in grade 3 tumors (G3:N1-2/N0), with the exception of TRAF3IP2, which was up-regulated in lymph node positive tissues regardless of grade (Table III). Further stratification of these potential biomarkers was based on their differential expression between grade 3 and grade 1 tumors (G3/G1).
Table III shows that higher levels of CPB1, RNF25, and PDLIM2 proteins and transcripts were typical of highly differentiated grade 1 tumors. In contrast, STMN1 and TMSB10, which were up-regulated in lymph node positive versus negative grade 1 tumors, were also up-regulated in grade 3 versus grade 1 tumors. This stratification of proteins by clinicopathological selectivity was fully reflected in the hierarchical clustering (Fig. 2).

DISCUSSION
Methodological Strategy-The main aim of this study was to identify novel proteins correlating with early lymph node metastasis in low grade breast cancer that could be readily incorporated into clinical use. We used a quantitative proteomics approach to identify such proteins by comparing a group of 24 lymph node positive luminal A grade 1 tumors with a corresponding set of 24 lymph node negative tumors. We selected small (pT1c, 11-20 mm) tumors to keep potential sampling error as low as possible and thus to capture the metastasis-responsible cell population. Although this approach is sufficient for this purpose, we also wished to study whether low grade metastasis-associated proteins differ from those connected to metastasis in high grade tumors. Identifying proteins associated specifically with metastatic spread of tumors of different grades would imply that different mechanisms operate during their progression, requiring a different therapeutic approach for each. Identifying markers of potential metastasis specifically in low grade tumors is also clinically important because these tumors are less sensitive to chemotherapy than high grade tumors and may therefore require different targeted therapeutics. However, markers that indicate metastasis in all breast cancer patients regardless of grade are also very important clinically, and markers of such potentially general applicability can also be discovered through this methodological strategy.
Fig. 2. Genes marked with full green dots were up-regulated, whereas those marked with empty green dots were down-regulated, at the protein level in G1 N1-2 versus G1 N0 tumors. Genes marked with full orange dots were statistically significantly up-regulated at the transcript level in G1 N1-2 versus G1 N0 tumors (no down-regulated transcripts were found here). The genes up-regulated at both levels are considered core genes of G1 metastasis related cluster 1 (CPB1, PDLIM2 and RNF25: associated with lymph node metastasis of grade 1 tumors and with low levels in grade 3 tumors), G1 metastasis related cluster 2 (STMN1 and TMSB10: associated with lymph node metastasis of grade 1 tumors and with high levels in grade 3 tumors) and G1 metastasis related cluster 3 (RELA and YWHAH: associated with lymph node metastasis of grade 1 tumors, with no significant difference between low and high grade tumors). Clusters related to the estrogen receptor (ESR1) and the HER2 receptor, and the PLAU+SERPINE1 genes, are also highlighted.
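The core-gene and cluster definitions given above amount to a simple decision rule. A gene counts as a core gene only if it is up-regulated in G1 N1-2 versus G1 N0 tumors at both the protein and the transcript level. Its cluster then depends on its grade 3 versus grade 1 behavior. The minimal sketch below restates that rule. The `Evidence` record, the "down"/"up"/"ns" labels and the gene `GENE_X` are hypothetical illustrations, and the function is not the clustering procedure the authors used.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    protein_up: bool     # up-regulated at protein level, G1 N1-2 vs G1 N0
    transcript_up: bool  # significantly up-regulated at transcript level, same comparison
    g3_vs_g1: str        # "down", "up" or "ns" (no significant difference)


# Cluster numbering follows the definitions in the text:
# 1 = low in grade 3, 2 = high in grade 3, 3 = no grade difference
CLUSTER_BY_G3 = {"down": 1, "up": 2, "ns": 3}


def assign_core_clusters(evidence):
    """Keep genes up-regulated at BOTH protein and transcript level (core genes)
    and map each one to a G1 metastasis related cluster by its grade 3 vs grade 1 behavior."""
    return {
        gene: CLUSTER_BY_G3[ev.g3_vs_g1]
        for gene, ev in evidence.items()
        if ev.protein_up and ev.transcript_up
    }


evidence = {
    "CPB1": Evidence(True, True, "down"),
    "STMN1": Evidence(True, True, "up"),
    "RELA": Evidence(True, True, "ns"),
    "GENE_X": Evidence(True, False, "ns"),  # hypothetical protein-only hit: excluded
}
print(assign_core_clusters(evidence))  # {'CPB1': 1, 'STMN1': 2, 'RELA': 3}
```

Requiring agreement at both levels is what discards single-level hits such as the hypothetical `GENE_X`. This is the false-positive filter the authors describe for their two-level strategy.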
We also analyzed individual patients at the transcript level for a large panel of putative protein biomarkers, and by IHC for a more limited set for which antibodies are available. In addition to providing independent verification of the shotgun proteomic data, these approaches are essential for the adoption of biomarkers into clinical practice, either through incorporation into multiplex expression assays (e.g. Oncotype DX and MammaPrint) or for use in routine diagnostic histopathology laboratories. Discrepancies between mRNA and protein levels are not uncommon, but selecting targets that correlate at two distinct biological levels reduces the false positive findings inherent to a single screening approach. It also allowed us to interrogate large independent patient data sets for further validation and investigation of clinical correlation (Fig. 4) and impact on survival (Fig. 5), which is not otherwise possible with our limited number of samples.
As shown in Tables I-III, CPB1 exhibited the largest increase in lymph node positive versus lymph node negative grade 1 tumors at the protein level (G1:N1/N0 FCH = 4.34, p = 0.002) and at the transcript level (FCH = 6.737, p = 0.025), supported by IHC staining in the discovery set (p = 0.03809). IHC in an independent sample set (n = 64) of luminal A grade 1 tumors showed a similar trend (Fig. 3) that was not statistically significant (p = 0.13055). This may reflect the semiquantitative nature of the method as well as the higher proportion of N0 to N1-2 samples in the validation set. Moreover, high protein and transcript levels were specific for grade 1 and not grade 3 tumors. Kaplan-Meier plots on independent samples support both the clinical impact of CPB1 (Fig. 5) and its selectivity for luminal A tumors, because a different association with patient survival was found for luminal B tumors of higher grade (Fig. 5K).
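The comparisons above are reported as a fold change (FCH) with a p value. The tests behind each p value are not named in this section, so the sketch below shows only one generic way to produce such a pair: a ratio of group means, and a two-sided Mann-Whitney U test with a normal approximation and tie correction. A rank-based test is a common choice for semiquantitative, ordinal data such as IHC scores. The choice of test and the toy IHC scores are assumptions, not the authors' method or data.

```python
import math


def fold_change(pos, neg):
    """Ratio of group means, e.g. N1-2 over N0 (assumes a positive N0 mean)."""
    return (sum(pos) / len(pos)) / (sum(neg) / len(neg))


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U for sample x, p value). Fails if every value is tied."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())  # sum of (t^3 - t) over tie groups
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


pos_scores = [3, 2, 3, 2, 3, 1]  # hypothetical IHC scores, N1-2 tumors
neg_scores = [1, 0, 1, 2, 0, 1]  # hypothetical IHC scores, N0 tumors
u, p = mann_whitney(pos_scores, neg_scores)
print(fold_change(pos_scores, neg_scores), u, p)
```

With small ordinal samples, many values are tied. This is one reason semiquantitative IHC can miss a trend that is significant at the continuous protein or transcript level.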
CPB1 is a secreted tissue protease, and its potential pro-metastatic role in lymph node positive tumors might be related to matrix metalloprotease activity (http://www.uniprot.org/uniprot/P15086). In this regard, its homolog plasma carboxypeptidase B2 has been implicated in the pro-metastatic urokinase plasminogen activator/inhibitor system (17).
Another target correlating with lymph node metastasis of grade 1 tumors identified here was PDLIM2, a protein with pro-metastatic, pro-survival, pro-angiogenic and pro-transformation functional properties through the NF-κB pathway (18). PDLIM2 inhibits p65/RELA, the central transcription regulator of the NF-κB pathway, via its nuclear ubiquitin ligase activity. Both PDLIM2 and RELA were up-regulated in the discovery and validation sample sets (Fig. 4), indicating activation of the NF-κB pathway. In addition, the inhibitory effect of PDLIM2 may be reflected in the Kaplan-Meier plots, where higher PDLIM2 expression was associated with better survival (Figs. 5B and 5I). PDLIM2 was recently proposed as a potential therapeutic target in breast cancer (19), and a recent study also revealed that PDLIM2 regulates transcription factor activity in epithelial-to-mesenchymal transition via the COP9 signalosome (20). The third target identified in this study as up-regulated in grade 1 tumors with lymph node involvement was ring finger protein 25 (RNF25, or AO7), which binds to the transactivation domain of p65/RELA and enhances its transcriptional activity (21). In parallel, RNF25 has a pro-metastatic role as an E3 ubiquitin ligase of Naked2, an antagonist of the pro-metastatic Wnt pathway (22). Moreover, levels of these NF-κB associated proteins were lower in grade 3 than in grade 1 tumors, indicating that these components of the NF-κB pathway are more typical of grade 1 tumors (Table III and supplemental Files 9, 10).
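The Kaplan-Meier comparisons cited above, such as higher versus lower PDLIM2 expression in Figs. 5B and 5I, rest on two standard calculations. The first is the Kaplan-Meier product-limit estimate of survival within each expression group. The second is commonly a log-rank test of the difference between groups. The following is a minimal standard-library sketch of both. This section does not state how patients were split into high and low expression groups or which test produced the survival p values, so the grouping and the toy follow-up data below are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.
    events: 1 = event observed (e.g. relapse or death), 0 = censored.
    Returns [(time, survival probability)] at each event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        removed = sum(1 for tt, _ in data if tt == t)  # events + censored at t
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
        i += removed  # rows with time t are contiguous after sorting
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic with 1 df, p value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e == 1}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)  # at risk, group a
        d = sum(1 for tt, e, _ in pooled if tt == t and e == 1)    # events, both groups
        d_a = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group a
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)  # hypergeometric variance
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


# Hypothetical follow-up (years) for "high" and "low" expression groups
high_t, high_e = [6, 7, 8, 9, 10], [0, 0, 0, 0, 0]
low_t, low_e = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
print(kaplan_meier(low_t, low_e))
print(log_rank(low_t, low_e, high_t, high_e))
```

Censored patients leave the risk set without lowering the survival estimate. This is why the product-limit estimate, rather than a simple proportion of survivors, is used for public cohorts with variable follow-up.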
Other proteins correlating with lymph node metastasis are STMN1 and TMSB10, which were linked more generally with an aggressive phenotype because of their overexpression in grade 3 and triple negative tumors (Table I). STMN1 (also known as oncoprotein 18, OP-18) is a microtubule-destabilizing phosphoprotein with a key role in the control of mitosis. Recent evidence supports a role for STMN1 in advanced invasive and metastatic cancer through its pro-survival function (23), and STMN1 is associated with high risk and lymph node metastasis (24,25). High STMN1 expression also negatively influences tamoxifen response in estrogen receptor-positive breast cancer (26). These observations support the role of STMN1 as a marker of poor prognosis and a target for antitumoral and anti-metastatic therapies (23,27). On the other hand, STMN1 has been reported to play a protumorigenic role in early stages of carcinogenesis but may act as a tumor suppressor and inhibit metastasis formation in later stages (28). The role of TMSB10 in cancer is mainly related to cytoskeletal alterations. TMSB10 is part of a gene expression signature for predicting lymph node metastasis of early stage cervical carcinomas (29) and is up-regulated in metastatic papillary thyroid carcinomas (30). The roles of STMN1 and TMSB10 in breast cancer metastasis are supported by their correlation and clustering with other metastasis-associated gene products within G1 metastasis related cluster 2, e.g. epithelial cell adhesion molecule (EPCAM), metastasis-suppressor KiSS-1 (KISS1) and metastasis-associated protein 1 (MTA1). Furthermore, their neighboring cluster in Fig. 2 contains two markers recommended for prediction of breast cancer metastasis (4), PLAU and SERPINE1. The last marker identified here is 14-3-3 (YWHAH). Bergamaschi and Katzenellenbogen recently reported that high YWHAH protein levels correlate with early breast cancer recurrence and are regulated by tamoxifen via down-regulation of microRNA-451 (31).
Different Prometastatic Proteins and Mechanisms Observed in Grade 3 Tumors-In contrast to low grade tumors, fewer proteins correlated with lymph node status in grade 3 cancers. The protein and corresponding transcript most significantly up-regulated in lymph node positive versus negative grade 3 tumors was integrin β1 (ITGB1), which functions in cell-to-cell and cell-to-extracellular matrix (ECM) adhesion, transducing signals from the ECM to the cell and vice versa to influence cell migration and invasion (32). Overexpression of plasminogen activator inhibitor 1 (SERPINE1) in lymph node positive versus negative tumors supported the validity of the experiment and was specific for grade 3 tumors (p = 0.024 for grade 3 versus p = 0.241 for grade 1, Table II). The same applied to osteopontin (SPP1), a known pro-metastatic and pro-survival protein from previous breast cancer studies (33,34).
Two gene expression-based tests are available to determine prognosis in early stage breast cancer (MammaPrint and Oncotype DX). In our study, we analyzed several genes included in these tests (MMP11, ESR1, PGR, ERBB2, ACTB, TGFB1, STMN1, and MMP9) and observed a significant relation to metastasis only for STMN1. Changes in MMP11, MMP9, ESR1, and PGR were related more to differences between breast cancer subtypes of different tumor grade (Table II). Our data also indicate that generally accepted pro-metastatic markers in breast cancer (the urokinase plasminogen activator/inhibitor system, osteopontin and most of the MammaPrint and Oncotype DX genes tested here, with the exception of STMN1) are informative mainly in high grade tumors and may not be useful for predicting the metastatic potential of low grade carcinomas.
In addition to the proteins and corresponding transcripts discussed above, the design of our study enabled the identification of proteins and transcripts correlating with tumor grade and with ER and HER2 receptor status. As identification of such targets was not the aim of this study, we discuss the most interesting observations in supplemental File 11, and the data sets are publicly available for future inspection and analysis.

CONCLUSIONS
A combination of state-of-the-art proteomics, transcriptomics and IHC, together with validation in independent data sets, led to the identification of CPB1, PDLIM2, RNF25, RELA, STMN1, TMSB10, TRAF3IP2, and YWHAH (listed according to tumor grade specificity and verification) as proteins correlating with lymph node positivity of low grade breast cancer.
Our findings indicate that pro-metastatic mechanisms in low grade breast tumors may involve overexpression of CPB1, activation of the NF-κB pathway, pro-survival mechanisms and changes in the cytoskeleton, and that they differ from those in high grade tumors. These data provide candidates for further characterization and validation toward clinically usable diagnostic and therapeutic targets in low grade breast cancer. They may also help identify those rare patients with low grade luminal A breast cancer who should receive more regular follow-up and more intensive therapy.