A Serum Protein Profile Predictive of the Resistance to Neoadjuvant Chemotherapy in Advanced Breast Cancers*

Prediction of the responses to neoadjuvant chemotherapy (NACT) can improve the treatment of patients with advanced breast cancer. Genes and proteins predictive of chemoresistance have been extensively studied in breast cancer tissues. However, noninvasive serum biomarkers capable of such prediction have been rarely exploited. Here, we performed profiling of N-glycosylated proteins in serum from fifteen advanced breast cancer patients (ten patients sensitive to and five patients resistant to NACT) to discover serum biomarkers of chemoresistance using a label-free liquid chromatography-tandem MS method. By performing a series of statistical analyses of the proteomic data, we selected thirteen biomarker candidates and tested their differential serum levels by Western blotting in 13 independent samples (eight patients sensitive to and five patients resistant to NACT). Among the candidates, we then selected the final set of six potential serum biomarkers (AHSG, APOB, C3, C9, CP, and ORM1) whose differential expression was confirmed in the independent samples. Finally, we demonstrated that a multivariate classification model using the six proteins could predict responses to NACT and further predict relapse-free survival of patients. In summary, global N-glycoproteome profile in serum revealed a protein pattern predictive of the responses to NACT, which can be further validated in large clinical studies.

Breast cancer is the most common cause of cancer death in women (1). Neoadjuvant chemotherapy (NACT) is increasingly administered to patients with locally advanced breast cancers at stage IIb-III. NACT is commonly employed to reduce the size of the primary tumor and to eliminate occult systemic micrometastases before surgery. The treatment has been evaluated in several clinical studies, a majority of which have shown that it increases the chance of breast conservation and that its outcomes are correlated with patient survival (2,3). Unfortunately, a variable proportion of patients (up to 40%, depending on the chemotherapy regimen used) may not benefit clinically or pathologically from NACT (4). Such breast cancer patients may possess or develop resistance to the chemotherapeutic agents used (5,6). Currently, there is no method to reliably predict tumor responses to NACT prior to the therapy. Therefore, it is of considerable interest to identify molecular biomarkers that can be used to predict resistance to NACT and thus to assess the treatment options of individual patients.
The search for novel biomarkers has been facilitated by the use of microarray and proteomic technologies. Several gene expression (7-12) and proteomic studies have provided a whole spectrum of profiles for breast cancer cell lines (13,14), tissues (15,16), and serum samples (17,18). These profiles were then used to discover mRNA/protein signatures that can predict disease subtypes, prognosis, and therapy outcomes in breast cancer. Specifically, to investigate molecular signatures associated with chemoresistance, several gene expression studies have been performed on breast cancer cell lines (10,11) and tissues (19-21). The genes reported to be associated with chemoresistance in breast cancer cell lines or tissues include 1) several ABC transporters (MDR1/ABCB1, MRP1/ABCC1, ABCG2, etc.) (22,23) and other transporters acting as energy-dependent efflux pumps of xenobiotics (24), 2) glutathione-S-transferase (GSTpi (25,26)), and 3) thioredoxin (19,27). However, to our knowledge, there has been only a single liquid chromatography-tandem MS (LC-MS/MS)-based shotgun proteomic study, in which a hundred potential tissue biomarkers associated with tamoxifen resistance were identified (28). Moreover, although serum biomarkers have a high utility in clinical practice, noninvasive serum biomarkers capable of predicting NACT resistance have not been exploited.
Here, we performed profiling of N-glycosylated proteins in serum to identify such serum biomarkers. The serum samples were collected prior to NACT from two groups of patients, chemosensitive (CS) and chemoresistant (CR). In this study, we focused on a combination NACT regimen using the chemotherapeutic agents docetaxel (DC) and doxorubicin (AC). An LC-MS/MS-based label-free quantitative measurement technique, in combination with isolation of N-glycosylated peptides (29), was employed to identify serum proteins differentially expressed between CS and CR patients. N-glycosylation is a prevalent modification of serum proteins (29). Many known cancer biomarkers, such as Her2/neu, β human chorionic gonadotropin, α-fetoprotein, prostate-specific antigen, and CA125, are N-glycosylated. Also, the isolation of N-glycopeptides was reported to increase the dynamic range of protein concentrations that can be measured by effectively reducing the complexity of serum samples (29,30). Through a series of statistical analyses of the N-glycosylated serum proteomic data, we identified 13 serum biomarker candidates. Among them, we selected the final set of six potential serum biomarkers (AHSG, APOB, C3, C9, CP, and ORM1) whose differential expression was confirmed in independent samples. Using the six potential biomarkers, we further built a partial least squares-discriminant analysis (PLS-DA) model that can predict the responses to the DC+AC NACT regimen and the relapse-free survival of patients. Therefore, the results demonstrate that global N-glycoproteome profiling in serum revealed a proteome profile predictive of the responses to NACT, which can be further validated in large clinical studies.

EXPERIMENTAL PROCEDURES
Patients and Sample Collection-Patients with locally advanced breast cancer (i.e. tumor size greater than 2 cm) and clinically tumor-positive axillary lymph nodes were eligible for the study. All participating patients were hospitalized at the Seoul National University Hospital (Seoul, Korea) from March to October 2005. They received three cycles of combination NACT using docetaxel (DC) and doxorubicin (AC), one cycle every 3 weeks. Blood samples were obtained at the time of diagnosis. Blood was collected in additive-free vacutainers (BD Vacutainer Systems, Basel, Switzerland) and allowed to clot for 30 min at room air. The samples were then centrifuged at 2000 × g for 20 min at 4°C, and the supernatant was filtered through a cellulose acetate filter with 0.2 μm pore size and 5 cm² filtration area. Serum was aliquoted and stored at −80°C. The response to NACT was evaluated based on RECIST (Response Evaluation Criteria in Solid Tumors) version 1.0 (31), a set of rules that defines the responses to chemotherapy using the following four categories: complete response, partial response, stable disease, and progressive disease. The first two categories were defined as CS, whereas the others were defined as CR. We carefully selected ten chemosensitive (CS) and five chemoresistant (CR) patients for our study based on the evaluation criteria. All samples were collected under valid IRB approval, and blood was collected from the breast cancer patients after informed consent at the Seoul National University Hospital.
Isolation of N-Glycosylated Peptides-N-glycosylated peptides were isolated from the human sera of the 15 patients as previously described (29). No depletion of abundant proteins was attempted. Before the isolation, we divided each serum sample into three aliquots and then isolated N-glycosylated peptides independently from each aliquot, resulting in three N-glycopeptide samples for each serum sample and thus a total of 45 samples for the 15 patients. For each of the 45 samples, the glyco-moieties of the proteins were oxidized by addition of aqueous sodium periodate solution (100 mM) to the protein solution. The oxidized protein samples were then transferred to a filter plate for conjugation to the hydrazide resin. The conjugation was carried out at room temperature for 20 h with gentle shaking. The nonglycosylated proteins were removed by centrifugation at 1000 rpm for 1 min and by washing the resin six times with urea solution (8 M [...]). Reduction and alkylation reactions were conducted by the addition of tris(2-carboxyethyl)phosphine hydrochloride and iodoacetamide solutions, respectively. The resin was washed sequentially with the digestion buffer and 100 mM NH4HCO3 buffer and then resuspended in 100 mM NH4HCO3 buffer. Trypsin (Worthington, Freehold, NJ) (protein-to-trypsin ratio, 1:1 (w/w)) was added to each sample, and the samples were then incubated overnight at 37°C. After discarding the supernatant, the resin was washed sequentially with 1.5 M NaCl, 80% acetonitrile in water, methanol, water, and 100 mM ammonium bicarbonate buffer. The N-linked glycopeptides were then released from the resin through the addition of a diluted glycerol-free PNGase F solution (1:150 dilution, New England Biolabs, Ipswich, MA), followed by incubation with gentle shaking at 37°C overnight. The supernatant containing glycopeptides was collected by centrifugation (at 1000 rpm for 1 min).
The resin was then washed with 80% acetonitrile, and the washing solutions were combined with the initial supernatant. The solutions were evaporated for approximately 2-3 h in a SpeedVac system so that the final volume was less than 200 μl per sample. They were then desalted using 96-well Vydac Silica C-18 columns. The released glycopeptides were collected in a glass vial, dried in the SpeedVac system (approximately 1 h), and resuspended in aqueous 0.4% acetic acid prior to mass spectrometric analysis. Immediately prior to LC-MS/MS analysis, we pooled the three N-glycopeptide samples from the same patient and performed triplicate LC-MS/MS experiments on each of the pooled samples. In summary, we obtained a total of 45 LC-MS/MS data sets from triplicate experiments on each of the 15 pooled N-glycopeptide samples, each of which was produced by isolating N-glycopeptides three times from one patient serum sample and pooling the three isolates.
LC-MS/MS Analysis-The peptide samples were separated using a homebuilt ultrahigh-pressure dual on-line solid phase extraction (SPE)/capillary reverse-phase liquid chromatography system with a maximum operating pressure of 10,000 psi, as previously described (32). Briefly, the system is equipped with two reverse-phase liquid chromatography capillary columns (75 μm ID × 360 μm OD × 80 cm length; C18-bonded particles, 3 μm, 300 Å pore size, Jupiter; Phenomenex, Torrance, CA), which were manufactured in-house through slurry packing. The system is also equipped with two SPE columns that were prepared by packing a 1 cm long liner (250 μm ID) inside an internal reducer (1/16" to 1/32"; VICI) with the same C18-bonded particles. Each of the samples was desalted online on the SPE column and eluted using a gradient of solvents A (0.1% formic acid in water) and B (90% acetonitrile, 0.1% formic acid in water), where the percentage of the latter mobile phase was increased exponentially from 0 to 80% by volume over a period of 180 min.
A 7-tesla Fourier transform ion cyclotron resonance mass spectrometer (FTICR, LTQ-FT, Thermo Electron, San Jose, CA) was used to collect the mass spectra. MS precursor ion scans (m/z 400-2000) were acquired in full-profile mode with an AGC target value of 1 × 10⁶, a mass resolution of 1 × 10⁵, and a maximum ion accumulation time of 1000 ms. The mass spectrometer was operated in the data-dependent tandem MS mode. The three most abundant ions detected in a precursor MS scan were dynamically selected for MS/MS experiments. To prevent reacquisition of MS/MS spectra of the same peptides, we applied a dynamic exclusion option (exclusion mass width low: 1.10 Th; exclusion mass width high: 2.10 Th; exclusion list size: 120; exclusion duration: 30 s). Collision-induced dissociation of the selected precursor ions was performed in an ion trap (LTQ) with the collision energy and isolation width set to 35% and 3 Th, respectively. The Xcalibur software package (version 2.0 SR1, Thermo Electron) was used to construct the experimental methods.
Peptide Identification-All tandem mass spectrometric data (i.e. DTA files) were extracted using ExtractMSn (version 3; creation date: 2006.7.18) of the Bioworks™ software (version 3.2EF2, Thermo Electron). The precursor masses of the MS/MS data were corrected and refined by the PE-MMR method (33) before the data were searched with SEQUEST (version 27), on a 14-node Linux cluster system, against a composite database comprising the IPI database (version 3.14; 57546 protein entries) and its reversed complement. The tolerance was set to 10 ppm for precursor ions and 1 Da for fragment ions. Enzyme specificity was not considered. Variable modification options were used for the carbamidomethylation of cysteine (57.021460 Da), the oxidation of methionine (15.994920 Da), and the hydrolysis of asparagine (0.987000 Da). The false positive rate of peptide assignment was estimated through the composite target/decoy database search. The values of Xcorr and delta Cn corresponding to a 5% false positive rate (supplemental Table S2) were used to obtain peptide IDs (34).
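The two numerical criteria in this step, the ppm mass tolerance and the decoy-based false positive rate, can be illustrated with a minimal sketch. The function names are hypothetical, and the estimator 2 × decoy / (target + decoy) is an assumption: it is a common convention for composite target/reversed searches, but the exact formula used in (34) is not restated here.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the precursor mass lies within the ppm tolerance (10 ppm in the text)."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def decoy_fpr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimate the false positive rate above a score threshold.

    Assumption: each reversed (decoy) hit implies one equally likely
    false hit among the forward (target) hits, hence the factor of 2.
    """
    total = n_target_hits + n_decoy_hits
    if total == 0:
        return 0.0
    return 2.0 * n_decoy_hits / total
```

In such a scheme, the Xcorr and delta Cn thresholds would be tightened until `decoy_fpr` for the passing assignments drops to 0.05.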
Label-free Peptide Quantification-In an LC-MS/MS experiment, a peptide MS peak emerges over a period of time during LC elution. Mass spectral peaks that have similar monoisotopic masses (i.e. within the mass tolerance of 10 ppm) but different LC elution times can be grouped into a unique mass class (UMC) (35). Ideally, each peptide is represented by one UMC. The same peptides with different charges were also included in the same UMC. After PE-MMR analysis, all of the UMCs (called a "UMC list") observed in an LC-MS/MS experiment were recorded in an XML-formatted file. Each UMC contained all mass spectral components having different charge states, abundances, and scan numbers, but similar monoisotopic masses within the given mass tolerance. For each UMC, the UMC mass was calculated as the intensity-weighted average of the monoisotopic masses of all the UMC components. For each peptide, we used the summed abundance of all mass spectral components of the corresponding UMC to represent the experimental peptide abundance. The precursor masses of the MS/MS data were matched to those of the UMCs and refined by the UMC masses. Then, the DTA information was linked to the matched UMC. When the linked DTA file resulted in a peptide identification within a false positive rate of 5% after the SEQUEST search and target-decoy analysis (34), the corresponding peptide ID was recorded in the UMC. The summed abundance represents the observed intensity of the peptide ID during the LC-MS/MS experiment.
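The two UMC summaries defined above (intensity-weighted mass and summed abundance) can be sketched as follows. The `Component` record and its field names are hypothetical and do not reflect the actual PE-MMR XML schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One mass spectral component of a UMC (hypothetical field names)."""
    mono_mass: float   # monoisotopic mass in Da
    intensity: float   # observed abundance
    charge: int
    scan: int


def umc_mass(components: List[Component]) -> float:
    """Intensity-weighted average of the component monoisotopic masses."""
    total = sum(c.intensity for c in components)
    return sum(c.mono_mass * c.intensity for c in components) / total


def umc_abundance(components: List[Component]) -> float:
    """Summed abundance of all components; used as the peptide abundance."""
    return sum(c.intensity for c in components)
```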
Alignment of the Identified Peptides-The identified UMCs (peptides) from all 45 data sets were combined into an alignment table in which each row contains the abundances of the same peptide identified in the 45 data sets, with missing values where the peptide was not identified in the corresponding sample. For the missing values in each row of the alignment table, we further searched for UMCs whose retention times and masses matched those of the UMCs aligned by peptide sequence, within 5 min and 50 ppm, respectively. Note that we used a relaxed mass tolerance (50 ppm) for the alignment, compared with the one (10 ppm) used for peptide identification and UMC generation. We then filled in the missing values with the intensities of the matched UMCs.
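The gap-filling search can be sketched as a tolerance match on mass and retention time. The function name and the tuple layout are hypothetical, and the tie-break (choosing the candidate with the smallest mass error) is an assumption; the text does not state how multiple candidate UMCs were resolved.

```python
from typing import Iterable, Optional, Tuple


def fill_missing(aligned_mass: float, aligned_rt: float,
                 umcs: Iterable[Tuple[float, float, float]],
                 rt_tol_min: float = 5.0,
                 mass_tol_ppm: float = 50.0) -> Optional[float]:
    """Return the abundance of an unidentified UMC matching an aligned peptide.

    umcs holds (mass, retention_time_min, abundance) tuples from the data set
    in which the peptide was not identified. Returns None if nothing matches.
    """
    best = None  # (ppm error, abundance) of the best candidate so far
    for mass, rt, abundance in umcs:
        ppm = abs(mass - aligned_mass) / aligned_mass * 1e6
        if ppm <= mass_tol_ppm and abs(rt - aligned_rt) <= rt_tol_min:
            # Assumption: prefer the candidate with the smallest mass error.
            if best is None or ppm < best[0]:
                best = (ppm, abundance)
    return None if best is None else best[1]
```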
Identification of Differentially Expressed Peptides-To identify peptides differentially expressed between the CS and CR patient groups, we applied both a two-tailed t test and the Wilcoxon rank sum test to the intensities of the 2699 aligned peptides (36). Before the statistical hypothesis tests, we removed the following sets of peptides from the aligned peptides to ensure reliability of the tests: 1) 1398 nonglycopeptides, because their abundance information was unreliable after the glycopeptide isolation, and 2) 742 peptides whose abundances were missing in more than 30% of all replicates in each group of patients. This filtering resulted in 558 N-glycosylated peptides. Using the p values from each test, we computed false discovery rates (FDRs) using Storey's method (37). We then selected the 81 differentially expressed peptides with FDRs less than 0.05 in both tests. The results of the two tests were combined to reduce false positives that can arise from 1) small standard deviations in the two-tailed t test and 2) abnormal data (e.g. outliers or noisy data) that violate the normality of peptide intensities assumed by the t test (38,39).
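The FDR step and the "both tests" rule can be sketched as below. This is a simplified form of Storey's q-value that estimates the null proportion at a single fixed λ = 0.5; that choice is an assumption, because the text does not say how the null proportion was estimated, and the function names are hypothetical. The p values themselves would come from the t test and the rank sum test.

```python
from typing import List, Sequence


def storey_qvalues(pvalues: Sequence[float], lam: float = 0.5) -> List[float]:
    """Simplified Storey q-values with a fixed lambda (assumption)."""
    m = len(pvalues)
    # pi0: estimated fraction of true null hypotheses, capped at 1.
    # Note: pi0 becomes 0 if no p value exceeds lam; real data rarely do this.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


def select_in_both(q_ttest: Sequence[float], q_ranksum: Sequence[float],
                   cutoff: float = 0.05) -> List[int]:
    """Indices of peptides whose FDR is below the cutoff in both tests."""
    return [i for i, (a, b) in enumerate(zip(q_ttest, q_ranksum))
            if a < cutoff and b < cutoff]
```

Requiring significance in both tests is conservative: a peptide flagged only by the t test (for example because of a very small variance) or only by the rank sum test is dropped.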
Furthermore, to remove potential false positives arising from the unbalanced sizes of the CR and CS patient groups in the statistical tests, we performed the following analysis: 1) from the ten CS patients, we generated 252 sets of five CS patients; 2) a set of differentially expressed peptides was identified by applying the above combined statistical test to each set of five CS patients and the five CR patients, resulting in 252 sets of differentially expressed peptides; 3) for each of the 558 peptides, we computed the frequency with which it was selected as differentially expressed in the 252 comparisons; 4) to evaluate the significance of the frequency, we generated a null hypothesis distribution of the frequency for nondifferentially expressed peptides by dividing the ten CS samples into two sets of five CS samples (252 cases) and performing Steps 2 and 3 for the 252 cases; 5) we computed the 95th percentile frequency value (i.e. 60 out of 252) from the null hypothesis distribution of the frequency (Step 4); and 6) among the 81 differentially expressed peptides, we finally selected 79 peptides whose frequencies were larger than the cutoff value of 60. In this way, we removed two potential false positives that arose from the unbalanced sample sizes of the two groups.
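The count of 252 follows from choosing five of the ten CS patients, C(10, 5) = 252. The bookkeeping of Steps 1, 3, 4, and 6 can be sketched as follows; the patient labels and function names are hypothetical, and the statistical test of Step 2 is left out.

```python
from itertools import combinations
from math import comb
from typing import Iterable, List, Sequence, Set, Tuple

cs_patients = [f"CS{i}" for i in range(1, 11)]

# Step 1: every way of choosing five of the ten CS patients.
subsets = list(combinations(cs_patients, 5))
n_subsets = comb(10, 5)  # number of such subsets


def null_splits(patients: Sequence[str]) -> List[Tuple[tuple, tuple]]:
    """Step 4: pair each five-patient subset with its complement (CS vs. CS)."""
    return [(s, tuple(p for p in patients if p not in s))
            for s in combinations(patients, 5)]


def selection_frequency(selected_sets: Iterable[Set[int]],
                        n_peptides: int) -> List[int]:
    """Step 3: number of comparisons in which each peptide was selected."""
    counts = [0] * n_peptides
    for selected in selected_sets:
        for idx in selected:
            counts[idx] += 1
    return counts


def keep_stable(candidates: Iterable[int], counts: Sequence[int],
                cutoff: int = 60) -> List[int]:
    """Step 6: keep candidates selected more often than the null cutoff."""
    return [i for i in candidates if counts[i] > cutoff]
```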
Clustering and Partial Least Squares-Discriminant Analysis-To show how the samples and the differentially expressed peptides can be grouped based on their expression patterns, we first carried out an unsupervised hierarchical clustering using the "complete linkage" method and "Pearson's correlation" as a similarity measure. To show clearly the differential expression across the samples, the log2 median intensity was subtracted from all the intensities in each set of aligned peptides. The missing values were estimated as the median intensity in the corresponding set of aligned peptides. Furthermore, we applied a supervised classification analysis, PLS-DA, to visualize the separation between CS and CR patients and to compute the relative importance of each peptide to the separation based on the variable importance in the projection (VIP) (40). We computed the VIP for each peptide and then selected the peptides contributing highly to the separation as the ones with VIPs larger than one, as previously described (41,42).
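The median centering and median imputation applied before clustering can be sketched for a single row of the alignment table. This assumes that intensities are log2-transformed and that the "log2 median intensity" is the log2 of the median of the observed raw intensities; the function name is hypothetical.

```python
import math
from statistics import median
from typing import List, Optional, Sequence


def center_row(intensities: Sequence[Optional[float]]) -> List[float]:
    """Median-center one aligned peptide on the log2 scale.

    Missing values (None) are imputed with the row median, so they
    become 0.0 after centering. The row needs at least one observed value.
    """
    observed = [x for x in intensities if x is not None]
    log2_median = math.log2(median(observed))
    return [math.log2(x) - log2_median if x is not None else 0.0
            for x in intensities]
```

After centering, positive and negative values correspond to intensities above and below the median of that peptide across samples.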
Western Blotting-Six high-abundant proteins in serum samples were depleted by multiple affinity removal LC column-Human 6 (Agilent, Santa Clara, CA). This depletion reduces potential interference of detection of biomarker candidates by the abundant proteins. More importantly, differential expression of the final set of candidates should be valid, regardless of the use of isolation of N-glycopeptides.
Note that SERPINs were not tested because they were removed during the depletion step. The Bradford quantification assay was used to determine the protein concentration of each sample. Proteins were separated by 1D-SDS-PAGE and transferred onto a polyvinylidene difluoride membrane (Bio-Rad, Hercules, CA) via a Bio-Rad Trans-Blot Cell system (Bio-Rad). Membranes were blocked with 5% skim milk for 1 h at room temperature and then probed with primary antibodies. Blots were washed three times with Tris-buffered saline/Tween 20 buffer (20 mM Tris, 150 mM NaCl, and 0.05% Tween 20, pH 7.4), probed with horseradish peroxidase-conjugated secondary antibody for 1 h at room temperature, and then developed with a chemiluminescence detection system (GE Healthcare, Piscataway, NJ). To test differential expression of the selected proteins in the independent samples, we used primary antibodies for AHSG (Abcam, Cambridge, MA), APOB (Abcam), BTD (Abcam), C4A (Abcam), C3 (Abcam), C9 (Abcam), CP (Abcam), HPX (Abcam), PLTP (Abcam), VTN (Abcam), and ORM1 (Abnova, Taipei City, Taiwan). The blots were quantified with FUJI FILM Multi Gauge version 3.0. The intensities of Coomassie staining of duplicate gels across samples were normalized to the same level (43).

RESULTS
Collection of Serum Samples-In this study, we used samples collected from 15 patients who were treated with a NACT regimen combining docetaxel (DC) and doxorubicin (AC). Characteristics of the patients are listed in supplemental Table S1. Sera were drawn from these patients before NACT. For each patient, the tumor sizes before NACT and after three courses of NACT were measured using magnetic resonance imaging. By computing the reduction in tumor size, we evaluated the responses to NACT based on the response evaluation criteria in solid tumors (RECIST) (31). The evaluation results are summarized in supplemental Table S1. As a result, we grouped the 15 patients into 1) ten chemosensitive (CS) and 2) five chemoresistant (CR) patients (Experimental Procedures). One patient showed a 27.7% reduction in tumor size. This reduction rate was close to the cutoff of 30% used to define partial response (PR) according to RECIST. We classified this patient as CS because the reduction rates in the five CR patients were clearly lower than 27.7%.
Serum N-glycoproteome Profiling-We first selectively isolated N-glycosylated peptides from serum samples collected from breast cancer patients prior to NACT using the method introduced by Zhang et al. (29), as described under "Experimental Procedures." For each sample, a label-free LC-MS/MS analysis was then performed three times to analyze the N-glycosylated serum proteome. Fig. 1 summarizes the overall scheme of our approach. First, N-glycosylated peptides were analyzed using the LTQ FT-ICR (Experimental Procedures). Second, peptide features were identified from each data set as unique mass classes (UMCs; Experimental Procedures). On average, 7893 UMCs (peptides) were identified in each data set using PE-MMR software (33). Third, the sequences of such peptides were identified using the SEQUEST database search, and the search results were then evaluated using the target-decoy method (FPR ≤ 0.05; supplemental Table S2; Experimental Procedures). In each data set, 81.4% of peptides on average were found to be N-glycosylated peptides, supporting the validity of our glycopeptide isolation. Fourth, abundances of the peptides were estimated as those of the corresponding UMCs (Experimental Procedures). Fifth, the identified peptides (UMCs) from all 45 data sets were combined into an alignment table (Experimental Procedures) including 2699 peptides corresponding to 930 proteins. For each of these proteins, the identified sequences and the percent of sequence coverage are summarized in supplemental Table S3. We then normalized the intensities in the alignment table using quantile normalization (44). Finally, the overall label-free quantification procedure, including identification, quantification, and alignment of peptide features (UMCs), was evaluated by computing the LC/MS similarity scores described by Muller et al. (45).
The high similarity scores of 0.848 for the averaged proportion of the overlapping peptides and 0.804 for the averaged intensity correlation for all possible pairs of samples (supplemental Fig. S1) indicate the validity of our label-free quantification method.
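The quantile normalization applied to the alignment table can be sketched as below. This is the textbook form of the procedure and assumes complete columns (no missing values) with ties broken by position; how (44) or the authors handled missing values is not stated here, and the function name is hypothetical.

```python
from typing import List, Sequence


def quantile_normalize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Force every data set (column) to share the same intensity distribution.

    columns: one equal-length list of peptide intensities per LC-MS/MS run.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across runs.
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns)
                  for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # replace value by the mean at its rank
        normalized.append(out)
    return normalized
```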
Identification of a Serum Protein Profile Associated with Chemoresistance-To identify peptides differentially expressed between the CS and CR patient groups, we applied both a two-tailed t test and the Wilcoxon rank sum test to the intensities of the aligned peptides (36). Among the 2699 aligned peptides, we focused on only 558 N-glycosylated peptides (116 unique proteins; supplemental Table S4; supplemental Fig. S2 for their MS/MS spectra) to identify reliable differentially expressed peptides, as described under "Experimental Procedures." The precursor m/z, observed charge state, and SEQUEST outputs of each of the 558 N-glycosylated peptides are summarized in supplemental Table S5. Also, the single peptides included in the statistical tests are listed in supplemental Table S6. Among the 558 peptides, we initially identified 79 differentially expressed peptides with both FDRs ≤ 0.05 from the two tests (37) (Experimental Procedures). Out of the 79 peptides, we further stringently selected the 50 peptides (23 unique proteins; Table I and supplemental Table S7) 1) whose corresponding proteins have more than two nonredundant differentially expressed peptides, or 2) whose corresponding mRNAs or proteins showed differential expression associated with the resistance to either DC or AC, or with prognosis after treatment, in breast cancer tissues (see the next section), even though the corresponding proteins have only a single nonredundant differentially expressed peptide. We first manually examined the MS and MS/MS spectra of the 50 differentially expressed peptides and confirmed their N-glycosylation (supplemental Fig. S3). We further manually confirmed the differential expression of the 50 peptides between the CR and CS samples. For example, Figs. 2A and 2B show that the abundance of an N-glycosylated peptide (SKPTVSSSM#EFKYDFN@SSM#LYSTAK) from apolipoprotein B (APOB), at the retention time showing the maximum peak intensity during the course of elution, was clearly different between a pair of CS and CR patients. Fig. 2C further shows the abundances of all four peptides from APOB in both CS and CR patients. As shown in Fig. 2D, the abundances of all four peptides from APOB were significantly higher in CS patients (p = 1.291 × 10⁻⁴) than in CR patients.
Integration of Differential Expression Associated with Chemoresistance from Breast Cancer Tissues-During the selection of the 50 differentially expressed peptides, we investigated the differential expression of the mRNAs and proteins in four sets of mRNA and proteomic data collected from breast cancer tissues and cell lines, as described above: 1 and 2) differential expression of the mRNAs associated with DC resistance (GSE6434; supplemental Table S8) (46) and AC resistance (GSE1647; supplemental Table S9) (10) in breast cancer tissues and ZR751 cells, respectively; 3) differential expression of the proteins associated with DC + AC resistance in breast cancer tissues (supplemental Table S10); and 4) differential expression of the mRNAs associated with prognosis in breast cancer tissues (9) (supplemental Table S11). The last data set was integrated because several studies (2, 3) have shown that clinical and pathologic responses to NACT are correlated with prognosis and survival. Differentially expressed genes (DEGs) and proteins (DEPs) in each data set were identified as the ones with FDR ≤ 0.05 from a two-tailed t test and the ones with XPRESS ratio > 1.5 or XPRESS ratio < 0.67, respectively. The 50 differentially expressed peptides corresponded to 23 proteins. Among the 23 proteins, seven showed shared differential expression with either the DEGs or DEPs from the four tissue and cell line data sets (the columns on the right side of Fig. 3 and supplemental Tables S8 to S11): complement component 4A (C4A), complement component 9 (C9), ceruloplasmin (CP), serpin peptidase inhibitor, clade C, member 1 (SERPINC1), coagulation factor XII (F12), biotinidase (BTD), and orosomucoid 1 (ORM1). The shared differential expression across the independent studies may support the reliability of the association of these proteins with chemotherapy resistance (see Discussion).
Based on this hypothesis, the peptides from these seven overlapping proteins were identified as the differentially expressed peptides, regardless of the numbers of nonredundant peptides, as described earlier.
Characteristics of the Differential Serum Protein Profile Associated with Chemoresistance-To investigate characteristics of the differential expression patterns of the 50 selected peptides, we applied hierarchical clustering to their intensities (Experimental Procedures). The clustering results (Fig. 3) suggest five clusters of peptides, three of which (Clusters 2, 4, and 5) favor the separation of CS from CR patients, whereas the others (Clusters 1 and 3) work against the separation (supplemental Table S7). Fig. 3 also shows three clusters of patients comprising 1) all five CR patients, 2) four CS patients [1-4], and 3) six CS patients [5-10], where the numbers in brackets represent the sample (patient) indexes. Interestingly, the four CS patients (CS1-CS4) are closer to the CR patients than to the other six CS patients (the dendrogram for the sample clusters in Fig. 3). This is because of the similarity of the expression patterns of the differentially expressed peptides in Clusters 1 and 3 between CS1-CS4 and the CR patients. Thus, the clustering results indicate that the differentially expressed peptides in Clusters 2, 4, and 5 contribute to defining the difference between the CS and CR patient groups, whereas the ones in Clusters 1 and 3 define differential expression patterns with mixed characteristics of the CS and CR patient groups.
Selection of Serum Biomarkers Predictive of Chemoresistance-To identify a set of biomarkers predictive of the resistance to NACT, it is important to evaluate the relative contribution of the 50 differentially expressed peptides to the separation between CS and CR patients. In selecting the 50 peptides using both the t test and the rank sum test, we evaluated the significance of the differential expression of each peptide between CS and CR patients independently. However, the importance of the peptides in the separation between CS and CR patients should be evaluated based on the collective contribution of the individual peptides to the separation. To this end, we applied a multivariate classification analysis, PLS-DA (40), to the intensities of the 50 differentially expressed peptides (supplemental Table S12). The PLS-DA result (Fig. 4A) shows that a clear separation between CS and CR can be achieved by the differential expression of the 50 peptides, as indicated by the decision function (blue line). The two subgroups of CS patients (CS-1 and CS-2 in Fig. 4A) were observed again, as in the clustering results (Fig. 3), even though PLS-DA attempts to minimize the variance within the CS patient group to maximize the separation between the CS and CR groups (see Discussion). For the separation achieved by the PLS-DA (Fig. 4A), the relative, collective contribution of the individual peptides was then estimated as their variable importance in projection (VIP) (41, 42) (supplemental Table S13). Fig. 4B shows the VIP values of the 50 differentially expressed peptides sorted in descending order. A large VIP value indicates a high contribution to the separation of the CS and CR patient groups, and a peptide with a VIP value larger than one is considered to contribute significantly to the separation (41,42). The VIP values of 19 peptides were larger than one (Fig. 4B). The thirteen proteins including the 19 peptides were thus selected to be tested in the independent serum samples.
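For reference, the VIP is commonly defined as follows. This is the standard textbook form and is offered as an assumption, since the exact expression used in (40-42) is not reproduced here. The symbols are introduced for illustration only: K is the number of peptides in the model (here 50), A the number of latent variables, w_{ja} the PLS weight of peptide j on latent variable a, and SS_a the variation of the class response explained by latent variable a.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\frac{K \sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad
\sum_{j=1}^{K} \mathrm{VIP}_j^{2} \;=\; K .
```

Because the squared VIP values average to one under this definition, a VIP larger than one marks a peptide that contributes more than the average variable, which is the rationale usually given for the cutoff of one.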

FIG. 2. Differential expression of four peptides from APOB. A-B, The isotopic clusters of a differentially expressed peptide from APOB with the sequence K.SKPTVSSSM#EFKYDFN@SSM#LYSTAK.G in MS spectra at the retention time (66.40 min for a CS patient and 66.48 min for a CR patient) showing the maximum peak intensity during the course of elution. The MS spectra show that the abundance of the differentially expressed peptide was more than 2-fold higher in a CS patient than in a CR patient. C, Relative abundances of the four differentially expressed peptides from APOB in CS and CR samples. D, Higher abundances of all these peptides in CS patients than in CR patients (p = 1.291 × 10⁻⁴ from two-tailed t test).

Testing Differential Expression of the Biomarker Candidates in Independent Samples-To test the differential expression of the 13 selected proteins, we first collected 13 independent serum samples that were obtained from five CR and eight CS patients. Characteristics of the patients are listed in supplemental Table S1. We obtained sera from these patients before NACT and then tested the differential expression of the selected proteins using Western blotting. Fig. 5A shows that the levels of four proteins (C3, C9, CP, and APOB) differed significantly between CS and CR patients (p ≤ 0.05), whereas two proteins, AHSG (p = 0.0797) and ORM1 (p = 0.0534), were moderately changed. The observed changes were consistent with the differential expression observed in the LC-MS/MS data. Interestingly, among the six proteins whose differential expression was confirmed by Western blotting, four proteins (AHSG, C3, C9, and CP) belonged to the three clusters (Clusters 2, 4, and 5) that favor the separation of CS and CR patients.
FIG. 3. Clustering analysis of the differential serum protein profile associated with the resistance to DC + AC NACT. The columns and rows in the heat map represent samples and differentially expressed peptides, respectively. Red and blue colors indicate increased and decreased peptide expression levels, respectively, compared with the median intensity (white) of each peptide across all samples. The color bars denote the clusters of samples (CR 1-5, CS 1-4, and CS 5-10) and peptides (Clusters 1 to 5). The relationships among the clusters are shown in the dendrograms. The bars on the right represent the peptides showing the shared differential expression between the serum proteomic data and the four tissue/cell line data sets. For clarity, only a single representative overlapping peptide is labeled when there are multiple peptides for the same protein. See supplemental Table S7 for the cluster memberships of the 50 differentially expressed peptides.

Serum Proteome Profile Predictive of the Resistance to NACT in Advanced Breast Cancer-We finally built a PLS-DA model using the intensities of only the six selected biomarkers and computed discriminant scores of the samples (supplemental Fig. S4). Using the coordinate values (LV1, LV2) of the samples and the decision function defined in the PLS space, the discriminant scores were computed by projecting the coordinate values of the samples onto the axis orthogonal to the decision function. Fig. 5B shows that a clear separation of the discriminant scores of CS and CR patients (p = 1.0699 × 10⁻¹⁷) could be achieved by PLS-DA. Furthermore, Fig. 5C shows that these biomarkers might also be able to predict the relapse-free survival rate (p = 0.0172), which is consistent with a high anti-correlation between chemoresistance and survival reported in other studies (47, 48).
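The projection onto the axis orthogonal to the decision function amounts to a signed point-to-line distance in the (LV1, LV2) plane. The sketch below illustrates that geometry only. It assumes a linear decision function of the form a·LV1 + b·LV2 + c = 0; the coefficients, sample names, and coordinates are hypothetical and are not taken from the study.

```python
import math

def discriminant_score(point, a, b, c):
    """Signed distance of an (LV1, LV2) point from the line a*LV1 + b*LV2 + c = 0."""
    lv1, lv2 = point
    # Dividing by the norm of (a, b) projects onto the unit vector orthogonal to the line
    return (a * lv1 + b * lv2 + c) / math.hypot(a, b)

# Hypothetical decision line and sample coordinates in the PLS space
a, b, c = 3.0, 4.0, 0.0
samples = {
    "sample_A": (3.0, 4.0),    # lies on the positive side of the line
    "sample_B": (-3.0, -4.0),  # mirror image on the negative side
    "sample_C": (4.0, -3.0),   # lies exactly on the line
}

scores = {name: discriminant_score(xy, a, b, c) for name, xy in samples.items()}
for name, score in scores.items():
    print(name, score)
```

The sign of the score tells which side of the decision function a sample falls on, and the magnitude tells how far it is from the boundary. Which sign corresponds to CS or CR depends on how the decision function is oriented, which is not specified here.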
Taken together, all these results indicate that the six selected biomarkers can predict the responses to NACT and, further, the relapse-free survival rate.

DISCUSSION

Currently, there is no efficient method to predict tumor responses to NACT in breast cancer before therapy. Although several genes, including 1) several ABC transporters, 2) glutathione-S-transferase, and 3) thioredoxin, have been suggested as tissue biomarkers, no noninvasive serum biomarker capable of such prediction has been previously reported. In this study, we examined the N-glycoproteome profile in serum with the goal of identifying noninvasive serum biomarkers predictive of the resistance to DC + AC NACT in breast cancer before therapy. To achieve this goal, we developed an approach that involves 1) profiling of the serum N-glycoproteome from CS and CR patients; 2) identification of differentially expressed peptides between the two patient groups; 3) integration of the proteome data with differential mRNA and proteomic data in breast cancer tissues and cell lines; 4) PLS-DA to identify a more reliable set of biomarkers based on the collective contribution of the differentially expressed peptides; and 5) validation of differential expression of the selected biomarkers using Western blotting. Using this approach, we selected the six serum biomarkers (AHSG, C3, C9, CP, APOB, and ORM1) predictive of the resistance to DC + AC NACT.
We integrated the N-glycosylated serum proteomic data with the four sets of mRNA and proteomic data associated with the resistance to either DC or AC, or with prognosis after treatment, in breast cancer tissues and cell lines. The integration results were used when the 50 differentially expressed peptides (23 unique proteins) were identified. Among the 23 proteins, seven (C9, CP, BTD, ORM1, C4A, SERPINC1, and F12) showed shared differential expression of mRNAs or proteins in the breast cancer tissue and cell line data. Thirteen of the 23 proteins were further selected, based on the VIP scores computed from PLS-DA, to be tested in the independent samples. Five proteins (C4A, C9, CP, BTD, and ORM1) overlapped between the 13 and the seven proteins. Furthermore, three of the five proteins (C9, CP, and ORM1) belonged to the final set of the six proteins whose differential expression was validated in the independent samples. Taken together, the results indicate the significance of the integration of serum proteomic data with tissue mRNA and proteomic data.
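The overlap bookkeeping in this paragraph can be checked with simple set operations. The sketch below uses only the protein lists stated in the text. The full list of 13 VIP-selected proteins is not given in this section, so it is not reconstructed here, and the variable names are illustrative.

```python
# Proteins with shared differential expression in the tissue/cell line data (seven)
tissue_overlap = {"C9", "CP", "BTD", "ORM1", "C4A", "SERPINC1", "F12"}

# Proteins reported to overlap between the 13 VIP-selected and the seven above (five)
vip_and_tissue = {"C4A", "C9", "CP", "BTD", "ORM1"}

# Final set validated by Western blotting in the independent samples (six)
validated = {"AHSG", "APOB", "C3", "C9", "CP", "ORM1"}

# The five overlapping proteins must all come from the seven tissue-supported ones
assert vip_and_tissue <= tissue_overlap

# Tissue-supported proteins that were not among the VIP-selected candidates
not_vip_selected = tissue_overlap - vip_and_tissue

# Tissue-supported, VIP-selected proteins that were also validated
shared_validated = vip_and_tissue & validated

print(sorted(not_vip_selected))
print(sorted(shared_validated))
```

The second intersection reproduces the three proteins named in the text (C9, CP, and ORM1), while SERPINC1 and F12 are the tissue-supported proteins that did not pass the VIP-based selection.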
The six selected biomarkers are related to several biological processes, including drug clearance, the complement system, and LDL oxidation. First, the increased level of ORM1, known to be involved in DC clearance (49), indicates that clearance of DC was highly elevated in CR patients. Second, the increased level of C3 in CR patients may indicate increased generation of C3b in the early complement pathway. In contrast to the increased activity of the early complement pathway, the decreased level of C9 in CR patients may indicate decreased formation of the membrane attack complex in the terminal complement pathway. The imbalance between the early and late complement systems might contribute to the resistance to NACT. Third, the serum level of CP, known to oxidize LDL, was reported to be decreased and increased in breast cancer patients who are sensitive and resistant to adjuvant chemotherapy, respectively (50). Also, LDL containing APOB has been reported to be decreased in serum after chemotherapy in breast cancer patients (51). In our study, CP and APOB were found to be increased and decreased in CR patients, respectively, which indicates that their collective dysregulation might contribute to the resistance to NACT. Taken together, these observations suggest that the differences in the complement system and LDL oxidation can contribute to the resistance to NACT. The levels of some of the selected biomarkers in tissues or serum have previously been used to monitor the responses to chemotherapy or shown to be modulated during chemotherapy in several cancers, including breast cancers. However, their collective differential expression patterns associated with the resistance to NACT have neither been reported nor used to predict the response to NACT. Furthermore, no association of the selected candidates with the particular combination NACT using DC and AC has been previously reported.
An interesting aspect of the six selected biomarkers is that their corresponding genes appear to be dominantly expressed in liver (supplemental Fig. S5), suggesting that the serum levels of the six biomarkers might also be modulated by liver conditions in patients.
Both clustering and PLS-DA results show that there might be two subgroups of CS patients (Figs. 3 and 4A). The distinct signature in the two CS subgroups might be responsible for the complexity in evaluating chemoresistance using clinical and pathologic responses. Fig. 4A suggests that the differentially expressed peptides highly contributing to the separation along LV2 (i.e. the peptides in Cluster 2 of Fig. 3, which showed distinct differential expression patterns in CS-1, compared with CS-2 and CR) can be used to distinguish the CS-1 group from the CS-2 and CR groups. In our data, the peptides from the proteins involved in complement activation (C4A and C9) showed such a differential expression pattern, as shown in Fig. 3. Therefore, investigating complement activation in CS patients before and after NACT might provide a complementary feature for evaluation of the NACT responses.
In conclusion, we propose a serum protein profile predictive of the resistance to NACT in breast cancers. Our results suggest that the six selected biomarkers can clearly predict the responses to NACT (Fig. 5B) and further predict relapse-free survival (p = 0.0172; Fig. 5C). However, the validity of the six selected biomarkers should be tested in a large group of patients and will be the subject of further investigations.

FIG. 5. Testing of differential expression of the six selected biomarkers between CS and CR patients. A, Differential expression of the six proteins confirmed in the independent samples using Western blotting. B, The clear separation of discriminant scores between CS and CR patients achieved by PLS-DA when the PLS-DA model was constructed using only the six selected biomarkers. The low p value indicates a high prediction power for the response to NACT. C, Relapse-free survival analysis between high and low discriminant score groups, showing that the survival rates of the two groups were significantly different (p value ≤ 0.05). The low p value indicates that the prediction of the NACT resistance might also contribute to predicting survival and prognosis after the treatment.

* This study was supported by grants from the 21C Frontier Functional Proteomics Program, the Converging Research Center Program, and WCU program of the Korean Ministry of Education, Science and Technology (FPR08A1-050, FPR08A2-080, FPR08A1-010, FPC08A1-030, 2010K001298, and R31-2008-000-10105-0). We gratefully acknowledge funding from the National Heart, Lung, and Blood Institute of the NIH under Contract No. N01-HV-28179 as well as from the Swiss National Science Foundation Grant 3100A0-107679.
This article contains supplemental Figs. S1 to S5 and Tables S1 to S13.