Proteomics Analysis of Conditioned Media from Three Breast Cancer Cell Lines

A “bottom-up” proteomics approach and a two-dimensional (strong cation exchange followed by reversed-phase) LC-MS/MS strategy on a linear ion trap (LTQ) were used to identify and compare the extracellular and membrane-bound proteins expressed in the conditioned media (CM) of three breast cell lines (MCF-10A, BT474, and MDA-MB-468). Proteomics analysis of the CM identified more than 600, 500, and 700 proteins in MCF-10A, BT474, and MDA-MB-468, respectively. We identified the internal control proteins kallikreins 5, 6, and 10 (at concentrations of 2 to 50 μg/liter, as measured by ELISA) in MDA-MB-468 CM and confidently identified Her-2/neu in BT474 CM. Subcellular localization was assigned to all 1,139 proteins on the basis of Gene Ontology terms; 34% were classified as extracellular or membrane-bound. In contrast, proteomics analysis of MDA-MB-468 cell lysate showed that only 5% of all identified proteins were extracellular. This confirmed our hypothesis that examining the CM of cell lines, as opposed to cell lysates, substantially enriches for secreted proteins. Tissue-specificity analysis, functional classification, and spectral counting were also performed. Elafin, a protease inhibitor identified in the CM of BT474 and MDA-MB-468, and the three kallikreins (KLK5, KLK6, and KLK10) were validated using immunoassays on various serum and other biological samples. Some of the secreted proteins identified have established roles in breast cancer development (cell growth, differentiation, and metastasis) and/or are linked to early onset breast cancer. Our approach to mining for low abundance molecules could identify proteins in various stages of breast cancer development. Many of the identified proteins merit investigation as circulating serum biomarkers of breast cancer.

middle and late ages of life as 75% of breast cancer is diagnosed in women over the age of 50 (2). Although breast cancer is less common at a young age, younger women tend to have a more aggressive form of the disease than older women. The 5-year survival rate is close to 97% when the cancer is confined to the breast (2). However, when breast cancer has already metastasized at the time of diagnosis, the 5-year survival rate is ~23%. Gene expression patterns have been used to classify breast tumors into clinically relevant subgroups (luminal A, luminal B, basal, ERBB2-overexpressing, and normal-like) (3,4). In general, the luminal subtypes are estrogen receptor (ER)-positive and grow slowly, whereas basal types lack ER and are usually high-grade cancers that grow rapidly. Recently, this molecular taxonomy has been confirmed by protein expression profiling (5,6).
Aberrant secretion or shedding of proteins is commonly associated with disease, including cancer. The pathogenic signaling pathways involved in cancer initiation and progression are not confined to the cancer cell itself but extend to the tumor-host interface (7), a dynamic environment in which information is continually exchanged between the tumor cells and the normal host tissue. Therefore, either the tumor itself or its microenvironment could be a source of biomarkers that are ultimately shed into the serum proteome, allowing early disease detection (8), monitoring of therapeutic efficacy, or a better understanding of the biology of the disease (9). Given that ~20-25% of all cellular proteins are secreted, it is reasonable to hypothesize that proteins or their fragments originating from cancer cells or their microenvironment may eventually enter the circulation (10).
Accordingly, one of the best ways to diagnose cancer early or to predict therapeutic response is to use serum or tissue biomarkers. Carcinoembryonic antigen and carbohydrate antigen 15-3 (CA 15-3) are the most commonly used tumor markers for breast cancer. Their serum levels are related to tumor size and nodal involvement, and they are recommended for monitoring therapy of advanced breast cancer or recurrence, but they are not suitable for population screening because of their low diagnostic sensitivity and specificity (11-13). Currently, mammography remains the cornerstone of breast cancer screening despite disadvantages such as high false positive and false negative rates, radiation exposure, and patient discomfort (14,15). In addition, for women under the age of 40, mammographic screening has a sensitivity of only 33% (16,17). Recent technological advances in proteomics have opened new avenues for the discovery of biomarkers and for the characterization of molecules involved in cancer initiation and progression.
A number of different proteomics-based approaches have been used to discover and characterize disease-specific molecules. Fluid found within the ductal and lobular system of the breast can be extracted through the nipple using an aspiration device to obtain nipple aspirate fluid (NAF) (18). Non-pregnant, non-lactating women continuously secrete and reabsorb this fluid (19). Because of the complexity of biological fluids relevant to breast cancer, only a handful of high-abundance proteins have been identified in NAF, illustrating the need for another source to mine for initial biomarker discovery (20-22).
A number of studies have performed proteomics analysis on cell culture model systems in which the cells were grown in serum-free media (23-28). The clinical relevance of using cell culture models to understand biological processes and functions has also been examined. DNA microarray profiling of 31 breast cell lines yielded two discriminating clusters corresponding to luminal cell lines and basal/mesenchymal cell lines (29). The basal cluster was further subdivided into Basal A and Basal B, a subdivision not observed in primary tumors. More recently, cell lines were found to display the same heterogeneity in copy number and expression abnormalities as primary tumors (30).
In this study, we report a shotgun proteomics approach to sample the conditioned media of three human breast cell lines (MCF-10A, BT474, and MDA-MB-468). MCF-10A, a Basal B subtype with intact p53, was derived by spontaneous immortalization of breast epithelial cells from a patient with fibrocystic disease and has been used extensively as a normal control in breast cancer studies (31). These cells do not survive when implanted subcutaneously into immunodeficient mice (31). BT474, a luminal subtype obtained from a stage II localized solid tumor, is positive for ER and progesterone receptor (as are 50-60% of all breast cancer cases) (32). This cell line also displays amplification of Her-2/neu, also known as ERBB2 (30% of all breast cancer cases) (33). Her-2/neu is a membrane-bound receptor tyrosine kinase involved in signal transduction leading to cell growth and differentiation. Its overexpression is associated with a high risk of relapse and death (33), and it is the target of the therapeutic monoclonal antibody Herceptin (34). Finally, MDA-MB-468, a Basal A-like subtype obtained from a pleural effusion of a stage IV patient (35), is ER- and progesterone receptor-negative (15-25% of breast cancer cases) and phosphatase and tensin homolog-negative (30% of breast cancer cases) (36,37).
These cell lines were cultured in serum-free media (SFM) to ensure that the collected conditioned media (CM) contained no extraneous proteins other than those secreted or shed by the cells. By collecting and concentrating large volumes of CM from cell lines of seminormal (MCF-10A), non-invasive (BT474), and metastatic (MDA-MB-468) origin, we allowed the secreted and shed proteins to accumulate in the CM, facilitating their identification by MS. Our comparative proteomics analysis of the CM of MCF-10A, BT474, and MDA-MB-468 identified over 600, 500, and 700 proteins, respectively. A large portion of the proteins was present in all three cell lines; however, a substantial number were unique to individual lines. Among these were our internal control proteins, human kallikreins 5, 6, and 10, which were identified by MS and measured by ELISA in MDA-MB-468 CM at concentrations ranging from 2 to 50 μg/liter. Members of the human kallikrein family (KLKs) have been implicated in carcinogenesis, and their application as biomarkers for diagnosis and prognosis is currently being investigated. The kallikrein genes encode secreted trypsin-like or chymotrypsin-like serine proteases (38). Prostate-specific antigen (KLK3) and human kallikrein 2 (KLK2), both members of the human tissue kallikrein family, currently have important clinical applications as prostate cancer biomarkers (39). In addition to the control proteins, various proteases, receptors, protease inhibitors, cytokines, and growth factors were identified. Cellular localization, biological function, and Unigene analyses were performed for a shortened list of candidates consisting of extracellular, membrane, and unclassified proteins.
A considerable degree of overlap was observed between the proteins identified in this study using a cell culture model and those reported in other studies of relevant biological fluids such as NAF and tumor interstitial fluid (TIF). The expression of four candidate molecules was examined in biological fluids, tissues, serum, and breast cytosols. Finally, spectral counting analysis revealed promising molecules for further investigation, both for understanding the disease and as potential breast cancer biomarkers.

MATERIALS AND METHODS
Cell Lines-The breast epithelial cell line MCF-10A and the breast cancer cell lines BT-474 and MDA-MB-468 were purchased from the American Type Culture Collection (ATCC), Manassas, VA. MCF-10A was maintained in Dulbecco's modified Eagle's medium and F-12 medium (DMEM/F-12) supplemented with 8% fetal bovine serum, epidermal growth factor (20 ng/ml), hydrocortisone (0.5 μg/ml), cholera toxin (100 ng/ml), and insulin (10 μg/ml). BT-474 and MDA-MB-468 were maintained in phenol red-free RPMI 1640 culture medium (Invitrogen) supplemented with 8% fetal bovine serum. All cells were cultured in a humidified incubator at 37°C and 5% CO2 in 75-cm² (T75) tissue culture flasks.
Cell Culture-Approximately 30 × 10⁶ cells were seeded individually into six 175-cm² tissue culture flasks per cell line. After 2 days, the RPMI 1640 or DMEM/F-12 media were discarded, and the cells were rinsed twice with 1× PBS. Next, 30 ml of chemically defined Chinese hamster ovary serum-free medium (Invitrogen) supplemented with glutamine (8 mM; Invitrogen) were added, and the flasks were incubated for an additional 24 h. The CM were collected, centrifuged to remove cellular debris, and frozen at −80°C until further use. A 1-ml aliquot was taken at the time of harvest to measure total protein (Bradford assay), lactate dehydrogenase (LDH), and KLK5, KLK6, and KLK10 (ELISA). The adherent cells were trypsinized and counted using a hemocytometer. This procedure was repeated several times to assess reproducibility. In addition, 30 ml of each culture medium (RPMI 1640 and DMEM/F-12) were subjected to the same conditions with no cells added and used as negative controls. For the MDA-MB-468 cell lysate experiment, at the end of 24 h in SFM, the adherent cells were lysed using a French press (Thermo Electron), which shears the cells by forcing them through a narrow opening. Total protein was measured, and 400 μg of lysate protein were added to 60 ml of chemically defined Chinese hamster ovary medium and processed in the same manner as the CM. The cell lysate experiment was performed in duplicate.
Sample Preparation-For each cell line, pairs of 30-ml CM aliquots were combined (60 ml each) to create three biological replicates, which were dialyzed using a membrane with a 3.5-kDa molecular mass cutoff. The CM were dialyzed against 5 liters of 1 mM ammonium bicarbonate overnight at 4°C with two buffer changes. Each dialyzed CM was divided equally between two 50-ml conical tubes, frozen, and lyophilized to dryness. The lyophilized sample was denatured in 8 M urea and reduced with DTT (final concentration, 13 mM; Sigma). Following reduction, the sample was alkylated with 500 mM iodoacetamide (Sigma) and desalted using a NAP5 column (GE Healthcare). The sample was lyophilized and digested overnight with trypsin (Promega; 1:50 trypsin:protein, w/w) in a 37°C water bath. The resulting peptides were lyophilized to dryness.
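Two routine calculations sit behind the digestion and reduction steps above: the trypsin mass for a 1:50 trypsin:protein ratio, and the volume of a reagent stock needed to reach a target final concentration. The Python sketch below illustrates both under stated assumptions. The 400-μg protein amount is borrowed from the lysate experiment purely as an example, and the 1 M DTT stock and 500-μl sample volume are hypothetical, since neither is given in the text.

```python
"""Reagent arithmetic for the digestion step (illustrative values only)."""


def trypsin_mass_ug(protein_ug: float, protein_per_trypsin: float = 50.0) -> float:
    """Trypsin mass for a 1:50 trypsin:protein (w/w) digest."""
    return protein_ug / protein_per_trypsin


def stock_volume_to_add(sample_ul: float, stock_mm: float, final_mm: float) -> float:
    """Volume of stock to spike into an existing sample to reach final_mm.

    Solves stock_mm * v / (sample_ul + v) == final_mm for v, so the
    dilution caused by the spike itself is taken into account.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * sample_ul / (stock_mm - final_mm)


if __name__ == "__main__":
    # 400 ug of protein (example amount) -> trypsin needed at 1:50
    print(trypsin_mass_ug(400.0))  # 8.0
    # Hypothetical: 500 ul sample, 1 M (1000 mM) DTT stock, 13 mM final DTT
    print(round(stock_volume_to_add(500.0, 1000.0, 13.0), 2))  # 6.59
```

Solving for the spiked volume, rather than applying C1V1 = C2V2 to a fixed final volume, accounts for the small dilution caused by the addition itself; for small spikes the two approaches give nearly the same answer.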
Strong Cation Exchange Liquid Chromatography-The dried tryptic digest was resuspended in 120 μl of mobile phase A (0.26 M formic acid in 10% acetonitrile) and loaded directly onto a PolySULFOETHYL A column (The Nest Group, Inc.) containing a hydrophilic, anionic polymer (poly-2-sulfoethyl aspartamide). The column was packed with 5-μm particles with a 200-Å pore size. A 1-h fractionation was performed on an HPLC system (Agilent 1100) using a linear gradient from mobile phase A (running buffer) to an elution buffer consisting of mobile phase A with 1 M ammonium formate added. The eluent was monitored at 280 nm. Forty 200-μl fractions were collected, one per minute, from the start of the elution gradient. These 40 fractions were combined into eight pools of five fractions each, and each pool was lyophilized to ~200 μl.
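The text states that the 40 SCX fractions were combined into eight pools of five, but not which fractions went together. The sketch below assumes consecutive pooling and, for comparison, shows a concatenated scheme (fraction f goes to pool (f - 1) mod 8) that is sometimes used to spread peptides more evenly across pools; both function names are hypothetical.

```python
from __future__ import annotations


def pool_consecutive(n_fractions: int = 40, pool_size: int = 5) -> list[list[int]]:
    """Combine adjacent fractions: [1-5], [6-10], ..., [36-40] (assumed scheme)."""
    if n_fractions % pool_size != 0:
        raise ValueError("fractions must divide evenly into pools")
    return [list(range(start + 1, start + pool_size + 1))
            for start in range(0, n_fractions, pool_size)]


def pool_concatenated(n_fractions: int = 40, n_pools: int = 8) -> list[list[int]]:
    """Alternative scheme: fraction f goes to pool (f - 1) % n_pools."""
    pools: list[list[int]] = [[] for _ in range(n_pools)]
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools].append(fraction)
    return pools


if __name__ == "__main__":
    for pool in pool_consecutive():
        # Five 200-ul fractions give ~1 ml per pool before lyophilization.
        print(pool, f"{len(pool) * 200} ul")
    print(pool_concatenated()[0])  # [1, 9, 17, 25, 33]
```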
Mass Spectrometry (LC-MS/MS)-The eight pooled fractions per replicate per cell line were each loaded onto a ZipTip C18 pipette tip (Millipore; catalogue number ZTC18S096) and eluted in 4 μl of 68% ACN prepared from Buffer A (95% water, 0.1% formic acid, 5% ACN, 0.02% TFA) and Buffer B (90% ACN, 0.1% formic acid, 10% water, 0.02% TFA). Eighty microliters of Buffer A were added, and 40 μl were injected onto a 2-cm C18 trap column (inner diameter, 200 μm). The peptides were eluted from the trap column onto a 5-cm resolving analytical C18 column (inner diameter, 75 μm) with an 8-μm tip (New Objective). The LC system was coupled online to a 2-D linear ion trap mass spectrometer (LTQ, Thermo Inc.) through a nano-ESI source operated in data-dependent mode. Each pooled fraction was run on a 120-min gradient, and the eluted peptides were subjected to MS/MS. DTA files were created using Mascot Daemon (version 2.16) and extract_msn with the following parameters: minimum mass, 300 Da; maximum mass, 4000 Da; automatic precursor charge selection; minimum peaks, 10 per MS/MS scan; and minimum scans per group, 1.
Data Analysis-The resulting raw mass spectra from each pooled fraction were searched with Mascot (Matrix Science, London, UK; version 2.1.03) and X!Tandem (Global Proteome Machine Manager, version 2.0.0.4) against the non-redundant International Protein Index (IPI) human database version 3.16 (>62,000 entries). Trypsin was specified as the digestion enzyme with up to one missed cleavage, carbamidomethylation of cysteines as a fixed modification, and oxidation of methionine as a variable modification. A fragment tolerance of 0.4 Da and a parent tolerance of 3.0 Da were used for both search engines. This produced eight DAT files (Mascot) and eight XML files (X!Tandem) for each replicate sample per cell line. Scaffold (version Scaffold-01_05_19, Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 95.0% probability as specified by the PeptideProphet algorithm (40). Protein identifications were accepted if they could be established at greater than 80.0% probability and contained at least one identified peptide. Protein probabilities were assigned by the ProteinProphet algorithm (41). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. The DAT and XML files for each cell line, plus their respective negative control files (RPMI 1640 or DMEM/F-12 culture medium only), were loaded into Scaffold to cross-validate the Mascot and X!Tandem results. Each replicate was designated in Scaffold as one biological sample containing both DAT and XML files and searched with the MudPIT (multidimensional protein identification technology) option selected. The Scaffold results were processed using an in-house program that generated the protein overlaps between samples.
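The search settings above (trypsin, up to one missed cleavage) determine which candidate peptides a search engine considers. As a rough sketch, assuming the common trypsin rule of cleaving after K or R except when the next residue is P (the exact enzyme definitions used by Mascot and X!Tandem may differ in detail), an in silico digest can be written as follows; the function name and test sequence are hypothetical.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 1, min_length: int = 1) -> list:
    """Return peptides from an in silico tryptic digest.

    Assumed rule: cleave after K or R unless the next residue is P.
    Peptides spanning up to `missed_cleavages` internal sites are included.
    """
    # Cleavage boundaries: start of protein, each cleavage site, end of protein.
    bounds = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            bounds.append(i + 1)
    bounds.append(len(sequence))

    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptide = sequence[bounds[start]:bounds[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence; the R followed by P is not cleaved.
    print(tryptic_digest("AAKGGRPCCRDD"))
    # ['AAK', 'AAKGGRPCCR', 'GGRPCCR', 'GGRPCCRDD', 'DD']
```

Each candidate peptide would then be matched against observed precursor masses within the parent tolerance (3.0 Da here) before fragment ions are scored.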
Each identified protein was assigned a cellular localization based on information available from Swiss-Prot, Gene Ontology (GO), the Human Protein Reference Database, and other publicly available databases. To calculate the false positive rate, the individual fractions were also searched by Mascot and X!Tandem against a sequence-reversed decoy version of the IPI human database version 3.16, and the data were analyzed as described above.
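The text does not give the false positive rate formula. One common, simple estimate divides the number of proteins identified in the reversed (decoy) search by the number identified in the forward (target) search; other conventions, such as those used for concatenated target-decoy searches, give somewhat different numbers. The sketch below applies that assumed estimator to the decoy and target counts reported in the Results; the function name is hypothetical.

```python
def decoy_false_positive_rate(decoy_hits: int, target_hits: int) -> float:
    """Assumed estimator: decoy identifications / target identifications."""
    if target_hits <= 0:
        raise ValueError("target_hits must be positive")
    return decoy_hits / target_hits


if __name__ == "__main__":
    # (decoy, target) protein counts as reported for each cell line
    reported = {"MCF-10A": (8, 632), "BT474": (3, 505), "MDA-MB-468": (5, 723)}
    for cell_line, (decoy, target) in reported.items():
        print(f"{cell_line}: {100 * decoy_false_positive_rate(decoy, target):.2f}%")
    # MCF-10A: 1.27%, BT474: 0.59%, MDA-MB-468: 0.69%
```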
Ingenuity Pathways Analysis (IPA)-Extracellular, membrane-bound, and unclassified proteins were evaluated with Ingenuity Pathways Analysis software to identify the global functions of the proteins. This software uses a knowledge base derived from the literature to relate gene products to each other based on their interactions and functions. The list of proteins and their corresponding IPI human identification numbers was uploaded as an Excel spreadsheet into the Ingenuity software (Ingenuity Systems), which used these identifiers to navigate its curated literature database. The biological functions assigned to each network were ranked according to their significance to that network, and a p value for each function was calculated using Fisher's exact test. A detailed description of IPA can be found on the Ingenuity Systems website.
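IPA's internal implementation is proprietary and is not described here, but a one-sided (right-tailed) Fisher's exact test for over-representation of a function in an uploaded list reduces to a hypergeometric tail sum. The sketch below uses only Python's standard library; the function name and the reference-set sizes in the example are hypothetical.

```python
from math import comb


def fisher_right_tail(k: int, n: int, big_k: int, big_n: int) -> float:
    """One-sided Fisher's exact test: P(X >= k) for X ~ Hypergeometric.

    big_n: proteins in the reference set
    big_k: reference proteins annotated with the function
    n:     proteins in the uploaded list
    k:     uploaded proteins annotated with the function
    """
    upper = min(n, big_k)
    numerator = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
                    for i in range(max(k, 0), upper + 1))
    return numerator / comb(big_n, n)


if __name__ == "__main__":
    # Hypothetical counts: 10 of 40 uploaded proteins carry a function that
    # 50 of 1,000 reference proteins carry (about 2 expected by chance).
    print(f"p = {fisher_right_tail(10, 40, 50, 1000):.2e}")
```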
Spectral Counting-Differentially expressed proteins were identified by spectral counting, using the total number of spectra output from Scaffold. Proteins sharing peptides were grouped, proteins with more than 10% of their total spectra from negative control samples were removed, and a single Excel file listing all identified proteins and their presence (defined by spectral counts) in the three cell lines was generated. The spectral counts were normalized so that the total spectral counts per sample were similar. An average spectral count was calculated for each cell line from the triplicate samples. The sum of the three within-cell line variances was calculated as an indicator of the variability within each cell line, and the variance of the three average spectral counts was calculated as an indicator of the variability between the cell lines. Analysis of variance (F test) was performed to obtain the ratio of the "between sample variance" to the "within sample variance." Apparent fold changes were calculated where possible.
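The normalization formula is not given in the text, so the sketch below assumes simple total-count scaling (each sample rescaled so its total equals the mean total across samples). The variance ratio follows the description above literally: the variance of the three cell-line means divided by the sum of the three within-line variances, both computed as sample variances (n - 1 denominators). Function names and example counts are hypothetical.

```python
from __future__ import annotations

from statistics import mean, variance


def normalize_totals(samples: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Rescale each sample so its total spectral count equals the mean total.

    Assumption: total-count scaling; the study does not state its exact formula.
    """
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    target = mean(totals.values())
    return {
        name: {protein: n * target / totals[name] for protein, n in counts.items()}
        for name, counts in samples.items()
    }


def variance_ratio(replicates_by_line: dict[str, list[float]]) -> float:
    """Between-line variance over summed within-line variance, for one protein."""
    line_means = [mean(reps) for reps in replicates_by_line.values()]
    within = sum(variance(reps) for reps in replicates_by_line.values())
    between = variance(line_means)
    # No replicate variability at all: report the ratio as infinity.
    return between / within if within > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical normalized counts for one protein, three replicates per line
    counts = {"MCF-10A": [1.0, 2.0, 3.0], "BT474": [4.0, 5.0, 6.0],
              "MDA-MB-468": [7.0, 8.0, 9.0]}
    print(variance_ratio(counts))  # 3.0
```

With three cell lines and three replicates each, the textbook one-way ANOVA F statistic (between-group mean square over within-group mean square) equals 9 times this ratio, so proteins rank identically under either definition.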
Total Protein Assay and LDH Measurements-Total protein in the CM was quantified using a Coomassie (Bradford) protein assay reagent (Pierce). All samples were loaded in triplicate on a microtiter plate, and protein concentrations were estimated by reference to the absorbances of a series of BSA standard dilutions. LDH, an intracellular enzyme whose presence in the CM indicates cell death, was measured using an enzymatic assay based on the conversion of lactate to pyruvate with parallel production of NADH from NAD. NADH production was measured spectrophotometrically at 340 nm using an automated method (Roche Applied Science Modular system).
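Both the Bradford assay and the LDH-based cell death estimate rely on reading a standard curve in reverse. The sketch below fits a straight line by ordinary least squares and inverts it. It assumes the curve is linear over the working range and defines percent cell death as dead / (dead + live) cells, which is one plausible reading of the estimate reported in the Results rather than a stated formula. Function names and all numbers are hypothetical.

```python
from __future__ import annotations


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def read_off_curve(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: measured signal -> amount (e.g. ug BSA or cells lysed)."""
    return (signal - intercept) / slope


def percent_cell_death(ldh_cm: float, slope: float, intercept: float, live_cells: float) -> float:
    """Dead cells inferred from CM LDH, as a percentage of dead + live cells (assumed)."""
    dead_cells = read_off_curve(ldh_cm, slope, intercept)
    return 100.0 * dead_cells / (dead_cells + live_cells)


if __name__ == "__main__":
    # Hypothetical LDH standard curve: millions of cells lysed vs. LDH activity (U/liter)
    cells_lysed = [0.0, 1.0, 2.0, 4.0]
    ldh_signal = [0.0, 50.0, 100.0, 200.0]
    slope, intercept = fit_line(cells_lysed, ldh_signal)
    # Hypothetical CM reading of 100 U/liter with 30 million live cells at harvest
    print(percent_cell_death(100.0, slope, intercept, live_cells=30.0))  # 6.25
```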
Quantification of Elafin and KLK5, -6, and -10 by ELISA-An elafin sandwich ELISA kit (Hycult Biotechnology) was used, according to the manufacturer's instructions, to measure human elafin in serum, pooled biological samples, and pooled tissue lysates. KLK5, -6, and -10 concentrations were quantified with KLK5-, KLK6-, or KLK10-specific noncompetitive immunoassays developed in our laboratory (42-44); see the cited literature for details.

RESULTS
Optimization of Cell Culture-MCF-10A, BT474, and MDA-MB-468 cells were grown in SFM to ensure that the conditioned media contained no exogenous proteins. Seeding density, incubation time in SFM, volume of medium, type of SFM, and type of tissue culture flask were all explored to select the optimal growth conditions (see "Materials and Methods" for the conditions selected). Approximately 20, 33, and 40 × 10⁶ cells were attached and alive at the end of the experiment, with 25, 10, and 15 μg of total protein/ml in MCF-10A, BT474, and MDA-MB-468, respectively (Supplemental Fig. 1, A and B). Furthermore, to minimize cell death and maximize the concentration of secreted proteins in the CM, LDH levels, which reflect the amount of cell death in culture, were also measured (Supplemental Fig. 1C). Known numbers of cells of each of the three lines were lysed individually, and the corresponding LDH levels were plotted to create an LDH standard curve (Supplemental Fig. 1D). From the LDH standard curve, the LDH level in the 24-h CM, and the total number of live cells at harvest, we estimated that ~6-7% cell death occurred during CM collection. Finally, to demonstrate the accumulation of extracellular proteins in our optimized cell culture model system, our internal control proteins KLK5, KLK6, and KLK10 were quantified in MDA-MB-  Table 1A). Many of the proteins in this list originated from the fetal bovine serum used to initially culture the cells. These proteins were deleted from the list of total proteins identified in the CM and were not considered further. In MCF-10A, we identified 632 proteins (Fig. 2A), of which 459 were identified in all three replicates, yielding a protein identification reproducibility of 73%. Furthermore, a total of 505 proteins were identified in BT474 (Fig. 2B).
For this cell line, 380 proteins were common to all three replicates (75% reproducibility). Finally, 723 proteins were identified in MDA-MB-468 (Fig. 2C), of which 553 were identified in all three replicates (76% reproducibility). In general, with the workflow presented here, we typically achieved a technical reproducibility (same sample injected twice into the LTQ) of ~90% (data not shown). The total number of proteins identified per number of unique peptides for each cell line is shown in Table I. Many of the proteins identified had two or more unique peptide hits. Supplemental Table 2 contains detailed information on all of the proteins identified in each cell line, including the number of unique peptides per protein, peptide sequences, precursor ion masses, and charge states. In addition, eight, three, and five proteins were identified in MCF-10A, BT474, and MDA-MB-468, respectively, using the nonsense (sequence-reversed decoy) database, yielding a false positive rate of ~1% (Supplemental Table 1B) (45).
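The reproducibility figures above compare proteins found in every replicate with all proteins found for that cell line. A minimal sketch, assuming the denominator is the union of the three replicate lists, is a set intersection over a set union; the function name and accessions below are hypothetical.

```python
from __future__ import annotations


def identification_reproducibility(replicates: list[set[str]]) -> float:
    """Proteins found in every replicate divided by proteins found in any replicate.

    Assumption: the denominator is the union of all replicates.
    """
    if not replicates:
        raise ValueError("need at least one replicate")
    found_in_any = set().union(*replicates)
    found_in_all = set.intersection(*replicates)
    return len(found_in_all) / len(found_in_any)


if __name__ == "__main__":
    # Hypothetical accessions for three replicates
    reps = [{"IPI1", "IPI2", "IPI3"}, {"IPI1", "IPI2"}, {"IPI1", "IPI2", "IPI4"}]
    print(identification_reproducibility(reps))  # 0.5
```

Applied to the reported counts, 459/632, 380/505, and 553/723 give 72.63%, 75.25%, and 76.49%, consistent with the rounded values of 73%, 75%, and 76%.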

Identification of Internal Control Proteins in CM by MS-One advantage of our approach to biomarker discovery was the presence of endogenous internal control proteins in the CM of MDA-MB-468. We identified KLK5, -6, and -10 by MS in the CM of MDA-MB-468 in all three replicates for this cell line. Supplemental Fig. 2 shows the sequences of KLK5, -6, and -10 and the peptides identified by MS: three unique peptides for KLK5, two for KLK6, and three for KLK10 (Supplemental Fig. 2, A-C). Furthermore, the BT474 cell line is well characterized as exhibiting amplification of Her-2/neu (also known as ERBB2). The Her-2/neu protein consists of a cysteine-rich extracellular ligand binding domain (ECD), a short transmembrane domain, and a cytoplasmic protein-tyrosine kinase domain (46). The ECD of Her-2 can be released from the full-length receptor by proteolytic cleavage and detected in serum. In the CM of BT474, we identified receptor tyrosine-protein kinase ERBB2 precursor with four unique peptides (Supplemental Fig. 2D). All of the identified peptides fall within the ECD (amino acids 23-652) of this 1255-amino acid protein. However, we did not identify CA 15-3 in the

Overlap of Proteins between the Three Cell Lines-The proteins identified in the three cell lines were analyzed for overlapping members (Fig. 3). A substantial portion (234, or 20%) of the 1,139 proteins was identified in all three cell lines (Fig. 3A). Fig. 3, B and C, show the overlap among the 175 extracellular proteins and the 211 membrane-bound proteins, respectively. Together, extracellular and membrane proteins accounted for 34% of all proteins identified. MDA-MB-468 displayed the greatest number of extracellular and membrane proteins, possibly because cancer cells secrete and/or express larger amounts of these proteins.
Consistent with this interpretation, cellular localization analysis of the proteins shared by BT474 and MDA-MB-468 yielded the greatest percentage (40%) of secreted and membrane proteins (Fig. 3D).
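The in-house program used to generate these overlaps is not described. A minimal sketch of the underlying computation partitions the union of three protein sets into the seven regions of a Venn diagram; whether a pairwise overlap such as BT474 and MDA-MB-468 includes the three-way core is an assumption, so both views are shown. The function name and protein IDs are hypothetical.

```python
from __future__ import annotations


def venn3(a: set[str], b: set[str], c: set[str]) -> dict[str, set[str]]:
    """Partition the union of three protein sets into the seven Venn regions."""
    return {
        "A only": a - b - c,
        "B only": b - a - c,
        "C only": c - a - b,
        "A&B only": (a & b) - c,
        "A&C only": (a & c) - b,
        "B&C only": (b & c) - a,
        "A&B&C": a & b & c,
    }


if __name__ == "__main__":
    # Hypothetical protein IDs standing in for MCF-10A (A), BT474 (B), MDA-MB-468 (C)
    mcf10a = {"P1", "P2", "P3"}
    bt474 = {"P2", "P3", "P4"}
    mda468 = {"P3", "P5"}
    for name, proteins in venn3(mcf10a, bt474, mda468).items():
        print(f"{name}: {sorted(proteins)}")
    # Pairwise overlap that includes the three-way core:
    print(sorted(bt474 & mda468))  # ['P3']
```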
Cell Lysate Proteome-One of the major challenges in the analysis of secreted proteins is distinguishing proteins targeted to the extracellular space from those that appear as low level contaminants because of cell death during routine cell culture. To address this, we analyzed the cell lysate proteome to examine whether our approach was enriching for secreted proteins. A total of 716 proteins were identified in MDA-MB-468 cell lysate after removal of the negative control proteins (culture media only); all are listed in Supplemental Table 3A. Protein identification reproducibility between the two replicates was 87% (Fig. 3E). Only 5% of the lysate proteome was classified as extracellular/secreted (Fig. 3F), compared with 13% of the proteins identified in the CM of MDA-MB-468 (Fig. 2F). The internal control secreted proteins, kallikreins 5, 6, and 10, were not identified in the cell lysate. Of the secreted proteins identified in the cell lysate, 30 were also identified in the CM for this cell line, whereas 19 were unique to the lysate. Supplemental Table 3B lists the secreted proteins found in both the MDA-MB-468 lysate and CM as well as the extracellular proteins found only in the lysate or only in the CM.
Tissue-specific Expression and Biological Function Analysis-Unigene analysis of 422 extracellular, membrane-bound, and unclassified proteins (the "shortened list of candidates") using the Expressed Sequence Tag ProfileViewer was used to identify genes that were relatively breast-specific. Five of these genes (SCGB1D2, SBEM, TFF1, DCD, and CALML5) were found to be relatively specific to normal and/or cancerous breast. Literature mining showed that one of these proteins, trefoil factor 1, had previously been evaluated in the serum of breast cancer patients. Using the IPA software, we also classified the shortened list of proteins by biological function. The top 15 functions are displayed in Fig. 4A; the top functions were cellular movement followed by cell-to-cell signaling and interaction. Finally, these proteins were searched against the Human Plasma Proteome database to determine whether they had been identified in plasma; 104 of the 422 proteins had been identified in human plasma.

Comparison of Proteins Identified from CM with Other Publications-Using ICAT technology, Pawlik et al. (47) quantified tumor-specific proteins in NAF. Of the 39 proteins that were differentially expressed in tumor-bearing versus disease-free breasts, six were also found among the 1,139 proteins in our CM. In addition, Varnum et al. (48) identified 64 proteins in NAF using MS, 15 of which had previously been reported to be altered in patients with breast cancer. Of the 64 proteins, 21 were also found in our proteomics study. More importantly, of the 15 proteins previously reported to be altered in serum or tumor from women with breast cancer, 10 were found in our CM proteome. A proteomics study of MCF-7 cells targeted membrane-associated breast cancer proteins as potential biomarkers (25). Using 2-D electrophoresis and MS analysis of the protein spots, Canelle et al. (25) identified 98 proteins, 42 of which were also found in our study. In addition, Celis et al. (49) mined another source of biomarkers by investigating the TIF that perfuses the breast tumor microenvironment. Given the importance of the tumor-host interface and the increasing appreciation of the role that the microenvironment plays in cancer initiation and progression, we compared the proteins identified in our cell culture model system with the proteins released by cells within their native tumor microenvironment. Celis et al. identified ~260 proteins using 2-D gel electrophoresis, immunoblotting, and mass spectrometry, 112 of which (43%) were also identified in our study. Fig. 4B summarizes the overlaps between these publications and the data presented here, and Supplemental Table 4 lists all of the proteins identified by us and in the four previous studies. Finally, the lung, along with bone, is one of the most frequent sites of breast cancer metastasis. A set of 54 genes that mediate breast cancer metastasis to the lungs has been identified (50).
Given that MDA-MB-468 cells were collected from a pleural effusion, we compared the proteins identified in MDA-MB-468 conditioned media with the 54 candidate lung metastasis genes. The products of seven of these genes were found in the CM of this cell line (KYNU, TNC, ROBO1, FSCN1, MAN1A1, LTBP1, and GSN). Interestingly, none of the genes shared between the lung and bone metastasis signatures were identified in MDA-MB-468.
Spectral Counting and Differential Expression of Proteins Identified-An alternative way to estimate relative protein abundance is to perform multidimensional scaling for all nine experiments (each cell line in triplicate) using spectral counts (see "Materials and Methods" for details of the analysis). The Venn diagram in Supplemental Fig. 4 displays the overlaps of proteins among the cell lines based on spectral count analysis, together with their cellular localization. The top ~100 extracellular and membrane-bound proteins from the spectral counting analysis are shown in Table II, which reports the variability within the replicates (within variance) and between the three cell lines (between variance) along with the F ratio. Apparent fold changes were calculated where possible. A numerical value is given where both cell lines/conditions being compared have a normalized spectral count greater than zero. Where one of the cell lines/conditions has a spectral count of zero, an expression is given instead (e.g. BT ≫ MCF, indicating that the spectral count for BT474 was greater than that for MCF-10A). Gray cells indicate a negative fold change, whereas white cells indicate either a positive fold change (numerical value shown) or no fold change (blank). In addition, in the first fold-change column (BT474/MCF-10A), white cells highlight proteins with a higher spectral count in BT474 than in MCF-10A, whereas black cells highlight proteins with a lower spectral count in BT474 than in MCF-10A. A similar color coding scheme applies to the other two columns. Known breast cancer biomarkers such as Her-2/neu are among the top five proteins identified by this unbiased analysis.
Furthermore, 23 proteins previously associated with cancer (as determined by Ingenuity Biomarkers Comparison Analysis software) were found among the top ~100 extracellular and membrane proteins, including epidermal growth factor receptor and several insulin-like growth factor-binding proteins (IGFBP-2, -3, -5, and -6). Supplemental Table 5 lists all 1,062 proteins on which this analysis was performed, and Supplemental Table 6 lists the overlaps of proteins among the cell lines.
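The overlaps summarized in the Venn diagram and in Supplemental Table 6 amount to assigning each protein to the unique combination of cell lines in which it was detected. A minimal sketch, assuming each cell line's identifications are available as a set of protein identifiers (the identifiers below are hypothetical, not data from this study):

```python
def venn_regions(sets_by_line):
    """Assign each protein to exactly one Venn region.

    sets_by_line: mapping of cell line name -> set of protein identifiers.
    Returns a mapping of (sorted tuple of cell lines) -> set of proteins found
    in exactly those cell lines and no others.
    """
    regions = {}
    for protein in set().union(*sets_by_line.values()):
        key = tuple(sorted(name for name, s in sets_by_line.items() if protein in s))
        regions.setdefault(key, set()).add(protein)
    return regions


# Hypothetical protein identifiers, for illustration only.
cm = {
    "MCF-10A": {"P1", "P2", "P3"},
    "BT474": {"P2", "P3", "P4"},
    "MDA-MB-468": {"P3", "P5"},
}
for lines, proteins in sorted(venn_regions(cm).items()):
    print(lines, sorted(proteins))
```

Because every protein lands in exactly one region, the region sizes always add up to the total number of distinct proteins.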
Validation of Elafin and Kallikreins. Proteases and their inhibitors play an important role in cancer metastasis and angiogenesis (51). Elafin, a secreted epithelial proteinase inhibitor also referred to as skin-derived antileukoproteinase or elastase-specific inhibitor, belongs to the Trappin gene family (52). Elafin is expressed in normal mammary epithelial cells but is down-regulated in most breast tumor cell lines (53). We identified elafin in BT474 and MDA-MB-468, with more unique peptides for the protein in the latter. Using a commercially available sandwich immunoassay for elafin, we measured its levels in a number of different biological samples. First, we measured serum from women with different levels of CA 15-3. Women with CA 15-3 levels of <30 units/ml are typically considered to be in the "normal" range (13). Elafin levels showed no significant differences among sera from women with CA 15-3 levels of <30 units/ml, >30 but <100 units/ml, and >100 units/ml (Fig. 5A). In addition, the median elafin level was higher in tumor breast cytosols than in normal breast cytosols (Fig. 5B). However, levels in the 15 normal breast cytosols examined varied considerably. Measurements in a variety of pooled biological samples showed that milk contained the highest level of elafin (18 μg/liter), followed by urine (7 μg/liter), follicular fluid, and seminal plasma (Fig. 5C). These samples were from normal individuals and therefore do not reflect a disease phenotype. Finally, we measured elafin in pooled normal tissue lysates. After correction for total protein content, colon, small intestine, and ureter contained the highest levels; breast ranked seventh among the top 10 expressing tissues (Fig. 5D). Using the same sample sets, we also measured kallikreins 5, 6, and 10. The serum samples did not show any biomarker potential. Kallikrein 6 levels in breast cytosols showed some potential to differentiate between normal and tumor tissue. Interestingly, among the pooled biological samples, KLK5 was expressed in all of the relevant breast fluids (milk, breast cyst fluid, and NAF) (Supplemental Figs. 5 and 6).
TABLE II. Top 100 differentially expressed extracellular and membrane proteins. Apparent fold changes obtained from spectral counting were calculated where possible. A numerical value is given where both cell lines/conditions being compared have a normalized spectral count greater than zero. When one of the conditions has a spectral count of zero, an expression is given instead (e.g. BT >> MCF, indicating that the spectral count for BT474 was greater than that for MCF-10A). Gray cells indicate a negative fold change, whereas white cells indicate a positive fold change (numerical value shown) or no fold change (blank).
DBI, diazepam-binding inhibitor; LISCH, lipolysis-stimulated lipoprotein receptor.
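Two simple data-handling steps underlie the elafin measurements above: grouping sera by CA 15-3 level (Fig. 5A) and correcting tissue lysate levels for total protein content (Fig. 5D). The sketch below illustrates both under stated assumptions. The function names and sample values are hypothetical, values of exactly 30 or 100 units/ml are assigned to the higher group (the text does not say how boundaries were handled), and because the statistical test behind Fig. 5A is not stated here, groups are only summarized by their medians.

```python
from statistics import median


def ca153_group(units_per_ml):
    """Assign a serum sample to one of the three CA 15-3 groups of Fig. 5A.

    Assumption: exactly 30 falls in the middle group and exactly 100 in the top
    group; the text only specifies <30, >30 but <100, and >100 units/ml.
    """
    if units_per_ml < 30:
        return "<30"
    if units_per_ml < 100:
        return "30-100"
    return ">100"


def median_elafin_by_group(samples):
    """samples: iterable of (CA 15-3 in units/ml, elafin in ug/liter) pairs."""
    groups = {}
    for ca153, elafin in samples:
        groups.setdefault(ca153_group(ca153), []).append(elafin)
    return {group: median(values) for group, values in groups.items()}


def per_total_protein(analyte_ug_per_liter, total_protein_g_per_liter):
    """Correct an analyte level for total protein: ug/liter / g/liter = ug/g."""
    return analyte_ug_per_liter / total_protein_g_per_liter


# Hypothetical values, for illustration only.
print(median_elafin_by_group([(12, 1.0), (25, 3.0), (60, 2.0), (150, 4.0)]))
print(per_total_protein(10.0, 4.0))  # 2.5
```

Normalizing to total protein lets lysates of very different protein concentration be ranked on a common scale, which is what the tissue ranking in Fig. 5D relies on.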

DISCUSSION
In this study, a shotgun proteomics strategy was used to sample the conditioned media of three breast cell lines, MCF-10A, BT474, and MDA-MB-468. By searching with both Mascot and X!Tandem, we identified over 1,100 proteins in the CM of the three cell lines combined, which, to our knowledge, is one of the largest repositories of proteins identified for breast cancer. Combining results from multiple search engines has been shown to yield more confident protein identifications, because different search engines rely on different algorithms and scoring schemes (54,55). Two different measures of relative abundance were used: protein identification and spectral counts. Although spectral counting as an index of protein abundance is appealing, there are a number of ways to analyze such a dataset: counting spectra and adjusting for protein length (normalized spectral abundance factor) (56); counting peptides (not individual spectra) and adjusting for the number of tryptic peptides in the protein (protein abundance index) (57); calculating an exponential function of the protein abundance index, called the exponentially modified protein abundance index (58); counting 1 if any spectra matched a peptide and 0 otherwise (SASPECT (significant analysis of peptide counts)); or simply counting the spectra. There is currently no consensus as to which approach to use. In this study, we used both protein identification and spectral counts to determine protein abundance.
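To make the differences between these spectral counting schemes concrete, the sketch below implements three of them as they are commonly defined: NSAF, (SpC_i/L_i) divided by the sum of SpC_j/L_j over all proteins; emPAI, 10^PAI − 1 with PAI = observed peptides/observable peptides; and binary per-peptide scoring. It is an illustration with hypothetical inputs and function names, not the pipeline used in this study or the reference implementation of any cited tool.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral
    count and L the protein length in residues.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index.

    PAI = observed peptides / observable (theoretical tryptic) peptides;
    emPAI = 10**PAI - 1.
    """
    pai = n_observed / n_observable
    return 10 ** pai - 1


def peptide_presence(spectra_per_peptide):
    """Binary scoring: 1 for each peptide matched by any spectrum, else 0."""
    return [1 if n > 0 else 0 for n in spectra_per_peptide]


# Hypothetical proteins A and B with equal spectral counts but different lengths.
print(nsaf({"A": 10, "B": 10}, {"A": 100, "B": 200}))
print(empai(10, 10))                  # 9.0
print(peptide_presence([3, 0, 1]))    # [1, 0, 1]
```

In the NSAF example, the two proteins have identical raw spectral counts, yet the shorter protein A receives twice the abundance estimate of B, which is the length correction that raw spectral counting lacks.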
We specifically examined MCF-10A, BT474, and MDA-MB-468 because they represent different stages in the progression of breast cancer. We observed higher cell death in MCF-10A in SFM, which was expected because MCF-10A is considered a normal breast epithelial cell line and lacks the cancer cell lines' advantage of uncontrolled growth (Supplemental Fig. 1). Our aim was that examining the proteins unique to each cell line might shed light on both the pathways leading to breast cancer development and the discovery of breast cancer biomarkers. Toward this aim, using Ingenuity Pathways Analysis, we filtered the shortened list of candidates to retain only genes reported in the literature as present in humans and associated with cancer. Approximately 100 of the 422 extracellular and membrane genes met this criterion. The top 14 canonical pathways related to these ~100 genes, together with known genes linked to breast cancer, are highlighted in Fig. 6. Because dysregulation of signaling pathways plays an important role in cancer initiation and progression, the proteins identified in this study may be useful for further investigation.
One advantage of our high throughput qualitative comparative proteomics analysis was the presence of internal controls in the CM of MDA-MB-468. KLK5 expression, as assessed by quantitative RT-PCR, has been reported to be an independent unfavorable prognostic marker for breast carcinoma (59). The clinical utility of KLK6 as a breast cancer marker has not been determined, whereas KLK10 levels have been shown to predict response to tamoxifen therapy (60). Based on experience with other currently used biomarkers, new cancer markers are expected to be present in serum at concentrations in the low μg/liter range. With our strategy, we identified all three control proteins by MS, supporting the notion that we are enriching for, and surveying deeply enough into, low abundance proteins to mine for other candidate breast cancer biomarkers. Furthermore, the identification of the extracellular domain (ECD) of Her-2/neu in BT474 further supports our hypothesis that candidate biomarkers can be discovered using the CM of breast cancer cell lines. High serum levels of Her-2 correlate with poor prognosis in patients with breast cancer (61). In 2000, the Food and Drug Administration (FDA) cleared the serum Her-2/neu test, the first FDA-cleared blood test for measuring circulating Her-2 levels in the follow-up and monitoring of patients with metastatic breast cancer. The assay, the ADVIA Centaur HER-2/neu Assay (Oncogene Science Biomarker Group), is a sandwich immunoassay that uses two monoclonal antibodies specific for unique epitopes on the extracellular domain of the Her-2 oncoprotein. The assay has a reference limit of 15 μg/liter and a sensitivity of 0.1 μg/liter (62).
Particular emphasis was placed on extracellular, membrane-bound, and unclassified proteins because these proteins have the highest chance of being found in the circulation and thus of serving as cancer biomarkers or as important molecules involved in cancer progression. More than 34% of the identified proteins were classified as extracellular and membrane-bound. Among the known and novel proteins released by breast cancer cells, we identified various proteases, receptors, protease inhibitors, and cytokines. All experiments were performed in triplicate with excellent reproducibility between runs. Because of the inherent nature of mass spectrometry, not all peptides are ionized in every run, and consequently different peptides are selected and detected in different runs. This run-to-run sampling variability can account for the 75% reproducibility among our biological triplicates. The various sample preparation steps, including C18 extraction of the fractions, also cannot be dismissed as an important contributing factor to the observed variation. In addition, we examined the ability of elafin to discriminate between normal and tumor breast cytosols; however, no conclusion can be drawn without examining more cytosols. Although elafin did not show circulating biomarker potential for discriminating normal from diseased individuals, to our knowledge we are the first to report the expression of this protein in breast cytosols, biological fluids, and tissue lysates using an immunoassay. Elafin was found to be expressed in normal breast tissue.
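How the 75% reproducibility figure was computed is not defined in this section. As one possible illustration, the sketch below scores a biological triplicate in two common ways: the mean pairwise Jaccard index (shared proteins divided by all proteins, for each pair of replicates) and the fraction of all identified proteins found in every replicate. Both metrics, the function names, and the protein sets are assumptions for illustration only.

```python
from itertools import combinations


def mean_pairwise_jaccard(replicates):
    """Mean Jaccard index |A & B| / |A | B| over all pairs of replicate protein sets."""
    scores = [len(a & b) / len(a | b) for a, b in combinations(replicates, 2)]
    return sum(scores) / len(scores)


def fraction_in_all(replicates):
    """Fraction of all identified proteins that were found in every replicate."""
    return len(set.intersection(*replicates)) / len(set.union(*replicates))


# Hypothetical biological triplicate, for illustration only.
triplicate = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"a", "b", "c", "d"}]
print(round(mean_pairwise_jaccard(triplicate), 3))  # 0.733
print(fraction_in_all(triplicate))                  # 0.6
```

The "found in every replicate" criterion is stricter than the pairwise average, so the same data can yield noticeably different reproducibility values depending on which definition is used.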
Although the objective of this study was to identify secreted and membrane-bound proteins that have the potential to be cleaved and thus found in the circulation, the identified proteins included many intracellular proteins, including ones classified by GO as nuclear and cytoplasmic. During the cell culture process, a portion of the cell population dies, releasing intracellular proteins into the media. Despite cell culture conditions optimized to minimize cell death, the identification of intracellular proteins in the CM is inevitable given the high sensitivity of the MS-based techniques used in this study. Martin et al. (23) examined the CM of a prostate cancer cell line and found a GO distribution very similar to ours, with more than 50% of the identified proteins being intracellular. Therefore, one of the major challenges in the analysis of secreted proteins is distinguishing proteins targeted to the extracellular space from those that arise as low level contaminants due to normal cell death in routine cell culture. To address this, we performed a cell lysate proteome experiment, which demonstrated that our approach significantly enriches for secreted proteins. Furthermore, a recent study of the cell lysate proteome of a human mammary epithelial cell line found that 2% of the entire proteome was classified as extracellular, consistent with our findings (63).
FIG. 6. Canonical pathways and known genes linked to breast cancer. Using Ingenuity Pathways Analysis, ~100 genes from the shortened list of candidates were identified in the literature as reported in humans and associated with cancer. The top 14 canonical pathways have been mapped onto these 100 genes along with genes linked to breast cancer, including IGFBP5, VEGF, and ERBB2. CP, canonical pathway; Fx, function.
Recently, analysis of thousands of genes in breast and colorectal cancers showed that individual tumors accumulate an average of 90 mutant genes (64). The authors identified 189 previously unknown genes that were mutated at high frequency. Products of six of these genes were found in this study (filamin-B, spectrin α chain, gelsolin, extracellular sulfatase Sulf-2, neuronal cell adhesion molecule, and polypeptide N-acetylgalactosaminyltransferase 5). Furthermore, it is particularly important that many of the proteins identified by other groups in relevant biological fluids such as NAF and TIF were also present in our analysis (see Supplemental Table 8). Many of the proteins identified by Pawlik et al. (47) were highly abundant serum proteins such as albumin, transferrin, and various immunoglobulins, none of which were identified in our serum-free media. In addition, the top biological functions of the candidate molecules, such as cellular movement and cell-to-cell signaling, are well suited to cancer initiation and progression. Finally, a large portion of the shortened list of proteins was also found in the Human Plasma Proteome database. This was not unexpected and highlights that many of the proteins identified in our study had previously been found in plasma. However, it also shows that many more proteins have yet to be identified in plasma. One possible reason is that their plasma concentrations are too low to measure with current technologies; other means of first identifying them and then developing specific and sensitive immunoassays are therefore critical.
Our group previously analyzed the conditioned media of a prostate cancer cell line, PC3(AR)6, using a roller bottle cell culture method and identified 262 proteins in the CM (65). The workflow presented here differs significantly from, and improves on, that earlier work. At the tissue culture step, we cultured three cell lines in triplicate rather than a single cell line. We also optimized the cell culture conditions (seeding density, incubation time, and volume of media) to minimize cell death and maximize secreted protein content. In this study, we fractionated peptides rather than proteins and used a more robust and sensitive mass spectrometer. Finally, the bioinformatics analysis was also significantly improved: we used two different search engines and incorporated protein and peptide probability calculations into our final list of proteins. As a result, we identified over 1,000 proteins in the current study using the conditioned media approach.
Effective therapies for advanced cancers remain elusive. Novel breast cancer biomarkers that are effective early in the course of the disease have the potential to reduce morbidity and mortality and to achieve higher patient compliance with screening. Moreover, there is a growing consensus that panels of markers may supply the specificity and sensitivity that individual markers lack, and a number of studies have demonstrated that this is indeed true. Although a biomarker should ultimately be detected in serum using an immunoassay (ELISA), developing such assays for multiple potential novel biomarkers is very labor-intensive (66). The vast majority of the proteins identified in this study (extracellular, membrane, and unclassified) do not have commercially available ELISA kits. Alternatively, multiple reaction monitoring (MRM) mass spectrometry can be used to determine whether the candidates are present in serum. With this technology, specific peptides (representing specific proteins) can be detected and quantified in a single experiment in biological fluids of patients with breast cancer to determine whether a protein has biomarker potential. A number of studies have shown the feasibility of such an approach (67)(68)(69).