Multivariable Difference Gel Electrophoresis and Mass Spectrometry

Multivariable DIGE/MS was used to investigate proteins altered in expression and/or post-translational modification in response to activation of transforming growth factor (TGF)-β receptors in MCF10A mammary epithelial cells overexpressing the HER2/Neu (ErbB2) oncogene. Proteome changes were monitored in response to exogenous TGF-β over time (0, 8, 24, and 40 h), and proteins were resolved using medium range (pH 4–7) and narrow range (pH 5.3–6.5) isoelectric focusing combined with up to 2 mg of protein to allow inspection of lower abundance proteins. Triplicate samples were prepared independently and analyzed together across multiple DIGE gels using a pooled sample internal standard to quantify expression changes with statistical confidence. Unsupervised principal component analysis and hierarchical clustering of the individual DIGE proteome expression maps provided independent confirmation of distinct expression patterns from the individual experiments and demonstrated high reproducibility between replicate samples. Fifty-nine proteins (including some isoforms) that exhibited significant kinetic expression changes were identified using mass spectrometry and database interrogation and were mapped to existing biological networks involved in TGF-β signaling. Several proteins with a potential role in breast cancer, such as maspin and cathepsin D, were identified as novel molecules associated with TGF-β signaling.

2D gel-based approaches are often used to survey the proteome on a global scale, typically resolving thousands of intact proteins based on charge (using isoelectric focusing) and apparent molecular mass (using SDS-PAGE). Despite their popularity for differential display proteomics, 2D gel-based strategies have until recently lacked the ability to directly quantify abundance changes in the same fashion as stable isotope strategies using liquid chromatography coupled with tandem mass spectrometry (1–3), i.e. by multiplexing samples into a single run to remove analytical (gel-to-gel or column-to-column) variation. Multiplexed samples labeled with stable isotopes have been used in gel-based proteomics (4), but in this case abundance changes are monitored during the mass spectrometry stage on each individual protein prior to the knowledge of which proteins are changing. DIGE technology (first described by Unlu et al. (5)) adds an essential quantitative component to 2D gel-based strategies and allows for the detection of subtle changes in protein abundance with statistical confidence (6–8). DIGE uses three spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5) to label up to three samples to be run together on the same 2D gel. A pooled mixture containing an equal aliquot of all samples is made and labeled in bulk with Cy2 and used as an internal standard to coordinate between multiple DIGE gels with each gel containing two samples from the experiment that have been individually labeled with Cy3 or Cy5. This use of a pooled sample internal standard provides every resolved protein form under survey with a unique internal standard across a coordinated set of DIGE gels (Refs. 9 and 10; for a review, see Ref. 11) and allows for the quantitative comparison of proteomic changes with statistical confidence afforded by analyzing replicate samples relative to the same internal standard.
As with any complex system, pre- and postfractionation allows for access to lower abundance proteins by increasing the total amount of protein analyzed without overloading the analytical system. Prefractionation is useful for subproteomes but can often introduce additional non-biological variation into the samples that must be controlled. Postfractionation of complex samples is perhaps most popular in complementary peptide-based proteomics strategies using multidimensional HPLC separations prior to mass spectrometry (e.g. multidimensional protein identification technology (MudPIT) (12)). In a similar fashion, resolution and sensitivity can be improved in a 2D gel experiment by postfractionating complex proteomes with medium range (e.g. pH 4–7) and narrow range (e.g. pH 5.3–6.5) isoelectric focusing gradients with commensurate increases in protein load (13). By resolving intact proteins, 2D gels can resolve multiply charged isoforms (that may result from phosphorylation or other charged post-translational modifications) and biologically significant proteolytic products. Subsequent mass spectrometry can verify that a set of isoforms is in fact related without necessarily identifying the modified peptide(s), whereas such changes may be completely overlooked in the more sensitive peptide-based approaches without mass spectral information on the modified peptide(s).
The main objective of this study was to demonstrate the technical advantage of DIGE/MS to facilely quantify changes in protein abundance and/or post-translational modification on a global scale from multiple experimental variables, each with independent biological repetition. To this end, we used as a model system transforming growth factor-β (TGF-β) stimulation of human mammary epithelial cells transfected with a HER2 expression vector. This model system is of particular biological interest because overexpression of the tyrosine kinase receptor HER2/Neu (ErbB2) is detected in ~25% of breast cancers (14). TGF-β is a cytokine that suppresses early tumor formation but promotes tumor progression and metastasis at later stages (15) and has been shown to synergize with ErbB receptor tyrosine kinases. For example, overexpression of active TGF-β1 (or active receptor mutants) in transgenic mice also expressing murine mammary tumor virus/Neu (ErbB2) accelerates metastases from Neu-induced mammary cancers (16–18). A genetic modifier screen in non-tumorigenic mammary epithelial cells identified TGF-β1 and TGF-β3 as molecules that cooperate with HER2 in inducing cell motility and invasion (19), and inhibition of HER2 with the antibody trastuzumab blocked the promigratory effect of TGF-β on HER2-overexpressing mammary epithelial cells (20). Despite these and other studies, little is known regarding the molecular mechanisms or downstream effectors of cross-talk between TGF-β and ErbB receptor signaling, making the findings herein of potential value for further investigation.
Changes due to TGF-β stimulation were assessed over time (0, 8, 24, and 40 h) in triplicate experiments that were analyzed coordinately by DIGE. High resolution information on over 1500 protein forms was surveyed in the pH 4–7 range (0.5 mg of total protein per gel), and in some cases increased sensitivity and resolution were afforded by using narrow range isoelectric focusing (pH 5.3–6.5) in conjunction with up to 2 mg of total protein per gel. Principal component analysis (PCA) and unsupervised hierarchical clustering (HC) of the individual Cy3- and Cy5-labeled DIGE expression maps provided independent confirmation of distinct expression patterns from each group and demonstrated high reproducibility between the replicate samples. Each experimental condition was measured using independent replicates for statistical confidence and to rule out false-positive results due to nonbiological variation. Proteins of interest were identified using mass spectrometry and database interrogation, and identified proteins were mapped to existing biological networks and pathways to reveal additional information regarding the relationship of the proteins identified in these studies.
Extraction of Total RNA and RT-PCR-Total RNA was extracted using the RNeasy minikit (Qiagen). RT-PCR was carried out using the Titanium one-step RT-PCR kit (BD Biosciences). For each RT-PCR, 100 ng of RNA were added to a 50-µl reaction system according to the manufacturer's protocol (50°C for 1 h followed by 30 cycles of PCR amplification) using primers 5′-gcaatggatgccctgcaactagcaaattc and 5′-cacttaaggagaacagaatttgccaaag specific for maspin cDNA. The PCR products were analyzed in 1.2% agarose gels.
DIGE Experimental Design-The mixed internal standard methodology of Friedman et al. (10) and Gerbasi et al. (22) was used with the following modifications. Each experiment contained four conditions repeated in triplicate, generating 12 individual samples that were co-resolved across six DIGE gels all coordinated by the same pooled sample internal standard. Experiments utilizing 24-cm pH 4–7 IEF gradients contained 0.5 mg of total protein equally divided between any two samples and an aliquot of the internal standard as follows. For each individual sample, 0.25 mg of protein was separately precipitated with methanol and chloroform (23) and resuspended in 30 µl of labeling buffer (7 M urea, 2 M thiourea, 4% CHAPS, 30 mM Tris, 5 mM magnesium acetate). One-third of each sample (10 µl, 83.3 µg) was removed and combined into a single tube to comprise the pooled sample internal standard. Thus for each six-gel experiment comprising 12 individual samples, the pooled standard contained 1000 µg. The remaining two-thirds of each individual sample (20 µl, 166.7 µg) was labeled with 200 pmol of either Cy3 or Cy5, whereas the pooled sample was labeled en masse with 1200 pmol of Cy2. A dye-swapping scheme was used such that the three samples from any condition were never labeled all with Cy3 or Cy5 to control for any dye-specific labeling artifacts that might occur. Experiments utilizing 24-cm pH 5.3–6.5 IEF gradients contained 2 mg of total protein treated essentially as above, only starting with 1 mg of extract from each sample, labeling individual samples with 400 pmol of Cy3/Cy5, and labeling the pooled sample internal standard with 2400 pmol of Cy2.
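The protein and dye amounts in this design follow from simple proportions, and the sketch below recomputes them as a consistency check. The function name and dictionary keys are illustrative assumptions and are not part of any DIGE software; the only inputs are the per-sample protein amount, resuspension volume, and dye amount stated above.

```python
from fractions import Fraction


def dige_labeling_plan(n_samples=12, protein_per_sample_ug=250,
                       volume_ul=30, dye_per_sample_pmol=200):
    """Recompute the amounts implied by the pooled internal standard design.

    Hypothetical helper: one-third of every sample goes to the pool, the
    remaining two-thirds is labeled individually, and two individually
    labeled samples plus one aliquot of the pool are loaded on each gel.
    """
    pooled_fraction = Fraction(1, 3)
    pool_aliquot_ug = protein_per_sample_ug * pooled_fraction
    labeled_ug = protein_per_sample_ug - pool_aliquot_ug
    pool_total_ug = pool_aliquot_ug * n_samples
    n_gels = n_samples // 2
    # Same dye-to-protein ratio (pmol per ug) for samples and pooled standard.
    dye_ratio = Fraction(dye_per_sample_pmol) / labeled_ug
    return {
        "n_gels": n_gels,
        "pool_aliquot_ul": float(volume_ul * pooled_fraction),
        "pool_aliquot_ug": float(pool_aliquot_ug),
        "labeled_ug": float(labeled_ug),
        "pool_total_ug": float(pool_total_ug),
        "pool_per_gel_ug": float(pool_total_ug / n_gels),
        "protein_per_gel_ug": float(2 * labeled_ug + pool_total_ug / n_gels),
        "cy2_for_pool_pmol": float(dye_ratio * pool_total_ug),
    }


# pH 4-7 design: 0.25 mg per sample, 200 pmol Cy3 or Cy5 per sample.
medium_range = dige_labeling_plan()
# pH 5.3-6.5 design: 1 mg per sample, 400 pmol Cy3 or Cy5 per sample.
narrow_range = dige_labeling_plan(protein_per_sample_ug=1000,
                                  dye_per_sample_pmol=400)

for label, plan in (("pH 4-7", medium_range), ("pH 5.3-6.5", narrow_range)):
    print(label, plan)
```

With the stated inputs, the sketch reproduces the 1000-µg pooled standard, the 1200 and 2400 pmol of Cy2, and the 0.5 and 2 mg of total protein per gel.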
The N-hydroxysuccinimidyl ester forms of Cy2, Cy3, and Cy5 were used following standard methods for minimal labeling. Briefly, labeling was performed for 30 min on ice in the dark after which the reactions were quenched with the addition of 10 mM lysine (2 µl for each 200 pmol of dye) for 10 min on ice in the dark. The quenched Cy3- and Cy5-labeled samples (each containing 166.7 µg of protein) were then combined and mixed with a 166.7-µg aliquot of the Cy2-labeled pooled standard after which an equal volume of 2× rehydration buffer (7 M urea, 2 M thiourea, 4% CHAPS, 4 mg/ml DTT) was added. The mixtures were brought up to a final volume of 450 µl with 1× rehydration buffer (same as above except for 2 mg/ml DTT) after which 0.5% IPG buffer 4–7 or 5.3–6.5 was added and mixed thoroughly. Although the dye:sample ratios are skewed from the manufacturer's recommendation, we have validated that these ratios provide sensitive labeling comparable to SYPRO Ruby total protein staining while allowing for optimal protein amounts to facilitate subsequent mass spectrometry (10, 22, 24–26).
2D Gel Electrophoresis and Imaging-For each six-gel DIGE experiment, tripartite-labeled samples for each gel (450-µl final volume) were passively rehydrated into 24-cm pH 4–7 and pH 5.3–6.5 IPG strips (Amersham Biosciences/GE Healthcare) for 24 h followed by simultaneous isoelectric focusing using a manifold-equipped IPGphor IEF unit (Amersham Biosciences/GE Healthcare) according to the manufacturer's instructions for a total of 60 kV-h. The cysteine sulfhydryls were reduced and carbamidomethylated while the proteins were equilibrated into the second dimension loading buffer by incubating the focused strips in equilibration buffer (30% glycerol, 2% SDS, 6 M urea, 50 mM Tris, pH 8.8, trace bromphenol blue) supplemented with 1% DTT for 20 min at room temperature followed by 2.5% iodoacetamide in fresh equilibration buffer for an additional 20-min room temperature incubation. Second dimensional SDS-PAGE was performed on hand-cast 12% SDS-PAGE gels using low fluorescence glass plates with one plate presilanized to preferentially affix the gel, thereby ensuring the accuracy of subsequent robotic protein excision. Electrophoresis was carried out at 0.2 watts/gel for 3 h followed by 20 watts/gel until completion using a DALT-12 unit (Amersham Biosciences/GE Healthcare).
The differentially labeled co-resolved proteome maps within each DIGE gel were imaged at 100-µm resolution separately by dye-specific excitation and emission wavelengths using a Typhoon 9400 variable mode imager (Amersham Biosciences/GE Healthcare). 16-bit tagged image file format images were cropped and exported for analysis using the DeCyder version 6.5 suite of software tools. After imaging for CyDye components, the non-silanized glass plate was removed, and the gels were fixed in 50% methanol, 7% acetic acid for 2 h and then incubated in SYPRO Ruby (Invitrogen) in the dark overnight. This poststain also visualizes approximately 97% of unlabeled protein and ensures accurate protein excision as the molecular weight and hydrophobicity of the CyDyes influence the apparent molecular mass of proteins during SDS-PAGE. SYPRO Ruby images were acquired on the same imager as well as reimaged postexcision to ensure accurate protein excision.
DIGE Analysis-The DeCyder version 6.5 suite of software tools (Amersham Biosciences/GE Healthcare) was used for DIGE analysis. The Differential In-gel Analysis module was used to quantitatively compare the normalized volume ratio of each individual protein spot feature from a Cy3- or Cy5-labeled sample on a given gel relative to the Cy2 signal from the pooled sample internal standard corresponding to the same spot feature. Within each gel, the co-resolved fluorescent signals from each protein (two individual samples and one internal standard) are co-detected by the software, and abundance measurements are made directly to the internal standard without interference from gel-to-gel variation, obviating the need to run analytical replicates for each sample.
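The reason a within-gel ratio to the Cy2 standard is insensitive to gel-to-gel variation can be seen in a small numerical sketch. This is a simplified illustration and not the DeCyder algorithm, which also applies its own spot detection and volume normalization; the function name and the spot volumes below are hypothetical.

```python
import math


def log_standardized_abundance(sample_volume, standard_volume):
    """log10 of a spot's sample signal relative to its Cy2 standard signal."""
    if sample_volume <= 0 or standard_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return math.log10(sample_volume / standard_volume)


# Hypothetical spot volumes for one protein on two gels. Gel 2 is given half
# the overall signal of gel 1 to mimic gel-to-gel variation in loading or
# imaging; the sample-to-standard proportions are the same on both gels.
gel_1 = {"Cy2": 1000.0, "Cy3": 1000.0, "Cy5": 2000.0}
gel_2 = {"Cy2": 500.0, "Cy3": 500.0, "Cy5": 1000.0}

for name, gel in (("gel 1", gel_1), ("gel 2", gel_2)):
    cy3 = log_standardized_abundance(gel["Cy3"], gel["Cy2"])
    cy5 = log_standardized_abundance(gel["Cy5"], gel["Cy2"])
    print(f"{name}: log10(Cy3:Cy2) = {cy3:.3f}, log10(Cy5:Cy2) = {cy5:.3f}")
```

Because the same factor scales all three channels on a gel, it cancels in each ratio, so the two gels report identical values for this protein.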
The Differential In-gel Analysis datasets for each individual gel were then collectively analyzed using the Biological Variation Analysis module, which allowed for the facile matching of protein migration patterns and normalization of Cy3:Cy2 and Cy5:Cy2 quantitative abundance ratios for each protein between gels of a coordinated set, again using the unique signal of each protein from the pooled internal standard. In this way, multiple variables each with independent experimental repetition were coordinately quantified with statistical confidence and without the requirement that every pairwise comparison be made within a single 2D DIGE gel. Statistical significance was associated with each change in abundance or charge-altering posttranslational modification using Student's t test and analysis of variance (ANOVA) analyses that compare the variation of expression within a group to the magnitude of change between groups. Many statistically significant changes were observed within the 99.9th percentile confidence interval (representing less than one false positive for approximately 1500 proteins resolved in a gel), but changes within the 95th percentile were also considered.
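The comparison of within-group variation to between-group change described here is the one-way ANOVA F statistic. The following is a minimal sketch of that calculation for one protein measured in four time groups with three replicates each. It is not DeCyder's implementation, the abundance values are invented for illustration, and converting F to a p value would additionally require the F distribution, which is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log standardized abundances for one protein at 0, 8, 24, 40 h.
time_course = [
    [0.01, -0.02, 0.00],   # 0 h
    [0.03, 0.01, 0.02],    # 8 h
    [0.08, 0.10, 0.07],    # 24 h
    [0.21, 0.24, 0.22],    # 40 h
]

f_stat, df_b, df_w = one_way_anova_f(time_course)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F indicates that the differences between time points are large relative to the scatter among replicates, which is the basis for the confidence levels quoted above.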
Unsupervised PCA and HC were performed using the DeCyder Extended Data Analysis module. These multivariate analyses clustered the individual Cy3- and Cy5-labeled samples based on the collective comparison of expression patterns from the proteins identified in Table I. These groups of protein expression characteristics are represented by each data point in the PCA plots and by each column in the HC expression matrixes (heat maps). PCA reduces the complexity of a multidimensional analysis into two principal components, PC1 and PC2, which orthogonally divide the samples based on the two largest sources of variation in the dataset. Values within the circles of the PCA plots are within the 95th percentile confidence interval. HC performs a similar unsupervised clustering of the samples based on similarities of expression patterns in the selected proteins, which are visually presented as horizontal lines in an expression matrix "heat map" using a standardized log abundance scale ranging from −0.5 (green) to +0.5 (red). HC expression matrixes were calculated using Euclidean correlation and average linkage. Mapping of proteins identified by mass spectrometry and database interrogation (see below) onto existing networks and pathways was accomplished using Ingenuity Pathway Analysis software (Ingenuity Systems, Inc.).
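As a rough illustration of what a principal component and its share of the variance mean, the sketch below finds the first principal component of a small samples-by-proteins matrix by power iteration on the covariance matrix. It is not the algorithm used by the DeCyder Extended Data Analysis module; the function name and the toy data are assumptions, and the method presumes the data are not constant and that the starting vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit vector, fraction of total variance) for the leading PC.

    rows is a list of samples, each a list of per-protein abundances.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Asymmetric start so that simple symmetric patterns are not missed.
    vec = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the variance captured along vec.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return vec, eigenvalue / total_variance


# Toy data: four samples, three proteins, dominated by one shared trend.
samples = [
    [0.00, 0.01, -0.02],
    [0.05, 0.04, 0.01],
    [0.11, 0.09, -0.01],
    [0.20, 0.22, 0.02],
]
pc1, explained = first_principal_component(samples)
print("PC1 direction:", [round(x, 3) for x in pc1])
print(f"PC1 explains {explained:.1%} of the variance")
```

The explained fraction is the leading eigenvalue divided by the sum of the per-protein variances, which is the kind of percentage reported for PC1 and PC2 in a PCA plot.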
In-gel Digestion, Mass Spectrometry, and Database Interrogation-Proteins of interest were robotically excised and digested into peptides in-gel with modified porcine trypsin protease (Trypsin Gold, Promega), and peptides were applied to a stainless steel target using an integrated Spot Handling Workstation (Amersham Biosciences/GE Healthcare) according to the manufacturer's recommendations. Peptide samples (0.3 µl) were robotically mixed wet on the target with an equal volume of α-cyano-4-hydroxycinnamic acid (5 mg/ml in 60% acetonitrile, 0.1% trifluoroacetic acid supplemented with 1 mg/ml ammonium citrate).
MALDI-TOF MS and data-dependent TOF/TOF tandem MS/MS were performed on a Voyager 4700 mass spectrometer (Applied Biosystems, Framingham, MA). MALDI-TOF mass spectra were acquired in reflectron positive ion mode, averaging 1500 laser shots per spectrum. Peptide ion masses (M + H) were accurate to within 20 ppm after internal calibration using the trypsin autolytic peptides at m/z 842.51 and 2211.10. TOF/TOF tandem MS fragmentation spectra were acquired in a data-dependent fashion based on the MALDI-TOF peptide mass map for each protein, averaging 2000 laser shots per fragmentation spectrum on each of the 20 most abundant ions present in each sample (excluding trypsin autolytic peptides and other known background ions).
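For readers less familiar with mass accuracy expressed in parts per million, the sketch below shows how a 20 ppm tolerance translates into an absolute m/z window at the two calibrant masses mentioned above. The helper names and the "observed" values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(theoretical_mz, tol_ppm=20.0):
    """Half-width of the acceptance window, in m/z units."""
    return theoretical_mz * tol_ppm * 1e-6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


for calibrant in (842.51, 2211.10):
    print(f"m/z {calibrant}: 20 ppm = +/- {tolerance_window(calibrant):.4f}")

# Hypothetical observations of the m/z 842.51 calibrant.
for observed in (842.52, 842.53):
    print(observed, f"{ppm_error(observed, 842.51):.1f} ppm",
          within_tolerance(observed, 842.51))
```

The absolute window grows with mass, so a fixed ppm tolerance is stricter in absolute terms for small peptides than for large ones.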
The resulting peptide mass maps and the associated fragmentation spectra were collectively used to interrogate sequences present in the Swiss-Prot and National Center for Biotechnology Information non-redundant (NCBInr) databases to generate statistically significant candidate identifications using GPS Explorer software (Applied Biosystems) running the MASCOT search algorithm (Matrix Science). Searches were performed without constraining protein molecular weight or isoelectric point, with complete carbamidomethylation of cysteine, with partial oxidation of methionine residues, and with one missed cleavage also allowed in the search parameters. Evidence of proteins derived from mycoplasma was found in the course of these studies but did not have an effect on the statistical significance of the proteins presented in this study. Significant Molecular Weight Search (MOWSE) scores (p < 0.05), number of matched ions, number of matching ions with independent MS/MS matches, percent protein sequence coverage, and correlation of gel region with predicted molecular weight and pI were collectively considered for each protein identification (all data are presented in Table I and Supplemental Table 1).

Time Course of TGF-β-induced Proteome Changes Measured by DIGE/MS-MCF10A/HER2 cells were treated with 2 ng/ml TGF-β1 for 0, 8, 24, and 40 h, and efficacy of ligand treatment was confirmed by monitoring phosphorylation of Smad2, which occurs directly by the TGF-β type I receptor upon ligand binding (Fig. 1A). To assess proteomic changes in these samples, we performed DIGE/MS analysis using a pooled sample internal standard present on every gel (see "Experimental Procedures"). With this experimental design, every DIGE gel in a six-gel set contains two of the 12 samples (labeled with either Cy3 or Cy5) along with an equal aliquot of the Cy2-labeled internal standard (Fig. 1B, 12-mix). Although an 8- and 24-h sample may be co-resolved on the same gel, the quantitative comparison for a resolved protein is between the Cy3 or Cy5 signals and the Cy2 internal standard signal for this protein, not between the Cy3 and Cy5 signals directly. The intragel ratios for each resolved protein (Cy3:Cy2 and Cy5:Cy2) are then normalized to the cognate ratios from the other gels. In this way, for a given protein form, the normalized abundance ratios from all 12 samples can be intercompared with statistical confidence in a concerted six-gel experiment all coordinated by the same mixed sample internal standard. A schematic of the loading matrix, indicating how the individual samples were labeled and loaded into the six DIGE gels along with the internal standard, is shown in Fig. 1C. Within this loading matrix, we have depicted a theoretical protein that is up-regulated at the 40-h time point and indicated (with dotted lines) how the relative abundance is quantified for this protein within each gel relative to the cognate signal from the protein in the Cy2-labeled internal standard (without gel-to-gel variation), and how the Cy3:Cy2 and Cy5:Cy2 ratios are then normalized between the gels using the Cy2 signal for that protein. The graphical readout for this theoretical change is shown in Fig. 1D where the normalized abundance ratios (i.e. relative to the internal standard that is normalized across all six gels) are all graphed together to easily visualize the relative magnitude and reproducibility of the expression data.
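The constraints on such a loading matrix can be written down and checked programmatically. The layout below is a hypothetical example that satisfies the stated constraints (12 samples, six gels, one Cy3 and one Cy5 sample per gel, and no condition labeled with a single dye throughout); it is not the actual matrix of Fig. 1C, and the names are illustrative.

```python
from collections import defaultdict

# (gel, dye, hours of TGF-beta treatment, replicate); hypothetical layout.
LOADING = [
    (1, "Cy3", 0, 1), (1, "Cy5", 8, 1),
    (2, "Cy3", 24, 1), (2, "Cy5", 40, 1),
    (3, "Cy3", 8, 2), (3, "Cy5", 0, 2),
    (4, "Cy3", 40, 2), (4, "Cy5", 24, 2),
    (5, "Cy3", 0, 3), (5, "Cy5", 24, 3),
    (6, "Cy3", 8, 3), (6, "Cy5", 40, 3),
]


def check_design(loading):
    """True if every gel pairs Cy3 with Cy5 and every condition is dye-swapped."""
    dyes_by_condition = defaultdict(set)
    dyes_by_gel = defaultdict(list)
    for gel, dye, hours, _replicate in loading:
        dyes_by_condition[hours].add(dye)
        dyes_by_gel[gel].append(dye)
    dye_swapped = all(dyes == {"Cy3", "Cy5"}
                      for dyes in dyes_by_condition.values())
    paired = all(sorted(dyes) == ["Cy3", "Cy5"]
                 for dyes in dyes_by_gel.values())
    return dye_swapped and paired


print("design is valid:", check_design(LOADING))
```

Every gel additionally receives an aliquot of the Cy2-labeled pooled standard, which is what allows ratios from different gels to be compared.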
DIGE analysis flagged 26 protein forms that exhibited statistically significant (ANOVA, n = 3) changes in abundance or charge-altering post-translational modification that were greater than 1.2-fold displaying early, late, or biphasic kinetics (Fig. 2A and Table I, lines 1–26). Many of the changes fell within the 99.9th percentile confidence interval where only one false positive is expected for over 1000 features, but changes within the 95th percentile were also considered (Table I, lines 1–26).
Proteins were excised from a subset of the six gels and subjected to protein identification using mass spectrometry and database interrogation as described under "Experimental Procedures," the results of which are summarized in Table I (lines 1-26) and detailed in Supplemental Table 1. The 26 identified features specified 23 unique proteins including redundancies due to post-translational modification or proteolysis. Many of these 23 proteins were also found as additional charge-related isoforms (migrating with closely related isoelectric points but indistinguishable apparent molecular masses, data not shown) consistent with a charge-altering post-translational modification. However, the expression patterns of these isoforms were overall similar to those listed in Table I, and in most cases these were omitted from Table I for brevity.
The above analysis used medium range pH 4–7 gradients for the first dimension isoelectric focusing because this condition offers increased resolution and sensitivity (loading 0.5 mg of total protein into each gel) compared with broader range gradients (e.g. pH 3–11) (13). To increase further the pI resolution and to gain access to lower abundance proteins in these samples, we reanalyzed the same samples using narrow range isoelectric focusing (pH 5.3–6.5) in a second six-gel set with 4-fold more (2 mg) total protein per gel (Fig. 2B and Table I, lines 27–59). We identified 33 additional features (specifying 31 proteins) in the narrow range DIGE analysis exhibiting kinetics similar to those observed above that were subsequently identified by mass spectrometry and database interrogation (Table I and Supplemental Table 1 (Table I, similarly confirmed by positional matching and inspection of the DIGE profiles but were not further confirmed by mass spectrometry (data not shown). The remaining 20 features listed in Table I for the narrow range experiment displayed only marginal changes (in magnitude and/or statistical significance) that may not have been selected in the first pH 4–7 experiment due to the presence of stronger candidates in the dataset.
Principal Component Analysis, Hierarchical Clustering, and Pathway Analysis-We sought to further validate the experimental samples for relevancy of ligand binding and to establish biological significance of the resulting protein changes (as opposed to stochastic changes) by performing multivariate statistical tests and network mapping of the proteins identified by DIGE/MS. PCA reduces the dimensionality of a multidimensional analysis to display the two principal components that distinguish between the two largest sources of variation within the dataset.
In both pH range analyses, PCA indicated distinct expression patterns from the four groups and demonstrated high reproducibility between the replicate samples (Fig. 3, A and B). Each data point in the PCA plots describes the collective expression profiles for the subset of proteins listed in Table I for each pH range (Fig. 3, A and B). For the 26 features identified in the pH 4–7 analysis, the first principal component distinguished 65.5% of the variance with 19.4% additional variation distinguished by the second principal component. For the pH 5.3–6.5 analysis, these values were 81.1% of the variance and 8.3% additional variance for PC1 and PC2, respectively. In addition, the PCAs demonstrate that the greatest amount of variation in the experiment is what distinguishes the 40-h time point from the others. Although quite similar, the different relative orientations of the groups between the pH 4–7 and 5.3–6.5 analyses are most likely due to a different subset of proteins used for the analysis in each pH range.
These grouping assignments were reiterated in an unsupervised HC analysis of the protein expression patterns within each sample (Fig. 3, C and D). HC compares groups based on similarity of the collective expression patterns of the selected proteins with similarity being proportional to the lateral distance depicted in the branched dendrograms above each expression matrix (heat map). Each column in the HC expression matrix is effectively the same as each data point in the PCA plots. For clarity, expression and identification information for each of the individual proteins in the HC analysis (grouped into horizontal bars in the expression matrix and related via a similar dendrogram on the left) is directly tied to the numerical line entry in Table I, which summarizes the mass spectrometry search results and details the DIGE results. Similar PCA and HC results were found for both pH ranges when expanding the dataset to over 100 features by relaxing significance thresholds (data not shown). The PCA and HC results validate the biological significance of the protein expression changes detailed in Table I as we would not expect these individual samples to cluster in this way if these individual changes arose stochastically.
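The clustering idea can be sketched in a few lines: samples are repeatedly merged according to the smallest average Euclidean distance between their expression profiles. This is a generic average-linkage illustration and not the DeCyder Extended Data Analysis implementation; the sample names and profile values are hypothetical, and sample names are assumed to be unique.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(profiles, c1, c2):
    """Average pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def average_linkage(profiles):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(profiles, *pair))
        merges.append((a, b, cluster_distance(profiles, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log abundance profiles (two proteins) for four samples.
profiles = {
    "0h_rep1": [0.00, 0.00],
    "0h_rep2": [0.02, -0.01],
    "40h_rep1": [0.30, 0.25],
    "40h_rep2": [0.28, 0.27],
}
for a, b, d in average_linkage(profiles):
    print(a, "+", b, f"at distance {d:.3f}")
```

Replicates of the same time point merge first at small distances, and the two time points join only at the final, much larger distance, which is the pattern a dendrogram displays.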
In a similar fashion, we sought to validate the significance of the proteins listed in Table I by mapping them to pre-existing mammalian networks and pathways (Fig. 4 and Supplemental Table 2). Of the 51 unique proteins specified by the 59 protein forms identified in the pH 4–7 and pH 5.3–6.5 TGF-β time course experiments, 46 mapped to a network of pathways involving TGF-β1 as a major hub (Fig. 4). Associated with this network were intercalating pathways involving MYC, p53, and the peroxisome proliferative-activated receptor-α (PPARA) that in turn affected many proteins that were independently identified in the DIGE experiments (Fig. 4, shaded proteins). Although additional validation is necessary to establish biological significance, the mapping of these proteins to established networks also provides new insight into potential TGF-β effectors and pathways that might otherwise have gone unnoticed based solely on the list of proteins presented in Table I.
Maspin and Cathepsin D-Several proteins identified by DIGE/MS as potential new TGF-β effectors have demonstrated roles in breast cancer. Maspin, a tumor-suppressing serpin (serine protease inhibitor; Table I, lines 13 and 14, highlighted in Fig. 5, A–D), is expressed in non-tumor mammary epithelial cells but not in most human breast cancer cell lines or primary breast tumors (27). Restoration of maspin expression or treatment with recombinant maspin protein in MDA-MB-435 and MDA-MB-231 cancer cells reduces Rac1 activity and cell invasiveness as well as metastases in nude mice (27, 28). We found two isoforms of maspin that increased 1.65- and 1.67-fold (65–67% increase, p = 0.003 and 0.0069, respectively) after 40 h of treatment with TGF-β.
These findings were validated by Western and RT-PCR analyses using a new time course of TGF-β-induced maspin expression with or without concomitant overexpression of HER2 (Fig. 6). Maspin protein levels were up-regulated by exogenous TGF-β in both HER2-overexpressing cells and controls (Fig. 6A), implying that this effect does not depend on high levels of HER2. TGF-β treatment induced similar increases in maspin mRNA levels, indicating that the level of regulation was transcriptional (Fig. 6B).
In another example, two charge-related isoforms of cathepsin D, a lysosomal aspartyl endoproteinase, were altered by treatment with TGF-β (features 21, 22, and 53; highlighted in Fig. 5, E–H). In this case, both the magnitude and direction of the change differed between the two isoforms of cathepsin D, indicating a change in a charge-altering modification. At 40 h after treatment, the acidic isoform (feature 21) decreased by 34% (1.51-fold decrease, p = 0.021), whereas the basic isoform increased by as much as 195%, or 2.95-fold (p = 0.0000017; Table I, lines 22 and 53). MS and database interrogation unambiguously identified both isoforms as cathepsin D despite low protein expression levels and associated weak signals in the mass spectrometer (Fig. 5, F–H, Table I, and Supplemental Table 1, lines 21, 22, and 53). Although there was no indication in the mass spectral data as to the nature or location of the charged modification, without mass spectral data on a modified peptide, this significant change in protein expression may well have been overlooked in standard peptide-based proteomics analyses (e.g. LC/MS/MS shotgun analysis with spectral counting or stable isotope labeling strategies).
Table I (26 proteins for pH 4–7 and 33 proteins for pH 5.3–6.5). A, principal component analysis discretely clustered the 12 individual Cy3- and Cy5-labeled DIGE expression maps into the four time treatment groups differentiated by two principal components: PC1, which distinguishes 65.5% of the total variance in the analysis, and
These results were validated by Western analysis using an independent time course of TGF-β treatment (Fig. 6C). The antibody used cannot distinguish between isoforms as we did with DIGE/MS. However, an overall increase in cathepsin D levels is predicted if both isoforms are considered collectively, and this increase is evident in the Western blot results. Treatment with TGF-β increased the mature 31-kDa isoform in MCF10A/HER2 cells but not in control MCF10A/vector cells, indicating that the observed changes in cathepsin D were dependent on high levels of HER2 signaling.

DISCUSSION
In this study we have shown the utility of the DIGE/MS approach for the quantitative analysis of over 1500 resolved proteins (including modified isoforms) across multiple conditions using TGF-β and HER2 signaling as a model system. Analyzing replicate samples enabled statistical confidence to be assigned for each measurement of altered protein expression over multiple variables. Multiplexing samples into the same gel separations along with a pooled sample internal standard allowed for these quantitative measurements to be made with statistical confidence across all samples without interference from gel-to-gel variation while significantly reducing the number of gels necessary compared with conventional 2D gel-based proteomics.
The expression patterns of the proteins identified in this study most likely work in concert to promote phenotypic PC2, which distinguishes an additional 19.4% of the variance. Related samples are encircled for illustrative proposes only. B, similar PCA plot shown for the pH 5.3-6.5 range analysis with PC1 and PC2 distinguishing 81.1% of the variance and 8.3% additional variance, respectively. C, unsupervised hierarchical clustering of the 12 independent samples based on the global expression patterns of 26 proteins in the pH 4 -7 range that are detailed in Table I. Hierarchical clustering of individual samples is shown on top, and clustering of individual proteins is shown on the left with relative expression values displayed as an expression matrix (heat map) using a standardized log abundance scale ranging from Ϫ0.5 (green) to ϩ0.5 (red). The gel number (1-6) and Cy3/5 dye labeling for each sample is listed below, and the protein identifications with corresponding line entries in Table I (lines 1- 26) are listed along the right-hand side. D, similar expression matrix from an unsupervised hierarchical clustering analysis of33 proteins from the pH 5.3-6.5 range that are also detailed in Table I ( Table I were occurring stochastically. In addition, mapping the proteins identified by DIGE/MS to previously characterized networks and pathways revealed new insight into the inter-relationships of these proteins as well as identified additional potential effectors that are members of these pathways but not identified in the DIGE/MS analysis due to a variety of reasons (e.g. low abundance, low molec-ular weight, or basic pI). MYC and PPARA are nuclear transcriptional regulators affecting the expression of target proteins governing cell proliferation and differentiation, adhesion, apoptosis, cell cycle progression, and inflammation responses. It was recently found that p53 promotes the activation of multiple TGF-␤ target genes during Xenopus embryonic development (29). 
A model has been proposed describing cooperation between p53 and Smads in TGF-β-mediated gene transcription where Smad2 enters into the nuclei and associates with Smad4 and specific cofactors to bind target sequences (30). Thus it is not surprising that many of the proteins identified by DIGE/MS were secondary effectors of the TGF-β pathway.

FIG. 4. Networks and pathways associated with proteins identified by DIGE/MS from the TGF-β time course study (both pH ranges combined, Table I). Ingenuity Pathway Analysis software (Ingenuity Systems, Inc.) was used to map identified proteins onto existing mammalian pathways and networks that associate proteins based on known protein-protein interactions, mRNA expression studies, and other biochemical interactions established in the literature. Shaded features depict proteins identified in the present study, whereas unshaded features depict additional members of these networks and pathways that were not detected by DIGE/MS. Full names and annotations for the proteins represented in this network are listed in Supplemental Table 2.
Several proteins identified in these studies are known targets or downstream effectors of TGF-β signaling. For example, proliferating cell nuclear antigen (PCNA), a cell proliferation marker, was down-regulated by TGF-β by 1.72-fold (42%, p = 0.003) at 40 h (Table I, line 18). This is consistent with the known proliferation suppressive function of TGF-β that is retained in MCF10A/HER2 cells (20). In another example, Hsp27, which was up-regulated in response to TGF-β by as much as 4.43-fold (343%, p = 0.000075) at 40 h (Table I, lines 24 and 57), mediates TGF-β-induced cell motility as demonstrated using small interfering RNA to block TGF-β-mediated cell invasion in human prostate cancer (31). TGF-β has also been shown to induce Hsp27 phosphorylation in osteoblast-like MC3T3-E1 cells (32). Determining whether the observed increases found in the current study result from specific phosphorylation or overall protein abundance will require further investigation.
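The fold-change and percent values quoted above for PCNA and Hsp27 can be reconciled with a short calculation. The sketch below assumes that a decrease is reported as the reciprocal of the treated/control ratio (so "1.72-fold down" means a ratio of 1/1.72); that convention is inferred from the numbers in the text rather than stated in it, and the function names are illustrative only.

```python
def fold_to_ratio(fold, direction):
    """Convert an n-fold change (n >= 1) and its direction into a
    treated/control abundance ratio."""
    if fold < 1.0:
        raise ValueError("fold change is expected to be >= 1")
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    # Assumed convention: a decrease is quoted as the reciprocal ratio.
    return fold if direction == "up" else 1.0 / fold

def percent_change(ratio):
    """Percent change of treated relative to control."""
    return (ratio - 1.0) * 100.0

# Values quoted in the text for the 40 h time point.
pcna = percent_change(fold_to_ratio(1.72, "down"))
hsp27 = percent_change(fold_to_ratio(4.43, "up"))

print(f"PCNA:  {pcna:+.1f}%")
print(f"Hsp27: {hsp27:+.1f}%")
```

With this convention a 1.72-fold decrease corresponds to a loss of roughly 42% of the starting abundance, and a 4.43-fold increase to a gain of 343%, matching the percentages given alongside the fold changes.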
Many novel TGF-β or HER2 effectors were also identified in these studies; several of these have demonstrated roles in breast cancer. One example is the tumor suppressor maspin (27), the expression of which has been shown to be regulated by the p53 tumor suppressor family (33,34). Furthermore, mutant p53 and aberrant cytosine methylation cooperate to silence maspin expression in cancer cells (35). This relationship between maspin (SERPINB5) and p53 is also reflected in the pathway/network map for TGF-β treatment (Fig. 4) where p53 is also shown to modulate the expression of lamin A/C (LMNA), PCNA, Hsp105 (HSPH1), and stress-induced phosphoprotein Hsp70 (STIP1), all of which were independently identified in the DIGE analysis (Table I, lines 3, 18, 31, and 32). We demonstrated that TGF-β-induced maspin expression was at the level of transcription (Fig. 6B), and subsequent experimentation indicates a direct role for p53 in maspin expression.² In another example, cathepsin D, which is implicated in a number of cancers, including breast (Ref. 36; for a review, see Ref. 37), was identified as two isoforms that were differentially expressed, indicating a change in post-translational modification as well as an overall increase in expression (Fig. 5, E–H, and Fig. 6C). The observed apparent molecular mass and isoelectric point of the basic isoform (features 22 and 53) were consistent with the mature 31-kDa heavy chain that is produced after several rounds of post-translational processing (Ref. 38, predicted pI = 5.5). The substantial shift in pI for the acidic isoform is consistent with the addition of a single negative charge or with the removal of approximately 10 amino acids from the C terminus as proposed to occur in the lysosome (38,39). One interpretation of these results is that not only are cathepsin D expression levels increasing at the later time points but also that the protein is undergoing differential modification/processing.
An alternate interpretation is that although more cathepsin D is being expressed at the later time points no more is being modified/processed into the acidic isoform.
That this type of quantitative change in isoforms may well be overlooked in peptide-based strategies exemplifies the complementary nature of these two proteomics technologies. Any single proteomics strategy provides access only to a subset of the proteome under study, and each has unique strengths and weaknesses. Although DIGE provides statistically significant quantitative information from replicate samples on over 1000 intact proteins and modified isoforms in a single run, very hydrophobic proteins remain difficult to resolve, and proteins outside of the pH and molecular weight ranges will not be surveyed. Using complementary technologies can also provide corroborative evidence. For example, a recent study of TGF-β treatment of human lung cancer cells using a global quantitative peptide-based profiling strategy (iTRAQ (isobaric tags for relative and absolute quantitation)) identified several of the proteins described here by DIGE/MS (tropomyosin, Hsp27, EF-Tu, and cofilin) and mapped proteins to similar networks that also spoke to a cellular program of adhesion/invasion known to be up-regulated by TGF-β in transformed cells (40).
Although the platforms are complementary, an overriding limitation that is universal to all of them in global scale experiments is the constraint imposed by abundant proteins, which limits the total amount of sample extract that can be loaded into the analytical system without compromising resolution. This problem is most often addressed by postfractionation of the sample with commensurate higher protein loads. For DIGE/MS, this is accomplished using medium range (e.g. pH 4–7, 7–11) and narrow range gradients (e.g. pH 5.3–6.5) that offer greater resolving power and sensitivity (as demonstrated here). In this regard, pH 3–11 gradients offer the lowest resolution and are most biased toward the high abundance proteins despite being inclusive of a wide range of protein pI values.
Newer technologies that couple in-solution isoelectric focusing with molecular weight separations hold great promise for resolving even higher protein loads to access lower abundance proteins while retaining the protein content of all pH fractions for subsequent analysis. Calcitonin and tumor necrosis factor α, estimated at levels below 0.1 ng/ml, were recently reported from human serum using this approach after major protein depletion (41). Utilization of the maleimide CyDyes for saturation labeling of cysteine sulfhydryls offers about a 10-fold increased sensitivity, although there are currently only two CyDyes with this chemistry, and generating usable signals in the mass spectrometer usually requires increased material. Combining these technology platforms to maximize their complementary nature while minimizing the individual weaknesses will undoubtedly be necessary to increase the scope and depth of quantitative differential display proteomics.

FIG. 5. ... Table I to illustrate the normalized relative abundance change and consistency across the independent replicates for each time point. Expression data are reported as the average percent change of normalized volume ratios for the indicated condition relative to 0 h or control with associated Student's t test p value (calculated using DeCyder version 6.5 software). Similar values were found for the basic isoform (Table I, line 14). B, MALDI-TOF mass spectrum of the trypsin-digested peptides derived from the protein excised from gel 4, feature 13 (arrow in A). The spectrum was internally calibrated to <20 ppm mass accuracy using the trypsin autolytic peptide ions at m/z 842.51 and 2211.10. The other labeled peptide ions (M + H) were used to generate statistically significant matches to maspin as indicated in Table I and Supplemental Table 1 (line 13). C, TOF/TOF tandem mass spectrum of the fragment ions of the peptide ion at m/z 1870.93. Fragment ions with the charge retained on the N terminus (b-ions, numbered left to right) and with charge retained on the C terminus (y-ions, numbered right to left) are indicated using the displayed amino acid sequence of the predicted maspin peptide. D, the peptide ion at m/z 1653.81 was not matched to a predicted peptide from the database entry for human maspin, but the TOF/TOF fragmentation pattern was consistent with a predicted maspin peptide containing a missense mutation of Asp to Thr or Ile to Val, neither of which was reported in the existing database entry. E, SYPRO Ruby gel image of gel 4 indicating proteins excised for the identification of cathepsin D isoforms (acidic feature 21 and basic feature 22) that were changing in opposite directions relative to the 0-h samples as indicated. F, MALDI-TOF peptide mass map resulting from the in-gel digestion of the basic isoform feature 53 excised from gel 7, internally calibrated as above. Peptide signals used for the statistically significant identification of cathepsin D (CATD) are labeled and listed in Table I and Supplemental Table 1 (lines 21, 22, and 53). The ion at m/z 1045.57 is derived from trypsin autolysis, and the ions labeled Bg were background ions present in every spectrum. G and H mark two ions for which the TOF/TOF tandem MS/MS spectra are also shown. G, tandem TOF/TOF mass spectrum of the peptide ion at m/z 1462.66 with b- and y-ions annotated for the predicted amino acid sequence as described above. H, similarly annotated tandem TOF/TOF mass spectrum of the peptide ion at m/z 1601.83. Similar MS and MS/MS spectra were acquired for protein features 21 and 22 that were comparable both with respect to m/z values and relative ion intensity albeit with lower scores (Table I and Supplemental Table 1).
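The Fig. 5 legend quotes internal calibration of the MALDI-TOF spectra to <20 ppm mass accuracy. As a rough guide to what that tolerance means in absolute terms, the sketch below converts between ppm and m/z units for ions named in the legend. The "observed" values passed to `within_tolerance` are hypothetical and serve only to exercise the function; they are not measurements from this study.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def max_deviation(mz, tol_ppm=20.0):
    """Largest absolute m/z deviation allowed at a given ppm tolerance."""
    return mz * tol_ppm / 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

# Ions named in the Fig. 5 legend: trypsin autolysis calibrants and a maspin peptide.
for mz in (842.51, 1870.93, 2211.10):
    print(f"m/z {mz:8.2f}: 20 ppm corresponds to +/- {max_deviation(mz):.4f}")

# Hypothetical observed values for the ion at m/z 1870.93.
print(within_tolerance(1870.95, 1870.93))  # about 10.7 ppm off
print(within_tolerance(1870.98, 1870.93))  # about 26.7 ppm off
```

Because the tolerance scales with m/z, the same 20 ppm window is about 0.017 m/z units wide on either side at m/z 842.51 and about 0.044 at m/z 2211.10.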