Yeast Interspecies Comparative Proteomics Reveals Divergence in Expression Profiles and Provides Insights into Proteome Resource Allocation and Evolutionary Roles of Gene Duplication

Omics analysis is a versatile approach for understanding the conservation and diversity of molecular systems across multiple taxa. In this study, we compared the proteome expression profiles of four yeast species (Saccharomyces cerevisiae, Saccharomyces mikatae, Kluyveromyces waltii, and Kluyveromyces lactis) grown on glucose- or glycerol-containing media. Of all proteins differentially expressed between the two growth conditions, only a small proportion showed expression changes that were conserved across all species. The two Kluyveromyces species, both of which exhibited a high growth rate on glycerol, a nonfermentative carbon source, showed distinct species-specific expression profiles. In K. waltii grown on glycerol, proteins involved in the glyoxylate cycle and gluconeogenesis were expressed in high abundance. In K. lactis grown on glycerol, the expression of glycolytic and ethanol metabolic enzymes was unexpectedly low, whereas proteins involved in cytoplasmic translation, including ribosomal proteins and elongation factors, were highly expressed. These marked differences in the types of predominantly expressed proteins suggest that K. lactis optimizes the balance of proteome resource allocation between metabolism and protein synthesis, giving priority to cellular growth. In S. cerevisiae, about 450 duplicate gene pairs were retained after whole-genome duplication. Intriguingly, we found that for duplicates with conserved sequences, the total abundance of the proteins encoded by a duplicate pair in S. cerevisiae was similar to that of the protein encoded by the nonduplicated ortholog in Kluyveromyces yeasts. Given the frequency of haploinsufficiency, this observation suggests that conserved duplicate genes, although they are a minority among retained duplicates, do not exhibit a dosage effect in yeast, except for ribosomal proteins.
Thus, comparative proteomic analyses across multiple species may not only reveal species-specific characteristics of metabolic processes under nonoptimal culture conditions but also provide valuable insights into intriguing biological principles, including the balance of proteome resource allocation and the role of gene duplication in evolutionary history.

Genome-wide analysis of multiple organisms is one of the most useful approaches for uncovering both the conservation and divergence of biological processes with respect to evolutionary history. Transcriptomic studies using DNA microarrays or deep-sequencing techniques have not only characterized the expression profiles of cell type- and cell state-specific genes involved in many biological processes but have also revealed how the expression patterns of genes vary among different species (1). Proteomic analysis, for which mass spectrometry is one of the most powerful and widely used analytical techniques (2), can provide much more direct evidence of both conservation and divergence of expression profiles across multiple species.
Interspecies comparative proteomic studies using mass spectrometry have hitherto been carried out in primates (3,4), yeasts (5), and a variety of other model organisms (6-9). The abundance of the core proteome, for which orthologs have been identified in all organisms investigated, is well conserved across a wide range of model organisms, including yeasts, plants, worms, flies, and humans (7). The profiles of orthologous protein abundance appear to be more conserved than profiles of transcript abundance in chimpanzees and humans (4), worms and flies (6), and many other prokaryotes and eukaryotes (8), suggesting that there is more selection pressure on protein expression than on transcript expression. The degree to which protein abundance is conserved in fission yeast and budding yeasts depends on the functional classes of the orthologs (5). However, with one exception (4), previous comparative analyses relied on independent analytical platforms and on data derived from nonstandardized growth conditions, thus limiting direct comparisons. Aside from studies demonstrating conservation of the proteome, the extent to which the abundance of orthologous proteins is conserved across different species under various environmental conditions, and which biological functions and processes are relevant to the specific characteristics of individual organisms, have yet to be thoroughly addressed.
Saccharomyces cerevisiae was the first eukaryotic organism for which the genome was completely sequenced (10), making it an ideal model organism for various types of proteomic studies. S. cerevisiae and the very closely related species included in the Saccharomyces sensu stricto group (11), known as Crabtree-positive yeasts, can grow under both aerobic and anaerobic conditions and can efficiently convert glucose to ethanol anaerobically during energy production even in the presence of oxygen, when aerobic respiration is repressed by glucose (glucose repression) (12-15). When glucose is exhausted or when yeasts are grown on nonfermentative carbon sources such as glycerol or ethanol, glucose repression is released, leading to activation of gluconeogenesis and oxidative energy production through the tricarboxylic acid (TCA) cycle and oxidative phosphorylation (12-15). In contrast, most other Saccharomycetaceae yeast species, known as Crabtree-negative yeasts, require oxygen for growth and aerobically utilize carbon sources, including glucose, for energy production via catabolism through the TCA cycle and ultimately oxidative phosphorylation (16-18). In recent decades, the genomes of many yeast species in the Saccharomycetaceae family have been sequenced (19-24), enabling comparative proteomics across multiple yeast species aimed at understanding the proteome profiles that underlie differences in metabolic processes.
Additional intriguing differences between the Saccharomyces sensu stricto group and Crabtree-negative Saccharomycetaceae yeasts concern genomic architecture. A number of duplicate genes arose in Saccharomyces yeasts through whole-genome duplication, prior to which the two yeast clades diverged about 100 million years ago (17, 20-22, 25-27). One copy of about 90% of the duplicate gene pairs was lost, and only about 10% of the duplicate genes have been retained in the S. cerevisiae genome. The duplicate genes in S. cerevisiae can be divided into two groups. The major group comprises duplicate gene pairs exhibiting divergent sequences or differentially regulated expression profiles that have resulted in asymmetric evolution to facilitate the acquisition of new functions or the partitioning of ancestral roles into individual genes during differentiation from the ancestral gene (21,25,26,28). Duplicate gene pairs in the minor group are characterized by conserved sequences and slow evolution rates (21,26) and include genes encoding many cytoplasmic ribosomal proteins. The evolutionary role and mechanism of retention of duplicate genes with conserved sequences (that is, increasing protein abundance (gene dosage) (29-32), dosage balance within protein complexes and regulatory networks (33-36), and amelioration of risks arising from harmful mutations in either duplicate gene (37-39)) remain controversial (40-42). Comparative analyses of protein abundance across yeast species are expected to provide insights into whether the presence of conserved duplicate genes confers an evolutionary advantage with respect to protein dosage.
Thus, yeast species in two clades, the Saccharomyces sensu stricto group (Crabtree-positive) and Crabtree-negative yeasts within the Saccharomycetaceae family, are ideal model organisms for use in comparative proteomics, especially with regard to elucidating the significance of global molecular differences involved in distinct metabolic processes and the evolutionary role of duplicate genes. Furthermore, employing closely related unicellular organisms enables the use of identical growth conditions, thereby allowing direct comparisons of the proteome. In this study, we conducted an interspecies comparative proteomic analysis of four closely related yeast species (S. cerevisiae, Saccharomyces mikatae, Kluyveromyces waltii, and Kluyveromyces lactis) using label-free quantification of protein abundance via mass spectrometry. Yeasts were grown in either glucose- or glycerol-containing medium to reveal differences in the proteome profiles under fermentative and nonfermentative growth conditions, respectively.

EXPERIMENTAL PROCEDURES
Yeast Strains and Cultivation-S. cerevisiae strain S288C was used in this study. Other yeast strains were obtained from the National BioResource Project (NBRP, Japan; http://www.nbrp.jp/): Saccharomyces paradoxus strain FSP2-3C (NBRP ID, BY20701), S. mikatae strain IFO1815 (NBRP ID, BY20110), Saccharomyces bayanus strain Su1A (NBRP ID, BY20703), Candida glabrata strain YAT3377 (NBRP ID, BY23876), Saccharomyces kluyveri strain SK125 (NBRP ID, BY21541), K. waltii strain IFO1666 (NBRP ID, BY20700), and K. lactis strain PM6-7A (NBRP ID, BY21799). After overnight culture in YPD medium (1% bacto yeast extract, 2% bacto peptone, and 2% glucose) or YPG medium (1% bacto yeast extract, 2% bacto peptone, and 3% glycerol) at 30°C, yeast cells were inoculated into fresh YPD or YPG medium and cultured at 30°C to the exponential growth phase. Growth rate was determined by at least triplicate measurements of the optical density at 600 nm at five or more time points during a 6- to 8-hour culture. For protein extraction, yeast cells cultured in YPD medium were harvested at a density of 3-6 × 10^7/ml after 4.5 h of culture; yeast cells cultured in YPG medium were harvested at a density of 1-2 × 10^7/ml after 8 h of culture. Cell density was determined using a Bürker-Türk hemocytometer with at least triplicate measurements of yeast cells fixed in 5% formaldehyde. Protein extractions for mass spectrometric analyses for each strain and growth condition were performed in biological duplicate experiments (experiments 1 and 2), in which proteins were extracted from independent cultures of yeast cells.
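As a rough illustration of how a growth rate can be derived from such an optical density time course, the following sketch fits a straight line to ln(OD600) versus time by ordinary least squares. The fitting procedure is an assumption (the text states only that the optical density was measured at five or more time points), and the function names and example readings are hypothetical rather than data from this study.

```python
import math

def specific_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time, in units of 1/h.

    Assumes the culture is in exponential phase at every time point.
    """
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(v) for v in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx

def doubling_time(mu_per_h):
    """Doubling time (h) corresponding to a specific growth rate (1/h)."""
    return math.log(2) / mu_per_h

# Hypothetical readings: a culture doubling every 1.5 h, sampled at five time points.
times = [0.0, 1.5, 3.0, 4.5, 6.0]
ods = [0.1 * 2 ** (t / 1.5) for t in times]
mu = specific_growth_rate(times, ods)
```

With replicate measurements, the same fit could be applied to the mean optical density at each time point or to each replicate separately.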
Protein Extraction-Yeast cells were collected by centrifugation at 2300 × g at 4°C and then lysed by incubation in 1% β-mercaptoethanol and 0.26 N NaOH for 10 min at 4°C, followed by addition of trichloroacetic acid (TCA) to a final concentration of 6.1%. After 30 min of incubation at 4°C, proteins and cell debris were collected by centrifugation at 20,400 × g for 5 min at 4°C. The resulting pellet was washed with ice-cold acetone and allowed to dry at room temperature for 30 min. Proteins were extracted from the pellet by the addition of dissolving buffer (0.1 M Tris-HCl (pH 8.5), 1% SDS, and 50 mM DTT), followed by sonication for 10 min and subsequent incubation at 95°C for 10 min. After vortexing for 30 min to thoroughly dissolve extracted proteins, cell debris was removed by centrifugation at 20,400 × g for 5 min at room temperature. The concentration of protein in the resulting supernatant was determined by Bradford assay (Bio-Rad, Hercules, CA) using bovine serum albumin as a standard.
Tryptic Digestion and Peptide Fractionation-A total of 120 μg of protein was digested using trypsin, and the resulting peptides were fractionated by gel-free isoelectric focusing. The protein solution was diluted to 200 μl by addition of 0.1 M Tris-HCl buffer (pH 8.5). Proteins were precipitated by the methanol/chloroform precipitation method, as follows. After the sequential addition of 600 μl of methanol, 150 μl of chloroform, and 450 μl of distilled water, with vortexing after each addition, the solution was centrifuged at 13,000 × g for 5 min at 4°C. The top aqueous layer was removed, 500 μl of methanol was added, and the sample was vortexed. After centrifugation at 20,400 × g for 10 min at 4°C, the precipitated proteins were dried for 2 min using a SpeedVac system (EYELA, Japan). Proteins were then redissolved by addition of 30 μl of 0.1 M Tris-HCl buffer (pH 8.5) containing 8 M urea and vortexing for 1 h. For cysteine reduction, DTT (final concentration, 5 mM) was added and the sample was incubated for 1 h at 37°C, after which cysteine alkylation was performed by addition of iodoacetamide (final concentration, 10 mM) and incubation for 1 h at room temperature in the dark. Next, 30 μl of 0.1 M Tris-HCl buffer (pH 8.5) and 60 μl of distilled water were added to reduce the urea concentration to 2 M, and then sequencing-grade modified trypsin (Promega, Madison, WI) was added at a substrate-to-enzyme ratio of 20:1 and the sample was incubated for 15-18 h at 37°C.
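The dilution and enzyme quantities in this protocol can be checked with simple arithmetic, sketched below. The calculation assumes that the small volumes of DTT and iodoacetamide stock are negligible, that the 20:1 ratio is by mass, and that all 120 μg of protein is recovered after precipitation; the function and variable names are illustrative only.

```python
def diluted_concentration(c_initial, v_initial, v_added):
    """Concentration after adding v_added of diluent to v_initial of solution."""
    return c_initial * v_initial / (v_initial + v_added)

# 30 ul of 8 M urea buffer, diluted with 30 ul of Tris-HCl buffer plus 60 ul of water.
urea_final_molar = diluted_concentration(8.0, 30.0, 30.0 + 60.0)

# Trypsin required for 120 ug of substrate at a 20:1 substrate-to-enzyme mass ratio.
trypsin_ug = 120.0 / 20.0
```

Under these assumptions the final urea concentration is 2 M in a volume of 120 μl, in agreement with the text, and about 6 μg of trypsin is needed.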
A total of 100 μg of the yeast tryptic peptide mixture was separated into 24 fractions using a 3100 OFFGEL Fractionator isoelectric focusing system (Agilent Technologies, Santa Clara, CA) according to the manufacturer's protocol. Briefly, the sample of tryptic peptides dissolved in 0.05 M Tris-HCl buffer (pH 8.5) containing 2 M urea was diluted 36-fold with peptide OFFGEL stock solution (Agilent Technologies), and the sample was loaded into the wells (~4 μg/well) of a 3100 OFFGEL High-Res kit, pH 3-10 (Agilent Technologies), followed by overnight electrophoresis at a current of 50 μA. Separation was completed upon reaching 50 kVh. The fractionated peptides contained in each well were collected and stored at −80°C until analyzed by mass spectrometry.
Mass Spectrometric Analysis-Tryptic peptides were analyzed by data-dependent liquid chromatography-tandem mass spectrometry (LC-MS/MS) using a Dina nano LC system (KYA Technologies, Tokyo, Japan) and an LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Waltham, MA), as follows. The peptide mixture was loaded onto a 1 mm × 0.5 mm (internal diameter) trapping column packed in-house with 5-μm C18 reverse-phase particles (Mightysil, Kanto Chemical, Tokyo, Japan). Peptides were eluted from the trapping column and separated using a 5 cm × 150 μm (internal diameter) fused silica capillary column packed in-house with 3-μm C18 reverse-phase particles (Mightysil, Kanto Chemical). Mobile phase A consisted of 0.1% formic acid and mobile phase B consisted of 0.1% formic acid/80% acetonitrile. Peptides were eluted from the capillary column at a flow rate of 200 nl/min using a 150-min gradient, as follows: from 0 to 40% solvent B over 120 min, from 40 to 50% solvent B over 15 min, and from 50 to 100% solvent B over 15 min. Eluted peptides were directly introduced into the mass spectrometer by nano electrospray ionization at 1.4-1.6 kV. Precursor ions were detected over the m/z range 400 to 1500 in the Orbitrap with a resolution of 60,000 and a maximum injection time of 500 ms. The four highest-intensity ions with a minimum signal of 500 counts were selected for MS/MS analysis via collision-induced dissociation in data-dependent mode, with a dynamic exclusion of 2 min, a 10-ppm mass window, and an exclusion list of up to 500 ions. The MS/MS scans were performed in the LTQ linear ion trap in centroid mode, with an isolation window of 2.0 m/z, a normalized collision energy of 35, and a maximum injection time of 10 ms.
Acquired MS/MS spectra were searched against a database containing fasta files of ORF amino acid sequences for each yeast species investigated using the SEQUEST algorithm implemented in Proteome Discoverer software version 1.3 (Thermo Fisher Scientific), as described below. Databases for S. cerevisiae and S. mikatae were downloaded from the Saccharomyces Genome Database (http://www.yeastgenome.org/). The S. cerevisiae database version released on 3-Feb-2011 contained 6717 entries, including 5852 unambiguous ORFs, 775 dubious ORFs, and 90 retrotransposons. The S. mikatae database (19) version released on 14-Dec-2004 contained 9057 entries, including ORFs for which dubious S. cerevisiae ORFs or retrotransposons are the only orthologs and entries having internal stop codons (described below). The K. waltii database was downloaded from supplementary data reported by Kellis et al. (21) and contained 5230 entries, including ORFs for which dubious S. cerevisiae ORFs or retrotransposons (described below) are the only orthologs. The K. lactis database released on 10-Sep-2008 was downloaded from the Génolevures hemiascomycete yeast database (22,24) (http://www.genolevures.org/) and contained 5076 entries.
For eight proteins (supplemental Table S1) encoded in the mitochondrial genome, coding DNA sequences of the S. mikatae and K. waltii orthologs were obtained by Basic Local Alignment Search Tool (BLAST) searching with the S. cerevisiae protein sequence as the query against the whole-genome shotgun contigs (wgs) and nonredundant nucleotide collection (nr/nt) databases in GenBank (supplemental Table S1). Protein sequences were generated from the obtained coding sequences using the yeast mitochondrial codons (43) and subsequently added to the ORF databases described above. Given that the sequences of the S. cerevisiae and S. mikatae orthologs are highly similar, the eight protein sequences encoded in the S. cerevisiae genome were also added to the S. mikatae database. Mitochondrion-encoded protein sequences of K. lactis were obtained from the complete sequence of the mitochondrial genome (44) (EMBL accession no. AY654900) and added to the K. lactis ORF database. The protein sequences of keratin and trypsin were also added to each database as contaminants or artificially included proteins.
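Translation with yeast mitochondrial codons differs from the standard genetic code at a few positions, which can be sketched as follows. The codon reassignments used here (TGA to Trp, ATA to Met, and CTN to Thr) are those of the yeast mitochondrial code as commonly tabulated; CGA and CGC, which are usually listed as absent from that code, are simply left as Arg in this sketch, which is an assumption. The function name and test sequence are hypothetical and do not represent the tool or sequences actually used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated with T, C, A, G at each position.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Yeast mitochondrial code: start from the standard code and apply reassignments.
YEAST_MITO_CODE = dict(STANDARD_CODE)
YEAST_MITO_CODE.update(
    {"TGA": "W", "ATA": "M", "CTT": "T", "CTC": "T", "CTA": "T", "CTG": "T"}
)

def translate(cds, code):
    """Translate a coding sequence codon by codon, stopping at the first stop codon.

    Ambiguous bases (e.g. N) are not handled and raise KeyError.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = code[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical test sequence: ATG ATA TGA CTT TAA
example = "ATGATATGACTTTAA"
mito_protein = translate(example, YEAST_MITO_CODE)
standard_protein = translate(example, STANDARD_CODE)
```

The same sequence yields a longer product under the mitochondrial code because TGA is read as tryptophan rather than as a stop codon, which is why the standard code cannot be used for these eight proteins.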
MS/MS spectra with a minimum total ion intensity of 100, at least 10 fragment ions, and corresponding precursor ions with a signal-to-noise ratio greater than 3 were subjected to database searching. The charge state of the precursor ions was automatically recognized. Tryptic peptides in the mass range 800 to 4500 m/z with up to two missed cleavages were considered for calculations of theoretical peptide mass. Mass tolerances for precursor and fragment ions were set at 10 ppm and 1 Da, respectively. Modification-dependent mass shifts were allowed as follows: carbamidomethylation (+57.021) of cysteine as a static modification and oxidation (+15.995) of methionine as a dynamic modification. False discovery rates (FDRs) based on a decoy database were calculated using the Proteome Discoverer software. Lists of peptide spectrum matches (PSMs) and ORF names were obtained by filtering out low-confidence PSMs to meet the criterion of an FDR of less than 1%. The number of PSMs (PSM count) for each ORF was summed across all 24 isoelectric focusing fractions, and the data were merged into a single data set of ORFs and their PSM counts. The mass spectrometry proteomics data, including raw data, spectral files, search engine output files, and Excel files containing search result data meeting the FDR criterion (<1%), have been deposited to the ProteomeXchange Consortium (45) via the PRIDE partner repository with the dataset identifiers PXD001865 through PXD001868. All originally identified ORF names and groups detected in the biological duplicate experiments (experiments 1 and 2, 16 analyses) are listed in supplemental Table S2, along with the numbers of PSMs and distinct peptides assigned to each ORF and the sequence coverage of detected peptides. For S. cerevisiae only, we also acquired proteomic data for an analytical duplicate (experiment 3, 2 analyses), in which mass spectrometric analysis was carried out independently on the same protein extract used in experiment 2; these data are also listed in supplemental Table S2.
In some cases, the accuracy of sequence and ORF annotations seems to be reduced because of insufficient depth of sequencing of the S. mikatae genome (19). Approximately 2000 ORFs do not have a stop codon at the 3′ end, which means that truncated rather than full-length ORF sequences are contained in this database. Thus, when ORFs were located side by side on the S. mikatae genome and had an identical S. cerevisiae ortholog (described below), their names were converted into a single merged ORF name (supplemental Table S3), resulting in the conversion of 1320 original ORFs into 613 merged ORFs.
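The merging of neighboring S. mikatae ORF fragments can be pictured with the following minimal sketch, which collapses runs of consecutive ORFs on the same contig that share one S. cerevisiae ortholog. The tuple layout, the "+" naming convention, and the example identifiers are hypothetical, and criteria such as strand or distance between fragments, which the text does not specify, are ignored here.

```python
from itertools import groupby

def merge_adjacent_orfs(orfs):
    """Merge consecutive ORFs on one contig that share an S. cerevisiae ortholog.

    orfs: list of (orf_name, contig, ortholog_or_None) tuples in genomic order.
    Returns a list of (merged_name, member_names) tuples.
    """
    merged = []
    for (contig, ortholog), run in groupby(orfs, key=lambda o: (o[1], o[2])):
        names = [o[0] for o in run]
        if ortholog is None or len(names) == 1:
            # ORFs without an ortholog, or without a matching neighbor, stay as they are.
            merged.extend((name, [name]) for name in names)
        else:
            merged.append(("+".join(names), names))
    return merged

# Hypothetical ORF fragments listed in genomic order.
example_orfs = [
    ("smik_1", "contig1", "YAL001C"),
    ("smik_2", "contig1", "YAL001C"),  # same ortholog as its neighbor: merged
    ("smik_3", "contig1", "YAL002W"),
    ("smik_4", "contig2", "YAL002W"),  # different contig: not merged
    ("smik_5", "contig2", None),
    ("smik_6", "contig2", None),       # no ortholog: not merged
]
merged_orfs = merge_adjacent_orfs(example_orfs)
```

In this toy input, six fragments are reduced to five entries, with only the first two combined into one merged ORF.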
Ortholog Relationship Construction and Generation of Orthologous Groups-Both bidirectional ortholog relationship data sets (related species to S. cerevisiae and S. cerevisiae to related species) for S. mikatae and K. waltii that included all matched proteins were downloaded from the Broad Institute download site (19) (http://www.broadinstitute.org/annotation/fungi/comp_yeasts/downloads.html) and from supplementary data reported by Kellis et al. (21), respectively. A data set of orthologous protein families for nine yeast species was downloaded from the Génolevures hemiascomycete yeast database (22,23) (http://www.genolevures.org/proteinfamilies.html), from which all ortholog relationships between S. cerevisiae and K. lactis were retrieved. Because the original data sets were created independently by individual research groups based on different criteria, we reconstructed these ortholog relationship data sets, including both one-to-one and one-to-many connections, to make them amenable to consistent comparisons of orthologous protein abundance across all four yeast species examined. All pair-wise protein sequence similarities between S. cerevisiae and each related species were bidirectionally searched using a locally installed BLAST. Among the original orthologous connection(s) to each query protein, the best match, that is, the one with the most significant e-value, was selected. Bidirectional best match selection was carried out in a manner analogous to that previously reported (46), such that no orphan proteins would arise even if the connected protein was not the best match in either of the one-directional BLAST searches. Selected bidirectional connections were integrated to construct three orthologous relationship data sets between S. cerevisiae and each related yeast species, including both best-match and non-best-match connections (supplemental Tables S4 through S6). These three data sets were used for connecting orthologous proteins identified in the related yeasts to S. cerevisiae proteins for comparisons of protein abundance within and across yeast species.
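One way to picture the selection of connections is the simplified sketch below. For every protein, it keeps the candidate partner with the smallest e-value in the search that uses that protein as the query, and it takes the union of the selections made in the two directions, so that every protein in the candidate set retains at least one connection. This is an interpretation of the procedure rather than the published implementation (46), and the function name, data layout, protein names, and e-values are all hypothetical.

```python
def select_connections(candidates, evalue_fwd, evalue_rev):
    """Select ortholog connections from pre-existing candidate pairs.

    candidates: iterable of (sc_protein, other_protein) pairs.
    evalue_fwd[(sc, other)]: e-value with the S. cerevisiae protein as query.
    evalue_rev[(other, sc)]: e-value with the other species' protein as query.
    Missing hits are treated as infinitely poor matches.
    Returns {pair: True if the pair is the best match in both directions}.
    """
    inf = float("inf")
    by_sc, by_other = {}, {}
    for sc, other in candidates:
        by_sc.setdefault(sc, []).append(other)
        by_other.setdefault(other, []).append(sc)

    fwd_best = {
        (sc, min(partners, key=lambda o: evalue_fwd.get((sc, o), inf)))
        for sc, partners in by_sc.items()
    }
    rev_best = {
        (min(partners, key=lambda s: evalue_rev.get((other, s), inf)), other)
        for other, partners in by_other.items()
    }
    # The union keeps at least one connection for every protein in the candidate set.
    return {pair: (pair in fwd_best and pair in rev_best) for pair in fwd_best | rev_best}

# Hypothetical candidate connections and e-values.
candidates = [("ScA", "k1"), ("ScB", "k1"), ("ScB", "k2"), ("ScC", "k1")]
evalue_fwd = {("ScA", "k1"): 1e-50, ("ScB", "k1"): 1e-20,
              ("ScB", "k2"): 1e-80, ("ScC", "k1"): 1e-10}
evalue_rev = {("k1", "ScA"): 1e-48, ("k1", "ScB"): 1e-22,
              ("k2", "ScB"): 1e-79, ("k1", "ScC"): 1e-9}
selected = select_connections(candidates, evalue_fwd, evalue_rev)
```

In this toy example the ScB to k1 connection is discarded because both proteins have better partners, whereas ScC to k1 is retained as a non-best-match connection because ScC would otherwise become an orphan.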
Data regarding the numbers of protein entries with and without ortholog connections and their redundancy (the number of orthologous connections), from which we excluded ORFs having aberrant stop codons (internal presence or terminal absence) and orthologous proteins corresponding to dubious S. cerevisiae ORFs or retrotransposons, are summarized in supplemental Fig. S1. About 85% of S. cerevisiae proteins had orthologous connections to proteins in the other three yeast species. More S. cerevisiae proteins had connections to more than one ortholog in S. mikatae than in K. waltii and K. lactis, indicating that the S. mikatae protein sequence database contains many truncated ORFs, as described above for the merged ORFs listed in supplemental Table S3. Approximately 10% of the K. waltii and K. lactis proteins had connections to more than one S. cerevisiae ortholog, in line with the proportion of duplicate genes retained in the S. cerevisiae genome (21,22,26). Compared with K. waltii and K. lactis, a greater number of S. mikatae proteins had no orthologs, probably because of less precise annotation of the ORFs in this species.
Between 5 and 15% of the proteins in each species examined had redundant orthologous connections, thus complicating comparisons of the protein abundance of orthologs. For global comparisons, it would seem to be much simpler to compare the sum of the abundance of a given orthologous protein group among the species examined. We therefore used a Perl script to create orthologous groups for S. cerevisiae and the other three yeast species using the three orthologous relationship data sets. Orthologous connections were grouped over all four species if any connectivity existed between the orthologs. In the first step, any two nodes (proteins) with an orthologous connection were clustered into the same group. In the next step, orthologous groups having overlapping nodes were clustered again into the same group. This clustering process was repeated until the total number of groups no longer decreased. Consequently, 4857 orthologous groups were created, including 796 groups lacking an ortholog in at least one yeast species (supplemental Table S7; supplemental Fig. S2A). Orthologs of K. waltii and K. lactis were absent in ~10% of the orthologous groups (552 and 590 groups, respectively) (supplemental Fig. S2B). Among the 4061 groups comprising orthologs from all four species, 3279 groups had a single ortholog in every yeast species, and the remaining 782 groups contained redundant orthologs (supplemental Fig. S2B). The number of orthologous groups including two orthologs for S. cerevisiae or S. mikatae was about 10% (510 and 515, respectively) of the total number of groups (supplemental Fig. S2C), consistent with the number of reported duplicate gene pairs in S. cerevisiae (21,22,26).
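The repeated merging of overlapping groups described above yields the connected components of the graph whose nodes are proteins and whose edges are orthologous connections. The following sketch computes the same components with a union-find structure in Python, a swapped-in technique used here for brevity in place of the iterative Perl script. The node labels and example connections are hypothetical.

```python
def orthologous_groups(connections):
    """Group proteins into connected components of the ortholog graph.

    connections: iterable of (node_a, node_b) pairs, where each node is a
    (species, protein) tuple. Returns a list of sets of nodes.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))

# Hypothetical connections: two S. cerevisiae duplicates linked to a single
# K. lactis ortholog, plus an unrelated one-to-one pair.
connections = [
    (("Scer", "TEF1"), ("Klac", "k_tef")),
    (("Scer", "TEF2"), ("Klac", "k_tef")),
    (("Scer", "TEF1"), ("Smik", "m_tef1")),
    (("Scer", "ACT1"), ("Kwal", "w_act")),
]
groups = orthologous_groups(connections)
```

In this example the two duplicates fall into one group of four proteins even though they are not directly connected to each other, which illustrates why group-level sums are convenient when orthologous connections are redundant.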
Determination and Comparison of Orthologous Protein Abundance-Protein abundance was determined using a label-free method based on spectral count as relative copy number (47-49). Details of the determination procedures used in this study are described below. PSM counts shared by more than one protein entry were redistributed according to the following PSM count distribution rules (50). If no unique peptides were detected for any of the sharing proteins, the PSM count was directly assigned to the group of sharing proteins without any change. If unique peptides were detected for at least one protein, the PSM counts shared by more than one protein were distributed according to the ratio of the PSM counts of the unique peptides originally assigned to each protein. Thus, the PSM count for a given peptide that was not unique to a single protein was distributed according to the approximate abundance of each protein sharing that peptide. The PSM count for each protein was normalized by the observability of tryptic peptides in our mass spectrometric analysis. The number of observable peptide ions in the m/z range 400-1500 with a +2 or +3 charge state was calculated for each protein from the ORF database used for the database search. The PSM count was divided by the calculated number of observable peptide ions to produce an abundance index. Finally, to account for variations in the total PSM counts for each data set resulting primarily from variability in LC-MS/MS analyses, the abundance index was divided by the sum of the indices for each data set comprising the 24 isoelectric focusing fractions to create an "abundance score" corresponding to the relative copy number. All of the proteome data, including PSM counts, abundance indices, and abundance scores for each yeast species cultured with each carbon source, are presented in supplemental Table S8, including biological duplicate experiments (experiments 1 and 2, 16 analyses) plus an analytical duplicate for the S. cerevisiae data (experiment 3, 2 analyses).
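The quantification steps described in this paragraph can be summarized in a short sketch: shared PSM counts are distributed in proportion to unique-peptide PSM counts, each count is divided by the number of observable peptide ions to give an abundance index, and the indices are normalized to sum to one. The function names and all numbers below are hypothetical, and the number of observable peptide ions is taken as a given input rather than computed by in silico digestion.

```python
def distribute_shared_psms(unique_psms, shared):
    """Distribute PSM counts of shared peptides among the proteins sharing them.

    unique_psms: {protein: PSM count from peptides unique to that protein}.
    shared: list of (tuple_of_proteins, psm_count) for shared peptides.
    Returns {protein_or_group_name: PSM count}.
    """
    totals = {p: float(c) for p, c in unique_psms.items()}
    for proteins, count in shared:
        weights = [unique_psms.get(p, 0) for p in proteins]
        weight_sum = sum(weights)
        if weight_sum == 0:
            # No unique peptide for any sharing protein: keep the count on the group.
            key = "/".join(proteins)
            totals[key] = totals.get(key, 0.0) + count
        else:
            for p, w in zip(proteins, weights):
                if w:
                    totals[p] = totals.get(p, 0.0) + count * w / weight_sum
    return totals

def abundance_scores(psm_counts, observable_peptides):
    """Abundance index = PSM count / observable peptide ions; score = index / sum."""
    indices = {p: c / observable_peptides[p] for p, c in psm_counts.items()}
    total = sum(indices.values())
    return {p: idx / total for p, idx in indices.items()}

# Hypothetical data: A and B share one peptide; C and D share all of their peptides.
unique = {"A": 30, "B": 10}
shared = [(("A", "B"), 8), (("C", "D"), 5)]
psm = distribute_shared_psms(unique, shared)
observable = {"A": 12, "B": 6, "C/D": 5}
scores = abundance_scores(psm, observable)
```

Here the eight shared PSMs are split 6:2 between A and B, following their 30:10 ratio of unique-peptide PSMs, whereas the five PSMs of C and D remain assigned to the group, as in the Tef1p/Tef2p case mentioned in the text.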
The relative copy number (abundance score) of each protein with an S. cerevisiae ortholog was determined and compared in two different ways to evaluate differences in the abundance of individual proteins and orthologous groups. In group-to-group comparisons within and across yeast species, the abundance of an orthologous group was determined by summing the abundance scores of the proteins belonging to that group, using the list of orthologous groups described above (supplemental Table S7). For analysis of protein-to-protein differences, protein abundance in the related yeast species was represented using the name of the orthologous S. cerevisiae protein. The determination of a given protein's abundance depended on the uniqueness of the protein's identification and the redundancy of the orthologous relationship. In the case of unique peptide-based protein identifications and one-to-one orthologous relationships, the abundance score was simply taken as indicative of orthologous protein abundance, according to the list of orthologous relationships (supplemental Tables S4 through S6). In contrast, when protein identifications were not based on unique peptides (e.g. Tef1p/Tef2p) or in the case of redundant S. cerevisiae orthologs, the abundance score was assigned unchanged to each of the individual orthologous proteins, with the result that each orthologous protein had the same, original score. In the case of ortholog redundancy in the yeast species related to S. cerevisiae, the sum of the scores of the redundant orthologs served as an indicator of orthologous protein abundance. Only when determining individual orthologous proteins that showed differential expression in an intra- or interspecies comparison, "1" was added to the PSM count for all nondetected orthologous groups or proteins to create a modified PSM count, in order to avoid a PSM count of zero, which would not be amenable to quantitative comparisons on a logarithmic scale.
Nondetected orthologous proteins were not used for the determination of correlation coefficients in a variety of proteome data comparisons, calculations of cumulative protein abundance for each functional category, and comparisons for expression levels of proteins encoded by duplicate genes. Quantitative proteomics data regarding S. cerevisiae orthologs, including PSM counts, modified PSM counts, abundance indices, and abundance scores are provided in supplemental Tables S9 (individual proteins) and S10 (orthologous groups).
To develop criteria for determining whether proteins or orthologous groups were differentially expressed between the species examined or in the same species cultured on different carbon sources, we investigated the extent of variation in protein abundance in biological duplicate analyses of the S. cerevisiae proteome. A high correlation with respect to protein abundance was observed for both independent biological experiments and analytical measurements of S. cerevisiae grown on glucose- and glycerol-containing media (supplemental Fig. S3). Despite high overall reproducibility, a higher degree of variability was noted in the duplicate analyses for proteins with lower abundance scores, owing to stochastic fluctuation in PSM counts for low-abundance proteins. Therefore, for each range of PSM counts, we calculated the standard deviation as an indicator of the variation in protein abundance measurements within the biological duplicates. The extent of variation as indicated by the standard deviation showed a clear negative correlation with PSM count in the lower range (i.e. less than 4 on a base-2 logarithmic scale). In contrast, the standard deviation in the higher range of PSM counts was almost constant (supplemental Fig. S4A). The standard deviations for each PSM count range from the two data sets (S. cerevisiae grown on glucose- and glycerol-containing media) were averaged to generate a fitting curve for the lower and higher PSM count ranges (supplemental Fig. S4B). The resulting equation allowed us to estimate the extent of protein abundance variation attributable to noise derived from the lower PSM counts. For pair-wise comparisons between each proteome data set, the logarithm of the abundance ratio for individual proteins or orthologous groups was divided by the standard deviation calculated as a function of their averaged PSM counts (using the fitting curve shown in supplemental Fig. S4B), to produce a "standardized z-score" (supplemental Tables S11 and S12).
Proteins or orthologous groups whose standardized z-score had an absolute value greater than 2 for a given comparison between different species or growth conditions, reproducibly in both biological duplicate experiments (experiments 1 and 2), were considered to be differentially expressed. Less than 2% of proteins were erroneously identified as differentially expressed between biological replicates (1.4% for S. cerevisiae grown on glucose, 1.8% for S. cerevisiae grown on glycerol), supporting the reliable identification of differentially expressed orthologous proteins.
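The standardized z-score and the differential expression criterion can be illustrated with the sketch below. The actual fitting curve is given only in supplemental Fig. S4B, so the noise model used here (a constant standard deviation above a log2 PSM count of 4 and a linear increase below it) and its parameter values are placeholders chosen purely for illustration. The use of base-2 logarithms for the abundance ratio and the requirement that both experiments agree in the direction of change are likewise assumptions.

```python
import math

def noise_sd(mean_psm, sd_plateau=0.3, slope=0.25, breakpoint_log2=4.0):
    """Placeholder noise model: SD as a function of the averaged PSM count.

    mean_psm must be positive (the modified PSM count avoids zeros).
    """
    x = math.log2(mean_psm)
    if x >= breakpoint_log2:
        return sd_plateau
    return sd_plateau + slope * (breakpoint_log2 - x)

def standardized_z(score_a, score_b, psm_a, psm_b):
    """Log2 abundance ratio divided by the SD expected at the averaged PSM count."""
    log_ratio = math.log2(score_a / score_b)
    return log_ratio / noise_sd((psm_a + psm_b) / 2)

def differentially_expressed(z_exp1, z_exp2, threshold=2.0):
    """|z| > threshold in both experiments, with the same direction of change."""
    return (abs(z_exp1) > threshold and abs(z_exp2) > threshold
            and z_exp1 * z_exp2 > 0)

# The same fourfold abundance ratio, supported by many or by few PSMs.
z_many = standardized_z(4e-4, 1e-4, 64, 16)
z_few = standardized_z(4e-4, 1e-4, 3, 1)
```

With these placeholder parameters, a fourfold difference passes the threshold when it is supported by many PSMs but not when it rests on only a few, which is the behavior the standardization is intended to produce.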
Gene Ontology (GO) Term Enrichment Analysis of Differentially Expressed Proteins-To better understand functional differences among the orthologous protein groups that were differentially expressed in the distinct yeast clades, those proteins that reproducibly met the differential expression criterion (the absolute value of standardized z-score greater than 2; supplemental Table S12) in both experiments 1 and 2 (biological duplicate experiments) were grouped into classes. To examine differences associated with growth conditions, orthologous groups of proteins that were differentially expressed in all four yeast species, in both Saccharomyces species, or in both Kluyveromyces species were categorized into three classes (supplemental Table S13A-S13C). To examine interspecies differences, we generated nine classes including orthologous groups of proteins that were differentially expressed between both Saccharomyces species and both Kluyveromyces species (i.e. Kluyveromyces versus Saccharomyces), or between both Saccharomyces species and K. waltii (i.e. K. waltii versus Saccharomyces) or K. lactis (i.e. K. lactis versus Saccharomyces) when grown on glucose or glycerol (supplemental Table S13D-S13I). To identify the functional categories of proteins enriched in these classes, the names of S. cerevisiae ORFs included in the respective orthologous groups were subjected to GO term enrichment analysis.
Significant enrichment of GO terms related to biological processes or cellular components was found in three classes for analyses of differences in protein expression between yeast cells grown on the two carbon sources (supplemental Table S14A-S14C). In the case of interspecies differences, GO terms were enriched in all classes but one, i.e. in five classes (supplemental Table S14D-S14I). Representative GO terms were extracted from all the lists by removing redundant functional categories and broad GO terms lacking detail regarding specific functions (Tables I and II).
Statistical Analysis-For statistical analyses of differences in cumulative protein abundance and the fraction of protein mass, a two-tailed t test was performed assuming equal variance of the data sets being compared. Statistical significance of differences in the abundance of proteins encoded by duplicate genes was determined by a two-tailed Mann-Whitney U test, as the quantitative values in each data set were not normally distributed. All data points were included in the statistical analyses, and p values less than 0.05 were considered to indicate a significant difference.

Differences in Growth Rate on Fermentative and Nonfermentative Carbon Sources among Yeast Species-We examined the growth rate on glucose and glycerol of multiple yeast species, including Saccharomyces sensu stricto (Saccharomyces cerevisiae, S. paradoxus, S. mikatae, and S. bayanus) and related yeast species within the Saccharomycetaceae family (Candida glabrata, Saccharomyces kluyveri, Kluyveromyces waltii, and Kluyveromyces lactis). Irrespective of yeast group, aerobic mitochondrial respiration is required for growth on the nonfermentative carbon source glycerol, which results in a shift in the metabolic profile between the fermentative and nonfermentative conditions. Even though some differences were observed across species, the growth rate on glucose was not associated with phylogenetic distance among the eight yeast species examined (Fig. 1). In contrast, when grown on glycerol as a nonfermentative carbon source, all four sensu stricto yeasts showed a significantly lower growth rate than the other four yeast species within a different clade, including the Crabtree-negative species (K. waltii and K. lactis) (Fig. 1, discrimination analysis, p < 0.05). Among the sensu stricto yeasts, the growth rate of S. cerevisiae and S. mikatae on glycerol was much lower than that of the other sensu stricto species. Thus, we chose the sensu stricto species S. cerevisiae and S. mikatae, and the Crabtree-negative yeasts K. waltii and K. lactis, which grow via aerobic respiration, to examine proteome changes when cells are grown on different carbon sources.
Evaluation of the Scale and Feasibility of Quantitative Proteomic Analyses of Multiple Yeast Species-Peptides derived from proteins extracted from yeast cultures were prefractionated via OFFGEL isoelectric focusing prior to data-dependent LC-MS/MS analysis to achieve large-scale proteome identification. The total number of PSMs and the number of detected proteins integrated from the 24 OFFGEL fractions in each proteome data set are summarized in supplemental Fig. S5. In almost every data set, we successfully detected 2400-3500 proteins (two or more PSMs per protein for 1700-2900 of these proteins), to which 17,000-56,000 PSMs and 9000-22,000 peptides were assigned, resulting in 50-70% coverage of the yeast proteome.
We also compared the changes in protein abundance in S. cerevisiae grown on glucose- or glycerol-containing media with previously reported data on fold-change in mRNA expression as measured by microarray analysis (51). On a global scale, only a weak correlation was observed between changes in protein and mRNA abundance (supplemental Fig. S6A), as was previously reported (52). In contrast, when the comparison was limited to certain functional groups of proteins involved in central carbohydrate metabolism, oxidative phosphorylation, and fatty acid oxidation, good correlations were observed between protein and mRNA (supplemental Fig. S6B). In particular, the abundance of metabolic proteins involved in the TCA and glyoxylate cycles, gluconeogenesis, utilization of carbon sources other than glucose, and fatty acid oxidation increased in both S. cerevisiae and S. mikatae when grown on glycerol (supplemental Fig. S7A), consistent with previous transcriptome (51,53) and mitochondrial proteome (54) analyses. These results demonstrate that the label-free quantitative proteomic approach employed in this study was well suited to investigating differences in proteome expression profiles under distinct culture conditions and between different yeast species.
The spectral counting label-free quantitative approach is based on the number of detectable ions of tryptic peptides derived from each protein. Therefore, differences between orthologous proteins with respect to the number of theoretical tryptic peptides and their physicochemical properties, attributable to variations in protein sequence, may cause noise that confounds quantitative comparisons. About 80% of the orthologous proteins of S. mikatae shared more than 80% identical amino acid sequences with those of S. cerevisiae, whereas the degree of sequence identity of the major (i.e. over 70%) orthologous proteins of K. waltii and K. lactis (as compared with those of S. cerevisiae) was within the range of 30 to 70% (supplemental Fig. S8). To investigate the adverse effect of sequence variance on quantitative analyses, the extent of differences in protein abundance between S. cerevisiae and the other yeast species was compared with amino acid sequence identity. We observed very weak or no significant correlation between protein sequence identity and the extent of differences in protein abundance among orthologs of different species (supplemental Fig. S9), indicating the feasibility of interspecies quantitative comparisons and the absence of bias attributable to sequence variations.

FIG. 1. Comparison of the growth rate of S. cerevisiae and related yeast species on two carbon sources.
Yeast cells were grown at 30°C on medium containing either glucose (2%) or glycerol (3%), and growth rate was determined during the exponential growth phase. Growth rate is reported as the average and standard deviation of three or more independent replicate experiments. The eight yeast species examined are arranged according to their phylogenetic relationships. The cladogram is based on a phylogenetic tree constructed using the sequences of more than 100 genes (27). On the basis of the growth rate on glycerol, eight yeast species were significantly separated into two groups (discrimination analysis, p < 0.

Interspecies Diversity in the Proteome Profiles and Conserved Parallel Expression Changes across All Four Yeast Species-In this study, global protein abundances were examined in four yeast species grown on glucose- and glycerol-containing media, which allowed us to assess both the degree of variance in the proteome expression profiles across different yeast species and the extent of changes in protein expression levels between different growth conditions. A positive correlation for protein abundance was observed between the different yeast species under the same growth conditions (Fig. 2A, Pearson correlation coefficient = 0.76-0.86), consistent with previous reports (4, 6-8). Unexpectedly, the correlation for protein abundance in the same species under two different growth conditions (Pearson correlation coefficient = 0.85-0.92; Fig. 2A) tended to be higher than that between the different yeast species, suggesting that individual species adapt to changes in available carbon sources with minor and/or species-specific alterations of the proteome profiles. To rule out the possibility that identity or high similarity of tryptic peptide observability could have led to higher correlations for the calculated protein abundances between the same and very closely related yeast species, protein abundance scores were recalculated based on the number of observable tryptic peptide ions in the amino- and carboxyl-terminal halves of each protein. Comparing protein abundance between one half and the other eliminates the effect that similarity in protein sequences and tryptic peptide observability would have on the extent of correlation, because identical and very similar peptides are never used in the calculation of correlation coefficients. In these unbiased comparisons, which are not affected by sequence similarity, intra-species correlations were again higher than interspecies correlations (supplemental Fig. S10).
The proportion of orthologous groups of proteins that were differentially expressed between different species, reproducibly meeting the differential expression criterion (the absolute value of standardized z-score greater than 2; supplemental Table S12) in both biological replicates (experiments 1 and 2), was greater than the proportion of proteins differentially expressed within a species between the two growth conditions (Fig. 2B).
With respect to the central carbohydrate metabolism and fatty acid oxidation pathways, only a few orthologous proteins (involved in glycerol catabolism, the glyoxylate cycle, gluconeogenesis, and fatty acid oxidation) showed parallel changes in expression between glucose- and glycerol-dependent growth conditions across all four yeast species (supplemental Figs. S7A and S11A). The expression of five of eight orthologous groups of proteins involved in fatty acid catabolism (β-oxidation and acetyl-CoA transport into mitochondria) was up-regulated in cells of all four yeast species when grown on glycerol (Fig. 3A; supplemental Figs. S7A and S11A), suggesting an increase in efficiency in the use of fatty acids as an energy source. We found no additional enriched functional categories exhibiting the same expression profile over the four yeast species by GO term enrichment analysis (Table I). Our preliminary global comparisons of protein abundance in four yeast species indicate intrinsic interspecies diversity in the proteome profiles and species-specific differences in protein expression levels as well as conserved expression changes across all four yeast species.

FIG. 2. Inter and intra-species differences in proteome expression profiles.
A, Orthologous group protein abundance (abundance score in supplemental Table S10) was compared to calculate pair-wise correlation coefficients of proteome expression profiles for four yeast species grown on glucose or glycerol media. The values on the diagonal indicate correlation coefficients for biological duplicates of the same species and conditions. The upper and lower values in each cell represent Pearson correlation coefficients (r) in experiments 1 and 2 (biological duplicate experiments), respectively. Orthologous proteins not detected in either of the data sets being compared were excluded from the determination of correlation coefficients. B, We determined differentially expressed orthologous groups that reproducibly showed an absolute value of standardized z-score greater than 2 in both biological duplicates (experiments 1 and 2) for a given comparison of proteome expression profiles. The number of differentially expressed orthologs was divided by the total number of proteins detected in at least one of the biological duplicate experiments to determine the proportion of differentially expressed orthologous groups. Abbreviated names of each yeast species are the same as in Fig. 1.

Proteins Differentially Expressed Only in Saccharomyces Species When Grown on Different Carbon Sources-In the two Saccharomyces species examined in this study, the expression of enzymes involved in the TCA cycle and oxidative phosphorylation (enriched as mitochondrion-localized proteins) was up-regulated when yeast cells were grown in glycerol-containing medium (Table I; supplemental Fig. S7A), a well-known expression pattern in S. cerevisiae grown on nonfermentative carbon sources (12-15). With respect to proteins involved in carbohydrate metabolism (Table I), the expression of metabolic enzymes that catabolize a variety of carbon sources was up-regulated in cells grown on glycerol (supplemental Fig. S7A), as previously reported (12,15,55). In contrast, the abundance of catabolic enzymes specific for only one or two carbohydrates was increased in Kluyveromyces species (supplemental Fig. S11A), suggesting that Saccharomyces species are capable of utilizing a wider variety of carbohydrates than Kluyveromyces during growth via aerobic respiration on nonfermentative carbon sources.
Species-specific Expression Profiles in Each Kluyveromyces Yeast-To uncover which biological processes are up- or down-regulated in each yeast species when grown on glucose or glycerol, we summed the relative copy numbers (abundance scores) for each subcellular localization (56) and biological function (GO slim term for biological process). Approximately half of the total proteome copy number was localized in the cytoplasm in all four yeast species examined in this study (supplemental Fig. S12A). The second and third most protein-rich organelles were the mitochondria and the nucleus, which accounted for 8 to 17% of the entire protein copy number in cells (supplemental Fig. S12A). In the three yeast species other than K. lactis, the top three biological functions, each of which accounted for about 6 to 22% of the total proteome copy number, were carbohydrate metabolic process, amino acid metabolism, and cytoplasmic translation (supplemental Fig. S12B).
The cumulative relative protein copy number of the peroxisome was markedly increased in K. waltii cells grown on glycerol compared with cells grown on glucose (Fig. 4A, two-tailed t test, p < 0.01), resulting in a significant difference in the peroxisome-associated protein copy number between K. waltii and K. lactis when utilizing glycerol (Fig. 4A, two-tailed t test, p < 0.01). With respect to peroxisome-associated metabolic processes, the abundance of five proteins (except for Aco1/2p) involved in the glyoxylate cycle was increased in K. waltii cultivated on glycerol medium compared with glucose-dependent growth conditions (Fig. 5A; supplemental Fig. S11A). In all four species, the total abundance of enzymes involved in the glyoxylate cycle was the highest in K. waltii grown under the nonfermentative culture condition (Fig. 5B, two-tailed t test, p < 0.01). In addition to the glyoxylate cycle enzymes that play a role in supplying metabolites for gluconeogenesis, the expression of Pck1p, a key gluconeogenesis enzyme, was more abundant in K. waltii than in Saccharomyces species when cells were grown on glycerol (Figs. 5A and 6). Thus, in K. waltii, metabolic processes involved in replenishment for intermediate metabolites or glucose appear to be up-regulated compared with other yeast species when grown on glycerol, leading to a higher potential for active growth of K. waltii in nonfermentative culture conditions compared with Saccharomyces species (Fig. 1).
When K. lactis was grown on glucose, more functional categories of proteins exhibited differential expression profiles as compared with Saccharomyces species (enriched GO terms in Table II). In K. lactis, the abundance of mitochondrion-localized proteins, including metabolic enzymes involved in the TCA cycle and oxidative phosphorylation, was higher than in Saccharomyces species when cells were grown on glucose (Fig. 4B, two-tailed t test, p < 0.01; Fig. 5C, two-tailed t test, p < 0.05; supplemental Figs. S13 and S14). When grown on glycerol, however, no significant differences in protein abundance of these functional groups were observed relative to Saccharomyces species. The functional category of cytoplasmic translation exhibited the highest protein abundance in K. lactis grown on glycerol (Fig. 4C, two-tailed t test, p < 0.05). Notably, of the other two categories that ranked in the top three in the remaining species, carbohydrate metabolic process had a total protein copy number in K. lactis that was two- to threefold lower than in the other yeast species (Fig. 4D, two-tailed t test, p < 0.01 (versus S. cerevisiae) or p < 0.05 (versus S. mikatae and K. waltii)). This functional category was also an enriched GO term for a protein group that exhibited differential expression between Saccharomyces species and K. lactis (Table II; supplemental Table S14I). In particular, metabolic enzymes that function in glycolysis and ethanol fermentation were less abundant in K. lactis grown on glycerol than in the other yeast species (Figs. 6 and 7A), resulting in a more than fourfold reduction in the total copy number of proteins in this group in K. lactis relative to the other yeast species (Fig. 7B, two-tailed t test, p < 0.01 (versus S. cerevisiae) or p < 0.05 (versus S. mikatae and K. waltii)). In contrast to K. waltii, the abundance of proteins related to the major metabolic processes glycolysis and ethanol metabolism, which accounted for a substantial portion of the total proteome copy number in the other yeast species, was reduced in glycerol-utilizing K. lactis (supplemental Fig. S11). This finding suggests that down-regulation of unnecessary expression of a set of enzymes involved in glycolysis and ethanol metabolism would lead to a higher growth rate of K. lactis under nonfermentative aerobic growth conditions compared with Saccharomyces species (Fig. 1).

FIG. 3. Conserved and divergent expression profiles across four yeast species.
A, Conserved expression differences between two growth conditions. Heat map shows the relative copy number (abundance scores) of orthologous proteins involved in fatty acid catabolism for different yeast species grown on glucose or glycerol. White indicates orthologous groups or proteins not detected. The numerals (1 and 2) at top indicate the experiment number of biological duplicates. Asterisks denote orthologous proteins differentially expressed on glucose and glycerol in all four yeast species examined. B, Kluyveromyces-specific differentially expressed proteins. Relative copy number (abundance scores) of orthologous proteins that function in general transport and sorting of mitochondrial proteins are shown for different yeast species grown on glucose or glycerol. Results for experiments 1 and 2, which represent biological duplicate experiments, are indicated by left and right side-by-side bars of the same color, respectively. Abbreviated names of each yeast species are the same as in Fig. 1.

FIG. 4. Differences in cumulative protein abundance categorized by functional category for yeast cells grown on different carbon sources and for different species.
Cumulative relative copy number (abundance scores) determined for each protein functional category (A-D) derived from the entire classification data set (supplemental Fig. S12). Distinct profiles are evident across different yeast species grown on glucose- or glycerol-containing media. Numerals (1 and 2) along the horizontal axis denote experiment number of biological duplicates. Abbreviated names of each yeast species are the same as in Fig. 1.

Balance of Proteome Resource Allocation between Carbohydrate Metabolism and Protein Translation-Maintaining the proteome consumes a great deal of cellular resources, such as amino acids and their charged tRNAs, ribosomes and associated translation factors, protein folding and quality control molecules, and proteins involved in the ubiquitin-proteasome degradation system and autophagy. In addition, cellular resources in the form of high-energy compounds (e.g. ATP and GTP) are consumed during the biosynthesis of these biomolecules and during the protein translation, folding, and degradation steps themselves. We assumed that the amount of proteome resources (including the above-mentioned biomolecules and high-energy compounds) consumed for each protein is proportional to protein mass rather than copy number. As such, protein mass was calculated by multiplying the relative copy number (abundance score) by the number of amino acid residues in the protein. The fraction of a given protein's mass per total proteome mass was determined for each functional group and compared between different yeast species, in order to provide insights into the allocation of proteome resources to cellular biological processes.
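The protein mass calculation described above can be sketched in Python as follows. The function name, the input layout, and the example values are hypothetical and chosen only for illustration; the sketch also assumes that each protein is assigned to a single functional group, which the text does not specify.

```python
from collections import defaultdict


def mass_fractions(proteins):
    """Return the fraction of total proteome mass for each functional group.

    proteins: iterable of (functional_group, abundance_score, n_residues).
    Protein mass is approximated as abundance score x number of residues.
    Assumption: every protein belongs to exactly one functional group.
    """
    group_mass = defaultdict(float)
    total_mass = 0.0
    for group, abundance, n_residues in proteins:
        mass = abundance * n_residues
        group_mass[group] += mass
        total_mass += mass
    if total_mass == 0:
        raise ValueError("total proteome mass is zero")
    return {group: mass / total_mass for group, mass in group_mass.items()}


# Made-up example values (not data from this study).
example = [
    ("glycolysis", 10.0, 300),
    ("glycolysis", 5.0, 400),
    ("cytoplasmic translation", 20.0, 150),
    ("other", 2.0, 1000),
]
fractions = mass_fractions(example)
```

In this toy example the translation group has the highest summed copy number (20 of 37) but only 30% of the mass, which illustrates why weighting by sequence length can change the apparent ranking of functional categories.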
Analyses of GO slim terms of biological processes revealed that carbohydrate and amino acid metabolic processes represented the first- and second-highest fractions of protein mass, respectively, in the two Saccharomyces species and K. waltii under both glucose- and glycerol-dependent growth (Fig. 8A). In contrast, the fraction of protein mass associated with carbohydrate metabolic processes in K. lactis grown on glycerol was less than that associated with amino acid metabolism and cytoplasmic translation (Fig. 8A). Consequently, the fraction of protein mass associated with carbohydrate metabolism was lower in K. lactis than in the other yeast species under glycerol-dependent growth, whereas the fraction of protein mass associated with cytoplasmic translation and ribosomal biogenesis was higher (Fig. 8A, two-tailed t test, p < 0.05).
In particular, the fraction of protein mass associated with glycolysis and ethanol metabolism (proteins listed in supplemental Table S15) in K. lactis was notably (three- to sixfold) lower than in the other yeast species (Fig. 8B, two-tailed t test, p < 0.05). In contrast, a much higher (about twofold) fraction of protein mass in K. lactis was associated with cytoplasmic translation involving ribosomal proteins and translation elongation factors (proteins listed in supplemental Table S15) as opposed to glycolysis/ethanol metabolism (Fig. 8B, two-tailed t test, p < 0.05). Consequently, the sums of the fractions of protein mass associated with glucose/ethanol metabolism and protein translation were comparable in all four yeast species when grown on a nonfermentative carbon source. These observations suggest that in K. lactis, the balance of proteome resource allocation between metabolic and translational processes (which account for a substantial portion of the proteome resources), potentially reflected by protein mass, might be optimized to maximize the cellular growth rate by a shift in proteome composition from proteins involved in inessential processes to those involved in essential processes.

FIG. 6. Metabolic proteins differentially expressed between different yeast clades grown on glycerol.
Orthologous groups or proteins involved in central carbohydrate metabolism and fatty acid oxidation that met the criterion for differential expression (|z-score| > 2) (original data in supplemental Table S12) between Saccharomyces and Kluyveromyces species grown on glycerol are highlighted in color. Proteins highlighted in blue and purple were differentially expressed in K. waltii and K. lactis, respectively, compared with both S. cerevisiae and S. mikatae. The expression abundance of proteins denoted in red differed in both K. waltii and K. lactis. Upward and downward dashed arrows indicate increased and decreased expression, respectively, in Kluyveromyces. Differences in expression abundance of gray-colored proteins could not be determined because of insufficient data. Proteins enclosed in rounded rectangles are components of protein complexes. Metabolic proteins involved in oxidative phosphorylation are shown in supplemental Fig. S15.

Species-specific Differentially Expressed Proteins in Both Kluyveromyces Species Compared with Saccharomyces Yeasts-Mitochondrion-related proteins were significantly enriched among the orthologous proteins exhibiting higher expression in both Kluyveromyces yeasts under glucose-dependent growth conditions (Table II). Mitochondrial ribosomal proteins were included in this category (supplemental Table S14D, S14F, and S14G). About one-fourth of the mitochondrial ribosomal proteins (18 out of 67) exhibited higher abundance in at least one Kluyveromyces species as compared with both Saccharomyces yeasts (supplemental Fig. S17A). The cumulative relative copy number of all mitochondrial ribosomal proteins in the Kluyveromyces yeasts was about twice that of the Saccharomyces species (supplemental Fig. S17B, two-tailed t test, p < 0.01). These observations suggest that mitochondrial protein translation is more active in Kluyveromyces yeasts even under conditions of glucose-dependent growth.
Mitochondrion-related proteins exhibiting higher abundance in the Kluyveromyces clade under glucose-dependent growth conditions also included proteins involved in the transport and assembly of mitochondrial proteins (supplemental Table S14D, S14F, and S14G). These sorting and assembly factors were much more abundant under both growth conditions in one or both Kluyveromyces species compared with both Saccharomyces species (Fig. 3B; supplemental Table S13E, S13H, and S13I). Tom40p and Sam50p are core components of the general translocase of the outer membrane (TOM) and the sorting and assembly machinery (SAM), respectively (57). These two proteins were more abundant in both Kluyveromyces species (Fig. 3B), whereas no other components showed significantly increased expression (supplemental Fig. S17C). Thus, it remains to be elucidated whether the activity of general transport and sorting for mitochondrial proteins encoded by the nuclear genome is higher in Kluyveromyces yeasts. Tcm62p, which plays a role in the assembly of a particular mitochondrial protein complex, succinate dehydrogenase (58), was more abundant in both Kluyveromyces species (Fig. 3B). As Tcm62p is believed to function alone as a chaperone for its target protein complex, this result suggests that the assembly of this type of protein complex involved in the electron transport chain may be enhanced in Kluyveromyces yeasts.
Similarity in the Summed Protein Abundance of Conserved Duplicate Genes between the Two Yeast Clades-In addition to their metabolic differences, gene duplication in the Saccharomyces yeast genome after divergence from the Kluyveromyces yeasts (21,22,25-27) is another interesting difference between the two yeast clades. As described in a previous report of 450 duplicate gene pairs in Saccharomyces (21), there is substantial diversity with respect to the similarity of protein sequences of paralogous duplicate genes, with 10 to 100% identity observed in intra- and interspecies comparisons (Fig. 9A; supplemental Table S16). Gene duplication is considered to play distinct evolutionary roles, such as in sub- or neo-functionalization and gene dosage effects or mutational robustness, which are dependent on the evolutionary rate (18,26,27,59,60). Therefore, we divided the duplicate genes and their nonduplicated orthologs into divergent and conserved groups, having less than 80% sequence identity within S. cerevisiae and more than 80% sequence identity in both intra- and interspecies comparisons, respectively (supplemental Table S16), and we then compared the abundance of proteins in each group between S. cerevisiae and K. waltii. Higher protein abundance was observed for the duplicate genes (50 gene pairs) and their orthologs in the conserved group (including identifications based on only nonunique peptides) compared with the divergent group (332 pairs) (Fig. 9B; supplemental Fig. S18). A positive correlation between the extent of sequence conservation and protein expression level was also observed in a proteome-wide analysis (supplemental Fig. S19), consistent with previous determinations of evolutionary rate and mRNA abundance (61,62).

FIG. 8. The fraction of protein mass associated with respective biological functions.
A, Relative copy number (abundance score) of each protein was multiplied by the number of amino acid residues to calculate protein mass. Then, the fraction of a given protein's mass per total proteome mass was determined for each functional category. Functional classification was carried out as described in supplemental Fig. S12. B, The fractions of protein mass associated with glycolysis/ethanol metabolism and cytoplasmic translation, respectively, were calculated from the protein mass of orthologous groups or proteins involved in these functions (listed in supplemental Table S15). Individual fractions of protein mass associated with the particular functional groups are shown in supplemental Fig. S16. Numerals (1 and 2) noted above the yeast species names indicate the experiment number of biological duplicates. Abbreviated names of each yeast species are the same as in Fig. 1.
To investigate interspecies differences in the protein abundance of duplicate genes, we compared the median and distribution of the duplicate gene abundance ratio for each group, using proteome-wide abundance differences as reference ratios. The protein expression level of each duplicate gene in the divergent group in S. cerevisiae was slightly lower than that in K. waltii, whereas the sum of the protein abundance of a paralogous pair in S. cerevisiae tended to be significantly higher than that of the nonduplicated orthologous gene in K. waltii (Fig. 9C; supplemental Fig. S20A). In contrast, the protein abundance of each S. cerevisiae duplicate gene in the conserved group tended to be significantly lower than that of the K. waltii nonduplicated orthologous gene (Fig. 9C; supplemental Fig. S20A). Of particular interest was the observation that in the conserved group, the summed protein abundance for a duplicate gene pair in S. cerevisiae was similar to the expression level of the nonduplicate ortholog in K. waltii (Fig. 9C; supplemental Fig. S20A). The abundance ratio obtained after excluding ribosomal proteins or proteins identified only by nonunique peptides, which accounted for about half of the proteins in the conserved duplicate gene group, was similar to that of the entire conserved group (Fig. 9C; supplemental Fig. S20A). Furthermore, the summed protein abundance for each duplicate gene pair in S. cerevisiae was also found to be similar to that of the nonduplicate counterpart in K. lactis under both growth conditions (supplemental Fig. S20B). These results suggest that, in the case of conserved and highly expressed duplicate genes, the abundance of the protein encoded by each gene tends to be approximately half of that encoded by the nonduplicated orthologous gene in the other species. In addition, the total amount of protein encoded by a highly expressed conserved duplicate gene pair (each duplicate would have the same or similar function) is likely to be conserved across different species, even if gene duplication arose in only one lineage.
FIG. 9. Comparison of protein abundance of duplicate genes between S. cerevisiae and K. waltii species.
A, Percentage of identical amino acid residues between individual paralogs of duplicate genes was compared for intra- and interspecies sequence similarities (original data in supplemental Table S16). Divergent (332 gene pairs and their orthologs) and conserved (50 pairs and orthologs) genes, defined as described in the main text and supplemental Table S16, are indicated in blue and red circles, respectively. B, Relative copy number (abundance score) of each paralogous protein of duplicate genes under glucose-dependent growth (experimental data 1 in supplemental Table S16) is shown for divergent (blue) and conserved (red) groups. C, Relative copy number of each paralog or sum of two paralogous protein abundances in S. cerevisiae was compared with that of the singleton nonduplicate orthologous protein in K. waltii for divergent and conserved duplicate gene groups, respectively (data from experiment 1 of biological duplicates in supplemental Table S16). Median (represented with numeric value) and dispersion of the abundance ratio for the divergent and conserved duplicate gene pairs for which all three proteins (the two paralogs in S. cerevisiae and the ortholog in K. waltii) were successfully detected and quantified are shown in a box-and-whisker plot (center line, median; lower and upper limit of box, first and third quartiles; circular dots, outliers; whiskers, the lowest and highest values without outliers). With respect to conserved duplicate genes, the protein abundance ratio exclusive of ribosomal proteins (w/o RP) or identifications based on only nonunique peptides (unique) was also compared. "Proteome" denotes the abundance ratio of one-to-one orthologous proteins between S. cerevisiae and K. waltii on a proteome-wide scale. Significance of differences was determined using the two-tailed Mann-Whitney U test (*, p < 0.05; ***, p < 0.001). Abbreviated names of each yeast species are the same as in Fig. 1.
DISCUSSION
Our interspecies comparative proteomic study, using as model organisms multiple yeast species of different clades with distinct metabolic properties and genomic architectures, revealed both conserved and divergent aspects of the proteome expression profiles under different growth conditions, particularly with respect to metabolic enzymes, mitochondrion-related proteins, translational factors, and duplicate gene products. A species-specific shift in the category of most abundantly expressed proteins between glucose/ethanol metabolism and protein translation was particularly interesting, as this result suggests that yeasts have a system for optimizing the balance of proteome resource allocation with priority on cellular growth. Another intriguing finding was that the total protein abundance of a duplicate gene pair and that of the nonduplicated ortholog with conserved sequences tended to be conserved between the two yeast clades, suggesting a potential evolutionary role for conserved, highly expressed duplicate genes, as discussed below.
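The duplicate-gene comparison behind this finding (Fig. 9C) reduces to two ratios per gene family: each paralog against the singleton ortholog, and the paralog sum against the same ortholog. The sketch below illustrates only that arithmetic; the function name and the toy abundance scores are hypothetical, ratios are taken on a linear scale as an assumption, and the Mann-Whitney U comparison against the proteome-wide reference ratios is omitted.

```python
from statistics import median


def duplicate_abundance_ratios(triples):
    """Compute per-paralog and summed-pair abundance ratios.

    `triples` is an iterable of (paralog1, paralog2, ortholog) abundance
    scores, where the paralogs come from the species with the duplication
    and the ortholog is the singleton gene in the other species.
    All ortholog abundances must be greater than zero.
    """
    single, summed = [], []
    for p1, p2, ortholog in triples:
        if ortholog <= 0:
            raise ValueError("ortholog abundance must be positive")
        single.extend([p1 / ortholog, p2 / ortholog])  # each paralog alone
        summed.append((p1 + p2) / ortholog)            # pair total
    return single, summed


# Hypothetical toy abundance scores (not data from the study).
toy_triples = [(40.0, 60.0, 100.0), (30.0, 50.0, 100.0), (70.0, 50.0, 100.0)]
single_ratios, summed_ratios = duplicate_abundance_ratios(toy_triples)
median_single = median(single_ratios)
median_summed = median(summed_ratios)
```

In this toy set, a median near 0.5 for individual paralogs together with a median near 1 for the summed pairs is the pattern reported for the conserved group.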
A variety of mitochondrion-related proteins were differentially expressed in the same yeast species when grown on two different carbon sources or in different yeast species of distinct clades (Kluyveromyces and Saccharomyces) grown under identical conditions. As reported previously, the expression levels of metabolic enzymes involved in the TCA cycle and oxidative phosphorylation are up-regulated in yeast grown on glycerol-containing media compared with yeast grown on glucose. In the present study, an increase in the total abundance of mitochondrion-localized proteins was also observed (Fig. 4B). In S. cerevisiae, the mitochondrial volume per cell is reportedly higher in cells grown on glycerol than in cells grown on glucose (63), suggesting that our observation of an increase in mitochondrion-associated protein abundance, including that of proteins involved in the TCA cycle and oxidative phosphorylation, was the result of changes in mitochondrial abundance under the different growth conditions. However, no significant differences between cells grown on the two different carbon sources were observed with respect to the abundance of core components involved in protein sorting and assembly in mitochondria (Fig. 3B). Thus, a shift in carbon source would lead to an increase in mitochondrial volume and consequently result in up-regulation of the total abundance and individual expression levels of the associated proteins, also accompanied by changes in protein contents for particular functional groups. In the case of interspecies comparisons of protein expression profiles, not all mitochondrion-associated proteins exhibited differential expression between different yeast species. In cells grown on glycerol, core components of mitochondrial protein sorting and assembly were more abundant in K. lactis than in the Saccharomyces yeasts, whereas the expression levels of metabolic proteins involved in the TCA cycle and oxidative phosphorylation were comparable in K. lactis and the Saccharomyces yeasts. Taken together, the observed intra- and interspecies differences in the expression of mitochondrion-related proteins likely reflect changes in both mitochondrial volume and protein contents.
Assuming that the fraction of protein mass reflects the proportion of consumed proteome resources, including biomolecules and high-energy compounds required for maintaining the proteome, the balance of proteome resource allocation in K. lactis would be distinct from that of the other yeast species examined. The higher allocation of resources toward translational processes, which are known to be linked to cell growth (64,65), suggests that resource allocation in K. lactis might be optimized to maximize the growth rate. The true fraction of proteome resources consumed by each protein, however, is not determined by protein mass alone but is also influenced by the protein turnover rate, which reflects the rates of protein synthesis and degradation. The half-lives of individual proteins vary over a range of more than two orders of magnitude (66,67) and are correlated with biological function and intracellular localization (68,69). Dynamic changes in protein translation and degradation rates have been observed in response to changes in environmental conditions, including nutrient starvation and exposure to drug- and pathogen-associated stresses (70-72). Furthermore, orthologous proteins of particular functional groups have been reported to exhibit species-specific half-lives in budding and fission yeasts (67). Thus, the turnover rate of a given protein depends on its biological function, varies with environmental conditions, and occasionally differs between orthologs of different organisms. To determine more accurately the fraction of proteome resources consumed by each protein, it will be necessary to directly measure turnover rates on a proteome-wide scale for each yeast species under specific growth conditions.
Recent studies based on computational modeling, experimental evidence, and comparative genomics suggest that prokaryotes optimize cellular resource allocation for metabolic enzymes in order to achieve a higher growth rate (73,74). For example, when the expression of a rate-limiting enzyme needed for methionine biosynthesis (a process to which a substantial portion (~8%) of the total mass of the proteome is dedicated) exceeds the endogenous level, an excess of ribosomes is allocated to translation of this enzyme; consequently, the number of ribosomes available for the synthesis of other proteins is limited, resulting in a lower growth rate rather than beneficial effects associated with an increased concentration of intracellular methionine (73). Among prokaryotes, aerobes that can synthesize ATP via oxidative phosphorylation tend to have a set of genes encoding enzymes that play roles in the noncanonical bypass glycolytic pathway (74). A decline in cellular resources allocated to the noncanonical rather than canonical pathway likely increases the efficiency of the catabolism of glucose to pyruvate. An empirical model for prokaryotes has also been constructed, in which the fraction of the total protein mass associated with ribosomes and metabolism-associated proteins is constrained within a given limit, and the balance between these two protein groups is optimized to allow for the highest cell growth rate (75,76). Although we did not directly determine protein turnover rates in the present study, our comparative proteomics data in conjunction with the previous reports discussed above suggest that optimization of cellular resource allocation could represent a cellular system utilized by a wide range of prokaryotes and eukaryotes to provide for the most economical energy consumption, resulting in a higher growth rate.
Our finding that the summed protein abundance of a conserved and highly expressed duplicate gene pair was similar to that of the nonduplicated orthologous gene provides valuable insights into the evolutionary role of gene duplication (59,60). The existence of conserved paralogous genes could be beneficial in increasing gene dosage (29-32), maintaining the dosage balance of expressed proteins involved in multi-protein complexes and in regulatory networks of signal transduction and transcription (33-36), and buffering against deleterious mutations in either one of the duplicate genes (37-39). The similar expression levels between the two yeast clades irrespective of gene duplication are inconsistent with the first theory, which holds that an increase in gene dosage could be advantageous and therefore promote the fixation of duplicate genes. A previous comparative proteomics study of worms and flies (6) also reported a comparable abundance of duplicate and nonduplicated genes and suggested that positive selection for high gene dosage likely represents a minor mechanism for retaining duplicate genes. In the case of ribosomal proteins, retention of duplicate genes is considered to be beneficial for maintaining the dosage balance of constituents of large complexes (33,34,36). Consistent with the second theory, the proportion of genes exhibiting haploinsufficiency (30), which partly reflects a dosage balance effect, was significantly higher in duplicate genes of ribosomal proteins (50.0%, 29 of 60 genes, chi-square test, p < 0.001) than in the entire gene set (3.1%, 184 of 5852 genes). In contrast, except for ribosomal proteins, no significant difference was observed in the frequency of haploinsufficient genes between conserved duplicate genes analyzed in our study (7.5%, three of 40 genes) and the entire gene set (3.1%, 184 of 5852 genes).
Thus, our interspecies comparison of the expression levels of the products of duplicate and nonduplicated genes seems to support an alternative theory in which either of the duplicate genes could compensate for its counterpart by reducing the deleterious effects of a mutation in the other gene, in line with the results of fitness experiments with yeast deletion mutants (37), the effect of RNA interference on worm phenotypes (38), and statistical analyses of human genetic diseases (39).
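The haploinsufficiency frequencies quoted above can be compared with a Pearson chi-square test on a 2 x 2 table. The sketch below is one possible setup, not necessarily the one used in the study: it assumes no continuity correction and treats each duplicate gene group and the entire gene set as two independent samples, although the former is a subset of the latter. The function and variable names are introduced here for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    The table is [[a, b], [c, d]], with rows as gene groups and columns as
    haploinsufficient / not haploinsufficient. Returns (chi2, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be nonzero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the text: haploinsufficient genes / remaining genes.
genome = (184, 5852 - 184)          # entire gene set
ribosomal_dup = (29, 60 - 29)       # duplicate genes of ribosomal proteins
conserved_non_rp = (3, 40 - 3)      # conserved duplicates excluding RPs

chi2_rp, p_rp = chi_square_2x2(*ribosomal_dup, *genome)
chi2_cons, p_cons = chi_square_2x2(*conserved_non_rp, *genome)
```

Under these assumptions, the ribosomal comparison falls below p = 0.001 and the non-ribosomal comparison stays above p = 0.05, matching the direction of the reported result. Because one expected count in the second table is small, an exact test would be a reasonable alternative there.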
A question raised by this study is what benefit derives from having a duplicate partner in S. cerevisiae rather than Kluyveromyces yeasts, especially one encoding an abundant protein. The sequences and the synonymous and nonsynonymous mutation rates of abundant proteins are reported to be constrained in ways that contribute to mRNA stability, translation fidelity, and robust protein folding (62,77). Considered together with the recent report that in bacteria, ethanol stress increases the frequency of misreading during protein translation (78), the existence of a compensatory counterpart gene might enhance the probability that proteins containing fewer errors are synthesized from at least one of the duplicate genes having a constrained DNA sequence more compatible with a high production rate, even when cells are exposed to higher amounts of self-generated ethanol.
A daunting task for interspecies proteomics, as for comparative genomics and transcriptomics studies, is making reasonable orthologous annotations. Orthologous annotations for a wide range of organisms ranging from prokaryotes to eukaryotes and metazoans have been systematically constructed (79,80). For the specific cases considered in this study, however, we had to generate a connection map more appropriate to the organisms analyzed, applying identical criteria and considering both the trade-off between specificity and coverage and the quality of sequence data for each organism. Despite these issues, interspecies comparative proteomic studies can not only reveal differences in expression profiles, some of which could be related to distinct activities and phenotypes across different species, but may also provide insights into a variety of biological principles. We plan to expand interspecies proteomics to consider a wider variety of conditions and biological phenomena, including responses to various stresses and the aging process.