Megadalton Complexes in the Chloroplast Stroma of Arabidopsis thaliana Characterized by Size Exclusion Chromatography, Mass Spectrometry, and Hierarchical Clustering

To characterize MDa-sized macromolecular chloroplast stroma protein assemblies and to extend coverage of the chloroplast stroma proteome, we fractionated soluble chloroplast stroma in the non-denatured state by size exclusion chromatography with a size separation range up to ∼5 MDa. To maximize protein complex stability and resolution of megadalton complexes, ionic strength and composition were optimized. Subsequent high accuracy tandem mass spectrometry analysis (LTQ-Orbitrap) identified 1081 proteins across the complete native mass range. Protein complexes and assembly states above 0.8 MDa were resolved using hierarchical clustering, and protein heat maps were generated from normalized protein spectral counts for each of the size exclusion chromatography fractions; this complemented previous analysis of stromal complexes up to 0.8 MDa (Peltier, J. B., Cai, Y., Sun, Q., Zabrouskov, V., Giacomelli, L., Rudella, A., Ytterberg, A. J., Rutschow, H., and van Wijk, K. J. (2006) The oligomeric stromal proteome of Arabidopsis thaliana chloroplasts. Mol. Cell. Proteomics 5, 114–133). These combined experimental and bioinformatic analyses resolved chloroplast ribosomes in different assembly and functional states (e.g. 30, 50, and 70 S), which enabled the identification of plastid homologues of prokaryotic ribosome assembly factors as well as proteins involved in co-translational modifications, targeting, and folding. The roles of these ribosome-associating proteins are discussed. Known RNA splice factors (e.g. CAF1/WTF1/RNC1) as well as uncharacterized proteins with RNA-binding domains (pentatricopeptide repeat, RNA recognition motif, and chloroplast ribosome maturation), RNases, and DEAD box helicases were found in complexes of various sizes. Chloroplast DNA (>3 MDa) was found in association with the complete heteromeric plastid-encoded RNA polymerase complex and a dozen other DNA-binding proteins, e.g. DNA gyrase, topoisomerase, and various DNA repair enzymes.
The heteromeric ≥5-MDa pyruvate dehydrogenase complex and the 0.8–1-MDa acetyl-CoA carboxylase complex associated with uncharacterized biotin carboxyl carrier domain proteins constitute the entry point to fatty acid metabolism in leaves; we suggest that their large size relates to the need for metabolic channeling. Protein annotations and identification data are available through the Plant Proteomics Database, and mass spectrometry data are available through the Proteomics Identifications (PRIDE) database.

Chloroplasts are essential plant organelles of prokaryotic origin that perform a variety of metabolic and signaling functions. Best known for their role in photosynthesis, they also carry out the biosynthesis of many primary and secondary metabolites like lipids, amino acids, vitamins, nucleotides, tetrapyrroles, and hormones (1). Subcellular localization prediction by TargetP (2), combined with a correction for false positive and false negative rates, suggested that chloroplasts and all non-green plastid types together contain some 3500 proteins in Arabidopsis thaliana (3). More than 95% of the chloroplast proteins are nucleus-encoded and post-translationally imported into the chloroplast (4-6). Over the last decade, several studies have aimed to identify the Arabidopsis chloroplast proteome or subfractions of it (e.g. Refs. 7-10). The number of bona fide chloroplast proteins identified in these proteomics studies is probably around 1000-1300; comparing this number with the predicted chloroplast proteome indicates that ∼50% of the proteome has still not been observed. Recently, we concluded that, compared with the predicted Arabidopsis chloroplast proteome, the chloroplast proteome identified to date is particularly underrepresented (40-70%) for proteins involved in signaling, stress, development, and DNA/RNA metabolism and for proteins of unassigned function (9). To probe deeper into the chloroplast proteome, low abundance proteins must be enriched prior to MS analysis.
Many biochemical functions are executed by protein assemblies, and several studies have catalogued the assembly states of chloroplast proteins in plants. Separation of the oligomeric Arabidopsis stromal proteome by two-dimensional native gel electrophoresis (CN-PAGE) profiled 240 non-redundant proteins and captured information for 124 complexes (11). However, native gel electrophoresis has a practical size limit: only protein complexes below ∼1000 kDa can be effectively separated, so megadalton-sized complexes are missed. Several megadalton-sized complexes in plants have been characterized by targeted purification schemes, including the spinach 30 and 50 S ribosomal particles (12-14), cytosolic ribosomes (15,16), the tobacco plastid-encoded RNA polymerase (PEP) complex (17), the maize mitochondrial pyruvate dehydrogenase complex (PDC) (18), and the pea chloroplast acetyl-CoA carboxylase (ACCase) complex (19). Proteome characterization of a membrane-depleted, Triton-insoluble, high density pellet from pea plastids showed strong enrichment for the chloroplast PDC as well as for proteins involved in plastid gene expression and carbon fixation (20). However, because no subsequent fractionation was performed, specific protein associations could not be resolved.
To extend chloroplast proteome coverage and characterize MDa-sized macromolecular assemblies, complementing the previous CN-PAGE analysis of complexes up to 0.8 MDa, we fractionated the soluble chloroplast stroma by size exclusion chromatography (SEC) with a particular focus on complexes greater than 0.8 MDa. Proteins were identified by mass spectrometry analysis using an LTQ-Orbitrap, a high accuracy and high sensitivity hybrid instrument (21,22). SEC migration profiles for identified proteins were generated from matched spectral counts. Hierarchical clustering and protein heat maps of the SEC migration profiles revealed that the identified protein complexes include 30, 50, and 70 S ribosomal particles; PDC; PEP; and ACCase, indicating successful MDa size fractionation. In addition, many "new" proteins were detected, and they were enriched for functions in plastid gene expression, in particular putative ribosomal biogenesis factors. Finally, protein annotations and identification data are available via the Plant Proteomics Database (PPDB) at http://ppdb.tc.cornell.edu/, and mass spectrometry data with their metadata were deposited in the Proteomics Identifications database (PRIDE) (http://www.ebi.ac.uk/pride/) under accession numbers 11459-11568.
The concept of using chromatography (or other continuous fractionation techniques) of protein complexes (or other types of cellular protein fractions), combined with mass spectrometry-based quantification, to determine co-localization has been applied using stable isotope labeling (23,24) or label-free techniques (25,26). When combined with cluster analysis (this study and Ref. 24), principal component analysis (23), or correlation of normalized elution profiles (this study and Refs. 25 and 26), this strategy is a powerful tool that should be widely applicable to other subcellular proteomes.

Plant Growth, Chloroplast Stroma Proteome Isolation, and Size Fractionation-A. thaliana (Col 0) was grown under 10-h light/14-h dark cycles at 25/17°C in controlled growth chambers (Conviron) for about 55 days, and leaves were collected from mature rosettes about 1 week prior to bolting. Leaves were briefly homogenized in grinding medium (50 mM HEPES-KOH, pH 8.0, 330 mM sorbitol, 2 mM EDTA, 5 mM ascorbic acid, 5 mM cysteine, 0.03% BSA) and filtered through a nylon mesh. The crude plastids were then collected by a 2-min spin at 1100 × g and further purified on 40-85% Percoll cushions (Percoll in 0.6% Ficoll, 1.8% polyethylene glycol) by a 10-min spin at 3750 × g, followed by one additional wash in grinding medium without ascorbic acid, cysteine, and BSA. Chloroplasts were subsequently lysed in 10 mM HEPES-KOH, pH 8.0, 5 mM MgCl2 with a mixture of protease inhibitors under mild mechanical disruption. The lysate was then subjected to ultracentrifugation (100,000 × g) to pellet the membrane components. The supernatant (stroma) was collected and concentrated using an Amicon 10-kDa molecular mass cutoff filter (Millipore). Protein amounts were determined using the Bradford reagent (Bio-Rad) or the BCA protein assay kit (ThermoScientific).
Stroma (1-3 mg) was loaded onto a Superose 6 10/300 GL column (GE Healthcare) using an ÄKTA FPLC system (Amersham Biosciences) with a mercury lamp as a detector. Absorbance was measured at 280 nm. Elution was performed with buffer A (25 mM HEPES, pH 8.0, 10 mM NaCl, 10 mM MgCl2) or buffer B (50 mM HEPES, pH 8.0, 50 mM NaCl, 5 mM MgCl2) at an optimal flow rate of 0.25 ml/min. Subfractions of 300 µl were initially collected and pooled as follows: three each for fractions 1-6, four each for fractions 7-12, and six for fraction 13 (see Fig. 2 for the SEC chromatogram and the fraction designations). Proteins from pooled fractions were either concentrated using an Amicon Microcon YM-10 filter (Millipore) or precipitated with 80% acetone and were then separated further by SDS-PAGE on 12% T Laemmli or 12% T Tricine minigels. Protein bands were visualized with fluorescent SYPRO Ruby for fractions 1-5 of the SEC-separated sample in buffer A; all other gels were stained with Coomassie Blue. Each gel lane was cut into four or five bands, followed by reduction, alkylation, in-gel digestion with trypsin, and peptide extraction as described (27). Peptide extracts were dried down and resuspended in 15-20 µl of 5% formic acid for MS/MS analysis.
Nano-LC-LTQ-Orbitrap Analysis and Data Processing-The resuspended peptide extracts were analyzed by data-dependent MS/MS using an on-line LC-LTQ-Orbitrap (Thermo Electron Corp.). Samples were automatically loaded on a guard column (LC Packings MGU-30-C18PM) via an autosampler, followed by separation on a PepMap C18 reverse-phase nanocolumn (LC Packings nan75-15-03-C18PM) using 90-min gradients with 95% water, 5% ACN, 0.1% formic acid (solvent A) and 95% ACN, 5% water, 0.1% formic acid (solvent B) at a flow rate of 200 nl/min. Two blanks were run after every sample (see Zybailov et al. (9) for the gradient and sample injection scheme). The acquisition cycle consisted of a survey MS scan in the Orbitrap over a mass range of 350-1800 m/z at the highest resolving power (100,000), followed by five data-dependent MS/MS scans acquired in the LTQ. Dynamic exclusion was used with the following parameters: exclusion size, 500; repeat count, 2; repeat duration, 30 s; exclusion time, 180 s; exclusion window, ±6 or ±100 ppm. Target values were set at 5 × 10⁵ and 10⁴ for the survey and tandem MS scans, respectively. Regular scans were used for both the precursor and tandem MS.
Peak lists (.mgf format) were generated using DTASuperCharge (v1.19) software (SourceForge) and searched with Mascot v2.2 (Matrix Science). For off-line calibration, a preliminary search was first conducted with the precursor tolerance window set at ±30 ppm. Peptides with ion scores above 33 were chosen as benchmarks to determine the mass offset for each LC-MS/MS run. This ion score cutoff (33) was chosen in accordance with the results of the search against the target-decoy database (see below and Ref. 28). The offset was then applied to adjust the precursor masses in the peak list of the respective .mgf file using a Perl script (B. Zybailov, unpublished data). The recalibrated peak lists were searched against The Arabidopsis Information Resource A. thaliana database v8, including sequences of known contaminants (e.g. keratin and trypsin) (total 33,013 entries), with or without a concatenated decoy database in which all sequences were reversed. Each peak list was searched using Mascot v2.2 (maximum p value of 0.01) for fully tryptic peptides with a precursor ion tolerance window of ±6 ppm, variable methionine oxidation, fixed cysteine carbamidomethylation, a minimal ion score threshold of 33, and a mass range of 700-3500 Da for precursor ions. To reduce the false identification rate of proteins identified by one peptide, the Mascot search results were further filtered as follows: the ion score threshold was increased to 35, and the precursor ion mass error was required to be within ±3 ppm. Overall, this yielded a peptide false discovery rate of 1.5%, with the peptide false positive rate calculated as 2 × (decoy_hits/total_hits) from searches against the target-decoy database. The false identification rate for proteins identified with two or more peptides was zero. All filtered results were uploaded into the PPDB (http://ppdb.tc.cornell.edu/) (29).
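The off-line recalibration and the decoy-based error estimate described above reduce to a few lines of arithmetic. The Python sketch below is illustrative only: it assumes that the per-run offset is the median ppm error of the benchmark peptides (the text does not say which statistic was used) and that the stricter ion score filter is inclusive at 35. All function names are hypothetical, and this is not the Perl script used in the study.

```python
from statistics import median

PPM = 1e6


def ppm_error(observed_mz, theoretical_mz):
    """Signed precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * PPM


def run_offset_ppm(benchmarks, min_score=33.0):
    """Estimate the systematic mass offset of one LC-MS/MS run.

    benchmarks: iterable of (observed_mz, theoretical_mz, ion_score) taken from
    the wide-tolerance (+/-30 ppm) preliminary search. Only peptides scoring
    above min_score are used; the median is an assumed choice of statistic.
    """
    errors = [ppm_error(obs, theo) for obs, theo, score in benchmarks if score > min_score]
    if not errors:
        raise ValueError("no benchmark peptides above the ion score cutoff")
    return median(errors)


def recalibrate(precursor_mzs, offset_ppm):
    """Remove a run-specific ppm offset from every precursor m/z in a peak list."""
    return [mz / (1.0 + offset_ppm / PPM) for mz in precursor_mzs]


def passes_strict_filter(ion_score, error_ppm, min_score=35.0, max_abs_ppm=3.0):
    """Second-pass filter: ion score >= 35 (inclusive bound assumed) and |error| <= 3 ppm."""
    return ion_score >= min_score and abs(error_ppm) <= max_abs_ppm


def peptide_fdr(decoy_hits, total_hits):
    """Concatenated target-decoy estimate: FDR = 2 x decoy_hits / total_hits."""
    return 2.0 * decoy_hits / total_hits


# Hypothetical run whose precursors all read 5 ppm high.
bench = [(500.0 * (1 + 5e-6), 500.0, 40.0),
         (800.0 * (1 + 5e-6), 800.0, 52.0),
         (650.0 * (1 + 20e-6), 650.0, 12.0)]  # low-scoring peptide is ignored
offset = run_offset_ppm(bench)
print(round(offset, 3))                              # 5.0
print(round(recalibrate([1000.005], offset)[0], 4))  # 1000.0
print(peptide_fdr(15, 2000))                         # 0.015
```

Dividing by (1 + offset/10⁶) undoes a proportional error exactly, which is why a single ppm offset per run is enough to tighten the final search window from ±30 to ±6 ppm.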
All mass spectral data (the .mgf files reformatted as PRIDE XML files) were made available via the PRIDE at http://www.ebi.ac.uk/pride/.
Determination of Protein SEC Elution Profiles and Relative Protein Abundance-To determine the relative protein abundance distribution by spectral counting, the number of matched MS/MS spectra, or spectral count (SPC), was obtained for each protein. SPCs were further classified as total SPC, unique SPC (uniquely matching one accession), and adjusted SPC (adjSPC). The latter is the sum of unique SPCs and SPCs from peptides shared across accessions, with shared SPCs distributed in proportion to the unique SPCs of the sharing accessions where applicable. Proteins that shared more than 80% of their matched peptides with other proteins across the complete data set were grouped into families. For many Arabidopsis genes, more than one protein model is predicted. In this study, the protein model with the highest total adjSPC across all experiments was used; if the protein models did not differ in total adjSPC, protein model 1 was selected. To increase the robustness and significance of the data set, we removed all proteins that were identified with only one amino acid sequence, irrespective of charge state, post-translational modifications, or number of SPCs. Proteins quantified with two or fewer adjSPCs were also removed. To generate protein elution heat maps for the "high mass" (HM) data set, the adjSPC of each protein in each SEC fraction was normalized to that protein's highest adjSPC across the five HM fractions (NadjSPC).
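A minimal Python sketch of the adjSPC bookkeeping, the stringency filter, and the per-fraction normalization follows. It assumes that shared spectra are split in proportion to the unique SPCs of the sharing accessions within the same input, and that they are split evenly when none of the sharing accessions has unique SPCs; that fallback is an assumption, since the text only says "where applicable." Function names and the accession identifiers in the example are hypothetical.

```python
from collections import defaultdict


def adjusted_spc(peptide_hits):
    """Compute adjSPC per accession.

    peptide_hits: dict peptide -> (spectral_count, tuple_of_matching_accessions).
    Unique peptides credit their single accession; shared peptides are split in
    proportion to each sharing accession's unique SPC (evenly if all are zero,
    an assumed fallback).
    """
    unique = defaultdict(float)
    for spc, accs in peptide_hits.values():
        if len(accs) == 1:
            unique[accs[0]] += spc

    adj = defaultdict(float, unique)
    for spc, accs in peptide_hits.values():
        if len(accs) < 2:
            continue
        weights = [unique.get(a, 0.0) for a in accs]
        total = sum(weights)
        for acc, w in zip(accs, weights):
            adj[acc] += spc * (w / total if total > 0 else 1.0 / len(accs))
    return dict(adj)


def passes_stringency(matched_sequences, total_adj_spc):
    """Keep proteins with >= 2 distinct matched sequences and more than 2 adjSPCs."""
    return len(set(matched_sequences)) >= 2 and total_adj_spc > 2


def nadjspc(profile):
    """Normalize one protein's adjSPC values over the HM fractions to their maximum."""
    peak = max(profile)
    return [x / peak for x in profile] if peak > 0 else [0.0] * len(profile)


hits = {
    "PEPTIDEA": (6, ("AT1G00001.1",)),
    "PEPTIDEB": (2, ("AT1G00002.1",)),
    "SHAREDPEP": (4, ("AT1G00001.1", "AT1G00002.1")),
}
print(adjusted_spc(hits))        # {'AT1G00001.1': 9.0, 'AT1G00002.1': 3.0}
print(nadjspc([2, 8, 4, 0, 1]))  # [0.25, 1.0, 0.5, 0.0, 0.125]
```

In the example, the four shared spectra are split 3:1 because the two accessions have 6 and 2 unique spectra, respectively.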
To calculate the relative abundance of each protein across all HM fractions, the total adjSPC was divided by the number of observable tryptic peptides within the mass range 700-3500 Da (with the predicted transit peptide removed), yielding the spectral abundance factor (SAF). The SAF values were then normalized to the total SAF in the whole data set, yielding normalized spectral abundance factors (NSAFs).
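To illustrate the SAF/NSAF calculation, the Python sketch below performs an in silico tryptic digest (cleavage after K or R except before P, no missed cleavages), counts peptides whose neutral monoisotopic mass falls within 700-3500 Da, and normalizes. Assumptions: sequences contain only the 20 standard residues, the transit peptide has already been trimmed off, and cysteines carry the fixed carbamidomethyl modification used in the Mascot search. The exact rules the authors used to define "observable" peptides may differ, and the function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL = 57.02146  # fixed Cys modification, as in the Mascot search


def tryptic_peptides(seq):
    """Fully tryptic digest: cut after K/R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(pep):
    """Neutral monoisotopic peptide mass with carbamidomethylated cysteines."""
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER + CARBAMIDOMETHYL * pep.count("C")


def observable_peptides(mature_seq, lo=700.0, hi=3500.0):
    """Count tryptic peptides of the mature protein within the searched mass range."""
    return sum(1 for p in tryptic_peptides(mature_seq) if lo <= peptide_mass(p) <= hi)


def nsaf(total_adj_spc, n_observable):
    """SAF = adjSPC / observable peptides; NSAF = SAF / sum of all SAFs."""
    saf = {acc: total_adj_spc[acc] / n_observable[acc] for acc in total_adj_spc}
    total = sum(saf.values())
    return {acc: value / total for acc, value in saf.items()}


print(tryptic_peptides("MKARPGK"))       # ['MK', 'ARPGK']
print(observable_peptides("MKLLLLLLK"))  # 1
# Equal adjSPCs, but P2 has half as many observable peptides, so its NSAF is twice P1's.
print(nsaf({"P1": 10, "P2": 10}, {"P1": 10, "P2": 5}))
```

Dividing by the number of observable peptides corrects for the fact that long proteins yield more detectable peptides, and hence more spectra, than short proteins at the same molar abundance. A protein with no peptides in range would raise a division error and would need separate handling.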
Hierarchical Clustering Analysis-To group proteins with similar elution profiles in the HM data set, hierarchical clustering was performed using the Statistics toolbox of MATLAB version 7 (Mathworks, Inc.). For every pair of proteins X and Y, the linear correlation $\rho_{XY}$ between their NadjSPC distributions across the SEC HM fractionation range, $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ with $n = 5$, was derived.
This was then converted into a distance measure, $\Delta_{XY} = 1 - \rho_{XY}$. Protein pairs with similar elution profiles have higher correlations and in turn smaller distance values. A linkage map based on the average distance among protein pairs was then constructed to yield a hierarchical cluster tree (dendrogram).
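The clustering step can be outlined without MATLAB. The pure-Python sketch below computes the correlation distance Δ = 1 − ρ between NadjSPC profiles and performs naive average-linkage (UPGMA) agglomeration. It is an illustrative stand-in for the MATLAB Statistics toolbox routines, not the code used in the study, and it assumes that no protein has a perfectly flat profile (for which ρ is undefined). The profile values are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson (linear) correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # raises ZeroDivisionError for flat profiles


def correlation_distance(x, y):
    """Delta_XY = 1 - rho_XY: 0 for identical shapes, 2 for mirror images."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles, n_clusters=1):
    """Naive average-linkage agglomeration on correlation distance.

    profiles: dict mapping protein name -> NadjSPC values over the HM fractions.
    Returns (merges, clusters): merges lists (cluster_a, cluster_b, distance)
    in merge order; clusters are the groups left when n_clusters remain.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = correlation_distance(profiles[a], profiles[b])

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges, clusters


# Toy profiles over five HM fractions (hypothetical values, not study data).
profiles = {
    "A1": [1.0, 0.6, 0.2, 0.1, 0.0],
    "A2": [0.9, 0.7, 0.3, 0.1, 0.05],
    "B1": [0.0, 0.1, 0.3, 0.7, 1.0],
    "B2": [0.05, 0.1, 0.2, 0.8, 1.0],
}
merges, groups = average_linkage(profiles, n_clusters=2)
print(sorted(sorted(g) for g in groups))  # [['A1', 'A2'], ['B1', 'B2']]
```

Because ρ depends only on the shape of a profile, proteins that co-elute are grouped regardless of their absolute abundance. For the data described here, profiles would first be restricted to proteins with at least 20 total adjSPCs; the naive loop scales roughly cubically and is meant only for a few hundred proteins.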
Nucleic Acid Extraction and Subsequent Detection of DNA and Ribosomal RNA from SEC Fractions-DNA and RNA were isolated from SEC fractions 1-7 by phenol/chloroform extraction followed by ethanol precipitation. Briefly, an aliquot from each fraction was combined with phenol/chloroform/isoamyl alcohol in a 50:50 mixture together with 0.3% SDS, 1.5 mM EDTA, and 20 ng/µl Glycoblue (Ambion). SDS and EDTA were added to dissociate proteins from nucleoprotein complexes. Glycoblue enhances nucleic acid recovery and increases visibility of the sample pellet. The aqueous phase was collected after phase separation. To increase nucleic acid yield, the organic phase was then back-extracted with TESS buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 100 mM NaCl, 0.2% SDS), and the resulting aqueous phase was combined with the initial extract. After adjustment of the salt concentration with 3 M sodium acetate, nucleic acid was precipitated with 70% ethanol, pelleted, dried, and resuspended in TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA).
An aliquot of the extracted DNA/RNA was treated with ribonuclease H (Invitrogen) to degrade RNA from RNA-DNA hybrids, and plastid DNA was probed by PCR amplification of the gene for 16 S rRNA (see Ref. 30 for primers used). PCR samples were then separated on agarose gels and visualized by ethidium bromide staining. The presence of 16 and 23 S rRNA in the extracted DNA/RNA was determined by Northern blot analysis through hybridization with digoxigenin (DIG)-11-dUTP-labeled rRNA probes and subsequent detection with anti-DIG antibodies as described previously (30).
Plant Proteomics Database-Mass spectrometry-based information on all identified proteins was extracted from the Mascot search pages and filtered for significance (e.g. minimum ion scores), ambiguities, and shared spectra as described previously. This information includes MOWSE scores, number of matching peptides, total SPCs, unique SPCs, adjSPCs, highest peptide score, highest peptide error (in ppm), lowest absolute error (ppm), sequence coverage, and tryptic peptide sequences (29). All of these are available in the PPDB via the search function "Proteome Experiments" by selecting the desired output parameters. Alternatively, information for specific accessions (individually or as a group) can be extracted using the search function "Accessions," and, if desired, this search can be limited to specific experiments. Finally, information for a particular accession can also be found on each "protein report page." The MapMan bin system (31) was used for functional assignment, and proteins were reassigned to other bins if needed.

Isolation, Size Exclusion Fractionation, and MS Analysis of Arabidopsis Stromal Proteome-This study aimed to extend the chloroplast proteome coverage and examine MDa-sized assemblies. A summary of the complete work flow of the experimental and computational analysis is shown in Fig. 1. Intact chloroplasts were isolated from mature Arabidopsis leaf rosettes and lysed under non-denaturing conditions. The lysate was subjected to ultracentrifugation to separate the chloroplast membrane-bound proteome (pellet) from the soluble fraction (stroma). Size separation of the non-denatured stromal proteome was then performed by SEC using a column that resolves complexes up to ∼5 MDa, followed by SDS-PAGE.
Several lysis media and SEC elution conditions with varying MgCl 2 concentration, NaCl concentration, and flow rates were tested for optimization of resolution and complex stability. Removal of MgCl 2 in the lysis and elution buffers led to the destabilization of high molecular weight complexes such as 70 S ribosomes (data not shown). Chromatograms for runs at various NaCl concentrations (50, 100, and 150 mM NaCl) revealed a slight shift to later elution times with increasing ionic strength, suggesting partial destabilization of protein interactions (not shown). However, similar overall peak profiles were observed, suggesting that the core complexes remained intact but that transient and weak protein-protein or protein-RNA associations were destabilized at higher salt concentrations. Based on these optimizations, SEC fractionation was performed at a flow rate of 0.25 ml/min using either buffer A or buffer B.
A typical SEC chromatogram of the stromal proteome under native conditions is shown in Fig. 2A. The highest peak at fraction 7 (∼13 ml) corresponds to the ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) holocomplex at ∼550 kDa, which comprises ∼58% of the total stromal mass (11) (Fig. 2A). For the SEC run with buffer A, the five fractions covering the mass range of 0.8 to >5 MDa were designated HM-A (fractions 1-5), and the remaining fractions were designated LM-A (fractions 6-13), spanning a mass range of up to 0.8 MDa. The LM fraction covers the same native mass range as the CN-PAGE stromal proteome analysis (11), whereas the HM fraction is complementary to that CN-PAGE analysis. For the SEC run with buffer B, we analyzed and discussed only the HM fraction (HM-B).
The proteins in the SEC fractions were separated by SDS-PAGE (Fig. 2A). Each gel lane was cut into slices, followed by in-gel trypsin digestion and nano-LC-LTQ-Orbitrap analysis of the extracted peptides using data-dependent acquisition and dynamic exclusion in a work flow previously optimized for identification and quantification by spectral counting (9). In total, 110 MS runs were analyzed, and the search results and relevant associated information (e.g. ion scores, mass errors, and matched sequences) were uploaded into the PPDB. The PPDB provides an integrated platform for comparing our protein identifications with other in-house MS-based proteomics experiments, annotated properties, and published information (29). All mass spectral data (the .mgf files reformatted as PRIDE Extensible Markup Language files) were made available via PRIDE at http://www.ebi.ac.uk/pride/.
It has been shown for LC-MS-based analyses of proteomes that the number of matching MS/MS spectra, here assigned as SPCs, correlates well with protein abundance if a sufficient number of SPCs is obtained per protein (32-35). To correct for SPCs derived from peptides shared between proteins, adjSPCs were calculated as the sum of unique SPCs and a proportional distribution of shared SPCs (see "Experimental Procedures"). In addition, the relative concentration of each identified protein in the HM range was calculated as the NSAF, derived from the adjSPC normalized by the number of theoretical tryptic peptides of a relevant length ("observable peptides") (36). To increase the confidence and significance of the data sets in this study, proteins that were matched to only one amino acid sequence (irrespective of charge state, post-translational modification, or number of SPCs) were excluded from further analysis (with the exception of three small ribosomal proteins; see MS/MS spectra in supplemental Fig. 1).
Comparison with Other Chloroplast Proteomics Studies Demonstrates Enrichment of Plastid Gene Expression Components-In total, we identified 1081 proteins in the HM-A, LM-A, and HM-B fractions (not shown, but see PPDB), of which 347 were not identified in our comprehensive total chloroplast proteome analysis (membranes and stroma), in which 1325 proteins were identified (9). For the purpose of the current study, we imposed additional stringency, demanding at least two different matched amino acid sequences and at least two adjSPCs per protein, which reduced the new data set to 664 proteins. This extra stringency was imposed because the focus of this study was on low abundance proteins (up to more than 100,000-fold less abundant than Rubisco); the extra filter reduced identification of non-chloroplast proteins and false positives, even though several "novel" chloroplast proteins (including pentatricopeptide repeat (PPR) proteins) were removed. After grouping of closely related homologues and removal of 54 non-chloroplast proteins and 47 chloroplast membrane(-associated) proteins, we identified 542 stromal proteins (and protein groups), of which 86% had a TargetP-predicted chloroplast transit peptide (cTP) (supplemental Table 1). Of these 542 stromal proteins, 60 were not observed in our previous analysis (9); 37 of those 60 were in the HM set and were dominated by chloroplast ribosomal proteins, various splice factors, and proteins of unknown function, such as PPR domain proteins and DNA-binding proteins. In contrast, the new proteins in the LM fractions had a variety of functions (supplemental Table 1). Thus, the analysis of MDa-sized assemblies revealed low abundance proteins mostly involved in plastid gene expression.

Comparison of HM and LM Data Sets Reveals Effective SEC Fractionation and Enrichment for Plastid Gene Expression Components-Fig. 2B compares the identified stromal proteins in the HM-A and LM-A fractions. Importantly, there was only an 11% overlap (76 proteins) between these two data sets, showing that SEC effectively preserved and separated specific protein complexes above and below 0.8 MDa. This overlap consisted predominantly of highly abundant proteins involved in primary carbon metabolism, such as Rubisco components. Comparison of the functional distribution of the non-overlapping protein fractions showed a >5-fold enrichment of plastid gene expression functions in the HM fractions (47% of all proteins) at the expense of all other functions except protein homeostasis (e.g. chaperones). Table I and supplemental Table 2 list annotated functions and relative abundance (NSAF) of proteins in the HM sets, and the heat maps (from NadjSPC) show the distribution of each protein across the SEC mass range. Excluding the 77 proteins that were also found in the LM data set, 27 of the 30 most abundant proteins are ribosomal proteins, further confirming that the HM fractions are highly enriched in proteins involved in plastid gene expression. This enrichment is consistent with the presence of plastid rRNAs and plastid DNA in the HM fraction (Fig. 2C). The average relative abundance of ribosomal proteins across the HM fractions is 0.01 (Table I); this value will be used to assess the accumulation levels of ribosome-associated proteins (see below).
Profiling of Macromolecular Assemblies in Stroma-The SEC elution profiles of proteins, derived from NadjSPC values across the chromatogram, reflect the size range(s) of the complexes in which they participate as well as their quantitative distribution over these complexes. To obtain a global view of SEC migration profiles and to facilitate the identification of putative interacting proteins, hierarchical clustering was used to group proteins with similar SEC elution trends (Fig. 1). Dendrograms obtained with different minimum thresholds for total adjSPCs were compared. Based on these tests, we chose a minimum threshold of 20 total adjSPCs per protein; this threshold minimized noise that might skew the linkages among correlated proteins. The resulting dendrogram (Fig. 3) displayed three main clusters, namely cluster I (essentially proteins confined to complexes >2 MDa), cluster II (1-3 MDa), and cluster III (<1 MDa).
The first two clusters are linked, and both are dominated by ribosomal proteins and associated factors, as discussed in the following sections. Cluster III is subdivided into cluster III-1, containing several ribosomal proteins, and cluster III-2, consisting mainly of proteins that start to elute below 1 MDa. Cluster III-2 included components of the Rubisco complex (∼550 kDa), chaperonin 60 (∼800 kDa), ferredoxin-glutamine:2-oxoglutarate amidotransferase (∼700 kDa), and glyceraldehyde 3-phosphate dehydrogenase (600 kDa), which were well resolved by CN-PAGE (11).
Many of the proteins grouped by hierarchical clustering coincided with known assemblies, such as plastid PDC; 30, 50, and 70 S ribosomal particles; and ACCase (Fig. 3). For instance, the maize mitochondrial PDC was found to have an estimated mass of about 8-9 MDa (18,37). Similar to the mitochondrial PDC, the Arabidopsis plastid-localized PDC components clustered in the mass range greater than 3 MDa (Fig. 3B). As shown in more detail below, the clustering further confirmed that SEC effectively separated stromal complexes. The remainder of the "Results" discusses the composition, organization, and function of the observed MDa-sized macromolecular assemblies.
70 S Ribosomes-The spinach plastid 70 S ribosome consists of 58 ribosomal proteins, with 25 and 33 components comprising the 30 and 50 S complexes, respectively (12-14). The majority of these proteins are homologous to bacterial 70 S ribosome components, but several are unique to the chloroplast (plastid-specific ribosomal proteins (PSRPs)) and are proposed to perform plastid-specialized ribosomal functions (12,14,38). PSRP-2, -3, and -4 associate with the 30 S subunit, whereas PSRP-5 and PSRP-6 associate with the 50 S subunit (12-14). PSRP-7 interacts with the 30 S particle in Chlamydomonas reinhardtii, Arabidopsis, and rice but is missing in spinach (39). PSRP-7 is synthesized as a polyprotein consisting of the mature PSRP-7 and the elongation factor (EF)-Ts (PETs), which might be post-translationally processed to render various fused or independent proteins (39). Arabidopsis orthologues for all but three ribosomal proteins were identified in the HM fractions, accounting for 57 ribosomal proteins (Table I). About 50 and 75% of the 30 and 50 S ribosomal components in Arabidopsis, respectively, are nucleus-encoded, consistent with observations in spinach (12-14). Interestingly, RPL23 is nucleus-encoded in spinach but has two chloroplast-encoded genes in Arabidopsis (Atcg00840.1 and Atcg01300.1), suggesting evolutionary divergence.
Several of the identified Arabidopsis chloroplast ribosomal proteins are encoded by multiple genes. Five of these, namely RPS7, RPS12, RPL2, RPL12, and RPL23, comprise two or three identical gene products. Four of these protein groups are chloroplast-encoded, suggesting a possible regulatory role or adaptation to specific conditions in plastid gene expression. In addition, two paralogues each were assigned to RPL18 (23% identity) and to RPL19 (74% identity, variable N-terminal regions), and these related proteins were distinguishable by MS analysis. Interestingly, RPL18A was found to be 6 times more abundant than RPL18B. Furthermore, one chloroplast and two nuclear genes encode RPS16 in Arabidopsis (40,41), but only one paralogue (At4g34620.1) was detected in this study.
The ribosomal components RPS21, RPL34, and PSRP-6 were not detected in this study or in previously published chloroplast proteomics studies of Arabidopsis. One reason might be that they are short, lysine- and arginine-rich proteins that yield small tryptic peptides not amenable to LC-MS analysis. Another possibility relates to the location and function of these proteins in the ribosome, from which they could easily detach. RPS21 forms part of the top of the head of the 30 S subunit in prokaryotic ribosomes (42). PSRP-6 has been found to be loosely associated with the ribosome (38) and is detected at lower amounts than other spinach ribosomal proteins (12).
To explore the assembly state of the ribosome and to co-localize ribosome-associated factors, we carried out a separate hierarchical clustering of all 53 assigned ribosomal proteins, (co-)translational factors, and ribosome biogenesis factors with 20 or more adjSPCs (Fig. 4A). The left-hand panel shows the complete dendrogram in which three clusters (A, B, and C) were distinguished with cluster A representing the 30 S particle, cluster B representing the 50 S particle, and cluster C representing the translating 70 S ribosome. Close-ups of these three clusters with protein names are shown in the three other panels of Fig. 4A. Protein components and factors known/expected to specifically associate with the 30 S particle are in blue, and those part of the 50 S particle are in black. Proteins that have a function in ribosome biogenesis or translation but that are not an integral part of the ribosome are listed in red and italics. The 30 S particle peaked in the 1-2-MDa range (fraction 4), and the 50 S particle peaked in the 2-3-MDa range (fraction 3), whereas the 70 S ribosomes peaked in fraction 2 (Fig. 4A). Northern blot analyses of rRNAs for 30 and 50 S particles extracted from these fractions were consistent with these profiles (Fig. 2C). Overall, the protein and RNA profiles of these ribonucleoparticles correspond to 30 and 50 S subunits (possibly in different stages of maturation), 70 S ribosomes, and polysomes.
We note that the ribosome-associated factors are substoichiometric to the ribosomal proteins; many were quantified with fewer than 20 total adjSPCs and were therefore excluded from the clustering in Fig. 4A. However, to show the distribution of these proteins across the HM fractions, we present a heat map of all 30 proteins in Fig. 4B. In the next sections, we discuss these ribosome-associated factors in more detail.
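Heat maps of this kind are easier to read when each protein's profile is scaled to its own maximum, so that low-abundance factors remain visible next to abundant ones. The sketch below assumes that per-protein max normalization and renders the result as a text heat map. The shading scale, function names, and NadjSPC values are hypothetical.

```python
# Ten shading levels from light (0) to dark (max); purely a text rendering choice.
SHADES = " .:-=+*#%@"


def normalize_to_max(profile):
    """Scale a protein's NadjSPC profile so that its highest fraction equals 1.0."""
    peak = max(profile)
    if peak == 0:
        return [0.0] * len(profile)
    return [value / peak for value in profile]


def text_heatmap(rows, fraction_labels):
    """Render row-normalized profiles as a character heat map."""
    lines = ["protein".ljust(10) + "".join(label.rjust(4) for label in fraction_labels)]
    for name, profile in rows.items():
        cells = ""
        for value in normalize_to_max(profile):
            level = min(int(value * len(SHADES)), len(SHADES) - 1)
            cells += SHADES[level].rjust(4)
        lines.append(name.ljust(10) + cells)
    return "\n".join(lines)


# Hypothetical NadjSPC values for three low-abundance factors in fractions 1-5.
factors = {
    "IF2": [0.0, 4.0, 9.0, 6.0, 1.0],
    "IF3": [0.0, 0.5, 1.0, 0.8, 0.0],
    "RimM": [0.0, 0.5, 1.5, 3.0, 0.5],
}
print(text_heatmap(factors, ["F1", "F2", "F3", "F4", "F5"]))
```

Because every row is scaled independently, IF3 reaches full shading despite being roughly an order of magnitude less abundant than IF2. Absolute abundance has to be read from the underlying counts instead.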
Protein Translation-The translation machinery requires the participation of ribosome-associating initiation, elongation, and termination factors as well as proteins aiding in ribosome recycling (42,43). Translation initiation proceeds with the binding of the mRNA transcript to a free 30 S subunit, followed by association of the initiator tRNA (fMet-tRNA) and the initiation factors IF1, IF2, and IF3, which together form a preinitiation complex (42,43) (Fig. 4C). We observed all three IFs, with IF1 and IF2 being ∼10-fold more abundant than IF3.
Elongation ensues after binding of the mature 50 S subunit to the preinitiation complex and subsequent release of the three IFs (Fig. 4C). Several rounds of elongation of the nascent peptide proceed by alternating actions of EF-Tu, which introduces new aminoacyl-tRNAs into the peptidyl transfer center in the 50 S ribosome, and EF-G, which translocates the peptidyl-tRNA after spontaneous peptidyl transfer occurs. During each round, EF-Tu is released from the ribosome in its GDP-bound form after GTP hydrolysis and is recycled to the GTP-bound form by its nucleotide exchange factor, EF-Ts (42,43). A point mutation in the chloroplast-localized EF-G protein impairs chloroplast development in cotyledons (which become white) but not in true leaves (44). Aside from its function in translation, the EF-Tu orthologue in maize plastids has been suggested to also serve as a chaperone during heat stress (45,46). Additional translation elongation factors include the GTPases LepA and TypA/BipA; in bacteria, both have been shown to bind to 70 S ribosomes at the same site as EF-G (47-49). In Escherichia coli, LepA is proposed to recognize ribosomes with mistranslocated tRNAs and to induce back-translocation, allowing corrective retranslocation (50). The bacterial and plant TypA/BipA proteins appear to be particularly important under stress conditions (51-55).
The elongation factors detected in the HM fractions are EF-Tu, EF-G, EF-Ts, LepA, and TypA/BipA (Fig. 4B). These elongation factors have different SEC elution profiles (Fig. 4, A and B), but they were mostly found in the 2-5-MDa range and were most likely associated with 70 S ribosomes or polysomes. The primary elongation factor EF-Tu was an abundant protein (in the same abundance range as the ribosomal proteins) and was seen as a broad peak across all five fractions (Fig. 4, B and C). Similarly, CN-PAGE analysis of stroma showed EF-Tu migrating at multiple native masses (11). TypA/BipA, LepA, and PSRP-1 accumulated to similar levels, at about 20% of ribosome abundance. PETs, which harbors the EF-Ts domain fused to PSRP-7, peaked in the 0.8-1-MDa range (Table I). Cryoelectron microscopy structure analysis and genetic and biochemical studies have shown that PSRP-1 is a translation factor rather than an integral ribosomal protein (38,56). PSRP-1 occupies the space between the 30 and 50 S subunits, thereby stabilizing the 70 S ribosome, and is proposed to be involved in translation regulation during stress (56). PSRP-1 clustered with the 50 S particle and 70 S ribosomes, supporting its role as a translation factor.
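Statements such as "about 20% of the ribosome" compare a factor's abundance with that of the ribosomal proteins. The text does not spell out the calculation, so the sketch below assumes one simple version: the summed NadjSPC of a factor across the HM fractions divided by the median summed NadjSPC of the ribosomal proteins. All protein values are hypothetical.

```python
from statistics import median

# Hypothetical NadjSPC values per HM fraction (fractions 1-5).
ribosomal = {
    "RPL2": [20, 60, 80, 30, 10],
    "RPS5": [10, 30, 40, 90, 20],
    "RPL4": [25, 65, 80, 30, 10],
}
factors = {
    "LepA": [4, 20, 10, 4, 2],
    "TypA/BipA": [5, 18, 12, 5, 2],
    "IF3": [0, 1, 2, 1, 0],
}


def relative_to_ribosome(factor_counts, ribosomal_counts):
    """Summed NadjSPC of each factor divided by the median ribosomal-protein total."""
    reference = median(sum(counts) for counts in ribosomal_counts.values())
    return {name: sum(counts) / reference for name, counts in factor_counts.items()}


for name, ratio in relative_to_ribosome(factors, ribosomal).items():
    print(f"{name}: {ratio:.0%} of ribosome level")
```

The median is used as the reference so that a single over- or under-counted ribosomal protein does not shift every ratio. This is a design choice of the sketch, not necessarily the normalization used in the study.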
Translation termination occurs after release factors (RFs), which recognize the stop codon, bind to the ribosome. Ribosome recycling then follows: a ribosome recycling factor (RRF) binds to post-termination complexes and, in coordination with EF-G, splits the 70 S ribosomes for the next round of translation (42,43). Neither RFs nor RRF were observed in the HM fractions, but Arabidopsis orthologues of RF1 (At3g62910.1) and RRF (At3g63190.1) were found in LM-A fractions (see supplemental Table 1).
Co-translational Protein Processing and Folding-During translation, the nascent polypeptide chain that extends out of the peptidyl exit tunnel of the 70 S ribosome is subjected to N-terminal modifications as well as protein folding (57). Proteins involved in such co-translational activities were indeed found in fractions with 70 S ribosomes (Table I and Fig. 4). These include the enzymes peptide deformylase (PDF) and methionine aminopeptidase (MAP), the chloroplast signal recognition particle (cpSRP54), and trigger factor (TF).
PDF and MAP are hydrolytic enzymes that together perform co-translational removal of the N-terminal methionine. PDF removes the N-formyl group from the initiator methionine of nascent chains, exposing its free amino group, a prerequisite for the subsequent action of MAP (57). Two PDFs (PDF1A and PDF1B) have been found in Arabidopsis, but only PDF1B (At5g14660.1), shown to be dually targeted to chloroplasts and mitochondria in Arabidopsis (58-60), was observed in this study. Several MAP proteins (MAP1B, MAP1C, and MAP1D) exist in plants, and the chloroplast-localized MAP1B (At1g13270.1) (59) was found in this study.
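The ordering dependency between PDF and MAP can be captured in a few lines. This sketch assumes the commonly cited bacterial rule that methionine aminopeptidase removes Met only when the second residue is small (G, A, S, C, P, T, or V). Whether chloroplast MAP1B follows exactly this rule is an assumption, and the function name and peptide strings are hypothetical.

```python
# Position-2 residues generally reported to permit Met removal by bacterial MetAP;
# applying this rule to chloroplast MAP1B is an assumption.
SMALL_P2 = frozenset("GASCPTV")


def mature_n_terminus(nascent: str, pdf_active: bool = True) -> str:
    """Model PDF followed by MAP on a chain that starts with formyl-Met.

    A leading 'f' in the result marks a Met that is still formylated.
    """
    if not nascent.startswith("M"):
        raise ValueError("nascent chain must start with the initiator Met")
    if not pdf_active:
        # Deformylation is a prerequisite for MAP, so the chain stays fMet-initiated.
        return "f" + nascent
    # PDF has removed the formyl group; MAP cleaves Met only before a small residue.
    if len(nascent) > 1 and nascent[1] in SMALL_P2:
        return nascent[1:]
    return nascent


for chain in ("MASTK", "MKLVE"):
    print(chain, "->", mature_n_terminus(chain), "| without PDF ->", mature_n_terminus(chain, pdf_active=False))
```

The `pdf_active=False` branch illustrates the dependency stated in the text: without deformylation, MAP cannot act, so the formylated Met is retained regardless of the second residue.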
In co-translational protein targeting, nascent polypeptides with hydrophobic domains are recognized by the signal recognition particle and are then targeted to membranes as a ribosome-nascent chain complex (57). The chloroplast sorting component cpSRP54 has a role in post-translational targeting of nucleus-encoded thylakoid proteins and has also been implicated in co-translational targeting (61-64); consistent with this, it was detected in this study peaking with ribosomes in the 2-5-MDa range (Fig. 4). In addition, polypeptides are folded as they emerge from the peptide exit tunnel. The E. coli TF binds to the 70 S ribosome at the exit tunnel and prevents misfolding and aggregation of emerging nascent proteins (65). A 55-kDa TF (At5g55220.1) with conserved domains and a 19-kDa truncated form of TF (At2g30695.1) were identified in the 70 S ribosome fractions. The 55-kDa full-length TF protein was 3-fold more abundant than the truncated form (Table I).
Fig. 4 (caption excerpt). B, schematic summarizing the ribosome assembly process (upper panel) and the protein translation process (lower panel) with identified factors indicated. C, heat map of annotated proteins (based on NadjSPC) identified with two or more adjSPCs and known or expected to associate with ribosomes and preribosomal particles. The proteins are grouped according to function as indicated (translation and co-translational modifications, ribosome biogenesis, and RNA maturation (mat.)). The scale shows the NadjSPC values for each fraction normalized to the maximum value (see Table I).
Chloroplasts contain two dominant protein chaperone systems, namely the CPN60/CPN21/CPN10 system (66,67) and the HSP70/GrpE system (68). The Arabidopsis HSP70-1 (At4g24280.1) and HSP70-2 (At5g49910.1) are abundant proteins, and their abundance was constant across all the HM-A fractions, consistent with their wide range of substrates. The nucleotide exchange factor of HSP70, GrpE (At5g17710.1), showed a similar distribution. Based on information about E. coli homologues, it is quite likely that HSP70 also functions in co-translational folding, thereby assisting the TF protein (57).
Ribosome Biogenesis-The formation of a functional ribosome from more than 50 proteins and four rRNA molecules entails a complex series of coordinated processes, including processing and modification of ribosomal components, assembly, and maturation (69,70). Several Arabidopsis orthologues of bacterial ribosome assembly factors were identified in the HM-A fractions (Fig. 4B and Table I). Most of these proteins were initially annotated as proteins of unknown function, but careful domain analysis, homology to bacterial ribosome biogenesis factors (from PSI-BLAST searches (71)), and their detection in the ribosomal fractions support their involvement in ribosome biogenesis.
YqeH, a circularly permuted GTPase, also co-sediments with the 30 S particle and is essential for 16 S rRNA maturation (81). Two YqeH orthologues in Arabidopsis were identified in the HM fractions, and they behave similarly (found in the 30 and 70 S fractions). One of these orthologues, the AtNOA1 protein (At3g47450.1), exhibits GTPase activity in vitro (82). Moreover, the bacterial YqeH complements AtNOA1-deficient mutants, providing further evidence that AtNOA1 functions as a chloroplast-localized YqeH in plants (83,84).
Era is another E. coli GTPase that binds 16 S rRNA and is involved in 30 S subunit maturation (85). We observed that the Era orthologue in Arabidopsis (At5g66470.1) was present in the 70 S fractions rather than the 30 S fraction. The E. coli EngA (YphC in Bacillus subtilis) has two contiguous GTPase domains whose nucleotide occupancy modulates its binding either to 30 S alone or to 30, 50, and 70 S ribosomes (86). It co-sediments with 16 and 23 S rRNAs (86) and is essential for bacterial growth (87). The Arabidopsis orthologue of EngA (At5g6050.1) migrates in the >2-MDa range (50 S, 70 S, and polysome fractions). Two mutant alleles of the Arabidopsis EngA cause embryo arrest at the globular stage (88), consistent with an essential function in chloroplast development.
RNA Maturation and Degradation-Three ribonucleases were observed in the HM fractions (Fig. 4B and Table I), and various lines of evidence suggest their involvement in rRNA maturation at various ribosomal assembly stages. CSP41A (At3g63140.1) and CSP41B (At1g09340.1) are two related endoribonucleases with multiple roles in chloroplast gene expression (99,100). In Arabidopsis, CSP41B co-purified with preribosomal particles but not with mature ribosomes or polysomes (99,100). CN-PAGE analysis of stroma showed that both CSP41A and CSP41B migrated in the >1-MDa range (in the stacking gel) (11). Consistent with this, we observed only CSP41B in the HM fractions, mostly in fractions with 30 S particles. In contrast, CSP41A was identified in LM-A fractions together with the remainder of CSP41B, in agreement with previous observations (11,100).
Polynucleotide phosphorylase (PNPase) (At3g03710.1) is an exoribonuclease that is indispensable for 3′-end maturation of 23 S rRNA transcripts and for efficient 3′-end processing and polyadenylation of mRNAs as well as RNA degradation (101-104). In E. coli, PNPase is part of a "degradosome" complex along with the endoribonuclease RNase E, a DEAD box RNA helicase, and the glycolytic enzyme enolase (105). However, the chloroplast PNPase has been observed as a 600-kDa homo-oligomer in spinach (106) and as a 410-kDa tetramer in Arabidopsis (11). Interestingly, in the SEC fractions PNPase peaked at ∼1-3 MDa (50 S fractions) (Fig. 4B), suggesting interactions with RNA-containing complexes.
RNase J is another endonuclease that is implicated in 16 S rRNA maturation; it associates with both assembled 70 S ribosomes and 30 S particles, suggesting a role in ribosome assembly (107). The Arabidopsis orthologue (At5g63420.1) was found to be essential for embryogenesis (EMB2746) (88). In this study, it eluted over a wide range of masses (>1 MDa) at relatively abundant levels, suggesting interactions with RNA bound to various ribonucleoprotein complexes (polysomes and 70 and 30 S particles).
RNA Helicases-DEAD box proteins possess the characteristic Asp-Glu-Ala-Asp sequence, an RNA-binding motif, and an ATP-hydrolyzing domain and are mainly involved in ATP-dependent rearrangement of inter- and intramolecular RNA structures or remodeling of ribonucleoprotein complexes (108,109). There are 58 predicted genes for DEAD box proteins in Arabidopsis (110,111); nine of these are predicted by TargetP (2) to be plastid-localized.
Several DEAD box RNA helicases (RHs) were identified in this study and appeared in almost all fractions >1 MDa, suggesting associations with a variety of RNA-containing protein complexes. RH3 (At5g26742.1), which belongs to a subgroup of DEAD box proteins containing a Gly/Arg/Ser-rich C-terminal extension (112), was the most abundant of these RHs (10-fold higher). RH3 was also identified in fractions above 1 MDa in the CN-PAGE fractionation of stroma (11). An Arabidopsis RH3 knock-out mutant is embryo-lethal (EMB1138) (88), indicating a crucial role in plant development. RH26 (At5g0860.1) has a long N-terminal extension containing seven internal Arg-Ser-Asp repeats (112), and RH26-deficient plants display a pale green leaf phenotype, suggesting impaired chloroplast development (113). RH50 (At3g06980.1) is the Arabidopsis orthologue of rice OsBIRH1, which exhibits RNA helicase activity in vitro and is involved in conferring plant resistance to various stresses (114).
Transcription and DNA Binding-Proteins involved in transcription and other functions involving DNA association were observed in the HM fractions, although at relatively low concentrations (1/10 of the ribosomal abundance; Table I). The PEP complex is composed of four subunits, namely RpoA, RpoB, RpoC1, and RpoC2 (115), and is the predominant transcription complex in mature chloroplasts (116). The tobacco PEP complex was affinity-purified and was observed in a native gradient gel migrating at >900 kDa (17). In the current study, the PEP complex migrated at masses >2 MDa, suggesting association with HM complexes, most likely plastid DNA as part of the nucleoid (DNA-protein assembly). Indeed, PCR analysis confirmed the presence of plastid DNA, particularly in fractions with masses >3 MDa (Fig. 2C).
Analysis of purified transcriptionally active chromosomes (TACs) from Arabidopsis chloroplasts revealed 35 proteins involved in plastid gene expression, including the PEP complex and 18 proteins (pTACs 1 to 18) that contain RNA/DNA-binding domains (117). In the current analysis, we found pTACs 2, 16, and 17 eluting with the PEP fraction. Moreover, a DNA gyrase (At3g10690.1), a thioredoxin protein (At3g06730.1), and a pfkB-type carbohydrate kinase (At1g69200.1), which were found in the TAC preparations, were also seen in the current study. In addition, several DNA-binding proteins that were not observed in the TAC analysis (117) clearly co-eluted with the PEP complex and pTAC proteins in the current study. These include two proteins involved in DNA damage repair, namely DNA mismatch repair protein MutS (At1g65070.1) and DNA repair protein RecA (At1g79050.1). A DNA topoisomerase (At4g31210.1) and a relatively abundant DNA helicase (At45g35790.1) were observed peaking at 1-2 MDa. Finally, several proteins with DnaJ domains (At2g22360.1 and At4g39960.1) were found in fractions >2 MDa, and the protein heat map suggested association with the nucleoid (Table I).
Post-transcriptional Events: RNA Intron Splicing, Processing, and Editing-Proteins harboring single or multiple CRS1-YhbY domains, also called the chloroplast ribosome maturation (CRM) domain, participate in the assembly of catalytic ribonucleoprotein complexes, namely group II intron particles and the 50 S ribosomal subunit (118). In Arabidopsis, 16 proteins are predicted to have single or multiple CRM domains (118). CAF1 (At2g20020.1) has two CRM domains and is involved in group II intron RNA splicing (40). Aside from CAF1, two other CRM domain proteins were identified in this study. These include At4g39040.1 and At3g18390.1, which have one and three CRM domains, respectively. Two mutant alleles for the latter protein are embryo-lethal (EMB1865) with embryo development arrested at the globular stage, suggesting its importance in plastid development (88).
In maize chloroplasts, WTF1 is found in group II intron ribonucleoprotein complexes (600-800 kDa) and cooperates with RNC1 in promoting group II intron splicing (119). At4g01037.1 and At4g37510.1, the respective Arabidopsis orthologues of maize WTF1 and RNC1, did not exhibit identical elution profiles in our SEC analysis but still indicated interactions with ribonucleoprotein complexes, with WTF1 eluting at 1-2 MDa and RNC1 eluting at 3-5 MDa (Table I). In addition, a chloroplast-encoded maturase K (Atcg00040.1) was observed in the 1-2-MDa region in this study (Table I). The RNA binding properties and function of maturase K in other plants have been studied (120,121).
Chloroplast ribonucleoproteins (cpRNPs) are highly abundant proteins that associate with various RNA species for RNA processing and/or stabilization (129). In tobacco, cp29A, cp29B, and cp33 were mostly found as non-ribosome-bound ribonucleoprotein complexes but were also detected in fractions >600 kDa (129). In this study, cp29B (At2g37220.1) and two cp33 proteins (At1g01080.1 and At2g35410.1) were observed in the 0.8-2-MDa range with similar elution profiles (Table I). Several cpRNPs are involved in RNA editing (130,131), although this has not yet been demonstrated for the cpRNPs identified in this study.
Lipid Metabolism-Aside from ribosomes and ribosome-associated proteins, the other function that is enriched in the HM data set is fatty acid synthesis. Two hetero-oligomeric complexes, namely PDC and ACCase, were observed (see Fig. 3 and Table I).
The PDC is composed of multiple copies of three enzymes. The E1 component is a pyruvate dehydrogenase (consisting of α- and β-subunits), E2 is a dihydrolipoamide acetyltransferase, and E3 is a dihydrolipoamide dehydrogenase. PDCs form large complexes built around a core of eight E2 trimers (cube) or 20 E2 trimers (pentagonal dodecahedron), to which E1 and E3 bind, promoting substrate channeling across the three enzyme components (132). The maize mitochondrial PDC was found to have an estimated mass of about 8-9 MDa (18), due in part to its 2.7-MDa E2 core (37). The plastid-localized PDC from pea chloroplasts dissociates rapidly in vitro, making the estimation of its organization and composition difficult (133).
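The two core geometries described above imply simple E2 subunit counts, which the arithmetic below makes explicit. The last two lines are illustrative only: they divide the 2.7-MDa maize core mass by each possible subunit count to give the average mass per E2 subunit that each geometry would imply. The text does not state which geometry the maize core adopts.

```latex
\begin{align*}
n_{E2}^{\text{cube}} &= 8 \times 3 = 24, \\
n_{E2}^{\text{dodecahedron}} &= 20 \times 3 = 60, \\
\bar{m}_{E2}^{\text{cube}} &= \frac{2.7\ \text{MDa}}{24} \approx 112.5\ \text{kDa}, \\
\bar{m}_{E2}^{\text{dodecahedron}} &= \frac{2.7\ \text{MDa}}{60} = 45\ \text{kDa}.
\end{align*}
```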
The PDC components eluted at masses >5 MDa (Fig. 3 and Table I). All of the subunits of the plastid-localized PDC are nucleus-encoded, and all, except the E1 α component (At1g01090.1), are encoded by more than one gene. An E1 β-subunit (At1g3020.1) has been characterized previously (134) and was observed in this experiment with unique peptides. LTA2 (At3g25860.1) is an E2 component and has been shown to exhibit dihydrolipoamide acetyltransferase activity (135). The T-DNA insertion mutant for LTA2 is embryo-lethal (136). Another E2 subunit (At1g34330.1) was found in the genome database (132), and a T-DNA insertion mutant (EMB3003) for this gene is also embryo-lethal (88). Interestingly, two elution peaks were observed for E2, one at >5 MDa and another at 1-2 MDa (see Table I), suggesting different oligomeric states of the E2 core with or without bound E1 and E3. The two plastidic E3 isoforms (At3g16950.1 or ptlpd1 and At4g16155.1 or ptlpd2) are 85% identical and were characterized previously (137,138). The presence of different paralogues for the three central components suggests that the plastid PDC population is heterogeneous; further experimentation is needed to determine the biological significance of this heterogeneity.
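Calling an elution profile "bimodal", as for E2 above, amounts to finding more than one local maximum across the SEC fractions. The sketch below is a minimal peak finder that requires each peak to reach a hypothetical 20% of the profile's maximum, so that small fluctuations are ignored. The fraction labels and NadjSPC values are assumptions for illustration.

```python
# Hypothetical fraction labels; the exact mass window of each fraction is an assumption.
FRACTIONS = [">5 MDa", "3-5 MDa", "2-3 MDa", "1-2 MDa", "0.8-1 MDa"]


def local_maxima(profile, min_fraction_of_max=0.2):
    """Indices of local maxima at least min_fraction_of_max times the global maximum."""
    top = max(profile)
    if top <= 0:
        return []
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        # Strict on the left, non-strict on the right: a plateau reports its first index.
        if value > left and value >= right and value >= min_fraction_of_max * top:
            peaks.append(i)
    return peaks


# Hypothetical NadjSPC profiles, ordered from largest to smallest native mass.
e2 = [30, 8, 5, 18, 4]
e1_alpha = [25, 6, 2, 1, 0]
for name, profile in (("E2", e2), ("E1 alpha", e1_alpha)):
    print(name, [FRACTIONS[i] for i in local_maxima(profile)])
```

With five coarse fractions, a height threshold is a pragmatic stand-in for more formal peak-prominence criteria, which need finer sampling to be meaningful.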
The plastid-localized heteromeric ACCase catalyzes the first committed step in de novo fatty acid synthesis, which occurs solely in plastids. ACCase is composed of biotin carboxylase, biotin carboxyl carrier protein (BCCP), α-carboxyltransferase (α-CT), and β-carboxyltransferase (β-CT) subunits. The pea chloroplast ACCase was found to elute at about 650-700 kDa in gel filtration analysis (19). In this study, the Arabidopsis ACCase subunits were all found in the 1-2-MDa range (Table I). These include biotin carboxylase (At5g35360.1) (139), α-CT (At2g38040.1), and the chloroplast-encoded β-CT (Atcg00500.1) as well as four BCCPs that were identified with distinct peptides. BCCP1 (At5g16390.1) has been well characterized (140,141). The other three BCCPs, namely At1g52670.1, At3g56130.1, and At3g15690.1, have similar molecular masses (∼25 kDa), but domain analysis revealed that they lack the critical lysine residue for biotin attachment (data not shown). Nevertheless, the observation that these BCCPs elute in the same size range as the ACCase suggests that they associate with the ACCase. This is further supported by the isolation of these BCCPs together with BCCP1 by PII affinity chromatography, where PII is a signaling protein that modulates ACCase activity (142). However, the functional roles of these BCCPs remain to be characterized.
Additional proteins involved in the fatty acid biosynthesis pathway co-eluted with ACCase. KAS1 (At5g46290.1), an essential enzyme involved in building the carbon skeletons of unsaturated fatty acids, and acetyl-CoA synthetase (acetate-CoA ligase; At5g36880.1) were also found in the HM fractions, peaking at 3-5 and 1-2 MDa, respectively (Table I). Acyl carrier proteins (ACPs) carry the acyl chains during the synthesis of 16- and 18-carbon fatty acids. Several ACP isoforms are found in Arabidopsis and are expressed in a tissue-specific manner (143). ACP4 (At4g25050.1), the most abundant and most leaf-specific isoform (143), was the one observed in this study; it peaked at 0.8-1 MDa.

High Molecular Mass Protein-Protein and Protein-Nucleotide Complexes and Expanded Proteome Coverage
Protein-protein and protein-nucleotide interactions play a crucial role in orchestrating biological processes. Using gel filtration-based size fractionation, this study provides an overview of soluble chloroplast-localized assemblies between 0.8 and ∼5 MDa (the HM range), representing about 10-13% of the mass of the stromal proteome. This analysis complemented our previous gel-based analysis that resolved stromal complexes up to ∼800 kDa (11). When the abundant Calvin cycle components are excluded, the HM range was dominated in biomass by 30, 50, and 70 S ribosome particles and associated factors as well as the multifunctional enzyme complexes PDC and ACCase involved in fatty acid metabolism. Furthermore, the plastid chromosome with interacting proteins separated from the bulk of ribosomes and was associated with a specific set of DNA-binding proteins, dominated by the heteromeric PEP.
The distribution of proteins across the mass range could be quantified based on the number of adjusted SPCs using appropriate normalizations. Hierarchical clustering of the data set effectively grouped the proteins by biologically related functions and complexes, indicating that the complexes were stable during the gel filtration runs, and further benchmarking against known protein assemblies demonstrated that the clustering yielded biologically meaningful associations. Therefore, we could assign lesser-known or uncharacterized proteins to various complexes, although targeted validation, e.g. by co-immunoprecipitation, will ultimately be needed.
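One simple way to benchmark clusters against known assemblies is pair counting: take every pair of proteins that belong to the same reference complex and ask how often the clustering placed them together. The text does not say which benchmark was used, so this is a hedged illustration. The cluster labels and protein identifiers are hypothetical.

```python
from itertools import combinations


def co_clustered_fraction(assignments, known_complexes):
    """Fraction of within-complex protein pairs that share a cluster label.

    Proteins absent from `assignments` (e.g. not detected) are skipped.
    """
    same = total = 0
    for members in known_complexes.values():
        present = [m for m in members if m in assignments]
        for a, b in combinations(present, 2):
            total += 1
            same += int(assignments[a] == assignments[b])
    return same / total if total else float("nan")


# Hypothetical cluster labels and reference complexes.
assignments = {
    "RpoA": "C1", "RpoB": "C1", "RpoC1": "C1", "RpoC2": "C1",
    "PDC-E1a": "C2", "PDC-E2": "C2", "PDC-E3": "C3",
    "BC": "C4", "alpha-CT": "C4", "beta-CT": "C4",
}
known_complexes = {
    "PEP": ["RpoA", "RpoB", "RpoC1", "RpoC2"],
    "PDC": ["PDC-E1a", "PDC-E2", "PDC-E3"],
    "ACCase": ["BC", "alpha-CT", "beta-CT", "BCCP1"],  # BCCP1 unassigned here
}
print(f"{co_clustered_fraction(assignments, known_complexes):.2f}")
```

This score only measures how well known complexes are kept together. A fuller benchmark would also penalize unrelated proteins merged into the same cluster, for example by comparing against a shuffled-label baseline.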
A second objective was to improve coverage of the chloroplast proteome and find proteins in underrepresented functional classes. Indeed, compared with previous chloroplast proteome analyses, we identified several low abundance proteins involved in RNA metabolism and ribosome assembly, mostly in the HM fractions. This indicates that many of the proteins involved in plastid gene expression are associated with large RNA-containing assemblies (ribosomes and RNA splice complexes) and the plastid chromosome. Further targeted analysis of these nucleotide-protein complexes is likely to reveal additional proteins. Affinity purifications that target specific binding domains (e.g. for metals, ATP, and other cofactors; see e.g. Refs. 144-146) using highly purified chloroplast preparations will be needed to further improve proteome coverage.
Metabolic Channeling in Fatty Acid Metabolism-Metabolic functions were strongly underrepresented in the HM fractions; in terms of protein biomass, the metabolic functions that were present were dominated by fatty acid metabolism, in particular the ≥5-MDa PDC and the 0.8-1-MDa heteromeric ACCase. We suggest that this bias toward fatty acid metabolism relates to the mixed hydrophobic and hydrophilic nature of the substrates and the complexity of the cofactors and reactions. This complexity requires enclosure of the different intermediates within these MDa complexes, a process also designated metabolic channeling (147,148).
DNA Binding: Chloroplast Chromatin-Chloroplast DNA is packaged into nucleoids (organellar nuclei) consisting of multiple copies of the plastid genome complexed with proteins that are minimally characterized, although a number of them were identified from purified nucleoids or TACs (117,149,150). The majority of the nucleoid is tightly anchored to plastid membranes and requires detergent treatment, differential centrifugation, and possibly size exclusion column chromatography or co-immunoprecipitation for purification (117,151,152). In contrast, our current study did not involve detergent treatments, and the extracted stromal proteome had negligible membrane contamination. Nevertheless, plastid DNA was clearly present in fractions >2 MDa, and we identified the four subunits of the PEP complex and a dozen DNA-binding proteins (e.g. involved in DNA repair and DNA organization), some of which were observed previously (117). However, we clearly did not observe the relatively abundant membrane-associated MFP1 protein (At3g16000), which anchors the plastid DNA to the membrane (154) and is easily observed in thylakoid proteome analysis with high protein MOWSE scores (e.g. Ref. 153). The DNA and associated proteins peaked in higher mass assemblies than the 70 S ribosomes, although there was some overlap. A more extensive proteome analysis of the membrane-associated nucleoid and follow-up functional analysis of the associated proteins, which also addresses the extent to which transcription and translation are coupled, are overdue.
RNA Processing, Splice, and Degradation Machinery-Most of the plastid-encoded genes in higher plants are organized as operons, which are generally transcribed as polycistronic transcriptional units (155,156). The resulting primary transcripts are modified to generate functional RNAs by RNA cleavage, RNA stabilization and degradation, intron splicing, and RNA editing. In addition to known factors involved in RNA metabolism, we discovered several new proteins in the HM range that, based on their functional domains (e.g. CRM and PPR), are likely to be involved in RNA metabolism. Group II intron ribonucleoprotein complexes were found enriched at size ranges of about 500-800 kDa (119,157), although they can also be found in higher mass ranges. We identified Arabidopsis homologues for most known group II intron splicing proteins; most of them migrate at 1-2 MDa but were also found at higher mass ranges, suggesting a possible coupling between transcription and RNA processing. Furthermore, these RNA-interacting proteins also form associations with RNA-containing high molecular weight assemblies such as ribosomes and PEP or other RNA processing complexes. More detailed biochemical analysis, including protein-RNA interaction analysis using Ribonucleoprotein Immuno Precipitation-Chip (158) as well as targeted affinity purifications combined with protein mass spectrometry, will be required to fully understand the organization of RNA processing and how it interfaces with transcription, protein translation, and protein assembly.
Ribosomes and Ribosome Biogenesis-The most abundant proteins in the HM data set were ribosomal proteins, and indeed, the mass spectrometry analysis identified nearly all known or predicted ribosomal proteins as well as most known and many potential ribosome-associated proteins. Interestingly, several of the ribosomal proteins were represented by multiple gene products; in some cases, one gene was plastid-encoded, whereas the other was nucleus-encoded, and in other cases, both homologues were plastid-encoded but differed greatly in protein abundance. The resulting ribosome heterogeneity is likely to have functional consequences and may represent adaptation to particular developmental states, cell types, or stresses; these observations warrant further investigation. Functional assignments of ribosome-associated proteins include translation, co-translational modifications, and ribosome biogenesis, indicating that SEC separated ribosomes in various assembly and functional states (Fig. 4).
The identification of 12 Arabidopsis orthologues of bacterial ribosome assembly factors in the HM-A fractions suggests that ribosome assembly in chloroplasts resembles that of the prokaryotic system. Most of these factors exhibited similarly low accumulation levels (50-100-fold lower than the average ribosome abundance). So far, only two chloroplast-localized RA-GTPases in Arabidopsis have been characterized, namely AtObgC and the YqeH orthologue AtNOA1. Analysis of their SEC elution profiles in this study established that AtObgC associates with 30 and 50 S particles and that AtNOA1 interacts with 30 and 70 S ribosomes, consistent with bacterial studies (Fig. 4). Most bacterial RA-GTPases are essential for bacterial viability (47,48,72). Similarly, functional analysis of AtObgC and AtNOA1 mutants has revealed that both proteins are essential for plastome synthesis and chloroplast development (76,77,82,83). Targeted biochemical and genetic analyses of other Arabidopsis orthologues of bacterial ribosome biogenesis proteins, including RA-GTPases (EngA, Era, Hflx, and YqeH) and other factors (RimM, RbfA, RsmD, YrdC, and SpoU) as well as plant-specific maturation factors (DAL and IOJAP), will provide insights into their specific roles in chloroplast ribosome biogenesis and the consequences for plastid protein homeostasis. The proteome analysis in this study has opened up this challenging topic for further investigation in Arabidopsis.
Proteins with CRM domains have been shown to participate in the assembly of catalytic ribonucleoprotein complexes, namely group II intron particles and the 50 S ribosomal subunit (118). So far, all the characterized CRM domain proteins in plants (CRS1, CAF1, and CAF2) associate with RNA in vivo and are involved in group II intron splicing (159-162). Here we also identified a CRM domain protein (At4g39040.1) migrating at 1-2 MDa (50 S ribosomes); it is potentially a ribosome maturation factor because its E. coli homologue YhbY was found to tightly associate with pre-50 S ribosomes harboring immature 23 S rRNAs (118).
The Arabidopsis CSP41B and PNPase have multiple functions in plastid gene expression, including 23 S rRNA maturation (99-104). Their SEC elution profiles show that they elute mainly in the 1-2-MDa fraction, suggesting interactions with the 50 S particle, although this does not rule out associations with other RNA-containing complexes. Another endonuclease, RNase J, is crucial for 16 S rRNA processing in B. subtilis (107). The elution profile of its Arabidopsis orthologue (>1 MDa) indicates interactions with various ribonucleoprotein complexes, including 30 and 70 S particles, consistent with bacterial studies (107). Overall, these findings support roles for CSP41B, PNPase, and RNase J in ribosome assembly and maturation. RNA helicases have been implicated in various RNA processing functions, including rRNA maturation during ribosome biogenesis (109). Four Arabidopsis DEAD box RNA helicases were seen in the fractions at mass ranges >1 MDa and should be considered potential candidates for ribosome biogenesis factors. Interestingly, RH3 is specifically up-regulated in the Arabidopsis clpr2-1 mutant, which has reduced levels of the plastid-localized ClpPR protease and exhibits a delay in ribosome assembly and/or defects in RNA metabolism (30).
Translation and Co-translational Modifications-The isolation of ribosomes together with translational factors and proteins involved in co-translational modifications suggests that translating ribosomes were captured. Our analysis clearly provided support for ribosome association of N-terminal modifying enzymes (MAP and PDF) as well as targeting/folding proteins (TF and cpSRP54). The E. coli TF has been shown to provide a folding cavity for the nascent protein emerging from the ribosome tunnel (163-165). We did not find any obvious growth phenotype for a TF-null mutant in Arabidopsis,³ which is perhaps not surprising because a clear phenotype in E. coli is only seen in combination with deletion of the DnaK chaperone (163). In E. coli, the RPL23 protein in particular has been shown to serve, during protein synthesis, as a platform for the association of enzymes, targeting factors, and chaperones that act upon the nascent polypeptide emerging from the exit tunnel (57,166). Interestingly, Arabidopsis has two identical chloroplast-encoded L23 proteins, and future studies should determine how these two paralogues contribute to protein synthesis and homeostasis.
This article contains supplemental Fig. 1 and Tables 1 and 2.