Rubisco activase requires residues in the large subunit N terminus to remodel inhibited plant Rubisco

The photosynthetic CO2-fixing enzyme ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco) forms dead-end inhibited complexes while binding multiple sugar phosphates, including its substrate ribulose 1,5-bisphosphate. Rubisco can be rescued from this inhibited form by molecular chaperones belonging to the ATPases associated with diverse cellular activities (AAA+ proteins) termed Rubisco activases (Rcas). The mechanism of green-type Rca found in higher plants has proved elusive, in part because until recently higher-plant Rubiscos could not be expressed recombinantly. Identifying the interaction sites between Rubisco and Rca is critical to formulate mechanistic hypotheses. Toward that end, here we purify and characterize a suite of 33 Arabidopsis Rubisco mutants for their ability to be activated by Rca. Mutation of 17 surface-exposed large subunit residues did not yield variants that were perturbed in their interaction with Rca. In contrast, we find that Rca activity is highly sensitive to truncations and mutations in the conserved N terminus of the Rubisco large subunit. Large subunits lacking residues 1–4 are functional Rubiscos but cannot be activated. Both T5A and T7A substitutions result in functional carboxylases that are poorly activated by Rca, indicating the side chains of these residues form a critical interaction with the chaperone. Many other AAA+ proteins function by threading macromolecules through a central pore of a disc-shaped hexamer. Our results are consistent with a model in which Rca transiently threads the Rubisco large subunit N terminus through the axial pore of the AAA+ hexamer.

Virtually all carbon dioxide that enters the biosphere does so via the Calvin-Benson-Bassham cycle (1). Aerobic autotrophic organisms such as plants, algae, and cyanobacteria all utilize this somewhat suboptimal CO2-fixation process, which depends on catalysis by the slow and promiscuous enzyme Rubisco. The enzyme binds the five-carbon sugar ribulose 1,5-bisphosphate (RuBP), adds a carbon dioxide molecule, and hydrolyzes the six-carbon intermediate to form two molecules of 3-phosphoglycerate (3PGA). In plants each active site processes only ~1-3 reactions/s, and frequently oxygen gas is incorporated instead of CO2, which leads to production of the toxic metabolite 2-phosphoglycolate.
To overcome these flux limitations, Rubisco is overexpressed to constitute up to 50% of the leaf soluble protein and is believed to be the most abundant protein on earth (2,3). Recognition that the enzyme catalyzes the rate-limiting step has made its performance the center of multiple ongoing crop improvement strategies (4,5). In addition to its slow speed and inaccuracy, the enzyme is also prone to forming dead-end inhibited complexes with several sugar phosphates that are present in its environment (6,7). CO2 fixation ceases unless the inhibitors are constantly removed. This action is performed by a group of dedicated molecular chaperones that have been termed the Rubisco activases (Rcas) (8-10). Three classes of Rca exist, and although all belong to the superfamily of AAA+ proteins, their primary sequences and mechanisms are highly distinct, indicating convergent evolution (11,12). Red-type Rca found in red-lineage phytoplankton and proteobacteria transiently threads the C terminus of the Rubisco large subunit through the axial pore of the AAA+ hexamer (10,13,14). In contrast, the CbbQO-type Rca found in chemoautotrophic proteobacteria consists of a cup-shaped AAA+ hexamer (CbbQ6) bound to a single adaptor protein CbbO, which is essential for Rubisco activation (9). During Rca function the hexamer remodels CbbO, which is bound to inhibited Rubisco via a von Willebrand Factor A domain (15).
The detailed molecular mechanism by which higher-plant Rca remodels Rubisco's active site to remove inhibitory compounds has long remained elusive (11,16). Because functional Rca could be produced recombinantly, a large volume of biochemical information has accumulated on Rca variants (17,18). In summary, the data support a canonical AAA+ pore-loop threading mechanism in which the flat top surface of the hexameric disc engages Rubisco, followed by axial pore-loop threading of an element of Rubisco (19,20). The intrinsically disordered N-terminal domain of Rca, especially a conserved tryptophan, is also important in engaging the holoenzyme (20,21). Regarding Rubisco, early studies using green algal Chlamydomonas Rubisco were able to pinpoint two residues on the Rubisco large subunit's βC-βD loop that contact the specificity helix (H9) of Rca (22,23). However, a historical inability to produce plant Rubisco in heterologous organisms such as Escherichia coli hampered further progress. This hurdle was recently removed via the concurrent functional expression of specific plant chaperonins and assembly factors within the heterologous host (24). Here, we took advantage of the newly established capability to produce and biochemically characterize many plant Rubisco variants for their interaction with Rca. We find that the highly conserved RbcL N terminus is essential for Rca function, with a particular importance of two threonine residues: Thr5 and Thr7. This is consistent with an N-terminal pore-loop threading mechanism for higher-plant Rca.

A surface scan of higher-plant Rubisco for Rca-interacting residues
We used the recently established E. coli plant Rubisco expression platform (24,25) to produce a series of Arabidopsis Rubisco large subunit variants mutated in surface-localized residues in an effort to discover additional regions important for protein-protein interactions. We first tested the βC-βD loop mutations E94K and P89A as positive controls (Fig. 1A), because these substitutions had earlier been shown to greatly perturb the ability of spinach Rca to activate Chlamydomonas Rubisco (26). We then assayed the fully activated holoenzyme (ECM) and the inhibited apo-enzyme bound to the substrate ribulose 1,5-bisphosphate (ER) in the presence and absence of the short (Rcaβ) isoform of Arabidopsis Rca (Fig. 1B). Consistent with the Chlamydomonas-spinach result, the inhibited E94K variant of Arabidopsis Rubisco remained nonfunctional in the presence of its cognate Rca, reconfirming the importance of the N-terminal βC-βD loop in the interaction. The P89A variant, however, was activated well in this system (Fig. 1C), suggesting that the βC-βD loop-Rca interaction is less sensitive to mutation when using Arabidopsis proteins.
We next targeted a range of surface-localized Rubisco large subunit residues for mutagenesis (Fig. 2A and Fig. S2). As we found earlier, multiple positively charged residues on the face of the Rca disc are important for its ability to activate Rubisco (20,27), and therefore the chosen mutations were biased toward probing negatively charged surface residues. This included those located in a negatively charged pocket at the dimer-dimer interface that has recently been implicated in the binding of carboxysomal Rubisco linker proteins in prokaryotic green-type Rubiscos (Fig. 2B) (28,29). We successfully purified 17 variants (Fig. S1), which were all able to carboxylate RuBP similarly to WT (Fig. 2C, Table S2, and Fig. S3). Rca assays showed that all of these variants could still be activated, indicating that the chosen residues were not of critical importance to the Rubisco-Rca interaction (Fig. 2C and Fig. S3). Only K14A showed a statistically significant change, a 52% increase in its Rca-mediated activation rate, possibly reflecting a reduced stability of the inhibited complex. In several species of plants, Lys14 is trimethylated, a modification that is catalyzed by Rubisco large subunit methyltransferase (Rubisco LSMT) (30-32). In the context of activase-mediated remodeling, however, the lack of conservation across all plants makes it unlikely that Lys14 trimethylation is essential for reactivation, although it could represent a mechanism for species-specific regulatory control. Clearly the chosen single amino acid substitutions were insufficient to disrupt the extensive protein-protein interaction interface involved in Rubisco activation. However, attempts to produce combinations of mutations were unsuccessful because of either insolubility or nonfunctionality in all tested cases.

The RbcL N terminus is essential for Rca function
The red-type Rubisco activase CbbX transiently threads the RbcL C terminus (10,13,14). However, the C terminus of green-type Rubisco large subunits is poorly conserved and is of variable length (33), indicating a distinct mechanism for green-type Rca function. In contrast, although sequences at the N terminus of red-type Rubisco large subunits differ between species, both the length and the sequence of the N terminus of higher-plant RbcL are essentially completely conserved (Fig. 3A). In available crystal structures, residues 8-20 of the N terminus are ordered only when the active site is in the closed (ligand-bound) form (Fig. 3B). In the closed conformation, the N terminus is positioned directly above the 60s loop that coordinates P1 of the substrate, with Phe13, Lys14, Gly16, and Lys18 forming interactions with multiple residues of the 60s loop (34). Coupled with evidence that residues 9-15 of Rubisco from both spinach and wheat are essential for functional carboxylation activity (31,35,36), the stringent conservation of the first eight residues thus suggested a tantalizing target for mutational analysis.
A Rubisco variant with the first seven amino acids replaced by methionine (ΔN7) displayed 83% of WT carboxylation velocity (Fig. 3C). However, when the ER complex was formed, Rca was unable to reactivate ΔN7 (Fig. 3C). This result was consistent with the notion that a higher-plant Rca hexamer engages the disordered N terminus via its axial pore-loops during reactivation, likely followed by limited threading that leads to active site disruption and inhibitor release.

A dissection of the RbcL N-terminal binding motif
We then performed a detailed mutational analysis of the RbcL N terminus, generating a series of variants that, in the ECM form, were all able to carboxylate at least as well as WT (Fig. 4). Mutant variants were designed either with sequential truncations (ΔN1, ΔN2, and ΔN3), specific deletions or substitutions (ΔTET and TET-AAA), or targeted insertions (M1insAA and T7insAAA) (Fig. 4A). In plants, post-translational processing of the Rubisco large subunit results in the removal of two residues, leaving an acetylated Pro3 as the innate N terminus (30-32,36). The Rubisco purified using the present E. coli system by Aigner et al. (24) was reported to be N-terminally processed by the endogenous machinery. We therefore determined the N terminus of selected variants using de novo mass spectrometric sequencing (Fig. 4A, Table 1, and Table S3).
Our recombinantly purified WT RbcL was found to contain a mixture of unprocessed, partially processed, and fully processed (~50%) N termini (Table 1 and Fig. 4A). Shortening the N terminus of RbcL by a single amino acid after methionine (ΔN1) resulted in a homogeneous pool of post-processed native-like N termini that did not negatively affect Rca function (Fig. 4A). Sequential removal of the next amino acid (ΔN2) resulted in a heterogeneous mix of post-processed RbcL states (with the first amino acid being either Gln4 or Thr5), which resulted in a 67% increase in Rca functionality. In contrast, removal of the first three amino acids after methionine (ΔN3) or deletion of residues 5-7 (ΔTET) almost completely eliminated the ability of Rca to activate Rubisco (Fig. 4A). The dramatic difference between ΔN2 and ΔN3 suggests that Gln4 is critical for Rca function, but the N-terminal sequencing result suggests it does not have to be present on every large subunit.
Lengthening the N terminus by inserting two alanine residues upstream of residue 2 (M1insAA), which resulted in a post-processed population of mostly lengthened N-terminal states, reduced Rca function by ~64%. Changing the register of the N-terminal sequence by inserting an AAA sequence upstream of Lys8 in the WT (T7insAAA) or the ΔTET variant (TET-AAA) also eliminated Rca function. These results indicated that Rca function was highly sensitive to both the length and the identity of the RbcL N terminus.
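The variant designs described above can be made concrete with a short sketch. The 20-residue sequence below is an assumption: it is consistent with the residue numbering used in the text (Pro3, Gln4, Thr5, Thr7, Lys8, Phe13, Lys14, Tyr20) but should be checked against the database entry for Arabidopsis RbcL before use. The helper names are hypothetical, and the sequences shown are the encoded (pre-processing) N termini, not the processed forms determined by mass spectrometry.

```python
# Assumed first 20 residues of Arabidopsis RbcL; verify against the
# database entry before relying on it.
WT_NTERM = "MSPQTETKASVGFKAGVKDY"


def truncate_after_met(seq: str, n: int) -> str:
    """Remove n residues that follow the initiator Met (truncation variants)."""
    return seq[0] + seq[1 + n:]


def delete_residues(seq: str, first: int, last: int) -> str:
    """Delete residues first..last (1-based, inclusive)."""
    return seq[:first - 1] + seq[last:]


def insert_before(seq: str, position: int, insert: str) -> str:
    """Insert residues immediately upstream of the 1-based position."""
    return seq[:position - 1] + insert + seq[position - 1:]


def substitute(seq: str, position: int, new: str) -> str:
    """Replace the residue at the 1-based position."""
    return seq[:position - 1] + new + seq[position:]


delta_tet = delete_residues(WT_NTERM, 5, 7)

VARIANTS = {
    "WT": WT_NTERM,
    "ΔN1": truncate_after_met(WT_NTERM, 1),
    "ΔN2": truncate_after_met(WT_NTERM, 2),
    "ΔN3": truncate_after_met(WT_NTERM, 3),
    "ΔTET": delta_tet,
    "TET-AAA": insert_before(delta_tet, 5, "AAA"),  # AAA upstream of Lys8
    "M1insAA": insert_before(WT_NTERM, 2, "AA"),
    "T7insAAA": insert_before(WT_NTERM, 8, "AAA"),
    "T5A": substitute(WT_NTERM, 5, "A"),
    "T7A": substitute(WT_NTERM, 7, "A"),
}

if __name__ == "__main__":
    for name, sequence in VARIANTS.items():
        print(f"{name:9s} {sequence}")
```

Printing the table side by side makes the register shifts easy to see: T7insAAA keeps Thr5 and Thr7 but moves them three positions away from Lys8, whereas TET-AAA replaces them outright.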
Next, we evaluated the effect of single amino acid substitutions in the N-terminal motif. Whereas E6A and K8A substitutions were well-tolerated, both T5A and T7A resulted in ~70% reductions in Rca functionality. This finding indicates that the two threonine residues are likely to play an important role in the threading process, possibly via specific interactions with residues in Rca's pore-loops 1 and 2 (19,20). We further note that the observed 2-amino acid step interval would be consistent with the successive zipper-like interactions that have been described for substrate engagement and threading by the central pore in multiple other AAA+ proteins (37-41). In RbcL, the hydroxyl side chain of Tyr20 forms a hydrogen bond to Glu60, a key catalytic residue that interacts with Lys334, which is positioned at the apex of loop 6 and thought to orient the CO2 molecule for gas addition (42). We hypothesized that this interaction could act to disrupt the active site when the N terminus is displaced by Rca threading. The Y20F Rubisco mutant, which lacks this hydrogen bond because the hydroxyl group is absent, had a ~73% reduced carboxylation rate but an increase in reactivation of 39% (Fig. 4B). This result suggests that the Tyr20-Glu60 interaction is important for the integrity of the active site, and its loss facilitates disruption of the inhibited complex.
Notably, we found that multiple N-terminal variants and E94K presented with significantly enhanced carboxylation velocities (up to 53%) compared with the WT enzyme in our spectrophotometric Rubisco assay (Fig. 4, A and B). Independent measurement of enzyme carboxylation kinetics (43) via 14CO2 fixation assays confirmed these findings (Table 2 and Fig. S5). In these assays we also found that the C-terminal hexahistidine tag added to the small subunit in our constructs did not significantly affect kinetics, in contrast to that reported for tobacco Rubisco expressed in planta (44). However, the N-terminally processed ΔN1 variant that resembles natively processed RbcL in plants (Table 1) was among the fastest enzymes using both methods (Fig. 4A, Table 2, and Figs. S4 and S5). Therefore, recombinant WT enzyme purified in this manner (and presumably all single amino acid substitution variants reported in this study) is not a reliable predictor of in planta enzyme kinetics because of the presence of partially processed N termini (Table 1). Our findings indicate that correct N-terminal processing of RbcL is required to achieve full carboxylase function in planta and strongly suggest that future work using E. coli-produced recombinant plant Rubisco should utilize a ΔN1 construct for kinetic studies.

Discussion
The availability of E. coli-produced recombinant higher-plant Rubiscos permitted us to rapidly produce many variants and assay their capability to be engaged and activated by their cognate Rca chaperone. Mutational analysis of the holoenzyme surface indicated that Rca compatibility was not easily disrupted, with the tested variants remaining functional (Fig. 2). Arguably, inclusion of less conservative substitutions such as charge switches could have been more informative here. In contrast, mutagenesis of the highly conserved N terminus resulted in multiple variants that were able to carboxylate RuBP but could not be activated by Rca. The best described conserved mechanism of numerous AAA+ ATPases concerns the translocation of a substrate peptide through the central pore of the hexamer (45), and this is the modus operandi of the red-type Rca (13). Green-type Rca pore-loops have been shown to be critical for Rubisco activation (19,20). In addition, we have long been aware of the RbcL βC-βD loop-Rca specificity helix H9 interaction (23,26,46). Assuming an axial pore-loop-RbcL N-terminal threading mechanism, we can now further constrain the positioning of an Rca hexameric model (19) in relation to an inhibited Rubisco structure (47). Helix 9 elements of two adjacent Rca subunits can be placed in proximity to two RbcL βC-βD loops that are located on two adjacent dimers (Fig. 5). In this configuration the N-terminal tail (starting at Thr7 in the structure used) is then accessible to the Rca pore. Threading of the N termini would then result in pulling residues 13-20 away from the large subunit body. Disruptions of the associated van der Waals and polar interactions (34), especially with the RbcL 60s loop, may be sufficient to trigger active site opening. In this manner, it is conceivable for inhibitor relief to be achieved through limited remodeling of the RbcL N-terminal domain. The stark difference observed between the ΔN2 (Rca-activatable) and ΔN3 (Rca-defective) proteins is intriguing because some ΔN2 N termini were identical to those of ΔN3 when determined by de novo peptide sequencing (Fig. 4A). This may indicate that not every large subunit of the L8S8 oligomer needs to be engaged to achieve full Rca function and would suggest cooperativity in the process. An important question that remains completely unaddressed is the role of the critical Rca N-terminal domain. This disordered stretch of ~60 amino acids is not resolved in structural models, and a single amino acid substitution of W15A eliminates Rca function (20,21). It is likely involved in an additional, so far undescribed, anchoring site.

… Table 1. Time courses are shown in Fig. S4, with the exception of WT, P89A, and E94K variants, which are shown in Fig. 1C. Values significantly different from their WT equivalent are indicated by an asterisk (one-way analysis of variance, post hoc Tukey test, p < 0.05).

Table 1. N-terminal identity of recombinantly purified wild-type and mutant RbcL
The N terminus of Rubisco large subunits from wild-type and variant mutants shown in Fig. 4A as determined by mass spectrometry de novo sequencing. The ratios of the corresponding peak areas are indicated.

This work also inadvertently highlighted that in our hands E. coli did not completely process the N termini of protein produced using the WT construct, leading to lower carboxylase activity than the homogeneously processed ΔN1 variant (Fig. 4A and Table 2). Future studies aiming to evaluate the effect of amino acid substitutions on plant carboxylase function should therefore use the ΔN1 variant. Although a C-terminal hexahistidine tag on the small subunit did not appear to affect kinetics in our system (Table 2), we agree it is prudent practice not to include affinity tags when preparing Rubiscos for kinetic analysis (44).
Other reported post-translational modifications (Pro3 acetylation and Lys14 trimethylation (30,31), phosphorylation at multiple sites (48-50)) that are absent in recombinantly purified RbcL (Table S3) (24,51) may also have effects on carboxylase function. Among these, acetylation of Pro3 has been well-established in both plants and algae (30), but the enzyme responsible and its functional significance remain unknown. Steric hindrance from a bulkier trimethyl Lys14 could also interfere with engagement and threading through the axial pore of the activase. However, the post-translational modification of Lys14 is not universal, with RbcL from Arabidopsis and spinach among species not naturally methylated (30,52). In summary, the recombinant plant Rubisco production system in conjunction with purified enzymes that catalyze post-translational modifications will permit outstanding questions (32) to be resolved.
Our model is consistent with an exquisite structural snapshot of a prokaryotic carboxysome-associated green-type Rca hexamer bound to cyanobacterial Rubisco that has been communicated in a concurrent bioRxiv preprint (53). In agreement with our findings, the study also reports that an N-terminal 9-amino acid truncation of the tobacco Rubisco large subunit abolishes tobacco Rca function. Green-type Rubisco activation is thus an ancient, conserved process that appears to precede the primary endosymbiotic event leading to the chloroplast (11).

Molecular biology
Plasmids pBAD33k-AtRbcLS, p11a-AtC60ab/C20, and pCDFduet-AtR1/R2/Rx/B2 utilized in the production of Arabidopsis Rubisco in E. coli were a gift from Dr. Manajit Hayer-Hartl (24). To achieve our final construct containing the large and small subunits of Rubisco, a hexahistidine tag was appended to the C terminus of rbcS via the QuikChange protocol (Stratagene). Restriction-free cloning was used to insert the RBS-AtRbcLScHis cassette into multiple cloning site 1 of the pRSFDuet™-1 plasmid (Novagen). To obtain single mutants, the QuikChange protocol was applied to pRSFduet-AtRbcLScHis. Truncations of the N terminus were performed by PCR amplification of regions flanking the unwanted sequence. Linearized products were then phosphorylated by T4 PNK (NEB) before end-to-end ligation. All primers used are listed in Table S1, and protein-encoding sequences were verified by DNA sequencing.
To obtain a vector encoding Arabidopsis thaliana Rca (AtRcaβ), the sequence corresponding to amino acid residues 59-474 (Uniprot P10896) was amplified from a cDNA library of Arabidopsis with BamHI and NotI restriction sites at the 5′ and 3′ ends, respectively. The sequence was then inserted into the multiple cloning site of the pHue expression vector using the appropriate restriction sites to yield the final construct pHueAthRcaβ.
Figure 5. Model for the Rubisco-Rca interaction. By placing the Helix9 interaction site (in red) of two adjacent Rca subunits in proximity to the βC-βD loops (yellow) of two large subunits belonging to different dimers, the RbcL N terminus (in magenta, N-terminal 6 amino acids missing) can be positioned under the axial pore of the Rca hexamer. Transient pore-loop threading would then lead to disruptions of interactions between the N terminus and the catalytic 60s loop, followed by inhibitor release (PDB entries 1GK8 for Rubisco and 3ZW6 for Rca). The figure was drawn using ChimeraX (58).

Protein purification
Recombinant WT activase from Arabidopsis was expressed and purified following our protocol for the purification of Agave tequilana activases to yield activase with a single glycine preceding the native N terminus of the enzyme (27). For expression and purification of recombinant Rubiscos, BL21 (DE3) E. coli cells containing p11a-AtC60ab/C20, pCDFduet-AtR1/R2/Rx/B2, and pRSFduet-AtRbcLScHis were grown in LB medium supplemented with ampicillin (200 µg/ml), kanamycin (30 µg/ml), and streptomycin (50 µg/ml), respectively. Starter cultures of 2 ml were grown overnight at 37°C prior to inoculation of 1-liter cultures. Large-scale cultures were grown for 3 h at 37°C to reach an A600 of 0.3-0.4 before the temperature was lowered to 23°C and expression was induced with 0.5 mM isopropyl β-D-thiogalactopyranoside. The cells were harvested 16 h following induction and lysed in HisTrap buffer A (50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM imidazole). Soluble fractions were then applied to HisTrap HP 5-ml columns (Sigma-Aldrich) equilibrated with HisTrap buffer A. The proteins were eluted with a linear imidazole gradient from 10 to 200 mM. Following elution, Rubisco-containing fractions were immediately applied to a Superdex 200 gel-filtration column (GE Healthcare) equilibrated with buffer A (20 mM Tris-HCl, pH 8.0, 50 mM NaCl) supplemented with 5% (v/v) glycerol. Pure Rubisco fractions were then pooled, concentrated, and flash-frozen for storage at −80°C. Pure proteins were quantified by measuring their absorbance at 280 nm using extinction coefficients calculated using the ProtParam tool (RRID:SCR_018087).
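The A280-based quantification can be illustrated with a minimal sketch of the sequence-based extinction coefficient estimate that ProtParam reports (Trp, Tyr, and cystine contributions of 5500, 1490, and 125 M^-1 cm^-1, respectively) followed by the Beer-Lambert conversion. The function names and the example sequence are hypothetical; for Rubisco, a per-active-site coefficient would presumably sum the contributions of one large and one small subunit, which is an assumption and is not stated in the text.

```python
# Per-residue molar absorbance at 280 nm (M^-1 cm^-1) used in the
# sequence-based estimate.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_coefficient_280(sequence: str, n_cystines: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm from composition."""
    seq = sequence.upper()
    return (EPS_TRP * seq.count("W")
            + EPS_TYR * seq.count("Y")
            + EPS_CYSTINE * n_cystines)


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (M) = A / (epsilon * path length)."""
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    toy = "MWWYC"  # hypothetical toy sequence, not a real protein
    eps = extinction_coefficient_280(toy)
    print(eps, molar_concentration(0.6245, eps))
```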

Biochemical assays
Rubisco and Rubisco reactivation activities were measured and quantified as described (14), using the spectrophotometric Rubisco assay (54). RuBP was synthesized enzymatically from ribose 5-phosphate (55) and purified by anion-exchange chromatography (56). ECM was formed by incubating 20 µM Rubisco active sites in buffer A supplemented with 20 mM NaHCO3 and 10 mM MgCl2 (50 min, 25°C). For ER, complexes were generated by incubating 20 µM Rubisco active sites in buffer A containing 4 mM EDTA (10 min) prior to addition of RuBP to a final concentration of 1 mM (50 min, 25°C). Rca activities were calculated using the ECM and ER carboxylase time courses collected on the same day. All assays were performed in assay buffer (100 mM Tricine, pH 8.0, 5 mM MgCl2) containing 3 µl of coupling enzymes mixture (creatine phosphokinase (2.5 units/ml), glyceraldehyde-3-phosphate dehydrogenase (2.5 units/ml), 3-phosphoglycerate kinase and Triose-P isomerase/glycerol-3-phosphate dehydrogenase (20/2 units/ml)), 20 mM NaHCO3, 0.5 mM NADH, 2 mM ATP, 10 mM creatine phosphate, 1 mM RuBP. Final concentrations of 0.5 µM Rubisco active sites and 2 µM Rca protomer were utilized where appropriate. All spectrophotometric assays were performed on different days with individual preparations of ECM, ER, and assay buffer A.
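As an illustration of how a coupled spectrophotometric time course translates into a carboxylation rate, the sketch below converts an A340 slope into turnovers per active site. It is not the quantification procedure of ref. 14. It assumes the standard NADH extinction coefficient at 340 nm (6220 M^-1 cm^-1), a 1-cm path length, and a stoichiometry of four NADH oxidized per CO2 fixed, which is what the listed coupling enzymes would give if both 3PGA molecules are carried through to glycerol 3-phosphate; that stoichiometry is left as a parameter because it is an assumption. The function name is hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, standard value for NADH at 340 nm


def carboxylation_rate_per_site(
    dA340_per_min: float,
    active_sites_uM: float,
    nadh_per_co2: float = 4.0,  # assumed coupling stoichiometry
    path_cm: float = 1.0,       # assumed cuvette path length
) -> float:
    """Return CO2 fixed per active site per second from an A340 slope."""
    # NADH is consumed, so the slope is negative; use its magnitude.
    nadh_uM_per_min = abs(dA340_per_min) / (NADH_EPSILON_340 * path_cm) * 1e6
    co2_uM_per_min = nadh_uM_per_min / nadh_per_co2
    return co2_uM_per_min / active_sites_uM / 60.0


if __name__ == "__main__":
    # Hypothetical slope of -0.3732 A340/min with 0.5 uM active sites.
    print(carboxylation_rate_per_site(-0.3732, 0.5))
```

Under these assumptions a slope of -0.3732 A340/min at 0.5 µM active sites corresponds to about 0.5 turnovers per site per second.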
Radiometric kinetic assays
14CO2 fixation assays (0.5 ml of reaction volume) were performed at 25°C in 7.7-ml septum-capped glass scintillation vials with assay buffer (100 mM EPPS-NaOH, pH 8.0, 20 mM MgCl2, 1 mM EDTA), 10 µg/ml carbonic anhydrase, and 1 mM RuBP. All the assay components were equilibrated with CO2-free air prior to addition of 14CO2 at concentrations varying from 0.46 to 4.7 mM NaH14CO3 (corresponding to 6-60 µM 14CO2). Purified Rubisco (~5 µM active sites) was first activated with assay buffer containing 10 mM NaHCO3, and the assay was initiated by addition of 20 µl of activated Rubisco. The assay was stopped after 1 min using 200 µl of 50% (v/v) formic acid. The specific activity of 14CO2 was measured using the highest 14CO2 concentration and 5.2 nmol of RuBP. The reaction was assayed for 30 min, and the activity ranged from 840 to 1200 CPM/nmol RuBP. The reactions were dried using a heat block and resuspended in 750 µl of water and 1 ml of Ultima Gold XR scintillant before quantification using a scintillation counter. Rubisco active sites were quantified using a 14C-CABP (2-carboxy-arabinitol 1,5-bisphosphate) binding assay with 20 µl of activated Rubisco (54). The 14C-CABP-bound Rubisco was separated from the free ligand using size-exclusion chromatography and quantified by scintillation counting.
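The correspondence between added bicarbonate and dissolved CO2 quoted above follows from the CO2/bicarbonate equilibrium at the assay pH. The following sketch applies the Henderson-Hasselbalch relationship with an assumed apparent pKa of 6.11; the value actually used by the authors is not stated, and it depends on temperature and ionic strength. With this assumption the calculation reproduces the quoted range of roughly 6-60 µM.

```python
def dissolved_co2_uM(total_ci_mM: float, pH: float = 8.0, pKa: float = 6.11) -> float:
    """Dissolved CO2 (uM) in equilibrium with total inorganic carbon (mM).

    Uses [HCO3-]/[CO2] = 10**(pH - pKa) and neglects carbonate, which is
    a minor species at pH 8. The pKa default is an assumed apparent value.
    """
    ratio = 10 ** (pH - pKa)
    return total_ci_mM * 1000.0 / (1.0 + ratio)


if __name__ == "__main__":
    for ci in (0.46, 4.7):
        print(f"{ci} mM NaHCO3 -> {dissolved_co2_uM(ci):.1f} uM CO2")
```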

De novo mass spectrometric sequencing
De novo mass spectrometric sequencing services were provided by the Bioprocessing Technology Institute (BTI-A*STAR). Briefly, purified proteins were separated on SDS gels prior to excision of RbcL-containing regions and in-gel trypsin digestion. LC-MS/MS analysis was performed using a nano-ACQUITY UPLC (Waters, Milford, MA, USA) coupled to an LTQ Orbitrap Elite ETD mass spectrometer (Thermo Scientific, Waltham, MA, USA) as described in Ref. 57. The spectra were obtained using data-dependent scanning tandem MS with full MS scans at 120,000 resolution from 350 to 1600 m/z followed by HCD Orbitrap tandem MS scans of the 12 most intense peptide ions with normalized collision energy of 35.0% at a resolution of 15,000. Tandem MS raw files were directly analyzed using the PEAKS Studio X plus software (Bioinformatics Solutions; Waterloo, Canada) for de novo sequencing, semispecific tryptic peptide database search, peptide area detection, and annotation of MS/MS spectra according to the software instructions. Database search was conducted against the sequence of WT RbcL (Uniprot:O03042) or our described N-terminal variants from Arabidopsis. Peptide and fragment ion mass tolerances used were ±10 ppm and ±0.5 Da, respectively. Carbamidomethylation of cysteine was included as a fixed modification, whereas oxidation of methionine, N-terminal acetylation, and N/Q deamidation were considered as variable modifications. Two missed cleavages were allowed for searching the data. The false discovery rate was set at 0.1%.
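To make the two tolerance settings concrete, the sketch below shows how a relative (ppm) precursor tolerance and an absolute (Da) fragment tolerance are typically evaluated. It is a generic illustration with hypothetical function names and example masses, not a description of how the PEAKS software applies these settings internally.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """Peptide (precursor) match within a relative ppm window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed: float, theoretical: float, tol_da: float = 0.5) -> bool:
    """Fragment ion match within an absolute Da window."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # Hypothetical masses: a 5 ppm deviation passes, a 20 ppm deviation fails.
    print(precursor_matches(1000.005, 1000.0))
    print(precursor_matches(1000.020, 1000.0))
```

Because the ppm window scales with mass, a 10 ppm tolerance corresponds to 0.01 Da at 1000 Da, far tighter than the 0.5 Da allowed for fragment ions.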

Data availability
The data generated and analyzed in this study are available at https://researchdata.ntu.edu.sg/dataverse/cajar.