Identification of a carbonic anhydrase–Rubisco complex within the alpha-carboxysome

Significance
Rubisco is responsible for the majority of inorganic carbon assimilation on Earth. To ensure efficient CO2 fixation, cyanobacteria and many autotrophic proteobacteria concentrate CO2 in the carboxysome, a bacterial organelle encapsulating Rubisco and carbonic anhydrase within a protein shell. It remains unknown exactly how this 250+ megadalton protein complex assembles with high fidelity inside cells. Here, we explore the encapsulation mechanism of the carbonic anhydrase, CsoSCA, and demonstrate that it is incorporated into the α-carboxysome via a carbonic anhydrase–Rubisco complex. Our results update the current model for carboxysome biogenesis and inform strategies for engineering CO2 concentration mechanisms into crops and industrially relevant microorganisms for improved growth and yields.

Carbonic anhydrase (CA) catalyzes the reversible interconversion of CO2 and bicarbonate (HCO3−). Because this reaction is central to metabolism, CA is an essential protein in all organisms in which this has been tested (1–3). In photosynthesis, CA's role is often to supply ribulose-1,5-bisphosphate carboxylase-oxygenase (Rubisco), the carboxylase of the Calvin-Benson-Bassham cycle, with CO2 to ensure fast fixation (3). Rubisco has modest turnover numbers and fails to distinguish between CO2 and the competing off-target substrate O2 (4–6). To overcome these limitations, plants, algae, and some bacteria have evolved different types of CO2-concentrating mechanisms (CCMs), which concentrate CO2 near Rubisco (7). This saturates Rubisco's active sites with CO2, competitively inhibits oxygenation, and increases overall carbon assimilation rates. Importantly, understanding the role of a CA in a CCM requires understanding the enzyme's localization and regulation (3).
The bacterial CCM is present in all cyanobacteria and many proteobacteria. It consists of two main components: (I) energy-coupled inorganic carbon transporters that actively accumulate bicarbonate in the cytosol and (II) a proteinaceous bacterial organelle called the carboxysome, which coencapsulates Rubisco and CA within a capsid-like protein shell (8–11). The accumulated HCO3− diffuses into the carboxysome, where CA rapidly converts it to CO2, while diffusion of CO2 out of the structure is likely restricted by the shell (12, 13). This produces a locally high CO2 concentration within the carboxysome and enables efficient carboxylation by Rubisco (14).
Microbiological and biochemical studies show that there are, in fact, two types of carboxysomes, which evolved convergently (8, 15): the α-type, found in oceanic cyanobacteria and proteobacteria (Fig. 1A), and the β-type, found in freshwater cyanobacteria. In both types, CA localization in the carboxysome is essential for growth at present-day atmospheric CO2 concentrations (16–18). In contrast, CA activity in the cytosol has been shown to short-circuit the CCM, leading to high-CO2-requiring phenotypes (19). Efficient encapsulation and regulation of CA activity are thus crucial for cell survival. All α-carboxysomes contain a β-CA, CsoSCA (20–23), whereas β-carboxysomes contain either an active γ-CA domain on the scaffolding protein CcmM (24, 25) or a β-CA named CcaA (26). While the mechanism of CA incorporation into the β-carboxysome is understood (27, 28), it is unknown how CsoSCA is incorporated into the α-carboxysome.
CsoSCA belongs to its own subclass of β-CAs and uniquely consists of three domains: an N-terminal domain, a middle (catalytic) domain, and a C-terminal domain (Fig. 1B) (22, 29). X-ray structural analysis has shown that the catalytic domain contains the zinc-binding site as well as the catalytic residues essential for CA activity. The C-terminal domain appears to have arisen from an ancient duplication of the catalytic domain but lacks the zinc-binding residues. The N-terminal domain (NTD) consists of an unstructured N-terminal peptide followed by a ~100-residue α-helical domain that lacks homology to any other known protein. The function of this domain is unknown, and it has been speculated to be involved in the encapsulation process (22, 29).
Fig. 1. An intrinsically disordered, poorly conserved N-terminal peptide is essential and sufficient for CsoSCA encapsulation. (A) Schematic of the cso (carboxysome) operon in H. neapolitanus. The 10-gene set consists of the Rubisco large and small subunits, the scaffolding protein CsoS2, the carbonic anhydrase CsoSCA, and six shell proteins (CsoS4A/B, CsoS1A/B/C, and CsoS1D). In the native organism, CsoS1D is transcribed from an adjacent locus, while in the synthetic pHnCB10 plasmid, all genes are in a single operon. (B) Surface representation of the structure of the CsoSCA dimer from H. neapolitanus (PDB: 2FGY). The N-terminal domain (dark green) consists of a ~50-aa-long unstructured peptide followed by a folded α-helical domain of unknown function. The middle domain (light green) contains the active site. The C-terminal domain (white) appears to be a gene duplication of the catalytic domain but lacks essential active-site residues. (C) Maximum-likelihood phylogenetic tree of CsoSCA. Cyanobacterial homologs are colored green and proteobacterial homologs in an orange/brown gradient. Scale bar, 0.
Here, we used biolayer interferometry (BLI) to screen CsoSCA for binding to α-carboxysome proteins and identified Rubisco as its interaction partner. We show that both the Rubisco interaction and encapsulation into carboxysomes depend on CsoSCA's unique, intrinsically disordered N-terminal peptide. Using this peptide, we targeted foreign cargo into the carboxysome, demonstrating that this sequence is sufficient for encapsulation. We further determined a 1.98-Å single-particle cryo-electron microscopy (cryo-EM) structure of Rubisco in complex with the NTD peptide. The structure reveals that the peptide binds Rubisco at a site overlapping with a recently identified site that targets Rubisco to the α-carboxysome via interaction with the carboxysomal scaffolding protein CsoS2 (30). Thus, our work identifies a carbonic anhydrase–Rubisco supercomplex inside the α-carboxysome and highlights a surprising flexibility in the protein–protein interactions that lead to α-carboxysome self-assembly.

Results
CsoSCA's N terminus Is Necessary and Sufficient for Encapsulation.

To identify putative mechanisms for encapsulation of CsoSCA into the carboxysome, we began with a bioinformatic examination of the CsoSCA protein. Phylogenetic analysis revealed that CsoSCA sequences from cyanobacteria and proteobacteria cluster into two separate subfamilies (Fig. 1C, SI Appendix, Fig. S1, and Dataset S1). The cyanobacterial subfamily divides into two clusters. The proteobacterial subfamily is more diverse and has three distinct clusters, including a transition cluster more closely related to the cyanobacterial subfamily. A multiple sequence alignment (MSA) and the calculated conservation scores reveal a poorly conserved N-terminal region of the NTD, while the rest of the protein is highly conserved (Fig. 1D). In our model organism, the γ-proteobacterium H. neapolitanus, sequence conservation of CsoSCA starts at residue H51. Disorder predictions for representatives of the different clusters of the phylogenetic tree indicate that this ~30- to 130-amino-acid-long N-terminal region is disordered (Fig. 1D). Even though its primary sequence is not conserved, its presence in all species suggests a function.
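The conservation score in Fig. 1D is computed from the MSA by the method of reference (53), whose exact formula is not given here. As a stand-in, the minimal Python sketch below uses a common, simpler score (one minus the normalized Shannon entropy of each alignment column, down-weighted by the fraction of gaps) to show how a per-position profile can separate a variable N-terminal region from a conserved core. The toy alignment is hypothetical.

```python
import math
from collections import Counter

N_AMINO_ACIDS = 20

def column_conservation(column: str, gap: str = "-") -> float:
    """Score one MSA column: 1 - normalized Shannon entropy, down-weighted by gaps.

    1.0 = every sequence has the same residue and no gaps; values near 0 =
    residues spread evenly across the 20 amino acids, or a mostly gapped column.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    non_gap_fraction = n / len(column)
    return (1.0 - entropy / math.log(N_AMINO_ACIDS)) * non_gap_fraction

def conservation_profile(msa: list[str]) -> list[float]:
    """Per-position conservation for a list of equal-length aligned sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must all have the same length")
    return [column_conservation("".join(seq[i] for seq in msa)) for i in range(length)]

# Hypothetical toy alignment: variable N-terminal columns, conserved C-terminal ones.
toy_msa = ["MSKTHAEL", "MAQ-HAEL", "MT--HAEL"]
print([round(s, 2) for s in conservation_profile(toy_msa)])
```

Entropy-based scores are only one family of conservation measures; a published score may additionally weight sequences or use amino acid similarity, so values from this sketch are not expected to reproduce Fig. 1D.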
To investigate the role of CsoSCA's unstructured NTD peptide, a csoSCA deletion strain of H. neapolitanus (ΔcsoSCA) was complemented with a version of CsoSCA lacking its first 49 residues (ΔNTD1–49 CsoSCA). The ΔNTD1–49 CsoSCA strain failed to grow in air (Fig. 1E), whereas complementation with full-length CsoSCA rescued growth, indicating an essential role for CsoSCA's NTD. Synthetic carboxysome operons containing truncated CsoSCAs (ΔNTD1–37 CsoSCA and ΔNTD1–49 CsoSCA) were heterologously expressed in E. coli. Western blot analysis of enriched carboxysome fractions (Fig. 1F) showed that neither construct produced carboxysomes containing CsoSCA, suggesting that the growth defect observed in the phenotyping experiment is due to an inability to encapsulate CsoSCA into the carboxysome when the N-terminal peptide (NTD1–49 peptide) is removed.
To further test the role of the CsoSCA NTD in encapsulation, we sought to target foreign cargo into the carboxysome by fusing it to peptides derived from the NTD. Monomeric superfolder GFP (sfGFP) fused to either the first 37 (NTD1–37-sfGFP) or the first 53 (NTD1–53-sfGFP) residues of CsoSCA was coexpressed with synthetic carboxysomes in E. coli, and carboxysomes were purified to assess GFP encapsulation. Both NTD1–37-sfGFP and NTD1–53-sfGFP produced green fluorescent carboxysomes containing the fusion protein (Fig. 1G), whereas a negative control did not. Normalized for expression levels, the encapsulation efficiencies (sfGFP fluorescence of carboxysome/lysate) were ~3% for NTD1–37-sfGFP, ~15% for NTD1–53-sfGFP, and ~23% for CsoSCA-sfGFP (Dataset S2). This suggests that sequence elements absent from NTD1–37-sfGFP may be needed for efficient encapsulation. Shell and Rubisco protein levels were the same for all carboxysome constructs and hence should not influence the efficiency of NTD encapsulation (Dataset S2). In summary, these results demonstrate that the N terminus is necessary and sufficient for encapsulating CsoSCA into the carboxysome.
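The encapsulation efficiency above is a ratio of two fluorescence readings. The short Python sketch below assumes, as the parenthetical suggests, that efficiency is carboxysome-fraction fluorescence divided by lysate fluorescence, which normalizes for differences in expression between constructs. The readings are invented for illustration and are not the measured data in Dataset S2.

```python
def encapsulation_efficiency(carboxysome_signal: float, lysate_signal: float) -> float:
    """Fraction of the lysate sfGFP fluorescence recovered in the purified carboxysome fraction."""
    if lysate_signal <= 0:
        raise ValueError("lysate fluorescence must be positive")
    return carboxysome_signal / lysate_signal

# Invented readings in arbitrary fluorescence units, for illustration only.
readings = {
    "construct A": (30.0, 1000.0),
    "construct B": (150.0, 1000.0),
}
for name, (carboxysome, lysate) in readings.items():
    print(f"{name}: {encapsulation_efficiency(carboxysome, lysate):.0%}")
# construct A: 3%
# construct B: 15%
```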
CsoSCA Interacts with Rubisco.
Previous immunogold-labeling EM and biochemical assays (freeze/thaw treatment of carboxysomes) have suggested that CsoSCA may associate with the shell, but no specific interactions have been described (20, 21). Thus, to identify CsoSCA's interaction partner, we used BLI to measure binding of untagged CsoSCA to most carboxysome proteins: the shell proteins CsoS1A, CsoS1B, CsoS1D, and CsoS4B; the scaffolding protein CsoS2B; and Rubisco. This screen showed that CsoSCA interacted with Rubisco, while none of the other carboxysome proteins showed detectable binding above background (Fig. 2A). An N-terminally truncated variant (ΔNTD1–37 CsoSCA) did not bind Rubisco, confirming the NTD's involvement in the interaction (SI Appendix, Fig. S2A). Native PAGE confirmed binding between CsoSCA and Rubisco and the lack of binding to the major shell proteins CsoS1A and CsoS1B (SI Appendix, Fig. S2B). Finally, coelution of Rubisco and NTD1–53-sfGFP in size exclusion chromatography confirmed the interaction in a solution-based assay (SI Appendix, Fig. S2C).
The concentration dependence of CsoSCA binding to Rubisco was confirmed by BLI, and the KD of the interaction was determined to be 1.2 ± 0.1 nM (kon = 2.5 × 10⁵ M⁻¹ s⁻¹, koff = 2.9 × 10⁻⁴ s⁻¹) (Fig. 2B and Table 1). Because WT CsoSCA was unstable, the KD was measured with a CsoSCA-MBP fusion [the negative control, MBP alone, did not bind Rubisco (SI Appendix, Fig. S2D)]. Using the stopped-flow Khalifah/pH-indicator assay (31), CsoSCA-MBP was confirmed to be catalytically active (SI Appendix, Fig. S2G), indicating that the fusion protein retains catalytic function.
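For a simple bimolecular interaction, the equilibrium dissociation constant follows from the fitted rate constants as KD = koff/kon. The minimal Python sketch below checks that the reported rate constants give the reported KD and, as an addition not stated in the text, converts koff into a complex half-life under the assumption of first-order dissociation.

```python
import math

def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant K_D = k_off / k_on (in M)."""
    return k_off / k_on

def complex_half_life(k_off: float) -> float:
    """Half-life (s) of the bound complex, assuming first-order dissociation."""
    return math.log(2) / k_off

k_on = 2.5e5    # M^-1 s^-1, CsoSCA-MBP binding Rubisco (BLI, from the text)
k_off = 2.9e-4  # s^-1 (from the text)

kd = dissociation_constant(k_on, k_off)
print(f"K_D = {kd * 1e9:.2f} nM")                          # K_D = 1.16 nM
print(f"t1/2 = {complex_half_life(k_off) / 60:.0f} min")   # t1/2 = 40 min
```

The 1.16 nM result rounds to the reported 1.2 nM; the half-life is only meaningful if dissociation is single-exponential, which a multivalent binder need not obey.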
Although a mutated variant of CsoSCA crystallized as a dimer (Fig. 1B) [a Y92H mutation that introduces an additional Zn²⁺-binding site (22)], CsoSCA-MBP eluted from size exclusion chromatography at an estimated molecular weight of 612 kDa, suggesting a hexameric state (SI Appendix, Fig. S2 E and F). Given this discrepancy, the present data cannot establish whether WT CsoSCA is dimeric or hexameric. Binding of Rubisco to NTD1–53-sfGFP further confirmed the NTD1–53 peptide interaction (KD1 = 30 nM, KD2 = 80 nM; fit to a 1:2 model) (Fig. 2C, Table 1, and SI Appendix, Fig. S2G). The 25-fold lower KD (higher affinity) of full-length CsoSCA compared with NTD1–53-sfGFP is mainly an effect of a slower off-rate. Because CsoSCA is multivalent (dimeric or hexameric) while NTD1–53-sfGFP is monomeric, this points to the importance of multivalency in obtaining a high-affinity interaction with Rubisco.
An emerging theme of CCM self-assembly is that Rubisco interacts with various CCM proteins via short linear motifs (SLiMs) found in intrinsically disordered proteins or regions (IDPs/IDRs) (30, 32, 33). Since CsoSCA's NTD binds Rubisco (Fig. 2C and SI Appendix, Fig. S3), we next sought to determine the structure of Rubisco in complex with this peptide using cryo-EM. To promote high occupancy of the available binding sites, Rubisco was complexed with an excess of NTD peptide and imaged as described in Materials and Methods. These data yielded two slightly distinct single-particle reconstructions of Rubisco bound to a peptide corresponding to the first 50 residues of CsoSCA (NTD1–50) at 1.98 Å (State-1) and 2.07 Å (State-2) nominal resolution (SI Appendix, Figs. S4 and S5 and Table S1). Of note, both reconstructions show densities for ordered waters and alternate side-chain conformers (SI Appendix, Fig. S5). The two conformations are highly similar (RMSD: 0.22 Å), with the most notable differences occurring in the β-sheet of the N-terminal domain of CbbL, in particular in the conformations of loops P37-D42 and G115-G125 (SI Appendix, Fig. S6). Since both reconstructions display similar density for the NTD1–50 peptide, we attribute the difference in conformation not to the presence of the peptide but rather to subtle "breathing" of Rubisco. For clarity, we focus our discussion predominantly on the higher resolution 1.98-Å structure (State-1).
In the cryo-EM reconstruction of the Rubisco-NTD1–50 complex, density corresponding to NTD1–50 is located in a groove formed at the interface of two CbbL subunits (from two different CbbL2 dimers) and one CbbS (Fig. 3). The biological assembly of Rubisco is CbbL8S8, resulting in eight of these binding sites per Rubisco oligomer. The peptide density is of marginally lower quality (local resolution estimate: 2.1 to 2.2 Å) than the surrounding Rubisco density. Nevertheless, we could confidently assign the density to nine residues of the NTD1–50 peptide starting at P22 (PRLDLIEQA) (Fig. 3 B and C). The structure of the Rubisco-NTD1–50 complex is highly similar to a previous crystal structure of Rubisco (PDB: 1SVD, RMSD: 0.45 Å; SI Appendix, Fig. S7), indicating that binding of the NTD1–50 peptide does not induce large-scale conformational changes in Rubisco.
The resolved region of the peptide starts at the bottom of the CbbS subunit (around loop D94-S99), runs downward within the groove between the two CbbL subunits, and ends between β-strand S345-I347 in CbbLB and loop P19-I29 in CbbLA, resulting in a buried interface of approximately 700 Å². This short stretch of sequence is predicted by JPred to form an α-helix (SI Appendix, Fig. S8A). Indeed, we observe this segment to form a single helical turn followed by an extended coil (Fig. 3 B and C). The sharp backbone turn introduced by P22 (the first observed residue of the bound peptide) ensures that the upstream peptide chain points outward toward the solvent instead of clashing with CbbL.

Binding Is Predominantly Mediated by a Network of Hydrogen Bonds.
Our atomic model of the Rubisco-NTD1–50 complex indicates that the interaction between the NTD1–50 peptide and Rubisco is largely mediated by polar interactions, predominantly hydrogen bonds. R23 forms an extensive network of interactions with the neighboring CbbL subunits. The side chain of R23 forms a salt bridge with D99 (CbbLA) and hydrogen bonds to the hydroxyl groups of Y72 (CbbLA) and S345 (CbbLB), as well as to the carbonyl of G362 (CbbLB) (Fig. 3E). The L24 amide and L26 carbonyl hydrogen bond to the carbonyl of CbbS Y96 and the amide of CbbLB F346, respectively (Fig. 3 D and F).
A water-mediated hydrogen-bonding network likely also contributes to peptide binding (SI Appendix, Fig. S9). This putative network is built predominantly from backbone–water interactions and comprises interactions between N29 and CbbLA Y72 (SI Appendix, Fig. S9 A and B), R23 and CbbLB S345, and L26 and CbbLB F346 (SI Appendix, Fig. S9C). However, the two waters mediating the interactions between the peptide and CbbLB are not resolved in the State-2 structure (SI Appendix, Fig. S9D), possibly because of its slightly lower resolution. Rubisco residues interacting with CsoSCA have high conservation scores among α-carboxysomal Rubiscos but are generally not conserved in β-carboxysomal Rubiscos (SI Appendix, Fig. S8B).
To determine the relative importance of the different interactions, we measured binding kinetics for a selected set of point mutations in both the NTD peptide and Rubisco (Table 1, Dataset S3, and SI Appendix, Fig. S10). The P22A mutation resulted in a dramatic loss of binding. While a protonated CbbS D94 could potentially hydrogen bond with the amide of P22, this large effect is more likely due to P22's importance in establishing the initial α-helical backbone conformation of the peptide or the sharp backbone turn that is essential for binding.
Despite the many interactions made by the buried peptide residue R23, mutating this residue to alanine yielded roughly the same KD value as the wild type. However, mutation of the CbbLA residues that interact with R23 (Y72A and D99A) resulted in a 10-fold increase in KD (mainly an effect of a slower on-rate). These results are consistent with a low net contribution of R23's interactions to binding; nevertheless, this residue adversely affects binding when these interactions are not satisfied in a buried conformation. In this context, R23 may play a role in establishing the specificity of the interaction between Rubisco and CsoSCA.
The remaining hydrogen bonds between the NTD peptide and Rubisco are mediated by backbone moieties. Furthermore, the peptide is tightly packed into the cleft formed by Rubisco, forming a buried interface of 700 Å² out of the peptide's 1,200-Å² solvent-accessible surface area. Thus, given the strong negative effect of the P22A mutation on binding, shape complementarity between the peptide and Rubisco appears critical to enable the extensive backbone-mediated hydrogen bonds and van der Waals interactions that drive binding.
CsoSCA Binds at the Same Site as CsoS2.
α-Carboxysome assembly is mediated by a repetitive, disordered protein, CsoS2, which is thought to bind both Rubisco and shell proteins, thus serving as a physical scaffold bridging these two major components. We previously solved the structure of Rubisco in complex with an N-terminal peptide derived from CsoS2 (CsoS2-N*) (30). Surprisingly, CsoSCA and CsoS2 bind at nearly the same location on Rubisco but use substantially different SLiMs and binding modes (Fig. 4 A and B).
CsoS2-N* is largely α-helical and binds Rubisco by spanning the CbbL2 dimer interface, lying on top of the protein surface (Fig. 4B). This complex is highly dependent on salt bridges and cation–π interactions. In contrast, the CsoSCA peptide binds in a conformation rotated by roughly 45° and has a greater fractional buried surface area for the observed peptide (approximately 700 Å² of 1,200 Å² solvent-accessible surface area, compared with 830 Å² of 2,500 Å²). The CsoSCA peptide is buried deeper in the groove between the two CbbL subunits and interacts mainly via hydrogen bonds and what appears to be an ordered network of water molecules. Notably, both peptides make significant interactions with Rubisco CbbL Y72 (Fig. 4B). This residue is conserved in α-carboxysomal Rubisco but not in Rubisco from β-carboxysomes or in the Form II Rubisco of H. neapolitanus, and likely contributes to specificity. Both proteins interact with Y72 via arginines; however, in CsoS2-N*, R10 is cation–π stacked between CbbL Y72 and CbbS Y96, while in CsoSCA, R23 is positioned deeper in the structure and interacts with CbbL Y72 and D99. Another notable feature is CbbS Y96, which in the Rubisco-CsoS2 structure is flipped ~90° relative to the wild-type and Rubisco-CsoSCA structures (Fig. 4B), covering the groove at the CbbL subunit interface and enabling the conformation necessary for the cation–π interaction.
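The comparison of buried surface area above is a statement about fractions, not absolute areas: CsoS2-N* buries more total area, but a smaller share of its own surface. The short Python sketch below makes that arithmetic explicit using only the areas quoted in the text.

```python
def buried_fraction(buried_area: float, total_sasa: float) -> float:
    """Fraction of a peptide's solvent-accessible surface area buried on binding."""
    if total_sasa <= 0:
        raise ValueError("solvent-accessible surface area must be positive")
    return buried_area / total_sasa

peptides = {
    "CsoSCA NTD1-50": (700.0, 1200.0),  # buried, total SASA in Å^2 (from the text)
    "CsoS2-N*": (830.0, 2500.0),
}
for name, (buried, total) in peptides.items():
    print(f"{name}: {buried_fraction(buried, total):.0%} buried")
# CsoSCA NTD1-50: 58% buried
# CsoS2-N*: 33% buried
```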
Together, these structural features suggest that CsoSCA and CsoS2 cannot bind the same Rubisco site at the same time. We previously developed a Rubisco-CsoS2-N* fusion in which all such binding sites are occupied owing to the high local concentration of the CsoS2-N* peptide. As expected, BLI measurements indicated that NTD1–53-sfGFP cannot bind Rubisco when CsoS2-N* is already present, confirming that CsoS2 and CsoSCA compete for the same binding site (Fig. 4C). Earlier experiments from our group demonstrated in vitro condensate formation between Rubisco and CsoS2-NTD, suggesting that assembly of the α-carboxysome occurs through a condensation-like event (30). Here, we extended these experiments to include CsoSCA. The results show that CsoSCA is recruited into phase-separated Rubisco-CsoS2-NTD condensates (Fig. 4D) and demonstrate that all three proteins can simultaneously participate in such a protein interaction network.

Discussion
In this study, we have determined the structural basis for carbonic anhydrase encapsulation in α-carboxysomes. We found that in the model organism H. neapolitanus, CsoSCA and Rubisco form a supercomplex. Through biophysical measurements and in vivo experiments, we found that formation of this complex depends on the intrinsically disordered N-terminal peptide of CsoSCA. The cryo-EM structure of Rubisco in complex with this peptide reveals that CsoSCA binds Rubisco at a site overlapping with that of the scaffolding protein CsoS2. Beyond its enzymatic activity, this establishes an additional function for Rubisco as an interaction hub in the assembly of the α-carboxysome. The intrinsically disordered, repetitive protein CsoS2 acts as a scaffold between the shell and Rubisco and orchestrates assembly of the α-carboxysome (34, 35). We previously discovered that a conserved SLiM, repeated four times in the N-terminal portion of CsoS2, binds Rubisco and is essential for carboxysome formation (30). Here, we demonstrate that the carboxysomal carbonic anhydrase (CsoSCA) is recruited to the carboxysome via association with Rubisco. We further show that CsoSCA's N-terminal targeting peptide and CsoS2 bind at the same site on Rubisco. Fig. 5A presents our current model of α-carboxysome assembly.
Previous work has shown that, on average, a typical H. neapolitanus carboxysome contains 450 copies of Rubisco, 440 copies of CsoS2, and 60 copies of CsoSCA (Fig. 5B) (36). The roughly 1,800 CsoS2 and 120 CsoSCA Rubisco-binding motifs per carboxysome set an upper bound on the occupancy of Rubisco's binding sites (~3,600 sites per carboxysome). Assuming all CsoS2 and CsoSCA motifs engage in binding, roughly 50% of Rubisco sites would be occupied by CsoS2 and considerably fewer, ~3.3%, by CsoSCA. Although this calculation assumes that all Rubisco sites are accessible and that all CsoS2 and CsoSCA motifs bind, and either assumption could be incorrect, it indicates that there is likely a surplus of Rubisco sites available for binding. Our finding that CsoSCA is recruited to Rubisco-CsoS2-NTD condensates supports the hypothesis that this ternary complex is an important feature of cargo assembly in α-carboxysomes in vivo (Fig. 4D and SI Appendix, Fig. S11). Further, CsoSCA mRNA levels have previously been shown to be lower than those of other carboxysome genes (37), suggesting that the amount of encapsulated CsoSCA is likely regulated by protein expression level rather than by competition with CsoS2 for binding-site occupancy. Because CA activity outside the carboxysome must be tightly restricted, efficient encapsulation is vital (19), and a scenario in which CsoSCA had to compete for binding could pose a physiological problem.
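The occupancy estimate above can be reproduced from the copy numbers in Fig. 5B. The Python sketch below assumes, as the text does, that every Rubisco site is accessible and every motif is bound; it also assumes two Rubisco-binding motifs per CsoSCA copy, as implied by the figure of 120 motifs for 60 copies. With 440 CsoS2 copies × 4 motifs the exact count is 1,760, which the text rounds to ~1,800 (hence "roughly 50%").

```python
SITES_PER_RUBISCO = 8  # CbbL8S8 has eight CbbL-CbbL-CbbS binding grooves

def site_occupancy(copies: dict[str, int], motifs_per_copy: dict[str, int]) -> dict[str, float]:
    """Fraction of Rubisco sites each binder would fill if every motif were bound."""
    total_sites = copies["Rubisco"] * SITES_PER_RUBISCO
    return {
        name: copies[name] * n_motifs / total_sites
        for name, n_motifs in motifs_per_copy.items()
    }

copies = {"Rubisco": 450, "CsoS2": 440, "CsoSCA": 60}  # per carboxysome (36)
# CsoS2: four Rubisco-binding SLiM repeats per protein.
# CsoSCA: ASSUMPTION of two NTD motifs per copy, as implied by the 120-motif figure.
motifs_per_copy = {"CsoS2": 4, "CsoSCA": 2}

occupancy = site_occupancy(copies, motifs_per_copy)
free = 1 - sum(occupancy.values())
for name, frac in occupancy.items():
    print(f"{name}: {frac:.1%}")
print(f"unoccupied: {free:.1%}")
# CsoS2: 48.9%
# CsoSCA: 3.3%
# unoccupied: 47.8%
```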
One specific unknown is the importance of the multivalency conferred by CsoSCA's oligomeric structure and the resulting mode of interaction with Rubisco. The significantly slower dissociation rate of full-length CsoSCA (multivalent) compared with the NTD1–53 peptide (monovalent) implies that multivalent protein–protein interactions are important (Table 1 and Fig. 2 B and C), a feature commonly observed for other IDPs/IDRs involved in phase separation (38). In terms of binding mode, bivalent binding of full-length CsoSCA could occur either between two binding sites on the same Rubisco or between sites on two different Rubisco molecules. The relatively short stretch of IDR sequence before the Rubisco-binding motif and the rigidity of the folded domains presumably constrain the possible binding conformations in which CsoSCA binds on top of or on the side of Rubisco in a 1:1 CsoSCA-Rubisco complex (SI Appendix, Fig. S12). Alternatively, CsoSCA could cross-link two Rubisco molecules (SI Appendix, Fig. S12). Further, previous experiments have indicated that CsoSCA is localized to the shell (20, 21). Although our data suggest that CsoSCA's primary interaction is with Rubisco, and thus that it would likely be found throughout the carboxysome, additional unknown protein interactions could bias CsoSCA localization toward the shell. Recent cryo-electron tomography studies have been unable to unambiguously locate CsoSCA inside the carboxysome (39, 40). However, rapid advances in this technique will likely soon make it possible to determine the binding conformation and localization of all such components in situ.
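To give a feel for what the affinity gain from multivalency means in practice, the Python sketch below compares the two reported dissociation constants (1.2 nM for CsoSCA-MBP and 30 nM, the first-site KD1, for NTD1–53-sfGFP) and computes fractional occupancy with a simple 1:1 binding isotherm. Treating both binders as apparent 1:1 interactions with free ligand in excess is a simplifying assumption, and the 10 nM concentration is hypothetical.

```python
def fold_difference(kd_weak_nm: float, kd_tight_nm: float) -> float:
    """Ratio of two dissociation constants: how many times tighter the second binder is."""
    return kd_weak_nm / kd_tight_nm

def fraction_bound(ligand_nm: float, kd_nm: float) -> float:
    """Fractional site occupancy for simple 1:1 binding with free ligand in excess."""
    return ligand_nm / (ligand_nm + kd_nm)

KD_FULL_LENGTH_NM = 1.2  # CsoSCA-MBP (multivalent), from the text
KD_PEPTIDE_NM = 30.0     # NTD1-53-sfGFP (monovalent), KD1 from the text

print(f"{fold_difference(KD_PEPTIDE_NM, KD_FULL_LENGTH_NM):.0f}-fold")  # 25-fold
# Hypothetical ligand concentration, chosen only for illustration.
for name, kd in [("full-length", KD_FULL_LENGTH_NM), ("NTD1-53", KD_PEPTIDE_NM)]:
    print(f"{name}: {fraction_bound(10.0, kd):.2f} of sites occupied at 10 nM")
# full-length: 0.89 of sites occupied at 10 nM
# NTD1-53: 0.25 of sites occupied at 10 nM
```

An apparent KD for a multivalent binder lumps avidity into one number; resolving intra- versus inter-Rubisco bridging would require an explicit multisite model.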
Plasticity of the Rubisco Binding Motif.
We could not identify a consensus Rubisco-binding SLiM across the NTD sequences of CsoSCA homologs. The binding element identified in H. neapolitanus CsoSCA (PRLDLIEQA) is present in its most closely related homolog (Halothiobacillus sp. LS2) but is not conserved across species (further discussed in SI Appendix, Fig. S13 A and B). Many prolines are followed by R or xR, but overall, the proteobacterial clade contains no convincing conserved motifs. In the cyanobacterial clade, a PTAPx[R/K]R motif is present in 87% of the sequences, suggesting a possible binding motif among cyanobacteria.
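A statement such as "present in 87% of the sequences" is the fraction of sequences containing at least one match to a motif. The Python sketch below shows one way to scan for a known motif written in x-notation (x = any residue, [R/K] = R or K) by translating it into a regular expression. This only scans for a given pattern; it is not the motif-discovery algorithm of the MEME Suite. The example sequences are hypothetical fragments, not real CsoSCA sequences.

```python
import re

def motif_to_regex(motif: str) -> re.Pattern:
    """Translate x-notation into a regex: x = any residue, [R/K] = R or K."""
    return re.compile(motif.replace("/", "").replace("x", "."))

def fraction_with_motif(sequences: list[str], motif: str) -> float:
    """Fraction of sequences containing at least one match to the motif."""
    if not sequences:
        raise ValueError("need at least one sequence")
    pattern = motif_to_regex(motif)
    return sum(1 for seq in sequences if pattern.search(seq)) / len(sequences)

# Hypothetical N-terminal fragments, not real CsoSCA sequences.
example_ntds = ["MSPTAPARRLE", "MAPTAPQKRAS", "MSNQLDAPRRV"]
print(motif_to_regex("PTAPx[R/K]R").pattern)                       # PTAP.[RK]R
print(f"{fraction_with_motif(example_ntds, 'PTAPx[R/K]R'):.0%}")  # 67%
```

The same translation works for other motifs written in this notation, as long as residues are upper case and x is lower case.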
Surprisingly, a handful of cyanobacterial CsoSCA sequences from the genus Prochlorococcus contain the Rubisco-binding motif found in CsoS2 (RxxxxxRRxxxxxxGK) (SI Appendix, Fig. S13A), suggesting an evolutionary relationship. The lack of a consensus motif in CsoSCA homologs, together with the fact that CsoSCA and CsoS2 bind at the same site but via different mechanisms, reveals an evolutionary plasticity in SLiM sequence space. Across the various microbes in this phylogeny, we hypothesize that CA is recruited to its respective α-carboxysome via the observed, versatile Rubisco binding site, using diverse SLiM sequences.
Fig. 5. Updated model for carboxysome assembly. (A) Schematic model of α-carboxysome assembly in which CsoSCA is recruited to the carboxysome via interactions with Rubisco. The model involves 1) initial molecular associations (specific order not known), 2) cargo nucleation, 3) cargo growth with local phase separation, and finally 4) shell closure, forming fully assembled carboxysomes. Current knowledge does not allow us to distinguish whether CsoSCA associates with Rubisco during the initial association (step 1), in the phase-separated condensate (step 3), or both. CsoSCA is depicted as a dimer; however, present data cannot establish whether CsoSCA is dimeric or hexameric. The fully assembled carboxysome in step 4 is depicted with stoichiometrically accurate numbers of cargo proteins. (B) Average number of cargo proteins present in an α-carboxysome (36), number of binding sites per oligomeric form of each cargo protein, and total number of binding sites per carboxysome.
Rubisco as an Interaction Hub in Biophysical CCMs.
Recent work on both bacterial carboxysomes and algal pyrenoids suggests that Rubisco itself acts as an interaction hub in the ultrastructural organization of CCMs. It is now clear that Rubisco not only interacts with scaffolding proteins as a means to condense Rubisco and form these confined CO2-fixing organelles but also recruits auxiliary proteins, such as CAs and activases, needed for the CCM to function. In the α-carboxysome lineage, we have demonstrated that Rubisco binds the intrinsically disordered proteins CsoS2 (30) and CsoSCA. Rubisco also binds the Rubisco activase CbbQO, likely via CbbO's von Willebrand factor A domain (41, 42). β-Carboxysomal Rubisco binds its interaction partners, the scaffolding protein CcmM (43, 44) and the Rubisco activase Rca (45, 46), which bind via a folded domain resembling the Rubisco small subunit (small subunit-like domain, SSLD). Similar to CsoSCA, the β-carboxysomal CA, CcaA, is also recruited via a terminal peptide. However, instead of interacting directly with Rubisco, the two enzymes are linked via the scaffolding protein CcmM (28). This convergent function may have evolutionary significance: recent results suggest that Rubisco-CA colocalization was an important step in the evolution of biophysical CCMs (47, 48). Despite convergent evolution, a notable similarity between the two carboxysome lineages is the binding site on Rubisco. In known cases (except for CbbQO), the interactor binds at the same patch on Rubisco, contacting two different CbbL2 dimers and one CbbS. This likely ensures that binding partners interact only with fully assembled CbbL8S8 Rubiscos during the assembly process.
In contrast to these bacterial systems, in the model alga Chlamydomonas reinhardtii, a repeated SLiM in the disordered scaffolding protein EPYC1 is essential for pyrenoid formation (49, 50) and binds on top of the small subunit via salt bridges and a hydrophobic interface (32). The sequence motif is shared among many pyrenoid proteins, suggesting a mechanism for protein targeting as well as for organizing pyrenoid ultrastructure more broadly (33). The versatility in binding motif and binding site, together with the convergent function of diverse Rubiscos as interaction hubs, implies that this might be a general feature, which raises a final question: Do Form IB plant Rubiscos engage in similar protein–protein interactions, and do other Rubisco forms also function as interaction hubs?
In summary, this work advances our understanding of carboxysome biogenesis and highlights both the essential carbonic anhydrase and the role of Rubisco as a hub protein. These findings are relevant for engineering the carboxysome-based CCM into, e.g., crops and industrially relevant microorganisms for improved growth and yields. More broadly, we hope that the findings presented here will advance our understanding of bacterial microcompartments and promote development of their many potential biotechnological applications.

Materials and Methods
Bioinformatics.
Protein sequences assigned to CsoSCA (pfam08936) from all finished and permanent draft bacterial genomes available in the Integrated Microbial Genomes and Microbiomes database (51) were collected on December 12, 2019, and curated to include only proteins encoded in an α-carboxysome operon (containing CbbL/S, CsoS2, and shell hexamers and pentamers) (412 genes). Redundancy was then reduced by removing sequences with >98% identity using Jalview, and incomplete sequences were removed by manual curation, resulting in 222 sequences in the final CsoSCA dataset (Dataset S1). Sequences were aligned using MUSCLE (52), and the resulting MSA was used to calculate the conservation score (53). The tree was built using the IQ-TREE web server (54) and visualized using iTOL (55). Protein disorder was predicted for a subset of the dataset, including H. neapolitanus CsoSCA, using the DISOPRED3 algorithm (56). Conservation of the CsoSCA NTD Rubisco-binding motif was analyzed using the MEME Suite (57) (Dataset S1). MSAs of α-carboxysomal Form IA Rubiscos (135 cbbL and 132 cbbS sequences) and β-carboxysomal Form IB Rubiscos (211 cbbL and 207 cbbS sequences) were constructed (Dataset S1) using MUSCLE and visualized with WebLogo. Secondary structure of the NTD sequence was predicted using JPred4 (58).
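Redundancy reduction at a percent-identity cutoff can be pictured as a greedy filter over an alignment. The Python sketch below is a stand-in in the spirit of Jalview's redundancy removal, not Jalview's implementation: it assumes identity is computed over aligned columns where neither sequence has a gap, and it keeps the least-gapped (most complete) sequence of each near-identical group. Sequence names and strings are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def remove_redundant(aligned: dict[str, str], threshold: float = 98.0) -> list[str]:
    """Greedy filter: keep a sequence only if it is <= threshold % identical to all kept ones."""
    kept: list[str] = []
    # Fewest gaps first, so the most complete representative of a group is retained.
    order = sorted(aligned, key=lambda name: aligned[name].count("-"))
    for name in order:
        if all(percent_identity(aligned[name], aligned[k]) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical aligned sequences.
toy = {"seq_a": "MKVLA", "seq_b": "MK-LA", "seq_c": "MRVIA"}
print(remove_redundant(toy))  # ['seq_a', 'seq_c']
```

Greedy filtering is order dependent; sorting by completeness is one reasonable choice, and other tools may break ties differently.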
Protein Expression and Purification. Specifics regarding the E. coli strain, plasmid, expression conditions, and purification method for each protein in this study (CsoSCA variants, sfGFP fusions, Rubisco, shell proteins, and CsoS2) are provided in Dataset S4. Protocols for protein expression and purification are fully described in SI Appendix, Methods. In short, E. coli cells harboring the appropriate expression plasmids were grown at 37 °C in LB medium supplemented with the appropriate antibiotics. At OD600 = 0.4 to 0.6, expression was induced, and the cells were grown overnight at 18 °C. All Rubisco constructs were coexpressed with GroEL/ES. All CsoSCA variants, sfGFP fusions, shell proteins, and CsoS2 were purified via His-tag purification, and all Rubisco variants via Strep-tag purification. To obtain pure untagged CsoSCA, purified His-SUMO-CsoSCA was cleaved using Ulp protease. His-tag-purified CsoSCA-MBP was further cleaned up by size exclusion chromatography. See SI Appendix, Methods for full purification protocols, including purification columns and buffer conditions. Protein purity was assessed by SDS-PAGE and was generally >95%. For storage, proteins were adjusted to 10% (w/v) glycerol, flash-frozen in liquid nitrogen, and stored at −80 °C. The oligomeric state of CsoSCA-MBP was determined from the Superose 6 Increase chromatogram and a Gel Filtration Standard (Bio-Rad, #1511901).
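Estimating an oligomeric state from a gel filtration standard usually means fitting log10(molecular mass) against elution volume for the marker proteins and interpolating the sample peak. A minimal sketch follows; the elution volumes and the fusion's monomer mass in the usage example are hypothetical, and the marker masses (670, 158, 44, and 17 kDa) should be checked against the standard's documentation. The helper names are illustrative.

```python
import math


def fit_calibration(volumes_ml, masses_kda):
    """Least-squares fit of log10(mass) = slope * Ve + intercept for the markers."""
    n = len(volumes_ml)
    ys = [math.log10(m) for m in masses_kda]
    mean_x = sum(volumes_ml) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in volumes_ml)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(volumes_ml, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mass_kda(ve_ml, slope, intercept):
    """Interpolate the apparent mass (kDa) of a peak eluting at ve_ml."""
    return 10 ** (slope * ve_ml + intercept)


if __name__ == "__main__":
    # Marker masses (kDa) with HYPOTHETICAL elution volumes (mL).
    markers_kda = [670.0, 158.0, 44.0, 17.0]
    markers_ve = [13.0, 15.6, 17.4, 18.5]
    slope, intercept = fit_calibration(markers_ve, markers_kda)
    mass = apparent_mass_kda(14.5, slope, intercept)  # hypothetical sample peak
    monomer_kda = 100.0  # hypothetical monomer mass of the fusion protein
    print(f"apparent mass {mass:.0f} kDa, ~{mass / monomer_kda:.1f} monomers")
```

Because disordered or elongated proteins elute earlier than globular proteins of the same mass, an apparent mass from such a fit tends to overestimate the true mass for proteins with disordered regions.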

Generation of H. neapolitanus ΔcsoSCA and of WTcsoSCA and ΔNTD1-49 csoSCA Mutant Complementations. csoSCA was knocked out by insertion of a spectinomycin cassette. CsoSCA mutant complementations (ΔcsoSCA+WTcsoSCA; ΔcsoSCA+ΔNTD1-49 csoSCA) were genomically integrated into the H. neapolitanus NS2 neutral site.
H. neapolitanus Growth Assays. Precultures of WT H. neapolitanus and H. neapolitanus ΔcsoSCA were grown in DSMZ68 medium at 5% CO2 supplemented with the appropriate antibiotics. To induce CA expression, 1 μM IPTG was added to ΔcsoSCA transformed with wild-type csoSCA or the N-terminal truncation ΔNTD1-49 csoSCA. Upon reaching log phase, cultures were spun down, washed twice, and then serially diluted in 10-fold steps from an OD600 of 10⁻¹ to 10⁻⁸. The resulting dilutions were spotted onto plates and grown in 5% CO2 and in ambient air; strains expressing complemented WTcsoSCA or ΔNTD1-49 csoSCA were spotted onto plates containing 1 μM IPTG. Strains were allowed to grow for 4 d. All strains were plated in biological and technical triplicate. Protocols for H. neapolitanus genomic modifications and growth assays are fully described in SI Appendix, Methods.
Final carboxysome samples and the lysate were analyzed for the presence of CsoSCA or the sfGFP fusion protein by SDS-PAGE (4 to 20% Mini-PROTEAN® TGX™ Precast Protein Gels, Bio-Rad), western blot, and GFP fluorescence. For western blotting, proteins from SDS-PAGE gels were transferred to nitrocellulose membranes using the Trans-Blot Turbo system (Bio-Rad). Membranes were blocked with 5% (w/v) nonfat dry milk in phosphate-buffered saline (PBS) containing 0.1% (v/v) Triton X-100 for 1 h at room temperature. The Flag tag was immunolabeled overnight at 4 °C in the same buffer containing a 1:5,000 dilution of a monoclonal horseradish peroxidase-conjugated anti-Flag antibody (Sigma). Membranes were washed 3 × 10 min with PBS containing 0.1% (v/v) Triton X-100, and blots were then developed using the SuperSignal West Pico Chemiluminescent Substrate (ThermoFisher) according to the manufacturer's instructions. Gels and western blots were imaged with the ChemiDoc™ XRS+ System (Bio-Rad). Fluorescence of sfGFP samples was quantified using an Infinite M-1000 plate reader (Tecan). Encapsulation efficiency was quantified as the ratio of sfGFP fluorescence in carboxysomes (encapsulated protein) to that in the lysate (expressed protein). The shell/Rubisco content ratio was quantified by densitometry, measuring the intensities of the CsoS1B and CbbS bands on SDS-PAGE gels using ImageJ.
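Both quantities described above reduce to simple quotients. The sketch below shows them; the optional blank and background terms are assumptions (the text does not say whether a buffer blank or local band background was subtracted), and the function names are illustrative.

```python
def encapsulation_efficiency(f_carboxysome, f_lysate, f_blank=0.0):
    """sfGFP fluorescence in purified carboxysomes divided by that in the lysate.

    f_blank is an optional buffer-only reading subtracted from both (an assumption).
    """
    lysate = f_lysate - f_blank
    if lysate <= 0:
        raise ValueError("lysate signal must exceed the blank")
    return (f_carboxysome - f_blank) / lysate


def shell_to_rubisco(csos1b_intensity, cbbs_intensity, csos1b_bg=0.0, cbbs_bg=0.0):
    """Densitometry ratio of the CsoS1B band to the CbbS band, with optional background terms."""
    return (csos1b_intensity - csos1b_bg) / (cbbs_intensity - cbbs_bg)
```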
Binding and kinetic constants were extracted using the Data Analysis HT 10.0.00.44 software in the ForteBio Octet package. NTD1-53-sfGFP vs. Rubisco data were fitted to a 1:2 (bivalent analyte) binding model, and Rubisco vs. CsoSCA data to a 1:1 binding model.
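For the 1:1 model, the association phase follows R(t) = Req(1 − exp(−kobs·t)) with kobs = ka·C + kd, dissociation follows R(t) = R0·exp(−kd·t), and KD = kd/ka. The Octet software fits such models by nonlinear regression; the sketch below instead uses the classical linearization (regressing kobs against analyte concentration), a simplified alternative rather than the package's algorithm. It does not cover the 1:2 bivalent analyte model, and the function names and example rate constants are hypothetical.

```python
import math


def association_response(t, conc_m, ka, kd, rmax):
    """1:1 Langmuir association: R(t) = Req * (1 - exp(-kobs * t))."""
    kobs = ka * conc_m + kd
    req = rmax * ka * conc_m / kobs  # equals Rmax * C / (C + KD)
    return req * (1.0 - math.exp(-kobs * t))


def dissociation_response(t, r0, kd):
    """1:1 dissociation from response r0: R(t) = r0 * exp(-kd * t)."""
    return r0 * math.exp(-kd * t)


def rates_from_kobs(concs_m, kobs_values):
    """Regress kobs = ka*C + kd; returns (ka [M^-1 s^-1], kd [s^-1], KD [M])."""
    n = len(concs_m)
    mean_c = sum(concs_m) / n
    mean_k = sum(kobs_values) / n
    scc = sum((c - mean_c) ** 2 for c in concs_m)
    sck = sum((c - mean_c) * (k - mean_k) for c, k in zip(concs_m, kobs_values))
    ka = sck / scc  # slope
    kd = mean_k - ka * mean_c  # intercept
    return ka, kd, kd / ka


if __name__ == "__main__":
    # Hypothetical kobs values generated with ka = 1e5 M^-1 s^-1 and kd = 1e-3 s^-1.
    concs = [1e-8, 5e-8, 1e-7]
    kobs = [1e5 * c + 1e-3 for c in concs]
    print(rates_from_kobs(concs, kobs))
```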

Size Exclusion Chromatography Analysis of the NTD1-53-sfGFP:Rubisco Cocomplex. Purified Rubisco-strep and NTD1-53-sfGFP samples were exchanged into moderate-salt buffer (20 mM Tris, pH 7.5, 150 mM NaCl) using Zeba desalting columns. For cocomplexing, the protein samples were mixed at a 32:1 molar ratio (NTD1-53-sfGFP:CbbL8S8) and incubated briefly on ice prior to injection onto a 3.2/300 Superose 6 Increase column equilibrated in 20 mM Tris, pH 7.5, and 150 mM NaCl at 4 °C. The column was eluted isocratically in the same buffer, with elution of total protein monitored by A280 and elution of sfGFP-containing fractions monitored by A485.
Native-PAGE Analysis of CsoSCA Binding. Binding of CsoSCA-MBP to Rubisco and shell proteins was analyzed by native-PAGE using 4 to 15% Mini-PROTEAN® TGX™ Precast Protein Gels (Bio-Rad). CsoSCA-MBP (2.5 μM) was mixed with 0.5 μM Rubisco or 5 μM CsoS1A and CsoS1B and incubated for 15 min at room temperature. The final buffer composition was 50 mM Tris and 150 mM NaCl, pH 7.5.
Carbonic Anhydrase Kinetics. CO2 hydration catalyzed by CsoSCA-MBP was measured using the Khalifah pH-indicator assay (31) on an Applied Photophysics SX20 stopped-flow spectrophotometer at 25 °C. A saturated CO2 solution (34 mM) was prepared by bubbling CO2 gas into Milli-Q water at 25 °C. To prevent CO2 from escaping, a gas-tight Hamilton syringe was used to transfer the solution into the stopped-flow drive syringe. MOPS and para-nitrophenol (pNP) were used as the buffer-indicator pair, and the change in pH over time was detected at 400 nm using a 1-cm pathlength. Final experimental conditions after mixing were 50 mM MOPS, pH 7.5, with ionic strength adjusted to 50 mM with Na2SO4, 50 µM pNP, 17 mM CO2, and 6 µM CsoSCA-MBP. Steady-state kinetics were measured over 0.02 to 0.5 s in eight replicates, and progress curves reaching equilibrium were measured for 60 s in triplicate.
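One way to turn stopped-flow progress curves into a catalytic efficiency is to treat each curve as pseudo-first-order, fit its rate constant, and subtract the uncatalyzed rate: kcat/KM = (kobs − kuncat)/[E]. This relation holds only if [CO2] is well below KM, and it assumes a no-enzyme control curve for kuncat, which the text does not describe. It is a sketch of the standard relation, not necessarily the analysis used here; the endpoint absorbance (A_inf) would come from the end of a curve that has reached equilibrium, and the function names are illustrative.

```python
import math


def first_order_rate(times_s, absorbance, a_inf):
    """Pseudo-first-order rate constant k (s^-1) from a progress curve.

    Fits ln|A(t) - A_inf| = ln|A0 - A_inf| - k*t by ordinary least squares.
    Points already at the endpoint (|A - A_inf| ~ 0) are skipped.
    """
    pts = [(t, math.log(abs(a - a_inf))) for t, a in zip(times_s, absorbance)
           if abs(a - a_inf) > 1e-9]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    stt = sum((t - mean_t) ** 2 for t, _ in pts)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return -sty / stt


def kcat_over_km(k_obs, k_uncat, enzyme_m):
    """(k_obs - k_uncat) / [E] in M^-1 s^-1; valid only when [CO2] << KM."""
    return (k_obs - k_uncat) / enzyme_m
```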
Cryo-EM of the CsoSCA-Rubisco Complex. First, 0.5 μM Rubisco-strep was mixed with 0.5 mM NTD1-50 peptide (CsoSCA residues 1-50) in 25 mM Tris, pH 7.5, 80 mM NaCl, 2% glycerol, incubated for 20 min at room temperature, and then stored on ice. Then, 3.5 μL of this sample was applied to freshly glow-discharged (PELCO easiGlow) Quantifoil R 1.2/1.3 200-mesh copper TEM grids (Quantifoil Microtools) and blotted for 3 s after a 30-s delay using a Mark IV Vitrobot (FEI) at 100% humidity and 4 °C before plunge-freezing in liquid ethane. The complex was imaged on a Talos Arctica (Thermo Fisher Scientific) operating at 200 kV and equipped with a K3 Summit direct electron detector (Gatan) in super-resolution CDS mode at 57,000× magnification, corresponding to a pixel size of 0.69 Å. In total, 5,742 movies were acquired with SerialEM using a defocus range of −0.6 to −1.8 μm and a 3 × 3 multishot image-shift pattern. Each movie consisted of 50 frames with a total dose of 50 e⁻/Å². Data collection was monitored using on-the-fly processing in cryoSPARC Live (Structura Biotechnology Inc., https://cryosparc.com/live) (60) to assess microscope performance, micrograph quality, and the orientation distribution of particles on the grid.
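The acquisition parameters above imply a few derived quantities used downstream: the dose per frame (and cumulative dose) that motion-correction dose weighting works from, and the physical pixel size. A minimal sketch, assuming a constant dose rate and the usual convention that a super-resolution pixel is half the physical pixel in each dimension; the constant and function names are illustrative.

```python
TOTAL_DOSE_E_PER_A2 = 50.0  # total dose per movie (from the text)
N_FRAMES = 50               # frames per movie (from the text)
SUPERRES_PIXEL_A = 0.69     # super-resolution pixel size in Å (from the text)


def dose_per_frame(total_dose, n_frames):
    """Dose (e-/Å^2) delivered in each frame, assuming a constant dose rate."""
    return total_dose / n_frames


def cumulative_dose(total_dose, n_frames):
    """Accumulated dose at the end of each frame, the quantity dose weighting uses."""
    per_frame = dose_per_frame(total_dose, n_frames)
    return [per_frame * (i + 1) for i in range(n_frames)]


def physical_pixel(superres_pixel):
    """Assumes a super-resolution pixel is half the physical pixel in each dimension."""
    return 2.0 * superres_pixel


if __name__ == "__main__":
    print(dose_per_frame(TOTAL_DOSE_E_PER_A2, N_FRAMES))  # 1.0 e-/Å^2 per frame
    print(round(physical_pixel(SUPERRES_PIXEL_A), 2))     # 1.38 Å
```

The physical pixel of about 1.38 Å is close to the 1.37 Å re-extraction pixel size used during image processing.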
Image Processing. Super-resolution movies were aligned using MotionCor2 (61) from within RELION 3.1 or using RELION 3.1's own CPU implementation of motion correction. CTF estimation was performed using CTFFIND 4.1 (62) from within RELION 3.1. Micrographs were inspected to remove poor-quality images, leaving 3,932 higher-quality micrographs. All further processing was done in RELION 3.1. Laplacian-of-Gaussian autopicking on a subset of 200 micrographs yielded approximately 75,000 particles. These particles were extracted with a pixel size of 2.77 Å and a box size of 90 pixels. 2D classification was then used to select a higher-quality subset of particles, from which an initial 3D model was generated by stochastic gradient descent. 3D classification of this subset yielded a good-quality 3D reference that was then used as a template for picking.
Approximately 1,300,000 particles were picked from the 3,932 micrographs using this 3D reference and extracted with a pixel size of 2.77 Å and a box size of 90 pixels. The particles were subjected to 3D classification with D4 symmetry and a soft circular mask, and the best classes, comprising 358,785 particles, were selected. The particles were then re-extracted with a pixel size of 1.37 Å and a box size of 180 pixels before another round of 3D classification, after which the best classes, comprising 290,762 particles, were again selected. These particles were re-extracted with a pixel size of 0.91 Å and a box size of 312 pixels and subjected to 3D auto-refinement. The refined particles were then 3D classified without additional image alignment, and the best classes, comprising 262,882 particles, were selected. These particles underwent CTF refinement and Bayesian polishing before being re-extracted with a larger box of 410 pixels. Further rounds of 3D classification and 3D refinement, each time selecting only the best classes, yielded a homogeneous set of 79,562 particles. 3D refinement of this particle set gave a final resolution of 1.98 Å at a Fourier shell correlation (FSC) of 0.143. RELION 3.1 reports a B-factor of about −36 Å² when the map is sharpened with a soft mask.
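Two quick checks follow from the numbers above: a reported resolution cannot be finer than the Nyquist limit (twice the pixel size), and the 0.143 criterion is read off the FSC curve where it first drops below the threshold. The sketch below computes physical box widths and Nyquist limits for the stated extraction stages and interpolates the 0.143 crossing linearly in spatial frequency. That interpolation scheme is an assumption and may differ from RELION's; the pixel size of the final 410-pixel extraction is not stated, so that stage is omitted, and the FSC curve in the usage is hypothetical.

```python
# (pixel size in Å, box size in pixels) for extraction stages stated in the text.
STAGES = [(2.77, 90), (1.37, 180), (0.91, 312)]


def box_width_angstrom(pixel_a, box_px):
    """Physical edge length of the extraction box."""
    return pixel_a * box_px


def nyquist_angstrom(pixel_a):
    """Finest resolution representable at this sampling (twice the pixel size)."""
    return 2.0 * pixel_a


def fsc_crossing(resolutions_a, fsc_values, threshold=0.143):
    """Resolution (Å) where the FSC first falls below `threshold`.

    Shells are ordered from low to high resolution; the crossing is linearly
    interpolated in spatial frequency (1/Å) between adjacent shells.
    Returns None if the curve never drops below the threshold.
    """
    freqs = [1.0 / r for r in resolutions_a]
    for i in range(1, len(fsc_values)):
        prev, cur = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > cur:
            frac = (prev - threshold) / (prev - cur)
            return 1.0 / (freqs[i - 1] + frac * (freqs[i] - freqs[i - 1]))
    return None


if __name__ == "__main__":
    for pixel, box in STAGES:
        print(f"{pixel} Å x {box} px: box {box_width_angstrom(pixel, box):.0f} Å, "
              f"Nyquist {nyquist_angstrom(pixel):.2f} Å")
    # Hypothetical FSC curve, for illustration only.
    print(fsc_crossing([8.0, 4.0, 2.5, 2.0, 1.9], [0.99, 0.95, 0.6, 0.2, 0.1]))
```

At 0.91 Å per pixel, the Nyquist limit is 1.82 Å, so a 1.98 Å reconstruction is within what that sampling can represent.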
Coordinate Model Building and Refinement. The coordinate models for the two H. neapolitanus Rubisco-NTD1-50 complex maps were built and refined in the same way using a combination of COOT v0.9.1 (63) and PHENIX v1.19.1-4122 (64). Maps for this process were obtained by combining the respective half-maps without filtering. Maps were density-modified with phenix.resolve_cryo_em, using the molecular weight as input, and sharpened with phenix.auto_sharpen (65, 66). For ease of handling, the maps were reboxed to 160³ voxels (about 145³ Å³) for further use. Chains A (CbbL) and D (CbbS) from the H. neapolitanus Rubisco-CsoS2 N*-peptide cocrystal structure (PDB ID: 6UEW) (30), stripped of all ligands, were used as initial models for both maps. The initial models were rigid-body docked and manually rebuilt in COOT to fit the maps, and the resolved portion of the CsoSCA NTD1-50 peptide was built de novo. An initial round of phenix.real_space_refine (67) was performed on models consisting of all asymmetric units with NCS constraints enforced and default bond-length and angle restraints, but without secondary-structure, rotamer, or Ramachandran restraints. Putative ordered water molecules were then placed interactively in COOT using maps thresholded at 2σ, based on the presence of at least two hydrogen-bonding partners and the occurrence of the density in both half-maps. Additional rounds of phenix.real_space_refine and manual adjustment in COOT were performed as described above to yield the final coordinate models. For the higher-resolution map (State-1), residues V3-E457 of CbbL were modeled, with residues V324-E329 truncated to the Cβ atoms due to poor side-chain density in this region. Similarly, for the lower-resolution map (State-2), residues V3-E457 of CbbL were modeled, with residues H291-H300 and V323-D331 truncated to the Cβ atoms. For both maps, CbbS was modeled as residues M4-N110 and the NTD1-50 peptide as residues P22-A30.
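The water-placement criteria described above can be written as a filter over candidate density peaks. This is a hypothetical sketch: the σ values are assumed to be measured at each peak in both half-maps, and the hydrogen-bond distance window (2.4 to 3.5 Å) is an assumed, commonly used range that the text does not specify. In this study, waters were placed interactively in COOT, not with code like this; the class and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class WaterCandidate:
    xyz: tuple           # peak position (x, y, z) in Å
    sigma_half1: float   # peak density in half-map 1, in σ units
    sigma_half2: float   # peak density in half-map 2, in σ units


def hbond_partners(xyz, polar_atoms, min_dist=2.4, max_dist=3.5):
    """Count polar atoms (N/O) within an assumed hydrogen-bonding distance window."""
    return sum(1 for atom in polar_atoms if min_dist <= math.dist(xyz, atom) <= max_dist)


def accept_water(c, polar_atoms, sigma_cut=2.0, min_partners=2):
    """Keep a candidate only if it is >= sigma_cut in BOTH half-maps and has enough partners."""
    in_both = c.sigma_half1 >= sigma_cut and c.sigma_half2 >= sigma_cut
    return in_both and hbond_partners(c.xyz, polar_atoms) >= min_partners
```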

Condensate Formation Assays.
Labeling of CsoSCA-MBP. His-tag-purified CsoSCA-MBP-His was run on a Superose 6 Increase 10/300 GL size exclusion column (GE Healthcare) in 50 mM HEPES, 300 mM NaCl, pH 8 to remove MBP-His contamination. Pooled fractions containing CsoSCA-MBP were labeled with Alexa Fluor 647 NHS Ester at a 1:1 protein:dye ratio for 2 h in the dark at 4 °C. Excess dye was removed by buffer exchange into 50 mM HEPES, 300 mM NaCl, pH 8 on an Econo-Pac column (Bio-Rad), and the protein was concentrated on a 30K cutoff spin column (Thermo Pierce) at 3,500 × g for 20 min. Glycerol was added to a final concentration of 10% before flash-freezing the protein.
Condensate Formation. All condensate formation experiments were carried out in a final buffer of 50 mM Tris, 20 mM NaCl, pH 7.5. Final protein concentrations were 1 μM Rubisco, 1 μM CsoS2-NTD-sfGFP, and 0.5 μM CsoSCA-MBP. Then, 20 μL of each mixture was loaded onto a gasket fixed to a coverslip (CoverWell Perfusion Chamber, 8 × 9 mm diameter × 0.9 mm depth, Grace Bio-Labs) and imaged at 100× on a Zeiss Axio Observer Z1 inverted phase contrast microscope. Green-channel excitation and emission were 488 nm and 509 nm, respectively; red-channel excitation and emission were 650 nm and 673 nm, respectively.
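Assembling each 20-μL condensate mixture is a C1·V1 = C2·V2 calculation per protein, and because the labeled CsoSCA-MBP is stored in 300 mM NaCl, the salt carried over from stocks also matters when the target is 20 mM NaCl. A minimal sketch; the stock concentrations and stock salt levels in the usage are hypothetical, and the function names are illustrative.

```python
def mixing_volumes_ul(final_um, stock_um, total_ul=20.0):
    """C1*V1 = C2*V2 per protein; returns per-protein volumes and buffer top-up (µL)."""
    volumes = {name: final_um[name] * total_ul / stock_um[name] for name in final_um}
    buffer_ul = total_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute for this final volume")
    return volumes, buffer_ul


def carried_over_mm(volumes_ul, stock_salt_mm, total_ul=20.0):
    """Salt (mM) contributed to the final mixture by the protein stocks."""
    return sum(v * stock_salt_mm[name] for name, v in volumes_ul.items()) / total_ul


if __name__ == "__main__":
    final = {"rubisco": 1.0, "csos2_ntd_sfgfp": 1.0, "csosca_mbp": 0.5}     # µM, from the text
    stocks = {"rubisco": 20.0, "csos2_ntd_sfgfp": 40.0, "csosca_mbp": 10.0}  # µM, HYPOTHETICAL
    vols, buffer_ul = mixing_volumes_ul(final, stocks)
    print(vols, buffer_ul)
    print(carried_over_mm(vols, {name: 300.0 for name in vols}))  # HYPOTHETICAL 300 mM NaCl stocks
```

With these hypothetical stocks all in 300 mM NaCl, the carryover alone (37.5 mM) would exceed the 20 mM NaCl target, which shows why stock buffer composition has to enter the calculation.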
Data, Materials, and Software Availability. Cryo-EM maps (sharpened, full, and unfiltered half-maps) and masks have been deposited in the Electron Microscopy Data Bank, and the corresponding atomic coordinate models in the Protein Data Bank, for Rubisco-N50 peptide State-1 (EMD-25201 (70), PDB-7SMK (71)) and State-2 (EMD-25228 (72), PDB-7SNV (73)). Plasmids for all protein constructs used (Dataset S4) have been deposited with and are available from Addgene. All protein sequences used for bioinformatics are available in FASTA format in Dataset S1.
Fig. 1. An intrinsically disordered, poorly conserved N-terminal peptide is essential and sufficient for CsoSCA encapsulation. (A) Schematic of the cso (carboxysome) operon in H. neapolitanus. The 10-gene set encodes the Rubisco large and small subunits, the scaffolding protein CsoS2, the carbonic anhydrase CsoSCA, and six shell proteins (CsoS4A/B, CsoS1A/B/C, and CsoS1D). In the native organism, csoS1D is transcribed from an adjacent locus, while in the synthetic pHnCB10 plasmid, all genes are in a single operon. (B) Surface representation of the CsoSCA dimer structure from H. neapolitanus (PDB ID: 2FGY). The N-terminal domain (dark green) consists of an ~50-aa unstructured peptide followed by a folded α-helical domain of unknown function. The middle domain (light green) contains the active site. The C-terminal domain (white) appears to be a duplication of the catalytic domain but lacks essential active-site residues. (C) Maximum-likelihood phylogenetic tree of CsoSCA. Cyanobacterial homologs are colored green and proteobacterial homologs in an orange/brown gradient. Scale bar, 0.1 substitutions per site. (D) Disorder scores of four representative CsoSCA homologs calculated using DISOPRED3, and conservation calculated from the multiple sequence alignment. (E) Complementation with full-length csoSCA rescues growth of a csoSCA knockout in H. neapolitanus, while complementation with an NTD-truncated variant, ΔNTD1-49 CsoSCA, fails to rescue growth. (F) Western blot analysis detecting C-terminally Flag-tagged CsoSCA in lysate (L) and enriched carboxysome (CB) fractions of carboxysomes produced heterologously in Escherichia coli. Synthetic carboxysomes consist of the full cso operon (Fig. 1A) with either wild-type CsoSCA or an N-terminally truncated variant (ΔNTD1-37 CsoSCA or ΔNTD1-49 CsoSCA). CsoSCA is not detected in carboxysomes with N-terminally truncated CsoSCA variants. (G) Fusing the unstructured NTD of CsoSCA (37 or 53 residues) to sfGFP targets the fusion protein to synthetic carboxysomes produced heterologously in E. coli, while untagged sfGFP is not targeted to carboxysomes. The control with full-length CsoSCA-sfGFP also produces fluorescent carboxysomes. The panel shows fluorescence of purified carboxysomes, western blot analysis against Flag-tagged sfGFP, and SDS-PAGE of lysate (L) and purified carboxysomes (CB). (F and G) The L sample contains detergent for lysing the cells (B-PER II), resulting in a small band shift on SDS-PAGE, which explains the slightly lower CsoSCA and NTD1-37/NTD1-53 bands in L compared to CB.
Structure of Rubisco in Complex with the CsoSCA NTD Peptide.

Fig. 3. Structure of Rubisco with bound NTD1-50 CsoSCA peptide. (A) Cryo-EM map of Rubisco bound to a peptide corresponding to the first 50 residues of CsoSCA (NTD1-50). The Rubisco-NTD1-50 cocomplex is colored by subunit (color key inset). (B) Close-up of the region boxed in A, with the NTD1-50 peptide shown as sticks and transparent surface and Rubisco subunits shown as opaque surfaces. (C) Same view as in B, with the NTD1-50 peptide and interacting Rubisco residues shown as sticks. NTD1-50 peptide density is shown as a gray mesh contoured at 2σ. (D-F) Detailed polar interactions between residues of the NTD1-50 peptide and Rubisco, shown as sticks, with interactions depicted as dashed lines and distances in Ångströms.

Fig. 4. CsoSCA and CsoS2 bind at the same site on Rubisco. (A) Surface representations of Rubisco with the CsoSCA NTD1-50 peptide bound and of Rubisco with the CsoS2 peptide bound (PDB ID: 6UEW). (B) Zoomed view of the binding site showing the different conformations of the CsoSCA and CsoS2 peptides. Peptides are shown as cartoons and selected residues as sticks. Rubisco bound to the NTD1-50 peptide is colored according to the color key (inset in A), and Rubisco (both subunits) bound to the CsoS2-N* peptide (PDB ID: 6UEW) is colored white. The white transparent surface represents the Rubisco structure that binds CsoSCA. Polar interactions are depicted as dashed lines and cation-π stacking as dashed triangles, with distances in Ångströms. (C) The BLI response shows that the CsoS2 peptide fused to Rubisco (Rubisco-CsoS2-N*) blocks binding of CsoSCA to Rubisco. (D) Alexa Fluor 647, sfGFP, merged fluorescence, and phase contrast images of protein condensates formed from a solution of Rubisco, CsoS2-NTD-sfGFP, and Alexa Fluor 647-labeled CsoSCA-MBP, showing that CsoSCA is recruited into Rubisco-CsoS2 protein condensates.