α-Carboxysome Size Is Controlled by the Disordered Scaffold Protein CsoS2

Carboxysomes are protein microcompartments that function in the bacterial CO2 concentrating mechanism (CCM) to facilitate CO2 assimilation. To do so, carboxysomes assemble from thousands of constituent proteins into an icosahedral shell, which encapsulates the enzymes Rubisco and carbonic anhydrase to form structures typically > 100 nm and > 300 megadaltons. Although many of the protein interactions driving the assembly process have been determined, it remains unknown how size and composition are precisely controlled. Here, we show that the size of α-carboxysomes is controlled by the disordered scaffolding protein CsoS2. CsoS2 contains two classes of related peptide repeats that bind to the shell in a distinct fashion, and our data indicate that size is controlled by the relative number of these interactions. We propose an energetic and structural model wherein the two repeat classes bind at the junction of shell hexamers but differ in their preferences for the shell contact angles, and thus the local curvature. In total, this model suggests that a set of specific and repeated interactions between CsoS2 and shell proteins collectively achieve the large size and monodispersity of α-carboxysomes.


■ INTRODUCTION
−4 Carboxysomes are a striking example of emergent order, with thousands of individual copies of 10 different constituent proteins spontaneously assembling into quasi-icosahedral particles with well-defined size and composition. 5 The mechanism of size control is particularly interesting, as the assembled particles extend far beyond the dimensions of any individual component. The assembly process also requires high fidelity, as errors carry the potential for a severe phenotypic penalty. −8 Carboxysomes occur in two evolutionarily distinct lineages: α, found in most marine cyanobacteria and several clades of bacterial chemoautotrophs; and β, found in certain freshwater cyanobacteria. β-Carboxysomes tend to be larger (100−400 nm in diameter) and are thought to assemble in an inside-out manner, with the formation of a dense Rubisco kernel preceding encapsulation by the shell proteins. 9−11 α-Carboxysomes are somewhat smaller (80−120 nm in diameter) and more regular in size. −14 In this work, we focus on α-carboxysomes to understand what factors drive the size and regularity of the particles.
α-Carboxysome assembly culminates in particles that are densely packed with cargo, free of shell defects so that they can serve as a CO2 permeability barrier, and uniform in size, with diameters of about 115 nm in the model species Halothiobacillus neapolitanus. 4,15−18 Structural studies of the shell proteins have revealed spontaneous assembly of small icosahedral complexes with triangulation numbers of T = 3, 4, or 9. 19−21 By comparison, the H. neapolitanus carboxysome is significantly larger, with T ∼ 75. 22 This suggests that, unlike in many small icosahedral virus capsids, the intrinsic curvature preferences of the shell proteins alone 1 do not determine carboxysome size, and that an interplay with internal components is ultimately responsible for particle size.
α-Carboxysomes are composed of a thin icosahedral shell made up of thousands of hexameric protein capsomers (CsoS1) and pentameric capsomers (CsoS4) forming the 12 vertices. 22−26 CsoS2 is unique among the carboxysome proteins in that it is predicted to be largely structurally disordered along its entire length. 4,13,27 It does, however, contain several distinct repeated sequence motifs that collectively define a three-part domain structure: the N-terminal domain (NTD) with N-peptide motifs, the middle region (MR) with M-peptide motifs, and the C-terminal domain (CTD) with C-peptide motifs and a conserved C-terminal peptide (CTP). Previous work has shown that CsoS2 is essential for growth and acts as a central node, interacting with both Rubisco and CsoS1. 13,14,28 Specifically, the NTD binds to and encapsulates Rubisco via interactions with the N-peptides, 14,25 and the CTD has been shown to associate with the shell. 20,21 We recently showed that the MR also interacts with the shell, though what distinguishes its functional role from that of the CTD is not yet clear. 29

Herein we describe how the sequence of the scaffold CsoS2, and in particular the relative numbers of M- and C-peptide motifs, specifies the size of the carboxysome. Furthermore, this size effect is largely independent of the primary cargo Rubisco and points toward a model for size control that is driven by the shell's mechanical properties and, specifically, by CsoS2's modulation of these mechanical properties. Finally, in light of a recent cryoEM structure of portions of the CTD−shell interface, 21 we propose a structural model of the M-peptide's role.
For testing of different CsoS2 variants and truncations, a Golden Gate destination vector was created from pHnCB10 by replacing the original csoS2 gene with a lacZ fragment bracketed by BsaI type II restriction sites. The different CsoS2 constructs were built by PCR amplification of the desired fragments with primers containing flanking BsaI sites and compatible overhang sequences. Each construct encodes the full set of carboxysome genes but with a modified CsoS2. The polyproline II helix sequence was purchased as a gBlock from IDT. All plasmids were completed by Golden Gate Assembly. 31 The coexpression plasmids for CsoS1ABC, CsoS2A, CsoS2B, and fragments of the MR fused to GFP were built on a pFA backbone with a p15A origin, kanamycin resistance, and a tetracycline operator. 32 All constructs are summarized in Table S1.

Biochemistry
Plasmids were transformed into electrocompetent Escherichia coli BW25113 cells. For carboxysome expression, cells were grown at 37 °C to mid-log phase (OD600 0.4−0.6), whereupon the temperature was reduced to 18 °C and inducer was added: 1 mM IPTG for all samples and a variable amount of anhydrotetracycline (aTc) for the variable coexpression. Cultures were grown overnight and then pelleted and frozen or used directly for carboxysome purification.
H. neapolitanus cells were grown at 30 °C in DSMZ 68 medium. The construction of the strain expressing carboxysomes with only CsoS2B from a genomic neutral site under control of a lac operator is described in ref 33. To express carboxysomes and sustain atmospheric growth, this strain was grown in the presence of 1 mM IPTG.
Carboxysome Purification. Carboxysomes were purified as described previously. 28 The cells from 1 L expression cultures were lysed with 25 mL of B-PER bacterial lysis reagent (Thermo Fisher) supplemented with 0.01 mg/mL DNase I, 0.1 mg/mL lysozyme, 1 mM phenylmethylsulfonyl fluoride (PMSF), 10 mM MgCl2, and 20 mM NaHCO3. Lysate was clarified by centrifugation at 12 000 rcf for 30 min. Carboxysomes were pelleted from the supernatant by centrifugation at 40 000 rcf for 30 min. Crude pelleted carboxysomes were resuspended in 20 mL of TEMB buffer (10 mM Tris, 10 mM MgCl2, 1 mM EDTA, and 20 mM NaHCO3, pH 8.4) on ice and pelleted again at 40 000 rcf for 30 min. The pellet was again gently resuspended in 2 mL of TEMB and loaded on top of a sucrose gradient with 5 mL layers of 10, 20, 30, 40, and 50% (w/v) sucrose in TEMB. The gradients were spun in an ultracentrifuge for 35 min at 105 000 rcf. The gradients were then fractionated into 1 mL fractions and analyzed by sodium dodecyl sulfate−polyacrylamide gel electrophoresis (SDS-PAGE) to identify the carboxysome-containing fractions, which also display bluish light scattering from the particles. Fractions with carboxysomes were pooled, centrifuged for 90 min at 105 000 rcf, resuspended in TEMB, and stored at 4 °C prior to negative-stain transmission electron microscopy (TEM).
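For preparing the step gradient, % (w/v) means grams of solute per 100 mL of final solution, so each 5 mL layer needs a proportional mass of sucrose. The short Python sketch below tabulates those masses; the helper name `sucrose_grams` is ours and not part of the original protocol, and in practice larger stock solutions are often prepared and dispensed instead.

```python
# Grams of sucrose for each % (w/v) layer of the step gradient described above.
# % (w/v) = grams of solute per 100 mL of final solution.
LAYER_VOLUME_ML = 5.0
LAYER_PERCENTS = [10, 20, 30, 40, 50]


def sucrose_grams(percent_wv: float, volume_ml: float) -> float:
    """Mass of sucrose (g) needed for `volume_ml` of a `percent_wv` % (w/v) solution."""
    return percent_wv / 100.0 * volume_ml


layers = {p: sucrose_grams(p, LAYER_VOLUME_ML) for p in LAYER_PERCENTS}
for p, grams in layers.items():
    print(f"{p}% (w/v): {grams:.2f} g sucrose, bring to {LAYER_VOLUME_ML:g} mL with TEMB")
print(f"total sucrose per gradient: {sum(layers.values()):.2f} g")
```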
Transmission Electron Microscopy and Size Characterization. Carboxysome samples were diluted to an absorbance at 280 nm of 0.05−0.1. Prior to sample application, Formvar/carbon-coated copper EM grids were glow discharged. Five μL of sample was loaded onto each grid, allowed to sit for at least 2 min, washed, stained with continuous application of 10 μL of 1% (w/v) uranyl acetate solution, and blotted with filter paper. Exposure of the carboxysomes on the grid to air−water interfaces was minimized to prevent carboxysome breakage and to increase the accuracy of size analysis. All samples were imaged with either an FEI Tecnai 12 (120 kV) or a JEOL 1200 EX (80 kV) transmission electron microscope.
Particle size characterization was performed in ImageJ. Carboxysomes were located manually to avoid particle misidentification and to exclude significantly damaged carboxysomes. The images were first given a pixel offset of +4 counts to eliminate any zero-count pixels. Then, a polygon was drawn around each carboxysome, and the pixel counts within this region were set to zero. Finally, the particles were selected with a zero-count threshold and measured with the automatic particle characterization utility. The minimum Feret diameter, i.e., the closest approach of two parallel lines in contact with the particle projection, was used for all size statistics.
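The minimum Feret diameter can also be computed outside ImageJ. The following is a minimal Python sketch, not the ImageJ implementation: it takes outline coordinates of a segmented particle (for example, from the polygon-and-threshold procedure above), builds the convex hull with Andrew's monotone chain, and uses the fact that the minimum caliper width of a convex polygon is attained with one caliper flush against a hull edge. Function names are illustrative, and coordinates are assumed to be in pixels, to be multiplied by the image's nm/pixel calibration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_feret(points: List[Point]) -> float:
    """Minimum Feret diameter (minimum caliper width) of a 2D point set.

    Each hull edge is tested as the position of one caliper; the farthest
    hull vertex from that edge's line gives the width in that direction.
    """
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # a point or a segment has zero width
    n = len(hull)
    best = math.inf
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        edge_len = math.hypot(x2 - x1, y2 - y1)
        width = max(
            abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / edge_len
            for x, y in hull
        )
        best = min(best, width)
    return best


# Example: outline of a 3 x 1 rectangle, in pixels.
print(min_feret([(0, 0), (3, 0), (3, 1), (0, 1)]))  # prints 1.0
```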
MR Localization to Carboxysomes. A series of plasmids expressing MR truncations fused to superfolder GFP 34 were each cotransformed with the full carboxysome plasmid (pHnCB10). When the cells reached mid-log phase (OD600 0.4−0.6), carboxysome expression was induced with 1 mM IPTG and GFP fusion expression with 100 nM aTc. The temperature was then reduced to 18 °C, and the cells were grown overnight. Carboxysomes were purified as described above. Then, for each sample, the absorbance was measured at 340 nm and GFP fluorescence was measured with 485 nm excitation and 515 nm emission using a Tecan M1000 plate reader. Because of their size, carboxysomes scatter strongly at 340 nm, while soluble protein absorbs weakly at this wavelength. Thus, the GFP fluorescence intensity divided by the 340 nm absorbance was used as a proxy for the relative GFP loading per carboxysome.
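The loading proxy is a simple ratio. The sketch below computes it and expresses each construct relative to a chosen reference construct; the function names, construct labels, and readings in the example are invented placeholders, not measurements from this study.

```python
from typing import Dict, Tuple


def gfp_loading_proxy(fluorescence: float, a340: float) -> float:
    """Relative GFP loading per carboxysome: fluorescence divided by A340.

    A340 is dominated by light scattering from the large particles, so it
    stands in for the amount of carboxysomes in the purified sample.
    """
    if a340 <= 0:
        raise ValueError("A340 must be positive")
    return fluorescence / a340


def normalize(samples: Dict[str, Tuple[float, float]], reference: str) -> Dict[str, float]:
    """Express each construct's proxy relative to a reference construct."""
    ratios = {name: gfp_loading_proxy(f, a) for name, (f, a) in samples.items()}
    ref = ratios[reference]
    return {name: r / ref for name, r in ratios.items()}


# Placeholder readings as (fluorescence, A340); NOT data from this study.
example = {"MR_1M_GFP": (1200.0, 0.30), "MR_4M_GFP": (3000.0, 0.25)}
print(normalize(example, "MR_1M_GFP"))
```

Normalizing to a reference construct keeps the comparison independent of the plate reader's arbitrary fluorescence units.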
Modified CsoS2 in H. neapolitanus. H. neapolitanus is naturally competent and will take up foreign DNA and specifically integrate it into its genome by homologous recombination if provided with suitable flanking homology arms. We used this strategy to knock out CsoS2 at the native locus, replacing it with a spectinomycin resistance cassette, and validated the knockout by PCR. This intermediate strain is incapable of atmospheric growth because it cannot form carboxysomes. We then knocked in a series of CsoS2 variants with different numbers of M-peptide repeats at the genomic neutral site NS1, along with a kanamycin resistance gene. 33 All variants could grow under CO2-replete conditions (5% CO2). Colony forming unit (CFU) counts were performed by serial dilution of liquid cultures grown at 5% CO2, plated on DSMZ 68 agar plates with spectinomycin (20 μg/mL), kanamycin (10 μg/mL), and IPTG (1 mM), and incubated in the corresponding CO2 environments: atmosphere (∼0.04%), 0.5%, and 5% CO2.
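Serial-dilution plating is converted to CFU per mL of culture with the standard relation CFU/mL = colonies × dilution factor / volume plated. The plated volume is not stated above, so it appears here as a parameter; the helper names and the example numbers are illustrative only. Dividing the CFU obtained in a test CO2 condition by that on the 5% CO2 reference plates is one natural way to summarize survival.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """CFU per mL of the undiluted culture.

    dilution_factor is the fold dilution of the plated suspension
    (e.g. 1e5 for a 10^-5 dilution); plated_volume_ml is the volume spread.
    """
    if colonies < 0 or dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("invalid count, dilution, or volume")
    return colonies * dilution_factor / plated_volume_ml


def relative_survival(cfu_test: float, cfu_reference: float) -> float:
    """CFU in a test CO2 condition relative to the CO2-replete reference plates."""
    return cfu_test / cfu_reference


# Illustrative numbers only: 50 colonies from 0.1 mL of a 10^-5 dilution.
print(cfu_per_ml(50, 1e5, 0.1))
```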
The H. neapolitanus M1-peptide was docked to the luminal side of the junction of three CsoS1A hexamers. The coordinates of this shell trimer were obtained by symmetry expansion of the coordinates of PDB ID 2EWH. 37 Rosetta was used to independently relax the structures of the M1-peptide and the shell trimer, and the best-scoring structures were used for protein−protein docking. The M1-peptide was positioned above the 3-fold shell axis, with its pseudo-3-fold axis roughly aligned to it. Next, the Rosetta local docking routine was run with an initial translational randomization of 5 Å and an orientational randomization of 12°, and 500 docked structures were generated. 38 The best-scoring structures clearly fell into a common class of configurations, with the VTGs wedged near the symmetry-related CsoS1A-His79 residues. The protein−protein interface was further relaxed in Rosetta subject to applied harmonic distance constraints bringing the VTG threonines into contact with the 3-fold-related His79 residues.
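Distance restraints of this kind are commonly supplied to Rosetta as AtomPair lines in a constraint file, scored with a HARMONIC function f(d) = ((d − x0)/sd)². The Python sketch below writes such lines for threonine OG1 to histidine ND1 pairs and evaluates the same penalty. The target distance (2.9 Å), tolerance (0.2 Å), helper names, and residue numbers are hypothetical placeholders; the values actually used for the relaxed model are not given here, and residue numbers must match the pose numbering of the docked complex.

```python
from typing import List, Tuple


def harmonic_penalty(distance: float, x0: float, sd: float) -> float:
    """Penalty of a HARMONIC constraint function: ((d - x0) / sd) ** 2."""
    return ((distance - x0) / sd) ** 2


def atom_pair_constraints(pairs: List[Tuple[int, int]],
                          x0: float = 2.9, sd: float = 0.2) -> List[str]:
    """AtomPair lines tying each VTG threonine OG1 to a His79 ND1.

    pairs holds (threonine, histidine) residue numbers in pose numbering;
    OG1 is the threonine hydroxyl oxygen, ND1 the histidine delta nitrogen.
    """
    return [f"AtomPair OG1 {thr} ND1 {his} HARMONIC {x0} {sd}" for thr, his in pairs]


# Hypothetical pose numbers: three threonines spaced to match VTG triplets
# separated by 8 residues, and one His79 per CsoS1A chain after the peptide.
lines = atom_pair_constraints([(5, 120), (16, 218), (27, 316)])
print("\n".join(lines))
```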

Bioinformatic Characterization of M- and C-Peptide Repeats. All sequences from the Integrated Microbial Genomes (IMG) database with the CsoS2 M Pfam family (PF12288) were downloaded in May 2020, for a total of 770 sequences. These were filtered to eliminate partially sequenced entries and those with unspecified residues. Next, the sequences were dereplicated to 95% identity using USEARCH, yielding 272 unique sequences. 39 MEME was used to identify peptide motifs and found, as previously reported, the main known motifs of the N-peptide, M-peptide, C-peptide, and CTP. 40 MAST from the MEME suite was used to extract the positions of all matches to the M- and C-peptide motifs, and the amino acid sequence of each match was extracted with a 15-aa buffer zone on either side. These peptide sequences were aligned all together (i.e., both M- and C-peptides) using MAFFT. 41 FastTree was used to generate a phylogenetic tree on the basis of this alignment. 42 The tree shows clearly distinguishable clades for the M- and C-peptide classes, but some of the initial MAST hits were mislabeled (including H. neapolitanus M7). All sequences belonging to each clade were aligned against each other, and WebLogo 3 was used to generate sequence logos. 43
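The window-extraction step can be sketched as follows. This assumes the motif hits have already been tabulated as (sequence ID, start, end) in 1-based inclusive coordinates; parsing MAST's own output format is not shown, and the helper names are hypothetical. Windows are clipped at sequence ends, and the result is written as FASTA for a multiple-sequence aligner.

```python
from typing import Dict, List, Tuple

FLANK = 15  # residues kept on each side of a motif hit


def extract_windows(seqs: Dict[str, str], hits: List[Tuple[str, int, int]],
                    flank: int = FLANK) -> List[Tuple[str, str]]:
    """Cut out each motif hit plus `flank` residues on either side.

    hits are (sequence_id, start, end) in 1-based inclusive coordinates;
    windows are clipped at the ends of the sequence.
    """
    windows = []
    for seq_id, start, end in hits:
        seq = seqs[seq_id]
        lo = max(0, start - 1 - flank)   # convert to 0-based, then extend left
        hi = min(len(seq), end + flank)  # 1-based inclusive end == 0-based exclusive end
        windows.append((f"{seq_id}_{start}-{end}", seq[lo:hi]))
    return windows


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """FASTA text suitable as input to a multiple-sequence aligner."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


demo_seqs = {"s1": "A" * 20 + "VTG" + "C" * 20}
print(to_fasta(extract_windows(demo_seqs, [("s1", 21, 23)])))
```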

■ RESULTS
Size in self-assembling biological systems can be controlled by a number of mechanisms, including intrinsic curvature built into the geometry of the constituent proteins, 44 molecular rulers, 45 excluded volume effects, 18 and nonequilibrium race conditions between growth and termination. 16 We asked whether Rubisco, which accounts for approximately 65% of the particle by mass and most of the cargo, 5 might play a pivotal role in guiding carboxysome size and overall morphology. −48 We repeated this experiment (recombinantly expressing all carboxysome proteins except Rubisco in our E. coli heterologous expression system) and obtained similar results (Figure 2a). In particular, the particle size does not collapse to that of the much smaller complexes formed from shell proteins alone (Figure 2a, blue lines).
We next turned to the major shell components: the hexameric proteins. An alternative hypothesis is that increasing the shell protein concentration would shift the balance between cargo and shell growth in favor of the shell, enclosing smaller particles and ultimately short-circuiting the assembly of full-sized particles. 16 To test this idea, the three major hexameric proteins CsoS1ABC were coexpressed, with variable induction, alongside the full set of carboxysome proteins. The sizes and morphologies of the resulting particles did not differ appreciably; however, the extra shell proteins were deleterious to the yield of purified carboxysomes (Figure 2b, lysate gel, Figure S2), likely by diverting other carboxysome components into off-pathway assemblies and/or aggregates.
A third hypothesis is that CsoS2, also a major component at ∼15% by mass, 5 governs carboxysome size. Consistent with this possibility, disordered scaffold proteins have previously been found to play a role in the size specification of certain icosahedral virus capsids. 49,50 A particularly striking case is PBCV-1, in which a disordered protein acts as a molecular ruler, effectively measuring the distance between vertices. 45 We thus tested whether CsoS2 acts as a molecular ruler by inserting a sequence encoding a polyproline II helix (a rigid structural element) of approximately 12 nm between the M3 and M4 repeats. 51 This construct also produced carboxysomes, but the structural modification had no discernible effect on size (Figure 2c).
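A polyproline II helix has three residues per turn and an axial rise of roughly 0.31 nm per residue, so the length of a rigid spacer translates directly into a residue count. The sketch below makes that conversion; the rise is an approximate literature value, the helper name is ours, and the resulting figure (about 39 residues for 12 nm) is an estimate, not the length of the insert actually used.

```python
PPII_RISE_NM = 0.31  # approximate axial rise per residue of a PPII helix (nm)


def ppii_residues(length_nm: float, rise_nm: float = PPII_RISE_NM) -> int:
    """Approximate number of residues for a rigid PPII spacer of a given length."""
    return round(length_nm / rise_nm)


print(ppii_residues(12.0))
```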
Given these negative results, we posited that other known biochemistry within the carboxysome could play a role in size control. In H. neapolitanus carboxysomes, CsoS2 appears as two distinct isoforms: a short form of ∼61 kDa called CsoS2A and a long form of ∼92 kDa called CsoS2B. 27 We previously showed that the short form is produced as a result of a −1 programmed ribosomal frameshift (PRF) that leads to premature termination of CsoS2 (Figure 3a). 28 In native H. neapolitanus carboxysomes and in those produced recombinantly in E. coli, CsoS2A and CsoS2B are present in approximately equimolar quantities. Furthermore, CsoS2B alone can still sustain atmospheric growth in H. neapolitanus, whereas CsoS2A alone cannot, owing to its inability to assemble robust carboxysomes. 28,33 We thus asked whether the ratio between CsoS2 isoforms has consequences for particle size. Carboxysomes were purified from the wild-type H. neapolitanus strain (having both CsoS2A and CsoS2B) and from a mutant strain in which CsoS2 was knocked out at the native locus and complemented with a CsoS2B-only sequence lacking the frameshifting slippery sequence. Similarly, carboxysomes were expressed and purified from E. coli with either the wild-type (wtCsoS2) sequence or the CsoS2B-only sequence in pHnCB10. The average diameters of carboxysomes obtained from H. neapolitanus and E. coli agreed well. However, in both hosts, the particles became smaller on going from wtCsoS2 to CsoS2B-only carboxysomes, with mean diameters changing from 114 to 93 nm for H. neapolitanus and from 113 to 93 nm for E. coli (Figure 3b), similar to the effect of eliminating Rubisco (Figure 2a). One difference between H. neapolitanus and E. coli carboxysomes is the notable sharpness of the H. neapolitanus CsoS2B-only size distribution. We attribute this to two factors: with only one CsoS2 form, there is no particle-to-particle variation caused by the stoichiometry of isoforms, and native carboxysomes are seen to adopt more regular icosahedral geometries than their heterologous counterparts. 30 In sum, the loss of the frameshift alone produces a size shift in both H. neapolitanus and E. coli carboxysomes that, while appearing relatively modest, represents a nearly 40% decrease in available cargo volume for carboxysomes lacking the CsoS2 frameshifting element.
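Because enclosed volume scales with the cube of diameter for particles of similar shape, a modest change in diameter translates into a much larger change in cargo capacity. The sketch below computes that ratio from the mean diameters reported above, optionally subtracting a uniform shell thickness (a hypothetical parameter; the helper name is ours). With no shell correction, this simple estimate gives a decrease of roughly 44−46%; the exact percentage depends on the geometric assumptions and on whether means or full size distributions are used.

```python
def volume_fraction_retained(d_small: float, d_large: float, shell_nm: float = 0.0) -> float:
    """Interior-volume ratio of two geometrically similar particles.

    Volume scales with the cube of the interior diameter, taken here as the
    outer diameter minus twice a (hypothetical) uniform shell thickness.
    """
    inner_small = d_small - 2 * shell_nm
    inner_large = d_large - 2 * shell_nm
    if inner_small <= 0 or inner_large <= 0:
        raise ValueError("shell thickness too large for these diameters")
    return (inner_small / inner_large) ** 3


# Mean diameters (nm) reported above for wtCsoS2 vs CsoS2B-only particles.
for host, d_wt, d_b in (("H. neapolitanus", 114, 93), ("E. coli", 113, 93)):
    loss = 1 - volume_fraction_retained(d_b, d_wt)
    print(f"{host}: {100 * loss:.0f}% smaller enclosed volume (no shell correction)")
```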
To further explore the connection between CsoS2 isoforms and size, we performed two titrations using a coexpression system in E. coli. In the first, CsoS2A was titrated against a background of CsoS2B-only carboxysomes. In the second, CsoS2B was titrated against a background of wild-type carboxysomes containing both CsoS2A and CsoS2B (Figure 3c,d, lysate gel, Figure S3). Broadly, the results indicate that the balance of CsoS2A and CsoS2B systematically alters the size of the resulting carboxysomes: more CsoS2A correlates with larger particles. Interestingly, overexpression of CsoS2B increased the recovered yield of carboxysomes while also reducing the mean particle size even below that of carboxysomes having only CsoS2B (Figure 3b,d). This suggests that whatever interaction drives the size effect may not be fully saturated under normal assembly conditions.
CsoS2A and CsoS2B are distinguished by the absence or presence, respectively, of the CTD, which is indispensable for carboxysome assembly. 28 Wild-type and CsoS2B-only carboxysomes therefore differ in the relative amount of MR with respect to the CTD. To investigate this effect, we built a series of CsoS2 variants in which we systematically truncated or augmented the MR to include different numbers of conserved M-peptide repeats. In all of these, we eliminated the PRF sequence so that the resulting proteins would have a precisely defined M-peptide-to-CTD ratio, without the complicating effects of frameshifting.
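The M-peptide-to-CTD argument can be made concrete with simple bookkeeping. Assuming, as stated above, that CsoS2A carries the MR but not the CTD, frameshifting dilutes the C-peptides without diluting the M-peptides. The sketch below (a hypothetical helper) computes the population-averaged M:C ratio for a given fraction of short isoform; 2 C-peptides per CTD follows the text, while 7 M-peptides and an equimolar (0.5) frameshift fraction are illustrative inputs.

```python
def m_to_c_ratio(n_m: int, n_c: int, frac_short: float) -> float:
    """Population-averaged M-peptide to C-peptide ratio for CsoS2.

    Assumes the short isoform (CsoS2A) keeps every M-peptide but lacks the
    CTD, while the long isoform (CsoS2B) carries both. frac_short is the
    fraction of CsoS2 chains produced as CsoS2A by frameshifting.
    """
    if not 0.0 <= frac_short < 1.0:
        raise ValueError("frac_short must be in [0, 1)")
    m_per_chain = n_m                       # both isoforms carry the full MR
    c_per_chain = n_c * (1.0 - frac_short)  # only CsoS2B contributes C-peptides
    return m_per_chain / c_per_chain


# Illustrative inputs: equimolar isoforms vs. no frameshifting (CsoS2B only).
print(m_to_c_ratio(7, 2, 0.5), m_to_c_ratio(7, 2, 0.0))
```

Under these assumptions, an equimolar frameshift doubles the M:C ratio relative to CsoS2B alone without adding any sequence.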
The results showed that each CsoS2 variant produced purifiable carboxysomes that encapsulated Rubisco, and, across the series of M-peptide numbers, we observed a dramatic size trend (Figure 4, pooled carboxysome gel, Figure S4). The smallest carboxysomes were those with no M-peptide repeats, with an average diameter of just over 40 nm, very similar to the T = 9 single-particle structure (36.9 nm in diameter) solved by Ni and Jiang et al. 21 In contrast, augmenting CsoS2B with copies of M-peptides M1−M6 produced substantially larger particles, with an average diameter of 130 nm versus 94 nm for unmodified CsoS2B.
In addition, we tested a subset of these variants for complementation in the wtCsoS2 knockout H. neapolitanus strain. While all strains could grow under 5% CO2, changing the MR length affected survival at atmospheric CO2 concentrations (∼0.04%): strains with 1 or 4 M-peptide repeats were unable to sustain growth, whereas those with 7 or 13 M-peptide repeats could. At 0.5% CO2, all but the 1-M-peptide strain survived (Figure 4c).
The two C-peptides of the CTD are related to the M-peptides and share a common motif of three VTG triplets, each separated by 8 residues (Figure S1). This similarity leads us to ask: do the C-peptides play a role similar to that of the M-peptides with respect to carboxysome size? To test this, we created two CsoS2 variants: one with duplication of both C-peptide repeats and another with duplication of the entire CTD, including the CTP. In contrast to the MR augmentation, which produced substantially larger carboxysomes, the C-peptide duplications did not result in larger particles and instead produced similarly sized or even slightly smaller carboxysomes compared with CsoS2B (Figure S5).
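The shared motif of three VTG triplets separated by 8 residues can be searched for with a regular expression. The Python sketch below uses the [V/I]TG form of the triplet and a zero-width lookahead so that overlapping candidates are all reported. It is an idealized pattern for illustration only, and the names are ours; real M- and C-peptides tolerate substitutions and spacing variation, which is why probabilistic motif tools were used for the actual analysis.

```python
import re
from typing import List

# Idealized motif: three [V/I]TG triplets, consecutive triplets separated by
# exactly 8 residues. The lookahead lets overlapping candidates be reported.
TRIPLE_VTG = re.compile(r"(?=([VI]TG.{8}[VI]TG.{8}[VI]TG))")


def find_triple_vtg(seq: str) -> List[int]:
    """0-based start positions of every match of the idealized motif."""
    return [m.start() for m in TRIPLE_VTG.finditer(seq.upper())]


demo = "AA" + "VTG" + "X" * 8 + "ITG" + "X" * 8 + "VTG" + "AA"
print(find_triple_vtg(demo))
```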
It is now well established that the CTD interacts with the carboxysome shell via both the C-peptides and the CTP. 20,21 Our recent study found that the MR also specifically interacts with the shell hexamers and that the VTG triplets are an essential component of that interaction. 29 To further test this interaction in the context of carboxysome assembly, we designed a series of MR−GFP fusion proteins having different numbers of M-peptides. These were coexpressed alongside complete carboxysomes, and we quantified the relative GFP loading as the ratio of GFP fluorescence to the absorbance at 340 nm arising from carboxysome light scattering (Figure S6). Loading depended strongly on the M-peptide repeat number, with more repeats resulting in more loading. The quantity of any of these fusions in the carboxysomes, as visualized by SDS-PAGE, was far below that of CsoS2, so they did not appreciably alter the balance of M- and C-peptides. Together, these results imply that the MR interacts with the shell during assembly and does so in a multivalent fashion.
In summary, the M- and C-peptides share similar sequence features and both interact with the shell. However, they have divergent effects on size: the number of M-peptides strongly dictates particle size, whereas the C-peptides are absolutely required for carboxysome-like particle formation but changing their number does not appear to alter particle size.

■ DISCUSSION
In this study, we have examined the factors determining the size of α-carboxysomes. We discovered that particle size is strongly specified by the M-peptide repeat number and that this size preference is transmitted via the interaction of the M-peptides with the shell. Below, we discuss the implications of these observations and put forward a size control model with a potential structural mechanism.
The dispensability of Rubisco for the formation of carboxysome-like particles implies that it is not an active participant in the assembly itself. Instead, it acts more as a client protein, partitioning into a phase-separated condensate with CsoS2 via the NTD on the surface of the growing shell. 14 Thus, we are left with the interplay between CsoS2 and the shell from which to build a model. This model must take into account several observations: (1) the MR and CTD both interact with shell proteins; 20,21,29 (2) both interactions are multivalent, owing to the repeated M- and C-peptides, respectively 21,29 (Figure S6); (3) the M- and C-peptides share a sequence motif, the [V/I]TG triplet, so they likely have related modes of contacting the shell (Figure S1); and (4) the M- and C-peptides differ in their effects on particle size, i.e., M-peptides encourage larger particles while C-peptides are size neutral (Figures 4 and S5). Taken together, we believe that these factors point toward a model in which CsoS2 modifies the mechanical properties of the shell.
It has been previously shown that expressing shell proteins alone results in small icosahedral structures, revealing that, in the absence of other factors, the capsomers form stable high-curvature contacts. 19,52,53 The introduction of CsoS2 frustrates the formation of these small particles and instead produces significantly larger particles with lower curvature at the shell−shell interfaces. We specifically propose that the essence of size control lies in energetic differences in shell binding between the M- and C-peptides with respect to the angles at the junctions of shell hexamers. This energetic effect can be expressed as two related but distinct models that we call "preferred curvature" and "flexible curvature" (Figure 5a). In the preferred-curvature model, the C-peptide has an energy minimum at a greater shell contact angle than the M-peptide. In the flexible-curvature model, the C-peptide and M-peptide have the same energy-minimum angle; however, the C-peptide has a broader energy function and can tolerate greater contact angles than the M-peptide. Note that the angle, while represented as a two-dimensional plane angle in the cartoon schematic (Figure 5a), is likely a three-dimensional solid angle. Finally, it should be emphasized that the model does not concern C-peptide binding at shell junctions directly bordering the pentameric vertex capsomer (where Ni and Jiang et al. find the CTP 21) but rather the stabilization of hexamer-only junctions in the vicinity of the vertex, which must sustain higher curvature than junctions further away, where the M-peptide is favored (Figure 5b). 54 It is possible that the CTP interaction plays a supporting role in size control. However, this role is likely minor, since carboxysomes made without pentameric capsomers form with typical diameters. 6

This model can be used to interpret several experimental observations. First, the M-peptide series in Figure 4 entails increasing M-peptide numbers on a constant background of two C-peptides. To a first approximation, the number of high-curvature sites will remain roughly the same as particles increase in size, while the number of low-curvature sites will scale with the surface area. Thus, smaller particles are energetically better suited to a low M-to-C-peptide ratio, while larger particles will benefit from a high M-to-C-peptide ratio (Figure 5c). 54 Second, the insufficiency of CsoS2A alone to form carboxysomes can be understood through the lens of the model. Having only M-peptides, it could perhaps knit together flat sheets of hexamers. Without the CTD, however, CsoS2A lacks the C-peptides necessary to stabilize the regions of highest curvature near the vertices and is thus incapable of producing closed particles. Finally, the larger size of H. neapolitanus carboxysomes with wtCsoS2 is explained by the relative excess of M-peptides, which extend the low-curvature regions between vertices relative to CsoS2B-only particles.

Figure 5 (caption excerpt). (b) Idealized map of carboxysome shell curvature. Higher-curvature regions are centered on vertices, while lower curvature prevails in between and on facets. The model predicts that green regions will be more favorable for M-peptide binding and blue regions more favorable for C-peptide binding. (c) More M-peptide stabilizes particles with a greater share of low-curvature surface and thus greater diameter.
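The scaling argument can be illustrated by counting junctions on an idealized icosahedral lattice. A shell of triangulation number T built from 12 pentamers and 10(T − 1) hexamers has 20T junctions where three capsomers meet; for T ≥ 3 no two pentamers share a junction, so exactly 60 junctions touch a pentamer regardless of T, while the total grows with T and hence with surface area. The Python sketch below uses the T = 9 and T ∼ 75 values cited earlier purely as illustrative inputs. Real α-carboxysome shells are only quasi-icosahedral, and the pentamer-adjacent ring serves here as the simplest stand-in for any fixed near-vertex neighborhood; the function names are ours.

```python
def is_valid_t(t: int) -> bool:
    """True if t = h^2 + h*k + k^2 for non-negative integers h >= k."""
    return any(h * h + h * k + k * k == t for h in range(t + 1) for k in range(h + 1))


def junction_counts(t: int) -> dict:
    """Count 3-fold capsomer junctions on an idealized T-number icosahedral shell."""
    if t < 3 or not is_valid_t(t):
        raise ValueError("expects a valid triangulation number T >= 3")
    total = 20 * t          # junctions where three capsomers meet
    near_vertex = 12 * 5    # junctions touching one of the 12 pentamers
    return {
        "hexamers": 10 * (t - 1),
        "total_junctions": total,
        "pentamer_adjacent": near_vertex,
        "hexamer_only": total - near_vertex,
        "pentamer_adjacent_fraction": near_vertex / total,
    }


for t in (9, 75):
    c = junction_counts(t)
    print(t, c["hexamers"], c["total_junctions"], c["hexamer_only"],
          round(c["pentamer_adjacent_fraction"], 3))
```

The fixed near-vertex count shrinks from one third of all junctions at T = 9 to a few percent at T = 75, which is the sense in which larger particles are dominated by low-curvature sites.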
At first glance, it appears improbable that one could make structural inferences on the basis of our phenomenological model, given CsoS2's high disorder score throughout its sequence. 13,14,28 However, analysis of diverse CsoS2 sequences and their structural predictions from the AlphaFold UniProt database revealed that, while the overall structures are amorphous and have low confidence, the aligned M-peptide sequences give a remarkably consistent microdomain (Figure 6a). In this structure, the VTG triplet motifs are arranged into a triangle with pseudo-3-fold symmetry. Furthermore, the conserved pair of cysteine residues is predicted to be in close proximity. This is notable because the carboxysome interior is thought to be an oxidizing environment, so one might expect a fold-stabilizing disulfide bridge to form. 9,55,56 This structural prediction, combined with the curvature model, suggests that the pseudo-3-fold-symmetric M-peptide ought to bind at a shell junction with 3-fold symmetry (Figure 6b, orange). We took the AlphaFold prediction for the H. neapolitanus CsoS2 M1-peptide and used Rosetta to dock the structure and relax the interface onto the 3-fold hexamer junction using the CsoS1A coordinates (PDB ID: 2EWH). 37,38 The fit has good size and shape complementarity to the depression at the junction. A cryoEM structure with portions of the CTD resolved was very recently described by Ni and Jiang et al. 21 Most interestingly, they identified CsoS1A-His79 as an important contact for CTD binding through the VTG motifs. His79 lies in close proximity to the triad of VTGs of the docked M1-peptide. Using harmonic distance constraints in Rosetta, the M1-peptide was relaxed into an idealized geometry in which each VTG threonine hydroxyl is in hydrogen-bonding contact with a His79 delta nitrogen while the same basic tertiary fold is maintained (Figure 6c).
What might structurally distinguish the C-peptide from the M-peptide and explain their energetic preferences for different shell curvatures? The M-peptide consensus sequence is essentially the C-peptide consensus plus several additional conserved positions, namely, the aforementioned cysteine pair, a tyrosine residue three residues downstream of the last VTG motif, and a lysine−valine pair near the beginning (Figures 6a and S1). In a recent study of the MR, we observed significant effects of the conserved tyrosine on the biochemical interactions with the shell and on the ability of the resulting carboxysomes to support atmospheric growth in H. neapolitanus, highlighting a distinct yet crucial role of the M-peptides. 29 The AlphaFold predictions for the C-peptides are of low confidence and lack a common structure. The CTD−shell structure from Ni and Jiang et al. shows that, even though the C-peptide and CTP residues are well resolved, particularly at the VTG-binding contacts, the overall structure threads its way along the hexamers without a clear tertiary structure of its own. 21 We hypothesize, in contrast, that the M-peptide acts more as a microdomain with a characteristic fold (Figure 6a). Taken together, this suggests that the M-peptide motif is more rigid and prefers to bind low-angle hexamer junctions, while the C-peptide motif is more flexible and can accommodate higher-angle junctions.
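The two energetic models can be illustrated with toy binding-energy curves. In the Python sketch below, each peptide class has a quadratic well in the junction contact angle. In the preferred-curvature version, the C-peptide minimum sits at a larger angle; in the flexible-curvature version, both minima coincide but the C-peptide well is broader and, so that the M-peptide still wins at shallow junctions, shallower. The quadratic form, all parameter values, and the function names are hypothetical choices for illustration, not fits to data, and the angle is treated as a single scalar even though it is likely a solid angle.

```python
from typing import Dict, Tuple

# (theta0_deg, stiffness, well_depth): arbitrary illustrative values, not fitted.
Params = Tuple[float, float, float]


def binding_energy(angle_deg: float, params: Params) -> float:
    """Toy well: -depth at the preferred angle, rising quadratically away from it."""
    theta0, stiffness, depth = params
    return -depth + stiffness * (angle_deg - theta0) ** 2


PREFERRED_CURVATURE: Dict[str, Params] = {
    "M": (10.0, 0.05, 8.0),  # minimum at a shallower junction angle
    "C": (20.0, 0.05, 8.0),  # minimum at a steeper junction angle
}
FLEXIBLE_CURVATURE: Dict[str, Params] = {
    "M": (10.0, 0.10, 8.0),  # deeper but narrower well
    "C": (10.0, 0.02, 6.0),  # same minimum angle, broader and shallower well
}


def favored(model: Dict[str, Params], angle_deg: float) -> str:
    """Peptide class with the lower toy energy at a given junction angle."""
    energies = {name: binding_energy(angle_deg, p) for name, p in model.items()}
    return min(energies, key=energies.get)


for angle in (8.0, 12.0, 18.0, 22.0):
    print(angle, favored(PREFERRED_CURVATURE, angle), favored(FLEXIBLE_CURVATURE, angle))
```

With these parameters, both versions assign shallow junctions (8° and 12°) to the M-peptide and steep ones (18° and 22°) to the C-peptide; where the crossover falls depends entirely on the chosen parameters.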
Given this model of size, we can make some inferences about the natural diversity of CsoS2 sequences and the attendant size consequences. Broadly speaking, we predict that organisms with high M-to-C-peptide ratios will on average produce larger carboxysomes, while those with low M-to-C-peptide ratios will produce smaller carboxysomes. Furthermore, all else being equal, we predict that a CsoS2 sequence containing a PRF will afford larger carboxysomes than an otherwise identical sequence without one. It will be interesting to see whether these predictions map to environmental conditions. For example, we observed that CsoS2 truncated to 4 M-peptides could not sustain growth in H. neapolitanus under atmospheric CO2 concentrations but could at 0.5% CO2 (Figure 4c).
More broadly, how does this mechanism of size control compare with those of other bacterial microcompartments (BMCs)? β-Carboxysomes do not form large icosahedral shell structures in the absence of Rubisco, 52 nor do they possess an obvious analogue of CsoS2. Combined with the evidence for inside-out assembly, this suggests that their size results from a close interplay between the growing shell and the dense cargo around which it is stabilized. 9,16,17 Among other BMCs (e.g., Pdu and Eut), there is a wide variety of sizes and morphologies. It is not presently clear whether they follow common assembly principles and size-control mechanisms. Interestingly, some BMC shell proteins can assemble into diverse macrostructures such as nanotubes, extended sheets, and rosettes. 57 A leading hypothesis for BMC size control is that it may be driven by the relative stoichiometries of constituent shell capsomers that differ in their tolerance of curvature. 19,54,58,59 With additional structural and biochemical data, it will thus be interesting to see whether there is a broad unifying principle for the size control of microcompartments or if, as with viruses, there are multiple mechanistic routes to similar outcomes.

■ CONCLUSIONS
In this study, we sought to uncover the determinants of particle size in α-carboxysomes. We discovered that size is set primarily by the scaffold protein CsoS2 and its interactions with the shell. Through systematic modification of CsoS2, we observed that the size effect is mediated by the relative numbers of M-peptide repeats in the MR and C-peptide repeats in the CTD. Additionally, we determined that one functional consequence of the PRF in H. neapolitanus CsoS2 is to increase particle size at no cost in additional sequence.
We propose an energetic model based on differential curvature preferences of the M- and C-peptides and connect this model to a mechanistic structural hypothesis based on AlphaFold structural predictions and new experimental breakthroughs in carboxysome structure.
An increasingly comprehensive view of α-carboxysome assembly is coming into focus as the structures and interactions linking its components are elucidated. Only a few key gaps, most notably the CsoS2 MR-shell interaction, remain before a complete structural account, spanning atomic detail to functional particles, is in hand. With this depth of information, we think that the carboxysome is a useful paradigm for complex self-assembly and potentially a template, physically and conceptually, for future engineering efforts.
■ ASSOCIATED CONTENT
Supporting Information

Figure 1. (a) H. neapolitanus regulon with the 10 proteins making up the α-carboxysome. The callout shows the domain structure, with repeated peptide motifs, of the disordered scaffold protein CsoS2. Note that M7 has been reclassified as an M-peptide (see Figure S1). (b) Carboxysome protein interactome. Known protein-protein interactions are depicted with solid arrows, and cases where the contact structures have been solved are shown in the popouts along with their PDB IDs. CsoS2 is an important hub in the network, bridging cargo and shell. Cartoon schematics are shown roughly to scale.

Figure 2. (a) Comparison of recombinant carboxysomes with and without the primary cargo Rubisco. Negative-stain TEM micrographs show the same overall pseudo-icosahedral structure. All swarm plots show the minimum Feret diameter. The dark central line is the mean diameter, with whiskers to ±1 standard deviation. Particle counts and mean diameters are shown at the top. The blue lines mark the sizes of solved icosahedral complexes of shell capsomers, with the corresponding T-numbers indicated (20,21). (b) Hexamer titration in E. coli carrying pHnCB10, a plasmid encoding all 10 carboxysome proteins under control of the lac operator, and a second plasmid encoding the major hexameric shell proteins CsoS1ABC under tetracycline-inducible control. All cells were induced with 1 mM IPTG and variable amounts of anhydrotetracycline (aTc) to vary shell protein expression levels. Swarm plots of particle sizes as in (a), with inset zoomed TEM micrographs showing normal morphology. (c) Polyproline helix. Carboxysomes were expressed with CsoS2B, i.e., CsoS2 with no frameshift, or with CsoS2B carrying an inserted 12 nm polyproline II helix. Swarm plots of particle sizes as in (a), with inset TEM micrographs of representative particles.
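All size distributions in this work report the minimum Feret diameter, the smallest caliper width of a particle outline. One way to compute it from a segmented particle boundary is sketched below: take the convex hull (Andrew's monotone chain), then use the fact that the minimum width of a convex polygon is attained with one caliper flush against a hull edge. The function names and the input format (a list of (x, y) boundary points, e.g. in nm) are assumptions for illustration; this is not the image-analysis pipeline used for the figures.

```python
import math

Point = tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_feret_diameter(points: list[Point]) -> float:
    """Smallest caliper width of the point set (0.0 for collinear input)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    best = math.inf
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        edge_len = math.dist(a, b)
        # Width with one caliper on edge a-b = farthest hull vertex from that line.
        width = max(abs(_cross(a, b, p)) / edge_len for p in hull)
        best = min(best, width)
    return best


# A 100 x 60 rectangle (plus one interior point) has minimum Feret diameter 60.
print(min_feret_diameter([(0, 0), (100, 0), (100, 60), (0, 60), (50, 30)]))  # 60.0
```

The edge-flush property keeps the search to one width per hull edge, so the cost is dominated by the O(n log n) hull construction plus an O(h²) scan over h hull vertices, which is negligible for particle outlines.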

Figure 3. (a) Diagram of the programmed ribosomal frameshift (PRF), in which ribosome slippage to the −1 frame leads to premature termination of CsoS2, generating the CsoS2A isoform. (b) Kernel density estimates of the size distributions of carboxysomes from H. neapolitanus and E. coli containing wtCsoS2 (CsoS2A + CsoS2B) or only CsoS2B, in which the PRF has been eliminated. White boxes span the inner quartiles, with the median as the central black line. Whiskers span the 5th to the 95th percentile. Shown below the distributions are p-values from unpaired two-tailed t tests. Representative negative-stain TEM images are shown for carboxysomes purified from H. neapolitanus and E. coli. (c) Scheme and distributions for titration of CsoS2A on a background of CsoS2B-only carboxysomes (red) and titration of CsoS2B on a background of wtCsoS2 carboxysomes. t test p-values are shown below. (d) Mean particle diameters for the titration conditions plotted against the fraction of CsoS2 present as CsoS2A, measured by gel densitometry. aTc concentrations (nM) are shown in italics next to each data point. Popouts are representative TEM micrographs of the extremes of the CsoS2B titration, showing substantially smaller carboxysomes with overexpression of CsoS2B.
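The Figure 3 distributions summarize each population by its median, inner quartiles, and 5th to 95th percentile whiskers, and compare conditions with unpaired two-tailed t tests. A standard-library sketch of those summaries follows. The percentile convention (`statistics.quantiles` with `method="inclusive"`, which agrees with NumPy's default linear interpolation) and the pooled-variance (Student) form of the t test are assumptions, since the exact conventions are not stated; the function names and sample values are hypothetical. For the two-tailed p-value itself, SciPy's `scipy.stats.ttest_ind(a, b)` returns both the statistic and the p-value.

```python
import math
import statistics


def box_summary(diameters: list[float]) -> dict[str, float]:
    """Median, inner quartiles, and 5th/95th percentiles of a size sample."""
    q1, median, q3 = statistics.quantiles(diameters, n=4, method="inclusive")
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%.
    pct = statistics.quantiles(diameters, n=20, method="inclusive")
    return {"p5": pct[0], "q1": q1, "median": median, "q3": q3, "p95": pct[-1]}


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, na + nb - 2


# Hypothetical diameters (nm) for two conditions; not data from this study.
condition_a = [118.0, 125.5, 131.2, 122.8, 127.4, 119.9, 129.3]
condition_b = [104.1, 110.6, 99.8, 108.2, 112.5, 101.7, 106.3]
print(box_summary(condition_a))
print(student_t(condition_a, condition_b))
```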

Figure 4. (a) A series of carboxysomes was produced containing all components (including Rubisco) and CsoS2 with variable numbers of M-peptide repeats, from 0 to 13 (CsoS2B has 7). Negative-stain TEM images are shown for representative particles in each population. All images are at the same scale, with 100 nm scale bars. Below are the distributions of minimum particle Feret diameters. The middle black line indicates the median, the white box represents the inner quartiles, and whiskers extend from the 5th to the 95th percentile. (b) Mean particle diameter as a function of the number of M-peptides. Whiskers are ±1 standard deviation of the distributions. (c) Plate growth assay of CsoS2-deficient H. neapolitanus complemented with a subset of the CsoS2 M-peptide variants. Colony-forming unit (CFU) counts were quantified in biological triplicate at atmospheric (0.04%), 0.5%, and 5% CO2. Whiskers represent ±1 geometric standard deviation.
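Because CFU counts span orders of magnitude, panel (c) uses ±1 geometric standard deviation, i.e., multiplicative rather than additive error bars. As a sketch (the function name and the triplicate values are hypothetical, and use of the sample n − 1 standard deviation is an assumption), the geometric mean is exp of the mean of ln x, the geometric SD is exp of the standard deviation of ln x, and the error bar runs from mean/GSD to mean × GSD.

```python
import math
import statistics


def geometric_summary(counts: list[float]) -> tuple[float, float, float, float]:
    """Geometric mean, geometric SD, and the +/-1 GSD interval for positive counts.

    Uses the sample (n - 1) standard deviation of the log counts (an assumption).
    Zero counts (no growth) cannot be log-transformed and are rejected.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("geometric statistics need strictly positive counts")
    logs = [math.log(c) for c in counts]
    gmean = math.exp(statistics.fmean(logs))
    gsd = math.exp(statistics.stdev(logs))
    return gmean, gsd, gmean / gsd, gmean * gsd


# Hypothetical triplicate CFU counts; not data from this study.
print(geometric_summary([2.1e5, 3.4e5, 9.8e4]))
```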

Figure 5. (a) Energetic models of M- and C-peptide shell interactions. Plots show the potential energy of shell-bound M-peptides (green) and C-peptides (blue) as a function of shell angle. In the "preferred curvature" model, the C-peptide has an energy minimum at a high-curvature shell configuration. In the "flexible curvature" model, the C-peptide has the same preferred angle as the M-peptide but a shallower energy well. (b) Idealized map of carboxysome shell curvature. Higher-curvature regions are centered on vertices, while lower curvature prevails between them and on facets. The model predicts that green regions will be more favorable for M-peptide binding and blue regions more favorable for C-peptide binding. (c) More M-peptide stabilizes particles with a greater share of low-curvature surface and thus a greater diameter.
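The two energetic scenarios in panel (a) can be sketched with the simplest possible potential, a harmonic well E(θ) = −D + ½k(θ − θ₀)² per bound peptide, where θ is the hexamer-junction angle. Reading "shallower" as both weaker (smaller D) and broader (smaller k) is an assumption of this sketch, the class and function names are hypothetical, and every parameter value is arbitrary rather than fitted. In both models the M-peptide wins near facet-like angles and the C-peptide wins beyond a crossover angle, which is the property the curvature map in panel (b) relies on.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Well:
    """Hypothetical harmonic binding well: E(theta) = -depth + (k / 2) * (theta - theta0)**2."""
    theta0: float  # preferred hexamer-junction angle, degrees (illustrative)
    k: float       # stiffness, energy units per degree**2 (illustrative)
    depth: float   # binding energy at theta0, arbitrary units (illustrative)

    def energy(self, theta: float) -> float:
        return -self.depth + 0.5 * self.k * (theta - self.theta0) ** 2


# Illustrative parameters only; none are fitted to data.
M_PEPTIDE = Well(theta0=20.0, k=0.02, depth=5.0)
MODELS = {
    # C-peptide optimum sits at a higher junction angle; same stiffness and depth.
    "preferred curvature": Well(theta0=40.0, k=0.02, depth=5.0),
    # C-peptide shares the M-peptide optimum but has a weaker, broader well.
    "flexible curvature": Well(theta0=20.0, k=0.005, depth=4.0),
}


def favored(theta: float, c_well: Well, m_well: Well = M_PEPTIDE) -> str:
    """Peptide class with the lower binding energy at junction angle theta (ties go to M)."""
    return "C" if c_well.energy(theta) < m_well.energy(theta) else "M"


def crossover(c_well: Well, m_well: Well = M_PEPTIDE, max_theta: int = 90) -> float:
    """First whole-degree angle above the M-peptide optimum at which the C-peptide wins."""
    for theta in range(int(m_well.theta0), max_theta + 1):
        if favored(theta, c_well, m_well) == "C":
            return float(theta)
    return math.nan


for name, c_well in MODELS.items():
    print(f"{name}: C-peptide favored from {crossover(c_well)} degrees upward")
```

The scan starts at the M-peptide optimum because the figure concerns junctions more curved than facets; note that with symmetric wells the flexible model would also favor the C-peptide at angles well below θ₀, a region this sketch does not consider. Panel (c) then follows qualitatively: the more M-peptides a scaffold carries, the more low-angle junctions it can stabilize, and in a closed icosahedral shell the low-curvature share grows with diameter.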

Figure 6. (a) Sequence logo for the M-peptide, created from 1662 M-peptide sequences drawn from 272 CsoS2 sequences after dereplication to 95% identity, with aligned AlphaFold predictions for all M-peptides in the CsoS2s of H. neapolitanus, P. marinus, N. eutropha, and I. purpurea. Key conserved residues are colored to match the bars on the sequence logo. (b) Schematic of the hexameric shell lattice indicating the site classes with 3-fold symmetry. (c) Surface representation of the H. neapolitanus M1-peptide docked onto the 3-fold hexamer junction with Rosetta. The zoomed view shows the VTG motifs in contact with the triangle defined by CsoS1A-His79.
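Sequence logos such as the one in panel (a) scale each column by its information content, R = log2(20) − H, where H = −Σ p_a log2 p_a is the Shannon entropy of the amino-acid frequencies at that position, and each letter's height is its frequency times R. The sketch below computes both for an ungapped alignment; it omits gap handling and the small-sample correction offered by logo tools such as WebLogo, the function names are hypothetical, and the example peptides are invented strings, not real M-peptide sequences.

```python
import math
from collections import Counter


def column_information(alignment: list[str], alphabet_size: int = 20) -> list[float]:
    """Information content (bits) per column: log2(alphabet_size) minus Shannon entropy.

    Assumes an ungapped alignment of equal-length sequences; no small-sample correction.
    """
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*alignment):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        info.append(max_bits - entropy)
    return info


def letter_heights(alignment: list[str]) -> list[dict[str, float]]:
    """Logo letter heights: residue frequency times the column's information content."""
    heights = []
    for column, bits in zip(zip(*alignment), column_information(alignment)):
        n = len(column)
        heights.append({aa: bits * c / n for aa, c in Counter(column).items()})
    return heights


# Invented toy peptides for illustration; these are NOT real CsoS2 M-peptide sequences.
toy = ["KVSAVTGC", "KVAEVTGC", "KLSQVTGC", "KVNAVTGC"]
print([round(bits, 2) for bits in column_information(toy)])
```

Fully conserved columns reach the log2(20) ≈ 4.32-bit ceiling, which is why invariant residues such as the conserved tyrosine and cysteine pair stand out as tall single letters in a logo.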