Recent Structural Insights into Bacterial Microcompartment Shells

Bacterial microcompartments are organelle-like structures that enhance a variety of metabolic functions in diverse bacteria. Composed entirely of proteins, they are built from thousands of homologous hexameric shell proteins that tessellate to form facets, while pentameric proteins form the vertices of a polyhedral shell that encapsulates various enzymes, substrates and cofactors. Recent structural data have highlighted nuanced variations in the sequence and topology of microcompartment shell proteins, emphasizing how variation and specialization enable the construction of complex molecular machines. Recent studies engineering synthetic miniaturized microcompartment shells provide additional frameworks for dissecting principles of microcompartment structure and assembly. This review updates our current understanding of bacterial microcompartment shell proteins, providing new insights and highlighting outstanding questions.


INTRODUCTION
Bacterial microcompartments (MCPs or alternatively BMCs) are a class of supramolecular structures found in approximately 20% of bacteria [1]. Ranging in size from roughly 100-400 nm in diameter, MCPs encapsulate and optimize a myriad of metabolic pathways by concentrating enzymes and substrates together to accelerate catalysis and to prevent the escape of toxic or volatile intermediates [2-8]. Unlike membrane-bound eukaryotic organelles, MCPs are composed entirely of proteins. So-called BMC proteins (based on Pfam PF00936) form hexameric building blocks shaped like hexagonal disks (Figure 1). These tessellate side by side to form flat extended facets [9-13], with lateral associations driven by highly conserved perimeter residues [12,14-16]. In addition, and distinct from BMC proteins, pentameric BMV proteins (Pfam PF03319) form the vertices of the polyhedral shell, which is in some cases roughly icosahedral (Figure 1) [17-19]. The heterogeneous shells of MCPs are formed from two to seven BMC paralogs that are often expressed within a single operon [4,7,8]. Narrow pores located at the center of BMC proteins provide channels for the diffusion of specific substrates and cofactors. The external protein shell remains the hallmark of all MCPs.
This review focuses on current structural data, highlighting and updating our understanding of the roles of bacterial microcompartment shell proteins. We draw attention to the unusual variations in sequence and topology in BMC shell proteins and emphasize structural polymorphisms in certain subsets, which are likely to relate to functional specialization. Finally, we examine the implications of recently characterized mini MCP shells, addressing their strengths in revealing MCP assembly principles and overall shell topology.

STRUCTURAL FEATURES AND VARIATIONS IN BMC SHELL PROTEINS
To date, the structures of some 110 MCP shell proteins have been deposited in the Protein Data Bank (PDB) [34]. A specialized database focusing on MCPs has recently been established to facilitate the analysis and study of their structures [35**]. Here we summarize the structural features of BMC shell proteins and update the tertiary topological variations discovered so far.
Though structurally similar, BMC shell proteins exhibit topological differences of various types (Figure 2a). The canonical BMC protein consists of a domain of roughly 100 amino acids (PF00936) and is referred to as BMC-H for its hexameric assembly (Figure 2b). The hexagonal disks formed by BMC proteins have distinctly shaped top and bottom faces, with one relatively flat face and the other bearing a central depression that creates a concave surface. Looking down the flat face and following the sequence from the N- to the C-terminus, the secondary structure elements of BMC-H proteins are arranged in a roughly clockwise fashion (Figure 2b). A unique subset of hexameric BMC domain-containing shell proteins, Permuted BMCs, have cyclically permuted sequences and structures. While they possess a similar overall tertiary structure, the circular permutation results in differently poised N- and C-termini relative to their BMC-H counterparts [36-38*]. In cases that have been structurally characterized (including Permuted BMCs from Eut, Pdu and Cut MCP types), a novel extension at the N-terminus forms a right-handed 6-stranded beta-barrel (with one strand from each subunit) protruding from the otherwise flat face (Figure 2b). Another variation of BMC shell proteins has arisen from gene duplication events that produced tandem domain structures. So-called BMC-T proteins, comprising two BMC domains, oligomerize to form trimeric pseudohexamers (Figure 2b) whose overall architectures closely resemble a canonical hexameric BMC disk [13,39-41].
Three-dimensional all-against-all comparisons between the known BMC structures reveal further types of variation, particularly among the BMC-T proteins (Figure 2). These relate to surprising differences in the way the sequential tandem domains are arranged, and whether the individual domains are permuted. Remarkably, different BMC-T proteins present sequentially connected BMC domains arranged in either a clockwise or counterclockwise fashion in the context of the trimeric (pseudohexameric) disk. These varied forms can be accommodated with a more finely articulated naming convention: BMC-T(+), Permuted BMC-T(+) and Permuted BMC-T(−). Here the superscript conveys the clockwise (+) or counterclockwise (−) ordering of domains when viewed from the flat face (Figure 2b). The greatest number of BMC-T structures deposited in the PDB are of the Permuted BMC-T(−) type, with a non-exhaustive list including PDB IDs 3GFH, 3I82, 3MPV, 4FAY, 4FDZ and 6ARD [37,40,42-45]. Despite sequence variations, all known BMC-Ts retain the overall BMC architecture, with pseudohexameric shapes that allow them to substitute for BMC-H hexamers within a complete MCP [46]. The specific advantages conferred by distinct variations of the BMC protein in a shell are still not fully understood, but important functional distinctions are believed to relate to overall assembly, interactions with interior enzymes (and likely other cellular proteins), and molecular transport across the shell.
The central pores of BMC shell proteins provide routes for the diffusion of molecules across the MCP shell. Mutagenesis experiments suggest that the narrow pores in BMC-H hexamers are the primary routes of substrate influx [47-49*]. Different MCP types operate on and thus transport different substrates, suggesting that sequence and structural variations in the pores of BMC proteins are likely important for diverse metabolic functions. The pores of BMC-H hexamers are typically narrow (roughly 4 to 7 Å in diameter). Absent structural evidence for large conformational transitions, the narrow pores in BMC-H hexamers are presumed to be relatively static. Electrostatic properties of BMC pores have been analyzed, with particular implications for their roles in MCPs with charged substrates (e.g. the carboxysome [bicarbonate], Eut [ethanolamine], and Aaum [aminoacetone]) [9,10,19,50]. Several recent molecular dynamics (MD) and flux modeling studies have begun to examine the atomic details and mathematical aspects of pore transport. Important questions concern the degree to which pores in BMC-H proteins are selective for their cognate metabolic substrate. Optimal metabolic function would presumably occur with a combination of facile substrate influx and restricted metabolic intermediate efflux. MD studies on the PduA (BMC-H) protein suggested a modest level of selection in this regard, with a preference for its propanediol substrate that is 3 to 10 times greater than for its propionaldehyde intermediate [51]. Similar MD studies on carboxysome shell proteins have also reported a range of selectivities for the substrate, with values in one case as high as 1000 times greater than for the corresponding intermediate [49*]. Interestingly, transport and metabolic flux modeling on both the carboxysome and the Pdu MCP have emphasized that high selectivity might not be critical for function if internal consumption of the intermediate is sufficiently rapid [52,53].
Nonetheless, the pores of BMC-H proteins present useful targets for modulating MCP function, including by mutagenesis to occlude pores or to insert non-native 4Fe-4S clusters [47,54]. Additional structural studies on more remote BMC homologs could shed further light on transport mechanisms. Based on sequence alignment, several BMC shell proteins from a recently proposed xanthine MCP [33*] appear to have three to four residue insertions near the loop region, opening the possibility for identifying novel pore features and functionalities.
BMC-T proteins present intriguing features and additional puzzles related to transport. A general theme is that the evolution of trimeric BMC assemblies appears to have allowed for greater versatility at the pore because of lower symmetry; only three instead of six equivalent residues need to be accommodated near the center. Indeed, examples of both Permuted BMC-T(+) and Permuted BMC-T(−) proteins have been shown to form trimers in which the central pore can apparently convert between open and closed forms, with important implications for the transport of larger substrates or cofactors [37,40-42,50,55,56]. The potential presence of larger pores (between 12 and 14 Å [42,55]) in the shell presents a dilemma, as retaining key metabolic intermediates is essential for proper MCP function. Different ideas have been put forward on the subject. In some cases it appears that the large BMC-T pores are regulated and could be occluded by allosteric binding events, e.g. by substrates when the MCP is active [42,50,57,58]. As described below, alternative mechanisms of opening and closing have been proposed in other cases [37,40-42,50,55,56,58]. Interestingly, examples of the BMC-T(+) type include cases where the central pore presents three symmetry-related cysteine residues (one from each protein chain) for coordinating a 4Fe-4S cluster [54,57,59].
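The flux-modeling observation that high pore selectivity may matter less when the intermediate is consumed rapidly can be illustrated with a deliberately simple calculation. The sketch below is not the model used in the cited studies; it assumes that an intermediate molecule inside the shell faces two competing first-order processes (escape through the pores and consumption by an interior enzyme), and that selectivity simply divides the escape rate constant relative to substrate entry. The function names and rate values are hypothetical and in arbitrary units.

```python
def escape_fraction(k_escape: float, k_consume: float) -> float:
    """Fraction of intermediate molecules that leave through the shell
    instead of being consumed inside, for two competing first-order steps."""
    if k_escape < 0 or k_consume < 0 or (k_escape + k_consume) == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k_escape / (k_escape + k_consume)


def escape_with_selectivity(k_substrate_entry: float,
                            selectivity: float,
                            k_consume: float) -> float:
    """Toy assumption: the pore passes the intermediate `selectivity`
    times more slowly than it passes the substrate."""
    k_escape = k_substrate_entry / selectivity
    return escape_fraction(k_escape, k_consume)


if __name__ == "__main__":
    k_entry = 1.0  # hypothetical substrate entry rate constant (arbitrary units)
    for k_consume in (1.0, 100.0):          # slow vs. fast interior consumption
        for selectivity in (1.0, 10.0, 1000.0):
            lost = escape_with_selectivity(k_entry, selectivity, k_consume)
            print(f"k_consume={k_consume:6.1f}  selectivity={selectivity:7.1f}  "
                  f"fraction escaping={lost:.4f}")
```

In this toy picture, a non-selective pore paired with fast consumption loses less intermediate than a tenfold-selective pore paired with slow consumption, which is the qualitative point made by the flux models.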

POLYMORPHISMS IN QUATERNARY STRUCTURE
MCP shell proteins display a surprising degree of flexibility (Figure 3). This is particularly true of Permuted BMC proteins. An early study found that the EutS shell protein crystallized in two forms: 1) a canonical flat disk and 2) a hexamer with a twisting or bending deformation down the two-fold axis of symmetry (Figure 3a) [37]. Recent structural characterization of another Permuted BMC homolog, CutR from a Choline Utilization Type II MCP, showed other forms of flexibility including the formation of flat disks and screw-type helical assemblies of varying pitch (Figure 3a) [38*]. In another instance, a synthetic Permuted BMC (a version of PduA, a BMC-H protein, that was engineered to introduce an artificial circular permutation) rearranged to form a cyclic homopentamer, despite retaining the BMC fold (Figure 3a) [60]. Interestingly, such structural polymorphism does not appear unique to Permuted BMCs. BMV shell proteins are understood to serve as the pentameric component required for (Gaussian) shell curvature and closure [17,18,61], yet the EutN protein (from the BMV family, PF03319) was found in two separate crystallographic experiments to be capable of forming cyclic homohexamers (PDB IDs 2HD3 and 2Z9H) [62].
BMC-Ts also appear capable of alternative quaternary conformations (Figure 3b). Specifically, all structurally characterized members of the Permuted BMC-T(+) subset have been observed to assemble as stacked disks (a dimer of trimers), creating a large central cavity accessible by pores on opposite ends [41,50,55,56]. While the biological relevance of these stacked disks has yet to be confirmed, their recurrence in multiple studies suggests their potential importance. Stacked disks are observed in crystal structures of recombinantly expressed and purified Permuted BMC-T(+) type proteins from alpha-carboxysomes (CsoS1D), beta-carboxysomes (CcmP), aminoacetone utilization MCPs and other MCPs of unknown function [41,50,55,58]. Double disks have also been observed by crystallography and cryo-EM in recombinantly purified mini MCP shells [46**,56], further described below. One intriguing hypothesis is that they could serve as a gated airlock system for transport [41].
Variability in the open and closed states of the pores of stacked disks has been noted in numerous studies. Some stacked disks have two open pores (PDB ID 4HT7), some have one open and one closed pore (PDB IDs 3F56, 3FCH, 4HT5, 5LSR and 5V75) and some have two closed pores (PDB IDs 3NWG, 5L39, 5LT5, 5V76 and 5SUH) [41,50,55,56]. The presence of stacked disks with two open pores would counter an airlock mechanism, but crystallographic observations call for cautious interpretation on this issue, owing both to conditions of the crystalline state and challenges in adequately capturing important dynamic behavior.

LARGE RECOMBINANT ASSEMBLY FORMS
Attempts to assemble larger protein species, including work to mix different shell proteins, date to the first BMC protein structural studies [9]. Subsequent efforts to develop experimental procedures and suitable combinations of shell proteins have led to remarkable successes in purifying and characterizing what can be described as miniaturized synthetic shells (Figure 4a). Several examples have been obtained, built from either a single component or multiple shell components and ranging in size from 130 Å to 400 Å in diameter [24**,46**,56,60,63**]. These miniaturized synthetic MCP shells have helped to support and more finely articulate assembly principles that were formulated from studies on individual shell components. Of particular note, these synthetic miniaturized MCPs have confirmed the roles of BMV proteins as polyhedral vertices and the role of lateral associations between hexameric units through conserved interactions at their perimeters.
These miniaturized structures have also led to surprises. The first observed mini shell was obtained serendipitously from an engineering experiment wherein a synthetically permuted version of an otherwise ordinary BMC-H protein, PduA, formed a 130 Å dodecahedron from twelve cyclic homopentamers [60]. More deliberate assembly studies based on mixtures of various BMC and BMV proteins have produced a range of structures. A 6.5 MDa mini shell was constructed from one BMC-H, one BMV and three BMC-T proteins from an MCP of unknown function. This 400 Å diameter, icosahedral shell (triangulation number T=9) highlighted the ability of different BMC domain-containing proteins to occupy different positions in the icosahedral shell [46**,56]. Studies on constructing mini shells from beta-carboxysome shell proteins resulted in a variety of structures, including 210 Å T=3, 245 Å T=4 and 310 Å prolate T=4, Q=6 icosahedral mini shells. A broad diversity in shape, size and morphology was observed despite using only one BMC-H and one BMV protein [63**]. The most recent study led to the structural characterization of a 250 Å T=4 shell constructed from GRM2 proteins. This work utilized three BMC-Hs, one BMV and numerous enzymatic proteins in varying combinations. In addition to the T=4 shell that was characterized in detail, this work revealed diversity in shape and size [24**]. While they were not necessary for the formation of closed structures, different enzymes appeared to be hierarchically involved in the formation of larger particles, though their structures could not be resolved by cryo-EM [24**].
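The triangulation numbers quoted for these mini shells follow the standard Caspar-Klug relation T = h² + hk + k², under which a closed icosahedral lattice has 12 pentagonal vertices and 10(T − 1) hexagonal positions. The sketch below applies that geometric bookkeeping to the T values mentioned above. It assumes, consistent with the assembly picture described in the text, that each pentagonal position is one BMV pentamer and each hexagonal position is one BMC-H hexamer or BMC-T pseudohexamer. The counts are derived from the geometry alone rather than taken from the cited structures, the function names are illustrative, and the prolate (T=4, Q=6) shell is not covered by this formula.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number for lattice steps (h, k)."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def tile_counts(t: int) -> dict:
    """Pentagonal and hexagonal positions in a closed icosahedral shell."""
    if t < 1:
        raise ValueError("T must be at least 1")
    return {
        "pentagons": 12,              # one per icosahedral vertex
        "hexagons": 10 * (t - 1),     # remaining lattice positions
        "asymmetric_units": 60 * t,   # quasi-equivalent positions in total
    }


if __name__ == "__main__":
    # (h, k) pairs giving the T numbers mentioned in the text, plus T=1.
    for h, k in ((1, 0), (1, 1), (2, 0), (3, 0)):
        t = triangulation_number(h, k)
        counts = tile_counts(t)
        print(f"T={t}: {counts['pentagons']} pentagonal and "
              f"{counts['hexagons']} hexagonal positions")
```

The T=1 case, with twelve pentagonal positions and no hexagonal ones, corresponds geometrically to the dodecahedron built from twelve cyclic homopentamers.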
The experimental studies on miniaturized shells have emphasized the importance of identifying a suitable composition of BMC (and BMV) paralogs for assembly. In some cases, the resolution of the structural studies has not fully distinguished the identities of similar paralogs in shells that contain complex mixtures of BMC proteins [24**,46**,56]. Thus, some uncertainties remain in modeling the key atomic interactions between components; models where distinctions between multiple BMC paralogs are ambiguous exhibit surface complementarity between BMC and BMV components with values that are somewhat lower than seen in well-resolved BMC-BMC interfaces (for example, 0.4 vs 0.6 or higher).
An important and unexpected observation from the structures of miniaturized synthetic MCPs concerns the orientation of the protein layer forming the shell. To date, all characterized mini shells show BMC proteins with their concave faces oriented outward towards the cytosol. Early structural studies on BMC and BMV proteins offered both orientations (concave in or concave out) as possibilities [17], but biochemical and mutational studies provided evidence to suggest that concave faces interact with internal enzymes [64,65], which would require flat faces to be oriented outward towards the cytosol. Additional biochemical data could be vital for clarifying this outstanding issue. In order to elucidate the exact arrangements of shell proteins, higher resolution structural data on intact native MCPs produced in situ and containing interior enzymes will be essential. Encapsulated enzymes are particularly critical, given the important roles that interior enzymes have been shown to play in organizing the external shell in some systems [24**,66-68*].
Several cryo-electron tomography (cryo-ET) studies have begun to pursue in situ MCP structural elucidation. Multiple cryo-ET studies on carboxysomes, which are the most geometrically regular of the MCP types, have confirmed their roughly icosahedral shape, revealing nearly flat facets and identifiable edges, though detailed structural features of the individual shell proteins, including shell protein orientation, have not been resolved [69-71]. Moreover, some degree of order has been seen for the encapsulated RuBisCO molecules [69,70,72-74]. The metabolosome MCPs present even greater challenges owing to their more irregular, polymorphic shapes [75]. Even the degree to which their polyhedral architectures might be described within the broader scheme of irregular icosahedra remains unclear. Recent studies highlight the possible need to bring new kinds of analysis to this problem [68*,76].

CONCLUSION AND OUTLOOK
Bacterial microcompartments are extraordinary examples of how complex protein assemblies have evolved to provide subcellular organization and compartmentalization in bacterial cells. Their ability to form robust supramolecular architectures from a complex mixture of homologous shell proteins rivals similar phenomena seen in large viruses. A great deal of structural data has revealed nuances in topological variations, conformational flexibility and quaternary polymorphisms in MCP shell proteins, highlighting the role that duplication has played in supporting functional diversification. Studies on miniaturized MCPs have affirmed models for larger scale shell architecture, though the absence of internal enzymes presents an important gap that will ultimately need to be bridged in order to understand native MCPs. In particular, the shells of metabolosome MCPs are considerably more irregular compared to the icosahedral and nearly icosahedral assembly models presently available. Cryo-ET studies pushed to higher resolution limits may be essential for achieving a fuller understanding of MCPs. In parallel, computational simulations could be informative regarding biophysical parameters that might govern assembly architecture. New large-scale assembly simulation methods are beginning to provide insights along this line [77*,78].

Figure 2. Different BMC shell proteins exhibit varied tertiary structures. (a) Based on comparisons of three-dimensional similarity with sequential ordering enforced, BMC shell proteins of known structure (PDB codes shown) cluster into five distinct populations, representing subtypes within the two major families: BMC-H (blue), which assemble as hexamers from a single BMC domain, and BMC-T (red), which assemble as trimers from two BMC domains. In the case of BMC-Ts, superscript notations denote whether the sequential domains are arranged clockwise (+) or counterclockwise (−) in a disk, as shown in panel b. The graph layout is a stochastic optimization placing protein structures at separation distances specified by their coordinate overlap deviations (including penalties for non-overlapping residues) as nearly as possible in two-dimensional space; the overall orientation is arbitrary.
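
The kind of stochastic layout described for panel (a) can be illustrated with a minimal sketch. This is not the procedure used to generate the figure: it assumes a simple accept-if-better random search that moves one point at a time to reduce the squared mismatch between two-dimensional distances and target deviations. The 4 x 4 deviation matrix is invented for illustration (two pairs of similar structures), and all names are hypothetical.

```python
import math
import random

# Hypothetical pairwise deviations (arbitrary units): items 0 and 1 are similar
# to each other, as are items 2 and 3, while the two pairs are far apart.
TOY_DEVIATIONS = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
]


def stress(points, target):
    """Sum of squared differences between 2D distances and target deviations."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - target[i][j]) ** 2
    return total


def stochastic_layout(target, steps=20000, step_size=0.1, seed=0):
    """Random-search layout: perturb one point and keep the move if stress drops."""
    rng = random.Random(seed)
    n = len(target)
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    initial = current = stress(points, target)
    for _ in range(steps):
        i = rng.randrange(n)
        old = points[i]
        points[i] = (old[0] + rng.gauss(0, step_size),
                     old[1] + rng.gauss(0, step_size))
        trial = stress(points, target)
        if trial < current:
            current = trial      # accept the improving move
        else:
            points[i] = old      # reject and restore the previous position
    return points, initial, current


if __name__ == "__main__":
    layout, start, end = stochastic_layout(TOY_DEVIATIONS)
    print(f"stress before: {start:.3f}, after: {end:.3f}")
    for index, (x, y) in enumerate(layout):
        print(f"structure {index}: ({x:.2f}, {y:.2f})")
```

Because only pairwise distances enter the objective, any rotation, reflection or translation of the result fits equally well, which is why the overall orientation of such a layout is arbitrary.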