Biochemical and structural insights into SARS-CoV-2 polyprotein processing by Mpro

SARS-CoV-2, a human coronavirus, is the causative agent of the COVID-19 pandemic. Its genome is translated into two large polyproteins subsequently cleaved by viral papain-like protease and main protease (Mpro). Polyprotein processing is essential yet incompletely understood. We studied Mpro-mediated processing of the nsp7-11 polyprotein, whose mature products include cofactors of the viral replicase, and identified the order of cleavages. Integrative modeling based on mass spectrometry (including hydrogen-deuterium exchange and cross-linking) and x-ray scattering yielded a nsp7-11 structural ensemble, demonstrating shared secondary structural elements with individual nsps. The pattern of cross-links and HDX footprint of the C145A Mpro and nsp7-11 complex demonstrate preferential binding of the enzyme active site to the polyprotein junction sites and additional transient contacts to help orient the enzyme on its substrate for cleavage. Last, proteolysis assays were used to characterize the effect of inhibitors/binders on Mpro processing/inhibition using the nsp7-11 polyprotein as substrate.


INTRODUCTION
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; CoV-2), a member of the family Coronaviridae, is responsible for the ongoing coronavirus disease 2019 (COVID-19) global pandemic (1). The toll of CoV-2 is extraordinary in terms of worldwide repercussions in the number of infected people, deaths, and pace of infection spread (https://covid19.who.int/). SARS-CoV-2 has ã 30-kb (+)-sense RNA genome, one of the largest known of any RNA virus, that encodes 13 open reading frames (ORFs), including replicase (ORF1a/ORF1b), spike (S), envelope (E), membrane (M), nucleocapsid (N), and 7 other ORFs that encode accessory proteins (2). ORF1a and ORF1b are translated to produce two large polyproteins, pp1a and pp1ab. These polyproteins are subsequently cleaved into 16 nonstructural proteins (nsps) by virally encoded proteases: the papain-like protease (PLpro; a domain of nsp3), which cleaves junctions from nsp1 to the nsp4 N terminus, and the main protease (Mpro; nsp5, 3C-like protease), which cleaves junctions from the nsp4 C terminus to nsp16 (3). The "polyprotein strategy"-used by most RNA viruses and retroviruses-allows for (i) a more compact genome, (ii) regulation of activity through a precise temporal (i.e., stage of viral cycle) and spatial (i.e., subcellular location) cleavage pattern, and (iii) cleavage intermediates having distinct and critical roles from those of the cleaved products, as shown for alphaviruses, picornaviruses, and noroviruses (4)(5)(6). Hence, coordinated processing of polyproteins is vital for regulating the viral life cycle. Different polyprotein intermediates derived from Mpro-mediated pp1a/1b processing have been detected in other CoVs, including mouse hepatitis virus (MHV) (7,8) and alphacoronavirus human CoV 229E (HCoV-229E) (9). CoV-2 and MHV belong to the betacoronavirus genus with the latter being a good surrogate mouse model for studying CoV-2 infection and biology (10)(11)(12). Notably, mutations in the junction sites within the MHV nsp7-10 polyprotein were found to be lethal for viral replication, with the exception of the nsp9-10 site, where mutations led to a crippled mutant virus (8). In addition, a polyprotein intermediate of~150 kDa corresponding to nsp4-10/11 has been detected in pulsechase experiments (13,14). Reverse genetic studies with temperature-sensitive mutants in MHV suggest that nsp4-11 could serve as a scaffold where replicative enzymes (nsp12, nsp14, and nsp16) may dock to perform their activities on the viral RNA (15). Alternatively, they may indicate that mutation of this intermediate perturbs Mpro processing (16). Thus, the functional roles of nsp4-10/ 11 in virus replication remain unclear. In addition, the subcellular localization of nsp7 to nsp10 has been studied for several CoVs using immunofluorescence microscopy and cryo-electron microscopy/tomography. pp1a/1ab is anchored to the endoplasmic reticulum membranes by flanking transmembrane domains of nsp4 and nsp6, along with membrane-spanning nsp3. This topology results in the membrane-anchored Mpro being exposed to the cytosol along with, most likely, the nsp7-10/7-11 region (7,(17)(18)(19)(20). Moreover, data from CoVs and other RNA viruses suggest that "convoluted membranes" (the precursors of the coronavirus replication organelles formed by double-membrane vesicles) may be the main site of viral gene expression and polyprotein processing. However, it should be noted that these labeling techniques cannot distinguish between mature nsps and polyprotein intermediates.
More recently in CoV-2-infected cells, the identification of viral cleavage sites at nsp4, nsp8-9, and nsp10-12 junctions at different postinfection time points has validated the presence of polyprotein intermediates and thus garnered support for further investigation into their functional relevance and structures (3). Krichel and coauthors (21,22) have applied a structure-function approach to investigate the processing of the SARS-CoV nsp7-10 and Middle East respiratory syndrome coronavirus (MERS-CoV) nsp7-11 polyproteins in vitro using native mass spectrometry (MS). Their results emphasized the critical role of the polyprotein conformation and the structural environment of the cleavage junctions in determining cleavage order, as the order of processing was previously inferred by determining the specific activity of Mpro cleavage on short oligopeptide sequences comprising the cleavage junctions (23).
Here, we have used a multipronged approach to study the structural basis of processing of the CoV-2 nsp7-11 polyprotein by Mpro in vitro, given its highly dynamic nature and multidomain organization. We have characterized the processing kinetics through gelbased and pulse labeling MS techniques, as well as the structural footprint of the polyprotein on Mpro and vice versa. We also Color scale represents changes in deuterium uptake over the course of the cleavage reaction, with gray representing no significant change in deuterium uptake, white denoting no sequence coverage, and black representing residues within peptides that are no longer identifiable. developed integrative structural models of the nsp7-11 and nsp7-8 polyproteins [by MS, small-angle x-ray scattering (SAXS), and molecular modeling]. These experiments allowed us to rationalize how the tertiary structure of the polyprotein influences the order of processing of the polyprotein by Mpro and provided insights into the binding of the polyprotein substrate to Mpro. Lastly, taking advantage of the vast number of Mpro-ligand structures, we identified a set of binders (with some displaying antiviral activity) overlapping with regions of Mpro relevant for polyprotein binding outside of its active site and probed them in proteolytic inhibition assays including the full-length polyprotein as substrate. Taken together, the information gathered from this study improves our understanding of the role of polyproteins in SARS-CoV-2 viral replication.

SDS-PAGE analysis reveals cleavage order for Mpromediated processing of the nsp7-11 polyprotein
Polyprotein processing in coronaviruses is a precise and tightly regulated process (8,16,36). We first expressed and purified the nsp7-11 and nsp7-8 polyproteins and wild-type (WT) Mpro (fig. S1) to assess the proteolytic cleavage order of SARS-CoV-2 polyproteins. Next, we conducted a semiquantitative proteolysis assay of the nsp7-11 polyprotein with Mpro using SDS-polyacrylamide gel electrophoresis (SDS-PAGE) as a readout ( Fig. 1A and fig. S2). Analysis of the nsp7-11 polyprotein processing at room temperature revealed that the nsp9-10 junction was cleaved initially (starting~30 min), followed by simultaneous cleavage of the nsp8-9 and nsp10-11 junctions (starting~2 hours), and lastly the nsp7-8 junction (starting 4 hours) (Fig. 1, A and B). This order of cleavage is identical to the polyprotein processing order reported for SARS-CoV (CoV-1), which was expected given their high amino acid sequence conservation (22). Identification of the protein bands in the SDS-PAGE gel was done by performing in-gel tryptic digestion followed by liquid chromatography-tandem MS (LC-MS/MS) analysis (fig. S2). Authentic N-and C-terminal peptides were observed for some of the proteins, suggesting that the Mpro cleavage resembles the authentic processing in our in vitro system. Moreover, performing the proteolysis at 4°C slowed the cleavage process but did not change the cleavage order. Altering ratios of Mpro to polyprotein also had no effect on the cleavage order (1:6 and 1:12 molar ratios) ( fig. S3), further supporting the specificity of Mpro and the lack of a temperature/concentration-dependent effect on cleavage order. After 24 hours of exposure to Mpro, the nsp7-8 junction was not completely cleaved. The proteolysis assay with the nsp7-8 intermediate polyprotein showed that nsp7-8 was also not fully cleaved after 24 hours (1:10 molar ratio of Mpro:nsp7-8) ( fig. S4), suggesting that the structural environment around the nsp7-8 junction impedes efficient Mpro cleavage with respect to the other junction sites.

Pulse labeling HDX-MS validates the cleavage order of polyprotein processing observed by SDS-PAGE analysis
To gain further structural insight into the polyprotein processing by Mpro, we conducted the same proteolysis assay followed by pulse labeling hydrogen-deuterium exchange MS (HDX-MS). HDX-MS is a useful tool for probing protein conformation and dynamics by measuring protein backbone amide hydrogen exchange. During conventional continuous labeling HDX-MS, proteins are incubated in deuterated buffer for increasing lengths of time such that changes in solvent exchange, as measured by deuterium uptake, represent changes in equilibrium conformational dynamics as a result of alterations in the backbone amide hydrogen chemical environment, such as changes in hydrogen bonding and/or solvent accessibility. However, with pulsed HDX-MS, protein interactions are sampled at different incubation intervals and then subjected to a single, short deuterium labeling step. Thus, observed changes in solvent exchange are representative of distinct protein populations present in solution. Briefly, we incubated nsp7-11 with Mpro at an equimolar ratio, and the cleavage reaction was allowed to proceed over 24 hours on ice. Aliquots of the reaction were taken at various time intervals and incubated in deuterated buffer for 30 s on ice before being quenched, flash-frozen, and stored until ready for MS analysis. All time points were compared to nsp7-11 without Mpro to observe changes in solvent exchange occurring in the polyprotein over the course of the proteolytic process (Fig. 1C, fig. S5, and table S1).
We observed increased solvent exchange in the nsp9 C-terminal region at 30 min as compared to the protein in the absence of enzyme (Fig. 1C). Solvent exchange continued to increase in this region during the proteolysis time course and was followed by increased solvent exchange in the nsp10 N-terminal region. These increases in solvent exchange can be explained by cleavage of the nsp9-10 junction releasing the nsp7-9 N termini and nsp10-11 C termini from the intact polyprotein junction. These newly formed termini have increased conformational mobility compared to the intact polyprotein junctions. Concomitantly, we observed decreased solvent exchange at the cleavage junction residues, which is a result of reduced solvent accessibility due to Mpro binding at the junction sites during the cleavage reaction. As the proteolysis time course progressed, we also observed a decrease in signal intensity of the peptides spanning the nsp9-10 junction site, providing further evidence of productive cleavage of the nsp9-10 junction. At 24 hours, we were no longer able to detect these peptides in the mass spectrometer, indicating full cleavage of the nsp9-10 site ( fig. S5). These observations suggest that the nsp9-10 junction is being cleaved first, consistent with our SDS-PAGE analysis (Fig. 1A).
The nsp8-9 and nsp10-11 junctions appeared to be simultaneously cleaved next as both junctions showed changes in solvent exchange starting at 4 hours. At the nsp9 N-terminal region, we observed increased solvent exchange compared to the protein in the absence of the enzyme. As we were unable to detect peptides specifically spanning the nsp8-9 junction, we could not determine the exact timing of full cleavage. Meanwhile, at the nsp10-11 junction site, we observed decreased solvent exchange and reduced signal intensity in the peptides spanning the junction site until 24 hours when they were no longer identified in the mass spectrometer, suggesting full cleavage. While it appears that nsp11 is no longer observed at 24 hours, this is due to our inability to detect any nsp11-only peptides, as it is only 13 amino acids in length, such that all the peptides covering nsp11 also include the junction residues ( fig. S5). Nevertheless, it was clear that nsp8-9 and nsp10-11 are cleaved simultaneously, following cleavage at the nsp9-10 site.
The nsp7-8 site is cleaved last, as we did not observe any changes in deuterium uptake near the nsp7-8 junction until 8 hours. While we did not observe decreased solvent exchange in the peptides spanning the nsp7-8 junction to indicate Mpro binding at this site, we no longer detected these peptides in the mass spectrometer at 24 hours, suggesting that the nsp7-8 junction is cleaved by Mpro.
We also observed changes in solvent exchange away from the junction sites in nsp7 and nsp8. The protection from solvent exchange within both nsp7 and nsp8 suggests that nsp7 and nsp8 associate into a heterodimer after their release from the polyprotein. The pattern of protection observed in nsp7 agreed well with the protection we observed in our prior continuous labeling HDX-MS analysis of the nsp7:nsp8 complex (wherein we confirmed agreement of our HDX-MS data with the multiple x-ray crystal structures of "linear" heterotetrameric nsp7:nsp8 complexes) (37). Unexpectedly, increased exchange within nsp8 was also observed starting at 2 hours. These peptides demonstrated EX1 exchange kinetics as revealed by the detection of two distinct deuterated ion distributions (bimodal) for the same peptide (38). Under native conditions, this behavior has been shown to be a result of multiple intermediate conformational protein states (39)(40)(41)(42). The observed EX1 behavior in nsp8 could be explained by the increased flexibility of the nsp8 N terminus, adopting multiple conformations as previously documented (21,37).
In addition, nsp10 also showed decreased deuterium uptake away from the junction site. Comparing the intrinsic exchange profile of nsp7-11, nsp7-10, and individual nsp10 (see the Section 'HDX-MS profile of polyprotein revealed similar secondary structure elements to individual nsps'), we observed that nsp10 has the greatest deuterium uptake in nsp7-11, while nsp7-10 and nsp10 showed similar intrinsic exchange profiles, suggesting that the presence of nsp11 reduces solvent exchange within nsp10. Specifically, the regions of protection from solvent exchange observed in nsp10 during the pulsed HDX-MS experiment align with the residues showing decreased deuterium uptake in mature nsp10 and nsp7-10. Moreover, the pulsed HDX-MS following nsp7-10 proteolysis ( fig. S6) did not show any changes in deuterium exchange in nsp10, as expected from the comparison of the intrinsic exchange profiles of nsp7-10 and mature nsp10. This confirms that released nsp10 does not interact with other liberated proteins in solution, and the observed decreased solvent exchange in nsp10 is due to the release of nsp11 from nsp10.

Continuous labeling HDX-MS demonstrates localized sites of interaction of C145A Mpro to polyprotein junction sites
Next, we used continuous labeling HDX-MS and cross-linking MS (XL-MS) as complementary techniques to better understand the solution phase dynamics of the complex. To facilitate these studies, we used a catalytically inactive Mpro mutant, C145A Mpro. Mutation of the active site cysteine to alanine inhibits the cleavage of the substrate, allowing us to capture the bound Mpro: polyprotein complex. Using continuous labeling HDX-MS, we compared nsp7-11 versus nsp7-11 in complex with C145A Mpro at an equimolar ratio ( Fig. 2A, fig. S7A, and table S1). Increased protection from solvent exchange was observed at all junction sites except the nsp10-11 junction. The nsp9-10 junction had the largest magnitude in protection from solvent exchange, which suggests that it is the primary binding site on the polyprotein, as the presence of Mpro at the junction site would reduce solvent accessibility and solvent exchange. This observation is consistent with the SDS-PAGE (Fig. 1A) and pulsed HDX-MS results (Fig. 1C) that also indicate the nsp9-10 junction to be the initial target of Mpro. Minimal alteration of solvent exchange was observed within the nsp subdomains outside of the junction sites, suggesting that Mpro interaction with the polyprotein is favored at the junction site sequences and that binding of Mpro does not induce significant long-range conformational changes in the polyprotein. Only nsp8 showed additional regions of protection from solvent exchange away from the junctions, specifically residues T120 to M140 and K182 to L213. These regions of the polyprotein are inherently more dynamic, as determined by higher intrinsic deuterium exchange (Fig. 3A), and thus, the observed protection suggests that interaction with C145A Mpro is stabilizing the nsp8 N-terminal region.
Similar results were observed with the nsp7-10 polyprotein (Fig. 2B, fig. S7B, and table S1). All the cleavage sites in nsp7-10 were protected from solvent exchange upon interaction with C145A Mpro. In addition, the nsp9-10 junction showed the greatest magnitude in protection from solvent exchange, while only nsp8 showed additional regions of protection outside the junctions. These results indicate that nsp11 does not alter the polyprotein interactions with C145A Mpro.

Continuous labeling HDX-MS validates polyprotein binding to C145A Mpro beyond the active site pocket
Next, we profiled changes in solvent exchange of C145A Mpro bound to nsp7-11 at an equimolar ratio but did not observe any significant protection from solvent exchange within C145A Mpro ( fig.  S7C), despite evidence from the above differential HDX-MS experiments that C145A Mpro is interacting with the polyprotein. We noted that C145A Mpro deuterium uptake was relatively low over an experimental time course of 1 hour, suggesting that the Mpro dimer is stable. Thus, we incubated C145A Mpro and the C145A Mpro:nsp7-11 complex in deuterium buffer for 12 hours to increase the observable experimental window for detecting protection from solvent exchange. Overall, deuterium uptake was increased over this longer time course, and significant protection from solvent exchange was observed in the active site peptides in the region containing the C145A mutation only when in the presence of the polyprotein (Fig. 2C, fig. S7D, and table S1). Additional regions of the enzyme were shown to be protected from solvent exchange at 12 hours of incubation in deuterated buffer only, including peptides spanning residues T45 to S46, V77 to L89, I106 to F150, Y161 to L205, and D289 to Q306 (Fig. 2D, fig. S7E, and table S1). These regions mapped from the back side of the C145 Mpro active site pocket, including the other catalytic dyad residue (H41), to the vicinity of the dimerization interface, hinting at a putative binding track for the rest of the nsp7-11 polyprotein outside of the nsp9-10 junction (see below for more details).

XL-MS demonstrates additional contact sites between C145A Mpro and polyproteins
While the continuous labeling HDX-MS analysis of the C145A Mpro polyprotein complex reported on changes in protein backbone dynamics, we also analyzed the complex using XL-MS to probe protein side-chain reactivity and spatial proximity. A total of nine interprotein cross-links were identified between C145A Mpro and nsp7-10 ( Fig. 2E). The three C145A Mpro residues (K61, S62, and K102) that form interprotein cross-links with nsp7-10 were mapped to the catalytic domain (Fig. 2F). When these cross-links were mapped alongside the HDX-MS data, we observed that the interprotein cross-links map to residues outside of the regions showing protection from solvent exchange (Fig. 2E). Accordingly, these cross-links represent additional contact sites between the polyprotein and Mpro that might help orient the junction sites into the active site and stabilize the complex. For example, K162 within nsp8 is located between the two regions of nsp8 showing protection from solvent exchange (Fig. 2B) and forms an interprotein cross-link with S62 and K102 in Mpro. This further supports that interaction with C145A Mpro stabilizes nsp8 conformation. No cross-links were observed between the Mpro active site and polyprotein junction sites as we used the MS-cleavable cross-linker disuccinimidyl sulfoxide (DSSO) that primarily reacts with lysine as well as serine, threonine, and tyrosine albeit with lower efficiency. Because the Mpro active site does not contain lysine residues, only S46 and T190 may be capable of reacting with DSSO; however, our results suggest that these residues are not reactive toward the cross-linking reagent under the current experimental conditions.

HDX-MS profile of polyprotein revealed similar secondary structure elements to individual nsps
The HDX-MS intrinsic exchange profiles of nsp7-11, nsp7-10, and nsp7-8 polyproteins all revealed similar solvent exchange behavior ( Fig. 3A), which suggests that all polyproteins share similar secondary structure elements and overall conformation. In addition, the HDX-MS intrinsic exchange profiles of the polyproteins largely resembled the intrinsic exchange profiles of individual nsp7, nsp8, and nsp9 (Fig. 3A). This suggests that the secondary structures within the polyproteins remain largely unchanged in these mature nsps. The only exception is mature nsp10, which, by itself or within nsp7-10, shows reduced deuterium uptake compared to nsp10 within the nsp7-11 polyprotein. This suggests that nsp10 shows an increase in dynamics when bound to nsp11.
Integrative structural modeling of the nsp7-11 polyprotein structure predicts an ensemble of different conformations Next, we turned to an integrative structural modeling approach using multiple experimental techniques to account for biases inherent to each technique. We used the ab initio-based Iterative Threading Assembly Refinement (I-TASSER) algorithm that allows incorporation of experimental restraints (43). We decided to focus the integrative modeling efforts on nsp7-11 as translation of ORF1a includes nsp11. We first used analytical size exclusion chromatography (SEC) coupled to multiangle light scattering (MALS) and SAXS detection (SEC-MALS-SAXS) to analyze the assembly state and structural features of the polyprotein in solution. The SEC of the nsp7-11 polyprotein showed two peaks, suggesting the presence of two different states: monomer and dimer, with the monomeric form being predominant ( fig. S8A). The MALS analysis was used to calculate the molecular weights (MW) of the two identified peaks for nsp7-11 (∼60 and~110 kDa). SAXS analysis was conducted for both oligomeric states to understand the arrangement of these polyproteins in solution. Concretely, evolving factor analysis was used for separating the scattering of the monomer and dimer components ( fig. S8B) in a model-independent way (44). Both states yielded a linear Guinier plot, indicating the presence of stable protein sample with no aggregation (fig. S8C). The bell-shaped (Gaussian) curve at lower q values in the Kratky plot showed that the sample contains folded domains with no significant disorder ( fig. S8D). The pairdistance distribution function, P(r), which is related to the shape of the sample, indicated a globular-shaped protein for both the monomeric and dimeric forms of nsp7-11 ( fig. S8E). The Rg and Dmax values calculated from the P(r) are 48.2 and 191 Å for the dimer and 35.8 and 156 Å for the monomeric nsp7-11 (table S2).
Subsequently, we applied the integrative structural modeling approach to predict the structure(s) of the monomeric state of the nsp7-11 polyprotein (Fig. 3B), which may likely be the form binding to Mpro (based on the HDX-MS data; see Fig. 2A and fig. S7A). Models were generated on the basis of the amino acid sequence and the following experimental parameters: (i) distance constraints from XL-MS, (ii) secondary structure restraints from solved x-ray crystal structures of the mature nsp7 to nsp10, and (iii) various nsp7-8 polyprotein models (fig. S10, A to F; more details in the Supplementary Materials). A final ensemble of 10 nsp7-11 models were binned into four representative conformational groups, which were all assessed using our gamut of techniques (Fig. 3, B and C).
When we compared the four model groups, the nsp7 helical bundle stood as the most conserved structural element in all models, except for group D. The other nsp domains presented more diversity in structural conformations and orientations. Group A is defined by an extended helical N terminus, with a "golf-club" conformation, observed in the CoV-1 nsp7:nsp8 complex (45) and in CoV-2 structures of nsp8 interacting with nsp7 and nsp12 (46)(47)(48). Groups B and C exhibit a more compact organization of nsp8, with group B having nsp7, nsp8, nsp9, and nsp10 arranged linearly, while group C has all domains arranged in a packed "sphere," and group D presents an "open" conformation.
Despite these conformational differences, all the models satisfied most of the cross-links, with distances equal to or less than 30 Å (upper limit distance for DSSO cross-links; Fig. 4A, fig. S11, and table S3). Specifically, group B satisfied the greatest percentage of cross-links, while groups A and D had the highest number of violations. These violations mostly stemmed from the extended nsp8 N-terminal helix in group A and the less ordered conformation in group D, which suggests that the nsp7-11 polyprotein, and especially the nsp8 segment, samples multiple conformations in solution. This conclusion is also supported by the HDX-MS data, showing that the central region of the nsp8 N-terminal subdomain exhibited greater solvent exchange (higher percent deuterium), suggesting increased inherent dynamics ( fig. S12D).
Next, we evaluated the agreement of the models with the experimental HDX-MS data using HDXer software ( Fig. 4B; fig. S12, A and B; and table S3) (49,50). HDXer calculates deuterated fractions for peptide segments corresponding to the experimental data as a function of the experimental deuterium exposure times. We then plotted the computationally derived percent deuterium value at a 10-s incubation in deuterated buffer for each model versus the experimentally determined percent deuterium value at a 10-s incubation in deuterated buffer and calculated their root mean square error (RMSE). Group A demonstrated the lowest RMSE (best agreement), compared to other models. Group C and D models had the highest RMSE, suggesting these conformations to be representative of models with too much rigidity or too much flexibility, respectively. Moreover, it was noted that nsp11 had the worst agreement with experimental HDX-MS data, while nsp7 had the best agreement (fig. S12C), which is likely due to the lack of nsp11 structures and abundance of x-ray crystal structures of nsp7 in complex with nsp8.
Next, a three-dimensional (3D) shape (bead model) was reconstructed for the monomeric nsp7-11 from the scattering profile using DAMMIF/N from the ATSAS package (51,52). As the SAXS scattering profile represents averaged scattering from all different possible orientations, it may be possible that many different shapes/orientations can generate the same scattering profile, and therefore, for certain shapes, it can be difficult to generate a bead model that correctly represents the solution shape. To assess whether a bead model uniquely fits the scattering data or whether multiple models can fit the data, certain criteria are checked including ambiguity score, normalized spatial discrepancy (NSD) value, number of clusters, and parameters such as Rg, Dmax, and MW values. Ambiguity score or "a-score" is the initial screening that informs about the number of possible shapes representing the same scattering profile. An a-score below 1.5 is usually indicative of a unique ab initio shape determination. In our case, 0.85 ascore suggested a unique 3D reconstruction. The Rg and Dmax values obtained from the reconstructed model were close to those calculated from the P(r) function. The MW value of the refined model was also comparable to the expected MW value (table S2). Another important criterion to consider is NSD, which is used to evaluate the stability of the reconstruction. An NSD value less than 1.0 suggests fair stability of the reconstructions. DAMAVER reported 0.95 NSD for our reconstruction, which is on the borderline of a stable reconstruction. DAMCLUST created nine different clusters, suggesting that several different shapes in solution could have generated the same scattering profile. While the ambiguity score and comparable Rg, Dmax, and MW values favored the bead model reconstruction, other criteria such as NSD and the number of clusters suggested heterogeneity in the reconstruction. The higher NSD value and multiple clusters are likely due to nsp7-11 adopting multiple conformations. As stated earlier, the central segment of nsp8 is highly flexible and dynamic, as suggested by greater solvent exchange in HDX, which could lead to heterogeneity in the conformations. Of the four representative conformational groups, group A showed the best fit with the reconstructed SAXS envelope, as the extended nsp8 helix fit into the elongated extension of the envelope. The less ordered and open conformation of the group D model appeared to fit better in the SAXS envelope compared to the more ordered and compact structure of group C models. We also compared the calculated scattering profile for the models to the experimental scattering profile. The χ 2 and Rg values for group A showed the greatest agreement with experimental data (Fig. 4C, fig. S13, and table S3).
In summary, the assessment of the 10 models using HDX-MS, XL-MS, and SAXS highlighted that nsp7-11 can sample four major conformations (table S3). To note, our integrative structure modeling approach cannot ascertain the abundance of the different conformers within the ensemble. Group A conformers adopted an extended nsp8 helix with good agreement with HDX-MS and SAXS data but poor agreement with XL-MS data. Group B conformers showed linear nsp organization with good XL-MS agreement but average agreement with HDX-MS and poor SAXS agreement. Group C conformers were arranged as a packed spherical structure with poor HDX-MS and SAXS agreement but good XL-MS agreement. Lastly, group D conformers had the most dynamic conformations (i.e., fewer ordered secondary structural elements) and showed good agreement to SAXS data but average agreement to HDX-MS and poor XL-MS agreement (table S3).
The ensemble of nsp7-11 models unveils the interplay between cleavage junction conformation and accessibility to determine preference and order of cleavage Next, we evaluated the structural environment of the cleavage junctions in the ensemble of nsp7-11 models to understand the influence of polyprotein substrate conformation and accessibility in processing (Fig. 5, fig. S14, and table S3). In general, all the junctions (except for nsp8-9, which was just partially covered by HDX-MS; Fig. 2A) showed high levels of solvent exchange (high percent deuterium values), consistent with the fact that the cleavage regions should be accessible for proteolysis to occur. The combination of secondary structure and accessible surface area for groups B and C was most consistent with the processing order that we determined by limited proteolysis and pulsed HDX-MS (Fig. 5, fig. S14, and  table S3). Comparing all the junctions, the nsp9-10 junction, which was the first to be cleaved, was the most exposed junction in all the models and adopted a random coil in all but one model, which may best facilitate interaction with Mpro. On the other hand, the nsp7-8 junction, which was the last to be cleaved, was more hindered and mostly adopted an α-helical conformation, which may entail a slow cleavage event. For group A models, nsp7-8 junction appeared to be the most accessible junction, which ultimately lends to our conclusion that the polyprotein is likely sampling multiple conformations with some being more amenable to proteolytic processing than others.

Probing nsp7-11 binding to Mpro with smallmolecule binders
To further understand the implications of polyprotein binding to Mpro outside its active site-studied first via HDX-MS and XL-MS-regarding proteolytic processing, we leveraged the proteolysis assay using the nsp7-11 polyprotein as Mpro substrate to measure inhibition by active site and non-active site binders of Mpro identified through crystallography (30,32). Specifically, we selected small-molecule binders overlapping with the Mpro regions showing protection from solvent exchange in the differential HDX-MS of C145A Mpro with nsp7-11 at 12 hours ( Fig. 6A and  . Some of them presented antiviral activity, but most of them were not tested in enzymatic assays (32). We used the FDAapproved drug NMTV as a positive control. NMTV, as expected, strongly inhibited Mpro processing [median inhibitory concentration (IC 50 ) = 255 nM] of the full nsp7-11 polyprotein substrate in vitro (Fig. 6, B to D) (29). None of the non-active site binders displayed significant inhibition of the enzymatic activity of Mpro (fig. S15A). On the contrary, climbazole and pelitinib showed activation of Mpro activity in our assay conditions, despite the latter showing inhibition of SARS-CoV-2 replication with median effective concentration = 1.25 μM with moderate cytotoxicity (32).
Next, we analyzed by differential HDX-MS the effect of ligand binding on Mpro. NMTV was the only compound that showed significant change in Mpro solvent exchange behavior (Fig. 6E, fig.  S15B, and table S1). The lack of observed change in solvent exchange may be due to experimental limitations in studying the interactions of weak binders by HDX-MS (53,54). The NMTV interaction footprint on Mpro demonstrates strong protection from solvent exchange in the active site, in agreement with the mechanism of action of NMTV forming a reversible covalent thioimidate adduct with the catalytic C145 (29). These results closely resemble the nsp7-10/11 interaction footprint (Figs. 2 and 6A), as we observed protection in the active site of Mpro upon interaction with the polyprotein. In addition, the nsp7-10/11 footprint showed protection from solvent exchange in residues V77 to L89 not found in the presence of NMTV. These residues are located on the back of the catalytic domain, near residues K61 and S62, which form inter--Mpro-nsp7-11 cross-links and are thus likely stemming from a more transient interaction of Mpro with the polyprotein away from the active site.

DISCUSSION
In this work, we have studied the processing of CoV-2 nsp7-11 polyproteins by Mpro. As expected by the high degree of amino acid conservation, we have seen that CoV-2 polyprotein processing is almost, if not, identical to that observed for CoV-1 (22). The cleavage order deduced from the gel analysis is also supported by results from the pulsed HDX-MS experiment. The increased solvent exchange observed in the pulsed HDX-MS in the nsp9 C-terminal region at the first time point (30 min) suggests cleavage and release of nsp7-9 to increase solvent exposure of the nsp9 C-terminal region. This is consistent with the fact that SDS-PAGE gel shows an intermediate nsp7-9 polyprotein observed at 30 min to 1 hour, suggesting that the nsp9-10 junction is the first cleavage site. As shown in the literature for CoV-1 (22), the order of processing cannot be directly inferred from the substrate specificity of Mpro with peptides mimicking the cleavage junctions as the conformation and accessibility of the substrate polyprotein(s) are critical to regulating the process.
Whether the same in vitro order of cleavages occurs during viral replication is unknown. However, several lines of evidence support this concept. Several studies have detected the nsp4-nsp10/11 polyprotein intermediate in MHV-infected cells (13,16,55). Recently, in CoV-2-infected cells, the identification of viral cleavage sites at nsp4, nsp8-9, and nsp10-12 junctions at different postinfection time points is also consistent with such a polyprotein intermediate (3). Reverse genetics studies with MHV-infected cells (8, 16, 55) also provide support for their essential role in the viral replication cycle. As shown in MHV, the processing order of the nsp7-10 region is crucial for viral replication: Either domain deletions or switching and cleavage site mutations were lethal to the virus replication, with the exception being the inactivation of the nsp9-10 cleavage site, which yielded an attenuated mutant virus (8).
In addition, the nsp7-11 and nsp7-8 processing results indicate the presence of the nsp7-8 intermediate even after 24 hours of exposure to Mpro. It is not known whether this longer-lived intermediate could have some functional or essential role in the viral cycle; further suppression of nsp7-8 maturation could represent a unique drug target. The existence of potent maturation inhibitors in HIV has validated this concept as a plausible strategy; bevirimat, the lead for this class, binds to the CA/SP1 junction of the Gag polyprotein and hinders its cleavage: This junction (similar to the nsp7-8 junction) is in a dynamic helix-to-coil equilibrium, and binding of bevirimat stabilizes the helical conformation (56)(57)(58). Regardless, it should be noted that, as labeling techniques used for microscopy cannot distinguish between mature nsps and polyprotein intermediates, chemical probes specifically targeting the nsp7-8 junction could help in further elucidation of the role of polyproteins during the CoV cycle.
Despite the efforts to understand the role of polyprotein processing, structural characterization of the polyproteins is still incomplete. Our HDX-MS results revealed the critical observation that the studied polyproteins have similar intrinsic exchange profiles as the individual proteins (and thus share similar structural elements) that led us to conclude that the individual nsps do not undergo large structural rearrangements following cleavage by Mpro. This observation is also supported by our pulsed HDX-MS analysis of polyprotein proteolysis. If the intact polyprotein, cleaved polyprotein intermediates, and mature nsps underwent significant conformational changes, large changes in the intrinsic exchange profiles for each population of protein would have been expected.
Accordingly, this permitted the use of an integrative structural biology approach combining modeling and experimental methodologies to construct 3D models of nsp7-11 polyprotein, which were used to elucidate the structural basis for the order of CoV-2 polyprotein processing. The structural predictions of the nsp7-11 polyprotein using the I-TASSER software provided us with an ensemble of 10 models with four representative conformations. Overall, none of the four groups satisfy all the experimental HDX-MS, XL-MS, and SAXS data, suggesting that the nsp7-11 polyprotein is dynamic and samples multiple conformations. While SAXS and HDX-MS capture the extended nsp8 helix conformation represented by group A, XL-MS data are more consistent with the more globular protein conformations seen in groups B and C. The surfaceaccessible areas of the cleavage junctions and secondary structure element analysis of the nsp7-11 polyprotein suggested that groups B and C (comprising 6 of the 10 models of the ensemble) might represent the polyprotein conformations in better agreement with the processing order that we determined experimentally (e.g., more accessible and disordered nsp9-10 junction in comparison with a more structured and hindered nsp7-8 junction). On the other hand, the four models comprising groups A and D showcase the conformational adaptability of the polyprotein. In these models, the nsp7-8 junction is more exposed and unstructured, thus more accessible for cleavage. Overall, the nsp7-11 model ensemble recapitulates the need for viral polyproteins to adopt different conformations during the replication cycle, i.e., metamorphic proteins (59,60), given the strict genetic economy of RNA viruses.
The HDX-MS footprint and XL-MS of the Mpro:nsp7-11 complex revealed the importance of the "incognito" part of the polyprotein, the part of the polyprotein excluding the junctions captured in Mpro:substrate peptidic structures (33)(34)(35), in processing. The pattern of cross-links and HDX footprint demonstrated that there are multiple transient contacts between Mpro and the nsp7-11 polyprotein, which help to orient the enzyme on its substrate for cleavage. We propose that the positioning of the polyprotein may be such that either the polyprotein binds to the active site of one Mpro protomer and wraps around to make contact with the back side of the catalytic domain of that same protomer, or the polyprotein binds to the active site of one protomer and sits on top of the back side of the catalytic domain of the other protomer (Fig. 7). These interactions, away from the Mpro active site and the nsp7-11 polyprotein junction sites, may reflect transient interactions that occur while Mpro scans multiple polyprotein intermediates and their different conformations as candidate substrates during proteolytic processing. The lack of Mpro inhibition by the nonactive site surface binders also hints at the transient nature of these interactions ( fig. S15). Nevertheless, these interactions may be important in setting the conformation of the junctions for cleavage, as aforementioned.
The HDX-MS footprint of Mpro:nsp7-11 also reveals significant protection in the Mpro dimerization interface area, especially near the Mpro C terminus (Figs. 2D and 6), suggesting that nsp7-11 binding stabilizes the Mpro dimer. In this sense, El-Baba and colleagues (61) identified that fragment JGY, found through crystallographic fragment screening and binding in the dimer interface (30), destabilized the Mpro dimer and showed~35% inhibition of the rate of processing at 100 μM. Along the same lines, Sun and colleagues (62) found a nanobody, NB2B4, which binds the C-terminal domain of monomeric Mpro (PDB 7VFB) and inhibits activity with an IC 50 value of~150 nM. Thus, destabilization of the Mpro catalytic dimer may contribute to the mechanism of inhibition. Combined with the lack of Mpro inhibition by the non-active site surface binders ( fig. S15), allosteric inhibition of Mpro may only be efficiently achieved by interface binders destabilizing the Mpro dimer. Binding in areas on the surface may not distort the active site of Mpro, which is, by nature, very malleable (accommodating 11 different junctions in virio) (25,34).
While the impressive crystallographic small-molecule repurposing campaign (32) has provided valuable hits and probes along with antiviral activity testing, enzymatic inhibition was not reported by this study. As reviewed for remdesivir (63), the value of mechanistic and enzymatic inhibition studies (alongside antiviral studies) is paramount because it provides a logical path for developing directacting antivirals. The best example in the current case is pelitinib, which, given its strong antiviral activity, was portrayed as an allosteric inhibitor. Our studies show that it might be an allosteric activator. We hypothesize that this activation might be due to the stabilization of the Mpro dimer ( fig. S15A). To understand whether its antiviral activity is due to an off-target effect [ pelitinib has low inhibition of PLpro in enzymatic inhibition assays (64)] or due to dysregulation of viral maturation [as seen for efavirenz acceleration of Gag-Pol processing in HIV (65)], more experiments are required.
In summary, this study describes the structural basis of the order of Mpro processing of the essential nsp7-11 segment and the importance of the more transient interactions of the substrate to Mpro for proper positioning and catalysis and provides a mechanistic validation of allosteric inhibition. In conclusion, our results give structural insights into CoV-2 polyproteins, which will help us in understanding the structure-function relationships, drug design, and the fundamental biology of polyprotein activity and processing in CoV-2.

Protein expression and purification
WT Mpro was produced with native N and C termini, as described in (66). The pGEX-6P-1-nsp5 expression plasmid was transformed into Escherichia coli Rosetta gami competent cells and cultured in LB medium at 37°C with ampicillin (100 μg/ml). Next day, the culture was diluted 1:100 into 1 liter of LB medium supplemented with ampicillin (100 μg/ml). The cells were grown to OD 600 (optical density at 600 nm) = 0.8 before being induced with 1 mM isopropylβ-D-thiogalactopyranoside (IPTG) at 16°C. After 10 hours of induction, the cells were collected by centrifugation at 7200g for 10 min and stored at −80°C. The cell pellet was resuspended in 50 mM tris ( pH 8.0), 300 mM NaCl, 5 mM imidazole, and 1 mM tris(2-carboxyethyl)phosphine (TCEP) followed by sonication and centrifugation at 30,000g for 60 min. The cleared lysate was loaded on a Ni-NTA (nitrilotriacetic acid) affinity column (Qiagen). The bound proteins were first washed with lysis buffer and then with the lysis buffer supplemented with 20 mM imidazole to remove nonspecific proteins. Mpro was eluted with 300 mM imidazole in the lysis buffer and then purified by SEC using a prepacked Superose 6 Increase 10/300 GL column (GE Healthcare Life Sciences) equilibrated in 50 mM tris ( pH 8.0), 300 mM NaCl, and 1 mM TCEP. The fractions containing the pure protein were pooled, concentrated, and stored at −80°C.
nsp7 and nsp8 were produced as described in our earlier work (37). nsp9 was purified using the protocol described in (67). Overall, the plasmid was first transformed into E. coli BL21-CodonPlus (DE3)-RIL cells and then grown in LB medium with kanamycin (50 μg/ml) at 37°C. The cells were grown to an OD 600 of 1.0 before being induced with 0.5 mM IPTG. After 4 hours of induction, the cells were collected by centrifugation at 7200g. The cells were resuspended in a lysis buffer [20 mM Hepes ( pH 7.0), 150 mM NaCl, 20 mM imidazole, 2 mM MgCl 2 , and 0.5 mM TCEP]. The cells were lysed by sonication in the presence of 1 mg of lysozyme and then centrifuged at 10,000g for 20 min. The cleared lysate was loaded on a Ni-NTA affinity column (Qiagen), and the column was then washed with the lysis buffer and 50 mM imidazole buffer. The His-tagged protein was eluted with 400 mM imidazole in the lysis buffer. The His-tag was cleaved by incubating the protein with HRV 3C protease overnight at 4°C. After digestion, the protein was passed through a second Ni-NTA column to remove the 3C protease and the residual uncleaved protein. The protein sample was then purified by SEC using a prepacked Superose 6 Increase 10/300 GL (GE Healthcare Life Sciences) equilibrated in 20 mM Hepes ( pH 7.0), 150 mM NaCl, 2 mM MgCl 2 , and 0.5 mM TCEP. Pure protein-containing fractions were concentrated and stored at −80°C after snap-freezing.
A single colony of E. coli BL-CodonPlus (DE3)-RIL (Agilent Technologies) carrying the GST 3C -nsp10 was used to inoculate 50 ml of LB medium containing the appropriate antibiotics [carbenicillin (100 μg/ml) and chloramphenicol (25 μg/ml)]. This seeding culture was grown overnight in a shaking incubator at 37°C. The seeding cultures were then used to inoculate 1 liter of expression cultures containing the appropriate antibiotics to an initial OD 600 of 0.2 and grown in a shaking incubator at 37°C to an OD 600 of 0.6. The temperature was reduced to 16°C, and protein expression was induced at an OD 600 of 0.9 with the addition of 0.1 mM IPTG. The expression cultures were harvested after 16 hours by centrifugation for 30 min at 2555g, followed by flash-freezing and storage at −80°C. All centrifugation steps were performed at 4°C. The cell pellet from 1 liter of expression culture was resuspended in lysis buffer [50 mM tris, 300 mM NaCl, 5 mM β-mercaptoethanol, 4 mM MgSO 4 , 10% (v/v) glycerol ( pH 8.0)] at a ratio of 5 ml of lysis buffer to 1 g of cell paste and thawed on ice. The cells were lysed by sonication on ice for 8 min, and the cellular debris was separated from the soluble lysate by centrifugation for 30 min at 48,000g. The volume of the soluble lysate was measured, and an equal volume of saturated ammonium sulfate was added to achieve 50% saturation, followed by overnight incubation at 4°C.
The soluble fraction was separated by centrifugation for 30 min at 24,000g and discarded. The pellet was resuspended in 10 ml of lysis buffer, and 100 μl of polyethyleneimine (5%, w/v) was added in a dropwise fashion. The insoluble material was removed by centrifugation for 30 min at 24,000g. The supernatant was decanted and added to 2 ml of Glutathione Sepharose 4 FF (Cytiva) affinity medium, which had been preequilibrated with lysis buffer. Batch binding was performed on an orbital rotator at 4°C for 4 hours, and the unbound protein was removed using gravity-flow chromatography and washed with 20 ml of lysis buffer. GST 3C -nsp10 was cleaved on-column with the addition of 8 ml of lysis buffer containing HRV 3C protease (0.2 mg/ml) and incubation on an orbital rotator at 4°C overnight. The cleaved nsp10 was collected in the flow-through and wash fractions, concentrated to 198 μM using an Amicon Ultra centrifugal filter (MilliporeSigma), and stored at −80°C.
The nsp7-8 and nsp7-11 polyprotein genes were transformed into E. coli BL21-CodonPlus (DE3)-RIL cells and grown overnight on an LB-agar plate containing kanamycin (50 μg/ml). A single colony was picked from the plate and inoculated into LB medium with kanamycin (50 μg/ml). The culture was grown overnight at 37°C. Next morning, the starter culture was diluted 1:500 into the LB medium. The cells were grown at 37°C until OD 600 of~1.6 was reached. The culture was then allowed to cool for an hour at 20°C with continuous shaking after which it was induced with 1 mM IPTG. After overnight incubation, the cells were collected by centrifugation at 7200g. The cell pellet was resuspended in lysis buffer [50 mM tris ( pH 8.0), 500 mM NaCl, 20 mM imidazole, 5% glycerol, 10 mM CHAPS, and 1 mM TCEP] supplemented with 1 μM leupeptin, 1 μM pepstatin, and 1 mM phenylmethylsulfonyl fluoride. The cell suspension was lysed by sonication and clarified by centrifugation at 30,000g at 4°C for an hour. The supernatant was loaded on a Ni-NTA affinity column (Qiagen), preequilibrated with the lysis buffer. The column was first washed with lysis buffer and then with 50 mM imidazole in lysis buffer. Homemade HRV 3C protease in the buffer containing 50 mM tris ( pH 8.0), 500 mM NaCl, 20 mM imidazole, 5% glycerol, and 1 mM TCEP was added to perform oncolumn cleavage of the 6x-His tag at 4°C. The digested protein was eluted from the column and passed through a second Ni-NTA column. This reverse Ni-NTA step is performed to remove residual 3C protease and uncleaved protein. The protein was further purified by ion-exchange chromatography using the HiTrap Heparin HP column (GE Healthcare Life Sciences) and a Mono Q anion exchange column (16/10; GE Healthcare Life Sciences) using gradient elution from 150 mM to 2 M NaCl. The protein sample was then purified by SEC using a prepacked Superose 6 Increase 10/300 GL (GE Healthcare Life Sciences) equilibrated in 50 mM tris ( pH 8.0), 500 mM NaCl, 5% glycerol, and 1 mM TCEP. Pure protein-containing fractions were pooled together, concentrated, and stored at −80°C.
Proteolysis assays with nsp7-8 and nsp7-11 polyprotein substrates WT Mpro was used to carry out cleavage assays with the nsp7-8 and nsp7-11 polyprotein substrates. The in vitro cleavage reaction was performed by incubating the polyproteins with Mpro WT (nsp7-11:Mpro molar ratio was 6 μM:0.5 μM; nsp7-8:Mpro molar ratio was 5 μM:0.5 μM) at room temperature in the assay buffer: 50 mM tris ( pH 7.5), 150 mM NaCl, and 1 mM dithiothreitol (DTT). The reaction was stopped at various time points by the addition of 4× stop buffer [277.8 mM tris-HCl ( pH 6.8), 44.4% glycerol, 4.4% SDS, and 0.02% bromophenol blue]. The samples were then denatured at 95°C for 5 min and assessed on a gradient SDS-PAGE gel. The bands for the full-length substrates, intermediate products, and the final cleavage products were cut and confirmed by MS. In-gel trypsin digestion was performed on the gel bands, and LC-MS/MS was carried out on them (see the Supplementary Materials for MS experimental details). The in vitro cleavage reaction was also performed in Hepes buffer [50 mM Hepes ( pH 8.0), 500 mM NaCl, and 1 mM TCEP] at 4°C to match the assay conditions for processing for pulse labeling HDX ( fig. S3).

In vitro assessment of the effects of small molecules on Mpro activity
The stock solutions of all the Mpro binders were made in dimethyl sulfoxide. They were diluted in the assay buffer and preincubated for 30 min at room temperature with Mpro WT before starting the reaction. The nsp7-11 polyprotein substrate was then added to the reaction at 6 μM. The reaction was stopped after 24 hours. After denaturing, the samples were then run on the SDS-PAGE gel. The effect of small molecules on Mpro activity was assessed by observing the amount of substrate (nsp7-11) present after 24 hours. The gel band intensity for nsp7-11 was calculated using ImageJ software (https://imagej.nih.gov/ij/index.html) and plotted against the concentration of binders using the GraphPad Prism version 9.3.1 (GraphPad Software, La Jolla, CA, USA; www.graphpad.com). The IC 50 value calculation for NMTV was also done using the GraphPad Prism version 9.3.1.

Cross-linking mass spectrometry Sample preparation
For DSSO (Thermo Fisher Scientific) cross-linking reactions, individual protein and protein-protein complexes were diluted to 10 μM in cross-linking buffer [50 mM Hepes ( pH 8.0), 500 mM NaCl, and 1 mM TCEP] and incubated for 30 min at room temperature before initiating the cross-linking reaction. DSSO cross-linker was freshly dissolved in cross-linking buffer to a final concentration of 75 mM before being added to the protein solution at a final concentration of 1.5 mM. The reaction was incubated at 25°C for 45 or 90 min and then quenched by adding 1 μl of 1.0 M tris ( pH 8.0) and incubating for an additional 10 min at 25°C. Control reactions were performed in parallel without adding the DSSO cross-linker. All cross-linking reactions were carried out in three replicates. The presence of crosslinked proteins was confirmed by comparing to the no-cross-link negative control samples using SDS-PAGE and Coomassie staining. The remaining cross-linked and non-cross-linked samples were separately pooled and then precipitated using methanol and chloroform. Dried protein pellets were resuspended in 12.5 μl of resuspension buffer [50 mM ammonium bicarbonate and 8 M urea ( pH 8.0)]. ProteaseMAX (Promega, V5111) was added to 0.02%, and the solutions were mixed on an orbital shaker operating at 400 rpm for 5 min. After resuspension, 87.5 μl of digestion buffer [50 mM ammonium bicarbonate ( pH 8.0)] was added. Protein samples were reduced by adding 1 μl of 500 mM DTT followed by incubation of the protein solutions on an orbital shaker operating at 400 rpm at 56°C for 20 min. After reduction, 2.7 μl of 550 mM iodoacetamide was added, and the solutions were incubated at room temperature in the dark for 15 min. Reduced and alkylated protein solutions were digested overnight using trypsin at a ratio of 1:150 (w/w) (trypsin: protein) at 37°C. Peptides were acidified with 1% TFA and then desalted using C18 ZipTip (Millipore, catalog no. ZTC18 5096). Dried peptides were resuspended in 10 μl of 0.1% TFA in water. Samples were then frozen and stored at −20°C until LC-MS analysis. Liquid chromatography and mass spectrometry A total of 500 ng of sample was injected (triplicate injections for cross-linked samples and duplicate injections for control samples) onto an UltiMate 3000 UHPLC system (Dionex, Thermo Fisher Scientific). Peptides were trapped using a μPAC C18 trapping column (PharmaFluidics) using a load pump operating at 20 μl/min. Peptides were separated on a 200-cm μPAC C18 column (PharmaFluidics) using a linear gradient (1% solvent B for 4 min, 1 to 30% solvent B from 4 to 70 min, 30 to 55% solvent B from 70 to 90 min, 55 to 97% solvent B from 90 to 112 min, and isocratic at 97% solvent B from 112 to 120 min) at a flow rate of 800 nl/min. Gradient solvent A contained 0.1% formic acid, and solvent B contained 80% acetonitrile and 0.1% formic acid. LC eluate was interfaced to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific) with a Nanospray Flex ion source (Thermo Fisher Scientific). The source voltage was set to 2.5 kV, and the S-Lens RF level was set to 30%. Cross-links were identified using a previously described MS2-MS3 method (68) with slight modifications. Full scans were recorded from mass/charge ratio (m/z) 150 to 1500 at a resolution of 60,000 in the Orbitrap mass analyzer. The automatic gain control (AGC) target value was set to 4 × 10 5 , and the maximum injection time was set to 50 ms in the Orbitrap. MS2 scans were recorded at a resolution of 30,000 in the Orbitrap mass analyzer. Only precursors with charge state between 4 and 8 were selected for MS2 scans. The AGC target was set to 5 × 10 4 , a maximum injection time of 150 ms, and an isolation width of 1.6 m/ z. Collision-induced dissociation fragmentation energy was set to 25%. The two most abundant reporter doublets from the MS2 scans with a charge state of 2 to 6, a 31.9721-Da mass difference, and a mass tolerance of ±10 parts per million ( ppm) were selected for MS3. The MS3 scans were recorded in the ion trap in rapid mode using higher-energy C-trap dissociation (HCD) fragmentation with 35% collision energy. The AGC target was set to 2 × 10 4 , the maximum injection time was set to 200 ms, and the isolation width was set to 2.0 m/z.

Data analysis
To identify cross-linked peptides, Thermo.Raw files were imported into Proteome Discoverer 2.5 (Thermo Fisher Scientific) and analyzed via XlinkX algorithm (69) using the MS2_MS3 workflow with the following parameters: MS1 mass tolerance, 10 ppm; MS2 mass tolerance, 20 ppm; MS3 mass tolerance, 0.5 Da; digestion, trypsin with four missed cleavages allowed; minimum peptide length of four amino acids; fixed modification, carbamidomethylation (C); variable modification, oxidation (M), and DSSO (K, S, T, and Y). The XlinkX/PD Validator node was used for cross-linked peptide validation with a 1% false discovery rate. Identified cross-links were further validated and quantified using Skyline (version 19.1) (70) using a previously described protocol (71). Cross-link spectral matches found in Proteome Discoverer were exported and converted to sequence spectrum list format using Excel (Microsoft). Crosslink peak areas were assessed using the MS1 full-scan filtering protocol for peaks within 8 min of the cross-link spectral match identification. Peak areas were assigned to the specified cross-linked peptide identification if the mass error was within 10 ppm of the theoretical mass, the isotope dot product was greater than 0.95, and the peak was not found in the non-cross-linked negative control samples. The isotope dot product compares the distribution of the measured MS1 signals against the theoretical isotope abundance distribution calculated on the basis of the peptide sequence. Its value ranges between 0 and 1, where 1 indicates a perfect match (72). Pairwise comparisons were made using the "MSstats" package (73) implemented in Skyline to calculate relative fold changes and significance. Significant change thresholds were defined as a log 2 fold change less than −2 or greater than 2 and −log 10 P value greater than 1.3 (P value less than 0.05). Visualization of proteins and cross-links was generated using xiNET (74). The data have been deposited to the ProteomeXchange Consortium via the PRIDE (75) partner repository with the dataset identifier PXD033748.

Hydrogen-deuterium exchange mass spectrometry Peptide identification
Peptides were identified using MS/MS experiments performed on a QExactive (Thermo Fisher Scientific, San Jose, CA) over a 70-min gradient. Product ion spectra were acquired in a data-dependent mode, and the five most abundant ions were selected for the product ion analysis per scan event. The MS/MS *.raw data files were converted to *.mgf files and then submitted to MASCOT (version 2.3, Matrix Science, London, UK) for peptide identification. The maximum number of missed cleavages was set to four with a mass tolerance of ±0.6 Da for precursor ions and of ±8 ppm for fragment ions. Oxidation to methionine was selected for variable modification. Pepsin was used for digestion, and no specific enzyme was selected in MASCOT during the search. Peptides included in the peptide set used for HDX detection had a MASCOT score of 20 or greater. The MS/MS MASCOT search was also performed against a decoy (reverse) sequence, and false positives were ruled out if they did not pass a 1% false discovery rate.

Pulse labeling
The nsp7-10 or nsp7-11 polyprotein at 10 μM concentration was incubated with WT Mpro at 1:1 molar ratio, and 5-μl aliquots of the cleavage reaction were removed at 600, 1800, 3600, 14,400, and 86,400 s as well as 7200 and 28,800 s for nsp7-11 only. Aliquots were mixed with 20 μl of deuterated (D 2 O-containing) buffer [50 mM Hepes, 500 mM NaCl, and 1 mM TCEP ( pD 8.4)] and incubated on ice for 30 s. Deuterated samples were quenched with 25 μl of quench solution [5 M urea and 1% TFA ( pH 2)] and immediately flash-frozen and stored until ready for direct inject MS analysis.

Continuous labeling
Experiments with continuous labeling were carried out on a fully automated system (CTC HTS PAL, LEAP Technologies, Carrboro, NC; housed inside a 4°C cabinet) as previously described (76) with the following modifications. For differential HDX, protein-protein complexes were preformed and allowed to incubate for 30 min at room temperature before analysis. The reactions (5 μl) were mixed with 20 μl of deuterated (D 2 O-containing) buffer [50 mM Hepes, 500 mM NaCl, and 1 mM TCEP ( pD 8.4)] and incubated at 4°C for 0, 10, 30, 60, 900, or 3600 s. Following on-exchange, unwanted forward-or back-exchange was minimized, and the protein was denatured by the addition of 25 μl of a quench solution [5 M urea and 1% TFA ( pH 2.0)] before being immediately passed along for online digestion.

HDX-MS analysis
Samples were digested through an immobilized pepsin column ( prepared in-house) at 50 μl/min [0.1% (v/v) TFA at 4°C], and the resulting peptides were trapped and desalted on a 2 mm-by-10 mm C8 trap column (Hypersil Gold, Thermo Fisher Scientific). The bound peptides were then gradient-eluted [4 to 40% (v/v) CH 3 CN and 0.3% (v/v) formic acid] on a 2.1 mm-by-50 mm C18 separation column (Hypersil Gold, Thermo Fisher Scientific) for 5 min. Sample handling and peptide separation were conducted at 4°C. The eluted peptides were then subjected to electrospray ionization directly coupled to a high-resolution Orbitrap mass spectrometer (QExactive, Thermo Fisher Scientific).

Data rendering
The intensity-weighted mean m/z centroid value of each peptide envelope was calculated and subsequently converted into a percentage of deuterium incorporation. This is accomplished by determining the observed averages of the undeuterated and fully deuterated spectra using the conventional formula described elsewhere (77). The fully deuterated control, 100% deuterium incorporation, was calculated theoretically, and corrections for back-exchange were made on the basis of an estimated 70% deuterium recovery and accounting for 80% final deuterium concentration in the sample (1:5 dilution in deuterated buffer). Statistical significance for the differential HDX data is determined by an unpaired t test for each time point, a procedure that is integrated into the HDX Workbench software (78).
The HDX data from all overlapping peptides were consolidated to individual amino acid values using a residue averaging approach. Briefly, for each residue, the deuterium incorporation values and peptide lengths from all overlapping peptides were assembled. A weighting function was applied in which shorter peptides were weighted more heavily and longer peptides were weighted less. Each of the weighted deuterium incorporation values was then averaged by incorporating this weighting function to produce a single value for each amino acid. The initial two residues of each peptide, as well as prolines, were omitted from the calculations. This approach is similar to that previously described (79).
Deuterium uptake for each peptide is calculated as the average of %D for all on-exchange time points, and the difference in average %D values between the unbound and bound samples is presented as a heatmap with a color code given at the bottom of the figure (warm colors for deprotection and cool colors for protection). Peptides are colored by the software automatically to display significant differences, determined either by a >5% difference (less or more protection) in average deuterium uptake between the two states or by using the results of unpaired t tests at each time point (P < 0.05 for any two time points or P < 0.01 for any single time point). Peptides with nonsignificant changes between the two states are colored gray. The exchange at the first two residues for any given peptide is not colored. Each peptide bar in the heatmap view displays the average Δ %D values, associated SD, and the charge state. In addition, overlapping peptides with a similar protection trend covering the same region are used to rule out data ambiguity. The data have been deposited to the ProteomeXchange Consortium via the PRIDE (75) partner repository with the dataset identifier PXD033702 for the pulse labeling HDX-MS experiment and PXD033698 for continuous labeling HDX-MS experiments [ project name: Biochemical and structural insights into SARS-CoV-2 polyprotein processing by Mpro (HDX-MS continuous labeling), project accession: PXD033698, and reviewer account details: username: reviewer_ pxd033698@ebi.ac.uk and password: QgiXipcs; project name: Biochemical and structural insights into SARS-CoV-2 polyprotein processing by Mpro (HDX-MS pulse labeling), project accession: PXD033702, and reviewer account details: username: reviewer_ pxd033702@ebi.ac.uk and password: wHh4zEPB].

SEC coupled to MALS and SAXS detection
Purified and concentrated nsp7-8 (8 mg/ml) and nsp7-11 (4 mg/ ml) were used for data collection. SAXS was performed at BioCAT (beamline 18ID at the Advanced Photon Source, Chicago) with in-line SEC to separate the sample from aggregates and other contaminants, thus ensuring optimal sample quality and MALS, dynamic light scattering (DLS), and refractive index (RI) measurement for additional biophysical characterization (SEC-MALS-SAXS). The samples were loaded on a Superdex 200 Increase 10/300 GL column (Cytiva) run by a 1260 Infinity II HPLC (Agilent Technologies) at 0.6 ml/min. The flow passed through (in order) the Agilent ultraviolet detector, a MALS detector, a DLS detector (DAWN Helios II, Wyatt Technologies), and an RI detector (Optilab T-rEX, Wyatt). The flow then went through the SAXS flow cell. The flow cell consists of a 1.0-mm-inside diameter quartz capillary with~20-μm walls. A coflowing buffer sheath is used to separate the samples from the capillary walls, helping prevent radiation damage (80). Scattering intensity was recorded using a Pilatus3 X 1M (Dectris) detector, which was placed 3.69 m from the nsp7-11 sample, giving us access to a q range of 0.003 to 0.35 Å −1 and 3.631 m from the nsp7-8 sample, giving us access to a qrange of 0.0047 to 0.35 Å −1 . The data were reduced using BioXTAS RAW 2.0.3 (81). Buffer blanks were created by averaging regions flanking the elution peak and subtracted from exposures selected from the elution peak to create the I(q)-versus-q curves used for subsequent analyses. MWs and hydrodynamic radii were calculated from the MALS and DLS data, respectively, using the ASTRA 7 software (Wyatt). Data analysis was carried out using the RAW software package for the determination of radius of gyration (Rg), P(r) distribution, and particle maximum dimension (Dmax) parameters and for qualitative flexibility analysis (through generation of Rgnormalized Kratky and Guinier plots). Volumetric bead modeling was performed using the DAMMIN software package (52). The resulting bead models were averaged and filtered using the DAMAVER package (82), generating the final bead model reconstruction. The SAXS data are deposited in the SAXS database under the accession codes SASDPY2, SASDPZ2, SASDP23, and SASDP33.

Structural integrative modeling using I-TASSER
For the structural predictions of the nsp7-11 polyprotein, an integrative modeling approach was used. The I-TASSER server (43), which is an online source for automated protein structure prediction, was used to generate models of the polyproteins. A two-run approach was used to model the nsp7-11 polyprotein. Run 1 included the following inputs: (i) amino acid sequence, (ii) distance constraints from XL-MS, (iii) nsp7-8 model as a template, and (iv) secondary structure constraints for nsp7, nsp8, nsp9, and nsp10 as advised by HDX-MS to generate models A1, A2, D, C1, and C2 (Fig. 3, B and C). Run 2 included the following: (i) amino acid sequence, (ii) distance constraints from XL-MS, (iii) nsp7-8 as a template, and (iv) secondary structure constraints for nsp8, nsp9, and nsp10 as advised by HDX-MS to generate models A3, B1, B2, B3, and B4 (Fig. 3, B and C). Initial observation of the polyprotein by HDX-MS showed a similar pattern of deuterium uptake compared to the individual proteins (Fig. 3A), suggesting that secondary structures within the polyprotein are likely to largely resemble the secondary structures of the mature nsps. Accordingly, this allowed us to delineate secondary structural constraints based on solved xray crystal structures of nsp7, nsp8, nsp9, and nsp10. We also used two nsp7-8 models that we previously generated using a similar integrative modeling workflow to serve as additional structural templates because the HDX-MS footprint of the nsp7-8 polyprotein resembles the footprint of nsp7-8 in nsp7-11 (Fig. 3A). The two nsp7-8 models were chosen on the basis of their varying agreement with the XL-MS and HDX-MS data to limit bias from a particular experimental approach and to sample the conformational landscape as thoroughly as possible (see the Supplementary Materials for nsp7-8 integrative modeling).
The 10 nsp7-11 output models were assessed against the experimental (i) HDX-MS, (ii) XL-MS, and (iii) SAXS data: (i) Agreement of models to HDX-MS data was completed using HDXer, which generated theoretical deuterium uptake values for the models to compare to experimental values (49,50). Smaller RMSE indicates better agreement of models to experimental data. (ii) Cross-links were mapped on the models using xiVIEW (DOI: 10.1101/561829) to calculate distances and determine the percentage of cross-links satisfied, i.e., distances less than 30 Å. (iii) A theoretical scattering profile was generated for each model using the CRYSOL web interface (83). The theoretical scattering profile of each model was then fitted against the experimental scattering profile. Last, the secondary structural elements and the solvent-accessible surface area of the junction sites for both polyproteins were also analyzed, and the results were compared with the limited proteolysis results to evaluate the physiological relevance of the structure in the context of polyprotein processing. The junctionaccessible area was calculated by the summation of the accessible area of four residues (P1, P2, P1', and P2') at the junction site. The accessible surface area for each residue was calculated using VADAR (84). The integrative structures of nsp7-11 polyprotein have been deposited in the PDB-Dev databank under accession code PDBDEV_00000120. They are also provided in the Supplementary Materials as PyMOL sessions.

Supplementary Materials
This PDF file includes: Supplementary Text Figs. S1 to S15 Tables S1 to S4 Other Supplementary Material for this manuscript includes the following: Data S1 and S2 View/request a protocol for this paper from Bio-protocol.