Optimization of quenched fluorescent peptide substrates of SARS-CoV-2 3CLpro main protease (Mpro) from proteomic identification of P6–P6' active site specificity

ABSTRACT SARS-CoV-2 3C-like main protease (3CLpro) is essential for protein excision from the viral polyprotein. 3CLpro inhibitor drug development to block SARS-CoV-2 replication focuses on the catalytic non-prime (P) side for specificity and potency, but the importance of the prime (P') side in substrate specificity and for drug development remains underappreciated. We determined the P6–P6' specificity for 3CLpro from >800 cleavage sites that we identified using Proteomic Identification of Cleavage site Specificity (PICS). Cleavage occurred after the canonical P1-Gln and non-canonical P1-His and P1-Met residues. Moreover, P3 showed a preference for Arg/Lys and P3' for His. Essential H-bonds between the N-terminal Ser1 of protomer-B in 3CLpro dimers form with P1-His, but not with P1-Met. Nonetheless, cleavage occurs at P1-Met456 in native MAP4K5. Elevated reactive oxygen species in SARS-CoV-2 infection oxidize methionines. Molecular simulations revealed that P1-MetOX forms an H-bond with Ser1 and, notably, revealed strong positive cooperativity between P1-Met and P3'-His, which enhanced peptide-cleavage rates. The highly plastic S3' subsite accommodates P3'-His, which displays stabilizing backbone H-bonds with Thr25 lying central in a “threonine trio” (Thr24-Thr25-Thr26) in the P'-binding domain I. Molecular docking simulations unveiled structure-activity relationships impacting 3CLpro-substrate interactions, and the role of these structural determinants was confirmed by MALDI-TOF-MS cleavage assays of P1'- and P3'-positional scanning peptide libraries carrying a second optimal cut-site as an internal positive control. These data informed the design of two new and highly soluble 3CLpro quenched-fluorescent peptide substrates for improved FRET monitoring of 3CLpro activity with 15× improved sensitivity over current assays.
IMPORTANCE From global proteomics identification of >800 cleavage sites, we characterized the P6–P6' active site specificity of SARS-CoV-2 3CLpro using proteome-derived peptide library screens, molecular modeling simulations, and focussed positional peptide libraries. In P1', we show that alanine and serine are cleaved 3× faster than glycine and the hydrophobic small amino acids Leu, Ile, or Val prevent cleavage of otherwise optimal non-prime sequences. In characterizing non-canonical non-prime P1 specificity, we explored the unusual P1-Met specificity, discovering enhanced cleavage when in the oxidized state (P1-MetOX). We unveiled unexpected amino acid cooperativity at P1-Met with P3′-His and noncanonical P1-His with P2-Phe, and the importance of the threonine trio (Thr24-Thr25-Thr26) in the prime side binding domain I in defining prime side binding in SARS-CoV-2 3CLpro. From these analyses, we rationally designed quenched-fluorescence natural amino acid peptide substrates with >15× improved sensitivity and high peptide solubility, facilitating handling and application for screening of new antiviral drugs.

Severe acute respiratory syndrome (SARS)-coronavirus 2 (CoV-2) is the etiologic agent of coronavirus disease 2019 (COVID-19) in the first documented coronavirus pandemic (1). With SARS-CoV-2 now endemic, infected human reservoirs of susceptible under-vaccinated populations, including children and the immunocompromised, and new variants of concern (VOCs) exhibiting immune- and vaccine-escape potential highlight the likelihood of the emergence of further VOCs. The looming health challenge for COVID-19 caused by new VOCs is to improve antiviral drugs for treatment that can also be pre-positioned as scaffolds for rapid drug development in future coronavirus outbreaks.
SARS-CoV-2 is a positive single-stranded RNA virus encoding four structural proteins (spike, envelope, membrane, and nucleocapsid) and 15–16 non-structural proteins (nsps), including two viral cysteine proteases: nsp5 3C-like protease (3CLpro), also known by the subjective term, main protease (Mpro), and nsp3 papain-like protease (PLpro) (2). The SARS-CoV-2 proteases excise proteins from two frameshift viral polyproteins encoded by ORF1a and ORF1ab. We have used Terminal Amine Isotopic Labeling of Substrates (TAILS) proteomics to discover and validate >150 human host cell substrates of 3CLpro that promote SARS-CoV-2 replication and circumvention of innate immune defenses and antiviral xenophagy, including cleavage-inactivation of galectin-8 to avoid xenophagy destruction (3). 3CLpro was validated early in the pandemic as a therapeutic target. Effective treatment of COVID-19 achieved by inhibiting 3CLpro activity with Paxlovid, a combination drug of Nirmatrelvir and Ritonavir (4), emphasizes the importance of the protease in viral replication and raises the concern that mutant 3CLpro (5) may evolve reduced susceptibility to Nirmatrelvir. A next-generation high-potency anti-proteolytic antiviral drug, Ibuzatrelvir (PF-07817883), with low off-target effects and not requiring Ritonavir to inhibit cytochrome P450-3A4, is in clinical trials by Pfizer to treat infection and prevent transmission. However, high-throughput screening (HTS) for follow-up inhibitors is hampered by poor solubility and handling properties of even the best current quenched fluorescent (QF) peptide substrates of 3CLpro (6).
Substrate cleavage requires the amino acids flanking the scissile bond on the non-prime (P) side and distal prime (P') side to fit the protease S and S' subsites, respectively. The essential role of interactions between Ser1 of Protomer B in SARS-CoV-2 3CLpro dimers with P1-Gln in the S1 subsite for substrate stabilization and peptide bond scission was first described by Kneller et al. (7). Early in the pandemic, Rut et al. (6) employed a combinatorial library of natural and non-natural amino acids to determine the P side specificity of 3CLpro, which showed similarities and differences with the SARS-CoV-1 and MERS 3CLpro. Similarly, active site structural characterization has mainly been directed to the P side to inform the development of antiviral protease inhibitors. In contrast, the contribution of the P' side has often been overlooked. Here, we describe the complete P6–P6' active site substrate specificity of SARS-CoV-2 3CLpro that we determined from >800 cleavage sites identified in three different proteome-derived peptide libraries using Proteomic Identification of Cleavage site Specificity (PICS) (7,8). Employing focused positional peptide libraries and molecular dynamics modeling of the best-fitting sequences, we identified undescribed P'-side natural amino acid residue preferences and their structural determinants in 3CLpro interactions. Crucial P'-side specificity determinants that we identified include a highly plastic S3' subsite moulding the physicochemical envelope of the active site. These data informed the development of new highly soluble, high-efficiency fluorescence resonance energy transfer (FRET) peptide substrates for improved assays of 3CLpro activity, such as for HTS. Moreover, peptide cleavage after P1 methionine in a preferred oxidized form (MetOX) was established, further emphasizing the relevance of oxidative stress in viral infections. These findings should inform med-chem improvement of new small-molecule inhibitor compounds potent against SARS-CoV-2 3CLpro.

Quenched fluorescent peptide cleavage assays
3CLpro activity was quantified using the SARS-CoV-2 3CLpro-specific quenched fluorescent peptide (Ac-Abu-Tle-Leu-Gln-ACC) at 20 µM as described by Rut et al. (6). Fluorescence on cleavage was measured at λex 320 nm and λem 460 nm using a POLARstar Optima (BMG LABTECH) microplate reader. The pH optimum for 3CLpro activity was determined over a pH range of 5–10, and the dimerization concentration was determined by kinetic measurements using the quenched fluorescent peptide substrate. We

Proteome library preparation
E. coli strain K12 lysates were used to prepare proteome-derived peptide libraries as described (8,9). As E. coli does not extensively modify its proteins, it provides suitable proteomes to determine protease specificity. Protein supernatants were reduced with 10 mM dithiothreitol (DTT) for 60 min at 37°C and alkylated with 20 mM iodoacetamide for 30 min at room temperature in the dark. Lysates were quenched with 20 mM DTT and precipitated by mixing 5 mL of lysate with 30 mL of ice-cold acetone:methanol (8:1) and incubating at -80°C for 16 h. Proteome pellets were collected by centrifugation at 9,000 × g for 15 min at 4°C, washed with ice-cold acetone, dissolved in 6 M GuCl, and quantified by A280 nm.
To prepare the trypsin library, 10 mg of protein was diluted 10-fold in 200 mM HEPES, pH 8.0. For the GluC library, 10 mg of protein was diluted 10-fold in 200 mM phosphate-buffered saline, pH 7.5. For the lysargiNase (10) library, 10 mg of protein was diluted 10-fold in 200 mM HEPES, pH 8.0, and CaCl2 was added to 10 mM. For each library, sequencing-grade enzyme was added: 100 µg trypsin (Thermo Fisher Scientific), 200 µg GluC (Thermo Fisher Scientific), or 200 µg lysargiNase (https://www.ibmb.csic.es/en/department-of-structural-and-molecular-biology/proteolysis-lab/lysarginase/) and incubated at 37°C for 16 h with rotation. The digested protein samples were heated at 80°C for 10 min to deactivate proteases, followed by incubation with 10 mM EDTA and 1 mM 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride. Peptide mixtures were adjusted to pH 8.0, and then free amines were dimethylated with 100 mM formaldehyde and 80 mM sodium cyanoborohydride at 37°C for 16 h. Dimethylated peptides were quenched with 150 mM Tris-HCl, pH 8.0, for 45 min at 37°C and then acidified to pH 3 using formic acid, with excess gas allowed to bubble off for 30 min. Peptide samples were purified by solid-phase extraction using C18 SPE cartridges (Waters). Cartridges were activated with 100% acetonitrile (MeCN) and equilibrated with 0.1% formic acid, followed by sample loading. Flowthrough peptides were collected and reloaded. Cartridges were washed thrice with 0.1% formic acid (6 volumes). Dimethylated peptides were eluted twice with 70% MeCN, 0.1% formic acid into Protein LoBind tubes (Eppendorf) and lyophilized to complete dryness. Dimethylated peptide libraries were resuspended in water, quantified by A280 nm, aliquoted, and frozen at -80°C until PICS assays.

Proteomic identification of cleavage substrates
In PICS, 3CLpro-cleaved neo-N-termini display a free N-terminal amine that is reactive with NHS-biotin, which is used to purify the prime side cleavage products. To do so, the N-terminal-blocked proteome-derived peptide libraries were prepared for assay as follows: 300 µg of each library was resuspended in 3CLpro assay buffer [50 mM Tris, 150 mM NaCl, 2 mM DTT, 1 mM EDTA, 0.05% Brij-35, at pH 6.9, which we determined was the pH optimum for the protease (Fig. S1)] in Protein LoBind tubes at 2 mg/mL. 3CLpro or inactive 3CLpro-Cys145Ala as control was added to each library at the dimerization concentration that we determined to be 0.8 µM, gently mixed, and incubated at 37°C for 16 h. Digestions were stopped by heat inactivation. Reduced cleavage of dimethylated lysine-containing peptides by proteases cleaving with Lys specificity is a constraint of PICS (9).
A stock of 10 mM sulfo-NHS-biotin was prepared in DMSO, added to each digested sample to a final concentration of 0.5 mM, and incubated with rotation at 25°C for 2 h. High-capacity streptavidin-Sepharose (1.5× reaction volume) was equilibrated 5 times in 50 mM HEPES, 150 mM NaCl, pH 7.5. The biotinylated products were mixed with washed slurry and vortexed for 30 min at 25°C, after which the mixture was transferred to a spin column (~500 µL capacity) with a filter of ~10 µm pore size. Spin filters were centrifuged at 100 × g for 15 s or until all contents passed through without drying the slurry. Flowthroughs were reapplied and centrifuged again. The resin was washed 10 times and centrifuged with 500 µL of 50 mM HEPES, 150 mM NaCl, pH 7.5. Spin filters were plugged, and elution buffer (50 mM HEPES, 20 mM DTT, pH 7.5) was added to the slurry and mixed with agitation for 1 h at 25°C. Eluates were collected in fresh tubes by removing the column plug and centrifugation at 150 × g for 15 s or until contents had passed through without drying the column. A second elution was performed using fresh elution buffer; eluates were combined and then acidified to pH 3 with formic acid. Peptides were desalted and concentrated using C18 SPE cartridges as described above. Peptides were lyophilized to complete dryness and analyzed by RP-LC-MS/MS.

RP-LC-MS/MS
Purified peptides were resuspended in 5 µL buffer A (0.1% formic acid, 99.9% H2O) for injection onto an Easy nLC-1000 (Thermo Scientific) connected inline to a Bruker Impact II Q-TOF mass spectrometer (Bruker Daltonics). Peptides were loaded onto an in-house-packed analytical column (25 cm × 75 µm ID, 1.9 µm C18AQ ReproSil-Pur, Dr. Maisch) at 800 bar using buffer A heated by a column oven at 50°C and then eluted with a linear gradient of buffer B (0.1% formic acid, 99.9% MeCN) over 90 min at 200 nL/min from 2% buffer B to 24% buffer B, then washed with 95% buffer B for 5 min before re-equilibration with buffer A. Eluted peptides were ionized by electrospray and analyzed in positive ion mode, source voltage 1.5 kV and 150°C, with nanoBooster using MeCN at 3.0 L/min and then analyzed by data-dependent acquisition. Ions were scanned between 150–1,750 m/z at 5 Hz, and then the top 12 most intense precursor peptide ions (charges 2–5) were selected for fragmentation by collision-induced dissociation. The product ions were scanned at 8 Hz for counts above 1.2 × 10³ and at 12 Hz for counts above 2.5 × 10⁵ and actively excluded for MS/MS after 1 spectrum for 30 s but reconsidered if the intensity was 5-fold higher than the previous intensity. Spectra were converted into .mgf files for database searching.
Peptides that were thioacylated and semi-cleaved (i.e., biotinylated peptides resulting from 3CLpro cleavage) were considered for sequence analysis by WebPICS (https://webpics.clip.msl.ubc.ca) (11). The amino acid occurrences from P6–P6' were normalized for natural abundance in the E. coli K12 reference proteome, and their occurrence at each position (P6–P6') was calculated (p-value ≥ 0.05). Amino acids with a >2-fold occurrence over the normalized baseline were plotted as heat maps using GraphPad Prism version 9.0.0.121 (GraphPad Software Inc., San Diego, CA). WebPICS outputted the P4–P4' sequences for cut site motif analysis using IceLogo version 1.3.8. Amino acid synergism/cooperativity analysis was performed by using fixed amino acid subsite position analyses in WebPICS to establish any positive or negative amino acid cooperativity >10% frequency versus independence between the fixed amino acid and amino acids in other positions.
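The normalization and fixed-position analysis described above can be illustrated with a minimal Python sketch. This is not the WebPICS implementation: the function names, the toy cleavage-site list, and the uniform background frequencies are assumptions made only for illustration, and no statistical testing is included.

```python
from collections import Counter

def positional_enrichment(sites, background, labels):
    """Fold occurrence of each residue at each position over its background frequency.

    sites: equal-length aligned strings (e.g., P4-P4' 8-mers).
    background: residue -> natural abundance (fraction) in the reference proteome.
    labels: position names, one per column of the alignment.
    """
    n = len(sites)
    enrichment = {}
    for idx, label in enumerate(labels):
        counts = Counter(site[idx] for site in sites)
        enrichment[label] = {
            aa: (count / n) / background[aa]
            for aa, count in counts.items()
            if background.get(aa, 0) > 0
        }
    return enrichment

def conditional_frequency(sites, fixed_idx, fixed_aa, query_idx, query_aa):
    """Frequency of query_aa at query_idx among sites that carry fixed_aa at fixed_idx."""
    subset = [s for s in sites if s[fixed_idx] == fixed_aa]
    if not subset:
        return 0.0
    return sum(s[query_idx] == query_aa for s in subset) / len(subset)

# Toy input: four P4-P4' sequences quoted in the text; the background is a
# hypothetical uniform 5% per residue, not the E. coli K12 abundance.
LABELS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]
toy_sites = ["VALQGAHY", "SRLHSYSS", "SKLMSENT", "ARLQAMAP"]
uniform_background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}

fold = positional_enrichment(toy_sites, uniform_background, LABELS)
p2_leu_given_p1_gln = conditional_frequency(toy_sites, 3, "Q", 2, "L")
```

Comparing a conditional frequency such as `p2_leu_given_p1_gln` with the unconditional frequency at the same position is one simple way to flag candidate cooperativity between subsites.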

3CLpro cut-site kinetics by MALDI-TOF/TOF-MS
Peptides with the following sequences: VALQGAHRVALQXAHYR (X = Leu, Val, or Ile), RASVALQGAXYSAR (X = His, Thr, or Ala), RASVALQXAHYSAR (X = Ser, Ala, or Gly), and AAVALQXAHHYAYR (X is one of each 20 natural amino acids) were synthesized (GenScript). Peptides (50 µM) were diluted in 3CLpro assay buffer, pH 6.9, and incubated with 3CLpro (1:50 molar ratio, E:S) in a 25 µL final volume at 37°C in a humidified chamber for 5, 15, 30, 60, 120, and 240 min. At each time point, 0.5 µL of the assay reactants was spotted on a matrix-assisted laser desorption/ionization (MALDI) plate pre-spotted with alpha-cyano-4-hydroxy-cinnamic acid matrix solution (10 mg/mL in 50/50/0.1% water/acetonitrile/formic acid), after which 0.5 µL matrix solution was immediately added. The spotted samples were desalted in an ice-cold 0.1% formic acid bath by immersing the plate, which was then loaded into a MALDI-TOF/TOF 4700 Proteomics Analyzer (Applied Biosystems) equipped with a 335 nm Nd:YAG laser (200 Hz) operating in positive ion mode. MALDI spectra were analyzed using Applied Biosystems Data Explorer, version 4.5. The apparent specificity constant, app(kcat/KM), was estimated as before (3) under the assumption of a first-order reaction where
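As a hedged illustration of how an apparent kcat/KM can be obtained from a cleavage time course under a first-order assumption, the sketch below uses the standard pseudo-first-order relation for [S] well below KM, in which the remaining substrate fraction decays as exp(-(kcat/KM)[E]t). The exact expression used by the authors is not given in this section, so the formula, the function name, and the synthetic data are assumptions.

```python
import math

def app_kcat_over_km(times_s, fraction_remaining, enzyme_molar):
    """Estimate apparent kcat/KM (M^-1 s^-1) from a substrate depletion time course.

    Assumes [S] << KM, so ln(S0/S) = (kcat/KM) * [E] * t. The observed rate
    constant is the least-squares slope through the origin of -ln(fraction) vs t.
    """
    if enzyme_molar <= 0:
        raise ValueError("enzyme concentration must be positive")
    for f in fraction_remaining:
        if not 0 < f <= 1:
            raise ValueError("fractions must lie in (0, 1]")
    numerator = sum(t * -math.log(f) for t, f in zip(times_s, fraction_remaining))
    denominator = sum(t * t for t in times_s)
    k_obs = numerator / denominator  # s^-1
    return k_obs / enzyme_molar

# Synthetic example: time points from the text (min -> s), 1 µM enzyme
# (50 µM peptide at 1:50 E:S), and a hypothetical true value of 1,500 M^-1 s^-1.
times = [m * 60 for m in (5, 15, 30, 60, 120, 240)]
enzyme = 1e-6
fractions = [math.exp(-1500 * enzyme * t) for t in times]
estimate = app_kcat_over_km(times, fractions, enzyme)
```

In practice the remaining fraction would come from MALDI peak intensities of substrate and product, which are only semi-quantitative, so the result is best read as an apparent value for ranking peptides within the same assay format.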

P1 methionine oxidation and MS/MS analysis
The peptide AAVALM↓SAHHYAYR was synthesized and the methionine was oxidized by incubating 5 µL of 10 mM peptide (0.05 µmol) with 5 µL 30% aqueous H2O2 (25 µmol, 500× excess) in 50 µL glacial acetic acid for 15 min at 22°C. Complete (100%) conversion of Met to MetOX was confirmed by mass spectrometry as follows. The peptide sample was desalted on an Oasis HLB cartridge and eluted with 1 mL MeCN, 0.1% formic acid, and 50 µL was then diluted in 150 µL H2O and measured by direct infusion on an Impact II Q-TOF mass spectrometer (Bruker Daltonics) in positive ion mode. For cleavage assays, the acetonitrile was removed from the oxidized peptide by SpeedVac, and the peptide was solubilized in 50 mM Tris-HCl, 1 mM EDTA, 50 mM NaCl, pH 6.9, in the absence of DTT. To 200 µg/mL P1-Met-peptide or P1-MetOX-peptide, 80 µM 3CLpro or 3CLpro (Cys145Ala) as control was added to a final 1:100 (E:S ratio), incubated, and analyzed by MALDI-TOF-MS as described above.

Molecular docking simulations
Peptide-3CLpro molecular docking simulations were performed using the Rosetta FlexPepDock ab initio protocol (12) in the Rosetta software suite (13). The 3CLpro structure (PDB: 6XHU) (14) was prepared for docking calculations by running the Rosetta relax application using flags listed in Supporting Information. The starting backbone conformation of an intact peptide spanning a canonical P4–P4' that we identified by PICS (VALQ↓GAHY) was created as a preliminary extended structure truncated at both N- and C-termini using the BuildPeptide Rosetta application. A fragment library of trimer and pentamer backbones was generated from known PDB structures based on sequence similarity and predicted secondary structure. FlexPepDock ab initio simulations were performed of the 3CLpro active site with the extended peptide structure positioned 15 Å away. Fast, low-resolution modeling was performed 50,000 times with the side chains represented as single-centroid spheres. High-resolution analysis was then performed using a full-atom energy function, which enables full flexibility for all peptide and receptor side chains (15). A flat harmonic function (Supporting Information) penalized models where the Euclidean distance between the P1-Cα and Ser1-Sγ was >4 Å. The Rosetta Interface score (I_sc) (16) was calculated by subtracting the energy of 3CLpro and the peptide in isolation from the total energy of the complex. The 500 lowest-scoring models based on Rosetta total energy were selected, within which the model with the highest structural similarity was selected as the representative model. Other parent peptides of cleavage products identified by PICS that we modeled were SRLH↓SYSS, SKLM↓SENT, and SKLM(OX)↓SENT versus ARLQ↓AMAP, AALQ↓AVNS, VVVQ↓AASG, AILQ↓NATS, and SDLQ↓STQA.
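The two scoring ideas in this paragraph, the interface score and the flat harmonic distance restraint, can be written out as a short Python sketch. It mirrors the definitions given in the text rather than calling Rosetta; the functional form of the flat harmonic penalty (zero inside a tolerance window, quadratic outside) follows the usual description of that restraint type, and the parameter values and energies shown are hypothetical because the actual settings are in the Supporting Information.

```python
def interface_score(e_complex, e_receptor, e_peptide):
    """I_sc: energy of the complex minus the energies of the isolated partners."""
    return e_complex - (e_receptor + e_peptide)

def flat_harmonic_penalty(distance, x0, sd, tol):
    """Zero penalty while |distance - x0| <= tol, quadratic growth beyond that.

    x0, sd, and tol are illustrative parameters, not the authors' values.
    """
    excess = abs(distance - x0) - tol
    if excess <= 0:
        return 0.0
    return (excess / sd) ** 2

# Hypothetical restraint that leaves distances up to 4 Å unpenalized.
inside = flat_harmonic_penalty(3.5, x0=0.0, sd=1.0, tol=4.0)
outside = flat_harmonic_penalty(6.0, x0=0.0, sd=1.0, tol=4.0)

# Hypothetical energies in Rosetta energy units; a negative I_sc means that
# binding is favored over the separated partners.
example_isc = interface_score(-100.0, -50.0, -12.0)
```

A more negative I_sc indicates a more favorable modeled interaction, which is how the scores quoted for the docked peptides are compared.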

Incubation of PICS libraries with 3CLpro yields cleaved peptides identifiable by LC-MS/MS
Digestion of proteomes with proteases of differing sequence specificities (trypsin, GluC, and lysargiNase) (10) yielded three separate proteome-derived peptide libraries, each distinguished by characteristic amino acids at the C-terminus (Arg/Lys for trypsin, Glu/Asp for GluC) or N-terminus (Arg/Lys for lysargiNase). After dimethylation to block primary amines, the libraries were assayed for susceptibility to cleavage by recombinant SARS-CoV-2 3CLpro. Cleaved neo-N-terminal products displaying a free α-amine at the N-terminus were biotinylated to detect 3CLpro-cleaved neo-N-terminal peptides after streptavidin enrichment. From the identity of the biotinylated peptides, which were the P' sequences of the cleavage sites, 816 cleavage sites spanning P6–P6' were identified across the three libraries in N = 2 separate independent experiments (Fig. 1; Fig. S2). Using three digestion enzymes to generate the PICS proteome libraries greatly expanded the number of cleavage sites identified (Fig. 1a): digestion of the trypsin, GluC, and lysargiNase libraries with 3CLpro resulted in 415, 328, and 88 cleavage sites, respectively, with only 41 common sites.
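Counting unique and shared cleavage sites across libraries is a simple set operation, sketched below in Python. The library contents are toy sequences, the function name is hypothetical, and "shared" is taken here to mean present in every library, which is only one possible reading of "common sites".

```python
def summarize_libraries(libraries):
    """Count cleavage sites per library, unique sites overall, and sites found in all libraries.

    libraries: mapping of library name -> iterable of cleavage-site strings.
    """
    site_sets = [set(sites) for sites in libraries.values()]
    return {
        "per_library": {name: len(set(sites)) for name, sites in libraries.items()},
        "unique_total": len(set().union(*site_sets)),
        "shared_by_all": len(set.intersection(*site_sets)),
    }

# Toy example with made-up assignments of sites to libraries.
toy_libraries = {
    "trypsin": ["VALQGAHY", "ARLQAMAP", "AALQAVNS"],
    "GluC": ["VALQGAHY", "SDLQSTQA"],
    "lysargiNase": ["VALQGAHY", "ARLQAMAP"],
}
summary = summarize_libraries(toy_libraries)
```

Because the same site can be recovered from more than one library, the number of unique sites is generally smaller than the sum of the per-library counts.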
From the occurrence of amino acids at each position, heatmaps were generated (Fig. 1b). The values for the occurrences of all amino acids at all positions were combined by a weighted average into the final composite motif, highlighted in Fig. 1b. Assay robustness was confirmed by all IceLogos showing good agreement across individual replicates and libraries (Fig. S2). As expected, only minor library peptide carryover (n = 32) lacking any defined specificity was found in the catalytically inactive 3CLpro (Cys145Ala) samples (Fig. 1c). Since we used the same expression and purification strategies for 3CLpro and 3CLpro (Cys145Ala), which included the use of a cocktail of protease inhibitors, the control data confirm that no co-eluting contaminant proteases were present.

SARS-CoV-2 3CLpro domain interactions in prime-side substrate recognition
In agreement with previously mapped 3CLpro cleavage motifs in the viral polyprotein (2) and native human cell proteins determined by terminal amine isotopic labeling of substrates (TAILS) (3), the cleavage specificity for 3CLpro is dominated by preferred amino acid residues at three sites, P2, P1, and P1' (6,17). Unsurprisingly, the non-prime side showed a strong preference for leucine at P2 and glutamine at P1, and the prime side for glycine, alanine, and serine at P1'. However, the identification of >800 cut sites allowed other specificity preferences between P4 and P4' and amino acid cooperativity between subsites in the peptide context to be resolved with statistical significance.
To compare the structure-activity relationships for cleaved peptides displaying the preferred amino acids revealed by PICS and not previously described for the SARS-CoV-2 3CLpro, we constructed nine models of exemplary peptides binding to one of the two 3CLpro protomers in the dimer (PDB 6XHM) through molecular docking simulations (Fig. 2). The I_sc of the final models demonstrated that it is more favorable for 3CLpro and these peptides to interact than to remain separate, e.g., VALQ↓GAHY, with a preferred P3'-His, had the lowest (strongest) I_sc = -38.17. The following substrate peptides were modeled: a non-canonical P1-His (SRLH↓SYSS), P1-Met (SKLM↓SENT), and its oxidized form (MetOX), versus the canonical P1-Gln. In addition to the above peptides, P3-Arg was modeled using ARLQ↓AMAP with a P1'-Ala for comparison with the other well-described P1' amino acid residues, represented by peptides SDLQ↓STQA and AILQ↓NATS. Finally, we compared the preferred P2-Leu (AALQ↓AVNS) with a P2-Val in the cleaved peptide VVVQ↓AASG.
Stabilization of the substrate prime side is entirely by domain I of the 3CLpro catalytic domain (residues 10–99), whereas the non-prime side interactions occur through domain II (residues 100–196) (17–19) (Fig. 2a). Three consecutive threonine residues, Thr24, Thr25, and Thr26, which we termed the "threonine trio," in catalytic domain I are the primary P' stabilizing residues. The P1'–P4' residues extend over the threonine trio, maximizing main-chain interactions between the substrate and 3CLpro (Fig. 2f through i).
The main-chain oxygens of Thr24 and Thr26 accept a hydrogen bond from the substrate main-chain nitrogen atom of P4'-Tyr and P2'-Ala at a Euclidean distance of 2.5 and 2.0 Å, respectively. In contrast, the P2'-Ala oxygen forms a bifurcated hydrogen bond with Thr26 through its main-chain nitrogen (2.8 Å) and side-chain hydroxyl group (3.5 Å) (Fig. 2h and i).
The docking model for VALQ↓GAHY agrees well with experimental evidence from crystallography studies (17–19). Zhao et al. solved the structure of 3CLpro mutant (His41Ala) in complex with six native substrates from SARS-CoV-2 replicase polyproteins and found that the overall structure is almost identical in all six complex structures, with root mean square deviation (RMSD) values ranging from 0.12 Å to 0.32 Å for Cα atoms (17). We aligned the docked PICS peptide with each of the six individual crystal structures (Fig. S3) and observed high structural similarity, evidenced by the low RMSD for their respective Cα atoms, ranging from 0.16 Å to 0.37 Å (Table S1), validating our approach.
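For readers unfamiliar with the metric, the Cα RMSD quoted above is the root of the mean squared distance between matched Cα atoms after the two structures have been superposed. The Python sketch below computes it for coordinates that are assumed to be already aligned and matched one-to-one; the superposition step itself is not shown, and the function name and coordinates are illustrative.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g., Å) between two pre-aligned coordinate lists.

    coords_a, coords_b: sequences of (x, y, z) tuples for matched Cα atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in Å: two matched atoms displaced by 0.3 Å and 0.4 Å.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
rmsd = ca_rmsd(model, crystal)
```

Values well below 1 Å, as reported here, indicate that the compared backbones are nearly superimposable.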

P1' amino acid preferences
As is well known from the 17 viral polyprotein cut sites of 3CLpro (2), P1' is the key determinant position on the P' side of the cleavage site, with the S1' subsite only accommodating small amino acid residues such as Gly, Ala, and Ser and being prohibitive for charged amino acid residues (Fig. 1b). Our P1' PICS peptide data for SARS-CoV-2 3CLpro are consistent with the cleavage sites in the polyprotein (2) and the reported preferences for the SARS-CoV-1 protease (20). To quantify minor specificity differences at P1', we synthesized a peptide library by substituting every natural amino acid at P1'-X in the sequence AAVALQ↓XAHHYAYR. By MALDI-TOF-MS, the app(kcat/KM) for 3CLpro cleavage of each substituted peptide confirmed that Ala (1,522.0 ± 7.6 M⁻¹ s⁻¹) and Ser (1,519.1 ± 6.2 M⁻¹ s⁻¹) were equally preferred over Gly (1,079.1 ± 12.6 M⁻¹ s⁻¹) and revealed four minor P1' specificities: His, Phe, Met, and Asn, which were cleaved with specificity constants between 45 and 70 M⁻¹ s⁻¹ (Table 1; Fig. S4).
From the structural analysis of the substrate peptide VALQ↓GAHY complex with 3CLpro in Fig. 2, P1' selectivity stems from steric hindrance imposed by Thr25, Leu27, and His41 side chains. To confirm the negative effect on 3CLpro activity of having a bulky residue at P1', we synthesized three uniquely designed 17-mer peptides, each displaying two optimal non-prime sequences (underlined) with X in the second site substituted for Leu, Val, or Ile: VALQ↓GAHRVALQ↓XAHYR. The first site was a preferred cleavage sequence and lay N-terminal to the distal test sequence in the same peptide. This design enabled the normalization of cleavage events within each peptide and between the three members of the focussed library. The second site compared the effect of non-preferred bulky residues at P1'. By quantifying cleavage using MALDI-TOF-MS (Fig. 3a through d) where the P1' X was Leu, Val, or Ile, there was minimal cleavage and generation of the cleavage products B = 1,262.73 m/z or AB = 851.48 m/z. In contrast, we observed efficient cleavage at the positive control site within the same peptide, where P1' was Gly, generating product A containing the uncut test site with the nonpreferred P1' Val (product A = 1,477.81 m/z) or Leu/Ile (product A = 1,491.83 m/z). These data consolidate the evidence found through PICS and molecular modeling that even where the non-prime sequence is optimal, 3CLpro will barely cleave a site with a bulky residue at P1'.

[Fig. 2 legend, panels e through i: (e) Enlarged view of the network of intermolecular interactions between 3CLpro Protomer A, Protomer B (Ser1), and the substrate P1-Gln. (f) Enlarged view of the protomer A interactions between 3CLpro active site residues and P4–P4' peptide residues. (g) Hydrophobic interactions between 3CLpro residues and the peptide substrate P4–P4' amino acid residues. (h, i) Stabilization of P' amino acid residues by the threonine trio (Thr24-Thr25-Thr26).]

P3' histidine preference
We found that 3CLpro prefers histidine at P3', which was previously noted for SARS-CoV-1 3CLpro (20). As revealed by molecular modeling, the P3'-His side chain forms a stabilizing 2.0 Å hydrogen bond with the Thr25 Oγ hydroxyl group (Fig. 2f and i). The S3' subsite is large and plastic enough to accommodate a diverse range of amino acid residues, including bulky residues, with the hydrogen bond formed through Thr25 potentially being the primary P3' stabilizing interaction. We observed great flexibility of the S3' subsite, with some residues at P1' and P2' swinging into S3' to adopt a lower energy conformational state for the transient complex between the substrate and 3CLpro. As discussed later, when the P3' is histidine, 27.5% of these sequences are accompanied by a P1-Met, revealing substrate amino acid cooperativity at a distance. Using a third set of synthetic peptides spanning P4–P4', VALQ↓GAXY, where X = His, Thr, or Ala, we observed a direct relationship between the peptide cleavage rate by 3CLpro and the ability of P3' to form a hydrogen bond (Fig. 3e). P3'-Thr was cleaved less efficiently than P3'-His due to the greater propensity of the His side chain to act as a hydrogen bond donor than the Thr hydroxyl group. P3'-Ala displayed the slowest cleavage rate since the non-polar side chain cannot establish hydrogen bonds.

Structural features of non-prime-side substrate recognition
Our molecular dynamics models are in agreement with reported structures of 3CLpro (15, 17–19, 21) and tightly overlay six X-ray crystallographic structures of SARS-CoV-2 3CLpro in complex with six polyprotein cleavage site sequences (Fig. S3), raising confidence in our models and interpretations of undescribed specificity determinants for which X-ray crystal structures are not yet reported. On the non-prime side, the P1-Gln side chain is in range to form hydrogen bonds with the 3CLpro domain II residues: Phe140 (1.8 Å), Asn142 (3.5 Å), Ser144 (3.2 Å), His163 (1.8 Å), and Glu166 (2.7 Å) (Fig. 2c and e through i; Fig. S5). In these interactions, the amide group of P1-Gln is both a hydrogen bond acceptor and donor through the Oε1 and Nε2, respectively. In addition to P1 side chain interactions, intermolecular hydrogen bonds with 3CLpro protomer B Ser1 are formed by Phe140 (2.1 Å) and Glu166 (1.7 and 3.3 Å) (Fig. 2e). His172 is the third residue that forms an intermolecular hydrogen bond (2.7 Å) with the Ser1 of 3CLpro protomer B. Disruption of these interactions by blocking or modifying the 3CLpro N-terminus has been linked to loss of enzyme activity (21). With the software available, we were unable to perform the simulation for a system having both active sites of protomer A and B simultaneously occupied.
In our previous study (3) and in the current PICS assays, we showed that 3CLpro cleaves after a non-canonical histidine and methionine in P1. This is not without precedent for coronavirus 3C proteases. Cleavage at P1-His by SARS-CoV-1 3CLpro was reported by Goetz et al. (22). Chuck et al. (20) further profiled SARS-CoV-1 3CLpro substrate specificity at P5 to P3' positions using 19 × 8 single substitutions and, in addition to P1-Gln and P1-His, the authors identified cleavage at P1-Met. We extended these studies to SARS-CoV-2 3CLpro using massively diverse biologically derived peptide libraries.
Essential for cleavage are the Ser1 H-bonding interactions with P1-Gln (7,15), involving both the N-terminus of Ser1 and OG that act as an H-bond donor to Glu166 (Fig. 4e). Using synthetic peptides with identical amino acid sequences (AAVALX↓SAHHYAYR) except for P1 (Fig. 4a through d), we found that cleavage at the canonical P1-Gln showed an app(kcat/KM) of 1,528 M⁻¹ s⁻¹, followed by P1-His (222 M⁻¹ s⁻¹) and P1-Met (37 M⁻¹ s⁻¹) in the MALDI-TOF assay format. Like the P1-Gln side chain, which acts as both a hydrogen bond donor and acceptor, P1-His forms a similar hydrogen bond network through its imidazole moiety with His163 (2.1 Å), Glu166 (2.7 Å), and the Ser1 α-amine (2.9 Å) of the protomer B N-terminus (Fig. 4f). Intriguingly, despite the cleavage of peptides with a P1-Met (ALM↓SAH) and of the native human protein substrate MAP4K5 at SKLM↓SENT (3), H-bonding of the Ser1 N-terminus with the P1 side chain is absent when P1 is Met (Fig. 4g). Nevertheless, the S1 subsite of 3CLpro accommodates the P1-Met side chain with no unfavorable interactions, albeit with a slower cleavage rate than for P1-Gln and P1-His. SARS-CoV-2 induces cell damage by increasing intracellular reactive oxygen species (ROS) levels (23–26). The thioether group of the methionine side chain is one of the main targets of ROS (27,28), and methionine sulfoxide in native proteins is of significant physiological relevance (29–32). Therefore, we chemically oxidized the P1 methionine in the synthetic peptide AAVALM↓SAHHYAYR to compare cleavage rates. The peptide containing the oxidized methionine (MetOX) was obtained in >99% yield (Fig. S6), and its cleavage efficiency was 3-fold higher (app kcat/KM = 111 M⁻¹ s⁻¹) than for the unmodified peptide (37 M⁻¹ s⁻¹) (Fig. 4c and d). We ascribe the faster cleavage rate of oxidized methionine (P1-MetOX) to a 3.0 Å hydrogen bond between the introduced oxygen atom, which acts as an H-bond acceptor, and the protomer B N-terminal amine group (Fig. 4h). The slight differences in kinetic parameters measured between the different libraries or quenched fluorescent peptides are consistent with the properties of the different assay formats.
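The relative cleavage efficiencies quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' analysis: the dictionary holds only the apparent kcat/KM values reported in the text for the AAVALX↓SAHHYAYR peptides, and the names `APP_KCAT_KM` and `fold_change` are hypothetical helpers introduced here for illustration.

```python
# Apparent specificity constants (M^-1 s^-1) as quoted in the text for the
# AAVALX↓SAHHYAYR peptides in the MALDI-TOF assay format.
# The variable and function names are illustrative assumptions.
APP_KCAT_KM = {
    "Gln": 1528.0,    # canonical P1
    "His": 222.0,     # non-canonical P1
    "Met": 37.0,      # non-canonical P1, unmodified
    "MetOX": 111.0,   # P1 methionine after chemical oxidation
}


def fold_change(numerator: str, denominator: str) -> float:
    """Return the ratio of two apparent kcat/KM values from APP_KCAT_KM."""
    return APP_KCAT_KM[numerator] / APP_KCAT_KM[denominator]


if __name__ == "__main__":
    # Oxidation of P1-Met: 111 / 37, the 3-fold increase stated in the text.
    print(f"MetOX vs Met: {fold_change('MetOX', 'Met'):.1f}-fold")
    # Penalty of the non-canonical residues relative to P1-Gln.
    for residue in ("His", "Met", "MetOX"):
        print(f"Gln vs {residue}: {fold_change('Gln', residue):.1f}-fold")
```

Because these are apparent constants from a single assay format, the ratios are only meaningful within that format; they should not be compared directly with values from the quenched fluorescent assays.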

Cleavage site amino acid positional cooperativity
An advantage of PICS is the large diversity of the peptide library. In addition to investigating substrate specificity at individual positions, we sought positional synergism between different residues. To do so, we generated Icelogo motif plots in which single amino acids were fixed at selected positions (Fig. 5). Notably, for the non-canonical P1-His, we found that phenylalanine was the preferred amino acid residue at P2 (28.8%) and cysteine was preferred at P2' (15.4%). Indeed, when cysteine was found at P2' then 63.8% of [...] oxidized methionine favoring cleavage. ROS are well known to target the thioether group of methionines (27). Recent studies have shown that ROS production is markedly elevated in COVID-19 patients and is related to disease pathogenesis and progression (23–26). This rise in intracellular ROS could increase the range of 3CLpro substrates in the cell and improve the cleavage of proteins containing a P1-Met to form kinetically accessible substrates.
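The idea of fixing one residue and asking what is preferred elsewhere can be illustrated with a simple conditional frequency count. This is a minimal sketch under stated assumptions: the five 12-residue windows in `TOY_SITES` are invented for illustration and are not PICS data, the function name `conditional_frequencies` is hypothetical, and the calculation reports raw percentages only. It does not reproduce the Icelogo analysis itself, which compares occurrences against a reference set.

```python
from collections import Counter

# Position labels for a P6-P6' window, N- to C-terminal.
POSITIONS = ["P6", "P5", "P4", "P3", "P2", "P1",
             "P1'", "P2'", "P3'", "P4'", "P5'", "P6'"]

# Invented example windows (NOT real cleavage sites), one letter per position.
TOY_SITES = [
    "GAVRFHSCHGAA",
    "GAVKFHACHGAA",
    "GAVRLHSAHGAA",
    "GAVRLQSAHGAA",
    "GAVRLMSAHGAA",
]


def conditional_frequencies(sites, fixed_pos, fixed_res, query_pos):
    """Percent occurrence of each residue at query_pos among the sites
    that carry fixed_res at fixed_pos. Returns {} if no site matches."""
    i = POSITIONS.index(fixed_pos)
    j = POSITIONS.index(query_pos)
    subset = [s for s in sites if s[i] == fixed_res]
    if not subset:
        return {}
    counts = Counter(s[j] for s in subset)
    return {res: 100.0 * n / len(subset) for res, n in counts.items()}


if __name__ == "__main__":
    # Which residues occupy P2 when P1 is fixed as His in the toy set?
    print(conditional_frequencies(TOY_SITES, "P1", "H", "P2"))
```

Applied to a real set of aligned cleavage windows, the same filtering step would yield the kind of percentages quoted above for P2 and P2' when P1 is His.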
Developing next-generation antiviral therapies, vaccines, and viral protease assays for inhibitor development is fundamental to characterizing the molecular mechanisms of pathogenesis and virulence and, ultimately, to combating COVID-19. Currently, the most widely employed quenched fluorescent substrate has a kcat/KM of 859 M⁻¹ s⁻¹ and was optimized using non-natural amino acids (Ac-Abu-Tle-Leu-Gln-ACC) (6) to be a marked improvement over the SARS-CoV-1 3CLpro peptide substrate early in the pandemic. The structural insights gleaned from global analysis of >800 cleavage sites identified by PICS and from in-depth molecular simulation modeling allowed us to design two new quenched fluorescent substrates, Mca-VRLQSK(Dnp)RR and Mca-VRLQAK(Dnp)RR. Notably, these peptide substrates are designed from natural amino acids, facilitating their synthesis and application. With >15× improved sensitivity (kcat/KM of 13,696.8 M⁻¹ s⁻¹ versus 859 M⁻¹ s⁻¹) and superior solubility and handling properties, the Mca-VRLQSK(Dnp)RR quenched fluorescent substrate can be applied in various assay formats, including those for high-throughput drug screening.

FIG 1 Protease specificity of SARS-CoV-2 3CLpro determined by PICS assays. (a) Venn diagram showing the distribution of the 816 cleavage sites identified after incubating active SARS-CoV-2 3CLpro with proteome-derived peptide libraries generated by one of the three proteases shown (trypsin, GluC, LysargiNase) (N = 2). Heatmaps with their associated Icelogos for each PICS library and the composite data compiled from the three libraries for (b) active 3CLpro and (c) control catalytically inactive mutant 3CLpro (Cys145Ala) experiments. Heatmaps show sites with an occurrence value >2-fold. Highlighted in red is the composite motif across all libraries that provides the substrate sequence specificity for 3CLpro determined from n = 816 cleavage sites identified by PICS. See also Fig. S2 for matching data for each of the two independent replicates of the assays.

FIG 2 Structure of the top-ranked model of the peptide substrate VALQ↓GAHY docked onto the active site of 3CLpro protomer A (PDB 6XHM). (a) Green putty cartoon representation of 3CLpro protomer A (ribbon representation) residues that interact with the substrate by hydrogen bonding. The substrate-interacting amino acid residues and the catalytic cysteine are shown as green and yellow lobes, respectively. The substrate prime side (P') (red sticks) interacts with the 3CLpro catalytic domain I (Ser10-Pro99), whereas the non-prime (P) side of the substrate (blue sticks) interacts with domain II (Lys100-Gly182). The C-terminal four residues of dimerization domain III (Asn-Gln 310) are not resolved. A 1D structural representation of domain I shows the structural features, bounding residues, and location of the P' substrate-interacting threonine trio (Thr24, Thr25, Thr26). (b) Sequence alignment of the 3CLpro from MERS, SARS-CoV-1, and SARS-CoV-2 generated by Clustal Omega showing conservation and divergence of the P'-interacting residues in domain I bounded in red and highlighting the threonine trio in SARS-CoV-1 and SARS-CoV-2 3CLpro, which is absent in the MERS protease (highlighted in blue). (c-i) 3CLpro dimer. Protomers A and B are shown as gray and orange surfaces, respectively. Substrate non-prime (P) and prime (P') sides are shown as blue and red sticks, respectively, with the major interacting residues numbered. Green sticks represent interacting 3CLpro amino acid residues. H-bonds are shown as yellow dashed sticks. (d) Green putty cartoon representation of 3CLpro protomer A residues in domain I that interact with the substrate by making H-bond interactions. Domains II and III are shown as an orange surface.

[Table] ... 3CLpro determined by positional scanning of all 20 natural amino acids (X) in the peptide (AAVALQ↓XAHHYAYR). Columns: P1′ amino acid residue; app(kcat/KM) (M⁻¹ s⁻¹).
a The specificity constants for SARS-CoV-2 3CLpro cleavage of 20 peptides (AAVALQ↓XAHHYAYR), where X is each of the 20 natural amino acids substituted at P1' to form a library of 20 peptides; ↓, scissile bond. (See also Fig. S4.)

FIG 3 Influence of non-canonical P1' amino acid residues on the cleavage kinetics of 3CLpro with an optimal P sequence. (a) A preferred 3CLpro cleavage site derived from PICS analyses (VALQ↓GAHR) was incorporated as an internal positive control for each peptide containing a non-canonical P1' residue in the otherwise optimal P' cleavage sequence (VALQ↓XAHR) lying C-terminal to the positive control cut site. X was either Leu, Val, or Ile. The calculated [M + H]+ of the intact tested peptide and each predicted cleavage product are tabulated. (b, c, and d) Left, progress curves of peptide cleavage after incubation with 3CLpro (1:50 mole ratio, E:S). Right, MALDI-TOF-MS spectra and the detected charge states of the intact and cleaved peptides were measured at the times indicated. (e) Influence of hydrogen bond formation between P3' (His, Thr, Ala) and Thr25 on quenched-fluorescent peptide cleavage profiles and kinetic parameters. 3CLpro cleavage sites are indicated by ↓.

FIG 4 Influence of non-canonical His, Met, and MetOX P1 residues on 3CLpro activity. (a) P1-Gln and (b) the non-canonical P1-His, (c) P1-Met, and (d) P1-MetOX cleavage rates are shown after incubation with 3CLpro (1:50 molar ratio, E:S). 3CLpro cleavage sites are indicated by ↓. (e-h) The network of intermolecular H-bond (dashed yellow sticks) interactions between 3CLpro protomer A Glu166 and His163 (green sticks) amino acid residues (gray surface) and Ser1 in protomer B (orange surface), with P1-Gln/His/Met/MetOX (blue sticks). H-bond interactions involving the respective P1 residues in the 3CLpro active site predicted by molecular docking simulations are shown.