DUckCov: a Dynamic Undocking‐Based Virtual Screening Protocol for Covalent Binders

Abstract Thanks to recent guidelines, the design of safe and effective covalent drugs has gained significant interest. Other than targeting non‐conserved nucleophilic residues, optimizing the noncovalent binding framework is important to improve potency and selectivity of covalent binders toward the desired target. Significant efforts have been made in extending the computational toolkits to include a covalent mechanism of protein targeting, like in the development of covalent docking methods for binding mode prediction. To highlight the value of the noncovalent complex in the covalent binding process, here we describe a new protocol using tethered and constrained docking in combination with Dynamic Undocking (DUck) as a tool to privilege strong protein binders for the identification of novel covalent inhibitors. At the end of the protocol, dedicated covalent docking methods were used to rank and select the virtual hits based on the predicted binding mode. By validating the method on JAK3 and KRas, we demonstrate how this fast iterative protocol can be applied to explore a wide chemical space and identify potent targeted covalent inhibitors.


Introduction
Despite the activity of al arge number drugs approved by the US Food and Drug Administration (FDA) that dependo nacovalent mode of action, [1] classical drug discovery screening cascades typically eliminate electrophilicc ompounds, mainly due to the toxicity risks associated with their mechanism. Indeed, a majority of these drugs was discovered by serendipity in biological assays,a nd their mechanism was elucidated later on, typicallya fter approval. The reluctance to use reactive ligands, and more specifically promiscuous" suicide inhibitors", is related to increased risks of carcinogenicity,h epatotoxicity,a nd potential idiosyncratic effects caused by protein haptenization. [2,3] More recently,t he reputation of covalent binders has changed thanks to the guidelines introduced for the rational designo ft argeted covalenti nhibitors (TCIs). According to these guidelines, the ligand's selectivity toward its protein target is still to be achievedb yo ptimizing the noncovalent interactions (hydrogen bonding, van der Waals, electrostatic, etc.) at the bindings ite, as in the case of traditional approaches. Furthermore, increased specificity can be obtained by targeting ap oorly conserved reactiver esidue within the protein family. [3] To this effect, the development of methods to identify poorly conserved reactive residues have aided the acceleration of TCI design. For example, activity-based protein profiling techniques (ABPP,i soTOP-ABPP [4,5] )c an be used to both investigate the activity at the proteomic level and quantify the intrinsic reactivity of functional cysteines. Also, Liu and colleagues have coined the term "kinase cysteinome" to refer to the collection of targetable cysteine residues in the human kinome [6] and published ac omputational methodology to identify such cysteines. [7] Ligands that bind through ac ovalent mechanism are not subject to classical equilibrium kinetics, as their residence time in the binding pocket can last up to days.A saconsequence, the potency of these drugs is capable of surpassing the theoreticall imits of potency/ligand efficiency. [2] Another advantage is the prolonged duration of action, which can persist even when the ligand has already been cleared from the body.T his can be beneficial for alleviating the drug burden of ap atient due to less frequent drug dosing (depending on the turnover rate of the protein) and therefore ap ossibly lower risk of idiosyncratic toxicity, which has been linked to daily drug dosage. [8] Thankst or ecent guidelines, the design of safe ande ffective covalentd rugs has gaineds ignificant interest. Other than targeting non-conserved nucleophilic residues, optimizing the noncovalent binding framework is important to improve potency and selectivity of covalentb inderst oward the desired target. Significant efforts have been made in extending the computational toolkitst oi nclude ac ovalent mechanism of protein targeting, likei nt he development of covalent docking methods for binding mode prediction. To highlight the value of the noncovalentc omplex in the covalent binding process, here we describeanew protocolu sing tethered and constrained docking in combination with Dynamic Undocking (DUck) as at ool to privileges trong protein bindersf or the identification of novel covalent inhibitors. At the end of the protocol, dedicated covalentd ockingm ethods were used to rank and select the virtualh its based on the predicted binding mode. By validating the method on JAK3 and KRas, we demonstrate how this fast iterative protocol can be applied to exploreawide chemicals pace and identify potent targeted covalent inhibitors.
In addition, the TCI approach has provent ob eavaluable tool in targeting protein bindings ites, which were previously considered as undruggable, as well as to combat drugr esistance by targeting poorly conserved non-catalytic residues. Overall,a ll of thesea spects have contributed to ar esurgence of covalent drug discoveryp rograms, which has already led to an increaseo fc linicalc andidates acting via ac ovalentm echanism. [9] In general, ac ovalentb inder first requires the formation of an initial noncovalent complex with its target, followed by the chemicalr eactionb etween the ligand's electrophilic warhead and the nucleophilic residue. As such, the most straightforward covalentd rug design approachi sb ased on the modification of ak nown noncovalent bindert oi ntroduce an electrophilic warhead. This couldi ndeed allow to reach andc ovalently modify the targeted nucleophile on the protein by maintaining the overall binding mode in the rest of the pocket. Additionally,a n important strategy is to fine-tune the warhead reactivity based on the target nucleophilicityi no rder to limit possible sidee ffects arising from off-target modifications. [9][10][11] From ac omputational perspective, once an appropriate nucleophile and warhead are identified, as tructure-based approach can be used to screen or optimize ligands to fit the binding site, while also being able to place the warhead in the vicinity of the targeted residue to form the covalent bond. Several covalentd ocking methods have recently been developed to model the structural changes occurringw hen covalent ligands bind to their target. However,o ther than the inherent limitations of traditional docking methods (i.e.,s coring, protein flexibility,s olvation, andn onclassicale ffects), [12] these tools also have to addressa dditional challenges in the simulation. Predicting the optimal geometry of the reacting groups upon covalent bond formation is of key importance for accurate simulations.F urthermore, covalent docking programsf ace the inability to evaluatet he energy of bond formation, which would requireQ M-based simulations of the reaction. Dependingo n the methodo fc hoice, modeling all the differenta nd key aspects characterizingthe binding of covalentligands is often reflected in higher computational costs than for traditional noncovalentdocking.
Among the first developed tools are GOLD [13] and Auto-Dock: [14] the formere nforces the covalent reaction through the definition of al ink atom in both the ligand and receptor before initiating the genetica lgorithm search, while the latter offers the opportunity to choose between the two-pointa ttractor approacha nd the better performing flexible side-chain method, in which the ligand is sampled as part of the protein.
In addition to al ack of the energetic contribution of covalent binding, the manuald efinition of the atoms involved in the reaction hinders the applicabilityo fc ovalent dockingp rograms to large libraries. Ar ecent approacht aken by CovalentDock [15] automatically detects reactivea toms for linking and rewards the energy contribution of the binding event as an additional MM-based term. The authors retrospectively validated their methodo n7 6c ovalently bound ligands in the Protein Data Bank (PDB),f or which CovalentDock showed better performance than GOLD and AutoDock. However,C ovalentDock is limited in reactiont ypes (only Michael addition and b-lactam openingare supported) and does not account for the flexibility of the reactedr esidue. Furthermore,t he cloudw eb server developedf or its usage appearst on ol onger be available (access attempted on October 16, 2018). More recently,o ther webbased servers such as DOCKovalent, [16] or proprietary software such as ICM-Pro, [17] FITTED, [18] and DOCKTITE [19] (an SVL-based workflow fort he modeling softwareM OE [20] )e nabled covalent docking-based virtual screening applications by using predefined and customizable reactions to identify reactingg roups.
Schrçdinger'sC ovDock [21] takes it one step further and mimics the full binding process of covalentl igands (as opposed to only taking into account the covalently attached ligand-protein complex). With this, CovDock highlights the importance of the noncovalent interactions formed prior to covalent binding. The multistep algorithm provides two alternative solutionsbymeans of a" pose prediction" module and av irtual screening module (CovDock-VS). The former includes an extensive protocol for the predictiono fthec ovalently bound pose, namely:I )ligand conformation generation;I I) positioningt he pre-reaction form of the ligand warhead close to the receptor reactive residue (mutated to Ala) using ac onstrained docking; III) resettingt he mutation to the originalr esidue, sampling its rotameric states,a nd generating the covalenta ttachment;I V) clustering and minimization of the poses (including the reacted residue);a nd V) scoring by meanso ft he Prime energy model. An additional affinity score, which averages GlideScore on both the pre-and post-reactionf orms of the ligand, is provided to compare different compounds equipped with the same or similarr eactive warheads. While it shows good binding mode prediction accuracy, this protocol takes roughly1 -2 CPU hours per ligand, so it is not suited forh igh-throughput screenings.T oledo Warshaviak and colleagues addressed this issue by developing CovDock-VS, [22] which I) skips the ConfGen step, II) limits the number of resultingp ose clusters to three, III) excludes minimization by Prime, and IV) scores and ranks protein-ligand complexes based only on the initial GlideScore. Ultimately,t his led to significantly improved speeds(% 15 minutes per structure on as ingle CPU according to the info on CovDock's latest release) over the pose prediction module, but also yielded less accurate binding modep redictions,u nless known interaction patterns were incorporated.
In general,t he performance gap in terms of binding mode prediction among the different covalent docking programs was shownt ov ary significantly depending on variousf actors (i.e.,p rotein target, accessibility of the nucleophilic residue, amount of noncovalent interactions occurring in the complex). [23] On the other hand, the speed of the simulation remains one of the main bottlenecks that can drastically affect the size and diversity of the covalent libraries used for screening applications.T ot his end, herein we present DUckCov,a time-efficient multistep VS protocol for the identification of novel covalent binders.I tw as devised to emphasize the role of the interactions mediating the initial noncovalent complex, whose optimization can, therefore, result in both an increase of the selectivityf or the target and in an opportunity to decrease the reactivity of the electrophile. As depictedi n  Figure 1, rDock [24] is first used to constrain the reactive warhead close to the targeted residue. During docking, pharmacophoric restraints are applied to known H-bond interaction points, if any.D ynamic Undocking (DUck) [25] is then used to assess the strength of these H-bonds. DUck evaluates structural stability,r ather than thermodynamic stability, and has been shown to be orthogonal to methods that attemptt oe stimate the bindinge nergy.H -bonds are suggested to be the main determinants of structural stabilityb ased on their sharp distance and angular dependencies, and their role in structure-kinetic relationships. [25,26] Finally,C ovDock is used to evaluatet he binding mode of those ligandst hat optimallyb ind through noncovalent interactions, and to check if the same interaction pattern is maintained in the predicted covalentd ocking pose.
The protocol was prospectively validated in two case studies:atarget with highly conserved noncovalenti nteractions (JAK3) anda nother one where the noncovalenti nteractions are not conserved across known inhibitors (KRas G12C ).

Results and Discussion
Case study1:J AK3 JAK3 is one of the four Janus kinases (a subfamilyo ft yrosine kinases), the only of which is primarily restricted to leukocytes. Its functional modulation has been associated with ap henotype of severe combinedi mmunodeficiency. [27] Although significant effort has been puti nto the discovery of JAK inhibitors, the search for JAK-specific ligands is still on-going. JAK3 specificity over other family members can be achieved by targeting C909 with ligandst hat are able to covalently bind this residue. H-bond interactions are known to play ap rominent role in buildingupa ffinity toward kinase targets. Because the majority of kinase inhibitors bind to the highly conserved hinge motif, DUckCova pplication on JAK3 was focused on the identification of covalent ligandsd isplaying strongi nteractions at this region.
The DUckCov workflow for JAK3 is described in Figure 2B. Based on the selected JAK3 structure, tethered and constrain-ed docking filtered the acrylamide datasetf rom roughly5 0000 compounds to 249 compounds that satisfied the H-bond interactions with E903 (backboneC =O) and L905 (backbone NH) at the hinge region (depicted in cyan in Figure 2A), while conserving the acrylamide group closet ot he reactive C909 as observed in the prepared reference ligand (depicted in grey in Figure 2A). Next, DUck was performed, using the H-bond established with E903-O as the simulation coordinate, leadingt o 92 remaining hits. From those, as econd round of DUck on the H-bond formed with L905-NH resulted in 66 compounds. In both cases, aW QB (work necessary to pull the ligand from 2.5 to 5.0 relative to the defined H-bond interaction) threshold of 6kcal mol À1 was maintained. The consensus of both interactions wasu sed to filter the rDock docking poses using DUck, as both these interactionsa re made by the reference ligand (with W QB = 13 kcal mol À1 and 11 kcal mol À1 ,r espectively). Then, in ordert og et both aq uantitative ranking and more accurate binding mode predictions, CovDock in the "Pose prediction" module wasu sed for the covalent docking simulations on the 66 DUck hits. Finally, we have selected the top 10 ligandsa ccording to their CovDock affinity scores.O ut of these, five compounds (1, Compound 1, the top-ranked ligand, has been originally reporteda sapotent double mutant EGFR L858R/T790M inhibitor (PF-06459988, IC 50 = 7nm), [28] and based on the DUckCov prediction and subsequenti nvitro testing, we found that the compoundi nhibits JAK3 with similarp otency( 5nm). Interestingly, JAK3 activity of this compound was not reported previously; however,p rofiling against 54 human kinases at 1 mm revealed its moderate JAK1 andJ AK2i nhibitory activity. [28] Compounds 5, 6, 8 and 9 are characterized by ah igher rigidity of the linker between the hinge binding region andt he warhead. Compound 5 displayed an IC 50 value of 389 nm on JAK3, and 18 % and 62 %i nhibition on JAK1 and JAK2 at 10 mm,r espectively, suggesting ac ovalent bond driven improvement of the inhibi- Figure 1. Starting from al ibraryofc ovalent ligands, the general workflow is as follows:1)docking with rDock with pharmacophoric constraints (orange spheres) and positionalrestraints for the warhead (encircled by orange ellipse), 2) dynamic undocking to test the strengthoft he H-bondinteraction that was enforcedduring docking,a nd 3) covalent docking of ligands that display the best noncovalent interactions to account for warhead flexibility (red arrow).  . A) Pre-reaction reference ligand in the structure 5TOZ in grey,and covalent attachmentino rangei nt he post-reactionform, with defined featuresa s cyan spheres, and interactions in cyan dashes. B) DUckCov protocolfor JAK3. C) To p-ten compounds, ranked by CovDock affinity scores (green:rDock binding modes, magenta: CovDock binding modes;interactions with the hinger esidues E903 and L905 are displayed in cyan). D) Experimentally tested compounds for JAK3;t he three confirmedc ompoundscorrespond to ah it rate of 60 %. ature. In the same line, compound 8 showed no activity in the biochemical assay,f urtherh ighlighting the preference for planar hinge-binding cores. Finally, compound 3 was also experimentally tested (as an analogue of 5 with af lexible linker), but has shown no activity.
In Figure 2C the poses generated by rDock and CovDock for the top 10 ligandsa re shown in green and magenta, respectively.I nn ine out of 10 cases the interactions used for pulling with DUck were reproduced in the bests coring pose generated by CovDock.( RMSD values, CovDock affinity,r Dock scores, and DUck W QB values, alongw ith the ZINC codes of the compounds are given in Supporting Information Ta ble S7). Compounds 1, 4, 5, 7, 8, 9 and 10 retain the same orientation of the hinge binding region, while compounds 2 and 3 contain the hinge binding scaffold in af lippedo rientation, duet ot he flexibility of the linker.H owever, compound 3 maintained both interactions at the hinge, while compound 2 showed ad ual interaction with L905.I na ddition to the higherd eviations (RMSD) betweent he binding modes predicted by rDock and CovDock, the linker flexibility of compounds 1-4 resultsi n highers train energiesa sw ell (relative to minimum energieso f the free ligands, see Supporting Information Ta ble S8). However,t his does not necessarily preventt he compounds to be potent inhibitors of JAK3 (as exemplifiedb yc ompound 1), in accordance with the generaln otiont hat the linker rarely has a profounde ffect on activity.C ompound 6 is the only case where neither an interaction with E903, nor L905 is observed. (The ten lowest ranked ligands are included as counter-examples in Supporting Information Figure S9, most of them lacking any kind of interaction with the hinge).
The 60 %h it rate observedw ith DUckCov against JAK3 demonstrates the importance of noncovalenti nteractions established by covalentl igands. However,t hese findings also clearly show the need for dedicated covalent docking programs sampling multiple rotameric states of the reactive residue. This would increase the chance to identify the most optimal geometry of the covalent attachment,w hich could consequently reflect in ar earrangemento ft he overall binding mode that would be generated by tethered docking.
It is also importantt on ote that running the DUckCovw orkflow took at otal 1200 CPU/GPUh ours, while running CovDock Virtual screening on the whole dataset would have taken about 13 750 CPU hours (15 minutes per ligand according to the softwarem anual). The roughly 11-fold speedup can be mostly attributedt ot he quick tethered dockings tep, leaving only af raction of the ligandst ob ee valuated by the more expensive Dynamic Undocking. If we account for parallelization as well, running this specific workflow in parallel on 24 GPUs of the Barcelona Supercomputing Center has required at otal 50 hours of runtime, while our license token limit would have allowed us to run CovDock Virtual screening on three parallel threads (three ligands),r esultingi na bout4 600 hours of total runtime, translating to ar oughly9 2-fold decrease in speed compared to the DUckCov workflow.T he reported speedups can be considered typical for academic groups (based on the accessible resources), but in am ore general sense,C PU/GPU time is more accessible (cheaper) than state-of-the-arts oftware licenses (such as Schrçdinger) for industrial researchers as well.

Case study2:K Ras G12C
To challenge the method's applicability domain, it was also applied to anothero ncologicalt arget, the catalytic domain of KRas G12C .F or KRas G12C ,e ven the best irreversible binderss how low potency if their covalent warhead is removed. [29] KRas is a small Gprotein, which is rendered constitutively active by the G12C mutation, leadingt oa bnormal cell growth.The mutation has been shown to be implicated in 40 %o fK Ras-driven lung cancers. [30] Known covalentl igandsb ind to ah ighly flexible allosteric pocket, which traps KRas G12C in the inactiveG DP-bound state (thereby confirming its druggability). [31] Additionally,covalent ligands can specifically target the mutatedK Ras G12C ,s paring the wildtype protein and offering the opportunity for oncogene-specific inhibition. [30] In Table 1, the variousH -bond interactions are displayed for 10 KRas G12C structures containing ac ovalently bound acrylamide ligand, as well as the W QB values obtained for each interaction on the reference ligand. For the remaining two of the 12 selected structures (PDB IDs 4M21 and 6ARK),n oH -bonds could be reliably identified. An interaction was used in DUck-Cov if the work required to breakt he H-bond was higher than 6kcal mol À1 .T hus, structures 5F2E (pullingf rom atoms R68-  Figure 3A,t he workflow is summarized for the two structures and three interactions that led to virtual hits, namely 5V9O, E63-OE2 and D69-OD1, and 5V6S, D69-OD1. Finally, 6 3, 22 and 47 hits were generated by DUckCov from these interaction points, respectively.Based on the resultsofJAK3, the compoundst hat were selected for experimentalt esting were en-sured to have maintained the inspected H-bonda nd/ord isplayeds imilar binding modes according to rDock and Cov-Dock.T he RMSD between rDock and CovDock poses and diversityo ft he hits were also used to support the final selection ( Figure 3C).
In Figure 3C,t he compounds retrieved using the stepwise workflow are shown. Compounds 11, 13, 15 and 16,m aintain the defined interaction with D69 of the KRas structure 5V6S, in both rDock and CovDock poses. It should be noted that the pharmacophoric restriction in rDock has atolerance of 1 relative to the reference coordinates.A saresult, in some cases the input geometry for DUck does not form ah ydrogen bond. Yet, the initial step in the DUck protocol involvesaminimization that can repair the H-bond. For this reason,a ss hown in compounds 12 and 14,s omei nteractions present highW QB values even though they were not recapitulatedb yr Dock. These interactions are also presenti nt he CovDock pose.
The overall conservation of the binding modes in all ten compounds is reasonably good, in accordance with the RMSD values between the rDock and CovDock poses. RMSD values, CovDock affinity and rDock SCORE.INTER (interaction score energy) scores, as well as W QB values, for the final ten compounds are reportedi nS upporting Information Table S10, along with their ZINC codes. These compounds were purchased and tested in HSQC NMR measurements (exceptf or compound 14 that was not availablef or immediate purchase at the time of this study). The 2D structures are included in Figure 4B for the confirmed hits, and Supporting Information Figure S11f or the rest. Strain energies of the resulting binding modes are reported in Supporting Information Table S12.
The site specific binding of compounds 11, 13, 17 and 19 was confirmed by 1 H, 15 N-HSQC (2D NMR) measurements. After the appropriate incubation time, changes in the HSQC spectrum were detectedb ased on chemical shift perturbationc onfirming the binding of the mentioned small molecules (see Figure 4A,a nd Supporting Information Figure S13 for the full spectra). The perturbedc hemical shifts are located mostlyi n the well-known Ras functional regions:t he P-loop (G10-S17), Switch I( D30-D38), and SwitchII( L56-G77). Overall, the findings suggest that the compounds can bind to KRas G12C covalently at the C12 residue, and are located in thea llostericb inding pocket of KRas, similarly to known inhibitors such as ARS-853 [31] and ARS-1620. [32] It is worth to note that from the three protein structure/H-bond combinationst hat were applied for the DUckCov workflow ( Figure 3A), all of them have produced at least one confirmed hit compound.
The 44 %h it rate retrievedf or KRas using the DUckCov protocol is exceptional, considering the lower druggability of this target. Moreover,t he correlation between affinity and activity remains elusive. [33] Considering this, the protocol was successful in identifying four novel KRas covalent binders in an efficient manner by first focusingo nt he noncovalent interactions, even though these are known to be non-conserved. Furthermore, ad edicated covalent docking program is imperative for the evaluation of possible rearrangements of the overall binding mode generated by rDock. By sampling multiple rotameric states of the reactive residue, forming the covalentb ond between the reactive atomsa nd performings tructuralo ptimization of the covalently attached ligand,t he binding mode prediction module in CovDock could increase the chance to find an optimal geometry for the ligands. Additionally,acomparison with Schrçdinger's CovDock Virtual screening module clearly highlightst he advantage of the DUckCovw orkflow,a s three out of the four confirmed hits weren ot included in the top ten virtual hits by CovDock (Supporting Information Ta ble S14).

Conclusions
DUckCovi sp resentedh ere as an ovel protocol for the identification of covalent binders that models every stage of the multistep bindingm echanism of covalent ligands in an efficient hierarchical manner.I nt his protocol, only molecules that can form as table and productivep re-reactive state are evaluated beforea ssessing the post-reactives tate, therebya llowing to explore large chemical spaces. DynamicU ndocking (DUck) is the main feature of the workflow,a si ti su sed to analyze the pre-reactive state of the ligandsb ye valuatingt he strengtho f H-bondi nteractions driving the formation of the initial noncovalentp rotein-ligand complex. Furthermore, DUck calculations are performed on focused protein chunks, thus enabling fast simulations by decreasing the size of the system. Therefore, DUckCovr elies on DUck as as tringent and efficient filter for the selection of molecules to be subjected to the following steps.N ext, the post-reactive state is analyzed by performing covalentd ocking with CovDock in the most accurate pose predictionmodule. This step is used to generate bound conformations, thus allowing to compare bindingm odes in the pre-and post-reaction states, and to assess if the key H-bonds are maintained when the ligand is covalently bound to the targetednucleophilic residue.
Our protocol was successfully validatedi nt wo case studies. For JAK3, we reported ah it rate of 60 %( three actives out of five molecules tested),i dentifying two novel, low micromolar and high nanomolar ligands, as well as alow nanomolar inhibitor,o riginally developed for another kinase target( EGFR). For the more challenging KRas G12C protein target, four novel covalent ligandsw eree xperimentally confirmedo ut of nine tested. Due to the highly flexible natureo ft he KRas G12C allostericb inding pocket, the resulting 44 %h it rate can be considered exceptionally good. The two case studies display the broad applicability of DUckCov,i ni dentifying novel chemical matter for structurally better characterized, as well as more challenging targets.I tis also important to highlight that, depending on the available resources, the presented workflow can provide a roughlyt en-to hundred-fold speedup, as compared to ac ommerciallya vailablev irtuals creening tool for covalent binders (SchrçdingerC ovDock).

Experimental Section
Target structure selection:F or JAK3, the PDB structure 5TOZ (chain A) was used as the template, using the co-crystallized inhibitor PF-06651600 as the reference ligand. At the moment of selection, eight structures were available containing covalently bound ligands having at erminal acrylamide as warhead (4QPS, 4V0G, 4Z16, 5TOZ, 5TTS, 5TTU, and 5TTV). Alignment and superposition of these structures in MOE [20] led to an average RMSD of 0.78 .G iven the structural conservation of the JAK3 kinase, the choice for structure 5TOZ was based on its co-crystallized ligand having the best inhibitory potency (0.4 nm). [34] For the second case study,astructure ensemble approach was used, due to the pronounced flexibility of the KRas allosteric binding site. At the end of May 2018, 23 KRas structures containing covalently bound ligands had been deposited in the PDB. The majori-   (20/23). Of these, 12 were acrylamide-based covalent ligands, while eight contained av inylsulfonamide moiety as the electrophile. As for the remaining three complexes, one was formed via ar ing opening reaction (5V6V) and the other two were formed through disulfide formation (4LUC, 4LV6). We limited the set to the 12 acrylamide-based complexes due to the limited commercial availability of screening compounds able to react via ring opening or disulfide formation, and the narrower chemical space of available vinylsulfonamides (roughly 2000 purchasable compounds in ZINC, versus 50 000 purchasable acrylamides [16] ). The average RMSD of these 12 structures in the flexible switch II loop was 2.41 (Supporting Information Ta ble S1). From these 12, those structures in which at least one H-bond with the co-crystallized ligand was stronger than 6kcal mol À1 (as evaluated by DUck) were selected for the ensemble approach in DUckCov.
Protein structure preparation (in silico):A ll of the selected PDB structures were prepared in MOE as follows:I )the structure was corrected (termini were capped, gaps were capped or ah omology of the sequence of as imilar structure was built, alternate conformations were chosen if more than one was present, correct tautomeric states for the residues were assigned);I I) the structure (including the covalently bound ligand) was protonated at pH 7; III) the covalent bond between the residue and the ligand was manually broken;I V) the ligand warhead in its pre-reaction form was built with MOE builder by making the acrylamide's Ca-Cb bond double, and finally;V )t he cysteine was rebuilt, then minimized in the presence of the pre-reaction ligand.
The structures for the CovDock simulations were prepared with the Protein Preparation Wizard provided in the Schrçdinger Suite, [35,36] in order to further refine the protein's H-bond network and to perform ar estrained minimization of hydrogen atoms. The receptor grid box required for docking calculations was centered on the corresponding co-crystallized ligand.
Datasets for VS:A so ne of the main features of the protocol is its efficiency,i ti sm ost beneficial when al arge collection of electrophilic ligands is available. In general, if the protein has already been targeted by covalent inhibitors, the library can be compiled by collecting commercially available ligands and/or by enumerating synthetically accessible compounds bearing the same warhead type as the crystallized inhibitor.F or JAK3, the vast majority of known covalent inhibitors bind through an acrylamide warhead, while for KRas G12C ,m ost of the known potent covalent inhibitors bind through either an acrylamide or av inylsulfonamide warhead. Further on, due to the limited commercial availability of screening compounds containing other types of warheads, we used the acrylamide dataset (roughly 50 000 compounds) collected by London et al. (for testing their recently published covalent docking program, DOCKovalent), to validate our protocol. [16] Prior to docking simulations, LigPrep by Schrçdinger was used to prepare 3D conformations from SMILES codes and to generate tautomeric and ionization states at pH 6-8 while retaining specified chiralities. [35,36] General workflow description:A no verview of the protocol is shown in Figure 1. The collected library (here:Z INC acrylamide collection) is first docked with rDock against the target of interest, while simultaneously tethering the covalent warhead to its reference coordinates and using pharmacophoric constraints to enforce the main noncovalent interaction. Because the protein structure is derived from ac rystallized covalent complex, ad istance cutoff is set to avoid large deviations of the electrophilic warhead from the position defined in the reference ligand. H-bond pharmacophoric constraints are applied in the docking simulation to keep only those ligands that can establish the H-bond interactions defined as important for binding. DUck is then used to evaluate the strength of the H-bond. For DUck, only H-bonds are assessed, as they are known to be key contributors to affinity in many targets. [37,38] In a DUck simulation, the ligands are pulled from 2.5 to 5.0 relative to the defined H-bond interaction point in the protein, during a user-defined number of MD and SMD replicas. The force necessary to pull out the ligand is then used to calculate aw ork value (W QB ), which corresponds to the strength of the H-bond. What makes DUck exceptionally fast, is that only the local environment of the residue involved in the interaction is required for the simulation. Lastly,o nly those ligands that display the best noncovalent interactions (according to rDock and DUck) are covalently docked with CovDock using the most accurate pose prediction module.
Tethered and constrained docking:U sing rDock, the cavity was prepared with the reference ligand method, using the respective co-crystallized ligand. Te thered docking was used to restrain the electrophile close to the reactive residue. Te thered docking consists of two steps, namely,I )superposing atoms according to the defined SMARTS pattern, and II) docking, during which the superposed atoms can only deviate from the original position by au serdefined cutoff. The warhead was defined by the SMARTS pattern "[#6] = [#6]À[#6] = [#8]" for the acrylamide motif. During docking, the tethered part of the ligand could move freely in terms of the dihedral degrees of freedom, while the translational and rotational degrees of freedom could deviate by max. 0.1 per docking run. This is meant to allow some flexibility in the sampling, also taking into consideration that the targeted cysteine residue could display asignificant degree of flexibility.
Furthermore, ap ose is penalized if the defined pharmacophoric constraints are not met. Here, a1 deviation was permitted to increase sampling, considering that the strength of the H-bond would still be assessed by DUck later on. For JAK3, the pharmacophoric constraints were defined as an acceptor (E903 backbone C= O) and ad onor (L905 backbone NH), both of which interact with the reference ligand in 5TOZ. For KRas G12C ,t he pharmacophoric constraints were defined based on the H-bond interactions observed between the reference ligands and the protein in the selected structures:t hese interactions (along with their W QB values evaluated by DUck) are summarized in Ta ble 1.
Next, the high-throughput VS protocol (HTVS) of rDock was implemented, which consisted of three docking stages for each ligand. In each stage, the number of docking runs increases (for better sampling), and the threshold for the docking score decreases (better scores). The ligand only proceeds to the next stage if its docking score is better than the defined threshold within the specified number of runs. This is done to increase the efficiency of the simulation by progressively decreasing the number of ligands moved forward. The docking score filters were selected based on the score of the reference ligands, while being stricter for JAK3 (as the defined noncovalent interactions necessary for binding are well known) and less strict for KRas (as the defined noncovalent interactions necessary for binding are not known). For the same reason, the total number of docking runs for JAK3 was significantly lower than for KRas. The exact HTVS protocols are given in Supporting Information Schemes S2 (for JAK3) and S3 (for KRas) along with the in-place rDock SCORE.INTER scores in Supporting Information Ta ble S4.
Dynamic undocking:T he first step for aD Uck simulation is the definition of the chunk (a part of the protein structure) that repre- sents the local environment surrounding the residue interacting with the ligand. Thus, for every interaction point, as eparate chunk is created. When selecting residues for the chunk, the following guidelines were considered:I )selecting as little residues as possible to reduce computational time;I I) residues were not selected if they would block the ligand from exiting the pocket during the simulations based on the directionality of the H-bond;I II) residues were not removed if this would lead to the possibility of solvent entering the pocket from areas other than where the ligand is exiting;a nd lastly IV) preserving the local environment. This was done from the already prepared structures. The gaps created during the process of selecting the chunk residues were capped. For this, each section of residues was split into separate chains, and the termini of each chain were acetylated or methylated. Lastly,t he chunk was checked for clashes possibly created during the capping of the chains. The corresponding chunk definitions for JAK3 and KRas are included in Supporting Information Ta ble S5.
After production of the chunk, DUck automatically does the following:I )automatic ligand parameterization in MOE, II) minimization, III) equilibration, and IV) as eries of SMD (at two different temperatures), then MD simulations, in which the ligand is pulled from 2.5 to 5.0 relative to the defined H-bond interaction. Steps II) to IV) were performed with GPU-based pmemd.cuda in AMBER. [39] Five replicas of step IV) were performed, during which aW QB threshold of 6kcal mol À1 (force necessary to pull the ligand) was maintained, so that the simulation was stopped if the measured W QB value of the H-bond was smaller.A dditionally,f or KRas, the inclusion of cocrystallized GDP in the chunk was necessary,a si ts absence would have led to the surface being more exposed to bulk solvent. For this, GDP was parameterized using MOE'sP FROST forcefield, and the generated parameters were automatically included in the DUck protocol.
Covalent docking with CovDock:B ecause Schrçdinger's CovDock pose prediction module outperformed most of the other covalent docking tools, we used this approach to rank and predict the binding mode of the virtual hits identified to have strong noncovalent interactions by the previous workflow steps. [23] CovDock ranks the compounds according to an "Affinity Score", which is calculated as the average of the pre-reaction Glide score and the post-reaction in-place docking score. By deeming the energy of bond formation as constant across as et of compounds having the same warhead involved in the chemical reaction (as in the case of DUckCov), the affinitys core can be used to compare and rank ligands in as et. CovDock affinitys cores for the reference ligands of the structures that led to hits are given in Supporting Information Ta ble S4. Contrary to the first docking step in the workflow,n oa dditional restraints were applied in the covalent docking simulation other than those used by default in CovDock. Binding site residues were defined by centering the receptor grid on the ligand co-crystallized in the structure under investigation. When setting up the simulation, the acrylamide warhead was automatically recognized in each ligand structure through the SMARTS-based definition of the "Michael addition" reaction type. Ultimately,t his step was incorporated to evaluate if achange in binding mode would take place upon covalent bond formation, which would prevent the ligand from establishing the interactions defined as necessary by previous workflow steps. To that end, root-mean-squared deviation (RMSD) values between the rDock and CovDock conformations were calculated by means of aP ython script provided by Schrçdinger (rmsd.py). As mall RMSD, typically lower than 2.0 ,w as considered as favorable. Furthermore, the defined interaction patterns were also visually inspected for consensus.
Biochemical and structural characterization of the identified virtual hits:C ompounds 1, 3, 5, 8 and 9 were tested at 10 mm in duplicate with the Z'-LYTE kinase inhibition assay (Life Te chnologies). The assay uses af luorescence-based format and is based on the different sensitivity of phosphorylated and non-phosphorylated peptides to proteolytic cleavage. As uitable peptide substrate is labeled with two fluorophores (coumarin and fluorescein), forming a FRET pair.A fter incubating the kinase + peptide + test compound mixture for an hour,adevelopment reaction is carried out. Any peptide that was not phosphorylated by the kinase is cleaved, disrupting the resonance energy transfer between the FRET pair.T he reaction progress is quantified based on the ratio of the detected emission at 445 nm (coumarin) and 520 nm (fluorescein), that is, the ratio of cleaved versus intact peptide. Am ore detailed description of the assay is available on the website of Life Te chnologies. [40] IC 50 values were determined from 10 points titration measurements using the same assay.
Binding of compounds 11-13 and 15-20 to KRas G12C was tested and structurally characterized by NMR measurements, performed on aB ruker AvanceI II 700 MHz spectrometer equipped with a5mm Prodigy TCI H&F-C/N-D, z-gradient probe head operating at 700.05 MHz for 1 Ha nd 70.94 MHz for 15 Nn uclei. 1 H, 15 N-HSQC spectra were recorded at 298 Kt oo btain the protein 1 Ha nd 15 Nr esonances in both free and small-molecule-bound state and the changes in chemical shifts were followed upon complex formation. NMR samples contained 15 N-labeled KRas G12C (catalytic domain, residues 1-169) in 150 and 50 mm concentration in free protein measurement (as ar eference) and binding test, respectively,5m m GDP, 10 mm EDTA, 15 mm MgCl 2 in PBS buffer,5%D MSO and 10 %D 2 O at pH 7.4 and 150-500 mm ligand. Because some of the ligands were not fully dissolved, we used al onger incubation time (96 h), and ah igh number of scans for every HSQC spectrum (NS = 128). To avoid false-positive results, the free protein was incubated for four days as well, and the spectra of the samples were compared with the spectrum of the incubated free protein. All 1 Hc hemical shifts were referenced to the DMSO peaks (which were calibrated to DSS resonance before in free protein measurements) as DSS was not added to avoid any side reactions with the ligand. 15 N chemical shift values were referenced indirectly using the corresponding gyromagnetic ratios according to the IUPAC convention. Sequence-specific assignments of H N and Ni nt he bound KRas G12C spectra were transferred from our results to be published elsewhere (BMRB entry code:2 7646). There were ambiguities in a number of resonances in crowded spectral regions;h owever,t his fact did not influence the final outcome. All spectra were processed with Bruker TOPSPIN and analyzed using NMRFAM-SPARKY software. [41]