Identifying promising druggable binding sites and their flexibility to target the receptor-binding domain of SARS-CoV-2 spike protein

The spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is crucial for viral infection. The interaction of its receptor-binding domain (RBD) with the human angiotensin-converting enzyme 2 (ACE2) protein is required for the virus to enter the host cell. We identified RBD binding sites to block its function with inhibitors by combining the protein structural flexibility with machine learning analysis. Molecular dynamics simulations were performed on unbound or ACE2-bound RBD conformations. Pockets estimation, tracking and druggability prediction were performed on a large sample of simulated RBD conformations. Recurrent druggable binding sites and their key residues were identified by clustering pockets based on their residue similarity. This protocol successfully identified three druggable sites and their key residues, aiming to target with inhibitors for preventing ACE2 interaction. One site features key residues for direct ACE2 interaction, highlighted using energetic computations, but can be affected by several mutations of the variants of concern. Two highly druggable sites, located between the spike protein monomers interface are promising. One weakly impacted by only one Omicron mutation, could contribute to stabilizing the spike protein in its closed state. The other, currently not affected by mutations, could avoid the activation of the spike protein trimer.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the COVID-19 outbreak [1,2].It was originally discovered in Wuhan, China, in late December 2019.The World Health Organization (WHO) labeled this epidemic a pandemic in March 2020 and reported nearly 760 million confirmed cases of COVID-19 and 6.8 million deaths by the end of March 2023 (https://covid19.who.int/).Important variants of the SARS-CoV-2 virus have emerged in the UK, Brazil, India, and South Africa from December 2020 to the end of November 2021.Five variants have been recognized by the WHO as variants of concern and are labeled as coronavirus variants: Alpha, Beta, Gamma, Delta, and Omicron [3][4][5].SARS-CoV-2 is an extremely unstable virus [6,7], which is favorable for the development of new variants.
COVID-19 vaccines have enabled the reduction of the spread, severity, and death caused worldwide.Eight drugs were approved in May, 2022.Three of these drugs are biologics, which aim to block viral attachment and entry into human cells [8][9][10].Two are a combination of two monoclonal antibodies intended to prevent mutational escape: casirivimab/imdevimab (commercialized under the name Ronapreve) [8] and bamlanivimab/etesevimab [9].The third drug, sotrovimab (Xevudy) [10], is also a monoclonal antibody.Although monoclonal antibodies are an important therapeutic advancement, their manufacturing costs are high, and they are not convenient for patients because they are administered intravenously.Small molecules are often cheaper and easier to produce than protein-or peptide-based drugs [11].They can withstand a wide range of delivery modalities, including oral administration, making them a preferred choice among pharmaceutical chemists.Currently, four small-molecule drugs are available for the treatment of COVID-19.Remdesivir and molnupiravir exert their antiviral action by perturbing viral RNA replication.Baricitinib attenuates the uncontrollable inflammatory response by the immune system owing to SARS-CoV-2 infection, referred to as a cytokine storm, by specifically inhibiting Janus kinases.From the end of December 2021, a fourth drug, named Paxlovid, has become available for people who are at high risk of developing severe COVID-19 symptoms [12].Paxlovid is a combination of two antiviral medications, nirmatrelvir and ritonavir, administered orally.Several repurposed drugs and new drug candidates are currently in phase III and IV trials.However, it is still crucial to develop drugs that can alleviate the severity of COVID-19 in individuals who are at high risk for progression to severe COVID-19.
The instability of the SARS-CoV-2 genome and the high possibility of new variants emerging make the discovery of new treatments and effective maintenance of already discovered drugs challenging [13].Therefore, it is crucial to understand the interaction mechanisms of SARS-CoV-2 at the molecular level and the impact of its variants on these interactions.
The SARS-CoV-2 genome encodes four structural proteins: the spike, envelope, membrane, and nucleocapsid proteins, and 16 nonstructural proteins.Spike, envelope, and membrane proteins form the viral envelope, and the nucleocapsid protein binds to the RNA genome [14][15][16].The spike protein is a homotrimeric glycoprotein and each monomer is composed of 1273 residues [17].In coronaviruses, the spike protein can interact with human angiotensin-converting enzyme 2 (ACE2) to initiate fusion with host cells [18].The receptor-binding domain (RBD) of the spike protein is responsible for the interaction with ACE2 [19].Therefore, the RBD is a crucial protein target for the development of COVID-19 drugs.The 17 residues on the RBD that interact directly with ACE2 have been grouped under the name of the receptor-binding motif (RBM).Targeting the RBM using small molecules, therapeutic peptides, and neutralizing antibodies was determined to be an attractive method to inhibit the ability of the spike protein to bind ACE2, owing to its low glycosylation [20][21][22].
Advancements in structural biology and structural bioinformatics methods have enabled the elucidation of molecular and dynamic mechanisms of protein-protein or protein-molecule interactions.It is possible to design molecules capable of disrupting protein-protein interactions using structure-guided approaches.For this purpose, it is important to first understand the flexibility of the structures and to identify the residues essential for stabilizing the interaction, commonly named "hot spots".
However, the interaction between a drug and a target protein depends on a few key residues as well as on a larger protein cavity or pocket, referred to as the binding site, which must have physicochemical and geometrical properties in agreement with those of the ligand [23].Therefore, it is also crucial to determine which regions of the protein surface have the suitable druggability profile that can be targeted by therapeutic molecules.
The first SARS-CoV-2 spike protein structure was resolved in February 2020 [24].The number of available spike protein structures has increased rapidly, explaining the structural mechanisms that allow SARS-CoV-2 entry into the host cell.According to cryo-electron microscopy (cryo-EM) structures of the spike protein, RBD exists in two states: a closed (or "down") and an open (or "up") state.In the latter state, the RBD is less buried and can interact directly with ACE2 [24].At the beginning of this study, three crystal structures of the wild-type ACE2/RBD complex were available: PDB IDs 6M0J [25], 6LZG [26], and 6VW1 [27].
With the release of the experimental structures of the SARS-CoV-2 RBD-ACE2 complex, numerous studies have been performed to elucidate the molecular and dynamic mechanisms involved in this protein-protein interaction.
Considering the flexibility of the protein partners improved the identification of hot spot residues.Spinello et al. identified 12 hot spot residues using SARS-CoV-1 and SARS-CoV-2 PDB structures and the molecular mechanics Poisson Boltzmann surface area (MM-PBSA) method [28,29].Using structural comparisons between the SARS-CoV-1 RBD-ACE2 X-ray structure and a SARS-CoV2 RBD-ACE2 model built by homology and the MM-PBSA method, the authors identified ten hot spots for the affinity of the SARS-CoV-2 RBD for ACE2 [30].Recently, a combination of molecular dynamics (MD) simulations and the Molecular mechanics-generalized Born surface area (MM-GBSA) method enabled the identification of 13 hot spot residues [31].Overall, the hot spot residues assessed using these different methods showed coherency but clear dependency on the employed experimental structures and the protocol used.
Identification of these hot spot residues is a key step in the design of drug molecules capable of disrupting protein-protein interactions.However, the protein surface region or binding site to be targeted by therapeutic compounds also requires characterization in terms of the physicochemical and geometrical properties inherent to its flexibility.The binding sites must be druggable and possess physicochemical and geometrical properties that allow them to bind to drug-like molecules [23].
Some studies have focused on the determination of druggable sites on SARS-CoV-2 spike RBD, in which occupancy could reduce, directly or through an allosteric mechanism, the interaction between the RBD and ACE2 [32][33][34][35].Trigueiro-Louro et al. identified druggable pockets from different spike protein structures using consensus from pocket estimations [32].They concluded that the RBD is one of the two most promising conserved druggable regions.They identified four to seven RBD pockets from three RBD-ACE2 complex experimental structures (PDB IDs 6VW1, 6LZG, and 6M0J) without considering protein flexibility.They selected two pockets characterized as druggable, observed in the three RBD structures.Carino et al. [33] identified 300 putative pockets in the whole trimeric structure of the spike protein using the Fpocket estimation program [36].They selected six pockets on the RBD structure based on their potential druggability, structural rigidity, and sequence conservation between the SARS-CoV-2 and SARS-CoV-1 RBD.They identified two continuous pockets in the central β-sheet core of the spike RBD that were targetable by steroidal molecules based on virtual screening of FDA-approved drugs on these six RBD pockets.Using in vitro assays, they confirmed that several compounds highlighted by virtual screening can reduce the ability of the RBD to bind to ACE2.
Although these studies are promising, they have been performed only on static protein structures.The dynamic aspect of the structure, as well as the flexibility of the pockets, were not considered.To identify reliable druggable binding sites for drug design, the importance to consider the presence of cryptic, transitory, and flexible pockets, have been underlined by Stank et al. [37], Abi Hussein et al. [38], and Kuzmanic et al. [39].Recent research conducted by Dokainish et al. [40] stressed the importance of considering the intrinsic flexibility of the SARS-CoV-2 spike protein receptor domains.They identified cryptic pockets from RBD intermediate states determined using MD simulations and concluded that the intrinsic flexibility of the spike protein must be taken into consideration when developing vaccines or antiviral medications.
In this study, we proposed a protocol considering the structural flexibility of RBD while combining machine learning approaches to identify promising RBD druggable binding sites for drug design development.
For this purpose, we performed a structural flexibility analysis of both unbound RBD and RBD-ACE2 complex using MD simulations.We also identified hot spots (crucial residues involved in the RBD-ACE2 interaction) using MM-PBSA energetics computation.Using a large number of conformations extracted from the MD simulations, we estimated the pockets on the protein surface during the MD simulations and characterized them in terms of the druggability score using a supervised machine learning method developed in the laboratory [41,42].We then performed unsupervised hierarchical classification on the large set of estimated pockets using pocket similarity in terms of residue composition.Main pocket clusters regrouping pockets with a similar residue composition correspond to binding sites, frequently observed along DM simulations.Analysis of these pocket clusters in terms of residue localization, frequency, variability, and druggability allowed the identification of promising druggable pockets and of their key residue in terms of contribution to the binding site and its druggability.Additionally, the potential mechanistic impact of mutations of SARS-CoV-2 variants of concern on the selected druggable binding sites located in key regions of the RBD surface and hot spots is discussed.

Protein preparation
Three crystal structures of the SARS-CoV-2 RBD complexed with the human ACE2 receptor were considered (PDB IDs 6VW1 [25], 6M0J [26], and 6LZG [27]).Overall, the three structures are very similar, with a maximum backbone root mean square deviation (RMSD) of 1.25 Å.However, 6VW1 is based on a chimeric SARS-CoV-2 RBD protein and displays poorer resolution than the 6M0J and 6LZG structures.Coordinates of 6M0J and 6LZG crystal structures were determined at 2.45 Å and 2.50 Å, respectively.Therefore, our study focused only on the 6M0J and 6LZG structures.Each complex was protonated at physiological pH (7.4) using the PROPKA tool [43].

MD simulations
MD simulations were performed on these two crystallographic structures to study the dynamic behavior and stability of the ACE2-RBD complex.They were also performed to study the dynamic behavior of the unbound state of the RBD initially extracted from the ACE2-RBD complex.
MD simulations were performed using the GROMACS software package [44] with the CHARMM36m all-atom additive protein force field [45] under periodic boundary conditions.A dodecahedron water box of TIP3P water molecules was used to run the simulations.The full simulation system consisted approximately of 12,500 atoms.Non-bonded interactions were truncated at a cut-off distance of ten Å for electrostatic twin-range cut-off and van der Waals cut-off.Neighbor searching was performed every 10 steps.The energy of the system was minimized over approximately 1000 steps using the steepest descent algorithm for energy minimization after the ion addition and neutralization of the systems.Each system was equilibrated with an NVT (Number of particles, Volume, and Temperature) ensemble during 1 ns at a temperature of 300 K and a coupling constant of 0.1 ps.Subsequently, each simulation was performed under number of particles, pressure, and temperature conditions, coupling the system to a heat bath using the Berendsen algorithm and setting the temperature at 300 K and the pressure at 1 bar during 1 ns.For the production step, ten independent runs of 100 ns with different initial velocities were performed.The LINCS algorithm [46] was applied to all the bond lengths to constrain them, and an integration time step of 2 fs was used.The electrostatic interactions were cut off at 1.2 nm and were calculated using the particle mesh Ewald method.As for Lennard-Jones interactions, they were cut off by 1.2 nm by using the potential shift Verlet method [47].Periodic boundaries conditions were also applied to all systems.The temperature was maintained at 300 K with a V-rescale thermostat [48] and τ T coupling constant of 0.1 ps.The pressure was maintained at 1 bar with a τ P constant of 2.0 ps and a compressibility of 4.5 × 10 −5 bar −1 using the Parrinello-Rahman barostat [49].
For each PDB structure, the trajectories were merged for the analyses, resulting in a total of 1 μs.The RMSD analysis measures the average distances between the starting structure and each structure obtained over time.The root-mean-square fluctuation (RMS Fluctuation) was analyzed to identify flexible residues.

Identification of hot spot residues
The MM-PBSA method is widely used for binding free energy calculations from conformations extracted from the MD trajectory [54].In this study, MM-PBSA calculations were performed on MD simulations of the RBD-ACE2 complexes (6M0J and 6LZG).The representative conformations of each independent MD simulation were extracted through cluster analysis using the GROMACS gmx cluster tool.The frames were selected in such a way as to cover a wide range of trajectories and to sample different conformational spaces of the complex.Accordingly, representative frames of each of the determined clusters were extracted for every MD simulation using the Gromos algorithm [55] and the best optimal backbone RMSD cutoff was selected.To choose a reasonable cut-off value for each trajectory, we varied the backbone RMSD cutoff between 0.9 and 1.5 Å. Thereby, we found dominant clusters that captured > 70% of the trajectory for each MD simulation.Thus, for every dominant cluster (seven clusters/MD), we extracted one representative conformation, which was subjected to MM-PBSA calculations.
A free energy decomposition analysis was performed using MM-PBSA residue decomposition to retrieve the contribution energy of each amino acid represented on the binding interface of both RBD and ACE2.The total free energy and its individual components for each individual system (6M0J and 6LZG) were averaged and weighted based on the cluster populations, that is, a higher weight was assigned to the conformations extracted from more populated clusters.
We used the g_mmpbsa package for MM-PBSA computations [56].The dielectric relative constant ε was set to eight for proteins and 80 for water [57].In this approach, the binding free energy G bind between the protein and ligand/protein includes different energy terms and can be calculated as follows: where E MM is the gas-phase interaction energy, which is the sum of the van der Waals energy E vdw and the electrostatic energy E elec .G sol is the sum of the polar solvation energy G PB and nonpolar solvation energy G SA .The polar solvation energy was calculated using the Poisson Boltzmann (PB) approximation model, whereas the nonpolar solvation energy was estimated using the solvent-accessible surface area (SASA).The entropy contribution (− T S) was ignored in this study because of its high computational cost.After the calculation, the binding free energies were decomposed into each residue.We considered hot spots as residues with binding energy below − 1 (favorable) or above + 1 kcal.mol−1 (unfavorable).It is important to note that the more negative the energy, the more favorable is the contribution.In contrast, positive energy values indicate unfavorable interactions and a poor contribution to the complex.

Pocket estimation, druggability prediction and tracking along the MD simulations
The estimation of the RBD pockets was first performed on the static structures from the PDB.
Pockets were estimated with PockDrug tool based on the automated geometry-based method from Fpocket [36] independent of ligand proximity information.The ensemble of pockets was then characterized in terms of 19 physicochemical and geometrical properties and druggability score using PockDrug [41,42].PockDrug provides a prediction of the druggability score which, if greater than 50%, indicates a druggable pocket.
Secondly, pocket tracking was run on the sample of MD generated conformations of the RBD unbound and bound states.This allows the identification of new pockets during dynamics or changes in pocket properties between the PDB conformations or conformations observed during MD.This approach can also detect some separation or fusion of pockets resulting from both local and global alterations, including those occurring in transient and allosteric pockets.
To consider the flexibility of the RBD, we estimated the pockets from conformations obtained from the RBD MD simulations.A total of 1000 RBD conformations were sampled from MD simulations at regular intervals.RBD pockets estimated from this series of 1000 conformations were merged into a pocket set.

Clustering of pockets and identification of druggable binding sites
We performed unsupervised hierarchical classification using residue pocket similarity to identify binding sites frequently observed along MD simulations, corresponding to main pocket clusters.
The similarity of the RBD pockets can be quantified using binary distance based on common residues.A binary distance of 0 corresponds to two identical pockets in terms of residues.Hierarchical classification of the pockets was performed using the binary distance, Ward metric (ward.D2) [58], and R Hclust package [59].Dendrogram visualizations were performed using the Heatmaps2 package in R39 [51] to illustrate pocket similarity in terms of residues.The dendrogram lengths between the pockets and/or pocket clusters are proportional to their binary distances.
The resulting pocket classification allows the identification of the main clusters of pockets similar in terms of residues composition, corresponding to binding sites frequently observed along MD simulations.The dissimilarity between the pockets or pocket clusters increased with dendrogram lengths between pockets or clusters.Main pocket clusters were compared i) between pocket sets extracted from two PDB IDs (6LZG and 6M0J) to assess the impact of the initial PDB structure and ii) extracted from bound and unbound RBD conformations to assess the influence of ACE2 interaction.
The number of similar pockets within one cluster indicates the frequency of the appearance of its corresponding binding site.The flexibility of the binding site is described by the residue variability of its corresponding pockets.Analysis of pocket clusters was performed in terms of frequency, residue contributions and variability, RBD localization, and druggability scores.Finally, we combined these statistical and flexibility analyses to select binding sites based on the criteria of frequency and residue stability, localization in key regions of the RBD, accessibility, and druggability scores.
We also crossed the binding site analysis with the hot spot information obtained from the MM-PBSA analysis and evaluated the potential structural impact of mutations observed in the five SARS-CoV-2 variants of concern (Alpha, Beta, Gamma, Delta, and Omicron) on these selected hot spots and binding sites to identify the druggable binding sites that can also be targeted in mutated systems.

Protocol of identification of druggable binding sites and their flexibility
Combining supervised and unsupervised machine learning techniques with a traditional flexibility study through MD simulations and MM-PBSA energetic computation analysis yields a comprehensive protocol.This three-step protocol identify druggable binding sites frequently observed on the RBD protein and their key residues in terms of druggability and contribution to stability (Fig. S1).

RBD-ACE2 flexibility
Proteins are dynamic molecules with intrinsic flexibility and often undergo conformational changes upon partner binding.Therefore, it is essential to consider their dynamic behavior to predict which surface regions may be of interest.
RBD flexibility was studied with and without ACE2 by MD simulations.Mean Cα RMSDs have been found to stabilize for both bound proteins systems (6M0J and 6LZG) around 2.5 Å with fewer fluctuations (Fig. S3 and Fig. S4).These results highlight the stability of the SARS-CoV-2 RBD and ACE2 complex throughout all the simulations (Fig. S3A and Fig. S4A).The low mean Cα RMSD values indicated that the unbound RBD was stable (Fig. S3D and S4D).No major flexibility variation was observed between the bound and unbound RBD (Fig. S3C, Fig. S3D, Fig. S4C and Fig. S4D), considering both unbound and bound RBD simulations, have not been reported in other studies.To examine the flexibility and local changes in the complex, Cα RMSF versus the residue number of both RBD systems were investigated (Fig. S5).RMSF analysis revealed that the RBM region flexibility increased to a greater extent in the unbound RBD structure than in the bound RBD and increased from approximately 1.5 to 2.0 Å.This was not surprising, as the RBM mediates contact with ACE2, which tends to be more stable when the complex is formed (Fig. S5A and Fig. S5C) than in unbound RBD (Fig. S5B and Fig. S5D).
Free energy decomposition analysis was also performed to obtain a detailed insight into the interactions between each residue in the binding interface of the two proteins.The binding interaction of each residue included four terms: molecular mechanics contribution, polar contribution, apolar contribution, and total energy contribution.The individual components of each residue were averaged for each system (6M0J and 6LZG).The hot spot residues on the RBD surface were essentially the same for the core interactions in 6M0J and 6LZG (Fig. 2).
Notably, residues F486 and Y489 inserted into a hydrophobic pocket on the surface of ACE2 formed by residues including T27, F28, L79, and M82.Thus, the presence of aromatic amino acids in the pocket may provide an additional binding force through π-stacking interactions (Fig. 3A) and explain the favorable ΔE MM , which is mainly due to the favorable contribution of ΔE vdw .The residue K417 showed a significant ΔE polar contribution that reaches 11.0 kcal.mol−1 .However, the fact that this residue formed a salt bridge and a hydrogen bond with residue D30 led to a more negative ΔΕ MM contribution of − 14.0 kcal.mol−1 (Fig. 3B) and, consequently, it counterbalanced the unfavorable polar solvation energy.Moreover,  ACE2 residue Y41 interacted with Q498 and Y505 through van der Waals interactions and formed two hydrogen bonds with residues T500 and N501 through its side chain (Fig. 3C).Although ACE2 residue H34 was not directly involved in any intermolecular hydrogen bond with RBD, it nonetheless provided favorable polar contacts with the side chain of L455 of the virus protein (Fig. 3D).L455 also formed intermolecular van der Waals interaction with D30 (Fig. 3D).Interestingly, E484 displayed a large positive E MM contribution that reaches 10.0 kcal.mol−1 (Fig. 2).Thus, E484 free energy contribution (E TOT ) was found to be disfavoring complex binding and formation.Two negatively charged residues, E35 and E75, surrounded E484, which creates long-range opposing repulsive charges and explains the unfavorable contribution of this residue to the complex (Fig. 3E).Altogether, our analysis shows that among the residues comprising the ACE2-RBD interface, some critical amino acids are known as the evolutionary signature of RBD, and major hot spots of both proteins contribute favorably to complex stability and formation [27].In conclusion, eight favorable hot spot residues (K417, L455, F456, F486, Y489, T500, N501, and Y505) and one unfavorable hot spot residue (E484) have been identified as important SARS-CoV-2 residues that are critical for ACE2 binding.
These results are in good agreement with other studies in which the authors identified 10-13 hot spot residues using similar bioinformatics methods on different RBD structures, that is, X-ray structures or models built by homology modelling, from SARS-CoV-1 or SARS-CoV-2 [29][30][31][64][65][66].

Tracking of RBD pockets during the MD simulations
We first examined pockets that were extracted from the unbound RBD system from static PDB.From the initial 6M0J and 6LZG PDB IDs, we estimated four and seven pockets on the unbound RBD, respectively, using the PockDrug program [41,42].This is consistent with the observation of a variable number of pockets estimated by Trigueiro-Louro et al. on the same PDB structures [32].
We secondly examined pockets that were extracted from the unbound RBD systems considering their flexibility.We extracted two sets of pockets from two datasets of 1000 conformations sampled from each of the unbound RBD structures 6M0J and 6LZG during the MD simulations.The average number of RBD pockets by conformation in these two sets were highly similar (5.1 versus 5.8) despite the different number of pockets estimated on the two static PDBs.The frequency of druggable pockets (50.4% versus 54.0%) and the frequency of pockets located at the interface (31.0% versus 25.0%) were also highly similar within both 6M0J and 6LZG MD trajectories.The consistency of the results obtained from the MD simulations, regardless of the initial structure, shows the importance of integrating the protein and pocket structural flexibility for binding site identification and selection.
The results were consistent for both the 6M0J and 6LZG PDBs; therefore, we decided to focus only on the 6M0J MD trajectory.The average number of 5.1 pockets in the RBD conformation (from one to a maximum of nine pockets) results in a total set of 5065 pockets.Of these pockets, 50.4% were predicted to be druggable (druggability score ≥ 50%), and 40.6% of these pockets were highly druggable (druggability score ≥ 75%).Nearly one-third of the pockets (31.0%) included at least one of the 17 main RBM interacting residues and were located at the RBD-ACE2 interface.We also combined pocket analysis with the hot spot results obtained from the MM-PBSA study to identify the druggable binding sites containing these hot spots.Interestingly, almost all of these interface pockets (98.5%) included at least one of the nine hot spot residues (identified in Section 3.3).This is consistent with the fact that hot spot residues involved in energetic interactions have physicochemical properties and locations that promote pocket accessibility.
To determine how ACE2 affects RBD conformation and binding regions, we subsequently examined pockets extracted from the bound RBD system.The same pocket tracking protocol was applied to 1000 RBD conformations extracted from MD simulations of 6M0J in the RBD-ACE2 complex form.Therefore, a set of 4428 pockets with a pocket druggability frequency of 50.1% was extracted from the 6M0J RBD-ACE2 complex trajectory.This is close to the pocket druggability frequency of 50.4% of the 5065 pockets extracted from the unbound RBD trajectory.Fewer pockets (approximately 640 pockets) were observed in the RBM region of bound RBD.This can be explained by the presence of ACE2, which limits pocket formation in the RBM region.These results clearly show that consideration of the flexibility of the structure leads to the identification of comparable and replicable RBD pockets when utilizing isolated or complex forms, with the exception of the area in contact with ACE2.Comparable steady results were also obtained using 6LZG MD simulations (not shown).

RBD pockets clustering based on pocket residue similarity
After quantifying the druggable pockets on the two different systems, we performed a hierarchical classification of the pockets extracted from the 6M0J unbound RBD trajectory based on residue similarity.This classification of pockets based on residue composition allows the identification of clusters of pockets with common residues, corresponding to binding sites frequently observed along the MD simulations.Furthermore, the variability in residues within a pocket cluster illustrates the residue flexibility of the corresponding binding site.Highly similar classification results were obtained for pocket sets extracted from MD simulations of the 6M0J and 6LZG PDB.Here, the classification obtained for the 5065 pockets extracted from the 6M0J unbound RBD trajectory is presented.This classification resulted in the identification of eight main pocket clusters, enumerated as clusters I to VIII (Fig. 4).We analyzed these main clusters in terms of frequency, residue localization, stability, and druggability scores.
These eight clusters were observed in at least one-third of the 1000 conformations, and did not correspond to rare pockets (Table S2).They were regularly observed during the MD simulations.The least frequent (cluster VI) was observed in 32.3% of the conformations, whereas the most frequent (cluster I) was observed in approximately 90% of the conformations.
Moreover, these eight clusters were well-characterized in terms of druggability (Fig. 4 and Table S2).Four clusters were mainly druggable (II, III, V, VII, including 68.9-99.2% of druggable pockets), whereas four others identified clusters included a weak or moderate part of druggable pockets (I, IV, VI, VIII including 0.2-27.9% of druggable pockets).Interestingly, each cluster was defined by specific residues that constituted pockets that were well differentiated and localized in different regions of the RBD (Fig. 4).Clusters I, II, III, IV, and VI regrouped very similar pockets, as indicated by the small distances between the pockets within each cluster.Clusters V and VIII showed less pocket similarity in terms of residues, and each could be split into two main homogeneous sub-clusters.Cluster VII was the most variable and included three main subclusters (Fig. 4).

Selection of RBD druggable binding sites
A more detailed analysis was performed to study the properties and localization of the eight main clusters resulting from RBD pocket classification (Fig. 4) in order to extract druggable binding sites of interest.Six clusters (II to VII) were distant from the RBD-ACE2 interface, and two clusters (I and VIII) were located close to the RDB-ACE2 interface.
Two clusters (IV and VI) were moderately observed in the 1000 RBD conformations (48.8% and 32.3%) and were weakly druggable (27.9% and 7.4%, respectively).The cluster pockets corresponded to two exposed protein cavities in both the closed and open states of the spike protein trimer.These cavities did not establish contact with the ACE2 interface or another RBD monomer.Consequently, they did not form valuable target sites.The other four clusters (II, III, V, and VII) were observed in approximately half or more of the 1000 conformations and included mainly druggable pockets (> 68%).Cluster II was frequent (47.9%) and mainly druggable (99.2%).However, its pockets were located between the two spike protein interfaces in both states and are thus buried.Thus, this cluster cannot be targeted by therapeutic molecules.Cluster V was highly frequent (87.3%) and most of its pockets were druggable (93.0%).The corresponding pockets were highlighted as sites of interest in a previous study [33].However, this cluster included N343, which is covalently linked to N-acetylglucosamine in the RBD.Notably, Nacetylglucosamines were not used in the MD simulations in this study.Therefore, even without this information, our protocol could identify a region matching an N-linked glycosylation site.We do not consider this cluster of pockets a priority to be targeted by therapeutic molecules due to the presence of N-acetylglucosamine.
Cluster III was observed in half of the conformations (52.4%) and mainly included druggable pockets (85.9%) (Table S2).The pockets were located on the interface region between two RBD proteins in the closed state of the spike protein trimer (Fig. S6).This cluster regroups similar pockets (with 17 residues observed in more than 15% of its pockets) (Table S2).It is noteworthy that Carino et al. highlighted one potential druggable pocket from the static spike protein trimer (PDB ID:6VSB), but it corresponded only to a partial sub-pocket of ten residues of our cluster of pockets [33].Therefore, it may be of interest to design molecules that bind this cluster of 17 residue pockets to stabilize the closed state and avoid the activation of the spike protein trimer.This frequent and druggable cluster, named site 1 (Table 2), is a potential site of interest for the design of new therapeutic molecules.

Table 2
The three selected potential binding sites with their residue composition (only residue observed in more than 15% of their associated pockets), their occurrence on 1000 RBD conformations and the percentage of druggable pockets within each site.Hot spot residues are in bold.Cluster VII was frequent (70.4%) and included pockets mainly structurally located at the bottom end of the RBD and partially buried in the spike protein trimer.However, this cluster regroups pockets that vary in terms of residues; thus, cluster VII can be decomposed into four main pocket sub-clusters.Interestingly, the most frequent sub-cluster, named site 2, which was also observed in 35% of the conformations, corresponded to homogeneous and druggable pockets (87.8%) (Table 2).The pockets point toward another spike protein monomer and are consequently partially buried.This subcluster coincides with the pocket identified by Toelzer et al., in which linoleic acid is bound [67].Linoleic acid appears to stabilize the spike protein in its closed state and induces a reduced ACE2 interaction in vitro.Therefore, our protocol was successfully identified and characterized in terms of druggability and residue stability in this specific region, called site 2, which corresponds to the experimentally known binding site entrance of a small molecule.
We also extracted two frequent clusters, I and VIII, which were located close to the RBD-ACE2 interface.Cluster I was quasi permanent (86.6%), although it displayed no druggability (0.2%).It contains nearby pockets located on the RBD loop between residues T470 and P491 and thus excluded the association of every RBM region.Moreover, cluster I pockets were exposed to the solvent in both the closed and open states of the spike protein.Due to its location and undruggable properties (Table S2), cluster I is not suitable for targeting therapeutic molecules.
Our results showed that cluster VIII was frequent (78.0%) and partially druggable (21.7%).It is relatively variable and regroups three main subclusters located at the ACE2 interface.The most frequent sub-cluster, named site 3, was observed in 35.0% of the conformations and included 13.1% of druggable pockets (Table 2).It is centralized at the interface and includes three hot spots, K417, N501, and Y505, among the 18 residues constituting this pocket.Consequently, site 3 is particularly appealing as a targeted therapeutic molecule.
These sites are observed in more than 30% of the conformations and are associated with an average druggability score range of 13.1-87.7%.The pockets of these three sites included 20 residues on average (Table 2).The contribution of residues to their pocket cluster, as well as the druggability score, was made available for the three selected sites in Fig. S7.Along with these results, we have shown that the two most frequent sites, 1 and 2, are distant from the RBM, even though they display significant druggable scores.In contrast, site 3 has been classified as less frequent yet druggable.Compared to the other sites, an important feature of site 3 is its location, which is close to the RBD-ACE2 interface, and it includes three highly contributing hot spots: K417, N501, and Y505.Dokainish et al. identified several pockets with residues overlapping with those of binding site 3 [40].This site was targeted by nilotinib.The latter study, as many others, validates our protocol because we identified the same site.This site includes four hot spot residues, suggesting the significance of targeting it to disrupt the interaction between RDD and ACE2.
Ten representative pockets were selected for each of the three sites to illustrate the binding site and their flexibility (pocket PDB, residues, and global properties) that could be used for the design of novel compounds that target the spike protein.Table S3 provides descriptions of physicochemical, geometrical, and druggability scores.The ten selected pockets and their corresponding RBD conformations are freely available in the PDB format (https://data.mendeley.com/datasets/xhnjgfbgzr/1).

Potential impact of mutations on RBD hot spot and binding sites
It is known that the mutations associated with five variants of concern, Alpha, Beta, Gamma, Delta, and Omicron, cause a substantial increase in their transmissibility, virulence, and antigenic escape ability.During the evolution of the SARS-CoV-2 genome, RBD mutations had an impact on spike protein affinity for ACE2 to improve virus transmissibility and/or decrease its affinity for antibodies to improve its antigenic escape ability.The Alpha variant includes only one mutated residue, N501Y, whereas Beta, Gamma, and Omicron variants include both N501Y and K417N/T and E484K/A mutated residues.The Omicron variant also has 12 supplementary mutated residues (G339D, S371L, S373P, S375F, N440K, G446S, S477N, T478K, Q493R, G496S, Q498R, and Y505H).The Delta variant includes two specific mutations (L452R and T478K).
It is of interest to analyze the potential structural impact of RBD mutations of these variants of concern, notably, relative to the nine hot spots we have identified.These mutations directly impacted sites 1 and 3 and three hot spot residues, which correspond to the most frequent mutations, from site 3 (Table 3).
Two mutations (N501 and K417) corresponded to hot spots and played an important role in the stabilization of the interaction between RDB and ACE2.This confirms that these mutations can directly affect the affinity of the RBD for ACE2 and the interactions with antibodies targeting the RBM.The mutation N501Y is observed in Alpha, Beta, Gamma, and Omicron variants, suggesting that it provides SARS-CoV-2 with a selective advantage.E484 and K417 mutations are present in the N501Y mutation in the Beta, Gamma, and Omicron variants.The N501Y mutation counterbalances the decrease in ACE2 affinity due to mutations in K417 and E484, whereas the latter tends to enhance the ability of the variant to escape from neutralizing antibodies.This explains why these three mutations were selected simultaneously during the SARS-CoV-2 evolution.More specifically, the N501 hot spot is stabilized by a hydrogen bond with residue Y41 and contributes favorably to the binding free energy of the complex (Fig. 1D and Figs.2A and 3C).It is surrounded by a K353 hydrophobic alkyl chain and Y41 benzene ring.Understandably, a tyrosine switch is a better choice because a more favorable interaction can be made with the hydrophobic pocket, particularly π-π stacking with residue Y41.Thus, the mutation of N501 in Y501 can explain the enhanced affinity of the RBD to the ACE2 receptor [68].
E484 mutations are present with N501Y and K417 mutations in the Beta, Gamma, and Omicron variants.In the Beta and Gamma variants, the E484K mutation swaps the charge of the side chain.It is a considerable switch from negatively charged glutamic acid (−) to a positively charged lysine (+).In the Omicron variant, the E484A mutation also induces loss of the negative charge carried by the E residue.It is important to note that E484 is located on the RBD loop (amino acids 470-490) and is enclosed by several negatively charged residues, E35 and E75, from the ACE2 receptor.The position of three neighboring glutamic acids could be unfavorable due to repulsive forces; thus, introducing a positively charged residue or a small neutral residue might create a more favorable interaction between the two proteins.This may explain the unfavorable free energy decomposition of E484 during MM-PBSA analysis (Fig. 2 A).Similar to N501, E484 is a critical epitope residue for SARS-CoV-2 neutralizing antibodies.Therefore, charge change can also be a method to alter the electrostatic complementarity of known antibodies binding to this region, leading to better virus adaptation [69].
Concerning the K417 mutations, some studies showed that they may be responsible for increased binding with ACE2 and a decreased affinity for SARS-CoV-2 antibodies when combined with N501Y and E484K [70].K417 forms a steady salt-bridge with residue D30 and has a favorable energy contribution to the complex (Figs.1B, 2 A, and 3B).Consequently, abolishing this strong interaction decreases the binding affinity of the RBD and ACE2 complexes.However, the K417 mutation is only present with the N501 and E484 mutations, which may compensate for the loss of affinity with the ACE2 receptor by forming new favorable interactions.Additionally, a recent study showed that K417 is another critical epitope that forms strong salt bridges with SARS-CoV-2 antibody residues [68].Thus, the reason behind the K417 abrupt change to asparagine is that viral adaptation is vital and overrides the binding affinity.In fact, with this triple mutation, SARS-CoV-2 may be harder to handle and can easily escape antibodies.The explanation for the K417 change to N or T may be a viral adaptation to decrease the affinity of antibodies for spike protein and evade the immune system.Regardless of the reasons for the K417 mutation, the replacement of a residue with a positive charge by residues with a hydrophilic side chain should be considered for drug design.
These mutations directly impacted three hot spot residues, which corresponded to the most frequent mutations.Thus, we analyzed the potential structural impact of RBD mutations of these variants on the three selected binding sites.These mutations directly affected sites 1 and 3. Additionally, site 3 would be greatly impacted by three hot spot residues, which correspond to the most common mutations in the variants of concern (Table 3).
For the Delta variant, L452 and T478 mutations did not occur in residues forming the highlighted sites and were far from the hot spot residues, indicating that inhibitors targeting these sites would not require adaptation to treat this variant.
The Omicron variant was found to be highly mutated.For example, 15 mutated residues were found in the RBD region alone (G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H).Only the Omicron S375F mutation has been observed to impact site 1 (Fig. 6A).Currently, there is no available scientific information on how this mutation affects the SARS-CoV-2 life cycle.S375 was exposed to the solvent in both the open and closed states of the spike protein trimer and did not form a non-covalent bond with any other residue.As a result, it does not appear to play a special role in the spike protein trimer action.Because the alcohol group in the spike residue was replaced by an aromatic cycle in the F residue, the S375F mutation only slightly altered the shape and hydrophobic properties of site 1.Therefore, site 1 seems to be relatively well-conserved among the variants of concern.In summary, site 3 is located on the RBD-ACE2 interaction interface and is formed by several hot spot residues; therefore, it may be of interest to focus on this region with the aim of targeting it with inhibitory molecules.tofocus on this region with the aim of targeting it with inhibitory molecules.However, because of several mutations that occur in different variants of concern, site 3 is difficult to block, in contrast to sites 1 and 2.Even though site 1 seems to be weakly affected by only one mutation, site 2 is currently not affected by any mutation.These two sites can be targeted by inhibitors that would be effective for all currently known variants.
Sites 1 and 2 are contiguous on the RBD surface.Interestingly, a neutralizing antibody isolated from convalescent Covid-19 patients and named CR3022, can bind to the RBD surface corresponding to sites 2 and 3 [71].This interaction occurs only when two of the three spike proteins in the spike protein trimer are in an open state.Therefore, these sites are accessible to the inhibitory molecules.In the case of mutations that prevent the binding of neutralizing antibodies, targeting these sites by inhibitory molecules seems to be a promising therapeutic approach; notably, site 2 is currently not affected by any mutation.

Conclusions
In this study, we analyzed the dynamic behavior of the unbound RBD in complex with the ACE2 protein to identify binding sites that can be targeted by inhibitory molecules.We proposed a protocol combining MD simulations, hot spot identification, pocket tracking along the simulations, and pocket druggability prediction using a supervised method.Then, an unsupervised machine learning analysis was applied to cluster similar RBD pockets in terms of residue composition.This protocol allows the identification of druggable binding sites frequently observed along the MD simulations, the characterization of their residue flexibility and of key residues contributing to the druggability.
The stability of these pocket clusters was verified on different PDBs, in apo and holo forms, which allowed us to select the most pertinent druggable binding sites.Based on their RBD localization and druggability assessments, these three sites seem to be particularly promising.The potential effect of mutations in the variants of concern on these three binding sites was investigated.
Site 3 is located at the RBD-ACE2 protein interaction region and is formed by four hot spot residues.It is therefore an interesting target to disrupt the interaction between the spike protein and ACE2 protein and, consequently, to prohibit SARS-CoV-2 entry into the cell.However, several mutations of residues forming this site have been observed in SARS-CoV-2 variants.This suggests that if an inhibitory molecule can be designed against this site, it should be efficient for only a limited number of variants.Site 2, in which linoleic acid interacts to lock the spike protein in closed form, is relevant to the target.It is highly druggable, undergoes only one mutation in the Omicron variant, and is consequently an interesting site for targeting.Site 1 is highly druggable, accessible and no mutations were observed at this site.To our knowledge, this is the first time it has been characterized using in silico methods.In the spike protein trimer, we can observe that three sites 1 observed on each spike protein monomer are near in space (Fig. S6).This is also a promising target region.This suggests that the design of molecules able to bind to at least two sites 1 located within two RBD monomers may prohibit the transition between the inactive closed form and the active open form of the spike protein.
In summary, our combined protocol provides new insights and highlight opportunities on three binding sites for the development of inhibitors of the RBD of the spike protein.

Fig. 2 .
Fig. 2. (A)The energy components of the nine residues on the RBD that contribute significantly to the ΔG of binding with ACE2.The average binding energy components and standard deviations are calculated from the MD simulations of the RBD-ACE2 complex (PDB IDs 6M0J and 6LZG).(B) RBD displayed in cartoon representation.The hot spot residues are indicated by cyan sticks.

Fig. 4 .
Fig. 4. Hierarchical classification of 5065 RBD pockets extracted from 1000 conformations of unbound RBD (6M0J).Each column corresponds to a pocket.First-color bars are colored according to four levels of druggability scores: respectively in black, grey, pink, and garnet for the non-druggable [0.00, 0.25], less druggable [0.25, 0.50], moderately druggable [0.50, 0.75] and highly druggable [0.75, 1.00] pockets, as predicted using PockDrug [41].Second-color bars are colored based on the eight main clusters of pockets (noted I to VIII).At the bottom, the residues are ordered according to the numbering of the protein sequence and indicated in black when involved in the pocket.The three blue rectangles indicate the three sub-cluster of pockets corresponding to selected sites of interest.(Forinterpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5 .
Fig. 5. Representation of the three sites of interest selected on the RBD surface (site 1 represents the trimerization pocket, site 2 indicates a highly druggable pocket capable of holding a drug candidate, and site 3 corresponds to the interface region pocket).

Table 1
Average binding free energy and standard deviations in the MD simulations of the RBD-ACE2 complex for PDB IDs 6M0J and 6LZG.

Table 3
Mutations in RBD observed in the SARS-CoV-2 variants of concern.The hot spot residues identified by the MM-PBSA analyze are in bolt.