DINC-COVID: A webserver for ensemble docking with flexible SARS-CoV-2 proteins

An unprecedented research effort has been undertaken in response to the ongoing COVID-19 pandemic. This has included the determination of hundreds of crystallographic structures of SARS-CoV-2 proteins, and numerous virtual screening projects searching large compound libraries for potential drug inhibitors. Unfortunately, these initiatives have had very limited success in producing effective inhibitors against SARS-CoV-2 proteins. A reason might be an often overlooked factor in these computational efforts: receptor flexibility. To address this issue we have implemented a computational tool for ensemble docking with SARS-CoV-2 proteins. We have extracted representative ensembles of protein conformations from the Protein Data Bank and from in silico molecular dynamics simulations. Twelve pre-computed ensembles of SARS-CoV-2 protein conformations have now been made available for ensemble docking via a user-friendly webserver called DINC-COVID (dinc-covid.kavrakilab.org). We have validated DINC-COVID using data on tested inhibitors of two SARS-CoV-2 proteins, obtaining good correlations between docking-derived binding energies and experimentally-determined binding affinities. Some of the best results have been obtained on a dataset of large ligands resolved via room temperature crystallography, and therefore capturing alternative receptor conformations. In addition, we have shown that the ensembles available in DINC-COVID capture different ranges of receptor flexibility, and that this diversity is useful in finding alternative binding modes of ligands. Overall, our work highlights the importance of accounting for receptor flexibility in docking studies, and provides a platform for the identification of new inhibitors against SARS-CoV-2 proteins.


Introduction
The respiratory disease COVID-19, caused by the novel coronavirus SARS-CoV-2, went from an outbreak to a pandemic in just a few months [1]. In response, there have been unprecedented global efforts to develop vaccines and effective treatments. Several effective vaccines have been validated and used in massive vaccination campaigns [2], particularly in developed countries. However, the need for pharmacological treatments for infected patients persists due to unequal vaccination coverage across the globe [3,4] and to the rise of more virulent variants that can cause symptoms even in fully-vaccinated individuals [5,6].
Among potential pharmacological targets, proteins involved in the viral replication have been used in several computational studies focused on drug design, drug repurposing and virtual screening [7,8]. An impressive number of SARS-CoV-2 protein structures have been solved in a short period of time, but computational studies targeting these proteins have been mostly limited to the use of a single experimental structure [9][10][11][12][13][14][15][16]. Although this approach is expected to correctly reproduce similar experimental structural data, it tends to fail in exploring the binding of ligands with diverse chemical features (e.g., large ligands or ligands with alternative binding modes), as it does not account for the inherent flexibility of proteins in solution [17][18][19].
One example of the importance of protein malleability on SARS-CoV-2 drug screening involves the Main protease (Mpro). Although Mpro has been, so far, one of the most explored SARS-CoV-2 targets in computational studies, there remain numerous open questions for the design of effective inhibitors. Indeed, the malleability of the Mpro active-site cavity remains the greatest challenge in the development of effective inhibitors. By collecting X-ray data from ligands bound to Mpro at room temperature, Kneller et al. have demonstrated experimentally that Mpro has the ability to substantially distort its shape in response to binding [18]. Such malleability allows Mpro to accommodate a larger diversity of physico-chemical features (e.g., chemical groups, size, charge distribution) in ligands.
Accounting for protein flexibility adds complexity and increases the computational cost of molecular docking studies. Therefore, explicitly sampling full receptor flexibility while also sampling alternative ligand conformations is generally unfeasible. Various strategies have been proposed to make this problem computationally tractable [20]. A popular solution involves explicitly sampling only selected parts of the receptor, as in selective docking with flexible binding-site residues. Another one, known as ensemble docking, only implicitly accounts for full receptor flexibility [21]: a separate sampling method is used to explore protein flexibility, and a set of representative conformations is extracted to create an ensemble for docking [22]; ligand sampling is then conducted as in a regular molecular docking job. By docking the ligand against all conformations in the ensemble, rather than a single conformation, ensemble docking implicitly accounts for receptor flexibility [23,24].
Note that each of these strategies suffers from its own limitations and that successful predictions usually require significant knowledge about the system of interest and the required methodologies. On one hand, selective docking requires knowledge of which residues should be prioritized for explicit sampling during docking. In addition, execution time grows exponentially with the number of flexible bonds, which might prevent the exploration of complex binding sites or large ligands. On the other hand, ensemble docking requires expertise on tools that can sample protein conformations (e.g., molecular dynamics, Langevin dynamics, Monte Carlo simulations, or coarse-grained simulations [25,26]). These methods can produce a huge number of conformations, in turn requiring user expertise on methods that can select representative conformations (e.g., dimensionality reduction, clustering, or free energy estimation) [22,27]. All these tasks are very time-consuming, taking days to weeks to implement and execute, depending on available computational resources [22,23].
Despite these challenges, recent studies have highlighted the importance of considering multiple receptor conformations in molecular docking studies [22,24,28], and even presented preliminary results on the use of ensemble docking to screen for drug inhibitors against SARS-CoV-2 proteins [23,29]. However, the resources and expertise required for conducting an ensemble docking study still prevents its use by most researchers interested in testing new drugs or natural products against SARS-CoV-2 proteins.
To bridge this gap, we present DINC-COVID, a user-friendly webserver for ensemble docking of small molecules and peptides to SARS-CoV-2 proteins. The sampling of binding modes is performed with DINC, a parallelized meta-docking approach that has been shown to outperform conventional docking tools on several challenging datasets, especially for large or flexible ligands (e.g., peptides or peptidomimetics) [30]. The proteins currently available for docking through DINC-COVID are the Main protease (Mpro) [9][10][11][12][13][14][15][16], the Papain-like protease (PLpro) [31], and the RNA-dependent RNA polymerase (RdRp) [32][33][34]. For Mpro, in addition to the catalytic site, we allow for predictions targeting an allosteric site [35]. Note that the ensembles we have pre-computed cover different ranges of protein flexibility, for each targeted binding site. This initial selection of SARS-CoV-2 proteins and binding sites was performed based on the availability of crystallographic structures in the Protein Data Bank (PDB) and the potential importance for drug discovery. Nonetheless, our team is continuously creating additional ensembles, covering other targets of interest, which will be later added to the DINC-COVID webserver.
We have carried out a proof-of-concept validation of DINC-COVID using datasets of experimentally-determined inhibitors for SARS-CoV-2 proteins. As such data are still scarce, we have focused our validation on a small dataset of PLpro inhibitors [31] and three small datasets of Mpro inhibitors [18,36,37]. For all datasets, we have obtained good correlations between predicted binding energies and experimentally-determined IC 50 values. We have also compared DINC-COVID against two other online servers targeting SARS-CoV-2 proteins, namely the COVID-19 Docking Server [38], and the DockThor-VS webserver [39]. Note that DINC-COVID is the only webserver that performs the simultaneous docking of a ligand against multiple receptor conformations and automatically aggregates the results (as suggested by the term ensemble docking), while the two other webservers provide more docking targets. We provide a detailed discussion on how our results are affected by choices related to method settings, dataset composition and statistical analysis. Finally, we provide further insight on the range of receptor conformational flexibility captured by DINC-COVID predictions, depending on the data/methodology used to generate the input ensemble and on the scoring function used to rank the output binding modes.

Overview of DINC-COVID ensemble docking strategy
To enable fast predictions of protein-ligand binding modes while accounting for receptor flexibility, we have decoupled the steps required for ensemble generation from those directly related to the ensemble docking procedure. For each binding target of interest, three ensembles have been pre-computed and stored on our webserver (see Fig. 1). Therefore, all ensembles mentioned in this manuscript are available for docking. After accessing the DINC-COVID webserver, users can choose a binding target, select one of the available ensembles, and upload a ligand of interest (e.g., a drug-like ligand or a peptide). These files are then used as input for ensemble docking, which involves the parallelized meta-docking approach DINC [30]. All generated binding modes are rescored and ranked using three scoring functions: AutoDock Vina [40] (a.k.a. Vina), AutoDock 4 [41] (a.k.a., AD4) and Vinardo [42]. Finally, the top-scoring binding modes (for each scoring function) are returned to the user, reflecting the flexibility of both the ligand and the receptor.

Pre-computed receptor ensembles
Four target binding sites of SARS-CoV-2 proteins are currently available through the DINC-COVID webserver: catalytic site and allosteric site of Main protease (Mpro), catalytic site of Papain-like protease (PLpro), and catalytic site of RNA-dependent RNA polymerase (RdRp) (see Fig. 2). For each target binding site, we first compiled three ensembles of conformations (see Appendix A for more information): one ensemble containing crystal structures, and two ensembles containing conformations extracted from in silico molecular dynamics (MD) simulations.
We performed the MD simulations with the GROMACS 2019 package [43], using two different force fields: CHARMM36 [44] and GRO-MOS53a6 [45]. With each force field we ran five independent MD simulations of 200 ns, for a total of 1 μs. All simulations were performed with full-atom representation and explicit water molecules. Note that CHARMM36 simulations used TIP3 water models, while GROMOS53a6 used SPC water models. In addition, ions Na+ and Cl-were added at concentration of 0.15 M to neutralize the net charge of the system. Each system was minimized through the steepest descent algorithm, followed by two equilibration steps of 1 ns each, using NVT and NPT ensembles, respectively. Positions of protein atoms were restrained during equilibration. During the simulations, long-range electrostatics were modelled with the Particle Mesh Ewald (PME) method [46], temperature coupling was set at 310.15 K using the V-rescale thermostat [47] and the Parrinello-Rahman barostat [48] with a reference pressure of 1 bar; a compressibility of 4.5 × 10 − 5 bar − 1 was applied for pressure control. Covalent bonds were constrained to their equilibrium length by the LINCS algorithm [49]. The integration steps of all simulations were set to 2 fs. The stability of each structure during the simulation time was assessed through root-mean-square deviation (RMSD) calculations using the "gmx rms" algorithm. Eventually, we extracted a set of roughly 100, 000 conformations from each MD simulation with MDtraj [50]. These come in addition to the 179 crystallographic structures (156 for Mpro, 12 for PLpro, and 11 for RdRp) we extracted from the Protein Data Bank (PDB).
To create the final ensembles we implemented a data-driven protocol that could extract representative conformations using algorithms from the scikit-learn package [51]. Briefly, minimal distances between all pairs of amino acids in the targeted binding site (i.e., distances between the closest pair of heavy atoms belonging to two residues) were used as features for a Principal Component Analysis (PCA). The first principal components accounting for around 80% of variance in the data were used to project each set of conformations into a lower-dimensional space. In this space, the Elbow method of scikit-learn was used to determine the ideal number of clusters (K). Representative members of each cluster were identified with K-means, and extracted to build the final ensembles. Although other strategies could have been adopted, this is a reasonable protocol to build a representative ensemble of conformations, which has produced good results in our validation experiments (see Results section). Finally, all selected structures were protonated at pH 7.0 using the PROPKA algorithm [52] at the PDB2PQR server [53]. In summary, for each target binding site, we produced three ensembles: one based on crystal structures (hereafter called Crystal ensemble) and two based on MD simulations with different force fields (hereafter called Charmm ensemble and Gromos ensemble). Note that for each receptor, these three ensembles capture different ranges of receptor flexibility and are therefore all useful for ensemble docking (see Section 3.3).

Live webserver docking procedure
DINC-COVID uses parallelization to speed up both the sampling and the scoring of protein-ligand binding modes (see Fig. 3). Once a ligand structure and an ensemble of receptor conformations have been selected as input, the algorithm triggers multiple parallel docking jobs with the fast docking method Vina. Each independent job starts with a distinct, randomized conformation of the ligand and one of the receptor conformations from the selected ensemble. This first batch of docking jobs produces a diverse set of binding modes for the ligand against a single receptor conformation. This process is then repeated for each of the receptor conformations available in the ensemble. Once all docking jobs are completed, the rescoring phase starts. In this phase, all binding modes predicted for each receptor conformation are rescored using three popular scoring functions: Vina, Vinardo and AD4. At the end of the whole procedure, a user-specified number of top scoring conformations is provided as output, for each scoring function. These results include both alternative conformations of the ligand and alternative conformations of the receptor.

DINC-COVID webserver interface and infrastructure
The DINC-COVID ensemble docking procedure is made available have been pre-computed: different shades represent different conformations within each ensemble. After uploading a ligand of interest and selecting one of the ensembles, the user can submit a job that will execute the live steps of ensemble docking, scoring and ranking, eventually returning the best results to the user. BS, binding site. through an intuitive webserver interface (see Fig. 4). It allows users to upload a ligand of interest and select the targeted receptor ensemble. Optional ligand preparation (e.g., adding hydrogens and charges) is also available. Note that we plan to allow for batch submission in future work. After submission and execution, users can visualize the selected binding modes through the viewer embedded in the "Results" page. If they want to perform offline analyses, users can download all the results, which include top-scoring binding modes in pdb format, as well as a text file listing the selected conformations and their binding energies with respect to all scoring functions. More details are available at dinc-covid. kavrakilab.org/method/.
The DINC-COVID webserver is implemented using Docker [54], allowing it to run with its own dependencies on any machine with Docker installed. The main backend is implemented with Django [55]. Submitted jobs are managed by Celery [56], a distributed task queue. The webserver is currently hosted on a 16-cores virtual machine in the Owl Research Infrastructure Open Nebula (ORION) VM Pool on Rice University Campus. DINC-COVID currently uses 16 parallel threads, and our virtual infrastructure allows for future expansion.

Proof-of-concept validation
As in any virtual screening study, identifying better inhibitors for SARS-CoV-2 proteins requires ranking sets of ligands with respect to their estimated binding affinity with a given receptor. To validate the ability of our method to accurately rank ligands, we computed correlations between docking-derived binding energies and the natural logarithm of experimentally-determined half-maximal inhibitory concentration, ln(IC 50 ); these correlations were evaluated using Pearson's r and Spearman's ρ. As experimental data, we used ln(IC 50 ) instead of IC 50 because it is expected to better correlate with predicted binding energies [57][58][59][60]. As DINC-COVID-predicted data, for each ligand and each scoring function, we used the binding energy predicted for the top-scoring binding mode of this ligand according to this scoring function. Note that our goal was to assess the overall ranking accuracy of our ensemble docking results, as opposed to assessing the accuracy of individual binding energy predictions.
To compare DINC-COVID with other methods, we performed the same experiments with the COVID-19 Docking Server (hereafter referred to as COVID-19-DS) [38] and the DockThor-VS webserver [39,61]. For COVID-19-DS we selected Mpro or PLpro as the "nCoV Protein Target", with the recommended exhaustiveness of 8. Although the top 10 models are returned with a score value (in kcal/mol), we used only the score value of the top 1 model to perform our correlation analysis.
For DockThor-VS, we selected either the Nsp5-Main protease (wild type) or the Nsp3-PLpro (wild type) as the target protein. Since two crystal structures of the Mpro receptor (with PDB codes 6LU7 and 6W63) are made available, we used both of them as input receptors for independent docking runs. Similarly, for PLpro, the structures with PDB codes 6W9C and 6WX4 were used as input receptors for independent docking runs. As docking parameters, we manually defined the catalytic binding site of both proteins and used the standard algorithm precision (i.e., 1,000,000 evaluations, population size of 750, and 24 runs). DockThor-VS provides a single output binding mode with a corresponding binding energy, which we used to compute correlations with experimental data.
We used four datasets of ligands with experimentally-determined binding affinities in our validation experiments: one for the PLpro receptor and three for the Mpro receptor. More specifically, to create the first dataset, we selected seven PLpro inhibitors with known IC 50 values published by Osipiuk et al. [31]: The first Mpro dataset comprises 14 compounds identified by Li et al. through free energy perturbation-based virtual screening and validated experimentally through an enzymatic assay [36]. In their study, IC 50 values for all compounds were measured through a fluorescence assay using an inhibitory curve with 500 nM of enzyme, 20 μM of substrate and six different concentrations of each compound. The second Mpro dataset was initially gathered by Ngo et al. to validate a free energy perturbation protocol to assess Mpro inhibitors during virtual screening [37]. It comprises 11 Mpro inhibitors whose experimental IC 50 values were retrieved from the literature and therefore obtained in different experiments. The third Mpro dataset was obtained from a study demonstrating the capacity of leupeptin and three hepatitis C clinical protease inhibitors to bind and inhibit SARS-CoV-2 Mpro [18]. Kneller et al. characterized these four ligands by X-ray crystallography at near-physiological room temperature, thus capturing Mpro motions that would not be observable at lower temperature.

Results
The accuracy of molecular docking tools is usually assessed through self-docking experiments, where protein-ligand complexes previously determined by experimental methods are used as references to be reproduced. In a self-docking experiment, the ligand conformation is randomized before docking, and the quality of predicted binding modes is quantified by computing their deviation from the reference experimental structure. These experiments are useful to validate the geometries of predicted binding modes, which is usually summarized in terms of RMSD values. However, it is important to highlight that reproducing the bound geometries of available crystal structures is not the intended application of our webserver. DINC-COVID's goal is to efficiently account for receptor flexibility during the docking of a ligand. This allows for the sampling of alternative low-energy binding modes that are not captured by rigid crystal structures or by conventional molecular docking techniques. Therefore, by design, the geometries of output complex conformations can significantly differ from those observed in available crystal structures, especially when using an ensemble of receptor conformations derived from molecular dynamics. Our underlying hypothesis is that these alternative lower-energy binding modes can better approximate the population of receptor-bound ligand conformations that exist in solution (in vitro/in vivo).

DINC-COVID predictions for PLpro inhibitors show high correlation with experimental data
To evaluate DINC-COVID predictions, we used a recently published dataset of seven PLpro inhibitors (see Table 1) [31]. This dataset is mostly composed of small ligands (i.e., up to seven flexible bonds) that are all strong binders (i.e., IC 50 values ranging from 2.3 μM to 43.2 μM).
It is therefore quite challenging with respect to the task of reproducing a ranking of binding affinities.
In spite of that, we achieved high correlations between ln(IC 50 ) values and binding energies predicted by DINC-COVID using the Crystal ensemble (i.e., Pearson's r = 0.9 and Spearman's ρ = 0.89), far above correlations achieved by binding energies predicted by other ensembles and by other servers (see Table 1). In addition, when comparing predicted binding modes with crystal structures available for three of these compounds, it appears that DINC-COVID produced binding modes that are very similar to the corresponding crystal structures (see Fig. 5).
The best results were obtained with the Vina scoring function and the Crystal ensemble, which might be due to the nature of this dataset (i.e., small ligands and limited receptor flexibility). However, good correlations were also observed in other settings, for example using Vinardo or the Charmm ensemble (see Table 1). Note that the highest correlation with ln(IC 50 ) values is not a spurious one (see Appendix B, Fig. 11). Although the small size of this dataset is likely to have contributed to this high correlation, the structural agreement between predicted binding modes and experimental structures corroborates the validity of this correlation. In addition, the structural examination of binding modes provides a rationale for the lower correlations observed for other servers in this experiment.

DINC-COVID rankings of Mpro inhibitors show good agreement with affinity-based rankings
To further validate DINC-COVID predictions, we used three datasets of ligands that were tested in inhibitory assays with Mpro, and for which IC 50 values are available.
The first dataset contains 14 drug-like ligands that are experimentally characterized SARS-CoV-2 Mpro inhibitors (see Table 2) [36]. Performing a docking experiment with this dataset produced rather low correlations with experimental ln(IC 50 ) values for all tested settings. According to Pearson's r, the best correlation is observed for DockThor-VS using the 6LU7 receptor (r = 0.36), but according to Spearman's ρ, the best correlation is observed for DINC-COVID using Vinardo and the Charmm ensemble (ρ = 0.55). This experiment highlights the challenges of dealing with outliers (see Appendix B, Fig. 12). Indeed, two ligands have a particularly strong impact on the obtained correlations (see Table 2). Removing Indinavir from the correlation analysis produces better correlations for all settings, the best results being achieved by DockThor-VS with the 6LU7 receptor (Pearson's r = 0.65) and DINC-COVID with Vinardo and the Charmm ensemble (Spearman's ρ = 0.75). Although removing Dipyridamole alone from the correlation analysis does not produce substantial changes, removing both Dipyridamole and Indinavir substantially improves Pearson's correlations, with the best results being achieved by DockThor-VS with the 6LU7 receptor (r = 0.74) and DINC-COVID with Vinardo and the Charmm ensemble (r = 0.65). These results show that, when considering only the 12 remaining ligands, good agreement is reached between predicted binding energies and experimental binding affinities. The second one is a mixed dataset composed of 11 ligands of various types that are experimentally characterized SARS-CoV-2 Mpro inhibitors (see Table 3) [37]. This dataset is quite challenging for the task of reproducing a ranking of binding affinities because all ligands are very strong binders, with IC 50 values ranging from 0.04 μM to 21.39 μM.
After performing a docking experiment with this dataset, reasonably good correlations between predicted binding energies and ln(IC 50 ) values were obtained across most settings (see Table 3). The best correlations were achieved by COVID-19-DS (with Pearson's r = 0.74 and Spearman's ρ = 0.73). The second best correlations were achieved by DINC-COVID using the Crystal ensemble, with Vinardo according to Pearson's r (= 0.7), but with Vina according to Spearman's ρ (= 0.68).
Results obtained on this dataset are also strongly influenced by two outliers: 13a and Shikonin (see Appendix B, Fig. 13). Removing these outliers from the correlation analysis leads to improved correlations for all methods. The best correlations are then achieved by DINC-COVID using Vina and the Gromos ensemble (Pearson's r = 0.88 and Spearman's ρ = 0.95). This represents a very good agreement between predicted binding energies and binding affinities.
The third dataset contains four peptidomimetics that are known antiviral compounds: Leupeptin, Telaprevir, Narlaprevir and Boceprevir (see Table 4) [18]. At the time the Mpro Crystal ensemble for DINC-COVID was built, no experimental structure of Mpro bound to these compounds was available. Indeed, it was only recently that the Table 1 Correlations between docking-derived binding energies and experimentally-determined binding affinities of PLpro drug-like inhibitors [31]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC 50 ) values are reported below; in each row, the two highest coefficients are highlighted. crystallographic structures of these complexes were solved at room temperature (and deposited under PDB codes 6XCH, 6XQS, 6XQT, and 6XQU). Working at room temperature allowed for the induced-fit process to occur without the constraints of crystallographic packing. Even without access to these additional structures, DINC-COVID was able to reproduce the experimentally-determined binding affinities with good correlation (i.e., Pearson's r = 0.86), using Vina and the Charmm ensemble. Unfortunately, due to the very small number of ligands, this correlation is not of high quality (see Appendix B, Fig. 14). However, other correlations (of higher quality) achieved by DINC-COVID are superior to those achieved by DockThor-VS and COVID-19-DS (see Table 4). Interestingly, results produced by the Vinardo scoring present the worst correlations for all three DINC-COVID ensembles in this experiment. This is in contrast with good results produced by Vinardo in previous experiments, and might indicate a limitation of this function when ranking peptidomimetic ligands. These three datasets contain Mpro inhibitors with experimentally determined IC 50 values. However, these values were produced by different groups using different protocols, and might not be fully comparable. That is why we initially decided to use these datasets separately. On the other hand, working with small datasets introduces other issues regarding the reliability of the obtained ligand rankings. Therefore, we performed an additional correlation analysis in which the three Mpro datasets were combined into a larger one. Although the observed correlations are not very high, they are positive for all methods (see Table 5). The best correlations were achieved by DINC-COVID using  Fig. 16). This analysis confirmed the good performance of DINC-COVID when using Vinardo and the Crystal ensemble, but also indicated a competitive performance of DockThor-VS when using the 6LU7 receptor.
Taken together, these results constitute a proof-of-concept validation of the predictive capabilities of DINC-COVID, as it consistently produced predictions in good agreement with experimental data across all studied datasets. The results also provide insights on the differences between docking outputs produced by different ensembles, as well as a comparison with webservers that do not perform ensemble docking. For instance, when making predictions for ligands that do not require significant adjustments in the receptor conformation, good results can be obtained with an ensemble of crystal structures (i.e., with only limited receptor flexibility) and even without an ensemble docking protocol, as illustrated by results obtained with the other webservers (see Tables 2  and 3). On the other hand, MD-derived conformations have greater chances of successfully accommodating ligands that require greater receptor flexibility for binding (see Table 4).

DINC-COVID predictions capture different ranges of receptor flexibility
Overall, our results highlight the benefits of using ensemble docking as an efficient way to account for receptor flexibility in molecular docking targeting SARS-CoV-2 proteins. In DINC-COVID, receptor flexibility is not adjusted on-the-fly during docking, as it is accounted for implicitly through the use of pre-computed ensembles [20]. However, we tried to compensate for this limitation by generating different ensembles for each target binding site. Having several pre-computed ensembles at their disposal allows users to explore different ranges of receptor flexibility, according to their needs.
The differences in range between the provided ensembles are real and can be quantified (see Fig. 6). For instance, the Crystal ensemble created for Mpro captures mostly side-chain rearrangements in the Table 2 Correlations between docking-derived binding energies and experimentally-determined binding affinities of Mpro drug-like inhibitors [36]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln(IC 50 ) values are reported below; in each row, the two highest coefficients are highlighted. Results for correlation analyses without outliers (i.e., Dipyridamole and Indinavir) are also reported.

Table 3
Correlations between binding energies and experimental binding affinities of Mpro inhibitors [37]. Binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Correlation coefficients between binding energies and ln (IC 50 ) values are reported below; in each row, the two highest coefficients are highlighted. Results for correlation analyses without outliers (i.e., 13a and Shikonin) are also reported. Most importantly, these differences between ensembles are useful, since the best results are not always provided by the same ensemble. As an extension, the diversity within ensembles is also useful, since the best ligand binding modes are not always obtained with the same receptor conformation from a given ensemble (see Fig. 7). In summary, our analysis shows that both levels of diversityi.e., within an ensemble, and between ensemblescontribute to the good quality of DINC-COVID predictions.
A third level of diversity in DINC-COVID output is introduced by the scoring function used to rank all sampled conformations. Our results show different frequencies for the selection of receptor conformations, depending on the combination of scoring function and conformation ensemble (see Fig. 7). Some of the starkest differences are observed when using the Charmm ensemble: the greatest diversity in receptor conformations included in the top conformations is observed with the AD4 scoring function, while Vina and Vinardo produce lower levels of diversity.
It is worth emphasizing that in its current implementation DINC-COVID is not automatically selecting one of the available scoring functions, and is not providing a consensus ranking across all three functions. Instead, each output produced by the server includes the three alternative rankings of all sampled conformations, according to the three available scoring functions. This approach provides users with the Table 4 Correlations between docking-derived binding energies and experimentally-determined binding affinities of Mpro peptidomimetic inhibitors [18]. Predicted binding energies are reported for DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). Pearson's correlation coefficient between binding energies and ln(IC 50

Discussion
DINC-COVID offers a ready-to-use solution for researchers to account for protein flexibility while testing compounds against SARS-CoV-2 proteins. It allows users to run ensemble docking experiments without the additional burden of time-consuming simulations required for ensemble generation and docking preparation. Since sampling conformational flexibility for large proteins and multimeric complexes is a challenging and computationally intensive task, we decoupled it from the docking procedure itself. By making the pre-computed ensembles readily available in our webserver, we aim to facilitate and speed up the use of ensemble docking by a broader audience of users searching for new SARS-CoV-2 inhibitors.
Users of our webserver can choose between four target binding sites: the catalytic site of Main protease (Mpro), Papain-like protease (PLpro) and RNA-dependent RNA polymerase (RdRp), as well as the allosteric binding site of Mpro. For each target binding site, three distinct ensembles are provided, including experimental data (i.e., structures from X-ray crystallography) or simulated data (i.e., structures from molecular dynamics with different force fields). The choice of the force field has significant impacts on the results of MD simulations, since it encompasses a series of empirically-determined parameters [62,63]. By relying on two force fields that use distinct representations for hydrogen atoms and different water models, we aim to limit the impact of force field bias on conformational sampling.
It is important to recognize that the conformational ensembles available on the DINC-COVID webserver reflect different scales of protein flexibility, from subtle side-chain rearrangements (e.g., in the Crystal ensemble) to larger backbone motions (e.g., in the Gromos ensemble). Note that we have carefully prepared the structures provided in the ensembles, which includes verifying the protonation state of His, Glu and Asp residues. We have also provided users with the possibility of automatically preprocessing the ligand before docking. Altogether, these features make our server of potential interest even for advanced users, adding robustness to docking analyses with SARS-CoV-2 proteins.
The main contribution of our webserver is to account for receptor flexibility while sampling ligand binding modes. This is made possible through a parallelized ensemble docking protocol using DINC. DINC has been extensively validated in previous work, through both self-docking and cross-docking experiments [30,64,65]. On the other hand, the DINC-COVID webserver cannot be properly assessed through conventional self-docking experiments, since the goal of ensemble docking is precisely to find alternative low-energy binding modes. For instance, when using an MD-derived ensemble, the best ranked receptor conformations will usually not be the ones with the lowest RMSD to a reference crystal structure. Our approach relies on the fact that a set of diverse receptor conformations can accommodate a much broader range of ligand conformations and sizes than a single receptor conformation [22]. To sum up, the goal of ensemble docking is not to reproduce binding geometries observed in crystallographic structures, but to produce conformations that better reflect the possible binding modes of a protein-ligand complex in solution.
In this context, we searched for experimental binding affinities of inhibitors for SARS-CoV-2 proteins, to evaluate whether DINC-COVID could properly rank these ligands. Good correlations between predicted binding energies and experimental binding affinities were obtained with both PLpro inhibithors (e.g., Pearson's r = 0.9, see Table 1) and Mpro inhibithors. Interestingly, we obtained good correlations between predicted binding energies and experimental binding affinities with a recently published set of antiviral peptidomimetic inhibitors for Mpro (e.g., Pearson's r = 0.86, see Table 4). As noted, the binding of these peptidomimetics requires malleability of Mpro catalytic site [18], and as such could not be predicted by docking methods using a single receptor conformation (i.e., COVID-19-DS and DockThor-VS). These results highlight the potential of DINC-COVID to identify completely novel binding modes. Such binding modes could involve conformational changes of the receptor's binding site, and could potentially be predicted regardless of the availability of experimental data on such receptor's conformation.
As a note of caution, we should reiterate that the analyzed datasets were small, which can introduce biases regarding their composition (e. g., impact of outliers) and the choice of statistical analysis. We tried to address these limitations by discussing the impact of different scenarios in the analysis of these datasets. In the case of the Mpro datasets, we also performed an alternative analysis in which we aggregated all results into a single dataset. Despite all the observed variability across different scenarios, the ensemble docking strategy of DINC-COVID produced results that were consistently in agreement with experimental data, outperforming other webservers in many of these experiments.
It is important to acknowledge that accurate scoring remains an open challenge in the field of molecular docking [66], especially when considering large and flexible ligands [30]. In the context of ensemble docking, accurately scoring and ranking sampled binding modes becomes even more challenging. To try and compensate for the limitations of relying on a single scoring function, our meta-docking approach provides scoring of the results with three alternative functions. Users can then decide if they want to rely on a specific scoring function, or build a consensus across the three functions. In fact, our results show that the scoring function can affect the diversity of binding modes produced by DINC-COVID, which is a factor that users might want to explore in different ways depending on their specific application. Most importantly, despite the challenges and known limitations of available scoring functions, we were able to obtain good correlation with experimental data in our validation experiments.
There are countless scoring functions proposed in previous studies and implemented in other tools. DINC-COVID predictions could certainly be improved by using scoring functions that might be tailored to a particular type of ligand. In order to explore this possibility, users can request to download a larger number of output conformations, and rescore them locally with the most appropriate scoring function. Future work on DINC-COVID will include the implementation of additional scoring functions during the sampling phase, and consensus ranking during the rescoring phase.
The quality of results produced by ensemble docking is directly influenced by the quality of the conformations included in the ensemble [22]. For instance, it was shown that including too many conformations in an ensemble can deteriorate docking results [22,67], since it exacerbates existing limitations of both sampling and scoring. As briefly discussed in the Methods section, we used a reasonable and efficient strategy to extract conformations from MD simulations. By combining standard procedures for dimensionality reduction and clustering, we tried to maximize coverage of the conformational landscape, while minimizing the number of conformations included in each ensemble. In fact, we used the same strategy to generate the Crystal ensembles, as opposed to simply using all available crystal structures. Again, using a different strategy for selecting conformations could potentially improve DINC-COVID results [29,68]. However, demonstrating the optimality of our selection strategy goes beyond the scope of this study. It is worth noting that our proof-of-concept validation indicates the success of decisions made for conformation generation and conformation selection, providing us with a useful protocol that can be further improved upon.
Finally, the choice of making DINC-COVID available through a webserver also comes with trade-offs. On one hand, it takes away most of the burden associated with software installation and the preparation of input files. We hope that this format will enable users who are not familiar with molecular docking to go ahead and run their own tests, potentially evaluating new compounds that are not yet available in public databases. On the other hand, it constrains the use of our ensemble docking strategy to the specific algorithms and parameters we considered. Fortunately, our implementation offers the possibility to make this pipeline available for local customization and execution. In particular, we are already working on leveraging the use of Docker containerization and widely-used python packages, to develop a future version of DINC-COVID that could be deployed on local computational resources. This will enable a full customization of DINC-COVID, and foster its use for large-scale virtual screening of viral variants.

Declaration of competing interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Neither the manuscript nor any parts of its content are currently under consideration or published in another journal.  Table 7 and Fig. 8 Again, the box used for sampling was tuned to the binding site of each receptor conformation.

A.2. Papain-like Protease (PLpro)
Three ensembles were built for PLpro, based on conformational changes in the catalytic binding site. To build the Crystal ensemble, monomeric forms of PLpro were extracted from 12 crystals representing wild type structures (see Table 8 and Fig. 8.C). Crystals with double occupancy were split into two files to ensure each file would represent one distinct orientation. In addition, two new ensembles were produced by sampling conformations from MD simulations of PLpro in the apo state. For these simulations, a monomeric unit from the crystallographic structure 6WUU was used as input, and approximately 100,000 conformations were extracted for each force field. The number of structures in the Crystal, Charmm, and Gromos ensembles are 6, 20 and 19, respectively. The scoring box center was set to 46  As for Mpro, we have performed an RMSD analysis (for binding site residues only) to quantify differences between the three receptor ensembles of PLpro (see Fig. 9). Again, the Crystal ensemble captures mostly side-chain rearrangements and subtle backbone variations: the all-heavy-atom RMSD between members of the Crystal ensemble is lower than 0.

A.3. RNA-dependent RNA-polymerase (RdRp)
Three ensembles were built for RdRp, based on conformational changes in the catalytic binding site. The position of remdesivir in the crystallographic structure with PDB code 7BV2 was taken as a reference for the targeted site. In total, 7 crystallographic structures, out of 11 analyzed, were included into the RdRp Crystal ensemble (see Table 9 and Fig. 8.D). In addition, two new ensembles were produced by sampling conformations from MD simulations of RdRp in the apo state. In total 100,000 snapshots were extracted for each force field. The number of structures in the Crystal, Charmm, and Gromos ensembles are 7, 14 and 13, respectively. The scoring box center was set to 90.02, 90.17 and 103.52 (X, Y and Z, respectively) with dimensions 98 × 94 × 90 Å for the Crystal ensemble; 92.02, 90.17 and 103.52 (X, Y and Z, respectively) with dimensions 96 × 94 × 90 Å for the Charmm ensemble; and 92.02, 90.17 and 103.52 (X, Y and Z, respectively) with dimensions 96 × 94 × 90 Å for the Gromos ensemble. The box used for sampling was tuned to the binding site of each receptor conformation, based on the solvent accessible volume of the binding site. We have also performed an RMSD analysis (for binding site residues only) to quantify differences between the three receptor ensembles of RdRp (see Fig. 10). The all-heavy-atom RMSD between members of the Crystal ensemble is lower than 0.89 Å. The all-heavy-atom RMSD between members of the Crystal ensemble and of the Charmm ensemble ranges from 1.36 to 2.43 Å. The all-heavy-atom RMSD between members of the Crystal ensemble and of the Gromos ensemble ranges from 1.89 to 3.58 Å.

B. Correlations between Binding Energies and Experimental Binding Affinities
In addition to reporting correlation coefficients in the main text, in this appendix we present scatter plots illustrating these correlations. This allowed us to determine which correlations were of high quality and to identify outliers in the datasets. This appendix includes the correlation analysis of the PLpro dataset (see Fig. 11), of the three Mpro datasets (see Figs. 12,13 and 14), and of the large Mpro dataset obtained by combining these three small datasets together (see Fig. 15).
In addition, we present a receiver operating characteristic (ROC) analysis performed on the large, combined Mpro dataset (see Fig. 16). This dataset is the only one that is large enough to allow for this analysis, although a typical ROC analysis would include many more ligands. ROC analyses are traditionally performed to evaluate the ability of a docking method at distinguishing binding from non-binding ligands. However, in our case, all ligands are Mpro binders. Therefore, we performed this analysis by arbitrarily dividing the dataset between "strong binders" and "weak binders" using an IC 50 threshold of 10 μM. Fig. 11. Correlation analysis for seven PLpro drug-like inhibitors [31]. Experimentally-determined binding affinities are plotted against docking-derived binding energies obtained with DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). The best linear fit and Pearson's r are also reported on each plot.  [36]. Experimentally-determined binding affinities are plotted against docking-derived binding energies obtained with DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). The best linear fit and Pearson's r are also reported on each plot. Two outliers are highlighted: Dipyridamole (in blue) and Indinavir (in red). [37]. Experimentally-determined binding affinities are plotted against docking-derived binding energies obtained with DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). The best linear fit and Pearson's r are also reported on each plot. Two outliers are highlighted: 13a (in blue) and Shikonin (in red). [18]. Experimentally-determined binding affinities are plotted against docking-derived binding energies obtained with DINC-COVID (using three ensembles and three scoring functions), COVID-19-DS and DockThor-VS (using two receptor conformations). The best linear fit and Pearson's r are also reported on each plot.