Comparing AutoDock and Vina in Ligand / Decoy Discrimination for Virtual Screening

Abstract: AutoDock and Vina are two of the most widely used protein–ligand docking programs. The fact that these programs are free and available under an open source license also makes them a very popular first choice for many users and a common starting point for many virtual screening campaigns, particularly in academia. Here, we evaluated the performance of AutoDock and Vina against an unbiased dataset containing 102 protein targets, 22,432 active compounds and 1,380,513 decoy molecules. In general, the results showed that the overall performance of Vina and AutoDock was comparable in discriminating between actives and decoys. However, the results varied significantly with the type of target. AutoDock was better in discriminating ligands and decoys in more hydrophobic, poorly polar and poorly charged pockets, while Vina tended to give better results for polar and charged binding pockets. For the type of ligand, the tendency was the same for both Vina and AutoDock. Bigger and more flexible ligands still presented a bigger challenge for these docking programs. A set of guidelines was formulated, based on the strengths and weaknesses of both docking programs and their limits of validation.


Introduction
The use of computational methods is a crucial part of the drug discovery, development, and optimization process. Protein-ligand docking and virtual screening are two of the most used techniques in this field and continue to show promise in hit identification and subsequent optimization [1]. They are also helpful tools for drug repositioning [2–4]. These methods are effective and fast, and allow researchers to evaluate large virtual databases of molecular compounds as a first attempt to guide the selection of more limited sets of compounds for experimental testing. They do, however, have some limitations [5,6].
Protein-ligand docking is a computational technique that predicts the conformation and orientation (pose) of a ligand when it is bound to a given protein [1,7–12]. With this method, the ligand-target interactions are modeled to achieve an optimal complementarity of steric and physicochemical properties [13]. This methodology has made it possible to visualize the potential interactions between a ligand and its target [14].
Docking, however, still faces difficulties, particularly regarding the correct modeling of ligand and protein flexibility [15–18] and of water-mediated interactions [18,19]. It is widely used for small molecules, but its use for small peptides and other larger biomolecules has only been under development in the last decade [20–22].
Typically, docking software is an interplay between the search algorithm, which explores and generates different poses of the ligand, and the scoring function, which estimates the binding affinities.
The Directory of Useful Decoys-Enhanced (DUD-E) [31] holds a collection of decoys and ligands for benchmarking virtual screening, containing 22,432 active compounds and their affinities against 102 targets set by Huang et al. For each of the active compounds (i.e., the ligands), this database contains a set of 50 "decoys", i.e., molecules with similar 1-D physico-chemical properties to remove bias (e.g., molecular weight, calculated LogP), but dissimilar 2-D topology so that they are likely non-binders, i.e., inactives. These characteristics make DUD-E a challenging dataset to test scoring functions and protein-ligand docking algorithms. Ideally, the perfect scoring function would rank the active molecules higher than the decoys, but that is not often the case.

Materials and Methods
The performance of AutoDock 4 and Vina was measured using the Directory of Useful Decoys-Enhanced (DUD-E). DUD-E contains a large collection of decoys and ligands that can be used for benchmarking ligand/decoy discrimination in virtual screening tests. The DUD-E dataset has been widely used to validate other open-source programs such as Dock [47,48] and commercial programs such as Gold, Glide, Surflex, and FlexX [48]. It is also frequently used to validate the development of new consensus scoring functions [38,49–54].
An overview of the 102 protein targets in DUD-E can be found in Figure 1, and Table 1 specifies the types of protein targets and the number of ligands and decoys in the dataset. DUD-E contains a wide variety of protein target types, including 26 kinases, 15 proteases, 11 nuclear receptors, 5 G protein-coupled receptors (GPCRs), 2 ion channels, 2 cytochrome P450s, 36 other enzymes, and 5 miscellaneous proteins. About 18 of these proteins contain metal atoms, while the other 84 do not. Proteases, kinases, and metalloenzymes are the largest groups present in the DUD-E dataset and are the ones emphasized in the discussion. We also included GPCRs, since this large class of proteins is structurally very similar and has been the focus of many other studies. The results presented in this study might guide the selection of the most adequate docking software for these specific families.
Appl. Sci. 2019, 9, x FOR PEER REVIEW
Using the DUD-E dataset, the performance of a scoring function in virtual screening can be expressed graphically through receiver operating characteristic (ROC) plots. In ROC plots, the true positive rate (TPR = TP/P) is plotted against the false positive rate (FPR = FP/N), where TP is the number of true positives, P is the total number of positives (actives), FP is the number of false positives, and N is the total number of negatives (decoys). A useful measure is the area under the curve (AUC). The higher the AUC value of a ROC curve, the better the discrimination between true positives and false positives.
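As a minimal sketch of how the ROC definitions above translate into a calculation, the following Python functions build the (FPR, TPR) points from a scored list and integrate them with the trapezoidal rule. It assumes that lower (more negative) docking scores rank first, as is the case for AutoDock and Vina binding energies. The function names and the toy scores are illustrative only and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_active: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR) for a list ranked by ascending score.

    Assumption: a lower (more negative) score means a better-ranked compound.
    """
    n_pos = sum(1 for a in is_active if a)   # P: total actives
    n_neg = len(is_active) - n_pos           # N: total decoys
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one decoy")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Compounds with identical scores are consumed together so that
        # ties produce a single diagonal step instead of an arbitrary order.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if is_active[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a ROC curve given as (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    toy_scores = [-9.1, -8.5, -7.9, -7.2, -6.0]       # hypothetical energies
    toy_labels = [True, False, True, False, False]    # True = active
    print(round(auc(roc_points(toy_scores, toy_labels)), 4))
```

The function returns the AUC as a fraction between 0 and 1; multiplying by 100 gives the percentage scale on which the AUC values in this study are reported.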
As previously mentioned, a successful scoring function for virtual screening should rank active compounds very early in a large score list, so metrics that emphasize early recognition of ligands are normally used. One such measure is the enrichment factor at 1% (abbreviated EF1%). This value is the ratio of the number of active ligands recovered in the top 1% of the ranked ligand/decoy database to the number of active ligands that would be expected in the same fraction of the database under random selection. Other values, such as EF20%, are also sometimes used.
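The enrichment factor defined above can be sketched in a few lines of Python. This is an illustrative implementation, not the script used in the study: the function name is hypothetical, lower scores are assumed to rank first, and the size of the top fraction is rounded to the nearest whole compound, which is one of several reasonable conventions.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float) -> float:
    """Enrichment factor at a given fraction (0.01 for EF1%, 0.20 for EF20%).

    EF = (actives in top fraction / compounds in top fraction)
         / (total actives / total compounds)
    """
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")

    # Assumption: round to the nearest compound, but keep at least one.
    n_top = max(1, round(fraction * n_total))

    # Assumption: lower (more negative) score = better rank.
    order = sorted(range(n_total), key=lambda i: scores[i])
    hits = sum(1 for i in order[:n_top] if is_active[i])

    return (hits / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Hypothetical library: 100 compounds, 2 actives (ranked 1st and 51st).
    scores = [float(i) for i in range(100)]
    labels = [i in (0, 50) for i in range(100)]
    print(enrichment_factor(scores, labels, 0.01))
    print(enrichment_factor(scores, labels, 0.20))
```

A value of 1 corresponds to random selection; values above 1 indicate that actives are concentrated at the top of the ranked list.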
An initial analysis of all the DUD-E targets showed that one of them (FGFR, PDB: 3C4F) did not have the 1:50 ratio of actives to decoys, so it was excluded from this test, leaving 101 targets.
For each target in the DUD-E dataset, an initial analysis of the associated PDB file was performed. The binding pocket was studied and evaluated, and similar PDB structures with co-crystallized ligands were also inspected. Ligands for which a ligand-target structure was available were re-docked with AutoDock and with Vina. The docking protocol for both programs was adjusted to reproduce the known experimental binding poses for each target, a standard procedure when validating a docking program or protocol for a specific target [24], as presented in Figure 2. The parameters adjusted in this process for Vina included the box size and position, the number of generated binding modes, and the exhaustiveness. In AutoDock, the parameters optimized included the box size and position, the number of grid points and spacing, the number of genetic algorithm (GA) runs, the population size, the maximum number of energy evaluations, and the maximum number of generations. After this first optimization stage, the same box dimensions and center coordinates were used for both AutoDock and Vina for each target. The exhaustiveness value used for Vina was 8. For AutoDock, the grid spacing was set to 0.375 Å and the number of GA runs was set to 10. All this information is provided in Table S1 in the Supplementary Materials.
With the current protocol, the virtual screening of the complete DUD-E dataset with Vina took approximately 60 days on 24 CPUs. Calculations with AutoDock took on average 100 times longer.
At the end of this stage, an optimized docking protocol was selected for each target with each docking program. These protocols were used for the corresponding 101 protein targets to dock the associated ligands and decoys. For each target, ranked lists of ligands and decoys were prepared with AutoDock and Vina, based on the corresponding scores. These lists were used to determine the values of AUC, EF1% and EF20%, allowing a comparison of the performance of the two docking programs in discriminating between ligands and decoys for each target. Average AUC, EF1% and EF20% were determined for the different families of protein targets and for the full 101 targets.
All protein targets were characterized in terms of the total number of amino acid residues and molecular weight. The corresponding binding pockets were evaluated in terms of their percentage of hydrophobic, polar, and charged amino acid residues. Average AUC, EF1% and EF20% were determined for different classes of protein targets based on the protein's size and type of residues at the binding pocket.
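To make the Vina side of the docking protocol described above more concrete, the sketch below renders a Vina configuration file from a box center, box dimensions, and the exhaustiveness value of 8 reported in this study. The option names (receptor, ligand, center_x, size_x, exhaustiveness, num_modes) are standard AutoDock Vina configuration keys. The file names, the box values in the usage example, and the helper function itself are hypothetical; the actual per-target values are those listed in Table S1. The default of 9 binding modes is Vina's own default and is only an assumption here, since the number of modes was tuned per target.

```python
from typing import Tuple


def vina_config(receptor: str,
                ligand: str,
                center: Tuple[float, float, float],
                size: Tuple[float, float, float],
                exhaustiveness: int = 8,   # value reported in the study
                num_modes: int = 9) -> str:  # assumption: Vina default
    """Return the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names and box; not values from Table S1.
    print(vina_config("receptor.pdbqt", "ligand.pdbqt",
                      center=(10.0, 12.5, -3.0),
                      size=(20.0, 20.0, 20.0)))
```

Because the same box center and dimensions were used for both programs, a helper of this kind would let a single per-target record drive both the Vina configuration and the corresponding AutoDock grid definition.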
The Molecular Operating Environment (MOE) [55] program was used to calculate the chemical and structural properties of all ligands tested. Some of these properties were analyzed in more detail, including the ligand's molecular weight, volume, area, fraction of rotatable bonds, fraction of hydrophobic accessible surface area (FASA_H), fraction of polar accessible surface area (FASA_P), and fraction of positive and negative accessible surface areas (FASA+ and FASA−). Average AUC, EF1% and EF20% were also determined for the different classes of ligands based on the ligand's size, fraction of rotatable bonds and electrostatic nature.

Evaluation of the Performance of AutoDock and Vina
The chemical and structural properties of different proteins and enzymes can vary quite significantly, in features that include the nature, type, and range of interactions around the binding pocket, the pocket size and shape, and the exposure to solvent. Therefore, the challenges that such systems pose to docking and to virtual screening can also be quite different. Some programs and scoring functions are better able to capture some of these characteristics, while others show improved performance for targets with other features. Table 2 compares the performance of AutoDock and Vina across the different classes of targets. The average results obtained for the set of 101 targets showed that AutoDock and Vina exhibit a similar average performance in discriminating between ligands and decoys. The average EF1% values obtained were 7.6 and 8.9 for Vina and AutoDock, respectively (AUCs of 68.0 and 66.4). These EF1% values show that the programs are able to place in the top 1% of the total compounds (actives and decoys) docked against each target 7.6 and 8.9 times more active ligands than would be expected from random selection, considering the relative percentage of actives and decoys available for each target.
However, the discrimination ability across different target classes could vary significantly. For GPCRs, for example, AutoDock exhibited superior discrimination ability, with an average EF1% of 16.6 against only 2.8 with Vina. AutoDock also demonstrated improved performance over Vina for nuclear receptors (EF1% of 18.4 versus 15.0). However, for kinases and metalloproteins the discrimination ability of Vina was on average better than that of AutoDock. Figure 3 shows the average AUC values calculated for the different target families. As previously mentioned, the higher the AUC, the better the discrimination ability between actives and decoys. AutoDock provided better results for GPCRs, ion channels, and nuclear receptors. Vina worked better for all the other families.
However, across large families of proteins there could be significant variations in the docking results when looking into individual proteins. In the case of metalloenzymes, for example, Vina provided better results on average. Analyzing each target individually (Figure 4), it could be seen that for some targets AutoDock performed significantly better. This might be explained by the large variability of protein types in this family, as this group includes kinases, proteases, and others.
Table 3 analyzes the performance of AutoDock and Vina taking into consideration the number of amino acid residues that constitute the target. For smaller targets, the driving force for ligand binding tends to be concentrated in a smaller number of key specific residues. Additionally, the binding pockets tend to be smaller, or often more exposed to the solvent. On the other hand, in larger protein targets, the range of interactions involved in ligand binding tends to be larger and more diffuse. In addition, the extra amino acid residues present in the larger targets could confer a more controlled environment to the corresponding binding pockets, shielding the interactions formed from the effect of the solvent. The non-specific protein environment could play a more important role for ligand binding in these targets. Therefore, targets with different numbers of amino acid residues could pose different challenges for docking and virtual screening. The results from Table 3 show that Vina was, on average, better in discriminating ligands from decoys in medium-sized targets, with 250 to 400 amino acid residues (average EF1% of 11.5, AUC of 71.8). For targets with more than 400 amino acid residues, the performance of Vina was significantly lower (average EF1% of only 6.1, AUC of 65.4). AutoDock exhibited a more uniform behavior, with average EF1% values in the range 7.9-9.4 for small (less than 250 aa) and large targets (more than 400 aa), resulting in an improved performance over Vina for the small targets (<250 aa) and the large targets (>400 aa).
Another important aspect regarding the nature of the target protein concerns the type of amino acid residues that constitute each binding pocket. For this analysis, all amino acid residues defining each binding pocket were grouped into polar, charged (negative and positive), and hydrophobic amino acid residues. Binding pockets were characterized based on the relative percentage of each of these types of residues. Average EF1% and AUC values were calculated with AutoDock and Vina for each category. The results are presented in Table 4.
Table 4 (excerpt). Negative Charge: Poorly Negative (0-5%) 22 11.0 ± 9.5 70.8 ± 9.8 11.7 ± 10.7 73.5 ± 11.9; Moderately Negative (5-10%) 44 8.
The results presented in Table 4 showed that for poorly polar binding pockets (less than 25% of polar residues) AutoDock was on average better than Vina in discriminating between ligands and decoys, particularly among the top 1% of ranked solutions. For moderately polar and very polar binding pockets, Vina exhibited a better performance than AutoDock. The results also showed that both programs had more difficulty in discriminating ligands and decoys for very polar binding pockets (>35% of polar amino acid residues).
In terms of the percentage of hydrophobic residues, the results showed that Vina was significantly better than AutoDock in ligand/decoy discrimination for poorly hydrophobic binding pockets. As the percentage of hydrophobic residues at the binding pocket increased, the performance of Vina and AutoDock became increasingly similar, both in terms of EF1% and in terms of AUC values.
In terms of charge, the results showed that AutoDock was better in discriminating ligands and decoys in poorly charged binding pockets (<15%) than in moderately or highly charged ones. Vina, on the other hand, gave the best results in highly charged binding pockets. These general tendencies concerning the presence of charge at the binding pocket were also observed when looking specifically at positively charged residues or at negatively charged residues.
In general, these results showed that AutoDock was better in discriminating ligands and decoys in more hydrophobic, poorly polar, and poorly charged pockets, while Vina exhibited early recognition metrics that did not vary so significantly with the type of amino acid residues at the binding pocket. Vina tended to give better results for polar and charged binding pockets, which was particularly interesting, taking into consideration that the scoring function of Vina did not explicitly include charges, while that of AutoDock had an explicit electrostatic term.
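The pocket characterization used in this analysis can be illustrated with a short Python sketch that computes the percentage of hydrophobic, polar, and charged residues from a list of one-letter residue codes. The assignment of each amino acid to a group is an assumption made for this example (the grouping used in the study is not specified in the text), and the function names are hypothetical. Only the polarity thresholds (below 25% and above 35% of polar residues) come from the text.

```python
from typing import Dict

# Assumption: one common grouping of the 20 standard amino acids.
HYDROPHOBIC = set("AVLIMFWPG")
POLAR = set("STNQYCH")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def pocket_composition(residues: str) -> Dict[str, float]:
    """Percentage of each residue type among the pocket-lining residues."""
    if not residues:
        raise ValueError("empty pocket")
    counts = {"hydrophobic": 0, "polar": 0, "positive": 0, "negative": 0}
    for aa in residues.upper():
        if aa in HYDROPHOBIC:
            counts["hydrophobic"] += 1
        elif aa in POLAR:
            counts["polar"] += 1
        elif aa in POSITIVE:
            counts["positive"] += 1
        elif aa in NEGATIVE:
            counts["negative"] += 1
        else:
            raise ValueError(f"unknown residue code: {aa!r}")
    n = len(residues)
    pct = {key: 100.0 * value / n for key, value in counts.items()}
    pct["charged"] = pct["positive"] + pct["negative"]
    return pct


def polarity_class(polar_pct: float) -> str:
    """Polarity class using the thresholds quoted in the text."""
    if polar_pct < 25.0:
        return "poorly polar"
    if polar_pct <= 35.0:
        return "moderately polar"
    return "very polar"


if __name__ == "__main__":
    pocket = "AVLLFKDEST"  # hypothetical pocket-lining residues
    composition = pocket_composition(pocket)
    print(composition, polarity_class(composition["polar"]))
```

Changing the residue grouping (for example, treating histidine as charged) would shift pockets between classes, so any comparison with Table 4 should use the grouping adopted in the study.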

Substrates
The type of molecule to be evaluated and its physico-chemical characteristics also offer different challenges for virtual screening, in terms of docking and its ability to discriminate between actives and decoys. For each specific target, the decoys included in the DUD-E were generated by having similar 1-D physico-chemical properties to the actives from which they originated, to remove bias [32]. Hence, to analyze how the different substrate properties affected the discriminating ability of each target, the physical properties of all actives identified in the ligands ranked as the top 1% were evaluated and compared with the other actives that were ranked the worst.
In this study, four fundamental properties of the ligands were analyzed: the size of the ligands, polarity, charge, and the number of rotatable bonds. Figures 5 and 6 present heat maps of the correlation between the substrate properties and their position in the ranking according to the type of target family (proteases and metalloenzymes, respectively). Darker red (+1) indicates a perfect positive correlation, while darker blue (−1) indicates a perfect negative correlation. From Figure 5, it is clear that polarity and the number of rotatable bonds are important for Vina and even more so for AutoDock, since they present a positive correlation with rank; that is, as the ranking number increases, the polarity and the number of rotatable bonds also increase. This means that molecules that have more rotatable bonds and are more polar are ranked worse in the list. This leads to the conclusion that more polar and more flexible molecules present a bigger challenge, for AutoDock in particular. For metalloenzymes, the correlation profile is somewhat different from that of proteases. It is not easy to find a clear tendency, because while some targets present a positive correlation for a given property, others have a negative correlation for the same property. This could again be explained by the large variability of protein types in this particular family.
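The sign convention of these heat maps can be illustrated with a small Python sketch that correlates a ligand property with the ligand's position in the ranked list. The Pearson coefficient is used here purely as an assumption, since the text does not state which correlation coefficient underlies Figures 5 and 6, and the data in the usage example are invented.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical actives: rank 1 is the best-scored compound.
    rank = [1, 2, 3, 4, 5]
    rotatable_bonds = [2, 4, 6, 8, 10]
    # A value near +1 means flexible ligands sit at worse (higher) ranks.
    print(pearson(rank, rotatable_bonds))
```

Under this convention, a positive coefficient for a property means that ligands with larger values of that property tend to appear further down the ranked list.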
Figure 7 summarizes the variability of all molecules present in the DUD-E dataset, taking into account the molecular weight. Of the 22,321 active ligands considered for the 101 DUD-E targets, 7,990 have a molecular weight below 400 Da, 8,833 have a molecular weight in the range of 400-500 Da, and 5,498 have a molecular weight over 500 Da. The distribution of decoys across these ranges was the same, as they were generated automatically from the known ligands included.
Table 5 breaks down the number of ligands identified in the top 1% of ranked compounds according to molecular weight. AutoDock identified a total of 1,935 actives in the top 1% of ligands, while Vina identified 2,002. The results showed that Vina was, on average, better for small-sized ligands (<400 MW) and for large-sized ligands (>500 MW; 581 versus 497 actives). However, AutoDock was able to rank more medium-sized actives (400-500 MW) among the top 1% of the results (1,043 versus 885).
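The breakdown by molecular weight described above amounts to binning the actives found in the top fraction of a ranked list. A minimal sketch follows, assuming a list of (molecular weight, is_active) pairs already sorted from best to worst score. The function names and the toy data are hypothetical; only the three weight ranges (below 400 Da, 400-500 Da, over 500 Da) come from the text.

```python
from typing import Dict, List, Tuple


def mw_class(mol_weight: float) -> str:
    """Size class using the molecular weight ranges quoted in the text."""
    if mol_weight < 400.0:
        return "small"
    if mol_weight <= 500.0:
        return "medium"
    return "large"


def top_actives_by_mw(ranked: List[Tuple[float, bool]],
                      fraction: float = 0.01) -> Dict[str, int]:
    """Count actives per size class within the top fraction of a ranked list.

    `ranked` must be sorted from best to worst docking score.
    """
    # Assumption: round to the nearest compound, but keep at least one.
    n_top = max(1, round(fraction * len(ranked)))
    counts = {"small": 0, "medium": 0, "large": 0}
    for mol_weight, is_active in ranked[:n_top]:
        if is_active:
            counts[mw_class(mol_weight)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical ranked list: (molecular weight in Da, is_active).
    ranked = [(350.0, True), (450.0, False), (520.0, True), (480.0, True)]
    print(top_actives_by_mw(ranked, fraction=0.5))
```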
Regarding each family of proteins, all exhibited the same tendency: smaller ligands were more difficult to discriminate and appeared at worse ranking positions for both Vina and AutoDock. Figure 8 shows the influence of molecular weight on the average ranking distribution of the molecules within the full ranked list determined for each protein target. The results showed that there was a similar tendency for both the GPCR and kinase protein families, where the smaller ligands were ranked worst and the medium ligands were ranked better. For both GPCRs and kinases, AutoDock could rank smaller ligands better than Vina, even though their ranking position was relatively high. As for the medium-sized active molecules (300-400), these two families exhibited opposite results: while Vina provided better recognition for kinases, AutoDock was more effective in discriminating actives and decoys for GPCRs.
Figure 9 presents the relative distribution of all active ligands in the DUD-E dataset taking into consideration the number of rotatable bonds present. There is a higher prevalence of molecules with 4 to 7 and 8 to 11 rotatable bonds, representing 73% of the dataset. The remaining 27% corresponds to molecules with 0 to 3 and more than 12 rotatable bonds.

Influence of the Number of Rotational Bonds
Ligands with more rotatable bonds presented a greater challenge for docking because they could adopt a larger number of possible conformations. Discriminating actives with many rotatable bonds from decoys with many rotatable bonds hence became more difficult, because correctly identifying the real pose of the ligand was more challenging. As a result, ligands with a higher number of rotatable bonds were placed at worse positions in the ranked database than ligands with fewer rotatable bonds. In this study, this was observed for all studied families.
In Figure 10, the data for nuclear receptors and GPCRs are presented. For both families, AutoDock was able to rank more ligands early on. While for GPCRs there was a clear difference in the discrimination ability between Vina and AutoDock, for nuclear receptors the two programs behaved similarly (the exception being compounds with 4 rotatable bonds). According to our study, molecules with 5 to 10 rotatable bonds were predicted better by both AutoDock and Vina.

Discussion
AutoDock and Vina are efficient software alternatives for virtual screening, exhibiting on average similar performance in ligand/decoy discrimination across a large number of proteins. In spite of this similar average performance, the two docking programs can differ markedly for a particular protein target, for proteins or enzymes from specific families, or for different types of ligands. Hence, for the common user wishing to embark on a virtual screening study, it is not easy to select a priori the alternative that should be used.
The goal of this study was to guide the selection of the docking software according to the type and characteristics of the target and its substrates. As demonstrated, the type of target, and especially the characteristics of the binding pocket, could influence the outcome of the docking software. The results showed that AutoDock was clearly better in discriminating ligands and decoys in smaller targets, with more hydrophobic, poorly polar, and poorly charged pockets, while Vina tended to give better results for bigger targets with polar and charged binding pockets. According to the results presented, Vina provided better metrics for kinases, proteases, and cytochrome P450s. On the other hand, ligand/decoy discrimination for GPCRs, ion channels, and nuclear receptors was improved with AutoDock.
For the substrates, however, this analysis across 22,432 active compounds and 1,380,513 decoy molecules showed that AutoDock and Vina exhibited comparable trends with the ligand's size, charge, and number of rotatable bonds. Bigger, more flexible, and more polar ligands were more difficult to discriminate from decoys for both docking programs, but the performance of Vina and AutoDock was quite similar.

Conclusions
While the present study offered useful guidelines that could help researchers to choose between AutoDock or Vina before starting a new virtual screening, according to the characteristics of their specific target, it also highlighted another important aspect. The performance of both programs could in some cases vary significantly, even for very similar proteins. Therefore, for very specific systems, it is recommended that researchers test both alternatives wisely, before starting a large virtual screening study.
Supplementary Materials: The following are available online at http://www.mdpi.com/2076-3417/9/21/4538/s1. Table S1: Docking parameters used for Vina and AutoDock; Figure S2: Comparison between the crystallographic (green) and "docked" (purple) poses for Vina and AutoDock to evaluate the influence of the number of rotational bonds in pose prediction. (a) Ligands with the lowest number of rotational bonds. (a1) Ligands with the highest number of rotational bonds.