Critical assessment of approaches for molecular docking to elucidate associations of HLA alleles with adverse drug reactions

HighlightsAll software assessed could dock Abacavir back into the risk allele structure but not always predict the exact binding mode.Most docking software assessed can distinguish between risk and control alleles.Docking performance can be degraded by using a homology model.Receptor flexibility can negatively affect the docking performance for complex HLA examples.Using AutoDockFR cannot compensate for the added difficulty of docking to the unbound target. &NA; Adverse drug reactions have been linked with genetic polymorphisms in HLA genes in numerous different studies. HLA proteins have an essential role in the presentation of self and non‐self peptides, as part of the adaptive immune response. Amongst the associated drugs‐allele combinations, anti‐HIV drug Abacavir has been shown to be associated with the HLA‐B*57:01 allele, and anti‐epilepsy drug Carbamazepine with B*15:02, in both cases likely following the altered peptide repertoire model of interaction. Under this model, the drug binds directly to the antigen presentation region, causing different self peptides to be presented, which trigger an unwanted immune response. There is growing interest in searching for evidence supporting this model for other ADRs using bioinformatics techniques. In this study, in silico docking was used to assess the utility and reliability of well‐known docking programs when addressing these challenging HLA‐drug situations. The overall aim was to address the uncertainty of docking programs giving different results by completing a detailed comparative study of docking software, grounded in the MHC‐ligand experimental structural data – for Abacavir and to a lesser extent Carbamazepine ‐ in order to assess their performance. Four docking programs: SwissDock, ROSIE, AutoDock Vina and AutoDockFR, were used to investigate if each software could accurately dock the Abacavir back into the crystal structure for the protein arising from the known risk allele, and if they were able to distinguish between the HLA‐associated and non‐HLA‐associated (control) alleles. The impact of using homology models on the docking performance and how using different parameters, such as including receptor flexibility, affected the docking performance were also investigated to simulate the approach where a crystal structure for a given HLA allele may be unavailable. The programs that were best able to predict the binding position of Abacavir were then used to recreate the docking seen for Carbamazepine with B*15:02 and controls alleles. It was found that the programs investigated were sometimes able to correctly predict the binding mode of Abacavir with B*57:01 but not always. Each of the software packages that were assessed could predict the binding of Abacavir and Carbamazepine within the correct sub‐pocket and, with the exception of ROSIE, was able to correctly distinguish between risk and control alleles. We found that docking to homology models could produce poorer quality predictions, especially when sequence differences impact the architecture of predicted binding pockets. Caution must therefore be used as inaccurate structures may lead to erroneous docking predictions. Incorporating receptor flexibility was found to negatively affect the docking performance for the examples investigated. Taken together, our findings help characterise the potential but also the limitations of computational prediction of drug‐HLA interactions. These docking techniques should therefore always be used with care and alongside other methods of investigation, in order to be able to draw strong conclusions from the given results.

and HLA-A allele (A*01:01) which could be assumed to not be associated as they are seen at a high 1 9 4 frequency across European origin (Caucasian) populations (average frequencies obtained from AFND 1 9 5 using gold standard populations [9]: B*57:01 = 0.03, B*07:02=0.10 and A*01:01=0.14). 1 9 6 The allele structures were obtained from the PDB database, where available. Models were made for 1 9 7 those alleles where the structure is not publicly available (Table 1). Target and template sequences 1 9 8 were aligned with ClustalX [42]. For each modelling exercise, ten models were generated using 1 9 9 Modeller 9.9 automodel class and the model with the lowest objective function score was chosen as 2 0 0 the model for docking. This objective function is a score generated from the spatial restraints and the 2 0 1 CHARMM energy terms, reflecting stereochemistry within the structure [43].In these simple cases 2 0 2 there was no need to explore alternative target-template alignments since sequences could be aligned 2 0 3 unambiguously with no insertions or deletions.  The Abacavir risk and control alleles were used to evaluate the homology modelling as the known 2 0 9 structures are available for each allele investigated and so can be compared with the model structure 2 1 0 and docking results. Two models were created for B*57:03, one with one template allele (B5703_m) 2 1 1 and another with two template alleles (B5703_m2). These models could then be compared to the 2 1 2 known structure of B*57:03 (B5703_s), as could the docking predictions, to understand the influence 2 1 3 of these steps when employed in a typical docking protocol. The structure of B*57:01 was also 2 1 4 modelled (B5701_m), from two similar sequences identified to make similar comparisons and 2 1 5 evaluate the reliability of using homology modelling.

1 6
For the Carbamazepine risk associated allele B*15:02, there are four differing residues with control 2 1 7 allele B*15:01. Three of these lie in the peptide binding groove, with only one of these being vital to 2 1 8 the D-pocket architecture, where the Carbamazepine is predicted to bind (pos 156). Only a single 2 1 9 template, the structure of B*15:01, was therefore used to model B*15:02 (B1502_m).

0
The quality of each model was investigated using Ramachandran plots and QMEAN scores. For the 2 2 1 B5701_m and B5703_ m and m2, RMSDs were used to give a measure of how well the models 2 2 2 represent the known structures. The percentage sequence identity for each of the templates used for 2 2 3 each model are shown in S1 In order to identify the flexible side chains for the target structure, the relax function of Rosetta was 2 2 6 used to explore the conformational properties of each residue. By looking at 10 different relaxed 2 2 7 structures for each target, flexible residues can be identified. This allows us to consider the flexibility 2 2 8 of the target structure when using AutoDockFR. When using the ROSIE server, a similar sampling is 2 2 9 incorporated into the docking procedures [34]. Using SwissDock [31], the default parameters search the whole target structure but by setting the 2 3 3 search space parameters, it is possible to restrict binding to perform a local docking assay using the 2 3 4 known binding pocket. Here, the area of interest is the peptide binding groove, including residues 1-2 3 5 180 of the alpha chain. The search space was therefore restricted to this area of interest. The file was 2 3 6 processed to ensure it was in the correct format to be uploaded to SwissDock. This included removing 2 3 7 the ligand and peptide from the structure. The PDB was then passed through the Prepdock server [50] 2 3 8 to prepare the structure for docking. This prepared file was then submitted to SwissDock with the 2 3 9 relevant known drug structures. SwissDock used the ZINC database to obtain the known structures of Abacavir and peptide from 3VRI and this time also removing the water molecules from the target 2 4 4 structure, as Rosetta is unable to natively recognise these residues. The Abacavir drug structure was 2 4 5 extracted from the relevant PDB (3VRI) and converted to SDF format. This was then submitted to the 2 4 6 server to be docked, with the search space specified to centre on the peptide binding groove. This 2 4 7 process was repeated for each of the risk and control alleles to be investigated. Using SwissDock, it can be seen that the B5701_s example gives a docking solution close to the 3 1 4 native position (Fig 2 a-c) with all poses showing binding in the F-pocket. When using AutoDockFR 3 1 5 assuming the receptor to be rigid, similar results were seen to those using SwissDock (Fig 2 d-f). All 3 1 6 poses for the B5701_s risk allele were shown bound in the same pocket as the known binding 3 1 7 position. Little difference was seen between the docking shown for each search space, with only one 3 1 8 pose from the B*57:01 run showing binding outside of the peptide binding groove when the search 3 1 9 space was extended around the whole protein (data not shown). This indicates that the peptide binding 3 2 0 groove is the most favourable region for binding. It can be concluded overall that the Abacavir docks 3 2 1 in the expected binding pocket for these packages, but they cannot predict the exact native pose. As the ROSIE server allows movement of side chains within the target, the modelled structure of the 3 3 2 B*57:01 risk allele target was compared to the B5703_s structure submitted. This resulted in the 3 3 3 Table 2 with B5701_s giving similar average RMSDs to the control alleles.

4
Comparing the lowest RMSDs, the control structures give poses with lower RMSDs than B5701_s. 3 3 5  When docking the Abacavir structure using AutoDock Vina, the exhaustiveness was investigated 3 4 0 using the B*57:01 example (B5701_s) to optimise the protocol before docking the other non-risk 3 4 1 alleles. Using the default exhaustiveness of 8, the RMSD values were shown to be quite variable 3 4 2 ( Table 2). The exhaustiveness was then increased starting at exh=18 and doubling the exhaustiveness 3 4 3 to see the effect. It was found that the exhaustiveness of 112 gave the lowest RMSDs overall and 3 4 4 these did not improve with further increasing of exhaustiveness 3 4 5

Most docking software assessed can distinguish between risk and control alleles 3 4 6
Two methods were used to investigate if the docking software can distinguish between the risk alleles. When the flexible residues were incorporated for the 3VRI docking with Abacavir example, poses 4 1 5 occupied the whole peptide binding groove and did not favour the F-pocket as expected (Fig 6). When 4 1 6 the scores for the poses found inside the F-pocket were compared to those found outside the F-pocket, 4 1 7 it was also seen that these scores did not favour the F-pocket (data not shown). Thus, although the 4 1 8 complex algorithms developed for AutoDockFR have been shown to improve the success of docking 4 1 9 the difficulties of using docking to investigate these challenging HLA-ADR cases as in general 4 4 7 docking for new ADRs will be performed, for example, on structures that have a peptide already 4 4 8 bound but not the drug that is to be docked. Docking Carbamazepine with B1502_m using SwissDock, the poses were predicted to sit in the D-4 5 6 pocket previously identified as of interest. Looking at the docking results (Fig 8a), it can be seen that 4 5 7 Carbamazepine is predicted to bind in the D-pocket of B*15:02, close to Leu156, identified by the 4 5 8 study as important, with only one pose predicted out of this pocket. Using AutoDockFR, the same 4 5 9 general pattern was seen with the D-pocket generally being favoured for B*15:02 (Fig 8b). Using The purpose of this exercise was to compare multiple docking programs to assess their performance 4 7 2 with these challenging HLA-ADR cases. We used the Abacavir example as the benchmark for the 4 7 3