Informatics investigations into anti-thyroid drug induced agranulocytosis associated with multiple HLA-B alleles

Introduction Adverse drug reactions have been linked with HLA alleles in different studies. These HLA proteins play an essential role in the adaptive immune response for the presentation of self and non-self peptides. Anti-thyroid drugs methimazole and propylthiouracil have been associated with drug induced agranulocytosis (severe lower white blood cell count) in patients with B*27:05, B*38:02 and DRB1*08:03 alleles in different populations: Taiwanese, Vietnamese, Han Chinese and Caucasian. Methods In this study, informatics methods were used to investigate if any sequence or structural similarities exist between the two associated HLA-B alleles, compared with a set of “control” alleles assumed not be associated, which could help explain the molecular basis of the adverse drug reaction. We demonstrated using MHC Motif Viewer and MHCcluster that the two alleles do not have a propensity to bind similar peptides, and thus at a gross level the structure of the antigen presentation region of the two alleles are not similar. We also performed multiple sequence alignment to identify polymorphisms shared by the risk but not by the control alleles and molecular docking to compare the predicted binding poses of the drug-allele combinations. Results Two residues, Cys67 and Thr80, were identified from the multiple sequence alignments to be unique to these risk alleles alone. The molecular docking showed the poses of the risk alleles to favour the F-pocket of the peptide binding groove, close to the Thr80 residue, with the control alleles generally favouring a different pocket. The data are thus suggestive that Thr80 may be a critical residue in HLA-mediated anti-thyroid drug induced agranulocytosis, and thus can guide future research and risk assessment.


Introduction
Adverse drug reactions have been linked with HLA alleles in different studies. These HLA proteins play an essential role in the adaptive immune response for the presentation of self and non-self peptides. Anti-thyroid drugs methimazole and propylthiouracil have been associated with drug induced agranulocytosis (severe lower white blood cell count) in patients with B*27:05, B*38:02 and DRB1*08:03 alleles in different populations: Taiwanese, Vietnamese, Han Chinese and Caucasian.

Methods
In this study, informatics methods were used to investigate if any sequence or structural similarities exist between the two associated HLA-B alleles, compared with a set of "control" alleles assumed not be associated, which could help explain the molecular basis of the adverse drug reaction. We demonstrated using MHC Motif Viewer and MHCcluster that the two alleles do not have a propensity to bind similar peptides, and thus at a gross level the structure of the antigen presentation region of the two alleles are not similar. We also performed multiple sequence alignment to identify polymorphisms shared by the risk but not by the control alleles and molecular docking to compare the predicted binding poses of the drug-allele combinations.

Results
Two residues, Cys67 and Thr80, were identified from the multiple sequence alignments to be unique to these risk alleles alone. The molecular docking showed the poses of the risk alleles to favour the F-pocket of the peptide binding groove, close to the Thr80 residue, with the control alleles generally favouring a different pocket. The data are thus suggestive that Thr80 may be a critical residue in HLA-mediated anti-thyroid drug induced agranulocytosis, and thus can guide future research and risk assessment. PLOS  Introduction Adverse drug reactions have been linked to Human Leukocyte Antigens (HLA) in multiple different studies, where an individual carrying a specific risk allele has a higher risk of developing a reaction to that drug, including skin conditions like Stevens-Johnson syndrome and toxic epidermal necrolysis (SJS/TEN) and drug induced liver injury [1][2][3]. HLA proteins play a role in the adaptive immune response, presenting peptides to the T-cell receptors. Non-self peptides are then recognised and elicit an immune response where appropriate [4]. Occasionally, unnatural interaction of drugs during this process results in an adverse drug reaction. The role of HLA in these adverse drug reactions has been hypothesised in three main ways: the Hapten model, the Pharmacological Interaction model and the Altered Peptide Repertoire model. The Hapten model predicts the drug binds covalently to a self-protein and is processed via HLA molecules; this drug-protein combination is presented and recognised as being nonself, initiating an immune response [5]. The Pharmacological Interaction (PI) model predicts the drugs bind directly to the TCR or via the formation of HLA-drug complexes which activate T cells and thus initiate an immune response without the need for a specific peptide ligand [6]. Except where noted below, the results presented here work under the assumption that the drugs we are investigating here will follow the Altered Peptide Repertoire model, with the drug interacting non-covalently with the HLA molecule directly within the antigen presentation site. This leads to a difference in the self-peptide set that is presented to the T-cell receptors and thus, initiating an immune response [5]. For more details on the different mechanisms, and schematic diagrams, see Yun et al. (2016). Currently, the most widely investigated HLA-ADR association is that of abacavir with B � 57:01. For this association, the crystal structure of the drug bound in complex with the risk allele is available [7,8]. Illing et al. [7] and Ostrov et al. [8] have demonstrated the Altered Peptide Repertoire model with high confidence for abacavir, including the crystal structure for abacavir bound in the peptide binding groove of the associated risk allele, B � 57:01, along with proteomics evidence for different peptides being presented in the bound and unbound cases [7]. The peptide binding groove of the HLA is a long hydrophobic cleft formed between the αhelices and β-sheet platform. This cleft is much larger than the naturally involved binding sites that proteins have for small organic molecules. The peptide binging groove contains six subsites (Figure D in S1 File). The size and stereochemistry of the subsites are determined by the polymorphic residues along the cleft [9]. The specificity of peptide binding is determined, in part, by the interactions between anchor residues on the peptide side chains at two or more of these subsites [10].
Anti-thyroid drugs are used to treat hyperthyroidism as they normalise thyroid function through binding to the thyroid peroxidase enzyme [11]. These drugs are thioamides containing a thiocarbonyl group and a thiourea moiety within a heterocyclic structure. The common agents used are methimazole, carbimazole and propylthiouracil [12]. Agranulocytosis has been defined as absolute neutrophil count below 500/μl of blood and includes not only neutrophil count but also the absolute number of eosinophils, basophils and mast cells [13]. Patients with severe neutropenia are likely to experience infections which may be life-threatening or even fatal. The mechanism of anti-thyroid induced agranulocytosis is through either direct toxicity or immune-mediated toxicity [13]. The incidence of agranulocytosis in England and Wales has been estimated at around 7 cases per million people per year, with an adjusted odds ratio for neutropenia of 34.7 for users of thyroid inhibitors (i.e. anti-thyroid drugs) [14]. Most cases of agranulocytosis are idiosyncratic reactions to drugs or their metabolites, including anti-thyroid drugs. Other causes include splenic sequestration, nutritional deficiencies, infections, immune neutropenia, haematological disease and primary congenital or chronic neutropenia [15]. An increased risk of agranulocytosis has been associated with anti-thyroid drugs carbimazole, methimazole and propylthiouracil for three different HLA alleles in Asian and Caucasian populations, these associations are shown in Table 1.
The structures of methimazole and propylthiouracil are shown in Figure F in S1 File. It can be seen that these two associated drugs share a common thiocarbonyl group, which is also seen in the experimental anti-thyroid drugs. The associated anti-thyroid drugs share a common target, thyroid peroxidase, in their normal mechanism of action [20]. A study by Pradhan et al. conducted molecular docking of methimazole and propylthiouracil with thyroid peroxidase. The results of this study predicted both drugs to bind in the same position, with the sulphur group forming a hydrogen bond with Arg491 of the thyroid peroxidase, resulting in inhibition of thyroid hormone production [11]. As these drugs show similar binding to the same target, it is reasonable to hypothesise that both drugs might also have a shared mechanism for the adverse drug reaction, going some way to explaining why multiple drugs have been associated with the same alleles. It can be noted that methimazole is more commonly associated than propylthiouracil, although neither drug has been studied in isolation [16,17,19]. In the He et al. study, there were 26 methimazole agranulocytosis cases compared to 3 for propylthiouracil. For the Chen et al. study, of the 42 thioamide-induced agranulocytosis cases, it was stated that 9 of these were patients taking carbimazole, 9 propylthiouracil and 23 methimazole. For the Hallberg et al. study, 39 cases were induced by anti-thyroid agents, of these 29 were methimazole, 5 carbimazole and 5 propylthiouracil. Although all three anti-thyroid drugs have been incorporated into the association studies, our study focuses on the associations seen with methimazole and propylthiouracil. Carbimazole is the pro-drug of methimazole, responsible for the antithyroid activity, and has a short half-life of 5.3-5.4 hours with peak plasma concentrations of methimazole being present after 1 or 2 hours [21,22]. We are working under the assumption that the mechanisms involved in the adverse drug reaction follow the altered peptide repertoire model, the active drug methimazole would be most likely to interact with the HLA during this process and so carbimazole is excluded from this analysis. B � 38:02 has a very similar sequence to B � 38:01 with only one residue difference (B � 38:02 Thr80Ile B � 38:01). It therefore is highly possible that B � 38:01 could also be a risk allele in this context. B � 38:02 (Figure G in S1 File) is most commonly found in Asian populations (0.69%-6.6%, frequencies taken from Allele Frequency Net Database (AFND) [23]), with Caucasian (0.003-0.2%) and African populations (0.007-0.06%) having much lower frequencies. B � 38:01 is more commonly found in Caucasian populations (0.6-6.7%), compared to Asian populations (0.15-1.0%) and therefore allele frequencies for B � 38:01 in the Asian populations studies are likely not detected ( Figure H in S1 File). The Hallberg et al. study (which found B � 27:05 to This study showed poses favouring both the B-and F-pockets along the peptide binding groove. Four suspected key residues were identified: Cys67, Asn77, Thr80 and Thr123. Their study focussed on the associations seen in Taiwan individuals and therefore does not include investigations into the B � 27:05 alleles seen in Caucasian populations. Due to the differences in general structure between Class I and Class II HLA, it is difficult to compare docking predictions between the two allele classes. The B � 38:02 and B � 38:01 alleles were modelled using the same template structures, with the DRB1 � 08:03 structure modelled using only DRB1 � 01:01 as a template. The quality of the models used for molecular docking can impact the docking results and therefore the selection of templates is very important.
The purpose of this study is to investigate the associated alleles, in particular looking for commonalities between the two associated HLA-B alleles in order to look for similarities in these alleles that might shed light on the underlying mechanisms of the adverse drug reactions seen.

Sequence and structural analysis
In order to conduct a reliable analysis between cases and controls, the control alleles must first be carefully selected to ensure they can be reliably assumed to be non-associated. The case and control frequencies from He et al. [16], Chen et al. [17] and Hallberg et al. [19] were compared alongside healthy individual frequency data obtained from AFND [23]. Alleles where the study control allele frequency or healthy individual frequencies sourced from AFND were similar to or greater than the case allele frequencies were selected as controls (S1 File, i. Control allele selection). These alleles could safely be assumed to be non-associated as they do not show enrichment in the case groups.
Structural differences were assessed between the risk and selected control alleles. Firstly, the peptide binding regions were compared, this is likely where the drug binding would occur and so differences here would be important for the mechanism of interaction involved in the adverse drug reaction. The peptide binding regions were compared using MHC motif viewer [24] and MHCcluster [25]. MHC motif viewer was used to compare the predicted binding motifs for the B � 27:05 and B � 38:02 risk alleles as well as the B � 38:01 possible risk and the selected control alleles. MHCcluster was similarly used to compare the global similarities of peptide binding predictions for the risk, possible risk and selected control alleles. Both the MHC motif viewer and MHCcluster post-process NetMHCpan scores to give predictions of motifs or similarities between peptides. NetMHCpan uses artificial neural networks to predict the peptide binding of HLA molecules based on IEDB experimental data of peptides known to be presented by given alleles, including data for the binding of peptides to B � 27:05 and B � 38:01 [26,27]. Therefore, the predictions generated will be based largely on experimental data.
The protein molecules were further compared using multiple sequence alignment to view differences across the whole protein and also individual residue changes within sub-pockets along the peptide binding groove between the risk and control alleles. Firstly, the sequences for the risk, possible risk and control alleles were aligned. The alignments were then extended to look at common alleles selected from AFND and NMDP (National Marrow Donor Program) [19,23] Asian and Caucasian populations and extended further again to consider common alleles found in all populations. When considering the common alleles, it is important to note that the definitive risk profile (association status) of these alleles is unknown. Although, due to the high frequency and the fact that they have not been seen to be associated with agranulocytosis, it can be assumed that these alleles are likely not associated. These multiple sequence alignments allow us to look for residue similarities unique to the risk alleles and therefore identify residues which may be involved in the mechanism of binding for the adverse drug reactions.

Molecular docking
Molecular docking was used to compare the predicted binding sites between risk and control alleles. Crystal structures were obtained from the PDB database where available. For those alleles where the crystal structures are not available, the structures were predicted using Modeller [28,29]. Crystal structures, given the suffix '_S', were available for B � 27:05 (1OGT [30]), B � 15:01 (1XR9 [31]) and B � 51:01 (1E27 [32]). Homology models, given the suffix '_M', were created for B � 38:02, B � 38:01, B � 40:06, B � 46:01 and B � 54:01 (Table B in S1 File, S1 Models). Structures and sequences of three similar alleles were obtained searching the PDB database using BLAST-P [33]. The structures for these similar alleles, with high sequence identity (e.g. 95-98% identity for the B � 38:02 templates), were then used as templates for Modeller. Target and template sequences were aligned with ClustalX [34] and ten models were made for each structure, using Modeller 9.9 automodel class [35]. The model with the lowest objective function was chosen for the docking. Table B in S1 File summarises the structures and models obtained. Drug structures for methimazole and propylthiouracil were both obtained from the PDB database (5FF1 [36] and 5HPW [37] respectively).
AutoDockFR [38] was used, with default parameters, to dock both methimazole and propylthiouracil with the B � 27:05 and B � 38:02 risk alleles, B � 38:01 possible risk allele and the selected control alleles. Structural PDBQT files were prepared using AutoDock Tools [39] for both the alleles and drug structures. AutoGrid [40] was used to map the target allele structures and select grid points in order to search both the peptide binding groove (PBG) and the top three largest pockets (Top3). AutoGrid is a graphical user interface for specifying the docking box. This was used to identify pockets within the target structures, the pockets of interest were then selected and the search box calculated to encompass these pockets. The box sizes and positions were therefore automatically generated, centred around the areas of interest. The top three largest pockets were selected by pocket volume. Searching the largest pockets on the protein increases the search space to cover more of the protein and allows an alternate binding pose away from the peptide binding region. This helps identify if the peptide binding groove is in fact the most favourable binding region, ensuring the most favourable poses are obtained. A total of 10 poses were obtained for each allele for each drug and each search space, resulting in 20 poses per allele for each drug. CSM-lig [41] was used to compare the predicted binding affinities of each pose for each drug-allele combination. These predictions were used to compare the binding affinities between poses seen in the B-pocket and the F-pocket in risk and control alleles. These predicted results were used alongside the docking scores from Auto-DockFR to help identify the most favourable pocket for each of the drug-allele combinations. Further analysis was completed conducting 100 runs for each drug-allele combination (S1 File, ii. Docking analysis), this allowed further analysis of the favouring of binding pockets. These poses were then automatically assigned positions (i.e. binding in B or F pocket) using k-mean analysis and used to visualise the favouring of binding poses for each drug-allele combination. Visual inspection of the docking poses showed two distinct groups within the B and F pockets ( Figure B in S1 File), therefore k was defined as 2. The co-ordinates of the C1 atom of each pose were used for the clustering, allowing clustering of dissimilar ligands with differing number of atoms. LigPlot [42] was used to visualise the interactions for each of the lowest scoring poses for each drug-allele combination, in order to compare the residues involved with binding.
In order to investigate how the size and structure of the ligands and pockets could be having an impact on the molecular docking results, structurally similar molecules were docked to each of the risk and control alleles. The PDB database was searched for ligands with �50% similarity to methimazole and propylthiouracil, using the chemical components search to compare chemical descriptors of ligands within the chemical component dictionary [43]. Four ligands were selected: MZY (1,3-dihydroimidazole-2-thione) and TUL (2-thioxo-2,3-dihydropyrimidin-4(1H)-one) have been used as experimental antithyroid drugs and include a thiocarbonyl group, DMI (2,3-Dimethylimidazolium Ion) and EV0 (2-amino-6-propylpyrimidin-4(3H)-one) do not include the thiocarbonyl group, (Figure F in S1 File). These structures were prepared in the same way as the associated anti-thyroid drugs, using AutoDock Tools [39]. AutoDockFR [38] was used to dock the non-associated compounds with B � 27:05 and B � 38:02 risk alleles, B � 38:01 possible risk allele and selected control alleles. The binding poses of these were then compared to those of the associated drugs, in order to deduce if the size and structures of the ligands and pockets could be having an impact on the molecular docking results.

Sequence and structural analysis
We firstly investigated whether allele frequencies for controls were representative of larger populations available in similar regions. This allows us to test for potential biased sampling in source studies, especially as controls have been combined from different countries, as well as to determine our own control (non-associated) alleles for further comparison. Case and control frequencies were calculated from the data provided for the He et al. and Hallberg et al. studies [16,19]. Alleles with study control frequency over 3% were investigated. Five alleles were selected to be used as controls: B � 15:01, B � 40:06, B � 46:01, B � 51:01 and B � 54:01. It can be reasonably assumed that these alleles are non-associated alleles based on the frequency data of both the Han Northern China and European populations ( Figure A in S1 File). The control allele selection is covered in more detail within S1 File (i. Control allele selection).
The MHC Motif viewer [24] displays the preference for given HLA alleles to bind peptides with amino acids at given positions within an n-mer peptide e.g. 9mer for HLA class I. The motifs have been generated via the NetMHCpan [26] prediction method being run over a large selection of natural peptides, which has been trained originally with experimental data (peptides presented by given HLA alleles) from the IEDB database [27]. While the predicted motifs cannot give us a direct measure of the likelihood of a drug to bind in the cleft of a given HLA allele, they can be indicative of whether different alleles share similar peptide binding regions. If the hapten hypothesis of HLA-mediated ADRs was true for anti-thyroid drugs, then the peptide binding ability of associated alleles might be related, with the caveat that drugs binding to peptides would likely change their affinity for particular alleles. Comparing the peptide binding motifs from MHC Motif Viewer (Figure J in S1 File) it can be seen that although, as expected due to the high sequence similarity, the B � 38:02 and B � 38:01 alleles show very similar binding motifs, the B � 27:05 allele shows a very different motif. This shows that there are differences between the favoured peptides and thus the peptide binding grooves of the B � 27:05 and B � 38:02 risk alleles. A similar observation can be made from the output for MHCcluster (Fig 1), with differences seen between B � 27:05 and B � 38:02. MHCcluster is also based on predictions from random human peptides processed by NetMHCpan and generates a distance measure between the peptide binding specificity scores generated by the software to create a tree representation. Similar to MHC Motif Viewer, the results can tell us about overall relatedness of predicted peptide binding by different alleles, but not directly about the likelihood of drug binding. From the tree-based output, it can be seen that the B � 38:02 and B � 38:01 alleles show very similar clustering, whereas the B � 27:05 allele shows clustering differing from this and is not more closely related to the other risk allele than any of the selected control alleles. We can conclude therefore that B � 38:02 and B � 38:01 alleles bind highly similar pools of peptides, whereas B � 27:05 binds a largely distinct set of peptides. More generally, the differences seen between the favoured peptides of the risk alleles allows us to conclude that these alleles are not structurally similar.
Multiple sequence alignments were used to investigate individual residue differences between alleles. Comparing the B � 27:05 and B � 38:02 risk alleles with the B � 38:01 possible risk and selected controls, it can be seen that there are two residues which are seen to be unique to the risk alleles: Cys67 and Thr80 (Fig 2). These are both residues that were identified as potentially important in the Chen et al. study [17]. When the comparisons were extended to look at common alleles, obtained from searching AFND for Caucasian and Asian populations and also common alleles found in the NMDP database [19] (Figure K in S1 File), a similar pattern can be seen. These two residues are rarely found in these common alleles, although it must be noted that the association status of these common alleles is unknown.

Molecular docking
Methimazole. Methimazole was docked with each of the risk, possible risk and selected control alleles, using AutoDockFR [39], in order to compare the favourable binding poses in each case. Table 2   It has previously been shown that the process and parameterisation of homology modelling may have an impact on the molecular docking results, compared to docking within a crystal structure [44]. It can be seen here that the control alleles favouring the F-pocket are generally the modelled structures, with the B � 15:01 and B � 51:01 known crystal structures showing favouring of the B-pocket. Similarly, B � 38:02 and B � 38:01 are both modelled structures. Due to the very high sequence similarity between these alleles, it may be that these alleles show similar binding due to the similarity of the modelled structures. Two alleles that differ little will often show similar models, although larger differences would likely have an effect. These models were shown to be structurally similar with an RMSD of 0.460Å (221 to 221 atoms).
Propylthiouracil. Both methimazole and propylthiouracil have been associated with drug induced agranulocytosis with B � 27:05 and B � 38:02. Propylthiouracil was therefore also docked with the risk, possible risk and selected control alleles using AutoDockFR [39]. The docking results were then compared between alleles and with the methimazole results to identify difference in favourable binding poses between the drug-allele combinations. Table 3 Figure E in S1 File) comparing the predicted binding affinities, this may be due to a weaker association being seen than with the abacavir drug-allele associations, resulting in less favourable binding. We next expanded the docking analysis to 100 runs to explore whether further trends in poses found for B and F pockets could be observed (Fig 5). From this we can see consistent favouring of the F pocket for the risk alleles. Although the docking is not able to fully distinguish between the risk and controls, with the controls showing mixed favouring of the B and F pockets, it is evident that docking against the risk alleles strongly favours binding in the F pocket for each of the drugs.
Similar molecules to the investigated methimazole and propylthiouracil were identified: MZY, TUL, DMI and EV0. The similar molecules were docked to each of the risk and control alleles in order to deduce if the size and structures of the ligands and pockets could be having an impact on the molecular docking results ( Figure F in S1 File). From these investigations, it was found that the similar molecules that have been used as experimental anti-thyroid drugs and included the thiocarbonyl group (MZY and TUL), showed similar binding patterns to the associated anti-thyroid drugs with the risk alleles favouring the F-pocket. The possible risk B � 38:01_M allele can be seen to favour the B-pocket by scores but the F by number of poses for MZY and favour the F for TUL (Table C in S1 File). Control alleles B � 15:01_S and B � 46:01_M both favour the B via lower scores and more poses for both drugs. With control B � 51:01_S also favouring the B for TUL considering scores but not poses. For the other   Figure N in S1 File. Table 4 summarises the number of poses making hydrophobic interactions with position Thr80 of both risk alleles for each of the investigated drugs. It can be seen that for B � 27:05 the associated drugs often make Thr80 interactions with interactions with the thiocarbonyl group being most favourable. The experimental drugs show similar interactions with Thr80 as those  [17]. This adds strength to the hypothesis that the Thr80 position could be involved in the mechanism here, especially for B � 27:05. The structure of the ligands, mainly the 'S' group found in the associated and experimental anti-thyroid drugs, could therefore be potentially important for the predicted binding poses seen here.

Discussion
The purpose of this study was to investigate the associations seen between HLA and anti-thyroid alleles, focusing on the commonalities seen between the HLA-B associated alleles, to Informatics investigation of anti-thyroid drug induced agranulocytosis identify a potential shared mechanism. This was done through comparing the peptide binding regions, including the whole peptide binding groove and specific residue changes alongside the predicted binding poses of the drugs with each of the risk and control alleles. It was found that the risk alleles favour different peptides and so we can conclude that the gross structures of their peptide binding grooves are rather dissimilar. However, when the multiple sequence alignments were used to focus on specific residue changes, it could be seen that two residues were found to be unique to the risk alleles. These two residues, Cys67 and Thr80, were therefore considered to be potentially important for the mechanism of action for the adverse drug reactions seen, confirming the results of a previous study by Chen et al. where the Cys67 and Thr80 were identified, amongst others, as potentially important for the binding of the associated drugs with the risk alleles [17]. From the results of the molecular docking, the risk alleles were shown to favour the F-pocket for both drugs. This pocket lies alongside the Thr80 residue which was identified as potentially important for the mechanism of action. It was seen that the Thr80 interacts hydrophobically with the thiocarbonyl group of both associated drugs in B � 27:05, with similar interactions also being seen, but to a lesser extent, with B � 38:02. The residue at position 80 of the control alleles was seen to only make interactions with the methimazole for B � 54:01, although these were different interactions from those seen for the risk alleles. For propylthiouracil, this position made interactions with the molecule for three of the five control alleles, although these were again seen as different interactions to the risk, with the B � 38:01 possible risk showing similar interactions to the B � 27:05 risk allele. It is therefore reasonable to conclude that Thr80 could be involved in the mechanism of interaction, if only indirectly by influencing the conformations of the surrounding residues located around the Fpocket (Figure Q in S1 File).
Although the docking results for the risk alleles showed consistent results, with the risk alleles always favouring the F-pocket, the predicted poses for the control alleles showed some variation with some drug-allele combinations favouring the B-pocket and some favouring the F-pocket. Docking has previously been shown to be imperfect when considering these complex HLA cases [44], it is therefore understandable that the docking was unable to distinguish completely between risk and control for this case. Docking results are commonly reported as the most favourable pose, here we show a comparison of multiple runs as well as comparisons with selected control alleles. The risk alleles are shown to continuously favour the F-pocket over many runs, providing evidence that this is more likely position of binding. The variation  No Thr80  Thr80-S  Other Thr80  Most favourable pose  No Thr80  Thr80-S  Other Thr80  Most favourable pose   MMZ  5  4  1  Thr80-S  10  0  0  No Thr80   PTU  3  5  2  Thr80-S  5  2  3  Other Thr80   MZY  1  2  7  Other Thr80  9  0  1  No Thr80   TUL  5  2  3  Thr80-S  10  0  0  No Thr80   DMI  9  0  1  No Thr80  10  0  0  No Thr80   EV0  10  0  0  No Thr80  5  0  5 Other seen between controls could be due to a number of factors including the homology modelling of the protein structures. As discussed in a previous study [44], homology modelling can impact the docking performance and inaccurate models could result in inaccurate docking results. Here, the similar alleles B � 38:02 (risk) and B � 38:01 (possible risk), with one mutated residue at position 80, were docked with both of the associated drugs. These alleles showed similar binding patterns with both alleles generally favouring the F-pocket. Since we have concluded here that the Thr80 could potentially be involved in the mechanism of action, it would be expected that the B � 38:01 would show different docking results to the risk alleles as this allele does not possess a Threonine residue at position 80. However, both the B � 38:02 and B � 38:01 structures were obtained through homology modelling. This could impact the results of the molecular docking as although the mutation has been modelled, it may not have been accurately represented here and the similar structures could produce similar models. The results seen here confirm and build on those seen in the Chen et al. study [17], with Thr80 being identified as important for the mechanism of the adverse drug reaction. The Chen et al. study conducted docking of B � 38:02, B � 38:01 and DRB1 � 08:03 with methimazole and propylthiouracil. Models were created for B � 38:02 and B � 38:01 using the same five templates (A � 24:02, C � 08:01 and A � 02:01 along with mouse MHC and human HLA-E) and the model for DRB1 � 08:03 created using one template (DRB1 � 01:01). The templates used for the B alleles show similarities of 83-86% with B � 38:02 and the DRB1 � 01:01 template shows 92% identity with DRB1 � 08:03. In this study, the B � 38:02 model was created using templates with 95-98% identity with differing templates with similar identity being used for the other modelled structures created. Model selection is a very important aspect of molecular docking and can greatly impact the docking results seen [44]. This study was able to recreate similar docking results seen previously for B � 38:02 and B � 38:01, using our own modelled structures, whilst also incorporating docking results for the B � 27:05 risk allele and comparisons with selected control alleles. Here, we also went further to investigate the structural differences between the associated risk alleles and selected controls, through sequence alignments and comparison of binding motifs.
In order to further our understandings of the mechanisms involved here and to test the hypothesis that the Thr80 is important for binding, it would be interesting to investigate B � 38:01 further as this is a good potential biological control due to its similarity with the associated B � 38:02. For this, further association studies to confirm the association status of B � 38:01 would be needed. Structural analysis through crystal structures would be needed to confirm the binding predictions of the drug allele combinations investigated and therefore the involvement of Thr80 in the predisposition to this adverse drug reaction.