A potential implication of UDP-glucuronosyltransferase 2B10 in the detoxification of drugs used in pediatric hematopoietic stem cell transplantation setting: an in silico investigation

Sinusoidal occlusion syndrome (SOS) is a potentially severe complication following hematopoietic stem cell transplantation (HSCT) in pediatric patients. Treatment related risk factors such as intensity of conditioning, hepatotoxic co-medication and patient related factors such as genetic variants predispose individuals to develop SOS. The variant allele for SNP rs17146905 in UDP-glucuronosyl transferase 2B10 (UGT2B10) gene was correlated with the occurrence of SOS in an exome-wide association study. UGT2B10 is a phase II drug metabolizing enzyme involved in the N-glucuronidation of tertiary amine containing drugs. To shed light on the functionality of UGT2B10 enzyme in the metabolism of drugs used in pediatric HSCT setting, we performed in silico screening against custom based library of putative ligands. First, a list of potential substrates for in silico analysis was prepared using a systematic consensus-based strategy. The list comprised of drugs and their metabolites used in pediatric HSCT setting. The three-dimensional structure of UGT2B10 was not available from the Research Collaboratory Structural Bioinformatics - Protein Data Bank (RCSB - PDB) repository and thus we predicted the first human UGT2B10 3D model by using multiple template homology modeling with MODELLER Version 9.2 and molecular docking calculations with AutoDock Vina Version 1.2 were implemented to quantify the estimated binding affinity between selected putative substrates or ligands and UGT2B10. Finally, we performed molecular dynamics simulations using GROMACS Version 5.1.4 to confirm the potential UGT2B10 ligands prioritized after molecular docking (exhibiting negative free binding energy). Four potential ligands for UGT2B10 namely acetaminophen, lorazepam, mycophenolic acid and voriconazole n-oxide intermediate were identified. Other metabolites of voriconazole satisfied the criteria of being possible ligands of UGT2B10. Except for bilirubin and 4-Hydroxy Voriconazole, all the ligands (particularly voriconazole and hydroxy voriconazole) are oriented in substrate binding site close to the co-factor UDP (mean ± SD; 0.72 ± 0.33 nm). Further in vitro screening of the putative ligands prioritized by in silico pipeline is warranted to understand the nature of the ligands either as inhibitors or substrates of UGT2B10. These results may indicate the clinical and pharmacological relevance UGT2B10 in pediatric HSCT setting. With this systematic computational methodology, we provide a rational-, time-, and cost-effective way to identify and prioritize the interesting putative substrates or inhibitors of UGT2B10 for further testing in in vitro experiments.


Background
Sinusoidal obstruction syndrome (SOS) or venoocclusive disease is one of the complications associated with hematopoietic stem cell transplantation (HSCT) that occurs in 22-30% of pediatric HSCT patients [1]. Severe forms of SOS may lead to increased mortality rates reaching 80% [1]. The diagnosis of SOS post HSCT is based on Seattle, Baltimore or European society for blood and marrow transplantation (EBMT) criteria comprising important parameters such as increased bilirubin levels, weight gain, hepatomegaly and hemodynamical and/or ultrasound evidence of SOS [2]. In the context of HSCT in children, pre-existing liver disease, age, iron overload, and genetic variants, such as glutathione Stransferase A1 (GSTA1) promoter diplotypes, CTH genetic variant (rs1021737) [3] or GSTM1 null genotypes are few of the important known patient related risk factors for developing post HSCT SOS [3][4][5][6][7]. In addition, treatment-related factors, such as high-intensity conditioning regimens comprising of two or more alkylating agents including busulfan, or the co-administration of potentially hepatotoxic prophylactic drugs such as methotrexate and cyclosporine can also contribute to the increased risk of developing SOS [4]. Exposure to these hepatotoxic injuries elicits a more permeable endothelium, permitting the entry of blood cells in the Disse space. Once a pro-inflammatory environment is created, cellular debris accumulate in hepatic blood vessels. This cascade of events leads to the reduction of venous outflow and finally obstruction of centrilobular veins [8,9]. Patients' individual germ-line genetic polymorphisms can increase the risk of treatment related complications [4,5]. In search of predictive genetic variants for increased risk of post HSCT SOS development in children, our group recently conducted and exomewide association study (EWAS) that identified three variants rs17146905, rs16931326, rs2289971 in UGT2B10, BHLHE22, and KIAA1715 genes, respectively [10]. The three independent associations from EWAS were validated in a separate cohort after adjustment with the known risk factors of SOS [10]. Multivariate analysis performed in a replication cohort confirmed the important implication of 2 of those SNPs in the occurrence of SOS: KIAA1715 single nucleotide polymorphism (SNP) (rs16931326), coding for lunapark protein, involved in the formation of the endoplasmic reticulum and a potential drug metabolizing enzyme gene UGT2B10 SNP (rs17146905) [10]. The latter is located in 3′-untranslated regions (3′-UTR) of UGT2B10 which might affect its gene expression by regulating the messenger RNA (mRNA)-related processes, such as localization and stability of mRNA, or by directly modulating protein conformation inducing a potential change in the enzyme activity [11]. Interestingly, UGT2B10 expression and activity were demonstrated to be regulated by the micro-RNA (miRNA) miR-216b-5p, which binds to specific target miRNA recognition element located in the 3′-UTR part [12], comprising SNP rs139538767, which is known to alter the UGT2B10 gene regulation. However, the relationship between this SNP and the altered protein function is yet to be elucidated. It is interesting to note that the UGT2B10 SNP (rs17146905) is in strong linkage disequilibrium with the non-synonymous SNP rs61750900 located in exonic region (R 2 ≥ 0.8), in American and European populations (Data not shown). The latter genetic variant might also have an influence on UGT2B10 metabolizing capacity. Furthermore, the role of UGT2B10 in the metabolism of drugs used in the HSCT setting is not clearly known.
UGT2B10 belongs to a group of UDPglucuronosyltransferases (UGTs, EC 2.4.1.17), which are phase II conjugating drug metabolizing enzymes catalyzing the glucuronidation of endogenous compounds and xenobiotics [13]. The reaction involves the conjugation of the glucuronic acid group of UDP-glucuronic acid (UDPGlcA) to a nucleophilic acceptor (oxygen, nitrogen, or sulfur) that could be a rate limiting step in clearance of drugs/xenobiotics. Based on amino acid sequence similarities and identities, UGTs in humans are divided into four families (1,2,3 and 8), but glucuronidation reactions are catalyzed by UGT1 and UGT2 families comprising a total of 19 isoforms [14]. The UGT2 family is divided into two subgroups: UGT2A and UGT2B, comprising 3 and 7 isoforms, respectively. UGTs are ubiquitous proteins, principally expressed in the liver, the gastrointestinal tract, and the kidneys [15]. The crucial implication of genetic variants altering UGTs' function and consequently in the metabolism of drugs including anti-cancer therapies is well-known [16].
UGT2B10 is one of the liver specific UGT isoforms that has been shown glucuronidation activity similar to that of several other isoforms [17]. However, protein quantification experiments using LC-MS/MS from limited number of human (adults) liver microsomal preparations indicated relatively lower abundance of UGT2B10 compared to other UGTs [18]. Similar to UGT1A3 and UGT1A4, UGT2B10 catalyzes N-glucuronidation of endogenous and exogenous compounds, with a higher affinity towards tertiary aliphatic and heterocyclic amines [19]. Interestingly, UGT1A4 and UGT2B10 have common substrates such as antipsychotic and antidepressant drugs; for e.g. cyclobenzaprine, mirtazapine or clozapine [20]. As the UGT2B10 isoform is not known to catalyze O-glucuronidation, it was often considered as an orphan enzyme after the unsuccessful screening of large panels of substrates. However, in the last decade, after the demonstration of the primary role of UGT2B10 in the glucuronidation of tobacco-related nitrosamines [21], many other substrates were discovered. Examples of such ligands include antidepressants, antifungals, and chemotherapeutics; e.g. fluconazole, midazolam or imatinib [19,22] as well as arachidonic and linoleic acid metabolites [23]. An exhaustive list of UGT2B10 drug ligands can be found in Additional file 1.
Other members of the UGT enzymatic family (UGT1A1, 1A3, 1A6, 1A7, and 1A10) were demonstrated to be involved in the metabolism of drugs such as acetaminophen, lorazepam or deferasirox used in pediatric HSCT setting [24]. However, to the best of our knowledge, the evidence is lacking for the role of UGT2B10 in the metabolism of potential hepatotoxic drugs and/or their metabolites used in the pediatric HSCT setting. Regarding its role in the detoxification of a large panel of drugs, the altered activity of UGT2B10 due to drug-drug interactions or genetic variants may modulate the exposure to hepatotoxic amine-compounds used in HSCT setting administered along with a busulfan, known for its hepatotoxic effect. Besides, some compounds may also interact with UGT2B10 as enzyme inhibitors or activators and differentially regulate the Nglucuronidation of toxic compounds. For instance, such interactions were already demonstrated for fluconazole, which is an inhibitor of UGT2B10 [25].
Hence, we aimed to explore the role of UGT2B10 in the detoxification of commonly used medication in pediatric HSCT setting. We attempted to establish an in silico workflow for screening of the ligands using a homology modelled UGT2B10 protein. The potential ligands for further in vitro screening were identified based on results obtained from extensive molecular docking and molecular dynamics simulations.

Results
Generation of the three-dimensional model of UGT2B10 The UGT2B10 protein sequence comprised of 460 amino-acid residues without the signal peptide, transmembrane region and 8 residues surrounding transmembrane region. 250 potential templates or structural neighbors, based on Hidden-Markov model (HMM) profile similarities, were retrieved from HHpred [26,27] and filtered to select four suitable structural templates. Human UGT2B10 model was built with the following related protein structures or templates; UGT51 from S. cerevisiae (PDB ID: 5GL5), UGT76G from S. rebaudiana (PDB ID: 6INF), UGT2B7 from H. sapiens (PDB ID: 2O6L), and oleodamycin glycosyltransferase from S. antibioticus (PDB ID: 2IYA) ( Table 1) were used to build the multi-sequence alignment (MSA). The multiple sequence alignment covering UGT2B10 query sequence of 460 residues is given in Figure 1. Sequence alignment with the members of UGT family revealed putative catalytic residue i.e. D150 which is conserved and another residue H34 which is not conserved similar to that of UGT1A4.
To obtain the final human UGT2B10 homology model, six steps of structure refinement were performed, three with GalaxyRefine [28] followed by three steps with Yasara minimization server [29] (Additional file 2). The reason for using two minimization programs is to identify the optimized and reliable conformation of UGT2B10 for further molecular docking and MD simulation with putative ligands. The UGT2B10 model built using multiple templates had an ERRAT (overall quality factor) [30] score of 93.3% (Figure 2A), a Verify3D (compatibility of an atomic model) [31,32] score of 74.8% ( Figure 2B), and a ProSA (overall stereochemical quality of the protein structure) [33] score of − 8.78 ( Figure 2C). 92.8% of the residues are situated in the favored region of the Ramachandran plot, whereas 1.7% were situated in the outlier region ( Figure 2D). The overall structure comprised 460 amino acids, the secondary structure composed of 35% of alpha helix, and three parallel beta sheets (Additional file 3). This protein structure was further used in in silico virtual screening of selected ligands. The co-factor (UDPGlcA) binding site was predicted by superposition of the UGT2B10 model with the structural templates used for homology or comparative protein modelling. The cofactor binding box had a size of (x,y,z) = (15, 17, 22.5) and center coordinates of (x,y,z) = (− 70,8,3). The grid box of substrate-binding site had a size of (x,y,z) = (22,20,25) and center coordinates of (x,y,z) = (− 70, 7, 15) (Additional file 1) UDPGlcA was successfully bound to the human UGT2B10 model (Fig. 3A), with an estimated free energy of binding (ΔG) of − 7.0 kcal/mol (Kd = 7.3 μM). Hydrogen bonds are formed between the UDPGlcA and residues of UGT2B10 namely Glu8, Tyr9, Ser37, Ser287, Met288, Asn336, His350, and Glu358 ( Figure 3B).

Selection of putative UGT2B10 ligands
Eight drugs and seven metabolites of drugs used in HSCT setting are identified as putative UGT2B10 substrates using a consensus-based approach ( Figure 4, Table 2). The chemical structures of the molecules are presented in the Additional file 5. Three ligands (cyclosporine A, methotrexate, and posaconazole) are tertiary amines, the favored substrates of UGT2B10, and two are described as forming N-glucuronide conjugates. Four ligands have reported liver toxicity. The missing information is due to lack of clinical or in vitro data about the compound metabolism ( Table 2).     [34,35] *From LiverTox classification of the molecules regarding the potential to induce liver injury: A: Well known cause, B: Known or highly likely, C: Probable, D: Possible, E: not believed or unlikely [45]. GvHD: Graft versus host disease; HSCT: hematopoietic stem cell transplantation;N/A: not available data; SOS: sinusoidal occlusion syndrome; UDCA-G1 and UDCA-G2: ursodeoxycholic acid glucuronide conjugate 1 and 2 [44]. The structure of the molecules are presented in the additional file

Molecular docking analysis
The positive control amitriptyline (AMT) had ΔG of − 1.9 kcal/mol, whereas itraconazole (ITZ, chosen negative control based on the available literature [25]) had 19 kcal/mol, indicating that the ligand binding and interaction does not occur spontaneously. These results indicated that our predicted model of UGT2B10 is reliable, therefore has been implemented for the selection of putative UGT2B10 ligands. A negative ΔG value indicates that the reaction is spontaneous, thus that the conformation between the ligand and the enzyme is favorable. The more negative the results, the more affinity between the ligand and protein.
Recently, after we built the homology model of UGT2B10, three-dimensional structure of UGT2B10 was predicted using an Artificial Intelligence approach by deepmind team and was retrieved from the AlphaFold structure database [46]. This model exhibited better structure validation results (Additional file 2) and was utilized for the comparison with our model using molecular docking with experimentally proven substrates / inhibitors of UGT2B10 [25] (chemical structures of the selected molecules are given in Additional file 5). The comparison of the UGT2B10 structure between our model and Alpha Fold model showed a RMSD of 2.98 Å (Additional file 4C). Although, the RMSD value is slightly high, the overall architecture of two in silico models are nearly similar, interestingly, the position and orientation of UDP is also consistent in both models indicating that our model is reliable, and it does not have any geometry or other stereochemical errors as demonstrated using various structure validation tools. However, the secondary structural composition of both the models are slightly different from each other (Additional file 2). Molecular docking with AlphaFold model resulted in negative free binding energies for all the molecules including negative controls (Additional file 6) Whereas our model resulted in negative free binding energies for 4 out of 6 negative controls. No linear correlation between the predicted free binding energy and experimental IC 50 results was observed for our model or the AlphaFold model (Additional file 7). The results clearly indicate that our model has relatively increased sensitivity to detect negative controls compared to that of AlphaFold model. Following molecular docking of ligands with our model, 10 molecules were selected for further MD simulation analyses including the positive control AMT, negative control ITZ, and the complex formed only by the protein and the cofactor UDPGlcA (Table 3, Fig. 5). Moreover, the reason for differences in docking results obtained from two different models may be explained by the differences in the secondary structural composition (Additional file 2). Results are presented as the mean ± SD of three different replicates. Kd: dissociation constant; UDCA-G1 and UDCA-G2: ursodeoxycholic acid glucuronide conjugate 1 and 2 [44]; UDPGlcA: UDP-glucuronic acid. Molecules with ΔG of < −0.1 and with an SD of ≤0.1 Kcal/mol were selected for further for MD simulations (methotrexate was not selected as it has an SD 0.5). SD is calculated from 8 docking poses or models (default option). The ligand binding pose was selected for further analyses is the pose with the lowest free binding energy (Kcal/mol). Bilirubin was selected for further molecular docking simulations as an endogenous negative control to compare our results with other putative ligands. The putative compound with the highest predicted affinity toward UGT2B10 is the voriconazole metabolite voriconazole N-oxide intermediate UK-215,364 (VCZ-N-O intermediate), followed by acetaminophen (APAP), mycophenolic acid (MPA), and lorazepam (LOR) (Table 3, Figure 5). The residues interacting with AMT were consistent with the ones involved in the binding with other putative UGT2B10 substrates, APAP, LOR, MPA and VCZ-N-O intermediate ( Table 4, Figure 6). However, the predicted hydrogen-bonds between protein and various ligands were different. Hydrogen bond with residue Lys59 is predicted to be formed for all the ligands investigated except for BIL. Hydrogen bond interaction and other molecular interactions of ligands with protein residues are given in Table 4.
The cofactor and ligand binding sites are composed principally of positively charged residues, indicating that it may preferably bind to negatively charged functional groups (Additional file 7). Structural superposition of cofactor bounded homology model of UGT2B10 with the crystal structure of UGT76G from Stevia rebaudiana (PDB ID: 6INF) revealed that the cofactor position is similar in both the cases, which indicate that the reliability of homology modeling as well as docking studies with the cofactor (Additional file 8).

Molecular dynamics simulation
The results of the MD simulations are presented in the Table 5. Intramolecular hydrogen bonds were in the same range for all the tested complexes. The structure of UGT2B10-UDPGlcA has the higher average number of intermolecular hydrogen bonds after MD simulations (7.54 ± 2.01), indicating a stronger interaction between the protein and the co-factor, compared to the other tested ligands. The ligand with the highest average number of hydrogen bonds with the protein, is MPA. No hydrogen bond interactions were found in the complexes formed with BIL and 4HVCZ. The complex formed by VCZ-N-O intermediate UK-215,364 and UGT2B10 is predicted to have a good stability with an average root mean squared deviation (RMSD) value of 0.38 nm ± 0.03 and Radius of Gyration (RoG) of 2.3 nm ± 1.19*10 − 2 . For each complex, the part comprising the UDPGlcA binding site was more stable than the substrate binding site when looking at the RMSF values ( Figure 7). Average root mean square fluctuation (RMSF) were consistent between the different complexes (Table 5) Table 5). The structure with the negative control ITZ is predicted to be the least stable regarding the results for RMSD and RoG, whereas the positive control AMT has a similar stability to the putative UGT2B10 substrates. Average SASA (solvent accessible surface areas) values were in the same range for all the complexes. ITZ showed the highest SASA value, indicating that a larger part of the protein is exposed to the solvent (244.84 nm 2 ± 6.86) compared to the other tested molecules ( Table 5). The trace of covariance matrix value was lower for VCZ-N-O intermediate, 4 and HVCZ compared to other complexes, indicating a higher stability for these complexes. Minimum distance analyses revealed that all the ligands are in close proximity (mean ± SD; 0.72 ± 0.33) to UDP except for BIL and 4HVCZ. It is important to note that VCZ and HVCZ exhibited proximity similar to that of AMT (mean ± SD; 0.298 ± 0.024 nm; Additional File 9).

Discussion
In this report, we describe the first homology model for human UGT2B10. In addition, we successfully used this model for in silico protein-ligand interactions to predict putative ligands, providing evidence for the potential interaction of UGT2B10 with drugs and their metabolites used in pediatric hematopoietic stem cell transplantation (HSCT) setting with extensive molecular docking and MD simulation methods.

Homology modelling of human UGT2B10 and docking predictions
As membrane proteins, human UGTs are difficult to crystallize and the few available structures consist only of the co-factor UDPGlcA binding site (C-terminal part of UGT2B7, residues 285-451 and UGT2B15, residues 284-451). We provided a method aimed to reduce time and resources allocated to define the protein structure of      human UGT isoforms, by using a robust multiple template homology modelling approach (MODELLER, version 9.24). In silico methods were already used to build the 3D structures of some other human UGTs, but not UGT2B10 [50]. We provided the first homology model for the entire UGT2B10 protein, based on human, plant, and bacterial UGTs (model accessible on ModelArchive. org, DOI: https://modelarchive.org/doi/10.5452/ma-gx2 va). A good affinity for the cofactor UDPGlcA was observed with a highly negative ΔG value, confirming the accuracy of the model. Besides, the positive control amitriptyline showed a high affinity towards UGT2B10 whereas the negative control itraconazole had a poor affinity, confirming the satisfactory prediction and reliability of the UGT2B10 structure.

Molecular docking and MD simulationsputative UGT2B10 ligands
Semi flexible (protein is considered as rigid and ligand is considered as flexible) docking calculations are not sufficient for a reliable prediction of the binding affinities of putative ligands with proteins [51]. MD simulations, providing a dynamic environment and taking into consideration the change of conformation of the ligand and the protein, its evolution through time, and parameters such as temperature and pressure, is an essential complement to protein-ligand docking calculations [52]. These simulations provide more realistic energy predictions and parameters that are more comparable to in vivo conditions [53]. Based on the results obtained from MD simulations and MM-PBSA based binding free energy calculations, Acetaminophen (APAP), Lorazepam (LOR), Mycophenolic acid (MPA) and voriconazole-n-oxide (VCZ-N-O) intermediate are predicted to be putative ligands for human UGT2B10. Whereas other molecules such as HVCZ, DHVCZ, 4HVCZ and VCZ also reasonably fulfilled the criteria (stability and binding affinity) for being the ligands for UGT2B10. Based on the minimum distance analysis results we hypothesize that VCZ and HVCZ, could be the possible substrates of UGT2B10 since the computed distance values for these two ligands and UDP are comparable with the positive control AMT unlike other ligands that may serve as inhibitors. It is likely that the substrates for other UGT isoforms exhibiting sequence homology (e.g. UGT2B7) with that of UGT2B10 might also have higher affinities for the UGT2B10 substrate binding domain. Such ligands may serve as competitive inhibitors.
APAP is an analgesic antipyretic drug, known to cause acute hepatoxicity after intake of high doses, and in some cases even at therapeutic doses [54]. The majority of APAP (~95%) is degraded into non-toxic metabolites through sulfation or glucuronidation by UGT1A1, 1A6, 1A9, and 2B15 [42]. The rest of the parent compound (~5%) is metabolized by cytochrome P450, producing the hepatotoxic molecule N-acetyl-p-benzoquinone imine (NAPQI). Glutathione conjugation by glutathiones-transferases further metabolize NAPQI to form nontoxic metabolites. Acute liver toxicity of APAP can arise from an overdose, saturating the GST conjugation pathway [55] but may also occur at therapeutic doses in case of hepatic function impairment [55]. For pediatric patients, the ontogeny of metabolic enzymes such as cytochrome or UGTs, is particularly important [56]. For example, younger patients are less prone to develop hepatotoxicity than adolescents, when exposed to a chronic dose of acetaminophen [56]. The UGT2B10 activity reaches the adult activity within a month after birth indicating its potential role in glucuronidation in infants and young children [57]. Meaning that the genetic variation is as important in children as it is in adults unlike other UGTs, e.g. UGT1A6 whose maximum activity is seen only in adolescence [57]. Variants located in 3′-UTR of the UGT1A1 genes were demonstrated to influence APAP glucuronidation [58]. But no evidence was available about the contribution of UGT2B10 to the metabolism of APAP. However, UGTs form heterodimers with other close isoforms, substrate overlapping, or competitive inhibition cannot be ruled out among the members of this family. Significant correlation in the   glucuronidation of the substrates of UGT2B10, UGT2B7 and UGT2B15 was observed, suggesting the substrate overlapping between these isoforms and possibly heterodimerization of the isoform's units [17]. Given the concordance in the expression levels of UGT2B10, UGT2B7 and UGT2B15 also indicates that a genetic marker from this locus can also serve as a surrogate of the function of any of these isoforms [59]. The different levels of expression of UGT2B10, potentially influenced by the SNP rs17146905 may change the level of metabolism of APAP. Alternatively, the non-synonymous variant rs61750900 i.e. in strong LD with that of rs17146905 in Europeans and Americans could impact the function of UGT2B10 in carriers of rs17146905 leading to a potential increase of exposure to the hepatotoxic metabolites of APAP making more of APAP available for conversion into hepatotoxic metabolites by cytochrome P450 enzymes. LOR is a benzodiazepine used as busulfan-related seizure prophylaxis in pediatric patient receiving a busulfan based conditioning regimens and is usually given from the day before the first busulfan infusion up to 24 h after the last dose [60]. This agent is cleared principally through glucuronidation. R and S enantiomers are degraded by different members of the UGTs enzymatic family. S-lorazepam is degraded by UGT2B4, 2B7, and 2B15, whereas R-lorazepam is metabolized by UGT2B4, 2B7, 2B15, 1A7, and 1A10 [37]. LOR contains an amine group, the chemical group catalyzed by UGT2B10 during N-glucuronidation. The drug class of benzodiazepine is rarely the cause of hepatotoxicity [61,62]. In the case of lorazepam, the liver toxicity is also rare and there is only scarce observation of alanine aminotransferase (ALT) increase [63]. In addition, the usage of this drug is currently not reported as a risk factors for SOS, in pediatric HSCT patients [4]. It is worthy of note that Lorazepam is the only drug among the detected potential ligands, that is routinely co-administered with Busulfan. The other potential ligands that we describe, are not commonly administered at the same time as Busulfan.
MPA is the active metabolite of mycophenolate mofetil used as GvHD prophylaxis. Elevation of hepatic enzyme were reported in few patients receiving mycophenolate mofetil and liver toxicity is rarely reported [64]. Toxicity of mycophenolate mofetil in pediatric patients undergoing HSCT was reported once, and was probably due to multiorgan failure [65]. This active metabolite goes through metabolism principally by UGT1A8 and UGT1A9 to form an O-glucuronide conjugate [41]. UGT2B10 catalyzes mainly N-glucuronidation of tertiary amine compounds, while mycophenolic acid does not contain an amine group. It is also known that UGT2B7 partly contributes to the glucuronidation of MPA [17]. Thus, this molecule is not predicted to be a substrate for UGT2B10 but rather may act a competitive inhibitor. Consequently, it may modulate the enzymatic activity and the metabolism of other drugs.
As described above, LOR and MPA rarely cause severe hepatotoxicity, and are not predicted to be a risk factor for SOS. However, their combination with other known or potentially hepatotoxic chemotherapeutic and prophylactic agents in the HSCT setting, in addition to other preexisting risk factors-e.g. patients' age, genetics, underlying disease, etc. [4,66]-could potentiate the risk of occurrence of SOS in pediatric HSCT patients. It is also important to note that LOR and MPA could induce an elevation of the levels of hepatic enzymes in a small proportion of patients, and that increased transaminase levels are considered to be risk factors for SOS.
Voriconazole, the parent compound of VCZ-N-O intermediate and other VCZ metabolites investigated, is a known hepatotoxic molecule [67]. However, the correlation with the use of this antifungal and the occurrence of SOS in pediatric HSCT patients is not clearly established [4]. Furthermore, the direct hepatotoxicity of the metabolite VCZ-N-O intermediate and other VCZ metabolites and VCZ itself is yet to be studied. UGT1A4 was demonstrated to be involved in the degradation of voriconazole and some of its metabolites [35]. Interestingly, the UGT1A4 and UGT2B10 are the main catalyzer of N-glucuronidation and have overlapping substrates [20]. No experimental evidence was available on the potential role of UGT2B10 in the metabolism of these molecules, and an investigation in this regard may be interesting. VCZ-N-O intermediate, VCZ-N-O and other hydroxy metabolites undergo glucuronidation but the UGT isoform for this reaction are still unknown [34,68]. UGT2B10 catalyzes principally N-glucuronidation and thus is not predicted to have VCZ metabolites as a substrate. Nevertheless, the high affinity between the putative ligand and UGT2B10 may indicate that VCZ-N-O intermediate can be a competitive inhibitor of UGT2B10, blocking the active site of the protein and reducing the catalytic activity. However, we may not rule out the involvement of UGTB10 in the Oglucuronidation reaction given the conserved residue D150, and owing functional evidence that UGT isoforms lacking H34 also facilitates O-glucuronidation e.g.UGT1A4 [69]. However, it is also worth noting that there is an experimental evidence indicating the importance of H34 in catalyzing O-glucuronidation reaction, that concluded the need for further investigations to elucidate its function [70]. It is also likely that binding affinities of the putative ligands may be biased by the template substrate specificities used for homology modelling as it was seen in case of UGT1A6 [71]. Since UGTs also participates in glucosidation, residues offering specificity for glucuronidation or glucosidation in Nterminal domain may also be biased by the template used for homology modelling(e.g. glucuronidation specificity offered by Arg259or glucosidation by UGT2B7) [72]. However homology modelling successfully implemented to test substrates and inhibitors of other UGT enzymes in the past with a satisfactory level of sensitivity and specificity [71].
APAP, LOR, MPA and VCN-N-O intermediate may thus either act as substrates or inhibitors, and could modulate the activity of UGT2B10, causing potential drug-drug interactions. This may be especially the case in the setting of conditioning regimens before HSCT, requiring a wide combination of drugs. Further in vitro enzymatic assays are required to confirm the role of these putative UGT2B10 ligands.
Another compound that was considered during the selection process of drugs and metabolites used during busulfan-based conditioning regimens before HSCT is the antifungal fluconazole ( Figure 4). Fluconazole is principally eliminated through the renal route as a parent compound (> 80%) [73]. This compound was also demonstrated as an inhibitor of UGT2B10 in vitro, with an half inhibitory concentration (IC 50 ) of 1.13 mM [25]. Meanwhile, there is currently no in vivo evidence of drug-drug interactions involving fluconazole through UGT2B10 metabolizing pathway. Interestingly, this antifungal can induce hepatotoxicity in pediatric patients [74]. This compound may also induce a different level of exposure to potential hepatotoxic molecules according to UGT2B10 gene expression and function.

MD simulation -with the endogenous compound bilirubin
Elevated bilirubin level is one of the Modified Seattle criteria [75], used to diagnose SOS in most of the reports including our EWAS report [10] in which the association between UGT2B10 SNP and SOS was detected [10]. Interestingly, adults with Gilbert's Syndrome were reported to have a higher occurrence of SOS following an HSCT [76]. The most common genotype of Gilbert's syndrome is the homozygous, A (TA)7TAA *28 allele in the promoter of the UGT1A1 gene, reducing the metabolism of bilirubin and causing an accumulation of unconjugated bilirubin in the blood [77]. UGT1A1 and UGT2B10 are not predicted to share the substrates as they share low sequence similarities and catalyzes mainly O-and N-glucuronidation, respectively. We nevertheless analyzed bilirubin as a substrate of UGT2B10. With AutoDock Vina, we found a Kd of 1.15*10 11 μM for bilirubin, which shows that it is not predicted to be UGT2B10 substrate for. We further verified the results with MD simulation. Results indicated that the complex formed with bilirubin is stable, comparable to the stability of the complex formed with UGT2B10 and UDPGlcA alone (Additional file 6, Additional file 8), meaning that the outcome was contradictory with that of the molecular docking since it is based on semi flexible approach). Further an in vitro elucidation on the role of UGT2B10 on bilirubin metabolism is thus warranted, amidst the recent evidence on the role of UGT2B10 variant on altered bilirubin levels in a recent longitudinal study [78].

Methodologyadvantages and limitations
Our method provides a rational way to select interesting putative substrates to be tested further in in vitro enzyme assays. This method saves time and is costeffective. Computational tools used are freely available and user friendly, such as AutoDock Vina (version 1.1.2) [47] and MODELLER (version 9.24) [79]. MD simulations in contrast (GROMACS) (version 5.1.4)) [80]) require more resources and expertise, and must be performed in collaboration with a high-performance computing facility to reduce the simulation time. Nevertheless, these analyses can be achieved in a reasonable time, and can refine follow-up in vitro assays reducing the costs associated with testing of multiple molecules.
One of the limitations of our study is on the selection criteria of putative UGT2B10 ligands to be tested with MD simulations. As MD is a time-consuming process, we used a strict threshold for the selection of complexes to test protein-ligand docking with AutoDock Vina. Thus, we choose to base the filtering on the ΔG values obtained with positive control AMT. We used a strict threshold based on the results of the known ligand amitriptyline. Thus, we may have excluded potential false negative results, showing less affinity with the UGT2B10 but still interacting as ligands.
On the other hand, this method only predicts molecules interacting with the protein. The stability of the complex and energies can be predicted, but we cannot determine if the molecule undergoes glucuronide conjugations. Besides of some assumptions made based on the chemical structure, there is no possibility to determine if the molecule acts as a substrate, an inhibitor, or an inducer of UGT2B10. To have more insight into the precise role of the molecule identified, further in vitro experiments are needed to characterize the products or measure the effect of the ligand on the UGT2B10 activity. In our analysis, we chose to perform site-specific docking to focus on the detection of potential substrates while also screening for potential competitive inhibitors. The methodology used provides site-specific interaction with the putative ligands, not enabling to detect another mode of inhibition such as non-competitive inhibition, where the binding of the inhibitor does not occur in the active site of the protein. To detect potential non-competitive inhibitors, blind or non-specific docking studies can be performed to define where the interaction is most likely to occur on the protein.
Further in vitro assays must be performed in order to confirm in silico predictions and test the validity of their results. Developed enzymatic assays using recombinant UGT2B10 coupled with liquid chromatography-tandem mass spectrometry analytical methods [20,57,81,82] could be implemented to screen our test compounds as substrates or inhibitors of UGT2B10.

Conclusion
Using a step-wise methodology, we provide a systematic approach for the exploration of the potential substrates of a drug-metabolizing enzyme in silico. This enables prioritization for follow-up in vitro experiments allowing optimal utilization of time and resources. The methodology is flexible, and any other interesting proteins or ligands can be added to the selection pipeline.
We provided the first complete human UGT2B10 modeled structure. With a selection pipeline of drugs used during busulfan-based conditioning regimens, we detected potential interesting UGT2B10 ligands; acetaminophen, lorazepam, mycophenolic acid, and voriconazole-n-oxide intermediate. These compounds could be tested with enzyme kinetics followed by LC-MS/MS experiments to determine if the ligands are putative inhibitors or substrates.

Generation of the UGT2B10 model
Template-based or homology or comparative protein modelling of UGT2B10 was performed with MODEL-LER (version 9.24) [79]. The amino acid sequence was retrieved from UniProt protein sequence database with the accession number of P36537 [83]. The signal peptide was removed, as it only serves to direct the protein to the endoplasmic reticulum and is not present in the mature protein. The HHpred tool [26,27] was used to search homologous proteins from the protein databank (PDB) repository. Briefly, a multi-sequence alignment (MSA) is built with a homologous sequence from the Non-redundant protein sequence (NRDB) database. A hidden-Markov model (HMM) profile is created based on this MSA, and the RCSB-PDB database is scanned for similar HMM profiles. Potential templates or structural neighbors retrieved by HHpred were filtered to include only protein structures with more than 20% similarity with the query sequence, obtained by X-ray crystallography, of a resolution of less than 2 Å and with UDP-binding domains [84] verified manually with Inter-Pro [85]. Multiple templates were selected to provide a good quality model [86]. The four selected templates, UTG51 from S. cerevisiae (PDB ID: 5GL5), UGT76G from S. rebaudiana (PDB ID: 6INF), UGT2B7 from H. sapiens (PDB ID: 2O6L), and oleodamycin glycosyltransferase from S. antibioticus (PDB ID: 2IYA) ( Table 1) were used to build the MSA. The alignment was submitted to the MODELLER tool [87] provided in the Chimera software (version 1.15) [87] to construct the homology model. The results with the more negative normalized Discrete Optimized Protein Energy (zDOPE) score were selected as the best model. Structure of the protein and protein-ligand complexes were analyzed with PDBsum [49]. Apart from this, we also retrieved the structure of UGTB10 from AlphaFold [88], an in silico protein structure database, which was developed very recently.

Quality of the model and energy minimization
Energy minimization of our UGT2B10 model was performed first with Yasara minimization server [29] and when the quality was no more improved with GalaxyRefine, until the maximum quality was reached [28]. The final model was obtained after three steps of each respective tool (Additional file 3) The quality of the model was assessed with Verify3D [31,32], ERRAT [30], and Ramachandran plot, by using the Structure Analysis and Verification (SAVES) servers [89]. Z-score was obtained with the ProSA-web server [33] (Additional file 2). Further, we compared the structure validation results of our model with model by AlphaFold. The structure superposition between our homology model and AlphaFold model was also attempted using PYMOL [88]. Finally, PDBSum Generate server was used to compute the secondary structure composition of both AlphaFold and our homology models.

Binding site prediction
Binding site prediction was performed with MetaPocket 2.0, a consensus-based method [90]. Results from LIGS ITES, PASS, Q-SiteFinder, SURFNET, Fpocket, GHE-COM, ConCavity and POCASA are compiled with this tool [90]. Prediction returned by many of the tools was determined as the best results. The prediction was further refined and validated after the superposition of the templates with our homology model using PyMol [91]. In addition, Comparative protein sequence alignment using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/ clustalo/) was also performed for all the members of UGT to identify the putative catalytic residues of UGT2B10.

Molecular docking with cofactor UDPGlcA
Binding with the cofactor UDPGlcA was performed with the AutoDock Vina docking algorithm [92]. Simplified Molecular Input Line Entry System (SMILES) structure of UDPGlcA was retrieved from PubChem (CID: 17473) and then saved in a PDB format with Corina 3D structure conversion web server [93]. The UGT2B10 model and the ligand were prepared with Molecular Graphics Laboratory (MGL) Tools (version 1.5.7) [94]. In this step, the polar hydrogens were added, and non-polar hydrogens were merged in both protein and ligand structures. Moreover, the kollmann charges was added to the protein whereas Gasteiger charges was added to ligand structures. Finally, the protein and ligand structures were recorded into in PDBQT format (XYZ Coordinates + Partial charges + atom type). The receptor grid box had a size of was restricted to the previously defined (size (x,y,z) = (15, 17, 22.5) and center coordinates of (x,y,z) = (− 70, 8, 3)). A maximum of nine binding modes for each ligand were returned by the tool. Results with the more negative ΔG (in kcal/mol) were selected as the best docking conformation.

Consensus based approach for selection of putative UGT2B10 ligands
A list of drugs used during and after busulfan-based conditioning regimens for HSCT in pediatric was created, based on the discovery cohort of our EWAS study [10]. This list included chemotherapeutics, antifungals, SOS and GvHD prophylaxis, and antipyretics. Each molecule passed through a pipeline to choose the most interesting compounds to be tested for molecular docking calculations with UGT2B10 ( Figure 4). The approach consisted of receiving inputs on inclusion and exclusion criteria independently from four individuals with background in pediatric oncology, pharmacology and biology. These criteria included clinical relevance in pediatric HSCT setting (such as the frequency of usage), in addition to verification of previous in vitro and in vivo evidence regarding glucuronidation. 14 drugs were relevant drugs used in pediatric HSCT setting were found. From the list three were excluded because there was already evidence that the molecules do not undergo glucuronidation in vivo. Relevant metabolites for these drugs were added to the selection pipelines, for a total of 11 drugs and 7 degradation products. From these 18 molecules, three were further excluded because they were not directly metabolized by UGTs, and were already demonstrated to not undergo glucuronidation in vitro (Figure 4). Based on the consensus from the above-mentioned criteria, selected ligands were chosen further for in silico protein-ligand interactions using molecular docking and MD simulations.
Molecular docking was performed with the same method as for the cofactor UDPGlcA using AutoDock Vina docking algorithm [92]. Amitriptyline and itraconazole were used as positive and negative controls, respectively [25]. The affinity between the ligand and UGT2B10 was quantified after calculation of the dissociation constant (Kd) (Equation 1) [95].
Where ΔG is the estimated free energy of binding in kcal/mol, R the gas constant (1.957 * 10 − 3 kcal*K − 1 * mol − 1 ), and T the temperature in Kelvin (298 K for an experiment conducted at 25°). A low Kd indicates a high affinity between the ligand and the enzyme. Further, molecular docking studies were also performed for Alpha-Fold UGTB10 model with the selected ligands and compared the results obtained from our homology model of UGTB10.

Molecular dynamics simulations with GROMACS
To understand the structural stability and interaction between enzyme and putative ligands, the molecular dynamics (MD) simulation was performed with the time period of 50 ns at the University of Geneva on the Baobab cluster with GROMACS (version 5.1.4) [80] on the structures obtained after molecular docking calculation. Eight compounds showing ΔG value of ≤ − 1.0 kcal/mol predicted by AutoDock Vina (based on the results obtained for the positive control amitriptyline (Table 3)), the negative and positive controls (ITZ and AMT), the protein with only the cofactor UDPGlcA and the apo form were selected to perform these simulations. Every structure composed of a putative ligand, UGT2B10 and UDPGlcA was compared with the omplex comprising only UGT2B10 and UDPGlcA. The process of the MD simulation was based on the work of Lemkul et al. [96] Briefly, enzyme structure was prepared with The GRO-MOS force field set 53a6 [97]. The topology (ITP) file and the GROMACS (GRO) file for the ligand structures were prepared with the external PRODRG server [98]. The system temperature was set to 310 K (36.9°C), and the reference pressure to 1.0 bar. The LINCS constraint solver algorithm was used on all bonds with "lincs_iter" and "lincs_order" parameters of 1 and 4 respectively. The short range electrostatic and van der Waals cutoffs were both set to 1.0 nm. The MD simulations were run for a period of 50 ns with a step size of 2 fs. The analysis of the results was first performed with GROMACS functions to provide the principal component analysis (PCA) of the projection of trajectories of the backbone of the protein, root means square fluctuation (RMSF; Gromacs command: gmx rmsf) and root mean square deviation (RMSD; Gromacs command: gmx rms) graphs. The average number of inter-and intra-molecular H-bonds (Gromacs command: gmx hbonds), RMSD, RMSF, Radius of Gyration (RoG; Gromacs command: gmx gyrate), and Solvent Accessible Surface Area (SASA) were calculated with the average command from GROMACS. The trace of covariance matrix was determined with the gmx covar and gmx anaeag commands. To evaluate the proximity of the ligands including positive (AMT) and negative controls (ITZ) towards cofactor, we estimated minimum distance analysis using gmx mindist module of Gromacs, To have quantitative results, further MM-PBSA based calculation of the binding free energies was performed with g_mmpbsa version 1.6 [99]. MM-PBSA calculations were performed on the most stable portion of the simulation for each complex, corresponding to the last 20 ns i.e. from 30 to 50 ns when the complex is more stable (2000 frames with increments every 10 ps). Results and graphs were presented with Chimera version 1.14 and QTGrace version 0.2.6. A complete workflow of the molecular docking and MD simulations is presented in Additional file 10.
A detailed step-wise protocol followed for molecular dynamics simulations is available publicly from the Yareta repository (DOI: https://doi.org/10.26037/yareta: o7au7wijvrglnb6nu2k4azkmmy). A list of the tools and software used can be found in the supplementary materials (Additional file 11).