PHARMIP: An insilico method to predict genetics that underpin adverse drug reactions

Graphical abstract


PHARMIP at a glance
The method is composed mainly of three pipelined steps starting with the chemical structure of the drug or the drug candidate with the aid of free user-friendly bioinformatics tools and databases to get a list of candidate genes and genetic variants that may underpin an adverse reaction. We developed this method to address the recent requirements in pharmacovigilance and pharmacogenetics. To the best of our knowledge, there is no free software or bioinformatics tool that do the same job. And, we hope that this pipeline could be programmed into a tool in a future work. The predicted time to run the PHARMIP pipeline is variable and mainly depends on the step of PharMapper as a rate-limiting step. PharMapper job could range from several days to several weeks as the server runs the jobs in a queue. Limitations of PHARMIP could include the instability of the bioinformatics databases as every update of the database will change the results leading to some problems of research reproducibility. Also, some docking aspects and basics could be a complex issue for the audience of this work (pharmacovigilance and pharmacogenomics scientists). So, we tried to use the simplest free software with the default option to avoid technical details that could be inappropriate for the bioinformaticsnaïve audience. Each tool or database included in this work has its own manual and/or FAQ to troubleshoot the common problems. We encourage the readers to refer to these manuals in case they faced any problems using these tools.

Method details
Predicting the off-label drug targets Similarity ensemble approach (SEA) Starting with the chemical structure of the drug, the off-label targets could be predicted by the similarity ensemble approach (SEA). This approach predicts new protein targets based on chemical similarity between ligands [1]. Most tools that use this approach queried bydrug or candidate structure in the Simplified Molecular-Input Line-Entry System (SMILES) format [2]. There are many chemical databases to retrieve such information. However, for drugs the easiest way is to retrieve the SMILES code from the drug page at the DrugBank database [3]. If the ligand is a new structure as in case of a drug candidate or a new molecule, the SMILES code could be retrieved using molecular drawer like Pubchem sketcher or other suitable tool that allow deriving the SMILES code from directly drawing the chemical structure. Also, any other chemical format for the input molecule could be easily converted to SMILES using chemical format converter like OpenBabel converter. Once the ligand SMILES is ready, it could be used to feed several tools that use the SEA.
Several tools use SEA to assign new targets to the query molecule based on its chemical similarity score to other chemicals (usually referred to as Tanimoto coefficient or similarity score). Each tool screens its target chemical repository using different SEA models to retrieve hits. Biological targets of these hits are assigned to the query ligand depending on SEA. Targets from different tools could then collected and filtered using suitable significant level (e.g. P-value of 0.01 or 0.05).
One advantage of Polypharmacology browser is that it retrieves the already reported query molecule bioactivity in ChEMBL database [6]. Then, it predicts targets based on 6 different fingerprints and 4 combination of fingerprints. Using PPB enriches the off-label targets list with targets that have wet-lab evidences.

Reverse pharmacophore mapping
Pharmacophore of a drug is the set of molecular features that cause it to be identified by its biological target [7]. The normal pharmacophore mapping retrieves ligands whose pharmacophore fit certain protein target. The reverse pharmacophore retrieves protein targets for a certain ligand. A very famous and freely available insilico tool that uses reverse pharmacophore mapping approach is PharMapper server [8]. PharMapper needs the query molecule in MOL2, or SDF chemical file formats. While SDF format is available in DugBank, MOL2 could be retrieved from ZINC database [9]. There are different options to customize the tool before job submission. Default options resemble the optimum choice for non-specialists. The job may take one to two weeks to spit-out results according to jobs queue length. So, the option of getting job results alert by email is worthy.
The results will retrieve 300 targets by default. These targets could be arranged using different parameters. Z -score of the results could be used to detect which of the 300 results will be picked up to the targets list based on the significance level of choice (e.g. for P-value of 0.01 select targets withZscore ! 2.326, and for P-value of 0.05 select targets with Z-score !1.645) (Fig. 1).

Molecular docking
Docking is a featured insilico technique that is usually used to predict the best orientation of a ligand when bound to the active site of a protein [10]. Pharmacophore mapping usually outperforms docking in virtual screening to predict new ligand-target relationships [11]. However, docking improves and evaluates predictions of pharmacophore [12] and SEA [13]. The main use of molecular docking in this pipeline is to evaluate the affinity of the drug to the predicted targets by SEA and pharmacophore mapping. One of the featured tools for molecular docking is Auto Dock Vina [14] which can be used as a standalone program or imbedded in a platform like PyRX [15].
Docking jobs using PyRX needs the drug in the SDF format and the target protein in 3D structured format. Normally, the 3D structure of a protein could be retrieved from the PDB database [16]. Some targets still have no experimentally determined 3D structure in PDB database. In this case, homology modeled 3D structure for these targets could be then obtained from SWISS-MODEL database [17]. From the Vina wizard in PyRX, the drug SDF file is added as a ligand and the protein target 3D structure file is added as a macromolecule.
Sometimes, PyRX doesn't accept a certain 3D protein structure for docking due to some technical errors. Each protein has different entries and identifiers in 3D protein structure databases. All these entries could be found in the protein's Uniprot page in the cross-references and/or structure sections.
The results of this step will be the best conformation preference of the ligand into the binding site of the target protein. AutoDock Vina normally predicts the free binding energy (also known as Gibbs free energy or DG) in kcal/mol. For calculating the affinity of the drug to the predicted target, the dissociation constant K d could be calculated from the free binding energy using the python script from [18]. In PyRX python shell, the code is run, free binding energy from docking is entered in kcal/mol, and the K d is calculated in moles. Binding DB [19] is a good source to compare the drug predicted affinity in terms of K d to the experimental affinities obtained in wet lab settings.
The end result of this pipeline step will be a list of off-label targets (OLT list) obtained by SEA, pharmacophore mapping then validated using molecular docking. Target list identifiers may be in different formats according to the tool used. The combined list of targets obtained from different tools could be refined and unified by the retrieve/ID mapping tool from the UniProt database [20]. Visualization and different analyses could be done on the list using STRING database [21]. The final OLT list will be used as an input for the next step to retrieve the related diseases/ phenotypes list.

Retrieving the diseases /ADRs (DA) list related to OLT list
Using the OLT list from the previous step, different databases could be used to retrieve diseases and / or ADRs linked to these targets. Online Mendelian inheritance in man (OMIM) database [22], diseasegene network (DisGeNET) database [23], and comparative toxicogenomic database (CTD) [24] are good examples of these databases. Also, Kyoto encyclopedia of genes and genomes (KEGG) [25] has a disease mapper tool that could be fed by a list of gene identifiers to retrieve a list of related diseases.
The first overview of the OLT list could be done by the enrichment analysis. CTD offers the option of disease enrichment analysis which could be done to figure out the types of diseases or side effects that could result from the drug or candidate use. A certain disease is considered statistically enriched if the proportion of its related genes to the full enquiry gene list is larger than the proportion of all its related genes to the whole genome [26]. This type of analysis could be done on the level of biochemical pathways and / or gene ontology functional terms. This is very helpful in foreseeing the complete predicted picture of the query drug and its biological effects on several levels.
DisGeNET has the option of retrieving associations at the genetic variant level. Normally, DisGeNET retrieves results at gene level ranked by gene-disease association (GDA) score, and at genetic variant level ranked by variant-disease association (VDA) score. Retrieving genetic variants linked to certain drug-disease association could resemble a draft for drug's pharmacogenetics variants network that could be used in several ways. For example, detecting biomarkers related to a certain ADR, genetic guided patient recruiting in drug clinical trials, and drug genetic labelling experiments.
The results of this step will be a list of diseases and ADRs (DA list) related to the OLT list (Fig. 2).

Analysis of DA list to get answers
DA list from the previous step could be analyzed in several ways according to the question to be answered. Normally, these results could be used either to predict or to explain the genetics underpinning for a certain side effect of the enquired drug.
In Pharmacovigilance, healthcare providing entities usually report ADRs noticed during using a certain drug. These individual case safety reports (ICSRs) are usually saved in specialized databases to be analyzed. One of the featured ICSRs databases is the world health organization (WHO) database VigiBase [27] which is freely accessed through the VigiAccess portal. Analyses of pharmacovigilance data include signal detection and different risk management activities. Recently. European medicines agency (EMA) released a guideline on the use of pharmacogenomics methodologies in the pharmacovigilance regular activities. The guideline describes 3 types of genomic biomarkers (BM) that influence drug safety and efficacy. Namely, they are pharmacokinetic (PK) BM, pharmacodynamic (PD) BM, and BM associated with drug-induced toxicity risk [28]. PHARMIP uses the power of insilico tools and bioinformatics databases to figure out the different genomics BM underlying drug safety and efficacy issues.
In pharmacogenomics, regular activities include discovery, evaluation, and implementation of genetic BM influencing drug response. These data are saved in specialized databases that provide information about gene-drug associations (e.g. PharmGKB [29]). PHARMIP could be used to assist pharmacogenomics daily activities via detection of candidate genes and variants that influence certain drug response. In precision medicine, pharmacogenomics is used as a tool for genetic guided therapy personalization. PHARMIP, via its role in pharmacogenomics, could help advances in precision medicine from a drug-centric point of view. Construction of drug-centered gene network could reveal insights about the precise use of this drug with different patients' genetic profiles.
The following are two examples of using PHARMIP to detect the underpinning genetics of some reported domperidone cardiotoxic ADRs and ranitidine related thrombocytopenia.

Example 1: domperidone related cardiotoxic effects
Domperidone (DrugBank accession no. DB01184) is a specific dopaminergic blocker approved for treatment of several gastrointestinal disorders like nausea, vomiting and emesis. It's used to treat gastroparesis by its D2 selective antagonism that causes increase in GIT peristalsis. It's also used as a galactagogue by increasing prolactin secretion as a part of its anti-dopaminergic effect [30,31]. It is also used to relieve the gastrointestinal symptoms of Parkinson's disease and in some cases as an unintended antipsychotic drug [32,33].
Although being successfully used for about 40 years as a prokinetic drug in different gastrointestinal motility disorders, domperidone was repeatedly reported to have cardiotoxic effects. Leelakanok et al. [34] reported a study results that domperidone increases the risk of cardiac arrhythmia and sudden cardiac death by 70 %. Similar results were also reported by Johannes et al. [35] and van Noord et al. [36]. This cardiac toxicity is mainly thought to be resulting from the drug effect on cardiac QT interval which can increase the risk of Torsades de Pointes [37]. Also, domperidone dependent ventricular arrhythmias were reported. It is thought to be an effect of blockade of hERG voltage-gated potassium channels [38,39].
Domperidone VigiAccess page contains 9201 ICSRs in 27 categories (on 21 st January 2019). Among these ICSRs, 584 reports are for cardiac disorders in 50 subcategories. Some of these cardiac disorders are heavily reported (e.g. palpitation has 171 ICSRs) and others are seldomly reported (e.g. atrial flutter has only 1 ICSR).
Cardiotoxic ADRs that will be investigated by PHARMIP in this example are: Predicting the off-label drug targets (OLT list) SEA Domperidone SMILES code (ClC1=CC2=C(C = C1)N(C1CCN(CCCN3C(=O)NC4=CC = CC = C34)CC1)C (=O)N2) was retrieved from its DrugBank page then fed to SEA tools. Different targets were predicted using this approach. All targets are presented in human genome ontology gene nomenclature committee (HGNC) symbols [40].
To keep this example brief, molecular docking was done only on the targets filtered by the next step. The final OLT list contained 96 genes that will be used to retrieve the DA list. Visualization of this list by STRING database is available through the link https://version-11-0.string-db.org/cgi/network. pl?networkId=icgrvT0WSen6.

Disease enrichment
The OLT list was used in the CTD enrichment analysis tool. The highest enriched disease categories were roughly nervous, cardiovascular and mental disorders.

Genes and variants disease associations
The OLT list was fed to DisGeNET as a multiple gene list to retrieve domperidone DA list. A total of 13,493 gene-disease associations and 4,600 variant-disease associations were retrieved. The highest GDA score was 1 for LMNA gene associations to progeria and cardiomyopathy. The highest VDA score was 0.83 for the association of gene KCNH2 reference SNP (rs199472936) to Long Qt Syndrome 2.

Filtering DA list to get answers
Filtering DA gene list by the ADR name retrieved the list of candidate genes that are related to this ADR. Tables 1-4 show the results of filtration by the ADRs names (arrhythmia, arrest, Torsades de Pointes, and long QT).
It's obvious that KCNH2 is strongly related to all cardiac side effects that are previously reported as side effects of domperidone. This result harmonizes with the previous works of [38,39]. As these cardiovascular disorders are polygenic [41] and may be a result of multiple genes, all associated gene were kept even those with small GDA scores. Deeper analysis may reveal better results to avoid redundancy and disease name differences between different databases.

Filtering at the genetic variants level
Answers on the variant level could also be retrieved from the DA list. Table 5 shows 27 genetic variants retrieved from domperidone DA list and related to arrhythmias and Torsades de Pointes.
These variants could be used in designing clinical trails that explore domperidone cardiotoxic effects or other similar drug-centered precision medicine research activities.

Docking
Twenty genes from Tables 1-4 are shortlisted as candidates for domperidone related ADRs (arrhythmias, cardiac arrest, long QT interval, and Torsades de Pointes). Seven targets of these twenty are insilico predicted and the remaining 13 are reported in ChEMBL. Docking jobs of domperidone to these 7 targets (CACNA1G, CHRM3, GRIN2B, HDAC4, OPRL1, PTGS2, EGFR) were done using PyRX (version 0.8). Affinities in Kcal/mol were fed to the K d calculator to estimate K d s. Resulting affinities were compared to domperidone / DRD3 affinity that was retrieved from Binding DB as calculated by Freedman et al. [42]. The results are represented in Table 6.
Docking results could be used for further filtering of the previous results to select targets that are more likely involved in certain ADRs. For example, Table 6 results could suggest the selection of CHRM3 and PTGS2 for more wet lab analysis to explore their relation to the previously mentioned domperidone cardiotoxic ADRs.

Retrieving related diseases and ADRs (DA list)
The enrichment analysis using OLT gene list retrieved number of related diseases like cancer, digestive system diseases, and cardiovascular diseases. The related diseases list retrieved by DisGeNet contained 8959 entry and 3423 entry for the variant's diseases list. Filtering DA list to get answers Filtering the genes and variants lists that are related to ranitidine OLT list and thrombocytopenia retrieved the following tables (7,8 and 9): On the genetic variant level, the next table contains the variants of interest that can be used in drugcentered precision medicine research activities:

Docking
A list of 8 genes (ACHE, BCHE, CA2, ITGA2B, ITGB3, NFKB1, SLC22A1, SRC) was predicted to underpin the ranitidine related thrombocytopenia. Two of them (ACHE, NFKB1) were already reported in ChEMBL as ranitidine targets through wet lab experiments. The results of ranitidine docking to the other 6 targets is summarized in the following table: based on this rough estimation, (BCHE, CA2, SRC) could be suitable candidates for further ranitidine related thrombocytopenia wet lab invistigations.

Supplementary material and/or Additional information
The following supplementary materials are attached to the paper: Two Excel workbooks that contain all the results retrieved by the insilico tools and dependent analysis results. Two Word documents contain the refined gene lists retrieved by the insilico tools and formatted as inputs for DisGeNET and CTD analyzer. Two Zip folders contain the files used for docking and the python script to calculate K d s.

Declaration of Competing Interest
The authors declare no conflicts of interest in preparing this manuscript.