Chemoinformatic studies on some inhibitors of dopamine transporter and the receptor targeting schizophrenia for developing novel antipsychotic agents

Chemoinformatic studies were carried on some inhibitors of dopamine transporter to develop a predictive and robust QSAR model and also to elucidate binding mode and molecular interactions between the ligands (inhibitors) and the receptor targeting schizophrenia as novel Antipsychotic agents. Density Functional Theory (DFT) approach was utilized to optimize the ligands at B3LYP/6-31G∗ at the ground state and Multi-linear regression of the genetic function approximation (MLR-GFA) method was employed in building Penta-parametric linear equation models. The best model with statistically significant parameters has squared correlation coefficient R2= 0.802, adjusted squared correlation coefficient R2adj = 0.767, Leave one out (LOO) cross-validation coefficient (Q2) = 0.693, lack of fit score (LOF) = 0.406, R2Test = 0.77, Y-randomization test (cR2p) = 0.714, Chi-squared (χ2) =0.026, bootstrapping (Systematic errors = 0.272) and Variance Inflation Factor (VIF) <2 . The obtained results were compared with standard validation parameters to ascertain the predictivity, reliability, and robustness of the model. Also, the mechanistic interpretation of the descriptors found in the model revealed that two out of five descriptors; MATS7s (32.3%) and RDF95m (30.4%) having pronounced influence on the observed antipsychotic property of the compounds evidenced by their highest percentage contributions. More so, the molecular docking investigation showed that the binding affinity of the selected ligands ranges from -10.05 to -9.0 kcal/mol and with ligand 21 possessed the highest binding affinity (-10.05 kcal/mol). Furthermore, all the selected ligands displayed hydrogen bonds and hydrophobic interactions with the amino acid residues of the target (4M48) which could account for their higher binding energy. Our findings revealed that the developed model passed the general requirements for an acceptable QSAR model and also satisfied the OECD principles for model development. Hence, the developed model would be practically useful as a blueprint in developing novel antipsychotic agents with improved activity for the treatment of schizophrenia mental disorder.


Introduction
Schizophrenia is a psychiatric disorder that frequently involves a composite genetic tendency as well as susceptibility to certain environmental factors [1,2]. It is a chronic and debilitating mental disorder characterized by disordered thoughts, abnormal behaviors, and anti-social behaviors, meaning that the person with schizophrenia problem does not identify with reality at times [3]. Meanwhile, Psychosis refers to a state in which an individual experiences a false sensation and this includes auditory, visual, and tactile sensations of things that are not real, and feelings that something strange is going on [4,5]. Symptoms of Schizophrenia and Psychosis are related and they include; hallucinations, delusions, incoherent speech, dangerous behavior, unusual movements, problems at public places and detached manner with the inability to express emotion, apathy and lack of enthusiasm [6,7]. Persistent in symptoms of Psychosis may be a risk that the affected individual could be experiencing manifestations of schizophrenia or mental disorders that are considered as the precursors of schizophrenia [6]. Antipsychotic drugs (APDs) are the backbone in the treatment of schizophrenia. The available APDs exhibit a broad range of mechanisms and act on receptors of diverse biogenic monoamine neurotransmitters [8].
Dopamine is the main neurotransmitter involved in the pathophysiology and treatment of schizophrenia [9]. Dopamine pathways have been well interpreted by Positron emission tomography (PET) with different radiotracers and these PET tracers have been used to elucidate various aspects of divergent dopaminergic transmission in schizophrenia [10,11,12]. Dopamine (DA) receptor and transporter dysfunctions play an important role in the pathophysiology of neuropsychiatric mental disorder including anxiety disorder (AD), major depressive disorder (MDD), bipolar disorder or depressive state and schizophrenia disease [13]. All effective antipsychotic medications achieve their efficacy by targeting the dopaminergic system, sub-chronic blockage of 60-80% of dopamine D 2 receptors is considered to underlie treatment efficacy in schizophrenia [14]. Clozapine was reported to be the most effective pharmacotherapy for the treatment of schizophrenia since the introduction of conventional antipsychotic drugs in 1950s [15,16]. Despite its superior efficacy and potential to reduce substantially the morbidity of schizophrenia and improve the outcomes of patients, Clozapine has not been used on a widespread basis due to its potential for agranulocytosis [15]. Even though other antipsychotic drugs such as Cariprazine, risperidone, etc. were discovered after Clozapine, none of the beneficial effects for any of the medication strategies were considered to be of clinically significant magnitude [15,17,18]. More worrisome is the incidence of "ultra-resistance" cases where patients with schizophrenia respond neither Clozapine nor any other antipsychotic drugs [7]. Since existing antipsychotic drugs for the treatment of schizophrenia are faced with challenges of adverse side effects, patients' ultra-resistance, and insignificant clinical benefits, the need for a continuous search for potent and less toxic antipsychotic drugs has become very necessary. Although, drug discovery and development are very tedious scientific exercises owing to the stupendous time factors and resources involved, thus the application of chemoinformatic study to overcome these bottlenecks is very important at this time. Because of these, some of the inhibitors (ligands) of Dopamine Transporters and the receptor (Dopamine transporter elucidates antidepressant mechanism) targeting schizophrenia that is well documented in the CHEMBL Database [19] and the Protein Data Bank respectively were studied as to harness the structural features influencing the observed antipsychotic activity of the inhibitors as well as to elucidate the binding mode and molecular interactions between the ligands and the receptor targeting schizophrenia. In consequence, the results of the study are hoped to provide necessary information for the identification and development of novel Antipsychotic agents with improved activity for the treatment of schizophrenia and other related mental disorders.

Experimental data set
A data set of 44 inhibitors of DAT was collected from the CHEMBL Database [19]. The general molecular structures of the compound's vis-a-vis their CHEMBL number are contained in Supplementary  Table SD1. The anti-psychotic activities expressed in nM were converted into the corresponding pKi (pKi ¼ -log 10 pKi) values which are used as dependent variables in this study.

Molecular optimization and descriptors calculation
2D structures of the compounds were drawn using Chemdraw software and the spatial conformations of the compounds were determined using the Spartan 14 V1.1.4 wavefunction software package [20]. The molecular structures were minimized by Molecular Mechanics Force Field (MMFF) calculations to remove strain energy before subjecting it to complete geometry optimization with the aid of Density Functional Theory (DFT) at B 3 LYP/6-31G* basis set. The molecular descriptors were calculated using the PaDel descriptor tool kit, Spartan 14 software and ChmBio3D Pro 12.0.1V software [21]. More than a thousand descriptors comprising of 0D, 1D, 2D, and 3D types were generated for each molecule. The descriptors were correlated with the antipsychotic activity of the molecules using Pearson's correlation matrix to select the suitable descriptors for Genetic Function Approximation (GFA) analysis based on the correlation coefficients.

Experimental data set division
The data set (44 compounds) were split into a training set of 34 compounds (77%) which was used to develop the models and test set that is made up of 10 compounds (23%) that was utilized externally to validate the predictivity of the models [22,23] by employing Kennard stone algorithm method with the use of Dataset Division GUI 1.2" software [24,25] in line with the optimum splitting pattern of data set in QSAR study [26].

Model building and evaluation of chemometric parameters
Different possible combinations of descriptors were subjected to Genetic Function Approximation (GFA) with the experimentally determined biological activity on a logarithmic scale (pKi) as the dependent variable and the descriptors as the independent variables. Out of the three generated GFA models, the best (Model-1a) which is statistically significant and with the smallest LOF score was selected. The use of Friedman's lack-of-fit (LOF) measure has several advantages over the regular least square error measure in evaluating the quality and fitness of a model [23]. Mathematically, the Friedman lack-of-fit (LOF) is expressed [27,28] by Equation (1) below as; SSE ¼ sum of squares of errors, c ¼ number of terms in the model, other than the constant term, d ¼ user-defined smoothing parameter, p ¼ total number of descriptors contained in all model terms while M represents the number of samples in the training set [29].

QSAR model validation and principles of OECD
Validation is a decisive step in a QSAR modeling in which the predictivity, reliability, and significance of the procedures are confirmed in developing a model [30]. The principles of Organization for Economic Co-operation and Development (OECD) that constitute a conceptual framework for validating a QSAR model were employed in validating the model, that is, a well-defined End-point measured must exist, a univocal algorithm must be used, a defined applicability domain, appropriate statistical evaluation of the models must be carried out (i.e. internal and external validations using the training set and the test set respectively) and mechanistic interpretation of the models must be established [31,32].

Model validations and procedures
A quest to develop a globally acceptable QSAR model and to ensure compliance to OECD Principles of model validations, appropriate statistical evaluation and validation of the models were investigated using both internal and external validations procedures.

Internal and external validation methods
Internal validation is assessed using the data that created the model via the methods of least squares fit (R 2 ), cross-validation coefficient (Q 2 ), adjusted R 2 (R 2 adj ), the difference between R 2 and Q 2 (R 2 -Q 2 ), Chisquared (χ 2 ) and Root-mean squared error (RMSE). The values of these parameters were compared with the minimum criterion for robust QSAR models in Table 1 [30]. The R 2 value is interpreted as the proportion of variation in a dependent variable that is explained by the model. R 2 is expressed by this formula: Where SST ¼ total sum of squares, SSR ¼ regression sum of squares, and SSE ¼ minimum sum of squared residuals of any linear model. R 2 value varies directly with the increase in some descriptors, thus, R 2 cannot be a useful measure for the goodness of model fit. Therefore, R 2 is adjusted for the number of explanatory variables in the model [30,33]. The adjusted R 2 is defined as p ¼ number of descriptors in the model. The LOO cross-validated coefficient (Q 2 ) is given by; where A and B represent the predicted and observed activity respectively of the training set and C ¼ mean activity value of the training set. The Chi-squared (χ 2 ) and Root-mean squared error (RMSE) for validating the models and error checking to determine if the model possesses the predictive quality reflected in the R 2 are expressed by Eqs. (5) and (6) respectively as; y andŷ represent the experimental and predicted activity for each compound in the training set, ym denotes the mean of the experimental activities, and n is the number of molecules in the study compounds.
Since the real predictive ability of a QSAR model cannot be judged or guaranteed solely base on internal validation [29], so the only way to establish the true predictivity of a model is to compare the predicted and observed activities of the external test set of compounds that were not employed in the model building [30], hence, external validation using test set data becomes a sine qua non to ascertain quality assurance and predictive power of the model. Therefore, the predicted R 2 ext (external validation) of the model is computed by using the Equation (7) below.
W and T symbolize predicted and observed activity values respectively of the test set compounds and X indicates the mean activity value of the training set.

Prediction analysis
Bias-Variance Estimator software that uses bootstrapping as a resampling technique obtained from DTC lab website was engaged to examine the prediction error analysis which depends on bias-variance estimation evaluate the role of both the prediction errors, that is, systematic error (bias) and random error (variance) in the developed model [34]. The parameters bias 2 , variance and mean square error (MSE) were computed by using equations 8a-d below; Where n c ¼ number of compounds in the test set, y exp(i) ¼ experimental response value of the compound 'i', b y B PredðiÞ ¼ mean predicted response value of compound 'i' from 'n B ' bootstrap models, y BðjÞ PredðiÞ ¼ predicted response value of compound i from the bootstrap model 'j', n B ¼ number of bootstrapping models produced, y pred ðiÞ ¼ predicted response of compound i from the model.

Evaluating applicability domain of the model
Assessing the applicability region of a model is a crucial procedure in establishing the ability of the developed model weather it can make a reliable prediction within the chemical region or otherwise for which the model was built [23,35]. For evaluating the applicability domain of the model, both the leverage approach and Euclidean Based Applicability Domain methods were adopted in this study.
The leverage of a given data set of compounds hi, can be defined mathematically as; where xi the descriptor row is the vector of the considered compound i, hi is the n x k descriptor matrix of the training set compound used to generate the model. The warning leverage (h*) is the limit of normal values of x outliers which can be estimated by the Equation (10) below; where n is the number of training compounds and P represents the number of predictor variables in the model. A developed model is adjudged to be reliably predicted if the leverage hi < h* for the investigated compounds. The significance area of the model in terms of chemical space is visualized by Williams's plot ( Figure 4: plot of standardized residuals versus leverage values) (see Figure 1).
While the Euclidian distance is expressed mathematically as; The difference, x i k À x j k represents the distance of the test set compounds from the training set compounds and wk is a weighted vector corresponding to the importance of the k th descriptor in the model calculated using auto-scaled descriptors, x i k and x j k represent compounds from the test set and training set respectively [36]. The weighting makes it possible to account for the relative contribution of each variable to the similarity and improves the detection of the AD of the model [37]. Computed chemometric validation parameters [30,38,39]  Using the formula in Equation (7), the predicted R 2 value for the test set is calculated as follows;

. Molecular docking analysis
Molecular docking analysis was investigated on the studied compounds to elucidate the molecular interactions between the target (receptor) and the inhibitors (ligands) and also to visualize binding interactions as well as to identify the inhibitors with the best binding affinity to the receptor. The X-ray structure of dopamine transporter elucidates the antidepressant mechanism (receptor) obtained from Protein Data Bank (www.rcsb.org) with PDB code 4M48 [40] was used for this investigation. 2D structure of the ligands (Dopamine Transporter Inhibitors) were optimized and saved as PDB files (prepared ligands) using Spartan 14V 1.1.4 software [23]. Also, the receptor was prepared by importing the 3D crystal structure of the receptor (Dopamine  transporter elucidates antidepressant mechanism) downloaded from Protein Data Bank into Discovery Studio Visualizer for the addition of Hydrogen and removal of a water molecule, heteroatoms and co-ligands from the raw receptor and consequently saved as PDB file (prepared receptor). The molecular docking analysis was successfully carried out by using AutoDock Vina version 4.0 of Pyrex software [23]. The results of docking investigations were computed, analyzed and visualized via Discovery Studio Visualizer software.

Chemoinformatic investigation and validations of the model
The best model (Model 1) was selected as the optimal model because of its statistically significant output in predicting the antipsychotic activity of the studied compounds vis-a-vis the validation parameters. The details of the descriptors that appeared in the model are presented in Table 3. The optimal model proves to be in excellent agreement with the threshold for a generally acceptable model as reported in Table 1 (R 2 ¼ 0.802, R 2 adj ¼ 0.767, Q 2 ¼ 0.693, R 2 ext ¼ 0.77). This suggests that the optimal model is very predictive and reliable [24,30]. Also, the plot of predicted pKi against observed pKi as depicted in Figure 2 shows that the model is well trained and correctly predicted the antipsychotic activity of the compounds, an indication of goodness and stability of a model [38]. More so, the plot of observed pKi versus residual pKi ( Figure 3) reveals that there was no systemic error in the model developed as the propagation of residuals was observed on both sides of zero [23] [41]. To estimate the extent of prediction errors, the bootstrapping resampling technique was employed in developing 10000 bootstrap samples [34]. The goodness of the prediction of the model was further established by the low estimated values of bias, variance and mean square errors as reported in Table 1. Error checking by applying Chi-squared (χ 2 ) and Root-mean squared error (RMSE) procedures were also evaluated as to prove the predictive quality of the model reflected in the R 2 . The chi-squared (0.026) and RMSE (0.448) values obtained (Table1) are indications of a good predictivity of the model [30]. Also, to ascertain the quality assurance of the bioinformatics parameters in the model, Prediction Reliability Indicator 1.0 software was utilized [42]. The model displayed good prediction quality for seven of the test compounds with composite score 3 and moderate prediction for the remaining 3 compounds with composite score 2, more so, all the test compounds were fell within the region of Applicability domain while the values for the data set were in closeness with the mean value of the training set molecules (Table 2). Furthermore, to determine the possibility of multicollinearity between the descriptors used in the model, the variance inflation factors (VIF) of all the descriptors in the model were computed (Supplementary  Table SD5) using Equation (13). The corresponding VIF values of the five descriptors used in the optimal model (Model 1) were less than the critical value of 10, a good indication that the developed model is statistically significant, and the descriptors were found to be reasonably orthogonal [23,43].
where R 2 is the correlation coefficient of the multiple regression between the variables within the model. More so, the inhibitory activity (pKi) of the experimental, predicted and residual values of the studied compounds were reported in Supplementary Table SD1. Lower residual value (differences between experimental and predicted activity) obtained also substantiate the good predictivity of the model. Likewise, the Y-Randomization test was investigated to examine the robustness and determine the stability of the model. The results presented in Supplementary Table SD4 showed a low R 2 and Q 2 values which are in agreement with chemometric validation   parameters reported in Table 1. This is an excellent indication that the derived model is very robust, good and dynamic while its cRp^2 value of 0.7137, a value greater than 0.5 further proved that the model is not only inferred by chance but also very predictive [24,44]. The applicability domain of the optimal model (model 1) was equally investigated for the training set and test set using both Euclidean based and Leverage approach procedures. The results (Supplementary   Table SD3 and Figure 4) revealed that all the compounds of the test set fell within the applicability domain of the model. However, compounds 44 and 9 of the training set were identified as possible outliers in the two procedures employed as their normalized mean distance scores as well as their leverage values fell outside the domain (Supplementary Table SD2 and Figure 4) which could be considered as structural outliers (influential compounds) (see Figure 5).

Mechanistic interpretation and elucidation of descriptors in the model
The bioinformatics analysis result revealed that the antipsychotic activity of the studied compounds is influenced by the descriptors; ATS7m, MATS7s,VR2_Dzp, RDF95m and RDF150 denoted as a,b,c,d and e respectively in Table 3. Since pKi ¼ -log 10 [pKi], the positive coefficients of the descriptors; a, b, and c imply that the pKi of the compounds against DAT decreases with an increase in values of these descriptors. Thus, to enhance the antipsychotic activity of these compounds, the values of these descriptors should be appreciably high since the activity is inversely proportional to the concentration. Conversely, the descriptors; d and e have negative coefficients as depicted in the model (Equation (12)) (see Table 4). It suggests that the inhibitory activity (pKi) of the ligands decreases with a decrease in the values of these descriptors. Hence, to enhance the inhibitory potency of DAT, the values of these descriptors should be significantly low. More so, Pearson's correlation matrix was employed to examine the inter-correlation among the descriptors of the model and also the percentage contribution of the descriptors to the observed antipsychotic activities of the compounds was also determined ( Table 5). The fact that the obtained results for Pearson correlation coefficients for each pair of descriptors were less than 0.5, is a clear indication of insignificant inter-correlation among the descriptors in the model [23]. The descriptors MATS7s and RDF95m displayed a predominant effect (32% and 30.4% respectively) on the observed antipsychotic activity of the compounds. Descriptor MATS7s is a descriptor of molecular volume and is defined as Moran autocorrelation weighted by van der Waals volumes. Van der Waal volume is the volume occupied by a molecule or individual atoms of a molecule. Its positive correlation with pKi in the model implies that the higher the descriptor value in a molecule, the lesser the pKi of the molecule and the better its inhibitory activity against the target protein (receptor). Likewise, the descriptor RDF95m is a descriptor of molecular mass and is also defined as a Radial distribution function weighted by relative mass. The negative correlation of the descriptor with pKi in the optimal model shows that the lower its value in a molecule, the lesser the pKi of the molecule and the better the inhibitory activity against the receptor.

Result of molecular docking analysis
Molecular docking analysis was performed on the receptor (protein crystal structure of dopamine transporter elucidates antidepressant mechanism, PDB code 4M48) and the inhibitors (ligands) to elucidate the molecular interactions between the target protein (receptor) and the inhibitors and also to evaluate the performance of Docking Algorithms used in the study. Computed binding affinity (docking scores) of all the complexes which range from -3.9 to -10.05 kcal/mol and their corresponding RMSD values when docking the ligands into the active site of the receptor were contained in Supplementary Table SD1. Some of the selected ligands (1, 21, 28 and 39) of higher binding affinity (-9.0 to -10.05 kcal/mol) and their experimental activity, RMSD values, amino acid residues, bond distance, bond type as well as the type of molecular interactions were presented in Table 6. On visualizing the complexes by using the discovery studio visualizer 2016 version in other to elucidate the type of molecular interactions and binding mode, our findings showed that all the ligands were strongly bounded and fully occupy the active site of the receptor with the formation of major interactions (hydrogen bond, hydrophobic and electrostatic interactions) as reported in Table 6. Ligand 21 with highest docking score (-10.05 kcal/mol) showed two distinct interactions (Hydrogen bond and hydrophobic interactions) with the hydrogen bonding pocket consisting of amino acid residues; TYR123 (2.073 Å), VAL120 (2.073 Å), TRY123 (2.327 Å) and VAL126 (2.327 Å) while the hydrophobic pockets were surrounded by LEU474, ASP475, TYR123 and VAL120 (amino acid residues) of the target protein. More so, all the selected ligands displayed hydrophobic interactions with amino acid residue VAL120 common to all the selected complexes and the selected complexes have a computed RMSD value 2Å, this implies a successful and an excellent docking predictions [45]. Figure 6 depicts 2D and 3D molecular interactions of the ligand 28 while Figure 7 shows the hydrogen bond and hydrophobic interactions when the ligand occupied the active site of the target protein (PDB: 4M48). Ligand 28 revealed two significant interactions (Hydrogen bond and hydrophobic) with the formation of the highest number of hydrogen bonds (6 hydrogen bonds) among the selected ligands. The complex (28) formed three (3) Conventional Hydrogen Bonds, two (2) Carbon Hydrogen Bonds and one (1) Pi-Donor Hydrogen Bond surrounded by amino acid residues (ASP46, PHE43, SER320, ALA44, SER320 and PHE43) and with hydrophobic interactions (1 Pi-Pi T-shaped and 2 Pi-Alkyl) surrounded by the amino acid residues (PHE325, ALA117 and VAL120) in the active site of the target protein ( Table 6). The observed higher number of hydrogen bonds formed by the complex 28 compare to other selected ligands may inform its higher inhibitory value (pKi ¼ 9.30) of the ligand. The observed hydrogen bond, as well as hydrophobic interactions in the complexes, suggests that the selected ligands are potent inhibitor against the target protein (PDB: 4M48).

Conclusion
Chemoinformatic studies were successfully investigated on some inhibitors of Dopamine Transporter (DAT) to develop a predictive, reliable and statistically significant model. The results showed that the model  passed the minimum requirements for a globally acceptable QSAR model (R 2 ¼ 0.802, R 2 adj ¼ 0.767, Q 2 cv ¼ 0.693, R 2 Test ¼ 0.77 and cR 2 p ¼ 0.714). The mechanistic interpretation and elucidation of the descriptors in the model also revealed that two of the molecular descriptors; MATS7s (32.3%) (Moran autocorrelation weighted by van der Waals volumes) and RDF95m (30.4%) (Radial distribution function weighted by relative mass) played predominant roles in the observed anti-psychotic activities of the studied compounds, this implies that in the future design of a novel and highly potent antipsychotic agents, the molecular volume of a compound should be appreciably high while its molecular mass should be significantly low. More so, the predictivity, reliability, stability, robustness, and applicability of the QSAR model were established and the developed model proved to satisfy the OECD principle for a QSAR model development. Furthermore, in elucidating the molecular interactions between the inhibitors and the receptor (PDB:4M48) targeting schizophrenia via Molecular dockings analysis, our findings showed that the study molecules had excellent binding affinities (ranges from-3.9 to -10.05kcl/mol), good docking predictions (RMSD 2Å) and very significant interactions with the protein target. Likewise, ligand 28 (3phenoxy-N-(2-(m-tolyloxy)ethyl)propan-1-aminium) with the highest experimental activity (pKi¼ 9.30) exhibited the highest number of hydrogen bonds when compared to other selected ligands. Hence, the obtained results from this study are envisaged to provide necessary information on the structural requirements and physicochemical parameters/ properties needed to develop novel and more potent antipsychotic therapeutic agents with improved activity for the treatment of schizophrenia and other related mental disorders.

Author contribution statement
Adamu Uzairu: Conceived and designed the experiments; Contributed reagents, materials, analysis tools or data.
Olasupo Sabitu Babatunde: Performed the experiments; Wrote the paper.
Gedion Adamu Shallawang: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.
Sani Uba: Analyzed and interpreted the data; Wrote the paper.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.