Hologram- and Descriptor-Based QSAR Studies for a Series of Non-Azoles Derivatives Active Against C. neoformans

Over the last decades, fungal infections have become a growing health problem, especially for immunocompromised patients. Unfortunately, the gold-standard prophylactic therapy for such infections is based on azole derivatives, which are fungistatic rather than fungicidal against C. neoformans and cause hepatotoxicity. To circumvent these problems, non-azole CYP51 inhibitors have been designed. Here, a comprehensive structure-activity relationship study was carried out for a series of 110 molecules by means of hologram- and descriptor-based QSAR. The best descriptor-based QSAR model (r² = 0.92, q² = 0.90, 6 LVs and r²pred = 0.86) suggests that the resonance effect (ESpm08r) plays a major role in antifungal activity. The hologram-based QSAR model (r² = 0.87, q² = 0.81, 6 LVs and r²pred = 0.84) supports this hypothesis. The insights obtained from the integrated analysis of the QSAR models, together with their good predictive power, demonstrate their usefulness for future drug design efforts.


Introduction
Over the last decades, fungal infections have become an increasing health problem, especially for immunocompromised patients such as those receiving chemotherapy, organ transplant recipients and HIV-positive patients, 1-3 for whom even opportunistic fungi such as Cryptococcus spp. pose a life-threatening issue. In fact, cryptococcal meningoencephalitis, common among HIV patients, has a high mortality rate even when treated with first-line antifungal drugs. 4,5 Unfortunately, the extensive use and prolonged therapy with azole antifungal agents have led to severe resistance, which significantly limits their use. 7 Infections caused by C. neoformans present an additional complexity, as this species is inherently resistant to fluconazole. 8,9 Furthermore, fungistatic rather than fungicidal activity and hepatotoxicity are serious drawbacks of azole use. Apparently, the similarity between human and fungal therapeutic targets makes the pharmacophore requirements of azole drugs closely correlated with their toxicophore: binding of the heterocyclic nitrogen atom (N-3 of imidazole and N-4 of triazole) to the heme iron atom in the active site of lanosterol 14α-demethylase (CYP51) is responsible both for the pharmacological outcome and for the hepatic side effects. In fact, deaths due to liver failure have been ascribed to the coordination of azole drugs to the heme of host cytochrome P450 enzymes, such as CYP3A4, 10,11 whereas their binding to fungal CYP51 arrests fungal growth through the accumulation of 14α-methylated sterol precursors in the fungal membrane. 12-14
Although newer azole derivatives such as posaconazole and voriconazole show lower hepatotoxicity than first-generation azoles, they still cause liver enzyme elevation. 15 On the other hand, inhibitors that exploit specific interactions (hydrogen bonding, hydrophobic interactions and so forth) in the active site of fungal CYP51, but do not interact with the heme prosthetic group, are expected to have an antifungal effect only. 16 Taking this hypothesis into consideration, homology modeling strategies guided the design and synthesis of diverse non-azole CYP51 inhibitors (Figure 1). 10,16,17 Although this strategy relied on subjective structural data and docking studies, compounds more potent than fluconazole against C. neoformans have been designed. As these compounds are structurally different from azole drugs, they probably have dissimilar physicochemical requirements for CYP51 inhibition. Nevertheless, as far as we are aware, no effort has been made to investigate their structure-activity relationships quantitatively. The results described herein aim to narrow this knowledge gap by means of hologram-based and descriptor-based QSAR models, which do not require structural information on the molecular target and are thus less prone to the biases of other modeling strategies.

Data set
The data set used for the QSAR studies contains 110 isoquinoline, chromene and 2-aminotetraline derivatives along with their antifungal activity (Table 1). 2,10,18 The biological property of this data set is reported as MIC80, the concentration required to inhibit 80% of C. neoformans growth. To compensate for small variations among different experiments, fluconazole was used as an internal standard (MIC80,fluconazole), so that the biological data used for QSAR model development are based on the ratio MIC80,cpd/MIC80,fluconazole. These ratios were converted to $\mathrm{pMIC}_{80} = -\log(\mathrm{MIC}_{80,\mathrm{cpd}}/\mathrm{MIC}_{80,\mathrm{fluconazole}})$ and used as the dependent variable in QSAR model development.
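The normalization to fluconazole can be written as a small helper. This is a sketch assuming base-10 logarithms (the usual convention for p-scaled potencies) and that both MIC80 values are in the same units; the function name `pmic80` and the example values are hypothetical and are not taken from Table 1.

```python
import math


def pmic80(mic80_cpd: float, mic80_fluconazole: float) -> float:
    """pMIC80 = -log10(MIC80_cpd / MIC80_fluconazole); both values in the same units."""
    if mic80_cpd <= 0 or mic80_fluconazole <= 0:
        raise ValueError("MIC80 values must be positive")
    return -math.log10(mic80_cpd / mic80_fluconazole)


# Hypothetical values, not from Table 1
print(round(pmic80(1.0, 10.0), 2))    # 1.0  (10x more potent than fluconazole)
print(round(pmic80(100.0, 10.0), 2))  # -1.0 (10x less potent)
```

Because only the ratio enters the logarithm, an assay-to-assay shift that scales both MIC80 values by the same factor leaves pMIC80 unchanged, which is the rationale for using fluconazole as an internal standard.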
The chemical structures were drawn in two-dimensional (2D) format and converted to 3D with CONCORD (standard parameters), as available in the "translate molecular file" tool of the Sybyl-X 1.1 platform (Tripos Inc., St. Louis, USA), and then energy-minimized by the conjugate gradient method using the Tripos force field (convergence criterion 0.001 kcal mol−1). Next, MOPAC charges were assigned to each molecule (AM1 semi-empirical method with the following keywords: 1SCF XYZ ESP NOINTER NSURF = 2 SCALE = 1.4 SCINCR = 0.4 NOMM). This protocol is necessary so that charge-related descriptors can be properly calculated in DRAGON. An unsupervised method, hierarchical cluster analysis (HCA), carried out with Pirouette 4.0 (Infometrix, Washington, USA) using complete linkage (Euclidean distances) and autoscaled data, guided the split of the complete data set into training (compounds 1-75) and test (compounds 76-110) sets (Table 1). Accordingly, at least one compound from each cluster (similarity degree = 90%) was assigned to the test set (Figure 1S).
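The HCA-guided split can be sketched in pure Python. This is a minimal illustration, not Pirouette's implementation: it assumes the "similarity degree" is 1 − d/dmax (dmax being the largest pairwise distance), and the rule for picking the test compound (the highest-index member of every cluster with two or more members) is a hypothetical choice. All function names and the toy descriptor matrix are illustrative.

```python
import math
from statistics import mean, stdev


def autoscale(rows):
    """Mean-centre each column and divide by its sample standard deviation."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def complete_linkage(rows, min_similarity=0.90):
    """Agglomerative HCA, complete linkage, Euclidean distances.

    Merging stops once the closest pair of clusters falls below
    min_similarity, with similarity = 1 - d / d_max (assumed definition).
    """
    d = [[math.dist(a, b) for b in rows] for a in rows]
    d_max = max(max(r) for r in d) or 1.0
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                link = max(d[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        link, i, j = best
        if 1.0 - link / d_max < min_similarity:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def split_by_cluster(clusters):
    """Hypothetical rule: one compound from each multi-member cluster goes to the test set."""
    train, test = [], []
    for members in map(sorted, clusters):
        if len(members) > 1:
            test.append(members[-1])
            train.extend(members[:-1])
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Toy descriptor matrix: two well-separated groups of three compounds each
X = [[1.0, 2.0], [1.1, 2.1], [1.2, 1.9], [5.0, 6.0], [5.1, 6.2], [4.9, 5.9]]
train, test = split_by_cluster(complete_linkage(autoscale(X)))
print(train, test)  # [0, 1, 3, 4] [2, 5]
```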
Descriptor-based QSAR approach
About 2,500 2D molecular descriptors, including topological descriptors, connectivity indices, 2D autocorrelations, physicochemical descriptors and so forth, were computed with DRAGON 5.5 (Talete SRL, Milan, Italy) and then pre-selected as follows: descriptors with high inter-correlation (≥ 0.97) or poor correlation with the biological property (r² < 0.10) were discarded. This strategy yielded 215 descriptors, which were employed to build multiple linear regression (MLR) models with up to 6 descriptors per model using the genetic algorithm available in MOBYDIGS 1.0 (Talete SRL, Milan, Italy). The MLR models were evaluated with the following fitness criteria: QUIK rule (0.005), asymptotic Q2 rule (−0.005), redundancy RP rule (0.05) and overfitting RN rule (0). 19,20 Due to the stochastic nature of the genetic algorithm, the search (population size = 100, reproduction/mutation trade-off = 0.5, selection bias = 50%) was carried out with 10 independent populations of 2000 models each, which evolved for more than 100 generations or at least 2 million models. The descriptors found in the 20 best models were pooled together, autoscaled and employed to develop partial least squares (PLS) models, as implemented in PIROUETTE 4.0 (Infometrix, Washington, USA). 19,21,22

Hologram-based QSAR strategy
HQSAR modeling was carried out as previously described. 23,24 Briefly, each molecule in the data set was decomposed into linear, branched and overlapping fragments, which were hashed into a fixed-length array (53 to 401 bins) called the molecular hologram. The bin occupancies encode compositional and topological molecular information and are used as independent variables in QSAR modeling. Parameters that affect hologram generation, such as hologram length, fragment size and fragment distinction (atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch) and donor/acceptor (DA)), were evaluated during model development using the default fragment size (4-7 atoms) over the 12 default hologram lengths. Next, the influence of fragment size (2-5, 3-6, 5-8, 6-9, 7-10, 8-11) was further investigated for the best model. All models were evaluated by PLS with leave-one-out (LOO) cross-validation (q²), and the stability of the best model was assessed by the mean q² from 25 rounds of leave-5-out cross-validation.
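The fragment-to-hologram step can be illustrated with a simplified sketch. The assumptions are all hypothetical: molecules are hydrogen-depleted graphs given as atom symbols plus bond index pairs; only linear fragments are enumerated (HQSAR also uses branched ones); fragments are distinguished by atom type only, roughly the "A" distinction; and `zlib.crc32` stands in for SYBYL's hashing function, which is not reproduced here.

```python
import zlib
from collections import Counter


def linear_fragments(atoms, bonds, min_size=4, max_size=7):
    """Count unbranched paths of min_size..max_size atoms (direction-independent)."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    seen, counts = set(), Counter()

    def extend(path):
        if min_size <= len(path):
            key = min(tuple(path), tuple(reversed(path)))  # same path from either end
            if key not in seen:
                seen.add(key)
                fwd = "-".join(atoms[i] for i in path)
                rev = "-".join(atoms[i] for i in reversed(path))
                counts[min(fwd, rev)] += 1  # canonical label, e.g. 'C-C-N'
        if len(path) < max_size:
            for nb in adj[path[-1]]:
                if nb not in path:
                    extend(path + [nb])

    for start in range(len(atoms)):
        extend([start])
    return counts


def hologram(counts, length=53):
    """Hash each fragment label into one of `length` bins and accumulate its count."""
    bins = [0] * length
    for label, n in counts.items():
        bins[zlib.crc32(label.encode()) % length] += n
    return bins


# Hypothetical 5-atom chain C-C-C-C-N
atoms = ["C", "C", "C", "C", "N"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
frags = linear_fragments(atoms, bonds, min_size=2, max_size=3)
print(sorted(frags.items()))
# [('C-C', 3), ('C-C-C', 2), ('C-C-N', 1), ('C-N', 1)]
print(sum(hologram(frags)))  # 7: collisions merge bins but never lose counts
```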

QSAR model validation
External validation was carried out with a test set of 35 compounds that were not used in QSAR model development. 27
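The section does not spell out how r²pred is computed. A commonly used definition, assumed here, is r²pred = 1 − PRESS/SD, where PRESS = Σ(y_obs − y_pred)² over the test set and SD = Σ(y_obs − ȳ_train)², with ȳ_train the mean pMIC80 of the training set. The function name and the values below are hypothetical.

```python
def r2_pred(y_obs, y_pred, y_train_mean):
    """External r2: 1 - PRESS / SD, with SD taken around the training-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    sd = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / sd


# Hypothetical pMIC80 values for three test compounds (not from Table 2)
print(round(r2_pred([0.0, -1.0, 0.5], [0.1, -0.8, 0.3], y_train_mean=-0.5), 2))  # 0.94
```

Using the training-set mean in SD (rather than the test-set mean) keeps the statistic from rewarding a model merely for matching the test set's own average.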

Results and Discussion
Hepatotoxicity is one of the main concerns in the long-term treatment of deep-seated fungal infections with azole drugs, especially among immunocompromised patients who suffer from recurrent fungal infections. 28 As a consequence, medicinal chemistry efforts have been directed towards separating the toxicophore of azoles from their pharmacophore. Some progress has been achieved by maximizing interactions with residues of fungal lanosterol 14α-demethylase that are different from or absent in the human counterpart, while ruling out interactions with the heme group. This strategy is hampered by the lack of reliable structural information on the macromolecular target, but could benefit from ligand-based strategies, such as quantitative structure-activity relationship models based on either topological or hologram-based descriptors.
Accordingly, a data set of 110 isoquinoline, chromene and 2-aminotetraline derivatives was split into a training set (compounds 1-75) and a test set (compounds 76-110) so that chemical diversity and potency range were similarly represented in both sets (Figure 1S). The antifungal activity against C. neoformans (MIC80) ranges from 1370 mmol L−1 to 0.5 mmol L−1 (a factor of about 2700) and was measured under the same experimental conditions, which makes this property suitable for QSAR studies. Although all assays were carried out under standardized NCCLS (National Committee for Clinical Laboratory Standards) conditions, the biological data still show a minor inter-experiment variation; for instance, MIC80 values for fluconazole vary from 6.5 mmol L−1 to 13.1 mmol L−1. To overcome this issue and produce robust QSAR models, the MIC80,cpd/MIC80,fluconazole ratio was used to calculate pMIC80 ($-\log(\mathrm{MIC}_{80,\mathrm{cpd}}/\mathrm{MIC}_{80,\mathrm{fluconazole}})$), the dependent variable used in QSAR modeling. This strategy improved the overall statistical soundness of the QSAR models by reducing experimental noise and proved essential for the subsequent steps.
Next, the 215 pre-selected DRAGON 5.5 descriptors were employed to build preliminary multiple linear regression (MLR) QSAR models with up to 6 variables by means of the genetic algorithm in MOBYDIGS 1.0. QSAR models with up to 5 variables had q² values below 0.5, but improved results were achieved with 6 variables (r² = 0.78, q² = 0.76). Nevertheless, the predictive ability of the MLR models was marginal (r²pred = 0.19). This suggests that the chemical and structural features captured by the model do not extend beyond the chemical space of the training-set compounds, limiting its usefulness in drug design. To improve the applicability domain of the QSAR models, we resorted to more powerful chemometric tools, namely partial least squares (PLS) and principal component analysis (PCA), available in PIROUETTE 4.0. Thus, the 32 descriptors found in the 20 best models were pooled together, autoscaled and used for further PCA and PLS QSAR model development. The underlying goal of using an unsupervised chemometric tool such as PCA is to investigate whether the low predictive power is due to insufficient sampling of descriptor space (few descriptors per model) or to inadequate representation of chemical space (i.e., 3D information is required to describe the structure-activity relationship for this data set).
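The descriptor pre-selection thresholds given earlier (inter-correlation ≥ 0.97, r² < 0.10 with the activity) and the autoscaling step can be sketched as follows. The text does not say which member of a highly inter-correlated pair is kept; this sketch assumes the one better correlated with pMIC80 survives and treats 0.97 as a threshold on |r|. Function names and the toy data are hypothetical, and DRAGON/MOBYDIGS may apply these filters differently.

```python
from statistics import mean, stdev


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant vector."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def prefilter(descriptors, y, max_intercorr=0.97, min_r2=0.10):
    """Drop descriptors with r2 < min_r2 vs. activity, then thin out near-duplicates."""
    r2 = {k: pearson(v, y) ** 2 for k, v in descriptors.items()}
    ranked = sorted((k for k in descriptors if r2[k] >= min_r2),
                    key=lambda k: r2[k], reverse=True)
    selected = []
    for name in ranked:
        # keep a descriptor only if it is not highly correlated with one already kept
        if all(abs(pearson(descriptors[name], descriptors[s])) < max_intercorr
               for s in selected):
            selected.append(name)
    return selected


def autoscale(column):
    """Autoscaling: zero mean, unit (sample) standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


y = [1, 2, 3, 4, 5]  # hypothetical pMIC80 values
descriptors = {
    "d1": [1, 2, 3, 4, 5],
    "d2": [2, 4, 6, 8, 10.5],  # nearly collinear with d1
    "d3": [5, 1, 4, 2, 3],     # r2 = 0.09 vs. y
    "d4": [1, 1, 1, 1, 1],     # constant
    "d5": [2, 1, 4, 3, 5],     # r = 0.8 vs. y and vs. d1
}
print(prefilter(descriptors, y))  # ['d1', 'd5']
print(autoscale([1, 2, 3]))       # [-1.0, 0.0, 1.0]
```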
Analysis of the loading plot for the first 2 PCs, which account for 67.6% of the data variance (Figure 2), shows that 80% of the most potent inhibitors (pMIC80 > 0.01) have PC2 values below −2.5, whereas 79% of the average-potency inhibitors (−1.0 < pMIC80 < 0.0) lie between −2.5 and 2.0 and 83% of the weaker inhibitors (pMIC80 < −1.0) lie above 2.0. This result clearly shows that PC2 accounts for compound potency, suggesting that the selected descriptors are capable of explaining the potency profile of this data set and are thus suitable for QSAR model development. Accordingly, they were gathered, autoscaled and employed to develop partial least squares (PLS) regression models in PIROUETTE 4.0. The initial models showed only a minor improvement in statistical values (r² = 0.81 and q² = 0.77, 6 LVs), but a large increase in predictive ability (r²pred = 0.72) compared with the MLR model (r²pred = 0.19). To further improve the statistical soundness of the QSAR models, 16 descriptors with minor contributions to the regression vector were iteratively excluded until no further improvement in statistical values was achieved. The regression vector can be thought of as a weighted sum of the loadings included in the model; thus, descriptors with small coefficients do not contribute significantly to explaining the dependent variable (biological activity) and can be discarded. This strategy led to an overall improvement in model fit (r² = 0.92, q² = 0.90) and predictive power (r²pred = 0.86) when 6 latent variables (LVs, 96.4% explained variance) were employed (Figure 3 and Table 2).
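The regression vector discussed above comes from a PLS model. As an illustration only (the models in this work were built in PIROUETTE 4.0, whose implementation is not reproduced), the sketch below fits PLS1 with the NIPALS algorithm in pure Python and returns the regression coefficients b = W(PᵀW)⁻¹q. With autoscaled descriptors these coefficients are directly comparable, which is what makes pruning small-coefficient descriptors meaningful. The function name and data are hypothetical.

```python
def pls1(X, y, n_components):
    """PLS1 (NIPALS) on mean-centred data.

    Returns (coef, x_means, y_mean); predict with y_mean + sum(coef_j * (x_j - x_mean_j)).
    """
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    E = [[row[j] - xm[j] for j in range(p)] for row in X]  # X residuals
    f = [v - ym for v in y]                                # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # y already fully explained; stop adding components
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]       # scores
        tt = sum(v * v for v in t)
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                        # y loading
        E = [[E[i][j] - t[i] * pl[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                           # deflate y
        W.append(w)
        P.append(pl)
        Q.append(q)
    # b = W (P^T W)^-1 q; P^T W is upper triangular in NIPALS PLS1 -> back substitution
    A = len(W)
    M = [[sum(P[a][j] * W[b][j] for j in range(p)) for b in range(A)] for a in range(A)]
    c = [0.0] * A
    for a in reversed(range(A)):
        c[a] = (Q[a] - sum(M[a][b] * c[b] for b in range(a + 1, A))) / M[a][a]
    coef = [sum(W[a][j] * c[a] for a in range(A)) for j in range(p)]
    return coef, xm, ym


X = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
y = [5, 2, 4, 6, 1]  # hypothetical: y = 3 + 2*x1 - x2 exactly
coef, xm, ym = pls1(X, y, n_components=2)
print([round(c, 6) for c in coef])  # [2.0, -1.0]
```

With as many components as descriptors, PLS1 reproduces ordinary least squares, which is why the toy example recovers the exact coefficients. The iterative exclusion described in the text would refit after removing the descriptor with the smallest |coef| and keep the change only if the cross-validated statistics improve; that loop is not shown.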
Beyond statistical soundness, useful QSAR models should also provide some insight into the physicochemical and structural requirements for biological activity. 29 This information can be gathered from the regression vector plot, which shows the relative importance of each descriptor in the final QSAR model (Figure 4). According to the regression vector, B09[C−N] has the greatest positive contribution to the antifungal activity of these compounds.
B09[C−N] encodes the presence or absence of a carbon atom (from an aliphatic side chain or a methoxyl group) and a nitrogen atom at a topological distance of 9 bonds. Interestingly, 93.33% of the weak inhibitors (pMIC80 < −1.0) have a zero value for this descriptor, whereas this C−N pair is found in most average (73.33%) and potent (93.33%) inhibitors (Figure 5). Although this descriptor can be traced to different molecular fragments, most of them are constant throughout the data set, whereas some chemical diversity can be traced to fragments connecting methoxyl groups to the nitrogen atom of isoquinoline inhibitors (Figure 5). In fact, linear alkyl substituents with 8 or more atoms attached to the nitrogen atom seem to improve the biological activity of non-azole compounds. This hypothesis is reinforced by docking studies carried out by Yao et al. (2007), which suggest that such moieties may fit properly into the hydrophobic pocket of lanosterol 14α-demethylase. 10 On the other hand, ESpm08r, the eighth-order spectral moment of the edge adjacency matrix weighted by resonance integrals, 30 shows a negative contribution to the biological activity (Figure 6). It is therefore reasonable to assume that moieties that affect the charge distribution on aromatic rings, such as the fused pyranone or dihydropyran ring, play an important role in the biological activity.
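The binary descriptor B09[C−N] can be evaluated on a hydrogen-depleted molecular graph with a breadth-first search for topological (shortest-path) distances. This is a sketch of the definition given above, not DRAGON's code; the graph encoding and the toy N-alkyl chains (built by the hypothetical helper `n_alkyl_chain`) are illustrative assumptions.

```python
from collections import deque


def topological_distances(n_atoms, bonds, source):
    """Shortest-path bond counts from `source` to every atom (BFS)."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [None] * n_atoms
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def b_descriptor(atoms, bonds, sym1, sym2, lag):
    """1 if any sym1/sym2 atom pair is exactly `lag` bonds apart, else 0."""
    for i, s in enumerate(atoms):
        if s != sym1:
            continue
        dist = topological_distances(len(atoms), bonds, i)
        if any(atoms[j] == sym2 and dist[j] == lag for j in range(len(atoms))):
            return 1
    return 0


def n_alkyl_chain(n_carbons):
    """Hypothetical test graph: a nitrogen bearing a linear chain of n carbons."""
    atoms = ["N"] + ["C"] * n_carbons
    bonds = [(i, i + 1) for i in range(n_carbons)]
    return atoms, bonds


print(b_descriptor(*n_alkyl_chain(9), "C", "N", 9))  # 1
print(b_descriptor(*n_alkyl_chain(8), "C", "N", 9))  # 0
```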
The importance of ring substituents for the biological property has already been highlighted by Tang et al. (2008), albeit for a different reason: those authors proposed that these moieties contribute to activity through hydrogen bonding to Tyr169. Our results do not contradict this hypothesis, but add a new, quantitative perspective: charge distribution, which may strengthen or weaken a hydrogen bond, contributes significantly to the potency of non-azole antifungals.
Although homology models have played their part in novel antifungal drug design campaigns, we decided to investigate whether 2D information alone would suffice to guide the development of second-generation non-azole antifungals. Therefore, we resorted to another 2D QSAR approach, hologram QSAR (HQSAR), which has been shown to be as effective as many 3D QSAR approaches and complements the information provided by descriptor-based 2D QSAR models. 26,27,31,32 Hence, molecular holograms were generated for the training-set compounds using a number of fragment distinction combinations (Table 3).
The standard fragment distinction (ABC) gives a poor fit (r² = 0.55), which was not improved by adding hydrogen (H) or chirality (Ch) to the fragment distinction (compare models 2 and 3 with model 1). A slight improvement was observed when donor and acceptor atoms (DA) were considered (r² = 0.58). The low statistical quality of these models might indicate either that hologram-based descriptors are not suitable to describe the biological property of non-azole antifungals or that the fragment distinction combinations investigated so far have opposing effects on the model. For instance, DA might improve the fit while B has the opposite effect. To investigate this hypothesis further, simpler models (5 to 8) were built in which either B or C was omitted from the fragment distinction. This strategy provided two models with improved fit but still limited internal consistency (models 7 and 8). Further addition of the C distinction to model 8 resulted in the best-fitted model, but no significant improvement in internal consistency was observed. To circumvent this setback, the influence of fragment size on the statistical parameters was also investigated (Table 4). Usually only the model with the highest statistical quality is evaluated, but, as shown below, this strategy might be misleading.
It is well known that fragment distinction and fragment size control the number of fragments that are generated and then hashed into the hologram. During this step, fragment collisions, which become unavoidable when there are more distinct fragments than hologram bins, lead to abrupt changes in the statistical parameters. 33 Thus, it is reasonable to assume that even sub-optimal HQSAR models built with the default fragment size (4-7 atoms) might yield the best QSAR model after fragment-size optimization. Therefore, the influence of fragment size was investigated for all models with q² > 0.50 (Table 4). This strategy revealed a common trend among the models: larger fragments afford models with significantly higher statistical quality, including the best model (Figure S2 and Table 4).
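The pigeonhole effect behind fragment collisions can be checked with a short sketch: with more distinct fragments than bins, at least (fragments − bins) of them must share a bin, so the hologram length and the number of fragments generated by a given fragment size interact. The fragment labels below are hypothetical, and `zlib.crc32` is an assumed stand-in for SYBYL's hashing function.

```python
import zlib


def count_collisions(labels, length):
    """Number of fragment labels that land in an already occupied bin."""
    occupied = set()
    collisions = 0
    for label in labels:
        b = zlib.crc32(label.encode()) % length
        if b in occupied:
            collisions += 1
        else:
            occupied.add(b)
    return collisions


# 100 hypothetical distinct fragment labels
labels = [f"C{'-C' * k}-N{'-O' * m}" for k in range(10) for m in range(10)]
for length in (53, 97, 401):  # three of the hologram lengths in the 53-401 range
    print(length, len(labels), count_collisions(labels, length))
```

The exact collision counts depend on the hash function, which is why no expected output is given; only the lower bound is guaranteed.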
This intriguing feature must be related to some hidden SAR pattern that was not captured by the default-sized HQSAR models. One reasonable explanation is that fragments of at most 7 atoms are too short to capture the contribution of the N-alkyl side chain, which according to Yao et al. (2007) and Tang et al. (2010) is essential to activity.
Besides good statistical quality, QSAR models should be able to predict the biological property of congeneric molecules and thus guide the development of second-generation non-azole antifungals. The predictive power of the best HQSAR model (Figure 7 and Table 2), evaluated as described in the QSAR model validation section (r²pred = 0.84), is similar to that of the descriptor-based QSAR model (r²pred = 0.86). It should be noted, however, that training-set compounds with pMIC80 > 0.5 are frequently predicted to be less potent than they really are. This trend is less evident in the descriptor-based QSAR model, suggesting that molecular holograms do not capture all the information required to explain the biological activity of non-azole antifungals that are more potent than fluconazole. Nevertheless, it should be possible to integrate the information provided by both models to reveal structural and chemical properties that are crucial for antifungal activity. As stated before, the regression vector plot can be used to analyze the contribution of each descriptor to the descriptor-based QSAR model. Similarly, contribution maps, which graphically represent atoms or fragments with positive (yellow or green) or detrimental (orange or red) contributions to antifungal activity, can be used to analyze the HQSAR model. A shortcoming of this analysis is that fragments that hold pharmacophore groups in position can be colored gray (neutral contribution), because they occur in both weak and potent compounds.
According to the contribution map of the least potent compound (cpd 06, pMIC80 = −2.02), the aromatic ring is partially colored red (Figure 8), whereas in potent compounds (cpd 35, pMIC80 = 0.60) the aromatic ring is largely colored green and yellow, indicating that this ring can either improve or decrease the biological activity. This coloring scheme may be associated with resonance effects, since the most potent compound has one of the lowest ESpm08r values (11.5), whereas the least potent one has the highest (12.2).

Conclusions
Although our approach is based solely on 2D fragments or physicochemical descriptors, QSAR models with good statistical quality were developed (descriptor-based: r² = 0.92, q² = 0.90, 6 LVs, r²pred = 0.86; hologram-based: r² = 0.87, q² = 0.81, 6 LVs, r²pred = 0.84). Furthermore, the integrated analysis of the 2D QSAR models proved useful to highlight the importance of resonance effects for the antifungal activity of isoquinoline, chromene and 2-aminotetraline derivatives, as well as to pinpoint which parts of the molecules are most influenced by this feature. For instance, HQSAR contribution maps suggest that fused rings play a crucial role in antifungal activity through electrostatic interactions with lanosterol 14α-demethylase. Moreover, the great molecular diversity of the data set, along with the good predictive ability of the QSAR models, indicates that the molecular features underscored in this study should apply not only to training-set compounds but also to congeneric molecules within the chemical space sampled by the isoquinoline, chromene and 2-aminotetraline derivatives. Thus, the QSAR models reported herein should be useful to guide the design of more potent non-azole antifungal compounds.

Figure 2. Loading plot for training-set compounds according to PCA. Weak inhibitors (open diamonds) are clearly separated from average (gray squares) and potent (black circles) non-azole antifungals.

Figure 3. Predicted vs. Experimental values of pMIC 80 according to the best descriptor-based QSAR model.

Figure 4. Regression vector plot. Descriptors that increase potency have positive coefficients, whereas those that reduce potency have negative values.

Figure 5. (A) Plot of B09[C−N] values versus pMIC80 for training-set compounds; (B) B09[C−N] descriptor highlighted in different molecules of the training set.

Figure 6. (A) Plot of ESpm08r values versus pMIC80 for training-set compounds; (B) influence of fused rings on the charge density and antifungal activity of selected compounds.

Figure 7. Predicted vs. experimental values of pMIC80 according to the best hologram QSAR model.

Table 1. Structural scaffold and pMIC80 values of training- and test-set compounds

Table 2. Predicted pMIC80 values for test-set compounds according to the hologram-based (HQSAR) and descriptor-based QSAR models

J. Braz. Chem. Soc.
Table 4. Influence of fragment size on the statistical parameters of the 3 best HQSAR models