D Chemometric Studies of a Series of Azole Derivatives Active against Fluconazole-Resistant Cryptococcus gattii

Apesar dos avanços no desenvolvimento de antifúngicos, tem ocorrido um aumento de casos de criptococose que não respondem de forma adequada a fluconazol (fármaco de primeira escolha). Portanto, é de suma importância investigar as propriedades químicas de derivados azólicos que sejam ativos contra cepas de Cryptococcus neoformans resistentes a fluconazol. Visando alcançar esse objetivo, o perfil de suscetibilidade de um isolado clínico de C. neoformans resistente contra 33 derivados azólicos comerciais foi avaliado junto com as suas respectivas concentrações inibitórias mínimas (MIC). Esses dados foram utilizados para construir modelos SIMCA (modelagem independente flexível por analogias de classes) que destacam a importância de propriedades eletrônicas (JGI10) para separar as moléculas ativas das inativas e para construir modelos de holograma-QSAR que apresentam bom ajuste, mas capacidade preditiva baixa (HQSAR, r = 0.85, q = 0.35 e rpred = 0.38). Por outro lado, modelos de QSAR 2D desenvolvidos a partir de descritores topológicos apresentaramm boa qualidade estatística (r = 0.95, q = 0.86, rpred = 0.72) e destacam que a distribuição de cargas (GGI1) e a eletronegatividade topológica (GATS1e e MATS2e) devem ser modulados para contornar a resistência de C. neoformans.


Introduction
F u n g a l m e n i n g i t i s , c a u s e d e i t h e r b y Cryptococcus neoformans or Cryptococcus gattii, 1 distresses approximately 1 million people each year and causes more than 600 thousand deaths. 2 The first one is worldwide relevant, especially for HIV positive patients, while the second also can affect immunocompetent persons in tropical and subtropical regions. 3Currently, azole derivatives are considered as the first choice drug for long-term therapy. 4However, the emergence of fluconazoleresistant C. neoformans isolates, along with the fact that C. gattii infections respond poorly to azole treatment, 5,6 poses a tremendous hurdle to this approach. 7,8Taking this scenario into consideration, many research groups have focused their efforts towards the development of novel antifungal drugs that could circumvent resistance issues. 9ost antifungal drug design campaigns have relied on Vol. 24, No. 6, 2013   old-fashioned trial and error paradigm, according to which lead compounds have their structures modified, guided by synthetic feasibility and intermediate biological assay results, until the desired potency/selectivity is achieved.][12][13] An alternative strategy would be to explore ligand-based strategies to extract information from the large amount of data available from phenotypic assays.This approach allows the screening of hundreds of compounds at low costs, within a reasonable amount of time, and takes into consideration pharmacokinetic issues, such as drug permeability through the cell membrane, that are not accounted for in target-based assays. 14This sort of biological data has already been used, for instance, to build descriptor-based classificatory models (k-nearest neighbor (KNN) and soft independent modeling of class analogy (SIMCA)) that hint at the importance of electronegativity and dipolar moment towards the biological activity of azole derivatives against Moniliophthora perniciosa, the causal agent of Witches Broom disease. 15Although all azole antifungals target the same macromolecule, subtle differences in the binding site might alter the structural requirements for potency.7][18] Thus, our group decided to investigate the structure-activity relationships of commercial azole against a fluconazole resistant C. gattii clinical isolate by means of 2D chemometric approaches. 15,19So, in order to accomplish the goal, the biological activities of 33 azole derivatives were measured under standardized conditions and used to build robust and predictive classificatory and quantitative 2D chemometric models that underscore crucial chemical features for antifungal activity towards a fluconazole resistant C. gattii strain.

Reagents
The 33 imidazole and triazole derivatives employed in this study were purchased from the Sigma-Aldrich Fisher Company (St. Louis, MO, EUA) with purity equal or superior to 95% (Table S1 in the Supplementary Information (SI) section).All other reagents used in the culture medium preparation, buffer solutions and so on were acquired from well-known chemical companies (example Sigma-Aldrich and Merck chemical (Darmstadt, Germany)) and used without further purification.

Disc diffusion susceptibility assay
The activity profile of 33 azole derivatives against a fluconazole resistant C. gattii strain (available in the Microbiology Research laboratory (LPMC) from Pharmacy School of Federal University of Bahia) was evaluated by disk diffusion susceptibility assay. 20onsidering the lack of standards concerning Cryptococcus spp assays, all compounds were initially evaluated at the standard concentrations employed for an American Type Culture Collection Candida albicans (ATCC 90028): 8.2 mmol L -1 (25 μg disc -1 ) as suggested by M-44A2 guideline. 20Twice the standard concentration (16.4 mmol L -1 ) and half of it (4.1 mmol L -1 ) were also evaluated to probe the susceptibility range profile (Table 1).All assays were carried out in triplicate, along with positive growth, negative growth and dimethyl sulfoxide (DMSO) controls.Process control (manipulation) was probed by simultaneous evaluation of Candida albicans (ATCC 90028) and Candida parapsilosis (ATCC 22019) susceptibility profile against standard fluconazole concentration (8.2 mmol L -1 ).

Minimum inhibitory concentration assays
The broth microdilution procedure was employed to determine the minimum inhibitory concentrations (MIC). 21riefly, the azole derivatives were solubilized in DMSO, except for fluconazole that was solubilized in Roswell Park Memorial Institute medium (RPMI) 1640, to produce stock solutions (1600-5120 mg mL -1 for fluconazole) that were serial diluted and dispensed into microdiluition plates.The final solutions (ranging from 128 to 0.062 mg mL -1 ) were inoculated with a McFarland 0.5 standard saline suspension, containing C. gattii strain, diluted into saline:RPMI solution (1:100).
All MICs were defined as the lowest concentration of azole that produces more than 50% inhibition growth (Table 1).Every measurement was carried out in quadruplicate and positive control (fluconazole MIC against standard strains Candida parapsilosis (ATCC 22019) and Candida krusei (ATCC 6258)), sterility control (only RPMI) and DMSO control (20 μL of DMSO, 80 μL of RPMI and 100 μL of fungal suspension) were also carried out. 21

Data set
The data set used for the chemometric studies comprises 33 azole derivatives, whose biological activity against Cryptococcus gattii was measured as described previously.The chemical structures were sketched in Sybyl-X 1.1 platform (Tripos Inc., St. Louis, USA) and energy optimized with Tripos molecular force-field (gradient < 0.01), using Gasteiger-Huckel charges and a dielectric constant equal to 80. Next, PM3 semi-empiric method (keywords: 1SCF XYZ ESP NOINTER SCALE=1.4NSURF=2 SCINCR=0.4NOMM) was used to assign partial charges to all molecules, as available in Sybyl-X 1.1 Mopac module.
Two biological properties were evaluated in this study: susceptibility assay distinguishes active from inactive azoles against fluconazole resistant C. gattii, whereas MIC values probe the potency profile.Hence, these values were separately used to build classification/pattern recognition models and multiple linear/partial least squares regression models.Thus, the diameter of inhibition halo, measured at 4.1 mmol L -1 (Table 1), was used to split the compounds into active (inhibition halo larger than 25 mmm) or inactive (Figure 1 and Table 2) compounds.MIC values, expressed in molar concentration (Table 1), were transformed to pMIC (-log [MIC]) and used as dependent variables in   4).

Descriptor calculation and selection
2D descriptors available in DRAGON 5.5 (Talette SRL, Milan, Italy) were calculated for the whole dataset.Next, descriptors with high pairwise-correlation (≥ 97%) and low variance (< 10%) were excluded.The remaining descriptors were subjected to different selection protocols, as described below: (i) Classification/pattern recognition models: Fisher's weight 15,22 was employed to identify descriptors that individually differentiate the two classes of compounds.Descriptors with values above the mean plus two times the standard deviation (95% confidence interval) were selected for further chemometric analysis (KNN and SIMCA) available in the Pirouette 4.0 software (Infometrix Inc., Washington, USA).
(ii) MOBYDIGS 1.0 software (Talette SRL, Milan, Italy) was employed to build preliminary multiple-linear regression QSAR (MLR-QSAR) models, with up to 4 descriptors per model through genetic algorithm. 23MLR-QSAR model optimization was guided by the following rules: QUIK rule (0.05), asymptotic Q 2 rule (-0.005), redundancy RP rule (0.1) and overfitting RN rule (0.01). 24,25Due to the stochastic nature of genetic algorithm, the search was carried out in ten independent populations of 2000 models.Each population evolved for at least 1500 generations.Finally, the descriptors found in models with q 2 > 0.64 were polled together, autoscaled and employed for further chemometric analysis (hierarchical cluster analysis (HCA), principal component analysis (PCA) and partial least square regression (PLS)) available in the Pirouette 4.0 software.

KNN and SIMCA studies
KNN models were built considering the Euclidean distance between the unknown sample (yet to be classified) and the k-th nearest neighbors, whose class is previously known. 26,27Too small or too large, a number of neighbors causes either instability or loss of precision to the model, so the optimum number of neighbors was determined by leave-one-out cross validation (LOO cv ).
LOO cv approach was also employed to evaluate the number of PCs that should be used to describe each class in SIMCA models.The model improvement was carried out by the iterative elimination of descriptors with low discriminant or modeling power.

Model evaluation and validation
As two biological properties were employed during chemometric model development, dissimilar criteria were necessary to evaluate their fit and predictive power.Nevertheless, in both cases, the complete data set was randomly split into training and test set compounds for external validation purposes (see Table 2 for KNN and SIMCA and Table 4 for descriptor-based and fragmentbased QSAR models).Classification and pattern recognition models were evaluated according to the percentage of correctly classified training set compounds, whereas the equivalent measure for test set compounds was employed to account for the predictive ability of such models.
On the other hand, fragment-based (HQSAR) and descriptor-based (QSAR) models were evaluated by LOO cross-validation (q 2 ) and had their external predictive power (r 2 pred ) calculated using the approach described in the lieterature. 31,32

Results and Discussion
5][36] It is noteworthy that although modern drug design methodologies have been successfully applied to fight Candida spp.azole-resistance and guide the development of antifungal compounds, 19,37,38 similar strategies have been scarcely employed in the development of drugs against Cryptococcus spp.
One approach to circumvent this dilemma is to evaluate the structural and physicochemical features of azole derivatives that are currently available to fight Cryptococcus spp resistant strains, such as the fluconazole resistant clinical isolate of C. gattii employed in this work.This new stand point of view over azole antifungals must be considered in two steps.First, it is essential to underscore which properties should be considered to properly select an azole derivative that is active against a fluconazole-resistant C. gattii infection.Secondly, it is important to highlight which features account for the different biological profile observed among azole derivatives against C. gattii since this information might shed some light on the chemical requirements to overcome fungal resistance.Most of the time, pharmacists and physicians measure fungal susceptibility profile by diffusion disc assay, so this technique was used to classify a dataset of 33 commercial azole antifungals as active or inactive against C. gattii.
Due to the lack of standardized microbiologic protocols to assay most azole derivatives included in this study, adaptations to M44-A2 protocol were required. 20reliminary assays were carried out with discs containing 8.2 mmol L -1 of each compound, a concentration equivalent to that proposed for evaluating the susceptibility of Candida albicans (ATCC 90028) against fluconazole.Next, twice (16.4 mmol L -1 ) and half (4.1 mmol L -1 ) of the standard concentration were probed (Table 1).Although diffusion disk protocols do not allow us to evaluate the compound potency, the results clearly highlight that some azole derivatives are ineffective against C. gattii in all concentrations assessed (example, fluconazole, fluotrimazole and flutriafol).On the other hand, many compounds have inhibition halos greater than the average (example, ketoconazole, cyproconazole, difenoconazole, diniconazole, hexaconazole and so forth).Considering the average inhibition halo as criteria (Table 1), it is possible to split the azole derivatives into either inactive or active against C. gattii.Regardless the disc concentration, the compound classification remains the same for all but 3 compounds (highlighted in Table 1), so it was decided to use the results from the lower concentration disc to classify 18 compounds as active (54.5%) and 15 as inactive (45.5%) against C. gattii (Table 2).
In order to evaluate if the chemical information provided by this study would be restricted to the compounds analyzed or could be generalized to congeneric azole derivatives, not used for model calibration, the initial dataset was randomly split into training set (13 active and 11 inactive compounds) and test set (5 active and 4 inactive compounds) (Table 2).Although DRAGON 5.1 software provides more than 2400 topological descriptors upon which chemometric tools could be applied, a limited number of those indeed is related to the biological activity of azole derivatives against C. gattii.In order to select those descriptors that have any correlation to fungal susceptibility assay, Fisher's weight was employed to select descriptors that individually differentiate active from inactive compounds.The 10 descriptors (Table 3) selected by this strategy were gathered, autoscaled and used for KNN and SIMCA model developments.
KNN approach ascribes to all descriptors similar importance to compound classification and is highly influenced by the number of neighbors.In order to avoid model instability as well the loss of precision caused by either smaller or greater than optimal number of neighbors, LOO cv was used to select the number of neighbors that best fit the data.Accordingly, the KNN model built with 10 neighbors correctly classified 69% of active and 73% of inactive training set compounds.Similar accuracy degree was observed for the test set compounds (60% of the active and 75% of inactive compounds correctly classified).In order to improve the quality of the statistical model and avoid problems associated with highly correlated descriptors (example, BEHm3 vs. MATS8p (r = 0.72) or BEHm3 vs. BEHe3 (r = 0.79)), it was resorted to chemometric approaches that rely on PCs, such as SIMCA.This approach not only reduces the dimensionality of the models and avoids correlation issues, [39][40][41] but also allows a compound to be classified as "unknown" in case it lies in a chemical space outside the applicability domain of the model. 42,43In addition, SIMCA models can be optimized by the iterative exclusion of descriptors that have minor contribution to class separation (discriminating power) or that display little influence on the model (modeling power). 44Despite the fact that the initial model presents good statistical results (83% of active and 81.3% of inactive compounds correctly classified with 4 PCs), the exclusion of 3 descriptors lead to a less complex SIMCA model, which correctly classifies 85% to active and 100% of inactive training set compounds, using 3 PCs for each classes.Although, a somewhat lower classification power was observed for the test set compounds of best model (60% of the active and 75% of inactive compounds correctly predicted), this result is still better than the initial model (33% of active and 45% of inactive compounds) (Figure 2).
A model is as good as the data it is built from, so it is not surprising that the disc diffusion assays provided models with limited classification/predictive power.6][47] The limitation of disc diffusion assays can be partially a consequence of compound diffusion rate in agar.In fact, a potent compound with low diffusion rate would produce a small inhibition halo and thus would be considered "inactive".This subject deserves some thought if triticonazole is taken into consideration.According to disc diffusion assay, this compound is inactive (19.3 mm inhibition halo), but microdilution in broth assay reveals that its MIC (12.59 mmol L -1 ) is comparable to the most potent compounds.Interestingly, the best SIMCA model "erroneously" classifies this compound as active.
Despite moderate predictive capability, the best SIMCA model can shed some light on the structure-activity relationship of azole derivatives that are active against fluconazole-resistant C. gattii.A similar strategy was successfully employed by Mota et al. 15 which uncovered the importance of electronegativity (BEHe3) and dipolar moment (JGI4), as well as H-bonding towards the activity of azole derivatives against M. perniciosa.As expected, evolutionary pressure, both natural and caused by antifungal-therapy, leads C. gattii to have a different susceptibility profile from that observed for M. perniciosa.Accordingly, azole derivative SAR must also have been altered.In order to investigate this matter, the descriptors with highest modeling power (BEHe3) and discriminant power (JGI10) (Figure S2) were further analyzed, as described below.
JGI10 is an average index of topological charge which evaluates the charge transference between pairs of atoms that are 10 bonds apart. 48,49Most of the inactive molecules have zero value for this descriptor (73%), whereas 50% of the active molecules have greater than zero values.This result suggests that charge dispersion has a large effect towards the antifungal activity.The descriptor BEHe3 is a Burden eigenvalue descriptor that accounts for the Sanderson atomic electronegativity of atoms separated by 3 bonds. 50,51The analysis of BEHe3 values suggests that extreme values are found only in inactive compounds (Figure S3 in the SI section).This analysis is in good agreement with the information provided by JGI10, once both descriptors highlight that electronic features are essential to the biological property.
Similarly, Mota et al. 15 showed that these types of descriptor (JGI4 and BEHe3) are useful for describing the activity profile of azole derivatives against the filamentous fungus Moniliophthora perniciosa.
Although promising, the classification models developed so far are not suited to predict the potency of azole derivatives and, therefore, have limited usefulness in the design of novel azole derivatives that would be more active against fluconazole-resistant C. gattii strains.In order to overcome this limitation, MIC values, obtained from microdilution in broth assay, were used to build QSAR models.
The pMIC values of training and test set compounds not only are normally distributed across the potency range (Table 4 and Figure S3 in the SI section) and were obtained under standardized conditions, but also account for pharmacokinetic effects that would not be captured in kinetic assays with the purified macromolecular target.

Hologram QSAR modeling
HQSAR technique was used as a first resource because it is not biased by subjective alignment rules required by 3D QSAR approaches, such as CoMFA and CoMSIA, but shows comparable statistical quality to those methods. 29,50r hypothesis was that fragment-like descriptors (molecular holograms) would be useful to build robust 2D QSAR models.Initially, the influence of fragment distinction over the statistical parameters was assessed using default fragment size (FS) (Table 5).If only atoms, bonds and connections (A/B/C) parameters are taken into consideration, marginal fit (r 2 = 0.55) and poor internal consistency (q 2 = 0.11) are achieved.Despite the fact that addition of hydrogen (H), chirality (Ch) and donor and acceptor (DA) improves the model fit, no significant increase was observed in the internal consistency (compare model 2, 3 and 4 vs. model 1).Similarly, other combinations of fragment distinction parameters produced no further statistical improvements in q 2 values.
Sometimes, variation of fragment size leads to models with higher statistical values, but in this case, no further improvement was achieved for the q 2 values (Table 6).
These initial results indicate that biological activity of azole derivatives cannot be properly captured by molecular holograms only.This might be a consequence of the fact that HQSAR models do not consider charge properties per se (not accounted for in fragment distinction options), whereas the best SIMCA model suggests that electronic features are highly influential for the antifungal activity.Aiming at circumvent this problem, topological descriptors weighted by steric or electrostatic features were employed to build 2D QSAR models.

Classical 2D QSAR
As an initial approach, MLR was used to build models with up to 4 variables, from a set of topological descriptors calculated with DRAGON software.Despite good statistical parameters were achieved (r 2 = 0.78, q 2 = 0.74), these models have poor predictive ability (r 2 pred = 0.1).Among the reasons that might explain this result is: (i) collinearity among descriptors, which results in unstable regression models; (ii) inadequate description of chemicalbiological space by 2D descriptors.As pointed out before, no structural data are available for the macromolecular target of azole antifungals, rendering the development of 3D QSAR model subjective.On the other hand, the collinearity issue can be simply solved by using PLS. 51This statistical tool has the additional benefit of reducing the risk of building chance correlation QSAR models, once just a few descriptors (PCs) are employed to build the models, instead of the 428 descriptors available at first.Accordingly, 36 descriptors found in the best 10 MLR models (q 2 > 0.64) were gathered, autoscaled and used for further PLS analysis, as available in the Pirouette 4.0 software.(Table S2 in the SI section).
The initial QSAR model shows statistical results (r 2 = 0.88 and q 2 = 0.69, 3PCs) similar to those of preliminary MLR models, with improved but still low predictive ability (r 2 pred = 0.45).Aiming at producing statistically sound QSAR models, an iterative exclusion of descriptors with low leverage towards the regression vector was carried out.This strategy afforded a significant improvement of the statistical quality (r 2 = 0.95, q 2 = 0.86, 3 PCs) (Table 7).
It is important to note though that high q 2 values do not guarantee that the QSAR model has any predictive  ability. 52Thus, the predictive power was assessed by means of external validation protocol using test set compounds (Table 4).The good agreement between experimental and predicted values (residues below 0.65 log units) indicates the high robustness and reliability of descriptors used to build this QSAR model (Figure 3 and Table 8), thus suggesting that this model might be useful to understand the structural and physicochemical requirements for inhibition of fluconazole-resistant C. gattii growth and, in the future, guide the identification of novel azole derivatives that would be more active.
In order to achieve this goal, descriptors with high leverage towards the regression vectors were selected for further analysis (Figure 4).
GATS1e and MATS2e account for the difference on the atomic electronegativity at the topological distance of 1 and 2 bonds, respectively.Although these descriptors point out to the same characteristic, at similar topological distance, they are mathematically uncorrelated (r = 0.05).The MATS descriptors provide global information whereas GATS descriptors are sensitive to differences in the atomic neighborhood (example, branching). 55,56Nevertheless, both descriptors show that low electronegativity difference between atoms one or two bonds apart increases potency.Taken together, the most influential descriptors underscore that modulating electronic features of azole derivatives is the most promising strategy to circumvent resistance issues in fluconazole-resistant C. gattii strains.

Conclusion
The chemometric models presented in this work take advantage of well-established ligand-based tools to highlight chemical features that should be modulated if Cryptococcus spp resistance to azole derivatives is to be overcome.First, physicochemical and structural changes that play a major role to azole derivatives antifungal activity were investigated by SIMCA models.This preliminary study suffers from the limitations imposed by disc diffusion assays.Nevertheless, it highlights that electronic features (accounted by BEH3 and JGI10) could be useful to explain azole derivative activity profile against fluconazole resistant C. gattii.Furthermore, fragment-based QSAR (HQSAR) studies proved inappropriate to describe the structure-activity relationship of azole derivatives (r 2 = 0.85, q 2 = 0.35, r 2 pred = 0.38), whereas topological descriptors weighted by electronic and steric properties proved suitable to develop high quality 2D QSAR models (r 2 = 0.95, q 2 = 0.86, r 2 pred = 0.72).The analysis of the best QSAR model supports the hypothesis that electronic features, described by GGI1, MATS2 and GATS1, should be fine-tuned in order to develop congeneric molecules with improved potency against fluconazole-resistant C. gattii strains.

Figure 1 .
Figure 1.Frequency histogram of the compounds at the lower concentration (4.1 mmol L -1 ) in relation to range in the diameter of inhibition halo against C. gattii.

Figure 2 .
Figure 2. Interclass distance for active and inactive compounds of the training and test sets according to final SIMCA model.
nArX number of halogen linked to aromatic ring C-003 atom-centred fragments -CHR3 a C-040 atom-centred fragments -R-C(=X) -X / R-C#X / X=C=X a,b a R stands for any groups linked to the central carbon atom; b =: stands for double bond; # stands for the triple bond; X stands for heteroatom (i.e., O, N, S).

Figure 3 .
Figure 3. Plot of experimental and predicted pMIC values according to the classical 2D QSAR model.

Figure 4 .
Figure 4. Coefficient of vector regression for descriptors of final 2D QSAR model based on topological descriptors.

Table 1 .
Biological properties used for chemometric model development

Table 2 .
Division of the compounds into training and test sets

Table 3 .
Descriptor selected through Fisher's weight a Descriptors found in final SIMCA model are highlighted in gray.b R stands for carbon; X stands for heteroatom (example, O, N, S). 2D Chemometric Studies of a Series of Azole Derivatives Active against Fluconazole-Resistant Cryptococcus gattii J. Braz.Chem.Soc.966

Table 4 .
pMIC values of 33 azoles derivatives (training and test sets) against C. gattii

Table 6 .
Influence of fragment size over the statistical parameters of the best HQSAR models

Table 7 .
Descriptors contained in the final 2D QSAR model

Table 8 .
Experimental and predicted pMIC values of the test set compounds for 2D QSAR based on topological descriptors models