Study of 17 β-Hydroxysteroid Dehydrogenase Type 3 Inhibitors

Prostate cancer is one of the most common form of cancer among men, and its incidence has been increasing progressively. Since the prostate depends on androgenic hormones to regulate its growth and development, interference with androgenic biosynthesis are important to control the disease. Thus, in this work, a 4D-quantitative structure-activity relationship (QSAR) receptor independent study was carried out using the Laboratório de Quimiometria Teórica e Aplicada (LQTA)-QSAR approach and the new tool Web-4D-QSAR with a set of benzylidene oxazolidinedione and thiazolidinedione inhibitor of the 17β-hydroxysteroid dehydrogenase type 3. The obtained model is robust, free from chance correlation, and with good predictability in all tests performed. In addition to this result, a good mecanistic interpretation related to the binding with the biological target could be traced, which strengthens its potential application in virtual screening and design of new inhibitors of the studied enzyme with the possibility of use in the pharmacological treatment of prostate cancer.


Introduction
Prostate cancer is the most common form of cancer among men above 50 years of age and the second most common in Western men, and its incidence has been increasing progressively in the last few decades. 1,2Surgical removal of the prostate, or radiotherapy, is indicated for localized tumors.4][5] However, all approaches pose considerable risks and side effects and negatively influence the patient's quality of life. 4ince the prostate depends on androgenic hormones to regulate its growth and development, regulation and inhibition of the androgenic receptor action and/or inhibition of the androgenic biosynthesis are important mechanisms to control the disease. 6Drugs such as flutamide, bicalutamide, enzalutamide, and apalutamide (ARN-509) function via the first mechanism of action, while abiraterone acts through the second mechanism (Figure 1). 7In this context, enzyme 17β-hydroxysteroid dehydrogenase type 3 (17β-HSD3), which converts androstenedione to active androgen testosterone, that is over-expressed in hormone-dependent prostate cancer, is considered a potential target to the chemotherapy of prostate cancer. 2,6,8,91][12] The correct use of these techniques could mitigate general costs, time required to obtain positive results, use of animals in the laboratory, and chemical and biological residues. 13,14Among CADD approaches, quantitative structure-activity relationships (QSAR) look for obtaining prediction models that correlate the chemical structure, through molecular descriptors, with a determined biological activity. 15,16onsidering the increasing interest in new chemotherapeutic agents for the prostate cancer treatment, this study aims to obtain a 4D-QSAR receptor independent (RI) model with a set of 49 derivatives of benzylidene oxazolidinedione and thiazolidinedione available in the literature, 6 which show the ability to inhibit the 17β-HSD3 enzyme.

Data set
The 49 benzylidene oxazolidinedione and thiazolidinedione (Figure 2 and Table 1) used in this work were synthesized and published by Harada et al. 6 All compounds were assayed for their in vitro inhibition of the enzyme 17β-HSD3.The data reported as half-maximal inhibitory concentration (IC 50 , in nM) were converted to -log IC 50 (pIC 50 ) to obtain symmetrically distributed data.The data set was divided into training and test sets of 39 and 10 molecules, respectively.

Construction of the ligands' three-dimensional molecular models
The 49 derivatives were built with HyperChem 7 software, 17 from the geometry of the file 01059739 obtained in ZINC database. 18Using the same program, geometry optimizations were performed using molecular mechanics (MM+) and, in sequence, using the semi empirical level Austin Model 1 (AM1).Further, using the Gaussian 09 program, 19 new calculations were carried out at the Hartree-Fock (HF)/6-31G(d) level and then using density functional theory (DFT) (B3LYP/6-311G(d,p)).

Conformational ensemble profile (CEP) of each ligand, CEP alignment and calculation of the descriptors
Molecular descriptors were obtained from the CEP for the RI-4D-QSAR study using the Laboratório de Quimiometria Teórica e Aplicada (LQTA)-QSAR methodology. 20The choice of an RI approach is justified by the fact that crystallographic structures of 17β-HSD3 are not yet available, although the use of protein structures is not essential for a study involving molecular interaction fields (MIFs).According to LQTA-QSAR methodology, the molecular set was submitted to a molecular dynamics (MD) simulation using the GROMACS molecular package 21 according to the scheme described by Martins et al. 22 The trajectory file was recorded every 1000 simulation steps.The CEPs of all ligands were assembled in the same file considering the ligand conformations recorded from 50 to    500 ps, and these data were used for building the QSAR models.Atoms used for the alignment of the different conformers are presented in Figure 3 and the CEPs for the most and least active compounds from the molecular set are presented in Figure 4.
The CEPs, resulting from GROMACS MD simulations, were used to generate the energy descriptors of intermolecular interactions with LQTAgrid program. 20n this approach a probe selected by the user runs over a grid that contains the CEP, and the electrostatic and steric 3D properties are computed for each individual grid point, based on the Coulombic (equation 1) and Lennard-Jones potential functions (equation 2), respectively. ( (2) where q i is the charge of the ith probe; q j is the charge of the jth atom from CEP; ε 0 is the vacuum permittivity; , , and are parameters for the probes and the atoms in the CEP, respectively; n indicates the number of frames aligned in the CEP; and r ij represents the distances between the ith probe and the jth atom of CEP.
The probe used for this study, based on the ff43a1 force field parameterization to simulate atoms or molecular fragments, was NH 3 + , which corresponds to the aminoterminal portion of peptides.The probe explored every point of a 1 Å resolution grid, and the grid size was 30 × 34 × 36 Å 3 .This study was accomplished with a new web based software running LQTA-QSAR methodology called Web-4D-QSAR. 22The generated 3D-energy interaction descriptors were exploited further in the variable selection procedure.

Variable reduction and selection
As a step of variable reduction, the absolute values of the correlation coefficients between each descriptor and the biological activity (|r|) were calculated, and those with coefficients lower than 0.3 were eliminated from the analysis.The variable selection was carried out using the ordered predictor selection (OPS) algorithm. 23,24This algorithm attributes an importance to each descriptor based on an informative vector.The columns of the matrix are rearranged in such a way that the most important descriptors are presented in the first columns.Then, successive partial least squares (PLS) regressions are performed with an increasing number of descriptors in order to find the best PLS model.In this analysis, the product vector was used as the informative vector and the correlation coefficient of cross-validation (Q 2 LOO ) as the criterion to select the best models.Construction of QSAR models 4D-QSAR models were built using PLS regression 25 using QSAR modeling program. 23Since in QSAR the variables normally have different numerical ranges, it was necessary to perform a preprocessing of the data using the auto-scaling approach: the variables were mean-centered and scaled by variance. 26The quality of samples was evaluated through the difference between the predicted activity and the observed activity in the data set.Considering the degree of this difference, the sample could be considered an outlier. 27

Statistical quality of the QSAR models
In this study, statistical parameters recommended for the validation of QSAR models 23,26,[28][29][30] were used to ensure the internal and external prediction quality of the obtained models.The explained variances of the models were evaluated using the coefficient of determination (R 2 ) and the root mean square error of calibration (RMSEC).For adjusted QSAR models, we adopted the most recommended limit for R 2 (> 0.6), while the values of RMSEC should be as close as possible to 0. 28,29 The significance of the models was evaluated using the F-test at a 5% level. 26eave-one-out (LOO) cross-validation was performed by calculating the Q 2 LOO and the root mean square error of cross-validation (RMSECV).For a model to be approved, the RMSECV should be close to 0, and Q 2 LOO > 0.5. 28,29The RmSquare metrics 31 for internal validation, where the average r m 2 (LOO)-scaled > 0.5, and Δr m 2 (LOO)-scaled < 0.2 were also carried out.As a robustness test of internal quality, we carried out the leave-N-out (LNO) cross-validation and achieved a systematic removal of up to N elements of the training set.In this study, N = 10 (6 replicates for each N) for all models.For a robust model, the average value of Q 2 LNO should be as close as possible to the value of Q 2 LOO . 26The chance correlation was also verified by the y-randomization test, another internal quality test.The models were recalculated after randomization of the vector y, and a significant worsening in these new regressions was expected.This procedure was repeated 50 times.The results were evaluated according to the method proposed by Eriksson et al. 13 The external validation of the obtained models was performed by predicting the pIC 50 values of the test set.The quality was assessed using the coefficient of determination of external validation (R 2 pred ), r m 2 (pred)-scaled metrics, and Golbraikh-Tropsha metrics (k, k' and |R 2 0 -R' 2 0 |).It is recommended that R 2 pred > 0.5, the root mean square error of prediction (RMSEP) be close to 0, average r m 2 (pred)-scaled > 0.5, and Δr m 2 (pred)-scaled < 0.2.The k and k' metrics should be in the range 0.85-1.15,][33][34] The second test involved the Prediction Reliable Indicator, a tool recently presented by Roy et al. 35 In this tool, predictive ability of the model is evaluated regarding new compounds that were not part of the original data set.The predictions for this set (named real external set) are ranked into good, moderate, and poor/unreliable.To make this test feasible, a 2D similarity search (DICE approach of 60% based on the basic structure of compounds 1 to 49) was carried out in the ZINC15 database. 36Excluding duplicated results, eight compounds (50 to 57, Figure 5) were submitted to the test.As it was done for the original data set, molecular descriptors were calculated in LQTAgrid program. 20In order to increase the reliability of these predictions, applicability domain was also calculated using the Euclidean Applicability Domain 1.0 tool. 37This approach is based on the similarity between the descriptors used in the model and in real external set, and a compound is considered within the domain if the normalized average distances of each compound in the original set to it fall between 0-1. 38

Results and Discussion
The obtained model (equation 5) corresponds to the results of the 4D-QSAR study obtained using the OPS-PLS approach, and it was built based on eight field descriptors (three Coulomb descriptors and five Lennard-Jones descriptors, where the numbers indicate the coordinates in the Cartesian space of each descriptor) that are projected to two latent variables (LV) accounting for 32.937% (LV1: 18.889%; LV2: 14.048%) of the original information.This model was obtained with 39 compounds (no outliers were identified) and ten compounds (6, 13, 19, 23, 25, 30, 39, 43, 47 and 48) were selected to form the external test set.This selection was carried out in such a way that all the range of the biological activity of the test set could be covered.The statistical results of the obtained model are presented in Table 2.The values of the descriptors selected in the model as well as the predicted values of the pIC 50 in the cross validation for the training set and in the external validation for the test set are available in the Supplementary Information section (Table S1).pIC 50 = 8.76 -0.012 × (20_17_17_LJ) -0.056 × (11_10_12_C) -0.19 × (18_24_22_LJ) + 0.013 × (17_15_15_LJ) + 0.0020 × (19_10_19_LJ) + 0.0025 × (10_18_23_C) -0.010 × (20_17_16_LJ) + 0.020 × (12_21_17_C) (5) The results of internal validation statistics show that the model is able to explain and predict enough information.The result of the F-test is higher than the corresponding tabulated value, thus the obtained regression can be considered significant.The average result of LNO (Q 2 LNO = 0.59) is close to Q 2 LOO , with the largest deviation observed for Q 2 L5O (0.57 ± 0.055), showing that the model is robust (Figure 6a).The intercepts of the models obtained through the regressions built in the y-randomization test (Figure 6b) also show that the model is robust and free from chance correlation.
One of the advantages of a 4D model is the possibility of better interpretation of the model.0][41][42] Previous studies using LQTA-QSAR approach 16,43 showed that it is possible to obtain models such that even if it is an RI model, the selected descriptors may be related to amino acid residues that form binding sites of biological targets, and this may be useful in the interpretation of results.However, unlike the cited studies, the crystallographic structure of 17β-HSD3 is not available.According to Lukacik et al., 44 its hydrophobic nature has thus far prevented its structure determination.Nevertheless, other proteins from the same family are already available and may be used as a tool to aid interpretation due to high homology between them.The human 17β-HSD1 presents 74% of homology with type 3, 8 and thus was used to help in the interpretation process.This distribution reminds the organization of the cavity of 17β-HSD, which is characterized by a central hydrophobic region and ends formed by groups capable of forming hydrogen bonds (Figure 7c).
It is also interesting to observe that most of the descriptors in the space presents patterns very similar to the positioning of the amino acid residues that form the cavity  (Figures 7d and 7e), including the distances among them, with the largest difference corresponding to the distance between descriptors 1 and 2 (13.64 Å), and the distance between the residues SER142 and the acid group of the backbone of TYR145 (8.40 Å).In the other cases, the maximum difference observed is only 2.6 Å.It is important to observe that the enzyme used in the comparison is not 17β-HSD3, but a homolog, which can explain the observed differences.Thus, it may be proposed that the 4D-QSAR model presents the interpretation related to the blocking mechanism of 17β-HSD3 by the studied benzylidene oxazolidinedione and thiazolidinediones.
Considering the autoscaled values of the selected descriptors (11_10_12_C (-0.48), 12_21_17_C (0.28), 10_18_23_C (0.27), 18_24_22_LJ, (-0.26), 20_17_17_LJ (-0.21), 17_15_15_LJ (0.19), 19_10_19_LJ (0.16) and 20_17_16_LJ (-0.13)), it can be observed that the most important (11_10_12_C) is occupied by the R 1 side chain of the most active compound.Since this descriptor has a negative coefficient in the model, this fact indicates that the occupation of this region of the target tends to make the compounds be less active.A similar interpretation could be given for descriptors 10_18_23_C and 12_21_17_C.Despite the positive values that the coefficients of these descriptors present, the values of the descriptors are predominantly negative, especially for 10_18_23_C, which is the furthest from the most active compound 45.
A similar analysis could be done for descriptor 18_24_22_LJ, the most important Lennard-Jones descriptor and that also presents negative coefficient.High values for this descriptor are favored by bigger R 1 substituents, which in turn decrease the biological activity.The result of this interpretation, in addition to good global statistics, strengthens the hypothesis that the model adequately describes the inhibition of this enzyme.
Referring to the test with the Prediction Reliable Indicator tool, all compounds had their predictions classified as moderate (Table 3) using the standard weighing scheme 0.5-0-0.5. 35But the Euclidean applicability domain (Figure 8) shows that one compound (57) has a normalized value greater than 1.Thus, the prediction obtained for this compound is not reliable, while the prediction applied to the other compounds, which corresponds to 87.5% of the real external validation set, is reliable.These results, combined with the results of internal and external validations, and mechanistic interpretation, show that the model has a real potential to predict with confidence the potential for inhibition of the enzyme 17β-HSD3 for new hits that are identified, or even synthesized, and which are related to the original structure.

Conclusions
In this study, we have developed a 4D-QSAR model using a set of 49 benzylidene oxazolidinedione and thiazolidinedione derivatives with inhibitory activity  against 17β-HSD3.The model was validated both internally and externally, indicating that it is significant, robust, had no chance correlation, and presented good predictive ability.The test carried out with the Prediction Reliability Indicator tool indicated that the model presents sufficient reliability to predict the inhibition of new potential inhibitors of the enzyme of interest that are structurally related to the data set.Besides, the interpretation could be directly related to the interaction with the binding site of 17β-HSD3.Thus, the obtained 4D model may be a useful tool for aiding virtual screening and design studies of new inhibitors of 17β-HSD3, so they can be used as lead compounds or new drugs for potential application in prostate cancer treatment.
Paulo A. Martins * ,a and Eduardo B. de Melo b

Figure 1 .
Figure 1.Structure of approved and under study anti-androgens.

Figure 2 .
Figure 2. Basic structure of the data set.

Figure 3 .
Figure 3. Representation of the atoms used for the alignment of the CEP.

Figure 4 .
Figure 4. Comparison of the CEPs resulting from MD simulations of the least active (left) and most active (right) compounds.

Figures 7a and 7b
Figures 7a and 7b present the eight descriptors selected in the space around the CEPs of the least active molecule (29) and the most active (45), respectively.It can be observed that the concentration of Coulomb descriptors at the most flexible end corresponds to position R 1 , while the Lennard-Jones descriptors are concentrated at the center of the molecule (with the exception of 18_24_22_LJ).

Figure 8 .
Figure 8. Plot of Euclidean applicability domain.Diamonds: training set; circles: real external set.

Table 1 .
Dataset studied and the respective pIC 50 values a

Table 3 .
Data and results for the real external test set Pred. pIC 50 : predicted -log IC 50 (IC 50 : half-maximal inhibitory concentration).