Modeling the Antileukemia Activity of Ellipticine-Related Compounds: QSAR and Molecular Docking Study

The antileukemia cancer activity of organic compounds analogous to ellipticine representes a critical endpoint in the understanding of this dramatic disease. A molecular modeling simulation on a dataset of 23 compounds, all of which comply with Lipinski’s rules and have a structure analogous to ellipticine, was performed using the quantitative structure activity relationship (QSAR) technique, followed by a detailed docking study on three different proteins significantly involved in this disease (PDB IDs: SYK, PI3K and BTK). As a result, a model with only four descriptors (HOMO, softness, AC1RABAMBID, and TS1KFABMID) was found to be robust enough for prediction of the antileukemia activity of the compounds studied in this work, with an R2 of 0.899 and Q2 of 0.730. A favorable interaction between the compounds and their target proteins was found in all cases; in particular, compounds 9 and 22 showed high activity and binding free energy values of around −10 kcal/mol. Theses compounds were evaluated in detail based on their molecular structure, and some modifications are suggested herein to enhance their biological activity. In particular, compounds 22_1, 22_2, 9_1, and 9_2 are indicated as possible new, potent ellipticine derivatives to be synthesized and biologically tested.


Introduction
Cancer disease represents one of the most significant health problems in the world, being the second most common cause of death around the planet [1,2]. It is estimated that by 2030, the number of cancer cases in the world will have increased by approximately 23.6 million [2]. The situation in the Latin American population is even more dramatic; according to the World Health Organization (WHO) data, 37% of cancer cases are reported in this region, and are associated with the limited development of technology and industry related to the treatment of this disease in these countries [3]. Childhood ellipticine moiety seems to have a significant influence on the compound's biological activity [36][37][38][39]. The structural similarity with ellipticine is, in theory, a fundamental key to designing new anticancer compounds. Moreover, the vast quantity of reported compounds with structures close to ellipticine represents a unique opportunity to identify the common structural characteristic that any molecular structure must have to be active against leukemia. In this respect, this work aimed to find a quantitative relationship between several molecular descriptors (topological, thermodynamics, and electronics) and the antileukemia activity of compounds related to ellipticine, in order to guide the synthesis of new promising antileukemia compounds. Additionally, to offer more insight into the interaction of ellipticine derivatives with leukemia cells, a docking calculation on the selected molecular target of the L1210 leukemia line cell is presented.

Results
Pharmacological data in vitro of several ellipticine analogues with antileukemia activity against L1210 cells were collected from the literature [40,41]. After the application of the Lipinski [42] rule filters, only 23 ellipticine analogues (ellipticine include) were selected. Figure 1 shows the chemical structure of the compounds studied herein.

Statistical Analysis
The linear regression (LR) method was employed to find the most relevant molecular descriptors-the parameters related to the biological activity. Thus, the biological activity as pIC50 (ln(1/IC50), dependent variable) was plotted against each molecular descriptor. A regression

Molecular Modeling
The minimum-energy 3D geometries for the compounds shown in Figure 1 were obtained using density functional theory with WB97XD/6-311G(d,p) as a theory level [43], using Gaussian 16 software [44] for Linux available in the high performing computer of the San Francisco de Quito University, Quito, Ecuador. The DFT level and interchange correlation functional was chosen because of its good correlation with experimental results based on the energetics and structure of organic molecules [45][46][47][48].

Statistical Analysis
The linear regression (LR) method was employed to find the most relevant molecular descriptors-the parameters related to the biological activity. Thus, the biological activity as pIC 50 (−ln(1/IC 50 ), dependent variable) was plotted against each molecular descriptor. A regression coefficient > 0.5 was considered to indicate an "important" molecular descriptor.
According to the linear regression analysis (LRA), Table 1 reveals that four electronic molecular descriptors (HOMO, softness, LUMO, and dipolar momentum (µ)) and three topological molecular descriptors (AC1RABABMID, TS1KFABMID, and PSA) have a strong influence over biological activity.

QSAR Model
Using the descriptors shown in Table 1 as independent variables, we obtained the mathematical models described by Equations (1)-(3), with three-four descriptors.
The best model, selected according to the statistical robustness, was Model 3, with an R 2 of 0.836, a high Fisher ratio value of 27.02, and a great correlation prediction index (Q 2 ). A Y-scrambling analysis was also performed on Model 3, and found values of a(R 2 ) = 0.126 and a(Q 2 ) = −0.505, which suggest that the models predictability cannot be explained by chance. The correlation prediction index was estimated by using the leave-one-out crossvalidation. Figure 2 graphically represents the linear relationship between the experimental pIC 50 values and those predicted using Equation (3).  The correlation matrix of Model 3 is shown in Table 2. Note that the Pearson's correlation coefficient in each descriptor was <0.6, indicating that the model did not over fit. The descriptors AC1RABABMID and TS1KFABMID seemed to have the most influence over the activity. According to Equation (3), the highest occupied molecular orbital, HOMO, has a large influence over the antileukemia activity. Thus, the more negative the HOMO energy, the greater the impact on The correlation matrix of Model 3 is shown in Table 2. Note that the Pearson's correlation coefficient in each descriptor was <0.6, indicating that the model did not over fit. The descriptors AC1RABABMID and TS1KFABMID seemed to have the most influence over the activity. According to Equation (3), the highest occupied molecular orbital, HOMO, has a large influence over the antileukemia activity. Thus, the more negative the HOMO energy, the greater the impact on biological activity. This descriptor is related to the molecule's capability to participate in dipole/dipole interactions like hydrogen bonds. Additionally, the HOMO energy describes the ionization potential and the molecule's vulnerability to electrophiles attack. HOMO energy also plays a pivotal role in free radical reactions and redox potential [52][53][54][55] therefore, it is very common to find this descriptor in works related to anticancer activity.
The global softness (s) related to the antileukemia activity shown for the compounds studied herein. Global softness is defined as the reciprocal of global hardness and describes the extent to which the electronic environment surrounding the nucleus/nuclei of an atomic/molecular species tends to loosen itself [56][57][58][59][60]. In this sense, it could be associated with a compound's ability to deform its electronic cloud via variation of the number of electrons. Therefore, a soft compound necessarily has a high potential to transfer electrons in a redox process. As reported by Table 1, ellipticine had the highest softness value, while the rest of the studied compounds had values between 3 and 5, with the most actives ones values close to 5.
The topographic descriptors AC1RABABMID and TS1KFABMID, which were estimated using QuBiL-MIDAS software, have been widely used for the construction of QSAR models for several biological activities [51,61]. These descriptors presented a good correlation with the pIC 50 values, as shown in Table 2, and they were obtained by the application of some algebraic linear indexes, considering an autocorrelation and a total sum invariant with a lag value of 1 (AC1 and TS1) on the 3D optimized structure. These two descriptors were atomic weighted by the physicochemical property logP, denoted with "a" at the end of the descriptor, and this property is related to the water solubility as well as to the lipophilicity of the compounds, which both play an important role in almost any biological activity of an organic compound.

Molecular Docking
A molecular docking tool was used to gain more insight into the binding modes of the ellipticine-related compounds studied herein with the selected targets. It is necessary to mention that with the exception of ellipticine, the mechanisms of antileukemial cell proliferation inhibition by the compounds studied herein are not well known. One author proved that, in contrast to ellipticine, these kind of compounds are not able to inhibit topoisomerases I and II proteins [40,41]. In this regard, it seems that the antileukemial activity might be associated with other mechanisms that involve different targets than topoisomerases.
According to the literature, spleen tyrosine kinase (SYK), phosphoinositide 3'kinase (PI3K), and Bruton's tyrosine kinase (BTK) were selected because they have been related to decrease of leukemia cells [62][63][64] in preclinical models. Morover, some compounds structurally close to those studied herein have been reported as SYK [65], BTK [66], and PI3K inhibitors [67]. In this regard, the entire dataset shown in Table 3 was docked against the three targets mentioned above. Due to the inactivity against topoisomerase II, this target was ruled out. Fostamatinib (R788) is one of the latest reported inhibitors of both SYK and PI3K leukemia proteins; its action mechanism relates to the blocking of antigen-dependent B-cell-receptor signaling [66,67]; thus, based on its already known antileukemial activy, it was used as the control ligand in the docking study. Table 3 reveals some interesting patterns: the most active compounds (pIC 50 > 13) showed scoring values smaller than −8 kcal/mol against any target. It is also of note that the less active compounds (pIC 50 < 11) had at least one value scored higher than −8 kcal/mol. A close inspection of Table 3 reveals that compounds 9, 20, and 22 can be highlighted because of their scoring values close to those of compound R788.
According to Table 1, the entire dataset could be divided into two structural groups; compound 13 derivatives and compound 22 derivatives. Compounds 9 and 22 had the highest antileukemia activity and the highest scoring values into their group. Thus, to gain more details about these two compounds, Figure 3 shows these molecules docked with the proteins SYK, PI3K, and BTK. However, because only the SYK protein has a reported inhibitor, this protein was selected to show the interactions with 9 and 22 compared with that inhibitor (fostamatinib). Thus, the molecular interaction between these ligands and SYK protein is shown in Figure 3; this was generated using Pymol [68] and ligplot software [69]. Figure 3 shows the most active compounds (9 and 22) and the structure of fostamitinib with the target protein, as well as a 2D diagram of interactions with the terminal residues of the protein.
Further, the figure shows the pose of compounds 9 and 22 bound to the SYK protein; the inhibitor R788 (fostamatinib) is also displayed. It can be noted that compounds 9, 22, and fostamatinib contact with the protein via the same pocket. This result is fascinating because it suggests that these compounds might act via a similar mechanism. However, some differences were found in the molecular interaction types.
As shown in Figure 3, the molecular interactions between SYK and fostamatinib were mostly, dipolar in nature. The figure highlights the hydrogen bonds between Glu48 and Leu45 with the OH fragment, Lys40 and Gly18 with NH, and Met80 with the P-OH fragment in fostamatinib. In contrast, compound 9 did not present hydrogen bond formation with the target; however, other dipolar interactions, such as LEU133, Met82, and Glu81 with the CN aliphatic chain, and hydrophobic interactions such as cation-pi (Lys40) or pi-pi (PHE20), did take place.
Molecules 2019, 24, x; doi: FOR PEER REVIEW www.mdpi.com/journal/molecules fragment, Lys40 and Gly18 with NH, and Met80 with the P-OH fragment in fostamatinib. In contrast, compound 9 did not present hydrogen bond formation with the target; however, other dipolar interactions, such as LEU133, Met82, and Glu81 with the CN aliphatic chain, and hydrophobic interactions such as cation-pi (Lys40) or pi-pi (PHE20), did take place. Compound 22 showed a similar behavior to fostamatinib. This compound displayed several hydrogen bonds formed with the residues ASN131 and ARG130. Additionally, other dipolar interactions were noted, such as with PRO87, Asp144, and Val23. The rest of the interactions were in the hydrophobic range. Compounds 9 and 22 were docked against phosphoinositide 3'kinase (PI3K). The 3D and 2D docking interaction diagrams are shown in Figure 4. As shown in Figure 4, both compounds 9 and 22 hit the PI3K protein at the same pocket as fostamitinib. However, the nature of the interaction was different. While fostamitinib interacts mostly via dipolar interactions (several hydrogen bonds), in general, compound 9's activity was related to several hydrophobic interactions, mostly pi Compound 22 showed a similar behavior to fostamatinib. This compound displayed several hydrogen bonds formed with the residues ASN131 and ARG130. Additionally, other dipolar interactions were noted, such as with PRO87, Asp144, and Val23. The rest of the interactions were in the hydrophobic range.
Compounds 9 and 22 were docked against phosphoinositide 3'kinase (PI3K). The 3D and 2D docking interaction diagrams are shown in Figure 4. As shown in Figure 4, both compounds 9 and 22 hit the PI3K protein at the same pocket as fostamitinib. However, the nature of the interaction was different. While fostamitinib interacts mostly via dipolar interactions (several hydrogen bonds), in general, compound 9's activity was related to several hydrophobic interactions, mostly pi interactions; however, a few dipolar interactions could be observed (Hist650, Glu792, and Met788 with CO group). The same behavior was observed for compound 22. Figure 4 shows that just a dipolar interaction (Lys642 with OCH 3 group) took place for this compound.  As shown by the QSAR result, antileukemia compound activity demanded a low HOMO energy and high softness. Moreover, a close inspection of the molecular docking results revealed the relationship of scoring value and QSAR descriptors in Model 3: side chains with electronegative substitution drive to more polar molecules with smaller HOMO energy; thus, a better interaction with the molecular target could be.
In agreement with the above, i.e., the combination of both molecular behavior and QSAR results, we propose four molecules with the necessary chemical characteristic previously outlined. Figure 5 shows the predicted pIC50 and scoring values for these compounds against SYK protein. According to the result above, it seems that the selected compounds might hit the proteins SYK and PI3K in the same pocket as the leader compound fostamitinib. This outcome represents a good starting point for the design and testing of these compounds and their derivatives experimentally.
As shown by the QSAR result, antileukemia compound activity demanded a low HOMO energy and high softness. Moreover, a close inspection of the molecular docking results revealed the relationship of scoring value and QSAR descriptors in Model 3: side chains with electronegative substitution drive to more polar molecules with smaller HOMO energy; thus, a better interaction with the molecular target could be.
In agreement with the above, i.e., the combination of both molecular behavior and QSAR results, we propose four molecules with the necessary chemical characteristic previously outlined. Figure 5 shows the predicted pIC 50 and scoring values for these compounds against SYK protein.  As shown in Figure 5, the QSAR model predicted both activity and scoring values higher than its precursor, encouraging the synthesis and experimental evaluation of the proposed compounds. However, as the scoring values for SYK were higher than for PI3K protein, only the docking against SYK was analyzed.
It was noetd that in the compound 9 derivatives, the enhancing of the side-chain with a hydrophobic substituent like a benzene ring increased the antileukemia activity. The benzene ring with an NH terminal not only improved the softness and diminished the HOMO energy, but also enhanced both hydrophobic and dipolar interactions with the selected target. This behavior is highlighted in the SYK proteins when interactions like hydrogen bonds (Arg130 and Asp126) take place.
On the other hand, of the compound 22 derivatives, 22_2 presented the best results against the SYK protein. It was noted that an aromatic ring bound to the ester moiety enhanced the biological activity. Once again, an aromatic ring meets the electronic requirements demands of the QSAR models, i.e., smaller HOMO energy and a higher softness. Additionally, the docking result from SYK target revealed some new interactions driven by further substitutions. For example, as in compound 22, the hydrogen bonds with ASn131 and Asp130 remain; however, a new hydrogen bond between the ester carbonyl group and Asp144 residue formed. Nevertheless, it seems that the hydrophobic interactions were more critical than for the precursor. Thus, several hydrophobic interactions appeared between the ester moiety aromatic ring and PHE20, Asn19, Val23, Asn19, Lys165, and Ser17 ( Figure 6). As shown in Figure 5, the QSAR model predicted both activity and scoring values higher than its precursor, encouraging the synthesis and experimental evaluation of the proposed compounds. However, as the scoring values for SYK were higher than for PI3K protein, only the docking against SYK was analyzed.
It was noetd that in the compound 9 derivatives, the enhancing of the side-chain with a hydrophobic substituent like a benzene ring increased the antileukemia activity. The benzene ring with an NH terminal not only improved the softness and diminished the HOMO energy, but also enhanced both hydrophobic and dipolar interactions with the selected target. This behavior is highlighted in the SYK proteins when interactions like hydrogen bonds (Arg130 and Asp126) take place.
On the other hand, of the compound 22 derivatives, 22_2 presented the best results against the SYK protein. It was noted that an aromatic ring bound to the ester moiety enhanced the biological activity. Once again, an aromatic ring meets the electronic requirements demands of the QSAR models, i.e., smaller HOMO energy and a higher softness. Additionally, the docking result from SYK target revealed some new interactions driven by further substitutions. For example, as in compound 22, the hydrogen bonds with ASn131 and Asp130 remain; however, a new hydrogen bond between the ester carbonyl group and Asp144 residue formed. Nevertheless, it seems that the hydrophobic interactions were more critical than for the precursor. Thus, several hydrophobic interactions appeared between the ester moiety aromatic ring and PHE20, Asn19, Val23, Asn19, Lys165, and Ser17 ( Figure 6). The combination of QSAR and docking results in this work seem to support Romero et al.'s idea [41] that the enlargement of the side-chain in ellipticine-related compounds enhance their antileukemia activity. When this enlargement of the side-chain is accompanied by an aromatic ring and electronegative atoms like N, the antileukemia activity is enhanced. Thus, the results reported herein support compounds 9 and 22. Likewise, their derivatives have a reasonable probability of being active against leukemia cells; therefore, their synthesis and posterior biological testing is encouraged.
The literature revealed that the synthesis procedure for compound 9_1 has been reported before [66]. However, it was not tested again cancer cells. The rest of the compounds have not been reported to date; however, some similar structure nuclei have been reported. In this regard, based on earlier reports, Figure 7 shows a proposing synthetic route for the four potential antileukemia compounds predicted in this work.  Figure 6. On the left, the 3D representation is shown, and on the right, the 2D interactions of the proposed compound at the spleen tyrosine kinase (SYK) active site.
The combination of QSAR and docking results in this work seem to support Romero et al.'s idea [41] that the enlargement of the side-chain in ellipticine-related compounds enhance their antileukemia activity. When this enlargement of the side-chain is accompanied by an aromatic ring and electronegative atoms like N, the antileukemia activity is enhanced. Thus, the results reported herein support compounds 9 and 22. Likewise, their derivatives have a reasonable probability of being active against leukemia cells; therefore, their synthesis and posterior biological testing is encouraged.
The literature revealed that the synthesis procedure for compound 9_1 has been reported before [66]. However, it was not tested again cancer cells. The rest of the compounds have not been reported to date; however, some similar structure nuclei have been reported. In this regard, based on earlier reports, Figure 7 shows a proposing synthetic route for the four potential antileukemia compounds predicted in this work. (2) Compounds 22_1 and 22_2.

Pharmacological Data Collected
The data of in vitro ability to inhibit 50% L1210 leukemia cell proliferation for 23 ellipticine-related compounds were collected from the literature, mainly from Pujol et al. [40] and Romero et al. [41]. All data collection ensured homogeneity; all of the IC 50 data were obtained through the same experimental methods. The selected compounds' IC 50 values < 100 µM.

Molecular Modeling
The minimum energy geometries for the 23 compounds studied herein were obtained using density functional theory with WB97XD/6-311G(2d,p) as the theory level, using Gaussian 16 software for Linux [44]. Both DFT theory level and interchange correlation functional were chosen because of their good correlation with experimental results based on the energetics and structure of organic molecules [45][46][47].
The minimum geometry structures were verified using the second derivative criteria [50]; the vibrational frequency calculations performed for the entire dataset showed no imaginary frequencies; therefore, all of the geometries were confirmed to be minimum-energy structures. Both the minimum structures and frequency calculations were used to find electronic and molecular descriptors such as dipolar momentum (µ), HOMO and LUMO energies, polarizability (α), enthalpy (H), entropy (S), free energy (G), ionization potential (PI), electronic affinity energy (EAE), hardness (η), softness (s), and electrophilic index (ω) using Equations (4)- (7).
where (µ) is electronic chemical potential and (X) is electronegativity. The partition coefficient (ClogP), a measure of lipophilicity (a highly influenced descriptor), was obtained using MarvinSketch software for Windows 2017 [50], whereas Gibbs energy, enthalpy, and entropy were obtained by combining frequency calculation and statistical mechanics [49]. The 3D-optimized structures for the all data studied herein were used for the calculation of MAM descriptors, as described previously [14,15]. MAM descriptors are molecular attributes based on n-linear transformation matrix representations weighted over atomic properties like lipophilicity, van der Waals volume (ν), refractivity (r), polar surface area (PSA), polarizability (α), and electronegativity (χ), and these calculations were performed with QuBiLs-MIDAS software [51].

Statistical Analysis
The molecular descriptors were plotted against IC 50 (as pIC 50 ) to find the significance over the biological activity. Descriptors with higher association with antileukemia activity (r > 0.5) were considered statistically relevant and used in the construction of the mathematical model (QSAR).

Quantitative Structure-Activity Relationship (QSAR) and Statistical Validation
The relationships of antileukemia activity and the most relevant molecular descriptors were studied using multiple linear regressions. The method used has been described [13,70] before. Briefly, attempts were made to map the relationship between two or more independent variables with a dependent variable by fitting a linear equation involving the observed data. The independent variables of the model, selected according to forward selection and backward elimination [71][72][73] methods, were three statistical variables: the correlation coefficient (R), the Fisher ratio values (F), and the standard deviation (s).
After the QSAR models were formulated, they were validated statistically before their application in the designs of the new antileukemia molecules. To this end, the predictive power of the best equation was verified via leave-one-out cross-validation methods [74] and quantified by Equation (8). This method has been explained previously in several articles, and is well known for its extensive use in QSAR studies [75][76][77][78].
Likewise, the standard error of prediction (SEP) is calculated as: where y is the experimental value of ln (1/IC 50 ), ..
y is the predicted value, and n is the number of samples used for model building. A y-scrambling analysis was also performed in order to confirm that the predictability of the models was not due to chance. The a(R 2 ) and a(Q 2 ) values were then determined using a total of 300 iterations on the response variable, and small values of these correlation coefficients were associated with a correlation not due to chance.

Molecular Docking
The protocol used herein to perform the molecular docking has been reported previously [68,72], and has been used to model the interactions of drugs with the active sites in different diseases, such as Pin1 (peptidyl-prolyl cis-trans isomerase NIMA-interacting 1) [15] inhibition and antimalarial activity [13]. Autodock4 software was used to this end [79]. Tridimensional structures of spleen tyrosine kinase (SYK, PDB-ID:4F4P), phosphoinositide 3´kinase (PI3K, PDB-ID:6DGT), and Buton´s tyrosine kinase (BTK, PDB-ID:3PIY) were obtained from the reported RX in the protein data bank (PDB) as.pdb format [80]. Each protein was refined before use; water molecules and any ligand associated with the protein were removed. Additionally, both polar hydrogen atoms and Kollman-type charge were added. The non-ligand proteins were then saved in .pdb format. Next, the optimized ligands from the WB97XD/6-311G(d,p) theory level were converted to .pdb format and added to the protein.
Next, both protein and ligand saved in PDBQT format were used to perform the docking calculation with autodock4 software. The interaction points between ligands and proteins were analyzed using autogrid software, building the grid box into the active center of each protein considering 50 points through the x, y, and z directions, taking into account at least 2.5 million interactions through genetic algorithm, and taking into account a binding site size of 22 Angstrom. The best scoring factors obtained were compared with standard ligands, i.e., the reported inhibitors of the proteins used. These poses were saved in .pdb format and visualized using pymol for Linux. Next, the best pose interaction images were generated using Ligplot; this software allowed us to gain information about the molecular interactions (dipole-dipole and hydrophobic) around the protein active site and the ligand. The identified interactions were validated via the QSAR model.

Conclusions
The minima energy structures for 23 compounds-including ellipticine, a prominent anticancer compound-were obtained by means of density functional theory, combining the WB97XD method with a 6-311G(d,p) basis set. Multiple linear regression methods were applied over the entire dataset, using both electronic and topological molecular descriptors as independent variables and the pIC 50 as the dependent variable. After this stepwise approach was used, three mathematical models were generated; according to statistical descriptors, Model 3 had the best results. Moreover, this model was statistically validated through leave-one-out cross-validation methods. According to the results herein, antileukemia activity can be attributed to four molecular descriptors: HOMO, softness, AC1RABAMBID, and TS1KFABMID. These manageable descriptors allowed association of the chemical characteristics with antileukemia activity. The entire dataset was used to perform docking studies with three fundamental protein targets: spleen tyrosine kinase (SYK), phosphoinositide 3'kinase (PI3K), and Bruton's tyrosine kinase (BTK). Compounds 9 and 22 were highlighted based on their scoring values; moreover, these compounds hit the SYK protein into the same active site as compound R788 (fostamatinib), a reported inhibitor of SYK protein. Furthermore, the scoring values of the four compounds were close to the R788 values. Based on QSAR analysis and the analysis of docking molecular interactions with SYK protein, four compounds are proposed as possible SYK inhibitors; the results achieved herein encourage the synthesis of these four compounds and subsequent biological tests.

Conflicts of Interest:
The authors declare no conflict of interest.