Design of New 2,4-Substituted Furo [3,2-B] Indole Derivatives as Anticancer Compounds Using Quantitative Structure-Activity Relationship (QSAR) and Molecular Docking

This study aimed to propose new indole derivatives as anticancer through Quantitative Structure-Activity Relationship (QSAR) and molecular docking method. The best predicted anticancer activity of indole derivatives was recommended based on the QSAR equation. A data set consist of 18 indole derivatives from literature with anticancer activity against the A498 cell line was used to generate a QSAR model equation. The data set was divided randomly into training (14) and test (4) set compounds. The structure of indole compound was optimized first using AM1 semi-empirical methods, and the descriptors involved were analyzed using Multiple Linear Regression (MLR). The best QSAR equation obtained was Log IC50 = 65.596 (qC2) + 366.764 (qC6) – 92.742 (qC11) + 503.297 (HOMO) – 492.550 (LUMO) – 76.966. Based on the QSAR model, varying electron-withdrawing groups in C2 and C6 atom, as well as adding electron-donating groups in C11 were proposed could increase the anticancer activity of the indole derivatives. The QSAR analysis showed that compound 15 has the best predicted anticancer activity, supported by molecular docking results that showed hydrogen bond interaction with essential amino acids to build anticancer activity such as MET769, THR830, and THR766 residues.


INTRODUCTION
Cancer is one of the leading causes of death in the world. One common sign of cancer is the continual uncontrolled cell division that can invade normal tissue and organs, then spread throughout the other parts of the body. In 2018, WHO reported 18.1 million new cancer cases and 9.6 million of mortality globally. The largest number of deaths was reported caused by lung cancer (1.8 million deaths), followed by colorectal cancer (881,000 deaths), stomach cancer (783,000 deaths), liver cancer (782,000 deaths), breast cancer (627,000 deaths), and esophageal cancer (400,000 deaths). The number of worldwide cancer deaths is expected to increase in every year (WHO, 2018). In Indonesia, the prevalence of cancer is quite high. According to the Basic Health Research (Riskesdas) data by The Health Ministry, the incidence of cancer in Indonesia was around 362,000 people in 2018. The highest rate of death was caused by failure for early detection and improper cancer treatment that requires one or more stages of treatment, such as surgery, radiotherapy, and chemotherapy.
The Quantitative Structure-Activity Relationship (QSAR) method has been widely used in the design of new drug compounds to predict biological activity statistically, along with molecular docking. The computational chemistry methods, such as QSAR and molecular docking, could significantly lowering the cost and time consumed during the drug development process. Those methods allow the researcher to only synthesis selected compounds with the best predicted biological activity and do further studies to the promising compounds.
In this study, an attempt to find a better anticancer activity of indole derivatives was carried out using QSAR and molecular docking. A QSAR equation was used to determine and propose what kind of substituents should be added to the active site of indole compounds to get the best predicted anticancer activity). Meanwhile, molecular docking was performed to study the intermolecular configuration complexes of indole derivatives to the active site of protein EGFR-TK (Epidermal Growth Factor Receptor-Tyrosine Kinase).

EXPERIMENTAL SECTION Instrumentation
The computer specifications used in this study were: Intel Core i7 4.00 GHz; 4GB RAM; 64-bit system type; Operating System: Windows® 7 Ultimate. The programs used in QSAR and molecular docking were Gaussian 09, ChemDraw Professional 15, and Statistical Package for the Social Sciences (SPSS) 23.0 software.

Data
The data set used in the QSAR study were 18 indole derivatives with anticancer activities against the A498 cell line (Table 1) (Zhuang et al., 2013). This data set was divided into two parts, i.e., a training (14 compounds) and test (4 compounds) set. Anticancer activity that was expressed in half-maximal inhibitory concentration (IC50) was converted into logarithms (log IC50) before the QSAR analysis to get a better distribution of the activity val ues. value or similar result with the experimental data was used as the best method to optimize the structure geometry of indole derivatives.

Geometry Optimization
The molecular structure of the eighteen indole derivative compounds was drawn in ChemDraw Professional 15.0 software. The initial optimization was performed using Chem3D 15.0, followed by reoptimization using the best optimization method in Gaussian 09 software. The optimized structure geometry will produce electronic descriptors in the form of atomic charge, HOMO, and LUMO energy data.

Multiple Linear Regression Analysis
Multiple-Linear Regression (MLR) method is used to establish the correlation of anticancer activity (dependent variable) and electronic descriptors (independent variables) statistically. In this work, the determination of the significant descriptors that influence the anticancer activity was calculated using MLR analysis in SPSS 23.0 program with a backward method.
Several models were produced from the MLR analysis. The validation of the best model was carried out by using predetermined statistical parameters, such as R 2 > 0.6; PRESS; and Fcalc./ Ftab. ≥ 1 (Frimayanti et al., 2011). The QSAR models that fit with the statistical criteria were then validated using the test set data to generate the best QSAR equation model.

Design of The Proposed Compound
The proposed compound was designed by replacing substituents in the indole derivatives structure based on the best QSAR equation model. The IC50 value of the new proposed anticancer compound can be calculated from the QSAR equation and referred to as the predicted IC50 value.

Molecular Docking
The biological activity of a compound as an anticancer can be identified by its ability to inhibit the EGFR-TK phosphorylation (Mphahlele et al., 2018;Prajapati et al., 2017). In this study, molecular docking was performed to the crystallographic protein structure of molecular target EGFR-TK with Protein Data Bank (PDB) ID: 1M17 (resolution 2.6 Å). Molecular docking was carried out using Discovery Studio software following the standard protocol implemented in the Discovery Studio (Accelrys, San Diego, USA).

RESULTS AND DISCUSSION Determination of the Best Optimization Method
Calculation of the 1 H-NMR chemical shift of compound 7h was performed using AM1, PM3, and HF methods. Based on the result in Table 2, the AM1 method was the best optimization method with the closest chemical shift (δ) results to the experimental data. This result was determined by comparing the PRESS value of the AM1 (1.68) that was smaller than the PM3 (3.30) and HF (5.05) method. Therefore, the AM1 method was used to optimize the structure and calculate the overall descriptors of the compound series.  (Table 3). The two models illustrate an excellent correlation between biological activity and descriptors that can be seen from the value of the statistical parameters of each model (Table 3).
Correlation of the experimental log IC50 vs. the predicted log IC50 to the training set compound from both models was presented in Figure 1. According to Figure 1, Model 2 gives a better correlation value (R 2 ) with 0.9314 than Model 1 (0.9234). Therefore, both models were then validated using the test set data.

Validation of the QSAR Model
The validation of the best model was carried out by calculating the PRESS value from the test sets compounds. The smaller PRESS value gave a better QSAR model. The PRESS value of the test set data based on Models 1 and 2 can be seen in Table 4. The correlation coefficient (R 2 ) of the predicted and the experimental log IC50 for the QSAR model to the test set was displayed in Figure 2. Table 4 showed that Model 1 has a smaller PRESS value (0.360) than Model 2 (0.401). However, Figure  2 showed a predictive correlation coefficient R 2 value of Model 2 to the test set (0.9544) was better than Model 1 (0.9181). Moreover, Model 1 has more descriptors (6 descriptors) involved in the equation than Model 2 (5 descriptors).
Based on the analysis of Model 2, the independent variables such as atomic charges on C2 (qC2), C6 (qC6), and C11 atom (qC11), as well as HOMO and LUMO can affect 95.4% of the anticancer activity. This correlation was better than Model 1 obtained with a correlation of 91.8%. Therefore, it can be concluded that Model 2 was the best model for designing new indole compounds with better anticancer activity.    Designing of The Proposed Compound The new anticancer indole compounds were designed by replacing substituents in indole structures based on the QSAR equation. From this study, it is expected that the designed compounds have better anticancer activity (IC50) than the previous reported indole derivative compounds. Modification of the compound by varying the substituents need to consider the electronic, hydrophobic, and steric properties of the atoms or substituent added. The design of the proposed structure of indole can be done by adding an electron-withdrawing or donating group.
The electron-withdrawing group has electrophilic properties that can decrease the electron density of the atom attached to it and make it more positive. Meanwhile, the electron-donating group has nucleophilic properties, which result in an increasing electron density of the atom. For example, the addition of an electron-withdrawing group to C6 atom in the indole compound instigates the resonance from C6 towards the electronwithdrawing group so that C6 acts as a nucleophile with a higher electron density.
This study presented the design of new indole derivatives and their predicted anticancer activity. The designed compound with the best predicted anticancer activity was obtained by modifying the functional group in the main structure of indole according to the QSAR equation (Table 5). Table 5 showed that the electron-withdrawing groups in the position R1, R2, and electron-donating group in R3 significantly could increase the anticancer activity of the compound. According to the QSAR equation obtained, electron-withdrawing in R1 and R2, as well as electron-donating in R3, could make the coefficient value of the atomic charges of C2, C6, and C11 were negative. The more negative log IC50 obtained, the better the predicted anticancer activity of indole derivatives. Based on the result, compounds 14 and 15 gave the best predicted anticancer activity, expected by the presence of a halide group in R3 that has high electronegativity properties.
Molecular docking was performed to support the QSAR result by predicting the interaction of compound 15 to crystal protein of Epidermal Growth Factor Receptors (EGFR) tyrosine kinases (PDB ID: 1M17) with co-crystal ligand AQ4. Protein EGFR is a large family of receptor tyrosine kinases that has a significant role in the emergence of breast, lung, esophageal, head, and neck cancer. This protein was also reported as the key to the viability of the cancer cell so it can be a target molecule in docking study. According to the docking result, the Root Mean Square Deviation (RMSD) value obtained was 1.8521 Å.
Molecular docking of compound 15 to the 1M17 protein displayed interaction with seven amino acid, and three of them were hydrogen bonds with MET769, THR830, and THR766 (Figure 3). These three amino acid residues were revealed to act as the active site of anticancer activity formed by co-crystal ligand AQ4 (Stamos et al., 2002). A molecular docking study showed that binding pocket formed by compound 15 was similar to co-crystal ligand AQ4. Thus, it was predicted that compound 15 has an excellent anticancer activity based on the QSAR and molecular docking results.  Designing of the new indole compound using QSAR and molecular docking showed the best predicted anticancer activity by compound 15 with IC50 of 0.003 µM and displayed hydrogen bonds interaction to the essential amino acid MET769, THR830, and THR766.