3D-QSAR and Docking Studies on Pyrimidine Derivatives of Second-Generation ALK Inhibitors

Anaplastic lymphoma kinase (ALK) is a promising target for the treatment of non-small cell lung cancer. Under crizotinib treatment, drug resistance and progressive disease appeared after the point mutations arising in the kinase domain of ALK. Second-generation ALK inhibitors can solve the de ﬁ ciencies of the ﬁ rst generation, especially the drug resistance in cancer chemotherapy. Ceritinib (LDK378), a pyrimidine derivative, for example, can inhibit the activity of ALK with an IC 50 value of 40.7 nmol/L, and can experience disease progression after initial treatment with crizotinib. Unfortunate-ly, clear structure – activity relationships have not been identi ﬁ ed to date, impeding the rational design of future compounds possessing ALK inhibition activity. To explore interesting insights into the structures of pyrimidine derivatives that in ﬂ uence the activities of the second-generation ALK inhibitors, three-dimensional quantitative structure – activity relationship (3D-QSAR) and molecular docking were performed on a total of 45 derivatives of pyrimidine. Comparative molecular ﬁ eld analysis (CoMFA) and comparative molecular similarity index analysis (CoMSIA) techniques were used to generate 3D-QSAR models. CoMFA and CoMSIA were performed using the Sybyl X 2.0 package. Molecular docking analysis was performed using the Sur ﬂ ex-Dock module in SYBYL-X 2.0 package. We found in the CoMFA model that the non-cross-validated r 2 value was 0.998, the cross-validated q 2 value was 0.663, and the F statistic value was 2,401.970, while the r 2 value was 0.988; q 2 value was 0.730, and F value was 542.933 in CoMSIA models, suggesting the good predictability of the CoMFA and CoMSIA models. 3D contour maps and docking results suggested that different groups on the core parts of the compounds could enhance the biological activities. Based on these results, the established 3D-QSAR models and the binding structures of ALK inhibitors obtained favor the prediction of the activity of new inhibitors and will be helpful in the reasonable design of ALK inhibitors in the future.


Introduction
Anaplastic lymphoma kinase (ALK), a receptor tyrosine kinase that belongs to an insulin receptor superfamily, is responsible for the development of different tumor types. According to the current study, ALK has not been found to have an essential physiological role in the normal body, and ALK fusion proteins have been connected with a wide spectrum of human cancers. [1][2][3] Evidence suggested that ALK fusion gene has been mainly expressed in the central nervous system and other parts of the brain and its expression levels sharply decline after birth. 4 Although not widely expressed in adult tissue, ALK is implicated in neuronal development, differentiation, and basal dopaminergic signaling. 5 Interestingly, ALK fusion gene also plays a key role in the development of various cancers, such as non-small cell lung cancer (NSCLC), breast cancer, ovarian cancer, and inflammatory myofibroblastic tumors. [6][7][8] ALK participates in the regulation of cell proliferation associated the PLC-γ and Ras/Raf/MEK/ERK1/2 pathways, and cell survival associated the JAK/STAT and PI3K/Akt (PKB) pathways. [9][10][11] These different pathways can be activated by ALK fusion proteins, which are interconnected and overlapping. Thus, many novel potent ALK inhibitors have been reported just like mushrooming in recent years. 12 The echinoderm microtubule-associated protein-like 4 (EML4)-ALK fusion-type tyrosine kinase is an oncoprotein found in NSCLC, and has been validated as a novel therapeutic target by Pfizer's first-generation ALK inhibitor crizotinib (PF2341066, Xalkori), which was approved by Food and Drug Administration in 2011 as the standard treatment for ALK-positive NSCLCs. However, despite the good clinical effect of crizotinib, subsequent treatment also resulted in mutation L1196M, [13][14][15][16][17] which is similar to gatekeeper mutations in the EGFR-T790M and ABL-T315I kinase domains. 18,19 Therefore, there is an urgent need for a new generation of ALK inhibitors that are able to overcome drugresistant mutants such as L1196M. Recently, the potent and high selectivity of the second-generation ALK inhibitors including X-396, alectinib (CH-5424802), ASP3026, AP26113, and ceritinib (LDK378) have been found. These inhibitors exhibit good biological activity, but are not enough for the possible resistance mutations, 20,21 Given above, we can learn from both of the drug resistance of the first-generation ALK inhibitors of crizotinib and the better treatment effect of the second-generation ones, by means of computational techniques, to in-depth study the structureactivity relationship of the second-generation ALK inhibitors.
The ALK inhibitors developed recently are shown in ►Fig. 1. In the present study, a computer-aided drug design was used to set three-dimensional quantitative structure-activity relationship (3D-QSAR) models using 45 derivative molecules with anti-ALK ability. [22][23][24] Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models based on ligand alignment were made. Both the contour maps of CoMFA and CoMSIA revealed the key factors affecting the activities of the inhibitors, and guided us to design new potent ALK inhibitors. This study may provide a theoretical basis for the design of ALK inhibitors in the future.
The 3D structures of the compounds were constructed using the sketch module in Sybyl X 2.0 package. Partial atomic charges were calculated by the Gasteiger-Huckel method before energy minimization using the Tripos force field with a convergent threshold of a maximum deviation of 0.01 kcal/(mol Å). 25,26 Molecular alignment is the most sensitive factor for building reliable 3D-QSAR models. The biological activities were measured by calculating the concentration of the compound responsible for 50% of the maximum inhibition (IC 50 ) and were converted into pIC 50 (ÀlogIC 50 ) according to a reported study. 27 The chemical structures of compounds 1-45 along with their biological activities are given in ►Table 1.

Molecular Docking Studies
The diversity of the datasets containing different backbones and the molecular alignment, which are obtained through molecular docking, is crucial for developing CoMFA and CoMSIA models. Molecular docking was manipulated using the Surflex-Dock module in SYBYL-X 2.0 to explore the interactions between the binding sites of ALK (PDB code: 4MKC from the RCSB protein data bank) and molecules. After deleting the ligands and water molecules in the ALK-binding site, the ALK inhibitors (compounds 1-45) with electric charges and hydrogen atoms were butt jointed with the ALK-binding pocket. The partial atomic charges of all inhibitors were calculated using Gasteiger-Huckel charge before docking. The minimization of each structure was also performed using the Tripos force Field. Other parameters were set to default and each docked molecule produced 20 conformations. The docking total scores of compounds 1-45 are shown in ►Table 2.
The docking results of compounds 1-45 and the binding pocket of ALK (PDB code: 4MKC) are shown in ►Figs. 2 and 3, respectively. Template molecule 21 (yellow) was docked into the binding pocket of ALK. Hydrogen bonding interactions are shown with dotted yellow lines. These images were generated using the Sybyl-x 2.0 program.
Next, according to the maximum similarity and the direction of stretching in the pocket, the best conformations of compounds 1-45 for molecular modeling alignment were selected for generating the best QSAR model.

Molecular Modeling Alignment
Molecules of the training sets and text sets were aligned onto compound 21 which possessed the highest activity (►Fig. 2). The alignments will be used for the further 3D-QSAR study. The 39 molecules of the training set were aligned onto the high potent template molecule 21 (IC 50 ¼ 0.07 nm, pIC 50 ¼ 10.15). The alignment results and common substructures are shown in ►Fig. 3. The substructure had the largest basic structure and kept the molecules' space orientation consistent with the protein pocket of 4MKC. As shown in ►Fig. 3, the 39 molecules of the training set are mainly different in the piperidine ring direction, which is shown in previous reports that the ligands make hydrogen bonds at the hinge area via the pyrimidine and amino nitrogen atoms onto the backbone nitrogen and oxygen of Met1199, respectively. 28 The central pyrimidine ring of these 39 training set molecules is sandwiched between Ala1148 and Leu1256, and its chlorine substituent is directed toward the back of the pocket, making hydrophobic contacts with gatekeeper Leu1196. The piperidine ring extends to the solvent and is engaged in a salt bridge with Glu1210. The sulfonyl makes an intramolecular H-bond. 28,29 The 39 training set molecules aligned onto the high potent template molecule 21 extend almost outside the pocket, which can core likely to study the effect of the substituents at this position on the molecular activity, and the terminal piperidine fits closely to the protein surface and makes a salt bridge with E1210. As discussed earlier, the alignment of compound structures plays a key role in developing successful 3D-QSAR models. Our effort is to study the effect of different substituents on the piperidine ring and the orientation of the substituents on the activity of the ALK inhibitor. Hence, the docked poses of the ligands were used to develop receptor-based 3D-QSAR models.

Construction of the 3D-QSAR Model
QSAR and 3D-QSAR had a profound impact on medicinal chemistry. 30 CoMFA is a promising new approach to structure/activity correlation, its characteristic features are a representation of ligand molecules by their steric and electrostatic fields, combining the method sampling at the intersections of a 3D lattice, optimal mutual alignment within a series, minimizing the root mean square field differences between molecules, analyzing data by partial least squares (PLSs), using cross-validation to maximize the likelihood that the results have predictive validity, and graphic representation of results, as contoured 3D coefficient plots. However, comparative molecular similarity index analysis (CoMSIA) is more comprehensive than CoMFA because of other two fields' analysis, hydrogen bond receptor and hydrogen bond donor. 31 Residual of actual and predicted pIC 50 values is an important factor for CoMFA, CoMSIA, and 3D-QSAR models.
An ideal way to access the robustness and predictive ability of 3D-QSAR models is to estimate the performance of the models on a validation set of compounds which were not used in model constructions. 32 The observed activity and calculated activity are shown in ►Table 3 based on the prediction of the models, which aligned with the 39 molecules of the training set. The best CoMSIA model was developed using four descriptor fields, which were steric, electrostatic, hydrophobic, and hydrogen-bond acceptor fields.  3D-QSAR and Docking Studies on Pyrimidine Derivatives of Second-generation ALK Jiang et al.   The 3D-QSAR methods rely on the principle that the 3D geometric and electronic features of molecules correlate with their biological activities. 33 Furthermore, 3D-QSAR also plays an important role in the optimization of pharmacologically active compounds and in the prediction of the biological activities of newly designed compounds. 34 The predicted pIC 50 values for the training and test set compounds are listed in ►Table 3 and depicted graphically in ►Fig. 4, from which it can be seen that all points are located near.

CoMFA and CoMSIA Analysis
CoMFA and CoMSIA models were derived from a training set of 45 compounds with pIC50 values ranging from 5.44 to 10.15 (►Table 3). The PLS statistical parameters of CoMFA and CoMSIA models are listed in ►Table 4, which are useful tools to derive multilinear relationships between independent and dependent variables, together with a cross-validation test were used for the generation and internal validation of the CoMFA and CoMSIA models. 35 As ►Table 4 shows, CoMFA and CoMSIA models were developed, and the final models were selected according to the statistical parameters.
PLS analysis on all of the compounds in the training set resulted in a CoMFA model with a cross-validated q 2 of 0.663. This model gave an optimal number of components of 8 and a conventional correlation coefficient r 2 of 0.998. The corresponding steric and electrostatic field descriptors explained 47 and 52% of the total variance. For CoMSIA analysis, five descriptor fields (steric, electrostatic, donor, acceptor, and hydrophobic) were considered, and r 2 (0.988) and q 2 (0.732) values were obtained with a satisfactory result.

CoMFA and CoMSIA Statistical Result
It is important to make an initial inspection of the inhibitor molecules before establishing the 3D-QSAR models. Statistically, the r 2 value > 0.3 of the predicted set is usually considered significant, while the r 2 value > 0.5 is statistically more significant in CoMFA and CoMSIA studies. 36 As listed in ►Fig. 4(A, B), they show a good line fit. The reason for this outlier may be different structure or different binding  For the CoMFA model, the contributions of the steric and electrostatic fields were calculated to be 47 and 52%, respectively; thus, the electrostatic field has more influence in comparison to the steric field. For the optimal CoMSIA model, five descriptor fields were considered including the steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor. Their contributions were 10.1, 30.3, 13.2, 20.9, and 24.5%. ►Table 3 lists the actual and predicted pIC 50 values of the training and test set as well as the residues between them.

CoMFA Contour Maps
In the 3D-QSAR study, the striking feature of CoMFA and CoMSIA modeling is the visualization of 3D plot. The CoMFA model based on the training set of 39 compounds was employed to investigate the existence of any correlation between chemical structures and activity of ALK inhibitors. Here, to elaborate the possible effects of different groups on the core parts of compounds 1-45 on the biological activities of the compounds, we use the following two skeletons (Skeleton 1 and Skeleton 2) to elaborate with R 1 , R 2 , R 3 , R 4 , and R 5 , as listed in ►Fig. 5 (A, B).
In the CoMFA electrostatic field map, blue contours represented regions where electropositive substitutions increased the activity, while red contours indicated regions where electronegative substitutions increased the activity. The electrostatic field map is shown in ►Fig. 6B. Blue contour maps surrounding R 3 positions suggested that electropositive groups were favorable at these positions for improved inhibitory potency. Position R 3 with electropositive    substitutions displayed better biological activity and position R 4 with electronegative substitutions can increase the activity.
►Fig. 7B shows a large and a small red contour enclosing regions R 1 and R 3 , respectively. By comparing the series of compounds 22 with 24, it can be seen that as charge effects increased (N has electronegativity of 3.04 and C has electronegativity of 2.55), the inhibitory activity increased as well.
The CoMSIA contour map of hydrophobic contribution is described in ►Fig. 7C. In this figure, the yellow (hydrophobic favorable) and white (hydrophobic unfavorable) contours represent 80 and 20% of level contributions, respectively. Obviously, a large yellow and a white region that R 3 position with hydrophilic groups can increase biological activity. However, a large-sized yellow contour located at the parasite of the R 5 and R 6 aromatic ring represented regions where the compounds with more hydrophobicity at this site would possess better biological activity. Moreover, at the end of R 1 substituents, the compounds with hydrophobicity will possess better biological activity.
In ►Fig. 7D, the cyan and purple contour maps indicated favorable and unfavorable H-bond donor groups, represent-ing 80 and 20% of level contributions, respectively. Light purple appears at the end of the R 1 position, indicating that H-bond donor groups enhanced biological activity.
As a validation of the CoMSIA model that H-bond acceptor in ►Fig. 7E, the hydrogen bond acceptor field of the CoMSIA model was represented by magenta (hydrogen bond acceptor favorable) and red (hydrogen bond acceptor disfavorable), representing 80 and 20% of contributions ratio, respectively. The large magenta contours near the R 1 position suggested that the introduction of hydrogen bond acceptor substituents to these regions would increase activity. In addition, the R 3 regions showed that the biological activity influenced by the H-bond acceptor depends on the nature of its substituents.

Conclusion
In this article, we developed a statistically significant 3D-QSAR model of a series of piperidine-substituted analogues through CoMFA, CoMSIA, and molecular docking. In this 3D-QSAR model, shown in ►Fig. 8, the derived leave-one-out (LOO) validated the PLS regression QSAR model and showed acceptable r 2 (0.988) and q 2 (0.730). As noted, this model demonstrates that R 1 and R 3 played a key role in increasing molecular activity, where the R 1 position may be better with small, negatively charged, hydrophobic, and hydrogen bond donor groups, and the R 3 position may be better with small, positively charged, and hydrogen bond donor groups.
As is shown in experimental data as well as CoMFA and CoMSIA contour maps, this model is credible for explaining the structure-activity relationship of the second-generation ALK inhibitors with steric, electrostatic, hydrophobic, donor, and acceptor. The results from the 3D-QSAR models and molecular docking would lead to a better understanding of the structural features needed to design and synthesize novel potential second-generation ALK inhibitors.

Conflict of Interest
The authors report no declarations of interest. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.