Identification of the Structural Features of Guanine Derivatives as MGMT Inhibitors Using 3D-QSAR Modeling Combined with Molecular Docking

DNA repair enzyme O6-methylguanine-DNA methyltransferase (MGMT), which plays an important role in inducing drug resistance against alkylating agents that modify the O6 position of guanine in DNA, is an attractive target for anti-tumor chemotherapy. A series of MGMT inhibitors have been synthesized over the past decades to improve the chemotherapeutic effects of O6-alkylating agents. In the present study, we performed a three-dimensional quantitative structure activity relationship (3D-QSAR) study on 97 guanine derivatives as MGMT inhibitors using comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) methods. Three different alignment methods (ligand-based, DFT optimization-based and docking-based alignment) were employed to develop reliable 3D-QSAR models. Statistical parameters derived from the models using the above three alignment methods showed that the ligand-based CoMFA (Qcv2 = 0.672 and Rncv2 = 0.997) and CoMSIA (Qcv2 = 0.703 and Rncv2 = 0.946) models were better than the other two alignment methods-based CoMFA and CoMSIA models. The two ligand-based models were further confirmed by an external test-set validation and a Y-randomization examination. The ligand-based CoMFA model (Qext2 = 0.691, Rpred2 = 0.738 and slope k = 0.91) was observed with acceptable external test-set validation values rather than the CoMSIA model (Qext2 = 0.307, Rpred2 = 0.4 and slope k = 0.719). Docking studies were carried out to predict the binding modes of the inhibitors with MGMT. The results indicated that the obtained binding interactions were consistent with the 3D contour maps. Overall, the combined results of the 3D-QSAR and the docking obtained in this study provide an insight into the understanding of the interactions between guanine derivatives and MGMT protein, which will assist in designing novel MGMT inhibitors with desired activity.


Introduction
A number of alkylating agents, such as methylating agents (e.g., temozolomide, dacarbazine and procarbazine) and chloroethylating agents (e.g., carmustine, nimustine, lomustine and laromustine), are frequently used in the clinical treatment of malignant tumors [1][2][3]. These agents attack the O 6 position of guanine in DNA and result in forming a series of O 6 -alkylguanine lesions, which are believed to be crucial DNA adducts related to the anticancer activity of chemotherapies. For example, O 6 -methylguanine (O 6 -MG) lesion is produced by temozolomide. O 6 -chloroethylguanine is generated by chloroethylnitrosoureas and subsequently rearranges to N1,O 6 -ethanoguanine,

External Test Set Validation and Y-Randomization Test
Although the CoMFA and CoMSIA models derived from ligand-based alignment were both observed with Qcv 2 > 0.5 and Rncv 2 > 0.9, a 3D-QSAR model with acceptable predictability also requires to meet other statistical criterions, including external validation correlation coefficient (Qext 2 > 0.5), predictive correlation coefficient (Rpred 2 > 0.6) and slope k (0.85 ≤ k ≤ 1.15) [42]. So, a test set containing 25 compounds independent from the training set was used for an external validation to confirm the predictability of the obtained CoMFA and CoMSIA models. Table 2 lists the predicted pIC50 values of the training and the test sets, as well as the residues between the experimental and predicted pIC50 values. The linear correlations between the experimental and predicted pIC50 values for the CoMFA and CoMSIA models were shown in Figure 2A,B, respectively. The Qext 2 , Rpred 2 and k values are 0.691, 0.738 and 0.91 for the CoMFA model, respectively; and are 0.307, 0.4 and 0.719 for the CoMSIA model, respectively. A few outliers, such as compounds 82 and 91 in the test set, were observed with comparatively high residues between the experimental and the predicted activities. There are two possible reasons that may account for the failure of the models in outliers. Firstly, limited structural information on the C8 position of guanine (compound 82) can be obtained from the 3D-QSAR models. Secondly, there is a unique structural difference of R1 group in compound 91 when compared to the other guanine derivatives in the training set. The results of the external validation using the test set suggested that the CoMFA model was more satisfying than the CoMSIA model derived from ligand-based alignment method.  (C) The Q cv 2 values of the docking-based CoMSIA models.

External Test Set Validation and Y-Randomization Test
Although the CoMFA and CoMSIA models derived from ligand-based alignment were both observed with Q cv 2 > 0.5 and R ncv 2 > 0.9, a 3D-QSAR model with acceptable predictability also requires to meet other statistical criterions, including external validation correlation coefficient (Q ext 2 > 0.5), predictive correlation coefficient (R pred 2 > 0.6) and slope k (0.85 ď k ď 1.15) [42]. So, a test set containing 25 compounds independent from the training set was used for an external validation to confirm the predictability of the obtained CoMFA and CoMSIA models. Table 2 lists the predicted pIC 50 values of the training and the test sets, as well as the residues between the experimental and predicted pIC 50 values. The linear correlations between the experimental and predicted pIC 50 values for the CoMFA and CoMSIA models were shown in Figure 2A,B, respectively. The Q ext 2 , R pred 2 and k values are 0.691, 0.738 and 0.91 for the CoMFA model, respectively; and are 0.307, 0.4 and 0.719 for the CoMSIA model, respectively. A few outliers, such as compounds 82 and 91 in the test set, were observed with comparatively high residues between the experimental and the predicted activities. There are two possible reasons that may account for the failure of the models in outliers. Firstly, limited structural information on the C8 position of guanine (compound 82) can be obtained from the 3D-QSAR models. Secondly, there is a unique structural difference of R 1 group in compound 91 when compared to the other guanine derivatives in the training set. The results of the external validation using the test set suggested that the CoMFA model was more satisfying than the CoMSIA model derived from ligand-based alignment method.

External Test Set Validation and Y-Randomization Test
Although the CoMFA and CoMSIA models derived from ligand-based alignment were both observed with Qcv 2 > 0.5 and Rncv 2 > 0.9, a 3D-QSAR model with acceptable predictability also requires to meet other statistical criterions, including external validation correlation coefficient (Qext 2 > 0.5), predictive correlation coefficient (Rpred 2 > 0.6) and slope k (0.85 ≤ k ≤ 1.15) [42]. So, a test set containing 25 compounds independent from the training set was used for an external validation to confirm the predictability of the obtained CoMFA and CoMSIA models. Table 2 lists the predicted pIC50 values of the training and the test sets, as well as the residues between the experimental and predicted pIC50 values. The linear correlations between the experimental and predicted pIC50 values for the CoMFA and CoMSIA models were shown in Figure 2A,B, respectively. The Qext 2 , Rpred 2 and k values are 0.691, 0.738 and 0.91 for the CoMFA model, respectively; and are 0.307, 0.4 and 0.719 for the CoMSIA model, respectively. A few outliers, such as compounds 82 and 91 in the test set, were observed with comparatively high residues between the experimental and the predicted activities. There are two possible reasons that may account for the failure of the models in outliers. Firstly, limited structural information on the C8 position of guanine (compound 82) can be obtained from the 3D-QSAR models. Secondly, there is a unique structural difference of R1 group in compound 91 when compared to the other guanine derivatives in the training set. The results of the external validation using the test set suggested that the CoMFA model was more satisfying than the CoMSIA model derived from ligand-based alignment method.

3D Contour Map Analysis
The information visualization by 3D contour maps is an attractive feature of 3D-QSAR modeling, which can provide information about how to increase or decrease the biological activity of the investigated compounds. Different colors in 3D contour maps help to understand the relationship between the diversified steric and electrostatic field related to the activity of the compounds. As shown in Table 1, the steric and electrostatic fields account for 55.3% and 44.7% of the field contribution, respectively. Figure 3 displays the steric and electrostatic contour maps of the resulting CoMFA model. The steric field was presented in green and yellow colors, and the electrostatic field was shown in blue and red colors.
For the steric field, the green and yellow regions represent the sterically favorable and unfavorable properties, respectively. A yellow region was observed around the N7 position of guanine, which suggested that the bulky substitution in this region was unfavorable for the inhibitory activity to MGMT. This explains the relatively low inhibitory activities of compounds 14-16 and 30 with -CH 2 COOCH 2 CH 3 , -CH 2 CONH 2 , -CH 2 CH(OH)CH 2 CH 3 and methyl groups, respectively, at the N7 position when compared to compound 2 without any N7-substituent group. Another yellow region near the ortho-position of the benzene ring of compound 2 suggested that bulky substitution in this region also contributed to the decrease of the inhibitory activity. On the other hand, a big green polyhedron-like region was found around the C3´-C5´positions of the benzene ring. This could be the reason why compounds 2-4, 19-23, 40 and 41 exhibited higher activities than compounds 42, 43 and 44 with substituent groups on the ortho-position of benzene ring. Furthermore, the higher activities of compounds 2-4, 19-23, 40 and 41 than compounds 1, 8 and 32 are also in accordance with this conclusion.
For the electrostatic field, the blue and red regions represent the electropositivity and electronegativity favorable properties, respectively. The major blue region was found at the left wing as shown in the reference molecules, which indicated that electropositive substituent groups in this region were favorable for high inhibitory activity. For example, compound 2 was observed with higher activity than compound 31 bearing an electronegative nitrogen atom in the pyridine group. Four red regions were located near the plane of the benzene ring of compound 2 and the thenyl ring of compound 45. This explains the reason why most benzyl-or thenyl-substituted guanine derivatives were more potent than alkyl-substituted guanine derivatives. Moreover, it can be seen from the contour maps that the C8 and N9 position of guanine is relatively well tolerated. with higher activity than compound 31 bearing an electronegative nitrogen atom in the pyridine group. Four red regions were located near the plane of the benzene ring of compound 2 and the thenyl ring of compound 45. This explains the reason why most benzyl-or thenyl-substituted guanine derivatives were more potent than alkyl-substituted guanine derivatives. Moreover, it can be seen from the contour maps that the C8 and N9 position of guanine is relatively well tolerated.

Docking Analysis
Molecular docking studies were performed to predict the binding mode of the inhibitors with MGMT protein using the GOLD Suite 5.2 software (Cambridge Structural Database System). The crystal structure of MGMT protein with PDB entry of 1QNT (1.9 Å resolution) was selected for the docking studies [44]. We conducted the docking for all 97 compounds including the training and the test sets. The pose of each compound was selected according to the fitness score and the orientation. The binding affinities of the compounds with the receptor were presented by the docking scores [45]. The detailed docking results are listed in Table 4. Most of the compounds, which were docked into the active pocket of MGMT protein, presented a similar conformation as the ligand in the crystal structure of MGMT (PDB entry: 1T38) and agreed with the repairing mechanism of MGMT [46]. Figure 4 shows the optimal docked conformations of several representative molecules (1, 2, 14, 15, 40, 41, 45, 60 and 72) with MGMT protein. Three hydrogen bonds were formed between compound 1 (O 6 -MG) and the receptor, while four hydrogen bonds were formed between compound 2 (O 6 -BG) and the receptor. The resulting pose of 14 was far away from the active pocket of the receptor due to the steric effect. Although 15 can be docked into the active pocket, there is a strong steric clash between the N7-substituent group and residues Arg135 and Ser159. All N7-substituted guanine derivatives exhibited low binding affinities with the receptor, which accounted for the low inhibitory activities of compounds 14-16, 30 and 75. This is consistent with the 3D contour map analysis that a sterically unfavorable region was observed around the N7 position of guanine. By comparing compounds 2, 40, 41 and 45, we found that there were four residues (Tyr114, Cys145,

Docking Analysis
Molecular docking studies were performed to predict the binding mode of the inhibitors with MGMT protein using the GOLD Suite 5.2 software (Cambridge Structural Database System). The crystal structure of MGMT protein with PDB entry of 1QNT (1.9 Å resolution) was selected for the docking studies [44]. We conducted the docking for all 97 compounds including the training and the test sets. The pose of each compound was selected according to the fitness score and the orientation. The binding affinities of the compounds with the receptor were presented by the docking scores [45]. The detailed docking results are listed in Table 4. Most of the compounds, which were docked into the active pocket of MGMT protein, presented a similar conformation as the ligand in the crystal structure of MGMT (PDB entry: 1T38) and agreed with the repairing mechanism of MGMT [46]. Figure 4 shows the optimal docked conformations of several representative molecules (1, 2, 14, 15, 40, 41, 45, 60 and 72) with MGMT protein. Three hydrogen bonds were formed between compound 1 (O 6 -MG) and the receptor, while four hydrogen bonds were formed between compound 2 (O 6 -BG) and the receptor. The resulting pose of 14 was far away from the active pocket of the receptor due to the steric effect. Although 15 can be docked into the active pocket, there is a strong steric clash between the N7-substituent group and residues Arg135 and Ser159. All N7-substituted guanine derivatives exhibited low binding affinities with the receptor, which accounted for the low inhibitory activities of compounds 14-16, 30 and 75. This is consistent with the 3D contour map analysis that a sterically unfavorable region was observed around the N7 position of guanine. By comparing compounds 2, 40, 41 and 45, we found that there were four residues (Tyr114, Cys145, Val148 and Ser159) in the active pocket involved in the hydrogen-bonding formation of 2, 40 and 45 with the receptor, whereas an additional hydrogen bond was formed between the -NH 2 group of 41 and the oxygen atom of Asn137.
Combined with the binding affinities, the results explain why the inhibitory activity follows the order of 45 > 41 > 40 > 2. The higher potency of 40 and 41 than 2 was also supported by the 3D contour map where a big green polyhedron-like region was found around the C3´-C5´positions of the benzene ring. It is worth noting that compound 60 displays an opposite orientation compared to the pose of the ligand in the crystal structure of MGMT (PDB entry: 1T38) [46], which leads to the loss of inhibitory activity. Compound 72 monosaccharide-conjugated on the N9 position forms eight hydrogen bonds with the Tyr114, Gln115, Cys145, Val148, Ser151 and Ser159 residues of MGMT protein, which results in the highly potent activity of 72 (pIC 50 = 8.00). Figure S3 in the Supplementary Materials displays the ligand-binding surface of MGMT protein with the compounds described above, which helps to further visualize the docking results. Val148 and Ser159) in the active pocket involved in the hydrogen-bonding formation of 2, 40 and 45 with the receptor, whereas an additional hydrogen bond was formed between the -NH2 group of 41 and the oxygen atom of Asn137. Combined with the binding affinities, the results explain why the inhibitory activity follows the order of 45 > 41 > 40 > 2. The higher potency of 40 and 41 than 2 was also supported by the 3D contour map where a big green polyhedron-like region was found around the C3´-C5´ positions of the benzene ring. It is worth noting that compound 60 displays an opposite orientation compared to the pose of the ligand in the crystal structure of MGMT (PDB entry: 1T38) [46], which leads to the loss of inhibitory activity. Compound 72 monosaccharide-conjugated on the N9 position forms eight hydrogen bonds with the Tyr114, Gln115, Cys145, Val148, Ser151 and Ser159 residues of MGMT protein, which results in the highly potent activity of 72 (pIC50 = 8.00). Figure S3 in the Supplementary Materials displays the ligand-binding surface of MGMT protein with the compounds described above, which helps to further visualize the docking results.    Besides, the formation of hydrogen bonds suggests that the -NH 2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC 50 ) values. All original IC 50 values were converted into the corresponding pIC 50 values (pIC 50 =´logIC 50 ) and were used as the dependent variables in 3D-QSAR analysis. The pIC 50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC 50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers. Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.    Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.  Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.  Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.  Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.  Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.  Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.  Besides, the formation of hydrogen bonds suggests that the -NH2 group on the C2 position of guanine is essential for high inhibitory activity, which explains the low activities of compounds 13, 29, 37-39, 83, 84, 96 and 97 when compared to compound 2. Similarly, the hydrogen bond formed between the O 6 atom of guanine and Ser159 accounts for the higher inhibitory activity of compound 2 than compounds 17 and 18 with the O 6 atom replaced by sulphur. Furthermore, a narrow space was found between the C2 or C8 atom of guanine and the active pocket of the receptor, suggesting large substituent groups not allowed in these sites. On the contrary, a wide entrance near the N9 position of guanine indicates that a bulky substituent group in this site is tolerated. Docking studies identified the key residues in the active pocket of the receptor such as Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159. These residues are main contributors to the interactions between the inhibitors and MGMT protein.

Data Set
A set of 97 guanine derivatives with different inhibitory activity against MGMT, which were chosen from literatures [17][18][19]22,23,25,26], were used as a data set for molecular modeling. The activities of all compounds were tested in vitro under the same experimental conditions in terms of half maximal inhibitory concentration (IC50) values. All original IC50 values were converted into the corresponding pIC50 values (pIC50 = −logIC50) and were used as the dependent variables in 3D-QSAR analysis. The pIC50 values for the data set range from 3.00 to 8.54, suggesting an adequate data collection for the 3D-QSAR study. The chemical structures and the pIC50 values for all compounds are listed in Table 5. Figure 5 dispalys the general structures for these compounds. The 97 compounds were randomly divided into two subsets, a training set including 72 compounds used for constructing the 3D-QSAR models and a test set including 25 compounds used for evaluating the external predictive ability of the models. Since the chirality of molecules 12, 16, 26-28, 80 and 87 are unknown, we performed two parallel QSAR studies including the R-isomers and the S-isomers.     are shown in Figure 6B. Secondly, a DFT optimization-based alignment (superimposition II) was employed by performing geometry optimization on all molecules using the density functional theory (DFT) method with the B3LYP/6-31G+(d,p) basis set (GAUSSIAN-09 program package, Gaussian Inc., Wallingford, CT, USA) [51]. The 3D structures of all molecules were added with Gasteiger-Huckel partial atomic charges using SYBYL package and the alignment procedure is the same to superimposition I. The obtained alignment conformations are shown in Figure 6C. Thirdly, a docking-based alignment (superimposition III) was performed and the obtained alignment conformations were shown in Figure 6D. The active conformation of each compound was obtained from molecular docking by considering binding orientation and scoring. The selected conformation was added with Gasteiger-Huckel partial atomic charges followed by the alignment as described in superimposition I and II. The conformations of the inhibitors obtained from the three alignment methods are similar and agree with the repairing mechanism mediated by MGMT [5,44,46].
number of optimization steps was set to 1000. The quality of 3D-QSAR models is usually sensitive to a specific alignment method [41,50]. In this study, three different alignment methods were employed to construct the 3D-QSAR models. Firstly, a ligand-based alignment (superimposition I) was used for the 3D-QSAR analysis. We chose compound 45 with the highest activity as a template to fit the remaining compounds of the training and test set using the "align database" function. The common substructure of the template molecule and the other molecules is depicted in Figure 6A with blue color. The resulting alignment conformations are shown in Figure 6B. Secondly, a DFT optimization-based alignment (superimposition II) was employed by performing geometry optimization on all molecules using the density functional theory (DFT) method with the B3LYP/6-31G+(d,p) basis set (GAUSSIAN-09 program package, Gaussian Inc., Wallingford, CT, USA) [51]. The 3D structures of all molecules were added with Gasteiger-Huckel partial atomic charges using SYBYL package and the alignment procedure is the same to superimposition I. The obtained alignment conformations are shown in Figure 6C. Thirdly, a docking-based alignment (superimposition III) was performed and the obtained alignment conformations were shown in Figure 6D. The active conformation of each compound was obtained from molecular docking by considering binding orientation and scoring. The selected conformation was added with Gasteiger-Huckel partial atomic charges followed by the alignment as described in superimposition I and II. The conformations of the inhibitors obtained from the three alignment methods are similar and agree with the repairing mechanism mediated by MGMT [5,44,46].

3D-QSAR Studies
The 3D-QSAR models were constructed using CoMFA and CoMSIA methods. Steric and electrostatic potential fields for CoMFA were calculated at each lattice intersection of a regularly spaced grid of 2.0 Å. An sp 3 hybridized carbon atom with a charge of +1.0 was used as the probe atom to calculate the CoMFA steric and electrostatic fields. The cut-off value was set to 30 kcal/mol. For CoMSIA analysis, in addition to steric and electrostatic fields, hydrophobic, hydrogen bond donor and acceptor fields were also considered. The descriptors of CoMSIA were calculated using the same lattice box as that employed in CoMFA calculations. The similarity indices of all the five fields (steric, electrostatic, hydrophobic, hydrogen bond donor and acceptor) were calculated using a sp 3 hybridized carbon atom with a radius of 1.0 Å and +1.0 charge, +1.0 hydrophobicity, +1.0 hydrogen bond donor and +1.0 hydrogen bond acceptor properties. The attenuation factor was set to the default value of 0.3. A Gaussian function was employed to calculate the distance between the probe atom and each atom of the molecule.
A partial least squares (PLS) regression [52] was used to obtain statistically significant 3D-QSAR models. For PLS analysis, the CoMFA and CoMSIA descriptors were used as the independent variables, and the pIC 50 values were used as the dependent variables. Leave-one-out (LOO) method was used to perform a cross-validation analysis, in which one molecule is removed from the data set and its activity is predicted by the model derived from the remaining molecules of the data set. Then, the cross-validated correlation coefficient (Q cv 2 ) and the optimal number of principal components (ONC) were determined. After getting the ONC, a non-cross-validation analysis was performed to obtain the conventional correlation coefficient (R ncv 2 ), standard error of estimate (SEE) and F value. Finally, the 3D-QSAR models were generated. The test set of the compounds, which are not included in model generation, were used to evaluate the robustness and statistical significance of the 3D-QSAR models [42,43,53]. The pIC 50 values of the test set were predicted based on the constructed models and then the predictive correlation coefficient (R pred 2 ) was calculated using Formula (1) [38,42].
where SD is the sum of squared deviations between the activities of the test set molecules and the mean activity of the training set molecules. Predicted residual sum of squares (PRESS) is the sum of squared deviation between the predicted and the actual activity of each molecule in test set.

Molecular Docking
Molecular docking study was carried out using GOLD Suite 5.2 software. We selected the crystal structure of MGMT with PDB entry of 1QNT (1.90 Å resolution) [44] as a receptor for docking study. In order to validate the docking approach, self-docking was conducted using the X-ray crystal structure of human MGMT with PDB entry of 1T38 (3.2 Å resolution), which is a protein-ligand complex with MGMT bounding to DNA containing O 6 -MG. In the crystal structure of 1T38, the Cys145 residue was experimentally mutated to serine to avoid the covalent transferring of the methyl group on O 6 -MG [46]. For the self-docking, the protein-DNA complex model of 1T38 was simplified by removing the DNA double strands except for the O 6 -MG substrate and deleting the solvent molecules in the X-ray crystal. Hydrogen atoms were added to the protein and Gold score was chosen as a scoring function. The docking was performed by the "cytochrome P450 mode" in GOLD software and the active site was located at the Cys145 residue. The root mean square deviation (RMSD) of the docked pose was 0.0882 Å when compared to the pose in the crystal complex (see Figure S2 in the Supplementary Materials), which suggested that the docking conformation produced by GOLD closely resembled the crystal structure. Thus, GOLD is suitable for performing the docking of guanine derivatives to MGMT protein.

Conclusions
A 3D-QSAR study was performed based on a series of guanine derivatives as MGMT inhibitors using CoMFA and CoMSIA methods. Three different alignment methods were used to overlap the molecules. The optimal 3D-QSAR model was derived from CoMFA with the ligand-based alignment. The 3D contour maps provide crucial information of the steric and electrostatic field for the design of novel guanine derivatives with high MGMT-inhibitory activity. Molecular docking study was performed to explore the binding mode between the guanine derivatives and MGMT protein. The docking results suggest that the key residues in the active pocket of the receptor, including Tyr114, Gln115, Arg135, Asn137, Cys145, Val148, Ser151, Tyr158 and Ser159, play important roles in the interactions of the ligands and receptor. The oxygen atom at the C6 position and the -NH 2 group at the C2 position of guanine are essential for high MGMT-inhibitory activity. The substituent groups on the N7 position of guanine are unfavorable for the inhibitory activity due to the steric effect. A substituent group with limited size is allowed for the C8 position of guanine, while the N9 potion of guanine is highly tolerated. The combined analysis of the 3D contour maps and the docking results provide valuable information for the further understanding of the structure-activity relationship of guanine derivatives as MGMT inhibitors, which will assist in designing novel MGMT inhibitors with high activity.