Pyrimidine Derivatives: QSAR Studies of Larvicidal Activity against Aedes aegypti

The present study investigated the activity of pyrimidine derivatives against Aedes aegypti. Two compounds, 3c and 3d, showed excellent larvicidal activity. Additionally, quantitative structure-activity relationship (QSAR) models were built using multiple linear regression (MLR) and partial least squares (PLS) with descriptors generated by the Dragon and VolSurf+ software, respectively. The best model was obtained with MLR and proved robust. The QSAR model was validated by internal validation techniques to check its reliability, quality and robustness in predicting larvicidal activity against A. aegypti. The models indicated that the three-dimensional structure of the molecules, steric properties, hydrophobic and polar surface areas, the logarithm of the partition coefficient (logP) and a simple pattern of substituent groups such as methyl, methoxy and succinimide are responsible for the larvicidal activity of the pyrimidine derivatives. Furthermore, the activity decreases with an electron-withdrawing group at R1 and increases when it is replaced by a group that activates the aromatic ring. These findings will aid further studies of new pyrimidine derivatives active against Aedes aegypti.


Introduction
Mosquitoes transmit many neglected diseases, and millions of people worldwide are threatened by vector-borne diseases, 1-3 particularly in tropical and subtropical regions. [4][5][6] According to Gorle et al., 7 these diseases chiefly affect tropical and subtropical countries where chemical vector control programs meet resistance because the population refuses mosquito control based on chemical treatment with synthetic insecticides. According to the World Health Organization (WHO), in recent years the transmission of dengue hemorrhagic fever, zika virus, dengue fever and chikungunya has increased dramatically in regions that are conducive to mosquito proliferation, as these mosquitoes adapt to environments characterized by irrigation systems and heavy rains. [7][8][9][10][11][12] Chikungunya produces fever and rheumatic pain, interfering with people's quality of life for days, months or even years in more serious cases. Zika presents with headache, skin irritation, redness of the eyes, vomiting, fatigue, fever, chills, loss of appetite or sweating. In some cases, zika may cause paralysis (known as Guillain-Barré syndrome) and, in pregnant women, there may be hemorrhagic dengue: damp, pale and cold skin, as well as a decrease in blood pressure, high fever and malaise up to three days after the mosquito bite. [13][14][15][16] Unfortunately, synthetic insecticides have not been effective in containing the uncontrolled growth of Aedes aegypti populations, since the mosquitoes are becoming resistant to conventional insecticides, and their use also aggravates environmental problems and seriously harms human health. [17][18][19] Twenty-seven pyrimidine derivatives have already been reported in the literature by our research group. 20,21 Because pyrimidines may have insecticidal activity, 22 we investigated their biological activity against A. aegypti, the dengue vector.
Additionally, quantitative structure-activity relationship (QSAR) studies were performed to understand the main physicochemical features responsible for the larvicidal activity of the studied compounds.

Maintenance and creation of the Aedes colony
The cycle began with the hatching of Aedes eggs in glass cups containing enough distilled water to cover them completely and a little cat food to stimulate faster hatching. Larval development continued with changes of the water in the basins and additions of food. One cycle ended and the next began when the larvae and pupae were placed, in a glass beaker with a little distilled water and food (cat food), into cages. The following week, the cups were removed and a blood meal was added to the liquid. New eggs were collected on filter paper and dried at room temperature (27 ± 1 °C) and 75% humidity. The mosquitoes were kept alive with a piece of cotton soaked in 10% sucrose solution.
Experimental procedure for larvicidal bioassays
Preliminary solubility tests were first conducted for each sample in co-solvents such as ethanol, Tween 80 or acetone, in order to select the co-solvent that best solubilized the sample in water. A stock solution of 100 parts per million (ppm) was then prepared by dissolving 5.0 mg of the test compound in 0.7 mL of the chosen co-solvent. 23,24 The contents were transferred from the beaker into a 50 mL volumetric flask and the volume was completed with distilled water. Preliminary larvicidal tests were performed at concentrations of 10, 50 and 100 ppm to identify the concentration range in which the compound was most active. The tests were carried out in triplicate, with 20 A. aegypti larvae in each replicate, and larvicidal activity was recorded 24 and 48 h after the beginning of the test. Larvae were considered dead when they did not respond to stimulus or did not rise to the surface of the solution. Negative controls (solutions containing only co-solvent and distilled water) were run simultaneously with the tests. The lethal concentration for 50% of the larval population (LC₅₀) was determined by Probit analysis 25 using the statistical program StatPlus Pro 6.2.5.0, 26 at a 95% confidence level.
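The stock-solution arithmetic in this procedure can be checked with a short sketch. It assumes that 1 ppm corresponds to 1 mg of compound per litre of dilute aqueous solution; the function names and the 20 mL final test volume are hypothetical and are not taken from the protocol.

```python
# Illustrative arithmetic only; names and the 20 mL test volume are assumptions.

def ppm_from_mass(mass_mg: float, volume_ml: float) -> float:
    """Concentration in ppm, taking 1 ppm = 1 mg per litre of dilute aqueous solution."""
    return mass_mg * 1000.0 / volume_ml


def stock_volume_ml(target_ppm: float, stock_ppm: float, final_volume_ml: float) -> float:
    """Volume of stock needed (C1 * V1 = C2 * V2) to reach target_ppm in final_volume_ml."""
    if target_ppm > stock_ppm:
        raise ValueError("target concentration cannot exceed the stock concentration")
    return target_ppm * final_volume_ml / stock_ppm


# 5.0 mg of compound made up to 50 mL, as described in the procedure.
stock = ppm_from_mass(5.0, 50.0)

# Stock volumes for the three preliminary test concentrations (hypothetical 20 mL tests).
for target in (10.0, 50.0, 100.0):
    volume = stock_volume_ml(target, stock, 20.0)
    print(f"{target:.0f} ppm: {volume:.1f} mL of stock made up to 20 mL")
```

Under these assumptions, 5.0 mg in 50 mL reproduces the stated 100 ppm stock, and the 100 ppm test uses the stock undiluted.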

Evaluation of cytotoxic activity
Macrophages of the RAW 264.7 cell line were cultured in complete Dulbecco's modified Eagle medium (DMEM, 100 mg mL⁻¹ streptomycin, 100 U mL⁻¹ penicillin and 10% fetal calf serum (FCS)) (Cultilab, Campinas, São Paulo, Brazil). Cells were maintained at 37 °C in a 5% CO₂ atmosphere. Macrophages were seeded at 10⁵ cells per well in 96-well plates and incubated for 24 h (37 °C and 5% CO₂). Compounds were then added at six different concentrations (6.25 to 200 mg mL⁻¹). Wells containing only culture medium and cells (without treatment) were used as the negative control. Cells were incubated for 48 h. Then an MTT solution (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide; Sigma, St. Louis, MO, USA) at 5 mg mL⁻¹ in phosphate buffered saline (PBS) was added. The plates were incubated again for 2 h. The remaining culture medium and the unreduced MTT were removed and 100 μL of dimethyl sulfoxide (DMSO) was added to solubilize the formazan. The amount of formazan was determined by measuring the absorbance at 570 nm. The assay was performed in triplicate. The cytotoxic concentration (CC₅₀) was calculated by regression analysis with GraphPad Prism Software 5.0 (San Diego, CA, USA). 27

Statistical analysis
Statistical analysis was performed using non-parametric tests. A simple linear regression was performed to obtain CC₅₀.
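As a minimal sketch of how a CC₅₀ can be read from a simple linear regression, the code below fits cell viability against concentration by ordinary least squares and solves the fitted line for 50% viability. The six concentrations follow the two-fold series from 6.25 to 200 mentioned in the cytotoxicity assay; the viability percentages are invented for illustration, and the function names are hypothetical. It does not reproduce the GraphPad Prism analysis.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cc50_linear(concentrations, viability_percent):
    """Concentration at which the fitted line crosses 50% viability."""
    slope, intercept = fit_line(concentrations, viability_percent)
    if slope == 0:
        raise ValueError("viability does not change with concentration")
    return (50.0 - intercept) / slope


# Two-fold dilution series; viability values are hypothetical.
conc = [6.25, 12.5, 25.0, 50.0, 100.0, 200.0]
viability = [97.5, 95.0, 90.0, 80.0, 60.0, 20.0]

print(f"CC50 estimate: {cc50_linear(conc, viability):.1f}")
```

Real viability curves are often sigmoidal, so a straight-line fit is only reasonable over the roughly linear part of the response.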

Molecular descriptors
Initially, the SMILES (simplified molecular-input line-entry system) codes of the molecules were obtained and collected in a single file in .smi format. This file was loaded into the Standardizer software, ChemAxon, 28 to canonicalize the structures, add hydrogens, convert aromatic forms and clean the molecular graph in three dimensions. The 3D cleaning uses a divide-and-conquer approach: the structure is split into small fragments, which are organized into a tree using connectivity information. Conformers generated for the initial structure (represented by the root node of the tree) are optimized. The tree building process uses a proprietary extended version of the Dreiding force field. 29 Finally, the compounds were saved in SDF format, which was used as input for two software programs that generated the molecular descriptors. The first program was VolSurf+ v. 1.0.7, 30,31 which processed the file containing the 3D structures and calculated molecular descriptors by subjecting the molecules to molecular interaction fields (MIFs) with four different probes: amide nitrogen (N1), carbonyl oxygen (O), hydrophobic probe (DRY) and water probe (H₂O). Some molecular descriptors not derived from MIFs (non-MIF) were also calculated, giving a total of 128 descriptors; examples include hydrophilic-lipophilic balance, capacity factors, molecular size, hydrophobic and hydrophilic regions, amphiphilic moments and interaction energy moments. 31 These descriptors were chosen for the present study because they are easy to use, understand and interpret.
The second program was Dragon v. 7.0.10, 32 which calculated a total of 5255 molecular descriptors across 29 descriptor blocks. After calculation, the data were pre-treated: within each descriptor block, descriptors with constant values were excluded, as were descriptors that were highly correlated with one another (r > 0.95).
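The descriptor pre-treatment described here can be sketched as follows. This is an assumption-laden illustration, not Dragon's own procedure: it drops constant columns and then greedily discards any descriptor whose absolute Pearson correlation with an already retained descriptor reaches the threshold. The descriptor names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_descriptors(table, r_max=0.95):
    """Drop constant descriptors, then any descriptor with |r| >= r_max to a kept one.

    `table` maps a descriptor name to its list of values (one per compound).
    The greedy, order-dependent selection is an assumption of this sketch.
    """
    kept = {}
    for name, values in table.items():
        if max(values) == min(values):
            continue  # constant descriptor carries no information
        if any(abs(pearson(values, other)) >= r_max for other in kept.values()):
            continue  # redundant with a descriptor already retained
        kept[name] = values
    return kept


# Hypothetical descriptor table for four compounds.
descriptors = {
    "desc_a": [1.0, 2.0, 3.0, 4.0],
    "desc_b": [2.0, 4.0, 6.0, 8.1],   # almost perfectly correlated with desc_a
    "desc_c": [5.0, 5.0, 5.0, 5.0],   # constant
    "desc_d": [1.0, -1.0, 1.0, -1.0],
}

print(list(filter_descriptors(descriptors)))
```

Because the selection is greedy, which member of a correlated pair survives depends on the column order; other tie-breaking rules are equally possible.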

Model generation
The QSAR models were developed using the partial least squares (PLS) method with the VolSurf+ software. 30,31 PLS is based on linear regression; it extracts and rationalizes multivariate information to explain the maximum correlation between the descriptors (matrix X) and matrix Y, and then calculates a new set of mutually uncorrelated, orthogonal variables, the latent variables (LVs). The method is appropriate whenever the number of variables is greater than the number of samples and there is multicollinearity among the independent variables. 33,34 Autoscaling was applied to all independent variables as pre-processing, by subtracting the mean from the values of each variable and then dividing the result by the standard deviation. A variable influence on projection (VIP) plot was used to select the original variables. The VIP parameter condenses the importance of the variables in the PLS model, quantifies this importance through the coefficient values and indicates the contribution of each descriptor to the model. 35 Model performance was estimated by the explained variance, the coefficient of determination (r²), and by the leave-one-out cross-validated coefficient of determination (Q²cv). The model and the ideal number of LVs, that is, orthogonal linear combinations of the original variables, were determined by the highest value of Q²cv. 36,37 The MobyDigs software 38 was also used to generate regression models with a genetic algorithm; these models for the activity were internally validated using leave-one-out Q²cv.
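The autoscaling step and the leave-one-out Q²cv statistic can be illustrated with a small sketch. To stay self-contained, a one-descriptor least-squares line stands in for the PLS and MLR models, Q²cv is taken as 1 - PRESS/SS_tot with SS_tot computed around the full-data mean, and the data are hypothetical; none of this reproduces the VolSurf+ or MobyDigs implementations.

```python
from statistics import mean, stdev


def autoscale(column):
    """Subtract the mean and divide by the sample standard deviation (n - 1)."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Leave-one-out Q2: refit without each sample, then predict that sample."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    y_mean = mean(y)
    ss_tot = sum((value - y_mean) ** 2 for value in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities for six compounds.
descriptor = autoscale([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
activity = [3.8, 4.0, 4.3, 4.4, 4.7, 4.9]

print(f"Q2 (leave-one-out) = {q2_loo(descriptor, activity):.3f}")
```

In a multi-descriptor PLS model the same loop would also be repeated for each candidate number of LVs, keeping the number that gives the highest Q²cv.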

Insecticidal activities
Insecticide resistance is the main problem in controlling mosquito populations worldwide, principally for A. aegypti. In view of this, pyrimidine derivatives with potential larvicidal activity were investigated by measuring larval mortality in their presence. The 27 compounds 3a-r, 5a-f and 6g-i screened for larvicidal activity against fourth-instar larvae of A. aegypti are listed in Table 1. Considerable variation was observed in the susceptibility of the larvae to the different pyrimidines, which made it possible to identify the most promising agents, with impressive larvicidal activity. Thus, 17 compounds (3e-l, 3n, 3q,r, 5a, 5e,f) of the 27 pyrimidines showed ≥ 50% mortality of the mosquito larvae at a concentration of 100 ppm; the other 10 compounds (3a-d, 3m, 3o,p and 5b-d) achieved ≤ 50% larval mortality at 100 ppm.
Among all the synthesized compounds, 3l and 5a showed the highest mortality results (56.7 ± 15.4 and 65.7 ± 15.9, respectively) and reached the standard value at 100 ppm concentration at 72 h of exposure; the other results were higher than the standard value at 100 ppm concentration. Impressive mortality was observed for 3c and 3d, with values of 5.4 ± 1.8 and 4.4 ± 0.5, respectively. The mortality rates were considered at 100 ppm concentration at 72 h.
In parallel with the evaluation of larvicidal activity, the toxicity of the compounds toward mammalian cells was determined using the murine macrophage line RAW 264.7. The pyrimidines varied in activity. Nine of the 27 pyrimidines in Table 1 showed lower cytotoxicity, with high CC₅₀ values, above 100 mM: 3a, 3b, 3f, 3j, 3k, 3m, 3o, 5a and 5d. Another ten showed higher toxicity, with CC₅₀ values below 50 mM: 3d, 3e, 3g, 3l, 3n, 3p, 3q, 5c and 6i. Combining the results of the larvicidal and mammalian cell toxicity evaluations, the pyrimidines with the best results, i.e., low toxicity in macrophages and a high mortality rate of A. aegypti larvae at 100 ppm, were 3f, 3j, 3k and 5a.

QSAR models
In order to minimize errors in QSAR studies, Kubinyi 39 recommends that the biological activity values be standardized (inhibitory concentration, IC₅₀; LC₅₀; lethal dose, LD₁₀₀; etc.) and that the activity values of the most active and least active compounds differ by at least one log unit. Here, the difference between the most active compound, 3d (pLC₅₀ = 4.9), and the least active compound, 5a (pLC₅₀ = 3.75), is 1.15 log units, in accordance with this basic principle of QSAR studies.
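The activity-range check can be written out explicitly. The sketch below uses only the two pLC₅₀ values quoted in the text; the function name and the dictionary layout are illustrative assumptions.

```python
def activity_span(p_activities):
    """Difference, in log units, between the highest and lowest activity values."""
    values = list(p_activities.values())
    return max(values) - min(values)


# pLC50 values quoted in the text for the most and least active compounds.
plc50 = {"3d": 4.90, "5a": 3.75}

span = activity_span(plc50)
print(f"span = {span:.2f} log units; at least one log unit: {span >= 1.0}")
```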
Regression analysis of the training set generated equation 1, where n is the number of samples, r² is the coefficient of determination, s is the mean square error, F is the Fisher statistic, Q² is the cross-validated r², and SDEP is the standard deviation error of prediction.
An analysis of Table 2 and Figure 1 indicates that each value represents the fit, relative to a line, of the points used for the calibration of the model. Equation 1, described above, shows that the internal (leave-one-out) prediction coefficient, Q²cv, is high (0.920), indicating a robust model. The value of F (75.94) is highly significant at the 95% level with 2 and 9 degrees of freedom, for which the minimum required value is 4.26. One of the descriptors selected in equation 1, Mor04u, is a 3D-MoRSE descriptor (3D-molecule representation of structures based on electron diffraction), which derives information from the three-dimensional (3D) atomic coordinates through the transformation used in electron diffraction studies. The descriptor TDB05v is a 3D topological distance-based descriptor, built on the Moreau-Broto two-dimensional (2D) autocorrelation, which describes how a given property is distributed over the topological structure of a molecule and also encodes information about the spatial separation between two atoms. 40 The larvicidal activity is closely related to the three-dimensional
The t1-t2 score and loading plots of the PLS model with two LVs can be seen in Figure 3, which shows a separation between the most active compounds (black points) and the less active ones (white points). The white points on the left of the plot correspond to chemical structures with lower experimental activity (pLC₅₀), while the black points on the right depict the chemical space of the molecules with the highest pLC₅₀ values.
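To make the Moreau-Broto autocorrelation mentioned for TDB05v more concrete, the sketch below sums the products of atomic weights over all atom pairs separated by a given number of bonds. It illustrates only the 2D autocorrelation idea, not the TDB descriptor itself, which also uses 3D geometry; the four-atom chain, the weights (standing in for an atomic property such as volume) and the function names are hypothetical.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds on the shortest path from `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return distance


def moreau_broto(adjacency, weights, lag):
    """Sum of w_i * w_j over atom pairs (i < j) exactly `lag` bonds apart."""
    total = 0.0
    for i in sorted(adjacency):
        distance = topological_distances(adjacency, i)
        for j in sorted(adjacency):
            if i < j and distance.get(j) == lag:
                total += weights[i] * weights[j]
    return total


# Hypothetical linear chain of four atoms: 0-1-2-3, with made-up atomic weights.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

for lag in (1, 2, 3):
    print(f"lag {lag}: {moreau_broto(chain, weights, lag)}")
```

A lag of 5, as the "05" in the descriptor name suggests, would consider only atom pairs five bonds apart.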
The loading plot (Figure 3b) shows the VolSurf+ descriptors that contributed most to the PLS model. The HSA (hydrophobic surface area) and the logarithm of the partition coefficient between cyclohexane and water (logPc-hex) contribute positively to the pLC₅₀ values. These descriptors are related to the hydrophobic character of the molecules. The HSA is computed through the hydrophobic region of the molecule and the logPc-hex. 41,42 The physicochemical characteristics described by the HSA and logPc-hex descriptors are important in the discovery of new drugs, since they give an early indication of pharmacokinetic properties, revealing molecules with a greater chance of being transported through the circulatory system to the target tissues. 41,43 Since the LC₅₀ values were obtained from tests against mosquito larvae, these properties are related to potential toxicity. On the other hand, the descriptors PSA (polar surface area), PSAR (ratio between the PSA and the total surface area) and PHSAR (ratio between the PSA and the HSA) contribute negatively to the pLC₅₀ values; therefore, the more polar compounds are less active. Figure 4 shows the lipophilic and hydrophilic regions of the higher-activity compound, 3d, and the lower-activity compound, 3l. Compound 3d has a larger lipophilic area and a smaller hydrophilic surface area than molecule 3l, which justifies its better activity.
Given these results and an analysis of the most active and least active compounds, we can observe the influence of the substituents at positions R₁, R₂ and R₃. The PLS model indicates that molecules with higher lipophilicity (logPc-hex) and HSA have higher activities, whereas those with larger PSA, PSAR and PHSAR have lower biological activity. We can therefore analyze the substituents at R₁, R₂ and R₃ to understand the relationship between chemical structure and biological activity.
Looking first at positions R₂ and R₃ when substituted by the nitro group (NO₂), we see that when the group is located at R₃, the para position, there is a slight improvement in activity; compare, for example, molecules 3d and 3c, whose only difference is precisely the position of the nitro group. This slight improvement is explained by the decrease in PSAR and PHSAR when nitro is in the R₂ meta position (Table 3). The same occurs when the substituent at R₂ or R₃ is succinimide (molecules 5c and 5d), where substitution at R₂, the meta position, is favored. Turning to position R₁, which bears OCH₃, CH₃, H or NO₂, we see that a methoxy group (OCH₃) gives a better activity profile than a methyl group (molecules 3c and 3o, for example). With the methyl group there is a considerable decrease in PSA, PSAR and PHSAR (Table 3), and the activity decreases, pLC₅₀ from 4.81 for
Comparing now the methoxy and nitro groups (molecules 3c and 3l), the methoxy group confers higher lipophilicity owing to the presence of the methyl, and the hydrophobic surface area also increases, which explains why molecule 3c is more active than molecule 3l, one of the least active compounds (pLC₅₀ = 3.81). Similar behavior is seen when hydrogen at R₁ is compared with methoxy (molecules 5c and 5a): the increase in logPc-hex and HSA (Table 3) is greater with methoxy, so the molecule has better activity.
An important fact that should also be considered in the analysis is that the vector sum of the individual bond dipole moments changes the polar and nonpolar profile of the molecule, and that certain geometries make the resulting dipole moment null, so that the molecule is nonpolar. [43][44][45][46][47][48][49]
A consensus model was obtained by averaging the values of the two models, MLR with Dragon descriptors and PLS with VolSurf+ descriptors (Table 2). The predicted activity was estimated as the average of the pLC₅₀ values predicted by the two QSAR methods. This procedure usually provides better prediction accuracy than most individual models, since errant predictions are dampened by the predictions of the other methods. The MLR model performs better for all samples except compounds 3l and 5c. The consensus model shows almost the same performance as the MLR model and better performance than the PLS model. Finally, the consensus model shows the following parameters: Q²cv = 0.879 and standard deviation of errors of prediction (SDEP) = 0.130.
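The consensus step amounts to averaging two predictions per compound and scoring the result. In the sketch below, the observed pLC₅₀ values are the four quoted in the text (3d, 3c, 3l and 5a), while both sets of model predictions are invented for illustration; SDEP is taken as the square root of the mean squared prediction error, which is an assumption about the definition used.

```python
from math import sqrt


def consensus(predictions_a, predictions_b):
    """Average the predictions of two models, compound by compound."""
    return [(a + b) / 2.0 for a, b in zip(predictions_a, predictions_b)]


def sdep(observed, predicted):
    """Standard deviation of errors of prediction: sqrt(sum of squared errors / n)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Observed pLC50 values quoted in the text (3d, 3c, 3l, 5a).
observed = [4.90, 4.81, 3.81, 3.75]
# Hypothetical predictions from two models; not the published values.
mlr_pred = [4.80, 4.90, 3.90, 3.70]
pls_pred = [5.10, 4.70, 3.60, 3.90]

cons_pred = consensus(mlr_pred, pls_pred)
for label, pred in (("MLR", mlr_pred), ("PLS", pls_pred), ("consensus", cons_pred)):
    print(f"{label}: SDEP = {sdep(observed, pred):.3f}")
```

In this made-up example the two models err in opposite directions, so averaging dampens the errors; when both models err in the same direction, the consensus gains little.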
Although descriptors related to the electronic effect of the substituent groups were not selected, we also observed an electronic pattern relating the chemical structure of the compounds to their biological activity. This electronic influence is perceived specifically at position R₁, where the substituent groups vary.
When R₁ bears a ring-deactivating group, activity decreases, as in the case of the nitro group, a strong ring deactivator, in molecule 3l, which has one of the lowest biological activities. When R₁ bears an electron-donating group that activates the aromatic ring, biological activity increases.
We can compare, for example, molecules 3l, 3o, 3c and 3j: 3l, as already mentioned, has a group that deactivates the aromatic ring, whereas molecules 3o, 3c and 3j have ring-activating groups. Molecule 3c has a methoxy at R₁, a moderate activator, leading to good biological activity; 3o has a methyl, a weak activator of the aromatic ring, and its biological activity is lower than that of 3c; and molecule 3j has an amine at R₁, a strong ring activator, which however greatly increases the molecule's hydrophilicity and therefore leaves it inactive. Thus, we conclude that an activating substituent at R₁ is better for the activity, but this electron-donating group must have suitable hydrophilic character so that the molecule does not acquire a large polar surface area.

Conclusions
Compounds 3c and 3d showed excellent activity as larvicides. Combining larvicidal activity and mammalian cell toxicity, the pyrimidines with the best results, i.e., low toxicity in macrophages and a high mortality rate of A. aegypti larvae at 100 ppm, were 3f, 3j, 3k and 5a. The QSAR models highlighted physicochemical properties related to pharmacokinetic behavior, such as hydrophobicity, when the compounds were tested against mosquito larvae. A simple pattern of substituent groups, such as methyl and methoxy at R₁ and a succinimide at R₃, is responsible for the increase in larvicidal activity of the pyrimidine derivatives. This information can be used in further studies.

Supplementary Information
Supplementary data are available free of charge at http://jbcs.sbq.org.br as a PDF file.