Geometry optimization method versus predictive ability in QSPR modeling for ionic liquids

Rybinska, Anna; Sosnowska, Anita; Barycki, Maciej; Puzyn, Tomasz

doi:10.1007/s10822-016-9894-3

Published: 01 February 2016

Volume 30, pages 165–176, (2016)

Journal of Computer-Aided Molecular Design

Anna Rybinska¹,
Anita Sosnowska¹,
Maciej Barycki¹ &
…
Tomasz Puzyn¹

Abstract

Computational techniques such as Quantitative Structure-Property Relationship (QSPR) modeling are very useful for predicting the physicochemical properties of chemicals. Building QSPR models requires calculating molecular descriptors and choosing a geometry optimization method suited to the structures of the studied compounds. Herein, we examine the influence of the geometry optimization method applied to ionic liquids (ILs) on the predictive ability of QSPR models by comparing three models. The models were developed from the same experimental density data collected for 66 ionic liquids, but with molecular descriptors calculated from geometries optimized at three different levels of theory: (1) semi-empirical (PM7), (2) ab initio (HF/6-311+G*) and (3) density functional theory (B3LYP/6-311+G*). The model whose descriptors were calculated from the ab initio HF/6-311+G* geometries showed the best predictive capability (${\text{Q}}_{\text{EXT}}^{2}$ = 0.87). However, the PM7-based model has comparable quality parameters (${\text{Q}}_{\text{EXT}}^{2}$ = 0.84). The results indicate that semi-empirical methods (faster and less expensive in terms of CPU time) can be successfully employed for geometry optimization in QSPR studies of ionic liquids.

Background

Ionic liquids (ILs) form an interesting group of chemical compounds. They can be built of various types of cations (imidazolium, pyridinium, pyrrolidinium, piperidinium, ammonium etc.) and anions (halides, bis[(trifluoromethyl)sulfonyl]imide, tetrafluoroborate, hexafluorophosphate etc.) [1–3]. Because each ion can be modified almost without limit, the properties of an IL can be tailored to specific needs. ILs share several characteristic properties: they remain liquid over a wide temperature range, they have negligible vapor pressure and a melting point below 100 °C, and they are good solvents for a variety of compounds [1, 3, 4]. These properties, together with the possibility of modifying the structure, make ILs applicable in many areas; examples are shown in Fig. 1 [4–14].

Such a wide spectrum of potential applications of ILs, particularly as solvents, requires the determination of their technologically important physicochemical properties, such as viscosity, density, water solubility and melting point. Experimental measurements of physicochemical properties for such a large group of chemicals are expensive and time-consuming. Computational techniques can therefore be applied to fill the gaps in the experimental data. One widely used computational method is the Quantitative Structure-Property/Activity Relationship (QSPR/QSAR) approach [15–19]. QSPR/QSAR is nowadays used extensively for various purposes, e.g. predicting toxicity to different species [20, 21], identifying potentially hazardous metabolites [22, 23] and determining the mobility of chemicals in the environment [24]. Building QSPR models requires calculating molecular descriptors, which are “formal mathematical representations of a molecule” [24]. For 3D and 4D descriptors, which are obtained from the three-dimensional molecular structure, this step involves optimizing the geometry of the structure (finding its global energy minimum). Geometry optimization may be performed at different levels of theory, such as ab initio methods (e.g. Hartree–Fock), density functional theory (DFT) and semi-empirical methods (e.g. PM6, PM7) [25, 26]. The first two are based only on theoretical assumptions. In contrast, semi-empirical methods combine quantum mechanical theory with approximations (parameters) fitted to empirical data, especially molecular energies and geometries. The main advantage of semi-empirical techniques is their short calculation time; however, they are considered less accurate [27].

To the best of our knowledge, no recommendations on the choice of geometry optimization method for ionic liquids have been published yet. Published QSPR/QSAR models for ILs have been developed on descriptors calculated from molecular geometries optimized with semi-empirical (AM1 [28], PM3 [29]), ab initio (RHF/6-31G**) [30] and DFT (B3LYP/6-31G*) [31] methods. Since calculating the descriptors with different methods may introduce additional uncertainty between QSPR models, in our work we focused on examining the influence of the geometry optimization method on the QSPR model. In order to determine the scale of the uncertainties originating from the choice of optimization method, we conducted a systematic comparison of (I) three different sets of molecular descriptors, obtained from differently optimized structures, and (II) three QSPR models built on the basis of those descriptor sets. The main goal of this paper was to answer the following questions: (I) Which basis set should be used to optimize structure geometry in quantum-mechanical methods? (II) Is there any significant difference between models whose descriptor values were obtained from structures optimized with different methods (PM7, HF and B3LYP)? (III) Is there an optimal approach that should be recommended to obtain reliable QSPR models for predicting the properties of ILs?

Materials and methods

Molecular models of the ions present in the studied ILs were optimized separately by three different methods (PM7, HF/6-311+G* and B3LYP/6-311+G*). Then, we performed a series of Wilcoxon tests in order to examine the influence of the optimization method on the descriptor values. Based on the results of the statistical analysis, we were able to determine whether there were significant differences in the calculated descriptor values.

Subsequently, we developed three QSPR models for predicting the density of ILs, utilizing the same descriptors calculated from the molecular geometries optimized by the PM7, HF and B3LYP methods. This was done to investigate the influence of the optimization method on the predictive ability of the models. The models were developed from the same experimental data set, according to the following scheme: (1) splitting the experimental data into training and validation sets, (2) calculating molecular descriptors, (3) selecting the optimal, physically interpretable combination of descriptors and then training the QSPR model, (4) external validation of the model, (5) providing the physical interpretation of the model.

Experimental data

Density (ρ) values for 66 ionic liquids were collected from the NIST Ionic Liquids Database (IL Thermo) [32]. Density was measured by the vibrating tube method at 298.15 K for all ILs. The studied ILs consisted of six types of cations (imidazolium, ammonium, phosphonium, pyridinium, pyrrolidinium, piperidinium) and 27 anions (for more information please refer to Table A in Supplementary Information). A schematic representation of the cation structures is presented in Fig. 2.

Geometry optimization and calculation of descriptors

In the next step, we created molecular models of all cations and anions constituting the studied ILs. This stage was performed with the GaussView [33] software. Molecular geometries of the cations and anions (separately) were then optimized at three levels of theory. Geometry optimization at the semi-empirical PM7 level was performed with the MOPAC 2012 software [34]. The Hartree–Fock method was chosen as the basic ab initio algorithm. Many different functionals are widely used in DFT calculations. Since the hybrid functional B3LYP (Becke three-parameter exchange with Lee, Yang, Parr correlation) is the one most commonly used in QSPR studies [35–38], we decided to employ this functional in our calculations. Both ab initio and DFT calculations require an adequate basis set. We therefore compared Pople-style contracted basis sets of different sizes, including double and triple zeta types augmented with polarization and diffuse functions. We took into account the energy of the structure (after optimization), the computing time, and the difference in atomic distances and angles between the computationally optimized structures and the experimental (crystallographic) data. The comparison was conducted for the anion most frequently present in the studied set of ILs, bis[(trifluoromethyl)sulfonyl]imide [NTf₂], and for a cation of medium structural complexity, 1-ethyl-3-methylimidazolium [EMIM] (for more information please refer to Table B in Supplementary Information). Based on the comparison results, we decided to use the triple zeta basis set augmented with one set of diffuse and polarization functions on heavy atoms (6-311+G*). Both the ab initio (Hartree–Fock) and the DFT (B3LYP) calculations were performed with this identical basis set in the Gaussian 09 software [39]. Afterwards, we calculated molecular descriptors from the geometries optimized by the three methods. The descriptors were calculated with the Dragon (version 6.0) software [40].

Model development

First, the collected experimental data were sorted by increasing IL density (the modeled variable). Then, we split the data into training and validation sets using the so-called “Z:1 algorithm”, in which every Zth compound is assigned to the validation set, whereas the remaining ones form the training set [41]. Applying the splitting procedure (here Z = 3) yielded a training set containing 45 ionic liquids (68 % of all) and a validation set containing 21 compounds (32 %). A table summarizing the collected data can be found in Supplementary Information.
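The splitting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_to_one_split`, the 1-based counting that starts at the lowest density, and the toy records are all hypothetical, and the text does not say where counting starts or whether any compounds were reassigned by hand. Applied to 66 sorted compounds with Z = 3, this plain rule gives 44 training and 22 validation compounds, not the 45 and 21 reported, so the authors' exact assignment evidently differs in some detail.

```python
def z_to_one_split(records, z=3, key=lambda rec: rec[1]):
    """Sort records by the modeled value and send every z-th one to validation."""
    ordered = sorted(records, key=key)
    training, validation = [], []
    for position, record in enumerate(ordered, start=1):
        # Assumption: positions z, 2z, 3z, ... (1-based) form the validation set.
        if position % z == 0:
            validation.append(record)
        else:
            training.append(record)
    return training, validation


# Hypothetical (name, density) pairs; the values are made up for illustration.
toy_data = [(f"IL{i:02d}", 1.0 + 0.01 * ((7 * i) % 13)) for i in range(1, 13)]
training_set, validation_set = z_to_one_split(toy_data, z=3)
print(len(training_set), len(validation_set))
```

Because the data are sorted before splitting, the validation compounds are spread evenly across the density range of the training set.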

The search for the optimal descriptor combination was carried out in two steps: first, by selecting 3D descriptors significantly correlated with IL density (r > 0.60); second, by applying the genetic algorithm implemented in the QSARINS software [42, 43]. The following control parameters of the genetic algorithm were applied: population size 20 and mutation rate 45 %. The algorithm was used only for the descriptors calculated from the Hartree–Fock optimized geometries, since the ab initio Hartree–Fock method constitutes a middle ground of sorts between DFT B3LYP and the semi-empirical PM7 method. We employed multiple linear regression (MLR) as the modeling method.
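The first selection step, the correlation pre-filter, can be illustrated as follows. This is a minimal sketch and not the authors' workflow: the descriptor names and values are invented, and the comparison uses the absolute value of Pearson's r, which is an assumption (the text states only "r > 0.60"). The genetic algorithm step carried out in QSARINS is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def prefilter(descriptors, target, threshold=0.60):
    """Keep descriptor names whose |r| with the target exceeds the threshold."""
    return [name for name, values in descriptors.items()
            if abs(pearson_r(values, target)) > threshold]


# Hypothetical densities and descriptor columns, for illustration only.
density = [1.10, 1.20, 1.30, 1.40, 1.50]
toy_descriptors = {
    "d_linear": [2.0, 4.1, 5.9, 8.0, 10.1],
    "d_inverse": [5.0, 4.2, 2.9, 2.1, 1.0],
    "d_noise": [1.0, -1.0, 0.0, -1.0, 1.0],
    "d_constant": [3.0, 3.0, 3.0, 3.0, 3.0],
}
selected = prefilter(toy_descriptors, density)
print(selected)
```

Pre-filtering in this way shrinks the descriptor pool before the combinatorial search, which keeps the genetic algorithm from spending effort on variables unrelated to density.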

Validation process

To ensure the reliability of our model, we followed the OECD (Organization for Economic Cooperation and Development) recommendations for developing QSPR models proposed in 2004 [44]. According to those principles, we properly defined the endpoint; ensured transparency of the model algorithm; defined the applicability domain (AD); calculated measures of goodness-of-fit, robustness and predictivity; and presented a mechanistic interpretation.

The main purpose of defining the applicability domain (AD) is to point out possible limitations of the developed model. The reliability of the predicted values depends on the structural similarity between the chemicals from the prediction and training sets [45]. Several methods are used to define the limits of the AD [46, 47]. In our work, the AD was investigated with the Williams plot (a plot of the standardized residuals vs. the leverage values, h_i). The leverage values (h_i) express the similarity of particular compounds to the training set and can be calculated from Eq. (1):

$$h_{i} = {\mathbf{x}}_{i}^{{\text{T}}} ({\mathbf{X}}^{{\text{T}}} {\mathbf{X}})^{{ - 1}} {\mathbf{x}}_{i}$$

(1)

where x_i is the vector of descriptors calculated for the considered ith compound and X is the matrix of descriptors calculated for all compounds from the training set.

The applicability domain is then determined by two critical values: three standard deviation units of the standardized residuals (±3σ) and the threshold leverage value (h*). The value of h* is calculated as h* = 3p′/n, where p′ is the number of model variables plus one and n is the number of compounds in the training set. Predictions for compounds with h_i > h* are treated as extrapolations and are therefore less reliable [48–50].
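Eq. (1) and the threshold h* can be sketched without external libraries, as below. The helper names are hypothetical, the toy descriptor rows are invented, and prepending an intercept column of ones to X is an assumption that the text does not state explicitly. For the two-descriptor model and the 45-compound training set reported in this paper, p′ = 3 and h* = 3 · 3/45 = 0.2.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; descriptors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(train_rows, query_rows):
    """h_i = x_i^T (X^T X)^-1 x_i, with X built from the training set only."""
    x = [[1.0] + list(r) for r in train_rows]  # assumption: intercept column
    xtx_inv = invert(matmul(transpose(x), x))
    result = []
    for r in query_rows:
        xi = [1.0] + list(r)
        size = len(xi)
        row_vec = [sum(xi[k] * xtx_inv[k][j] for k in range(size))
                   for j in range(size)]
        result.append(sum(t * v for t, v in zip(row_vec, xi)))
    return result


def critical_leverage(n_descriptors, n_train):
    """h* = 3p'/n, where p' is the number of model variables plus one."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical two-descriptor training rows; the last one is deliberately remote.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (4.0, 4.0)]
h_train = leverages(train, train)
print([round(h, 3) for h in h_train])
print(critical_leverage(2, 45))
```

With an intercept column, the training-set leverages sum to p′, which gives a quick sanity check on any implementation.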

Furthermore, the leverage approach was compared with the standardization method proposed by Roy [51]. In that approach, training set compounds with features very dissimilar to the rest of the training set are called “X-outliers”, and validation set compounds that are not similar to any training set compound are considered to lie outside the applicability domain. We identified the X-outliers and the compounds outside the AD in our dataset with the application named “Applicability domain using standardization approach”.

The goodness-of-fit of our model was measured by the determination coefficient R² (Table 1) and the root mean squared error of calibration (RMSE_C). To verify the robustness and predictivity of the model, we performed internal (leave-one-out method, LOO) and external validation. Robustness (${\text{Q}}_{\text{CV}}^{2}$, RMSE_CV) and prediction (${\text{Q}}_{\text{EXT}}^{2}$, RMSE_EXT) parameters were calculated according to the equations given in Table 1 [49, 52]. In addition, we calculated the concordance correlation coefficient (CCC) [53] to measure the precision and accuracy of the model, and different variants of ${\text{r}}_{\text{m}}^{2}$ to indicate its external predictive capacity [54]. We also checked for influential points in the training set by performing the F-test proposed by Toth et al. [55], where the F value is calculated as ${\text{F}} = \left( {1 - {\text{Q}}_{\text{CV}}^{2} } \right)/(1 - {\text{R}}^{2} )$.
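Since the equations of Table 1 are not reproduced in this text, the following sketch uses commonly used forms of these measures, which should be read as assumptions: the external Q² is written in the form that references the training-set mean, and CCC follows Lin's concordance correlation coefficient. Only the F ratio is taken directly from the formula given above. All function names and numbers are hypothetical. The leave-one-out Q² needs one model refit per compound and is omitted.

```python
import math


def r_squared(y_obs, y_fit):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def q2_ext(y_ext_obs, y_ext_pred, y_train_mean):
    """External Q2; assumption: deviations are referenced to the training mean."""
    press = sum((o - p) ** 2 for o, p in zip(y_ext_obs, y_ext_pred))
    ss_ref = sum((o - y_train_mean) ** 2 for o in y_ext_obs)
    return 1.0 - press / ss_ref


def ccc(y_obs, y_pred):
    """Lin's concordance correlation coefficient."""
    n = len(y_obs)
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    s_oo = sum((o - mean_o) ** 2 for o in y_obs)
    s_pp = sum((p - mean_p) ** 2 for p in y_pred)
    return 2.0 * s_op / (s_oo + s_pp + n * (mean_o - mean_p) ** 2)


def influence_f(q2_cv, r2):
    """F = (1 - Q2_CV) / (1 - R2), as given in the text."""
    return (1.0 - q2_cv) / (1.0 - r2)


# Hypothetical observed and predicted densities, for illustration only.
y_obs = [1.20, 1.30, 1.40, 1.50]
y_pred = [1.22, 1.29, 1.41, 1.48]
print(round(r_squared(y_obs, y_pred), 3), round(rmse(y_obs, y_pred), 4))
print(round(q2_ext(y_obs, y_pred, 1.33), 3), round(ccc(y_obs, y_pred), 3))
```

CCC penalizes both scatter and systematic shift, so a model whose predictions are well correlated but offset from the observations scores below 1 even when its correlation is perfect.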

Table 1 Quality measures for QSAR models

Results and discussion

Optimization method versus descriptor values

To obtain deeper insight into the optimization results, we tested the influence of the three methods (PM7, HF/6-311+G* and B3LYP/6-311+G*) on the values of molecular descriptors. From the entire set of ILs presented in this paper, we selected 29 unique cations and 27 anions (Table C in Supplementary Information). Then, from the entire set of calculated molecular descriptors, we chose only the groups of 3D descriptors (those that might be affected by the 3D structure of the molecule). These were: Geometrical Descriptors, Radial Distribution Function descriptors (RDF), 3D-MOlecule Representation of Structures based on Electron diffraction (3D-MoRSE), Weighted Holistic Invariant Molecular descriptors (WHIM), GEometry, Topology and Atom-Weights AssemblY descriptors (GETAWAY), and Randic Molecular Profiles. Next, we divided them into smaller sub-groups according to their weighting scheme or descriptor type (for example Molecular Randic Molecular Profiles or Shape Randic Molecular Profiles; see Figure 1S in Supplementary Information). Finally, we performed a series of Wilcoxon tests (at the 5 % significance level), comparing the descriptors of each cation and each anion with the descriptors of the same ion optimized with a different method. For each ion, the number of tests performed was equal to the number of descriptor sub-groups.
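One such paired comparison can be sketched as a Wilcoxon signed-rank test on the descriptor values of a single ion obtained from two optimization methods. The text does not say which software or test variant the authors used, so this is an assumption-laden illustration: it uses the two-sided normal approximation with a tie correction (an exact test is preferable for very small sub-groups), and the descriptor values are invented.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test, normal approximation with tie correction."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # mean of the 1-based ranks in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        group = end - start + 1
        tie_term += group ** 3 - group
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4.0
    var_w = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return w, min(1.0, p_value)


# Hypothetical descriptor values of one ion after two optimization methods.
method_a = [0.1 * i for i in range(1, 13)]
method_b = [x + 0.01 * (i + 1) for i, x in enumerate(method_a)]
w_shift, p_shift = wilcoxon_signed_rank(method_a, method_b)
print(w_shift, round(p_shift, 4), p_shift < 0.05)
```

A p-value below 0.05 for a sub-group would flag it as sensitive to the choice of optimization method for that ion; repeating the test for every ion and sub-group gives the kind of overview summarized in Figure 1S.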

The results (Figure 1S in Supplementary Information) show that some groups of descriptors are especially sensitive to the structure optimization method. For both cations and anions, Geometrical Descriptors have significantly different values depending on the optimization method applied (panels A–F). RDF descriptors are also sensitive to the optimization method, but only for cations (panels A–C, sub-groups 2–6). 3D-MoRSE descriptors constitute the class least affected by the choice of optimization method (with only a few exceptions to this trend). Weighted WHIM descriptors exhibited significantly different values for anions, but only when the structures optimized with the HF method were compared with those from the other two methods. This suggests that similar descriptor values are obtained for anion structures optimized with the PM7 and B3LYP methods. The comparison of weighted WHIM descriptors gave a slightly different result for cations. In this case, the descriptor values for cations optimized with the HF and PM7 methods are more similar, whereas the similarity between the values for cations optimized with the HF and B3LYP methods is much smaller. WHIM total descriptors have different values for every optimization method, for both cations and anions (panels A–F, sub-group 17). For GETAWAY descriptors, there are significant differences in descriptor values for every optimization method, for both cations and anions. Within the pool of GETAWAY descriptors, the smallest differences were noticed for cations optimized with the HF and B3LYP methods (panel C, sub-groups 18–22). Additionally, the sub-classes of unweighted GETAWAYs and GETAWAYs weighted by ionization potential are rather similar for both cations and anions, independently of the optimization methods compared (panels A–F, sub-groups 18 and 22). The group of autocorrelation GETAWAYs seems to be more sensitive to the optimization method (panels A–F, sub-groups 23–27). Finally, Randic Molecular Profiles turned out to be very sensitive to the optimization method for both cations and anions (panels A–F, sub-groups 28–29).

To sum up, our analysis showed that the optimization method can significantly affect the values of most classes of 3D descriptors. To verify the actual influence of the choice of optimization method on QSPR modeling, we performed further analyses.

Optimization method versus predictive ability

As mentioned in the Model development section, the relationship between the structure of the ionic liquids and their density was described by a quantitative model developed with the GA-MLR technique. The developed model is a linear combination of two uncorrelated (r = 0.09) descriptors: the 3D-MoRSE descriptor weighted by mass calculated for the anion (Mor03m^A), and the mean information content on the leverage magnitude calculated for the cation (HIC^C). The model equations obtained for the ions optimized with the three methods (PM7, HF/6-311+G* and B3LYP/6-311+G*) are presented in Table 2.

Table 2 Equations of developed models

The first two models (PM7- and HF-based) are characterized by satisfactory goodness-of-fit, robustness and predictive capability (values of R², ${\text{Q}}_{\text{CV}}^{2}$, ${\text{Q}}_{\text{EXT}}^{2}$ and CCC close to 1 and low values of the errors RMSE_C, RMSE_CV and RMSE_EXT). The last one, developed using descriptors calculated from the ion structures optimized with the B3LYP method, has lower quality parameters. The plots of observed (experimental) versus predicted density values for the three developed models (Fig. 3) are in good agreement with the statistical parameters mentioned above.

Interestingly, the three developed models do not have identical applicability domains (Fig. 4). For the model utilizing descriptors calculated after PM7 optimization, two compounds, namely 1-(2-methoxyethyl)-1-methylpyrrolidinium tris(pentafluoroethyl)trifluorophosphate (#56) and 1-methyl-3-octylimidazolium tris(pentafluoroethyl)trifluorophosphate (#28), exhibit leverage values higher than the critical one. However, their residuals stay within ±3 standard deviations of the mean. Such a point is called a “good high leverage point” or “good influence point”, and it stabilizes the model (predicted data are correctly extrapolated) [56]. For the model based on descriptors obtained after HF optimization, all molecules are located within the space bounded by the ±3σ threshold and the critical leverage value. One IL, namely 1-methyl-3-propylimidazolium chloride, lies on the edge of the AD, so the density predicted for that IL should be treated with greater caution. The model developed with descriptors calculated after B3LYP optimization has two points with leverage values higher than the critical one. The first IL, with a small leverage value and a residual within ±3σ (1-(2-methoxyethyl)-1-methylpyrrolidinium tris(pentafluoroethyl)trifluorophosphate), is a “good influence point”. The second IL, with a high leverage value and a residual lower than −3σ (1-methyl-3-octylimidazolium tris(pentafluoroethyl)trifluorophosphate), is a “bad influence point” that destabilizes the model [56]. Interestingly, 1-methyl-3-octylimidazolium tris(pentafluoroethyl)trifluorophosphate is considered a “good influence point” in the model utilizing descriptors calculated after PM7 optimization. This difference results from the better predictive capability of the “PM7-based” model (its predicted values are closer to the experimental ones).
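The reading of the Williams plot used in this paragraph can be condensed into a small decision rule. The function name, the label "response outlier" for a low-leverage point with a large residual, and the numerical values are hypothetical; only the two thresholds (h* and ±3σ) and the "good"/"bad influence point" terms come from the text.

```python
def classify_point(leverage, std_residual, h_star, residual_limit=3.0):
    """Place one compound in a region of the Williams plot."""
    high_leverage = leverage > h_star
    large_residual = abs(std_residual) > residual_limit
    if high_leverage and large_residual:
        return "bad influence point"
    if high_leverage:
        return "good influence point"
    if large_residual:
        return "response outlier"  # hypothetical label, not used in the text
    return "inside AD"


# Hypothetical (leverage, standardized residual) pairs with h* = 0.2.
h_star = 0.2
examples = {
    "typical compound": (0.05, 0.8),
    "remote but well predicted": (0.31, -1.2),
    "remote and poorly predicted": (0.35, -3.4),
}
labels = {name: classify_point(h, res, h_star) for name, (h, res) in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The same compound can change region from one model to another because its standardized residual depends on the fitted model, even when its leverage stays above h*.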

As mentioned in the Validation process section, the applicability domain was also determined with the application “Applicability domain using standardization approach” (Table D in Supplementary Information). For the “PM7-based” model, the standardization method classified none of the ILs as an outlier; all have similar features. That result is not consistent with the leverage approach, in which two compounds from the training set (#28, #56) were identified as less structurally similar to the others. For the “HF-based” model, both approaches give identical results: none of the compounds was classified as an outlier or as a point outside the AD. For the third, “B3LYP-based” model, the two points mentioned above were recognized as outliers, confirming the results of the leverage approach. However, the visualization of the leverage approach (the Williams plot) gives more precise information. Both points are less structurally similar to the rest of the training set, but only the prediction for one of them (#28) cannot be considered reliable, because its residual lies outside the ±3σ limits.

All three presented models are linear combinations of the same two descriptors: Mor03m^A and HIC^C. The first one belongs to the group of 3D-MoRSE descriptors. This broad group is derived from electron diffraction, and its members can be calculated with various weights (atomic mass, van der Waals volume, Sanderson electronegativity, polarizability). The notation Mor03m indicates that the descriptor used here is weighted by mass; the number 3 is related to the scattering parameter. Generally, weighted descriptors can be used to identify the presence of specific molecular fragments. Weighting by atomic mass increases the effect of heavy atoms on the values of 3D-MoRSE descriptors [57].

For anions with a similar molecular skeleton and an equal number of atoms (e.g. cysteinate and serinate anions), the presence of a sulfur atom significantly decreases the value of the Mor03m descriptor (Fig. 5). Moreover, increasing the molecular mass (i.e., by adding another substituent, e.g. a methyl group in the threonate anion) also decreases the descriptor value.

For the studied set of anions, we found that the heaviest atom in the molecule has a significant impact on the modeled value. When one considers two ILs with the same cation and structurally similar anions, such as hexafluorophosphate and tetrafluoroborate, one can notice that the ionic liquid with ${\text{PF}}_{6}^{ - }$ has a higher density (1.370) than the one with the ${\text{BF}}_{4}^{ - }$ anion (1.202). That difference is caused by the higher atomic mass of the phosphorus atom (30.97 u) in the ${\text{PF}}_{6}^{ - }$ anion. Both the boron and fluorine atoms present in the ${\text{BF}}_{4}^{ - }$ anion have a lower atomic mass than phosphorus.

The second descriptor (HIC^C) belongs to the GETAWAY group. It is defined as:

$${\text{HIC}} = \sum\limits_{\text{i}}^{\text{A}} {\frac{{{\text{h}}_{\text{i}} }}{\text{M}} \cdot \log \frac{{{\text{h}}_{\text{i}} }}{\text{M}}}$$

(3)

where A is the number of atoms, h_i is the leverage of the ith atom and M is a constant equal to 1 for linear, 2 for planar, and 3 for non-planar molecules.

HIC^C can be used to distinguish between the substituents in a series of cations [58]. For example, the value of HIC^C for the 1-methyl-3-methylimidazolium cation is equal to 3.612. Alkyl chain elongation increases the descriptor value (1-ethyl-3-methylimidazolium = 4.015, 1-propyl-3-methylimidazolium = 4.247). Within the group of imidazolium ILs with the same anion ([NTf₂]), the value of the HIC^C descriptor is inversely proportional to the density. The relationship between the density and the descriptor values for imidazolium cations (geometry after PM7 optimization) is shown in Fig. 6.

We also explored the distribution of the unique cations and anions present in the studied ILs (Table E in Supplementary Information) in the space of the Mor03m^A and HIC^C descriptors (Figs. 7, 8). In this way, we were able to relate the optimization method applied to the ionic structure to the values of the descriptors used in the developed QSPR models. Figure 7 shows the distribution of the anions in the space of the Mor03m^A descriptor. Similarly, Fig. 8 shows the distribution of the cations in the space of the HIC^C descriptor.

We noticed that the descriptor values obtained with the three studied geometry optimization methods are similar for almost all anions (Fig. 7). However, three points (#5, #13, #27) differ between the optimization methods. For the bis[(trifluoromethyl)sulfonyl]imide anion (#5), the PM7 and B3LYP methods give almost identical descriptor values, whereas the Mor03m value obtained with HF differs. For the two other anions, hexafluorophosphate (#13) and tris(pentafluoroethyl)trifluorophosphate (#27), the HF and B3LYP methods provided more similar descriptor values than the PM7 method. For example, the value of Mor03m for #13 in the DFT and HF panels is equal to −7.51, while for PM7 it rises to −6.67. In contrast, the effect of the optimization method on the HIC^C descriptor is negligible (Fig. 8): the descriptor values for all cations were similar across the three studied methods.

Recommendations

The most important observation of our study is that there were significant differences between the quality measures of the three developed QSPR models. The best-fitted model (highest value of the determination coefficient, R² = 0.951) was developed from the molecular structures optimized with the HF method. In terms of predictive capability, the same model has the highest value of ${\text{Q}}_{\text{EXT}}^{2}$ (the predicted values are similar to the experimental data). That result indicates that the HF method can be successfully employed in QSPR studies. However, it should be highlighted that the PM7-based model has comparable values of the quality measures as well. The only exception is the model developed with descriptors obtained after B3LYP optimization. Its parameters were significantly different from those of the “HF-based” and “PM7-based” models: the coefficients R², ${\text{Q}}_{\text{CV}}^{2}$ and ${\text{Q}}_{\text{EXT}}^{2}$ were lower than 0.90, and the errors RMSE_C, RMSE_CV and RMSE_EXT were higher than 0.070.

When deciding whether the HF or the PM7 method should be applied for geometry optimization in QSPR studies, two aspects should be taken into consideration. First, both methods yield models with satisfactory statistical quality (Table 2). Second, HF methods require more labor and computational cost than semi-empirical methods. When one considers the computing time and predictive capability of the “HF-based” and “PM7-based” QSPR models, it becomes clear that the semi-empirical method is faster and less expensive while giving comparable results. Therefore, it can be successfully employed for geometry optimization in QSPR modeling. That conclusion is consistent with the earlier studies published by Puzyn [37] and Rinnan [59]. These studies demonstrated that the Hartree–Fock method (HF/6-31G) as well as semi-empirical methods (PM6 and RM1) could be successfully employed in QSPR studies. However, it should be mentioned that the obtained results might depend on the dataset (e.g. number and type of compounds) and on the modeled value. Studies published by Roy [60] and Kar [61] showed that geometry optimization performed at higher levels of theory (MP2, which includes electron correlation) leads to considerably higher quality of the developed model. Based on the current results of our study, we can conclude that the PM7 method can be satisfactorily used as an optimization approach for ionic liquids. It should be remembered that semi-empirical methods produce correct results only for structures that are sufficiently similar to those used to parameterize the particular semi-empirical approach [37].

Conclusions

In the presented work, we compared three methods of geometry optimization: semi-empirical PM7, ab initio Hartree–Fock (6-311+G*) and DFT B3LYP (6-311+G*). We asked the question: how much does the choice of the optimization method influence a QSPR model?

We demonstrated that 3D descriptor groups are sensitive to the choice of optimization method; the geometry optimization step therefore affects the quality of QSPR models. The results of the Wilcoxon tests confirmed that the individual methods yield different descriptor values.

We also developed a model that can be used to predict the density of ILs and examined the impact of the optimization method on the quality measures of the model. The QSPR models utilizing descriptors derived from structures optimized at the semi-empirical and ab initio levels had similar values of the validation characteristics, which means that both models were of similar quality. However, with semi-empirical methods one can calculate the descriptors and develop a QSPR model in a much shorter time. For that reason, we recommend using semi-empirical methods, such as PM7, for geometry optimization in QSPR studies of ionic liquids.

References

Wilkes JS (2002) A short history of ionic liquids—from molten salts to neoteric solvents. Green Chem 4(2):73–80. doi:10.1039/b110838g
Bruzzone S, Chiappe C, Focardi SE, Pretti C, Renzi M (2011) Theoretical descriptor for the correlation of aquatic toxicity of ionic liquids by quantitative structure-toxicity relationships. Chem Eng J 175:17–23. doi:10.1016/J.Cej.08.073
Das RN, Roy K (2013) Advances in QSPR/QSTR models of ionic liquids for the design of greener solvents of the future. Mol Divers 17(1):151–196. doi:10.1007/s11030-012-9413-y
Patel R, Kumari M, Khan AB (2014) Recent advances in the applications of ionic liquids in protein stability and activity: a review. Appl Biochem Biotechnol 172(8):3701–3720. doi:10.1007/s12010-014-0813-6
Paul TC, Morshed AKMM, Fox EB, Visser AE, Bridges NJ, Khan JA (2014) Thermal performance of ionic liquids for solar thermal applications. Exp Therm Fluid Sci 59:88–95. doi:10.1016/j.expthermflusci.2014.08.002
Zhang R, Wang CL, Yue QH, Zhou TC, Li N, Zhang HQ, Hao XK (2014) Ionic liquid foam floatation coupled with ionic liquid dispersive liquid-liquid microextraction for the separation and determination of estrogens in water samples by high-performance liquid chromatography with fluorescence detection. J Sep Sci 37(21):3133–3141. doi:10.1002/Jssc.201400568
Kapnissi-Christodoulou CP, Stavrou IJ, Mavroudi MC (2014) Chiral ionic liquids in chromatographic and electrophoretic separations. J Chromatogr A 1363:2–10. doi:10.1016/J.Chroma.05.059
Vogl T, Menne S, Balducci A (2014) Mixtures of protic ionic liquids and propylene carbonate as advanced electrolytes for lithium-ion batteries. Phys Chem Chem Phys 16(45):25014–25023. doi:10.1039/c4cp03830d
Lu Y, Korf K, Kambe Y, Tu Z, Archer LA (2014) Ionic-liquid-nanoparticle hybrid electrolytes: applications in lithium metal batteries. Angew Chem Int Ed Engl 53(2):488–492. doi:10.1002/anie.201307137
Larsson K, Binnemans K (2014) Selective extraction of metals using ionic liquids for nickel metal hydride battery recycling. Green Chem 16(10):4595–4603. doi:10.1039/C3gc41930d
Nasirpour N, Mousavi SM, Shojaosadati SA (2014) A novel surfactant-assisted ionic liquid pretreatment of sugarcane bagasse for enhanced enzymatic hydrolysis. Bioresour Technol 169:33–37. doi:10.1016/J.Biortech.06.023
Isik M, Sardon H, Mecerreyes D (2014) Ionic liquids and cellulose: dissolution, chemical modification and preparation of new cellulosic materials. Int J Mol Sci 15(7):11922–11940. doi:10.3390/Ijms150711922
Idris A, Vijayaraghavan R, Rana UA, Patti AF, MacFarlane DR (2014) Dissolution and regeneration of wool keratin in ionic liquids. Green Chem 16(5):2857–2864. doi:10.1039/C4gc00213j
Muhammad N, Man Z, Bustam MA, Mutalib MIA, Rafiq S (2013) Investigations of novel nitrile-based ionic liquids as pre-treatment solvent for extraction of lignin from bamboo biomass. J Ind Eng Chem 19(1):207–214. doi:10.1016/J.Jiec.08.003
Roy K, Kar S, Das RN (2015) Understanding the basics of QSAR for applications in pharmaceutical sciences and risk assessment. Elsevier Academic Press, Amsterdam
Roy K, Kar S, Narayan Das R (2015) A primer on QSAR/QSPR modeling. Springer, Berlin. doi:10.1007/978-3-319-17281-1
Chen B-K, Liang M-J, Wu T-Y, Wang HP (2013) A high correlate and simplified QSPR for viscosity of imidazolium-based ionic liquids. Fluid Phase Equilib 350:37–42. doi:10.1016/j.fluid.2013.04.009
Bai LG, Zhu JQ, Chen BH (2011) Quantitative structure-property relationship study on heat of fusion for ionic liquids. Fluid Phase Equilib 312:7–13. doi:10.1016/J.Fluid.09.005
Gajewicz A, Haranczyk M, Puzyn T (2010) Predicting logarithmic values of the subcooled liquid vapor pressure of halogenated persistent organic pollutants with QSPR: how different are chlorinated and brominated congeners? Atmos Environ 44(11):1428–1436. doi:10.1016/j.atmosenv.2010.01.041
Peric B, Sierra J, Marti E, Cruanas R, Garau MA, Arning J, Bottin-Weber U, Stolte S (2013) (Eco)toxicity and biodegradability of selected protic and aprotic ionic liquids. J Hazard Mater 261:99–105. doi:10.1016/j.jhazmat.2013.06.070
Roy K, Das RN, Popelier PL (2014) Quantitative structure-activity relationship for toxicity of ionic liquids to Daphnia magna: aromaticity vs. lipophilicity. Chemosphere 112:120–127. doi:10.1016/j.chemosphere.2014.04.002
Li S-L, He M-Y, Du H-G (2011) 3D-QSAR studies on a series of dihydroorotate dehydrogenase inhibitors: analogues of the active metabolite of leflunomide. Int J Mol Sci 12(5):2982–2993. doi:10.3390/ijms12052982
Ruiz P, Myshkin E, Quigley P, Faroon O, Wheeler JS, Mumtaz MM, Brennan RJ (2013) Assessment of hydroxylated metabolites of polychlorinated biphenyls as potential xenoestrogens: a QSAR comparative analysis∗. SAR QSAR Environ Res 24(5):393–416. doi:10.1080/1062936x.2013.781537
Cronin MTD (2010) Quantitative Structure-Activity Relationships (QSARs)—applications and methodology. In: Puzyn T, Leszczynski J, Cronin MTD (eds) Recent advances in QSAR studies. Methods and applications. Challenges and advances in computational chemistry and physics, vol 8. Springer, New York, pp 3–11. doi:10.1007/978-1-4020-9783-6
Jensen F (1999) Introduction to computational chemistry. Wiley, New York
Stewart JJ (2004) Comparison of the accuracy of semiempirical and some DFT methods for predicting heats of formation. J Mol Model 10(1):6–12. doi:10.1007/s00894-003-0157-6
Young DC (2001) Computational chemistry: a practical guide for applying techniques to real-world problems. Wiley-Interscience, New York
Katritzky AR, Lomaka A, Petrukhin R, Jain R, Karelson M, Visser AE, Rogers RD (2002) QSPR Correlation of the melting point for pyridinium bromides, potential ionic liquids. J Chem Inf Model 42(1):71–74. doi:10.1021/ci0100503
Ma S, Lv M, Zhang X, Zhai H, Lv W (2015) Computational study of the effects of cations and anions to the cytotoxicity of diverse ionic liquids by supervised machine learning. Chemometrics Intell Lab Syst 144:138–147. doi:10.1016/j.chemolab.2015.03.014
Yu G, Wen L, Zhao D, Asumana C, Chen X (2013) QSPR study on the viscosity of bis(trifluoromethylsulfonyl)imide-based ionic liquids. J Mol Liq 184:51–59. doi:10.1016/j.molliq.2013.04.021
Bai L, Zhu J, Chen B (2011) Quantitative structure–property relationship study on heat of fusion for ionic liquids. Fluid Phase Equilib 312:7–13. doi:10.1016/j.fluid.2011.09.005
Dong Q, Muzny CD, Kazakov A, Diky V, Magee JW, Widegren JA, Chirico RD, Marsh KN, Frenkel M (2007) ILThermo: a free-access web database for thermodynamic properties of ionic liquids. J Chem Eng Data 52(4):1151–1159. doi:10.1021/Je700171f
Dennington R, Keith T, Millam J (2009) GaussView, 5th edn. Semichem Inc., Kansas
Stewart JJP (2012) MOPAC2012. Stewart Computational Chemistry, Colorado Springs
Wan J, Zhang L, Yang G, Zhan CG (2004) Quantitative Structure-Activity Relationship for cyclic imide derivatives of protoporphyrinogen oxidase inhibitors: a study of quantum chemical descriptors from density functional theory. J Chem Inf Model 44(6):2099–2105. doi:10.1021/ci049793p
Rasulev BF, Abdullaev ND, Syrov VN, Leszczynski J (2005) A Quantitative Structure-Activity Relationship (QSAR) study of the antioxidant activity of flavonoids. QSAR Comb Sci 24(9):1056–1065. doi:10.1002/qsar.200430013
Puzyn T, Suzuki N, Haranczyk M, Rak J (2008) Calculation of quantum-mechanical descriptors for QSPR at the DFT level: is it necessary? J Chem Inf Model 48(6):1174–1180. doi:10.1021/Ci800021p
Kušić H, Rasulev B, Leszczynska D, Leszczynski J, Koprivanac N (2009) Prediction of rate constants for radical degradation of aromatic pollutants in water matrix: a QSAR study. Chemosphere 75(8):1128–1134. doi:10.1016/j.chemosphere.2009.01.019
Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Scalmani G, Barone V, Mennucci B, Petersson GA, Nakatsuji H, Caricato M, Li X, Hratchian HP, Izmaylov AF, Bloino J, Zheng G, Sonnenberg JL, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Vreven T, Montgomery JA Jr, Peralta JE, Ogliaro F, Bearpark MJ, Heyd J, Brothers EN, Kudin KN, Staroverov VN, Kobayashi R, Normand J, Raghavachari K, Rendell AP, Burant JC, Iyengar SS, Tomasi J, Cossi M, Rega N, Millam NJ, Klene M, Knox JE, Cross JB, Bakken V, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, Pomelli C, Ochterski JW, Martin RL, Morokuma K, Zakrzewski VG, Voth GA, Salvador P, Dannenberg JJ, Dapprich S, Daniels AD, Farkas Ö, Foresman JB, Ortiz JV, Cioslowski J, Fox DJ (2009) Gaussian 09. Gaussian Inc, Wallingford
Talete (2014) Dragon (software for molecular descriptor calculation), 6.0, Milano. http://www.talete.mi.it/
Puzyn T, Mostrag-Szlichtyng A, Gajewicz A, Skrzyński M, Worth AP (2011) Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Struct Chem 22(4):795–804. doi:10.1007/s11224-011-9757-4
Gramatica P, Chirico N, Papa E, Cassani S, Kovarich S (2013) QSARINS: a new software for the development, analysis, and validation of QSAR MLR models. J Comput Chem 34(24):2121–2132. doi:10.1002/Jcc.23361
Gramatica P, Cassani S, Chirico N (2014) QSARINS-chem: insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS. J Comput Chem 35(13):1036–1044. doi:10.1002/jcc.23576
OECD (2004) The report from the expert group on (quantitative) structure activity relationship [(Q)SARs] on the principles for the validation of (Q)SARs. Series on testing and assessment No. 49 (ENV/JM/MONO(2004)24). Organisation of Economic Cooperation and Development, Paris
Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R (2012) Comparison of different approaches to define the applicability domain of QSAR models. Molecules 17(5):4791–4810. doi:10.3390/molecules17054791
Gramatica P (2013) On the development and validation of QSAR models. In: Reisfeld B, Mayeno AN (eds) Computational toxicology. Methods in molecular biology, vol 930. Humana Press, New York, pp 499–526. doi:10.1007/978-1-62703-059-5_21
Sahlin U, Jeliazkova N, Oberg T (2014) Applicability domain dependent predictive uncertainty in QSAR regressions. Mol Inform 33(1):26–35. doi:10.1002/Minf.201200131
Netzeva T, Worth AP, Aldenberg T, Benigni R, Cronin M, Gramatica P, Jaworska JS, Kahn S, Klopman G, Marchant CA, Myatt G, Nikolova-Jeliazkova N, Patlewicz G, Perkins R, Roberts DW, Schultz TW, Stanton DT, van de Sandt J, Tong W, Veith G, Yang C (2005) Current status of methods for defining the applicability domain of (quantitative) structure–activity relationships. The report and recommendations of ECVAM workshop 52. Alternatives to Laboratory Animals, vol 33
Gramatica P, Cassani S, Roy PP, Kovarich S, Yap CW, Papa E (2012) QSAR modeling is not “push a button and find a correlation”: a case study of toxicity of (benzo-)triazoles on algae. Mol Inf 31(11–12):817–835. doi:10.1002/minf.201200075
Gramatica P (2010) Chemometric methods and theoretical molecular descriptors in predictive QSAR modeling of the environmental behavior of organic pollutants. In: Puzyn T, Leszczynski J, Cronin MTD (eds) Recent advances in QSAR studies. Methods and applications, vol 8. Springer, New York, pp 327–366. doi:10.1007/978-1-4020-9783-6
Roy K, Kar S, Ambure P (2015) On a simple approach for determining applicability domain of QSAR models. Chemometrics Intell Lab Syst 145:22–29. doi:10.1016/j.chemolab.2015.04.013
Gramatica P, Giani E, Papa E (2007) Statistical external validation and consensus modeling: a QSPR case study for Koc prediction. J Mol Graph Model 25(6):755–766. doi:10.1016/j.jmgm.2006.06.005
Chirico N, Gramatica P (2011) Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J Chem Inf Model 51(9):2320–2335. doi:10.1021/ci200211n
Roy K, Chakraborty P, Mitra I, Ojha PK, Kar S, Das RN (2013) Some case studies on application of “rm2” metrics for judging quality of quantitative structure-activity relationship predictions: emphasis on scaling of response data. J Comput Chem 34(12):1071–1082. doi:10.1002/jcc.23231
Toth G, Bodai Z, Heberger K (2013) Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart. J Comput Aided Mol Des 27(10):837–844. doi:10.1007/S10822-013-9680-4
Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26(5):694–701. doi:10.1002/Qsar.200610151
Devinyak O, Havrylyuk D, Lesyk R (2014) 3D-MoRSE descriptors explained. J Mol Graph Model 54:194–203. doi:10.1016/j.jmgm.2014.10.006
Todeschini R, Consonni V (2009) Molecular descriptors for chemoinformatics. Wiley, Weinheim
Rinnan Å, Christensen NJ, Engelsen SB (2009) How the energy evaluation method used in the geometry optimization step affect the quality of the subsequent QSAR/QSPR models. J Comput Aided Mol Des 24(1):17–22. doi:10.1007/s10822-009-9308-x
Roy K, Popelier PLA (2008) Exploring predictive QSAR models for hepatocyte toxicity of phenols using QTMS descriptors. Bioorg Med Chem Lett 18(8):2604–2609. doi:10.1016/j.bmcl.2008.03.035
Kar S, Harding AP, Roy K, Popelier PLA (2010) QSAR with quantum topological molecular similarity indices: toxicity of aromatic aldehydes to Tetrahymena pyriformis. SAR QSAR Environ Res 21(1–2):149–168. doi:10.1080/10629360903568697

Acknowledgments

The authors would like to express their gratitude to Prof. Paola Gramatica for access to the QSARINS software. This material is based on research funded by the National Science Center (Poland) (Grant No. UMO-2012/05/E/NZ7/01148).

Author information

Authors and Affiliations

Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdańsk, Poland
Anna Rybinska, Anita Sosnowska, Maciej Barycki & Tomasz Puzyn

Corresponding author

Correspondence to Tomasz Puzyn.

Ethics declarations

Conflict of interests

The authors declare that they have no competing interests.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLS 196 kb)

About this article

Cite this article

Rybinska, A., Sosnowska, A., Barycki, M. et al. Geometry optimization method versus predictive ability in QSPR modeling for ionic liquids. J Comput Aided Mol Des 30, 165–176 (2016). https://doi.org/10.1007/s10822-016-9894-3

Received: 16 October 2015
Accepted: 13 January 2016
Published: 01 February 2016
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10822-016-9894-3
