Additive SMILES-based optimal descriptors in QSAR modelling bee toxicity: Using rare SMILES attributes to define the applicability domain

https://doi.org/10.1016/j.bmc.2008.03.048

Abstract

Additive SMILES-based optimal descriptors have been used to model bee toxicity. The influence of the relative prevalence of SMILES attributes in the training and test sets on the models for bee toxicity has been analysed. Avoiding the use of rare attributes improves the statistical characteristics of the model on the external test set. The possibility of using the probability of the presence of SMILES attributes in the training and test sets for a rational definition of the applicability domain is discussed.

Introduction

Quantitative structure-property/activity relationship (QSPR/QSAR) is a tool for the estimation of unavailable numerical data on endpoints of interest by means of correlations ‘descriptor-property/activity’.1 The descriptor is a numerical index of the molecular structure that can be calculated using information on the molecular architecture, for instance, represented by molecular graphs.2, 3, 4, 5 As an alternative to the graph, the simplified molecular input line entry system (SMILES) can be used for the elucidation of the molecular structure in the QSPR/QSAR analyses.6, 7, 8

The toxicity of pesticides towards bees is an important ecological indicator, since bees influence many natural processes related to fruit trees.9 The experimental determination of numerical toxicity values for bees requires considerable time and resources.10 Thus there is motivation to search for robust models of toxicity towards bees.9, 10, 11, 12

The number of SMILES-oriented databases on the Internet is gradually increasing. SMILES-based optimal descriptors gave reasonable predictions of bee toxicity.12 The aim of the present study is to use a more transparent version of the SMILES-based descriptor. In fact, each SMILES attribute is a representation of a molecular fragment. The additive scheme13 is a well-known approach to modelling properties by assigning special contributions to molecular fragments. The new version of the SMILES-based optimal descriptors is a SMILES-based realization of the additive scheme.

Section snippets

Results and discussion

Figure 1 shows the plot of ARP versus Lim S. One can see from Figure 1 that the ARP increases with Lim S. However, for Lim S = 4 the ARP unexpectedly decreases. This is an interesting point: it may be an indicator for a robust selection of Lim S. The hypothesis is that the ARP can be used for a preliminary assessment of the split into training and test sets as well as for the estimation of Lim S.

Figure 2 shows the correlation coefficients for the training and test sets for different Lim S

Conclusion

The additive SMILES-based optimal descriptors can be used as a tool for predicting bee toxicity values. The suggested Lim S index can be used to select the list of SMILES attributes for a robust SMILES-based model.
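The selection of SMILES attributes by prevalence might be sketched as follows. This is only an illustration under stated assumptions: the snippets shown here do not define Lim S, so the sketch assumes it acts as a minimum number of training-set SMILES that must contain an attribute for that attribute to be kept. The function names and the simple tokenizer are hypothetical and are not taken from the article.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into attributes (simplified: Cl and Br stay whole)."""
    return re.findall(r"Cl|Br|.", smiles)


def active_attributes(training_smiles: Iterable[str], lim_s: int) -> Set[str]:
    """Return attributes found in at least lim_s training SMILES.

    ASSUMPTION: lim_s is read as a minimum-occurrence threshold; the article
    snippets do not state the exact definition of Lim S.
    """
    counts: Counter = Counter()
    for smiles in training_smiles:
        # Count each attribute once per molecule, not once per occurrence.
        counts.update(set(tokenize(smiles)))
    return {attr for attr, n in counts.items() if n >= lim_s}


# Hypothetical toy training set, not data from the article.
toy_training = ["CCO", "CCN", "CCl", "C#N"]
print(sorted(active_attributes(toy_training, lim_s=2)))
```

Attributes that fall below the threshold would then be treated as rare and left out of the model, which is the idea the abstract describes for improving performance on the external test set.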

Method

As the endpoint we used the decimal logarithm log(1/C), where C is the concentration of the pesticide, expressed in mmol/bee, that kills 50% of the bees.12 The 105 pesticides were split into a training set (n = 85) and a test set (n = 20).
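The endpoint transformation can be illustrated with a few lines of Python. The function name and the example value are hypothetical and serve only to show the arithmetic; they are not values from the data set.

```python
import math


def bee_toxicity_endpoint(c_mmol_per_bee: float) -> float:
    """Return log10(1/C) for a value of C given in mmol/bee."""
    if c_mmol_per_bee <= 0:
        raise ValueError("C must be positive")
    # log10(1/C) equals -log10(C); larger values mean higher toxicity.
    return -math.log10(c_mmol_per_bee)


# Hypothetical value of C, chosen only to illustrate the transformation.
print(bee_toxicity_endpoint(0.001))
```

On this scale a pesticide that is lethal at a smaller amount per bee receives a larger endpoint value.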

Optimal descriptors examined in the present study have been defined as

DCW = CW(b) + CW(db) + CW(tb) + CW(N) + CW(Cl) + CW(Br) + CW(F) + CW(SSk)

where CW(b) is the correlation weight of the given number of branching (i.e., number of brackets in the SMILES); CW(db) is the
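A minimal sketch of how such an additive descriptor could be computed is given below. Several points are assumptions, because the definitions are truncated in this snippet: db and tb are read as counts of the '=' and '#' symbols, N, Cl, Br and F as counts of those atom symbols, and the SSk term as a sum over all SMILES attributes of the molecule. The tokenizer is deliberately simplified, and all weights are hypothetical placeholders, not values from the article.

```python
import re
from typing import Dict, List, Tuple


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into attributes (simplified: Cl and Br stay whole)."""
    return re.findall(r"Cl|Br|.", smiles)


def global_counts(smiles: str) -> Dict[str, int]:
    """Count the global features that appear in the DCW formula."""
    tokens = tokenize(smiles)
    return {
        "b": tokens.count("(") + tokens.count(")"),  # brackets, as in the text
        "db": tokens.count("="),   # ASSUMPTION: number of double bonds
        "tb": tokens.count("#"),   # ASSUMPTION: number of triple bonds
        "N": tokens.count("N"),    # ASSUMPTION: aliphatic N symbols only
        "Cl": tokens.count("Cl"),
        "Br": tokens.count("Br"),
        "F": tokens.count("F"),
    }


def dcw(
    smiles: str,
    count_weights: Dict[Tuple[str, int], float],
    attribute_weights: Dict[str, float],
) -> float:
    """Sum correlation weights; any weight that is not listed counts as 0."""
    total = 0.0
    for name, n in global_counts(smiles).items():
        # CW depends on the feature and on its count in this molecule.
        total += count_weights.get((name, n), 0.0)
    for token in tokenize(smiles):
        # ASSUMPTION: the SSk term is summed over all attributes.
        total += attribute_weights.get(token, 0.0)
    return total


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
example_count_weights = {("b", 2): 0.5, ("db", 1): 0.25}
example_attribute_weights = {"C": 1.0, "Cl": -0.5}
print(dcw("CC(=O)Cl", example_count_weights, example_attribute_weights))
```

Treating unlisted weights as zero mirrors the additive scheme: a fragment with no assigned contribution simply does not change the descriptor.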

Acknowledgement

The authors thank the Marie Curie fellowship for financial support (the contract ID 39036, CHEMPREDICT).

References and notes (15)

  • J. Devillers et al., Comput. Electron. Agric. (2004)
  • M. Vighi et al., Sci. Total Environ. (1991)
  • A.A. Toropov et al., Comput. Biol. Chem. (2007)
  • I.G. Zenkevich et al., J. Chromatogr., A (2004)
  • Q.N. Hu et al., J. Data Sci. (2003)
  • A.A. Toropov et al., J. Chem. Inf. Comput. Sci. (2003)
  • A.A. Toropov et al., J. Chem. Inf. Comput. Sci. (2004)
There are more references available in the full text version of this article.

Cited by (47)

  • Predicting the cytotoxicity of ionic liquids using QSAR model based on SMILES optimal descriptors

    2015, Journal of Molecular Liquids
    Citation Excerpt :

    The QSPR/QSAR models have been applied widely to forecast the various properties of compounds, such as the toxicity of ILs compounds [25–32]. Simplified molecular input line entry system has been employed as an alternative for molecular graphs in the QSPR/QSAR models [33–37]. Recently, CORAL software (available at http://www.insilico.eu/coral) has been proposed as an efficient approach for the QSAR analysis.

  • QSPR studies on refractive indices of structurally heterogeneous polymers

    2015, Chemometrics and Intelligent Laboratory Systems
    Citation Excerpt :

    In previous QSPR studies, we have shown the importance of the methodology of flexible descriptors, which is able to provide models having a comparable or sometimes better quality to the ones found by searching the best descriptors in a pool containing thousands of 0D-3D descriptors [26–28]. Thereby, we investigated the most appropriate molecular structure representation for the flexible descriptor calculation, which can be done in different ways such as by using a chemical graph [29–31], using the Simplified Molecular Input Line Entry System (SMILES) [32–34], or with an hybrid representation which includes both graph and SMILES [35,36]. The high quality experimental refractive indices measured at 298 K on 234 polymer compounds were collected from two published compilations [17,37].

  • Conformation-independent QSAR on c-Src tyrosine kinase inhibitors

    2014, Chemometrics and Intelligent Laboratory Systems
  • QSAR models for HEPT derivates as NNRTI inhibitors based on Monte Carlo method

    2014, European Journal of Medicinal Chemistry
    Citation Excerpt :

    Recent published papers have reported the applicability of solo SMILES based descriptors in QSAR analysis [34–38] as well as SMILES based descriptors in combination with topological descriptors [39–41]. All QSAR models are based on Monte Carlo optimization method where appropriate activity is treated as random event [34–41]. The aim of this study is to build QSAR models based on SMILES and graph optimal descriptors using Monte Carlo method for HEPT derivatives as NNRTIs inhibitors and an attempt to define the molecular structure responsible for stated inhibitory effect.
