Introduction

In risk assessment, repeated dose toxicity (RDT) provides information on the adverse toxicological effects which can be induced by repeated exposure to a substance over a limited period of time [1, 2]. No observed adverse effect level (NOAEL) indicates the dose at which no effects are observed, and the lowest observed adverse effect level (LOAEL) is the lowest dose at which adverse effect can serve as measure of the RDT [2]. The NOAEL is a toxicological requirement imposed by registration, evaluation authorization, and restriction of chemicals (REACH). When the NOAEL is not available, LOAEL can be used for the same purpose [1, 37].

REACH is recommended development of methods such as quantitative structure–activity relationships (QSARs) to assess the potential harmful effects of chemicals [3]. The aim of QSAR models is to establish correlation between the chemical structure of a substance and biological activity [8]. QSAR for estimating NOAEL/LOAEL has been described in the scientific literature [2, 9, 10].

The aim of this work was to build up QSAR model for the NOAEL by means of the CORAL software (http://www.insilico.eu/coral/).

Materials and methods

Data

Experimental data were obtained from the OECD toolbox version 3.2. (http://www.qsartoolbox.org/download), downloading HESS and Munro databases and from the US EPA’s integrated risk information system (IRIS) database (http://www.epa.gov/IRIS/). For all the compounds the canonical simplified molecular input-line entry systems (SMILES) were obtained by searching through CAS numbers on the PubChem Compound website (https://pubchem.ncbi.nlm.nih.gov/). The correspondence between CAS numbers and chemical structures was further checked using ChemID Plus Advanced website (http://chem.sis.nlm.nih.gov/chemidplus/). The numerical data from various sources were compared. If several values were available for the same compound, we used the lowest.

All doubtful or inorganic compounds, salts, and mixtures were eliminated, because the relationships between molecular structure and the NOAEL are very complex. We considered only data referring to 90 days of oral administration in rats and rejected reproductive toxicity studies. It is to be noted that the exchange of the 90-day study by shorter testing is an attractive alternative [11]. Taking into account this circumstance, values for 28 days of treatment were considered but, in order to have consistent data, they were divided by a factor of 3, as specified by the scientific committee on consumer safety (SCCS) in order to approximate the 90-day NOAEL [1]. After the above selection, about four hundreds of various substances with small molecules (e.g., 2–3 atoms) and vice versa with extremely large molecules (e.g., 100 or more atoms), molecules with specific groups, such as [N+], [NH4+], [nH], etc., and substances with molecules containing many various cycles / heterocycles were remained. Under such circumstances, the following limitations were used in the selection of compounds for the work set: (i) too large and, vice versa, too small molecules were removed (practically, molecules which can be represented by SMILES with length less than 70 and larger than 10 symbols, were selected); (ii) molecules which have only one cycles or have no cycles at all were selected; and (iii) molecules with special groups (indicated by square brackets) were removed from the work set. Thus, the dataset of 140 compounds has been selected. All values were converted to decimal logarithms (lgNOAEL). These compounds were randomly split into training, calibration, and validation sets three times.

Optimal descriptors

The hybrid optimal descriptors calculated with two representation of the molecular structure by SMILES [12] and by graph of atomic orbitals (GAO) [13, 14] were used for QSAR analysis:

$$\begin{aligned} \hbox {DCW}\left( {{T},{N}} \right)= & {} \hbox { CW}\left( {\hbox {NOSP}} \right) +\hbox { CW}\left( {\hbox {HALO}} \right) \nonumber \\&+\hbox { CW}\left( {\hbox {BOND}} \right) +\sum \hbox { CW}(\hbox {EC0}_{\mathrm{k}})\nonumber \\= & {} \sum \hbox {CW}\left( {\hbox {SA}_\mathrm{k} } \right) , \end{aligned}$$
(1)

where \(\hbox {SA}_{\mathrm{k}}\) is a structural attribute extracted from SMILES (NOSP, HALO, and BOND represented in Table 1) or from the graph of atomic orbitals (\(\hbox {EC0}_{\mathrm{k}}\) represented in Table 2); the CW(x) is so-called correlation weight for a structural attribute extracted from SMILES (i.e., NOSP, HALO, and BOND) or from GAO (i.e., \(\hbox {EC0}_{\mathrm{k}})\). The correlation weights are coefficients which are used for calculation with Eq. 1: the numerical data on the correlation weights are calculated with the Monte Carlo optimization which gives maximum of the determination coefficient \(({r}^{2})\) between experimental and predicted lgNOAEL for the training set. The T is threshold, i.e., coefficient for discrimination of \(\hbox {SA}_{\mathrm{k}}\) into two categories (i) rare (if frequency of \(\hbox {SA}_{\mathrm{k}}\) in the training set is less than T) and (ii) not rare (if frequency of \(\hbox {SA}_{\mathrm{k}}\) is larger than T in the training set). The N is the total number of the Monte Carlo method epochs \((N=1, 2, {\ldots }, 30)\). The NOSP is a descriptor indicating the presence (absence) of nitrogen, oxygen, sulfur, and phosphorus; HALO is a descriptor indicating the presence (absence) of halogens (i.e., “F,” “Cl,” “Br,” and “I”); the BOND is a descriptor indicating the presence (absence) of double, triple, and stereo chemical bonds; Table 1 contains clarifications for SMILES attributes involved in building up a model. Table 2 contains an example of representation of the molecular structure by the molecular graphs. The \(\hbox {EC0}_{\mathrm{k}}\) are extended connectivity of zero order in the GAO [13, 14].

Table 1 The examples of BOND, NOSP, and HALO descriptors
Table 2 Example of the representation of Acetoin (CAS 513-86-0; and SMILES=“O=C(C)C(O)C”) by means of (i) hydrogen-suppressed graph; and (ii) Graph of atomic orbitals

The modeling approach examined in this study includes three steps:

Step 1 :

Preparation of the list of attributes extracted from SMILES and from GAO (Tables 1 and 2).

Step 2 :

Calculation by the Monte Carlo method series of models with various values of the threshold (T = 1, 2, and 3) and different values of the number epochs (N). The preferable \({T}^{*}\) and \({N}^{*}\) are selected according to scheme represented by Fig. 1.

Step 3 :

Building up model with \({T}={T}^{*}\) and \({N}={N}^{*}\) and estimation of the predictive potential of the model with the validation set, i.e., with substances which are invisible during building up the model.

Thus, functions of the training set, the test set, and the validation set are considerably different ones. The substances from the training set are the basis to build up model. The substances from the test set are a tool to examine the “objectivity” of the model which is built up with the training set. In other words, it is evaluating “if the overtraining is absent”. Finally, the external validation set is a tool to estimate the true predictive potential of model with data on substances which are invisible during building up the model using the above-mentioned parameters \(T=T^{*}\) and \({N}={N}^{*}\).

Fig. 1
figure 1

Scheme of the definition of the preferable CORAL model. T is threshold; N is the number of epochs of the Monte Carlo optimization; \({T}^{*}\) and \({N}^{*}\) are values which give maximum for the correlation coefficient between experimental and calculated endpoint values for the test set

The user can calculate the DCW (T*, N*) and build up the model.

$$\begin{aligned} \hbox {Endpoint }=\hbox { C}_0 +\hbox {C}_\mathrm{1}*\hbox {DCW}\left( {{T}^*, {N}^{*}} \right) . \end{aligned}$$
(2)

The predictive potential of the model calculated with Eq. 2 should be validated with external validation set invisible during building up the model [12, 15]. It is to be noted that similar nonlinear models as rule are considerable better for visible training set (i.e., for the system of training and test sets) but poorer for the external invisible validation set [16, 17].

The measure of the statistical prevalence of various molecular features (\(\hbox {SA}_{\mathrm{k}})\) which are extracted from SMILES and graph of atomic orbitals can be calculated as the following equation:

$$\begin{aligned}&\hbox {SA}_{\mathrm{k}} {\hbox {Defect}}\nonumber \\&\quad =\left\{ \begin{array}{ll} \dfrac{\left| {P_{{\mathrm{TRN}}} \left( \mathrm{SA}_{\mathrm{k}} \right) -P_{\mathrm{TST}} \left( \mathrm{SA}_{\mathrm{k}} \right) } \right| }{N_{\mathrm{TRN}} \left( \mathrm{SA}_{\mathrm{k}}\right) +N_{\mathrm{TST}} \left( \mathrm{SA}_{\mathrm{k}}\right) }, \quad {\hbox {if}}\_N_{\mathrm{TST}} \left( \mathrm{SA}_{\mathrm{k}}\right) >0\\ 1, \quad {\hbox {otherwise}} \end{array}\right. \nonumber \\ \end{aligned}$$
(3)

where the \({P}_{\mathrm{TRN}}(\hbox {SA}_{\mathrm{k}})\) is the probability of the presence of the \(\hbox {SA}_{\mathrm{k}}\) in SMILES of the training set, i.e.,

$$\begin{aligned} {P}_{\mathrm{TRN}}(\mathrm{SA}_{\mathrm{k}}) = {N}_{\mathrm{TRN}}(\mathrm{SA}_{\mathrm{k}}) / {N}_{\mathrm{TRN}}. \end{aligned}$$

The \({P}_{\mathrm{TRN}}(\hbox {SA}_{\mathrm{k}})\) is the probability of the presence of the \(\hbox {SA}_{\mathrm{k}}\) in SMILES of the test set, i.e.,

$$\begin{aligned} {P}_{\mathrm{TST}}(\hbox {SA}_{\mathrm{k}}) = {N}_{\mathrm{TST}}(\mathrm{SA}_{\mathrm{k}}) / {N}_{\mathrm{TST}}. \end{aligned}$$

The \({N}_{\mathrm{TRN}}\) (\(\hbox {SA}_{\mathrm{k}}\)) is the number (frequency) of SMILES which contain \(\hbox {SA}_{\mathrm{k}}\) in the training set.

The \({N}_{\mathrm{{TRN}}}\) is the total number of SMILES in the training set.

The \({N}_{\mathrm{{TST}}}\) (\(\hbox {SA}_{\mathrm{k}}\)) is the number (frequency) of SMILES which contain \(\hbox {SA}_{\mathrm{k}}\) in the test set.

The \({N}_{\mathrm{{TST}}}\) is the total number of SMILES in the test set.

If the probability of \(\hbox {SA}_{\mathrm{k}}\) in the training set is equal to the probability of \(\hbox {SA}_{\mathrm{k}}\) in the test set, it is the ideal situation and the defect of the \(\hbox {SA}_{\mathrm{k}}\) in this case should be estimated as minimal (zero). However, this situation is not typical, i.e., the difference between the probability of \(\hbox {SA}_{\mathrm{k}}\) in the training set and the probability of \(\hbox {SA}_{\mathrm{k}}\) in the test set, as rule, is not zero. Under such circumstances, the numbers of \(\hbox {SA}_{\mathrm{k}}\) in the training set and in the test set also should be taken into account.

The small frequency or the absence of \(\hbox {SA}_{\mathrm{k}}\) in the training set most probably will lead to decrease of statistical significance of the \(\hbox {SA}_{\mathrm{k}}\) for the model. The absence (or even the small frequency) of \(\hbox {SA}_{\mathrm{k}}\) in the test set together with significant prevalence of the \(\hbox {SA}_{\mathrm{k}}\) in the training set will lead to overfitting: the improvement of the model for the training set due to the correlation weight of the \(\hbox {SA}_{\mathrm{k}}\) will be accompanied by unpredictable influence of this correlation weight for the model within the test set. The Eq. 3 gives two criteria of expedient distribution into the training set and test set: (i) the difference between probabilities of attributes be in the training and be in the test sets should be as small as possible; and (ii) the numbers of attributes in the training set and test sets should be as large as possible. If the above-mentioned two conditions take place, the split into the training and test sets should be estimated as “satisfactory” one. Finally, the molecular features which are absent in the test set cannot improve model, and their defect should be defined as maximum (i.e., unit).

Thus, the measure calculated with Eq. 3 can be used for the classification of the active (not blocked) attributes in accordance with their prevalence in the training and test set.

Having the numerical data on the defects of \(\hbox {SA}_{\mathrm{k}}\), one can compare reliability of the prediction for a substance, using the following criterion (DefectSMILES):

$$\begin{aligned} {\hbox {DefectSMILES}}=\sum _{{\hbox {ActiveSA}}_\mathrm{{k}} } {\mathrm{{SA}}_\mathrm{{k}} } {\hbox {defect}} \end{aligned}$$
(4)

The domain of applicability can be defined as follows: a substance is fall into the domain of applicability if its DefectSMILES obeys the condition:

$$\begin{aligned} {\hbox {DefectSMILES}}<2\times \overline{{\hbox {DefectSMILES}}}, \end{aligned}$$
(5)

where \(\overline{{\hbox {DefectSMILES}}} \) is average for visible set (training and test sets). Thus, the DefectSMILES gives possibility to define the domain of applicability for the models.

Using the summation of the Defect SMILES calculated with Eq. 4 one can define an integral characteristic of a split into the training and test sets:

$$ \begin{aligned} {\hbox {SplitDefect}}=\sum _{{\hbox {Training}}\, \& \,{\hbox {TestSets}}} {{\hbox {DefectSMILES}}}. \end{aligned}$$
(6)

The Split Defect can be a useful characteristic of the distribution into the training set and test set from heuristic point of view.

Results

The CORAL software gives for three above-mentioned splits the following models (the \(n\) is the number of compounds in a set, \({r}^{2}\) is determination coefficient, \({q}^{2}\) is cross-validated \({r}^{2}\), RMSE is root-mean-square error; F is Fischer F ratio):

Split 1

$$\begin{aligned}&\hbox {lgNOAEL} = -2.1959849 (\pm 0.0086409)\nonumber \\&\qquad +\,0.0675751 (\pm 0.0005022) {*} \hbox {DCW}(1,30)\nonumber \\&{n}=97,\quad \hbox {r}^{2}=0.5312, \quad {q}^{2}=0.5158, \nonumber \\&\hbox {RMSE} \quad = 0.617, {F}=108\;(\hbox {training set})\\&{n}=16,\quad {r}^{2}=0.6610, \quad \hbox {RMSE} = 0.444\;(\hbox {test set})\nonumber \\&{n}=27,\quad {r}^{2}=0.5832, \quad \hbox {RMSE} = 0.447\;(\hbox {validation set})\nonumber \\&{n}=26, \quad {r}^{2}=0.5749, \quad \hbox {RMSE} = 0.449\;\nonumber \\&\qquad (\hbox {validation set, domain of applicability})\nonumber \end{aligned}$$
(7)

Split 2

$$\begin{aligned}&\hbox {lgNOAEL} = -1.8713499\;(\pm 0.0073147)\nonumber \\&\qquad +\, 0.0633027 \;(\pm 0.0004966) {*} \hbox {DCW}(1,30)\nonumber \\&{n}=97, \quad {r}^{2}=0.5015, \quad {q}^{2}=0.4852,\nonumber \\&\hbox {RMSE} = 0.613, \quad {F}=96 \;(\hbox {training set})\\&{n}=16, \quad {r}^{2}=0.6799, \quad \hbox {RMSE} = 0.524\; (\hbox {test set})\nonumber \\&{n}=27, \quad {r}^{2}=0.5843, \quad \hbox {RMSE} = 0.457\; (\hbox {validation set})\nonumber \\&{n}=25, \quad {r}^{2}=0.5890, \hbox {RMSE} = 0.453\;\nonumber \\&\qquad (\hbox {validation set, domain of applicability})\nonumber \end{aligned}$$
(8)

Split 3

$$\begin{aligned}&\hbox {lgNOAEL} = -2.1680835 (\pm 0.0082917)\nonumber \\&\qquad +\, 0.0737528 (\pm 0.0005127) {*} \hbox {DCW}(1,30)\nonumber \\&=97, \quad {r}^{2}=0.5301, \quad {q}^{2}=0.5153,\nonumber \\&\hbox {RMSE} = 0.611, \quad \hbox {F}=107 \;(\hbox {training set})\\&{n}=16, \quad {r}^{2}=0.7306, \quad \hbox {RMSE} = 0.494 \;(\hbox {test set})\nonumber \\&{n}=27, \quad {r}^{2}=0.6049, \quad \hbox {RMSE} = 0.427\; (\hbox {validation set})\nonumber \\&{n}=26, \quad {r}^{2}=0.6143, \hbox {RMSE} = 0.425 \;\nonumber \\&\qquad (\hbox {validation set, domain of applicability})\nonumber \end{aligned}$$
(9)

Table 3 contains experimental and calculated lgNOAEL with Eqs. 79 for the training set and test set. The distributions of compounds into the training set and test set also are represented in Table 3. Table 4 contains experimental and calculated lgNOAEL for the external validation set.

Table 3 The distribution of available data into the training (+) and test (#) sets; experimental and calculated lgNOAEL values; and domain of applicability for suggested models
Table 4 The experimental and calculated lgNOAEL values and domain of applicability for validation set

Figure 2 contains the graphical representation of these models.

Fig. 2
figure 2

Experimental (expr) and calculated (calc) lgNOAEL values for three random splits

The additional analysis of twelve splits (including three represented by models which are calculated with Eqs. 79; these are the splits 1, 2, and 3; Table 5 contains the data) gives possibility to study the criterion calculated with Eq. 6. Figure 3 represents the correlation between Split Defect calculated with Eq. 6 and the determination coefficient between experimental and predicted lgNOAEL for the validation set (12 random splits). Figure 4 represents the correlation between Split Defect calculated with Eq. 6 and the root-mean-square error for the validation set (twelve random splits). Unexpectedly, the increase of the Split Defect is accompanied by increase of the determination coefficient and by decrease for root-mean-square error for the external validation set. Thus, very likely, these correlations can be useful criteria to compare different splits into the training set and test set.

Fig. 3
figure 3

Correlations between the Split Defect and the determination coefficient \(({r}^{2})\) for the external validation set (for 12 random splits)

Fig. 4
figure 4

Correlations between the Split Defect and the root-mean-square error (RMSE) for the external validation set (for 12 random splits)

Table 5 Correlation between split defect calculated with Eq. 6 and the predictive potential of the models

Discussion

Sakuratani et al. [2] developed a read-across approach to predict LOAEL within repeated dose toxicity (RDT) using toxicological grouping categories. They defined 33 chemical categories to be used for the gap filling based on RDT data. Mazzatorta et al. [10] developed a QSAR model for predicting LOAEL using chronic data (exposure longer than 180 days) in the rat. The model gave \({R}^{2}\) = 0.54 and RMSE = 0.7. However, the limit of this model is the absence of validation with an external dataset and, thus, the lack of a vital point for assessing the real predictive power of the model. The same group also calculated an experimental variability of 0.64 (logarithmic scale) for LOAEL used in their dataset. A model for the NOAEL suggested in the literature (the model is built up with involving various physicochemical descriptors) is characterized by \(n=218\), \({r}^{2}=0.35\), and \({q}^{2}=0.21\) [18].

Although the correlation between molecular structure and the NOAEL takes place, there is a considerable percentage of other factors that can influence this endpoint [8, 10, 1820].

Thus, the suggested approach gives quantitative models of the NOAEL for three random splits into the training set, the test set, and the validation sets, and the predictive potential of these models are comparable with the predictive potential of models for the NOAEL [21, 22] described in the literature [18].

Instead of the NOAEL/LOAEL approach, the Benchmark Dose Methodology (BDM) [23] can be used. However, the BDM is more expensive approach. Besides, in some cases, the BDM cannot be used to estimate toxicological behavior of substances. Taking this into account, the NOAEL/LOAEL approach should be estimated as a useful alternative for the BDM.

Finally, we deem that the principle “a QSAR is a random event” can be useful from regulatory point of view. In other words, the reliability of a QSAR approach for any endpoint in general, and for NOAEL in particular, should be validated with a group of different splits into the visible training set and invisible validation set [15]. The use of the approach (analysis of a group of distribution into the training set, test set, and external validation set, i.e., not only one split) in the case of the lgNOAEL numerical data for other set of organic compounds [18] gave the models which are statistically characterized by [24]: \(\overline{{n}} \approx 174\), \(\overline{{r}^{2}} \approx 0.70\), \(\overline{{s}} \approx 0.41 (\hbox {training set})\), and \(\overline{{n}} \approx 21\), \(\overline{{r}^{2}} \approx 0.64\), \(\overline{{s}} \approx 0.39\) (test set). These models [24] are based on the representation of the molecular structure by solely SMILES (without data on the molecular graph). In fact, compounds examined here are characterized by a bigger variety of the molecular structure; therefore, QSAR models for these substances which are based on solely SMILES are characterized by a poor statistical quality. Fortunately, the hybrid approach (SMILES together with the molecular graph) gives the satisfactory statistical quality of the models for these very varied compounds calculated with Eqs. 79.

The suggested criteria for the estimation of the defect for the individual SMILES and GAO (Defect SMILES, Eq. 4) and for the estimation of the distribution into the training set and test set (Split Defect, Eq. 6) can be a convenient tool for the QSAR analysis. The Defect SMILES gives possibility to detect “suspected” compounds, thus this criterion is a tool to define the domain of applicability. The Split Defect gives possibility to compare different distribution into the training set and test set and to select preferable distribution from point of view of robustness of a QSAR. The disadvantage of these criteria is their dependence upon the distribution of available data into the training set and test set.

The Supplementary materials section (Table S1) contains the numerical data on the correlation weights for \(\hbox {SA}_{\mathrm{k}}\) calculated with three different splits into the training and test sets. There are a group of \(\hbox {SA}_{\mathrm{k}}\) which are stable promoters of the lgNOAEL increase (i.e., \(\hbox {SA}_{\mathrm{k}}\) characterized by (i) significant frequency in the training set and (ii) stable positive values of correlation weights) and group of stable promoters of the lgNOAEL decrease (i.e., \(\hbox {SA}_{\mathrm{k}}\) characterized by (i) significant frequency in the training set and (ii) stable negative values of correlation weights). The above described \(\hbox {SA}_{\mathrm{k}}\) defects are also represented in Table S1 for split 1, 2, and 3. Table S2 contains the numerical data on the correlation weights of SMILES attributes and GAO invariants for twelve random splits into the training set and the test set. Table S3 contains an example of the DCW (1, 30) calculation for a substance represented by the SMILES and GAO (acetoin, CAS 513-86-0). Table S4 contains the list of compounds that were selected as the external validation set. Table S5 contains twelve random splits into the training and test set which are examined in this work.

Thus, the suggested models have (i) definition of the domain of applicability (Tables 3, 4); (ii) the mechanistic interpretation in terms of the promoters of increase / decrease for lgNOAEL; and (iii) unambiguous algorithm to build up model. Consequently, described models are built up in accordance with OECD principles [21, 22].

Conclusions

The NOAEL can be modeled by the Monte Carlo technique using SMILES and graph of atomic orbitals for the representation of the molecular structure. The statistical quality of models for the NOAEL calculated with the 2D descriptors is comparable with the statistical quality of models based on the 3D representation of the molecular structure with additional input of the physicochemical data. There are correlations between predictive potential of the models and the Split Defect calculated with Eq. 6 (Table 5). It should be noted that the described approach based on 2D descriptors can be used to build up predictive models for the cases of other complex endpoints, i.e., endpoints related to nanomaterials [25, 26] and endpoints related to peptides [27].