Bond Additive Modeling 4. QSPR and QSAR Studies of the Variable Adriatic Indices

In this paper we analyze the set of the variable Adriatic indices. We show that three of these indices have very good predictive properties. Namely, the inverse sum -1.95-deg index is well correlated with the standard enthalpy of formation of octane isomers ($R^2 = 0.75$), the inverse sum 0.43-lodeg index is well correlated with the total surface area of octane isomers ($R^2 = 0.92$) and the sum 0.37-exdeg index is well correlated with the octanol-water partition coefficient ($R^2 = 0.99$). (doi: 10.5562/cca1666)


INTRODUCTION
Let $G$ be a simple connected graph. Denote by $V(G)$ the set of its vertices and by $E(G)$ the set of its edges. Let us consider two vertex invariants:
1) $d_v$, the degree of vertex $v$, i.e. the number of edges incident to $v$;
2) $D_v = \sum_{u \in V(G)} d(u, v)$, where $d(u, v)$ is the distance between vertices $u$ and $v$, i.e. the length of the shortest path between $u$ and $v$.
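As an illustration of these two vertex invariants, the following sketch computes the degree of a vertex and, by breadth-first search, the distances $d(u, v)$ together with their sum over all $u$ (our reading of the second invariant). The edge-list representation, the function names, and the example graph (the carbon skeleton of 2-methylheptane) are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Adjacency lists of a simple undirected graph given as vertex pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def degree(adj, v):
    """Number of edges incident to vertex v."""
    return len(adj[v])


def distances_from(adj, source):
    """Shortest-path lengths from source to every vertex (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_sum(adj, v):
    """Sum of the distances from v to all vertices of a connected graph."""
    return sum(distances_from(adj, v).values())


# Hypothetical example: hydrogen-suppressed graph of 2-methylheptane,
# main chain 0-1-2-3-4-5-6 with a methyl group (vertex 7) on vertex 1.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 7)]
ADJ = build_adjacency(EDGES)
```

Both invariants depend only on the graph, so any bond additive descriptor built from them can be evaluated from the edge list alone.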
The set of 48 variable Adriatic indices is defined in Ref. 1; the motivation of the definition, the choice of functions, and the restrictions on the parameter $a$ are discussed in detail in the cited literature. In the notation of that definition, each index is determined by a choice of functions such as $\psi_{i,a}$ and $\gamma_j$.
One can note that some famous molecular descriptors, such as the Randić index (Ref. 2) and the Zagreb index (Ref. 3), are variable Adriatic indices. Namely, the Randić index is obtained by the use of the functions $\psi_{2,-1/2}$ and $\gamma_1$, and the Zagreb index is obtained using the functions $\psi_{2,1}$ and $\gamma_1$.
In order to analyze the predictive properties of these indices, we use (similarly as in Ref. 1) the benchmark sets (Ref. 4) proposed by the International Academy of Mathematical Chemistry (Ref. 5). Namely, we observe four sets of chemical compounds:
1) the set of 18 octane isomers;
2) the set of 82 polyaromatic hydrocarbons (PAH);
3) the set of 209 polychlorobiphenyls (PCB);
4) the set of 22 phenethylamines (Phenet).
For the octane isomers, 16 properties and 102 descriptors are given; for the PAHs, 3 properties and 112 descriptors; for the PCBs, 8 properties and 106 descriptors; and for the phenethylamines, one property and 110 descriptors.
We exclude the melting point from our analysis since it does not depend predominantly on the graph of the molecule.
We shall compare the best coefficient of determination of the one-parameter linear models based on the variable Adriatic indices with:
1) the best coefficient of determination $R^2$ (equivalently, correlation coefficient $R$) of the one-parameter linear model based on the descriptors in the benchmark sets;
2) the best coefficient of determination $R^2$ (equivalently, correlation coefficient $R$) of the one-parameter linear model based on the discrete Adriatic indices (Refs. 1, 6).
Note that this comparison is not completely fair. Namely, a linear one-parameter model based on a "nonvariable" descriptor $\mathrm{des}$ depends on only two parameters to predict the observed property $\mathrm{prop}$: the model $\mathrm{prop} = k \cdot \mathrm{des} + l$ depends solely on $k$ and $l$. On the other hand, a model based on a variable descriptor $\mathrm{des}(a)$ depends on three parameters to predict the observed property $\mathrm{prop}$, since

$$\mathrm{prop} = k \cdot \mathrm{des}(a) + l$$

depends on $a$, $k$ and $l$. Hence, the same $R^2$ does not imply equally good predictive properties, because here we have one more fitted parameter.
Moreover, suppose that some discrete Adriatic index has better predictive properties than the best benchmark descriptor. In this case, it is very much expected that the corresponding variable Adriatic index will make some improvement to $R^2$. Taking all of this into account, we are not interested in variable Adriatic indices that make modest improvements to $R^2$, but only in descriptors that make significant improvements to $R^2$. It will be shown that there are three cases in which a large improvement of $R^2$ occurs.
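The one-parameter linear models compared above can be made concrete with a short sketch of an ordinary least-squares fit that returns the slope, the intercept and the coefficient of determination. The function name `fit_linear` and the toy numbers are hypothetical; they are not values from the benchmark sets.

```python
def fit_linear(des, prop):
    """Least-squares fit prop ~ k * des + l.

    Returns (k, l, r2), where r2 is the coefficient of determination.
    Raises ValueError if either input is constant, since the correlation
    is then undefined.
    """
    n = len(des)
    mean_x = sum(des) / n
    mean_y = sum(prop) / n
    sxx = sum((x - mean_x) ** 2 for x in des)
    syy = sum((y - mean_y) ** 2 for y in prop)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(des, prop))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("descriptor and property must both vary")
    k = sxy / sxx                      # fitted slope
    l = mean_y - k * mean_x            # fitted intercept
    r2 = (sxy * sxy) / (sxx * syy)     # squared correlation coefficient
    return k, l, r2


# Hypothetical toy data: four descriptor values and a noisy property.
TOY_DES = [1.0, 2.0, 3.0, 4.0]
TOY_PROP = [3.1, 4.9, 7.2, 8.8]
TOY_K, TOY_L, TOY_R2 = fit_linear(TOY_DES, TOY_PROP)
```

For a fixed descriptor only `k` and `l` are fitted. For a variable descriptor the value of $a$ is tuned as well before this fit is evaluated, which is the extra degree of freedom discussed above.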

MAIN RESULTS
Note that in the Adriatic descriptors the parameter $a$ is chosen from an infinite set of values (moreover, from a set of cardinality $\mathfrak{c}$, i.e. the cardinality of the set of real numbers). Hence, it is not possible to calculate the correlation for each of these values. Mathematical optimization of $R^2$ would be quite involved, and the resulting equations would not be exactly solvable for most of these descriptors (since they involve logarithms and exponential functions). Hence, we use the following strategy: we restrict ourselves to a (sufficiently large) discrete set of values of $a$ rather than the entire set of real numbers.
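The strategy of scanning a discrete set of parameter values can be sketched as follows. This is a minimal illustration and not the C++ program used for the paper: the grid (step 0.01 on [-2, 2]), the stand-in descriptor `degree_power_sum`, and the degree sequences are all assumptions chosen only to make the example runnable.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line ys ~ k * xs + l, or None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return None  # constant descriptor or property: no correlation
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def best_parameter(descriptor, molecules, prop, grid):
    """Return (a, R^2) for the grid value a that maximizes R^2."""
    best_a, best_r2 = None, -1.0
    for a in grid:
        xs = [descriptor(m, a) for m in molecules]
        r2 = r_squared(xs, prop)
        if r2 is not None and r2 > best_r2:
            best_a, best_r2 = a, r2
    return best_a, best_r2


def degree_power_sum(degrees, a):
    """Hypothetical stand-in for a variable descriptor: sum of d**a."""
    return sum(d ** a for d in degrees)


# Hypothetical grid; the set actually used in the paper is not reproduced.
GRID = [i / 100 for i in range(-200, 201)]

# Degree sequences of the carbon skeletons of five octane isomers.
MOLECULES = [
    [1, 1, 2, 2, 2, 2, 2, 2],  # n-octane
    [1, 1, 1, 3, 2, 2, 2, 2],  # 2-methylheptane
    [1, 1, 1, 1, 4, 2, 2, 2],  # 2,2-dimethylhexane
    [1, 1, 1, 1, 3, 3, 2, 2],  # 2,3-dimethylhexane
    [1, 1, 1, 1, 1, 1, 4, 4],  # 2,2,3,3-tetramethylbutane
]
```

Calling `best_parameter(degree_power_sum, MOLECULES, prop, GRID)` with a list `prop` of measured values returns the best grid point and its $R^2$. Grid values for which the descriptor is the same for every molecule (here $a = 0$ and $a = 1$) are skipped, since the correlation is undefined there.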
In the following four tables (Tables 1-4) we summarize the results (obtained using a C++ program) of the comparison of the best correlations of the one-parameter linear models. In the second column, the highest $R^2$ value for one-parameter linear models based on the benchmark set of descriptors is given. The highest $R^2$ for one-parameter linear models based on the discrete Adriatic indices is given in the third column. In the last column, the highest $R^2$ for one-parameter linear models based on the variable Adriatic indices is given. Detailed tables with the names and values of these descriptors can be found in the supplementary materials.
Analysis of these four tables shows that significant improvements occur only in the first table (i.e. when the octane isomers are considered). These improvements correspond to the following three properties: standard enthalpy of formation, total surface area and octanol-water partition coefficient.
The result for the octanol-water partition coefficient is especially interesting. Note that there was a very low correlation between this property and each of the indices in the benchmark set. The discrete Adriatic indices made some progress, but the correlation coefficient was still very low. In contrast, an almost perfect correlation has been obtained for the sum 0.37-exdeg index.
We present these correlations in Table 5 (in the left column we present predictions by the best predictor in the benchmark set and in the right column we present predictions by the best predictor among the variable Adriatic indices; $R^2$ is given on each of the drawings).
From Table 5, it is evident that the inverse sum -1.95-deg index, the inverse sum 0.43-lodeg index and the sum 0.37-exdeg index strongly correlate the properties of molecules with their structure; therefore, they may be a step forward in QSPR studies.
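Note that each of these indices can be reformulated as

$$\sum_{1 \le i \le j \le \Delta} f(i, j)\, m_{ij},$$

where $\Delta$ is the maximal degree of the graph $G$, $m_{ij}$ is the number of edges connecting vertices of degrees $i$ and $j$, and $f(i, j)$ is the contribution of one such edge to the index (Refs. 9-17). Further, the sum 0.37-exdeg index can be reformulated as

$$\sum_{uv \in E(G)} \left( 0.37^{d_u} + 0.37^{d_v} \right) = \sum_{v \in V(G)} d_v \cdot 0.37^{d_v}.$$

Hence, this index can be regarded not only as a bond additive index, but also as a vertex additive index, which is much simpler. Further, if we denote by $n_i$ the number of vertices of degree $i$, this index can be reformulated as

$$\sum_{i=1}^{\Delta} i \cdot 0.37^{i} \cdot n_i.$$

In the case of chemical graphs (maximal degree at most 4), this reduces to

$$0.37\, n_1 + 2 \cdot 0.37^{2}\, n_2 + 3 \cdot 0.37^{3}\, n_3 + 4 \cdot 0.37^{4}\, n_4.$$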
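The equivalence of the bond additive and vertex additive forms can be checked numerically. The sketch below assumes that the sum $a$-exdeg index is the sum of $a^{d_u} + a^{d_v}$ over all edges $uv$; the function names and the example graph (the carbon skeleton of 2,2,4-trimethylpentane) are hypothetical choices made for illustration.

```python
from collections import Counter


def vertex_degrees(edges):
    """Map each vertex to its degree in the graph given by an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def sum_exdeg_bond(edges, a):
    """Bond additive form: sum over edges uv of a**d_u + a**d_v."""
    deg = vertex_degrees(edges)
    return sum(a ** deg[u] + a ** deg[v] for u, v in edges)


def sum_exdeg_vertex(edges, a):
    """Vertex additive form: sum over vertices v of d_v * a**d_v."""
    return sum(d * a ** d for d in vertex_degrees(edges).values())


def sum_exdeg_counts(edges, a):
    """Form using n_i, the number of vertices of degree i."""
    n = Counter(vertex_degrees(edges).values())
    return sum(i * a ** i * n_i for i, n_i in n.items())


# Hypothetical example: 2,2,4-trimethylpentane, main chain 0-1-2-3-4,
# methyl groups 5 and 6 on vertex 1, methyl group 7 on vertex 3.
ISOOCTANE = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (1, 6), (3, 7)]
```

The three functions agree because a vertex of degree $d_v$ lies on exactly $d_v$ edges, so its term $a^{d_v}$ is counted $d_v$ times in the sum over edges.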

Table 1. Analyses of properties in the set of the octane isomers
Croat. Chem. Acta 84 (2011) 87.

Table 2. Analyses of properties in the set of the polyaromatic hydrocarbons

Table 3. Analyses of properties in the set of the polychlorobiphenyls

Table 4. Analyses of the biological activity in the set of the phenethylamines

Table 5. Predictions made by the best descriptor in the benchmark set and by the best predictor in the set of variable Adriatic indices for octane isomers.