GRAPH-THEORETICAL CONCEPTS AND PHYSICOCHEMICAL DATA

Graph-theoretical concepts have been used to model the molecular polarizabilities of fifty-four organic derivatives and the induced dipole moments of a set of fifty-seven organic compounds divided into three subsets. The starting point of these modeling strategies is the hydrogen-suppressed chemical graph and pseudograph of a molecule, which works very well for second-row atoms. From these types of graphs a set of graph-theoretical basis indices, the molecular connectivity indices, can be derived and used to model properties and activities of molecules. With the aid of the molecular connectivity basis indices it is then possible to build higher-order descriptors. The problem of 'graph' encoding the contribution of the inner-core electrons of heteroatoms can here be solved with the aid of odd complete graphs, $K_p$-(p-odd). The use of these graph tools allows an optimal modeling of the molecular polarizabilities and a satisfactory modeling of the induced dipole moments of a wide set of organic derivatives.


INTRODUCTION
The easiest way to keep us from drowning in the rising sea of experimental data is to compress these data into algorithms; the simpler the algorithm, the better. The advantages of this procedure are several, e.g., it allows the development of software to enable drug testing in silico. A theory is needed to tell us how to compress data into algorithms. One very successful theory, chemical graph theory, was introduced in chemistry more than a hundred years ago and was further refined during the second half of the twentieth century, giving rise to several theoretical branches spread over different fields of physical and pharmaceutical chemistry (Balaban, 1985; Kier & Hall, 1986, 1992; Hansen & Jurs, 1988; Trach, Devdariani & Zefirov, 1990; Randić, 1991; Trinajstić, 1992; Randić & Trinajstić, 1994; Basak & Grunwald, 1995; Temkin, Zeigarnik & Bonchev, 1996; Gutman, Klavzar & Mohar, 1997; Seybold, 1999; Balaban & Devillers, 1999; Randić, Mills & Basak, 2000; Pogliani, 2000; Klein & Brickman, 2000; Galvez, Garcia-Domenech & de Gregorio-Alapont, 2000; Estrada & Uriarte, 2001; Diudea, 2001; King & Rouvray, 2002). One branch of this theory, the molecular connectivity theory, developed in the 1970s and 1980s by Randić, Kier, and Hall (Randić, 1975; Kier & Hall, 1986), states that every chemical graph (cg) that has a set of graph-theoretical basis indices, {β}, also has the property P, i.e., {cg}({β} → P) in shorthand notation. This property also encompasses activity, A. Clearly, such an assertion has a probabilistic character.
In the following sections we will explain (i) what a chemical graph is, (ii) what graph-theoretical indices are, and (iii) how they can be used to model the values of the properties, P. The properties that will be modeled with the given graph theoretical concepts are the polarizability and the induced dipole moment of two heterogeneous sets of organic compounds taken from a recent molecular mechanics study (Ma, Lii & Allinger, 2000).

GRAPH THEORETICAL CONCEPTS
The following graph-theoretical concepts are needed in molecular connectivity theory for modeling properties and activities (for a quick reference see Rosen, 1995).
Graph, G = {V, E}: a graph can be defined as a set of vertices, V, and a set of edges, E, that connect these vertices. The degree of a vertex of a graph is the number of edges incident with it.
Pseudograph: a graph that allows for multiple connections and loops (or self-connections). These types of graphs allow a faithful encoding of a molecule, as it is possible to encode multiple bonds and non-bonding electrons with them. The loop at a vertex contributes twice to its degree.
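The degree rule for pseudographs can be illustrated with a minimal Python sketch. The function name `vertex_degrees` and the formaldehyde example are assumptions introduced here for illustration; they do not come from the original study.

```python
from collections import Counter

def vertex_degrees(edges):
    """Return the degree of every vertex of a (pseudo)graph.

    `edges` is a list of (u, v) pairs. A repeated pair is a multiple
    connection; a pair with u == v is a loop (self-connection).
    """
    degree = Counter()
    for u, v in edges:
        # Every edge adds one to each of its end points, so a loop
        # (u == v) adds two to the same vertex, as stated in the text.
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

# Hypothetical example: hydrogen-suppressed formaldehyde, H2C=O.
# Simple graph: only the sigma bond between C and O.
simple_graph = [("C", "O")]
# Pseudograph: sigma bond, pi bond, and one loop per lone pair on O.
pseudograph = [("C", "O"), ("C", "O"), ("O", "O"), ("O", "O")]

delta = vertex_degrees(simple_graph)      # simple-graph degrees
delta_v_ps = vertex_degrees(pseudograph)  # pseudograph degrees
print(delta, delta_v_ps)
```

Under these assumptions the oxygen atom has degree 1 in the simple graph and degree 6 in the pseudograph, since each of its two loops counts twice.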
Chemical (or molecular) Graph or Pseudograph: a graph or pseudograph representation of a chemical compound or molecule. Normally, but not always, chemical graph theory makes use of hydrogen-suppressed (or hydrogen-depleted) graphs or pseudographs, i.e., chemical graphs or pseudographs whose hydrogen atoms, if any, have been deleted, leaving only the non-hydrogen atoms, i.e., second- or higher-row atoms, whose principal quantum number is n ≥ 2. Throughout the present paper we will be concerned with these types of chemical graphs and pseudographs (Figure 1), even if they are just cited as graphs.
Complete Graphs: a graph G is complete (Figure 2) if every pair of its vertices is adjacent. A graph is r-regular if all its vertices have the same degree r. A complete graph of order p is denoted by $K_p$ and is r-regular, with r = p − 1.
Figure 2. From left to right: the $K_1$, $K_2$, $K_3$, $K_4$, and $K_5$ complete graphs.
To encode the inner-core electrons of heteroatoms, odd complete graphs, $K_p$-(p-odd), where p = 1, 3, 5, and 7, will be used. In Figure 3 the pseudograph-odd-complete graph for the CH₃Cl molecule is shown.
Figure 3. The pseudograph-odd-complete graph for the hydrogen-suppressed CH₃Cl molecule. The inner-core electrons of C and Cl are encoded with $K_1$ and $K_3$ complete graphs, respectively.
Adjacency Matrix of a graph: a square, symmetric matrix of order n, where n is the number of vertices (i.e., atoms) of the chemical graph, whose elements $g_{ij}$ equal one if vertices i and j of the graph are adjacent and zero otherwise. Self-connections are not allowed in the adjacency matrix of a graph, i.e., $g_{ii} = 0$. A pseudograph adjacency matrix encodes not only the features of a graph adjacency matrix but also has $g_{ii} = ps_{ii} \neq 0$, where $ps_{ii}$ encodes the pseudograph characteristics of the adjacency matrix. In this matrix $ps_{ii}$ equals the sum of the self-connections (or loops, which are counted twice) and multiple connections of vertex i. Thus, the hydrogen-suppressed pseudograph of a triatomic system, which also includes information about the odd complete graphs for the inner-core electrons, has the following adjacency matrix (Pogliani, 2002a):

$$A = \begin{pmatrix}
(p\cdot r+1)^{-1}_{K_p}\, ps_{11} & (p\cdot r+1)^{-1}_{K_p}\, g_{12} & (p\cdot r+1)^{-1}_{K_p}\, g_{13} \\
(p\cdot r+1)^{-1}_{K_p}\, g_{21} & (p\cdot r+1)^{-1}_{K_p}\, ps_{22} & (p\cdot r+1)^{-1}_{K_p}\, g_{23} \\
(p\cdot r+1)^{-1}_{K_p}\, g_{31} & (p\cdot r+1)^{-1}_{K_p}\, g_{32} & (p\cdot r+1)^{-1}_{K_p}\, ps_{33}
\end{pmatrix} \qquad (1)$$

Factor $(p\cdot r+1)^{-1}_{K_p}$ encodes the odd complete graph characteristics. This factor depends on the $K_p$ of each vertex, i.e., of each row. Its contribution renders the A matrix asymmetrical, as is evident from the following particular 3×3 matrix for CS₂ ($K_1$ for C and $K_3$ for S; atom superscripts denote the row):

$$A(\mathrm{CS_2}) = \begin{array}{c} \mathrm{S}^{1} \\ \mathrm{C}^{2} \\ \mathrm{S}^{3} \end{array}
\begin{pmatrix}
\tfrac{1}{7}\cdot 5 & \tfrac{1}{7}\cdot 1 & 0 \\
\tfrac{1}{1}\cdot 1 & \tfrac{1}{1}\cdot 2 & \tfrac{1}{1}\cdot 1 \\
0 & \tfrac{1}{7}\cdot 1 & \tfrac{1}{7}\cdot 5
\end{pmatrix} \qquad (2)$$

The term 1/1 ≡ 1 has been written to allow an easier decoding of the formalism. The vertex degree or valency of atom i, $\delta_i$, of a chemical graph is thus the number of simple connections (i.e., σ bonds) at i in a hydrogen-suppressed chemical graph, and is the sum of the $g_{ij}$ elements in a row of the matrix A. The vertex degree or valency of atom i, $\delta_i^v(ps)$ (valence delta), of a chemical pseudograph is instead the total number of connections, including self-connections (i.e., σ and π bonds and non-bonding electrons), at i in a hydrogen-suppressed chemical pseudograph. It is the sum of the $[g_{ij} + ps_{ii}]$ elements of a row of the A matrix.
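The row-wise scaling described above can be sketched in Python for CS₂. The atom ordering (S, C, S), the helper names `kp_factor` and `full_adjacency`, and the values of `ps` (5 for each S: one π bond plus two lone-pair loops counted twice; 2 for C: two π bonds) are assumptions derived here from the definitions in the text, not values quoted from the original.

```python
from fractions import Fraction

def kp_factor(p):
    """Factor 1 / (p*r + 1) for the odd complete graph K_p, with r = p - 1."""
    r = p - 1
    return Fraction(1, p * r + 1)

def full_adjacency(g, ps, p):
    """Scale every row i of the pseudograph adjacency matrix by the
    K_p factor of vertex i. The diagonal holds ps[i], off-diagonals g[i][j]."""
    n = len(g)
    return [
        [kp_factor(p[i]) * (ps[i] if i == j else g[i][j]) for j in range(n)]
        for i in range(n)
    ]

# Assumed vertex order for CS2: S, C, S.
g = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]          # sigma-bond adjacency
ps = [5, 2, 5]           # loops (counted twice) plus multiple connections
p = [3, 1, 3]            # K_3 for S, K_1 for C

A = full_adjacency(g, ps, p)
delta_v = [sum(row) for row in A]   # row sums of the full matrix
print(delta_v)
```

With these inputs the element linking S to C is 1/7 while the element linking C to S is 1, which is the asymmetry mentioned in the text; the row sums are 6/7 for each sulfur and 4 for carbon.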
The vertex degree or valency of atom i, $\delta_i^v$, of a chemical pseudograph plus the odd complete graph for the contribution of the inner-core electrons can thus be obtained directly with the aid of the following algorithm:

$$\delta^v = \frac{\delta^v(ps)}{p\cdot r + 1} \qquad (3)$$

It is practically the sum of all the elements in a row of the full A matrix (see matrix 2). For p = 1, $\delta^v = \delta^v(ps)$, and in alkanes $\delta^v = \delta^v(ps) = \delta$. For the different halogens of the halo-compounds we have p = 1, 3, 5, 7, i.e., $\delta^v$(F) = 7, $\delta^v$(Cl) = 7/7, $\delta^v$(Br) = 7/21, $\delta^v$(I) = 7/43. Parameter p⋅r in Eq. (3) equals $\sum_i \delta_i$ for the complete graph and is an interesting invariant in graph theory: the Handshaking theorem states that the sum of the vertex degrees equals twice the number of connections, since each connection is incident with two vertices and thus contributes twice to the sum of the vertex degrees (Rosen, 1995).
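A short Python sketch can check the two statements above: that p⋅r equals the degree sum of $K_p$ (twice its number of edges), and that the quoted halogen values follow from dividing $\delta^v(ps) = 7$ by p⋅r + 1. The function names are hypothetical and the formula is the one implied by the halogen values in the text.

```python
from fractions import Fraction
from itertools import combinations

def complete_graph_edges(p):
    """All edges of the complete graph K_p on vertices 0 .. p-1."""
    return list(combinations(range(p), 2))

def degrees(p, edges):
    """Vertex degrees of a simple graph given its edge list."""
    deg = [0] * p
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def valence_delta(delta_v_ps, p):
    """delta^v = delta^v(ps) / (p*r + 1), with r = p - 1."""
    r = p - 1
    return Fraction(delta_v_ps, p * r + 1)

# K_p is (p-1)-regular and its degree sum p*r is twice the edge count.
for p in (1, 3, 5, 7):
    edges = complete_graph_edges(p)
    deg = degrees(p, edges)
    assert all(d == p - 1 for d in deg)
    assert sum(deg) == p * (p - 1) == 2 * len(edges)

# Halogens: delta^v(ps) = 7 for all, p = 1, 3, 5, 7 for F, Cl, Br, I.
halogen_p = {"F": 1, "Cl": 3, "Br": 5, "I": 7}
delta_v = {x: valence_delta(7, p) for x, p in halogen_p.items()}
print(delta_v)
```

The fractions are reduced automatically, so 7/7 appears as 1 and 7/21 as 1/3.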

THE GRAPH THEORETICAL BASIS INDICES
The chosen subset of graph-theoretical basis indices, {β}, the raw material of QSPR (Quantitative Structure-Property Relationships) studies, is made up of the following medium-sized subset of eight molecular connectivity indices, which are defined within the frame of molecular connectivity theory (Kier & Hall, 1986):

$$\{\beta\} = \{D,\ D^v,\ {}^0\chi,\ {}^0\chi^v,\ {}^1\chi,\ {}^1\chi^v,\ \chi_t,\ \chi_t^v\} \qquad (4)$$

These indices are based on the $\delta$ and $\delta^v$ connectivity numbers of a hydrogen-suppressed graph and of a pseudograph plus $K_p$-(p-odd) graph (for the inner-core electrons), respectively, and their definitions are

$$D = \sum_i \delta_i \qquad (5)$$

$${}^0\chi = \sum_i (\delta_i)^{-1/2} \qquad (6)$$

$${}^1\chi = \sum_{ij} (\delta_i \delta_j)^{-1/2} \qquad (7)$$

$$\chi_t = \Big(\prod_i \delta_i\Big)^{-1/2} \qquad (8)$$

Index $\chi_t$ (and $\chi_t^v$) is the total molecular connectivity index. The sums in Eqs. (5) and (6), as well as the product in Eq. (8), are taken over all n vertices (i.e., atoms) of the chemical graph (i.e., molecule). The sum in Eq. (7) is over all edges (σ bonds in a molecule) of the chemical graph. By replacing $\delta$ with $\delta^v$ (see Eq. 3) in all these equations, the subset of valence indices is obtained.
With the aid of these basis indices it is then possible to construct, through trial and error, higher-order structural invariants, S, among which are the molecular connectivity terms, X = f(χ) (Pogliani, 2000). These terms have the general form of a rational function of the basis indices (Eq. 9). Here β is a basis index, and S = X if β = χ. Depending on the type of β basis indices, other higher-order indices can be constructed (Pogliani, 2002a; 2002b). The optimization parameters, a–d and m–s, can be negative, zero, or one. In these last two cases the rational function condenses into a much simpler form. As can be seen from Eq. (9), the power of each basis index is again optimized, which means that the original power (−1/2, see Eqs. (5–8)) loses its restrictive meaning.
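As an illustration, the four non-valence basis indices can be computed for a small graph in a few lines of Python. This is a sketch that assumes the usual molecular connectivity definitions (D as the sum of the degrees, the χ indices built on the −1/2 power); the function name `connectivity_indices`, the dictionary keys, and the propane example are choices made here.

```python
import math

def connectivity_indices(delta, edges):
    """Compute D, 0chi, 1chi and chi_t.

    delta : list of vertex degrees (delta or delta^v values)
    edges : list of (i, j) index pairs, one per sigma bond
    """
    return {
        "D": sum(delta),
        "chi0": sum(d ** -0.5 for d in delta),
        "chi1": sum((delta[i] * delta[j]) ** -0.5 for i, j in edges),
        "chi_t": math.prod(delta) ** -0.5,
    }

# Hypothetical example: hydrogen-suppressed propane, C-C-C.
# In alkanes delta^v = delta, so the valence indices are identical.
delta = [1, 2, 1]
edges = [(0, 1), (1, 2)]
indices = connectivity_indices(delta, edges)
print(indices)
```

Passing the $\delta^v$ values (for instance as `Fraction` or `float`) instead of $\delta$ gives the valence counterparts with the same function.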

THE STRUCTURE-PROPERTY RELATION
Two types of structure-property relation will be used here. Parameter Q has no absolute meaning, as it is an 'intra' statistical parameter able only to compare the descriptive power of different descriptors for the same property; further, this property should always be given in the same scale. The F ratio, which has the character of an 'inter' statistical parameter, tells us, even if Q improves, which additional descriptor endangers the statistical quality of the combination. For every index of a linear combination, as well as for $U_0$, the fractional utility, $u_i = c_i/s_i$, where $s_i$ is the confidence interval of $c_i$, is given, together with the average fractional utility, $\langle u \rangle = \sum u_i/(\nu+1)$. If the modeling relation is linear, then $\langle u \rangle = (u_1 + u_0)/2$. The utility statistics give indirect information about the role of a descriptor in the modeling equation, as they allow the detection of descriptors that give rise to unreliable coefficient values ($c_i$) whenever these have a high deviation interval ($s_i$). Recently (Pogliani, 2002b; 2002c), the critical importance of the standard deviation of the estimate, s, has been underlined, so that it is advantageous to know directly how much this statistic improves along a series of improved descriptors. For this reason we introduce here the ratio $s_R = s_0/s_i$, where $s_0$ is the s value of the best single-index description and $s_i$ refers to the s values of the improved sequential descriptions. Thus, a halving of $s_i$ can be read as a doubling of $s_R$, which allows a direct measure of the progress of s along a series of sequential descriptions. It should be stressed that, now, (i) all statistical parameters will grow with improved modeling, (ii) every model is under the control of all these statistics, and (iii) nothing justifies using an improved Q as a sign of improved modeling.
The richness in statistical parameters can also be used to detect possible printing errors, as redundancy is very useful in the construction of self-correcting codes. For an interesting discussion about the Q statistics see Todeschini (2001). To avoid bothering the reader with the dimensional problems of the modeling equation, every property P should be read as P/P°, where P° is the unitary value of the property, so that P can be read as a pure number (Berberan-Santos & Pogliani, 1999). Table 1 shows the experimental induced dipole moment values, the corresponding calculated ones, and the residual modulus, $|\Delta E| = |\mu(E) - \mu(C)|$. Throughout the present modeling study we will follow the division into subclasses proposed by a recent molecular mechanics study, which used different MM3 algorithms completed by quantum mechanical parameters (Ma, Lii & Allinger, 2000). At the present level of sophistication in molecular connectivity (MC) studies, and also, it seems, in MM3 studies, it is practically impossible to model the induced dipole moments for an entire class of heterogeneous compounds without diminishing the quality of the overall modeling. The studied compounds have a simple and nearly constant topology. Nevertheless, if the modeling of the entire class of compounds is attempted, the functional groups introduce consistent discontinuities in the quality of the model of this property. This can be clearly seen with the poor model of the subclass of aldehydes, ketones, acids, and esters, which is made up of four different subclasses.
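The utility and $s_R$ statistics are simple enough to sketch in Python. The function names are hypothetical, and taking the absolute value of the coefficient is an assumption made here because the utilities quoted in the paper are positive even for negative coefficients. The utilities (5.9, 16) are those reported further on for the aldehyde, ketone, acid, and ester subclass.

```python
def fractional_utility(coefficient, deviation):
    """u_i = c_i / s_i. The absolute value is an assumption of this
    sketch, since the utilities quoted in the text are all positive."""
    return abs(coefficient) / deviation

def average_utility(utilities):
    """<u> = sum(u_i) / (nu + 1): the mean over the nu indices plus U0."""
    return sum(utilities) / len(utilities)

def s_ratio(s0, s_i):
    """s_R = s0 / s_i, where s0 belongs to the best single-index description."""
    return s0 / s_i

# Utilities reported for the single X term plus U0 (nu = 1).
u = [5.9, 16]
mean_u = average_utility(u)   # (u1 + u0) / 2

# Halving s_i doubles s_R (s0 = 0.56 is the value quoted for that subclass).
s_same = s_ratio(0.56, 0.56)
s_half = s_ratio(0.56, 0.28)
print(mean_u, s_same, s_half)
```

The computed mean of the two utilities rounds to the reported value of 11, which is the kind of redundancy check that the abundance of statistics makes possible.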
The reader should not forget that for the following subclasses of compounds the odd complete graph algorithm gives rise to $\delta^v = \delta^v(ps)$, a value that can also be obtained with other algorithms based on atomic concepts (Kier & Hall, 1986; Pogliani, 2002a). The best descriptor for this particular property in this set of compounds is the following combination of basis indices (here, $s_0$ = 0.35). The attentive reader should keep an eye on the ${}^0\chi^v$ index, as it seems important for this and the next property. The correlation vector, C, of the last description will be used to derive the calculated $\mu(C)$ values and the corresponding residual modulus, $|\Delta E| = |\mu(E) - \mu(C)|$, for this class of compounds. These last two sets of values, $\mu(C)$ and $|\Delta E|$, shown in Table 1, underline the good quality of the modeling of this property for this class of compounds. The following interesting X term could also be detected,

Aldehydes, Ketones, Acids, and Esters
It is practically impossible to achieve a satisfactory description of the dipole moment of this class of compounds with the molecular connectivity indices alone. The best descriptor, which is nevertheless a rather poor one, is the following X term (here, $s_0$ = 0.56):
$X = [D^v - 1.4\cdot D]^{0.7}\,({}^0\chi)^{1.1}$: Q = 2.03, F = 35, r = 0.804, $s_R$ = 1.4, n = 21, $\langle u \rangle$ = 11, u = (5.9, 16), C = (−0.08643, 3.54828)
From the calculated and residual values of Table 1 we note that (i) the modeling is far from optimal, but (ii) these values are nevertheless not at all absurd, and a large deviation can only be detected for formic acid.

Sulfides and Phosphines
Table 1 also shows the experimental and calculated dipole moment values, as well as the residual modulus, for sulfides and phosphines. Now, due to the S and P atoms, with n = 3, the $K_p$-(p-odd) algorithm applies with p = 3, so that $\delta^v \neq \delta^v(ps)$ for these atoms. The calculated and residual values of sulfides and phosphines shown in Table 1 demonstrate the good quality of the model.
For the molecular polarizabilities, index ${}^0\chi^v$ is the best single-index descriptor throughout. The dominant character of this index allows us to use the forward combination procedure, or greedy algorithm (Pogliani, 2000), to derive the best combinations of indices, among which the following combination of four connectivity indices shows an exceptional modeling quality (here, $s_0$ = 1.15). It is also worth noting that the following optimal combination for $\langle\alpha(E)\rangle$ is also a good descriptor of the single $\alpha_1(E)$, $\alpha_2(E)$, and $\alpha_3(E)$ properties. Thus, this linear combination is, practically, validated by the good modeling of the $\alpha_i(E)$ polarizabilities; its utility vector is u = (12, 9.5, 7.7, 2.7, 3.4).
The following simple but interesting term, $X = [3\cdot{}^0\chi^v + {}^1\chi]$, could be detected, which has Q = 1.414, F = 1587, r = 0.984, $s_R$ = 1.6, n = 54, $\langle u \rangle$ = 21 for $\langle\alpha\rangle$. The relatively low utility value of the constant unitary index, $u_0$, is mainly due to the vanishingly small value of the corresponding regression parameter $c_0$. Small deviations near zero can have a dramatic effect on the utility value. The calculated $\langle\alpha(C)\rangle$ values are given in Table 2.
Table 2. Experimental $\langle\alpha(E)\rangle$, $\alpha_i(E)$ (i = 1–3), computed $\langle\alpha(C)\rangle$ molecular polarizabilities, and the corresponding residual modulus $\Delta\alpha$ of fifty-four organic compounds in units of Å³. $\langle\alpha_{loo}\rangle$ is the predicted value based on the leave-one-out method and $\Delta_{loo}$ is the corresponding residual.*
* $\langle\alpha(E)\rangle = \sum_i \alpha_i(E)/3$; $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the principal molecular polarizabilities. Some $\langle\alpha(E)\rangle$ values were computed with quantum methods (Ma, Lii & Allinger, 2000).
To avoid unreliable linear combinations due to collinearity among the basis indices, while maintaining their modeling power, it is advantageous to orthogonalize the corresponding basis indices or, rather, to obtain the orthogonal correlation coefficients of the correlation vector C(Ω). For example, for $\langle\alpha(C)\rangle$ there is no need to derive single $\Omega_i$ values, as these correlation coefficients can be obtained with the aid of the coefficients of the sequential regressions (Randić, 1991; Pogliani, 2000). Thus, the orthogonal correlation vector for $S(\Omega) = ({}^1\Omega, {}^2\Omega, {}^3\Omega, {}^4\Omega, U_0) \leftarrow S = ({}^0\chi^v, {}^1\chi, D^v, \chi_t^v, U_0)$ is: C(Ω) = (2.36719, 0.83838, −0.12524, 0.23527, 0.11242).
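The idea of orthogonalizing collinear descriptors by sequential regression can be sketched in Python as a Gram-Schmidt procedure on mean-centered descriptors. This is an illustrative stand-in under that assumption, not the exact procedure of the cited works; the function names and the numerical data are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(descriptors):
    """Return mutually orthogonal descriptors.

    Each descriptor is centered (which accounts for the constant U0 term)
    and then stripped of its projection on every earlier orthogonal
    descriptor, i.e. replaced by its residual from a sequential regression.
    """
    omegas = []
    for x in descriptors:
        m = mean(x)
        w = [xi - m for xi in x]
        for o in omegas:
            k = dot(w, o) / dot(o, o)
            w = [wi - k * oi for wi, oi in zip(w, o)]
        omegas.append(w)
    return omegas

def orthogonal_coefficients(omegas, prop):
    """Regression coefficient of the property on each orthogonal
    descriptor; each one is independent of the others."""
    return [dot(o, prop) / dot(o, o) for o in omegas]

# Hypothetical, strongly collinear descriptors and property values.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.0, 3.2, 3.9, 5.3]
prop = [2.0, 4.1, 6.2, 7.9, 10.3]

omegas = orthogonalize([x1, x2])
coeffs = orthogonal_coefficients(omegas, prop)
print(coeffs)
```

Because the orthogonal descriptors do not overlap, adding a further descriptor leaves the earlier coefficients unchanged, which is what makes them stable against collinearity.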

CONCLUSION
Graph-theoretical tools based on concepts defined within the framework of molecular connectivity theory are able to optimally model the mean polarizability, $\langle\alpha(E)\rangle$, of fifty-four organic compounds, including forty values of the polarizability components, $\alpha_1(E)$, $\alpha_2(E)$, and $\alpha_3(E)$. These last values have been 'externally' modeled with the best descriptor for $\langle\alpha(E)\rangle$. For the induced dipole moments the influence of the functional groups plays a critical role in determining the quality of the modeling, as already suggested by a molecular mechanics study (Ma, Lii & Allinger, 2000). Following a suggestion from the cited MM study, three different subclasses were chosen for the modeling study. The resulting modeling is rather encouraging for the subclass of alcohols, amines, and ethers and, even more so, for the subclass of sulfides and phosphines, which includes molecules with only two different functional groups. The modeling is unsatisfactory for the subclass {aldehydes, ketones, acids, esters}, which is made up of compounds with four different functional groups. The introduction of other types of connectivity indices, like semiempirical terms (Pogliani, 2000), might also help to improve the model. Actually, one of the main difficulties in molecular connectivity modeling, and in other modeling studies also, is mimicking the role played by the quantitatively unknown intermolecular interactions that in many cases shape the overall short- or long-lived supramolecular structures (Pogliani, 2002b; 2002c). Dipole moments play a bigger role than polarizability in shaping the overall topology of the supramolecular species, and this could be the main reason for the poor modeling of this property, especially for those subclasses of compounds made up of molecules with a large variety of functional groups.
A pivotal tool of the present modeling study is surely the introduction and use of odd complete graphs to encode the inner-core electrons of heteroatoms. This is in line with recent studies that have underlined the importance of these types of graphs in solving the problem of the inner-core electrons in chemical graph theory (Pogliani, 2002a), and especially in molecular connectivity theory. Other studies do not exclude the possibility of using sequential complete graphs, where p = 1, 2, 3, 4, ... (Pogliani, 2003a; 2003b).

ACKNOWLEDGMENTS
The author would like to thank the two anonymous referees for their valuable hints for improving the paper.