How Reliable are Models Based on Topological Index 3χv for the Prediction of Stability Constants?

The theoretical models based on the valence connectivity index of the 3rd order, 3χv, are discussed in terms of their ability to predict the stability of coordination compounds. The key factors for success are: (1) the choice of reliable experimental data for the calibration of the model, (2) writing an appropriate constitutional formula (i.e. graph) of the complex, and (3) development of a proper form of the regression function. If these requirements are met, it is possible to obtain theoretical results commensurable with the experimental ones, i.e. of sufficient quality to evaluate experimental methods or to propose the best values for stability constants.


INTRODUCTION
THE FIRST systematic application of topological indices to the prediction of stability constants of coordination compounds occurred in 1999, when our research group published a paper on copper(II) chelates with N-alkylated glycines, attempting to correlate measured stability constants with four topological indices: the Wiener number (W) and three consecutive valence-connectivity indices (1χv, 2χv, 3χv). [1] The next paper in the series, [2] dealing with mixed amino acid complexes, confirmed that the best topological index for this purpose is the valence-connectivity index of the 3rd order, 3χv, and thus we developed our further models exclusively on it.
There are many factors that determine the stability of coordination compounds. Analysis of these factors led first to general rules (Irving-Williams order, HSAB model, rules for the chelate, trans- and ring-size effects, etc.), and later to many theoretical methods of various levels of sophistication (molecular mechanics, DFT and various kinds of QSPR models). [3] However, these methods use many molecular descriptors (e.g. QSPR) [4,5] or deal with specific interactions (e.g. MM). [6,7] From this point of view, the use of a single molecular descriptor, the valence-connectivity index of the 3rd order, to represent the whole variety of interactions determining the stability of a complex looks a bit naive. [10][11][12] The vague interpretation of graph-theoretical indices led to a sceptical attitude of many chemists towards them, and especially towards their application in QSPR and QSAR. [13,14] In spite of that, we exploited the advantage of models with one descriptor and applied the 3χv index, as the sole descriptor, to predict the stability constants of a variety of coordination compounds (Table 1), ranging from copper(II), nickel(II), and other heavy metal (including lanthanide) chelates with diamines, triamines, amino acids and their N-alkylated and fructose derivatives, [15] to the complexes of monocarboxylic acids [16] and smaller peptides. [17,18] The models proved reasonably good, with a typical standard error (S.E.) of about 0.3 log units.
The aim of this paper is not, however, to give a comprehensive review of all the applications and variants of our method; we will rather focus on two vital points. The first is the evaluation of experimental data; the second is, as the title suggests, to explore the suitability of our models for predicting stability constants with appropriate accuracy.
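For readers unfamiliar with the descriptor, the following minimal sketch shows how a third-order path connectivity index can be computed from a hydrogen-suppressed graph. It assumes the standard Kier-Hall path definition (a sum, over all paths of three bonds, of the inverse square root of the product of the four vertex valence deltas) and takes the deltas of C, N and O simply as the number of valence electrons minus the number of attached hydrogens. The function name `chi3_path` and the vertex labels are hypothetical; the treatment of metal atoms and the exact index variant used in the cited papers are not reproduced here.

```python
from math import sqrt


def chi3_path(bonds, delta):
    """Third-order path connectivity index of a hydrogen-suppressed graph.

    bonds: iterable of (i, j) vertex pairs (each bond listed once or more)
    delta: dict mapping each vertex to its valence delta
    """
    neighbours = {}
    edges = set()
    for i, j in bonds:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
        edges.add(frozenset((i, j)))

    total = 0.0
    # Every path a-b-c-d has exactly one central bond b-c, so looping over
    # each bond once and over its outer neighbours counts each path once.
    for edge in edges:
        b, c = tuple(edge)
        for a in neighbours[b] - {c}:
            for d in neighbours[c] - {b}:
                if a != d:  # four distinct vertices: excludes three-membered rings
                    total += 1.0 / sqrt(delta[a] * delta[b] * delta[c] * delta[d])
    return total


# n-Butane, C1-C2-C3-C4: valence deltas 1, 2, 2, 1 and a single three-bond path.
butane_bonds = [(1, 2), (2, 3), (3, 4)]
butane_delta = {1: 1, 2: 2, 3: 2, 4: 1}
print(chi3_path(butane_bonds, butane_delta))  # 0.5

# Free glycine, H2N-CH2-C(=O)-OH, as a ligand graph (illustrative labels).
glycine_bonds = [("N", "Ca"), ("Ca", "C"), ("C", "O"), ("C", "OH")]
glycine_delta = {"N": 3, "Ca": 2, "C": 4, "O": 6, "OH": 5}
print(chi3_path(glycine_bonds, glycine_delta))
```

The index depends only on the graph and the vertex deltas, which is why the choice of constitutional formula for a complex changes the descriptor value.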

Case 1: Evaluation of Two Electroanalytical Methods
Here we present our research on the comparison and evaluation of stability constants obtained by two electroanalytical methods, namely glass electrode potentiometry (GEP) and square wave voltammetry (SWV). The methods were used to measure the stability constants of copper(II) mono- and bis-complexes with alanine and its five N-alkylated derivatives. [19] All the constants were measured at the same temperature (T = 298 K), but in slightly different background electrolytes: I(GEP) = 0.1 mol L⁻¹ (KNO3) and I(SWV) = 0.15 mol L⁻¹ (NaClO4). Nevertheless, the results differed considerably. The constants for mono-complexes, log K1, measured by the two methods differ by 0.01-0.64 log units (mean = 0.26). The constants obtained by GEP were lower than those obtained by the SWV method. They also correlate with the SWV values, but less well than the values obtained by the best theoretical model (Figure 1).
Calculating the 3χv index from two constitutional formulas (i.e. graphs) of the complex (see 3.1), we obtained two linear regression models for each of the GEP and SWV methods. Better agreement between theory and experiment was achieved for the GEP than for the SWV method: S.E. = 0.17, S.E.cv = 0.33 for GEP and S.E. = 0.32, S.E.cv = 0.50 for SWV (the reported values are averages of both models). However, a similar analysis of the stability constants of bis-complexes (log β2) showed just the opposite: the SWV constants (S.E. = 0.43, S.E.cv = 0.73) were better reproduced than the GEP constants (S.E. = 0.53, S.E.cv = 0.97). (The reported values are averages of four models.) [20] Those findings were confirmed by an additional test. Instead of regression lines developed on copper(II) complexes with alanines, for the prediction of their constants we used regression lines developed on a similar system, copper(II) chelates with glycine and its N-alkylated derivatives [21] along with three aliphatic amino acids (Ala, Val, Leu). Although these constants were measured by GEP, the same result was obtained. Namely, comparison of theoretical with experimental GEP and SWV values yielded rms 0.21 and 0.47, respectively, for log K1 (the average of two models). The same comparison for log β2 (the average of four models) gave rms 0.79 and 0.64 for GEP and SWV, respectively.
These findings were further elaborated by the analysis of both experimental methods. [20] Data for log K1(SWV) were measured around the detection limits, whereas the log K1(GEP) constants were determined in the pH range where the response of the glass electrode is due only to the formation of ML species.
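The statistics quoted in this case (S.E., S.E.cv and rms) can be illustrated with a short sketch. It assumes that S.E.cv is obtained by leave-one-out cross-validation and that both errors use an N - 2 denominator; the text does not state either convention, so both are assumptions. The function names and the data are hypothetical and serve only to show the mechanics.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_error(x, y):
    """S.E. of the fit: sqrt(sum of squared residuals / (N - 2))."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return sqrt(ss_res / (len(x) - 2))


def standard_error_cv(x, y):
    """Leave-one-out S.E. (assumed convention): each point is predicted
    from a line fitted to all the other points."""
    press = 0.0
    for k in range(len(x)):
        slope, intercept = fit_line(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        press += (y[k] - (slope * x[k] + intercept)) ** 2
    return sqrt(press / (len(x) - 2))


def rms(predicted, observed):
    """Root-mean-square difference between predicted and observed constants."""
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical 3χv values and log K1 constants, for illustration only.
chi = [1.20, 1.45, 1.70, 1.95, 2.30, 2.60]
log_k1 = [8.20, 8.05, 7.80, 7.72, 7.40, 7.25]

slope, intercept = fit_line(chi, log_k1)
predicted = [slope * c + intercept for c in chi]
print("S.E.   =", round(standard_error(chi, log_k1), 3))
print("S.E.cv =", round(standard_error_cv(chi, log_k1), 3))
print("rms    =", round(rms(predicted, log_k1), 3))
```

With these conventions S.E.cv can never be smaller than S.E., which matches the pattern of the values reported above.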

Case 2: Comparison with DFT Method
The second example is related to the study of copper(II) binding to aromatic ligands with a common core of the thioflavin T (ThT) and clioquinol (CQ) molecule, which were investigated as potential drugs against Alzheimer's disease (Figure 2). [22,23] The authors applied various theoretical methods to those systems, calculating stability constants of bis-complexes (log β2) by DFT, along with HOMA and aromatic indices (ING), [22] but unfortunately they measured the stability constants of only two of the ten complexes. [23] (... , respectively). [23] Linear correlation of the DFT constants with the 3χv index yielded S.E. = 0.85 and S.E.cv = 0.97 (r = 0.988) for all ten complexes. [24] However, the thia-complexes could be regarded as a separate group, and if they are discarded (N = 7), S.E. drops to 0.30, with the absolute values of the residuals in the range 0.0 to 0.6 (Figure 2).
The advantage of our method is its simplicity; its disadvantage is its inability to predict stability constants without a proper set of experimental data. The DFT method, however, should be capable of predicting experimental data without any experimental constants. As in this case it failed to do so, it cannot be judged superior to our method.

RELIABILITY
As was said before, methods based on the valence connectivity index of the 3rd order reproduce experimental constants with a typical standard error of 0.3 log units. But the success depends on the one hand on the quality of the experimental data and on the other on the quality of the regression model. It is recommended to use values of stability constants from the same paper, or at least from the same research group. Despite the standardization of methods and experimental conditions (temperature, ionic strength, and background electrolyte), the best constants ("recommended", according to IUPAC criteria) are determined with S.D. ≤ 0.05 log units, and the majority (denoted as "tentative") with 0.05 < S.D. < 0.2 log units. [25] In a test case, constants determined in different laboratories differed by up to 0.3 log units, [26] which is close to the spread of log K1 values for the copper(II)/glycine system (T = 298 K, I = 0.1 mol L⁻¹, GEP), reported in the range 8.07-8.38 log units (all the values were denoted as "tentative"). [27] An additional problem in choosing an appropriate set of stability constants is the usual practice of researchers to focus their attention on ligands rather than on metals; thus a single paper usually reports stability constants of a few ligands with many metals.
However, the wide range of values of the measured stability constants opens a possibility to test our method. From the histogram of the 14 above-mentioned "tentative" log K1 values for copper(II) monoglycinate, the most probable value should be settled at log K1 = 8.21 (Figure 3). (Note that the distribution is not Gaussian!) An analogous histogram (Figure 4) for seven theoretical (estimated) values gave log K1 = 8.19; the difference is only 0.02 log units! ("The best" value for T = 298 K, I = 0.1-0.2 mol L⁻¹ was reported as log K1 = 8.20 ± 0.10.) [27]
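One simple way to read a "most probable value" from a histogram is to take the most populated bin and average the values that fall into it. The sketch below does only that; the bin width, the function name and the eight input values are hypothetical (they merely span the reported 8.07-8.38 range) and are not the literature constants behind Figures 3 and 4, nor necessarily the procedure used to obtain 8.21 and 8.19.

```python
from collections import defaultdict


def most_probable_value(values, bin_width=0.05):
    """Mean of the values in the most populated histogram bin.

    Bins are centred on multiples of bin_width; in case of a tie the bin
    encountered first in the input wins.
    """
    bins = defaultdict(list)
    for v in values:
        bins[round(v / bin_width)].append(v)
    fullest = max(bins.values(), key=len)
    return sum(fullest) / len(fullest)


# Made-up log K1 values for illustration only.
hypothetical_log_k1 = [8.07, 8.12, 8.19, 8.20, 8.21, 8.22, 8.30, 8.38]
print(round(most_probable_value(hypothetical_log_k1), 3))
```

Unlike the arithmetic mean, this estimate is insensitive to a few outlying measurements, which matters when the distribution is not Gaussian.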

Problem of the Proper Constitutional Formula
The first problem in applying models based on topological indices to coordination compounds is the construction of an appropriate constitutional formula (or graph). In contrast to organic compounds, which are defined by their constitution, coordination compounds are actually defined by their composition; the stability constant K1(ML) refers to the equilibrium M + L ⇄ ML and to nothing else. The structure of the complex is usually unknown, and even if it has been determined in the crystal state it is doubtful whether the complex persists in that form in solution.
There is a simple way to find out, on the one hand, the proper structure of the complex in the dissolved state, and on the other to prove the soundness and reliability of our models. In the first paper on the systematic application of the 3χv index to coordination compounds [28] we checked four constitutional formulas, i.e. corresponding graphs. The first was the graph of the free ligand (L), the second was the graph of the metal-ligand complex (ML), and the third corresponds to the metal-ligand complex with two ligated water molecules, MLaq. The fourth graph was based on the presumption that the side atoms of some ligands also bind to the metal, or otherwise influence its coordination sphere (MLcor). Namely, it was assumed that the terminal atom of the side chain was bound to the metal, and the corresponding molecular graph was constructed from this assumption. For copper(II) 1,2-diaminoethane complexes (N = 14), 3χv(MLcor) proved best (S.E.cv = 0.38) and 3χv(MLaq) gave acceptable results (S.E.cv = 0.49). However, 3χv(ML) yielded S.E.cv = 0.54, and for 3χv(L) S.E.cv rose to 0.62 log units. A similar trend was observed for log β2 constants, and also for both log K1 and log β2 constants of copper(II) amino acid complexes. It was even shown that some presumably existing bonds between metal and ligand should be removed from the molecular graph, as in the case of copper(II) complexes with diethylenetriamines [29] or copper(II) and nickel(II) complexes with N-phenylimidoacetic acid. [30] For cadmium(II) mono-complexes with monocarboxylic acids (N = 9), excellent results (r = 0.983, S.E. = 0.05, S.E.cv = 0.06) were obtained on the supposition that only one ligand, 2-hydroxybutanoic acid, is bidentate (the other two 2-hydroxycarboxylic acids were taken as monodentate). [31] In contrast, if all the ligands were taken as monodentate, linear regression gave r = 0.778, S.E.cv = 0.26.
[Figure: Regression models for aromatic ligands with a common thioflavin and clioquinol core (Ref. [24]). After discarding three thia-complexes, S.E. drops from 0.85 to 0.30, and S.E.cv from 0.97 to 0.39.]

Problem of the Proper Regression Function
The most common regression functions that we used for the prediction of stability constants are the linear and quadratic ones. They are also the simplest, but by using indicator variable(s) we succeeded in developing linear models even for rather complex systems. Assuming that the regression lines for copper(II) and nickel(II) complexes are parallel, it was possible to propose a common model for copper(II) and nickel(II) bis-complexes with amino acids, which yielded an even (slightly) better S.E.cv value (0.24 log units) than the separate models for each metal. [32] That approach was later routinely applied in the common models for copper(II) and nickel(II) complexes of iminodiacetates and pyridyl derivatives of aspartic acid, [30] N-salicylidene-aminoacidato complexes with Cu²⁺, Ni²⁺ and Zn²⁺, [33] and for the prediction of the stability of copper(II)/peptide complexes. [17,18,34] By taking one of the stability constants as a reference, it was possible to build common models for complexes of five metals (Co²⁺, Ni²⁺, Cu²⁺, Zn²⁺, Cd²⁺) with four monocarboxylic acids (methanoic, ethanoic, propanoic and butanoic) [16] or for copper(II) complexes with tripeptides containing glycine, histidine and glutamic acid residues. [35]
However, we have also used models of higher complexity. The first is a polynomial model with 3χv, r and 3χv·r variables (r stands for the atomic radius of the central atom) for lanthanide complexes. [36] The second are models with variables calculated as differences between the 3χv values of various molecules. [37] Unfortunately, despite the many models checked, the general form of the regression function has not yet been found. The majority of regression lines have a negative slope; actually the sole exception is the regression function for the cadmium(II) mono-complexes with monocarboxylic acids, which has a positive slope. [31] It seems that the form of the regression function is determined by the nature of the ligand and consequently by its interactions with the central atom. In our recent paper [38] we have shown that nonpolar amino acids best fit a line of negative slope, whereas polar amino acids are better fitted by a parabola. By using such a division, we were able to reproduce log K1 and log β2 values with S.E.cv of 0.03 and 0.06 log units, respectively.
[Figure: ... [39] (Tables 3 and 4), [20] (Table 6) and [38] (from linear model for aliphatic and both linear and quadratic model for polar amino acids). Mean values: 8.25 (N = 7), 8.19 (8.15-8.24 range, N = 4).]
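The common model with an indicator variable described above amounts to fitting log β = a·3χv + b·I + c, where I is 0 for one metal and 1 for the other, so that both metals share the slope a and differ only by the offset b. The following is a minimal sketch of such a fit through the normal equations; the function names and the synthetic data are hypothetical, and the exact variable coding used in the cited papers is not known from the text.

```python
def solve(a, b):
    """Solve the square linear system a.x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_common_model(chi, indicator, log_beta):
    """Least-squares fit of log_beta = slope * chi + offset * indicator + intercept.

    Returns (slope, offset, intercept); both metals share the same slope.
    """
    rows = [[c, i, 1.0] for c, i in zip(chi, indicator)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    xty = [sum(r[p] * y for r, y in zip(rows, log_beta)) for p in range(3)]
    slope, offset, intercept = solve(xtx, xty)
    return slope, offset, intercept


# Synthetic data on two exactly parallel lines (indicator 0 and 1).
chi = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
indicator = [0, 0, 0, 1, 1, 1]
log_beta = [-1.5 * c + 2.0 * i + 12.0 for c, i in zip(chi, indicator)]
print(fit_common_model(chi, indicator, log_beta))
```

Pooling the two metals in this way doubles the number of points per fitted parameter, which is one plausible reason why a common model can give a lower S.E.cv than two separate lines.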

Problem of the Range of Stability Constants
As we tried our models on many systems, we performed regressions on many sets of stability constants, with varying success. In short, the S.E. of the models varied from 0.03 to as much as 1.39 log units, but the difference between the highest and lowest constant employed in a regression (Δlog K) also varied considerably, from 0.32 to 30.62 log units. However, a plot of the S.E./Δlog K ratio against the range of stability constants (Figure 5) reveals that the relative standard error (S.E./Δlog K) is in the range 0.02-0.05 for most cases. That means that our models are generally capable of predicting stability constants with an S.E. of about two to five percent of the range of constants employed in the regression. Regression of the function S.E./Δlog K = a/Δlog K (Figure 5) reveals the grouping of points around the curve a ≡ <S.E.> = 0.06 up to Δlog K = 10. The separate group of points with Δlog K > 15 corresponds to the dissociation constants of peptide complexes (i.e. constants for ML, MLH-1, MLH-2 etc. complexes calculated for separate ligands, L). [17,18,34,35]
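The relative standard error used in this section can be written out explicitly. The worked numbers below are illustrative only (a typical S.E. of 0.30 log units combined with an assumed range of 6.0 log units); they are not read from Figure 5.

```latex
\[
\Delta\log K = \log K_{\max} - \log K_{\min},
\qquad
\frac{\mathrm{S.E.}}{\Delta\log K} = \frac{0.30}{6.0} = 0.05
\]
```

A model with the same absolute S.E. therefore looks twice as good in relative terms when the calibration set spans twice the range of constants.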

CONCLUSION
The method for the prediction of stability constants of coordination compounds from topological indices is strictly empirical. That means it cannot be developed for unrelated systems, i.e. for compounds differing much in the structure of the ligand or in the nature of the central atom. On the other hand, it is a very valuable tool for studying the stability of molecules composed of similar metals (e.g. Cu(II) and Ni(II)) and similar ligands (e.g. α-amino acids and their N-alkylated derivatives). In this case it is possible, as was shown in this contribution, to obtain results of the same quality as those worked out by the DFT method, to compare the reliability of methods for the determination of stability constants, or to find the best estimate of their values. Therefore we hope that this method, simple in both the conceptual and the computational sense, will find its way to the people dealing with stability constants, especially with their measurement and refinement.

(a) Other topological indices were also used.
(b) Aromatic ligands unrelated to amino acids.
[Figure 5: S.E./Δlog K plotted against Δlog K.]