Correction of Charge-Transfer Indices for Multifunctional Amino Acids: Application to Lysozyme

Valence topological charge-transfer (CT) indices are applied to the calculation of the pH at the isoelectric point (pI). The combination of CT indices allows the estimation of pI. The model is generalized for molecules with heteroatoms. The ability of the indices to describe the molecular charge distribution is established by comparing them with the pI of 21 amino acids. Linear correlation models are obtained. The CT indices improve multivariable regression equations for pI. The variance decreases by 95%. No superposition of the corresponding G_k–J_k and G_k^V–J_k^V pairs is observed in most fits, which diminishes the risk of collinearity. The inclusion of heteroatoms in the π-electron system is beneficial for the description of pI, owing to either the role of the additional p orbitals provided by the heteroatom or the role of steric factors in the π-electron conjugation. The use of only CT and valence CT indices {G_k, J_k, G_k^V, J_k^V} gives limited results for modelling the pI of amino acids. Furthermore, the inclusion of the numbers of acidic and basic groups improves all models. The effect is especially noticeable for amino acids with more than two functional groups. The fitting line obtained for the 21 amino acids can be used to estimate the isoelectric point of lysozyme and its fragments, by only replacing (1 + Δn/n_T) with (M + Δn)/n_T. For lysozyme, the results of smaller fragments can estimate that of the whole protein with 1–13% errors.


indices that were mainly determined by the characteristics of the secondary functional groups in amino acids [1,2]. As this property is highly dependent on the type of side chain an amino acid has, the normal connectivity indices of set eight achieved a totally unsatisfactory modelling. The construction of the first fragmentary molecular connectivity indices was awkward. An entirely new and sound set of fragmentary molecular connectivity terms was proposed, which were derived with an easy trial-and-error procedure [3-5]. These terms are defined in the following way, where Δn = n_A − n_B, n_A is the number of acidic groups (two for Asp and Glu, one for all others), n_B is the number of basic groups (two for His and Lys, three for Arg and one for all others) and n_T = n_A + n_B is the total number of functional groups; notice that for n_T = 2, Δn = 0. Clearly there are eight such terms, following the type of index χ that enters in the numerator. The nomenclature for such terms can be defined in the following way: for χ = D^v, X ≡ ^DX^v, etc. The best single descriptor for pI is ^0X^v with Q = 2.12, F = 267, r = 0.966, s = 0.46, u = (16, 28). The statistics, especially the utility statistic, seem quite satisfactory. Now statistic Q can be improved at the expense of statistics F and u with the following linear combination of X terms made up of connectivity indices, which can be derived with the aid of both forward and full combinatorial techniques: [...] 53, F = 95, r = 0.980, s = 0.39, u = (3.1, 2.8, 4.7, 2.8, 26).
The average ⟨u⟩ drops from 22.4 to 7.9, the utility of ^0X^v drops dramatically, and only the unitary index maintains a good utility.
To improve these utilities and detect possibly dominant descriptors, use is made of the following vector of orthogonalized terms: ^4Ω ← ^0X. The orthogonalized vector shows the following utilities: u = (19, 1.3, 1.0, 2.8, 33). The utility vector indicates that only the first, ^1Ω ≡ ^0X^v, and the last, U_0 ≡ Ω_0 ≡ 1, parameters are important descriptors.
We are thus back to the single-term description but with an enhanced utility for ^1Ω and U_0: 19 and 33 instead of 16 and 28. Notice that the statistical score of the molar masses for pI is Q = 0.002 and 14. An inspection of the interrelation between the eight terms confirms their small interrelation, as ⟨r_IM(pI:{X})⟩ = 0.560, r_w(^DX, X_t) = 0.004 and r_s(^DX, ^1X) = 0.975, where r_w and r_s stand for the weakest and strongest interrelations, respectively. A critical analysis of the ^0X^v term shows that it is trivial, as it is nothing other than (1 + Δn/n_T) [6]. Since the best description is given by a relation consisting of only this term, molecular connectivity indices are not needed to simulate this property. A deeper trial-and-error search uncovers a not-at-all trivial term, X'_pI. The modelling power of this dominant term is remarkable: Q = 3.41, F = 693, r = 0.987, s = 0.29, ⟨u⟩ = 58, u = (26, 90), and the correlation vector is C = (77.99429, 5.75382). Thus the final modelling equation can be written as pI = 5.75 + 77.99 X'_pI. Not only is the improvement in F and u more than expected but, furthermore, this term is a highly dominant dead-end term, as it does not allow any better combination with any other index or term. Like the preceding ^0X^v term, it is mainly based on valence-type molecular connectivity indices, an expected result as side-chain functional groups in amino acids are rich in double bonds and lone-pair electrons.
The generation and decomposition of amino-acid and peptide radicals are processes of great biological importance, owing to their connection to the oxidative damage caused by ionizing radiation or oxidizing agents [7,8]. Moreover, several experimental studies showed that amino-acid and peptide radical cations can be generated by the electrospray technique and peptide cationization using Cu2+ [9]. The mass spectra obtained in these cases are rich and differ considerably from those of protonated systems, which can provide useful information in peptide sequencing. In order to shed some light on the properties of amino-acid and peptide radical cations, the group of Sodupe performed quantum chemical calculations on nine amino acids and the smallest N-glycylglycine peptide [10,11]. They discussed the influence of intramolecular hydrogen bonds and the amino-acid side chain on the localization of the electron hole upon oxidation and on the subsequent fragmentation process. They showed that for systems involving aromatic amino acids, oxidation is mainly produced at the side chain, whereas for non-aromatic ones oxidation is produced either at the basic NH2 or CO groups, the nature of the electron hole depending on the existing intramolecular hydrogen bonds.
In earlier publications, topological charge-transfer (CT) indices were applied to the calculation of the molecular dipole moment of hydrocarbons [12], valence-isoelectronic series of benzene, styrene [13,14] and cyclopentadiene [15], as well as phenyl alcohols [16] and 4-alkylanilines [17]. In the present report, the valence CT indices have been applied to the calculation of the pH at the isoelectric point (pI) of 21 amino acids. Section 2 presents the CT indices and their generalization for heteroatoms. Section 3 presents and discusses the calculation results. Section 4 summarizes the conclusions.

Results and Discussion
The molecular CT indices G_k, J_k, G_k^V and J_k^V (with k < 6) are reported in Table 1 for 21 amino acids. Hydroxyproline (4-hydroxypyrrolidine-2-carboxylic acid, Hyp) differs from proline (Pro) by the presence of a hydroxyl (-OH) group attached to the C_γ atom. The G_k indices contain both CT and size effects, e.g., G_k(Pro) < G_k(Hyp). The size effect is eliminated in the J_k indices, e.g., J_2(Pro) > J_2(Hyp).
The pI isoelectric points (calculated with Equation 10) for the 21 amino acids are also included in Table 2. For Equation (10) the absolute relative error is 5%.
The variation of the pI isoelectric point as a function of (1 + Δn/n_T) for the 21 amino acids (cf. Figure 2) shows that some amino acids appear superposed. The fitting line corresponds to the 21 amino acids; the two amino acids that lie farthest from it are His and Lys (n_B = 2). The pI isoelectric points (calculated with Equation 14) for the 21 amino acids are also included in Table 2. For Equation (14) the absolute relative error decreases to 4%. The pI isoelectric points (calculated with Equation 14 and experimental) for the 21 amino acids are displayed in Figure 1b. For Equation (14) the error is reduced for most amino acids; in particular, for His and Lys the error decreases to 0.6 units.
The molecular CT indices are collected in Table 3 for lysozyme, five fragments of its tertiary structure and its binding site. In general, the CT indices do not distinguish α-helices, 3_10-helix, β-sheet and binding site. In particular, both the J_k and J_k^V indices for the whole molecule are similar to those for the α-helices and, especially, for α-helix D.
Table 3. Values of the G_k and J_k charge-transfer indices up to fifth order for lysozyme and its fragments.
In Equation (15), M is the number of amino-acid residues in the protein or fragment. The choice seems sensible as pI values are strongly dependent on the type of side-chain functional groups.
The pI isoelectric points (calculated and experimental) for lysozyme and its fragments not included in the fit are reported in Table 4. The calculation result for α-helix A (M = 11 residues) is an estimate for that of the whole lysozyme (M = 129 residues) with a relative error of 13%. Furthermore, the inclusion of the other two α-helices (A+B+D, M = 31 residues) reduces the error to 1%. The variation of the pI isoelectric point for lysozyme (experiment) and its fragments (calculation) as a function of (M + Δn)/n_T (Figure 2) shows that some fragments appear superposed.
Both lysozyme and its fragments lie on the fitting line obtained for the amino acids.

Experimental Procedures
The most important matrices that delineate the labelled chemical graph are the adjacency (A) [18] and distance (D) matrices. In A, A_ij = 1 if vertices i and j are adjacent and 0 otherwise. In D, D_ij = ℓ_ij if i ≠ j and 0 otherwise, where ℓ_ij is the shortest edge count between vertices i and j [19]. The D^[-2] matrix is that whose elements are the squares of the reciprocal distances, D_ij^-2. The intermediate matrix M is defined as the matrix product of A by D^[-2]:

$$\mathbf{M} = \mathbf{A}\,\mathbf{D}^{[-2]}$$

The CT matrix C is defined as C = M − M^T, where M^T is the transpose of M [20]. By agreement, C_ii = M_ii. For i ≠ j, the C_ij terms represent a measure of the intramolecular net charge transferred from atom j to i. The topological CT indices G_k are described as the sum of the absolute values of the C_ij terms defined for the vertices i, j placed at a topological distance D_ij equal to k:

$$G_k = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left| C_{ij} \right| \, \delta(k, D_{ij})$$

where N is the number of vertices in the graph, D_ij are the entries of the D matrix and δ is the Kronecker δ function, with δ(k, D_ij) = 1 for k = D_ij and 0 otherwise. The G_k represent the sum of the absolute values of all the C_ij terms for every pair of vertices i and j at topological distance k. Another topological CT index, J_k, is defined as:

$$J_k = \frac{G_k}{N - 1}$$

The index represents the mean value of CT for each edge, since the number of edges for acyclic compounds is N − 1.
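The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the function names are hypothetical, the graph is assumed to be connected and hydrogen-suppressed, and the diagonal of D^[-2] is set to zero here (an assumption made to avoid dividing by D_ii = 0, which makes C_ii equal to the vertex degree).

```python
from collections import deque


def distance_matrix(adj):
    """Topological distances (shortest edge counts) by breadth-first search."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if u != v and adj[u][v] and v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist


def ct_matrix(adj):
    """Return (C, D) for a connected graph given as a 0/1 adjacency matrix."""
    n = len(adj)
    dist = distance_matrix(adj)
    # D^[-2]: reciprocal squared distances; diagonal set to 0 (assumption).
    d2 = [[0.0 if i == j else 1.0 / dist[i][j] ** 2 for j in range(n)]
          for i in range(n)]
    # M = A . D^[-2]
    m = [[sum(adj[i][l] * d2[l][j] for l in range(n)) for j in range(n)]
         for i in range(n)]
    # C = M - M^T off the diagonal; C_ii = M_ii by agreement.
    c = [[m[i][i] if i == j else m[i][j] - m[j][i] for j in range(n)]
         for i in range(n)]
    return c, dist


def ct_indices(adj, kmax=5):
    """Return dictionaries {k: G_k} and {k: J_k} for k = 1..kmax."""
    c, dist = ct_matrix(adj)
    n = len(adj)
    g = {k: 0.0 for k in range(1, kmax + 1)}
    for i in range(n - 1):
        for j in range(i + 1, n):
            if dist[i][j] in g:          # Kronecker delta(k, D_ij)
                g[dist[i][j]] += abs(c[i][j])
    j_idx = {k: g[k] / (n - 1) for k in g}
    return g, j_idx


# Hypothetical test graph: a three-vertex chain 0-1-2.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
G, J = ct_indices(chain)
print(G[1], J[1])  # 0.5 0.25
```

For the three-vertex chain, the only non-zero off-diagonal CT terms are the two edge terms of magnitude 0.25, so G_1 = 0.5 and J_1 = G_1/(N − 1) = 0.25, while the terminal vertices exchange no net charge (G_2 = 0).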
When heteroatoms are present, some way of discriminating atoms of different kinds needs to be considered [21]. In valence CT-index terms, the presence of each heteroatom is taken into account by introducing its electronegativity in the corresponding entry of the main diagonal of the adjacency matrix A. For each heteroatom X its entry A_ii is redefined as:

$$A^{V}_{ii} = 2.2\,(\chi_X - \chi_C)$$

to give the valence adjacency matrix A^V, where χ_X and χ_C are the electronegativities of heteroatom X and carbon, respectively, in Pauling units. The subtractive term keeps A_ii^V = 0 for the C atom, and the factor gives A_ii^V = 2.2 for O, which was taken as standard. From A^V instead of A, M^V, C^V, G_k^V and J_k^V are calculated following the former procedure. The C_ii^V, G_k^V and J_k^V are graph invariants.
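As a minimal sketch of how the valence adjacency matrix might be assembled, the snippet below fills the main diagonal from a small electronegativity table. The function names and the rounded electronegativities (C 2.5, N 3.0, O 3.5) are assumptions of this sketch, chosen so that oxygen gives the diagonal entry 2.2 stated in the text; the values actually used by the authors are not listed here.

```python
# Rounded Pauling electronegativities (assumption of this sketch; chosen so
# that chi_O - chi_C = 1.0, which reproduces A_ii^V = 2.2 for oxygen).
PAULING = {"C": 2.5, "N": 3.0, "O": 3.5}


def valence_diagonal(symbols, factor=2.2, table=PAULING):
    """Diagonal entries A_ii^V = factor * (chi_X - chi_C) for each atom."""
    chi_c = table["C"]
    return [factor * (table[s] - chi_c) for s in symbols]


def valence_adjacency(adj, symbols):
    """Copy the adjacency matrix and overwrite its main diagonal."""
    av = [list(row) for row in adj]
    for i, value in enumerate(valence_diagonal(symbols)):
        av[i][i] = value
    return av


# Hypothetical hydrogen-suppressed fragment C-C-O.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
for row in valence_adjacency(adj, ["C", "C", "O"]):
    print(row)
```

Carbon atoms keep a zero diagonal entry, so for a pure hydrocarbon the valence matrix reduces to the ordinary adjacency matrix.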
The structure of the enzyme lysozyme (129 amino-acid residues, molecular weight 14307 g·mol^-1) has been taken from the Protein Data Bank (entry 2LYM). The charge on lysozyme is +12.0e at pH 4.0, +8.0e at pH 7.0 and +4.0e at pH 10.0, and it decreases rapidly as the isoelectric point at pH 11.35 is approached [22].
Conclusions
From the present results and discussion the following conclusions can be drawn.
1. The inclusion of heteroatoms in the π-electron system was beneficial for the description of the isoelectric point, owing to either the role of the additional p orbitals provided by the heteroatom or the role of steric factors in the π-electron conjugation. Work is in progress on the further elucidation of the value of Δn in the fractional indices for a better definition of indices, which are highly dependent on side-chain functional groups.

n = 21, r = 0.958, s = 0.781, F = 7.5, MAPE = 8.25%, AEV = 0.1754

and AEV decreases by 77%. However, the model is inadequate for proteins because N, G_3, G_5, G_2^V and G increase with n_A and n_B. The use of (1 + Δn/n_T) = 0.5 for Arg, 4/3 for Asp and Glu, 2/3 for His and Lys, and one for all others improves the fit:

$$\mathrm{pI} = 14.8 - 9.01\left(1 + \frac{\Delta n}{n_T}\right)$$

AEV decreases by 91%. The correlation coefficient represents 96.8% of that of the correlation of the means (n = 4, r = 0.997).
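The fitting line pI = 14.8 − 9.01(1 + Δn/n_T) can be evaluated directly from the numbers of acidic and basic groups. The following is a minimal sketch, with hypothetical function names, that reproduces the four values of (1 + Δn/n_T) quoted above (0.5, 4/3, 2/3 and 1) and the corresponding estimates; it is meant as a worked example of the arithmetic rather than as the authors' code.

```python
def delta_term(n_acid, n_basic):
    """Return 1 + dn/n_T, with dn = n_A - n_B and n_T = n_A + n_B."""
    n_total = n_acid + n_basic
    return 1.0 + (n_acid - n_basic) / n_total


def estimate_pi(n_acid, n_basic, intercept=14.8, slope=-9.01):
    """Estimate pI from the fitted line pI = 14.8 - 9.01 (1 + dn/n_T)."""
    return intercept + slope * delta_term(n_acid, n_basic)


# (n_A, n_B) for the four classes of amino acids named in the text.
GROUPS = {
    "Arg": (1, 3),
    "Asp/Glu": (2, 1),
    "His/Lys": (1, 2),
    "all others": (1, 1),
}

for name, (n_a, n_b) in GROUPS.items():
    term = delta_term(n_a, n_b)
    print(f"{name:10s} 1+dn/n_T = {term:.3f}  pI = {estimate_pi(n_a, n_b):.2f}")
```

With these coefficients, amino acids with one acidic and one basic group are all assigned the same estimate, pI ≈ 5.79, which is why several amino acids appear superposed when pI is plotted against (1 + Δn/n_T).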

2. The use of only charge-transfer and valence charge-transfer indices {G_k, J_k, G_k^V, J_k^V} gave limited results for modelling the isoelectric point of amino acids. Furthermore, the inclusion of (1 + Δn/n_T) improved all the models. The effect is especially noticeable for those amino acids with more than two functional groups, viz. Arg, Asp, Glu and, especially, His and Lys. Moreover, the fractional index casts some light on the importance of the side-chain functional groups in the pI simulations of functional-rich molecules. The satisfactory modelling of the pI of 21 amino acids with the aid of a fractional index, based mainly on the Δn index, shows how to bypass the problem of deriving and working with an extended set of charge-transfer indices (here, m = 20) as, in this case, a good description can be obtained with only one index.
3. The fitting line obtained for the 21 amino acids can be used to estimate the isoelectric point of lysozyme and its fragments, by only replacing (1 + Δn/n_T) with (M + Δn)/n_T. For lysozyme, the results of smaller fragments can estimate that of the whole protein with 1–13% errors. An extension of the present study to other enzymes and proteins would give an insight into a possible generality of these conclusions, because most globular, water-soluble proteins are ionic, e.g., lysozyme (charge +8.0e) and bovine serum albumin (anionic) at pH 7.0. The present study may also be of interest in charge-migration peptide studies.

Table 1. Values of the G_k and J_k charge-transfer indices up to fifth order for 21 amino acids (AA).

Table 2. Calculated and experimental values of the pH at the isoelectric point, pI, for 21 amino acids (AA).

For the chosen {G_k, J_k} database, the best linear model gives an AEV of 0.7718. The inclusion of N improves the correlation:

$$\mathrm{pI} = 7.13 + 0.751N - 7.99J_1 - 15.7J_3 - 81.7J_4 \qquad (7)$$

n = 21, r = 0.629, s = 1.499, F = 2.6, MAPE = 16.95%, AEV = 0.6065

and AEV decreases by 21%. However, the model is limited to small N because N increases with both n_A and n_B, which makes it inadequate for polypeptides and proteins.
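The percentage decreases in AEV quoted in the text follow from simple arithmetic on the reported values. The short sketch below, with a hypothetical helper name, checks the 21% figure for the model that includes N (AEV = 0.6065) and the 77% figure for the model with AEV = 0.1754, both relative to the reference value of 0.7718.

```python
def percent_decrease(reference, value):
    """Relative decrease of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


AEV_REFERENCE = 0.7718  # best {G_k, J_k} model
AEV_WITH_N = 0.6065     # model including N, Equation (7)
AEV_EXTENDED = 0.1754   # model reported with r = 0.958

print(round(percent_decrease(AEV_REFERENCE, AEV_WITH_N)))    # 21
print(round(percent_decrease(AEV_REFERENCE, AEV_EXTENDED)))  # 77
```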

Table 4. Values of the pH at the isoelectric point, pI, for lysozyme fragments not included in the fit.