PERIODIC CLASSIFICATION OF HUMAN IMMUNODEFICIENCY VIRUS INHIBITORS

Classification algorithms are proposed based on information entropy and applied to 13 human immunodeficiency virus type 1 inhibitors. For sets of moderate size, the number of results compatible with the data suffers a combinatorial explosion. According to the equipartition conjecture, the best configuration is the one in which entropy production is most uniformly distributed. In ddI the formula is N4O3S0P0X0 (X = F, Cl); ddI is selected as the reference, with vector <11111>. In most cases (ddI, ddC, d4T, novel proposed ligand) the formula is N3–4O3S0P0X0, while in 3TC the formula is N3O3S1P0X0. The analysis compares well with other classifications taken as good.

Introduction
Ab initio theoretical calculations, molecular dynamics simulations and docking are useful tools for investigating important biological complexes [1-12]. Combination therapy with at least three anti-human immunodeficiency virus type 1 (HIV-1) drugs has become the standard treatment of acquired immunodeficiency syndrome (AIDS). Drugs that have been licensed for clinical use, or are subjected to advanced clinical trials, belong to one of the following three classes: (1) nucleoside/nucleotide reverse transcriptase inhibitors (NRTIs/NtRTIs) [abacavir (ABC), emtricitabine [(–)FTC], zidovudine (AZT), didanosine (ddI), zalcitabine (ddC), stavudine (d4T), lamivudine (3TC), tenofovir disoproxil fumarate], (2) non-nucleoside reverse transcriptase inhibitors (NNRTIs) [emivirine, efavirenz, nevirapine, delavirdine], and (3) protease inhibitors (PIs) [lopinavir, nelfinavir, ritonavir, amprenavir, saquinavir, indinavir].

Various other events in the HIV replicative cycle can be considered as potential targets for chemotherapeutic intervention: (1) viral entry via blockade of the viral coreceptors CXCR4 [bicyclam (AMD3100) derivatives] and CCR5 (TAK-779 derivatives), (2) viral adsorption via binding to the viral envelope glycoprotein gp120 (polysulphates, polysulphonates, polycarboxylates, polyoxometalates, polynucleotides and negatively charged albumins), (3) viral assembly and disassembly via NCp7 Zn finger-targeted agents [2,2'-dithiobisbenzamides (DIBAs) and azodicarbonamide (ADA)], (4) virus-cell fusion via binding to the viral envelope glycoprotein gp41 (T-1249), (5) proviral deoxyribonucleic acid (DNA) integration via integrase inhibitors, e.g. 4-aryl-2,4-dioxobutanoic acid derivatives, as well as (6) viral messenger ribonucleic acid (mRNA) transcription via inhibitors of the transcription (transactivation) process (flavopiridol, fluoroquinolones). In addition, new NRTIs, NNRTIs and PIs have been developed that possess, respectively: (1) improved metabolic characteristics, e.g., … The advent of so many new compounds, other than those that have been formally approved for the treatment of HIV infections, will undoubtedly improve the prognosis of patients with AIDS and AIDS-associated diseases.

Nucleoside analogues constitute a family of biological molecules (ddI, d4T, ddC and 3TC), which play an important role in the transcription process of HIV. The normal nucleoside substrates, used by reverse transcriptase (RT) to synthesize DNA, are mimicked by these nucleoside analogues, which lack a 3'-OH group and, consequently, act as chain terminators when incorporated into DNA by RT. Although these nucleoside analogues show good activity as inhibitors of HIV, their long-term usefulness is limited by toxicities. Resistance and mutation are also problems. The development of better drugs requires a better understanding of how the drugs work, the mechanism of drug resistance and the interaction with the receptor, as well as the stability of the drugs inside the active site. A novel HIV RT inhibitor ligand was proposed, which showed the highest docking scores and more hydrogen-bond interactions with the residues of the RT active site [13].

A simple computerized algorithm, useful for establishing a relationship between chemical structures and their biological activities or significance, is proposed and exemplified here. The starting point is to use an informational or configurational entropy for pattern recognition purposes. The entropy is formulated on the basis of a matrix of similarity between two chemical or biochemical species. As entropy is weakly discriminating for classification purposes, the more powerful concept of entropy production and its equipartition conjecture are introduced [14]. In earlier publications the periodic classification of local anaesthetics was analyzed [15]. The aim of the present report is to develop the learning potentialities of the code and, since molecules are more naturally described via a varying-size structured representation, to study general approaches to the processing of structured information.

Results and Discussion
Many HIV-1 inhibitors fit the following general scheme: (base derivative)-(furan ring). The seven classes obtained at grouping level $b_1$ show a higher entropy $h$ than the five classes obtained at the lower level $b_2$. A comparative analysis of the set of 1-13 classes is in agreement with previous results obtained for Entries 4-8. Once more, the NRTI ddI and the novel proposed ligand are grouped into the same class, as are the NRTIs ddC, d4T and 3TC. The inclusion in the radial tree (cf. Figure 1) is in agreement with partial correlation diagrams, dendrograms, binary trees and previous results for Entries 4-8. Moreover, the classification presents lower bias and greater precision, with lower divergence with regard to the original distribution.

Program SplitsTree analyzes cluster analysis (CA) data [23]. Its split decomposition takes a distance matrix and produces a graph representing the relationships between taxa. For ideal data this graph is a tree, whereas less ideal data will give rise to a tree-like network. As split decomposition does not force data onto a tree, it can provide a good indication of how tree-like given data are. The splits graph for the 13 HIV-1 inhibitors of Table 1 (cf. Figure 2) reveals no conflicting relationship. The splits graph is in general agreement with partial correlation diagrams, dendrograms and binary trees (Figure 1).

A principal component analysis (PCA) [24] has been carried out for the HIV-1 inhibitors. Factors $F_1$-$F_5$ show that $F_1$ explains 30% of the variance (70% error), $F_{1-2}$ 55% of the variance (45% error) and $F_{1-3}$ 76% of the variance (24% error). For $F_1$ and $F_4$, $i_2$ has the greatest weight in the profile; however, $F_1$ cannot be reduced to three variables $\{i_1, i_2, i_4\}$ without a 19% error. For $F_2$, $i_3$ has the greatest weight; notwithstanding, $F_2$ cannot be reduced to three variables $\{i_1, i_2, i_3\}$ without a 4% error. For $F_3$, $i_4$ and $i_5$ have the greatest weight; furthermore, $F_3$ can be reduced to two variables $\{i_4, i_5\}$ with a 0% error. For $F_5$, $i_1$ has the greatest weight; nevertheless, $F_5$ cannot be reduced to three variables $\{i_1, i_3, i_4\}$ without a 22% error. The factors $F_{1-5}$ can be considered as linear combinations of $\{i_1, i_2, i_4\}$, $\{i_1, i_2, i_3\}$, $\{i_4, i_5\}$, $\{i_1, i_2, i_3\}$ and $\{i_1, i_3, i_4\}$ with 19%, 4%, 0%, 18% and 22% errors, respectively. In the $F_2$-$F_1$ plot (cf. Figure 3), those HIV-1 inhibitors with the same property vector appear superposed.

Experimental Procedures
The key problem in classification studies is to define similarity indices when several criteria of comparison are involved. The first step in quantifying the concept of similarity, for molecules of HIV-1 inhibitors, is to list the most important portions of such molecules. Furthermore, a vector of properties $i = \langle i_1, i_2, \ldots, i_k, \ldots \rangle$ should be associated with each inhibitor $i$, whose components correspond to different characteristic groups in the inhibitor molecule, in a hierarchical order according to the expected importance of their pharmacological potency. If the $m$-th portion of the molecule is pharmacologically more significant for the inhibitory effect than the $k$-th portion, then $m < k$. The components $i_k$ are "1" or "0", according to whether a similar (or identical) portion of rank $k$ is present or absent in inhibitor $i$, compared with the reference inhibitor.

The analysis includes those chemical compounds that fit the following general scheme: (base derivative)-(furan ring), since these are the most numerous and have the widest range of uses among the species used in the practice of inhibition. The base portion is often a guanine (Gua) or cytosine (Cyt) derivative; the furan ring normally contains one O heteroatom. It is assumed that the structural elements of an inhibitor molecule can be ranked, according to their contribution to inhibitory activity, in the following order of decreasing importance: number of N atoms > number of O atoms > number of S atoms > number of P atoms > number of halogen atoms. In the NRTI didanosine (ddI) the base is a Gua derivative, and the furan contains only one O heteroatom. The ddI molecule contains four N, three O, no S, no P and no halogen (X = F, Cl) heteroatoms (N4O3S0P0X0). In some inhibitors the base is a Gua derivative (ddI, novel proposed ligand); in others, a Cyt derivative (ddC, d4T, 3TC). In most inhibitors the furan ring contains only one O heteroatom (ddI, ddC, d4T, novel proposed ligand, N3-4O3S0P0X0), while in 3TC the furan ring includes one O and one S heteroatom (N3O3S1P0X0).

In this study, ddI was selected as the reference HIV-1 inhibitor because of its good docking scores with the receptor RT; its associated vector is therefore <11111>. This choice improves the quality of the classification for those inhibitors similar to ddI. The selection as reference of an inhibitor dissimilar to ddI, e.g. tenofovir disoproxil, would not improve the quality of the classification for those inhibitors similar to ddI.
Vector <00110> is associated with efavirenz since it contains one N, two O, no S, no P and four halogen atoms. Let us denote by $r_{ij}$ ($0 \le r_{ij} \le 1$) the similarity index of two inhibitors associated with vectors $i$ and $j$, respectively. The relation of similitude is characterized by a similarity matrix $R = [r_{ij}]$.
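The encoding of heteroatom counts into a property vector can be sketched as follows. This is a minimal illustration, not the authors' code: the names `REFERENCE`, `TOLERANCE` and `property_vector` are hypothetical, and the tolerance of one N atom is an assumption made here so that N3 and N4 both count as similar to ddI, as the N3-4 range in the text suggests.

```python
# Heteroatom classes ranked by assumed decreasing importance; X = halogens (F, Cl).
ELEMENTS = ("N", "O", "S", "P", "X")

# Reference inhibitor ddI: N4 O3 S0 P0 X0.
REFERENCE = {"N": 4, "O": 3, "S": 0, "P": 0, "X": 0}

# Hypothetical tolerances (an assumption of this sketch): N3 and N4 both count
# as "similar" to the reference; every other count must match exactly.
TOLERANCE = {"N": 1, "O": 0, "S": 0, "P": 0, "X": 0}


def property_vector(counts):
    """Return the 0/1 vector <i1,...,i5> of a molecule relative to ddI."""
    return tuple(
        1 if abs(counts[e] - REFERENCE[e]) <= TOLERANCE[e] else 0
        for e in ELEMENTS
    )


def as_text(vector):
    """Format a vector in the <11111> notation used in the text."""
    return "<" + "".join(str(bit) for bit in vector) + ">"


ddI = {"N": 4, "O": 3, "S": 0, "P": 0, "X": 0}
lamivudine = {"N": 3, "O": 3, "S": 1, "P": 0, "X": 0}  # 3TC
efavirenz = {"N": 1, "O": 2, "S": 0, "P": 0, "X": 4}

for name, counts in [("ddI", ddI), ("3TC", lamivudine), ("efavirenz", efavirenz)]:
    print(name, as_text(property_vector(counts)))
```

With these inputs the sketch reproduces the two vectors quoted in the text, <11111> for ddI and <00110> for efavirenz.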
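The similarity index between two inhibitors $i = \langle i_1, i_2, \ldots, i_k, \ldots \rangle$ and $j = \langle j_1, j_2, \ldots, j_k, \ldots \rangle$ is defined as:

$$r_{ij} = \sum_k t_k (a_k)^k \quad (k = 1, 2, \ldots) \tag{1}$$

where $0 \le a_k \le 1$ and $t_k = 1$ if $i_k = j_k$, but $t_k = 0$ if $i_k \ne j_k$. The definition assigns a weight $(a_k)^k$ to any property involved in the description of molecule $i$ or $j$.

The grouping algorithm uses the stabilized matrix of similarity, obtained by applying the max-min composition rule $\circ$ defined by:

$$(R \circ S)_{ij} = \max_k \left[ \min(r_{ik}, s_{kj}) \right] \tag{2}$$

where $R = [r_{ij}]$ and $S = [s_{ij}]$ are matrices of the same type, and $(R \circ S)_{ij}$ is the $(i,j)$-th element of matrix $R \circ S$.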
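A minimal sketch of the similarity index and the max-min composition is given below. It assumes that the index sums the weights $(a_k)^k$ over the ranks $k$ at which the two vectors coincide, which is how the weight is described above. The function names, the equal weights of 0.5 and the three example vectors are illustrative choices, not values from the study.

```python
def similarity(i, j, weights):
    """r_ij = sum of (a_k)**k over the ranks k (1-based) where i_k == j_k."""
    return sum(
        a ** k
        for k, (ik, jk, a) in enumerate(zip(i, j, weights), start=1)
        if ik == jk
    )


def max_min(R, S):
    """Max-min composition: (R o S)_ij = max over k of min(r_ik, s_kj)."""
    n = len(R)
    return [
        [max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


weights = [0.5] * 5                # illustrative equal weights a_k = 0.5
vectors = [
    (1, 1, 1, 1, 1),               # reference vector (ddI)
    (1, 1, 0, 1, 1),               # differs from the reference in the S count
    (0, 0, 1, 1, 0),               # efavirenz vector quoted in the text
]

R = [[similarity(u, v, weights) for v in vectors] for u in vectors]
R2 = max_min(R, R)

for row in R:
    print(row)
```

In this example the composition raises the index between the second and third vectors, because both are linked through the reference vector; this is the effect that the stabilization step exploits.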
When applying this rule iteratively so that $R(n+1) = R(n) \circ R$, there exists an integer $n$ such that $R(n) = R(n+1) = \ldots$ The resulting matrix $R(n)$ is called the stabilized similarity matrix. Its importance lies in the fact that in the classification it will generate a partition into disjoint classes. It is used and designated by $R(n) = [r_{ij}(n)]$. The grouping rule is the following: $i$ and $j$ are assigned to the same class if $r_{ij}(n) \ge b$. The class of $i$, noted $\hat{i}$, is the set of species $j$ that satisfies the rule $r_{ij}(n) \ge b$. The matrix of classes is:

$$\hat{R}(n) = [\hat{r}_{\hat{i}\hat{j}}] = \max_{s,t} (r_{st}) \quad (s \in \hat{i},\ t \in \hat{j}) \tag{3}$$

where $s$ stands for any index of a species belonging to the class $\hat{i}$ (similarly for $t$ and $\hat{j}$). Rule (3) means finding the largest similarity index between species of two different classes.

In information theory, the information entropy $h$ measures the surprise that the source emitting the sequences can give [16,17]. For a single event occurring with probability $p$ the degree of surprise is proportional to $-\ln p$.
Generalizing the result to a random variable $X$ (which can take $N$ possible values $x_1, \ldots, x_N$ with probabilities $p_1, \ldots, p_N$), the average surprise received on learning the value of $X$ is $-\sum_i p_i \ln p_i$. The information entropy associated with similarity matrix $R$ is:

$$h(R) = -\sum_{i,j} r_{ij} \ln r_{ij} - \sum_{i,j} (1 - r_{ij}) \ln (1 - r_{ij}) \tag{4}$$

For a given charge or duty, the best configuration is that in which entropy production is most uniformly distributed. Equipartition implies a linear dependence, so that the equipartition line is described by:

$$h_{\mathrm{eqp}} = h_{\max}\, b \tag{5}$$

Since the classification is discrete, a way of expressing equipartition would be a regular staircase function. The best variant is chosen to be that minimizing the sum of squares of the deviations:

$$SS = \sum_{b_i} (h - h_{\mathrm{eqp}})^2 \tag{6}$$

Learning procedures are implemented as follows [18]. Consider a given partition into classes as good or ideal from practical or empirical observations, which corresponds to a reference similarity matrix $S = [s_{ij}]$ obtained for equal weights $a_1 = a_2 = \ldots = a$ and for an arbitrary number of fictitious properties.
Next consider the same set of species as in the good classification and the actual properties. The similarity degree $r_{ij}$ is then computed with Equation (1), giving matrix $R$. The number of properties for $R$ and $S$ may differ. The learning procedure consists in trying to find classification results for $R$ that are as close as possible to the good classification. The first weight $a_1$ is taken constant, and the following weights $a_2, a_3, \ldots$ are subjected to random variations. A new similarity matrix is obtained using Equation (1) and the new weights. The distance between the partitions into classes characterized by $R$ and $S$ is then computed. The result of the algorithm is a set of weights allowing adequate classification. The procedure has been applied in the synthesis of complex flowsheets using information entropy [19].

Conclusions
From the present results and discussion the following conclusions can be drawn.
1. Many algorithms for classification are based on information entropy. For sets of moderate size an excessive number of results appear compatible with the data, and this number suffers a combinatorial explosion. However, the equipartition conjecture provides a selection criterion among the different variants resulting from classification among hierarchical trees. According to the conjecture, the best configuration is the one in which the entropy production is most uniformly distributed. The method avoids a problem of other methods based on continuous variables: for the four compounds with constant <11111> vector, the null standard deviation always causes a Pearson correlation coefficient of r = 1. The lower-level classification processes show lower entropy and may be more parsimonious. The good comparison of our classification results with others taken as good confirms the adequacy of the property vector selected for the molecular structures. Information entropy and principal component analyses both permit classifying the inhibitors, and their results agree. In general, the classical classes are recognized.
2. The analysis is in agreement with principal component analysis. It compares well with other classifications taken as good, based on docking, density functional, molecular dynamics, the Rule of Five, and absorption, distribution, metabolism, excretion and toxicity. The analysis of the interactions of the proposed novel ligand with the reverse-transcriptase active site strongly suggests that the proposed novel ligand could be a good potential inhibitor for anti-HIV chemotherapy.

For the seven classes obtained at level $b_1$, the entropy is $h(\hat{R}_{b_1}) = 24.22$. Dendrogram and radial tree [20-22] separate the same classes. The NRTI ddI and the novel proposed ligand are grouped into the same class, as are the NRTIs ddC and d4T. Inhibitors belonging to the same class appear highly correlated in partial correlation diagrams, in agreement with previous results for Entries 4-8.

At level $0.82 \le b_2 \le 0.84$ the classification is:

$$C_{b_2} = (1,3,9,10,11)(2,13)(4,8)(5,6,7)(12)$$

Five classes result in this case; the entropy decreases to $h(\hat{R}_{b_2}) = 11.90$. Both the dendrogram and the radial tree matching $\langle i_1, i_2, i_3, i_4, i_5 \rangle$ and $C_{b_2}$ separate the same five classes, in agreement with partial correlation diagrams, dendrogram, binary tree and previous results obtained for Entries 4-8. A high degree of similarity is found for Entries 2 and 13; 3, 9 and 10; 4 and 8; and 5 and 6. Again, the NRTI ddI and the novel proposed ligand are grouped into the same class, as are the NRTIs ddC, d4T and 3TC. The lower-level $b_2$ classification process shows lower entropy and may be more parsimonious. The $b_2$ classification may have a greater signal-to-noise ratio than the $b_1$ classification. Entries 4 and 8 belong to the same class at any grouping level $b$.
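How a partition into classes arises at a given grouping level can be sketched as follows. The code assumes that stabilization iterates the max-min composition until the matrix stops changing, and that two species share a class when their stabilized index is at least $b$. The 3 x 3 matrix is a toy example with made-up values, and all function names are hypothetical.

```python
def max_min(R, S):
    """Max-min composition: (R o S)_ij = max over k of min(r_ik, s_kj)."""
    n = len(R)
    return [
        [max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def stabilize(R, max_iter=100):
    """Iterate R(n+1) = R(n) o R until the matrix no longer changes."""
    current = R
    for _ in range(max_iter):
        following = max_min(current, R)
        if following == current:
            return current
        current = following
    raise RuntimeError("similarity matrix did not stabilize")


def classes(R_stable, b):
    """Put i and j in the same class when the stabilized index r_ij >= b."""
    n = len(R_stable)
    seen = set()
    partition = []
    for i in range(n):
        if i in seen:
            continue
        members = [
            j for j in range(n)
            if j not in seen and (j == i or R_stable[i][j] >= b)
        ]
        seen.update(members)
        partition.append(members)
    return partition


def class_matrix(R_stable, partition):
    """Largest similarity index between species of two classes."""
    return [
        [max(R_stable[s][t] for s in ci for t in cj) for cj in partition]
        for ci in partition
    ]


# Toy similarity matrix (hypothetical values, reflexive and symmetric).
R = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]

R_stable = stabilize(R)
for b in (0.9, 0.6, 0.5):
    print(b, classes(R_stable, b))
```

Lowering the grouping level merges classes, so the toy partition goes from three classes to two and then to one, in the same way that seven classes become five between $b_1$ and $b_2$ above.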

Figure 1. Radial tree for human immunodeficiency virus type 1 inhibitors.

Figure 2. Splits graph for the human immunodeficiency virus type 1 inhibitors.

Figure 3. Principal component analysis $F_2$ vs. $F_1$ plot for the HIV-1 inhibitors.

Denote by $C_b$ the set of classes and by $\hat{R}_b$ the similarity matrix at grouping level $b$. The information entropy satisfies the following properties. (1) $h(R) = 0$ if $r_{ij} = 0$ or $r_{ij} = 1$. (2) $h(R)$ is maximum if $r_{ij} = 0.5$ (when the imprecision is maximum). (3) $h(\hat{R}_b) \le h(R)$ for any $b$, i.e. classification leads to a loss of entropy. (4) $h(\hat{R}_{b_1}) \le h(\hat{R}_{b_2})$ if $b_1 < b_2$ (the entropy is a monotone function of the grouping level $b$). In the classification, each hierarchical tree corresponds to an entropy dependence on grouping level, and an $h$-$b$ diagram can be obtained. The equipartition conjecture of entropy production is proposed as a selection criterion among different variants resulting from classification among hierarchical trees.
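Properties (1) and (2) and the equipartition criterion can be checked with a short sketch. Two assumptions are made here: the entropy takes the form $-\sum r_{ij} \ln r_{ij} - \sum (1 - r_{ij}) \ln (1 - r_{ij})$, which is consistent with properties (1) and (2), and the equipartition line passes through the origin with slope $h_{\max}$. The two $h$-$b$ diagrams are invented for illustration, and the function names are hypothetical.

```python
import math


def _term(x):
    """-x ln x, with the convention 0 ln 0 = 0."""
    return 0.0 if x <= 0.0 else -x * math.log(x)


def entropy(R):
    """Assumed entropy of a similarity matrix, summed over all entries."""
    return sum(_term(r) + _term(1.0 - r) for row in R for r in row)


def sum_of_squares(h_b_points, h_max):
    """Squared deviation of an h-b staircase from the line h_eqp = h_max * b."""
    return sum((h - h_max * b) ** 2 for b, h in h_b_points)


# Property (1): entries equal to 0 or 1 give zero entropy.
crisp = [[1.0, 0.0], [0.0, 1.0]]
# Property (2): entries equal to 0.5 give the maximum entropy.
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
print(entropy(crisp), entropy(fuzzy))

# Two hypothetical hierarchical trees, each given as (b, h) points.
trees = {
    "tree_A": [(0.25, 2.0), (0.50, 5.0), (0.75, 7.0), (1.00, 10.0)],
    "tree_B": [(0.25, 0.5), (0.50, 1.0), (0.75, 9.0), (1.00, 10.0)],
}
best = min(trees, key=lambda name: sum_of_squares(trees[name], h_max=10.0))
print(best)
```

The tree whose staircase stays closest to the straight line is selected, which is the sense in which entropy production is "most uniformly distributed".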