Computational and synthetic approaches for the discovery of HIV-1 integrase inhibitors

In recent years our research group has been engaged in the structure-function study of IN enzyme and in the development of new HIV-1 IN-inhibitors. We first developed three-dimensional hypothetical pharmacophore models for the binding of IN inhibitors to the enzyme. In particular we focused our attention on  β -diketo acid (DKA) derivatives which represent the major leads in the development of anti-HIV-1 IN drugs, considering that the only two IN inhibitors undergoing clinical trials belong to this family. The resulting pharmacophore models allowed the discovery of new potential IN inhibitors, both through rational design and virtual screening. Biological testing showed that our strategy was successful in searching for new structural leads as HIV-1 IN inhibitors. In addition we built a plausible model of the full-length HIV-1 integrase dimer complexed with viral DNA on which molecular dynamics simulation studies were carried out.


Introduction
The fight against HIV-1 infection has been considerably boosted by the discovery of two important class of antiviral drugs, i.e. the specific inhibitors of the viral enzymes reverse transcriptase (RT) and protease (PR). 1,2However, the current standard antiretroviral therapy has raised several problems in terms of poor tolerability and development of multidrug resistance, thus emphasizing the demand of new agents directed against alternative sites in the viral life cycle.
Integrase (IN) has thus emerged as a promising target for the development of new generation of inhibitors to be used in the anti-HIV therapy, because it plays a key role in stable infection and a known functional analogue is lacking in the human host cells.Moreover, although a wide variety of compounds have been reported as IN inhibitors, drugs active against this enzyme have not as yet been approved by the FDA.
Integrase inserts a double stranded DNA copy of the viral RNA genome into the chromosomes of an infected cell through two separate reactions; in the ''3'-processing" step, IN removes two nucleotides from each 3'-end of viral cDNA, while in the "strand transfer" reaction, the two newly processed 3'-viral DNA ends are inserted into the host cell DNA.For the integration reaction, no source of energy (e.g.no ATP) is needed and only divalent cations such as Mn 2+ or Mg 2+ are required for the catalytic activity.
In recent years our research was addressed to the development of new HIV-1 IN-inhibitors and the structure-function study of IN enzyme.
To date, β-diketo acid (DKAs) and their derivatives represent the major leads in the development of anti-HIV-1 IN drugs, considering that the only two IN inhibitors undergoing clinical trials, Shionogi/Glaxo-SmithKline's S-1360 3 and Merck's L-870-810 4 (Figure 1, 1 and 2), belong to this family.These compounds have been shown to selectively inhibit the strand transfer step by sequestering the divalent cations bound in the active site of the enzyme, 5 and to block HIV-1 replication in infected cells.
For the above reasons, our idea was to generate 3D pharmacophore models for the binding of DKA class (Integrase Strand Transfer Inhibitors, INSTI) to the enzyme. 6,7n addition, in order to study the action mechanism of IN inhibitors and considering the absence of information about the complete three-dimensional structure of HIV-1 integrase, we used a modified approach for DNA docking to obtain a model of the full-length integrase-DNA complex. 8We also carried out a molecular dynamic (MD) simulation of the IN-DNA complex in Using the 5CITEP, 3 atomic coordinates available from X-ray crystallography data (PDB code 1QS4), we developed a 3D model (Figure 2) which was consistent with the proposed mechanism of action for this family of IN inhibitors.This four-point 3D model was created by assembling three hydrogen-bond acceptors, mapped over the ketoenol moiety (HBA1 and HBA2) and the N1 atom of the tetrazole ring (HBA3), and one hydrophobic-aromatic group (HyAr), mapped over the indole centroid.
To restrict the 3D spatial arrangement for the next database search, we added several distance constraints among HBAs-HBAs and HBAs-HyAr functions, according to the corresponding distances calculated in 5CITEP and in another DKA analogue, 4 (Figure 3 and Table 1).
In fact, the interaction between the ligand 5CITEP and the metal ions in the IN active site is accounted for by our hypothesis of the mapping of HBA sites described by the ketoenol functionality and the N-1 atom of the tetrazole moiety.Moreover, a hydrophobic aromatic region (HyAr) was positioned over the indole ring since it was repeatedly reported that an aromatic moiety seems to be an essential structural element for anti-IN activity. 10,11hile 5CITEP is a perfect point of reference for distances since we obtained its structure from crystallographic data, molecule 4 was chosen among the most potent DKA analogue IN inhibitors because the 1,3-diketo acid moiety is incorporated in a highly rigid system, 12 thus providing useful and unambiguous geometric information.The latter compound was constructed using standard bond lengths and angles from Sybyl 13 fragment library and fully optimized by the semiempirical quantum mechanical method AM1.][16]  All compounds of the prediction set were found to fit the pattern of the four structural features in at least one conformation thus suggesting that this hypothetical model might be a useful tool for the discovery of potential DKA-based lead compounds.

ARKAT
The next step was thus to use the putative pharmacophore model as a search query to identify structural templates from 3D small molecule databases.About 4000 compounds that contained, in some conformation, the specified 3D location of chemical functions were found.
A subset of these structures was then chosen by removing compounds that did not satisfy the well-known Lipinski rules describing properties of drug-like compounds. 17The remaining molecules were overlaid with the pharmacophore by using the Best Fit option, and the top 100 hits were visually reviewed.
Finally, a total of 10 compounds (12-21, Figure 5) were selected for assaying in a stepwise fashion on the basis of (1) their fit value, (2) log P, (3) chemical diversity, and (4) availability and cost.The biological results, namely, the inhibition of HIV-1 integrase activity in an enzymatic assay, suggest that our approach might be effective in identifying possible IN inhibitor lead candidates for further development (Table 2).In fact, out of the 10 compounds tested, seven inhibited the HIV-1 integrase enzymatic activity.Of these, three (14, 16, and 19) showed potency below 10µM; in particular, compounds 16 and 19 were the most active IN inhibitors with a 50% inhibitory concentration (IC 50 ) in the overall integration assay of 1.9 and 0.9µM, respectively.The superimposition of the 10 molecules against the hypothetical hypothesis revealed that the four features of the pharmacophore were well matched by the chemical groups of the molecules.
As an example, Figure 6 illustrates the alignment of compounds 16 and 19 onto the plausible 3D pharmacophore model for DKA analogues.

ARKAT
We successively extended our investigation to the development of 3D QSAR models for the same family of IN inhibitors, considering that the only two previously reported pharmacophores for DKA-like derivatives were "qualitative" hypotheses 6,18 (generated without the use of activity data), whereas for the generation of our "quantitative predictive" model, the biological activities were taken into account.We used a data set consisting of 33 molecules acting as IN strand-transfer-selective inhibitors and including the most representative DKAs and DKA-like derivatives reported in the literature. 5,10,12,15,19,20Among them, 17 compounds were selected as the training set (TS), and the other 16 were used as the prediction set (PS).The structures and IN inhibitory activity values of both TS and PS molecules are reported in Figure 8 and Figure 9, respectively.The range of in vitro IN inhibitory activity, expressed as IC 50 for strand transfer inhibition, spanned 5 orders of magnitude (0.01-100 µM), making this a good data set for HypoGen module.
Using the selected TS molecules, HypoGen algorithm constructed the 10 simplest hypotheses that showed the best correlation between estimated and measured activities.The top ranked pharmacophore model had the best predictive potentiality and statistical significance.
The 3D hypothesis consisted of one hydrophobic aromatic region (HYAr).Two hydrogenbond acceptor (HBA1 and HBA2), and one hydrogen-bond donor (HBD) sites in a specific three-dimensional orientation; Using the most active molecule (2) of the TS, which is also one of the two IN-DKA-like inhibitors in clinical development, a flexible fit of this molecule to Hypo1 is shown in Figure 10.We also used Hypo1 to perform a regression analysis with the PS of 16 compounds (Figure 9) in order to check the predictive power of this model.Linear regression of the predicted activities for PS IN inhibitors versus the experimental ones gave a fairly good correlation coefficient of 0.85, confirming the validity of the most statistically significant HypoGen hypothesis in predicting the IN inhibitory activity of DKAs and DKA like derivatives.
It is also worth noting that the type and number of features encoded in the automatically generated hypothesis were in full agreement with our previously manually developed ligandbased pharmacophore model. 6,7owever, the spatial location of HYAr differed a little in the two hypotheses.In particular, the HypoGen runs pointed out that to have compounds with very high IN inhibitory activity, it is necessary to have one hydrophobic feature well separated by the DKA motif.In fact, since 27 mapped to only three of the four features of the lowest-cost Catalystgenerated DKA hypothesis, we thought that the introduction of functionality in this ligand, which might interact with the fourth feature of the hypothesis (i.e.HYAr), could provide enhanced IN inhibitory activity to the compound.
This idea was also supported by comparison of the activity data of compound 32 (IC 50 =24.2) and its benzyl derivative 35 (IC 50 =0.01).The designed N-benzyl derivative of compound 27 (compound 54, Scheme 1) and the corresponding conformational models were thus edited within Catalyst.The predicted IC 50 value for 54 was 0.02 µM, and its mapping onto the topranked hypothesis is represented in Figure 11B.
The promising molecular modeling results prompted us to plan the synthesis of a series of new 1H-indole derivatives 48-59 (Scheme 1) bearing a benzyl or 4-fluorobenzyl substituent at N-1, seeing that an overview of the most active DKA analogues highlighted the presence of a fluorine atom on the benzyl moiety.
Furthermore, a chlorine atom or a methoxy group was introduced on the benzene-fused ring of some of the newly designed benzylindole derivatives (i.e.50-53, 56-59) based on the observation that chloro-substituted compound 44 (IC 50 =0.52)was 2-fold more potent than 27 (IC 50 =1.4),and that a methoxy group was present in potent IN inhibitors such as 23, 38, and 39.The solventless microwave-assisted synthesis was used in many steps (Scheme 1) thus strikingly reducing the reaction times and obtaining almost quantitative yields.All the synthesized compounds were tested in IN inhibition assays. 21,22The results showed that the diketo acid derivatives were generally more potent than the corresponding esters and that most of the tested compounds showed 50-100 fold selectivity toward inhibition of strand transfer in comparison to 3'-processing (Table 3).
Our lead compound (54) had a strand stransfer inhibitory activity of 0.03 µM; the introduction of a chlorine atom on the indole ring (56) led to a significant decrease of anti-IN activity, while the presence of a methoxy group in the same position (58) did not influence the potency.

ARKAT
The best activity was displayed when a p-fluorine atom was present on the benzyl moiety; in fact compound 55 was 2-fold more potent compared to the unsubstituted parent 54, and fluoro derivatives 57 and 59 were the most active compounds of the series with IC 50 =0.01µM and IC 50 =0.004µM, respectively.
In particular the contemporary presence of a fluorine atom on the benzyl moiety and a methoxy group on the indole system increased the IN inhibitor activity, and compound 59 showed potency comparable to that of L-870,810, one of the two IN inhibitors in clinical trials.The relevance of our modeling assumptions is also documented by the measured strand stransfer inhibitory activity of 54 (IC 50 =0.03µM), which compared well with the predicted value of 0.02 µM.

Analysis and molecular dynamic studies of the full-length integrase-DNA complex
As above-mentioned our research was also addressed to the structure-function study of HIV-1 IN enzyme.In particular we used a modified approach for DNA docking to obtain a model of the full-length integrase-DNA complex 8 on which molecular dynamics studies were carried out. 9etroviral IN is a 32-kDa enzyme composed of one polypeptide chain that folds into three distinct functional domains: the N-terminal domain (residues 1-50), the catalytic core domain (residues 50-212), and the C-terminal domain (residues 212-288).The amino-terminal domain includes a conserved "HH-CC" motif that binds a Zn 2+ ion and promotes enzyme multimerization. 23,246][27] The C-domain has been showed to have a non-specific but strong DNA binding activity similar to that of the full-length IN. 25,28ll three domains bind DNA and each isolated domain forms a homodimer in solution.Even though all three domains are required for full catalytic activity, site-directed mutagenesis experiments have shown that the central core domain is sufficient to promote a reverse integration reaction in vitro, known as "disintegration", indicating that this region contains the enzymatic catalytic centre. 29,302][33][34][35][36][37][38][39][40] Two HIV-1-IN two-domain structures had been also solved by X-ray crystallography: catalytic core and C-terminal domains 41 and catalytic core and N-terminal domains (residues 1-56) 42 whereas the complete three-dimensional bioactive structure of HIV-1 integrase was still unknown.Due to this absence of complete structural information and since the elucidation of the HIV-1 IN/DNA complex is essential to enabling the target-based drug design, we decided to develop a model of the full-length protein complexed with viral DNA using an automated docking procedure, which was revised in order to satisfy our requirements.
A model of the full-length HIV-1 IN dimer (chains A and B both consisting of residues 1-270) was constructed assembling the experimentally determined structures of the single domains.In order to have a good starting model for further rational drug design, we used the only available IN core domain (residues 56-209) complexed with an inhibitor as the initial structure for our modelling studies.This structure presented only one Mg 2+ ion, so the second Mg 2+ ion was placed in each core domain, i.e., chains A and B, in the same relative position according to the two-metal structure of the Avian Sarcoma Virus integrase (PDB code 1VSH), a high structural homolog to HIV-1 IN. 43 The structure of the C-terminal domain dimer was later extracted by a fragment of HIV IN comprising the catalytic core and C-terminal domains (PDB code 1EX4). 41In order to obtain the right orientation between our above-described dimer and the C-domains, the conserved catalytic ARKAT core domains in the 1EX4 structure were superimposed to our isolated core domains resulting in a plausible two-domain 56-270 dimer.
The N-terminal domains were finally added using the 1K6Y structure 42 that was resolved as a tetramer of catalytic core plus N-terminal domains, giving the nearly identical AB and CD dimers.We used the AB dimer that was connected with our two-domain integrase model, matching common atoms in the catalytic core domains to obtain the full structure of integrase.
Later, the program ESCHER was modified in order to tackle the automated prediction of DNA-protein complexes, and used to dock this IN dimer to the viral DNA.
The best scored docking result suggested that the viral DNA interacted with all three integrase domains in our model and in particular the conserved 3'-CA end at the viral DNA was placed close to the residues of the catalytic site D64, D116, and E152, with the 3'-OH group pointing toward the three acidic residues (Figure 12).
Our IN-DNA complex was thus validated on the basis of known experimental data such as photo-crosslinking and site-directed mutagenesis studies 38,[45][46][47][48] that revealed the amino acids in close contact with DNA and critical for IN function.
The obtained model has been deposited into Protein Data Bank (PDB ID code is 1WKN) 9 and was used to perform MD simulation studies in order to gain further insights into the dynamics and conformational characteristics of the IN/DNA complex.
It is worth underlying that in our structure, looking at the catalytic core domains both of subunits A and B, only the catalytic core of subunit B made direct contacts with the viral DNA.
The substantial difference between the two catalytic sites was fundamental in our study, as the main aim of the present work was to compare the dynamical motion of a surface loop when the viral DNA interacts (subunit B) or does not interact (subunit A) with the core domain.In fact, even if it has been reported that this loop has an important role in the catalytic activity of IN, the bioactive conformation adopted during the integration process is still unknown.
The MD simulation was carried out in 3 phases: initial period of heating from 0 to 300 K over 3000 iterations (3 ps, i.e., 1 K/10 iterations), equilibration period of 150 ps and the production phase of simulation of 1.5 ns.Only the frames memorized during this third phase were considered, with a frame stored each 1000 iterations (1.0 ps), yielding 1500 frames.
The simulation was carried out using NAMD2.51 55implemented on a Dual-Athlon PC.The atom types were assigned using force field CHARMM v22 and the atomic charges according the Gasteiger-Marsili method.The trajectory obtained was analyzed using VEGA. 56e have focused our attention on the dynamical behaviour of the surface loop comprising residues 140-149 to gain insights into the conformational changes of this region in the two subunits.Table 4 reports the distances between the middle point of the segment connecting the Mg 2+ ions (that remain fixed due to the applied constraints) and the Cα atoms of loop 140-149 for both subunits.a In each cell the first line reports minimum and maximum values, while the second line is the average ± the standard deviation; b this column reports the differences between the averages in the two subunits.
The corresponding differences showed a markedly different dynamical behaviour of the residues in the loop in chains A and B, indicating that the loop of subunit B approaches the Mg 2+ ions, while such a conformational shift is not seen in subunit A where the loop region displays very low mobility during the entire molecular dynamics simulation.Tyr143 is the residue that shows the maximum approaching to metal ions: as seen in Figure 13, the distance between Tyr143 atom OH and the middle point of the segment connecting Mg 2+ ions always appears greater than 17 Å in subunit A, while in subunit B this distance decreases up to 7 Å .
A detailed analysis of the different behaviour of Tyr143 in the two subunits allowed us to shed light on the marked approach of this residue to metal ions, as seen only in subunit B, which is due to both a different conformational shift of backbone atoms of the surface loop and a different conformational profile of the Tyr143 side chain. the equilibrated position the Tyr143 side-chain points far away from the catalytic site; conversely, the average structure shows the rearrangement of the loop with Tyr143 pointing toward the three conserved residues and the two metal ions in the active site.

Figure 1 .
Figure 1.Two integrase inhibitors currently in clinical trials.

1 .
Pharmacophore modeling and discovery of HIV-1 Integrase inhibitors Our first aim was to build a simple 3D pharmacophore model representing the distinguishing chemical features of Integrase Strand Transfer Inhibitors (INSTI) and to obtain a useful tool for further discovery of novel IN inhibitors.

Figure 2 .
Figure 2. The four-point hypothetical model for DKA analog IN inhibitors built using the Catalyst program (HBA, green: hydrogen-bond acceptor features; HyAr, cyan: hydrophobic aromatic function).The vectors represent the putative interactions between the DKA analogs and the metal ions.The structure of 5CITEP is superimposed on the model.

Figure 4 .
Figure 4.Chemical structures of the most representative DKA analogue integrase inhibitors.

Figure 5 .
Figure 5.Ten compounds retrieved from the CAP2002 database and assayed for IN inhibition.

Figure 9 .
Figure 9.Chemical structures of the 16 compounds of PS with their experimental (Exp IC 50 ) and estimated (Est IC 50 ) IN strand transfer inhibitory activities, both expressed in µM.The molecules are placed in order of decreasing activity.

Figure 12 .
Figure 12.Model obtained by docking the full-length integrase to the viral DNA.Monomers A and B are shown in violet and yellow, respectively, while DNA is shown in green.The threeactive site residues D64, D116, and E152 are shown in blue.Amino acids establishing hydrogen bond and salt-bridge interactions with the DNA are labeled and shown as spheres.The conserved adenosine shows the proper orientation of the 3 ′ OH group.The two Mg 2+ metal ions are shown as grey spheres.This figure was prepared using the program PyMOL 44

Figure13.
Figure13.Dynamic profile of interatomic distance between Tyr143 atom OH and the middle point of the segment connecting Mg 2+ ions over the trajectory (gray line, B subunit; black line, A subunit).

Figure 14 .
Figure 14.Superimposed starting (green) and average (yellow) structures from the MD simulation of 1WKN.This figure was prepared using the program PyMOL.44

Table 1 .
Calculated and specified distance in 5CITEP, 4, and our pharmacophore model

Table 2 .
Inhibition of HIV-1 integrase activity and Fit values a Concentration of inhibitor that inhibits the overall integration in the oligonucleotide-based assay by 50%.Median values of three separate experiments are shown.

Table 3 .
Inhibition of over-all integration, 3'-processing and strand transfer as IC 50 (µM) a a Concentration required to inhibit by 50% the in vitro integrase activity assays

Table 4 .
Distance between Mg 2+ ions and Cα for loop residues in both subunits and relative differences