Retroviral DNA Integration

The integration of a DNA copy of the viral RNA genome into host chromatin is the defining step of retroviral replication. This enzymatic process is catalyzed by the virus-encoded integrase protein, which is conserved among retroviruses and LTR-retrotransposons. Retroviral integration proceeds via two integrase activities: 3′-processing of the viral DNA ends, followed by the strand transfer of the processed ends into host cell chromosomal DNA. Herein we review the molecular mechanism of retroviral DNA integration, with an emphasis on reaction chemistries and architectures of the nucleoprotein complexes involved. We additionally discuss the latest advances on anti-integrase drug development for the treatment of AIDS and the utility of integrating retroviral vectors in gene therapy applications.


INTRODUCTION
More than 100 years ago, the Danes V. Ellerman and O. Bang and the American P. Rous passaged oncogenic variants of what is known today as the avian sarcoma-leukosis virus (ASLV). 1,2 The importance of retroviruses in biology and medicine has greatly increased over the past 50 years with two major milestones: first, in 1981, the isolation of the human T-lymphotropic virus 1 (HTLV-1) 3 and, soon after, the discovery of human immunodeficiency virus type 1 (HIV-1), 4,5 which is responsible for one of the most dramatic pandemics in recent history. The flurry of high-octane research, initially driven by the suspected role of retroviruses in human cancer and later by the acquired immunodeficiency syndrome (AIDS) pandemic, yielded a plethora of discoveries and tools to bolster all disciplines of biology. 6 It would be hard to imagine cancer biology without the concept of the oncogene or molecular biology without reverse transcriptase (RT).
Retroviridae is a large viral family comprising seven genera: α- through ε-retroviruses, lentivirus, and spumavirus (Table 1). HTLV-1 and HIV-1 (along with their respective types) belong to the δ-retrovirus and lentivirus genera, respectively. Several other retroviral species gained prominence as research models, for historical reasons or as animal pathogens. These include ASLV (an α-retrovirus), mouse mammary tumor virus (MMTV, a β-retrovirus), murine leukemia virus (MLV, a γ-retrovirus), simian immunodeficiency viruses (SIVs, lentiviruses highly related to HIV-1 and HIV-2), feline immunodeficiency virus (FIV, a lentivirus), and the prototype foamy virus (PFV, a spumavirus). Integration, which yields the establishment of the obligatory proviral state, 7 is the one feature that distinguishes retroviruses from all other viral families. Herein, we present state-of-the-art interpretations of the structure of retroviral integrase (IN), the essential enzyme responsible for this process, as well as the role of IN in virus replication. Due to the conservation among IN proteins from different retroviral species, we will refer to them collectively as retroviral IN, except when discussing aspects that may be relevant to a particular retroviral genus or species.

IN AND THE RETROVIRAL LIFE CYCLE
Replication via formation of a stable DNA form makes retroviruses particularly amenable to reverse genetics. Accordingly, functions of retroviral gene products have been extensively probed through mutagenesis. In early studies, IN was identified as the protein product encoded within the 3′ portion of the retroviral pol gene that was essential for efficient retroviral replication and integration. 8−11 Reverse transcription of the diploid retroviral RNA genome results in the formation of a linear double-stranded viral DNA (vDNA) molecule carrying a copy of the long terminal repeat (LTR) sequence at either end. 12−15 The vDNA molecule exists in the form of a preintegration complex (PIC) 16,17 that is rather poorly biophysically characterized due to the scarce level at which it forms, ca. one copy per cell, during acute virus infection. Nevertheless, PICs have been reported to contain a number of cellular and viral proteins, most notably IN. 18−26 Once the PIC gains access to the nuclear compartment, the vDNA ends are inserted into a cellular chromosome. This step, initiated by the enzymatic action of IN and completed by the host cell DNA repair machinery, is a point of no return: the cell becomes a permanent carrier of the integrated viral genome, which is referred to as the provirus.
In addition to this well-established role, IN may play a range of less characterized functions in retroviral replication, as suggested by its unusually complex genetics (reviewed in ref 27). For instance, disruption of the IN coding portion of the HIV-1 pol gene can lead to production of viral particles with aberrant morphology and severe defects in reverse transcription. 28−31 In fact, only a minority of HIV-1 IN mutants display defects solely at the integration step of the viral life cycle. Such mutants, which include amino acid substitutions within the IN active site, were collectively categorized as class I mutants. 32 The hallmark of the associated phenotype is the predictable accumulation of nonintegrated forms of vDNA, including a circular form that contains two abutted copies of the LTR (2-LTR circles). Conversely, class II HIV-1 IN mutants disrupt viral replication at multiple steps while usually retaining at least partial IN enzymatic activity in vitro. 33−36 The pleiotropic effects observed with class II HIV-1 IN mutants range from disrupted virion assembly to apparent nuclear import defects. 30,33,34,37−41 Most notably, class II IN mutants typically show reduced levels of reverse transcription. 27 The abundance of HIV-1 IN mutations with pleiotropic phenotypes is a strong indication that the protein may play critical roles in the viral life cycle outside of the integration step. Accordingly, HIV-1 IN was shown to interact with the viral RT and influence its activity in vitro. 42−44 More recent work with allosteric IN inhibitors (described at length below) has highlighted a direct role for IN in HIV-1 particle maturation. 45−47 Among the esoteric functions of HIV-1 IN, its proposed involvement in PIC nuclear import has been the subject of considerable and as yet unresolved debate. 34
Biochemical studies of retroviral DNA integration began with partial purification of enzymatically competent PICs from acutely infected cells.
Such preparations can catalyze vDNA integration into exogenous DNA in vitro, and the reaction products can be detected and quantified by Southern blotting or PCR. 16,55−58 During infection, a considerable fraction of vDNA becomes circularized 59−64 and, perhaps owing to the circular nature of the bacteriophage λ DNA substrate for integrative recombination, 65 initial studies proposed that retroviral integration proceeded through the 2-LTR circular DNA form. 66 However, subsequent experiments using native MLV PICs demonstrated that it is the linear vDNA that serves as the immediate precursor for integration. 67,68 These landmark studies also established two activities associated with the PICs: 3′-processing and strand transfer. 67
Retroviral IN bears no similarities to its namesake from λ phage and is instead closely related to the DD(E/D) family of DNA transposases. 78 Crucially, the DNA cutting and strand transfer reactions catalyzed by IN and transposases proceed through phosphodiester transesterification, without formation of covalent protein−DNA intermediates. 79−81 However, unlike transposases, retroviral IN requires a prelinearized DNA molecule (the product of reverse transcription) and cannot act on an integrated molecule that is flanked by continuous host DNA sequences. Therefore, while the active site of a prokaryotic cut-and-paste transposase must carry out four consecutive reactions, 82,83 IN needs to accomplish only two, 3′-processing and strand transfer, equivalent to the first and the final steps in DNA transposition, respectively. These reactions are carried out by a multimer of IN assembled on vDNA ends, referred to as the intasome (also known as the stable synaptic complex), at the business end of the PIC (Figure 1a). 17,84,85 During 3′-processing, IN hydrolyzes a phosphodiester bond at each vDNA end, removing a di- or trinucleotide and liberating 3′-hydroxyl groups attached to invariant 5′-CA-3′ dinucleotides.
The intasome can then bind host chromosomal DNA, forming the target capture complex (TCC). Within the TCC, the enzyme utilizes vDNA 3′-hydroxyls as nucleophiles to cut host DNA in a staggered fashion, simultaneously joining both 3′ vDNA ends to apposing strands of host DNA. The postcatalytic complex, referred to as the strand transfer complex (STC), is subject to disassembly, which is likely accomplished by host cell machinery.
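The two reactions can be illustrated with a deliberately simplified string model. The sketch below is only a toy, not a description of any published pipeline: the sequences, the function names, and the flat-string representation of DNA are all assumptions made for illustration. It trims each vDNA strand back to its terminal CA, then inserts the processed vDNA at a staggered pair of target cuts, so that after gap repair the provirus is flanked by a short duplication of target sequence (4 to 6 bp depending on the virus, as quoted in this review).

```python
# Toy string model of retroviral integration; all sequences below are made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Target site duplication sizes (bp) as quoted in this review.
TSD_BP = {"MLV": 4, "PFV": 4, "HIV-1": 5, "HTLV-1": 6, "ASLV": 6, "MMTV": 6}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_process(strand: str) -> tuple[str, str]:
    """Trim a strand (5'->3') back to its terminal CA dinucleotide.

    Returns the processed strand and the released di- or trinucleotide.
    """
    idx = strand.rfind("CA")
    released = strand[idx + 2:] if idx != -1 else ""
    if len(released) not in (2, 3):
        raise ValueError("expected 2 or 3 nt downstream of the terminal CA")
    return strand[: idx + 2], released


def integrate(vdna_top: str, target_top: str, position: int, stagger: int) -> tuple[str, str]:
    """Return (repaired product top strand, duplicated target sequence).

    The two target strands are cut `stagger` bp apart; filling the resulting
    gaps copies the bases between the cuts onto both sides of the provirus.
    """
    top3, _ = three_prime_process(vdna_top)
    bottom3, _ = three_prime_process(revcomp(vdna_top))
    # Unpaired 5' nucleotides of the top strand are removed during repair.
    n_flap = len(vdna_top) - len(bottom3)
    provirus = top3[n_flap:]
    duplication = target_top[position: position + stagger]
    product = target_top[: position + stagger] + provirus + target_top[position:]
    return product, duplication


if __name__ == "__main__":
    vdna = "ACTGGAAGGGTTTAGCCAGT"      # hypothetical blunt-ended vDNA
    target = "AAACCCGGGTTTACGTACGT"    # hypothetical target DNA
    product, dup = integrate(vdna, target, position=5, stagger=TSD_BP["HIV-1"])
    print("duplicated target sequence:", dup)
    print("repaired product:", product)
```

The model ignores strand polarity details, target DNA bending, and the protein machinery entirely; it is meant only to make the bookkeeping of the released nucleotides and the flanking duplication concrete.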
The fact that 3′-processing of the vDNA can occur in the cell cytoplasm 21 presents a potential problem for retroviruses. The high local concentration of vDNA in the confines of the PIC makes it a potential target for strand transfer, and products of viral self-integration, called autointegration, are readily detectable in infected cells. 86−88 It is obviously in the best interest of retroviruses to avoid autointegration, and various mechanisms have been identified. The barrier-to-autointegration factor (BAF), a small DNA-binding protein that possesses the unusual ability to bridge and condense separate DNA molecules, 89,90 was identified via its ability to suppress the autointegration activity of MLV PICs in vitro. 91 However, it has yet to be determined whether BAF provides this function during MLV infection, and the lentiviruses seem to utilize other protective measures. The SET complex, which harbors a variety of DNA metabolizing enzymes, can suppress autointegration during HIV-1 infection, 92 while the viral capsid (CA) protein has been shown to play a role in regulating SIV autointegration. 88 Additional work should help to clarify whether there might be a universal mechanism or if, indeed, different viral species have evolved unique ways to protect themselves from suicidal integration as they move through the cell toward their preferred chromosomal DNA targets.
Recombinant IN proteins are generally proficient in supporting robust 3′-processing and strand transfer activities on short double-stranded oligonucleotide substrates that represent the U5 or U3 vDNA ends at the tips of the upstream and downstream LTRs, respectively. 71−74,76,77,93 However, depending on the viral source and the method of enzyme preparation, unpaired insertions of vDNA ends into target DNA often account for the bulk of the strand transfer products formed. HIV-1 IN is particularly prone to such aberrant strand transfer events (referred to as half-site integration). Carefully optimized HIV-1 IN in vitro assays developed over the past 15 years greatly improved the efficiency of paired integration of vDNA ends (i.e., full-site or concerted integration). 94 The underlying reasons for these differences between divergent IN proteins are not fully understood. The propensity of HIV-1 IN to self-associate into higher-order multimers in solution 105 could be a factor limiting its concerted integration activity in vitro. A third type of reaction that IN can catalyze in vitro is disintegration, representing a reversal of strand transfer. 106 Although this reaction is unlikely to be relevant in vivo, disintegration assays were instrumental in mapping the different IN protein domains and the enzyme active site. 107−115 Having evolved to catalyze a single reaction cycle, retroviral IN and transposases do not effectively dissociate from their reaction products, requiring active assistance from their host cells to disassemble the STC. 116,117 Because they do not turn over, these enzymes are used in stoichiometric and often superstoichiometric ratios to their DNA substrates.

Postintegration DNA Repair
Having executed concerted strand transfer to join both 3′ vDNA ends to host DNA, IN leaves a pair of single-stranded gaps and short 5′ overhangs flanking the vDNA, which are repaired by cellular enzymes (Figure 1b; reviewed in ref 118). Notwithstanding their importance, the final steps of integration are among the least studied aspects of retroviral replication. Because disorderly repair of these discontinuities may lead to disruption of proviral and chromosomal DNA integrity, the handover of the retroviral hemi-integrant to the host DNA repair machinery is likely an exquisitely choreographed process. Disassembly of the thermodynamically stable STC must be the first step toward repair of the vDNA−chromosome junctions. Active degradation of IN/transposase, which is well-established in the bacteriophage Mu transposition system, 116,117 may allow host cellular proteins to gain access to the sites of repair. Consistent with this model, HIV-1 IN is subject to ubiquitination and proteasome-dependent degradation when it is ectopically expressed in human cells. 119−121 RAD18, an E3 ubiquitin ligase involved in DNA gap repair, was shown to interact with HIV-1 IN. 122 However, a follow-up study observed that knockdown of RAD18 in target cells did not reduce HIV-1 infectivity. 123 Subsequently, von Hippel−Lindau binding protein 1, a cellular subunit of the prefoldin chaperone, was implicated in proteasome-mediated HIV-1 IN degradation and, moreover, was shown to be required for efficient viral replication. 124 Following STC disassembly, three separate DNA repair enzymatic activities are required to complete the integration process by joining 5′ vDNA ends to chromosomal DNA: a DNA polymerase, a 5′ flap endonuclease, and a ligase (Figure 1b). Early studies used biochemical approaches to test candidate cellular DNA repair enzymes in repair of retroviral integration products in vitro.
125−128 Synthetic model DNA containing a gap and 5′ overhang served as a template for testing candidate proteins. In these experiments a cocktail of host DNA repair proteins was challenged to polymerize across the gap, to cleave the two-nucleotide flap, and to ligate the resulting product. Base excision repair (BER) pathway enzymes DNA polymerase β, flap endonuclease 1 (FEN1), and ligase I were shown to be sufficient to repair the substrate. 125 DNA polymerase δ also supported repair in the presence of FEN1 and ligase I and was stimulated by its cofactor PCNA. Alternatively, viral RT and IN have been proposed to mediate postintegration repair. 106 In this model, RT polymerizes across the gap sequence, while 5′ flap removal and ligation of both strands are catalyzed by IN disintegration activity. However, RT- and IN-dependent gap repair would require a complicated juggling of the vDNA ends between the two viral enzymes and, more problematically, would involve assembly of an unusual complex with 5′ vDNA and freshly extended 3′ chromosomal DNA ends in the IN active site. Even though RT is capable of polymerizing through the gap in vitro, the reaction displayed poor fidelity. 125 Moreover, addition of IN to the reaction did not afford completion of the repair process. 125,127 Although HIV-1 IN interacts with FEN1 129 and can stimulate its activity in vitro, 126 convincing evidence for a role of IN in repair of the hemi-integrant during retroviral infection is lacking. Cell-based studies have by contrast highlighted a role for the BER pathway of oxidative DNA damage in lentiviral DNA integration. 130,131 Other studies have described that the alteration of the nonhomologous end-joining (NHEJ) pathway affects retroviral infection and survival of the infected cells. 22,132−134 While some have shown a role for active IN and, by extension, the vDNA hemi-integrant, in NHEJ activation, 132,134 a separate study concluded that the linear vDNA substrate was cytotoxic to cells.
22 As a result of integration across the major groove in target DNA and the subsequent gap repair, the proviral DNA is flanked by a short duplication of the target DNA sequence. The duplication size is virus-specific and ranges from 4 bp for MLV and PFV to 6 bp for HTLV-1, ASLV, and MMTV, while HIV-1 and other characterized lentiviruses generate 5-bp duplications (Table 1). 63
108 the domains were characterized in isolation using X-ray crystallography and/or nuclear magnetic resonance (NMR) spectroscopy (Figure 2a). 147−153 The NTD folds into a compact three-helical bundle stabilized by coordination of a Zn2+ ion by side chains of His and Cys residues comprising the invariant HHCC motif. 147,150 Often misidentified as a "zinc finger domain", the NTD is most closely related to helix−turn−helix DNA binding domains. 154 In spumaviral and ε- and γ-retroviral INs, the NTD is expanded by a ∼40 amino acid residue NTD extension domain (NED). 85,155 The CTD is the least conserved of the three canonical IN domains and features a Src homology 3 (SH3)-like β-barrel fold. 149,151 Although most structurally related to the Tudor family of chromatin binding domains, the IN CTD lacks the conserved hydrophobic cage used by Tudor domains to bind methylated Lys and Arg residues. During integration, the NTD and the CTD make important interactions with DNA substrates and play critical structural roles within the intasome assemblies. 85,156,157 The flexible linkers connecting these domains to the CCD lack sequence conservation and vary in length between retroviral genera. 155,158 The CCD harbors the active site of the enzyme composed of three invariant carboxylates comprising the signature D,D-35-E motif. 159 In isolation, the CCD can catalyze disintegration, but neither of the biologically relevant activities of full-length IN. 112
bacterial and viral RNase H enzymes, Holliday junction resolvase RuvC, prokaryotic and eukaryotic DD(E/D) transposases, and RAG1 recombinase.
148,160 This initial study and the majority of subsequent IN crystal structures captured the CCD in a recurring dimeric form (Figure 2a). Given the wide separation of IN active sites in the CCD dimer, it became immediately clear that a higher-order IN multimer must be responsible for concerted integration of vDNA ends. Similarly, four copies of the mechanistically related Mu phage transposase assemble into its core synaptic complex, with only two of the four protomers contributing active sites for the catalysis of DNA cleavage and strand transfer. 161−163 Of note, the HIV-1 virion packages as many as ∼250 copies of IN, 164,165 a large excess over the two active sites that are required to catalyze integration.
Over the last 20 years, considerable effort has been expended to establish the functional multimeric state of retroviral IN. The bulk of these studies has focused on HIV-1 IN, which was shown to form a variety of multimeric species. 166 MMTV and ASLV INs are dimeric proteins that assemble into octamers on vDNA ends. 155,181 Initial hints for the structural organization of the different IN protomers within functional intasomes came from results of biochemical complementation studies. 168,182−184 Mutant IN proteins that by themselves were defective for catalysis in vitro could recover significant levels of activity when studied as a mixture. In this way, IN NTD and CCD functions were shown to originate from different protomers within the multimeric complex. The first glimpse into the structural basis for IN multimerization beyond the canonical CCD dimerization was provided by the crystal structure of a two-domain construct spanning the NTD and the CCD of HIV-1 IN. 185 The structure revealed how pairs of IN dimers can interlock via formation of cross-dimer NTD−CCD interfaces (Figure 2b). Subsequently validated by site-directed mutagenesis, the critical NTD−CCD interface, predicted from prior biochemical studies, became the second recurrent feature in structures of diverse INs and later intasomes. 85,155,157,181,186 Further details on the partial structures of the preintasome era in retroviral IN structural biology can be found in recent reviews. 158,187,188

Architecture of the PFV Intasome
The first functional retroviral IN−DNA complex found to be amenable to structural characterization was the intasome from the spumavirus PFV. 85 Identified through comparative analysis of diverse orthologs, PFV IN was shown to be highly soluble and exceptionally active on short mimics of vDNA ends. 103,189 The relatively simple architecture of the PFV intasome will provide a basis to describe the structures of the more complex α- and β-retroviral intasomes, which were characterized very recently. 155,181 Although the U5 and U3 ends of retroviral LTRs are not identical, functional intasomes can generally be assembled with pairs of U5 sequences. Snapshots of such symmetrized PFV intasomes, visualized in the states prior to and after 3′-processing, as well as in complex with target host DNA prior to and after strand transfer, are now available. 156,157,190 The PFV intasome (Figure 3a) contains a tetramer of IN with a dimer-of-dimers architecture, composed of two structurally and functionally distinct IN subunits. The inner subunits (colored green and cyan in the figures) of each IN dimer are responsible for all interactions with vDNA, including provision of the active sites for catalysis. The outer IN subunits (shown in yellow) attach to the inner chains via the canonical CCD dimerization interface. The inner IN chains interact by forming intersubunit NTD−CCD contacts, and the two halves of the intasome are held together by the insertion of a pair of CTDs, which act as solid spacers between the CCD dimers (Figure 3a,b). The extended NTD−CCD and CCD−CTD linkers run parallel to each other, trussing the nucleoprotein assembly. The PFV intasome structures reported to date lack outer IN chain NEDs, NTDs, and CTDs, which were disordered in crystals and under cryo-electron microscopy (cryo-EM) conditions. 85,191 Some clues about the average positions of the outer chain NTD and CTD could be gleaned from solution-state X-ray and neutron scattering.
180 However, these domains are dispensable for PFV intasome assembly and strand transfer activity in vitro, and their functions await clarification. 192 The vDNA ends enter the IN dimer−dimer interface, providing both 3′ termini to the active sites of the inner IN subunits. The CCD, CTD, and NED of both inner IN chains make intimate interactions with the vDNA backbone and bases, consistent with vDNA sequence specificity of retroviral IN. 85 Three bases of each 5′ vDNA end are unpaired, tunneling through the CCD−CTD interface to protrude from the sides of the intasome structure (Figure 3c). Conceivably, such a protein−DNA complex can only form when the recognition sites are available on free vDNA ends, which makes the integration process fundamentally irreversible once postintegration repair is completed.

Engagement of Target DNA by the PFV Intasome
The PFV intasome assembled with blunt-ended vDNA ends readily undergoes 3′-processing in the presence of Mg2+ or Mn2+ ions. 156 Following dissociation of the cleaved 3′-dinucleotide, the saddle-shaped groove between the two halves of the intasome becomes available for host cell (target) DNA binding (Figure 4). The interaction primarily involves the target DNA backbone, which explains why IN shows only weak preferences for integration target site sequence. 156,157,193 Target DNA binds to the PFV intasome in a sharply bent conformation, such that the base pair step at the center of the integration site unstacks with a 60° roll (Figure 4). 157 The associated expansion of the major groove to over 26 Å allows the widely spaced intasome active sites to align with a pair of phosphodiesters separated by 4 bp on apposing strands in target DNA. The ability of DNA to form a sharp kink contributes to integration site selection at the level of nucleotide sequence, leading to a bias against rigid purine−pyrimidine base pair steps in the central positions of PFV integration sites. 157,194 The energy of DNA deformation is offset by the interactions with the synaptic CTDs and the inner IN CCDs of the PFV intasome. Predictably, mutations affecting these contacts can lead to increased bias for a bendable target DNA sequence. 157 Conversely, prebent and distorted DNA performs better as a target for retroviral integration. 195,196 The PFV intasome displays robust preference for chromatinized targets compared to naked DNA in vitro. 191,197 These observations agree with early studies that, albeit using poorly defined IN−vDNA complexes, reported preferences of HIV-1 and MLV for nucleosomes. 195,196,198−200 Moreover, nucleotide sequence periodicities observed in the immediate vicinity of retroviral integration sites are consistent with nucleosomes serving as targets for retroviral integration in vivo.
194,201,202 Given the nonuniform availability of the major groove and the structure of the underlying histone octamer, 203 it is not surprising that integration sites cluster in sharply defined hotspots along nucleosomal DNA. Recently, the PFV TCC containing the intasome and a nucleosome core particle was characterized by cryo-EM at 8 Å resolution (Figure 5a). 191 The structure revealed that nucleosomal DNA engaged in the target DNA binding groove of the intasome is lifted from the surface of the histone octamer to assume the distorted conformation compatible with strand transfer. Outside of the target DNA binding groove, the intasome makes supporting contacts with one H2A−H2B heterodimer and the second gyre of the nucleosomal DNA (Figure 5b). These interactions presumably compensate for the energy of stretching and deforming histone-wrapped DNA and explain the strong preference of PFV to target superhelix ±3.5 locations on nucleosomal DNA.
The strand transfer reactions occurring within the TCC that result in the joining of the 3′ vDNA ends to the host DNA mark the end of IN catalytic function. Like 3′-processing, strand transfer does not lead to global conformational rearrangements within the intasome structure. 156 It remains a mystery if and how IN signals completion of strand transfer for recruitment of STC disassembly and DNA repair machineries. Given the apparent preference of retroviruses to integrate into nucleosomes, 191,194,197 chromatin remodellers appear to be good candidates for the job of STC disassembly.
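The sequence bias discussed in this section, namely that rigid purine−pyrimidine steps are disfavored at the center of PFV integration sites, can be made concrete with a small classifier. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the example sites are invented, and the model reduces DNA flexibility to a single purine/pyrimidine label for the central dinucleotide step of an even-length (here 4-bp) site.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def step_class(step: str) -> str:
    """Label a dinucleotide step (5'->3') as RR, RY, YR, or YY."""
    if len(step) != 2 or not set(step) <= PURINES | PYRIMIDINES:
        raise ValueError("step must be two bases from A, C, G, T")
    return "".join("R" if base in PURINES else "Y" for base in step)


def central_step(site: str) -> str:
    """Return the dinucleotide step at the center of an even-length site."""
    if len(site) % 2:
        raise ValueError("a central step exists only for even-length sites")
    mid = len(site) // 2
    return site[mid - 1: mid + 1]


def has_rigid_center(site: str) -> bool:
    """True if the central step is purine-pyrimidine (RY), the class
    described in the text as disfavored at PFV integration sites."""
    return step_class(central_step(site)) == "RY"


if __name__ == "__main__":
    for site in ("AGCT", "GTAC"):  # invented 4-bp sites
        print(site, central_step(site), step_class(central_step(site)),
              "rigid center" if has_rigid_center(site) else "non-rigid center")
```

Because an RY step reads as RY on the complementary strand as well, the label does not depend on which strand of the target is written down. A realistic analysis would use measured integration site frequencies rather than a binary label.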

Mechanics of the IN Active Site
Although the first CCD structures were determined in the 1990s, the functional organization of the IN active site was revealed only in the context of the functional PFV IN−DNA complexes. Dubbed "the active site loop" in the early literature, 204 the IN residues connecting β5 and α4 of the CCD, often including the essential Glu of the D,D-35-E motif, are disordered in most partial IN crystal structures. As expected, the region folds into a defined structure through interactions with the vDNA end (Figure 3c). Given a high degree of amino acid conservation within the IN active site, the lessons learned from the high-resolution PFV structures should be generally applicable. Accordingly, the PFV intasome was successfully used as a model to study active site inhibitors of HIV-1 IN. 85,205,206 The active sites of retroviral IN and the related DD(E/D) transposases contain three carboxylates, which serve to coordinate a pair of essential divalent metal cofactors (Figure 6). Although both Mg2+ and Mn2+ are capable of supporting IN activities, due to its greater abundance, the former metal ion is believed to be primarily utilized in vivo. 207,208 The general mechanism of two-Mg2+-ion catalysis at a phosphodiester bond, initially proposed in the early 1990s, 209,210 has been corroborated by a growing body of experimental and theoretical work, including recent studies of Bacillus halodurans RNase H. 208,211−213 The primary coordination spheres of the Mg2+ ions include essential active site carboxylates, the substrate phosphodiester, and the attacking nucleophile (a water molecule in the case of RNase H). The strong preference of Mg2+ for octahedral coordination enforces precise relative positioning of the reactants and aids in destabilizing the target phosphodiester group. During catalysis, the ions act as Lewis acids, with metal A assisting in deprotonation of the nucleophile and metal B neutralizing the negative charge developing on the phosphorane intermediate.
The ability of Mn2+ to replace Mg2+ as a cofactor is explained by its similar size, pKa, and coordination geometry. Flavors of the two-Mg2+-ion mechanism are employed by a wide range of structurally and functionally diverse enzymes, including protein kinases, 214−216 DNA and RNA polymerases, 214,217,218 and even ribozymes. 219 Due to the absolute dependence of the retroviral IN active site on its metal cofactors, it was possible to grow crystals of wild type (WT) PFV IN engaged with its DNA substrates. When exposed to Mg2+ or Mn2+ salts, such apo forms of the intasome and the TCC readily undergo 3′-processing and strand transfer "in crystallo". Soaking the crystals in the presence of Mn2+ for very short periods of time (sufficient for diffusion of the salt through the crystal lattice but not catalysis) allowed freeze trapping the fully engaged configurations of the PFV nucleoprotein complexes in their ground states prior to 3′-processing and strand transfer. 156 For consistency with the RNase H literature, the ion cofactors coordinated to PFV IN residues Asp185 and Glu221 are designated metal A and metal B, respectively (Figure 6). The third carboxylate, Asp128, and a nonbridging oxygen atom from the scissile phosphodiester are shared between both ions. In the pre-3′-processing state, the octahedral coordination sphere of metal A is nearly perfect and includes a water molecule positioned for an in-line nucleophilic attack at the phosphorus atom of the scissile phosphodiester (Figure 6a). Due to simultaneous bidentate interactions with Glu221 and the phosphodiester, the environment of metal B cannot assume octahedral coordination. This departure from the preferred coordination geometry of metal B is thought to provide a destabilizing potential on the scissile phosphodiester. 208,212 Following 3′-processing and dissociation of the cleaved dinucleotide, metal B remains coordinated to the 3′ oxygen atom of the processed vDNA end. Consequently, it befalls metal B to activate the nucleophilic 3′ hydroxyl of vDNA during strand transfer (Figure 6b). Thus, retroviral IN uses the inherent symmetry of the two-Mg2+-ion mechanism to carry out the consecutive reactions of hydrolysis and transesterification. Because the latter step does not change the number of high-energy bonds, strand transfer should in principle be reversible, at least for as long as the STC persists. However, such unproductive reversal of strand transfer is prevented by relocation of the newly formed phosphodiester out of the active site. 157 This reconfiguration is likely driven by conformational strain due to target DNA deformation, which is a conserved feature of retroviral intasomes and DD(E/D) transposases. 157

ASLV and MMTV Intasome Structures
In the PFV intasome, a pair of CTDs from the inner IN protomers are inserted in the dimer−dimer interface. The synaptic CTDs provide rigidity to the assembly and contribute to the host DNA binding platform. Crucially, this architecture depends on an extended polypeptide linker to track the linear distance of ∼50 Å that separates the carboxyl terminal boundary of the inner chain CCD from the beginning of the CTD (depicted as a black curve in Figure 3b). Intriguingly, the length of the CCD−CTD linker is not conserved among retroviral IN proteins, ranging from 50 to 60 amino acid residues in γ- and ε-retroviruses and spumaviruses, to fewer than 10 residues in α- and β-retroviruses, whereas lentiviruses and δ-retroviruses possess intermediate-size linkers. 155,158 Modeling suggested that the lentiviral IN CCD−CTD linker may stretch sufficiently to allow formation of the tetrameric intasomal architecture similar to that in PFV. 205,223 However, this scenario is clearly not conceivable in the cases of α- and β-retroviral IN proteins.
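A back-of-the-envelope estimate shows why a linker of fewer than 10 residues cannot cover a ∼50 Å separation. The sketch below is a rough upper bound only: it assumes a fully extended chain with about 3.8 Å between consecutive Cα atoms (a standard textbook approximation, not a value taken from this review) and treats the path as a straight line, which real linkers are unlikely to follow. All names are hypothetical.

```python
import math

# Assumption: approximate spacing between consecutive C-alpha atoms in a
# fully extended polypeptide chain (textbook value, an upper bound in practice).
CA_CA_DISTANCE_A = 3.8
# Linear CCD-to-CTD distance quoted in the text for the PFV intasome.
REQUIRED_SPAN_A = 50.0


def max_linker_span(n_residues: int) -> float:
    """Upper bound (in angstroms) on the distance a linker could span."""
    return n_residues * CA_CA_DISTANCE_A


def can_bridge(n_residues: int, required: float = REQUIRED_SPAN_A) -> bool:
    """True if a fully extended linker of this length could reach `required`."""
    return max_linker_span(n_residues) >= required


def min_residues_to_bridge(required: float = REQUIRED_SPAN_A) -> int:
    """Smallest linker length whose fully extended span reaches `required`."""
    return math.ceil(required / CA_CA_DISTANCE_A)


if __name__ == "__main__":
    for n in (9, 50, 60):  # "fewer than 10" versus the 50 to 60 residue range
        print(f"{n:2d} residues: at most {max_linker_span(n):.1f} A, "
              f"bridges 50 A: {can_bridge(n)}")
    print("minimum residues for 50 A:", min_residues_to_bridge())
```

Under these assumptions a 9-residue linker reaches at most about 34 Å, well short of the required distance, whereas linkers of 50 to 60 residues have ample length to spare.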
Recently, the STC from ASLV and the intasome from MMTV were characterized by X-ray crystallography at 3.8 Å and cryo-EM at ∼5 Å resolution, respectively. 155,181 Unexpectedly, the structures revealed that the α- and β-retroviruses maintain a PFV-like core intasomal structure by employing additional IN dimers (referred to as flanking dimers) to source the pair of synaptic CTDs (Figure 7a,b). Strikingly, these intasomes contain homo-octamers of IN, each with four structurally and functionally distinct types of subunits. The intasomal core structure is further decorated by NTDs and CTDs belonging to the eight IN subunits. Locations of six CTDs, including the four provided by the flanking dimers, are conserved between the ASLV and MMTV intasome structures, and these make direct contacts with vDNA. The CCDs of the flanking IN dimers are considerably less defined in the cryo-EM structure, and their positions differ between the ASLV and MMTV intasomes (Figure 7a,c). Intriguingly, their locations are consistent with their potential roles in interactions with host DNA. Indeed, contacts between the flanking CCDs and the backbone of the target DNA are observed in the ASLV STC (Figure 7a, bottom panel). 181

Depending on the genomic location, the local chromatin environment of the provirus may be conducive to active viral expression or transcriptional silencing. Thus, the choice of integration site influences the level of ongoing viral replication and may contribute to the establishment of latent viral reservoirs. 224−228 In fact, the propensity of HIV-1 to establish chronic and hitherto incurable infection is the direct consequence of its ability to establish latent reservoirs. 225,229,230 While the interaction of IN with vDNA ends is nucleotide sequence- and structure-specific, the enzyme displays very little selectivity with regard to the host DNA. Alignments of retroviral integration sites revealed weak, virus-specific palindromic consensus sequences that do not extend farther than several bp from the integration site. 193,194,231−233 These weak nucleotide sequence preferences are in part explained by the sparse interactions between IN and target DNA bases. 157,194 However, far more interesting patterns emerge when the distributions of retroviral integration sites are scrutinized on the genomic scale.
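The notion of a weak palindromic consensus at aligned integration sites can be illustrated with a minimal sketch. The sequences below are invented toy data, not real integration sites, and the helper names (`consensus`, `is_palindromic`) are hypothetical; the sketch only shows how a per-position majority consensus is derived from aligned target sequences and then tested for reverse-complement symmetry.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def consensus(aligned_sites):
    """Most frequent base at each position of equal-length aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_sites)
    )


def is_palindromic(seq):
    """A DNA palindrome reads the same as its reverse complement."""
    return seq == reverse_complement(seq)


# Hypothetical aligned target sequences centered on the integration site.
toy_sites = ["GTTAAC", "GTAAAC", "GTTAAC", "CTTAAC", "GTTAAG"]
toy_consensus = consensus(toy_sites)
print(toy_consensus, is_palindromic(toy_consensus))
```

Real analyses work with many thousands of sites and report per-position base frequencies rather than a single majority base, precisely because the preferences are weak.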
Early research conducted in the 1970s and 1980s indicated that MLV integration may be associated with sites of DNase I hypersensitivity in the host cell genome. 234−236 The availability of the draft human genome sequence 237,238 and the relative ease of recovering vDNA−chromosomal junctions using PCR allowed Bushman and colleagues to address the distribution of HIV-1 integration sites in their landmark 2002 study. 239 The field was subsequently bolstered by the advent of deep sequencing, which now permits recovery of hundreds of thousands of unique integration sites in a single infection experiment. 201,240−242 It has emerged that retroviruses display distinct and contrasting preferences for various host cell genomic features (reviewed in ref 243). Thus, HIV-1 and other lentiviruses display strong preferences for transcription units with a sharp bias toward highly expressed and intron-rich genes. 201,239,240,244 MLV and other γ-retroviruses strongly favor promoter regions and DNase I hypersensitive sites in general. 194,235,236,241,242,245,246 In sharp contrast, the spumavirus PFV disfavors genes and loci of active transcription. 191,247,248 It appears that the least selective retroviruses are from the α- and β-retrovirus genera, which show nearly random integration site distributions with respect to well-mapped genomic features. 244,249,250 Strong evidence implicated IN as a major determinant for integration site selection. Thus, implanting MLV IN into HIV-1 results in a chimeric virus with an integration site preference that is closer to that of MLV. 246 However, the same study found a subtler role for viral gag gene products, and more recent work has shown that amino acid substitutions in viral CA protein have considerable bearing on HIV-1 integration site distributions. 246,251,252
A hallmark of lentiviruses is their ability to infect nondividing cells, with their PICs capable of traversing the nuclear envelope through the nuclear pore complex (NPC), 253,254 while many other retroviruses need the nuclear envelope to break down during mitosis to access host chromatin. 255−259 It seems plausible that CA may modulate HIV-1 PIC nuclear entry pathways, potentially via interactions with cleavage and polyadenylation specificity factor 6 (CPSF6) and/or nucleoporins (NUPs), which are the constitutive components of the NPC. HIV-1 CA has been shown to  262 and NUP153, 263 and depletion of each of these factors significantly reduced the frequency of HIV-1 integration into gene-rich regions. 251,262,264,265 The involvement of several NPC components in integration site selection is consistent with the observation that HIV-1 PICs preferentially target highly expressed genes in the nuclear periphery that are proximal to the nuclear pore. 266,267 Conversely, HIV-1 integration is excluded from internal nuclear regions as well as from lamina-associated domains. 267
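The genus-specific preferences described above are typically quantified as the fraction of mapped integration sites that fall within a given genomic feature. The following is a minimal sketch under simplifying assumptions: the coordinates are invented, the function names are hypothetical, all genes are treated as lying on the forward strand (so the gene start doubles as the transcription start site), and the ±1 kb window is an arbitrary illustrative choice.

```python
def fraction_in_genes(sites, genes):
    """Fraction of (chrom, pos) sites that lie inside any (chrom, start, end) gene."""
    hits = sum(
        any(chrom == g_chrom and start <= pos <= end for g_chrom, start, end in genes)
        for chrom, pos in sites
    )
    return hits / len(sites)


def fraction_near_tss(sites, tss_positions, window=1000):
    """Fraction of sites within `window` bp of any (chrom, pos) start site."""
    hits = sum(
        any(chrom == t_chrom and abs(pos - tss) <= window for t_chrom, tss in tss_positions)
        for chrom, pos in sites
    )
    return hits / len(sites)


# Hypothetical annotation and integration sites (toy coordinates).
toy_genes = [("chr1", 1000, 5000), ("chr2", 200, 900)]
toy_tss = [(chrom, start) for chrom, start, _ in toy_genes]
toy_sites = [("chr1", 1500), ("chr1", 4800), ("chr1", 9000), ("chr2", 150)]

in_genes = fraction_in_genes(toy_sites, toy_genes)
near_tss = fraction_near_tss(toy_sites, toy_tss)
print(in_genes, near_tss)
```

In practice such fractions are only meaningful when compared against matched random control sites, since the baseline depends on how much of the genome each feature occupies.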

The Nexus between Lentiviruses and LEDGF/p75
Whereas roles of retroviral structural proteins in PIC trafficking are only starting to emerge, the IN-dependent mechanisms of integration site selection are well-established. Lentiviruses and γ-retroviruses find their preferred genomic locations via recognition of specific chromatin-associated cellular proteins, which act as receptors or tethering factors for the PICs. Lens epithelium-derived growth factor (LEDGF) is a ubiquitous cellular chromatin-associated protein, 268 initially described as transcriptional coactivator p75. 269,270 Although a proposed extracellular function of the protein or specific roles in lens epithelium development were not corroborated, the misnomer has persisted in use. The protein was identified as a dominant HIV-1 IN binding partner in affinity-capture and yeast two-hybrid screening experiments 171,271 and was later shown to interact with and stimulate the enzymatic activities of divergent lentiviral INs. 102,272,273 LEDGF/p75 is composed of 530 amino acid residues and contains two small structured domains: an N-terminal PWWP domain and a C-terminal IN binding domain (IBD) (Figure 8a). 274,275 An alternative splice form, LEDGF/p52, 269 lacks the IBD (Figure 8a) and consequently neither interacts with IN nor has an effect on integration. 276 The extended flexible regions of LEDGF/p75 harbor a classical importin α/β-dependent nuclear localization signal (NLS) 277 and a pair of AT-hook motifs implicated in DNA binding. 278,279 The PWWP domain belongs to the Tudor family and was shown to bind nucleosomes trimethylated on Lys36 of the histone H3 tail (H3K36me3), an epigenetic mark associated with transcription elongation and enriched within transcription units. 280−282 Knockout of the gene encoding LEDGF does not affect cell proliferation in tissue culture but results in enhanced neonatal mortality and developmental abnormalities in mice. 283,284
While its precise cellular functions remain to be elucidated, LEDGF/p75 reportedly interacts with a range of functionally diverse proteins, including the basal RNA polymerase cofactor PC2, mRNA splicing factors, multiple endocrine neoplasia type 1 protein product (menin), methyl CpG binding protein 2 (MeCP2), the end-resection protein CtIP, transcription factor JPO2, and the activating subunit of Cdc7 kinase. 240,269,285−291 LEDGF/p75 was shown to recruit some of the binding partners to chromatin, 276,285,290 suggesting that it may function as an adaptor protein in various chromatin-bound transactions.
Knockdown of LEDGF/p75 abolished the ability of ectopically expressed HIV-1 IN to bind chromatin, which provided the first hint about its role in lentiviral replication. 24,276 However, the functional significance of the LEDGF/p75-IN interaction in vivo was initially unclear because the first attempts at LEDGF/p75 knockdown ostensibly failed to affect HIV-1 infectious titer and yielded only modest reductions in integration. 24   on a single-molecule integration event, even very limited levels of chromatin-associated LEDGF/p75 can be sufficient to support efficient viral replication. 295 Clearer results came from infections conducted under conditions of intensified LEDGF/p75 knockdown and genetic knockout, which revealed considerable decreases of HIV-1 infectivity with the specific defect at the integration step. 295−299 The analysis of distribution of residual HIV-1 integration sites in LEDGF/p75-depleted cells demonstrated a dramatic loss of transcription unit targeting concomitant with a significant increase of integration near transcription start sites. 296,299,300 The WT phenotype can be rescued upon restoration of LEDGF/p75 expression, confirming a role of the host factor in the targeting mechanism. Furthermore, by swapping the PWWP domain of LEDGF/p75 for alternative chromatin binding domains, it was possible to redirect HIV-1 integration toward chromatin regions bound by the heterologous tether. 301−303 Collectively, the results support a model whereby LEDGF/p75 acts as a chromatin-bound tether that anchors the PIC by engaging its IN component in a direct protein−protein interaction. The results of a recent study that analyzed integration distribution patterns in cells knocked out for LEDGF/p75, CPSF6, or both factors clarified that the CA binding protein CPSF6 predominantly directs the HIV-1 PIC to actively transcribed euchromatin, where LEDGF/p75 determines positions of integration along gene bodies. 265
LEDGF/p75 is additionally involved in HIV-1 latency by recruiting, post-integration, host factors IWS1 and SPT6 to the LTR, leading to the silencing of the provirus. 226 Rapid silencing of the integrated viral genome is frequently observed during HIV-1 infection in proliferating CD4+ T cells. 304−306 By interacting with LEDGF/p75, HIV-1 integration and transcriptional silencing may thus be coordinated with the establishment of long-lived viral reservoirs. 226 The solution structure of the LEDGF/p75 IBD, determined using NMR, revealed a compact domain comprising a pair of HEAT repeat-like α-helical hairpins. 307 On the virus side, the IN CCD is essential and minimally sufficient for the interaction with LEDGF/p75, whereas the NTD is required for high-affinity binding. 276 X-ray crystallography was used to visualize atomic details of the virus−host interaction (Figure 8b,c). 98,186,308 The hairpin loops at the tip of the elongated IBD structure contact the IN CCD dimer, with side chains of LEDGF/p75 residues Ile365 and Asp366 buried in a small pocket at the CCD dimerization interface, making hydrophobic interactions and a bifurcated hydrogen bond with the IN backbone, respectively. The protein−protein interaction is enhanced by intermolecular contacts involving Lys401, Lys402, and Arg405 on the basic side of the IBD and a conserved cluster of carboxylates on lentiviral IN NTDs. 98 Crucially, LEDGF/p75 contacts both the CCD and the NTD, which cooperate in higher-order IN multimerization and intasome assembly. It is not surprising, therefore, that the host factor enhances tetramerization and the strand transfer activity of HIV-1 IN in vitro. 98,186,309 The pocket on the surface of the HIV-1 IN CCD involved in the interaction with the host factor has been targeted by small-molecule inhibitors, which in their binding mode to IN mimic the interactions made by LEDGF/p75 Ile365 and Asp366 46,310−315 (Figure 8d,

γ-Retroviruses and BET Proteins
As LEDGF/p75 was shown to target lentiviral integration, it seemed possible that other retroviruses might similarly rely on genus-specific cellular factors to direct integration toward preferred genomic loci. Hence, it did not come as a surprise when it was discovered that γ-retroviruses hijack cellular transcription factors as targeting factors. 129,319−321 Highly related transcription factors BRD2, BRD3, and BRD4 belong to the bromodomain and extra-terminal (BET) family. These proteins play major roles in transcription regulation 322,323 and had already been implicated in host−pathogen interactions. 324−326 The characteristic features of BET members are two tandem bromodomains and a highly conserved extraterminal (ET) domain within their N- and C-terminal regions, respectively (Figure 9a). 243 Bromodomains belong to the well-studied group of chromatin readers with specificity for acetylated histone tails, 327 while the ET domains were implicated in binding a range of cellular and viral proteins. 323 In particular, BRD4 was shown to recruit P-TEFb to its target promoters to facilitate transcription elongation of cellular genes. 328 Papillomaviruses tether their genomes to mitotic chromosomes via a direct interaction between the viral E2 protein and the C-terminal motif of BRD4, which allows stable segregation of vDNA copies between daughter cells. 326,329 The latent nuclear antigen of Kaposi's sarcoma associated herpesvirus, essential for the viral episome maintenance and transcription, interacts with the ET domains of BET proteins. 330,331 Pull-down experiments revealed a direct high-affinity interaction of γ-retroviral INs with the ET domains of BET proteins; moreover, the latter potently stimulated IN strand transfer activity in vitro. 194,319−321 Finally, the function of BET proteins in the context of MLV replication was demonstrated using small-molecule inhibitors of the bromodomain−chromatin interaction as well as siRNA-mediated knockdown. 
The small molecules inhibited MLV but not HIV-1 integration in a dose-dependent manner. 319−321 Notably, MLV integration sites correlate with binding sites of BET proteins as determined by chromatin immunoprecipitation studies. 319,321,332 Treatment of cells with BET inhibitors or a siRNA cocktail specific to BRD2−4 mRNAs significantly reduced the preference of MLV to integrate near transcription start sites. 319,320 As a complementary approach, LEDGF/p75-BRD4 hybrid proteins, containing the LEDGF/p75 PWWP and BRD4 ET domain, retargeted MLV integration toward active transcription units and away from transcription start sites, 321 confirming the major role of the BET proteins in γ-retroviral integration site selection.
Despite highly similar functional consequences, the structural bases for binding of lentiviral and γ-retroviral IN proteins to their cognate host factors are strikingly different. In contrast to lentiviruses, which engage the LEDGF/p75 IBD using a quaternary assembly of IN domains (minimally a CCD dimer), γ-retroviruses bind BET proteins using extended C-termini characteristic of the INs of this genus. 321,333,334 The solution structure of the complex between the BRD4 ET domain and a conserved C-terminal ET binding motif (EBM) of MLV IN showed that the interaction involves the formation of an intermolecular three-stranded antiparallel β-sheet (Figure 9b). 335 The folding of the interface only occurs upon binding of the two partners, as both the C-terminal tail of MLV IN and the BRD4 ET domain loop are unstructured on their own. The protein−protein interface contains a set of hydrophobic interactions involving buried side chains from both β6 and β7 of MLV IN and residues from helices α1 and α2 and the β1 strand of the ET domain (Figure 9b). The interaction further depends on complementary electrostatics between the negatively charged amino acids from ET domain strand β1 and the highly conserved positively charged residues of MLV C-terminal β7. Mutation of the critical conserved amino acids showed a strong reduction of binding affinity as well as a shift of the integration pattern away from transcription start sites. 320,334−336 Interestingly, BRD4 residues involved in the interaction with MLV IN were also shown to be important for binding its cognate cellular cofactors. 335 Therefore, it seems likely that γ-retroviral IN evolved its C-terminal tail to mimic a cellular BET binding protein in order to optimize integration into transcriptionally active regions.

Integration Site Selection by Other Retroviruses and LTR Retrotransposons
The mechanisms of integration site selection employed by lentiviruses and γ-retroviruses present a remarkable case of convergent evolution, with both genera usurping cellular readers of the histone code to locate optimal target sites. Therefore, it is tempting to speculate that other retroviral genera may use similar strategies. It was recently reported that HTLV-1 and other δ-retroviral INs specifically interact with the B′ protein phosphatase 2A (PP2A) regulatory subunits; moreover, recombinant B′ proteins stimulated concerted integration activity of the δ-retroviral INs in vitro. 337 Although not a classic chromatin binder, PP2A was implicated in dephosphorylating chromatin-resident targets. 338,339 However, it remains to be determined whether PP2A is involved in directing δ-retroviral integration.
LTR retrotransposons, such as well-studied Ty and Tf elements from budding and fission yeasts, share most features of their replication cycle with retroviruses, although they complete it within the cell where they reside. 340 The fundamental difference between a transposon and a virus is that the fate of the former wholly depends on the fitness of the host organism. Therefore, yeast LTR retrotransposons avoid instigating harmful insertional mutagenesis on the cell by precisely targeting new integration events to safe loci, and they achieve it by utilizing IN-binding host proteins. Thus, Ty5 retrotransposition is directed into transcriptionally silent regions of the Saccharomyces cerevisiae genome via the interaction between IN and the heterochromatin maintenance protein Sir4p. 341 Of note, the C-terminal peptide of Ty5 IN engages a patch on Sir4p, which is also recognized by the cellular interacting partner Esc1. 342,343 Another S. cerevisiae retrotransposon, Ty1, engages the AC40 subunit of RNA polymerase III in a direct protein−protein interaction with IN, for specific integration upstream of RNA polymerase III transcribed genes. 344

HIV-1 IN Strand Transfer Inhibitors (INSTIs)
Its essential role in viral replication and the lack of functional equivalents in human cells made IN an ideal target for anti-HIV/AIDS drug development. Intense interest and early screening efforts notwithstanding, the first class of small molecules capable of inhibiting HIV-1 replication by blocking IN was reported only in 2000. 345 The key to identification of these molecules was the use of preassembled HIV-1 IN−vDNA complexes in screening assays. 346 Empirical optimization of the original "diketo acid" pharmacophore led to the discovery of MK0518, now widely known as raltegravir (RAL), 347 Figure 10a). 349,350 Consistent with the method of their initial identification, INSTIs engage the active site of IN only when it is in complex with the vDNA end, competing with target DNA for binding to the intasome. 85 INSTIs specifically inhibit the strand transfer reaction, although they are capable of affecting 3′-processing at greatly elevated concentrations. 345,351 These small molecules possess unusually tight binding to the HIV-1 intasome, with dissociative half-times measuring in hours (for EVG or RAL) or even days (for DTG). 352,353 This property is likely very important for an inhibitor that blocks function of a long-lived complex, such as the PIC, which is geared for a one-off reaction event: to be effective, an INSTI must remain associated with the intasome until the cell destroys the PIC.
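The practical meaning of a dissociative half-time can be made concrete with a short calculation. The sketch below assumes simple first-order dissociation with no rebinding, and the two half-times used (8 h and 72 h) are purely illustrative placeholders chosen to represent "hours" versus "days"; they are not measured values for any particular INSTI, and the function name is hypothetical.

```python
def fraction_bound(elapsed_hours, half_time_hours):
    """Fraction of intasome-inhibitor complexes still intact after a given time,
    assuming first-order dissociation and no rebinding of the inhibitor."""
    return 0.5 ** (elapsed_hours / half_time_hours)


# Illustrative (hypothetical) half-times: one on the scale of hours, one of days.
for label, half_time in [("hours-scale inhibitor", 8.0), ("days-scale inhibitor", 72.0)]:
    remaining = fraction_bound(24.0, half_time)
    print(f"{label}: {remaining:.2f} of complexes remain bound after 24 h")
```

Under these assumptions, a threefold difference in elapsed time relative to the half-time separates an inhibitor that has largely dissociated within a day from one that remains mostly bound, which is why residence time matters for a complex that fires only once.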
Despite their apparent chemical diversity, the INSTIs share two common functionalities: a Mg2+ chelating core, usually a triad of oxygen atoms attached to a rigid scaffold (colored red in Figure 10a), and a flexibly linked aromatic side chain, typically a halobenzyl group (shown in blue). Due to the conservation of the retroviral IN active site, these small molecules display broad-spectrum activity against diverse retroviruses. 103,354,355 Accordingly, the structural basis for INSTI action could be characterized in the context of the PFV intasome. 85,205,206,356 Soaking the PFV intasome crystals with the different INSTIs invariably results in binding of the drug at the active site. In each studied case, the small molecules engaged the catalytic pair of Mg2+ ions in the IN active site (Figure 10b). 85 Here, the triad of metal-chelating heteroatoms of the small molecule closely imitates interactions made by the oxygen atoms of the scissile phosphodiesters and the respective nucleophiles during 3′-processing and strand transfer. 156 The aromatic side chain of the INSTI assumes the position normally occupied by the base of the deoxyadenosine on the processed 3′ vDNA end, intercalating between the base of the penultimate deoxycytidine and a short 3₁₀ helix (designated η in Figure 10b) containing conserved PFV IN residues Pro214 and Gln215, which are equivalent to HIV-1 IN Pro145 and Gln146, respectively. INSTIs also make variable contacts with PFV IN Tyr212 (corresponding to HIV-1 IN Tyr143). In particular, the oxadiazole ring of RAL stacks with the phenolic side chain (Figure 10b, middle panel) and additionally makes a hydrogen bond to the backbone amide of the Tyr residue. Other compounds, such as EVG and DTG, make much less extensive van der Waals contacts with the Tyr side chain.
Crucially, by directly engaging the catalytic metals and displacing the 3′ vDNA nucleotide, the INSTIs are incompatible with target DNA binding and strand transfer. The requirement to displace the 3′ adenosine from its natural position accounts for the slow kinetics of INSTIs binding to the intasome. 357 A further energetic penalty associated with disengagement of a phosphodiester group from the Mg2+ ions in the intasomal active site explains the relative ineffectiveness of INSTIs to inhibit IN 3′-processing activity. 156 Adapted to operate on the bulky DNA substrate, the intasome harbors a voluminous active site, which is not ideal for the development of small-molecule inhibitors. X-ray structures of the PFV intasome prior to 3′-processing and strand transfer elucidated additional features of the DNA substrates that could potentially be mimicked in future INSTI design. 156 Furthermore, design of small molecules that more completely fill the substrate envelope within the intasome active site could lead to improved inhibitory properties. 358

Viral Resistance to INSTIs
The clinical use of INSTIs has seen the emergence of HIV-1 variants with high-level resistance to RAL and EVG (for detailed reviews see refs 359, 360).
Although not yet documented in INSTI naïve cohorts, resistance to DTG has been described in patients that initially failed RAL-based therapy. 361−363 Due to their identical modes of action, INSTIs show substantially overlapping profiles of HIV-1 resistance mutations. The major genetic pathways leading to RAL resistance and virologic failure in patients are associated with substitutions of HIV-1 IN residues Tyr143 (typically to Cys or Arg), Asn155 (to His), and Gln148 (to His or Arg). 364 Although, on their own, the primary substitutions result in modest levels of drug resistance, their effects are greatly amplified by secondary mutations. Most notably, combinations of Q148R/H with G140S/A in HIV-1 IN result in a loss of viral susceptibility to RAL and EVG, as well as substantial levels of resistance to DTG. 361,362 In the absence of HIV-1 intasome crystals, the PFV model was instrumental to shed some light on the mechanism of viral resistance to INSTIs. 206 The observation that the oxadiazole ring of RAL interacts extensively with the aromatic side chain of Tyr212 readily explained why substitutions of HIV-1 IN residue Tyr143 cause viral resistance to RAL. 85 Because EVG and DTG make only weak contacts with the Tyr residue, their activities are largely unaffected by its substitutions. 206 In contrast to Tyr212, PFV IN residues corresponding to HIV-1 Gln148 and Asn155 (Ser217 and Asn224, respectively) do not make direct interactions with the INSTIs. Nevertheless, akin to the effects of the analogous mutations in HIV-1 IN, S217H and N224H reduced susceptibility of PFV IN to RAL in vitro. 206,356 Crystal structures revealed that the amino acid substitutions result in subtle but significant deformations of the intasomal active site. Expectedly, binding of the relatively rigid INSTIs requires the mutant active site to adopt a WT-like conformation. 
The energetic cost associated with the rehabilitation of the active site was proposed to be the reason for the apparent reduction in drug binding affinity. 206 A Ser or an Ala residue at HIV-1 IN position 140 is predicted to make direct contacts with the side chain of His148, helping to explain the coevolution of the G140S and Q148H mutations. 206 Ostensibly, a conformational adaptation to a large substrate, such as target DNA or chromatin, which make extensive interactions outside of the active site, will be offset to a lesser degree than small-molecule binding. The difference may provide the mutant viruses a sufficient selective advantage in the presence of the drug. This model accordingly predicts that INSTIs that make more extensive interactions with immutable features of the IN active site will be less affected by mutations. Indeed, INSTIs such as DTG and MK2048, which display relatively long dissociative half-times from the WT HIV-1 intasome, are considerably less affected by the "shape-shifting" Q148H/R and N155H mutations. 356,361,362 These small molecules, commonly referred to as second-generation INSTIs, tend to make van der Waals contacts to the main chain of the β4-α2 loop of the IN active site. 206,356,358,365 Bulkier compounds, which occupy the substrate envelope of the intasomal active site more completely, tend to be more active against the classic RAL-resistant strains. 358 Additionally, main chain IN amides involved in the interaction with the scissile phosphodiesters of the viral and target DNA substrates may provide useful immutable bonding points for the next-generation of INSTIs. 156
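Because the resistance discussion moves back and forth between PFV and HIV-1 residue numbering, a small lookup can help keep the equivalences straight. The mapping below contains only the residue pairs stated in the text (PFV Tyr212, Pro214, Gln215, Ser217, and Asn224 versus HIV-1 Tyr143, Pro145, Gln146, Gln148, and Asn155); the function name and the mutation-string format are hypothetical conveniences, and the sketch makes no claim about positions not listed.

```python
import re

# PFV IN position -> (PFV wild-type residue, HIV-1 wild-type residue, HIV-1 position),
# restricted to the equivalences given in the text.
PFV_TO_HIV1 = {
    212: ("Y", "Y", 143),
    214: ("P", "P", 145),
    215: ("Q", "Q", 146),
    217: ("S", "Q", 148),
    224: ("N", "N", 155),
}


def pfv_to_hiv1_mutation(mutation):
    """Translate a PFV IN substitution such as 'S217H' into HIV-1 IN numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, substitution = match.group(1), int(match.group(2)), match.group(3)
    if position not in PFV_TO_HIV1:
        raise KeyError(f"no HIV-1 equivalent recorded for PFV position {position}")
    pfv_wt, hiv_wt, hiv_position = PFV_TO_HIV1[position]
    if wild_type != pfv_wt:
        raise ValueError(f"PFV position {position} is {pfv_wt}, not {wild_type}")
    return f"{hiv_wt}{hiv_position}{substitution}"


print(pfv_to_hiv1_mutation("S217H"), pfv_to_hiv1_mutation("N224H"))
```

The wild-type check guards against silently translating a mistyped substitution; note that the wild-type residue itself can differ between the two enzymes, as at PFV Ser217 versus HIV-1 Gln148.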

Emerging Allosteric Inhibitors of HIV-1 IN
Despite the great success of the combinatorial therapeutic approach against HIV/AIDS, new infections emerge from drug-resistant strains. In addition to developing new derivative compounds against current targets, new drugs that inhibit untapped steps of the HIV-1 replication cycle are of great importance. In the case of IN, drugs that bind at positions other than the active site have the advantage of remaining, in theory, potent against INSTI-resistant strains. Among the noncatalytic site inhibitors described so far, the most promising molecules target the LEDGF/p75 binding pocket at the HIV-1 IN CCD dimerization interface; these new molecules, which go by a variety of names (see refs 366, 367 for detailed reviews), will be referred to here as allosteric IN inhibitors (ALLINIs, Figure 8d).
Results of biochemical and cell-based infection assays provided initial evidence that the CCD−CCD interface might serve as a target for antiviral drug development. X-ray crystallography was used to screen for small-molecule binders of the HIV-1 IN CCD, and micromolar concentrations of one compound, 3,4-dihydroxyphenyltriphenylarsonium bromide, which engaged the CCD dimer interface at a region that was later confirmed as the LEDGF/p75 binding site, inhibited IN 3′-processing and strand transfer activities in vitro. 368 Overexpression of LEDGF/p75 IBD-containing proteins that lacked chromatin-binding activity moreover inhibited HIV-1 replication at the integration step. 295,369 Further highlighting the host factor-binding pocket for drug development, the combination of IBD protein overexpression with LEDGF/p75 knockdown could effectively cripple HIV-1 infection. 370 The most advanced ALLINIs, derived from quinoline-based acetic acid, were independently discovered using two different approaches. Debyser and colleagues 312 used the HIV-1 IN CCD-LEDGF/p75 IBD cocrystal structure to screen in silico for LEDGF/p75 binding site inhibitors, whereas another group discovered highly similar small molecules in a high-throughput screen for antagonists of IN 3′-processing activity. 313,371 Optimized ALLINI compounds are highly potent, inhibiting HIV-1 replication with 50% effective concentration (EC50) values in the low (∼10−100) nanomolar range. 310,313,315,317 This family of small molecules induces IN multimerization and inhibits IN catalysis 310,311,316,317 and IN-LEDGF/p75 binding 310−312,316,317 in vitro. Cocrystal structures of the inhibitors with the IN CCD dimer revealed that the molecules are bona fide allosteric inhibitors, as they engage the LEDGF/p75 binding site, which is distal from the enzyme active site. 
46 pharmacophore is the carboxylic moiety (shown red in Figure 8d), which hydrogen bonds with the backbone amides of IN residues Glu170 and His171, mimicking the LEDGF/p75 Asp366 bidentate interaction with IN (Figure 8e). Another important feature is an aromatic side chain (blue in Figure 8d), which mimics hydrophobic interactions made by Ile365 of LEDGF/p75. Initial experiments expectedly unveiled inhibition of integration during HIV-1 infection, with drug resistance mapping to the IN coding portion of the viral pol gene. 310,312 However, follow-up work revealed that the compounds are far more potent when they are present during the late phase of the HIV-1 lifecycle as compared to the acute phase of infection, when reverse transcription and integration occur. 45,46,317,372 Consistent with the in vitro data, ALLINIs induce HIV-1 IN multimerization in the context of virus particles, which results in a catastrophic defect during virion maturation. 45,46,372 The viral ribonucleoprotein (RNP) complex, composed mainly of viral RNA and nucleocapsid protein, is normally housed within a conical core composed of the CA protein. ALLINIs apparently uncouple the internal placement of the RNP within the core, yielding so-called "eccentric" virions with the RNP situated outside of the core, usually in association with the viral membrane. 45−47,372 Virion protein, RNA, and cellular tRNA content are unaffected by ALLINI treatment, 45,46,372,373 and the defective virions accordingly support normal levels of endogenous RT activity in vitro. 45,373 When present during the acute phase of HIV-1 infection, ALLINIs specifically inhibit integration without affecting the preceding reverse transcription step. 45,46,310,312,315,317 Although drug-treated particles enter target cells normally, 45,46 they are reportedly defective for reverse transcription, integration, 45,46,315,372 and nuclear import of the PIC. 372
Careful comparisons of dose−response curves for inhibition of particle maturation, reverse transcription, and HIV-1 infection, however, suggest that the ability of the compounds to inhibit virion maturation accounts for their full antiviral activity. 47 In other words, drug-treated viruses are defective for reverse transcription because the misplaced RNP is unable to support DNA synthesis in the subsequently infected cell, rather than because ALLINIs inhibit reverse transcription directly. Intriguingly, the range of replication defects ascribed to ALLINI-treated virions is reminiscent of the pleiotropic nature of class II IN mutations on HIV-1 replication (section 2 above). Moreover, class II IN mutant virus particles harbor an eccentric RNP that is indistinguishable from the one induced by ALLINI treatment. 30,45,46,223 By extension, we suspect that the inability to encapsidate the RNP into the viral core underlies the majority of replication defects ascribed to class II HIV-1 IN mutant viruses. The ability to form eccentric RNPs has been shown to depend on the presence of viral RNA, indicating that IN may normally engage the viral genome, either directly or indirectly, to orchestrate RNP encapsidation into the viral core. 47 ALLINI potency is significantly increased when target cells are depleted for LEDGF/p75, suggesting that the host factor can compete with the compounds during the acute phase of infection. 46,298,315,374 By contrast, LEDGF/p75 depletion or overexpression does not influence drug potency during particle assembly. 45,46,297,315 As LEDGF/p75 associates constitutively with chromatin, 171,268,276 it seems that the inability of LEDGF/p75 to compete for compound binding to IN during HIV-1 particle morphogenesis, which occurs at the cell periphery or after the virus exits the cell, accounts for the unique pharmacology of this anti-IN drug class.
As might be expected from their known binding site at the IN CCD dimer interface, ALLINIs retain potency against INSTI-resistant HIV-1 strains. 312,316,317,375 Unfortunately, ALLINIs seem to possess a relatively low genetic barrier to resistance, as several mutations mapping to the LEDGF/p75 binding cavity that greatly reduce drug potency have been described following ex vivo virus passage. 310,312,315,316,375 Nevertheless, because these molecules perform well in concert with INSTIs, 316,375 their clinical development is of great interest.

RETROVIRAL INTEGRATION AS A THERAPEUTIC TOOL
Owing to their natural ability to stably integrate their genetic cargo into a cell chromosome, retroviruses have long been studied as tools for corrective gene therapy (see ref 376 for a current overview). Replication-defective vectors derived from MLV were used in pioneering studies to correct a variety of crippling diseases, including X-linked severe combined immunodeficiency (SCID-X1), 377 Wiskott−Aldrich syndrome, 378 and chronic granulomatous disease. 379,380 However, a significant number of patients from these trials developed severe adverse effects due to clonal expansion of the treated cells, leading to leukemia. 378,381−383 Detailed characterization revealed that many of these genotoxic events resulted from the integration of the MLV vector in the vicinity of a growth-promoting protooncogene, such as LMO2. 378,382−384 Deregulation of protooncogene expression has long been known as a driver for retroviral genotoxicity, 385 and the safety of retroviral vectors for human gene therapy applications has accordingly become a major priority in their development. An optimized vector would in theory efficiently express its transgene cargo over a long period of time without displaying adverse side effects on cellular gene expression or physiology.
The propensity for MLV to target integration to cellular promoters and enhancers, 241,242,245 which was virtually unknown during the planning stages of the initial SCID-X1 trials, in hindsight likely made MLV an unfortunate choice for treatment. However, the field now has a much broader appreciation of how different types of retroviruses target potentially unsafe genomic features such as genes and enhancer regions. 243 In this vein, vectors based on α-retroviruses, 386 β-retroviruses, 387 or spumaviruses, 388 each of which targets genes and enhancers to lesser extents than lentiviruses and γ-retroviruses, respectively, might prove safer than MLV-based vectors. The identification of the mechanisms of integration targeting for the lentiviruses and γ-retroviruses has additionally opened up new approaches to vector design. As one example, MLV-derived vectors show greatly reduced propensity to integrate near transcriptional start sites in the presence of BET protein inhibitors, 319 raising the possibility of using such inhibitors with MLV vectors in the clinic. Several drawbacks, however, seem likely to limit such approaches. In addition to potential small-molecule toxicity, MLV retained a partial tendency to target promoter-proximal regions in the presence of BET inhibitors such as JQ-1. 319 Furthermore, MLV vector titer was reduced significantly by JQ-1 treatment due to inhibition of integration. 319−321 An alternative strategy is to delete the IN C-terminal tail region that mediates BET protein binding. 321,334 Such vectors greatly reduce promoter-proximal integration with only a modest effect on vector titer. 334,389 It will be instructive to determine the carcinogenic potential of such IN deletion constructs in animal models of MLV pathogenesis.
A separate approach to MLV vector modification looks extremely promising in initial clinical trials. The viral promoter and enhancer, which are situated within the U3 region of the LTR, can be deleted without severely affecting reverse transcription or integration, resulting in so-called self-inactivating (SIN) vectors. 390 An SCID-X1 trial with an MLV SIN vector has failed to detect evidence for leukemia during an initial 1−3 year observational period, indicating that deletion of the viral enhancer might go a long way to improve MLV-based vector safety. 391 As fusion proteins between the LEDGF/p75 IBD 301−303 or BET protein ET domain 321 and chromatin binding modules can effectively retarget integration, such protein hybrids could in theory be used to steer lentivirus or γ-retrovirus integration out of harm's way. For example, a LEDGF/p75-based fusion harboring the HP1α heterochromatin protein yielded an overall integration pattern that was remarkably similar to random. 301 A key drawback of such approaches is the requirement to express the retargeting factor in the cells that will receive the retroviral vector. This approach accordingly seems to hold considerably less promise than the attempts to reduce or eliminate genotoxicity through direct vector modification.
Notwithstanding highly significant genetropic integration targeting by the lentiviruses, there is little evidence to suggest an operational link between HIV-1 integration and carcinogenesis. A primary reason may be the highly cytopathic nature of infection: infected CD4-positive T cells, the primary targets of the virus, display an average half-life of only ∼1.5 days. 392 However, cancer-related genes are targeted about 5-fold more frequently by HIV-1 PICs than expected. 240 Integrations in the vicinity of growth-promoting genes moreover can help to drive the clonal expansion of cells that constitute the latent viral reservoir. 227,228 Numerous HIV-1 gene products, including the surface envelope glycoprotein Gp120 393 and viral protein R (Vpr), 394 are acutely cytopathic, and such genes are deleted from HIV-based vectors. As is the case with MLV vectors, it is critical to monitor HIV-based gene therapy trials for sites of vector DNA integration by deep sequencing to catch potential longitudinal emergence of dominant cell clones in patients.
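The longitudinal monitoring described above can be illustrated with a minimal sketch. It assumes that deep sequencing has already been reduced to per-site cell (or read) counts at each sampling time; the function names, the site labels, the counts, and the 0.2 cutoff are all hypothetical and chosen only for illustration, not taken from any trial protocol.

```python
def clone_fractions(site_counts):
    """Return each integration site's share of all sampled cells at one time point."""
    total = sum(site_counts.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in site_counts.items()}


def expanding_clones(timepoints, min_fraction=0.2):
    """Flag sites whose share rises at every sampling and ends at or above min_fraction.

    `timepoints` is a chronologically ordered list of {site: count} dicts.
    `min_fraction` is an arbitrary illustrative cutoff, not a clinical threshold.
    """
    if not timepoints:
        return []
    fractions = [clone_fractions(tp) for tp in timepoints]
    flagged = []
    for site in fractions[-1]:
        # A site absent from an earlier sample contributes a share of zero there.
        series = [f.get(site, 0.0) for f in fractions]
        rising = all(a < b for a, b in zip(series, series[1:]))
        if rising and series[-1] >= min_fraction:
            flagged.append(site)
    return sorted(flagged)


# Hypothetical counts for three integration sites at three successive samplings.
samples = [
    {"site_A": 2, "site_B": 50, "site_C": 48},
    {"site_A": 15, "site_B": 45, "site_C": 40},
    {"site_A": 60, "site_B": 22, "site_C": 18},
]
print(expanding_clones(samples))
```

In this toy data set only site_A grows at every sampling and ends above the cutoff, so it alone is flagged. A real analysis would also need to account for sampling depth and sequencing bias, which this sketch ignores.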
As is the case with MLV SIN vectors, results of preliminary clinical trials with HIV-based vectors look promising, with little to no evidence for the type of clonal dominance that was observed in initial MLV-based trials. 395−398 The ability to safely integrate a corrective transgene in a long-lasting target cell can in theory be expected to go a long way, perhaps for a patient's lifetime, to correct certain debilitating diseases. The field accordingly cautiously awaits long-term follow-up of ongoing retroviral-based gene therapy trials.

PERSPECTIVES
During the past 30 years the field of retroviral integration progressed from an epitome of experimental hardship and enigma to arguably the best-understood DNA recombination system. Combined efforts of academic groups and leading pharmaceutical companies resulted in the discovery of potent HIV-1 IN inhibitors, characterization of the first cellular cofactors of retroviral INs, and elucidation of the mechanistic and structural details of retroviral integration.
One clear direction for future research will be expansion of the intasome structure repertoire. In particular, elucidation of HIV-1 or lentiviral intasome structure will be of great importance to help the rational design of next-generation INSTIs and improved ALLINIs. Furthermore, characterization of the HIV-1 intasome may help to identify new pockets that are potentially druggable by allosteric inhibitors. The development of resistance to the antiretroviral drug arsenal is a major health issue, making it important to find new therapeutic strategies. Many steps modulating retroviral integration are still poorly described and warrant further investigation. Among these is the mechanism by which retroviruses protect themselves against autointegration. One could imagine the development of small molecules that could be used to trigger suicidal autointegration before the PIC encounters host chromatin.
Another open question is the mechanism of retroviral STC disassembly. Since the available TCC and STC structures are very similar, how IN signals the completion of strand transfer to the cell to allow DNA repair remains unknown. Unravelling the mechanism and the cellular proteins involved in this process may well lead to the development of new drugs targeting these steps. A more detailed characterization of the unexpected role of IN in HIV-1 particle morphogenesis could also lead to new ways of inhibiting viral replication.
Studies on retroviral IN host factors have clarified the molecular mechanisms underlying integration site selection. The recent description of MLV IN targeting factors BRD2−4 together with the lentiviral IN cofactor LEDGF/p75 established the concept of bimodal tethering as a major mechanism for integration site selection. These discoveries opened new windows toward tuning the specificity of retroviral integration. Future studies will elucidate if the other retroviral families rely on similar pathways to select suitable chromatin environments and will hopefully give important insights into improving the safety of retroviral gene therapy vectors. Given the parallel development of gene-editing technologies such as CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 (reviewed in ref 399), there has been in recent years a resurgence of interest in human gene therapy. Yet, notwithstanding its lauded efficiency, gene editing using CRISPR-Cas9 involves generation of a double-strand DNA break at the target locus, a highly genotoxic chromosomal lesion in its own right. Therefore, it remains to be seen which methodology might evolve into a safer therapy approach in the future.

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
We thank H. Aihara, M. Kvaratskhelia, and M. Foster for sharing data prior to publication and V. Pie for critical reading of the manuscript. This work was in part funded by US National Institutes of Health grants R37 AI039394 and R01 AI052014 (to A.N.E.) and P50 GM082251 (to P.C.). This work was additionally supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK, the UK Medical Research Council, and the Wellcome Trust.