Repair of Oxidative DNA Base Damage in the Host Genome Influences the HIV Integration Site Sequence Preference

Host base excision repair (BER) proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1) and mutY homolog (MYH) as well as DNA polymerase beta (Polβ). While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5′dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested that Polβ DNA synthesis activity is not necessary while 5′dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.


Introduction
Retroviruses are defined by two enzymatic activities, reverse transcription and integration [1]. The viral enzyme reverse transcriptase copies the viral genomic RNA to a linear double stranded cDNA. The cDNA is part of a pre-integration complex (PIC) that is poorly understood. The PIC travels to the nucleus where the viral protein integrase mediates the covalent joining of the viral cDNA to the host chromosome. The two ends of the viral cDNA are joined to the host DNA 4-6 base pairs apart, depending on the retrovirus. In the case of HIV, 5 base pairs separate the points of joining, the distance across one major groove (Figure 1A). The process of integration yields an integration intermediate comprised of the viral cDNA flanked by 5 base gaps of host sequence and 5′ dinucleotide flaps of viral sequence. Host enzymes are assumed to repair this integration intermediate, but identities of the specific repair proteins remain unknown [2].
Integration sites from multiple retroviruses have been sequenced and mapped in relation to chromatin features such as transcription units, promoters, CpG islands, and local G:C content [5,20-23]. Retroviruses have distinct integration site preferences. For example, HIV preferentially integrates into transcription units but does not favor promoters, while MLV favors integration near promoter sequences. The sequence at the site of integration, including the 4-6 base pairs between and up to three base pairs flanking the points of joining, shows a palindrome of preferred and disfavored nucleotides (Figure 1B, [24,25]). Sequences from cytoplasmic HIV PICs integrated to a naked DNA target showed that the central 5 base pairs retained the sequence preference, as did positions −3 and 6 [25]. This comparison suggests that many base pairs at the integration site are recognized by integrase, but additional host factors may also influence integration at some flanking nucleotides. While deletion of LEDGF leads to significant differences of integration targeting to genomic features in vivo, it had no effect on the palindromic sequence preference [5,26].
The short patch oxidative base excision repair (BER) pathway has been implicated in the integration of HIV [27,28]. The BER pathway recognizes damage on individual bases, including deamination, alkylation, and oxidation [29-31]. Glycosylases recognize specific damages of DNA bases. The damaged base is removed by a glycosylase, leaving an abasic site. The essential enzyme apurinic/apyrimidinic endonuclease (APEX1) recognizes the abasic site and cleaves the DNA backbone at the 5′ side of the lesion to generate a 3′ hydroxyl and a 5′ deoxyribose phosphate (5′dRP) flap. In the short patch BER pathway, DNA polymerase β (Polβ) synthesizes one base and also removes the 5′dRP lesion. Repair is completed when the remaining nick is ligated by a heterodimer of Lig3 and XRCC1 or via LigI [32]. Proteins throughout the pathway from oxidative base damage recognition through the final ligation step were identified in an siRNA screen for DNA repair factors affecting HIV infection [27]. Reduced expression of oxidative BER genes led to a decrease of HIV infection efficiency. The absence of BER proteins was shown to reduce HIV integration [27,28].
The mechanism of BER mediated effects on HIV integration has not been shown. One formal possibility is repair of the gapped integration intermediate by BER. We have further explored the effects of BER proteins on HIV integration with well-defined BER mutant cell lines. The previously observed effects of the BER pathway on HIV integration do not appear related to gap repair of an integration intermediate. Instead, oxidative glycosylases with specific base recognition may affect the sequence preference of HIV integration. This suggests that integration may be favored at sites of oxidative DNA repair intermediates.
Matched wild type, OGG1 null, and MYH null murine embryonic fibroblasts were also derived from littermates and have been previously described [28,34]. All cells were grown in DMEM supplemented with 10% fetal bovine serum, GlutaMAX glutamine, and Penicillin-Streptomycin (Gibco, Life Technologies). The media of transfected Polβ null cell lines was supplemented with 600 μg/ml geneticin. Cells were grown at 37°C with 10% CO₂.

HIV vector particles
HIV lentiviral vectors expressing GFP were generated by triple transfection of human fibroblast 293T cells with a packaging construct ΔR9, a viral genomic RNA plasmid p156RRLsinPPTCMVGFPPRE, and a plasmid expressing the envelope gene VSV-G [35,36]. Supernatants containing vector particles were sterile filtered to remove producer cells and treated with DNaseI to digest producer plasmids. Target cells were plated in 6 well dishes at a density to achieve 2×10⁵ cells 24 hours after plating. Cell density was verified by counting with trypan blue (Gibco, Life Technologies). Vector particles were added to target cell media in the presence of 10 μg/ml DEAE-Dextran (Sigma Aldrich). The MOI of wild type Polβ cells in Figure 2 was 0.4 and in Figure 3 was 0.3. Cells were analyzed for infection efficiency 72 hours post infection by fixing with 4% paraformaldehyde (Sigma Aldrich) and scanning for GFP expression with a FACScalibur (BD Biosciences). Flow cytometry data were analyzed with FlowJo software. Where indicated, cells were exposed to varying concentrations of H₂O₂ (Sigma Aldrich) in PBS for one hour at 37°C prior to infection.

Integration site sequencing
Cells were infected at a multiplicity of infection of 0.8 and grown for 7 days. Genomic DNA was harvested by DNeasy (Qiagen) and prepared for sequencing as described [37,38]. All sequences had perfect homology to the terminal 18 bp of HIV U3, and mouse sequences were at least 20 bp with >98% identity. Sequences were mapped to the mouse genome using BLAT [39]. The number of individual integration sites analyzed included 240 from untreated wild type cells (GenBank Accession numbers KG960523-KG960762), 105 from wild type cells treated with 10 μM H₂O₂ (GenBank Accession numbers KG960306-KG960410), 112 from wild type cells treated with 30 μM H₂O₂ (GenBank Accession numbers KG960411-KG960522), 245 from OGG1 null cells (GenBank Accession numbers KG960986-KG961230), and 223 from MYH null cells (GenBank Accession numbers KG960763-KG960985). The frequency of G or C in these murine cells is 0.205 and the frequency of A or T is 0.295. The observed frequencies in integration sites are expressed as the difference from the random frequencies 0.205 or 0.295. Statistical significance of deviations of base frequency was determined by a two-tailed binomial test in Microsoft Excel. Statistical analysis of integration sites compared to genomic elements was by Fisher's exact test using GraphPad Prism version 4.0 for Macintosh (GraphPad Software).
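The base frequency comparison described above can be illustrated with a short Python sketch. This is not the authors' procedure, which was carried out in Microsoft Excel. The function names are hypothetical, the two-tailed p-value is assumed to be the sum of the probabilities of all outcomes no more likely than the observed one (one common convention for an exact binomial test), and the example count of 72 is invented purely for illustration. Only the background frequency 0.205 and the sample size of 240 sites come from the text.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_two_tailed(k: int, n: int, p: float) -> float:
    """Exact two-tailed binomial p-value.

    Assumed convention: sum the probabilities of every outcome that is
    no more likely than the observed count k.
    """
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that ties are not lost to rounding.
    cutoff = observed * (1.0 + 1e-7)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= cutoff:
            total += prob
    return min(1.0, total)


def frequency_deviation(k: int, n: int, background: float) -> float:
    """Observed base frequency minus the background (random) frequency."""
    return k / n - background


if __name__ == "__main__":
    # Hypothetical example: 72 of 240 integration sites carry G at one position.
    n_sites = 240          # untreated wild type sites (from the text)
    g_count = 72           # invented count, for illustration only
    background_g = 0.205   # background frequency of G (from the text)
    print("deviation:", frequency_deviation(g_count, n_sites, background_g))
    print("p-value:", binom_two_tailed(g_count, n_sites, background_g))
```

A positive deviation with a small p-value would correspond to a favored base at that position, and a negative deviation with a small p-value to a disfavored base.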

Results
Polβ 5′dRP lyase activity, but not polymerase activity, is necessary for efficient HIV infection [28]. Polβ null murine embryonic fibroblasts were complemented with an empty vector or the wild type Polβ cDNA (Figure 2A, B) [33]. The cells were infected with an HIV based retroviral vector expressing GFP following successful integration [36]. Infection efficiency was measured by flow cytometry of GFP positive cells. Complementation with the wild type Polβ cDNA, but not the empty vector, completely rescued infection efficiency (Figure 2F, Polβ−/− compared to Polβ−/− complemented with an empty expression vector p = 0.91, Polβ−/− compared to Polβ−/− complemented with a wild type Polβ transgene p < 0.0001).
Polβ has two distinct enzymatic activities which may be distinguished by mutations at two separate active sites [33]. In addition to polymerase activity, the lyase activity of Polβ cleaves a 5′dRP flap that occurs during BER. DNA synthesis activity is abolished by the mutation Polβ(D256A); a triple mutation Polβ(K35A/K68A/K72A) abrogates the 5′dRP lyase activity [33]. Polβ null cells were complemented with mutant transgenes affecting only the polymerase or the 5′dRP lyase active site (Figure 2C, D). These cells were infected with an HIV based retroviral vector expressing GFP and assayed by flow cytometry. Complementation with a polymerase defective, 5′dRP lyase active Polβ transgene was able to rescue HIV infection efficiency to wild type levels (Figure 2F, Polβ−/− compared to Polβ−/− complemented with a Polβ(D256A) transgene p = 0.01). However, a polymerase active, 5′dRP lyase defective transgene was unable to rescue HIV infection efficiency (Figure 2F, Polβ−/− compared to Polβ−/− complemented with a Polβ(K35A/K68A/K72A) transgene p = 0.21). These data suggest that the polymerase activity of Polβ is not necessary for HIV integration and do not support a model for BER repair of a gapped integration intermediate. The data do indicate that the 5′dRP lyase activity of Polβ is necessary for efficient integration. A combination mutant protein defective for both 5′dRP lyase and polymerase activities was also unable to rescue the HIV infection phenotype (Figure 2E, F, Polβ−/− compared to Polβ−/− complemented with a Polβ(D256A, K35A/K68A/K72A) transgene p = 0.14). The presence of enzymatically inactive Polβ protein is not sufficient to support HIV integration. If the oxidative BER pathway is not repairing the integration intermediate, it must enhance integration by an alternative mechanism. The most common form of oxidative DNA base damage is 8-oxo-guanine (8-oxo-dG) [29].
The sequence preference at HIV integration sites includes multiple G:C base pairs (Figure 1B). This observation suggested that sites of oxidative DNA base repair might influence HIV integration; the sequence preference at integration sites could be 8-oxo-dG:C base pairs, which are not detectable by sequencing integration sites. Of the oxidative BER proteins, OGG1 and MYH have specificity for only 8-oxo-dG [30]. OGG1 directly recognizes 8-oxo-dG and initiates repair by removing the damaged base (Figure 3A). During replication 8-oxo-dG mispairs with A, resulting in an 8-oxo-dG:A mismatch. MYH specifically recognizes and binds both bases in this mismatch, but does not remove the damaged base. Instead MYH excises the mispaired A and leaves the damaged 8-oxo-dG, allowing a subsequent round of repair by OGG1 that will remove the 8-oxo-dG (Figure 3A) [40]. The remaining BER proteins, including Polβ, participate in repair of multiple oxidatively damaged bases.
Murine embryonic fibroblasts with deletions of OGG1 or MYH and matched wild type cells were infected with an HIV based retroviral vector. The genomic DNA of infected cells was purified, integration sites were subcloned, sequenced, and mapped to the mouse genome [37]. The sequence preference of integration sites from wild type cells was similar to that previously observed during HIV infection (Figure 4, [24,25]). In the absence of OGG1 or MYH there was no change of the integration site base preference within the central 5 bp duplication.
In OGG1 null cells integration sites showed a loss of any statistically significant preference for G at position −1 or C at position −2 (Figure 4). In MYH null cell lines there was a loss of base preference for C at positions −2 and 7. Although C is not recognized by either repair protein, MYH or OGG1 would recognize 8-oxo-dG on the opposite strand (Figure 3A). Thus the deletion of the 8-oxo-dG specific glycosylase genes MYH and OGG1 specifically alters the HIV integration site preference at G:C base pairs near the points of joining. These changes are reminiscent of the HIV PIC integration sequence preference in vitro, which also loses preference for G:C base pairs at positions −2, −1, and 7 [25].
Wild type cells were also treated with the oxidative damage-inducing agent H₂O₂ immediately before HIV infection. Treatment of wild type cells with 10 μM H₂O₂ has no effect on viability, but treatment with 25 μM H₂O₂ reduces viability to less than 40% (Figure 3B). Thus 10 μM H₂O₂ is less than and 25 μM H₂O₂ is greater than the 50% lethal dose in this cell line. The sensitivity of these wild type cells to H₂O₂ is not affected by HIV infection (Figure 3B). The major base damage induced by H₂O₂ is on guanines [41]. The integration site sequence preference for 10 μM or 30 μM H₂O₂ was similar to untreated wild type cells (Figure 4). However, the sequence preference observed in 30 μM H₂O₂ treated cells was expanded to the −3 and 8 positions. While T and A continue to be favored at positions −3 and 8, respectively, H₂O₂ treatment led to a statistically significant disfavor of C at position −3 and G at position 8 (Figure 4). Increased oxidative damage at the time of infection appears to augment the HIV integration site sequence preference.
Integration sites were also analyzed for their proximity to genomic level elements. HIV shows a preference for integration in transcription units, but not near promoters. Deletion of the known HIV integrase co-factor LEDGF dramatically changes the integration preference for these elements [5]. The integration sites from all of these cell lines or wild type cells treated with H₂O₂ showed a preference for integration to transcription units (Figure 5A, p < 0.05 for all cell lines), but no preference for integration near promoters or CpG islands (Figure 5B and C, p > 0.2 for all cell lines at both promoters and CpG islands). Deletion of LEDGF significantly alters the integration profile of G:C content surrounding the integration sites [5]. The profiles of G:C content within 5 kb around the integration sites are similar for all the BER cell lines (Figure 6). Thus the chromatin markers of integration sites are not affected by deletion of OGG1 or MYH or treatment with H₂O₂.

Discussion
DNA repair proteins throughout the oxidative BER pathway were identified in an siRNA screen for factors affecting HIV infection [27]. Other members of the BER pathway that mediate repair of methylation, alkylation, or deamination base damage had no effect on HIV infection efficiency. Oxidative damage of DNA is always present in cells and BER proteins are expressed throughout the cell cycle [42,43]. The presence of oxidative damage and oxidative DNA damage repair proteins appears to correlate with successful HIV infection.
One obvious mechanism for the role of a DNA repair pathway during HIV infection could be the repair of the gapped integration intermediate. However, several observations argue against this hypothesis. First, deletion of BER proteins does not affect infection efficiency of a gamma retrovirus [28]. It seems probable that the DNA repair pathway responsible for repair of integration intermediates would broadly repair this structure for all retroviruses. Second, there is no obvious role for glycosylases during gap repair. Finally, the polymerase activity of Polβ has no effect on HIV infection efficiency (Figure 2). Instead the 5′dRP lyase activity of Polβ appears to be important. The Polβ 5′dRP lyase active site is not likely able to accommodate the 5′ dinucleotide flap present in the HIV integration intermediate [44,45]. A previous study of this repair event shows that wild type recombinant Polβ was not able to digest the 5′ dinucleotide flap of a model gapped integration intermediate in vitro [2]. Thus the role of the Polβ 5′dRP lyase activity is also not likely part of the integration gap repair event. While oxidative BER proteins may participate in repair of retroviral gapped integration intermediates, they are not absolutely required; there are likely multiple redundant proteins that participate in this repair [2]. Instead it appears that oxidative BER proteins identified in an siRNA screen may directly affect integration targeting at the sequence level.
The host protein LEDGF has been shown to directly bind the HIV integration complex and direct integration to sites within chromatin [46-48]. However, LEDGF has no effect on the sequence preference at the site of integration [5,49,50]. Deletion or reduced expression of the LEDGF gene leads to significantly decreased HIV infection [4,5]. However, cytoplasmic or nuclear pre-integration complexes from LEDGF null cells show integration efficiency to a naked DNA target equal to PICs from wild type cells [5]. PICs from BER null cells show reduced integration efficiency to naked DNA targets compared to PICs from wild type cells, which can be rescued by addition of recombinant Polβ protein [28]. LEDGF appears to mediate HIV integration by tethering the PIC to chromatin, but not naked DNA [18]; BER proteins appear able to affect HIV integration targeting in naked DNA. Thus LEDGF affects HIV integration targeting on a chromatin scale, while BER proteins may affect integration targeting on a nucleotide scale.
One model for BER protein effects on targeting HIV integration is direct protein binding with the integration complex. This model seems unlikely due to known binding of LEDGF to integrase and steric interference [46-48,51-54]. Alternatively, the oxidative BER proteins may affect the DNA or chromatin in a way that enhances HIV integration efficiency. Oxidative damage of genomic DNA is constant and BER proteins are constitutively expressed [42,43]. Glycosylases that do not recognize oxidative damage were shown to have no effect on HIV infection efficiency [27]. A major difference between oxidative glycosylases and all other glycosylases is their DNA product. All glycosylases remove a base, leaving an abasic site. Oxidative glycosylases have AP (apurinic or apyrimidinic) lyase activity which nicks the sugar phosphate backbone at the abasic site. MYH is the only oxidative damage glycosylase that does not have AP lyase activity; however, MYH must always act in concert with OGG1, which does have AP lyase activity (Figure 3A). Hence, all oxidative base excision repair will result in an abasic site with a nick. HIV infection efficiency is reduced in the absence of glycosylases that have AP lyase activity or in the absence of Polβ 5′dRP lyase activity ([27] and this work).
An alternative model for the role of oxidative BER proteins during HIV integration is based on the nature of the oxidative BER DNA intermediates. In this model BER single base gap DNA intermediates, rather than the BER proteins, are the true mediators of enhanced HIV integration. The single base gap of short patch BER increases the local flexibility of DNA more than a nick or an abasic site [55-57]. HIV integration is enhanced by an obtuse angle of bent DNA and inhibited in an acute angle [58]. Thus HIV strand joining may be favored on a helical strand with local oxidative base repair (such as positions −1 and 6) and disfavored on the opposite strand (such as positions −3, −2, 7 and 8). The base preference of HIV integrase is subtle, requiring analysis of many integration sites. The observed differences in sequence preference are also subtle, due in part to the apparent inherent promiscuity of HIV integrase as well as the redundancy of BER pathway proteins. For example, OGG1 only recognizes oxidatively damaged guanine, but other glycosylases also recognize damaged guanine.
Additional studies are required to elucidate the roles of DNA lyase activity during HIV integration. Only glycosylases with AP lyase activity enhance HIV infection efficiency. While it is impossible to isolate the lyase activity of bifunctional glycosylases, it is possible to differentiate the DNA synthesis and 5′dRP lyase activities of Polβ. Whether the AP lyase activity of glycosylases and the 5′dRP lyase activity of Polβ serve the same function during integration also requires further investigation. However, these data suggest the intriguing possibility that DNA lyase activity may be a novel enzymatic target for anti-retroviral therapies.

Conclusions
HIV integration has a preference for both nucleotide sequence and chromatin features. The host protein LEDGF targets HIV to integrate near genomic elements but has no effect on the consensus sequence preference of integration sites. The oxidative BER pathway may influence the sequence preference at HIV integration sites. Additionally, host DNA lyase activity is a potential target for novel HIV inhibitors.