Distinct chromatin functional states correlate with HIV latency reactivation in infected primary CD4+ T cells

Human immunodeficiency virus (HIV) infection is currently incurable, due to the persistence of latently infected cells. The ‘shock and kill’ approach to a cure proposes to eliminate this reservoir via transcriptional activation of latent proviruses, enabling direct or indirect killing of infected cells. Currently available latency-reversing agents (LRAs) have however proven ineffective. To understand why, we used a novel HIV reporter strain in primary CD4+ T cells and determined which latently infected cells are reactivatable by current candidate LRAs. Remarkably, none of these agents reactivated more than 5% of cells carrying a latent provirus. Sequencing analysis of reactivatable vs. non-reactivatable populations revealed that the integration sites were distinguishable in terms of chromatin functional states. Our findings challenge the feasibility of ‘shock and kill’, and suggest the need to explore other strategies to control the latent HIV reservoir.


Introduction
Antiretroviral therapy (ART) has transformed HIV infection from a uniformly deadly disease into a chronic lifelong condition, saving millions of lives. However, ART interruption leads to rapid viral rebound within weeks due to the persistence of proviral latency in rare, long-lived resting CD4 + T cells and possibly in tissue macrophages . HIV latency is defined as the presence of a transcriptionally silent but replication-competent proviral genome. Latency allows infected cells to evade both immune clearance mechanisms and currently available ART, which is based solely on the elimination of actively replicating virus.
An extensively investigated approach to purging latent HIV is the 'shock and kill' strategy, which consists of forcing the reactivation of latent proviruses ('shock' phase) with the use of latency reversing agents (LRAs), while maintaining ART to prevent de novo infections. Subsequently, reactivation of HIV expression would expose such cells (shocked cells) to killing by viral cytopathic effects and immune clearance ('kill' phase). A variety of LRAs have been explored in vitro and ex vivo, with only a few candidates being advanced to testing in pilot human clinical trials. Use of histone deacetylase inhibitors (HDACi: vorinostat, panobinostat, romidepsin, and disulfiram) in clinical studies has shown increases in cell-associated HIV RNA production and/or plasma viremia after in vivo administration (Archin et al., 2012a;Elliott et al., 2015;Elliott et al., 2014;Rasmussen et al., 2014;Søgaard et al., 2015). However, none of these interventions alone has succeeded in significantly reducing the size of the latent HIV reservoir (Rasmussen and Lewin, 2016).
Several obstacles can explain the failure of LRAs, as reviewed in (Margolis et al., 2016). However, the biggest challenge to date is our inability to accurately quantify the size of the reservoir. The absolute quantification (number of cells) of the latent reservoir in vivo (and ex vivo) has thus far been technically impossible. The most sensitive, quickest, and easiest assays to measure the prevalence of HIV-infected cells are PCR-based, quantifying total or integrated HIV DNA or RNA transcripts. However, these assays substantially overestimate the number of latently infected cells, due to the predominance of defective HIV DNA genomes in vivo (Bruner et al., 2016; Ho et al., 2013). The best currently available assay to measure the latent reservoir is the relatively cumbersome viral outgrowth assay (VOA), which is based on quantification of the number of resting CD4+ T cells that produce infectious virus after a single round of maximum in vitro T-cell activation. After several weeks of culture, viral outgrowth is assessed by an ELISA for HIV-1 p24 antigen or a PCR assay for HIV-1 RNA in the culture supernatant. Importantly, the number of latently infected cells detected in the VOA is 300-fold lower than the number of resting CD4+ T cells that harbor proviruses detectable by PCR.
This reliance on a single round of T-cell activation likely incorrectly estimates the viral reservoir for two reasons. First, the discovery of intact non-induced proviruses indicates that the size of the latent reservoir may be much greater than previously thought: the authors estimate that the number may be at least 60-fold higher than estimates based on VOA (Ho et al., 2013; Sanyal et al., 2017). This work and that of others highlight the heterogeneous nature of HIV latency and suggest that HIV reactivation is a stochastic process that only reactivates a small fraction of latent viruses at any given time (Dar et al., 2012; Ho et al., 2013; Singh et al., 2010; Weinberger et al., 2005). Second, defective proviruses can be transcribed and translated in vivo (Pollack et al., 2017): although defective proviruses cannot produce infectious particles, they express viral RNA and proteins, which can be detected by any p24 antigen or PCR assay used for reservoir-size quantification.
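The fold differences quoted above can be combined into a rough bracket for the reservoir size. The following is a minimal sketch only: the function name and the starting value of 300 PCR-positive cells per million are hypothetical, and the 300-fold and 60-fold factors are simply the figures cited in the text.

```python
def reservoir_bounds(pcr_positive_per_million: float,
                     pcr_to_voa_fold: float = 300.0,
                     intact_to_voa_fold: float = 60.0) -> dict:
    """Rough per-million-cell estimates implied by the quoted fold differences.

    pcr_to_voa_fold: PCR-detectable proviruses relative to VOA-positive cells.
    intact_to_voa_fold: lower bound for intact proviruses relative to VOA.
    """
    voa = pcr_positive_per_million / pcr_to_voa_fold
    intact_lower_bound = voa * intact_to_voa_fold
    return {
        "pcr": pcr_positive_per_million,
        "voa": voa,
        "intact_lower_bound": intact_lower_bound,
    }


# Hypothetical input: 300 PCR-positive resting CD4+ T cells per million.
estimate = reservoir_bounds(300.0)
print(estimate)
```

Under these assumptions the intact-provirus estimate sits well above the VOA value but still below the PCR value, which is why neither assay alone bounds the reservoir.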
Thus, current assays underestimate the actual number of latently infected cells, both in vivo and ex vivo, and the real size of the HIV reservoir is still to be determined. Therefore, it has been difficult to judge the potential of LRAs in in vitro (primary latency models), ex vivo (patients' samples) and in vivo (clinical trial) experiments.
HIV latency is a complex, multi-factorial process (reviewed in [Dahabieh et al., 2015]). Its establishment and maintenance depend on: (a) viral factors, such as integrase that specifically interacts with cellular proteins, including LEDGF, (b) trans-acting factors (e.g., transcription factors) and their regulation by the activation state of T cells and the environmental cues that these cells receive, and (c) cis-acting mechanisms, such as the local chromatin environment at the site of integration of the virus into the genome. Recent evidence has also highlighted the association of specific HIV-1 integration sites with clonal expansion of latently infected cells (reviewed in [Maldarelli, 2016]).
The role of the site of HIV integration into the cellular genome in the establishment and maintenance of HIV latency has remained controversial. While early studies found that the HIV integration site does affect both the entry into latency (Jordan et al., 2003; Jordan et al., 2001) and the viral response to LRAs, other studies have failed to find a significant role of integration sites in regulating the fate of HIV infection (Dahabieh et al., 2014; Sherrill-Mix et al., 2013).
In this study, we have used a new dual-color reporter virus, HIV GKO, to investigate the reactivation potential of various LRAs in a pure latent population. We find that latency is heterogeneous and that only a small fraction (<5%) of the latently infected cells is reactivated by LRAs. We also show that both genomic localization and chromatin context of the integration site affect the fate of HIV infection and the reversal of viral latency.

Results
A second-generation dual-fluorescence HIV-1 reporter (HIV GKO) to study latency
Our laboratory reported the development of a dual-labeled virus (DuoFluoI) in which eGFP is under the control of the HIV-1 promoter in the 5′ LTR and mCherry is under the control of the cellular elongation factor 1 alpha promoter (EF1α). However, we noted that the model was limited by a modest number of latently infected cells (<1%) generated regardless of viral input (Figure 1-figure supplement 1A-1C), as well as a high proportion of productively infected cells in which the constitutive promoter EF1α was not active (GFP+, mCherry-).
To address these issues, which we suspected were due to recombination between the 20-30 bp regions of homology at the N- and C-termini of the adjacent fluorescent proteins (eGFP and mCherry) (Salamango et al., 2013), we generated a new version of the dual-labeled virus (HIV GKO), containing a codon-switched eGFP (csGFP) and a distinct, unrelated fluorescent protein, mKO2, under the control of EF1α (Figure 1A). First, titration of HIV GKO input revealed that productively and latently infected cells increased proportionately as the input virus increased (Figure 1B, Figure 1C). A small proportion of csGFP+ mKO2- cells were still visible in HIV GKO infected cells. We generated an HIV GKO virus lacking the U3 promoter region of the 3′ LTR (ΔU3-GKO), resulting in an integrated virus devoid of the 5′ HIV U3 region. This was associated with a suppression of HIV transcription and an inversion of the latency ratio (ratios latent/productive = 0.34 for HIV GKO-WT-LTR and 8.8 for HIV GKO-ΔU3-3′LTR, Figure 1D). Finally, to further characterize the constituent populations of infected cells, double-negative cells and latently and productively infected cells were sorted using FACS and analyzed for viral mRNA and protein content (Figures 1E and F, Figure 1-source data 1). As expected, productively infected cells (csGFP+) expressed higher amounts of viral mRNA and viral proteins, whereas latently infected cells (csGFP- mKO2+) had very small amounts of viral mRNA and no detectable viral proteins.
Based on all these findings, the second-generation dual-fluorescence reporter, HIV GKO, is able to more accurately quantify latent infections in primary CD4+ T cells than HIV DuoFluoI, and thus allows for the identification and purification of a larger number of latently infected cells. Using flow cytometry, we can determine infection and HIV productivity of individual cells and simultaneously control for cell viability.
To measure reactivation by LRAs in patient samples, we treated 5 million purified resting CD4+ T cells from four HIV-infected individuals on suppressive ART (participant characteristics in Table 1) with single LRAs, combinations thereof, or vehicle alone for 24 hr. LRA efficacy was assessed using a PCR-based assay, by measuring levels of intracellular HIV-1 RNA using primers and a probe that detect the 3′ sequence common to all correctly terminated HIV-1 mRNAs (Bullen et al., 2014). Of […] (Figure 2A), showed the expected fold induction value (10- to 100-fold increases of HIV RNA in PBMCs [Bullen et al., 2014; Darcis et al., 2015; Laird et al., 2015]). Combinations of the PKC agonist bryostatin-1 with JQ1 or with panobinostat (fold-increases of 126.2- and 320.8-fold, respectively, Figure 2A) were far more effective than bryostatin-1, JQ1 or panobinostat alone (fold-increases of 6.8-, 1.7- and 2.9-fold, respectively, Figure 2A), and even more effective than T-cell activation with αCD3/CD28. This observation is consistent with previous reports (Jiang et al., 2015; Laird et al., 2015; Martínez-Bonet et al., 2015).
The same LRAs and combinations were next tested after infection of human CD4+ T cells in vitro with HIV GKO. Measurement of intracellular HIV-1 mRNA in HIV GKO latently infected cells showed the expected fold induction of HIV-1 mRNA in response to αCD3/CD28 (11.3-fold, Figure 2B, Figure 2-source data 1). JQ1, panobinostat, and bryostatin-1 alone all caused limited reactivation of latent HIV (fold-increases of 1.1-, 5.6- and 6.2-fold, respectively, Figure 2B), as observed in patients' samples. Finally, we observed low synergy when combining bryostatin-1 and JQ1 (8-fold increase), but high synergy between bryostatin-1 and panobinostat (67.3-fold increase). These data together demonstrate that HIV GKO closely mimics in vitro what is observed in ex vivo patients' samples (correlation r² = 0.88, p=0.0056, Figure 2C), and validate the robustness and reliability of the dual-fluorescence HIV reporter as a model to study HIV-1 latency.

HIV-1 LRAs target a minority of latently infected primary CD4 + T cells
Current studies have relied on PCR-based assays to measure HIV RNA and to evaluate the efficacy of different LRAs (Figure 2A). The use of dual-fluorescent HIV reporters, however, provides a tool to quantify directly the fraction of cells that become reactivated.

Small fractional rate of latency reactivation is not explained by low cellular response to activation signals
These data highlight two important facts: (a) cell-associated HIV RNA quantification does not reflect the absolute number of cells undergoing viral reactivation, and (b) induced cell-associated HIV RNA, in response to all reversing agents, comes from a small fraction of reactivated latent cells. This was particularly surprising with αCD3/CD28 stimulation, as a currently accepted model for HIV latency is that the state of T cell activation dictates the transcriptional state of the provirus. Treatment of latently infected primary CD4+ T cells with αCD3/CD28 stimulated HIV production in less than 5% of the cells, while the other 95% remained latent, even though after 24 hr of treatment nearly all of the cells had upregulated the early T cell activation marker CD69 ([…] +/HLA-DR+/-). We only observed a statistically significant increase of non-reactivated latently infected cells (NRLIC) compared with reactivated latently infected cells (RLIC) in the CD69+/CD25-/HLA-DR+ population; however, this small increase in a relatively minor population is insufficient to explain the low reactivation rate of latently infected cells. Overall, comparison of the reactivated and non-reactivated latent populations showed little difference in their activation state.

Integration sites, gene expression, transcription units and the fate of HIV infection
The role of the site of HIV integration into the genome in latency remains a subject of debate (Dahabieh et al., 2014; Jordan et al., 2003; Jordan et al., 2001; Sherrill-Mix et al., 2013). To identify possible differences in integration sites between reactivated and non-reactivated HIV genomes, primary CD4+ T cells were infected with HIV GKO. At 5 days post-infection, productively infected cells (GFP+, PIC) were sorted and frozen. The GFP-negative population (consisting of a mixture of latent and uninfected cells) was isolated and treated with αCD3/CD28. At 48 hr post-induction, both non-reactivated (NRLIC) and reactivated (RLIC) populations were isolated. Nine libraries (three donors, three samples/donor: PIC, RLIC, NRLIC) were constructed from genomic DNA as described (Cohn et al., 2015) and analyzed by high-throughput sequencing to locate HIV proviruses within the human genome. A total of 1803 virus integration sites were determined: 960 integrations in PIC, 681 in NRLIC, and 162 in RLIC (Integration Sites Source data).
To determine whether integration within genes differentially expressed during T-cell activation predicted infection reactivation fate, we compared our HIV integration dataset with a published dataset for gene expression in resting and activated (48 hr, αCD3/CD28) CD4+ T cells from healthy individuals (Ye et al., 2014). The analysis revealed that most of the αCD3/CD28-induced latent proviruses were not integrated in genes responsive to T-cell activation signals (Figure 5A and B, Figure 5-source data 1). Interestingly, PIC and RLIC integration events were associated with genes whose basal expression was significantly higher than genes targeted in NRLIC, both in activated and resting T cells (Figure 5C, Figure 5-source data 2).
Next, we investigated whether different genomic regions were associated with productive, inducible or non-inducible latent HIV-1 infections. In agreement with previous studies (Cohn et al., 2015; Dahabieh et al., 2014; Maldarelli et al., 2014; Wagner et al., 2014), the majority of integration sites were found within genes in each population (Figure 6A, Figure 6-source data 1), although the proportion of genic integrations in NRLIC was significantly lower than in PIC and RLIC samples. Moreover, integration events in the PIC and RLIC populations were more frequent in transcribed regions (64% and 58%, respectively; sum of low, medium and high transcribed regions; Figure 6B, Figure 6-source data 1), while these regions were significantly less represented in the NRLIC (31%) (Figure 6B). As expected, since introns represent a much larger proportion of genes, genic integration events were more frequent in introns for each population (>65%, Figure 6C).

Chromatin modifications at the site of HIV integration and latency
Chromatin marks, such as histone post-translational modifications (e.g., methylation and acetylation) and DNA methylation, are involved in establishing and maintaining HIV-1 latency (De Crignis and Mahmoudi, 2017). We examined 500 bp regions centered on all integration sites in each population for several chromatin marks by comparing our data with ENCODE datasets for several histone modifications and DNase I accessibility. We first looked at distinct and predictive chromatin signatures, such as H3K4me1 (active enhancers), H3K36me3 (actively transcribed regions), and H3K9me3 and H3K27me3 (repressive marks of transcription) (reviewed in [Kumar et al., 2015; Shlyueva et al., 2014]). All three populations exhibited distinct profiles, although the productive and inducible latent infection profiles appeared most similar (Figure 7A, Figure 7-source data 1). The analysis showed that PIC integration events were associated with active chromatin (i.e., transcribed genes, H3K36me3, or enhancers, H3K4me1), while NRLIC integration events appeared biased toward heterochromatin (H3K27me3 and H3K9me3) and non-accessible regions (DNase hyposensitivity).
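The window-based comparison described above can be sketched as a simple interval overlap. This is an illustration under stated assumptions, not the pipeline used in the study: the function names, the half-open coordinate convention and the toy coordinates are all hypothetical, and real ENCODE peak files would be parsed from BED-like input and queried with an indexed structure rather than a linear scan.

```python
# Hypothetical helper names; intervals are (chromosome, start, end), half-open.

def window(chrom, pos, size=500):
    """Return a window of `size` bp centered on an integration site."""
    half = size // 2
    return (chrom, max(0, pos - half), pos + half)


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_marked(sites, peaks, size=500):
    """Fraction of integration sites whose window overlaps at least one peak."""
    sites = list(sites)
    if not sites:
        return 0.0
    hits = sum(
        any(overlaps(window(chrom, pos, size), peak) for peak in peaks)
        for chrom, pos in sites
    )
    return hits / len(sites)


# Toy data (not from the study): three sites, two peaks for one chromatin mark.
toy_sites = [("chr1", 1000), ("chr1", 5000), ("chr2", 1000)]
toy_peaks = [("chr1", 1200, 1500), ("chr2", 5000, 6000)]
print(fraction_marked(toy_sites, toy_peaks))
```

Computing this fraction separately for PIC, RLIC and NRLIC sites, one mark at a time, yields the kind of per-population profile compared in the text.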
Marini et al. recently reported that HIV-1 mainly integrates at the nuclear periphery (Marini et al., 2015). We therefore examined the topological distribution of integration sites from each population inside the nucleus by comparing our integration site data with a previously published dataset of lamin-associated domains (LADs) (Guelen et al., 2008). LADs consist of H3K9me2 heterochromatin and are present at the nuclear periphery. This analysis showed that latent integration sites from both RLIC and NRLIC were in LADs to a significantly higher degree (32% and 30.4%) than productive integrations (23.6%) (p<0.05, Figure 7B, Figure 7-source data 1). Overall, these data show similar features between productively infected cells and inducible latently infected cells, while non-reactivated latently infected cells appear distinct from the other populations. These findings support a prominent role for the site of integration and the chromatin context for the fate of the infection itself, as well as for latency reversal.
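Comparisons of proportions between populations, such as the LAD fractions above, can be tested with a two-proportion z test. The sketch below is a generic pooled implementation; the counts in the example are assumptions obtained by multiplying the reported percentages by the total number of sites per population, and may not match the denominators actually used in the analysis.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts: ~30.4% of 681 NRLIC sites vs ~23.6% of 960 PIC sites in LADs.
z, p = two_proportion_z_test(207, 681, 227, 960)
print(f"z = {z:.2f}, p = {p:.4f}")
```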

Discussion
Dual-color HIV-1 reporters are unique and powerful tools (Dahabieh et al., 2013) that allow for the identification and the isolation of latently infected cells from productively infected cells and uninfected cells. Latency is established very early in the course of HIV-1 infection (Archin et al., 2012b; Chun et al., 1998; Whitney et al., 2014) and, until the advent of dual-reporter constructs, no primary HIV-1 latency models have allowed the study of latency heterogeneity at this very early stage. Importantly, the comparison of data obtained from distinct primary HIV-1 latency models is complicated as some models are better suited to detect latency establishment (e.g., dual-reporters), while others are biased towards latency maintenance (e.g., Bcl2-transduced CD4+ T cells).
Figure 5 (legend, partial). […] Integration sites displayed outside of the two solid gray lines were targeted genes whose expression is at least ± twofold differentially expressed after 48 hr stimulation. Point size scales with the number of integration events within the same gene. (B) Fraction of integration sites from the different populations PIC, RLIC or NRLIC, integrated within genes whose expression is at least ± twofold differentially expressed after 48 hr of αCD3/CD28 stimulation (**p<0.01; ***p<0.001; two-proportion z test) (Figure 5-source data 1). (C) Relative expression of genes targeted by HIV-1 integration in PIC, RLIC or NRLIC before TCR stimulation and after αCD3/CD28 stimulation (n = 3, mean + SEM, paired t-test). ***p<0.001; ****p<0.0001. (Figure 5-source data 2). DOI: https://doi.org/10.7554/eLife.34655.013
The use of env-defective viruses limits HIV replication to a single round and thereby limits the appearance of defective viruses (Bruner et al., 2016).
In this study, we describe and validate an improved version of HIV DuoFluoI, previously developed in our laboratory, which accurately allows for: (a) the quantification of latently infected cells, (b) the purification of latently infected cells, and (c) the evaluation of the 'shock and kill' strategy. Our data highlight two important facts: (a) cell-associated HIV RNA quantification does not reflect the number of cells undergoing viral reactivation, and (b) only a small portion of the cells carrying latent proviruses (<5%) is reactivated, although LRAs target the whole latent population. Hence, even if cells harboring reactivated virus die, this small reduction would likely remain undetectable when quantifying the latent reservoir in vivo. Our data are in agreement with previous reports, which show that levels of cellular HIV RNA and virion production are not correlated, and that the absolute number of cells being reactivated by αCD3/CD28 is indeed limited to a small fraction of latently infected cells (Cillo et al., 2014; Sanyal et al., 2017; Yucha et al., 2017). Using our dual-fluorescence reporter, we confirm these findings and extend these observations to LRA combinations. However, although LRA combinations show synergy when measuring cell-associated HIV RNA, we do not find such synergy at the level of individual cells, but rather only a partial additive effect. Our work, as well as that of others (Cillo et al., 2014; Sanyal et al., 2017; Yucha et al., 2017), demonstrates the importance of single-cell analysis when it comes to the evaluation of potential LRAs. Indeed, it is necessary to determine whether potential increases in HIV RNA after stimulation in a bulk population result from a small number of highly productive cells, or from a larger but less productive population, as these two mechanisms likely have very different impacts on the latent reservoir.
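The distinction between bulk RNA induction and the fraction of reactivated cells can be made concrete with a small calculation. This is a toy model with entirely hypothetical numbers: it assumes every non-reactivated cell keeps a fixed basal RNA level and every reactivated cell produces the same amount of RNA.

```python
def bulk_fold_induction(fraction_reactivated,
                        rna_per_reactivated_cell,
                        rna_per_latent_cell=1.0):
    """Fold change in bulk cell-associated RNA relative to an untreated pool.

    All cells start at `rna_per_latent_cell`; after treatment a fraction of
    them switch to `rna_per_reactivated_cell` (arbitrary units).
    """
    treated = (fraction_reactivated * rna_per_reactivated_cell
               + (1.0 - fraction_reactivated) * rna_per_latent_cell)
    return treated / rna_per_latent_cell


# Hypothetical scenario 1: 5% of cells reactivate, each producing 201 units.
few_highly_productive = bulk_fold_induction(0.05, 201.0)
# Hypothetical scenario 2: 50% of cells reactivate, each producing 21 units.
many_weakly_productive = bulk_fold_induction(0.50, 21.0)
print(few_highly_productive, many_weakly_productive)
```

Both scenarios give essentially the same bulk fold induction although ten times more cells are reactivated in the second, which is why a bulk PCR readout alone cannot distinguish them.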
Our data further highlight the heterogeneous nature of the latent reservoir (Ho et al., 2013). We currently have a limited understanding of why some latently infected cells are capable of being induced while others are not. It is possible that different chromatin environments impose different degrees of transcriptional repression on the integrated HIV genome, with the non-reactivatable latent HIV corresponding to the most repressive environment. Since HIV GKO allows for the isolation of productively infected cells and reactivated latent cells from those that do not reactivate, it provides a unique opportunity to explore the impact of HIV integration on the fate of the infection. Different integration site-specific features contribute to latency, such as the chromatin structure, including adjacent loci, but also the provirus location in the nucleus (Lusic et al., 2013). Viral integration is a semi-random process (Bushman et al., 2005) in which HIV-1 preferentially integrates into active genes (Barr et al., 2006; Bushman et al., 2005; Demeulemeester et al., 2015; Ferris et al., 2010; Han et al., 2004; Lewinski et al., 2006; Mitchell et al., 2004; Schröder et al., 2002; Sowd et al., 2016; Wang et al., 2007). LEDGF, one of the main chromatin-tethering factors of HIV-1, binds to the viral integrase and to H3K36me3, and to a lesser extent to H3K4me1, thus directing the integration of HIV-1 into transcriptional units (Daugaard et al., 2012; Eidahl et al., 2013; Pradeepa et al., 2012). In addition, CPSF6, which binds to the viral capsid, markedly influences integration into transcriptionally active genes and regions of euchromatin (Sowd et al., 2016), explaining how HIV-1 maintains its integration in the euchromatin regions of the genome independently of LEDGF (Quercioli et al., 2016). Several studies have characterized the integration sites; however, these analyses have been restricted to productive infections.
Using ENCODE reference datasets, our data are consistent with previous results, showing that HIV-1 preferentially targets actively transcribed regions (Marini et al., 2015; Wang et al., 2007; Chen et al., 2017). However, non-inducible latent proviruses are observed to be integrated to a higher extent into silenced chromatin. In addition, even though HIV integration is normally strongly disfavored in the heterochromatic condensed regions in LADs due to low chromatin accessibility, we show that some HIV integration does occur in LADs when using a previously published dataset of LADs (Guelen et al., 2008; Marini et al., 2015), and that latent proviruses that are not readily reactivatable are integrated to a higher extent in LADs.
Importantly, we identify a unique rare population among the latent cells that can be reactivated. In contrast to the non-inducible latent infections, the latency reversal of inducible latent proviruses might be explained by integration in an open chromatin context, similar to integration sites for productive proviruses, followed by subsequent heterochromatin formation and proviral silencing. As a consequence, the distinct integration sites between induced and non-induced latent proviruses highlight new possibilities for cure strategies. Indeed, the 'shock and kill' strategy aims to reactivate and eliminate every single replication-competent latent provirus, since a single remaining cell carrying a latent inducible provirus could, in theory, reseed the infection. However, our study and others point out several significant barriers to successful implementation of the 'shock and kill' strategy. First, LRAs only reactivate a limited fraction of latent proviruses. It is likely that some of the non-induced proviruses, such as those integrated into enhancers and transcriptionally active regions of the genome, will reactivate after several rounds of activation, due to the stochastic nature of HIV activation (Dar et al., 2012; Ho et al., 2013; Singh et al., 2010; Weinberger et al., 2005). It is also likely that better-suited LRA combinations (two or more LRAs) will reactivate some of the non-induced proviruses integrated into silenced chromatin marked by H3K27me3 and H3K9me3. Indeed, several studies have shown that the pharmacological inhibition of H3K27me3 and H3K9me2/3 could sensitize latent proviruses to LRAs (Friedman et al., 2011; Nguyen et al., 2017; Tripathy et al., 2015). Second, Shan et al. have shown that reactivated latent cells are not cleared by cytopathic effects or the CTL response, implying that immunomodulatory approaches, in addition to more potent LRAs, are likely required to achieve a cure for HIV infection (Shan et al., 2012).
In conclusion, the heterogeneity of the latent reservoir calls for therapies addressing the different pools of latently infected cells. While 'shock and kill' might be helpful in reactivating and possibly eliminating a small subset of highly reactivatable latent HIV genomes, other approaches will be necessary to control or eliminate the less readily reactivatable population identified here and in patients.
Perhaps, this latter population should rather be 'blocked and locked' using latency-promoting agents (LPAs), as described by several groups (Besnard et al., 2016;Kessing et al., 2017;Kim et al., 2016;Vranckx et al., 2016). For a functional cure, a stably silenced, non-reactivatable provirus is preferable to a lifetime of chronic active infection.

Patients' samples
Four HIV-1-infected individuals, who met the criteria of suppressive ART, undetectable plasma HIV-1 RNA levels (<50 copies/ml) for a minimum of six months, and with CD4 + T cell count of at least 350 cells/mm 3 , were enrolled. The participants were recruited from the SCOPE cohort at the University of California, San Francisco. Table 1 details the characteristics of the study participants.
Of note, the Envelope open reading frame was disrupted by the introduction of a frame shift at position 7136 by digestion with KpnI, blunting, and re-ligation.

Virus production
The production of HIV GKO and the assessment of HIV latency reversal agents in human primary CD4+ T cells are described in more detail at Bio-protocol (Battivelli and Verdin, 2018). Pseudotyped HIV DuoFluoI and HIV GKO viral stocks were generated by co-transfecting (standard calcium phosphate transfection method) HEK293T cells with a plasmid encoding HIV DuoFluoI or HIV GKO, and a plasmid encoding HIV-1 dual-tropic envelope (pSVIII-92HT593.1). Medium was changed 6-8 hr post-transfection, and supernatants were collected after 48 hr, centrifuged (20 min, 2000 rpm, RT), filtered through a 0.45 µm membrane to clear cell debris, and then concentrated by ultracentrifugation (22,000 g, 2 hr, 4 °C). Concentrated virions were resuspended in complete media and stored at −80 °C. Virus concentration was estimated by p24 titration using the FLAQ assay (Gesner et al., 2014).

Primary cell isolation and cell culture
CD4+ T cells were extracted from peripheral blood mononuclear cells (PBMCs) from continuous-flow centrifugation leukapheresis product using density centrifugation on a Ficoll-Paque gradient (GE Healthcare Life Sciences, Chicago, IL). Resting CD4+ lymphocytes were enriched by negative depletion with an EasySep Human CD4+ T Cell Isolation Kit (Stemcell Technologies, Canada). Cells were cultured in RPMI medium supplemented with 10% fetal bovine serum, penicillin/streptomycin and 5 µM saquinavir.
Primary CD4+ T cells were purified from healthy donor blood (Blood Centers of the Pacific, San Francisco, CA, and Stanford Blood Center) by negative selection using the RosetteSep Human CD4+ T Cell Enrichment Cocktail (StemCell Technologies, Canada). Purified resting CD4+ T cells from HIV-1-infected or healthy individuals were cultured in RPMI 1640 medium supplemented with 10% FBS, L-glutamine (2 mM), penicillin (50 U/ml), streptomycin (50 µg/ml), and IL-2 (20 to 100 U/ml) (37 °C, 5% CO2). Spin-infected primary CD4+ T cells were maintained in 50% complete RPMI media supplemented with IL-2 (20-100 U/ml) and 50% supernatant from H80 cultures (previously filtered to remove cells) without beads. Medium was replenished every 2 days until further experiments.

Cell infection
Purified CD4+ T cells isolated from healthy peripheral blood were stimulated with αCD3/CD28 activating beads (Thermofisher, Waltham, MA) at a concentration of 0.5 bead/cell in the presence of 20-100 U/ml IL-2 (PeproTech, Rocky Hill, NJ) for three days. All cells were spinoculated with either HIV DuoFluoI, HIV GKO or HIV ΔU3-GKO at a concentration of 300 ng of p24 per 1 × 10^6 cells for 2 hr at 2000 rpm at 32 °C without activation beads.
Infected cells were either analyzed by flow cytometry or sorted 4-5 days post-infection.

Latency-reversing agent treatment conditions
CD4+ T cells were stimulated for 24 hr, unless stipulated differently, with latency-reversing agents at the following concentrations for all single and combination treatments: 10 nM bryostatin-1, 1 µM JQ1, 30 nM panobinostat, αCD3/CD28 activating beads (1 bead/cell), or media alone plus 0.1% (v/v) DMSO. For all single and combination treatments, 30 µM raltegravir (National AIDS Reagent Program) was added to media. Concentrations were chosen based on Laird et al. (2015).
Sorting of infected CD4+ T cells was performed with a FACS AriaII (BD Biosciences, Franklin Lakes, NJ) based on their GFP and mKO2 fluorescence markers at 4-5 days post-infection, and sorted cells were placed back in culture for further experimentation. In the experiments shown in Figures 2B and 4, we isolated both HIV GKO latently infected cells (GFP-, mKO2+, 3%) and uninfected cells (csGFP-, mKO2-, 97%) five days post-infection, before treating cells with LRAs.
In the experiment shown in Figure 3, we isolated pure latent cells (GFP-, mKO2+) five days postinfection, before treating this pure population with LRAs.

DNA, RNA and protein extraction, qPCR and western blot
RNA and proteins (Figure 1B and C) were extracted from the same samples with the PARIS™ kit (Ambion, Thermofisher, Waltham, MA) according to the manufacturer's protocol. RNA was reverse-transcribed using random primers with the SuperScript II Reverse Transcriptase (Thermofisher, Waltham, MA), and qPCR was performed on the AB7900HT Fast Real-Time PCR System using the 2X HoTaq Real Time PCR kit (McLab, South San Francisco, CA) and the appropriate primer-probe combinations described in . Each qPCR reaction was quantified by the ΔΔCt method, relative to TaqMan assay GAPDH Hs99999905_m1. Protein content was determined using the Bradford assay (Bio-Rad, Hercules, CA), and 20 µg were separated by electrophoresis on 12% SDS-PAGE gels. Bands were detected by chemiluminescence (ECL Hyperfilm Amersham, GE Healthcare Life Sciences, Chicago, IL) with anti-Vif, HIV-p24 and α-actin (Sigma, St. Louis, MO) primary antibodies.
Total RNA (Figure 2A and B) was extracted using the Allprep DNA/RNA/miRNA Universal Kit (Qiagen, Germany) with on-column DNase treatment (Qiagen RNase-Free DNase Set, Germany). cDNA synthesis was performed using SuperScript IV Reverse Transcriptase with a combination of random hexamers and oligo-dT primers (ThermoFisher, Waltham, MA). Relative cellular HIV mRNA levels were quantified with a TaqMan qPCR assay using the primers and probes described in Bullen et al. (2014) on a QuantStudio 6 Flex Real-Time PCR System (Thermofisher, Waltham, MA). Relative cell-associated HIV mRNA copy numbers were determined in a reaction volume of 20 µL containing 10 µL of 2x TaqMan Universal Master Mix II with UNG (Thermofisher, Waltham, MA), 4 pmol of each primer, 4 pmol of probe, 0.5 µL reverse transcriptase, and 2.5 µL of cDNA. Cycling conditions were 50°C for 2 min, 95°C for 10 min, then 60 cycles of 95°C for 15 s and 60°C for 1 min. Real-time PCR was performed in triplicate reaction wells, and relative cell-associated HIV mRNA copy number was normalized to cell equivalents using human genomic GAPDH expression by qPCR and applying the comparative Ct method (Livak and Schmittgen, 2001).
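As an illustration of the comparative Ct calculation (Livak and Schmittgen, 2001), the following minimal Python sketch computes a fold change of HIV mRNA relative to GAPDH from triplicate wells. The function names and all Ct values are hypothetical and chosen only for illustration, and the formula assumes roughly 100% amplification efficiency for both assays; it is not the authors' analysis script.

```python
def mean(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^(-ddCt) method.

    Assumes ~100% amplification efficiency for target and reference assays.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values (not data from this study).
hiv_treated = [24.0, 24.2, 23.8]
gapdh_treated = [18.0, 18.1, 17.9]
hiv_dmso = [27.0, 27.1, 26.9]
gapdh_dmso = [18.0, 18.0, 18.0]

fold = relative_expression(mean(hiv_treated), mean(gapdh_treated),
                           mean(hiv_dmso), mean(gapdh_dmso))
print(f"Fold change in HIV mRNA vs. DMSO control: {fold:.2f}")
```

With these illustrative inputs the treated sample has a ΔCt three cycles lower than the control, which corresponds to an approximately eight-fold increase.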

HIV integration site libraries and computational analysis
HIV integration site libraries and computational analysis were executed in collaboration with Lilian B. Cohn and Israel Tojal Da Silva as described in their published paper (Cohn et al., 2015), with a few small changes to the computational analysis pipeline. First, we included only integration sites with a precise junction to the host genome. Second, to eliminate any possibility of PCR mispriming, we excluded integration sites identified within 100 bp (50 bp upstream and 50 bp downstream) of a 9 bp motif identified in our LTR1 primer: TGCCTTGAG. Third, we merged integration sites within 250 bp and counted each merged integration site as a unique event. The list of integration sites for each donor and each population can be found as a source data file linked to this manuscript (Integration Sites Source data 1).
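The motif exclusion and 250 bp merging steps can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the published pipeline: the genome is a hypothetical dictionary of sequence strings, only the forward strand is searched, the motif must lie entirely inside the 100 bp window, and nearby sites are chained into one event represented by the lowest coordinate.

```python
MOTIF = "TGCCTTGAG"    # 9 bp LTR1 primer motif given in the text
FLANK = 50             # bp searched on each side of a site
MERGE_DISTANCE = 250   # sites this close together count as one event


def near_motif(sequence, position, motif=MOTIF, flank=FLANK):
    """True if the motif lies fully inside [position - flank, position + flank).

    Assumption: forward strand only; reverse-complement hits are ignored.
    """
    start = max(0, position - flank)
    return motif in sequence[start:position + flank]


def merge_sites(positions, max_gap=MERGE_DISTANCE):
    """Collapse positions on one chromosome into unique events.

    Assumption: a site joins the current cluster when it is within max_gap
    of the previous site; the cluster keeps its lowest coordinate.
    """
    merged = []
    previous = None
    for pos in sorted(positions):
        if previous is None or pos - previous > max_gap:
            merged.append(pos)
        previous = pos
    return merged


def filter_and_merge(sites, genome):
    """sites: iterable of (chrom, pos); genome: dict of chrom -> sequence."""
    kept = {}
    for chrom, pos in sites:
        if not near_motif(genome[chrom], pos):
            kept.setdefault(chrom, []).append(pos)
    return {chrom: merge_sites(positions) for chrom, positions in kept.items()}


# Toy genome and sites (hypothetical, for illustration only).
toy_genome = {"chrT": "A" * 1000 + MOTIF + "A" * 2000}
toy_sites = [("chrT", 1020), ("chrT", 2000), ("chrT", 2100), ("chrT", 2500)]
unique_events = filter_and_merge(toy_sites, toy_genome)
print(unique_events)
```

In this toy case the site at 1020 is dropped because the motif falls inside its window, the sites at 2000 and 2100 collapse into one event, and the site at 2500 remains separate.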
We calculated expression (GSM669617) and chromatin mark abundance (the remaining ENCODE datasets) at the integration sites in bins of 500 bp centered on the integration site (read count quantification in Seqmonk: all non-duplicated reads regardless of strand, corrected per million reads total, non-log transformed). Gene annotations were not taken into account. Expression values were divided into five categories using thresholds at the upper 1/8th, the upper quarter, the median, and 0: high (upper 1/8th of expression values), medium (between the upper quarter and the upper 1/8th), low (between the upper half and the upper quarter), trace (lower half but above 0), and silent (0).
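The binning, per-million correction, and five-way categorization described above can be sketched in Python as below. The helper names and the reference values are hypothetical, and two points are assumptions rather than details given in the text: which set of values the quantile thresholds are computed over, and how values that fall exactly on a threshold are assigned.

```python
import statistics


def bin_for_site(position, width=500):
    """Return (start, end) of a bin of `width` bp centered on the site."""
    half = width // 2
    return position - half, position + half


def reads_per_million(read_count, total_reads):
    """Correct a raw read count per million total reads (no log transform)."""
    return read_count * 1_000_000 / total_reads


def expression_thresholds(reference_values):
    """Median, upper-quarter and upper-1/8th cut points of a reference set."""
    cuts = statistics.quantiles(reference_values, n=8)  # 7 cut points
    return {"median": cuts[3], "upper_quarter": cuts[5], "upper_eighth": cuts[6]}


def classify_expression(value, thresholds):
    """Assign one of the five categories named in the text."""
    if value == 0:
        return "silent"
    if value >= thresholds["upper_eighth"]:
        return "high"
    if value >= thresholds["upper_quarter"]:
        return "medium"
    if value >= thresholds["median"]:
        return "low"
    return "trace"


# Hypothetical reference expression values (not data from this study).
reference = list(range(1, 16))
thresholds = expression_thresholds(reference)
labels = [classify_expression(v, thresholds) for v in (0, 3, 9, 13, 15)]
print(bin_for_site(1000), labels)
```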

Statistical analysis
Significance was analyzed by either a paired t-test (GraphPad Prism) or a proportion test (the standard test for the difference between proportions, also known as a two-proportion z test; https://www.medcalc.org/calc/comparison_of_proportions.php), as specified in the manuscript.
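For reference, a two-proportion z test can be sketched in Python as follows. This is the textbook pooled-variance form with a two-sided p-value; the online calculator cited above may apply a slightly different variant, so results need not match it exactly. The function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test; returns (z, two-sided p-value).

    x1, x2: number of events in each group; n1, n2: group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("Standard error is zero; proportions are all 0 or 1.")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 50 of 100 sites in one population vs. 30 of 100 in another.
z_stat, p_val = two_proportion_z_test(50, 100, 30, 100)
print(f"z = {z_stat:.3f}, p = {p_val:.4f}")
```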
with help from the University of California San Francisco-Gladstone Institute of Virology and Immu-

Data availability
All sequencing data generated during this study are included in the Integration Sites Source data 1 file.