HIV reservoirs are dominated by genetically younger and clonally enriched proviruses

ABSTRACT In order to cure human immunodeficiency virus (HIV), we need to better understand the within-host evolutionary origins of the small reservoir of genome-intact proviruses that persist within infected cells during antiretroviral therapy (ART). Most prior studies on reservoir evolutionary dynamics, however, did not discriminate genome-intact proviruses from the vast background of defective ones. We reconstructed within-host pre-ART HIV evolutionary histories in six individuals and leveraged this information to infer the ages of intact and defective proviruses sampled after an average of >9 years on ART, along with the ages of rebound and low-level/isolated viremia occurring during this time. We observed that the longest-lived proviruses persisting on ART were exclusively defective, usually due to large deletions. In contrast, intact proviruses and rebound HIV exclusively dated to the years immediately preceding ART. These observations are consistent with genome-intact proviruses having shorter lifespans, likely due to the cumulative risk of elimination following viral reactivation and protein production. Consistent with this, intact proviruses (and those with packaging signal defects) were three times more likely to be genetically identical compared to other proviral types, highlighting clonal expansion as particularly important in ensuring their survival. By contrast, low-level/isolated viremia sequences were heterogeneous in terms of age, with some potentially originating from defective proviruses. Results reveal that the HIV reservoir is dominated by clonally enriched and genetically younger sequences that date to the period of untreated infection when viral populations had been under within-host selection pressures for the longest duration. Knowledge of these qualities may help focus strategies for reservoir elimination. 
IMPORTANCE Characterizing the human immunodeficiency virus (HIV) reservoir that endures despite antiretroviral therapy (ART) is critical to cure efforts. We observed that the oldest proviruses persisting during ART were exclusively defective, while intact proviruses (and rebound HIV) dated to nearer ART initiation. This helps explain why studies that sampled sub-genomic proviruses on-ART (which are largely defective) routinely found sequences dating to early infection, whereas those that sampled replication-competent HIV found almost none. Together with our findings that intact proviruses were more likely to be clonal, and that on-ART low-level/isolated viremia originated from proviruses of varying ages (including possibly defective ones), our observations indicate that (i) on-ART and rebound viremia can have distinct within-host origins, (ii) intact proviruses have shorter lifespans than grossly defective ones and thus depend more heavily on clonal expansion for persistence, and (iii) an HIV reservoir predominantly “dating” to near ART initiation will be substantially adapted to within-host pressures, complicating immune-based cure strategies.

Fig. S3: Odds ratios of clonality by genomic integrity: subsampling analysis. This analysis tests the robustness of significant between-group comparisons in Figures 2B-E to shallower sampling depth. Panel A. Odds ratio distributions derived by comparing the clonality of 54 unique intact proviruses with 1,000 equally-sized datasets of defective proviruses, subsampled from the overall data with replacement. Panel B. Same as A, but comparing the clonality of 94 unique Ψ-defect sequences to 1,000 equally-sized subsamples of proviruses of other types. Panel C. Same as A, but comparing the clonality of 570 unique hypermutated sequences to 1,000 equally-sized subsamples of proviruses of other types. Odds ratios and p-values were computed using Fisher's exact test and are not corrected for multiple comparisons. On the violin plots, the middle dashed line indicates the median odds ratio, while the upper and lower dashed lines denote the 25th and 75th percentiles. Plot annotations: median OR = 2 (IQR 1.8-2.3); median OR = 0.16 (IQR 0.15-0.17).
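The subsampling procedure described in this legend can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: each distinct sequence is assumed to carry a boolean flag (clonal if observed two or more times), the function names are hypothetical, the 0.5 zero-cell correction is an added convenience that the legend does not mention, and the toy flags at the bottom are made up rather than taken from the study.

```python
import random
import statistics
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    Assumption (not from the legend): add 0.5 to every cell when any cell
    is zero, so that the ratio stays finite.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def subsample_comparison(focal, background, n_iter=1000, seed=0):
    """Compare focal clonality flags with equally-sized background subsamples.

    focal, background: lists of booleans (True = clonal).
    Returns two lists: odds ratios and p-values, one entry per subsample.
    """
    rng = random.Random(seed)
    a = sum(focal)
    b = len(focal) - a
    ors, pvals = [], []
    for _ in range(n_iter):
        sample = rng.choices(background, k=len(focal))  # with replacement
        c = sum(sample)
        d = len(sample) - c
        ors.append(odds_ratio(a, b, c, d))
        pvals.append(fisher_exact_p(a, b, c, d))
    return ors, pvals


def median_and_iqr(values):
    """Median plus (25th, 75th) percentiles, as drawn on a violin plot."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, (q1, q3)


# Toy input only: the clonal fractions below are invented for illustration.
toy_focal = [True] * 20 + [False] * 34
toy_background = [True] * 60 + [False] * 440
toy_ors, toy_pvals = subsample_comparison(toy_focal, toy_background)
print(median_and_iqr(toy_ors))
```

Fixing the focal group and redrawing only the comparison group mirrors the wording of Panel A; whether the original analysis also resampled the focal group is not stated in the legend.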

Fig. S2: Between-host phylogeny inferred from gag sequence alignments. Same as Figure S1, except this phylogeny was inferred from all on-ART proviral sequences with an intact gag gene.
Fig. S4 (previous two pages): Amino acid highlighter plots depicting within-host evolution in pre- and post-ART nef sequences for each participant. The top sequence corresponds to the pre-ART plasma sequence closest to the root of the participant's highest likelihood within-host phylogeny (see Figures 3B-8B in main manuscript) and serves as a reference sequence, where colored ticks in sequences beneath this denote non-synonymous substitutions relative to this reference. Pre-ART nef sequences are ordered according to their sampling date; post-ART nef sequences are ordered according to their inferred integration date.

Fig. S5: Comparison of integration date distributions of select groups of sequences: subsampling analysis. This analysis tests the robustness of significant between-group comparisons reported in Figures 4D, 5D and 6D, to shallower sampling depth. Panel A. Median integration dates of 9 unique isolated viremia sequences from BC-002 (solid red square) versus those derived from 1,000 equally-sized provirus datasets, subsampled with replacement from the participant's full provirus sequence dataset (open diamonds). Panel B. Median integration dates of 15 unique intact proviruses from BC-003 versus those from 1,000 equally-sized subsampled defective provirus datasets. Panel C. Median integration dates of 8 unique isolated viremia sequences from BC-004 versus those from 1,000 equally-sized subsampled on-ART sequences. The median (and IQR) p-values from all 1,000 comparisons are shown above each plot.
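As a rough illustration of the resampling step in this legend, the sketch below draws equally-sized subsamples with replacement from a reference set of integration dates and records each subsample's median. It assumes dates are stored as decimal years; the function names are hypothetical, and the per-subsample significance test (whose p-values the legend summarizes) is deliberately left out because the legend does not name it.

```python
import random
import statistics


def subsampled_medians(reference_dates, k, n_iter=1000, seed=0):
    """Medians of n_iter subsamples of size k, drawn with replacement.

    reference_dates: integration dates of the comparison group
                     (assumed here to be decimal years).
    k:               size of the focal group, e.g. 9 for Panel A.
    """
    rng = random.Random(seed)
    return [
        statistics.median(rng.choices(reference_dates, k=k))
        for _ in range(n_iter)
    ]


def quartiles(values):
    """Return (25th percentile, median, 75th percentile)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3
```

The focal group's own median (the solid red square in Panel A) can then be placed against the distribution returned by `subsampled_medians` to see how often a subsample of the same size looks as extreme.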
Fig. S6 (previous page): Integration date inference of on-ART sequences for participant BC-027, after out-group rooting their phylogenies. The data, analyses and legend are the same as for Figure 8, except that all of the phylogenies in the present analysis were out-group rooted, and this root was used to infer integration dates of on-ART proviruses.
Fig. S8 (previous page): Lack of relationship between sequence clonality and age. Estimated integration dates of distinct proviruses that were observed two or more times (clonal) versus only once (unique), by participant. Open diamonds denote defective proviruses; red diamonds with black outline include intact proviruses as well as HIV RNA sequences recovered from ex vivo reactivation (QVOA), if any. Black lines indicate median estimated integration date for each group. P-values were determined using the Mann-Whitney test, and are not corrected for multiple testing.
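For readers unfamiliar with the Mann-Whitney test named in this legend, a minimal standard-library sketch follows. It uses the normal approximation with tie and continuity corrections, which is an assumption: the legend does not say whether an exact or asymptotic p-value was used, and the approximation is coarse for very small groups. The function name is hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    x, y: numeric samples, e.g. integration dates of clonal and unique
          proviruses for one participant.
    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all get the average rank.
        midrank = (i + 1 + j) / 2
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    # Continuity-corrected z-score, floored at zero.
    z = max((abs(u_x - mean_u) - 0.5) / sqrt(var_u), 0.0)
    return u_x, min(1.0, erfc(z / sqrt(2)))
```

Calling `mann_whitney_u(clonal_dates, unique_dates)` once per participant would correspond to the per-participant comparisons in the figure; as the legend notes, such p-values are not corrected for multiple testing.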