A transmission-virulence evolutionary trade-off explains attenuation of HIV-1 in Uganda

Evolutionary theory hypothesizes that intermediate virulence maximizes pathogen fitness as a result of a trade-off between virulence and transmission, but empirical evidence remains scarce. We bridge this gap using data from a large and long-standing HIV-1 prospective cohort, in Uganda. We use an epidemiological-evolutionary model parameterised with this data to derive evolutionary predictions based on analysis and detailed individual-based simulations. We robustly predict stabilising selection towards a low level of virulence, and rapid attenuation of the virus. Accordingly, set-point viral load, the most common measure of virulence, has declined in the last 20 years. Our model also predicts that subtype A is slowly outcompeting subtype D, with both subtypes becoming less virulent, as observed in the data. Reduction of set-point viral loads should have resulted in a 20% reduction in incidence, and a three years extension of untreated asymptomatic infection, increasing opportunities for timely treatment of infected individuals. DOI: http://dx.doi.org/10.7554/eLife.20492.001


Introduction
To spread, a pathogen must multiply within the host to ensure transmission, while simultaneously maintaining opportunities for transmission by avoiding host morbidity or death (Anderson and May, 1982;Alizon et al., 2009). This creates a trade-off between transmission and virulence. This hypothesis permeates theoretical work on the evolution of virulence, but empirical evidence remains scarce (Dwyer et al., 1990;Mackinnon and Read, 1999;Fraser et al., 2007;de Roode and Yates, 2008;Alizon et al., 2009;Cressler et al., 2015). In HIV-1 infection, set-point viral load (SPVL), the stable viral load in the asymptomatic phase of infection, is a viral trait which is both variable and heritable (Hollingsworth et al., 2010;Fraser et al., 2014;Hodcroft et al., 2014), and has an important impact on the transmission cycle of the pathogen. In untreated infection, higher SPVL translates into higher per-contact transmission rates but also faster disease progression to AIDS and death. From the perspective of the transmission cycle, this creates a trade-off, under which an intermediate SPVL value maximises opportunities for transmission (Fraser et al., 2007). Indeed the transmission potential of a parasite is the product of the transmission rate and the time during which the host is alive and can transmit. The latter is approximately the time to AIDS in HIV as host death occurs shortly after the onset of AIDS and sexual activity may be reduced in the AIDS phase because of AIDS-associated symptoms (Hollingsworth et al., 2008). The virulence-transmission trade-off in HIV is important for understanding pathogenesis and is a possible explanation for the significant changes in HIV virulence reported over the last decades in North America and Europe. There, SPVL increased at an estimated rate of 0.013 (Herbeck et al., 2012) and 0.020 log 10 copies/mL/year (Pantazis et al., 2014) over the last 28 years. Since many persons at risk of infection do not routinely obtain HIV testing (Paz-Bailey et al., 2013), such changes may lead to more transmission and more newly diagnosed patients presenting with advanced infection, despite the widespread availability of antiretroviral therapy (ART).
The virulence-transmission trade-off is a promising hypothesis to explain changes in virulence of HIV, but this hypothesis and its predictions have so far been approached in a piecemeal manner, by combining data on infectiousness, AIDS-free survival and the dynamics of SPVL from very different cohorts (Fraser et al., 2007;Herbeck et al., 2012;Pantazis et al., 2014). Here we integrated extensive data from a single cohort in Uganda into an epidemiological-evolutionary model describing the transmission cycle of HIV. We then compared predictions on the evolution of SPVL evolution to the observed trends in SPVL in this cohort.

Results
We focused on one of the longest established generalised HIV epidemics, in rural Uganda, and used data collected as part of the Rakai Community Cohort Study (RCCS), a large and long-standing population-based open cohort conducted by the Rakai Health Sciences Program (RHSP) in Rakai District. We combined data on transmission rates and survival to estimate the evolutionary optimal distribution of SPVL for the RCCS cohort, and then compared it to the dynamics of SPVL over time from 1995 to 2012. ART probably had little effect on the evolutionary dynamics of SPVL in Uganda because it only became available in 2004 and is initiated at relatively late stage infection (CD4 < 250 cells/mm 3 from 2004 to January 2011, and at < 350 cells/mm 3 from February, 2011 to the time of writing, August 2016).
As in other HIV epidemics, we found that SPVL is highly variable in this population, with values ranging from 10 2 copies/mL to 10 7 copies/mL. SPVL was calculated for 647 individuals who had a positive HIV serologic test within two study visits of their last negative test ('HIV incident cases', Table 1; median time between last negative visit and first positive visit is 1.25 years), and for 817 participants in a serodiscordant partnership ('serodiscordant couples', Table 2).
We analysed transmission in 817 serodiscordant couples, in which one partner was positive (index partner), while the other was initially negative and at risk of infection during follow-up. This analysis revealed that higher SPVL was associated with significantly increased rate of transmission. Transmission between partners was modelled as a Poisson process, in which the instantaneous transmission rate is constant (Fraser et al., 2007). We allowed the transmission rate to be a function of SPVL, b v ð Þ. We estimated all parameters by maximum likelihood and compared different models based on Akaike Information Criterion (AIC) (Materials and methods, Figure 1-figure supplement 1). The best model fit was one where transmission rates increases from 0.019/year to 0.14/year in a stepwise fashion as SPVL increases with three plateaus (Figure 1a) (DAIC = À75.96 compared to null model Table 1. Epidemiological and demographic characteristics of the HIV-1 incident cases in the Rakai cohort, used for the analysis of time trends in SPVL and for the analysis of time to AIDS. *Multiple subtypes (possibly dual infection) ** Recombinants, primarily A/D.  4.08 < SPVL < 5.12 SPVL > 5.12 d Figure 1. Inferred relationships between SPVL and transmission rate (a, b) and time to AIDS (b, d). On the left panels, black lines show the maximum likelihood relationships and shaded areas the bootstrap 95% confidence intervals. Both the step function (horizontal lines) and the generalised Hill function (curved line) are shown. The red lines show a non-parametric estimation of the transmission rate (a) and the time to AIDS (c) curves, when the data is stratified by SPVL in 8 bins of equal size. The right panels show Kaplan Meier plots when the data is partitioned in three SPVL groups defined by the maximum likelihood relationships. There was good agreement between the data (step functions) and the maximum likelihood function (smooth functions). DOI: 10.7554/eLife.20492.004 The following source data and figure supplements are available for figure 1: Source data 1. Data file for Figure 1. with a fixed transmission rate, n = 817). A function with three steps was favoured over others, but we also show a continuous function, the generalised Hill function, that may be considered more biologically realistic (DAIC = -71.17 compared to the null model, n = 817) ( Figure 1a). The two functions fitted the data well, as shown by comparison with non-parametric estimates of the transmission rate in the data stratified by SPVL (Figure 1a), and by a Kaplan-Meier plot comparing data to the model prediction ( Figure 1b). We also allowed the parameters of the function b v ð Þ to vary with the covariates subtype, gender, and male circumcision status. In accordance with previous studies (Kiwanuka et al., 2009), subtype A had a higher transmission rate than subtype D for all SPVL values ( Figure 3) (DAIC = À3.32 compared to the model without subtype, n = 817). We will examine the evolutionary consequences of subtype differences later on. Gender did not have an effect on transmission (DAIC = 1.66 compared to model without gender, n = 817), and male circumcision reduced transmission both from female to male and from male to female (DAIC = À3.74 for female to male, n = 321; DAIC = À3.17 for male to female, n = 487, compared to model without circumcision) (Figure 1-figure supplement 3).
We assessed the relationship between SPVL and time to AIDS from 562 incident cases with a SPVL value and information on time to AIDS, and found that higher SPVL was associated with significantly shorter time to AIDS (Figure 1c). The time to AIDS was assumed to follow a gamma distribution, where the expected value was a function of SPVL (Fraser et al., 2007). We optimized the likelihood function and compared different models for the dependence of time to AIDS on SPVL based on AIC. The best model was a step function with three plateaus, with time to AIDS decreasing from 40 years to 5 years from low to high SPVL ( Figure 1c; DAIC = 137.22 compared to null model with fixed time to AIDS). Again, non-parametric estimation of the time to AIDS ( Figure 1c) and a Kaplan-Meier survival plot ( Figure 1d) showed good fit of the model to the data. We also allowed the relationship between SPVL and time to AIDS to vary by subtype and gender. The inferred gamma distribution had shape parameter 1.2, similar to an exponential distribution (which is the special case where shape parameter is 1). We found, in agreement with previous studies ( Kiwanuka et al., 2008), that subtype D tended to confer faster disease progression, but this effect was not statistically significant here (Figure 1, DAIC = 15.41 compared to the model without subtype, n = 562). However, subtype D-infected individuals who progressed rapidly were not included in the analysis because they had no SPVL value (among the 33 individuals who progressed to AIDS within 10 years but had no SPVL value, there were 12 subtype D, 1 recombinant, and 20 unknown subtype). Time to AIDS did not significantly vary by gender ( Figure 1, DAIC = 7.85 compared to the model without gender, n = 562).
Next, we predicted how SPVL might change over time under the trade-off between virulence and transmission, incorporating our setting-specific estimates of the virulence-transmission trade-off into an evolutionary and epidemiological model. The model is an analytically tractable Susceptible-Infected compartmental ordinary differential equation (ODE) model, where the viral population is stratified by SPVL, similar to previous models of virulence evolution (Day and Proulx, 2004;Day and Gandon, 2007) (Material and methods). SPVL of an infected individual is the sum of a viral genetic effect g, which is transmitted with mutation from a donor to a recipient, and an environmental effect e, which includes host and other environmental factors and is independently drawn in a normal distribution with mean 0 in each newly infected individual. The evolution of mean SPVL in the population is determined by the evolution of the mean viral genetic effect g. In this model the transmission rate of a virus with SPVL v is the inferred function b v ð Þ (Figure 1a), while death is assumed to occur at a constant rate v ð Þ given by the inverse of the mean time to AIDS (Figure 1c). In the ODE model, the time to AIDS follows an exponential distribution because the rate of AIDS-death is constant. The individual based model presented later on relaxes this assumption and considers gamma-distributed time to AIDS as inferred from the data.
We first developed an analytical expression for the evolution of SPVL. Because prevalence of HIV in this cohort is approximately constant (at 14% on average in the period 1995 to 2013, Figure 2figure supplement 1) and the distribution of SPVL can be closely approximated by a normal distribution, we were able to use an approximation of the Price equation (Price, 1970) inspired by a classical quantitative genetics model (Lande, 1976), to write the change in mean genetic effect of SPVL in prevalent cases over time as (Appendix): The equation has two terms that respectively describe the effects of selection and of inheritance on SPVL evolution. The first term describes selection under a virulence-transmission trade-off, maximising the ratio of the mean transmission rate over the mean severity of infection, b= , which is the mean fitness of the viral population. The SPVL that maximises mean fitness is 3.4 log 10 mL/copies (95% bootstrap CI [2.6; 4.0], Figure 2a). Adaptation of the viral population will proceed at a rate proportional to phenotypic variance V P (the variance in SPVL) and heritability h 2 (the fraction of variance explained by viral genetic factors, assumed to be at equilibrium). The second term describes biased mutation that changes the mean SPVL from one infection to the next, where a is the mean effect of mutations from the donor to the recipient, recapitulating the effect of within-host selection on mean SPVL. The effects of the transmission-virulence trade-off were very similar when we used the generalised Hill functional form to fit the relationships between SPVL and transmission and time to AIDS (Figure 2a).
Next we simulated the ODE and assessed the precision of the analytical approximation. We parameterised the ODE model with the data and simulated the evolution of mean SPVL from 1995 to 2015. Parameterisation was as follows: the transmission rate was as in Figure 1a; the mortality rate was the inverse of time to AIDS ( Figure 1c); heritability of SPVL in the Rakai cohort was previously estimated at 36% (confidence interval 6-66%), using 97 donor-recipient transmission pairs (Hollingsworth et al., 2010) (who are participants of the present cohort). We had little data to parameterise the effect of within-host evolution on SPVL, a. Many different types of mutations may evolve within the host, and little is known on the net effect of these processes on SPVL. Within-host viral fitness is positively related to replicative capacity (RC), measured in the absence of an immune response, and immune escape, which is host-specific. Most studies of within-host HIV evolution have focused on CTL escape mutations, which are conditionally beneficial (i.e. their positive effect on fitness is host-specific). These usually sweep through during infection because the fitness benefit of evading the immune system outweighs the cost of reduced RC that these mutations also impose (Goepfert et al., 2008;Carlson and Brumme, 2008;Matthews et al., 2008). CTL escape mutations may be reverted if the virus harbouring a costly CTL-escape mutation is transmitted to an individual where the mutation does not help evade the new host's immune system Zanini et al., 2015). Mutations that increase the replicative capacity of the virus in all hosts may also evolve (Kouyos et al., 2011). It is also a possibility that slightly deleterious or beneficial mutations get fixed by genetic drift. We explored three scenarios where available data allow rough estimation of plausible values for the impact of within-host evolution on viral load (the a parameter) (Material and methods). (i) Most mutations evolving are conditionally beneficial but carry a strong cost to RC (a ¼ À0:47 log 10 copies/mL). (ii) Most mutations evolving are conditionally beneficial but carry a moderate cost to RC (a ¼ À0:093 log 10 copies/mL). (iii) Most mutations have unconditionally beneficial effects on RC (a ¼ þ0:057 log 10 copies/mL).
The ODE simulations predicted a decline in mean SPVL in incident cases from 1995 to 2015, at a rate of À0.042, À0.013 and À0.0009 log 10 copies/mL/year in the three scenarios chosen for withinhost evolution, for a heritability of 36%. The Price equation predicted the outcome of the ODE simulations quite precisely ( Figure 2b). The Price equation shows that the virulence-transmission tradeoff -the first term in the equation -contributes initially a decline in mean SPVL of À0:01 log 10 copies/mL/year, slowing down as the population gets closer to the optimum. Note that the Price equation concerns average genetic effect of SPVL in the prevalent cases, but the rate of evolution in the incident cases was similar in these simulations ( Figure 2-figure supplement 6). Predictions of the ODE model were robust to the addition of a number of more realistic features of the HIV epidemic, as shown by a more comprehensive individual-based stochastic model (IBM) of HIV evolution (Herbeck et al., 2014(Herbeck et al., , 2016. The IBM includes all features of the ODE model, in particular the fact that SPVL is the addition of a heritable genetic component and a random environmental component. In addition, it includes the phases of acute infection and AIDS, both characterized by viral loads being much higher than the set-point value. Disease progression was modelled as progression through a series of CD4 count categories until AIDS occurred, and the transition rates between these SPVL (log10 copies/mL) 1995−1997] [ categories were tuned to reproduce the inferred gamma-distributed time to AIDS. Partnership formation and dissolution was also explicitly modelled, as well as some degree of behavioural heterogeneity in partnership duration and coital frequency. The IBM also predicted a decline in mean SPVL in the three scenarios, although at a somewhat faster rate compared to the simplified ODE model, confirming the generality and robustness of our results ( Figure 2b). Strikingly, the data was in qualitative agreement with the evolutionary model: SPVL in the Rakai cohort decreased with date of seroconversion between 1995 and 2012, at a rate of À0.022 log 10 copies/mL per year after adjustment for other covariates (CI [À0.04; À0.002], p=0.027, n = 603) (Figure 2). Average SPVL in prevalent cases was also declining at a rate of À0.020 log 10 copies/mL, although for those it is more difficult to adjust for covariates and test for significance (because the same participants are 'prevalent cases' at multiple time points) (Figure 2-figure supplement 6). The observed trends were best explained if mutations evolving within the host had a moderate negative impact on mean SPVL (scenario 2).
The agreement between the observed trend in mean SPVL and the evolutionary model suggests that genetic changes in the virus may be responsible for decreasing SPVLs. However, it is possible that other confounding effects might explain some or all of the decrease in SPVL. Because the Rakai cohort has been studied extensively, we were able to consider the potential impact of a number of confounders but none of them could explain the observed decline in mean SPVL of around 0.4 log 10 copies/mL over 17 years ( Figure 2). SPVL decline was significant in the linear model both without adjustment (À0.029 log 10 copies/mL per year, CI [À0.045; À0.013], p=0.0005, n = 603, Figure 2c), and in the multivariate regression mentioned above, controlling for the laboratory where SPVL was measured, assay type, gender, age and subtype. Additionally, to verify the robustness of the decline in mean SPVL, we examined the trend in SPVL in a number of subsets of the population (Figure 2d). SPVL declined in a similar way: (i) when using the 'strict' definition of SPVL (i.e. the subset of measures that included more than one viral load measurement and where the standard error across viral loads of the same participant was less than one log 10 copies/mL) (Appendix); (ii) within each gender ( Figure 2d); (iii) within each assay type, when partitioning the data in viral loads measured with the 'Abbott' assays and the 'Roche 1.5' assays, showing that declining SPVL was not due to changing assays; (iv) for viral loads measured at the John Hopkins and at the RHSP laboratories; and it is unlikely there were independent downward shifts in assay reading over time in these two laboratories. Mean SPVL did not decline in the subset of SPVL measured in the Walter Reed laboratory, but 90% of those were for participants infected prior to 2003, limiting power to detect temporal trends. Improvement in nutrition or health care could be hypothesised to cause a decline in SPVL over time. However, improvement in nutrition would probably have no impact on the mean SPVL, as improving micronutrient intake slows down disease progression, but does not reduce plasma viral load (Fawzi et al., 2005;Friis, 2006;Baum et al., 2013). According to a survey conducted in 2006 in the Rakai communities, households experience on average 2 months per year of food insecurity, and the Household Dietary Diversity Score is 7.7 / 12 (S. Haberlen, personal communication, August 2016), which is high enough to meet WHO dietary requirements in energy, proteins, minerals and vitamins (Steyn et al., 2006). Improved healthcare is also a possible confounder. ART was introduced in Uganda in 2004, but until 2011 ART was prescribed only at late stage infection (CD4 count below 250 cells/mL). Although we excluded post-ART viral load measures from SPVL calculations, unreported ART use could have become more frequent at later time points and therefore might have contributed the decline in mean SPVL. To exclude this possibility, we first verified that the entire distribution of SPVL shifted downward, and the decline in mean SVPL was not only due to more low viral loads at later time points (Figure 2-figure supplement 4). We also examined the individual viral load trajectories within participants to verify that the clear drop in viraemia caused by ART was not present in more recent participants without reported ART (Supplementary file 1). Last we examined the determinants of SPVL using the same linear model, focussing on the subset of SPVL values with viral loads measured before 2004, prior to ART availability in the region. We found a similar though non-significant linear decline in SPVL after non-significant 'laboratory' factors were removed (effect size = À0.019 log 10 copies/mL, CI [À0.052; 0.014], p=0.26, n = 442). In this subset of data, all SPVL but one were measured with the Roche 1.5 assay. We had little power to distinguish between 'laboratory' and 'calendar time' effects because of a strong correlation between these factors (DAIC = À1.9 for a model with "laboratory" relative to a model with 'calendar time'). However we know from the analysis of the full dataset that 'laboratory' has no significant effect on SPVL, and furthermore the inferred effects of 'laboratory' in the pre-2004 subset are consistent with confounding by calendar time and different from those of the full dataset, which suggests the temporal effect is the genuine effect here.
Coinfections such as tuberculosis, malaria, the herpes simplex virus 2, gonorrhea, or syphilis, might increase viral load in HIV infected individuals (Modjarrad and Vermund, 2010). Better health care in the Rakai district could have caused a population-level reduction in SPVL via a reduction in prevalence of these coinfections. However, none of these coinfections had a combination of high prevalence at the beginning of the study, a large reduction in prevalence between 1995 and 2012, and a large effect on SPVL, sufficient to explain a decline of 0.4 log 10 copies/mL (Material and methods).
To corroborate the evolutionary model, we extended it to include data on the subtype-specific transmission rate and model jointly the evolution of SPVL and subtype A, D, and AD recombinants (the major subtypes circulating in the population). The evolutionary model predicted the observed dynamics of subtype A, D, and AD recombinants ('R') in the cohort (Figure 3). In particular, HIV subtype A was more transmissible than subtype D for a given SPVL (Kiwanuka et al., 2009), and therefore was predicted to increase in frequency in the population. Temporal trends in subtype frequency in the data were inferred by focusing on subtypes A, D, and R and fitting a multinomial linear model for the frequency of the three subtypes as a function of seroconversion date. This revealed significant changes in subtype frequencies (analysis of deviance, p=0.044, n = 551) an increase in the frequency of subtype A (0.009 per year, bootstrap CI [À0.0007; 0.022]) and recombinants (0.007 per year, CI [À0.005; 0.017]), and a decrease in subtype D (À0.016, CI [À0.027; À0.002]), in accordance with a previous study (Conroy et al., 2010). The rise of subtype A and R together with the lower SPVL associated with infection with these subtypes contributes additionally to the decline in mean SPVL, but this effect is estimated at À0.003 log 10 copies/mL/year, very small compared to the within-subtype evolution of SPVL at a rate of À0.022 log 10 copies/mL/year (Material and methods). To model the dynamics of subtype A, D, and R within the ODE model, we assumed co-infection by A and D occurred only transiently and resulted in an 'R' infection with probability r (Day and Gandon 2012). We assumed the transmission function for subtype R was intermediate between that of subtype A and subtype D. In spite of large uncertainty in the fitness function of subtype A due to smaller numbers of infected individuals (Figure 3c), the model accurately predicted the rise in frequency of both subtypes A and R for r=1 ( Figure 3d). SPVL declined within subtype A and D, the two major subtypes co-circulating in the region (Figure 2d). The inferred fitness functions for subtype A and D were both consistent with a decline in SPVL within each subtype (Figure 3e). We note, though, that the model predicted a slower decline in SPVL within subtype A than the one observed, because this subtype is expanding in the population, which favours selection for transmission and slows down the attenuation of the virus.

Discussion
Using extensive data on a population-based cohort in the Rakai district, Uganda, we confirmed the existence of a virulence-transmission trade-off in HIV, and predicted that the viral population should evolve reduced SPVL to maximise transmission opportunities. This prediction was verified, as mean SPVL in newly infected participants declined by 0.4 log 10 copies/mL in the Rakai cohort form 1995 to 2012. We had limited information on the impact of within-host evolution on mean SPVL. However, the virulence-transmission trade-off was not negligible compared to the potential impact of withinhost evolution, and results in a decline in mean SPVL of À0.01 log 10 copies/mL/year, i.e. about 50% of the observed trend. We systematically examined potential confounders in this well-studied cohort, but none of them could account for the trend of declining SPVL, suggesting viral genetic changes may be responsible for the observed attenuation. The evolutionary model also quantitatively reproduced how higher transmission of subtype A resulted in expansion of this subtype in the population.
The attenuation of HIV in this Ugandan cohort is in contrast to increasing virulence in Europe. The European dynamics were hypothesized to result from viral adaptation to a higher optimal SPVL of 4.5 log 10 copies/mL (Fraser et al., 2007;Herbeck et al., 2014). However this higher optimum was computed using a Zambian cohort for transmission estimates, and a Dutch cohort for time to AIDS (Figure 1-figure supplement 1). Transient selection for increased virulence could also have been important in Europe, and in fact SPVL has declined since 2004 (Pantazis et al., 2014). Our finding of HIV attenuation is consistent with another study of the evolution of HIV virulence in Africa.Comparison between the epidemic in Botswana and the younger epidemic in South Africa revealed declines  in SPVL, which was hypothesized to be due to the fixation of mutations conferring adaptation to HLA variants and decreased replicative capacity (Payne et al., 2014).
Although the agreement between the observed trend in mean SPVL and the evolutionary model are consistent with genetic changes in the virus causing decreasing SPVLs, genomic data is lacking to positively demonstrate viral genetic changes. Even if genomic data were available, this would be a challenging task since SPVL is probably determined by many loci of small effect (Bartha et al., 2013), and polygenic adaption is difficult to detect (Pritchard et al., 2010). However, adaptation of the viral population to the low optimum is a logical consequence of the impact of SPVL on transmission and time to AIDS, two robust relationships inferred from the data (Figure 1). These effects of SPVL on the viral transmission cycle, together with 30-40% viral heritability of SPVL (36% specifically in the Rakai cohort, but generally around 30-40% in different settings, [Fraser et al., 2014;Mitov and Stadler, 2016;Leventhal and Bonhoeffer, 2016]), is predicted to result in attenuation of the virus.
The detailed evolutionary model of HIV SPVL evolution presented here quantitatively reproduced the attenuation of HIV-1 virulence that happened in the last 20 years. This decline in virulence is predicted to continue into the future. This decline is unaffected by ART becoming more widely available, as even aggressive test-and-treat strategies have little predicted effect on these evolutionary dynamics (Roberts et al., 2015;Herbeck et al., 2016) (Figure 2-figure supplement 3). As ART becomes more widely available, essentially shortening the duration of infection, reduced SPVL will contribute to reductions in onwards transmission, and so synergise with efforts to eliminate the pathogen.

Materials and methods
The RCCS has conducted regular surveys (approximately annual) of all consenting residents aged 15-49 in the same 50 communities since 1994, collecting detailed information on demographics, sexual behaviours and health status and obtaining blood for HIV testing from all consenting participants. Personal information on marital and long-term consensual partners is also collected, which enables retrospective identification of stable couples. All individuals found to be HIV-infected are referred for care, including CD4 T cells count and viral load measurements. Virtually all HIV transmission in this population is via heterosexual vaginal intercourse, and the rates of reported intercourse per week and month were found to be stable by HIV subtype and different study time periods. SPVL SPVL was calculated for 817 participants in a serodiscordant partnership ('Serodiscordant couples', Table 2), and for 647 individuals who had a positive HIV serology test within two study visits of their last negative test ('HIV incident cases', Table 1; median time between last negative visit and first positive visit is 1.25 years). SPVL was defined as the mean log 10 viral load for all visits occurring more than 6 months after estimated date of infection and before initiation of ART. Clinical records indicating ART initiation were available for participants who received care at an RHSP clinic prior to 2013. After 2013, ART care at most RHSP clinics was transferred to the Ugandan Ministry of Health. We determined receipt of treatment from clinics other than RHSP prior to 2013, or at any clinic post-2013, by self-reported ART treatment status (SI).

Transmission
Transmission was modelled as a Poisson process, in which the instantaneous transmission rate is constant (Fraser et al., 2007). We allowed the transmission rate to be a function of SPVL and other epidemiological covariates. For a seropositive individual (the 'index') with SPVL v, the probability that infection of the seronegative partner occurs between time t p;À and t p;þ (where the subscript p stands for partner) is given by: where t init is the time at which the index becomes infected (defined as the mid-point between last negative and first positive dates) or where observation of the couple starts, whichever occurs last and b v ð Þ is the transmission hazard. In a Poisson process, the time to transmission is exponentially distributed: thus the probability is obtained by integration of the probability density function of the exponential distribution between time t p;À and t p;þ . When infection occurred within the window of observation, t p;À and t p;þ are simply the last time the partner was seen negative and the first time he/ she was seen positive. When infection did not occur within the window of observation, t p;À is the last time the partner was seen and t p;þ is infinity. The likelihood function is the product of these probabilities over all couples. We compared several functional forms for b v ð Þ, including a flat function where viral load has no impact on transmission, a power function b 1þ10 Àk vÀv 50 ð Þ À Á1 g , a step function with three plateaus and one with four plateaus. We computed the likelihood of each model, searched for the maximum likelihood parameters using the Nelder-Mead method and compared different models based on Akaike Information Criterion (AIC). We tested how transmission varied with other epidemiological factors, including subtype, gender, and circumcision status, by allowing the parameters of the function b v ð Þ to vary with different values of these factors (Figure 1-source data 1).

Time to AIDS
The time at which an individual was first diagnosed with AIDS was defined in one of three ways. For the majority of participants, it was defined as the time at which CD4 count is first below 200 cells per mm 3 , (n = 203 of the 288 participants who declared AIDS) or the time at which three symptoms of AIDS (Sewankambo et al., 2000) were first observed (n = 43), whichever came first. If AIDS was not defined according to these criteria, but the individual was known to have died of AIDS, the time to AIDS was taken to be the time to death (n = 42). Time to AIDS was assumed to follow a gamma distribution whose expected value was a decreasing function of the viral load. For this decreasing function we used a flat function (as a null model), a decreasing Hill functiont AIDS v ð Þ ¼ t max and a step function with three plateaus. For the Hill function and the generalised Hill function, we set the maximum time a virus can be carried by its host to t max ¼ 40 years. We also allowed these functions to vary by subtype and gender. For a participant, the probability that AIDS occurred between time t no AIDS and time t AIDS is: where G k; t AIDS = ð Þ=G k ð Þ is the regularized gamma function which is the cumulative distribution function of the gamma distribution; k is the shape parameter and is the scale parameter set tot AIDS =k so that the expected value ist AIDS . When AIDS was not declared in the individual, t no AIDS was set to the date of last visit of this individual, and t AIDS was set to infinity. The likelihood function was obtained by multiplying these probabilities across all participants. We computed the likelihood of each model, searched for the maximum likelihood parameters and compared different models based on Akaike Information Criterion (AIC).

Epidemiological and evolutionary modelling
We developed a Susceptible-Infected compartmental ordinary differential equation (ODE) model, where the viral population is stratified by SPVL. The set-point viral load v of an individual is given by v=g+e where g is the genetic effect, transmitted with mutation from a donor to a recipient, and e is the environmental effect, which includes host and other environmental factors, and is independently drawn in each newly infected individual. The model is akin to classical quantitative genetics models and in particular to a previously described model of virulence evolution (Lande, 1976;Day and Proulx, 2004). The model neglects the impact on transmission of the higher viral loads in early and late phases of infection, however we relax this assumption in the individual-based model presented below. The number of infected with genetic and environmental effects (g, e) evolves as: |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} transmission À g þ e ð ÞY g; e; t ð Þ |fflfflfflfflfflfflfflfflfflfflfflffl ffl{zfflfflfflfflfflfflfflfflfflfflfflffl ffl} death and the number of uninfected individuals Xchanges as: The first term in the equation for the number of infected reflects the increase in the number of infected individuals with viral genetic effect g and environmental effect e due to new transmission events from all possible donors. The second term describes death of infected individuals. In these equations, b : ð Þ is the transmission rate as a function of SPVL, P e ð Þ is the distribution of environmental effects in newly infected individuals, Q g ! g ð Þ is the mutation kernel, which is the probability that a donor with virus of genetic effect g gives an infection with a virus of genetic effect g, : ð Þ is the AIDS death rate as a function of SPVL (inversely related to the time to AIDS), b is the birth rate, b is the mean transmission rate in the population, and Y tot is the total number of infected.
The evolution of mean SPVL in the population is determined by the evolution of the mean viral genetic effect g, as the mean environmental effect is set at 0 without loss of generality. Under this model, we find that evolution of mean genetic effect (denoted g) is determined by the Price equation (Price, 1970): (see SI for derivation). The parameter a is the mean effect of mutations on SPVL in log 10 copies/mL. The first term of the equation is the Robertson-Price identity (Robertson, 1966;Price, 1970), which equates the change in character with the population covariance between a fitness measure, here bX À , and the genetic value of this character. The dependence on the number of uninfected individuals sets the balance between selection for higher transmission rate and selection for lower mortality. For example, when the number of susceptible individuals is large relative to its long-term equilibrium value = b, selection for higher transmission and higher mortality is favored, an effect that can be important in an emerging epidemic (Bolker et al., 2010;Shirreff et al., 2011;Berngruber et al., 2013). The second term describes the effect of biased mutation, proportional to incidence b X. We emphasize that knowledge of the molecular mechanism driving the decline in virulence is not needed to make evolutionary predictions. To derive further analytical insights, we assume that the number of susceptible individuals is approximately at its equilibrium value = b. We take advantage of the approximately normal distribution of SPVL in the population to derive an expression for the change in mean SPVL in prevalent cases over time, akin to Lande's classical quantitative genetic equation (Lande, 1976).
where V P is the variance in SPVL and h 2 is heritability of SPVL, the fraction of the variance explained by viral genetic factors. The mean SPVL in the population will evolve to the value maximizing mean fitness b= , which is 3.4 log 10 mL/copies (95% CI [2.6; 4.0], Figure 2a), at a pace proportional to heritability (which is assumed to be at equilibrium).
We parameterised the ODE model with our data, and solved it using the Euler method. Specifically, the initial SPVL in incident cases was 4.72 log 10 copies/mL. The transmission rate and mortality due to AIDS as a function of SPVL were the inferred functions (Figure 1). We tuned the baseline transmission rate and the birth rate to achieve the stable prevalence of 14% observed in the Rakai communities and a total population size of 20 millions adults. Declining prevalence would not change much the evolution of mean SPVL (Figure 2-figure supplement 5).
We assumed that the mutation kernel Q g ! g ð Þ was the density of a normal distribution with a non-zero mean a, and standard deviation s mut ¼ 0:15, evaluated at g À g. The density of environmental effects P e ð Þ was given by the density of a normal distribution with mean 0 and standard deviation 0.76. The variance parameters were chosen to achieve an approximately stable phenotypic variance of SPVL V P ¼ 0:91 and heritability at 36% as inferred in this cohort (Hollingsworth et al., 2010), and similar to the value of 30 to 40% established in a number of studies (Fraser et al., 2014;Mitov and Stadler, 2016).
Because only a small number of studies have linked within-host evolution to SPVL evolution, we explored three scenarios spanning a range of possibilities to parameterise a. (i) The dominant process is the increase in the frequency of CTL escape mutations, or other host-specific beneficial mutations imposing a RC cost, resulting in a reduced viral fitness and SPVL in the next typical infected person. We first parameterize a in this scenario using data on the inferred decline in mean SPVL in Botswana (Payne et al., 2014). The mean SPVL in a cohort in South Africa was 4.47, compared to 4.19 log 10 copies/mL in a cohort in Botswana where the epidemic started about 6 years earlier, giving an inferred decline of (4.19 -4.47) / 6 = À0.047 log 10 copies/mL/year, hypothesized to result from the rise of CTL escape mutations in the viral population. From the Price equation, the decline in mean SPVL is given by a , assuming constant prevalence and neglecting the virulence-transmission trade-off. Solving for a in a ¼ À0:047, with a mean death rate of = 0.1 per year as in the present cohort, gives a rough estimate of a ¼ À0:47 log 10 copies/mL under this scenario. (ii) Second, under a similar assumption that the dominant process is the increase in host-specific beneficial mutations imposing a RC cost, we now parameterize a assuming that these mutations impose a RC cost similar to that of random mutations. Indeed some immune escape mutations, for example CTL escape mutations arising in the pol, env or nef gene, appear neutral Troyer et al., 2009). In this scenario, the coefficient of variation of the distribution of SPVL effects within the host would be the same as that of the distribution of fitness effects of random mutations. This coefficient of variation was estimated at À1.609 in a previous study (Bonhoeffer et al., 2004), giving a ¼ Às mut =1:609 ¼ À0:093 log 10 copies/mL. (iii) The dominant process is the increase in frequency of mutations causing a within-host increase in RC, resulting in higher viral fitness in the next host. To our knowledge increase in RC over the course of infection has been evidenced only in one study (Kouyos et al., 2011). This study predicted an increase in RC over the course of infection of + 0.02 per year. The relationship between RC and SPVL inferred in that study (SPVL = 4.297 + 0.572 * RC, Figure 1A in [Kouyos et al., 2011]), together with the fact that the mean time to transmission is 5 years (as inferred from simulation of our IBM), leads to a ¼ 0:02 Â 5 Â 0:572 ¼ þ 0:057 log 10 copies/mL in this scenario.
Predictions of the ODE model were robust to the addition of a number of more realistic features of the HIV epidemic, as shown by an individual-based stochastic model of HIV evolution (IBM) with a higher level of complexity, described in details previously (Herbeck et al., 2014;Herbeck et al., 2016). The IBM relaxed several assumptions of the ODE. In contrast to the ODE that described only the asymptomatic phase of infection characterized by a stable SPVL value, the IBM explicitly modelled the dynamics of viral load within individuals. This included the acute phase of infection and the AIDS phase, which are both characterized by a higher viral load. The viral load in the acute and AIDS phases, and the duration of acute phase did not vary across individuals. In the ODE, transmission was modelled using the law of mass action; in the IBM a changing network of sexual contacts was modelled (although sexes were not explicitly modelled). The number of partnerships in which each individual was engaged was variable, and there was heterogeneity in partnership duration (between 3 and 60 months). Furthermore, the behavioural dynamics were designed to reflect a core group of transmitters; individuals in the core group (10% of the overall population) had shorter partnership durations and increased coital frequency. The rate of overall partnership formation and the distribution of coital frequencies were both calibrated to result in an equilibrium prevalence of 14%, corresponding to the average prevalence in the 1995-2015 period, as for the main model.

Temporal trends in SPVL
We inferred temporal trends in SPVL in incident cases using a multivariate linear model where we explained variation in SPVL as a function of the laboratory in which SPVL was measured, the assay used, whether VL was measured at a RCCS visit (individuals with unclear infection status), gender, circumcision status, age, date at seroconversion, and subtype ( Figure 2). Significance was assessed using type II analysis of variance, and confidence intervals were computed assuming asymptotic normality of the coefficients. Viral loads were measured in three different laboratories and using two types of PCR assays. This heterogeneity of laboratory approaches could potentially confound other trends; however our multivariate regression controlled for these effects, and revealed that they had small and non-significant effect sizes (Figure 2-source data 1), such that they did not generate any systematic variability in SPVL. SPVL decreased at a pace of À0.033 log 10 unit per year (CI [À0.057; À0.009], p=0.007, n = 603), resulting in a 0.66 log 10 unit change over the 1995-2015 period. The estimated rate was À0.022 (CI [À0.041; À0.002], p=0.027, n = 603) after non-significant predictors were removed. The linear temporal trend in mean SPVL was more supported than a model where time was fitted as five discrete categories (DAIC = 7.2). An important potential confounder of the reported trends in SPVL would have been the use of unreported antiretroviral therapy (ART) becoming more frequent at later time points. To exclude this possibility, we focused on the subset of SPVL values with viral loads measured before 2004, prior to ART availability in the region. Consistent with previous studies (Farzadegan et al., 1998;Gandhi et al., 2002), males had a higher SPVL than females (+0.259 log 10 viral copies/mL, CI [0.14; 0.38], p=4.2 10 -5 , n = 603) subtype D conferred higher SPVL than other subtypes (+0.211 relative to subtype A, CI [0.038; 0.38], p=0.017, n = 603), and older age conferred slightly higher SPVL (+ 0.009 per year, CI [0.0008; 0.016], p=0.030, n = 603). The decreasing trend in SPVL as well as the effects of gender, and subtype D, were all robust, as they had similar magnitude in several subsets of data (Figure 2-figure supplement 2).
We also inferred temporal trends in mean SPVL in prevalent cases by calculating each year the mean SPVL for cases who are infected, alive, and not lost to follow-up. In this analysis we found a decline in mean SPVL at a rate of À0.020 log 10 copies/mL/year (Figure 2-figure supplement 6). This decline was highly significant (p=5.06eÀ08, N = 18) but the p-value calculation did not account for non-independence across years (the same prevalent cases may be included in multiple years).

Review of coinfections as potential confounders of the SPVL trend
Coinfections such as tuberculosis, malaria, the herpes simplex virus 2, gonorrhea, or syphilis, might increase viral load in HIV infected individuals (Modjarrad and Vermund, 2010). A reduction in prevalence dp of a disease with an effect dv on SPVL would cause a dp dv decrease in mean SPVL in the population. We systematically reviewed these diseases and show that potential reduction in prevalence of these diseases is unlikely to cause the observed 0.4 log 10 copies/mL decline in mean SPVL.

Subtype-specific predictions
We extended the ODE model to account for subtype-specific dynamics, in particular the dynamics of subtype A, subtype D, and AD recombinants (called 'R'). The functions describing transmission as a function of SPVL were the inferred subtype-specific step functions (Figure 3a). The function describing time to AIDS as a function of SPVL was the step function inferred on the whole cohort, as there was little difference between subtypes (Figure 1c). Starting conditions were parameterised based on the data, as follows. Mean SPVL in incident cases in 1995 were v A;0 ¼ 4:58, v D;0 ¼ 4:79, v R;0 ¼ 4:66 log 10 copies per mL of blood. The frequencies of the three types in 1995 were p A =0.17, p D =0.7, p R =0.13. The mutation kernel Q g ! g ð Þ was, for all three types, the density of a normal distribution with a non-zero mean a ¼ À0:093 (scenario 2), and standard deviation s mut ¼ 0:15, evaluated at g À g. The density of environmental effects P e ð Þ was the density of a normal distribution with mean 0 and standard deviation 0.67. These parameters were chosen to achieve an approximately stable phenotypic variance of SPVL V P ¼ 0:7 (the phenotypic variance in SPVL within subtype) and heritability at 36%.
We assumed super-infection occurred on a fast timescale and immediately resulted in one strain replacing the other. Super-infection with A and D, A and R, or D and R strains resulted in a recombinant subtype ('R') with probability r. We chose r=1 as it best reproduced the rise in frequency of recombinants (Figure 3d).

Contributions of within-subtype and between-subtype evolution to SPVL trends
We decomposed the trend in mean SPVL into the sum of two components, one due to changes in subtype frequency, one due to within-subtype changes in SPVL. The change in mean SPVL between time 0 and t reads: With linear trends in subtype frequencies, p i;t ¼ p i;0 þ dp i t, and in mean SPVL within subtypes, v i;t ¼ v i;0 þ d v i t. Replacing yields: Because the changes are slow (i.e. dp i and d v i are small), we can neglect the term in dp i d v i and approximate the change as: The first term reflects the changes in mean SPVL due to changes in mean SPVL within subtype. The second term reflects the changes in mean SPVL due to changing subtype frequencies. We have v A;0 ¼ 4:58, v D;0 ¼ 4:79, v R;0 ¼ 4:66 log 10 copies/mL, and dp A ¼ 0:009, dp D ¼ À0:016, dp R ¼ 0:007, inferred from a generalized linear model with multinomial response describing subtype frequency as a function of calendar time. Thus the change in mean SPVL due to the rise in subtype A and R is À0.003 log 10 copies/mL per year. Assuming the same rate of SPVL evolution in all subtypes, :022 log 10 copies/mL per year (a rate inferred from the linear model, adjusted for subtype and other covariates), the change in mean SPVL due to within-host evolution is also À0.022 log 10 copies/mL per year. Thus the total mean SPVL change is ¼ À0:025 log 10 copies/mL per year and most of this change is due to within-subtype evolution.
Child Health and Development, Johns Hopkins Population Center (grant 5P30 HD 06268); the Fogarty Foundation (grant 5D43TW00010); John Snow Inc, Pfizer Inc (grant 5024-30); the Rockefeller Foundation; the World Bank STI Project, Uganda; a cooperative agreement (W81XWH-07-2-0067) between the Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., and the U.S. Department of Defense (DOD). The views expressed in this article are those of the author and do not necessarily reflect the official policy or position of the Department of Defense, nor the US Government.

Incident cases
To investigate the time trends in SPVL and the relationship between SPVL and time to AIDS, we used the incident cases, defined as participants with a first positive result within two RCCS study visits of their last negative result. The median time elapsed between last negative and first positive result was 1.25 year (minimum 0.56, maximum 3.59 years).

SPVL calculation
The date of seroconversion for each participant was defined as the mid-point between the last HIV-negative date and the first HIV-positive date. The SPVL was defined as the mean log 10viral load at all visits occurring more than 6 months after seroconversion, and before the beginning of antiretroviral therapy (ART).
ART was prescribed in the Rakai district from 2004 onwards. We had clinical records on the date of ART start when it was prescribed in an RHSP clinic prior to 2013. However, participants may also have received ART from clinics outside of RHSP. Moreover, by 2013, ART distribution at the majority of RHSP clinics was transferred to the Ugandan Ministry of Health. To determine whether participants were prescribed ART outside of RHSP clinics, we relied on self-reported use of ART. For these cases, we defined the date of self-reported ART start as the mid-point between the last date no ART was reported and the first date where ART was reported. We further removed all viral load measures at a date later than this selfreported ART start date. 2135 viral load measures out of the 5180 were taken 6 months after the date of seroconversion and before ART.
A number of viral loads measures were below detection limit of the assay (<400 copies/mL), in which case the viral load is reported as 'undetectable' and we did not know its precise value. Such low viral loads could mean these participants are 'elite controllers'; but they could also be due to measurement error and/or degradation of the RNA sample, participants having been prescribed ART in a clinic other than the RHSP clinics without reporting it, or participants falsely believed to be infected. We assumed that any single viral load measure below 400 copies/mL in a participant with more than one measure over 400 copies/mL was due to either error or sample degradation, and we removed these measurements from our analysis (44 measures out of 2135). We verified that the HIV infection status of participants by reassessing all available serological HIV tests results, including rapid assays, ELISAs, and Western Blot assays. We removed from the analysis any participants whose infection status was unclear, i.e. those participants who only have 'undetectable' viral load and who either have at least one 'indeterminate' or 'negative' Western Blot and two or less ELISA tests, or who have no Western Blot and at least one negative ELISA test (7 measures out of 2135). All remaining 'undetectable' viral loads (n = 2071) were set to 200 copies/mL. We systematically verified the robustness of our analysis to the inclusion of the 'undetectable' viral loads.
Because of the uncertainty on the timing of infection, it is possible that a participant is still in acute infection more than 6 months after the presumed date of infection (the mid-point). To avoid this possibility, we also eliminated first measures where the viral load was ten times greater than all subsequent measures (13 measures out of 2135).
Following these steps of data cleaning, we obtained SPVL for 647 individuals, each SPVL measure representing 1 to 16 visits (median = 2) and a total of 2071 viral load measures. We also computed a SPVL value where all 'undetectable' viral loads are discarded, in which case we obtain SPVL for 603 individuals, each SPVL measure representing 1 to 16 visits (median = 4).

Subtyping
Subtype of each participant was determined using one or several of four different methods, based on (i) sequence fragments of gp41 and p24, (ii) Roche 454 sequencing of gp41 and p24, (iii) multi-region hybridization assays on gag, pol, vpu, env, and gp41 and (iv) full genome sequences (Kiwanuka et al., 2008;Conroy et al., 2010). All subtype information was compiled for each participant, by genomic region (gag, pol, vpu, and env). In 467 cases out of the 576 participants with any subtype information, all subtype information agreed on a single subtype, which was then assigned to the participant. If the data indicated infection with multiple HIV subtypes (i.e. one or more HIV subtypes detected in the same gene region), we assigned the subtype 'multiple' (M) to the individuals. Lastly, if HIV-1 subtype differed across but not within genes, a recombinant subtype was assigned to the participant. This algorithm resulted in 12 subtype categories, of which the most represented in the population were subtype D (n=342) subtype A (n=118), recombinant DA (n=41), recombinant AD (n=28), multiple infections M (n=17), subtype C (n=8) and various types of recombinants (n=22). A linear regression of SPVL against subtype revealed that a more parsimonious model with only five simplified subtype categories, A, C, D, M, R (all recombinants) was a better fit to the data than a model with all 12 subtype categories (4AIC = 5.5).

Time to AIDS
The time to AIDS was the time at which CD4 count is first below 200 cells per mm 3 , three symptoms of AIDS were first observed, or of AIDS death. The time to AIDS is a reasonable approximation of the time during which the virus will be transmitted. Transmission could be also interrupted because of host death not related to HIV, but AIDS was the most common cause of mortality in Uganda among 13 -44 years (Mulder et al., 1994;Sewankambo et al., 2000). Moreover, there is little opportunity for transmission after AIDS is declared, as host death occurs shortly after the onset of AIDS (median 1.46 years) and it has been estimated that very little transmission occurs in the 10 months prior to death (Hollingsworth et al., 2008).

Serodiscordant couples
To investigate the relationship between SPVL and transmission rate, we focused on retrospectively identified serodiscordant couples, a subset of data largely distinct from the incident cases. In these couples, one partner is positive, while the other is initially negative and may become infected over the course of follow-up. The median duration of couple follow-up was 2.6 years. As before, we set 'undetectable' viral loads to 200 copies/mL, and the SPVL was defined as the mean log 10 -viral load at all visits occurring more than 6 after seroconversion, and before the beginning of antiretroviral therapy (ART). We also calculated a SPVL with undetectable values removed to check that this small number of imprecise measures at low viral loads did not affect inference of transmission rates.
Subtypes were computed as before, except we simplified further the categories to keep only subtype A, subtype D, and 'Other/unknown' subtypes (including other subtypes, multiple infections, recombinants, and unknown).
Our analysis data set included 817 couples with SPVL values from the index HIV-infected partner. Each SPVL represented 1 to 15 measures (median = 2). The SPVL data from these couples are summarized in Table 2.

Inference of temporal trends in SPVL
The variability in the number of measures underlying SPVL poses several problems for the analysis of temporal trends in SPVL. There was no effect of the number of measures on SPVL (ANOVA with number of visits as factor, p = 0.61, n = 603), but the variance of SPVL across individuals was much higher among individuals measured once than across those measured multiple times. This is expected as multiple measures reduce the effects of intra-individual fluctuations in VL and measurement error on SPVL. This means the error around each SPVL value was not the same across individuals, the true SPVL being approached much more closely in individuals with many measures than in individuals with a single measure. When analysing the determinants of SPVL, this will cause heteroscedasticity, violating an assumption of linear models. To overcome this problem, it has been proposed to fit the viral load trajectories within individual using a fractional polynomial within a mixed model, then to regress several descriptors of this fractional polynomial over the predictors of interest (Pantazis et al., 2014). This method cannot easily include multiple predictors, and is clearly not applicable in our case where 242 out of 526 participants have one viral load measurement only. To verify that our results were not affected by this problem, we also conducted an analysis on the subset of SPVL values which include 2 measurements or more and where the standard deviation across these is less than 1 log-viral load unit across measures ('strict SPVL').

Linear model
We used a linear regression to explain the variation in SPVL across participants as a function of gender, age at seroconversion, subtype, and date of seroconversion, using the data on the incidence cases for whom the date of seroconversion was known precisely. We also tested the effect of being circumcised in males. We corrected for potential confounding due to the use of several viral load assays and measurements in several laboratories by including these factors in the regression.
We first analysed the determinants of SPVL being 'undetectable' (that is, those participants where all viral load measures are undetectable), using a logistic regression over epidemiological covariates. Next we analysed the determinants of SPVL on the subset of data that excludes undetectable SPVL, because these SPVL values caused the distribution of SPVL to be non-normal, violating an assumption of linear models (the 'undetectable' were set at log 10 (200)). To verify the robustness of our predictions, we ran the analysis on several subsets of data: (i) SPVL calculated from at least two viral load measures, with a standard deviation of less than 1 log-viral load unit across measures, and including undetectable viral loads ('strict SPVL') -a subset for which the heteroscedasticity problem will be reduced -, (ii) data partitioned in two subsets corresponding to the assays Abbott m2000 and Roche 1.5 (iii) data partitioned in three subsets corresponding to the laboratories (John Hopkins, RHSP, Walter Reed), (iv) data partitioned in male and female.
The distribution of SPVL was visually very close to a normal distribution. However the Shapiro -Wilk test of normality rejected the normal distribution (n = 603, p = 0.0018), in particular because of an excess of low SPVL.
The probability that the SPVL is 'undetectable' was mainly determined by the assay, Abbott m2000 giving an 'undetectable' SPVL with higher probability than Roche 1.5 (Appendix 1table 1, n = 647). There was no effect of subtype, except that individuals whose subtype is unknown also tended to have 'undetectable' SPVL with higher probability, because a low viral load made subtyping harder.
Appendix The laboratory where the viral load was assayed and the assay used had a small effect on SPVL. The inclusion of VL values measured at a RCCS visit ('RCCS visit' in Figure 2-source data 1) led to significantly lower SPVL (-0.3 log 10 copies/mL, CI [-0.51; -0.09], p = 0.006, n = 603), which was expected as viral loads were measured at a RCCS visit only when the infection status of the participants was unclear from the serology results. Variation in SPVL was mainly predicted by gender and date of seroconversion ( Figure 2-figure supplement  2).

Fitting the transmission rate as a function of SPVL
Using data on transmission and SPVL in 817 serodiscordant couples, we estimated the transmission hazard as a function of SPVL (Methods). The transmission hazard is the expected number of transmission events per unit time in a serodiscordant couple. This revealed that transmission increased significantly with SPVL, and the best functional form to describe this relationship was a step function with three plateaus. The best relationship was robust to the presence or absence of 'undetectable' SPVL in the dataset (Figure 1-figure  supplement 2). We also tested whether transmission varied by subtype, gender, and whether the male was circumcised or not.

Fitting the time to AIDS as a function of SPVL
The time to AIDS was assumed to follow a gamma distribution whose expected value was a decreasing function of the viral load (Materials and methods). We derived the likelihood function, found maximum likelihood parameters using the Nelder-Mead method, and compared the different models based on AIC. The step function was the one that described best the relationship between time to AIDS and SPVL (Figure 1). We tested how several covariates affected time to AIDS including subtype, and gender, but none of them significantly improved the fit. The best relationship was robust to the presence or absence of 'undetectable' SPVL in the dataset (Figure 1-figure supplement 2).
Þ f g; e ð Þdg de where f g; e ð Þ ¼ Y g; e ð Þ= Y tot is the frequency of each infected type in the population. The number of susceptible evolves as: The evolutionary dynamics, the dynamics of the frequency distribution f g; e ð Þ, is given by: Transmission acts as a force that brings the environmental effect to zero, as the mean of the environmental effect in new recipients is 0. However the positive correlation between the death rate and the environmental effect tend to decrease the environmental effect, as those individuals with higher environmental effect will die faster.
The mean genetic effect in the population evolves as: