Link between the numbers of particles and variants founding new HIV-1 infections depends on the timing of transmission

Abstract
Understanding which HIV-1 variants are most likely to be transmitted is important for vaccine design and predicting virus evolution. Since most infections are founded by single variants, it has been suggested that selection at transmission has a key role in governing which variants are transmitted. We show that the composition of the viral population within the donor at the time of transmission is also important. To support this argument, we developed a probabilistic model describing HIV-1 transmission in an untreated population, and parameterised the model using both within-host next generation sequencing data and population-level epidemiological data on heterosexual transmission. The most basic HIV-1 transmission models cannot explain simultaneously the low probability of transmission and the non-negligible proportion of infections founded by multiple variants. In our model, transmission can only occur when environmental conditions are appropriate (e.g. abrasions are present in the genital tract of the potential recipient), allowing these observations to be reconciled. As well as reproducing features of transmission in real populations, our model demonstrates that, contrary to expectation, there is not a simple link between the number of viral variants and the number of viral particles founding each new infection. These quantities depend on the timing of transmission, and infections can be founded with small numbers of variants yet large numbers of particles. Including selection, or a bias towards early transmission (e.g. due to treatment), acts to enhance this conclusion. In addition, we find that infections initiated by multiple variants are most likely to have derived from donors with intermediate set-point viral loads, and not from individuals with high set-point viral loads as might be expected. We therefore emphasise the importance of considering viral diversity in donors, and the timings of transmissions, when trying to discern the complex factors governing single or multiple variant transmission.


Introduction
Characterising the strong bottleneck that occurs during HIV-1 transmission, and understanding the role of selection in determining which viral variants are transmitted, are important for HIV-1 prevention strategies (Joseph et al. 2015). It is now well established that most infections are founded by one or few distinct viral variants (Gottlieb et al. 2008;Keele et al. 2008;Abrahams et al. 2009; Bar et al. 2010;Herbeck et al. 2011;Rolland et al. 2011;Tully et al. 2016), with each of these variants referred to as a transmitted/founder (T/F) virus. One T/F virus might naïvely be assumed to mean one T/F viral particle. However, it is currently unknown whether each T/F virus results from the successful transmission of a single viral particle, or multiple viral particles of the same variant, and as a corollary, how the number of viral particles founding an infection relates to the number of T/F variants. To avoid potential confusion, throughout we avoid using the term 'virus', and instead refer to viral particles or viral variants, as appropriate (see Glossary).
The observation that most HIV-1 infections are founded by only one or a few variants has been used as evidence for a strong selective bottleneck at the point of transmission, giving hope that signatures of transmission can be found and exploited when designing vaccines (Boutwell et al. 2010;Joseph et al. 2015;Mundia Kariuki et al. 2017). However, the extent to which selection influences which viral variants are present at the start of an infection is a source of current debate (Shaw and Hunter 2012;Carlson et al. 2014;Oberle et al. 2016;Gonzalez, DeVico, and Spouge 2017;Oberle et al. 2017). It has also been observed that infections founded by multiple variants tend to have higher set-point viral loads (SPVLs) than those founded by single variants, with the suggestion that multi-variant transmission might be a trait associated with recipient individuals (Janes et al. 2015).
However, the hypotheses that small numbers of T/F variants are indicative of selection, and that multi-variant transmission might be driven by recipient host factors, are missing explicit consideration of the complex interplay between viral load, viral diversity, and the timings of transmissions from infector individuals (donors) within a population. For a single donor at a fixed point during infection, the number of variants transmitted to a recipient is expected to be higher if a larger number of viral particles are transmitted. However, once the possibility of transmission occurring at any point during a donor's course of infection is taken into account, there is not necessarily a simple link between these two quantities when multiple transmissions are considered. This is because the viral load typically varies by orders of magnitude during the course of an untreated infection, and viral diversity tends to increase as an infection progresses (Shankarappa et al. 1999; Zanini et al. 2015; Puller, Neher, and Albert 2017). For example, early in an HIV-1 infection, the viral load is typically high but viral diversity is usually low (Delwart et al. 2002), whereas during chronic infection the viral load is lower but diversity is typically higher. As a consequence, the relationship between the numbers of T/F variants and the numbers of T/F particles in a recipient population is likely to depend not only on selection and recipient host factors, but also on the compositions of variants in donors and the timings of transmissions.
Here we present a probabilistic model, informed by within-host deep-sequencing (Zanini et al. 2015) and population-level (Fraser et al. 2007) data, to investigate the likely relationship between the numbers of variants and the numbers of particles founding new sexually transmitted infections in untreated populations, as well as the link between donor SPVLs and the numbers of T/F variants among recipients. We also consider the impact that selection, and a bias towards early transmission (due to treatment and/or other behavioural factors), might have on the compositions of new infections.
Considering the timings of transmissions explicitly will make it easier to deduce the relative importance of selective and non-selective bottlenecks during transmission within different risk groups. As we will discuss, the timings of transmissions might also provide an explanation for some perplexing results, such as the proportion of multi-variant transmissions in some studies of populations of men who have sex with men (MSM) being comparable to standard estimates in heterosexual populations (Gottlieb et al. 2008;Herbeck et al. 2011;Rolland et al. 2011;Tully et al. 2016) despite evidence for weaker selection during MSM transmission (Tully et al. 2016).

A probabilistic model of transmission
To characterise the relationship between the numbers of viral particles and viral variants that found infections in a population, we first developed a probabilistic model describing heterosexual transmission from a single untreated donor at a fixed time during infection, which we then scaled up to a population of untreated donors. Computing code for running our model can be accessed at https://github.com/robin-thompson/MultiplicityOfInfection.
A schematic of the probabilistic model for a single transmission event is shown in Fig. 1. In brief, the expected numbers of particles and variants that are successfully transmitted depend on the viral load and the distribution of variants within the donor at the time of transmission. We account for the observations that HIV-1 is only transmitted rarely (Boily et al. 2009), but when transmission does occur, multiple viral variants found the new infection reasonably often (Keele et al. 2008; Abrahams et al. 2009; Bar et al. 2010). These observations cannot be captured simultaneously by simple models, such as binomial transmission models, since in these models a low probability of transmission predicts that, when transmissions occur, they will only be with single particles and therefore single variants (see Abrahams et al. 2009, and also Supplementary Text S1: Binomial Models of Transmission). To reconcile these two observations, we assume that transmission can only occur in a small fraction, f, of potential transmission acts, when environmental conditions are appropriate. This is supported by observations that HIV-1 is most likely to be transmitted when a potential recipient is experiencing abrasions in the genital tract, genital inflammation, or coinfection with another pathogen (Benki, McClelland, and Overbaugh 2005; Haaland et al. 2009; Fox and Fidler 2010; Carlson et al. 2014; Neidleman et al. 2017; Selhorst et al. 2017).

Glossary
Set-point viral load (SPVL): The approximately stable viral load observed during chronic HIV-1 infection (see top left subpanel of Fig. 1).
Viral load (HIV-1): The concentration of HIV-1 RNA in plasma, measured in copies per millilitre. We assume this is proportional to the concentration of viral particles in the blood.
Viral particle: An infectious unit, potentially consisting of either an individual virion or an infected cell.
Viral variant: A specified viral genotype (or group of genotypes). In this analysis, a viral variant refers to a unique genotype in the specific genomic region analysed.
Virus: A single viral particle, or group of viral particles, all of the same variant. A new HIV-1 infection can be established by single or multiple viruses.
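The tension between rare transmission and frequent multi-particle transmission can be illustrated with a minimal sketch. It works with particles only (not variants), and the number of particles available per act and the gated per-particle probability are hypothetical values chosen for illustration, not the fitted values from Supplementary Text S1.

```python
def per_act_outcomes(p, n_particles, f=1.0):
    """Return (P(transmission per act), P(>=2 particles | transmission)).

    Each of n_particles is transmitted independently with probability p,
    but only in the fraction f of acts in which the environment is permissive.
    Setting f = 1 recovers the simple binomial model.
    """
    p_none = (1.0 - p) ** n_particles
    p_one = n_particles * p * (1.0 - p) ** (n_particles - 1)
    p_transmission = f * (1.0 - p_none)
    p_multi_given_transmission = (1.0 - p_none - p_one) / (1.0 - p_none)
    return p_transmission, p_multi_given_transmission


N_PARTICLES = 1000   # hypothetical number of particles available per act
TARGET = 0.003       # per-act transmission probability (Boily et al. 2009)

# Simple binomial model: p is forced to be tiny to match the per-act probability.
p_binomial = 1.0 - (1.0 - TARGET) ** (1.0 / N_PARTICLES)
binomial = per_act_outcomes(p_binomial, N_PARTICLES)

# Gated model: a hypothetical, larger p, with f chosen to keep the per-act
# probability at the same target.
p_gated = 0.001
f_gated = TARGET / (1.0 - (1.0 - p_gated) ** N_PARTICLES)
gated = per_act_outcomes(p_gated, N_PARTICLES, f=f_gated)

print("binomial:", binomial)
print("gated:   ", gated)
```

Under these assumed values both models give the same per-act transmission probability, but only the gated model allows a substantial share of transmissions to involve more than one particle.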
To move from the scale of a single transmission event to the population scale, we then considered a population of donors with different SPVLs (Fig. 2A) and at different stages of infection. The composition of SPVLs in the donor population was determined using data on the proportions of infected individuals with different SPVLs within a population and a characterisation of their expected profiles of infection (Fraser et al. 2007), both shown in Fig. 2. The proportion of individuals with each SPVL is slightly different from the distribution described by Fraser et al. (2007): ours has been adjusted from data restricted to seroconverters to represent a full population of donors at different stages of infection. This reflects the fact that seroconverters who go on to have high SPVLs will survive for shorter periods than those with low SPVLs. Finally, we used previously published longitudinal deep-sequencing data to parameterise a function describing the expected distribution of unique variants within an individual throughout infection, as described below.
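One plausible way to picture the adjustment from seroconverters to a full donor population is duration weighting: individuals who survive longer are over-represented among donors at any given time. The sketch below assumes this simple weighting and uses hypothetical proportions and durations for three SPVL classes; it is not the procedure or the data of Fraser et al. (2007).

```python
def prevalent_distribution(seroconverter_props, mean_durations_years):
    """Weight each SPVL class by its mean duration of infection and renormalise."""
    weights = [q * d for q, d in zip(seroconverter_props, mean_durations_years)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical SPVL classes: low, intermediate, high.
seroconverter_props = [0.2, 0.6, 0.2]
mean_durations_years = [15.0, 9.0, 4.0]   # higher SPVL, shorter survival

prevalent = prevalent_distribution(seroconverter_props, mean_durations_years)
print(prevalent)
```

With these assumed inputs, the high-SPVL class makes up a smaller share of current donors than of seroconverters, which is the direction of the adjustment described above.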

The distribution of viral variants as infection progresses
We fitted a nonlinear mixed-effects model to previously published whole-genome deep-sequencing data from longitudinally sampled infected hosts (Zanini et al. 2015) to characterise the distribution of variants in an untreated individual as an HIV-1 infection progresses. In the absence of selection at the point of transmission, we assume this reflects the distribution of variants available for transmission in each individual in the population. For all three regions of the genome analysed (integrase, p24 and nef), a discretised gamma distribution provided the best fit to the data, characterising h(x,s), the proportion of the xth most common viral variant in the within-donor pathogen population at time s years since the individual became infected (see Section 4). Since the highest viral diversity was observed for integrase, we used the parameterisation for this region in our main analysis (see Table 1 of Supplementary Text S1). From the data, and our model fit, it can be seen that in the early years of an infection a small number of variants dominate, but as an infection progresses a higher diversity of variants (i.e. a more uniform distribution of variants) is seen (left column of Fig. 3). Throughout our manuscript, by high diversity of variants we mean an approximately uniform distribution of variants as opposed to a distribution skewed so that there are high proportions of some variants and low proportions of others. The corresponding distributions for p24 and nef are shown in Supplementary Fig. S1.
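As a rough illustration of how a discretised gamma distribution can describe variant proportions that flatten over time, the sketch below evaluates a gamma density at ranks x = 1, 2, ... and normalises. The discretisation scheme, the shape value, and the linear growth of the scale with time since infection are all assumptions made for illustration; the fitted form and parameters are those given in Section 4 and Table 1 of Supplementary Text S1.

```python
import math


def variant_proportions(shape, scale, n_variants=50):
    """Proportions of the 1st..n-th most common variants from a gamma density
    evaluated at integer ranks and normalised (an assumed discretisation)."""
    density = [x ** (shape - 1.0) * math.exp(-x / scale)
               for x in range(1, n_variants + 1)]
    total = sum(density)
    # Sort so that index 0 is always the most common variant.
    return sorted((d / total for d in density), reverse=True)


def scale_at_time(s_years, scale0=1.0, growth=2.0):
    """Hypothetical time dependence: the scale grows linearly with time s."""
    return scale0 + growth * s_years


early = variant_proportions(shape=0.5, scale=scale_at_time(0.5))
late = variant_proportions(shape=0.5, scale=scale_at_time(8.0))

print("most common variant, early infection:", round(early[0], 3))
print("most common variant, late infection: ", round(late[0], 3))
```

A larger scale spreads the proportions more evenly across ranks, which corresponds to the higher diversity described for later infection.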
To incorporate selection at transmission into our analysis, we assumed that variants that are more similar to those that initiated the infection are more likely to be transmitted, since these represent variants that previously were successfully transmitted (Pybus and Rambaut 2009; Lythgoe and Fraser 2012; Lythgoe et al. 2017). This assumption is supported by the faster rates of evolution of HIV-1 within hosts compared to between hosts (Lemey, Rambaut, and Pybus 2006; Pybus and Rambaut 2009; Alizon and Fraser 2013), the transmission of slowly evolving within-host lineages in a large transmission chain (Vrancken et al. 2014), and evidence for the transmission of founder-like virus in transmission couples (Sagar et al. 2009; Redd et al. 2012). We weighted the relative proportions of each variant in the sequencing data based on how close they are to the consensus sequence at the first time point in that donor, and then refitted our model (see Section 4, and also Table 1 of Supplementary Text S1). When selection is included, the effective diversity of variants available for transmission is reduced (right column of Fig. 3). We show the very strong selection case here (a_s = 3), but results are also shown for strong and intermediate selection in Supplementary Fig. S2, where the parameter a_s is a measure of the strength of selection. The most common variant available for transmission in the presence of selection is not necessarily the most common variant in the absence of selection. Here we assumed that selection acts through the preferential transmission of founder-like variants, represented by a reduction in the diversity of variants available for transmission. However, any form of positive selection, in which some genotypes are favoured over others, could be implemented in our modelling framework.
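The effect of weighting variants by their closeness to the founder consensus can be sketched as follows. The exponential weighting in the number of mutations from the consensus is a hypothetical functional form chosen only for illustration (the form actually used is described in Section 4), and the proportions and distances are made-up values.

```python
import math


def apply_selection(proportions, distances, a_s):
    """Reweight variant proportions so founder-like variants are favoured.

    Assumed weighting: proportion * exp(-a_s * distance), renormalised.
    a_s = 0 leaves the proportions unchanged.
    """
    weights = [h * math.exp(-a_s * d) for h, d in zip(proportions, distances)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical within-donor variants and their distances (in mutations)
# from the consensus sequence at the first time point.
proportions = [0.5, 0.3, 0.2]
distances = [2, 0, 1]

no_selection = apply_selection(proportions, distances, a_s=0.0)
with_selection = apply_selection(proportions, distances, a_s=3.0)

print(no_selection)
print(with_selection)
```

In this toy example the most common variant without selection is the first one, whereas with selection the founder-identical second variant dominates, and the overall distribution becomes more skewed.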

Numbers of particles and viral variants that successfully found new infections
Using our transmission model, we characterised the relationship between the numbers of T/F particles and the numbers of T/F variants in newly infected individuals within a population. We set the proportion of the time that the environment within an uninfected individual is appropriate for transmission (f) and the per-particle transmission probability in each act when the environment is appropriate (p) so that transmission occurred in three out of every 1,000 transmission acts (Boily et al. 2009), and multiple variants founded 30% of new infections (Keele et al. 2008;Abrahams et al. 2009;Tully et al. 2016), although we also later show that our qualitative results are robust to reasonable variation around these values. In general, specifying the per-act transmission probability and the probability of multiple T/F variants uniquely determines f and p for a given distribution of variants within the donor population ( Supplementary Fig. S3). The values of f and p used for each of the cases we considered are given in Table 2 of Supplementary Text S1. The distributions of the numbers of T/F particles and variants in the recipient population were then derived analytically for three scenarios: no selection, selection at transmission, and transmission biased towards early infection but no selection.
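The way in which the two targets pin down the two parameters can be sketched for a single donor at a fixed time. The sketch assumes that, when the environment is permissive, the number of transmitted particles of each variant is Poisson distributed with mean lam * h[x], where lam (the expected number of transmitted particles, i.e. p multiplied by the number of particles available) and the variant proportions h are hypothetical stand-ins; the full model instead integrates over donors and times of transmission.

```python
import math


def prob_multiple_variants(lam, h):
    """P(>=2 distinct variants | >=1 particle transmitted), permissive act.

    Assumes independent Poisson(lam * h[x]) particle counts per variant,
    with the proportions h summing to one.
    """
    p_none = math.exp(-lam)
    p_exactly_one = p_none * sum(math.expm1(lam * hx) for hx in h)
    return 1.0 - p_exactly_one / (-math.expm1(-lam))


def calibrate(h, p_act=0.003, p_multi=0.3):
    """Find lam by bisection so that P(multiple variants) = p_multi,
    then choose f so that the per-act transmission probability is p_act."""
    lo, hi = 1e-9, 500.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if prob_multiple_variants(mid, h) < p_multi:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    f = p_act / (-math.expm1(-lam))
    return lam, f


h = [0.6, 0.2, 0.1, 0.05, 0.05]   # hypothetical variant proportions
lam, f = calibrate(h)
print("expected particles per permissive act:", lam)
print("fraction of permissive acts f:", f)
```

The multiple-variant target fixes lam (and hence p for a given number of available particles), after which the per-act target fixes f. A more skewed h requires a larger lam to reach the same multiple-variant probability.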

No selection (Case 1)
The probability that a new infection is founded by n particles decreases as n increases, with a chance of approximately 40% that a single particle is transmitted, and 25% that two particles are transmitted. Similarly, the probability that N variants are transmitted also declines as N increases.
By fitting a single distribution to the sequencing data from Zanini et al. (2015), we effectively made the simplifying assumption that every donor had an infection that was originally founded by the same number of variants. Most of the individuals in that cohort were probably infected by single variants (Puller, Neher, and Albert 2017), and the fitted distribution of variants reflected this. Since around 30% of donors in entire real populations would instead have been infected by multiple variants, we conducted a supplementary analysis in which we assumed that 30% of donors had infections founded by two distinct variants, and that the resulting lineages from each T/F variant evolved independently within each individual (Supplementary Text S1: Multiple variants founding infections in donors). The results were qualitatively similar: the link between the numbers of T/F particles and T/F variants depended on the timing of transmission. Since we fitted a single distribution to the sequencing data, we also assumed implicitly that the distribution of variants in a donor is independent of SPVL. This is supported by longitudinal sequencing data in which a link between SPVL and viral diversity is not apparent (Puller, Neher, and Albert 2017; Raghwani et al. 2018).

[Figure caption fragment] The right panel is omitted since the bias towards early infection is assumed to change the proportion of infections in early infection (before 2 years), but not the composition of transmissions occurring during early infection, and so the result is identical to the top right panel of (A). In the case with selection, due to the reduced diversity of variants available for transmission, some infections must be with large numbers of particles so that Prob(transmit multiple variants) = 0.3. Because of these large numbers of particles, the ranges on the x-axes in all panels of (B) and the y-axes of the middle panel of (B) are larger than in the equivalent subfigures in (A) and (C). For parameter values, see Tables 1 and 2 of Supplementary Text S1. In (C), the same parameter values as (A) were used but with infection w = 10 times more likely at times when donors have been infected for less than s_crit = 2 years.

Selection at transmission (Case 2)
We investigated how selection for particular variants would affect our results, and how sensitive this is to the strength of selection, a_s. We used the fitted distributions of variants available for transmission with selection, as described above (see right panel of Fig. 3 and Supplementary Fig. S2). Using these new distributions of variants, we reparameterised the values of the per-particle transmission probability (p) and the proportion of the time the environment is appropriate for transmission (f) so that the probability of transmission occurring per act remained at 0.003 and the probability of multiple variants founding each new infection was 0.3 (see Table 2 of Supplementary Text S1). We carried out this reparameterisation step because we sought to consider the numbers of transmitted particles and variants if selection is currently acting in heterosexual populations for which transmission occurs in three out of every 1,000 potential transmission acts and 30% of infections are founded by multiple variants. If selection had instead been imposed without reparameterising the model, then the numbers of transmitted variants would have been reduced compared to the case with no selection.
Even for very strong selection (a_s = 3), since we reparameterised f and p we found that the overall distribution of the numbers of variants founding new infections remained similar to the case in which there is no selection at transmission (grey bars in Fig. 4B; see Supplementary Fig. S5 for equivalent figures with different strengths of selection). However, because selection reduces the diversity of viral variants available for transmission (right column of Fig. 3), a higher per-particle probability of transmission per act (p) was required to achieve 30% of new infections being founded by multiple variants. As a consequence, it became more likely that many particles were transmitted compared to the case in which there is no selection, but still only one or a few variants (Fig. 4B). In other words, in terms of the numbers of transmitted variants, the reduced diversity of variants available for transmission was cancelled out by the larger numbers of particles likely to be transmitted (which permitted more variants to be transmitted). Since large numbers of particles yet few distinct variants could then be transmitted, including selection in the model enhanced the prediction that the numbers of particles and variants founding new infections are not closely linked quantities.

Bias towards early transmission (Case 3)
There are many reasons why there might be a bias towards early transmission. Interventions are likely to lead to a bias towards early transmission, because awareness of HIV-1 status can cause behaviour changes and treatment reduces infectiousness (Wawer et al. 2005;Hollingsworth, Anderson, and Fraser 2008;Cohen et al. 2011), but there is a delay between infection and diagnosis and a further delay before action is taken. Needle sharing when injecting drugs (Baggaley et al. 2006;Maljkovic Berry et al. 2007) or concurrent partnerships (Hollingsworth et al. 2015;Pines et al. 2016) might also increase the number of contacts an individual has, thereby increasing the chance of transmission particularly during the highly infectious primary stage of infection (Parrish et al. 2013).
We therefore considered the distributions of transmitted particles and variants when transmission is more likely in early infection than in later infection. We assumed that potential transmission acts occur in early infection (defined to be times since infection s < s_crit years) at a rate enhanced by a factor w > 1 compared to later time points (see Section 4). Since our model was parameterised using population-level data in which it is assumed that there is no bias towards early transmission, we did not change the values of f and p here from the no selection case considered above.
In Fig. 4C, we considered the case where w = 10 and s_crit = 2 years, representing for example a population in which test and treat interventions are very effective. A greater proportion of new infections were derived from donors in early infection than in the absence of bias towards early transmission, and so transmissions consisted of fewer distinct viral variants, but larger numbers of particles per successful transmission act. When a smaller value of the weighting parameter, w, was used, a similar but less extreme pattern was seen (Supplementary Fig. S6).
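The effect of the weighting on when potential transmission acts occur can be illustrated with a deliberately simplified calculation. It assumes a constant contact rate over a donor's infection and ignores changes in infectiousness through time (which the full model does account for); the 10-year duration of infection is a hypothetical value.

```python
def fraction_of_acts_in_early_infection(duration_years, s_crit=2.0, w=1.0):
    """Fraction of potential transmission acts occurring before s_crit years,
    when the rate of acts is multiplied by w during early infection."""
    early = w * min(s_crit, duration_years)
    late = max(duration_years - s_crit, 0.0)
    return early / (early + late)


duration = 10.0   # hypothetical duration of infection in years
unbiased = fraction_of_acts_in_early_infection(duration, w=1.0)
biased = fraction_of_acts_in_early_infection(duration, w=10.0)

print("no bias:", unbiased)
print("w = 10: ", biased)
```

With w = 10 the majority of acts fall within the first two years in this example, so most recipients are exposed to donor populations that are still dominated by a small number of variants.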

Link between donor SPVL and recipient number of founder variants
We also considered the characteristics of the donors in the population that were most likely to transmit multiple variants (Fig. 5).

No selection (Case 1)
We derived the joint distribution characterising the numbers of transmitted variants and the SPVLs of donors in the population (left panel of Fig. 5A). The most likely combination was a single variant infection arising from a donor with intermediate SPVL, reflecting the fact that most infections are with single variants (Fig. 4A), and most infected individuals have intermediate SPVLs (Fig. 2A).
The chance that a randomly chosen infection from a donor with each SPVL consisted of multiple variants is shown in the middle panel of Fig. 5A. It can be seen that, despite donors with high SPVLs being likely to transmit large numbers of particles, each infection is most likely to be with single variants. This is because donors with high SPVLs are likely to die more quickly than individuals with lower SPVLs, so they tend to transmit before the founder viruses have diversified.
We then focused solely on infections arising with multiple T/F variants. High-SPVL donors are not only uncommon, but also tend to transmit only one variant due to their short duration of infection, meaning only a small proportion of infections founded by multiple variants arose from donors with high SPVLs. Compared to donors with intermediate SPVLs, low-SPVL donors are also uncommon, and not very infectious, which together outweigh the fact that their long duration of infection provides more time for the virus to diversify. Consequently, most infections in the population that were founded by multiple variants arose from donors with intermediate SPVLs (right panel in Fig. 5A). Some of the intermediate quantities for calculating the results of Fig. 5A are shown in Supplementary Fig. S7.
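The argument in this paragraph is an application of Bayes' rule: the share of multi-variant infections attributable to each donor class is proportional to how common that class is, how often it transmits, and how likely each of its transmissions is to involve multiple variants. The numbers below are hypothetical and chosen only to show the mechanics; they are not values from the model or from Fig. 5.

```python
def share_of_multivariant_infections(prevalence, relative_rate, p_multi):
    """P(donor class | infection founded by multiple variants) for each class."""
    contributions = [a * b * c
                     for a, b, c in zip(prevalence, relative_rate, p_multi)]
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical donor classes: low, intermediate, high SPVL.
prevalence = [0.25, 0.65, 0.10]      # share of current donors
relative_rate = [0.3, 1.0, 2.0]      # relative transmission rate per donor
p_multi = [0.40, 0.32, 0.15]         # P(multiple variants | transmission)

shares = share_of_multivariant_infections(prevalence, relative_rate, p_multi)
print(shares)
```

In this toy example intermediate-SPVL donors account for most multi-variant infections even though they are not the class with the highest per-transmission probability of multiple variants.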

Selection at transmission (Case 2)
When selection is incorporated into the model, a donor with a low SPVL is now much more likely to generate a multi-variant infection than a donor with higher SPVL (centre panel in Fig. 5B). This is because including selection makes it more likely that transmission early in infection will result in new infections founded by only one variant. As a result, the donors who survive for long periods have the opportunity to transmit after viral diversity has increased, and these are the donors with low SPVLs. In the selection case, we therefore find that infections founded by multiple variants are likely to have arisen from donors with lower SPVLs than in the case with no selection (right panel in Fig. 5B).

Bias towards early transmission (Case 3)
When transmission is heavily weighted towards early infection, individuals with very high SPVLs become more likely to transmit multiple variants than individuals with low SPVLs (Fig. 5C, middle). When there is no bias towards early transmission, high-SPVL individuals have shorter durations of infection than other individuals, and so have less opportunity to transmit after viral diversity has accumulated. When there is a strong bias towards early infection, however, all individuals effectively have similar, shorter durations of infectiousness, and are all likely to transmit before the founder viruses have diversified significantly. In this case, the increased probability of transmitting multiple viral particles (and so multiple variants) at higher SPVLs becomes more important. However, even in this case, a randomly chosen multiple-variant infection is most likely to have originated from a donor with intermediate SPVL (Fig. 5C, right), since most donors have intermediate SPVLs.

Discussion
We developed a probabilistic model to characterise HIV-1 transmission in an untreated population, focusing on the relationship between the numbers of particles and the numbers of variants founding new infections. A key finding from the model is that transmissions with more T/F particles are not necessarily those with more T/F variants, with the timing of transmission during the donor's course of infection being of critical importance. This is especially noticeable when donors transmit during primary infection, since viral loads are high but viral diversity is low. The observation that most infections are initiated by one or a few variants has been used as evidence for the role of selection at the point of transmission and/or during establishment of infection in recipients (Boutwell et al. 2010; Joseph et al. 2015; Mundia Kariuki et al. 2017). In particular, if a new infection is founded by few variants, then it might be assumed that selective factors, such as physical barriers to transmission, are preventing other variants from being transmitted or successfully establishing the infection in the recipient. However, the role of selection, as opposed to transmission simply being a stochastic process, has been debated (Shaw and Hunter 2012; Carlson et al. 2014; Oberle et al. 2016; Gonzalez, DeVico, and Spouge 2017; Oberle et al. 2017).

[Figure caption fragment: Tables 1 and 2 of Supplementary Text S1. In (C), the same parameter values as (A) were used but with infection w = 10 times more likely at times when each donor has been infected for less than 2 years.]
Here, we have shown that by considering viral diversity within donors explicitly, imposing selection is not required to reconcile within-host and population-level data, or to explain the low numbers of T/F variants generally observed. We do not contend that selection is unimportant, since a number of phenotypic transmission factors have been identified (Parrish et al. 2013; Joseph et al. 2015; Foster et al. 2016; Iyer et al. 2017), but rather that the viral bottleneck at transmission is likely to be due to both selective and stochastic forces. By including selection in the model, we found an even weaker link between the numbers of T/F variants and T/F particles than we found in the absence of selection, with a higher proportion of infections being founded by large numbers of particles but few variants. Similarly, when a bias towards early transmission was included in the model, which could be due to treatment or behavioural factors, single-variant but multiple-particle transmission became more likely.
We have shown that the distribution of viral variants during the course of infection, the timings of transmissions, and the strength of selection are some of the many factors that are likely to influence the compositions of new HIV-1 infections. This complex interaction of different factors could help to explain some confusing observations. For example, Tully et al. (2016) found that the proportion of infections founded by single variants in an MSM population was similar to that typically observed in heterosexual populations, despite finding signatures of reduced selection compared to heterosexual transmission. This is puzzling because reduced selection is expected to result in more variants being transmitted, since more variants are likely to possess characteristics that permit successful transmission and establishment of a new infection. This can be explained if transmission tended to occur early in the MSM population considered by Tully et al. (2016), as has been suggested for MSM transmission more generally (Hollingsworth et al. 2015; Pines et al. 2016). This is because lower viral diversity within the pool of transmitting donors due to early transmission will tend to result in more new infections being founded by single variants. The increased diversity of infections due to reduced selection may therefore have been balanced by the reduced diversity of infections due to early transmission. Given these complex interactions, we urge caution in interpreting the proportion of infections founded by single variants as a universal statistic, even between populations that on the surface appear quite similar. Differences in the timing of transmission between different MSM populations might help to explain the higher proportion of infections founded by multiple variants observed in some studies (Keele et al. 2008; Li et al. 2010; Chaillon et al. 2016) compared to others (Gottlieb et al. 2008; Herbeck et al. 2011; Rolland et al. 2011; Tully et al. 2016), although differences in sequencing and methods of analysis might also have a role (Chaillon et al. 2016). All else being equal, we would predict fewer infections with multiple T/F variants in populations in which transmission is biased towards early infection.
A positive association between multiple variants founding an infection and a high SPVL of that infection has been observed (Janes et al. 2015). Since a higher SPVL is associated with faster progression to AIDS, understanding the factors leading to multi-variant transmission is important for inferring the mechanisms driving the severity of different HIV-1 infections. This will also inform the development of evolutionary epidemiological models. It has been suggested that recipient host factors might be important (Janes et al. 2015; Joseph et al. 2015). We hypothesised that the SPVL of the donor might be another key factor involved in multi-variant transmission. Contrary to what might be expected, we found that most infections founded by multiple variants do not arise from donors with high SPVLs, but from donors with intermediate SPVLs; individuals with high SPVLs tend to progress rapidly to AIDS, and therefore viral diversity has limited time to accumulate. It is not known why there is a positive association between multi-variant transmission and a higher SPVL, although it has been suggested that viral diversity per se has a role (Janes et al. 2015; Chaillon et al. 2016). Another possibility is that when more variants are transmitted, there is a higher chance that one of these variants possesses viral factors associated with high SPVLs within the recipient.
Our aim here has not been to develop a detailed model of HIV-1 transmission, but rather to present the simplest possible model that encapsulates important features of transmission within a population. We therefore made a number of simplifying assumptions, including assuming random contacts between donors and potential recipients, and ignoring host genetic factors that might affect viral diversity within donors.
Nonetheless, most other simple models cannot accommodate the infrequent transmission, yet reasonably high proportion of infections founded by multiple variants, observed in real populations (Abrahams et al. 2009). We captured this by assuming that transmission is only possible a small fraction of the time, when the environment is appropriate. In doing this, we assumed that when a potential transmission act occurs the environment is either entirely permissive (each available viral particle can be transmitted independently of the others with a constant probability) or entirely resistant to transmission. In reality, the permissiveness of the environment to transmission is likely to be a continuous quantity, rather than always entirely 'on' or 'off'. Abrasions in the genital tract, genital inflammation, or infection with other pathogens might increase the probability of transmission (Benki, McClelland, and Overbaugh 2005;Haaland et al. 2009;Fox and Fidler 2010;Carlson et al. 2014;Neidleman et al. 2017;Selhorst et al. 2017). Facilitation, whereby a virus being transmitted changes the environment in the recipient so that further transmission is more likely to immediately occur (Gross, Porco, and Grant 2004;Abrahams et al. 2009), might also be able to reconcile infrequent transmission with the reasonably high proportion of infections founded by multiple variants. In the facilitation scenario, the probability of transmission is assumed to be low, yet when a particle is transmitted the environment temporarily changes enabling further particles, and therefore potentially multiple variants, to be transmitted. A possible mechanism is via the production, by infected cells, of extracellular vesicles containing the viral protein gp120. The protein interacts with the surfaces of uninfected cells, activates the cells, and therefore makes them targets for infection (Arakelyan et al. 2017). 
Other mechanisms might also be able to reconcile the low transmission probability of HIV-1 with the significant proportion of new infections founded by multiple variants, and could provide an interesting avenue for further exploration using theoretical models.
The link between viral load and the transmission rate is also in need of further study (Blaser et al. 2014). Results from models based on binomial distributions for the numbers of transmitted particles, such as the model we have developed, display an approximately linear relationship between the viral load and the transmission rate. In contrast, empirical evidence suggests that the transmission rate increases linearly with the logarithm of the viral load (Gray et al. 2001;Fraser et al. 2007). This discrepancy might arise partly because the relationship between the SPVL and the transmission rate has been determined in monogamous heterosexual couples, making it difficult to detect multiple transmissions from the same donor and thus underestimating rates of transmission when viral loads are high. Assuming a nonlinear relationship between the viral load and the number of particles available for transmission in the donor's genital tract and/or a nonlinear relationship between viral load and viral fitness (Lythgoe et al. 2016) might also allow binomial models to reproduce observed data.
A key measure that we have approximated in our model is the distribution of viral variants within donors as infections progress. We used previously published short-read deep sequencing data from longitudinally sampled untreated individuals (Zanini et al. 2015) to estimate the distribution of distinct variants in a typical donor throughout infection. However, the diversity of variants estimated using a short segment of the genome will almost certainly underestimate the true diversity (Korber et al. 2000;Zagordi et al. 2012;Luk et al. 2015;Laskey et al. 2016), and conversely, sequencing errors are likely to lead to an overestimate of the number of rare variants (Zagordi et al. 2010). As a result, our fitted distributions characterising the diversity of variants in donors might not reflect the diversity present in a typical individual throughout infection. To investigate the importance of this uncertainty for our model predictions, we repeated our analyses assuming both lower and higher variant diversities within donors by varying the parameters of the gamma distribution characterising variant diversity (Supplementary Fig. S8), as well as different values of the per-particle transmission probability (Supplementary Fig. S9). By varying the per-particle transmission probability in Supplementary Fig. S9, we also implicitly tested the robustness of our results to the assumption that 30% of new infections are founded by multiple variants. Our key conclusion remained unchanged: the link between the numbers of T/F particles and variants depends on the timing of transmission.
Understanding the relative roles of selection and other factors in determining the strong bottleneck that occurs during HIV-1 transmission is relevant for vaccine design (Haynes et al. 2016) and determining the drivers of pathogenesis (Fraser et al. 2014), as well as in the development of epidemiological and phylodynamic models that can capture viral transmission in a realistic fashion. Here, we have highlighted the need to consider viral diversity in donors at the times of transmissions as an additional important, but hitherto underappreciated, factor.

Modelling temporal changes in donor viral load
Following a previously used modelling approach (Fraser et al. 2007), we divided the infectious period of an infected donor into three stages: primary, chronic, and pre-AIDS. The viral load of the donor depends on the stage of infection. For the full mathematical details of this approach and its parameterisation, see Supplementary Text S1-Viral load profiles; a summary is below.
In primary infection, which lasts $s_p = 0.24$ years, the viral load for all donors is $V_p = 8.7 \times 10^7$ viral particles per millilitre.
During the chronic stage of infection, the viral load $V_c$ is fixed at set point, which varies by several orders of magnitude between donors (Henrard et al. 1995;Bonhoeffer et al. 2003). Donors with higher SPVLs progress to AIDS more quickly than individuals with lower SPVLs (Fraser et al. 2007), so that the time spent in chronic infection $s_c(V_c)$ depends on the SPVL. The probability that a randomly chosen donor has SPVL equal to $V_c$, which we denote $g(V_c)$, is shown in Fig. 2A and given in detail in Supplementary Text S1-Viral load profiles. In the pre-AIDS stage of infection, which lasts $s_a = 0.75$ years, the viral load is $V_a = 2.4 \times 10^7$ viral particles per millilitre for all individuals.
Throughout infection, the number of particles available for transmission in each donor in the population is assumed to be proportional to the viral load. We denote the number of particles available for transmission in donors by $n_p$, $n_c$, and $n_a$ in primary, chronic and pre-AIDS infection, respectively. In the analyses in the main text, we have assumed that the constant of proportionality, $k$, is equal to one, so that for example $n_c = V_c$. We note that our results are very similar if different values of $k$ are used. This is because the results depend (approximately) only on the product of $k$ and the per-particle transmission probability, $p$, rather than the individual values of these parameters, and larger values of $k$ correspond to smaller values of $p$ when the model is refitted so that the per-act transmission probability is 0.003 (see Supplementary Text S1-Relationship between viral load and number of particles available for transmission).
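The three-stage viral load profile described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and because the form of the chronic-stage duration $s_c(V_c)$ is given only in Supplementary Text S1, it is passed in here as a plain argument.

```python
# Stage parameters quoted in the text
# (durations in years, viral loads in particles per millilitre).
S_P, V_P = 0.24, 8.7e7  # primary infection
S_A, V_A = 0.75, 2.4e7  # pre-AIDS


def viral_load(t, v_c, s_c):
    """Viral load t years after infection for a donor with set point v_c.

    s_c is the chronic-stage duration for this donor. The text makes it a
    function of v_c (Supplementary Text S1); here it is simply an input.
    Returns None once the infectious period is over.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if t < S_P:
        return V_P
    if t < S_P + s_c:
        return v_c
    if t < S_P + s_c + S_A:
        return V_A
    return None


def particles_available(t, v_c, s_c, k=1.0):
    """Particles available for transmission: k times the viral load."""
    load = viral_load(t, v_c, s_c)
    return None if load is None else k * load
```

With `k=1.0`, as assumed in the main text, `particles_available` returns the viral load itself in each stage.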

Modelling variant diversity in donors
We used publicly available whole-genome deep sequencing data from ten longitudinally sampled HIV-1 infected individuals (individuals 1-3 and 5-11 from Zanini et al. 2015) to obtain an approximation of the distribution of HIV-1 variants within a typical infected individual as infection progresses.
Specifically, we used information from three distinct regions of the viral genome, chosen for their wide coverage and because they come from three different functional categories: integrase (enzyme, HXB2 reference positions 4230-5096); p24 (structural, positions 1186-1878); and nef (accessory, positions 8797-9417). Each of the reads from these regions is around 300 base pairs long, and we conducted our analyses separately for each region. We only included samples that contained a large number (at least 1,000) of reads, so that a distribution of variants within each sample could be characterised.
We assumed that each distinct read corresponded to a different variant of the virus, and then found the proportion of each variant in each sample. Variants at proportions lower than 0.005 were removed, to protect against sequencing error. The resulting distribution of variants in one of the individuals (individual 3) throughout their course of infection, obtained using data from integrase, is shown by the red dots in Fig. 3. The distributions of variants obtained from the remaining individuals, and for the different regions of the viral genome, are shown in Supplementary Figs S10-S12.
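As a rough illustration of this step, the sketch below counts distinct reads in one sample, converts the counts to proportions, and drops variants below the 0.005 threshold. The function name and its defaults are hypothetical, and the retained proportions are deliberately not renormalised, since the text does not say whether that was done.

```python
from collections import Counter


def variant_proportions(reads, min_proportion=0.005, min_reads=1000):
    """Proportions of distinct reads in one sample, most common first.

    Each distinct read is treated as a separate variant. Samples with fewer
    than min_reads reads are skipped (None is returned), and variants below
    min_proportion are removed to guard against sequencing error.
    """
    total = len(reads)
    if total < min_reads:
        return None
    counts = Counter(reads)
    kept = [c / total for c in counts.values() if c / total >= min_proportion]
    return sorted(kept, reverse=True)
```

The position of a value in the returned list corresponds to the variant rank $x$ used in the fitting described next.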
To characterise $h(x,s)$, the proportion of the $x$th most common variant in each donor at time $s$ years after they became infected, we considered three candidate distributions: gamma, exponential, and Pareto. These distributions all provided a reasonable fit to the data (see Fig. 3; for $x = 1, 2, 3, \ldots, N_s$, where $N_s$ is the maximum number of distinct variants observed in any individual at any single time in the data. The values of $N_s$ for integrase, p24 and nef were 56, 45, and 51, respectively. The function $g_p(i, j, k)$ is the probability density of a gamma distributed random variable with shape parameter $j$ and scale parameter $k$ at the value $i$, that is, $g_p(i, j, k) = i^{j-1} e^{-i/k} / (\Gamma(j)\, k^j)$.
We fitted the parameters of each of the candidate functions $h(x,s)$ to data using a nonlinear mixed-effects modelling approach. This fitting was performed using the R software function nlme with fixed effects of variant $x$ and time since infection $s$, and a random effect of the individual that each read was sampled from. Including a random effect of the sampled individual amounts to a partial pooling of the data between individuals to improve our estimates of the parameters applicable to the broader population from which these individuals were drawn. By doing this, the differences between these individuals (which are not of direct interest, and are difficult to infer for individuals with little data) were not estimated, nor did we fully pool the data (which would bias estimation towards highly sampled individuals). The candidate models were compared using the Akaike information criterion scores associated with their model fits. The resulting parameter values for each of the three regions that we considered are shown in Table 1 of Supplementary Text S1.
While the gamma distribution that provided the best fit to the data has the property that there is effectively only a single variant available for transmission in the donor at small times since infection, we also considered cases in which there could be multiple variants at equal frequency early in a donor's course of infection (Supplementary Text S1-Multiple variants founding infections in donors).

Modelling selection
When modelling selection at transmission, we assumed that the variants most similar to those that donors were themselves infected with were more likely to be transmitted, since they are likely to retain characteristics that make them suited for onward transmission (Lythgoe et al. 2017). To investigate the effect of preferential transmission of these 'founder-like' variants on the distributions of the numbers of transmitted particles and variants according to our model, we manipulated the variant proportions in the sequencing data to obtain effective variant proportions, assuming a selection coefficient that decays exponentially with Hamming distance from the founding consensus sequence (Lythgoe and Fraser 2012).
First, the founder sequence was estimated for each donor by taking their earliest available sample, and finding the most common base at each position. Then, if the proportion of sequence $x$ in a donor in a reading taken at time $s$ years since infection is $h(x,s)$, where here we use $h(x,s)$ to refer to the proportion in the data rather than the fitted model, then the effective proportion in the donor was assumed to be
$$h_{a_s}(x, s) = \exp\left(-a_s\, d(x)\right) h(x, s),$$
where the parameter $a_s$ characterises the strength of preferential transmission of founder-like variants, and $d(x)$ is the number of base pairs at which the variant $x$ differs from the founder sequence for that donor. For example, for strength of selection $a_s = 1$, the effective proportion of a variant that differs from the founder at three base pairs is obtained by multiplying the original proportion by a factor of $\exp(-3)$. The resulting distribution of variants in a particular donor at each time $s$ years since infection was then normalised so that the effective proportions of variants made up a valid probability distribution (as an example, if all variants are equally different from the founder, the effective distribution is identical to the original distribution without selection). Since the most common variant in the absence of selection was not necessarily the most common variant once selection had been applied, the variants were renumbered so that the variant with the highest effective proportion was labelled variant 1, and so on. As in the case with no selection, the models were then fitted to the resulting distribution of the effective quantities of each variant. The best-fitting model and parameter values for the data from the integrase region are shown in Table 1 of Supplementary Text S1 for $a_s = 0, 1, 2, 3$. The value $a_s = 0$ corresponds to the case in which there is no selection.
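The reweighting, normalisation and renumbering steps can be sketched as follows. This is an illustrative sketch only: the function names and the dictionary input format are assumptions, and the founder sequence is taken as given rather than estimated from the earliest sample.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def effective_proportions(variants, founder, a_s):
    """Apply selection for founder-like variants to observed proportions.

    variants maps each sequence to its observed proportion h(x, s).
    Each proportion is multiplied by exp(-a_s * d(x)), the result is
    normalised to sum to one, and variants are returned ranked from the
    highest effective proportion (variant 1) downwards.
    """
    weighted = {
        seq: math.exp(-a_s * hamming(seq, founder)) * h
        for seq, h in variants.items()
    }
    total = sum(weighted.values())
    ranked = sorted(weighted.items(), key=lambda item: item[1], reverse=True)
    return [(seq, w / total) for seq, w in ranked]
```

For instance, with observed proportions `{"AAAA": 0.4, "AAAT": 0.6}`, founder `"AAAA"` and `a_s = 1`, the founder-identical variant overtakes the more common one and becomes variant 1, whereas `a_s = 0` leaves the original ranking unchanged.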

Modelling transmission
We assumed that environmental conditions are suitable for transmission in a fraction $f$ of transmission acts, and that when conditions are suitable each particle has a probability $p$ of being transmitted, independently of the other particles. The probability of $n$ particles being transmitted and going on to generate a new infection is therefore given by
$$\mathrm{Prob}(n \text{ particles transmitted in a single act}) = f \binom{v}{n} p^n (1 - p)^{v - n},$$
for $n = 1, 2, 3, \ldots$, where the parameter $v$ is the number of particles available for transmission in the genital tract of the donor at the time that the potential transmission act occurs. The probability of transmitting $N$ distinct variants in any single potential transmission act is given by
$$\mathrm{Prob}(N \text{ variants transmitted in a single act}) = \sum_{n \geq N} \mathrm{Prob}(N \text{ variants transmitted} \mid n \text{ particles transmitted}) \, \mathrm{Prob}(n \text{ particles transmitted}),$$
where the first factor in the sum depends on the effective distribution of variants available for transmission in the donor (accounting for changes in the effective diversity of variants likely to be transmitted given selection at transmission).
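The per-act probabilities above can be evaluated numerically as in the following sketch. The function names are hypothetical. The single-variant calculation additionally assumes that each transmitted particle is drawn independently according to the variant proportions in the donor, which is one simple way to obtain the conditional factor and is not necessarily the authors' exact construction; it loops over all $n$ up to $v$, so it is only practical for small $v$.

```python
import math


def prob_particles(n, v, p, f):
    """f * C(v, n) * p**n * (1 - p)**(v - n) for n = 1, ..., v.

    Evaluated in log space so that large v does not overflow. Returns 0.0
    for n outside the range 1..v covered by the formula.
    """
    if not 1 <= n <= v:
        return 0.0
    log_binom = math.lgamma(v + 1) - math.lgamma(n + 1) - math.lgamma(v - n + 1)
    return f * math.exp(log_binom + n * math.log(p) + (v - n) * math.log1p(-p))


def prob_transmission(v, p, f):
    """Per-act probability that at least one particle is transmitted."""
    return f * -math.expm1(v * math.log1p(-p))


def prob_single_variant(v, p, f, proportions):
    """Per-act probability that an infection is founded by one variant.

    Assumption: each transmitted particle is variant i with probability
    proportions[i], independently, so that
    Prob(1 variant | n particles) = sum_i proportions[i] ** n.
    """
    total = 0.0
    for n in range(1, v + 1):
        single_given_n = sum(h ** n for h in proportions)
        total += single_given_n * prob_particles(n, v, p, f)
    return total
```

When $pv$ is small, `prob_transmission(v, p, f)` is close to $f p v$, which is the approximately linear dependence on viral load noted for binomial models. Dividing `prob_single_variant` by `prob_transmission` gives, under the stated assumption, the proportion of new infections founded by a single variant for a donor with a given $v$ and variant distribution.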

Population-scale quantities
The following quantities were derived analytically by integrating over all infected potential donors in the population, and all times during their courses of infection. The variables $n_p$, $n_c$, and