Host behaviour driven by awareness of infection risk amplifies the chance of superspreading events

We demonstrate that heterogeneity in the perceived risks associated with infection within host populations amplifies the chances of superspreading during the crucial early stages of epidemics. Under this behavioural model, individuals less concerned about the dangers of infection are more likely both to be infected and to attend larger (riskier) events, where we assume event sizes remain unchanged. For directly transmitted diseases such as COVID-19, this leads to infections being introduced into the events most conducive to superspreading at rates above the population prevalence. We develop an interpretable computational framework for evaluating within-event risks and derive a small-scale reproduction number measuring how the infections generated at an event depend on transmission heterogeneities and numbers of introductions. This generalizes previous frameworks and quantifies how event-scale patterns relate to population-level characteristics. As event duration and size grow, our reproduction number converges to the basic reproduction number. We illustrate that even moderate levels of heterogeneity in the perceived risks of infection substantially increase the likelihood of disproportionately large clusters of infections occurring at larger events, despite fixed overall disease prevalence. We show why collecting data linking host behaviour and event attendance is essential for accurately assessing the risks posed by invading pathogens in the emerging stages of outbreaks.


Introduction
The prediction and prevention of superspreading events, which are characterized by the primary infected individuals generating disproportionately large numbers of secondary infections [1], is a central challenge in infectious disease epidemiology. For acute, directly communicable diseases such as COVID-19, SARS and Ebola virus disease, superspreading is a major driver of transmission that leads to less frequent but more explosive outbreaks than we might expect under more classical models, which neglect the substantial variability in the secondary infections generated by infected hosts [2]. During early or emergent stages of a potential epidemic, when immunity in the host population is limited and transmission dynamics are inherently stochastic, superspreading events have been found responsible both for spurring the initial growth and eventual persistence of epidemics and for limiting the effectiveness of non-pharmaceutical interventions [1,3-5].
Consequently, identifying the main factors that underlie the risk of superspreading is crucial for effective disease management [4]. Many of these factors are known, with heterogeneities in (i) host characteristics (e.g. susceptibility, infectiousness and contact patterns), (ii) pathogen biology (e.g. transmission routes and viral loads), (iii) environmental effects (e.g. ventilation and gathering size), and (iv) host behaviours (e.g. social customs and intervention adherence) all contributing to the risk of superspreading [3,4,6-8]. However, incorporating these factors in parsimonious modelling frameworks can be difficult because the mechanisms linking them to superspreading are still not fully understood. This is particularly the case for factors (iii) and (iv), with recurrent calls for more comprehensive data collection to help study the relationships among behavioural, environmental and epidemiological trends [9-11]. Here, we explore how a key feature of host behaviour can shape the likelihood of superspreading and provide a mathematical demonstration of the benefits of collecting and analysing more data to elucidate the links between human behaviour and infectious disease epidemics.
We consider how heterogeneity in perceptions of the risk associated with infection throughout a host population impacts heterogeneity in the transmission of new infections in the early stages of an epidemic. Risk awareness is a documented phenomenon in which individuals engage in self-protective behaviours in response to the perceived health, economic and other dangers of acquiring infection. The relationship between risk perception and self-protection spans a spectrum in a population, from highly risk-averse individuals to those with larger risk appetites (e.g. risk deniers) [12-14]. During an epidemic of a directly transmitted pathogen, risk-averse individuals may reduce their number of contacts by limiting their socializing and mobility, while those with larger risk appetites may increase their relative infection exposure (e.g. by hosting unregulated gatherings when infections are rising) [15]. Risk awareness can improve intervention efficacy (e.g. reducing mobility) or negate it (e.g. by deliberate non-compliance) and substantially change outbreak amplitudes and durations [11,16-19].
Despite its importance, the interplay between risk awareness and superspreading risk has not been studied in detail, with most research focusing on pathogen and host characteristics rather than on behavioural patterns. We study this interplay under a simple but plausible hypothesis: that more risk-averse hosts are more likely to avoid larger events, owing to the perception of heightened infection risk at those events [14,19]. Here, events are short-term gatherings (e.g. parties), so that only one generation of infection is possible. While data directly linking risk perception to event attendance are unavailable, it is known that individuals modify their behaviour in response to population-level prevalence and that the variability in individuals' level of acceptable risk, relative to this prevalence baseline, correlates well with the extent to which contacts are reduced during epidemics [12-14,20,21]. A likely pathway for reducing exposure to invading pathogens is by limiting attendance at riskier (voluntary) events. This logic underlies our hypothesis and subsequent analysis.
This awareness mechanism implies, for a fixed population prevalence, that larger events (e.g. concerts and sports matches) are more likely to be attended by individuals who are less risk averse. As there is limited infection-induced immunity in the host population during early epidemic stages, these individuals are also more likely to be infected. This stems from observations that those with larger numbers of contacts have elevated chances of acquiring infection, and these individuals are also likely to have larger risk appetites (more risk-averse individuals tend to reduce contacts) [22]. We posit that this coupling between behaviour and environment (i.e. modification of event attendance owing to risk perception and event size) may amplify the chances of superspreading occurring at larger events, which have the capacity to support excessive numbers of infections. To test this hypothesis, we develop a framework to model the number of infections $y$ generated at an event of size $n$, given that $x$ initially infected individuals attend that event. This yields a small-scale reproduction number that extends recent approaches [23-25] to understanding within-event transmission in three directions.
First, we explicitly model the transmission-reducing effects of finite numbers of susceptible individuals ($n - x$) and imported infections ($x$) at the event. As event size and duration grow, these finite-size effects become less important and our small-scale reproduction number converges to $R_0$, the popular basic reproduction number. Second, we embed heterogeneity in transmission at the event within our small-scale reproduction number by allowing variations in secondary infections that are controlled by a dispersion parameter, $k$. This is a within-event version of the seminal model of superspreading [1,5,26] and includes the broad influence of factors (i)-(ii) described earlier. Third, we account for how $x$ changes (stochastically) with $n$. This considers factors (iii)-(iv) and depends on the prevalence of infection in the population as well as on the size-biased importation rate of infections into the event, $\epsilon_n$, which is influenced by the spectrum of risk appetites about that prevalence.
The functional dependence of $\epsilon_n$ on $n$ serves as a parsimonious model of risk awareness and allows us to assess how host behaviour shapes the risk of superspreading (e.g. if $\epsilon_n$ is an increasing function of $n$, then infections are imported at a higher rate into larger events). We explore our central hypothesis by comparing the relative and combined impact of $\epsilon_n$ and $k$ on the tail probability of observing a disproportionately large number of secondary infections $y$ at an event. We demonstrate, for a fixed overall import rate (equalling the wider population prevalence), that risk awareness can substantially amplify the chances of superspreading at a large event, compared with the scenario in which all individuals attending the large event are assumed to have similar perceptions of infection risk. This pattern holds regardless of $k$ and, in some instances, we find that the increase in superspreading risk from risk-aware behaviour outweighs that from inherent transmission heterogeneity.

Event reproduction numbers including import risk and transmission heterogeneity
We develop a framework for quantifying the risk of acquiring infection at an event (e.g. a party, concert or sports match), based on a small-scale (within-event) reproduction number. We detail this below but also sketch the main steps of our methodology and list key notation in figure 1. An event is defined as a short-term grouping of $n$ people, and we allow $0 \le x \le n$ of the individuals attending the event to be infectious. Initially, there are $x$ introductions (i.e. imported infections) at this event and $n - x$ susceptible hosts. We assume no prior immunity in the population and let $P(y \mid n)$ be the probability of $0 \le y \le n - x$ new infections being generated at that event, as described in equation (2.1):

$$P(y \mid n) = \sum_{x=0}^{n} P(y \mid x, n)\, P(x \mid n). \tag{2.1}$$

This depends on $P(y \mid x, n)$, the probability of $y$ new infections occurring given $x$ infectious individuals initially (for events of size $n$), and on the prior probability of those $x$ imports, $P(x \mid n)$. We define the small-scale reproduction number for this event as $R_x \overset{\text{def}}{=} x^{-1} E(y \mid x, n)$, with the expected number of infections generated by $x$ imports denoted by $E(y \mid x, n)$. We expand this to obtain equation (2.2):

$$R_x = \frac{1}{x} \sum_{y=0}^{n-x} y\, P(y \mid x, n). \tag{2.2}$$

Here, $R_x$ measures the expected number of new infections generated by each import when there are $x$ imports in total. Although intuitive, this reproduction number formulation is novel.
A central idea of this study is the importance of $P(x \mid n)$ and its dependence on event size $n$. Earlier work assumed that $P(x \mid n)$ depends solely on the prevalence of the infection in the population [25], neglecting how heterogeneities in human behaviour may affect the number of imported cases at a given event of size $n$. To our knowledge, alternative models for $P(x \mid n)$ informed by human behaviour, and the influence of this behaviour on the number of infections generated at the event, have not been explored. The heterogeneity in host behaviour that we consider relates to the spectrum of risk appetites, i.e. the fact that different individuals perceive different infection risks associated with attending an event of size $n$. This spectrum alters the rate of importing infections into events, relative to the prevalence, and so modulates $P(x \mid n)$.
Our event or small-scale reproduction number also generalizes prior research by including the effects of finite $x$ and $n$. Since only one generation of infection can occur at an event, this finite initial condition can strongly shape clustering patterns, underscoring the value of modelling $P(x \mid n)$. The original event reproduction number [24] considers a single imported infection and relates to (but is not the same as) our $R_1$, which we later show is always an upper bound for $R_x$. By extending the event reproduction number definition, we model the influence of $P(x \mid n)$ on the risk of acquiring infection at any event directly. As we explain below, $R_x$ also embeds heterogeneity in transmission from both host characteristics and pathogen biology [1] and is explicitly related to the population-level basic reproduction number, $R_0$ [27].
To convert equations (2.1) and (2.2) into a computable form, we draw on characteristics of both the event and the disease. We denote the (frequency-dependent) transmission rate as $\beta$ and the expected duration of an individual infection as $d$, so that $R_0 = \beta d$. We then consider an event that lasts for time $\tau$, which is assumed to be substantially shorter than $d$, so that infectiousness outlasts the event and at most one generation of infection is possible at the event. We also assume that the event is closed, i.e. for any specific event, $n$ takes a constant value. The split of $n$ into $x$ and $n - x$ completely defines the epidemiologically important states for the event.
If there is only one infected individual at the start of the event, then the probability that any susceptible host gets infected is the secondary attack rate (SAR), $p = 1 - e^{-\beta\tau/n}$, making the standard assumption that the times to infection are exponentially distributed. When there are $s$ susceptible individuals, then $E(y \mid 1, n) = sp$. While this assumes that all the susceptible individuals are exposed to all infectious ones at the event, we can model more realistic contact networks as in [27] by modifying $s$ to be the subset of susceptible hosts likely to be exposed to each infection (this connects network and random mixing models).
We generalize this approach in three main directions. First, we model the effect of variability in the number of imported infections. If there are $x$ imports to the event, then the SAR becomes $p = 1 - e^{-\frac{\tau}{d}\frac{x}{n}R_0}$, with $\beta = R_0/d$. Since there are initially $s = n - x$ susceptible individuals, the expected number of infections generated at the event is $E(y \mid x, n) = (n - x)p$. This leads to the event reproduction number $R_x$ in equation (2.3):

$$R_x = \frac{n-x}{x}\left(1 - e^{-\frac{\tau}{d}\frac{x}{n}R_0}\right) = \frac{\tau}{d}R_0 + O\!\left(\frac{x}{n}\right). \tag{2.3}$$

This formulation has interesting limiting behaviour at various $x$. As the number of susceptibles grows in excess of imports, i.e. as $n/x$ increases, the $O(x/n)$ terms in the Taylor series approximation of $R_x$ in equation (2.3) become negligible. As $n$ becomes large, we find $R_x \to \frac{\tau}{d}R_0$. If the event lasts for the duration of infectiousness ($\tau = d$), then $R_x \to R_0$. This convergence makes sense since our formulation is equivalent to a finite or small-scale version of random mixing.
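As a minimal sketch of equation (2.3), assuming $\tau/d$ is supplied as a single ratio `tau_over_d` (the function name `event_R` and the parameter values are illustrative, not from the framework's code), the homogeneous $R_x$ can be evaluated directly to check both limiting behaviours: it falls as imports $x$ grow at fixed $n$, and it approaches $\frac{\tau}{d}R_0$ as $n$ grows with $x$ fixed.

```python
import math


def event_R(x: int, n: int, R0: float, tau_over_d: float) -> float:
    """Homogeneous small-scale reproduction number R_x of equation (2.3).

    R_x = (n - x)/x * (1 - exp(-(tau/d) * (x/n) * R0)), defined for 1 <= x <= n.
    """
    if not 1 <= x <= n:
        raise ValueError("R_x is only defined for 1 <= x <= n")
    sar = 1.0 - math.exp(-tau_over_d * (x / n) * R0)  # SAR p with x imports
    return (n - x) / x * sar


R0, tau_over_d = 3.0, 0.1          # illustrative values: (tau/d) * R0 = 0.3
limit = tau_over_d * R0

# Finite-size effect: R_x falls as imports x grow at a fixed event size n = 30.
print([round(event_R(x, 30, R0, tau_over_d), 4) for x in (1, 5, 10, 20)])

# Large-event limit: with x fixed, R_x approaches (tau/d) * R0 as n grows.
for n in (30, 300, 3000, 30000):
    print(n, event_R(1, n, R0, tau_over_d), limit)
```

Setting `tau_over_d = 1.0` (an event lasting the whole infectious period) makes the same loop approach $R_0$ itself.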
Second, we expand this model to include realistic heterogeneity from host characteristics and pathogen biology. It is unlikely that every infectious individual has the same transmissibility, and we expect substantial variations in the numbers of infections generated by each infected individual [1,28]. We, therefore, allow $R_0$ to have some distribution from which every import is randomly sampled and let $R_0^{(j)}$ indicate the sample for the $j$th of the $x$ imports at the event. This heterogeneous version of $R_x$ is given in equation (2.4), with expected number of infections $E(y \mid x, n) = xR_x$:

$$R_x = \frac{n-x}{x}\left(1 - e^{-\frac{\tau}{dn}\sum_{j=1}^{x} R_0^{(j)}}\right). \tag{2.4}$$

Note that equation (2.4) is of the form $\frac{n-x}{x}\,p_{\text{het}}$, with $p_{\text{het}}$ as a heterogeneous SAR.
We compute the mean of $R_x$ across the transmission heterogeneity for $x$ infectious imports in equation (2.5), with $E_{\text{het}}$ indicating expectation over the distributions of the $R_0^{(j)}$ and $M_b(a)$ as the moment-generating function of $b$ evaluated at $a$. As the transmissibilities of the $x$ imported infections are independently sampled, $M_{\sum_{j} R_0^{(j)}}(a) = \prod_{j=1}^{x} M_{R_0^{(j)}}(a)$. This reduces to $M_{R_0^{(j)}}(a)^x$ if samples are identically distributed. The expected number of infections under this model as a function of $x$ follows by multiplying by $x$:

$$E_{\text{het}}(R_x) = \frac{n-x}{x}\left(1 - M_{\sum_{j} R_0^{(j)}}\!\left(-\frac{\tau}{dn}\right)\right), \qquad E_{\text{het}}\big(E(y \mid x, n)\big) = x\,E_{\text{het}}(R_x). \tag{2.5}$$

Following [28], we evaluate the variance around $R_x$ as $V_{\text{het}}(R_x) = E_{\text{het}}(R_x^2) - E_{\text{het}}(R_x)^2$, expanding $R_x^2$ and applying properties of $M_b(a)$. The variance in the expected number of infections is then $x^2 V_{\text{het}}(R_x)$:

$$V_{\text{het}}(R_x) = \left(\frac{n-x}{x}\right)^2\left[M_{\sum_{j} R_0^{(j)}}\!\left(-\frac{2\tau}{dn}\right) - M_{\sum_{j} R_0^{(j)}}\!\left(-\frac{\tau}{dn}\right)^2\right], \qquad V_{\text{het}}\big(E(y \mid x, n)\big) = x^2\,V_{\text{het}}(R_x). \tag{2.6}$$

All of these statistics remain valid for any model of transmission heterogeneity, but we derive analytic relations under the most widely used model of [1] in the subsequent section.

Third, we examine how the likelihood of finding that $x$ infectious individuals have attended the event impacts the above quantities. This involves evaluating how $P(x \mid n)$ weights the formulae in equations (2.4)-(2.6). This weighting may be random, depend on behavioural preferences as we posit in the next section (i.e. risk awareness) or be assigned using other rules. We propose that a more informative measure of the risk of acquiring infection from an event of size $n$ and duration $\tau$ is the import-weighted event reproduction number $R_{\text{imp}}$, as in equation (2.7):

$$R_{\text{imp}} = \sum_{x=1}^{n} P(x \mid n)\, R_x. \tag{2.7}$$

While $R_{\text{imp}}$ averages over the possible numbers of imports, it is still a random variable with samples taken from the distribution controlling transmission heterogeneity. Accordingly, it has statistics $E_{\text{het}}(R_{\text{imp}})$ and $V_{\text{het}}(R_{\text{imp}})$ that we compute by summing and weighting $E_{\text{het}}(R_x)$ and $V_{\text{het}}(R_x)$ by $P(x \mid n)$ and $P(x \mid n)^2$, respectively. The expected number of new infections $E_{\text{imp}}(y \mid n)$ that is associated with $R_{\text{imp}}$ follows as in equation (2.8):

$$E_{\text{imp}}(y \mid n) = \sum_{x=1}^{n} P(x \mid n)\, x R_x. \tag{2.8}$$

Similarly, we obtain the heterogeneous statistics $E_{\text{het}}(E_{\text{imp}}(y \mid n))$ and $V_{\text{het}}(E_{\text{imp}}(y \mid n))$, but the quantities being weighted by $P(x \mid n)$ and $P(x \mid n)^2$ are now, respectively, $xE_{\text{het}}(R_x)$ and $x^2 V_{\text{het}}(R_x)$. These all proceed from the properties of expectations and variances applied to a linear weighted sum with independent terms.

Figure 1. Modelling framework for event-level transmission subject to risk awareness. We outline the central steps and define the main notation underlying our proposed framework for modelling transmission patterns at small events. We refer to equations defined in §2. This framework accounts for how heterogeneity in infection risk perception among individuals modulates the number of imported cases $x$ at an event of size $n$ and hence contributes to the secondary infections $y$ generated at that event.
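Under the conventional binomial import model $x \sim \mathrm{Bin}(n, \rho)$ of [25], and taking homogeneous transmission for simplicity, the import weighting of equations (2.7) and (2.8) reduces to a finite sum. The Python sketch below is illustrative only: the function names are hypothetical, and the sum is taken over $x \ge 1$ on the assumption that the $x = 0$ term is dropped because $R_x$ is undefined there and no infections are generated.

```python
import math


def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability Bin(x; n, p)."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def event_R(x: int, n: int, R0: float, tau_over_d: float) -> float:
    """Homogeneous R_x from equation (2.3), for 1 <= x <= n."""
    return (n - x) / x * (1.0 - math.exp(-tau_over_d * (x / n) * R0))


def import_weighted(n: int, rho: float, R0: float, tau_over_d: float):
    """R_imp and E_imp(y | n) with P(x | n) = Bin(x; n, rho), summed over x >= 1."""
    R_imp = E_imp = 0.0
    for x in range(1, n + 1):
        w = binom_pmf(x, n, rho)
        Rx = event_R(x, n, R0, tau_over_d)
        R_imp += w * Rx          # weighting as in equation (2.7)
        E_imp += w * x * Rx      # weighting as in equation (2.8): x R_x = E(y | x, n)
    return R_imp, E_imp


for rho in (0.01, 0.05, 0.1):
    print(rho, import_weighted(30, rho, R0=3.0, tau_over_d=0.1))
```

Because $R_x \le R_1$ for every $x$, the weighted $R_{\text{imp}}$ always stays below $R_1$, while the expected number of new infections grows with $\rho$ in this range.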

Statistical models for event reproduction numbers and importation patterns
Having outlined measures of infection risk in equations (2.7) and (2.8), we build into our framework some likely approaches for integrating transmission heterogeneities and import patterns (including when those imported infections are risk-sensitive).This allows us to parsimoniously model traditional and behavioural drivers of superspreading.In addition, we incorporate process stochasticity and provide a full Bayesian formulation for our framework.We start by including the seminal heterogeneity model of [1], which describes individual variations in transmissibility through a gamma distribution with dispersion k and mean R 0 .
We write this as $R_0^{(j)} \sim \mathrm{Gam}\!\left(k, \frac{R_0}{k}\right)$, with Gam as a shape-scale parametrized gamma distribution. Using scaling and summing properties of these gamma variables, we hence obtain $\sum_{j=1}^{x} R_0^{(j)} \sim \mathrm{Gam}\!\left(xk, \frac{R_0}{k}\right)$. This assumes that samples of the basic reproduction number of individuals are independent and identically distributed and lets us analytically evaluate the moment-generating function as $M_{\sum_{j=1}^{x} R_0^{(j)}}(a) = \left(1 - \frac{aR_0}{k}\right)^{-xk}$ for $a < k/R_0$. We substitute this into equations (2.4)-(2.6) to compute precisely the mean and variance of the infections and event reproduction number conditional on a total of $x$ introductions, as detailed above. We can relax the assumption that the $R_0^{(j)}$ are independent and identically distributed by instead sampling them from different distributions or by applying alternative dispersion models [28]. The heterogeneous $R_0^{(j)}$ constitute a major and traditionally modelled source of stochasticity underpinning the risk metrics we propose in equations (2.7) and (2.8).
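The gamma closed forms can be checked numerically. Writing $a = -\tau/(dn)$ and $M(a) = (1 - aR_0/k)^{-xk}$, the mean and variance of $R_x$ are $\frac{n-x}{x}\left(1 - M(a)\right)$ and $\left(\frac{n-x}{x}\right)^2\left[M(2a) - M(a)^2\right]$. The sketch below (Python with NumPy; function names are hypothetical and the parameter values are illustrative, chosen so that $\frac{\tau}{d}R_0 = 0.3$ as in figure 2) compares these with a Monte Carlo average of equation (2.4) over gamma-distributed sums of the $R_0^{(j)}$.

```python
import numpy as np


def mgf_gamma_sum(a: float, x: int, k: float, R0: float) -> float:
    """MGF at a of the sum of x iid Gam(k, R0/k) draws, i.e. Gam(x k, R0/k); needs a < k/R0."""
    return (1.0 - a * R0 / k) ** (-x * k)


def het_stats_exact(x, n, R0, tau_over_d, k):
    """Closed-form mean and variance of R_x over transmission heterogeneity."""
    a = -tau_over_d / n                 # R_x depends on exp(a * S_x)
    c = (n - x) / x
    m1 = mgf_gamma_sum(a, x, k, R0)
    m2 = mgf_gamma_sum(2.0 * a, x, k, R0)
    return c * (1.0 - m1), c**2 * (m2 - m1**2)


def het_stats_mc(x, n, R0, tau_over_d, k, draws=200_000, seed=1):
    """Monte Carlo estimate: sample S_x ~ Gam(x k, R0/k) and apply equation (2.4)."""
    rng = np.random.default_rng(seed)
    S = rng.gamma(shape=x * k, scale=R0 / k, size=draws)
    R = (n - x) / x * (1.0 - np.exp(-tau_over_d * S / n))
    return R.mean(), R.var()


print(het_stats_exact(x=3, n=30, R0=3.0, tau_over_d=0.1, k=0.1))
print(het_stats_mc(x=3, n=30, R0=3.0, tau_over_d=0.1, k=0.1))
```

Repeating the exact calculation at `k=10.0` gives a slightly larger mean but a much smaller variance, consistent with the ranking of curves discussed for figure 2.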
A less studied source of stochasticity is variability in the probability that infectious individuals attend the event. Previous work [25] has treated this deterministically, setting the probability or rate that an attending individual is infected as equal to the population prevalence $\rho$ (or $\rho$ adjusted by an exposure factor when it is known that the event draws individuals who are less or more likely to be infected). This is modelled as $x \sim \mathrm{Bin}(n, \rho)$, with Bin indicating a binomial distribution. We generalize this under our behavioural hypothesis. We posit, for a fixed overall importation level, that this import probability increases with $n$. This models risk awareness, in which risk-averse individuals, who are less likely to be infected, avoid larger events; equivalently, individuals attending larger events are less risk-averse and more likely to be infected. Risk appetites may also depend on event duration $\tau$, but we do not explore this here.
We model event size bias using sorted Dirichlet weightings. We consider $m$ events, the $i$th of which has size $n_i$ and import rate $\epsilon_{n_i}$. Sizes sequentially span all integers from $n_{\min}$ to $n_{\max}$ uniquely (i.e. $n_{\max} = n_{\min} + m - 1$), but we can relax this to include any distribution across event sizes of interest. We fix the total importation rate across all $m$ events. This constrains $\sum_{i=1}^{m} n_i \epsilon_{n_i} = \rho \sum_{i=1}^{m} n_i$, conserving the total number of infections introduced across events so that the mean importation rate equals $\rho$. We enforce this constraint to allow fair comparison between the conventional model, in which all imports occur with rate $\rho$, and our size-biased variations, which describe how variability in perceived risk by hosts affects their attendance at events. This constraint causes some event sizes to have importation rates above $\rho$ and others below it and allows us to model a spectrum of risk appetites about the baseline prevalence. This variability in risk perception aligns well with trends found in behavioural surveys [12-14].
The $\epsilon_{n_i}$ values encode our event size biases. We construct them by first sampling a set of random weights $w_i$ from a symmetric Dirichlet distribution, i.e. $w_i \sim \mathrm{Dir}(r)$, with $r$ as a shape parameter applied to every $w_i$ and the weights scaled so that $\sum_{i=1}^{m} w_i = \rho$. The set of $w_i$ spans all $m$ weights, with smaller values of $r$ leading to more skewed weightings. At very large $r$, $w_i \approx \rho/m$ for all $i$. To model risk awareness, where we expect that less risk-averse individuals are more likely than more risk-averse individuals to attend larger events, we sort the $w_i$ in ascending order so that $w_i$ increases with $n_i$. We replicate this procedure across many runs to include variability from the Dirichlet distribution. For every sampled, sorted set of $w_i$, we define $\epsilon_{n_i} \overset{\text{def}}{=} w_i n_i^{-1} \sum_{l=1}^{m} n_l$. This satisfies our benchmarking constraint and parsimoniously models the spectrum of risk appetites across the host population.
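The sorted Dirichlet construction fits in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' code: the helper name `size_biased_rates` is hypothetical, and the Dirichlet draw (which sums to 1) is rescaled by $\rho$. The final comparison confirms the benchmarking constraint $\sum_i n_i \epsilon_{n_i} = \rho \sum_i n_i$ for event sizes 5 to 50 ($m = 46$, as in figure 4).

```python
import numpy as np


def size_biased_rates(n_min: int, n_max: int, rho: float, r: float, rng):
    """Sorted-Dirichlet import rates eps_{n_i} for event sizes n_min..n_max (hypothetical helper)."""
    sizes = np.arange(n_min, n_max + 1)                          # m = n_max - n_min + 1 sizes
    w = rho * np.sort(rng.dirichlet(np.full(sizes.size, r)))     # ascending weights summing to rho
    eps = w * sizes.sum() / sizes                                # eps_{n_i} = w_i n_i^{-1} sum_l n_l
    # Caution: for very small r or large rho some eps can exceed 1; not handled in this sketch.
    return sizes, w, eps


rng = np.random.default_rng(7)
sizes, w, eps = size_biased_rates(5, 50, rho=0.01, r=1.0, rng=rng)
# Total expected imports match the conventional model with rate rho at every event size.
print(np.sum(sizes * eps), 0.01 * sizes.sum())
```

Calling the helper repeatedly with the same generator gives independent replicates, which is how run-to-run Dirichlet variability can be carried through to the risk metrics.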
We conceptualize this constraint by observing that in the conventional model $E(x_i) = n_i\rho$ for the $i$th event, so that $\rho\sum_{i=1}^{m} n_i$ imports occur into all $m$ events on average. In our risk-aware model $E(x_i) = n_i\epsilon_{n_i}$, and so we choose the $\epsilon_{n_i}$ to ensure that $\sum_{i=1}^{m} n_i\epsilon_{n_i}$ equals $\sum_{i=1}^{m} E(x_i)$ from the conventional model. However, we can relax this constraint to describe when risk awareness itself changes the prevalence (provided we use updated values at time snapshots), and we can generalize the model to allow $r$ to also be size dependent (i.e. $r(n_i)$). In summary, we generate import rates that are size biased (with this bias accounting for risk awareness) and use a single parameter $r$ to set the strength of this behavioural effect.
Integrating the above models for heterogeneity and importation, we complete our algorithm (see figure 1) for sampling import-weighted distributions of event size risk using equations (2.1) and (2.2). We formulate this in equations (2.9) and (2.10), with semicolons discriminating between probabilities that we evaluate from a distribution and parameters specifying that distribution. For convenience, we use $S_x = \sum_{j=1}^{x} R_0^{(j)}$ to denote heterogeneous samples and $\epsilon_n$ for general size bias. We use the probability distributions in equations (2.9) and (2.10) together with the definitions of equations (2.1) and (2.2) to compute the measures of event risk that we propose in equations (2.7) and (2.8). These marginalize over the distributions of import rate and transmission heterogeneity, which are degenerate when $\epsilon_n$ is constant for all $n$ or all $R_0^{(j)} = R_0$, respectively. In the Results, we examine the properties of our computational framework and apply it to explore how behaviour affects superspreading. Our framework is freely available at: https://github.com/kpzoo/smallscaleR.
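One draw of the combined algorithm samples imports, then the summed transmissibility $S_x$, then new infections. The sketch below is not the smallscaleR implementation: it assumes $x \sim \mathrm{Bin}(n, \epsilon_n)$ and $S_x \sim \mathrm{Gam}(xk, R_0/k)$ as above and, as an additional assumption for process stochasticity, $y \mid x, S_x \sim \mathrm{Bin}\!\left(n - x, 1 - e^{-\tau S_x/(dn)}\right)$. The function names are hypothetical. Its Monte Carlo mean is compared with the closed form $\sum_{x} P(x \mid n)(n-x)\left[1 - \left(1 + \frac{\tau R_0}{dnk}\right)^{-xk}\right]$.

```python
import math

import numpy as np


def sample_event(n, eps_n, R0, k, tau_over_d, draws, rng):
    """Sample (x, y) pairs for an event of size n under the assumptions stated above."""
    x = rng.binomial(n, eps_n, size=draws)              # number of imports
    S = np.zeros(draws)
    has_imports = x > 0
    S[has_imports] = rng.gamma(shape=x[has_imports] * k, scale=R0 / k)  # S_x
    p_het = 1.0 - np.exp(-tau_over_d * S / n)           # heterogeneous SAR
    y = rng.binomial(n - x, p_het)                      # new infections (assumed binomial)
    return x, y


def expected_y(n, eps_n, R0, k, tau_over_d):
    """Closed-form mean of y, marginalizing over imports and transmission heterogeneity."""
    total = 0.0
    for x in range(1, n + 1):
        px = math.comb(n, x) * eps_n**x * (1.0 - eps_n) ** (n - x)
        total += px * (n - x) * (1.0 - (1.0 + tau_over_d * R0 / (n * k)) ** (-x * k))
    return total


rng = np.random.default_rng(2024)
x, y = sample_event(n=30, eps_n=0.05, R0=3.0, k=0.1, tau_over_d=0.1, draws=200_000, rng=rng)
print(y.mean(), expected_y(30, 0.05, 3.0, 0.1, 0.1))
```

Replacing the fixed `eps_n` with a size-biased rate for each event size turns this null-model sampler into the risk-aware variant.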

Results
In the Methods, we developed a framework to assess the risk of acquiring infection at an event by deriving a small-scale reproduction number and the expected number of infections that will occur at the event. Both measures depend on the level of heterogeneity in transmission and on variability in the rate at which infectious individuals are likely to attend the event (i.e. imports). Here, we examine the influence of these two key factors in determining outbreak patterns.

Superspreading risk depends on importations and dispersion
Much research has investigated how heterogeneity in transmission can cause superspreading and hence increase the number of infections likely to result from a gathering or event [1,23]. Specifically, there have been studies of how the dispersion parameter $k$ modulates the risk of superspreading events [26,28,29]. Generally, smaller values of $k$ ($k < 1$) are predictive of larger transmission heterogeneity and superspreading risk. However, the influence of the number of importations $x$ at an event of size $n$ has received relatively little attention. We examine this by computing the statistics derived in equations (2.4)-(2.6), in which we defined the reproduction number $R_x$ as a function of the imports and the resulting expected number of infections $E(y \mid x, n)$.

Figure 2. Risk statistics for an event with heterogeneous transmission. We plot the mean ($E(\cdot)$, top panels) and variance-to-mean ratio ($VM(\cdot)$, bottom panels) of the small-scale event reproduction number $R_x$ (a) and the mean count of new infections $E(y \mid x, n)$ (b) as a function of the number of imports $x$. We compute these from equations (2.4)-(2.6) and compile statistics over $10^5$ samples from heterogeneous offspring distributions with dispersion parameters $k$ ranging from 0.1 to 10 (increasing from blue to red, with grey depicting all intermediate values). For comparison, we show the large-scale reproduction number $\frac{\tau}{d}R_0$ and the number of initial susceptible individuals at the event, $n - x$. We repeat this analysis at a larger value of $\frac{\tau}{d}R_0 = 3$ in electronic supplementary material, figure S1.
We consider an event of size $n = 30$ over a range of dispersions $0.1 \le k \le 10$ with a large-scale limit (see equation (2.3)) of $\frac{\tau}{d}R_0 = 0.3$. We sample $R_x$ and $E(y \mid x, n)$ from heterogeneous gamma distributions describing the transmissibility of the sum of all imported infections (see §2) and compute statistics from these samples using equations (2.5) and (2.6). We plot these results in figure 2 to explore the properties of these statistics. Interestingly, we find that $R_x$ is a decreasing function of $x$, even though every $R_x$ has the same limit of $\frac{\tau}{d}R_0$. The single-import scenario of $R_1$ relates to the event reproduction number proposed in [24]. If we assume, as in several branching process models, that all imports have reproduction number $R_1$ instead of $R_x$, then $E(y \mid x, n)$ and the risk of acquiring infection at the event may be notably overestimated.
Furthermore, increasing heterogeneity (decreasing $k$) increases the variance of our statistics but decreases mean risk, as we see from the inversion of the rank of blue to red curves between the top and bottom panels in figure 2, with $VM(\cdot)$ as the ratio of variance to mean. Finally, we see that the dependence of our statistics on the number of imports is substantial and can be as critical as the value of $k$ for describing spread. The value of $x$ that leads to the largest possible (peak) number of secondary infections at the event is not obvious (and not inferable from $R_x$), as imports both cause infections and reduce the available susceptible individuals. In electronic supplementary material, figure S1, we show how this peak changes and that the mean risk difference can be appreciable in different settings. This underpins the importance of modelling finite event sizes and signifies that a crucial factor driving the risk of acquiring infection at an event is the import distribution $P(x \mid n)$, which is rarely studied.
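The interior peak can be located by brute force. In this hedged sketch (hypothetical function names; illustrative parameters with $n = 30$ and $\frac{\tau}{d}R_0 = 0.3$), the mean number of new infections under gamma heterogeneity, $(n - x)\left[1 - \left(1 + \frac{\tau R_0}{dnk}\right)^{-xk}\right]$, is evaluated for every $x$. Although $R_x$ is largest at $x = 1$, the infection count peaks at an intermediate number of imports, because each import also removes a susceptible.

```python
def mean_infections(x: int, n: int, R0: float, tau_over_d: float, k: float) -> float:
    """Mean of E(y | x, n) over gamma-distributed individual R0 with dispersion k."""
    return (n - x) * (1.0 - (1.0 + tau_over_d * R0 / (n * k)) ** (-x * k))


def peak_imports(n: int, R0: float, tau_over_d: float, k: float) -> int:
    """Number of imports x (0..n) that maximizes the mean number of new infections."""
    return max(range(n + 1), key=lambda x: mean_infections(x, n, R0, tau_over_d, k))


for k in (0.1, 1.0, 10.0):
    x_star = peak_imports(30, 3.0, 0.1, k)
    print(k, x_star, round(mean_infections(x_star, 30, 3.0, 0.1, k), 3))
```

Both endpoints give zero infections ($x = 0$ has no imports and $x = n$ has no susceptibles), so the maximum must lie strictly inside the range.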

Population prevalence modulates the superspreading potential at events
Having observed the importance of the number of imports, $x$, when assessing the transmission risk at events, we explore the influence of the distribution of introductions to the event, $P(x \mid n)$. Conventionally [25], $P(x \mid n)$ can be defined as a binomial distribution with the probability of an import being equal to the prevalence of the infection in the wider population, $\rho$. This is our null model, and we explore how it integrates with our proposed event statistics (see equations (2.4)-(2.8)). We consider epidemics in their initial stages, i.e. there is no vaccination- or infection-acquired immunity, so $\rho$ is small and there are $n - x$ susceptible individuals at the event. We maintain parameter settings from figure 2 but weight samples of small-scale reproduction numbers and mean numbers of imported infections using $P(x \mid n)$, which is $\mathrm{Bin}(x; n, \rho)$ with $\rho$ ranging from 0.01 to 0.1 (1% to 10%). We compute histograms and statistics of these samples in figure 3. We examine homogeneous ($k = 10$) and heterogeneous ($k = 0.1$) dispersion levels and plot the log-survival (or tail) probabilities of realized numbers of new infections $y$ and associated small-scale event reproduction numbers $R$ in figure 3a,b for different values of $\rho$. We compute these probabilities through equations (2.7) and (2.8). Larger values for these probabilities, respectively, indicate that superspreading is more likely (i.e. disproportionately more infections than $E_{\text{het}}(E_{\text{imp}}(y \mid n))$ occur) and that imports have increased potential to cause superspreading (i.e. transmissibility above $E_{\text{het}}(R_{\text{imp}})$). This distinction is rarely explored because it is less important at population levels, where superspreading models are commonly used [1,26,28]. However, the limiting finite-size effects of events make this distinction crucial. Histograms of samples of the infections at the event for some of the values of $\rho$ in figure 3a,b are shown in figure 3c.

Figure 3. The importation rate magnifies the effects of heterogeneous transmission. We plot the log survival probabilities for the number of new infections $y$ (a) and related event reproduction numbers $R$ (b). We account for the probability of $x$ imports (distributed as $\mathrm{Bin}(x; n, \rho)$) at an event of size $n = 30$, with the population prevalence as $\rho$ (increasing from blue to red, with grey indicating intermediate values). Larger $P(y \ge c)$ signifies more realized heterogeneity (higher likelihood that disproportionate numbers of infections result from the event), while larger $P(R \ge c)$ signifies more heterogeneity in transmissibility (higher potential for superspreading events). (a,b) Dashed curves are for $k = 10$ (spread is mostly homogeneous) and solid curves are for $k = 0.1$ (spread is heterogeneous). We compute these quantities from equations (2.4)-(2.8). (c) Histograms of $10^5$ samples of $y$ at two $\rho$ values underpinning (a,b). Thicker tails or more rightward mass in these distributions indicate a higher chance of a large number of infections at the event. We repeat this analysis at a larger event size of $n = 100$ in electronic supplementary material, figure S2, for comparison.
We find that increasing prevalence raises the y survival curves for both k scenarios (figure 3a) (at a given threshold tail size c, probabilities increase with ρ) but has limited impact on the R curves (figure 3b). The latter trend is expected because ρ does not change transmissibility. However, the finding that prevalence alone can mediate realized superspreading risk is important and, to our knowledge, unexplored. We confirm this with the histograms of figure 3c, which have thicker tails, or at least more probability mass to the right, as ρ grows (even at large k). The variances of the y values (not shown) also rise with ρ. We show equivalent analyses for a larger event (n = 100) in electronic supplementary material, figure S2, and recover similar results. The rate at which infections are introduced is therefore critical to assessing the chances of superspreading at an event. The risk of superspreading is a key determinant of whether cases of disease at the beginning of an outbreak will lead to a major epidemic, because local infection clusters can propagate forward, snowballing into wider waves of infection. In standard models, the R survival curves correlate strongly with those of y [1,28]. However, the added variation we see in the y curves in figure 3 highlights that superspreading risk is above that expected from R alone and, further, that the chances of stochastic extinction are reduced (the histograms show $P(y = 0)$ falling with ρ). Understanding the interaction between the import rate (determined by the prevalence) and finite event size effects is therefore essential for accurately inferring the risk of superspreading at an event and hence the chance of epidemic establishment. Next, we demonstrate that the realized superspreading risk can rise further if risk awareness affects event attendance.

Figure 4. Event size bias substantially elevates the risk of infection. We compare the risk of acquiring infection at an event under models with size-biased import rates owing to variability in risk perception against a null model with a constant importation rate equal to the prevalence ρ. (a) The size-biased rates $\epsilon_n$, parametrized by r, for m = 46 events with sizes spanning 5 to 50. Smaller r (decreasing from blue to green to red) indicates a more skewed $\epsilon_n$ function but conserves the overall infection import rate. The critical event size n* marks where $\epsilon_n$ is closest to ρ (risk-neutral event sizes). (b,c) The resulting mean and variance of the number of infections at an event ($\mathbb{E}[y \mid \epsilon]$, $\mathbb{V}[y \mid \epsilon]$) relative to the equivalent quantities from the null model ($\mathbb{E}[y \mid \rho]$, $\mathbb{V}[y \mid \rho]$) for dispersion parameter k = 0.1. (a-c) We show medians with 95% credible intervals as computed using equations (2.7)-(2.10). These marginalize over $10^4$ samples from the distributions of transmission heterogeneity (controlled by k) and the number of importations (controlled by ρ and $\epsilon_n$). Insets in (b,c) show the ratios of the means. (d) The total mean number of infections over all events, which remains mostly stable because of our $\epsilon_n$ constraints.
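To see how prevalence alone can lower $P(y = 0)$ and raise the variance of y at a fixed event size, here is a minimal Monte Carlo sketch. It assumes a hypothetical finite-size model, not the paper's equations (2.1)-(2.10): imports $x \sim \mathrm{Bin}(n, \rho)$, gamma-distributed infectiousness per import with mean R and dispersion k, and independent infection of each of the n − x susceptibles. The name `simulate_event`, the parameter R (a stand-in for the mean number of infections one import would cause at a very large event) and the event size n = 30 are illustrative choices.

```python
import numpy as np

def simulate_event(n, rho, k, R, n_samples=100_000, seed=0):
    """Sample (x, y) for one event of size n under a hypothetical finite-size model.

    Assumptions (not the paper's equations):
    - x ~ Bin(n, rho) attendees arrive already infected;
    - each import has infectiousness nu ~ Gamma(shape=k, scale=R/k) (mean R,
      dispersion k), so the total over x imports is Gamma(shape=x*k, scale=R/k);
    - each of the n - x susceptibles is infected independently with probability
      1 - exp(-total / (n - 1)), which caps y at n - x (finite-size effect).
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(n, rho, size=n_samples)
    total_nu = rng.gamma(shape=x * k, scale=R / k)  # NumPy returns 0 for shape 0 (x = 0)
    y = rng.binomial(n - x, 1.0 - np.exp(-total_nu / (n - 1)))
    return x, y

for rho in (0.01, 0.05, 0.1):
    _, y = simulate_event(n=30, rho=rho, k=0.1, R=2.0)
    print(f"rho={rho:.2f}  P(y=0)={np.mean(y == 0):.3f}  Var(y)={y.var():.2f}")
```

Because transmissibility (k, R) is held fixed, any change in these statistics across ρ comes only from the number of imports.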

Risk awareness controls importation rates and amplifies superspreading risk
We previously assumed that the importation rate into an event was small, constant and equal to the population infection prevalence ρ. However, this is unrealistic because event attendance depends on individual preferences. Previous studies based on survey data have found that individual perceptions of infection risk can regulate transmission dynamics and that a spectrum of risk appetites exists in a population [13,30,31]. Many models couple behavioural changes to prevalence [10,17], and prevalence elasticity, in which self-protective behaviours vary with prevalence, has been observed. We hypothesize that, for a fixed prevalence baseline, heterogeneity in individual risk perception (i.e. the risk spectrum) may mean that risk-averse individuals avoid larger events, where they expect higher chances of becoming infected. Events with large numbers of attendees are then disproportionately likely to be attended by less risk-averse individuals (those with large risk appetites), who have higher chances of introducing infection to the event.
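The selection effect in this hypothesis can be illustrated with a toy population. The sketch below is an assumption-laden illustration, not the size-biased model used in this study: each individual has a hypothetical risk appetite u ~ Uniform(0, 1), is infected with probability 2ρu (so the population prevalence stays at ρ), and attends an event of size n with probability $u^{\beta n}$, where β is a hypothetical deterrence parameter. Under these assumptions the fraction of infected attendees has the closed form $2\rho(s+1)/(s+2)$ with $s = \beta n$, which rises with event size even though prevalence is fixed.

```python
import numpy as np

def attendee_infection_rates(event_sizes, rho, beta=0.05, n_people=200_000, seed=1):
    """Fraction of attendees already infected, per event size (toy selection model).

    Hypothetical assumptions:
    - risk appetite u ~ Uniform(0, 1) for each individual;
    - P(infected | u) = 2 * rho * u, so population prevalence is rho (needs rho <= 0.5);
    - P(attend event of size n | u) = u ** (beta * n), so low-appetite
      (risk-averse) individuals are increasingly deterred as n grows.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, n_people)
    infected = rng.random(n_people) < 2.0 * rho * u
    rates = []
    for n in event_sizes:
        attends = rng.random(n_people) < u ** (beta * n)
        rates.append(infected[attends].mean())
    return np.array(rates)

def closed_form_rate(n, rho, beta=0.05):
    # E[2 * rho * u * u**s] / E[u**s] with s = beta * n and u ~ Uniform(0, 1)
    s = beta * n
    return 2.0 * rho * (s + 1.0) / (s + 2.0)

print(attendee_infection_rates([5, 20, 50], rho=0.05))
```

This toy mechanism does not conserve the total number of imported infections across events; the size-biased weights described in the following paragraph impose that constraint separately.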
We explore this idea by altering the null model from the previous section, in which the probability that an event attendee is already infected is ρ. We propose a size-biased model in which risk appetite or awareness adjusts the event-scale rate of importation according to event size n. We realize this using weights that assign a rate $\epsilon_n$ that scales with n (see §2) but ensure that the total number of infections imported into all events is conserved on average, i.e. overall transmission levels are constrained. We consider a set of m events, the ith of which has size $n_i$. The weight $w_i$ is set to increase with $n_i$ but satisfies $\sum_{i=1}^{m} w_i = \rho$. The skew of the $w_i$, i.e. the strength of the size bias, is controlled by the parameter r. We apply this model with differing weight strengths r using equations (2.9) and (2.10), under the parameter settings from figure 3, to obtain figure 4.
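A size-biased rate with a conserved overall import level can be sketched as below. Both the functional form $\epsilon_n \propto n^{1/r}$ and the normalization $\sum_i n_i \epsilon_{n_i} = \rho \sum_i n_i$ (the expected total number of imports matches the null model) are assumptions chosen for illustration; §2 defines its own weights with $\sum_{i=1}^{m} w_i = \rho$, which this sketch does not reproduce. Smaller r gives a more skewed profile, and r → ∞ recovers the null rate ρ. The function names are hypothetical.

```python
import numpy as np

def size_biased_import_rates(sizes, rho, r):
    """Hypothetical size-biased import rates eps_n proportional to n**(1/r).

    Normalized so that sum_i n_i * eps_{n_i} = rho * sum_i n_i, i.e. the
    expected number of imported infections over all events is conserved.
    """
    sizes = np.asarray(sizes, dtype=float)
    shape = sizes ** (1.0 / r)
    eps = shape * rho * sizes.sum() / (sizes * shape).sum()
    if np.any(eps > 1.0):
        raise ValueError("skew too strong: some eps_n exceed 1")
    return eps

def critical_size(sizes, eps, rho):
    """Event size n* whose import rate is closest to the prevalence rho."""
    return int(np.asarray(sizes)[np.argmin(np.abs(eps - rho))])

sizes = np.arange(5, 51)  # m = 46 events with sizes 5 to 50, as in figure 4a
eps = size_biased_import_rates(sizes, rho=0.05, r=0.5)
print(critical_size(sizes, eps, rho=0.05))  # prints 36
```

Under this form, events larger than n* receive more imports than the null model and smaller events receive fewer, while the expected total is unchanged.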
In figure 4, we study weight choices characterizing two risk-awareness levels (green and red), in which the probability that an attendee is an imported infection increases with event size, a relatively risk-stable case (blue) and a null model (black, dashed) that completely neglects risk awareness. We show the corresponding import rates in figure 4a and compute a critical event size n*, at which $\epsilon_n$ is closest to ρ. For the risk-aware models (green and red), events above this size have a higher infection risk than assumed by conventional (null) models. In figure 4b,c, we illustrate that size biasing substantially amplifies the mean and variance of the number of infections y, doubling or tripling the risks at larger events relative to the null model (see insets), for the risk strength parameters and constraints we consider. This amplification outweighs both the suppression of infections at smaller events and the susceptible depletion caused by imports, and signifies that risk awareness can strongly shape infection patterns. Our $\epsilon_n$ constraints limit variations in the total mean infections across events (figure 4d). We show the underlying means, variances and variance-to-mean (VM) ratios of the small-scale event reproduction numbers, as well as VM ratios for infections, in electronic supplementary material, figure S3. These support the trends in figure 4. We also repeat this analysis for epidemics with homogeneous spread (k = 10) in electronic supplementary material, figure S4. Interestingly, while the variance in the number of infections is smaller, the ratios of the means and variances between the risk-aware and null models (electronic supplementary material, figure S4b,c) are similar and rise with n, indicating that risk awareness alone can introduce additional superspreading risk. Together with figure 3, these results mean that neglecting the risk spectrum within host populations, as conventional models do when the probability that an attendee is initially infected is set solely by the prevalence ρ, can lead to substantial underestimation of the likelihood of superspreading.

Figure 5. Superspreading risk increases with risk awareness. We repeat the analyses in figure 4 but for varying prevalence ρ at a given risk-awareness strength r = 0.5 (a) and for differing strengths r at prevalence ρ = 0.05 (b). These show the mean numbers of infections $\mathbb{E}[y \mid \epsilon]$ under risk-aware models relative to that from the null model, $\mathbb{E}[y \mid \rho]$ (we plot only medians of distributions for dispersion parameter k = 0.1). We demonstrate how risk awareness modulates the risk of superspreading at medium- and large-sized events (dashed vertical lines in a,b) by exploring tail infection survival probabilities $P(y \ge c)$ (c,d) (see also electronic supplementary material, figures S3 and S4, for further accompanying simulations and statistics).
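The comparison against the null model, the ratios $\mathbb{E}[y \mid \epsilon]/\mathbb{E}[y \mid \rho]$ and $\mathbb{V}[y \mid \epsilon]/\mathbb{V}[y \mid \rho]$ at each event size, can be sketched by Monte Carlo. Everything model-specific here is a hypothetical assumption rather than the paper's equations: import rates $\epsilon_n \propto n^{1/r}$ normalized to conserve expected total imports, gamma-distributed infectiousness per import with mean R and dispersion k, and independent infection of the n − x susceptibles. Function names and the value R = 2 are illustrative.

```python
import numpy as np

def size_biased_rates(sizes, rho, r):
    # Hypothetical eps_n proportional to n**(1/r), normalized so that the expected
    # total number of imports over all events matches the null model.
    sizes = np.asarray(sizes, dtype=float)
    shape = sizes ** (1.0 / r)
    return shape * rho * sizes.sum() / (sizes * shape).sum()

def sample_infections(n, eps, k, R, n_samples, rng):
    # Hypothetical finite-size model: x ~ Bin(n, eps) imports with total
    # infectiousness Gamma(x*k, R/k) (0 when x = 0); each of the n - x
    # susceptibles is infected with probability 1 - exp(-total / (n - 1)).
    x = rng.binomial(n, eps, size=n_samples)
    total_nu = rng.gamma(shape=x * k, scale=R / k)
    return rng.binomial(n - x, 1.0 - np.exp(-total_nu / (n - 1)))

def ratios_to_null(sizes, rho, r, k=0.1, R=2.0, n_samples=50_000, seed=0):
    """Per-size ratios E[y|eps]/E[y|rho] and V[y|eps]/V[y|rho]."""
    rng = np.random.default_rng(seed)
    eps = size_biased_rates(sizes, rho, r)
    mean_ratio, var_ratio = [], []
    for n, e in zip(sizes, eps):
        y_eps = sample_infections(int(n), e, k, R, n_samples, rng)
        y_null = sample_infections(int(n), rho, k, R, n_samples, rng)
        mean_ratio.append(y_eps.mean() / y_null.mean())
        var_ratio.append(y_eps.var() / y_null.var())
    return np.array(mean_ratio), np.array(var_ratio)

sizes = np.arange(5, 51)
mean_ratio, var_ratio = ratios_to_null(sizes, rho=0.05, r=0.5)
print(np.round(mean_ratio[[0, 19, 43]], 2))  # sizes 5, 24 and 48
```

Under these assumptions the ratios sit below 1 at small events and above 1 at large ones, mirroring the qualitative pattern described for figure 4b,c; the magnitudes depend on the assumed $\epsilon_n$ form.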
We confirm this in figure 5, where we illustrate how the log survival (tail) probabilities of infections, $\log P(y \ge c)$, change with the risk-awareness strength r and the prevalence ρ. In figure 5a, we fix r and find that the median risk of infection, relative to the null risk-neutral model, is largely unchanged across prevalence values. This verifies that the skew of the size bias from risk awareness is a key variable. In figure 5b, we see how the median relative risk at larger events increases with the level of risk awareness (i.e. as r falls), for fixed ρ. We highlight two event sizes, n = 24 and n = 48, which have relative risks below and above 1, respectively (dashed lines in figure 5a,b), and examine their tail probabilities in figure 5c,d. In figure 5c, we find that, for both sizes, superspreading risk rises with prevalence, as tail probabilities at any threshold c increase with ρ. In figure 5d, we note that risk awareness at a given prevalence can reduce the likelihood of superspreading at smaller events but considerably amplify it at larger events (seen as an inversion in the ranking of curves from blue to red).
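The ranking inversion in tail probabilities between a smaller and a larger event can be reproduced qualitatively with a short Monte Carlo sketch. It rests on hypothetical assumptions, not the paper's equations: import rates $\epsilon_n \propto n^{1/r}$ normalized so that $\sum_i n_i \epsilon_{n_i} = \rho \sum_i n_i$, imports $x \sim \mathrm{Bin}(n, \epsilon_n)$, gamma-distributed infectiousness per import with mean R and dispersion k, and independent infection of each of the n − x susceptibles. Function names and R = 2 are illustrative.

```python
import numpy as np

def size_biased_rates(sizes, rho, r):
    # Hypothetical eps_n proportional to n**(1/r), normalized so that
    # sum_i n_i * eps_{n_i} = rho * sum_i n_i (conserved expected imports).
    sizes = np.asarray(sizes, dtype=float)
    shape = sizes ** (1.0 / r)
    return shape * rho * sizes.sum() / (sizes * shape).sum()

def log_tail_probabilities(n, eps, c_values, k=0.1, R=2.0, n_samples=100_000, seed=0):
    """Monte Carlo estimate of log P(y >= c) for one event of size n.

    Hypothetical model: x ~ Bin(n, eps) imports with total infectiousness
    Gamma(x*k, R/k); each of the n - x susceptibles is infected with
    probability 1 - exp(-total / (n - 1)).
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(n, eps, size=n_samples)
    total_nu = rng.gamma(shape=x * k, scale=R / k)
    y = rng.binomial(n - x, 1.0 - np.exp(-total_nu / (n - 1)))
    tail = np.array([np.mean(y >= c) for c in c_values])
    with np.errstate(divide="ignore"):
        return np.log(tail)  # -inf where no sample reached c

sizes = np.arange(5, 51)
c_values = np.arange(1, 7)
for r in (2.0, 1.0, 0.5):  # smaller r means stronger risk awareness
    eps = size_biased_rates(sizes, rho=0.05, r=r)
    for n in (24, 48):
        print(r, n, np.round(log_tail_probabilities(n, eps[sizes == n][0], c_values), 2))
```

Under these assumptions, stronger risk awareness lowers the tail at n = 24 and raises it at n = 48, because $\epsilon_{24}$ falls below ρ while $\epsilon_{48}$ rises above it.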
Across our simulations, this amplification from host behaviours can be as much as 10-fold (2 to 2.5 natural log units in the tail probabilities at 4 ≤ c ≤ 6 in figure 5 at n = 48). We reinforce these conclusions by computing the associated statistics of new infections and of our small-scale reproduction numbers (figure 6). There, we verify that the mean and variance of the number of infections grow with prevalence and with the variability in risk awareness within the population, but that the realized heterogeneity cannot be inferred from the heterogeneity in reproduction numbers. Consequently, the population risk spectrum can independently drive increased superspreading risk at larger events. This may have critical ramifications, because larger events can support more infections and contribute disproportionately to the establishment of infection in the host population early in an epidemic. As a result, accurate characterization of small-scale behavioural patterns, combined with estimates of both the wider-scale prevalence of infection in the population and transmission heterogeneities, is integral to correctly quantifying the risk of superspreading.
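The conversion from a gap in natural-log tail probabilities to a fold change is a one-line calculation. Writing $P_1$ and $P_2$ for two tail-probability curves being compared at the same threshold c:

```latex
\log P_{1}(y \ge c) - \log P_{2}(y \ge c) \in [2,\, 2.5]
\quad\Longrightarrow\quad
\frac{P_{1}(y \ge c)}{P_{2}(y \ge c)} \in \left[e^{2},\, e^{2.5}\right] \approx [7.4,\, 12.2]
```

A gap of 2 to 2.5 natural-log units therefore corresponds to roughly a 7- to 12-fold change, which brackets the 10-fold figure.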

Discussion
Human behaviour is a key driver of infectious disease outbreaks, yet it is not often considered in detail in epidemiological modelling studies. While variations in individual perceptions of the risks associated with acquiring infection are known to shape important macroscopic properties of an epidemic, such as disease incidence time series or patterns of spread [11,17,32], few studies have explored how human behaviour affects the chances of superspreading. This phenomenon, while infrequent and seeded at small scales such as events or gatherings, can generate disproportionate numbers of infections, which can substantially influence large-scale epidemic growth and persistence, particularly during early or emergent stages [3,5,26]. Data, or even models, connecting the spectrum of infection risk perceptions within host populations to event attendance are rarely available or studied [9,24]. Here, we aimed to address this gap by investigating the effects of plausible relationships between human behaviour and event attendance on superspreading, and by highlighting how small-scale behavioural data collection can improve the accuracy of epidemic modelling and hence control.
We developed a computational framework [equations (2.1)-(2.10)] to model small-scale transmission at events (e.g. weddings, parties, sports matches or concerts) where superspreading may arise and individual behaviour can affect pathogen dynamics. Our framework quantifies, under a standard random-mixing assumption, how finite event size effects, together with heterogeneities in both the transmissibility among hosts and the rate at which infections are introduced to an event, contribute to the number of infections generated at that event. This generalizes several earlier approaches [23][24][25] and allows us to define a within-event (small-scale) reproduction number that measures how importations and individual-level variations affect transmissibility at events. Our event reproduction number $R_x$ links meaningfully to population-level characteristics through its convergence to $R_0$ as event size and duration grow large (see §2).
Using $R_x$, we showed that previous transmissibility metrics, whether derived from branching processes [1] or earlier event-level approaches [24], can overestimate transmission and the number of infections likely to occur at an event (figure 2 and electronic supplementary material, figure S1). This result holds for any model that does not account for the finite supply of susceptible individuals at an event, and it is exacerbated when there are multiple imported infections, because the number of susceptible individuals that any imported case can infect falls [33]. Furthermore, this finite-size effect highlighted that it is essential to collect data on the number of infections introduced into events to accurately quantify superspreading risk (figure 3 and electronic supplementary material, figure S2), which we found to depend strongly on both the number of imported infections and more conventionally evaluated heterogeneities [3]. This insight suggested one potential reason why behavioural patterns might affect the likelihood of superspreading: if risk awareness alters the distribution of infections imported into events, then it could also modulate the chances of superspreading occurring at those events.
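The finite-size argument (per-import transmissibility falls as imports share a limited pool of susceptibles, yet recovers the large-scale value as the event grows) can be checked in closed form under a simple hypothetical model. The sketch assumes, unlike equations (2.1)-(2.10), that x imports have total infectiousness $S \sim \mathrm{Gamma}(xk, R/k)$ and that each of the n − x susceptibles is infected with probability $1 - e^{-S/(n-1)}$. The gamma Laplace transform $\mathbb{E}[e^{-sS}] = (1 + sR/k)^{-xk}$ then gives $\mathbb{E}[y \mid x, n]$ exactly. Here R plays a role loosely analogous to the large-scale value $\tau_d R_0$ in figure 2, and `event_reproduction_number` is a hypothetical analogue of $R_x$, not the paper's definition.

```python
import math

def expected_new_infections(x, n, k, R):
    """E[y | x, n] under a hypothetical finite-size model (an assumption).

    x imports have total infectiousness S ~ Gamma(shape=x*k, scale=R/k), so each
    import causes R infections on average at an infinitely large event. Each of
    the n - x susceptibles is infected independently with probability
    1 - exp(-S / (n - 1)). Averaging over S with the gamma Laplace transform
    E[exp(-s * S)] = (1 + s * R / k) ** (-x * k) gives this closed form.
    """
    s = 1.0 / (n - 1)
    return (n - x) * (1.0 - (1.0 + s * R / k) ** (-x * k))

def event_reproduction_number(x, n, k, R):
    """Mean new infections per import, E[y | x, n] / x (defined for x >= 1)."""
    return expected_new_infections(x, n, k, R) / x

for x in range(1, 5):
    print(x, round(event_reproduction_number(x, n=20, k=0.1, R=2.0), 3))
print(event_reproduction_number(1, n=10**6, k=0.1, R=2.0))  # close to R = 2
```

In this toy model the per-import value is always below R, falls as x grows at a fixed event size, and approaches R only as n becomes large, which is the qualitative behaviour the paragraph above describes.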
We explored this possibility using a parsimonious model of human behaviour.We posited that variations in infection risk perceptions or awareness in a population might cause more risk-averse individuals to (probabilistically) avoid larger events, which they believe present a higher risk of acquiring infection.Our framework allowed us to model this as a size-biased weighting on the rate of introducing infections to an event that is a function of both the wider population prevalence and the event size.This draws on real-world observations that there is a spectrum of self-protective behaviours in host populations that are driven by prevalence baselines and heterogeneous risk perceptions [10,12,14,19,20].Across numerous model simulations, we found, for given event sizes and fixed overall prevalence values, that risk awareness amplifies the chances of superspreading at large events (figure 4 and electronic supplementary material, figures S3 and S4) but limits transmission at smaller events.
Moreover, as either the prevalence or the variability in risk awareness (characterized by the strength parameter r) increases, the chances of superspreading rise (figures 5 and 6). This holds irrespective of the inherent heterogeneity in transmissibility at the event (characterized by the dispersion parameter k), which describes the impact of conventional superspreading drivers such as pathogen biology and host characteristics. Furthermore, the mean, the variance and the probability of large numbers of infections at the event all support this trend. Because this amplification of within-event transmission occurs precisely at those events with the capacity to support larger numbers of infections (i.e. at larger events, where susceptible depletion from multiple imports matters less), this behavioural mechanism can have major consequences. This may be especially critical during the sensitive initial stages of potential epidemics, when increased superspreading can spur growth and trigger progression from sporadic outbreaks into sustained waves of infection [3,5].
Although these results underscore the importance of human behaviour in driving infectious disease outbreaks, our approach, like any mathematical modelling study, involved several simplifying assumptions. First, we assumed random mixing within events, so any susceptible individual can interact with any infectious individual with equal probability. In reality, contact networks form at events, and the structure of these networks may differ with the size and type of event. While frameworks for embedding contact structure in epidemiological models exist (e.g. multilayer networks can be used to link risk awareness and infection structure [32][33][34]), they can be complex and difficult to interpret or require high-resolution data that are typically unavailable [9,30,35]. We also note that our inclusion of transmission heterogeneities as in [1], together with our weighting of the risk of introductions by event size, captures some features of real-world transmission networks while preserving interpretability.
Second, we assumed that event sizes and durations were predetermined and fixed the overall import rate across all events.However, risk awareness could itself reduce event durations, sizes, frequencies and thus the prevalence of infections.Conversely, if events are prevented, owing to government policy, then less risk-averse individuals may initiate their own unregulated gatherings, which could increase transmission (a rebound effect) [15].This feedback between behaviour and environment (risk and event properties) might affect chances of superspreading [11].Characterizing this feedback is difficult because of a lack of data linking event properties and human behaviour [9,36].Future collection and analysis of such data will be vital for grounding hypotheses and avoiding overly strong or prescriptive modelling assumptions.
Third, and relatedly, we did not attempt to model the causes of, or temporal changes in, the different levels of perceived infection risk among individuals. During initial epidemic stages, and especially for novel pathogens, data can be sparse and erratic [35,37]. The perception of the risks associated with acquiring infection may therefore be affected by unreliable reports and major uncertainty about the true risk posed by the invading pathogen. Moreover, these risk perceptions might change over time as population immunity builds or if pharmaceuticals that reduce the severity of infections are introduced at later epidemic stages. It could even be the case that, after the epidemic peak, less risk-averse individuals are actually more likely to harbour at least partial immunity. Modelling these poorly understood effects would require strong assumptions that are hard to validate.
To avoid all of the above issues, we considered only initial epidemic stages and focused on isolating the risk-awareness-induced patterns given a set of events and a prevalence baseline. We made a minimal assumption, supported by survey data on population behaviours [12][13][14], that there is a spectrum of risk about this baseline. Our aim was twofold: to discover how risk perceptions could affect superspreading, and to show why collecting auxiliary data describing variations in epidemic-related human behaviour, such as event attendance, is necessary. Improving understanding of the coupling between risk-sensitive behaviours and epidemiology would create a platform for future investigation of the outstanding problems mentioned above.
While our modelling approach is relatively simple, it provides clear evidence that behavioural patterns can substantially amplify the risk of superspreading. Heterogeneity in infection risk perception within host populations, modelled by an event size-biased importation rate $\epsilon_n$, translates into the potential for substantial transmission at large events during the early stages of infectious disease epidemics. Because superspreading plays a pivotal role in epidemic growth and the chance of pathogen establishment, further data are required to uncover and specify the form of $\epsilon_n$ and the mechanisms that shape the coupling between epidemiological and behavioural patterns. We support calls for enhanced surveillance that collects such data [9,30]. Surveys linking perceptions of infection risk with attendance at events [19] are essential to determine when variability in risk awareness may be a principal driver of superspreading. This is important to inform, design and target public health interventions more effectively [4].

Figure 1. Modelling framework for event-level transmission subject to risk awareness. We outline the central steps and define the main notation underlying our proposed framework for modelling transmission patterns at small events. We refer to equations defined in §2. This framework accounts for how heterogeneity in infection risk perception among individuals modulates the number of imported cases x at an event of size n and hence contributes to the secondary infections generated at that event, y.

Figure 2. Risk statistics for an event with heterogeneous transmission. We plot the mean ($\mathbb{E}[\cdot]$, top panels) and variance-to-mean ratio ($\mathrm{VM}[\cdot]$, bottom panels) of the small-scale event reproduction number $R_x$ (a) and the mean count of new infections $\mathbb{E}[y \mid x, n]$ (b) as a function of the number of imports x. We compute these from equations (2.4)-(2.6) and compile statistics over $10^5$ samples from heterogeneous offspring distributions with dispersion parameters k ranging from 0.1 to 10 (increasing from blue to red, with grey depicting all intermediate values). For comparison, we show the large-scale reproduction number $\tau_d R_0$ and the number of initial susceptible individuals at the event, n − x. We repeat this analysis at a larger value of $\tau_d R_0 = 3$ in electronic supplementary material, figure S1.

Figure 6. Risk awareness is the key driver of superspreading risk at large events. We expand on the results in figure 5 (using the same parameter values) and compute the mean and variance of the small-scale reproduction number R and of the number of infections at the event, y. (a) Plots these statistics for differing prevalence ρ at fixed risk-awareness strengths r (smaller values indicate stronger risk awareness), while (b) varies r at fixed ρ. Increasing ρ leads to a higher mean number of infections and more variation in the number of infections. Decreasing r reduces the mean as well as the variance in the number of infections at smaller events but amplifies them at larger events, increasing the risk of superspreading. The statistics of the reproduction numbers do not reflect the realized numbers of infections (decreases in variances at larger event sizes occur owing to saturation), confirming that variability in risk awareness between individuals is the major driver of the event-level infection patterns.