Modelling: Understanding pandemics and how to control them

New disease challenges, societal demands and better or novel types of data drive innovations in the structure, formulation and analysis of epidemic models. Innovations in modelling can lead to new insights into epidemic processes and better use of available data, yielding improved disease control and stimulating collection of better data and new data types. Here we identify key challenges for the structure, formulation, analysis and use of mathematical models of pathogen transmission relevant to current and future pandemics.


Introduction
Controlling pandemics is a wicked problem (Rittel and Webber, 1973). Such problems are characterised by incomplete, contradictory and changing requirements and by difficulties in obtaining data relevant for decision-making, and seemingly well-motivated efforts to solve them may lead to unintended and even self-defeating outcomes (Schiefloe, 2021). Data availability will vary over time and across a range of spatial and social scales, both within and between nations. During pandemics such as COVID-19, many of the challenges therefore involve difficult questions that cannot be answered purely empirically, or for which the data available when decisions need to be made are insufficient (for example, see Thompson et al., 2020). Modelling allows the development of well-posed technical frameworks in which such questions can be explored:
• yielding qualitative understanding of how various factors (e.g. differential within-host progression of disease; heterogeneity in population response, infectivity, susceptibility, contact intensity and structures) influence the spread of infection, and allowing in silico experiments to determine the effects of hypotheses concerning evolving threats and possible interventions;
• providing a framework for synthesising multiple sources of information, including understanding how best to collect, analyse and interpret observations (data) related to the infection and its spread in the population, and for determining what further data would most usefully be collected; and
• allowing estimation of current and historic trends, including unobserved or unobservable quantities (e.g. new infections), as well as short-term projections and quantitative scenarios of future developments useful for health policy decision-making, planning and evaluation of interventions.
Almost all disease transmission models exhibit threshold behaviour, whereby epidemic spread occurs when a certain combination of parameters that depends on model structure (e.g. the real-time reproduction number $R_t$) exceeds a threshold (the value 1 in the case of $R_t$; for application to COVID-19 see Vegvari et al., 2021). Typically, the aim of interventions is to bring this quantity below the threshold and/or to deal with the consequences of being close to it, but interventions should also be tailored to the particular properties of the disease (e.g. morbidity and mortality) and to societal needs and structure (e.g. households and age-related contact patterns).
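As a minimal illustration of threshold behaviour, assuming the simplest homogeneously mixing SIR model in a large population (chosen here for illustration, not a model from the cited work), the fraction ever infected $z$ solves the final size relation $z = 1 - e^{-R_0 z}$, where $R_0$ is the reproduction number in a fully susceptible population. This equation has a positive root only when $R_0 > 1$. The sketch below, with a hypothetical function name, finds the largest root by fixed-point iteration.

```python
import math


def final_size(r0, tol=1e-12, max_iter=100_000):
    """Fraction ever infected in a homogeneously mixing SIR epidemic.

    Solves z = 1 - exp(-r0 * z) by fixed-point iteration started from z = 1,
    which converges to the largest root: positive if r0 > 1, zero if r0 < 1.
    Convergence becomes very slow as r0 approaches 1.
    """
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


# Final size is zero below the threshold and rises steeply above it.
for r0 in (0.8, 1.5, 2.0, 3.0):
    print(f"R0 = {r0}: final size {final_size(r0):.3f}")
```

Below the threshold the final size collapses to zero. Above it, the final size rises quickly, to about 0.80 at $R_0 = 2$.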
During the COVID-19 pandemic, modelling has been applied more prominently and widely than ever to inform and advise public health policy; see for example the special issue of the journal Philosophical Transactions B on Modelling that shaped the early COVID-19 pandemic response in the UK. However, work during the COVID-19 pandemic has revealed not only a lack of information on some aspects of the infection itself, particularly in the early phase (e.g. pre-symptomatic and asymptomatic cases, duration of immunity, long-term disease), but also important gaps in the available modelling tools and in the theoretical understanding needed. This paper describes these gaps and defines the key challenges that, if successfully tackled, could address these shortcomings.
We build upon many of the challenges highlighted in the 2015 special issue of the journal Epidemics (vol. 10) on Challenges in Modelling Infectious Disease Dynamics. That issue included challenges related to global transmission models, meta-population and household approaches, and explicitly spatial and detailed network representations. Although such models are typically built using stochastic processes, deterministic models (which can often be viewed as, or derived as, approximations of stochastic models) can also provide critical insights (discussed in Roberts et al., 2015). Many of the challenges raised in the 2015 volume remain important for modelling the spread of SARS-CoV-2 and its variants, relating for example to pathogen mutation and evolution, multi-strain systems, inferential methods for data on the emerging phase of epidemics in structured populations, computationally efficient methods for calculating thresholds and early exponential growth rates, and designing network-based or spatial interventions. In this paper, we highlight the extent to which these challenges have been addressed in the intervening 6 years and which remain outstanding, and add new challenges seen from a mid-pandemic rather than pre-pandemic perspective.
Given the complex questions and uncertainties that arise during infectious disease emergencies, it is important to note that modelling rarely offers exact or binary answers but, rather, provides tools to enable both understanding and quantification of phenomena. Such use of models is discussed in other contributions to this special issue. For example, modelling tools can be used to develop a suite of possible answers for informing and advising policy (Hadley et al., 2021) and for designing and assessing interventions (Kretzschmar et al., 2022), including vaccination (Madewell et al., 2021). An important step in this process is the statistical estimation of key quantities (e.g. contact rates, latent and infectious periods) that characterise transmission and other aspects of disease dynamics, which is critically dependent on the availability, accessibility and reliability of data (Shadbolt et al., 2022).
The specific contribution of the current paper is to focus on the challenges for the structure, formulation and analysis of models in developing our understanding of pandemics and how to control them. Fig. 1 summarises the key areas addressed and their interconnections, and Table 1 summarises the key challenges identified. We first discuss challenges in the formulation and analysis of models from the interlinked perspectives of between-host contact processes (§1), within-host dynamics (§2) and the characteristics of the pathogen (§3). We then look beyond model formulation and analysis to challenges where modellers can play a critical role in improving both modelling and its impact on public health responses to future pandemics: better collection of data during outbreaks, including the design of testing, surveillance and contact tracing (§4); and improved capabilities for real-time decision support, together with greater openness, transparency and trust in communicating modelling results to both policy makers and the public (§5).

Between host: modelling infectious contact processes
The representation of infectious contacts underpins all dynamic transmission modelling and is therefore the central challenge in modelling future pandemics. COVID-19 has revealed significant deficiencies in our ability to model such contact processes, including the need in future pandemics to move beyond observed pre-outbreak (peacetime) contact patterns (Conlan et al., 2021), for example by re-estimating contact patterns during outbreaks (Pooley et al., 2022). Specifically, there is a need to better account for key structures such as households, schools and workplaces that are fundamental to societal organisation, and thus to disease transmission, but which, despite notable successes, have yet to receive the attention they merit (Hilton et al., 2022). For example, improvements here could better predict the impacts of interventions such as school and workplace closures. We describe key challenges for household and meta-population modelling in section 1.1. A further problem is that most models and analytical tools account for only a single geographic or demographic scale, placing significant limits on the range of scenarios, and on the scope and scale of possible interventions, that can be described (Garabed et al., 2019). Section 1.2 describes challenges for multi-scale meta-population, spatial and network models. Finally, the COVID-19 pandemic has shown not only the importance of individual and collective behaviour in response to the spread of pathogens and to public health interventions, but has further highlighted the relative inability of modelling tools to capture such behaviour (Weston et al., 2018). Section 1.3 therefore addresses the challenges of developing more useful models of the impact of human behaviour on disease transmission. An overarching challenge and opportunity is to use the huge amounts of data generated in response to COVID-19 across the world and under different interventions (Vigfusson et al., 2021; Jia et al., 2020) to address these challenges.
While they appeared less influential a decade ago, agent-based models (ABMs), also referred to as individual-based models (see e.g. Lau et al., 2017), have been employed in response to infectious disease outbreaks such as Ebola and COVID-19 (see e.g. Kiskowski and Chowell, 2016; Kerr et al., 2021; Hinch et al., 2021), and represent an attractive option for tackling many of the challenges highlighted below. Examples include models with both households and communities of households to study the spread of Ebola in West Africa (Kiskowski, 2014) or household bubbles in the context of COVID-19 (Leng et al., 2021), and models with dynamic household structure, for instance with individuals distributed across multiple dwellings (Chisholm et al., 2020) or with explicit demographic change (Geard et al., 2015). However, simplified deterministic and stochastic models remain fundamental in providing analytic insights into how key aspects of contact patterns and human behaviour affect epidemic dynamics and outcomes. Analytical approaches can also enable the development of simplified models more amenable to formal methods of statistical inference and uncertainty quantification.

Household and meta-population models
Households are a key structure in many human societies. For directly transmissible pathogens, they are a fundamental epidemiological unit because for many infectious diseases the stable and more intimate nature of contacts between household members typically translates into a significantly higher probability of transmission than with individuals outside the household. Furthermore, household sizes and compositions are also typically readily available from census data, and many control policies are targeted at households. Some of these considerations can also be extended to other kinds of stable groupings of individuals, such as workplaces, schools, etc., where contacts are closer than average.
Their societal importance, the availability of data and their role in transmission make households natural observation points for estimating transmission parameters and how individual (age, sex, occupation, etc.) and household (e.g. socio-economic status, overcrowding) properties affect susceptibility and infectivity. Optimal design of household-stratified data collection has recently attracted increased interest (Kinyanjui et al., 2016), and the COVID-19 pandemic has seen an explosion of large-scale data collection (see for example the UK Office for National Statistics, 2021). Analyses have traditionally focused on data on the final size of household outbreaks (Demiris and O'Neill, 2005), though more recent work has investigated biases in parameter estimation that arise when using data from a growing epidemic (e.g. Ball and Shaw, 2015). However, a key challenge for household models is to develop analytical tools that account for model parameters that vary over time because of behaviour change and pandemic interventions imposed and lifted in rapid succession. Further challenges arise from imperfect case ascertainment, in particular due to heterogeneity in symptoms, test-seeking behaviour and disease outcome. Final size household data can be used by separating the epidemic into suitably distinct phases. However, methodological developments are urgently needed for using data collected in real time against a background of dynamically varying population prevalence, and hence a varying risk of introduction into households, as well as for making full use of test results of household members collected over time, or other temporal data. A key challenge is thus to improve analytic understanding of the temporal dynamics of household models under variable conditions and with detailed endpoints.
Important mathematical developments in the context of models with household structure have been made over the past 25 years (e.g. Ball et al., 1997; Ball et al., 2014; Pellis et al., 2012). However, the vast majority of theoretical results are confined to time-integrated epidemiological quantities, such as those relating reproduction numbers to final size distributions or the probability of a large outbreak. Such results have been influential in informing the likely impact of specific policies, for example by providing intuition on the role of household bubbles in allowing more social interaction while limiting the increase in transmission, or on how school closure and reopening affect the network linking households and thus the household reproduction number. However, other forms of household-based interventions that received significant interest during the COVID-19 pandemic, including in- or out-of-household isolation of single individuals or quarantining of all household members following symptom-based case detection (Overton et al., 2020) or contact tracing (Fyles et al., 2021), result in changes to transmission parameters during a within-household outbreak. Obtaining theoretical results for models with non-constant parameters, even in simple cases, remains an open problem. Similarly, although analytical results for the real-time growth rate (Malthusian parameter) of the number of infected households are available (Ball et al., 2015, §6), further results on real-time non-linear epidemic dynamics in the presence of households would be valuable.
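Within-household final size distributions are among the time-integrated quantities discussed above. The minimal sketch below assumes a Reed-Frost chain-binomial model within a single household, chosen for illustration rather than taken from the cited studies, and computes the exact final size distribution numerically. It does this by tracking probability mass over (susceptibles, infectives) states one generation at a time. The function names and the `escape_prob(generation)` argument are hypothetical. Letting the per-pair escape probability depend on the generation gives a crude stand-in for interventions, such as isolation, that change transmission part-way through a household outbreak.

```python
from math import comb


def reed_frost_final_size(n_susceptible, n_initial, escape_prob):
    """Exact final-size distribution of a Reed-Frost chain-binomial household outbreak.

    escape_prob(g) is the (hypothetical) probability that a susceptible escapes
    infection from one given infective during generation g.
    Returns p, where p[k] is the probability that k of the n_susceptible
    initially susceptible household members are eventually infected.
    """
    final = [0.0] * (n_susceptible + 1)
    # Probability mass over (susceptibles remaining, infectives in this generation).
    states = {(n_susceptible, n_initial): 1.0}
    generation = 0
    while states:
        next_states = {}
        for (s, i), prob in states.items():
            if i == 0 or s == 0:
                # Outbreak over: record how many susceptibles were infected.
                final[n_susceptible - s] += prob
                continue
            # Probability that a susceptible is infected by at least one infective.
            p_inf = 1.0 - escape_prob(generation) ** i
            for new in range(s + 1):
                w = prob * comb(s, new) * p_inf ** new * (1.0 - p_inf) ** (s - new)
                key = (s - new, new)
                next_states[key] = next_states.get(key, 0.0) + w
        states = next_states
        generation += 1
    return final


def expected_size(dist):
    """Mean number of household members infected, excluding the index case(s)."""
    return sum(k * p for k, p in enumerate(dist))


# Hypothetical household: 3 susceptibles and 1 index case.
baseline = reed_frost_final_size(3, 1, lambda g: 0.7)
# Hypothetical intervention: transmission by secondary cases (generations g >= 1)
# is reduced, e.g. by isolation once the index case has been detected.
with_isolation = reed_frost_final_size(3, 1, lambda g: 0.7 if g == 0 else 0.95)
print(expected_size(baseline), expected_size(with_isolation))
```

The numerical recursion handles generation-dependent parameters without difficulty. The open problem described above concerns analytical results for such models, so this sketch illustrates the question rather than solving it.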
Extensions to the simplest household models that allow for heterogeneity in the susceptibility and infectivity of individuals in households exist (see e.g. Ball et al., 2011). Potential extensions include models with multiple types of both households and household members, together with realistic contacts and other disease-dependent parameters varying between the different types. Such models might allow for better understanding of household infection dynamics, more targeted interventions, and a more realistic description of the consequences of socio-economic differences between household types (Villela, 2021). Furthermore, such models might incorporate assortative mixing between households based on size or composition; for example, households with children are more likely to be linked with each other through schools, and are likely to be larger than average. Effects of interventions such as school closures could be better captured in such models than in models where household connections are uncorrelated. There has been some work on the impact of assortativity in the context of networks (see e.g. Newman, 2002; Ball et al., 2013), but it remains a challenge for models with household structure.
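One possible way to summarise assortative mixing between household types is Newman's assortativity coefficient for mixing by discrete type. The sketch below computes this coefficient from a hypothetical matrix of contacts between two household types, for example households with and without children. It illustrates the measure only. It is not a method taken from the cited studies, and the function name is hypothetical.

```python
def assortativity_coefficient(mixing):
    """Assortativity coefficient for mixing by discrete type (after Newman).

    mixing[i][j] is proportional to the number of contacts between units of
    type i and type j; it is normalised internally to a joint distribution e.
    r = (trace(e) - sum_i a_i * b_i) / (1 - sum_i a_i * b_i), with a and b the
    row and column marginals: r = 1 for perfectly assortative mixing and
    r = 0 for proportionate (random) mixing. Undefined for a single type.
    """
    total = sum(sum(row) for row in mixing)
    e = [[x / total for x in row] for row in mixing]
    n = len(e)
    trace = sum(e[i][i] for i in range(n))
    a = [sum(e[i]) for i in range(n)]                       # row marginals
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]  # column marginals
    expected = sum(a[i] * b[i] for i in range(n))
    return (trace - expected) / (1.0 - expected)


# Hypothetical contact counts between households with children (type 0)
# and households without children (type 1).
print(assortativity_coefficient([[30, 10], [10, 50]]))
```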
From a practical point of view, understanding the role of household structure in transmission dynamics, epidemic outcome and the impact of interventions is crucial. Pellis et al. (2020) develop model comparison approaches for an emerging epidemic with two age classes to assess the importance of household structure, but extending such approaches to more general settings and models remains a challenge. Models with further structure beyond single households might also be both of practical interest and amenable to mathematical analysis. Recent advances have been made on models with hierarchical structures (e.g. Gandolfi and Cecconi, 2016). Some work also exists where other small mixing groups, e.g. schools or workplaces, overlap with households (e.g. Pellis et al., 2011). However, overlap creates dependencies between units, and the techniques used in these studies (Ball et al., 2014), which essentially assume that overlap is restricted to one individual, would profit from generalisation. Ultimately, data on the structure of households and on their interconnections with each other and with wider society should inform such modelling.
[Figure text: Enhancements of models and associated analytical tools (see challenges in §1, §2 and §3) and greater engagement with public health stakeholders (§4 and §5) are mutually reinforcing and will ultimately lead to more effective societal and public health responses to future pandemics.]
[Table 1: Summary of challenges for the structure, formulation, and analysis of models used to understand pandemics and how to control them.]

Multi-scale models: meta-population, spatial and network
Britton et al. (2015) posed the question, "Is the classification into global, network, meta-population and spatial models sufficient for the range of contact structures of interest in understanding infectious disease dynamics?" Here we argue for the need to understand hybrid models that combine, or bridge between, meta-population, social network and spatial structures.
As argued above, meta-population structures of households and workplaces have proved crucial in understanding and controlling COVID-19, as have detailed individual-level models based on networks captured by global positioning system (GPS) tracking that reveal local spatial structure. Moreover, at large scales real-world disease incidence shows substantial spatial variation. Attempts to address such issues have made use of spatially explicit ABMs (Lau et al., 2017; Kerr et al., 2021), including cases described above where household structures are embedded within broader contexts (e.g. Kiskowski and Chowell, 2016).
A significant challenge is to develop approaches to analyse and simplify such models to enable greater understanding of disease dynamics and control including vaccination (see e.g. Ball and Sirl, 2018;Ball et al., 2010). Methods such as moment-closure and pair approximations are useful in developing simplified representations and analytic results in spatial and network models (Barnard et al., 2019). Another promising approach is to develop systematic methods (KhudaBukhsh et al., 2019) to enable coarse-graining of networked individual-based models to generate more tractable representations. Although it is understood that higher-order network structure impacts epidemic outcomes (Ritchie et al., 2014), an important challenge is to understand what properties of real world networks impact the accuracy of analytic results obtained under different assumptions (Silva et al., 2020;Wu and Hadzibeganovic, 2020). The concept of universality classes (Chung et al., 2016) may be useful in classifying epidemic dynamics on real world networks.
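A classical analytic network result shows how conclusions can depend on network properties. For SIR-type spread on a configuration-model network, a large outbreak is possible only if the per-edge transmissibility $T$ exceeds $T_c = \langle k \rangle / (\langle k^2 \rangle - \langle k \rangle)$, where $\langle k \rangle$ and $\langle k^2 \rangle$ are the first two moments of the degree distribution. The sketch below computes $T_c$ from a degree sequence. It assumes a locally tree-like network with no clustering, which is exactly the kind of assumption whose validity for real-world networks is in question here. The function name and example degree sequences are hypothetical.

```python
def critical_transmissibility(degrees):
    """Epidemic threshold T_c = <k> / (<k^2> - <k>) for SIR-type spread on a
    configuration-model network with the given degree sequence.

    Assumes a locally tree-like network (clustering ignored). Returns inf when
    no giant component can form, so no large outbreak is possible for any T.
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    excess = mean_k2 - mean_k
    if excess <= 0:
        return float("inf")
    return mean_k / excess


regular = [5] * 1000                  # everyone has 5 contacts
heterogeneous = [2] * 500 + [8] * 500  # same mean degree, higher variance
print(critical_transmissibility(regular), critical_transmissibility(heterogeneous))
```

With the same mean degree, greater heterogeneity in degree lowers the threshold (0.25 compared with about 0.17 in this example). Clustering, which this formula ignores, would typically change the threshold again.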
Analysis of models that bridge the gap between meta-population, network and spatial representations of contact processes is rare. One example of analytical work on combined meta-population and network models uses moment closure to develop simplified representations and analytic results suggesting that detailed case-reporting data can be informative about connectivity between meta-populations (Meakin and Keeling, 2019). In a different direction, Haw et al. (2020) develop and analyse the output of hybrid models that combine interactions on social networks with spatial movement, showing that they can give rise to sub-exponential outbreak dynamics with lower, later epidemic peaks that are hard to explain with more standard models. A potential route to greater analytic understanding of such models might be to exploit links between spatial models and more general network representations. The spatial structure of human infectious contacts, and how it is affected by interventions, is not well understood. There is an unexplored part of model space between strictly spatial models, where the k-neighbourhood of an individual (the number of individuals that can be reached in a chain of k successive contacts) is of order $k^2$, and homogeneous-mixing models, where it is of order $e^k$. These correspond to networks in which the typical distance between individuals varies from the square root to the logarithm of population size. Initial work in this area shows that networks with local clustering are effectively homogeneous-mixing in their spatial structure (Mollison, 2004). Promising work has shown how to develop analytic approaches for models that combine spatial features of scale-free networks with nested community structure (Gandolfi and Cecconi, 2016). More recent work has shown how to embed networks within a spatial structure to better explore the impact of controls such as social distancing and travel restrictions (Jorritsma et al., 2020).
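The contrast between $k^2$ and $e^k$ growth of the k-neighbourhood can be checked numerically. The minimal sketch below counts the individuals within k contact steps in two settings: on a two-dimensional lattice, standing in for a strictly spatial model, and on a random configuration-model network with the same degree, standing in for homogeneous mixing. All function names and network sizes are hypothetical choices made for illustration.

```python
import random
from collections import deque


def lattice_torus(width):
    """Nearest-neighbour (von Neumann) contacts on a width x width torus."""
    nbrs = {}
    for x in range(width):
        for y in range(width):
            nbrs[(x, y)] = [((x + 1) % width, y), ((x - 1) % width, y),
                            (x, (y + 1) % width), (x, (y - 1) % width)]
    return nbrs


def random_regular_like(n, degree, seed=0):
    """Configuration-model pairing of stubs. Self-loops and repeated edges are
    dropped, so a few individuals end up with slightly fewer than `degree` contacts."""
    rng = random.Random(seed)
    stubs = [v for v in range(n) for _ in range(degree)]
    rng.shuffle(stubs)
    nbrs = {v: set() for v in range(n)}
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs


def k_neighbourhood_sizes(nbrs, source, k_max):
    """Number of individuals reachable in at most k contact steps, for k = 0..k_max."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search truncated at depth k_max
        v = queue.popleft()
        if dist[v] == k_max:
            continue
        for w in nbrs[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    counts = [0] * (k_max + 1)
    for d in dist.values():
        counts[d] += 1
    for k in range(1, k_max + 1):  # convert to cumulative counts
        counts[k] += counts[k - 1]
    return counts


lattice = lattice_torus(101)
network = random_regular_like(10_000, 4, seed=1)
print("lattice:", k_neighbourhood_sizes(lattice, (50, 50), 6))
print("random network:", k_neighbourhood_sizes(network, 0, 6))
```

On the lattice, the cumulative count is exactly $2k^2 + 2k + 1$ while k stays well below half the lattice width. On the random network, it grows by roughly a factor of (degree - 1) per step until the finite population is exhausted.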
Tractable models that better represent mixing at a range of scales e.g. in human populations, would help address the challenge of better understanding and prediction of how epidemics spread and how individual and public health responses to them could best limit the impact of outbreaks and reduce persistence of endemic disease.

Behaviour and contact processes
Behavioural responses significantly increase the challenge of assessing the impact of public health interventions (Michie and West, 2020). Throughout the COVID-19 pandemic, models have been widely used to predict the impact of non-pharmaceutical interventions, or NPIs (Flaxman et al., 2020). NPIs are measures that reduce ongoing transmission and limit future transmission through social distancing (including lockdowns) and, in the case of respiratory infections, the use of face masks. Assessment of NPIs is further complicated by the fact that awareness of disease spread itself, or simply the knowledge that NPIs are being discussed, may alter contact behaviour (Zhou et al., 2020). Kretzschmar et al. (2022) provide a broad discussion of the factors that should be accounted for and of the range of challenges associated with modelling NPIs. Here we therefore focus on the major methodological and technical developments needed to support such efforts.
Currently the typical approach to assessing the impact of NPIs or other behavioural responses to outbreaks is to modify parameters of epidemic models to capture resulting or anticipated changes e.g. reducing contact rates to represent social distancing measures. This is reliant on expert judgment about the impact of NPIs or rich sources of data on observed responses, ideally under NPIs, to allow robust calibration with uncertainties in predictions quantified. This is because behavioural studies do not directly measure changes in transmission rates but rather focus on proxies e.g. changes in movement patterns. A challenge here is to develop methods that use transmission models and the observed epidemic to quantify changes in contact patterns that are robust to uncertainties and gaps in real world data. Such empirical approaches (see e.g. Pooley et al., 2022) offer a valuable quantitative framework, but are currently phenomenological and only explore direct impacts of known and anticipated responses to pandemics and NPIs. For example, to the best of our knowledge they currently do not account for compensatory behaviours.
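The parameter-modification approach described above can be sketched as follows. This is a minimal example using assumed values: a deterministic SEIR model in which a hypothetical NPI multiplies the transmission rate by (1 - reduction) during a fixed time window. The function name, parameter values and the choice of forward Euler integration are all illustrative assumptions.

```python
def seir_attack_rate(r0, reduction, start, end, latent=3.0, infectious=5.0,
                     days=365, dt=0.05, i0=1e-4):
    """Fraction ever infected in a deterministic SEIR model (population fractions)
    in which a hypothetical NPI multiplies the transmission rate by (1 - reduction)
    for start <= t < end. Forward Euler integration; all defaults are illustrative.
    """
    beta0 = r0 / infectious  # in this model R0 = beta * mean infectious period
    s, e, i = 1.0 - i0, 0.0, i0
    for step in range(round(days / dt)):
        t = step * dt
        beta = beta0 * (1.0 - reduction) if start <= t < end else beta0
        infection = beta * s * i * dt   # S -> E
        onset = e / latent * dt         # E -> I
        recovery = i / infectious * dt  # I -> R
        s -= infection
        e += infection - onset
        i += onset - recovery
    return 1.0 - s


for label, args in [("no NPI", (0.0, 0, 0)),
                    ("50% contact reduction, days 30-120", (0.5, 30, 120)),
                    ("70% reduction throughout", (0.7, 0, 365))]:
    print(label, round(seir_attack_rate(2.5, *args), 3))
```

The NPI enters only as a multiplier on the transmission rate. As a result, the sketch shares the limitation noted above: it cannot represent compensatory behaviour or contact changes that respond to the epidemic itself.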
Improved understanding and prediction in pandemics, large-scale outbreaks and endemic disease scenarios requires better quantification of behaviour (Weston et al., 2018) and of the dynamics of social systems. These dynamics include movement patterns at various scales associated with different activities, e.g. work and leisure, including how these are affected by public health interventions, business responses and individual behaviour change, in direct and indirect response to a pandemic. Anonymised mobile-phone call detail records (CDRs) have been used to show behavioural change of infected individuals (Vigfusson et al., 2021). This poses challenges and opportunities for standard disease transmission modelling, which must account for such behavioural changes and for heterogeneity in these responses across populations due to socio-economic factors, e.g. those that limit opportunities to self-isolate (Bharti, 2021; Gauvin et al., 2021). These complexities are further compounded by large-scale changes in society in response to the perception of disease outbreaks and to public health interventions. A challenge here is how to use available data at different levels of granularity, from 'big data' tracking individuals to aggregate societal data such as population flows (Jia et al., 2020), transport usage and retail sales, to develop sufficiently predictive models that can anticipate changes in contact patterns, for example when pubs, bars and restaurants close (Tang, 2020). In recent years there has been significant interest in game-theoretic approaches to understand and predict behavioural responses to disease threats (Chang et al., 2020). Another potential way forward is the development of mechanistic models of dynamic contact processes that account for constraints on individual behaviour using individual propensities for types of contact behaviour (Knight et al., 2022).
Such models may be amenable to analysis and parameterisation using big data describing historic movement patterns (for moves in this direction see e.g. Knight et al., 2021). Treatment of such models as dynamic networks may offer a fruitful approach to develop better understanding of behavioural responses and effect of social distancing measures (Valdano et al., 2018;Barnard et al., 2018;Ball et al., 2019).

Within-host complexity: beyond SEIR models
There is increasing recognition that within-host dynamics can play a critical role in disease transmission and control, and furthermore that they may be exploited to extract greater value from existing data. COVID-19 has amply illustrated the need to move beyond standard models such as SEIR to capture aspects of within-host dynamics that are critical to population-scale pandemic outcomes, for example by distinguishing symptomatic and asymptomatic pathways to better understand the force of infection. Such considerations are directly relevant to public health measures, since asymptomatic infectors will reduce the efficacy of control programmes that largely target symptomatic individuals, e.g. through isolation or quarantine. A better understanding of the dynamics and distribution of immunity in populations resulting from both outbreaks and vaccination will allow better design of future vaccination strategies. Furthermore, modelling of individual viral loads and even immune response dynamics opens up the ability to exploit the non-binary nature of diagnostic test results to better characterise outbreak dynamics, potentially even exploiting purely cross-sectional data.
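A back-of-the-envelope calculation makes the point about asymptomatic infectors concrete. In the sketch below, the reproduction number is split between symptomatic and asymptomatic infectors, and a control measure removes a fraction of symptomatic transmission only. The parameter names, values and the simple proportional split are all hypothetical.

```python
def controlled_r(r0, frac_asymptomatic, rel_infectiousness_asym, isolation_effect):
    """Reproduction number when control (e.g. isolation) targets symptomatic cases only.

    Hypothetical parameterisation: r0 is shared between asymptomatic and
    symptomatic infectors in proportion to (share of infections) x (relative
    infectiousness), and a fraction `isolation_effect` of symptomatic
    transmission is removed. Any pre-symptomatic transmission that escapes
    isolation should be reflected in a lower isolation_effect.
    """
    weight_asym = frac_asymptomatic * rel_infectiousness_asym
    weight_sym = 1.0 - frac_asymptomatic
    share_asym = weight_asym / (weight_asym + weight_sym)
    r_asym = r0 * share_asym
    r_sym = r0 * (1.0 - share_asym)
    return r_asym + (1.0 - isolation_effect) * r_sym


print(controlled_r(2.5, 0.0, 0.5, 0.6))  # no asymptomatic infectors
print(controlled_r(2.5, 0.4, 0.5, 0.6))  # 40% asymptomatic, half as infectious
```

With no asymptomatic infection, removing 60% of symptomatic transmission brings $R$ from 2.5 down to 1. If 40% of infections are asymptomatic and half as infectious, the same measure only brings $R$ down to about 1.4.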

Better disease progression models
Although the adoption of discrete states (e.g. the susceptible, exposed, infectious and recovered states of the SEIR model) has been extraordinarily successful in epidemiological modelling, these states must be adapted to particular conditions or properties of the focal disease, such as pre-symptomatic (Anderson et al., 2020) and asymptomatic (Gandhi et al., 2020) infectivity, degrees of severity of symptoms (Verity et al., 2020) and different levels of subsequent immunity. This requires a judicious synthesis of clinical and epidemiological observations, especially in the early phases of a new epidemic. Ideally, modelled states should correspond to distinct and measurable clinical conditions, but this is rarely the case. The challenge is to parameterise and identify such models through inference that exploits longitudinal data, individual-based stochastic models and other proxy information such as viral load. Key issues include reducing, or otherwise dealing with, the computational complexity of inference, especially for model selection, so as to extract as much information as possible from the available data.
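One standard way to adapt discrete states is the "linear chain trick". A single SEIR compartment implies an exponentially distributed duration. Splitting the compartment into $n$ sequential sub-stages with the same total mean instead gives an Erlang (gamma) distributed duration with variance $\text{mean}^2/n$. The minimal sketch below, with a hypothetical function name and illustrative values, computes the probability of still being in the compartment at time $t$.

```python
import math


def erlang_survival(t, stages, mean):
    """Probability of still being in a compartment (e.g. infectious) at time t
    after entry, when its duration is split into `stages` sequential exponential
    sub-stages with the same total mean (the linear chain trick).

    The duration is Erlang distributed with variance mean**2 / stages;
    stages = 1 recovers the exponential duration of a single SEIR compartment.
    """
    rate = stages / mean  # exit rate of each sub-stage
    x = rate * t
    return math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(stages))


# Hypothetical infectious period with mean 5 days: chance of remaining infectious at day 10.
for k in (1, 2, 4, 8):
    print(k, round(erlang_survival(10.0, k, 5.0), 4))
```

As the number of sub-stages increases, the duration becomes more concentrated around its mean. This suppresses the long tail that a single exponential compartment produces, at the cost of one extra state per sub-stage.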
An alternative to a small number of discrete states is to describe the progression of infectivity by an infectivity profile, which is directly coupled to the generation time distribution (Roberts and Heesterbeek, 2007; Wallinga and Lipsitch, 2007). This kind of description emphasises the continuous nature of disease progression. Adding similar descriptions of symptoms, severity of disease and development of immunity would constitute further improvements. Recent work on inferential questions (e.g. Britton and Scalia Tomba, 2019) has also highlighted the need to describe joint properties of disease states, such as joint distributions of latent, incubation and infectious periods, which lead to related inferential challenges. Coupling within-host disease progression models with measurements of viral load could provide information on time of infection and on infectivity profiles, and thus inform the dynamics of outbreaks from cross-sectional data (Hay et al., 2021; Rydevik et al., 2016). For example, the association between viral load and the strength and period of transmission across the population could be untangled. This would enable a better understanding of how the development of an early, exponentially growing epidemic differs from that of a developed but fragmented epidemic in heterogeneous populations (Lythgoe et al., 2013).
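The coupling between the generation time distribution and early epidemic growth can be illustrated with the relation $R = 1/M(-r)$ of Wallinga and Lipsitch (2007). Here $M$ is the moment generating function of the generation interval and $r$ is the exponential growth rate. The minimal sketch below evaluates this relation numerically for a generation interval discretised on a grid. The gamma form of the interval and the function names are hypothetical choices. For a gamma distribution with shape $a$ and scale $\theta$, the result should agree with the closed form $R = (1 + r\theta)^a$.

```python
import math


def r_from_growth_rate(growth_rate, generation_pmf):
    """Reproduction number implied by exponential growth rate r and a discretised
    generation-interval distribution {tau: weight}, via R = 1 / sum g(tau) exp(-r tau)."""
    total = sum(generation_pmf.values())  # normalise in case the grid is truncated
    mgf = sum(w * math.exp(-growth_rate * tau) for tau, w in generation_pmf.items()) / total
    return 1.0 / mgf


def discretised_gamma(shape, scale, step=0.01, t_max=60.0):
    """Gamma density on a grid of midpoints (hypothetical generation interval)."""
    pmf = {}
    for j in range(int(t_max / step)):
        tau = (j + 0.5) * step
        density = tau ** (shape - 1) * math.exp(-tau / scale) / (math.gamma(shape) * scale ** shape)
        pmf[tau] = density * step
    return pmf


# Generation interval with mean 5 days (shape 2, scale 2.5) and growth rate 0.1 per day.
g = discretised_gamma(2.0, 2.5)
print(r_from_growth_rate(0.1, g), (1 + 0.1 * 2.5) ** 2.0)
```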

Immunity and vaccination
Further work is required to account for the dynamic distribution of immunity across heterogeneous populations resulting from both transmission and vaccination. For example, it remains a challenge to analyse and describe how heterogeneity in the population (e.g. in terms of innate and acquired immunity) affects the distribution of immunity after an epidemic or a wave of an outbreak, and which heterogeneities should be taken into account (Gomes et al., 2022). Some work has been done on the impact of multi-type populations (Britton et al., 2021). This work shows that immunity caused by an earlier wave of an epidemic is distributed over the population in a substantially more efficient way than immunity obtained through vaccination programmes that do not target those making disproportionately many potentially infectious contacts. Realistic modelling of waning immunity, and possibly of the boosting of previously acquired immunity via exposure to infection and vaccination, remains an active area of research (Cohen et al., 2022) linking within- and between-host dynamics (Heffernan and Keeling, 2009).
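The multi-type result described above can be illustrated in a minimal two-group sketch that assumes proportionate mixing between a low-contact group and a high-contact group. The group sizes, activity levels and function name are hypothetical. The sketch solves the multi-type final size equations. It then compares the reproduction number remaining after the epidemic with the reproduction number after uniformly vaccinating the same overall fraction of the population.

```python
import math


def two_group_immunity(r0, pop_frac, activity, tol=1e-12, max_iter=100_000):
    """Compare epidemic-induced with uniform vaccine-induced immunity under
    proportionate mixing (illustrative assumptions throughout).

    Group j has population fraction pop_frac[j] and relative contact activity
    activity[j]. The next-generation matrix K[i][j] = r0 * a_i * a_j * pi_j / S,
    with S = sum_k a_k**2 * pi_k, is rank one, so its spectral radius equals its
    trace: r0 before immunity, r0 * sum_j a_j**2 * pi_j * s_j / S afterwards,
    where s_j is the fraction of group j still susceptible.
    """
    s_norm = sum(a * a * p for a, p in zip(activity, pop_frac))
    z = [1.0] * len(pop_frac)  # final attack rates per group, iterated down from 1
    for _ in range(max_iter):
        # Final size equations: z_i = 1 - exp(-sum_j K[i][j] * z_j).
        force = r0 * sum(a * p * zj for a, p, zj in zip(activity, pop_frac, z)) / s_norm
        z_new = [1.0 - math.exp(-a * force) for a in activity]
        converged = max(abs(x - y) for x, y in zip(z_new, z)) < tol
        z = z_new
        if converged:
            break
    immune = sum(p * zj for p, zj in zip(pop_frac, z))
    r_after_epidemic = r0 * sum(a * a * p * (1.0 - zj)
                                for a, p, zj in zip(activity, pop_frac, z)) / s_norm
    r_after_uniform_vaccination = r0 * (1.0 - immune)
    return immune, r_after_epidemic, r_after_uniform_vaccination


# Hypothetical population: 80% low-contact (activity 0.5), 20% high-contact (activity 3).
print(two_group_immunity(2.0, [0.8, 0.2], [0.5, 3.0]))
print(two_group_immunity(2.0, [0.5, 0.5], [1.0, 1.0]))  # homogeneous comparison
```

In the heterogeneous example, the epidemic infects the high-activity group disproportionately, so the remaining reproduction number is far lower than after uniform vaccination of the same fraction. With equal activities, the two values coincide.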
Mathematical models can help in the design of optimal vaccination programmes and in assessing their effectiveness. However, work is needed to develop models able to identify optimal distribution strategies and vaccination thresholds when resources, e.g. doses, are limited in terms of both the overall quantities available and the rates of supply. Model-based searches for optimal strategies are computationally intensive, especially in the absence of precise estimates or prior knowledge about multiple aspects of the epidemic, including transmission rates, vaccine efficacy and mode of action. Depending on the specific questions asked, several issues must be accounted for. These include granularity in the population, for instance in terms of age or risk groups and localities. They also include different vaccine modes of action, e.g. transmission blocking vs reducing disease severity (Hodgson et al., 2021), especially when transmission and serious disease are distributed differently across the population (i.e. when these are negatively correlated). Finally, they include different vaccine efficacies or dosing schedules (e.g. a requirement for one vs two doses). There are numerous further challenges in improving the role of mathematical models in: enabling vaccine efficacy to be better estimated from surveillance data; exploring interactions between vaccination, disease-induced immunity and NPIs, for instance to identify possible roadmaps towards the lifting of restrictions (Whittles et al., 2021); and developing a better understanding of the potential for vaccine escape and capturing the effect of variation in vaccine efficacy across variants (Day et al., 2020a). See Kretzschmar et al. (2022) and Madewell et al. (2021) for further discussion of vaccination challenges.
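Under strong simplifying assumptions, the limited-dose problem has a simple solution. The sketch below assumes proportionate mixing, an all-or-nothing transmission-blocking vaccine and a single delivery of doses, so rates of supply are ignored. The function names and group parameters are hypothetical. Under these assumptions, the post-vaccination reproduction number is linear in each group's coverage, and the benefit of a dose is proportional to the square of the recipient group's activity. The allocation that minimises the reproduction number is therefore a fractional knapsack: fill groups in decreasing order of activity.

```python
def r_effective(r0, pop_frac, activity, coverage, efficacy=1.0):
    """Post-vaccination reproduction number under proportionate mixing with an
    all-or-nothing, transmission-blocking vaccine (illustrative assumptions).
    The rank-one next-generation matrix has spectral radius equal to its trace."""
    s_norm = sum(a * a * p for a, p in zip(activity, pop_frac))
    return r0 * sum(a * a * p * (1.0 - efficacy * c)
                    for a, p, c in zip(activity, pop_frac, coverage)) / s_norm


def allocate_doses(doses, pop_frac, activity):
    """Split `doses` (as a fraction of the whole population) across groups to
    minimise r_effective: a dose given to group j lowers R by an amount
    proportional to activity[j] ** 2, so groups are filled in decreasing order
    of activity (a fractional knapsack)."""
    coverage = [0.0] * len(pop_frac)
    remaining = doses
    for j in sorted(range(len(pop_frac)), key=lambda j: activity[j], reverse=True):
        if remaining <= 0.0:
            break
        given = min(remaining, pop_frac[j])
        coverage[j] = given / pop_frac[j]
        remaining -= given
    return coverage


pop_frac, activity = [0.8, 0.2], [0.5, 3.0]  # hypothetical low- and high-contact groups
targeted = allocate_doses(0.1, pop_frac, activity)
uniform = [0.1, 0.1]  # the same number of doses spread evenly
print(targeted, r_effective(2.0, pop_frac, activity, targeted),
      r_effective(2.0, pop_frac, activity, uniform))
```

This allocation minimises transmission only. If severe disease is concentrated in low-contact groups, an allocation that minimises deaths could prioritise quite differently, which is the trade-off noted above.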

A pathogen perspective
If you know yourself but not the enemy, for every victory gained you will also suffer a defeat (Sun Tzu, The Art of War).
A key lesson from COVID-19 is that pathogen characteristics are critical in determining the course of pandemics. The case fatality rate associated with SARS-CoV-2 spurred action, but its propensity to generate asymptomatic infectious cases drove spread and undermined early attempts at control. COVID-19 has also highlighted the role of environmental transmission, and future pandemics may be driven by pathogens with greater environmental persistence. Climate change is likely to increase risks from vector-borne pathogens as environmental conditions allow vector distributions to spread or shift, either to overlap with populations not currently exposed or to increase exposure in areas already affected (Ryan et al., 2019). In addition, heterogeneity in the pathogen population enables emergence and selection of strains with higher transmission ability (e.g. a greater fraction of asymptomatic cases), and possibly also increased virulence and the ability to evade vaccines. Below we highlight challenges and opportunities for better modelling of pathogen heterogeneity, and the need to consider pathogen environmental persistence and transmission.

Multiple pathogen strains: from neutrality to selection
An urgent challenge for epidemiological modelling is to understand when and how to represent the dynamics of pathogen heterogeneity, e.g. multiple pathogen strains. This is driven by the need to understand the evolution of more virulent strains, vaccine escape, and heterogeneity in response to vaccination, treatment and control. It is currently very difficult to predict when such threats will emerge, but we argue that effective modelling of pathogen heterogeneity should play an increasing role in understanding pathogen strain dynamics in future outbreaks, to better guide control measures. Such modelling could also help to understand potential interactions between pathogen strains and heterogeneities in host populations, e.g. those that lead to disproportionate effects on particular socio-economic groups, ethnicities or genders.
Fortunately, improved modelling of pathogen heterogeneity is greatly facilitated by the increased resolution with which molecular biology tools allow us to observe pathogens. An important opportunity and challenge is to use such data to inform transmission models and provide improved and more timely information for public health response. A standard approximation is to assume that mutations are neutral, and therefore that transmission is unaffected by pathogen strain (Frost et al., 2015). Under this assumption, phylogenetic data on pathogens can inform inference of contact networks and be combined with standard epidemiological observations (Volz and Frost, 2013; Lau et al., 2015). However, significant challenges remain in embedding phylodynamics (Grenfell et al., 2004) within disease transmission models. In particular, current approaches do not adequately account for within-host pathogen diversity, host immunity and pathogen load, or selective pressure amongst competing strains (Lau et al., 2019; Metcalf et al., 2015; Wikramaratna et al., 2015).
More work is needed to assess the impact on pathogens of evolutionary pressures imposed by vaccination and other control campaigns (Read et al., 2015). Currently such problems are tackled using models that focus on the potential for invasion of a variant in the presence of a dominant strain, or that model a fixed and typically small number of competing strains (Day et al., 2020a). Greater flexibility is afforded by ABMs that represent multiple strains, but these come with significant computational and analytic challenges. Tools from quantitative genetics may prove useful in developing analytical insight into such problems (Day and Gandon, 2007; Day et al., 2020b). There is increasing recognition of the need to couple evolutionary and ecological dynamics (Lion, 2018), and in some cases pathogenicity is known to be determined by ecological interactions within the microbiome (Busby et al., 2016; Zumbrun et al., 2013). A further challenge is therefore the development of models that better quantify the ecology of microbes, including environmental factors, to understand when and how they become pathogenic. Addressing host genetics and other individual differences, e.g. those that may affect susceptibility, infectivity and recovery (Pooley et al., 2020), would add further complexity (Frost et al., 2015) and points toward multi-scale models that represent both within- and between-host pathogen dynamics. Such models may nonetheless be needed to adequately address the interaction between existing population heterogeneities and future pandemics.
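The invasion-analysis approach mentioned above can be sketched for its simplest case. The sketch assumes a hypothetical SIR model with births and deaths at rate mu, in which a resident strain sits at its endemic equilibrium. A rare variant invades if its initial per-capita growth rate is positive. The cross-immunity parameter `sigma` is an assumption of this sketch: 0 means full cross-protection and 1 means complete escape. It lets the variant also infect hosts recovered from the resident strain, a crude stand-in for immune or vaccine escape. Coinfection and waning are ignored.

```python
def resident_equilibrium(beta, gamma, mu):
    """Endemic equilibrium (S*, I*, R*), as population fractions, of an SIR model
    with births and deaths at rate mu."""
    r0 = beta / (gamma + mu)
    if r0 <= 1.0:
        return 1.0, 0.0, 0.0              # resident strain cannot persist
    s = 1.0 / r0
    i = mu * (1.0 - s) / (beta * s)       # from dS/dt = mu - beta*S*I - mu*S = 0
    return s, i, 1.0 - s - i

def invasion_growth_rate(beta_v, gamma_v, mu, equilibrium, sigma=0.0):
    """Initial per-capita growth rate of a rare variant (positive => invasion).
    With sigma = 0 this reduces to the classic criterion R0_variant > R0_resident."""
    s, _, r = equilibrium
    return beta_v * (s + sigma * r) - (gamma_v + mu)

MU = 1.0 / (70 * 365)                     # hypothetical 70-year lifespan; time in days
eq = resident_equilibrium(beta=0.5, gamma=0.2, mu=MU)

less_transmissible = invasion_growth_rate(0.45, 0.2, MU, eq)         # fails to invade
more_transmissible = invasion_growth_rate(0.55, 0.2, MU, eq)         # invades
escape_variant = invasion_growth_rate(0.45, 0.2, MU, eq, sigma=0.5)  # escape offsets lower transmissibility
```

The last line shows why fixed-strain analyses can mislead. A variant that is less transmissible in a naive population can still spread if it partially escapes existing immunity.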

Environmentally persistent pathogens and indirect transmission
Many pathogens persist in the environment, necessitating modelling of indirect (environmental) transmission in addition to, or instead of, direct transmission resulting from contacts between infectious and susceptible individuals. Future pandemics may be caused by pathogens that are more persistent in the environment than SARS-CoV-2, and there are numerous challenges associated with modelling the resulting environmental transmission (Hollingsworth et al., 2015). In simple scenarios, direct transmission models can accurately represent epidemic outbreaks of environmentally transmitted pathogens as long as there is no significant timescale separation between the host infectious period and the environmental persistence of the pathogen (Benson et al., 2021). However, caution is needed, since for pathogens that persist in the environment for long periods the behaviour of direct and indirect transmission models may be markedly different, e.g. environmentally transmitted diseases can re-emerge even when no infectious individuals remain. Thus, unaccounted-for environmental transmission is likely to affect the evaluation of control measures. Furthermore, environmental transmission is likely to increase connectivity relative to the observed direct contact process, by broadening the effective, and now directional, contact network. For example, two individuals who visit a given location at different times may nonetheless effectively have been in contact, but the earlier visitor can only transmit to the later one and not vice versa. A key challenge therefore is to develop modelling approaches that account for such differences implicitly or explicitly, and methods that enable integration of environmental pathogen load measurements (Wade et al., 2022) into transmission modelling. Other major challenges include integrating understanding of local environmental transmission in and across a range of settings (Morawska et al., 2021; Wang et al., 2021) within, e.g., city or national scale modelling of disease dynamics. A promising development is models that enable calculation of indices describing environmental exposure relative to some reference scenario (Jones et al., 2021), but it remains a challenge to quantify the overall force of infection.
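The timescale argument can be checked with a minimal sketch. It compares a direct-transmission SIR model with a hypothetical SIW model in which infection occurs only via an environmental pathogen pool W. W is shed by infectious hosts and decays at a given rate. The environmental transmission rate is chosen so that both models have the same force of infection whenever W sits at its quasi-steady value. All parameter values are illustrative. The two persistence times are chosen to be much shorter and much longer than the 4-day infectious period.

```python
import numpy as np

def rk4(f, y0, t_end, dt):
    """Fixed-step fourth-order Runge-Kutta; returns the state at every step."""
    y = np.array(y0, float)
    path = [y.copy()]
    for _ in range(int(round(t_end / dt))):
        k1 = f(y)
        k2 = f(y + 0.5 * dt * k1)
        k3 = f(y + 0.5 * dt * k2)
        k4 = f(y + dt * k3)
        y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(y.copy())
    return np.array(path)

def direct_sir(beta, gamma):
    def f(y):
        s, i = y
        return np.array([-beta * s * i, beta * s * i - gamma * i])
    return f

def environmental_siw(beta, gamma, shed, decay):
    """Infection only via W; beta_w is set so that W = shed*I/decay reproduces beta*I."""
    beta_w = beta * decay / shed
    def f(y):
        s, i, w = y
        new = beta_w * s * w
        return np.array([-new, new - gamma * i, shed * i - decay * w])
    return f

BETA, GAMMA, SHED, I0, T_END, DT = 0.5, 0.25, 1.0, 1e-3, 150.0, 0.02  # hypothetical; R0 = 2
i_direct = rk4(direct_sir(BETA, GAMMA), [1 - I0, I0], T_END, DT)[:, 1]

def max_gap(decay):
    """Largest difference in prevalence I(t) between the SIW and direct models."""
    y0 = [1 - I0, I0, SHED * I0 / decay]  # start W at its quasi-steady value
    i_env = rk4(environmental_siw(BETA, GAMMA, SHED, decay), y0, T_END, DT)[:, 1]
    return float(np.max(np.abs(i_env - i_direct)))

gap_short_persistence = max_gap(decay=20.0)  # pathogen survives ~0.05 day
gap_long_persistence = max_gap(decay=0.05)   # pathogen survives ~20 days
```

With short persistence the SIW curve closely tracks the direct model. With long persistence every transmission is delayed by time spent in the environment. Growth is then much slower and the two models diverge, although both share the same R0.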
A critical class of strongly environmentally persistent pathogens are vector-borne diseases, such as malaria and infections by arboviruses (DENV, CHIKV, ZIKV), which lead to endemic disease levels marked by seasonality, usually driven by vector abundance. Modelling endemicity in these settings requires analytical treatment of seasonal oscillations, which must be taken into account, for instance, when evaluating interventions (Bacaër and Guernaoui, 2006; Griffin, 2015). Seasonal variation also appears in the transmission cycle of mosquito-borne infections, for instance through the temperature-dependent extrinsic incubation period, during which parasites remain latent in infected mosquitoes, with a direct impact on the generation time of the disease. The consequences of a time-varying generation time have been examined for the estimation of the effective reproduction number (Codeço et al., 2018; Siraj et al., 2017). However, further characterisation of seasonal effects, including modelling of other environment-dependent biological mechanisms, remains a challenge. For instance, modelling the extrinsic incubation period of Plasmodium falciparum in Anopheles mosquitoes and of dengue viruses in Aedes mosquitoes requires different parametrizations. Other biological mechanisms, such as vector mortality and the environmental carrying capacity for vectors, are potentially affected by climate, which may induce seasonal variation. Data are required to build models that jointly capture such mechanisms.
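The link between temperature, extrinsic incubation period (EIP) and generation time can be sketched as follows. The sketch assumes the classic degree-day approximation for P. falciparum sporogony, about 111 degree-days above a 16 °C threshold. This is treated as an assumption, and dengue in Aedes would need a different form. The seasonal temperature curve and the host-side delay are hypothetical, and the generation time here ignores biting delays and mosquito survival.

```python
import numpy as np

# Assumed degree-day parametrization for P. falciparum in Anopheles.
DEGREE_DAYS = 111.0
T_THRESHOLD = 16.0  # deg C; no development at or below this temperature

def eip_days(temp_c):
    """Extrinsic incubation period (days) at a constant temperature; inf at or below threshold."""
    excess = np.asarray(temp_c, float) - T_THRESHOLD
    safe = np.where(excess > 0, excess, 1.0)  # avoid dividing by zero
    return np.where(excess > 0, DEGREE_DAYS / safe, np.inf)

# Hypothetical seasonal temperature: mean 24 deg C, amplitude 4 deg C.
days = np.arange(365)
temperature = 24.0 + 4.0 * np.sin(2 * np.pi * days / 365)
eip = eip_days(temperature)

HOST_DELAY = 20.0                   # hypothetical host-side contribution (days)
generation_time = HOST_DELAY + eip  # crude: ignores biting delays and vector mortality
```

Even this crude model makes the generation time swing by roughly 18 days over the year. It varies between about 9 and 28 days of EIP. Any conversion between growth rates and reproduction numbers must therefore allow for the time of year.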

Better design of testing, surveillance and contact tracing
Public health interventions such as symptomatic testing, surveillance, contact tracing and other NPIs, in addition to being vital for controlling pandemics, are currently under-exploited pseudo-experiments with untapped potential to inform on key parameters and processes. Given a lack of knowledge, there may be considerable benefit in trialling and assessing different interventions in different places (Michie and West, 2020). There are clearly ethical and political considerations, but where this is already happening (see e.g. Islam et al., 2020), it would be advisable to ensure that sufficient data are collected to enable as complete an assessment as possible. In some cases currently collected data may be used directly, but in others it may be necessary to enhance data or meta-data collection; for example, more accurate characterisation of who is tested and why may enable better use of (positive and negative) case reports. Modelling helps to extract meaning from raw data, and parameterised models can then be used to guide interventions. In addition, this model estimation process can be reversed to enable power-type calculations that inform improvements in the data collection process itself. Thus modelling is central to better exploitation of pandemic data sources, including demonstrating the benefits of possible improvements to data collection.

Data collection during outbreaks
An important challenge is predicting which data types, and combinations of data types, may be useful in a future pandemic. This is difficult because there are many existing and emerging data types, including: case reports or survey results based on standard and new diagnostic tests, which may be complex in nature (e.g. containing genetic sequences); information on behaviour patterns, ranging from small-scale experiments and surveys to larger-scale anonymised call detail record (CDR) and GPS data; and aggregated data such as passenger flight volumes. A further complication is the unknown nature of a newly emerging pathogen.
Prior to the next significant pathogen outbreak, there is therefore considerable work to do in assessing the informative value of various data types and data collection systems. When considering the data generated by such systems, it is necessary to consider not only their planned but also their real-world performance, e.g. accounting for compliance rates. Furthermore, there are trade-offs to consider in terms of both the costs of collection and the capacity to collate and effectively analyse a given variety and volume of data. This problem can be tackled through a combination of modelling different pandemic scenarios and using inferential tools to assess the value of different data collection systems. Even within the same broad scenario, the usefulness of certain data will depend on model structure, so this work must also anticipate the range of models likely to be used for the focal pathogens and at different stages of an outbreak. This scenario planning will enable recommendations on what data should be collected at different stages of an outbreak, with the aim of maximising societal ability to respond. Conflict between the primary purpose of such interventions and data gathering can potentially be minimised by developing general guidelines for data collection, e.g. by detailing useful meta-data, such as who is tested and why, that could be collected alongside case reports (Shadbolt et al., 2022).
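One standard inferential tool for valuing data is the expected value of perfect information (EVPI). EVPI is the expected gain from making a decision after learning an uncertain parameter exactly, compared with deciding now. It is an upper bound on what any study that reduces that uncertainty can be worth. The sketch below is a Monte Carlo estimate. It uses a purely hypothetical decision problem (act or do not act) whose losses depend on an uncertain reproduction number. The lognormal uncertainty and the loss functions are illustrative assumptions, not calibrated values.

```python
import numpy as np

def evpi(utility, theta_samples):
    """Monte Carlo EVPI. `utility` maps parameter samples to an array of shape
    (n_samples, n_decisions)."""
    u = utility(theta_samples)
    value_current_info = u.mean(axis=0).max()  # commit to one decision now
    value_perfect_info = u.max(axis=1).mean()  # choose the best decision for each theta
    return float(value_perfect_info - value_current_info)

def hypothetical_utility(r):
    """Illustrative losses (negative utilities) for 'no action' and 'intervene'."""
    r = np.asarray(r, float)
    no_action = -100.0 * np.maximum(r - 1.0, 0.0)
    intervene = -20.0 - 100.0 * np.maximum(0.5 * r - 1.0, 0.0)
    return np.column_stack([no_action, intervene])

rng = np.random.default_rng(1)
r_samples = rng.lognormal(mean=np.log(1.2), sigma=0.3, size=100_000)
value = evpi(hypothetical_utility, r_samples)
value_if_known = evpi(hypothetical_utility, np.full(10, 3.0))  # no uncertainty => EVPI = 0
```

Repeating the calculation under different pandemic scenarios and model structures gives a principled way to rank candidate data collection systems. A system whose target parameters have negligible EVPI is unlikely to be worth its cost.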
A further challenge is the development of tools for adaptive design to prioritise data collection in real time. These could be built around sensitivity analysis of specific model outputs/value of information studies (Jackson et al., 2019) or surveillance of current data streams for changes (Xiang and Swallow, 2021). Stochastic and network models, where sensitivity analysis is significantly more complex, would benefit from further development of tools and software to make these more practical for use by modellers. Sensitivity analysis can also inform prediction of future necessary datasets, giving data collectors the time to implement required protocols.
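For monitoring data streams for changes, a common textbook tool is the one-sided Poisson CUSUM, used here as a stand-in that may differ from the method of Xiang and Swallow (2021). It accumulates the log-likelihood ratio of an elevated rate against a baseline rate and raises an alarm when the sum crosses a threshold. The baseline rate, shifted rate, threshold and count series below are all illustrative.

```python
import math

def poisson_cusum(counts, baseline_rate, shifted_rate, threshold):
    """One-sided CUSUM for a rise in a Poisson rate from baseline_rate to shifted_rate.
    Returns the CUSUM path and the indices at which it exceeds `threshold`."""
    log_ratio = math.log(shifted_rate / baseline_rate)
    s, path, alarms = 0.0, [], []
    for t, x in enumerate(counts):
        # log-likelihood ratio of Poisson(shifted) vs Poisson(baseline) for count x
        s = max(0.0, s + x * log_ratio - (shifted_rate - baseline_rate))
        path.append(s)
        if s > threshold:
            alarms.append(t)
    return path, alarms

daily_counts = [10] * 30 + [20] * 10  # illustrative stream: the rate doubles on day 30
path, alarms = poisson_cusum(daily_counts, baseline_rate=10.0, shifted_rate=15.0, threshold=5.0)
first_alarm = alarms[0] if alarms else None
```

Here the statistic stays at zero while counts match the baseline and signals on day 31, one day after the change. In an adaptive design such an alarm could trigger a re-prioritisation of data collection.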

Contact tracing
Contact tracing represents a particularly difficult challenge for mathematical epidemiology in terms of its ability to inform effective real world intervention and real time data collection (Müller and Kretzschmar, 2021). This is due to the complexities of capturing the contact patterns between individuals and the testing and tracing process which propagates over the network of contacts and locally modifies it at the same time. Better understanding of this process is needed to allow interpretation and use of contact-tracing data to inform models and better characterise outbreaks.
Some important challenges in creating mathematically rigorous results for stochastic models with contact tracing remain to be addressed. In particular, contact tracing creates dependencies between durations of infectious periods (and thus cumulative infectivity) for infectors and their infectees. Because of this, the ordinary theory of branching processes does not suffice, even if it is possible to deduce the correct (marginal) distribution of the number of other people an infected person infects (Müller et al., 2000;Müller and Hösel, 2021). To obtain theoretical results on questions such as "what is the probability that a major outbreak occurs in a population with a functioning contact tracing infrastructure?" new models that allow for dependencies created by contact tracing and that can be analysed in a mathematically rigorous way need to be developed.
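Without tracing, the probability of a major outbreak follows from ordinary branching-process theory. For Poisson(R) offspring it is 1 - q, where q is the smallest root of q = exp(R(q - 1)). The second function below is a deliberately simplified Monte Carlo sketch, not the model of Müller et al. It is a generation-based process in which a case's infectivity depends on whether its infector was detected. This is a crude form of infector-infectee dependence. A "major outbreak" is proxied by reaching a hypothetical cap of 500 cumulative cases. The detection probability, tracing probability and isolation effect are all hypothetical.

```python
import math
import numpy as np

def extinction_probability_poisson(r, n_iter=1000):
    """Smallest root of q = exp(r (q - 1)), found by iterating from q = 0."""
    q = 0.0
    for _ in range(n_iter):
        q = math.exp(r * (q - 1.0))
    return q

def major_outbreak_probability(r, p_detect, p_trace, isolation_effect,
                               n_sims=2000, max_cases=500, seed=0):
    """Generation-based simulation with contact tracing (simplified assumptions:
    traced cases are always detected; detection does not shorten the case's own
    infectiousness; traced cases transmit at r * (1 - isolation_effect))."""
    rng = np.random.default_rng(seed)
    n_major = 0
    for _ in range(n_sims):
        untraced, traced, total = 1, 0, 1
        while untraced + traced > 0 and total < max_cases:
            detected = rng.binomial(untraced, p_detect)
            # Offspring whose infector is known, hence eligible for tracing.
            traceable = (rng.poisson(r * detected)
                         + rng.poisson(r * (1.0 - isolation_effect) * traced))
            untraceable = rng.poisson(r * (untraced - detected))
            newly_traced = rng.binomial(traceable, p_trace)
            untraced = untraceable + traceable - newly_traced
            traced = newly_traced
            total += untraced + traced
        n_major += total >= max_cases
    return n_major / n_sims

q = extinction_probability_poisson(2.0)
p_none = major_outbreak_probability(2.0, p_detect=0.5, p_trace=0.0, isolation_effect=0.8)
p_traced = major_outbreak_probability(2.0, p_detect=0.5, p_trace=0.8, isolation_effect=0.8)
```

With tracing switched off the simulation reproduces the analytic value 1 - q ≈ 0.80 for R = 2. With tracing, siblings share their infector's detection outcome and no longer reproduce independently. Simulation sidesteps this, but rigorous analytic results need the new theory described above.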
Further important challenges are connected with improving the description of the real world. Clusters and superspreading events are known to be decisive for contact tracing effectiveness. Larger clustering often improves the effectiveness of contact tracing, particularly of backward strategies that search for the infector of an index individual, since infectors are more likely to be superspreaders (House and Keeling, 2010; Endo et al., 2021). Settings such as households, workplaces, or more generally places of aggregation or events, are particularly important as policies often act on such scales (Kucharski et al., 2020; Fyles et al., 2021; Kretzschmar et al., 2020). Another important aspect is the realistic modelling of time and resource constraints, as contact tracing typically requires an extensive infrastructure to identify infected cases and swiftly search for and isolate the contacts of a confirmed case. Therefore, theoretical results (e.g. the efficacy of backward versus forward contact tracing, or results on the controllability of the epidemic), which are often derived under assumptions of unlimited tracing capabilities, should be carefully evaluated in the presence of capacity limitations. In this context, mathematical modelling has proved useful in assessing the effectiveness of new technologies such as digital contact tracing compared with manual contact tracing (Kucharski et al., 2020; Ferretti et al., 2020). However, further work involving resource limitations is critical.
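Why backward tracing pays off under superspreading can be shown with a short calculation. Assume negative binomial offspring with mean R and dispersion k, a common assumption that is labelled here as such. Forward tracing from a case finds its own offspring, R on average. A case's infector, however, is sampled in proportion to how many people it infected. Its other offspring therefore number E[Z(Z - 1)]/E[Z] = R(1 + 1/k) on average. The sketch checks the formula by simulation for hypothetical values R = 2 and k = 0.5.

```python
import numpy as np

def expected_cases_found(r, k):
    """(forward, backward) expected numbers of additional cases found per traced link
    under negative binomial offspring with mean r and dispersion k."""
    forward = r                   # offspring of the index case
    backward = r * (1.0 + 1.0 / k)  # other offspring of the (size-biased) infector
    return forward, backward

def backward_by_simulation(r, k, n=1_000_000, seed=0):
    """Monte Carlo estimate of E[Z(Z-1)] / E[Z] for NB offspring."""
    rng = np.random.default_rng(seed)
    z = rng.negative_binomial(k, k / (k + r), size=n).astype(float)  # mean r, dispersion k
    return float((z * (z - 1.0)).sum() / z.sum())

forward, backward = expected_cases_found(2.0, 0.5)
backward_mc = backward_by_simulation(2.0, 0.5)
```

For R = 2 and k = 0.5, backward tracing finds on average 6 further cases per infector identified, against 2 for forward tracing. With the stronger overdispersion often cited for SARS-CoV-2 (e.g. R = 2.5, k = 0.1) the contrast becomes 27.5 versus 2.5. This advantage shrinks once tracing capacity is limited, which is the caveat raised above.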
Estimating model parameters from surveillance data is crucial to evaluate the effectiveness of interventions and identify margins for improvement. Some work has been done on developing deterministic models that are efficient to solve numerically (and hence amenable to model fitting) while capturing the essential features of individual contacts with rigorous probabilistic arguments, using time-since-infection models (Müller et al., 2000; Scarabel et al., 2021) or deterministic compartmental models (Sturniolo et al., 2021). However, estimation of contact tracing parameters via model fitting remains an open challenge.

Open, transparent, and trusted models to support policy
The challenges addressed here focus on the need to ensure that models supporting public health policy are timely, better enable decision making, and are transparent and openly scrutinised. Using epidemiological models to inform public health policy requires that models be trusted by policy makers and the public alike. Although the road to public trust is complex and subject to a range of forces, greater openness and transparency of model code, data and underlying assumptions than is currently typical is required to aid reproducibility and engender increased trust, including being clear and honest about the level of uncertainty inherent in each analysis. Such an approach will allow scrutiny within and beyond the teams using model outputs. This will lead to more robust understanding through testing of assumptions by a wider scientific community, and ultimately to better policy and greater public confidence.

Modelling for real time decision support
Perhaps the key problem for real-time decision support during disease emergencies is the rapidly evolving landscape, both in knowledge of the pathogen and its impact and hence in the priority questions for public health. Analysis therefore must be timely. This requires better modelling tools for more rapid simulation, statistical estimation, model assessment and uncertainty quantification, and greater understanding of available data (Shadbolt et al., 2022), but also analytical understanding of the wider classes of model discussed above.
Another critical issue for real-time decision support is the need for effective quantification of uncertainty and good communication with decision makers. For example, the choice of which uncertainties to account for should be influenced by the requirements of decision makers, e.g. as part of constructing scenarios to assess alternative interventions. More detailed understanding of transmission mechanisms and the effects of interventions requires more complex models, but these inevitably also involve very many (often unknown) parameters and typically new or larger datasets. The value that statistical inference can extract from data is intimately tied to the structure and formulation of models. As the statistician George Box wrote, "all models are wrong" (Box, 1976), but some are useful. Models are simplifications of reality, and thus false or wrong. Depending on their purpose, they can still retain the essential features of the real-world process, generating understanding and allowing prediction.
Modellers and decision makers must grapple with uncertainty driven by spatial, temporal and societal heterogeneities, stochasticity in dynamics, and incomplete knowledge of parameters and model structure. Comparing models, and combining outcomes across multiple models that vary in complexity, allows assessment of the robustness of conclusions to a range of assumptions. For example, the relationship between the real-time growth rate of the number of cases or infected hosts and the reproduction number R_t is highly sensitive to the distribution of the infectivity profile of an infectious host (Roberts and Heesterbeek, 2007; Wallinga and Lipsitch, 2007). On the other hand, the relationship depends to a much lesser extent on population structure (assuming that structure is characterised by several distinct classes of individuals, a social network structure or households; Trapman et al., 2016). A key challenge is therefore to develop practical methods to combine and use outputs from multiple models to enhance the robustness of results and recommendations for policy. Substantial work is also needed on the comparison of structurally different models, not only in terms of model fit to data but also in terms of predictions.
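The sensitivity to the infectivity profile can be made concrete with the Wallinga and Lipsitch (2007) relation R = 1/M(-r). Here r is the exponential growth rate and M is the moment generating function of the generation-interval distribution. The sketch evaluates three common shapes with the same mean. The growth rate of 0.1 per day and the 5-day mean are illustrative values only.

```python
import math

def r_exponential_gi(growth, mean_gi):
    """Exponentially distributed generation interval (as in an SIR model)."""
    return 1.0 + growth * mean_gi

def r_fixed_gi(growth, mean_gi):
    """Generation interval fixed at its mean."""
    return math.exp(growth * mean_gi)

def r_gamma_gi(growth, mean_gi, sd_gi):
    """Gamma generation interval: shape a = mean^2/sd^2, rate b = mean/sd^2,
    so R = (1 + r/b)^a. Valid for growth > -b."""
    shape = (mean_gi / sd_gi) ** 2
    rate = mean_gi / sd_gi ** 2
    return (1.0 + growth / rate) ** shape

growth, mean_gi = 0.1, 5.0  # illustrative: 10% daily growth, 5-day mean generation interval
estimates = {
    "exponential": r_exponential_gi(growth, mean_gi),
    "gamma (sd 2.5)": r_gamma_gi(growth, mean_gi, 2.5),
    "fixed": r_fixed_gi(growth, mean_gi),
}
```

The same observed growth rate and mean generation interval give R = 1.50, about 1.60 or about 1.65 depending only on the assumed shape. Combining structurally different models helps expose this kind of dependence.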

Openness within and beyond the scientific community
Issues with scientific peer review, reproducibility, and ultimately public trust arise when models used to inform policy cannot be readily scrutinised, are not properly documented, or are not immediately and openly available. Improved tools and a greater culture of openness (such as making model software open source as standard) are needed to support open modelling for future pandemics and to lay bare the assumptions and limitations of the approaches being used in any given analysis.
It is important that modellers communicate the dependence of predictions on such assumptions and make clear how these limit the conclusions drawn. There need to be transparent and accessible links between models, data and assumptions, and use of best practices in terms of documenting and testing open-source code. The need for such an approach is demonstrated by criticisms of models used to inform public health policies during the COVID-19 pandemic (Rice et al., 2020). Shadbolt et al. (2022) describe a roadmap for the development of suitable standards and software that would provide traceability and transparency tools that accessibly link model outputs to data and assumptions (see e.g. Mitchell et al., 2021).
To further increase trust in models used to inform policy, new and accepted open epidemiology standards are needed to provide documentation of model quality, reproducibility and fitness for purpose. This includes the need for modelling teams to provide evidence of defensible parameters and, ideally, model inference from data or transparent sources for model parameters and model structure. Even more critical is the need to demonstrate evidence of model testing and assessment against simulated, and ideally observational or experimental data. For example, is the model able to predict future trends and to what extent?
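A minimal form of "testing against simulated data" is a parameter-recovery check. Data are simulated from the model with a known parameter, the model is fitted back to those data, and the check confirms that the known value is recovered. The sketch uses a simple SIR model with a known recovery rate and Poisson observation noise, and fits beta by a grid search over the Poisson likelihood. Population size, true beta and grid are hypothetical. A real check would also cover every estimated parameter, uncertainty intervals and model misspecification.

```python
import numpy as np

def sir_daily_incidence(beta, gamma=0.2, n_days=80, i0=1e-3, steps_per_day=20):
    """Daily new infections (fraction of population) from a forward-Euler SIR model."""
    dt = 1.0 / steps_per_day
    s, i = 1.0 - i0, i0
    daily = np.zeros(n_days)
    for day in range(n_days):
        for _ in range(steps_per_day):
            infections = beta * s * i * dt
            recoveries = gamma * i * dt
            s, i = s - infections, i + infections - recoveries
            daily[day] += infections
    return daily

def poisson_nll(expected, observed):
    """Poisson negative log-likelihood, omitting the constant log(observed!) term."""
    expected = np.maximum(expected, 1e-12)
    return float(np.sum(expected - observed * np.log(expected)))

POP, TRUE_BETA = 100_000, 0.45                                # hypothetical 'truth'
rng = np.random.default_rng(42)
observed = rng.poisson(POP * sir_daily_incidence(TRUE_BETA))  # simulated surveillance data

beta_grid = np.linspace(0.30, 0.60, 121)
nll = [poisson_nll(POP * sir_daily_incidence(b), observed) for b in beta_grid]
beta_hat = float(beta_grid[int(np.argmin(nll))])
```

Publishing checks like this alongside model code gives reviewers and policy users concrete, rerunnable evidence of fitness for purpose.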
In addition, open accessibility of models needs to be accompanied by facilitation of respectful debate including clarification and questioning by the widest possible range of stakeholders. Finally, where possible models and model-based studies used to inform public health policy and intervention should be rigorously peer-reviewed (Sutherland and Lythgoe, 2020). The Royal Society's Rapid Assistance in Modelling the Pandemic (RAMP) initiative, which crowd-sourced rapid reviews of the burgeoning scientific literature, and the Rapid Reviews: COVID-19 project (rapidreviewscovid19.mitpress.mit.edu), provide paradigms for how this could be done.
In summary, there is an urgent need to develop and ensure wide adoption of open modelling standards and tools that meet the above requirements. Despite inherent time constraints in pandemic response such tools should facilitate both replication and reproducibility including rigorous testing and adaptation of epidemiological models. Indeed, there have been some moves towards this during the COVID-19 pandemic, for example, reanalysis of influential model results (Rice et al., 2020).

Deeper engagement with policy and decision makers
A further, broader challenge is that of communication at the modeller-decision-maker interface (for an in-depth discussion see Hadley et al., 2021). It is critical to present information from modelling clearly and in a way that enables it to contribute to effective policy and decision making. For example, the choice of visualisation techniques should focus on how best to present information in a way that supports human decision-making abilities. Such communication needs to convey uncertainty as well as model assumptions and inherent limitations. The development of the open epidemiology standards and data pipeline tools discussed above would provide a framework within which to structure such communications. However, modellers should also give greater consideration to the presentation of results and the art of writing succinct, honest executive summaries and syntheses across multiple sources of evidence. Such communication must take into account how the users of the information are likely to interpret what is presented.
To deepen trust and increase the effectiveness of interactions across the modeller-decision-maker interface, modellers should also engage in co-construction of models and model-based analysis with policy makers, decision makers and other stakeholders. Such co-development would focus on formulating policy-relevant questions that could usefully be addressed by modelling, and also on the development and parameterisation of the models used to answer these questions. We note that the tools of expert elicitation will be invaluable in realising these ambitions (for a relevant discussion of these tools see Swallow et al., 2022). In line with the points made previously about the need for transparency and trust in models, it is vital to ensure that such elicitation processes are well documented and open to scrutiny. Adoption of these deeply collaborative approaches would likely lead to substantially greater public health benefits from epidemiological modelling efforts. Moreover, the time-consuming nature of such interactions suggests that the necessary exercises should begin well ahead of future pandemics. This would allow identification and testing of ideas and procedures but also, and perhaps most importantly, the development of suitable relationships at institutional and individual levels.

Discussion
Despite the substantial public health benefits derived to date from epidemiological models of infectious diseases, there are still many opportunities to enhance modelling to better inform management and control of future pandemics. We have identified a wide range of challenges for epidemiological modelling (see Table 1) that need to be addressed if such benefits are to be realised. These fall under structural and analytic modelling challenges to better account for contact processes and host and pathogen complexity, and a set of challenges related to improving the impact of modelling in future pandemics (see Fig. 1). Although we have largely focussed on human disease, most of the challenges discussed are relevant to disease dynamics in other host populations, and it is important to recognise that future pandemics in animals or plants could significantly impact ecosystem services, food security and human wellbeing.
In terms of improving the impact of modelling, the overarching challenge identified is the need to make models that are more effective at providing decision makers with relevant information during pandemics. This will be underpinned by better communication with public health stakeholders at all stages from identifying questions, through model development, to the synthesis of policy advice built on model outputs and expert judgement. Such communication needs to: be built on better understanding of how target audiences perceive presented information; express the limitations of models and uncertainty in model outputs; and critically build a clear understanding of what questions are addressed by specific analyses to enable more effective decision making. An important step in achieving this is the development of tools and standards through which to make models and model outputs used to inform policy more open, transparent and trusted.
This need to better inform policy also drives the need for more sophisticated methodology. For example, an important challenge for the formulation, parameterisation and analysis of transmission models is the need to better account for heterogeneities in host populations. This technical challenge is directly linked to the formulation of ethical public health policy that takes proper account of issues of equity. These relate to variation across the population in vulnerability, contact networks and propensity to comply with public health advice and mandates. This includes the need for modellers to consider responses to risks in marginalised groups and populations that may be disproportionately affected, yet poorly served by existing health provision (limiting the data available for modelling) and lacking the resources to respond in the same way as those more advantaged. Thus this challenge extends to developing appropriate modelling tools and analyses to support public health in situations where data are limited.
In terms of impacting pandemic control, the challenges identified include more realistic modelling of time and resource constraints, including the rates at which interventions such as contact tracing and vaccination can be conducted. The modelling of contact tracing is itself identified as a particularly difficult problem due to the dependencies it induces between disease contacts. A further important opportunity for models to contribute to pandemic control is to make better use of available data to estimate the underlying characteristics of outbreaks, e.g. real-time reproduction numbers, transmission rates and disease progression. Furthermore, the largely untapped potential to use models to enhance, possibly adaptive, collection of data during a pandemic could transform future public health response. The ability to achieve such goals depends on continued improvements in the formulation, development and testing of models and theory of infectious disease.
Current critical gaps in the modelling toolkit include the need to further develop models that account for heterogeneity and more complex dynamics of contact structures (including host behaviour), hosts and pathogens. Developments in modelling within-host dynamics will enhance the implementation of interventions, including contact tracing and vaccination, through improved theoretical understanding and better quantification based on more complete use of information from disease diagnostics. Similarly, the explosion of information on pathogen genetics raises significant modelling challenges in exploiting these data to create better epidemiological models, including representing the diversity and evolution of pathogens and, ultimately, how these interact with host genetics. An urgent and overarching requirement is to enhance the modelling of contact processes in human society and how these respond to public health messages and interventions, including compensatory behaviours. Continued threats from SARS-CoV-2 and the risk of future pandemics point to the need for models that more fully couple the dynamics of human societies and pathogens. Such human-disease system models would enable exploration of public health and wider policies influencing vulnerabilities to infectious disease. Such policies could make humanity more resistant to the emergence of zoonotic diseases and less vulnerable to pandemic spread when such diseases inevitably arise.

Funding
This work was supported by Engineering and Physical Sciences Research Council (EPSRC) grant no. EP/R014604/1. G.M. is supported by the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS).
L.H. is funded by the Wellcome Trust, UK (block grant no. RG92770). L.P. is funded by the Wellcome Trust, UK and the Royal Society, UK (grant no. 202562/Z/16/Z). L.P. and F.S. are supported by the UK Research and Innovation (UKRI) through the JUNIPER modelling consortium (grant no. MR/V038613/1). L.P. is also supported by The Alan Turing Institute for Data Science and Artificial Intelligence.
JPG's work is supported by funding from the UK Health Security Agency and the UK Department of Health and Social Care. This funder had no role in the study design, data analysis, data interpretation, or writing of the report. The views expressed in this article are those of the authors and not necessarily those of the UK Health Security Agency or the UK Department of Health and Social Care.

Authors' contributions
All authors took part in discussions and wrote sections of the manuscript. G.M. coordinated discussions throughout and compiled the final version of the manuscript. All authors edited the manuscript and approved the final version for publication.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.