Learning transmission dynamics modelling of COVID-19 using comomodels

The COVID-19 epidemic continues to rage in many parts of the world. In the UK alone, an array of mathematical models have played a prominent role in guiding policymaking. Whilst considerable pedagogical material exists for understanding the basics of transmission dynamics modelling, there is a substantial gap between the relatively simple models used for exposition of the theory and those used in practice to model the transmission dynamics of COVID-19. Understanding these models requires considerable prerequisite knowledge and presents challenges to those new to the field of epidemiological modelling. In this paper, we introduce an open-source R package, comomodels, which can be used to understand the complexities of modelling the transmission dynamics of COVID-19 through a series of differential equation models. Alongside the base package, we describe a host of learning resources, including detailed tutorials and an interactive web-based interface allowing dynamic investigation of the model properties. We then use comomodels to illustrate three key lessons in the transmission of COVID-19 within R Markdown vignettes.


Introduction
Mathematical modelling of COVID-19 epidemiology has been a crucial determinant of policymaking during the pandemic. For example, in the UK, the release of ''Report 9'' by Imperial College London [1] is widely believed to have been influential in the UK government's decision to institute a first lockdown on 23 rd March 2020 [2]. Compartmental models of transmission dynamics (see, for example, Anderson and May [3], Brauer [4]), in particular, have featured prominently. In these models, individual people are allocated to compartments which, amongst other things, indicate their disease state. Popular variants of compartmental models used during the pandemic include Susceptible-Exposed-Infectious-Recovered (SEIR) and Susceptible-Exposed-Infectious-Recovered-Dead (SEIRD) models, where individuals naive to infection are labelled susceptible (S); individuals who are infected but not yet infectious to others are labelled exposed (E); exposed individuals then eventually become infectious (I) and subsequently recover (R) or die (in the SEIRD model only with those individuals falling into the ''Dead'' (D) compartment).
The advice given to the UK government through the Chief Scientific Advisor has been guided by the Scientific Pandemic Influenza Group on Modelling (SPI-M), and a diversity of models have been used to inform this [5]. These models include: the ''Warwick model'', an ordinary differential equation (ODE) model with SEIR compartments with the infected compartments broken down into symptomatic and asymptomatic cases [6]. The model incorporates compartments representing admissions to hospital and intensive care units (ICUs) and is age-and spatially structured by region of the UK; the ''Cambridge/Public Health England model'' is an ODE model with an SEIR base structure extended to include two exposed and two infectious compartments [7]. In addition, the model is structured by age, incorporating age-dependent contact structures which vary according to UK region which can change over time in line with published mobility data; the model underlying Imperial College's Report 9 [1]  model with SEIR compartments, which accounts for mixing withinhouseholds, at specific places (e.g. work or school) or randomly in the community. The model is spatial and age-structured. In it, infectious individuals are stratified into asymptomatic, mild and ''influenza-like illness'' groups. From this last, more severe symptomatic group, individuals may be admitted to hospital, with a subset of hospitalised individuals requiring care in ICUs; the ''Manchester model'' is a deterministic non-age-structured SEIR model [5]. This model allows a piece-wise transmission rate, which can change in response to nonpharmaceutical interventions such as a lockdown. Apart from these mechanistic models, a host of more statistical approaches, primarily based around renewal models [8,9] have also been used within SPI-M. This latter class of models typically require fewer assumptions to be made about the transmission dynamics and, because of this, are straightforward to fit to data. A cost of their relative simplicity is that they are less able to explore a diversity of future scenarios compared to mechanistic models. In this paper and in our software, we currently focus on mechanistic models of transmission, and we plan to introduce other model classes in the future.
Additionally, there have been a host of other models that have been influential in informing policymaking worldwide. Many of these models have incorporated detailed information bespoke to particular vaccines. For instance, Hogan et al. [10] extend a deterministic, age-structured SEIR compartmental model (described in Walker et al. [11]) to include vaccinated compartments. The modelling approach assumes that those who are currently infected do not receive the vaccine and that vaccines offer partial protection which can wane over time at rates specific to particular COVID-19 vaccines. An international consortium of modellers named the ''COVID-19 Modelling (CoMo) Consortium'' [12] use a deterministic SEIRD-type model structured by age which also includes, amongst other components, compartments corresponding to individuals in quarantine, vaccinated individuals, and hospitalised individuals. This model inspired the development of our package, hence we chose to name it comomodels.
The larger models used in transmission dynamics modelling of COVID-19 comprise a complex mix of many different small modelling motifs. For an individual new to this type of modelling, such as a graduate student or a policymaker, understanding how these various components function and interact is key to building intuition about how these models work. It is also important in ensuring that the models themselves or their outputs are not misused. Whilst there are ample pedagogical materials on basic SEIR models, to our knowledge, there are no widely available tools used to teach understanding of the more complex types of models often used to decide public health policy for diseases like COVID-19. Others have recognised the need for such materials: relatively early on during the pandemic, The Royal Society published a rapid review of the science of estimating the reproduction number for COVID-19, which included toy models used to explain facets of transmission dynamics modelling of COVID-19 [5].
Historically, mathematical modelling expertise has been concentrated in high income countries (HICs), with several infectious disease modelling groups forming consortia. These consortia inform policy for HICs but also consult for policymakers in low and middle income countries (LMICs). This asymmetric group of HIC modellers and LMIC policymakers that ultimately determines policy for LMICs is inequitable and may lead to models far removed from reality [12]. Part of this asymmetry likely stems from a lack of capacity in LMICs, and we argue that part of this is due to a lack of widely available pedagogical tools that teach how modelling is applied in practice.
This stimulated us to create an open-source pedagogical R package called comomodels, which aims to produce accessible materials pertinent to applied COVID-19 modellers. comomodels was developed using robust software development principles including unit testing and continuous integration testing. The main workhorses of the package are a series of deterministic compartmental models that represent the transmission dynamics of infectious diseases which are spread humanto-human, such as COVID-19. The starting point of the package is the basic SEIRD model. Upon this foundation, we build a series of models of increasing complexity (see Fig. 1), each with a tutorial vignette where that model is applied to explain elements of COVID-19 transmission dynamics. By working incrementally through these tutorials, an individual can build a foundation in modelling required for understanding much of the applied work on mechanistic modelling of COVID-19. Alongside the package, we have created an interactive web-based interface [13], which visualises the structure of the model and plots key outputs.
In this article, in Section 2, we briefly describe the models currently available in the package. In Section 3, we describe package design choices and functionality and outline the approach taken for development. Then in dynamic vignettes described in Section 4, written in R Markdown, we use our package to illustrate three key lessons in COVID-19 modelling.

Models
comomodels contains a series of deterministic ODE-based transmission dynamics models of infectious diseases which are passed directly between humans (see Fig. 1). The set of models is tailored to highlight key elements of models of COVID-19 infection dynamics in the literature. In this section, we provide equations and a brief motivation for each model. For much more detailed information, we recommend consulting the accompanying website 2 which contains detailed tutorials for each model. This pedagogical material is a keystone of the package and aims to help a user build intuition behind transmission dynamics modelling of COVID-19. In doing so, these tutorials also explain how to effectively use the package functionality. These tutorials can be run by cloning the GitHub repository. Alternatively, the rendered markdown files can be viewed on the package website.
In what follows, each of the model systems described is closed with a set of initial conditions, and we omit them in what follows for brevity. Across all models, we assume that the states are normalised to 1: this means, for example, in the SEIRD model, that ( ) + ( ) + ( ) + ( ) + ( ) = 1 for all times ≥ 0.

SEIRD: the base model
All the models in the package are extensions of the SEIRD model. This model considers the dynamics of individuals within five compartments: Susceptibles (S), Exposed (E), Infectious (I), Recovered (R) and Dead (D) individuals. The model is described by the following system of coupled ODEs: Here, > 0 quantifies the rate at which a susceptible individual becomes infected (entering the exposed compartment) after coming into close contact with an infectious individual; > 0 represents the rate at which exposed individuals become infectious; > 0 indicates the rate that infectious individuals recover; and > 0 indicates the rate at which infectious individuals die.
The SEIRD model tutorial describes how to parameterise the model to simulate COVID-19 transmission dynamics. It then explains how All models are extensions of the SEIRD model (top; see Section 2.1). The models on the second row have additional state variables and transitions between states compared to the SEIRD model; the third row shows non-age-structured models which model the effect of interventions on COVID-19 transmission; the bottom row shows the age-structured models.

SEIRDAge: an age-structured model
A natural extension to the SEIRD model in Section 2.1 is to add age-structure to the population. This is equivalent to splitting each of the five SEIRD compartments into separate ones for each age group considered. This is done because sociological -e.g. contact rates and patterns -and biological -e.g. mortality and pathogen clearance rates -characteristics may vary, on average, according to the age of individuals.
In these, so called, age-structured models, individual age groups have different characteristics, which are reflected in varying parameter values, such as mortality and recovery rates. Additionally, heterogeneous mixing between different age groups is introduced to allow certain combinations of age groups to meet more often than others. A newborn child will typically spend most of a given day with their parents; a teenager may mix frequently and mostly with other teenagers; an elderly person may mostly mix with younger members of their family; and so on. In age-structured models, so-called contact matrices keep track of the expected number of contacts between people of different ages. It is also possible to estimate the number of contacts which occur at various locations where people spend time: for instance, the number of contacts occurring at home, school and work. Moreover, contact matrices can vary by geography to handle differences in social structures typical of diverse societies. As part of comomodels, we make the contact matrices estimated in Prem et al. [15] for 152 countries available. In Fig. 2, we show a set of contact matrices for India, where each panel represents the average number of daily contacts between individuals of different age groups for a particular location type (home, school, work and ''other'' -a catch-all covering all other locations). In this case, the contact matrix used in simulations would be the sum of these location-specific ( ) matrices, , : where ∈ {home, school, work, other}. The elements of the contact matrix , ≥ 0 indicate the expected number of contacts an individual of age class experiences with age class per day in location . The matrix , is then the total number of daily contacts per individual in age class with those in age class , on average. If all the elements of the , matrix are 1s and the death and recovery rates are set to be age-independent (i.e. = , = ), the infection dynamics recapitulate those from the base SEIRD. Running counter to intuition, contact matrices are typically not symmetric: that is, , ≠ , for reasons that we explain in the ''S2'' supplementary vignette.
The system of ODEs that describe our age-structured SEIRD model is given by: where ∈  indicates the categorical age group to which an individual is assigned. For example, an individual may belong to the 0-5 year old group. A user within age group may be in any of the infection states: if they are susceptible, they are counted towards ; and so on. Overall, this means that the system consists of 5|| coupled ODEs. The parameters and have the same meaning as in the SEIRD model (see Section 2.1). We allow an age group-specific death rate, , since, in COVID-19, death rates are known to vary substantially according to age [16]. We also allow age group-specific recovery rates, , since this facilitates flexible parameterisation for each group. Note that this model should be used only to handle infection dynamics over a relatively short period of time (i.e. up a few years), since individuals do not age throughout the course of simulations.
The tutorial for this model explains how Eqs. (3) allow the model to be parameterised using the mortality risk estimates for COVID-19 in Verity et al. [16]. It also visualises and explains the use of the contact matrices available in the package and shows how to calculate 0 for this age-structured model. Finally, simulations are used to show how the transmission dynamics of a COVID-19-like disease are affected by the age composition of the initial infected population.

SEIaImIsRD: A model with varying infectiousness compartments model
COVID-19 is known to evoke a range of symptoms of differing levels of severity across individuals, and individuals who are asymptomatic are thought to play a significant role in driving transmission of COVID-19 [17][18][19]. This group is particularly problematic for transmission, since they are unlikely to change their behaviour whilst infectious, unlike those experiencing more severe infections who may isolate or potentially enter care. Indeed, another reason for stratifying by symptom severity is to be able to forecast the demand for clinical care, such as intensive care unit beds.
As such, we include the SEIaImIsRD model in the package, which considers separately individuals with different degrees of infection severity. This model is described by the following set of ODEs: where , and indicate infectious individuals that are asymptomatic, or have mild or severe symptoms. The proportions of individuals entering each of these compartments is governed by the parameters, and it is assumed that exposed individuals must enter one of these three infectious states, so that + + = 1. The recovery and death rate parameters vary according to the symptom category, and it is assumed that The tutorial for this model provides a detailed explanation of how to calculate 0 for it. We then assess the sensitivity of total deaths in the epidemic to 0 and .

SEIRD_BD: a model with births and natural deaths and waning natural immunity
The current pandemic has raged for about two years meaning that births and natural deaths (i.e. those not caused by  are unlikely to have influenced transmission dynamics. Over longer periods of time, births and natural deaths can lead to replenishment and depletion of susceptibles, altering the level of herd immunity in the population. Considering COVID-19 transmission, it appears that infection confers a protection against reinfection for several months up to around a year post-infection [20][21][22], but this protection is thought to eventually wane. Waning immunity, like births, can also supplement the susceptible population and lead to conditions where COVID-19 can reinvade a population. We include a model incorporating these three processes: Here, > 0 represents the birth rate, which is assumed constant; newborns are assumed not to have maternal immunity, so enter into the susceptible compartment. > 0 represents the rate of natural death which is assumed the same across all compartments. > 0 is the rate at which recovered individuals become susceptible again. Unlike the other models, in the SEIRD_BD model the total population grows over time due to births.
The SEIRD_BD model tutorial shows how waves of infection can repeatedly sweep through the population. It also explains how this model, unlike the SEIRD model, allows an equilibrium where the disease is endemic, and it explores how many daily cases may be expected for COVID-19 at equilibrium.

SEIRD_RU: an urban-rural metapopulation model
Although the pandemic has reached just about every corner of the world, the global population does not behave as a fully well-mixed group of individuals. It is thus important to consider how well different population centres are connected and how this affects the spread of infection. This is often done with a metapopulation model, in which multiple connected communities are considered [23,24]. As a first step towards this, we include a model which consists of two interacting communities, a rural ( ) and an urban ( ) community: where 0 ≤ ≤ 1 is the fraction of the population that lives in an urban environment; = 1 − is the fraction of the population that lives in a rural environment; is the average number of contacts that an urban individual has in a day; is the average number of contacts that a rural individual has in a day; 0 ≤ ≤ 1 is the ''connectedness'' parameter, where = 1 means the whole population is well-mixed, and = 0 means the two communities are isolated from one another.
In this model, individuals do not permanently move to the other community but can interact with individuals in the other community. This should thus be seen as short-term travel, e.g. commuting to work, quick shopping trips, and so on.
Compared to a single-community model, splitting the population into two communities primarily affects how susceptible individuals become infected, i.e. the functional forms in equations d d and d d .
These equations are based on the model employed by Tun et al. [25], where the infectious group that the susceptible group interacts with, which is represented as in a one-community model, now needs to take into consideration the degree to which the communities are separated. If = 1 and the communities are completely well mixed, then we can multiple the average number of contacts that a person in that community has in a day ( + ) by the fraction of those contacts that are infectious ( + ) to give us the number of infectious contacts for a susceptible individual. If the communities are completely separate, i.e. = 0, then a susceptible individual only contacts people in their own community so we multiply the number of contacts an individual has, or , by the fraction of their community that is infectious, ∕ or ∕ . For any level of connectedness , we weight these two scenarios accordingly to obtain the number of infectious contacts for a susceptible individual.
The tutorial for this model shows how it can be parameterised using contact and demographic data, and illustrates the sensitivity of the outputs to the connectedness parameter, . A key point of this vignette is to illustrate how spatial structure generally leads to slower spread of a pathogen through a population compared to spatially homogeneous models. As such, interventions such as travel bans and travel restrictions may reduce connectivity between subgroups of a population and thus can be used to effectively slow disease spread.

SEmIRD: a model with multiple exposed compartments
In the SEIRD model, there is a single exposed compartment, which implies that the latent period follows an exponential distribution. An implication of the exponential distribution is that the rate at which infected individuals become infectious is greatest at the point when they first become infected. This contrasts with laboratory studies, which indicate that there is typically a non-negligible delay between when an individual becomes infected with SARS-CoV-2 and when they become infectious to others [26]. One way to introduce delays in such ODE models is to break up a compartment into multiple subcompartments. In the SEmIRD model, we include a user-specified number of intermediate exposed compartments, 1 , 2 , . . . , resulting in a model structure of the form: The tutorial for this model shows that the model described by Eqs. (7) results in a latent period distribution which is closer to what is observed experimentally.

SEIRDV and SEIRDVAge: models including vaccinations
A key intervention that has been used extensively by governments to curb COVID-19 transmission is mass vaccination campaigns, and, here, we present two different models for understanding their impact: the first is the ''base'' SEIRDV model; the second is an age-structured version of this model (SEIRDVAge). In these models, we add two compartments: one for vaccinated individuals who have not previously been infected ( ); and another for those who have ( ). The SEIRDV model takes the form: Here, we assume that vaccination provides perfect protection against infection while the person remains in the vaccinated compartment, but vaccinated individuals eventually lose immunity and return to susceptible. We also assume all (alive) non-vaccination compartments can be vaccinated with the exception of infectious individuals, whom we assume are symptomatic and do not participate at that time in vaccination campaigns. The maximum rate at which individuals can be vaccinated is given by > 0; the model allows dynamic changes in the vaccination campaign, and this is controlled by a ''vaccination protocol'' function Inter( ) (whose outputs are between 0 and 1) which modulates the intensity of the campaign. In this model, the rate of loss of individuals' immunity to reinfection is controlled by parameters , and for those previously vaccinated, those who have previously been infected and those who were vaccinated after they were infected respectively.
Faced with vaccinating an entire country, governments have chosen to prioritise vaccinations for those who are most vulnerable. One dominant characteristic of these campaigns has been to prioritise vaccinations for the older age groups who are at higher risk of COVID-19 complications [16]. To model age-specific vaccination campaigns, we extend Eqs. (8) to incorporate age-structuring as in Eqs. (3): where Inter ( ) indicates an age-specific vaccination protocol. Tutorials for the SEIRDV and SEIRDVAge models illustrate the importance of mass vaccination campaigns for controlling the current pandemic.

SEIRDNPIAge: a model including non-pharmaceutical interventions
Apart from pharmaceutical interventions such as vaccinations, a host of non-pharmaceutical interventions (NPIs) have been used by governments around the world [27]. These include physical distancing, shielding vulnerable groups (for example, the elderly), school closures, and lockdowns. Different types of NPIs may introduce changes in social contact matrices and/or the transmission rate [28], which leads to the following ODEs representation: (10) In this model, there are two ways to allow a dynamic change in transmission rates: the first is through contact matrices, with elements ( ), which can vary through time -these could, for example, allow for those elderly age groups to isolate themselves for a period of time; the second is through a dynamic transmission rate parameter, ( ), which could, for example, be used to represent the impact of lockdowns when social contacts across all ages are approximated as being reduced by a single common factor.
The tutorial for this model shows how this model can be used to investigate the effect of a lockdown similar to the one instituted in the UK on 23rd March 2020 [2]. In doing so, this illustrates how this NPI can ''flatten'' the epidemic curve.

SEIRD_CT: a model with contact tracing
Contact tracing, testing and isolation are key elements of many governments' responses to the COVID-19 pandemic. In this model, we consider how these types of NPIs can be used to control the spread of COVID-19. The model has the form: The effectiveness of contact tracing and quarantining hinges crucially on the timing of when individuals are isolated relative to their profile of infectiousness. It also depends on the difficulty of identifying infected and/or infectious individuals in the first place. To adequately account for these difficulties, we introduce a number of additions over and above the SEIRD model: since COVID-19 is thought to be transmissible before symptoms appear [29], we include a presymptomatic infectious compartment, ; we also introduce an asymptomatic infectious compartment, (making up a corresponding fraction 0 ≤ ≤ 1 of overall cases). If infected individuals are successfully isolated, they are assumed not to transmit infections from the moment of isolation onwards. They then follow a series of transitions of the same form as unisolated individuals: here, we index the isolated states by a superscript . This model considers two types of NPIs: self isolation upon appearance of symptoms with a proportion 0 ≤ ≤ 1 of symptomatic individuals doing so; and isolation following being traced as having come into contact with an infected individual: due to the difficulty in identifying asymptomatic individuals, only contacts of individuals who go on to be symptomatic cases are deemed traceable. Since the proportion of susceptible individuals forced to isolate is likely small relative to the overall susceptible pool, we do not include a compartment for susceptible individuals who are quarantined. Here, 0 ≤ ≤ 1 determines the proportion of contacts of symptomatic cases that are traced and isolated. This model structure is visualised in Fig. 3.
The tutorial for this model considers the relative effectiveness of these two types of NPIs and investigates their usefulness for controlling COVID-19 epidemics induced by ancestral SARS-CoV-2 and the Delta and Omicron variants.

Package structure
The comomodels package uses an object-oriented approach to design. In particular, it uses R's S4 classes to encapsulate each of the models described in Section 2. These classes allow a user to instantiate an object, corresponding to a particular compartmental model. Operations can then be performed in a consistent manner between different models. For example, a user may set, change or ask for the transmission Fig. 3. Compartmental model structure of the SEIRD_CT model. See 2.9 for a description of this model. parameters underpinning the model; a user may set, change or ask for the initial conditions of the system; they may run the model; and they may ask a model to return its basic reproduction number.
Understanding a model's structure can be made easier if the system's states and flows between them are visualised using a compartmental diagram. In comomodels, we provide this functionality to a user: calling the ode_structure_diagram method plots the compartmental structure (see Fig. 3).
Apart from choosing a model structure, the most important consideration in modelling the spread of any infectious disease is choice of parameter values, since the (typically) non-linear ODE system permits a variety of dynamics, and only a subset of these is epidemiologically realistic. Since this package is specific to COVID-19 modelling, we provide users with some predetermined parameter sets characterising the spread of ancestral SARS-CoV-2 and two of its variants: Delta and Omicron. Whilst there is ample uncertainty in these parameter values (particularly for Omicron), these parameter sets represent reasonable starting points for exploring how the different variants spread. These parameter sets are accessed through the covid_transmission_parameters function, which can, in addition, return age group-specific parameters if the model being parameterised is age-structured. The associated function documentation provides information on the studies and assumptions made in deriving these parameter value choices.
To run the age-structured models and the SEIRD_RU model, further inputs are required: an age-structured model requires contact matrices (see Section 2.2 and Fig. 2) and to set the initial conditions requires fractions of the population in each age group; the urban-rural model requires fractions of individuals living in rural versus urban environments. As part of the package, we provide datasets for each of these elements across a diverse range of countries. On running each model, an R list is returned with two named dataframe objects: ''states'', which contains the dynamic evolution of the state variables of the ODE (for example, for the SEIRD model, there are time series for susceptibles, exposed, infectious, recovered and dead individuals); and ''changes'', which calculate the number of individuals becoming infected or dying on a daily basis. This latter dataframe is particularly useful for comparison with observed epidemiological data. Each of these dataframes is in the long format to facilitate quick and powerful graphing through the ggplot2 package [30]. The dataframes each have column headings: ''time'' (in days), ''compartment'', ''age_range'' and ''value''. If the model is not age-structured, an age range of 0-150 is assigned to each data point. Fig. 4 plots the states from running the age-structured SEIRD model for a parameter set typical for transmission of ancestral SARS-CoV-2 (see the tutorial for this model for more information).
Along with the raw package functionality, the package comes with a series of detailed tutorials 3 -one for each model described in Section 2. 3 Note that these tutorials are different to those described in Section 4.

The ComoModel GUI
To complement the comomodels package, we have also produced an R shiny application [13] as an openly accessible web-based interface to the model. This tool allows users to select from the suite of models within the package, producing an interactive visualisation of the model structure. For each of the submodels, we introduce the model structure through a compartment structure diagram and its full ODE system. To use the application interactively, users first access a target model by selecting a header tab. Then, by either clicking on the corresponding parameter on the diagram or directly navigating through the side bar, the user can tune the model's parameters (see Fig. 5). Additionally, for age-structured models, users can upload social contact matrices from a host of countries [15] and population demography files [31]. Each time a parameter is modified, a simulation is run automatically, and the outputs are interactively visualised using plotly [32]. The model states and the daily incidence and deaths are visualised in separate panels. These simulation results can then be downloaded as comma-separated values (csv) files.

Software engineering approach
The model underpinning the influential Imperial College Report 9 [1] received substantial criticism for its code and software engineering practices [33]. We argue that these criticisms were misplaced given the importance of the public health message which the Imperial Report produced in short time, and we find instructive the following exert from an editorial by Christopher Witty, the Chief Medical Officer for England from the start of the pandemic until current [34]: Since the policy process tends to be very fast, papers must be timely. An 80% right paper before a policy decision is made it is worth ten 95% right papers afterwards, provided the methodological limitations imposed by doing it fast are made clear.
As a silver lining of this criticism, however, there has subsequently been a call for an increased focus on the quality of research software used to direct policy in public health [35,36]. We aimed to create software according to robust software engineering practices, which included collaborative development over GitHub using pull requests with reviews, class and function documentation, pedagogical vignettes (as discussed above), style checking of code, unit testing of individual functions and continuous integration testing.

Results
We used the comomodels package to produce three key lessons in understanding the transmission of COVID-19, which are contained within dynamic R markdown vignettes included as supplementary materials. In this section, we provide a brief description of these.   5. Using comomodels-explore to investigate the SEIRDV submodel. Here, the user selects the parameter (see Section 2.7) by clicking on the ODE structure diagram, which allows them to change this input.

On the numerical solution of compartmental models
All of the models considered in comomodels are systems of ODEs which do not have general analytical solutions. Thus, for any set of parameter values, the equations are solved approximately using numerical methods. In the vignette labelled ''S1'', we first describe a simple approach to solving differential equations using a fixed step size and demonstrate the importance of choosing an appropriate step size for the SEIRD model. In particular, we show that simulations of daily cases and deaths obtained using a daily time step differ significantly from those obtained with smaller steps. Next, we provide a brief overview of adaptive step size solvers and show how these are used in comomodels to obtain accurate solutions with reasonable runtimes. Finally, we illustrate how allowing discontinuous changes in intervention variables, such as vaccination coverage, can cause problems for numerical solution of compartmental models, and we describe approaches to avoid some of these issues.
Overall, these analyses demonstrate the importance of considering inbuilt ODE solvers not as oracular -rather as a set of tools which must be yielded with care and consideration. Of particular import when doing applied modelling is to experiment with the numerical solver step size to investigate if substantial changes to the outputs result when step size is decreased. If so, this signifies that the ODE solution has likely been poorly approximated.

The importance of uncertainty in age-specific contact patterns for quantifying COVID-19 risk
Contact matrices are an important element of age-structured transmission dynamics models, where they are used to dictate intra-and inter-generational mixing (Eq. (3)). In the ''S2'' vignette, we perform a type of sensitivity analysis on this element of the model. To do so, we use the socialmixr package (with POLYMOD contact data [37]) to generate bootstrapped samples of the contact matrix in the United Kingdom. By performing multiple simulations of the SEIRDAge model (once for each contact matrix sample), we propagate the uncertainty in the contact matrix and estimate the corresponding uncertainty in epidemic trajectories and deaths. The results show substantial uncertainty in both of these quantities, motivating the importance of incorporating uncertainty in the contact matrix either when making projections or when performing inference on age-structured models.

The importance of sensitivity analysis and the difficulties of model inference
Epidemiological models allow us to infer key model parameters of the disease from data, such as the infection rate. Using these parameter estimates, the models can be used to predict the trajectory of an ongoing epidemic. The reliability of such predictions, however, rests on (a) whether the model itself is a faithful approximation of reality and (b) whether the data are sufficient to identify parameters.
In the ''S3'' vignette, we show how using limited data, typical of that encountered early on during an epidemic, can lead to considerable uncertainty in key model parameter estimates. In particular, we use COVID-19 case incidence and deaths data for London [38] collected early on during the first wave in 2020 to illustrate that many diverse parameter sets are compatible with the observed data. A consequence of this is that, at that stage in the epidemic, there were many plausible future epidemic trajectories.
To further illustrate this issue with model identifiability, we conduct a profile likelihood analysis [39,40] using simulated data. This analysis highlights the difficulties involved in estimating the parameters of the SEIRD model directly from observed cases and deaths, particularly early on during the epidemic. This means that, in reality, a number of key parameters are typically estimated using other sources of data -for instance, studies of patients admitted to hospital with COVID-19.

Discussion and conclusion
The mathematical models used to inform public health policy for COVID-19 are often large and intricate, due to the inherent complexities of the manifold ways that this epidemic has affected society. Understanding these models can be challenging, particularly for those new to infectious disease modelling. Our software, comomodels, helps people to understand the common modelling motifs that appear in these models: bridging the gap between introductory SEIRD models and more intricate models used for policymaking.
As models become more complex, there is a greater chance of coding errors. Accordingly, we designed the comomodels package using robust software development practices, including unit testing and continuous integration testing.
The package thus far focuses on deterministic compartmental models of COVID-19 transmission dynamics. A number of other modelling frameworks have been used, including stochastic compartmental models (e.g. Ferguson et al. [1]) and stochastic renewal models (e.g. Flaxman et al. [8], Nouvellet et al. [9]). Compartmental models have also been extended spatially, to allow spatial variation in transmission and to explore how inter-and intra-country mobility patterns influence transmission [1,6,41]. In the future, we intend to expand the package to explore further these modelling frameworks.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.