An infectious way to teach students about outbreaks

Highlights
• An updated epidemiological teaching exercise was developed.
• Students participate in an outbreak that they subsequently analyse.
• Data from five years of consecutive student cohorts is presented.
• An R package and practical are developed that improve the pedagogical experience.


Introduction
An understanding of non-linear mechanisms that produce chains of transmission underlying outbreaks and epidemics is central to infectious disease epidemiology. Consequently, outbreak simulation is increasingly used as a teaching tool. Approaches to simulating real-life epidemiological data range from simple (e.g. rolling a die) to complex (e.g. electronic barcode scanning), resulting in the generation of datasets that can then be analysed for epidemiological parameters of interest (Moore, 2017; Hayward, 2017).
Several recent infectious disease outbreaks have received much attention owing to their potentially high impact on public health. For instance, the Middle East Respiratory Syndrome Coronavirus (MERS-CoV), first reported in Saudi Arabia in 2012, currently circulates at a sub-critical level causing sporadic outbreaks (Cauchemez et al., 2014; Cauchemez et al., 2016). The West African Ebola epidemic caused over 28,600 cases between 2014 and 2016 (WHO Ebola Response Team et al., 2016) and its subsequent decline coincided with the emergence of Zika virus transmission in Brazil (Cugola et al., 2016). Following an outbreak of pathogens such as these, initial investigations include estimating the potential for onward transmission (i.e. the basic reproductive number, R0 (Anderson and May, 1992)) and understanding key epidemiological features to inform possible control policies. Such epidemiological studies have been crucial to understanding and managing the transmission dynamics of important outbreaks such as Severe Acute Respiratory Syndrome (SARS) in 2003, swine flu in 2009, and more recently for MERS, Ebola and Zika (Cauchemez et al., 2014; Riley et al., 2003; Cauchemez et al., 2009; WHO Ebola Response Team, 2014).
Outbreak exercises are an established tool for field epidemiology training. Often employed by national and international public health agencies, different case studies such as foodborne outbreaks are used to train epidemiologists to track down the aetiological agent and then apply appropriate control measures (CDC, 2017a; Public Health England, 2017). Case studies of infectious disease outbreaks are used in classroom teaching to introduce students to key epidemiological concepts such as the incubation period, serial interval, attack rate, and the basic reproductive number (R0) (Ponder and Sumner, 2009; Betancourt-Bethencourt et al., 2016). Other outbreak exercises are designed to train individuals as case investigation interviewers to increase the public health surge capacity (Gebbie et al., 2007). In addition, virtual exercises and complex computer simulations which incorporate network models have been used as teaching tools (Huang et al., 2010; Neulight et al., 2007; Hsieh et al., 2006). Furthermore, there are now numerous interactive online teaching resources that allow the user to 'solve' outbreaks by playing the epidemiologist (Barber and Stark, 2015; CDC, 2017b; Center for Technology in Teaching and Learning, Rice University, 2017). The University of Cambridge provides a selection of online games where the aim of the game is to infect as many people as possible using the characteristics of a specific infectious disease such as influenza (Cambridge Infectious Diseases, 2017). More widely, the potential for virtual role-playing games, which exhibit social dynamics, economies and player objectives resulting in complex networks, to be used as model case studies has also been considered (Castronova, 2003; Balicer, 2007).
However, there are few training modules that explicitly use real-time simulated stochastic outbreaks where the participants themselves comprise the dataset they subsequently analyse (Bellan et al., 2012; Darwiche and Bokor, 2016). Participating in a real-time simulated outbreak and analysing the resulting data reinforces important concepts and gives participants insight into the process of infectious disease transmission. To meet this need, Bellan et al. developed a pedagogical approach to bridge the divide between classical and dynamical epidemiology (Bellan et al., 2012). This approach involved real-time simulation of a stochastic outbreak among participants, who ranged from university undergraduate students to academic professors, attending an epidemiology training course. We adapted the approach developed by Bellan et al. and used it as a teaching tool for postgraduate epidemiology students by simulating a five-day outbreak among five consecutive student cohorts (2012-2016 inclusive). Here we describe how we conducted the outbreak simulation and how the data are analysed. We provide the teaching materials for others to adapt and use in their own settings.

Materials and methods
A stochastic outbreak was simulated over the course of five days. The outbreak was simulated among postgraduate students in five consecutive cohorts (2012-2016 inclusive) of the MSc in Epidemiology at Imperial College London. The infectious agent was represented by a paper form (see Supplementary Material). Students acquired "dide-disease" (an imaginary infection) by receiving a form from a fellow student and transmitted infection by handing out new forms.
The outbreak was seeded on day one by discreetly handing an infection form to three students at random. These paper 'inocula' contained a list of instructions. Upon receiving the infection form, each student sent an e-mail to the dataset curator to notify them that they had been infected. The number of new infections to be transmitted was determined by a random draw from a Poisson distribution with a mean of 1.8, corresponding to the basic reproductive number for the outbreak. Students then printed out the required number of infection forms, located on a shared drive. Students were instructed to do all onward transmitting within 48 h. Whether or not symptoms occurred was determined by a random draw from a Binomial distribution with a mean of 0.8 (i.e. a single trial with a 0.8 probability of symptoms). Symptoms were also reported to the dataset curator via e-mail. Each infected student listed on their infection form the names of those to whom they had transmitted and the time of transmission. Students recovered by placing their form in a 'recovery box' in their classroom. Students were instructed to place their form in the recovery box immediately after giving out all their infection forms. Infection was assumed to confer lifelong immunity.
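The two random draws described above can be illustrated with a minimal sketch. It assumes the draws are made once per infection form; the function names and the dictionary layout are hypothetical and are not taken from the teaching materials. Only the Poisson mean (1.8) and the symptom probability (0.8) come from the text.

```python
import math
import random

R0 = 1.8          # mean number of onward infection forms (from the text)
P_SYMPTOMS = 0.8  # probability that an infected student shows symptoms (from the text)


def draw_poisson(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_infection_form(rng):
    """Return the two random outcomes for one hypothetical infection form."""
    return {
        "n_forms_to_hand_out": draw_poisson(R0, rng),
        "symptomatic": rng.random() < P_SYMPTOMS,
    }


rng = random.Random(2012)  # arbitrary seed so that the sketch is reproducible
forms = [draw_infection_form(rng) for _ in range(10_000)]
mean_forms = sum(f["n_forms_to_hand_out"] for f in forms) / len(forms)
prop_symptomatic = sum(f["symptomatic"] for f in forms) / len(forms)
print(f"mean onward forms: {mean_forms:.2f}, symptomatic: {prop_symptomatic:.2f}")
```

Over many draws the average number of forms should approach 1.8 and the symptomatic proportion 0.8, although any single classroom outbreak will deviate from these values by chance.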
The data were collated in spreadsheet software (MS Excel®, Microsoft Corporation) and analysed by the students in a practical session. During the practical, students plot the time series of the outbreak and calculate the basic and effective reproductive numbers as well as other key epidemiological parameters such as the latent, incubation and infectious periods. Calculating these and other properties, such as the proportion of infections transmitted asymptomatically, permits discussion regarding ease of control of "dide-disease". The infection form, practical handout and dataset templates are provided for others to adapt for their own use (see Supplementary Information).
When calculating the effective reproductive number (Rt), we use the day of acquisition, regardless of the date of onward transmission. Two different cumulative attack rates are calculated: an attack rate based on whether students were infected at all, and a clinical attack rate based only on whether students displayed symptoms. We exclude the seeds when calculating the cumulative attack rate, as the aim is to describe the process in the population of interest rather than the external process that triggered the outbreak via the seeds. Given that the times of acquisition, onset of symptoms (if any), onward transmission and recovery are documented, the latent, infectious and incubation periods can easily be calculated. When estimating the generation time, we use only successful transmissions, rather than all attempted transmissions, as this is the closest analogue to what is observed in real outbreaks. However, in real outbreaks it is the serial interval, rather than the generation time, that is typically observed.
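These quantities can be sketched on a toy line list. The records, field names and class size below are hypothetical. The sketch also makes three assumptions that the text does not state: the latent period is taken to end at the first successful onward transmission, the infectious period runs from that time to recovery, and the attack rate denominator is the class size minus the seeds.

```python
from collections import defaultdict
from statistics import mean

CLASS_SIZE = 10  # hypothetical class size for this toy example

# Hypothetical line list; times are hours since the start of the outbreak.
# "source" is None for seeds; "t_symptoms" is None for asymptomatic cases.
cases = [
    {"id": "A", "source": None, "t_acquired": 0, "t_symptoms": 4, "t_recovered": 30},
    {"id": "B", "source": "A", "t_acquired": 6, "t_symptoms": None, "t_recovered": 50},
    {"id": "C", "source": "A", "t_acquired": 26, "t_symptoms": 30, "t_recovered": 52},
    {"id": "D", "source": "B", "t_acquired": 48, "t_symptoms": 50, "t_recovered": 60},
]
by_id = {c["id"]: c for c in cases}
non_seeds = [c for c in cases if c["source"] is not None]

# Generation time: infectee's acquisition time minus infector's acquisition time,
# using successful transmissions only.
generation_times = [c["t_acquired"] - by_id[c["source"]]["t_acquired"] for c in non_seeds]

# Incubation period: acquisition to symptom onset (symptomatic cases only).
incubation = [c["t_symptoms"] - c["t_acquired"] for c in cases if c["t_symptoms"] is not None]

# Assumed definitions: latent period ends at the first successful onward
# transmission; infectious period runs from then until recovery.
first_transmission = {}
for c in non_seeds:
    src = c["source"]
    first_transmission[src] = min(first_transmission.get(src, c["t_acquired"]), c["t_acquired"])
latent = {i: t - by_id[i]["t_acquired"] for i, t in first_transmission.items()}
infectious = {i: by_id[i]["t_recovered"] - t for i, t in first_transmission.items()}

# Effective reproductive number, grouped by the infector's day of acquisition.
secondary = defaultdict(int)
for c in non_seeds:
    secondary[c["source"]] += 1
per_day = defaultdict(list)
for c in cases:
    per_day[c["t_acquired"] // 24].append(secondary[c["id"]])
r_t = {day: mean(counts) for day, counts in sorted(per_day.items())}

# Cumulative attack rates, excluding seeds from numerator and denominator.
denominator = CLASS_SIZE - (len(cases) - len(non_seeds))
attack_rate = len(non_seeds) / denominator
clinical_attack_rate = sum(c["t_symptoms"] is not None for c in non_seeds) / denominator

print(generation_times, incubation, latent, infectious, r_t)
print(round(attack_rate, 3), round(clinical_attack_rate, 3))
```

Grouping by the infector's day of acquisition means that a student infected on day one who transmits on day two still contributes to the day-one estimate of Rt.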
For visualisation, analysis and simulation of the outbreak data, an R package and accompanying tutorial were developed and are freely available at https://github.com/mrc-ide/outbreakteachR. The temporal dynamics of the outbreak's transmission network were animated at discrete intervals. Where contact tracing data were missing, the most probable source of an infection was imputed. Data imputation in this way first identified potential source individuals who were infectious at the time of the individual's infection. The individuals most likely to have caused an additional infection were then extracted. This was based on how many onward infections they had already caused and the basic reproductive number for the outbreak, given by a Poisson distribution with mean 1.8. From these individuals a random source individual was chosen.
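The imputation steps can be sketched as follows. This is an illustrative Python reading of the description, not the code of the R package: the function names and record fields are hypothetical, and scoring each candidate by the Poisson probability of having caused more infections than are already attributed to them is an assumption about how "most likely" is operationalised.

```python
import math
import random

R0 = 1.8  # mean of the Poisson offspring distribution (from the text)


def prob_more_than(k, mean=R0):
    """P(X > k) for X ~ Poisson(mean): chance of at least one further infection."""
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf


def impute_source(t_infection, candidates, rng):
    """Pick a plausible source for an infection acquired at t_infection."""
    # Step 1: keep only individuals who were infectious at the time of infection.
    infectious = [
        c for c in candidates
        if c["t_infectious_from"] <= t_infection <= c["t_recovered"]
    ]
    if not infectious:
        return None
    # Step 2: score each by the probability of having caused an additional infection.
    scores = {c["id"]: prob_more_than(c["n_known_infections"]) for c in infectious}
    best = max(scores.values())
    most_likely = [cid for cid, s in scores.items() if math.isclose(s, best)]
    # Step 3: choose at random among the most likely sources.
    return rng.choice(most_likely)


# Hypothetical candidates; times are hours since the start of the outbreak.
candidates = [
    {"id": "A", "t_infectious_from": 0, "t_recovered": 30, "n_known_infections": 3},
    {"id": "B", "t_infectious_from": 6, "t_recovered": 50, "n_known_infections": 1},
    {"id": "C", "t_infectious_from": 26, "t_recovered": 52, "n_known_infections": 1},
    {"id": "D", "t_infectious_from": 48, "t_recovered": 60, "n_known_infections": 0},
]
imputed = impute_source(28, candidates, random.Random(0))
print(imputed)
```

In this toy example D is not yet infectious at hour 28 and A already has three known infections, so the source is drawn at random from B and C.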
For comparison, several outbreaks with identical initial conditions were simulated. Each simulation was seeded with three infections, the population size was set equal to the students' class size and the outbreak was simulated for five days. In common with the paper-based outbreak, the number of secondary infections was drawn from a Poisson distribution with a mean of 1.8. The generation time and recovery period were sampled in two ways: (i) by sampling from a Poisson distribution with a mean equal to the observed mean generation time and recovery period respectively, (ii) by sampling from the observed distribution of generation times and recovery periods.
For each of the five years, 2000 outbreaks were simulated using a discrete-time approach with hourly intervals in line with the students' timetable. The simulated time-series was then compared to the observed time-series.
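A simplified version of such a simulation is sketched below. It is not the implementation in the R package. It assumes random mixing, so a contact succeeds with probability equal to the fraction of classmates still susceptible, and it tracks only acquisition times, omitting recovery. The generation times are hypothetical values in hours; the class size of 84 is the 2016 figure reported in the Results.

```python
import heapq
import math
import random


def draw_poisson(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_outbreak(class_size, generation_times, rng, empirical=True,
                      n_seeds=3, r0=1.8, horizon_hours=120):
    """Return the acquisition hours of all infected individuals, seeds included."""
    mean_gt = sum(generation_times) / len(generation_times)
    susceptible = class_size - n_seeds
    acquired = []
    contacts = []  # min-heap of hours at which an infectious contact is made

    def infect(t):
        acquired.append(t)
        for _ in range(draw_poisson(r0, rng)):
            # (ii) resample an observed generation time, or (i) draw from a
            # Poisson distribution with the observed mean.
            delay = rng.choice(generation_times) if empirical else draw_poisson(mean_gt, rng)
            if t + delay <= horizon_hours:
                heapq.heappush(contacts, t + delay)

    for _ in range(n_seeds):
        infect(0)
    while contacts:
        t = heapq.heappop(contacts)
        # Assumed random mixing: the contact is a random classmate and the
        # infection succeeds only if that classmate is still susceptible.
        if rng.random() < susceptible / (class_size - 1):
            susceptible -= 1
            infect(t)
    return acquired


rng = random.Random(1)
observed_gt = [2, 3, 4, 24, 26, 27, 48]  # hypothetical generation times (hours)
runs = [simulate_outbreak(84, observed_gt, rng) for _ in range(2000)]
final_sizes = [len(r) for r in runs]
print(f"mean final size over {len(runs)} runs: {sum(final_sizes) / len(final_sizes):.1f}")
```

Processing contacts in time order through a heap ensures that the susceptible fraction used for each contact reflects the infections that have already occurred, which is what produces the saturation effect.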

Results
Data for five outbreaks are presented (2012-2016) and together illustrate clearly how, despite key parameters being constant, stochastic effects and differences in behaviour from year to year affect the outcome of the outbreak. For each year, identical starting conditions were used (i.e. the outbreak was seeded with three infections and an R0 of 1.8 was assumed). The class size increased over time from 56 students in 2012 to 84 in 2016. Each outbreak showed different temporal transmission dynamics (Fig. 1); however, each outbreak could be broadly summarised as having one major chain of transmission resulting from one of the three seeds. A strong time effect was observed within each outbreak, with the majority of transmission observed within short windows of time each day when frequent contacts occurred. This is highlighted within the visualisation of the outbreak, which shows long periods of inactivity (Supplementary Movies). This effect can be seen in the strongly multimodal distributions describing the students' behaviour (Fig. 2). There was substantial variation in the cumulative attack rate, with between 26% and 83% of students uninfected at the end of each outbreak. Saturation effects were also observed in each year, with an overall decline in the proportion of infectious contacts that led to successful infections observed from day 1 to day 5, except in 2013, reflecting the observed low cumulative attack rate for that year.
Computer-simulations of a process similar to the classroom outbreak reveal potential explanations for some of the differences between years. For the 2016 outbreak, the number of infected individuals peaked at 28 on day 2 (Fig. 3a). Simulated numbers of susceptible, infected and recovered individuals broadly capture the magnitude of the outbreak. However, the simulated curves are delayed by approximately two days when the generation and recovery times are sampled from a Poisson distribution (Fig. 3b).
Sampling from the observed distributions provides an improved representation, with the diurnal patterns of the outbreak being better captured by the simulated mean, though a discrepancy remains (Fig. 3c). Subsequent comparisons between the simulated outbreaks and the observed outbreaks were consequently made with simulations using the empirical re-sampling method (Fig. 4).
Simulations captured the observed outbreak in 2012 well. However, simulations proceeded too quickly for 2013, 2014 and 2015, and failed to accommodate the slow transmission dynamics over the first 3 days followed by bursts of transmission in the final 2 days. In contrast, the simulation method lagged the 2016 outbreak and failed to capture the fast transmission dynamics on the first day.

Discussion
We successfully simulated an outbreak of an infectious agent among a class of epidemiology and public health students. Working with a perfectly observed outbreak allowed students to estimate directly key quantities such as the reproductive number (see Supplementary Fig. 1), generation time, and latent, incubation and infectious periods. In addition, key concepts including analysis of transmission networks and how this relates to the data time-series obtained from an outbreak of infection were explored. Comparing each of the five outbreaks provided students with an awareness of stochastic effects that distinguish outbreaks amongst years.
Simulating an infectious disease outbreak in the classroom is an engaging exercise as students actively participate and comprise the dataset they subsequently analyse. Analysing the outbreak data, in both Excel and R, is a useful teaching exercise as it reinforces definitions previously covered in lectures. Active learning has been found to be more effective than traditional exposition-centered ("teaching by telling") approaches. Active learning can take many forms and has been defined as "engaging students in the process of learning through activities and/or discussion in class. It emphasizes higher-order thinking and often involves group work" (Freeman et al., 2014). A recent meta-analysis of 225 studies found that use of active learning in undergraduate science, technology, engineering and maths courses improves exam performance (Freeman et al., 2014). Active learning was also found to be particularly beneficial at improving performance on "concept inventories". Thus, our constructivist approach, whereby students experience and then reflect, lends itself well to this particular teaching exercise, as the objective is to reinforce concepts and definitions.
Panel: How to run the outbreak exercise
Í. Cremin et al. Epidemics 23 (2018)
• A location to store 'recovered' infection forms, or use an online form
Practical tips for running the outbreak:
• Finalise a class list of all participating individuals before seeding the outbreak
• Emphasise that not everyone in the class will get infected
• Students need to make sure to put their name on the form and the names of those they have infected

Over the five years, the class size increased; however, this did not appear to affect the ability of an outbreak to be sustained, with the smallest and largest class sizes (2012 and 2016 respectively) yielding the highest cumulative attack rates. Importantly, no outbreak experienced stochastic fadeout during its five days, with new infections still occurring on day five in all years. Overall, the outbreaks presented with short incubation and infectious periods, but comparatively longer and more variable latent periods. The increased variance in the latent period, and subsequently the generation times, was typified by the strongly diurnal patterns in the behaviour of the students, with most events occurring within their timetable. It is interesting to observe the decline in transmission exhibited on the third day of the 2012 and 2016 outbreaks, which could reflect the absence of afternoon classes on Wednesday. However, it is hard to be definitive about this effect, as the outbreaks in 2013-2015 exhibited low levels of transmission over the first three days.
The human simulation is different from a computer simulation of the same system. Sampling the generation time and recovery time directly from the observed distributions, rather than sampling from a Poisson distribution with a mean based on the observed data, markedly improves the closeness of the simulated outbreak to the observed outbreak. This effect can be explained by the multi-modal nature of the distributions seen within the outbreaks, reflecting the strong diurnal time-effects observed, which weakens the suitability of a unimodal distribution such as the Poisson distribution.
Although the simulated outbreaks largely failed to capture the 2013-2016 outbreaks, we are able to use these discrepancies as a teaching exercise, both to highlight the weaknesses of the mathematical model we used and to highlight phenomena within the observed outbreaks that are typical of real outbreaks. Firstly, the simulation lagged behind the observed outbreak in 2016, due to the large amount of transmission that occurred on day 1 (see Supplementary Fig. 2). The lengthening generation times in 2016 could represent a degree of waning enthusiasm from the students, which would invalidate the time-independent sampling used within the simulations. Secondly, the long latent times exhibited at the beginning of the 2013-2015 outbreaks, combined with a burst of shorter generation times at the end of these outbreaks, yield another time-dependent effect upon the distribution of generation times. These effects, combined with the lower population mean number of secondary infections observed in these years, offer a potential explanation for the poor predictions afforded by the simulations. Additionally, these effects demonstrate a teaching point concerning the potential for time-dependent effects to occur in outbreaks, representative of real outbreaks (Chowell et al., 2015; Nishiura, 2010), and the difficulties they present in mathematical modelling. The slower transmission in 2013-2015 is illustrative of commonly observed stuttering chains of transmission that occur prior to sustained chains of transmission (Dibble et al., 2016).
This teaching exercise provides a flexible template which can easily be modified for other audiences, such as public-health professionals and policy makers. For example, a more advanced class with prior experience of epidemiological modelling could calculate clustering coefficients and path lengths to assess deviation from a random mixing assumption. For a larger class, students could calculate R0 from the final class size and growth rate, and compare to the observed R0. The outbreak itself could be adapted to run over a shorter or longer time period (by altering R0 or the time in which onward transmission should occur) and could also be elaborated by looking at the impact of interventions to reduce transmission, such as vaccination, whereby some students are immune to infection. As noted by Bellan et al., the outbreak could be elaborated to include multiple pathogen strains with different properties (such as infectiousness, symptomatic proportion and infectious period).
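As one possible illustration of the larger-class extension, the sketch below uses two standard textbook relations for a simple SIR model with random mixing. Neither is taken from the exercise materials: the final-size relation R0 = -ln(1 - z)/z, where z is the proportion of an initially susceptible population that is eventually infected, and the growth-rate relation R0 = 1 + rT, where r is the early exponential growth rate and T is the mean generation time, assumed here to be exponentially distributed. The function names are hypothetical.

```python
import math


def r0_from_final_size(attack_rate):
    """Estimate R0 from the final proportion infected, z, via R0 = -ln(1 - z) / z."""
    if not 0 < attack_rate < 1:
        raise ValueError("attack_rate must lie strictly between 0 and 1")
    return -math.log(1 - attack_rate) / attack_rate


def r0_from_growth_rate(growth_rate, mean_generation_time):
    """Estimate R0 = 1 + r * T, assuming exponentially distributed generation times."""
    return 1 + growth_rate * mean_generation_time


# Illustrative inputs only; these are not estimates from the classroom data.
r0_size = r0_from_final_size(0.5)         # half of the class eventually infected
r0_growth = r0_from_growth_rate(0.4, 2)   # r = 0.4 per day, T = 2 days
print(f"final-size estimate: {r0_size:.3f}, growth-rate estimate: {r0_growth:.1f}")
```

Both relations assume a large, randomly mixing population, so in a class of fewer than 100 students with strongly diurnal contact patterns they are better treated as discussion points than as reliable estimators.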
The data collected over five years highlighted a number of limitations within the methodology of the practical. Firstly, although the paper 'inocula' form was largely well completed by students, some data were missing. A large proportion of the missing information could be imputed once all the data were collected, but there is no guarantee that participants will always fill the form in fully or accurately. This represents a key issue in data collection (Owada et al., 2016), one which is often addressed with the design of more intuitive and robust surveillance tools and software. To address this, an online form, utilising both automated data fields and required data entry fields, has been designed for use in the future as an alternative to the paper form (see Supplementary Form). It is hoped that this will also strengthen participant engagement and will help prevent students 'forgetting' about the outbreak, which appeared to occur on a few occasions when observed latent periods were greater than 48 hours (see Supplementary Fig. 3). Secondly, the initial practical spreadsheet was designed to emphasise how epidemiological quantities such as the generation time and latent periods are calculated, as well as for plotting the outbreak time series. While this was helpful for introducing these calculations, we found that this analysis was overly time consuming and reduced the ability to explore additional epidemiological concepts such as the cumulative attack rate and saturation effects. Consequently, we redesigned the spreadsheet to automate specific sections of the analysis, thereby improving the pedagogical experience (see Supplementary Datasets).
In conclusion, we present five years of data in response to the approach presented by Bellan et al. (Bellan et al., 2012). Furthermore, we have introduced several new features to extend the pedagogical insights afforded by this approach, in particular the incorporation of a networks component, and the focus on comparing modelled results to the data itself in order to demonstrate both the strength and potential limitations of mathematical models. The approach could readily be modified for other settings, for example over a longer or shorter period of time and is a useful method to engage students and reinforce key epidemiological concepts in a "real-world" setting. In addition, the R package is freely available for extension and adaptation to consider alternative epidemiological questions and analysis and provides a framework to introduce discrete-time models.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflicts of interest
None.