Using Bayes' Rule to Define the Value of Evidence from Syndromic Surveillance

In this work we propose the adoption of a statistical framework used in the evaluation of forensic evidence as a tool for evaluating and presenting circumstantial “evidence” of a disease outbreak from syndromic surveillance. The basic idea is to exploit the predicted distributions of reported cases to calculate the ratio of the likelihood of observing n cases given an ongoing outbreak to the likelihood of observing n cases given no outbreak. This likelihood ratio defines the value of evidence (V). Using Bayes' rule, the prior odds of an ongoing outbreak are multiplied by V to obtain the posterior odds. The approach was applied to time series on the number of horses showing clinical respiratory or neurological symptoms. The separation between prior beliefs about the probability of an outbreak and the strength of evidence from syndromic surveillance offers a transparent reasoning process suitable for supporting decision makers. The value of evidence can be translated into a verbal statement, as is often done in forensics, or used to produce risk maps. Furthermore, a Bayesian approach offers seamless integration of data from syndromic surveillance with results from predictive modeling and with information from other sources such as disease introduction risk assessments.


Introduction
Syndromic surveillance appeared in the late 1990s and is becoming increasingly popular across a wide range of human public health applications, such as seasonal disease surveillance [1] and digital disease surveillance [2]. The wider acceptance of the ''One Health'' concept [3] amongst public health practitioners has led to an increased exchange of methodologies and disease control knowledge between human and veterinary medicine. Over the last five years, researchers in veterinary medicine have been investigating the application of syndromic surveillance methods for the early detection of zoonotic and non-zoonotic diseases [4].
There is no unique definition of ''syndromic surveillance'', but it is commonly accepted that it focuses on data collected prior to clinical diagnosis or laboratory confirmation [5,6]. It is therefore based on non-specific health indicators, which results in a surveillance system with low specificity but allows the early detection of outbreaks without a priori assumptions. This constitutes a major advantage over traditional approaches, which focus on a disease, or a list of reportable diseases, and rely on the ability of clinicians to correctly diagnose cases, which may be difficult when faced with a rare or emerging disease [4]. Moreover, the systematic and continuous data collection and analysis processes reduce the impact of the chronic under-reporting observed in classical passive surveillance systems and also increase the sensitivity of the method [4]. Syndromic surveillance does not replace traditional approaches to disease monitoring (e.g. risk-based, active, etc.) but is seen as a complementary tool for outbreak detection with low specificity but better sensitivity and timeliness [7].
Current approaches used in syndromic surveillance first seek to define the normal properties of the syndrome time-series when no outbreak of disease is recorded [4,6] in order to be able to detect abnormal events overlaid on top of the background noise during an outbreak situation. In traditional aberration detection methods, an alarm goes off when the observed data exceed the expected values from the population [4,6]. Such algorithms have an epidemic threshold and provide a yes/no qualitative output: ''No, there is no outbreak'' or ''Yes, something unusual is happening in the population''.
This black-or-white view of the health of the population of concern is simple, but it may not always be adequate or useful for decision makers, who often find themselves in grey areas (indicator values close to the epidemic threshold). Moreover, a binary result can be difficult to combine with other epidemiological knowledge, such as the probability of disease introduction or other complex parameters that influence decision making [8]. The development of quantitative syndromic surveillance outputs, which are more objective, flexible and easily interpretable, is a promising area of research.
The art of presenting scientific evidence to decision makers has been most extensively studied in the forensic sciences, in which legal certainty requires statements that clearly specify how strong the evidence for or against a hypothesis is and how the expert reached that conclusion. In recent years, the state of the art in forensic interpretation has been to evaluate forensic evidence using likelihood ratios in the framework of Bayesian hypothesis testing. Within this framework, the analyst evaluates the extent to which results from forensic investigations speak in favor of the prosecutor's or the defendant's hypothesis [9,10]. The Bayesian approach has been applied to a wide range of forensic problems, including evidence based on DNA analysis [10], mass spectroscopy [11], transfer of glass, fibers and paint [10] and microbial counts [12]. However, although initially developed for the legal system, the approach has been identified as useful for supporting decision making in other situations, such as the tracing of Salmonella spp. [13].
The aim of this study is to test the applicability of the Bayesian likelihood ratio framework to the early detection of outbreaks in a syndromic surveillance system. The transferability of the method is demonstrated using two examples based on real data from RESPE, the French surveillance network on equine diseases. The first example makes use of data on French horses presenting nervous symptoms (NeurSy) and aims to test the ability of our approach to detect simulated outbreaks of an exotic disease, West Nile virus (WNV). West Nile disease is an important zoonotic disease, and syndromic surveillance applied to horses could be used as an early warning system to protect the human population [14]. The second example focuses on data on French horses with respiratory symptoms (RespSy) and is used to detect outbreaks of divergent strains of equine influenza (New-Influenza), a non-zoonotic disease leading to vaccine failure [15][16][17][18].

Background theory and proposed framework
Forensic evaluation of evidence is based on Bayesian hypothesis testing. In a syndromic surveillance context, this means that, in a particular week, two mutually exclusive hypotheses should be evaluated, for example: H1 ''There is an ongoing outbreak of disease x'' and H0 ''There is NOT an ongoing outbreak of disease x''. Without any extra information, the relative probability of the two hypotheses may be expressed as the a priori odds:

O_pri = P(H1) / P(H0)   (Eq. 1)

where P(H1) is the a priori probability of hypothesis H1, typically the probability of an ongoing outbreak of the disease of interest in a particular region, and P(H0) is the a priori probability of the complementary hypothesis H0, typically the probability that an outbreak is NOT going on.
In other words, the a priori odds define our prior belief about the disease status in the region. In a typical situation, the prior odds would be low (e.g. 1:1000), but under some circumstances they might be higher (e.g. if an outbreak is ongoing in a neighboring country). When we are presented with evidence (E) of some kind pointing in favor of (or against) H1, this will make us update our belief. This posterior belief is expressed as the a posteriori odds:

O_post = P(H1|E) / P(H0|E)   (Eq. 2)

where P(H1|E) is the probability of hypothesis H1 given the evidence (E), and P(H0|E) is the probability of hypothesis H0 given the evidence (E).
In syndromic surveillance, the evidence (E) is typically the number of reported suspected cases in a given time period. The degree to which the posterior belief differs from the prior belief will depend on the strength of the evidence. If the evidence is weak, the posterior odds will be similar to the prior odds, whereas strong evidence in favor of H1 would result in posterior odds much higher than the prior odds. At this point, it is important to note that the hypothesis to evaluate (H1) may differ and that the interpretation of the same piece of evidence depends on the choice of H1. For example, 10 reported cases of a syndrome in horses may be strong evidence that something unusual is going on if these are nervous cases (H1 = ''ongoing outbreak of some nervous disease, e.g. WNV''), but only weak evidence in favor of equine influenza in the case of a respiratory syndrome (H1 = ''ongoing outbreak of equine influenza''), since in the latter case we would have expected far more reported cases.
This intuitive reasoning can be formalized by the application of Bayes' theorem:

P(H1|E) / P(H0|E) = [P(E|H1) / P(E|H0)] × [P(H1) / P(H0)]   (Eq. 3)

where E is the number of reported cases of a syndrome in the particular week, P(E|H1) is the probability of observing the evidence (E) given that H1 is true, and P(E|H0) is the probability of observing the evidence (E) given that H0 is true. In order to estimate P(E|H1) and P(E|H0), we need information on the probability distributions of the number of reported cases in the non-outbreak and outbreak situations. The probability of observing n cases given that H0 is true can be estimated by statistical modeling of baseline data [19]. When the cases are independent (i.e. not clustered), the data can be modeled using a general dynamic Poisson model [19]. When cases are clustered (overdispersion), the Poisson model will underestimate the probability of observing very high or very low numbers of cases; in such cases, the data can be modeled by continuous mixtures of the Poisson distribution, including the negative binomial (NB) distribution or the Poisson-log-normal (PLN) distribution [19].
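To illustrate why overdispersion matters here, the following sketch (Python/scipy standing in for the paper's R; the baseline mean and theta are illustrative assumptions, not the paper's fits) compares the probability of an extreme weekly count under a Poisson baseline and an NB baseline with the same mean:

```python
from scipy.stats import nbinom, poisson

mu = 5.0  # assumed baseline mean of 5 reported cases/week (illustrative)

# Poisson baseline: variance equals the mean.
p_pois = poisson.sf(14, mu)  # P(X >= 15)

# NB baseline with the same mean but overdispersion (theta = 2, assumed).
# scipy parametrizes the NB as (n, p) with n = theta, p = theta / (theta + mu).
theta = 2.0
p = theta / (theta + mu)
p_nb = nbinom.sf(14, theta, p)  # P(X >= 15)

print(f"P(X >= 15), Poisson baseline: {p_pois:.2e}")
print(f"P(X >= 15), NB baseline:      {p_nb:.2e}")
```

The NB model assigns a far larger probability to extreme counts, which is why a Poisson model fitted to clustered data overstates the evidence in favor of an outbreak.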
The probability of E (observation of n cases) during an outbreak is calculated as:

P(E = n|H1) = Σ_{i=0}^{n} P_base(n − i) × P_out(i)   (Eq. 4)

where P_base(i) is the probability of drawing i cases from the baseline distribution (e.g. Poisson(λ) or NB(mu = mu_base, size = theta_base)) and P_out(i) is the probability of drawing i cases from the outbreak distribution (e.g. NB(mu = mu_out, size = theta_out)). The outbreak distribution may be estimated by fitting an appropriate probability distribution to data from historical outbreaks. In the absence of data, the outbreak distribution may be defined based on expert knowledge about the disease in question or on assumptions about the distribution of a new disease. In most cases there will be large uncertainty about the shape of the outbreak distribution.
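This convolution of the baseline and outbreak distributions can be sketched as follows (Python/scipy rather than the paper's R; mu_base is an assumed baseline mean, while the outbreak parameters echo the NeurSy fit reported later in the paper):

```python
from scipy.stats import nbinom, poisson

def p_outbreak(n, mu_base, mu_out, theta_out):
    """P(E = n | H1): the total weekly count is the sum of baseline cases
    and outbreak-related cases, so its probability is the convolution of
    the baseline (Poisson) and outbreak (NB) distributions."""
    p_out = theta_out / (theta_out + mu_out)  # scipy's NB parametrization
    return sum(poisson.pmf(n - i, mu_base) * nbinom.pmf(i, theta_out, p_out)
               for i in range(n + 1))

# Illustrative baseline mean; outbreak parameters as in the NeurSy case study.
prob = p_outbreak(10, mu_base=5.0, mu_out=4.45, theta_out=0.94)
print(f"P(E = 10 | H1) = {prob:.4f}")
```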
The next step is to estimate the probability of observing the evidence (E), that is, the actual number of reported cases. In forensics, the value of evidence (V) is defined as the ratio between the posterior and prior odds for H1 versus H0. The value of evidence (Fig. 1, line Log(V)) can be calculated from the two distributions by dividing the probabilities for each number of observed cases using Equation 5:

V = P(E|H1) / P(E|H0)   (Eq. 5)

As illustrated in Fig. 1, the value of evidence will depend on the assumptions about the outbreak. In examples A to D, 10 cases are reported from a region where the baseline is around 5 cases per week. If an outbreak is expected to be small, resulting in only a small number of extra cases, 10 reported cases would speak in favor of an outbreak (Fig. 1, A, C). If, on the other hand, the disease(s) of interest are expected to yield a relatively large number of cases, the evidence would speak against an outbreak (Fig. 1, B, D).
In addition, the strength of the evidence will depend on the precision of the estimates of the number of outbreak-related cases. If the distributions are wide (low theta, Fig. 1A, 1B), the absolute value of log(V) is smaller, whereas narrower distributions (high theta, Fig. 1C, 1D) result in higher values of log(V). This is intuitive: the more we know about what to expect during an outbreak, the stronger the conclusions we can draw from the observed evidence.
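The Fig. 1 intuition can be reproduced numerically. In this sketch (Python/scipy; all parameter values are illustrative assumptions), the same 10 observed cases yield a positive Log10(V) when the expected outbreak is small and a negative Log10(V) when it is large:

```python
import math
from scipy.stats import nbinom, poisson

def log10_V(n, mu_base, mu_out, theta_out):
    """Log10 of the value of evidence V = P(E|H1) / P(E|H0) (Eq. 5).

    P(E|H1) is the convolution of the baseline (Poisson) and outbreak (NB)
    distributions; P(E|H0) is the baseline probability alone.
    """
    p = theta_out / (theta_out + mu_out)
    p_h1 = sum(poisson.pmf(n - i, mu_base) * nbinom.pmf(i, theta_out, p)
               for i in range(n + 1))
    p_h0 = poisson.pmf(n, mu_base)
    return math.log10(p_h1 / p_h0)

# 10 reported cases against a baseline of ~5/week, as in the Fig. 1 examples.
print(log10_V(10, mu_base=5.0, mu_out=5.0, theta_out=2.0))   # small expected outbreak
print(log10_V(10, mu_base=5.0, mu_out=50.0, theta_out=2.0))  # large expected outbreak
```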

Using the value of evidence for decision making
In contrast to traditional outbreak detection algorithms, the value of evidence approach does not have a built-in decision threshold. Typically, a decision maker would not act upon syndromic surveillance data alone but rather combine it with other available knowledge. Cameron [20] proposed several approaches to disease freedom questions: (1) population or surveillance sensitivity, (2) probability of freedom from disease, and (3) expected cost of error, i.e., the consequences of false positive and false negative results. All approaches underline how the value of inspection findings is augmented when interpreted in a broader context to complement other monitoring and surveillance system (MOSS) activities. One option for a decision maker would be to set an action threshold for the posterior odds. We might, for example, want to initiate an epidemiological investigation if the odds that there is an ongoing outbreak are larger than 1:1 or 1:100. Ideally, the decision maker would make a cost-benefit analysis taking into account the expected costs of taking action versus not taking action. For example, the decision maker may initiate control measures (e.g. a vaccination program) when the odds are such that, on average, the reduced loss from early detection of the outbreak would exceed the extra costs of initiating control measures in response to false alarms.
The combination of evidence evaluation and decision theory is discussed in [21]. The expected utility (ū) of action a_i is the average amount of loss that we expect to incur with this action. In the context of disease surveillance, an action could be to implement movement restrictions, vaccination, sampling or vector control, or to do nothing. The loss could be direct financial losses (e.g. animal infection, disease and production losses) but also indirect losses (e.g. surveillance and control costs, compensation costs, potential trade losses, social consequences). Since an unmanaged outbreak as well as any action will result in costs, the expected utility will always be zero or negative. In this framework, the expected utility (ū) of action a_i is defined as:

ū(a_i|·) = Σ_j p(H_j|·) × u(C_ij)

where H1 = outbreak; H0 = no outbreak; a_0 = no action; a_1 = action; and the C_ij are the different scenarios with respect to the hypothesis on outbreak status (H0, H1) and the action taken (a_0, a_1): C_00 represents no disease and no action implemented, C_01 no disease but action implemented, C_10 disease but no action, and C_11 disease and action implemented. p(H_j|·) is the probability of hypothesis j given all available knowledge (prior probability and evidence), and u(C_ij) is the expected utility of each possible situation C_ij. Since the gain is zero, the utility is determined by economic and socio-economic losses.
According to this framework, it is favorable to act when the expected utility of action, ū(a_1|·), is higher than the expected utility of no action, ū(a_0|·). The relation between the posterior probability P(H_i|E) and the posterior odds O_post is defined by:

P(H1|E) = O_post / (1 + O_post)

In this work, the threshold O_post* was determined by numerical optimization. The derived action threshold for the value of evidence, V*, is calculated as:

Log10(V*) = Log10(O_post*) − Log10(O_pri)

where the prior odds of an ongoing outbreak, O_pri, are based on historical experience as well as knowledge about risk factors.
To make a decision, the risk manager multiplies the prior odds by the value of evidence using Eq. 3 to obtain the posterior odds of an outbreak, O_post(H1|E). If these odds exceed the action threshold O_post*, at which the expected utility of acting exceeds that of not acting, a decision is taken to act.
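For a two-action problem of this kind the break-even posterior odds have a closed form, so the numerically optimized threshold can be checked directly. A sketch with fictional utilities (Python; the utility values are assumptions in the spirit of Table 1, not the paper's figures):

```python
import math

def posterior_odds_threshold(u00, u01, u10, u11):
    """Break-even posterior odds O_post* for a two-action problem.

    Acting is favorable when p*u11 + (1 - p)*u01 >= p*u10 + (1 - p)*u00,
    where p = P(H1 | evidence). Solving for the break-even probability p*
    and converting it to odds gives a closed form; the paper obtained the
    same quantity by numerical optimization.
    """
    p_star = (u00 - u01) / ((u00 - u01) + (u11 - u10))
    return p_star / (1.0 - p_star)

def log10_V_threshold(o_post_star, o_pri):
    """Evidence threshold: Log10(V*) = Log10(O_post*) - Log10(O_pri)."""
    return math.log10(o_post_star) - math.log10(o_pri)

# Fictional utilities (losses): doing nothing with no outbreak costs 0,
# a false alarm costs 1 unit, a missed outbreak 100, a managed outbreak 20.
o_star = posterior_odds_threshold(u00=0.0, u01=-1.0, u10=-100.0, u11=-20.0)
print(f"O_post* = {o_star:.4f}")
print(f"Log10(V*) at prior odds 1:1000 = {log10_V_threshold(o_star, 1e-3):.2f}")
```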

Performance assessment
Sensitivity, specificity and the predictive values of positive and negative tests are important concepts when planning animal health monitoring. In the syndromic surveillance context, a true positive (TP) occurs when the system alerts while an outbreak is ongoing. A true negative (TN) is no alert and no outbreak. A false negative (FN) occurs when the system does not alert while an outbreak is ongoing, and a false positive (FP) occurs when the system alerts in the absence of an outbreak.
Sensitivity (SE) is the probability that a true outbreak triggers an alert:

SE = TP / (TP + FN)

Specificity (SP) is the probability that there is no alert when no outbreak is ongoing:

SP = TN / (TN + FP)

The positive predictive value (PPV) is the probability that an indicated outbreak is a true outbreak:

PPV = TP / (TP + FP)

The negative predictive value (NPV) is the probability that the absence of an outbreak signal corresponds to a true absence of an outbreak:

NPV = TN / (TN + FN)

The PPV and NPV depend on the (prior) probability of an outbreak, and in the performance assessment the PPV was calculated as:

PPV = (SE × P_pri) / (SE × P_pri + (1 − SP) × (1 − P_pri))

where P_pri is the prior probability of an ongoing outbreak in the week of interest.

Implementation
Models were implemented in R version 3.0.2 [22]. The R scripts are included as part of the material (Scripts S1, S2, S3, S4).
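The prior-dependent predictive values above can be sketched as follows (Python; the SE, SP and P_pri values are illustrative assumptions):

```python
def ppv(se, sp, p_pri):
    """Positive predictive value from sensitivity, specificity and the
    prior probability of an ongoing outbreak in the week of interest."""
    return (se * p_pri) / (se * p_pri + (1 - sp) * (1 - p_pri))

def npv(se, sp, p_pri):
    """Negative predictive value, analogously."""
    return (sp * (1 - p_pri)) / (sp * (1 - p_pri) + (1 - se) * p_pri)

# Even a fairly specific system yields a low PPV when outbreaks are rare.
print(ppv(se=0.8, sp=0.95, p_pri=0.01))
print(ppv(se=0.8, sp=0.95, p_pri=0.2))
```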
Dynamic regression was performed with the function glm (package {stats}) [22] for Poisson regression and glm.nb (package {MASS}) [23] for negative binomial regression. The expected number of counts at time t was estimated with the predict function of the respective package. Alternative regression models were evaluated using the Akaike information criterion (AIC). In addition, the adjusted deviance (deviance/df) was used as a measure of goodness of fit (GOF).
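The baseline regression itself (fitted with glm/glm.nb in the paper) can be sketched for the Poisson case as a maximum-likelihood fit. Everything below, including the synthetic counts and the stand-in histmean covariate, is an illustrative assumption rather than the RESPE data or the paper's code:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic weekly counts with a seasonal signal and a hypothetical
# histmean covariate; true_beta is an arbitrary assumption.
rng = np.random.default_rng(0)
t = np.arange(1, 261) / 52.0                    # five years of weeks
histmean = 4.0 + rng.uniform(0.0, 4.0, t.size)  # stand-in 53-week running mean
X = np.column_stack([np.ones_like(t), np.sin(2 * np.pi * t),
                     np.cos(2 * np.pi * t), np.log(histmean)])
true_beta = np.array([0.5, 0.3, -0.2, 0.6])
y = rng.poisson(np.exp(X @ true_beta))

def nll(beta):
    """Poisson negative log-likelihood (up to a constant) with log link."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def grad(beta):
    return X.T @ (np.exp(X @ beta) - y)

fit = minimize(nll, np.zeros(4), jac=grad, method="BFGS")
print(fit.x)  # maximum-likelihood estimates, close to true_beta
```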
The receiver operating characteristic (ROC) curve was generated in R by simulation. Counts for negative weeks were sampled from a Poisson distribution (function rpois in package {stats}) with lambda equal to the predicted value for each week in 2011 and 2012 (n = 53000). Counts for positive weeks were generated by sampling values from the fitted outbreak distribution (function rnbinom in package {stats}) and adding them to the baseline counts. SE and SP were calculated for values of Log10(V) between −1 and +3 in steps of 0.01. The expected PPV for each value of V was calculated as above, using the prior odds of an outbreak from three scenarios.
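A sketch of the same simulation in Python (scipy in place of rpois/rnbinom; a constant baseline mean and a smaller number of simulated weeks stand in for the per-week predictions):

```python
import numpy as np
from scipy.stats import nbinom, poisson

rng = np.random.default_rng(1)
mu_base, mu_out, theta_out = 5.0, 4.45, 0.94  # assumed baseline; NeurSy-like outbreak
p_out = theta_out / (theta_out + mu_out)
n_sim = 20000

# Negative weeks: baseline only. Positive weeks: baseline + outbreak cases.
neg = rng.poisson(mu_base, n_sim)
pos = rng.poisson(mu_base, n_sim) + nbinom.rvs(theta_out, p_out, size=n_sim,
                                               random_state=rng)

# Tabulate Log10(V) for every possible count via the baseline/outbreak convolution.
max_n = int(max(neg.max(), pos.max()))
k = np.arange(max_n + 1)
pmf_base = poisson.pmf(k, mu_base)
pmf_out = nbinom.pmf(k, theta_out, p_out)
p_h1 = np.convolve(pmf_base, pmf_out)[: max_n + 1]
log10_v = np.log10(p_h1 / pmf_base)

v_neg, v_pos = log10_v[neg], log10_v[pos]

# Sweep thresholds from -1 to +3 in steps of 0.01, as in the text.
thresholds = np.arange(-1.0, 3.0, 0.01)
se = np.array([(v_pos >= c).mean() for c in thresholds])
sp = np.array([(v_neg < c).mean() for c in thresholds])
print(f"At Log10(V) >= 1: SE = {se[200]:.3f}, SP = {sp[200]:.3f}")
```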
Threshold values for the posterior odds (O_post*) were estimated using the Solver function of Microsoft Excel 2007.

Sources of data
As a proof of principle, the value of evidence framework was applied to neurological and respiratory syndromes in French horses. The associated time series are named NeurSy and RespSy, respectively. These data are collected through the passive surveillance system ''RESPE'', the French network for the surveillance of equine diseases (http://www.respe.net/). This system collects declarations from veterinary practitioners registered as sentinels, who fill in a standardized online questionnaire depending on the syndrome concerned. Along with their declaration, veterinarians send standardized samples for laboratory diagnosis. Tests for equine influenza, equine herpesviruses 1 and 4 and equine arteritis virus are implemented in the case of a respiratory syndrome, and for West Nile virus and equine herpesvirus 1 in the case of a nervous syndrome. In our study, we used these weekly time series.
Data from 2006 to 2010 were used to train our models and define the background noise of each time series when no outbreak occurs. To remove outbreaks from our datasets and obtain outbreak-free baselines, we used only the cases with no positive laboratory test result. Different regression models were then tested.
No real outbreak of West Nile disease or of divergent strains of equine influenza (New-Influenza) occurred during this period, so fictive test data were used to demonstrate outbreak detection. The baselines in the test data were based on NeurSy and RespSy data from 2011 to 2012, in which unexplained aberrations not related to the diseases of interest were filtered out and fictive outbreaks were inserted based on historical data. The weekly counts from several real outbreaks were fitted together to model the outbreaks of each disease. The prior odds for each example are based on our knowledge of the epidemiology and risk factors for transmission of each disease. New-Influenza is assumed to have the same probability of occurrence throughout the year, whereas the probability of a WNV outbreak varies with season.

Data Accessibility
The datasets supporting this article have been uploaded as part of the Material. The baseline data for NeurSy and RespSy are included in Table S1 and Table S2 respectively. The outbreak data for NeurSy and RespSy are included in Table S3 and Table S4 respectively.
The software R can be freely downloaded from the CRAN homepage (http://cran.r-project.org/).

Case study 1 - Neurological syndromes and WNV (NeurSy)
Non-outbreak situation. To define the background noise of the NeurSy time series when no outbreak occurred, we fitted alternative regression models based on the Poisson and NB distributions to data from 2006-2010 containing only cases with no positive laboratory results (Figure S1). The models evaluated included sinusoidal models with 1, 2 and 3 periods per year and models with season or month as factorial variables. To account for differences between years, we dynamically calculated the average counts over 53 consecutive weeks (histmean). To ensure that an ongoing outbreak would not influence this estimate, we used a 10-week guard band [24] in the calculation of histmean. For both the Poisson and the NB regression, the best fit was obtained with the simplest model:

counts ~ sin(2πt) + cos(2πt) + log(histmean)

Outbreak definition. Three observed WNV outbreaks were used to simulate the outbreaks in our model: the French outbreaks in horses in 2000 [25] and 2004 [14], in which 76 and 32 confirmed cases were reported among 131 and 72 horses presenting nervous symptoms respectively, and the Italian outbreak in 1998 [26], in which 14 cases of WNV in horses were recorded by week of onset.
The weekly counts from these three outbreaks were fitted to the NB distribution. The resulting outbreak distribution was NB(mu = 4.45, theta = 0.94). Based on this, the predicted median number of outbreak-related cases per week during an outbreak was 3, with a 95% interval of 0 to 18 cases.
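These summary figures follow directly from the fitted distribution; a quick check in Python/scipy (translating the NB(mu, theta) parametrization into scipy's (n, p) form):

```python
from scipy.stats import nbinom

# The fitted outbreak distribution NB(mu = 4.45, theta = 0.94), in scipy's
# parametrization: n = theta, p = theta / (theta + mu).
mu, theta = 4.45, 0.94
p = theta / (theta + mu)
dist = nbinom(theta, p)

print(dist.mean())               # 4.45 by construction
print(dist.median())             # the reported median of 3 cases/week
print(dist.ppf([0.025, 0.975]))  # the reported 95% interval of 0 to 18
```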
Outbreak detection. Three scenarios were tested. The probability of an outbreak is not constant over the year; the relative probability of an outbreak occurring in spring (weeks 10 to 30), summer/autumn (weeks 31 to 46) and winter (weeks 47 to 9) is approximately 1:5:0.04. We chose to test one scenario per time period: scenario A occurs in autumn, scenario B in winter and scenario C in spring. For each scenario, the Poisson model was applied to the test set and one simulated peak/outbreak was inserted into the baseline (Figure 2). For each week, the value of evidence was calculated using Eq. 5, where the probability of the observed number of cases given no outbreak, P(E|H0), and given an outbreak, P(E|H1), were calculated using the fitted models. Examples of the calculation of V during a non-outbreak period (scenario A) and during outbreaks (scenarios B and C) are shown in Figure 3.

Decision scenarios
The decision making in the outbreak scenarios for both examples is summarized in Table 1.
The expected utilities u(C_ij) for each scenario considered are given together with the action thresholds for the posterior odds (O_post*) and the value of evidence (V*) in favor of an outbreak, that is, the point at which the decisions to act and not to act have the same expected utility.
The expected utility of taking action in response to a false alert, u(C_01), represents the costs of increased surveillance and preventive actions such as mosquito control for WNV. The utility of not taking action when there is an outbreak, u(C_10), represents the costs of control and the economic and socio-economic consequences of an outbreak when the response was delayed. The losses may depend on the season; in the example, we have assumed that a WNV outbreak in summer or spring in the south of France results in extra costs due to its impact on tourism. Finally, the utility of taking action when there is an outbreak, u(C_11), represents the costs of surveillance plus the economic and socio-economic impact in the case of a timely response to the outbreak.
For NeurSy (scenarios A to C), the prior odds in the table are based on the assumption that an outbreak of WNV is likely to occur every 3 years and to last on average 5 weeks. The costs used are fictional but proportional to their expected relative contributions.
During the season with the highest risk of disease occurrence (highest O_pri), the alarm threshold is low and 4 cases are sufficient to trigger an action (see Table 1, scenario A). For the season with the lowest risk, the expected utilities are similar to those of the highest-risk season (the values of O_post* are equal), but no action is implemented even if 7 cases are reported, because they are unlikely to be due to WNV (low O_pri) (see Table 1, scenario B).

Sensitivity, specificity and receiver operating characteristics
The sensitivity and specificity of a surveillance system are defined by the chosen action threshold. The tradeoff between the sensitivity and specificity of a model may be summarized in a receiver operating characteristic (ROC) curve [27]. The ROC curve corresponding to the WNV case study is shown in Figure 4A. The values of SE and SP arising from scenarios A to C are indicated by letters. The PPV, i.e. the probability that an alarm corresponds to a real outbreak [28], depends not only on SE and SP but also on the prior probability of an outbreak, as indicated in Figure 4B.

Case study 2 - Respiratory syndromes and equine influenza (RespSy)
The same approach was successfully applied to the RespSy dataset. In this case, however, the analysis indicated a significant degree of overdispersion in the weekly counts. Using the same regression model (counts ~ sin(2πt) + cos(2πt) + log(histmean)), the NB model had a lower AIC (1141 vs 1284) and a GOF closer to one (1.14 vs 2.54) compared to the Poisson model. The theta parameter of the NB distribution was 1.78, resulting in a much wider confidence interval for the expected number of cases in a non-outbreak situation (Figure S2) compared to the Poisson model (Figure S3). When the NB and Poisson models are applied to the same test dataset (Figures S4, S5), the latter reports a value of evidence for the inserted peaks (D, E) that is several orders of magnitude higher than the NB model does. The Poisson model also reports peaks with Log(V) close to 2 several times per year (Figure S5). An underlying assumption of the Poisson model is the absence of overdispersion and, when this assumption does not hold, the Poisson model underestimates the probability of obtaining a large number of reported cases in the non-outbreak situation. Consequently, it overestimates the value of evidence in favor of an outbreak. The overdispersion may be due to clustering in reporting: in the surveillance protocol, veterinarians are encouraged to declare not only the diseased horse but also 1 to 3 additional horses from the same stable suspected to be in the incubation phase of influenza.

Discussion
In this work, we have demonstrated how the value of evidence concept may be incorporated into a decision support system for syndromic surveillance and how the output may be used for risk assessment and informed decision making. According to the OIE Terrestrial Animal Health Code [29], the decision to take action, which involves balancing the costs of activities against the economic and social consequences of a delayed response to an outbreak, is the responsibility of the risk manager and should be separate from the risk assessment.
Thus, although it is perfectly possible to build a system that outputs a best decision, the proposed approach is in concordance with the risk analysis framework [29] by offering an explicit separation of assumptions (P_pri), scientific evidence (V) and criteria for decisions, and transparency in how the evidence is evaluated. In forensics, the value of evidence is typically presented to the court as a qualitative statement in which fixed verbal expressions correspond to specified intervals of V [10,30]. This approach may also be useful when presenting epidemiological results. For example, a value of Log10(V) in the range 1-2 may be expressed as ''results provide moderate evidence to support that an outbreak is ongoing''. Alternatively, intervals of V and/or O_post could be expressed using a color scale to produce maps representing the results from surveillance and the risk of ongoing outbreaks of different diseases.
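Such a verbal scale is straightforward to implement. In this sketch, the cut-offs are illustrative assumptions modeled on published forensic conventions, not values prescribed by this work:

```python
def verbal_statement(log10_v):
    """Map Log10(V) to a verbal scale, in the spirit of forensic reporting.

    The cut-offs below are illustrative assumptions, not fixed values.
    """
    if log10_v < 0:
        return "results support the absence of an ongoing outbreak"
    if log10_v < 1:
        return "results provide limited evidence that an outbreak is ongoing"
    if log10_v < 2:
        return "results provide moderate evidence to support that an outbreak is ongoing"
    if log10_v < 3:
        return "results provide strong evidence to support that an outbreak is ongoing"
    return "results provide very strong evidence to support that an outbreak is ongoing"

print(verbal_statement(1.5))
```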
The model presented here is intended as a proof of concept; when setting up an operational syndromic surveillance system it will, as usual, be necessary to carefully evaluate the baseline model to ensure that the regression model does not overfit the baseline data. When designing the current model, it was evident that high-dimensional regression models were prone to finding artefactual seasonal patterns that could severely bias the estimated probability of observing a given number of counts in a particular week (results not shown). In the current implementation, the model learns the seasonal patterns and the distribution of residuals (the inverse theta parameter of the NB distribution) from manually curated data, whereas the expected yearly average (histmean) is continuously updated from outbreak-filtered weekly data. Naturally, the value of evidence concept may also be applied to a system in which the baseline model is automatically retrained on new data. However, since the dispersion parameter (theta) of the NB distribution would determine the cutoff in the filtering algorithm, we argue that it is safer not to use the filtered data to estimate the same parameter without prior inspection of the data. The same conclusion holds for seasonal patterns.
The overdispersion in the RespSy dataset is largely due to veterinarians sampling several horses in a stable upon suspicion. Thus, in this special case it might be possible to handle the overdispersion by pre-processing the data to remove redundant cases, provided that the same pre-processing is applied to new data on a weekly basis. However, when the mechanism behind the overdispersion in baseline counts is not transparent enough for redundant cases to be filtered out automatically, the NB model will support a correct interpretation of the value of a peak in the count data.
As indicated in Figure 4, the tradeoff between SE and SP differs between seasons. This is natural: since the (prior) probability of an outbreak differs between seasons, the average sensitivity SE_avr and specificity SP_avr are obtained by weighting the seasonal values by the probability of an outbreak and the duration of each season:

SE_avr = Σ_i SE_i × P_i × d_i / Σ_i P_i × d_i
SP_avr = Σ_i SP_i × (1 − P_i) × d_i / Σ_i (1 − P_i) × d_i   (Eq. 17)

where SE_i is the sensitivity in season i, SP_i is the specificity in season i, P_i is the (prior) probability of an outbreak in season i and d_i is the (relative) duration of season i. Thus, by incorporating prior knowledge about the seasonality of the diseases of interest, it is possible to achieve a high average sensitivity without sacrificing the PPV and SP. Another important attribute of outbreak detection is timeliness. Although there is no general measure of timeliness [28], the number of cases is often small in the first week(s) of an outbreak, so increasing the sensitivity in the high-risk season (i.e. lowering the threshold for V and thus n) will result in improved timeliness as well as improved average sensitivity.
In this work, we have introduced the framework using models that evaluate the evidence from each week independently. Although this simple approach is suitable for presenting the framework and a reasonable choice for an early warning system, the evaluation of evidence one week at a time is not a fundamental limitation of the approach. A model accounting for the accumulation of evidence over several weeks may, for example, be constructed by considering, for each week in the interval [0…j], the conditional probability of the observed counts given the hypothesis H_t-i, where t is the week of interest, H_t-i is the hypothesis that an outbreak started i weeks before t, and E_t-i…E_t are the numbers of reported cases in weeks t-i to t. The probability of observing n outbreak-related cases will not be uniform throughout the outbreak but will depend on whether the outbreak is in its first, second or third week, and so on. When accounting for evidence from several weeks, the value of evidence in favor of the hypothesis H1 ''An outbreak is going on'' against H0 ''An outbreak is not going on'' will depend on the prior probability of an outbreak starting in any of the preceding weeks, because H1 is composed of several sub-hypotheses H_t-i, each contributing to the posterior odds O_post of an outbreak going on in the week of interest in proportion to its prior probability; O_pri denotes the corresponding prior odds. Although in these more complex models the calculation of the value of evidence would depend on the prior probability of an outbreak, the framework is still applicable for communicating the evidence to decision makers. Essentially any Markov chain model could be applied within the evidence evaluation framework, and the choice of complexity is a tradeoff between realism on the one hand and simplicity and transparency on the other. However, we anticipate that in most situations there will not be sufficient data to support very complex models.