An Integrated Approach to Identifying International Foodborne Norovirus Outbreaks

Surveillance gaps can be bridged through analysis of combined molecular and epidemiologic data.

International foodborne norovirus outbreaks can be diffi cult to recognize when using standard outbreak investigation methods. In a novel approach, we provide stepwise selection criteria to identify clusters of outbreaks that may involve an internationally distributed common foodborne source. After computerized linking of epidemiologic data to aligned sequences, we retrospectively identifi ed 100 individually reported outbreaks that potentially represented 14 international common source events in Europe during 1999-2008. Analysis of capsid sequences of outbreak strains (n = 1,456), showed that ≈7% of outbreaks reported to the Food-Borne Viruses in Europe database were part of an international event (range 2%-9%), compared with 0.4% identifi ed through standard epidemiologic investigations. Our fi ndings point to a critical gap in surveillance and suggest that international collaboration could have increased the number of recognized international foodborne outbreaks. Real-time exchange of combined epidemiologic and molecular data is needed to validate our fi ndings through timely trace-backs of clustered outbreaks. N oroviruses are the most prevalent causative agents of acute viral gastroenteritis in the community (1)(2)(3)(4). Currently, 5 norovirus genogroups have been described and subdivided into at least 40 genotypes (5,6), but in recent years, most clinical effects have been caused by viruses from a single genotype in genogroup II, GII.4 (7)(8)(9)(10). The symptoms of norovirus disease are usually mild and self-limiting, but there is some evidence the disease can contribute to proportion of deaths (11,12). Infection occurs by way of the gastrointestinal tract after contact with infected persons, after ingestion of contaminated food or aerosols, or through environmental contamination (13,14).
Because the different modes of transmission call for quite distinct control measures, it is important to assess which proportion of disease can be attributed to which mode of transmission. However, this question is diffi cult to answer. Due to the high rate and rapidity of secondary spread of norovirus infection following a foodborne introduction, outbreaks initially linked to a food source may appear to be person-to-person (PTP) outbreaks by the time they are recognized. Even if a foodborne source is suspected, confi rmation of the source is complicated. Virus detection in food commodities is possible but hampered by such factors as low levels of norovirus in food, food matrix complexity, genetic variability of norovirus (15), the absence of an effi cient cell culture system to propagate human noroviruses (16), and the unavailability of leftover food for pathogen detection.
Given the globalization of the food market, diffuse international outbreaks are likely (17,18). For public health offi cials, these may seem to be regular PTP outbreaks because infection of 1 or a few persons with viruses through food consumption will go unnoticed unless secondary spread occurs or the contaminated food is consumed by multiple persons, which may trigger an investigation to identify a source. However, identifi cation of international links is complicated. Viruses remain infectious in frozen ready-to-eat products over prolonged periods, and linked outbreaks are likely to be separated in time (19). Other problems are virus mutation rate, which results in nonidentical strains from a common source (20); sewage contamination with multiple nonsimilar strains during production of shellfi sh or crops (21); underreporting of cases (22,23); and incompleteness of outbreak reports (24,25). Other complicating factors include the unknown background level of viruses in foods, the environment, or asymptomatic shedders. Clearly, methods combining molecular and basic epidemiologic criteria are needed to assist public health efforts to identify international foodborne outbreaks.
For this reason, we performed a retrospective analysis of norovirus outbreak surveillance data collected since 1999 by Food-Borne Viruses in Europe (FBVE), a combined laboratory and epidemiology network (6). Although the name FBVE suggests a foodborne focus, the network actually investigates outbreaks of viral gastroenteritis with all modes of transmission. It seeks to obtain a comprehensive overview of viral activity in the community and to enable capture of foodborne norovirus outbreaks that have evaded recognition. Strain sequences from outbreaks linked to a common source are expected to be more similar than strains from outbreaks with a different source (26). We sought to quantify strain variability within and among molecular sequence clusters of multiple outbreaks to identify outbreaks with probable links to other outbreaks. Our goal was to retrospectively identify potential common-source events not detected by routine investigations and also to provide criteria that may assist in detecting such events.

Defi nitions
A norovirus outbreak was reported to FBVE when it included a minimum of 2 patients in the same area within 2 days who had >2 instances of vomiting and/or watery diarrhea within a 24-hour period (6,27). A gastroenteritis outbreak was ascribed to norovirus on the basis of compatible descriptive epidemiology and laboratory confi rmation in at least 2 of 5 feces samples tested (28). An outbreak strain was defi ned as a sequenced norovirus strain considered representative of an outbreak (found preferably in >2 samples from patients in the same outbreak). If dissimilar sequences were detected, multiple strains were considered representative. A genotype is a group of closely related noroviruses, i.e., showing >80% similarity in the complete capsid amino acid sequence. Genotypes can be assigned based on shorter sequences if a full capsid was previously identifi ed and sequenced for comparison (29). In this report, a cluster refers to a molecular cluster of multiple outbreaks, not an epidemiologic cluster of patients in 1 outbreak. A cluster of similar sequences is a group of outbreak strain sequences in the same genotype that show a minimal number of mutations within the region of overlap; the exact number of mutations depends on the sequence length in the region and the cutoff value used to defi ne similarity. A cluster of identical sequences is a group of outbreak strain sequences with the highest possible similarity (100%). According to reporting standards of the FBVE network (24,30), the suspected mode of transmission during an outbreak was considered foodborne when the infection was related to consumption of food contaminated during its production or processing; food handler-borne (FHB) when infection related to food prepared by an infected food handler; person-borne when it related to direct contact with infected persons; and unknown (UN) when no mode could be identifi ed.

Selection of Strains Representing Outbreaks
From January 1999 through November 2008, the FBVE network collected molecular information on a total of 5,499 norovirus outbreaks in Denmark, Finland, France, England and Wales, Germany, Hungary, Ireland, Italy, the Netherlands, Norway, Slovenia, Spain, and Sweden (24,30). Strengths and limitations of the FBVE data collection have been described (24). FBVE data are reported aggregated at outbreak level. Consequently, throughout the analysis here described, a strain represents an outbreak (i.e., outbreakrepresentative strain), and a cluster is a molecular cluster of outbreaks (i.e., cluster of outbreak-representative strains).
Because the norovirus genome shows its highest variability in the capsid, comparing sequences from this region will yield the lowest number of identical strains (31). Therefore, regions C and D, both located in open reading frame (ORF) 2 at the capsid gene, were our regions of choice for identifi cation of linked outbreaks. All norovirus outbreak strains reported to the FBVE network from January 1999 through November 2008 were included if a full or partial capsid sequence was involved. This yielded 1,504 outbreak-representative strain sequences reported by all above-mentioned countries. Sequence lengths varied between 90 nt and 1,640 nt. We used sequences including ORF2 nt positions 1-300 (93%) and other targets within ORF2 nt positions 300-1,620 (7%), including full capsid genes (8%).

Assignment of Genotypes
We classifi ed genotypes on the basis of their similarity to reference strains representing known genotypes using the norovirus typing tool (www.rivm.nl/mpf/norovirus/ typingtool). In this study, the ORF2 reference set was used for genotyping.

Alignments of Strains Representing Outbreaks
Nucleotide sequence alignment and similarity calculation according to the neighbor-joining method were performed for all ORF2 outbreak strain sequences within genotypes by using Bionumerics 5.1 (Applied Maths, Kortrijk, Belgium). If sequences from non-overlapping nucleotide positions were included, these sequences were separately aligned within the involved nucleotide positions.

Analysis of Clustering Strains Representing Outbreaks
Alignments were imported into the R project version 2.8.0 (http://cran.r-project.org) for analysis in 6 steps ( Figure). In step 1, the APE (32) and the seqinR (33) packages in R were used to assign numbers to clusters of identical outbreak strain sequences, according to pair-wise comparison of strains within genotypes. Cluster numbers were assigned to enable rapid and computerized linking of the large molecular and epidemiologic datasets and allow systematic statistical analysis of combined data.
In step 2, the characteristics of outbreak strain sequence clusters were compared with respect to the following aspects: frequency of clusters within genotypes, sizes of clusters, the overlapping number of nucleotides, number of countries involved, period over which outbreaks were reported, transmission modes as reported in the categories foodborne, FHB, person-borne, and UN. On the basis of available information, reported sources of infection in foodborne outbreaks were allocated to the following categories: fi lter-feeding bivalve shellfi sh (including oysters and mussels); berries (including raspberries, blueberries, and strawberries); water (including water related to food preparation, irrigation, and contaminating fl oods, but no shellfi sh or berries reported); ready-to-eat (including foods like bread, sandwiches, layer cakes, food purchased at a delicatessen, salad, but no shellfi sh or berries reported as one of the ingredients); and other (including self-served meals, buffet or catering, with multiple food items but none of the previous food classes reported). Outbreak strain sequence clusters that included at least 1 foodborne outbreak were selected for further analysis, and designated "possible foodborne clusters." In step 3, we used the APE package of R to stepwise extend all possible foodborne clusters to include sequences with similarities of 99.5%, 99%, 98%, 97%, 96%, and 95%. This was done to determine the cutoff level, i.e., the level of similarity needed to recognize potentially linked outbreaks and thereby to assist in the confi rmation of defi nitely linked outbreaks.
In step 4, p values were calculated to determine association with food for a chosen cutoff level. To do so, the frequencies of the transmission mode for each strain were considered a random draw from the frequencies of this transmission mode in the background population in the database as a null hypothesis, i.e., as random draws from a binomial distribution, and thus the probability of fi nding such a cluster by chance is low. For example, a cluster of 5 strains that includes 2 foodborne (i.e., 40%) has a probability of 0.02 to be found coincidentally in a total dataset containing 5% foodborne outbreaks. This cluster is then considered to be signifi cantly associated with foodborne transmission. Such calculations were done to determine the following: 1) the association of genotypes with foodborne transmission, i.e., foodborne genotype, with the frequency of foodborne outbreaks in the genotype considered as a random draw from the total dataset; 2) association of clusters with foodborne transmission, i.e., the frequency of the foodborne mode of transmission for the specifi c cluster considered as a random draw from all strains in the genotype; and 3) association of clusters with a specifi c food class, i.e., the frequency of the food class for the specifi c cluster considered as a random sample from all foodborne outbreaks. These calculations were the basis for the transition from possible to probable foodborne clusters.
In step 5, the calculations of step 4 were used to narrow all possible foodborne outbreaks to probable foodborne outbreaks, i.e., those clusters that were more likely to be related to food. The clusters that were signifi cantly and borderline signifi cantly associated with food were selected according to 3 selection criteria: 1) all possible foodborne clusters in foodborne genotypes; and for the non-foodborne genotypes; 2) those possible foodborne clusters that were signifi cantly or borderline signifi cantly associated with foodborne transmission; and 3) those possible foodborne clusters that were signifi cantly or borderline signifi cantly associated with a specifi c food class. Clusters selected through these criteria were designated probable foodborne clusters; p values were considered signifi cant if p<0.05, and borderline signifi cant if 0.05<p<0.10. In step 6, outbreaks that could be linked and internationally disseminated were selected from probable foodborne clusters if they involved >2 countries.

Verifi cation of Outcomes
The above selection criteria were applied to the FBVE database to retrospectively identify clusters of outbreaks that may have involved international dissemination of food. To verify the criteria of the approach as described above, the selected clusters were compared with the clusters previously reported to FBVE as linked outbreaks, as a measure of sensitivity of the approach (i.e., ability to detect true clusters).

Estimate of Proportion of Common-Source Foodborne Outbreaks
The frequency of linked foodborne outbreaks, both at national and international levels, was calculated as a proportion of the total number of reported outbreaks, based on the above analyses. Due to uncertainty of the causal, consequential, or coincidental relationship between outbreaks in molecular clusters, estimates were given for low, most likely, and high values of the number of linked foodborne outbreaks. Low values were calculated as the actual frequency of outbreaks in the specifi c clusters reported to be foodborne. High numbers were the total number of outbreaks in the specifi c clusters, i.e., including all reported transmission modes. For the likely values, outbreaks from unknown transmission were extrapolated proportionally to the foodborne outbreaks reported in the specifi c cluster. Thus, likely values were calculated as follows: for a cluster of x outbreaks containing a (foodborne), b (PTP), and c (FHB), and d (UN) outbreaks, the high value is x, the low value is a, and the likely value is d c b a a a × + + + Likely values with range for low and high values were subsequently compared with results of the epidemiologic overviews currently used in outbreak reporting in Europe (24,25).

Assignment of Genotypes
Genotyping resulted in clustering of reported outbreak strains into 23 ORF2 genotypes for 1,456 (97%) of 1,504 sequences. For the remaining 48, sequence data provided were insuffi cient for assignment of a genotype.
When the cutoff for strain similarity was lowered step-wise in R ( Figure, step 3), logically, the number of distinctive clusters decreased, whereas the size of each cluster increased. The similarity cutoff differed between genotypes. Six genotypes (I.1, I.4, I.5, II.1, II.5, and II.8) yielded a cluster of strains that remained distinct regardless of the cutoff used. For the other genotypes, lowering the cutoff to similarity levels of 99.5% or 99% showed a sharp drop in the number of distinct clusters, i.e., fewer clusters; as a consequence, clusters increased in size. For 7/14 genotypes the number of such clusters dropped to 50% at cutoff value of 99.5%. At 99%, this was the case for 10/14 genotypes (data not shown). Because we aimed to provide a conservative estimate for linked outbreaks for all genotypes, 100% similarity was chosen as the cutoff for further analysis steps.
Probable foodborne clusters of outbreaks were selected from 38 possible foodborne clusters based on 3 criteria for statistical association with food (Figure, step 4; online Appendix Table 2, www.cdc.gov/EID/content/17/3/412-appT2.htm): 1) twenty-two clusters in 8 genotypes (I.1, I.4, I.5, I.6, II.1, II.2, II.6, II.7, II.8) signifi cantly or borderline signifi cantly more often contained foodborne outbreaks, compared with the total dataset; 2) fi ve additional clusters showed nonfoodborne genotypes for which the specifi c transmission mode foodborne was more frequently reported than in the genotype; and 3) two additional clusters were associated with a food class, compared with the frequency of these food classes reported for all foodborne outbreaks. Fourteen of these 29 probable foodborne clusters involved >1 country and were therefore labeled as probable internationally disseminated foodborne outbreaks (Figure,  step 6).

Validation of Criteria
In the FBVE dataset of 1,456 capsid sequences, 36 outbreaks had previously been identifi ed as linked outbreaks in 10 clusters, based on standard epidemiologic investigation (24). In contrast, in the present study, 29 clusters of interest involving 122 likely linked outbreaks (range 51-166) were retrospectively identifi ed (online Appendix Table 2). Of the 10 previously reported FBVE outbreak clusters, 8 were identifi ed by using the approach described in this paper. These 8 clusters involved 32 likely linked outbreaks (range 18-69) and included 2 international and 3 national clusters, plus 3 clusters reported as national but containing sequences identical to those from outbreak strains reported elsewhere. The 2 FBVE clusters that were missed by our analysis involved 3 outbreaks with 3 different genotypes involved, and 3 outbreaks for which 2 different food classes were reported (ready-to-eat and other). Both food classes ended up nonsignifi cant for this cluster in step 5 ( Figure) of the analysis.

International Clusters Potentially Linked through a Common Source
Previously, 36 of 1,456 (2.5%) outbreaks reported through the FBVE network had been linked to a common source, of which 6 (0.4%) involved events in multiple countries. Our use of the stepwise criteria described here resulted in a signifi cant increase to 122 of 1,456 (8.4%, range low-high: 51-166) potential common-source outbreaks, of which 97 (6.7%, range 29-130) involved events in >1 country (data not shown).

Discussion
Our analysis suggests that 7% (range 2%-9%) of norovirus outbreaks reported through the FBVE network are likely to be international outbreaks with a common source. Our estimate is at least 5-fold higher than the 0.4% recognized through routine investigations. We showed that the proportion of linked foodborne outbreaks can be estimated with a sensitivity of 80% by using step-wise selection criteria combining molecular and epidemiologic information and derived from a large background dataset. The computerized linking of epidemiologic data to aligned sequences in R project for statistical computing considerably reduced the time needed for analysis and was an essential prerequisite of this novel approach. As sequencing becomes less expensive and public health databases expand, the utility of our approach for public health decision-making will increase (34).
Several research groups have made efforts to estimate the public health effects of norovirus and foodborne disease, fi nding that viral illness varies between 1/780 UK inhabitants and 1/33 US inhabitants (1,(35)(36)(37). For Europe, we previously estimated that 21% of all norovirus outbreaks were caused by food (38), but that report did not consider potential (international) links between outbreaks. In our current study, we found that 2%-9% of all reported outbreaks may be linked to a common source with international distribution. Because this study was done retrospectively, we could not collect additional data to verify suspected clusters. To prove this with certainty, the analysis should be done in real time and involve more in-depth outbreak investigations to establish a risk food with epidemiologic approaches and possibly food testing. However, we do see this as a novel approach to provide the basis for estimates of the prevalence and public health consequences of foodborne disease. Past studies have used gross extrapolations of data estimating the proportion of reported noroviral disease that can be attributed to food, but have not included the effect of outbreaks. We suggest a basis for such estimates, and especially the proportion attributable to foodborne transmission.
Our approach most likely provides a conservative estimate, because it relies on identical sequence clusters and does not include outbreaks caused by strains that are phylogenetically closely related. Given the mutation rate of genotype II.4 noroviruses (39), closely related strains could well represent linked outbreaks. This mutation rate, as well as the similarity cutoff, may be genogroup or genotype specifi c (online Appendix Table 1). When a single similarity cutoff for all genotypes is used as a selection criterion, any mutation counts equally. Nevertheless, a particular mutation may indicate that strains share a common ancestor, and a mutation in a particular genotype may indicate either a longer or shorter genetic distance. Therefore, phylogenetic analysis is needed to identify additional linked outbreaks involving closely related strains.
A shortcoming of our methods is that we would miss common source events that involve >1 strain, as has been described in some examples that involved sewagecontaminated shellfi sh (6,19,40). Nevertheless, we detected 3 of 4 linked outbreaks involving multiple genotypes, which indicates that such outbreaks are likely to show other characteristics that can be captured by our criteria.
A prerequisite for our approach is the availability of combined epidemiologic and laboratory data. National surveillance systems differ in their potential for matching these data, in the intensity of surveillance, and in the attention given to foodborne outbreaks. With no special focus, foodborne outbreaks are likely to be underreported, as recognition will be complicated by rapid emergence of PTP transmission (10). Moreover, many persons are involved in data entry, which may have had consequences for data quality because of human error (24). Unevenness of data quality may hamper international comparisons to detect foodborne outbreaks (25). For instance, France and Denmark have been reporting primarily foodborne outbreaks, whereas other countries include a wider range of transmission modes. An added value of our approach is refl ected in its fi nding of foodborne outbreaks in France and Denmark and in other countries as well, e.g., the United Kingdom and Germany, which are less focused on identifi cation of foodborne outbreaks. Despite the fact that underreporting of foodborne outbreaks in our dataset was likely, our criteria may thus provide insight into the number of foodborne outbreaks occurring in countries whose surveillance systems may miss such outbreaks.
In conclusion, combined epidemiologic and molecular analysis can recognize internationally disseminated outbreaks that may share a common foodborne source.
Step-wise selection criteria can be derived from an extensive background dataset and used to retrospectively estimate the proportion of international outbreaks that share a foodborne source. Prospective use of the criteria needs to be validated through real-time data sharing and timely follow-up of outbreak clusters. Our fi ndings nevertheless show that current surveillance has a critical gap, which can be bridged through systematic analysis of combined molecular and epidemiologic data.