The dynamics of GII.4 Norovirus in Ho Chi Minh City, Vietnam

Highlights
• NoV was identified in the stools of diarrheal patients and controls in HCMC.
• The locations of the NoV infections were GPS mapped.
• A novel NoV GII.4-2010 (New Orleans) variant was detected.
• The NoV GII.4-2010 demonstrated a significant spatiotemporal signal.


Introduction
Norovirus (NoV) is a non-enveloped, positive-sense, single-stranded RNA virus belonging to the taxonomic family Caliciviridae (Green et al., 2000; Jiang et al., 1990). NoV accounts for a significant proportion of the global burden of viral gastroenteritis (Glass et al., 2009; Patel et al., 2009, 2008), and up to 50% of all-cause outbreaks of diarrhea (Patel et al., 2009). The disease typically presents as acute watery diarrhea with vomiting and a low-grade fever (Patel et al., 2009), and is usually self-limiting, lasting between one and three days, but can be aggressive, severe and protracted in young children, the elderly and the immunocompromised (Estes et al., 2006). NoV has an exceptionally low infectious dose (10-100 viral particles (Patel et al., 2009)) and can survive on surfaces for prolonged time periods (Weber et al., 2010); as a result, NoV frequently causes explosive gastroenteritis epidemics (Estes et al., 2006; Patel et al., 2009, 2008). The 7.5 kb genome of NoV has three open reading frames (ORFs), encoding the RNA-dependent RNA polymerase (RdRp, ORF1), the major capsid protein (VP1, ORF2) and the minor capsid protein (VP2, ORF3) (Bertolotti-Ciarlet et al., 2003; Jiang et al., 1993). The current classification divides NoV into five genogroups (GI-GV) on the basis of sequence identity within the major capsid protein (VP1). Genogroups GI, GII and GIV are associated with infections in humans (Zheng et al., 2006). Coding sequences within RdRp (regions A and B) (Ando et al., 1995; Fankhauser et al., 2002; Jiang et al., 1999; Vinje and Koopmans, 1996) and ORF2 (regions C, D and E) (Kageyama et al., 2004; Kojima et al., 2002; Noel et al., 1997; Vinje et al., 2004) are targeted for NoV detection, genogrouping and genotyping.
Genogroups I and II are the most common causes of human infections (Donaldson et al., 2010), and can be differentiated into 8 GI and 23 GII capsid genotypes, and 14 GI and 29 GII polymerase genotypes (Kroneman et al., 2011; Zheng et al., 2006).
The epidemiology of NoV is complex and is influenced by a multitude of factors, including population immunity, the environment, and seasonality (Donaldson et al., 2010; Marshall and Bruggink, 2011), making molecular epidemiology challenging. The accepted interpretation of global NoV epidemiology, particularly for the GII.4 genotype, is that strain replacement occurs every two to three years (Bull et al., 2010; Bull and White, 2011; CDC, 2010; Donaldson et al., 2008; Siebenga et al., 2009). Over the last two decades, these replacements were typically caused by strains of a single lineage of the GII.4 genotype, which has been responsible for the majority of NoV outbreaks worldwide since being first identified in the USA in the mid 1990s (Bull and White, 2011; Noel et al., 1999). At least four major global NoV replacements have been described since 1995, each due to a novel GII.4 variant (Bull et al., 2006; Noel et al., 1999; Siebenga et al., 2010; Tu et al., 2008), believed to have escaped immunity in the population through antigenic variation (Lindesmith et al., 2012b, 2012c, 2011). The majority of NoV studies are performed in industrialized countries, where disease outbreaks are continually monitored through several disease surveillance networks (Vega et al., 2011; Verhoef et al., 2009). However, little is known about the transmission, molecular diversity or spatiotemporal dynamics of NoV infections in areas with differing public health infrastructure and demographics. Vietnam is an industrializing country with densely populated urban centers and a changing spectrum of infectious diseases as a presumed consequence of rapid economic development and urbanization (Vinh et al., 2009).
NoV was first reported in Ho Chi Minh City (HCMC) in 1999 (Hansman et al., 2004), and our recent work has demonstrated that NoV is endemic throughout the year, in contrast to the winter outbreaks observed in temperate locations (Lopman et al., 2009; Mounts et al., 2000). To understand the epidemiology and NoV strain diversity in HCMC, we investigated the molecular and spatiotemporal distribution of NoV genotypes in young hospitalized children between May 2009 and December 2010 in HCMC, Vietnam.

Study setting and design
This study was conducted according to the principles expressed in the Declaration of Helsinki and was approved by the ethical review boards of Children's Hospital 1 (HCMC), Children's Hospital 2 (HCMC), the Hospital for Tropical Diseases (HCMC), and the Oxford Tropical Research Ethics Committee (OxTREC Approval No. 0109) (Oxford). The parents or legal guardians of the enrolled children were required to provide written informed consent for sample collection and residential mapping.
A stool specimen was collected within 24 h of enrollment from each recruited individual (1443 diarrheal patients and 611 asymptomatic controls). These participants (N = 2054) were children of 0-60 months of age and residents of HCMC, Vietnam, over the study period from May 2009 to December 2010. Diarrheal patients were children with acute diarrheal disease (≥3 loose stools or at least one bloody loose stool within a 24 h period (WHO, 2005)) who were admitted to the three study sites and had not received treatment with antimicrobials in the three days prior to hospital admission. Asymptomatic controls were diarrhea-free children attending Children's Hospital 1 or Children's Hospital 2 for nutritional health checks or for other gastrointestinal issues unrelated to diarrhea or gastroenteritis, without any history of diarrhea, respiratory illness or treatment with antimicrobials within 7 days of study enrollment.

Norovirus detection
Total viral RNA was extracted and reverse transcribed into cDNA as previously described (Tra My et al., 2011). Norovirus genogroup I (GI) and II (GII) were detected in separate reactions by conventional reverse transcriptase polymerase chain reaction (RT-PCR) using consensus primers, G1SKF/G1SKR (Kojima et al., 2002) and COG2F/G2SKR (Kageyama et al., 2003; Kojima et al., 2002) for GI and GII, respectively. These PCR primers amplify a region between positions 5342 and 5671 (330 bp) in the genome of NoV GI (Norwalk/68, GenBank accession No. M87661), comprising 17 bp of the 3′ end of ORF1 and 313 bp of the 5′ end of ORF2, and between positions 5003 and 5389 (387 bp) in the genome of NoV GII (Lordsdale/93, GenBank accession No. X86557), comprising 83 bp of the 3′ end of ORF1 and 304 bp of the 5′ end of ORF2.

Norovirus genotyping
NoV-positive PCR amplicons were purified using the QIAquick PCR purification kit (QIAGEN, Hilden, Germany), and subjected to direct sequencing using the amplification primers. DNA concentrations were determined using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, United Kingdom). Sequencing reactions were performed using a BigDye Terminator Cycle Sequencing kit (Applied Biosystems, USA) and run on an ABI Prism 3130xl Genetic Analyzer (Applied Biosystems, USA). DNA sequences were assembled using DNA Baser Sequence Assembler v3.0.17 (Heracle Biosoft, Pitesti, Romania). NoV genotypes were assigned based on ORF2 sequences using the online Norovirus Automated Genotyping Tool, as directed (Kroneman et al., 2011).

Construction of NoV phylogenies
DNA sequences were uploaded into GenBank (HE716437 to HE716751) and used for local phylogenetic construction. Manual alignment of all sequences was performed in Se-AL (http://tree.bio.ed.ac.uk/software/figtree/) prior to phylogenetic reconstruction. Maximum likelihood (ML) trees were inferred using RAxML (Stamatakis et al., 2008), employing the general time-reversible model of nucleotide substitution with a gamma distribution of among-site rate variation (GTR + Γ) and 1000 bootstrap replicates.
Two hundred and sixty-nine global GII.4 strains encompassing the global diversity of GII.4 variants were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/); 43 of these strains originated from previous studies conducted in Vietnam (Hansman et al., 2004; Nguyen et al., 2008, 2007a, 2007b; Tamura et al., 2010; Trang et al., 2012). In addition, available archived samples from our previous work in southern Vietnam in 2008 (Tra My et al., 2011) that were positive for NoV (GI and GII) by enzyme immunoassay (EIA) were also selected for analysis and genotyped using the same methodology.
The time of isolation for each of the NoV strains was retrieved from GenBank or the publication associated with the sequence, and the year of isolation was used to calculate evolutionary rate. All sequences were aligned in Se-AL and trimmed to 378 bp to correspond with the sequences identified in this study to maximize sequence homology for phylogenetic reconstruction. Phylogenetic reconstructions of relationships among the GII.4 variants identified in this study and global GII.4 sequences were inferred using the Bayesian Markov chain Monte Carlo (MCMC) method as implemented in BEAST (Drummond and Rambaut, 2007). A GTR substitution model with gamma-distributed rate variation and a relaxed uncorrelated lognormal clock model with a constant population size were employed. The MCMC analysis was run for 50 million generations (with a burn-in of 5 million) and analyzed using Tracer (http://tree.bio.ed.ac.uk/software/tracer/) to ensure that all parameters had converged. Maximum clade credibility trees were annotated using TreeAnnotator v1.6.1 (BEAST) and visualized in FigTree v1.3.1. The weighted average evolutionary rates across branches were assessed in Tracer.

Mapping of corresponding residential addresses
The location of each enrollee's residence was recorded using an eTrex Legend GPS device (Garmin, United Kingdom) and verified by an additional member of the study team. Latitude and longitude of each residence (recorded in decimal degrees) were entered along with patient metadata in Microsoft Excel (Microsoft, Redmond, USA). Location data were converted to KML format and locations were visualized and validated in Google Earth version 5 (http://www.google.com/earth/index.html) (Supplementary Figure).
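The conversion from a table of decimal-degree coordinates to KML can be sketched as follows. This is an illustrative example only, not the conversion tool used in the study; the function name `to_kml` and the sample label and coordinates are hypothetical.

```python
from xml.sax.saxutils import escape


def to_kml(points):
    """Build a KML document from (label, latitude, longitude) tuples.

    Coordinates are expected in decimal degrees, as recorded in the study.
    """
    placemarks = []
    for label, lat, lon in points:
        # Basic validation: reject values outside the valid coordinate range.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinate out of range for {label!r}")
        # KML lists longitude first, then latitude.
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(label)}</name>\n"
            f"    <Point><coordinates>{lon:.6f},{lat:.6f}</coordinates></Point>\n"
            "  </Placemark>"
        )
    body = "\n".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n"
        f"{body}\n"
        "</Document>\n"
        "</kml>\n"
    )


if __name__ == "__main__":
    # Hypothetical record: an anonymized label and a made-up location.
    print(to_kml([("P001", 10.7769, 106.7009)]))
```

A common mistake in this step is swapping latitude and longitude, which is why the sketch validates the ranges and writes longitude first.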

Spatiotemporal analyses
Mantel tests were performed to assess potential correlations between genetic, temporal, and spatial distances of GII strains and variants within the GII.4 clade, using the ade4 package in R (R Development Core Team, 2011) (www.ats.ucla.edu/stat/r/faq/mantal_test.htm). A Bernoulli model was used in SaTScan v9.1.1 (http://www.satscan.org/) to examine spatiotemporal clusters of GII.4-2010, with all non-GII.4-2010 strains representing the background distribution of the NoV population. For the current analysis, the upper limit for cluster detection was specified as 10% of the study population over 10% of the study duration. The significance of the detected clusters was assessed by a likelihood ratio test, with a p-value obtained by 999 Monte Carlo simulations generated under the null hypothesis of a random spatiotemporal distribution.
The genotyping data were combined with isolation dates to illustrate the distribution of GII.4 variants over the period of sample collection (Fig. 1).
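The logic of a Mantel test can be illustrated with a minimal permutation sketch. This is a pure-Python re-implementation written for illustration, not the ade4 code used in the study; the one-sided p-value, the default of 999 permutations and the function names are assumptions of the example.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p) for the correlation between two distance matrices.

    Rows and columns of d2 are permuted together, which preserves the
    dependence structure within each matrix. The p-value is one-sided
    (tests for a positive association).
    """
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    # The observed statistic counts as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)
```

In the setting described above, `d1` would hold pairwise genetic distances and `d2` either the pairwise differences in isolation date or the pairwise geographic distances between residences.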

Phylogenetic analyses of NoV sequences
Phylogenetic analyses were performed on all GI and GII NoV sequences, and the mean uncorrected genetic distances among the strains within the GII and between variants within the GII.4 genotype (pairwise distance of maximum composite likelihood calculation) were 0.147 and 0.016 substitutions/site, respectively. Based on this primary phylogenetic analysis, the GI strains were excluded and the GII strains were subsampled by removing identical GII sequences to reconstruct a maximum likelihood phylogenetic tree summarizing the genetic diversity present in HCMC (N = 109) (Fig. 2).
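For readers unfamiliar with pairwise distance summaries, the sketch below computes the mean pairwise distance among aligned sequences after removing identical sequences. It uses the simple proportion of differing sites (p-distance) as a stand-in; it is not the maximum composite likelihood calculation reported above, and the function names are illustrative.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def unique_sequences(sequences):
    """Drop exact duplicates while keeping the first occurrence order."""
    return list(dict.fromkeys(sequences))


def mean_pairwise_distance(sequences):
    """Average p-distance over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

Model-based distances correct for multiple substitutions at the same site, so they are generally at least as large as the p-distance for the same pair of sequences.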
Sequences of the two GII.4 variants from HCMC (N = 247) were compared with 269 global sequences and 10 selected GII.4 sequences isolated in 2008 in southern Vietnam (Tra My et al., 2011) (Fig. 3). Using the Bayesian MCMC method and timestamped sequences, the evolutionary rate of NoV GII.4 was estimated to be 8.072 × 10⁻³ substitutions/site/year (95% Highest Posterior Density (HPD): 6.195 × 10⁻³, 1.012 × 10⁻²). The GII.4-2006b sequences from NoV originating in Vietnam fell in the same clade as global GII.4-2006b viruses, with clustering unrelated to the time or place of isolation. This GII.4-2006b lineage could be further divided into two sub-lineages; strains from HCMC could be found in both, confirming co-circulation of divergent GII.4-2006b viruses. Notably, the upper sub-lineage contained more sequences from this study while more Vietnamese strains from previous studies fell in the lower sub-lineage. The GII.4-2010 strains clustered in a single lineage, separate from the GII.4-2006b lineage. The GII.4-2010 lineage could be differentiated partially by location, with Vietnamese and Belgian sub-lineages stemming from the New Orleans GII.4-2010 variant.
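An HPD interval of the kind reported for the evolutionary rate is the shortest interval containing a given fraction of the posterior samples. The following sketch shows one common way to compute it from a list of sampled values after discarding the burn-in; it is illustrative only and is not the Tracer implementation. The 10% default burn-in mirrors the 5 million of 50 million generations stated in the methods, and the function names are hypothetical.

```python
import math


def discard_burn_in(chain, fraction=0.1):
    """Drop the initial fraction of an MCMC chain."""
    return chain[int(len(chain) * fraction):]


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples.

    Assumes a unimodal posterior, for which the HPD region is one interval.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must contain; the small offset guards
    # against floating-point error in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]
```

Unlike an equal-tailed interval, the HPD interval need not be symmetric around the mean, which matters for skewed posteriors such as those of substitution rates.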

Spatiotemporal clustering of NoV in HCMC
The temporal data suggested that a NoV strain replacement occurred during the period of investigation. There was a significant association between the genetic distance of strains within GII and their date of isolation (p < 0.0001; Mantel test); this association was particularly apparent between the GII.4 sequences and their isolation date (p < 0.0001; Mantel test). However, there was no similar association between geographical distance and genetic distance (p = 0.197 for GII strains; p = 0.844 for GII.4 sequences), or between isolation date and geographical distance (p = 0.248 for GII strains; p = 0.851 for GII.4 sequences). These data indicate a lack of a local transmission signal of NoV in HCMC. Yet, a spatiotemporal cluster detection analysis performed in SaTScan supported our original hypothesis, detecting a cluster of six GII.4-2010 NoV infections (0.59 expected against the background of other NoV GII strains) within a 3.8 km radius in the northeast of the city (relative risk = 12.65, p = 0.0003) (Fig. 4), indicating that the initial dynamics of GII.4-2010 were highly localized during their introduction period into HCMC.
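The quantities behind a Bernoulli scan statistic can be sketched for a single candidate window as follows. This is a simplified illustration based on the standard formulation of the Bernoulli model and the usual definition of relative risk for a scan window; it is not the SaTScan implementation, it evaluates one window rather than scanning all of them, and the counts in the usage example are hypothetical rather than taken from the study.

```python
import math


def _xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for one candidate window.

    c: cases inside the window, n: cases plus controls inside the window,
    C: total cases, N: total cases plus controls.
    """
    if not (0 <= c <= n <= N and c <= C <= N and C - c <= N - n):
        raise ValueError("inconsistent counts")
    if n == 0 or n == N:
        return 0.0
    inside = c / n
    outside = (C - c) / (N - n)
    if inside <= outside:
        return 0.0  # only windows with an excess of cases are of interest
    alternative = (
        _xlogy(c, inside) + _xlogy(n - c, 1 - inside)
        + _xlogy(C - c, outside) + _xlogy(N - n - (C - c), 1 - outside)
    )
    null = _xlogy(C, C / N) + _xlogy(N - C, 1 - C / N)
    return alternative - null


def relative_risk(c, n, C, N):
    """Risk inside the window divided by the risk outside it."""
    expected = n * C / N  # expected cases inside under the null hypothesis
    if C == c:
        return math.inf
    return (c / expected) / ((C - c) / (C - expected))


if __name__ == "__main__":
    # Hypothetical counts: 6 of 6 infections inside the window are cases,
    # out of 30 cases among 200 infections overall.
    print(bernoulli_llr(6, 6, 30, 200), relative_risk(6, 6, 30, 200))
```

In a full analysis the maximum of this ratio over all candidate windows is compared with the maxima obtained from data sets simulated under the null hypothesis.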

Discussion
There are inadequate data regarding the burden of NoV disease in industrializing countries such as Vietnam; this limits our knowledge of viral distribution, transmission chains and local microevolution. Data on NoV genotype distribution across a range of geographical locations through time are essential for understanding global NoV epidemiology. This is particularly important with respect to the ongoing development and clinical trials of NoV vaccines (Atmar et al., 2011; El-Kamary et al., 2010; Parra et al., 2012), which should be developed in consideration of global and regional strain circulation and their ability to induce cross-protection. Here, by examining the genetic, spatial and temporal dynamics of NoV in children in HCMC, we aimed to assess the local molecular epidemiology of NoV. Our data show a diverse array of NoV genotypes and the emergence of a novel variant. The emergence of GII.4-2010 and the subsequent lack of GII.4-2006b isolates suggest that a rapid strain replacement event may have occurred in the population, although the small numbers of isolates from the latter half of the study preclude strong inference on these dynamics.
Strains belonging to GII are responsible for the vast majority of human NoV infections worldwide, and GII.4 variants play a particularly important role in pediatric NoV infections (Bull and White, 2011; Lopman et al., 2004; Siebenga et al., 2009). Here, a variety of NoV GII genotypes were found to be co-circulating, with GII.4 predominating. This observation is consistent with work originating in northern Vietnam (Trang et al., 2012) and other locations across Asia (Zeng et al., 2012), and we confirm that GII.4-2006b has continued to circulate in southern Vietnam since it was first detected in 2005 (Nguyen et al., 2008).
The GII.4-2010 variant detected here in December 2009, and first identified in October 2009 in New Orleans (USA) (Vega et al., 2011), was also reported in Belgium (Mathijs et al., 2011) and then internationally (Greening et al., 2012; McAllister et al., 2012; Nguyen and Middaugh, 2012; Puustinen et al., 2011; White et al., 2012), suggesting that this variant is the first globally disseminated strain to emerge since the pandemic GII.4-2006b (Minerva). The phylogenetic analyses demonstrated that the GII.4-2010 strains from the USA, Belgium and Vietnam were closely related, suggesting that these strains may have been introduced into Vietnam from the USA or Europe in 2009. Furthermore, the substitution rate for GII.4 estimated here (8.072 × 10⁻³ substitutions/site/year) is higher than previously reported (between 3.9 × 10⁻³ and 5.3 × 10⁻³ substitutions/site/year) (Bok et al., 2009; Bull et al., 2010; Siebenga et al., 2010). This new estimate might reflect an increase in the rate of GII.4 evolution involving GII.4-2010 viruses, since the earlier studies were conducted prior to the emergence of GII.4-2010 and therefore used sequence datasets that contained no GII.4-2010 sequences. However, differences in the region of sequence selected for analysis (partial 5′ capsid herein versus complete capsid in previous studies), in the method of evolutionary inference (linear regression, strict or relaxed clock, uncorrelated lognormal or exponential model) and in the measured unit of time preclude an accurate comparison of evolutionary rates between studies. Nevertheless, the phenomenon observed in this study warrants further investigation.
Our study has some limitations, including not tracking the source and/or route of transmission and not examining the genotype distribution after the period of investigation. Therefore, it is difficult to determine whether the shift in the distribution of the GII.4 variants over time is due to an emergent virus becoming fixed in the population or to a local outbreak in the northeast of the city. The short period of investigation also limits our ability to determine the extent to which the local NoV dynamics observed in HCMC reflect or follow the global evolutionary trend, such that we are unable to determine whether GII.4-2010 viruses continued circulating within this setting after the study period or were capable of diffusing across the country in the presence or absence of GII.4-2006b viruses. The status of population-level immunity to NoV in HCMC is unknown, so we are unsure whether exposure to GII.4-2006b NoV is protective against the GII.4-2010 variants. These findings highlight a broader scientific issue concerning outstanding questions on immune cross-protection among NoV variants (Lindesmith et al., 2012c, 2011). Furthermore, the analysis was not performed on whole genome sequences and focused on a fragment of the genome, which may restrict the phylogenetic interpretation. Whole genome sequencing would greatly improve the utility of NoV epidemiological datasets, specifically to study the evolution of novel GII.4-2010 variants, aiding the detection of genomic sites that may induce potential antigenic variation. Finally, our hospital-based study design may be influenced by healthcare-seeking behavior, and may not be representative of the NoV circulating in the local community.

Conclusions
This study expands the knowledge of NoV in industrializing countries, outlining a range of endemic NoV genotypes over a one-year period in HCMC. The analysis describes the co-circulation of heterogeneous NoV strains, and reports the identification of