Population Genomics of GII.4 Noroviruses Reveal Complex Diversification and New Antigenic Sites Involved in the Emergence of Pandemic Strains

Noroviruses are an important cause of viral gastroenteritis around the world. An obstacle delaying the development of norovirus vaccines is inadequate understanding of the role of norovirus diversity in immunity. Using a population genomics approach, we identified new residues on the viral capsid protein (VP1) from GII.4 noroviruses, the predominant genotype, that appear to be involved in the emergence and antigenic topology of GII.4 variants. Careful monitoring of the substitutions in those residues involved in the diversification and emergence of new viruses could help in the early detection of future novel variants with pandemic potential. Therefore, this novel information on the antigenic diversification could facilitate GII.4 norovirus vaccine design.

individuals (4)(5)(6). Norovirus outbreaks usually occur during the winter season in enclosed settings such as schools, hospitals, military facilities, and cruise ships. Because norovirus is highly contagious, outbreaks can be hard to control.
The norovirus genome is a positive-sense, single-stranded RNA molecule that is organized into three open reading frames (ORFs). ORF1 encodes a polyprotein that is cotranslationally cleaved by the viral protease into six nonstructural (NS) proteins required for replication. ORF2 encodes the major capsid protein (VP1) and ORF3 the minor capsid protein (VP2). The norovirus capsid consists of 180 copies of VP1, arranged in T = 3 icosahedral symmetry. X-ray crystallography of the VP1 revealed two structural domains: the shell (S) domain and the protruding (P) domain. The S domain is relatively highly conserved and forms the core of the capsid, while the P domain is more variable and extends to the exterior of the capsid protein (7,8). The P domain interacts with host attachment factors, namely, human histo-blood group antigen (HBGA) carbohydrates, which could facilitate infection. Antibody (Ab)-mediated blocking of the VP1:HBGA interaction correlates with protection against norovirus disease (9,10). Expression of VP1 results in the self-assembly of virus-like particles (VLPs) that are structurally and antigenically similar to native virions (7,11,12). Given the lack of a traditional cell culture system for human norovirus, experimentally developed VLPs have been an important tool to study norovirus immune responses and vaccine design.
Norovirus strains are highly diverse, with at least seven genogroups (GI to GVII) and over 40 genotypes defined based on differences in their VP1 sequences (13). While over 30 different genotypes from GI, GII, and GIV can infect humans, noroviruses from the GII.4 genotype are responsible for at least 70% of infections worldwide (14). Since the mid-1990s, six major norovirus GII.4 pandemics have been recorded worldwide and were associated with the following variants: Grimsby 1995 (or US95_96), Farmington Hills 2002, Hunter 2004, Den Haag 2006b, New Orleans 2009, and Sydney 2012 (15)(16)(17). The predominance of GII.4 viruses has been linked to the chronological emergence of variants in the human population, with new variants emerging around the time of the previous declines (15). The emergence of these variants has been correlated with changes on five different variable antigenic sites (namely, sites A to E) that map on the surface of the P domain; thus, new viruses can evade the human immune responses elicited to previously circulating variants (11,(18)(19)(20). Using the recently developed cell culture system for human noroviruses (21), two of these sites have been confirmed to be involved in virus neutralization (22). Studies have shown that antibodies that map to these sites can block the interaction of VP1 with carbohydrates from the HBGA; however, the antigenic sites of several monoclonal antibodies (MAbs) with blocking activity, raised against GII.4 viruses, have not been determined (11,12,19,(23)(24)(25). The evolving nature of GII.4 noroviruses could challenge the development of crossprotective vaccines against noroviruses; therefore, a better understanding of the mechanisms responsible for the antigenic diversification will facilitate vaccine design.
In this study, we adopted a large-scale genomics approach to identify sites that play a role in GII.4 evolution and antigenic diversification. We (re)defined the sites involved in the antigenic make-up of GII.4 pandemic variants, and found that intravariant diversification exhibited a stochastic pattern of evolution. Importantly, we identified four sites (amino acids 352, 357, 368, and 378) implicated with the emergence of predominant GII.4 variants, and that could help in the early detection of the next pandemic variant.

RESULTS
GII.4 intervariant evolution is characterized by the accumulation of substitutions in the P domain. In order to investigate the evolutionary patterns of GII.4 strains, we calculated the genetic differences of 1,601 nearly full-length (≥1,560-nucleotide [nt]) VP1 sequences from GII.4 strains collected from 1974 to 2016. The phylogenetic tree of these sequences showed the presence of at least 11 different GII.4 variants emerging since 1995 (Fig. 1a). As shown previously (15,26), GII.4 strains presented a chronological replacement of variants, with several unassigned intermediate strains (see Fig. S1 in the supplemental material). Genetic analyses revealed an accumulation of substitutions in both nucleotide (Fig. 1b) and amino acid (Fig. 1c) sequences (coefficients of determination [R²] of linear regression = 0.87 and 0.78, respectively). This pattern of accumulation of mutations was observed in all the subdomains of VP1 (S, P1, and P2; Fig. S2); however, higher slopes were noted in P2, where the five variable antigenic sites (A to E) are located, thus suggesting their role in the evolution and antigenic diversification of GII.4 noroviruses.
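The accumulation of substitutions over time is summarized above by the coefficient of determination of a linear regression. The following is a minimal sketch of that calculation in Python; the function name and the (year, difference) values are hypothetical and serve only to illustrate the arithmetic, not to reproduce the study's data or software.

```python
def linear_fit_r2(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical collection years and amino acid differences from a reference strain.
years = [1995, 2002, 2004, 2006, 2009, 2012]
aa_differences = [5, 22, 30, 34, 41, 50]
slope, intercept, r2 = linear_fit_r2(years, aa_differences)
```

Here the slope is the number of substitutions accumulated per year, so comparing slopes between subdomains (S, P1, and P2) corresponds to the comparison described for Fig. S2.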
Identification of new antigenic sites of GII.4 noroviruses. To pinpoint the role of each amino acid within the P domain in the evolution of GII.4, we calculated the Shannon entropy to measure the residue diversity at the intervariant and intravariant levels. Because Grimsby-like viruses were the first recorded to cause large outbreaks worldwide, entropy values were calculated with strains (1,572 VP1 sequences) detected from 1995 to 2016. While minimal diversity was detected at the intravariant level (Fig. S3), the intervariant level revealed three substitution patterns: (i) a large number (87%, 285/325) of highly conserved residues, which include conserved antigenic site F (27) and likely maintain the structural integrity of the P domain; (ii) a small number (5%, 17/325) of variable residues that map on previously determined antigenic sites A to E (19); and (iii) 23 variable residues that map outside those antigenic sites (Fig. 2a). Among the 23 nonantigenic variable sites comprising the third pattern, 2 residues (residues 228 and 255) were surrounded by conserved residues on the surface of VP1, and 14 residues formed clusters (motifs) on the surface that could represent new antigenic sites or extensions of previously predicted, uncharacterized antigenic sites (Fig. 2b).
FIG 2 Conservation analyses redefined antigenic sites of the major capsid protein (VP1) from GII.4 noroviruses. (a) Shannon entropy was calculated to quantify amino acid variation for each site in the VP1. Analyses were calculated with strains (1,572 VP1 sequences) detected from 1995 to 2016. Data from the P domain are included here (amino acids 216 to 540). Residues were grouped depending on the degree of variability into the following categories: conserved sites (Shannon entropy value ≤ 0.3), sites mapping on antigenic sites (columns A to E), and variable sites that map outside antigenic sites (left-side dot plot). Based on structural analyses, 14 variable residues that mapped outside antigenic sites were clustered as part of novel motifs (potential antigenic sites) or extension of previously defined antigenic sites (right-side dot plot). (b) Residues forming these novel or expanded motifs/antigenic sites on the surface of the GII.4 major capsid protein are colored accordingly.
One motif comprises residues 339, 340, 341, 375, 376, 377, and 378. Because this motif included two residues (340 and 376) previously described as being among the five major antigenic sites (20), we extended the number of residues forming this antigenic site, site C (Fig. 2b). Although residue 375 represented low variability, it was included as part of the antigenic site as it mapped on the surface of the molecule and might potentially play a role in antibody recognition. Antigenic site D was also extended to include two additional residues (396 and 397), as they clustered with original residues (393, 394, and 395) on the surface (19). The new motif, denominated motif G, included residues 352, 355, 356, 357, 359, and 364; the last motif, denominated motif H, included residues 309 and 310. A summary of the residues corresponding to each of the motifs/antigenic sites is shown in Fig. 2b. Profiling of the temporal frequency of the amino acid sequence patterns (mutational patterns) of previously characterized antigenic site A, expanded antigenic site C, and new antigenic site G (confirmed in this study) indicated that their mutational patterns correlated well with the fluctuation of GII.4 variants in the human population, suggesting a major role in the emergence of variants (Fig. 3; see also Fig. S4). Correlation between the antigenic site mutational patterns and GII.4 variants was assessed using adjusted Rand index values, and antigenic site G was the one presenting the best correlation with the GII.4 variant circulation pattern while showing low sequence variation (Fig. 3b and c). Of note is that the mutational pattern determined for old antigenic site C did not correlate with the circulation of variants in nature (Fig. S5a and b), suggesting that our population-guided antigenic site characterization provided a better resolution of the antigenic profiling of GII.4 noroviruses.
Neither previously defined putative epitope (motif) B (19) (Fig. S4). Mutations on antigenic sites D and E have been shown to alter both antigenicity and binding to HBGA carbohydrates (19,24,28) while showing modest correlation with GII.4 variant emergence (Fig. 3c). Thus, the evolutionary pressure correlating to antigenic sites D and E might be different from that correlating to sites involved only in the antigenic characteristics of the virus. Of note, the mutational pattern from expanded antigenic site D correlates much better than that from the original antigenic site D (Fig. S5a and b), and this improvement of correlation is consistent with the recent addition of residue 396 to this antigenic site (29). In contrast, the mutational pattern determined for expanded antigenic site E showed a lower level of correlation than that determined for the original antigenic site (Fig. S5a and b). This might have been due to the expanded site (inclusion of residue 414) showing variation within the variants Den Haag 2006b and Sydney 2012, making the profiles less consistent with the intervariant diversity. Because our data set is subject to sampling bias (over 50% of the sequences were collected during 2010 to 2016), the Shannon entropy analysis was repeated using a randomly subsampled data set that included a maximum of 50 sequences per variant (Fig. S6a). This data set reflected the variation of viruses circulating before 2010 more sensitively and showed 13 additional variable sites mapping outside the newly defined variable motifs/antigenic sites. Only four of them were located and clustered on the surface of the VP1; two of them (residues 250 and 504) were located together and might represent a site that differentiates early strains (Grimsby 1995 variant) from the others, and the remaining two (residues 300 and 329) mapped close to motif/antigenic site G (Fig. S6b).
These two residues could be a part of antigenic site G; however, both represented minor variations over decades, suggesting a subtle impact on variant emergence (Fig. S6c). Residue 504 was recently shown to be a part of the GII.4 cross-protective epitope (30). Thus, despite the variation shown in Fig. S6c, the substitution (P504Q) might have a small impact on the antigenic diversification of GII.4 variants.
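The entropy analyses described above score each alignment column by its residue diversity and separate conserved from variable sites with a cutoff. Below is a minimal sketch in Python; the 0.3 cutoff is the one given in the Fig. 2 legend, whereas the natural logarithm, the function names, and the toy alignment are assumptions made for illustration (the text does not state the logarithm base or the software used for this step).

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (natural log, an assumption) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def column_entropies(alignment):
    """alignment: list of equal-length amino acid strings (already aligned)."""
    return [shannon_entropy(col) for col in zip(*alignment)]


def classify_sites(entropies, threshold=0.3):
    """Label each site as conserved (entropy <= threshold) or variable."""
    return ["conserved" if h <= threshold else "variable" for h in entropies]


# Toy alignment of four hypothetical sequences, four sites each.
toy_alignment = ["TSRN", "TSRN", "TGRN", "TGRD"]
entropies = column_entropies(toy_alignment)
labels = classify_sites(entropies)
```

Because the entropy of a column depends on how often each variant is sampled, running the same functions on a subsampled alignment (for example, capped at 50 sequences per variant) is one way to check how sensitive the site labels are to sampling bias.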
The mutational patterns determined for three motifs (A, C, and G) correlated well with the fluctuation of GII.4 variants (Fig. 3c). Motif/antigenic site A was previously confirmed to be a major antigenic site (23), while the expanded/new motifs (C and G) were not yet confirmed experimentally. To confirm the role of these two motifs in the antigenic makeup of the GII.4 capsid, we replaced residues of VP1 from a
6E6, 17A5, and 18G12, newly developed against the WT 2012. We found that mutations at new motifs C and G abrogated binding of MAbs B11, B12, 6E6, and 18G12 and of MAbs 1C10 and 17A5, respectively (Fig. 4b). Reconstitution of motif C in the WT 2012 was achieved by reverting those sites to WT 2004 (indicated as "2012: C2004" VLPs in the upper panel of Fig. 4b). Of note, when mutations were introduced at residues 340 and 376, which were regarded as correlating to original antigenic site C, no differences in binding were observed with MAbs B11 and B12 (11). While changes at residues 377 and 378 reduced the levels of binding to those MAbs, an additional mutation at residue 340 was required for complete antigenic site depletion (Fig. S7). Likewise, reconstitution of motifs C and G in the WT 2004 was achieved by reverting those sites to the WT 2012 (indicated as "2004: C2012" and "2004: G2012" VLPs in the middle and bottom panels of Fig. 4b). Blockade potential (a surrogate of norovirus neutralization) was confirmed for all four MAbs using HBGA-blocking assays, except for MAb 30A11 that binds to the S domain of the VP1 (31) and was used as a control (Fig. 4c).
The impact of mutations on antigenic sites C and G with respect to the immune response was evaluated using HBGA-blocking assays. Mutants of antigenic site A were included for comparison. Polyclonal sera against WT 2004 VLP showed high blocking activity against homologous wild-type VLP (low half-maximal effective concentration blocking values [EC50 values]) but reduced blocking against the WT 2012 VLP (high EC50 value) (Fig. 5). Mutations on antigenic sites A, C, and G on those wild-type VLPs altered the blockade potential of polyclonal sera. As shown in previous studies (19,23), transplanting of antigenic site A between WT 2004 and WT 2012 resulted in a large difference in the levels of blocking activity in experiments using the polyclonal sera against the WT 2004 (Fig. 5a and b, left panels). Mutations on antigenic sites C and G had a minor effect or no effect on the blocking activity of the sera against the WT 2004 (Fig. 5a and b, left panels). Interestingly, the blocking pattern was very different in experiments using the polyclonal sera raised against WT 2012 VLPs. Transplanting of antigenic sites A and G into WT 2004 VLPs (indicated as "2004: A2012" and "2004: G2012" in the right panels of Fig. 5a and b) resulted in blocking activity by sera raised against the WT 2012. Notably, polyclonal sera raised against WT 2012 VLPs did not lose blocking activity against antigenic site A and C mutant VLPs (2012: A2004 and 2012: C2004) but lost blocking activity against antigenic site G mutant VLP (2012: G2004) (Fig. 5a and b, right panels). In summary, both antigenic sites (A and G) showed distinctive roles as blockade sites; notably, antigenic site G played a role equal to or greater than that played by major antigenic site A in the antigenicity of the WT 2012 strain (Fig. 5), suggesting the potential of both as protective antigenic sites.
Intravariant evolution is driven by stochastic processes. In contrast to the accumulation of nucleotide and amino acid substitutions detected at the intervariant level (Fig. 1b and c), there was limited accumulation of nucleotide substitutions (data not shown) and amino acid substitutions (Fig. 6a) within variants (intravariant level). Despite this limited accumulation of substitutions, all of the pandemic variants presented diversity in their sequences (average of 4.8 amino acid substitutions). Interestingly, this diversity was detected at most of the major antigenic sites and variants (Fig. 6b; see also Fig. S8). Thus, while each variant presented a major amino acid combination for each antigenic site, most of the GII.4 variants presented other minor amino acid patterns on those antigenic sites. Moreover, the analysis of intravariant diversification showed that their evolution was stochastic in time and location (Fig. 7), in contrast to the temporally clustered intervariant evolution of GII.4 strains (Fig. 1). While some variants (e.g., Hunter 2004, Den Haag 2006b, New Orleans 2009, and Sydney 2012) presented diversity in their major antigenic sites after 3 to 4 years of circulation, which might suggest that immune pressure acts at the intravariant level (Fig. 6), the numbers of strains (and sequences) are limited and do not represent dominant strains. Two other important observations occurred while analyzing the mutational pattern of each of the antigenic sites at the intravariant level: (i) major differences at the amino acid sequence level were detected in early strains for New Orleans 2009 and Sydney 2012 variants (Fig. 6; see also Fig. S8), and those early strains did not represent the major amino acid combination for any of the four major antigenic sites (A, C, D, and E; Fig. S8); (ii) none of the preceding strains evolved toward (or presented) the amino acid motif seen in the later strains.
This, together with the phylogenetic analyses, suggests that each pandemic variant presented a different origin and did not follow a trunk-like linear evolution such as that seen in H3N2 influenza viruses (32).
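The distinction drawn above between the major amino acid combination of an antigenic site and the minor patterns that coexist with it can be made concrete by tabulating, for each group of strains (a variant or a collection year), the fraction of sequences carrying each pattern. The sketch below is illustrative only: the helper names, toy sequences, and toy positions are hypothetical, and real use would pass the 1-based VP1 positions of a site (e.g., 352, 355, 356, 357, 359, and 364 for motif G) together with aligned VP1 sequences.

```python
from collections import Counter, defaultdict


def motif_pattern(sequence, positions):
    """Concatenate the residues found at the given 1-based positions."""
    return "".join(sequence[p - 1] for p in positions)


def pattern_fractions(records, positions):
    """records: iterable of (group, sequence) pairs.

    Returns {group: {pattern: fraction of that group's sequences}}.
    """
    counts = defaultdict(Counter)
    for group, seq in records:
        counts[group][motif_pattern(seq, positions)] += 1
    return {
        group: {pat: n / sum(c.values()) for pat, n in c.items()}
        for group, c in counts.items()
    }


def major_pattern(fractions):
    """Most frequent pattern within one group."""
    return max(fractions, key=fractions.get)


# Toy data: hypothetical variants, sequences, and motif positions.
toy_records = [
    ("Variant X", "ASTDG"),
    ("Variant X", "ASTDG"),
    ("Variant X", "ASNDG"),
    ("Variant Y", "AHNEG"),
]
toy_positions = [2, 3, 4]
profile = pattern_fractions(toy_records, toy_positions)
```

Grouping by year instead of by variant yields the kind of per-year composite profile described in Materials and Methods.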
Differences in the evolutionary patterns of the intervariants and intravariants were also confirmed by Bayesian Markov chain Monte Carlo (MCMC) analysis. Substitution rates of seven variants with >50 sequences were calculated, and the results are summarized in Table 1. The overall (intervariant) substitution rate of GII.4 strains was reported previously (15). Intravariant substitution rates ranged from 1.57 × 10⁻³ to   (Table 1).
Diversifying pressure drives emergence of pandemic GII.4 noroviruses. Because different factors seemed to drive the evolution of the intervariants (overall) and intravariants, we performed selection analysis using the mixed-effect model of evolution (MEME) method and looked for evidence of site-by-site episodic diversifying pressure on the VP1 along the branches of its evolutionary tree. To analyze the overall evolutionary process, we randomly subsampled sequences from the original data set that included a maximum of 30 strains per variant. During the overall evolution, we found 9 positively selected sites (P < 0.05 and empirical Bayes factor > 100) on the P2 subdomain on the surface (codon sites 327, 335, 352, 355, 357, 366, 368, 375, and 378) distributed on 21 branches of the phylogenetic tree (Fig. 8a). Branches connecting discrete GII.4 variants presented sites undergoing episodic diversification (codon sites 352, 355, 357, 368, and 378) that mapped on the antigenic sites. The mutational pattern of these sites correlated well with GII.4 variant emergence, with higher adjusted Rand index values than any of the antigenic sites (Fig. 8b and c). This suggests that residues on the antigenic sites experienced episodic diversifying pressure during intervariant evolution. When each variant was analyzed separately, most of the diversifying pressure was found to be present on the branches connecting to the tips rather than on the internal branches (Fig. S9), suggesting that nonsynonymous substitutions were deleterious and rarely fixed in the population during the intravariant evolution.
The more comprehensive data set presented by New Orleans and Sydney variants may have included many viruses with deleterious mutations that would not persist (as indicated by the higher number of nonsynonymous substitutions on branches connecting to the tips) but that would account for an artificially higher substitution rate than that seen with the other variants (Table 1). Taken together, the data show that the diversifying pressure has driven the intervariant (but not the intravariant) evolution of GII.4 strains.

DISCUSSION
GII.4 noroviruses are the most common cause of norovirus infections worldwide. Although other norovirus genotypes have predominated in specific locations and time, the global dominance of GII.4 has been recorded for almost 3 decades (16,33). The persistence and dominance of GII.4 over all other norovirus genotypes have been explained by the chronologically sequential emergence of variants, which enables the  virus to evade the immunity acquired to previously circulating GII.4 variants, a process similar to that seen with H3N2 influenza viruses (26,28,32). Since the mid-1990s, over 10 different variants have been reported, with 6 of them associated with large outbreaks worldwide. The overall evolutionary pattern of GII.4 viruses presents a strong linear accumulation of amino acid substitutions during intervariant diversification (15), with most substitutions occurring in the P2 subdomain (28). Antigenic differences among variants have been largely attributed to highly variable residues that map on the surface of the P domain, leading to the identification of five (A to E) motifs that are part of GII.4-specific antigenic sites (18)(19)(20). While the binding site was characterized for different GII.4-specific MAbs, the same studies reported numerous GII.4-specific MAbs whose binding sites have not been determined (11,12,19,(23)(24)(25). Applying a population genomics approach, we found new (or expanded) motifs on the surface of the capsid that presented mutational patterns that correlated with the circulation of GII.4 variants. The role of antigenicity of two of those motifs (antigenic sites C and G) was confirmed with (previously) uncharacterized HBGA-blockade MAbs (B11 and B12), newly developed HBGA-blockade MAbs (1C10, 6E6, 17A5, and 18G12), and polyclonal sera from guinea pigs immunized with wild-type VLPs. 
Antigenic site G presented the strongest correlation with the emergence and circulation of new GII.4 variants, while being more highly conserved than other large antigenic sites (e.g., site A, C, or E). This indicates that newly discovered antigenic site G plays a pivotal role in transforming the GII.4 noroviruses into new variants with pandemic potential. Antigenic site C is close to previously defined antigenic site A, but substitutions on antigenic site A did not affect the binding of those MAbs (11). Notably, competition analyses among different MAbs showed that MAbs B11 and B12 partially blocked interactions with MAbs mapping to antigenic site A (11), suggesting that this new motif is part of an antigenic site that involves two or more epitopes (at least antigenic sites A and C). Recently, Koromyslova and colleagues showed that the footprint of an antibody that neutralized human noroviruses mapped to residues from antigenic sites C and D (22). This demonstrates that the same epitope could be shared by the different antigenic sites and that their interaction could result in differences in the evolutionary patterns presented. In addition, differences in HBGA-blocking ability between antigenic sites (A, C, and G) and  (35,36). Similarly, the NERK motif, which was suggested to occlude conserved GII.4 antigenic sites and affect the antibody blockade potency (37,38), was shown to be highly conserved among GII.4 variants. Previous studies pointed out a mutation in this motif in a Sydney 2012 variant; however, our large-scale analysis showed that the New Orleans variant was the only one showing, at the population level, any mutation at this motif (Fig. S10). Thus, the profiling of mutational patterns that we implemented, which included the use of a large number of sequences, could provide a better understanding of the role of individual mutations in the circulation and predominance of the pandemic GII.4 variants.
Immunological analyses that include multiple different viruses from each of the pandemic variants and in-depth population analyses are needed to better delineate the meaning of minor mutations with respect to the antigenic differences among the variants.
Improvements in the understanding of viral dynamics and of the correlation between antigenic and genetic changes in influenza viruses have facilitated the selection of virus strains to be included in upcoming seasonal vaccines (39)(40)(41). To better understand the viral dynamics of GII.4 noroviruses, we performed selection analyses that included all variants reported for over 4 decades. Episodic diversifying (positive) selection was observed in five residues (352, 355, 357, 368, and 378) during intervariant evolution (Fig. 7a); these residues are part of antigenic sites A, C, and G. Three of these residues (352, 357, and 378) showed positive selection on major branches of the GII.  (43), and our mutational analyses indicated a pivotal role of sites A and G in the antigenic differences, thus supporting our in silico observation. Initial studies suggested that most of the positively selected sites for GII.4 noroviruses were located in the S domain (28,42,44); however, our analyses, with an expanded number of sequences and variants, showed that the emergence of new variants stemmed from the positive selection of residues mapping on the surface of the P domain. Pinpointing of the changes required for emergence of a new antigenic variant of seasonal influenza virus or emergence of a pandemic virus is the "holy grail" for controlling viruses undergoing constant change. While prediction of viral emergence requires a holistic approach that includes studies of the virus, the environment, and the host (45)(46)(47), careful monitoring of the substitutions in residues involved in the diversification and emergence of new GII.4 viruses could help in the early detection of future novel variants with pandemic potential.
The GII.4 intervariant diversification, likely driven by the immune status of the population, is correlated with the accumulation of amino acid substitutions in the major capsid protein. In contrast, our analyses suggest that diversification at the intravariant level is much restricted, with amino acid substitutions occurring without indications of diversifying selection. Minimal substitutions of the amino acids that mapped on the major antigenic sites were observed over the predominance of each variant, and most seemed to follow a stochastic process. The latter could represent a result of multiple pressures exerted on the virus, including but not limited to individual virus-host interactions and dispersion. For two variants (New Orleans 2009 and Sydney 2012), a larger, more comprehensive data set was available, and prepandemic strains have been reported previously (48)(49)(50)(51). Interestingly, although these prepandemic strains cluster within their respective variants, they present multiple differences (mostly mapping at the antigenic sites) from the virus population that later established worldwide dominance. Together, these results suggest that emergence of GII.4 variants could occur in the following sequence: (i) a prepandemic stage characterized by acquisition of mutations that facilitate viral emergence and episodic diversification (exemplified here by residues 352, 357, 368, and 378); (ii) a short period (1 to 2 years) of adaptability (with different antigenic motifs) that precedes the pandemic phase; and, finally, (iii) the pandemic phase, where the virus is dominant and explores only a narrow space of sequence diversity. A similar pattern has been observed for H3N2 influenza viruses (52) and rotaviruses (53), in which viruses that circulate at low levels are able to predominate in the following season without major changes in their genetic background. 
This order of events might have occurred during the recent emergence and predominance of GII.17 noroviruses in many Asian countries. During 2013 to 2014, a "new" GII.17 norovirus (variant C) was detected circulating in different countries (Japan, Hong Kong, and China), but in the next epidemic season, this variant was ultimately replaced by variant D, which spread worldwide and predominated from 2014 to 2016 (15,(54)(55)(56). In the same context, we found six GII.4 strains that did not cluster with any GII.4 variant (Fig. S1) and that might represent strains that did not adapt well to the human population. These strains presented different sequence patterns in their antigenic sites and therefore might constitute strains in the prepandemic stage that explored the sequence space but failed to thrive sufficiently to reach the next evolutionary level.
Our findings suggest two different mechanisms behind the evolutionary dynamics of the major capsid protein from GII.4 noroviruses: (i) a steady pandemic phase governed by stochastic processes, preceded by (ii) substitutions that arise from positive selection. This report also provides a methodological framework that could facilitate the characterization of the variable antigenic sites that play a relevant role in the emergence of new viruses in the human population. Studies that include large and long-time-scale data sets of full-length genomes would help to determine the factors involved in norovirus predominance and persistence in the human population.

MATERIALS AND METHODS
Data mining and sequence analyses. A total of 1,601 full-length (1,623-nt) and nearly full-length (≥1,560 nt) VP1 sequences of the GII.4 genotype, from which sequences from immunocompromised patients and environmental samples were removed, were downloaded from GenBank (accessed July 2017) (data available upon request). The sequences spanned the 42 years from 1974 to 2016. Sequences were aligned using ClustalW as implemented in MEGA v7 (57) and were visually inspected to confirm proper alignment. The nomenclature used for the GII.4 variants was adopted as previously indicated (15), and variant data sets were parsed following such information. Sequence analyses were performed using the total data set except where indicated. Entropy analysis and profiling of mutational patterns included only strains (1,572 VP1 sequences) collected from 1995 to 2016, as Grimsby-like viruses were the first recorded to cause large outbreaks worldwide (33) (58). Profiling of mutational patterns of motifs/antigenic sites was performed using R 3.4.2 (59). Amino acid sequences of each motif/antigenic site were profiled by year, and the corresponding mutational patterns were plotted in a composite bar graph that showed the number of strains with each pattern as a fraction (percentage) of the total number of strains. The correlation between the mutational patterns and the variant distribution was assessed using adjusted Rand index values, a measure of agreement between two clusterings, which here evaluated the degree of match between mutational patterns and variant classification. Adjusted Rand index values were calculated using R and the mclust package (60). To account for the sampling bias associated with the original data set, we repeated the entropy and structural analyses using a randomly subsampled data set that included a maximum of 50 strains/variant (n = 474) from the original GII.4 data set (1,572 sequences).
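The adjusted Rand index used above compares two labelings of the same strains (variant assignment versus the mutational pattern at one antigenic site) and corrects the raw agreement for chance. The study computed it in R with the mclust package; the following is an independent pure-Python sketch of the standard formula, given only to make the calculation explicit. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as trivially identical
    # Pairs of items placed together in both labelings (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g., both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: hypothetical variant labels versus motif patterns per strain.
variants = ["v1", "v1", "v2", "v2", "v3", "v3"]
patterns = ["P1", "P1", "P2", "P2", "P3", "P1"]
ari = adjusted_rand_index(variants, patterns)
```

A value of 1 means the site's patterns partition the strains exactly as the variant classification does, and values near 0 mean agreement no better than chance, which is why a site with few distinct patterns can still score highly if those patterns track the variants.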
Diversifying selection analysis. Maximum likelihood (ML) phylogenetic trees of VP1-encoding nucleotide sequences for all variants (intervariant analyses) and for each variant (intravariant analyses) were constructed using PhyML (61). The best substitution models were selected based on the lowest corrected Akaike information criterion (AICc) value for each data set using jModelTest v2 (62,63). Analyses of larger data sets (i.e., the Sydney 2012 and New Orleans 2009 variants) tended to favor a generalized time-reversible (GTR) substitution model, while analyses of variants with fewer reported sequences favored a Tamura-Nei 93 (TN93) model. Diversifying selection analyses of the VP1-encoding sequence through its ML phylogenetic tree were performed by using MEME methods (64,65). We aimed to detect codon sites subjected to positive selection (i.e., with more nonsynonymous substitutions than synonymous substitutions) during their evolution and focused on the sites at or near the major antigenic sites of GII.4. Significant positive selection was indicated by P values of <0.05. The branches that were subjected to diversifying positive selection were explored by using empirical Bayes factors of >100 in MEME. To reduce the data size for the intervariant analyses of positive selection, we randomly subsampled a maximum of 30 strains/variant (n = 308) from the original GII.4 data set (1,601 sequences) as indicated previously.
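For readers unfamiliar with AICc-based model selection, the criterion can be sketched as follows. This is not jModelTest's implementation; the function names are hypothetical, the log-likelihoods are made-up placeholders rather than values from this study, and the parameter counts and the use of alignment length as the sample size are assumptions.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion: AIC plus a small-sample penalty."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("sample size too small for the AICc correction")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_model(fits, n_obs):
    """Return the model name with the lowest AICc.

    fits maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aicc(fits[name][0], fits[name][1], n_obs))


# Placeholder values for illustration only (not results from this study).
# Parameter counts cover the substitution model only; branch lengths are omitted.
example_fits = {"GTR": (-1000.0, 8), "TN93": (-1001.0, 5)}
selected = best_model(example_fits, n_obs=1623)  # assumes n_obs = alignment length
```

Because the penalty grows with the number of free parameters, a simpler model such as TN93 can be preferred when a more complex model improves the likelihood only slightly, which is consistent with smaller data sets favoring TN93.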
Bayesian analyses of nucleotide substitution rates. Using the VP1 sequences and the collection year of each strain, temporal phylogenetic analysis was performed using Bayesian Markov chain Monte Carlo (MCMC) methodology in BEAST v1.8.3 (66). The best substitution models were selected based on the lowest corrected Akaike information criterion (AICc) value as described above. The clock models (strict or relaxed lognormal clock) and tree priors (constant population size, exponential growth, or skyline) were tested, and the best models were selected based on the model selection procedure using AIC through MCMC. The MCMC runs were performed until all the parameters reached convergence. MCMC runs were analyzed using Tracer v1.6 (http://tree.bio.ed.ac.uk/software/tracer/). The initial 10% of the logs from each MCMC run was removed before summarizing the mean and the 95% highest posterior density interval of the substitution rates.
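The summary step performed in Tracer (discarding the first 10% of the logged samples, then reporting the mean and the 95% highest posterior density interval) can be approximated with the sketch below. It is a simplified stand-in, not Tracer's code; the function names are hypothetical, and the interval is taken as the narrowest window containing the requested fraction of the sorted samples.

```python
import math
from statistics import fmean


def discard_burn_in(samples, fraction=0.10):
    """Drop the initial fraction of logged MCMC samples."""
    start = int(len(samples) * fraction)
    return samples[start:]


def hpd_interval(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the samples."""
    if not samples:
        raise ValueError("no samples provided")
    ordered = sorted(samples)
    n = len(ordered)
    window = min(n, max(1, math.ceil(mass * n)))
    # Slide a window of fixed sample count and keep the one with the smallest width.
    best = min(range(n - window + 1),
               key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best], ordered[best + window - 1]


def summarize_rate(samples, burn_in=0.10, mass=0.95):
    """Mean and highest posterior density interval after burn-in removal."""
    kept = discard_burn_in(samples, burn_in)
    return fmean(kept), hpd_interval(kept, mass)
```

Unlike an equal-tailed interval, the highest posterior density interval is the shortest interval with the stated coverage, so it can be asymmetric around the mean for skewed posteriors.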
Site-directed mutagenesis and VLP production. The VP1-encoding sequences from a GII.4 Farmington Hills 2002 (MD2004-3) strain and a Sydney 2012 (RockvilleD1) strain were ligated into pFastBac1 vectors using SalI and NotI restriction sites (11,67). Site-directed mutagenesis of pFastBac-MD2004-3 and pFastBac-RockvilleD1 was performed using mutation-specific forward and reverse primers, followed by purification with illustra MicroSpin G-50 columns (GE Healthcare, Buckinghamshire, United Kingdom). Parental DNA was digested using the DpnI enzyme (New England BioLabs, MA, USA). VLPs presenting multiple mutations were generated by cloning a chemically synthesized P domain into a pFastBac1 plasmid containing the S domain using PspXI and NotI restriction sites (11). Each pFastBac construct was transformed via electroporation into ElectroMAX DH10B cells (Thermo Fisher Scientific, CA, USA), which were grown on LB plates with ampicillin overnight at 37°C. Plasmid DNA was extracted from selected colonies (QIAprep Spin Miniprep kit; Qiagen, Hilden, Germany). Introduction of the mutations was confirmed by Sanger sequencing. VLPs were produced using a Bac-to-Bac baculovirus expression system (Invitrogen, CA, USA) and purified through a cesium chloride gradient as previously described (67). Expression of VP1 protein was confirmed by Western blotting, and VLP integrity was confirmed by electron microscopy.
Immunoassays. Mutant and wild-type norovirus VLPs were analyzed for reactivity to monoclonal antibodies (MAbs) by enzyme-linked immunosorbent assay (ELISA) as described previously (11). The B11 and B12 MAbs were obtained from mice immunized with GII.4 Farmington Hills 2002 variant (MD2004-3 strain) VLP (11) and were generously provided by Kim Y. Green (National Institutes of Health, USA). The GII.4 Sydney 2012 variant-specific MAbs (1C10, 6E6, 17A5, and 18G12) were developed from mice immunized with RockvilleD1 strain VLP (GenScript, NJ, USA). HBGA-blocking assays were performed using mutant and wild-type VLPs, HBGA molecules derived from human saliva, the Sydney 2012 variant-specific MAbs, and polyclonal antibodies. The polyclonal antibodies were obtained from guinea pigs and mice immunized with GII.4 Farmington Hills 2002 (MD2004-3) and Sydney 2012 (RockvilleD1 strain) VLPs (11,67). Human saliva was collected from a healthy adult volunteer. The saliva sample was boiled at 100°C for 10 min immediately after collection and centrifuged at 13,000 rpm for 5 min. The saliva supernatant was collected and used for HBGA-binding and HBGA-blocking assays. Briefly, serial dilutions of mouse MAb or guinea pig polyclonal sera were mixed with wild-type or mutant VLPs and incubated on the saliva-coated plate for 1 h at 37°C. Plates were washed four times to remove unbound (i.e., blocked by mouse MAbs or guinea pig polyclonal sera) VLPs. Pooled sera from guinea pigs or mice immunized with Farmington Hills 2002 VLP and Sydney 2012 VLP were used to detect the VLPs attached to the plate. Goat anti-guinea pig or anti-mouse IgG conjugated with horseradish peroxidase and 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) substrate (SeraCare, MA, USA) were used to develop the blue-green color on the VLP-attached plate. The degree of blocking was evaluated from the optical density (OD) at 405 nm, and the EC50 for the serum dilutions was calculated from the normalized OD curve using GraphPad Prism v7. One-way analysis of variance (ANOVA) and Dunnett's multiple-comparison test were conducted to analyze the differences in EC50 values between wild-type and mutant VLPs using GraphPad Prism v7. Differences in EC50 values corresponding to P values of <0.05 were considered statistically significant.
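As a rough illustration of how an EC50 is read off a normalized OD curve, the sketch below interpolates the 50% crossing on a log10 dilution scale. This is a simplification and not the curve fitting performed in GraphPad Prism; the function name, the input layout (reciprocal dilutions in ascending order, OD expressed as a percentage of the no-antibody control), and the interpolation approach are all assumptions.

```python
import math


def ec50_by_interpolation(dilutions, normalized_od, level=50.0):
    """Estimate the reciprocal dilution at which the normalized OD crosses `level`.

    dilutions: reciprocal serum or MAb dilutions in ascending order (all > 0).
    normalized_od: OD values as a percentage of the no-antibody control.
    Returns None when the curve never crosses `level`.
    """
    points = list(zip(dilutions, normalized_od))
    for (d0, y0), (d1, y1) in zip(points, points[1:]):
        crosses = (y0 - level) * (y1 - level) <= 0
        if crosses and y0 != y1:
            # Linear interpolation in log10(dilution) between the two flanking points.
            frac = (level - y0) / (y1 - y0)
            log_d = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_d
    return None
```

A curve that never reaches 50% blocking returns None here, which mirrors the practical situation in which an EC50 cannot be assigned within the tested dilution range.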

ACKNOWLEDGMENTS
We thank Steve Rubin for the critical reading of the manuscript. Financial support for this work was provided by Food and Drug Administration intramural funds (Program Number Z01 BK 04012-01 LHV to G.I.P.). K.T. and C.J.L. are recipients of a CBER/FDA-sponsored Oak Ridge Institute for Science and Education (ORISE) fellowship.