Introduction

Noroviruses (NoVs) are the second most important cause of food-borne viral gastroenteritis, after group A rotaviruses (RVs), and they are responsible for about 18% of all diarrheal cases and more than 200,000 deaths per year, globally [1]. Norovirus infection occurs in all age groups but is more predominant in children [2]. Norovirus illness and norovirus-related deaths among children below the age of five years have been shown to impose a significant economic burden on the countries of South East Asia and the Western Pacific region, while societal productivity was found to be impacted equally in low, middle, and high-income countries [3].

Noroviruses are non-enveloped viruses belonging to the family Caliciviridae, and they have a single-stranded positive-sense RNA genome that is nearly 7.5 kb long and contains three open reading frames (ORFs). ORF 1 encodes a non-structural polyprotein that is cleaved into six non-structural proteins, including the viral RNA-dependent RNA polymerase (RdRp), that are involved in viral replication. ORF 2 encodes a major structural protein, viral protein 1 (VP1), which is divided into an N-terminal shell (S) domain and a C-terminal protruding (P) domain. The S domain surrounds the viral RNA, and the P domain consists of P1 and P2 subdomains. The P2 subdomain is the most surface-exposed portion of the capsid protein, with a highly variable amino acid sequence, and contains the major neutralization epitope [4]. ORF 3 is around 720 bp long and encodes the minor capsid protein (VP2), which stabilizes the capsid and is thought to be involved in virus assembly [5]. On the basis of the amino acid sequence diversity of the VP1 gene, 10 norovirus genogroups (GI–GX) have been identified [6]. Among them, genogroups GI, GII, GIV, GVII, and GIX (previously GII.15) are found in humans, and GII is predominant among children. These 10 genogroups can be further divided into 49 capsid genotypes. On the basis of partial nucleotide sequences of the RdRp region, 60 P-types have been confirmed to date [7].

More than 30 norovirus genotypes that infect humans have been identified, but GII.4 is predominant worldwide and is responsible for ~70% of norovirus outbreaks [8]. The pandemic variants New_Orleans_2009 and Sydney_2012 have evolved through processes involving both antigenic drift and antigenic shift (due to GII.4 intra-genotype recombination at the ORF1-ORF2 junction) [8]. GII.4 Sydney viruses emerged in 2012 and were associated with a GII.P31 polymerase (GII.P31-GII.4 Sydney 2012), but in 2015, another recombinant GII.4 Sydney strain emerged that was associated with a novel GII.P16 polymerase (GII.P16-GII.4 Sydney 2012) [9]. As GII.4 NoVs are antigenically diverse, frequent changes in the major antigenic epitopes lead to the emergence of new GII.4 variants, which can potentially escape the immune responses developed against previous variants [8]. A few human norovirus vaccines (VLP-based vaccine candidates) are in the clinical stages of development. Some of them are bivalent vaccines that were generated based on the VP1 gene of the prototype GI.1 strain and a GII.4c strain (a consensus of three different GII.4 strains) [10].

Although there are few reports pertaining to the disease burden of noroviruses in India, several sporadic cases of acute gastroenteritis caused by noroviruses have been reported in different parts of this country in the last two decades [11,12,13,14,15]. About 11% of diarrheal cases among the travellers visiting India during 2002-03 were caused by noroviruses [16]. An infection rate varying between 1.4% and 30.5% was observed in different states of northern India. In New Delhi, norovirus was found to be the second most predominant virus (25.7%) after RV [17]. In southern India, norovirus infection rates of about 10% and 44.4% were reported in two different studies during 2005-06 [18, 19], while a birth cohort study found that about 11.2% of diarrheal episodes were attributable to noroviruses [20]. About 10.7% norovirus positivity among hospitalized patients was reported from different parts of western India [21, 22]. A community-based study (GEMS) across Africa, India, Pakistan, and Bangladesh over 36 months (2007-2011) reported a norovirus GII incidence rate of 4.7% among children aged 12-23 months with moderate to severe diarrhoea [15]. Similarly, hospital-based studies in eastern India found about 3.1% norovirus infection cases during 2007-2009 [23,24,25].

Although rotavirus vaccination has successfully alleviated a major proportion of the morbidity and mortality associated with acute gastroenteritis (AGE), there are reports of increased incidence of noroviruses in certain countries since the introduction of RV vaccines [26,27,28]. The RV vaccine was introduced in the Universal Immunisation Programme (UIP) of India in 2016 in a phased manner, and in 2020, the rollout was complete. Current norovirus surveillance in eastern India has revealed an increase in the prevalence of GII norovirus infection in recent years. Therefore, the aim of this study was to estimate the prevalence and genetic diversity of GII norovirus in Kolkata, eastern India. The capsid genes of the norovirus strains circulating in eastern India were sequenced to find changes in the antigenic epitope of the P2 domain.

Materials and methods

Sample collection and preparation of viral suspensions

Fecal samples were collected from 2812 children suffering from acute gastroenteritis who were seeking treatment at two hospitals in Kolkata, Infectious Disease and Beliaghata General Hospital (IDH) and Dr. B.C. Roy Post-Graduate Institute of Pediatric Sciences (BCH), from January 2018 to December 2019. Prior to execution of this study, approval was obtained from the institutional ethics committee. A 30% w/v viral suspension (VS) of the stool samples was prepared in PBS. To remove the stool debris, the VS was centrifuged at 13,000 rpm for 15 min at 4°C. The clarified samples were then stored at -80°C until used.

RNA extraction and detection and genetic characterization of noroviruses

TRIzol Reagent was used for nucleic acid extraction from the clarified viral suspensions. The extracted genetic material was subjected to multiplex one-step reverse transcription PCR using NoV- and RV-specific primers (Supplementary Table S1) as described previously [29]. Norovirus-positive samples (n = 170) were further subjected to RT-PCR using GI and GII genogroup-specific primers (Supplementary Table S1) [9]. The results showed that 145 samples were mono-infected with GII norovirus, whereas 15 samples had mixed infection with both GI and GII noroviruses. For genotyping of GII, every second positive sample was selected for sequencing, but for months in which positivity was low (n ≤ 5), all of the samples were sequenced. The G (capsid type) and P (polymerase type) genotypes were determined by amplifying ORF2 (VP1 gene) and ORF1 (RdRp region), respectively, using primer sets specific for the capsid and polymerase genes (Supplementary Table S1).
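The selection rule for sequencing described above can be sketched as follows. This is a minimal illustration only: it assumes that positive samples are grouped by collection month and that "every second sample" means alternate samples in collection order starting from the first. The function name `select_for_sequencing` and the sample identifiers are hypothetical and do not come from the study.

```python
def select_for_sequencing(positives_by_month, low_threshold=5):
    """Pick samples for sequencing, month by month.

    positives_by_month maps a month label to the list of GII-positive
    sample IDs collected in that month (in collection order).
    """
    selected = {}
    for month, samples in positives_by_month.items():
        if len(samples) <= low_threshold:
            # Low-positivity month (n <= 5): sequence every sample.
            selected[month] = list(samples)
        else:
            # Otherwise take every second sample (1st, 3rd, 5th, ...).
            # Starting from the first sample is an assumption.
            selected[month] = samples[::2]
    return selected


# Hypothetical sample IDs, for illustration only.
example = {
    "2018-01": ["S1", "S2", "S3"],
    "2018-02": ["S4", "S5", "S6", "S7", "S8", "S9", "S10", "S11"],
}
picked = select_for_sequencing(example)
```

In this toy input, the low-positivity month is sequenced in full, while the month with eight positives contributes four samples.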

DNA sequencing and phylogenetic analysis

The amplified PCR products were purified using a QIAquick PCR Purification Kit (QIAGEN GmbH, Hilden, Germany) and subjected to DNA sequencing using an ABI Prism Big Dye Terminator Cycle Sequencing Ready Reaction Kit v3.1 (Applied Biosystems, Foster City, California, USA) in an ABI Prism 3730 Genetic Analyzer (PE Applied Biosystems, Foster City, California, USA) [30].

The nucleotide sequences obtained for each norovirus genotype were analyzed using Norovirus Genotyping Tool v.2.0 [31] and a nucleotide sequence BLAST search using the National Center for Biotechnology Information (NCBI, National Institutes of Health, Bethesda, MD) Basic Local Alignment Search Tool (BLAST) server with GenBank database release 143.0.

Phylogenetic analysis of the sequences was performed using the maximum-likelihood method (with 1000 bootstrap replications) in Molecular Evolutionary Genetics Analysis, version X (MEGA X). For analysis of the capsid and polymerase genes, the GTR + G + I substitution model was found to be the best-fit model, producing the lowest Bayesian information criterion (BIC) and corrected Akaike information criterion (AICc) scores in the model testing tool of MEGA X (v.10.0.5). The LALIGN program was used to analyze sequence similarity among norovirus strains using a global alignment without the end gap penalty criterion. For each phylogenetic analysis, a multiple alignment was made of all of the sequences. Strains with ≥ 99% nucleotide sequence identity to each other were grouped, and representative strains from each group were used in preparing the tree.
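The grouping of strains at ≥ 99% nucleotide identity can be illustrated with a short sketch. The text does not say how the groups were formed, so the greedy procedure below is an assumption, and the simple column-by-column identity used here stands in for the LALIGN comparison rather than reproducing it. The function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def group_by_identity(aligned, threshold=99.0):
    """Greedy grouping (an assumed procedure): each strain joins the first
    group whose representative it matches at >= threshold, otherwise it
    starts a new group and becomes that group's representative."""
    groups = []  # list of (representative_name, [member_names])
    for name, seq in aligned.items():
        for rep, members in groups:
            if percent_identity(seq, aligned[rep]) >= threshold:
                members.append(name)
                break
        else:
            groups.append((name, [name]))
    return groups


# Toy aligned sequences (200 nt), not real norovirus data.
base = "ACGT" * 50
toy = {
    "strain_a": base,
    "strain_b": "T" + base[1:],        # one mismatch
    "strain_c": "T" * 10 + base[10:],  # eight mismatches
}
groups = group_by_identity(toy)
```

With these toy inputs, the first two strains fall into one group and the third forms its own, so two representatives would go into the tree.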

Statistical analysis

Epi Info version 7 was used for chi-square analysis and calculation of p-values. The Mantel-Haenszel chi-square test was used to find any significant difference in the occurrence of norovirus between age groups, times, and genders. Spearman's rank correlation (SPSS Statistics version 21) was used to determine any correlation between norovirus genotypes and different age groups. p-values < 0.05 were considered to be statistically significant.

Recombination analysis

Recombination events were confirmed by similarity plot analysis in SimPlot software (v. 3.5.1). For recombination analysis, Kimura (2-parameter) was selected, and a window width of 200 bp and a step size of 20 bp were used.
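The sliding-window comparison behind a similarity plot can be sketched in a few lines. This is not SimPlot's implementation: it is a minimal illustration that assumes two pre-aligned sequences, uses the standard Kimura 2-parameter distance, and takes similarity as 1 minus that distance (an assumption made here for plotting). The window width and step size match the values given above; the function names and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned sequences.

    Returns None if no comparable columns exist or the distance is
    undefined because the sequences are too divergent.
    """
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in BASES or y not in BASES:
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        return None
    p, q = transitions / sites, transversions / sites
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return None
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


def sliding_similarity(query, reference, window=200, step=20):
    """Return (window_midpoint, similarity) pairs along the alignment."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window],
                         reference[start:start + window])
        if d is not None:
            points.append((start + window // 2, 1.0 - d))
    return points


# Toy 400-nt example: identical first half, transitions in the second half.
reference = "ACGT" * 100
query = reference[:200] + reference[200:].replace("A", "G")
profile = sliding_similarity(query, reference)
```

In the toy profile, similarity to the reference is maximal in the first windows and drops in the later ones. A drop of this kind against one parental strain, mirrored by a rise against another, is the pattern read as a putative breakpoint.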

Results

Demographic characteristics of norovirus gastroenteritis in hospitalized and OPD patients

From January 2018 to December 2019, a total of 2812 stool samples from acute gastroenteritis patients were screened for the presence of RV and NoV, 24.96% (n = 702) and 6.04% (n = 170) of which were found to be rotavirus and norovirus positive, respectively. Of the 170 NoV-positive samples, 22 (12.9%) were also positive for RV. Norovirus positivity was higher (6.95%; 98/1411) in outpatients, who had mild to moderate diarrhoea, than in inpatients (5.14%; 72/1401), who had moderate to severe diarrhoea (chi-square 4.03; p < 0.05) (Fig. 1a). Norovirus infections occurred throughout most of the year in eastern India, with no distinct seasonality (Fig. 1b). Among the 2812 samples, 1703 (60.6%) were from males and 1109 (39.4%) were from females. No significant difference in the norovirus detection rate was observed between males and females (chi-square 0.0239; p > 0.05). However, a significant association was observed between age group and norovirus positivity. Children 6 to 12 months of age were found to be the most susceptible to norovirus infection (8.2%) (chi-square value, 19.52; p-value, < 0.05). More than 98% (n = 167/170) of norovirus-infected children were less than 36 months old (Fig. 2).
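The outpatient versus inpatient comparison can be checked directly from the counts given above. The sketch below uses the Mantel-Haenszel chi-square for a single 2x2 table, which is the Pearson chi-square scaled by (N - 1)/N. It is an illustration of the test named in the methods, not a reproduction of the Epi Info output, and the function name is hypothetical.

```python
def mantel_haenszel_chi2(a, b, c, d):
    """Mantel-Haenszel chi-square for a single 2x2 table [[a, b], [c, d]].

    Equals the Pearson chi-square multiplied by (N - 1) / N.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return (n - 1) * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Counts from the text: 98 of 1411 outpatients and 72 of 1401 inpatients
# were norovirus positive.
chi2 = mantel_haenszel_chi2(98, 1411 - 98, 72, 1401 - 72)

CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05
significant = chi2 > CRITICAL_1DF_05
```

With these counts the statistic comes out at roughly 4.04, in line with the reported value of 4.03 and above the 3.841 threshold for p < 0.05 at one degree of freedom.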

Fig. 1. (a) Distribution of samples from outpatient department cases and hospitalized cases of acute gastroenteritis in eastern India from January 2018 to December 2019. Values that are significantly different (p < 0.05) are indicated by an asterisk. (b) Distribution of norovirus infection among children (≤ 5 years of age) with acute gastroenteritis in eastern India from January 2018 to December 2019

Fig. 2. Age distribution of norovirus infection among children (≤ 5 years of age) in eastern India from January 2018 to December 2019

Norovirus genotype distribution

Of the 170 norovirus-positive samples, 145 (85%) were infected with norovirus GII only, and two (1%) were infected with GI only. Fifteen (8.8%) samples were found to have a mixed infection with genogroups GI and GII, and eight samples were untypeable (Fig. 3). Of the GII single-infection samples (n = 145), 88 were sequenced, of which 10 were from NoV-RV coinfections. Of the 15 GI+GII coinfection cases, only two showed good sequence coverage. Thus, out of 160 GII-positive samples, 90 (56%) were analysed further. Four different genotypes of norovirus GII were identified, but GII.4 Sydney 2012 was the predominant capsid genotype (n = 75/90, 83.3%) in both single NoV infections and RV-NoV coinfections. The other genotypes were GII.3 (10/90, 11.1%), GII.13 (4/90, 4.4%), and GII.17 (1/90, 1.1%). No significant difference in the norovirus genotype distribution was observed between males and females (chi-square, 3.304; p-value, 0.347). Also, no significant difference was observed in the temporal distribution of norovirus genotypes between the years 2018 and 2019 (chi-square values, 0.0037-3.51; p-value > 0.05). Noroviruses possessing the GII.4 Sydney 2012 capsid type were found to be associated with both GII.P16 and GII.P31 polymerase types, at rates of 58.7% (n = 44/75) and 41.3% (n = 31/75), respectively. The GII.3 and GII.13 capsid types were found to be associated only with GII.P16 polymerase. To investigate the correlation between the different norovirus genotypes and age, the Spearman correlation coefficient was determined. The GII.P16-GII.4 (-0.564, two-tailed significance, 0.322), GII.P16-GII.13 (-0.364, two-tailed significance, 0.547), GII.P31-GII.4 (-0.7, two-tailed significance, 0.188), and GII.P16-GII.3 (-0.791, two-tailed significance, 0.111) genotypes did not correlate with any specific age group.
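The link between the Spearman coefficients and their two-tailed significance values can be made explicit with a small sketch. It relies on an assumption that is not stated in the text: that each correlation was computed over five age groups (n = 5). Under that assumption, the usual t approximation with n - 2 = 3 degrees of freedom gives values that agree with the reported ones. The function names are hypothetical, and the closed-form t distribution used here is valid only for 3 degrees of freedom.

```python
import math


def t_cdf_df3(t):
    """Closed-form CDF of Student's t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def spearman_two_tailed_p_n5(rho):
    """Two-tailed p-value for a Spearman coefficient with n = 5 pairs
    (assumed), using t = rho * sqrt((n - 2) / (1 - rho^2))."""
    n = 5
    t = rho * math.sqrt((n - 2) / (1 - rho * rho))
    return 2 * t_cdf_df3(-abs(t))


# Coefficients and significance values as reported in the text.
reported = {
    "GII.P16-GII.4": (-0.564, 0.322),
    "GII.P16-GII.13": (-0.364, 0.547),
    "GII.P31-GII.4": (-0.7, 0.188),
    "GII.P16-GII.3": (-0.791, 0.111),
}
computed = {name: spearman_two_tailed_p_n5(rho)
            for name, (rho, _) in reported.items()}
```

With so few points, even a coefficient as strong as -0.791 does not reach p < 0.05, which is consistent with the conclusion that no genotype correlated with a specific age group.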

Fig. 3. Distribution of different norovirus genogroups detected in eastern India from January 2018 to December 2019

Phylogenetic analysis of the polymerase gene

For phylogenetic analysis based on the polymerase gene sequences of GII noroviruses circulating in eastern India, a 720-nt region at the 3’ end of the RdRp (ORF1) was amplified. All representative strains with GII.P16 polymerase type, including those with both GII.4 and GII.3 capsids, were seen to cluster with the previously reported GII.P16 strains from the USA, Japan, the UK, Australia, China, and Russia, creating a novel and phylogenetically distinct monophyletic clade (Supplementary Fig. S1). The representative GII.P16 strains of subcluster A were 97%-98% identical to each other and were closely related to GII.P16 strains from the USA (ST466) (98.1% sequence identity). The GII.P16 strains of subcluster B had 98.7%-99% sequence identity (98%-98.6%) to each other and were closely related to GII.P16 strains from the USA, the UK, and Japan. The subcluster A and B strains showed 96%-97.4% identity to each other. The subcluster C strains were distantly related to the subcluster A and B strains, with 95.6%-96% and 96%-96.4% identity, respectively. The eastern Indian subcluster C strains were seen to cluster with the previously reported GII.P16 strains from the USA, Australia, Thailand, and Taiwan (98.5%-99.7% nucleotide sequence identity), with the highest nucleotide sequence similarity to the GII.P16 strain Bay0244 (MK764016.1) from the USA.

Phylogenetic analysis of the capsid gene

The representative eastern Indian GII.4 Sydney 2012 capsid sequences were seen to cluster together and were 98.7%-99.5% identical to each other (Supplementary Fig. S2). Two representative GII.4 Sydney 2012 strains, viz., NIC-NV-390 and IDH-11405, clustered outside the eastern Indian GII.4 cluster and grouped with GII.4 strains from Australia, the USA, and Japan.

Molecular phylogenetic characteristics of recombinant noroviruses

The ORF1/ORF2 junction of the viral genome was amplified using the primers Mon431 and G2SKR to make an amplicon of ~557 bp (Supplementary Table S1) to identify potential recombination events between ORF1 and ORF2. Among the 90 GII-positive samples, 10 GII.P16-GII.3 recombinants and four GII.P16-GII.13 recombinants were found. Representative strains of these GII.P16-GII.13 and GII.P16-GII.3 recombinants were analysed in SimPlot for the detection of putative recombination breakpoints. The crossing-over point was observed at the ORF1/ORF2 junction region of the parental strains (Supplementary Fig. S3). The capsid sequences (~280 nt at the 5’ end of ORF2) were analyzed, and a phylogenetic tree was constructed accordingly (Supplementary Fig. S4). Four samples were found to be of norovirus capsid type GII.13, and three of them clustered with the GII.13 strains from Thailand, the UK, Bhutan, and Australia (98%-100% nucleotide sequence identity). The highest nucleotide sequence identity (99.5%-100%) was found with the previously reported GII.13 reference strain from Thailand (LC521517.1). One representative GII.13 strain (NIC-NV-531) was distantly related to the other three representative GII.13 strains and was genetically close to GII.13 strains from Spain and China (98.8%-99.3% nucleotide sequence identity). The polymerase of these circulating GII.13 capsid sequences was found to be of the GII.P16 type (Supplementary Fig. S4b).

Analysis of antigenic epitopes in the major capsid proteins of circulating GII.4 noroviruses

The antigenic epitopes of the P2 domain have been subdivided into eight antigenic sites (A-H) as described by Tohma et al. [32]. The major differences among the GII.4 noroviruses included in the study were predominantly seen encompassing these eight epitopes. The amino acid residues at position 295 of antigenic site A, residue 376 (epitope C), residues 394-396 (epitope D), residues 411 and 414 (epitope E), residue 355 (epitope G), and residue 309 (epitope H) were found to be conserved in the representative eastern Indian strains, the vaccine strain GII.4c (a consensus of three different GII.4 capsids), and also among the reference strains (Supplementary Table S2). Amino acids in the rest of the antigenic epitopes varied between the vaccine strains and the representative strains. The GII.4 strains obtained during the study period were divided into eight groups based on similarity of the accumulating mutations.
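A comparison of this kind, checking specific residue positions between a vaccine sequence and a circulating strain, can be sketched as below. The example assumes two VP1 amino acid sequences that are already aligned without gaps so that 1-based positions correspond directly to residue numbers. The position list contains only the residues named above as conserved. The sequences are artificial placeholders, not real VP1 data, and the function name is hypothetical.

```python
def differing_sites(reference, query, positions):
    """Return {position: (ref_residue, query_residue)} for the 1-based
    positions at which two aligned amino acid sequences differ."""
    diffs = {}
    for pos in positions:
        ref_aa, qry_aa = reference[pos - 1], query[pos - 1]
        if ref_aa != qry_aa:
            diffs[pos] = (ref_aa, qry_aa)
    return diffs


# Positions named in the text as conserved (sites A, C, D, E, G, H).
CONSERVED_POSITIONS = [295, 309, 355, 376, 394, 395, 396, 411, 414]

# Placeholder sequences, NOT real VP1 data: 540 residues of 'A' with one
# artificial substitution at position 368 (length chosen arbitrarily).
vaccine = "A" * 540
strain = vaccine[:367] + "E" + vaccine[368:]

conserved_diffs = differing_sites(vaccine, strain, CONSERVED_POSITIONS)
variable_diffs = differing_sites(vaccine, strain, [368])
```

In practice the sequences would come from a multiple alignment, and gapped alignments would need a mapping from alignment columns to residue numbers before positions can be compared this way.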

Discussion

The aim of this study was to estimate the proportion, seasonal incidence, and genotypic variations of noroviruses among children with acute gastroenteritis in Kolkata, eastern India. Concurrent with RV infection, norovirus infection was predominantly observed in children less than 24 months of age. This is consistent with reports from Africa, Asia, and South America [33,34,35]. In the multisite birth cohort study MAL-ED (Malnutrition and Enteric Disease Study) from different developing countries of South America, Africa, and Asia, norovirus was identified as the first- and second-highest attributable fraction for diarrhea during the first and second year of life, respectively [36]. Similarly, in the multisite study GEMS (Global Enteric Multicenter Study) conducted in sub-Saharan Africa and South Asia, rotavirus was the highest attributable fraction for diarrhea among 0-11 and 12-23-month-old children, while norovirus was identified as the second- and third-highest viral attributable fraction among children belonging to the 0-11 months and 12-23 months age group, respectively [15]. In the current study, the norovirus detection rate was 6.9% and 5.5% among children aged 0-1 year and >1-2 years, respectively. We observed 12.9% of all norovirus-positive patients (n = 22/170) to be coinfected with rotaviruses. Dual infection with rotavirus and norovirus has also been reported previously in eastern India, western India, northern India [17, 21, 25], and the nearby countries Bangladesh (~19%) and Pakistan (~56%) [37, 38].

Natural infection with norovirus in early life may provide immunity that protects children from infection when they are older. Although norovirus infection in different parts of Asia has been reported to be more frequent in boys [25, 37], other studies have revealed no gender-specific differences in infection rates [39]. In the endemic setting of the current study, the infection rate did not differ between boys and girls, suggesting that gender may not be the predisposing factor for norovirus infection in children. Viral gastroenteritis increases during the winter months and has been associated with lower temperatures and higher rainfall, but different seasonal patterns of norovirus infections have been observed in different studies. Some studies have reported a higher frequency of norovirus infection in summer [18, 21, 40], some in winter [41], and some in the spring or rainy season [39]. Norovirus infection prevailed throughout the year with no obvious seasonal peak in eastern India. This observation is consistent with earlier reports from India [42] as well as neighboring Bangladesh [35].

Mutation and recombination are the two major sources of genetic diversity contributing to genetic variation and the generation of viral quasispecies, ultimately leading to viral evolution. In noroviruses, recombination occurs mostly at the ORF1/ORF2 junction, while recombination frequency decreases within ORF1 and ORF2, as well as at the ORF2/ORF3 junction [43]. Recombination at the ORF1/ORF2 junction is a point of interest, as non-structural and structural proteins can be exchanged, and that may alter immune antagonism, fitness, and pathogenesis [43]. Worldwide, GII.4 is the most frequent capsid genotype (representing ~70% of capsid sequences), followed by GII.3 (~16%) [44, 45]. GII.4 strains have caused six pandemics since 1995. The most recent GII.4 pandemic strain (Sydney 2012) was found primarily in conjunction with polymerase type GII.P31 (previously GII.Pe), but in late 2014, a novel norovirus GII.P16 polymerase appeared with the capsid of pandemic GII.4 Sydney 2012 [46, 47]. This pandemic variant was found circulating in various countries, including the USA, South Korea, England, Australia, and New Zealand during 2015-2016 and Japan, Germany, France, and southeastern Brazil during 2016-2017 [24, 46,47,48,49,50]. In certain countries, including the USA, Germany, and southeastern Brazil, its prevalence eventually surpassed that of the previously circulating GII.P31-GII.4 strain [49,50,51], whereas studies from northeastern Brazil and China showed a higher proportion of the GII.P31-GII.4 genotype than the GII.P16-GII.4 genotype [47, 52, 53].

In addition to GII.4, the global prevalence of the GII.3 capsid type is also noteworthy [40, 47, 54, 55]. In Bhutan, Russia, and Tunisia, genotype GII.3 was found more frequently than GII.4 among norovirus-infected children [39, 56]. The GII.13 capsid genotype has been found in combination with different polymerase types, including GII.Pg, GII.P12, GII.P3, GII.Pb, and GII.P13, worldwide [21, 57]. Genetic variations among the circulating norovirus strains along with inter-genotype recombinant viruses have been reported in India as well [13, 14]. GII.P16-GII.13 was reported previously by Nataraju et al. [25] in eastern India, and this genotype has also been found in Spain, Germany, and Greece [55, 58, 59]. During the present study period as well, recombinant viruses were identified in a few patients admitted with AGE. Co-circulation of GII.P31-GII.4 Sydney 2012 (n = 31), GII.P16-GII.4 Sydney 2012 (n = 44), and GII.P16-GII.13 (n = 4) was also observed.

Since 2015, the novel GII.P16 polymerase has been found to be associated with multiple capsid types, including GII.1, GII.2, GII.10, and GII.12, and to be distinct from GII.P16 polymerases that have been associated with GII.2, GII.3, GII.13, GII.16, and GII.17 capsid types over the last decade [29, 49, 54, 55, 57]. During the study period, noroviruses with the novel GII.P16 polymerase together with a GII.3 or GII.4 capsid were found to cluster together and shared the highest sequence similarity with other novel GII.P16 polymerase sequences reported from other parts of the world. GII.P16-GII.3 inter-genotype recombinants have been reported previously from different parts of the world [54, 57], but the GII.P16 polymerase of these GII.3 capsid variants clustered separately from the recently circulating GII.P16 polymerases of GII.P16-GII.3 noroviruses from the UK (KY887597, KY887598, KY887606), Russia (MG892946.3), and the USA (MK773588) [46, 49], thereby establishing a difference between older GII.P16 polymerases and the novel GII.P16 type (Supplementary Fig. S1).

The antigenicity of noroviruses is mediated by specific amino acid residues in the P2 domain of VP1, located in the outer capsid layer of the virus. Neutralization/antibody blockade epitopes in the P2 domain belong to eight groups, A-H. Amino acid residues in epitopes A and D are often targeted for antibody-mediated therapeutic intervention, as they mostly remain exposed on the surface. Amino acid substitutions at residues 294, 368, 372, and 373 of epitope A (which binds to host antibodies) and residues 391 and 393-395 at epitope D (which interact with HBGAs) are often positively selected and contribute to the generation of escape mutants and pandemic strains [9, 10, 60].

The present study underscores the need to perform systematic surveillance to monitor the prevalence of noroviruses after rotavirus vaccination, especially in areas with a high burden of childhood gastroenteritis, and to develop a multivalent vaccine that is effective against the plethora of norovirus variants circulating among humans. The Takeda vaccine formulation has demonstrated promising results in phase I and II trials, suggesting that it might provide better efficacy [47, 61]. During this surveillance study, analysis of the amino acid residues at antigenic epitopes of the GII.4 consensus sequences included in the vaccine revealed a high degree of divergence from the locally circulating strains (Supplementary Table S2). With the emergence of newer GII.4 variants every 3-4 years, it is necessary to monitor whether the current vaccine formulation with a GII.4 VP1 amino acid sequence (GII.4c) designed from a composite of three older variants [10] will be sufficient to provide cross-protection against emerging variants. Continuous monitoring of circulating strains is required to evaluate the prevalence and heterogeneity of circulating noroviruses in different geographical areas for assessing region-specific requirements and the effectiveness of norovirus vaccines.