Molecular characterization of rotavirus group A strains circulating prior to vaccine introduction in rural coastal Kenya, 2002-2013

Background: Kenya introduced the monovalent Rotarix® rotavirus group A (RVA) vaccine nationally in mid-2014. Long-term surveillance data is important prior to wide-scale vaccine use to assess the impact on disease and to investigate the occurrence of heterotypic strains arising through immune selection. This report presents baseline data on RVA genotype circulation patterns and intra-genotype genetic diversity over a 7-year period in the pre-vaccine era in Kilifi, Kenya, from 2002 to 2004 and from 2010 to 2013. Methods: A total of 745 RVA strains identified in children admitted with acute gastroenteritis to a referral hospital in Coastal Kenya, were sequenced using the di-deoxy sequencing method in the VP4 and VP7 genomic segments (encoding P and G proteins, respectively). Sequencing successfully generated 569 (76%) and 572 (77%) consensus sequences for the VP4 and VP7 genes respectively. G and P genotypes were determined by use of BLAST and the online RotaC v2 RVA classification tool. Results: The most common GP combination was G1P[8] (51%), similar to the Rotarix® strain, followed by G9P[8] (15%) , G8P[4] (14%) and G2P[4] (5%). Unusual GP combinations—G1P[4], G2P[8], G3P[4,6], G8P[8,14], and G12P[4,6,8]—were observed at frequencies of <5%. Phylogenetic analysis showed that the infections were caused by both locally persistent strains as evidenced by divergence of local strains occurring over multiple seasons from the global ones, and newly introduced strains, which were closely related to global strains. The circulating RVA diversity showed temporal fluctuations both season by season and over the longer-term. None of the unusual strains increased in frequency over the observation period. Conclusions: The circulating RVA diversity showed temporal fluctuations with several unusual strains recorded, which rarely caused major outbreaks. These data will be useful in interpreting genotype patterns observed in the region during the vaccine era.


Introduction
Rotavirus group A (RVA) infection is a leading cause of childhood severe dehydrating acute diarrhoea, which can lead to death 1 . The 2016 estimates show that, annually, RVA is responsible for 128,500 deaths globally, with the highest burden occurring in sub-Saharan Africa and South-East Asia countries 2 . In 2009, the World Health Organization (WHO) recommended the inclusion of either of the two licensed RVA vaccines (Rotarix® and RotaTeq®) into national immunization programmes (NIPs) of all countries to curb RVA associated disease burden 3 . Kenya introduced the monovalent Rotarix® vaccine (based on the G1P[8] strain) into its NIP in July 2014 4 .
In Africa, the introduction of the Rotarix® vaccine into the NIPs of several countries has been associated with a marked reduction in hospitalization caused by RVA infection 5,6 . For instance in Malawi, Burkina Faso and Tanzania, the vaccine effectiveness against hospitalization was estimated at 62%, 58% and 53%, respectively 7 . However, this effectiveness is lower than that observed in developed countries; for example in Belgium, vaccine effectiveness of Rotarix vaccine was estimated at 90% 8 . Furthermore, concerns remain that in time, given the high diversity of RVA strains, vaccine immunity escape variants could emerge which may undermine the gains from the vaccination programmes 9 . Such a scenario was observed in Japan where, a G8P[8] RVA strain appeared to emerge and was found in up to 66% (53/80) of children with acute gastroenteritis disease attending one pediatric clinic in Shizuoka Prefecture in February -July 2017 10 . Similarly, the predominance of non-vaccine type G2P[4] strains was observed in Rotarix® vaccinated populations of Belgium 11 and Brazil 12 . Although the rise was not linked to pressure induced by the vaccine, concerns still remain on the overall effect of the vaccine on circulating vaccine heterotypic strains.
The rotavirus genome is comprised of 11 segments of double-stranded RNA, which encode 11 proteins (VP1-4, VP6, VP7, NSP1-5) and sometimes 12 (NSP6). The VP7 and VP4 proteins independently elicit neutralizing antibodies and specify the G (glycoprotein) and P (protease-sensitive) genotypes, respectively 13 . Molecular characterization of the VP7 and VP4 proteins encoding regions is commonly used to investigate local and global RVA molecular epidemiology and is the basis of the dual genotype classification of this virus 14 . Up to 36 different G and 51 P RVA genotypes have been identified worldwide in humans and animals 15  recently been reported as emerging genotypes. The distribution of the genotypes can vary considerably from region to region and from one season to the next 16,17 . While globally dominant genotypes are similarly dominant in Africa 18 , understanding of their local natural seasonal fluctuations and intra-genotype diversity in the pre-vaccine introduction era is incomplete despite importance to vaccine impact evaluation.
The current study presents molecular analysis of historical RVA strains from coastal Kenya detected between 2002-2004, reported in Nokes et al. 19 , which we refer to as phase I, together with more recent RVA strains detected between 2010-2013, referred to as phase II. We present findings from partial sequence analysis of these longitudinally collected RVA strains identified at the Kilifi County Hospital (KCH), Kilifi, Kenya, and phylogenetically compare these with those deposited in public databases isolated across the globe. The GP typing of phase I strains was previously performed by nested multiplex PCR using genotype-specific VP7 and VP4 primers 19 . We utilize these extensive sequence data to illuminate on local RVA genotype circulation characteristics and provide baseline information on natural patterns of RVA genotype diversity in coastal Kenya prior to vaccine introduction.

RVA surveillance in Kilifi County Hospital
RVA surveillance in KCH reported in this analysis was conducted from January 2002 to December 2004 (phase I), and from January 2010 to December 2013 (phase II). Study subject recruitment criteria and sample collection methods are as previously described 19 . The study targeted children aged less than 13 years admitted with acute diarrhoea defined as three or more watery stools passed during a 24-hour period 20 . The KEMRI Scientific and Ethics Review Unit (SERU) in Kenya approved the study protocol (#SERU 3049).
Detection of RVA Stool samples were screened for RVA using an enzyme immunoassay (EIA) kit, marketed under two different names in the two periods: IDEIA (DAKO Rotavirus IDEIA TM , Oxoid, Ely, United Kingdom) in phase I and ProSpect TM (Oxoid, Basingstoke UK) in phase II, following the manufacturer's instructions.

Amendments from Version 1
This version contains revisions and comments as suggested by the reviewers. Importantly, we have modified the text to provide sufficient information on how mixed infections were detected in phase I and the limitations of the use of partial sequence data in detecting the same in phase II surveillance period.

See referee reports
REVISED electrophoresis in a 2% agarose gel. Products of samples that showed presence of the expected band size on gels were purified using GFX DNA purification kit (GFX-Amersham, UK) following the manufacturer's instructions. These were then sequenced using Big Dye Terminator 3.1 (Applied Biosystems, Foster City, California, USA) chemistry and the same primers as in PCR amplification on an ABI Prism 3130xl Genetic Analyser (Applied Biosystems, Foster City, California, USA).

RVA genotyping and sequence analysis
During phase I surveillance, RVA genotypes were determined by nested multiplex PCR using genotype-specific VP7 and VP4 primers, enabling identification of mixed genotypes 19 . In phase II, genotypes were determined by sequence of the VP7 and VP4 genes, which was limited in calling mixed infections. The sequence reads were assembled into contigs using Sequencher version 5.4.6 (Gene Codes Corp Inc., Ann Arbor, MI, USA). The nucleotide sequences were aligned using MAFFT version 7.222 23 and visualized in Aliview version 1.8 and further trimmed to remove sequence overhangs, resulting in contigs of lengths between 480-660 bp (coordinates; 184-748) covering ~23% of the VP4 gene, and 486-854 bp (coordinates; 460-824 of the VP7 gene) covering ~67% of the VP7 gene. G and P genotypes were determined using NCBI BLAST for sequences <500 bp (n=13 for VP4, n=5 for VP7) and the RotaC version 2.0 classification tool 24 for sequences >500 bp. MEGA v7.0.26 was used to select the best maximum likelihood evolution models based on the Bayesian Information Criterion 25 (Supplementary Table 1) and reconstruction of maximum likelihood phylogenetic trees with 500 bootstrap replicates. Global contemporaneous sequences (2002-2013) (accession numbers in Supplementary File 1, lists 1 and 2) together with the Rotarix® vaccine strain sequences were retrieved from GenBank database and phylogenetically compared with the local sequences. Duplicate sequences from the same country and non-overlapping sequences were removed. Clusters were identified based on high bootstrap values of >70% and high nt sequence similarity of >98%. Nucleotide and amino acid pairwise distances between the sequences were determined in MEGA v7.0.26. The trees were drawn to scale indicating nucleotide substitution rates per site.

Results
The prevalence of the genotypes and their circulation patterns were inferred using all the data collected in 2002-2004 (phase I) surveillance period irrespective of whether they were selected here for sequencing and all data collected between 2010-2013 (phase II). Data are available under restriction on Harvard Dataverse 26 .
RVA prevalence in KCH pediatric diarrhoea admissions Over the 7-year surveillance period, a total of 3,779 stool samples were screened for RVA using EIA, of which 27.3% (n=1,031) tested positive. In phase I, the prevalence of RVA in the study population was 27.4% (n=558) while in phase II the prevalence was 27.2% (n=473) ( Table 1). Among the selected samples for sequencing was successful for 569 (76%) and 572 (77%) samples for the VP4 and VP7 segments, respectively (Table 1).

RVA genotypes in the study populations
The G genotypes identified in patients admitted at the KCH were G1-G3, G8-G10, G12, G29, while the P genotypes were P were also detected, albeit in low frequency (<5%). The use of genotype-specific primers to identify RVA GP genotypes in phase I enabled the detection of mixed infections in 8.2% of the samples collected between 2002-2004; however, in phase II we used the sequencing approach to infer genotypes and this approach is not suited for detection of mixed infections. Additionally, among both phase I and II samples, 9.2% of these samples were genotyped for only one of the two genes due to failed in sequencing and/or contig assembly for the second gene (Table 2) Total 26 100 34 100  phases. Nevertheless, a high sequence homology of >95% at the nucleotide level was observed within G2 strains. Such high sequence homology was also observed in G9 strains, which were observed in high frequencies in all epidemic years except 2013.

Phylogenetic placement of Kilifi strains in the global context
The placement of Kilifi strains in the global context is shown in Figure 3 and  were retrieved from GenBank and phylogenetically compared to the observed genotypes. Duplicate sequences from strains isolated from the same country were removed. The G type in this samples (G8) clustered closely to other G8 strains isolated from humans with a nucleotide and amino acid (aa) identity of 95% and 99%, respectively, and G8 strains isolated from camel showing a nucleotide and aa identity of 94% and 98%, respectively ( Figure 5A). The P[14] genotype showed a high sequence similarity to other P[14] strains isolated from humans and bovine with a nucleotide similarity of 96% and 93%, respectively and aa identity of 98% ( Figure 5B).

Discussion
This study describes the molecular epidemiology of pre-vaccine RVA strains that circulated in Kilifi, Coastal Kenya from 2002-2004 and 2010-2013. The study spans over a decade in the period before introduction of the nationwide routine RVA vaccination programme in Kenya. The work builds on a previous study 19 analyzing strains collected between 2002-2004 (phase I) which highlighted the importance of genotypes G1, G8 and G9 in sub-Saharan Africa during the study period. In phase I surveillance period, genotype-specific primers were used to characterize the strains into different G and P genotypes and this strategy allowed the detection of mixed genotypes in some samples.
In the present analysis, a fraction of phase I (46%) and all phase II RVA samples were sequenced, assigned to GP as per the guidelines of the Rotavirus Classification Working Group 24 and subjected to phylogenetic analysis.    Overall, the observed strains showed a high nucleotide sequence homology of up to 100%, as observed in the different genotypes. The close genetic relationship of strains observed in phase I and phase II suggest a persistence in circulation of these RVA strains to continuously cause the observed epidemics. In addition, the exclusive clustering of majority of Kilifi strains from the global strains shows that theses strains might have been localized in Kilifi over a long period of time. However, few strains that formed three distinct clusters in both G1 and P[8] global trees, supports the notion of separate introductions and persistence of possibly foreign strains in this setting. Although cases of re-assortment and possible introductions is evident, partial data from only two genes is insufficient in providing a complete understanding of the genetic diversity of such common and not common genotypes. Full genome sequencing will thus illuminate on the complete genomic constellations of these strains and provide data on their evolutionary dynamics. The marked seasonal and longer-term changes in genotype distribution observed in this pre-vaccine surveillance should be considered when interpreting changes to genotype patterns that may follow the introduction of rotavirus vaccine in any setting.
This study had several limitations, e.g. firstly, the exclusive use of di-deoxy sequencing method to genotype phase II samples curtailed our ability to detect mixed infections. During phase I, we identified samples from patients who were infected by more than one RVA genotype since we had used the primerbased methods for genotyping. Di-deoxy sequencing identifies only the dominant genotype in mixed infections resulting to one genotype. The sequencing chromatograms of samples identified as mixed infections in phase I, appeared clean and monoinfected, with no background indicators of co-infections. Secondly, the classification of the strains into lineages and sublineages was limited due to the short consensus sequences, since only ~23% and ~67% of the VP4 and VP7 genes were sequenced, respectively. Thirdly, it was not possible to perform comparative analysis of the rare genotype G29 due to unavailability of cognate sequences in GenBank. Only a single reference sequence for genotype G29 had been deposited in GenBank by the time of this analysis.
In conclusion, this study shows that most of the pre-vaccine RVA infections and epidemics have been caused by a diverse range of RVA strains which fluctuated in prevalence from season to season, with some persistent in circulation for a long period. Additionally, new strains might have been introduced in this population and contributed significantly to the epidemics experienced in the pre-vaccine period. The recommendation by WHO for countries to vaccinate infants against rotavirus infection led to the inclusion of Rotarix TM vaccine in the childhood immunization programme in Kenya. In addition to reducing hospitalization caused by RVA diarrhoea, the vaccine has been reported to offer protection against both homotypic and heterotypic RVA strains 9 . With the increase in diversity of circulating strains and emergence of rare strains in Kilifi, continuous monitoring will help evaluate the performance of this vaccine against the circulating strains.

Data availability
The replication data and analysis data for this manuscript are available from the Harvard Dataverse: https://doi.org/10.7910/ DVN/LVGYYW 26 .
Owing to data personal protection concerns, these data are restricted, but will be made available to researchers who meet the criteria for access to confidential data. Details of the criteria for sharing data and the conditions under which data are made available can be found in the KEMRI-Wellcome data sharing guidelines. Users who wish to use the data should send a request to the KEMRI Wellcome Trust Research Programme data governance committee, which can be contacted by emailing: dgc@kemri-wellcome.org.

Nucleotide sequence accession numbers
Partial sequences for the VP7 and VP4 genes reported in this work were deposited in the GenBank database under the sequential accession numbers MH402005-MH402781 and MH402782-MH403560 for the VP7 and VP4 genes, respectively. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Click here to access the data Click here to access the data limited sequence data for tracking RVA transmission. We believe that this study provides an important baseline for future studies especially those attempting to interpret post-vaccine introduction strain patterns.

Supplementary
Reviewer comment; I recommend that authors take this one step further, by performing a next-generation sequencing on selected genotypes. This will give the public more information on the interaction between the vaccine strains and the wild type strains in Kenya. : Our response We thank the reviewer for this suggestion. We are developing an in-house NGS protocol but the results will form the basis of a separate publication. : We agree that our sequencing strategy could not support calling mixed infections. Our response This is a limitation we highlight in the discussion section of the manuscript. As explained in the manuscript, the study was divided into two phases (I & II). In phase I (2002-2014), genotyping was previously done using the PCR primer-based strategy . Phase II of the study used sequencing to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2923076/ genotype. We made this clear through the manuscript in this revised version.
The above data adds very little to the information we have on rotavirus genotypes circulating in Kenya.
: We appreciate this concern but would also like to point out that RVA genotypic and Our response sequence data from Kenya in the period leading to the vaccine introduction is sparse. We examine and present a reasonably large dataset to explore patterns in one part of the country in a period spanning a decade to show natural fluctuations and the extent of genetic diversity in circulating strains in the absence of a vaccination programme. We further examined the global context of the local strains to understand the nature of their source year-in year-out. We believe that this study provides an important baseline for future studies especially those attempting to interpret local strains to understand the nature of their source year-in year-out. We believe that this study provides an important baseline for future studies especially those attempting to interpret post-vaccine introduction strain patterns.
I recommend that authors take this one step further, by performing a next-generation sequencing on selected genotypes. This will give the public more information on the interaction between the vaccine strains and the wild type strains in Kenya.
: We thank the reviewer for this suggestion and we totally agree with it. We actually Our response have already started generating some sequence data using the NGS approach but are convinced that this will be best presented as a separate report in which the focus will not be genotype prevalence and long-term patterns.
No competing interests were disclosed. Owor conducted a longitudinal epidemiological study of rotavirus genotype distribution patterns in et al Coastal Kenya before the nationwide introduction of rotavirus vaccine. The authors employed methods such as enzyme immunoassay (EIA) for the detection of group A rotavirus (RVA) and partial sequencing of RVA positive samples in VP4 and VP7 segments for G and P genotyping. Data analysis reveals remarkable genetic diversity of RVA strains circulating in this area, characterized by substantial frequencies of unusual, mixed and emerging genotypes. Temporal fluctuation in RVA genotypes was observed, with major shifts in G-P predominance involving G1P[8] and G8P[4].
The study was well conducted and the manuscript well written. The findings of this study are timely in light of the recent introduction of rotavirus vaccine in Kenya and provide the baseline data necessary for the assessment of vaccine effectiveness. This baseline data will also allow monitoring of RVA G and P genotype changes that may alter vaccine effectiveness or that may be a result of vaccination, such as possible breakthrough events under vaccine immune selective pressure.
Of noteworthy, a rare G8P[14] strain was detected in this study and the partial sequencing of this strain indicated that its VP7 segment is closely related to humans and animals while its VP4 segment clustered closely to that of human and bovine origin. Due to the unconventional nature of this and many other uncommon strains detected in this study, it will be useful to sequence and characterize the full genomes of the representative strains in order to provide important insights into their evolutionary dynamics.
Furthermore, since the uncommon strains, such as the ones detected in this study are either partially or fully heterotypic to the currently licensed RVA vaccines (RV1 and RV5), vaccine effectiveness against