Human metapneumovirus prevalence and patterns of genotype persistence identified through surveillance of pediatric pneumonia hospital admissions in coastal Kenya, 2007-2016

Human metapneumovirus (HMPV) is an important respiratory pathogen that causes seasonal epidemics of acute respiratory illness and contributes significantly to childhood pneumonia. Current knowledge and understanding on its patterns of spread, prevalence and persistence in communities is limited. We present findings of a molecular-epidemiological analysis of nasal samples from children < 5 years of age admitted with syndromic pneumonia between 2007 and 2016 to Kilifi County Hospital, coastal Kenya. HMPV infection was detected using real-time RT-PCR and positives sequenced in the fusion (F) and attachment (G) genes followed by phylogenetic analysis. The association between disease severity and HMPV genotype was assessed using Chi-square test for independence. the epidemiology and gain insights into the spread and the evolutionary dynamics of HMPV. The two genes (F and G) code for immunogenic surface proteins that are the targets for vaccine development. G gene is highly variable and is used to assess the diversity of HMPV. Similarly, F gene which is quite conserved, has been used to characterise HMPV and is the gene of target in many molecular diagnostic assays. This report extends our previously published work from the same location for the period 2007-11 (41) and the analysis includes sequences from this previous work.

factors underlying the declining prevalence of HMPV in this population should be investigated.
HMPV clinical isolates are classified into two major subgroups, subtype A and B, based on nucleotide differences in the fusion (F) and attachment (G) glycoprotein encoding genes (1,(24)(25)(26). The two groups can be distinguished serologically using group-specific antibodies (24,27,28). Phylogenetic analyses of F and G sequences have divided the two subtypes into genotypes i.e. A1 and A2 (HMPV A), and B1 and B2 (HMPV B) genotypes. A2 is further classified into A2a and A2b sub-lineages (26,29,30). A few studies have revealed increased heterogeneity of A2 due to the identification of a unique sub-lineage, provisionally assigned A2c, first described in 2007 (31). Geographically, HMPV genotypes have been reported to circulate widely and cluster temporally (15), with the exception of genotype A1 that has been identified in few countries (15,31,32). There is frequent co-circulation of genotypes with replacement of the predominant genotype after a period of one or two seasons, although the drivers of this phenomena are unclear (33)(34)(35). Virus prevalence also varies from year-to-year within the same location (16,(36)(37)(38).
Few studies have reported long-term genotype circulation patterns of HMPV, necessary for improved epidemiological understanding and, in due course, of potential value to the design and implementation of control measures. In addition, there are very few studies on HMPV in Africa, which bears a high burden of pneumonia morbidity and mortality (39,40). In this study, we analysed the surface F and G gene nucleotide sequences collected through paediatric pneumonia surveillance at the Kilifi County Hospital in rural coastal Kenya, from 2007 to 2016, to describe the molecular epidemiology and gain insights into the spread and the evolutionary dynamics of HMPV. The two genes (F and G) code for immunogenic surface proteins that are the targets for vaccine development.
G gene is highly variable and is used to assess the diversity of HMPV. Similarly, F gene which is quite conserved, has been used to characterise HMPV and is the gene of target in many molecular diagnostic assays. This report extends our previously published work from the same location for the period 2007-11 (41) and the analysis includes sequences from this previous work.

Methods
Study population. The study was conducted at the Kilifi County Hospital (KCH) as part of long-term surveillance aimed at understanding the epidemiology and disease burden of RSV-associated pneumonia cases (42). KCH, located in coastal Kenya, is a referral hospital serving the population of around 260,000 (circa 2012) within the Kilifi Health and Demographic Surveillance System (43), and beyond in the wider Kilifi County. The population is mainly rural-agrarian. Children (59 months of age) admitted to the paediatric ward between January 1st 2007 and December 31st 2016 were recruited for this study and participants were eligible if they presented with modified WHO defined syndromic severe or very severe pneumonia as previously described (42). During the period between 2007 and 2009, only admissions arising from residents of Kilifi Health and Demographic Surveillance System (KHDSS) were eligible, whereas in later years, non-KHDSS residents were included. A nasal wash, nasopharyngeal (NP) swab or nasopharyngeal/oropharyngeal (NP/OP) swab was obtained for those who provided a written informed consent for study participation and transferred into viral transport medium for laboratory screening. Ethical approval for the study was obtained from the Kenya Medical Research Institute Scientific and Ethics Review Unit.
RNA extraction and rRT-PCR. For samples collected from 2007 to 2011 the methods for extraction and detection were as previously described (44,45). In brief, for samples collected in 2007 RNA was extracted using Magnapure LC32 total nucleic acid extractor (Roche, Manheim, Germany) and virus screening conducted using the LightCycler Fast Start DNA MasterPLUS Hyb-Probe kit (Roche, Mannheim, Germany) following manufacturer's instructions. From 2008 to 2011, sample extraction was by MagNaPure LC32 Kit (Roche, Manheim, Germany) or QIAmp Viral RNA Minikit (Qiagen, Germany) and virus detection carried out using a one-step in-house multiplex real time reverse transcription-PCR (rRT-PCR) using a TaqMan probe-based system targeting the fusion (F) gene (41,45). Samples with a rRT-PCR cycle threshold (Ct) value <35 were considered HMPV positive. From 2012-2016 total RNA extraction was performed using Qiacube HT automated total nucleic acid extractor (Qiagen, Germany) and virus detection carried out using the same one-step in-house multiplex real time reverse transcription-PCR (rRT-PCR) used for samples from 2008-11 (41). include additional 31 sequences that were detected by different rRT-PCR assay and therefore they were excluded in this analysis. The accession numbers for the excluded sequences (15 for F gene and 16 for G gene) are given in additional figure S1. For subsequent analyses, previously sequenced data for HMPV positives identified between 2007 and 2011 i.e. F (n=123) and G (n=56) genes (41) (accession numbers KT191299 to KT191484) was combined with the newly sequenced data, making a total of 210 sequences for F gene and 118 sequences for G gene used in subsequent analyses.
Sequences were aligned with MAFFT v7.220 (47). Nucleotide and amino acid mean genetic distances within and between genotypes were determined using Kimura-2 parameter model in MEGA v7. 0.2.6 (48). Further, to assess sequence diversity within epidemics, collated G gene sequences were aligned by epidemic sampling to show changes in the G gene within epidemics.
Phylogenetic analysis. Best fitting nucleotide substitution and site heterogeneity models were estimated in MEGA v7. 0.2.6 and phylogenetic trees inferred using Maximum Likelihood (ML) with 1000 bootstrap replicates. Genotypes were assigned based on F and G gene ML trees using previously described F and G gene references from GenBank (FJ168778, AY525843, AF371337, AB503857, AY530095 and GQ153651) (49). Genotypes were confirmed if sequences clustered with the reference sequences within a major branch with >70 % bootstrap support on the ML tree. Substitution rates and time to the most recent common ancestor (tMRCA) were determined under uncorrelated lognormal relaxed molecular clock using Bayesian Markov Chain Monte Carlo-based approach implemented in BEAST v1.8.4. Demographic histories of HMPV genotypes were inferred using GMRF Bayesian Skyride model. BEAST analysis was set at 50 million states with sampling every 2500 steps.
To assess the phylogenetic relatedness of the Kilifi strains with globally circulating strains, contemporaneous sequences (2007 to 2016) were obtained from GenBank for both F and G genes.
Maximum Likelihood trees were prepared for the combined Kilifi and global data and branch reliability analysed by bootstrap support for 1000 replicates. Mean nucleotide genetic distances were also determined to assess sequence similarity between Kilifi and the global data set.
Selection pressure analysis. Selection pressure within F and G genes was analysed by estimating the ratios of non-synonymous to synonymous substitutions (dN/dS = ω), using the codon-based phylogenetic method (CODEML) in PAML v 4.8a. Different site models were employed i.e M0 (one ratio), M1a (Nearly Neutral), M2a (Positive Selection) and M3 (discrete) model. M0 model calculates a rough average value (ω) distributed across all sites (homogeneous ratios). M1a assumes two categories of sites: conserved sites with ω = 0 and neutral sites with ω = 1. M2a allows for three classes i.e. neutral ω = 1, purifying (0 , ω 1), and positive selection ω > 1. M3 estimates ω ratios as M2a but using an unconstrained discrete distribution (50,51). The models M0 and M1a were used to test for selection over the entire sequenced regions while M2a and M3 tested for positively selected sites. Only sites with posterior probabilities >95% were considered as positively selected based on Bayes Empirical Bayes method.
Genotype changes and Disease severity. We did not observe a statistically significant association between HMPV genotype and pneumonia severity (p-value=0.264) ( Table 3).

Discussion
There remains limited information on long-term circulation patterns and prevalence of HMPV in Africa.
This study reports HMPV subtype incidence and molecular evolution in coastal Kenya over a 10-year period of surveillance (2007 to 2016). HMPV incidence in children under 5 years of age varied annually ranging from 1.2% to 8.7% and 71.5% of HMPV infections were in children aged <2 years.
Wide variation in annual prevalence has been reported elsewhere from long-term studies of acute respiratory infection (ARI) aetiology in children: in Germany, HMPV prevalence over 10 years of surveillance varied between 1.4% and 32.8% (15), and in Italy, the prevalence fluctuated from 37% to 7% and then to 43% over a 3 year period (36). To date the cause of the varied prevalence and whether there is difference in disease severity among HMPV genotype remains unclear. Different studies report contrasting results on the association of disease severity among the HMPV genotypes (21,53). In this study, no significant association was found between genotype and disease severity or clinical presentation. Further studies are necessary to evaluate whether changes in genotypes are associated with changes in disease severity and prevalence. Climatic factors such as temperature, rainfall and relative humidity have also been shown to influence HMPV activity in the tropics (31). In this study, seasonal increase in cases tended to coincide with low rainfall, lower relative humidity and higher temperatures. Further investigation is required to determine whether changes in such environmental factors influence HMPV activity.
HMPV A2b genotype dominated between 2007 and 2011, while B1 dominated between 2012 and recent years, as similarly reported in Bangladesh and Croatia (54,55), suggestive of global spread of HMPV variants. In contrast to continued circulation of sub-lineage A2b globally (56), in Kilifi, A2b circulated from 2007 to February 2012 and no longer. The replacement patterns of HMPV genotypes could be attributed to genotype-specific herd immunity offering protection against circulating homologous strains but not against heterologous strains. This observation is not unique to Kilifi, as similar patterns were reported from long-term surveillance studies in Germany and France (15,37).
Consistent with earlier studies, G gene exhibited higher rates of evolution compared to F gene (25). F gene is more conserved and our findings concur with previous studies on F protein diversity (11). The rates of G gene evolution were in the range of 3.42× 10-3 to 9.57× 10-3 , similar to findings from previous studies (57,58). Among the genotypes, the rates of evolution for A2b and B1 were at least 2 fold higher than other genotypes, this concurs with the findings from a similar study (57). Compared to other Pneumoviruses, such as human respiratory syncytial virus (HRSV), HMPV G gene has been shown to have higher rates of evolution (58). Our estimates show comparable results, the rates for HMPV A2b and B1 were 3 to 4 times higher than those reported for HRSV (3.58× 10-3 , 95% HPD = 3.04× 10-3 to 4.16× 10-3 ) (59).
Temporal analysis revealed circulation of multiple genotypes within a single epidemic. Further analysis of the G glycoprotein encoding gene revealed circulation of multiple variants for a single genotype within an epidemic. Multiple variants observed might be attributed to high rate of G gene evolution (substitutions/site/year) resulting in diversification in situ during a season. However, the level of variation observed within a genotype within epidemics was greater that would be expected from in situ diversification, implying multiple variant introductions of a genotype during single epidemic years.
As is the norm for HMPV detection in respiratory samples, we used molecular PCR-based diagnostics (16). Recent studies have shown that mutations at primer and probe binding sites can lead to false negative diagnostic results and hence underestimation of disease burden (60,61). Evaluation of the in-house diagnostic rRT-PCR assay would be important to determine whether there are any missed variants/genotypes and whether this can be associated with the apparent gradual decline in HMPV prevalence observed in the current dataset. There is some evidence that reduction in bacterial pneumonia (eg S. pneumoniae), as has been seen in this coastal location over this study period (62), results in a reduction in viral pneumonia (63,64). Future investigations will be necessary to give more insight.
The F and G protein encoding genes were generally determined to be under purifying selection pressure, which drives RNA virus evolution by purifying the deleterious mutations due to RNA replication errors (65). However, for the B2 genotype, a higher dN/dS ratio was observed in G gene sequences suggestive of diversifying selection within B2 viruses. The distinct diversifying selection and persistence of the B2 viruses observed requires further investigation. Overall, our F gene sequence analysis was based on 345 bp, and therefore, although our analysis was based on 205 sequences, the genetic distance estimates, evolutionary and selection pressure analysis were limited to the partial length which may reduce the potential to discriminate genetic clusters. In addition, the newly designed G gene subtype specific primers allowed sequencing of all HMPV subtypes for the newly generated data and significantly improved G PCR recovery by two-fold compared to previously reported assay. This improved the study power to characterise the different circulating HMPV variants.
Our analyses on HMPV epidemiology in Kilifi were limited to two viral genes only. Whole genome sequencing might give more insights into transmission, HMPV genotype characterisation and molecular evolution. Our estimation and inference on the association between HMPV genotype and disease severity was biased to in-patient surveillance data, and therefore future studies should include outpatient surveillance data. In addition, inpatient surveillance and sampling of HMPV infections might not be representative of the full variant population circulating in this community and not correctly reflect incidence and prevalence, as HMPV infections have also been reported in outpatient settings (66). Future studies across different locations in Kenya and in Africa will be important for tracing the introduction and transmission patterns of the virus.

Conclusions
In conclusion, this report shows HMPV activity characterised by high annual variation in prevalence, genotype prevalence patterns reflecting those in other locations globally, gradual replacement of genotypes over time, and multiple circulating variants each epidemic varying year to year. This suggests, widescale and rapid circulation of HMPV genotypes, seasonal epidemics resulting from multiple introductions rather than single source, and evidence of herd immunity replacement at the genotype and within-genotype cluster level.   Additional table S1: Newly designed F and G gene PCR and sequencing primers.