Molecular characterization of respiratory syncytial viruses circulating in a paediatric cohort in Amman, Jordan

Respiratory syncytial viruses (RSVs) are an important cause of mortality worldwide and a major cause of respiratory tract infections in children, driving development of vaccine candidates. However, there are large gaps in our knowledge of the local evolutionary and transmission dynamics of RSVs, particularly in understudied regions such as the Middle East. To address this gap, we sequenced the complete genomes of 58 RSVA and 27 RSVB samples collected in a paediatric cohort in Amman, Jordan, between 2010 and 2013. RSVA and RSVB co-circulated during each winter epidemic of RSV in Amman, and each epidemic comprised multiple independent viral introductions of RSVA and RSVB. However, RSVA and RSVB alternated in dominance across years, potential evidence of immunological interactions. Children infected with RSVA tended to be older than RSVB-infected children [30 months versus 22.4 months, respectively (P value = 0.02)], and tended to developed bronchopneumonia less frequently than those with RSVB, although the difference was not statistically significant (P value = 0.06). Differences in spatial patterns were investigated, and RSVA lineages were often identified in multiple regions in Amman, whereas RSVB introductions did not spread beyond a single region of the city, although these findings were based on small sample sizes. Multiple RSVA genotypes were identified in Amman, including GA2 viruses as well as three viruses from the ON1 sub-genotype that emerged in 2009 and are now the dominant genotype circulating worldwide. As vaccine development advances, further sequencing of RSV is needed to understand viral ecology and transmission, particularly in under-studied locations.

ssRNA (~15 kb) strand that encodes 11 proteins. Two of the proteins are glycosylated, enabling fusion and attachment (F and G proteins, respectively) [7]. Two antigenic subgroups, RSVA and RSVB, are defined based on mAb reactivity to the glycoproteins [13]. Both antigenic subgroups co-circulate worldwide, causing seasonal epidemics, although RSVA is thought to be more prevalent and cause more severe illness than RSVB [11]. However, a higher prevalence of RSVB than of RSVA has been reported in several studies [14][15][16], and further understanding of the prevalence and immunological interactions between the antigenic subgroups is needed. Models of alternating epidemics have been proposed, driven by waning herd immunity [15,17].
The molecular epidemiology of the virus has been characterized in multiple regions [10,[18][19][20], but there is a paucity of genetic sequence data for RSV, at least relative to influenza, and it remains unknown where new viral lineages originate and what drives their emergence. New RSV strains periodically emerge and become globally dominant, and 10 genotypes for RSVA and 13 RSVB genotypes have been identified to date [21,22]. In the 1990s, a new RSVB genotype (BA) emerged with a duplication in the G gene and quickly disseminated globally [23,24]. During the 2010-2011 winter season, a duplication in the G gene was detected among RSVA GA2 viruses, giving rise to the ON1 sub-genotype that also spread worldwide and became dominant in many countries [24][25][26][27][28][29][30][31][32]. The clinical impact of different genotypes has not been fully elucidated, as some studies report differences in disease severity [33][34][35][36][37][38], while others have seen similar clinical illness [36][37][38].
Intensive surveillance of RSV in infants in a paediatric cohort in Amman, Jordan, has revealed a high burden of RSV in hospitalized children [39][40][41]. To further understand the molecular evolution and transmission of RSV in this population, we obtained whole-genome sequences from 85 RSV samples collected during 2010-2013. We found that RSV epidemics in Jordan comprised multiple independent viral introductions of RSVA and RSVB, although one subgroup strongly dominated each year.

Sample and clinical data collection, and RSV diagnosis
All procedures were carried out in accordance with the ethical standards of the Jordanian Ministry of Health, University of Jordan and Vanderbilt University institutional review boards. After consent, trained local research staff obtained nasal and throat swabs. Standardized questionnaires were used to record demographic, clinical and socio-economic data. Parents were queried in Arabic, and bilingual research staff transcribed the information into an English-language case report form at the time of the interview. Demographic data, including age, sex, race and ethnicity, were recorded at the time of enrolment. Viral surveillance with clinical and demographic data in children<2 years old admitted with respiratory symptoms and/or fever was conducted at the Al-Bashir Government Hospital, Amman, Jordan from March 16 2010 to March 31 2013 [40,41]. Nasal and throat swabs were collected and combined in transport medium (M4RT; Remel), aliquoted into MagMAX lysis/binding solution concentrate (Life Technologies), snap frozen, and stored at −80 °C. Original and lysis buffer aliquots were shipped on dry ice and were tested by reverse transcriptase (RT)-PCR for 11 respiratory viruses [RSV, human metapneumovirus (HMPV), human rhinovirus (HRV), influenza (flu) A, B and C, parainfluenza (PIV) virus 1, 2 and 3, adenovirus (ADV), and Middle East respiratory syndrome coronavirus (MERS-CoV)] [40].
From the 2010-2013 study period, a total of 3168 children's respiratory samples were processed for viral diagnostics, where>80 % tested positive for at least one virus, with RSV being the most common virus detected (44 %). Out of 1397 RSV-positive samples, 93 were randomly selected from all the 3 years with quantitative RT-PCR C t value <29 and were processed for whole-genome next-generation sequencing of RSV. Out of the 93 samples, 7 samples were taken from study subjects who had died due to RSV disease.

Viral RnA extraction, whole-genome sequencing, assembly and annotation
Extraction of the viral RNA was performed using 140 µl nasal wash sample in viral transport medium using the viral RNA mini kit (Qiagen). Four forward reverse transcription (RT) primers were designed and four sets of PCR primers were manually picked from primers designed across a consensus of complete RSV genome sequences as described before [11]. The four forward RT primers were diluted to 2 µM in water

Impact Statement
With multiple respiratory syncytial virus (RSV) vaccine candidates in development, it is crucial to understand the evolutionary dynamics of the virus, including in regions where there is little data available, such as the Middle East. By investigating RSV in a paediatric cohort in Amman, Jordan, we were able to study the population dynamics of the virus over time, including the alternating dominance of the RSVA and RSVB subgroups, the association between viral genetics and symptom severity, and how different strains are introduced or persist over time in this region. Fig. 1. Global RSVA phylogeny. MCC tree inferred for 669 viruses sampled globally, coloured by genotype. The yellow-filled circles indicate 58 samples collected from a paediatric cohort in Amman, Jordan, which cluster within the GA2 genotype and ON1 sub-genotype. A few tips from countries other than Jordan are collapsed for visual clarity. Green dotted lines indicate samples from patients with bronchopneumonia. Posterior probabilities are indicated as support for nodes defining a genotype. A more detailed tree of RSVA samples collected in Amman is provided in Fig. 4. and pooled in equal volumes. cDNA was generated from 4 µl undiluted RNA, using the pooled forward primers and SuperScript III reverse transcriptase (Thermo Fisher Scientific). Four independent PCR reactions were performed on 2 µl cDNA template using either AccuPrime Taq DNA polymerase (Thermo Fisher Scientific) or Phusion high fidelity DNA polymerase (New England Biolabs) to generate four overlapping ~4 kb amplicons across the genome. Amplicons were verified with 1 % agarose gels, and excess primers and dNTPs were removed by treatment with exonuclease I (New England Biolabs) and shrimp alkaline phosphatase (Affymetrix) for 37 °C for 60 min, followed by incubation at 72 °C for 15 min. Amplicons were quantitated using a SYBR Green dsDNA detection assay (SYBR Green I nucleic acid gel stain; Thermo Fisher Scientific), and all four amplicons per genome were pooled in equal concentrations.
Samples were sequenced using both MiSeq (Illumina) and Ion Torrent PGM (Thermo Fisher Scientific) to overcome platform-specific errors. For the Illumina sequencing, libraries were prepared using the Nextera DNA sample preparation kit (Illumina) with half reaction volumes for MiSeq. Briefly, 25 ng pooled DNA amplicons were tagmented at 55 °C for 5 min. Tagmented DNA was cleaned with the ZR-96 DNA clean and concentrator kit (Zymo Research) and eluted in 25 µl resuspension buffer. Illumina sequencing adapters and barcodes were added to tagmented DNA via PCR amplification, where 20 µl tagmented DNA was combined with 7.5 µl Nextera PCR master mix, 2.5 µl Nextera PCR primer cocktail and 2.5 µl each index primer (Integrated DNA Technologies) for a total volume of 35 µl per reaction. Thermocycling was performed with 5 cycles of PCR, as per the Nextera DNA sample preparation kit protocol (3 min at 72 °C, denaturation for 10 s at  . Root-to-tip divergence. A plot of sampling time versus genetic distance, inferred from trees of RSVA and RSVB inferred using maximum-likelihood methods. Both RSVA and RSVB datasets display a strong temporal signal for a molecular clock, as shown by the high R 2 .  Quantitative PCR was performed on the pooled, barcoded libraries to assess the quality of the pool and to determine the template dilution factor for emulsion PCR. The pool was diluted appropriately and amplified on Ion Sphere particles (ISPs) during emulsion PCR on an Ion One Touch 2 instrument (Thermo Fisher Scientific). The emulsion was broken, and the pool was cleaned and enriched for template-positive ISPs on an Ion One Touch ES instrument (Thermo Fisher Scientific). Sequencing was performed on the Ion Torrent PGM using 318v2 chips (Thermo Fisher Scientific).

RSV genome assembly and annotation
Sequence reads were sorted by barcode, trimmed and de novo assembled using CLC Bio's clc assembler program, formerly known as clc novo assembly (http:// resources. qiagenbioinformatics. com/ manuals/ clcgenomicsworkbench/ 852/ index. php? manual= De_ novo_ assembly. html), and the resulting contigs were searched against custom, full-length RSV nucleotide databases to find the closest reference sequence. All sequence reads were then mapped to the selected reference RSV sequence using CLC Bio's clc_mapper_legacy, formerly called as clc_ref_assemble_long program (http:// resources. qiagenbioinformatics. com/ manuals/ clcassemblycell/ current/ index. php? manual= Options_ clc_ mapper_ legacy. html). At loci where both Ion Torrent and Illumina sequence data agreed on a variation (compared with the reference sequence), the reference sequence was updated to reflect the difference. A final mapping of all nextgeneration sequences to the updated reference sequences was performed with CLC Bio's clc_mapper_legacy program. Curated assemblies were validated and annotated with the viral annotation software called Viral Genome ORF Reader, VIGOR 3.0 [42], before submission to GenBank. VIGOR was used to predict genes, perform alignments, ensure the fidelity of ORFs, correlate nucleotide polymorphisms with amino acid changes and detect any potential sequencing errors. The annotation was subjected to manual inspection and quality control before submission. All 85 sequences generated as part of this study were submitted to GenBank as part of the BioProject ID PRJNA262901 with accession number KX655615-KX655701 (KX655701, KX655688 and KX655625 are partial sequences).

Dataset assembly and phylogenetic analyses
In order to enrich the datasets of 58 RSVA and 27 RSVB Jordanian full-genome sequences, we retrieved all RSV whole-genome sequences from GenBank on July 21 2016, resulting in datasets of 771 and 331 sequences for RSVA and RSVB, respectively. The data were aligned using mafft v7.310 [43] and subsequently manually edited to accommodate the ORFs of all genes and the G gene duplication. To identify recombinant strains that would complicate the phylogenetic inferences, we analysed the datasets using the Recombination Detection Program Beta v4.94 [44]. This resulted in the exclusion of one recombinant RSVA sequence (JX015495) and four recombinant RSVB sequences (KJ939932, KJ672473, KJ627251, KJ939933) from locations other than Jordan. Maximum-likelihood trees were inferred using RAxML v7.2.6 [45] incorporating a general time reversible (GTR) model of nucleotide substitution with a gamma-distributed (Γ) rate variation among sites. To assess the robustness of each node, a 500 replicates bootstrap resampling process was performed.
We also investigated the temporal signal of the datasets using TempEst [46]. A few sequences showing incongruent temporal patterns were excluded. Phylogenetic relationships were inferred for each of the data sets separately (669 RSVA sequences and 327 RSVB sequences) with a Bayesian phylogenetic approach using Markov Chain Monte Carlo available via the beast v1.8.4 package [47] and the high-performance computational capabilities of the Biowulf Linux cluster at the NIH, Bethesda, MD, USA (https:// hpc. nih. gov/ systems/). We used an uncorrelated relaxed molecular clock with branch rates drawn form a lognormal distribution to account for evolutionary rate variation among lineages, with a Skygrid demographic prior [48], and a GTR model of nucleotide substitution with gamma-distributed rate variation among sites. For viruses with only the year of viral collection available, the lack of tip date precision was accommodated by sampling uniformly across a 1 year window from January 1st to December 31st. The Markov chain Monte Carlo (MCMC) chain was run separately at least three times for each of the data sets and for at least 200 million iterations with sub-sampling every 20 000 iterations, using the beagle [49] library to improve computational performance. All parameters reached convergence, as assessed visually using  regions within Amman (North, Central, North-East and West regions) for our spatial analysis. To study the proportion of RSVA and RSVB circulating in the Amman population and globally, we merged the datasets of both RSV antigenic subgroups and estimated the proportion of RSVA and RSVB viruses, and their respective genotypes, circulating in 15 countries over time, using the R program [50].

Clinical data
We collected information on the patient age and bronchopneumonia status for the Amman paediatric cohort during 2010-2013. As we had collected this information, we investigated whether infection with a specific RSV antigenic subgroup was significantly associated with patient age or disease progression to bronchopneumonia. For this purpose, we used two statistical tests, namely the t-test (patient age as a continuous variable) and the chi-square test (bronchopneumonia as a binary outcome). Additionally, we also investigated the extent of RSV phylogenetic structure for bronchopneumonia (discrete trait) using the association index (AI). This metric quantifies the degree to which samples of patients with bronchopneumonia tend to cluster together relative to the expectation for randomized trait assignments. AI values close to 0 reflect strong phylogeny-trait correlation, whereas AI values close to 1 reflect the absence of phylogenetic structure for the trait [51]. For this purpose, we reconstructed the bronchopneumonia status at the ancestral nodes using the discrete diffusion models implemented in beast [52].  (Fig. 1). All RSVB samples were sub-genotyped as GB1 and positioned within the BA-like clade, which is defined by a duplication of 60 nucleotides in the G gene (Fig. 2) [11,19]. However, the genetic diversity of RSVA traces back to a common ancestor that existed approximately 27 years earlier than RSVB (Table 1).

multiple introductions of RSV into the Amman cohort during each epidemic
Our study period included four winter epidemic seasons, defined as September to June: 2009-2010, 2010-2011, 2011-2012 and 2012-2013. We found that each annual RSV epidemic in Amman was not a point-source outbreak, but instead seeded by multiple independent viral introductions, including both RSVA and RSVB lineages, into the Amman cohort population. The low availability of RSV sequence data from many countries makes it difficult to infer the precise number of viral introduction, but we estimated, conservatively, the number of introductions in each year (Fig. 4) (Tables 2 and 3). However, the low availability of background RSV sequences from other locations means that we cannot exclude the possibility that genetically close viruses were re-imported each year from an unsampled location, rather than persisting locally.

Spatial patterns of RSVA and RSVB in Amman
To study the spatial dissemination of RSV within the Amman cohort population, four regions were defined among the city's neighbourhoods from which viruses were successfully sequenced (Fig. 4). Four of the fourteen RSVA introductions were identified in multiple regions, consistent with spatial spread within the city. One RSVA introduction was identified in three of the four spatial regions defined within Amman. In contrast, although a large number of introductions of RSVB into Amman were identified (n=14), each introduction was confined to a single city region, and there was no evidence of dissemination of RSVB between regions (Tables 2 and 3).

Proportion of circulating RSV antigenic subgroups A and B
We  (Fig. 5a, b). Alternating dominance between RSVA and RSVB was also observed in the USA, the only country with a sufficiently long time series of genetic data to examine genotype interactions (Fig. 5c, d). The low availability of ON1 sequences in Jordan (n=3) prevented a similar examination of interactions between the RSVA ON1 and GA2 genotypes within Jordan. However, Fig. 5(e) shows that the cluster of ON1 viruses in Jordan during the 2012/2013 winter epidemic coincided with the emergence of ON1 in the USA.

Association of viral genetics and clinical outcomes
RSVA-infected children tended to be older than RSVBinfected children (

DISCuSSIon
RSV imposes a high burden of respiratory disease on infants and children in Amman, Jordan [40]. By sequencing the complete genomes of 85 RSVA and RSVB samples from children in Amman, the largest whole-genome collection of RSVs from any country in the Middle East, this study provides insights into the genetic diversity of RSVA and RSVB in an under-studied region. We found that RSV epidemics in Our study benefitted from using whole-genome sequences, as the partial G gene may produce misleading results [53] and transmission patterns may be difficult to infer using shorter coding regions [19]. However, at this time there is far more background RSV sequence data available for the G gene for other countries, compared to whole-genome data, which limited our ability to differentiate independent viral introductions into Amman during an epidemic and infer their spatial origins. Compared to influenza A viruses, for which the global spatial ecology has been extensively characterized [55,56], the paucity of RSV sequence data from many regions undermines efforts to understand how the virus moves spatially around the world between local epidemics. It is clear that RSV is continually being re-introduced into Amman, similar to the viral dynamics observed in other longitudinal studies of RSV, such as in Kenya [57]. However, it remains unclear whether global RSV evolution follows a metapopulation model or a source-sink model, and the extent to which viruses persist in a single location. Additionally, the limited number of RSV samples from Jordan prevented more sophisticated phylogeographical modelling that could unveil the dynamics within the city. In the future, it would be particularly interesting to compare the spatial dynamics of RSVA and RSVB. Our analysis revealed more instances of RSVA transmission across multiple regions of the city, but these inferences were based on very low sample sizes, particularly for RSVB, and require further study with larger data sets.
The ongoing development of candidate RSV vaccines underscores the need for further understanding of human immune responses, and the extent of cross-immunity between RSV strains of varying genetic distances. The potential interactions between RSVA and RSVB observed in Jordan and the USA, with sequential alternation in the transmission of antigenic subgroups and genotypes (Fig. S5), may result in subgroup-or genotype-specific immunity. These dynamics demonstrate that future vaccine compositions could benefit by including representative strains for both RSVA and RSVB, or even, representative strains for the different genotypes. Additionally, there is a need for further sequencing of RSV over sufficiently long time-series within discrete locations to characterize population dynamics and estimate competitive immunological interactions. The recent emergence of ON1 variants also provides an opportunity to examine competitive interactions between ON1 and previously dominant GA2 strains across multiple geographical locations, to better understand the strength and breadth of immune responses to different RSV strains. Going forward, long-term studies of RSV dynamics within defined populations will be essential to inform the design and potential impact of future RSV vaccines.

Conflicts of interest
The authors declare that there are no conflicts of interest.

Ethical statement
All procedures were carried out in accordance with the ethical standards of the Jordanian Ministry of Health, University of Jordan and Vanderbilt University institutional review boards. Nasal and throat swabs were obtained by trained local research staff with consent. Standardized questionnaires were used to record demographic, clinical and socio-economic data. Parents were queried in Arabic, and bilingual research staff transcribed the information into an English-language case report form at the time of the interview.