Genome sequencing reveals coinfection by multiple chikungunya virus genotypes in a recent outbreak in Brazil

Chikungunya virus (CHIKV) is an RNA virus from the Togaviridae family transmitted by mosquitoes in both sylvatic and urban cycles. In humans, CHIKV infection leads to a febrile illness, denominated Chikungunya fever (CHIKF), commonly associated with more intense and debilitating outcomes. CHIKV arrived in Brazil in 2014 through two independent introductions: the Asian/Caribbean genotype entered through the North region and the African ECSA genotype was imported through the Northeast region. Following their initial introduction, both genotypes established their urban cycle among large naive human populations causing several outbreaks in the Americas. Here, we sequenced CHIKV genomes from a recent outbreak in the Northeast region of Brazil, employing an in-house developed Next-Generation Sequencing (NGS) protocol capable of directly detecting multiple known CHIKV genotypes from clinical positive samples. Our results demonstrate that both Asian/Caribbean and ECSA genotypes expanded their ranges, reaching cocirculation in the Northeast region of Brazil. In addition, our NGS data supports the findings of simultaneous infection by these two genotypes, suggesting that coinfection might be more common than previously thought in highly endemic areas. Future efforts to understand CHIKV epidemiology should thus take into consideration the possibility of coinfection by different genotypes in the human population.


Introduction
Chikungunya virus (CHIKV) is a medically important arthropod-borne virus (arbovirus) from the Alphavirus genus and Togaviridae family, which is transmitted to humans through Aedes aegypti and Ae. albopictus bites [1,2]. This virus has a single stranded positive RNA genome with approximately 11.8-kb in length encoding two polyproteins responsible for replication and propagation [3]. CHIKV infection most commonly reported symptoms include fever and joint pain, which can last for a few months to several years, and a number of other manifestations such as bullous dermatitis and myocarditis. In addition, CHIKV infection can also cause neurological manifestations such as meningitis, myelitis, encephalitis and Guillain-Barré syndrome [4][5][6][7][8], although such outcomes are rare and mostly observed in patients showing pre-existing underlying diseases [7,9,10].
CHIKV was first isolated from an outbreak in Tanzania in 1952 [11] and phylogenomic analyses, with past and current lineages, revealed three distinct genotypes: West African (WA), East/Central/South African (ECSA) and Asian/Caribbean [12]. Both WA and ECSA genotypes were mainly maintained in enzootic cycles in the African continent, being transmitted among non-human primates, with sporadic spillover to humans, although some ECSA lineages expanded their range and are currently causing large outbreaks in several human populations living in tropical regions. The Asian/Caribbean genotype is mainly endemic/epidemic among human populations. Several CHIKV outbreaks have occurred across the globe due to the arrival of once geographically restricted genotypes into new areas with large naive human populations or in areas with previous circulation of other genotypes [13]. Three major outbreaks are well documented: I-An ECSA outbreak in Kenya in 2004, and further spreading to the Indian Ocean Islands, giving rise to the Indian Ocean Lineage (IOL); II-Several outbreaks of the Asian genotype in Pacific Islands followed by spreading to the Caribbean islands and mainland North and South America [14]; III-A recent emergence of ECSA genotype in the Northeast of Brazil through the arrival of a traveler from Angola in Feira de Santana city-Bahia state [15,16] and further spreading south and northwards [17][18][19][20]. Interestingly, the ECSA genotype lineage that entered Brazil was recently detected into both Culex quinquefasciatus and Ae. albopictus pools in the Caribbean island of Haiti [21] which was first hit by the Asian/Caribbean genotype.
Based on vector species habitat suitability, it was suggested that both Asian/Caribbean and ECSA genotypes would spread and co-circulate in countrywide Brazil [15]. So far, a single patient infected with the Asian/Caribbean genotype was detected in 2014 at the Pernambuco state, Northeast Brazil [15], however no other studies further confirmed the circulation of this genotype in this region. ECSA genotype was found in this same geographical location (personal communication-Gabriel Luz Wallau and GenBank KX228391 accession number) as well as in other states from the Northeast region, suggesting that both genotypes are simultaneously circulating in this area [17][18][19]. Albeit two hundred thousand CHIKV infection cases were reported in Brazil in 2016 the proportion and distribution of infected patients by each genotype as well as the possibility of coinfection is completely unknown. A recent study identified 132 CHIKV cases out of 263 samples initially screened for dengue (DENV) and Zika (ZIKV) viruses, showing that CHIKV was able to displace a previous ZIKV outbreak in Northeast Brazil between May 2015 to May 2016, but distinction of CHIKV genotypes were not determined [19].
In the present study, we sequenced the CHIKV genome directly from clinical samples and showed for the first time the presence of two CHIKV genotypes simultaneously circulating in the same region. Moreover, our results show evidence of coinfection of ECSA and Asian/ Caribbean CHIKV genotypes in naturally infected human individuals. Lastly, molecular clock analysis estimated that the ECSA CHIKV genotypes spread Northwards around mid 2015, concomitantly to its initial detection in the Northeast region of Brazil.

Study participants and sampling procedures
Patients were recruited through a previous established study protocol seeking neurological disorders with history of arbovirus infection (personal communication-Rafael Freitas Oliveira Franca). Briefly, patients were recruited from a public hospital, which is in charge of 70% of neurological cases in the state of Pernambuco, Northeast Brazil. As inclusion criteria, patients presenting any neurological disease who reported a previous episode of rash had a blood sample collected. Patients with classical CHIKV symptoms, without neurological manifestation, presenting with fever, myalgia and joint pain were also included. Samples were further processed for laboratory diagnosis of recent arboviral infection by rRT-PCR (Real Time RT-PCR), employing established protocols for detection of DENV, ZIKV and CHIKV viruses. A detailed history and clinical examination, including a full neurological examination were conducted in these patients and complementary clinical exams were performed when necessary (brain imaging, lumbar puncture, nerve conduction studies and electromyography). Based on neurological evaluation, patients were grouped on nervous system disease manifestation as follows: Acute Transverse Myelitis-acute signs of neurologic dysfunction in motor, sensory, and autonomic nerves and nerve tracts of the spinal cord, Optic Neuritis-acute severe visual disturbance without any clear diagnostic findings on ocular examination and Polymyositis-severe muscular weakness. Written informed consent was provided by all patients enrolled in this study. All human subjects were adults. The study protocol was reviewed and approved by the Oswaldo Cruz Foundation-FIOCRUZ, Aggeu Magalhães Institute Ethics Committee (CAAE #511.06115.8 000 5190).

Virus detection-Real Time RT-PCR
Viral RNA was extracted from serum samples by the use of QIAamp Viral RNA kit (Qiagen, Hilden-Germany) following manufacturer's instructions. Eluted RNAs were stored at -80˚C until use. Real Time RT-PCR (rRT-PCR) reactions were performed from purified RNA serum samples accordingly to Lanciotti et al., 2006 and2008 [22-24].

Viral genome sequencing
Total RNA was used for single strand cDNA generation employing Random Hexamers (Invitrogen) and ProtoScript II Reverse Transcriptase (New England Biolabs), following manufacturer instructions. cDNA were submitted to multiplex PCR with primers designed using Primal Scheme from the ZiBRA project (http://primal.zibraproject.org/) based on a multiple genome alignment of CHIKV sequences both from Asian/Caribbean and ECSA CHIKV strains detected in South America (GenBank accession numbers KP164576, KP164571, KP164572, KP164568, KP164570 and KP164569) available at the time. Primer sequences can be found in S1 Table. PCR were performed independently for 4 pools with Q5 Hot Start High-Fidelity DNA Polymerase (New England BioLabs). Cycle conditions were as follows: 98˚C at 30 seconds, 98˚C at 15 seconds, 61˚C at 5 minutes and 65˚C at 5 minutes for 45 cycles. PCR products were quantified using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific Inc.). Sequencing libraries were prepared with Nextera XT Library Prep Kit (Illumina, San Diego, CA, USA) using 2 ng of cDNA following manufacturer 0 s instructions. MiSeq Reagent Kit V3 of 150 cycles was used employing a paired-end strategy. Sequencing was performed in the MiSeq (Illumina) machine. Raw reads are available at the European Nucleotide Archive (ENA) under the project number PRJEB23381.

Reads trimming, genome mapping and variant call
The raw "fastq" files were trimmed using Trimmomatic 0.36 [25] with the parameter SLI-DINGWINDOW:15 and the Illumina MiSeq adapter and custom primers sequences used for fragments amplification were provided to remove those sequences from the reads. Due to the high variability found in our samples we were not able to use softwares for recombination detection such as RPD [26] and Simplot [27] since they are based on the consensus sequences of viral genomes. Such consensus sequences in our case would be very disturbed due to the variability found. Moreover, most methods developed to characterize coinfection/mutation cloud do not output a graphical view to interpret the variability position along the genome. Therefore, based on the reasoning of Simplot [27] and Deep Simplot [28], we developed an in house shell and R script (available in S1 Data) to extract the "NM" column from alignment ".bam" files generated by Bowtie2-default parameters [29], which reports the number of SNPs of each read mapped along with the position of the read in the reference genome. Both KP164568 from ECSA genotype and KP164571.1 from Asian/Caribbean genotype genomes were used to map the reads of every sample using Bowtie2. It is important to highlight that these ECSA and Asian/Caribbean CHIKV genomes detected circulating in South America are 90% identical at the nucleotide level but several genotype specific SNPs also could be detected between them which allow the genotyping of the reads mapped at the variable regions (S1 Fig shows the conserved and variable regions among the 3 ECSA and 3 Asian Caribbean genomes already detected in South America). Finally, ggplot2 [30] was used to plot the average distance of the reads to both genomes using a 100 bp sliding window which then allows the visualization of recombination and coinfection/quasispecies signals in the same graph. Finally, SNP calls were performed with samtools mpileup and vcf-annotate tools [31] with the following parameters (vcf-annotate-filter Qual = 20/MinDP = 200/SnpGap = 20) and snpEff software was used for SNP annotation. [32]

Reads simulation
Read simulation was performed with ART [33] with the following parameters (art_illumina -ss MSv1 -p -l 75 -f 500 -m 200 -s 10 -qL 27), that is, generating 75bp paired-end reads, 500x of coverage depth, 200bp of median size DNA fragments for paired-end simulations, standard deviation of DNA fragment size of 10 and the minimum base quality score of 27. The reference genomes used for read simulations were the same ECSA (KP164568) and Asian/Caribbean (KP164571.1) genomes. In addition, we simulated two recombination events between these genomes: I-an ECSA 5' region (7555bp) /Asian/Caribbean 3' region (4646bp) genome; IIan Asian/Caribbean 5' region (7555bp) /ECSA 3' region (4257bp) genome; and a co-infection/ quasispecies pattern all along the genome mixing equal proportion of reads generated from ECSA and Asian/Caribbean reference genomes reported above.

Phylogenetic analysis
Phylogenetic tree were reconstructed using a bayesian inference approach with BEAST 1.8.0 [34] employing the GTR+I+G model indicated as the most likely model to represent the data by jModelTest 2.1.10 [35] with a relaxed lognormal molecular clock model and the Skyline tree prior. Consensus genomes from the samples with coinfection were generated with Integrated Genome Viewer (IGV) and all Asian/Caribbean SNPs were removed since the ECSA genotype genomes presented a larger genome coverage breadth. Six independent runs were performed with 50 million generations including the three genotypes (WAfr, Asian/Caribbean and ECSA) and 30 millions generations for ECSA focusing only on samples from South and Central America. Tree sampling was performed at every 1000 generations. Parameters convergence was evaluated by the effective sample size (ESS>200) using TRACER version 1.

Draft genomes
We obtained CHIKV draft genomes from all eight samples screened presenting coverage breadth varying from 79.46% to 98.64% and an average depth from 2201 to 10538x (Table 1). Due to the fact that amplicon sequencing is prone to cross contamination we also sequenced two negative controls following the approach used by Metsky et al 2017 for ZIKV genome sequencing [36]. We detected some degree of contamination in both negative control samples varying from 1033x in Serum negative control sample to 1641x in H20 negative control of coverage depth (Table 1). These low levels of contamination are in the range detected by Metsky et al (2017) using this same amplicon-based protocol for Zika virus [36]. Negative controls showed a much lower genome coverage breadth than all positive CHIKV samples analyzed, in agreement with previous studies. Moreover, we could not identify in either control samples a similar variability pattern (reads from the Asian/Caribbean genotype-see Polymorphism and coinfection subsection below) detected in some positive CHIKV positive samples (.bam mapping output files can be found at figshare repository-https://figshare.com/s/ b9e3d8848d4729ff3169). Therefore, it is unlikely that such low level of contamination has any impact on our results. Using a cutoff similar to Metsky et al (2017) we followed the analysis with all eight positive samples.
To check whether viral genomic variability could be the result of new CHIKV de novo genomic variants or to any other source of variation (coinfection or recombination) we called and annotate SNPs using a ECSA lineage reference genome. Fig 2 shows the different genome regions in which highly supported nucleotide variability (coverage depth > 200x) was found, along with the amino acid variations of the different mature peptides. Most of the polymorphisms were concentrated in the nsP4 region. We found several SNPs inducing amino acid changes, compared to the ECSA mature peptides (N = 69, non-synonymous SNPs), but the majority of the SNPs are synonymous (N = 348). The samples 89P87 and 53P51 showed the largest number of SNPs (141) with several Asian/Caribbean variants, and regions with SNPs supporting either the ECSA or Asian/Caribbean genotypes. Most of the SNPs found correspond to the exact variants present in the Asian genotype reference genome. Four of these samples (with the exception of sample #7P7) presented at least 7 times more variants corresponding to the Asian/Caribbean genotype than de novo mutations (Table 1 and Fig 2). Thus, the high variability observed is likely derived from coinfection and/or recombination between the ECSA and Asian/Caribbean genotypes rather than from de novo mutations.
The inversion and average signal in the dissimilarity plot detected in several of the analysed samples may have been a result of new recombinant genomic variants between the ECSA and Asian/Caribbean genotypes or bona fide coinfection of these two genotypes (Fig 1 upper left  panel). Our analysis revealed genomic regions covered by reads more similar to the Asian/ Caribbean genome while other regions were more similar to the ECSA genome (Fig 1-89P87, 29074, 53P41, 7P7 and 315P290). We detected only very few reads containing both genotypes specific SNPs, at potential breaking points, varying from only 1 to 14 reads in samples 7P7 and 53P51 respectively (S2 Data). Additionally, we also found genomic regions that are mapped by roughly as many reads supporting both ECSA and Asian/Caribbean genotype in sample 89P87 (~2800-3200bp) and 53P51 (~5600-7300bp) (Fig 1). Thus, the vast majority of reads supporting uniquely the ECSA or the Asian/Caribbean genotype, with no or only few reads supporting both genotypes (single reads with ECSA and Asian/Caribbean genotype-specific  SNPs) and the average signal found in the dissimilarity plot supports that coinfection, rather than recombination, has occurred.

Phylogenetic relationships
The draft genomes recovered were clearly more represented by the African CHIKV genotype belonging to the ECSA lineage which entered Brazil at Feira de Santana-Bahia in 2014 ( Fig  3A). Phylogenetic reconstruction showed that the genomes from Pernambuco state clustered with genomes from Paraíba, a north neighbouring state (Fig 3B). The two full genomes obtained from samples 43P41 and 97P45 clustered with high support with Paraíba state genomes while all other genomes clustered in a distinct clade (Fig 3B). Based on relaxed molecular clock estimates the common ancestor of Pernambuco samples dates between June 2015 and January 2016 with an average in September/October of 2015 ( Fig 3B).

Discussion
CHIKV infection has continually caused health and economic burden around the world and due to the continuous population flux this virus reached Brazil in 2014, spreading rapidly in a large naive human population [2,17,37]. Although some studies exist [16,19,38,39], only limited information is available on the epidemiology and evolution of CHIKV after its entrance in Brazil. In particular, we still do not know whether the two genotypes (ECSA and Asian/ Caribbean) are spreading countrywide and what is their prevalence in different regions of the country. Furthermore, it is unclear whether these two genotypes cocirculate and whether coinfection occurs as has been reported for other arboviruses [40,41]. Here, through direct sequencing of CHIKV genomes from clinical samples, we found that the ECSA and the Asian/ Caribbean genotypes are cocirculating, or at least have been cocirculating in the Northeast region of Brazil. Surprisingly, we also report, for the first time, data supporting simultaneous infection by two different genotypes in at least six clinical samples. While, at first, both recombination and coinfection could be invoked to explain the presence of both ECSA and Asian/Caribbean SNPs in our samples, several lines of evidence support the coinfection hypothesis. The patchy distribution of genomic fragments sequenced from each genotype in the same sample could be taken as evidence of recombination (Figs 2  and 4). However, this pattern can also be obtained in the case of coinfection, assuming differences in the amplification dynamics coupled to dissimilar titers of each genotype (viremia) and the likely variation of viral genome integrity of each genotype. Thus, if these patterns were due to recombination, one would have to invoke multiple different recombination events, which we believe is highly unlikely. Finally, and perhaps more importantly, the extreme scarcity of reads supporting potential breakpoints (i.e. reads containing SNPs from both genotypes), together with samples whereby multiple genomic regions are mapped by similar proportions of reads from both genotypes (Fig 4) strongly suggest the coinfection hypothesis is the most likely explanation.
It has been shown that a host infected by different DENV viral genotypes can progress to a more severe disease [40]. CHIKV coinfection has so far been reported only with other viral species such as CHIKV and DENV, CHIKV and ZIKV or CHIKV, ZIKV and DENV [39,[42][43][44][45][46][47] coinfected patients, but coinfection in humans generally occurs at a low rate [44]. Studies signal between ECSA and Asian/Caribbean genomes. Following plots correspond to the reads obtained to our 8 samples mapped against ECSA (red line) and Asian/Caribbean (blue line) genomes. We also added a trend line following the same colors to highlight the major patterns. Below each plot there is a grey bar representing the genome regions that we have coverage depth higher than 200x and thicker black line are regions where the coverage depth was lower than 200x.
https://doi.org/10.1371/journal.pntd.0007332.g001 reporting coinfection by different DENV genotypes suggest a link between coinfection and more severe clinical symptoms. In one of these studies, three children diagnosed with meningitis were shown to be infected by two to three different DENV serotypes [41]. Another study showed that 40 patients out of 280 diagnosed with DENV were simultaneously coinfected by different serotypes, being associated with severe clinical manifestations [40]. Interestingly, the finding of Ae. aegypti mosquitoes coinfected by CHIKV and ZIKV can simultaneously transmit both viruses [47], supports the idea that during concomitant outbreaks coinfection may occurs [48]. Albeit our data clearly bring new insights into the molecular epidemiology of CHIKV it remains to be elucidated if coinfection by two different genotypes can have any consequence on viral fitness and disease progression and outcome.
In addition to coinfection likely driving the observed higher CHIKV variability, altered viral fitness resulting from point mutation-inducing structural changes may have also occurred. Thus we searched, in our draft genomes, for specific mutations known to exacerbate viral neurotropism [49,50]. We did not observe any potential amino acid changes in the CHIKV envelope (E2) glycoprotein, positions 67 to 82. Indeed, all obtained draft genomes, compared to publicly available sequences from ECSA and Asian/Carribean genotypes, were identical at this specific E2 glycoprotein region. Another CHIKV genomic region that may impact viral fitness and disease outcome is the RdRp, since specific changes can produce larger mutation clouds [51,52]. However, no known mutation that affects the RdRp fidelity was found in our samples. Therefore, there is no evidence that the RdRp mutations observed here had a major impact on the observed genetic variability.
Finally, molecular clock analysis of the ECSA genotype genomes estimated the entrance of the virus in the Pernambuco state between June/July 2015 and November/December 2015 (HPD 95%), in very close agreement with the firsts CHIKV infection cases in Pernambuco detected in June 2015 reaching its peak in August of the same year [19]. The agreement between molecular clock inferences and epidemiological data shows that the sequencing of viral genomes coupled with evolutionary studies brings a wealthy of data to understand the evolutionary dynamics of the virus and make inferences about its spread. Therefore, genomic epidemiology studies can provides several key information to guide public health efforts focused on epidemic surveillance and outbreak control [53].
Clearly, one limitation of this study relies on the small number of sequenced samples. Hence, the possible link between coinfection by multiple CHIKV genotypes and severe symptoms/disorders should be further investigated in a large sample set to statistically evaluate the relationship of coinfection and disease manifestation. Taken together, our findings highlight the importance of sequencing viral genomes directly from clinical samples using high throughput sequencing, as it allows a fine-grained assessment of viral variability present during infection.
Supporting information S1