Northern African strains of human T-lymphotropic virus type 1 arose from a recombination event

Although recombination is a major source of genetic variability in retroviruses, no recombinant strain had been observed for human T-lymphotropic virus type 1 (HTLV-1), the first human-pathogenic retrovirus to be isolated. Different genotypes exist for HTLV-1: genotypes b and d to g are restricted to central Africa, while genotype c is endemic only in Australo-Melanesia. In contrast, the cosmopolitan genotype a is widely distributed. We applied a combination of phylogenetic and recombination analyses to a set of new HTLV-1 sequences, which we collected from 19 countries throughout Africa, the continent where the virus has the largest endemic presence. This allowed us to demonstrate the presence of recombinants in HTLV-1. Indeed, the HTLV-1 strains currently present in North Africa originated from a recombination event between strains from Senegal and West Africa. This event is estimated to have occurred around 4,000 years ago and seems to have been generated during reverse transcription. In conclusion, we demonstrate that, albeit rare, recombination can occur in HTLV-1 and may play a role in the evolution of this retrovirus.


IMPORTANCE
A number of HTLV-1 subtypes have been described in different populations, but none of the genetic differences between these subtypes have been ascribed to recombination events. Here we report an HTLV-1 recombinant virus among infected individuals in North Africa. This demonstrates that, contrary to what was thought, recombination can occur and could play a role in the evolution of HTLV-1.

Genetic recombination is a major characteristic of the human retrovirus HIV-1 and plays a crucial role in its evolution and pathogenicity (1,2). In contrast, no recombination event has been described so far for human T-lymphotropic virus type 1 (HTLV-1), the first described human-pathogenic retrovirus (3). This is consistent with the remarkable genetic stability of HTLV-1, probably linked to viral amplification via clonal expansion of infected cells (4). The absence of HTLV-1 recombination was also supported by the fact that no superinfection at the cellular level had been described (5).
HTLV-1 infects at least 5 to 10 million people worldwide, mainly in geographical foci of high endemicity (6). HTLV-1 is the etiologic agent of a malignant CD4+ T-cell lymphoproliferation (adult T-cell leukemia [ATL]) (7) and of a chronic progressive neuromyelopathy (tropical spastic paraparesis/HTLV-1-associated myelopathy [TSP/HAM]) (8). Among infected individuals, 3 to 7% will develop one of these severe diseases. Different HTLV-1 genotypes exist, some of which are geographically restricted: genotypes b and d to g are restricted to central Africa, while genotype c is endemic in Australo-Melanesia. In contrast, the cosmopolitan genotype a is widely distributed, presumably dispersed during the past centuries through migrations of infected populations, for instance during the Atlantic slave trade (6, 9–11). Within this large cosmopolitan subtype a, several molecular clades have been described (e.g., Japanese and West African). Here we applied a combination of phylogenetic and recombination analyses to a set of new HTLV-1 sequences, which we collected from 19 countries throughout Africa, the continent where the virus has the largest endemic presence. We demonstrate that recombination has occurred in natura in HTLV-1 and that the strains currently present in North Africa originated from a recombination event.

MATERIALS AND METHODS
Study population and ethics statement. Samples were obtained from 41 HTLV-1-infected individuals (of different clinical statuses: ATL, TSP/HAM, and asymptomatic carriers) originating from 19 African countries (Table 1). All blood samples were obtained according to French laws and regulations (Articles L.1211-2 and L.1243-3 of the Code de la Santé Publique). The human sample collection has been declared to the Ministère de l'Enseignement Supérieur et de la Recherche (2010 DC-1197).
Viral sequencing. For the purpose of this study, high-molecular-weight DNA was extracted from peripheral blood buffy coats using the QIAamp DNA blood minikit (Qiagen, Hilden, Germany). The samples were first subjected to PCR using human β-globin-specific primers to ensure that the DNA was amplifiable.
For long terminal repeat (LTR) sequencing (774 bp long for the HHZ sequence), we followed a previously described protocol (12). Briefly, complete LTR sequences were obtained through 2 PCRs generating the LTR-gag (primers Enh280 and R2380) and Tax-LTR (primers F6501 and 3VLTRext) segments. The former was sequenced using the Enh280/R1021 primer pair and the latter using the 8200LA/Rev3 primer pair. The forward and reverse sequences were aligned using the ClustalW algorithm (MacVector 6.5 software; Oxford Molecular) and were found to be identical. The 2 fragments were then concatenated over their 248-bp-long overlapping region, in which the sequences were identical. Of note, this overlapping region contains the recombination site as well as 4 of the 8 informative mutations. This rules out the possibility that the recombination is an artifact of our sequencing strategy.
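The concatenation step can be illustrated with a minimal Python sketch. It assumes two gap-free fragment sequences oriented 5′ to 3′ and requires the overlap to be an exact match, mirroring the identity check described above. The function name `merge_overlapping` and its `min_overlap` parameter are hypothetical; this is not the software the authors used (MacVector/ClustalW).

```python
def merge_overlapping(upstream: str, downstream: str, min_overlap: int = 20) -> tuple[str, int]:
    """Join two fragments whose 3' end (upstream) and 5' end (downstream) overlap exactly.

    Hypothetical helper for illustration: returns (merged sequence, overlap length).
    Raises ValueError if no identical overlap of at least min_overlap bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    upstream, downstream = upstream.upper(), downstream.upper()
    # Try the longest possible overlap first, so a short chance match is not preferred.
    for length in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        if upstream[-length:] == downstream[:length]:
            return upstream + downstream[length:], length
    raise ValueError("no identical overlap found; the fragments disagree")


merged, overlap = merge_overlapping("AAACCCGGG", "CCCGGGTTT", min_overlap=3)
print(merged, overlap)  # AAACCCGGGTTT 6
```

Scanning from the longest candidate overlap downward is a deliberate choice: with an overlap as long as 248 bp, a short spurious suffix/prefix match should never be selected instead of the true one.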
Concerning the env segment (522-bp-long fragment), we used the previously described protocol (13). A seminested PCR was performed: a first amplification with the env1/env22 pair of primers was followed by amplification with the env1/env2 pair.
Phylogenetic analyses. Multiple sequence alignments were performed with the DAMBE program (v4.2.13) (14). For the studies on the env segment, no gaps or stop codons were observed. Absence of saturation of the alignment was confirmed by the test of Xia and Xie in the DAMBE program.
TABLE 1 Specimen identification, geographic and demographic data, and clinical status of HTLV-1-infected individuals from northern, western, central, and austral African countries^a
^a Provided are the HTLV-1 subtype and subgroup confirmed by phylogenetic analysis, and the new African LTR and env sequences characterized in this study. Abbreviations: AC, asymptomatic carrier; ATL, adult T-cell leukemia; TSP/HAM, tropical spastic paraparesis/HTLV-1-associated myelopathy; a-NA, a-WA, a-Sen, and a-TC, North African, West African, Senegalese, and transcontinental clades of the a subtype, respectively.
The most appropriate nucleotide substitution model was selected in the Modeltest v3.6 program (15) based on the Akaike information criterion (AIC). The best-fitting models were GTR+Γ and Tamura-Nei+Γ for the LTR and env sequences, respectively. Phylogenetic reconstructions were conducted in PAUP* v4.0b10 using the neighbor-joining method, with 1,000 bootstrap replicates performed to test the robustness of the tree topology. Tree topologies were also confirmed using the maximum likelihood method (in PAUP*).
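For reference, the AIC used for model selection has the standard form below, where k_m is the number of free parameters estimated under candidate model m and \hat{L}_m is its maximized likelihood on the alignment; the model with the lowest AIC is retained. This is only the general criterion: the parameter counts and likelihoods computed by Modeltest for these alignments are not reproduced here.

```latex
\mathrm{AIC}_m = 2k_m - 2\ln \hat{L}_m,
\qquad
m^{*} = \arg\min_{m \in \mathcal{M}} \mathrm{AIC}_m
```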
Recombinant search. Recombinant screening and breakpoint detection were performed by bootscanning in Simplot v3.5.1 (16). This program compares inferred clusters of sequences with one another. Phylogenetic relationships among these clusters are estimated for successive overlapping subregions (window, 180 bp; step, 20 bp). For each subregion, the bootstrap support for grouping the query with each reference cluster is calculated (Kimura two-parameter model, 1,000 replicates). Bootstrap values are then plotted along the genome, with x values corresponding to the genome position at the midpoint of each window and y values to the bootstrap value calculated for that window. The inferred recombination breakpoint is then verified by separate phylogenetic analyses of the 2 resulting fragments.
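The windowing logic can be sketched in Python. This is a simplified, distance-based stand-in and not Simplot's algorithm: instead of building a neighbor-joining tree for every bootstrap replicate, it resamples alignment columns within each 180-bp window and counts how often each reference is closest to the query under the Kimura two-parameter (K2P) distance. The function names, the nearest-reference criterion, and the fixed seed are assumptions for illustration; sequences are assumed to be aligned, gap-free, and written in uppercase ACGT.

```python
import math
import random


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned, gap-free ACGT sequences."""
    purines = {"A", "G"}
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if (x in purines) == (y in purines):
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / len(a), transversions / len(a)
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # too divergent: the K2P correction is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def bootscan(query, references, window=180, step=20, replicates=1000, seed=1):
    """Return [(window midpoint, {reference: % of replicates in which it is closest})]."""
    rng = random.Random(seed)
    profile = []
    for start in range(0, len(query) - window + 1, step):
        columns = range(start, start + window)
        wins = dict.fromkeys(references, 0)
        for _ in range(replicates):
            # Bootstrap replicate: resample the window's columns with replacement.
            sample = [rng.choice(columns) for _ in range(window)]
            q = "".join(query[i] for i in sample)
            distances = {
                name: k2p_distance(q, "".join(seq[i] for i in sample))
                for name, seq in references.items()
            }
            # Ties go to the first reference listed (min keeps the first minimum).
            wins[min(distances, key=distances.get)] += 1
        midpoint = start + window // 2
        profile.append((midpoint, {name: 100 * n / replicates for name, n in wins.items()}))
    return profile
```

For a recombinant query, support switches from one reference to the other as the window midpoint crosses the breakpoint; in this sketch, that switch point plays the role of the breakpoint that is then checked with separate trees on each fragment.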
Quartet mapping was performed on the 2 LTR segments of interest (17). Quartet mapping is a likelihood mapping in which 4 groups of interest are defined beforehand. For each quartet made up of one sequence from each group, the method evaluates the maximum likelihood of each of the three possible unrooted trees. The result is plotted inside an equilateral triangle, each corner of which represents one tree topology. The center represents quartets for which no tree is well supported, the so-called "starlike area." As a rule of thumb, a likelihood mapping with more than 20% of points in the starlike area suggests that the data are noisy. Quartet mapping was performed with the Tree-Puzzle v5.2 program.
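The placement of a single quartet in the triangle can be sketched as follows, assuming (as in likelihood mapping) equal prior weight on the three topologies, so that each topology's posterior weight is its likelihood divided by the sum of the three. The corner coordinates and function names are our own illustrative choices, and the region boundaries that Tree-Puzzle uses to delimit the starlike area are not reproduced.

```python
import math

# Corners of an equilateral triangle of unit side, one per quartet topology.
CORNERS = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def topology_weights(log_likelihoods):
    """Posterior weights of the 3 topologies from their log-likelihoods (equal priors)."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    scaled = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def triangle_point(log_likelihoods):
    """Map a quartet to (x, y): barycentric weights applied to the triangle corners."""
    weights = topology_weights(log_likelihoods)
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y
```

A quartet whose three topologies have equal likelihoods lands on the centroid of the triangle (the starlike region), whereas a quartet with one strongly favored topology lands near that topology's corner.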
Molecular clock. Molecular clock analysis was performed on the HTLV-1 LTR alignment using 2 external calibration times (divergence of the subtype c Australo-Melanesian strains and radiation of the HTLV-1a subtype, as defined previously [18]). The evolutionary rate was estimated with a Bayesian Markov chain Monte Carlo (MCMC) molecular clock method in BEAST v1.8.0 (19). A strict molecular clock was assumed. The GTR+Γ substitution model with 8 gamma categories was used. The MCMC chain was run for 20 million steps and sampled every 10,000 steps. Convergence and effective sample sizes (ESS) of the MCMC chain were examined in Tracer v1.6 (http://beast.bio.ed.ac.uk). All parameter estimates had ESS values above 150.
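With 20 million steps sampled every 10,000 steps, each parameter trace contains 2,000 samples (before any burn-in is discarded). The sketch below estimates the ESS of such a trace from its autocorrelation, using the common estimator N / (1 + 2 Σ ρ_k), truncated at the first non-positive autocorrelation. This simplified textbook estimator is assumed here for illustration and is not necessarily the exact procedure implemented in Tracer; the function name is hypothetical.

```python
def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of lag-k autocorrelations), stopping at the first rho_k <= 0."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    variance = sum(d * d for d in dev) / n
    if variance == 0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * variance)
        if rho <= 0:
            break  # truncate once the autocorrelation is no longer positive
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

A well-mixed chain gives an ESS close to the number of samples, whereas a chain that lingers in the same state gives a much smaller one; a minimum ESS threshold, such as the 150 used here, guards against summarizing poorly mixed chains.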

RESULTS
Complete LTR sequences were obtained from 41 HTLV-1-infected individuals originating from various African regions: 6 from North Africa, 18 from West Africa, 12 from central Africa, and 5 from austral Africa (Table 1). Phylogenetic analyses were performed on the 774-bp-long LTR fragment. The topologies of the phylogenetic tree were comparable for the neighbor-joining (Fig. 1) and maximum likelihood methods (data not shown).
The main HTLV-1 subtypes (a, b, and d) were identifiable (Fig. 1) (20–22). The sequences belonged either to the central African subtype b, with strains isolated in Nigeria, the Democratic Republic of the Congo (formerly Zaïre), Zambia, the Central African Republic, and Gabon, or to the cosmopolitan subtype a. Within the latter subtype, one large transcontinental clade (a-TC) and three geographically restricted clades were identified, which we named the Senegalese (a-Sen), North African (a-NA), and West African (a-WA) clades. Of note, the a-Sen group has previously been described and named "trans-Saharan" or "clade N" (within the subgroup HTLV-1a D) (23,24).
Interestingly, unlike the other groups, the a-NA clade was not supported by a strong bootstrap value and occupied an intermediate position between the a-WA and a-Sen groups. These characteristics are suggestive of a clade arising from a recombination event. To test this hypothesis, we studied the phylogenetic relationships of the different groups through bootstrap resampling with the bootscanning method (Fig. 2A) (16). The LTR sequences of the a-NA group could be decomposed into 3 distinct regions: (i) the 5′ region (positions 1 to 375), which corresponds to the U3 region of the LTR, is closely related to the a-Sen strains; (ii) the central region (positions 375 to 526) corresponds to a portion of the R region of the LTR and is conserved among strains of the different groups; and (iii) the 3′ region (positions 526 to 774), corresponding to the R-U5 region of the LTR, segregates with the a-WA strains. We therefore concluded that the a-NA strains might have resulted from a recombination event between strains from the a-Sen and a-WA groups that occurred within the R region.
Focusing on the nucleotide sequences, we observed that nucleotides at positions 153, 182, 263, and 319 are shared by a-NA and a-Sen strains (but not a-WA strains), while nucleotides at positions 377, 557, 624, and 631 are identical between a-NA and a-WA strains (but not a-Sen strains). This suggests that the recombination event occurred at the U3-R junction, around nucleotide position 375. Separate analyses of the U3 segment (positions 1 to 375) and the R-U5 segment (positions 376 to 774) of the LTR confirmed that the a-NA sequences are more closely related to a-Sen in the U3 region and to a-WA in the R-U5 segment (Fig. 2B and C). Tree topologies were consistent between the neighbor-joining (Fig. 2B and C) and maximum likelihood (data not shown) analyses.
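The localization argument can be made explicit with a small sketch. Assuming a single crossover, the breakpoint must fall between the last informative site matching one parental group and the first site matching the other. The positions and parental assignments in the example are those given in the text; the function name is hypothetical.

```python
def breakpoint_interval(informative_sites):
    """Given (position, parent) pairs, return the positions flanking a single crossover.

    Assumes exactly one switch of parental lineage along the sequence.
    """
    ordered = sorted(informative_sites)
    switches = [
        (left_pos, right_pos)
        for (left_pos, left_parent), (right_pos, right_parent) in zip(ordered, ordered[1:])
        if left_parent != right_parent
    ]
    if len(switches) != 1:
        raise ValueError(f"expected one parental switch, found {len(switches)}")
    return switches[0]


sites = [(153, "a-Sen"), (182, "a-Sen"), (263, "a-Sen"), (319, "a-Sen"),
         (377, "a-WA"), (557, "a-WA"), (624, "a-WA"), (631, "a-WA")]
print(breakpoint_interval(sites))  # (319, 377)
```

The crossover therefore lies somewhere between positions 320 and 376, consistent with a breakpoint at the U3-R junction near position 375.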
As viral diversity is low (a characteristic of HTLV-1), the groups defined in these analyses are not supported by strong bootstrap values. Therefore, to strengthen our observation, quartet mapping was performed on the 2 LTR fragments (17). We considered 4 phylogenetic groups (the a-Sen, a-NA, and a-WA subgroups and subtype b). For a given quartet (i.e., a group of 4 sequences, each originating from a different phylogenetic group), 3 topologies are possible: the a-NA strain is most closely related to the a-Sen, a-WA, or b strain. The probabilities of the three topologies are reported as barycentric coordinates in a triangle map. Thus, if the quartet is unresolved (i.e., it has a perfect starlike shape), it is plotted at the center of the triangle. In contrast, if the a-NA strain is closely related to the a-Sen strain, the quartet is plotted close to the corner corresponding to the a-Sen group. The analyses confirm that a-NA strains are related to the a-Sen group on the U3 fragment and to the a-WA group on the R-U5 fragment. Together, these data show that a-NA strains are recombinants of a-Sen and a-WA sequences.
The location of this recombination site at the U3-R junction suggests that the recombination event could have occurred during reverse transcription (RT). Indeed, retrovirus particles incorporate 2 copies of genomic RNA, and during RT, the newly synthesized single-stranded DNA copy of R-U5 (minus-strand strong-stop DNA) is transferred to the 3′ end of a genomic RNA, where it anneals through the R region (25). If the two genomic RNAs came from different HTLV-1 genotypes (assuming that a single cell can be infected with 2 different HTLV-1 strains), such a jump between viral genomes would yield a virus with a recombined LTR: R-U5 would derive from the donor RNA, whereas U3 and the upstream sequences copied after the jump, including env, would derive from the acceptor RNA. Based on this model, we predicted that the env sequence of a-NA strains should be related to that of the a-Sen strains, the source of their U3 region. Phylogenetic analysis of env sequences confirmed that while a-WA strains form a separate clade, the a-NA and a-Sen strains are virtually indistinguishable (Fig. 3).
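The prediction about env follows from which template each part of the provirus is copied from. The toy model below encodes a single template switch during the first strand transfer of reverse transcription: the donor RNA supplies R-U5 (copied into minus-strand strong-stop DNA), and the acceptor RNA, to whose 3′ R region that DNA anneals, supplies U3 and env. It deliberately ignores additional template switches and the exact crossover point within R; the function and region names are illustrative assumptions.

```python
def first_strand_transfer(donor: dict[str, str], acceptor: dict[str, str]) -> dict[str, str]:
    """Toy model: lineage of each region in a provirus after one inter-RNA strand transfer.

    donor    -- co-packaged RNA whose 5' R-U5 is copied into minus-strand strong-stop DNA
    acceptor -- co-packaged RNA whose 3' R the strong-stop DNA anneals to; minus-strand
                synthesis then continues on it through U3 and env
    Each argument maps a region name ("U3", "R-U5", "env") to a lineage label.
    """
    return {
        "U3": acceptor["U3"],    # copied after the transfer, from the acceptor
        "R-U5": donor["R-U5"],   # copied before the transfer, from the donor
        "env": acceptor["env"],  # same template as U3, so it co-segregates with U3
    }


sen = {"U3": "a-Sen", "R-U5": "a-Sen", "env": "a-Sen"}
wa = {"U3": "a-WA", "R-U5": "a-WA", "env": "a-WA"}
print(first_strand_transfer(donor=wa, acceptor=sen))
# {'U3': 'a-Sen', 'R-U5': 'a-WA', 'env': 'a-Sen'}
```

With an a-WA donor and an a-Sen acceptor, the model reproduces the observed pattern: an a-Sen-like U3, an a-WA-like R-U5, and an a-Sen-like env.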
…isolates, including the 41 sequences generated in this study (in boldface) and the 56 previously published sequences (in italics). The Melanesian sequence Mel5 was used as the outgroup. Sequence names are colored according to the geographic origin of the individuals, as presented in Table 1. Purple, orange, and yellow sequences are from North Africa, the Senegalese area, and West Africa, respectively. The phylogeny was derived by the neighbor-joining method using the GTR model (gamma = 0.5325). Horizontal branch lengths are drawn to scale, with the bar indicating 0.01 nucleotide replacement per site. Numbers on each node indicate the percentage of bootstrap samples (of 1,000) in which the cluster to the right is supported.
Finally, molecular clock analyses were performed to date the recombination event, assuming that a single event gave rise to the a-NA group. Strict molecular clock analyses were performed using 2 different calibration dates: the divergence of the Australo-Melanesian strains at 50 ± 10 kiloyears and the HTLV-1a subtype radiation date (12,700 ± 2,000 years) (18). Under these two calibrations, the origin of the a-NA group was estimated at 4,650 years before present (BP) (95% highest posterior density [HPD], 2,850 to 6,450; ESS, 255) and 3,740 years BP (95% HPD, 2,610 to 4,870; ESS, 283), respectively.
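The 95% HPD intervals summarize the posterior samples of each estimated date. A minimal sketch of the usual computation, the shortest interval containing 95% of the sorted samples, is given below; it assumes a unimodal posterior and is illustrative rather than a description of Tracer's internals. The function name is hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples (unimodal posterior assumed)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must contain; the tiny offset keeps a product such
    # as 7.000000000000001 from being rounded up to one sample too many.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best_start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + k - 1]


print(hpd_interval([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], mass=0.9))  # (1, 9)
```

Unlike an equal-tailed interval, the HPD interval excludes the long tail (here, the value 100) because the shortest window containing 90% of the samples does not need it.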

DISCUSSION
Here we demonstrate that, albeit rare, recombination can occur in HTLV-1 and may play a role in the evolution of this virus, as it does for other retroviruses (26). Indeed, the strains we sampled in North Africa resulted from a recombination between HTLV-1 strains currently found in Senegal and West Africa, an event estimated to have occurred around 4,000 years ago. This recombination seems to have been generated during reverse transcription.
The existence of this recombinant clade suggests that, in vivo, some cells can be infected by multiple HTLV-1 strains. Indeed, superinfection is a prerequisite for recombination: a cell has to be infected with two viruses to produce a "hybrid" viral particle, carrying one genomic RNA from each strain, that can then give rise to the recombinant virus we describe. The possibility of multiple HTLV-1 infections at the cellular level has been a matter of debate. Lymphocytes carrying multiple integrated viral genomes have been reported in patients with adult T-cell leukemia. However, it was unclear whether this resulted from multiple infections or from secondary amplification of HTLV-1 by retrotransposition (27,28). Although HTLV-1 receptors are still expressed at the surface of infected lymphocytes (29,30), a recent report suggested that naturally infected T cells contain a single integrated HTLV-1 provirus (5). One can therefore speculate that the target cell for multiple infections might be another cell type (dendritic cells are susceptible to HTLV-1 infection [31]) or that recombination could occur upon formation of syncytia in vivo.
…nucleotides and the 399 last nucleotides, respectively, were derived by the neighbor-joining method (GTR; gamma = 0.5325). The groups of interest are colored as follows: red, green, and blue sequences belong to a-Sen, a-NA, and a-WA, respectively. In both trees, 4 informative sites determine the topology among the 3 clades of interest. The quartet mapping of each sequence fragment is presented.
Considering that the a-NA clade was derived from a unique recombination event, we estimated that this recombinant might have originated from the intermixing of African populations infected with strains related to those currently present in West Africa and the Senegalese area, an event that occurred around 4,000 years ago. The origins of this recombination nevertheless remain unclear. Based on the current data, the recombination is unlikely to have occurred in a simian host, since no simian T-lymphotropic virus type 1 (STLV-1) subtype a strain has been described so far. However, STLV-1 genetic diversity in the Sahel and West Africa remains to be documented. Assuming that recombination occurred during human intermixing, where it took place remains unknown. Either recombination occurred in West Africa (or the Senegal area) and was followed by migration of populations infected with the recombinant virus, or populations infected with a-Sen and a-WA strains mixed in North Africa prior to recombination. As no a-NA strains were found in regions other than North Africa, it is more likely that the recombination occurred in North Africa, after migrations of infected individuals from West Africa and Senegal.