Origin of Human T-Lymphotropic Virus Type 1 in Rural Côte d’Ivoire

Simian T-lymphotropic virus type 1 (STLV-1) strains occasionally infect humans. However, the frequency of such infections is unknown. We show that direct transmission of STLV-1 from nonhuman primates to humans may be responsible for a substantial proportion of human T-lymphotropic virus type 1 infections in rural Côte d’Ivoire, where primate hunting is common.

samples, the quantities of DNA extract added to this first-round reaction were derived from the results of the HTLV-tax quantitative PCR and adjusted so that each reaction would start from only 2 to 6 templates. A total of 47 such near-endpoint dilution reactions were run for each of the 10 DNA extracts. We proceeded in this manner because, under these conditions, multiple infections would be readily identified by multiple peaks at polymorphic positions in the chromatograms. For nonhuman primate samples, first-round reactions were seeded with Two separate second-round reactions were then performed using the first-round products. Primers Xho (GAG CTC GAG CAG ATG ACA ATG ACC ATG AG) and R8906 were used in a seminested reaction targeting the LTR, whereas primers HFL75 (TCA AGC TAT AGT CTC CTC CCC CTG) and ENV2 (GGG AGG TGT CGT AGC TGA CGG AGG) were used in a nested reaction targeting env. Cycling conditions were 95°C for 5 min, followed by 35 cycles of 95°C for 30 s, 58°C (LTR) or 62°C (env) for 30 s, and 72°C for 90 s, with a final extension at 72°C for 10 min. Two microliters of a 40-fold dilution of the first-round products were used to seed each second-round reaction.
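The near-endpoint dilution step turns a qPCR copy number into a volume of extract per reaction. The sketch below shows that arithmetic under stated assumptions: the copy number (25 tax copies per µL) is hypothetical, and the Poisson model for how many templates land in each reaction is our assumption, not something the protocol specifies. It also checks what fraction of reactions would actually fall in the 2 to 6 template window.

```python
import math


def extract_volume_ul(copies_per_ul: float, target_templates: float) -> float:
    """Volume of extract (uL) expected to carry `target_templates` proviral copies."""
    if copies_per_ul <= 0:
        raise ValueError("copies_per_ul must be positive")
    return target_templates / copies_per_ul


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a reaction receives exactly k templates (Poisson assumption)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_in_range(mean: float, low: int, high: int) -> float:
    """Probability that a reaction receives between low and high templates, inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))


# Hypothetical qPCR result: 25 tax copies per uL of extract (not a value from this study).
copies_per_ul = 25.0
target = 4.0  # aim for the middle of the 2-6 template window
volume = extract_volume_ul(copies_per_ul, target)
print(f"extract per reaction: {volume:.2f} uL")  # 0.16 uL
print(f"P(no template): {poisson_pmf(0, target):.3f}")
print(f"P(2-6 templates): {prob_in_range(target, 2, 6):.3f}")

# Second round: 2 uL of a 40-fold dilution carries the equivalent of 0.05 uL of first-round product.
first_round_equiv_ul = 2.0 / 40
```

Under this Poisson assumption, aiming for a mean of 4 templates places roughly 80% of reactions inside the 2 to 6 window. In practice, volumes this small would be reached by pre-diluting the extract.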
PCR products were visualized by gel electrophoresis and then purified. Sequencing was performed by the Sanger method. In all cases, comparison with publicly available sequences by using the NCBI BLAST service (3) confirmed that the expected proviral DNA sequences had been amplified. Sequences determined in this study have been deposited in the EMBL Nucleotide Sequence Database under accession numbers HE667747-59 (LTR) and HE667760-72 (env).
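The accession numbers above use a compact range notation. The small helper below expands it, assuming the usual reading that "HE667747-59" means every accession from HE667747 through HE667759. That reading is our interpretation of the shorthand, and the function name is hypothetical.

```python
def expand_accession_range(spec: str) -> list[str]:
    """Expand shorthand such as 'HE667747-59' into full accession numbers."""
    first, last_suffix = spec.split("-")
    # Separate the letter prefix ('HE') from the numeric part ('667747').
    prefix = first.rstrip("0123456789")
    digits = first[len(prefix):]
    # The suffix after '-' replaces the trailing digits of the first number.
    last_digits = digits[: len(digits) - len(last_suffix)] + last_suffix
    start, end = int(digits), int(last_digits)
    if end < start:
        raise ValueError(f"range end precedes start in {spec!r}")
    width = len(digits)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


ltr = expand_accession_range("HE667747-59")
env = expand_accession_range("HE667760-72")
print(len(ltr), ltr[0], ltr[-1])  # 13 HE667747 HE667759
print(len(env), env[0], env[-1])  # 13 HE667760 HE667772
```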

Sequence Analyses
Datasets
The LTR and env datasets comprised the sequences determined in this study, 2 outgroup sequences (1 STLV-1 sequence determined from an Asian macaque and 1 HTLV-1C sequence determined from a Solomon Islands inhabitant), and all publicly available HTLV-1/STLV-1 sequences determined from humans or nonhuman primates in West and North Africa. All publicly available sequences were retrieved from NCBI (in the phylogenetic trees, all sequences appear with their accession numbers). Sequences determined from nonhuman primates of uncertain geographic origin, such as captive animals, were also included as long as the distribution of the host species in the wild was clearly restricted to West and North Africa (according to distribution ranges from the IUCN Red List Web site: http://www.iucnredlist.org/). Table 1 summarizes the main characteristics of the West and North African sequences included in these datasets. Results of analyses performed on nonhuman primates from Taï National Park are reported specifically in Table 2. Molecular subtype assignment was based on the analysis of an enlarged LTR dataset, which also comprised the 42 reference sequences described in (4) (data not shown). Each dataset was aligned by using MUSCLE (5) as implemented in SeaView v4 (6), reduced to unique sequences by using Fabox (http://birc.au.dk/software/fabox/) (7), and checked for the presence of recombinant sequences by using RDP3 (8); no recombinants were found. Although the alignments included gaps, most gapped regions were aligned unambiguously. In addition, the analytical methods used thereafter are largely insensitive to gaps, which are treated as missing data, as long as the gaps do not lead to faulty site homology. Haplotype datasets are available at http://sebastiencalvignac.fr/downloads/index.html.
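Reducing an alignment to unique sequences means grouping identical rows and keeping one representative per group. The sketch below collapses only exact, case-insensitive matches, with gaps treated as ordinary characters. It illustrates the idea and does not reproduce Fabox's own rules, which may handle gaps or ambiguity codes differently. The sequence names are hypothetical.

```python
def collapse_to_haplotypes(alignment: dict[str, str]) -> dict[str, list[str]]:
    """Group aligned sequences that are identical, character for character.

    Gaps count as ordinary characters here and case is ignored. Returns a mapping
    from a representative name (the first one seen) to every name sharing its sequence.
    """
    groups: dict[str, list[str]] = {}
    for name, seq in alignment.items():
        groups.setdefault(seq.upper(), []).append(name)
    return {names[0]: names for names in groups.values()}


# Toy alignment with hypothetical sequence names.
toy = {
    "seqA": "ACGT-A",
    "seqB": "ACGT-A",
    "seqC": "ACGTTA",
    "seqD": "acgt-a",
}
haplotypes = collapse_to_haplotypes(toy)
for rep, members in haplotypes.items():
    print(rep, members)
# seqA ['seqA', 'seqB', 'seqD']
# seqC ['seqC']
```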

Phylogenetic Analyses (including Divergence Date Estimations)
Several models of nucleotide substitution were first assessed for their ability to explain the data by using jModelTest v0.1 (9). On the basis of Akaike information criterion (AIC) scores derived from model likelihoods, a general time-reversible (GTR) model with a proportion of invariant sites (+I) and γ-distributed rate heterogeneity with 4 rate categories (+G4) was selected for both the LTR and env datasets. Phylogenetic analyses were performed in both maximum-likelihood (ML) and Bayesian frameworks, using the corresponding models.
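AIC-based model choice trades likelihood against the number of free parameters, using AIC = 2k − 2 ln L, with the lowest score preferred. The sketch below ranks three candidate models. The log-likelihoods and the 20-taxon tree size are hypothetical, and counting branch lengths as free parameters is an assumption about how k is tallied, not a statement of jModelTest's defaults.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


N_TAXA = 20  # hypothetical alignment size
BRANCH_LENGTHS = 2 * N_TAXA - 3  # free branch lengths of an unrooted binary tree

# Free substitution-model parameters: GTR has 5 relative rates + 3 base frequencies;
# HKY has 1 ts/tv ratio + 3 base frequencies; +I and +G each add one parameter.
SUBST_PARAMS = {"HKY+G": 5, "GTR+G": 9, "GTR+I+G": 10}

# Hypothetical maximized log-likelihoods, for illustration only.
LOG_LIKELIHOODS = {"HKY+G": -5230.0, "GTR+G": -5210.0, "GTR+I+G": -5205.0}

scores = {
    model: aic(LOG_LIKELIHOODS[model], SUBST_PARAMS[model] + BRANCH_LENGTHS)
    for model in SUBST_PARAMS
}
best = min(scores, key=scores.get)
for model, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{model:8s} AIC={score:.1f} delta={score - scores[best]:.1f}")
```

With these toy numbers, the extra +I parameter pays for itself: GTR+I+G gains 5 log-likelihood units for one parameter and ends up 8 AIC units ahead of GTR+G.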
ML analyses were performed on the PhyML webserver (http://www.atgcmontpellier.fr/phyml/) (10,11). Equilibrium nucleotide frequencies were also optimized. Tree searches started from a BioNJ tree and combined nearest-neighbor interchange (NNI) and subtree pruning and regrafting (SPR) moves, with optimization of both topology and branch lengths. Branch robustness was assessed by nonparametric bootstrapping (500 pseudoreplicates).
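A nonparametric bootstrap pseudoreplicate is built by resampling alignment columns with replacement until the original length is reached. Each replicate is then analyzed exactly like the real data, and a branch's support is the percentage of replicate trees that recover it. The sketch below generates the 500 replicates only. The tree search itself (done in PhyML here) is out of scope, and the toy alignment is hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One nonparametric bootstrap pseudoreplicate: columns drawn with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(
    alignment: dict[str, str], n: int = 500, seed: int = 1
) -> list[dict[str, str]]:
    """Generate n pseudoreplicates; each would then be analyzed like the original data."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


# Toy alignment with hypothetical names.
toy = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "AGGTTC"}
replicates = bootstrap_replicates(toy, n=500)
print(len(replicates), replicates[0])
```

Resampling whole columns, rather than individual characters, keeps each site's pattern across taxa intact, which is what makes the replicates valid draws of the site-pattern distribution.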
Bayesian analyses were performed by using BEAST v1.6.1 (12). Analyses were run under the assumptions of a relaxed molecular clock (uncorrelated lognormal) and a constant population size (previous analyses of a comparable HTLV-1/STLV-1 dataset had shown that tree topology was robust to tree shape assumptions) (1). To place divergence events in an absolute time frame, we set a strong prior on the divergence date of Melanesian (http://tree.bio.ed.ac.uk/software/tracer/) was used to check that individual runs had converged, that independent runs converged onto the same parameter values, and that chain mixing was appropriate (effective sample size values of combined runs >200). Trees sampled in duplicate runs were then pooled into a single file by using LogCombiner v1.6.1 (part of the BEAST suite) after removal of a conservative, visually determined 10% burn-in and after 2-fold and 10-fold thinning (reduction in sampling frequency) of the LTR and env chains, respectively. The 18,000 trees retained per dataset were summarized as a maximum clade credibility tree by using TreeAnnotator v1.6.1 (also part of the BEAST suite). Posterior probability, i.e., the frequency of a given bipartition in the posterior sample, was taken as a measure of branch robustness.
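The post-processing of tree samples described above can be illustrated in three steps: drop the first 10% of each run as burn-in, thin what remains, and pool the runs. A clade's posterior probability is then its frequency in the pooled sample. The sketch below is a simplified stand-in for LogCombiner and TreeAnnotator, not their code. It assumes a hypothetical representation of each sampled tree as a set of clades, thins by sample index, and does not build the maximum clade credibility tree itself.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, TypeVar

T = TypeVar("T")
Clade = FrozenSet[str]
Tree = FrozenSet[Clade]  # hypothetical: a tree reduced to the set of its clades


def discard_burnin(samples: Sequence[T], fraction: float = 0.10) -> List[T]:
    """Drop the first `fraction` of a run's samples."""
    return list(samples[int(len(samples) * fraction):])


def thin(samples: Sequence[T], factor: int) -> List[T]:
    """Keep every `factor`-th sample (a factor-fold reduction in sampling frequency)."""
    return list(samples[::factor])


def combine_runs(runs: Sequence[Sequence[T]], burnin: float, factor: int) -> List[T]:
    """Remove burn-in from each run, thin it, and pool the remainder."""
    pooled: List[T] = []
    for run in runs:
        pooled.extend(thin(discard_burnin(run, burnin), factor))
    return pooled


def clade_posteriors(trees: Sequence[Tree]) -> dict:
    """Posterior probability of each clade = its frequency across the sampled trees."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: count / len(trees) for clade, count in counts.items()}


AB: Clade = frozenset({"A", "B"})
AC: Clade = frozenset({"A", "C"})
tree_ab: Tree = frozenset({AB})
tree_ac: Tree = frozenset({AC})

# Two toy runs of 10 sampled trees each (hypothetical).
run1 = [tree_ac] + [tree_ab] * 9
run2 = [tree_ac] * 2 + [tree_ab] * 6 + [tree_ac] * 2
pooled = combine_runs([run1, run2], burnin=0.10, factor=2)
posteriors = clade_posteriors(pooled)
print(len(pooled), posteriors[AB], posteriors[AC])  # 10 0.8 0.2
```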

Tree Display
ML trees are displayed in Figure 2 (LTR; main text) and the Technical Appendix Figure (env; this file). Outgroup-based rooting was further optimized by using Path-O-Gen (http://tree.bio.ed.ac.uk/software/pathogen/). This is a purely cosmetic operation, similar in principle to midpoint rooting but more rigorous, because it selects the root position that minimizes the variance of root-to-tip distances (like midpoint rooting, it therefore assumes clock-like evolution). Branch robustness measures from the ML and Bayesian analyses were annotated on the resulting trees.
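The variance criterion has a simple closed form on any single branch. If the root sits at distance x from one endpoint, every root-to-tip distance is either c + x or c − x. The variance is therefore quadratic in x and is minimized at x = −cov(c, s)/var(s), where s is +1 or −1 depending on the tip's side. The sketch below applies this to one branch only. A full search would repeat it over every branch and keep the global minimum. It is a simplified illustration of the criterion stated above, not Path-O-Gen's implementation, and the toy distances are hypothetical.

```python
from statistics import pvariance


def best_root_on_branch(dist_u, dist_v, length):
    """Root position on branch (u, v) minimizing the variance of root-to-tip distances.

    dist_u: distances from endpoint u to each tip on u's side of the branch.
    dist_v: distances from endpoint v to each tip on v's side.
    A root at distance x from u gives depths d + x (u side) and d + (length - x) (v side),
    i.e. c_i + s_i * x with s_i = +1 or -1. The variance is quadratic in x and is
    minimized at x = -cov(c, s) / var(s), clamped to [0, length].
    """
    if not dist_u or not dist_v:
        raise ValueError("both sides of the branch must contain at least one tip")
    c = list(dist_u) + [d + length for d in dist_v]
    s = [1.0] * len(dist_u) + [-1.0] * len(dist_v)
    n = len(c)
    mean_c, mean_s = sum(c) / n, sum(s) / n
    cov_cs = sum((ci - mean_c) * (si - mean_s) for ci, si in zip(c, s)) / n
    var_s = sum((si - mean_s) ** 2 for si in s) / n
    x = min(max(-cov_cs / var_s, 0.0), length)
    depths = [ci + si * x for ci, si in zip(c, s)]
    return x, pvariance(depths)


# Toy branch (hypothetical distances): two tips on u's side, one on v's side.
x, var = best_root_on_branch(dist_u=[0.5, 1.5], dist_v=[1.0], length=1.0)
print(f"root {x:.3f} from u, root-to-tip variance {var:.4f}")  # root 0.500 ..., 0.1667
```

In this toy case, midpoint rooting would instead place the root 0.25 from u, at the middle of the longest tip-to-tip path (3.5). The variance criterion weighs all tips rather than only the two most distant ones.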