Phylogeographic Pattern and Extensive Mitochondrial Dna Divergence Disclose a Species Complex within the Chagas Disease Vector Triatoma Dimidiata

Background: Triatoma dimidiata is among the main vectors of Chagas disease in Latin America. However, and despite important advances, there is no consensus about the taxonomic status of phenotypically divergent T. dimidiata populations, which in most recent papers are regarded as subspecies.


Introduction
Chagas disease (American Trypanosomiasis) is one of the most important parasitic diseases in Latin America with about 8-10 million people infected, 10-12 thousand deaths per year and ,25 million at risk of infection [1,2]. Humans acquire the disease when they come into contact with Trypanosoma cruzi-infected faeces of blood-sucking bugs of the subfamily Triatominae (Hemiptera: Reduviidae). As the most effective mechanism to prevent Chagas disease transmission relies on vector control strategies, substantial effort has been devoted to the study of the ecology, population structure and evolution of triatomine bugs (for review see [3]). Triatoma dimidiata, T. infestans, and Rhodnius prolixus are the main vectors of Chagas disease. Vector control programs have achieved remarkable success towards the elimination of R. prolixus and T. infestans in several regions of Central and South America, respectively [4]; T. dimidiata is currently the primary target of control efforts across its range [5], which spans from southern Mexico through Central America into Colombia, Peru, and Ecuador [6].
T. dimidiata occupies a wide variety of domestic and peridomestic environments, in both rural areas [7,8] and urban settings [9,10]. In the wild, it is also very versatile, and colonies have been found in a wide variety of ecotopes (e.g., rocky outcrops, trees, caves, and palm trees [11,12,13]).
Throughout its geographic distribution, T. dimidiata exhibits high phenotypic variability, which has caused considerable taxonomic controversy since the species description in 1811. A number of chromatic, morphometric, and antennal phenotype variants have been recognized; while some of them were regarded as geographic populations, others were formally described as subspecies or species (reviewed by [14]).
The first genetic evidence suggesting the existence of undescribed cryptic species within the T. dimidiata taxon was reported in 2001 [15]. Based on nucleotide sequence analyses of the ribosomal DNA second internal transcribed spacer (ITS-2), substantial differences were observed between one population from Yucatán (Mexico) and those from other localities in Mexico and Central America. Chromosome C-banding patterns, genome size [16], and mitochondrial cyt b sequence analyses [17] later corroborated these findings and extended the distribution of this new species into the forests of Petén, Guatemala. Bargues et al. [18] have proposed that all other T. dimidiata populations (including the closely related species T. hegneri) although also genetically distinct, but with distance values markedly lower than those for the particular population from Yucatán, should be regarded as subspecies. Genetic groups based on the subspecific criteria adopted by Usinger [19] were proposed: Group 1A (specimens from Chiapas in southern Mexico, Guatemala, Honduras, El Salvador, Nicaragua, Costa Rica, and Ecuador), which would correspond to T. dimidiata dimidiata; Group 1B (specimens from Panama and Colombia), corresponding to T. dimidiata capitata; and Group 2 (samples from central and northwestern Mexico, Guatemala and Belize), corresponding to T. dimidiata maculipennis. Results based on cuticular hydrocarbon patterns support the existence of these ''three subspecies'' and suggest the existence of yet a fourth subspecies and a full species within the taxon T. dimidiata [20].
In summary, phenotypic and genetic studies have indicated that T. dimidiata is a complex of sibling or near-sibling taxa, although the precise number of species and subspecies and their relationship with each other remain uncertain. To further our knowledge on the taxonomy and population subdivision of this important Chagas disease vector, we present new data based on two mitochondrial gene fragments, cytochrome b (cyt b) and nicotinamide adenine dinucleotide dehydrogenase 4 (ND4).

Insect Sampling and DNA Isolation
A total of 126 T. dimidiata specimens were collected from 32 localities across the species geographic range (Table 1 and Figure 1). Ten additional specimens of five closely related Triatoma species (T. hegneri, T. flavida, T. phyllosoma, T. pallidipennis, and T. nitida) were also sampled ( Table 1). Insects were collected between 1995 and 2004 from domestic, peridomestic, and sylvatic habitats and identified following the Lent and Wygodzinsky [6] taxonomic key. Whenever necessary home/property owners gave consent for traps to be placed. One leg of each individual was stored in 95% ethanol until the DNA purification step. Extractions were performed using the Wizard Genomic DNA extraction kit (Promega, Madison, Wisconsin) following the manufacturer recommendations.

Sequence Variation and Phylogenetic Analyses
Standard genetic diversity indices such as both nucleotide (p) and haplotype (h) diversities, and number of variable and parsimony informative sites were estimated using DNASP 5.10 [23]. Tajima's D [24], as implemented in ARLEQUIN 3.11 [25], were used to test for mutation-drift equilibrium deviation in the overall sample.
The strategy employed to infer the phylogenetic relationships among T. dimidiata populations was to first generate a tree based on all cyt b sequences available, identify the main clades present, and then select at least one representative specimen of each clade to be further sequenced for the ND4 gene fragment to be used in a cyt b+ND4 combined analysis. Other Triatoma species (T. flavida, T. phyllosoma, T. pallidipennis, and T. nitida) were used as outgroups in the phylogenetic analyses. Best-fitting substitution models for each dataset were determined with JMODELTEST 0.1 [26,27] based on Akaike's information criterion (AIC [28]), which led to the selection of the Hasegawa-Kishino-Yano (HKY) model [29] with a proportion of invariable sites (+I), and gamma-distributed rate heterogeneity among the remaining sites (+G).
Phylogenies were inferred by Maximum Likelihood (ML) using PHYML 2.4.4 [30]. Bootstrap node support values were estimated from 1000 pseudoreplicates. ML trees were also submitted as user trees to MRBAYES 3.1.2 [31] for a Bayesian analysis. Posterior probabilities of phylogenetic trees were estimated by a 10,000,000generation Metropolis-coupled Markov chain Monte Carlo (MCMC) simulation (four chains, chain temperature = 0.2) under the HKY+I+G model of substitution, with parameters estimated from the dataset. A majority-rule (50%) consensus tree was constructed following 200,000 burn-in generations to allow likelihood values to reach stationary equilibrium. Identical conditions were used for the cyt b and the combined (cyt b+ND4) datasets.

Population-level Inferences and Divergence Times
Mean intra-and inter-group Kimura 2-parameters genetic distances [32] were estimated in MEGA5 [33], with standard errors estimated by bootstrapping (pseudoreplicates). A medianjoining network analysis [34] was performed using NETWORK 4.5.1.6 (http://www.fluxus-engineering.com). The maximum number of mutations between haplotypes within the same network for cyt b was 61. The 95% connection limit was not used because of the high levels of sequence divergence, which would cause an undesired fragmentation of the network.
Principal component analysis (PCA) was used to classify all input sequences at once into one or more groups. Variable sites from the nucleotide sequence dataset were selected and nucleotide bases were coded (A = 1, C = 2, G = 3, T = 4) and combined in an alignment matrix, where each row represents a specimen's DNA sequence. This alignment matrix was then converted into a genetic distance matrix as implemented in GENALEX 6.4 [35].
DNASP was used to generate distribution plots of pairwise sequence differences. No attempt was made to compare the observed distributions with expected distributions, because all models available in the software for producing expected distributions are suitable only for population-level analysis.
We used cyt b to estimate divergence times as we had more data for this particular gene fragment (both for specimens and haplotypes), and because reliable evolutionary rate estimates are available for this gene [36]. The time to the most recent common ancestor (tMRCA) was estimated for all major genetic groups revealed in the phylogenetic analyses using a Bayesian approach with BEAST 1.6.1 [37]. The analysis was performed using an HKY+I+G model of nucleotide substitution with gamma-distributed rate variation among sites and four rate categories -the substitution model selected using JMODELTEST. We used the suggested divergence rate of 1.1 to 1.8% per Myr [36]. The Yule process was chosen as speciation process for our data set. Results from two independent runs (20,000,000 generations, with the first 2,000,000 discarded as burn-in and parameter values sampled every 1000 generations) were analyzed with TRACER 1.5 [38] to assess convergence and confirm that the combined effective  Table 1 T. dimidiata    sample sizes for all parameters were .200, ensuring that the MCMC had ran long enough to produce valid estimates for the parameters [39]. The dating process was based on all specimens per group to calculate the distance (time) to the nearest node that determines each group.

Sequence Variation and Phylogenetic Analyses
A total of 126 cyt b (621 bp long) and 47 ND4 sequences (554 bp long) were produced for T. dimidiata. In addition, four 4 cyt b and one ND4 sequences were generated for T. hegneri, along with nine sequences (three cyt b and six ND4) for the outgroup species (Table 1). Five T. dimidiata cyt b sequences previously reported [17] were retrieved from GenBank and also included in the analyses. There was no indication of the presence of pseudogenes or numts among the sequences generated as no indels or stop codons were detected and all sequences appeared to be evolving (i.e. accumulating mutations) as expected for normal mtDNA protein coding genes. Inspection of the T. dimidiata mtDNA sequences revealed the existence of 97 and 36 unique haplotypes for the cyt b and ND4 gene fragments, respectively. Basic statistics are presented in Table 2. Neutrality tests failed to reject the null hypothesis that mtDNA sequences were evolving in a neutral manner in the studied species (Tajima's D: P.0.70; Table 2).
Saturation plots for both cyt b and ND4 gene fragments (third codon position transversion and transition substitutions against HKY+G+I distances), show no indication of saturation (results not shown).
ML and Bayesian phylogenetic methods yielded the same tree topologies for both datasets used (cyt b and cyt b+ND4). However, the analysis of the larger cyt b+ND4 combined dataset (1175 bp) led to the resolution of the Group I/Group II/T. hegneri polyphyly that was not discriminated in the cyt b tree (Figure 2a).
Both cyt b and cyt b+ND4 tree topologies indicate the existence of four well-defined monophyletic groups: Group I includes samples from Mexico (Chiapas), Guatemala, Honduras, El Salvador, Nicaragua, Costa Rica, Panama, Ecuador, and Colombia; Group II comprises the westernmost samples from Mexico but also includes specimens from Tabasco and Petén; Group III includes specimens from Petén (Guatemala), Yucatán (Mexico), and Belize; and Group IV includes sylvatic samples from Belize. T. hegneri, from the island of Cozumel, Mexico, appears as yet another independent lineage within the range of between-group variability observed (Figures 2 and 3). Mean Kimura 2parameters pairwise cyt b genetic distances among Groups I-IV plus T. hegneri were very high, ranging from 0.080 to 0.155 (Table 3). Bootstrap support values (above 50) are given above branches; Posterior probabilities for the Bayesian analysis are given below branches. Branch color codes indicate each of the four different genetic groups (plus T. hegneri) that comprise the T. dimidiata species complex. The three haplotypes in blue (Nic, Ec3, and Ec5) call attention to the high genetic similarity between specimens from Manabi, in Ecuador, and those from Nicaragua, indicating that T. dimidiata populations from the latter most likely represent the source of insects that were recently introduced into Ecuador. doi:10.1371/journal.pone.0070974.g002 Group I is the most widely distributed and genetically variable clade (Figure 4), with within-group divergence reaching values as high as 8.5% (when Colombian haplotypes are compared with Central American haplotypes). Thus, we can suppose that it might harbor more than one species. Observation of the cyt b tree (Figure 2a) towards the lower part of Group I reveals a striking similarity between haplotypes obtained from a specimen collected in Masaya, Nicaragua (Nic) with those obtained from specimens from Manabí, in Ecuador (Ec3 and Ec5), suggesting a very recent common origin.

Population-level Inferences and Divergence Times
With the observation of the magnitude of the inter-population divergence levels of the mtDNA sequences generated, and after unsuccessful attempts to extract meaningful population-level information from the data, we realized that it would be worthless (uninformative) to proceed with regular population-level inferences such as F ST or AMOVA, and therefore decided to exclude such analyses from this paper.
The median-joining haplotype network shows that most specimens presented unique haplotypes ( Figure 3). Moreover, highly divergent haplotypes were found in Petén, Guatemala, which segregated into different parts of the network. Conversely, certain haplotypes found in geographically distant (Ecuador and Nicaragua) were strikingly similar. Overall, the network displayed the same groups detected in the phylogenetic analyses. The median-joining haplotype network illustrates the intricate genetic structure that characterizes Group I.
PCA based on cyt b sequences alone revealed only three groups, with T. hegneri falling within Group I, while PCA of the combined cyt b+ND4 dataset resolved the same four groups (plus T. hegneri) recovered in the phylogenetic analyses (Figures 5a, 5b).
Results of within-and between-group comparative analysis of mismatch distributions are depicted in Figure 6. Mismatch distribution within Group III exhibits a unimodal distribution, with most pairwise comparisons revealing small genetic distances. Mismatch distributions within Groups I and II are multimodal and ragged, and contain a larger proportion of comparisons resulting in larger genetic distances. All inter-group mismatch distributions are clearly multimodal and have similar shapes, with most pairwise comparisons revealing large genetic distances; this same pattern is evident when including all T. dimidiata species complex members (Figure 6a). Figure 6e represents the mismatch distribution for individuals from Petén, Guatemala, where representatives of Groups I, II, and III occur in sympatry. The mismatch frequency distribution is multimodal, with visibly separated peaks that reflect the absence of haplotypes with intermediate genetic distances. This clearly suggests the existence of reproductive barriers isolating these    groups from one another. These Petén specimens are therefore very likely to belong to different biological species; divergence time estimates suggest that they have been evolving independently for at least about five million years (Table 4).

Discussion
Since its description by Pierre André Latreille, in 1811 (as Reduvius dimidiatus), the taxonomy of T. dimidiata has been a topic of controversy (reviewed in [14]). Central to the debate was the issue of whether morphologically recognized subspecies should merit specific status. Lent and Wygodzinsky [6] put an end to the dispute by concluding, after the examination of 160 specimens from the species' distribution range, that the differences observed were ''clinal in nature'', and thus, all morphological types should be considered variations within the same species. Our results unmistakably reject this hypothesis.
We report the existence of very high levels of mitochondrial DNA (cyt b and ND4) sequence divergence among populations of T. dimidiata sampled throughout its geographic range. Bayesian and ML phylogenetic analyses indicate the existence of five well defined monophyletic groups, including the formally described species T. hegneri from the island of Cozumel. Group I stretches from Southern Mexico (Chiapas), all the way through Central America into Colombia, with Ecuadorian specimens resembling Nicaraguan material; Group II comprises samples from western and northwestern Mexico, as well as from Petén in Guatemala; Group III includes specimens from the Yucatán peninsula (including Petén, Cozumel and domestic specimens from Belize); and Group IV includes sylvatic samples from Belize (Figures 2 and  4). We will argue that each of these groups merits specific status.

Hypothesis-testing and Taxonomic Implications
The comprehensive study of Bargues et al. [18] on the phylogeography of T. dimidiata based on ITS-2 sequences greatly advanced our knowledge on the taxonomy and evolution of this important vector. Our mtDNA-based results corroborate those of Bargues et al. [18] in the sense that they identify, overall, the same genetic groups present within T. dimidiata s.l. (Figure 7). However, it appears that the ITS-2 region may be too conserved to fully resolve the phylogenetic relationship among those different genetic groups [17]. By adding resolution to this matter the mtDNA gene fragments bring about a few discrepancies. For example, ITS-2 sequence data place samples from Panama together with those from Colombia, in sub-group 1B, and position T. hegneri specimens within group 2 (sensu Bargues et al. 2008 [18]). The mtDNA markers used here seem to be more appropriate for this level of taxonomic investigation. Being less conserved, and thus more informative, they allow for the detection of readily recognizable, well supported monophyletic groups.
The level of sequence divergence between groups I to IV exceeds the value of 8% reported to separate several closely related Triatoma species [40]. The smallest distances observed here resulted from the comparison of Groups I and II (0.088); all other inter-group comparisons gave values that surpass 13%. T. hegneri and Group I cyt b sequences differ by an average of 8% (Table 3).
Three distinct chromatic forms of T. brasiliensis from northeast Brazil analyzed with the same marker (cyt b) showed large genetic distances (d.0.075), which led to their recognition as members of a species complex [40]. Two of these forms were subsequently formally raised to the specific level [41,42]. Divergence levels of the same magnitude (d.9%), again coupled with chromatic differences, led to the proposition that T. rubida cochimiensis should be considered a full species [34]. Other recent comparisons between valid Triatoma species based on the same marker are T. nitida vs. T. rubida sonoriana/uhleri (d = 10-11%) and T. longipennis vs. T. recurva (d = 11%; [34]). In addition to the very high mtDNA genetic distances among the T. dimidiata groups here described, high values of ITS-2 sequence divergence were also reported for haplotypes belonging to groups I and II (5.62%), which, according to the authors of the study, is suggestive of speciation [18]. These are very convincing arguments in favor of the hypothesis that T. dimidiata is a true species complex.
Group I is the most geographically widespread and genetically variable. Pairwise within-group genetic distances can be as high as 8.5%. The divergent samples from Colombia appear as a monophyletic sister clade with respect to all other specimens in the group (which are predominantly from Central America). Colombian specimens were once described as T. dimidiata capitata on morphological grounds [43], to be later synonymized [6]. Thus, it is fair to speculate that this group might conceal yet another biological species.

Sympatric Occurrence of Different Genetic Groups
Sympatry between Groups II and III is well documented in the Yucatán peninsula [17]. Although there seems to be extensive hybridization, reproductive isolating barriers (RIBs) do exist (such as reduced viability of female hybrids [44]) and appear to prevent the two species from merging into a single entity. This is a compelling argument in favor of the validation of Group III insects as a different species, as previously suggested [16,18].
Remarkably, in Petén, Guatemala, there is not only overlapping occurrence of Groups II and III as in Yucatán, but also of Group I insects (Figure 4). Mismatch distribution results reveal multimodality caused by the absence of haplotypes with intermediate genetic distances among groups ( Figure 6). This is a very

Divergence Times and Biogeography
The Isthmus of Tehuantepec is known to represent an important recent geological barrier for a number of sister taxa of birds, mammals, and butterflies [45]. It has been shown to be a phylogeographical barrier to both highland [46] and lowland species [47]. Given the present distribution of the genetic groups revealed by the mtDNA fragments analyzed in this study, it is tempting to speculate that the Isthmus of Tehuantepec orogeny split the original population and caused the allopatric generation of Groups I and II.
The isolation that might have led to the origin of Group III insects from the Yucatán peninsula could be explained by changes in climate and vegetation that took place particularly during the Pleistocene period. Lee [48] suggests that a period of Pleistocene aridity, during which there was a continuous subhumid to xeric habitat, extended from the Pacific side of Mexico across the Isthmus of Tehuantepec to the gulf coast and from there to the Yucatán Peninsula. The increase in humidity together with the introduction of mesophytic vegetation in the area resulted in an isolation of this subhumid environment from the west of Mexico, leading to speciation.

Triatoma Dimidiata in Ecuador
Bargues et al. [18] proposed that Ecuadorian T. dimidiata populations may have derived from recently introduced specimens originally from the Guatemala-Honduras-Nicaragua region, as a result of human migrations. This view was further supported by subsequent molecular analyses [15,18] and by ecological and biogeographic observations, including the absence of records of wild populations in Ecuador (in contrast with abundant observations elsewhere) and the discontinuous distribution of the species, with Ecuadorian populations isolated from their Colombian relatives by the Central Colombian Massif and the humid Chocó eco-region [10]. The fact that T. dimidiata populations seem to have disappeared from some formerly infested rural areas of Ecuador [49,50] and appear to persist only in a few urban foci (Abad-Franch F, pers. obs.) also matches the predictions of the artificial introduction hypothesis. As shown in the cyt b tree (highlighted in blue on Figure 2a) and the haplotype network (Figure 3), there is a striking similarity between haplotypes obtained from a specimen collected in Masaya, Nicaragua (Nic) and from Ecuadorian material (Ec3 and Ec5). This genetically pinpoints T. dimidiata populations from Nicaragua as the most likely source of insects to have colonized Ecuador sometime in the recent past.

Lanquín Cave Specimens
Studies based on morphometry [51], RAPD [52], antennal sensilla [53] and cuticular hydrocarbons [20,54] of cave-dwelling specimens from Lanquín, Alta Verapaz, in Guatemala, revealed great phenotypic divergence from all other T. dimidiata populations analyzed. The differentiation was so remarkable that it was suggested that these insects could represent an incipient species [51,54]. A different interpretation was put forth by Bargues et al. Figure 7. This figure shows how the topology recovered for the T. dimidiata species complex based on the phylogenetic analysis of ITS-2 sequence data of Bargues et al. [18] (a) compares to the one derived from the mtDNA sequence data (cytb+ND4) presented in this paper (b). Examination of this new figure shows that ITS-2 groups 1, 2, and 3 of Bargues et al. are essentially the same as our mtDNA groups I, II, and III (i.e. they include specimens collected from the same geographic areas). Branch color codes in ''b'' indicate each of the four different genetic groups (plus T. hegneri) that comprise the T. dimidiata species complex. See Discussion section for more details on the few incongruities between the two topologies and on how these were interpreted and discussed. doi:10.1371/journal.pone.0070974.g007 [18], based on the phylogenetic analysis of the ITS-2 region of the rDNA, that these specimens would have derived from the ancestor which gave rise to the subspecies T. d. dimidiata. Yet another result, also derived from the ITS-2 marker, contradicts the former and depicts Lanquín samples as a separate independent lineage [17].
Our results indicate that the Lanquín cave specimens are no different from other T. dimidiata Group I specimens from Central America (see haplotypes GuVe4, GuVe5, and GuVe6 in the upper portion of Group I, and GuVe3 close to haplotypes GuIz2 and GuJu3 in Figure 2). This suggests that Lanquín cave-dwelling specimens represent a striking case of phenotypic plasticity, most likely related to micro-habitat adaptation, within a single genetic cluster.

Samples from Belize
Sylvatic specimens from Belize (Cayo District) represent a different species which constitutes the most basal lineage of the T. dimidiata species complex, as previously suggested based on cuticular hydrocarbon patterns [20]. Divergence time estimates show that this lineage has been evolving independently for approximately 8.25 My (Table 4). These insects are clearly different from the domestic Belize specimens studied by Dorn et al. [17], which belong in Groups I and III (Figure 2). A possible explanation for this incongruence is that the specimens we studied were collected in the Rio Frio Cave. Interestingly, unlike the specimens collected from the Lanquín caves in Guatemala, these insects are quite large and present lighter tegument coloration throughout all developmental stages (Marcet PL, Dotson EM, pers. obs.).

Concluding Remarks
Bargues et al. [18] state, in the Discussion section of their paper, that -''Results of the present study do not support the rise of the abovementioned subspecific taxa to species level for the time being, although it is evident that in the three cases relatively long divergence processes have taken place. Similar genetic studies with other molecular markers may contribute to a more complete assessment of these evolutionary isolation and speciation processes.'' We believe we have made an important contribution toward that end. The data presented here unmistakably reject the hypothesis that the intraspecific variation seen in T. dimidiata is clinal. The results further support previous analyses that indicated the existence of clearly recognizable genetic groups within T. dimidiata. We report the finding of very high levels of mitochondrial DNA (cyt b and ND4) sequence divergence among monophyletic populations of this vector which are incompatible with current views that regard most of these populations (with the exception of the Yucatán clade) as subspecies. We alternatively defend the interpretation that all four genetic groups revealed herein merit specific status. All the evidence presented strongly supports the proposition that T. dimidiata is a complex of five species (as it also includes T. hegneri) that play different roles as vectors of Chagas disease, from the apparently strictly sylvatic populations of Group IV in Belize to the heavily synanthropic populations (Groups I and II) in Mesoamerica, Colombia and Ecuador -and with the Yucatán clade (Group III) apparently presenting intermediate behavior.