Genetic structure and diversity in Brazilian populations of Anastrepha obliqua (Diptera: Tephritidae)

Anastrepha obliqua (Macquart), the West Indian fruit fly, is one of the most economically important pest species in the Neotropical region. It infests an extensive range of host plants that includes over 60 species. The geographic range of A. obliqua extends from northern Mexico to southern Brazil and includes the Caribbean Islands. Previous molecular studies have revealed significant genetic structure among populations. We used sequences from a fragment of the mitochondrial protein-coding gene cytochrome c oxidase I to estimate the structure and genetic diversity of A. obliqua populations from Brazil. We analyzed a total of 153 specimens from the Amazon Forest, Atlantic Forest, Cerrado, and Caatinga biomes. Our study revealed weak genetic structure among the Brazilian A. obliqua populations sampled. Collections from the Amazon Forest had haplotype diversity similar to previously reported estimates for collections from the Caribbean, and the two groups are closely related to each other, thus challenging the hypothesis that A. obliqua originated in the Caribbean and then moved to other regions of the Americas. Therefore, further evidence is necessary to draw a definitive conclusion about the putative center of origin for A. obliqua. Additionally, we suggest a putative historical migration from west to east for the Brazilian A. obliqua populations, which could explain the high genetic diversity of this fly in the Amazon Forest and the low genetic diversity in the other Brazilian biomes.

Introduction
Tephritidae is one of the largest families of Diptera, with over 4,500 species worldwide, and includes some of the world's most economically important fruit pests [1,2]. The genus Anastrepha (Diptera: Tephritidae) is endemic to the Neotropical region, where it is widespread and comprises around 296 species [3]. In Brazil, 120 species of Anastrepha have been reported infesting over 275 hosts in 48 plant families [4].
Anastrepha obliqua (Macquart), the West Indian fruit fly, is the second most polyphagous species within this genus in Brazil and therefore one of the most economically important pest species. It belongs to the fraterculus taxonomic group, which encompasses a total of 34 formally described species that can be distinguished only by morphological characters [2]. The aculeus and tip lengths of A. obliqua vary along a geographic cline and even among specimens reared from the same host in Brazil [5]. This species ranges from northern Mexico to southern Brazil and also occurs in the Caribbean Islands [6]. It has been recorded occasionally in Florida, Texas, and California [6], all of which are outside its natural distribution [7]. It infests an extensive range of hosts, over 60 species from 24 plant families, with a strong preference for anacardiaceous species [6-8]. In Brazil, A. obliqua has been reported infesting 49 hosts [4].
Molecular studies using mitochondrial DNA (mtDNA) sequences revealed high genetic diversity among A. obliqua populations [9,10]. In a phylogenetic reconstruction of the fraterculus group based on the cytochrome oxidase subunit I (COI) gene [9], eight A. obliqua individuals clustered into two clades: (I) a clade containing Brazilian A. obliqua, Anastrepha fraterculus (Wiedemann), and Anastrepha sororcula specimens; and (II) a monophyletic lineage of four A. obliqua from Mexico, Brazil, and Colombia. Therefore, A. obliqua populations were not recovered as a monophyletic group, raising the question of whether A. obliqua is a cryptic species complex.
A molecular study by Ruiz-Arce et al. [10] on the West Indian fruit fly sequenced fragments of two mtDNA genes [COI and NADH dehydrogenase subunit 6 (ND6)] to test the population genetic structure of collections from Mexico, Central America, the Caribbean, and eastern Brazil. The results revealed six distinct genetic clusters in the species that are strongly associated with geography. The genetic differences separating these clusters were similar to the divergence levels separating other species within the fraterculus group, suggesting there could be cryptic species in A. obliqua. The study also showed high haplotype diversity among specimens from the Caribbean, and thus those authors raised the hypothesis that the Caribbean, most likely the Greater Antilles, could be the center of origin of A. obliqua. In a recent study, Scally et al. [11] analyzed intraspecific relationships within A. obliqua and interspecific relationships within the fraterculus group using a multi-locus data set (seven nuclear and two mitochondrial loci). The results from the nuclear loci support monophyly of this species; however, analysis of the mtDNA sequences showed high variability, revealing mito-nuclear discordance in A. obliqua. Those authors suggested that mitochondrial introgression between A. obliqua and A. fraterculus is one possible explanation for this discrepancy and that, alternatively, incomplete lineage sorting could have caused the high mitochondrial diversity observed in A. obliqua.
Variation among A. obliqua populations has also been documented in behavioral and ecological studies. In Mexico, A. obliqua matings took place in the early morning [12], whereas A. obliqua populations in Brazil mated in the afternoon [13]. Additionally, morphometric studies have detected divergence between populations of A. obliqua in Colombia, which formed two groups, suggesting that more than one biological entity could be present within the nominal species [14].
The study by Ruiz-Arce et al. [10] detected population structure across relatively broad regions of the Americas but did not investigate structure within Brazil because of limited fly collections. It is possible that A. obliqua populations are structured among the different Brazilian biomes. In-depth knowledge of the genetic diversity and population history of a species is important for pest management because it improves the capacity of action agencies and pest programs to predict how the species will react to environmental changes [15,16]. Here we sequenced a portion of the mitochondrial gene COI to evaluate the genetic structure and diversity as well as the historical demography of A. obliqua populations in four Brazilian biomes: Amazon Forest, Cerrado, Caatinga, and Atlantic Forest.

Ethics statement
A permit for the collection of insects was provided by SISBIO (permanent permit number 30206-1 to RA). No collections involved endangered or protected species.

Sampling, DNA extraction, amplification, and sequencing
Anastrepha obliqua samples were collected in 33 localities covering four distinct Brazilian biomes (Fig 1 and Table 1). Adults were either collected using McPhail traps or reared from ripe or ripening fruit collected randomly both from tree canopies and from the ground (recently fallen). Voucher specimens were stored in 100% ethanol at -20˚C prior to the molecular procedure. Total [...] Cycling conditions for amplification were 3 min at 94˚C, followed by 39 cycles of 1 min at 94˚C, 1 min at 50˚C and 2 min at 72˚C, and a final extension step of 10 min at 72˚C. PCR products were purified using exonuclease I of Escherichia coli (EXOI) and shrimp alkaline [...] The forward and reverse sequences were imported into BIOEDIT v7.0.5.2 [18] to produce a consensus sequence for each sample, and a multiple sequence alignment was generated using Clustal W. All edited sequences were deposited in GenBank (accession numbers KY996561-KY996713).

Population structure, phylogenetic relationships, and genetic diversity
We initially delineated populations according to Brazilian biomes, as follows: Amazon Forest (AM), Cerrado (CE), Caatinga (CA), and Atlantic Forest (AF). We constructed a haplotype network using the median-joining method [19] in Network v4.603 (www.fluxus-engineering.com) to infer the relationships among haplotypes and their geographical distribution. We examined genetic structure among populations using F-statistics [20] and Analysis of Molecular Variance (AMOVA) in Arlequin v3.1 [21], considering two hierarchical levels. We also ran two additional AMOVA analyses, each with three hierarchical levels. For the third level, we considered (1) four hypothetical groups according to Brazilian biomes (AM, CA, CE, and AF), and (2) two groups (AM versus CA, CE, and AF) according to our phylogenetic and network analyses (see Fig 2). We evaluated the correlation between genetic and geographic distance among the populations using Mantel tests [22] implemented in IBDWS v3.23 [23]. Bayesian inference (BI) was conducted using MrBayes v3.2.4 [24] to infer the phylogenetic relationships among haplotypes. Additionally, we performed Neighbor-Joining (NJ) and Maximum Likelihood (ML) analyses using MEGA v6 [25], each with 1,000 bootstrap replicates, including COI sequences from Caribbean flies previously published by Ruiz-Arce et al. [10]. The appropriate nucleotide substitution model (herein GTR+Γ) was selected using JModeltest v3.7 [26]. The K2P model was used to calculate pairwise distances using MEGA v6 and Arlequin v3.1. BI was carried out with two simultaneous runs of 10 million generations each, with four chains, sampled every 1,000 generations. We used Tracer v1.6 [27] to examine run convergence through ESS values (> 200). The first 25% of the trees were discarded as burn-in and the remaining trees were used to produce the maximum credibility tree, which was visualized and edited using FigTree v1.4.0 [28]. A sequence of Anastrepha serpentina (Wiedemann) obtained from GenBank (accession number RX130130) was used as the outgroup.
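To illustrate the K2P pairwise distances mentioned above, the following minimal Python sketch computes the Kimura two-parameter distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. It is not the MEGA or Arlequin implementation; the function name k2p_distance and the decision to skip sites with gaps or ambiguous bases are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration only. Sites where either
    sequence has a gap or an ambiguous base are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = 0
    transversions = 0
    compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For small divergences such as those among conspecific haplotypes, the K2P distance is only slightly larger than the uncorrected proportion of differing sites.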
Finally, we calculated the nucleotide diversity (π), number of haplotypes (h), haplotype diversity (Hd), and number of polymorphic (segregating) sites (S) for the entire dataset and for each group using DnaSP v5.0 [29].
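The four summary statistics listed above can be sketched in a few lines of Python. This is an illustrative calculation under simple assumptions (equal-length aligned sequences, no gaps or missing data), not the DnaSP procedure; the function name diversity_stats and the dictionary keys are hypothetical. Hd is computed as n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i, and π as the mean number of pairwise differences per site.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return S, h, Hd and pi for a list of aligned, gap-free sequences.

    Hypothetical helper for illustration; assumes at least two
    sequences of identical length.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of alignment columns with more than one nucleotide state.
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # h: number of distinct haplotypes; Hd: haplotype diversity.
    counts = Counter(seqs)
    h = len(counts)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # pi: mean number of pairwise differences, divided by sequence length.
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    pi = total_diff / n_pairs / length

    return {"S": segregating, "h": h, "Hd": hd, "pi": pi}
```

Calling diversity_stats on the aligned sequences of one biome group would give the per-group values; calling it on all sequences would give the values for the entire dataset.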

Historical demography
Demographic history was inferred from neutrality tests, under the assumption of constant population size, using Tajima's D [30], Fu's Fs [31], and the R2 statistic [32], calculated in DnaSP v5.0. Significance values for these tests were obtained with 10,000 coalescent simulations, testing the hypothesis that all mutations are selectively neutral. We also estimated mismatch distributions [33] and calculated the Sum of Squared Deviations (SSD) and Harpending's raggedness index (rg) in Arlequin v3.1 [21]. Specifically, for these tests we assumed two groups, AM and the eastern portion of South America (CE, CA, and AF). We expected to detect a signal of population expansion in the colonized region but not at the center of origin.
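As a rough illustration of the first of these neutrality tests, the sketch below computes Tajima's D from the sample size n, the number of segregating sites S, and the mean number of pairwise differences k, using the standard constants a1, a2, b1, b2, c1, c2, e1, and e2. It does not reproduce the coalescent simulations used to assess significance, and the function name tajimas_d is hypothetical.

```python
import math


def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S and mean
    pairwise differences k (not divided by sequence length).

    Hypothetical helper for illustration; significance testing by
    coalescent simulation is not included.
    """
    if n < 4:
        raise ValueError("need at least four sequences")
    if S <= 0:
        raise ValueError("D is undefined when there are no segregating sites")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

Negative values arise when there is an excess of low-frequency variants relative to the neutral, constant-size expectation, which is the pattern expected after a population expansion.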
Additionally, we performed a Bayesian Skyline Plot (BSP) analysis [34] implemented in BEAST v1.8.2 (Bayesian Evolutionary Analysis Sampling Trees) [35] to estimate changes in effective population size over time. We implemented a strict molecular clock prior with a standard invertebrate mitochondrial divergence rate (1.5% per million years) [36], due to the absence of useful fossil calibrations in Anastrepha. We performed two replicate runs with 100 million generations, sampling trees every 10,000 generations. Tracer v1.6 [27] was used to verify convergence (ESS > 200). Replicate runs were combined after a burn-in of 20% using LogCombiner v1.8.2.

Results
Our sequence data set comprised 621 base pairs (bp) of the mtDNA COI fragment from 153 specimens. The sequenced region contained 27 variable sites defining 20 different haplotypes. No premature stop codons or indels were found in the COI fragment. The most common haplotype (H1) was detected in 66% (n = 101) of the samples and it was observed over a widespread geographic range and among all geographic groups (CE, CA, AM, and AF). The collections from AM showed 15 haplotypes, out of which 14 were singletons. The CE, CA, and AF populations showed two, four and two haplotypes, respectively (Table 2). The average genetic distance varied between 0.2% and 1.6% among A. obliqua haplotypes (S1 Table).
Our unrooted haplotype network (Fig 2a) demonstrates that the Brazilian flies are genetically diverse, with many haplotypes separated by multiple mutation steps. Although the eastern portion of South America (CE, CA, and AF) represented the majority of individuals analyzed (n = 83; 54.24% of our total sampling), these populations showed only six haplotypes, reflecting low genetic diversity. When the flies from the Amazon Forest are excluded, the network exhibits a star-like topology with short branch lengths in which most of the unique haplotypes are closely related to the common central haplotype (H1) (data not shown). Our BI tree was congruent with the unrooted haplotype network (Fig 2b). Additionally, our NJ and ML analyses comprising Caribbean and Brazilian flies showed that the Caribbean haplotypes form a clade with haplotypes from the northern Amazon region (Fig 3).
Our AMOVA tests did not find any evidence that an explicit physical barrier or host association drives genetic structure in Brazilian A. obliqua populations (Tables 3 and 4). Our Mantel tests indicated a non-significant correlation between genetic and geographic distance (r = 0.0372, p = 0.2450).
The genetic diversity scores inferred from COI for the entire dataset and for each group are shown in Table 5. The AM populations showed higher genetic diversity (Hd = 0.8298 and π = 0.00847) than the remaining groups (an approximately 8-fold increase).
Neutrality and mismatch distribution tests were performed for the two biome groupings (AM and AF+CA+CE) separately as well as for the total A. obliqua data set. For the populations from the eastern portion of South America (CE, CA, and AF) considered alone, the neutrality tests revealed a genetic signature of demographic expansion and the mismatch distribution was unimodal, fitting a sudden expansion model. Conversely, the mismatch distribution for the AM group was bimodal (Fig 4). The SSD and rg estimates for the CE, CA, and AF group were not significant, which indicates that the data fit a model of population expansion relatively well [37] (Table 5).
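The mismatch distribution and raggedness index referred to above can be sketched as follows. The example builds the observed distribution of pairwise differences and computes Harpending's raggedness index as the sum of squared differences between the relative frequencies of adjacent mismatch classes, including the final drop to zero after the largest observed class. It is an illustration only, not the Arlequin implementation, and it omits the fit to the sudden expansion model and the bootstrap-based significance test; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of sequence pairs differing at 0, 1, 2, ... sites.

    Hypothetical helper; assumes aligned, gap-free sequences.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    diffs = [
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    ]
    counts = Counter(diffs)
    total = len(diffs)
    return [counts.get(i, 0) / total for i in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's raggedness index of an observed mismatch distribution."""
    # Append a zero class so the drop after the last observed class counts.
    padded = list(freqs) + [0.0]
    return sum(
        (padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded))
    )
```

A smooth, unimodal distribution gives a small index, whereas a ragged or multimodal distribution, as expected for a population at long-term demographic equilibrium, gives a larger one.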

Our BSP analysis based on all groups showed demographic equilibrium followed by a slight, recent postglacial expansion (Fig 5) that began after the Last Glacial Maximum (LGM; approximately 21,000 years ago).

Population structure and genetic diversity
Previous molecular studies using mitochondrial DNA genes have shown relatively high genetic diversity in the West Indian fruit fly [9,10]. Our study also revealed high genetic diversity within Brazilian populations. However, we used a different region of the COI gene from the one used by Smith-Caldas et al. [9] and shorter sequences than those used by Ruiz-Arce et al. [10], which precludes direct comparison of diversity levels across studies. Ruiz-Arce et al. [10] found strong genetic structure in A. obliqua populations from Mesoamerica, western Mexico, Central America, the Andean and Caribbean regions, and eastern Brazil, suggesting the occurrence of some geographic barrier to gene flow. Our data did not detect population structure across the Brazilian biomes. In addition, no phylogeographic pattern was observed among Brazilian biomes other than low genetic diversity for biomes in the eastern portion of Brazil. The H1 haplotype is predominant in the CE, CA, and AF biomes and present in the AM biome (specifically Amapá and Pará), indicating that there was gene exchange between the A. obliqua populations. The lack of population structure could be due to human-aided dispersal and/or to natural interbreeding of these populations in the field. The fly populations from the CE, CA, and AF biomes had low mtDNA haplotype and nucleotide diversity values, which could indicate that these populations have experienced a recent expansion from a small effective population size or after a bottleneck. Low haplotype and nucleotide diversity can indicate that demographic events, such as a decline in population size or a founder effect, have affected populations [38]. When the flies from the Amazon Forest are excluded from the data set, the haplotypes form a network with a star-like topology with haplotype H1 at the center.
Low levels of genetic diversity were detected in A. obliqua populations from Mesoamerica and Western Mexico [10]. Similarly, Lima 2011 [39] analyzed A. obliqua populations from Northeastern and Southeastern Brazil and also observed low genetic diversity using nuclear genes, which could be associated with founder and bottleneck events followed by a recent [...]
Considering our sampling, the populations from the Amazon region showed the highest levels of haplotype (Hd = 0.8298) and nucleotide (π = 0.00847) diversity. These populations had several private haplotypes, which could have resulted from limited migration and gene flow among them. Ancestral populations generally show high levels of genetic diversity in contrast with populations established more recently [40]. Moreover, it has recently been proposed that the geographic regions with populations showing high levels of genetic diversity acted as dispersal centers for the fruit flies Bactrocera dorsalis (Hendel) and Anastrepha ludens (Loew) [41,42]. Based on our current sampling for the Amazon region, we see a pattern of high haplotype diversity (Hd = 0.814), which is very similar to what was observed in populations from the Caribbean, considered the probable center of origin for A. obliqua [10]. Interestingly, Caribbean haplotypes are closely related to haplotypes from the northern Amazon region, suggesting a connection between these two geographic regions (Fig 3). Some of the sequences seen among Caribbean samples are not exclusive to that region, as they are shared with others, including Mesoamerica [10]. Both the Caribbean and Amazon represent regions of high diversity for this fly that arose independently from an ancestral source. Thus, further evidence is necessary to draw a definitive conclusion about the putative center of origin for A. obliqua. Resolving the origin and pathway for these sequence types may require additional and unlinked markers.
With respect to collections gathered from the biomes included in our study, the haplotype diversity estimates suggest the Amazon Forest is probably the source of this species in this region.

Demographic history
The negative values of the neutrality tests and the unimodal mismatch distribution suggest that the populations from eastern South America (CA+CE+AF) have undergone a recent demographic expansion. In contrast, the AM group exhibited a multimodal mismatch distribution, suggesting that these populations have remained demographically stable over time. Thus, we detected signals of demographic expansion in the East (CE, CA, and AF) but not in the West (AM) for Brazilian A. obliqua populations. Long-distance dispersal events tend to result in reduced genetic diversity due to sequential founding events and genetic drift [43]. The tests of demographic expansion performed by Ruiz-Arce et al. [10] for the Caribbean A. obliqua populations were incongruent: the mismatch distribution was unimodal and consistent with a population expansion event, but the neutrality tests were not significant. Hence, based only on the demographic tests, both scenarios for the origin of A. obliqua (Caribbean versus Amazon) are equally parsimonious.
Our BSP analysis revealed stability followed by a recent population expansion in the Brazilian A. obliqua populations. Similarly, Ruiz-Arce et al. [10] detected signs of demographic expansion in A. obliqua populations from Mesoamerica, suggesting a bottleneck event followed by expansion. Changes in geographic and climatic patterns over time might be associated with the distribution of genetic variability in A. obliqua populations from South America.
As our results are based solely on mtDNA sequence data, they should be viewed with some caution, since our conclusions might be biased due to the intrinsic mutation rate, mode of inheritance, and effective population size of mtDNA markers [44]. Additionally, there is evidence of introgression between A. obliqua and A. fraterculus [11], which may also be present in our samples. Thus, further studies integrating ecological, morphological, and molecular (using mtDNA and nuclear markers) data are necessary to understand the evolutionary history of the West Indian fruit fly.
Supporting information
S1 Table. Estimate of genetic distance among Anastrepha obliqua haplotypes based on sequencing of a fragment of the mitochondrial COI gene. Analyses were conducted using the K2P model. (DOCX)