Molecular Phylogenetics and Evolution Phylogenomics of the genus Tursiops and closely related Delphininae reveals extensive reticulation among lineages and provides inference about eco-evolutionary drivers

Phylogeographic inference has provided extensive insight into the relative roles of geographical isolation and ecological processes during evolutionary radiations. However, the importance of cross-lineage admixture in facilitating adaptive radiations is increasingly being recognised, and suggested as a main cause of phylogenetic uncertainty. In this study, we used a double digest RADseq protocol to provide a high resolution (~4 Million bp) nuclear phylogeny of the Delphininae. Phylogenetic resolution of this group has been especially intractable, likely because it has experienced a recent species radiation. We carried out cross-lineage reticulation analyses, and tested for several sources of potential bias in determining phylogenies from genome sampling data. We assessed the divergence time and historical demography of T. truncatus and T. aduncus by sequencing the T. aduncus genome and comparing it with the T. truncatus reference genome. Our results suggest monophyly for the genus Tursiops , with the recently proposed T. australis species falling within the T. aduncus lineage. We also show the presence of extensive cross-lineage gene flow between pelagic and European coastal ecotypes of T. truncatus , as well as in the early stages of diversification between spotted ( Stenella frontalis ; Stenella attenuata ), spinner ( Stenella longirostris ), striped ( Stenella coeruleoalba ), common ( Delphinus delphis ), and Fraser’s ( Lagenodelphis hosei ) dolphins. Our study suggests that cross-lineage gene flow in this group has been more extensive and complex than previously thought. In the context of biogeography and local habitat dependence, these results improve our understanding of the evolutionary processes determining the history of this lineage.

With the advent of next-generation genomic technologies, analyses of large genetic datasets are increasingly tenable, particularly with reduced-representation protocols such as RADseq (Rubin et al., 2012;Andrews et al., 2016). Due to the large number of sites scattered across the genome, these reconstructions present many challenges regarding the choice of appropriate evolutionary models and partitioning schemes. However, they can produce high resolution phylogenetic trees that avoid the bias of single gene trees typical of mtDNA analyses (Wagner et al., 2012;Malinsky et al., 2015;Moura et al., 2015). They can also provide insights into the possibility of reticulate evolution and thus lead to improved phylogeographic inferences and clarification of evolutionary histories (Eaton and Ree, 2013).
Here we study species in the family Delphinidae (oceanic dolphins). This group is thought to have originated from a single species radiation which started approximately 10-5 Ma, and the associated formation of distinct lineages in quick succession has made phylogenetic inference challenging for this group (McGowen, 2011). This is particularly apparent for~14 species of the subfamily Delphininae, for which no agreed topology exists (Kingston et al., 2009;McGowen, 2011;Amaral et al., 2012a,b;Perrin et al., 2013), although some patterns have begun to emerge with more data (McGowen et al., 2019). The relationships among bottlenose dolphin lineages (genus Tursiops spp.) are particularly contentious, but potentially informative concerning evolutionary processes, given the propensity for this highly mobile genus to show fine-scale patterns of population genetic structure (e.g. Natoli et al., 2004;Gaspari et al., 2015;Gray et al., 2018). The genus is distributed worldwide in both tropical and temperate seas and shows a high degree of morphological and ecological variation. Two species are currently recognized: T. truncatus (common bottlenose dolphin) which has a broad distribution across the genus range, and T. aduncus (Indo-Pacific bottlenose dolphin) which is limited to warm temperate and tropical coastal regions of the Indian and Pacific Oceans (Committee on Taxonomy, 2018). A third potential species, T. australis (Burrunan dolphin) has been identified from Australia (Charlton-Robb et al., 2011), but its validity remains uncertain (Committee on Taxonomy, 2018).  (Ronquist et al., 2012), under the partitioning substitution model scheme inferred in PARTITIONFINDER (Lanfear et al., 2017) for Tursiops and related Delphininae species (see Methods for more details). Coloured dashed and dotted lines show the geographic range for the different Tursiops species/ecotypes used in this study. Samples marked as bold identify those belonging to T. truncatus offshore ecotype.
Despite several dedicated studies, there are still uncertainties about the phylogenetic relationships between the major lineages of Tursiops, reflecting difficulties with phylogenetic reconstruction of recent species radiations more generally. Trees inferred from nuclear autosomal DNA generally support monophyly for the Tursiops lineage (Kingston et al., 2009;McGowen, 2011; but see Amaral et al., 2012a,b), while markers with uniparental inheritance such as mtDNA (LeDuc et al., 1999;Xiong et al., 2009;Moura et al., 2013) and Y linked markers (Nishida et al., 2003(Nishida et al., , 2007 support polyphyly. Although mitogenomes provide high resolution, they represent only one maternally-inherited locus and can be unreliable when recovering the 'true' species history (e.g. Nichols, 2001;Hoelzel and Moura, 2015;Moura et al., 2015;Mendes et al., 2016;Platt et al., 2017); a problem that is particularly common in groups that have experienced recent diversifications (Streicher et al., 2016). This is compounded by the possibility of cross-lineage gene flow at different points in the evolutionary history of the group, with several species of dolphins known to hybridize with Tursiops (and in the broader Delphininae) both in captivity and in the wild, which does not only involve putative sister-species (Crossman et al., 2016;Bérubé and Palsbøll, 2018).
In this study we use RADseq data together with a shotgun genome sequence of T. aduncus (compared with the published T. truncatus genome) to investigate the phylogenetic relationships of the genus Tursiops within the broader context of the sub-family Delphininae. Our main objective is to draw inference from the evolutionary history of division and admixture in this lineage, to better understand the relevant eco-evolutionary processes driving their rapid species radiation. To accomplish this we aim to resolve the question of monophyly and lineage structure for the genus Tursiops using high resolution nuclear DNA sequence data, to improve our understanding of these phylogenetic relationships within the broader Delphininae, and to consider evidence for reticulate evolution influencing the pattern of evolutionary radiation for this group in the context of their biogeography and life history.

RAD library preparation
Samples of Tursiops spp. were obtained from various tissue archives, representing most of the main species/ecotypes commonly recognized. Details regarding sources of these samples are described by Moura et al. (2013;see Fig. 1 for geographic distribution). In addition, samples were obtained for species from the broader Delphininae as defined by McGowen (2011), and with the rough-toothed dolphin, Steno bredanensis, included as an outgroup (See Supplementary Table S1 for details on all samples).
Genome-wide SNP discovery and genotyping was achieved through a modified double digest RADseq protocol (Peterson et al., 2012), starting with 250 ng genomic DNA per sample, and using MspI as the frequent cutter and HindIII as the infrequent cutter. Genomic DNA concentration was estimated using the Qubit High Sensitivity kit (Thermo Fisher Scientific), and 250 ng per sample was used for library preparation. Restriction digest was carried out for both enzymes simultaneously, in a reaction mixture of 1 X NEB Buffer 4, 12.5 mM of Spermidine, 1 mg/ml of BSA, and 100 units of each enzyme. Digestion was carried out at 37°C overnight on a thermocycler. Sequencing adapters were ligated using T4 ligase in a reaction solution containing 1 X Buffer, 400 units of T4 Ligase and 1.5 μM of adapter that contained a unique in-line barcode for each individual. Ligation was carried out in a thermocycler, with a program of 30 min at 16°C followed by 20 min at 65°C. Uniquely barcoded samples were pooled, and cleaned using a PureLink PCR Micro Kit (Invitrogen). Fragments between 400 and 600 bp were size-selected using a Pippin Prep (Sage Science). Optimal target size was determined beforehand by carrying out trial digestions for all Tursiops species/ecotypes, which were run on an agarose gel to identify potential satellite DNA array sizes to be avoided. The size selected fragments were cleaned using the PureLink PCR Kit (Invitrogen). An indexing PCR was then carried out using a Phusion Master Mix kit (New England BioLabs). Eight uniquely indexed libraries were prepared, each containing ten individually barcoded samples, with randomized allocation of species/ecotypes across libraries. Fragment size distributions of resulting libraries were evaluated using a Tapestation (Agilent), and library DNA concentrations were estimated using qPCR. Pools were then sequenced on a single lane of an Illumina HiSeq 2500 using 125 bp paired-end sequencing.

Genotype calling
Demultiplexing of raw reads was carried out using the pro-cess_radtags program in STACKS (Catchen et al., 2011). Filtered reads for all samples returning more than 1 million reads were then mapped against the bottlenose dolphin reference genome (TurTru1.83 downloaded from the Ensembl website) using BWA  and genotypes were called using the GATK UnifiedGenotyper . UnifiedGenotyper was run with a minimum base quality score of 10 and set to output all confident sites. This strategy of mapping reads to a closely related species has been previously shown to be appropriate for SNP determination from RAD data (Shafer et al., 2017). The resulting vcf files were then filtered using VCFTOOLS (Danecek et al., 2011) to remove all sites with depth of coverage below 20, with genotype quality below 20, and to remove indels. The maximum amount of missing data per site was set to 5.

Phylogenetic inference
Phylogenetic trees were generated using MRBAYES (Ronquist et al., 2012). We performed two initial analyses, one in which the dataset was un-partitioned, and one inferred under a partitioned model of nucleotide substitution. Partitioning a concatenated RAD alignment is not straightforward. The intuitive strategy of using independent models for each read is generally too computationally demanding, and the short length of reads makes accurate model testing impossible. Furthermore, the lack of robust information on the group's evolutionary patterns and chromosome assembly impairs the creation of custom models. Therefore, we chose the approach implemented in the software PARTITIO-NFINDER (Lanfear et al., 2017), which uses heuristic algorithms to identify a best possible model within reasonable computational times. The alignment was annotated into reads using a mapping strategy based on proximity of mapping reference position. All confidently mapped sites that were separated by a gap of less than 300 bp in the reference genome were grouped into individual reads, and the best partition search was done using the modified hierarchical clustering algorithm (Lanfear et al., 2014) in RAXML (Stamatakis, 2014) with rcluster-max set to 100. The best fitting model was selected using the Bayesian Information Criterion.
For the unpartitioned dataset, two independent runs of 10,000,000 iterations with the initial 25% iterations discarded as burn-in were carried out, each divided into four chains, three of which were heated. Similar settings were used for the partitioned dataset, except that the chains were run for 100,000,000 iterations necessary to achieve convergence (see results). Convergence of runs was assessed by comparison of the log-likelihood (−lnL) plots for two independent runs, and assessing whether the PSRF + statistic was close to one for all main parameters estimated. For the unpartitioned dataset, a single GTR + I + G nucleotide substitution model was used.
To evaluate the possibility of long-branch attraction (LBA) with the outgroup, we also carried out maximum-likelihood tree inference using RAXML (Stamatakis, 2014). All trees were inferred by running 10,000 rapid bootstraps followed by thorough maximum-likelihood search for the best tree, under a GTR + I + G nucleotide substitution model. The process was repeated for an alignment without the outgroup sequences (Steno bredanensis), and the bootstrap scores compared between the trees. If LBA was occurring, outgroup removal should substantially improve bootstrap scores for the nodes affected (Bergsten, 2005). To investigate other potential sources of bias, subsets of sites were identified using the strategies described below. Phylogenetic trees were generated using MRBAYES as described above, except where otherwise noted.

Identification and removal of sites potentially under selection
Sites potentially under selection were identified using the following outlier method. Considering only Tursiops samples, invariable sites were first removed from the vcf file using VCFTOOLS. The resulting SNPs were then analysed using LOSITAN (Antao et al., 2008), considering all the main Tursiops ecotypes/species as different populations (defined in Moura et al., 2013). LOSITAN uses the Fdist method described by Beaumont (2005) to compare F ST against heterozygosity for each SNP, and calculates an expected neutral distribution for all SNPs. SNPs that are found to be outliers against this distribution are inferred as being putatively under selection. It should be noted that processes other than selection could cause deviations from neutrality, e.g. extensive gene flow between lineages could lead some loci to be identified as under balancing selection. Our approach is nevertheless conservative, as LO-SITAN ensures that only sites that are behaving according to neutral expectations are kept in the dataset.
An initial run was carried out to remove putative outliers before inferring the neutral distribution. Significance was determined by 50,000 simulations and a false discovery rate of 0.05. SNPs under putative selection were removed from all species, using a similar site mapping strategy as described above, to identify individual reads based on proximity of mapping reference position. First, all sites that were separated by a gap of less than 20 bp were grouped into individual reads, then all reads containing a SNP identified as potentially under selection were removed from the alignment using VCFTOOLS. Phylogenetic trees were then built from non-outlier loci using MRBAYES, as described above.

Phylogenetic analysis of differences in GC content
To assess the effects of differences in GC content, the main alignment was separated into two datasets based on GC proportions. A 50% majority consensus sequence of the main alignment including all species was generated using GENEIOUS (Kearse et al., 2012). This sequence was then separated into individual reads using the perl script SEQCONVERTER (Bininda-Emonds, 2010) based on the mapping strategy detailed above using a 300 bp gap criteria. The R package BIOCONDUCTOR (Huber et al., 2015) was then used to calculate GC content of all individual reads, and the main vcf file was then separated into low and high GC reads, based on a threshold of 50% GC. Phylogenetic trees were then built for both the low and high GC alignments using MRBAYES as described above, but using 1,000,000 iteration runs. It should be noted that this approach can mask potential differences in GC content between clades in the tree, as it is based on average values across all sequences. This can never be fully avoided, and in Delphininae, genetic distances are not so large that substantial bias would be expected, but it should nevertheless be considered when interpreting our results.

Estimate and analysis of reticulation patterns
To infer and visualize patterns of potential reticulate evolution, we first calculated a split decomposition network using the software SPL-ITSTREE (Huson and Bryant, 2006). This creates a graphical representation of multiple credible links between taxa, but does not identify the process leading to the inferred reticulation.
To distinguish between the possibility of incomplete lineage sorting and ancestral hybridisation, an ABBA/BABA test was carried out by calculating D statistics (Durand et al., 2011). Calculations were carried out for selected clade trios and respective outgroups, based on the patterns obtained in SPLITSTREE and carried out using the R package EVOBIR (Blackmon and Richard, 2015). Two calculation strategies were carried out: 1 -a single D score was calculated using information from all sequences within each clade; 2 -for each clade with more than two sequences, a 75% strict consensus sequence was generated using GEN-EIOUS. If only two sequences were available, then the sequence with the least missing data was used. D was then calculated using the single sequence for each clade, with p-value obtained through 1000 bootstrap resampling. Since the consensus sequence will be biased towards SNPs that are fixed within the clades, results of this analysis should be thought as reflecting potential gene flow patterns that occurred at more ancient time in the lineages evolutionary history, as compared to the strategy of calculating D from all sequences.
The ABBA/BABA test is commonly used in this context, but it is based on assumptions such as equal rates of evolution between branches, and no substantial changes in effective population size. When comparing data from multiple genomic regions of varying genealogies, the consensus tree can diverge from the species tree in predictable ways (Degnan and Rosenberg, 2009). Because of this, it is possible to analyse patterns of gene discordance in phylogenetic signal, to infer potential processes underlying phylogenetic instability such as cross lineage gene-flow. Therefore, we also carried out a Bayesian Concordance Analyses (Ané et al., 2007) to test for potential cross-lineage Table 1 Results for the ABBA/BABA test using D statistics, as calculated in the R package evobiR (Blackmon and Richard, 2015). P-values marked in bold are significant at the 0.05 threshold after Bonferroni correction. Clade names translation relative to Split decomposition network (  A.E. Moura, et al. Molecular Phylogenetics and Evolution 146 (2020) 106756 reticulation. This approach generates separate phylogenetic trees for each individual gene or genomic region, and summarizes them using two methods: a consensus tree of all genomic regions, and a quartet tree where only the most common quartets of taxa are used. If discordance between "gene" trees is due to incomplete lineage sorting (ILS), the two trees should result in similar topologies, but not if discordance is caused by hybridisation. For this analysis, the different genomic regions were created by separating the main alignment into individual segments of 40,000 bp in length (relative to the whole alignment length), plus a final alignment of circa 20,000 bp. This resulted in 101 individual computationally tractable genomic segments that we considered as separate "genomic regions". We then calculated phylogenetic trees for each individual genomic region, using the GTR + I + G model of nucleotide substitution and 10,000 iterations of the MCMC chains. The resulting trees were summarized and used in a Bayesian concordance analyses (Ané et al., 2007) using the software BUCKY (Larget et al., 2010). To check for consistency of results, multiple independent runs were carried out, with MCMC lengths of 100,000 and 1,000,000 and alpha values of 1 and 5.

Reconstruction of ancestral effective population size
The large number of loci incorporated in this study and associated variation in mutation rates, make node dating difficult, imprecise and computationally demanding. In order to generate an estimate of divergence dates within Tursiops, ancestral changes in effective population size (N e ) were reconstructed for T. truncatus and T. aduncus, through the pairwise sequential Markovian coalescent (PSMC) model, implemented in the PSMC software package (Li and Durbin, 2011). When the PSMC inferences of two closely related species are plotted together, the time point where the two PSMC plots diverge approximates the divergence time, because these two species share the same N e value before divergence. The genomic sequence of T. truncatus was downloaded from the NCBI reference genome available in SRA Accession number SRX200685. One T. aduncus sample originating from South Africa was shotgun sequenced in a single lane on the Illumina HiSeq 2500, using version 4 chemistry. The library was constructed starting with 2 μg of purified DNA, using the Illumina PCR-free Tru-Seq kit following manufacturer's instructions. The short reads from both T. truncatus and T. aduncus were mapped to the Ensembl T. truncatus genome (turTru1.92) using Bowtie2 (Langmead et al., 2009), and duplicated reads were removed using SAMTOOLS ). Details about mapped reads are provided in Table 2. Average coverage is greater than 18 for both species, suggesting that the amount of sequenced reads is enough for PSMC calculation (Nadachowska-Brzyska et al., 2016). The PSMC perl scripts (Li and Durbin, 2011) were then used to produce the final N e plot with 64 atomic time intervals and 28 free interval parameters (the "-p" parameter used for PSMC was set to "4 + 25 * 2 + 4 + 6"). The scaling to real time was based on a generation time of 21.5 years (Taylor et al., 2007) and mutation rate of 1.5 × 10 −8 mutations/site/generation as described in (Moura et al., 2014). Bootstrapping was performed 100 times by breaking the consensus sequences into 5 Mb segments and randomly resampling a set of segments by replacement.
Although the estimated N e between two splitting groups is expected to deviate at time of splitting, this is not necessarily the case, especially if the two populations had the same size after the split. To avoid such bias, we also performed a pseudo-diploid approach (Li and Durbin, 2011;Prado-Martinez et al., 2013) for estimating the divergence time between T. truncatus and T. aduncus. For this, two haploid sequences from each species were hybridized into a pseudo-diploid genome using the SEQTK software (https://github.com/lh3/seqtk), followed by a PSMC estimate with this pseudo-diploid sequence. It is expected that the time point where the inferred N e rises to infinity corresponds to the divergence time between the two groups (Prado-Martinez et al., 2013). To derive haploid sequences from each species, loci with low estimated consensus quality (< 20) were excluded. For heterozygous sites, one of the alleles was excluded randomly. The mitochondrial scaffold was excluded from all the PSMC analyses. In addition, scaffolds identified as putative X chromosome regions (see Table S2 for detail) were also excluded from the PSMC computation. As a result, we obtained a pseudodiploid genome with 1.75 Gbp of length for PSMC calculation.

Results
The mean number of reads produced per sample was 5,000,396, ranging from 1,083,995 to 13,423,013 (note that samples with less than 1,000,000 reads were not included in this study). Our genotype calling strategy produced an initial alignment of 4,029,090 bp, of which 26,720 were SNPs. Most missing data occurs on five samples: the two S. bredanensis samples used as an outgroup, one South African T. aduncus, one pelagic T. truncatus and the Pakistani T. aduncus samples. However, none of these samples proved to be problematic in regard to our phylogenetic analyses, and consistently grouped within the expected clade based on ecotype/species classification (see below for more details).
All Bayesian phylogenetic trees produced in this study achieved significant convergence statistics, and showed similar -lnL values between independent runs, apart from the partitioned Bayesian analysis which showed dissimilar -lnL likelihood values between runs. This occurred in spite of running the analyses for considerably longer than others (100,000,000 iterations), which likely reflects the computational complexity of a partitioned model for RADseq data. Nevertheless, the topology obtained reflects some of the most consistent patterns obtained among the various other analyses undertaken in this study, and provides higher resolution at the intra-specific level.
There is little difference in topology between trees built under a single versus a partitioned nucleotide substitution model. Inconsistencies involved the striped and spotted dolphins, and did not meaningfully alter the phylogenetic relationships of the other groups. As the partitioned scheme mostly accurately reflects the nature of RADseq data, we report the results obtained in this tree. The phylogenetic tree resulting from the partitioned analyses shows that the genus Tursiops forms a monophyletic group, with all T. truncatus individuals forming a well-supported clade (Fig. 1). The putative species T. australis is nested within T. aduncus as a sister clade to Australasian T. aduncus (Fig. 1). Within T. truncatus, the western north Atlantic coastal ecotype forms the sister clade to all other ecotypes/populations, with the worldwide offshore ecotype grouping as a sister clade to the Mediterranean + Black Sea populations. All Black Sea individuals form a monophyletic group nested within the Mediterranean population.
The striped dolphin (Stenella coeruleoalba) is the delphinine species most closely related to Tursiops, although the posterior probability (PP) for that node is < 0.95. The next closest clade to Tursiops includes common (Delphinus delphis), spinner (Stenella longirostris) and Fraser's (Lagenodelphis hosei) dolphins. This clade is also consistently recovered in different trees produced in this study. The spotted (S. frontalis; S. attenuata) dolphin lineages form a sister clade to all other Delphininae included in this analysis. All Bayesian support values for the nodes separating spotted, striped and common + spinner + Fraser clades are 0.92. Given that several studies have shown that Bayesian posterior probability values can overestimate statistical confidence of individual nodes (Simmons et al., 2004), a value of 0.92 implies a lower confidence than other basal nodes separating different species. A.E. Moura, et al. Molecular Phylogenetics and Evolution 146 (2020) 106756 Furthermore, the exact positioning of the spotted and striped dolphins within Delphininae is variable depending on site selection. Phylogenetic trees with and without outliers show mostly congruent topologies, except in the placement of the striped dolphin (S. coeruleoalba). While in the full dataset striped dolphin (S. coeruleoalba) groups among the non-Tursiops Delphininae, when outliers are excluded it groups as the sister clade to all Tursiops (Fig. S1), similar to what is seen in the partitioned tree. Therefore, removal of outliers results in a topology similar to that obtained with the most realistic mutation model, in spite of reduced resolution of the dataset (as it removes part of the variable sites). Although the topologies recovered by the two GC content groups appear quite distinct, this is mostly due to shifts in the phylogenetic position of the striped (S. coeruleoalba) and spotted (S. frontalis; S. attenuata) dolphins lineages. However, the basal nodes of these groups have somewhat lower statistical support, particularly in the high GC content trees where it is very close to the 50% threshold (Fig. S2). In our case the higher GC content topology was most consistent with phylogenies based on the full dataset, but the distortion in the low-GC content topology only involved a few OTUs. Crucially though, the shift in position of the striped dolphin (S. coeruleoalba) caused Tursiops to become polyphyletic, the only tree in this study where this was the case. A more detailed version of the results regarding the site selection analyses is available in the supplementary material.
The split decomposition network shows extensive reticulation between Tursiops species/ecotypes (Fig. 2), reflecting the presence of several alternative links between major clades. We should note that reticulation in this figure does not necessarily identify the causative biological process. To identify whether such links reflect gene-flow between lineages, we integrate these results with additional analyses meant to address the question of admixture (such as the ABBA/BABA analyses). The most extensive reticulation is found between the Australasian T. aduncus, South African T. aduncus and T. australis, and between the coastal and pelagic T. truncatus ecotypes in the North Atlantic. Some cross links are also noticeable between clades that are taxonomically well differentiated in the broader Delphininae. Notably, the lineage leading to striped dolphins shows extensive reticulation with the lineage leading to spotted + common + spinner + Fraser's dolphins before it diversified. There is also apparent reticulation between the striped dolphin and the Western North Atlantic coastal ecotype of T. truncatus, although it likely occurred early in the histories of those lineages.
Most D-statistics calculated for ABBA-BABA comparisons involving Tursiops clades lineages are, however, not significantly different from zero, suggesting that incomplete lineage sorting may be an important factor generating the reticulations found among these lineages (Table 1 -Trios 1-4). In contrast, D-statistics are significantly divergent from zero (when gene flow is inferred) for comparisons involving spotted (S. frontalis, S. attenuata), striped (S. coeruleoalba) and the common (D. delphis) + spinner (S. longirostris) + Fraser (L. hosei) clades (Table 1 -Trios 6-8). Because the topology recovered between those clades was inconsistent between analyses, we tested three different topologies (Trios 6-8). From these topologies, the one where the spotted dolphin lineage branches from the most basal node (Table 1 -Trio 7; consistent with the topology inferred in the partitioned tree - Fig. 1) results in the lowest deviation from 0 (although significant), and is therefore the one that implies lower levels of gene flow. Interestingly, this topology results in similar values of D for the population and consensus-based calculations, while for the other topologies the population based D is much closer to zero as compared to the consensus-based D. The consensus-based D includes only SNPs that are fixed within each clade, and therefore does not include mutations that are more recent and have yet Fig. 2. Split decomposition network produced in SPLITSTREE (Huson and Bryant, 2006), for all samples used in this study. Cross links represent alternative possible topologies between groups.
A.E. Moura, et al. Molecular Phylogenetics and Evolution 146 (2020) 106756 to fixate in the clades. Therefore, stronger deviations from 0 for consensus-based D, is suggestive of gene flow occurring earlier in the history of the respective lineages.
Other trios with D-statistics significantly different from zero all involved comparisons between striped dolphins, pelagic T. truncatus and coastal T. truncatus in the Atlantic (both European and American coasts; Table 1 -Trios 5, 9). Both trios tested suggested potential gene flow between pelagic T. truncatus and striped dolphin, as these clades have a higher proportion of shared derived alleles compared to the coastal clades. For these trios the consensus-based D is higher than the population based value, again suggesting that patterns of gene flow might be relatively ancient. The last trio showing a significant D suggests gene flow between T. truncatus from the Mediterranean and the Black Sea (Table 1 -Trio 3), which could again be interpreted as relatively ancient.
The gene tree concordance analyses suggest that gene flow is only likely between spotted dolphins and the common + spinner + Fraser's dolphin lineage, and between the pelagic and the Mediterranean ecotypes of T. truncatus. According to this analysis, reticulation between other groups visible on the split-decomposition network, are thus likely due to incomplete lineage sorting. The tree based on only the most concordant quartets (Fig. 3, quartet tree) is consistent with the phylogenetic tree inferred from the full dataset using a partitioning scheme (Fig. 1). A notable difference is in the placement of the T. truncatus samples from Oman (Arabian Sea), which groups closely with Mediterranean samples in the quartet tree (Fig. 3), but groups with the pelagic ecotypes in the partitioned tree (Fig. 1). The inference from concordance analyses was robust to different alpha priors, and the length of the MCMC chain.
Maximum-likelihood trees show a similar topology to that obtained by the Bayesian algorithms for the major taxa, but lacked resolution for more terminal nodes, particularly intraspecific nodes (Fig. S3). Reconstruction of trees without the outgroup improved bootstrap scores for the nodes involving the spotted and striped dolphin lineages, however they remained low compared to other nodes, and always below 75%. This mostly affected branching order between the lineages including Stenella, but not their placement relative to Tursiops.
Our PSMC analyses demonstrate that N e profiles for T. aduncus and T. truncatus are similar until roughly 2 Ma (Fig. 4). Both species experienced an increase in inferred N e at this time, but it is much more pronounced in T. aduncus. However, at around 1 Ma N e for both species started to decrease until more modern times, when it approaches the A.E. Moura, et al. Molecular Phylogenetics and Evolution 146 (2020) 106756 present where inference becomes less robust. There are some upward fluctuations during this period, which occurred at different times for each species, but the overall trend is that of gradual reduction. The pseudo-diploid analyses shows the increase in inferred N e starts at 2 Ma, and rises to infinity at approximately 1 Ma, suggesting this to be the most likely time of divergence (end of gene flow) between the two species. Before this time, N e profiles were similar.

Discussion
In this high resolution phylogenomic study including the majority of currently recognized species included in the sub-family Delphininae, we resolve some long-standing inconsistencies in topology within this group, and provide novel evidence that can help explain the evolutionary drivers of the group's radiation. All representatives of the genus Tursiops consistently form a monophyletic group in our analyses. Monophyly of Tursiops is consistent with patterns reported in previous studies based on somatic genetic data (Kingston et al., 2009;Steeman et al., 2009;McGowen, 2011;McGowen et al., 2019), while studies resulting in a polyphyletic Tursiops were either based on mtDNA only (Möller et al., 2008;Vilstrup et al., 2011;Moura et al., 2013), had relatively low resolution (Kingston et al., 2009;Amaral et al., 2012a,b), or both (LeDuc et al., 1999. In our study, only the tree based on reads with < 50% GC content (Fig. S2) showed Tursiops as polyphyletic, as a result of shifts in the position of striped dolphin (Stenella coeruleoalba).
It is more common for phylogenetic distortions to be apparent in GC rich reconstructions (Romiguier and Roux, 2017), however in our study the effect may be associated with reticulation, given that the rest of the phylogeny is unaffected, and S. coeruleoalba in particular shows evidence of strong reticulation with the Tursiops lineage (Fig. 2, Table 1; see below for more details). Although we cannot exclude some effect of long-branch attraction in the shifts observed for striped and spotted dolphins, the site selection caused more substantial shifts than those cause by the exclusion of the outgroup, and therefore we believe that long branch attraction does not provide a sufficient explanation for the inconsistencies found between our various analyses. This phylogeny thus provides the clearest indication so far for a classification that would recognise Tursiops as monophyletic, with several putative subspecies placed within two well-defined lineages. At the same time, evidence is provided for historical admixture both within the genus Tursiops and among other species within the Delphininae, including across genera. In particular, there is evidence for relatively ancient admixture between T. truncatus and striped dolphins in the North Atlantic.

Phylogeographic inference & evolutionary process
Our main phylogenetic tree (Fig. 1) resolves some earlier questions about the evolution of lineages within the genus Tursiops. Notably, T. australis is nested within T. aduncus, forming a single monophyletic lineage. This is in contrast to previous results based on whole mitogenomes, placing it as a separate lineage branching from a more basal node relative to both T. truncatus and T. aduncus (Moura et al., 2013). This previous result suggested a localized Australasian origin of Tursiops. However, our current study proposes instead that the group's origins lie more broadly in the Indian and Pacific Oceans. Its separation into species/ecotypes likely resulted from complex patterns of environmental and coastal topography in the area, as shown in a previous more geographically detailed study in the region (Gray et al., 2018).
Divergence times between T. truncatus and T. aduncus likely occurred around 1 Ma and is consistent with previous suggestions that differentiation in this group was largely driven by Pleistocene climatic oscillations and associated dispersal patterns (Moura et al., 2013;Gray et al., 2018). Although comparison with previous divergence time estimates is not straightforward (owing to the differing topologies between mtDNA and nuDNA), our estimate is similar to that inferred from mtDNA phylogenies (Moura et al., 2013). A recent broader phylogenetic study of Cetacea using target capture sequencing, also estimates divergence between Tursiops truncatus and T. aduncus at 2.6-1 Ma (McGowen et al., 2019), although only two samples were analysed and timings were calculated in the context of the entire cetacean tree, which tends to overestimate the age of more recent splittings. It should be noted that divergence dating for broader lineages is imprecise due to deficient data on calibration points (see Moura et al., 2013), and due to uncertainties in determining the correct mutation rate for nuDNA (see Moura et al., 2014, McGowen et al., 2019. Our approach is based on a coalescent model, and therefore based on different assumptions that take into account incomplete lineage sorting and changes in effective population size through time. Nevertheless, the dates obtained are similar to and within the error boundaries of earlier studies, and we therefore conclude that they are converging onto an accurate timeframe for divergence within the genus Tursiops. Reconstruction of ancestral N e further shows an overall reduction for both T. truncatus and T. aduncus until more recent times. For T. aduncus, our genome was obtained from a South African sample, whose population today has a very restricted geographic distribution and is genetically well differentiated from other T. aduncus (as shown in this study). Therefore, low N e levels are not unexpected. For T. truncatus, interpretation is made difficult by the lack of public metadata regarding the sample used for genome sequencing. It is known that the sample originated from a female dolphin used in the US Navy Marine Mammal   Fig. 4. PSMC analysis comparing the reference T. truncatus genome sequence (blue dotted; see text for details) with the genome sequence of a T. aduncus from South Africa (red dashed; see text for details). Main thicker line represents the estimate, while the thinner lines represent the inference from the bootstrap analyses. Grey solid line represents the inference resulting from a pseudo-diploid genome between T. aduncus and T. truncatus (see text for details). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Program, but not the geographical location where this animal was obtained from. Nevertheless, only the pelagic ecotype has a worldwide distribution (Tezanos-Pinto et al., 2008) and therefore potentially larger N e values. Other ecotypes are generally coastal, have a much more restricted geographic distribution, and many are genetically well differentiated (as shown in this study), and therefore low N e can also be expected. Nevertheless, the progressive reduction in N e , similar in both Tursiops genomes from different geographic regions, could suggest some broader phenomena affecting cetaceans over the last~2 Myr. Similar patterns have been described for other cetacean species, namely killer whales (Moura et al., 2014), sperm whales (Warren et al., 2017) and several species of baleen whales (Kishida, 2017;Árnason et al., 2018).
This study supports an earlier conclusion that the pelagic ecotype is likely a derived state in the genus Tursiops (Moura et al., 2013), with some coastal populations (i.e. in the Mediterranean and Black Sea) later derived from the pelagic population. Early admixture between Tursiops and Stenella coeruleoalba (striped dolphin, a species often found in oceanic habitat, but also in coastal waters) could have facilitated or been a consequence of a transition from inshore to offshore habitat for Tursiops truncatus. Our results indicate differentiation between pelagic and Mediterranean + Black Sea T. truncatus, although concordance analyses suggest there could still be some residual level of gene flow between these two lineages, consistent with a secondary colonisation of the coastal habitat.
Population studies often show differentiation between local coastal areas in Europe and pelagic samples (Natoli et al., 2005;Fernández et al., 2011;Louis et al., 2014;Gaspari et al., 2015), but these are not always reciprocally monophyletic when included in phylogenetic studies (Moura et al., 2013). Our study confirms that pelagic samples group as an independent monophyletic lineage, but further demonstrates the occurrence of residual gene flow with coastal ecotypes (from concordance analyses). In contrast, the polyphyly observed in Black Sea T. truncatus for mtDNA phylogenies (Moura et al., 2013), is suggested by our concordance analyses to result from incomplete lineage sorting, as proposed previously (Moura et al., 2013). Within the Eastern Mediterranean Sea, two paraphyletic lineages are seen, which is consistent with more detailed population level analyses (Gaspari et al., 2015). This may account for discordance between population and phylogenetic studies, and is consistent with previous suggestions of ongoing gene flow between different localized populations in the Eastern North Atlantic (Nichols et al., 2007;Gaspari et al., 2015).
Overall, our analyses provide further support that cyclical expansion and recession of coastlines during glacial fluctuations is likely an important driver of diversification in Tursiops, with the earlier forms dependent on coastal habitat (e.g. Moura et al., 2013). This could be associated with the occurrence of specific geographic areas where environmental conditions were more favourable to dolphins, acting as refugial areas and promoting differentiation between groups over relatively short geographic distances (as suggested for the Indian Ocean; Gray et al., 2018). Residual gene flow observed between different species/ecotypes could reflect short isolation times during climatic oscillations, secondary contact from associated range shifts, or ecological/ behavioural characteristics determining interaction between groups (as suggested in Hoelzel and Moura, 2015). We should note that coincidence between splitting times and main climatic shifts does not necessarily imply a causal mechanism. In fact, in previous studies we have suggested that such climatic shifts are likely proxies for other biological processes, namely redistribution of prey resources (Moura et al., 2014, Hoelzel and, although more data is clearly needed. Further integration of Tursiops coastal ecotypes from other parts of the world would help to confirm this interpretation. Incorporation of data on T. truncatus from the Pacific will be especially useful, though greater detail is still required on possible regional morphological and genetic variation across this region before an appropriate sampling scheme could be devised, which was not available for this study. Nevertheless, given that similar patterns are seen in various other cetacean species, it appears that climate fluctuations have been an important driver of differentiation in cetaceans more broadly, even if the exact biological processes involved cannot yet be determined with certainty.

Inference on Delphininae phylogeny and cross lineage reticulation
Although our study does not include all the currently recognized species in the sub-family Delphininae, our phylogenetic reconstruction provides novel inference regarding the relationships within this subfamily. Our analyses showed that inconsistencies in topology were most pronounced for several of the Stenella species included (namely striped dolphin -S. coeruleoalba -and spotted dolphins -S. frontalis + S. attenuata), but also consistently gave strong support for a clade comprised of D. delphis + S. longirostris + L. hosei resolving as sister clade to the genus Tursiops (c.f. Steeman et al., 2009;McGowen, 2011;Amaral et al., 2012a,b;McGowen et al., 2019). The placement of spotted and striped dolphins is discordant with the recent phylogenomic analysis of McGowen et al. (2019), but in our study the placement of those lineages is shown to be unstable and sensitive to changes in substitution models and site selection, and we can therefore expect discordance in the placement of those species between different studies. The D-statistic (Table 1), splits-tree (Fig. 2) and concordance analyses (Fig. 3) all suggest the occurrence of ancestral gene flow between subsets of some of these species. Although the analyses used to detect gene flow between lineages have several assumptions that might not be met by our data, many species for which cross-species gene flow was detected (e.g. S. coreuleoalba) were also the most unstable between analyses, leading to further support for sustained gene flow.
These results are consistent with several studies showing successful mating between Delphininae species in the present; both in the wild and in captivity (see Crossman et al., 2016). However, consistent cross lineage gene flow could have occurred before diversification into the species observed today, between more ancestral but still well differentiated lineages. This may explain the previously described mitonuclear discordances in phylogenies involving these species (e.g. Amaral et al., 2012aAmaral et al., ,b, 2014Gaspari et al., 2015). Furthermore, although we find no evidence for gene flow between T. truncatus and T. aduncus in these analyses, these species are known to mate in captivity, producing viable offspring at least to the F2 generation (Gridley et al., 2018). This suggests that gene flow patterns in the wild might be the result of local conditions that lead to the breakup of established premating barriers, rather than due to post-mating isolation mechanisms, although we do not have the data to identify what these might be.
Overall, most of the inconsistencies between topologies using different site selections and nucleotide substitution models can be explained by the occurrence of cross-lineage gene flow throughout the species radiation process. Rapid radiation could create a complex pattern of gene sharing among species, accounting for the difficulties in determining branching order in the earlier literature (e.g. McGowen, 2011). A combination of lineage sorting due to drift and stochastic patterns of recombination would result in some areas of the genome retaining genes obtained from other lineages during ancestral hybridization episodes. This hypothesis may also help explain why previous low resolution phylogenetic studies of this group showed differing topologies (e.g. Kingston et al., 2009;Amaral et al., 2012a,b), and why even high resolution datasets can show discrepancies in some key groups (see discussion above comparing our results with those presented in McGowen et al., 2019). Overall, the results of our analyses on reticulate evolution in this group, is consistent with recent suggestions that cross-lineage gene flow might be a common and important process in adaptive radiations (Berner and Salzburger, 2015), and may be an important process in supporting adaptation to novel regions and habitats.

Taxonomic considerations
The formal attribution of taxonomy is outside the scope of this study, however, our results provide information that would be valuable for future more formal taxonomic studies. The monophyly of Tursiops is consistent with its status as a single genus, although the division between the T. truncatus and T. aduncus lineages is relatively deep. If the T. truncatus and T. aduncus lineages retain separation at the species level, then the Burrunan Tursiops lineage may be considered a subspecies within T. aduncus (rather than a distinct species T. australis as previously proposed; e.g. Charlton-Robb et al., 2011), as it makes T. aduncus a paraphyletic group in our trees. Similarly, the Indian Ocean and Australasian ecotypes of T. aduncus could also be given sub-specific status given all three clades are reciprocally monophyletic. Classification of these three lineages as subspecies might represent the best compromise, in line with recent recommendations for a wider use of subspecies classification in cetacean groups (Taylor et al., 2017). In this context, there are additional lineages that may also merit separate subspecies classification within T. aduncus, including a lineage identified off Pakistan, India and possibly Bangladesh (Gray et al., 2018), though more data are needed.
The Western North Atlantic coastal (WNAC) ecotype of T. truncatus likely also warrants subspecies classification on the basis of taxonomic consistency. This reflects the deeper intra-specific separation within Tursiops species, which has been robust to choice of marker and level of resolution in previous studies (Tezanos-Pinto et al., 2008;Moura et al., 2013). In this context the much shallower division of the Black Sea ecotype less clearly supports its classification as a separate subspecies (T. t. ponticus). Although the lineage is monophyletic in the nuclear phylogenies (based on the samples included), it was not in the mtDNA phylogenies (Moura et al., 2013). The classification of the pelagic ecotype of T. truncatus is a more complex issue. Although it consistently forms its own lineage in this and other studies (Tezanos-Pinto et al., 2008;Moura et al., 2013), we show here that gene flow between the pelagic and the Mediterranean lineages is non-negligible. Further insight into the broader T. truncatus group from other regions around the world was not possible in our study, due to incomplete sampling for other regions where divergent groups have also been identified, especially in the Eastern and Western North Pacific (Perrin et al., 2011;Chen et al., 2017;de los Angeles Bayas-Rea et al., 2018) as well as Western and Eastern South Atlantic (Ott et al., 2016;Fruet et al., 2017). The latter includes the proposed species T. gephyreus Oliveira et al., 2019).
In other Delphininae genera, Stenella is clearly polyphyletic indicating a need for taxonomic revision consistent with previous suggestions Geisler et al., 2011;LeDuc et al., 1999;McGowen et al., 2009;Perrin et al., 2013). The spinner dolphin (S. longirostris) consistently groups together with common (D. delphis) and Fraser's dolphins (L. hosei), however defining the broader membership of this lineage would require more inclusive taxon sampling, in particular Stenella clymene and the genus Sotalia. The position of striped dolphin (S. coeruleoalba) within Delphininae is not robust to the choice of markers and partitioning scheme. Our results suggest this could be due to cross-lineage gene flow before the diversification of other species within the group. A similar phenomenon might be occurring with the spotted dolphin species (S. attenuata, S. frontalis), which consistently form a monophyletic clade, but whose position within Delphininae is not robust, possibly due to high levels of cross species reticulations in the past. However, incomplete taxon sampling could also be a factor in each of these cases, and although our full dataset had fairly high resolution, this was clearly reduced in some of the smaller site partitioning datasets resulting in lower Bayesian support values.

Conclusions
In this study we produced a high resolution multi-locus phylogenomic tree of the sub-family Delphininae with a focus on the genus Tursiops, and resolve earlier contradictory or unclear topologies. Our analyses of site selection schemes indicate that sites whose patterns of evolution deviate from strict neutrality, and potential patterns of cross-lineage reticulation can shift the position of certain groups considerably, but that these can be identified by carrying out gene congruence analyses. The genus Stenella is least well resolved, likely due to historical admixture among lineages. Historical reticulation was also identified involving the striped and spotted dolphin lineages, which could explain the inconsistent placement of these groups in both this and previous studies. A major early division in the genus Tursiops was likely between Australasian and North Atlantic coastal habitats with a geographically widespread offshore lineage evolving later, possibly associated with admixture between the Tursiops and Stenella lineages. Offshore lineages became globally distributed and subsequently involved some secondary colonisation of coastal habitats, such as those observed in the Mediterranean and Black Seas. The previously proposed progressive evolution of T. aduncus populations by dispersal and vicariance during Quaternary expansion from Australasia into the Indian Ocean (Gray et al., 2018) is supported by well resolved regional lineages found in this study, and little evidence for admixture among them. In contrast the T. truncatus lineage is more complex, likely due to intra and inter-specific reticulation, and the occupation of both coastal and pelagic habitats where their distributions may overlap.