Para-allopatry in hybridizing fire-bellied toads (Bombina bombina and B. variegata): inference from transcriptome-wide coalescence analyses

Ancient origins, profound ecological divergence and extensive hybridization make the fire-bellied toads Bombina bombina and B. variegata (Anura: Bombinatoridae) an intriguing test case of ecological speciation. Narrow Bombina hybrid zones erect barriers to neutral introgression whose strength has been estimated previously. We test this prediction by inferring the rate of gene exchange between pure populations on either side of the intensively studied Kraków transect. We developed a software pipeline to extract high confidence sets of orthologous genes from de novo transcriptome assemblies, fitted a range of divergence models to these data and assessed their relative support with analytic likelihoods calculations. There was clear evidence for post-divergence gene flow, but, as expected, no perceptible signal of recent introgression via the nearby hybrid zone. The analysis of two additional Bombina taxa (B. v. scabra and B. orientalis) validated our parameter estimates against a larger set of prior expectations. Despite substantial cumulative introgression over millions of years, adaptive divergence of the hybridizing taxa is essentially unaffected by their lack of reproductive isolation. Extended distribution ranges also buffer them against small-scale environmental perturbations that have been shown to reverse the speciation process in other, more recent ecotypes.


Introduction
The central role of environmental heterogeneity in driving the evolution of novel ecotypes has been impressively illustrated in the rapidly accumulating literature on ecological speciation. In particular, numerous studies have demonstrated how divergent adaptation can lead to partial reproductive isolation by causing assortative mating within ecotypes, reduced performance of hybrids or both (e.g. Jiggins et al. 2001;Bearhop et al. 2005;; see Nosil 2012 for a detailed treatment). Such barriers to gene flow can arise even on a historical time scale . If this momentum can be maintained then, intriguingly, ecological speciation may run its course relatively swiftly.
The hypothesis of rapid ecological speciation appears to be contradicted by the millions of years of divergence that are typically required before substantial pre-and/or postzygotic isolation is observed in experimental settings . interbreeding in nature may cease much earlier, if divergent adaptation includes novel habitat preferences and/or shifts in breeding/flowering time . But in the absence of irreversible barriers to reproduction, the new ecotypes are reliant on the particular environmental conditions under which they evolved . Subsequent change may increase gene flow levels so that phenotypic divergence is eroded again  or it may create periods of allopatry that facilitate the accumulation of habitatindependent gene flow barriers . In that case, speciation is initiated by ecological divergence but may be completed by a variety of processes. Much work 3 35 40 has been done on recently evolved ecotypes ), many of which will not persist long enough to become reproductively isolated . It is therefore of interest to determine the level of gene exchange between taxa with a long history of ecological divergence in allo-and parapatry.
Here, we report on one such taxon pair at a relatively advanced stage of the speciation process and estimate the rate of gene flow between two pure populations on a either side of a current hybrid zone. This long term average reflects introgression past genetic barriers that arise from ecological divergence and any habitat-independent incompatibilities. We discuss our results in the context of theory about genetic architectures that produce more or less efficient barriers to gene flow.
The European fire-bellied toads, Bombina bombina and B. variegata, are clearly defined taxa that differ profoundly in a large number of traits, including features of morphology, anatomy, life history and the mating system (see  for an overview). Many of these differences reflect the distinct ecological niches that the toads occupy. B. bombina reproduces in semipermanent lowland ponds, whereas B. variegata uses ephemeral breeding sites in more mountainous terrain. These aquatic habitats place opposing demands on the tadpoles: cryptic behavior to avoid invertebrate predators versus rapid growth and development as desiccation looms . Morphological and behavioral differences between the tadpoles  match established patterns in specialized anuran species along the aquatic permanence gradient (e.g. ). Many differences between the adults are also likely adaptations, for example of B. variegata to an ever-shifting distribution of breeding sites in the landscape and of B. bombina to the large mating aggregations that form 4 50 55 60 in ponds. Such simultaneous selection on multiple traits (multifarious selection) may generate particularly efficient barriers to gene flow ).
B. bombina and B. variegata diverged several million years ago ) and have experienced repeated cycles of allo-and parapatry . Present day hybrid zones formed after the last glacial maximum (less than 10,000 years ago; Szymura , 1996. They are typically 2 -9 km wide and located along the edge of mountainous regions (Fig.   1). Recombinant marker genotypes (< 10 diagnostic loci) predominate in intensively sampled transects in Poland, Hungary, Romania and Ukraine  and make up about a quarter of individuals in the Pešćenica (Croatia) transect . We expect that all individuals in a hybrid zone are recombinant to some degree. The temporal stability of phenotypic clines at 50 and 70 year sampling intervals  proves that natural selection counteracts the erosion of taxon differences. Direct evidence for habitatindependent natural selection comes from increased mortality of hybrid embryos under uniform conditions Kruuk et al. 1999b).
Classic cline theory links the patterns seen in nature to the three main determinants of hybrid zones: gene flow, selection and recombination. It predicts that the observed significant linkage disequilibria among independently segregating markers  result from the influx of pure genotypes into the hybrid zone center and increase the 'effective' selection pressure on each one of the selected loci Kruuk et al. 1999a). The associated allele frequency clines thus become steeper  which further strengthens linkage disequilibria and effective selection. This positive feedback  can explain the sharp step in allele frequency in the center of all clinal Bombina transects . It helps to maintain adaptive taxon differences and also generates a barrier to gene flow for neutral variants ).
The strength of this barrier reflects the overall level of reproductive isolation and can be estimated from the shape of clines at marker loci . The two Polish transects near Kraków and Przemyśl (width: ~ 6 km) provide a robust estimate of this barrier into the B. variegata gene pool . Under a diffusion model of dispersal and based on an estimated dispersal range of ~ 1 km per generation , this barrier is equivalent to 51 km of unimpeded habitat.
Assuming a generation time of three years , it would delay the introgression of neutral B. bombina alleles by about 8000 years . The estimated barrier strength also implies that the mean fitness of hybrid populations in the zone center is reduced by 42% relative to that of pure populations . Mitochondrial genomes might be expected to introgress more freely , but this is not observed in Bombina: concordant mtDNA and nuclear marker clines together with very weak cytonuclear disequilibria suggest that mtDNA may itself be subject to selection in hybrids  but see .
Thanks to the genome-wide linkage disequilibria in the hybrid zone, it was possible to draw all of these inferences from the small number of available diagnostic markers. Direct estimates of gene exchange between pure populations, however, require genome-wide genetic resources.
Their development has so far been hampered by the 'big' genomes  of B. bombina and B. variegata; for both taxa, C-value estimates just over the threshold of 10 pg 6 95 100 105 (~ 10 Gbp) have been obtained . There is, however, no indication of polyploidy . The two sequenced anuran genomes (Silurana (Xenopus) tropicalis, Nanorana parkeri, Sun et al. 2015) are too distant from Bombina (MRCA about 250 Mya) to be useful. We therefore chose de novo transcriptome assembly as a costeffective method to generate reduced representation genomic data .
Evolutionary or population genomic analyses of next generation sequence data take a locuscentered view ) and require the definition of orthologous sequences across samples. With RNA-seq data, the task of ortholog definition is non-trivial, because variants of a single locus (splice isoforms, alleles) are recovered along with sequences from recent paralogs. These issues have been variously addressed in recent de novo transcriptome assemblies. With the help of a reference genome, paralogs can be identified by phylogenetic methods (e.g. . Alternatively, variant call sets have been screened for suspect genotype distributions in order to remove SNPs that are likely derived from more than one locus (e.g. . In the absence of a reference genome and starting from either one or two samples per taxon, we devised criteria to identify putative paralogs within transcriptome assemblies. These were included in a clustering step across taxa to generate a high-confidence ortholog gene set for which SNPs were called. Each orthogroup 7 contained sequences from four Bombina taxa with a fully resolved phylogeny based on complete mitochondrial genomes : the hybridizing pair B. bombina and B. v. variegata as well as B. v. scabra, a subspecies from the Southern Balkans ( Fig. 1), and B. orientalis (East Asia), the sister taxon of the European lineages.
To quantify nuclear divergence and introgression, we used a recently developed coalescent approach ) that fits models of divergence with or without gene flow and assesses their support via analytically computed likelihoods. This inference is extremely efficient compared to multilocus methods that require computationally intensive Markov chain Monte Carlo sampling  or simulations (e.g. . Models were fit by two approaches: either using the site frequency spectrum (SFS) or multilocus data summarized by the blockwise SFS, i.e. the full configurations of mutations in blocks of sequence. With these coalescent analyses, we address the following questions: (i) Does the timing and order of divergence between the four Bombina lineages match that previously inferred from mitochondrial data?, (ii) Is there evidence for post-divergence gene flow between the hybridizing taxa away from the hybrid zone and, if so, how does it compare to the estimate between the B. variegata subspecies (genetic barrier to gene flow vs. greater spatial separation)? and (iii) Is there any evidence for a recent burst of introgression as a result of the current hybrid zone?.

RNA extraction, library preparation and sequencing
For Roche 454 sequencing, total RNA was extracted from liver, using RNAzol®RT, and mRNA was isolated using Dynabeads Oligo ( For Illumina RNA-seq, frozen samples were thawed and transferred into 1 ml Trizol, homogenized in a TissueLyser and passed five times through a 0.9 mm (20 gauge) needle attached to a syringe , followed by 5 min incubation at RT and centrifugation for 1 min at 14,000 rpm. The samples were Chloroform extracted once and then mixed with 200 µl of 70% Ethanol. Each sample was divided into 2 -3 aliquots, each of which was added to a Qiagen miRNA spin column and processed according the manufacturer's instructions. cDNA library preparation and paired-end sequencing on an Illumina HiSeq instrument were carried out by Edinburgh Genomics.

Transcriptome assemblies
Roche 454 data -Reads were cleaned by removing residues of poly(A) tails with SnoWhite v.1.1.4 ). Reads with low-complexity regions (including stretches of di-and trinucleotide repeats which could inflict spurious assemblies) and exact read duplicates were also removed. Low quality ends were trimmed with PRINSEQ-lite 0.17 . Reads which showed significant sequence identity (using BLASTN) to rRNA sequences of Silurana tropicalis were discarded.
De novo transcriptome assemblies of B. bombina and B. variegata were conducted separately for each species. We took a combinatorial approach to optimize each assembly (Kumar and Blaxter 2010). Reads were first assembled using Newbler (v2.6, Roche) and Mira v. 3  independently, after which both assemblies were co-assembled with CAP3 . For Mira and Newbler we used settings recommended for de novo transcriptome assembly from reads produced by the 454 platform (Mira: no clipping of poly(A) stretches). For CAP3, we used an overlap length cutoff equal to 80, an overlap percent identity cutoff equal to 95, and switched off the clipping option. To validate the assembly, reads were mapped back to assembled contigs using BLAT (Kent 2002), retaining only contigs with reads that mapped over at least 95% of their length (discarded contigs: 11% of B. bombina and 14% of B. variegata).
Illumina data -Adapter removal was carried out with Cutadapt (Martin 2011) as implemented in the wrapper Trim Galore! v. 0.3.7 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and quality trimming was performed with Trimmomatic , clipping any sequence downstream of four consecutive bases with mean quality below 25. Only sequences with a minimum length of 50 nucleotides (nt) were retained (Table 1).
De novo assembly was carried out with Trinity (v. 2014-04-13p1, ) in pairedend mode with the addition of all retained single reads. Default settings were chosen except for the minimum K-mer count (2), the maximum expected length between fragment pairs (400) and the minimum percent identity required for the merging of paths during transcript reduction (98).
Assemblies were searched for contaminants by BLAST analysis (megablast) against the NCBI nucleotide database and contigs with close to full length matches (ID > 0.90) to Bacteria and Archaea were removed (37 -219 contigs per taxon). For each assembly, a BLASTX comparison to the Silurana (Xenopus) tropicalis proteome which is bundled with Trinity. Finally, assembly completeness was assessed with CEGMA ).

Reference transcriptomes
For each taxon, we aimed to define a reference transcriptome that comprised all assembled gene copies (including paralogs) but excluded splice isoforms. Details of the analyses are presented in the Supporting Information and summarized here. Sequence identities (ID) were computed for pairwise nucleotide alignments within Trinity components. In the first instance, all contig pairs with overall sequence identity below 0.98 were designated as paralogs. In the case of markedly heterogeneous ID scores along the alignment, we distinguished poorly aligned tracts due to structural variation near alignment termini from those due to the alternate use of

Orthologue identification and alignment
We used OrthoMCL  with default parameters to cluster predicted peptide sequences from all four reference transcriptomes (Fig. 2). The complete proteome of S. tropicalis was added as a 'backbone' for clustering and to provide additional information on multi-gene families. From the initial clustering (inflation value = 1.5), we retained only those groups that comprised the following: (1) at least one contig from each Bombina taxon, (2) no paralog pairs from any Bombina taxon and (3) no more than three S. tropicalis contigs (see 2b for examples). From this subset of clusters we retained only those that were robust to changes in the inflation value (range: 0.5 -5.0) in order to counter common errors of graphbased methods such as OrthoMCL, i.e. over-clustering and incorrect handling of domain recombination (Kristensen et al. 2011). Finally, pairwise peptide alignments between Bombina taxa were computed within robust orthogroups. Note that we apply stringent criteria to obtain a set of robust orthogroups rather than an accurate list of paralogs. False positives in that list are therefore of no concern.

Variant detection
For the coalescence analyses, only the Illumina datasets were considered (corresponding to one diploid individual per taxon). For each taxon, reads were mapped to the reference transcriptome (for B.b. and B.v.v Peptide alignments between taxon pairs within orthogroups were back-translated into the nucleotide sequences of the reference contigs with PAL2NAL . For a given orthogroup and taxon, an alternate haplotype was constructed from the variant call format (VCF) file and added to the between-taxon alignments. These were parsed for variation at fourfold degenerate sites. (kB), iii) double heterozygote (2 distinct alleles only, kAB) and iv) alternate homozygotes (kAABB).
First, we based inference on the average frequencies of these site types E [ki], essentially the folded, joint site frequency spectrum (SFS) for two species. Second, we used the joint distribution of the four site types in the short blocks of fixed length (150 consecutive fourfold degenerate sites) to fit models of divergence and gene flow. The configuration of mutations in each block can be summarized as a vector of site counts k = {kA, kB,kAB,kAABB}. Assuming blocks are unlinked, the overall model likelihood is given as the product of p(k), the probability of each observed blockwise configuration of mutations across blocks. We will refer to this as the blockwise SFS (bSFS) throughout.
These two approaches trade off power against bias. The SFS ignores all linkage information and summarizes the data into four site frequencies (Tab. 2). In principle, this allows us to coestimate four parameters. However, the near absence of shared heterozygote sites, E[kAB], restricted our analyses to three-parameter models. This limitation is offset by more robust inference. In particular, point estimates based on the SFS are not affected by recombination or heterogeneity in mutation rate. In contrast, the analyses based on the bSFS assume a constant mutation rate across blocks and no recombination within them. Violations of these assumptions produce biased estimates. This is of particular concern for transcriptome data, given the added potential for recombination in introns. But by considering the joint distribution of linked mutations, the bSFS allows us to fit more complex models. where m is the proportion of immigrants per generation. We refer to the null model without migration as Div (Fig. 3) Asymmetry in the frequency of unique heterozygous sites in each species (kA and kB) may be due to differences in effective population size, asymmetry in post-divergence gene flow or both.
We therefore also considered a model in which Ne in one of the descendant species was allowed to change instantaneously after divergence, while Ne of the other species remained constant (Fig. 3). We contrasted the relative support for this model without gene flow (Div2, M = 0) with that for its counterpart, IM, with gene flow but a single Ne for all three populations. These models with three parameters were fit using the SFS and the bSFS. Finally, we used the bSFS to fit models that allow for both post-divergence gene flow and asymmetries in Ne (IM2). In each case, we tested for gene-flow and asymmetries in Ne in either direction (i.e. NA = a Nanc or NB = b Nanc and MAB or MBA).
We assume that orthoblocks are randomly distributed and, given the large size of the Bombina genome (2600 cM, , that the effects of physical linkage between them can be ignored (a rough calculation gives a maximum density of one orthoblock per 1.8 cM). Each orthoblock can therefore be treated as an independent replicate of the coalescent. To obtain a set of unlinked SNPs, we randomly sampled a single SNP per orthoblock and bootstrapped the 15 295 300 305 310 analysis across 1000 such sub-sampled data sets. 95% confidence intervals of parameter estimates were estimated as 1.96 SD of parameter estimates across bootstrap replicates. The dataset comprised more than 1000 blocks with 150 fourfold degenerate sites and an average of 2.7 -6.0 variable sites per block (depending on the comparison, Table 2). Maximum likelihood estimates were obtained using the Nmaximise function in Mathematica (for details see .

Estimation of synonymous substitution rates
In order to convert coalescence time estimates, T, into absolute times, t, we derived an estimate of the substitution rate per generation, µ, at fourfold degenerate sites from a recent anuran phylogeny based on nine nuclear genes (rag1, rag2, bdnf, pomc, cxcr4, slc8a1, scl8a3, rho, h3a, Irisarri et al. 2012b). That study demonstrated considerable rate variation between Neobatrachians and Archaeobatrachians (including Bombina). We therefore used data from two taxa in the latter group, Alytes and Discoglossus (MRCA 153 Mya, , which form a sister lineage to Bombina. From the concatenated nuclear dataset ), 896 fourfold degenerate sites were extracted and the substitution rate was estimated with PAML v. 4.7. . Given the three year generation time of Bombina, we obtained µ = 9.15 x 10 -9 .

Assembly overview
As is typical for Trinity assemblies, a large fraction (~ 60%) of relatively short contigs with low coverage had no similarity (BLASTX) with S. tropicalis and no ORF prediction. For example in the B. v. variegata assembly, contigs without ORF (n = 56,517) were shorter (median: 411 vs. of 248 ultra-conserved core eukaryote genes (CEGs) were either partially or completely (> 70% of total protein length) assembled for each taxon.

Paralogs
We focused our search for paralogs on those subsets of the assemblies in which ORFs had been identified, because all downstream analyses were based on coding sequence. In the Trinity assemblies, there were between 2791 (Bvs) and 3715 (Bvv) components per taxon with ORFs and more than one contig. Paralogs were identified in about 25% of these components in each taxon. In B. v. variegata for example, 10,753 contigs were considered in the paralog search and partitioned as follows: 3,715 contigs (with maximum ORF length) representing the components, 1,767 paralogs and a remainder of 5,271 contigs, that was removed from further analysis and presumably contained alleles and isoforms. For the other taxa, the paralog counts were 1,375 (Bvs), 1,737 (Bb) and 1,422 (Bo). A BLASTN search of Roche 454 contigs with 17 ORFs against the Trinity database generated matches for 9,172 and 7,866 contigs in B. v. variegata and B. bombina, respectively. Of these Roche 454 contigs, 3,375 (Bvv) and 2,626 (Bb) were designated as paralogs. This relatively greater proportion reflects the overall lower identities of these BLAST-derived sequence pairs compared to those within Trinity components.
Note that only the most similar Trinity/Roche 454 paralog pairs are expected to affect the definition of orthogroups (see the Supporting Information for additional information).

Orthologs
Of the 15,775 OrthoMCL orthogroups with more than one Bombina taxon, 1,719 were excluded because they comprised at least one pair of paralogous contigs. There were 6,772 groups from which at least one of the four Bombina taxa was missing. An excess (> 3) of S. tropicalis contigs excluded 3,845 groups. Bombina paralogs were the sole reason for exclusion in 716 cases ( Figure 4a). After these filtering steps, 4,978 orthogroups remained. Among these, 4,040 groups were robust to changes in the inflation value and used to generate pairwise alignments among taxa. The inclusion of presumptive paralogs in the clustering analysis not only led to the removal of groups with in-paralogs but also allowed for a sequence-based decision which transcript of a paralog pair to include in a given group. Figure 4b gives an example where paralog pairs were split into two orthogroups that most likely reflect alternate exon use in a gene with homology to acid phosphatase 1 in S. tropicalis. Note that in this case an alternative clustering based on just one contig per component resulted in an orthogroup that mixes isoforms and would lead to erroneous estimates of sequence divergence (see Supporting Information B for a comparison of both clustering approaches).

Variant detection
In each taxon, genotype qualities (a Phred-scaled measure of the confidence in the reported genotype) reached the maximum reported value (99) for more than 2/3 of SNPs and were 40 or over for over 95% of SNPs. We observed a small number of mostly low quality (< 20) genotype calls (0.5 -3.4% per taxon) that did not include the reference allele (i.e. alternate homozygotes).
With only one diploid individual per taxon, these cases point to assembly errors and were removed from further analysis.
Average heterozygosity (kA, kB) at fourfold degenerate sites ranged from 0.  (Table 2). This is expected given the order of taxon divergence previously inferred from protein and mitochondrial data ).

Coalescence analysis
In both the B. v. variegata/B. v. scabra and the B. variegata/B. bombina comparison, an IM history fit the data significantly better than a history of isolation without gene flow. This was true regardless of whether we allowed for differences in Ne or not (Table 3) (Table 4).
Our estimates of the divergence time also agreed well between SFS and bSFS analyses (Table   4). We converted estimates of T into absolute time using t = T * 2N * g, where is g is generation time, and N = θ / (4µ), using an estimate of µ from a related pair of Archaeobatrachians (cf.  (Table 4).
To assess the relative fit of the best supported models, we computed the expected SFS under the models inferred from the bSFS (Table 2). We also tested how much of the gene flow signal is due to the presence of shared heterozygous sites. For B. variegata and B. bombina, only one block contained shared heterozygous sites and removing this block had no effect on the results. There were 95 invariant blocks, which is not unexpected under the IM2 model (p = 0.235).

Discussion
The coalescent analyses provide evidence that hybridization between B. bombina and B. v. variegata results in continued gene exchange between these anciently diverged taxa, but the estimated long term average of gene flow was very low (M = 0.020 or 0.027, depending on the analysis). Thus despite the abundance of fertile hybrids in contact zones, the effective rate of gene exchange between the two pure populations was no greater than one immigrant per 100 -150 years. These values of M are at least 40 times smaller than an estimate by the same method for two sympatric Heliconius species that hybridize infrequently ).
However when integrated over time, the Bombina figures imply that a substantial portion of genes has been affected by past gene flow: based on the results from the bSFS analysis, this proportion for B. v. variegata is 1 -e -0.020 * 16.8 = 0.285 (Table 4). The analogous figure for B. bombina is 0.24.
Before we explore our results in more detail, we comment on aspects of our methodology. We devised an analysis pipeline that identified putative paralogs based on nucleotide sequence identity and excluded all cross-taxon gene clusters with paralog pairs from further analysis. This filtering step will also have eliminated the more serious cases of incorrectly phased variants . By using the same tissue (liver) for all samples, we avoided problems due to tissue-specific expression. However, some errors will remain. For any recent paralog pairs 21 420 425 430 above our identity threshold (0.98), only one contig was used in the clustering step and so orthology may have been mis-assigned. Finally, differential gene loss  may have resulted in some paralogous clusters.
Interpretation of the coalescent results requires weighing the strengths and weaknesses of both inference schemes. Recombination within blocks reduces the variance in coalescence times in the bSFS scheme and so leads to underestimates of M and Ne . Our blocks of 150 fourfold degenerate sites were on average 1500 nt long, which is longer than the average CDS length in S. tropicalis (1323 nt) and N. parkeri (1382 nt, Sun et al. 2015). In these fully sequenced genomes, the average gene length is 18.0 kb and 24.4 kb, respectively (Sun et al. 2015). This suggests an average span of 20 kb or more for the genomic complements of our blocks and a likely influence of recombination on the parameter estimates (recall the relatively lower values for M, Table 4). Overall, though, the biases appear small, because the estimates for a given model agreed well between bSFS and SFS analyses. Selective sweeps at linked sites in the ancestral population may have the opposite effect of inducing additional variation in coalescence times . The excess of blocks with no or few divergent sites in the European comparisons may in principle be explained in this way. But no such excess is seen in the allopatric comparison (Bvv/Bo) which suggests that it is instead a genuine signal of past migration.
While the SFS analysis makes fewer assumptions, it confounds gene flow with differences in effective population size since the lineage split. Thus, when assuming a single Ne, we always inferred gene flow into the taxon with higher heterozygosity and obtained unrealistic estimates for the allopatric pair (Table 4). In the bSFS analyses with two Ne parameters, M estimates were reduced to zero as expected (Bvv/Bo) or the direction of gene flow was reversed (Bvv/Bvs). The 22 bSFS contains additional information to distinguish gene flow from asymmetries in Ne: a larger Ne in one descendant population increases heterozygosity uniformly whereas gene flow generates a small number of blocks with large numbers of heterozygous sites in the population receiving migrants.
The observed heterozygosities per taxon (Table 2) fully match our expectations from previous studies. B. bombina, with the lowest estimate (kB = 0.00106), has recently expanded its range from glacial refugia near the Black Sea and has displayed minimal genetic diversity in all previous Bombina comparisons . The difference in heterozygosity between the B. variegata subspecies agrees with previously reported differences in Ne ). Finally, the B. orientalis specimen (max. estimate in this study, kB = 0.00748) comes from Korea, an important Pleistocene refugium in Asia ).
While the blockwise analyses allow a more detailed reconstruction of the divergence history, they still fit a very simplistic model. In particular, the assumption of unidirectional gene flow is unlikely to apply to Bombina. In fact for the hybridizing taxa, both the support for the two versions of the IM2 model (A→B vs. B→A, Table 3)  The species divergence times we have inferred here are by necessity more recent than the divergence at any particular gene (which includes an additional coalescence time in the common ancestral population). However, the difference between our estimates and previous calibrations for mitochondrial genes is surprising. For example for the split between B. v. variegata and B. bombina, we obtain 3.3 Mya compared to the 6.4 or 8.9 Mya divergence at mtDNA genes (two different fossil calibrations, ; see also . While some of this difference may reflect coalescent stochasticity (i.e. by chance, the coalescence of mitochondrial genomes may considerably predate the taxon split), it likely reflects differences in calibration.
The new divergence time estimate is consistent with the appearance of the first Bombina bombina fossils in the late Pliocene  and with analyses of allozymes (Nei's D, . But it too is clearly beset by uncertainties. We therefore conclude Assuming similar barrier strengths, the B. bombina sample (71 km) should be insulated even longer from introgression via that route. Selectively favored alleles, however, will introgress much more quickly . For example, an allele with a 1% selective advantage is expected to traverse the Kraków transect in a mere 245 years (via expression in Barton and Gale, 1993, p. 30 top). Our data set may well contain a few blocks of this kind, but larger samples of individuals and long range linkage information would be needed to detect them.
The strength of the genetic barrier to gene flow varies among different Bombina contact zones.
Near Apahida, Romania, the two taxa meet in an extended habitat mosaic at a distance of 20 km from the nearest pure species range . Plausible total selection strengths can be found that could balance the high estimated gene flow rates and maintain adaptive taxon differences indefinitely. But with, say, 20 selected loci, neutral marker clines would dissipate within ~ 160 years. A very dense spacing of the total selection pressure would be required (~ one selected locus every 4 cM, i.e. about 650 loci) to maintain such clines over 10,000 years. In fact, there are reasons to believe that this particular contact zone is much more recent . Spatial and temporal heterogeneity in rates of gene exchange is a common theme in the hybrid zone literature  and often linked to environmental conditions, as in this example. Episodic spates of gene flow appear to be just as much a product of environmental heterogeneity as any initial adaptive divergence. But given the slow rate of diffusion, neutral introgression should attenuate quickly with increasing distance from the hybrid zone, even in Apahida.
A post-zygotic barrier to neutral gene flow depends on physical linkage between neutral and selected loci and thus becomes stronger as a given total selection stength is more finely distributed across the genome . In Bombina, the number of loci 25 510 515 520 525 under selection has been estimated to be at least 55 and is most likely considerably larger . Abundant trait differences also suggests that multifarious selection may promote reproductive isolation, not just because many loci are involved but also because of the pleiotropic interactions they may have . Under  geometric model, such interactions lengthen the adaptive walk towards a new phenotypic optimum and, as a consequence, reduce the mean fitness of hybrids upon secondary contact . The largest drop in mean hybrid fitness is expected after the first few mutations have fixed, but the total fitness reduction at the end of an adaptive walk is only moderate (< 50%, Barton 2001). This polygenic model of adaptive divergence thus predicts that hybrids would continue to be conduits of gene flow unless adaptive substitutions also trigger strongly deleterious epistatic interactions (i.e. Dobzhansky-Muller incompatibilities, ). Such incompatibilities exist in Bombina (cf. Introduction). They may or may not result from ecological divergence, but their joint effect is clearly insufficient to cause reproductive isolation. We do not know at present why this is so but note that one important driver of speciation is missing: these toads do not have heteromorphic sex chromosomes and so Haldane's Rule  does not apply.
As it stands, B. bombina and B. variegata play important roles in their respective ecological communities, just like 'good' biological species, despite their gene pools fraying at the edges.
With an estimate of m on the order of 10 -6 only a few dozen dispersal ranges away from the hybrid zone, adaptive divergence can proceed uninhibited by hybridization. Similarly low gene flow rates were obtained for hybridizing sunflower species . Under these para-allopatric conditions , there is no antagonism between selection and gene flow. Consequently, any speciation mechanism may increase reproductive isolation between well separated, pure population of either taxon without constraints on the underlying 26 genetic architecture. In contrast, local adaptation under selection-gene flow balance between two panmictic populations favors a clustered genomic distribution of selected loci  which can gradually evolve in redundant genetic systems . Especially when such traits under divergent natural selection also cause assortative mating (magic traits, , novel ecotypes may rapidly form. But as long as prezygotic barriers are incomplete or labile under environmental perturbations, hybrids will be generated and post-zygotic barriers to gene flow are likely to be inefficient. The conundrum of incomplete speciation in Bombina illustrates that speciation tends to defy rules  and argues for an open-minded exploration of possible paths towards reproductive isolation. Rapid evolution of reproductive isolation between recent ecotypes is feasible whenever it can overcome the challenges of a limited gene pool from which to draw mutations that drastically reduce gene flow, the trend towards a clustered genomic distribution of selected traits and the need to build up irreversible barriers to gene flow before further environmental change threatens to reverse diversification. This suggests that many young ecotypes will be ephemeral, because they do not persist long enough to reach complete isolation . Those ecotypes that adapt to abundant and widely distributed resource should stand a relatively much better chance to become reproductively isolated in their own time. This does not diminish the role of ecology in generating biological diversity. To the contrary, it underscores that relatively young ecotypes can play important roles in their respective communities and even prompt diversification in other taxa , irrespective of whether they are reproductively isolated. We hasten to add that this view has a very long history indeed see Mallet 2008 Table 4. Maximum likelihood estimates of parameters under the best supported model ( Table 3). The divergence time (T) is given in units of 2 Nanc generations (95% CI in brackets). a is a scalar for the effective population size of one of the descendant populations (in parentheses, cf. Figure 3). n/a = not applicable.      in blocks of 150 fourfold degenerate sites. The divergence model predicts a narrower distribution (gray dotted) than observed (black), i.e. there is an excess both of blocks with no fixed differences and of blocks with a large number of fixed differences (kAABB > 6). An IM2 model with migration and asymmetry in Ne (gray, dashed) gives a good fit to the data.