Cryptic diversity and multiple origins of the widespread mayfly species group Baetis rhodani (Ephemeroptera: Baetidae) on northwestern Mediterranean islands

Abstract How the often highly endemic biodiversity of islands originated has been debated for decades, and it remains a fervid research ground. Here, using mitochondrial and nuclear gene sequence analyses, we investigate the diversity, phylogenetic relationships, and evolutionary history of the mayfly Baetis gr. rhodani on the three largest northwestern Mediterranean islands (Sardinia, Corsica, Elba). We identify three distinct, largely co‐distributed, and deeply differentiated lineages, with divergences tentatively dated back to the Eocene–Oligocene transition. Bayesian population structure analyses reveal a lack of gene exchange between them, even at sites where they are syntopic, indicating that these lineages belong to three putative species. Their phylogenetic relationships with continental relatives, together with the dating estimates, support a role for three processes contributing to this diversity: (1) vicariance, primed by microplate disjunction and oceanic transgression; (2) dispersal from the continent; and (3) speciation within the island group. Thus, our results do not point toward a prevailing role for any of the previously invoked processes. Rather, they suggest that a variety of processes equally contributed to shape the diverse and endemic biota of this group of islands.

The Tyrrhenian (northwestern Mediterranean) islands belong to the largest continental fragment within the Mediterranean basin. This fragment originated by detachment from the Iberian Peninsula, the counterclockwise rotation of the microplate (initiated 21 to 15-18 Ma ago; Gattaccecca et al., 2007), and subsequent oceanic spreading (Brandano & Policicchio, 2012), while the disjunction of the two main islands, Sardinia and Corsica, was completed about 9 Ma ago (Alvarez, 1972;Bellon, Coulon, & Edel, 1977;Bonin, Chotin, Giret, & Orsini, 1979). Subsequent land connections between islands and with the continent formed repeatedly, as a consequence of Mediterranean sea-level oscillations (Hsü, Ryan, & Cita, 1973;Krijgsman, Hilgen, Raffi, Sierro, & Wilson, 1999;Shackleton, Van Andel, & Runnels, 1984;Van Andel & Shackleton, 1982). This island system shows an extremely wide range of climates, landscapes, and habitats, and both its paleoclimates and its palaeoenvironments have been investigated in detail (Vogiatzakis, Pungetti, & Mannion, 2008). Consequently, in light of the wealth of background information available, it offers an ideal system for exploring both traditional and new hypotheses about the processes leading to the assembly and uniqueness of continental island biotas, an opportunity that has not escaped attention in the recent past.
Several hypotheses on the origin of the Tyrrhenian islands biota have been put forward during the last century. At first, it was proposed that such a distinctive biota formed via hologenesis, a process implying a prominent role for vicariance events (Luzzatto, Palestrini, & D'entrèves, 2000;Monterosso, 1935;Monti, 1915). Subsequently, more emphasis was given to dispersal, and a three-step process of colonization was hypothesized (Baccetti, 1964): a pre-Miocene step through a supposed land bridge with the Baetic region (southern Spain); a Miocene step which brought warm-temperate species from North Africa and Italy; and a Quaternary step which brought temperate species from the Apennines (Italy) via a land bridge connecting Tuscany with northern Corsica. Once the theory of plate tectonics spread (Alvarez, 1972), a reappraisal of vicariance as the leading force occurred, particularly to explain the most distinctive (endemic) components of this insular biota (Baccetti, 1980). With the advent of molecular phylogenetic approaches, data on several taxonomic groups began to accumulate (Grill, Casula, Lecis, & Menken, 2007;Ketmaier & Caccone, 2013), leading to a reconsideration of dispersal events, although vicariance events primed by the microplate disjunction are still regarded by several authors as the main triggers of endemic species formation on the islands (Ketmaier & Caccone, 2013).
In this study, we focus on mayflies of the Baetis rhodani (Pictet, 1843) species group, which is one of the most common mayflies in the Western Palaearctic region, and one of the most abundant insects in freshwater running environments (Brittain, 1982). Phylogenetic and phylogeographic investigations of this species group have proliferated in recent years. In spite of the homogeneity of morphological characters, such studies have revealed unexpectedly deep divergences. Williams, Ormerod, and Bruford (2006) investigated the genetic structure of B. rhodani populations in the Western Palaearctic, and identified multiple divergent lineages, probably belonging to distinct cryptic species. Lucentini et al. (2011) expanded the analysis to other European countries (UK and Italy), increasing the number of lineages. More recently, populations from the Canary Islands have also been investigated (Rutschmann et al., 2014), again showing deep divergences and several endemic lineages, likely resulting from a complex evolutionary history. Unfortunately, these studies were based on a single mitochondrial DNA marker (mostly cytochrome oxidase I), preventing sound inferences about the species status and the evolutionary processes involved. Nonetheless, results of these studies point to these mayflies as potentially excellent models for evolutionary and historical biogeographic studies in the Western Palaearctic.
Within the Tyrrhenian islands, a single endemic species, Baetis ingridae (Thomas & Soldán, 1987), has been described from the B. rhodani group, based on a sample from northeastern Corsica. This species shows subtle morphological differences with respect to continental B. rhodani. Their larvae are very similar, with differences affecting color patterns, the shape of the femoral bristles, and the shape and number of the teeth on the inner margin of the paraproct (Bauernfeind & Soldan, 2012). So far, the diversity (including genetic diversity) and phylogenetic relationships of this mayfly within the Mediterranean islands are still virtually unexplored.
Here, we employ a set of nuclear (nDNA) and mitochondrial (mtDNA) gene regions with the aim of investigating (1) the diversity of the B. rhodani species group in the Tyrrhenian islands; (2) the phylogenetic relationships with its continental relatives; (3) its evolutionary history, with special reference to the mode of settlement on islands; and ultimately (4) to aid shedding more light on the historical origin of these islands' highly endemic diversity.

| Sampling
Larval individuals of the B. rhodani species group were collected from aquatic habitats at 28 localities: 23 localities from the two main Tyrrhenian islands (Sardinia and Corsica); two samples from the Elba island, which is close to the northeastern side of Corsica and shows faunal and floral affinities with that island; and three samples from peninsular Italy (i.e., the nearest mainland), added to this study for comparative purposes. Taxonomic assignment of the sampled individuals to the B. rhodani species groups was carried out using well-established morphological characters of diagnostic value (Müller-Liebenau, 1969
treatment (Sambrook, Fritsch, & Maniatis, 1989 (Li, Riethoven, & Ma, 2010). With this tool, we searched for exon-primed intron-crossing (EPIC) markers, using annotated genomes of Drosophila melanogaster ("query"), Aedes aegypti ("subject"), and Apis mellifera ("subject"), as annotated genomes of mayflies were not available. PCR primers were designed for 10 putative EPIC markers using PRIMER 3. After preliminary PCR trials, we focused on five putative markers which were consistently amplified in a sample of 10 randomly selected individuals, and which provided single and clean PCR bands in a standard (1.5×) agarose-gel electrophoresis. Among these markers, the RYA gene fragment returned high-quality sequence electropherograms, and was thus retained and used to analyze the entire pool of sampled individuals.
Polymerase chain reaction cycling conditions were the same for all genes: 5 min at 92°C followed by 30 cycles of 1 min at 92°C, 1 min at an annealing temperature specific for each gene, and 90 s at 72°C, followed by a single final step of extension of 10 min at 72°C.

| Data analysis
Sequence variation, substitution patterns, and divergence between haplotypes and between the main haplotype groups were assessed using MEGA 5.1 (Tamura et al., 2011).
As this study is the first to investigate patterns of genetic diversity and affinities of mayflies of the B. rhodani group throughout the Tyrrhenian islands, we began our data exploration by verifying their purported endemicity, that is, whether they belong to well-supported and divergent clades, geographically restricted to the Tyrrhenian islands. It is worth emphasizing, however, that we refrained from fur- and Narcis (2014), and Williams et al. (2006),in order to have the widest possible coverage of continental Europe, and we built a Bayesian phylogenetic tree as described below.
Gene trees for the CO1 dataset, as well as for the concatenated mtDNA, PEPCK, and RYA datasets, were estimated by means of the Bayesian inference procedure implemented in MRBAYES v.3.2.1 (Ronquist et al., 2012). The nuclear gene sequences were phased using the software PHASE 2.1 (Stephens & Donnelly, 2003) with the default settings, while the possible occurrence of recombination was assessed using the pairwise homoplasy index (PHI statistic, Bruen, Hervé, & Bryant, 2006) implemented in SPLITSTREE v.4.11 (Huson & Bryant, 2006). The best partitioning strategy for the concatenated mtDNA Nuclear DNA data were further analyzed using the nonhierarchical Bayesian clustering procedure implemented in BAPS 6.0 (Corander, Siren, & Arjas, 2008), which allows us to identify the best clustering option based on multilocus datasets, to assign individuals to clusters, and, importantly, to identify individuals of mixed ancestry among groups. We ran five replicates of this clustering analysis to check its consistency, based on the "clustering with linked loci" option, the "in- Finally, we used the Bayesian inference procedure implemented in *BEAST (Heled & Drummond, 2010), in order to co-estimate individual gene trees for the four gene fragments analyzed (CO1, ND1, RYA, and PEPCK), and to embed them into a single multilocus species tree. With this method, terminal taxa have to be indicated a priori, and so, we used the results of previous phylogenetic inferences based on mtDNA and the nonhierarchical cluster analysis of nDNA data. Substitution, clock, and tree models were unlinked, and each dataset was assigned the appropriate ploidy level and the substitution model previously estimated by means of PARTITIONFINDER and JMODELTEST. The analysis was run using the Yule species tree prior, and a strict molecular clock model was assumed. In order to get an approximate temporal context for the main divergences inferred, and in the absence of internal calibration points or fossil records, we used the CO1-specific substitution rate of 0.0115 substitutions per site per Ma (Brower, 1994), which has been extensively used to date divergences in insects, including mayflies, and which falls in the middle of the ranges estimated for this gene fragment in insects. Substitution rates for the remaining gene fragments were estimated using the CO1-specific rate as a reference, and the 1/x prior distribution. Five independent *BEAST analyses were performed to check for convergence, each run for 100 million generations, and sampling every 10,000. Convergence of the analyses, the appropriate number of samples to discard as burn-in (10%), stationarity, and the achievement of appropriate ESS values (»200) were inspected using TRACER. The final species tree was estimated as a maximum clade credibility tree summarizing results of the postburn-in trees, with nodes annotated with ages and the corresponding 95% highest posterior densities (HPD), by means of the BEAST module TREEANNOTETOR. In addition, the entire posterior distribution of species trees obtained with *BEAST was inspected using DENSITREE.

| RESULTS
For the 112 individuals of the B. rhodani species group used in this study, we obtained fully resolved sequences (i.e., without missing data) for all the gene fragments analyzed. Sequence length was as fol- The Bayesian phylogenetic analysis performed on the concatenated mtDNA (Fig. 1A)  The nonhierarchical cluster analysis of the nDNA dataset carried out with Baps indicated K = 4 as the option best fitting the data (posterior probability = 1.00). Individuals from insular samples were assigned to three distinct clusters, fully corresponding to clades A, B, and C identified from the mtDNA dataset. Contrary to what was previously found from the mtDNA, all individuals from the mainland were grouped into a single cluster. Moreover, with this analysis, no individuals of mixed ancestry between clades were found, even within population samples where distinct mtDNA clades and nuclear clusters were observed to be syntopic (samples 7, 12, 20, 25).
The phylogenetic networks built among haplotypes found at the nDNA PEPCK and RYA are shown in Fig. 2. Within both networks, haplotypes from individuals carrying the mtDNA clades A, B, and C can be identified as groups of closely related nDNA haplotypes. Heterozygote individuals carrying haplotypes from two of these groups were not observed, not even within sites of syntopy. Haplotypes from these groups were never observed. In addition, a closer affinity between the nDNA group C (yellow in Fig. 2) and the Italian haplotypes was observed.
However, the genealogical relationships among Italian haplotypes and between them and haplotypes of group C appeared much less clearly resolved than with the mtDNA dataset. Within the insular populations, where the three haplogroups from distinct groups were found, heterozygotes for haplotypes from the three distinct groups were not observed.
The multilocus species tree analysis with *BEAST was carried out using three alternative strategies for the a priori definition of terminal taxa. First, following results based on the CO1 dataset, we added to our multilocus dataset two individuals from the closest relatives of our insular clades. We defined a set of seven ingroup taxa, five resulting from the Bayesian analysis of the concatenated mtDNA data, plus two groups retrieved from the literature (G1-G2 [Lucentini et al., 2011] and G5-Brho2hap8 [Lucentini et al., 2011;Murria et al., 2014], here referred to as clades G and H). The *BEAST analysis was then run with the non-CO1 data from these two latter groups encoded as missing. Second, the same analysis was rerun after the exclusion of CO1 sequences derived from the literature (i.e., with five terminal taxa). Third, individuals were assigned to four terminal taxa defined according to results of the nonhierarchical cluster analysis of the nDNA data, that is, with the mainland samples grouped together. With seven groups (see Fig. 3), the T A B L E 2 Mean pairwise uncorrected sequence divergence among (below diagonal) and within (diagonal) the five main haplotype groups identified in the concatenated mtDNA dataset Standard errors of the between-group divergence estimates are given above the diagonal. Haplotype groups are encoded as in Fig. 1.
F I G U R E 2 Phylogeographic networks generated with HAPLOTYPEVIEWER, based on the Bayesian phylogenetic trees estimated for each nDNA gene fragment using MRBAYES. Circle size is proportional to the frequency of the corresponding haplotype across the dataset. Each individual was given the same color as the respective mtDNA haplogroup, for comparative purposes *BEAST analysis confirmed with high support a closer affinity between clade C and the continental clade G (rather than between clade C and the other insular clades). The divergence between the two main clades was estimated to have occurred at 31.4 Ma (95% HPD: 21.0-42.4), whereas the divergence between the two insular clades A and B was dated at 7.4 (95% HPD: 4.6-10.3). Divergence between clade A + B and its closest continental relative (H) was estimated at 19.9 (95% HPD: 12.0-29.3). Instead, the divergence between the insular clade C and clade G was dated at 2.8 (95% HPD: 0.5-4.8). Carrying out the analysis with four terminal taxa did not appreciably affect branching patterns, nodal support, or node age estimates, whereas node ages were 1-4 Ma older with five groups than with the other grouping options, although they fell well within the respective 95% HPDs (see Fig. S2).

| DISCUSSION
With the ever-increasing application of barcoding procedures to the study of biological diversity, a wealth of animal and plant species, including mayflies of the B. rhodani group, are revealing previously unsuspected levels and depths of variation, with potentially far-reaching implications for their taxonomy, systematics, and conservation, as well as for our understanding of their ecologies and evolutionary pathways. Unfortunately, however, in most cases, mtDNA is the only component of the genetic variation analyzed, which prevents a full appraisal of the biological meaning of the observed patterns of diversity (Ballaux, 2010;Hudson & Coyne, 2002).
Within the B. rhodani group, all previous studies identified deeply divergent mtDNA (CO1) lineages, often occurring in syntopy.
Nonetheless, as mtDNA provides no information about the occurrence and extent of gene exchange between diverging lineages, conclusions about the many implications of these results remained open. In the present study, we took advantage, for the first time in the B. rhodani group, of combined mitochondrial and nuclear perspectives, which allowed us to overcome the many pitfalls associated with the use of mtDNA alone when addressing patterns of genetic variation and species' evolutionary histories.
Within the Tyrrhenian islands and neighboring Elba, we found three endemic lineages (Fig. S1) that were deeply divergent in their mtDNA. Indeed, the observed degree of divergence (Table 2) largely exceeded the suggested threshold for putative species, in one case (haplogroups A + B vs. C) exceeding also the one frequently observed among congeneric mayflies (18%; Ball & Hebert, 2005). Furthermore, based on the inferred phylogenetic pattern, they appeared to be independently derived from continental relatives (see below). Nuclear DNA data confirm this pattern and, when analyzed in a Bayesian population genetics framework, clearly indicate the lack of admixture between lineages, even where the lineages are sympatric (see Fig. 1 and results of BAPS analysis). Thus, overall, our data clearly indicate that the three lineages we observed belong to three distinct species, under both phylogenetic and biological species concepts (Hausdorf, 2011 There are several sources of uncertainty in our time estimates, including our reliance on a fixed and non-organism-specific molecular rate, and the lack of internal fossil calibration points. In addition, we refrained from using the disjunction of the plate from the continent as a dating point, as we had no data supporting this choice, and we cannot firmly exclude subsequent overseas dispersal for this mayfly (Monaghan et al., 2005). In spite of these major limitations, our dating exercise yielded a time estimate for the divergence between lineage A + B and its closest continental relative clade (19.9 Ma; see  Gattaccecca et al., 2007;Brandano & Policicchio, 2012;Tomassetti, Bosellini, & Brandano, 2013), separating the Corsica-Sardinia block from the Iberian peninsula. As intervening marine barriers may in fact fragment mayfly populations, this time coincidence appears strongly suggestive of a causal relation.
In line with previous studies (Ketmaier & Caccone, 2013), our results support an important role for the microplate disjunction and the consequent paleogeographic events in the origin of the highly endemic biota of this island complex. Nonetheless, they also emphasize that this paleogeographic process has not been the only or the major forge of endemic taxa. At least two further processes contributed: (1) speciation within the island complex and (2) (Alvarez, 1972;Bellon et al., 1977;Bonin et al., 1979;Gattaccecca et al., 2007). Second, both clades have been found in Corsica, and our data are not appropriate to carry out a sound analysis of the respective ancestral areas; more data will be needed for that purpose. Third, and more importantly, without clear evidence to invoke specific scenarios, we cannot exclude nonallopatric processes of speciation from the set of hypotheses (Coyne & Orr, 2004), especially considering that knowledge of the evolutionary ecology of this group of mayflies is still virtually null. Thus, how clades A and B originated, and how they reached reproductive isolation, will necessarily be a subject for future research.
Lineage C was the most differentiated among the three endemic lineages of the B. rhodani group on the island complex. The mean estimate of the divergence time between the respective clades (31.4 Ma) lies close to the Eocene-Oligocene transition (the so-called Grande Coupure or "great break"), an epoch of major and synchronous floral, faunal, and climatic turnover (Hooker, Collinson, & Sille, 2004;Liu et al., 2009). Nonetheless, our estimate has much uncertainty, and any hypothesis about the underling processes would require a wealth of weakly justified assumptions so that, again, we prefer not to set them at all. On the other hand, it is worth noting that both for this event and for the divergence between lineage C and its closest continental relative clade, our mean time estimates (2.9 Ma in the latter case) lie close to major and parallel climatic transitions toward cooler and drier climates. Furthermore, the Plio-Pleistocene transition (~2.6 Ma) also introduced a marked seasonality in the Mediterranean region, and led to an abrupt sea-level drop that could have made environmental conditions more favorable to a colonization of the Tyrrhenian islands from neighboring continental areas (Thompson, 2005).
Finally, although the long polarized vicariance-dispersal debate has now been largely replaced by a more integrated view that sees them as variously interacting processes, it continues to form the basis of historical biogeographic investigations in several instances.
In the context of the Tyrrhenian islands, the view that the microplate disjunction (i.e., vicariance) was the leading process responsible for the islands' highly endemic biota has historically prevailed (Ketmaier & Caccone, 2013). Our results do not support this view, and add to some previous studies (e.g., Nascetti, Cimmaruta, Lanza, & Bullini, 1996) in showing that not only both vicariance and dispersal played a part, but also that other processes, which acted within the island complex, were involved. These results thus invite us, once more (e.g., Whittaker & Fernández-Palacios, 2007), to a sharper shift from an either-or logical framework to a more integrative one, which is increasingly appearing to provide a better perspective on the complexity of the processes that have molded patterns of biological diversity.

| CONCLUDING REMARKS
Patterns of genetic variation above and below the species level have remained largely unexplored in baetid mayflies until recent times. This study adds to several recent ones in showing that, in spite of their limited morphological variation, they usually show high levels of deeply structured genetic diversity. Besides retaining the genetic footprints of past evolutionary processes over a wide timescale, they are rather easy to sample (during the aquatic larval stages), are usually common and abundant at sites of occurrence, and given the very limited duration of the adult life stages (in the order of hours), they show comparatively limited dispersal abilities with respect to other flying insects. All these features make these organisms excellent models for historical biogeographic and evolutionary studies.
Here, we have identified the occurrence of three distinct and deeply divergent species within the B. rhodani group in the northwestern Mediterranean islands. For the time being, we refrained from making formal taxonomic descriptions, as we judged our data suboptimal for that purpose. This task shall await the examination (genetic and morphological) of the available type material and following comparisons, the designation of new type material.
Finally, although we did not carry out statistical phylogeographic and historical demographic analyses on our samples, given the limited sample size per lineage, it did not escape our notice that the three lineages we found showed substantial intraspecific variation. Thus, as soon as a larger sample becomes available, a thorough analysis of this level of variation will allow us to shed light also on the evolutionary processes that followed settlement on the island system and the achievement of reciprocal isolation and so will contribute to the ongoing debate concerning the relevance of intraisland microevolutionary processes.