Phylogenetic position and age of Lake Baikal candonids (Crustacea, Ostracoda) inferred from multigene sequence analyzes and molecular dating

Abstract With 104 endemic species family Candonidae is one of the most diverse crustacean groups in Lake Baikal, yet their phylogenetic relationships and position in the family have not been addressed so far. Here, we study the phylogenetic position of Baikal candonids within the family and their evolutionary history using molecular markers for the first time since their original description. We choose 10 Baikal and 28 species from around the world, and three ribosomal RNA‐s (18S, 28S, and 16S), and analyze individual and concatenated datasets using Bayesian Inference in MrBayes and BEAST. For molecular divergence time estimates, four fossil records are used to calibrate the root and three internal nodes. The 28S dataset is tested under the strict molecular clock, while for other data we use relaxed clocks. Resulting trees show incongruence between molecular and fossil divergence time estimates, with the former suggesting older ages. Strict molecular clock analysis results in narrower node age confidence intervals and younger time estimates than other analysis. All trees support at least two candonid lineages in Baikal, with two independent colonization events, and 28S suggests a major radiation between 12 and 5 Mya. This divergence time estimate mostly agrees with another, unrelated, ostracod group in the lake and other lake animals as well. Baikal candonid clades show a close phylogenetic relationship with Palearctic lineages, but their deep divergence is indicative of separate genera. Results also suggest a monophyly of tribes that today live exclusively in subterranean waters, and we offer several hypotheses of their evolutionary history.

For molecular divergence time estimates, four fossil records are used to calibrate the root and three internal nodes. The 28S dataset is tested under the strict molecular clock, while for other data we use relaxed clocks. Resulting trees show incongruence between molecular and fossil divergence time estimates, with the former suggesting older ages. Strict molecular clock analysis results in narrower node age confidence intervals and younger time estimates than other analysis. All trees support at least two candonid lineages in Baikal, with two independent colonization events, and 28S suggests a major radiation between 12 and 5 Mya. This divergence time estimate mostly agrees with another, unrelated, ostracod group in the lake and other lake animals as well. Baikal candonid clades show a close phylogenetic relationship with Palearctic lineages, but their deep divergence is indicative of separate genera. Results also suggest a monophyly of tribes that today live exclusively in subterranean waters, and we offer several hypotheses of their evolutionary history.

K E Y W O R D S
ancient lakes, evolutionary history, fossil calibration, molecular clock
Over 2,500 species have been recoded so far (Timoshkin, 2001), and more than half of the known species are endemic to this lake; many lineages are represented by series of "flocks of closely related species" (Martens, Coulter, & Goddeeris, 1994;Timoshkin, 2001).
In comparison with other ancient lakes and surface freshwater ecosystems in general, Lake Baikal has the highest proportion of crustaceans in its fauna (32%; see Martens & Schön, 1999). Amphipods are the most diverse crustacean group, with nearly 300 species (Takheev, 2000). The second largest are ostracods; according to Martens (1994) there are 174 species and subspecies here, of which 90% are endemic. Lake Tanganyika, in comparison, has 64 ostracod species and subspecies, but a slightly higher endemicity (94%) (Martens, 1994), and far more endemic genera than Baikal. However, the latter is a matter of the current systematics and previous taxonomic decisions, and does not necessarily reflect phylogeny of the group .
Both Baikal and Tanganyika have some nonendemic taxa. In Lake Baikal, Palearctic taxa live in the top 2 m of the littoral only and do not penetrate beyond this zone; only very few Baikal endemics live sympatrically with these Palearctic species (Mazepova, 1994). Present data indicate that in Lake Tanganyika many endemics abound in the upper-littoral of the lake (at depths of 0.5 m and less) (Martens, 1994). Lake Baikal is populated by two ostracod suborders: Cytherocopina and Cypridocopina. The former, predominantly a marine group, is represented by two genera: Limnocythere Brady, 1867 (one species) and Cytherissa Sars, 1925 (47 species and 10 subspecies). The latter is represented by the family Candonidae, exclusively a freshwater group, classified into three genera: Candona Baird, 1845 (48 species and five subspecies); Pseudocandona Kaufmann, 1900 (27 species and three subspecies); and Baicalocandona Mazepova, 1976 (11 species and 10 subspecies). Only Baicalocandona is endemic to Lake Baikal, while the other four genera have primarily Holarctic distributions. The family Candonidae today numbers about 500 Recent species (Karanovic, 2012;Martens & Savatenalinton, 2011), of which almost a half live either in Lake Baikal or in the subterranean waters of Western Australia (Karanovic, 2007).
A majority of Baikal candonids (and also Cytherissa) were described in two main publications: Bronstein (1947) and Mazepova (1990). These descriptions, although missing some important taxonomic information, revealed a great morphological diversity and indicated that Baikal candonids need to be revised and probably subdivided into several genera (Danielopol, Baltanás, Morocutti, & Österreicher, 2011;Karanovic, 2007Karanovic, , 2012. Karanovic (2007) provided a phylogenetic reconstruction of the family Candonidae based on morphological characters alone, erecting several new tribes, of which the largest one (the nominotypical tribe Candonini) remained paraphyletic. This is partly due to a high morphological diversity of Baikal candonids belonging to this tribe.
Thanks to their well calcified shell, ostracods are one of the most abundant microfossil groups. More than 80% of species are known only from the fossil record, which stretches back to Ordovician (Siveter, Briggs, Siveter, & Sutton, 2010). One of the most reliable characters for discrimination of higher systematic ranks, such as families, in the fossil record is the adductor muscle scar imprint on the shell. Shell ornamentation and shape are used for lower taxonomic units. The candonid shell is generally poorly ornamented and with high intrageneric shape variability, which may pose a problem in fossils identification.
The record of Candonidae from the Upper Carboniferous is dubious because of very poorly preserved shells, with undistinguishable pattern of muscle scar imprints (Sohn, 1975(Sohn, , 1977. According to Danielopol et al. (2011), the oldest Candonidae ostracod dates back to Early Jurassic and is attributed to Septacandona Cabral & Colin, 2002 from Portugal (Cabral & Colin, 2002). The exact number of fossil Candonidae is hard to corroborate partly due to a great variability in the carapace shape, but also because of discrepancy between paleontological and neontological systematics of the family. For example, Krstić (2006) provided an overview of the Pliocene ostracods from the Pannonian plane and divided the family into 11 tribes, separating genera which are very closely related based on neontological data.
There are numerous caveats for the use of fossil record for molecular clock calibrations (Parham & Irmis, 2008), but this method has nevertheless been widely applied to aid divergence time estimations in various groups (see Gandolfo, Nixon, & Crepet, 2007;Warnock, Parham, Joyce, Lyson, & Donoghue, 2014). Despite an abundant ostracod fossil record, their age is rarely used in molecular clock calibrations. In addition, a study based on 18S rRNA stipulated a high incongruence between fossil and molecular divergence time estimates in this group, partly due to the controversial taxonomy of fossil ostracods (Tinn & Oakley, 2008). Consequently, studies attempting to estimate divergence times in ostracods mostly applied universal invertebrate COI molecular clock rates proposed by Wilke, Schultheiß, and Albrecht (2009) (see Schön, Shearn, Martens, Koenders, & Halse, 2015), or rates calculated for some ostracod lineages (Schön, Martens, van Doninck, & Butlin, 2003) based on COI and ITS markers. The study of the evolutionary history and phylogenetic relationships of Lake Baikal and Lake Tanganyika Cytherocopina by Schön and Martens (2012) exerted several dating methods in order to compare the divergence times of this ostracod group in two ancient lakes. Using geological dates of the two lakes origins, fossil record, and Wilke's universal COI molecular clock, the authors settled with the last method which placed the origin of Cytherissa species flocks in Lake Baikal between 8 and 5.3 Mya, rather similar with the age estimates based on fossil record.
The age of Cytherissa in Lake Baikal is in accordance with other animal groups and shows that Baikal's diverse endemic fauna is young, but it may stem from ancient lineages (Hidding, Michel, Natyaganova, & Sherbakov, 2003). Overall, the highest species flock explosions happened in the post-Pliocene ages, when the current ecological conditions established and abyssal parts of the lake expanded and became well oxygenated (Stelbrink et al., 2015). Animal groups mostly differ in the number of lake colonization events and in the evolutionary age of colonizers. In amphipods, molecular data suggested several independent colonization events of the lake, and subsequent diversifications (Macdonald, Yampolsky, & Duffy, 2005). In addition, invading lineages were much older than the lake itself and not even closely related (see Daneliya, Kamaltynov, & Väinölä, 2011;Sherbakov, 1999).
The aim of this research was to study evolutionary history and phylogeny of Baikal candonid ostracods, which has not been done so far.
To address this problem, we use three molecular markers (18S rRNA, 28S rRNA, and 16S rRNA) and 38 Candonidae species, of which 10 are from Lake Baikal and include representatives of all three genera. We also want to verify whether the evolutionary history of Baikal candonids is congruent with Cytherissa and other animal groups in the lake. By conducting molecular divergence time analyzes on the concatenated dataset and on 18S rRNA and 28S rRNA separately, we will test weather different datasets with the same calibration points and ages render similar time estimates. In addition, our results will test if the divergence time estimates based on slowly evolving nuclear markers (such as 18S and 28S) are comparable to those based on COI (Schön & Martens, 2012).

| Collecting
Samples were taken from 11-15 m depths by SCUBA diving from the shore of Lake Baikal at Listvyanka (51°51′51.3″N 104°50′37.8″E) on September 12, 2015. Three bottom types were sampled: rock, mud, and sand. Ostracods were sorted alive on the spot and immediately fixed in 97% ethyl alcohol. Dissection and identification were performed with the aid of Zeiss Axiostar-plus light microscope and Leica DM 2500 compound microscope, equipped with N-Plan objectives.
Scanning Electron Microscope (SEM) photographs were taken with a Hitachi S-4700 at Eulji University (Seoul).

| Nomenclature choices
In this study, we followed a recent revision of Cypridocopina (Hiruta, Kobayashi, Katoh, & Kajihara, 2016) based on the molecular phylogenetic analysis in which three Candonidae subfamilies, Candoninae, Paracypridinae, and Cyclocypridinae were all erected to the family level. In our analysis for each species we retained genera names in which they were originally described, unless a new combination has been proposed later on. For example, the genus Typhlocypris Vejdovský, 1882 is considered a senior synonym of Pseudocandona (see Karanovic, 2005), but not all species described in Pseudocandona have been given a new combination, so we abstained from doing this in the present publication. Namiotko, Danielopol, Meisch, Gross, and Mori (2014) redefined Typhlocypris to include a number of species originally described in Pseudocandona, none of which is part of our analysis. The same authors retained Pseudocandona for the rest of the species.

| DNA extraction and amplification
In the first step of the DNA extraction, specimens were kept for 2-3 hr in distilled water. LaboPass Tissue Mini extraction kit (Cosmo Genetech Co., Ltd, Korea) was used in all further steps of extraction, following the manufacturer's protocol. Fragments of 28S were amplified using the primer pairs dd/ff, ee/mm, vv/xx from Hillis and Dixon (1991), of the 18S with primers from Yamaguchi (2003), and fragments of 16S were amplified with primers from Palumbi et al. (1996), all using a TaKaRa PCR Thermal Cycler Dice. For all amplifications, PCR reactions were carried out in 25 μl volumes, containing: 5 μl of DNA template, 2.5 μl of 10× ExTaq Buffer, 0.25 μl of TaKaRa Ex Taq (5 units/μl), 2 μl of dNDTP Mixture (2.5 mmol/L each), 1 μl each primer, and 13.25 μl distilled H 2 O. The PCR protocol for 28S consisted of initial denaturation for 5 min at 94°C, 40 cycles of denaturation for 35 s at 95°C, annealing for 1 min at 50°C, extension for 1 min at 72°C.
Final extension was at 72°C for 5 min. PCR settings for the amplification of 18S followed Yamaguchi (2003) for each corresponding primer pair. Settings for 16S consisted of initial denaturation at 94°C for 5 min, 35 cycles of denaturation for 30 s at 94°C, annealing for 30 s at 48°C, extension for 1 min at 72°C. Final extension was at 72°C for 10 min. The PCR products were electrophoresed on 1% agarose gels; if DNA was present the products were purified for sequencing reactions using the LaboPass PCR Purification Kit, following the guidelines provided with the kit. DNA was sequenced on an ABI automatic capillary sequencer (Macrogen, Seoul, South Korea) using the same set of primers always in both directions.

| Molecular data analysis
All sequences were visualized using Finch TV version 1.4.0 (http:// www.geospiza.com/Products/finchtv.shtml). BLAST (Altschul, Gish, Miller, Myers, & Lipman, 1990) analysis of GenBank database was used to check that the obtained sequences were ostracod in origin and not contaminants. Each sequence was checked for the quality of signal and sites with possible low resolution, and corrected by comparing forward and reverse strands. Sequences were aligned in MEGA 7 (Kumar, Stecher, & Tamura, 2016) with ClustalW (Thompson, Higgins, & Gibson, 1994) with extension penalty changed from default settings (6) to 1 for 28S dataset in order to allow alignment of homologous regions that were separated by expansion segments present in some taxa but not others. All alignments were manually checked and corrected where necessary. The 28S alignments were also checked with Gblock (Castresana, 2000), and ambiguous blocks were removed. We analyzed alignment of each gene and all three regions of 28S amplified with different primes (dd/ff, ee/mm, vv/xx) separately. In addition, we performed two analyzes of the concatenated dataset: one including all three genes, and the other with only three 28S fragments; the latter was used only in the divergence time estimations. In the concatenated datasets, some species datasets were composed of sequences acquired from different specimens in order to avoid missing data, and for our outgroup we combined 16S from a different, but closely related, species. Missing data in concatenated datasets were coded "?". Recent simulations and empirical analyzes suggested that missing data in Bayesian phylogenetics are not themselves problematic, and that incomplete taxa can be accurately placed as long as the overall numbers of characters are large (Wiens, 2003;Wiens & Moen, 2008).
Sequence differences within and between groups in each individual alignment, as well as in concatenated datasets, were calculated in MEGA 7 using simple p-distance method. Sequences are divided into groups, defined by the genus they belong to. For the best fit evolutionary model program, jModelTest 2.1.6 (Darriba, Taboada, Doallo, & Posada, 2012;Guindon & Gascuel, 2003) was used with the Akaike information criterion (Hurvich & Tsai, 1989). Bayesian inference reconstruction in MrBayes v3.2.6 (Huelsenbeck & Ronquist, 2001;Ronquist & Huelsenbeck, 2003;Ronquist et al., 2012) was performed with the best fit model and priors for the base and state frequencies calculated by jModelTest. For the concatenated set, data were partitioned into five blocks corresponding to gene regions, each with its fixed priors. All analyzes ran with four chains simultaneously for two million generations in two independent runs, sampling trees every 200 generations. Of the four chains three were heated, and one was cold, the temperature value ("Temp" command in MrBayes) was 0.1 (default option). The results were summarized, and trees from each MrBayes run were combined with the default 25% burn-in. A >50% posterior probability consensus tree was constructed from the remaining trees.
For the choice of the outgroup we relied on the phylogeny published in Hiruta et al. (2016). As the relationships within Cypridoidea were not clearly resolved and Candonidae appears as a sister taxon to all other Cypridoidea, we decided on a representative of Cyclocyprididae, which used to be in the same family with Candoninae. For details of the number of original sequences, their sampling localities as well as for those downloaded from GenBank (Supplement 1).
Saturation test and likelihood ratio test for deviation from molecular clock of each separate dataset were performed with DUMBE5 (Xia, 2013), while for the concatenated datasets marginal model likelihood using stepping stone algorithm was applied to test molecular clock in MrBayes. After examining the consensus tree resulted from separate and concatenated analysis we chose four nodes to calibrate the molecular clock in the divergence time analysis performed in BEAST v1.8.3 (Drummond, Suchard, Xie, & Rambaut, 2012). Three analyzes were run as follows: concatenated dataset with all three genes, 18S dataset, and combined 28S dataset. The last differed from the first two in using strict clock model, while in the case of the first two we  Danielopol, 1968). All other priors were set to default program options. We conducted two independent runs for each analysis, each for 10,000,000 generations, sampling every 1,000 generations.

| Taxonomy
The samples collected from the lake contained representatives of  Mazepova, 1990; Candona rupestris Mazepova, 1990;and Candona spicata Mazepova, 1982 ( Figure 1a-f). Beside these six species, another four have been included in the analysis, but not identified to the species level as they were all at some of the juvenile stages. Two species were placed in Pseudocandona because they had strongly ornamented rectangular shells, typical for the Baikal Lake representatives of this genus (Figure 1g, h). Candoninae 7 and Candoninae 10 were left without any generic assortment. They both had a smooth carapace.

| Sequence diversity
The The amplification by em primer pair was relatively successful, but this was the most difficult dataset to aligned due to the long expansion segments present in several species. Although initially this alignment was very long (1,521 base pairs), after the Gblock analysis (Castresana, 2000) it was truncated substantially.
GTR (Rodríguez et al., 1990) (or its variations) with unequal rates among sites, with gamma distribution and invariable site (GTR + G + I) for 18S, 16S, 28S (df and vx fragments), but without invariable sites for 28S, em fragment was chosen as the best fit evolutionary model.
Supplement 2 summarizes general information for each alignment and also includes the base and rate frequencies, proportion of invariable sites and gamma shape.
The results of pairwise distance analysis are shown in Figure 2.
Within group means did not exceed 4% in any of the datasets.
Between group means varied from 5% for 18S to 13% for 16S. Of the three fragments of 28S, em was the most variable, followed by vx, and df fragments. The results of the p-distance analysis show that 16S is by far the fastest evolving gene, followed by 28S, and 18S, although there is little difference in values between the latter gene and the 28S df fragment.

| Phylogeny
After two million generation runs in MrBayes, the final standard deviation of split frequencies fell below 0.01 (for all datasets it was around 0.003) and the potential scale reduction factor was ~1.0 for all parameters, suggesting that convergence had been reached. All resulting consensus trees were rooted with the outgroup-Physocypria biwaense and P. cf. biwaense, or P. sp. in the case of 16S dataset analysis. Figure 3 illustrates the 50% consensus tree resulting from the analysis of the concatenated dataset. On this tree, Candonidae is strongly supported as a monophyletic group. The Candonidae clade can be broadly divided into two subclades, both with high posterior probability values: one containing 15 sequences equating to nine species, and the other which incorporates 34 sequences belonging to 28 species. The former clade contained four Candonidae tribes, proposed by Karanovic (2007): Cryptocandonini (letter "b"), Candonopsini (letter "c"), Trapezicandonini (letter "d"), and Humphreyscandonini (letter "e"). Candonopsini was a sister taxon to Trapezicandonini, while Humphreyscandonini was the sister taxon to these two. These The larger clade on the tree was composed of two tribes. All ex-  Kass & Raftery, 1995). (on all datasets) reconstructed this node as 50+ million years. Time trees differed in the 95% confidence intervals for the node heights, in that the intervals for 28S dataset were much narrower than those on the concatenated dataset. Traces analysis in Supplement 3 summarizes some of the BEAST results. Although, estimated sample sizes did not fall below threshold value of 100, in the analysis of the concatenated dataset they were significantly lower than in the 28S analysis.

| Divergence time estimates
The fact that both divergence time analyzes produced consistently older estimates than the fossil record suggested is in accordance with previous studies on ostracods based on 18S (Tinn & Oakley, 2008).
These authors showed that the molecular divergence rates differ among ostracod lineages, and that molecular time estimates are not always older than the fossil record would suggest. In contrast to our analyzes, they showed that the relaxed clock model aligns fossil and molecular time estimates better than the strict clock. In our study, the strict clock model applied to the 28S dataset analysis resulted in smaller age differences between fossil and molecular divergence time estimates. One of the reasons why the concatenated dataset suggested older dates is that the 18S dataset when analyzed alone (results not included in this study) placed divergence time even further back in the past, so the concatenated dataset reflected a consensus between 18S and 28S rates of evolutions. We agree with Tinn and Oakley This implies that the former group evolved before Lake Baikal was formed, while the latter may have evolved in some shallow lakes which preceded the formation of today's conditions. However, the latter group is more closely related to a group of typical European Candona species than to their Baikal congeners and the most recent common ancestor of this clade appeared 60 Mya. The 28S analysis dated the origin of these two Baikal lineages to a more recent time (12 and 5 Mya respectively), which would allow for the possibility that they both evolved in some shallow lakes in the Lake Baikal region. However, some caution is necessary with this interpretation, as in the period between Upper Miocene and Lower Pliocene (10-5 Mya) a great diversification of Candonidae lineages occurred in the Pannonian basin (Central Europe) after the closure of Paratethys under the conditions of decreased salinity in the Pannonian Sea (see Krstić, 1972). The habitat shift from saline to freshwater may have prompted this diversification, as it apparently happened with the Tethyan amphipods (Hou, Sket, & Li, 2011). This is important because, some of the Pannonian candonids (allocated to various genera described from this fossil record) strongly resemble forms found today in Lake Baikal.
Our results of the 28S divergence time estimates are very similar to those published by Schön and Martens (2012) for the other endemic ostracod group here, Cytherissa. Previously, and based on morphological data alone, Danielopol, Olteanu, Löffler, and Carbonel (1990) suggested that the Baikal Cytherissa flock originated through several independent radiations; they recognized at least three groups, one with earlier colonization time than the other two. This has been confirmed with the molecular divergence time estimates (Schön & Martens, 2012), which placed the time of this Baikal group diversifications between 8 and 5.3 Mya. Since Schön and Martens (2012) study was based on mitochondrial COI gene with general COI invertebrate clock (Wilke et al., 2009), and on a much larger sample size of Baikal species, this put more weight on our younger time estimates based on 28S. In general, Baikal ostracods diversification times are highly congruent with other animal groups (Kontula et al., 2003;Stelbrink et al., 2015;Zubakov et al., 1997, etc.).
On the other hand, beside potentially receiving fauna from various parts of the world, Lake Baikal was potentially also the fauna source.  Schön & Martens, 2012) and other groups with species flocks in Lake Baikal: Sculpin fishes have a high diversity in Baikal and one closely related species in Lake Michigan (see Sherbakov, 1999); and an amphipod species found in Finish streams has closest relatives in Baikal (Vainola & Kamaltynov, 1995). Karanovic and Abe (2010) and Karanovic, Grygier, and Lee (2013) attributed to ancient lakes a role of biodiversity pumps for subterranean habitats in addition to their role as refugia, because their deep and dark benthic environments provide ideal conditions for the evolution of subterranean adaptations. Our resulting trees did not reveal a close connection between subterranean ostracods and those from Lake Baikal, but our sample from the lake was limited.  Danielopol, 1980), and this may be the time of Candonini colonization as well. We believe that by the time Candonini started colonizing subterranean waters, this ecosystem was already inhabited by other candonid lineages, causing strong competition. There is a possibility that the clade consisting of the tribes Trapezicandonini, Cryptocandonini, and Humphreyscandonini invaded subterranean waters from marine environments, which has also already been postulated for unrelated ostracods and other crustacean groups (see Danielopol, 1980). However, supposedly the most recent common ancestor of the genus Trapezicandona lived about 6 Mya in cold fresh surface waters (Danielopol, 1968), contradicts this hypothesis.

Schön and Martens (2012) recovered at least four lineages within
Baikal Cytherissa species flock, but the basal branches remained unresolved, and the authors believe that assignment of all species to one genus underestimates real morphological variability. The abovementioned example of homoplasy related to the male sexual bristles is just one of many cases of convergent evolution in candonid ostracods.
There are numerous examples from subterranean ostracods from Western Australia (Karanovic, 2007). This is particularly true for the shell shape and ornamentation. Projecting shells of the Baikal candonids onto either of the resulting trees shows little congruence between the shape/ornamentation and phylogeny. Soft parts morphology remains obscure for all Baikal candonids, and further conclusions need to wait detailed taxonomic studies, because the morphology of hemipenis seems to best reflect phylogenetic relationships between candonid lineages (see Karanovic, 2007).

| Phylogeny of candonidae
Our analyzes supported five of the eight Candonidae tribes proposed by Karanovic (2007), and molecular phylogeny was almost identical to the morphological one proposed in the same publication.  (Ekman, 1908), belong to a yet undescribed genus.
The morphological phylogeny of Candonidae was carried out on the genus level and could not reveal polyphyletic nature of several Candonini genera, although this tribe was in fact the only paraphyletic lineage in Karanovic's (2007) cladistic analysis. The present molecular study showed that the most diverse Candonini genera