Parallel evolution in Ugandan crater lakes: repeated evolution of limnetic body shapes in haplochromine cichlid fish

The enormous diversity of East African cichlid fishes in morphology, coloration, and behavior has made them a model for the study of speciation and adaptive evolution. In particular, haplochromine cichlids, by far the most species-rich lineage of cichlids, are a well-known textbook example of parallel evolution. Southwestern Uganda is an area of high tectonic activity and is home to numerous crater lakes. Many Ugandan crater lakes were colonized, apparently independently, by a single lineage of haplochromine cichlids. This system can therefore be considered a natural experiment in which one can study the interaction between geographical isolation and natural selection in promoting phenotypic diversification. We sampled 13 crater lakes and six potentially ancestral older lakes and, using both mitochondrial and microsatellite markers, discovered strong genetic and morphological differentiation whereby (a) geographically close lakes tend to be genetically more similar and (b) three different geographic areas seem to have been colonized by three independent waves of colonization from the same source population. Using a geometric morphometric approach, we found that body shape elongation (i.e., a limnetic morphology) evolved repeatedly from the ancestral deeper-bodied benthic morphology in the clear and deep crater lake habitats. A pattern of strong genetic and morphological differentiation was observed in the Ugandan crater lakes. Our data suggest that body shape changes have repeatedly evolved into a more limnetic-like form in several Ugandan crater lakes after independent waves of colonization from the same source population. The observed morphological changes in crater lake cichlids are likely to result from a common selective regime.


Background
The spectacular species richness of cichlid fishes and their famous phenotypic diversity in terms of morphology, coloration, and behavior have made them a well-known textbook model system for the study of speciation and adaptive evolution [1][2][3]. The adaptive radiations of cichlid fishes in East Africa are also renowned for their astonishingly fast rates of speciation [4][5][6][7]. The most species-rich endemic cichlid species flocks are made up entirely of species that belong to one particular lineage of cichlids known as the tribe Haplochromini [2,4,5,8,9]. The adaptive radiation of cichlids in Lake Victoria has attracted particular attention from biologists because its ~500 endemic species probably arose within less than 100,000 years [5,6], which translates to one of the fastest known rates of speciation [10].
Another fascinating aspect of cichlid evolution is the frequent occurrence of evolutionary parallelisms, where species from different lakes have independently evolved a remarkable phenotypic resemblance, converging on several traits, including coloration, body shape, and trophic morphology [1,9,11,12]. Parallel morphological evolution has been considered to be strong evidence for similar regimes of natural selection being at work in driving diversification [13]. By studying repeated parallel evolution, the independent evolution of similar morphologies from a recent common ancestor in isolated and similar environments, we are investigating whether natural selection alone might be sufficient to produce these parallel morphologies, or whether genetic drift, geographic isolation, developmental or genetic bias has influenced the direction of diversification [14][15][16]. Multiple crater lakes are an ideal system for the study of parallel evolution in body shape in cichlids where similar morphs have repeatedly evolved under comparable ecological niches [17,18].
The crater lakes in southwestern Uganda (close to the Kazinga Channel that connects Lakes Edward and George, Figure 1) represent one of the few natural experiments in which one can study whether independent parallel diversification took place after independent colonization events. This region of Uganda is biologically almost completely unexplored. Over 50 crater lakes were created in this area by extensive volcanic activity in the East African Great Rift Valley. Geologists date the earliest volcanic activity in this region to approximately 50,000 years ago [19]. Some of these lakes were established through temporal connections with nearby river systems. Until now, only a few of these crater lakes (e.g., Lake Kyamwiga and Lake Nkugute) have been studied [20,21], and each lake was found to contain one genetically and morphologically distinct haplochromine species (Haplochromis "Lutoto" in Lake Nkugute and H. "Nshere" in Lake Kyamwiga). Analyses using mitochondrial and nuclear markers [20] suggested that these two new species are distinct, but originated from the same founding populations derived from the Kazinga Channel (Figure 1).
Based on these initial findings, it is reasonable to predict that more Ugandan crater lakes might harbor endemic haplochromine cichlid species, making these lakes an interesting natural experiment that permits one to study the interaction between geographical isolation and natural selection promoting phenotypic diversification and speciation. Each crater lake probably provides new and different habitats that are not found in the rivers or great lakes, such as clear and deep open water niches, each of which might exert similar selective regimes.
Here, we report on the first phylogeographical investigation of the haplochromine cichlids of the Ugandan crater lakes combining morphological and population genetic analyses. Using both mitochondrial and microsatellite markers, we inferred the phylogeographic relationships among 16 lakes within the region (Figure 1). Based on this, we also reconstructed the source populations as well as estimated time of colonization for each of the studied crater lakes. We then tested, using geometric morphometrics, whether independently colonized crater lakes, with characteristic larger pelagic zones, promoted the repeated evolution of limnetic-like body shapes from the ancestral deeper-bodied colonizing species.

Genetic differentiation
Medium to high levels of genetic polymorphism were detected for most of the microsatellite loci (Table 1). Allelic richness ranged from 2.77 (MAF) to 6.46 (KAZ). Despite the fact that heterologous primers were used, no sign of null alleles was detected. Moreover, no clear signal of balancing or directional selection was detected with LOSITAN software for the panel of 15 microsatellites employed in this study. Strong genetic differentiation was found between crater lakes using both microsatellite and mtDNA markers, in marked contrast to high levels of gene flow between the great Rift Valley Lakes Edward and George (EDW and GEO) and the Kazinga Channel that connects them (Tables 2 and 3). High levels of genetic differentiation between lakes and regions were also suggested by means of a principal coordinate analysis (Additional file 1) and the clustering pattern determined with STRUCTURE (Figure 2). The most likely number of clusters was determined to be three, following Evanno's Delta K correction procedure [22]. This suggests that there are three geographically distinct groups: one group composed of both great lakes and the river connecting them (EDW-KAZ-GEO, in gray), one formed by the northern and central crater lakes (in green), and one including the southwestern crater lakes (in purple). Interestingly, further differentiation was detected when each of the previously mentioned clusters was analyzed separately. This is in agreement with high levels of differentiation found even within regions (Table 2). The group affiliations are in concordance with the results found in the haplotype network (see Figure 3).
Central haplotypes in the haplotype network were almost exclusively from great lakes Edward and George (EDW-KAZ-GEO), supporting the hypothesis that these are the ancestral populations (Figure 3). Moreover, three different mitochondrial lineages (northern, central and southern haplotypes are shown in yellow, blue and purple, respectively) were found corresponding to the geographically defined groups of crater lakes. Most of the haplotypes found in the crater lakes were only a few mutations apart from the ancestral ones, providing evidence that these populations have diverged rather recently.

Colonization time
Most of the crater lake populations showed a distinctive pattern of population expansion (Fu's tests were significantly negative), probably following colonization (Table 4). However, this pattern was not so clear in the great lakes Edward and George, where the population sizes seem to be more stable.
Clear asymmetric gene flow was detected for all the crater lakes, with, as expected, a much higher effective migration from the great lakes (EDW-KAZ-GEO) to the crater lakes than in the other direction (Table 3). This pattern reinforces the idea that the great lakes George and Edward were the older, larger and ancestral populations for the haplochromine cichlids that subsequently colonized the crater lakes of southwest Uganda.
Estimation of divergence times, together with the fact that three different mitochondrial lineages were found (see Figure 3), suggests that there were at least three independent waves of colonization from the source population (Table 5). The oldest colonization event took place around 89,000 years ago to the crater lakes located geographically closer to the source lakes, those in the center (CEN) of the study area (KAT and MIR). Fittingly, these lakes also contain the largest haplotype diversity among the haplochromine cichlids sampled (Figure 3). The older age of these crater lake haplochromine populations is also supported by the presence of many exclusive alleles and the fact that their private haplotypes are separated by several mutations in the haplotype network (Figure 3). The crater lakes located in the northern region of the study area (CHI, BUG, KAB) were colonized around 71,000 years ago, whereas it seems that the geographically more distant lakes (southern part of the study area) were colonized more recently, around 50,000 years ago. This pattern is also supported by the fact that southern crater lakes share some central haplotypes with the great lakes, suggesting a more recent divergence or several colonization events. Based on the phylogenetic analyses of the microsatellite data, all southern crater lakes grouped together in a NJ tree and were clearly separated from the central and northern lakes (Additional file 2).

Morphological differentiation
No morphological differentiation was detected among fish collected from the great lakes (EDW-KAZ-GEO). However, a clear pattern of morphological differentiation was found between the haplochromine cichlids from the source and crater lake populations. No overlap was detected in the measured morphospace between the great lakes and the crater lakes except in a few individuals from the southern lakes (Figure 4A). Only one morphometric cluster was found for each crater lake, suggesting no intralacustrine differentiation. Nevertheless, we found that haplochromine cichlids from the source and the young crater lakes have significantly different body shapes (Hotelling's T² test, P < 0.0001). Crater lake cichlids have shallower body shapes (Landmarks 6, 9 and 10) relative to the great lake source cichlids (see the thin plate spline representation in Figure 4B and Additional file 3). Interestingly, this pattern of morphological differentiation between source and crater lakes was consistent for most of the pairwise comparisons (Additional file 3), providing evidence for repeatedly evolved limnetic body shapes.
A positive correlation (r = 0.448, P = 0.052) between genetic (Fst) and morphometric (Procrustes) distance was found, suggesting that younger crater lakes (e.g. southern lakes) are more similar to the source populations.

Discussion
We found clear evidence for strong genetic and morphological differentiation of haplochromine cichlid fishes from the crater lakes and the source lakes (Edward and George) in Uganda. Based on mitochondrial markers, at least three different waves of colonization were suggested to have occurred, coinciding with increased geological activity around 50,000 years ago [19]. In addition to genetic differences, morphological differentiation was associated with the colonization of the Ugandan crater lakes. Crater lake fish are more slender than those from the shallow source lakes. Thus, the repeated evolution of elongated body shapes is likely to be an adaptation to living in the open, clear and deep waters of crater lakes compared to the murky and shallow waters of their ancestral lakes and riverine habitats.
Figure 2 Bayesian population assignment test based on 15 microsatellite loci with STRUCTURE. A hierarchical analysis was performed. The most likely number of clusters after Evanno's Delta K correction corresponds to K = 3 (source, northern + central and southern lakes represented in gray, green and purple, respectively). Further analyses were performed for each of these clusters separately and the most likely number of clusters is shown.

Genetic differentiation
As expected, higher levels of genetic diversity were found in the great lakes than in the much younger crater lakes. The degree of genetic differentiation between crater lakes, even among those from the same region, supports a scenario in which each of the crater lakes constitutes a geographically isolated population. Moreover, no intralacustrine differentiation was detected, and each crater lake (except for Mugogo) consists of only one genetic cluster based on the genetic marker set employed.
Although based on only a relatively small number of individuals (N = 38), clear genetic differentiation was found between riverine and crater lake haplochromine cichlids in Lake Mugogo (see STRUCTURE plot for Lake Mugogo, MUG in Figure 2), suggesting no gene flow between riverine and crater lake populations. Consequently, our results indicate that these populations diverged mainly in allopatry, which is generally considered to be the most common and plausible mode of speciation [23][24][25].
Clear genetic and morphological differences exist between several lakes, suggesting that some of these crater lakes likely harbor undescribed and endemic haplochromine species. These findings should be corroborated by further taxonomic investigations.

Colonization/parallelism
The observed patterns of genetic differentiation and asymmetric gene flow found with both kinds of genetic markers support at least three independent colonization events from the great lakes into sets of geographically clustered crater lakes. This finding is in line with evidence from previous studies that suggest that two crater lakes within this region were colonized by the Edward-Kazinga-George system [20]. Moreover, several central haplotypes were shared with those found in the older Lake Kivu, which was suggested to be ancestral to Lake Victoria [6]. Interestingly, the central haplotype 56 that connects Lake Kivu (LK) with Lake Victoria was suggested to be exclusive to Lake Kivu in previous studies, supporting the crucial role of Lake Kivu haplochromines in the evolution of the haplochromines of eastern Africa [6]. However, in the present study we found that this haplotype is also present in the Edward-Kazinga-George system, highlighting the relevance of this region during the stepwise colonization by haplochromine cichlids of the Lake Victoria region from the Lake Kivu region. Strikingly, every independent colonization event of Ugandan crater lake cichlids was associated with a morphological change in the same direction, that is, the evolution of a more slender (limnetic-like) body shape. An elongated, more streamlined body is usually associated with the exploitation of open water habitats [26][27][28]. Moreover, the repeated pattern of phenotypic divergence in concert with the use of certain habitats has been taken as evidence for the important role of natural selection in the generation of diversity [29,30]. Alternatively, only a limited set of phenotypes might be obtained in evolution, and the entire morphospace is not available for all lineages [14,31,32].
Indeed, an eco-morphological differentiation along the limnetic-benthic axis is taxonomically widespread and has long been reported as the most common pattern of divergence in freshwater fishes [28,33,34,35,36]. Also, divergence along a benthic to limnetic axis is common in cichlids [17,37,38,39,40] and we are beginning to identify the genomic basis for such ecologically divergent body shapes [41]. In our case, even though phenotypic plasticity cannot be completely ruled out, it does not explain much of the variance in shape found when wild-caught larval fish are reared in the lab (see Additional file 4). Clear morphological differentiation between lab-reared crater lake fish and fish from the source lakes was found (Hotelling's T² test, P < 0.001, Additional file 4). Thus, crater lake fish reared in lab conditions maintain shallower body shapes than those of wild fish captured in the great lakes. Altogether, this would suggest a genetic component of the elongated body shape characteristic of Ugandan crater lake cichlids.
A scenario in which the riverine haplochromines had first differentiated into a more elongated shape and later colonized the crater lakes cannot be completely discarded. However, it is rather unlikely, because all the haplochromine cichlids found in the ancestral lakes (including the Kazinga Channel) have deeper body shapes, and each of the three mitochondrial lineages (waves of colonization) originated from the central haplotypes (Figure 3, in gray). Moreover, riverine species inhabiting the proximity of the crater lakes were genetically different from the crater lake populations (see the STRUCTURE plot for Lake Mugogo, MUG in Figure 2). Unfortunately, the low number of riverine specimens included in this study precluded a proper geometric morphometric comparison between crater lake and riverine individuals.
Estimates are based on four independent runs. T in thousands of years (95% confidence interval), M = 2Nem. Groups 1, 2 and 3 correspond to the different mitochondrial lineages suggested from Figure 3.

No intralacustrine diversification
Although clear genetic and morphological differentiation was found between each crater lake and the great lakes, no signal of intralacustrine diversification was detected. Different factors, such as temporal and spatial variation, ecological opportunity, and lineage-specific features have been proposed to affect the propensity for intralacustrine radiation [1,42,43]. Lineage-specific characteristics do not seem to cause the absence of divergence within Ugandan crater lakes, because members of the same tribe, the haplochromine cichlids, have undergone some of the greatest radiations in other African lakes [2,4,5].
The line terminus refers to the shape change along a particular axis, compared with the average shape (black dot).
A positive correlation between the size of the lake and its species richness is expected [17,42]. In general, one would expect that the area of a lake is positively correlated with higher environmental heterogeneity. Hence, niche diversity would tend to increase with size as well as the opportunity for isolation by distance (but see Wagner et al. [44]). Even though Ugandan crater lakes are generally very small (<1 km², see Table 1), intralacustrine diversification has been found even in smaller crater lakes such as those in Cameroon [45] and Nicaragua [46,47]. The estimates of divergence times are similar to those previously calculated for two other Ugandan crater lakes [20]. It might be expected that because colonization occurred so recently (around 50,000 years ago), there has not been enough time to complete speciation within each lake. However, intralacustrine divergence in cichlid fishes has been detected in much younger lakes, such as Neotropical and African crater lakes, where ecological speciation has been suggested [48,49]. Possible reasons for the perceived lack of intralacustrine diversification in the Ugandan crater lakes might be that deep, clear open-water niches, like those found in the Nicaraguan crater lakes [47], might be much smaller or missing in the very small and relatively shallow crater lakes of Uganda.

Conclusions
A pattern of strong genetic and morphological differentiation was observed in the Ugandan crater lakes, suggesting that this system might still harbor several undescribed endemic species. The patterns of colonization events suggest that lakes that are geographically close tend to be genetically more similar, and that crater lakes in three different geographic areas have been colonized by three independent waves of colonization. Our data suggest that body shape changes have repeatedly evolved into a more limnetic-like form in several of these natural replicates. The observed morphological changes in Ugandan crater lake cichlids are likely to result from a common selective regime.

Sampling
A total of 337 haplochromine cichlids were collected (sample collection permit FISH201011/AU1) from 13 different lakes in southwestern Uganda, from Lakes Edward and George, and the Kazinga Channel (Figure 1 and Table 1) in November 2011. Fish from both the shore and the middle of the lakes were sampled whenever possible in order to have representative samples from each lake. In addition, riverine haplochromine fish were also collected from a river close to the crater lake Mugogo. Fish were caught using seine nets and hand nets with the assistance of local fishermen. Each fish was euthanized with an overdose of MS-222, labeled and photographed in the field. A small tissue sample was preserved in pure ethanol and stored at 4°C until DNA extraction. These specimens were combined with previously collected samples from Lakes Victoria, Albert and Kivu [6] stored in Axel Meyer's collection at the University of Konstanz.
The mitochondrial control region (CR) (838 bp) was PCR amplified using published primers and reaction conditions (L-Pro-F [55]; 12S5R, 5′-GGC GGA TAC TTG CAT GT-3′) on a GeneAmp PCR System 9700 Thermocycler (Applied Biosystems). The PCR products were purified using the QIAquick PCR Purification kit (QIAGEN), and sequenced in both directions with the BigDye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems). Sequencing products were analyzed on an ABI 3130 Automated Sequencer (Applied Biosystems). Mitochondrial DNA sequences were aligned using the software SEQUENCHER v. 4.2 (Gene Code Corporation) and verified by eye.

Mitochondrial data analysis
Mitochondrial CR sequences were edited using the BioEdit Sequence Alignment Editor software [56] and aligned with the ClustalW application included in BioEdit. The different haplotypes were obtained with the program DNASP [57] and submitted to GenBank (accession numbers KP406813-KP406919). MODELTEST v3.7 [58] was employed to determine the model of sequence evolution that best fit the datasets and to calculate the proportion of invariable sites and the value of the gamma distribution shape parameter.
Deviations from equilibrium were tested with Fu's Fs neutrality test [63], based on an infinite-sites model without recombination. Negative values of Fu's Fs are expected under a model of sudden population expansion. Mismatch distributions [64] were also calculated to investigate demographic changes. Time since lineage expansion (t) was calculated from τ = 2μt, where t is the expansion time and μ is the mutation rate per million years per nucleotide.
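The conversion from τ to an expansion time can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: it assumes, as is common in mismatch analyses, that the per-nucleotide rate is multiplied by the sequence length before solving τ = 2μt for t. The function name and the numerical values of τ and μ are hypothetical; only the 838 bp sequence length comes from the text.

```python
def expansion_time_years(tau, mu_per_site_per_myr, seq_length_bp):
    """Solve tau = 2 * u * t for t, with u the rate for the whole sequence.

    Assumption: u = mu_per_site_per_myr * seq_length_bp (per sequence per Myr).
    Returns the expansion time in years.
    """
    u = mu_per_site_per_myr * seq_length_bp  # mutations per sequence per Myr
    t_myr = tau / (2.0 * u)                  # expansion time in million years
    return t_myr * 1e6                       # convert to years


# Hypothetical inputs: tau from a mismatch distribution, a placeholder rate,
# and the 838 bp control region fragment described in the text.
example = expansion_time_years(tau=2.0, mu_per_site_per_myr=0.05, seq_length_bp=838)
print(f"Estimated time since expansion: {example:.0f} years")
```

Because t scales inversely with μ, any uncertainty in the assumed substitution rate carries over directly to the estimated expansion time.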
Divergence times and migration rates among source and crater lake populations were estimated by comparing mitochondrial sequences with the program MDIV [65]. Initial runs were tested under a finite sites (HKY) model of evolution and default priors (M = 10, T = 5) to approximate the posterior distribution of scaled migration rate (M) and time since divergence (T), while allowing MDIV to estimate θ. We ran the MCMC for 5 million generations with 600,000 generations discarded as burn-in. Convergence was determined by evaluating the consistency of model values for each of the three parameters across four runs, which were then averaged to calculate mean θ, M and T values ± standard deviation. Time of divergence was calculated as tdiv = Tθ/(2Lμ) [65], where T (or TMRCA) and θ were estimated by the height of the posterior distribution, L is the sequence length analyzed, and μ is the mutation rate. Divergence times were estimated based on two different substitution rates [66] (see Table 5).
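The divergence time formula above can be written out as a short sketch. The function name and all numerical inputs below are hypothetical placeholders, not values from this study; the result is expressed in whatever time unit μ is given per (for example years, if μ is per site per year).

```python
def divergence_time(T, theta, seq_length_bp, mu_per_site):
    """Compute tdiv = T * theta / (2 * L * mu).

    T and theta are the scaled estimates (e.g. posterior modes from MDIV),
    seq_length_bp is L, and mu_per_site is the substitution rate per site.
    """
    return (T * theta) / (2.0 * seq_length_bp * mu_per_site)


# Hypothetical example: the same scaled estimates converted under two
# alternative placeholder substitution rates (per site per year).
for rate in (3.0e-8, 6.0e-8):
    t = divergence_time(T=0.5, theta=8.0, seq_length_bp=838, mu_per_site=rate)
    print(f"rate={rate:.1e} per site per year: tdiv = {t:.0f} years")
```

Running the conversion under two rates, as in the loop, mirrors the idea of reporting divergence times for alternative substitution rates: doubling μ halves the estimated time.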

Microsatellite data analysis
Scoring errors, large allele dropout and null alleles were checked in MICROCHECKER [67]. The LOSITAN software [68] was used in order to test for neutrality. Microsatellite variation (allelic richness per locus, observed and expected heterozygosity) was calculated with the program GENETIX 4.05 [69].
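For readers unfamiliar with these summary statistics, observed and expected heterozygosity at a single locus can be computed as in the sketch below. This is a generic illustration and is not claimed to reproduce the internal routines of GENETIX; the function name and the genotype data are hypothetical. Expected heterozygosity is taken as 1 minus the sum of squared allele frequencies, with the usual small-sample correction 2n/(2n - 1) also reported.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He, He_unbiased) for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    total_alleles = 2 * n
    # Count every allele copy across all individuals.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2).
    he = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    # Small-sample correction factor 2n / (2n - 1).
    he_unbiased = he * total_alleles / (total_alleles - 1)
    # Observed heterozygosity: fraction of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n
    return ho, he, he_unbiased


# Hypothetical genotypes (allele sizes in bp) for four individuals.
sample = [(100, 102), (100, 100), (102, 102), (100, 102)]
print(heterozygosity(sample))
```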
The program ARLEQUIN [59] was employed to estimate FST values between sample pairs and their statistical significance (i.e., the significance of population differentiation), with the following settings: 1000 permutations for significance, 10,000 steps in Markov chain. Levels of significance for multiple tests were determined using sequential Bonferroni adjustments for simultaneous tests [70] whenever relevant.
The software STRUCTURE v2.3 [71] was used to assess the number of genetic clusters (K) using a Bayesian approach. A burn-in period of 50,000 steps followed by 500,000 Markov chain Monte Carlo (MCMC) iterations was sufficient to ensure convergence. Five independent runs were performed using an admixture model (each individual draws some fraction of its genome from each of the K populations) with correlated allele frequencies. The STRUCTURE software provides an estimation of the membership fraction in each of the inferred clusters.
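The sequential Bonferroni procedure can be illustrated with a short sketch. It assumes the standard step-down form: P values are ranked from smallest to largest, the i-th smallest is compared against α/(k - i + 1), and testing stops at the first non-significant result. The function name and the example P values are hypothetical.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which tests remain significant."""
    k = len(p_values)
    # Indices of the tests ordered from smallest to largest P value.
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, idx in enumerate(order):
        # The smallest P is tested at alpha/k, the next at alpha/(k-1), etc.
        if p_values[idx] <= alpha / (k - rank):
            significant[idx] = True
        else:
            break  # all remaining (larger) P values are non-significant
    return significant


# Hypothetical P values from four pairwise comparisons.
print(sequential_bonferroni([0.001, 0.04, 0.03, 0.2]))
```

Compared with a plain Bonferroni correction, which tests every comparison at α/k, the step-down thresholds are less conservative while still controlling the table-wide error rate.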
A hierarchical STRUCTURE analysis was performed until no more resolution was observed. Evanno's correction [22] was implemented and visualized using STRUCTURE HARVESTER [72] software. Thus, Delta K values were used to infer the most likely number of genetic clusters (K) at each hierarchical level. Genetic clusters were further validated using a principal coordinate analysis (PCoA) based on genetic distance in GenAlEx [73].
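The Delta K statistic can be sketched as below. This is one common reading of Evanno's formula, not the code of STRUCTURE HARVESTER: for each K, the absolute second-order difference of the mean log-likelihoods, |L(K+1) - 2L(K) + L(K-1)|, is divided by the standard deviation of L(K) across replicate runs. The function name and the log-likelihood values are hypothetical, and Delta K is left undefined for the smallest and largest K and whenever the standard deviation is zero.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K from replicate ln P(D) values.

    lnp_by_k: dict mapping K to a list of ln P(D) values, one per run.
    Returns a dict mapping K to Delta K for every interior K.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) not in mean_l or (k + 1) not in mean_l:
            continue  # needs both neighbours, so the extremes are skipped
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Hypothetical ln P(D) values from three replicate runs per K.
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -902, -898],
    3: [-700, -701, -699],
    4: [-690, -695, -685],
    5: [-685, -690, -680],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, "most likely K:", best_k)
```

In a hierarchical analysis, the same calculation would simply be repeated on the runs obtained for each cluster analyzed separately.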
The program MIGRATE-N version 3.5.1 [74] was then used to estimate dispersal rates and long-term effective population size (Ne) from the microsatellite data, using a maximum-likelihood (ML) coalescent approach and averaging over five runs. The program estimates θ, which is the product of the effective population size and mutation rate: 4Neμ, where μ is the mutation rate per generation. The effective number of migrants per generation, Nem, is estimated as well as the migration rate, m. As program settings, we employed a stepwise-mutation model (Brownian motion approximation) and used the default settings for other parameters. For each run, starting estimates for θ were based on FST values, with a burn-in of 15,000 trees, 14 short chains with a total of 100,000 genealogies sampled, and three long chains with 1,000,000 genealogies sampled, for each locus. Adaptive chain heating, with four different temperatures, was used to achieve an efficient exploration of the data.
A Neighbor-Joining tree was constructed based on a distance matrix calculated from the frequency data for the 15 microsatellite loci employing the computer package PHYLIP [75]. Statistical support of nodes was estimated with 1000 bootstrap replicates.

Morphological analysis: body shape
We examined body shape differentiation among source and crater lake fishes using geometric morphometrics. Fifteen homologous body landmarks were digitized in TPSDIG2.17 [76] from standardized pictures of 370 individuals (see Figure 4 for landmark description).
Shape analyses were performed in MorphoJ 1.03d [77]. Landmarks were first aligned using a full Procrustes superimposition, which involves scaling all shapes to unit centroid size, translation to a common position, and rotation to minimize the Procrustes distance between landmark configurations [78,79]. Allometry is common in fish and thus morphology and total body size are typically related [79]. Therefore, a multivariate regression of body shape (Procrustes coordinates) on size (centroid size) was used to correct for allometric effects. Regression residuals were then used for all downstream geometric morphometric analyses.
Individual variation in body shape across and within lakes was visualized using Canonical Variate Analysis (CVA) and Discriminant Function Analysis (DFA) on the regression residuals (Figure 4). Shape differences between groups were visualized using thin plate splines [78]. Body shape differentiation of source and crater lakes was assessed with Hotelling's T² test as implemented in MorphoJ.
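The three superimposition steps named above (translation, scaling to unit centroid size, and rotation) can be sketched for a single pair of two-dimensional landmark configurations. This is a simplified illustration and not MorphoJ's implementation: a generalized Procrustes fit would repeat the alignment against an iteratively updated mean shape, and the allometric regression is not shown. The function names and landmark coordinates are hypothetical.

```python
import math


def center_and_scale(landmarks):
    """Translate landmarks to their centroid and scale to unit centroid size.

    Returns (scaled_landmarks, centroid_size).
    """
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    centered = [(x - cx, y - cy) for x, y in landmarks]
    # Centroid size: square root of summed squared distances to the centroid.
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered], size


def align_to_reference(shape, reference):
    """Rotate a shape onto a reference after centering and scaling both.

    Returns (aligned_shape, procrustes_distance).
    """
    p, _ = center_and_scale(shape)
    q, _ = center_and_scale(reference)
    # Closed-form least-squares rotation angle for 2D configurations.
    num = sum(qy * px - qx * py for (px, py), (qx, qy) in zip(p, q))
    den = sum(qx * px + qy * py for (px, py), (qx, qy) in zip(p, q))
    angle = math.atan2(num, den)
    c, s = math.cos(angle), math.sin(angle)
    aligned = [(c * px - s * py, s * px + c * py) for px, py in p]
    # Procrustes distance: root of summed squared landmark differences.
    dist = math.sqrt(sum((ax - qx) ** 2 + (ay - qy) ** 2
                         for (ax, ay), (qx, qy) in zip(aligned, q)))
    return aligned, dist


# Hypothetical three-landmark configurations: the second is the first one
# rotated, doubled in size and shifted, so the shapes are identical.
reference = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
specimen = [(10.0, 5.0), (10.0, 13.0), (4.0, 5.0)]
_, distance = align_to_reference(specimen, reference)
print(f"Procrustes distance: {distance:.6f}")
```

Because position, size and orientation are removed before the distance is computed, two specimens that differ only in those respects end up at a Procrustes distance of essentially zero, which is what allows the remaining variation to be interpreted as shape.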