Microsatellite loci development and population genetics in Neotropical fish Curimata mivartii (Characiformes: Curimatidae)

The Curimatidae family plays an ecological role in the recycling and distribution of nutrients and constitutes a major food source for several commercially important fishes. Curimata mivartii, a member of this family, is considered a short-distance migratory species (≤100 km), categorized by the International Union for Conservation of Nature as a near threatened species, based on its declining population densities and habitat disturbance and fragmentation. Since population genetics and species-specific molecular tools remain unknown for all members of the Curimatidae family, this study developed a set of microsatellite loci and studied the population genetics of C. mivartii in the lower section of the Colombian Magdalena-Cauca basin. The results showed high levels of genetic diversity and evidence of gene flow even between locations separated over 350 km. This information provides a baseline for designing conservation and management programs for C.mivartii and constitutes the first study of population genetics in Curimatidae.


INTRODUCTION
The Curimatidae (Pisces: Characiformes) family encompasses eight genera and approximately 105 species that present a wide distribution in the freshwater environments of the cis-and trans-Andean basins of South America (Melo et al., 2016a). This is the fourth most diverse family of the Characiformes order, and its members have been increasing in number over the last 20 years due to the discovery and introduction of new species (Melo et al., 2016a). Although their commercial importance is limited to the subsistence fisheries and ornamental species trade, these detritivorous, benthopelagic and migratory species contribute to the recycling and redistribution of nutrients (Alvarenga et al., 2006). Additionally, they represent higher percentages of biomass, providing food to birds and a large variety of fishes, particularly economically important catfish species, that support the nutritional safety of the riverside communities (Lasso et al., 2010). Curimata mivartii Steindachner 1878, "vizcaína" or "cachaca" is a Colombian endemic species that represents the dominant biomass of the floodplain lakes in the middle basin of the Magdalena, San Jorge and Sinú rivers (Fig. 1). It is the largest Colombian trans-Andean species of the genus (approximately 35 cm) and has a short-distance migration range (approximately 10.1 km; average speed 0.3 km/day) (Lasso et al., 2010;López-Casas et al., 2016) through the main channel of the river. It forms great shoals and uses floodplain lakes as habitats for nourishment, refuge and larval development (Lasso et al., 2010). Like other species (Jiménez-Segura, Palacio & Leite, 2010; Zapata & Usma, 2013), its migrations seem to be related to changes in the water level of the rivers, which are bimodal in the Magdalena river basin (Jiménez-Segura, Palacio & Leite, 2010).
During migrations, C. mivartii is mainly exploited by subsistence fisheries, but it has become a target of commercial fishing due to decreasing capture of traditional species (Mojica et al., 2012). The estimated decline of 30% of its population density and the noticeable fragmentation of its habitats caused by the growing anthropic intervention have led to the inclusion of C. mivartii in the "Near Threatened" list, according to the international union for conservation of nature. Additionally, biological information about the species is scarce and the population genetics is unknown for all members of the Curimatidae family, which limits the design of appropriate conservation and management programs.
In contrast, population genetics studies have been performed on members of the phylogenetically related family, Prochilodontidae (Melo et al., 2016b;Vari, 1989), with which Curimatidae shares habitats and migratory, benthopelagic and detritivorous features. These studies reveal that Prochilodontidae exhibits populations formed by both nongeographical (Hatanaka, Henrique-Silva & Galetti, 2006;López-Macías et al., 2009;Melo et al., 2013;Orozco Berdugo & Narváez Barandica, 2014) and geographically structured stocks . Given the genetic structure of Prochilodontidae populations and the short-distance migration described for C. mivartii, we hypothesized that their populations are structured according to an isolation by distance model (Wright, 1943). To test this hypothesis, this study developed a set of primers for microsatellite loci amplification and evaluated the genetic diversity and structure of C. mivartii samples from different locations along the main stream and some floodplain lakes of the Magdalena-Cauca basin, encompassing differences in distance extending to over 350 km between the farthest locations.

MATERIALS AND METHODS
This study analyzed a total of 209 muscle tissues of C. mivartii from the main stream of the rivers and floodplain lakes in the lower section of the Colombian Magdalena-Cauca basin. The study area includes floodplains of the Andean Magdalena-Cauca basin that present riverbeds with an upper width of 500 m, low velocities, rock shards and fine sediments. This area comprises a group of floodplain lakes, which are the principal habitats of C. mivartii and differ in size, depth, levels of connection and anthropic intervention (farming and cattle breeding). Additionally, during the last decade (between 1997/1998 and 2009/2010) the Magdalena-Cauca basin has been exposed to the remote more extreme effects of atypical climate fluctuations such as the El Niño Southern Oscillation (ENSO) events (IDEAM-Instituto de Hidrología Meteorología y Estudios Ambientales, 2014), with eight to nine successive months (NOAA's Climate Prediction Center, 2015) of extreme floods (La Niña) and droughts (El Niño) in trans-Andean fluvial systems (Bookhagen & Strecker, 2010).
All samples, preserved in 70% ethanol, were provided by Integral S.A., through two scientific cooperation agreements (September 19th, 2013;Grant CT-2013-002443). Sampling collection was performed from 2011 to 2013 by Integral S.A., framed under an environmental permit from Ministerio de Ambiente, Vivienda y Desarrollo Territorial de Colombia # 0155 on January 30, 2009 for Ituango hydropower construction. The first group of samples was collected from the Cauca river ( Fig. 2A), three floodplain lakes (Grande, Las Culebras, Panela-Bolívar Department), a site on the Caribona river (La Raya, San Jacinto del Cauca-Bolívar Department) and another site on the Man river (Caucasia-Antioquia Department). The second group of samples was collected from the Magdalena river (Fig. 2B), a floodplain lake (Chucurí, Puerto Parra-Santander Department) and a site on the main channel of the river (Puerto Berrío, Antioquia Department).
To develop species-specific microsatellite loci, the sequence reads from the genomic library of one specimen of C. mivartii, previously pyrosequenced by 454 FLX technology (Landínez-García, Alzate & Márquez, 2016), were analyzed using PRINSEQ lite software to eliminate sequences of less than 100 pb in length, duplicated reads and low-quality regions. Then, PAL_FINDER v.0.02.03 (Castoe et al., 2010) was used to extract the reads that contained perfect tri-, tetra-and pentanucleotide microsatellite motifs, Primer3 v.2.0 (Rozen & Skaletsky, 2000) for the primer designs in the flanking sequences of the microsatellite loci and ePCR (Rotmistrovsky, Jang & Schuler, 2004) for assessing the correct alignment of the primers. The level of polymorphism was evaluated in 40 microsatellite loci and further genetic analyses were performed with a set of loci that satisfied the characteristics required to validate new microsatellite primers (Neff, Garner & Pitcher, 2011;Fernandez-Silva et al., 2013;Schoebel et al., 2013).
For genotyping, the PCR mixtures were performed in volumes of 10 ml containing the following final concentrations: 0.6 pmol/ml of each forward primer tagged on the 5′ end with one of the adapters described by Blacket et al. (2012), 12 pmol/ml of each reverse primer, 10 pmol/ml of each fluorescently labeled adapter, 1.1Â Master Mix, 2.5% v/v GC Enhancer Platinum Multiplex PCR Master Mix (Applied Biosystems, Foster City, CA, USA) and three to five mg/ml of template DNA isolated with GeneJET DNA purification kit (Thermo Scientific, Waltham, MA, USA), following the manufacturer's instructions. The amplification reactions were performed on a thermocycler T100 (BioRad, Hercules, CA, USA) with an initial denaturation step of 95 C for 3 min, followed by 42 cycles consisting of a denaturation step of 90 C for 22 s, an annealing step of 56 C for 20 s The average number of alleles per locus and the polymorphism information content (PIC) for each marker were calculated respectively using GenAlEx v.6.503 (Peakall & Smouse, 2006) and Cervus v.3.0.7 (Marshall et al., 1998). The average number of alleles per locus, observed (H O ) and expected (H E ) average heterozygosities and fixation index were calculated to estimate the genetic diversity of C. mivartii. Tests for departures from Hardy-Weinberg and linkage equilibria as well as the H O and H E were estimated using Arlequin v.3.5.2.2 (Excoffier, Laval & Schneider, 2005). The sequential Bonferroni correction was applied to adjust the statistical significance in multiple comparisons (Rice, 1989).
To explore nonneutral evolutive forces acting on the microsatellite loci, scanning analyses were performed using BayeScan v.2.1 (Foll & Gaggiotti, 2008) software to detect diversifying, positive or balancing selection. Parameters included 10:1 prior odds for the neutral model, 20 pilot runs consisting of 5,000 iterations each, followed by 200,000 iterations and a burn-in of 50,000 iterations. The loci were ranked according to their estimated posterior probability or posterior odds (equivalent to the Bayes factor), a statistical criterion to test the model. Positive alpha values were then used to distinguish microsatellites under diversifying selection, while negative alpha values were used to detect balancing selection (Foll & Gaggiotti, 2008).
To explore whether outlier loci found in this study represented false positives, the ocurrence of recent genetic bottlenecks of populations was evaluated by calculating the levels of heterozygosity using the Wilcoxon sign-rank test (Luikart & Cornuet, 1998) included in Bottleneck v.1.2.02 software (Piry, Luikart & Cornuet, 1999). Addittionally, the M ratio (the mean ratio of the number of alleles compared with the range in allele size) was calculated using Arlequin v.3.5.2.2 (Excoffier, Laval & Schneider, 2005). The M ratio indicates that the population has experienced recent and severe reduction in population size when its values are smaller than 0.68 (Garza & Williamson, 2001).
After exploring the genetic structure of the samples, the hypothesis of isolation by distance was tested using a Mantel test (Mantel, 1967) implemented in GenAlex v.6.503 (Peakall & Smouse, 2006). The genetic structure among geographical samples was calculated using the standardized statistics F′ ST (Meirmans, 2006) and Jost's D est (Meirmans & Hedrick, 2011), analysis of molecular variance (AMOVA) (Meirmans, 2006), with 10,000 permutations and bootstraps included in GenAlex v.6.503 (Peakall & Smouse, 2006). Patterns of genetic structure were further explored, using the diploid genotypes of 20 loci (40 variables) in 209 individuals, which were submitted to discriminant analysis of principal components using the R package Adegenet (Jombart, 2008).  (Pritchard, Stephens & Donnelly, 2000) was used to examine other sample groupings. Parameters included 350,000 Monte Carlo Markov Chain steps and 20,000 iterations such as burn-in, admixture model, correlated frequencies and the LOCPRIOR option for detecting relatively weak population structures (Hubisz et al., 2009). Each analysis was repeated 20 times for each simulated K-value, which ranged from 1 to 10 groups. For the best estimation of genetic stocks (K), the DK ad hoc statistic (Evanno, Regnaut & Goudet, 2005) was calculated using STRUCTURE HARVESTER (Earl & VonHoldt, 2012). Then, CLUMPP v.1.1.2b (full-search algorithm, function G′normalized, parameters at their default values) (Jakobsson & Rosenberg, 2007) and DISTRUCT v.1.1 (Rosenberg, 2004) were used, respectively, to summarize the results of independent STRUCTURE runs and plot the Q-matrix obtained in a histogram, displaying the ancestry of each individual in each population.
The pairwise tests of genotypic disequilibrium were nonsignificant and no evidence of null alleles or scoring errors were detected by Micro-Checker in the overall sample.  Berrío(35) Na 11.000 21.000 6.000 11.000 14.000 19.000 6.000 10.000 9.000 9.000 14.000 7.000 8.000 6.000 16.000 6.000 8.000 11.000 13.000 8.000 However, five of the 27 loci (Cmi02, Cmi05, Cmi07, Cmi34 and Cmi48) showed the lowest PIC values (Table 1), departure of allelic frequencies from Hardy-Weinberg equilibrium expectations in three or more of the evaluated samples and inconsistencies in the amplification (Table 2); consequently, they were excluded from additional analysis. Furthermore, the BayeScan analyses showed significant evidence of putative signals of diversyifing/positive selection for the additional loci Cmi35 (posterior probability: 0.946; log 10 PO: 1.242; alpha: 1.339 and P Simul Fst < sample Fst : 0.032) and Cmi42 (posterior probability: 0.999; log 10 PO: 3.699; alpha: 1.670 and P Simul Fst < sample Fst : 0.042) in the comparison between the floodplain lake Las Culebras and the remaining localities. Results from the Bottleneck tests (Table 3) were significant for all populations under the infinite alleles model (IAM) and some populations under the two-phase model (TPM); whereas they were non-significant under the stepwise mutation model (SMM). Due to the thought that few loci follow the strict SMM (Piry, Luikart & Cornuet, 1999), the best estimation of H E at mutation-drift equilibrium is expected under a combination of IAM and TPM. Additionally, all values of the M ratio were substancially smaller than 0.68, indicating that all populations have experienced recent and severe reduction in population size (Table 3).
The Bayesian analysis, discriminant analysis of principal components (Figs. 3A and 3B), AMOVA (F′ ST = 0.001; P = 0.285) and pairwise comparisons with standardized estimators F′ ST and Jost's D est (Table 4), evidenced the presence of a single genetic stock. Consequently, the Mantel test showed a weak spatial correlation of genetic distances in the evaluated sector (R 2 = 0.001; P = 0.031).

DISCUSSION
This study tested the hypothesis that C. mivartii exhibits genetically structured populations, according to an isolation by distance model; to accomplish that, a set of primers was developed for the amplification of 27 loci microsatellites, 22 of which exhibited allelic frequencies according to Hardy-Weinberg equilibrium expectations in most of the evaluated samples. This work also showed evidence of putative selection in 9.091% of the 22 loci examined, which is in line with the percentage of outlier loci (5-18%) reported for microsatellites in migratory marine fishes (Larsson et al., 2007;Rhode et al., 2013;Liu et al., 2016) and different molecular markers in other taxa (for reviews see Nosil, Funk & Ortiz-Barrientos (2009)).
The outlier loci found in this study may represent false positives resulting from the inclusion of severely bottlenecked populations (Teshima, Coop & Przeworski, 2006;Foll & Gaggiotti, 2008), although the significant excess of heterozygosity and small M ratio values were found even in populations that did not exhibit outlier loci. Alternatively,  the outlier loci may result from asymmetry gene flow by unidirectional migration (Hansen, Meier & Mensberg, 2010) as well as hitchhiking selection resulting from temporal disconnections between the floodplain lake Las Culebras and the Caribona river main stream. Although the contribution of these events in the observed findings remains unaddressed, both explanations are plausible considering the positive/negative ENSO successions observed in the last decade for the Magdalena-Cauca basin, the strongest recorded so far for this basin in terms of low water levels and high temperatures (IDEAM-Instituto de Hidrología Meteorología y Estudios Ambientales, 2014).
As this is the first report of population genetics for a species of the Curimatidae family, the results of this study were compared with species of the phylogenetically related family, Prochilodontidae. In C. mivartii values of the average number of alleles per locus (10.493) and expected and observed heterozygosities (H O = 0.757 and H E = 0.801) were similar to those found in studies that used species-specific microsatellite loci with large repeat motifs such as Prochilodus argenteus (Sanches et al., 2012;Melo et al., 2013) and Ichthyoelephas longirostris .
In contrast to the a priori expectation of a population structure concordant with an isolation by distance model, this study evidenced a high genetic connectivity of C. mivartii even in localities separated by over 350 km (Man river and Puerto Berrío). Given that this species is considered a short-distance migrant (<100 km; Zapata & Usma, 2013), a decrease in genetic similarity is expected among populations as the geographic distance between them increases. However, allele frequencies found in this study were not spatially autocorrelated at distances two and three times longer than the estimated migration range of C. mivartii, providing no support to our hypothesis.
The above findings may indicate that the dispersion range of the species is underestimated as the only register (10.1 km) published contained information about the recapture of a single individual out of 149 marked, using the mark-recapture method (López-Casas et al., 2016). Consequently, this outcome suggests that C. mivartii exhibits at least a medium migration range, a category that includes a displacement capacity of between 100 and 500 km (Zapata & Usma, 2013).
An alternative and non-excluding explanation is that C. mivartii displacements are performed in various events and in different directions and magnitudes, thereby providing a possible explanation for the high levels of connection among floodplain lakes along the studied range. This may also imply that C. mivartii does not exhibit a homing behavior like that described in various members of the Prochilodontidae family (Godinho & Kynard, 2006).
Similarly, given the sample collection period, the extensive gene flow might have resulted from a strong 2010/2011 La Niña event, which affected precipitation patterns world-wide (Boening et al., 2012) and caused damage in Colombia associated with occurrences of floods, windstorms, lightning and landslides (Hoyos et al., 2013). This event, preceded by warm anomalies during the first quarter of 2010 (Hoyos et al., 2013), may also explain an apparent excess of novel alleles and an incomplete allele frequency distribution, as evidenced by the Bottleneck analysis.
In summary, rejecting the hypothesis of isolation by distance, C. mivartii represents a single stock in the lower section of the Colombian Magdalena-Cauca basin, which exhibits high genetic diversity and an apparent recent and severe reduction in population size. This information constitutes a baseline for monitoring the population genetics of these species that inhabit main streams of the rivers and floodplain lakes downstream from the Ituango hydropower construction. Additionally, this knowledge is crucial to establish conservation units and facilitate its management.

CONCLUSIONS
The results of this study indicate that populations of C. mivartii are not structured according to an isolation by distance model in a sector three times longer than its estimated migration range (<100 km). This study also developed a group of 20 loci microsatellites, the first report for the Curimatidae family, that can be used in subsequent studies to delve into the conservation, basic genetics and biology of the species. Additionally, we found two outlier loci putatively under selection that will require further assessment in estimating their usefulness in future studies. These markers may be valuable for delineating conservation units known as Management, Adaptive and Evolutionarily Significant Units (Funk et al., 2012), thereby providing a more complete insight into the evolutionary potential for conservation of wild populations and the short-and long-term persistence of the species in the Magdalena-Cauca basin, a historically neglected aspect among most Neotropical fish species.