Development and characterization of 24 polymorphic microsatellite loci for the freshwater fish Ichthyoelephas longirostris (Characiformes: Prochilodontidae)

The Neotropical freshwater fish Ichthyoelephas longirostris (Characiformes: Prochilodontidae) is a short-distance migratory species endemic to Colombia. This study developed, for the first time, a set of 24 polymorphic microsatellite loci using next-generation sequencing to explore the population genetics of this commercially exploited species. Nineteen of these loci were used to assess the genetic diversity and structure of 193 I. longirostris individuals from three Colombian rivers of the Magdalena basin. Results showed that a single genetic stock circulates in the Cauca River, whereas a different single genetic stock is present in the Samaná Norte and San Bartolomé-Magdalena rivers. Additionally, I. longirostris was genetically different among rivers and among sections within the Cauca River. This first insight into the population genetic structure of I. longirostris is crucial for monitoring its genetic diversity, for the management and conservation of its populations, and for complementing the genetic studies in Prochilodontidae.


INTRODUCTION
Ichthyoelephas longirostris (Steindachner, 1879) is a Colombian endemic fish of importance in commercial and subsistence fisheries. This freshwater fish is a member of the family Prochilodontidae, which comprises three genera (Ichthyoelephas, Prochilodus and Semaprochilodus) and 21 species that occur in the major river basins of South America (Vari, 1983). In the Colombian red list of threatened freshwater fishes, I. longirostris is considered an endangered species based on criteria that include scarce biological and ecological information, restricted distribution (tributaries of the Magdalena and Ranchería basins), infrequent catches, disappearance from some floodplain lakes and rivers (Ranchería) and habitat degradation by anthropogenic activities (Mojica et al., 2012).
Therefore, the development of species-specific molecular markers is highly recommended, a task that has been greatly facilitated by next-generation sequencing technologies (Castoe et al., 2010; Ekblom & Galindo, 2011). Among other advantages, this approach permits the rapid selection of long repeat motifs to prevent the genotyping problems associated with dinucleotide motifs (Gardner et al., 2011; Fernandez-Silva et al., 2013; Schoebel et al., 2013). Thus, this study developed de novo molecular markers for future population genetics studies of I. longirostris using next-generation DNA sequencing and bioinformatics. In addition, these novel microsatellite loci were used to assess whether I. longirostris comprises genetically differentiated populations in sections of three Colombian rivers of the Magdalena basin.

MATERIALS & METHODS
This study analyzed a total of 193 preserved tissue samples of I. longirostris from three rivers in the Magdalena-Cauca basin that will be influenced by two hydropower station projects: the Cauca River (Ituango project), and the Samaná Norte and San Bartolomé rivers (Porvenir II project). All samples were provided by Integral S.A. through two scientific cooperation agreements (19th September 2013; Grant CT-2013-002443). In the Cauca River, the samples came from three of the eight sections sampled (Fig. 1A) because the number of individuals was extremely low in the other sections: S1 and S2/3 are river sites in the department of Antioquia, whereas S6 comprises two floodplain lakes in the department of Bolívar. Sections S1 and S2/3 are, respectively, upstream and downstream of an area that exhibits steep topography, rapids, geomorphologic peculiarities, riverbed narrowing, and drastic changes in water velocity and slope. Similarly, the Samaná Norte and San Bartolomé-Magdalena rivers (Fig. 1B) exhibit canyons associated with rough topography and pronounced changes in slope. These characteristics of the three rivers may limit the migration of several fishes. DNA was isolated with the commercial GeneJET DNA purification kit (Thermo Scientific) following the manufacturer's instructions. To identify microsatellite regions and develop primers to amplify them, low-coverage genome sequencing was carried out on an Illumina MiSeq v2 instrument using Nextera library preparation kits. This sequencing process generated paired-end reads of 250 bases that were cleaned using PRINSEQ-lite v0.20.4 (Schmieder & Edwards, 2011) to trim low-quality regions at both ends and to remove reads that were duplicated or <50 bases in length.
The reads were assembled with ABySS v1.3.5 (Simpson et al., 2009) using a k-mer size of 64, and the contigs were analyzed with the PAL_FINDER v.0.02.03 software (Castoe et al., 2010) to extract those that contained perfect tri-, tetra- and pentanucleotide microsatellites. Primer pairs for microsatellite loci amplification were designed from the flanking sequences using the Primer3 v.2.0 software (Rozen & Skaletsky, 2000). Additionally, the potentially amplifiable loci were submitted to electronic PCR (Rotmistrovsky, Jang & Schuler, 2004) to verify in silico the correct primer alignment (http://www.ncbi.nlm.nih.gov/tools/epcr/). A total of 40 microsatellites were selected for optimization and polymorphism analysis in I. longirostris. Preliminary tests of standard PCR conditions (Sambrook, Fritsch & Maniatis, 2001) were carried out on 15 DNA samples, and the amplicons were separated in 10% polyacrylamide gels in a Mini-PROTEAN Tetra vertical electrophoresis cell (Bio-Rad) run at 100 volts for 45 min and visualized by silver staining. Polymorphic loci were selected based on amplification in all samples, band resolution, specificity, size (from 100 to 400 bp) and ability to detect heterozygotes in the different samples analyzed. Then, a set of 24 polymorphic microsatellite loci that met these criteria and amplified consistently was selected and fluorescently labelled for further genotyping of 28 samples of I. longirostris. Finally, a subset of 19 loci was selected to evaluate the genetic diversity and genetic structure in 193 samples from three Colombian rivers.
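The search for perfect tri-, tetra- and pentanucleotide microsatellites in assembled contigs can be illustrated with a short Python sketch. This is not the PAL_FINDER algorithm; it is a minimal regular-expression scan, and the minimum repeat counts in `MIN_REPEATS` as well as the function names are assumptions chosen only for illustration.

```python
import re

# Hypothetical thresholds (not taken from the study): minimum number of
# tandem copies required for motifs of length 3, 4 and 5 bp.
MIN_REPEATS = {3: 6, 4: 5, 5: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT, AAA)."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect 3-5 bp tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_repeats.items():
        # ([ACGT]{k}) captures one motif; \1{n_min-1,} demands identical further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # skip mono- or dinucleotide repeats in disguise
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


contig = "TT" + "ACG" * 7 + "TT" + "GATA" * 5 + "CC"
for start, motif, copies in find_perfect_microsatellites(contig):
    print(start, motif, copies)
```

In a real pipeline the flanking sequence on each side of a hit would then be passed to a primer design tool; that step is omitted here.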
PCR reactions were carried out in volumes of 10 µl containing final concentrations of 1× buffer (Invitrogen), 2-4 ng/µl of template DNA, 2.5% formamide (Sigma), 0.35 pmol/µl labelled forward primer (either 6-FAM, VIC, NED or PET, Applied Biosystems), 0.5 pmol/µl reverse primer (Macrogen), 0.2 mM dNTPs (Thermo Scientific), 0.05 U/µl Platinum Taq DNA Polymerase (Invitrogen) and 2.5 mM MgCl2. The PCR amplifications were performed on a T100 thermocycler (Bio-Rad) with an initial denaturation step of 95 °C for 3 min, followed by 32 cycles consisting of a denaturation step of 90 °C for 22 s and an annealing step of 57 °C for 16 s. This thermal profile included neither an extension step nor a final elongation. Finally, the PCR products were submitted to electrophoresis on an ABI 3730 XL automated sequencer (Applied Biosystems) using LIZ500 (Applied Biosystems) as the internal size standard. Allelic fragments were denoted according to their molecular size and scored using GeneMapper 4.0 (Applied Biosystems).
Tests for departures from Hardy-Weinberg and linkage equilibria and the estimation of the observed (H_O) and expected (H_E) heterozygosities were performed using Arlequin v.3.5.2.2 (Excoffier, Laval & Schneider, 2005). Statistical significance in multiple comparisons was adjusted by applying the sequential Bonferroni correction (Rice, 1989). The software GenAlEx v.6.501 (Peakall & Smouse, 2006) was used to estimate the average number of alleles per locus. Potential genotyping errors were evaluated using the Micro-Checker v.2.2.3 software (Van Oosterhout et al., 2004). The polymorphism information content (PIC) for each marker was determined using the program PICcalc (Nagy et al., 2012).
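As an illustration of the per-locus statistics named above, the following Python sketch computes the number of alleles, the observed and expected heterozygosities, and the PIC of Botstein et al. (1980) from diploid genotypes. It is a minimal sketch, not the code of Arlequin, GenAlEx or PICcalc: the function name and the toy genotypes are hypothetical, and the expected heterozygosity is the uncorrected 1 − Σp², whereas those programs may apply a small-sample correction.

```python
from collections import Counter
from itertools import combinations


def locus_summary(genotypes):
    """Summarize one locus from a list of (allele_a, allele_b) diploid genotypes."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n_copies = len(alleles)
    freqs = [count / n_copies for count in Counter(alleles).values()]

    # Observed heterozygosity: proportion of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / len(genotypes)
    # Expected heterozygosity under Hardy-Weinberg (no small-sample correction).
    h_exp = 1.0 - sum(p * p for p in freqs)
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    pic = h_exp - sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))

    return {"n_alleles": len(freqs), "H_O": h_obs, "H_E": h_exp, "PIC": pic}


# Toy genotypes (allele sizes in bp), invented for illustration only.
toy_locus = [(100, 104), (100, 100), (104, 104), (100, 104)]
print(locus_summary(toy_locus))
```

For two equally frequent alleles the sketch gives H_E = 0.5 and PIC = 0.375, which shows why PIC is always at most H_E: the second sum removes matings that are uninformative for linkage.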
Genetic diversity was estimated by calculating the average number of alleles per locus, the average observed and expected heterozygosities and the fixation index. Non-geographical genetic differentiation among samples was tested with the Bayesian analysis of population partitioning implemented in Structure v.2.3.4 (Pritchard, Stephens & Donnelly, 2000). This analysis was performed with 100,000 Markov chain Monte Carlo (MCMC) steps and 10,000 burn-in iterations, a run length that reached convergence. Parameters included the admixture model, correlated allele frequencies and the LOCPRIOR option, which improves the performance of the algorithm when the signal of structure is relatively weak (Hubisz et al., 2009). Each analysis was repeated 20 times for each simulated K value, which ranged from 1 to 8 groups. Then, the best estimate of the number of genetic stocks (K) was calculated using the ΔK ad hoc statistic (Evanno, Regnaut & Goudet, 2005) with STRUCTURE Harvester (Earl & VonHoldt, 2012). Results of independent STRUCTURE runs were summarized using CLUMPP v.1.1.2b (Jakobsson & Rosenberg, 2007), with parameters set to their default values and using the full-search algorithm with the G' normalized function to guarantee the optimal alignment of clusters across multiple runs. The resulting Q-matrix was plotted as a histogram displaying the ancestry of each individual in each population using DISTRUCT v.1.1 (Rosenberg, 2004). In addition, genetic differentiation among geographical samples was calculated using the standardized statistics F'ST (Meirmans, 2006) and Jost's Dest (Meirmans & Hedrick, 2011) and the analysis of molecular variance, AMOVA (Meirmans, 2006), included in GenAlEx v.6.502 (Peakall & Smouse, 2006). Furthermore, the diploid genotypes of 19 loci (38 variables) in 193 individuals were submitted to discriminant analysis of principal components to examine other groupings of the samples using RWizard (http://www.ipez.es/RWizard/).
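The ad hoc statistic of Evanno, Regnaut & Goudet (2005), usually written ΔK, can be sketched in a few lines of Python. This is an illustrative reimplementation, not the code of STRUCTURE Harvester: ΔK is taken here as the absolute second-order difference of the mean Ln P(D) across replicate runs divided by the standard deviation of Ln P(D) at that K. The use of the sample standard deviation, the function name and the toy likelihoods are all assumptions.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Evanno's delta-K from {K: [Ln P(D) of each replicate run]}.

    Delta-K is undefined for the smallest and largest K tested, because the
    second-order difference needs both K - 1 and K + 1.
    """
    avg = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in avg or k + 1 not in avg:
            continue
        second_diff = abs(avg[k + 1] - 2.0 * avg[k] + avg[k - 1])
        spread = stdev(lnp_by_k[k])  # sample standard deviation across replicates
        if spread > 0:
            result[k] = second_diff / spread
    return result


# Toy Ln P(D) values (three replicates per K), invented for illustration only.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-888.0, -898.0, -878.0],
}
scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because ΔK cannot be evaluated at K = 1, this criterion can never select a single panmictic population, which is one reason to compare it with other approaches.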

RESULTS
All 24 polymorphic microsatellite loci initially selected for additional testing (Table 1) showed clearly defined peaks and the absence of stutter bands in the chromatograms. The number of alleles per locus ranged from 4 to 18, with an average of 8.5 alleles/locus and an average observed heterozygosity (H_O) of 0.669. Additionally, allelic frequencies of 20 loci were concordant with Hardy-Weinberg and linkage equilibria after sequential Bonferroni correction, and no evidence of null alleles or scoring errors was detected by Micro-Checker. Moreover, the PIC values ranged from 0.375 to 0.871 (average: 0.733), indicating that these markers are highly informative (Botstein et al., 1980). Nineteen of these loci were subsequently used to explore the population genetics of I. longirostris in a larger sample, whereas the other five await a similar analysis.
In the rivers, 3 of the 19 loci (Ilo21, Ilo11, Ilo3) departed from Hardy-Weinberg equilibrium expectations in all samples evaluated (Table 2). The average number of alleles per locus was higher in Samaná Norte (11.95) and Cauca S2/3 (11.05), followed by Cauca S6 (9.16), Cauca S1 (8.84) and San […]
The three approaches for measuring genetic differences revealed contrasting results. The Bayesian analysis showed the presence of two genetic stocks (K = 2), one predominantly in the Cauca River and the other in the San Bartolomé and Samaná Norte rivers (Fig. 2A). Although K = 2 was the most supported number of clusters using the ΔK method, an additional clustering pattern (K = 4) was examined to compare it with the other approaches (Fig. 2B). This latter analysis showed the same tendency of clustering in two major stocks, plus two minor stocks with non-homogeneous distribution (Fig. 2B). However, the discriminant analysis of principal components and AMOVA found significant genetic differences of I. longirostris among (Figs. 2C and 2E; FST(0.001) = 0.010; P = 0.000) rivers. In addition, within the Cauca River, the discriminant analysis of principal components (Fig. 2D) displayed differences among the three sections examined, whereas AMOVA showed low but significant genetic differences between S1 and the other sections of the Cauca River but not between S2/3 and S6 (Table 3).

DISCUSSION
This study developed a set of 24 microsatellite loci for population genetic studies of the Colombian endemic fish I. longirostris. These loci are polymorphic and highly informative, and 19 of them proved able to detect genetic diversity and structure reliably in three Colombian rivers. Thus, these microsatellite loci are suitable for future studies of diversity and population genetics of I. longirostris. The usefulness of the remaining loci for population genetic analysis still has to be tested in a larger sample. The mean number of alleles per locus of I. longirostris is similar to that found in P. argenteus using microsatellite loci with pentanucleotide motifs (Hatanaka, Henrique-Silva & Galetti Jr, 2006). However, as expected, this value is lower than those found in other studies that include a greater selection of dinucleotide motifs (Yazbeck & Kalapothakis, 2007; Passos et al., 2010; Orozco Berdugo & Narváez Barandica, 2014; Braga-Silva & Galetti Jr, 2016). Additionally, the levels of observed and expected heterozygosities are similar to those found in P. lineatus and Semaprochilodus insignis (Yazbeck & Kalapothakis, 2007; Passos et al., 2010) and higher than those found in P. costatus (Braga-Silva & Galetti Jr, 2016), P. argenteus (Hatanaka, Henrique-Silva & Galetti Jr, 2006) and P. magdalenae (except for expected heterozygosity; Orozco Berdugo & Narváez Barandica, 2014). The levels of observed heterozygosity are also similar to the average heterozygosity per species across microsatellite loci found in 13 freshwater fish species (0.54 ± 0.25; Dewoody & Avise, 2000).
The low incidence of I. longirostris in the fisheries has been interpreted as a signal of a decline in population density (Mojica et al., 2012). However, the diversity levels found in this study suggest that the low incidence in traditional fisheries may also result from the preference of this species for turbulent waters, where steep topography and treacherous rocky riverbeds provide refuge and impede its capture. Alternatively, since I. longirostris seems to make short displacements during dry seasons (López-Casas et al., 2016), it might not be an important component of commercial species' migrations. It remains to be explored whether potential differences in spawning periods or in reproductive or feeding behavior relative to commercial species may explain the low captures.
The three approaches for measuring population genetic structure generated different, but not mutually exclusive, results. STRUCTURE revealed only two stocks related to the highest hierarchical grouping, which is concordant with other studies showing that this software is limited in detecting nested fine-scale substructure (Evanno, Regnaut & Goudet, 2005) and in identifying clusters at low levels of genetic differentiation (Latch et al., 2006). Except for the genetic difference between S2/3 and S6, the results of the discriminant analysis and AMOVA (fixation and genetic differentiation indexes) are similar and reveal fine-scale structuring, indicating that the suggested structure is reliable. These latter analyses support the idea that I. longirostris is structured in four (AMOVA) or five stocks (discriminant analysis). Genetic differences between fishes from different sections of the Cauca and Magdalena rivers were also found in populations of P. magdalenae (Orozco Berdugo & Narváez Barandica, 2014). This outcome might be explained by the short-distance migration range and habitat preferences of I. longirostris, considering that the physicochemical characteristics of the habitat may explain the genetic structure in populations of other species (Schaack & Chapman, 2003; Duponchelle et al., 2006; López-Macias et al., 2009). Alternatively, homing behavior, which occurs in other members of Prochilodontidae (Godoy, 1959; Godoy, 1975; Godinho & Kynard, 2006), could explain the genetic structure.
Additionally, the lower genetic diversity upstream of the steep topography in the Cauca River may indicate that rapids limit gene flow between these sectors. However, the genetic differences between S6 and S2/3 in the absence of topographic barriers suggest that the behavior of the species, rather than physical barriers, plays an important role in the non-homogeneous distribution of the genetic diversity. This explanation is also consistent with the genetic differences found between the Samaná Norte and San Bartolomé rivers.
In summary, this study developed the first set of polymorphic microsatellite loci for population genetics of I. longirostris and provides the first insights into the genetic structure of this species. Genetic differences were found among rivers and even among sections of the Cauca River, indicating that I. longirostris comprises at least four (likely five) stocks in the examined sites. This information, obtained prior to the hydropower station construction, is crucial for monitoring the genetic diversity for the management and conservation of this species as well as for complementing the genetic studies in Prochilodontidae.