Are scattered microsatellites weak chromosomal markers? Guided mapping reveals new insights into Trachelyopterus (Siluriformes: Auchenipteridae) diversity

The scattered distribution pattern of microsatellites is a challenging problem in fish cytogenetics. This type of array hinders the identification of useful patterns and the comparison between species, often resulting in oversimplified interpretations that only label it as "scattered" or "widely distributed". However, several studies have shown that the distribution pattern of microsatellites is non-random. Thus, here we tested whether a scattered microsatellite could have distinct distribution patterns on homeologous chromosomes of closely related species. The clustered sites of 18S and 5S rDNA, U2 snRNA and H3/H4 histone genes were used as a guide to compare the (GATA)n microsatellite distribution pattern on the homeologous chromosomes of six Trachelyopterus species: T. coriaceus and Trachelyopterus aff. galeatus from the Araguaia River basin; T. striatulus, T. galeatus and T. porosus from the Amazonas River basin; and Trachelyopterus aff. coriaceus from the Paraguay River basin. Most species had similar patterns of the (GATA)n microsatellite in the histone gene and 5S rDNA carriers. However, we found a chromosomal polymorphism of the (GATA)n sequence in the 18S rDNA carriers of Trachelyopterus galeatus, which is in Hardy-Weinberg equilibrium and possibly originated through amplification events; and a chromosome polymorphism in Trachelyopterus aff. galeatus, which, combined with an inversion polymorphism of the U2 snRNA in the same chromosome pair, resulted in six possible cytotypes, which are in Hardy-Weinberg disequilibrium. Therefore, comparing the distribution pattern on homeologous chromosomes across species, using gene clusters as a guide to identify them, seems to be an effective way to further the analysis of scattered microsatellites in fish cytogenetics.

Introduction
Microsatellites, also known as short tandem repeats (STRs, [1]) or simple sequence repeats (SSRs, [2]), are stretches of DNA that consist of tandemly repeating di-, tri-, tetra- or pentanucleotide motifs arranged throughout eukaryotic genomes [3,4] in both coding and noncoding regions [4,5]. Most microsatellites are located in the nucleus (nuSSR), although they can also be found in mitochondria (mtSSR) and chloroplasts (cpSSR) [4]. They are one of the most abundant and variable types of DNA sequences in the genome [3,5,6] and arise primarily due to slipped-strand mispairing and subsequent error(s) during DNA replication, repair or recombination [4,7]. However, the activity of transposable elements, mainly non-LTR retrotransposons, has also been reported as a major source of new microsatellites and of their movement throughout the genome [4,8-11].
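The definition above (tandemly repeated motifs of two to five nucleotides) can be illustrated with a short sketch that scans a DNA string for perfect tandem repeats. This is only an illustration under stated assumptions: the function name `find_microsatellites`, the minimum of four repeat units, and the example sequence are hypothetical choices, not part of the study, and dedicated tools handle imperfect repeats far more thoroughly.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=5, min_repeats=4):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Assumptions (illustrative only): motifs of 2-5 bases, at least
    `min_repeats` consecutive copies, no mismatches tolerated.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating motif is tried first,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA (mononucleotide tracts).
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Hypothetical example sequence containing four GATA units.
example_hits = find_microsatellites("TTGATAGATAGATAGATACC")
```

In this toy sequence the scan reports a single (GATA)4 tract starting at position 2 (0-based).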
SSRs are recognized as powerful and informative markers of genetic diversity and variability in both animals and plants [12][13][14]. In recent decades, they have been associated with several aspects of evolution and diversification (see [3,10,15]), including as potential causes of major structural chromosomal rearrangements, which seem to be facilitated by their high flexibility and low stability, properties that create fragile chromosomal sites [16,17]. In chromosomal mapping, although studies are still scarce for most species, they have provided a useful tool for understanding the genome and chromosomal evolution of many different taxa [18][19][20]. They can be cytogenetically identified as large accumulations on a few chromosomal pairs [20,21] or as scattered signals throughout the chromosomes [20,22,23]. Furthermore, SSRs are commonly described in association with heterochromatic regions (e.g., [24,25]), often participating in their origin and expansion (cHC) (for reviews, see [26][27][28]), and playing crucial roles in the origin and evolution of specific chromosomes, mainly B chromosomes [22,23,29] and sex chromosomes [21,30,31]. However, for A-complement chromosomes and for populations without sex or B chromosomes, cytogenetic studies still concentrate only on the type of array and the presence or absence of the marker across the whole karyotype [32,33].
One of the most challenging problems in expanding the use of microsatellites in fish cytogenetics is the scattered distribution pattern (e.g., [34][35][36], Table 1), because this type of array exhibits apparently random signals on the chromosomes. As a consequence, in most species without B or sex chromosomes, it hampers both the description of distribution patterns useful for discussing evolutionary aspects and the comparison between karyotypes, often leading to oversimplified interpretations that simply label it as widely distributed or scattered. However, different lines of evidence indicate that the distribution of microsatellites is nonrandom [10]. They can influence several aspects of the genome, including nucleosome packing [37], methylation [38], higher-order chromatin structure [39][40][41] and splicing [42][43][44]. Microsatellites can also have enhancer functions [45][46][47], modulate gene expression [48-50] and participate in gene activity, recombination, DNA replication, the cell cycle and the mismatch repair (MMR) system [10,51]. Therefore, it is unlikely that the scattered distribution pattern of SSRs in cytogenetic mapping is random or that it cannot be useful as a chromosomal marker. In this scenario, considering that the scattered pattern consists of signals throughout the chromosomes, the association or colocalization with clustered sites in cytogenetic mapping, as already reported for several other types of DNA sequences [11,13,52], may provide an additional method to compare species and karyotypes. The clusters of previously mapped repetitive elements can be used as a guide to identify homeologous chromosomes among species or populations, enabling comparative analyses beyond those with B or sex chromosomes.
All PCR products were sequenced in both directions, forward and reverse, using the ABI 3730 DNA Analyzer with the BigDye Terminator v3.1 Cycle Sequencing Kit (code 4337456) and the Sequencing Analysis software 5.3.1. The consensus sequence was generated with the BioEdit Sequence Alignment Editor [98]. The identity of the fragments was confirmed through BLASTn 2.11.0 (National Center for Biotechnology Information) [99].
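To make the idea of combining forward and reverse reads concrete, the following is a minimal sketch of how a reverse read can be reverse-complemented and compared base by base with the forward read. It is not the procedure implemented in BioEdit; the function names and the rule of writing `N` at disagreeing positions are assumptions made purely for illustration, and the reads are assumed to be already trimmed to the same region.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def naive_consensus(forward, reverse):
    """Combine a forward read with a reverse read of the same region.

    Assumption (illustrative only): both reads cover exactly the same
    bases, so no alignment step is needed. Positions where the two
    reads disagree are reported as 'N'.
    """
    forward = forward.upper()
    reverse_rc = reverse_complement(reverse)
    if len(forward) != len(reverse_rc):
        raise ValueError("reads must cover the same region in this sketch")
    return "".join(f if f == r else "N" for f, r in zip(forward, reverse_rc))

# Hypothetical reads: the reverse read is the exact reverse complement.
consensus = naive_consensus("GATAGATACC", "GGTATCTATC")
```

With real chromatograms, base quality scores and a proper pairwise alignment would normally guide the consensus call rather than a strict position-by-position comparison.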

Cytogenetic analyses and Fluorescent in situ hybridization
The samples were treated with a 0.02% colchicine solution (1 ml/100 g of body weight) for 30-40 min and subsequently euthanized by clove oil overdose [100] (according to the ethics committee on animal experimentation and practical classes at Unioeste: 09/13-CEEAAP / Unioeste). The mitotic chromosomes were obtained from anterior kidney cells [101]. The chromosome morphology was classified according to [102].
Fluorescent in situ hybridization (FISH) was carried out at 77% stringency [103], with some suggested modifications [104]. Clusters of 18S rDNA, 5S rDNA, U2 snRNA, and H3 and H4 histone genes were used as guides for the integrated (guided) mapping with the SSR (GATA)n. The digital images were captured with the DP Controller 3.2.1.276 software using an Olympus DP71 digital camera connected to the BX61 epifluorescence microscope (Olympus America Inc., Center Valley, PA, United States of America). The Hardy-Weinberg equilibrium (HWE) and Chi-squared tests were performed using the Hardy-Weinberg (HW) testing program [105].
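As an illustration of the kind of calculation behind a Hardy-Weinberg chi-squared test, the sketch below estimates allele frequencies from genotype counts, derives the expected genotype counts, and sums the chi-squared contributions. It is a generic textbook formulation, not the HW testing program cited above; the function name and all genotype counts are hypothetical and do not reproduce the data of this study.

```python
from itertools import combinations_with_replacement

def hwe_chi_square(observed):
    """Chi-squared statistic for Hardy-Weinberg proportions.

    observed: dict mapping genotype tuples, e.g. ("A", "B"), to counts.
    Returns (chi2, degrees_of_freedom, expected_counts).
    """
    n = sum(observed.values())
    allele_counts = {}
    for (a1, a2), count in observed.items():
        # A homozygote contributes two copies of the same allele.
        allele_counts[a1] = allele_counts.get(a1, 0) + count
        allele_counts[a2] = allele_counts.get(a2, 0) + count
    # Ignore alleles that were never observed to avoid zero expectations.
    freqs = {a: c / (2 * n) for a, c in allele_counts.items() if c > 0}
    alleles = sorted(freqs)

    chi2 = 0.0
    expected = {}
    for a1, a2 in combinations_with_replacement(alleles, 2):
        # p^2 for homozygotes, 2pq for heterozygotes.
        prob = freqs[a1] * freqs[a2] * (1 if a1 == a2 else 2)
        exp = prob * n
        obs = observed.get((a1, a2), 0)
        if a1 != a2:
            obs += observed.get((a2, a1), 0)
        expected[(a1, a2)] = exp
        chi2 += (obs - exp) ** 2 / exp
    # df = number of genotype classes - number of alleles.
    df = len(expected) - len(alleles)
    return chi2, df, expected

# Hypothetical counts for three chromosomal forms (A, B, C).
hypothetical_counts = {
    ("A", "A"): 10, ("A", "B"): 20, ("B", "B"): 10,
    ("A", "C"): 0, ("B", "C"): 0, ("C", "C"): 5,
}
chi2, df, expected = hwe_chi_square(hypothetical_counts)
```

With three forms there are six genotype classes and therefore three degrees of freedom. When expected counts are small, as is common with modest cytogenetic samples, the chi-squared approximation becomes unreliable and an exact test may be preferable.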

Cytogenetic analysis
Trachelyopterus aff. galeatus-Araguaia River basin. This species had 2n = 58 chromosomes for both sexes (7 males and 9 females). The (GATA)n microsatellite was found spread throughout the chromosomes. The chromosomal pair 24, which bears the 18S rDNA and the H3 and H4 histone gene loci, exhibited two blocks of the SSR (GATA)n in the terminal position of the long arm and none in the short arm (Figs 1 and 4). The chromosomal pair 25, which bears another H3 and H4 histone gene locus, also had two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 4). The chromosomal pair 26, which bears the U2 snRNA locus, exhibited a chromosome polymorphism associated with the SSR (GATA)n distribution pattern (Fig 2), which, combined with the U2 snRNA distribution pattern, revealed three chromosomal forms, including a new one, here referred to as the C chromosomal form (Table 2). The χ2 value for Hardy-Weinberg equilibrium (HWE) was 15.99 (df = 3; p < 0.05). The chromosomal pair 3 (5S rDNA) had three blocks of the SSR (GATA)n, two in the short arm (one in the terminal position and one in the interstitial position) and one in the terminal position of the long arm (Fig 3).
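For readers who wish to see how a χ2 value of 15.99 with three degrees of freedom translates into a probability, the sketch below evaluates the upper tail of the chi-squared distribution. It relies on a closed-form expression that is valid only for three degrees of freedom; the function name is hypothetical, and the calculation is an independent check rather than the output of the HW testing program used in the study.

```python
import math

def chi2_upper_tail_df3(x):
    """Upper-tail probability P(X >= x) for a chi-squared variable with 3 df.

    Closed form (3 df only):
        erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    if x < 0:
        raise ValueError("chi-squared statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Reported statistic for the combined U2 snRNA / (GATA)n polymorphism.
p_value = chi2_upper_tail_df3(15.99)
```

The resulting probability is on the order of 0.001, which is consistent with the reported p < 0.05.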
Trachelyopterus galeatus-Amazon River basin. This species had 2n = 58 chromosomes for both sexes (7 males and 9 females). The chromosomal pair 20, which bears the 18S rDNA, [...] The chromosomal pair 24, also bearing an H3 and H4 histone gene locus, had two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 2). The chromosomal pair 14, bearing the first 5S rDNA locus, exhibited three blocks of the SSR (GATA)n, two in the short arm (one in the terminal position and one in the interstitial position) and one in the terminal position of the long arm (Fig 4). The chromosomal pair 16, bearing the second 5S rDNA locus, had two blocks of the SSR (GATA)n, one in the terminal position of the short arm and another in the long arm (Fig 3).
Trachelyopterus porosus-Amazon River basin. This species had 2n = 58 chromosomes for both sexes (4 males and 4 females). The (GATA)n microsatellite was found spread throughout the chromosomes. The chromosomal pair 23, which bears the 18S rDNA and the H3 and H4 histone gene loci, had two blocks of the SSR (GATA)n in the long arm and none in the short arm (Figs 1 and 4). The chromosomal pair 24, also bearing an H3 and H4 histone gene locus, had two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 4). The chromosomal pair 26, which bears the U2 snRNA locus, exhibited two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 2). The chromosomal pair 3, which bears the first 5S rDNA locus, had two blocks of the SSR (GATA)n, one in the terminal position of the short arm and another in the long arm. The chromosomal pair 4, bearing the second 5S rDNA locus, had three blocks of the SSR (GATA)n, two in the short arm (one in the terminal position and one in the interstitial position) and one in the terminal position of the long arm (Fig 3).
Trachelyopterus coriaceus-Araguaia River basin. This species had 2n = 58 chromosomes for both sexes (4 males and 3 females). The (GATA)n microsatellite was found spread throughout the chromosomes. The chromosomal pair 23, which bears the 18S rDNA and the H3 and H4 histone gene loci, exhibited two blocks of the SSR (GATA)n in the long arm and none in the short arm (Figs 1 and 4). The chromosomal pair 28, bearing the U2 snRNA locus, had two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 2). The chromosomal pair 3, which bears the first 5S rDNA locus, had three blocks of the SSR (GATA)n, two in the short arm (one in the terminal position and one in the interstitial position) and one in the terminal position of the long arm. The chromosomal pair 16, bearing the second 5S rDNA locus, had two blocks of the SSR (GATA)n, one in the terminal position of the short arm and another in the long arm (Fig 3).
Trachelyopterus aff. coriaceus-Paraguay River basin. This species had 2n = 58 chromosomes for both sexes (2 males and 1 female). The (GATA)n microsatellite was found spread throughout the chromosomes. The chromosomal pair 22, bearing the 18S rDNA locus, had two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 1). The chromosomal pair 23, which bears the H3 and H4 histone gene loci, exhibited two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 4). The chromosomal pair 27, bearing the U2 snRNA locus, exhibited two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 2). The chromosomal pair 16, bearing the first 5S rDNA locus, had two blocks of the SSR (GATA)n, one in the terminal position of the short arm and another in the long arm. The chromosomal pair 18, which bears the second 5S rDNA locus, had three blocks of the SSR (GATA)n, two in the short arm (one in the terminal position and one in the interstitial position) and one in the terminal position of the long arm (Fig 3).
Trachelyopterus striatulus-Doce River basin. This species had 2n = 58 chromosomes for both sexes (3 males and 3 females). The (GATA)n microsatellite was found spread throughout the chromosomes. The chromosomal pair 23, which bears the 18S rDNA and the H3 and H4 histone gene loci, exhibited two blocks of the SSR (GATA)n in the long arm and none in the short arm (Figs 1 and 4). The chromosomal pair 18, also bearing an H3 and H4 histone gene locus, had two blocks of the SSR (GATA)n as well; however, one was in the terminal position of the long arm and the other in the interstitial position of the short arm (Fig 4). The chromosomal pair 28, which bears the U2 snRNA locus, exhibited two blocks of the SSR (GATA)n in the long arm and none in the short arm (Fig 2). The chromosomal pair 10, bearing the first 5S rDNA locus, had three blocks of the SSR (GATA)n, two in the long arm (one in the terminal position and one in the interstitial position) and one in the terminal position of the short arm. The chromosomal pairs 13 and 15, which bear two additional 5S rDNA loci, exhibited two blocks of the SSR (GATA)n, one in the terminal position of the short arm and another in the long arm (Fig 3).

Discussion
The (GATA)n repeats, molecular components of the Bkm satellite DNA (banded krait minor satellite DNA), are widely distributed in the genome of higher organisms [106]. As expected, in all Trachelyopterus species of this study they were found scattered throughout the chromosomes (summarized in Fig 5). This type of array has been described for different SSRs and taxa, such as plants [107,108], amphibians [20], fungi [109] and fish [22,23,33]. In Auchenipteridae, it was already reported in T. porosus and T. galeatus [22] and in Trachelyopterus aff. coriaceus (cited as Trachelyopterus sp. [23]), and now it can also be seen in T. coriaceus, T. striatulus and Trachelyopterus aff. galeatus. This type of scattered array could be explained by the activity of transposable elements, which are a major source of new microsatellites and can also spread them throughout the genome [8][9][10][11], and/or by chromosomal rearrangements, as already proposed for a close genus, Hypostomus [33]. Although the (GATA)n repeats have been associated with sex chromosomes in different organisms [21,110-112], no differences between males and females were found in these populations.
In the integrated mapping with the 18S rDNA, the carrier chromosomal pair of all species had almost the same SSR (GATA)n distribution pattern, characterized by the absence of the microsatellite only in the short arm (Fig 1). This pattern was also reported in a closely related species, Glanidium ribeiroi [53], but has not yet been explored. Usually, SSRs can constitute a large fraction of noncoding DNA but are rare in protein-coding regions [10], mainly due to negative selection against frameshift mutations in coding regions, as evidenced in plants, primates, and microorganisms [113]. Comparative studies showed that only repeats with unit lengths in multiples of three may develop evenly in both regions [5,113], since RNA bases are read as triplets and other types could result in frameshift mutations [113,114]. In this way, the absence of the (GATA)n sequence in the short arm of the 18S rDNA chromosomes might suggest a negative impact of the microsatellite near the coding areas of these species.
However, this hypothesis is contradicted by some lines of evidence: (a) T. galeatus from the Amazon River basin had a (GATA)n block in the short arm of the 18S rDNA carrier; (b) there are overlapping signals between the (GATA)n sequence and the 5S rDNA and U2 snRNA sites; and (c) the (GATA)n sequence is distributed throughout the chromosomes, indicating that it could also be near other unmapped gene sequences. Therefore, the absence of a (GATA)n signal in the short arm might only be related to a spatial issue: besides the 18S rDNA, the short arm of most species also carries the H3 and H4 histone genes and, consequently, there might not be enough space for amounts of (GATA)n sequence large enough to be detected through fluorescent in situ hybridization, which needs targets of at least 1 kb to yield significant results [115].
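The reading-frame argument discussed above can be made explicit with a small sketch: gaining or losing repeat units changes the sequence length by a multiple of the unit length, and the frame is preserved only when that change is divisible by three. The function name is hypothetical and the example is a simplification that ignores everything except sequence length.

```python
def causes_frameshift(unit, copies_changed):
    """Return True if gaining or losing repeat units shifts the reading frame.

    unit: repeat motif, e.g. "GATA".
    copies_changed: number of units gained (positive) or lost (negative).
    A frameshift occurs when the length change is not a multiple of 3.
    """
    return (len(unit) * abs(copies_changed)) % 3 != 0

# One extra tetranucleotide unit adds 4 bases and shifts the frame.
gata_single = causes_frameshift("GATA", 1)
# One extra trinucleotide unit adds 3 bases and keeps the frame.
cag_single = causes_frameshift("CAG", 1)
```

Under this simple rule, a tetranucleotide repeat such as (GATA)n preserves the frame only when units are gained or lost in multiples of three, whereas any change in a trinucleotide repeat is frame-neutral.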
In contrast to the other species of our sample, T. galeatus from the Amazon River basin was the only one to possess the (GATA)n sequence in the short arm of the 18S rDNA carrier. However, it was identified in a polymorphic state, which appears to be neutral, since heterozygotes have no different adaptive value compared to the other forms (Table 2). Although other mechanisms have been proposed to explain how microsatellites can arise and expand over time (e.g., double-stranded DNA recombination through unequal crossing over and gene conversion, mismatch/double-strand break repair, and retrotransposition), replication slippage is still considered one of the main mechanisms thus far [3,4,116]. It also seems to be the most parsimonious explanation for the origin of the (GATA)n microsatellite in the short arm of the 18S rDNA carrier in T. galeatus from the Amazon River basin. In this case, a small number of repeats (a proto-microsatellite) is required before DNA polymerase slippage can extend the number of repeats, originating a new microsatellite [114], which seems not to be a problem in the genome of these Trachelyopterus species, as it is highly enriched with (GATA)n repeats. Once generated, the new microsatellite can undergo several reduction or expansion events over time (see [3]), which can lead to the formation of large microsatellite blocks, such as the one visualized in T. galeatus from the Amazon River basin. On the other hand, the polymorphic state observed in this population may have arisen from crossing between individuals possessing the new trait and those with the original condition.
The (GATA)n distribution pattern in the U2 snRNA chromosomes was conserved among most species. However, Trachelyopterus aff. galeatus from the Araguaia River basin presented a chromosomal polymorphism in the same chromosomal pair (26) that was also reported to have a U2 snRNA chromosomal polymorphism [90]. Combining both polymorphic markers, U2 snRNA and SSR (GATA)n, resulted in three chromosomal forms. The B chromosomal form is exclusive to this population, whereas the C chromosomal form was also found in T. galeatus from the Amazon River basin and the A form is present in T. striatulus, T. coriaceus, T. porosus, and Trachelyopterus aff. coriaceus. Although both polymorphic states interact to compose the chromosome arrangement in Trachelyopterus aff. galeatus, they seem to have originated in different evolutionary events. The U2 snRNA polymorphism is an exclusive trait of Trachelyopterus aff. galeatus and the product of a pericentric inversion [90]. On the other hand, the additional (GATA)n block that characterizes the polymorphism in Trachelyopterus aff. galeatus is not an exclusive trait and can also be seen in T. galeatus from the Amazon River. Thus, the most parsimonious hypothesis is that the (GATA)n polymorphism in Trachelyopterus aff. galeatus originated through hybridization in secondary contact zones [117,118], which could have been facilitated by the historical and geomorphological aspects of the Araguaia River floodplains, known for constant ichthyofaunal exchange across surrounding hydrographic systems during the neotectonic reactivations of the Transbrasiliano Lineament during the formation of the Araguaia depression [119][120][121].
Interestingly, the U2 snRNA inversion polymorphism in Trachelyopterus aff. galeatus was reported to be in Hardy-Weinberg equilibrium [90], in which case the spread of the polymorphism could be associated with the neutrality of the rearrangement, since it suggests that there is no change in adaptive value among the genotypes or in host fitness [122], as reported for water beetles [123] and blackflies [124]. In this state, the polymorphism is essentially influenced by genetic drift and migration [122]. However, the polymorphic state of U2 snRNA combined with the (GATA)n sequence resulted in Hardy-Weinberg disequilibrium (Table 2), suggesting that the combined arrangement may be under the effect of forces beyond genetic drift and gene flow.
Furthermore, all genotypes were found in proportions similar to those expected under Hardy-Weinberg equilibrium (Table 2), except those with the C chromosomal form: none of the heterozygotes (AC and BC) were found in the sample, and the homozygote (CC) showed a frequency three times higher than expected. In some cases, heterozygotes originating from chromosome rearrangements can suffer severe reductions in fitness [125], zygotic lethality [126] or hybrid [127,128], especially when the rearrangement involves a change in gene order within a chromosome (inversions) [125] or when the hybrids carry multiple rearrangements [125,126]. In this scenario, the presence of multiple chromosomal polymorphisms in the same chromosomal pair, the origin of both of which could be related to major chromosome rearrangements (inversions), together with the absence of C heterozygotes and the higher frequency of the C homozygote, may suggest the existence of distinct evolutionary pressures on it compared to other genotypes. However, analyses with a larger sample size are still needed to clarify this, since the C form in heterozygosis may simply not have been collected.
In contrast to the previously discussed markers, the (GATA)n chromosomal mapping on the 5S rDNA and H3/H4 histone gene carriers did not reveal new information about the structure of these chromosomal pairs. However, using the (GATA)n distribution pattern on the 5S rDNA carriers, the possible chromosome homeologies between the species could be inferred (Fig 3). Since the 5S rDNA is a marker usually found on multiple chromosomal pairs in Trachelyopterus, the chromosomal correspondence is difficult to suggest without additional information about the organization of each chromosome, a gap that could be partially filled with the distribution of the (GATA)n sequence on these chromosomes. Through the (GATA)n mapping on the 5S rDNA chromosome pairs, three main chromosomal arrangements could be identified among the species (Fig 3): chromosomal arrangement (1) is present in all species, chromosomal arrangement (2) is present in T. striatulus, T. galeatus, Trachelyopterus aff. coriaceus and T. coriaceus, and chromosomal arrangement (3) is present only in T. striatulus and T. porosus.
The microsatellite mapping combined with the H3 and H4 histone genes confirmed the distribution of SSR (GATA)n already pointed out through the integration with 18S rDNA for the species that have this synteny (T. striatulus, T. galeatus, Trachelyopterus aff. galeatus, T. porosus and T. coriaceus), as well as the polymorphism in the chromosomal pair 20 of T. galeatus from the Amazon River basin. No new arrangement could be observed, and even for species that have multiple sites of H3 and H4 histone genes (T. striatulus, T. galeatus, Trachelyopterus aff. galeatus and T. porosus), both chromosomal pairs showed the same microsatellite distribution pattern.

Integrated mapping perspectives to scattered microsatellites in Neotropical fishes
To date, there is no similar approach in the cytogenetic mapping of microsatellites comparing Neotropical fish species. Most studies have focused on the origin and evolution of specific chromosomes, mainly B chromosomes (13 out of 81 studied species; 16.04%) and sex chromosomes (24 out of 81 studied species; 29.62%), whereas others focused only on the type of array and the presence or absence of the marker (Table 1). Nonetheless, of all species cytogenetically analyzed through physical mapping of microsatellites, 76.14% had at least one scattered microsatellite (67 out of 88 analyzed species), and in most studies, these could not be used to differentiate species or populations.
Although the integrated mapping did not reveal large chromosomal rearrangements and most species had a similar distribution pattern of the (GATA)n sequence in the analyzed chromosomes, it showed that the scattered microsatellite (GATA)n has a non-random distribution, reiterating the existence of organization even in scattered microsatellites, which can be better described through a smaller-scale analysis comparing specific chromosomes between species. Through the integrated mapping, a more accurate (GATA)n pattern was described for the chromosome carriers of the 18S and 5S rDNA, H3 and H4 histone genes and U2 snRNA. Even though three species/populations used in this study had already been mapped with (GATA)n, none of the previous studies detected the (GATA)n polymorphism in T. galeatus from the Amazonas River basin or the presence of three chromosomal arrangements of the (GATA)n sequence in the 5S rDNA carriers. Likewise, the (GATA)n polymorphism in the U2 snRNA chromosomes of Trachelyopterus aff. galeatus from the Araguaia River basin would possibly have gone unnoticed without using the U2 snRNA site as a guide. Therefore, the integrated (guided) mapping proved to be an efficient methodology to reveal cryptic chromosomal arrangements of scattered microsatellites.
Cytotaxonomically, the integrated mapping showed divergence in one more marker between T. galeatus from the Amazon River basin and Trachelyopterus aff. galeatus from the Araguaia River, which reiterates the existence of a possible new species, as already proposed through ribosomal markers [89] and integrated mapping with H3/H4 histone genes and U2 snRNA [90]. Methodologically, the integrated mapping turned the widely distributed SSR (GATA)n, which in Trachelyopterus had contributed only to the study of B chromosome origin and evolution, into a promising marker to distinguish other Auchenipteridae species. Thus, with the advancement of the cytogenetics of repetitive elements, we expect that integrated mapping can contribute further to the study of Trachelyopterus and of other species with scattered microsatellites.