A New Problem with Cross-Species Amplification of Microsatellites: Generation of Non-Homologous Products

Microsatellites have been widely used in studies of population genetics, ecology and evolutionary biology. However, microsatellites are not always available for the species to be studied, and their isolation can be time-consuming. To save time and effort, researchers often rely on cross-species amplification. By analyzing the sequences of electromorphs from seven catfish species belonging to three families (Clariidae, Heteropneustidae and Pimelodidae), we revealed a new problem of microsatellite cross-species amplification in addition to size homoplasy. A total of 50 different electromorphs were amplified from the seven catfish species using primers for four microsatellite loci isolated from Clarias batrachus. Two hundred and forty PCR products representing all 50 electromorphs were sequenced and analyzed. Primers for two loci amplified specific products from orthologous loci in all species tested, whereas primers for the other two loci produced specific and polymorphic bands from some non-orthologous loci, even in closely related non-source species. Size homoplasy within the source species was not obvious, whereas extensive size homoplasy across species was detected at three loci, but not at the fourth. These data suggest that amplification of products from non-orthologous loci and the appearance of size homoplasy in cross-amplification are locus-dependent and do not reflect phylogenetic relationships. Amplification of non-orthologous loci and size homoplasy will lead to obvious complications in phylogenetic inference and in population genetic and evolutionary studies. Therefore, we propose that sequence analysis of cross-amplification products be conducted before cross-species amplification of microsatellites is applied.

Microsatellites are short tandem repeat DNA sequences with a unit length of 1 to 6 base pairs (Weber & May, 1989). Because they are highly polymorphic, co-dominant, easy to score by PCR and rather abundant in most organisms studied, they have been widely used for linkage mapping, comparative mapping, and studies of demographic structure and phylogenetic history in populations (Goldstein & Schlotterer, 1999; Zhang et al, 2001). However, microsatellites are not always available for the species to be studied, and their isolation can be time-consuming (Lin et al, 2008; Wang et al, 2008). To save time and effort, researchers often rely on cross-species amplification (Küpper et al, 2008; Kayser et al, 1996; Kijas et al, 1995; Lin et al, 2008). This procedure uses PCR primers complementary to the flanking regions of loci from an extensively studied (source) species to amplify microsatellites from closely (Harr et al, 1998) or sometimes quite distantly related species (Gonzalez-Martinez et al, 2004) for which no such markers have been described. One problem related to cross-species amplification is size homoplasy (Anmarkrud et al, 2008; Estoup et al, 1995). PCR products of microsatellite loci with the same fragment length but different sequences can arise from mutational events (deletions or insertions) in the regions flanking the repeats, or from interruptions in a perfect repeat, producing alleles of the same size that are not identical by descent. Microsatellite size homoplasy has been reported in a number of papers (Hempel & Peakall, 2003; Makova et al, 2000; van Oppen et al, 2000) and was thought to be a major problem of cross-amplification. It seems that size homoplasy increases with divergence time among populations and taxa (Estoup et al, 1995).
However, a recent study showed that homoplasy at microsatellite electromorphs did not represent a significant problem for many types of population genetics analyses performed by molecular ecologists, as the extensive variability at microsatellite loci often compensated for their homoplasious evolution.
In this paper, we describe a new problem in applying microsatellites across several different taxa. Cross-species amplification of microsatellites generated polymorphic products from non-orthologous loci, which were revealed by sequence analysis of 240 clones representing all 50 electromorphs from four loci in seven species (Clarias batrachus, C. fuscus, C. gariepinus, C. macrocephalus, Heterobranchus longifilis, Heteropneustes fossilis and Phractocephalus hemioliopterus).

Species and phylogenetic analyses
Seven species of catfish were used in this study: Clarias batrachus (abbreviation: Cba; the source species), C. fuscus (Cfu), C. gariepinus (Cga), C. macrocephalus (Cma), Heterobranchus longifilis (Hlo), Heteropneustes fossilis (Hfo), and Phractocephalus hemioliopterus (Phe). According to the current taxonomic system, five of the species studied belong to the family Clariidae, one (Heteropneustes fossilis) to the family Heteropneustidae, which is closely related to Clariidae, and the last (P. hemioliopterus) to the more distant family Pimelodidae. To determine the evolutionary relationships among the seven catfish species, phylogenetic analyses were conducted on the basis of partial sequences of the mitochondrial cytb gene. The cytb sequences of the other six species were downloaded from GenBank, whereas that of P. hemioliopterus was amplified by PCR and sequenced as described (Agnese & Teugels, 2005). The sequence of the cytb gene of the Asian arowana (Scleropages formosus; DQ023143) was used as an outgroup. All seven sequences were aligned using Clustal_X (Thompson et al, 1997), and a NJ tree was reconstructed under the Kimura 2-parameter model of nucleotide substitution using MEGA 3.0 (Kumar et al, 2001). The partial sequence of the cytb gene of P. hemioliopterus was deposited in GenBank under the accession number DQ200272.

Sequencing of electromorphs generated by cross-species amplification
All 50 electromorphs (Tabs. 1−4) generated in an earlier study (Yue et al, 2003) from four microsatellites (Cba01, Cba03, Cba06 and Cba20) in the seven species were used for cloning and sequencing. PCR products (25 µL) were cleaned using an optimized glassmilk-based procedure described earlier (Yue et al, 2007; Yue & Orban, 2001) before ligation of the fragments into the pGEM-T-Easy vector (Promega) and subsequent transformation into XL-10 gold ultracompetent cells (Stratagene).
Colonies were subjected to blue/white selection, and the inserts of selected white clones were amplified by colony PCR as described (Yue et al, 2000). Unincorporated PCR primers were removed by treating 5 µL of PCR product from each clone with 0.5 unit of shrimp alkaline phosphatase (SAP; USB) and 0.2 unit of Exonuclease I (ExoI; USB) in 1× SAP buffer at 37℃ for 30 min, followed by incubation at 80℃ for 15 min to inactivate the enzymes. One µL of treated PCR product was used directly as template for sequencing from both directions using a BigDye kit (Applied Biosystems) and either the M13 forward or the M13 reverse primer in a PTC-100 PCR machine (MJ Research). Sequencing products were separated electrophoretically on an ABI3730xl sequencer (Applied Biosystems). To exclude the possibility of cloning artifacts, multiple clones (at least 3) of each electromorph from each species were sequenced. Altogether, the following numbers of clones were sequenced for the four microsatellite loci: Cba01, 107 clones; Cba03, 20 clones; Cba06, 50 clones; and Cba20, 63 clones. Sequences were aligned using Clustal X (Thompson et al, 1997).

Tab. 4 Electromorphs amplified by the primer pair designed for Cba20 in seven catfish species

Phylogenetic relationship of the seven catfish species
Based on the partial sequences of the cytb gene of the seven species, a NJ tree was constructed (Fig. 1). The three species Clarias batrachus, C. fuscus and C. macrocephalus were closely related and clustered into one group. This group was linked to the group formed by C. gariepinus and Heterobranchus longifilis. The remaining two species, Heteropneustes fossilis and Phractocephalus hemioliopterus, were distantly related to the other five species.
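The pairwise distance underlying a NJ tree built with the Kimura 2-parameter model can be sketched in a few lines of Python. This is only an illustration of the standard formula, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences; it is not the MEGA implementation, and the function name `k2p_distance` and the choice to skip gapped or ambiguous sites are assumptions made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two pre-aligned sequences.

    Hypothetical helper for illustration; sites containing a gap or an
    ambiguous base in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

A matrix of such distances over all species pairs would then be the input to neighbor joining; the tree-building step itself is not shown.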

Sequence analysis of electromorphs amplified by the Cba01 primer pair
The primer pair designed for the Cba01 locus amplified polymorphic products in all seven catfish species tested. Altogether, 23 clear bands (electromorphs) were detected in the seven species (size range: 199−349 bp), and their sequence analysis uncovered a total of 34 different alleles (Tab. 1). In C. fuscus, C. macrocephalus and P. hemioliopterus, both the repeat and the flanking regions exhibited high similarity to the source sequences from C. batrachus (Fig. 2A). On the other hand, the corresponding sequences from C. gariepinus and Heteropneustes fossilis were completely different from the source sequences (Fig. 2B), but quite similar among these species. The length of the Heterobranchus longifilis alleles was similar to that of the source species, but the flanking regions and repeats were entirely different (Fig. 2C).
The 5' and 3' flanking sequences of each allele were nearly identical among individuals within each of C. batrachus, C. fuscus, C. macrocephalus and P. hemioliopterus. On the other hand, several differences were found between sequences from different species in both the 5' and 3' flanking regions (seven and eight positions, respectively). Most of them seem to have been caused by substitutions, and the rest by insertion or deletion of a single base pair. Notably, the repeat structures at this locus differed slightly among these four species: (GC)2(AC)n in the source species, (GC)3GT(GC)5−6(AC)5(GC)0−1(AC)n in C. fuscus and P. hemioliopterus, and (GC)2−5(AC)0−1(GC)0−4(AC)0−2GC(AC)n in C. macrocephalus (Fig. 2A). Therefore, the polymorphism at this locus was caused by changes in the number of either AC or GC repeat units in different species, resulting in fragments of the same length but with quite different sequences. Within species, size homoplasy could be detected only in C. macrocephalus, but not in the source species, C. fuscus or P. hemioliopterus.
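Compound-repeat notation such as (GC)2(AC)n can be derived from a sequenced repeat region by run-length encoding its repeat units. The Python sketch below is a minimal illustration under stated assumptions: the function name `repeat_structure` is hypothetical, the region is assumed to start in frame with the first unit, all units are assumed to have the same length, and the input strings are toy sequences rather than data from this study.

```python
def repeat_structure(region, unit_len=2):
    """Run-length encode a repeat region into compound-repeat notation.

    Hypothetical helper: assumes the region starts in frame and that
    every unit has length `unit_len` (a trailing partial unit, if any,
    is reported as its own short unit).
    """
    region = region.upper()
    units = [region[i:i + unit_len] for i in range(0, len(region), unit_len)]
    runs = []  # list of [unit, count] pairs in order of appearance
    for unit in units:
        if runs and runs[-1][0] == unit:
            runs[-1][1] += 1
        else:
            runs.append([unit, 1])
    # single units are written bare, e.g. GC; runs as (TC)6
    return "".join(u if n == 1 else f"({u}){n}" for u, n in runs)


# Two toy regions of equal length but different structure
print(repeat_structure("GCGCACACAC"))  # (GC)2(AC)3
print(repeat_structure("GCGCGCACAC"))  # (GC)3(AC)2
```

The two toy regions are both 10 bp long, which illustrates how a gain in one repeat type can offset a loss in another and leave fragment length unchanged.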
In C. gariepinus and H. fossilis, the sequences of the 7 electromorphs (Tab. 1) were different from those in the source species. The flanking sequences were quite similar among different alleles, although the polyA and polyT stretches (located in the 5' and 3' flanking regions, respectively) showed polymorphism both within and among species (Fig. 2B). Moreover, a deletion of 16 bp was detected in the 5' flanking region of Heteropneustes fossilis (data not shown). In C. gariepinus (but not in H. fossilis), a CAG unit was deleted from the 3' flanking region. A few point mutations and short deletions or insertions were also detected in the 5' and 3' flanking regions among electromorphs from different species (data not shown). Length polymorphism at this locus was caused by a change in the length of the polyA stretch in the 5' flanking region, by the unit number of the (GA)n, (GAA)n and (GGA)n compound repeats, or by a deletion of the three base pairs CAG and a change in the length of the polyT stretch in the 3' flanking region (Fig. 2B).
In H. fossilis, the locus appeared to be duplicated, because more than two bands were detected in the PCR product of each individual tested, whereas no such phenomenon was observed in the other two species. The 199 bp allele from all six individuals of H. fossilis tested (GenBank No. AY196549) lacked a 150 bp fragment, including the 5' flanking region and the whole repeat region, as compared with the largest allele (Hfo349) (Fig. 2B).
In Heterobranchus longifilis, the sequences of the electromorphs were entirely different from the alleles of the source species, although the length of the electromorphs was similar to that of the source species (Fig. 2C). The length polymorphism of the electromorphs was caused by changes in the number of CT repeats.

Sequence analysis of electromorphs amplified by the Cba03 primer pair
At the Cba03 locus, a total of three electromorphs (range: 129−135 bp) were detected across the seven species (Tab. 2). Sequencing of the electromorphs (20 clones in total) revealed that the sequence of this locus was highly conserved across the catfish species studied (Fig. 3). The polymorphism was caused exclusively by changes in the unit number of the (GGA)n repeat. Single base pair substitutions were also seen at three positions of the 3' flanking region in two species (C. macrocephalus and P. hemioliopterus). No size homoplasy was identified among individuals of any species.

Sequence analysis of electromorphs amplified by the Cba06 primer pair
At the Cba06 locus, a total of 11 electromorphs (range: 168−258 bp) were identified across the seven species (Tab. 3). Sequence analysis demonstrated that they could be divided into two groups and two individual sequences (Fig. 4A−D). Fragments amplified from C. fuscus (1 allele) and C. macrocephalus (4 alleles) showed an overall high similarity to the source sequence (Fig. 4A). In these two species, every allele carried a 34 bp insertion in the 5' flanking region, between the primer and the repeats, in comparison to the source sequence.
Additional single base pair substitutions located in the flanking regions were also found. Within each species, the length polymorphism was caused by changes in the unit number of the (AAC)n repeat, but among species it could also be caused by changes in the length of the polyA stretch in the 3' flanking region or by the insertion of the 34 bp fragment into the 5' flanking region. Although no size homoplasy was identified within these two species, its presence among species was quite obvious. For example, the 245 bp electromorph in C. fuscus and that in C. macrocephalus differed in the unit number of CAA repeats and in the presence of a CTA sequence, caused by an A→T mutation, in the latter.
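Once clone sequences are available, size homoplasy of the kind described above (electromorphs of equal length but different sequence) can be screened for mechanically. The following Python sketch groups alleles by length and reports lengths represented by more than one distinct sequence. The function name `find_size_homoplasy`, the labels and the toy sequences are illustrative assumptions, not data or code from this study.

```python
from collections import defaultdict


def find_size_homoplasy(alleles):
    """Report fragment lengths shared by more than one distinct sequence.

    `alleles` is an iterable of (label, sequence) pairs. Returns a dict
    mapping each homoplasious length to {sequence: [labels]}.
    Hypothetical helper for illustration only.
    """
    by_length = defaultdict(dict)
    for label, seq in alleles:
        seq = seq.upper()
        by_length[len(seq)].setdefault(seq, []).append(label)
    # keep only lengths at which two or more different sequences occur
    return {n: seqs for n, seqs in by_length.items() if len(seqs) > 1}


# Toy input: two 12 bp alleles with different sequences, two identical 9 bp alleles
toy_alleles = [
    ("toy_sp1_a", "AACAACAACCTA"),
    ("toy_sp2_a", "AACAACAACAAC"),
    ("toy_sp1_b", "AACAACAAC"),
    ("toy_sp2_b", "aacaacaac"),
]
homoplasious = find_size_homoplasy(toy_alleles)
print(sorted(homoplasious))  # [12]
```

Comparing full sequences is the strictest criterion; restricting the comparison to the repeat region or to the flanking regions would be an equally reasonable design, depending on the question asked.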
The second group (Fig. 4B) included sequences from C. gariepinus (2 alleles) and H. longifilis (1 allele). The DNA sequences of the fragments from the two species showed high similarity to each other, but differed from the source sequence in both their flanking regions and their repeat motif [(AAC)n vs. (CA)n]. An insertion of five base pairs (CGAAC) was seen in the 5' flanking region of H. longifilis, as compared with the sequences from C. gariepinus (Fig. 4B). Apart from this insertion, the length polymorphism was caused by different numbers of (AC)n repeat units in all fragments. Between the two species, single base pair substitutions were observed at several positions of the flanking regions. The 168 bp fragment appeared in both species. However, comparison of the sequences between the two species revealed two different alleles. The remaining two sequences (Fig. 4C−D; GenBank Nos. AY196578 and AY196579) originated from P. hemioliopterus and Heteropneustes fossilis, respectively. They did not show any similarity to the first two groups except at the primer binding sites, and did not contain repeats.

Sequence analysis of electromorphs amplified by the Cba20 primer pair
A total of 13 electromorphs (range: 93−143 bp) were detected across six species (Tab. 4), but not in P. hemioliopterus. Sequence analysis revealed 10 additional alleles (Fig. 5), without any evidence of homoplasy within the source species. In the 5' flanking region, single base pair substitutions were detected at three positions among species. As compared with the source (4 alleles), sequences from C. macrocephalus (4 alleles) and Heterobranchus longifilis (3 alleles) showed an insertion of three base pairs (GTC) in the 3' flanking region. Single base pair substitutions were also detected at two positions of the 3' flanking region. The repeat region was highly variable within and among species. In the source species, the repeat structure of the 95 bp allele was (TC)6GC(TC)2; although longer and shorter alleles differed in the number of units in the longer (TC)n stretch, the GC(TC)2 motif remained constant among all alleles. In C. fuscus (3 alleles), where the (TC)n repeat was interrupted by a TA unit, the (TC)3 upstream of the TA remained unchanged, whereas the downstream (TC)n repeat showed polymorphism among individuals. In C. gariepinus (2 alleles), the TC repeats were interrupted by GC and TG units at several positions, and the polymorphism was caused by changes in the long, upstream TC repeat, whereas the shorter ones remained constant. In C. macrocephalus and Heteropneustes fossilis (3 alleles), the (TC)n repeat was interrupted by CC, TT and GT motifs, whereas in Heterobranchus longifilis it was interrupted by GC, AG and TG units. The reason for the polymorphism was similar to that described for C. gariepinus.

Discussion
Microsatellites are very useful tools for genetic and evolutionary studies. However, their genotyping depends on prior sequence information from the genome to be analyzed. Despite recent improvements in the procedure (for review see: Zane et al, 2002), the isolation of microsatellites is still cumbersome. One possible solution to this problem is cross-species amplification, which involves the use of primer pairs designed for the flanking regions of conserved microsatellites (of a so-called source species) for genotyping in related species (Housley et al, 2006; Kayser et al, 1996; Kijas et al, 1995). Data from several such experiments have been reported in teleosts during the last decade (e.g. Koskinen & Primmer, 1999; Yue et al, 2004; Yue et al, 2003). However, the PCR products generated in non-source species have been analyzed at the sequence level in only a few cases (Kayang et al, 2002; Viard et al, 1998). We tested the applicability of four conserved microsatellite markers isolated earlier from C. batrachus (Yue et al, 2003) in six additional catfish species. We found that PCR primer pairs designed for the flanking regions of the four C. batrachus microsatellite loci amplified products in most of the related species. However, sequence analysis of 240 clones representing 50 electromorphs from seven catfish species revealed a new problem of cross-species amplification of microsatellites: the generation of products from non-orthologous loci, in addition to the appearance of size homoplasy. Primer pairs designed for two C. batrachus loci (Cba03 and Cba20) amplified highly similar (orthologous) sequence products in all non-source species. On the other hand, those designed for the other two loci (Cba01 and Cba06) yielded polymorphic products with entirely different sequences from some of the distantly related species (e.g. P. hemioliopterus and Heteropneustes fossilis), and even from closely related species (e.g. C. gariepinus), indicating that these bands originated from non-orthologous loci. The amplification of specific products from non-orthologous sources was locus-dependent and did not reflect the phylogenetic relationships. Thus, in the absence of sequence information, it would be very difficult to predict whether a given primer pair will amplify products from orthologous loci in a given non-source species. A similar phenomenon was observed earlier in soybean (Peakall et al, 1998) and rice (Chen et al, 2002), but those findings were not analyzed in detail. Taken together, our data suggest that the generation of polymorphic products from non-orthologous loci by cross-species amplification is not unique to certain taxonomic groups of fish; instead, it might occur throughout the animal and plant kingdoms. Although the mechanisms underlying this phenomenon are not fully understood, they are thought to be related to genome and gene duplication, as well as to speciation. Such events are expected to be more frequent in fish, since the ancestor of today's teleosts seems to have experienced an additional round of genome duplication (Meyer & Schartl, 1999; Postlethwait et al, 2000) and chromosome duplications (Chang et al, 2005) after it split from the ancestor of the other vertebrates. Duplication of microsatellite loci followed by gene conversion can lead to amplification of non-orthologous loci, as proposed by Angers et al (2002).
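A simple way to screen cross-amplified products before genotyping is to compare their flanking regions with those of the source allele. The Python sketch below computes percent identity over a pre-aligned pair of flanking sequences and flags pairs that fall below a cut-off. It is a heuristic illustration only: the function names, the gap-handling rule and the 0.8 threshold are assumptions introduced here, not the criterion used in this study, where orthology was judged from the aligned clone sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical columns in a pre-aligned sequence pair.

    Columns where both sequences carry a gap are ignored; a gap
    opposite a base counts as a mismatch (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def looks_orthologous(source_flank, target_flank, min_identity=0.8):
    """Heuristic flag; the 0.8 cut-off is an arbitrary illustrative value."""
    return percent_identity(source_flank, target_flank) >= min_identity
```

Products that fail such a screen would be candidates for the non-orthologous amplification described above and would warrant inspection of the full alignment before being scored as alleles of the source locus.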
Sequencing of all alleles of the four microsatellite loci in the source and six non-source species showed that length differences between microsatellite alleles were not restricted to their repeat regions. A longer insertion and several shorter insertions were detected in the flanking regions of the loci orthologous to Cba06 in non-source species. At the Cba01, Cba06 and Cba20 loci, a number of alleles from different non-source species had the same length but different sequences. In contrast, at the Cba03 locus, electromorphs of the same length represented the same sequences. This suggests that size homoplasy of microsatellite markers produced by cross-species amplification is locus-dependent and does not reflect the phylogenetic relationships. We also found a tendency toward an increased number of interrupted repeats at orthologous loci in non-source species, as observed by others in different taxonomic groups (e.g. Culver et al, 2001; Di Gaspero et al, 2000; Estoup et al, 1995; Garza et al, 1995; van Oppen et al, 2000). This tendency also seems to be locus-dependent in catfish, since two loci (Cba01 and Cba20) showed clear interruptions in non-source species, whereas the other two (Cba03 and Cba06) exhibited few or no interruptions.
Applications of microsatellites to population genetic, ecological and evolutionary studies rely heavily on the models used to explain the mutational process of these markers. However, all such models rely on the assumption that differences between alleles at orthologous loci are due entirely to changes in the number of repeats. In this study, we demonstrated that the appearance of size homoplasy and the amplification of non-orthologous products by cross-species amplification were locus-dependent and did not reflect phylogenetic relationships. Therefore, applying cross-amplification of microsatellites to population genetic and phylogenetic analyses in distantly or even closely related species might make length differences between electromorphs difficult to interpret and lead to incorrect estimates of evolutionary relationships.
In conclusion, we revealed a new problem of microsatellite cross-species amplification, namely the amplification of non-orthologous loci, in addition to the well-known problem of size homoplasy. Both will lead to obvious complications for phylogenetic inference, population genetics, mapping and evolutionary studies. Sequence analysis of products generated by "cross-species primers" should always be performed, as it could reveal previously unrecognized problems and might allow more information to be extracted from these loci, thereby increasing their usefulness.