Introduction

Markers have a key role in the study of genetic variability and diversity, in the construction of linkage maps and in the tracking of individuals or lines carrying particular genes. The emergence of marker systems has closely followed developments in biochemistry and molecular biology for the past 40 years (Hubby and Lewontin, 1966). The shortcomings of biochemically derived markers, such as isozymes, drove the development of markers based on DNA polymorphisms (Kan and Dozy, 1978). A DNA molecular marker in essence detects nucleotide sequence variation at a particular location in the genome. The variation must be found between the parents of the chosen cross for the marker to be informative among their offspring and to allow its pattern of inheritance to be analyzed. DNA markers can generate ‘fingerprints,’ which are distinctive patterns of DNA fragments resolved by electrophoresis and detected by staining or labeling. The advent of the PCR was a breakthrough for molecular marker technologies, and made possible many fingerprinting methods. These fall into two broad categories, namely methods that detect single loci and multiplex methods that detect multiple loci simultaneously.

The first multiplex methods to be developed were named randomly amplified polymorphic DNA (Williams et al., 1990; Welsh and McClelland, 1990) and DNA amplification fingerprinting (Caetano-Anollés et al., 1991), respectively, and involve amplification of random repetitious sites in the genome using short primers, typically 8–12 nt in length. The approaches involve quick and easy reaction set-up and no genome sequence information is needed to design the primers. However, problems in reproducibility due to the presence of many potential priming sites in the genome and the low annealing temperatures in the reactions, because of the nature of the primers themselves, have led to the disappearance of these systems from the molecular marker toolkit today. The amplified fragment length polymorphism (AFLP) method, introduced in the mid-1990s, also generates anonymous markers. It detects restriction sites by amplifying a subset of all the sites for a given enzyme pair in the genome by PCR between ligated adapters (Vos et al., 1995).

Interspersed repetitive sequences comprise a large fraction of the genome of many eukaryotic organisms and they predominantly consist of transposable elements (TEs). It is therefore not surprising that many DNA marker techniques that are based on these repeats have been devised. In an early example, restriction fragment length polymorphism (RFLP) probes derived from repetitive sequences were hybridized to Southern blots of restriction-digested genomic DNA to produce a highly variable pattern (Lee et al., 1990). The RFLP technique was used extensively in the past, but has been replaced by PCR-based methods because of the slowness of Southern blotting.

Nucleotide sequences matching repetitive sequences showing polymorphism in RFLP analyses have also been used as PCR primers for the inter-repeat amplification polymorphism marker method (Meyer et al., 1993; Salimath et al., 1995). Such repetitive sequences include microsatellites, such as (TG)n or (AC)n, which are distributed throughout the genome. A derived approach, called Alu-PCR or SINE-PCR, was developed to generate PCR markers based on amplification of microsatellites near the 3′ end of Alu TEs, which are short interspersed repetitive elements (SINEs) (Charlieu et al., 1992).

TEs are divided into two major classes. Class II transposons, which were first discovered by McClintock (1984), move by a cut-and-paste mechanism as double-stranded DNA. In contrast, Class I retrotransposons transpose through an RNA intermediate, and hence the original copy remains in the genome (Finnegan, 1989). Retrotransposons are separated into two major subclasses that differ in their structure and transposition cycle. These are the long terminal repeat (LTR) retrotransposons and the non-LTR retrotransposons (LINE and SINE elements), which are distinguished by the respective presence or absence of LTRs at their ends. All groups are complemented by their respective non-autonomous forms that lack one or more of the genes essential for transposition: MITEs (miniature inverted-repeat transposable elements) for Class II, SINEs for non-LTR retrotransposons, and TRIMs (terminal-repeat retrotransposons in miniature) and LARDs (large retrotransposon derivatives) for LTR retrotransposons.

LTR retrotransposons and genome organization

Retrotransposons are abundant throughout the genomes of virtually all eukaryotes and are particularly numerous in plants (Finnegan, 1989; Flavell et al., 1992; Voytas et al., 1992; Suoniemi et al., 1998). In plants, the LTR retrotransposons are typically more plentiful and active than their non-LTR relatives (see, for example, Arabidopsis Genome Initiative, 2000; Rice Chromosome 10 Sequencing Consortium, 2003; Hill et al., 2005; Macas et al., 2007; Paterson et al., 2009). In many crop plants, between 40 and 70% of the total DNA comprises LTR retrotransposons (Pearce et al., 1996; SanMiguel et al., 1996; Shirasu et al., 2000). Although most prevalent retrotransposons are dispersed throughout the genome, at least in the cereals and citrus they are often locally nested one into another in extensive domains that have been referred to as ‘retrotransposon seas’ surrounding gene islands (SanMiguel et al., 1996; Shirasu et al., 2000; Ramakrishna et al., 2002; Bernet and Asins, 2004; Gu et al., 2004; Kong et al., 2004; Sabot et al., 2005). Bertioli et al. (2009) have shown that retrotransposon-deficient gene islands correspond to highly conserved blocks in legume evolution.

LTR retrotransposons are transcribed from one LTR of an integrated element to produce a nearly full-length RNA transcript containing a single copy of the LTR split between its two ends (the LTR provides both the start site and polyadenylation signal for the element; Figure 1). This RNA is then reverse-transcribed into an extrachromosomal complementary DNA, reconstituting the full-length element that is ultimately integrated back into the genome. Immediately internal to the LTRs are the priming sites for reverse transcription. The large central part of the retrotransposon encodes the structural components of a virus-like particle into which the RNA is inserted, together with reverse transcriptase and integrase enzymes.

Figure 1

Organization of the LTR retrotransposon genome. The order of coding domains differs between the Copia and Gypsy superfamilies as shown. Retrotransposons are bounded by long terminal repeats (LTRs), which contain the transcriptional promoter and terminator (indicated diagrammatically by a bent arrow and stop sign, respectively). The resultant transcript is indicated as a hatched box between the Gypsy and Copia diagrams. The LTRs contain short inverted repeats at either end (shown as filled triangles). Reverse transcription is primed at the PBS and PPT domains, respectively, for the (−) and (+) strands of the complementary DNA (cDNA). The internal region of the retrotransposon codes for the proteins necessary for the retrotransposon life cycle and is generally divided into two open reading frames: GAG, for the capsid protein, which packages the transcript into a virus-like particle, and POL, for the other proteins. The POL contains: aspartic proteinase (AP), which cleaves the polyprotein; integrase (IN), which inserts the cDNA copy into the genome; reverse transcriptase (RT) and RNaseH (RH), which together copy the transcript into cDNA. An additional open reading frame for the envelope protein (ENV), found in some groups of Gypsy elements, is indicated. The LTRs are generally well conserved within families, and can serve for the design of primers to generate DNA footprints (Figure 2). Direct repeats in the flanking genomic DNA are generated upon retrotransposon integration: these are depicted as short, hatched arrows. The flanking genomic DNA is shown as a wavy line. The apposition of a long element bearing conserved sequences within genomic DNA of random sequence is the basis for retrotransposon marker methods.

The structural features described above, as well as the basic stages of the life cycle, are shared by the retrotransposons and the retroviruses (Frankel and Young, 1998; Kim et al., 2004; Wicker et al., 2007). However, rather than escaping the genome to infect new individuals as do retroviruses, retrotransposons insert the new copies only into their host genomes. If the integration takes place within a cell lineage from which pollen or egg cells are ultimately derived, then a new polymorphism is contributed to the gene pool.

Retrotransposon-based marker systems

General considerations: The dynamism and dispersion of the various groups of TEs have led to their widespread exploitation as molecular markers. Direct comparisons of retrotransposon methods with AFLP indicate that the retrotransposon markers are more informative in a variety of crops (Waugh et al., 1997; Ellis et al., 1998; Yu and Wise, 2000; Porceddu et al., 2002; Tam et al., 2005). Polymorphism in AFLP markers is based on single-nucleotide polymorphism (SNP) changes in restriction sites and on indels throughout the genome, whereas retrotransposon markers incorporate integration events in the generation of detectable variation. Hence, it seems that retrotransposon insertions tend to be more dynamic than SNPs and indels as a whole.

Unlike Class II TEs, retrotransposons cannot excise themselves from their insertion locations (Finnegan, 1989). This unidirectionality of integration confers great advantages in reconstructing pedigrees and phylogenies because the ancestral state is obvious—it is the empty site, whereas for almost all other genetic polymorphisms upon which markers are built, directionality cannot be inferred. In this way, phylogeny can be explored using retrotransposon insertions; for example, SINE elements have been used to trace human roots to Africa (Batzer et al., 1994), to establish the relationship of whales to even-toed ungulates (Shimamura et al., 1997) and to infer the evolutionary relationships between wild rice species (Cheng et al., 2002).

Most of the retrotransposon marker methods take advantage of two basic properties, namely that they cause large insertions by their transpositional activity and they contain conserved domains from which PCR primers can be designed. Some other methods target the small insertions and deletions found within otherwise conserved TE domains to generate fingerprints. Most of the techniques are also anonymous, producing fingerprints from multiple sites of retrotransposon insertion in the genome (Schulman et al., 2004) by using PCR primed on conserved motifs in the element and on some widespread and conserved motif in the surrounding DNA. For LTR retrotransposons, the primers are generally designed from the LTRs near to the insertion site, in LTR sub-domains that are conserved within retrotransposon families and differ between families (Figure 2). Although regions internal to the LTR containing conserved segments can be used for this purpose, generally the LTRs are chosen to minimize the size of the target to be amplified and to assay insertion site polymorphism rather than events internal to the element.

Figure 2

Retrotransposon-based molecular marker methods. (a–c) Alternative priming sites in the genome paired with a priming site in a retrotransposon. (a) The SSAP method. Amplification is carried out from genomic DNA cut with two restriction enzymes (R1 and R2), containing a retrotransposon and ligated to an adapter (shown only for R2). Primers are indicated as arrows; the LTR generally serves as the retrotransposon priming site. (b) The IRAP method. The second priming site is also a retrotransposon. (c) The REMAP method. Amplification takes place between a microsatellite domain (labeled simple sequence repeat (SSR)) and a retrotransposon, using a primer anchored to the proximal side of the microsatellite and a retrotransposon primer. (d, e) RBIP. (d) Full sites are scored by amplification between a primer in the flanking genomic DNA (shown as a blue wavy line) and a retrotransposon primer. The single product is shown as a red bar beneath the diagram. The alternative reaction between the primers for the left and right flanks (light blue bar beneath the diagram) is inhibited in the occupied site by the length of the retrotransposon. (e) The flanking RBIP primers are able to amplify the empty site, depicted as a deep blue bar beneath the diagram, but amplification from the retrotransposon primer does not occur (missing product shown as a light red bar) because the TE insert is missing.

The various methods for revealing insertion polymorphism of retrotransposons differ in the nature of the feature external to the TE that is used for primer design (Figure 2; Schulman et al., 2004). Because the LTRs are direct repeats, a primer facing outward from the left or 5′ LTR will necessarily face inward from the right, or 3′ LTR. Depending on the nature of the second primer, the inward facing primer will either not amplify a product or produce a band from the TE interior that is typically of low usefulness because it represents the total TE subfamily rather than a particular copy. The internal amplicon can be removed by judicious use of an infrequently cutting enzyme or designing the primer to overlap the LTR end and adding bases that do not match the LTR-interior junction (Waugh et al., 1997; Vershinin et al., 2003). For retrotransposons with relatively short LTRs the transposon-specific primer can derive from an internal sequence present only once per element, simplifying this process (Ellis et al., 1998).
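The orientation argument above can be illustrated with a toy locus. The following is a minimal sketch and not part of any published protocol: all sequences, names and lengths are hypothetical, and exact string matching stands in for primer annealing. It shows that a primer designed to face outward from the 5′ LTR also anneals in the 3′ LTR, where it points into the element interior.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def find_all(text, pattern):
    """Return every start index of pattern in text, overlaps included."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


# Hypothetical toy locus: flank + LTR + interior + LTR + flank.
LEFT_FLANK = "GATTACAGATTACA"
LTR = "TGACCGTTAGGCATCA"
INTERIOR = "AAACCCGGGTTTACGTACGT"
RIGHT_FLANK = "CCTTGGAACCTTGG"
LOCUS = LEFT_FLANK + LTR + INTERIOR + LTR + RIGHT_FLANK

# Primer that anneals at the start of the LTR and extends leftward,
# that is, outward from the 5' LTR into the flanking DNA.
PRIMER = revcomp(LTR[:10])


def extension_targets(locus, primer, left_flank_len):
    """For each annealing site, report what lies immediately to its left."""
    targets = []
    for pos in find_all(locus, revcomp(primer)):
        # In this toy layout only the 5' LTR starts right after the flank.
        region = "flanking DNA" if pos == left_flank_len else "element interior"
        targets.append((pos, region))
    return targets


TARGETS = extension_targets(LOCUS, PRIMER, len(LEFT_FLANK))
print(TARGETS)
```

The second site is the one that gives rise to the uninformative internal amplicon discussed above.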

Below, we present an overview of the various retrotransposon-based molecular marker methods that have been developed and discuss their use to visualize the genetic diversity generated by TE activity.

Sequence-specific amplified polymorphism (SSAP)/transposon display

Waugh et al. (1997) exploited the dispersion and prevalence of the BARE1 LTR retrotransposon in barley through modification of the AFLP technique (Figure 2). Their method is based on the digestion of genomic DNA with two different enzymes to generate a template for amplification between retrotransposons and adapters ligated at the restriction sites (usually MseI and PstI, although any pair of restriction enzymes could in principle be used), using selective bases in the adapter primer (Syed and Flavell, 2007). Usually, SSAP shows more polymorphism and more co-dominance than AFLP. Nevertheless, like AFLP, SSAP requires restriction digestion of genomic DNA to provide sites for adapter ligation. The sensitivity of commonly used restriction enzymes to DNA methylation could produce false genotyping results and needs to be taken into account in experimental design; conversely, this sensitivity can be exploited deliberately to sample DNA methylation.

In SSAP, nonselective primers can be used when the enzymes used for digestion cut infrequently, or when the copy number of the TE is low. For high-copy-number families, the number of selective bases may be increased (Schulman et al., 2004). The use of two enzymes in SSAP correspondingly reduces genomic complexity, as does the use of selective bases on the primers associated with the adapters. Such complexity-reducing measures are poorly suited to TEs with low copy numbers. However, the use of single enzyme digests with a systematic series of all possible selective bases allows the survey of all insertion sites for a given TE.
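The trade-off between copy number and selective bases can be made concrete with a rough calculation. The sketch below assumes that each selective base retains about one quarter of the candidate fragments, which holds only if the four bases are equally frequent next to the adapter; the function names and the example numbers are hypothetical and are not taken from the cited studies.

```python
from itertools import product


def expected_bands(n_sites, n_selective):
    """Expected bands if each selective base keeps ~1/4 of the fragments."""
    return n_sites / 4 ** n_selective


def selective_bases_needed(n_sites, max_bands):
    """Smallest number of selective bases giving at most max_bands bands."""
    if max_bands <= 0:
        raise ValueError("max_bands must be positive")
    n = 0
    while expected_bands(n_sites, n) > max_bands:
        n += 1
    return n


def all_selective_extensions(n_selective):
    """Every possible 3' extension, for a systematic survey of all sites."""
    return ["".join(bases) for bases in product("ACGT", repeat=n_selective)]


# Hypothetical high-copy family: 5000 amplifiable sites, aim for <= 100 bands.
print(selective_bases_needed(5000, 100))
# Hypothetical low-copy family: 50 sites need no selective bases at all.
print(selective_bases_needed(50, 100))
# The 16 two-base extensions that together cover every insertion site.
print(all_selective_extensions(2))
```

Running one reaction per extension returned by `all_selective_extensions` corresponds to the systematic series described above, at the cost of 4^n reactions for n selective bases.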

SSAP has been optimized for multiple plant species and protocols for rapidly obtaining retrotransposon sequence information for SSAP primer design have been developed (Syed and Flavell, 2007; Kalendar et al., 2010). The same technique was named transposon display when applied to DNA transposons rather than retrotransposons (van den Broeck et al., 1998). Rim2/Hipa transposon display produced highly polymorphic profiles with ample reproducibility within a species as well as between species in the Oryza genus (Kwon et al., 2005).

Inter-retrotransposon amplified polymorphism (IRAP)

As discussed above, although retrotransposons are dispersed, they can also be found clustered in the genome. It is the phenomenon of clustering that makes possible the IRAP method, which detects insertional polymorphisms by amplifying the portion of DNA between two retrotransposons (Figure 2; Kalendar et al., 1999; Kalendar and Schulman, 2006). If all retrotransposons were dispersed equally throughout the genome, even for an abundant family such as BARE1, individual elements would be 50 kb apart, and could not yield IRAP amplification templates.

A virtue of IRAP is its experimental simplicity. All that is needed is simple PCR followed by electrophoresis to resolve the PCR products. IRAP can be carried out with a single primer matching either the 5′ or 3′ end of the LTR but oriented away from the LTR itself, or with two primers. Nearby TEs may be found in different orientations in the genome (head-to-head, tail-to-tail or head-to-tail) increasing the range of tools available to detect polymorphism depending on the method and primer combinations. If two primers are used, they may be from the same retrotransposon family or from different families. The PCR products, and therefore the fingerprint patterns, result from amplification of hundreds to thousands of target sites in the genome. The pattern obtained will be related to the TE copy number, insertion pattern and size of the TE family.
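A single-primer IRAP reaction can be mimicked in silico to show why only suitably oriented, closely spaced elements yield a band. This is a simplified sketch under stated assumptions: exact string matching replaces annealing, the primer and template sequences are hypothetical, and the 3000-base size limit is an illustrative cut-off rather than a property of the method.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def find_all(text, pattern):
    """Return every start index of pattern in text, overlaps included."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def irap_products(template, primer, max_len=3000):
    """List (start, end, length) of products primed by one primer.

    A product needs the primer sequence on the top strand (extends
    rightward) and, downstream of it, its reverse complement (a copy of
    the primer annealed to the top strand, extending leftward).
    """
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, revcomp(primer))
    products = []
    for i in forward_sites:
        for j in reverse_sites:
            length = j + len(primer) - i
            if j >= i + len(primer) and length <= max_len:
                products.append((i, j + len(primer), length))
    return sorted(products)


# Hypothetical outward-facing LTR primer and two toy genotypes.
PRIMER = "GGATCTCA"
SPACER = "A" * 20
LINE_A = "TTTT" + PRIMER + SPACER + revcomp(PRIMER) + "TTTT"  # two elements
LINE_B = "TTTT" + PRIMER + SPACER + "CCCCCCCC" + "TTTT"       # second one absent

print(irap_products(LINE_A, PRIMER))
print(irap_products(LINE_B, PRIMER))
```

The band present in `LINE_A` and absent from `LINE_B` is scored as a presence or absence polymorphism; two copies in the same orientation would give no product with a single primer in this model.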

IRAP fingerprints with single primers often generate bands from 500 to 3000 bases, lengths that are not convenient for capillary electrophoresis. To reduce the size of the DNA products to be separated and visualized, fluorescent primers may be used in the PCR reaction and the amplicon DNA digested with a four-base-specific restriction enzyme such as TaiI or TaqI after the PCR reaction. In this way, IRAP can be adapted to analyses on capillary sequencing platforms.

Retrotransposon-microsatellite amplified polymorphism (REMAP)

The REMAP method is similar to IRAP, but one of the two primers matches a microsatellite motif (Figure 2; Kalendar et al., 1999; Kalendar and Schulman, 2006). Abundant in most genomes, microsatellites or simple sequence repeats seem to be associated with retrotransposons and have high mutation rates due to polymerase slippage. Therefore, they may show much variation at individual loci within a species (Schulman et al., 2004). In REMAP, anchor nucleotides are used at the 3′ end of the simple sequence repeat primer to avoid slippage of the primer between the individual simple sequence repeat motifs. An anchored primer also prevents the detection of variation in repeat numbers within the microsatellite.
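The role of the anchor can be sketched by constructing candidate primers for a given repeat motif. This is an illustrative helper only, with a hypothetical name and a single-nucleotide anchor assumed: an anchor base that merely continues the repeat would not fix the primer at the edge of the microsatellite, so that base is excluded.

```python
def anchored_ssr_primers(motif, repeats):
    """Build 3'-anchored SSR primers such as (CA)n followed by one anchor base.

    The anchor must differ from the first base of the motif; otherwise it
    would just extend the repeat and the primer could still slip along it.
    """
    core = motif * repeats
    return [core + base for base in "ACGT" if base != motif[0]]


# Hypothetical example: (CA)3 with each permissible one-base anchor.
print(anchored_ssr_primers("CA", 3))
```

Real REMAP primers use longer repeat cores than this toy example; the repeat count here is kept small only for readability.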

Banding patterns are completely different if REMAP primers are used individually or in combination, indicating that the majority of bands are derived from sequences bordered by a microsatellite on one side, and by an LTR on the other. Usually, the REMAP pattern is more variable than the corresponding inter-simple sequence repeat pattern; and often (but not always, depending on the LTR sequence) the IRAP pattern with primer combinations shows more variability than with a single primer (Leigh et al., 2003; Kalendar et al., 1999, 2004; Kalendar and Schulman, 2006).

Retrotransposon-based insertional polymorphism (RBIP)

In addition to SSAP, IRAP and REMAP, a fourth method based on the polymorphic integration pattern of retrotransposons, RBIP, has been developed (Flavell et al., 1998). RBIP is the sole retrotransposon method designed to detect polymorphism for the integration of an element at a particular locus (Figure 2). The RBIP method uses primers flanking retrotransposon insertions and scores the presence and absence of insertions at individual sites. The method has also been called insertion sequence-based polymorphism (Paux et al., 2010).

Using three primers, RBIP yields co-dominant marker scores, which are particularly useful for phylogenetic studies because retrotransposon insertions are irreversible. In the case of a retrotransposon, a primer designed in the LTR is used together with a primer designed in the flanking region to allow the amplification of an insertion site, whereas primers specific to both the 5′ and 3′ flanking regions are used to score the corresponding empty site. TE insertions are usually several thousand bases long, and hence the flanking primers do not generate an amplicon from the occupied site. RBIP thus detects both the presence and absence of the insertion, but requires that the 5′ and 3′ flanking sequences of the TE insertions are known. RBIP has been used in rice (Vitte et al., 2004) to address the issue of the evolution of Indica and Japonica rice varieties.
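The scoring logic of a three-primer RBIP assay reduces to a small decision table. The sketch below uses hypothetical names and assumes that each of the two possible products has already been called as present or absent for a sample.

```python
def score_rbip(occupied_product, empty_product):
    """Translate the two RBIP products into a co-dominant genotype call.

    occupied_product: band from the LTR primer plus one flanking primer.
    empty_product: band from the two flanking primers across the empty site.
    """
    if occupied_product and empty_product:
        return "heterozygous"
    if occupied_product:
        return "homozygous occupied"
    if empty_product:
        return "homozygous empty"
    # Neither product: failed reaction or a rearranged locus, so no call.
    return "no call"


# Hypothetical calls for four samples at one locus.
calls = [score_rbip(o, e) for o, e in [(True, False), (False, True),
                                       (True, True), (False, False)]]
print(calls)
```

Treating the double-negative case as "no call" rather than as a genotype is a deliberate choice here, because the absence of both products carries no allelic information.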

Tagged microarray marker (TAM)

The basic RBIP method has been developed for high-throughput applications by replacing gel electrophoresis with array hybridization to a filter (Flavell et al., 1998; Jing et al., 2007). Initially, PCR reactions detecting the occupied sites and unoccupied sites carried out together were spotted onto membranes, and probed with a locus-specific probe. TAM is an extension of this to a microarray format (Jing et al., 2007). TAM based on the PDR1, Cyclops and Tpv LTR retrotransposons of pea has been developed for scoring thousands of DNAs for a co-dominant molecular marker on a glass microarray slide (Figure 3).

Figure 3

TAM fingerprinting of two RBIP markers. A total of 3263 Pisum lines were scored for the RBIP markers (a) Birte-B1 and (b) 1794-2 (Jing et al., 2005) by the TAM approach (Flavell et al., 2003; Jing et al., 2007, 2010). Each spot represents a single sample (sample locations in the array are conserved between slides) and in these two cases a red spot indicates an occupied (retrotransposon insertion present) locus and the green spot an unoccupied locus. Yellow spots indicate an individual heterozygous for the retrotransposon insertion.

Scaling the dot blot approach to microarrays has given attendant advantages in throughput, efficiency and data collection. TAM also works well with SNP markers (Flavell et al., 2003; Jing et al., 2007). In this approach, biotin-terminated allele-specific PCR products are spotted unpurified onto streptavidin-coated glass slides and visualized by hybridization of fluorescent detector oligonucleotides to tags attached to the allele-specific PCR primers. Two tagged primer oligonucleotides are used per locus and each tag is detected by hybridization to form a concatemeric DNA probe labeled with multiple copies of a fluorochrome. This method has recently been used to study the diversity of a complete pea germplasm collection containing thousands of samples (Jing et al., 2010). The DArT (diversity array technology) array (Jaccoud et al., 2001) is, like TAM, array based. In contrast to TAM, it is an anonymous method (the sequences of the loci are not known beforehand) and scores multiple loci per array for relatively few accessions.

Retrotransposon UTL polymorphism fingerprinting (RUP)

A multilocus molecular marker based on length variability within the internal untranslated leader (UTL) region named RUP has been described (Pelsy, 2007). It uses PCR amplification between primers from the LTR and the gag domain that lies near the 5′ LTR. Unlike the previously described methods, which visualize insertional polymorphisms, RUP detects UTL size variation in a variable region within individual members of a particular retrotransposon family. The RUP method shows these size differences as a fingerprint. The individual UTL size classes are stable enough to permit phylogenetic resolution of the plant accessions in which they are found. Pelsy (2007) used RUP to describe a genotype specific to each of the 94 Vitis accessions analyzed. Previously, a minimum of six standard microsatellite markers chosen for their high degree of allelic polymorphism were necessary to identify a grape variety (This et al., 2004).

In a related approach, Vershinin and Ellis (1999) used primers throughout the internal region of the pea retrotransposon PDR1 and compared this method with SSAP, which assesses insertion site polymorphism in pea. The two methods gave similar results especially when comparing the gag domain with the insertion sites. Nevertheless, the analysis of internal domain markers in addition to the insertional polymorphisms gave a more detailed picture of the evolution of element families in germplasm as well as their population history.

Development of retrotransposon marker projects

A major disadvantage of all the methods described above is the need for retrotransposon sequence information to design family-specific primers. However, related species have similar TE sequences (retroelements or transposons), meaning that primers for the anonymous marker methods described above (SSAP, IRAP and REMAP) from one species can be used in another. In our experience, LTR primers can be readily used across species lines, among closely related genera and even sometimes between plant families (Lou and Chen, 2007; Sanz et al., 2007; Figure 4). In this case, primers designed to conserved TE sequences are advantageous. Moreover, TEs are dispersed throughout the genome and often interspersed with other elements and repeats. By combining PCR primers from different classes of repeats and families of LTRs, PCR fingerprints can be improved.

Figure 4

IRAP fingerprints for Triticeae species. A barley BARE1 LTR primer (5′-GCCTCTAGGGCATAATTCCAACAC-3′) was used in the reaction. (a) Hordeum vulgare cultivars and landraces: 1, Rolfi; 2, CI 9819; 3, Pallidum107; 4, Odesskij 31; 5, Odesskij 17; 6, Sonja; 7, Sultan; 8, Ingri; 9, Beka; 10, Djau Kabutak; 11, W1991; 12, 408; 13, 688; and 14, 1354. (b) Hordeum spontaneum lines: 15, T1 (Turkey); 16, T11 (Turkey); 17, J31 (Jordan); 18, IN68 (Iran); 19, IN80 (Iran); 20, IS112 (Israel); and 21, IS147 (Israel). (c) Other Hordeum species: 22, H. murinum ssp. glaucum; 23, H. brachyantherum ssp. californicum; 24, H. erectifolium; and 25, H. marinum ssp. gussoneanum. (d) Other Triticeae: 26, Aegilops peregrina; 27, Triticum dicoccoides; 28, Triticum aestivum (cv. Bogdarka); 29, Psathyrostachys fragilis ssp. fragilis; 30, Phleum pratense; 31, Avena sativa; and 32, Secale strictum. Marker sizes in bp are indicated on the left axis.

Deployment of a retrotransposon marker system into a species, in which the methods have not been previously used, requires PCR primers that recognize a retrotransposon and, in the case of RBIP, the flanking sequences. The retrotransposon targets that can be amplified by heterologous primers developed in a different species tend to be members of old families of elements present before the divergence of the plant clades in question. Jing et al. (2005) estimated the average age of segregating retrotransposon insertion sites in Pisum as being approximately 2 Myr. This result is roughly similar to estimates made by SanMiguel et al. (1998) and Vitte et al. (2004) in maize and rice, respectively, but may be biased toward younger elements because structural disruptions make old insertions harder to characterize. Nevertheless, these ages are comparable to divergence times between some closely related species (or even the genera Homo and Pan), suggesting that some retroelements may be useful in recently diverged clades.

A balance between monomorphism, allowing the alignment of fingerprint patterns between samples, and polymorphism, yielding information on relationships between samples, needs to be achieved. Phylogenetic analysis can only be made on individual fingerprinting bands of the same size that correspond to the same sequence and loci. Within a species, bands of the same length and amplification intensity usually contain the same sequence. Between species, it is best to confirm the identity for at least some of the bands to be scored. Similarly, for RBIP and TAM, the possibility that insertion flanks may not be single copy within the genome, or may not correspond to the same linkage group in interspecific comparisons, needs to be considered.
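Aligning bands of the same size across samples amounts to building a presence or absence matrix. The following sketch makes several assumptions that the text does not prescribe: band sizes are grouped by rounding to a fixed bin width, all sample names and sizes are hypothetical, and the Jaccard coefficient is used as one common similarity measure for dominant markers.

```python
def band_matrix(samples, bin_width=5):
    """Bin band sizes and return (sorted bins, {sample: 0/1 vector})."""
    binned = {
        name: {round(size / bin_width) * bin_width for size in sizes}
        for name, sizes in samples.items()
    }
    all_bins = sorted(set().union(*binned.values()))
    matrix = {
        name: [1 if b in bins else 0 for b in all_bins]
        for name, bins in binned.items()
    }
    return all_bins, matrix


def jaccard(a, b):
    """Shared bands divided by bands present in either sample."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 0.0


# Hypothetical band sizes (bp) read from one fingerprint gel.
SAMPLES = {"A": [502, 748, 1201], "B": [499, 1202], "C": [751, 1500]}
BINS, MATRIX = band_matrix(SAMPLES)
print(BINS)
print(MATRIX)
print(jaccard(MATRIX["A"], MATRIX["B"]))
```

Size binning alone cannot show that co-migrating bands share the same sequence, which is why confirming band identity is recommended for comparisons between species.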

The IRAP, REMAP and SSAP methods are all dominant, as are randomly amplified polymorphic DNAs, restriction-fragment length polymorphism and AFLP, although some implementations permit the identification of heterozygotes by measuring amplicon amounts (Knox et al., 2009). For dominant retrotransposon markers, the absence of an amplicon may be the consequence of mutation at the locus carrying the insertion. The mutation could affect the binding site for the retrotransposon primer or, in SSAP, lead to the gain or loss of a restriction site. In practice, this problem does not arise in the application of retrotransposon markers to segregating populations generated by deliberate crosses, because the alleles are determined by the parents. Similarly, alleles that share a recent common ancestor will be unlikely to carry second site mutations. Although it may be possible to map two polymorphisms to the same site, and show by sequencing that they correspond to two allelic states differing by the insertion of a retrotransposon, this is very tedious in practice. It may also be possible to detect the presence of one or two doses of a retrotransposon marker band (Knox et al., 2009), thereby scoring both heterozygotes and homozygotes, although band intensity may also vary because of the reaction conditions.

In general, if one wishes to study closely related varieties or breeding lines, one should develop a retrotransposon marker system based upon the most polymorphic TE available. This process begins with amplification and sequencing of variable regions close to the outer termini of the TE, development of primers specific for the retrotransposon families found and testing these for their efficacy as markers (Pearce et al., 1999; Jing et al., 2005; Kalendar et al., 2010). It may be necessary to clone and sequence hundreds of clones to obtain a few good primers. In practice, up to one person-year of time is needed to develop and apply a fully functioning novel retrotransposon-based marker system in a new species. However, this is a one-time investment that can be applied thereafter to the corresponding species and its close relatives.

The most polymorphic retrotransposons are likely to include those that are currently active. To identify these, one could amplify and sequence unconserved regions between conserved domains (for example, within the integrase or reverse transcriptase domains) in RNA, and then use a primer from the unconserved region and an adapter primer in a genome-walking approach to isolate the corresponding LTR (Kalendar et al., 2010).

The most general approach for acquiring new TE sequence information is to use shotgun sequencing of the whole genome, for example, with a 454 Genome Sequencer FLX Instrument (454 Life Sciences, Branford, CT, USA), followed by clustering of the repeats (Macas et al., 2007). Putative LTRs may then be identified by the presence of PBS and PPT motifs, with primers designed to match the most prevalent groups of similar putative LTRs. Given the long contiguous sequences spanning retrotransposons, active ones will likely contain open reading frames (indicating translational capacity), identical LTRs (indicating recent integration) and identical target site duplications (indicating recent integration). Alignments of elements meeting these criteria in comparison to those that do not may allow the design of primers specific to active retrotransposons for use in marker methods.
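The three criteria for recently active elements can be expressed as a simple filter. This is a sketch with hypothetical names and toy sequences; it assumes that the two LTRs have already been aligned to equal length and that open reading frame integrity has been assessed separately.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    ltr5: str          # 5' LTR sequence (assumed pre-aligned to ltr3)
    ltr3: str          # 3' LTR sequence
    tsd5: str          # target site duplication on the 5' side
    tsd3: str          # target site duplication on the 3' side
    intact_orfs: bool  # True if the coding domains are uninterrupted


def ltr_identity(a, b):
    """Fraction of identical positions between two equal-length LTRs."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def looks_recently_active(element, min_ltr_identity=1.0):
    """Apply the three criteria: ORFs, identical LTRs, identical TSDs."""
    return (
        element.intact_orfs
        and element.tsd5 == element.tsd3
        and ltr_identity(element.ltr5, element.ltr3) >= min_ltr_identity
    )


# Hypothetical elements: one young and intact, one old and degraded.
YOUNG = Element("young", "TGACCGTTAGGCATCA", "TGACCGTTAGGCATCA",
                "GATTC", "GATTC", True)
OLD = Element("old", "TGACCGTTAGGCATCA", "TGATCGTTAGACATCA",
              "GATTC", "GACTC", False)
print([e.name for e in (YOUNG, OLD) if looks_recently_active(e)])
```

Lowering `min_ltr_identity` slightly below 1.0 would admit elements with a few sequencing errors or very recent mutations, at the cost of including older insertions.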

In principle, a marker from any of the multilocus, anonymous systems (SSAP, IRAP and REMAP) can be converted into a corresponding RBIP marker and vice versa. Markers from the former methods are easy to harvest and they can be quickly examined for their informativeness before taking on the investment of developing a corresponding RBIP marker. SSAP, IRAP and REMAP bands are derived from one side of a retrotransposon insertion and sequencing of them enables the design of a flanking genome PCR primer, provided that the sequence is not repetitive and therefore unusable. However, the genomic sequence flanking the other side of the element needs to be found to score the empty site. This can be obtained by screening germplasm accessions that are polymorphic for the original band, and then carrying out a SSAP reaction on these, in which the LTR primer is replaced with a primer designed to the known flank that is facing toward the insertion site (Jing et al., 2005).

As stated above, the only truly co-dominant retrotransposon marker system is RBIP. But even here, the independent detection of the occupied and empty sites is really the detection of two cosegregating dominant alleles linked in repulsion. RBIP requires appreciable investment to develop for a new species, and is difficult where most retrotransposon insertions are nested, because of the issue with flanking repetitious DNA mentioned above. In the one situation in which it has been fully tested, pea, 80 functioning RBIP markers were developed and applied to ca. 3000 lines in 6 years.
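The way two dominant detections combine into a co-dominant call can be sketched as follows. This is a minimal illustration; the function name and the genotype labels are assumptions made here, and each argument simply records whether the corresponding PCR product (occupied site or empty site) was detected at one locus in one individual.

```python
def rbip_genotype(occupied_product: bool, empty_product: bool) -> str:
    """Combine two dominant presence/absence scores into one genotype call."""
    if occupied_product and empty_product:
        return "heterozygous"          # both alleles detected
    if occupied_product:
        return "homozygous occupied"   # insertion on both homologs
    if empty_product:
        return "homozygous empty"      # no insertion on either homolog
    return "no call"                   # neither product: failed reaction or null allele


if __name__ == "__main__":
    for occupied, empty in [(True, True), (True, False), (False, True), (False, False)]:
        print(occupied, empty, "->", rbip_genotype(occupied, empty))
```

The "no call" outcome is what separates this scheme from a single dominant marker, where the absence of a band cannot be told apart from a failed reaction.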

One way to overcome the dominant nature of the other marker systems is to use genetically homozygous material. For mapping populations, this can be achieved using doubled-haploid or recombinant inbred lines. In principle, the same result could be achieved by mapping in haploid tissues, for example, in the megagametophyte (the haploid storage tissue, often loosely called endosperm) of gymnosperm seeds. Populations consisting of doubled-haploid lines have been used in a wide variety of mapping efforts, with a diversity of marker systems, since the early publications on their advantages (Choo, 1981; Dunn et al., 1991). The efficacy of doubled-haploid populations for the mapping of retrotransposon markers, and for the mapping of genes with retrotransposon markers, has been well established (Waugh et al., 1997; Manninen et al., 2000). The availability of methods for gametophytic embryogenesis therefore goes hand in hand with effective deployment of retrotransposon markers in mapping efforts.

In species in which wide crosses are used to introduce novel traits, dominant markers developed from the wild species are useful in subsequent backcrosses needed to generate material for further breeding (Messmer et al., 1999). Wide crosses are similar, regarding the introgressed chromosome segment, to the genomic constitution of allopolyploid species, such as groundnut, in which the two constituent genomes have different retroelement populations. In this case, dominant markers corresponding to each genome behave in a co-dominant manner (Bertioli et al., 2009).

Application of retrotransposons to the analysis of plant populations

Quantitative genome diversification

Retrotransposons are known to be activated by biotic stresses such as pathogen attack (Grandbastien et al., 1997) and abiotic stress such as drought (Kalendar et al., 2000) as well as by tissue culture (Hirochika et al., 1996) and genome methylation status (Kaeppler et al., 2000; Liu and Wendel, 2000; Kubis et al., 2003). Retrotransposon transcriptional activation will lead to an increase in copy number and genome size if the newly transposed copies survive selection. These new copies will contribute to insertion site polymorphisms that can be detected using the methods described above. However, because of practical limitations on amplifying and showing every genomic copy, generally only a subset of the retrotransposon insertions can be surveyed by any given marker method. Quantitative methods offer an overall view of the effect of retrotransposon insertion on the genome by estimating total copy number. The accumulation of non-excising retrotransposon insertions seems to be a process that would lead to an interminable increase in retrotransposon number. However, insertions are lost from genomes by recombination and from populations by genetic drift or active selection. If the rates of gain and loss of TE insertions are balanced, then copy number will be stable (The International Brachypodium Initiative, 2010).
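The balance between gain and loss can be made concrete with a deliberately simple model. It is a sketch under assumptions that are not drawn from the cited work: new insertions arise at a constant rate $g$ per generation, each existing copy is lost with a constant per-copy rate $\ell$, and $N$ denotes copy number.

```latex
\frac{dN}{dt} = g - \ell N,
\qquad
N^{*} = \frac{g}{\ell},
\qquad
N(t) = N^{*} + \left(N_{0} - N^{*}\right) e^{-\ell t}
```

Under these assumptions, copy number relaxes toward the equilibrium $N^{*}$ from any starting value $N_{0}$. A burst of activation that raises $g$ moves the equilibrium upward and does not lead to unlimited growth. If the gain rate instead scaled with $N$, as it would when every copy is a potential source of new insertions, the dynamics would differ and a stable copy number would not be guaranteed.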

Measurement of retrotransposon copy number at first was based on Southern hybridization and its variants, dot and slot blotting (Vershinin et al., 1990; Pearce et al., 1996; Vicient et al., 1999; Kalendar et al., 2000, 2004). However, hybridization-based quantification is sensitive to small nuances in the protocol and to the affinity between a probe and its target sequence. Several factors need to be carefully controlled, including DNA loading, DNA hybridization efficiency, probe concentration and labeling and the linearity of signal detection, to avoid under- or overestimation in the number of copies.

An alternative to blotting is real-time PCR, an approach that allows quantification of PCR products during the exponential phase of amplification. In plants, real-time PCR was first used successfully for this purpose by Soleimani et al. (2006), who quantified BARE1 in five cultivars of barley (Hordeum vulgare) and two accessions of its wild relative, Hordeum spontaneum. Examining the LTR and reverse transcriptase domains revealed significant differences in BARE1 copy number among cultivars, providing further evidence that BARE1 is active and has a major role in shaping the barley genome as a result of breeding and selection. More than 90% of the elements were truncated (solo LTRs), suggesting high levels of recombination. Intracultivar variation in BARE1 content was not statistically significant in this study. A real-time PCR approach was adopted and optimized by Pagnotta et al. (2009) to estimate and compare the copy number of WIS2-1A and BARE1 retrotransposons in Triticum and Aegilops species. Great variation in copy number was detected both within and among species, and a nonlinear relationship was found between retrotransposon copy number and ploidy level. Both of these studies provide significant insights into the microevolution of the repetitive part of the genome.
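The arithmetic behind such estimates can be sketched as follows. This is not the procedure of either cited study. It assumes relative quantification against a reference gene of known copy number, with amplification efficiencies expressed as fold increase per cycle (2.0 is perfect doubling), and it assumes that each full-length element contributes two LTRs and one reverse transcriptase domain whereas a solo LTR contributes one LTR only. All function and parameter names are hypothetical.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         eff_target: float = 2.0, eff_reference: float = 2.0,
                         reference_copies: float = 1.0) -> float:
    """Target copies per genome, relative to a reference gene.

    A lower threshold cycle (Ct) means more starting template, so the
    ratio of starting amounts is eff_reference**ct_reference divided by
    eff_target**ct_target.
    """
    ratio = (eff_reference ** ct_reference) / (eff_target ** ct_target)
    return reference_copies * ratio


def solo_ltr_fraction(ltr_copies: float, rt_copies: float) -> float:
    """Fraction of elements that are solo LTRs, given LTR and RT copy numbers."""
    full_length = rt_copies                  # one RT domain per full-length element
    solo = ltr_copies - 2 * full_length      # LTRs not accounted for by full-length elements
    if solo < 0 or solo + full_length == 0:
        raise ValueError("copy numbers are inconsistent with the two-LTR assumption")
    return solo / (solo + full_length)


if __name__ == "__main__":
    # A target crossing the threshold 10 cycles before a single-copy gene
    print(relative_copy_number(ct_target=15.0, ct_reference=25.0))
    # Hypothetical counts: 1200 LTR copies and 100 RT copies
    print(round(solo_ltr_fraction(1200, 100), 3))
```

With perfect efficiencies, a 10-cycle difference corresponds to a 1024-fold difference in starting template. Real assays usually estimate the efficiencies from standard curves instead of assuming doubling.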

Analysis of plant population structure

It has been well established that various TE families have been active in different temporal periods through evolution. This means that TE marker systems based on different TEs can show different levels of resolution and can be chosen to fit with the required analysis (Pearce et al., 2000; Leigh et al., 2003; Schulman and Kalendar, 2005; Teo et al., 2005; Antonius-Klemola et al., 2006; Vukich et al., 2009). Recently active retrotransposons with a short half-life for the persistence of individual genomic insertions are best suited for comparisons of breeding lines, whereas older families with slower turnover are better for analyses at the species and genus levels. We have described earlier, in the section ‘Development of retrotransposon marker projects’, how to go about identifying currently or recently active retrotransposon families.

On the genus level, examination of the SSAP insertion patterns of eight retrotransposon families in ten Vitis accessions shows that these families are present across the Vitis genus and that only a few insertion sites, which seem to have been maintained during speciation, are fixed in all accessions (Moisy et al., 2008). Most of the scored bands are polymorphic, indicating that these families have been active after speciation across the genus. IRAP and REMAP have been used to track genome evolution in cordgrass (Spartina spp.) after hybridization (Baumel et al., 2002), to analyze the genome constitution in various banana (Musa) polyploids (Teo et al., 2005), and to examine genomic stability in Helianthus (Vukich et al., 2009).

All of the available retrotransposon marker systems have been used to examine genome evolution within species. Various combinations of SSAP, IRAP and REMAP have been used to measure diversity, similarity and cladistic relationships within Pisum (Pearce et al., 2000; Smýkal, 2006), Hordeum (Kalendar et al., 2000; Vicient et al., 2001), Citrus (Bretó et al., 2001), Malus (Antonius-Klemola et al., 2006), Oryza (Branco et al., 2007), Cucumis (Lou and Chen, 2007) and Aegilops (Nagy et al., 2006; Saeidi et al., 2008). The marker systems have also been applied to plant pathogens, such as the rice blast pathogen (Magnaporthe grisea spp.; Chadha and Gopalakrishna, 2005). The locus-specific RBIP method has been used to analyze evolutionary history in Pisum (Flavell et al., 1998; Vershinin et al., 2003; Jing et al., 2005, 2010) and Oryza (Vitte et al., 2004).
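Similarity between accessions is typically computed from a binary matrix of band presence and absence. The sketch below uses the Jaccard coefficient, which ignores shared absences and is therefore a common choice for dominant markers. The coefficient is chosen here for illustration and is not attributed to any of the studies cited above, and the accession names and band scores in the usage example are invented.

```python
from itertools import combinations


def jaccard_similarity(a: list[int], b: list[int]) -> float:
    """Shared bands divided by bands present in at least one profile."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two profiles without any bands are treated as identical (a convention chosen here)
    return shared / either if either else 1.0


def pairwise_distances(profiles: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """Jaccard distance (1 - similarity) for every pair of accessions."""
    return {
        (p, q): 1.0 - jaccard_similarity(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical band scores: 1 = band present, 0 = band absent
    scores = {
        "accession_A": [1, 1, 0, 1, 0],
        "accession_B": [1, 0, 0, 1, 0],
        "accession_C": [0, 1, 1, 0, 1],
    }
    for pair, distance in pairwise_distances(scores).items():
        print(pair, round(distance, 3))
```

The resulting distances can be passed to any standard clustering or tree-building method. Because absence of a band in a dominant system may have several causes, shared absences are deliberately left out of the coefficient.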

Retrotransposon insertional polymorphism is sufficiently great to support not only analyses on the whole-genome level within species, but also gene mapping projects within the generally narrower germplasm of cultivated varieties (Queen et al., 2004). The anonymous methods (SSAP, IRAP and REMAP) have been used for variety fingerprinting and to develop genetic maps in barley (Waugh et al., 1997), pea (Ellis et al., 1998), bread wheat and its wild relatives (Gribbon et al., 1999), oat (Yu and Wise, 2000), Medicago sativa L. (Porceddu et al., 2002), tomato (Tam et al., 2005), apple (Venturi et al., 2006), globe artichoke (Lanteri et al., 2006) and lettuce (Syed et al., 2006). These methods have also been applied in gene mapping projects in barley (Manninen et al., 2000), in oat (Tanhuanpää et al., 2006, 2007) and in the D-genome donor of bread wheat (Aegilops tauschii Coss.; Boyko et al., 2002).

Conclusions

Markers based on LTR retrotransposons, in one or another of the manifestations described above, often generically referred to as ‘transposon display,’ have come of age since their introduction over 13 years ago. At least 99 studies using these marker systems had been published by the end of 2009, covering the gamut from cereals and grasses to cashew and coconut, tomato and pepper, multiple legume species, fungi, birds and insects. The applications range from investigations of retrotransposon activation and mobility to studies of biodiversity and genome evolution, to the mapping of genes and the estimation of genetic distance, to assessment of essential derivation of varieties and detection of somaclonal variation and to food traceability and purity. Because LTR retrotransposons (or their relatives, the endogenous retroviruses) are ubiquitous, these methods are generic.

Similar approaches have been applied to the non-LTR retrotransposons in plants and animals, in particular to the SINE elements (Cheng et al., 2002, 2003; Prieto et al., 2005). The insertion pattern of Alu, a SINE and the most prevalent transposable element in the human genome, has been especially useful. It has not only served as a tool in many studies of human population structure and origins (Batzer et al., 1994; Watkins et al., 2003), but has also been linked to various heritable diseases (Deininger and Batzer, 1999; Jurka, 2004). In other mammals, SINEs have served to determine the relationship of whales to even-toed ungulates (Shimamura et al., 1997) and, in plants, to clarify the relationships between wild rice species (Cheng et al., 2003).

Recently, commercial platforms for SNP detection (for example, Illumina, San Diego, CA, USA) have been developed and have garnered much popularity for major crops, domestic animals and humans. Development of SNPs depends on having abundant sequence data. The costs of acquiring these data, as well as of applying commercial assays, represent a barrier for research on underfunded tropical crops and wild species. Furthermore, evolutionary studies with SNPs are affected by the problems of homoplasy in SNP state, the lack of neutrality of genic markers and the uneven chromosomal distribution of the highly expressed genes that are used to generate SNPs. Although genetic analysis by shotgun sequencing remains a tantalizing possibility, the cost is still prohibitive. For these reasons, cheap, generic, easily applied retrotransposon marker systems will remain a viable choice for genetic markers for the foreseeable future.