Variable genome evolution in fungi after transposon-mediated amplification of a housekeeping gene

Background: Transposable elements (TEs) can be key drivers of evolution, but the mechanisms and scope of how they impact gene and genome function are largely unknown. Previous analyses revealed that TE-mediated gene amplifications can have variable effects on fungal genomes, from inactivation of function to production of multiple active copies. For example, a DNA methyltransferase gene in the wheat pathogen Zymoseptoria tritici (synonym Mycosphaerella graminicola) was amplified to tens of copies, all of which were inactivated by Repeat-Induced Point mutation (RIP) including the original, resulting in loss of cytosine methylation. In another wheat pathogen, Pyrenophora tritici-repentis, a histone H3 gene was amplified to tens of copies with little evidence of RIP, leading to many potentially active copies. To further test the effects of transposon-aided gene amplifications on genome evolution and architecture, the repetitive fraction of the significantly expanded genome of the banana pathogen, Pseudocercospora fijiensis, was analyzed in greater detail.

Results: These analyses identified a housekeeping gene, histone H3, which was captured and amplified to hundreds of copies by a hAT DNA transposon, all of which were inactivated by RIP, except for the original. In P. fijiensis the original H3 gene probably was not protected from RIP, but most likely was maintained intact due to strong purifying selection. Comparative analyses revealed that a similar event occurred in five additional genomes representing the fungal genera Cercospora, Pseudocercospora and Sphaerulina.

Conclusions: These results indicate that the interplay of TEs and RIP can result in different and unpredictable fates of amplified genes, with variable effects on gene and genome evolution.

Electronic supplementary material: The online version of this article (10.1186/s13100-019-0177-0) contains supplementary material, which is available to authorized users.


Background
Transposable elements (TEs), or mobile genetic elements, are nucleic acid entities that can move within a genome. TEs have been detected in the genomes of both prokaryotic and eukaryotic organisms [1], and have been rightly labeled as 'drivers of genome evolution' [2] due to their direct and indirect impacts on genes and genomes. Several lines of evidence point to their pivotal role in important processes across the tree of life. For example, Alu elements, which number more than a million copies and comprise 11% of the human genome, are a major contributor to primate genome evolution and to the standing genetic diversity in human populations [3]. Major effects of Alu element amplifications include altered gene expression from insertions near gene promoters, insertional mutagenesis and repeat-mediated nonhomologous recombination that can lead to disease, and 'exonization' of Alu elements yielding alternative splicing of transcripts. All of these likely have played an important role in the evolution of humans and other primates [3].
TEs are major contributors to genetic diversity in populations and have been linked to major phenotypic changes in plant morphology. Natural and artificial selection can act on this variation to favor new morphotypes. For example, the transition in plant architecture from highly branched teosinte, the ancestor of modern corn, to apically dominant corn is controlled in part by a quantitative trait locus (QTL), tb1 [4], whose expression is modulated by Hopscotch, a retrotransposon inserted 65 kb upstream of the gene that acts as an enhancer [5]. This change in morphology predates the start of agriculture (10,000 years ago) and provided early agriculturalists with existing variation that could be selected from within populations. Similarly, a change in tomato fruit shape from round to elongate was initiated by a retrotransposon-mediated gene duplication of the SUN locus. This rearrangement introduced new upstream cis elements, which increased the expression of SUN, thereby causing the change in morphology [6].
In fungal and oomycete plant pathogens, besides modulating genome size (e.g., Blumeria graminis [7] and Pseudocercospora fijiensis [8]) and genome architecture (e.g., Leptosphaeria maculans [9]), TEs also can be associated with plant-pathogen interactions through modulation of effector genes and secondary metabolite gene clusters. TE-rich genomic islands in the expanded genomes of fungi (P. fijiensis, L. maculans) and oomycetes (Phytophthora infestans) carry genes that code for lineage-specific, putative small, secreted proteins. In the barley powdery mildew pathogen, B. graminis f. sp. hordei, the amplification and diversification of the avirulence gene AVRk1 has been attributed to a Long Interspersed Nuclear Element (LINE) TE [10]. Recently, it was shown that the sequence for this gene family of avirulence effectors was derived from the LINE TE [11]. Other fungal genome components, such as telomeric regions in the rice pathogen Magnaporthe oryzae (e.g., the region carrying AVR-Pita [12]) and lineage-specific chromosomes in Fusarium oxysporum [13], also are enriched in pathogenicity factors and TEs. TEs also have been implicated in the gain and/or loss of host-specific effector genes in M. oryzae [14].
Universal mechanisms exist that can minimize the deleterious impacts of TEs on host genomes. Post-transcriptional silencing and DNA methylation are two primary methods that limit the activity of TEs in genomes. The genetic network employed to silence TEs also can be context dependent. In germline cells, piRNAs, a specific class of small, non-coding RNAs, are responsible for the epigenetic and post-transcriptional silencing of TEs [15]. Other genome-defense mechanisms are unique to specific organisms; e.g., Repeat-Induced Point mutation (RIP) [16] has been described only in fungi. In Neurospora crassa [17][18][19] and many other fungi, the RIP machinery can recognize repetitive sequences that are approximately 400 bp or longer with 80% or greater identity, and introduce transition (cytosine to thymine) mutations into them during each meiotic cycle. These mutations can generate premature stop codons within TE-encoded genes that prevent translation of the proteins required for movement, thus rendering the transposon immobile. The abundance of transition mutations also skews the GC content and dinucleotide composition of the sequence, which makes it possible to identify signatures of RIP [20] and to predict the original sequence prior to RIP (deRIP) in silico [21].
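As an illustration of how such signatures can be detected, the sketch below computes two dinucleotide ratios widely used as RIP indices, the product index TpA/ApT and the substrate index (CpA + TpG)/(ApC + GpT), plus their difference (sometimes called a composite index), and derives a simplified deRIP consensus from aligned repeat copies. This is a minimal Python sketch under stated assumptions, not the tools cited in [20] and [21]: the function names are hypothetical, and the deRIP rule used here (restore C where C and T co-occur in an alignment column, and G where G and A co-occur) ignores the dinucleotide context that RIP actually targets.

```python
import math
from collections import Counter

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def rip_indices(seq):
    """Return (product, substrate, composite) RIP indices for one sequence.

    product   = TpA / ApT                  (tends to rise as RIP converts CpA to TpA)
    substrate = (CpA + TpG) / (ApC + GpT)  (tends to fall as RIP depletes CpA/TpG)
    composite = product - substrate
    Ratios with a zero denominator are returned as NaN.
    """
    c = dinucleotide_counts(seq)
    product = c["TA"] / c["AT"] if c["AT"] else math.nan
    denominator = c["AC"] + c["GT"]
    substrate = (c["CA"] + c["TG"]) / denominator if denominator else math.nan
    return product, substrate, product - substrate

def derip_consensus(aligned_copies):
    """Simplified deRIP consensus from equal-length, aligned repeat copies.

    Per column: restore C if both C and T occur, else restore G if both G and A
    occur; otherwise take the most common base. Gap-only columns yield '-'.
    """
    consensus = []
    for column in zip(*aligned_copies):
        bases = [b for b in column if b != "-"]
        present = set(bases)
        if {"C", "T"} <= present:
            consensus.append("C")
        elif {"G", "A"} <= present:
            consensus.append("G")
        elif bases:
            consensus.append(Counter(bases).most_common(1)[0][0])
        else:
            consensus.append("-")
    return "".join(consensus)
```

For example, `derip_consensus(["ATCAGC", "ATTAGC", "ATTAAC"])` restores the C at the third column and the G at the fifth, returning `"ATCAGC"`. In practice, indices of this kind are computed over many repeat copies or sliding windows rather than a single short sequence.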
A side effect of RIP is that the required machinery does not discriminate between functional genes and TEs; any sequence in a genome that is repetitive can be targeted, which occasionally causes unexpected effects. For example, a single-copy DNA methyltransferase (DNMT) gene in the wheat pathogen Zymoseptoria tritici (previously known as Mycosphaerella graminicola) was amplified to 23 copies and became a target for RIP. All of the DNMT sequences in the genome, including the original copy, were inactivated by RIP-introduced transition mutations [22]. A genome-wide assay for cytosine methylation revealed that it was lacking in the Z. tritici genome [22], but present in close relatives that possessed an intact copy of the DNMT gene. Those species are thought to have diverged from Z. tritici within the past 10,000 years [23], hence this change appears to be very recent. In another wheat pathogen, Pyrenophora tritici-repentis (Ptr), the histone H3 gene was captured as part of a hAT DNA transposon and amplified to 26 copies in the genome [24]. Acquisition of a partial or complete copy of a gene between the termini of a DNA transposon and its subsequent amplification in a genome is termed 'transduplication'. However, in contrast to Z. tritici, 23 of the transduplicated histone H3 copies in the Ptr genome appeared to code for a functional protein [24], potentially yielding multiple active copies of the gene. These two fungi are in different taxonomic orders of the class Dothideomycetes, and demonstrate that the fates of repeated sequences can vary, with different and unpredictable effects on gene and genome evolution.
The two examples described above define the extremes of the possible outcomes of TE-mediated gene amplification events in fungi with RIP: either all the copies, including the original, are inactivated, leading to loss of gene function, or almost none of the copies are affected by RIP, leading to multiple functional genes. To test whether similar gene amplifications are common in the Dothideomycetes, a genome-wide search was conducted in multiple sequenced species to quantify the prevalence of such events and to investigate whether TE-associated gene amplifications commonly have large effects on gene and genome evolution. These analyses identified an amplification event in a fungal clade (Cercospora/Pseudocercospora/Sphaerulina) that falls between the two extremes described above. In this newly described amplification event, the original gene was maintained, presumably due to selection, whereas all the amplified copies were targeted and inactivated by RIP, thus yielding three very different outcomes from three similar gene amplifications in fungi.

Results
Repeats carrying histone H3-like sequence occur exclusively in AT-rich blocks
Analysis of the repetitive fraction of the expanded P. fijiensis genome [8] revealed an abundance of histone H3-like sequences. A similarity search using the original histone H3 protein sequence revealed a total of 784 H3-like copies that were found exclusively in repetitive regions across 28 scaffolds, with the original histone H3 gene located on scaffold 6 (Fig. 1). All H3-like sequences were identified in one repeat family with a total of 1579 members, many of which were incomplete but overlapping and could be merged into 920 contigs. Of these, 471 contained H3-like sequences and together accounted for 4.1% (3 Mb) of the P. fijiensis genome. Each repetitive element contained one or two copies of the H3-like sequence, giving 784 copies in total. All of these repeat elements were compartmentalized in the AT-rich blocks that were identified previously in the P. fijiensis genome [8].
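Merging overlapping, incomplete repeat-family members into contigs, as was done for the 1579 members above, is an interval-merging problem. The following minimal Python sketch assumes that hits are given as hypothetical (scaffold, start, end) tuples in 1-based inclusive coordinates; the exact merging criteria used for this genome are not described here, so treat this as an illustration rather than the authors' procedure.

```python
def merge_hits(hits):
    """Merge overlapping (scaffold, start, end) repeat hits into contigs.

    Coordinates are assumed to be 1-based and inclusive; hits that merely
    abut (end + 1 == next start) are kept separate.
    """
    merged = []
    for scaffold, start, end in sorted(hits):
        last = merged[-1] if merged else None
        if last and last[0] == scaffold and start <= last[2]:
            # Overlap on the same scaffold: extend the current contig.
            merged[-1] = (scaffold, last[1], max(last[2], end))
        else:
            merged.append((scaffold, start, end))
    return merged

def contigs_with_feature(contigs, features):
    """Return the contigs that overlap at least one (scaffold, start, end) feature."""
    return [
        c for c in contigs
        if any(f[0] == c[0] and f[1] <= c[2] and f[2] >= c[1] for f in features)
    ]
```

With `members = [("scaffold_6", 100, 500), ("scaffold_6", 450, 900), ("scaffold_6", 1000, 1200), ("scaffold_7", 1, 50)]`, `merge_hits(members)` yields three contigs, and `contigs_with_feature` can then count how many contigs carry an H3-like hit.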
Most (90%) of the histone H3-like sequences were truncated, i.e., had lengths that ranged between 71 and 100 amino acids (AAs), which is 52-73% of the original histone H3 protein length of 136 AAs (Fig. 2). When two histone H3-like copies were present within the same element they had lengths that differed by an insertion of 16 AAs in the second copy, which was not present in the original copy on scaffold 6 and presumably arose from a mutation that occurred prior to amplification.
A complete hAT DNA transposon carries the histone H3-like sequence
Annotation of the repetitive sequences flanking the histone H3-like copies identified a hAT transposase domain, the hallmark of hAT DNA transposable elements [25]. The hAT domain was present in 277 (59%) of the aforementioned 471 repetitive elements that contained the histone H3-like sequence. Based on the element length distribution, a group of 133 repetitive elements with lengths ranging from 9.5 to 9.9 kb was defined as the full-length set (Fig. 3). Relative to the full-length elements, 39% of the repeat elements were considered truncated, i.e., they contained less than 50% of the full-length element (Fig. 3). Both the merged repeat dataset (920 sequences) and the full-length subset (133 repeats) were used to assess the dinucleotide bias introduced by RIP. A clear CpA to TpA dinucleotide bias was observed, suggesting the presence of RIP in the P. fijiensis genome (Figs. 4 and 5). The full-length repeat set was then used to search for structural features of DNA transposons, such as terminal inverted repeats (TIRs) and target site duplications (TSDs). In addition to element length and presence of the hAT domain, the occurrence of intact TIRs and identical TSDs was used to define a subset of 99 complete repetitive elements (average percent identity 96.5%) with all of these characteristics. The nucleotide composition of the 20-bp TIR sequence was well conserved across the 99-repeat set (Fig. 6). Two sites in the TIR displayed the characteristic transition mutations and CpA to TpA dinucleotide bias introduced by RIP. The TSD was 8 bp in length, a characteristic feature of hAT DNA transposons [25]. No insertion-site bias was identified from the nucleotide composition of the TSDs (Fig. 6) of the 99 complete repetitive elements.
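The TIR and TSD criteria above can be expressed as a simple check on an element and its flanks. The following is a minimal Python sketch, not the authors' pipeline: it assumes the element boundaries are already known (0-based, half-open coordinates within a hypothetical `region` string that includes the flanks), uses the 20-bp TIR and 8-bp TSD lengths reported here as defaults, and scores TIR conservation as identity between the left TIR and the reverse complement of the right TIR. The per-position frequency helper computes the kind of composition summary shown in Fig. 6.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def check_termini(region, start, end, tir_len=20, tsd_len=8):
    """Score TIRs and extract TSDs for an element located at region[start:end].

    Returns (tir_identity, left_tsd, right_tsd, tsds_identical).
    """
    if start < tsd_len or end + tsd_len > len(region):
        raise ValueError("region must include tsd_len bases of flank on both sides")
    element = region[start:end]
    # A perfect TIR pair means the right end is the reverse complement of the left end.
    tir_identity = identity(element[:tir_len], revcomp(element[-tir_len:]))
    left_tsd = region[start - tsd_len:start]
    right_tsd = region[end:end + tsd_len]
    return tir_identity, left_tsd, right_tsd, left_tsd == right_tsd

def position_frequencies(seqs):
    """Per-position A/C/G/T frequencies for equal-length sequences (e.g., TSDs)."""
    return [{b: col.count(b) / len(col) for b in "ACGT"} for col in zip(*seqs)]
```

An element would pass this toy check when `tir_identity` is high and `tsds_identical` is true; in the RIP-affected elements described above, a few TIR positions would be expected to show C-to-T or G-to-A differences.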

Transduplication of histone H3 coding sequence into a hAT DNA transposon
Co-occurrence of the histone H3-like sequence with the hAT DNA transposon was investigated to test whether the complete genomic histone gene or the transcribed coding sequence was acquired. The original genomic copy of the histone H3 gene in P. fijiensis has three exons (two introns) (Fig. 7a), with the coding sequence being 411 nucleotides (136 AAs) in length (Fig. 7b). Most (94%) of the H3-like copies in the genome had similarity only to the third exon of the histone H3 gene, and few H3-like sequences spanned one (n = 23, 4%) or two (n = 15, 2%) exon-exon junctions (Fig. 7c), while a search using the histone HMM profile generated only partial matches to the third exon in the full-length repetitive element dataset (Fig. 7d). However, in the consensus sequence generated by in silico reversal of the RIP-introduced transition mutations (deRIP), a single exon-exon junction could be recovered (Fig. 7e). Moreover, a near full-length histone H3 protein sequence containing both exon-exon junctions without the introns could be resolved when the deRIP consensus sequence was edited manually to remove all stop codons (Fig. 7f).

Fig. 2 Length distribution of histone H3 gene copies in the Pseudocercospora fijiensis genome. Length distribution of 784 histone H3-like copies in the P. fijiensis genome showed that most were truncated. Only 11 (1.4%) of the sequences were near full length (i.e., > 90% of the full-length sequence) as compared to the original histone H3 protein of 136 amino acids.

Fig. 3 Histogram of the lengths of repetitive elements in the Pseudocercospora fijiensis genome that carry histone H3-like sequences. A total of 471 repetitive elements carry the H3-like sequences. Full-length repetitive elements (133) are ~9.5 kb in length (black bar marked by an asterisk) and were used to identify the terminal inverted repeat (TIR) sequence.
The presence of histone exon-exon junctions and the absence of any intron sequence in the duplicated copies suggest that a histone H3 transcript or retrocopy, rather than a genomic copy, was captured by the hAT DNA transposon.
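The classification underlying these counts (hits confined to a single exon versus hits spanning one or two exon-exon junctions) can be sketched as follows. This is a hypothetical Python illustration: the junction positions in the example are placeholders, not the actual P. fijiensis exon boundaries, and hits are assumed to be already mapped to 0-based, half-open coordinates on the spliced coding sequence.

```python
from collections import Counter

def junctions_spanned(hit_start, hit_end, junctions):
    """Number of exon-exon junctions covered by one hit.

    junctions holds the CDS position of the first base of each downstream exon.
    A junction at position j counts as spanned when the hit covers bases on
    both sides of it, i.e., hit_start < j < hit_end.
    """
    return sum(1 for j in junctions if hit_start < j < hit_end)

def junction_profile(hits, junctions):
    """Fraction of hits spanning 0, 1, 2, ... junctions."""
    counts = Counter(junctions_spanned(s, e, junctions) for s, e in hits)
    return {k: counts[k] / len(hits) for k in sorted(counts)}
```

With hypothetical junctions at CDS positions 60 and 150, a hit covering positions 200 to 400 spans none, one covering 100 to 300 spans one, and one covering 10 to 400 spans both. Only hits that span a junction without an intervening intron sequence are informative about whether a transcript, rather than the genomic gene, was captured.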
The functional histone H3 gene may carry signatures of RIP
A consensus derived from ten H3-like sequences, and its deRIP version, were used to test whether RIP affected the original histone H3 coding sequence. These ten histone H3-like sequences spanned both exon-exon junctions and covered at least 90% of the query sequence. The original H3 coding sequence had a GC content of 60%, whereas it ranged from 38 to 40% for the H3-like copies. The H3-like sequences were more similar to each other (89-98% identity) than they were to the original histone H3 coding sequence (51-55% identity). A higher proportion of transition (Ti) mutations was seen in the H3-like sequences across all classes of sites that were explored: the number of Ti mutations at variable as well as zero-, two- and four-fold degenerate sites was higher in the H3-like sequences than in the original histone H3 gene (Fig. 8; Table 1). The direction of the Ti mutations (C > T, G > A) from the genomic histone H3 to the repetitive H3-like sequences, and vice versa, was also evaluated. This analysis showed that the original histone H3 sequence also accumulated substitutions, although the H3-like sequences had at least twice as many Ti mutations across all classes of sites evaluated in both the deRIP and RIP consensus sequences (Table 1). Because any substitution at a zero-fold degenerate site is non-synonymous, the 13 (5.6%) zero-fold sites that may have changed in the original histone H3 sequence of P. fijiensis (Table 1) would represent amino acid changes. However, the overall analysis showed that the histone H3 gene is under strong purifying selection, with a dN/dS ratio of 0.01429.
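The site classes and substitution counts used here, and the transition/transversion (Ti/Tv) ratio discussed next, can be illustrated with a short sketch. This is a hypothetical Python example, not the authors' analysis: it assumes two gap-free, uppercase, in-frame aligned coding sequences, uses the standard genetic code, classifies each site by the degeneracy of the reference codon (treating three-fold sites, such as the third position of isoleucine codons, as two-fold), and reports raw counts without the multiple-hit corrections that a formal dN/dS or Ti/Tv estimate would apply.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with the first base varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def fold(codon, pos):
    """Degeneracy class (0, 2 or 4) of position pos in codon.

    Counts synonymous alternatives at that position; one or two synonymous
    alternatives are both reported as two-fold (a simplification).
    """
    aa = CODON_TABLE[codon]
    synonymous = sum(
        CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
        for b in "ACGT" if b != codon[pos]
    )
    return {0: 0, 3: 4}.get(synonymous, 2)

def is_transition(a, b):
    """True for A<->G or C<->T changes."""
    return a != b and (a in PURINES) == (b in PURINES)

def classify_changes(ref, query):
    """Count Ti, Tv and RIP-like (C>T, G>A) changes per degeneracy class of ref."""
    ti, tv, rip_like = Counter(), Counter(), Counter()
    for i in range(0, len(ref) - 2, 3):
        codon = ref[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons containing ambiguous bases
        for pos in range(3):
            a, b = ref[i + pos], query[i + pos]
            if a == b or b not in "ACGT":
                continue
            site_class = fold(codon, pos)
            if is_transition(a, b):
                ti[site_class] += 1
                if (a, b) in {("C", "T"), ("G", "A")}:
                    rip_like[site_class] += 1
            else:
                tv[site_class] += 1
    return ti, tv, rip_like

def ti_tv_ratio(ref, query):
    """Raw Ti/Tv count ratio (no multiple-hit correction)."""
    ti, tv, _ = classify_changes(ref, query)
    n_tv = sum(tv.values())
    return sum(ti.values()) / n_tv if n_tv else math.inf
```

Comparing `"ATGCCAGGT"` with `"ATGTCAGGA"`, for example, yields one RIP-like C>T transition at a zero-fold site and one transversion at a four-fold site, giving a raw Ti/Tv of 1.0. Swapping the arguments tests the opposite direction of change, as in the bidirectional comparison described above.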
Another method to estimate the effect of RIP is to calculate the transition/transversion (Ti/Tv) ratio. Comparison of the original histone H3 coding sequences from the six fungal species with the transduplication event and from the closest outgroup species, Dothistroma septosporum (Dse), which lacks the histone H3 amplification (Fig. 9), showed that for P. fijiensis the Ti/Tv ratio was greater than 2 (Additional file 1: Table S1). This observation also suggests that the original histone H3 sequence of P. fijiensis was affected by RIP. One effect of RIP damage is a decrease in GC content, as was observed in the original histone H3 coding sequences of the six Dothideomycetes genomes, where it ranged from 58.8 to 60.8% (Additional file 1: Table S1), as opposed to the outgroup Dse-H3 sequence, which had a higher GC content (62.0%).

Fig. 7 … The histone H3 coding sequence showing the exon-exon boundaries (vertical black lines on the salmon-pink bar). The numbers at the exon-exon junctions correspond to the amino acids at the beginning of exon 2 and exon 3 and the last amino acid of exon 3. Presence of exon-exon boundaries was used as an indication that the transcribed coding sequence was captured by the hAT DNA transposable elements. c Initially, tBLASTn identified the histone H3 copies in the hAT DNA transposon. d The initial search was refined by using histone HMM profiles with HMMER. The original sequences had very poor H3 domain hits due to numerous changes caused by repeat-induced point mutation (RIP). e The identification of H3-like sequences in deRIPped elements was substantially better. f However, the best result, i.e., all exon-exon boundaries evident, was obtained when the deRIPped sequences were curated manually by removing the remaining stop codons that resulted from RIP. These protein alignments have been drawn to scale. The two horizontal bars in each panel represent the results from the two histone H3-like domains in the hAT element.

Fig. 8 Multiple sequence alignment shows the absence of introns in the histone H3-like sequences in the hAT elements. Ten copies of deRIPped histone H3-like sequences and their consensus were aligned to the original histone H3 gene and CDS sequence. Nucleotide sequences were aligned based on the protein alignment. The two columns in the alignment with the gap symbol (−) represent the exon-exon junctions, and the corresponding intron sequence is shown in lower case in the red box.

Occurrence of histone H3 capture across the Dothideomycetes phylogeny
In addition to P. fijiensis, the histone H3 transduplication was identified in the genomes of five other species, viz., Cercospora zeae-maydis, Pseudocercospora eumusae, P. musae, Sphaerulina musiva and S. populicola, among 12 Dothideomycetes genomes available in the family Mycosphaerellaceae (see Methods). Based on the phylogeny, it appears that a single histone H3 amplification event occurred prior to the split of the Pseudocercospora/Cercospora/Septoria clade from the other members of this family (Fig. 9), which was estimated previously to have taken place approximately 100 Mya [26]. In addition to the 784 H3-like copies in P. fijiensis, a total of 242, 135, 520, 186 and 160 copies of H3-like sequences were identified in the repetitive fractions of the C. zeae-maydis, P. eumusae, P. musae, S. musiva and S. populicola genomes, respectively. As with P. fijiensis, a clear CpA to TpA dinucleotide bias was observed in the repetitive elements carrying the histone H3-like sequences in these other fungal genomes. In each genome, all copies contained premature stop codons due to RIP that would inactivate their function, except for a single, presumed original copy with an intact reading frame. Due to the fragmented nature of the repeats that carry the histone H3-like sequences, the hAT domain could be identified only in C. zeae-maydis, P. eumusae and S. musiva, whereas TIR and TSD sequences could be identified only in the C. zeae-maydis genome (Fig. 10).
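Distinguishing the presumed original from RIP-inactivated copies rests on whether the reading frame is intact. The sketch below is a hypothetical Python illustration that assumes each copy has already been extracted as a coding sequence starting at its first codon; it reports in-frame stop codons that occur before the final codon. RIP-type C-to-T changes at CAA, CAG and CGA codons, and G-to-A changes at TGG codons, are examples of substitutions that produce such premature stops.

```python
from itertools import product

# Standard genetic code, codons enumerated with the first base varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate frame 1 with the standard code; unrecognized codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))

def premature_stops(cds):
    """0-based codon indices of stop codons that occur before the final codon."""
    protein = translate(cds)
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]

def has_intact_orf(cds):
    """True if no premature in-frame stop codon is present."""
    return not premature_stops(cds)
```

For instance, a single RIP-type change converting the CAG codon in `"ATGCAGTGA"` to TAG gives `premature_stops("ATGTAGTGA") == [1]`, whereas the unmutated sequence has an intact reading frame.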
A phylogeny-based approach was used to understand the relationships among the hAT DNA transposons, as well as among the captured histone H3-like sequences, in the seven Dothideomycetes genomes. The amplified histone H3-like sequences derived from RIP-damaged sequences grouped within species boundaries and clustered away from the original histone H3 protein sequences (Fig. 11).
The putatively functional copies of histone H3-like sequences from P. tritici-repentis grouped closest to the original histone protein sequences from the 18 species (Fig. 11). In contrast, the consensus nucleotide sequences for the hAT elements from the six genomes (Additional file 2: All_hAT_consensus.txt) did not show any similarity to the P. tritici-repentis hAT element sequence. Although a stretch of up to ten genes adjacent to the original histone H3 gene was syntenic between the P. fijiensis genome and those of the other five species, no synteny was found in the genomic regions around any of the histone H3-like copies. As expected, a stretch of nine or ten genes (including the original histone H3) was syntenic and collinear between P. fijiensis and the two closely related banana pathogens P. eumusae and P. musae, respectively, whereas the Cercospora and Sphaerulina genomes had eight genes in mesosynteny [27] with the P. fijiensis genome (Fig. 12).
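The distinction drawn here between collinear and mesosyntenic regions can be made concrete with a toy classifier. This hypothetical Python sketch assumes orthologs have already been assigned and that each query gene is given as a (scaffold, gene index) pair; it only tests whether the shared genes lie on one scaffold in the same or fully reversed order. Mesosynteny as defined in [27] (conserved gene content with shuffled order and orientation) is broader than the single fallback label used here.

```python
def classify_region(ref_genes, query_locations):
    """Compare a reference gene neighbourhood with a query genome.

    ref_genes: gene IDs in their order along the reference scaffold.
    query_locations: maps a reference gene ID to the (scaffold, index) of its
    ortholog in the query; genes without an ortholog are simply absent.
    Returns (number of shared genes, label).
    """
    hits = [query_locations[g] for g in ref_genes if g in query_locations]
    if len(hits) < 2:
        return len(hits), "no synteny"
    scaffolds = {scaffold for scaffold, _ in hits}
    order = [index for _, index in hits]
    # An inverted block (fully reversed order) still counts as collinear.
    same_order = order == sorted(order) or order == sorted(order, reverse=True)
    if len(scaffolds) == 1 and same_order:
        return len(hits), "collinear"
    return len(hits), "shared content, order not conserved"
```

Applied to a hypothetical ten-gene window around the original histone H3 gene, a collinear result would correspond to the pattern reported for P. eumusae and P. musae, and the fallback label to the dispersed arrangement reported for the Cercospora and Sphaerulina genomes.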

Discussion
Previous analyses have shown that amplification of genes or gene fragments can have large effects on genome architecture and evolution. However, to our knowledge this is the first analysis in which a housekeeping gene has been amplified to a high copy number as part of a transposable element, yet all of the copies were inactivated except for the original. This amplification contributed to the expansion of the genome through the accumulation of numerous RIP-affected copies of inactivated gene fragments. Capture and amplification of a transcript of the housekeeping gene histone H3 as part of a hAT DNA (class II) transposon was identified in six of 12 genomes tested in the family Mycosphaerellaceae, order Capnodiales, of the fungal class Dothideomycetes. In each species, all copies were inactivated by RIP except for the presumed original, leaving one active gene plus hundreds of RIP-inactivated copies scattered throughout the genome. Acquisition of a partial or complete copy of a gene between the termini of a DNA transposon and its subsequent amplification in a genome is termed 'transduplication'. A hAT TE-mediated transduplication of the histone H3 coding sequence was first documented in the wheat pathogen Pyrenophora tritici-repentis (Ptr) [24], another Dothideomycete in the order Pleosporales. The occurrence of multiple, putatively functional H3-like copies in the Ptr genome appears to be the result of a recent and independent event, as indicated by the paucity of mutations in the repetitive sequences and the presence of 12 identical copies in its genome [24]. RIP adds another layer of complexity to the possible outcomes of transposon-mediated gene captures and amplifications in fungi.
Fig. 11 Relationships among the original histone H3 proteins and RIP-affected H3-like proteins from the seven Dothideomycetes genomes. The H3-like proteins form clades delineated by species and cluster away from the original histone H3 sequences. Bootstrap values greater than 50 are shown. The three-letter prefixes Czm (shaded blue), Pfi (orange), Meu (yellow green), Pmu (red), Ptr (yellow), Smu (pink) and Spo (bright green) stand for Cercospora zeae-maydis, Pseudocercospora fijiensis, P. eumusae, P. musae, Pyrenophora tritici-repentis, Sphaerulina musiva and S. populicola, respectively, and individual sequences are indicated with numbers. * Original histone H3 copies from 18 genomes.

With RIP, the rate at which the function of amplified genes is lost depends on several factors, including the number of codons amenable to RIP mutations, the frequency of sexual reproduction (because RIP occurs only during the sexual cycle) and the efficacy of the RIP machinery [19], which can vary by species. Length and sequence identity are two additional factors that affect RIP efficiency. The length of the P. fijiensis histone H3 coding sequence (411 bp) was just above the minimum cutoff (~400 bp) required for recognition by the RIP machinery. Moreover, as part of the longer hAT element, the H3-like retrocopies were more prone to RIP damage, whereas the original histone H3 gene could have avoided RIP damage because its introns interrupt the sequence it shares with the intronless retrocopies, so it did not have a contiguous match of 400 bp with them.
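The length-and-identity argument above can be checked mechanically on a pairwise alignment. The following hypothetical Python sketch applies the approximate thresholds quoted earlier (about 400 bp and 80% identity) to gap-free segments of an alignment between a genomic gene and a retrocopy. Splitting at every gap column is a simplifying assumption: it treats intron positions, which appear as gap runs in the retrocopy row, as breaks in the match, and it is not a model of how the RIP machinery actually detects homology.

```python
def gap_free_segments(aln_a, aln_b):
    """Yield (start, end) of maximal alignment blocks with no gap in either row."""
    start = None
    for i, (a, b) in enumerate(zip(aln_a, aln_b)):
        if a == "-" or b == "-":
            if start is not None:
                yield start, i
                start = None
        elif start is None:
            start = i
    if start is not None:
        yield start, min(len(aln_a), len(aln_b))

def rip_eligible_segments(aln_a, aln_b, min_len=400, min_identity=0.8):
    """Gap-free segments meeting the assumed RIP length and identity thresholds."""
    eligible = []
    for start, end in gap_free_segments(aln_a, aln_b):
        length = end - start
        same = sum(x == y for x, y in zip(aln_a[start:end], aln_b[start:end]))
        if length >= min_len and same / length >= min_identity:
            eligible.append((start, end, same / length))
    return eligible
```

In a toy alignment where a 60-bp intron splits 450 bp of otherwise identical sequence into 300-bp and 150-bp blocks, no segment qualifies, whereas the same 450 bp aligned without the intron forms one qualifying segment.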
In the absence of RIP, the redundant gene copies produced by each amplification are free to evolve under different, and possibly relaxed, constraints. Even though transduplications lack promoter sequences, the moved gene fragments potentially can be expressed, and acquire new functions, if inserted near regulatory elements [28]. In plants, both transcription and translation of gene fragments transduced by Pack-MULEs, another type of class II transposon, have been demonstrated [29]. Besides the expression of processed pseudogenes, the lack of RIP coupled with the occurrence of multiple identical transcripts also could lead to post-transcriptional regulation of the original gene [30].
With the high frequency and efficiency of RIP in P. fijiensis, it seems highly unlikely that any of the duplicated histone H3 copies will contribute to future gene function in this fungus. Instead, all of the copies appear to have become rapidly pseudogenized. Similarly, RIP-induced mutations in an avirulence gene from Leptosphaeria maculans have been linked to the breakdown of major-gene-mediated resistance in Brassica napus [31]. This contrasts with organisms that lack RIP, where duplicated sequences often contribute to the standing genetic variation. In addition to the Alu repeats in humans [3], many other instances of TE or gene amplifications are known in different animals and plants [32,33]. For example, novel transcripts and proteins generated by Pack-MULEs in rice undergo purifying selection and are maintained in its genome [29].

Fig. 12 Organization of the histone H3 gene region in six Dothideomycetes genomes. A 10-kb sequence up- and downstream of the histone H3 gene in the Pfi genome was used for pairwise comparisons with five other genomes: Pmu, Peu, Czm, Smu and Spo. The 20-kb region from Pfi is syntenic with Pmu and Peu, whereas genes from this region are dispersed across the scaffolds in the Czm, Smu and Spo genomes (mesosynteny as defined by [27]). The Pfi sequence was used as a reference and is shown on the X axis. Protein similarity between the two genomes is shown by the colored lines in the plots.
Three hypotheses could explain the occurrence of a single, intact copy of the histone H3 gene in a sea of inactivated, partial, RIP-affected copies. The first is that the original gene was protected from RIP, possibly because the duplicated fragment was too small to trigger recognition of the original gene. The second is that purifying selection could have maintained the protein sequence of the original histone H3. RIP introduces changes into all members of a repeat family, i.e., while the duplicated TEs accumulate RIP-introduced mutations, similar changes also can occur in the original histone H3 gene. Following its amplification by hAT transposition, histone H3, a housekeeping gene under strong purifying selection, would suddenly have become subject to numerous transition mutations. Repetitive sequences that share more than 80% similarity continue to be targeted by the RIP machinery during every meiotic cycle [34]. Several meiotic cycles would be required before all repetitive H3-like copies became sufficiently diverged to completely disengage the original histone H3 gene from RIP damage. During this time, the original histone H3 gene most likely was affected by RIP. One measure of RIP-induced change in the original P. fijiensis H3 gene, a Ti/Tv ratio > 2 (Additional file 1: Table S1), suggests that it was subject to RIP along with the copies. However, the strong purifying selection exerted on this gene appears to have eliminated any change that could affect its function.
The third hypothesis is that the original H3 sequence was protected from RIP due to its location in the genome. The Nucleolar Organizing Region (NOR), which consists of tandem arrays of ribosomal DNA (rDNA) repeats, is the only such region that is postulated to escape the effects of RIP in fungi. In the N. crassa genome, ~175 copies of rDNA repeats located in the NOR do not show signatures of RIP, whereas rDNA repeats outside the NOR are susceptible to RIP [19]. However, in P. fijiensis, the original, functional histone H3 copy is located on scaffold 6, whereas the rDNA repeats and the putative NOR are on scaffold 7. Therefore, a protected location is unlikely to explain the survival of the original histone H3 gene.
A typical hAT DNA transposon is ~5 kb long, but lengths may vary from 110 bp (DEBOAT in Oryza sativa) to 7144 bp (Gulliver in Chlamydomonas reinhardtii) [35]. The average length of the intact hAT elements carrying the histone H3-like sequences in the P. fijiensis genome was ~9.5 kb, although no other domains or structural features (such as additional direct or inverted repeats) could be identified. The transposase ORF in hAT elements typically contains two domains, one involved in dimerization and another of unknown function (DUF), DUF659. Both of these domains were present in an unRIPed hAT transposase ORF identified in P. fijiensis, but only DUF659 could be identified in the hAT family carrying the histone H3-like sequences. Characterization of target site duplications (TSDs) has shown that several DNA transposons, including Sleeping Beauty (a Tc1/mariner element), piggyBac and the hAT elements Buster and Space Invaders, show an insertion bias [36]. However, not all hAT DNA transposons have a target site preference; for example, the families Rover and Roamer, which were identified in yeasts, lack a TSD preference [37].
Transposable elements can be horizontally transferred between species [38]. However, the lack of sequence similarity among the hAT elements from the seven species does not support horizontal acquisition. Nevertheless, the age of the transduplication event (Fig. 9) and the greatly accelerated accumulation of mutations due to RIP may be obscuring possible relationships among the hAT elements found in the different species. Similarly, evidence suggests that, following the initial histone H3 amplification event, the histone H3-like sequences have continued to evolve independently within each lineage (Fig. 11).
Transduplication of the histone H3 gene in the Mycosphaerellaceae appears to be a relatively old event that most likely occurred in a common ancestor prior to the split of the Pseudocercospora and Cercospora/Sphaerulina lineages about 100 Mya [26]. Subsequent to this divergence, the genomes of the Pseudocercospora clade may have experienced one or more repeat-mediated expansions, resulting in a near doubling of their genome sizes compared with the average for other Ascomycetes. The increased repetitive content in P. fijiensis and P. musae is mirrored by the highest copy numbers of H3-like sequences in these two genomes. The cause of the relaxation of genome defense mechanisms that allowed TE expansion in this clade is not known. Long periods of asexual reproduction could allow transposons to escape RIP, but this seems unlikely in P. fijiensis, where the sexual stage is an integral part of the life cycle. Differences in copy number also may be affected slightly by the sequencing platform and the downstream assembly algorithms, which can leave many copies of repetitive elements poorly assembled. However, this seems unlikely, as the highest copy number of H3-like sequences (784) is found in the best-assembled genome, P. fijiensis (56 scaffolds, N50: 6 Mb), which was sequenced with relatively long-read Sanger technology; i.e., high copy number is not a proxy for a poorly assembled genome. If there is a bias, it would be toward under-reporting of histone H3-like copies in genomes assembled from short sequencing reads.
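N50, the assembly statistic quoted above, is the length L such that scaffolds (or contigs) of length L or longer together contain at least half of the assembled bases; higher values indicate a more contiguous assembly. A minimal Python sketch of that definition follows; the function name is illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    Lengths are summed from longest to shortest; the length at which the
    running total first reaches half of the assembly size is the N50.
    An empty input returns 0.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For the lengths 2 through 10 (total 54), the three longest (10, 9 and 8) sum to 27, exactly half, so `n50(range(2, 11))` returns 8.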
Within the class Dothideomycetes there now exist three examples of independent TE-mediated amplifications that resulted in different outcomes for gene function and genome evolution. In the wheat pathogen Z. tritici in the order Capnodiales, a single-copy DNA methyltransferase gene was amplified to more than 20 copies, most likely through capture of a transcript followed by exchange among telomeres [22]. All copies were inactivated by RIP, including the original, leading to a loss of cytosine methylation in many Z. tritici populations. This was postulated to be a recent event, as cytosine methylation was detected in very close relatives from wild grass hosts that are thought to have diverged within the past 10,000 years.
The second event was transduplication of the histone H3 gene in Ptr, in the order Pleosporales. This event also appeared to be very recent and involved capture of a transcript, but the amplification to tens of copies occurred through duplication and movement of a hAT transposon [24]. Here, RIP appears to be very inefficient or lacking, leaving multiple, potentially functional copies of the histone H3 gene.
The third event, initially identified in P. fijiensis, also involved capture of a histone H3 transcript by a hAT transposon, but unlike the other two examples appears to be very ancient, having originated in a common ancestor before the divergence of the Cercospora/Pseudocercospora/Sphaerulina clade from other species in the order Capnodiales. In this case, all of the copies have been heavily mutated and inactivated by RIP, except for the original, leading to a single functional copy of the histone H3 gene, and leaving repetitive regions that are graveyards of pseudogenized histone H3-like sequences. The original copy also most likely was affected by RIP, but not to the point of altering the reading frame in a way that would prevent function. This most likely reflects the action of purifying selection to maintain the essential function of the histone H3 protein.
The three cases of gene amplification in the Dothideomycetes had different effects on genome evolution. In Pyrenophora tritici-repentis, amplification of the histone H3 gene combined with low efficiency or fewer cycles of RIP led to many transcriptionally active copies [24]. In Z. tritici, where the original copy of a DNA methyltransferase gene as well as the amplified copies were mutated by RIP, the protein is not required and the fungus clearly can survive without cytosine methylation; presumably other types of methylation can compensate for the loss of this function [22]. In P. fijiensis, retention of the original histone H3 gene could have occurred for two reasons: 1) the gene is essential and its function cannot be lost, so no sexual (i.e., post-RIP) progeny in which the original histone H3 gene was mutated to the point of inactivation would survive; and 2) the region of the original gene that overlapped with the repeats was too small to be an efficient target of RIP, so it accumulated fewer changes. Thus, the co-existence of gene amplification events and targeting of the RIP machinery to repetitive elements could lead to very different and unpredictable outcomes that impact both the function and evolution of fungal genomes.

Conclusions
Previous analyses of repetitive sequences in fungal genomes identified two cases where genes were amplified to many copies but had different outcomes. In the first, all copies of a DNA methyltransferase gene in the wheat pathogen Zymoseptoria tritici (synonym Mycosphaerella graminicola), including the original, were inactivated by repeat-induced point mutation (RIP), a genome-defense mechanism specific to fungi, leading to a loss of cytosine methylation in that species. The second case involved a different wheat pathogen, Pyrenophora tritici-repentis, in which RIP appears to be less effective; there, capture and amplification of a histone H3 gene in a hAT DNA transposon led to multiple putatively active copies. Here a third case is identified, in which parts of a histone H3 gene were amplified to hundreds of copies as part of a hAT transposon, but all of the copies except the original were highly mutated and inactivated by RIP, leading to a greatly expanded genome but no additional functional copies of the gene. In contrast to the first two examples, this third case appears to be relatively ancient, and function of the original gene most likely was retained by strong purifying selection in spite of likely damage from RIP. These results demonstrate the variable effects that gene amplifications can have on the structure and evolution of genomes. The final outcome depends on the interplay of multiple factors that cannot be predicted without a much better understanding of genome biology.

Identification of the histone H3 amplification event
During the characterization of the P. fijiensis repetitive fraction [8], a histone H3-like sequence was identified in six families of repeats. The consensus repeats from these families have overlapping ends, i.e., they represent one repeat family that could be merged into a single contig. The original histone H3 protein sequence was then used to search the P. fijiensis genome using tBLASTn [39] to determine copy number. A similar search was used to identify H3-like sequences in the repetitive elements, and copy number per element was determined.
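One way to turn tBLASTn hits into a copy-number estimate is to merge overlapping hits on each scaffold and count the resulting loci. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns); the e-value cutoff and the `min_gap` merging distance are hypothetical parameters, not values reported for this analysis, and strand is ignored for simplicity.

```python
from collections import defaultdict


def count_loci(blast_lines, max_evalue=1e-5, min_gap=0):
    """Count distinct genomic loci hit by a protein query in tBLASTn
    tabular output (qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore)."""
    intervals = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        if float(fields[10]) > max_evalue:
            continue  # discard weak hits
        # Minus-strand hits have sstart > send; normalise the interval
        intervals[sseqid].append((min(sstart, send), max(sstart, send)))

    loci = 0
    for spans in intervals.values():
        spans.sort()
        current_end = None
        for lo, hi in spans:
            # Start a new locus unless this hit overlaps (or lies within
            # min_gap of) the previous merged interval
            if current_end is None or lo > current_end + min_gap:
                loci += 1
                current_end = hi
            else:
                current_end = max(current_end, hi)
    return loci


example = [
    "H3\tscaf1\t98.5\t136\t2\t0\t1\t136\t1001\t1408\t1e-80\t270",
    "H3\tscaf1\t70.0\t60\t18\t0\t1\t60\t5000\t5179\t1e-20\t90",
    "H3\tscaf1\t65.0\t50\t17\t0\t70\t120\t5200\t5350\t1e-15\t80",
    "H3\tscaf2\t60.0\t40\t16\t0\t1\t40\t900\t781\t1e-08\t50",
    "H3\tscaf2\t40.0\t20\t12\t0\t1\t20\t3000\t3059\t0.5\t20",
]
print(count_loci(example))              # 4
print(count_loci(example, min_gap=50))  # 3
```

A non-zero `min_gap` joins HSPs that are split by RIP-induced stop codons within one copy, at the risk of merging tandem copies.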

Annotation of repeat families carrying the H3-like sequences
All of the repeat elements in the six families were annotated using TransposonPSI [40]. Repeat elements were aligned using ClustalW [41], and the alignment was curated manually before RIP analysis [20]. Overlapping repeat-element sequences were merged irrespective of family, as the family delineations made by repeat-finding programs are arbitrary. Four criteria were used to identify full-length copies in the merged repeat-element dataset: element length, presence of a hAT domain, presence of Terminal Inverted Repeats (TIRs) and presence of Target Site Duplications (TSDs). RIP analysis was also repeated on the subset of full-length repeat elements, each of which contained two copies of a histone H3-like gene.
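The four full-length criteria can be expressed as a simple filter. This is a hedged sketch: the TIR length, allowed mismatches and TSD length are hypothetical defaults (hAT elements commonly generate 8-bp TSDs), `min_len` must be supplied by the user, and hAT-domain presence is passed in as a precomputed flag (e.g., from the TransposonPSI annotation) rather than detected here.

```python
# Complement table for DNA (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_tir(element, tir_len=12, max_mismatches=2):
    """True if the first tir_len bases match the reverse complement of the
    last tir_len bases with at most max_mismatches differences."""
    if len(element) < 2 * tir_len:
        return False
    left = element[:tir_len].upper()
    right = revcomp(element[-tir_len:]).upper()
    mismatches = sum(a != b for a, b in zip(left, right))
    return mismatches <= max_mismatches


def has_tsd(left_flank, right_flank, tsd_len=8):
    """True if the tsd_len bases immediately 5' of the element equal the
    tsd_len bases immediately 3' of it (a direct duplication)."""
    if len(left_flank) < tsd_len or len(right_flank) < tsd_len:
        return False
    return left_flank[-tsd_len:].upper() == right_flank[:tsd_len].upper()


def is_full_length(element, left_flank, right_flank, has_hat_domain, min_len):
    """Apply the four criteria: length, hAT domain, TIRs and TSDs."""
    return (
        len(element) >= min_len
        and has_hat_domain
        and has_tir(element)
        and has_tsd(left_flank, right_flank)
    )


# Toy element: a 12-bp TIR, filler, then the inverted copy of the TIR
tir = "CAGAGATGCCAA"
element = tir + "A" * 100 + revcomp(tir)
print(is_full_length(element, "GGGGACGTACGT", "ACGTACGTTTTT", True, min_len=100))  # True
```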

Histone H3-like sequence analysis
The full-length dataset was used to determine whether the genomic or the coding histone H3 sequence was captured. Both of the H3-like copies present in each element were examined. Initially, the high-scoring segment pairs (HSPs) from the default tBLASTn output between the RIPped repeat elements and the original histone H3 protein were analyzed for the presence of exon-exon junctions, since an HSP spanning a junction without the intervening intron would indicate capture of a spliced transcript. The multiple sequence alignment was deRIPped using the deRIP module in RIPCAL, which scans the alignment for polymorphic sites containing transition mutations (C/T or G/A) and reverses the effect of RIP. Manual curation was necessary to revert stop codons in coding sequences, because some sites were RIPped in every copy and therefore showed no polymorphism for deRIP to detect. To determine the directionality of the RIP mutations, near-full-length H3-like sequences present in the repeat elements were aligned to the complete, original histone H3 coding sequence using RevTrans v.1.4 [42], and this alignment was visualized using MEGA v.6.06 [43]. Additionally, a histone profile HMM was used with HMMER [44] to check the RIPped, in silico deRIPped, and manually curated deRIPped repeat sequences.
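The column-wise logic of deRIPping can be illustrated with a simplified rule: where an alignment column contains both C and T, restore C; where it contains both G and A, restore G; otherwise take the majority base. This is an illustrative sketch, not RIPCAL's implementation. The `in_frame_stops` helper shows the limitation noted above: a site mutated in every copy leaves no polymorphism, so its stop codon survives deRIP and needs manual curation.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def derip_alignment(seqs):
    """Build a deRIPped consensus from equal-length aligned DNA sequences.

    Simplified rule per column (gaps ignored):
      - C and T both present -> C (reverse a C-to-T transition)
      - G and A both present -> G (reverse a G-to-A transition)
      - otherwise            -> most common base
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for i in range(length):
        bases = [s[i].upper() for s in seqs if s[i] != "-"]
        present = set(bases)
        if not bases:
            consensus.append("-")
        elif {"C", "T"} <= present:
            consensus.append("C")
        elif {"G", "A"} <= present:
            consensus.append("G")
        else:
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def in_frame_stops(cds):
    """0-based start positions of in-frame stop codons in a coding sequence."""
    cds = cds.upper()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in STOP_CODONS]


aligned = ["ATGCAG", "ATGTAG", "ATACAG"]
print(in_frame_stops(aligned[1]))  # [3]  (CAG -> TAG after a C-to-T change)
print(derip_alignment(aligned))    # ATGCAG
```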
The protein datasets from 18 genomes (i.e., excluding Zar, Zce and Zps) were used for an all-versus-all BLAST. The BLAST output was analyzed using OrthoMCL [45] to identify 2220 one-to-one orthologous clusters (OrthoMCL inflation value I = 1.5). These orthologous cluster sequences were then aligned using ClustalX [41]. For each alignment, conserved blocks were identified using Gblocks [46] at default settings. ProtTest [47] was subsequently used to identify the best model of protein evolution for each alignment using the Akaike Information Criterion (AIC). Finally, the protein alignments were concatenated and used to generate a maximum likelihood (ML) species phylogeny using RAxML [48]. A similar ML phylogeny of the original histone H3 protein sequences from these 18 genomes was generated using RAxML, and the dN/dS ratio was determined using PAML [49].
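Selecting one-to-one orthologs and building the concatenated (supermatrix) alignment can be sketched as below. It assumes OrthoMCL groups-file lines of the form `group: taxon|protein taxon|protein ...`; the taxon codes and protein names in the example are placeholders. Partition boundaries are returned so that the per-alignment models chosen by ProtTest could be assigned in a partitioned RAxML run, which is an assumption about the workflow rather than a documented step.

```python
from collections import Counter


def one_to_one_groups(lines, taxa):
    """Return {group_id: [proteins]} for groups with exactly one protein
    from every taxon in `taxa` and from no other taxon.

    Assumes OrthoMCL-style lines: 'group: taxon|protein taxon|protein ...'.
    """
    wanted = set(taxa)
    selected = {}
    for line in lines:
        if ":" not in line:
            continue
        group, members = line.split(":", 1)
        proteins = members.split()
        counts = Counter(p.split("|", 1)[0] for p in proteins)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[group.strip()] = proteins
    return selected


def concatenate(alignments, taxa):
    """Concatenate per-gene alignments (dicts taxon -> aligned sequence)
    into a supermatrix; also return 1-based (start, end) partitions."""
    parts = {t: [] for t in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        width = len(aln[taxa[0]])
        for t in taxa:
            if len(aln[t]) != width:
                raise ValueError("sequences in one alignment differ in length")
            parts[t].append(aln[t])
        partitions.append((start, start + width - 1))
        start += width
    return {t: "".join(p) for t, p in parts.items()}, partitions


groups = ["G1: pfi|p1 czm|c1", "G2: pfi|p2 pfi|p3 czm|c2", "G3: pfi|p4"]
print(list(one_to_one_groups(groups, ["pfi", "czm"])))  # ['G1']
matrix, parts = concatenate(
    [{"pfi": "MK-", "czm": "MKL"}, {"pfi": "AR", "czm": "AK"}], ["pfi", "czm"]
)
print(matrix, parts)  # {'pfi': 'MK-AR', 'czm': 'MKLAK'} [(1, 3), (4, 5)]
```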
The original histone H3 protein sequences from the 18 genomes mentioned above, along with H3-like sequences translated from RIP-affected hAT DNA transposons extracted from the six Dothideomycetes genomes with the transduplication event (Pfi, Czm, Peu, Pmu, Smu and Spo), were used to generate an ML phylogeny using RAxML [48] (100 bootstraps, with the PROTGAMMAAUTO option to estimate the model of protein evolution). H3-like sequences from Ptr also were included. Only H3-like sequences at least 120 amino acids long (~88% of the 137-aa Pfi histone H3 protein) were used. Protein sequences were aligned with ClustalX [41], and the alignment was manually curated to remove insertions present in the H3-like protein sequences. After an initial phylogeny, two divergent sequences with extremely long branches, one each from Czm and Spo, were discarded, leaving a total of 183 sequences.
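The two sequence-level filters described above (a minimum length of 120 aa and removal of insertions relative to the original histone H3) are sketched here. Counting residues without gaps or stop symbols (`*`) and treating every column gapped in the reference as an insertion are assumptions for illustration; the authors curated insertions manually.

```python
def residue_count(protein):
    """Number of amino acids, ignoring alignment gaps and stop symbols."""
    return sum(1 for aa in protein if aa not in "-*")


def filter_by_length(proteins, min_len=120):
    """Keep H3-like proteins with at least min_len residues."""
    return {name: seq for name, seq in proteins.items() if residue_count(seq) >= min_len}


def strip_insertions(alignment, reference):
    """Drop alignment columns where the reference sequence has a gap,
    i.e., residues inserted in other sequences relative to the reference."""
    ref = alignment[reference]
    keep = [i for i, aa in enumerate(ref) if aa != "-"]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


print(sorted(filter_by_length({"a": "M" * 120, "b": "M" * 119 + "*"})))  # ['a']
print(strip_insertions({"ref": "MA-RT", "x": "MAKRT"}, "ref"))  # {'ref': 'MART', 'x': 'MART'}
```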
Sequence extending 10 kb up- and downstream of the original P. fijiensis histone H3 gene was used to determine the extent of synteny with the five most closely related genomes (Czm, Peu, Pmu, Smu and Spo). A search for orthologous repetitive elements carrying the H3-like sequences also was conducted.
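One simple way to quantify synteny around the H3 locus is to collect the genes within 10 kb of the focal gene in each genome and ask what fraction of the P. fijiensis flanking genes have their orthologs in the corresponding window. The gene tuples, identifiers and the `shared_fraction` score are hypothetical illustrations; the source does not state how synteny was scored.

```python
def flanking_genes(genes, focal_id, flank=10_000):
    """IDs of genes on the focal gene's scaffold that overlap the window
    [start - flank, end + flank]; genes are (id, scaffold, start, end)."""
    _, scaffold, start, end = next(g for g in genes if g[0] == focal_id)
    lo, hi = start - flank, end + flank
    return {
        gid
        for gid, scaf, a, b in genes
        if gid != focal_id and scaf == scaffold and b >= lo and a <= hi
    }


def shared_fraction(window_a, window_b, orthologs):
    """Fraction of genes in window_a whose ortholog (orthologs: A -> B)
    lies in window_b; a hypothetical synteny score."""
    if not window_a:
        return 0.0
    shared = sum(1 for g in window_a if orthologs.get(g) in window_b)
    return shared / len(window_a)


genes_a = [("H3", "s1", 50000, 50500), ("g1", "s1", 45000, 46000),
           ("g2", "s1", 58000, 59000), ("g3", "s1", 70000, 71000),
           ("g4", "s2", 50000, 51000)]
genes_b = [("H3b", "c7", 1000, 1500), ("h1", "c7", 5000, 6000),
           ("h2", "c7", 30000, 31000)]
win_a = flanking_genes(genes_a, "H3")
win_b = flanking_genes(genes_b, "H3b")
print(sorted(win_a), sorted(win_b))  # ['g1', 'g2'] ['h1']
print(shared_fraction(win_a, win_b, {"g1": "h1", "g2": "h2"}))  # 0.5
```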