Transposon-Mediated Horizontal Transfer of the Host-Specific Virulence Protein ToxA between Three Fungal Wheat Pathogens

This work dissects the tripartite horizontal transfer of ToxA, a gene that has a direct negative impact on global wheat yields. Defining the extent of horizontally transferred DNA is important because it can provide clues to the mechanisms that facilitate HGT. Our analysis of ToxA and its surrounding 14 kb suggests that this gene was horizontally transferred in two independent events, with one event likely facilitated by a type II DNA transposon. These horizontal transfer events are now in various processes of decay in each species due to the repeated insertion of new transposons and subsequent rounds of targeted mutation by a fungal genome defense mechanism known as repeat induced point mutation. This work highlights the role that HGT plays in the evolution of host adaptation in eukaryotic pathogens. It also increases the growing body of evidence indicating that transposons facilitate adaptive HGT events between fungi present in similar environments and hosts.

H orizontal gene transfer (HGT) is a mechanism whereby DNA from unrelated organisms is transferred between the organisms in a non-Mendelian fashion (1). In proteobacteria, HGT is thought to have occurred in over 75% of all protein families, making HGT one of the most important tools facilitating adaptation to stressful environments (2,3). This propensity to share DNA between species has been attributed to many human health issues, such as the rapid rise and spread of antibiotic resistance in hospitals (4,5). In eukaryotes, HGT was once thought to be a rare event and therefore not an important contributor to environmental adaptation. However, numerous studies have now shown that HGT between eukaryotes plays a very important role in adaptation, especially in the case of microbes that colonize a common host (6)(7)(8).
Among eukaryotic microbes, fungi are often used for kingdom-wide studies of adaptation, due to their relatively small genome size, importance in human and plant disease, and applications in food and biotechnology (6,8,9). Domesticated fungi, particularly those used in food production, are now being used as model organisms to understand the genetic basis of adaptation (10)(11)(12). On an evolutionary time scale, these organisms have been subjected to a short but intense period of selection, which has had dramatic effects on their preferred carbon and nitrogen sources, secondary metabolite production, and many other physiological traits (12,13). One emerging theme from these studies is that organisms which are common contaminants of the food-making process are often donors of the genes that provide fitness advantages in these specialized environments. The reported HGT events are extensive and involve tens of thousands of bases of DNA, which remain over 90% identical between very distantly related species (10,11). These HGTs contain both coding and noncoding regions which are stably integrated into the core nuclear genomes of the recipient species (10,11). While the original reports suggested that these regions were important for adaptation to the domestic environment, the fitness advantage conferred by these genes had to be demonstrated in follow-up studies performed with knockout strains (13,14).
Rapid adaptation via HGT is not restricted to domesticated species, but there exist very few described instances where the horizontally transferred DNA is integrated into the core nuclear genome and remains highly identical outside coding regions. One standout example is the virulence gene ToxA and the surrounding 11 to 12 kb, which to date has been reported in three fungal wheat pathogens: Parastagonospora nodorum, Pyrenophora tritici-repentis, and Biopolaris sorokiniana (13)(14)(15)(16). While all three species belong to the same fungal order, the Pleosporales, they are distant relatives, with several million years separating their speciation (15,16). Similar to the domesticated fungi discussed above, this HGT event is hypothesized to be extremely recent, as the average pairwise nucleotide identity across this 12 kb region remains greater than 92% (17). The ToxA gene itself remains identical between P. tritici-repentis and B. sorokiniana and only three nucleotides different from between B. sorokiniana and P. nodorum (17). The fitness advantage that ToxA confers has been demonstrated experimentally, whereby the presence of ToxA in a fungal isolate leads to faster development of necrotic lesions on wheat leaves (17,18). This virulence function is genotype specific, as ToxA causes necrosis only on wheat lines that carry the susceptibility gene called Tsn1 (19)(20)(21). In the absence of Tsn1, all three fungal species can still infect wheat due to the presence of other virulence genes (17,19,20).
Though ToxA confers a strong fitness advantage, this HGT event is not a fixed insertion and persists in all three pathogen populations as a presence/absence polymorphism (17,(21)(22)(23). The size of this presence/absence polymorphism has yet to be fully characterized. The frequency of ToxA in different field populations around the world also varies dramatically, ranging from 6% to 97% in different pathogen field populations (22). The selective forces that increase the frequency of ToxA in some fungal populations and decrease it in others remain unknown. Results of studies that examined whether there was a positive correlation between the frequency of ToxA in fields planted with Tsn1 (susceptible) wheat cultivars were inconclusive (24). For ease of reading, here we use the notation ToxA ϩ for isolates that contain the gene and toxa Ϫ for isolates that do not carry the gene.
Despite detailed knowledge on the molecular function of ToxA and its prevalence in fungal pathogen populations throughout the world, we still do not know the origins of this important virulence gene or the mechanisms that facilitated its transfer and stable integration into the genomes of these three pathogen species. There is clear evidence that ToxA is embedded in an AT-rich, repeat-dense region of the genome in all three species. AT richness in these portions of the genome is driven by a fungus-specific genome defense process known as the repeat induced point mutation (RIP) (25). RIP targets repeated sequences as small as 155 bp, mutating C:G to T:A, which introduces early stop codons in repeated DNA sequences (25)(26)(27)(28). This mechanism is hypothesized to have evolved in some phyla of fungi to stop the spread of transposons or other self-copying elements within their genomes (26).
ToxA and its highly conserved flanking DNA provide a unique opportunity to dissect the integration of horizontally transferred DNA into the nuclear genomes of three fungal pathogens. To define the location and extent of each HGT event, we used long-read DNA sequencing to generate near-complete genome assemblies for several representatives from two of the three species, in addition to several other published assemblies (29,30). We performed extensive de novo annotation of the repeat families in all three fungal species and manually annotated the region surrounding the ToxA gene. These assemblies and repeat annotations resolve the genomic context in which the virulence gene is located and provide insights into potential mechanisms of HGT as well as into the history of horizontal transfer events.

RESULTS
Long-read sequencing reveals a conserved type II DNA transposon. The genomic location of ToxA has been best described in P. tritici-repentis, where two long-read assemblies place this gene in the middle of chromosome 06 (supercontig1.4) (23,31). Several long-read assemblies have also been generated for ToxA ϩ P. nodorum isolates Sn4 and Sn2000, where ToxA is found on chromosome 08 in both isolates (29). In addition to these publicly available assemblies, we sequenced ToxA ϩ isolates P. nodorum SN15 and B. sorokiniana CS10 (original isolate name, BRIP10943) with seven PacBio SMRT cells each, resulting in approximately 500,000 reads with average read lengths of 10.6 kb and 9.4 kb, respectively. In addition to the two SMRT assemblies, we resequenced an additional four isolates with the Oxford Nanopore MinION. This included two toxa Ϫ isolates, P. nodorum isolate Sn79-1087 and B. sorokiniana isolate CS27 (original isolate name, BRIP27492a), as well as two additional ToxA ϩ B. sorokiniana isolates, WAI2406 and WAI2411. All isolates were de novo assembled using long-read data only. Short-read Illumina data were used to "polish" the Nanopore de novo assemblies of CS27 and Sn79-1087. A complete list of all isolates used in this study and assembly quality indicators is provided in Table 1. Genome assembly accession num- bers and additional information about the isolates are provided at https://github.com/ megancamilla/Transposon-Mediated-transfer-of-ToxA/tree/master/S1_GenomeStats. B. sorokiniana chromosomes were ordered from largest to smallest and named based on the PacBio assembly of isolate CS10. Our P. nodorum contigs were named based on synteny alignments to the assemblies published recently by Richards et al. (29). Assembly quality was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool, which identifies fragmented, duplicated, and missing genes from de novo assemblies (32,33). The score reported in Table 1 is the percentage of complete genes found in a set of 1,313 BUSCO genes from the Ascomycota odb9 data set. The number of complete genes is used as a proxy to estimate total-genome completeness (32). The assembly completeness scores were much lower for Nanoporeonly assemblies in isolates WAI2406 and WAI2411, where no short-read data were available for genome correction. For isolates CS27 and Sn79-1087, available short-read data allowed correction of the assemblies, such that the proportion of complete BUSCO genes found exceeded 90%. Both PacBio assemblies, without short-read data, generated a BUSCO score above 98% (Table 1). For both the CS10 and SN15 PacBio assemblies, the 6-bp telomeric repeat (TAACCC) was found on the ends of several contigs (summarized in Table 1). We could not identify telomeric repeats in the Nanopore assemblies for the B. sorokiniana isolates. However, we were able to identify many contigs with telomeric repeats for the assembly of P. nodorum isolate Sn79-1087 (Table 1).
We have previously reported the presence of Ͼ92% identical 12 kb region shared between all three species (17). This aligns with the original publication from Friesen et al. (20), in which an 11 kb element was reported to be conserved between P. triticirepentis and P. nodorum. In both P. nodorum SN15 and B. sorokiniana CS10, the chromosomes that contain ToxA were assembled completely, with telomeric repeats on both ends. A self-alignment of this region in B. sorokiniana CS10 revealed intact, terminal inverted repeats (TIRs) separated by 14.3 kb (Fig. 1A). TIRs are structural features of type II DNA transposons of the order "TIR," which are required for excision by transposases (34). These TIRs were not identified in previous studies (17,20), which we explore further below. The aligned TIRs consist of 74 bp and are ϳ92% identical (Fig. 1B). Here, we refer to ToxA and the accompanying noncoding and coding DNA enclosed within these TIRs as "ToxhAT." This name reflects the historical association of ToxA with the neighboring type II hAT-like transposase gene (20).
We annotated the coding regions within ToxhAT in B. sorokiniana isolate CS10 with both long-read and short-read RNA sequencing (RNA-seq) data (see Fig. S1 in the supplemental material). This annotation plus the self-aligned sequence revealed eight genes, three inverted repeats (IRs), and two additional internal TIRs (Fig. 1C). Three annotated genes, CS10_08.708, CS10_08.709, and CS10_08.714, contain no known protein-coding domains. Excluding ToxA, the remaining four genes had conserved domains, as identified by NCBI's conserved domain database search tool (Fig. 1C). One gene contained a major facilitator superfamily (MFS) domain (NCBI accession no. cl28910), which in yeast was shown to be a proton-coupled transporter of dipeptides and tripeptides (35). The largest coding DNA sequence (CDS) within ToxhAT contained two known protein domains, with a Patatin domain at the N terminus (accession no. cd07216) followed by tetratricopeptide repeats (TPR; pfam13424) at the C terminus. In fungi, proteins that contain these domains are recognized as members of the NOD-like receptor (NLR) family (36). Only a limited number of these proteins have been functionally studied in fungi, but they are considered to be broadly involved in selfrecognition and immunity (36,37). The fourth gene was flanked by its own set of TIRs and contained a helix-turn-helix (HTH) DNA binding domain (accession no. cl04999). This structure indicated that this CDS likely represents a nested type II transposase within ToxhAT (Fig. 1C). This indicates that ToxhAT is a composite of at least two DNA transposons. Fragments of the eight open reading frames (ORFs) are also found in the other two species, P. nodorum and P. tritici-repentis (Fig. S2); however, the 3= end of ToxhAT is invaded by unique sequences in these two species (Fig. S2).
Using B. sorokiniana as a guide, we were able to identify remnants of the ToxhAT TIRs in both P. tritici-repentis and P. nodorum. The 5= TIR in P. tritici-repentis remains largely intact, whereas the 3= TIR is enriched in the C-to-T and G-to-A transitions characteristic of RIP (Fig. S2) (Fig. S2). Despite these additional insertions, the TIRs identified in B. sorokiniana were found to be present, though heavily mutated, in the other two species, indicating a common evolutionary origin for ToxhAT. ToxhAT was transferred in two independent HGT events. Whole-chromosome alignments (WCA) between the ToxhAT-containing chromosomes of P. nodorum and P. tritici-repentis revealed DNA with Ͼ70% sequence identity beyond the boundaries set by ToxhAT TIRs ( Fig. 2A; see also Fig. S2). Pairwise alignments of these regions revealed that almost all polymorphisms were RIP-like (Fig. S3). Excluding the RIP-like mutations, the sequence identity between the two species is nearly 100%. Ten genes annotated in P. tritici-repentis 1C-BFP, PTRG-04890 to PTRG-04909, are all present in P. nodorum SN15 upstream of the 5= ToxhAT TIR. In P. nodorum, however, each of these was a pseudogene due to RIP and therefore has not been annotated in any assembly (Fig. S3) (21,29). Furthermore, 5 of these 10 genes, PTRG-04891 to PTRG-04895, are duplicated and found in inverse orientation within the P. nodorum SN15 assembly (Fig. 2B, blue boxes). In P. tritici-repentis 1C-BFP, these 10 genes are on a contiguous piece of DNA that extends 61.2 kb upstream of the ToxhAT 5= TIR. The total length of nearly identical sequence shared between these two species is ϳ80 kb, which includes 61.2 kb upstream and 1.7 kb downstream of ToxhAT. In P. nodorum SN15, the 61.2 kb shared with P. tritici-repentis 1C-BFP is present but highly fragmented across chromosome 08, spanning nearly ϳ370 kb (Fig. 2B). These data demonstrate that a specific HGT event occurred between P. tritici-repentis and P. nodorum that included ToxhAT and a large segment of surrounding DNA.
WCA between P. tritici-repentis and B. sorokiniana also revealed ϳ30 kb outside ToxhAT that was ϳ50% identical and partially overlapped the DNAs shared between P. tritici-repentis and P. nodorum ( Fig. 2A [inclusive of genes PTRG-04892 to PTRG-04900]). This indicated that a region outside ToxhAT was potentially also horizontally transferred between B. sorokiniana and P. tritici-repentis. However, a pairwise alignment of this region showed no evidence of extensive RIP-like mutations that could account for the pairwise nucleotide differences observed between these two species (Fig. S4). Furthermore, we could identify the same region in toxa Ϫ isolate B. sorokiniana ND90Pr as well as in other closely related Bipolaris spp. (Fig. S5). We conclude that this region was not part of a horizontal transfer of ToxhAT into B. sorokiniana and instead represents a region of synteny between the two species.
Individual components of ToxhAT are found in other Pleosporales. After careful manual annotation of ToxhAT, we conducted full de novo repeat prediction and annotation with the REPET pipeline (38,39). The total proportion of each genome annotated as repeats is shown in Fig. S6. The nonredundant repeat library generated by REPET included the manually annotated ToxhAT transposon from B. sorokiniana CS10, named DTX-comp_CS10_RS_00, and a second nearly full-length version from P. nodorum, named DTX-incomp-chim_SN2000-L-B14-Map1. These two sequences were used to identify all instances of ToxhAT within each genome listed in Table 1 and in P. tritici-repentis 1C-BFP. A total of 195 instances of ToxhAT were annotated in these seven isolates, 183 (ϳ94%) of which we were able to successfully align to the CS10 ToxhAT sequence (Fig. S7). This alignment showed distinct areas within ToxhAT that were found in high abundance within these genomes, primarily overlapping CS10_08.708, CS10_08.709, and the Patatin-like gene (Fig. S7). A summary of the total number of identified ToxhAT instances is presented in Table 2. These data show that most of the annotations represent fragments, with the median length ranging from 176 to 295 bp, representing approximately 1% of the total length of ToxhAT. The large number of partial ToxhAT annotations in toxa Ϫ isolates suggested that some regions may be repetitive elements, independent of ToxhAT. To investigate this further, we performed tBLASTn queries using the NCBInr database and the Dothideomycetes genomes available at JGI MycoCosm. In both searches, the hAT transposase, MFS transporter, and Patatin-TPR genes had over 500 partial hits with an E value of less than 1eϪ10 ( Fig. S8A and B). Within the JGI Dothideomycetes database search, CS10_08.708 had 139 hits, CS10_08.709 had 85 hits, and the Tc1 transposase had 263 hits (E value, Ͻ1eϪ10). The small gene CS10_08.714 had the lowest number of hits, with only four in total across both databases (excluding known instances of ToxhAT). Hits for the ToxA gene itself represented mostly distant (Ͻ50% identical) homologs, previously described as ToxA-like or ToxA* in various Bipolaris spp. (40). A short summary of the top hits from both database searches is presented in Table 3.
The hits with the highest identity were in a single Alternaria alternata strain, NBRC 8984 (GenBank accession no. AB525198.1), for two contiguous open reading frames, CS10_08.708 and CS10_08.709. The pairwise identity for these two genes exceeded 90%, and the genes are colocalized in A. alternata NBRC 8984. This strain was also the highest-identity hit in NCBI for the hAT-transposase. A. alternata is a well-known plant pathogen that has a broad host range and is also a member of the Pleosporales. In the JGI database, a fungal isolate collected from leaf litter, Didymella exigua CBS 183.55 v1.0, also produced colocalized hits for CS10_08.708 and CS10_08.709 with identity greater than 85%, indicating that these two predicted genes may represent a single repetitive unit. This species is also classified in the Pleosporales (41). The third species identified with Ͼ90% sequence identity for the hAT transposase was Decospora gaudefroyi, again classified in the Pleosporales as a salt-tolerant marine fungus (42). Overall, the large number and breadth of hits across different fungal species confirmed our hypothesis that the individual coding regions of ToxhAT are part of repetitive elements broadly found within the Pleosporales. This indicates a common evolutionary origin of these repetitive coding regions within this fungal Order.
The presence/absence polymorphism of ToxA is much larger than the extent of HGT. We showed above that the extent of shared DNA differ between different pairwise comparisons of the three species harboring ToxhAT. This does not address the unknown size of the presence/absence polymorphism maintained in these species. To investigate this, the homologous toxa Ϫ and ToxA ϩ chromosomes were aligned (Fig. 3). For P. tritici-repentis, where no long-read data for a toxa Ϫ isolate were available, short reads from isolate DW5 were aligned against the assembly of P. tritici-repentis 1C-BFP. The large spike in coverage for DW5 within the deleted region corresponds to a 7 kb TIR transposon, most likely from the Tc1-Mariner superfamily, which is found near ToxhAT (DTX-incomp-chim_Ptr-L-B62-Map1_reversed). Both of the chromosomes containing ToxA in isolates P. nodorum SN15 and B. sorokiniana CS10 contain telomeric repeats and are complete chromosomes. This shows that the absence polymorphism spans several thousand kilobases in all three species (Fig. 3). Using the last known homologous regions from the whole-chromosome alignments, the absence alleles in B. sorokiniana CS27, P. nodorum Sn79-1087, and P. tritici-repentis DW5 were estimated to span ϳ239 kb, ϳ467 kb, and ϳ150 kb, respectively.
To compare the size of the presence/absence polymorphism of ToxA with the sizes of two other well characterized necrotrophic effectors, we examined the location of SnTox3 and SnTox1 in P. nodorum SN15 (43). These two effectors also exist as a presence/absence polymorphism in this species but have no known history of HGT. SnTox3 is the last annotated gene on chromosome 11 in isolate SN15, and the absence polymorphism approximately corresponds to the 7 kb tail of this chromosome. This absence encompasses annotated SN15 genes SNOG_08981 (Tox3) to SNOG-08984 (Fig. S9A). The end of chromosome 11 in the assembly of Sn79-1087 (which lacks SnTox3) contains telomeric repeats (data not shown), which suggests that the absence of the 7 kb is not due to an incompletely assembled chromosome. The absence allele of SnTox1 is even smaller, spanning ϳ3 kb on chromosome 6 of SN15. At this locus, there is a unique insertion of ϳ1.3 kb which is present only in Sn79-1087 (Fig. S9B).  These data demonstrate that the absence allele of the horizontally transferred ToxA gene is much larger than absence alleles of other known effectors and highlight potential genome instability after HGT events. Evidence of mobility of ToxhAT in B. sorokiniana. The finding of intact TIRs in B. sorokiniana CS10 suggested that ToxhAT in this species may remain mobile. To investigate this, we resequenced two additional ToxA ϩ isolates of B. sorokiniana (WAI2406 and WAI2411). ToxhAT was found in different genomic locations in the two genomes compared to isolate CS10, where ToxhAT is located near the end of chromosome 08 (Fig. 4A). For WAI2406, ToxhAT and the surrounding ϳ200 kb of repeat-rich DNA were found imbedded in the middle of chromosome 01 ( Fig. 4A to C). This has led to an increase in the size of WAI2406's chromosome 01, which is ϳ4.0 Mbp compared to CS10's PacBio assembled chromosome 01 size of ϳ3.8 Mbp. To confirm that this translocation was not a misassembly, we aligned the corrected Nanopore reads from Canu to both the CS10 and WAI2406 assemblies ( Fig. 4B and C). These reads aligned, with a slope of 1, to chromosome 01 in isolate WAI2406, with single reads clearly spanning the breakpoints on both sides of the translocation. These same reads from isolate WAI2406 did not align well to chromosome 08 in isolate CS10.
For isolate WAI2411, ToxhAT assembled to a small contig (ϳ776 kb) that has homology to chromosome 02 and chromosome 08 of the PacBio assembled CS10 (data not shown). While it remains unclear if this small scaffold is a part of chromosome 02 or chromosome 08, the flanking DNA on either side of ToxhAT was conserved but shuffled in order (Fig. 4D). ToxhAT was found to be inverted and in a different position from that in chromosome 08 in CS10. The breakpoints of the inversion were precisely from TIR to TIR of ToxhAT. Again, we aligned the corrected nanopore reads to isolate WAI2411 (self) and isolate CS10. The self-alignment showed reads that clearly crossed the breakpoints of the inversion/transposition (Fig. 4E); however, these same reads did not map contiguously to the CS10 assembly (Fig. 4F). As a secondary confirmation, we used BLASTn to identify all reads that contained ToxA and generated a multiplesequence alignment (Fig. S10). Twenty-nine single-molecule reads were aligned that spanned ToxhAT and continued into both flanking regions. These flanking regions aligned well to WAI2411's contig 17, confirming the inversion of ToxhAT precisely at the TIRs (Fig. 4E). We postulated that this inversion could represent an active transposition event, in which case there might be signature target-site duplications (TSDs) made by the transposase. However, neither this inversion nor any other instance of ToxhAT in any sequenced isolate was flanked by identifiable target-site duplications.

DISCUSSION
As shown in previous studies, the ToxA gene and surrounding noncoding DNA are highly conserved between these three species (17,20,30). We extend this knowledge by defining flanking TIRs that give this region the structural features of a type II DNA transposon, defined here as ToxhAT. These TIRs and the enclosed 14.3 kb are conserved in all three fungal species, indicating that ToxhAT had a common evolutionary origin prior to HGT. The finding of homologous DNA shared between P. nodorum and P. tritici-repentis outside these TIRs indicates that the HGT event between these two species did not involve ToxhAT alone but included ϳ63 kb of flanking DNA. In contrast, homology between these two species and B. sorokiniana breaks precisely at the ToxhAT   TIRs. Within B. sorokiniana, there is strong evidence that ToxhAT and the surrounding repeat-rich DNA remain mobile within the genome. This finding contrasts with two previously published long-read assemblies of P. nodorum and P. tritici-repentis, which show ToxA and the surrounding horizontally transferred (HT) DNA on the same chromosome (23,29,30). The individual coding regions found within ToxhAT appear to be part of repetitive elements in other Dothideomycetes. However, the breadth of hits gives no indication of whether any single species could have assembled ToxhAT as a unit before horizontal transfer between these three wheat pathogens.
ToxhAT is structured like a type II DNA transposon, but mobility may be facilitated by recombination. By identifying conserved TIRs in all three pathogens, we describe a unit of DNA that has the structural features of a type II DNA TIR transposon. In B. sorokiniana isolate WAI2411, ToxhAT itself is inverted precisely at the TIRs. This inversion bounded precisely by these structural features suggested that this putative transposon may remain active. However, extensive searches of the flanking DNA in all three species did not reveal TSDs typical of other TIR transposases (44,45). TSDs are created by a transposase when it cuts at its target site, usually a short sequence ranging in length from 2 to 11 bases (34). Most transposases make an uneven cut which, after DNA repair, leads to a duplication of the target site on either side of the inserted transposon (34). The absence of TSDs in WAI2411 suggests that the inversion seen in isolate WAI2411 was not facilitated by an active transposition event. An alternative mechanism that could explain the inversion is intrachromosomal recombination between these structural features (25). In the absence of TSDs, we consider this mechanism a strong alternative to explain the movement of ToxhAT in WAI2411.
While WAI2411 showed a relatively small genomic rearrangement, the other resequenced B. sorokiniana isolate, WAI2406, contained a large segmental movement of 200 kb from one chromosome to another. Similar interchromosomal translocations were observed previously in the fungal plant pathogens Verticillium dahliae, Magnaporthe oryzae, and Colletotrichum higginsianum (46)(47)(48). While all three studies demonstrated that translocations occur in regions where transposons are prevalent, only the study by Faino et al. (47) in the asexual species V. dahliae was able to show homologous transposons/DNA sequence at the translocation breakpoints. These breakpoints in V. dahliae implicate homologous recombination as the mechanism underpinning genome plasticity in this species (47). In M. oryzae and C. higginsianum, transposons are associated with translocation events, but it remains unclear if these translocations occurred over regions with high sequence identity (46,48). Similarly, in B. sorokiniana, whole-chromosome alignment of chromosome 08 and chromosome 01 in isolate WAI2406 did not show high sequence identity in the regions outside the translocation breakpoints on either chromosome. However, the regions near these breakpoints are enriched in transposon annotations. A key difference between the three fungal species examined in this study and Verticillium is their sexual life cycle. Meiotic recombination could obscure the precise translocation boundaries and is also a prerequisite for RIP. We postulate that in sexual fungal species, AT-rich regions, with otherwise limited sequence identity, may undergo "homologous" interchromosomal recombination events like those observed in B. sorokiniana. Supporting this hypothesis is a recent study on Epichloë festucue that leveraged Hi-C data to build a contact map of DNA within the nucleus (49). The Hi-C sequencing technique shows the frequency at which different regions of a genome interact with each other in the nucleus (50). For example, DNA fragments that are physically close to each other, on the same chromosome, have a higher frequency of interaction than sequences on different chromosomes (49). A study by Winter et al. showed that AT-rich regions, usually in the form of heavily RIPed transposon islands, on different chromosomes had significantly higher interactions with each other than non-AT-rich regions (49). These data suggest that in sexual fungi, where RIP is active, AT-rich islands are associated with each other in three-dimensional (3D) space. We postulate that AT-rich regions in B. sorokiniana, like those in E. festucue, also associate with each other within the nucleus, which provides the physical proximity required for interchromosomal recombination events.
The opportunity to examine interchromosomal transfer of ToxhAT extends beyond B. sorokiniana. Chromosomal movement of the ToxA gene was also observed in P. tritici-repentis (51). In that study, pulsed-field gel electrophoresis followed by Southern hybridization was used to show that ToxA was found on chromosomes of different sizes in P. tritici-repentis. Going further, the authors showed that, in at least one isolate, ToxA was on a chromosome different from the ToxA location in reference isolate 1C-BFP (51). While that study probed for the ToxA gene alone, we consider it likely that ToxhAT or a larger chromosomal segment was translocated, similarly to what was observed in B. sorokiniana. Further long-read assembly coupled with Hi-C data from both P. triticirepentis and B. sorokiniana, ideally from a sexual cross of two previously sequenced isolates, is required to systematically reconstruct the level of sequence identity or other genome features that facilitate interchromosomal translocations.
ToxhAT resides in an accessory genomic region in all three species. The analysis of the syntenic relationship between ToxA ϩ and toxa Ϫ chromosomes within each species showed that the absence of ToxhAT is coincident with the absence of large (Ͼ100 kb) chromosomal segments. The DNA composition of these regions fits well within the definition of the "lineage-specific" or "accessory" regions described in pathogenic fungi, where virulence genes are found nested within transposon-rich regions of the genome (52)(53)(54)(55). This genome structure is hypothesized to facilitate the rapid adaptation of fungal pathogenicity genes, often referred to as the "two-speed" or "two-compartment" genome (56,57). These data further show that the absence polymorphism does not coincide with the exact boundaries of HGT. This is most clearly seen in P. nodorum, where fragments of the horizontally transferred DNA are scattered across a 300 kb region. However, the entire region, extending well beyond the horizontally transferred fragments, is missing from toxa Ϫ isolate Sn79-1087. This leads to an interesting issue concerning whether the horizontal acquisition of ToxhAT precipitated the expansion of transposons within this region. Our data for SnTox3 and SnTox1 in P. nodorum suggest that this may be the case, whereby the absence alleles for these well-characterized effectors span only a few kilobases. Unfortunately, similar comparisons were not possible in B. sorokiniana due to a lack of known effectors or in P. tritici-repentis, where the only other known effector, ToxB, is present in multiple copies in the genome (58). Intrachromosomal recombination is also a possible mechanism to generate these large absence polymorphisms. In many model organisms, such as Drosophila, yeast, and human cell lines, large segmental deletions were previously found to be facilitated by ectopic recombination between tandemly arrayed repeat sequences (59,60). This mechanism is particularly interesting in the context of HGT, as these recombination events often result in the formation of circular extrachromosomal DNA (61,62).
The origins of ToxhAT remain obscure. Since the discovery of ToxA in the genome of P. nodorum, the evolutionary origin of this gene has been a topic of debate (20,23,63,64). To date, P. nodorum remains the species with the highest known ToxA sequence diversity. This diversity underpins the prevailing hypothesis that ToxA has had the most time to accumulate mutations and therefore has resided in the genome of this organism longest (20,63). The discovery of ToxA in B. sorokiniana and the characterization of the conserved 74 bp TIRs in all three species strongly suggest that ToxhAT had a single evolutionary origin in all three species. The ToxhAT TIRs in B. sorokiniana define the exact boundaries of the HGT event, where sequence identity with the other two species abruptly ends. In contrast, the HGT between P. tritici-repentis and P. nodorum included ToxhAT and 63 kb of flanking DNA. In P. tritici-repentis 1C-BFP, this flanking DNA remains contiguous; in P. nodorum, however, this same DNA is fragmented and partially duplicated across a 370 kb island of RIPed transposable elements (TEs). Together, these data suggest that ToxhAT was horizontally transferred in two separate events both with and without flanking DNA.
We propose two opposing models to explain the HGT of ToxhAT between the three species. In the first model, we assume a population genetic perspective where the longer the DNA has resided in the genome, the more fragmented and dispersed the HT DNA becomes. In this model, P. nodorum would be the donor of ToxhAT along with the flanking DNA to P. tritici-repentis. This is based on our observation that the flanking DNA outside ToxhAT is highly fragmented and duplicated in P. nodorum, indicating this species has had more time to accumulate these changes. This model assumes that the P. nodorum donor isolate once had a contiguous stretch of DNA as observed in P. tritici-repentis. In this model, we also postulate that ToxhAT was most recently acquired by B. sorokiniana due to its relatively compact form and conservation of structural features. Again, for this model to hold, we must assume that an intact form of ToxhAT exists in the population of P. nodorum or P. tritici-repentis that could act as a donor to B. sorokiniana. Our second model assumes that changes can accumulate more rapidly in transposon-rich regions and therefore sequence diversity in these regions is not a good indication of evolutionary time. In this model, the intact version of ToxhAT observed in B. sorokiniana would represent an ancestral version. This is the minimal unit of HT DNA, bounded by conserved TIRs. In this model, we propose that the first HGT event was from B. sorokiniana to P. tritici-repentis, based on the identical ToxA sequences that they share. Then, in a second HGT event, P. tritici-repentis would have been the donor of the large segment of DNA inclusive of ToxhAT to P. nodorum. The HT DNA flanking ToxhAT was then subject to rapid decay and duplication in P. nodorum, due to its proximity to transposable elements. This model does not provide a good explanation of why the rapid decay is not also observed in the other two species.
In order to differentiate between these two models, we suggest long-read sequencing across the broadest possible sample times and locations for both ToxA ϩ and toxa Ϫ isolates in all three species. The first goal of this sequencing would be to systematically characterize the sequence diversity of ToxhAT variants all three species. The presence of any "intact" ToxhAT sequences coupled with overall higher ToxhAT sequence diversity in isolates of P. nodorum and P. tritici-repentis would support the first model. Second, we suggest searching the sequence of toxa Ϫ isolates for the HT DNA flanking ToxhAT in P. nodorum and P. tritici-repentis. For example, if this region were present in a toxa Ϫ P. nodorum isolate, this would identify P. nodorum as the donor of ToxhAT and its flanking DNA to P. tritici-repentis. Determining the direction of transfer between B. sorokiniana and the other two species is more challenging, given the overall high sequence conservation of ToxhAT. Support for the model where B. sorokiniana harbors the ancestral ToxhAT could come from sequence data that show that ToxhAT in this species has a level of sequence diversity similar to that seen with the other two species.
While we have presented two models above that describe the history of HGT between these species alone, results of the BLAST searches conducted on the coding regions annotated in B. sorokiniana indicate that there are other species which may harbor highly identical components of ToxhAT. One standout isolate is A. alternata strain NBRC8984, which carries two genes that are 90% and 95% identical to CS10_08.708 and CS10_08.709, respectively. Similarly to their arrangement in ToxhAT, these two coding sequences also neighbor each other in A. alternata NBRC8984. These hits within ToxhAT are by far the most similar sequences found in a species that is not reported to carry ToxA. These high-identity hits also included some nonpathogenic and marine species also found within the Pleosporales. Collectively, the coding regions within ToxhAT had hundreds of hits across species representing several hundred million years of evolution. However, none of these coding regions have been characterized as repetitive or classified in a transposon family. Despite the ancient evolutionary history of transposons, the vast majority of described DNA transposons with TIRs are classified into only 10 superfamilies (34,65). Our detailed characterization of ToxhAT highlights an opportunity to characterize novel transposon superfamilies in nonmodel fungi.
Towards a mechanism; flanking noncoding DNA provides clues. The tBLASTn results, coupled with a detailed structural characterization of ToxhAT, suggest that it is a mosaic of repetitive coding regions. We propose that ToxhAT was transferred horizontally as, or by, a transposon, with the fitness advantage of ToxA fixing these HGT events in three wheat-infecting species. Similarly, the horizontally transferred regions in the cheese-making Penicillium spp. were flanked by unusual i non-LTR retrotransposons (13). In the present study, it remained unclear whether ectopic recombination, rather than active transposition, is the mechanism that integrates the HT DNA into the recipient genome. Horizontal transfer of transposons (HTT) has been widely reported in eukaryotes since the early discovery of P elements in Drosophila (66,67). The literature on this topic, however, seems to clearly divide HGT from HTT as two separate phenomena, the latter being much more common (68)(69)(70). Recent reports of the HTT between insects has used noncoding regions flanking horizontally transferred genes to demonstrate that a viral parasite, with a broad insect host range, is the vector for the horizontally transferred DNA (71). This report highlights how insights from noncoding regions can bring these studies closer to a mechanistic understanding of the HGT event (71,72). Other studies which report HGT of secondary metabolite clusters into and between fungal species often rely on phylogenetic methods performed using coding regions alone to detect these events (73)(74)(75)(76). While these studies focus on the biological significance of the coding regions, clues to a possible mechanism may remain in the surrounding noncoding DNA.
One limitation encountered with early Illumina-based genome assemblies was the inability to correctly assemble highly repetitive regions. Here we demonstrated with two long-read sequencing technologies that it is possible to assemble very large repetitive regions heavily affected by RIP. These assemblies allow us to look at the noncoding and transposon-rich regions that may be important for the movement or integration of HT DNA. Further population-scale long-read sequencing will enable further refinement of our understanding of the role that transposons play in facilitating adaptive gene transfer.

MATERIALS AND METHODS
Fungal culture and DNA extraction. Fungal cultures were grown on V8-potato dextrose agar (PDA) media at 22°C under conditions of a 12-h light/dark cycle (77). Cultures ranging in age from 5 to 10 days were scraped from the surface of agar plates into water by the use of a sterile razor blade. These harvested cultures were freeze-dried for 48 h to remove all water. High-molecular-weight (HMW) DNA was extracted using a C-TAB phenol-chloroform method modified from Fulton et al. (1995) (78). Full details of our protocol, including gel images of the final DNA, are available at https://doi.org/10.17504/ protocols.io.hadb2a6. DNA size was assessed by pulsed-field electrophoresis and DNA purity by examining 260/280 and 230/280 UV absorbance ratios on a NanoDrop spectrophotometer (Thermo Scientific, USA). The total quantity of DNA was measured using a Qubit fluorometer (Life Technologies, USA).
(ii) Nanopore DNA sequencing. Resequencing of B. sorokiniana isolates CS27 (BRIP27), WAI2406, and WAI2411 and P. nodorum isolate Sn79-1087 was performed on Oxford Nanopore's MinION sequencer. R9.4 flow cells were used for sequencing, and a SQK-LSK08 1D library kit was used to prepare the libraries according to the manufacturer's protocol. All DNA samples were purified using Agencourt AMPure beads prior to starting the 1D library preparation (Beckman Coulter, Inc., CA, USA). Genomes were assembled with Canu v1.5 or v1.6 with a minimum read length of 5 kb (79). De novo genome assemblies were corrected using the trimmed reads output from Canu. Trimmed reads were mapped to the genome with Minimap2 followed by correction with Racon (81). The output consensus sequence from Racon was used as the input for additional corrections steps performed iteratively up to five times. Sn79-1087 and CS27 assemblies were further refined using Pilon software; this correction was performed Transposon-Mediated Horizontal Transfer of ToxA ® iteratively up to five times (82). Sn79-1087 Illumina data were taken from Syme et al. (21) and BRIP27 Illumina from McDonald et al. (17).
Long-read RNA-seq data were also generated using Oxford Nanopore's MinION sequencer. Total RNA extracted from mycelia cultured in Fries 3 medium was enriched for mRNA by the use of NEBNext oligo(dT) magnetic beads and concentrated by the use of Agencourt RNAClean XP magnetic beads (Beckman Coulter, Inc., CA, USA). RNA-seq and cDNA-PCR libraries were generated directly with SQK-RNA001 and SQK-PCS108 library preparation kits, respectively, and sequenced with R9.4 flow cells. Reads were base called with Albacore v2.0.2, quality filtered with Nanofilt, and subjected to error correction using the CS10 genome sequence with Proovread (default settings) (86). Error-corrected reads were filtered for reads corresponding to full-length transcripts using SQANTI (run with default settings) (87).
Transcripts and aligned Illumina RNA-seq reads from all culture conditions were pooled for gene prediction. Pooled transcripts were used for gene prediction with CodingQuarry v2.0 (self-training Pathogen mode; default parameters) (91). Aligned reads and protein sequences from P. nodorum were used as evidence for gene prediction by BRAKER v2.0 (nondefault parameters --fungus, --prg gth) and GeMoMa v4.33 (default parameters) (92,93). Gene predictions were combined with a nonredundant (nr) set of reviewed fungal Uniprot protein sequences and high-quality StringTie transcripts using EVidenc-eModeler (EVM), which generated a weighted consensus of all predicted gene models (Haas et al., 2008) (94). Evidence sources were weighted accordingly as follows: CodingQuarry Ͼ BRAKER Ն GeMoMa Ͼ StringTie transcripts Ͼ nr fungal proteins. An assembly of MinION RNA-seq reads concordant with EVM gene models and StringTie transcripts was generated using PASA; this was used to update the EVM models (e.g., correcting intron-exon boundaries) and to annotate untranscribed regions (UTRs) (94). The completeness of these gene models was assessed using the BUSCO ascomycete database (32,33). Complete gene annotations for B. sorokiniana isolates CS10 and CS27 are available in File S2 at https://github.com/megancamilla/Transposon-Mediated-transfer-of-ToxA/tree/master/S2_Bipolaris_Gff3. Annotation of transposons. Transposons were identified de novo using the TEdenovo pipeline distributed as part of the REPET package v2.5 (38,39). Long-read assemblies from the following species and isolates were used for de novo discovery: P. nodorum isolates Sn2000 and Sn4, assembled by Richards et al. (29), and SN15 (this study); P. tritici-repentis 1-C-BFP; and B. sorokiniana isolate CS10. Repbase v20.05 was used as the reference transposon database. TEs from each genome were combined into a common database according to the parameters set in Tedenovo_six_dnLibTEs_90_92.cfg. After combining the TEs, we manually added the coordinates of ToxhAT and named this TE "DTX-comp_CS10_RS_00." Finally, TEs were annotated in each genome listed in Table 1 with the common TE database using the TEannot pipeline and settings available in the TEannot.cfg file available at https://github.com/megancamilla/ Transposon-Mediated-transfer-of-ToxA. Transposons were automatically classified into three-letter codes based on the Wicker nomenclature (34,95). (REPET pipeline configuration and run scripts, plus the repeat annotation files for each genome used in this study, can be found online in File S3 at https://github .com/megancamilla/Transposon-Mediated-transfer-of-ToxA/tree/master/S3_REPET_Files. Inverted repeats and TIRs flanking the ToxA gene and surrounding DNA were identified in Geneious with the Dotplot (Self) viewing tool, which is based on the EMBOSS 6.5.7 tool dotmatcher (http://emboss .sourceforge.net/apps/release/6.5/emboss/apps/dotmatcher.html) (96). The specific settings required to reproduce the line plot shown in Fig. 1 are as follows: Reverse complementϭyes, Score Matrixϭexact, window size ϭ 100, Threshold ϭ 75, and Tile Size ϭ 1000.