Complete genome of the Medicago anthracnose fungus, Colletotrichum destructivum, reveals a mini-chromosome-like region within a core chromosome

Abstract Colletotrichum destructivum (Cd) is a phytopathogenic fungus causing significant economic losses on forage legume crops (Medicago and Trifolium species) worldwide. To gain insights into the genetic basis of fungal virulence and host specificity, we sequenced the genome of an isolate from Medicago sativa using long-read (PacBio) technology. The resulting genome assembly has a total length of 51.7 Mb and comprises ten core chromosomes and two accessory chromosomes, all of which were sequenced from telomere to telomere. A total of 15, 631 gene models were predicted, including genes encoding potentially pathogenicity-related proteins such as candidate-secreted effectors (484), secondary metabolism key enzymes (110) and carbohydrate-active enzymes (619). Synteny analysis revealed extensive structural rearrangements in the genome of Cd relative to the closely related Brassicaceae pathogen, Colletotrichum higginsianum. In addition, a 1.2 Mb species-specific region was detected within the largest core chromosome of Cd that has all the characteristics of fungal accessory chromosomes (transposon-rich, gene-poor, distinct codon usage), providing evidence for exchange between these two genomic compartments. This region was also unique in having undergone extensive intra-chromosomal segmental duplications. Our findings provide insights into the evolution of accessory regions and possible mechanisms for generating genetic diversity in this asexual fungal pathogen.


Abstract
Colletotrichum destructivum (Cd) is a phytopathogenic fungus causing significant economic losses on forage legume crops (Medicago and Trifolium species) worldwide.To gain insights into the genetic basis of fungal virulence and host specificity, we sequenced the genome of an isolate from Medicago sativa using long-read (PacBio) technology.The resulting genome assembly has a total length of 51.7 Mb and comprises ten core chromosomes and two accessory chromosomes, all of which were sequenced from telomere to telomere.A total of 15, 631 gene models were predicted, including genes encoding potentially pathogenicity-related proteins such as candidate-secreted effectors (484), secondary metabolism key enzymes (110) and carbohydrate-active enzymes (619).Synteny analysis revealed extensive structural rearrangements in the genome of Cd relative to the closely related Brassicaceae pathogen, Colletotrichum higginsianum.In addition, a 1.2 Mb species-specific region was detected within the largest core chromosome of Cd that has all the characteristics of fungal accessory chromosomes (transposon-rich, gene-poor, distinct codon usage), providing evidence for exchange between these two genomic compartments.This region was also unique in having undergone extensive intra-chromosomal segmental duplications.Our findings provide insights into the evolution of accessory regions and possible mechanisms for generating genetic diversity in this asexual fungal pathogen.

Impact Statement
Colletotrichum is a large genus of fungal phytopathogens that cause major economic losses on a wide range of crop plants throughout the world.These pathogens vary widely in their host specificity and may have either broad or narrow host ranges.Here, we report the first complete genome of the alfalfa (Medicago sativa) pathogen, Colletotrichum destructivum, which will facilitate the genomic analysis of host adaptation and comparison with other members of the Destructivum species complex.We identified a species-specific 1.2 Mb region within chromosome 1 displaying all the hallmarks of fungal accessory chromosomes, which may have arisen through the integration of a mini-chromosome into a core chromosome and could be linked to

INTRODUCTION
The ascomycete fungal pathogen Colletotrichum destructivum causes anthracnose disease on lucerne (alfalfa, Medicago sativa) and Trifolium species and is responsible for significant economic losses on these forage legumes [1,2].Despite being isolated most frequently from members of the Fabaceae, C. destructivum has occasionally been recorded from genera of the Asteraceae (Helianthus, Crupina), Poaceae (Phragmites) and Polygonaceae (Rumex) [3,4].It has a worldwide distribution that includes the USA, Canada, Argentina, Italy, Netherlands, Greece, Serbia, Morocco, Saudi Arabia and Korea.C. destructivum is a haploid fungus with no known sexual stage [3].Previous reports of a sexual stage (Glomerella glycines) for soybean isolates of C. destructivum [5,6] were based on incorrect identification of the soybean pathogen, which was recently shown to be C. sojae [7].
Over the last decade, the application of multi-locus molecular phylogeny approaches has revealed that C. destructivum belongs to the Destructivum species complex, which contains 17 accepted taxa [3,8].All these plant pathogenic species show distinct host preferences, spanning phylogenetically diverse botanical families.An increasing number of species in the Destructivum complex have now been genome sequenced, namely, C. higginsianum [9,10], C. tanaceti [11], C. lentis [12] and C. shisoi [8], which cause disease on Brassicaceae, Tanacetum (Asteraceae), Lens (Fabaceae) and Perilla (Lamiaceae), respectively.The clade therefore provides excellent opportunities for comparative genomic studies on the genetic determinants of host adaptation.
The availability of complete genome sequences is crucial not only for the analysis of large gene clusters, such as secondary metabolism biosynthetic gene clusters (BGCs), but also for understanding fungal genome evolution.Complete or near-complete genome sequences have enabled the structure and dynamics of accessory mini-chromosomes to be analysed in several Colletotrichum species [9,13,14].The importance of mini-chromosomes for virulence on plant hosts has been demonstrated in several fungal pathogens including Fusarium oxysporum f.sp.lycopersici [15], Magnaporthe oryzae [16], C. lentis [12] and C. higginsianum [17].
Here, we present the complete genome sequence and gene annotation of C. destructivum strain LARS 709, hereafter called Cd709, based on long-read sequencing with PacBio single-molecule, real-time (SMRT) Sequel technology.The resulting high-quality chromosome-level assembly allowed us to perform comparative genomics with the close sister species, C. higginsianum, highlighting the gene content specificity and extensive genomic rearrangements.In particular, the genome showed evidence of multiple segmental duplications (SDs), as well as the likely integration of a mini-chromosome into one core chromosome.Although the origin of this integrated region remains to be determined, it displays all the hallmarks of fungal mini-chromosomes.We also show for the first time that C. destructivum is pathogenic and completes its life cycle, on the model plant Medicago truncatula, providing a new tractable pathosystem in which both partners have been genome sequenced.

Fungal and plant materials
The C. destructivum strains used in this study were originally isolated from M. sativa in Saudi Arabia (CBS 520.97,LARS 709) and Morocco (CBS 511.97,LARS 202) [2] and are hereafter called Cd709 and Cd202.The C. higginsianum strains used for comparative genome and chromosome analyses were IMI 349063A and MAFF 305635 [10,17,18], hereafter called Ch63 and Ch35, respectively.The fungi were cultured as described previously [18].
Seeds of nine M. truncatula accessions (Table S1, available in the online Supplementary Material) were provided by the INRAE Centre de Ressources Biologiques (UMR 1097, Montpellier, France), while M. sativa seeds were purchased from Germ'line SAS (France).M. truncatula seeds were first abraded with sandpaper and imbibed with water for 1 h before sowing in seed compost (Floragard Vertriebs-GmbH, Oldenburg, Germany), while M. sativa seeds were sown directly in the same compost.All plants the pathogenicity of this fungus.We show that this region is also a focus for segmental duplications, which may contribute to generating genetic diversity for adaptive evolution.Finally, we report infection by this fungus of the model legume, Medicago truncatula, providing a novel pathosystem for studying fungal-plant interactions.

Infection assays and microscopy
To test the susceptibility of M. truncatula accessions to Cd709, intact plants (17 days old) were inoculated by first immersing the above-ground parts in a solution of 0.01 % (v/v) Silwet to wet the leaves and then by immersion in a suspension of C. destructivum spores (2×10 6 ml −1 ).The inoculated plants were incubated in a humid box inside a controlled environment chamber (25 °C, 12-h photoperiod, PPFR 40 µmol m −2 s −1 ).For microscopic examination, pieces of infected tissues were cleared with a 1 : 3 mixture of chloroform:ethanol for 1 h and then with lactophenol for 30 min, before mounting on a microscope slide in 70 % glycerol and imaging with a Leica DM5500 light microscope.Symptoms were recorded at 4 days post-inoculation (dpi).

Pulsed-field gel electrophoresis and Southern blotting
The plugs containing the conidial protoplasts for PFGE were prepared as previously described [17].PFGE (Bio-Rad CHEF-DR II system) was performed using the following conditions: runtime 260 h, switch time 1,200 to 4,800 s, 1.5 V/cm and 0.75 × TBE at 8 °C.Yeast chromosomal DNA served as a size marker (Bio-Rad; 200 kb-2 Mb).
Southern blotting was conducted using standard protocols [19].A digoxigenin-labelled probe was generated by PCR following the manufacturer's instructions (PCR DIG Probe Synthesis Kit, Roche).The 993 bp probe (Cd709 chr1, position 6 711 095 to 6 712 088) was specific to mini-chromosome-like sequences at the right arm of chromosome 1 in Cd709.Hybridization was performed in DIG Easy Hyb buffer at 42 °C overnight.The membrane was then extensively washed with low-and high-stringency buffers and subsequently blocked with buffer B2 [1 % blocking powder (Roche) in buffer B1 (100 mM maleic acid, 150 mM NaCl and pH 7.5)].The blocking solution was then replaced with the antibody solution [buffer B2 containing DIG-antibody 1 : 26,000 (Roche)].The membrane was washed with buffer B1 containing 0.3 % Tween 20.The membrane was subsequently equilibrated in buffer B3 (100 mM Tris pH 9.5, 100 mM NaCl and 50 mM MgCl 2 ) and developed with chemiluminescence (CDP-Star, Roche).

Genome data, assembly, rearrangements and duplications
The genomic DNA of Cd709 was used to prepare a size-selected library (20 kb) prior to sequencing with a PacBio Sequel sequencer (kit 2.1, Keygene N.V., Wageningen, The Netherlands) on two SMRT cells, yielding raw data with approximately 224× genome coverage (1,474,759 reads, N50 10,837 bp).Genome assemblies were generated from several runs of the Hierarchical Genome-Assembly Process version 4 and Canu [20] assemblers.The draft genome was polished with the Arrow algorithm, and the completeness of the assembly was evaluated with BUSCO using the Ascomycota gene set as evidence [21].Telomeres were validated by the presence of at least three repeats of the TTAGGG/CCCTAA motif at the end of assembled contigs [22].The polished assembly was aligned with nucmer against the Ch63 and Ch35-RFP genomes to visualize chromosome rearrangements.SDDetector [9] was used to detect SDs in combination with Bedtools and BWA-MEM for validation.The Cd709 mitochondrial genome was assembled with Organelle_PBA [23] (Table S2).

Transcriptome data and analysis
RNA sequencing was performed on samples of mRNA from undifferentiated mycelium grown axenically and in two different stages of plant infection, 48 and 72 h after inoculation, corresponding to the biotrophic and necrotrophic phases, respectively.Mycelium was grown for 3 days in potato dextrose liquid medium (PDB, Difco) at 25 °C with shaking (150 r.p.m.) and harvested by filtration.Seedlings of M. sativa (8 days old) were inoculated by placing a droplet (10 µl) of Cd709 spore suspension (7×10 5 spores/ ml) onto the surface of each cotyledon, and the plants were then incubated as described for M. truncatula.Discs of the infected cotyledon tissue were harvested using a cork borer (4-mm diameter).After grinding the tissues in liquid nitrogen, total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen).Libraries were then prepared from each sample type using the TruSeq Paired-end Stranded mRNA Kit and sequenced (100 bp reads) using a HiSeq 4000 sequencing platform (IntegraGen Genomics, Evry, France).RNA-Seq paired reads were cleaned and trimmed using Trimmomatic [24] and then mapped to the genome assembly of Cd709 using STAR [25].A genome-guided transcript assembly was obtained from mappings with StingTie v1.3.4.Assembled raw transcripts were then filtered based on the Transcript Per Million (TPM) distribution per transcript per library.

Genome annotation
Transposable elements (TEs) were searched in the C. destructivum genome sequence using the REPET package [26,27].Consensus sequences identified with the TEdenovo pipeline were classified using the PASTEC tool [28], based on the Wicker hierarchical TE classification system [29], and then manually filtered and corrected.The resulting library of consensus sequences was used to annotate TE copies in the whole genome using the TEannot pipeline.
Protein-coding genes were annotated using the Eugene [30] and FunGAP [31] tools.Predicted genes were filtered out when 10 % of their coding sequence (CDS) overlapped a TE predicted by the REPET package.Filtered predicted genes from Eugene and FunGAP were clustered together based on their CDS coordinates (overlap of one base required) with no strand consideration.The annotation edit distance [32] was computed with transcript and protein evidence for each transcript, and the predicted model with the best score was retained at each locus.Mitochondrial genomes were annotated with MFannot [33] and MITOS2 [34].Results were manually inspected, and in case of divergence between the predictions, the longer gene model was retained.
The synteny between C. destructivum and C. higginsianum proteomes was analysed with SynChro [35], which detects ortholog proteins with reciprocal best hit (RBH), based on 40 % similarity and a length ratio of 1:3.Colinear orthologues were then grouped in syntenic blocks, according to a delta threshold=1 (very stringent mode).Non-syntenic blocks were extracted when five or more consecutive non-syntenic genes were found.Proteome similarities with other Colletotrichum spp.were performed with Blast 2.2.28+ and the results were filtered with a cut-off of 30 % identity and 50 % query coverage.Proteome synteny and associated figures were obtained using Clinker [36].

Functional annotation of predicted genes
Functional annotations of genes obtained using Interproscan 5.0 [37] and Blastp (e-value <1e-5) [38] against the NCBI nr databank (September 2019) were then used to perform Gene Ontology (GO) [39] annotation with Blast2GO [40].Carbohydrate active enzymes (CAZymes) were annotated with dbCAN2 [41] launching HMMER, Diamond and Hotpep against dedicated databases.Genes were considered as CAZymes when at least two of the three tools provided a positive annotation.
Genes encoding potential secreted proteins were predicted with a combination of SignalP v4.1 [42], TargetP v1.1 [43] and TMHMM v2.0 [44] results.The secretome was defined as the union of SignalP and TargetP results and then intersected with TMHMM results (zero or only one transmembrane domain).Proteins smaller than 300 amino acids were then extracted and considered as small secreted proteins (SSPs).In parallel, EffectorP v2.0 [45] was applied to the predicted secretome to identify putative effector proteins.Finally, the intersection of EffectorP and SPP results was retained to establish a list of potential effectors.
To detect secondary metabolism BGCs, predicted genes were submitted to Antibiotics and Secondary Metabolite Analysis Shell (antiSMASH) v5 [46].Only core biosynthetic genes [commonly known as secondary metabolism key genes (SMKGs)] were considered for further analysis.Presence/absence patterns of SMKGs were based on RBHs with Ch63 and Ch35 and then manually inspected.Among the newly predicted SMKGs, those encoding polyketide synthases (PKSs) and non-ribosomal peptide synthases (NRPSs) were checked for the presence of the minimal expected set of enzymatic domains, namely, ketosynthase and acyltransferase (AT) domains for PKS and A and PCP domains for NRPS.Terpene synthase (TS) and dimethylallyltryptophan synthase genes were manually inspected and retained if they had RNA-Seq or protein support.Those Cd709 genes not predicted as SMKGs by antiSMASH, but orthologous to a C. higginsianum SMKG, were also included.For example, antiSMASH failed to annotate six TSs that are present in both species.

Codon usage analysis
Codon usage was computed for predicted gene CDS on each chromosome or region using the EMBOSS tool 'cusp' .The resulting codon usage matrix (i.e. the fraction of each codon in a given amino acid) was subjected to Fisher's exact tests (with a Bonferroni correction for multiple testing) to address the statistical significance of differences between the core and mini-chromosomes.The matrix was also subjected to a principal component analysis (PCA), and the results were projected onto the first two principal components.To analyse the GC percentage of the three letters of each codon, the 'cusp' tool was run individually on each CDS of each chromosome or region, and the results were represented as density plots.The corresponding figures were generated using R (v4.0.5) and the libraries ggplot2 (v3.3.3),cowplot (v1.1.1)and ggbeeswarm (v0.6.0),all available from the Comprehensive R Archive Network (CRAN) repository (https://cran.r-project.org/).

A novel C. destructivum-M. truncatula pathosystem
The cell biology of infection of M. sativa by Cd709 was previously described [2].Here, we report the infection of the model plant M. truncatula (barrel medic) by this species.Five out of the nine tested M. truncatula accessions, including the genome-sequenced accession ESP074-A [47], were found to be susceptible to C. destructivum in two independent infection assays (Fig. 1 and Table S1).At 4 dpi, necrotic water-soaked lesions were visible on the trifoliate leaves of the susceptible accessions (Fig. 1).In contrast, the leaves of resistant accessions presented only small necrotic flecks or no visible symptoms.The genome-sequenced accession R108-C3, which is widely used for M. truncatula functional genomics [48], was resistant to C. destructivum in these infection assays.
On cotyledons of the susceptible M. truncatula accession ESP155-D, Cd709 spores germinated to form melanized appressoria, which by 48 hours post-inoculation (hpi) had penetrated host epidermal cells to form bulbous, intracellular biotrophic hyphae that were confined to the first infected cell (Fig. 2a).Thinner necrotrophic hyphae started to emerge from the tips of the biotrophic hyphae at 60 hpi (Fig. 2b), and after 72 hpi, the fungus had completed its asexual cycle by producing sporulating structures (acervuli) on the surface of the dead tissues (Fig. 2c).On cotyledons of the resistant accession ESP163-E, appressoria formed abundantly on the leaf surface but penetrated the host epidermal cells very infrequently (Fig. 2d, e).Groups of dead epidermal cells underlying the appressoria appeared yellow-brown in colour and had granulated contents, suggesting they had undergone a hypersensitive cell death response.Rarely, small hyphae were visible in epidermal cells beneath appressoria, but they developed only a short distance into the dead cells and most remained smaller than the appressorium.Acervuli were never observed on plants of accession ESP163-E.

Genome assembly and structural annotation
Long-read data allowed us to generate a complete genome assembly for Cd709, with a total length of 51.75 Mb in which all 12 chromosomes were sequenced from telomere to telomere (Fig. 3), together with the circular mitochondrial genome (34 kb).The annotation of TEs revealed a total of 49 consensus sequences, representing all the possible TEs in the Cd709 genome.The classification of the TEs (Table S3) showed that the genome contains 18 different families of retrotransposons, including 11 LTR and 7 long interspersed nuclear elements; 28 DNA transposons, including 25 terminal inverted repeat (TIR), 1 helitron and 2 Miniature Inverted-Repeat TE (MITE); and 3 unclassified repeated elements.The library of 49 consensus sequences was then used to annotate TE copies in the Cd709 genome.Overall, TEs covered 6.2 % of the genome assembly by length.The class I LTR Gypsy superfamily was the most abundant in terms of coverage and number of copies, whereas the class I TIR Tc1-Mariner was the most abundant in terms of full-length copies.Two Gypsy transposons (R172 and G87) resemble the most abundant TE family in C. higginsianum, namely, the LTR transposon family RLX_R119 [9].Looking at the distribution of TE families along the chromosomes, we found that the telomeres of all 12 C. destructivum chromosomes were associated with a single copy of a TE belonging to the helitron family (G103).
To annotate the protein-coding genes, a genome-guided assembly of RNA-Seq reads provided 16,122, 13, 901 and 15, 081 transcripts for axenic mycelium, 48 hpi and 72 hpi libraries, respectively (Table S4), with 1.88 TPM, 9.38 TPM and 4.90 TPM as minimum expression levels, respectively (Fig. S1).Assembled transcripts were then used to predict gene models in conjunction with Colletotrichum and Ascomycota protein databanks.The results of EuGene and FunGap were combined and filtered to generate the Cd709 gene set comprising 15,631 complete gene models, of which 11, 853 had transcript support and 15, 172 resembled Ascomycota predicted proteins.Features of the gene annotation are summarized in Table S5.The completeness of this annotation was confirmed by comparison to the BUSCO Ascomycota set (1,315 genes), with 1,309 complete genes predicted and only one missing.Functional annotation assigned InterPro entries to 10, 298 genes, among which 7,475 had at least 1 GO term and 1,105 were potential enzymes (annotated with an enzyme code).Based on Blast2GO descriptions, 12,192 predicted genes (78 %) had a predicted function, i.e. a description other than 'hypothetical protein' (Table S6 tab 'all').The mitochondrial genome of Cd709 was annotated with 29 tRNAs, 2 rRNAs (small and long subunit) and 21 genes.(d, e) Resistant accession ESP163-E.At 72 hpi, few appressoria had penetrated cotyledon epidermal cells, and groups of cells underlying the appressoria were pigmented yellowish brown with granular contents (*).Any hyphae (h) visible inside epidermal cells were typically smaller than the appressorium.Scale bars=20 µm.

Plant interaction-related genes
A total of 619 Cd709 genes were annotated to encode CAZymes, among which 410 were assigned to the glycoside hydrolase, carbohydrate esterase and polysaccharide lyase CAZyme classes (Table S6 tab 'CAZyme').The proportion of genes in each CAZyme class closely resembled that previously found in Ch63 [49], and 98 % (400/410) of Cd709 CAZyme genes were also detected in the Ch63 genome.In silico analysis of the Cd709 secretome revealed a total of 2,608 potential extracellular secreted proteins, including 1,118 small proteins (<300 amino acids).Among these, 484 genes were retained as putative effectors because they were also present among 508 genes identified by EffectorP.Comparing these to the effector repertoire of Ch63, a total of 127 putative effectors (26.2 %) were unique to Cd709, having no reciprocal best BLAST hit in Ch63 (Table S6 tab 'predicted effectors').A total of 110 SMKGs were detected in the Cd709 genome using the fungal version of antiSMASH and were manually curated.These C. destructivum SMKGs were compared to the 105 C. higginsianum SMKGs [9].Overall, 78 % (94 out of 120) of the SMKGs were present in both species (Table S6 tab 'secondary metabolism' , Fig. S2).A total of 17 C. destructivum SMKGs, distributed over 8 BGCs, were not detected in C. higginsianum.

Chromosome structure comparison
Complete chromosome-level assemblies are available for two different C. higginsianum strains, namely, Ch63 [9] and Ch35-RFP, a transformant of MAFF 305635 (Ch35) expressing red fluorescent protein which lacks both mini-chromosomes 11 and 12 [10,17].The genetic proximity of C. destructivum and C. higginsianum allowed us to align assemblies to observe chromosome structural variations.This generated 38 Mb of C. destructivum alignments (>10 kb) with each C. higginsianum strain, ranging from 88 to 96.7 % identity.Thus, C. destructivum shared approximately 73.6 % of its total genome length with C. higginsianum.At the chromosome scale, alignments revealed that five chromosomes of C. destructivum (chr1, 2, 3, 5 and 9) were not involved in any large rearrangements and five others (chr4, 6, 7, 8 and 10) showed inter-chromosomal rearrangements, while the two mini-chromosomes (chr11 and 12) lacked large regions of conserved sequences and appear to be species specific (Fig. 4a).
One rearrangement involved chr7 and chr8 of Cd709 resulting in chr4 and chr10 of Ch63.The breakpoints in chr7 and chr8 were associated with TEs in Cd709 (Fig. S3A and B).A similar rearrangement was found relative to Ch35-RFP, albeit with different breakpoints in both species that were not associated with TEs (Fig. S3F, G).A second rearrangement involved chr4 and chr10 of Cd709 such that their left and right arms resulted in chr9 and chr7 of Ch63, respectively (Fig. S3C, D).Interestingly, this rearrangement was not found relative to Ch35-RFP, suggesting that it is specific to particular C. higginsianum strains, as was noted previously [10].A third inter-chromosomal rearrangement concerned 121 kb at the 5′ extremity of Cd709 chr6 coming from chr4 and contig_1 of Ch63 and Ch35-RFP, respectively.In C. destructivum, this breakpoint is surrounded by TEs and nonsyntenic regions (Fig. S3E).Remarkably, a specific rearrangement of 42 kb between chr11 of Cd709 and contig 11 of Ch35-RFP (Fig. S3H) corresponds to a region that is absent from the Ch63 genome assembly and that encodes highly variable effectors (having ≤90 % alignment coverage) and secondary metabolism-related proteins [10].In addition, several short stretches (2 to 5 kb in length) from chr11 of Cd709 were present at the extremities of chromosome 6 in Ch63 and the corresponding region of Ch35-RFP (contig _9) (Fig. 4).
A notable feature of the C. destructivum genome assembly is the unusually large size of chr1 (7.3 Mb), which is 0.9 Mb longer than the largest chromosome in C. higginsianum (6.4 Mb).Genome alignments highlighted a near-complete synteny between chr1 of Cd709 and chr2 of Ch63 except for a 1.2 Mb subtelomeric region (coordinates chr1 : 6,076,875-7,282,542), for which no similarity was found in C. higginsianum (Fig. 4).Synteny between the genes of Cd709 and those of Ch63 was investigated using SynChro.With stringent settings, 400 syntenic blocks were identified based on 12 135 RBHs.A total of 1,083 genes were found in 47 non-syntenic blocks composed of at least 5 consecutive Cd709-specific genes (Tables S7 and S8).The largest non-syntenic block, corresponding to the 1.2 Mb region specific to Cd709 on chr1, contained 305 genes.Mini-chromosome chr12 contained one non-syntenic block of 170 genes, while chr11 was divided into seven non-syntenic blocks, the largest containing 106 genes.Although only 356/1,083 genes inside non-syntenic blocks could be annotated with a GO term, GO enrichment tests revealed that the Cd709-specific genes were enriched in protein kinases, protein phosphorylation activity and secondary metabolism process (Table S9).Likewise, effector genes were found to be enriched in non-syntenic blocks whereas CAZymes were depleted (Table S10).

Validation of the 1.2 Mb non-syntenic region in C. destructivum chromosome 1
To verify the large non-syntenic region identified within chr1, we first checked for potential errors in the sequence assembly of this region by manually inspecting long reads spanning the two junctions (Fig. S4).Second, to obtain an assembly-independent validation, PFGE and a Southern hybridization were performed (Fig. 5a, b).A 993 nt probe (coordinates chr1 : 6, 711, 095 to 6, 712, 088) was designed within the 1 Mb non-syntenic region to target a unique locus that avoided TEs (Fig. 5b).This probe is 83.5 % identical to the gene CH63R_14488 located on chromosome 11 of Ch63 that was used as a hybridization control.
Chromosomes of two C. destructivum isolates (Cd709 and Cd202) and two C. higginsianum isolates (Ch63 and Ch35) were separated by PFGE and analysed by Southern hybridization (Fig. 5c, d).For both C. destructivum isolates, the probe hybridized to molecules with high molecular weight that could correspond to the largest chromosome, consistent with a location on chr1 (Fig. 5c, d).The high molecular weight signals were absent in the C. higginsianum blots, and instead, hybridization signals were detected at a position corresponding to mini-chromosome 11, although these were weak, as expected for a probe with only 83.5 % identity to the target.Overall, our findings validate that a non-syntenic region is embedded within chr1 of C. destructivum.Hereafter, we refer to the syntenic and non-syntenic portions as chr1A and chr1B, respectively, and their distinct properties were explored further in the following analyses.

Region chr1B shows the characteristic features of fungal accessory chromosomes
In many aspects, the region chr1B of Cd709 resembled the mini-chromosomes 11 and 12.All three compartments were more AT rich than the core genome.Region chr1B was also highly enriched with TEs, having 32.8 % coverage with TE copies by length, similar to chr11 and chr12 (32.3 and 35.1 %, respectively), whereas the core chromosomes (excluding chr1B) had only 3-6.2 % TE coverage (Fig. 3 and Tables 1 and S1).Moreover, the distribution of TE families in region 1B and the two mini-chromosomes differed markedly from the core chromosomes in that they were all enriched with LINE retrotransposons (44 %, 19 % and 34 % coverage, respectively), compared to only 7 % in the core genome (Fig. S5).LINE TEs are also present in C. higginsianum on chromosomes 11 and 12, but their expansion was less striking in this species (7 % and 2 % coverage, respectively) (Fig. S5) than in Cd709 [9].
The examination of the gene content of region chr1B revealed that, similar to the mini-chromosomes, it was overall depleted in protein-coding genes (twofold less than the core chromosomes), contained a significantly larger proportion of genes encoding proteins of unknown function (i.e.annotated as hypothetical proteins) and had fewer expressed genes (RNA-Seq transcript evidence) compared to the core genome (Table 1).Considering categories of potentially pathogenicity-related genes, no CAZyme genes or SMKGs were detected in either region chr1B or chr 12, although eight SMKGs were present on chr11 (Table S6, tab 'secondary metabolism'), all of which had RNA-Seq transcript support.Moreover, 38 effectors were found in chr1B and the two mini-chromosomes.Remarkably, 36 of these were absent from C. higginsianum (had no RBH in Ch63), of which 20 were expressed in planta (Table S6 tab 'predicted effectors').With 15 and 10 effectors, respectively, the mini-chromosomes 11 and 12 were significantly enriched in putative effectors compared to the core chromosomes whereas no enrichment was observed for the 13 effectors of the chr1B (Table 1).Remarkably, the most highly expressed effectors during the biotrophic phase (48 hpi), namely CDEST_01870 (chr1B) and CDEST_15472 (chr12), were located on mini-chromosome-like regions.This raises the possibility that genes carried in such regions are important for virulence.

Codon usage in region chr1B and the mini-chromosomes differs from the core chromosomes
Analyses of codon usage were used previously to detect differences between the core and accessory chromosomes or lineagespecific compartments of other plant pathogenic fungi [15,50,51].We therefore computed the codon usage of CDS located on the core chromosomes, mini-chromosomes and the chr1B region of Cd709.Based on a PCA, codon usage on the core chromosomes was very homogeneous, whereas that of the mini-chromosomes and region chr1B clustered together and separately from the core chromosomes (Fig. 6a).To illustrate this in greater detail, we plotted the codon usage for each amino acid and for each chromosome or region (Fig. S6, representative examples are given for three amino acids in Fig. 6b).For these analyses, we excluded the two amino acids (Trp and Met) that are encoded by a single codon.Based on Fisher's exact tests for each of the remaining 59 codons, almost all the codon usages were different between the core chromosomes on one hand and chr1B, chr11 or chr12 on the other hand.In striking contrast, there were only three differential codon usages between chr1B and chr11 and one between chr1B and chr12.However, chr11 and chr12 were most different from each other with 15 differential codons (Table S12; adjusted P<0.001).

Region chr1B is a hotspot for segmental duplications
The genome of Cd709 was inspected for SDs, as described previously for C. higginsianum [9].A total of 48 duplications involving genes were detected on 4 chromosomes (chr1, chr6, chr11 and chr12).Among them, 12 duplications were larger than 10 kb (Fig. 7) of which only 3 were inter-chromosomal (all involving chr12).Similar to C. higginsianum [9], these inter-chromosomal duplications were all associated on at least one side with TEs, supporting a potential role of TEs in duplication (Fig. S7).However, in contrast to C. higginsianum, these duplications did not take place preferentially near telomeres.remarkable feature of region chr1B that it showed a strong intra-chromosome duplication pattern, with some regions replicated up to three times (Fig. 7).Assembling large duplications can be difficult even with long-read sequences [52].To check for possible bias during assembly, the eight largest intra-chromosome duplications on chr1B were inspected manually (Table S13).Due to the problem of multiple reads mapping to duplicated regions, we considered only uniquely mapped reads.Consequently, the read coverage of these eight regions was on average twofold lower than the non-duplicated regions.No other regions of chr1B showed a significant decrease in coverage, and the extremities of the SD regions were well anchored to chr1B.Reads were identified spanning the 2 smallest duplications, SD1B-2 (10 reads) and SD1B-6 (22 reads), but other duplicated regions were too large (>16 kb) to be spanned by single PacBio reads.Finally, the short-read RNA-Seq data used to annotate the genome were also employed to detect mutations within the duplicated genes.Mutations were detected in all the duplicated regions, albeit with the support from only few reads in most cases.Taken together, these results support the reliability of the observed duplications in region chr1B.
To gain insight into the possible origin of region chr1B, we examined the conservation of the 300 genes contained within this region in the genomes of 23 other Colletotrichum species (Table S14).As expected, given that C. destructivum and C. higginsianum belong to the same species complex [3], the total proteome of Cd709 showed the greatest similarity to that of Ch63 (14 372 conserved proteins).Surprisingly, however, the chr1B proteome shared most conserved proteins with a phylogenetically distant species, namely, C. truncatum (217 protein matches, compared to only 134 matches in C. higginsianum) [53].Almost half of the genes shared with C. truncatum were involved in SDs within the Cd709 chr1B.Remarkably, the region triplicated in SD1B-1, SD1B-3 and SD1B-7 was also found in a large duplicated region represented by two contigs within the C. truncatum genome assembly (Fig. S8) [54], which may be located on a mini-chromosome due to their low GC content (49.0 %, compared to 51.2 % in the longer contigs of C. truncatum).Other genes located within the Cd709 SD1B-1 duplications had Blast matches that were mostly restricted to C. incanum, C. spaethianum and C. tofieldiae (Spaethianum species complex), C. salicis and C. nymphaeae (Acutatum species complex), C. fructicola (Gloeosporioides species complex), C. sublineola (Graminicola species complex) and C. orchidophilum, which vary in their phylogenetic distance from C. destructivum [53].The absence of these gene sequences from the C. higginsianum genome was confirmed by TBLASTN searches against the NCBI wgs Colletotrichum database (266 genomes).showing codon usage bias for three amino acids (alanine, glutamine and glycine) in genes located on core chromosomes (1 to 10 excluding region 1B), mini-chromosomes 11 and 12 and region 1B.Codon usage on chr11, chr12 and region chr1B differed significantly from that on core chromosomes (Fisher's exact test, P<0.001) for the ten codons presented except GCG (all comparisons) and GGG (chr12 vs core).Other amino acids are displayed in Fig. S6.The significance is reported for all the codons in the Table S12.
Examination the gene content in duplicated of chr1B gave few clues to their possible role in the host interaction or the advantage for the fungus to maintain multiple mutated copies of these genes.One gene duplicated four times (CDEST_01898, CDEST_01949, CDEST_02058 and CDEST_02116) encoded a major facilitator superfamily transporter.The five genes duplicated between SD1B-2 and SD1B-6 comprised four FAD-binding domain-containing proteins and a patatin-like serine hydrolase.

DISCUSSION
In this study, we present a chromosome-level reference assembly of the C. destructivum genome, a phytopathogen causing anthracnose disease principally on species of Medicago and Trifolium (Fabaceae).Among other members of the Destructivum species complex, which currently contains 17 recognized species [3], the genomes of C. lentis, C. tanaceti and C. shisoi were sequenced previously, but the resulting assemblies were highly fragmented, containing 2,980, 5,242 and 36,350 contigs, respectively [8,11,12].Using PacBio long-read sequencing, we were able to generate a gapless assembly of the Cd709 genome, which, together that of Ch63 [9], provides second complete genome within the Destructivum species complex, facilitating future comparative genomic analyses within this important group of plant pathogens.
Alignment of the Cd709 genome assembly with those of C. higginsianum strains Ch63 and Ch35 revealed large-scale chromosome rearrangements between the two closely related species.Some of these rearrangements were potentially mediated by a recombination between homologous regions containing TEs, which flanked one or both of the breakpoints.Similar TE-mediated chromosome rearrangements were previously reported at the intra-species level in C. higginsianum [10].Our analysis of synteny between the genomes of Cd709 and Ch63 also revealed the presence of a 1.2 Mb species-specific region within chr1 of Cd709, which we called chr1B.This 'accessory region' (AR) displays many of the hallmarks that characterize fungal mini-chromosomes, or 'accessory chromosomes' , in that it is AT rich, transposon rich and gene poor and has a distinct codon usage [51,[55][56][57].In all these respects, chr1B resembles the mini-chromosomes chr11 and chr12 but is strikingly different from the rest of Chr1 and other core chromosomes of Cd709.The TE enrichment observed in chr1B and both mini-chromosomes is largely caused by the specific expansion of LINE and TIR elements in these compartments, unlike the core chromosomes where the Gipsy TE family predominates.
Using PFGE and Southern hybridization with a probe specific to chr1B, we were able to confirm that this AR is carried not only on chr1 of Cd709 but also on the largest chromosome of Cd202, despite the widely separated geographical origins of these two isolates (Saudi Arabia and Morocco, respectively).The analysis of a larger collection of C. destructivum isolates is now needed to determine the extent to which chr1B is conserved within this pathogen species.The presence of an AR embedded within a core chromosome has been reported in other plant pathogenic fungi.For example, isolates of the T race of Cochliobolus heterostrophus harbour an AR of about 1.2 Mb distributed between two core chromosomes that contain the Tox1 locus producing the T-toxin polyketide [58,59].In Verticillium dahliae, chr3 and chr4 each harbours two ARs of ~300 kb [60], while in Fusarium poae, a 204 kb block with AR characteristics is inserted near one telomere of chr3 [57].However, it should be noted that in these two examples, the inserted AR blocks are fourfold to sixfold smaller than chr1B of Cd709.
Our working hypothesis is that the AR chr1B arose by the integration of a mini-chromosome into a core chromosome of C. destructivum, but the mechanism by which this occurred is unclear.Despite the subtelomeric position of chr1B, its integration is unlikely to have resulted from the telomeric fusion of a mini-chromosome with a core chromosome because it is flanked on both sides by portions of chr1, both of which are highly syntenic to chr2 of C. higginsianum.A chromosome containing distinct regions characteristic of core and accessory chromosomes was previously reported in the genome of C. fructicola strain Nara gc5 [61].In this case, the chimeric chromosome, called Nara_c11, is smaller (2.8 Mb) than Cd709 chromosome 1 (7.3 Mb) and the TE-rich, gene-poor AR occupies most of the chromosome (66 % by length), in contrast to Cd709 chr1B, which occupies only 16 %.A further difference to Cd709 chr1B is that the AR of Nara_c11 includes a telomere, suggesting that in this case, the chimeric chromosome arose through a different mechanism.Taken together, our findings provide further evidence for genetic exchange between core and accessory genomic compartments in Colletotrichum species [61].In other fungi, chromosome breakage-fusionbridge cycles have been invoked not only in the creation of accessory chromosomes from core chromosomes [62] but also in their reintegration into core chromosomes [63].
A distinguishing feature of the chr1B AR is that it has undergone extensive region-specific SDs.Some inter-chromosomal SDs in Cd709 were associated with TEs at one or both of their borders, as we found previously in Ch63 [9], but there was little evidence that the region-specific SDs in chr1B were mediated by TEs.Similarly, the AR of C. fructicola chromosome Nara_c11 was found to be implicated in numerous intra-and inter-chromosomal SDs, but as in Cd709, these were not consistently flanked by TEs [61].Among fungal pathogens, SDs can play important roles in generating genetic diversity and novel gene functions, either at the level of expression or CDS [64,65].A recent study on Fusarium strains infecting banana also highlighted the importance of SDs in driving the evolution of ARs and the effector genes contained within them [66].Although the C. destructivum genome contains a complete Mat1-2-1 mating-type locus (Table S6, tab MAT1-2-1) and should therefore be capable of sexual reproduction, this has never been observed [3,67].In this context, SD may therefore provide an important mechanism for generating genetic diversity for host adaptation in this essentially asexual pathogen.
A remarkable finding was that some segmentally duplicated blocks of genes within chr1B of C. destructivum are conserved and syntenic with duplicated regions in the genome of C. truncatum, a species that is phylogenetically very distant [53].Given that these two taxa diverged ~60 million years ago [68], soon after speciation in Colletotrichum, these SDs may be very ancient and have been selectively retained in some species and lost in others.Alternatively, these duplicated regions may have been acquired by horizontal chromosome transfer (HCT) from another species to a common ancestor, or through independent transfers to C. destructivum and C. truncatum.HCT would be consistent with the distinct codon bias in chr1B and the taxonomic incongruity of many genes within this region.The horizontal transfer of a mini-chromosome between vegetatively incompatible biotypes of C. gloeosporioides was shown experimentally [69,70], and it is well documented that genetic material can be exchanged following fusion between conidial anastomosis tubes of the same, or even different, Colletotrichum species [71][72][73].
Chr1B contains a variety of genes with potential roles in fungal virulence, some of which were expressed during infection.These include genes encoding 13 candidate-secreted effector proteins, 8 protein kinases, 5 major facilitator superfamily membrane transporters, 5 heterokaryon incompatibility (HET) proteins 8 transcription factors (TFs) (Table S6).It is interesting to note that, similar to chr1B, the accessory 'pathogenicity chromosome' of Fusarium oxysporum f.sp.lycopersici is enriched not only with effector genes but also with genes encoding protein kinases, membrane transporters, HET proteins and TFs, of which one TF was shown to regulate the expression of plant-induced effector genes [74,75].TFs were also found to be enriched in the four lineage-specific ARs of V. dahliae [60].Overall, the gene content of chr1B suggests that it may contribute to C. destructivum pathogenicity.This was demonstrated experimentally for ARs in two other members of the Destructivum species complex, namely, chr11 of C. higginsianum (isolate Ch35), which was essential for virulence on A. thaliana [17], and chr11 of C. lentis, which was required for virulence on lentil [12].In the case of Cd709, it is noteworthy that the three most highly expressed and plant-induced effector genes are all located in ARs, namely, CDEST_01870 on chr1B, CDEST_15404 on chr11 and CDEST_15472 on chr12.These and other pathogenicity-related genes carried within these genomic compartments will provide interesting candidates for future functional analysis.
Finally, we show here that Cd709 can complete its life cycle not only on its original host, M. sativa, but also on the widely studied model legume, M. truncatula.Until now, the only other Colletotrichum species known to attack M. truncatula was C. trifolii, which belongs to the phylogenetically distant Orbiculare species complex and uses a different infection process where the biotrophic phase extends to many host cells [76,77].With complete genome assemblies and high-quality gene annotations available for both partners, together with abundant genetic tools and resources on the plant side, the C. destructivum-M.truncatula interaction could provide a tractable new model pathosystem for studying hemibiotrophic fungal interactions with Fabaceae hosts.Our identification of susceptible and resistant M. truncatula accessions also raises the possibility that natural variation among accessions could be exploited to analyse the genetic basis of resistance to C. destructivum [78].

Fig. 1 .
Fig. 1.M. truncatula accessions used in this study and their infection phenotypes with C. destructivum LARS 709.Upper panel: geographical distribution of M. truncatula in the Mediterranean area according to GBIF (2019) and collection locations of the nine ecotypes used in this study.Lower panel: symptoms produced on the trifoliate leaves of six M. truncatula accessions at 4 dpi with spore suspension of C. destructivum LARS 709.Leaves of the susceptible accession DZA210-5 showed large necrotic lesions, while those of DZA327-7 and ESP155-D were completely necrotic.Leaves of the resistant accessions ESP163-E, DZA016-F and R108-C3 showed small necrotic flecks or no visible symptoms.Note that R108-C3 is considered to be M. truncatula ssp.tricycla.

Fig. 3 .
Fig. 3. Schematic representation of the 12 chromosomes of C. destructivum isolate 709.The distribution of genes and TEs across each chromosome is shown together with the corresponding genome statistics (inset table).

Fig. 4 .
Fig. 4. Whole-genome alignments between Cd709 and two C. higginsianum strains.Chromosomes of Cd709 (white bars) were aligned with (a) the chromosomes of Ch63 (grey bars) or (b) the contigs of Ch35-RFP (grey bars).Syntenic regions (length >10 kb and per cent identity >88 %) were linked together using coloured arcs specific for each chromosome in the Cd709 genome assembly.Red diamonds indicate inter-chromosomal rearrangements.Black diamonds indicate chromosome breakpoints associated with separate contigs in the Ch35-RFP assembly only.The blue track indicates gene blocks that are unique to Cd709.Note that region chr1B of Cd709 (highlighted in green) has no alignments in either of the C. higginsianum isolates.The black arcs linking chr11 of Cd709 to the 3' end of chr6/contig_9 in C. higginsianum (highlighted in pink) indicate regions with strong sequence similarity (per cent identity >88 %) that are smaller than 10 kb.Asterisks indicate where chromosome sequences were reverse complemented for better visualization.Tick mark spacing=1 Mb.

Fig. 5 .
Fig. 5. Chromosome 1 of C. destructivum has a bipartite structure.(a) Scheme of the structure of Cd709 chromosome 1.The probe is specific to the mini-chromosome-like part of the chromosome (chr1B).(b) Detailed scheme of the regions targeted by the 993 bp DIG-labelled probe in Cd709 and in Ch63 (chr11: 493,380 to 494,373).Patterned boxes indicate sequence identity of the target regions to the probe.(c) PFGE of chromosomal DNA from C. destructivum isolates LARS 202 (Cd202) and LARS 709 (Cd709) compared to C. higginsianum isolates IMI 349063 (Ch63) and MAFF 305635 (Ch35).(d) Southern hybridization.Numerals 1 to 4 indicate signals corresponding to chromosomes displayed in (c).

Fig. 6 .
Fig. 6.Codon usage bias in the core and mini-chromosomes of C. destructivum.(a) PCA of codon usage for all amino acids on each chromosome.The region chr1B was considered separately from the rest of chr1.The first two axes accounted for 95 % of the variance.PC, Principal Component.(b) Plotsshowing codon usage bias for three amino acids (alanine, glutamine and glycine) in genes located on core chromosomes (1 to 10 excluding region 1B), mini-chromosomes 11 and 12 and region 1B.Codon usage on chr11, chr12 and region chr1B differed significantly from that on core chromosomes (Fisher's exact test, P<0.001) for the ten codons presented except GCG (all comparisons) and GGG (chr12 vs core).Other amino acids are displayed in Fig.S6.The significance is reported for all the codons in the TableS12.

Fig. 7 .
Fig. 7. Circos plot showing C. destructivum SDs larger than 10 kb found with SDDetector.The blue and red tracks represent genes and TEs, respectively.The light green and orange arcs indicate intra-chromosomal and inter-chromosomal duplications, respectively.Duplicated genes are highlighted by blue arcs.The level of sequence similarity along the duplications is shown by a line graph with a colour scale where green indicates greater than 95 % similarity, black between 95 % and 90 % similarity and red below 90 % similarity.A sliding window of 100 bps was used to calculate and display sequence similarity from large alignments.The scale displayed on the graph ranges from 100% to 85 % similarity.