Introduction

Botryosphaeriaceae is a family of ecologically important taxa that colonize diverse, globally distributed woody plants (Phillips et al. 2013; Yan et al. 2013). These species cause stem cankers in a variety of economically important plants including apples, grapes, peaches, and pears (Mancero-Castillo et al. 2018; Yan et al. 2013; Delgado-Cerrone et al. 2016; Feijo et al. 2019). These pathogens can block xylem vessels and induce the death of xylem parenchyma and bark tissues (Cedeño et al. 1996; Biggs and Britton 1988; Mahadevan et al. 2022; Obrador-Sanchez and Hernandez-Martinez 2020). When colonizing the xylem tissues, they can produce phytotoxic metabolites and translocate them via the xylem sap, and cause canopy disease symptoms, including leaf wilt and chlorosis (Bertsch et al. 2013). Apart from being pathogens, Botryosphaeriaceae species are also saprobes and have been isolated from dead aerial branches, stems, and leaves from urban and forest tree species (Dissanayake et al. 2017; Phillips et al. 2013). In addition, species may be endophytes that colonize bark and internal woody tissues without causing any obvious symptoms for extended periods until stress commences (Slippers and Wingfield 2007; Osorio et al. 2017).

A previous study based on comparative genome analysis suggested that the ancestors of Botryosphaeriaceae originated as saprobes and underwent at least three transitions from saprobic to phytopathogenic lifestyles (Yu et al. 2022). Furthermore, Rathnayaka et al. (2023) suggested that ancestors of Botryosphaeriaceae may originate as endophytes and then switch into saprobes or pathogens. Bhunjun et al. (2023) proposed that all fungi have endophytic ancestors.

Studies have found Botryosphaeriaceae species are enriched with genes with biochemical functions such as cell wall degradation, nutrient uptake, secondary metabolism, and membrane transporters. Importantly, these functions are closely related to colonizing (endophytic lifestyle), killing (pathogens), and/or decomposing (saprobes) host tissues (Yan et al. 2018; Garcia et al. 2021; Wang et al. 2018; Nagel et al. 2021), suggesting that Botryosphaeriaceae have with different lifestyles. It has been observed that clade specific gene expansion occurs in this family which are involved in toxin production and mobilization, wood degradation, and nutrient uptake. This could contribute to the virulence differences in Botryosphaeriaceae (Garcia et al. 2021). In addition, Nagel et al. (2021) found secreted hydrolytic enzymes and secondary metabolite biosynthetic gene clusters in Botryosphaeriaceae which contribute to plant–pathogen interactions. However, whether these genes existed in their ancestral states or evolved from their saprobic/endophytic ancestors is still unclear.

Woody tissues are enriched with lignin, a source of phenolic compounds, that can provide mechanical strength to the cell wall (Yadav and Chattopadhyay 2023; Tribot et al. 2019). Phenolics are aromatic compounds that are commonly found in woody tissues as secondary metabolites and are generally involved in defenses against plant pathogens (Tanase et al. 2019). Phenolic compounds may influence reactive oxygen species (ROS) and iron homeostasis in plant–pathogen interactions, which are also important in controlling fungal colonization and proliferation (Karamanoli et al. 2011; Kumar et al. 2020). Recent studies have indicated that pathogens and avirulent pathogens induce different plant defense strategies during infection. Avirulent pathogens can induce a biphasic ROS production in plants, but virulent pathogens only induce the first phase ROS wave during interactions (Shetty et al. 2007; Mengiste 2012). However, the mechanisms underlying how Botryosphaeriaceae species evolved these genes to adapt to woody pathogenic infections from their ancestors remains unclear.

Horizontal gene transfer or lateral gene transfer refers to the transfer of genetic material among phylogenetically distant species (Soucy et al. 2015). This is a universal phenomenon in bacterial and archaeal genomes and is an important mechanism by which these organisms gain new pathogenicity-related genes (Arnold et al. 2022; Gophna and Altman-Price 2022). However, horizontal gene transfer is not as common in eukaryotes as in bacteria or archaea (Fitzpatrick 2012; Richards 2011). Nevertheless, increasing evidence implicates horizontal gene transfer as an important mechanism in fungal evolution and in providing fungi species with the capacity to adapt to new lifestyles, environments, and hosts (Soanes and Richards 2014; Dhillon et al. 2015; Qiu et al. 2016; Milner et al. 2019). Therefore, horizontal gene transfer may also be an important evolutionary driver in Botryosphaeriaceae in lifestyle changes. Mitogenome analysis of Botryosphaeria species revealed that nearly half of the mitogenome introns were most likely obtained through horizontal gene transfer from other organisms (Wang et al. 2021). However, the extent to which horizontal gene transfer has contributed to the evolution of Botryosphaeriaceae remains largely unknown.

Here, we investigated the genome evolution and origin of pathogenicity in Botryosphaeriaceae using comparative genomics, transcriptomics, and horizontal gene transfer analysis. For the comparative genomic analysis, we sequenced four Botryosphaeriaceae genomes and combined them with another 14 Botryosphaeriaceae species. In addition, 56 previously published phylogenetic closely related dothideomycete genomes, including plant pathogenic fungi and saprobes were included in the analysis. These analyses provide a unique opportunity to explore pathogenicity- and detoxification-related genes in Botryosphaeriaceae, and illuminate the mechanisms of how Botryosphaeriaceae evolved from saprobic or endophytic ancestors to woody pathogenic infections.

Materials and methods

Strains, genome sequencing, and assembly

The four strains evaluated in this study, namely Botryosphaeria dothidea, Neofusicoccum parvum, Lasiodiplodia theobromae, and L. pseudotheobromae, were collected from Vitis vinifera plants grown in different Chinese provinces (Table S1). The strains were identified based on the traditional and genomic taxonomic approaches, and deposited at the culture collection of Beijing Academy of Agricultural and Forestry Sciences in Beijing, China.

Long-read sequencing was conducted using total genomic DNA from the four strains. The DNA was extracted using the CTAB method and then fragmented using 26 G blunt-end needles. Fragments greater than 20 kbp were then selected via Blue Pippin electrophoresis (Sage Sciences, Beverly, MA, USA). The DNA libraries were then prepared using the Pacific Biosciences Express Template Prep Kit and sequenced on the PacBio Sequel platform (Pacific Biosciences of California, Menlo Park, CA, USA). A total of 4.04–5.31 Gbp of raw data were generated for each strain and then quality-filtered using the SMRTlink program (version 5.0, https://www.pacb.com/support/software-downloads), with the parameters –minLength = 50 and --minReadScore = 0.8. The N50 lengths of the long subreads ranged from 10.2 to 10.8 kbp. Short-read sequencing was also conducted using the Illumina X Ten platform to generate 150-bp paired-end sequence libraries (average fragment size of 500 bp). The quality of the raw short-read sequence was assessed using FastQC (Wingett and Andrews 2018), followed by quality-filtering and removal of low-quality and short reads with Trimmomatic (version 0.30) (Bolger et al. 2014).

The long PacBio reads were first corrected using the Canu software program (Koren et al. 2017). Three different assemblers were independently used to assemble the long-read data, using default parameters for the initial genome assemblies, namely Canu (version 2.1), WTDBG2 (Ruan and Li 2020), and MASURCA (version 3.3.8) (Zimin et al. 2013). Canu and WTDBG2 were only used to assemble the PacBio long reads, while MASURCA was used to produce a hybrid assembly with the PacBio and Illumina long and short reads. The initial genome assembly was then polished with Pilon (version 1.23) (Walker et al. 2014) using the Illumina short reads. The draft Canu assemblies were separately merged with the MASURCA and WTDBG2 assemblies using the Quickmerge program (version 0.3) (Solares et al. 2018). Pilon was again used to polish the merged assembly for further analysis. The BUSCO program (version 3.0.2) was used to evaluate the completeness of each assembly using the fungi odb9 dataset and a total of 290 BUSCO genes (Manni et al. 2021). The Illumina short reads were mapped to each of the assemblies using the BWA aligner (version 0.7.17) to further evaluate assembly quality (Li and Durbin 2010).

Genome annotation and gene prediction

To annotate the repeat sequences within each genome, a de novo repeat library was constructed for each genome assembly using RepeatModeler (version 1.0.11) (Saha et al. 2008), and RepeatMasker (version 4.0.9, http://www.repeatmasker.org) was then used to screen and annotate the whole genome. The BRAKER pipeline (version 2) was used to predict protein-coding genes in each genome using the model of proteins that considers any evolutionary distance (Bruna et al. 2021). The BRAKER2 protein prediction combined with the two gene prediction programs GeneMark-EP+ and AUGUSTUS allowed greater prediction accuracy than MAKER2 alone. The model was run using the OrthoDB fungal protein database (version 10) (Kriventseva et al. 2019) and with the parameter --softmasking.

Gene functional annotation

Carbohydrate active enzymes (CAZymes) were classified using the dbCAN (version 2) classification system (Zhang et al. 2018). The HMMER (with parameters --hmm_eval 1e-15 --hmm_cov 0.35), DIAMOND (with parameter --dia_eval 1e-102), and Hotpep (with parameters --hotpep_hits 6 --hotpep_freq 2.6) program algorithms were combined to predict CAZyme genes, and only results that were identified by at least two methods were used for further analysis. Putative secondary metabolite biosynthesis clusters were identified with antiSMASH (version 5.2.0) (Blin et al. 2021), while candidate cytochrome P450s were identified with InterProScan (version 5.52-86.0) (Jones et al. 2014) using the PFAM domain PF00067 as the reference. Secreted proteins typically contain secretion signals that were identified using the SignalP (version 4.1) (Almagro Armenteros et al. 2019) and TargetP (version 2) (Emanuelsson et al. 2007) programs to identify the lack of transmembrane domains, as previously described (Krogh et al. 2001). Only proteins with signal peptides predicted by both SignalP and TargetP, along with the lack of identified transmembrane domains outside the signal peptide that would be predicted by TMHMM, were regarded as predicted secreted proteins. The secreted protein datasets were then used to identify candidate effectors using EffectorP (version 2) (Sperschneider et al. 2016). Putative proteases were identified and classified using the MEROPS database (release 12.3) with a BLASTp approach by using an E-value cutoff of 1e−5 (Rawlings et al. 2018).

InterProScan (version 5.52-86.0) (Jones et al. 2014) was used to predict protein domains and Gene Ontology (GO) annotations for each encoded proteome. The GO terms were assigned to a gene family only when they occurred in at least three of the four strains, followed by enrichment analysis using hypergeometric tests. Those GO terms with at least three genes or gene families and with adjusted P-values < 0.05 were considered statistically significantly enriched. The IPRSCAN terms in the InterProScan results for each genome were mapped to the Fungal Transcription Factor database and used to predict transcription factors (Park et al. 2008). Transporters were then predicted by mapping the PFAM terms from the InterProScan results to the Transporter Classification Database (Saier et al. 2021).

Detoxification-related gene prediction

The reviewed Swiss-Prot database was used to predict genes involved in detoxification systems using BLASTp searches and with a cutoff of P < e−10. To identify genes involved in phenolic compound degradation, the salicylate hydroxylase amino acid sequence of FgShy1 from Fusarium graminearum (Rocheleau et al. 2019), the predicted salicylate hydroxylase (NAHG-Anid) from A. nidulans (AN2114 [http://www.aspergillusgenome.org/cgi-bin/locus.pl?locus=AN2114&organism=A_nidulans_FGSC_A4]), and the predicted salicylate hydroxylase (SH) from L. theobromae (comp12473_c0_seq1) (Paolinelli-Alfonso et al. 2016) were added to the Swiss-Prot database to identify the salicylate hydroxylase SHY1. In addition, the amino acid sequence of a suggested salicylate 3-hydroxylase from A. nidulans (Martins et al. 2015) was added to the Swiss-Prot database to identify the salicylate hydroxylase SHY3. Furthermore, the amino acid sequence of DhbA from the A. niger NRRL3 genome (NRRL3_4385) was used to identify DHBA homologs (Lubbers et al. 2021). Lastly, the amino acid sequence of the four intradiol ring cleavage dioxygenase genes NRRL3_02644, NRRL3_04787, NRRL3_01405, and NRRL3_05330 of A. niger (Semana and Powlowski 2019) and the predicted IRCD gene of L. theobromae (comp4276_c0_seq1) (Paolinelli-Alfonso et al. 2016) were added to the Swiss-Prot database to identify IRCD genes.

Genomic data collection

We also collected 14 other Botryosphaeriaceae species across six genera. Among these, the genome sequence of Diplodia corticola was downloaded from the RefSeq database, and the other 13 genome sequences were downloaded from the GenBank. Gene identification and protein annotation for these 13 species were performed using the same workflow described in the previous section.

Fifty-eight other phylogenetic closely related Dothideomycetes genomes were included in the comparative analyses, including Phyllosticta citriasiana, Aplosporella prunicola, and Saccharata proteae; non-Botryosphaeriaceae Botryosphaeriales in another three families; and 56 non-Botryosphaeriales Dothideomycetes including 21 plant pathogens and 30 saprobes across seven orders and two saprobes in unknown orders of Dothideomycetes. We only included species with predicted lifestyles as plant pathogens or saprobes based on all the methods used in a previous study of 101 Dothideomycetes genomes (Haridas et al. 2020). The detailed species information and their lifestyle prediction can be found in Table S2.

Phylogenomic analyses

Proteins without premature stop codons and with over 50 amino acids were extracted for gene family analysis from all of the 76 genomes (including Pyricularia oryzae and Aspergillus flavus). Proteins were clustered using the all-to-all DIAMOND method using an E-value threshold of 10−5 with OrthoFinder (Emms and Kelly 2019), resulting in 30,694 identified gene families. A total of 968 protein sequences were randomly sampled from 1936 conserved single-copy genes in 41 species, including 18 Botryosphaeriaceae, 21 Dothideomycetes, P. oryzae, and A. flavus. The proteins were then used to construct a species tree, as described below.

The MAFFT aligner (version 7.313) was used to separately align each gene using default parameters (Katoh and Standley 2013). Spurious sequences or poorly aligned regions were removed from the multiple sequence alignments using trimAl (version 1.2) with the parameter setting of ‘-automated1’ (Capella-Gutierrez et al. 2009), followed by a concatenation of the 968 alignments into a single matrix. The RAxML-NG program was then used to conduct Maximum Likelihood phylogenetic inference using the following parameters: --all, --model LG+I+G4, and --bs-trees 100 (Kozlov et al. 2019). The phylogenetic trees were rooted using P. oryzae and A. flavus sequences.

Gene family expansion and contraction analyses

The Markov Chain Monte Carlo method within MCMCTREE of PAML66 (version 4.9i) was used to estimate the divergence times of fungal species (Yang 1997). Fossil times for the divergences of P. oryzae and A. flavus (241–402 million years (Mya)) were used based on the Time Tree database (http://www.timetree.org), with the crown age of Botryosphaeriales set to 110 Mya (Phillips et al. 2019). Independent clock rates (clock = 2) and a constraint on the root age (RootAge < 500 Mya) were set in the control file. The MCMCTREE program was then run using the parameter settings of burnin = 20,000, sampfreq = 100, and nsample = 100,000. Two independent runs were performed to assess the convergence of the results.

The Computational Analysis of gene Family Evolution (CAFÉ) program (version 5.0) was then used to evaluate the evolution of gene family size (De Bie et al. 2006). The CAFÉ program uses a random birth and death process to identify gene gains and losses and may provide false results if the included gene families do not have data in the tree regarding their most recent common ancestor (MRCA). A total of 20,531 gene families were present in the 41 species’ genomes and were used in the gene family expansion and contraction analysis. The phylogenetic time tree obtained (as described above) was used to assess gene family evolution herein.

Transcriptomic sequencing

The L. theobromae strain Las1 cultures were used for inoculation experiments onto detached Vitis vinifera cv. ‘Summer Black’ branches. Briefly, green grapevine shoot wounds were inoculated with 5-mm-diameter mycelium plugs that were cut with a cork borer and then maintained under alternating dark/light cycles inside a growth chamber with a humidity of 80% and a temperature of 25 °C. The inoculated grapevine tissues were collected at 6, 24, and 72 h post-inoculation, while fungal mycelia were collected as controls from the potato dextrose agar (PDA) plates at 6 h post-culturing and subjected to subsequent RNA-Seq analyses. Total RNA was extracted from the samples, processed, and used to generate Illumina RNA-Seq libraries following standard protocols at Novogene Company. Paired-end sequencing was then performed to generate 150-bp paired-end reads for each library using three independent biological replicates. The RNA-Seq reads were quality-filtered and mapped to the Las1 genome using the Hisat2 aligner with default parameters (Kim et al. 2015). Differential expression was subsequently assessed using the R/Bioconductor package limma (v.3.0.8) (Ritchie et al. 2015). Genes with an adjusted P-value < 0.05 and a fold change > 1.5 were considered differentially expressed.

Gene expression data were also retrieved from a previously published study of the L. theobromae isolate AM2As before inoculation and during infection of cacao leaf disks for 48 h (Ali et al. 2020). This dataset did not contain raw RNA-Seq data, and so to compare the AM2A genes to those of Las1, their proteomes were reciprocally compared using BLASTp analysis to identify orthologs in these two isolates. Previously generated transcriptomes for L. theobromae-infected wounded tea leaves for 48 h and before inoculation were also included for comparison (Jiang et al. 2021). The raw RNA-Seq data from this study were mapped to the Las1 genome, and the same pipeline described above was used to perform differential expression analysis.

Horizontal gene transfer analysis

The NCBI non-redundant (Nr) database was used to investigate inter- and intra-kingdom horizontal gene transfers within the newly sequenced four genomes of this study as well as by using phylogenomics and phylogeny-independent approaches. To facilitate the calculation of horizontal gene transfer event transfer times, the proteomes of the four species of this study and of the 14 other Botryosphaeriaceae species that were predicted using our gene prediction pipeline were added to the Nr database and combined as an Nr+ database for subsequent horizontal gene transfer analysis. The horizontal gene transfers originating from distantly related donors outside Dothideomycetes were considered and included potential inter-kingdom donors of viruses, archaea, bacteria, animals, and plants. In addition, potential intra-kingdom donors included the non-Ascomycota donors of Mucoromycota and Basidiomycota. Furthermore, the intra-kingdom Ascomycota donors of Taphrinomycotina, Saccharomycotina, Sordariomyceta, and other non-Dothideomycetes classes in Pezizomycotina were also included in the horizontal gene transfer analysis (Table S3). Non-Botryosphaeriales species within Dothideomycetes were considered the closest outgroup taxa and a reference for identifying horizontal gene transfers among the four Botryosphaeriaceae species genomes. Sixteen non-Botryosphaeriales orders are present within Dothideomycetes, and 110 species have well-represented genomes in the NCBI Nr database. Thus, these datasets are likely sufficient to mitigate candidate horizontal gene transfers that could be erroneously inferred due to under-sampled lineages.

To construct single-gene phylogenetic trees for each of the four strains, their protein sequences were searched against the Nr+ database using the Diamond BLASTp method (e-value = 10−10). Genes with > 10 matches in a putative non-Dothideomycetes donor group were retained for downstream phylogenomic analyses. The retained genes were further filtered using the following criteria. First, the protein sequence identity must be above 20% between the putative donor group and the recipient group. Second, for the top 50 best non-Botryosphaeriales BLAST hits in each query gene, at least 80% of the species must belong to one of the donor groups listed above, or at least 90% of the species must be within Sordariomyceta and Eurotiomycetes. Third, for each gene, the frequency values of non-Botryosphaeriales Dothideomycetes species within the top 500 BLASTp hits must be less than 30%, and at least 50% of the species must belong to one of the donor groups listed above, or must be within the Sordariomyceta and Eurotiomycetes in the top 500 BLASTp hits. Fourth, the Alien Index (AI) (Fan et al. 2020) score must be more than 0. The AI was calculated as follows:

$$\mathrm{AI}=\mathrm{ln}\left(\mathrm{DEvalue}+\frac{1}{{10}^{200}}\right)-\mathrm{ln}\left(\mathrm{OEvalue}+\frac{1}{{10}^{200}}\right)$$

In the equation, the DEvalue is the e-value of the best hit from the sequence of the closest non-Botryosphaeriales species within Dothideomycetes from the BLASTp search results. The OEvalue is the e-value of the best hit from more distant species outside Dothideomycetes. Higher AI scores indicate greater possibilities of genes coming from more distant species than from Dothideomycetes species.

Putative sequencing contamination was removed from the horizontal gene transfer results using two criteria. First, candidate genes must be located on long assembled scaffolds (> 20 kbp) and not at the end of a scaffold. Second, candidate genes could also be found in at least one of the other publicly available genomes from closely related species.

For each BLASTp result in the filtered candidate horizontal gene transfer gene dataset, the amino acid sequences of the 500 best matches were then retrieved. The retrieved homologous sequence sets were then aligned with the query gene using MAFFT, followed by the removal of poorly aligned regions with TrimAI, as described above. Trimmed alignments with lengths < 50 amino acids were discarded. Maximum Likelihood trees of the homologs were constructed using IQ-TREE (v.1.6.9) (Nguyen et al. 2015) and with an automatically selected best-fit amino acid substitution model. Standard non-parametric bootstraps (-b 1000) were used to assess branch support with the parameter setting of ‘-gt 0.4’. FastTree (version 2.1.10) was also run using the parameters -lg -spr 4 -mlacc 2 -slownni, with the branch support values estimated using the Shimodaira-Hasegawa test to confirm the IQ-TREE (Price et al. 2009).

When the query genes of Botryosphaeriales species were nested within the donor groups with at least two confident nodes both in IQ-TREE and FastTree (Fan et al. 2020), their IQ-TREE trees were then manually checked to infer horizontal gene transfers. The transfer time was predicted before or after Botryosphaeriaceae lineage formation based on the Botryosphaeriales species in the gene trees using the Botryosphaeriales phylogenetic time tree as a reference. Sequences of all horizontal gene transfer genes in the four Botryosphaeriaceae species in the same protein ortholog groups were extracted and merged with the sequences of the 500 best BLASTp matches in one randomly selected horizontal gene transfer gene in the ortholog group. Ortholog horizontal gene transfers gene trees were then constructed using the same workflow described above and visualized with the command version of iTOL v5 (Letunic and Bork 2021).

Results

Genome assembly of four Botryosphaeriaceae genomes

Genome sequencing was conducted for the four Botryosphaeriaceae strains, including Botryosphaeria dothidea (Bod1), Neofusicoccum parvum (Neo1), Lasiodiplodia theobromae (Las1), and L. pseudotheobromae (Las3), using a combination of PacBio Sequel and Illumina sequencing platforms, and were assembled to 24 and 178 scaffolds, with 43–47 Mbp of assembled data per genome (Tables S1, S4). The BUSCO scores were > 99 for all assemblies and > 99.5% of the Illumina reads could be mapped to the assembled genomes, suggesting a high degree of completeness for all four genomes (Table S4). The four genomes were predicted to encode 13,105–13,523 genes, comparable to the representative genomes publicly available in the GenBank or RefSeq databases. Blast analysis revealed that 92% of the predicted L. theobromae genes were one-to-one mapped genes in the RefSeq representative genome, suggesting a high quality of gene prediction in this study (Table S4).

Widespread gene family expansion before the divergence of Botryosphaeriaceae genera

Phylogenetic analysis based on a concatenated alignment of 968 protein sequences encoded by conserved single-copy genes in the genomes of 18 Botryosphaeriaceae, three Botryosphaeriales plant pathogens in other families, and 18 non-Botryosphaeriales Dothideomycetes confirmed the position of Botryosphaeriales within Pleosporomycetidae (See section “Materials and methods”, Fig. 1a). Further, the saprobes Coniosporium apollinis, Patellaria atrata, Verruconis gallopava, Mytilinidion resinicola, and Hysterium pulicare were their closest relatives. After fixing the crown age of Botryosphaeriales at 110 Mya following Phillips et al. (2019), the ages of the 18 Botryosphaeriaceae taxa were estimated at 39 Mya, with their divergence from Phyllostictaceae estimated at 67 Mya.

Fig. 1
figure 1

Gene family expansion in Botryosphaeriaceae. a Gene family gain and loss analysis of Botryosphaeriaceae and 23 other fungal species. Bubbles correspond to gained or lost gene families inferred in that node. The text below the nodes represents different ancestors of Botryosphaeriales at different evolutionary stages. b The genome length of Botryosphaeriaceae members and their phylogenetic closely related plant pathogens and saprobes. c GO enrichment analysis of gene functions in gene families that were gained at different stages of Botryosphaeriales evolution listed in panel a

Based on the phylogenetic tree, the 18 Botryosphaeriaceae species could be classified into four clades: five Neofusicoccum species were in clade I, four Lasiodiplodia species were in clade III, and five Diplodia species were in clade IV. Clade II had three genera including Neoscytalidium, Macrophomina, and Botryosphaeria. The genome lengths of all four clades were significantly expanded compared with those of the three non-Botryosphaeriaceae pathogenic species in Botryosphaeriales or their closest saprobic relatives (Fig. 1b). A high variation of transposable element (TE) content was identified in these four clades, and no significant differences were found compared with other Botryosphaeriales families or their closest saprobic relatives, suggesting that TEs may not have driven their genome expansion (Fig. S1a). In addition to clade IV, the other three clades showed relatively higher gene numbers than species in other Botryosphaeriales families or their closest saprobic relatives, further suggesting genome expansion in Botryosphaeriaceae (Fig. S1b).

Gene family expansion and contraction analysis revealed abundant gene gains in the MRCA of Botryosphaeriaceae. Specifically, a total of 652 gene family gains were identified in the MRCA of Botryosphaeriaceae, compared with 14 losses. More recent extensive gene gain events were also observed in the MRCA of clade I (531 gains vs. 40 losses), the MRCA of clade II (141 gains vs. 18 losses), and the MRCA of clade III (328 gains vs. 19 losses; Fig. 1a). Significant gene gain was also identified in the MRCA of Botryosphaeria in clade II (297 gains vs. 67 losses). However, comparable gene gains and losses were found in the early ancestor of clade IV, and more genes were lost in the later ancestor of clade IV.

The GO analysis revealed that gene families gained in stages before and after Botryosphaeriaceae diverged from Phyllostictaceae (also of Botryosphaeriales) were particularly enriched in genes encoding proteins involved in transmembrane transport and oxidoreductase activity (Fig. 1c). However, GO terms in response to oxidative stress and siderophore biosynthetic processes in iron detoxification systems were enriched among the gene families gained only within the MRCA of Botryosphaeriaceae.

Expansion of putative pathogenicity-related genes in Botryosphaeriaceae

To characterize the gene repertoires involved in invading host plants, gene families associated with host infection and pathogenicity including those encoding CAZymes, effectors, secondary metabolism enzyme clusters (SM), proteases, cytochrome P450 enzymes, transcription factors (TFs), and transporters for 18 Botryosphaeriaceae were compared against 56 other phylogenetic closely related Dothideomycetes genomes comprising plant pathogens (24) and saprobes (32, See section “Materials and methods”, Table S2). Botryosphaeriaceae harboured a greater abundance of genes encoding CAZymes that can potentially degrade plant cell walls than other plant pathogens and saprobes (P < 0.01, Wilcoxon rank sum test; Fig. 2a and Table S5). Cytochrome P450 enzymes, TFs, and transporters were also significantly expanded in the genomes of these 18 Botryosphaeriaceae species compared with other plant pathogens and saprobes (P < 0.05, Wilcoxon rank sum test; Fig. 2a and Fig. S2). However, a significant expansion of effectors was not observed in Botryosphaeriaceae genomes and was comparable to saprobes but was significantly lower than other plant pathogens (Fig. 2a).

Fig. 2
figure 2

Analysis of pathogenicity-related genes in Botryosphaeriaceae. a Boxplots showing the distribution of total genes in different pathogenicity-related groups among the 74 genomes analyzed in this study. Wilcoxon rank sum tests were used to separately compare the gene numbers in Botryosphaeriaceae genomes with those of non-Botryosphaeriaceae Dothideomycetes. Botry: Botryosphaeriaceae; Pathogen: other non-Botryosphaeriaceae plant pathogens. ***: P < 0.001; **: P < 0.01; and *: P < 0.05. b Heatmap showing the gene numbers of different pathogenicity-related families among the 74 species analyzed in this study. We randomly selected five and six species from 25 pathogens and 31 saprobes for heatmap analysis, respectively. The last two columns on the right show the statistical significance of the comparison of Botryosphaeriaceae with all Pathogen and Saprobe species. *: P < 0.05, 0.01, or 0.001

A total of 40 and 34 CAZyme families were identified in Botryosphaeriaceae that exhibited higher gene numbers than other plant pathogens and saprobes, respectively (P < 0.001, Wilcoxon rank sum test; Table S6). These gene families included plant cell wall-degrading enzymes (PCWDEs) that can degrade chitin, cellulose, hemicellulose, pectin, and lignin, which play critical roles in breaking down plant cell walls during host–pathogen interactions (Kubicek et al. 2014). Among these gene families, five families (GH1, GH3, GH12, GH29, and GH43) can degrade cellulose or hemicellulose and five families (GH28, GH78, PL1, PL3, PL4, and PL9) that can degrade pectin, in addition to two families (AA3 and MCO) involved in lignin degradation, were significantly expanded in Botryosphaeriaceae compared with other plant pathogens and saprobes (Fig. 2b and Table S6).

Further analysis revealed the presence of three transporter superfamilies that were enriched in Botryosphaeriaceae, namely APC, LysE, and TOG superfamilies (Fig. 2b). Families related to SMs (3), proteases (21), TFs (3), and P450s (20) were also found to be enriched in Botryosphaeriaceae genomes in comparison to other plant pathogens and saprobes (P < 0.001; Table S6). These results collectively suggest that Botryosphaeriaceae exhibit special pathogenicity-related gene repertoires that are used for infecting hosts.

Expansion of iron detoxification genes in Botryosphaeriaceae

During the early stages of plant–microbe interactions, fungi need to scavenge ROS to successfully invade their hosts. However, we did not find that the number of these genes in Botryosphaeriaceae was higher than that in other plant pathogens or saprobes, except for Trx homologs in thioredoxin system (P < 0.001, Table S7). Fungi can produce low-molecular-weight (500–1500 Da) ferric-iron-specific chelating siderophores that uptake iron from their hosts. Particularly, ornithine-N5-monooxygenase (Siderophore biosynthetic enzyme A, SIDA) is the enzyme responsible for the first step of fungal siderophore biosynthesis (Haas et al. 2008). Relatively lower numbers of SIDAs were observed for Botryosphaeriaceae than for other plant pathogen and saprobe fungal genomes (P < 0.001; Fig. S3 and Table S7). Further analysis revealed the presence of homologs for enzymes for intracellular (SIDC) and extracellular (SIDD) siderophore biosynthesis that are involved in subsequent biosynthesis steps in Botryosphaeriaceae genomes, and these families were significantly expanded in Botryosphaeriaceae when compared with other plant pathogen and saprobe genomes (P < 0.001; Fig. 3a).

Fig. 3
figure 3

Detoxification-related genes in Botryosphaeriaceae. a Boxplots showing the distribution of total gene numbers in families related to iron siderophore biosynthesis and transport. b Gene numbers in families related to phenolic compound degradation pathways. ***: P < 0.001; **: P < 0.01; and *: P < 0.05. c Pathogenicity analysis of L. theobromae Las1 during the infection of grapevine branches. Assays were conducted by inoculating 1-year-old wounded grapevine shoots with 5-mm-diameter mycelial plugs, followed by incubation at 25 °C. The inoculated shoots were then photographed at 4, 8, 24, and 72 h post-inoculation. I: wounded and inoculated shoots; CK: wounded shoots without inoculation. d Transcriptional analysis of detoxification-related genes of L. theobromae Las1 during grapevine branch infection. Dual RNA-Seq (host and pathogen RNA sequenced together) was conducted by inoculating grapevine shoots with Las1 for 6, 24, and 72 h. Fungal transcriptomic reads were mapped to the Las1 genome and compared with fungal mycelia that were collected from PDA 6 h post-culturing. Before (B): fungal gene expression from cultivation in PDA; after (A): fungal gene expression after inoculation into grapevine branches. The last three columns on the right show the statistical significance of differential gene expression between inoculated samples and cultures grown on PDA. 1: significant difference, where the expression of inoculated samples is greater than that of cultures grown on PDA; − 1: significant difference, where the expression of inoculated samples is lesser than that of cultures grown on PDA; and 0: no difference in expression

The siderophore iron transporter (SIT) gene family is a subfamily of the major facilitator superfamily, and its products are used by fungi to take up siderophore iron chelates during infection (Haas et al. 2008). A total of 743 SIT homologs were identified among the 76 fungal genomes in this analysis (including Pyricularia oryzae and Aspergillus flavus), with Botryosphaeriaceae encoding more members of the SIT family compared with other plant pathogen and saprobe fungal genomes (P < 0.001; Fig. 3a and Table S7). Phylogenetic analysis of these 743 homologs classified them into five groups, comprising three with homology to the MIRA, MIRB, and MIRC SIT families of Aspergillus nidulans, one group homologous to the MFS1 SIT family of Ceriporiopsis subvermispora, and another group homologous to the SIT1 family of Aspergillus fumigatus (Fig. S5 and Table S8). Fungi use different SIT families to chelate different siderophore substrates (Haas et al. 2003; Park et al. 2016; Raymond-Bouchard et al. 2012). The MIRA, MIRB and MFS1 of these five SIT families were significantly enriched (P < 0.001) in Botryosphaeriaceae genomes compared with other plant pathogen and saprobe genomes (Fig. S5). Thus, Botryosphaeriaceae may have a more robust capacity to transport special siderophore substrates to maintain iron homeostasis during infection.

Siderophore-chelated iron can also be transported via the reductive iron assimilation (RIA) pathway. Gene families in this pathway were also expanded in Botryosphaeriaceae compared with other plant pathogen and saprobe fungal genomes (P < 0.001; Table S7). Thus, the RIA pathway may also be important for balancing iron levels in Botryosphaeriaceae during host infection.

Expansion of phenolic degradation genes in Botryosphaeriaceae

The degradation of host salicylic acid (SA) plays an important role in overcoming host defense during plant–fungal interactions. In this pathway, SA is first converted to catechol by a salicylate hydroxylase (SHY1) or 3-hydroxylated to 2,3-dihydroxybenzoic acid by salicylate 3-hydroxylase (SHY3), followed by decarboxylation to catechol by 2,3-dihydroxybenzoate decarboxylase (DHBA). Comparative analysis revealed a significantly higher number of SHY3 and DHBA homologs encoded by Botryosphaeriaceae genomes than by other plant pathogen and saprobe genomes, suggesting a greater expansion of SA-degrading genes in Botryosphaeriaceae (P < 0.001; Fig. 3b and Table S7). A significantly higher number of SHY1 homologs were also identified in Botryosphaeriaceae (P < 0.001; Fig. S3) than in other plant pathogens; however, this number was only moderately higher than that identified in saprobes (P < 0.05).

The catechol ring is then opened by an intradiol ring cleavage dioxygenase (IRCD) catechol 1,2-dioxygenase (Lubbers et al. 2019, 2021; Rao et al. 1967; Martins et al. 2015). Fungi can use IRCDs to degrade different phenolic metabolites that are components of plant defense. To identify IRCDs encoded by the 76 fungal genomes analyzed here, four well-annotated A. niger IRCD genes (IRCD1–4) including IRCD4 in SA degradation and a reported IRCD gene of L. theobromae (IRCD5) with unknown function were used to identify their homologs (Semana and Powlowski 2019; Paolinelli-Alfonso et al. 2016). A total of 535 IRCD genes were identified in 76 genomes, with Botryosphaeriaceae encoding more IRCD genes than other plant pathogen and saprobe genomes (P < 0.001; Fig. 3b). Phylogenetic analysis classified these 535 IRCDs into four groups, including three with homology to four A. niger IRCDs and a group with homology to IRCD5. The IRCD5-encoding genes were significantly more prevalent in Botryosphaeriaceae genomes than in other plant pathogen and saprobe fungal genomes (P < 0.001; Fig. S7a and Table S8). Phylogenetic analysis indicated that the IRCD5 members were more closely related to the spider mite Tetranychus urticae IRCDs, which can degrade jasmonic acid (JA) (Fig. S7b) (Njiru et al. 2022). Overall, these results suggest that genes involved in degrading SA and other phenolic compounds were significantly expanded in Botryosphaeriaceae genomes.

Genome expansion in Botryosphaeriaceae contributed to plant–pathogen interactions

We characterized the function of expanded gene families in the MRCA of Botryosphaeriaceae. A greater abundance of genes encoding CAZymes, transporters, SMs, and proteases was encoded by gained gene families identified in the MRCA of the 18 Botryosphaeriaceae members compared with the genomic background (Fig. S8a–d; P < 0.001). Furthermore, genes in iron detoxification and phenolic compound degradation were enriched in gene families expanded in the MRCA of the 18 Botryosphaeriaceae members compared with the genomic background (Fig. S8e–h; P < 0.001).

To further investigate the functions of expanded gene families during host infection, the stems of V. vinifera var. ‘Summer Black’ were infected with L. theobromae, and the mixed plant–fungi transcriptomes were evaluated at 6, 24, and 72 h post-infection based on the pathogenicity phenotypes in grapevine branches (Fig. 3c). The resultant fungal transcriptome was compared against a control transcriptome from L. theobromae maintained in PDA medium under the same conditions (Table S9).

We identified comparable differential expressed genes in L. theobromae in each of the post-infection stages compared to the control transcriptome (Fig. S9a). Most of the induced genes were enriched in oxidation–reduction process, cellular aromatic compound metabolic process, transmembrane transport, amino acid transport, proteolysis and so on (Fig. S9b). The frequency of induced genes in expanded gene families during infection compared to those grown on PDA before infection was significantly higher than the genomic background (Fig. S9c).

Considering the iron detoxification systems of the fungi, SIDAs and SIDCs were significantly repressed in L. theobromae during the infection grapevine than grown on PDA before infection. However, no statistically significant differences were observed for SIDD genes (Fig. 3d). Of the 18 SIT genes encoded by the L. theobromae genome, eight were expressed in the transcriptomes generated here, and most exhibited relatively high expression during infection, with four that were significantly upregulated compared with gene expression before the invasion, suggesting that siderophore transporter genes may help pathogens infect their hosts (Fig. 3d).

Among the 29 predicted phenolic degradation genes in the L. theobromae genome, 18 were expressed in the transcriptomic data (Fig. 3d). Seventeen exhibited significantly upregulated expression after inoculation compared with PDA-grown cultures, suggesting the importance of phenolic degradation for L. theobromae infection of hosts. Of the eight predicted SA-degradation homolog genes (SHY1, SHY3, DHBA, and IRCD4), most of them (six) exhibited their highest expression at 6 h or 24 h after infection (including IRCD4). By contrast, the expression of IRCD5 homologs was induced later than that of the SA-degradation genes, with four most expressed at 24 h and four others peaked at 72 h after infection (Fig. 3d). These results all suggest that the degradation of different phenolic compounds is staggered during L. theobromae infection of its hosts, with SA degradation occurring earlier and the degradation of other phenolic compounds occurring later.

To further characterize the functions of expanded genes in infecting other plants and tissues, two published transcriptomic datasets, namely one for L. theobromae infection of cacao leaves at 48 h (Ali et al. 2020) and another for the infection of tea leaves at 48 h (Jiang et al. 2021), were included to investigate the expression of expanded genes involved in the siderophore iron uptake pathway and phenolic degradation pathway. Comparable frequencies of expressed genes were identified in samples before and after infection of hosts in these three datasets (Fig. S10a). However, more induced genes were identified in the detoxification system compared with genomic backgrounds, particularly in the tea leaf infection dataset (Fig. S10b, c), suggesting that greater numbers of detoxification genes were induced and contributed to their infection within different host–pathogen interaction systems.

Ancient intra-kingdom horizontal gene transfer in Botryosphaeriaceae

To evaluate the presence of horizontal gene transfers in Botryosphaeriaceae (transfer events occurred after Botryosphaeriales diverged from other orders of Dothideomycetes), phylogenomics and phylogeny-independent analytical approaches were used to investigate horizontal gene transfer events within the backgrounds of our four Botryosphaeriaceae genomes because of their high assembly quality. Both inter- and intra-kingdom horizontal gene transfers were investigated. To exclude false-positive horizontal gene transfers that may be identified due to under-sampled lineages, we only considered donors from distant archaeal, bacterial, and other inter-kingdom eukaryotic species, in addition to well classified intra-kingdom fungal species outside Dothideomycetes (Table S3). After a strict filtering procedure (see the “Materials and methods” section and Fig. S11), a total of 36 putative horizontal gene transfer events were identified in the four genomes, comprising 27 in N. parvum, 31 in B. dothidea, 33 in L. theobromae, and 34 in L. pseudotheobromae (Fig. 4a and Tables S10, 11). The putative horizontal gene transfer genes had shorter coding sequences and lower GC contents than the vertical descent genes in the host genome (Fig. 4b; pairwise t-test, P < 0.05) for all four strains, indicating that foreign-derived genes exhibit unique gene features relative to their host genomes, which was also found in HGT genes of other species (Husnik and McCutcheon 2018).

Fig. 4
figure 4

Evidence for horizontal gene transfer in Botryosphaeriaceae. a Inferred horizontal gene transfer (HGT) events in Botryosphaeriaceae. The HGT analysis was conducted for the four Botryosphaeriaceae species sequenced in this study. Numbers in the orange circles refer to inferred HGT genes from corresponding organisms. Numbers in light coral and the medium-light shade of cyan circles at each node refer to HGT events inferred in Botryosphaeriales and Botryosphaeriaceae, respectively, based on the distribution of inferred HGT ortholog genes in species descending from that node. A total of 36 HGT events were identified among all Botryosphaeriales nodes. b Comparison of gene structures including average CDS lengths and CDS GC contents in inferred HGTs with those in non-HGT genes (controls) for the four strains. Pairwise t-tests were used to assess statistical significance. *: P < 0.001. c Donor distribution of the 36 inferred HGT events. d An example of a TrxR HGT gene that was transferred from Sordariomycetes into an early ancestor of Botryosphaeriaceae. The full tree information can be found in the figshare repository (https://doi.org/10.6084/m9.figshare.22224061)

Most of the putative horizontal gene transfer events occurred in the MRCA of the 18 Botryosphaeriaceae species (15 horizontal gene transfer events), with six horizontal gene transfer events occurring early before Botryosphaeriaceae species diverged from other Botryosphaeriales families (Fig. 4a). For example, an inferred horizontal gene transfer event was identified in all four species. It was derived from Proteobacteria and transferred to Botryosphaeriales at an early stage, leading to its ubiquity among Botryosphaeriales and suggesting an ancient transfer from bacteria to the ancestor of Botryosphaeriales (Fig. S12). Putative horizontal gene transfers were also identified for the MRCA of various genera in Botryosphaeriaceae (Fig. 4a), suggesting that horizontal gene transfer events were also common after this family diverged into different lineages. Further analysis indicated that the majority of horizontal gene transfers (29) were from Sordariomycetes, with the rest deriving from bacteria (2) and a rankless taxon Sordariomyceta (taxon that includes the classes of Sordariomycetes, Laboulbeniomycetes and Leotiomycetes; 5; Fig. 4c). The Fig. 4d shows an inferred horizontal gene transfer of a TrxR ROS scavenger gene from Sordariomycetes into Botryosphaeriaceae in the early ancestor of this family. It was ubiquitously found in Botryosphaeriaceae and was further duplicated in Lasiodiplodia. Thus, the gain of foreign intra-kingdom genetic material may be common for Botryosphaeriaceae. Even though Sordariomycetes and Dothideomycetes may share common ancestors in early Ascomycota (Wijayawardene et al. 2018), our results suggested the possibility of continuous gene family loss and gain during the later stages of evolution in Dothideomycetes (Haridas et al. 2020).

Horizontal gene transfers contributed to the expansion of pathogenicity-related genes and detoxification systems in Botryosphaeriaceae

For the 33 inferred horizontal transfer genes in L. theobromae, nearly 30% induced their expression in samples before and after host infection in the cacao and tea leaf datasets, which is significantly higher than their genomic backgrounds (Fig. 5a and Fig. S13a). Further, nine horizontal gene transfer events were associated with the transfer of pathogenicity-related genes, comprising one protease-encoding gene, three transporter-encoding genes, and five TF-encoding genes, and nearly half of them were inferred to have been transferred from donor Sordariomycetes or Sordariomyceta into the MRCA of the 18 Botryosphaeriaceae members (Fig. 5b and Fig. S13b).

Fig. 5
figure 5

Horizontal gene transfer of pathogenicity- and detoxification-related genes in Botryosphaeriaceae. a Frequency of induced genes in L. theobromae upon infection of different plant hosts. Induction was defined as a gene in the inoculation treatment exhibiting significantly higher expression than before inoculation. ***: P < 0.001; **: P < 0.01. HGT: HGT genes; BK: genome background. b HGT analysis of pathogenicity- and detoxification-related genes in Botryosphaeriaceae. c, d An example of an IRCD5-related HGT gene (HGT4) that was transferred from Sordariomycetes into an early ancestor of Botryosphaeriales before Botryosphaeriaceae diverged from Aplosporella prunicola. Las1.g3824.t1 in L. theobromae was used as the query to build the gene tree. c: Circular tree. d: Collapsed rectangular tree. Others: species outside the blue inner circle of the circular tree in c. The full tree information can be found in the figshare repository (https://doi.org/10.6084/m9.figshare.22224061)

Five horizontal gene transfer events were identified for proteins involved in detoxification systems, including one TrxR horizontal gene transfer (Fig. 4d) involved in ROS scavenging, two SIT horizontal gene transfers (HGT1 and HGT2; Fig. 5b) involved in siderophore transport, and two IRCD5 HGTs (HGT3 and HGT4; Fig. 5b) involved in the degradation of phenolics. Similar to the abovementioned inferred horizontal gene transfer events, the donor taxa for inferred horizontal gene transfers in detoxification systems were from Sordariomycetes or Sordariomyceta. Three putative horizontal gene transfers including TrxR, IRCD5 (HGT3), and SIT (HGT1) were gained in the MRCA of the 18 Botryosphaeriaceae species (Fig. S14a). Inferred HGT genes in these detoxification systems were more enriched in the four genomes compared with their genomic backgrounds (P < 0.001; Fig. S14b), with eight, six, seven, and seven of these horizontal gene transfer genes found in the Botryosphaeria dothidea, Neofusicoccum parvum, Lasiodiplodia theobromae, and L. pseudotheobromae genomes, respectively (Fig. 5b). Among the seven HGT genes of L. theobromae, five were significantly upregulated in at least one transcriptome dataset after infection of their plant hosts (Fig. S13a), suggesting that these horizontal gene transfer genes play important roles in host infection by fungi.

HGT4 in IRCD5 was transferred from Sordariomycetes before Botryosphaeriaceae diverged from Aplosporellaceae (Fig. 5c, d). Each of the four species in this study encoded three HGT4 genes in their genomes, and these were inferred to have experienced two duplication events in Botryosphaeriaceae after being transferred to Botryosphaeriales (Fig. 5d). Among the three L. theobromae HGT4 genes, two significantly upregulated their expression in samples after host infection in the cacao and tea leaf datasets than before infection (Fig. S13a). These results suggest that the duplication of genes derived from horizontal gene transfers contributed to the expansion of detoxification systems in Botryosphaeriaceae.

Discussion

In this study, we investigated genome evolution and pathogenicity origin in Botryosphaeriaceae using comparative genomics, transcriptomics, and horizontal gene transfer analysis. For the comparative genomic analysis, genomes of four Botryosphaeriaceae species were sequenced and combined with another 14 Botryosphaeriaceae species. Our newly assembled sequences showed significantly higher N50 values and BUSCO scores than previously assembled genomes in other Botryosphaeriaceae species using only the Illumina sequencing platform (Garcia et al. 2021; Nagel et al. 2021; Yu et al. 2022). In addition, 56 previously published phylogenetic closely related Dothideomycetes genomes of plant pathogenic and saprobic isolates were also included in this analysis. However, it should be pointed out that endophytic strains were not included in this study as they were not available. Bhunjun et al. (2023) has postulated that all fungi have evolved from endophytic ancestors, however, since we do not have endophytic genomic data, this study does not reject or confirm this hypothesis. Therefore, any reference to saprobes also refers to possible endophytes as they could not be distinguished (Oses et al. 2008; Promputtha et al. 2010).

Our analyses of high-quality genomes revealed significant gene family expansion and horizontal gene transfers in the ancient ancestors of Botryosphaeriaceae that may have aided changes in lifestyles by expanding their CAZymes, SMs, proteases, P450s, TFs, and transporters, as well as by expanding their detoxification systems including in iron detoxification and phenolic compound degradation. Phenolic phytohormones like salicylic acid play important roles in the activation of basal defences against plant pathogens (Bhattacharya et al. 2010; Shalaby and Horwitz 2015). To successfully invade woody hosts, Botryosphaeriaceae may evolved different scavenging abilities to suppress or neutralize ROS defences, maintain iron homeostasis, and degrade phenolics to manipulate plant defence responses compared to their saprobic or endophytic ancestors (Yaakoub et al. 2022; Misslinger et al. 2021; Rocheleau et al. 2019).

Further, our isolates had a close relationship to saprobes as their closest relatives (Coniosporium apollinis, Patellaria atrata, Verruconis gallopava, Mytilinidion resinicola, and Hysterium pulicare) which indicates that the ancestors of Botryosphaeriaceae (Botryosphaeriales) might originate from saprobes and began to obtain pathogenicity before Botryosphaeriaceae diverged from other families (Yu et al. 2022). As proposed in Rathnayaka et al. (2023), the ancestors of Botryosphaeriaceae may also originate as endophytes and then switch into saprobes or pathogens. We postulate that during this time, Botryosphaeriaceae may have evolved genes to facilitate their infection of woody plants.

PCWDE are the major virulence determinants in phytopathogenens (Cubitt et al. 2013). Compared to other plant pathogens and saprobes, the expansion of the PCWDE repertoire of CAZymes in Botryosphaeriaceae underpins their ability to act as powerful wood decayers that can efficiently degrade cellulose, hemicellulose, pectin and lignin (Fig. 2b). Widespread gene family expansion were also found in transporters, which were also conceded as pathogenicity factors (Lu et al. 2014; Moktali et al. 2012; Vela-Corcia et al. 2019). The expanded APC superfamilies are involved in transporting amino acids and their derivatives that may help fungi acquire nutrients from their hosts (Jack et al. 2000). The expanded LysE superfamily consists of transmembrane transport proteins that catalyze the export of amino acids, lipids, and heavy metal ions, and are involved in cell physiology and pathogenesis (Tsu and Saier 2015). In our analysis, we found a significantly high number of genome innovation events in the MRCA of the Botryosphaeriaceae, suggesting this family mainly gained their pathogenicity features from their saprobic ancestors in that stage.

Woody tissues contain large quantities of phenolic compounds and lignin (Tanase et al. 2019; Bucher et al. 2004). The expansion of lignin-degrading enzymes suggests Botryosphaeriaceae were adapted to efficiently degrading lignin, which is not found in other woody pathogens that cannot effectively degrade xylem (Yin et al. 2015). Phenolics are aromatic benzene ring compounds produced by plants with critical roles for plant–pathogen interactions (Shalaby and Horwitz 2015). The salicylic acid degradation genes SHY1 and SHY3, in addition to the estradiol ring cleavage dioxygenase IRCD5 that can degrade aromatic compounds, were significantly expanded in Botryosphaeriaceae. Recent studies have found that Botryosphaeriaceae species can quickly metabolize phenolic compounds such as stilbene phytoalexins during grapevine infection, thereby helping fungi bypass the defence responses of the host (Labois et al. 2021; Stempien et al. 2017). The IRCD5 homologs were more closely related to IRCD genes from one herbivorous spider, with the latter induced in response to jasmonic acid defense signaling which can cleave a surprisingly wide array of aromatic plant metabolites (Njiru et al. 2022). These genes were inducted later than SA degradation genes after L. theobromae inoculation into grapevine branches, suggesting that the Botryosphaeriaceae can degrade different phenolic compounds efficiently in different infection stages.

Iron is an essential element for plants and fungi. Consequently, pathogens and their hosts often compete for the acquisition of iron (Herlihy et al. 2020). Abundant iron in a plant may increase local oxidative stress, while depletion of iron in hosts could also trigger defence responses to pathogens (Liu et al. 2021). Xylem and phloem are critical for iron transport in plants. The expansion of gene families involved in siderophore and RIA iron uptake pathways may help Botryosphaeriaceae efficiently maintain iron balance and help avoid plant defence responses when penetrating the xylem and phloem. A previous study had found siderophore transporters were also expanded in other woody pathogens, suggesting these genes may be important for woody pathogens infections (Yin et al. 2015). Biosynthesis of siderophore is required for virulence and can induce host immune responses (Albarouki and Deising 2013; Albarouki et al. 2014). Therefore, tight control of the activity of these iron uptake pathways could be important for pathogens to maintain iron homeostasis during infection. In the grapevine branches transcriptome data, SIDA and SIDC expression by L. theobromae was repressed and more SIT genes were up-regulated after infection of branches, potentially due to the abundance of ferrous iron in xylem (Pandey and Sonti 2010). Thus, increased SIT gene expression could be sufficient to transport iron from their hosts and siderophore biosynthesis would not be much needed to avoid causing host defence responses.

Our results suggest widespread horizontal gene transfer events in Botryosphaeriaceae before the divergence of genera. Most of the inferred horizontal gene transfer events in Botryosphaeriaceae genomes occurred in the distant past and were mainly transferred from Sordariomycetes. Previous studies have primarily focused on inter-kingdom horizontal gene transfers or more recently intra-kingdom horizontal gene transfers whereas few horizontal gene transfers have been identified in fungal genomes (Zhang et al. 2019; Sun et al. 2016). Though Sordariomycetes and Dothideomycetes may share common early ancestors in Ascomycota (Wijayawardene et al. 2018), horizontal gene transfer may still occur in Dothideomycetes that was transferred from Sordariomycetes. Lineage-specific genetic content was commonly found among organisms, including in different classes of fungi (Sipos et al. 2017; Cai et al. 2006; Schoch et al. 2009). Previous study had found nearly 8.5% of genes in Dothideomycetes were not found in other fungi classes (Schoch et al. 2009). In our ortholog gene analysis which included 38 Dothideomycete species and 48 Sordariomycetes species we identified 31,297 ortholog gene groups in these species. Among these ortholog genes, 3222 were enriched in Dothideomycetes than in Sordariomycetes, 1515 were enriched in Sordariomycetes than in Dothideomycetes. Further analysis shows 102 ortholog genes were only found in Dothideomycetes, and 61 only in Sordariomycetes. These results suggested that even though Sordariomycetes and Dothideomycetes may share common ancestors, their genetic materials may be significantly different. In addition, previous studies have found a lot of horizontal gene transfer events in Dothideomycetes that were transferred from Sordariomycetes (Reynolds et al. 2017; Wang et al. 2019; Herzog et al. 2020). Thus, we suggest it is possible to transfer genetic materials from Sordariomycetes to Dothideomycetes. Furthermore, our results are consistent with those from previous studies, wherein few horizontal gene transfers were inferred to have been transferred from archaea, bacteria, or even non-fungal eukaryotes. An increasing number of studies have supported the presence of intra-kingdom DNA transfers among fungal species (Qiu et al. 2016; Druzhinina et al. 2018). These intra-kingdom horizontal gene transfers may be a common method for the fungal exchange of genetic material in their ancient ancestors, especially in Botryosphaeriaceae.

Our study estimated the divergence time of the Botryosphaeriaceae lineage from other Botryosphaeriales families at 67 Mya. After this time the Cretaceous–Paleogene extinction, the global mass extinction event occurred and eliminated approximately 80% of plant species (Raup 1986; Alvarez et al. 1980). This may dramatically increase the available substrates for saprophytic organisms. On the other hand, the increased atmospheric sulfur aerosols and dust may reduce solar insolation and change the earth's temperature at that time. These two conditions may favour saprophyte activity. Thus, the widespread depletion of arboreous vegetation and favourable climatic conditions caused a large expansion of fungi with saprobic abilities, resulting in a “fungal spike” (Vajda and McLoughlin 2004). This was then followed by an episode of explosive plant radiations to recover Earth’s flora (Vajda et al. 2001; McElwain and Punyasena 2007). The ancestor of Botryosphaeriaceae, or its successors, may have experienced drastic environmental and host changes and had a greater chance of survival with other fungi sharing different lifestyles, thereby allowing for frequent DNA exchange during that period. Concomitantly, gene family expansion and intra-kingdom horizontal gene transfers have been commonly observed in Botryosphaeriaceae ancestors (Fig. 1a, 4a). These genome innovations likely expanded pathogenicity-related gene repertoires and fungal detoxification systems that helped them transition from saprotrophs to pathogenic lifestyles to adapt to the changed environmental conditions or hosts.