Increased abundance of secreted hydrolytic enzymes and secondary metabolite gene clusters define the genomes of latent plant pathogens in the Botryosphaeriaceae

The Botryosphaeriaceae are important plant pathogens, but also have the ability to establish asymptomatic infections that persist for extended periods in a latent state. In this study, we used comparative genome analyses to shed light on the genetic basis of the interactions of these fungi with their plant hosts. For this purpose, we characterised secreted hydrolytic enzymes, secondary metabolite biosynthetic gene clusters and general trends in genomic architecture using all available Botryosphaeriaceae genomes, and selected Dothideomycetes genomes. The Botryosphaeriaceae genomes were rich in carbohydrate-active enzymes (CAZymes), proteases, lipases and secondary metabolic biosynthetic gene clusters (BGCs) compared to other Dothideomycete genomes. The genomes of Botryosphaeria, Macrophomina, Lasiodiplodia and Neofusicoccum, in particular, had gene expansions of the major constituents of the secretome, notably CAZymes involved in plant cell wall degradation. The Botryosphaeriaceae genomes were shown to have moderate to high GC contents and most had low levels of repetitive DNA. The genomes were not compartmentalized based on gene and repeat densities, but genes of secreted enzymes were slightly more abundant in gene-sparse regions. The abundance of secreted hydrolytic enzymes and secondary metabolite BGCs in the genomes of Botryosphaeria, Macrophomina, Lasiodiplodia, and Neofusicoccum were similar to those in necrotrophic plant pathogens and some endophytes of woody plants. The results provide a foundation for comparative genomic analyses and hypotheses to explore the mechanisms underlying Botryosphaeriaceae host-plant interactions.


Background
Secreted hydrolytic enzymes and fungal toxins play crucial roles in enabling fungal pathogens to establish successful infections on their plant hosts. Among the secreted proteins, carbohydrate-active enzymes (CAZymes), protease and lipases are important for nutrient acquisition, as well as for the breakdown, manipulation (i.e effectors) or circumvention of host defences [1][2][3][4][5][6][7]. Fungal toxins are a diverse group of compounds and those most commonly found in fungal pathogens include polyketides, non-ribosomal peptides, terpenes and indole alkaloids [8]. These toxins are secondary metabolites that induce plant cell death, and for this reason, necrotrophic plant pathogens usually possess greater numbers of genes involved in secondary metabolite synthesis than biotrophic pathogens [9].
The genomes of many fungal and Oomycetes plant pathogens, especially those rich in repetitive elements, are not homogenous, but rather compartmentalized into repeat-rich, gene sparse regions and repeat poor, gene dense regions [10][11][12][13]. Genes localized to repeat-rich, gene sparse regions also have a higher rate of mutation and are often under stronger selective pressure [11,14,15]. This has given rise to a phenomenon referred to as 'two-speed' genomes, due to the stark differences in evolutionary rates between the two different types of genomic regions.
Fungi residing in the Botryosphaeriaceae include important plant pathogens. These fungi mostly cause diseases of woody plant species and they can impact negatively on the health of many economically and ecologically significant plant species [16,17]. The Botryosphaeriaceae infect a wide range of plant hosts, most notably grapevine [18], pome and stone fruits [19], plantation forest trees such as Eucalyptus spp., Pinus spp. and Acacia mangium [20][21][22], as well as plants in their native habitats [23][24][25][26]. Many of these fungi (e.g. B. dothidea, M. phaseolina, Lasiodiplodia theobromae, Neofusicoccum parvum) have wide host ranges, while a few species (e.g. Diplodia sapinea on Pinus species) have narrower host ranges or are even very host-specific (e.g. Eutiarosporella darliae, E. pseudodarliae and E. tritici-australis on wheat) [27]. Many species of Botryopshaeriaceae are also known to occur endophytically in asymptomatic plant tissues or to have a latent pathogenic phase, where they inhabit their plant hosts in the absence of symptoms and cause disease only after the onset of stress, such as drought, frost or hail damage [16,28].
A few recent studies have investigated secreted proteins and secondary metabolites in species of the Botryosphaeriaceae. Proteomic studies analyzing the secreted proteins of Diplodia seriata [29] and D. corticola [30] identified secreted proteins involved in pathogenesis. Studies of grapevine pathogens also predicted secreted CAZymes and genes involved in the production of secondary metabolites of D. seriata and Neofusicoccum parvum [31,32], however no studies directly linking these genes to disease symptoms or plant interactions exist. Secondary metabolite biosynthetic gene clusters (BGCs) have also been shown to play a role in host range determination, e.g. in E. darliae and E. pseudodarliae causing white grain disorder, where the presence of a secondary metabolite biosynthetic gene cluster is likely to allow woody hosts to be infected [27]. Despite the many publicly available genomes of species of Botryosphaeriaceae [31][32][33][34][35][36][37][38], no comprehensive comparative studies have been undertaken using these genomes; neither have analyses been conducted to characterise secreted proteins and secondary metabolites in most of these fungi. Such studies are also hampered by the lack of publicly available genome annotations.
The manner in which plants interact with beneficial microorganisms, while at the same time restricting the negative effects of pathogens, is an important and intriguing question in plant biology [39]. One proposed model referred to as the 'balanced antagonism model' [40] holds that endophytism is a result of both the host plant and the fungus employing antagonistic measures against each other, in such a way that neither overwhelms the other. Disruption of this balance either results in the pathogen causing disease or in the host plant successfully killing the fungus. The model thus predicts that known endophytic species should have similar genetic repertoires to their closely related plant pathogenic relatives. This appears to be the case when considering recent comparative genomics studies conducted on endophytic fungi [41][42][43][44], although some endophytic species, e.g. Xylonia heveae had fewer CAZymes than expected and were more similar to mutualistic species [45]. Indeed, the above-mentioned endophytes (other than X. heveae) commonly had high numbers of plant cell wall degrading enzymes and secondary metabolite genes.
Despite their ubiquity as endophytes and their importance as latent pathogens, very little is known regarding how Botryosphaeriaceae species interact with their diverse plant hosts at a molecular level. Studies have characterized this fungus-host interaction for the most prominent of Botryosphaeriaceae species [31,[46][47][48][49][50], but such knowledge remains lacking for most species. Key questions in this regard relate to the secreted hydrolytic enzymes and secondary metabolic biosynthesis genes present in their genomes. Based on the results of previous studies on Ascomycetes that are endophytes of woody plants, we have hypothesised that these genes and gene clusters in the Botryosphaeriaceae will resemble those of closely related plant pathogens. To test this hypothesis, we compared the predicted secreted hydrolytic enzyme and secondary metabolite genes of Botryosphaeriaceae species with those of other Dothideomycetes. We also characterised the genome architecture of the Botryosphaeriaceae in terms of gene density, repeat content and prevalence of repeat-induced point mutations (RIP), and considered how these associate with secreted hydrolytic enzymes and secondary metabolite BGCs.

Genome sequencing, assembly and annotation
Nine genomes of Neofusicoccum and three genomes of Lasiodiplodia species were sequenced using Illumina sequencing (Table 1). These included two isolates each of N. cordaticola, N. kwambonambiense, N. parvum and N. ribis were sequenced. A single isolate was sequenced for L. gonubiensis, L. pseudotheobromae, L. theobromae and N. umdonicola.
De novo genome assembly resulted in genome lengths of approximately 43 MB for both Lasiodiplodia spp. and Neofusicoccum spp. ( Table 2). The number of scaffolds/ contigs was variable between the sequenced genomes, but the three Lasiodiplodia genomes had a lower number of scaffolds (376-424) than the Neofusicoccum genomes (1343-5188). The N. parvum CMW9080 genome that was sequenced on the Miseq platform had a higher degree of fragmentation, as seen from the high total number of scaffolds (5188) and a large number of short contigs (N50: 897, L50:13.55 kb) and scaffolds (N50:830, L50: 14.81 kb). The percentage of repetitive elements of each genome was significantly greater in the Neofusicoccum genomes (6.84%) than in the Lasiodiplodia genomes (3.33%) (p = 0.004545, Wilcoxon rank sum test).
Twenty-six Botryosphaeriaceae genomes were annotated by predicting protein-coding genes with MAKER using BRAKER trained profiles. BUSCO analysis using the Ascomycota ortholog library (Table 3) indicated that all Botryosphaeriaceae genomes had a high degree of completeness (average of 98.%, minimum of 95.1%). The genome annotations that were generated also had a high BUSCO completeness score (average of 97.8.%, minimum of 94.3%). When comparing theses BUSCO results to those of species with existing genome annotations on NCBI/JGI, it was clear that in five out of the six cases the genome annotations from the present study had a higher BUSCO completeness score than the existing genome annotations.

Phylogenomic analyses
We identified 207 core orthologous genes from the collection of 26 Botryosphaeriaceae, 39 other Dothideomycetes and the outgroup (Aspergillus nidulans) genomes. Only orthologous genes that were represented by a single gene per species were retained. The results of the phylogenomic analyses corresponded well to previous phylogenies for the Dothideomycetes [74][75][76]. The phylogeny indicated the early divergence of the Dothideomycetidae (Dothideales, Capnodiales and Myrangiales) from the lineage containing the Pleosporomycetidae (Pleosporales, Hysteriales and Mytilinidiales) and other Dothideomycetes without current subclass designation (Fig. 1). This phylogeny further supported the early divergence of the Botryosphaeriales from the ancestral Dothideomycetes lineage after the divergence of the Dothideomycetidae and Venturiales. The phylogenetic relationships between the Botryosphaeriacaeae were well defined and the phylogenetic placement of species and genera corresponded with that found in previous studies [77,78].

Functional annotation
In the Botryosphaeriaceae, both genome size and the total gene number were strongly correlated with the number of secreted proteins, CAZymes, proteases, lipases and secondary metabolite gene clusters (Additional files 1 and 2). Within the Dothideomycetes, however, the numbers of secreted proteins, CAZymes, proteases, lipases, and secondary metabolite BGCs present within a genome were correlated with one another (i.e. species that contained large numbers of secreted proteins also contained large numbers of CAZymes, proteases, lipases and secondary metabolite BGCs), but only weakly correlated with genome size and the total number of genes (Additional file 2).
Among the Botryosphaeriaceae, the Eutiarosporella spp. had the lowest number of each of functional annotation category, followed in increasing order by Diplodia spp. and Botryosphaeria, Lasiodiplodia, Macrophomina, and Neofusicoccum species (Fig. 1, Additional file 1). This observation was also evident when considering the different classes of CAZymes/proteases/lipases and the different types of secondary metabolite BGCs (Fig. 2). Furthermore, for many functional annotation categories the Botryosphaeriaceae were more similar to the Pleosporomycetidae than the Dothideomycetidae.
The genomes of the Botryosphaeriaceae genera differed significantly (p-value < 0.05) for many of the annotation categories (Additional file 2). All genera were significantly different (Neofusicoccum > Botryosphaeriaclade > Lasiodiplodia > Diplodia > Eutiarosporella) for the number of secreteted genes, number of total CAZymes and secreted CAZymes. This trend also existed for the other annotation categories with a few exceptions: Among the proteases and lipases, the Neofusicoccum and Botryosphaeria-clade were not significantly different. When considering the secreted proteases, species in the Botryosphaeria-clade were not significantly different to those of Lasiodiplodia, Eutiarosporella or Diplodia. The secreted lipases were not significantly    different between Eutiarosporella and Diplodia. The secondary metabolite BGCs were not significantly different between the Neofusicoccum and species in the Botryosphaeria-clade. Significant differences also existed when comparing the functional annotation categories of the Botryosphaeriaceae to the rest of the Dothideomycetes (Additional file 2). The Botryosphaeriaceae had significantly greater numbers of secreted genes, total CAZymes and total proteases than the Dothideomycetidae. Furthermore, the Botryosphaeriaceae had significantly greater numbers of secreted CAZymes, secreted proteases, both total and secreted lipases and secondary metabolite BGCs than both Dothideomycetidae and Pleosporomycetidae.
The Botryosphaeriaceae, had significantly more CAZymes of the auxiliary activities (AA), carbohydratebinding modules (CBM), carbohydrate esterases (CE) and glycoside hydrolases (GH) classes than Dothideomycetidae ( Fig. 2, Additional file 2). Additionally, the Botryosphaeriaceae had significantly more polysaccharide lyase (PL) genes than both Dothideomycetidae and Pleosporomycetidae. Conversely, significantly fewer CEs were present in the Botryosphaeriaceae than in the Pleosporomycetidae, as well as fewer glycosyltransferases (GT) than the other two Dothideomycetes sub-classes.
When considering the secreted CAZyme classes, the Botryosphaeriaceae had significantly more CAZymes of the AA, CBM and CE classes than Dothideomycetidae and significantly more GH and PL classes than both Dothideomycetidae and Pleosporomycetidae (Fig. 2, Additional file 2). The most abundant secreted CAZyme families in the Botryosphaeriaceae were CBM1, AA3, GH3, GH43, GH5, AA9, CBM18, AA1, GH28 and CBM13 (Additional file 1). The Botryosphaeriaceae had above-average numbers of aspartic-(A), metallo-(M) and serine-(S) proteases, especially in the species of Botryosphaeria, Lasiodiplodia, Macrophomina, and Neofusicoccum (Fig. 2). For both the total and secreted number of predicted aspartic and serine proteases, the Botrosphaeriaceae had significantly greater levels than the Dothideomycetidae and Pleosporomycetidae ( Fig. 2, Additional file 2). The total and secreted metallo-proteases of the Botryosphaeriaceae were significantly higher than those of the Dothideomycetidae. These three protease classes were also the dominant proteases in the secretome. Furthermore, the Botryosphaeriaceae had significantly fewer secreted cysteine (C) proteases and protease inhibitors (I) than the other Dothideomycetes. Notably, the Botryosphaeriaceae possessed a single secreted protease inhibitor family, namely I51.001 (serine carboxypeptidase Y inhibitor), whereas many other Dothideomycetes secreted protease inhibitors were of this family, as well as I09.002 (peptidase A inhibitor 1) or I09.003 (peptidase B inhibitor). Diplodia sapinea and D. scrobiculata had no secreted protease inhibitors. The most abundant secreted protease families among the Botryosphaeriaceae were S09, A01, S10, S08, M28, S53, S33, M43, M35 and S12 (Additional file 1).
The Botryosphaeriaceae genomes were rich in gene clusters involved in the synthesis of secondary metabolites (Fig. 2). Type 1 polyketide synthases (t1PKS) were the most abundant type of gene cluster, followed by non-ribosomal peptide synthetases (NRPS) and NRPSlike, terpene synthases (TS) and t1PKS-NRPS hybrid . The supermatrix maximum likelihood phylogeny was determined using the sequence data of 207 single-copy core orthologous genes. Branches with 100% bootstrap support are indicated in black, those with less than 100% support are indicated in grey. This phylogeny illustrates the how the Botryosphaeriaceae taxa are related to one another, as well as how the Botryosphaeriaceae relates to the other Dothideomycetes. The number of genes (secreted and non-secreted) annotated with CAZyme, proteases or lipase activity, as well as the number of gene cluster types involved in secondary metabolite biosynthesis are indicated using bar graphs. These bar graphs depict that the number of these functional annotations are generally conserved within each Botryosphaeriaceae genus but that these values vary widely across this family. The Botryosphaeriaceae contains some of the largest (Neofusicoccum) as well as some of the smallest (Eutiarosporella) amounts of these functional annotations among the Dothideomycetes. Further statistical analyses of these values are provided in Additional file 2 clusters. The Botryosphaeriaceae had significantly more t1PKS clusters than the Dothideomycetidae and more NRPS-like, TS and betalactone clusters than both Dothideomycetidae and Pleosporomycetidae (Additional file 2). Certain Dothideomycetes genomes, predominantly those of the Pleosporomycetidae also contained indole and type 3 PKS BGCs, however, these were not present in any of the Botryosphaeriaceae genomes.
The most abundant secreted CAZyme, protease and lipase families of the Botryosphaeriaceae were also those that had the greatest difference from the rest of the Dothideomycetes (Table 4, Additional file 2). The twenty most abundant secreted hydrolytic enzyme families of the Botryosphaeriaceae were all significantly greater that both the Dothideomycetidae and Pleosporomycetidae, with the exception of the secreted CBM1 and AA9 CAZymes and the M28 metalloprotease families of the Botryosphaeriaceae that were not significantly greater than those of the Pleosporomycetidae. Furthermore, these twenty most abundant secreted families of the Fig. 2 Box-and-whisker plots of the number of prominent CAZyme, protease and lipase classes (total and secreted) and secondary metabolite BGC types present within the genomes of the considered Botryosphaeriaceae and other Dothideomycetes taxa. Represented data is scaled using the mean of each class/type and is indicated by the number appearing at the top of each facet. Taxa are placed into three categories: Dothideomycetidae, Pleosporomycetidae (plus related taxa without subclass designation) and Botryosphaeriaceae. The upper and lower bounds of the box represent the 1st and 3rd quartiles (respectively) and the bar inside represent the median. The error lines (whiskers) represents 1.5 times the interquartile range (IQR) and outliers are indicated as dots. Additionally, the genera of the Botryosphaeriaceae are indicated by means of a scatterplot overlain on the Botryosphaeriaceae box-and-whisker plot. These graphs visually compares the variance of each functional annotation class between the Botryosphaeriaceae and other Dothideomycetes subclasses. It also depicts that, among the Botryosphaeriaceae, the Eutiarosporella and Diplodia have smaller amounts of most functional annotation classes than the other genera Table 4 Comparison between the secreted and non-secreted amounts of the twenty most abundant secreted enzyme families in the Botryosphaeriaceae Enzyme family CBM1 S09 AA3 abH03 abH36 GH3 GH43 GH5 A01 AA9 CBM18 S10 AA1 GH28 S08 CBM13 M28 PL1 CE5 PL3

Amount of secreted proteins
Botryosphaeriaceae average 42. 85   Botryosphaeriaceae were also those that had, on average, the highest deviation from the Dothideomycetes average. The number of genes in these gene families were not increased among the non-secreted proteins. Consequently, the ratio of secreted to total proteins for these gene families was higher among the Botryosphaeriaceae than in the the Dothideomycetidae and Pleosporomycetidae.

Gene family evolution
These results of the CAFE analyses (Additional file 3) indicated that expansions and contractions of CAZyme gene families occurred at roughly similar levels. This was after the divergence of the Botryosphaeriales ancestor from the Pleosporomycetidae until the formation of the Botryosphaeriaceae crown group (61 MYA) [79]. After the divergence of the Botryosphaeriaceae, the genera Botryosphaeria, Lasiodiplodia, Macrophomina and Neofusicoccum experienced more gene family expansions than contractions, whereas the opposite was observed for the Diplodia and Eutiariosporella. Among the Neofusicoccum spp., the AA3, AA7, GH3 and GT2 gene families were rapidly expanding. Similarly, among the Lasiodiplodia spp., the AA7 and GH106 gene families were rapidly expanding. Conversely, among the Diplodia spp. the AA7 gene family was rapidly contracting, as were the AA1, AA3, AA7, GH28, PL1 and PL3 gene families among the Eutiarosporella spp.

Principal component analysis and hierarchical clustering
Hierarchical clustering separated the taxa into four groups (Fig. 3). The taxa in the Pleosporomycetidae and Dothideomycetidae generally clustered separately, however, there was no overall clustering based on taxonomic placement. Botryosphaeriaceae species were present in three of the four dominant clusters.
The Botryosphaeriaceae were distributed mainly along the first dimension of the PCA and clustered into three groups. Eutiarosporella spp. clustered at the lower ranges of the first dimension (x-axis) followed by clusters accomodating the Diplodia species towards the middle ranges and the other genera of Botryosphaeriaceae clustered at the high ranges of the x-axis. The Botryosphaeriaceae clustered along a relatively narrow range along the second dimension (y-axis) compared to the other Dothideomycetes. The clustering of the other Dothideomycetes along the x-axis was correlated to their clustering on the y-axis: taxa towards the higher end of the x-axis also occurred towards the higher end of the yaxis. The clustering of taxa did not correspond to their nutritional lifestyle, but their taxonomic placement was reflected in their clustering.

Genome architecture
Two-dimensional heatmaps of the 5′ and 3′ FIRs of the 26 Botryosphaeriaceae genomes indicated no genome compartmentalization ( Fig. 5 and Additional file 4). This was evident from the unimodal gene density distributions of these genomes. Genomes of Botryosphaeria,  The predicted secreted proteins of the Botryosphaeriaceae contained a greater number of genes in gene sparse regions than the total predicted genes (Additional file 4). The total CAZymes contained a higher proportion of genes in gene sparse regions than the total predicted genes. The secreted CAZyme gene density distribution was very similar to that of the total CAZymes. The secreted lipases and cutinases, however, occurred more frequently in gene sparse regions than the total lipases and cutinases (Additional file 4). This trend was also observed for the secreted proteases, although not as strongly. Genes associated with secondary metabolite BGCs were less prevalent in gene sparse regions.
The levels of repetitive sequences for most Botryosphaeriaceae genomes were between 3 and 8% of the total genome size ( Table 5). The two Botryosphaeria spp. differed considerably in their repeat content (3.48 vs. 11.88%). The genome of M. phaseolina also had a higher than average repeat content (16.37%). Among the Neofusicoccum species, the genomes of N. parvum and N. umdonicola had less repetitive sequences than the other genomes of this genus. The genomes of D. sapinea and D. scrobiculata also contained less repetitive sequences than the other two Diplodia species.
On average, the Botryosphaeriaceae genomes had less than 10% of their genomes composed of TA rich regions (GC < 50%) ( Table 5). The genomes of B. kuwatsukai LW030101 and M. phaseolina had 18.9 and 19.8% of their genomes as TA rich regions, respectively. The genomes of Diplodia and Eutiarosporella had fewer TA rich regions (approximately 5% of the genome) compared to the other genera. Less than 5% of genes were present in TA rich regions and secreted genes were not found to be over-represented among these genes (Additional file 4). The genomes of Diplodia and Eutiarosporella had considerably fewer genes associated with TA rich regions, than the other taxa of this family.
Analysis of the prevalence of RIP in the genomes of Botryosphaeriaceae indicated that this has occurred to varying degrees in these genomes (Table 5).

Discussion
This study represents the first large-scale comparative genomics-level consideration of all available genomes of Botryosphaeriaceae. The results showed that the included Botryosphaeriaceae genomes, especially those of Botryosphaeria, Macrophomina, Lasiodiplodia and Neofusicoccum, encode high numbers of secreted hydrolytic enzymes and secondary metabolite BGCs. This emerges due to these fungi having increased numbers of genes associated with plant interactions in their secretome. The results also indicate that the Botryosphaeriaceae are most similar to species of the Pleosporomycetidae based on secreted enzyme and secondary metabolite profiles. Botryosphaeriaceae genomes were furthermore determined not to be compartmentalized based on gene density or GC-content.
There was a strong correlation between the number of hydrolytic enzymes and secondary metabolite BGCs, and the genome size and gene number of the Botryosphaeriaceae considered in this study. This correlation between genome size and gene number has generally not been seen in other fungi [4,11,80], because transposable elements and repetitive DNA vary significantly among species [11]. A recent comparison of Dothideomycetes genomes [81] also showed that genome size and gene number were not correlated to the abundance of functional annotation classes; neither to the lifestyle or phylogenetic placement of a species.
(See figure on previous page.) Fig. 5 Gene density landscape of the genera of Botryosphaeriaceae. The two-dimensional heatmap shows the distribution of genes to gene dense or sparse regions of the genome based on their 5′ and 3′ flanking intergenic regions (FIRs). Two-dimensional heatmaps of each genus is the average across bins of all genomes in the group. The heatmaps labelled as "Botryosphaeria-clade" include the genomes of B. dothidea, B. kuwatsukai and M. phaseolina. The values on the axes are distances in base pairs and signify the upper limit of each bin. The median bin is indicated by dotted lines to assist in comparison between plots. The colours of the heatmap represents the number of genes present within each two-dimensional bin. All Botryosphaeriaceae had unimodal gene density distributions, with Eutiarosporella, Diplodia and Neofusicoccum with a greater overall gene density (smaller intergenic regions) than the other genera The genomes of Botryosphaeria, Macrophomina, Lasiodiplodia and Neofusicoccum had abundant secreted hydrolytic enzyme and secondary metabolite BGCs. This is a pattern that is most similar to prominent necrotrophic plant pathogens (A. alternata, C. casiicola), saprobes (C. aquaticus, P. sporulosa) and the endophyte/ latent pathogen P. macrospinosa in the Pleosporales. The pattern was consistent with reports that necrotrophic pathogens tend to have higher numbers of hydrolytic enzymes and secondary metabolite toxins than biotrophs and symbiotic fungi [7,82]. An abundance of secreted hydrolytic enzymes and secondary metabolite BGCs found in the Botryosphaeriaceae is also similar to that of other species of woody endophytes. Studies on such endophytic species have shown that they have similar or higher amounts of various secreted enzymes (notably plant cell wall degrading enzymes) or secondary metabolites than closely related plant pathogenic species [41][42][43][44]. The specific gene families that are enriched, however, differ among endophytic lineages, due to the evolutionary independent origins of endophytism [41][42][43][44]. It has furthermore been noted that fungi with dual lifestyles (e.g. fungi with endophytic and pathogenic phase) have large numbers of CAZymes [83,84], however, few studies have investigated this. These observations also emerging from the present study are consistent with the known dual-lifestyle of Botryosphaeriaceae as latent pathogens.
The Botryosphaeriaceae genomes were rich in CAZymes, especially those involved in plant cell wall degradation (PCWD), although at lower numbers in the genomes of Diplodia and Eutiarosporella species. CAZymes involved in the degradation of cellulose, hemicellulose and pectin were present in all Botryosphaeriaceae. The genomes of Botryosphaeria, Macrophomina, Lasiodiplodia and Neofusicoccum were particularly rich in CAZyme families involved in cell wall degradation. Specifically, CAZymes involved in plant, general and fungal cell wall degradation [4,85,86] were abundant in the genomes of the above-mentioned genera.
The Botryosphaeriaceae secretomes were rich in CAZyme families involved in the recognition of cellulose (CBM1) and chitin (CBM18). Although carbohydratebinding domains have no catalytic activity of their own they play important roles in substrate recognition and binding of other CAZymes [87,88], they are also involved in the protection of fungal cell walls from degradation by host enzymes and prevention of host detection [1,89]. High numbers of secreted CAZymes involved with PCWD have also been found in previous studies of N. parvum and D. seriata [31,32] and in other, especially necrotrophic, Dothideomycetes [62,63,66,70,85]. The abundance of these CAZyme families in some genera of Botryosphaeriaceae suggests that cell wall degradation plays an important role in the biology of these fungi.
The Botryosphaeriaceae were rich in secreted serine-, metallo-and aspartic-proteases. Protease families (A01, S08, S09, S10) that were previously identified as the most common secreted proteases among Dothideomycetes [4] were also the most abundant in the Botryosphaeriaceae. Secreted proteases play important roles in nutrient acquisition, signalling and degradation of plant defences [3,[97][98][99]. Although secreted proteases are abundant in several necrotrophic pathogens, e.g. Corynespora cassiicola [66] and several Colletotrichum spp. [100], no patterns between nutritional lifestyle and the abundance of secreted proteases could be distinguished. The precise function of most of these proteases in the Botryosphaeriaceae are unknown and their role during infection and disease expression remains to be determined.
A lower abundance and diversity of secreted protease inhibitors of the Botryosphaeriaceae suggests a reduced capacity and/or need for extracellular enzyme inhibition. Plant pathogenic fungi secrete protease inhibitors to inhibit plant proteases involved in defence responses [101] and several protease inhibitors are known virulence factors, e.g. avr2 of Cladosporium fulvum [102,103] and Pit2 of Ustilago maydis [104]. However, the exact role of many fungal protease inhibitors, such as those secreted by the Botryosphaeriaceae and Dothideomycetes, remains unknown [105]. Botryosphaeriaceae, especially species of Botryosphaeria, Macrophomina, Lasiodiplodia and Neofusicoccum possessed high numbers of secreted lipases. Three lipase families were present in high numbers in the secretomes of the Dothideomycetes, i.e. Candida rugosa lipase-like (abH03), cutinases (abH36 and CE5) and Filamentous fungi lipases (abH23). The Botryosphaericeae genomes were rich in secreted enzymes for these three families. Lipases and cutinases are important for fungal penetration of host tissue [6,106], growth and adhesion [107,108] and manipulation of host defences [5]. The abundance of these secreted lipases and cutinases emphasises their potentially important role during the infection process in the Botryosphaeriaceae.
The genomes of Botryosphaeria, Macrophomina, Lasiodiplodia and Neofusicoccum contained many BGCs, especially t1PKS, NRPS, NRPS-like and TS type clusters. The products produced by most of these clusters are unknown, however, the products of some clusters could be determined. These compounds included melanin, phytotoxins (ACT-Toxin II, (−)-Mellein), siderophores (dimethylcoprogen) and antioxidants (pyranonigrin E). Many phytotoxic secondary metabolites have been identified in the Botryosphaeriaceae [109][110][111][112][113][114]. Of these, the most commonly identified phytotoxic compounds are mellein and its derivatives [115,116]. The presence of a predicted ACT-toxin producing gene cluster in the Botryosphaeriaceae is interesting as it is a host selective toxin from citrus infecting A. alternata [117]. This toxin is part of the Epoxy-decatrienoic acid (EDA) family, however, neither this type of toxin or any in this family has been isolated from the Botryosphaeriaceae. The ability to produce secondary metabolite toxins have been associated with pathogenic fungi's lifestyle, host range and virulence [4,118,119]. Many plant pathogenic fungi have large numbers of secondary metabolite BGCs, e.g. Bipolaris spp. [120], Corynespora cassiicola [66], Colletotrichum spp. [100] and Pyrenophora teres [4], but so also do fungi with other lifestyles such as the saprobic Annulohypoxylon stygium [121], Hysterium pulicare, and Rhytidhysteron rufulum [4]. Despite the observation that the total abundance of secondary metabolite BGCs does not predict lifestyle, several fungal toxins are able to modulate a fungal species' host range or virulence, e.g. the AF-toxin of A. alternata [118], the Hybrid-1,2 and 3 genes of Eutiarosporella darliae and E. pseudodarliae [27] and the HC-toxins of Bipolaris zeicola [122].
The Botryosphaeriaceae genera were shown to possess non-compartmentalized genomes. These species were characterized as having moderate to high % GC, RIPaffected genomes with low amounts of repetitive DNA, a slight preferential localization of secreted genes to gene sparse regions and no preferential localization of secreted genes to TA rich regions. Most Dothideomycetes do not have compartmentalized genomes, but several important pathogenic species (e.g. L. maculans and Pseudocercospora spp.), have genomes with high levels of repetitive DNA (often transposable elements) [12,57] enriched for fast-evolving genes related to pathogenicity or virulence [57,123,124]. However, not all rapidly evolving fungal phytopathogens have this characteristic 'two-speed' genome architecture [13]. Where 'two-speed' genomes rely on the action of leaky RIP to generate variation for selection to act on, 'one-speed' genomes of fast-evolving plant pathogens rely on the absence of RIP that allows gene duplication/copy number variation to generate variation [13]. The Botryosphaeriaceae are not like those species with 'two-speed' genomes as they don't have compartmentalized genomes, but RIP is also not completely absent as seen in species with 'one-speed' fast-evolving genomes.

Conclusions
This study is the first large-scale comparative genomics study to consider all available genomes of Botryosphaeriaceae. It has illustrated large variability in the secreted hydrolytic enzyme and secondary metabolite biosynthetic repertoire between genera of this family. Most importantly, we have demonstrated similarities between the Botryosphaeriaceae and necrotrophic plant pathogens and endophytes of woody plants, emphasising their role as latent pathogens. This study highlights the importance of these genes in the infection biology of Botryosphaeriaceae species and their interaction with plant hosts. This knowledge will be useful in future studies aimed at understanding the mechanisms of endophytic infections and how these transition to a pathogenic state. The results should also help to better understand the genetic factors involved in determining the complex question of host range in the Botryosphaeriaceae.

Genomic data
All available, published Botryosphaeriaceae genomes were retrieved from public databases (NCBI and JGI). Additionally, we sequenced and assembled 12 genomes representing three Lasiodiplodia spp. and five Neofusicoccum spp. (Table 1). To standardize protein annotations for downstream application, all of the above Botryosphaeriaceae genomes were annotated using the same pipeline described below. Additionally, the genomes and protein annotations of 41 Dothideomycetes and Aspergillus nidulans (Eurotiales), which had both genomic sequences and annotated protein sequences available on NCBI, were retrieved (Table 1). These genomes were used for comparative purposes in the phylogenomic analyses, hydrolytic enzyme and secondary metabolite BGC analyses and in the statistical clustering analyses, described below.

DNA extraction, genome sequencing and assembly
Cultures of three Lasiodiplodia and five Neofusicoccum species (Table 1) were inoculated onto cellophane covered 2% malt extract agar (MEA; Biolab, Merck) and incubated at 22°C. After 5 days, mycelium was harvested from the surface of the cellophane using a sterile scalpel and DNA was extracted using a modified phenol/chloroform protocol, that included the addition of potassium acetate to precipate protein. Mycelium was ground to a fine powder using liquid nitrogen and mortar and pestle. Approximately 500 mg of ground mycelium was used for DNA extraction. To the ground mycelia, 18 ml of a 200 mM Tris-HCl (pH 8.0), 150 mM NaCl, 25 mM EDTA (Ethylenediaminetetraacetic acid, pH 8.0) and 0.5% SDS (Sodium dodecyl sulfate) solution and 125 μl of 20 mg/ ml Proteinase K was added and incubated at 60°C for 2 h. This was followed by addition of 6 ml of 5 M potassium acetate and 30 min incubation at 0°C. Samples were then centrifuged at 5000 g for 20 min. The aqueous phase was kept and 24 ml of a 1:1 phenol:chloroform solution was added; samples were then centrifuged as above. Two chloroform washes were performed on the aqueous phase, followed by addition of 100 μl of 10 mg/ ml Rnase A and incubated for 2 h. DNA was precipitated using one volume isopropanol and centrifuged for 30 min. The pellet was cleaned with two 70% ethanol washes and resuspended in 1 x Tris-EDTA buffer.
The extracted DNA was used for paired-end sequencing (average fragment size of 500 bp). All samples were sequenced on an Illumina HiSeq 2500 platform, except for N. parvum (isolate CMW9080) that was sequenced on a Miseq platform. The quality of the resulting reads was assessed using FastQC 0.10.1 [125] and low quality and short reads were trimmed or discarded using Trimmomatic 0.30 [126]. De novo genome assembly was performed by Velvet 1.2.10 [127] and Velvetoptimiser 2.2.5 [128]. Paired-end reads were used to scaffold the assembly and insert size statistics were determined by Velvet for each genome assembly. Genome assembly summary statistics were calculated with the AssemblyStatsWrapper tool of BBtools 38.00 [129].

Genome annotation
The twelve sequenced genomes described above, as well as the fourteen Botryosphaeriaceae genomes retrieved from public databases, were annotated as follows: Custom repeat libraries were constructed for each genome assembly using RepeatModeler 1.0.10 [130]. BRAKER 1.10 [131] was used to create trained GeneMark-ET 4.29 [132] and AUGUSTUS 3.2.3 [133] profiles using previously published N. parvum transcriptome data (SRR3992643 and SRR3992649) [31]. Genomes were annotated through the MAKER2 2.31.8 [134] pipeline using the custom repeat libraries and the BRAKER trained GeneMark-ET 4.29 and AUGUSTUS profiles. Genomes and genome annotations were assessed for completeness with BUSCO 4.0.5 [135] using the Ascomycota ortholog library (Creation date 2020-09-10, 1706 core orthologous genes). The annotations for six Botryosphaeriaceae genomes available on public databases prior to this study were also assessed using BUSCO and compared to those of the annotations generated in this study.

Phylogenomic analyses
To illustrate the relationships between species and genera of the Botryosphaeriaceae, as well as the relationship of this family to the rest of the Dothideomycetes, a robust phylogeny was created from the genome data. Single copy core orthologous genes were identified from individual genomes (Table 1) using BUSCO (as described above). The BUSCO genes present in all taxa (207 genes) were selected for further analysis. Each orthogroup was aligned using MAFFT 7.407 [136] before being concatenated into a single matrix. RAxML 8.2.4 [137] was used to perform maximum likelihood phylogenetic inference using the PROTGAMMAAUTO option and a thousand bootstrap replicates were performed. Trees were rooted using sequences from Aspergillus nidulans.

Functional annotation
The genome annotation data for the Botryosphaeriaceae, the other Dothideomycetes and the outgroup A. nidulans were used to perform functional annotation for hydrolytic enzymes and secondary metabolite BGCs. CAZymes were predicted by searching the total predicted proteins of each genome against the CAZy database [138] using dbCAN2 (HMMdb release v9.0) [139]. Only those CAZyme predictions supported by two or more tools (HMMR, DIAMOND, Hotpep) were retained. Proteases and protease inhibitors were predicted by subjecting the predicted protein sequences to a BLASTP [140] search against the MEROPS protease database 12.0 [141] using a cut off E-value of 1E-04. Lipases and cutinases were predicted by searching protein sequences against lipase and cutinase hidden Markov model (HMM) profiles retrieved from the Lipase Engineering Database v 3.0 [142] using HMMER 3.1b2 [143].
The total predicted proteins were also analyzed for the presence of signal peptides, involved in protein secretion. Phobius 1.01 [145] and SignalP 4.1 [146] were used to assess the presence of signal peptides and TMHMM 2.0 [147] was used to determine if any transmembrane regions occurred within these proteins. Only proteins with a signal peptide predicted by both Phobius and Sig-nalP, as well as no transmembrane domains outside of the signal peptide predicted by both Phobius and TMHMM were regarded here as predicted secreted proteins.
Statistical analyses were performed using the total number of annotations, as well as the number of each enzyme class/secondary metabolite BGC type. Statistical tests were done comparing the genera of the Botrysphaeriaceae, but also for comparisons between Dothideomycetidae, Pleosporomycetidae and Botryosphaeriaceae. Due to the small number of representative genomes for Botryosphaeria (2) and Macrophomina (1) and their close phylogenetic placement, they were combined in a single group (i.e. Botryosphaeria-clade) for the purpose of statistical analyses. We further also included Cenococcum geophilum, Coniosporium apollinis and Glonium stellatum in the Pleosporomycetidae, based on their phylogenetic placement (Fig. 1).
We calculated pairwise Pearsons correlation coefficients [148] in order to test whether genome size, the number of predicted genes or any of the functional annotation categories were correlated. The Shapiro-Wilk test [149] was performed in order to test for a normal distribution using the byf.shapiro function of the RVAi-deMemoire R-package [150]. Pairwise comparisons to test for significant differences were done using onetailed Wilcoxon rank-sum tests [151] in R using the pairwise.wilcox.test function of the R stats package [152] with either the 'greater' or 'less' alternative hypothesis options. This is a non-parametric test to determine whether two independent sets of values have the same distribution. Unlike the t-test, the Wilcoxon rank sum test does not assume a normal distribution [153] and thus was better suited to our data. These same tests as above were done using the total number of predicted, as well as the number of secreted genes, associated with hydrolytic enzyme families to test if specific CAZyme, protease or lipase families within the secretomes of the Dothideomycetidae, Pleosporomycetidae and Botryosphaeriaceae were significantly different.
Analysis of gene family evolution CAFE (Computational Analysis of gene Family Evolution) v4.2 [154] was used to study the evolution of gene family size in the hydrolytic enzymes and secondary metabolite BGCs. To this end, the phylogeny described above was converted to an ultrametric tree using r8s v1.81 [155]. This was then calibrated by fixing the age of the Botrosphaeriaceae to 61 million years [79] and constraining the age of the Dothideomycetes to 303-357 million years [156]. Gene family sizes for the total predicted CAZymes, proteases, lipases and secondary metabolite BGCs (Additional file 1) were used as input for the CAFE analyses. CAFE was run using separate lambda (birth) and mu (death) rate parameters. Additionally, two separate rate classes were allowed in that the rate parameters were calculated independently for the Botryosphaeriaceae and the remaining Dothideomycete taxa. CAFE was run using a P-value cutoff of 0.01 and Viterbi P-values were calculated to significant expansions/contractions across branches.

Hierarchical clustering and principal component analysis
Hierarchical clustering was done to determine the similarity between taxa based on the secreted hydrolytic enzyme classes and secondary metabolite BGC types. The heatmap.2 function of the gplots [157] R package was used to perform the analysis. Additionally, principal component analysis (PCA) was performed using the same functional annotations as used in the hierarchical clustering, with the exception that for CAZymes and proteases the number of genes associated with each family (e.g AA1, A01) was used instead of those of each class (e.g AA, Aspartic). The FactoMineR [158] R package was used to perform the analysis and Factoshiny [159] was used to generate PCA plots.

Genome architecture
Gene densities were analyzed for each Botryosphaeriaceae genome to assess the level of genome compartmentalization. Intergenic distances were used as a measure of gene density, by considering the 5′ and 3′ flanking intergenic regions (FIRs) of each gene. FIR lengths were used for two-dimensional data binning to construct gene density heat maps [160] using R [152]. Possible differences in the gene density distributions between the total and secreted CAZymes, proteases and lipases were also considered. We compared the relative amounts (i.e. the percentage) of these genes located in gene-sparse regions. In this case, gene sparse regions were defined as genes with both 5′ and 3′ FIRs larger than 1500 bp, as previously used by S Raffaele and S Kamoun [11].
The repeat contents of the Botryosphaeriaceae genomes were determined using RepeatMasker [161] with custom repeat libraries created by RepeatModeler. The presence of TA rich regions and the genes present therein were determined using OcculterCut [124]. Specifically, the numbers of the total secreted genes, the secreted CAZymes, proteases and lipases, as well as the number of secondary metabolite BGCs associated with TA rich regions were determined. The occurrence of RIP, including large regions affected by RIP (LRARs), was determined in each genome using TheRIPper [162].

Acknowledgements
We are grateful to the University of Pretoria, The Department of Science and Technology (DST)/National Research Foundation (NRF) Centre of Excellence in Tree Health Biotechnology and members of the Tree Protection Cooperative Program for financial support of this study. The authors acknowledge the Centre for Bioinformatics and Computational Biology, University of Pretoria and the Centre for High Performance Computing (CHPC), South Africa for providing computational resources to this research project. Lastly, our sincere thanks to Alisa Postma for her guidance during the genome assembly and annotation process and to Dr. Tuan Duong for his critical comments on this manuscript.
Authors' contributions BS and MJW were supervisors for the doctoral research by JHN. All authors contributed to the study conception, data interpretation and the revision of the manuscript. JHN performed the laboratory work, data analyses and visualizations, and wrote the draft manuscript. All authors read and approved the final manuscript.

Funding
The South African National Research Foundation (NRF) funded the Doctoral research of the first author. This project was financed by the University of Pretoria, the Department of Science and Innovation (DSI)/National Research Foundation (NRF) Centre of Excellence in Tree Health Biotechnology. The Grant holders acknowledge that opinions, findings and conclusions or recommendations expressed in any publication generated by NRF-supported research are that of the author(s) and that the NRF accepts no liability whatsoever in this regard.

Availability of data and materials
All data generated or analysed during this study are included in this published article (and its supplementary information files). Genomic data used in this study (Table 1) are available from NCBI (https://www.ncbi.nlm. nih.gov/genome) and the JGI Genome Portal (http://genome.jgi.doe.gov). Raw sequencing reads of the de novo genomes described in this study linked to Bioproject PRJNA497969 (Biosample Accessions SAMN10319569 -SAMN10319580) have been deposited to the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) database and are awaiting accession numbers.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.