Genome sequences of Knoxdaviesia capensis and K. proteae (Fungi: Ascomycota) from Protea trees in South Africa

Two closely related ophiostomatoid fungi, Knoxdaviesia capensis and K. proteae, inhabit the fruiting structures of certain Protea species indigenous to southern Africa. Although K. capensis occurs in several Protea hosts, K. proteae is confined to P. repens. In this study, the genomes of K. capensis CBS139037 and K. proteae CBS140089 are determined. The genome of K. capensis consists of 35,537,816 bp assembled into 29 scaffolds and contains 7940 predicted protein-coding genes, of which 6192 (77.98 %) could be functionally classified. K. proteae has a similar genome size of 35,489,142 bp, assembled into 133 scaffolds. A total of 8173 protein-coding genes were predicted for K. proteae and 6093 (74.55 %) of these have functional annotations. The GC-content of both genomes is 52.8 %.
Electronic supplementary material: The online version of this article (doi:10.1186/s40793-016-0139-9) contains supplementary material, which is available to authorized users.


Introduction
Two lineages of the polyphyletic assemblage known as ophiostomatoid fungi [1] are associated with the fruiting structures (infructescences) of serotinous Protea L. plants [2]. Protea species are a key component of the fynbos vegetation in the Core Cape Subregion (CCR) of South Africa [3] and the genus is predominantly encountered in South Africa [4,5]. The Protea-associated ophiostomatoid fungi are, therefore, believed to be endemic to this region, similar to their hosts. This association of ophiostomatoid fungi with a keystone plant genus in a biodiversity hotspot is intriguing [6], as many ophiostomatoid fungi are notorious pathogens of trees [7][8][9][10], yet the Protea ophiostomatoid species are not associated with disease symptoms [11].
Ophiostomatoid fungi are characterized by the flask-shaped morphology of their sexual fruiting structures and their association with arthropods [1,12]. The Protea-associated members of this assemblage are primarily dispersed by mites that come into contact with fungal spores in the Protea infructescences [13,14]. These mites have limited dispersal ability, but use beetles and possibly larger vertebrates (such as birds) as vehicles for long-distance dispersal [15,16].
The three Knoxdaviesia M.J. Wingf., P.S. van [17][18][19][20]. An investigation of the population biology of K. proteae revealed that this fungus has a high level of intra-specific genetic diversity and that it is extensively dispersed within the CCR of South Africa [16,21]. However, other than host range and dispersal mechanisms, little is known about the biology and ecology of Knoxdaviesia in general [11]. Here we present the first draft genome sequences of the two CCR species, K. capensis and K. proteae, together with their respective annotations.

Classification and features
One lineage of Protea-associated ophiostomatoid fungi resides in the Ophiostomataceae (Ophiostomatales, Ascomycota), while the second resides in the Gondwanamycetaceae (Microascales, Ascomycota) [11,22]. The latter group includes three closely related Protea-associated species in the genus Knoxdaviesia (Fig. 1). This genus was initially described to accommodate the asexual state of the first species in the genus, K. proteae [23]. Under the dual nomenclature system of fungi, the sexual state of this fungus was described in the same paper as Ceratocystiopsis proteae M.J. Wingf., P.S. van Wyk & Marasas [23]. A new genus, Gondwanamyces G.J. Marais & M.J. Wingf., was later described to accommodate the sexual state of this species and that of another species, Ophiostoma capense M.J. Wingf. & P.S. van Wyk [24]. The asexual states of both continued to be treated as species of Knoxdaviesia. Since the abolishment of the dual nomenclature system of fungi, the oldest genus name takes precedence, irrespective of morph [25,26]. The name Knoxdaviesia, therefore, has priority and all species previously treated in Gondwanamyces were transferred to Knoxdaviesia [27].
In a study determining the genome sequence of any fungus, it is advisable to use a living isolate connected to the type specimen. However, the ex-type isolate of K. proteae (CMW738 = CBS486.88) is more than 20 years old and no longer displays the characteristic morphological features of the fungus in culture. No living ex-type isolate exists for K. capensis. We thus collected fresh isolates of both species for this study in order to eliminate possible mutations or degradation that may have occurred through continual artificial propagation in culture media. The new isolates (Figs. 1 & 2) were collected from the same localities and hosts as the holotype specimens: K. capensis (CMW40890 = CBS139037) from the infructescences of P. longifolia Andrews in Hermanus, and K. proteae (CMW40880 = CBS140089) from P. repens infructescences in Stellenbosch, both locations in the Western Cape Province of South Africa. General features of these isolates are outlined in Table 1.

Genome project history
Considering the lack of ecological information on the genus Knoxdaviesia and the close relationship these Microascalean fungi have to important plant pathogens, two Protea-associated Knoxdaviesia species, believed to be native to the CCR in South Africa, were selected for genome sequencing. Both species were sequenced at Fasteris in Switzerland. The genome projects are listed in the Genomes OnLine Database [28] and the whole genome shotgun (WGS) project has been deposited at DDBJ/EMBL/GenBank (Table 2). Table 2 presents the project information and its compliance with the Minimum Information about a Genome Sequence (MIGS) version 2.0 specification [29]. The full MIGS records for K. capensis and K. proteae are available in Additional file 1: Table S1 and Additional file 2: Table S2, respectively.

Growth conditions and genomic DNA preparation
Both K. capensis and K. proteae were cultured on Malt Extract Agar (MEA; Merck, Wadeville, South Africa) overlaid with sterile cellophane sheets (Product no. Z377597, Sigma-Aldrich, Steinheim, Germany). After 10 days of growth at 25°C, mycelia were scraped from the cellophane and DNA was extracted according to Aylward et al. [30]. Approximately 5 μg DNA from each species was used to prepare the three Illumina libraries (Table 2).
RNA was extracted from the K. proteae genome isolate to use as evidence for gene prediction. After growth on MEA at 25°C for approximately 10 days, total RNA was isolated from the mycelia with the PureLink™ RNA Mini Kit (Ambion, Austin, TX, USA). Quality control was performed on the Agilent 2100 Bioanalyzer (Agilent Technologies, USA) using the RNA 6000 Nano Assay kit (Agilent Technologies, USA). The mRNA component of the total RNA was subsequently extracted with the Dynabeads® mRNA purification kit (Ambion, Austin, TX, USA).
Fig. 1 Maximum Likelihood tree illustrating the phylogenetic position of K. capensis and K. proteae in the Gondwanamycetaceae (grey block). The Protea-associated species are shaded red and the two isolates for which genome sequences were determined are indicated with a box. The sequences of the Internal Transcribed Spacer (ITS) region (available from GenBank®, accession numbers in brackets following isolate numbers) were aligned in MAFFT 7 [55]. The phylogeny was calculated in MEGA6 [56] using the Tamura-Nei substitution model [57], 1000 bootstrap replicates and Ceratocystis fimbriata (Ceratocystidaceae) as an outgroup.
Table 1 footnote: Evidence codes: IDA, inferred from direct assay; TAS, traceable author statement (i.e., a direct report exists in the literature); NAS, non-traceable author statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from http://www.geneontology.org/GO.evidence.shtml of the Gene Ontology project [58].

Genome sequencing and assembly
The genomes of K. capensis and K. proteae were sequenced with the Illumina HiSeq 2500 platform at Fasteris, Switzerland, using two paired-end libraries and one Nextera mate-pair library (Table 2). More than 60 million paired-end and 8 million mate-pair reads were obtained for each species. These reads were trimmed in CLC Genomics Workbench 6.5 (CLC bio, Aarhus, Denmark) so that the Phred quality (Q) score of each base was at least Q20. VelvetOptimiser (Gladman & Seemann, unpublished), a Perl script used as part of the Velvet assembler [31,32], was initially used to optimize the assembly parameters. Assembly of contigs was performed in ABySS 1.5.2 [33] using the optimal parameters suggested by VelvetOptimiser as a starting point. Several assemblies were computed using k-mer values slightly higher and lower than the k-mer value suggested by VelvetOptimiser. The assembly with the lowest number of contigs was used to build scaffolds in SSPACE 3.0 [34], discarding scaffolds smaller than 1000 bp. Automatic gap closure was performed in GapFiller 1.10 [35]. The average genome coverage of each library was estimated using the Lander-Waterman equation (total sequenced nucleotides / genome size) (Table 2), which yielded a combined average coverage for the three libraries of 188.5x (K. capensis) and 271.5x (K. proteae). The K. capensis genome consists of 29 scaffolds ranging between 1226 and 5,637,848 bp, whereas the 133 scaffolds of K. proteae range between 1022 and 2,610,973 bp. A search for the 1438 fungal universal single-copy ortholog genes with BUSCO 1.1b1 [36] identified 1355 complete and 67 partial genes in K. capensis and 1366 complete and 57 partial genes in K. proteae. The two genomes are therefore estimated to be >98 % complete.
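The coverage and completeness estimates above are simple ratios. The Python sketch below illustrates both calculations under stated assumptions: the function names are hypothetical, completeness is taken here as complete plus partial BUSCO genes divided by the number of genes searched, and the read count and read length in the coverage example are placeholder values for illustration only, not values from this study (the per-library figures are in Table 2).

```python
def lander_waterman_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage = total sequenced nucleotides / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def busco_completeness(complete: int, partial: int, total: int) -> float:
    """Percentage of searched BUSCO genes recovered completely or partially."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (complete + partial) / total


# BUSCO counts as reported in the text (1438 fungal single-copy orthologs).
k_capensis = busco_completeness(1355, 67, 1438)
k_proteae = busco_completeness(1366, 57, 1438)
print(f"K. capensis: {k_capensis:.2f} % of BUSCO genes found")
print(f"K. proteae:  {k_proteae:.2f} % of BUSCO genes found")

# Placeholder library (NOT from the study): 1,000,000 reads of 100 bp
# against the reported K. capensis assembly size of 35,537,816 bp.
example_coverage = lander_waterman_coverage(1_000_000, 100, 35_537_816)
print(f"Example coverage: {example_coverage:.2f}x")
```

Summing the per-library coverage values obtained in this way gives a combined figure of the kind reported above.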
The extracted mRNA of K. proteae was sequenced using an Ion PI™ Chip on the Ion Proton™ System (Life Technologies, Carlsbad, CA) at the Central Analytical Facility (CAF), Stellenbosch University, South Africa. The >49 million raw RNA-Seq reads were mapped to the K. capensis genome in CLC Genomics Workbench and assembled with Trinity 2.0.6 [37] using the genome-guided option.
Table footnote: The total is based on the total number of protein-coding genes in the genome.

Genome annotation
Genome annotation was performed with the MAKER 2.31.8 pipeline [38,39], using custom repeat libraries for each species constructed with RepeatScout 1.0.5 [40] and two de novo gene predictors, SNAP 2006-07-28 [41] and AUGUSTUS 3.0.3 [42]. The assembled K. proteae RNA-Seq and predicted protein and/or transcript sequences from 22 sequenced Sordariomycete species (Additional file 3: Table S3), including two Microascalean fungi, were provided as additional evidence. AUGUSTUS was trained with the assembled K. proteae RNA-Seq data and subsequently MAKER was used to annotate the largest scaffold of the K. capensis and the largest scaffold of the K. proteae assembly, independently. After manually curating all the gene predictions on these scaffolds with Apollo 1.11.8 [43], SNAP was trained with the curated gene predictions of each scaffold and the scaffolds were re-annotated. SNAP was retrained for each species individually and subsequently both genomes were annotated. EuKaryotic Orthologous Group (KOG) classifications were assigned to the predicted proteins through the WebMGA [44] portal that performs reverse-position-specific BLAST [45] searches on the KOG database [46]. Additional functional annotations were predicted with InterProScan 5.13-52.0 [47,48], SignalP 4.1 [49] and TMHMM 2.0 [50].

Genome properties
K. capensis and K. proteae have similar genome sizes at 35.54 and 35.49 Mbp, respectively. It was possible to assemble the K. capensis genome into 29 scaffolds larger than 1000 bp, whereas the number of scaffolds above this threshold achieved for K. proteae was 133. Both genomes had a GC content of 52.8 %.
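Summary statistics of this kind (number of scaffolds above a length threshold, total assembly size, and GC content) can be derived directly from a scaffold FASTA file. The following Python sketch shows one way to do so; it is not the procedure used in this study. The function names are hypothetical, the threshold is applied as "length at least 1000 bp", and GC content is computed over unambiguous A, C, G and T bases only, which is an assumption about how gap (N) characters are treated.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {scaffold_id: sequence} dictionary."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def assembly_summary(scaffolds: Dict[str, str], min_length: int = 1000) -> dict:
    """Count scaffolds of at least min_length bp and compute size and GC content."""
    kept = [seq for seq in scaffolds.values() if len(seq) >= min_length]
    total_bp = sum(len(seq) for seq in kept)
    gc = sum(seq.count("G") + seq.count("C") for seq in kept)
    acgt = sum(seq.count(base) for seq in kept for base in "ACGT")  # ignores N
    return {
        "n_scaffolds": len(kept),
        "total_bp": total_bp,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


# Toy input: one 1200 bp scaffold (50 % GC) and one 40 bp scaffold that is discarded.
toy_fasta = [">scaffold_1", "GCAT" * 300, ">scaffold_2", "ATAT" * 10]
summary = assembly_summary(parse_fasta(toy_fasta))
print(summary)
```

For a real assembly, the lines would be read from the FASTA file, for example with `open(path)` passed to `parse_fasta`.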
A total of 7940 protein-coding genes were predicted for K. capensis and 8173 for K. proteae. In addition, 137 tRNA and 30 rRNA genes were predicted for K. capensis, and 116 tRNA and 27 rRNA genes for K. proteae. More than 74 % of the protein-coding genes of each species could be assigned a putative function via the KOG and Pfam databases. The contents of the two genomes are summarized in Tables 3 and 4.
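The annotated proportions follow from dividing the number of functionally classified genes by the number of predicted protein-coding genes. The short Python sketch below reproduces this arithmetic using the counts given in the abstract; the function name is hypothetical.

```python
def percent_annotated(annotated: int, predicted: int) -> float:
    """Percentage of predicted protein-coding genes with a functional annotation."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return 100.0 * annotated / predicted


# (functionally annotated genes, predicted protein-coding genes), from the abstract.
counts = {
    "K. capensis": (6192, 7940),
    "K. proteae": (6093, 8173),
}

percentages = {
    species: percent_annotated(annotated, predicted)
    for species, (annotated, predicted) in counts.items()
}
for species, value in percentages.items():
    print(f"{species}: {value:.2f} % functionally annotated")
```

Rounded to two decimals, these values match the 77.98 % and 74.55 % stated in the abstract.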