Globally distributed root endophyte Phialocephala subalpina links pathogenic and saprophytic lifestyles

Background Whereas an increasing number of pathogenic and mutualistic ascomycetous species were sequenced in the past decade, species showing a seemingly neutral association such as root endophytes received less attention. In the present study, the genome of Phialocephala subalpina, the most frequent species of the Phialocephala fortinii s.l. – Acephala applanata species complex, was sequenced for insight in the genome structure and gene inventory of these wide-spread root endophytes. Results The genome of P. subalpina was sequenced using Roche/454 GS FLX technology and a whole genome shotgun strategy. The assembly resulted in 205 scaffolds and a genome size of 69.7 Mb. The expanded genome size in P. subalpina was not due to the proliferation of transposable elements or other repeats, as is the case with other ascomycetous genomes. Instead, P. subalpina revealed an expanded gene inventory that includes 20,173 gene models. Comparative genome analysis of P. subalpina with 13 ascomycetes shows that P. subalpina uses a versatile gene inventory including genes specific for pathogens and saprophytes. Moreover, the gene inventory for carbohydrate active enzymes (CAZymes) was expanded including genes involved in degradation of biopolymers, such as pectin, hemicellulose, cellulose and lignin. Conclusions The analysis of a globally distributed root endophyte allowed detailed insights in the gene inventory and genome organization of a yet largely neglected group of organisms. We showed that the ubiquitous root endophyte P. subalpina has a broad gene inventory that links pathogenic and saprophytic lifestyles. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3369-8) contains supplementary material, which is available to authorized users.


Background
Plant roots are confronted with the colonization of symbiotic fungal species ranging from pathogens to mutualists [1]. While research has largely been focused on the symbiotic and pathogenic interactions, seemingly neutral associations with plant roots by endophytes received less attention [2,3]. Endophytic fungi colonize functional roots tissues but disease symptoms do not develop at all or at least not for prolonged periods of time [4]. Despite their prevalence in many ecosystems, little is known about the nature of interaction with their hosts [5,6]. It is assumed that they behave along the mutualism -antagonism continuum depending on host conditions and environment, as shown for some mycorrhizal fungi [7][8][9].
Species belonging to the helotialean Phialocephala fortinii s.l. -Acephala applanata species complex (PAC) are the dominant root endophytes in woody plant species [5]. PAC shows a global distribution as PAC species colonize roots from arctic to subtropical plant species throughout the northern hemisphere [10][11][12]. Recently, a study proved the presence of PAC on the southern hemisphere [13]. In contrast to ectomycorrhizal species (EcM), which are usually confined to primary, nonlignified roots, PAC can be found in all parts of the root system, predominatly on fine roots < 3 mm where up to * Correspondence: christoph.gruenig@microsynth.ch † Equal contributors 1 80% of randomly sampled roots were colonized [5]. In addition, PAC species are belonging to the first colonizers of tree seedlings in natural forest ecosystems, infecting them within a few weeks after germination [14].
PAC is composed of more than 15 cryptic species [10], eight of which were formally described [15,16]. Among the strains sampled from a single study site, PAC species form communities of up to 10 sympatrically occurring species. In contrast to agricultural ecosystems, PAC communities remain stable for several years [17] although minor long-term changes in the frequency of species can be observed [18]. Most PAC communities are dominated by few species and additional species occur at low frequencies [5] as observed in many other community structures of biological systems [19]. Species diversity and community structure do not correlate with the tree community, geographical location, soil properties, management practices nor does climate, precipitation and temperature [10]. Host specificity of PAC species was not observed [5] except for A. applanata that was almost exclusively isolated from species belonging to the Pinaceae but rarely from ericaceous roots of the ground vegetation [14].
Despite the fact that PAC species are highly successful colonizers of plant roots and widely distributed in natural ecosystems, their ecological role is still poorly understood.
They were described as beneficial, neutral or pathogenic for different hosts, growing conditions and fungal strains [5,20,21]. New results comparing the effect of PAC species and strains on Picea abies indicate that the outcome of the interactions is mainly driven by the fungal genotype and follow the antagonism-mutualism continuum. Whereas some of the PAC/P. subalpina strains were shown to be pathogenic and killed most of the seedlings, others were benign [22]. Nevertheless, infection by PAC is costly for the plant since an increase in plant growth was never observed [22]. The outcome of PAC-host interactions depends on the ability of PAC strains to invade and colonize host root tissues. This is evident by the health status of Norway spruce seedlings, which negatively correlates with the biomass of the fungus in roots [22,23]. However, negative effects of harmful PAC strains are reduced by the co-colonization of ectomycorrhizal fungi and other PAC strains [24].
The dynamics of PAC-host colonization was rarely studied [25][26][27], and data on intraspecific variation of colonization dynamics for different PAC species is missing completely. In general, PAC strains form hyphopodia-or appressoria-like structures to enter root hairs or epidermal cells (Fig. 1a, b). After entering the root, PAC strains grow inter-and intracellularly and colonize the cortex but rarely invade the vascular cylinder (Fig. 1c, d). Intercellular  Queloz, unpublished). Figure 1a, b. Surface colonization and appressoria/hyphopodia formation of P. subalpina on Cistus incanus roots. Figure 1c. Cross-section of Pinus strobus root colonized by PAC stained using borax, methylene blue and toluidine blue and counterstained with basic fuchsine. The fungus completely colonizes the cortical tissue up to the endodermis. Accumulation of phenolic compounds in the vascular cylinder is evident by the presence of dark granular structures. Figure 1d. Example of intracellular colonization of P. subalpina in C. incanus cortex (arrow) labyrinthine fungal tissue resembling the Hartig net in ectomycorrhizal fungi as well as mantel-like structures were occasionally observed for PAC [27][28][29].
Host defense reactions such as cell-wall appositions or lignituber formation were rarely observed [27]. Intracellular hyphae traverse host cells by narrow penetration hyphae without apparent lysis of the plant cell wall while the host cell cytoplasma disintegrates after colonization by PAC hyphae (Fig. 1d). Pertaining to the ultrastructural level, hyphae are not surrounded by host-derived perifungal membranes, which are regarded as hallmark for biotrophic fungal associations [27]. Finally, cortical cells of the plant are often associated with thick-walled, heavily melanized fungal cells, i.e. microsclerotia, which were shown to accumulate reserve substances [5,25].
The availability of genomic sequences provides information on the gene inventory of species and identifies characteristic genomic structures and gene sets associated with different lifestyles [30][31][32][33]. Although the number of sequenced fungal genomes increases rapidly, genome sequencing of ascomycetes was mostly restricted to pathogenic, saprophytic or well-known mutualistic species. In contrast, only few endophytes were sequenced and restricted to endophytes in the Clavicipitaceae [34,35]. Clavicipitalean endophytes are obligate biotrophs, colonize their hosts systemically and follow a very close symbiotic lifestyle with their hosts but roots are not colonized [36]. This sets them apart from non-clavicipitalean endophytes isolated from all parts of the plants at high frequencies.
An exception is the genome of Phialocephala scopiformis, a foliar endophyte, for which the draft genome was recently published albeit with no analysis [37]. In the present study, the genome of P. subalpina, the most frequent species of the PAC, was sequenced and compared to the genomes of 13 ascomycetous species with different lifestyles to get first insights in the genome structure and gene inventory of non-clavicipitalean endophyte.

Phialocephala subalpina holds a large and feature-rich genome
The genome of P. subalpina strain UAMH 11012 was sequenced at 25 x coverage using the Roche/454 GS FLX technology. Sequence data was assembled into 204 scaffolds (excluding rDNA repeat and mtDNA) with a genome size of 69.7 Mb and an average GC content of 45.9% (Table 1). The complete rRNA repeat (part of this assembly) and the mtDNA (http://www.ebi.ac.uk/ ena/data/view/JN031566) [38] were assembled manually and validated by Sanger sequencing. Data sets are accessible at http://pedant.helmholtz-muenchen.de/ genomes.jsp?category=fungal. The genome and annotation data was submitted to the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena/data/view/ FJOG01000001-FJOG01000205). Mapping of reads data against the assembled genome did not reveal any significant deviations from the average 25x coverage except for the mtDNA (2,629x) and rDNA repeat (1,422x) indicating that no repetitive sequences were collapsed in short scaffolds leading to an underestimation of the genome size.
The annotation pipeline and manual validation resulted in 20,173 gene models. In addition, 91 tRNA genes coding for all amino acids and 20 5S rRNA loci were identified. None of the expected single-copy core orthologous genes found in eukaryotes (248 and 246 genes, see material and methods) was missing in the protein predications for P. subalpina indicating that the core gene inventory had completely been covered. This was supported by EST data as 28,045 out of 28,092 assembled transcripts (99.83%) mapped to the assembly with high coverage (Additional file 1). 73.6% of the 20,173 proteins showed an identity >30% with known proteins in the Similarity Matrix of Proteins database (SIMAP) [39]. The remaining 5,313 proteins with less than 30% identity included 4,233 speciesspecific P. subalpina proteins. No significant length difference was observed among the high identity and the low identity genes (Additional file 2A). Moreover, mapping of the 4,233 proteins against the 454 EST dataset and RNA-Seq data showed that 2,833 of these gene models were covered by ≥50% of the ORF length with EST/RNA-Seq data (Additional file 2B). A classification scheme of the gene models based on different analysis is given in Fig. 2.

Classification of repetitive elements
Repeats that could be classified as transposable elements (TEs) accounted for 5.7% of the genome sequence (Table 2). In contrast, 1.8% of repeats identified in RepatScout analysis could not be classified to any TE family. TE were dominated by Class I elements of the Copia and Gypsy families attributing for 55% of all TEs. In contrast, class II TEs were less abundant and were dominated by Tc1 Mariner and Helitron elements. A large fraction of the repeat consensus sequences of the RepeatScout analysis could not be assigned to any TE family. TEs of all classes/families were evenly scattered throughout the genome of P. subalpina and no evidence for TE-rich islands was observed (Fig. 3). Besides the TEs, the genome of P. subalpina also included lowcomplexity sequences, tandem as well as simple sequence repeats (total 8.1% of the genome).

Presence of RNAi pathway and evidence for RIP mechanisms
Homologs of DICER, ARGONAUT and RNA-dependent RNA polymerase genes were found in multiple copies in the genome of P. subalpina and comparison with other ascomycetes showed that the copy number of each of the three genes was higher than with other ascomycetes (Table 3). Similarly, several copies of cytosine methyltransferase gene of the Dnmt1 family were present in the genome of P. subalpina (Table 3 A clear difference in the dinucleotide frequencies was observed in repeat versus genomic control regions and the difference in dinucleotide frequencies was more pronounced in P. subalpina than in S. sclerotiorum (Fig. 4). TpA dinucleotides were significantly overrepresented in repeats whereas CpA/TpG were underrepresented suggesting a dominant mode of RIP for CpA to TpA mutations. In addition, repeat regions were generally rich in AT content as also ApA, ApT and TdT were more frequent in repeats than in non-repeat genomic regions (Fig. 3).
The genome indicates horizontal gene transfer (HGT) events from bacteria co-occurring in the same habitat Twenty one out of 163 genes, originally with a non-fungal best SIMAP hit, showed a skewed taxonomic distribution or higher bit scores with non-fungal taxa in BLAST searches against the NCBI non-redundant protein (nr) database (Table 4). Phylogenetic analysis for these 21 genes showed that they likely result of HGT as the gene trees significantly deviate from the expected species tree (for three examples see Fig. 5b-d; see also http://purl.org/ phylo/treebase/phylows/study/TB2:S20196). In contrast, the RPB2 gene which was used as a control in the analysis, showed the expected ascomycetous phylogeny (Fig. 5a). In the majority of HGTs, the protein sequences taken from the phylogenetically closest non-fungal species were derived from soil-borne bacteria (i.e. Collimonas arenae or Brevibacillus laterosporus) or to live in the rhizosphere (i.e. Frankia sp.) and share therefore the same habitat as P. subalpina. In 12 of the phylogenies, one or few of the closest relatives were of fungal origin while most of the  Table 2), putative paralogous gene models in P. subalpina were analyzed using the Uclust algorithm, low identity gene models showing <30% identity in SIMAP searches and gene models including ≥1 InterPro accessions. Five hundred eighty four singlecopy and high-identity genes not including InterPro accessions were not covered by one of the four characteristics  Fig. 5d).

Key secondary metabolite genes
The genome of P. subalpina encodes a large number of genes coding for putative secondary metabolite (SM) key enzymes (Table 5). 75.8% of the SM key genes in P. subalpina clustered with putative orthologous genes in the other 13 species without any obvious dominance of the closely related Leotiomycete species, i.e., P. subalpina shared most orthologous clusters with Aspergillus flavus. Numerous SM key genes are clustered in putative secondary metabolite loci including genes for acyl-, and methyltransferases, oxidoreductases, cytochrome P450 monooxygenases, or transcription factors (Table 5 &  Additional file 3). Among the eight genes for non-reducing Type I PKSs, two gene clusters were identified in P. subalpina probably involved in the melanin synthesis pathway. Whereas PAC_11435 was placed in the same QuartetS cluster as G. lozoyensis PKS1, PAC_07895 showed the highest similarity with the gene of A. alternata encoding alm and the gene of A. fumigatus encoding Alb1/PksP. In addition, PAC_07895 was placed in a second QuartetS cluster. Adjacent to both PKS genes, a putative hydroxynaphthalene reductase gene was found (PAC_11432, PAC_07896). However, the two putative scytalone dehydratases (PAC_18365, PAC_19872) in the P. subalpina genome were not located within either cluster. Presence of putative orthologous genes for the two PKS for the 13 ascomycetous species included in the comparative analysis (Table 6) showed that only melanized species were included in these two clusters. Further, genes of other Leotiomycete species, such as S. sclerotiorum, B. cinerea and M. brunnea, were represented in both clusters as P. subalpina. Two NRPS genes were identified that are likely encoding siderophore synthetases (PAC_05248 and PAC_13158). Besides key enzymes in secondary metabolism, the genome of P. subalpina also encodes a fucose-specific lectin (PAC_07514) with similarities to the AAL protein of Aleuria aurantia that was shown to protect the fungus against predators and parasites [40].
Comparative genome analysis proves different lifestyles and enlarged gene families in P. subalpina To explore the unexpectedly large gene set, the proteome was compared to 13 ascomycetous proteomes ( Table 6) using orthologous cluster analysis, InterPro motif occurrence and a review of Carbohydrate-active enzymes (CAZymes). A total of 174,555 predicted proteins were grouped into 20,555 clusters of corresponding putative orthologous genes. Proteins of P. subalpina were present in 12,932 clusters (62.9%), significantly more than for any other investigated species. 1,408 clusters included proteins of all 13 species and  P. subalpina covering core functions in the primary metabolism, energy supply and cell cycle. 163 Quar-tetS clusters were mostly restricted to pathogens, whereas 61 clusters were predominantly found in saprophytic species. Principal component analysis based on these two datasets showed that P. subalpina was either placed within the pathogens or close to the saprophytes (Fig. 6a, b). Moreover, P. subalpina also shared the highest number of QuartetS clusters with the two mycorrhizal species (Table 7, Fig. 6c). The most frequent FunCat annotations for the pathogenand saprophyte-related clusters showed that the secondary and C-metabolism was most often included but also several putative orthologous proteins classified as virulence and disease factors were recognized (Table 8)      subalpina either in the pathogenic cluster or the saprophytic cluster (Fig. 7a, b). Enrichment of GO terms for the overrepresented InterPro accessions in P. subalpina compared to average number of InterPro accessions found in 13 ascomycetous genomes. Of the 963 overrepresented InterPro accessions 57% mapped to one or several GO terms and the table summarizes the GO terms with the highest numbers of distinct InterPro accessions.
Eight hundred eighty one gene models in P. subalpina were classified in 138 different CAZyme families resulting in 998 CAZyme modules, which are significantly more than observed in the other ascomycetous species. The second most frequently found CAZyme modules (883 modules) were encoded by F.     (Fig. 8).
Especially modules involved in pectin breakdown were encoded in high copy numbers in the genomes of both P. subalpina and F. oxysporum but P. sublpina included also a higher number of modules for cellulose and hemicellulose degradation (Additional file 5). The ectomycorrhizal species T. melanosporum and C. geophilum as well as the obligate biotrophic species B. graminis were separated due to the small number of PCWDEs (Fig. 8).

Discussion
In the present study we sequenced the genome of the root endophyte Phialocephala subalpina belonging to the Phialocephala fortinii s.l. -Acephala applanata species complex one of the most prevalent species in forest ecosystems. By comparative genomic analysis with the gene inventory of 13 other ascomycetous species we show that P. subalpina links pathogenic and saprophytic lifestyles.
Genome expansion due to a large set of distinct genes in gene families With a genome size of approximately 69.7 Mb, the P. subalpina genome is significantly larger than the average genome size of previously sequenced ascomycetous species [41]. Genome expansions can be caused by various events including (i) genome duplications, (ii) invasion of autonomous elements such as TEs and expansion of repetitive sequences such as microsatellites and tandem repeats into the genome, (iii) the number and length of introns and/or (iv) the expansion of the gene inventory [42,43]. No evidence for large segmental duplications were observed in the P. subalpina genome by genome-wide alignments using CoCoNUT [44] and the frequency of repeats in general and TEs in special was small, considering the genome size (Additional file 6). Proliferation of TE is counteracted by three processes which act at different stages of TE proliferation. Repeat induced point mutations (RIP) act on the DNA level by introducing C to T transitions and reversing CpG to TpA dinucleotides in repeated regions [45,46]. MIP (Methylation induced premeiotically) de-novo methylates repeated DNA sequences [47] and RNA interference (RNAi) suppresses TE proliferation either by heterochromatin assembly or small interfering RNAs, which silence TE transcripts [48][49][50]. Indirect evidence that a RIP mechanism is or was active in P. subalpina stems from the skewed dinucleotide distribution in repeat For abbreviation of species see Table 2 Table  [51] but no sexual stage is known in P. subalpina. However, several lines of evidence suggest that sexual reproduction regularly occurs as (i) most PAC populations showed no gametic disequilibrium, (ii) mating types do not deviate from a 1:1 ratio in PAC populations and (iii) strong purifying selection was recorded in the mating type loci [52]. Moreover, teleomorphs are known for phylogenetically closely related species such as Phaeomollisia piceae or Mollisia spp. [53]. Therefore, it seems likely that field studies have overlooked the teleomorph of PAC species so far [52].
Common to the RIP/MIP process is that cytosine DNA methyltransferase of the Dnmt1 family play a key role [45,54]. Homologs of cytosine DNA methyltransferases were present in P. subalpina including two gene models closely related to N. crassa RiD and one protein related to N. crassa Dim2. A fourth gene model found in P. subalpina (PAC_02147) was closest related to MASC2 of Ascobolus immersus which was placed in a cluster exclusively with basidiomycete species in the study of Amselem et al. [54]. However, additional analysis showed that also other Leotiomycete species included MASC2 homologs (B. cinerea  RNAi key enzymes such as Dicer, Argonaute and RdRP in multiple copies. Based on these findings we hypothesize that P. subalpina has a well-developed arsenal of defense mechanisms in place to counteract the proliferation of TEs which may explain the comparative low frequencies of TEs. No significant differences in intron length and the average intergenic distance were observed. However, the gene inventory was one of the largest among ascomycetes with 20,173 annotated gene models resulting in 2,500 to >10,000 more gene models than in other ascomycetes. In the light of the broad host range and the broad geographical distribution of species belonging to the P. fortinii s.l. -A. applanata species complex [1, 3, 5, 10] an enlarged gene repertoire could be expected. However, even compared to other fungal species with broad host ranges such as S. sclerotiorum, B. cinerea and F. oxysporum the number of gene models is large [55,56]. The high numbers of gene models may be the result of the annotation of a large fraction of very short gene models [55] or gene models including TE fragments. In P. subalpina, no significant deviation in protein length distributions for low identity genes was observed and a large fraction of the low identity genes were covered by ESTs. Moreover, particular attention was paid to mask TEs. Therefore, the expansions in the gene inventory of P. subalpina do not result from annotation artefacts. Several factors contributed to the high number of gene models found in P. subalpina. First of all, P. subalpina showed a slightly higher fraction of putative paralogous genes than observed on average for the other 13 ascomycetous species. F. oxysporum showed the highest fraction of putative paralogous genes and this observation is due to large segmental genome duplication [56]. Secondly, a significant number of gene models in P. subalpina were species-specific as they showed no significant hits in SIMAP analysis. The high fraction of speciesspecific genes may be the result of the lifestyle of P. subalpina as well as the missing genome data of closely related species in the Loramyces -Vibrissea clade [53,57]. Indeed, if the supposedly species-specific genes of P. subalpina were blasted against the recently announced genome of the closely related P. scopiformis [37], 28% of the gene models showed significant blast hits (>1.0 E -14 ). Thirdly, significantly more InterPro accessions were annotated in P. subalpina including 13,074 gene models than in any other species used for comparison. The high number of annotated InterPro accessions in the gene inventory did, however, not result in an expanded functional catalogue as the number of distinct InterPro accessions per species reached a plateau at approx. 5,200-5,500 distinct accessions indicating that "more of the same" is present in P. subalpina.
Our analyses show that the gene inventory of P. subalpina was also expanded to some extent by HGT from non-fungal organisms. A systematic analysis of the acquired prokaryotic genes by 60 fungal genomes by Marcet-Houben & Gabaldón [58] showed that species in the Pezizomycotina exhibited a high incidence of interdomain HGT including between 4 and 63 proteins per species. The 21 HGT events found in P. subalpina likely represents a lower limit of events and closer inspection of uncertain candidates will likely reveal additional HGT events. Moreover, HGT events from other fungal species were shown to be another significant source for HGT [59,60]. The phylogenetically closest protein sequences often originated from species in the Burkholderiales or Actinomycetales that colonize soil and/or roots, i.e., they share their habitat with P. subalpina and rendering a HGT event likely.
In almost half of the cases, the HGT event was unique or occurred recently. In contrast, also older HGT events were detected that were characterized by the presence of multiple fungal species within a clade of non-fungal species [60]. Interestingly, O. maius and P. scopiformis shared several of the 21 studied HGT events. It is possible that the HGT event occurred in a common ancestor as both fungi are related to P. subalpina (Fig. 6a). Alternatively, it might be possible that these genes were introduced twice independently for O. maius as O. maius can also be found in similar habitats as P. subalpina. It is associated with roots of ericaceous shrubs and involved in the decomposition of sphagnum peat [61].  Genes transferred by HGT can drive important evolutionary innovations as shown for plant pathogens [60,[62][63][64]. This was also observed in P. subalpina as several proteins were hydratases or peptidases. Notably, two β-lactamases were included that are involved in the detoxification of β-lactam antibiotics [65] and may result in a competitive advantage against other microbes.

The chameleonic genome of P. subalpina
The ecological role of members of the P. fortinii s.l.-A. applanata species complex is still poorly understood. They were described as beneficial, neutral or pathogenic for different hosts, growing conditions and fungal strains [5,20] and even a role in mycorrhizal associations was hypothesized [21]. In order to shed light on the lifestyle of P. subalpina we compared the genome against 13 genomes of other ascomycetous species with different lifestyles. All our analyses showed that the gene inventory of P. subalpina supports multiple lifestyles. First, P. subalpina shared more putative orthologous genes with the two ectomycorrhizal species included in the analysis than any other species indicating some affinities with the two species. Nevertheless, an important difference is evident. EcM fungi as well as obligate biotrophic pathogens are generally characterized by a reduced gene inventory especially for plant cell wall degrading enzymes (PCWDEs) [32]. However, P. subalpina encodes a high number of PCWDEs. Moreover, several EcM fungi and obligate biotrophic pathogens show genome expansions due to TE proliferation [30,66,67] which was absent in P. subalpina.
In contrast to EcM fungi, saprophytes and necrotrophic/ hemibiotrophic pathogens are well endowed with enzymes involved in the degradation of plant material [32]. The main difference between these two groups is that the pathogenic species must have a specific gene inventory allowing them to invade hosts and overcome plant immune response, i.e., pathogen-associated molecular patterns (PAMP) or damage-associated molecular patterns (DMAP) [68,69]. Whereas necrotrophic pathogens kill the host tissue by secreting effectors like toxins and/or proteins, hemibiotrophic pathogens grow intracellularly. and form specialized structures such as haustoria for nutrient uptake [70]. Recent comparative genome analysis showed that there are only few genes associated with plant pathogens that are absent in non-pathogens [33,71]. However, domains overrepresented in necrotrophic/hemibiotrophic pathogenic species compared to saprophytes were identified [71]. When our dataset was enriched for overrepresented InterPro accessions in pathogens only few of the accessions were in accordance with Soanes et al [71]. However, a closer look showed that also of the accessions listed by Soanes et al [71] were also overrepresented in our dataset yet with a factor <2x. Irrespective of which InterPro accession list was used for analysis, P. subalpina was placed close to the pathogenic species (Fig. 7b and Additional file 7), indicating the robustness of the analysis. The list included genes with protease/peptidase domains, cutinases, pectinases, genes including a necrosis inducing domain or genes with chitinbinding modules. Many of these gene classes were shown to play a pivotal role in plant-pathogen interactions [68,70]. For example, PAMP-induced host chitinases can either be overcome by the action of secreted proteases [72] or the secretion of LysM effectors that may be coupled with a chitin-binding module [73,74]. The finding that P. subalpina includes the repertoire of pathogenic species fits well with the recent host-fungus interactions studies showing that PAC strains behave along the antagonism-mutualism continuum and are localized rather towards antagonistic interactions as the colonization results in reduced biomass accumulation of the host [22]. However, strong strainspecific differences were observed in the outcome of the interaction with some strains killing the majority of the seedlings and others only marginally affecting the host [22] and future studies are needed to understand the underlying mechanisms. Despite the pathogenic gene repertoire observed in P. subalpina, host defense mechanisms such as lignituber formation and cell wall appositions are rarely observed during the invasion of PAC strains indicating that PAC can manipulate/suppress the plant immune response [25]. Although the precise mechanisms how P. subalpina suppresses host induced defense mechanisms are unknown, effectors such as small secreted proteins (SSPs) predicted in the genome of P. subalpina may be candidates as they were shown to function as effectors in plant-fungal interactions [70,75] and genome-wide differential gene expression studies during host colonization will help to identify possible effectors.
Beside the pathogen-related gene repertoire, our analysis shows that P. subalpina has also affinities with saprophyte-specific genes indicating that the species includes the signature of both lifestyles in the genome. A very similar positioning in the analysis was observed for F. oxysporum. Indeed, F. oxysporum not only includes the >70 pathogenic variants but non-pathogenic strains of F. oxysporum were also isolated as endophytes from asymptomatic roots [76][77][78].
Both species share a high number of β-lactamase/β-lactamase-related genes with saprophytes involved in the detoxification of β-lactam antibiotics. β-lactamases are well known fungal defense effector proteins [79], and detoxifying β-lactam antibiotics help to persist against antagonists. Besides the β-lactamase genes, those coding for sulfatases and different hydrolases were also enriched in saprophytic species including P. subalpina and F. oxysporum. In contrast to most pathogenic species, P. subalpina showed also an enlarged repertoire of PCWDEs involved in cellulose and hemicellulose breakdown as well as proteins with auxiliary activities supposed to be involved in lignin breakdown. Indeed, a strain of P. fortinii s.l. was shown to cause soft rot in autoclaved wood of beech and conifer species indicating that members of the PAC can degrade lignin, cellulose and hemicellulose [1].
Why does P. subalpina stand out of the crowd?
The answer of two fundamental questions is still pending: (i) why are PAC species so amazingly successful in colonizing their hosts and (ii) does the host also benefit from the colonization by PAC. At least the second question may be answered by the genome sequence of P. subalpina.
Although often hypothesized, endophytic fungi of forest trees were rarely shown to be mutualistic for their hosts [4]. However, the closely related needle endophyte P. scopiformis was shown to produce rugulosin, a potent secondary metabolite against herbivory [80,81]. Similarly, interaction studies using pathogens (Phytophthora plurivora and Elongisporangium undulatum), P. subalpina strains and P. abies seedlings showed that some of the P. subalpina strains effectively reduced mortality and disease intensity caused by the pathogens [82]. In addition, secondary metabolites were identified in PAC species that inhibited Phytophthora spp. [82], and the genome of P. subalpina encodes a high number of secondary metabolite key enzymes. Some of these are compatible with known pathways and products, such as melanin or ferrichromelike siderophores. For example, enzymes PAC_05248 and PAC_13158 resemble SidC-like (=type II) and NPS1-like (=type IV) synthetases, i.e., two distinct types of ferrichrome-like siderophore-producing enzymes [83]. Also, two non-reducing type I PKSs of P. subalpina flanked by hydroxynaphthalene reductases likely involved in the melanin synthesis were recognized. Whereas gene PAC_1135 was placed in the same putative orthologous gene cluster as PKS1 of G. lozoyensis, gene PAC_07895 was placed in a second orthologous gene cluster and showed high similarities with the alm gene of A. alternata/the PksP/Alb1 gene of A. fumigatus. All three proteins were experimentally shown to be involved in the melanin production [84][85][86]. Several Leotiomycete species included genes in both clusters (i.e. B. cinerea, S. slcerotiorum, M. brunnea), whereas melanized non-helotialean species and G. lozoyensis were included in one of the clusters (i.e. C. geophilum, O. maius and C.globosum). Moreover, no gene models of non-melanized species were included in one of these two clusters. To the best of our knowledge this is the first report for the duplication of the melanin pathway in ascomycetous fungi, although the redundancy of important secondary metabolite genes was reported previously [87,88]. Still, the products of many of the secondary metabolite key enzymes and clusters remain unresolved. However, they may represent the chemical language of P. subalpina to interact via small molecules with host plants and other microorganisms in the rhizosphere.
Although the biomass and turnover rates of fine roots in forest ecosystems largely depend upon tree species, age of forest stands and climate, estimates indicate that as much as 30% of the net primary production is used for fine root production [89]. In an Abies alba stand for example >6.0 t/ha fine roots biomass with a cumulative length of > 20,000 km/ha was estimated [90] showing the importance of fine roots as carbon source. Given the fact that species of the P. fortinii s.l. -A. applanata species complex dominate many of the endophytic assemblages in fine roots of temperate and boreal forests [10], PAC species have access to a very substantial carbon source. P. subalpina can already colonize the future carbon source by entering healthy fine roots and may then switch to the saprophytic lifestyle as soon as the roots die off. Indeed, the signature of both lifestyles was observed in the genome although studies about the importance of PAC species in the fine root turnover are missing. Moreover, the fungus can escape the highly competitive soil community by colonizing the roots [91].

Conclusions
The analysis of a globally distributed root endophyte allowed for detailed insights in the gene inventory and genome organization of a yet largely neglected group of organisms. Our analysis showed that the genome of P. subalpina has a versatile genome including genes for both a pathogenic and saprophytic lifestyle but showed also some affinities with ectomycorrhizal species. The degree of pathogenicity among strains of P. subalpina is high, as observed in F. oxysporum. In F. oxysporum pathogenicity is driven by mobile pathogenicity chromosomes [56]. Re-sequencing of multiple strains of P. subalpina will help identify the molecular basis of its pathogenicity. In addition, one central question will be to understand the evolutionary trajectory of PAC, i.e.whether PAC will become more pathogenic in future which would have a severe effect on forest ecosystems health, or whether the PAC-host interaction gets less antagonistic.

Selection of Phialocephala subalpina strain and DNA isolation
Strain UAMH 11012 (UAMH Centre for Global Microfungal Biodiversity, Toronto, Canada) was used for genome sequencing. The strain was originally isolated as single hyphal tip culture from P. abies fine roots in an undisturbed forest in Switzerland [14] and was classified using multiple classes of molecular markers [16,38,92]. The strain was grown in malt extract broth (50 ml 2% (w/v)) for 10 days at 20°C under constant shaking.
Mycelium was harvested by filtration, lyophilized, and total DNA was isolated using a CTAB-based protocol [93]. Strain identity was verified using microsatellite analysis [94] before the genome sequencing.

Sequencing, assembly and gap closing
A whole genome shotgun strategy (WGS) on the Roche/ 454 GS FLX (454) was used and sequencing was performed at the Functional Genomic Centre of University Zürich/ETH Zürich (FGCZ). In total, 1.3 Mio shotgun reads as well as 2.3 Mio reads from one 3 kb paired-end library, 6.2 Mio reads of three 8 kb paired-end libraries and 309,909 reads of a 20 kb library were included in the assembly. The assembly was performed using newbler 2.5 with a minimum overlap of 50 bases and 98% sequence similarity. We noticed that newbler tends to open gaps due to the high number of pair-end reads in the assembly. Therefore, reads mapping to both sides of the gaps were identified and gaps were closed after manual validation.

sequencing of a normalized EST library
The same strain was used to generate a normalized EST library. The fungus was pre-cultivated in 50 ml of 2% malt broth (20 g l -1 malt extract; Difco) for 14 days at 20°C under constant shaking. Then, the mycelium was homogenized with a blender for 30 s and 5 ml of the homogenized mycelium was transferred to new 50 ml of 2% malt broth (20 g l -1 malt extract; Hefe Schweiz). After 48 h the actively growing mycelium was harvested and immediately frozen in liquid nitrogen. Total RNA was isolated from approx. 75 mg fresh mycelium using the RNeasy plant mini kit (Qiagen, Hombrechtikon, Switzerland). Full-length cDNA was synthesized using the MINT kit (evrogen, Moscow, Russia) with a degenerated poly-T primer (5′-AAGCA GTGGTATCAACGCAGAGTAC (T) 4 G(T) 9 C(T) 10 VN-3′) during the first strand cDNA synthesis [95] and polTM1 (5′-AAGCAGTGGTATCAACGCAGAGTACT TTTGTCTTTTGTTCTGTTTCTTTTVN-3′) for the generation of dsDNA. The cDNA was normalized using the TRIMMER kit (evrogen) and the library was sequenced on the 454. Resulting reads were filtered for chimeras and then a whole transcriptome assembly was performed in newbler 2.3 with a minimum overlap of 50 bases and 98% sequence similarity.

Repeat library construction
A repeat library was constructed based on the final assembly of the genome (see Additional file 8). In brief, putative repeat sequences were derived from RepeatScout analysis [96]. Low-complexity sequences and microsatellites were removed using the default filtering options in RepeatScout. In addition, sequences <50 bases and with <10 hit on the genome were excluded from further analysis. The remaining sequences were clustered using blastclust (-S 90 -L 0.9 -b F -p F) and only one sequence per cluster was used for further analysis. Blastx was used to exclude any sequences in the library not belonging to TEs (i.e. HET domain containing proteins or ubiquitin-like proteins). This draft library was then mapped against the genome using RepeatMasker and consensus sequences of complete TEs were derived. Finally, sequences were classified according to the systematic of Wicker et al. [97]. The manually curated TE library was used for genome masking before annotation.

Annotation of the P. subalpina genome
The annotation strategy is presented in Additional file 9. In brief, a reference dataset of 1,089 gene models/proteins covered by full-length EST sequences was established and used as training dataset for Augustus [98]. Ab-initio gene prediction was performed on the masked genome using GeneMark-ES [99], Augustus [100] and FGENESH (Neurospora and Ustilago matrices). Geno-meThreader [101] was used to calculate spliced alignments for P. subalpina ESTs and protein data from related fungal species. The program was run based on a P. subalpina specific splicing model trained using 454 EST dataset with the software BSSM4GSQ [102]. For the training of the splicing model, ESTs were chosen that showed a high coverage >98%, a sequence similarity of 100% and had only one hit on the genome resulting in >4,000 intron/exon junctions.
A total of 28,092 assembled ESTs as well as 27,819 non-assembled but well matching 454 singleton reads were mapped. Protein sequences of Rhychnosporium secalis, S. sclerotiorum, B. cinerea, F. graminareum, N. crassa, Saccharomyces cerevisiae were mapped using GenomeThreader. Finally, Jigsaw [103] was trained based on the 1,089 high confident gene models and used to calculate the best gene model using all the available evidence (predictors, ESTs, and trans-alignments). Subsequently, all gene models were manually curated in Apollo [104] and functional annotation was performed in PEDANT [105].
Estimating the completeness of the P. subalpina genome and classification of gene models The completeness of the P. subalpina genome and annotation was assessed by mapping two separate highly conserved core gene sets including 248 and 246 proteins respectively [106,107]. In addition, the fraction of successfully and non-mapped ESTs was analyzed.
Gene models were classified based on the identity against the best hit in the Similarity Matrix of Proteins database (SIMAP) [39]. Proteins with identities ≥30% were considered as confidential gene prediction. The quality of the low identity genes (proteins with <30% identity with a protein sequence in SIMAP) was assessed by mapping the protein sequences against the assembled 454 EST dataset using GenomeThreader and analyzing the coverage of the gene model by ESTs. In addition, RNA-Seq data was available following the genome sequencing/annotation [108] (http://www.ebi.ac.uk/ena/data/view/ PRJEB12610) was mapped against the genome using tophat (http://ccb.jhu.edu/software/tophat/index.shtml), discarding reads with bad mapping quality. Read coverage was calculated using coverageBed (http://bedtools.readthedocs.org/ en/latest/content/tools/coverage.html) for each exon within the coding sequence. The coverage graph coverage graph was calculated in R using the ggplot2 package [109].

Presence of RNAi pathway and analysis for the presence of RIP mechanism
A reference dataset of key proteins in the RNAi pathway of Neurospora crassa (ARGONAUT, DICER, and RNAdependent RNA polymerases (RdRP)) was used to search for similar genes in P. subalpina as described in Laurie et al. [110]. In addition, the presence of repeat-induced point mutations (RIP) in the genome of P. subalpina was analyzed. As RIP results in C-to-T transitions at repetitive loci [45,111] we analyzed di-nucleotide abundances of all predicted interspersed repeats and of nonrepetitive control sequences using RIPCAL [112]. Overrepresented dinucleotides were identified by determining the fold change of the dinucleotide abundance between repeats and controls. S. sclerotiorum was included as a reference in the RIP analysis. Gene, repeat, and GC content were calculated using a sliding window analysis (window size: 1,000 bp, step size: 1,000 bp) and plotted for each scaffold. In addition, the genome of P. subalpina was mined for cytosine DNA methyltransferase genes of the Dnmt1 family involved in RIP and classified as described in Amselem et al. [54].

Analysis of horizontal gene transfer from non-fungal species
A total of 163 proteins showed the best hit with non-fungal taxa (124 with bacteria, 20 with plants and 17 with metazoa) when mapped against the SIMAP database in PED-ANT. The possibility of HGT for these genes was evaluated. In a first step, a BLAST search against the nr database (query coverage ≥40%; identities ≥20%; best 1,000 hits) was done and the taxonomic distribution and similarities of the hits were analyzed in R. Genes for HGT were selected as candidates if (i) they showed a biased taxonomic distribution of the hits (<15% fungal hits) and/or (ii) the fungal hits showed smaller bit scores in blast searches than non-fungal hits. Candidate genes were further analyzed using a phylogenetic approach. The full protein datasets of the ≤1000 best blastp hits against nr database for each candidate protein were downloaded. In addition, each candidate gene was also blasted against the genome data of phylogenetically closely related fungal genomes (G. lozoyensis, B. cinerea, M. brunnea, S. sclerotiorum). Protein sequences for each candidate gene were clustered with USEARCH v8.0.1517 (http://www.drive5.com/usearch/) [113] using the -cluster_fast option at an identity threshold of 0.95 (pre-sorted by length). If the resulting number of clusters exceeded 40, the threshold was sequentially reduced by 0.05 until cluster numbers were ≤ 40, or the threshold reached 0.5. This procedure was only applied to non-fungal sequences. Fungal BLAST hits were clustered at 0.95 if the number of clusters was ≤ 40. If not, the threshold was adjusted as described. The cluster representative sequence of each cluster was used for phylogenetic analysis. Protein sequences including the P. subalpina sequence were aligned using MAFFT [114] (E-INS-i method) and the alignment trimmed with TrimAl (http://trimal.cgenomics.org/trimal) [115] using the -strict setting. Maximum likelihood phylogenetic trees were inferred using FastTree 2.1 (http://meta.microbesonline.org/fasttree/) [116] with default settings. All trees were deposited in TreeBase (http:// purl.org/phylo/treebase/phylows/study/TB2:S20196). The DNA-dependent RNA polymerase II (RPB2) sequence that is routinely used in phylogenetic studies was used as control. Due to its high conservation, only proteins with > 75% identity were included and clustering done at a 90% threshold.

Secondary metabolism
Key proteins involved in the secondary metabolism of P. subalpina were searched by using conserved InterPro motifs of polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs) and related enzymes such as terpene synthases (TPC), and dimethyl allyl tryptophan synthases (DMATSs) followed by manual inspection of the genes to verify the domain arrangement in the case of PKSs and NRPSs. In addition, putative secondary metabolite clusters were identified by searching for genes encoding tailoring enzymes (e.g., acyl-, and methyltransferases, oxidoreductases, cytochrome P450 enzymes) up-and downstream of genes for key enzymes and searching for and promotors [117].

Comparative genome analysis
Comparative genome analyses were performed against a selection of published genomes of species showing different lifestyles (i.e. saprophytes, bio-and necrotrophic parasites and mycorrhizal species, Table 1). Special emphasis was put on selecting ascomycetous genomes and, whereever possible, genomes that are closely related to P. subalpina. All genomes used for comparative analysis were functionally annotated in PEDANT.
Putative orthologous gene clusters were analyzed using QuartetS [119] and the total number of clusters in which a specific species was present as well as pairwise shared clusters was recorded. Orthologous gene clusters enriched in pathogenic species were searched (four out of the five pathogenic species show entries for the respective cluster and ≤1 species of the six saprotroph species and ≤1 species of the two mycorrhizal species are present in the clusters respectively). Gene clusters enriched in saprophytes were searched using the same strategy. Both matrices were then subjected to principle component analysis using the vegan package in R [120] to analyze the position of P. subalpina compared to the other species. Moreover, FunCat terms [121] were enriched for these lifestyle enriched clusters by mapping P. subalpina geneIDs against the FunCat annotations and selecting the ten most frequent FunCat terms (any geneID/FunCat category was only considered once). In addition, putative orthologous gene clusters shared among the two mycorrhizal species T. melanosporum and C. geophilum with O. maius and P. subalpina was analyzed separately as the limited number of mycorrhizal species did not allow to properly defining lifestyle enriched clusters.
In a second step, the non-redundant portion of Inter-Pro accessions per gene model, i.e. counting each Inter-Pro accession per gene model only once, was analyzed. Besides some general statistics such as the number of distinct InterPro accessions within a species or different lifestyles or the total number of InterPro accessions, the number of significantly over-or underrepresented Inter-Pro accessions in P. subalpina was determined by comparing their abundance in P. subalpina against the average observed in the 13 genomes using a Z-test [122]. Significantly over-and underrepresented InterPro accessions were then mapped against the gene ontology annotations (http://www.ebi.ac.uk/interpro/download.html) and enriched for GO terms. Moreover, InterPro accessions >2x overrepresented in pathogenic species and saprophytic species were mined and the resulting matrices were subjected to principle component analysis as described above.

Funding
This study was partially funded by Vontobel Stiftung, Zürich to CRG. The funding agency had no influence on the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.