Introduction

Over the last few decades, novel molecular techniques and DNA-based sequence data, coupled with phylogenetic analyses, have been used to test traditional taxonomic findings and overcome the difficulties in taxonomic studies (White et al. 1990; Liew et al. 2000; Hyde et al. 2013; Lücking et al. 2014). Traditional molecular tools are most appropriate for cultivatable and fast-growing species isolated from the environment, but DNA from single-spore isolates and fresh specimens can usually also be extracted and sequenced. A combination of traditional characterization and molecular approaches has been used to identify and discover numerous novel species in recent years, e.g. Ariyawansa et al. (2015), Liu et al. (2015), Hyde et al. (2016, 2017a), Li et al. (2016), Tibpromma et al. (2017), and Wanasinghe et al. (2018). Targeting highly variable loci for distinguishing species has resulted in the discovery of cryptic species in molecular phylogenetic studies (Hebert et al. 2003; Divakar et al. 2016).

Species in the environment may not be cultivatable or even visible (Mitchell and Zuccaro 2006; Stewart 2012). DNA metabarcoding has therefore become an important tool and is now commonly used to understand species diversity and community structure in complex communities of microorganisms (Heeger et al. 2018; Jayawardena et al. 2018b). The approach is based on the DNA-barcoding concept and uses the massive amount of sequence data produced through high-throughput sequencing (HTS) techniques. In high-throughput sequencing studies, “environmental DNA” generally refers to DNA extracted from samples such as soil, water, or air. It contains two types of DNA, extracellular and intracellular (i.e., genomic DNA) (Levy-Booth et al. 2007; Pietramellara et al. 2009; Taberlet et al. 2018). Mycologists extract environmental DNA from a community composed of multiple organisms to analyse its composition and structure. Accordingly, they use the term “environmental DNA” synonymously with “metagenomic DNA” in the sense of Handelsman et al. (1998). However, DNA extracts from environmental samples contain extracellular DNA in addition to the intracellular metagenomic DNA. Extracellular DNA may be released from an organism before or after its death. In particular, macroorganisms always leave DNA behind in their environment, e.g. in faeces, hair, urine, and skin cells (Herder et al. 2014). While zoologists introduced the term “eDNA” for this extracellular DNA, mycologists use this abbreviation for environmental DNA in the metagenomic sense. To avoid confusion, we suggest using the abbreviation “mgDNA” (metagenomic DNA) to designate intracellular DNA from environmental samples.

Fungal species which cannot be linked to any physical specimen are referred to as “dark taxa” or “dark matter fungi” (Parr et al. 2012; Grossart et al. 2015; Page 2016; Tedersoo and Smith 2017; Ryberg and Nilsson 2018). In recent years, a large amount of mgDNA ITS sequence data from environmental samples have been deposited in GenBank. These data have neither been linked to any specimens nor given any formal names to genus and species level (Taberlet et al. 2012; Herder et al. 2014; Hawksworth et al. 2016). Although an informal system was assigned for giving codes to species known only from sequence data (e.g. Inocybe sp. 3, Nara 2006; Hibbett et al. 2011), errors in communication have occurred.

A minority of mycologists believe that the problems of dark taxa can be resolved by giving valid names to these taxa under the International Code of Nomenclature for Algae, Fungi and Plants, ICN (Hawksworth et al. 2016). Ryberg and Nilsson (2018) suggested that an integrated naming system is needed to facilitate unambiguous communication and to record and accumulate data on the dark taxa. However, high-throughput sequencing metabarcoding has some limitations, which prompt many mycologists to disagree with validating names or using mgDNA as holotypes. The worldwide accepted set of rules for nomenclature is the Melbourne Code (McNeill et al. 2012). Article 38 states that a new taxon must be associated with a formal description or diagnosis from a physical specimen or an illustration of a specimen. Adapting that code to allow the use of mgDNA data as holotypes, with all its consequences, is currently under debate. A proposal by Hawksworth et al. (2016) that DNA sequences should serve as substitutes for type specimens of new taxa was already rejected at the nomenclature session of the last International Botanical Congress in Shenzhen, China (July 2017), but has been brought forward again to be discussed in the nomenclature session of the 11th International Mycological Congress in Puerto Rico. Two recent opinion papers, co-authored by the majority of the current officers of the International Commission on the Taxonomy of Fungi (ICTF; Thines et al. 2018) and by over 400 mycologists (Zamora et al. 2018), respectively, have summarized various concerns of the mycological community about the premature introduction of DNA-only based nomenclature. There can be no doubt that the majority of mycologists neither need nor want these proposed changes, at least at the present time.

However, the implementation of new rules in botanical and mycological nomenclature unfortunately does not strictly rely on democratic principles. Instead, a rather old-fashioned system based on oligocratic committee votes is still in place, and the decisions will eventually be made in the general assembly of the IMC, which many people cannot attend for financial reasons. Therefore, an intrinsic risk exists that a small minority of mycologists will be able to overrule the majority and that decisions will be made that lead to chaos and substantial setbacks in the progress of basic as well as applied mycology.

This paper aims to discuss the major issues of using mgDNA as a holotype without a physical specimen from biological, ecological and taxonomic perspectives. Case studies that illustrate some problems of using mgDNA for fungal identification and classification are provided with discussion (Botryosphaeria Ces. & De Not., Colletotrichum Corda, Penicillium Link and Xylaria Hill ex Schrank). The issues of using ITS non-coding regions from high-throughput sequencing metabarcoding are discussed. We also discuss possible solutions within the nomenclature codes, and whether there should be a separate code for dark taxa, distinct from the specimen-based code for fungi, or an integrated system for both cases. Additionally, recommendations on the detailed content of the nomenclature codes, based on further case studies and review articles, are provided.

Pitfalls of using mgDNA for taxon naming

mgDNA has the potential to provide a much better understanding of fungal biodiversity. We are still far from having a reliable estimate of the actual number of organisms yet to be discovered on earth, especially for micro-organisms and fungi, and mgDNA has provided clues as to where these potential organisms can be found. The systems that assign a code/ID to a sequence from an environmental sample are not linked to species-based databases, e.g. Index Fungorum and MycoBank (Hibbett et al. 2011). Errors in communication have arisen from the use of these systems from one publication to another (Hibbett et al. 2011; Ryberg and Nilsson 2018). The idea of naming species from mgDNA-based data was raised to improve the efficiency of communication and to reduce erroneous publications and data records, which may lead to erratic results and, ultimately, false estimates of fungal biodiversity (Hawksworth et al. 2016; Ryberg and Nilsson 2018). Some authors have proposed that the naming of “dark taxa” can help to explore some important contexts, e.g. species counts (Ryberg and Nilsson 2018). However, many controversial issues will emerge concerning the impact of using mgDNA as holotypes.

Formation of chimeras

Although high-throughput sequencing can reveal the existence of uncultivable and invisible microbes, novel sequences can arise artificially through chimera formation and erroneous sequencing (Reeder and Knight 2009; Porazinska et al. 2012). Chimeric sequences are commonly detected in amplicon sequencing, but rarely detected with shotgun sequencing (Edgar et al. 2011). Chimeric sequences are artifactual PCR products that are erroneously generated from aborted extension during subsequent cycles of PCR. Chimeras are formed when an aborted extension strand generated in an earlier cycle binds to another single-stranded DNA template and functions as a primer in DNA synthesis (Smith et al. 2010). Adjustment of the methodology to attain a low number of PCR cycles is recommended to avoid this formation (Hoshino 2012), while, on the other hand, high numbers of PCR cycles are needed to obtain enough PCR products (Hoshino and Matsumoto 2008). Hence, an increase in sensitivity of the PCR also increases the risk of chimeras. Therefore, it is very important to detect and filter out such sequences to avoid false diversity estimates (Wintzingerode et al. 1997), inflated numbers of OTUs and the discovery of novel “species” from erroneous sequence data (Smith et al. 2010; Hoshino 2012). However, it is difficult to detect chimeras, as they are normally short and occur near the end of a template (Hughes et al. 2015). Most fungal sequence data from high-throughput sequencing available in GenBank were obtained from amplicon sequencing, with ITS as the barcoding locus (Schoch et al. 2012). Multiple studies have discovered some unusual chimeric ITS sequences in public databases (e.g. Ryberg et al. 2008; Mullineux and Hausner 2009). Jumpponen (2007) noted that 40% and 31% of the sequences in two clone libraries (from soil fungal analyses), respectively, were detected as chimeric. These errors increase the concern about fungal chimeric sequences in databases and might become a problem in the near future (Christen 2008), especially if mgDNA is adopted as possible holotypes.

Although the number of chimeras in a data set can be reduced by chimera detection approaches (e.g. de novo detection), there is still no perfect method to completely eliminate chimeric sequences (Haas et al. 2011). Thus, giving a scientific name (Latin name) to mgDNA that includes chimeras can lead to an overestimate of community diversity, and we might see many “new species” descriptions based on chimeric sequences in the future. Some taxonomists may not be well versed in detecting chimeras and may therefore assume that such sequences represent novel taxa. A scientific name is currently not assigned to chimeric sequences. Rather, they are often annotated as e.g. “Unidentified fungus”, which has no implications for fungal classification.
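The logic behind reference-based chimera checks can be illustrated with a toy example. The following is a minimal sketch, not the method of any cited tool: it assumes pre-aligned reference sequences of equal length, and the two 20-nt "parents", the function names and the `min_gain` cut-off are all hypothetical. A query is flagged when a model built from the left part of one reference and the right part of another explains it clearly better than any single reference does.

```python
from itertools import permutations


def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_two_parent_model(query, references):
    """Try every ordered pair of references and every breakpoint.

    Returns (score, left_parent, right_parent, breakpoint) for the best model.
    """
    best = (0.0, None, None, None)
    for (left_name, left_seq), (right_name, right_seq) in permutations(references.items(), 2):
        for bp in range(1, len(query)):
            model = left_seq[:bp] + right_seq[bp:]
            score = identity(query, model)
            if score > best[0]:
                best = (score, left_name, right_name, bp)
    return best


def looks_chimeric(query, references, min_gain=0.05):
    """Flag the query if a two-parent model beats the best single reference
    by at least min_gain (a hypothetical, illustrative cut-off)."""
    best_single = max(identity(query, seq) for seq in references.values())
    two_parent = best_two_parent_model(query, references)[0]
    return (two_parent - best_single) >= min_gain, best_single, two_parent


# Hypothetical toy references, not real ITS sequences.
references = {
    "parent_a": "ACGTTGCAAGCTTAGGCTAA",
    "parent_b": "ACGATGCTAGCATAGCCTTA",
}
# Simulate an aborted extension of parent_a that primes on parent_b.
chimera = references["parent_a"][:10] + references["parent_b"][10:]

chimera_result = looks_chimeric(chimera, references)
parent_result = looks_chimeric(references["parent_a"], references)
```

In this toy case the chimera is only 90% identical to its closest single parent but is fully explained by the two-parent model, so it is flagged, whereas a genuine parent sequence is not. The check only works because both parents are present in the reference set; a chimera whose parents are unknown would pass as a "novel" sequence, which is the risk discussed above.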

The cryptic species

Morphologically indistinguishable species that can be recognized only by their DNA sequences are referred to as cryptic species (Shivas and Cai 2012). Due to the widespread use of DNA sequence based techniques, there has been a rapid increase in the number of cryptic species of plant pathogenic fungi being detected. Multiple cryptic species are often found within previously described single morphological species, even for some “well-studied” species (O’Donnell et al. 2004; Cai et al. 2009; Cannon et al. 2012; Hagen et al. 2015; Udayanga et al. 2015). Colletotrichum Corda, Diaporthe Nitschke, Fusarium Link, and Phyllosticta Pers. are examples of important plant pathogenic fungal genera that actually include a considerable number of cryptic species. Due to their overlapping morphological characters, reliable identification at the species level is best based on the use of multi loci sequence data in these genera (Hyde et al. 2014; Jayawardena et al. 2016; Dissanayake et al. 2017) or better a polyphasic approach using many facets (Cai et al. 2009). The use of mgDNA in naming fungal taxa in these pathogenic genera would cause serious problems, as it is based on overall sequence identity between the query sequence as well as those in the reference databases. At present, 97% or greater sequence identity for OTU/species is used for species delimitation, identification and assessment of species numbers (O’Brien et al. 2005; Nilsson et al. 2008; Tedersoo et al. 2014; Garnica et al. 2016; Dissanayake et al. 2018; Jayawardena et al. 2018b). However, there is a large range of intra- and interspecific ITS sequence variation depending on the taxonomic groups (Nilsson et al. 2008). For example, the similarity of the ITS sequence exceeds 99% for some species (Xu et al. 2000; Dettman et al. 2001; Johannesson and Stenlid 2003). The ITS sequence of the ex-type culture of Col. queenslandicum B.S. Weir & P.R. Johnst. has a 99% similarity to the ex-type ITS sequences of Col. 
aenigma, Col. alienum B.S. Weir & P.R. Johnst., Col. aotearoa B.S. Weir & P.R. Johnst., Col. clidemiae B.S. Weir & P.R. Johnst., Col. salsolae B.S. Weir & P.R. Johnst. and Col. ti B.S. Weir & P.R. Johnst. The ITS sequences of the ex-type cultures of Col. kahawae subsp. kahawae J.M. Waller & Bridge and Col. kahawae subsp. ciggaro B.S. Weir & P.R. Johnst. have 100% similarity over a 100% query cover. Another example is the ITS sequence of the ex-type culture of Diaporthe hongkongensis R.R. Gomes et al., which shows 98% similarity to the ITS sequences of the types of D. eucalyptorum Crous & R.G. Shivas and D. pseudophoenicicola R.R. Gomes et al. In some cases, intraspecific ITS identity of ≤90% has been reported for certain taxa (Kuninaga et al. 1997; O’Donnell et al. 2000). These species can be resolved properly with the use of multi-loci sequence data instead of ITS alone. In mgDNA studies, ITS1 or ITS2 sequence data are presently being used, which are evidently shorter than the sequence of the complete ITS region. However, it has been demonstrated that shorter sequences lead to less reliable identification, and ITS alone is largely insufficient for species resolution. Therefore, identification of cryptic species will become even more difficult. Another pitfall of naming cryptic species based on mgDNA is that the correspondence of OTUs with species can be unreliable. Normally, OTUs are defined based on a 97% similarity threshold (Sneath and Sokal 1973). However, this threshold is usually an overall group average and may deviate for individual pairs of OTUs. Distinct species whose sequences are ≥ 97% similar can be merged into a single OTU containing multiple species (Jayawardena et al. 2018b). Likewise, a single species whose sequences are < 97% similar can be split into two or more OTUs. Dissanayake et al. (2018) and Jayawardena et al. (2018b) were unable to identify cryptic species to the species level using mgDNA.
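The mismatch between a fixed 97% threshold and species boundaries can be made concrete with a small sketch. It assumes greedy centroid clustering of pre-aligned, equal-length reads, which is only one of several clustering strategies and not necessarily the one used in the cited studies; the sequences, species labels and function names are hypothetical. Two "species" differing at 1 of 100 positions collapse into one OTU, while two strains of a single "species" with 90% intraspecific identity are split.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mutate(seq, positions):
    """Return seq with a substitution at each given position (toy mutation model)."""
    substitute = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for pos in positions:
        chars[pos] = substitute[chars[pos]]
    return "".join(chars)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering: each read joins the first centroid it matches
    at >= threshold, otherwise it founds a new OTU."""
    centroids = []  # (name, sequence) of each OTU founder
    otus = {}       # founder name -> member names
    for name, seq in seqs.items():
        for centroid_name, centroid_seq in centroids:
            if identity(seq, centroid_seq) >= threshold:
                otus[centroid_name].append(name)
                break
        else:
            centroids.append((name, seq))
            otus[name] = [name]
    return otus


# Hypothetical 100-nt "ITS" reads; the labels are illustrative only.
base = "ACGT" * 25
reads = {
    "species_X": base,
    "species_Y": mutate(base, [10]),                    # 99% identical to species_X
    "species_Z_strain1": mutate(base, range(50, 70)),   # 80% identical to species_X
}
# A second strain of species_Z with 90% identity to the first strain.
reads["species_Z_strain2"] = mutate(reads["species_Z_strain1"], range(0, 10))

otus = greedy_otus(reads)
```

Three species yield three OTUs here, so the richness estimate looks correct even though none of the OTU boundaries matches a species boundary: `species_X` and `species_Y` share one OTU, and `species_Z` occupies two.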

Re-discovery of already named species

Over the past centuries, mycologists have studied and provided scientific names for numerous specimen-based species and this has allowed for effective communication amongst ecologists, plant pathologists and workers in other disciplines. Valuable dried type specimens (herbarium materials) have been preserved in fungaria and this allows for re-examination, which is a requirement of any reliable science. Epitypification was established to resolve the taxonomic problems when the type material is ambiguous, in poor condition or has been lost (McNeill et al. 2006; Hyde and Zhang 2008; Ariyawansa et al. 2014). Even though the type material is in good condition, DNA cannot usually be extracted from type materials easily. Thus, epitypification is often carried out (although not strictly recommended) to obtain molecular data from living materials (Hibbett et al. 2007; Hyde and Zhang 2008). If fungi were named without physical specimens, it would be difficult to establish epitypes based on the current rules (see Hyde and Zhang 2008: Ariyawansa et al. 2014), i.e. the epitype specimen should be identical and obtained from the same location, host or substrate as the type it interprets (Ariyawansa et al. 2014). On the other hand, when the morphological data available were insufficient, a few species have been established mainly based on sequence data, but accompanied with some additional evidence e.g. cultural characters, metabolite profiles (Pažoutová et al. 2013; Kamil et al. 2018). Ambiguous type materials of mgDNA will need epitypification, and this can lead to future problems when a new taxon discovered by using traditional methodology turns out to be identical to a short ITS fragment of holotype-mgDNA, which has no morphological features to compare it with. In addition, it should be kept in mind that it will still be possible to erect and describe new species based on morphological characters alone. 
There may even be some cases of coincidental redundant descriptions of the same fungus based on morphology and DNA, respectively. The whole concept should therefore be carefully reconsidered because there is actually no need to rush and change a well-working system.

Character evolution studies

Character evolution is the process of how and why a trait evolves along the branches over a period of time in order to reveal common ancestry. It improves the understanding in the history of life, explains the relationships among extant species, character states for each species, and a model for character evolution (Huxley 1957; Harvey and Pagel 1991; Vijaykrishna et al. 2006). For example, the evolution of closed fruiting bodies in apothecioid Pezizomycotina was described by Hansen et al. (2005) and Ekanayaka et al. (2017). Schmitt et al. (2009) found that perithecia in Lecanoromycetes have evolved independently, several times from apotheciate ancestors. Their results also suggested that angiocarpous ascomata are a means of pre-adaptation for the repeated gain of perithecia (Schmitt et al. 2009), which supported the hypothesis of neotenic evolution of perithecioid fruiting bodies in Lecanoromycetes (Grube et al. 2004). However, the phenotypic evolution of fruiting bodies should be considered in conjunction with functional correlations of characters (Schmitt et al. 2009). Character evolution also refers to the identifying features, the divergence which makes a lineage unique from others based on phenotypic changes, nucleotide or amino acid substitutions (Ariyawansa et al. 2015; Liu et al. 2015, 2017; Li et al. 2016; Hyde et al. 2016, 2017a, b; Hongsanan et al. 2017; Tibpromma et al. 2017). Many character evolution studies have been carried out within the past 15 years (e.g. Liu and Hall 2004; Li et al. 2005; Schoch et al. 2009; Schmitt 2011; Kumar et al 2012). However still there are many incomplete points to resolve. For example, although there is a general agreement on exposed hymenium (apothecium) as the primitive fruiting body type of Pezizomycotina, its relationships with other partially (perithecia) or completely enclosed (cleistothecia) fruiting body types are still unclear (Ekanayaka et al. 2017). 
Moreover, it is difficult to infer the complete flow of character evolution. The major issues for these problems are the unavailability of complete sets of taxon sampling during analyses and the lack of taxonomic studies. Both, complete sets of sequence data and relevant corresponding morphological characters are required.

We can use environmental sequences (mgDNA) to provide an almost complete set of sequence data. The mgDNA reveals that there are species that are highly divergent and not yet discovered. Moreover, it reveals new phylogenetic relationships at higher taxonomic levels, where it is almost impossible to compare character evolution using morphological characters alone. However, using mgDNA as holotypes does not provide any clue on morphology and its evolution across species. Therefore, there are still certain deficiencies in character evolution studies. Even when sequence data can reveal the phylogenetic relationships among taxa and their evolution with time, morphological characters are essential to explain these evolutionary processes in detail and to show how the evolved traits have become advantageous and useful in the environments in which the fungi live.

Can mgDNA fill the gap pertaining to any evolutionary distance we have seen among different fungal relatives? Possibly yes, but only from one angle, as we would be dealing with DNA sequences alone, and the other aspect, the evolution of fungal phenotypes, would remain unresolved. However, the former should not be a drawback. Given that we are slowly moving from a morph-based approach to a DNA-based one, we envisage that, sooner or later, we should be able to culture those fungi and examine their morphs as well as retrieve DNA sequences again. At that point in time, we should be able to bridge the gap in a similar way to how we have linked asexual fungi to their sexual morphs, but for now this is premature.

Why Next Generation Sequencing (NGS)?

NGS unraveled complex fungal communities across different ecosystems


The diversity and distribution patterns of fungal communities are central issues in fungal ecology, as this information is crucial for understanding and predicting the roles played by fungi in maintaining ecosystem functions and stability (Kubartová et al. 2012; Peršoh 2015; Hoppe et al. 2016). Since fungal communities in environmental samples are generally complex, comprise an unseen majority of members and cannot be efficiently evaluated using direct observation and culture-dependent approaches, high-resolution culture-independent approaches (i.e. NGS) are needed to reasonably characterize those fungal communities (van der Heijden et al. 2008; Dissanayake et al. 2018; Jayawardena et al. 2018b). The existing knowledge of fungal ecology, and in particular the information on diversity patterns, community composition, resource use and the determinants of community composition, was primarily obtained via culture-dependent approaches or direct observation (i.e. sporocarp surveys), which only detect the culturable or actively reproducing portion of the fungal community at some specific points in time (Hoppe et al. 2016). Thus, such existing knowledge in fungal ecology should be validated using NGS approaches. With such validation, we can gain better insights into fungal ecology and improve our understanding of the diversity and distribution patterns of fungal communities across wide ranges of habitats (Purahong et al. 2018b). Here we give some examples of important existing knowledge in fungal ecology that has recently been challenged by NGS in various environments, including terrestrial and aquatic ecosystems.

NGS has unraveled the unseen majority of soil fungi

The first NGS study on forest soil fungi significantly changed the expectation of the magnitude of the fungal diversity in soils (1000 molecular operational taxonomic units (OTUs) in 4 g soil) and shed light on the factors (including the tree species and soil organic matter composition) that may have the largest influence on the soil fungal communities (Buée et al. 2009). Further studies have confirmed these findings by revealing a high diversity of soil fungi that is related to tree species and soil physicochemical factors (i.e. soil macro- and micro-nutrients, soil pH) (Rousk et al. 2010; Baldrian et al. 2012; Tedersoo et al. 2014). Furthermore, the high resolution of metabarcoding revealed highly pronounced niche preferences along vertical soil profiles (Peršoh et al. 2018).

Diversity of root endophytic mycobiomes


Knowledge of the patterns of diversity and community composition of fungi associated with plant roots has also been challenged by the results from NGS. It has been concluded that nonclavicipitaceous fungal endophytes (class 2) are characterized by a low in planta diversity and a broad host range (Rodriguez et al. 2009). However, NGS studies demonstrate that they may have rather high in planta diversity and exhibit strong host preferences (Schöps et al. 2018). Host preferences of root-associated fungi have been shown before, even within asteraceous plants (Wehner et al. 2014). The specific DSE fungi (dark-septate endophytes; also known as nonclavicipitaceous endophytic fungi class 4) were also shown to be broadly distributed; however, the extent of their in planta diversity is unknown (Rodriguez et al. 2009). A recent NGS study indicates that most of the detected DSE fungi exhibit some degree of host preference and have low in planta diversity in temperate grassland plants (Schöps et al. 2018).

Tree species preferences and diversity of wood-inhabiting fungi

Existing knowledge pertaining to wood-inhabiting fungal distribution and diversity based on sporocarp surveys indicates that wood-inhabiting fungal communities in temperate forests exhibit low α-diversity (average ~ 2 species or less/deadwood log) (Blaser et al. 2013) and are not specific to host tree species, leading researchers to differentiate only between softwood and hardwood decomposers (Tuor et al. 1995). These views have been confirmed recently by the results from sporocarp surveys during the large-scale, long-term deadwood monitoring experiment BELongDead (Baber et al. 2016). However, a few studies have shown some degree of host specificity of heart-rot fungi for tree species (Rayner and Boddy 1988; Boddy 2001; Boddy et al. 2017). NGS has been applied to the same sets of deadwood as the sporocarp surveys in the BELongDead experiment (Baber et al. 2016) to answer questions regarding the diversity and tree species preferences of wood-inhabiting fungi (Purahong et al. 2018b). The results from NGS demonstrate high diversity (22–42 OTUs /deadwood log) and strong tree species preferences (especially in broadleaf species), which contradict existing knowledge based on sporocarp surveys and challenge current views on wood-inhabiting fungal distribution and diversity in temperate forests (Purahong et al. 2018a, b). However, it has yet to be established whether this high diversity is functional.

NGS challenges the classical view of fungal succession during leaf litter decomposition

NGS consistently confirmed that Ascomycota have highest relative abundances in the early stages of litter decomposition in temperate forests, and then there is a clear shift from Ascomycota to Basidiomycota in the later stages (Voříšková and Baldrian 2013; Purahong et al. 2016). However, the presence/absence data show that Ascomycota (66–82%) are much more frequently detected as compared with Basidiomycota (18–33%) across different sampling times during 473 days (Purahong et al. 2016). NGS data may in general only partly support the view of a succession from an Ascomycota to a Basidiomycota-dominated community from early to later stages of litter decomposition (Peršoh 2015) and the accuracy of relative abundance data derived from NGS is still questionable (Amend et al. 2010). A recent NGS study also suggests that the complex litter decomposition process is the result of a dynamic cross-kingdom functional succession between fungi and bacteria, where bacteria may facilitate the saprotrophic fungi by providing essential nutrients (Purahong et al. 2016).
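The difference between read-based relative abundance and presence/absence data can be shown with simple arithmetic. The sketch below uses an entirely hypothetical OTU table (the OTU identifiers and read counts are invented for illustration and are not taken from the cited studies) in which a single Basidiomycota OTU dominates the reads while most detected OTUs are Ascomycota.

```python
from collections import defaultdict

# Hypothetical OTU table for one litter sample: (OTU id, phylum, read count).
otu_table = [
    ("OTU1", "Ascomycota", 120),
    ("OTU2", "Ascomycota", 80),
    ("OTU3", "Ascomycota", 50),
    ("OTU4", "Basidiomycota", 750),
    ("OTU5", "Basidiomycota", 0),  # not detected in this sample
]


def read_share(table):
    """Relative abundance per phylum, based on read counts."""
    totals = defaultdict(int)
    for _, phylum, reads in table:
        totals[phylum] += reads
    grand_total = sum(totals.values())
    return {phylum: n / grand_total for phylum, n in totals.items()}


def otu_share(table):
    """Share of detected OTUs per phylum, based on presence/absence."""
    counts = defaultdict(int)
    for _, phylum, reads in table:
        if reads > 0:
            counts[phylum] += 1
    detected = sum(counts.values())
    return {phylum: n / detected for phylum, n in counts.items()}


by_reads = read_share(otu_table)
by_otus = otu_share(otu_table)
```

With these invented numbers, Basidiomycota account for 75% of the reads but only 25% of the detected OTUs, so the same sample looks Basidiomycota-dominated or Ascomycota-dominated depending on the measure chosen.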

Diversity of fungi in groundwater

Natural groundwater limestone aquifers are a challenging and unexplored fungal habitat. There is one study using 18S-based eukaryote clone libraries to detect the fungal community in groundwater (Risse-Buhl et al. 2013) and two further studies using direct microscopic morphological identification of fungal spores (Krauss et al. 2003) and a culture-dependent approach (Lategan et al. 2012). With little oxygen available, anoxic groundwater is expected to exhibit low fungal diversity limited to facultative anaerobes. Recent NGS studies based on both DNA and RNA (ITS amplicon sequencing) have revealed taxonomically diverse fungi (mainly Ascomycota and Basidiomycota) and a range of ecological functional groups (mainly saprotrophs) in this habitat (Nawaz et al. 2016). The RNA study is interesting as it demonstrates that the detected fungal OTUs do not only represent spores but also include viable and/or active fungi (Nawaz et al. 2018). The life span of the precursor rRNA containing the ITS regions is very short; thus, it can be considered to represent the community that was active during the last few minutes before sampling (Kos and Tollervey 2010; Purahong and Krüger 2012). Specifically, only metabolically active fungi are continuously transcribing rRNA precursor molecules, and their ITS regions can be detected in the precursor rRNA pool (Anderson and Parkin 2007; Rajala et al. 2011). The living fungal community (RNA based) in this habitat mainly corresponds with the availability of NH4+ and some macronutrients. Notably, all of these exciting results on the fungal community can be obtained without any need to name the organisms involved, and accordingly the experts refrain from reporting species names.

Numbers of taxa

The numbers of taxa detected via NGS are usually much higher than those detected by direct observations or culturing methods, and they are assumed to include both cultivable and non-cultivable fungi (Buée et al. 2009; Kubartová et al. 2012). However, due to the methodological biases in NGS, this assumption has not been verified. For example, not all cultivable fungi can be detected with NGS; only the predominant ones are detected (Dissanayake et al. 2018; Jayawardena et al. 2018b), and for certain taxa, in particular the smut fungi, specific PCR primers need to be developed because the standard methods do not work for their ITS (Kruse et al. 2017). As with direct observation or culturing methods, fungal richness from different studies may not be directly comparable due to different laboratory standards, protocols and data processing (Lindahl et al. 2013; Purahong et al. 2017). Nevertheless, despite differences in laboratory standards and protocols, we can obtain data from comparable datasets, i.e. same NGS platform, same primer pairs (or at least same targeted region), and re-analyze all data of interest together to answer specific questions. With this procedure, we can reasonably compare the results of NGS across different studies and biomes (Nilsson et al. 2011). However, we must be aware that, based on the available methods, such studies have their limitations. Many species that are known from morphological studies have never been sequenced, and many predominant taxonomic groups of fungi cannot be well separated at the species level without using at least a second barcode (see Figs. 3, 4). Therefore, it is not really possible to tell the exact number of species from NGS studies, and we may be able to use the fungal taxonomic information only at genus level (Purahong et al. 2017; Purahong et al. 2018b).

Ecological data

During the past decade, NGS has emerged as a high-resolution culture-independent approach for characterizing microbial community composition and diversity in various ecosystems and biomes (van Dijk et al. 2014). It is clear that NGS has significantly increased the amount of data and expanded our knowledge of microbial communities and their distribution (Prosser et al. 2007; Kubartová et al. 2012). Recently, the cost of analyzing DNA samples with NGS has also dropped significantly, thus boosting the potential for using this technique (van Dijk et al. 2014). Long-read sequencing has now become available at high quality through nanopore or PacBio sequencing (Heeger et al. 2018; Wurzbacher et al. 2018). By detecting hundreds to thousands of fungal OTUs in hundreds of samples within weeks, metabarcoding approaches tear down the limits of cultivation-based approaches in community ecology studies (Peršoh 2015). Furthermore, the new technologies make it possible to analyse a suitable number of replicates to infer statistical support for hypothesis testing. However, the first years of metabarcoding revealed that fungal communities are more complex and more dynamic in space and time than previously thought (Peršoh 2015). While this awareness calls for even more comprehensive sampling designs in future studies, metabarcoding has already greatly widened our knowledge of community ecology. A major finding is certainly the functional redundancy of compositionally diverse communities (Talbot et al. 2014). Furthermore, it was shown that fungal communities of one plant species may differ depending on the surrounding plant community (Toju et al. 2013). We also learned that endophytic and litter-decomposing fungal communities are much more tightly linked than previously thought (Guerreiro et al. 2018).

Using mgDNA from environment in species identification

To a large extent, mycologists have relied heavily on morphology from specimens and DNA sequence data derived from cultures for species identification (e.g. Jeewon et al. 2003, 2017; Ariyawansa et al. 2015; Liu et al. 2015; Hyde et al. 2017a, b; Tibpromma et al. 2017; Wanasinghe et al. 2017, 2018; Jayawardena et al. 2018a, b). The use of other molecular approaches based on DNA sequence data (e.g. PCR based DGGE and metatranscriptomics), have also provided insights into assigning species into specific taxonomic ranks and enumerating fungal diversity (e.g. Duong et al. 2006; Rampadarath et al. 2018). With such rapid advances in DNA sequencing technologies coupled with an ever increasing number of DNA sequences (for instance from NGS) being analysed from diverse fungal communities without any available morphs, mycologists are recovering a myriad of genetic data from the environment. Fungal taxonomy is dynamic with rapid changes in nomenclature and classification. One always has a certain uncertainty when naming a species, especially if the morphs are unclear or DNA sequence data fail to resolve species relationships. This mostly happens whenever we deal with genera containing many potential cryptic species, such as Aspergillus (Houbraken et al. 2014; Samson et al. 2014) or Colletotrichum (e.g. Damm et al. 2014; Jayawardena et al. 2016). The major advantage of going forward with a mgDNA approach is that it largely overcomes the discrepancies associated with morphological-based identification (e.g. time consuming, recovery of fungi in different stages, phenotypic plasticity). In addition, it increases the probability of detecting species where traditional methods will usually fail and when the species occurs in low density. For the time being, we cannot culture all species in vitro, but they are there. So mgDNA does help to track rare and elusive species and pave the way to facilitate future taxonomy. 
In the same way that DNA sequences can nowadays be retrieved from old fungal specimens and compared with existing ones, we can anticipate that mgDNA can also serve as potential “reference molecular types” for any important taxa that are yet to be cultured or discovered based on morphology.

However, there are some pertinent issues that merit attention before formalizing the use of mgDNA in taxonomic studies. Under most circumstances, mgDNA is sheared and results in shorter reads as compared to other commonly used gene regions in phylogenetics. If coverage is low, the likelihood that the mgDNA clusters with other “eDNA” / OTUs is high and this does not help in species identification. In addition, the bootstrap support to link mgDNA with its known counterparts is often low, which results in low taxonomic resolution. There is a need to standardize analytical methodologies and to improve the interpretation and assessment of false positives and negatives generated from NGS sequences. Otherwise we might run the risk of using contaminated samples, repetitive DNA sequences or reads with sequencing errors that pass through inadequate quality control procedures. This will obviously have an impact on our taxonomic interpretations, especially if dealing with nucleotide-level variation between populations of related species. While mgDNA does provide insights into potential fungal organisms associated with a particular substrate, no other informative data which are considered important for taxonomists and ecologists can be obtained. For example, if mgDNA has been recovered from organic matter or from water samples, its presence could be due to transport of DNA from other sources, which poses a challenge for placing mgDNA-based species detection in space and time. The lifestyle of those so-called organisms remains elusive as well, because it would be almost impossible to predict whether a given organism is a saprobe, endophyte or potential pathogen. The mere presence of isolated mgDNA does not give any indication whatsoever of the state of the fungus (either sexual or asexual).

Comparing whole organisms with detailed phenotypic features and occasionally physiological characteristics have always been an integral part of taxonomy. With mgDNA, no such comparisons can be made and any taxonomic relationships based on a fragment of mgDNA would be like looking at some letters in a person’s name and making an attempt to decipher that person’s physiognomy. The other major problem would be to how to define a species based on mgDNA. There has been an ongoing discussion, but never a consensus on the use of DNA sequences in defining species for those taxa whose morphology is known, hence we contemplate that establishing a species concept based on mgDNA would be even much more difficult and should be dealt with much precaution. One major taxonomic determinant is the number of available DNA sequences in databases, but some markers are far better represented. For fungal species, the ITS gene regions and occasionally some protein genes are well represented and hence caution is warranted herein to avoid misidentifications and this will affect fungal biodiversity statistics. Despite the reduction in pricing over the last decade, the cost per sample analyses is still a major limiting factor, especially for those researchers in developing countries, and even for many amateur mycologists who are not associated with academic institutions in the rich countries. With the latest third and fourth generation sequencing technologies, which eliminate PCR amplification procedures and yield better reads with lower error rates, the costs are even higher. When it comes to the analytical part to compare the mgDNA with others, the question arises of how many sequence data is enough. We have seen in many circumstances when investigating phylogenetic relationships of fungal organisms that there has been a need to shift from a single DNA locus (e.g. ITS rDNA sequence data) to multigene phylogenetic analyses for better species identification and circumscription. 
How are we going to proceed with mgDNA, should we start accepting them as holotypes? Undoubtedly mycologists should tap into the potential benefits of mgDNA for extracting taxonomic knowledge, but should we set clear guidelines to validate our DNA sequence data? We are still some steps away from appropriate DNA-based molecular markers that can be used reliably for species identification and analysed phylogenetically with certainty. This poses a considerable challenge as many mycologists have diverged opinions on which gene fragment can be considered as a universal barcode. There will be obviously concerns of “holotype recognition” with mgDNA and under what circumstances/situations should we really accept a holotype. What would be the consequence or compromise when we have identical sequences or minor DNA differences, or where phylogenies fail to provide reliable support for specific lineages? Which metagenomics approach is more reliable to conclusively assign a specific DNA sequence as a holotype and the long debated issue of which genes/regions should be given priority rises again. We should treat name assignments with caution. Even if a sample of mgDNA has a high-level match with any previously deposited sequence, should we give them the same name? Many will have divergent opinions on this.

There will be discussion on the use of DNA sequence data as types during the International Code of Nomenclature for algae, fungi, and plants (ICN)-Fungal Nomenclature Session in the upcoming IMC meeting in Puerto Rico in July 2018. A major concern would be how to align existing DNA sequence data available or anticipated DNA sequence data especially from environmental samples into a framework that does not give rise to problems that arose with our existing dual nomenclatural system. It would be interesting to see how we proceed in relation to “old names” proposed by Dayarathne et al. (2016) following discovery of proper morphs/cultures after acceptance of DNA as types. Leaving a taxon as “Unnamed” or “Named inappropriately” are both a taxonomic concern. While we inevitably acknowledge that resorting to acceptance of DNA as types is deemed important given the huge number of undiscovered species, precaution is warranted so that we do not end up in vague guidelines that defeat purpose of fungal taxonomy. Mycologists have already stepped into the era of mgDNA and there is obviously no going back as the latter has already started to alter the landscape of fungal taxonomy. Perhaps we should be more confident that the outcome of most of these mgDNA can aid in the discovery of potential novel biomarkers and results in better species diagnostics. Meanwhile, mgDNA can be considered as additional novel genetic data in our databases, this does not really translate into species per se. With metagenomics, only well-known gene fragments are recovered which are compared to existing ones. However, DNA sequences without any associated morphological descriptions should not be considered totally obsolete, but a timely, systematic and accurate approach towards characterizing the DNA sequences and make them available as recognized taxonomic entities is the way forward.

Using ITS for species identification

After over two decades of intensive research on the most important classes of the Ascomycota, regarding both the numbers of species and the economic and practical importance (in particular, the Eurotiomycetes, Sordariomycetes and Dothideomycetes), we can now conclude that ITS is better than LSU and SSU for obtaining a species identification. A recent paper (Vu et al. 2018) relying on the type strains of the CBS culture collection (housed at the Westerdijk Fungal Biodiversity Institute, Utrecht, the Netherlands) has confirmed that ITS is an excellent primary barcode. However, it would not have been possible to reach that conclusion, were it not for the fact that many of the studied strains had previously been studied very carefully by polythetic taxonomic approaches. Many of the recent taxonomic rearrangements in this area have relied on a combination of multi-locus phylogenies and phenotype-derived characters. In some cases, aside from morphological traits of both the sexual and asexual states, even secondary metabolite profiles were generated as additional informative parameters (Frisvad and Samson 2004; Stadler et al. 2014b). This is of very high practical concern because, for example, the classification of biosafety levels of fungi relies heavily on the taxonomy and nomenclature. A very important example is the species pair Aspergillus flavus Link vs. A. oryzae (Ahlb.) Cohn, where it was found that both species have identical ITS sequences, probably because the latter fungus has eventually been domesticated by humans in Asia from wild type strains of A. flavus. However, recent comparisons of the genome have revealed that they differ in 350 genes! In fact, A. oryzae is a very important industrial organism that has not only been used to produce various foods, such as soy sauce and tofu, for many centuries, but is now even employed in other industrial processes such as the production of enzymes and commodity chemicals because it was granted GRAS status (generally recognised as safe). On the other hand, A. flavus is one of the most dangerous fungi on Earth because it is not only a pathogen but also produces highly toxic and mutagenic mycotoxins, and it is therefore classified in Biosafety Risk Class 2. If the polythetic taxonomy of these fungi were to be abandoned, and ITS data alone were deemed sufficient for nomenclature, the two aforementioned species would need to be treated as synonyms, possibly with fatal consequences for the biotechnological industry! This relates to a very important point, i.e. that taxonomy should accommodate the requirements of the users and not just be a self-fulfilling prophecy. Taxonomists have a great responsibility to the other scientific communities such as plant pathologists, medical mycologists and biotechnologists and should therefore always use all sources of information that are available to them. Therefore, the nomenclatural rules should never be adapted to the lowest possible standards.

Moreover, a nomenclature based on small DNA fragments would even mock the current developments in fungal biology, where many capable scientists are working hard to provide new evidence on the functional biodiversity within the fungal kingdom. For example, the trichothecenes are a very important class of hazardous mycotoxins, and until recently, it remained unclear whether they are being produced at random by various hypocrealean taxa. Initial studies had concentrated on model organisms such as Fusarium graminearum Schwabe, and it was at first very tedious work to elucidate the mechanisms of their biosynthesis. However, based on this pioneer work and the availability of modern bioinformatic tools, Proctor et al. (2018) have recently provided a conclusive outline on the evolution of trichothecenes biosynthesis in the fungal kingdom. This is only a single exemplary study to indicate that, due to the recent advent of genomics, transcriptomics and bioinformatics technologies, it even appears feasible to employ secondary metabolite biosynthesis genes or other genes encoding for important functional traits in phylogenomic studies to verify the affinities of higher taxa, or even species hypotheses.

In the light of these exciting developments, an approach that will generally allow for any type of DNA-based data to disrupt the current nomenclatural system appears highly anachronistic. We admit that a very large portion of the fungal biodiversity cannot be subjected to multi-locus phylogenies. This certainly applies to those taxa that cannot (easily) be brought into axenic culture or have hitherto been neglected by taxonomists. For example, the current outline of Orbiliomycetes taxonomy (Baral et al. 2018) has been heavily reliant on a conjunction of rDNA data with meticulous morphological studies, specifically because many of the important taxa were never cultured, or the cultures that were eventually made from these fungi did not survive as they were never deposited in professional biodiversity repositories. Likewise, there are many other examples for important fungal groups including the powdery mildews, the mycorrhizal Agaricomycetes and the rust fungi, where only rDNA sequences are presently available. Even fresh material of these fungi can normally not be cultured. As the protein coding genes that might give more conclusive results cannot normally be amplified, there is presently no other option for taxonomists than to generate rDNA sequences of these specimens and try to find correlations to certain morphological traits. Evidently, this will work very well if enough data on the phenotypic characters within a certain taxonomic group are available. However, in the absence of morphological characters, it seems very unlikely that DNA-only nomenclature can replace the polythetic approach in defining species boundaries. We will demonstrate based on a few examples based on well-studied fungi that the ITS-based approach does not work well.

On the one hand, polymorphism of ITS can be a big obstacle. Stadler et al. (2014a) have epitypified the important species, Xylaria hypoxylon (L.) Grev., based on a specimen that was previously examined by Fournier et al. (2011) and Peršoh et al. (2009). While the morphological examinations, which even included a comparison with authentic material going back to Linnaeus and Fries, were rather conclusive, three different ITS genotypes were observed among five cultures of the epitype specimen made from ascospores originating from the same perithecium. If these genotypes were found in the course of a molecular ecology study, and it were allowed to erect new taxa based on slight divergence of the ITS, two superfluous species could be validly erected!

In general, ITS sequence data are often not 100% reproducible when a particular fungal strain is revived from liquid nitrogen. It is not uncommon that slight deviations are observed, and this cannot be attributed to sequencing errors alone. O’Donnell and Cigelnik (1997) have already reported the phenomenon of polymorphism in Fusarium. Nevertheless, many taxonomic studies rely on one or a few isolates of a given species, and the segregation of entire species complexes is sometimes based on the characteristics of a single representative isolate.

The second problem with ITS and rDNA as such is that there are whole genera, and species groups in large genera, that cannot be discriminated well without sequencing housekeeping genes. Even though there are not many reports in the literature, owing to the fact that negative results (e.g. about the uselessness of ITS for species discrimination and phylogenetics) do not normally get published, this phenomenon frequently occurs across the fungal kingdom. The fact that the leading mycological taxonomists are always striving to attain higher standards is reflected by the increasing number of new taxonomic concepts that rely on multi-locus studies. We should therefore rather refine the detection techniques, so that we can target e.g. calmodulin, TUB2, TEF and RPB2 even in NGS approaches and for the study of small specimens.

The third issue is that very often the high-throughput sequencing will result in sequencing errors, as discussed elsewhere in this paper, and the authors will then falsely believe that they have discovered new phylogenetic lineages. This can only be amended by using a dual approach such as the one that was recently published in Dissanayake et al. (2018).

Using the polythetic approach, we can now see clearly that ITS is less well-suited than certain protein coding genes for species segregation in many fungal groups (e.g. Dothideomycetes, Eurotiomycetes, or Sordariomycetes). We provide some representative case studies of using ITS for fungal identification and classification here.

Case studies from Basidiomycota

The global performance of full length ITS region sequences for species identification of Basidiomycota was estimated to be only 63% (PCI, percentage of correct identification) and PCI lower than 50% was observed in 38 out of 113 genera (Badotti et al. 2017). Some of the low PCI values might be explained by dataset characteristics such as low numbers of species or sequences used and/or by the presence of “outliers”, or they may be due to poor intrinsic performance of ITS. Neither the complete ITS region nor the sub-regions (ITS1 or ITS2) were useful in identifying species in eleven of the 113 genera studied (Badotti et al. 2017).
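The per-genus figures above can be illustrated with a minimal sketch. It assumes that PCI is simply the percentage of query sequences whose assigned species matches the known species; the exact procedure of Badotti et al. (2017) is not reproduced here, and the genus and species names in the toy records are hypothetical. The last two lines only restate the proportions given in the text (38 and 11 out of 113 genera).

```python
from collections import defaultdict

def pci_by_genus(records):
    """Return {genus: PCI in percent}.

    records: iterable of (genus, true_species, assigned_species).
    PCI is taken here as 100 * correct / total (an assumption, see lead-in).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for genus, true_sp, assigned_sp in records:
        totals[genus] += 1
        if true_sp == assigned_sp:
            correct[genus] += 1
    return {g: 100.0 * correct[g] / totals[g] for g in totals}

# Hypothetical toy data, not taken from any study.
records = [
    ("GenusA", "A. one", "A. one"),
    ("GenusA", "A. two", "A. one"),
    ("GenusA", "A. three", "A. one"),
    ("GenusB", "B. one", "B. one"),
    ("GenusB", "B. two", "B. two"),
]
pci = pci_by_genus(records)
low_pci_genera = [g for g, value in pci.items() if value < 50.0]

# Proportions stated in the text: 38 and 11 of 113 genera.
share_below_50 = 100.0 * 38 / 113
share_unresolved = 100.0 * 11 / 113
```

Under this reading, about one third of the genera studied fall below a PCI of 50%, and roughly one in ten cannot be resolved with any ITS sub-region.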

Problems with the use of ITS sequences have been reported for a number of basidiomycete genera. For example, ITS of Cortinarius (Pers.) Gray is less variable than RPB2 but it performs as well as RPB2 and RPB1 to retrieve supported close relationships in the phylogeny (Frøslev et al. 2005). However, out of 901 species, 30–39 species (depending on alignment method) could not be separated based on full ITS sequences owing to a lack of a barcoding gap (Garnica et al. 2016). In a very few cases, there is evidence of ‘morphological species’: a morphologically and ecogeographically distinguishable species/subspecies with identical ITS regions, e.g. Cor. atrovirens Kalchbr. versus Cor. ionochlorus Maire (Brandrud et al. 1990, 1992, 1994, 1998, 2012; Garnica et al. 2016). More importantly, up to 40% false positives (wrongly splitting species) were obtained with full ITS region sequences in subgenus Myxacium (Garnica et al. 2016).
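The notion of a barcoding gap can be made concrete with a small sketch. It assumes pre-aligned sequences of equal length and uses the uncorrected p-distance; the gap is taken as the smallest between-species distance minus the largest within-species distance, so that a value of zero or below means the marker cannot separate the species. The species labels and sequences are hypothetical toy data, and the function names are illustrative only.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def barcoding_gap(seqs):
    """seqs: list of (species, aligned_sequence) with at least two species.
    Returns (max_intraspecific, min_interspecific, gap)."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(seqs, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not inter:
        raise ValueError("need sequences from at least two species")
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter)
    return max_intra, min_inter, min_inter - max_intra

# Hypothetical toy alignment: two species sharing one identical sequence.
toy = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGTACGTAC"),
    ("sp_B", "ACGTTCGTAC"),
]
max_intra, min_inter, gap = barcoding_gap(toy)
has_gap = gap > 0
```

In this toy case the two species share an identical sequence, so the smallest interspecific distance is zero while the intraspecific distance reaches 0.1; no barcoding gap exists, which mirrors the situation described for species with identical ITS regions.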

Harder et al. (2013) used a dataset of two single-copy nuclear genes to study the Mycena pura (Pers.) P. Kumm. species complex. With the two protein-coding genes, they identified eleven phylospecies, whereas ITS not only underestimated the diversity found by the two single-copy genes, but also identified an OTU which was not a phylogenetic species (“false positive”). On the other hand, ITS in Hygrocybe (Fr.) P. Kumm. is highly variable, with up to 25% distance (Babos et al. 2011, 2017), which is expected to lead to overestimation of the number of species.

den Bakker and Noordeloos (2005) indicated that some disagreements were found between a single-copy protein-coding gene (Gapdh) and ITS2 phylogeny of Leccinum Gray. In one case (L. cyaneobasileucum Lannoy & Estadès), the species was well supported both by morphological differences and the Gapdh phylogeny, but was included in L. holopus (Rostk.) Watling in the ITS2 tree. This may have resulted from past hybridization followed by concerted evolution of the ITS2 locus. Introgression and incomplete lineage sorting have been documented in other Leccinoideae genera, namely Rossbeevera T. Lebel et al. and Turmalinea Orihara & N. Maek., and may have been overlooked in other groups (Orihara et al. 2016). However, this kind of phenomenon, although problematic in species identification and phylogenetic analysis should not commonly result in false-positives in mgDNA species delimitation (i.e., wrongly considering divergent sequences as belonging to different species), except in the case of incomplete concerted evolution of the ITS sequences.

Case studies from Ascomycota genera

Overall methodology

Sequence data used in each case study are provided in Supplementary tables. Datasets were aligned for each gene partition using MAFFT (Katoh and Standley 2013). Aligned datasets were manually checked in Bioedit (Hall 2011). Maximum parsimony, maximum likelihood and Bayesian analyses were performed using PAUP*4, raxmlGUIv.0.9b2 and MrBayes v 3.1.2 (BMCMC; Ronquist and Huelsenbeck 2003), respectively, for each gene partition and for the combined dataset. Maximum parsimony trees were inferred using the heuristic search option with 1000 random sequence additions, and MaxTrees was set to 1000. In the maximum likelihood (RAxML) analyses, the GTRGAMMAI model of nucleotide substitution was used and the search strategy was set to rapid bootstrapping (Silvestro and Michalak 2012), with 1000 replications. The best-fit model of evolution was determined with MrModeltest 2.2 (Nylander 2004) and was used in the Bayesian analyses. Six simultaneous Markov chains were run for 1,000,000 generations (3,000,000 generations for the dataset of Colletotrichum destructivum), with a sampling frequency of 100. The first 2,000 trees were discarded based on the result from Tracer software. The selected examples actually go back in part to published conclusive multi-locus studies where only combined datasets were published, apparently because the authors realised that the ITS data alone would not provide enough resolution. We have constructed the trees from the published sequence data in order to demonstrate the problems.
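The Bayesian sampling settings translate into tree counts as in the following short worked example. It assumes that one tree is saved every 100 generations, that the 2,000 discarded trees are counted among these saved trees, and that the starting tree at generation 0 is not counted; the function name is illustrative only.

```python
def mcmc_sample_summary(generations, sample_freq, burnin_trees):
    """Return (sampled_trees, retained_trees, burnin_fraction) for one run."""
    sampled = generations // sample_freq
    retained = sampled - burnin_trees
    return sampled, retained, burnin_trees / sampled

# Standard datasets: 1,000,000 generations, sampled every 100 generations.
standard = mcmc_sample_summary(1_000_000, 100, 2_000)

# Colletotrichum destructivum dataset: 3,000,000 generations.
destructivum = mcmc_sample_summary(3_000_000, 100, 2_000)
```

Under these assumptions, the standard runs yield 10,000 sampled trees, of which 8,000 are retained (a burn-in of 20%), whereas the longer run yields 30,000 sampled trees with 28,000 retained (a burn-in of about 6.7%).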


Colletotrichum Corda

Colletotrichum is a genus of pathogens, saprobes or endophytes occurring worldwide (Yang et al. 2009; Hyde et al. 2014; Jayawardena et al. 2016). There are 51 OTU sequences in GenBank, which are assigned to the genus based on LSU (12), SSU (6) and ITS (33) gene regions. Jayawardena et al. (2018b) identified saprobic fungi, including several Colletotrichum species, based on culture-dependent and culture-independent methods. Colletotrichum isolates obtained by the culture-dependent method had ITS sequences of ≥500 bp, and the species were identified in their study using multigene analyses. ITS sequence data alone can be used for identification to genus or species complex level (Jayawardena et al. 2016, 2018a), but they contain insufficient information to provide resolution at or below species level.

For the case studies we have selected OTUs available in GenBank and have constructed separate phylogenetic trees for the destructivum species complex and other species (Fig. 1). We used the Blastn search in GenBank and considered a similarity of 99–100% as indicating the same species (Garnica et al. 2016; Jeewon and Hyde 2016; Jayawardena et al. 2018b). A phylogenetic tree based on ITS sequence data was constructed for the Col. destructivum O’Gara species complex with both OTU sequences and sequences derived from type specimens (Fig. 1). The results show that almost all of the OTUs identified as Col. destructivum clustered together with the type ITS sequence of Col. tabaci Böning, with the exception of MF330413, which is basal to all others. The Blastn results of these OTUs showed 100% query cover and 100% similarity to species within the destructivum species complex (Col. tabaci Böning, Col. higginsianum Sacc., Col. utrechtense Damm, Col. destructivum, Col. fuscum Laubert) and to Col. coccodes (Wallr.) S. Hughes. Herein we note that those short ITS sequences are not suitable for resolving the destructivum species complex. A similar phylogenetic scenario was obtained when the ITS sequence data were analysed for other species complexes (Fig. 2). OTUs could be assigned to species complexes, but their phylogenetic placement was ambiguous; hence the reliability of the ITS is questionable. For a better resolution of this genus, protein-coding gene regions are required (Jayawardena et al. 2018a), but how far are we going to favour the use of alternative genes when dealing with OTUs suspected to be plant pathogens, for the sake of a stable classification and appropriate nomenclature?
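The ambiguity that arises from applying the 99–100% similarity criterion to a short OTU can be sketched as follows. The species names and the 100% identity values mirror the Blastn result described above; the sub-threshold hit, the data layout and the function name are hypothetical and serve only as illustration. No actual BLAST search is performed.

```python
def candidate_species(hits, min_identity=99.0):
    """Return the sorted species names whose percent identity to the query
    meets the threshold. hits: iterable of (species, percent_identity)."""
    return sorted({species for species, identity in hits if identity >= min_identity})

# Hits for one short OTU, as described in the text (all at 100% identity),
# plus one hypothetical hit below the threshold.
otu_hits = [
    ("Col. tabaci", 100.0),
    ("Col. higginsianum", 100.0),
    ("Col. utrechtense", 100.0),
    ("Col. destructivum", 100.0),
    ("Col. fuscum", 100.0),
    ("Col. coccodes", 100.0),
    ("Hypothetical sp.", 97.5),
]
candidates = candidate_species(otu_hits)
is_ambiguous = len(candidates) > 1
```

Six species satisfy the criterion equally well, so the threshold alone cannot assign the OTU to a single species; at best it places the OTU within a species complex.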

Fig. 1
figure 1

Maximum parsimony phylogenetic tree of Colletotrichum destructivum (ITS) and allies. The first and second sets of numbers at each node are MP and ML values, respectively (only BS values above 70% shown). The third set of numbers at each node shows Bayesian posterior probabilities (only PP values above 90% shown). Strain numbers are indicated after species names. Sequences from environmental samples are in blue bold

Fig. 2
figure 2

Maximum parsimony phylogenetic tree of other Colletotrichum species (ITS). The first and second sets of numbers at each node are MP and ML values, respectively (only BS values above 70% shown). Strain numbers are indicated after species names. Sequences from environmental sample are in blue bold


Penicillium Link

Penicillium is a very large genus, containing over 350 species (Visagie et al. 2014b), many of which play an important role in medicinal and industrial applications as producers of mycotoxins, antibiotics and enzymes. Therefore, it is instrumental to maintain a stable taxonomy of these fungi and provide concise species concepts. Species in this genus can be found on various substrates, for example, occurring as pathogenic species in humans, and contamination of foods. The genus was recently divided into 25 sections (Visagie et al. 2014a). Presently, more than 200 sequence datasets of ‘Uncultured Penicillium’ are available in GenBank. All of these have less than 200 bp linear DNA (June, 2018). In this study, sequence data from Penicillium section Citrina were selected to show the potential of ITS for species identification and classification. In Fig. 3, the maximum parsimony tree generated from ITS sequence data indicate that the clade containing strains of P. citrinum Thom and P. hetheringtonii Houbraken et al., which includes “P. citrinum” OTU-43 (available in GenBank with 201 bp, others have ca. 500 bp) are not well-resolved. Penicillium citrinum OTU-43 is placed outside the clade of P. citrinum, while P. tropicum Houbraken and P. tropicoides Houbraken et al. could not be differentiated by ITS. Figure 4 shows the classification within Penicillium section Citrina which is well-resolved by using the TUB2 gene. The maximum parsimony tree generated from TUB2 is in accordance with previous studies by Houbraken et al. (2010) and Visagie et al. (2014a), in that TUB2 is better suited to segregate species in Penicillium than ITS. Unfortunately, there are no TUB2 sequences of P. citrinum from environmental samples available in GenBank.

Fig. 3
figure 3

Maximum parsimony phylogenetic tree of Penicillium sect. Citrina (ITS). The first and second sets of numbers at each node are MP and ML values, respectively (only BS values above 70% shown). The third set of numbers at each node shows Bayesian posterior probabilities (only PP values above 90% shown). Strain numbers are indicated after species names. Sequences from environmental samples are in blue bold

Fig. 4
figure 4

Maximum parsimony phylogenetic tree of Penicillium sect. Citrina (TUB2). The first and second sets of numbers at each node are MP and ML values, respectively (only BS values above 70% shown). The third set of numbers at each node shows Bayesian posterior probabilities (only PP values above 90% shown). Strain numbers are indicated after species names. There are no TUB2 sequence data available from environmental samples


Botryosphaeria Ces. & De Not

Species of this genus are saprobic, parasitic or endophytic on plants worldwide (Phillips et al. 2013; Dissanayake et al. 2016). Some species of Botryosphaeria are known to cause cankers, dieback and other plant diseases (e.g. Maas and Uecker 1984; Rumbos 1987; Michailides 1991; Smith et al. 1994; Phillips et al. 2013). For example, B. dothidea (Moug.) Ces. & De Not. (associated with botryosphaeria dieback in grapevine) could be considered as the major pathogen in this genus. Several molecular studies have indicated that the affinities of this important genus need to be further clarified (Hyde et al. 2013; Dissanayake et al. 2018). There are over 50 sequences in GenBank assigned to the genus Botryosphaeria based on ITS regions. Dissanayake et al. (2018) identified many endophytic species using a culture-dependent method, including species of Botryosphaeria. In their study, B. dothidea isolates obtained by the culture-dependent method had more than 500 bases for ITS. All isolates of B. dothidea clustered with the type species; thus, there is no problem in using ITS for identification of B. dothidea. However, ITS has insufficient information to resolve the relationship between B. dothidea and other species in the genus. Therefore, sequence data of combined ITS and TEF1 regions were used in their phylogenetic analyses to increase the phylogenetic resolution. Botryosphaeria (OTU_7) was detected and identified as B. dothidea even though the OTU has only 219 bases (Dissanayake et al. 2018). The short read sequence of OTU_7 is unable to clarify its position in the phylogenetic tree; thus Dissanayake et al. (2018) did not include it in their phylogenetic tree. However, bioinformatic tools indicated that OTU_7 belongs to the species B. dothidea.

In this paper, we provide phylogenetic trees using the same dataset as in Dissanayake et al. (2018), but included only one dataset from the uncultured fungus, OTU_7 (Figs. 5, 6). The phylogenetic tree generated from combined ITS and TEF1 sequence data shows more variability and is thus more useful for species discrimination. Isolates of B. dothidea cluster together with strong statistical support, which is in accordance with the previous studies. The tree in Fig. 6 placed OTU_7 closer to B. auasmontanum F.J.J. Van der Walt et al., and this is due to the fact that the NGS-derived fragment has only 219 base pairs. For comparison, a single-locus tree based on ITS alone is provided in Fig. 5, demonstrating that ITS OTU data alone cannot resolve the identification or classification of taxa. These results imply that OTU data should be incorporated together with the bioinformatics analyses, and/or that the NGS methodology must be further developed to allow for generation of larger contigs and, even more importantly, the generation of DNA sequences from protein-coding genes.

Fig. 5
figure 5

RAxML maximum likelihood phylogenetic tree of Botryosphaeria (ITS). The first and second sets of numbers at each node are MP and ML values, respectively (only BS values above 70% shown). The third set of numbers at each node shows Bayesian posterior probabilities (only PP values above 90% shown). Strain numbers are indicated after species names. Sequences from environmental samples are in blue bold

Fig. 6

RAxML maximum likelihood phylogenetic tree of Botryosphaeria (ITS and TEF1). The first and second sets of numbers at each node are MP and ML bootstrap values, respectively (only BS values above 70% shown). The third set of numbers at each node gives Bayesian posterior probabilities (only PP values above 90% shown). Strain numbers are indicated after species names. Sequences from environmental samples are in blue bold


Xylaria Hill ex Schrank

Xylaria is the generic type of Xylariaceae Tul. & C. Tul., with around 790 epithets listed in Index Fungorum (2018, http://www.indexfungorum.org/names/Names.asp). Species in this genus constitute a rich source of bioactive secondary metabolites (Song et al. 2014). Recently the phylogeny of the family was revised following multi-locus DNA phylogenies, and comprehensive reviews of its taxonomy have been provided by Daranagama et al. (2018) and Wendt et al. (2018). Helaly et al. (2018) concluded that future culturing and phylogenetic studies are needed to reach a better taxonomic classification of Xylaria. Thus, it may finally become possible to link the secondary metabolites to the taxonomy of the genus. Presently, there are around 36 sequences from “Uncultured Xylaria” available in GenBank (June 2018). Most of these are ITS (32), with 22 sequences shorter than 200 bp, and only 6 sequences longer than 300 bp (Fig. 7). We selected X. schweinitzii clone G-ela3-ITS2_OTU-0-069_9 as a case study in this paper. The sequence was initially checked through a BLAST search, and we found that the sequence of X. ophiopoda is identical to that of X. schweinitzii (Fig. 8). Xylaria ophiopoda isolate 1081 was “identified” by Thomas et al. (2016, in their supporting information, even though it remains obscure how they could relate the sequence to an old name that has never been verified since it was mentioned by Saccardo!), while X. schweinitzii isolate 92092023 was published by Hsieh et al. (2010) and therefore represents an authentic specimen that was studied by experts. Thus, one of the two species may have been misidentified, because ITS gives insufficient information for this fungal group. In Fig. 8, the sequence most similar to X. schweinitzii G-ela3-ITS2_OTU-0-069_9 is X. schweinitzii isolate 904; however, it has only 197 bp, which will definitely not be enough to differentiate them.
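The point that a short ITS fragment can match more than one species equally well can be illustrated with a minimal sketch. The sequences below are toy, hypothetical strings (not real Xylaria data), and the function name `percent_identity` is our own; the sketch only assumes that the two sequences are already aligned to the same length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Toy, hypothetical sequences: the two "species" share an identical
# 20-base core and differ only in the flanking regions.
shared = "ACGTTGCAAGCTTAGGCTAA"
species_a = "AAAAACCCCC" + shared + "GGGGGTTTTT"
species_b = "AAATACCTCC" + shared + "GGAGGTTCTT"

# A short read that covers only the shared core of each sequence.
short_a = species_a[10:30]
short_b = species_b[10:30]

full_identity = percent_identity(species_a, species_b)
short_identity = percent_identity(short_a, short_b)

print(f"full length: {full_identity:.1f}% identical")
print(f"short read:  {short_identity:.1f}% identical")
```

In this toy case the short fragments are indistinguishable although the full-length sequences differ, which is the situation a 197 bp read may create in a real database search.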

Fig. 7

Representative strains of uncultured Xylaria available in GenBank

Fig. 8

Xylaria schweinitzii clone G-ela3-ITS2_OTU-0-069_9 in a BLAST search shows similarity to different species

Linking multiple loci in mgDNA extracts

In DNA extracts, i.e. after cell disruption, only genes located on the same chromosome are physically linked. The use of multiple genes for “species” delimitation usually relies on the assumption that the DNA extract contains only DNA from a single species (i.e. gDNA). This assumption usually applies only to pure cultures, and even those may contain undetected species, such as fungicolous yeasts. We currently think that we know, for several groups (but not for all), which genes are required to delimit species in these groups. But this has changed over time and is certainly still changing. For most fungal taxa, successive phylogenetic studies have revealed that more and more genes are necessary to reconstruct their phylogenetic relationships. If the original DNA extract has been deposited, however, additional loci may be sequenced to meet emerging demands.

With regard to a type specimen consisting of mgDNA, this is challenging. Different chromosomes of multiple species are mixed in mgDNA extracts. If a species is represented only by mgDNA, we are currently not aware of an approach to link multiple loci for species delimitation if these are positioned on different chromosomes. Assuming that subsequent analyses reveal that additional loci are required for proper phylogenetic resolution, it is only possible to identify these loci in an mgDNA-based type if they are located on the same chromosome as the originally sequenced loci. While this may already require sequencing of the whole chromosome, it is currently not possible to access the required data if the locus is located on a different chromosome from the previously sequenced loci. Even if the information is theoretically there, we may not be able to access it in the mgDNA-based type specimen. Accordingly, an mgDNA-based type specimen may only serve as an appropriate reference for a species if all chromosomes of the represented species are assignable to that species.

In the long term, sequence data from numerous whole metagenomes will be available. Once these are assembled to the chromosome level, correlation analyses of multiple metagenomes will enable an assignment of chromosome sequences to species. However, such an assignment would require data beyond a single “mgDNA type” and would only be correlative, i.e. with a certain probability specified by statistical support values.
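The kind of correlation analysis envisaged above could, for example, group chromosome-level sequences by how their abundance co-varies across several metagenomes. The following is only a minimal sketch under our own assumptions: the coverage values are invented, the names `pearson` and `group_by_coabundance` are hypothetical, and the greedy grouping with a fixed correlation threshold stands in for whatever statistical procedure would actually be used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return sxy / sqrt(sxx * syy)


def group_by_coabundance(profiles, threshold=0.95):
    """Greedy grouping: a sequence joins the first group whose first member
    (the seed) it correlates with at or above the threshold."""
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            if pearson(profiles[group[0]], profile) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical coverage of four assembled chromosomes in five metagenomes.
profiles = {
    "chrom_A1": [10.0, 20.0, 5.0, 40.0, 15.0],
    "chrom_A2": [11.0, 19.0, 6.0, 42.0, 14.0],
    "chrom_B1": [30.0, 5.0, 25.0, 8.0, 2.0],
    "chrom_B2": [28.0, 6.0, 27.0, 7.0, 3.0],
}

groups = group_by_coabundance(profiles)
print(groups)
```

Such a grouping is purely correlative: it suggests which chromosomes may belong to the same organism, but it cannot be derived from a single mgDNA extract and does not by itself prove a physical link.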

Conclusions and an alternative DNA-based system

Mycologists have studied and provided scientific names for novel fungal species with specimen-based holotypes for effective communication. Cultures and dried type specimens are preserved in various biodiversity repositories to facilitate future studies and re-examination of taxa. We have already witnessed how DNA sequencing has revolutionized taxonomy and has become the most appropriate and standard taxonomic tool to identify species. However, as mycologists, we remain largely convinced that morphological characterization should be used and supplemented with DNA sequence data to resolve species, achieve more reliable identification and establish natural relationships. We expect that sooner or later our sequencing strategies can become our “molecular microscope”, not with a view to relying solely on DNA for identification, but to better complement morphology and assign taxa to any particular taxonomic rank.

There are some benefits to naming mgDNA, especially if it would bring numerous dark data into the light (Ryberg and Nilsson 2018) and help to monitor species diversity. However, without careful consideration, this can lead to appropriate nomenclatural issues in fungi being disregarded and to future taxonomic problems. There will be no morph to study, no cultures, and no phenotypic identification. If the DNA barcodes used in generating mgDNA data do not have counterparts in DNA databases, then any comparison is pointless. With morphology-based studies, we compare like with like, but with mgDNA, the scenario is often totally different. Another problem is that there are already numerous erroneous or fake sequences in databases. Furthermore, mgDNA samples will often not be deposited in public-domain collections. The integrity of deposited mgDNA can be questionable. With morph-based specimens, one can go back, recollect samples, re-examine them, and re-evaluate nomenclature and classification. What can we do with mgDNA? Should we just bluntly accept a given name and classification with little possibility of scientific re-evaluation?

The present discussion goes back to the earlier one on the introduction of the 1F1N concept, when the issue of DNA-based nomenclature was brought up for the first time. In retrospect, the decision for a radical introduction made at the IBC in Melbourne has created many name changes. This has resulted in many fruitful collaborations within the mycological community, and finally in a workable system that was built by a team effort of mycologists worldwide. However, in our view, it would in retrospect have been much better to maintain the priority of the sexual morph, except in cases such as Aspergillus where the asexual morph is clearly more important, which would have required much less discussion and far fewer name changes. In any case, there was no way to avoid this; thus we should accept mgDNA as holotypes only once we have eliminated all possible shortfalls. Another way would be to include the DNA of the taxa that are known to occur in the respective habitat as a sort of positive control. The extent of the potential damage from implementing DNA data as a type cannot be foreseen, but it could be considerable. As mentioned above, if we use ITS only to assign a species name, some scientists would probably be tempted to erect a new species when they cannot find a highly similar sequence in GenBank, possibly because of erroneous sequences deposited there. We strongly suggest re-discussing the matter of using mgDNA as holotypes once whole genomes can be obtained from environmental samples, which may soon become feasible.

Several publications over the past decade have shown serious errors in DNA sequences (we sometimes even suspected that we were dealing with fake sequences!). While one can argue that it does not matter if we add some more species during biodiversity assessments, the nomenclature of pathogens or otherwise practically important taxa should definitely not be messed up, as this would lead to a decline in scientific quality standards. mgDNA evidently plays a significant role in discovering potentially rare and elusive species, but not directly in taxonomy per se. One way to avoid a further decline of good scientific standards might be to install a procedure similar to that in bacterial taxonomy, that is, naming a few journals that are allowed to publish new fungal taxa based on DNA data (e.g. where the editorial boards are composed of capable specialists), but this will be very difficult to implement. After all, there is no restriction regarding the place of publication, and even if an irrational species concept is rejected by the editors of ten or more mycological journals, it can still be published on the Internet.

Allowing species to be named on the basis of mgDNA in order to deal with environmental data must not undermine the results of the excellent work on polythetic taxonomy accomplished over the past years. Indeed, only the polythetic approach has led to a situation where we can now see clearly that ITS is less well suited than certain protein-coding genes for species segregation in many fungal groups. Another proposal, from Peršoh et al. (2010) and Peršoh (2013), is to establish rules for mgDNA that are independent of the code, i.e. the result would not be a real species name, but an “indication” that there might be something new (a new species) related to the taxon given in the name, or the name of the closest physical type followed by a number. In this way a conflict with the code might be avoided, because the naming approach (and thus the resulting name) would not claim to follow the standards of the code. However, we think this would not work for cryptic species and other species that cannot be classified by ITS. We could of course live with a “candidatus system” such as the one outlined by de Beer et al. (2016), in which the DNA-based types are flagged, but the criteria that apply to such a system must be carefully considered. There are already some cases where it was regarded as feasible to forgo previously established species concepts, e.g. if newly discovered insect-associated and/or endophytic fungi were found to belong to a well-studied genus and could thus be recognised as new from a comparison of molecular data (e.g. Pažoutová et al. 2013), but we have to be aware that this does not really work in the numerous phylogenetic lineages where almost no molecular data exist for the closest species. An additional option is to use provisional names similar to the ones generated by GenBank when sequences from new or unidentified species are submitted, like Fungus sp. MAN-2018.25, in which Fungus can be replaced by any genus name, MAN are the initials of the author of the sequence, 2018 is the year of submission of the sequence to the database, and 25 is the species number. Such a system could be extended to higher-rank taxa by replacing “Fungus” with the nearest known higher-rank taxon. A database of all those names could be implemented to allow unambiguous and efficient communication about those taxa, without the negative effects of designating single-locus (ITS) DNA sequences as holotypes.
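To show how such provisional names could be handled consistently in a database, the pattern “Fungus sp. MAN-2018.25” can be composed and parsed mechanically. The sketch below is only an illustration of that pattern; the function names, the field names, and the exact validation rules (capitalised taxon, upper-case initials, four-digit year) are our own assumptions and are not part of any existing GenBank interface.

```python
import re

# Pattern for names of the form "<Taxon> sp. <INITIALS>-<year>.<number>".
_NAME_PATTERN = re.compile(
    r"^(?P<taxon>[A-Z][a-z]+) sp\. "
    r"(?P<initials>[A-Z]+)-(?P<year>\d{4})\.(?P<number>\d+)$"
)


def format_provisional_name(taxon, initials, year, number):
    """Compose a provisional name from its four components."""
    name = f"{taxon} sp. {initials}-{year}.{number}"
    if _NAME_PATTERN.match(name) is None:
        raise ValueError(f"components do not form a valid name: {name!r}")
    return name


def parse_provisional_name(name):
    """Split a provisional name into taxon, initials, year and number."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a provisional name: {name!r}")
    return {
        "taxon": match.group("taxon"),
        "initials": match.group("initials"),
        "year": int(match.group("year")),
        "number": int(match.group("number")),
    }


example = format_provisional_name("Fungus", "MAN", 2018, 25)
parsed = parse_provisional_name(example)
print(example)
print(parsed)
```

Replacing the taxon component with a genus or a higher-rank name, as suggested above, requires no change to the format itself.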

We conclude that taxonomy is impossible to manage if the nomenclature becomes uncertain. Accurate fungal identification and classification (morphology together with molecular data) is needed to understand morphological evolution, adaptation and the epidemiology of fungal pathogens, and in many other areas of work. Allowing mgDNA to be named as holotypes may currently benefit a minority of mycologists, but not the majority of mycologists who work on fungal taxonomy and classification.

No approach is better than its counterparts. Traditional taxonomists have come a long way over centuries. We can still describe fungi without sequence data, based on morphology alone (as we can see the characters). Although undesirable, it can be done. However, if we describe species based only on DNA, we are at real risk of describing species that have already been named. The disadvantage will be that for DNA-based species we can never look at additional characters. We believe that mgDNA is still in its infancy and, for the time being, is not suitable for use as holotypes. We should not rush to spoil the scientific beauty of fungal taxonomy and disregard acquired knowledge.