Data on the genome analysis of the wood-rotting fungus Steccherinum ochraceum LE-BIN 3174

In the present article, we report data on the whole-genome sequencing of the wood-rotting (white-rot) fungus Steccherinum ochraceum LE-BIN 3174. The S. ochraceum LE-BIN 3174 genome consists of 770 scaffolds (N50 = 62,812 bp) with a total assembly length of ∼35 Mb. Structural annotation of the genome predicted 12,441 gene models, of which 181 were tRNA-coding genes and 12,260 were protein-coding genes. The protein-coding genes were annotated against several databases (Pfam, InterPro, eggNOG, dbCAN, and MEROPS). The whole genome sequence and its functional annotation provide important information for in-depth investigation of the biochemical processes that take place during the late stages of wood decomposition by Basidiomycetes. The Whole Genome project of S. ochraceum LE-BIN 3174 has been deposited at DDBJ/ENA/GenBank under the accession RWJN00000000. The version described in this work is version RWJN00000000.1. For further interpretation of the data provided in this article, please refer to the research article “Fungal Adaptation to the Advanced Stages of Wood Decomposition: Insights from the Steccherinum ochraceum” [1].


© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Data description
Steccherinum ochraceum is a white-rot basidiomycete with a wide ecological amplitude. It occurs in different regions of Russia and throughout the world, occupying different climatic zones. The obtained draft genome of S. ochraceum LE-BIN 3174 (DDBJ/ENA/GenBank accession/version RWJN00000000.1) is represented by 770 scaffolds with a total length of 35.27 Mb and is of comparable quality to previously sequenced genomes of polypore fungi [2]. The gene prediction resulted in 12,441 gene models. General information on the genome assembly and its structural and functional annotation is presented in Table 1. A summary of the Gene Ontology (GO) classification of the protein-coding genes is shown in Fig. 1. The whole genome sequence of S. ochraceum LE-BIN 3174 showed that it harbors 361 carbohydrate-active enzymes (CAZymes). The auxiliary activity enzymes (AA), carbohydrate

Specifications Table
Subject: Biology
Specific subject area: Microbiology, Mycology, Genomics
Type of data: Genome sequence data
How data were acquired: Shotgun method using Illumina HiSeq 2500 with paired-end runs
Data format: Raw and analyzed data
Parameters for data collection: The mycelium derived from field-collected basidiospores was statically cultivated on glucose-peptone (GP) medium at 26–28 °C in 750-mL Erlenmeyer flasks. The mycelium was ground in liquid nitrogen, and total DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, US).
Description of data collection: The genome was assembled with CLC Genomics Workbench 11.0 (Qiagen, US) and annotated with Funannotate pipeline v1.

Fungal strain isolation and genetic verification
For genetic verification, the genomic DNA (gDNA) was extracted as described below in the "Genomic DNA Isolation, Library Preparation and Sequencing" section, and the sequence of the ITS1-5.8S rRNA-ITS2 region was obtained using the standard primers ITS1F 5′-CTT GGT CAT TTA GAG GAA GTA A-3′ and ITS4B 5′-CAG GAG ACT TGT ACA CGG TCC AG-3′. PCR amplification was performed using the Encyclo PCR kit (Evrogen, Russia) under the following conditions: 1 cycle of 5 min at 95 °C; 25 cycles of 1 min at 90 °C, 1 min at 56 °C, and 1 min at 72 °C; 1 cycle of 10 min at 72 °C. The PCR reaction mixture was resolved on a 1.2% agarose gel. The amplification produced a single PCR product approximately 830 bp in length. The product was excised from the gel and purified with the QIAquick Gel Extraction Kit (Qiagen, USA) according to the manufacturer's instructions. Sanger sequencing of the obtained fragment was performed at Evrogen JSC (Moscow, Russia).

Genomic DNA isolation, library preparation and sequencing
After ultrasonic fragmentation, the gDNA was prepared for sequencing using the TruSeq DNA Sample Prep Kit (Illumina, US). The quality and quantity of the obtained DNA library were checked using the Agilent Bioanalyzer 2100 and the StepOnePlus Real-Time PCR System (Thermo Fisher Scientific, US). Whole genome sequencing was carried out on the Illumina HiSeq 2500 system (Illumina, US) using the HiSeq Rapid SBS Kit v2 at Evrogen JSC (Moscow, Russia).
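The PCR conditions and primers reported for the genetic verification can be written down as plain data, which makes simple sanity checks easy. The sketch below is only an illustration: the names `PCR_PROGRAM`, `PRIMERS`, `total_hold_minutes` and `gc_fraction` are hypothetical, and the total counts hold times only (ramping between temperatures is ignored).

```python
# Thermocycler programme as reported in the text:
# (number of repeats, [(minutes, temperature in degrees C), ...])
PCR_PROGRAM = [
    (1, [(5, 95)]),                     # initial denaturation
    (25, [(1, 90), (1, 56), (1, 72)]),  # denaturation, annealing, extension
    (1, [(10, 72)]),                    # final extension
]

# Primer sequences as reported in the text (5'->3', spaces removed).
PRIMERS = {
    "ITS1F": "CTTGGTCATTTAGAGGAAGTAA",
    "ITS4B": "CAGGAGACTTGTACACGGTCCAG",
}


def total_hold_minutes(program):
    """Sum the hold times of all steps, ignoring temperature ramping."""
    return sum(
        repeats * sum(minutes for minutes, _temperature in steps)
        for repeats, steps in program
    )


def gc_fraction(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


print(f"Total hold time: {total_hold_minutes(PCR_PROGRAM)} min")
for name, sequence in PRIMERS.items():
    print(f"{name}: {len(sequence)} nt, GC = {gc_fraction(sequence):.1%}")
```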

Genome sequencing, assembly and annotation
The shotgun sequencing produced 2 × 47,868,586 paired-end reads (2 × 100 bp) with an insert size of 300–500 bp. The reads were further processed with CLC Genomics Workbench 11.0 (Qiagen, US) as follows: (1) adapters were removed from all reads; (2) all reads were trimmed based on their quality; (3) reads were sampled to reduce coverage to a maximum average coverage of 100×; (4) reads were de novo assembled, and the resulting contigs were scaffolded.
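To make the read-sampling step concrete, the following sketch estimates the raw sequencing depth and the fraction of reads that would have to be retained to reach the 100× cap. It is an approximation under stated assumptions, not the procedure used by CLC Genomics Workbench: the final assembly length (35.27 Mb) stands in for the genome size, and the read count is taken before adapter removal and quality trimming, so the depth is an upper bound. The function names are hypothetical.

```python
READ_PAIRS = 47_868_586           # pairs of reads reported in the text
READ_LENGTH_BP = 100              # length of each read in a pair
ASSEMBLY_LENGTH_BP = 35_270_000   # assumption: assembly length used as genome-size proxy
TARGET_COVERAGE = 100             # maximum average coverage after sampling


def mean_coverage(read_pairs, read_length_bp, genome_length_bp):
    """Average depth: total sequenced bases divided by genome length."""
    return 2 * read_pairs * read_length_bp / genome_length_bp


def sampling_fraction(coverage, target):
    """Fraction of reads to keep so that average depth does not exceed target."""
    return min(1.0, target / coverage)


raw_coverage = mean_coverage(READ_PAIRS, READ_LENGTH_BP, ASSEMBLY_LENGTH_BP)
fraction = sampling_fraction(raw_coverage, TARGET_COVERAGE)

print(f"Raw coverage (upper bound): {raw_coverage:.0f}x")
print(f"Fraction of reads to keep for {TARGET_COVERAGE}x: {fraction:.2f}")
```

Because trimming removes bases before sampling, the fraction actually retained in the real workflow would be somewhat larger than this estimate.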
The functional annotation of the predicted protein-coding genes was performed with three general-content databases: the protein families database Pfam [8], the integrative protein signature database InterPro [9], and the orthologous groups database eggNOG [10]. Additionally, two domain-specific databases were employed: the carbohydrate-active enzyme (CAZyme) database dbCAN [11] and the peptidase database MEROPS [12]. The prediction of transmembrane topologies and signal peptides was performed with Phobius [13] and SignalP [14], respectively.
The data on genome sequencing, assembly and annotation are presented in Table 1.
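For readers unfamiliar with the N50 statistic reported for the assembly, the sketch below shows how it is conventionally computed from a list of scaffold lengths. The scaffold lengths in `toy_lengths` are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from the S. ochraceum assembly. The mean scaffold length, by contrast, uses the reported totals (35.27 Mb in 770 scaffolds).

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half of the assembly."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest scaffold down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half_total:
            return length


# Hypothetical scaffold lengths (bp), for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
print("Toy N50:", n50(toy_lengths))

# Mean scaffold length from the totals reported in the text.
mean_scaffold_bp = 35_270_000 / 770
print(f"Mean scaffold length: {mean_scaffold_bp:,.0f} bp")
```

An N50 (62,812 bp) above the mean scaffold length indicates that most of the assembled sequence sits in scaffolds longer than the average one.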
As a result of the general functional prediction, 6019 genes were annotated with GO terms. In total, 10,648 GO terms were assigned, of which 1707 belonged to the "Cellular component" class, 5207 to the "Molecular function" class, and 3734 to the "Biological process" class (Fig. 1).
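The GO figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers reported in this article (including the 12,260 protein-coding genes from the abstract); the variable names are hypothetical.

```python
GO_TERMS_BY_CLASS = {
    "Cellular component": 1707,
    "Molecular function": 5207,
    "Biological process": 3734,
}
TOTAL_GO_TERMS = 10_648
GO_ANNOTATED_GENES = 6019
PROTEIN_CODING_GENES = 12_260

# The three class counts should account for every assigned GO term.
class_sum = sum(GO_TERMS_BY_CLASS.values())
assert class_sum == TOTAL_GO_TERMS

# Share of each GO class among all assigned terms.
class_shares = {name: count / TOTAL_GO_TERMS for name, count in GO_TERMS_BY_CLASS.items()}
for name, share in class_shares.items():
    print(f"{name}: {share:.1%}")

# Share of protein-coding genes that received at least one GO term.
annotated_share = GO_ANNOTATED_GENES / PROTEIN_CODING_GENES
print(f"Genes with GO terms: {annotated_share:.1%}")
```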
A comparison of the total CAZyme content is presented in Fig. 2.
A comparison of the content of CAZymes acting on different polymeric components of lignocellulose [15] is presented in Fig. 3. Note that the numbers do not add up, because the classification scheme is deliberately redundant: it was designed to reflect the different enzymatic activities possessed by fungi rather than distinct CAZymes, and the same CAZyme can act on several components of lignocellulose simultaneously.
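The effect of this redundancy can be illustrated with a toy example. The enzyme names and substrate assignments below are entirely hypothetical and serve only to show why per-substrate counts can exceed the number of distinct enzymes.

```python
from collections import Counter

# Hypothetical enzymes and the lignocellulose components each one acts on.
ENZYME_SUBSTRATES = {
    "enzyme_A": {"cellulose"},
    "enzyme_B": {"cellulose", "hemicellulose"},
    "enzyme_C": {"hemicellulose", "pectin"},
    "enzyme_D": {"lignin"},
}

# Count enzymes per substrate; multi-substrate enzymes are counted once per substrate.
per_substrate = Counter(
    substrate
    for substrates in ENZYME_SUBSTRATES.values()
    for substrate in substrates
)

sum_of_counts = sum(per_substrate.values())
unique_enzymes = len(ENZYME_SUBSTRATES)

print("Per-substrate counts:", dict(per_substrate))
print("Sum of per-substrate counts:", sum_of_counts)
print("Distinct enzymes:", unique_enzymes)
```

Here two of the four enzymes act on two substrates each, so the per-substrate counts sum to more than the number of distinct enzymes.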