Draft genome sequence of Trametes villosa (Sw.) Kreisel CCMB561, a tropical white-rot Basidiomycota from the semiarid region of Brazil

Herein, we present the draft genome of Trametes villosa isolate CCMB561, a wood-decaying Basidiomycota commonly found in tropical semiarid climates. The genome assembly was 57.98 Mb in size, with an L50 of 691. A total of 16,711 putative protein-encoding genes were predicted, including 590 genes coding for carbohydrate-active enzymes (CAZymes), which are directly involved in the decomposition of lignocellulosic materials. This is the first genome reported for this species, which is of high interest in bioenergy research. The draft genome of Trametes villosa isolate CCMB561 will provide an important resource for future investigations in biofuel production, bioremediation and other green technologies.



Value of the data
This is the first draft genome of Trametes villosa, a tropical white-rot Basidiomycota from the semiarid region of Brazil that is promising for its production of ligninolytic enzymes.
T. villosa isolate CCMB561 is a good producer of lignin peroxidase, manganese peroxidase and laccase, enzymes considered crucial for lignin degradation, which makes it an attractive candidate for bioenergy research.
The draft genome will accelerate functional genomics research, helping to understand the molecular basis of lignin decay by this fungus as well as advancing its enzymatic applications.
Trametes villosa (Sw.) Kreisel is a common species in the Brazilian semiarid region [3]. It is a good producer of three important ligninolytic enzymes, laccase (Lac) [4], manganese peroxidase (MnP) [5] and lignin peroxidase (LiP) [6], which demonstrates its high potential for biotechnological applications. However, little is known about the structure and function of T. villosa genes, which warrants detailed investigation.
White-rot basidiomycete fungi are the main producers of the ligninases that drive lignin decay in wood [7,8]. The ligninolytic enzymes of white-rot fungi have been studied extensively for their potential in a wide range of industrial bioprocesses, such as decolorization of industrial dyes, pulp bleaching in paper manufacturing, textile processing and the degradation of organic pollutants [9]. Furthermore, T. villosa produces LiP, MnP and Lac simultaneously [5,6,10], whereas other lignin-degrading fungi produce only one or two of these enzymes at the same time [11,12]. A species able to produce all three ligninolytic enzymes in the same batch culture is therefore highly desirable for biotechnological applications [6].
To accelerate functional genomics studies and elucidate the molecular processes of lignin decay in this species, the genome of T. villosa CCMB561 was sequenced and assembled. Paired-end sequencing on an Illumina HiSeq 2500 generated 25,034,256 reads with a mean read length of 151 bp, totalling 7.5 Gbp of data. The resulting assembly of T. villosa CCMB561 was 57.98 Mb, larger than the 33.6 Mb genome of Trametes hirsuta [13], the phylogenetically closest species with an available complete genome according to a five-marker dataset [14]. According to QUAST version 4.4 [15], the draft assembly consisted of 10,323 contigs (6161 longer than 1 kbp), with an N50 of 16.5 kb and an L50 of 691; the largest contig spanned 647,839 bp, and the GC content of the genome was 59.34% (Table 1).
A total of 16,711 genes were predicted, encoding proteins with an average length of 496 amino acids. CAZyme analysis identified 590 of these genes as encoding carbohydrate-active enzymes (CAZymes), including 237 glycoside hydrolases (GHs), 78 glycosyltransferases (GTs), 12 polysaccharide lyases (PLs), 69 carbohydrate esterases (CEs) and 112 auxiliary activities (AAs). Additionally, the genome of T. villosa CCMB561 was predicted to encode 820 proteins with oxidoreductase activity and 45 with peroxidase activity.
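N50 and L50, the contiguity statistics reported by QUAST for this assembly, follow a simple definition: sort contigs from longest to shortest and accumulate their lengths until at least half of the total assembly length is reached; the length of the contig at that point is the N50, and the number of contigs used is the L50. The Python sketch below implements that standard definition on a small hypothetical set of contig lengths. It is not QUAST's own code, and the toy lengths are invented for illustration only (the real assembly has 10,323 contigs).

```python
from typing import Sequence, Tuple


def n50_l50(contig_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths in bp.

    N50: length of the contig at which the cumulative length, summed from
    the longest contig downwards, first reaches half the total length.
    L50: number of contigs needed to reach that point.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    # The cumulative sum always reaches the total, so this is never hit.
    raise AssertionError("unreachable")


# Hypothetical toy assembly (bp); NOT data from T. villosa CCMB561.
toy = [80_000, 50_000, 30_000, 20_000, 10_000, 5_000, 5_000]
print(n50_l50(toy))  # (50000, 2)
```

Here the toy assembly totals 200,000 bp, and the two longest contigs (130,000 bp) are the first to pass the 100,000 bp halfway mark, giving N50 = 50,000 bp and L50 = 2. QUAST additionally ignores contigs below a minimum length (500 bp by default) before computing these values, a filter this sketch omits.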
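The CAZyme class counts reported for T. villosa CCMB561 can be tallied directly, as in the short Python sketch below. The five listed classes sum to 508, so 82 of the 590 CAZyme genes are not assigned to any of the classes named in the text. The CAZy database also catalogues carbohydrate-binding modules (CBMs), which could account for part of that remainder, but the text does not say so and that reading is only an assumption.

```python
# CAZyme-encoding genes per class, as reported for T. villosa CCMB561.
cazyme_classes = {
    "GH": 237,  # glycoside hydrolases
    "GT": 78,   # glycosyltransferases
    "PL": 12,   # polysaccharide lyases
    "CE": 69,   # carbohydrate esterases
    "AA": 112,  # auxiliary activities
}
total_cazymes = 590  # total CAZyme genes reported

listed = sum(cazyme_classes.values())   # genes covered by the five classes
unassigned = total_cazymes - listed     # genes not broken down in the text
print(listed, unassigned)  # 508 82
```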
Although T. villosa CCMB561 is known as an efficient wood decomposer, with a lignocellulolytic enzyme system consisting mainly of laccases, lignin peroxidases and Mn-dependent peroxidases together with a range of CAZymes [5,6,16], data on the synthesis, genetic coding and regulation of these enzymes remain limited. For basic research, the genome of T. villosa CCMB561 presented herein will expand the catalogue of candidate genes, enzymes and pathways involved in lignin degradation by white-rot fungi. In an applied context, the draft genome assembly of T. villosa CCMB561 will facilitate the development of ligninolytic enzyme production for biotechnological applications. Together, these efforts should ultimately provide useful tools for industry, especially for lignocellulosic waste management.

Genomic DNA extraction and sequencing
The mycelium of T. villosa was grown on potato dextrose agar (PDA) at room temperature for 5-7 days and, once it had covered the surface of a 9-cm-diameter Petri dish, was scraped off. Genomic DNA was extracted with a FastDNA™ Soil kit (MP Biomedicals). The quality and quantity of the genomic DNA were assessed by agarose gel electrophoresis and fluorometric analysis, respectively. A 450 bp library was prepared from the genomic DNA with the NEBNext Fast DNA Fragmentation and Library Preparation Kit (New England Biolabs, Ipswich, MA, USA) following the manufacturer's instructions. Library quality was evaluated with an Agilent 2100 Bioanalyzer. Whole-genome sequencing was performed on an Illumina HiSeq 2500.
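As a rough consistency check on the sequencing output reported for this run (25,034,256 reads with a mean length of 151 bp, 7.5 Gbp in total, for a 57.98 Mb assembly), the yield and expected mean depth can be computed as in the Python sketch below. It assumes that the 25,034,256 figure counts read pairs, each contributing two 151-bp reads; this interpretation is an assumption, suggested by the fact that 25,034,256 single reads of 151 bp would total only about 3.8 Gbp. Depth uses the usual Lander-Waterman approximation (total sequenced bases divided by genome size), with the assembly size standing in for the true genome size, which is also an assumption.

```python
def sequencing_yield(n_fragments: int, read_length: int,
                     reads_per_fragment: int = 2) -> int:
    """Total sequenced bases when each fragment is read reads_per_fragment times."""
    return n_fragments * read_length * reads_per_fragment


def mean_depth(total_bases: int, genome_size: int) -> float:
    """Expected mean coverage C = L * N / G (Lander-Waterman approximation)."""
    return total_bases / genome_size


READ_PAIRS = 25_034_256       # ASSUMED to be read pairs (see lead-in)
READ_LENGTH = 151             # bp, mean read length reported
ASSEMBLY_SIZE = 57_980_000    # bp, 57.98 Mb assembly used as genome-size proxy

single_end_bases = sequencing_yield(READ_PAIRS, READ_LENGTH, reads_per_fragment=1)
paired_bases = sequencing_yield(READ_PAIRS, READ_LENGTH)
print(f"if single reads: {single_end_bases / 1e9:.2f} Gbp")   # 3.78 Gbp
print(f"if read pairs:   {paired_bases / 1e9:.2f} Gbp")       # 7.56 Gbp
print(f"approx. depth:   {mean_depth(paired_bases, ASSEMBLY_SIZE):.0f}x")  # 130x
```

Only the read-pair interpretation reproduces the reported 7.5 Gbp, which would correspond to roughly 130-fold mean coverage of the 57.98 Mb assembly.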

Acknowledgments
We would like to thank everyone who contributed directly or indirectly to this study, especially UEFS, UFMG, UFPA, CEFET, FAPESB, CAPES and CNPq. Datasets were processed on the Sagarana HPC cluster, CPAD-ICB-UFMG.

Transparency document. Supplementary material
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.04.074.