Draft genome assembly dataset of the Basidiomycete pathogenic fungus, Ganoderma boninense

Ganoderma boninense is a soil-borne pathogenic Basidiomycete fungus and the key causal agent of basal stem rot, a devastating disease of oil palm. Because it threatens sustainable palm oil production, a fundamental understanding of this fungus is essential. However, there is an information gap because only a limited number of genome sequences are available for this pathogen, which hinders biological research into the mechanisms underlying its attack on oil palm. Here we report a draft genome dataset of G. boninense sequenced on the Illumina HiSeq 2000 platform. The raw reads were deposited in the NCBI database (SRX7136614 and SRX7136615) and can be accessed via BioProject accession number PRJNA503786.


Data description
The data consist of raw reads of the cultured G. boninense genome, sequenced using Illumina HiSeq 2000 technology [1]. The data sets are named s1_1.fastq, s1_2.fastq, s8_1.fastq and s8_2.fastq; they come from paired-end sequencing in two lanes, denoted by the s1* and s8* file names. The data reported here cover the pre-processing of raw reads, assembly statistics and a similarity search. Table 1 shows the pre-processing statistics of the genome reads, consisting of raw reads and cleaned reads, the latter being the high-quality reads. Table 2 summarizes the main assembly statistics of the assembled draft genome. Fig. 1 shows the assessment of draft genome completeness using the Benchmarking Universal Single-Copy Orthologs (BUSCO) software, with the Basidiomycota odb9 fungal dataset as a reference. Fig. 2 shows the distribution of similarity search results for the assembled draft genome against the Swiss-Prot database, divided into different levels of similarity according to E-value.
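As an illustration of how summary figures of the kind reported in an assembly statistics table can be derived, the following is a minimal Python sketch that computes contig count, total length, longest contig, mean length and N50 from a FASTA file. It is not the procedure used to produce Table 2; the choice of statistics and the function names (read_fasta_lengths, assembly_stats) are assumptions made for this example.

```python
def read_fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; store the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Return basic summary statistics for a list of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: length of the contig at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the total.
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "longest_bp": ordered[0],
        "mean_bp": total / len(ordered),
        "n50_bp": n50,
    }
```

For a real assembly, the lines would come from an open file handle, for example `assembly_stats(read_fasta_lengths(open("assembly.fasta")))`, where the file name is hypothetical.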

Genome sequencing
Genomic DNA (gDNA) was isolated from the fruiting body of G. boninense. A total of 5 µg of DNA was used to prepare a 400 bp paired-end sequencing library using an Illumina paired-end DNA sample preparation kit. The quality of the library was assessed by Q-PCR before continuing to cluster generation. Sequencing was performed on two lanes of an Illumina HiSeq 2000 paired-end flow cell using 202 cycles to produce 2 × 100 bp paired-end reads.

Specifications Table
Subject: Molecular Biology
Specific subject area: Agriculture biology and next generation sequencing (NGS) reads of genome

Value of the Data
The data reported here are important for genomics and molecular projects that aim to unravel the genetic code of G. boninense. The deposited data add to the currently limited pool of accessible G. boninense genome data (sequencing is still incomplete) and may benefit researchers in subsequent projects on G. boninense, especially genome-wide studies. The data allow further comparative analysis to identify candidate genes in G. boninense that may contribute to traits of interest.
The mapping data can be used to identify genetic variants, which may help in better understanding the biological nature of this pathogen through its genetic variability. The accessible data can be used to elucidate the mode of infection and the molecular events of G. boninense during oil palm infection.

Quality assessment and reads pre-processing
Prior to bioinformatics analysis, the quality of the raw reads was assessed using FastQC [2]. The raw reads were pre-processed using Perl scripts that trim low-quality bases and filter out short reads, yielding high-quality reads, defined here as reads with a Phred quality value of Q20 and a length greater than 30 bp [3]. The improved quality of the cleaned reads was confirmed using FastQC [2]. Table 1 shows the pre-processing statistics of the genome reads.
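The trimming and filtering step can be sketched as follows. This is a simplified Python illustration, not the Perl scripts used for the dataset: it assumes Phred+33 quality encoding, trims bases below Q20 from both ends of each read only, and keeps reads longer than 30 bp after trimming. The function names and the end-trimming strategy are assumptions made for this example.

```python
PHRED_OFFSET = 33   # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_QUALITY = 20    # Q20 threshold stated in the text
MIN_LENGTH = 30     # reads must be longer than this after trimming


def parse_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    block = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            header, seq, _plus, qual = block
            yield header[1:], seq, qual
            block = []


def trim_read(seq, qual, min_q=MIN_QUALITY, offset=PHRED_OFFSET):
    """Trim bases below min_q from both ends of a read.

    Low-quality bases in the interior of the read are left untouched.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def clean_reads(records, min_len=MIN_LENGTH):
    """Yield trimmed reads that remain longer than min_len bases."""
    for name, seq, qual in records:
        tseq, tqual = trim_read(seq, qual)
        if len(tseq) > min_len:
            yield name, tseq, tqual
```

For paired-end data, a real pipeline would also need to keep the two mates of each pair synchronised when one of them is discarded; that is omitted here for brevity.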

De novo genome draft assembly
The high-quality Illumina reads were assembled de novo using Trinity [4,5]. Assembly statistics are shown in Table 2. The completeness of the de novo assembled draft genome was evaluated using BUSCO [6] on a local workstation, with the Basidiomycota odb9 fungal dataset as the single-copy ortholog database; the result is shown in Fig. 1. The assembled sequences were searched against the Swiss-Prot database [7] using a locally installed BLASTX program [8]. The similarity search shows that about 74.31% of the assembled sequences were similar to entries in this manually curated protein database (Fig. 2).
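A summary of the similarity search by E-value level can be sketched as below. The Python example assumes the BLASTX results were written in the standard 12-column tabular format (query ID in column 1, E-value in column 11), keeps the best (lowest) E-value per query, and groups queries into E-value bins. The bin edges are hypothetical and are not the levels used in Fig. 2; the total number of assembled sequences must be supplied by the caller.

```python
from collections import Counter

# Hypothetical E-value bin edges, for illustration only.
BIN_EDGES = [1e-100, 1e-50, 1e-20, 1e-5]


def best_evalues(rows):
    """Return the lowest E-value seen for each query in tabular BLAST rows."""
    best = {}
    for line in rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, evalue = fields[0], float(fields[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def bin_label(evalue, edges=BIN_EDGES):
    """Return the label of the first bin whose upper edge covers evalue."""
    for edge in edges:
        if evalue <= edge:
            return f"<= {edge:g}"
    return f"> {edges[-1]:g}"


def summarize(rows, total_sequences):
    """Return (bin counts, percentage of sequences with at least one hit)."""
    best = best_evalues(rows)
    counts = Counter(bin_label(e) for e in best.values())
    percent_with_hit = 100.0 * len(best) / total_sequences
    return counts, percent_with_hit
```

Note that the percentage returned here counts every query with at least one reported hit; whether that matches the criterion behind the 74.31% figure depends on the E-value cutoff applied during the search, which is not stated in the text.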