Deep Insight into the Ganoderma lucidum by Comprehensive Analysis of Its Transcriptome

Background Ganoderma lucidum is a basidiomycete white rot fungus and is of medicinal importance in China, Japan and other countries in the Asiatic region. To date, much research has been performed in identifying the medicinal ingredients in Ganoderma lucidum. Despite its important therapeutic effects in disease, little is known about Ganoderma lucidum at the genomic level. In order to gain a molecular understanding of this fungus, we utilized Illumina high-throughput technology to sequence and analyze the transcriptome of Ganoderma lucidum. Methodology/Principal Findings We obtained 6,439,690 and 6,416,670 high-quality reads from the mycelium and fruiting body of Ganoderma lucidum, and these were assembled to form 18,892 and 27,408 unigenes, respectively. A similarity search was performed against the NCBI non-redundant nucleotide database and a customized database composed of five fungal genomes. 11,098 and 8, 775 unigenes were matched to the NCBI non-redundant nucleotide database and our customized database, respectively. All unigenes were subjected to annotation by Gene Ontology, Eukaryotic Orthologous Group terms and Kyoto Encyclopedia of Genes and Genomes. Differentially expressed genes from the Ganoderma lucidum mycelium and fruiting body stage were analyzed, resulting in the identification of 13 unigenes which are involved in the terpenoid backbone biosynthesis pathway. Quantitative real-time PCR was used to confirm the expression levels of these unigenes. Ganoderma lucidum was also studied for wood degrading activity and a total of 22 putative FOLymes (fungal oxidative lignin enzymes) and 120 CAZymes (carbohydrate-active enzymes) were predicted from our Ganoderma lucidum transcriptome. Conclusions Our study provides comprehensive gene expression information on Ganoderma lucidum at the transcriptional level, which will form the foundation for functional genomics studies in this fungus. The use of Illumina sequencing technology has made de novo transcriptome assembly and gene expression analysis possible in species that lack full genome information.


Introduction
Ganoderma lucidum is a basidiomycete white rot fungi and has been used as traditional herbal medicine in Asia for thousands of years [1]. It has an abundance of bioactive components, which have numerous positive effects in diseases with little side effects [2][3][4][5]. Furthermore, this fungus has been used as a potent resource for lignocellulose degrading enzymes for many years [6][7][8].
In recent years, next-generation sequencing techniques (such as Roche 454, Illumina Solexa GA, and SOLiD) has helped to improve the efficiency of discovering novel genes and has provided a platform for further understanding differential gene expression at the genomic and transcriptional level [30][31][32][33]. While much attention has been given to human [34] and the model organisms [35][36][37][38], genome research of other non-model organisms such as macroscopic fungi remain in its infancy.
Although G. lucidum has been extensively studied from a pharmacological perspective, genomic resource, molecular information on developmental processes and the content of lignocellulose degrading enzymes for G. lucidum are scarce. At present, only 1,046 expressed sequence tags (ESTs) of G. lucidum are obtainable from the NCBI public database [57]. While these sequences are a valuable resource for analysis in G. lucidum, the current genetic data is insufficient for elucidating the molecular mechanisms of G. lucidum from a pharmacological and developmental perspective.
In this study, we determined the transcriptomes of two developmental stages (mycelium and fruiting body) from G. lucidum using Illumina sequencing technology, for the purpose of further understanding the differential expression of genes relating to important (pharmaceutical) metabolic pathways and lignocelluloses degradation. In the current work, we obtained over 12 million high-quality 90 bp DNA fragments and demonstrated the efficiency and practicability of short-read sequencing for de novo assembly and annotation of genes expressed in a eukaryote without prior genome information. We identified 28,210 unigenes (18,892 unigenes in mycelium and 27,408 in fruiting body), and analyzed the gene expression profiles of G. lucidum at two different developmental stages by using a digital gene expression system. By transcriptome analysis, our results provide an important resource for identifying the effective pharmaceutical ingredients and new lignocelluloses degrading enzymes in G. lucidum and for analyzing the developmental process of this fungus.

Illumina Sequencing and De novo Assembly
To obtain an overview of the G. lucidum gene expression profiles during different developmental stages, cDNA samples were prepared from G. lucidum mycelia and fruiting bodies and sequenced using the Illumina HiSeq TM 2000 sequencing platform. After filtering for adaptor sequences, duplication sequences, ambiguous reads and low-quality reads, a total of over 12.8 million clean reads 90 bp long were obtained in a single sequencing run, of which 6,439,690 reads belonged to mycelium and 6,416,670 reads to the fruiting body (Table 1). Then, these high quality reads were assembled to produce 18,892 mycelium unigenes and 27,408 fruiting body unigenes using SOAP de novo and TGICL software. The mean sizes of mycelium and fruiting body unigenes were 498 bp and 514 bp, respectively, with lengths ranging from 100 bp to over 3,000 bp.
Unigenes from each sample were assembled together by sequence clustering software TGICL resulting in 28,210 allunigenes. The length of the majority of unigenes was 100-500 bp, with 12,841 from mycelium, 18,400 from the fruiting body and 17,558 from the all-unigenes set ( Figure 1A).
To evaluate the quality of the dataset, the ratio of the gap length of assembled unigenes was assessed ( Figure 1B). The majority of the unigenes showed gap lengths less than 5% of the total length, which accounted for 99.8% of mycelium unigenes, 94.1% of fruiting body unigenes and 94.8% of all-unigenes. Few unigenes had a 30% or greater gap, indicating that the assembled data was of high quality.

Functional Annotation of G. lucidum Transcriptome
An outline flow of the transcriptome analysis procedure is shown in Figure 2. For annotation, all unigenes were aligned against sequences from the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein database by using the BLASTx algorithm with an E-value threshold of 10 25 . 11,098 out of 28,210 unigenes (39.3%) were found to match to known proteins ( Figure 2). Most sequences in the nr database were from non-fungal organisms and other fungal species other than Basidiomycota. G. lucidum is member of Agaricomycotina in Basidiomycota. While 39.3% of all-unigenes matched to the nr database, only 3,770 of 11,098 unigenes received GO annotation.
To analyze the G. lucidum transcriptome in further detail, a customized fungal database (LCSPP database) consisting of genome sequences of Laccaria bicolor, Coprinopsis cinerea, Schizophyllum  Figure 2). The proportion distribution of sequences matching to different fungi species is shown in Figure S1. The G. lucidum transcriptome produced a strong match against the Schizophyllum commune  genome. To further understand the G. lucidum transcriptome, genomic information (13,181 genes) of S. commune [50] was downloaded from the JGI website and subjected to the same functional annotation as G. lucidum.
2) KEGG annotation. KEGG metabolic pathways analysis was performed by initially aligning unigenes with sequences from the LCSPP database and automatically assigning gene functions to the corresponding KEGG terms. 1,396 annotated unigenes from the G. lucidum transcriptome and 1,588 genes of S. commune were grouped into 16 known metabolic or signaling pathway classes (Table S2). As illustrated in Figure

3) COG annotation and protein domains in G. lucidum
transcriptome. To further evaluate the effectiveness of our annotation process and the completeness of our transcriptome, we searched the annotated sequences for the genes involved in COG classifications. A total of 5,217 unigenes were assigned to one or more COG functional categories using BLASTx with an E-value threshold of 10 210 . 7,160 genes of S. commune got COG annotation. Out of the 25 COG categories ( Figure S2), ''General function prediction only'' was the most populated group (15.99%) followed by ''Signal transduction mechanisms'' (11.27%) and ''Posttranslational modification, protein turnover, chaperones'' (10.83%). The least populated groups were: ''Cell motility'' (0.19%), ''Cell wall/membrane/envelope biogenesis'' (1.32%) and ''Extracellular structures'' (1.4%).

Analysis of Differentially Expressed Genes
To identify differentially expressed genes (DEGs) between the G. lucidum mycelium and fruiting body ( Figure 4A), the expression level for each unigene was calculated using the RPKM method [59] ( Figure 4B). Out of 28,210 unigenes, 8,330 unigenes were differentially expressed between G. lucidum mycelium and fruiting body, of which 913 genes were specifically expressed in fruiting body and 446 in mycelium. 3,938 and 3,033 unigenes showed upregulated expression in fruiting body and mycelium, respectively ( Figure 4C, Table S4). 19,880 unigenes did not show significant expression changes between the two development stages.
1) KEGG annotation analysis of DEGs. To study the function of DEGs, functional annotation was adopted for these identified genes. The metabolic and regulatory pathways relating to these DEGs were analyzed by using the log (base 10) value of the RPKM values ( Figure 5). Metabolic pathways relating to biosynthesis were over-expressed at the transcriptional level in the mycelium stage whereas pathways mainly involved in degradation were more active in the fruiting body. For example, ''ATP synthesis'', ''Citrate cycle (TCA cycle)'', ''RNA polymerase'' and ''N-Glycans biosynthesis'' showed high transcriptional activity at the mycelium stage, whereas ''Starch and sucrose metabolism'', ''Fatty acid metabolism'', ''Glycolysis/Gluconeogenesis'' and ''N-Glycan degradation'' were more active during the fruiting body stage. These expression differences are indicative of their importance at their respective developmental stages. Furthermore, the pathways of ''Purine metabolism'', ''Starch and sucrose metabolism'', ''Glycolysis/Gluconeogenesis'', ''Pyruvate metabolism'', were highly expressed in both developmental stages, demonstrating the importance of these pathways during G. lucidum development.
Among the DEGs, the general annotation level of biosynthesis metabolism and degradation metabolism in both KEGG annotation and GO annotation were almost in accordance with each other in some aspects. For example, in KEGG annotation of DEGs, ''ATP synthesis'' and ''TCA cycle'' were up-expressed in mycelium whereas ''starch and sucrose metabolism'' and ''Nglycan degradation'' were up-regulated in fruiting body. Correspondingly, in GO annotation of DEGs, ''biosynthesis'' and ''protein metabolism'' were over-represented in mycelium whereas ''carbohydrate metabolism'' and ''hydrolase'' were over-represented in fruiting body. This indicated the completeness of the transcriptome and the accuracy of our analysis.
3) The Ganoderic acids biosynthesis pathway. Ganoderic acids (GAs) are important pharmaceutical components in G. lucidum, which have been shown to have anti-tumor activity, roles in immunomodulation, hypoglycemic treatment and hepatoprotection [60][61][62]. To deeply understand GAs, the biosynthesis pathway of this ingredient was analyzed. From Figure 5, the metabolic pathway ''Terpenoid backbone biosynthesis'' in GA biosynthesis showed increased expression in mycelium at the overall transcriptional level and the metabolic processes of this pathway was elaborated by using the KEGG tool ( Figure 7).
Two pathways, 2-methyl-D-erythritol-4-phosphate/1-deoxyxylulose-5-phosphate (MEP/DOXP) pathway and mevalonate pathway (MVP), are involved in the GA biosynthesis. The mevalonate pathway is the main GA biosynthesis pathway. The mevalonate pathway can be divided into four main processes (Figure 7). The first process involves the conversion of acetylcoenzyme A to isopentenyl pyrophosphate (IPP) in which the 3hydroxy-3-methyglutaryl coenzyme A reductase (HMGR) catalyzes the synthesis of mevalonate. Second, the action of various prenyltransferases generate high order terpenoid building blocks (geranyl pyrophosphate (GPP) and farnesyl pyrophosphate (FPP)) from the IPP precursor, which is also involved in the conversion of GPP to FPP by farnesyl diphosphate synthase (FPPs). Then these intermediates are converted into lanosterol by squalene synthase (SQS) and lanosterol synthase (LS). Finally, the lanosterol is metabolized into triterpenoid [63,64].
The expression levels of these unigenes were confirmed by quantitative PCR (Figure 8). Our qPCR results are in accordance with the transcriptome data with the exception of Unigene6227_All (Table 2). Unigene6227_All was up-regulated in fruiting body while Unigene2082_All and Unigene4718_All were only detected in mycelium. Previous work had demonstrated that HMGR (Uni-gene6227_All), FPPs (Unigene803_All), SQS (Unigene111_All) and LS (Unigene4718_All) were highly expressed during the primordia stage [65][66][67][68], in contrast, our transcriptome data and qPCR results showed that Unigene4718_All was up expressed in mycelia. Previous reports have confirmed that the environmental factors can influence the ganoderic acid production by changes in oxygen [69], light [70], PH [71] and heavy metal ion [72]. So, it is comprehensible for the little discrepancy between our results and previous works.
The GAs biosynthesis pathway is complex and such environmental stimuli may affect the expressional level of some genes in this pathway. Further research is needed to further understand this complex metabolic pathway.

Putative Wood-degrading Genes in Ganoderma Lucidum
The basidiomycetes: Ganoderma lucidum, Laccaria bicolor, Coprinopsis cinerea, Schizophyllum commune, Phanerochaete chrysosporium and Postia placenta all belong to the subphyla Agaricomycotina with the top five being white-rot fungi and the last one being brown-rot fungus [48]. Most white-rot fungi belong to the basidiomycete and can degrade components of plant cell walls such as cellulose, hemicellulose, and lignin [75,76]. Lignin-degrading enzymes are comprised of lignin oxidases (LO families) and lignin-degrading auxiliary enzymes (LDA families) [8].
To identify related genes from G. lucidum transcriptome, BLASTx was performed against the fungal oxidative lignin enzymes (FOLymes) database (http://foly.esil.univ-mrs.fr/). G. lucidum has 13 and 9 potential members in LO family and LDA family, respectively ( Table 4). The 13 putative genes of LO      Table 2, which are involved in the GA biosynthesis pathway.
Unigene2082_All and Unigene4718_All are involved in the GA pathway but were not detected in the fruiting body by qPCR. The last 4 unigenes were from Table 3  transferases (GT), and Polysaccharide lyases (PL), which are commonly referred to as carbohydrate-active enzymes (CAZymes, http://www.cazy.org/) [78,79]. Our BLASTx results showed that 120 unigenes from G. lucidum transcriptome had high identity to CAZymes. Out of these unigenes, 78 homologs belonged to the GH superfamily, 40 candidates to glycosyl transferases and 2 candidates to carbohydrate esterases. In the GH superfamily, GH5 possessed 12 unigenes and was the most dominant, followed by GH31 (11 unigenes), GH3 (7 unigenes) and GH16 (7 unigenes). Information of unigene distribution in the GH superfamily is listed in Table S5. The differential expression of these putative lignocellulose degrading genes between G. lucidum mycelium and fruiting body is elaborated in Table 4. In summary, the expression levels of FOLymes between the two developmental stages did not reveal any significant difference, whereas CAZymes, especially the GH superfamily, were found to be generally up-regulated in the fruiting body.

Conclusions
G. lucidum has important herbal medicinal value in Asia. While the function of many its pharmaceutical components (e.g. ganoderic acids, polysaccharides, LZ-8, etc.) has been extensively studied [10,25,61], little genomic information remains available on this fungus.
In the current work, a transcriptome database of G. lucidum was constructed. From the two different developmental stages (mycelium and fruiting body), we identified differentially expressed genes by transcriptome analysis and found that numerous metabolic pathways relating to biosynthesis were significantly over-expressed at the transcriptional level during the mycelium stage, especially in the biosynthesis of important pharmaceutical components in G. lucidum such as ganoderic acids. During the fruiting body stage, unigenes involved in degradation activity were highly expressed such as the glycan degradation. The biosynthesis pathway of GA is complex and further analytical work is required. While G. lucidum is an important medicinal fungus in Asia, little developmental information is available at present, our transcriptome provides a resource for further extensive studies on developmental processes in this important fungi.
Here, we show that transcriptome sequencing is an effective technology for analyzing complex functions of fungal genomes. To our knowledge, this is the first publication of the G. lucidum transcriptome using Illumina sequencing technology.

Sample Preparation
Mycelia were cultured from a central Ganoderma lucidum source (mycelia discs, r = 3 mm) on a potato dextrose agar (PDA) medium at 20uC in the dark with 90% air humidity. After 16 days, the  Table 4. BLASTx results of the G. lucidum transcriptome against FOLymes and CAZymes database. mycelia totally covered the petri dish and 1.5 g of mycelia were collected [12]. The mycelia were then transferred to basswood medium and stored in light at 28uC, 90% air humidity until fruiting-bodies were formed [57]. After nine weeks, 1.5 g of fruiting bodies were collected. Samples from the two developmental stages were frozen at 280uC until RNA extraction.

cDNA Library Construction and Sequencing
Total RNA from Ganoderma lucidum mycelium and fruiting body was extracted using TriZol reagent (Promega) according to the manufacturer's protocol. RNA quantity and quality were checked by a NanoDrop 1000 spectrophotometer (NanoDrop Technologies) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA) and the two samples had RNA Integrity Number (RIN) values greater than 8.5. The mRNA was concentrated from equal amounts of total RNA (20 mg per sample) using oligo (dT) magnetic beads. All mRNA was broken into short fragments (200-700 nt) by adding fragmentation buffer. These short sequences were used for the first strand cDNA synthesis using reverse transcriptase and random primers followed by the second strand cDNA synthesis with DNA polymerase I and RNaseH.
The cDNA fragments were purified using a QiaQuick PCR extraction kit. These purified fragments then were washed with EB buffer for end reparation poly (A) addition and ligated to sequencing adapters. Following agarose gel electrophoresis and extraction of cDNA from gels, the cDNA fragments with the lengths of 200 bp (625 bp) were purified and enriched by PCR to construct the final cDNA library.
Two cDNA libraries (G. lucidum mycelium and fruiting body) were sequenced at Beijing Genome Institute (BGI, Shenzhen, China) on the Illumina sequencing platform (Illumina HiSeq TM 2000) using paired-end technology in a single run. The original images process to sequences, base-calling and quality value calculation were performed by the Illumina GA Pipeline (version 1.6), in which 90 bp paired-end reads were obtained.

De novo Assembly and Analysis of Transcriptome
Before assembly, the 90-bp raw reads were filtered to obtain the high-quality clean reads by removing adaptor sequences, the reads containing more than 10% ''N'' rate (the ''N'' character representing ambiguous bases in reads) and low quality sequences (reads in which more than 50% of the bases had a quality value #10).
De novo assembly of the clean reads was performed using SOAPdenovo program (http://soap.genomics.org.cn) which implements a de Bruijn graph algorithm [80]. Briefly, the overlap information from the clean reads was used to construct highcoverage contigs without N. Then the reads were realigned onto contigs. To connect the contigs, N was applied to represent unknown sequences between each pair of contigs, and a scaffold was built. Paired-end reads were used again for gap filling between scaffolds to obtain unigenes which have the least 'N's and cannot be extended at either end. Assembled unigenes from each sample were further processed by using sequence clustering software, TGI Clustering tools and Phrap.
The BLASTx algorithm was used to search for homologous sequences against the NCBI nr database and LCSPP database using an E-value cutoff of 10 25 . The LCSPP database contained some fungal genomes which were released by the U.S. Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/). Functional annotation of these unigenes was determined by using the BLASTx program against the LCSPP database, followed by additional manipulation with perl scripts.

Analysis of Differentially Expressed Genes (DEGs)
To identify the differentially expressed genes between the two developmental stages, the false discovery rate (FDR) method was used to determine the threshold of P-value in multiple tests [81,82]. A FDR #0.001 and an absolute value of the log2 ratio $1 were used as threshold to determine significant differences in gene expression between the two developmental stages.
Functional annotation was applied to obtain enriched genetic annotation from mycelium and fruiting body. The p-values for significant overpresentation of a particular GO category were calculated by using the hypergeometric distribution.

Quantitative Real time PCR
Total RNA was extracted from mycelium and fruiting body of G. lucidum using TRIzolH by following the manufacturer's protocol. Total RNA was resuspended in 100 ml of nuclease-free water and the concentration was measured (1.5 mg/ml, respectively) by using a NanoVue (GE Healthcare NanoVue). About 2 mg of total RNA was used as template to synthesize first-strand cDNA by using a M-MLV Reverse Transcriptase kit (Promega). The resultant cDNA was stored at 220uC until further use.
Unigenes of interest were subjected to quantitative PCR (qPCR) analysis. Primers were designed on the basis of matched parts against known genes in the KEGG database with our unigenes by using Primer 3. Detailed information of these primers is shown in Table S6. Amplifications were performed using 0.5 ml (10 mM) of the specific primers, 10 ml of SYBRH qPCR Mix (TOYOBO) and 40 ng of cDNA in a final volume of 20 ml. The cycling parameters were 95uC for 4 min followed by 45 cycles of 95uC for 10 s, 57uC for 15 s and 72uC for 20 s. 3 replicates were performed for each gene tested in real time PCR reactions. The G. lucidum glyceraldehyde-3-phosphate dehydrogenase (Gl-GPD) gene (Uni-gene27520_All, 100% similarity) was used as the internal reference gene [66,83]. Relative gene expression was analyzed by the 2 2DDCT method.

CAZy and FOLy Annotation
Unigenes from Ganoderma lucidum transcriptome that are related to carbohydrate-related enzymes and ligin oxidative enzymes were confirmed by BLASTx searches against CAZy and FOLy database using a threshold of 1.0E 220 . Unigenes possessing a sequence identity greater than 60% with these biochemically characterized enzymes were considered to be a CAZyme or FOLyme.

Data Deposition
The raw Illumina sequencing dataset of Ganoderma lucidum was submitted to the NCBI Sequence Read Archive under the accession number of SRA036392. Figure S1 BLASTx result distribution of G. lucidum transcriptome against the LCSPP database. All unigenes from the G. lucidum transcriptome were aligned against the LCSPP database using BLASTx with a cutoff E-value of 1.0E 25 . The percentages represent the ratio between the BLASTx result against one of the five fungal genomes and all unigenes. 22.08% of 28,210 unigenes showed high identity to Postia placenta, followed by Phanerochaete chrysosporium (21.36%), Schizophyllum commune (16.73%), Coprinopsis cinerea (11.87%) and Laccaria bicolor (10.25%). (TIF)