Characterization of Dendrolimus houi Lajonquiere (Lepidoptera: Lasiocampidae) Transcriptome across All Life Stages

Dendrolimus houi Lajonquiere is a phytophagous caterpillar infesting many economically important coniferous tree species in China, causing serious economic and ecological environment losses. Based on previous research, it has one generation per year in South China and East China in contrast to two generations per year in Yunnan province in southwestern China. The species is potentially resilient to climatic extremes in these regions with the eggs and 1st instar larvae surviving in the winter (5 °C), older instar larvae and pupae surviving high temperatures in the summer (35 °C), suggesting some temperature stress tolerance during different developmental stages. However, little is known in this species at the genetic and genomic level. In this study, we used high throughput sequencing to obtain transcriptome data from different developmental stages (eggs, 1st–3rd instar larvae, 4th–5th instar larvae, 6th–7th instar larvae, pupae, male and female adults), which were collected from Fujian province. In total, we obtained approximately 90 Gb of data, from which 33,720 unigenes were assembled and 17,797 unigenes were annotated. We furtherly analyzed the differentially expressed genes (DGEs) across all stages, the largest number between the eggs and 1st instar larvae stage and gene expression varied significantly in different developmental stages. Furthermore, 4138 SSR genes and 114,977 SNP loci were screened from transcriptome data. This paper will be a foundation for further study towards improved integrated pest management strategies for this species.


Introduction
The pine moth Dendrolimus houi Lajonquiere (Lepidoptera: Lasiocampidae) is a widely distributed and adaptable phytophagous pest, which seriously infests leaves of Cryptomeria fortunei, Pinus yunnanensis, Pinus massoniana, Cupressus funebris, Platycladus orientalis during its larval stage, causing thousands of hectares of dead and dying coniferous trees ( Figure 1) in South China. Previous studies mostly focused on distribution, host range, biology, natural enemies and pest management approaches [1][2][3][4]. This pest develops on conifers mostly at high elevations and can have one or two generations per year depending on the local climate. Interestingly, it has a much longer development time, particularly in the larval stages, than other species of Dendrolimus in China, including D. punctatus Walker [5], D. kikuchii Matsumura [6], D. superans Butler [7], D. spectabilis Butler [8] and others. Over the approximately 150 d larval development time, the body length of larva grows from 7 mm (the 1st instar larva) to 116 mm (the final instar larva, 7st instar) and during this time there are significant changes in the caterpillar, including developing toxic setae. There is a tendency to overwinter as eggs or the 1st instar larvae. These stages have strong tolerance to low (above freezing) temperatures (under 5 • C) in the winter, while older larvae to final instar larvae and pupae have tolerance to high temperatures (above 35 • C) in the warmest summer months. Different stages of this pest have some special biological adaptations to ecological factors. Consequently, we inferred that a series of physiological changes and adaptations have taken place during the developmental process. However, little is known about development and regulatory mechanisms of D. houi at the molecular level. Transcriptomics and de novo transcriptome reconstruction is a robust method to characterize these mechanisms across developmental stages of D. houi. Technically, the high-throughput next generation sequencing technology (NGS) has greatly promoted the application of insect transcriptomics [9][10][11][12][13][14], mostly focusing on their growth and development, classification, toxicology, interaction between insects and host plants and even non-coding RNA. For instance, transcriptomic data of Adelphocoris suturalis and D. punctatus across different stages was obtained using NGS techniques to define the gene expression related to the development of insects [15][16][17] and that of Spodoptera frugiperda was sequenced to reveal the mechanism of antivirus resistance [18]. In this study, four stages across all life cycle of D. houi were sampled to obtain transcriptome data and understand the gene expression associated with development, which will be very helpful to potentially reveal the gene function related to regulatory mechanism of development, phylogeny and evolution and the interaction between insects and other organisms.

RNA Isolation and Illumina Sequencing
Total RNA was extracted by TRIzoI Reagent (Invitrogen, Carlsbad, CA, USA), then the total RNA degradation and contamination was monitored on 1% agarose gels. The purity of RNA was determined by Nanodrop spectrophotometer (IMPLEN, Westlake Village, CA, USA), RNA concentration was measured using Qubit ® RNA Assay Kit in Qubit ® 2.0 Flurometer (Life Technologies, Carlsbad, CA, USA) and the integrity was accurately determined by Agilent 2100 (Agilent Technologies, Santa Clara, CA, USA). Sequencing libraries were generated using NEBNext ® Ultra TM RNA Library Prep Kit for Illumina ® (NEB, Ipswich, MA, USA), following the manufacturer's recommendations and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. First strand cDNA was synthesized using random hexamer primer and reverse transcriptase (Invitrogen, Carlsbad, CA, USA). Second strand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. Then the high-throughput RNA-sequencing libraries were prepared, following Illumina's protocols and were sequenced on the Illumina Hiseq 2000 platform (Illumina, San Diego, CA, USA).

De Novo Transcriptome Assembly and Annotation
In order to obtain high quality clean reads, the raw reads were filtered to remove adaptor fragments, reads containing unknown nucleotide "N" over 5% and empty tags. Trinity v2.5.1 (Broad Institute, Cambridge, MA, USA) with min-kmer-cov set to 2 by default and all other parameters set default [19] was used to assemble all reads from all stages for individual samples and the TGICL v2.1 clustering tool [20] was utilized to assemble all the unigenes with default parameters. Then, the unigenes were aligned with the Nr (non-redundant protein sequence), Swiss-Prot (a manually annotated, non-redundant protein sequence), COG (clusters of orthologous groups of proteins), KOG (clusters of protein homology), eggNOG 4.5 (orthologous groups of genes) and KEGG (kyoto encyclopedia of genes and genomes) databases by BLAST v2.2.31 with a cut-off e-valve of 10 −5 [21] and GO (gene ontology) annotation was performed using Blast2GO v2.5 [22]. TransDecoder 5.0.0 (Broad Institute, Cambridge, MA, USA) was used to predict the amino acid sequence of unigene and the HMMER v3.1b2 software was used to search the Pfam database to get annotation information of the unigenes.

Sequencing and Analysis of Differential Gene Expression Profile
Based on FPKM (fragments per kilobase of transcript per million mapped reads) value, the levels of unigene expression were calculated and normalized. Differential expression analysis of sample groups was performed using the DESeq R package (1.10.1). DESeq provides statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution, and the generally accepted Benjamini-Hochberg method [23] was adopted to correct the p-value of the original hypothesis tests and finally adopted it (False Discovery Rate, FDR). It was the key indicator of differential gene expression screening to reduce false positives resulting from independent statistical hypothesis testing of the expression values of a large number of genes [24]. We use "FDR ≤ 0.001 and the absolute value of Log2 ratio ≥ 1" as the threshold to judge the significance of the gene expression difference. Then, the specificity of gene expression variation across eggs-L1-3 instar larvae, L1-3-L4-5 instar larvae, L4-5-L6-7 instar larvae, L6-7 instar larvae-pupae and pupae-adults was analyzed respectively. The change and enrichment of differentially expressed genes at five different stages by functional category was performed using GO and KEGG pathways enrichment analysis. GO enrichment analysis, which were calculated the gene numbers for each term, determining the GO terms that were significantly enriched with DEGs by using a hypergeometric test [25]. The calculated p-value was subjected to correct, using the corrected p-value (q-value) ≤ 0.05 as a threshold. And KEGG pathway enrichment analyses with a q-value ≤ 0.05 were considered significantly enriched.

Microsatellite Identification
All assembled unigenes more than 1 kb in length were used to identify simple repeats using MISA. The reporting criteria are as follows: mono-nucleotide repeat threshold is ≥ 10, di-nucleotide repeat threshold is ≥ 6, trinucleotide to hexa-nucleotide repeat threshold is ≥ 5.

Single Nucleotide Polymorphism
The sequence alignment software STAR 2.6.0b was used to align the raw RNA-seq reads to the unigene sequence for each sample and GATK v3.2.2 (Broad Institute, Cambridge, MA, USA) was used following its best practices to identify single nucleotides polymorphism (SNP) sites.

Sequencing and de Novo Transcriptome Assembly
In total, after filtering, 138,340,761 clean reads were obtained across the different development stage samples of D. houi with generally uniform coverage across samples (min of 20.7 M reads, max of 26.3 M reads, Table 1). The clean reads sequences were assembled by using Trinity (default parameters). A total of 65,149 contigs and 33,720 transcripts unigenes with a length ranging from 300 to 3000 bp were obtained. Among them, 22,012 unigenes were > 500 bp and 13,999 were > 1000 bp. The length of unigenes between 300 and 500 bp accounted for 34.71% (Table 2).

Predictive Protein Annotation
A total of 17,797 unigenes with annotated information were obtained (Table 3), accounting for 52.78% of the total sequences, suggesting that some transcripts may be unique to this species or may be significantly variable from homologous transcripts in other species. The unigenes annotated in the Nr database, 34.85% having significant homology (< 1e −5 -1e −50 ) (Figure 2A). The species similarity distributions showed that the majority of the matches were concentrated between 60% and 80% similarity ( Figure 2B) and 4898 sequences had sequence similarity more than 80%. Matches to the Nr database showed the highest similarity to Spodoptera litura, accounting for 12.20%, followed by Helicoverpa armigera, Bombyx mori, Heliothis virescens, Nephila clavipes, Amyelois transitella, Papilio xuthus, Papilio machaon, Bicyclus anynana and others in decreasing order ( Figure 2C).

Comparison and Analysis of Transcriptome at Different Developmental Stages of D. Houi
According to the Venn diagram, there were 338 unique genes at eggs stage, 230 unique genes from L1-3 instar larvae, 1261 unique genes from L4-5 instar larvae, 1293 unique genes from L6-7 instar larvae, 496 unique genes from pupae stage and 1052 unique genes at adults stage and 12,953 genes expressed across all stages, accounting for 45.1% of the total ( Figure 5A). FPKM method was used to evaluate the gene expression level, five groups (eggs-L1-3 instar larvae, L1-3-L4-5 instar larvae, L4-5-L6-7 instar larvae, L6-7 instar larvae-pupae and pupae-adults) were compared and the genes with significant difference in expression were identified (FDR ≤ 0.001, Log2 ratio ≥ 1). In the heatmap, the gene expression profiles of different stages were significantly different and each stage was composed of a major cluster of highly expressed genes ( Figure 5B). We found some embryogenesis genes such as vasa, mago nashi, easter, cactus, dorsal, VgR, maternal effect genes (bicoid, nanos), gap genes (hunchback, kruppel) and segment polarity genes (ftz) that were highly expressed during the egg stage. Epidermal protein and cuticle protein genes had high expression in larva and pupa stages. Chorion protein and chitin binding protein genes were identified in adult stage. During different stages from eggs to L1-3 instar larvae, the expression profiles of 11,662 genes were changed, among which 6669 genes were up-regulated and 4993 genes were down-regulated. From L1-3 to L4-5 instar larvae, 1804 genes were up-regulated and 766 genes were down-regulated. There were 1345 and 809 up-down regulated genes respectively from L4-5 to L6-7 instar larvae ( Figure 5C). The most up-regulated genes from eggs to L1-3 instar larvae were glycerol kinase, fatty acid-binding protein, trehalose transporter, sorbitol dehydrogenase, tyrosine-protein phosphatase 10D, cytochrome P450s, alpha-esterase 40 precursor, cuticle protein and NADH dehydrogenase subunit 5 and the most down-regulated genes were vam6/Vps39, metaxin-1 isoform X2, kinesin-like protein KIF23. Genes of Glycoside hydrolase, lipA and epidermal retinol dehydrogenase were up-regulated and GMP reductase, fatty acid-binding protein were down-regulated from L1-3 to L4-5 instar larvae. Comparison between L4-5 and L6-7 instar larval stage identified the genes of heat shock protein, juvenile hormone, epidermis protein (cuticular protein RR-2 motif 63, larval cuticle protein LCP-22-like, cuticular protein hypothetical 8 precursor), metabolism genes (cytochrome P450s, carboxylesterases and glutathione S-transferases) antimicrobial peptide, endonuclease, collagenase were up-regulated. The most down-regulated genes were histone genes (histone H3.2, histone H1, late histone H2B.L4-like), vitellogenin receptor, protein synthesis protein, lipid metabolism and transport protein, yolk protein receptor, DNA synthesizer. According to GO enrichment analysis (Figure ??), highly expressed genes were associated with metabolic process and catalytic activity (Figure ??A). In addition, KEGG enrichment analysis showed that during eggs to L1-3 instar larvae pathways related to lysosome, phagosome, fatty acid metabolism, fructose and mannose metabolism, proteasome, galactose metabolism, insect hormone biosynthesis, pentose and glucuronate interconversions, extracellular matrix (ECM)-receptor interaction were active ( Figure 7A1). However, the up-regulated pathways have changed at the stages of L1-3-L4-5 and L4-5-L6-7 instar larvae (oxidative phosphorylation and ribosome, oxidative phosphorylation, respectively) ( Figure 7B1,C1) and the down-regulated pathways also changed ( Figure 7B2,C2). When the L6-7 instar larval stage transitioned to the pupal stage, there were 7941 genes expressed differently, including 2520 genes up-regulated and 5421 genes down-regulated ( Figure 5C). The most up-regulated genes were alpha-tocopherol transfer protein, inhibitor of growth protein, fatty acid synthase, heat shock protein, lysozyme, storage protein and insulin, while the most down-regulated genes were larval cutin, ion channel, glycerol kinase, cellulase, sensory protein, odorant binding protein, chitin metabolism genes (cytochrome P450) and so on. In GO enrichment analysis (Figure ??D), it mainly focused on biological process, including metabolic process, cellular process and single-organism process. Then, it was catalytic activity, binding and structural molecule activity (molecular function), cell part, the composition of the whole membrane (cellular composition). The same as up-regulation process, the biological process was also dominant at down-regulation process. A little number of pathways were up-regulated, including endocytosis, apoptosis-fly and soluble NSF attachment protein receptor (SNARE) interactions in vesicular transport and two pathways were significantly down-regulated, including oxidative phosphorylation and ribosome ( Figure 8A1,A2).
Finally, there were 6508 genes expressed differently, including 3152 genes up-regulated and 3356 genes down-regulated ( Figure 5C) from pupa to adult stage. The most up-regulated genes were, hydrolase, DNA synthesis, TFEB, structural protein, G1/S-specific cyclin-E, odorant-binding protein, neuropeptide receptor, chemosensory protein, multiple epidermal growth factor, transporter and so on, while the most down-regulated genes were toll-like receptor, hemolymph proteinase, juvenile hormone esterase, carboxypeptidase, lysozyme, lipase and so on. Based on GO enrichment analysis (Figure ??E), it was mainly concentrated in biological processes, most of them were related to cell proliferation, such as reproduction, metabolic process, cellular process, reproductive process, biological adhesion, multicellular organismal process, developmental process and so forth. These results indicated that the main gene expression during adult stage was in the process of cell proliferation, which it could be inferred that the survival mode of D. houi during adult stage was mainly for the purpose of reproduction. However, in the process of down-regulation, molecular function was dominant, including catalytic activity, binding, structural molecule activity, transporter activity and so on. In KEGG enrichment analysis ( Figure 8B1,B2), DNA replication, nucleotide excision repair and aminoacyl-tRNA biosynthesis were the most up-regulated, only pathway of phagosome was the most down-regulated. (A1-C1) means the up-regulated genes of KEGG pathway from eggs to L1-3 instar larvae stage, L1-3 to L4-5 instar larvae stage, L4-5 to L6-7 instar larvae stage, respectively; (A2-C2) means the down-regulated genes of KEGG pathway from eggs to L1-3 instar larvae stage, L1-3 to L4-5 instar larvae stage, L4-5 to L6-7 instar larvae stage, respectively. Each circle in the graph means a KEGG pathway, x-axis is the enrichment factor and y-axis means the name of the pathway. (A1,B1) means the up-regulated genes of KEGG pathway from L6-7 instar larval to pupal stage and pupae to adults stage; (A2,B2) means the down-regulated genes of KEGG pathway from L6-7 instar larvae to pupae stage and pupae to adults stage. Each circle in the graph means a KEGG pathway, x-axis is the enrichment factor and y-axis means the name of the pathway.

SSR and SNP Analysis of D. Houi
A total of 4138 SSR (Simple Sequence Repeats) genes were searched from the transcriptome of D. houi, mono-nucleotide repeats were the most prevalent repeat type ( Figure 9A), followed by di-nucleotide repeats (644), trinucleotide repeats (536) and compound SSR (113), tetra-nucleotide repeats (28), penta-nucleotide repeats (4) and hexa-nucleotide (1). Motif A accounted for the largest number of mono-nucleotide repeats and motif AT was the most numerous in the di-nucleotide repeats, potentially representing the poly (A) tail of mRNA. In SSR data, the number of repeats in the range of 5-9 repeats was the highest and the number of repeats more than 24 was less (Table 4). Meanwhile, 10,239 primer pairs were designed as part of the analysis that could be used to target these SSR regions.  There were 429,997 SNP loci during all stages (Table 5), including 217,530 homozygous SNPs and 212,467 heterozygous SNPs in the transcriptome of D. houi. The number of SNP loci during larval stage were the least (61,269) but at the adult stage were the largest ( Figure 9B).

Discussion
In this study, we obtained 86.48 Gb clean data from samples across all life stages of D. houi by using high throughput sequencing, assembling to 33,720 unigenes of which a total of 17,797 unigenes were finally annotated. For sequences that had BLAST match in the Nr database, the most common and most homologous matches were to S. litura, accounting for 12.20% of the top hits. We further analyzed the differentially expressed genes at different growth stages of the insect. There were 26,111 differentially expressed genes in the DGE libraries. From egg to whole larva stage, the differentially expressed genes accounted for 53.14%, which indicated that the regulatory mechanism might be more complex compared to other stages. The difference of gene expression across different stages indicated regulation mechanism during different stages, which can be further explored in future research.
The egg and newly hatched larva stage of D. houi presented high tolerance to the low temperature stress in the winter and we found that some resistance and energy storage genes were up-regulated such as glycerol kinase, fatty acid-binding protein, trehalose transporter, sorbitol dehydrogenase, cytochrome P450s, alpha-esterase 40 precursor and NADH (Nicotinamide Adenine Dinucleotide) dehydrogenase subunit 5 from the egg to L1-3 instar larval stage. However, some high expression genes have changed with the increase of larval instar, external temperature and self-growth metabolism, the resistance of the older larvae to high temperature and other unfavorable external factors also increased. For example, gene of acid-binding protein began to be down-regulated of L4-5 instar larva but genes of heat shock protein (HSP), juvenile hormone, epidermis protein, detoxifying enzymes and so on were up-regulated in older larvae. Previous studies have shown that insects can improve the cold resistance by producing small molecules of cold-resistant substances such as glycerol, trehalose, sorbitol, glucose and so forth [26][27][28]. And they also depend on some kinds of proteins such as HSPs to resist high temperature, the HSPs play an important role in improving the heat resistance of organisms [29][30][31]. The ability of cold and heat resistance of D. houi may be related to these high expression genes. Moreover, Togawa et al. [32] found that the CPFL2-7 epidermis protein gene of Anopheles gambiae was highly expressed in the larval stage, which may be involved in the formation of larval epidermis. We also assumed that these highly expressed epidermis protein genes in the larval stage might be involved in the epidermis formation of D. houi larvae. Interestingly, the up-regulation of juvenile hormone genes within larvae were relatively abundant, partially the long development duration associated with having seven instars of larvae, possibly playing a role in maintaining a long larval stage with many instars [33][34][35]. Furthermore, the expressed gene level of related detoxification metabolic enzymes was also high in the process of larval feeding, because detoxification enzymes (cytochrome P450s, carboxylesterases and glutathione S-transferases) were in vivo are needed to participate in the metabolic decomposition of secondary substances within host plants or some pesticides [9,[36][37][38][39]. The up-regulation of these genes provided help for them to overcome the adverse environmental factors.
During the larval and pupal stage, there were differentially expressed 7941 genes, the up-regulated genes accounted for only 31.73% and the down-regulated genes reached 68.27%. Additionally, the gene expression levels of fatty acid synthase, storage protein and other genes were relatively high during this stage. The fatty acid synthase and storage protein was synthesized in the fat of feeding larvae and released into the hemolymph and reached the peak at the end of larval stage, which provided protein and amino acids for development of pupae and organs or tissue formation of adults, playing an important regulatory role for the storage of amino acids inside insects for normal metabolism [40][41][42]. Moreover, lysozyme was significantly up-regulated, which will be very helpful to overcome external adverse factors and provide favorable conditions for its autoimmunity [43][44][45].
From pupae to adults, there were 6508 differentially expressed genes and the number of upregulated genes were close to down-regulated genes. In the process of adult emergence from pupal stage, the transcription factor TFEB (transcription factor EB) was relatively enriched. Sardiello et al. found that the synthesis and function of lysosome are regulated by a gene network and the coordinated expression of these genes is controlled by transcription factor TFEB. And TFEB itself can be activated during lysosomal dysfunction, regulated the abundance of lysosomes in cells and its ability to degrade complex molecules [46], which played an important role in the process of apoptosis and self-digestion of its own cells [44]. Furthermore, the expression of structural protein genes was up-regulated, which played an important role in the formation of tissue and organs as well as external morphological structure in the process of adults [47,48]. In addition, high expression of chemosensory genes is beneficial to adults mating, oviposition and other behaviors, which rely on sensitive olfactory mechanisms [10,49].
Technically, molecular marker is an important method for population genetics research [50,51]. In this study, 4138 SSR genes and 114,977 SNP loci were identified from the transcriptome of D. houi, mono-nucleotide repeat is the main type of SSR, accounting for the largest proportion, the trinonucleotide repeat was the most common type, which is similar to results of some previous research on Epacromius coerulipes [52], Bactrocera dorsalis [53]. Thousands of primer pairs were designed by transcriptome data, which greatly saved the time for primer design, although the validation assay of designed primers have not yet been performed, which provided a certain basis for the further study of genetic evolution, the SSR and SNP genes screened in this study will lay a foundation for the study of population genetics of this species at the same time, the study of differentially expressed genes will provides a new direction and method to formulate more reasonable integrated pest management. Further, we plan to further evaluate mechanisms for adaptation in future experiments, to evaluate how this species is able to adapt to extreme environments (e.g., temperature).
The pine moth D. houi is extremely harmful when it exhibits outbreaks, particularly in Yunnan and Fujian provinces. However, it has different generations in these two regions, which may be related to its environmental factors. The Yunnan population of D. houi perches and survives at high altitude all the year round, consequently, those caterpillars might be induced to strengthen their cold tolerance because they were usually exposed to longer period, higher frequency and stronger intensity of low temperature stress than Fujian population. Furtherly, this impact might also be partially induced by hosts of Yunnan population, P. yunnanensis, which prefer an area with high altitude from 2000 to 3200 m, while Fujian population of D. houi infests C. fortunei, which are located in the area with altitude of 400-2000 m. Interestingly, some previous research has shown that different host plants provided different nutrients and the insects feed on different hosts before overwintering, which affects the composition and accumulation of cold-resistant substances within their bodies, leading to differences in cold-resistant properties [26,54]; and P. yunnanensis was proved to be one of the most cold-resistant species hosts (grade I) [55]. On the contrary, the Fujian population survives under the longer period, higher frequency and stronger intensity of high temperature stress in the summer, which might strengthen their adaptation of older larvae or pupae to heat stress. Obviously, these two populations must be regulated by both temperature stress and host difference, which may lead to differential profiles of gene expression. We assumed these results will provide a theoretical basis for more targeted pest control.

Conclusions
In this study, four stages across all life cycle of D. houi were sampled to obtain transcriptome data and understand the gene expression associated with development, which will be very helpful to potentially reveal the gene function related to regulatory mechanism of development, phylogeny and evolution and the interaction between insects and other organisms. This paper will be a foundation for further study towards improved integrated pest management strategies for this species.