Comparative profiling of hepatopancreas transcriptomes in satiated and starving Pomacea canaliculata

Background Although Pomacea canaliculata is native to South and Central America, it has become one of the most abundant invasive species worldwide and causes extensive damage to agriculture and horticulture. Conventional physical and chemical techniques have been used to eliminate P. canaliculata, but the effects are not ideal. Therefore, it is important to devise a new method based on a greater understanding of the biology of P. canaliculata. However, the molecular mechanisms underlying digestion and absorption in P. canaliculata are not well understood due to the lack of available genomic information for this species, particularly for digestive enzyme genes. Results In the present study, hepatopancreas transcriptome sequencing produced over 223 million high-quality reads, and a global de novo assembly generated a total of 87,766 unique transcripts (unigenes), of which 19,942 (22.7%) had significant similarities to proteins in the UniProt database. In addition, 296,675 annotated sequences were associated with Gene Ontology (GO) terms. A Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment was performed for the unique unigenes, and 262 pathways (p-value < 10−5) in P. canaliculata were found to be predominantly related to plant consumption and coarse fiber digestion and absorption. These transcripts were classified into four large categories: hydrolase, transferase, isomerase and cytochrome P450. The Reads Per Kilobase of transcript per Million mapped reads (RPKM) analysis showed that there were 523 down-regulated unigenes and 406 up-regulated unigenes in the starving apple snails compared with the satiated apple snails. Several important genes are associated with digestion and absorption in plants: endo-beta-1, 4-glucanase, xylanase, cellulase, cellulase EGX1, cellulase EGX3 and G-type lysozyme genes were identified. The qRT-PCR results confirmed that the expression patterns of these genes (except for the longipain gene) were consistent with the RNA-Seq results. Conclusions Our results provide a more comprehensive understanding of the molecular genes associated with hepatopancreas functioning. Differentially expressed genes corresponding to critical metabolic pathways were detected in the transcriptome of starving apple snails compared with satiated apple snails. In addition to the cellulase gene, other genes were identified that may be important factors in plant matter metabolism in P. canaliculata, and this information has the potential to expedite the study of digestive physiology in apple snails. Electronic supplementary material The online version of this article (doi:10.1186/s12863-017-0485-7) contains supplementary material, which is available to authorized users.


Background
Pomacea canaliculata (apple snail) is a member of Gastropoda and originates from South and Central America, and it is beginning to emerge in locations worldwide, including China, representing one of the 100 most invasive species in the world [1][2][3][4]. P. canaliculata features high reproductive capacity, strong adaptability, large food intake and a lack of effective predators, which indicate that it is a considerable threat to the ecosystem balance of fields and ponds [5][6][7]. This species has rapidly adapted to new environments and is now found in at least 11 provinces in southern China [8].
Presently, medication and predation approaches are applied to eradicate P. canaliculata. However, the results are not satisfactory. Moreover, these approaches are associated with drug residues and animals such as geese and ducks, which exert a negative influence on crops. In addition, the control of P. canaliculata in wetland ecosystems has received limited attention [9][10][11][12][13][14][15].
The hepatopancreas is an important digestive organ in P. canaliculata [16], and the expression of digestive enzyme genes in P. canaliculata may be closely related to nutrient metabolism and energy balance. The relative lack of information on enzyme genes has impeded research on the digestive physiology of P. canaliculata. Because of the advancements in DNA sequencing technology, generating a large amount of sequence data from non-model organisms such as the apple snail has become increasingly affordable. Although transcriptomic analyses of the hepatopancreas have been performed in several snail species [17,18] using RNA-Seq, limited information is available on the P. canaliculata hepatopancreas transcriptome [19]. Moreover, the available reports were confined to topics such as invasion prevention, population control, species discrimination, etc. [20,21]. Therefore, a dearth of research has been conducted on the digestive physiology of P. canaliculata. A transcriptome analysis can enrich the supply of genetic information on the apple snail, explore its digestion and the molecular mechanisms of its growth and development, and provide a reference for research on targeted drugs for snail control.
In this study, we divided snails into two groups, starving apple snails and satiated apple snails; collected total hepatopancreatic RNA from both groups; and constructed cDNA libraries. The cDNAs were sequenced using an Illumina/Solexa platform. Approximately eight gigabytes of high-quality sequencing data were generated. For the functional annotation of the assembled gene transcripts (contigs), we searched against the NCBI nonredundant (NR) and UniProt databases using BlastX. In addition, we compared the gene expression in the hepatopancreatic tissue between satiated apple snails and starving apple snails. This strategy can help further our understanding of the biological responses of P. canaliculata and the hepatopancreas transcriptome dynamics of plant consumption in P. canaliculata.

Methods
Sample collection and RNA extraction P. canaliculata specimens (15-20 mm shell length; 21.11 ± 0.23 g; 20 individuals) were collected from the Hunan Academy of Agricultural Sciences experimental paddy and then sent to the laboratory. The specimens were divided into hunger and satiety feeding groups. After 7 days of rearing in the aquarium, total RNA was extracted from two sets of snail hepatopancreases. The RNA quality was verified with an absorbance microplate reader (260/280 ratio > 2.0) and by agarose gel electrophoresis. Spare samples were stored at−80°C.

Construction of a cDNA library for sequencing
The cDNA libraries were constructed via PCR amplification using random hexamer priming (the process was conducted in strict accordance with the manual). After library construction, cluster formation and Illumina sequencing were conducted on a HiSeq 2000 platform (Illumina, SY-401-2501) by Bohao Biotechnology Corporation (Shanghai, China) using the methods previously reported by our group [22].

Data preprocessing and de novo assembly
For the preprocessing, all the adaptor sequences, lowquality reads, ambiguous bases and sequences containing fewer than 20 nucleotides were removed from further analysis prior to contig assembly.
The sequencing data from the two samples were merged into the pooled reads, and de novo assembly was performed using CLC Genomics Workbench (version 6.0.4, CLC Bio, Aarhus, Denmark). De novo sequence assembly was performed on all remaining reads using a contig scaffolding algorithm with a minimum contig length of 200 bp or longer and a word size of 24 characters. The sequences resulting from this phase were designated contigs. Then, we applied CAP3 EST to the spliced primary unigenes to obtain the final unigenes.

Functional annotation and classification
The final unigene sequences were compared with the NR database and the UniProt database (filter: E-value < 1e-5). The NR annotations of the resulting unigenes were searched with Blast2GO Gene Ontology (GO) functional classification algorithms, and the unigenes were functionally classified using cluster of orthologous groups (COG). The unigenes were searched in the CDD using rpstblastn, and all results with Evalues < 1e-5 were assigned KOG functional classification predictions and mapped to each COG level of classification, whereas the online KAAS pathway alignment analysis tool was applied to the unigene splices for the KEGG mapping analysis.

Screening of differentially expressed unigenes
Taking Final Unigene as the reference, respectively, reads each sample was reference mapping, to obtain a sample of each coverage in each Unigene reads the application RPKM (Reads Per Kilobase of transcript per Million mapped reads), which was calculated via the following formula: RPKM ¼ transcription reads transcription length Â total assembly reads in run Â 10 9 where transcription_reads is the number of reads throughout the unigene coverage, transcription_length is the length of the unigene, and total_assembly_reads_in_run is the total number of reads involved in splicing for all samples. Each RPKM result was normalized to the total expression of the corresponding sample as an internal standard. Fisher's exact test was used for the statistical analysis. Differentially expressed genes (DEGs) were identified using the analysis package DEGseq under the following conditions for at least one sample in the group: FDR < 0.05; fold change ≥ 1; and an average RPKM ≥ 3.

Putative molecular markers and open reading frames (ORFs)
The flanking sequences of a particular microsatellite are usually highly conserved, and the repeat unit is species specific. Using this characteristic, an SSR analysis was conducted on the unigene sequences with MISA software (http://pgrc.ipk-gatersleben.de/misa/). SSR markers can be used to develop STMS, one of the most commonly used microsatellite markers, using the principle of simple sequence repeat length polymorphisms (SSLPs). EMBOSS (6.4.0)-getorf [23] was used to generate ORF predictions for all unigenes, and the ORF greater than 300 bp was considered the ORF of the unigene.

Differentially expressed gene validation by real-time quantitative RT-PCR (qRT-PCR)
We screened the protein-coding genes associated with plant digestion and absorption using RT-PCR validation.
The following six genes were selected: cellulase, cellulase EGX3, cellulase EGX1, endo-beta-1, 4-glucanase, xylanase and G-type lysozyme. The primer sequences (Additional file 1: Table S1) for the qRT-PCR were designed utilizing the Roche Universal Probe Library Assay Design Center (https://lifescience.roche.com/en_cn.html) and synthesized by Sangon Biotechnology Corporation (Shanghai, China). The cDNA synthesis and RT-PCR processes were conducted according to the manufacturer's instructions. A reaction volume of 20 μL was used. The amplification program was as follows: 95°C for 30 s, followed by 34 cycles of 95°C for 10 s and annealing at 60°C for 30 s.

Results and discussion
RNA sequencing and de novo assembly of the hepatopancreas transcriptome After pre-processing the sequencing data of the two samples, we obtained 95,126,010 and 128,258,226 clean reads. The clean ratios (clean ratio = clean reads/raw reads) (%) were 94.9 and 94.6%.
The high-quality reads were assembled de novo using CLC Genomics Workbench v6.0.4. A total of 90,141 contigs were generated, with an average length of 108,487 bp and an N50 of 1,710 bp. To obtain the unigenes, we applied CAP3 EST to the secondary splices, which were then assembled into 87,766 unigenes, with a mean unigene size of 110,818 bp, a total length of 97,260,581 bp and an N50 of 1,772 bp. The results of the assembly are listed in Table 1. The majority of the unigenes were between 400 and 600 bp (30.37%). The length distribution of the assembled unigenes is shown in Fig. 1.

Functional annotation and classification
A total of 16,384 unigenes were identified by comparing our sequences with the NCBI NR database. The unigenes were searched in a protein sequence database and subjected to a UniProt comparative analysis, with the filtering criterion: E-value < 1e-5 ( Table 2). The results show that of all unigenes, only 18.7% (n = 16,384) had significant similarities to proteins in the NR database (Additional file 2: Table S2), and a total of 19,942 unigenes had significant similarities to proteins in the UniProt database (Additional file 3: Table S3). However, other unigenes did not return any results in either of the two databases, which indicates that our study revealed a number of new genes and non-coding RNA sequences and enriched the snail hepatopancreas transcriptome gene data resources. Overall, the transcriptome sequencing yielded a great number of unique genes in the species, which is consistent with similar results reported in other species [20].
The Blast2GO algorithm was used to determine the GO functional classifications of the NR results. All sequences were classified according to the three major GO categories, with 23 in "biological process," 17 in "cellular component" and 14 in "molecular function". The distribution of the terms in each of the major GO categories is shown in Fig. 2 (Additional file 4: Table S4). In the biological process category, unigenes related to metabolic processes (4131, 27.68%) and cellular processes (3538, 23.69%) were predominant, indicating that certain basic processes occurred in the hepatopancreas. A high percentage of genes were classified as "cell" and "cell part" under the cellular components category. Moreover, a COG classification of all of the unigenes indicated that 13,060 unigenes were assigned to 25 COG categories (Fig. 3) (Additional file 5: Table S5). The top 3 groups were "signal transduction mechanisms" (2,600, 19.9%), "general function prediction only "(2,052, 15.7%) and "posttranslational modification, protein turnover and chaperones" (1,205, 9.2%). "cell motility" (25, 0.02%) had the least representation.
KAAS was used for the KEGG map analysis of 20,885 unigenes using comparative analysis tools. These unigenes were mapped to a 262-article pathway, and most were concentrated in the following categories: metabolic: biosynthesis of secondary metabolites; microbial metabolism in diverse environments; and protein processing in the endoplasmic reticulum.

Functional enrichment analysis of DEGs
To better understand the function of the DEGs, GO functional and KEGG pathway enrichment analyses were performed when a p-value was less than 0.05 was observed. The up-regulated and down-regulated genes were significantly enriched in 194 GO categories (Additional file 6: Table S6). The DEGs were mainly concentrated in the functional categories of "carbohydrate metabolic process," "hydrolase activity," "hydrolyzing O-glycosyl compounds," "oxidation-reduction process," "metabolic process" and "oxidoreductase activity" (Table 3). Our results suggest that the hepatopancreas is closely related to the digestion and metabolism of food and indicated that a variety of hydrolytic enzymes may play an important role in the process of digestion.
A total of 148 pathways were found to be enriched with DEGs, and DEGs were primarily identified in the following GO categories: metabolic pathways (104, 63.4%); chemical carcinogenesis (18, 10.98%); metabolism of xenobiotics by cytochrome P450 (16, 9.76%); and drug metabolism -cytochrome P450 (14, 8.53%). Notably, up-regulated genes were mostly found in the metabolic pathways of starch and sucrose metabolism and glutathione metabolism, and only up-regulated genes were found in the pathways of galactose metabolism, caprolactam degradation, and carbohydrate digestion and absorption ( Table 3). The full lists of DEGs is shown in Additional file 7: Table S7 Collectively, these transcriptome sequences and pathway annotations provide an essential resource for further screening and expression analyses of candidate genes related to digestion and absorption in P. Canaliculata.  Table S8). In addition, 775 ORFs were predicted.

Analysis of digestive enzymes and genes in hepatopancreas
In this study, we focused on CDS (coding sequences) extracted from the putative ORF sequences to classify the functions of the hepatopancreas transcripts. Based on sequence alignment using the NR and UniProt databases and by referencing the results of the functional enrichment analysis of DEGs, the transcripts involved in plant feed digestion were analyzed and presented in Table 4. These transcripts were classified into four large categories as follows: hydrolase, transferase, isomerase and cytochrome P450. Among them, a variety of hydrolase genes were identified throughout our comparative study of satiety and starvation, including endoglucanase, beta-1,4-endoglucanase, cellulase, family 10 cellulase, cellulase EGX1, cellulase EGX3, and xylanase. These genes are involved in many metabolic processes, such as starch and sucrose metabolism, galactose metabolism, the TCA cycle, secondary metabolite biosynthesis, carbohydrate digestion and absorption, and the pentose phosphate pathway. Moreover, we identified P450 cytochromes (CYPs), which is a major family of enzymes involved in detoxification and metabolism. Previous studies have reported a correlation between increased exposure to metabolic neurotoxic pesticides and over-expression of P450 genes in many pest species [24][25][26][27][28][29][30][31].

Comparative analysis between the two hepatopancreas libraries
To identify the DEGs in the starving apple snails and satiated apple snails, we used the RPKM algorithm to compare the differences in gene expression. Based on an RNA-Seq (quantification) analysis, 929 unigenes in total were differentially expressed between the two samples, among which the number of unigenes with up-regulated  . 2 Gene Ontology analysis of hepatopancreas transcriptome data. The results are summarized for the three main GO categories; red bars refer to "biological process", blue bars refer to "cellular component" and green bars refer to "molecular function". The Y axis shows the number of unigenes in each category expression was 406, and the number with down-regulated expression was 523 in the starving apple snails compared with the satiated apple snails (as shown in Fig. 4). Many of the DEGs had only a 2-to 5-fold change in gene expression between the two samples. (The full list of DEGs is included in Additional file 9: Table S9).

Genes encoding secreted proteins that mediate digestive absorption
We concentrated on the most significant sequenced transcripts encoding secreted proteins. This de novo assembly of the hepatopancreas transcriptome of P. canaliculata provided us with valuable raw material for the identification and analysis of P. canaliculata hepatopancreas proteins. Similar to other herbivores, P. canaliculata produces various secreted proteins that are involved in the digestion of the meal or modulation of host defenses for survival. Among these modulatory molecules, cellulase, cellulase EGX1, cellulase EGX3, endo-beta-1, 4-glucanase, xylanase and G-type lysozyme enzymes are vital for P. canaliculata survival and feeding. We found that only the above six digestive and absorption-related enzymes were differentially expressed in this experiment, which may be related to the characteristics of P. canaliculata digestive organs themselves, the distribution of enzymes and the role of symbiotic gastrointestinal bacteria in digestion [32]. Our group is continuing to perform related research.
Quantitative expression and comparative analysis for genes associated with digestive absorption and host defenses qRT-PCR technology was used for the RNA-Seq screening of the six DEGs to validate the analysis. These genes include two up-regulated unigenes (contig473, endobeta-1, 4-glucanase; contig6717, xylanase) and four down-regulated unigenes (contig4257, cellulase; con-tig1040, cellulase EGX1; contig3816, cellulase EGX3; contig4913, G-type lysozyme) in starving apple snails compared with satiated apple snails. We used the fold change to reflect the differential expression of unigenes. The results of the above two methods were generally consistent (shown in Fig. 5).
In this study, cellulase EGX1, cellulase, cellulase EGX3 and G-type lysozyme were up-regulated in the hepatopancreases of fully fed P. canaliculata, and this finding is consistent with previous studies (xylanase [33], cellulase EGX3 [34], endobeta-1, 4-glucanase [35,36] and G-type lysozyme [37]), which indicates that these enzymes play an important role in the digestion and absorption of fibrous food sources. Cellulase [38] and cellulase EGX1 [36] may be involved in this process. These results showed that the hepatopancreatic secretion of proteins plays an important role in the regulation of plant digestion and utilization in P. canaliculata.

Conclusions
In this study, to explore the digestive function of P. canaliculata as a starting point, the de novo hepatopancreatic transcriptome of P. canaliculata was initially characterized using NGS technology and bioinformatics. The transcriptomic approach provides opportunities to identify pharmacologically active proteins and further insights into the digestive and metabolic functions of P. canaliculata. In total, 90,141 contigs and 87,766 unigenes have been discovered via high-throughput sequencing of the hepatopancreas. By comparing unigenes with the NCBI NR and UniProt databases,     5 qRT-PCR validation of the differentially expressed genes analyzed by RNA-seq. qRT-PCR was perform for nine genes that were identified as differentially expressed between the starving apple snails and satiated apple snails. The expression level of each gene was normalized to the level in the satiated snails. The Y axis shows the relative mRNA expression levels. P * < 0.05, P ** < 0.01