The tissue-specific transcriptome in P. b. seulensis third instar larvae
To expand global gene expression profiles in P. b. seulensis, high-throughput RNA-sequencing (RNA-seq) was performed on fat body, midgut, muscle, and hematocyte tissue types (Fig. 1A). Approximately 668 million raw data reads were produced, with an average of 134 million reads/tissue. Before aligning short reads, adapter sequences and low-quality reads were eliminated in preprocessing steps, after which approximately 602 million high-quality reads were acquired. Clean reads were aligned to the P. b. seulensis reference genome to estimate gene expression profiles in libraries. In total, approximately 560.4 million reads, accounting for 93.09% of total clean reads, were mapped to the P. b. seulensis genome. Approximately 6.91% of reads did not align with the reference genome, probably because alignment parameters were too strict, or sequencing or assembling errors, or gaps or alternative splicing in the reference genome. Of the total reads, 70.20%, 38.70%, and 54.38% were perfect, unique, and multiple position matches, respectively.
Quality assessment of tissue-specific RNA-seq
To measure gene expression levels, high-quality reads from the four-tissue transcriptomes were mapped using RSEM and Bowtie2 [17], and transcripts measured as FPKM values (Table S1). Expression profiles had stable distribution patterns in each tissue (Fig. 1B). In total, 448,676 transcripts were expressed in the four tissues and were above threshold levels (FPKM > 0.3) (Table S1). A heat map (Fig. 1C) showed that transcriptomes varied substantially between tissues. Additionally, a hierarchical clustering dendogram of expressed genes/samples across the four tissue types indicated that intra-group tissue samples were the most similar. This meant that specific genes characterized particular tissue types, and that gene subgroups were expressed in all tissue (Fig. 1C). In particular, fat body-expressed genes exhibited the most distinct expression patterns. PCA was used to analyze transcript abundance quantitative values, with results used to perform tissue clustering. The first principal component (PC1) accounted for 35.06% of observed variance and the second principal component (PC2) represented 27.44% of variance. Fundamental differences in RNA abundance between the four tissues generated clear differences between them (Fig. 1D). Squared Pearson’s correlation coefficient values were all > 0.8 in triplicate experiments and indicated that gene expression measurements were highly reproducible. Differentially expressed genes and gene expression trends were investigated by examining expressed genes in all tissue.
Tissue similarity and specificity analyses
We analyzed tissue-specific expression using reads across the whole transcript length, rather than selecting the 3’ end of mRNAs using polyA tails. Expressed genes were defined as those with (1) an average log (RPKM) > 2 and (2) each tissue replicate having a log (RPKM) > 1. We detected 8,437 hematocyte, 7,691 muscle, 7,191 fat body, and 9,604 midgut protein-coding genes (Fig. 2A) – and 5,360 genes were expressed across all tissues. The midgut expressed the largest number of tissue-specific genes (7,366), followed by muscle (1,935), hematocytes (1,814), and the fat body (1,052). Multidimensional scaling analyses suggested that samples clustered closely with similar tissue types, i.e., the fat body and muscle were the most closely related tissue types (Fig. 2B). Expression correlation analyses showed that the fat body had the highest correlation score with muscle (0.70), but lower scores with hematocytes (0.08) and the midgut (0.11). Interestingly, high-abundance genes were highly tissue-specific: 20% in muscle, 25% in midgut, 44% in hematocytes, and 11% in the fat body. Based on Pearson’s correlation analyses of tissue-enriched gene expression, hematocytes were the most distinct tissue overall, while fat body and midgut samples were more distinct from one another. Subsampling analyses, which were used to determine if sequencing had sufficient depth, suggested that gene expression estimates were stable across multiple sequencing depths, and thus gene expression differences represented true differences between tissues.
Tissue-specific functional analyses - tissue-enriched genes characterize individual tissues
We identified significantly enriched genes in specific tissues (FDR > 0.05, log (Fold change) > 2, Fig. 3). Importantly, additional tissue-enriched functions provided further insights on tissue-specific gene expression/regulation. For example, tissue-enriched genes in muscle were associated with muscle contraction, actin filament-based processes, cell migration, calcium ion homeostasis, actin cytoskeleton organization, integrin signaling, and other similar processes, while fat body tissue-enriched genes were associated with well-characterized immune functions, learning and memory, synaptic transmission, and neurotransmitter transport (Fig. 3). Gene ontology (GO) analyses of the midgut suggested that, in addition to expected digestion (peptidase activity and proteolysis), responses to bacteria (defense responses and biotic stimulus responses), and lipid metabolism (localization and storage) categories, terms were also associated with the molting cycle and driven by high collagen (several) expression (Fig. 3). P. b. seulensis hematocytes are relatively uncharacterized and are implicated in collagen expression in the cuticle and molting during development. These GO terms represented our data (Fig. 3) and suggested that hematocytes, a syncytial tissue comprising a substantial fraction of the adult’s body, may be major metabolic processing sites in P. b seulensis.
Identifying orthologous human liver disease-related genes in P. b. seulensis
In humans, P. b. seulensis is used in traditional medicine to improve liver conditions, therefore, we compared tissue-specific transcriptional profiles with known mutated genes in human liver disease. We first identified P. b. seulensis orthologs of human liver disease-related genes using OMIM database. We identified 1,993 liver disease-associated genes in P. b. seulensis, from which 521 human orthologs were found. We next calculated gene expression fold changes in larvae when compared with adults. Liver protection functions in P. b. seulensis occur at larval stages and not in adults. We found 11 genes which were expressed across muscle, fat body, midgut, hematocytes, and adults, and which were differentially expressed between larval and adult stages (Fig. 4A). The majority were down-regulated in larvae when compared with adults, with the notable exception of the Svil-like gene, which was highly up-regulated in muscle, hematocytes, and the midgut. In hematocytes, Svil-like gene expression in larvae was four times higher when compared with adults. From the Drosophila genome database FlyBase (https://flybase.org/), Svil functions in actin filament and phosphatidylinositol-4,5-bisphosphate binding activities, and is predicted to function in actin filament severing, actin polymerization or depolymerization, barbed-end actin filament capping, and actin cytoskeleton activation in human (Fig. 4B). Svil-like gene is expressed in the presumptive embryonic/larval system, in several structures, including the ectoderm anlage, testis, and trunk ectoderm. The human ortholog is called supervillin (SVIL) and is implicated in myofibrillar myopathy [18]. SVIL is tightly associated with actin filaments and plasma membranes, suggesting a high-affinity link between these structures. The encoded protein appears be implicated in myosin II assembly during cell spreading and in focal adhesion disassembly.
Constructing a phylogenetic co-occurrence matrix using 11 liver disease-related genes
A phylogenetic profile was created by projecting a species tree onto a binary one-dimensional vector with each extant species represented by a single element with a value of 1 if an ortholog is present, while keeping closely related species adjacent (177 species; Fig. 5). Resulting profiles were used to identify other phylogenetic profiles with similar gene presence and absence patterns. The SVIL-like gene only existed in the crustacean clade, while the other 10 genes, with liver disease-related functions, showed conservation patterns across three kingdoms or metazoans. Hepatoprotective functions in P. b. seulensis were probably acquired by specific Svil-like gene gains and unique interactions with surrounding proteins. This cross-species, disease-related gene profile analyses provided useful insights on P. b. seulensis hepatoprotective effects in humans.
Sequence and structure analysis of the Svil-like gene in P. b. seulensis
The PRBRE25CG0210, PRBRE25CG0180 gene of P. b. seulensis was orthologous with the Svil-like gene in P. b. seulensis (Fig. 4A), and form its sequence, we analyzed Svil-like gene domain architecture. The PRBRE25CG0210 have 807 amino acid (Fig. 6A). We identified Laminin, EGF-like, Cysteine-rich, and transmembrane domain repeats (Fig. 6B). Laminins form independent networks and are associated with type IV collagen networks via entactin, fibronectin, and perlecan. They also bind to cell membranes via integrins and other plasma membrane molecules, including the dystroglycan glycoprotein complex. Cysteine-rich domains are characterized by disulfide bridges and commonly demonstrate end-to-end macrocyclization. They have been successfully used as stable scaffolds for molecular grafting, a protein engineering process where cysteine-rich structures provide higher thermodynamic and metabolic stability to epitopes, thus providing new biological functions. By analyzing 3-dimensional structures, we used Modeller and I-TASSER which is based on homology and comparative modeling of a given protein sequence. protein three-dimensional structures (Fig. 6C). Sequence was screened in the pdb library for finding most similar structure and found that it's a 1D4X protein. The function of 1D4X is contractile protein. As a result, one main helix and two sheets were found, and it was found that there were a lot of loops. However, PRBRE25CG0180 could not obtain any structural library screening results.
Functional enrichment analyses of 92 highly co-expressed genes with Svil-like gene
We performed functional enrichment analysis of co-expressed genes with the Svil-like gene. A Pearson’s co-expression correlation of 0.7 was selected as a cut-off value, and 92 genes were selected. It was hypothesized that having the same expression patterns as the Svil-like gene would mean that other genes cooperatively participated in the same function. Consequently, glycolysis, mitochondrial localization, ATP generation, cuticle development, and chitin binding functions were highlighted (Table 1).
Table 1. Function enrichment analysis of 92 genes highly co-expressed genes with Svil-like gene