Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei

Crustins are crustacean cationic cysteine-rich antimicrobial peptides that contain one or two whey acidic protein (WAP) domain(s) at the carboxyl terminus and mainly show antimicrobial and/or proteinase inhibitory activities. Here, we performed genome and transcriptome screening and identified 34 full-length crustin-like genes in Litopenaeus vannamei. Multiple sequence analysis of the deduced mature peptides revealed that these putative crustins included 10 type Ia, two type Ib, one type Ic, 11 type IIa, three type IIb, four type III, one type IV, one type VI, and one type VII. These putative crustins were clustered into different groups. Phylogenetic analysis, considered together with domain composition, suggested that the different types of crustin-like genes in crustaceans might have originated from the WAP core region through sequence insertion, duplication, deletion, and amino acid substitution. Tissue distribution analysis showed that most crustin-like genes were mainly expressed in immune-related tissues, whereas several exhibited tissue-specific expression patterns. Quantitative PCR analysis of 15 selected crustin-like genes showed that most of them were upregulated after Vibrio parahaemolyticus or white spot syndrome virus (WSSV) infection. One type Ib crustin-like gene, mainly expressed in the ovary, showed its highest expression levels before the gastrula stage and was hardly detected after the limb bud stage, suggesting that it is a maternal immune effector. Collectively, the present data reveal the molecular and functional diversity of crustins and their potential evolutionary routes in crustaceans.


Introduction
Antimicrobial peptides (AMPs) are key elements of the innate immune system. AMPs are widely present in multicellular organisms and exhibit a broad spectrum of activities against bacteria, fungi, yeast, protozoa, and viruses [1]. As of March 2020, 3175 AMPs had been recorded in the Antimicrobial Peptide Database (http://aps.unmc.edu/AP/main.php). These AMPs show tremendous sequence diversity, which suggests that organisms use this diversity to adapt to microbial challenges in different environments [2]. In invertebrates, various types of AMPs are effective components of the innate immune system that protect the host against pathogen infection. In crustaceans, especially decapods, different AMP families including penaeidins, anti-lipopolysaccharide factors, crustins, stylicins, etc., have been identified and characterized [3][4][5][6]. In the present study, the crustin-like genes identified in L. vannamei encoded all reported types of crustins except type V, as well as four new types or sub-types of crustins. Phylogenetic analysis reveals their possible evolutionary routes, and expression analysis suggests that they have multiple immune functions in shrimp.

Identification and Sequence Characterization of Crustin-Like Genes in L. vannamei
WAP domain queries and sequence analysis of the L. vannamei genome and transcriptome databases identified a total of 34 genes with full-length open reading frames (ORFs) encoding crustin-like peptides (Table 1). Five of them have been previously characterized, either in sequences submitted directly to the National Center for Biotechnology Information (NCBI) database or in the published literature. Fifteen of them (including three previously characterized sequences) were annotated in our previous shrimp genome database, although most of these have not been characterized. Seventeen of them were newly identified sequences from the assembled transcriptome database. Among the 34 sequences, 33 (all except LvCruIII-4) could be mapped to the genome database, although 10 of them were incomplete (lacking the first 2 to 5 nucleotides of their ORF) in the current version of the shrimp genome database. The mapping information for each gene can be found in Supplementary Table S1.
Note to Table 1: N/A indicates that no annotated gene is available in the shrimp genome data. The abbreviations denote the signal peptide (SP), cysteine-rich region (Cys), WAP domain (WAP), short glycine-rich region (sGly), long glycine-rich region (lGly), short N-terminal region (sN), and serine/leucine-rich region (Ser/Leu).

Identification of New Types and Sequence Characterization of Putative Crustins in L. vannamei
According to the amino acid and domain composition of the deduced peptides, these 34 putative crustins could be categorized into six types: I, II, III, IV, VI, and VII (Figure 1). Type I was further divided into three sub-types: Ia, Ib, and Ic. The domain composition of type Ia, consisting of a signal peptide, a cysteine (Cys)-rich region, and the WAP domain, was the same as that of the previously reported type I crustin. The domain compositions of types IIa, IIb, and IV were all identical to those of previously reported types or sub-types of crustins. Four transcripts encoding type III crustins were identified. Among them, only LvCrustin III-3 contained a Pro/Arg-rich N-terminal region before its WAP domain, whereas the other three type III crustins had a short N-terminal region without Pro/Arg-rich characteristics. Types Ib, Ic, VI, and VII were four newly identified types or sub-types. Type Ib crustin contained a longer carboxyl-terminal region (more than 30 aa) than other type I crustins. Type Ic crustin contained two linked Cys-rich regions before the WAP domain. Type VI crustin was composed of a signal peptide, a glycine (Gly)-rich region, and the WAP domain. Type VII crustin contained the signal peptide and the WAP domain, with a serine/leucine (Ser/Leu)-rich region (32.8% of the total residues in this region, Figure 2) between them. A further search for the newly identified types and sub-types in the transcriptome database of another shrimp, Fenneropenaeus chinensis [32], identified two type Ib (accession numbers: MT375591 and MT375592), one type Ic (accession number: MT375593), and one type VI crustin (accession number: MT375594), which showed high sequence similarity with those in L. vannamei (Figure 3).
Figure caption (opening words missing): ... Table 1. Different sequence features, including the glycine-rich region, cysteine-rich region, whey acidic protein (WAP) domain, and C-terminal tail, are displayed with columns under the alignment. The first cysteine-rich region of LvCrustin Ic and the serine/leucine-rich region of LvCrustin VII are boxed and underlined, respectively. Identical and similar residues are shown in dark. The second and seventh conserved cysteine sites of three atypical WAP domains are shown with stars (*) above the alignment. The cysteine, glycine, serine, and leucine residues are colored. The pI/Mw values, type, and domain composition of each putative crustin are shown at the end of the alignment.
The mature peptides of the shrimp putative crustins exhibited a wide range of theoretical isoelectric point (pI) and molecular weight (Mw) values (Figure 2). The pI values of the mature peptides ranged from 4.15 to 9.69 and the Mw values ranged from 5.87 to 28.29 kD. The Cys-rich regions in all identified putative crustins shared the formula $\mathrm{CX_{2\sim3}CX_{7\sim13}CC}$, where C denotes a cysteine residue and $\mathrm{X}_n$ denotes a run of other residues whose length lies in the given range. The WAP domains comprised eight cysteine residues complying with the formula $\mathrm{CX_{5\sim9}CX_{6\sim31}CX_{5}CX_{5\sim7}CCX_{3\sim4}CX_{3\sim6}C}$, where the typical lengths of the second, fourth, fifth, and last X were 9~12, 5, 3, and 5, respectively. Three putative crustins, Ia-1, Ia-9, and III-1, had an atypical WAP domain in which the second and seventh cysteine residues were mutated. The WAP domain of crustin VII contained two extra cysteine residues.
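The two cysteine-spacing formulas above translate naturally into regular expressions. The following is a minimal sketch rather than the procedure used in the study: it assumes that X may be any residue, and the names `CYS_RICH`, `WAP`, and `find_motif`, as well as the toy sequences, are hypothetical and introduced only for illustration.

```python
import re

# Assumption of this sketch: "X" stands for any amino acid residue.
# Cys-rich region: C X(2-3) C X(7-13) C C
CYS_RICH = re.compile(r"C[A-Z]{2,3}C[A-Z]{7,13}CC")

# WAP domain: C X(5-9) C X(6-31) C X(5) C X(5-7) C C X(3-4) C X(3-6) C
WAP = re.compile(
    r"C[A-Z]{5,9}C[A-Z]{6,31}C[A-Z]{5}C[A-Z]{5,7}CC[A-Z]{3,4}C[A-Z]{3,6}C"
)


def find_motif(pattern, peptide):
    """Return the (start, end) span of the first match, or None if absent."""
    match = pattern.search(peptide.upper())
    return match.span() if match else None


# Synthetic toy sequences (not real crustins) built to satisfy each formula.
toy_cys_rich = "C" + "AA" + "C" + "A" * 8 + "CC"
toy_wap = (
    "C" + "A" * 6 + "C" + "A" * 10 + "C" + "A" * 5
    + "C" + "A" * 5 + "CC" + "A" * 3 + "C" + "A" * 5 + "C"
)
# An "atypical" toy domain: the second cysteine is replaced by serine.
toy_atypical_wap = toy_wap[:7] + "S" + toy_wap[8:]

if __name__ == "__main__":
    print(find_motif(CYS_RICH, toy_cys_rich))
    print(find_motif(WAP, toy_wap))
    print(find_motif(WAP, toy_atypical_wap))
```

Under this pattern, a domain that has lost one of the eight conserved cysteines, like the atypical WAP domains of Ia-1, Ia-9, and III-1, would not match and would need a relaxed pattern or manual inspection.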

Phylogenetic Analysis of Different Types of Putative Crustins in Shrimp
The WAP domains from the 34 identified putative crustins in L. vannamei were clustered by the neighbor-joining method. As shown in Figure 4, different types of putative crustins were classified into different groups. Except for Ia-2, type I crustins were closely related to each other. Type III and IV crustins were also closely related to type I crustins. Type II crustins were all clustered into one large group, together with Ia-2 and the type VI and VII crustins. The phylogenetic analysis of WAP domains from different species showed that WAP domains from different types of putative crustins fell into four branches, A to D (Figure 5). Branch A WAP domains, which exhibited the closest phylogenetic relationship with the outgroup WAP domains, were from type Ic, III, and IV crustins. Branch B and C WAP domains were from type Ia and Ib crustins. Branch D WAP domains were from type II, VI, and VII crustins.

Spatial and Temporal Distribution of Crustin-Like Transcripts
Tissue distribution analysis of the different types of crustin-like genes showed that most of them were widely expressed in several tissues, mainly immune-related tissues including the epidermis, stomach, gill, intestine, and hemocytes (Figure 6). Several crustin-like genes exhibited tissue-specific expression patterns. Among them, Ib-2 was mainly detected in the ovary; IIa-8, IIa-9, and IIa-10 were stomach specific; IIb-1 was mainly detected in the eyestalk; and III-4 exhibited a hepatopancreas-specific expression pattern. Across developmental stages, most crustin-like genes showed increased expression after hatching (from the nauplius or zoea stages, Figure 7). Two crustin-like genes, Ia-1 and Ia-5, also showed transiently high expression in the embryonic stage. In particular, the crustin-like gene Ib-2 was mainly detected in the early embryonic stages before gastrula.

Immune Responses of Crustin-Like Transcripts on Vibrio parahaemolyticus and WSSV Infection
To determine whether these crustin-like genes respond to pathogen infection, 15 genes with a relatively high expression level in their target tissues (RPKM > 100 in the most abundant tissue) were selected for expression analysis in shrimp after V. parahaemolyticus or WSSV infection. As shown in Figure 8 and Table S2, the expression levels of all tested crustin-like genes changed markedly after V. parahaemolyticus or WSSV infection. They were upregulated at different time points, mainly at 3, 12, and 24 hpi, and exhibited similar trends after V. parahaemolyticus infection (Figure 8A) and after WSSV infection (Figure 8B). An exception was the crustin-like gene Ib-1, which was downregulated after V. parahaemolyticus infection (Figure 8A) but upregulated after WSSV infection (Figure 8B).
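The selection criterion described above (RPKM > 100 in the most abundant tissue) can be sketched as a simple filter. This is an illustrative sketch only: the function name `select_highly_expressed` and the RPKM values in the example are hypothetical and are not data from the study.

```python
def select_highly_expressed(rpkm_by_gene, threshold=100.0):
    """Return genes whose maximum RPKM across tissues exceeds the threshold.

    rpkm_by_gene maps a gene name to a {tissue: RPKM} dictionary.
    """
    selected = []
    for gene, tissues in rpkm_by_gene.items():
        # Genes without any tissue value cannot pass the criterion.
        if tissues and max(tissues.values()) > threshold:
            selected.append(gene)
    return sorted(selected)


# Made-up RPKM values, used only to exercise the function.
example_rpkm = {
    "gene_A": {"gill": 250.0, "stomach": 12.0},
    "gene_B": {"gill": 40.0, "stomach": 99.9},
    "gene_C": {"ovary": 100.5},
}

if __name__ == "__main__":
    print(select_highly_expressed(example_rpkm))
```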

Discussion
Crustins are diverse immune effectors with various functions in crustaceans and some insects. Previously identified crustins are categorized into five types, of which types I to IV are from crustacean species, whereas type V crustins exist only in some ant genomes [6,23]. In the present study, as many as 34 genes encoding full-length ORFs of crustin-like genes were identified in the shrimp L. vannamei, including all crustin types reported in crustaceans as well as two new types (types VI and VII) and two new sub-types (types Ib and Ic). This number is much higher than in other species and greatly enriches the known diversity of crustin genes. No type V crustin was found among the crustin-like genes identified here. Besides being encoded by multiple genes in the genome, crustins also contain abundant polymorphic sites. In the crab Portunus trituberculatus, a total of 87 SNPs and 7 indels were obtained from a 1073 bp DNA sequence encoding PtCrusin2 [33]. Many polymorphic sites were also identified in most of the crustin-like genes in the present study (data not shown), which further increases the diversity of crustin genes. Although AMPs have always been regarded as immune effectors with wide activities against different microbes, more and more studies reveal that some AMPs also exhibit exquisite specificity against certain pathogens [34]. In addition, different AMPs can act in combination to defend against pathogen infection [35]. We speculate that a large number of crustin-like genes is essential for the shrimp to defend against the different pathogens in the marine environment.
Classification of crustins into different types is mainly based on the variable N-terminal region of the mature peptides [10]. The C-terminal WAP domain is the conserved functional domain shared by crustins. The phylogenetic analysis based on the WAP domains of all putative crustins from L. vannamei revealed that members of the same type were usually more closely related, as seen for type II crustins and most type I crustins. Type III and IV crustins were also closely related to type I crustins, whereas type VI and VII crustins were closely related to type II crustins. Moreover, all putative crustins of types I, III, and IV, except LvCrustin Ic, have a short sequence before their WAP domains. In contrast, crustins of types II, VI, and VII have a relatively long N-terminal region. However, LvCrustin Ia-2, which was closely related to type II crustins, was an obvious exception in the phylogenetic data. This might be due to a fast rate of amino acid substitution in the WAP domain of LvCrustin Ia-2; such substitution has been reported to be an important and common mode of functional diversification in crustins [36].
The existence of multiple crustin genes in one species provides a basis for studying the evolution of these genes in crustaceans. A recent evolutionary analysis of WAP domain-containing proteins in crustaceans proposed that crustins diverged into two groups by acquisition of a Pro/Arg-rich region (type III) or a Cys-rich region (type I) before the WAP domain, and that type II crustins were generated by insertion of a Gly-rich region into some type I crustins [37]. The present domain composition and phylogenetic analyses support the view that type II crustins were generated from type I crustins by Gly-rich region insertion. However, inspection of the amino acid sequences shows that a short N-terminal region is located in front of the WAP domain of most crustins. Some crustins contain several proline residues in this short region while others do not, which is similar to the short N-terminal region of the different type III crustins. Therefore, considering the phylogenetic data, we propose that type III crustins are the ancient type from which type I crustins were generated.
Based on the above discussion, we propose an evolutionary route for crustin-like genes in crustaceans (Figure 9). The ancestral WAP core region might have acquired a short N-terminal region to generate several ancient crustins, the type III crustins. Type Ia crustins might have originated from type III crustins by insertion of a Cys-rich region at the N-terminus, while type II crustins might have been generated from type Ia crustins by a subsequent insertion of a Gly-rich region before the Cys-rich region. Some type Ia and IIa crustins might also have been generated by gene duplication. The type Ic crustin might have been generated by insertion of two Cys-rich regions before the WAP domain. The type IV crustin might have been generated by WAP domain duplication, because its two WAP domains were more closely related to each other than to any other WAP domain; the same mechanism was proposed to explain the existence of multiple WAP domain-containing proteins in crustaceans [37]. Type Ib crustins might have come from type Ia crustins through a substitution in the stop codon that generated a longer C-terminal region. For example, a substitution in the stop codon of LvCruIa-1 could lead to an extension of 39 aa at the C-terminus, which would make it a new member of the type Ib crustins. Given their close phylogenetic relationship with type II crustins, type VI and VII crustins might have been generated from type II crustins by deletion of the Cys-rich region and substitution of the Gly-rich region, respectively. However, comparative studies with broad taxonomic sampling and rigorous phylogenetic analysis are needed to clarify the evolutionary relationships of crustin genes in crustaceans.
Although crustins have only been reported in crustaceans and a few insect species, WAP four-disulfide core (WFDC) proteins are ubiquitous in many kinds of vertebrate and invertebrate animals [38]. Many secretory WFDC proteins exhibit antimicrobial activities, such as the antileukoprotease in humans [8] and waprins in snakes [39]. In crustaceans, crustins are regarded as important immune effectors that participate in the first line of host defense against invaders [40]. The tissue distribution and developmental expression patterns of the crustin-like genes in L. vannamei, as well as their immune responses to Vibrio and WSSV infections, support this view. Crustins in other crustaceans were also responsive to Vibrio or WSSV infection, suggesting their important roles as a kind of AMP [6]. The identification of many novel crustin-like genes in L. vannamei provides a useful source of peptides for developing new antimicrobial drugs, which will be further investigated. Notably, one crustin-like gene, LvCrustin Ib-2, is mainly expressed in the ovary (Figure 6) and in the early developmental stages before gastrula (Figure 7), suggesting a possible role as a maternal immune effector. Previously, we found that many immune-related genes, including 13 crustin-encoding genes, might be involved in immune protection during shrimp molting [31]. These crustins, which were mainly detected in the epidermis, stomach, and gill, showed their lowest expression levels during late premolt and increased expression levels immediately after ecdysis [31]. In the green lip abalone Haliotis laevigata, Perlwapin, a secreted nacre protein with three WAP domains, can bind mineral crystals in the shell matrix and plays a major role in shell formation by inhibiting calcium deposition [41]. These molting-related crustins might also function in calcium deposition in shrimp, which needs to be further investigated.

Database
The RNA-seq data from different tissues, developmental stages, and molting stages of the shrimp Litopenaeus vannamei were downloaded from NCBI (SRA SRR1460493−SRR1460495, SRR1460504−SRR1460505, and SRX1098368−SRX1098375). To obtain high-quality clean reads, empty reads, adaptor sequences, and low-quality sequences were removed. The clean reads of each group were then assembled into Unigenes using the RNA-Seq de novo assembly program Trinity (v2.8.2, https://github.com/trinityrnaseq/trinityrnaseq/releases) with default parameters. Gene abundances were calculated and normalized to reads per kilobase per million reads (RPKM).
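For reference, the RPKM normalization mentioned above follows the standard definition: the number of reads mapped to a gene, divided by the gene length in kilobases and by the total number of mapped reads in millions. A minimal sketch is given below; the function name `rpkm` and the example numbers are illustrative and do not come from the study's pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Illustrative numbers: 500 reads on a 1000 bp transcript
    # in a library of 10 million mapped reads.
    print(rpkm(500, 1000, 10_000_000))
```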

Sequence Identification
The amino acid sequences of the WAP domains of different types of crustins (accession numbers: BAD15063, AAZ76017, ABP88042, ABV25094, ABW82154, ACL97374, ADC32522, ACT82963, ACZ43782, AAS59736, ACQ66004) were downloaded from the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein). The WAP domain sequences were used as queries in a tblastn search (E-value < $10^{-5}$) against the L. vannamei genome database (http://www.shrimpbase.net/vannamei.html) and transcriptome databases. The retrieved nucleotide sequences were analyzed with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) to predict their open reading frames (ORFs). The deduced amino acid sequences were then analyzed with InterPro (http://www.ebi.ac.uk/interpro/) and SignalP (http://www.cbs.dtu.dk/services/SignalP/) to predict their domains and signal peptides. Sequences with a full-length ORF, a signal peptide, and one or two WAP domains, and without any other domain, were deemed crustin candidates. These candidates were then functionally annotated with blastp on NCBI (https://www.ncbi.nlm.nih.gov). Sequences that hit a reported crustin or crustin-like sequence with an E-value < $10^{-5}$ were identified as crustin-like genes in shrimp. The identified crustin-like genes were then re-mapped to the genome database at the NCBI website (https://www.ncbi.nlm.nih.gov/) by a blastn search with an E-value < $10^{-5}$ to obtain the gene-encoding sequences.
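The candidate criteria above amount to a simple predicate over the annotation of each predicted ORF. The sketch below illustrates that logic under stated assumptions: the `OrfAnnotation` record, its field names, and the domain labels are hypothetical and do not correspond to the actual output formats of ORFfinder, InterPro, SignalP, or BLAST.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OrfAnnotation:
    """Hypothetical summary of one predicted ORF (not a real tool format)."""
    name: str
    full_length: bool                    # complete ORF with start and stop codon
    has_signal_peptide: bool             # signal peptide predicted
    domains: List[str]                   # predicted domain labels, e.g. ["WAP"]
    best_crustin_evalue: Optional[float]  # best blastp E-value to a known crustin


def is_crustin_like(orf, evalue_cutoff=1e-5):
    """Apply the criteria from the text: full-length ORF, signal peptide,
    one or two WAP domains and no other domain, plus a crustin hit with
    an E-value below the cutoff."""
    wap_count = sum(1 for d in orf.domains if d == "WAP")
    only_wap_domains = wap_count == len(orf.domains)
    has_crustin_hit = (
        orf.best_crustin_evalue is not None
        and orf.best_crustin_evalue < evalue_cutoff
    )
    return (
        orf.full_length
        and orf.has_signal_peptide
        and wap_count in (1, 2)
        and only_wap_domains
        and has_crustin_hit
    )


if __name__ == "__main__":
    example = OrfAnnotation("orf_1", True, True, ["WAP"], 1e-20)
    print(is_crustin_like(example))
```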

Sequence and Expression Analysis
The deduced mature peptides of the identified putative crustins were aligned using the online ClustalW2 software (http://www.ebi.ac.uk/Tools/msa/clustalw2/), followed by manual adjustment. The cysteine-rich region was identified by its four conserved cysteine residues. Glycine-rich or other amino acid-rich regions were predicted by analysis of amino acid composition using the online ProtParam tool (https://web.expasy.org/protparam/). Cluster analysis was performed with MEGA5 software (https://www.megasoftware.net/) using the neighbor-joining (NJ) method on the WAP domains of the identified L. vannamei putative crustins. A phylogenetic tree was constructed with the same method using the WAP domains of crustins from different crustaceans and the ant, with the WAP domains of Perlwapin from the abalones Haliotis laevigata (accession number: P84811) and Haliotis asinina (accession number: P86730) as the outgroup.
Because some of the identified crustin-like genes were not annotated in the shrimp genome database, the expression profiles of these crustin-like genes were analyzed using the assembled transcriptome data. The RPKM values of these crustin-like genes in different samples, including tissues, developmental stages, and molting stages of the shrimp, were obtained from the transcriptome data. The heatmaps were drawn with the online software Morpheus (https://software.broadinstitute.org/morpheus/).
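The composition analysis used to call glycine-rich or other residue-rich regions can be illustrated with a short calculation. This is a sketch of the underlying arithmetic only, not a reimplementation of ProtParam: the function names `residue_fraction` and `richest_window`, the window size, and the toy sequence are assumptions made for illustration.

```python
def residue_fraction(region, residues):
    """Fraction of positions in `region` occupied by any of `residues`."""
    region = region.upper()
    if not region:
        raise ValueError("region must not be empty")
    wanted = set(residues.upper())
    return sum(1 for aa in region if aa in wanted) / len(region)


def richest_window(peptide, residues, window):
    """Return (start, fraction) of the first window with the highest
    fraction of the given residues."""
    if window <= 0 or window > len(peptide):
        raise ValueError("window must be between 1 and the peptide length")
    best_start, best_fraction = 0, -1.0
    for start in range(len(peptide) - window + 1):
        fraction = residue_fraction(peptide[start:start + window], residues)
        if fraction > best_fraction:  # strict '>' keeps the first best window
            best_start, best_fraction = start, fraction
    return best_start, best_fraction


if __name__ == "__main__":
    toy = "AAAAGGGGGAAAA"  # synthetic sequence, not a real crustin
    print(residue_fraction("SLSLAAAAAA", "SL"))
    print(richest_window(toy, "G", 5))
```

A combined residue set such as "SL" gives the kind of Ser/Leu percentage reported for the type VII crustin, while a sliding window helps locate where such a region begins.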

Animals, Infection, and Tissue Collection
Healthy Pacific whiteleg shrimp with a body weight of 9.5 ± 0.4 g were obtained from our laboratory (Qingdao, China). The shrimp were cultured in tanks filled with aerated seawater at 26 °C and fed thrice daily with artificial food pellets for three days to acclimatize them to laboratory conditions. A total of 135 shrimp were used for the infection experiment. The animals were randomly divided into three groups, the PBS group (negative control), the Vp group, and the WSSV group, with 45 shrimp in each group. In the PBS group, 10 µL PBS was injected intramuscularly into each individual at the 5th abdominal segment. In the Vp group, 10 µL PBS containing $10^{7}$ CFU formalinized Vibrio parahaemolyticus was injected into each individual. In the WSSV group, 10 µL PBS containing 1000 copies of WSSV particles was injected into each individual. The hepatopancreas, intestine, stomach, and gill of 9 individuals from each group were sampled as three replicates at 3, 6, 12, and 24 h post injection (hpi) for RNA extraction. The hemolymph was collected from the ventral sinus located at the first abdominal segment using a syringe containing an equal volume of precooled anticoagulant solution (115 mmol L$^{-1}$ glucose, 27 mmol L$^{-1}$ sodium citrate, 336 mmol L$^{-1}$ NaCl, 9 mmol L$^{-1}$ EDTA·Na$_2$·2H$_2$O, pH 7.4). The hemocytes were then harvested by centrifugation at 800 × g and 4 °C for 10 min and preserved in liquid nitrogen.

RNA Extraction and cDNA Synthesis
Total RNA from each sample was isolated with RNAiso Plus (TaKaRa, Kyoto, Japan) following the manufacturer's instructions. The RNA quality was assessed by electrophoresis on a 1% agarose gel. About 1 µg of total RNA from each sample was first treated with RQ1 RNase-Free DNase (Promega, Madison, WI, USA) and then used to synthesize cDNA with the PrimeScript™ 1st strand cDNA Synthesis Kit (TaKaRa, Kyoto, Japan) and random 6-mers.

Quantitative PCR and Data Analysis
Fifteen crustin-like genes were selected to examine their immune responses to V. parahaemolyticus or WSSV challenge. The qPCR primers (Table 2) for the selected nucleotide sequences were designed with Primer Premier 5.0 (Premier Biosoft, USA). The qPCR reactions were carried out using SYBR (TOYOBO, Japan), and the 18S rRNA gene was used as the internal control gene. The relative gene expression levels were calculated using the comparative Ct method with the formula $2^{-\Delta\Delta Ct}$. All data were acquired from at least three parallel tests in separate tubes. An independent-samples t-test in SPSS 21.0 was used to analyze the difference between the challenge group and the PBS group at each time point. Differences were considered significant at p < 0.01.
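The comparative Ct calculation can be written out explicitly. In the sketch below, the function name `relative_expression` and the Ct values are illustrative assumptions; the reference gene corresponds to 18S rRNA and the control group to the PBS group described above.

```python
def relative_expression(ct_target_challenge, ct_reference_challenge,
                        ct_target_control, ct_reference_control):
    """Relative expression by the comparative Ct method, 2^(-ddCt).

    dCt  = Ct(target gene) - Ct(reference gene), within each group
    ddCt = dCt(challenge group) - dCt(control group)
    """
    delta_ct_challenge = ct_target_challenge - ct_reference_challenge
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_challenge - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Made-up Ct values: the target gene crosses threshold two cycles
    # earlier in the challenge group at equal reference-gene Ct.
    print(relative_expression(22.0, 10.0, 24.0, 10.0))
```

A value above 1 indicates upregulation relative to the control group, and a value below 1 indicates downregulation.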

Ethical Statement
The present study used shrimp as experimental animals, which are not endangered invertebrates. In addition, there was no genetically modified organism used in the study. According to the national regulation (Fisheries Law of the People's Republic of China), no permission was required to collect the animals and no formal ethics approval was required for this study.

Conclusions
In conclusion, the present study identified 34 transcripts encoding full-length crustin-like genes in the shrimp L. vannamei. They were classified into six types based on their N-terminal regions, including two newly reported types and two new sub-types. The domain composition and phylogenetic analyses indicated that the different types of crustins in crustaceans might have originated from the WAP core region through sequence insertion, duplication, deletion, and amino acid substitution. This hypothesis provides a preliminary understanding of crustin evolution in crustaceans and needs more evidence to support it. These multiple genes and their expression patterns reveal the sequence and functional diversification of crustins in crustaceans. Future investigations of the antimicrobial activities and biological functions of the newly reported crustins are needed to improve the understanding of the physiological roles of crustins in crustaceans and to support the development of new antimicrobial drugs.