Similarities between decapod and insect neuropeptidomes

Background. Neuropeptides are important regulators of physiological processes and behavior. Although they tend to be generally well conserved, recent results using trancriptome sequencing on decapod crustaceans give the impression of significant differences between species, raising the question whether such differences are real or artefacts. Methods. The BLAST+ program was used to find short reads coding neuropeptides and neurohormons in publicly available short read archives. Such reads were then used to find similar reads in the same archives, and the DNA assembly program Trinity was employed to construct contigs encoding the neuropeptide precursors as completely as possible. Results. The seven decapod species analyzed in this fashion, the crabs Eriocheir sinensis, Carcinus maenas and Scylla paramamosain, the shrimp Litopenaeus vannamei, the lobster Homarus americanus, the fresh water prawn Macrobrachium rosenbergii and the crayfish Procambarus clarkii had remarkably similar neuropeptidomes. Although some neuropeptide precursors could not be assembled, in many cases individual reads pertaining to the missing precursors show unambiguously that these neuropeptides are present in these species. In other cases, the tissues that express those neuropeptides were not used in the construction of the cDNA libraries. One novel neuropeptide was identified: elongated PDH (pigment dispersing hormone), a variation on PDH that has a two-amino-acid insertion in its core sequence. Hyrg is another peptide that is ubiquitously present in decapods and is likely a novel neuropeptide precursor. Discussion. Many insect species have lost one or more neuropeptide genes, but apart from elongated PDH and hyrg all other decapod neuropeptides are present in at least some insect species, and allatotropin is the only insect neuropeptide missing from decapods. This strong similarity between insect and decapod neuropeptidomes makes it possible to predict the receptors for decapod neuropeptides that have been deorphanized in insects. This includes the androgenic insulin-like peptide that seems to be homologous to drosophila insulin-like peptide 8.


INTRODUCTION
Lobsters, crayfish, prawns, crabs and shrimps are all crustaceans belonging to the order of the decapods. Many of these species are part of the human diet, are sometimes a major source of protein and are often considered a delicacy. While some species are caught in the wild, others, e.g., the freshwater prawn Macrobrachium rosenbergii, are mainly obtained from commercially important cultures. Many of these species are also sufficiently large to allow physiological experiments that are more difficult to perform on smaller arthropods. For these reasons, decapods probably constitute the second best studied group of arthropods after insects. Neuropeptides have also been extensively researched in decapods, and several neuropeptides were initially identified in these crustaceans before they were found in other arthropods such as insects (e.g., Kegel et al., 1989;Stangier et al., 1987;Stangier et al., 1992).
I have previously used the published short read archives for P. clarkii to look for neuropeptide transcripts and could deduce complete or partial sequences for a surprisingly large number of neuropeptide precursors (Veenstra, 2015). When comparing the results obtained in this species, with the lists of neuropeptide transcripts from other decapods, differences appear. While several neuropeptides are consistently found in all species, others are only identified in some. The question is whether these differences are real or represent artefacts. For example, some peptides may not have been searched for in the assembled reads, or there were simply too few reads to allow assembly of a contig, while in other cases the tissue where the particular gene is predominantly expressed was perhaps not included in the analysis. Here I try to answer these questions by reanalyzing published short sequence read archives for a number of decapods.
The E. sinensis genome was downloaded from http://gigadb.org/dataset/100186, made into a BLAST database and searched for neuropeptide genes as described previously (Veenstra, 2014).

Data analysis
The fasta files were extracted from the SRAs using the fastq command from the SRA Toolkit from NCBI and then made into BLAST databases using BLAST+ (Camacho et al., 2009). Using the P. clarkii predicted neuropeptide precursors, as well as a few other peptides, as queries those databases were then searched using the tblastn command. A few neuropeptide receptors were also analyzed. Identified reads that appeared to belong to the orthologous gene were extracted from the database and then used to identify similar reads using the blastn command. The latter were used as input for the Trinity program (Grabherr et al., 2011) and resulting transcripts were recursively used as input until either the transcript stopped increasing in length or it was judged to be complete based on the location of in-frame stop codons and/or a signal peptide at the N-terminal of the protein predicted from the transcript. Calculations were run on a desktop computer with a AMD FX TM -6100 six-core processor and 15.4 Gb of memory under Ubuntu Linux.
This method is very efficient for the extraction of transcripts from single copy genes. However, when there are several paralog genes that have not evolved a lot since their separation, some paralogs may be missed, particularly when their expression levels are low. In those cases, a selection of the particular neuropeptide precursors from which the non-conserved regions (such as the signal peptides) had been removed was used as a query in a tblastn command and all the obtained reads were then fed as input to the Trinity program. It can not be excluded that some less well expressed paralogs of those genes that exist in multiple copies have been missed. This may concern neuroparsin, CHH (crustacean hyperglycemic hormone), PDH (pigment dispersing hormone) and possibly CFSH (crustacean female sex hormone).
Clustal Omega (Sievers et al., 2011) was used for sequence alignments and those were inspected and, when needed, manually corrected using Seaview (Gouy, Guindon & Gascuel, 2010), which was also used to extract the regions for making phylogenetic trees with FastTree (Price, Dehal & Arkin, 2010).

RESULTS
Trinity is a fantastic tool to reconstruct large DNA sequences from very short reads. However, not every sequence corresponds necessarily to a correct cDNA sequence or is biologically interesting. One regularly finds more than one sequence derived from the same gene. In the absence of a genomic sequence, as is the case here, it is not always possible to determine which is the correct one. There are several common causes for the failure to produce a single complete cDNA sequence. First, there may simply be insufficient reads available to produce a complete contig. Secondly, there may be allelic variation that causes the elongation to stop. Thirdly, alternative splicing, as is the case for genes encoding the agatoxin-like peptide, Neuropeptide F 1, CNMamide, calcitonin and CHH, may have the same effect. Fourth, recombining short sequences into a long one becomes very difficult in the case of repetitive sequences. One or more reads containing a sequencing error can aggravate some of the other problems, i.e., lack of sufficient reads, alternative splicing or allelic variants.
Most of the data analyzed here come from natural or almost natural populations that show much larger genetic variation than that found in the typical laboratory animals like mice or rats. Furthermore, many neuropeptide genes code for a number of highly similar neuropeptide paracopies and this makes it no doubt difficult to reconstruct the complete cDNA encoding such precursors and when the various copies are only separated by convertase cleavage sites, the problem may become acute. In one attempt to produce the E. sinensis FMRFamide precursor mRNA Trinity produced a partial transcript that had a perfect internal repeat of 164 nucleotides (Fig. S1), that must have been an artefact; a similar phenomenon is also present in the second predicted orcokinin precursor from S. paramamosain (Bao et al., 2015;Fig. S1). Furthermore, I have previously shown that some neuropeptide genes have alleles that differ in the number of neuropeptide paracopies that they encode (Veenstra, 2010a;Veenstra 2015). It is therefore not surprising that a relatively large number of transcripts for neuropeptide precursors containing multiple paracopies, such as FMRFamide, tachykinin, leucokinin, EFLamide etc, are incomplete even though significant numbers of individual reads are found in the various SRAs. Predictions by Trinity of neuropeptide precursors containing various paracopies may, for the same reasons, contain errors. For example, the allatostatin A precursor from C. maenas does not code for some of the previously identified peptides from this species (Duve et al., 1997), while the Trinity transcripts of several other neuropeptide precursor sequences from the same species that have been obtained by screening of classical cDNA libraries are identical (Klein et al., 1992;Klein et al., 1993;Linck et al., 1993;Chung et al., 2006;Wilcockson & Webster, 2008). Other transcripts that are incomplete are often due to low expression levels.
While this work was in progress a draft genome for E. sinensis sinensis was published (Song et al., 2016). This sequence was prepared using short sequence reads and therefore suffers from the problems associated with this methodology (Richards & Murali, 2015).
It is estimated that about 67% of total sequence is present in the current draft. Several of the transcripts identified here are not at all or only partially present in this genome and different exons of the same transcripts are regularly found on different contigs. Its usefulness was, therefore, limited.
The decapod neuropeptide genes that were found are indicated in Fig. 1, where for comparison the presence of neuropeptide genes of Daphnia pulex, a crustacean, and two insects, the termite Zootermopsis nevadensis and the fruit fly Drosophila melanogaster, is also shown. Note that although most neuropeptides have a single name, a few are known by different names, e.g., the natalisins are sometimes called WXXXRamides and some researchers refer to the peptides derived from the neuropeptide like precursor (NPLP) as HIGSLYRamides. Another example is the insect CHH homolog that was initially discovered by its effect on ion transport in the hindgut and is therefore called ion transport peptide (ITP).
Many of the neuropeptide precursor transcripts seem complete, at least as far at the coding region is concerned, while for others very significant parts were found. Since one of the questions raised here is the presence of a particular neuropeptide gene, I have also added neuropeptide genes for which individual reads from an SRA provide evidence for its existence in the particular species, even though Trinity produced no contigs for transcripts from these genes. All the sequences, both DNA and deduced amino acids, are listed in Tables S1-S8.

Distribution
Having all the SRAs it seemed interesting to look at where the various genes might be expressed. Although it is possible to do this for all species involved, some are not very interesting as there is a very limited number of tissues sampled, while in other species the different tissues were sampled on different occasions and analyzed in different fashions, making direct comparisons difficult. However, in the case of C. maenas a single publication Figure 1 Overview of the presence of neuropeptide genes in seven decapods, Daphnia pulex and two insect species. Dark blue, neuropeptide precursors that have been published previously; light blue, neuropeptide precursors (or significant parts therefore) that can be deduced directly from publicly available TSAs; red, precursors assembled here; yellow, precursors that could not be assembled, but for which individual reads in TSAs demonstrate their existence in the particular species. Asterisks indicate the existence of more than one gene.
reports SRAs for a large variety of tissues (Verbruggen et al., 2015). Therefore, I used this species to look at the expression of the various neuropeptide genes in different tissues. Those neuropeptide receptors for which a contig of a significant size could be obtained and for which a likely ligand could be deduced based on homology to a deorphanized protostomian GPCR (see Veenstra, 2016a) were also included. The actual number of individual reads is often small and quantification of RNAseq reads is tricky due to the PCR amplification protocol used to create these libraries. Furthermore, the actual number of samples remains very small, so care should be taken in the interpretation of these data. Nevertheless, some interesting results are apparent (Table S9). Both the neuroparsins and the CHHs are expressed in virtually every tissue. In the case of the neuroparsin it is the neuroparsin 1 gene that is most abundantly expressed in all tissues, with the other two neuroparsin transcripts present at much lower levels. However, the two identified CHH transcripts are differentially expressed, one hormone is most abundant in the central nervous system and the eye (eyestalk), and the other in the intestine. Other neuropeptides found in the intestine are tachykinin, allatostatin C (Veenstra, 2016b), the B transcript of the calcitonin gene, elevenin, orcokinin, the agatoxin-like peptide and hyrg. The expression of CCHamide 2 and trissin in the hemolymph, presumably in hemocytes, is interesting to note as is the relatively large number of neuropeptides found in the SRA derived from the epidermis.

Paralogs
There are several neuropeptide genes that have one or more paralog genes. These are allatostatin C, CHH, moult inhibiting hormone (MIH), CCHamide, eclosion hormone, neuroparsin, PDH, insulin and CFSH. In some cases these are sufficiently different within the same species and sufficient similar between different species, that they clearly derive from different genes. This is the case for allatostatin C, CCHamide, insulin, neuropeptide F and eclosion hormone.
In the case of PDH, it is a bit more complicated. Variable numbers of PDH precursors were found in the seven decapod species. One group of precursors encoding PDH-like peptides distinguishes itself by an insertion of two-amino-acid residues in the predicted mature PDH. Such a predicted peptide was first found in P. clarkii (Veenstra, 2015), but since it was based on a single read in one species, it seemed premature to give it a distinct name. Now that complete precursor sequences are available and this peptide appears to be ubiquitously present in decapods, I propose to call it elongated PDH, or ePDH, to distinguish it from the more classical forms of these peptides (Fig. 2). The ePDH gene is one of the few genes that is present on a single contig of the draft genome from E. sinensis. It consists of three exons of which the first one is non-coding (Fig. 3). Partial sequence for one of the classical PDH genes show the intron between the two coding exons to be conserved.
In the case of neuroparsin, PDH, CHHs and its homolog MIH it is not always as clear that they represent different genes with unambiguous orthologs in different species. In some cases the observed differences could reflect allelic variations of a single gene or recent local gene duplications. Although no decapod genomes have been completely sequenced and the E. sinensis CHH genes are mostly very fragmentary, such local gene duplications are well known for CHH in decapods (Gu & Chan, 1998;Gu, Yu & Chan, 2000;Dircksen et al., 2001;Webster, Keller & Dircksen, 2012) as well as Chelicerates (Veenstra, 2016b) and particularly in decapods the number of CHH genes can be quite large (Webster, Keller & Dircksen, 2012).

CHH/MIH
The CHH/MIH neuropeptide family is characterized by CHH, MIH, mandibular organinhibiting hormone (MOIH), vitellogenesis-inhibiting hormone (VIH) and gonadinhibiting hormone (GIH). These hormones have been identifed by different physiological assays, but are in many cases pleiotropic. For example, although identified in different Figure 2 Sequence alignment of PDH and ePDH. Parts of the various PDH precursors including the convertase cleavage sites of the various decapod species. Note that the ubiquitous presence of ePDH that has a two-amino-acid insertion. biological assays, VIH and GIH are the same hormone. These peptides can be subdivided in two subfamilies, the CHHs proper and the other peptides. The precursors from the two groups differ in that the CHHs are produced together with a CHH-precursor related peptide, while the prepropeptides from the other homornes consist exclusively of a signal peptide and the sequence of the mature hormone (Webster, Keller & Dircksen, 2012). Four of the CHH/MIH transcripts identified here defy those rules as they do not have the CHH-precursor related peptide, yet on phylogenetic trees they form a separate branch that is closer to the CHH than to the MIH cluster (Fig. 4). Adding more sequences to the tree does not change this (data not shown). In the E. sinensis draft genome many sequences The DNA sequence containing the TATA box and a sequence that is recognizably similar to the Drosophila core promoter motif 1 (in blue, Ohler, 2006) and the start of the mRNA (in red) are also displayed; the red nucleotides at the end are part of the mRNA.
corresponding to these hormones are located on small scaffolds making it impossible to ascertain whether or not these genes are clustered.

CFSH
The CFSH is a recent discovery (Zmora & Chung, 2014) and consequently we still know very little of this very interesting hormone. In P. clarkii three related proteins were previously identified (Veenstra, 2015). In five of the other six decapod species two to four such hormones were found, but not in H. americanus, where there are no ovary transcriptomes. The primary sequence of these different putative hormones is not very well conserved, but the cysteine residues are (Fig. 5). The phylogenetic tree of these hormones suggests an initial gene duplication giving rise to two types of CFSH, that I have arbitrarily called CFSH 1 and 2 (Fig. 6). In most species both CFSH 1 and 2 were found, but in L. vannamei only three CFSH 1 paralogs were found and no CFSH 2. In the draft genome of E. sinensis CFSH 1 and 2b genes contain a single coding exon. The CFSH gene 2a transcript is incomplete and it is not clear from the genomic sequence what it is. This hormone was initially isolated from the eyestalk of the crab Callinectes (Zmora & Chung, 2014), while it in the crayfish P. clarkii it seemed to be strongly expressed in the ovary. It seemed therefore of interest to see whether these hormones might be expressed in the ovary in other decapods also. No significant expression was found in the ovaries of M. rosenbergii and L. vannamei (1 to 2 reads maximum for each hormone in an SRA), but 9 reads corresponding to L. vannamei CFSH 1c (as well as 1 each for 1a and 1b) are present in SRR2060962 from the L. vannamei testis. In E. sinensis expression is similar to that in C. maenas (Table S9), high expression levels in the eyestalk and a few reads only in the ovary. For S. paramamosain and H. americanus there are insufficient data to answer this question.

Neuroparsins and receptors
Three to four neuroparsin transcripts were identified in each of the seven decapod species. Three of the E. sinensis genes were found in the draft genome, two of these (neuroparsins Hormones are those identified from decapod SRAs as well as a few for which the biological activity has been described. Highlighted in yellow are the four sequences that on the tree are more similar to CHH, but lack the precursor-related peptide typically present in CHH. 3 and 4) are on the same scaffold in a tail to tail configuration, where the start and stop codons of the two genes are separated by 11,960 and 9,045 nucleotides respectively (Fig. 7). These are the two E. sinensis genes that are most similar to one another, suggesting that they may reflect the most recent neuroparsin gene duplication in this species. As both these genes seem to have direct orthologs in S. paramamosain and C. maenas, that particular Figure 5 CFSH alignments. Note that the two types of CFSH are clearly different from one another. Scylla paramamosain is not included in the figure, as there are only a few individual reads from this species. These reads correspond to orthologs of the three crab CFSHs; those that correpond to S. paramamosain 2a predict a protein sequence that is completely identical to the S. olivacea sequence illustrated. gene duplication possibly occurred in a common ancestor of the three crab species (Fig. 8). The neuroparsin receptor was recently identified as a venus kinase receptor (Vogel, Brown & Strand, 2015); two such receptors are found in all seven decapod species (Table S8). The phylogenetic tree made of the various venus kinase receptors suggests that the other arthropods venus kinase receptors are equally similar to both decapod receptors (Fig. 9).  The different decapod neuroparsin sequences found in the different species were used to make a phylogenetic tree. Note that one neuroparsin gene duplication likely occurred after the crabs separated from the other decapods.

Insulin-like peptides and receptors
Three different insulin-related peptides were identified. These are the well known androgenic insulin-related peptide (Fig. 10), an insulin-like peptide (Fig. 11) that seems most similar to the Drosophila insulin-like peptides 1-6 (Nässel & Vanden Broeck, 2016), and a peptide that is orthologous to Drosophila insulin-like peptide 7 and that has been called relaxin (Fig. 12). The latter was previously identified in Sagmariasius and P. clarkii (Chandler et al., 2015;Veenstra, 2015). As can be seen from the figures, the androgenic insulin-like peptide is the least conserved of those three (Figs. 10-12). Insulin-related peptides use two different types of receptors, the typical tyrosine kinase receptor and GPCRs. Insects typical have one gene coding an insulin tyrosine kinase receptor and have one or two GPCRs that are related to the vertebrate relaxin receptors RXFP1 and RXFP2. Given the interest in the androgenic insulin-like peptide both for its intriguing physiology as a peptidergic sex hormone and for its commercial potential (Ventura and Sagi, 2012), I also analyzed the likely insulin receptors. Figure 9 Phylogenetic tree of the tyrosine kinase domains of the decapod insulin and venus kinase receptors. Venus kinase receptors from the following species were added for increased resolution: Limulus polyphemus, Stegodyphus mimosarum, Locusta migratoria, Ixodes ricinus and Zootermopsis nevadensis. Note that the duplication of the venus kinase receptor gene is not generally present in arthropods and could thus be specific to crustaceans.
The typical insulin tyrosine kinase receptor, similar to the one recently described from M. rosenbergii (Sharabi et al., 2016), was also found in the other six decapods (Table  S8). Two receptors similar to the vertebrate relaxin receptors RXFP1 and RXPF2 were identified. Those two GPCRs are most similar to the Drosophila receptors CG31096 and CG34411, also known as leucine-rich repeat containing GPCR-3 and 4 (LGR3 and LGR4) respectively. However, the much weaker expression of those receptors made it impossible to deduce their complete cDNA sequences and in some cases no contigs could be obtained. Interestingly the SRA from the E. sinensis accessory gland (SRR2170964) not only shows very large number of reads for the androgenic Insulin-like peptide, but also very significant

Splice variants
There were a number of neuropeptide encoding cDNAs that revealed splice variants. Those that concerned the untranslated regions were ignored, but there are five neuropeptide genes that have alternative transcripts producing different precursors: the CHHs, CNMamide, neuropeptide F 1, calcitonin and the agatoxin-like peptide. In the case of neuropeptide F 1, there is an extra exon sliced into the sequence of the peptide, as described previously from insects and D. pulex (Roller et al., 2008;Nuss et al., 2010;Dircksen et al., 2011). The CNMamide gene in the termite Zootermopsis contains five coding exons, the last two of which are alternatively added to the first three and then produces a different CNMamidelike peptide. In four of the seven decapods, similar alternative splice products were found for the CNMamide precursor. However, while the mature peptide derived from the major splice form is well conserved, the second is much less so (Fig. 13). Two to four splice variants (Fig. S2) were found for the recently discovered µ-agatoxin-like peptide (Sturm et al., 2016). As in some insects (Veenstra, 2014), the calcitonin gene produces two different transcripts, encoding different types of calcitonin, that are similar to the insect calcitonins (Fig. S3). In L. vannamei, M. rosenbergii, H. americanus and P. clarkii the second transcripts are predicted to produce a calcitonin-like peptide that does not have one but two cysteine bridges at Figure 11 Sequence alignment of the decapod insulin-like peptides. Note the much better conservation of the primary sequences of the A and B chains of these hormones. The Carcinus sequence, although incomplete, is clearly part of an insulin precursor. Conserved residues indicated in black highlighting, and cysteine residues in red.
is N-terminus (Fig. 14). The calcitonin gene is absent from the E. sinensis draft genome, and hence it is impossible to compare the insect and decapod calcitonin gene structures.

Other peptides
In several cases novel neuropeptides have been detected by mass spectrometry. These are often structural variants of well known neuropeptides such as the RFamides, tachykinins and allatostatins A or B (e.g., Ma et al., 2008;Ma et al., 2009;Ma et al., 2010). However, not all peptide sequences identified this way belong to known neuropeptides. From H. americanus, C. maenas and L. vannamei other peptide sequences have been reported. The ones from C. maenas have previously been suggested to represent fragments of cryptocyanin (Ma et al., 2009), and this was confirmed (Fig. S4). Several of the peptides from H. americanus are shown here to represent fragments of thymosin, actin or histone 2A, however the origins of others remain unclear (Fig. S4). The one peptide reported from L. vannamei, L/IPEPEDPMAEAGHEL/I (Ma et al., 2010), is more interesting, as it could potentially be part of a novel neuropeptide (precursor). This sequence is part of a small protein that has a signal peptide followed by a peptide containing a small piece that is very well conserved (Fig. 15). However, it lacks the classical convertase cleavage sites that Figure 12 Sequence alignment of the decapod relaxins. Note the relatively good sequences conservation between the different decapod peptides and Dilp-7. Conserved residues indicated in black highlighting, and cysteine residues in red.

Figure 13
Last parts of CNMamide precursors. Some arthropods produce alternatively spliced mRNA predicted to produce different CNMamides. Notice that the major splice variants produce a much better conserved neuropeptide than the alternative ones. Residues in red are predicted to be cleaved by convertase and removed by carboxypeptidase during processing; the green glycine residues will be transformed in C-terminal amides and the cysteine residues are orange. Residues conserved between the different species are in blue. one usually finds in neuropeptide precursors and hence its status as a neuropeptide is unclear. Such proteins are also found in the other decapods. Although it was not possible for Trinity to produce a complete contig for S. paramamosain, a similar sequence is present in the databanks for Scylla olivacea. I was unable to find similar proteins in insects, but an orthologous protein was detected in the SRA from Euphausia crystallorophias. The latter sequence shows that the only conserved part is the same as in the decapods (Fig. 15). This peptide was called hyrg (pronounced hirg), for four of the conserved amino acids. Interestingly, the eyestalk seems to be the tissue where expression of hyrg is the highest (Table S9), thus suggesting that it is likely a neurohormone.

DISCUSSION
Insects and decapods are estimated to have had their last common ancestor about 596 Mya, while similar estimates for the common ancestor of crabs and lobsters on the one hand and that for termites and flies on the other are 322 and 348 Mya respectively (Hedges et al., 2015). The gross morphology of decapods has changed a lot less than that of insects and when one compares the respective neuropeptidomes of those two groups, it is clear that those are similarly much better conserved in decapods than in insects (Fig. 1). Most of the changes in insects are losses of neuropeptides that are particularly pronounced in flies, and perhaps even more so in D. melanogaster.
Whenever in this study a particular gene has not been identified from a decapod species, either one of the following is true: (1) the gene is not expressed at high levels and there are relatively small amounts of RNAseq reads; (2) the gene is expressed predominantly in tissues that have not been sampled in the species in question; or (3) the gene has several paralog genes (PDH, CHH, neuroparsin) and it may not have the same number of paralogs in all species and/or some of those paralogs may be expressed in tissues that were not sampled. A Figure 15 Hyrg sequence alignment. Note that only a small part of the sequence of this puative neuropeptide is conserved in both decapods as well as in Euphausia crystallorophias. combination of (1) and (2) likely explains the absence of some of the neuropeptide genes in S. paramamosain. From that species the eyestalk was not analyzed, even though this tissue is by far the richest source of neuropeptides. Nevertheless, the existence of several S. paramamosain neuropeptide genes could be inferred from individual RNAseq reads, while the few genes that are completely lacking are only weakly expressed in the other species. The androgenic peptide was found neither in C. maenas nor H. americanus. As in H. americanus only the nervous system was included in the analysis, this is to be expected. In the case of C. maenas, it is plausible that the testis samples did not include the accessory gland, or that the sample was taken at a moment in the life cycle of the animal that expression of this peptide is low or non-existent. With one exception, in all other instances where a transcript seems to be lacking it is either from a gene for which an alternative transcript was found (e.g., in the case of the CNMamide and Neuropeptide F 1 genes), or the number of paralogs may differ between the various species (neuroparsin, CHH, MIH, PDH). The only exception is a typical MIH in H. americanus. In spite of using the MIH sequence of the closely related species H. gammarus (Ollivaux et al., 2006) as a query in the BLAST program, no transcript was found in the H. americanus SRAs. Although MIH has been reported by mass spectrometry from the stomatogastric ganglion of H. americanus (Ma et al., 2008), this peptide is not a typical MIH but has the structure of a CHH (Chang, Prestwich & Bruce, 1990). This suggests that in H. americanus the MIH function is assured by a member of the first rather than the second subfamily of these hormones. It also suggests that in this species the second subfamily was lost.

Neuropeptide evolution
It thus appears that the neuropeptidome of decapods has been remarkably well conserved during evolution. Differences that are found between the insect and decapod neuropeptidomes are the loss or the gain of a neuropeptide. Although there possibly still remain arthropod neuropeptides to be discovered, it appears that the loss of neuropeptides in decapods is limited to a single neuropeptide, i.e., allatotropin. Allatotropin is present in mollusks, annelids as well as chelicerates (Veenstra, 2010a;Veenstra, 2011;Veenstra, 2016a) and hence, it must have been present in the arthropod ancestor. Small peptides are sometimes hard to find using the BLAST program and allatotropin is no exception to this rule (Veenstra, Rodriguez & Weaver, 2012). Nevertheless, as I was neither able to find even a single read correponding to its receptor, including in the very abundant number of transcriptome reads from H. americanus, I conclude that this peptide was most likely lost. In the termite and the fruit fly on the other hand, more neuropeptides are missing, particularly in Drosophila. At first sight insects, as a group, lack EFLamide, the androgenic insulin-like hormone, CFSH and ePDH. However, the recent identification of an EFLamide receptor in Platynereis dumerlii as a TRH GPCR ortholog (Bauknecht & Jékely, 2015) and the presence of such a GPCR in Nilaparvata lugens (Tanaka et al., 2014) suggests that some insects may have such a gene. As described below, it is plausible that the androgenic peptide has an insect ortholog. What seems really different is that many insect species, in particular holometabolous species, have lost several neuropeptides (Derst et al., 2016). Thus Drosophila no longer has genes for elevenin, vasopressin, allatotropin, allatostatin CCC, EFLamide, neuroparsin, calcitonin, ACP, eclosion hormone 2, neuropeptide F 2 and it also lost the possibility to produce alternative transcripts from the CNMamide and neuropeptide F1 genes. The beetle Tribolium castaneum on the other hand still has most of those neuropeptides, but lost allatostatin A, corazonin and leucokinin.

New neuropeptides
Since the last common ancestor of decapods and insects-estimated to have lived 596 Mya (Hedges et al., 2015)-very few neuropeptides seem to have been added to either of the two lineages. Novel neuropeptide genes that have appeared seem all to have originated by duplication from existing ones and are easily recognized as the paralogs of the parent genes. Examples of such genes are the various paralogs of CHH and MIH, PDH and neuroparsin in crustaceans and in insects the typtopyrokinin and SIFamide paralogs as well as the great variety of adipokinetic hormones (all orthologs of crustacean RPCH). The only exception may be hyrg, the precursor for the peptide initially identified from L. vannamei (Ma et al., 2010). This peptide, that is well expressed in the eyestalk and the midgut, has a distribution typical of a neuroendocrine peptide. As I was unable to find it outside of crustaceans, it could be a novel invention of this group. The structure of this putative neuropeptide precursor is somewhat reminiscent of limostatin, a small neuroendocrine protein discovered in Drosophila that intereacts with a GPCR (Alfa et al., 2015) previously identified as the receptor for neuropeptide pyrokinin 1 (Cazzamali et al., 2005). The similarity between limostatin and hyrg resides in the apparent absence of conventional convertase sites in these putative neuropeptide precursors (those postulated to function in the limostatin precursor (Alfa et al., 2015) seem highly unusual (Veenstra, 2000)). In the same context the Drosophila sex peptide comes to mind, as it also acts on a neuropeptide receptor without having neither a well conserved structure nor the typical neuropeptide convertase cleavage sites (Kim et al., 2010). Perhaps one or more of these proteins represent newly evolved ligands for existing neuropeptide receptors that could potentially become novel neuropeptides.

Missing neuropeptides
Many decapod neuropeptides have been identified by mass spectrometry over the years (e.g., Stemmler et al., 2007a;Stemmler et al., 2007b;Stemmler et al., 2010;Christie et al., 2008;Ma et al., 2008;Ma et al., 2009;Ma et al., 2010;Dickinson et al., 2008;Dickinson et al., 2009a;Dickinson et al., 2009b). Most of those were identified in the various SRAs, although not always in exactly the same molecular form. In particular, I was unable to find some of the analogs of SIFamide that have been reported (e.g., Hui et al., 2012). I could neither find [Val 1 ]-SIFamide in any species, however this peptide seems to be present in the stomatogastric nervous system (Christie et al., 2006) and this might explain its absence from the various SRAs. Several of the peptides previously described from these data that did not appear to be neuropeptides could be identified as being part of well known proteins and it also allowed me to identify the hyrg trancript. However, there are three neuropeptides that either have been reported or suggested to be present in decapods that were not found in any of the SRAs from the seven decapod species studied here. These are a pituitary adenylate cyclase activating polypeptide (PACAP) from L. vannamei (Lugo et al., 2013), a GnRH-like peptide from P. clarkii (Guan et al., 2014) and two kisspeptins from M. rosenbergii (Thongbuakaew et al., 2016). None of these peptides could be found in any of the SRAs, neither those from the species from which they were reported, nor from any of the other species. In two cases (PACAP and GnRH), the amino acid sequences of the peptides have been published from the same species used here, so my inability to find these peptides is not due to significant sequence differences between the species used for bioinformatic analysis and those from which the peptides were identified. I was neither able to find evidence for the receptors for such peptides in any of decapods. The GnRH receptor identified from the ovary of the oriental river prawn Macrobrachium nipponense is the corazonin receptor ortholog (Du, Ma & Qiu, 2015), implying that corazonin is its ligand. Given the strong conservation of the decapod neuropeptidome described here, I conclude that is highly unlikely that any of those three peptides is present in decapods.

Functional aspects
Conservation of structure does not necessarily imply conservation of function. The function of crustacean RPCH and its insect ortholog AKH are distinctly different. A neuropeptide sequence does not reveal its function, but the distribution of its receptor may give some clues. FMRFamide is known to effect muscle contraction in decapods (Worden, Kravitz & Goy, 1995), the expression of its putative receptor in muscle, heart and the epidermis (that contains muscle as well) suggests that it has similar effects. The simultaneous expression of elevenin and a putative elevenin receptor in the midgut suggests that is has a digestive function. The hormone GPA2/GPB5 was suggested to be an antidiuretic hormone in Drosophila (Sellami, Agricola & Veenstra, 2011) and was subsequently shown to stimulate sodium reabsorption in the mosquito hindgut (Paluzzi, Vanderveken & O'Donnell, 2014). The very abundant expression of its receptor in the gill suggests that its function C. maenas may well be similar. An interesting difference between insects and decapods is the presence of ecydysis triggering hormone in the decapod nervous system and eye(stalk); in insects this peptide seems to be exclusively present in cells associated with the tracheal system and absent from the central nervous system (Roller et al., 2010). It will be interesting to know whether the function of ecdysis triggering hormone within the decapod nervous system is related to ecdysis behavior.

Intestine
Neuropeptides in the intestine are typically produced by enteroendocrine cells. CHH (Chung, Dircksen & Webster, 1999), SIFamide and tachykinin immunoreactive enteroendocrine cells (Christie et al., 2007) have been previously described from decapods. No SIFamide reads were found in the C. maenas intestine SRA, but allatostatin C, calcitonin-B, elevenin, orcokinin and hyrg were all present in seemingly significant numbers of reads (Table S9). This ensemble of gut neuropeptides differs significantly from what is known from the Drosophila midgut (Veenstra & Ida, 2014), even though tachykinin, allatostatin C and orcokinins are present in both, while the calcitonin B transcript is abundant in phasmid midgut SRAs (Veenstra, 2014).

CHH and MIH
The neuropeptides related to CHH are amongst the best known crustacean hormones (excellent review by Webster, Keller & Dircksen, 2012). As was expected based on the literature, several molecular forms were found. There are reasons to think there may be more of these hormones than reported here. First of all, the few decapod CHH genes that have been identified are typically present in clusters and in Metapaeneus ensis 16 such genes have been found (Gu & Chan, 1998). Secondly, as shown here and elsewhere (e.g., Hsu et al., 2006;Li et al., 2010;Ventura-López et al., 2016) some of these genes are differentially expressed. Thus, if a gene is predominantly expressed in a tissue not included in the analysis, it may not be found. Finally, since these hormones are similar in structure, it is possible that Trinity would have problems producing all contigs.
The biological activities of these hormones vary widely and the hormones with very similar sequences may have quite different physiological effects (e.g., Webster, Keller & Dircksen, 2012;Luo et al., 2015). It is for this reason impossible to interpret the meaning of the four predicted hormones that defy classification as either a CHH-like or MIH-like hormone (Fig. 5).

PDH
There are generally within the same species several precursors coding the shorter, more classical, PDHs, those different precursors code sometimes for the same mature peptide. It seems plausible that some of these differences reflect either allelic variations of a single gene or recent local gene duplications. Most of the species have two or more different predicted mature PDH peptides. It has previously been shown that the two PDHs from the crab Cancer productus have different functions, one is released as a hormone into the hemolymph, while the other is used within the central nervous system (Hsu et al., 2010). As the tissue used for the S. paramamosain transcriptome did not include the eyestalk it is thus not surprising that the hormonal PDH is lacking from the deduced transcriptome in this species. ePDH is not expressed in the eyestalk and one might therefore be tempted to think it is not released into the hemolymph. However, it is present in the L. vannamei transcriptome that was produced from abdominal muscle, hepatopancreas, gills and pleopods (Ghaffari et al., 2014) and thus is likely produced somewhere in the periphery (this transcriptome contains relatively few neuropeptides as it includes neither the central nervous system nor the intestine).

CFSH
CFSH was discovered very recently in the crab Callinectes sapidus (Zmora & Chung, 2014) and consequently we know still very little of this extraordinarily interesting hormone. I previously reported the presence of both CFSH and two homologous proteins in P. clarkii (Veenstra, 2015). Now that there are a few more sequences available, it appears that this gene commonly has several paralogs. Some of these seem to have a relatively recent origin, as the most closely related sequence comes from the same species (Fig. 6). The independent gene duplications of these proteins as well the great sequence variability between and within species may indicate that all these hormones act on the same receptor. Given the relatively large size of these hormones one might expect a leucine rich repeat G-protein coupled receptor or a dimeric protein kinase, perhaps one of the two venus kinase receptors, but this remains speculation. The primary structure of CFSH is not very well conserved and its receptor is unknown. Hence, we don't know whether an orthologous hormonal regulatory system might be present in other arthropods, like e.g., insects (given the great similarity in their neuropeptidomes this seems a distinct possibility, at least in the more primitive insects). It seems that the expression of this hormone in the ovary of P. clarkii (Veenstra, 2015) is unusual, as it was not found in any of the other decapods for which an ovary SRA is available.

Insulin and neuroparsin
Other intriguing neuropeptides are the neuroparsins and the insulin-related hormones. There are three different insulin-like hormones. There are also three different insulin receptors, the classical tyrosine kinase and two G-protein coupled receptors. What I have called insulin is the hormone most similar to the Drosophila insulin-like peptides 1-6, which function as growth hormones and are also important for reproduction and that signal through the classical tyrosine kinase receptor (Nässel & Vanden Broeck, 2016).
The same receptor is also present in decapods as shown here and elsewhere (Veenstra, 2015); it has recently been characterized in two decapods (Aizen et al., 2016;Sharabi et al., 2016). Both insulin and neuroparsins activate tyrosine kinase receptors. However, whereas the actions of insulin in insects are relatively well understood due to very extensive research on these peptides in Drosophila (Nässel & Vanden Broeck, 2016), the function of neuroparsin is less clear, as it is absent from Drosophila melanogaster (Veenstra, 2010b). It is interesting to note that some species have several insulin genes and few if any neuroparsin genes (Drosophila, Acyrthosiphon, Zootermopsis), while decapods and Locusta have several neuroparsin transcripts and only a single insulin gene, suggesting some complementation between these two hormones. Indeed, in some cases, such as vitellogenesis in the mosquito both hormones have synergistic effects (Brown et al., 1998;Dhara et al., 2013), however in the migratory locust they act antagonistically (Badisco et al., 2011). Initially isolated from the migratory locust L. migratoria (Girardie et al., 1989) neuroparsin was shown to have strong anti-juvenile hormone effects, effecting both reproduction and metamorphosis (Girardie et al., 1987). It has been shown that neuroparsin RNAi also inhibits vitellogenesis, and hence reproduction, in the decapod Metapenaeus ensis (Yang et al., 2014). The receptor for this hormone was recently identified in mosquitoes as a venus kinase receptor (Vogel, Brown & Strand, 2015), a type of receptor that was lost in chordates during evolution (Dissous, 2015). Although orthologous venus kinase receptors are present in other arthropod genomes (notably Limulus, Strigamia and Stegodyphus, Table S8) as well as mollusks (Vanderstraete et al., 2013), no neuroparsin orthologs could be found in those species. The evolutionary origin of neuroparsin is therefore unclear and it is not known whether species that seem to lack neuroparsin need a hormone ligand to activate the venus kinase receptor (Dissous, 2015). The presence of two such receptors in decapods is intriguing, but has also been found in Lepidoptera and trematodes (Dissous, 2015).

The other insulin-like peptides
Insects and decapods share many neuropeptides and it is not surprising that the various decapod insulin-related hormones also have insect orthologs. The insulin-like hormone I have called relaxin is an ortholog of Drosophila insulin-like peptide 7. This hormone is not only present in insects, but also in ticks, spider mites, mollusks and even acorn worms (Veenstra, 2010a;Veenstra, Rombauts & Grbić, 2012). As previously pointed out, this hormone is only present in the genomes of those species that also have an ortholog of Drosophila gene CG34411, encoding LGR4 that is homologous to vertebrate relaxin GPCRs (Veenstra, Rombauts & Grbić, 2012;Veenstra, 2014). This suggests that this GPCR functions as a receptor for the arthropod relaxins. It must be noted that this does not exclude the possibility that arthropod relaxins may also signal through the classical insulin tyrosine kinase receptor. In fact, there is evidence from Drosophila that this is so (Linneweber et al., 2014).
Drosophila has an eighth insulin-like hormone that was initially discovered because it is secreted by the imaginal discs (Colombani, Andersen & Léopold, 2012;Garelli et al., 2012). However, data from fly atlas (Chintapalli, Wang & Dow, 2007) show that it is also expressed by the ovary. This hormone was suggested to be acting through the GPCR encoded by Drosophila gene CG31096 encoding LGR3 (Veenstra, 2014) and this has now been confirmed (Vallejo et al., 2015;Garelli et al., 2015).
LGR3 is also related to vertebrate GPCRs binding relaxin. As reported previously it has a P. clarkii ortholog (Veenstra, 2016a), and as shown here is generally present in decapods. Combined these data suggest that LGR3 is the receptor for the androgenic insulin-like peptide from the accessory gland. The absence of clear sequence homology between the Drosophila and decapod peptides is not surprising, as the primary sequence of this hormone is poorly conserved in both decapods (Fig. 10) and insects (other insects almost certainly have such a peptide, since they have the receptor, but only within flies is it possible to find orthologs using the BLAST program). Interestingly, both these hormones are produced by gonads or associated accessory glands. At first sight it seems that in crustaceans it is predominantly the male that produces it, while in adult flies it is the female. However, work on the expression of LGR3 in Drosophila shows it to be important for the development of both male and female specific sexual characters (Meissner et al., 2016) and it is perhaps better considered a (sexual ?) maturation hormone for both sexes. This would also make it easier to understand how during evolution it was coopted by the imaginal discs. In decapods the male has two Z chromosomes and is the default sex (Parnes et al., 2003;Staelens et al., 2008;Ventura et al., 2011;Cui et al., 2015). Therefore, one would expect females to have a mechanism (not necessarily hormonal) to escape becoming a male and might thus expect a gynogenic rather than an androgenic hormone (this is one of the reasons why CFSH is so interesting). Even in decapods there is now evidence that the androgenic insulin-like peptide is not specific for males. Thus, in S. paramamosain it is also expressed by the ovary and at higher levels at the end of vitellogenesis (Huang et al., 2014). While the relative levels of expression may seem low as compared to those of actin (Huang et al., 2014), the actual quantities of peptide produced may well rival those made by the accessory gland, considering that the ovary is so much larger (could the accessory gland be the remnants of an embryonic ovary anlage ?). Given the effectiveness of RNAi in crustaceans and the strong phenotypes obtained in the absence of androgenic peptide (Ventura et al., 2009), the hypothesis that LGR3 is important in the transduction of the androgenic peptide signal can be tested. As with relaxin, a GPCR specifically activated by the androgenic insulin-like peptide does not exclude the possibility that it may also act on the classical insulin tyrosine kinase receptor, as shown by recent experiments in the decapod Sagmariasus (Aizen et al., 2016). Possible relations between the decapod insulin-related peptides and their receptors are illustrated in Fig. 16.
It is of interest to note that the mammalian GPCR most similar to LGR3 is RXFP2, the receptor for insulin-related peptide 3. The latter hormone was initially discovered from the testis and is important not only to insure testicular descent (Adhama, Emmen & Engel, 2000) but also in the female reproductive system (Satchell et al., 2013). Thus the data suggest that not only the structures of the receptor and its ligand are recognizably similar, but so might be their function. This is rather interesting, as most neuropeptides with orthologs in both proto-and deuterostomia have quite different functions in these two groups.

CONCLUSIONS
Decapod neuropeptidomes are highly conserved and share many neuropeptides with insects. Although a shared neuropeptide structure does not necessarily translate into a shared function, it should allow for the rapid identification receptors in decapods in those cases where the orthologous insect receptor is known.