Comparative genomic analysis of innate immunity reveals novel and conserved components in crustacean food crop species

Background Growing global demands for crustacean food crop species have driven large investments in aquaculture research worldwide. However, large-scale production is susceptible to pathogen-mediated destruction particularly in developing economies. Thus, a thorough understanding of the immune system components of food crop species is imperative for research to combat pathogens. Results Through a comparative genomics approach utilising extant data from 55 species, we describe the innate immune system of the class Malacostraca, which includes all food crop species. We identify 7407 malacostracan genes from 39 gene families implicated in different aspects of host defence and demonstrate dynamic evolution of innate immunity components within this group. Malacostracans have achieved flexibility in recognising infectious agents through divergent evolution and expansion of pathogen recognition receptors genes. Antiviral RNAi, Toll and JAK-STAT signal transduction pathways have remained conserved within Malacostraca, although the Imd pathway appears to lack several key components. Immune effectors such as the antimicrobial peptides (AMPs) have unique evolutionary profiles, with many malacostracan AMPs not found in other arthropods. Lastly, we describe four putative novel immune gene families, potentially representing important evolutionary novelties of the malacostracan immune system. Conclusion Our analyses across the broader Malacostraca have allowed us to not only draw analogies with other arthropods but also to identify evolutionary novelties in immune modulation components and form strong hypotheses as to when key pathways have evolved or diverged. This will serve as a key resource for future immunology research in crustacean food crops. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3769-4) contains supplementary material, which is available to authorized users.


Background
The global human population is projected to escalate to 9.1 billion by 2050 [1]. With an increasing food consumption per capita and changing demands for animal proteins [2], there is a dire need for sustainable sources to avoid further degradation of the environment. It has been suggested that much of this may come from invertebrate sources including insects [3], but clearly crustaceans currently represent a source of protein that is more culturally palatable in Europe and North America. Crustaceans already represent a significant portion of marine aquaculture produce, with the predicted annual production exceeding 10 million tonnes and sales of $40 billion [4] that will continue to increase. The expansion of farmed crustaceans is not without major issues. It is estimated that up to 40% ($3 billion) of just shrimp production alone can be lost each year due to disease outbreaks [4]. Pathogens and diseases impacting crustaceans have been recently extensively reviewed [4][5][6][7][8][9]. Some of the most common diseases in decapod crustaceans are the white spot disease caused by the white spot syndrome virus (WSSV) in penaeid shrimps [8], yellow head disease caused by the Yellow head Virus [10][11][12], Taura syndrome caused by the Taura Syndrome Virus [13,14], fungal diseases in the Dungeness crab, Cancer magister [15][16][17], infections by the parasitic dinoflagellate Hematodinium sp. in crabs [18], the Panulirus argus virus 1 (PaV1) infection in lobsters [19] and bacterial diseases caused by Vibrio or Aeromonas [20]. There is broad agreement that without new interventions and better understanding of pathology and immune responses, current best practices for crustacean aquaculture cannot be improved. The use of antibiotics and chemical treatments for disease control in aquaculture is undesirable due to long-term economic and environmental ramifications [21][22][23]. Therefore, approaches that harness and aid the crustacean innate defence mechanism should be exploited to limit and prevent diseases, and therefore crop loss. For example, assays for the measurement of innate immune activity could provide early warnings for the presence of potential pathogens within closed aquaculture systems.
Systematic and cross-species characterisation of the crustacean immune system has not been performed, despite it being essential for the field to progress [24]. Previous comparisons amongst sequenced arthropod genomes of insects, chelicerates, the myriapod Strigamia maritima, the branchiopod Daphnia pulex and the amphipod Parhyale hawaiensis have recently revealed signatures of conservation and diversity in innate immunity components across arthropod phyla [25][26][27]. However, not much is known about the evolutionary events that define the immune system in malacostracans, or within the order Decapoda that includes crop species. The radiation of Pancrustacea (hexapods and crustaceans) has been estimated to be between~540 and~666 million years ago (mya) [28,29] while the split of Branchiopoda from Malacostraca was estimated at 614 mya [28]. Others have made estimates of similar divergence times based on crustacean hemocyanins [30]. Given the large evolutionary time scales involved, many lineage specific changes in immune system components within the Malocostraca may have occurred and using only the branchiopod D. pulex and a single malacostracan P. hawaiensis to define immune regulation is unlikely to provide either a rich or accurate picture. Ultimately this will require both comparative and functional genomics approaches to effectively understand and exploit the immune system. Due to potential importance of crustacean food sources, such studies are of high impact and urgency. Currently, the lack of a comprehensive comparative genomics study of immunity with the Malacostraca means that a clear staging point for underpinning this work is lacking.
Here, we address this major deficit by performing an in depth comparative study amongst the broader Malacostraca, including extant data from the order Decapoda that includes all the major food crop species (Additional file 1: Figure S1). A large number of relatively recent independent studies have started to generate publically deposited large transcriptomic data sets from food crop species and other related malacostracan species providing ample raw data for our study; complete set of references provided in Additional file 2: Table S1 [31][32][33][34][35][36][37][38][39][40][41]. We have annotated innate immunity genes and pathways from 69 Malacostraca transcriptome datasets from 55 species representing five Malacostraca orders: Amphipoda (7 species), Decapoda (18 species), Isopoda (27 species), Euphausiacea (2 species) and Mysida (1 species) (Additional file 1: Figure S1; Additional file 2: Table S1 and Additional file 3: Table S2) [42][43][44]. We used sequence, motif and domain similarity based approaches to identified 7407 genes, representing 39 immune gene families in the Malacostraca (summarised in Fig. 1). We annotate genes that encode pathogen recognition proteins, signalling components of key signal transduction pathways such as Toll, Imd and JAK-STAT, effector genes encoding proteins that perform immune protection such as antimicrobial peptides and members of the antiviral RNAi pathway. Within these key groups, we define malacostracan specific evolutionary events that suggest a previously unsuspected variation in immune gene content, and that functional genomic studies of immunity specifically with species in this group will be required for clear understanding of host defence in food crop species. A comparison across these data sets also allowed us to expand the annotation of previously discovered crustacean specific immune components, confirming their importance across the group. Finally, taking a conservative approach using orthology analyses and Pfam annotations in the sequenced amphipod genome of the crustacean P. hawaiensis [27], we describe four novel gene families with immune related protein domains conserved only within the Malacostraca. We show that these novel Malacostraca genes exhibit tissue-specific expression in the amphipod P. hawaiensis. Overall, our work provides a comprehensive picture of the Malacostraca innate immune system and a key staging point that will now facilitate important immunology research to underpin food crop aquaculture.

Results and Discussion
Pattern recognition receptors in malacostracans are dynamically evolving and exhibit family-specific expansions While vertebrates rely on adaptive immune systems and immunological memory mediated by secreted antibodies to ward off pathogens, many invertebrates, including arthropods use a pre-encoded set of proteins known as the pattern recognition receptors (PRRs) to recognise a broad spectrum of microbial ligands. Arthropods PRRs facilitate microbial killing through a range of direct and indirect mechanisms [45][46][47][48][49][50] upon the detection of nonself pathogen structures known as pathogen-associated molecular patterns (PAMPs) present on the surface of microbes [51]. Some examples of PAMPs include peptidoglycans (PGN) and lipotechoic acids (LTA) in Gram-positive bacteria, lipopolysaccharides (LPS) in Gram-negative bacteria and β-glucans from fungal cell walls [51,52]. We examined seven PRR families in Malacostraca, which included the Gram-negative binding proteins (GNBPs), Down syndrome cell adhesion molecules (DSCAMs), scavenger receptors (SRs), Domeless proteins (discussed in the signal transduction section), C-type lectins (CTLs), galectins and thioester-containing proteins (TEPs; Fig. 1; Fig. 2A). DSCAM, SRs and Domeless are transmembrane receptors (Fig. 2a). GNBPs can either be associated with the cell membrane via a glycosylphosphatidylinositol anchor or function as soluble receptors [53]. We identified 202 GNBPs, 49 DSCAMs, 443 SRs, 47 Domeless proteins, 1005 CTLs, 47 galectins and 432 TEPs in malacostracans (Additional file 4: Table S3).
A major component of arthropod immune systems that still requires further definition and is yet to be fully exploited is the DSCAM proteins, which undergoes startling levels of alternative splicing (AS) in the Pancrustacean clade. The canonical DSCAM domain arrangement consists of 9 (immunoglobulin; Ig) -4 (fibronectin; Fn) -(Ig) -2 (Fn) [69]. Since other Ig-containing genes may confound the identification of bona fide DSCAM transcripts in malacostracans, we searched for genes/transcripts containing the Fn1-Fn2-Fn3-Fn4-Ig10-Fn5 motif set from known DSCAM protein sequences used as queries for BLAST. From this, we identified putative DSCAM transcripts in 49 out of 55 malacostracan species in our study (Fig. 2b). We observed that DSCAMs in brachyurans (except for Cancer borealis) are monophyletic and are likely to be orthologous (Fig. 2b). DSCAM AS in arthropods has evolved to allow versatile pathogen recognition and this is facilitated by the vast reservoir of receptor diversity resulting from alternative splicing of the hypervariable regions [27,[70][71][72][73][74]. Although it is likely that most malacostracan DSCAMs have multiple splice forms, accurate characterisation and annotation of splice variants from transcriptome data alone is confounded by long arrays of highly similar Ig exons. Genome and genomic DNA based approaches will be needed to assess this with accuracy. It seems likely that the DSCAM remains a key PRR in malacostracans, and will need to be further studied in the context of infection as potential diagnostic marker and effector mechanism that might be exploited in aquaculture.
Scavenger receptors (SRs) are a subclass of structurally diverse membrane-bound PRRs, first described as proteins having the ability to bind to oxidised low-density lipoproteins (LDLs) (Fig. 2a) [51,[75][76][77][78][79]. SRs can recognise a diverse range of cognate ligands and these include modified self-molecules (eg: oxidised LDLs) and non-self microbial structures such as LPS and LTA [80,81]. We considered two classes of SRs in Malacostraca, namely the macrophage class A scavenger receptors (SCRAs) and the class B scavenger receptors (SCRBs; Fig. 2a). We annotated 129 SCRAs and 314 SCRBs in malacostracans (Additional file 4: Table S3; Fig. 2e). Malacostracans SCRAs are characterised by multiple domains; the cysteine-rich (SRCR) domain, C-type lectin domain, lysyl oxidase or collagen domain (Fig. 2a) [82][83][84]. Malacostracans SCRBs have the CD36 domain and two transmembrane domains (Fig. 2a) [79]. To date, the only SR reported in crustaceans is a homolog of Croquemort, a SCRB family member in Marsupenaeus japonicus [85]. Humans and Caenorhabditis elegans only have three CD36-like proteins each [86,87]. SCRBs in malacostracans have however, undergone multiple gene duplications; the isopod Proasellus ortizi, the decapod C. borealis and amphipod P. hawaiensis have 10, 6 and 8 SCRBs respectively ( Fig. 1; Additional file 4: Table S3). Major SCRB gene expansion is likely to have occurred at the base of Mandibulata as S. maritima, D. melanogaster and D. pulex have 7, 13 and 8 homologs respectively while the chelicerate I. scapularis only has three (Fig. 1). Clearly the role of SCRBs in mandibulate immunity needs further study as almost nothing is known about the significance of the SCRB expansion in arthropods. Perhaps not all SCRBs in malacostracans are involved in host defence because CD36-like proteins have been shown to participate in other physiological roles such as facilitating cellular uptake of carotenoids required for visual chromophore formation [88], scavenging of apoptotic cells [89] and lipoprotein homeostasis [90].
Lectins have been shown to be directly relevant to the immune system of crustaceans [91,92]. A C-type lectin (CTL) in M. japonicus, expressed primarily in intestinal tissues, is upregulated upon bacteria and WSSV infection and can bind LPS and PGN in a dose-dependent manner [93]. Nonetheless little is known about how many of each of the different types of lectins are present in malacostraca. CTLs are a group of diverse proteins characterised by a carbohydrate-recognition domain, some of which are Ca 2+dependent and they can bind sugar and non-sugar ligands [94,95], while galectins are another type of lectin proteins that can bind β-galactoside sugars and are involved in multiple cellular processes such as apoptosis, cell proliferation and immunity [96]. We identified over a thousand putative CTLs across Malacostraca with Litopenaeus vannamei having 65 different CTLs, in line with a general trend for decapods to have more CTLs than other malacostracan groups ( Fig. 1; Fig. 2e; Additional file 4: Table S3). Although the copy number of CTLs varies greatly between malacostracan species ( Fig. 2e; Additional file 4: Table S3), it is clear that divergent evolution through multiple gene duplications has occurred within this lineage, particularly in some amphipod and decapod species (Fig. 2e). More broadly we find that this appears to be a feature in many other arthropod lineages; S. maritima, D. melanogaster, Aedes aegypti and D. pulex have 25, 33, 40 and 26 genes respectively (Additional file 4: Table S3). To date, only five CTLs in L. vannamei have been studied in the contexts of Gram-negative bacteria agglutination and WSSV infection [97][98][99][100][101]. Future expression panel testing, particularly in decapods, will be required to ascertain whether CTLs may have distinct roles in recognising different pathogenic agents. Galectins in malacostracans are present as single-copy homologs except in two isopod species (Asellus aquaticus and Bragasellus peltatus that have two galectins each; Additional file 4: Table S3; Fig. 2d). Our analysis revealed that insects, chelicerates, S. maritima and D. pulex have multiple copies of galectins ( Fig. 1) suggesting that with respect to galectins, malacostracans have evolved conservatively (Fig. 2d).
The thioester-containing protein (TEP) superfamily includes the vertebrate complement system, the pan-protease inhibitor α2-macroglobulin (α2M), insect TEPlike proteins and macroglobulin complement related (MCR) proteins [102]. TEPs have the unique propensity to form covalent bonds with pathogens through their canonical thioester (GCGEQ) motifs to promote endocytotic clearance or to neutralise pathogenic proteases [103][104][105]. Amongst arthropods, some TEPs lack the canonical thioester motif and they presumably lack the ability to form covalent bonds with pathogenic surfaces [26]. We identified a total of 432 TEPs in all 55 malacostracan species (Additional file 4: Table S3). Decapods in general have more TEPs than other malacostracan orders (Additional file 6: Figure S3B). P. clarkii has at least 25 different TEPs, the highest amongst the malacostracan datasets considered here (Additional file 4: Table S3). However, only a third (147/432) of malacostracans TEPs have the GCGEQ motif. Nonetheless, as reports have indicated that a TEP protein in D. melanogaster, although lacking the thioester motif, could still bind to fungi and promote phagocytosis [104], this TEP diversity in malacostraca and particularly Decapoda may be immune related. We analysed phylogenetic relationships between TEP members in Malacostraca and observed that like TEPs in arthropods, they fall into three major categories: α2Ms, insect TEP-like proteins and MCRs (Additional file 6: Figure S3). The α2Ms in amphipods (except for 1 gene in Talitrus saltator) form a monophyletic group (Additional file 6: Figure S3A). The vertebrate C3 and C4 factors are also monophyletic, while C3 from the amphioxus Branchiostoma belcheri and C5 factors from mouse and human are paraphyletic (Additional file 6: Figure S3). Since we did not find any malacostracan TEPs clustering with the vertebrate complement factors and together with the observation that C3-like proteins are only found in chelicerates and myriapods [26], we predict that C3-like proteins have been lost in the Pancrustacea.
The recognition of PAMPs by PRRs is the first line of defence against invading pathogens. In this study, we have annotated known PRR families in Malacostraca and established analogies to arthropod PRRs (Figs. 1 and 2). We show that malacostracans have a large repertoire of PRR proteins to efficiently cope with a broad range of pathogens. Several PRR families, CTLs, GNBPs and SCRBs, are expanded in malacostracans and this may, in part, contribute to enhanced plasticity when dealing with diverse microbial ligands. Our data will underpin comparative approaches as to how PRR activation in aquaculture affects outcomes in different conditions. Together our analyses indicate that PRRs are evolving rapidly within this lineage, reflecting the diverse selection pressure from pathogens encountered by different malacostracan groups.

Prophenoloxidases are invented at the base of Pancrustacea
The prophenoloxidase-activating system (proPO) is another non-self pathogen recogonition mechanism implicated in arthropod immunity [106][107][108]. Upon the recognition of LPS, PGNs or β-glucans by GNBPs, a serine protease cascade ensues, which results in the proteolytic cleavage of proPO into active phenoloxidase (PO). PO then catalyses melanin formation [107] and this creates a physical barrier to inhibit further pathogen growth and movement [109]. Because PO plays functional roles in the melanisation pathway and wound healing [107,110,111], the emergence of PO is associated with the evolution of humoral immunity in arthropods. POs are thought to be members of the hemocyanin superfamily; a family that is exclusively found in arthropods [112]. Due to shared sequence similarities, it was proposed that hemocyanins could be converted to proPOs upon chemical treatments [113,114]. Chelicerates (scorpions and spiders) and the myriapod S. maritima lack sensu stricto proPOs ( Fig. 1) [25,26,112,115,116] and so perhaps they would need to rely on activated hemocyanins for melanin synthesis. To date, most crustacean proPOs were identified from decapods [117][118][119][120][121][122][123]. We found only two malacostracan proPOs from non-decapod species in GenBank, Nebalia kensleyi (Leptostraca; ACV33307.1), and Oratosquilla oratoria (Stomatopoda; ADR50356.1; Additional file 7: Figure S4A), indicating that proPO exists beyond decapod species. No other reports exist for pro-POs in amphipods (except for P. hawaiensis) [27], isopods, krills and mysid crustaceans. Some have reported that amphipods and isopods lack proPO [30,[124][125][126][127]. Failure to identify proPOs from isopods by an independent study could be due to the use of a limited EST dataset [127]. In contrast to the previous studies, we were able to identify proPOs from all five malacostracan orders (Additional file 7: Figure S4A). Since proPOs and hemocyanins have similar sequences, we confirmed that these are bona fide proPOs through reciprocal BLASTs and phylogenetic analysis (Additional file 7: Figure S4A). Considering that proPOs are present in insects, D. pulex and malacostracans but not in myriapod and chelicerate lineages (although related proteins with predicted tyrosinase activity were identified) [26], it is likely that this non-oxygen binding derivative of hemocyanin was invented at the base of Pancrustacea. ProPOs may have evolved distinct roles in immunity since we were still able to identify many other hemocyanin genes in malacostracans (Additional file 7: Figure S4B). Parallels have been drawn between the initiation of serine protease cascades and the conversion of proPOs into catalytically active POs after exposure to PAMPs [128,129]. POs must be tightly regulated by serine proteases since PO activation generates highly reactive toxic quinone intermediates [109,130,131] and CLIP-domain serine proteases are implicated in this process [132]. CLIP-domain serine proteases are expanded in Diptera insects (D. melanogaster has 47 genes) but not in D. pulex, S. maritima and chelicerates ( Fig. 1) [26,133]. We made similar observations on the expansion of CLIP serine proteases in Malacostraca (Additional file 7: Figure S4C). We annotated over 2163 CLIP serine proteases. The highest numbers across five malacostracan orders are: the decapod L. vannamei (72 genes), the amphipod P. hawaiensis (54 genes), the isopod Proasellus meridianus (68 genes), the krill Meganyctiphanes norvegica (57 genes) and the mysid crustacean Neomysis awatschensis (56 genes; Additional file 4: Table S3). The expansion of CLIP serine proteases in malacostracans may signify a need for highly regulated PO activation and this correlates with our novel findings of proPO presence across the broader Malacostraca.
Toll and JAK-STAT pathways are conserved in Malacostraca while several key components of the Imd pathway are lost Signal transduction pathways link recognition of PAMPs by PRRs with transcriptional activation. Three wellstudied pathways are the Toll, Imd and Janus Kinase (JAK)-signal transducer and activators of transcription (STAT) pathways. Components of the Toll pathway in malacostracans include a chain of interacting proteins: the Toll-like receptors (TLRs) [134][135][136], Spätzle [137][138][139][140], myeloid differentiation factor 88 (MyD88) [141], Tube, Pelle, Dorsal, Cactus and the Toll-interacting protein (TOLLIP; Fig. 3A). Our investigation of the malacostracan Toll pathway members suggests that this pathway is broadly conserved. Malacostracan TLRs appear to have undergone divergent evolution through multiple gene duplications and from our phylogenetic analysis, we saw that paralogs exhibited marked sequence divergence (Additional file 8: Figure S5). Like in dipterans (D. melanogaster and A. aegypti have 6 genes each), we discovered multiple-copies of the gene encoding Spätzle, the cytokine partner of Toll, in malacostracans. In particular we observed that Euphausia superba has at least Spätzle encoding genes, while we found numbers more in line with insects in E. sinensis (6 genes), A. leptodactylus (6 genes), P. hawaiensis (7 genes), M. norvegica (5 genes), N. awatschensis (7 genes) and Proasellus sonalasi (7 genes; Fig. 3B). We conclude that the unique expansion in E. superba might be a unique curiosity of this species. We identified intact MyD88-Tube-Pelle complexes in only 13 out of 55 malacostracan species (Fig. 3c). As in insects [133,142], MyD88 and Tube each exist as singlecopy genes in Malacostraca (Additional file 9:  Fig. 3 Toll pathway members in malacostracans. a Soluble PRRs such as the GNBP1 and GNBP3 are involved in the recognition of non-self, e.g. peptidoglycans and β-glucans, which triggers proteolytic through the activation of CLIP-domain serine proteases. The cytokine Spaetzle is cleaved by the Spaetzle processing enzyme and this activates the Toll receptor. Signalling through Toll acts via the Toll-induced signalling complex (TISC), comprising of three proteins containing death-domains: Tube, myeloid differentiation primary-response gene 88 (MyD88) and Pelle. TICS signal is transduced to Cactus (a homologue of the mammalian inhibitor of NF-κB). Cactus is phosphorylated, polyubiquitylated and degraded and the dorsal-related immunity factor (DIF) is translocated to the nucleus. DIF binds to NF-κB response elements to induce gene expression. The graphs represent the total number of b Spätzle, c MyD88, Tube and Pelle and d Dorsal and Cactus transcripts in malacostracans. The y-axes represent total number of genes identified in all 55 malacostracan species for each family. Each species is represented by a number on the X-axes and a complete list of species is available in Additional file 3: Table S2. Black horizontal bars below each graph delimit the five orders of malacostracans and the numbers in parentheses (x/y) represent the following: x = number of species in which a particular gene family is found and y = total number of species in each order. Phylogenetic trees of e Toll-interacting protein (TOLLIP), f Dorsal and g Cactus are constructed using the maximum-likelihood method from an amino acid multiple sequence alignment. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n = 1000) for all trees can be found in Additional file 26: Figure S14. Scale bar represents substitution per site of a crustacean Tube homolog has been previously shown [143]. Although dipterans have one copy of Pelle, we observed duplications of Pelle in some malacostracan species in the Decapoda (Homarus americanus and L. vannamei), Amphipoda (Echinogammarus veneris, P. hawaiensis and T. saltator) and Isopoda (Proasellus beticus, P. coiffaiti, P. coxalis, P. grafi, P. hercegovinensis and P. rectus) ( fig. 3C; Additional file 9: Table S4). As two copies of Pelle were found in D. pulex and the deer tick Ixodes scapularis [26], it is possible that duplication of Pelle may have occurred at the base of arthropod lineages, with subsequent loss of one copy in insects. We identified single-copy homologs of Dorsal and Cactus in malacostracans (Fig. 3d, f and g). TOLLIP is a negative regulator of NF-κB in mammals [144]. Not much is known about the function of TOLLIP in invertebrates and to date, only one TOLLIP in crustaceans has been described [145]. We identified single-copy homologs of TOLLIP across five Malacostraca orders, D. pulex, the myriapod S. maritima and chelicerates (Mesobuthus martensii and Ixodes scapularis) (Additional file 9: Table S4, Fig. 3e). Amongst dipterans, we identified single-copy TOLLIP homologs in Anopheles gambiae but neither in D. melanogaster nor A. aegypti, although it is present in other insects like bees and ants ( fig. 3E; Additional file 9: Table S4). Malacostracan TOLLIPs share similarities to mammalian TOLLIP proteins having both the protein kinase C conserved region 2 and the Cterminal coupling of ubiquitin to endoplasmic reticulum degradation domain [144,146]. Most components of the Imd pathway are present in malacostracans except for three gene families (Fig. 4a). Imd is conserved amongst insects, myriapods and D. pulex, but not in chelicerates [26,147]. Imd exists as a single gene within malacostracans across all five orders (Fig. 4b). Imd is preferentially activated by the inner PGN layer of Gram-negative bacteria through the binding of PGRP-LC to PGN [148,149]. To our knowledge, no PGRP homologs have been previously reported in crustaceans including D. pulex [25] and P. hawaiensis [27]. Although we failed to identify PGRPs in most malacostracans, we found four putative PGRP genes from T. saltator (Amphipoda), Proasellus karamani (Isopoda) and H. americanus (Decapoda; Additional file 10: Figure S6A). This could indicate a complex pattern of PGRP loss amongst crustacean taxa, that PGRPs are present but not represented in available malacostracan transcriptome and/or that the PGRP sequences we have found have evolved convergently. Sequence analysis revealed that these malacostracans PGRPs possess the amidase domain and share striking sequence similarities to D. melanogaster PGRP-SC1, SC2 and SB2 (Additional file 10: Figure S6D). Within this domain, five amino acid residues (H-Y-H-T-C; marked in Additional file 10: Figure S6D) have been shown to be critical for PGRP enzymatic activity [150,151]. These residues are present in the malacostracans PGRPs annotated here, indicating that they have the potential to be enzymatically active. These data suggest that PGRPs are present in Crustacean taxa but perhaps have greatly reduced representation. Future genome sequenced based analyses will be required to clarify this. A negative regulator of Imd signalling is the Caspar protein, a homolog of the mammalian Fas-associating factor 1 [152]. We identified single homologs of Caspar across all five malacostracan orders (Fig. 4C) and in other arthropods (D. pulex, dipterans, chelicerates and myriapod) indicating that it is conserved in Arthropoda (Additional file 11: Table S5). Concerning other Imd pathway components, we identified single-copy homologs of Relish, death-related ced-3/Nedd2-like protein (DREDD), IκB kinase β (IKKβ) and MAPKKK transforming growth factor -β (TGFβ)-activated kinase 1 (TAK1) in malacostracans indicating that these components of the Imd pathway have remained intact (Fig. 4, Additional file 10: Figure S6 and Additional file 11: Table S5). However, we failed to identify clear homologs of IκB kinase γ (IKKγ), FAS-associated death domain (FADD) and TAK1binding protein (TAB2) in malacostracan transcriptomes, which could be due to sequence divergence or the replacement of these components with other functionally related proteins. Independently, components of the Imd pathway have been reported in the decapod Carcinus maenas and the authors also did not find clear homologs for IKKγ or FADD [37]. Not much is known about the Imd pathway in crustaceans. To date, only two homologs, Imd and Relish have been subjected to functional studies in shrimps [153][154][155][156]. Overall, the Imd pathway appears to be reduced in malacostracans. Whether this is a result of actual gene loss, sequence divergence or the utilization of alternative proteins for Imd signalling is presently unknown and warrants further investigation.
Core components of JAK-STAT include the cytokine transmembrane receptor Domeless, JAK (Hopscotch in D. melanogaster) and STAT proteins (Additional file 12: Figure S7A) [157][158][159][160]. Mammals have four JAK proteins and seven STATs [161] while most arthropods, except chelicerates, only have single-copy JAK and STAT homologs [26,133]. JAK and Domeless proteins in malacostracans exist as single homoogs ( Fig. 2C; Additional file 12: Figure S7B, Additional file 13: Table S6). Like Domeless, STAT in most malacostracans exists as single homologs, except in two amphipod species; Hyalella azteca (2 genes) and T. saltator (3 genes; Additional file 12: Figure S7C, Additional file 13: Table S6). Negative regulators of JAK-STAT include the suppressor of cytokine signalling (SOCS) and  Fig. 4 Immune deficiency (IMD) pathway members in malacostracans. a PGRPs recognises Gram-negative bacteria and activate the IMD pathway through the RHIM motifs. Although the IMD pathway is typically activated by PGRPs in Drosophila melanogaster, PGRPs are not necessary for IMD signalling and it was posited that an unknown protein is present upstream of the IMD signalling cascade. Like DIF from the Toll pathway, in the IMD pathway, differential activation of another NF-κB transcription factor, Relish, occurs. Relish is phosphorylated through the activation of IκB kinase (IKK) complexes and transforming growth factor-β-activated kinase 1 (TAK1). The caspase-8 homolog death-related ced-3/Nedd2-like protein (DREDD) and FAS-associated death domain (FADD) proteins are required for IKK and TAK1 activation and Relish is cleaved through DREDD. Caspar, a homologue of mammalian Fas-associating factor 1 that is essential for antifungal immunity, negatively regulates the IMDmediated immune response by preventing nuclear translocation of Relish. Caspar also suppresses the IMD pathway through targeting Dredddependent cleavage of Relish. Phylogenetic trees of b IMD, c Caspar, d Relish and e DREDD are constructed using the maximum-likelihood method from an amino acid multiple sequence alignment. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n = 1000) for all trees can be found in Additional file 26: Figure S14. Scale bar represents substitution per site protein inhibitors of activated STAT (PIAS) [162,163]. SOCS and PIAS are also well conserved in malacostracans (Additional file 13: Table S6; Additional file 12: Figure S7D and Additional file 14: Figure S8). Mammals have eight SOCS proteins while D. melanogaster only has three [160,164]. Copy number of SOCS varies between malacostracans; some of the highest numbers are in L. vannamei, P. hawaiensis and M. nipponense where they have 6, 6 and 5 genes respectively (Additional file 13: Table S6). Phylogenetic analysis of malacostracan SOCS proteins revealed that they clustered in seven major groups (Additional file 14: Figure S8). Few studies on crustacean SOCSs are available [165,166], and whether the entire malacostracan SOCS repertoire have roles in immunity is yet unknown.
Overall we have shown that three signal transduction pathways, Toll, Imd and JAK-STAT have remained largely conserved in malacostracans. Nonetheless, several components of these pathways exhibit lineage specific diversification, for example, the loss of three Imd pathway modules (IKKγ, FADD and TAB2) and the divergent evolution of core TLRs and Spätzle components of the Toll pathway.

Anti-lipopolysaccharide factors and crustins are malacostracan-specific antimicrobial peptides
Signal transduction culminates in the activation of immune effector molecules to neutralise pathogenic agents. Antimicrobial peptides (AMPs) are rapidly evolving, highly specific effector proteins that are potent agents against a broad range of microbes [167,168]. D. melanogaster has seven AMP families, but only three of these, attacins, cecropins and defensins, are shared with other dipterans [133]. To date, fifteen AMP families have been reported in crustaceans, fourteen of these are from decapods and many are lineage-specific [169,170]. We considered two of these AMP families, antilipopolysaccharide factors (ALFs) and crustins, and show that they are actually well conserved in malacostracans beyond just the Decapoda (Figs. 5 and 6). While ALFs have only been reported in decapods [170,171], crustins have been reported once in a non-decapod malacostracan species, the amphipod Gammarus pulex [172]. The branchiopod D. pulex lack ALFs and crustins or anything with sequence similarity, and we did not identify clear homologs in other arthropods, indicating that both gene families are specific to Malacostraca (Additional file 15: Table S7; Fig. 5a). We identified a total of 337 ALFs from malacostracans form a wide range of tissue samples (Additional file 15: Table s7). The decapod Hyas araneus has the highest number of ALFs (20 genes; Additional file 15: Table S7; Fig. 5b). Using homology modelling, we find that ALFs in malacostracans share high structural similarities, consisting of three β-sheets and three α-helices (Fig. 5a'). Alignment analysis of malacostracan ALFs revealed that they contained two conserved cysteine residues predicted to form a disulfide bridge (Fig. 5c) [170,173,174]. Between the cysteine residues, a region containing positively charged amino acids is defined as the LPS-binding domain [171]. This domain is present in all malacostracan ALFs, which suggests a conservation of LPS binding across this whole gene family (Fig. 5c).
Crustin is a cysteine-rich AMP containing a whey acidic protein (WAP) domain and was first discovered in the decapod Carcinus maenas to have a role in defence against Gram-positive bacteria [175]. Crustins are abundant in malacostracans and we identified 513 putative genes with E. superba encoding at least 50 crustins ( Fig. 6a and b; Additional file 15: Table S7). In comparison with other WAP domain proteins, crustins are characterised by an additional crustin domain consisting of 12 conserved cysteine residues, in which a single WAP domain is present and we note that this the case in all malacostracans crustins (Fig. 6c). Future studies can now address the biological roles of these AMPs and questions as to whether these AMPs are differentially regulated by specific microbial ligands, whether they are broad spectrum or selective and whether they are active in specific developmental stages. Our analyses show that the immune effector phase in malacostracans has undergone substantial lineage specific evolution, expansion and sequence diversification of AMPs, reflecting their modes of action to guard against a broad range of pathogens found in their natural habitats. We also note conservation of AMPs across Malacostraca, meaning that nonfood crop species that can be easily studied in the lab will be potential model systems for this aspect of malacostracan immunity.
Malacostracans have a canonical RNAi-based antiviral immune system RNA interference (RNAi) is a conserved antiviral mechanism in many systems [176][177][178][179][180][181]. RNAi-mediated gene silencing is now employed as a method to prevent viral disease progression in shrimps through the targeting of viral genes in order to inhibit replication [182][183][184][185][186]. No direct mechanistic evidence exists regarding the involvement of the RNAi pathway components in crustacean innate immunity. Despite this, there have been increasing efforts to identify RNAi pathway members in penaeid shrimps because of the potential applicability of RNAi-derived technologies in circumventing viral diseases [180-182, 185, 187-192]. We annotated core RNAi components in malacostracans, which include Dicer, the trans-activating response (TAR) RNA-binding protein (TRBP) and Argonaute-2 (Fig. 7a). We found single-copy homologs of TRBPs across all five malacostracan orders and they share the conserved dsRNA-binding domain ( Fig. 7d; Additional file 16: Table S8). We identified Dicer proteins in amphipods, isopods, decapods and krills, but not in the mysid crustacean N. awatschensis (Additional file 16: Table S8; Fig. 7b). Dicer-1 and Dicer-2 have distinct roles in D. melanogaster, where the former is involved in microRNA (miRNA) biogenesis while the latter participates in   Table S2. Black horizontal bars below each graph delimit the five orders of malacostracans and the numbers in parentheses (x/y) represent the following: x = number of species in which a particular gene family is found and y = total number of species in each order. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n = 1000) for all trees can be found in Additional file 26: Figure S14. Scale bar represents substitution per site. c Multiple sequence alignment of ALFs showing the putative signal peptides and LPS binding motifs characterised by two conserved cysteine residues are marked in red boxes dsRNAs processing into small-interfering RNAs (siRNAs) [193]. Phylogenetic and sequence analysis of malacostracan Dicer proteins revealed that they form two clusters representing Dicer-1 and Dicer-2 ( fig. 7B). With a few exceptions, most malacostracans have single-copy homologs of Dicer-1 and Dicer-2 proteins; the krill species E.  Fig. 6 Crustin antimicrobial peptides in malacostracans. a Graph of putative crustin transcripts. The y-axes represent total number of genes identified in all 55 malacostracan species for each family. Each species is represented by a number on the X-axes and a complete list of species is available in Additional file 3: Table S2. Black horizontal bars below each graph delimit the five orders of malacostracans and the numbers in parentheses (x/y) represent the following: x = number of species in which a particular gene family is found and y = total number of species in each order. b Phylogenetic tree of crustins is constructed using the maximum-likelihood method from an amino acid multiple sequence alignment. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n = 1000) for all trees can be found in Additional file 26: Figure S14.  Table S8). With respect to Argonautes, we observed that most malacostracan species have multiple copies of this gene. We show that these putative Argonautes form a separate cluster from the closely related Piwi proteins (Additional file 17: Figure S9). We identified single-copy homologs of Argonaute-1 and multiple copies of Argonaute-2 in malacostracans (Additional file 17: Figure S9). Duplications of Argonaute-2 have occurred independently in specific lineages because variable copy number of this protein is reported in chelicerates but not in insects [26,194]. Also, the longer branch lengths of Argonaute-2 proteins indicate that sequence evolution is higher than those of Argonaute-1 (Additional file 17: Figure S9). In L. vannamei, only Argonaute-2 is responsive to dsRNA [191]. Hence, it was thought that Argonaute-1 operates through the miRNA pathway in shrimps [195]. As in arthropods, the miRNA pathway is associated with crustacean antiviral defence [196,197]. The expression of miRNAs in M. japonicus was differentially regulated upon viral challenge [198] and in other systems, viral infection results in the modification of host miRNA profiles [199][200][201]. Components of miRNA biogenesis are intact in malacostracans; we identified single-copy homologs of Drosha and partner of Drosha (Pasha) (Fig. 7c and e). Research in the area of RNAi-mediated antiviral immunity has remained comparatively sparse in crustaceans, despite its rich therapeutic potential. Our results provide independent evidence that malacostracans have a naturally occurring antiviral defence mechanism in place. Much more needs to be done to understand the role of RNAi in innate immunity before it can be exploited for host defence against viral infections.

Four novel gene families with potential involvement in malacostracan immunity
We have shown that although most canonical immunity genes and pathways in malacostracans share broad conservation with arthropods, lineage specific diversifications and gene duplications are common, which together suggests that lineage specific immune components may exist. With the advent of high-throughput sequencing, we are now able to tap into the availability of growing transcriptomic resources to find currently unknown proteins that might have potential involvement in host defence. Here, we present four novel gene families classified on the basis of shared domains implicated in immune function. We obtained a set of crustacean specific proteins from an orthology analysis using complete arthropod genomes [27]. This list contained 750 protein sequences that have no significant blast hit to any other sequences in the NCBI nr database. We filtered this list down to 82 genes based on the presence of known Pfam domains [202] and then down to a selection of 4 genes with domains suggestive of immune function (Additional file 18: Table S9). We used P. hawaiensis as a starting point for this analysis as this is the only complete Malacostraca genome available to date. While we are aware of potential limitations of this approach; for example, we may miss gene families that are not present in P. hawaiensis, since we were interested in genes that are found across all five malacostracan orders, we were able to rationalise the use of P. hawaiensis, an established to laboratory organism, as a basis for comparison. Considering genes with known Pfam domains allowed us to investigate potential structural characteristics that have implied immune function. We observed broad conservation of all four novel gene families in Malacostraca. We have named these gene families, pending functional studies, according to their Pfam annotations: 1) chitin binding peritrophin-A family, 2) death domain family, 3) ML domain family and 4) von Willebrand factor type A family (Additional file 18: Table S9). We found that these genes are all expressed in the P. hawaiensis and that they exhibited differential expression patterns between developmental stages and tissue types ( Fig. 8; Additional file 19: Table S10). Peritrophins are chitin-binding proteins originally isolated from the insect gut peritrophic membrane [203]. Peritrophic membrane is thought to constitute a barrier for midgut epithelial cells to prevent the entry of microbes [204][205][206]. Peritrophin-like proteins are characterised by peritrophin domains. One example is the peritrophin-A domain, which contains six conserved cysteine residues separated by other non-conserved amino acids (Additional file 20: Figure S10) [207]. It has been shown recently that crustaceans also have peritrophin-like genes. Penaeid shrimps express peritrophins during oogenesis and these proteins have roles in the protection of spawned eggs against Vibrio [203,208]. A peritrophin-like gene cloned from Fenneropenaeus chinensis could bind chitin and Gram-negative bacteria [209] and another peritrophin-like protein from Exopalaemon carinicauda is involved in WSSV infection [210]. To date, only 10 peritrophin-like genes have been identified in crustaceans and no reports exist beyond decapod species [203,[208][209][210][211][212]. Here, we identified 80 novel peritrophin-like genes across five Malacostraca order (Additional file 18: Table S9; Additional file 20: Figure S10A). This novel family of peritrophin-like proteins have no significant similarities to other peritrophin genes previously reported in crustaceans or arthropods. They share 18 conserved cysteine residues, a chitinbinding peritrophin-A domain and multiple conserved aromatic amino acids (Additional file 20: Figure S10B). In P. hawaiensis, we observed that one peritrophin-like gene is highly expressed in gut, hemocyte and limb samples but not in whole embryonic or adult tissue samples (Fig. 8). This corroborates the observation that other peritrophins are found in the gut peritrophic membrane. They may serve additional roles in innate immunity since we also saw increased expression in circulating hemolymph (Fig. 8).
The second novel gene family specific to malacostracans is characterised by a death domain (DD). The DD superfamily represents evolutionarily conserved proteins of four subfamilies: the DD subfamily, the caspase recruitment domain subfamily, the pyrin domain subfamily and the death effector domain subfamily [213,214]. Many DD superfamily members are involved in the regulation of immune response. Some examples are Imd, FADD and DREDD of the Imd pathway [215][216][217][218][219], MyD88, Tube and Pelle of the Toll pathway [137,220] and receptor interacting protein, tumor necrosis factor receptor-1 (TNFR1), TNFR-associated death domain and MAP kinase-activating death domain of the TNF pathway [221][222][223][224][225]. This new malacostracan DD gene family is a group of novel single-copy transcripts sharing a Cterminal DD (Additional file 21: Figure S11A). This family is specific to Malacostraca and has no significant blast results to the nr database or any other DD-containing proteins (Additional file 18: Table S9). The N-terminal region of this family exhibited considerable similarities within members of this group but no known domains could be determined by hidden Markov model (HMM) in this region (Additional file 21: S11B). Experimental confirmation revealed that embryonic samples show the highest expression of a P. hawaiensis DD gene (Fig. 8). This observation is intriguing because programmed cell death can be a key process during animal embryogenesis [226] and many DD proteins are shown to be involved in apoptosis [216,227].
The third novel gene family we identified in malacostracans is characterised by the presence of a MD-2related lipid recognition (ML) domain. The ML domain was originally identified from a group of unknown proteins that share regions of homology with the MD-2 protein [228]. We identified 39 transcripts containing the ML domain; they have no significant blast hits to any known genes so we named this the malacostracan ML family (Additional file 22: Figure S12A; Additional file 18: Table S9). The malacostracan ML family contains six conserved cysteine residues and these residues may be involved in the formation of disulfide bonds (Additional file 22: Figure S12B). Mutation of a conserved cysteine in MD-2 abolishes the response to LPS, which suggests a role of the disulfide bond in MD-2 function [229]. Members of this ML family are expressed in a wide range of tissue types; we observed expression in brain and nervous system samples (Additional file 22: Figure S12A; Additional file 18: Table S9). ML genes in P. hawaiensis exhibited the highest level of expression in adult tissue samples and to lesser degrees in embryos or gut samples (Fig. 8). Since ML proteins have been implicated in lipid signalling, metabolism and immunity [228,[230][231][232] ML in whole adult samples may imply a metabolic role in addition to host defence, as expression is also be observed in hemocytes. Although not much is known about the direct roles of ML proteins in immunity, others have proposed that they could participate as lipid-binding cofactors in the recognition of pathogenic agents [228]. In mammals, MD-2 directly binds LPS through the MD-2-TLR4 complex of the Toll signalling pathway [233][234][235]. MD-2 has a leader sequence for endoplasmic reticulum targeting and secretion but lacks any transmembrane domain [236]. We predicted the presence of putative N-terminal transmembrane topologies in the malacostracans ML family, which indicates that they could be anchored to the cell membrane and have unknown PRR functions rather than being secreted (Additional file 22: Figure S12B). The final family of novel malacostracan genes with potential involvement in immunity has the von Willebrand factor type A (VWA) domain. Von Willebrand factor (VWF) proteins were discovered in patients with blood clotting disorders, named the von Willebrand disease [237,238]. They are ubiquitous in blood plasma and connective tissues and have functions in binding blood clotting factors and in mediating platelet adhesion at the site of vascular injury [239][240][241]. It was thought that enzymatic components of the blood clotting and complement system utilise similar macromolecular building blocks that existed before the protostome-deuterostome divergence [242]. Since arthropods have open circulatory systems and lack adaptive immunity, hemolymph clotting is a pivotal part of the immune response because it functions not only to prevent hemolymph loss but also to immobilise pathogens at the site of wound [243]. Arthropod hemolymph is also sensitive to small amounts of microbial polysaccharides [244][245][246] and clotting enzymes in arthropods emerged via convergent evolution from the assembly of domains found in vertebrate factors rather than being exact orthologs [242]. Here, we describe the novel observation of a gene family that contained a VWA domain (Additional file 18: Table S9). They are present as multiple-copy homologs; we identified 87 genes from all five malacostracan orders and they have no significant sequence homology to any other known sequences (Additional file 23: Figure S13). The malacostracans VWA family is made up of members with long transcripts up to 7 kb in length, with coding sequences translated to polypeptides of up to 1500 amino acids; this feature is commonly seen in other VWF proteins [247,248]. We observed expression in tissue specific datasets obtained from the brain, nervous system, hepatopancreas, hemocytes, gill and eyestalk (Additional file 18: Table S9). From our RT-PCR results, the five VWA genes in P. hawaiensis exhibited diverse patterns of expression; e.g. the VWA2 gene appears to be highly expressed in embryos, limbs, gut, hemolymph and adult tissues while the VWA3 gene is not expressed in gut and hemolymph samples (Fig. 8). In summary, we have described four novel gene families specific only to malacostracans, which are good candidates for fulfilling roles in host defence. These genes offer new avenues for research and further analysis will be required to ascertain if they fit into the conceptual framework of innate immunity. Given our conservative approach to identifying gene families with domains that relate to known Pfam domains, it seems likely further comparative study, coupled with functional genomics and immunobiology approaches, will identify more malacostracan specific immune related genes.

Conclusion
The recent availability of transcriptome sequences of distantly related malacostracan species has allowed us to describe molecular components of their innate immune systems at a new level of detail. This data is now available to the community to inform the next stages of immune research to underpin important aquaculture developments. By separating the immune response into successive phases, we observed dynamic evolutionary adaptations in the pathogen recognition phase, signal transduction and effector response systems. Malacostracans achieve flexibility in recognising infections through the divergent evolution of certain PRR families, notably the gene expansions of GNBPs and CTLs. Upon recognition, several enzymatic cascades are involved in signal modulation and these too have novel evolutionary features. Malacostracans achieve diversity in modulation components through gene duplications of modulatory families involving CLIP serine proteases and Spätzle. When drawing comparisons to other arthropods, we observed novelties in these immune modulation components and are able to form strong evolutionary hypotheses as when key pathways evolved or diverged (e.g. the invention of proPO at the base of Pancrustacea). Core immune signal transduction pathways are largely conserved in malacostracans, although several components of the Imd pathways have been lost. The Imd pathway is activated through the digestion of PGNs by PGRPs in D. melanogaster [53].
PGRPs are previously thought to be lost in Crustacea; as D. pulex lack PGRPs [25] and no PGRPs are present in the P. hawaiensis genome [27]. Despite this being true for most malacostracan datasets we considered here, we were able to identify four PGRP genes spread across the Amphipoda, Isopoda and Decapoda orders, which are predicted to be catalytically active based on the presence of essential residues (Additional file 10: Figure S6A and D). Although unlikely, PGRPs in these species may have appeared through convergent evolution. Future studies will be required to determine the whether the biological roles of these PGRPs have any relevance in host defence and the evolutionary events that explain their relatively patchy phylogenetic distribution.
Effector mechanisms in malacostracans, like in other arthropods, are highly divergent and lineage specific. We described two malacostracan-specific AMPs previously confined to the Decapoda, and show that members of these families are widespread in other non-decapod malacostracan species. Crustaceans are regularly exposed to viral components in their natural environments [180,249,250] and hence need antiviral mechanisms in place to counteract infection. We demonstrate that malacostracans have intact siRNA and miRNA components.
Finally, we present four novel gene families in Malacostraca as potential key players of the innate immunity. We only addressed the structural significance of these genes in the context of host defence based on comparisons with other immune proteins containing similar structural features. More functional studies will be required in the future to ascertain the roles of these genes and their potential function in innate immunity before they can be confirmed as crustacean immune system components. Resources presented in this study facilitate and expand the scope of both basic and applied research, in particular analyses on the mechanistic links between specific immune modules and overall host defence. Importantly our data suggest that non-decapod species, like the laboratory model P. hawaiensis, may nonetheless be suitable for studying malacostracan specific immune mechanisms relevant to food crop species.

Innate immunity datasets and query sets
We retrieved complete transcriptome datasets for malacostracan species (available at the time of manuscript preparation) from the European Nucleotide Archive (http:// www.ebi.ac.uk/ena) (Additional file 24). These transcriptomes included those generated from specific tissue types or developmental stages. We also included the Parhyale hawaiensis transcriptome generated by a separate study [27]. All analyses were performed on a total of 69 transcriptome datasets. A full list of datasets and accessions used in this study is listed in Additional file 2: Table S1. For query sequences used in homology searches, we retrieved a set of insect immunity genes from ImmunoDB (http://cegg.unige.ch/Insecta/immunodb) [133] and known malacostracan entries compiled from Uniprot and Gen-Bank. Both gene sets are consolidated to generate a core set of protein sequences used as queries.

Identification of Malacostraca innate immunity genes
First, we use CD-HIT (https://github.com/weizhongli/ cdhit) to generate 'non-redundant' datasets. CD-HIT was used to collapse contigs that have at least 95% identity in order to merge potential splice variants. To generate a set of malacostracan immunity gene orthologs, we used multiple Basic Local Alignment Search Tool (BLAST)-based approaches such as PSI-BLAST, BLASTp and tBLASTn with varying Blocks Substitution matrices. This list was filtered by e-value of less than 10 -6 and best reciprocal BLAST hits against the GenBank nonredundant (nr) database. We then filtered the best reciprocal nr BLAST hits by the presence of conserved domains reported to be essential for function and tree-based approaches to compile a final non-redundant list of Malacostraca innate immunity orthologs. Fasta files for all sequences are available as Additional file 25.

Identification of conserved domains and phylogenetic tree construction
Malacostracan transcripts were in silico translated according to the longest open-reading frames into protein sequences. Conserved domains of the malacostracan hits were annotated using the Batch CD-Search Tool by NCBI (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/ bwrpsb.cgi). Hits without essential domains were discarded. Multiple sequence alignments of protein sequences were performed using MAFFT [251]. Phylogenetic trees were built from the MAFFT alignments using the WAG + G model in RAxML [252] to generate best-scoring maximum likelihood trees. Multiple sequence alignment images were generated using the Geneious programme [253]. Graphical representations of Newick trees were also generated using Geneious.

Identification of novel genes with putative immune function in Malacostraca
We previously performed orthologous group analyses in a separate study using complete arthropod genomes, which included the genome of the Malacostraca P. hawaiensis [27]. We retrieved a list containing 750 protein sequences that were found only in P. hawaiensis and have no significant blast hits to any other sequences in the NCBI nr database [27]. We performed a scan for the presence of Pfam domains [202] using HMMER (http://hmmer.org/) [254] on these 750 sequences and identified 82 genes containing Pfam domains. Further examination of predicted Pfam domains revealed four genes with domains suggestive of immune function: 1) chitin binding peritrophin-A domain pfam01607, 2) death domain pfam00531, 3) von Willebrand factor type A domain pfam05762 and 4) ML domain pfam02221 (Additional file 18: Table S9). These four genes were used as query sequences for Blast against the other malacostracan transcriptomes. We scanned putative malacostracan orthologues for transmembrane domains and signal peptides using the Phobius tool [255,256].
RNA extraction and reverse transcriptase-polymerase chain reaction (RT-PCR) of novel Malacostraca immunity genes in Parhyale hawaiensis Five types of P. hawaiensis tissue samples were collected from: 1) 100 mixed-stage embryos, 2) amputated limb fragments from 15 adult animals, 3) dissected gut tissues from 20 adult animals, 4) hemolymph from 50 adult animals and 5) adult whole tissues from two males and two females. Embryos were dissected from gravid females and rinsed with molecular grade water. Prior to tissue collection from the adults, animals were washed with filtered artificial seawater followed by treating with a mixture of clove oil (Sigma) and milliQ water (1:5000 dilution) for anaesthetization. As soon as the animals stopped moving after several minutes, the clove oil mixture was rinsed off and the anaesthetized animals were rinsed with molecular grade water. Limb fragments were dissected using spring scissors (Fine Science Tools). Gut tissues were collected using a scalpel and fine forceps. Hemolymph samples were collected by allowing the animals to bleed out in molecular grade water. Immediately after collection, 1 mL of Trizol reagent (Thermo Fisher Scientific) was added to the tissue samples in eppendorf tubes and samples were then snap frozen on dry ice. RNA extractions were performed according to the Trizol manufacturer's instructions. Concentrations of total RNA extracts were quantified using Qubit and Nanodrop. One microgram of total RNA from each tissue type was used for cDNA synthesis using the Qiagen QuantiTect Reverse Transcription Kit according to manufacturer's instructions. PCR on each gene was performed using Phusion High-fidelity polymerase (Thermo Fisher Scientific) and the following program was used for thermal cycling: 98C 30s, followed by 25 cycles of 98C 10s, 62C 30s, 72C 45 s, and then 72C 5 m. PCR products were ran on a 1% agarose gel and stained with SYBR Safe (Thermo Fisher Scientific). Primer sequences used for PCR are listed in Additional file 19: Table S10.

Additional files
Additional file 1: Figure S1. Phylogenetic relationship of Malacostraca. Malacostraca is shown within the Pancrustacea clade. Malacostraca tree is adapted from Melands and Willassen 2007. Decapod phylogeny is adapted from Scholtz and Richter 1995 and Schram [43]. Representative species are shown at each branch. Species denoted in purple are edible food crops. (PDF 153 kb) Additional file 2: Table S1. A complete list of all 69 malacostracan transcriptome datasets used in this study along with information on tissue types, developmental stages, accession IDs and total number of transcripts for each transcriptome. (PDF 65 kb) Additional file 3: Table S2. List of species used in graphs along with their corresponding number IDs. (PDF 46 kb) Additional file 4: Table S3. Malacostracans pattern recognition receptors and proPOs. (PDF 639 kb) Additional file 5: Figure S2. Multiple sequence alignment of the βglucanase domains of Gram negative binding proteins of malacostracans together with the β-glucanase protein from Bombyx mori (NP_001159614.1). Two conserved Glu active site residues are labeled as E188 and E193 based on positions in the B. mori protein. (PDF 924 kb) Additional file 6: Figure S3. Thioester-containing protein (TEPs) family. (A) Phylogenetic tree of the TEP family is constructed using the maximumlikelihood method from an amino acid multiple sequence alignment. Amino acid sequences include the macroglobulin complement related proteins, the vertebrate C3, C4 and C5 complement factors, arthropod TEPs and the alpha-2 macroglobulin family. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n=1000) for all trees can be found in Supplementary figure 8. Scale bar represents substitution per site. (B) Graph of putative TEP family transcripts. The y-axes represent total number of genes identified in all 55 malacostracan species for each family. Each species is represented by a number on the X-axes and a complete list of species is available in Additional file 3: Table S2. Black horizontal bars below each graph delimit the five orders of malacostracans and the numbers in parentheses (x/y) represent the following: x = number of species in which a particular gene family is found and y = total number of species in each order. (PDF 1763 kb) Additional file 7: Figure S4. Prophenoloxidase activation system. (A) Phylogenetic tree of prophenoloxidase (proPO) is constructed using the maximum-likelihood method from an amino acid multiple sequence alignment. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n=1000) for all trees can be found in Additional file 14: Figure S8. Scale bar represents substitution per site. The graphs represent the total number of (B) hemocyanin and (C) CLIP-domain serine protease transcripts in malacostracans. The y-axes represent total number of genes identified in all 55 malacostracan species for each family. Each species is represented by a number on the X-axes and a complete list of species is available in Additional file 3: Table S2. Black horizontal bars below each graph delimit the five orders of malacostracans and the numbers in parentheses (x/y) represent the following: x = number of species in which a particular gene family is found and y = total number of species in each order. (PDF 579 kb) Additional file 8: Figure S5. Malacostracans Toll-like receptors (TLRs). (A) Phylogenetic tree of TLRs is constructed using the maximumlikelihood method from an amino acid multiple sequence alignment of toll-IL-1 receptor (TIR) domains. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n=1000) for all trees can be found in Additional file 14: Figure S8. Scale bar represents substitution per site. (B) Graph of putative TLR transcripts. The y-axes represent total number of genes identified in all 55 malacostracan species for each family. Each species is represented by a number on the X-axes and a complete list of species is available in Additional file 3: Table S2. Black horizontal bars below each graph delimit the five orders of malacostracans and the numbers in parentheses (x/y) represent the following: x = number of species in which a particular gene family is found and y = total number of species in each order. (PDF 1124 kb) Additional file 9: Table S4. Malacostracans Toll pathway components. (PDF 474 kb) Additional file 10: Figure S6. Additional members of the IMD pathway. Phylogenetic trees of (A) PGRP, (B) IκB kinase β (IKKb); also known as immune response deficient-5 (IRD-5) and (C) TAK1 are constructed using the maximum-likelihood method from an amino acid multiple sequence alignment. Taxa labels are depicted as their respective colour codes. Bootstrap support values (n=1000) for all trees can be found in Additional file 14: Figure S8. Scale bar represents substitution per site. (D) Multiple sequence alignment of four PGRPs identified in malacostracans. Conserved amidase catalytic residues are highlighted in red boxes. (PDF 1151 kb) Additional file 11: Table S5. Malacostracans Imd pathway components. (PDF 351 kb)