Genome description and inventory of immune-related genes of the endangered pen shell Pinna nobilis: a giant bivalve experiencing a mass mortality event

Background: The noble pen shell Pinna nobilis is a Mediterranean endemic and emblematic giant bivalve. Already registered as an endangered species in the late 20th century, it is facing a dramatic and rapidly expanding epizooty that decimates its populations since mid-2016. The ecological value of P. nobilis has urged important investigations for conservation purposes. In light of this, we report here the rst draft genome of this animal. Results: The whole-genome sequencing has been performed on an Illumina HiSeq X platform using a single paired-end library of short fragments (2x150 bp). The de novo contig assembly accounted for a total size of 584 Mb (96,738 contigs, N50 = 7.6 kb, with 0.4% of “N” nucleotides), representing 77.5% of the predicted genome size of 754 Mb. The pen shell genome is very AT-rich, with a GC-content of 35.6 %. Heterozygosity was found to be in the range of other sequenced bivalves (1%). Over one third (36.2 %) of the genome consisted of repeated elements with a surprising larger number of SINEs elements compared to other molluscan genomes. We were also able to reconstruct the full mitochondrial genome (~19 kb, with 12 protein-coding genes, 2 rRNA and 22 tRNA genes). In relation with the outbreak that affects P. nobilis, we paid a special attention on the innate immune and stress-related genes found in the sequence. We revealed that P. nobilis disposes of a complete chemical defensome, and a relatively sophisticated innate immune system. Conclusion: In addition to offering a valuable resource for further research in comparative biology and evolution, access to the draft genome sequence is central to deepen our understanding of the vulnerability of P. nobilis to new diseases, which are likely to occur more often in the current scenario of a rapidly changing environment.

reference genome in the Pinnidae family and the second in the order of Pteriida, after the pearl oyster Pinctada fucata, and should be considered in future studies dealing with differential pathogens resistance across bivalve molluscs.
Materials And Methods P. nobilis tissue sampling and nucleic acid preparation A single adult individual of Pinna nobilis was sampled by freediving during summer 2016 in Les Embiez island (Six-Fours-les-Plages, France, 43° 4'41.89", 5°47'33.12"E). A non-lethal sampling method was speci cally developed to avoid manipulation of P. nobilis. Indeed, a small piece (14 mm ² , ~20 mg) of the mantle tissue was quickly excised using biopsy forceps before the fan mussel closes its valves. The piece of tissue was transferred in a tube lled with 95% ethanol and stored at -80°C until use. Total DNA was isolated using the Nucleospin® tissue kit from Macherey-Nagel. DNA quantity was measured using the Qubit™ dsDNA HS Assay kit (Invitrogen) and quality was checked on a standard 0.8% agarose gel.

Sequencing, genome assembly and size estimation
The puri ed genomic DNA (4 µg of starting material) was used to prepare a HiSeq X Ten DNA library. Sequencing was performed on an Illumina HiSeq 2x150 bp pair-end (Genewiz, USA). Trimming and de novo assembly were completed using CLC Genomics Server 9.0 and CLC Genomics Workbench 10.0 (Qiagen, Genewiz, USA). Before assembly, low-quality reads and sequencing adaptor contaminated reads were trimmed. For quality trimming, error rate was set at 0.01 whereas the maximal ambiguous nucleotides accepted was two. After trimming, reads with length < 100 bases and those of likely bacterial and viral origins were discarded from assembly. Final assembly statistics were determined using QUAST [33].
The haploid genome size was estimated from the trimmed PE reads with the k-mers frequency counting method, using JELLYFISH [34] that produced a histogram le of k-mers frequency distribution. An optimal k-mer size of 23 was chosen according to the formula k = log(200*genome size)/log (4) [35] and considering the genome sizes of the Pteriidae Atrina pectinata and Pinctada fucata (roughly 1,1 Gb), obtained from the Animal Genome Size database, release 2.0 [36] and [37]. We then used R software to plot the k-mer frequency distribution and manually estimated the genome size, by calculating the total number of k-mers divided by the position of peak depth, i.e. the value of the homozygous peak, as two peaks were obtained ( Figure S2) [27]. The proportion of single-copy genome length was obtained by dividing the total number of k-mers under the peaks by the total number of k-mers under the whole curve.
Additionally, KmerGenie [38] was used to con rm JELLYFISH results. The produced histogram le of kmer frequency distribution was interpreted using Genomescope [39] to deduce the genome size and the proportion of unique sequences.

Nuclear genome annotation and completeness
The nature and occurrence of interspersed repeats and low complexity DNA sequences within our assembly were assessed using RepeatMasker v1.33 [40] run with nhmmscan version 3.1b2 (February 2015). Then, a second run of RepeatMasker v1.33 was performed using RepeatModeler 1.0.11, that is a package allowing a de novo identi cation and modelling of repeat sequence families. Simple sequence repeats (SSR) were speci cally detected using the kmer-SSR package [41].
To annotate our genome, only contigs over 100X coverage were considered and blasted (BLASTN) against the nr database of the NCBI. Any remaining contigs corresponding to bacteria, protozoan or fungi organisms were manually sorted out. Afterward, a functional annotation was performed using Blast2GO work ow [42], designed to search for i) protein homologies (BLASTX) against the refseq non-redundant protein database (e-value cut-off of 10 -6 ), ii) homologies with protein motifs and domain in the InterPro and Gene Ontology (GO) databases (InterProScan) and iii) to associate GO to the best hits of the BLASTX predicted proteins and protein motifs.
Then, we ran BUSCO v.3 with default parameters, to look for orthologue genes among the unique "universal" eukaryotes and metazoan sets of genes43, 44, data sets: Eukaryota odb9 and the subset Metazoa odb9]. The proportion of "universal" genes within the genome assembly was considered as a proxy of the completeness. Finally, the number of unique genes was approximated by removing redundancies within the set of annotated contigs and GO results.
Defensome and innate immunity gene analysis As the genome assembly was fragmented, a high number of annotated contigs might actually refer to a unique gene sequence. Hence, in order to prevent from irrelevant redundancy in the gene description, we rst collected contigs that displayed a description and InterPro identi ers associated with a BLASTX evalue lower than 10 -10 [24]. These contigs were manually examined according to the encoded protein domains, the sizes of the contigs and encoded peptides, and relative to the complete sequence of an orthologous gene and to the structure of orthologous proteins. For instance, within a gene family, two contigs with an identical description encoding two different domains of a protein, were considered as one gene. Two contigs with an identical description encoding the same domain were considered as one gene or two genes, according to the structure of an orthologous protein that might contain repeated domains.
Considering the e-value cut-off, two contigs encoding similar domains but differentially described were considered as proxies of two genes.
Whenever possible, the detailed analysis of the structure of encoded peptides were assessed, especially for contigs encoding pathogen recognition receptors. In this aim, the putative complete or partial coding sequences was obtained using Augustus [45] and protein motifs were predicted using InterProScan [46] and ScanProsite [47].

Identi cation of mitochondrial genome and annotation
Contig descriptions that displayed a putative mitochondrial signature, either after BLASTX or BLASTN, were manually examined to retain those with an e-value lower than 10 -10 or 10 -6 , for BLASTX and BLASTN respectively, and exhibiting more than 75% identity and 60% coverage. One contig (Contig_66) that satis ed these criteria was further examined.
In order to double check the sequence and investigate a potential homoplasy, the trimmed reads were remapped against contig_66 using BOWTIE2 v2.3.5.1 [48] with the following options: "very-sensitive", "endto-end alignments", "allowing up to two distinct alignments per read" and selecting reads with a minimum Phred-score of 30. Mapped reads were collected, and single-nucleotide and indel variants were visualized

Results And Discussion
Sequencing and nuclear genome size determination The sequencing of Pinna nobilis DNA generated about 799 million reads in pair, yielding 240 Gbases of sequence with high quality (Fig S1). After several trimming and ltration steps, reads could be assembled into 97,633 contigs, with a 304X coverage in average, and contig N50 and average length of 7,576 and 6,007 nucleotides, respectively (Table S1). The total size of the assembly yielded 586 Mb. In the subsequent steps of the assembly analysis, only contigs over 100X coverage were considered and any remaining contig with suspicion of bacterial, protozoan or fungi origin were eliminated. In the end, our assembly yielded ~584 Mb over 96,738 contigs.
Analysis of the k-mer frequency plots using JELLYFISH [34] showed a bimodal distribution ( Figure S2), revealing the huge proportion of homozygous regions in the P. nobilis genome, of approximately 99%.
From the data of frequency distribution, we estimated a genome size of 778 Mb, of which unique sequences represent 575 Mb (73.9%). Such estimates were con rmed using KmerGenie [38] that predicted a genome size of 735 Mb, including 632 Mb of unique sequences. Thus, our assembly represents approximately 75% of the estimated genome size of the pen shell.
Among other species of Pteriida whose genome size has been elsewhere estimated, P. nobilis shows the smallest genome. It is respectively 23%, 32% and 34% shorter than those of the recently described Pinctada fucuta marcensii, which size has been estimated at 990 Mb [26], Atrina rigida (~1100 Mb from the Animal Genome Size Database) and Pinctada fucata (~1150 Mb, [37]).

Genome composition and completeness
The genome is very AT-rich (64%) while a fraction of the bases could not be determined (0.4% N) ( Fig. S1 and Table S1).  Figure 2A).
However, a high number of short interspersed nuclear elements or SINEs, accounting for more than 10 % of the genome assembly, was detected (Figure 2A and 2B). Strikingly, the proportion of SINEs in P. nobilis exceeds that of other marine mollusc organisms such as Aplysia californica, Lottia gigantea P. fucuta, Crassostrea gigas, or M. galloprovinciallis (see Figure 2A). In addition, within our assembly, SINEs were about four times more represented than long interspersed nuclear elements (LINEs) which has never been reported in bivalve genomes. Actually, the number of detected SINEs hardly re ects the number of genuine SINEs. Indeed, P. nobilis genome might contain numerous fragments of short non-autonomous retrotransposons, possibly fragments of LINEs, which actually resemble SINEs.  [61] for complete single copy gene models, and up to 43.3% and 97.7% for these species, respectively, while considering fragmented gene models as well. Strikingly, contrary to similar studies on bivalve genomes, BUSCO identi ed twice more complete than fragmented orthologues in our assembly, which was rather unexpected considering the contig sizes and sequencing coverage. This could be explained by our fragmented assembly, in which numerous contigs possibly contain only small length, partial, and potentially too divergent, coding sequences, that BUSCO did not considered. For instance, the genome assemblies of P. nobilis and C. gigas are of similar sizes and share comparable sequencing coverages. However, their assembly statistics strongly differ. Indeed, the genome of the paci c oyster C. gigas was assembled in 11,969 scaffolds (N50 = 401,319) and its genome completeness reached 79% and 95% for complete and fragmented metazoan core orthologs, respectively [28]. Other similar observations support that genome completeness is inversely related to the number of scaffolds within assembly [27]. Moreover, completeness is most of the time achieved when both genome and transcriptome are concomitantly sequenced and investigated, as illustrated by genome descriptions of the deep sea mussel B. platifrons [61,62] and P. fucuta marcensii [26] that evidenced almost full completeness. Hence, in addition to our study, P. nobilis genome would bene t from a future transcriptome sequencing to improve the completeness of the present description.
So far, using a BLASTX comparison against the protein refseq database, 51,317 (52.5%) out of the 97,633 contigs of our assembly matched an orthologue sequence. This gure was reduced to 21,773 unique annotated contigs after removing any redundancies, which is interpreted as an approximation of unique gene content. Reported to an assumed complete genome size, we would expect around 29,000 genes in the genome of P. nobilis. In the assembly of the pearl oyster P. fucuta marcensii, the estimated gene content varies from 21610 [63]  In the ontology annotation of Blast2GO, approximately 40 % of orthologues were found in only three recently sequenced bivalve species. The gene ontology (GO) distributions at level 3, among the three main categories of biological process, i.e. molecular functions and cellular components, together with the results described above, indicates the production of a comprehensive genome draft ( Figure 3).
Insight into the gene repertoires of the innate immune system and chemical defensome Within the Biological Process category, GO terms referring to "cellular response to stimulus" and "response to stress" were encountered in 2765 and 783 contigs respectively ( Figure 3). A glimpse into the number of non-redundant descriptions for more speci c GO terms (Table 1) revealed that P. nobilis would dispose of a well-developed defensome, challenging the explanation of the massive and deadly zoonoses recently experienced by this organism. Here, we report the rst inventory of genes encoding key proteins speci cally related to the innate immune response and to the acclimation to chemical stress, that all together account for the homeostasis maintenance of this organism.
Pathogen recognition receptors (PRRs) are molecular sensors that recognize pathogen structures and constitute a rst defence line against pathogens. In invertebrates, diverging evolution of the immune system has led to a diverse and complex set of pathogen binding molecules [23]. Among them, several have been inventoried in various bivalves, such as the toll-like receptors (TLRs), peptidoglycan recognition receptors (PGRPs), glucan binding proteins (GNBPs), lectins and laminins [24, 64,65]. Besides, an intracellular line of defence relies upon sensors of foreign RNAs, DEAD-box helicases, such as the retinoic acid inducible gene I (RIG I-like = DDX58) and the melanoma differentiation associated gene-5 (MDA-5) [66,67]. In our P. nobilis genome sequence, we could evidence a wide range of genes related to immunity and we could highlight several families, from PRRs to effectors of the immune response that potentially harbored a rather diverse set of genes (Table S2). Although a wide TLR repertoire has been elucidated in some invertebrates such as the sea urchin Strongylocentrotus purpuratus in which more than two-hundred TLR have been identi ed70, 71], only a few have been reported in bivalves: twenty-seven putative TLR encoding contigs in M. edulis [24], twentythree in M. galloprovincialis [65] though four have been reported in C. gigas [28] and only one detected in Mya arenaria and Chlamys farreri [72,73].

Pathogen recognition receptors
In our sequence, we identi ed twenty-seven contigs encoding TLRs of various structures, all exhibiting a TIR-domain. Among them, twenty displayed a typical TLR structure, one also encoded a C1q motif at the C-terminal end of a typical TLR and one had two TIR-domains. We also found six and nine contigs encoding either a single TIR-domain or LRRs, respectively ( Figure 4). However, the latter nine contigs could not be considered as additional putative TLRs since the TIR-domain was missing. Thus, we evidenced that P. nobilis disposes of a rather diverse set of TLRs, of a similar magnitude to the one reported in M. edulis [24]. When looking at the downstream key players of the TLR pathway, we could also detect contigs with MyD88, IRAK1, IRAK3 IRAK4 and TRAF patterns. We found ninety-two annotations for C1q motifs and twenty-six contigs encoding a peptide homologous to C-type lectins or a C-type motif (i.e. IPR001304). Using "fucolectin" as a keyword for F-type lectins, three contigs were evidenced. Finally, we found four contigs annotated for colin while no contig for interlectin could be retrieved. Those results are congruent with the reported lectins diversity among bivalves, and evidence the predominance of C1qDC proteins within the lectin repertoire in P. nobilis, as similarly reported in C. gigas [23].
Laminins are one of the extracellular matrix (ECM) components. Laminins are large proteins of approximately 800 kDa, composed of three polypeptide chains ( , and ) exhibiting ve globular domains that interact with some of the ECM components such as integrins, proteoglycans or receptors, and a coil-coiled domain [87]. Interactions between laminins and their speci c receptors activate the intracellular signalling pathways that regulate cell growth, differentiation, migration and adhesion to the extracellular matrix and other cells [88][89][90]. They also in uence the migration and functions of immune cells [91]. Noteworthy, laminins are host cell targets for a wide range of pathogens. Some recent research indeed evidenced reductions of infection rates when the expression of laminin was lowered [92] or the protein partially deleted [93], underling the importance of laminins in the process of pathogen infection.
Moreover, they even could help protozoan escape the host innate immunity, as evidenced in some susceptible mosquitos to Plasmodium ookinetes [94] and in other host-protozoan pathogen interactions [95]. This observation has to be considered in the light of the infection of P. nobilis by the protozoan Haplosporidum pinnae [96]. In the genome assembly of P. nobilis, we identi ed seven contigs encoding laminins (4 , 2 , 1 ). Ultimately, while looking at sensors of foreign viral RNAs, we could highlight two contigs, encoding one RIG I-likei.e. DDX58)and oneMDA-5i.e. interferon-induced with helicase C: Ish)gene.

Signalling
Cytokines are low molecular weight proteins secreted by immune cells in response to pathogen infections and cell injuries. They are essential mediators of the immune response acting through a series of conserved signalling pathways, which for instance can lead to contrasted outcomes such as apoptosis or cell survival. We found twenty eight cytokines within the genome assembly of P. nobilis, of which nineteen were homologous to bivalves interleukines-17 (twelve contigs) or tumour necrosis factor (TNF) superfamily members displaying conserved TNF_2 domains (seven contigs). We also identi ed one contig encoding both a well-conserved macrophage inhibitory factor (MIF), considered as a key mediator of the innate immune system across taxa, which has already been evidenced in some bivalves [97][98][99], and three interferon regulatory factors.
Regarding IL-17 and TNF superfamily in detail, our observations are consistent with previously reported evidences of a central ancient IL-17 signalling pathway in the immune response of bivalves [23, 100].
IL-17 is a pro-in ammatory cytokine mainly involved in the defence against extracellular pathogen. Moreover, the IL-17 gene might be an initiator of a more complex, integrated, immune response, as evidenced by the rapid upregulation of IL-17 gene expression observed in C. gigas haemocytes submitted to a bacterial challenge [101]. TNF-α is also mainly involved in resistance against pathogens, though in certain cases, circulating TNF might be permissive towards parasitic, bacterial, and viral infections [102]. The transcriptome of M. edulis showed twelve TNF superfamily members, similar to the number found in P. nobilis, and a high number of transcripts encoding TNF protein motifs, which emphasises the central role of TNF superfamily in the innate immune response in bivalves [24].
Besides cytokines, the immune response may also likely be ne-tuned by a set of molecules of the neuroendocrine system that can rapidly trigger an adaptive response to various biotic and abiotic stressors [103][104][105][106]. The genome assembly of P. nobilis contains a number of contigs consistent with a likely well-developed neuroendocrine system. We could indeed con rm the presence of several enzymes of the catecholaminergic, cholinergic, serotoninergic and GABAergic systems, of the nitric oxide signalling and neuropeptides (data not shown).

Destruction of pathogens
In the ght against pathogens, innate immunity relies upon the complement cascade, activated by either the classical, lectin or alternative pathway, and ultimately results in the formation of a pore complex within the pathogen membrane [107] and the production of pro-in ammatory molecules [108,109].
Vertebrates have evolved a complement system composed of the complement factors C1-9 and factor B (Bf), C3 being central since it integrates all three activation pathways and activates subsequent steps of the cascade, that further lead to the lysis of pathogen cells. In ancient taxonomic lineages, only few factors could be evidenced, suggesting an ancient origin of the complement system [110]. In bivalves, no Bf, C2, C4, C5 or C7-9 factors could be reported, challenging the comprehensive view of the complement cascade occurring in these organisms [24,110]. Consistently, we were also unable to detect these factors.
However, our genome assembly contains contigs matching the complement factors C3 and C6 as observed in M. edulis [24]. Noteworthy one contig matched an orthologous C8b protein of Rattus norvegicus (BlastX, e-value 10 -4 ), and four contigs were positive for the membrane attack complex domain IPR020864. Hence, our results corroborate the idea of a rather divergent complement system compared to that of vertebrates [110].
Antimicrobial peptides (AMPs) are small secreted peptides involved in the defence against the invasion of a wide range of pathogens [111]. AMPs typically contain one hydrophilic, positively charged, domain allowing xation on the cell plasma membrane of a pathogen, and a hydrophobic domain that embed into the plasma membrane, disrupting the membrane or intracellular functions, ultimately inducing cell death [112,113]. AMPs are a very diverse family and have been found from prokaryotes to vertebrates.
Several groups of AMPs have been reported in bivalves, i.e. defensins, mytilins, myticins and mytimycins [114][115][116]. Within the P. nobilis genome, no orthologue sequences of AMPs were discerned using keywords or InterPro identi ers. Nonetheless, six contigs were spotted to contain sequences similar to soluble scavenger receptor cysteine-rich peptides and cysteine-rich secretory proteins, which are likely involved in the innate immune response [117][118][119][120].

Apoptosis
Apoptosis is also a prominent and e cient mechanism that limits the progression of pathogen infections, while preventing in ammation of the invaded tissue. Most of the time, apoptosis is triggered following interactions between the host immune cell and the pathogen, but obligate intracellular microorganisms have evolved strategies to circumvent apoptosis and promote survival [121][122][123]. The genome of P. nobilis contains the main actors of apoptosis, i.e. members of the BCl-2 family, caspases and inhibitors and activators of apoptosis. Noteworthy, we also found 38, 34 and 21 contigs encoding a DEATH, BIR and CARD domains, respectively. These results suggest a well-developed apoptotic machinery in molluscs, as previously discussed [24].

Inventory of stress-related genes
A wide range of structurally diverse chemical stressors is encountered in the marine environment. Heavy metals, natural exogenous compounds (e.g. toxic aldehydes), microbial and algal products or manmade molecules, such as pesticides and hydrocarbons, may considerably affect the physiology of marine organisms and challenge their persistence [124]. This is especially true when considering sessile organisms that are unable to escape their contaminated environment. Therefore, a variety of proteins have evolved towards the protection against potentially toxic chemicals and have been grouped under the term of chemical defensome [25]. More speci cally, it mainly consists in sets of enzymes that, collectively, ensure the cell detoxi cation through four main processes : i) the modi cation of chemicals to less toxic and more easily excretable substances, ii) the elimination of toxicants and transformed products out of cells, iii) the molecular protection against radicals including the reactive oxygen species and iv) the sensing of environmental contamination or cellular damages. The P. nobilis genome assembly possesses most of the genes linked to the chemical defensome. Altogether, they account for roughly 1.5% (337/21,773) of the predicted genes in P. nobilis genome, similar to the proportion observed in other organisms [125]. Thereafter, we present an insight into the P. nobilis chemical defensome (Table S3) and detail the family diversity of some key genes, according to the four processes presented above.

Reductive, conjugative and oxidative biotransformation
Proteins involved in chemical biotransformation are required to mitigate the toxicity of chemicals and to modify these toxicants to more hydrophilic substances prior to their excretion. Among those, the cytochromes P450 (CYPs), the sulfotransferases (SULTs) and the UDP-glucuronosyl and glycosyl transferases (UGTs)represented expanded families.
Cytochromes P450 are enzymes responsible for oxidative biotransformation. The diversi cation of CYP genes is very important in almost all species and genes are organised in numerous families, of which the CYP1, CYP2, CYP3, CYP4, CYP6 and CYP9 families account for the most important ones [126]. The genome assembly of P. nobilis contains ninety-eight contigs annotated for CYP genes, which overall presented a relatively high mean similarity with orthologous sequences (approximately 70%), allowing annotation at the family level, revealing thirty-seven CYP2, thirteen CYP3 and twenty-four CYP4.
According to the classi cation of CYP Clan [127], P. nobilis CYPs were found distributed among Clan 2, Clan 3 and Clan 4 ( Figure 5) resembling the distribution in other organisms [25,128]. Noteworthy, no CYP19, i.e. the aromatase, was identi ed which is also consistent with other reported observations in invertebrates [129,130].
Sulfotransferases (SULTs) are a variety of either cytosolic or membrane-anchored enzymes that catalyse the conjugation of sulfuryl groups to many diverse molecules, with a target speci city according to their localization [131,132]. For instance, the cytosolic SULTs mediate the transfer of a sulfuryl group to endogenous metabolites and xenobiotics [133], although membrane SULTs catalyse the sulfation of carbohydrates and tyrosyl residues of larger proteins or peptides [134]. In bivalves, SULTs represent a large family. For instance, twenty-eight and thirty-one genes have been detected in the oyster and pearl oyster genomes, respectively and up to eighty-three genes in the Chlamys farreri genome [28, 37, 135]. We found twenty-eight genes encoding SULTs in the genome of P. nobilis, distributed in nine subfamilies. However, most SULT genes belonged to subfamily 1 (eight genes), 10 (seven genes) and 11 (seven genes). Moreover, we could evidence a total of seventeen membrane anchored SULTs in our assembly.
Glutathion S-transferase (GSTs) are typically small enzymes of less than 250 amino acids [136], present in all three branches of life (Archaea, Eubacteria and Eukaryota). Once activated upon exposure to a range of toxins or to oxidative damages, they transfer reduced glutathione (GSH) to hydrophobic xenobiotics [137][138][139]. The classi cation of the GSTs family separates the cytosolic, mitochondrial and microsomal GSTs, and distinguishes fourteen subclasses. In eukaryotes, the subfamilies alpha, delta, epsilon, omega, pi, sigma, theta, mu and zeta are cytosolic GSTs, while kappa GSTs are the mitochondrial and peroxisomal ones. Structurally, all GSTs are homo-or heterodimers of the same GST class. They display conserved C-and N-terminal domains that encompass two types of binding sites, conserved Gsites which bind GSH and H-sites that are highly variable, allowing interactions with a wide range of hydrophobic substances [140,141]. The mitochondrial kappa GSTs are homodimers exhibiting structural similarities to cytosolic GSTs, thus likely have similar molecular speci cities [142]. Microsomal GSTs, on the contrary, result from the association of three identical monomers that each contain an alpha helix and a GSH binding domain [143,144].
Overall, we found fteen contigs encoding GSTs in our P. nobilis assembled sequence. Eleven GSTs clustered with the cytosolic GSTs: one alpha, one mu, two omega, one pi and three theta, while three of them could not be speci cally assigned. Alpha GSTs protect against therapeutic drugs, environmental toxins and product of oxidative stress, while mu GSTs protect against molecules such as polycyclic aromatic hydrocarbon metabolites [145,146]. Compared to other subclass of GSTs, omega GSTs have additional conjugative and biotransformation abilities and perform a wide range of thiol transferase and reduction reactions, such as the biotransformation of reactive α-haloketones to nontoxic acetophenones [142]. The theta GSTs are mainly involved in counteracting general oxidative stress [141,[147][148][149][150].
Eventually, one mitochondrial kappa GST and three microsomal GSTs (isoforms 1, 2 and 3) could be distinguished in our sequence. Therefore, P. nobilis disposes of a rather complete set of GSTs, with a relatively low level of gene expansion within the GSTs classes, compared to other marine organisms, such as the sea urchin, sea anemone and crustaceans [25, 125, 151].

E ux transporters
E ux transporters are complexes of proteins mainly represented by the large family of the ATP-binding cassette (ABC) transporters. Using ATP as a source of energy, they catalyse the transport of various substrates such as sugars, lipids, amino acids, antibiotics or siderophores but also xenobiotics or pollutants, across membranes [152][153][154][155][156]. To date, eight sub-families (from A to H) have been described although some family could be absent from a genome [152,157]. Considering xenobiotic stress, the rst line of cellular defence is represented by the ABCB (p-glycoproteins), ABCG (mitoxantrone resistance protein) and the ABCC (multidrug resistance proteins) that export both unmodi ed and modi ed substrates out of the cell [158][159][160][161]. A search within the genome assembly revealed fty-two contigs encoding typical features of ABC transporters, of which thirty-seven corresponded to the three multidrug transporter subfamilies B, C, and G. Similar total number and proportion have been described in human, sea urchin, anemone and the copepod Tigriopus japonicus [25, 125, 162, 163] ( Figure 6).

Antioxidant proteins and metal detoxi cation
Antioxidant proteins counteract the deleterious effects of excessive oxidative stress within cells, endogenously generated by the presence of metals, the detoxi cation processes or even the global increase in oxygen consumption [164][165][166]. We could identify at least one contig containing a coding sequence associated to each of the following enzymes: catalase, superoxide dismutase, glutathione peroxidase, glutathione synthase and glutathione reductase. Concerning the metal-binding proteins, we found a single contig for metallothionein, which, to the best of our knowledge, seems to be a general feature among bivalves. However, we detected seven heavy metal binding proteins (HIPs) in the genome of P. nobilis, similar to the histidine-rich glycoproteins (HRG), which have been evidenced in the blood plasma of several bivalves and where they represent more than 40 % of the total protein weight [167,168]. Moreover, P. nobilis was also determined to contain ferritin, ceruloplasmin and biliverdin reductase sequences, but it seems to lack a heme oxygenase and phytochelatins. Based on these results, we could expect a good ability of P. nobilis to cope with metallic elements.

Heat shock proteins
Heat shock proteins (HSP) are expressed as a general response to many stressful challenges [169][170][171]. We could evidence seven hsp70, one hsp90, one heat shock cognate protein and one heat shock factor in the genome assembly of P. nobilis. Noteworthy, seventeen co-chaperones dnaJ/hsp40 were also detected, consistent with previous observations across a wide range of organisms, from unicellular to human [172][173][174]. These proteins are conserved throughout the evolution and regulate the ATP-dependant activity of hsp70, by stimulating the hydrolysis of ATP. Hence, they appear pivotal actors of translation, (un)folding, translocation and degradation processes [175,176]. Additionally, we could recognise ve mitochondrial HSPs: one HSP10, two HSP60 and two HSP70.

Nuclear receptors and transcription factors
Finally, we searched for nuclear receptors (NR) and transcription factors (TF), as some of them are known to be involved in the sensing of xenobiotics and other environmental pollutants, in stress signalling and in the regulation of the in ammatory response [25]. NRs are effector proteins, characterized by a ligandbinding domain and a DNA-binding domain and constitute a large family, organized in six subfamilies [177,178]. Once activated, NRs regulate the expression of genes that control development, metabolism or homeostasis [179]. The number of NRs found in P. nobilis, namely thirty-two, is in agreement with the number of NRs found in other species, i.e. twenty-one in Drosophila melanogaster [180], thirty-one in the copepod Tigriopus japonicus [181], thirty-three in the owl limpet Lottia gigantea [182], thirty-nine in the gastropod snail Biomphalaria glabrata [182], forty-three in the bivalve C. gigas [183] and forty-eight in humans [184]. Noteworthy, many of the NRs involved in the defensome, such as the ecdysone receptor, PXR, the estrogen receptor (ERR), PPAR and retinoic acid receptors [185,186] were discerned in the P. nobilis genome.
We also detected seventeen transcription factors involved in the chemical defensome in our assembly.
Activated transcription factors bind speci c responsive elements (REL) within regulatory regions and, in turn, modulate the expression of the responsive genes. For instance, hif-1 and ARNT bind hypoxia REL [187], mtf1 binds metal and hypoxia REL [188] and AhR/ARNT bind xenobiotic REL [187]. Some of these transcription factors actually interact with other proteins to form active dimers, such as Maf proteins that heterodimerize with nfe-2 or Bach proteins [189].
Overall, we report the existence of forty-nine NRs and TFs in the draft genome of P. nobilis, of which many are involved in the chemical defensome, unmasking the capacity of this marine bivalve to early detect and respond to harmful chemicals.

Mitochondrial genome
Mitochondria are essential components of the cell where they provide energy. As a reminiscence of symbiotic organisms, they possess their own genome. Conveniently, genome sequence data in eukaryotes also contain a large number of mitochondrial sequences due to the process of total DNA isolation that capture organelle nucleic acids.
Within our assembly, we identi ed one contig (Contig_66, 18799 nt) that matched the complete genome of the closely related penshell Atrina pectinata (NC_020028.1, [190]), over its entire length (68% covering, 80% identity, null e-value). The subsequent mapping of the trimmed raw reads against the mitochondrial contig, could con rm the unambiguously reconstructed mitochondrial genome with a mean coverage of approximately 6000X, and the absence of an alternative consensus sequence, that could have resulted from a putative homoplasy, due to accumulating mutations and/or the simultaneous presence of male and female mitochondrial genomes. Actually, our variant calling analysis highlighted only seven potential variations of low statistical signi cance (i.e. one indel and six SNPs) (Table S4 data).
Only ATP synthase subunit 8 (atp8) was missing from our annotation. However, this gene also seemed to be absent in other bivalve species. Whether P. nobilis and other bivalves are truly lacking the highly divergent atp8 gene remains debated [191]. The complete mitogenome DNA sequence should be useful for future research regarding the structuration of P. nobilis populations [192,193].

Conclusion
Pinna nobilis is an emblematic and endemic bivalve of the Mediterranean Sea where it is distributed all around the basin, but for how long? At the end of the past century, recurrent local population collapses due to human activities have risen concerns about the persistence of this animal, leading to e cient conservation measures that allowed populations to recover. However, the actual ongoing MME caused by emerging pathogens resurrects major worries for the survival of this species at a global level. In Spain, where the outbreak started in early autumn 2016, and where high mortality rates reaching up to 100% of the surveyed populations were observed [15], authorities changed the status of P. nobilis from "endangered" to "endangered with extinction" ( Nonetheless, this phenomenon could obviously be associated to either the challenging physical and chemical environmental conditions (e.g. signi cant and frequent variation of temperature and/or salinity) in these speci c sites that would be not suitable for the dissemination of the pathogens or a local genetic adaptation of these P. nobilis populations (individuals). Yet, all these sites are under strong anthropogenic pressures. Therefore, this situation is all the more worrying that the organism physiology is stressed by chemical and physical factors (pollutants, habitat degradation, climate change) usually weakening the immune response [195][196][197][198].
In this study we reveal that P. nobilis has a complete chemical defensome, that globally resembles that of other sessile marine organisms. Similarly, P. nobilis has a sophisticated innate immune system, based on an extended number of recognition receptors (i.e. 38 TLR, 27 lectins, 16 laminins) and cytokines (IL-17, MIF), a well conserved TLR pathway, the presence of the main effectors of complement cascade and a complex apoptosis pathway. However, we were unable to clearly detect AMPs, orthologous to those evidenced in the Mytilus genera, although several contigs were found to encode cysteine-rich peptides. A transcriptomic study performed on fan mussels challenged with pathogenic bacteria would likely help clarifying with point, and potentially reveals species-speci c AMPs. Overall, the gene diversity of the innate immune system appears quite developed, similar to that observed in the common mussel M. edulis [24].
The extensive number of recognition receptors seems obvious with respect to the highly diverse, commensal and pathogenic microbial fauna surrounding and hosted by P. nobilis individuals.
Nevertheless, it questions about how, at the molecular level, H. pinnae invade tissues and cells, and circumvent the innate immunity. The genome comparison of resistant versus sensitive individuals, or species across the Pinna genera (e.g. P. nobilis vs Pinna rudis since P. rudis seems so far to be unaffected by the outbreak, Garcia-March and Vicente, pers. obs.), will help to nd putative regions under selection for these traits. Actually, the resistance to pathogen or abiotic stress do not only rely upon gene content but also on how gene expression is nely regulated. Regarding gene regulation, notwithstanding transregulating elements, P. nobilis would also likely methylate DNA, as we could detect a DNA methyl transferase and predict several hundred CpG islands (data not shown). Hence, the next steps forward the comprehension of pathogen resistance will be to investigate other genomes and transcriptomes.
To conclude, this genome assembly represents the rst reference genome in the Pinnidae family and the second in the order of Pteriida, after the pearl oyster P. fucata. This reference genome (including the mitogenome) should be considered in future research regarding comparative evolution, comparative genomics or studies aiming to understand the genetic variability of P. nobilis populations, in order to evolve effective conservation plans.

Declarations
Availability of supporting data

Competing interests
The authors declare that they have no competing interests.    Table 1 Due to technical limitations, Table 1 is only available as a download in the supplemental les section. Figure 1 A specimen of P. nobilis covered with epibionts in a posidonia oceanica seagrass meadow. Overview of the proportions of physiological functions according to gene ontology and distribution of Blast Hits (BlastX) according to species, at level 3.

Figure 4
Schematic representation of encoded TLRs found within the P. nobilis genome assembly. The numbers in brackets represent the number of contigs encoding a TLR with the above represented structural feature. The "n" stands for the number of Leucine motifs within the extracellular part of the receptor and varies from 1 to 10 repeats. Nine contigs were annotated as TLR (Blast2GO) but only contained Leu domain without transmembrane region and no TIR motif, hence, they were not included in the TLR repertoire.

Figure 5
Relative proportions of CYP clan families.

Figure 6
Relative proportion of xenobiotic transporter genes in various organisms. The number in brackets represents the total number of ABC genes detected.