Apoptotic gene loss in Cnidaria is associated with transition to parasitism

The phylum Cnidaria consists of several morphologically diverse classes including Anthozoa, Cubozoa, Hydrozoa, Polypodiozoa, Scyphozoa, Staurozoa, and Myxozoa. Myxozoa comprises two subclasses of obligate parasites—Myxosporea and Malacosporea, which demonstrate various degrees of simplification. Myxosporea were previously reported to lack the majority of core protein domains of apoptotic proteins including caspases, Bcl-2, and APAF-1 homologs. Other sequenced Cnidaria, including the parasite Polypodium hydriforme from Polypodiozoa do not share this genetic feature. Whether this loss of core apoptotic proteins is unique to Myxosporea or also present in its sister subclass Malacosporea was not previously investigated. We show that the presence of core apoptotic proteins gradually diminishes from free-living Cnidaria to Polypodium to Malacosporea to Myxosporea. This observation does not favor the hypothesis of catastrophic simplification of Myxosporea at the genetic level, but rather supports a stepwise adaptation to parasitism that likely started from early parasitic ancestors that gave rise to Myxozoa.

Multicellular organisms can get rid of old, damaged, unuseful or precancerous cells by apoptosis, a form of programmed cell death or cell suicide. Key apoptotic proteins include proteolytic enzymes called caspases, which trigger cell death by cleaving specific cellular proteins 1 . Inactive precursors, or procaspases, are activated through cleavage by other caspases 2 . Extracellular or intracellular death signals can initiate this proteolytic cascade under tight regulation of adaptor and regulatory proteins such as Fas-associated death domain protein (FADD) and Bcl-2 family proteins 3,4 .
Variation exists between the apoptotic pathways of invertebrates. For example, the Pacific oyster has mammalian-like apoptotic pathways 5 , whereas, Caenorhabditis elegans does not use cytochrome C for the apoptotic caspase cascade activation 6 . Apoptosis has been described and well-studied in model free-living Cnidaria, such as Hydra 7 which has numerous caspases, Bcl-2 protein family members, an APAF-1 homolog, components of a putative death receptor pathway and inhibitors of apoptotic proteases. Like other animals, Hydra is capable of developing tumors 8 and the transcriptomes of these tumors reveal misregulation of genes related to mammalian apoptosis genes.
Let us review the general features of the better-understood mammalian apoptosis, keeping in mind that similar actors can be found in free-living Cnidaria 7,9 . There are two principal apoptotic pathways: the extrinsic pathway and intrinsic pathway 1 . In its simplified version, the extrinsic apoptotic pathway starts with external signals that activate Death domain harboring transmembrane receptors (TNFR/Fas) 1 . This signal is transduced via adaptor proteins such as TRADD/FADD with Death and Death-effector domains that transform initiator Death-effector domain containing procaspases into active initiator caspases which induce the apoptosis caspase cascade 1 .
The intrinsic apoptotic pathway is activated in response to various oncogenes and intracellular stressors, including DNA damage, reactive oxygen species, endoplasmic reticulum stress, and hypoxia 1 . The activation is mediated via transcription factors such as p53 10 . In response BH3-only Bcl-2 family proteins are expressed, leading to the inhibition of anti-apoptotic Bcl-2. This allows liberated Bak proteins to form channels within the mitochondrial membranes causing cytochrome C release. Apoptotic protease activating factor 1 (APAF-1)

Results
Presence of genes coding key apoptotic proteins in parasitic Cnidaria. Comparison of cnidarian species reveals that the transition to parasitism is associated with gradual loss of key apoptotic factors in the parasitic classes Polypodiozoa and Malacosporea, up to a complete reduction of almost all pathway components in Myxosporea ( Fig. 1 and Table 1). We chose a 'standard' mammalian-like apoptotic pathway as a reference, because comparative genomic studies find many similarities between it and the predicted apoptotic pathway of free-living Cnidaria such as Hydra 9 .
The free-living Cnidaria Hydra vulgaris was previously known to share all key components of both the intrinsic and extrinsic apoptotic pathways (Table 1). We found that the parasitic Cnidaria Polypodium hydriforme lost the main components of the extrinsic apoptotic pathway: it completely lacks the Death domain required for the extrinsic receptor and adaptor proteins as well as the Death-effector domain required for the extrinsic initiator caspase. However, it maintains most of the intrinsic apoptotic pathway components with the exception of BH3-only Bcl-2 family members. Also, we were able to detect only one predicted intrinsic pathway initiator (CARD domain containing) caspase and one executioner caspase compared to two and eleven found in Hydra. Malacosporea have undergone further loss of apoptosis-related proteins including p53, all Bcl-2 family members, APAF-1, CARD domain, and DEATH domain containing proteins (see Supplementary tables, Supplementary table S2, for full list of domains). One caspase without a CARD domain was found in Buddenbrockia. Myxosporea universally lack all main actors of apoptosis except cytochrome C, whose gene is however absent in Myxobolus squamalis and Henneguya salminicola, and is a pseudogene with multiple inner stop-codons in Kudoa iwatai, Sphaeromyxa zaharoni, and Enteromyxum leei. Henneguya salminicola apparently entirely lacks a mitochondrial genome 24 . Detailed findings and a comparison with other animals are presented in Table 1.
This pattern of apoptotic-protein loss among parasitic Cnidaria is consistent with gradual loss-of-function in contrast to previous findings suggesting catastrophic loss of apoptotic and cancer-related proteins in Myxosporea 13   Note that among Malacosporea caspases were only found in Buddenbrockia. Compared to free-living Cnidaria, Polypodium lacks core proteins involved in the extrinsic apoptosis pathway. For the intrinsic pathway Polypodium has 1 predicted intrinsic pathway initiator caspase and 1 executioner caspase, compared to 2 and 11 in Hydra respectively. Myxosporea universally lack all main actors of apoptosis except cytochrome C, whose gene is however absent in Myxobolus squamalis, Henneguya salminicola and is a pseudogene with multiple inner stop-codons in Kudoa iwatai, Sphaeromyxa zaharoni, and Enteromyxum leei. Note that Henneguya salminicola apparently lacks a mitochondrial genome and the respiratory chain 24 . In contrast to Polypodium, Myxozoa likely lack Bcl-2 family members. The Bcl-2 family plays an important role in regulating the intrinsic apoptotic pathway. Some Bcl-2 family members are proapoptotic (Bak and Bok proteins in humans) and others are anti-apoptotic (human Bcl-2) 33 . Our analysis suggests that Myxozoa lack any Bcl-2 protein members, consistent with the predicted lack of intrinsic apoptotic pathways. In Polypodium we predicted five Bcl-2 protein family members and attempted to infer their pro-or anti-apoptotic roles based on similarity with known human and Hydra Bcl-2 family protein members (Fig. 3, see Supplementary materials, Fig. S1 for phylogenetic tree). One Polypodium Bcl-2 protein was predicted as a pro-apoptotic Bok homologue by InterProScan 34 . The separation between Bcl-2 and Bak proteins is based on phylogenetic analysis Cytochrome C is likely lost in some myxosporeans. Predicted genes for cytochrome C were found in Polypodium, Malacosporea (both Tetracapsuloides and Buddenbrockia), and Myxosporea. However, in Myxosporea the situation with cytochrome C is rather complex. In Myxobolus pronini, M. honghuensis, and Thelohanellus kitauei the predicted cytochrome C proteins share high similarity with each other. Although, cytochrome C could not be found in a published Myxobolus cerebralis transcriptome (GBKL00000000.1). Also, we were unable to identify any potential genes of cytochrome C in the available genomes and transcriptomes of Henneguya salminicola and Myxobolus squamalis. Note that Henneguya salminicola was previously shown to lack a mitochondrial genome so this is a special case 24 . While we were able to find cytochrome C homologs in Sphaeromyxa zaharoni, Enteromyxum leei, and Kudoa iwatai, we found numerous stop codons and insertions within their predicted structural Cytochrome C domains. Enteromyxum leei and Kudoa iwatai have stop codons only within the above-mentioned insertions, whereas Sphaeromyxa zaharoni also has stop codons outside of them, within a highly conserved 69 aa region (Fig. 4).
Annotated mitochondrial genomes are available for myxosporeans Enteromyxum leei 35 and Kudoa iwatai 36 . Despite the presence of stop-codons in their cytochrome C genes, genes of core subunits of complex IV and cytochrome b (related to complex III) are present in their mitochondrial genomes. They also possess the Pfam domains of respiratory subunits of complex III: PF00355 (Rieske) and PF02167 (Cytochrome C1). The latter statement is also true for Sphaeromyxa zaharoni, the third myxosporean with stop-codons in the cytochrome C gene. This suggests that the truncated cytochromes C of myxosporeans, resulting from these nonsense mutations, might still maintain their function in the respiratory chain.
Not only Myxosporea, but also Malacosporea lack calpains, APAF-1, IAPs, and p53. Calpains are a family of proteases, some members of which are involved in caspase activation in response to calcium release from the endoplasmic reticulum 37 . The caspase activation is performed via the Peptidase_C2 and Calpain III peptidase domains, while the EF-hand domain is capable of binding calcium 38 . We were unable to find any calpain family members in Myxozoa, however, we found 4 putative calpains in the parasitic cnidarian Polypodium (Fig. 5). The domain structure of two of those calpains (presence of both peptidase domains and EF-hand domains) suggests their role in apoptosis. We refer to them as "classical" calpains as described in Croall and Ersfeld 39 , and in Luo et al. 40 . The EF-hand domain is important for calcium binding, whereas the Peptidase_C2 and Calpain III domains are homologous to peptidase domains capable of caspase activation.
APAF-1 or apoptotic protease activating factor 1 is essential for forming the apoptosome when bound to cytochrome C, released from the mitochondria. Among studied parasitic Cnidaria only Polypodium has an APAF-1 homologue that we could identify. This homologue has the WD40 domain necessary for cytochrome C binding and the CARD domain for apoptosome formation (Fig. 6). This further suggests that Polypodium has functional apoptosis, while Myxosporea and Malacosporea do not. www.nature.com/scientificreports/ IAPs are inhibitors of apoptosis. Among studied parasitic Cnidaria only Polypodium has an IAP. However, this predicted protein lacks the XIAP/BIRC8_UBA domain found in its human homolog and has a shorter BIR_rpt domain which binds Zn2+ and then binds the active site of caspases inhibiting their catalytic activity (Fig. 6). p53 is a transcription factor that is activated in response to a number of stressors including DNA damage and increases the expression of a number of pro-apoptotic genes (including APAF-1, Bak, Bax) 10 . The only parasitic Cnidaria in our analysis with p53 is Polypodium. However, the Polypodium p53 homolog is truncated and lacks some of the domains present in the human p53 (Fig. 6). It lacks the p53_TAD2 and p53_tetr domains which are involved in p53 transactivation and tetramerization respectively. The p53 DNA binding domain is present in Polypodium.   table S3). Some genes can still be absent due to data incompleteness: when a gene is missing from the data, it might be due to insufficient read coverage. However, when this is the case, we expect different genes to be lost in independently sequenced related animals. For each species of Myxosporea we observe roughly the same set of lost apoptosis-related genes. A similar picture was observed for cancer-related gene loss of Myxosporean species as reported previously 13 .

Conclusions
We evaluated the presence of core apoptotic proteins in free-living and parasitic Cnidaria. We observe a gradual loss of apoptosis-related proteins, starting with Polypodium who lost the major components of the extrinsic apoptotic pathway, followed by Myxozoa who lost the intrinsic apoptotic pathway as well. Compared to Myxosporea, Malacosporea retained one potentially functional caspase homolog (in Buddenbrockia). While both studied malacosporeans retained cytochrome C, the latter is missing in myxosporeans Myxobolus squamalis, Henneguya salminicola and its gene contains multiple stop-codons in Kudoa iwatai, M. honghuensis, and Sphaeromyxa zaharoni. However, the presence of other respiratory chain proteins suggests that the genes might be producing www.nature.com/scientificreports/ truncated, yet possibly functional cytochrome C proteins. Thus, Cnidaria present us with a unique opportunity to observe different stages of gradual loss of apoptosis. The observation that Malacosporea lost many genes involved in apoptosis regulation, weakens the previously proposed SCANDAL hypothesis that suggested a transmissible cancer origin for Myxosporea catastrophic simplification 13 . Previously Panchin et al. 13 , did not observe an obvious trend for the loss of apoptosis and cancer-related genes in a subsample of parasitic Metazoa. A recent study of apoptosis in parasitic Schistosoma 41 revealed the presence of most proteins typically involved in apoptosis. This makes the case of extreme loss of apoptosis-related proteins in Myxosporea and Malacosporea rather unique and not a necessary consequence of their parasitic life-style.
One of the approaches to filter contaminations of transcriptomic data of a parasitic animal is to compare its sequences from two hosts (as was done in Faber et al. 22 ) or compare sequences from an infected and non-infected host of the same species. We created three filtered datasets using these approaches ( Table 2). In two cases we used the intersection of RNA reads of T. bryosalmonae from a fish host and T. bryosalmonae assembly from a bryozoan host. In one case we used RNA reads from infected by T. bryosalmonae kidney of Salmo trutta and filtered out reads mapping to S. trutta reference genome. Reads were mapped on assemblies using Bwa mem algorithm 43 . After filtering, predicted T. bryosalmonae reads were assembled using SPAdes. SPAdes tool was used with k-mer lengths of 33, 49, and 93. All .sam and .bam files were processed with samtools package 44 .

Exclusion of host contaminations in Myxozoan genomes and transcriptomes.
In the eight available genomes of myxosporean species (Myxobolus squamalis GCA_010108815.1, Henneguya salminicola GCA_009887335.1, Enteromyxum leei GCA_001455295.2, Sphaeromyxa zaharoni GCA_001455285.1, Kudoa iwatai GCA_001407235.2, Thelohanellus kitauei GCA_000827895.1, Myxobolus pronini (see Materials and methods, Sequence assembly), Myxobolus honghuensis GWHBFXL00000000) we searched for possible host contaminations using an original Python script (see supplementary script genome_filter.py). Each contig from the genomes was used as a query for blastn-searches 26 against a BLAST database built from the host genome. In the case of Myxosporea we used the publicly available genome of either the exact fish host or its closest available relative from Genbank (see Supplementary materials, List S1). If the best hit demonstrated > 85% identity and e-value < 1-e75, the query contig was excluded from the final assembly.
Another original python script (see supplementary script comparison.py) was used to remove host-derived contaminating sequences from the transcriptomes of malacosporeans B. plumatellae and T. bryosalmonae. First, we created two BLAST databases. The first consisted of several cnidarian transcriptomes, namely of Aurelia aurita, Polypodium hydriforme, Hydra vulgaris, Pocillopora acuta, Myxobolus squamalis, Myxidium lieberkuehni, and malacosporean transcriptomes that were obtained from hosts other than those which are being filtered. The second database consisted of a wide range of non-cnidarian organisms that may be regarded as additional probable contaminants. It was shown that possible contaminants include algae and cyanobacteria 23 , so we included Tetraselmis striata and Nodularia spumigena as representatives of these taxa. We also included the genome of bryozoan Cristatella mucedo, the transciptome of bryozoan Fredericella sultana, a transcriptome of another lophophorate-Lingula anatina, and also transcriptomes of Drosophila melanogaster, Ruditapes philippinarium, Trichinella pseudospiralis, Salmo trutta, Danio rerio, Cricetulus griseus. We found contaminations likely derived from these taxa when we manually checked BLAST search results of Malacosporean contigs. Such contaminations may sometimes occur when the same lab equipment is used for various sequencing projects or because bryozoans are filtrators and are susceptible to environmental contaminations. Table 2. Data sets which were used to make intersection transcriptomes. Reads were mapped on the readymade assembly using Bwa mem algorithm and then mapped reads were assembled using SPAdes.

Reads Assembly
T. bryosalmonae, fish-derived 22 T. bryosalmonae, bryozoan-derived 22 www.nature.com/scientificreports/ Databases were created using translated ORFs obtained from the assemblies with the TransDecoder tool 45 , using a minimal ORF length of 70 amino acids. Each contig from malacosporean assemblies was searched against each of the two BLAST databases using blastp from the BLAST + 26 tool package. Contigs were divided into three groups ("cnidaria", "contamination" and "not clear") based on e-value and bit score differences between the best hits against cnidarian and contaminant blast databases. If the e-value of the best cnidarian hit was at least 100 × less and its bit score at least 1.5 × more than that of the best hit from the contaminant dataset, this contig was considered as "cnidaria". If the opposite was true, the contig was flagged as "contamination". Otherwise, it was placed in the "not clear" group and subjected to further GC-content analysis (see below).
It was reported that Myxozoa have relatively lower GC-content compared to other Metazoa 46 . To visualize GC-content of each group of contigs we built a scatter plot with the length of contig and the GC-content on the Y-and X-axis respectively for "cnidaria", "contamination", and "not clear" groups of contigs. In the "cnidaria" group of contigs we obtained a set bounded by a bell curve with the average GC-content at 27-30% depending on the particular organism. The "contamination" group of contigs were placed as a flat bell-bounded set with average GC-content of 55%. In the "not clear" group we observed a distribution of GC-content indicating the presence of both "cnidaria" and "contamination" contigs (see Supplementary materials, Fig. S2) We used the following GC-content cutoffs: 39.03% for B. plumatellae, 40.80% for T. bryoslamonae assembly from fish kidney, 36.92% for T. bryosalmonae from bryozoan host to include the "not clear" contigs into the final "cnidaria" set of contigs.
Manual checks of potential contamination sources were conducted using BLAST searches and phylogenetic tree construction using IQ-TREE 47

Search for apoptosis-related domains and proteins.
To elucidate the presence of core apoptotic proteins as functional multidomain units we conducted additional BLAST searches. Similar to Lasi et al. 7 , we adopted the list of proteins that are most important for apoptosis, namely caspase homologs, Bcl-2-family proteins, Apoptotic protease activating factor 1 (APAF-1), tumor necrosis factor receptor (TNF-R), inhibitor of apoptosis proteins (IAPs), Fas-associated protein with death domain (FADD), calpains, p53, and cytochrome C. For each protein we collected several reference cnidarian protein sequences from UniProt and NCBI Genbank, Hydra sequences from Lasi et al. 7 , were also included (see Supplementary materials, List S2, for list of accessions).
We performed blastp searches for each of the reference proteins against malacosporean (filtered and unfiltered B. plumatellae, T. bryosalmonae from invertebrate host, T. bryosalmonae from fish host with further manual analysis), Polypodium hydriforme genome and transcriptome, and also against filtered myxosporean genomes. Domain structures of identified hits were analyzed using the InterProScan tool 34 with InterPro database 51 . Multiple alignments were built with the Muscle tool 52 and visualized in the Jalview application 53 .
Sample collection, DNA isolation. Specimens of Carassius auratus gibelio were caught in a pond near Barguzin River (53°69'N, 109°80'E) (Lake Baikal basin, Russia) in summer 2013. Round cysts of Myxobolus pronini were dissected from the body of fish and preserved in RNAlater for further analysis.
Total DNA was isolated from tissue samples by Diatom DNA Prep (IsoGene). One paired-end and two matepair libraries were prepared for Myxobolus pronini using the Nextera library preparation protocol (Illumina). The mate-pair libraries were prepared with the estimated mean insert lengths of 3500 bp and 7000 bp. The libraries were sequenced using an Illumina HiSeq 2000 instrument, generating 163 M 100 bp read pairs for the paired-end library, 31 M read pairs for the 3500 bp mate-pair library and 38 M read pairs for the 7000 bp mate-pair library.
Specimens of Acipenser gueldenstaedtii were caught by A.A. Gerasimov in the Volga River delta near the fall of the river into the Caspian Sea (region of city of Astrakhan) (46°16′N, 47°57′E) in 2013. This was done by A.A. Gerasimov as part of an official fish conservation practice. This practice involves catching a small number of fish, extraction of their oocytes for further breeding, followed by the release of fry back into their wild habitat. Stolons of Polypodium hydriforme were surgically isolated from infected oocytes of Acipenser gueldenstaedtii host, washed, and preserved in RNAlater for further analysis.
Total DNA was isolated from stolon samples by Diatom DNA Prep (IsoGene). One paired-end library was prepared for using the TruSeq library preparation protocol (Illumina). The Polypodium library was sequenced using an Illumina HiSeq 2500 instrument, generating 260 M read pairs with the read length of 250 bp. Sequence assembly. The sequencing reads for Myxobolus pronini and Polypodium hydriforme were trimmed using Trimmomatic 54 to remove the adapters. Prior to assembly, the Myxobolus pronini reads were processed using the error correction utility of the ALLPATHS-LG program 55 . The assembly for Myxobolus pronini data was performed using Velvet 56 with a k-mer size of 55. Scaffolds obtained with Velvet were then processed using the Redundans pipeline 57 with an identity threshold of 0.85 to obtain the final assembly draft. The assembly of Polypodium data was performed using SPAdes 42 with three k-mer sizes: 77, 99, 127. The assemblies were evaluated using QUAST 58 .