Molecular Investigation of the Ciliate Spirostomum semivirescens, with First Transcriptome and New Geographical Records

The ciliate Spirostomum semivirescens is a large freshwater protist densely packed with endosymbiotic algae and capable of building a protective coating from surrounding particles. The species has been rarely recorded and it lacks any molecular investigations. We obtained such data from S. semivirescens isolated in the UK and Sweden. Using single-cell RNA sequencing of isolates from both countries, the transcriptome of S. semivirescens was generated. A phylogenetic analysis identified S. semivirescens as a close relative to S. minus. Additionally, rRNA sequence analysis of the green algal endosymbiont revealed that it is closely related to Chlorella vulgaris. Along with the molecular species identification, an analysis of the ciliates' stop codons was carried out, which revealed a relationship where TGA stop codon frequency decreased with increasing gene expression levels. The observed codon bias suggests that S. semivirescens could be in an early stage of reassigning the TGA stop codon. Analysis of the transcriptome indicates that S. semivirescens potentially uses rhodoquinol-dependent fumarate reduction to respire in the oxygen-depleted habitats where it lives. The data also show that despite large geographical distances (over 1,600 km) between the sampling sites investigated, a morphologically identical species can share an exact molecular signature, suggesting that some ciliate species, even those over 1 mm in size, could have a global biogeographical distribution.
© 2018 The Author(s). Published by Elsevier GmbH. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


The genus Spirostomum Ehrenberg, 1834, currently comprises eight species of ciliates found globally in fresh and brackish water habitats (Boscaro et al. 2014). These single-celled eukaryotes [...]

[...] the previously described records of occurrence and morphology from a fen pond ∼100 meters away (Esteban et al. 2009a). Densities of up to 15 cells per mL were observed, with cells being maintained in natural samples for one week after collection. When left undisturbed for about one hour the ciliate builds an external case or coating; the ciliate is contractile, and retracts into the casing if disturbed. This could provide protection during a dispersal event (e.g. wind, birds). S. semivirescens was not observed to form cysts; however, there are records of other Spirostomum species being able to form cyst precursors (Ford 1986, and own observations in cultures of S. ambiguum). Cells were always found to be densely packed with bright green endosymbiotic algae (Fig. 1).

S. semivirescens from the Swedish study sites was immediately identified in the freshly collected samples from both locations as being morphologically identical to the UK strain and to the descriptions in the diagnostic literature (Esteban et al. 2009a). The S. semivirescens cells found were 800-1,500 µm in length and 25-45 µm in width, with more than 50 cells being measured (Fig. 1). Densities of up to 30 cells per mL were observed, but more often were found to be 5 per mL at both locations. Both locations showed productive ciliate concentrations, with green Frontonia reaching up to ∼1,000 per mL, especially in an algal mat sampled in Stadsskogen. The S. semivirescens cells were observed to build a loose casing, were contractile, and were always densely packed with endosymbiotic green algae. The casing observed in the Swedish specimens of S. semivirescens was larger (wider) and less densely packed than that observed in the UK, perhaps due to a different composition of available sediments and/or to the length of time that the ciliate samples were left undisturbed, allowing them to build a larger protective coat. The samples were collected during a warm period in August 2015, but S. semivirescens was later found to thrive during much colder periods in winter, even being regularly recovered from the habitat under a ∼15 cm thick layer of ice.

Sequencing and Transcriptome Quality

For all seven transcriptomes (Table 1) a total of 9.3 Gb of sequencing data was generated. Low levels of contamination were indicated by MEGAN, which assigned less than 5% of the contigs as prokaryotic in each assembly. Less than 4% of the contigs were classified as Viridiplantae, despite the high number of algal endosymbionts in S. semivirescens. For 17% of the 23,933 transcripts in the co-assembly, more than 10 reads from each of the six S. semivirescens replicates mapped, and for 49% of the transcripts 10 reads or more from at least three different replicates mapped. Based on this level of consistency between the transcriptomes and the similar relative expression level of transcripts between replicates (Supplementary Material Fig. [...]

Figure 1. In vivo micrographs of Spirostomum semivirescens specimens collected from the UK and Sweden. A: S. semivirescens collected from Sweden. Note the long moniliform macronucleus (arrow) running along the center of the ciliate. The cell is packed with endosymbiotic green algae, a diagnostic characteristic. Scale bar 200 µm. B: S. semivirescens collected from the UK. The cell is shown here after leaving it undisturbed for a few hours on a counting chamber, as evident from the thick casing (arrow) it has produced. C: Nuclear apparatus of S. semivirescens from a UK cell. Note the nodes of the macronucleus, and the small micronucleus at the top of the oval shapes. The densely packed endosymbiotic green algae are clearly in view in this cell. D: S. semivirescens collected from the UK. This specimen has begun to build its coating, which is the thin layer (arrow) around the center of the ciliate.

Table 1. Transcriptome data generated. The sum of the pro- and eukaryotic fractions of contigs does not reach 100%. This is due to the high number of contigs for which DIAMOND could not find any hit in the nr database, so that no MEGAN assignment could be made.

Species | Sampling site | Contigs* | Prokaryotic contigs (%)** | Eukaryotic contigs (%)**

Since the IQ-TREE package contains a wider selection of evolutionary models to choose from and is reported to often find topologies with higher likelihoods (Nguyen et al. 2015) than RAxML, the bootstrap values from IQ-TREE were mapped onto the Bayesian tree (Fig. 2). S. subtilis was placed as the deepest branching taxon in the genus Spirostomum, as seen before in Boscaro et al. [...]

[...] which further supports the view that heterotrichs encode rquA within their genomes. Therefore we suggest that the rquA genes identified in this study are highly unlikely to be contamination.

[...] (Esteban et al. 2009a) and this site is known to be a hotspot of ciliate biodiversity, with sampling efforts often revealing S. semivirescens. The fen habitat is densely wooded and dimly lit, with temporary ponds rich in organic sediment. The ditch had similar parameters, and was about 100 meters away from the fen. Oxygen levels were very low (<5%). The sediment-water interface was sampled using a corked 500 mL caged sample bottle on a line. The corked line was pulled once the apparatus had sunk, to allow water and sediment from the desired oxygen-depleted depths to be collected. The areas sampled in the fen pond and the ditch had a depth of less than 30 cm. Subsamples of 1 mL were observed in a Sedgewick Rafter chamber. Many cells were encountered and examined, with densities of up to 15 cells per mL of sediment subsample.

S. semivirescens cells collected from this location were hand-picked under a dissecting microscope using a micropipette, and were stored in RNAlater (Thermo Fisher Scientific) for transport to Uppsala University, Sweden, for transcriptome analyses. cDNA synthesis (see below) was [...]

[...] MEGAN v5.8.3 (Huson et al. 2007), whose contig assignments were used to estimate the fraction of the data originating from the host, algae or prokaryotes.
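The way per-category fractions follow from contig assignments can be sketched in a few lines of Python. This is a minimal illustration only: the input format (a mapping from contig identifier to a broad category), the category labels and the toy numbers are assumptions made for the example, not the export format of MEGAN or data from this study. It also shows why such fractions need not sum to 100% when some contigs receive no assignment.

```python
from collections import Counter


def contig_fractions(assignments, total_contigs):
    """Return the percentage of contigs per category, relative to ALL contigs.

    `assignments` maps contig id -> category and only contains contigs that
    received a taxonomic assignment (hypothetical input format).
    """
    counts = Counter(assignments.values())
    return {category: 100.0 * n / total_contigs for category, n in counts.items()}


# Hypothetical toy data, not values from the study.
assignments = {
    "contig_1": "eukaryote",
    "contig_2": "eukaryote",
    "contig_3": "prokaryote",
    "contig_4": "algae",
}
total_contigs = 10  # six contigs had no database hit, hence no assignment

fractions = contig_fractions(assignments, total_contigs)
unassigned = 100.0 - sum(fractions.values())
print(fractions, unassigned)
```

Dividing by the total number of contigs rather than by the number of assigned contigs is a deliberate choice here: it keeps the unassigned share visible instead of silently rescaling the other categories.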

Identification of anaerobic respiration pathway: Anaerobic respiration proteins previously found in other eukaryotes (Müller et al. 2012; Stairs et al. 2015) were searched for in the transcriptomes via tblastn. To search for the presence of hydrogenosomes, [FeFe] hydrogenase, pyruvate:ferredoxin oxidoreductase and the maturase proteins HydE, HydF and HydG were used as queries. Both pyruvate formate lyase and the enzyme that activates this protein were searched for to detect pyruvate formate lyase activity. Nitrate reductase, fumarase and RquA were also used as queries to detect other anaerobic pathways.
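As an illustration of how candidate hits from such a search might be screened, the following Python sketch filters BLAST tabular output (the standard 12-column layout produced by BLAST+ with `-outfmt 6`) by e-value. The cutoff of 1e-5, the query and contig names, and the example lines are assumptions made for the sketch; the original text does not state which thresholds or output format were used.

```python
import csv
import io

# Standard 12-column BLAST tabular layout (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_hits(handle, max_evalue=1e-5):
    """Yield (query, contig, evalue) for hits at or below an assumed e-value cutoff."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue <= max_evalue:
            yield hit["qseqid"], hit["sseqid"], evalue


# Made-up example lines, not results from the study.
example = io.StringIO(
    "RquA_query\tcontig_17\t48.2\t210\t100\t3\t5\t214\t30\t659\t2e-40\t150\n"
    "HydE_query\tcontig_99\t25.0\t60\t40\t2\t1\t60\t100\t279\t0.5\t30.1\n"
)
hits = list(candidate_hits(example))
print(hits)
```

A hit that passes such a filter is only a candidate; as the text describes for rquA, contamination still has to be ruled out by other means such as phylogenetic placement.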

Phylogenetic analysis: The rRNA sequences used in the phylogenies were identified with Barrnap (Seemann 2013). The ciliate sequences used to infer the phylogeny (Supplementary Material Table S1) were gathered by downloading all Spirostomum sequences available in the SILVA database and all sequences generated by Shazib et al. (2014). The algal sequences were gathered by using the identified 28S rRNA gene from S. semivirescens as a seed in a blastn search against the NCBI nt database. CD-HIT v4.6.6 (Li and Godzik 2006) was used to remove identical sequences. Multiple sequence alignments were produced by MAFFT X-INS-i (Katoh 2002), where the CONTRAfold algorithm (Do et al. 2006) was used for pairwise structural alignment. The multiple sequence alignments were manually curated. BMGE was used to trim the curated alignment (Criscuolo and Gribaldo 2010). The Bayesian inference tree topology was calculated with PhyloBayes v1.5a (Lartillot and Philippe 2004) using the CAT + GTR model. Four chains were used, and both trees were run until the maxdiff value calculated by the PhyloBayes bpcomp command was below 0.1. Burn-in was selected by monitoring the -log likelihood plotted against generation. For the ciliate tree, 13,000 generations were generated and the burn-in was set to 1,000. For the algal tree, 37,000 generations were generated and the burn-in was set to 1,000. Maximum likelihood trees were calculated with IQ-TREE (Nguyen et al. 2015) using the TIM + R2 model for the ciliate and the TN + R3 model for the algae. The model tester in the IQ-TREE package selected the models for the maximum likelihood trees according to the Bayesian Information Criterion. Two long branches that could potentially produce artifacts in the tree topology were removed from both the ciliate and the algal phylogenies.
To rule out that the rquA sequences identified in the tblastn search were contamination, we repeated the phylogenetic analysis of Stairs et al. (2018). Additional sequences added to this phylogeny were the potential rquA sequences identified in this study and a potential rquA sequence from the [...] LG + C50 model that was selected by the Bayesian Information Criterion.
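Model selection by the Bayesian Information Criterion can be illustrated with a short Python sketch. It uses the textbook definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites and L the maximised likelihood, and picks the candidate with the lowest score. The model names, log-likelihoods, parameter counts and alignment length below are made-up values for illustration; this is not the IQ-TREE implementation, and the numbers are not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return the name of the candidate with the lowest BIC.

    `candidates` maps model name -> (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Hypothetical candidates: (log-likelihood, free parameters).
candidates = {
    "model_A": (-10250.0, 60),
    "model_B": (-10200.0, 66),
    "model_C": (-10195.0, 90),
}
chosen = best_model(candidates, n_sites=1500)
print(chosen)
```

In this toy case the richest model has the best likelihood but is not selected, because the k ln(n) term penalises its extra parameters more than the small gain in fit is worth.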