Genome sequence of the sulfur-oxidizing Bathymodiolus thermophilus gill endosymbiont

Bathymodiolus thermophilus, a mytilid mussel inhabiting the deep-sea hydrothermal vents of the East Pacific Rise, lives in symbiosis with chemosynthetic Gammaproteobacteria within its gills. The intracellular symbiont population synthesizes nutrients for the bivalve host using the reduced sulfur compounds emanating from the vents as an energy source. As the symbiont is uncultured, comprehensive and detailed insights into its metabolism and its interactions with the host can only be obtained from culture-independent approaches such as genomics and proteomics. In this study, we report the first draft genome sequence of the sulfur-oxidizing symbiont of B. thermophilus, here tentatively named Candidatus Thioglobus thermophilus. The draft genome (3.1 Mb) harbors 3045 protein-coding genes. It encodes pathways for the use of sulfide and thiosulfate as energy sources as well as the Calvin-Benson-Bassham cycle for CO2 fixation. Enzymes required for the synthesis of the tricarboxylic acid cycle intermediates oxaloacetate and succinate were absent, suggesting that these intermediates may be substituted by metabolites from external sources. We also detected a repertoire of genes associated with cell surface adhesion, bacteriotoxicity and phage immunity, which may perform symbiosis-specific roles in the B. thermophilus symbiosis.


Introduction
Chemoautotrophic bacteria form the base of the food chain in deep-sea hydrothermal vent ecosystems [1,2]. Many of these chemoautotrophs live in highly integrated symbiotic associations with invertebrate hosts, such as mussels, clams and tube worms, which enables megafaunal communities to thrive in the otherwise uninhabitable vent ecosystem [3][4][5].
The mussel Bathymodiolus thermophilus, for example, a bivalve belonging to the family Mytilidae, densely populates the hydrothermal vent fields of the Galapagos Rift and of the East Pacific Rise between the latitudes 13°N and 21°S [6]. Although the animal's food groove and digestive tract are reduced [6], B. thermophilus appears to be able to ingest and assimilate suspended particles by filter feeding [7].
The major part of the bivalve's nutrition, however, is derived from its chemosynthetic symbionts [4,8]. The sulfur-oxidizing bacteria live within specialized gill cells, so-called bacteriocytes [9]. Provided with a steady supply of reduced sulfur from the vents, these symbionts synthesize organic compounds and thus feed their host [10,11].
Investigations into the symbiont's physiology have hitherto been limited by the inaccessibility of mussel samples and the failure to culture the symbionts in vitro. The metabolic pathways that underlie the putative exchange of nutrients between the symbiotic partners therefore remain unexplored. However, culture-independent methods, such as direct genomic, transcriptomic or proteomic analyses of symbiont-containing tissue or of enriched symbiont fractions, have provided useful physiological information about various uncultured marine symbionts in the past [12][13][14][15][16][17]. In this study we used symbiont-enriched preparations from B. thermophilus gill tissue to assemble the first draft genome of the B. thermophilus symbiont in order to gain preliminary insights into its metabolic potential.

Organism information
Classification and features
B. thermophilus symbiont cells are coccoid or rod-shaped (Fig. 1). In electron micrographs, they typically appear as roundish forms, whose central region is light or transparent (looking "empty"), while the outermost regions of the cytoplasm are darker and more structured (Fig. 1 and [9]). Like most sulfur-oxidizing (thiotrophic) bivalve symbionts [4,18], the B. thermophilus symbiont has a Gram-negative cell wall. With a diameter of 0.3-0.5 μm, B. thermophilus symbiont cells are similar in size to thiotrophic symbionts from other Bathymodiolus host species [19][20][21][22], and notably smaller than sulfur-oxidizing symbionts from other invertebrate hosts [4,23]. In the host tissue, the symbionts are usually enveloped in large vacuoles. Groups of up to 20 symbionts within a single host vacuole have previously been reported by Fisher and colleagues [9]. Imaging of purified symbiont fractions from homogenized B. thermophilus gill tissue revealed, besides a large number of free symbiont cells, some intact vacuoles encompassing multiple symbionts (Fig. 1).
B. thermophilus symbionts reside intracellularly in bacteriocytes in their host's gill tissue. Unlike some other Bathymodiolus species, such as B. azoricus, which maintains a dual symbiosis with both sulfur-oxidizing and methane-oxidizing bacteria [24], B. thermophilus hosts only one type of bacterial endosymbiont. Based on 16S rRNA gene similarity [25], this sulfur-oxidizing symbiont population in B. thermophilus belongs to a single phylotype.
The B. thermophilus symbiont is a member of the Gammaproteobacteria (NCBI taxonomy ID 2360). It is closely related to symbionts of other Bathymodiolus species, and more distantly related to symbionts of other invertebrate hosts and to free-living Gammaproteobacteria from various marine habitats [26]. The B. thermophilus symbiont falls in a well-supported clade consisting of symbionts of other mytilid and vesicomyid bivalves and free-living gammaproteobacterial clones from marine vents and other submarine volcanic sites as shown in Fig. 2. Its closest relatives are the 'Bathymodiolus aff. Thermophilus thioautotrophic gill symbiont' from 32°N EPR (NCBI taxonomy ID 363574; 99.85% similarity on the 16S rRNA level) and the Bathymodiolus brooksi symbiont from the Gulf of Mexico (NCBI taxonomy ID 377144; 99.53% similarity). According to our analysis, the B. thermophilus symbiont is only remotely similar to the sulfur-oxidizing symbionts of deep-sea vestimentiferan tube worms (90% 16S rRNA similarity) and of shallow water lucinid clams (87-90% similarity, see Fig. 2).
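The 16S rRNA similarity values quoted above are pairwise identities between aligned sequences. As a minimal sketch of one common way to compute such a value, assuming two sequences that have already been aligned to equal length with `-` as the gap character, percent identity over gap-free columns could be calculated as follows. The function name and the toy sequences are illustrative only and are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment and therefore
    have the same length. Columns containing a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # compared columns with identical bases
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences): 9 gap-free columns, 8 of them identical.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

Whether gapped columns are skipped or counted as mismatches changes the result, so identity values from different tools are only comparable when the same convention is used.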
The B. thermophilus symbiont's closest cultured relative is the free-living Candidatus Thioglobus autotrophica, whose genome was recently sequenced [27]. The metabolic properties of both bacteria appear to be highly similar, as predicted from their genomes. Like the B. thermophilus symbiont genome presented in this study, the Ca. T. autotrophica genome encodes an incomplete TCA cycle. The high degree of 16S rRNA gene sequence similarity between Ca. T. autotrophica and the B. thermophilus symbiont (95%) suggests that both belong to the same genus. We therefore propose the tentative name Candidatus Thioglobus thermophilus for the thiotrophic B. thermophilus symbiont.
A summary of key features of Ca. T. thermophilus is given in Table 1.

Fig. 1 Transmission electron micrographs of Candidatus Thioglobus thermophilus. B. thermophilus gill tissue was homogenized in a glass tissue grinder and subjected to crude density gradient centrifugation using Histodenz™ gradient medium. Subsamples were taken from two visible bands and fixed for electron microscopy (a and b). Both subsamples contained numerous free symbiont cells (S) as well as some intact host vacuoles (V) containing several symbiont cells, besides various other cellular components and host tissue debris. L: Lipid drop or mucus. Scale bar: 5 μm. Electron microscopy method details: samples were fixed in a) 1% glutaraldehyde, 2% paraformaldehyde in IBS (imidazole-buffered saline; 0.49 M NaCl, 30 mM MgSO4·7H2O, 11 mM CaCl2·2H2O, 3 mM KCl, 50 mM imidazole) and b) in 2.5% glutaraldehyde, 1.25% paraformaldehyde in IBS. After embedding in low-gelling agarose and postfixation in 1% osmium tetroxide in cacodylate buffer (0.1 M cacodylate; pH 7.0), samples were dehydrated in a graded ethanol series (30 to 100%) and embedded in a mixture of Epon and Spurr (1:2). Sections were cut on an ultramicrotome (Reichert Ultracut, Leica UK Ltd., Milton Keynes, UK), stained with 4% aqueous uranyl acetate for 5 min followed by lead citrate for 1 min and analyzed with a transmission electron microscope LEO 906 (Zeiss, Oberkochen, Germany).

Genome project history
The genome of Candidatus Thioglobus thermophilus was sequenced to get a comprehensive insight into the metabolic potential of the bacterium. This project is part of a larger effort to compare the symbiont genomes from various Bathymodiolus species across different vent habitats in order to understand the possible effects of vent geochemistry in shaping host-symbiont evolution in Bathymodiolus. Sequencing and assembly of the symbiont genome were conducted at the Göttingen Genomics Laboratory (University of Göttingen, Germany) and at the Max-Planck-Institute of Marine Microbiology (Bremen, Germany), respectively. The sequences have been deposited in GenBank under the accession number MIQH00000000. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
Symbionts for genome sequencing were isolated from one single B. thermophilus host individual, which was collected during the R/V Atlantis cruise AT26-10 in January 2014. The mussel was collected from a diffuse-flow vent at the Tica vent field on the East Pacific Rise at 9°50.39′ N, 104°17.49′ W by the remotely operated vehicle (ROV) Jason. After recovery, the animal was dissected on board the research vessel and gill tissue was removed and homogenized in 1× PBS buffer (Dulbecco's Phosphate Buffered Saline, Sigma-Aldrich order no. D5773). The resulting homogenate was diluted with 1× PBS (ratio 1:3) and subjected to multiple centrifugation steps (differential pelleting): In a first centrifugation step (500 × g, 5 min, 4°C in a tabletop centrifuge using a swing-out rotor), crude host tissue debris and host cell nuclei were removed from the homogenate. The supernatant was centrifuged again (step 2) as described above to pellet residual host nuclei. The new supernatant was now centrifuged at maximum speed (step 3), i.e. at 15,000 × g for 20 min at 4°C using a fixed-angle rotor. The resulting pellet contained enriched bacterial cells and was immediately frozen at −80°C until genomic DNA preparation.
Genomic DNA was isolated from the purified bacteria using the MasterPure DNA Purification Kit (Epicentre) as recommended by the manufacturer.

Fig. 2 Phylogenetic tree of Candidatus Thioglobus thermophilus and related free-living and host-associated sulfur oxidizers. Ca. T. thermophilus, the thiotrophic symbiont of Bathymodiolus thermophilus, is displayed in bold. The tree was inferred from closely related 16S rRNA gene sequences obtained from the SILVA database using the SILVA Incremental Aligner (SINA) [51] and was estimated with the 16S rRNA sequence of 46 bacteria. The final alignment covered 1138 nucleotides. Sequence alignment and phylogenetic analysis were performed using the MEGA7 software tool [52]. The phylogenetic tree was constructed using the Maximum Likelihood method based on the Tamura-Nei model implemented in MEGA7 [53]. Branch bootstrap support values were calculated using 1000 replicates and are displayed as circles (black: ≥ 90%, white: ≥ 60%). For the sake of clarity some organisms were merged into groups (wedges): a uncultured clones (KC682721, KC682765, JQ678401, AB193934); b whale fall symbionts (HE814589, HE814588, HE814591, HE814585); c uncultured clones (FM246509, FM246513); d uncultured clones (JQ678344, JQ678392); e Mytilidae symbionts (AM503921, AM503923); f Vesicomyidae symbionts (EU403432, EU403431, CP000488* 1081274-1082807, AP009247* 948400-949934); g Lucinidae symbionts (X84979, M99448, M90415); h tube worm symbionts (NZ_AFOC01000137* 503-2033, DQ660821, NZ_AFZB01000059* 4132-5662). The lucinid clam symbionts, the vestimentiferan tube worm symbionts, and the free-living Thiomicrospira crunogena XCL-2 were included as outgroup. Branches that are not highlighted by colors represent free-living relatives. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. *These NCBI accession numbers refer to whole genome submissions and not to individually submitted 16S rRNA gene sequences (start and stop positions of the 16S rRNA gene are given after the asterisk). JdFR: Juan de Fuca Ridge, EPR: East Pacific Rise, MAR: Mid-Atlantic Ridge, OMZ: oxygen minimum zone, MFZ: Mendocino Fracture Zone, SBB: Santa Barbara Basin, WH: Woods Hole.

Genome sequencing and assembly
Sequencing of the B. thermophilus symbiont genome was performed at the Göttingen Genomics Laboratory using the Illumina Genome Analyzer IIx. A Nextera shotgun library was generated for a 112 bp paired-end sequencing run. Sequencing resulted in 7,569,934 paired-end reads. Adaptors were removed from the reads, which were then quality-trimmed (Q = 2) with BBDuk and error-corrected with BBnorm (V35, sourceforge.net/projects/bbmap). The resulting reads were assembled with IDBA-UD [28]. To bin the symbiont genome from the metagenome assembly, we used gbtools [29] based on GC content and sequencing coverage. The corrected reads were mapped against the symbiont genome bin with BBmap and reassembled with SPAdes v. 3.1.1 [30]. This assembly resulted in 1341 contigs longer than 200 bp (1281 scaffolds). The completeness and contamination of the genome were estimated with CheckM [31]. The CheckM test showed 96.98% completeness of the genome with 11.32% contamination and 81.40% strain heterogeneity.

Table footnote: Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [50].
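The binning step described above separates symbiont contigs from the rest of the metagenome assembly by GC content and sequencing coverage. The following is a minimal sketch of that idea only: it does not reproduce gbtools, and the `Contig` class, the function names and all threshold values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Contig:
    name: str
    sequence: str
    coverage: float  # mean read depth across the contig


def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def select_bin(
    contigs: List[Contig],
    gc_range: Tuple[float, float],
    cov_range: Tuple[float, float],
    min_length: int = 200,
) -> List[Contig]:
    """Keep contigs longer than min_length whose GC content and coverage
    both fall inside the given (inclusive) windows."""
    gc_lo, gc_hi = gc_range
    cov_lo, cov_hi = cov_range
    return [
        c
        for c in contigs
        if len(c.sequence) > min_length
        and gc_lo <= gc_content(c.sequence) <= gc_hi
        and cov_lo <= c.coverage <= cov_hi
    ]


# Toy data (hypothetical): only "c1" has suitable GC, coverage and length.
toy_contigs = [
    Contig("c1", "AATGC" * 60, coverage=40.0),  # GC 0.40, 300 bp
    Contig("c2", "GGCCA" * 60, coverage=40.0),  # GC 0.80, wrong GC
    Contig("c3", "AATGC" * 60, coverage=5.0),   # coverage too low
    Contig("c4", "AATGC" * 10, coverage=40.0),  # only 50 bp
]
symbiont_bin = select_bin(toy_contigs, gc_range=(0.30, 0.45), cov_range=(20.0, 80.0))
```

In practice the GC and coverage windows are chosen by inspecting a GC-coverage plot of the assembly rather than fixed in advance, which is why they are left as parameters here.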

Genome properties
The properties of this genome are summarized in Table 3. The draft genome of the sulfur-oxidizing B. thermophilus symbiont contained 3,088,407 bp in 1281 scaffolds >200 bp. The average GC content was 37.7%. A total of 3097 genes were predicted, of which 3045 (98.3%) are predicted protein-encoding genes. The remaining 1.5% and 0.2%, respectively, consisted of RNA genes and pseudo genes. Of the protein-encoding genes, 54.5% and 65.2% were affiliated to COG- and Pfam-based functions, respectively. For an overview of predicted COG categories see Table 4.

Table footnotes: Genes with function prediction are all 3045 protein-coding genes minus those 994 genes annotated as "hypothetical proteins" that have no COG category or fall into the COG categories "unknown function" or "general function prediction only" and that have no Pfam domain or a Pfam "domain of unknown function". c Includes genes for which a signal peptide was predicted with at least two of the three tools used. Percentages of genes with function prediction, COGs, Pfam domains, signal peptides and transmembrane helices were calculated against a total of 3045 protein-coding genes.
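The percentages reported above follow directly from the gene counts given in the text and the table footnotes. As a small worked check, the sketch below recomputes them from those counts; the constant and function names are illustrative, and the share of genes with function prediction is derived here from the footnote definition rather than quoted from the text.

```python
# Counts taken from the text and table footnotes above.
TOTAL_GENES = 3097       # all predicted genes
PROTEIN_CODING = 3045    # predicted protein-coding genes
HYPOTHETICAL = 994       # "hypothetical proteins" without usable annotation


def percent(part: int, whole: int, digits: int = 1) -> float:
    """Share of `part` in `whole`, as a percentage rounded to `digits`."""
    return round(100.0 * part / whole, digits)


# Protein-coding genes relative to all predicted genes.
protein_coding_pct = percent(PROTEIN_CODING, TOTAL_GENES)

# RNA genes plus pseudogenes make up the remainder.
other_pct = percent(TOTAL_GENES - PROTEIN_CODING, TOTAL_GENES)

# Genes with function prediction, as defined in the table footnote,
# relative to the protein-coding genes.
with_function = PROTEIN_CODING - HYPOTHETICAL
with_function_pct = percent(with_function, PROTEIN_CODING)
```

The remainder of about 1.7% agrees with the 1.5% RNA genes and 0.2% pseudo genes stated in the text.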

Insights from the genome sequence
Sulfur-oxidizing symbionts of Bathymodiolus species are assumed to be horizontally transmitted, i.e., they supposedly enter their bivalve hosts from a free-living bacterial population in the environment, rather than being transferred from one mussel generation to the next [39]. The idea of a putative free-living stage of the symbiont in the hydrothermal vent environment is in accordance with our genome analysis: Unlike some insect symbionts, which are obligatorily dependent on their hosts and have a diminished genome [40], the B. thermophilus symbiont genome (3.1 Mb in size, see above) is not reduced. With the exception of the tricarboxylic acid cycle, which lacks three enzymes (see below), all necessary pathways for a host-independent lifestyle appear to be complete in the B. thermophilus symbiont's genome.

Energy generation
The B. thermophilus symbiont uses reduced sulfur compounds such as sulfide and thiosulfate as its major energy sources [10]. As predicted from the genome sequence, sulfide and thiosulfate are oxidized to sulfate via the rDSR-APS-Sat pathway and the Sox multienzyme-complex, respectively. Oxygen and nitrate are used as final electron acceptors. Complete gene sets for these pathways are present in the symbiont genome.

CO2 fixation and carbon metabolism
The B. thermophilus symbiont genome furthermore encodes a modified version of the CO2-fixing Calvin-Benson-Bassham cycle: while the genes for sedoheptulose-1,7-bisphosphatase and fructose-1,6-bisphosphatase are missing, a pyrophosphate-dependent 6-phosphofructokinase is encoded, which potentially replaces the two other functions (as also described for the endosymbionts of Calyptogena magnifica [12], Riftia pachyptila [13] and Olavius algarvensis [16]). The B. thermophilus symbiont's TCA cycle is incomplete, as the enzyme 2-oxoglutarate dehydrogenase is missing. Moreover, homologs of the enzymes malate dehydrogenase and succinate dehydrogenase are also lacking, similar to what was reported for the thiotrophic B. azoricus symbiont [17].

Table footnote: The percentage is based on a total of 3045 protein-coding genes.
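The TCA cycle gaps described above amount to a comparison between the enzymes a pathway requires and the enzymes found in the annotation. A minimal sketch of such a completeness check is shown below. The enzyme list is the textbook TCA cycle, and the set of annotated enzymes is an assumption constructed from the statement that exactly three enzymes are missing; it is not the actual annotation of the draft genome.

```python
from typing import Iterable, List

# Textbook TCA cycle enzymes, in pathway order.
TCA_CYCLE = [
    "citrate synthase",
    "aconitase",
    "isocitrate dehydrogenase",
    "2-oxoglutarate dehydrogenase",
    "succinyl-CoA synthetase",
    "succinate dehydrogenase",
    "fumarase",
    "malate dehydrogenase",
]

# Assumption for illustration: every enzyme not reported as missing
# in the text is treated as present in the genome annotation.
ANNOTATED = {
    "citrate synthase",
    "aconitase",
    "isocitrate dehydrogenase",
    "succinyl-CoA synthetase",
    "fumarase",
}


def missing_steps(pathway: Iterable[str], annotated: Iterable[str]) -> List[str]:
    """Return the pathway enzymes absent from the annotation, in pathway order."""
    present = set(annotated)
    return [enzyme for enzyme in pathway if enzyme not in present]


tca_gaps = missing_steps(TCA_CYCLE, ANNOTATED)
tca_complete = not tca_gaps
```

A name-based comparison like this is only as reliable as the annotation behind it: a divergent homolog or an enzyme split across contig boundaries in a draft assembly would also show up as a gap.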

Nitrogen metabolism
The B. thermophilus symbiont possesses genes for assimilatory nitrate reduction, i.e. for nitrogen uptake from nitrate. Its genome also encodes the Nar complex, a membrane-bound respiratory nitrate reductase necessary for respiratory reduction of nitrate, indicating that nitrate can be used as an alternative electron acceptor besides oxygen. Several membrane transporters for the uptake of nitrate, nitrite and ammonia are also encoded.

Immunity and cell surface interactions
Of the 3045 protein-coding genes, 10.74% are predicted to contain Pfam domains related to bacterial cell surface adhesion, such as bacterial Ig-like domain proteins and cadherins, and to putative toxins, such as pore-forming RTX and MARTX cytotoxins. Another 2.17% of the protein-coding genes were associated with immunity against phages (CRISPR-Cas, restriction modification system and the Abi toxin-antitoxin system). This extensive repertoire of genes typically associated with pathogenicity and phage defense was also observed in the related thiotrophic B. azoricus symbiont [17,41]. This particular feature of Bathymodiolus symbionts is surprising since the bacteria a) reside in shielded intracellular niches, b) are beneficial symbionts for their host, and c) are not related to any known pathogen [26,41]. Moreover, approximately 1.71% of the protein-coding B. thermophilus symbiont genes belonged to several classes of pathogenic and digestive peptidases. Membrane transporters of type I and type II secretion systems, which transport toxins and folded exoproteins such as peptidases, are also encoded. Although their exact roles have not yet been determined, we postulate that these pathogenicity-related genes may be involved in protecting the symbionts against pathogens or phages or even perform symbiosis-specific functions, such as symbiont attachment to the host or defense against the host's immune system, as suggested previously [41].

Conclusions
Sequencing of the uncultured B. thermophilus symbiont's genome allowed preliminary insights into its genomic characteristics and metabolic potential. Candidatus Thioglobus thermophilus appears to solely rely on sulfide and thiosulfate as energy sources, as genes for the oxidation of other reduced compounds were absent from its genome. The absence of three genes encoding essential TCA cycle enzymes, which was recently also reported for the thiotrophic B. azoricus symbiont [17], may suggest that these genes are consistently missing in Bathymodiolus symbionts. The unusual presence of a repertoire of genes associated with cell adhesion, toxin production and phage immunity in the non-pathogenic B. thermophilus symbiont may point to a symbiosis-specific beneficial role of these functions other than pathogen defense.

Authors' contributions
RP submitted the genome, performed genome and phylogenetic analysis and drafted the manuscript. RP and HF collected mussel samples and purified the symbiont fractions. SMS was the Chief Scientist on the cruise and coordinated the sample collection. MK and StM developed the symbiont enrichment procedure. RS performed the electron microscopy. AT and RD conducted DNA isolation and genome sequencing. LS, MK and SEH assembled the genome, LS binned the genome and conducted quality control tests. LS and SEH helped with phylogenetic analyses. TS and StM supervised and coordinated the entire project. All authors reviewed and approved the final version of the manuscript.