SoxY gene family expansion underpins adaptation to diverse hosts and environments in symbiotic sulfide oxidizers

ABSTRACT Sulfur-oxidizing bacteria (SOB) have developed distinct ecological strategies to obtain reduced sulfur compounds for growth. These range from specialists that can only use a limited range of reduced sulfur compounds to generalists that can use many different forms as electron donors. Forming intimate symbioses with animal hosts is another highly successful ecological strategy for SOB, as animals, through their behavior and physiology, can enable access to sulfur compounds. Symbioses have evolved multiple times in a range of animal hosts and from several lineages of SOB. They have successfully colonized a wide range of habitats, from seagrass beds to hydrothermal vents, with varying availability of symbiont energy sources. Our extensive analyses of sulfur transformation pathways in 234 genomes of symbiotic and free-living SOB revealed widespread conservation in metabolic pathways for sulfur oxidation in symbionts from different host species and environments, raising the question of how they have adapted to such a wide range of distinct habitats. We discovered a gene family expansion of soxY in these genomes, with up to five distinct copies per genome. Symbionts harboring only the “canonical” soxY were typically ecological “specialists” that are associated with specific host subfamilies or environments (e.g., hydrothermal vents, mangroves). Conversely, symbionts with multiple divergent soxY genes formed versatile associations across diverse hosts in various marine environments. We hypothesize that expansion and diversification of the soxY gene family could be one genomic mechanism supporting the metabolic flexibility of symbiotic SOB enabling them and their hosts to thrive in a range of different and dynamic environments. IMPORTANCE Sulfur metabolism is thought to be one of the most ancient mechanisms for energy generation in microorganisms. A diverse range of microorganisms today rely on sulfur oxidation for their metabolism. They can be free-living, or they can live in symbiosis with animal hosts, where they power entire ecosystems in the absence of light, such as in the deep sea. In the millions of years since they evolved, sulfur-oxidizing bacteria have adopted several highly successful strategies; some are ecological “specialists,” and some are “generalists,” but which genetic features underpin these ecological strategies are not well understood. We discovered a gene family that has become expanded in those species that also seem to be “generalists,” revealing that duplication, repurposing, and reshuffling existing genes can be a powerful mechanism driving ecological lifestyle shifts. Sulfur metabolism is thought to be one of the most ancient mechanisms for energy generation in microorganisms. A diverse range of microorganisms today rely on sulfur oxidation for their metabolism. They can be free-living, or they can live in symbiosis with animal hosts, where they power entire ecosystems in the absence of light, such as in the deep sea. In the millions of years since they evolved, sulfur-oxidizing bacteria have adopted several highly successful strategies; some are ecological “specialists,” and some are “generalists,” but which genetic features underpin these ecological strategies are not well understood. We discovered a gene family that has become expanded in those species that also seem to be “generalists,” revealing that duplication, repurposing, and reshuffling existing genes can be a powerful mechanism driving ecological lifestyle shifts.

(1-3).Sulfur-oxidizing bacteria (SOB), which use sulfur oxidation to generate energy, are remarkably diverse and are widespread in nature (1,2).Although they can be found worldwide, their simultaneous reliance on reduced sulfur compounds and oxidants such as oxygen means that they are restricted to the limited habitats where both co-occur.Presumably, competition among diverse SOB for the limited environments they can thrive in is intense.SOB have evolved a range of successful strategies to access oxidants and reductants, attuned to various microniches within these environments.These range from the remarkable morphology and migration behavior of some giant sulfur bacteria such as Beggiatoa sp.(4), to anaerobic growth on electron acceptors other than oxygen, such as in Thioalkalivibrio denitrificans that uses nitrate (5).In addition, many bacteria can transform sulfur compounds without relying exclusively on these for energy, such as Roseobacter spp.(6).This could explain why most environments ideal for sulfur oxidation host diverse communities of sulfur oxidizers and other microorganisms.
Evolving symbiosis with an animal host is also a highly successful way for SOB to access substrates.In symbiosis, SOB are thought to have continuous and simultaneous access to oxygen and reduced sulfur compounds that might otherwise be in limited supply or subject to intense competition among free-living bacteria in the external environment (7)(8)(9).In return, the symbionts use the energy they gain from sulfur oxidation to convert inorganic carbon into nutrients for the host (10).Symbioses with sulfur-oxidizing bacteria have evolved convergently on multiple occasions in at least seven eukaryotic phyla, including ciliates, tube worms, sponges, and lucinid clams (9,(11)(12)(13)(14).Found in a variety of environments, from intertidal sediments to deep-sea hydrother mal vents, these symbioses face, at times, extreme and fluctuating concentrations of reduced sulfur compounds and electron acceptors in the environment (11).Additionally, biotic factors, such as symbiont localization within the host, mode of transmission, and fidelity, further shape these interactions (15).The range of microenvironments experienced by symbiotic SOB is therefore virtually as diverse as the hosts harboring them.The most diverse animal family that hosts symbiotic SOB is Lucinidae, which are marine bivalve clams (16).They are found in virtually all habitats where sulfur-oxidiz ing symbioses occur.Their symbionts are horizontally transmitted, taken up from the environment during host juvenile development (17).Once taken up, they are housed inside epithelial cells of the host gill (18).Considering that the symbionts must survive in the free-living environment, but also be adapted to an extremely intimate symbiosis with their host, lucinids represent an intersection of evolutionary forces, and therefore an excellent opportunity to understand genomic mechanisms of adaptation to hosts and environments.
Within the Gammaproteobacteria, symbiotic SOB comprise at least nine phyloge netically distinct clades (9,19).All encode highly similar sulfur-oxidizing pathways in their genomes (12,20,21).The uniformity of sulfur metabolic pathways across diverse gammaproteobacterial symbiont lineages appears paradoxical in light of substantial variability in types and concentrations of sulfur compounds in their microenvironments, and the range of distinct pathways used by their free-living counterparts (9,22,23).However, the evolutionary forces shaping sulfur-oxidizing pathways within symbiotic SOB are not well understood.
The ability to adapt to novel habitats and shifting conditions can arise not only from the emergence of novel metabolic pathways but also from diversification of existing genetic elements (24).One prominent example in sulfur metabolism is seen in the genes encoding SoxY.SoxY is a sulfur anion carrier protein within the sulfuroxidizing multienzyme complex (Sox pathway) (25,26).The "canonical" SoxY has a highly conserved amino acid sequence in the sulfur-binding "swinging arm" region.In addition to the canonical soxY, the genomes of some SOB encode for divergent copies of the gene.Divergent SoxYs have mutations in the substrate-binding swinging arm region, potentially affecting substrate specificity (27)(28)(29).For example, mutations in the sulfur-binding region of SoxY in Sulfurimonas denitrificans enabled the oxidation of S8, a previously unrecognized Sox pathway substrate (30).
In this study, we investigated the evolution and diversification of the soxY gene family and its potential role in the diversification of SOB engaged in symbiotic associations with marine animals.We used comparative genomics and phylogenomic analyses to reveal sulfur-oxidation genes and pathways in 234 metagenome-assembled genomes (MAGs) of diverse marine symbiotic bacteria.Our findings indicate that although sulfur oxidation pathways are highly conserved across distinct phylogenetic clades of symbiotic SOB, there was a noteworthy expansion of the soxY gene family with some genomes containing as many as five distinct copies of the gene.Interestingly, symbionts that encode only the "canonical" soxY gene exhibited potential specialization for thiosulfate as a substrate and seem to be confined to a very few specialized hosts or environments.In contrast, symbionts with multiple divergent soxY genes form associations with diverse hosts across various environments.The expansion and diversification of the soxY gene family may have enabled symbionts and their hosts to effectively colonize a wider range of diverse environments.

New collections
We collected clams from multiple sites and environments with detailed descriptions in Table S1.Divalinga weberi n = 1 and Anodontia alba n = 1 were collected in beds of Thalassia testudinum in Bocas del Toro, Panama.Both clams were sampled by snorkeling and digging sediment, then the clams were sieved out and stored in RNAlater at −20°C.Euanodontia ovum n = 1 was collected from Lazarus beach on St John's Island, Singapore.The clam was collected by sieving sediment dug up from a seagrass bed during low tide.Further individuals including Loripes orbiculatus n = 11, Loripinus fragilis n = 7, and Lucinella divaricata n = 5 were collected by sieving sediment dug up from seagrass (Cymodocea nodosa) meadows in Piran, Slovenia and fixed immediately in RNAlater (Cat.No. AM7020; Life Technologies, USA) at 4°C overnight and stored at −20°C until extraction.Rugalucina vietnamica n = 5 was collected from an intertidal seagrass bed using a shovel at approx. 10 cm depths.Sediment was sieved through a 1-mm mesh, and bivalves were collected.Thyasira cf.gouldii specimens were sieved (1 mm mesh) out of sediment collected at roughly 15 m water depth using a Petersen grab at Neddy's Harbour, Bonne Bay, Newfoundland, Canada.The T. cf.gouldii specimens were transpor ted to Memorial University, St. John's, Newfoundland, where the gills were dissected, preserved in RNAlater, and stored at −20°C until extraction.

Museum samples
We carefully dissected gill tissue fragments using clean scalpels and forceps from lucinid clam samples (Table S1) at the Natural History Museum in London, UK, and the Florida Museum of Natural History, Gainesville, FL, USA.Access to the Florida Museum collection was graciously provided by Gustav Paulay and Amanda Bemis.All museum collection samples were preserved in 70%-100% ethanol and stored at 13°C-25°C.

DNA extraction and library preparation for metagenome sequencing
DNA was extracted from gill tissue of 29 lucinid and 1 thyasirid clam species from diverse vegetated and unvegetated environments (Table S1).Information such as habitat type, precise location, and depth were recorded when available and are found in Table S1.The DNA was extracted from approximately 1 cm 2 gill fragments using three different extraction methods: the Qiagen DNeasy Blood and Tissue Kit (Cat.No. 69506; Qiagen, USA), the animal tissue protocol from the Analytik Jena Innuprep DNA Mini Kit (Cat.No. 845-KS-1041250, Germany), and phenol-chloroform method.Detailed protocols are described in the Supplemental Text.
Sequencing library preparation and metagenomic and amplicon sequencing were carried out by the Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna.Libraries were constructed using Illumina compatible library prep kits (NEBNext Ultra II FS DNA Library Prep Kit, NEBNext Ultra DNA v2 Library Prep Kit, and Nextera Mate Pair Sample Preparation Kit).All libraries were sequenced with Illumina technology (HiSeq 3000, HiSeq 4000, and NovaSeq 6000) using paired-end settings with read lengths of 100, 150, or 250 bp to generate a minimum of 1,000,000 reads (see Table S1 for more details).

Quality filtering, assembly, and bacterial genome binning
Quality filtering, assembly, and bacterial genome binning were done according to the workflow used in reference 31.Read libraries were trimmed, contamination filtered, and quality checked using BBMap v37.61'sBBDUK (32) feature and the software's adapter and PhiX databases to detect contaminants.Libraries were interleaved as needed for future processing and analysis.Individual read libraries were assembled using SPAdes v3.13.1 (33,34).Read libraries were mapped to the scaffold assemblies using BBMap with default settings.The resulting sam files were converted to bam files with samtools v1.9 (35) and sorted using the anvi-init-bam script from Anvi'o (36).The assembled scaffolds were then binned using a combination of Anvi'o v6.1 or 7.1 using CONCOCT v1.1.0(37), and metaBAT v2.15 (38).All potential bins for each metagenome were then compared using dRep v2.4.2's (39) dereplicate workflow with default settings, and the one considered the best was selected automatically.If no bin was selected, a manual bin selection (using CheckM v1.1.3 [40] with Gammaproteobacteria gene set) and revision of the best metagenome-assembled genome using Anvi'o's "anvi-refine" were attempted to manually improve the quality of the MAGs.Scripts for this workflow can be found in the supplemental material from reference 22.
Further MAGs of symbionts of lucinids and other marine invertebrates were retrieved from publicly available assemblies deposited in the NCBI database as well as MAGs of free-living bacteria Allochromatium vinosum (41), Sedimenticola thiotaurini (42), and Sedimenticola selenatireducens (43) (Table S2).To reduce the number of possible false negatives in gene searches, all symbiont genomes were analyzed for genome com pleteness and contamination using CheckM (40).The CheckM analysis was performed according to the default taxonomic-specific workflow using Gammaproteobacteria as the taxonomic rank.The minimum required completeness and contamination for SoxY analyses were above 90% and below 5%, respectively, with a few exceptions important for taxonomic diversity listed in Table S2 resulting in a total of 234 genomes.While interpreting the data from the MAGs of lower completeness during presence/absence analysis of sulfur genes, we considered that the absence may have been caused by lower completeness.Furthermore, two of thyasirid symbiont MAGs, belonging to the same species, showed substantial contamination (5.06%-6.65%);therefore, the interpretation of sulfur gene presence results for these MAGs was considered estimates.

Annotation and analysis of sulfur oxidation metabolism
Putative sulfur genes from all symbiont genomes were identified using HMSS2 (44) run on default settings and using blastp (BLAST+ 2.14.0) (45) search using known identified sulfur proteins from A. vinosum as a query with a maximal e-value 10e−30 and 10e−6 for short proteins under 120 amino acids (Table S3).

Data collection
To identify soxY genes, all MAGs and genome assemblies were first annotated using Prodigal v2.6.3 using default settings (46), and the predicted protein sequences queried with the SoxY HMM model obtained from the eggNOG database v5.0 (47) (accessed January 2022) and using the hmmsearch tool in HMMER v3.3.2 (48) with an e-value cutoff of 10 −5 (based on the observed sequence identity of SoxY and SoxZ proteins).To reconstruct the SoxY phylogeny, sequences from a reference proteomes alignment of SoxY (PF13501) covering 55% of SoxY sequences (RP55) in representative proteome groups containing similar proteomes were calculated based on co-membership in UniRef50 clusters resulting in 1,673 sequences (obtained on 30 August 2022).Moreover, 44 viral SoxY sequences (49) were also included by search of MAGs as described for symbiont sequences.The final list of putative SoxY sequences was created by combin ing the SoxYs from symbiotic and viral MAGs and RP55 SoxY sequences and further verification for the presence of SoxY (PF13501) functional domain using the InterProScan (50) search on default settings.To optimize the usage of computational resources, the number of total sequences was reduced from 2,291 to 1,960 by the deletion of identi cal proteins using the deduplicate sequences function (rmdup) in seqkit v2.2.0 (51).Sequences of poor quality (less than 140 aa length, no "GGC'' functional motif ) were deleted resulting in a total of 1,631 sequences.

Phylogenetic analysis of SoxY sequences
The final set of full-length SoxY sequences was aligned using hmmalign (48) (default settings), and the SoxY model retrieved from eggNOG database (47).The alignment was visualized using MEGA X (52), and poorly aligned regions were manually identified and trimmed.During manual inspection of the alignment, a further 31 low-quality sequences were removed resulting in a final number of 1,631 sequences.A maximum likelihood tree was constructed using IQ-Tree multicore version 2.1.2(53,54) with the Q.pfam+I+G4 model (ModelFinder [55] was used to determine the best substitution model) and 10,000 ultra-rapid bootstraps (UFBoot) (56).The final consensus tree was visualized and annotated using Interactive Tree Of Life (iTOL) v6.8 (57) as well as Inkscape 1.3.2.To establish if soxY and soxZ genes, encoding for two functionally connected dimers of SoxYZ proteins, coevolved, the phylogenetic of SoxZ sequences was performed using SoxZ sequences associated with analyzed SoxY sequences (detailed description in Supplemental Text).

Phylogenomic reconstruction of the lucinid symbiont relationships
Newly obtained symbiotic MAGs as well as publicly available MAGs of previously described lucinid symbionts (Ca.Thiodiazotropha sp., Ca.Sedimenticola sp., and Thiohalomonadales sp.), alongside symbiotic bacteria of marine hosts and MAGs of free-living close relatives to Ca. Thiodiazotropha obtained from the NCBI database, were included in the analysis (NCBI accession numbers in Table S2).All publicly available MAGs we used were quality checked using CheckM's taxonomy-specific workflow using the gammaproteobacterial set of marker genes (40).All MAGs were taxonomically classified using GTDB-Tk v0.3.3 classify workflow.This workflow produced a concatenated amino acid sequence alignment of 120 highly conserved single-copy bacterial marker genes that was used for phylogenomic analyses (46,(58)(59)(60)(61)(62).The alignment was constructed using IQ-Tree multicore version 2.1.2(53, 54) using the default settings (auto substitution model detection-LG+I+G and 1,000 ultra-rapid bootstraps (UFBoot) (56).The tree was rooted at midpoint and visualized using iTOL v5 (57).The resulting phylogenetic tree was used alongside ANI analyzes to identify species-level clades.The average nucleotide identity (ANI) between MAGs was calculated using FastANI v1.3 (59), with a program-sug gested 95% one-way ANI used as the threshold for species delimitation.

Gene organization visualization in lucinid symbionts
The lucinid symbiont MAGs were annotated with the Rapid Annotation using Subsystem Technology (RAST) web server (63) using the RASTtk pipeline (64) and analyzed in SEED (65).The genes of interest were identified by the internal BLAST function in SEED.For each of the gene, models surrounding the soxY locus were annotated in RAST and verified using searches of the BLASTp against non-redundant protein sequences (nr) (66,67), Pfam database (68), the eggNOG database (47) as well as InterProScan (50).The analysis was done on one representative MAG from each symbiont species (ANI above 95%) with the least number of contigs (Table S4).In case of substantial fragmentation of the analyzed region, a MAG with the second lowest fragmentation was chosen.For each of the tools, the search was performed on default settings with an e-value cutoff of 10 −5 .The gene visualization was based on the annotation overview in the SEED genome browser and illustrated using Inkscape 1.3.2.

Diverse symbiotic bacterial genomes revealed through metagenomic analysis of lucinid and thyasirid hosts
We sequenced, assembled, and binned the metagenomes of 68 lucinid and thyasirid individuals, representing 1 thyasirid species and 27 different lucinid host species from 4 subfamilies (Table S1).A total of 72 bacterial metagenome-assembled genomes were retrieved, out of which 60 were considered "high quality" because they were over 90% complete and less than 5% contaminated (Table S2) (69).All 72 MAGs were assigned to the Gammaproteobacteria class, specifically to the order Chromatiales and the family Sedimenticolaceae based on GTDB release 212 (Table S2).To expand the scope of our study, we included additional 162 publicly available MAGs of marine symbiotic bacteria associated with animal hosts (Table S2).Our analysis encompassed a total of 234 symbiotic bacterial genomes, spanning approximately 89 animal host species, from 5 phyla.The geographical scope encompassed 100 sites across 6 continents, representing diverse ecological contexts (Tables S1 and S2; Fig. S1).
Through phylogenomic analyses employing a 95% average nucleotide identity threshold for species delimitation, 69 distinct bacterial species clades were identified among the symbiont MAGs (Fig. 1; Fig. S3; Table S5).This large data set allowed us to identify several novel symbionts.We retrieved a total of 47 novel Thiodiazotropha MAGs, comprising 14 symbiont species, 8 of which were previously undescribed, that were associated with 21 lucinid species from 3 different subfamilies (Fig. 1; Fig. S3; Table S5).We also identified 21 new MAGs representing 2 novel symbiont species from the genus Sedimenticola (Ca.Sedimenticola endoloripinus and Sedimenticola3), which were associated with 8 host species from the Pegophyseminae subfamily, marking the first comprehensive characterization of symbionts within this host subfamily (Fig. 1; Fig. S3; Table S5, Supplemental text).Finally, four MAGs, corresponding to three symbiont species, were retrieved from the Thyasira cf.gouldii (Fig. 1; Fig. S3; Table S5).

Phylogenetically and ecologically diverse symbiotic SOB share highly con served sulfur metabolisms
In contrast to highly diverse sulfur metabolic pathways known from free-living SOB, symbiotic SOB have been reported to have a highly conserved set of pathways; such uniformity is perhaps unexpected, given their phylogenetic diversity (12,23).Our analysis across 69 different SOB symbiont species from multiple genera covering a great diversity of hosts and environments confirms earlier observations based on smaller data sets that sulfur oxidation metabolic pathways are either identical or of complementary function, meaning that the same reactions are carried out, but by different enzymes (Fig. 1; Fig. S3, Table S6).In symbiotic SOB, sulfur compounds are initially oxidized in the periplasmic space using the sulfide:quinone oxidoreductase (Sqr), sulfide dehydrogenase (FccAB), and the sulfur-oxidizing multi-enzyme (Sox) system (70) (Fig. 1; Fig. S2).Most symbiotic SOB encoded a truncated Sox pathway, lacking SoxCD, that accumulates polymeric water-insoluble sulfur globules within the periplasm as intermediate product (70,71).They also shared the ability to subsequently oxidize these sulfur globules via the cytoplasmic dissimilatory sulfite reductase (Dsr) system, APS reductase, and ATP sulfurylase (Fig. 1; Fig. S2), which is advantageous when reduced sulfur compounds are not always available in the environment (70,72,73).The only genes that differed encoded the functionally complementary enzymes AprM and QmoAB/HdrBC (Fig. 1; Fig. S2).This toolkit for sulfur oxidation not only exhibited similarities in gene copy numbers but also shared relatively high amino acid identity across ecologically diverse symbiotic SOB genera originating from a wide range of environments (Fig. 5; Table S1).The gene encoding the sulfur oxidation protein SoxY was the sole exception, with an average amino acid identity of just 20% compared to the standard SoxY in A. vinosum and varying in copy numbers of up to five per genome (Fig. 1).Further columns describe presence or absence of major sulfur-oxidizing genes within inorganic and organic (Org.s. m.) sulfur metabolism.

Phylogenetic analysis of SoxY proteins reveals diversification into four major clades
We reconstructed the phylogenetic relationships of SoxY proteins from symbiotic and free-living SOB to investigate the evolutionary history of this gene family.The final set of 1,634 SoxY sequences, representing 247 symbiotic and 1,387 free-living bacteria and viruses, formed four major clades (Fig. S7).Clades 1 and 3 contained canonical thiosul fate-oxidizing SoxY proteins, characterized by the conserved "VTIGGC"' motif, previously established in A. vinosum (74) and Paracoccus pantotrophus (26,75).Clade 1 was dominated by proteins from Gammaproteobacteria, including all examined symbiotic SOB, while clade 3 primarily consisted of SoxY proteins from metabolically versatile Alphaproteobacteria capable of chemolithoautotrophy and chemoorganoheterotrophy such as Magnetospira sp.(strain QH-2), Methylosinus sp. 3 S-1, and Hyphomicrobium spp.(Fig. S4).Clade 3 contained proteins from only a few symbiotic SOB.These were the symbionts of Ca. T. endolucinida, Ca.T. CTENA1, and Ca.T. CTENA2 (Fig. 2).
Clades 2 and 4, which comprised genes originating from a fusion of the SoxY and SoxZ subunits, were highly divergent, as evidenced by long branch lengths and low amino acid identity to the canonical SoxY sequence from the model sulfur oxidizer Allochromatium vinosum (Fig. 2; approximately 22%).The genes for SoxYZ sequences from clades 2 and 4 were present in the genomes of 15 Ca.Thiodiazotropha symbiont species and the symbiont of the shipworm Kuphus polythalamius.Furthermore, clade 2 additionally contained SoxYZ sequences from the lucinid symbiont Ca.T. lotti and from Cycloclasticus spp., which are known for short-chain alkane oxidation (76).Clades 2 and 4 also contained representatives from phylogenetically diverse and metabolically versatile free-living bacteria (clade 2: Alpha-, Beta-, Gammaproteobacteria and Campylobacterota; clade 4: Alpha-, Beta-, Delta-, and Gammaproteobacteria; Fig. S4) as well as viruses (clade 4; Fig. 2).These bacteria included Thauera, Azoarcus, and Hyphomicrobium species, which are mixotrophs or obligate heterotrophs and oxidize thiosulfate as an accessory metabo lism (77,78).Clade 4 also included SoxYZ sequences from the methane-oxidizing symbionts of Bathymodiolus mussels as well as Methylococcus capsulatus (79), H. sulfoni vorans (80), Methylocella silvestris (81), all of which are not known to be capable of energy generation from thiosulfate oxidation.

Six unique sequence signatures in the sulfur-binding "swinging-arm" region of SoxY proteins
Aside from segregating into four distinct clades, the SoxY sequences of symbiotic SOB had an additional feature: six distinct sequence signatures within the "swinging arm" region of the protein, which is the substrate-binding site (Fig. 2; Fig. S6).These signatures were polyphyletic traits that did not necessarily correspond to unique monophyletic SoxY clades (Fig. 2).Three of the signatures (1a, 1b, and 1c) were unique to clade 1 and were observed in both symbiotic and free-living Gammaproteobacteria (Fig. 2).Signa ture 1a, in particular, featured the highly conserved "VTIGGC" motif, a hallmark of the canonical thiosulfate-oxidizing SoxY described in model systems such as A. vinosum and P. pantotrophus, as well as relatively high amino acid sequence similarity to these model systems (A.vinosum between 52.2% and 64.2%, P. pantotrophus between 40% and 46.9%).Signatures 1b and 1c were distinguished from 1a by only a single amino acid change (Fig. 2).Signature 1b was present in only four symbiont species: Three Thyasira cf.gouldii symbiont species, where it is the only soxY copy in the genome, and in Ca.Sedimenticola endophacoides, which also has a clade 1 soxY gene with the highly conserved signature 1a (Fig. 2, cyan).Signature 1c was unique to a divergent group of genes from Ca. T. weberae and Ca.T. lotti with lower amino acid identity to the canonical A. vinosum SoxY (between 33% and 35%).Sister to this clade was a less divergent cluster of two sequences from Ca. T. CTENA2 symbiont species (exhibiting 43.5% amino acid similarity to A. vinosum).Signatures 2-4 were each observed in proteins from the clades of the corresponding name (clades 2-4; see Fig. 2; Fig. S7).Signatures 2-4 were highly divergent from the canonical SoxY of A. vinosum, with low amino acid identity (24%- 33.3% for signature 2, 28.8%-31.6%for signature 3, and 26%-36.4% for signature 4) and multiple mutations within the "swinging arm" motif.Signature 3 was unique to lucinid symbionts within clade 3, while signature 2 (186 sequences) and signature 4 (297 sequences) were widely distributed among free-living bacteria throughout their respective clades.The SoxY variants will hereafter be referred to as SoxY-S1a, SoxY-S1b, SoxY-S1c, SoxYZ-S2, SoxY-S3, and SoxYZ-S4, where the suffix of each name denotes both the clade to which the amino acid sequence belongs, and the sequence signature pattern observed in the sulfur-binding region.

Lucinid symbiont genomes reveal distinct genomic context and functional variations of SoxY signatures compared to free-living sulfur oxidizers
To gain insight into the potential roles of the different SoxY variants, we investigated the genome architecture and context of the Sox pathway components in lucinid symbiont MAGs from the Thiohalomonadales, Sedimenticola, and Thiodiazotropha groups.For SoxY-S1a, three main genomic organizational patterns emerged, which were consistent with symbiont phylogeny: (i) The Thiohalomonadales Sox pathway was split into two loci: one harboring soxB and soxAX, and the other containing the soxYZ genes (Fig. 3).The soxB and soxAX locus included genes encoding LuxR and histidine kinase genes, known transcriptional regulators in quorum sensing (82).Meanwhile, the soxYZ locus housed genes encoding the transcriptional regulators σ-54 and histidine kinase (Fig. 3) (ii).Sedimenticola had a soxABXYZ operon, situated downstream of the transcriptional regulators, the LuxR-encoding gene, and a histidine kinase (Fig. 3).Additionally, the Sedimenticola MAGs encoded a second copy of soxY-S1a and the adjacent soxZ in a separate genomic location, but no transcriptional regulators could be identified (iii).In Thiodiazotropha, the sox genes were segregated into three clusters, soxB, soxAX, and soxYZ, that were distributed across separate locations (Fig. 3).Only the soxB gene cluster included transcriptional regulators, LuxR protein, and a histidine kinase.No transcrip tional regulators could be identified in the remaining gene clusters.
A further soxY gene variant from clade 1 is soxY-S1b, which we only found in the genus Sedimenticola, neighbored genes encoding beta-barrel assembly-enhancing proteases, ribosomal RNA small subunit methyltransferase E, and a chemotaxis methylaccepting receptor (Fig. S5).Genes encoding the most divergent SoxY homologs in clade 1, soxY-S1c, were co-localized with genes involved in sulfur oxidation, such as the sulfur carrier protein YeeE/D and dissimilatory sulfate reductases dsrAB.Further upstream of the soxY-S1c was genes annotated as dimethyl sulfoxide reductases (dmsAB) that reduce DMSO to DMS.The lucinid soxYZ-S2 was situated on a locus containing a PQQ-dependent quino(hemo)protein alcohol dehydrogenase (ADH), which oxidizes a variety of alcohols and sugars (83), and soxH, a sox gene of unknown function.Genes encoding SoxY-S3 were located upstream of a copy of soxZ and downstream of a gene encoding octaheme tetrathionate reductase (otr), which can reduce tetrathionate to thiosulfate but may also have a role in reducing nitrite (84,85).Finally, genes for SoxYZ-S4 were clustered with those for formate (fdhAB) and methanol dehydrogenases (xoxF).It is noteworthy that the soxYZ-S2 and soxYZ-S4 fusion genes were also encoded with conserved synteny in the MAGs of the Kuphus polythalamius symbiont (soxYZ-S2 and soxYZ-S4) and Bathymodiolus methane-oxidizing symbionts (soxYZ-S4 only).The genome architectures described above were conserved across all the representative MAGs that were analyzed (Table S4).(Fig. S6).The canonical SoxY sequence with its conserved sulfur-binding cysteine-110 (cys110) is labeled as SoxY-S1a.Presence of viral sequences within collapsed clades has been marked in clades 3 and 4 (orange star).Coloring according to the previous tree and legend.Branches with support over 75% (10,000 bootstraps) have been marked with black circles, the size of the circle is equivalent to the % support according to the legend.

DISCUSSION
Partnering with animal hosts helps chemoautotrophic SOB avoid competition and provides an advantage over the free-living lifestyle by ensuring a constant supply of oxygen and reduced sulfur compounds.It is intriguing that although free-living SOB encode diverse pathways for sulfur oxidation, the pathways encoded by symbiotic SOB are largely conserved, although they stem from several distinct phylogenetic groups.This suggests that this set of sulfur oxidation mechanisms may provide a selective advantage to SOB in symbiosis with animal hosts.

The central role of canonical Sox genes and thiosulfate oxidation in shaping symbiotic relationships
In this study, we characterized the sox genes, particularly the soxY gene family, from symbiotic systems to shed light on factors shaping their evolution.All SOB symbionts, except those associated with Thyasira cf.gouldii, possessed the clade 1 SoxY sequences with the S1a signature in the sulfur-binding region.The near-ubiquitous conservation of this gene and key residues within the binding region, which is the characteristic of the canonical SoxY, suggests these SOB symbiont SoxYs play a highly conserved role in binding thiosulfate.The conservation of this function is consistent with observations from physiological studies of these systems.For example, thiosulfate accumulates in the hemolymph of the lucinid Stewardia floridana when the clams are incubated with sulfide (86).Furthermore, the symbionts of Bathymodiolus thermophilus show a preference for thiosulfate (87).Members of genus Sedimenticola encode two copies of the canonical soxY, both with the S1a signature.Encoding two nearly identical canonical soxY-S1a variants could potentially increase SoxY activity through higher dosage as a result of gene duplication-amplification, possibly indicating specialization toward thiosulfate oxidation in this genus (88)(89)(90).
The organization of soxABYZX genes and their regulators into a single operon, as we saw in Sedimenticola, was previously observed in free-living SOB such as P. panto trophus, Chlorobium tepidum, Bradyrhizobium diazoefficiens, and A. thiooxidans (91)(92)(93).An operon structure would enable co-regulation and co-transcription of the entire Sox pathway, minimizing stochastic variations in gene expression and reducing the regulatory information required for optimizing gene expression patterns (94,95).In contrast to Sedimenticola, sox genes were dispersed throughout the genome in the other two lucinid symbiont lineages, Thiohalomonadales and Thiodiazotropha, similar to the free-living SOB A. vinosum.While the functional implications of these different genome architectures are unknown, the formation or degradation of operon structures profoundly influences gene expression patterns and likely indicates adaptations to diverse lifestyles and environmental conditions (94,95).Furthermore, although the role of LuxR in regulating sulfur metabolism remains unknown, their widespread association with sox genes in the genomes investigated here indicates the potential for interactions between the regulation of sulfur oxidation and population-level metabolic processes and/or host interactions (82).
Interestingly, the only symbiotic SOB whose genome lacks the canonical soxY are the symbionts of Thyasira cf.gouldii.These encode only the SoxY-S1b homolog, which is also found as a second copy in free-living Sedimenticola species as well as in the Ca.Sedimenticola endophacoides.The mutation in the "swinging arm" region of the protein compared to the canonical SoxY-S1a suggests potential functional adaptation, possibly related to varying affinities for thiosulfate as proposed for Sulfurimonasand Sulfurovum-related Campylobacterota (27)(28)(29).

Potential roles of divergent SoxY homologs in organosulfur, tetrathionate, and alcohol metabolism
While the soxY-S1a and soxY-S1b genes likely play a conserved role in thiosulfate oxidation, the genomic architectures of soxY-S1c soxYZ-S2, soxY-S3, and soxYZ-S4 suggest they may have other functions.The proximity of soxY-S1c to genes annotated as DMSO reductases and soxY-S3 to genes annotated as tetrathionate reductases sug gests possible roles in sulfur metabolism distinct from the traditional Sox pathway of thiosulfate oxidation.If so, they possibly do not interact with thiosulfate but may bind to other sulfur compounds (Fig. 4).For example, a divergent SoxY has been proposed to be crucial in transformations of elemental sulfur in Sulfurimonas CVO, which is incapable of thiosulphate oxidation (96), and in Sulfurimonas denitrificans (30).DMSO could serve as an alternative electron acceptor in low oxygen conditions, while octaheme tetrathionate reductase has been implicated in regulating host-microbe interactions in the anthozoan Exaiptasia pallida (97) and the human gut (98).Furthermore, Koch and Dahl (99) reported a distinct link between inorganic (sox and hdr) and organic sulfur metabolism (dmsAB and mtoX) in Hyphomicrobium denitrificans (99), which, similarly to lucinid symbionts, encodes for multiple divergent SoxY homologs that have similar sequence signatures.
The fusion genes soxYZ-S2 and soxYZ-S4 were in close proximity to soxH, a gene potentially associated with sulfur metabolism.In Sulfurimonas sp.strain CVO, SoxH was able to compensate for the absence of SoxB by triggering sulfate release from SoxY, thus aiding in elemental sulfur oxidation (96).However, this role for SoxH is improbable for Thiodiazotropha, as all metagenome-assembled genomes also encoded a soxB gene.Notably, these clades feature SoxYZ fusion proteins, a result of the fusion of both carrier subunit genes, soxY and soxZ, previously reported in Alphaproteobacteria (71).Sulfur gene fusion was experimentally shown to improve functionality and efficiency, such as psrB and psrC in Chlorobium phaeobacteroides DSM 266, hdrC and hdrB in green sulfur bacteria, and the TsdB-TsdA fusion protein in Marichromatium purpuratum (100,101).In contrast to soxY-S1c and soxY-S3, the genes flanking soxYZ-S2 and soxYZ-S4 mainly have predicted functions in carbon metabolism; soxYZ-S2 was associated with a PQQ-dependent quino(hemo)protein alcohol dehydrogenase and soxYZ-S4 with formate and methanol dehydrogenases.It is noteworthy that clades 2 (SoxYZ-S2) and 4 (SoxYZ-S4) also contained sequences from bacteria that cannot oxidize thiosulfate or only do so in the presence of other substrates.These lines of evidence suggest that SoxYZ-S2 and SoxYZ-S4 are unlikely to be directly involved in sulfur metabolism but could instead play a role in carbon metabolism.
The discovery that growth on thiosulfate can inhibit methanol degradation, but not formate degradation in Hyphomicrobium denitrificans XT, further suggests there are potentially unexplored links between carbon and inorganic sulfur metabolic pathways (78).Interestingly, H. denitrificans XT also has a gene encoding for SoxYZ that belonged to clade 4 shared the same swinging-arm sequence as the Thiodiazotopha SoxYZ-S4.The synteny of these soxYZ-S4 and carbon metabolism genes was also conserved in the Kuphus polythalamia symbiont and the methanotrophic bathymodiolin symbionts.It is noteworthy that the soxYZ-S2 and soxYZ-S4 genes were prevalent in Thiodiazotropha symbiont species associated with lucinids from seagrass beds, which are potentially a rich source of alcohols such as methanol [Fig.5; (102)].The detoxifying role of SoxYZ-S4 in protecting methanol dehydrogenase from thiosulfate during growth on methanol could be especially beneficial during the free-living phase of Thiodiazotropha symbionts in methanol-rich environments like seagrass beds (103).The conservation of genomic links between soxYZ-S4 and genes for methanol and formate metabolism across multiple bacterial lineages leads us to further speculate that soxYZ-S4 genes were co-opted to enhance the efficiency of these metabolic pathways.
The co-organization of genes with diverse metabolic functions is a common phenom enon.Operon-driven adaptive evolution often rapidly incorporates new genes, favoring the emergence of single complex promoters over two independent ones.This can lead to operons containing seemingly unrelated genes that are nevertheless co-regulated as they are required under the same environmental conditions (104)(105)(106).For instance, conserved operons may include genes for ribosomal proteins and central metabolism, both essential for growth rate regulation (105).The high-sequence divergence of the soxYZ-S2 and soxYZ-S4 genes, their fusion with soxZ, and their conserved genomic synteny with genes involved in carbon metabolism together indicate that these genes likely originate from horizontal gene transfer (HGT) rather than a recent gene duplication event.This hypothesis is further supported by the presence of genes from diverse bacterial phyla in soxYZ clades 2 and 4, the incongruence between the topology of these branches with the phylogenetic relationships of the bacteria, and the presence of soxYZ genes in clade 4 that originated from marine viruses, known agents of HGT (Fig. 2; Fig. S4) (49).The versatile ability of SoxY proteins to bind to a range of sulfur compounds (27,28,30,75) provides the foundations that may preadapt these proteins for the evolution of novel functions (107).The co-occurrence of soxY genes with various gene families, including genes with established roles in sulfur metabolism within symbiotic SOB, as well as novel candidates like octaheme tetrathionate reductase and genes linked to carbon metabolism (e.g., alcohol, formate, and methanol dehydrogenases), suggests the functions of proteins from this family may be more diverse than previously assumed (75).These new metabolic traits not only support symbionts' survival in the environment during their free-living stage but are also potentially beneficial during symbiosis.Indeed, adaptations acquired by Vibrio fischeri living free in the environment can influence hostmicrobe interactions and even enhance fitness, while V. fischeri is associated with a host (108)(109)(110).

SoxY diversification shapes ecological strategies and symbiotic adaptations in sulfur-oxidizing bacteria
The lucinid symbioses, characterized by a unique diversity of hosts, habitats, and horizontal symbiont transmission mode, present a unique opportunity for investigating how the interplay between the environment, the host, and symbiont genomic and metabolic adaptations collectively shapes holobiont ecology.Symbiotic SOB, much like their free-living counterparts, have evolved several distinct ecological strategies.They range from "specialists, " which are adapted to grow most efficiently on a restricted range of substrates, and which thrive in highly specific symbiotic relationships with only one or a few host species or unique environments (e.g., hydrothermal vents, mangroves).An example of specialization can be seen in the Lucinoma aequizonata symbiosis; this lucinid occurs in oxygen minimum zones and exclusively associates with one Ca.Thiodiazotro pha species that is uniquely distinguished by its reduced genome size (31).At the other end of the spectrum are "generalists" that can use a wider range of substrates for energy generation, presumably enabling them to colonize a wider range of environments and host species (111).The prime example of the "generalist" strategy would be Ca.T. taylori, which has a global distribution and has been detected so far in eight lucinid host species (22).This diversity of ecological strategies likely underpins the high diversity of hosts and habitats in which sulfur-oxidizing symbioses have evolved (Fig. 5) (9,22).
An interesting pattern emerging from our findings was that symbiotic SOB such as the SUP05 group (also called Thioglobaceae [112]) and Thiohalomonadales, which encoded only the canonical SoxY-S1a but not the divergent types, tended to inhabit "extreme" environments such as deep-sea hydrothermal vents, mangroves, or oxygen minimum zones (Fig. 5).On the other hand, SOB symbionts with both canonical and divergent SoxY homologs were associated with lucinids from shallow water seagrass beds or organic-rich sediments with abundant plant debris in the case of the K. polytha lamius symbionts (Fig. 5).The correlation between specific environments and the diversity of the SoxY repertoire supports the idea of specialization in the symbiotic SOB lacking divergent SoxY proteins (Fig. S9 and S10).In contrast, symbionts encoding several distinct soxY genes were only ever found in seagrass beds.Seagrass sediments may be chemically and physically more complex than other environments inhabited by hosts of chemosynthetic symbionts such as "bare" sediments and hydrothermal vents.We hypothesize that expansion of the soxY gene family in these symbionts may be one genomic mechanism that underpins their flexibility in associating with a range of hosts and may also underpin the adaptation of the holobiont to complex, fluctuating environ ments.The proposed function of some divergent SoxYs in C1 or organosulfur metabo lism, compounds that are typically enriched in seagrass environments, is consistent with this (102).
The Sedimenticola3 symbiont clade was an interesting exception to these patterns.Despite the operon-like organization of the Sedimenticola3 sox genes, a hallmark of specialization on thiosulfate as a substrate, members of this lineage were found in association with lucinids inhabiting diverse shallow-water environments worldwide, such as seagrass, coral reef, and mangrove sediments (Fig. 5).It is noteworthy that the Sedimenticola3 symbionts are almost exclusively restricted to hosts from one lucinid subfamily, Pegophyseminae, where others from the Thiodiazotropha genus are typically more flexible and can associate with host species from more than one subfamily.Clams from the family Pegophyseminae also typically have unique morphological features not shared by species from other lucinid subfamilies, which may affect the microniches experienced by their symbionts (86,(113)(114)(115).Their specificity for the Pegophyseminae, lack of divergent SoxYs, and operon-like sox gene organization lead us to propose that the Sedimenticola3 symbionts specialize primarily on thiosulfate oxidation to obtain energy.If this is the case, it is further interesting to speculate that the Sedimenticola3 and Ca.Thiodiazotropha bacteria may occupy distinct niches in the external environment during their free-living phase.

Conclusions
Our findings suggest soxY gene family expansion and diversification, a prominent genomic feature of some symbiotic and free-living SOB, could contribute to metabolic flexibility and a "generalist" ecological strategy for exploiting a larger range of energy sources.We hypothesize that some of these divergent, non-canonical SoxY proteins may carry out novel functions, e.g., in carbon or organosulfur metabolism.Considering that pure cultures are available for some free-living SOB that encode a suite of divergent soxY genes, such as Hyphomicrobium denitrificans with three soxY copies, these hypotheses could be tested experimentally.Unfortunately, most of the genomes available in public databases, and those we generated here, are still not closed.It is therefore challenging to investigate the mechanisms responsible for the expansion of the soxY gene family based on the data currently available.However, one intriguing observation was that a SoxY encoded in a viral genome fell into a clade containing lucinid symbiont SoxYs (SoxYZ-S4).Sox genes, including SoxYZ fusion proteins, are commonly found in viral genomes (49).Horizontal acquisition of divergent SoxY(Z) proteins might therefore be one mechanism that could explain their expansion, although we currently have no information about viruses that could infect symbiotic SOB.The lack of congruence between SoxY(Z) proteins and symbiont phylogenies would be consistent with a role of horizontal gene transfer in soxY gene family expansion.This study was supported by the ERC Starting Grant EvoLucin (grant number 802494), a Vienna Research Grant for Young Investigators from the Vienna Science and Technol ogy Fund (WWTF, VRG14-021), by Austrian Science Fund projects MAINTAIN DOC 69 doc.fund,Cluster of Excellence Microbiomes Drive Planetary Health, and the Austrian Academy of Sciences.

FIG 1
FIG 1 Maximum likelihood phylogenetic tree of symbiotic bacteria reconstructed from GTDB's multisequence alignment using the best-fit model LG+I+G and 1,000 bootstraps.Black circles describe bootstrap support values above 85%.Clades containing symbiont MAGs generated in this study are indicated by asterisks.Number of symbiont species in each of the clades is indicated by the number in brackets.Lucinid symbiont genera are indicated in the following colors: Thiohalomonadales, blue; Sedimenticola, yellow; Thiodiazotropha, green.The first column describes the number of encoded soxY genes in each symbiont MAG.

FIG 2
FIG 2 Maximum likelihood tree of SoxY proteins (1,631 sequences total) from symbionts and free-living bacteria including sequences from NCBI and Pfam (RP55).The major clades are labeled according to their phylogeny.The SoxY sequence signature groups from symbiotic SOB were distinguished by mutations within the sulfur-binding "swinging arm" of the protein (Continued on next page)

FIG 3
FIG 3 Genomic islands for sulfur oxidation across lucinid symbiont genera.SoxY displayed here is the canonical soxY signature 1a gene.All regulatory genes are marked in red.Hypothetical genes are marked in gray.Colors used for the sox genes in Thiohalomonadales and Thiodiazotropha correspond to the labeling in the Sedimenticola clade.

FIG 5
FIG 5 Maximum likelihood phylogenetic tree of symbiotic bacteria reconstructed from GTDB's multisequence alignment (as in Fig. 1) indicating the host and environmental context.Lucinid symbiont lineages are marked by the following colors: Thiohalomonadales, blue; Sedimenticola, yellow; Thiodiazotropha, green.The number of symbiont species in each clade is indicated in brackets.The presence and absence of the symbiotic SoxY signatures are marked by colored circles with colors following the legend and previous description.Gray circles show organisms encoding SoxYs with signatures different to those marked in color.The first column from the right indicates the main substrate used for energy metabolism: S, sulfur; Org, complex organic compounds; C, methane and short-chain alkanes.
this study, and to Stephan Köstlbacher for assistance with phylogenetic analyses.Most of the sequencing was carried out at JMF of the Medical University of Vienna and the University of Vienna (project IDs JMF-2002-8, JMF-1911-9, JMF-2011-C, and JMF-2104-13).