The defining genomic and predicted metabolic features of the Acetobacterium genus

Acetogens are anaerobic bacteria capable of fixing CO2 or CO to produce acetyl-CoA and ultimately acetate using the Wood-Ljungdahl pathway (WLP). This autotrophic metabolism plays a major role in the global carbon cycle. Acetobacterium woodii, which is a member of the Eubacteriaceae family and type strain of the Acetobacterium genus, has been critical for understanding the biochemistry and energy conservation of acetogens. Other members of the Acetobacterium genus have been isolated from a variety of environments or have had genomes recovered from metagenome data, but no systematic investigation has been made of the unique and varying metabolisms of the genus. Using the 4 sequenced isolates and 5 metagenome-assembled genomes available, we sequenced the genomes of an additional 4 isolates (A. fimetarium, A. malicum, A. paludosum, and A. tundrae) and conducted a comparative genome analysis of 13 different Acetobacterium genomes to obtain better phylogenomic resolution and understand the metabolic diversity of the Acetobacterium genus. Our findings suggest that outside of the reductive acetyl-CoA (Wood-Ljungdahl) pathway, the Acetobacterium genus is more phylogenetically and metabolically diverse than expected, with metabolism of fructose, lactate, and H2:CO2 constant across the genus, and metabolism of ethanol, methanol, caffeate, and 2,3-butanediol varying across the genus. While the gene arrangement and predicted proteins of the methyl (Cluster II) and carbonyl (Cluster III) branches of the Wood-Ljungdahl pathway are highly conserved across all sequenced Acetobacterium genomes, Cluster I, encoding the formate dehydrogenase, is not. Furthermore, the accessory WLP components, including the Rnf cluster and electron-bifurcating hydrogenase, were also well conserved, though all but four strains encode two Rnf clusters.
Additionally, comparative genomics revealed clade-specific potential functional capabilities, such as amino acid transport and metabolism in the psychrophilic group, and biofilm formation in the A. wieringae clade, which may afford these groups an advantage in low-temperature growth or attachment to solid surfaces, respectively. Overall, the data presented herein provide a framework for examining the ecology and evolution of the Acetobacterium genus and highlight the potential of these species as a source of fuels and chemicals from CO2-feedstocks.

Sulfate-reducing bacteria (SRB) are often found in these environments as well, with several studies suggesting a syntrophic partnership between SRB and acetogens [15,16]. In particular, a combination of Desulfovibrio, Sulfurospirillum, and Acetobacterium were proposed to cooperatively participate in microbial induced corrosion (MIC) of steel [16], have been found in production waters from a biodegraded oil reservoir [17], and can be detected in natural subsurface CO2 reservoirs [18]. Moreover, these three microorganisms were the most abundant members of the biocathode community responsible for electrode-driven production of acetate and hydrogen from CO2 [19-21].
Acetobacterium woodii is the type strain of the genus, is the best characterized strain, and has been used as a model organism for understanding the bioenergetics of acetogens ([23], and references therein). Importantly, a genetic system also exists for A.
Outside of the type strain, not much is known about the Acetobacterium genus.

fimetarium, A. paludosum, and A. tundrae).
We conducted a comparative genomic study to shed light on the phylogenetic relatedness and functional potential of the Acetobacterium genus. Our results suggest that while the major metabolic pathways are well conserved (e.g., the Wood-Ljungdahl pathway and accessory components), certain predicted metabolisms and genome features differ markedly from what is known from Acetobacterium woodii. For example, a clade consisting of A. sp. MES1, A. sp. UBA5558, A. sp. UBA5834, and A. wieringae contained over 40 unique predicted protein sequences, six of which were closely related to diguanylate cyclases, enzymes that catalyze the production of the second messenger cyclic-di-GMP, which is known to induce biofilm formation [39] and increase tolerance to reactive oxygen species [40]. These diguanylate cyclases may afford this clade a distinct ability to attach to a diverse range of solid surfaces, including carbon-based electrodes. Furthermore, the psychrophilic strains differed in amino acid transport and metabolism, lacking a glycine cleavage system and an alanine degradation pathway, which suggests important protein adaptations to a low-temperature lifestyle. Results described herein provide insight into the defining common features of the Acetobacterium genus as well as ancillary pathways that may allow various strains to survive in diverse environments and that could potentially be exploited for biotechnological applications, such as the production of fuels and chemicals from CO2 feedstocks [19,21].

Materials and Methods
DNA extraction, DNA sequencing, and genome assembly of four Acetobacterium isolates. For A. fimetarium, A. malicum, A. paludosum, and A. tundrae, freeze-dried cells were obtained from ATCC and reconstituted on minimal freshwater medium growing autotrophically on H2:CO2 [12]. Chromosomal DNA was extracted using the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer's protocol. Extracted DNA was processed for Illumina sequencing using the Nextera XT protocol (Illumina).
Samples were barcoded to enable multiplex sequencing of four samples using a single MiSeq V3 kit (2 x 301). Raw paired-end sequences were quality trimmed with CLC Genomics Workbench (Qiagen) with a quality score cutoff of Q30, resulting in a total of 297, 911, 471, and 548 million base pairs for A. fimetarium, A. malicum, A. paludosum, and A. tundrae, respectively. Trimmed paired-end reads were assembled with SPAdes (v. 3.7.0) using the --careful flag to reduce mismatches and short indels [41]. Assembly of trimmed paired-end reads resulted in four high-quality draft genomes, with >90% completion, <5% contamination, the presence of 23S, 16S, and 5S rRNA genes, and at least 18 tRNAs [42] (Table 1; Supplemental Table 1). The SPAdes-assembled contigs were assessed for quality using Quast [43] and CheckM (v. 1.0.7) [44].
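The high-quality draft criteria listed above can be written as a single check. The following is a minimal sketch, assuming the completeness, contamination, rRNA, and tRNA values have already been collected by hand from quality-assessment output; the `AssemblyStats` record and its field names are hypothetical and are not part of any tool used in this study.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Hypothetical summary record for one draft genome."""
    completeness: float   # percent completeness
    contamination: float  # percent contamination
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int


def is_high_quality_draft(stats: AssemblyStats) -> bool:
    """Apply the thresholds quoted in the text: >90% completion,
    <5% contamination, 23S/16S/5S rRNA genes present, at least 18 tRNAs."""
    return (
        stats.completeness > 90.0
        and stats.contamination < 5.0
        and stats.has_23s
        and stats.has_16s
        and stats.has_5s
        and stats.trna_count >= 18
    )
```

A genome missing any one rRNA gene fails the check regardless of its completeness, which is why several MAGs would not qualify under these criteria.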
Genome completeness. Each Acetobacterium genome or metagenome-assembled genome (MAG) was assessed for completeness and contamination (Table 1)

Results and Discussion
Phylogeny of the Acetobacterium genus.
As a first step towards identifying the taxonomic placement of newly sequenced Acetobacterium genomes, we collected 16S rRNA gene sequences from the NCBI database, and for MAGs not in the database, we attempted to extract 16S rRNA gene sequences from metagenome data. Multiple MAGs did not contain a 16S rRNA gene sequence, potentially due to metagenome assembly and binning issues. A. dehalogenans contained a truncated 16S rRNA gene that was less than 400 bp. Therefore, nine 16S rRNA gene sequences from the 13 sequenced Acetobacterium genomes were utilized. We reconstructed an updated phylogenetic tree of available 16S rRNA gene sequences (Figure 1). The Acetobacterium genus forms a distinct subgroup with cluster XV of the Clostridium subphylum [52]. Notably, Willems and Collins found that the psychrophilic strains were not more closely related to one another than to non-psychrophilic strains; our findings suggest the opposite, with distinct clustering of the known psychrophiles A. tundrae, A. paludosum, A. fimetarium, and A. bakii (Figure 1).
Other Acetobacterium strains with sequenced genomes clustered close together, including A. wieringae and A. sp. MES1. UBA11218 was only 2.22 Mbp with 6.1x genome coverage. These genomes were often excluded from our analyses in order to more accurately describe the core components of the genus.

Genome characteristics of the Acetobacterium genus
To understand the relatedness of the Acetobacterium genus, we calculated the pairwise average nucleotide identity (ANI) and average amino acid identity (AAI) for each genome and constructed phylogenies based on whole genome alignments (Figure 2). The closest species to Acetobacterium woodii, the type strain of the genus, were A. Figure 2 indicate suggested strain-level variation of a single species). While many of the MAGs were above the suggested species-level cutoff for ANI of 95% [54][55][56], we cannot definitively define particular MAGs as subspecies, and phenotypic characterization should accompany genomic data to confirm these findings.
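To illustrate how a 95% ANI cutoff translates into candidate species groups, the sketch below links any two genomes whose pairwise ANI meets the cutoff and reports the connected groups. The single-linkage grouping, the function name, and the genome labels are assumptions made for illustration; the text does not state how groups were derived from the ANI matrix.

```python
def species_clusters(names, ani, cutoff=95.0):
    """Group genomes by single linkage: two genomes share a group when
    their pairwise ANI is at or above the cutoff (assumed rule).

    names: list of genome labels
    ani:   dict mapping (label_a, label_b) -> ANI in percent
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical ANI values, not taken from the study.
example_names = ["genome_1", "genome_2", "genome_3"]
example_ani = {
    ("genome_1", "genome_2"): 97.2,
    ("genome_1", "genome_3"): 81.0,
    ("genome_2", "genome_3"): 80.5,
}
example_groups = species_clusters(example_names, example_ani)
```

As the text notes, such groupings are only suggestive, and phenotypic characterization is still needed before assigning species or subspecies status.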

Utilizing predicted protein sequences, whole genome comparisons between Acetobacterium genomes were performed. While the presence of a particular pathway does not definitively prescribe function, genetic comparisons reveal evolutionary conservation or divergence and provide valuable information about the functional potential encoded in each genome. Pan-genome analyses utilized gene families, or amino acid sequences that clustered together at a defined sequence similarity cutoff. We identified the common set of gene families (core genome), the unique set of gene families (unique genome), and the ancillary set of gene families, which were found in at least two, but not all, genomes (accessory genome). In order to obtain the most accurate pan-genome and core genome estimates, only genomes with >98% completion were utilized (11 in total).
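The core, accessory, and unique definitions given above amount to counting how many genomes contain each gene family. A minimal sketch follows, assuming a presence table has already been built by some clustering step; the function name and the input layout are hypothetical.

```python
def partition_pangenome(presence, genomes):
    """Split gene families into core, accessory, and unique sets.

    presence: dict mapping gene family ID -> set of genomes containing it
    genomes:  set of all genome labels in the comparison
    """
    core, accessory, unique = set(), set(), set()
    total = len(genomes)
    for family, members in presence.items():
        count = len(members & genomes)
        if count == total:
            core.add(family)        # present in every genome
        elif count >= 2:
            accessory.add(family)   # present in at least two, but not all
        elif count == 1:
            unique.add(family)      # present in exactly one genome
    return core, accessory, unique


# Hypothetical toy input, not data from the study.
toy_genomes = {"A", "B", "C"}
toy_presence = {
    "fam1": {"A", "B", "C"},
    "fam2": {"A", "B"},
    "fam3": {"C"},
}
toy_core, toy_accessory, toy_unique = partition_pangenome(toy_presence, toy_genomes)
```

Because the core set shrinks whenever a single genome is missing a family, restricting the analysis to highly complete genomes, as described above, avoids misclassifying core families as accessory.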
Whole genome comparisons corroborated 16S rRNA gene phylogeny and ANI/AAI groupings of the Acetobacterium genus (Figure 2). Inclusion of the eleven most complete Acetobacterium genomes revealed a total of 10,126 gene families (40,203 total amino acid sequences). When calculated as the total number of amino acid sequences per category, the percentage of core sequences was 38.4%, while the accessory, unique, and exclusively absent sequences were 43.6%, 11.9%, and 6.1%, respectively. Within the pan-genome, the core genome, accessory genome, and unique genome contained a total of 1,406 (13.9%), 3,964 (39.1%), and 4,756 (46.9%) gene families, respectively (Figure 1B, Table 2). The pan-genome partitioning of gene families suggests that the unique genome contains the most functional diversity, followed by the accessory genome and then the core genome. Phylogenomic trees utilizing the pan-genome and core genome sequences revealed clustering of the psychrophilic strains A. tundrae, A. paludosum, and A. fimetarium (Figure 1C). Other clusters included Acetobacterium wieringae, A. sp. MES1, and A. sp. UBA5834; and A. malicum and A. dehalogenans, consistent with the ANI and AAI results.
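The gene family shares quoted above can be recomputed directly from the reported counts. The short calculation below uses only the three counts given in the text; the variable names are illustrative. The core and accessory shares reproduce the reported values, while the unique share computes to roughly 46.97%, slightly above the 46.9% reported.

```python
# Gene family counts as reported in the text.
family_counts = {"core": 1406, "accessory": 3964, "unique": 4756}

# The three partitions should add up to the reported pan-genome size.
pan_genome_size = sum(family_counts.values())

# Share of the pan-genome held by each partition, in percent.
family_shares = {
    name: 100.0 * count / pan_genome_size
    for name, count in family_counts.items()
}

for name, share in family_shares.items():
    print(f"{name}: {family_counts[name]} families, {share:.2f}% of the pan-genome")
```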
Protein sequences from core, accessory, and unique gene families were classified with the clusters of orthologous groups (COG) database [57]. The core genome had the highest representation from categories E (amino acid transport and metabolism), J (translation, ribosomal structure and biogenesis), and C (energy production and conversion) (Figure 1D). The highest representation in the accessory and unique gene families was from category K (transcription) and category T (signal transduction mechanisms). The third most represented categories for the accessory and unique gene families were category E (amino acid transport and metabolism) and category L (replication, recombination, and repair), respectively. Sequences were also characterized with the KEGG database (Supplemental File).
The Wood-Ljungdahl pathway (WLP) (or the reductive acetyl-CoA pathway) is thought to be derived from an ancient carbon dioxide fixing metabolic pathway [58]. The WLP is coupled to substrate-level phosphorylation, but the net ATP gain is zero; thus, a chemiosmotic mechanism is employed for energy conservation [59]. The transmembrane ion gradient is used by an F1F0 ATP synthase for ATP synthesis, and its generation relies upon either the Rhodobacter nitrogen fixation (Rnf) complex or cytochromes, depending upon the species. The Acetobacterium genus appears to lack cytochromes, and therefore Acetobacterium woodii has served as a model organism for understanding the mechanisms of energy conservation in acetogens lacking cytochromes [22,60-64].
Cluster I of the WLP is divergent across Acetobacterium spp.
In Acetobacterium woodii, the WLP is organized into three separate and distinct gene clusters [22]. Cluster I contains genes that encode the hydrogen-dependent CO2 reductase (HDCR), which catalyzes the first step of the WLP methyl branch (CO2 to formate) [65]. (Figure 2; [32]). The CO dehydrogenase/acetyl-CoA synthase complex (AcsABCD) converts CO2 to CO, and this conversion has the largest thermodynamic barrier of the WLP. Acetogens utilize flavin-based electron bifurcation to overcome this barrier [1]. This highly specialized form of energy conservation may explain why this cluster is well conserved, demonstrating that purifying selection is the predominant evolutionary force on this cluster.

Genes essential for energy conservation.
Energy conservation in acetogens requires the generation of a sodium gradient (cytochrome-deficient acetogens like Acetobacterium) or a proton gradient (cytochrome-containing acetogens) across the cytoplasmic membrane [1]. In Acetobacterium woodii, and potentially all other members of the genus, a sodium-ion gradient is utilized. The transmembrane sodium-ion gradient is generated by the Rnf complex, but a number of accompanying enzyme complexes, including an electron-bifurcating hydrogenase, ferredoxins, and electron-transfer flavoproteins (ETF), are also essential for overcoming thermodynamically unfavorable reactions of the WLP and for ATP generation [1].

Conservation of the flavin-based electron bifurcating hydrogenase HydABC.
The key hydrogenase responsible for providing reducing equivalents for energy conservation in Acetobacterium woodii is encoded by hydA1, hydB, hydD, hydE, and hydC (Awo_c26970-Awo_c27010). This electron-bifurcating hydrogenase, HydABCDE, couples the thermodynamically unfavorable reduction of oxidized ferredoxin with the favorable reduction of NAD+ [22]. With the exception of A. sp. UBA5558, HydABCDE (Supplemental File) was well conserved across the Acetobacterium genus (>77% identity and 85% coverage). A phylogenetic tree of concatenated HydABCDE proteins revealed identical clustering to that of the whole genome tree, with tight clustering of A. sp. MES1, A. sp. UBA5834, and A. wieringae, and clustering of the psychrophilic strains (Supplemental Figure 2). This high degree of conservation across the genus highlights the essential function of this enzyme complex in Acetobacterium spp.
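The conservation statement above rests on identity and coverage thresholds applied to each subunit in each genome. A minimal sketch of such a filter is given below, assuming hits have already been tabulated; the `Hit` record, the function name, and the reading of the thresholds as "identity above 77% and coverage of at least 85%" are assumptions for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one subunit match in one genome."""
    genome: str
    subunit: str
    identity: float  # percent amino acid identity to the A. woodii protein
    coverage: float  # percent of the query covered by the alignment


HYD_SUBUNITS = {"HydA", "HydB", "HydC", "HydD", "HydE"}


def genomes_with_conserved_complex(hits, min_identity=77.0, min_coverage=85.0):
    """Return genomes in which every Hyd subunit has a hit passing both thresholds."""
    passing = {}
    for hit in hits:
        if hit.identity > min_identity and hit.coverage >= min_coverage:
            passing.setdefault(hit.genome, set()).add(hit.subunit)
    return sorted(g for g, found in passing.items() if HYD_SUBUNITS <= found)
```

Requiring all five subunits to pass, rather than any one, matches the idea that the complex as a whole is conserved.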

Conservation of the ferredoxin-NAD+ oxidoreductase (Rnf complex) and variability in Rnf copy number.
Energy conservation in Acetobacterium woodii requires a multi-subunit integral membrane ferredoxin-NAD+ oxidoreductase called the Rhodobacter nitrogen fixation (Rnf) complex [70][71][72]. The Rnf complex is an energy-coupled transhydrogenase responsible for transfer of electrons from reduced ferredoxin to NAD+, which generates a transmembrane sodium ion gradient to drive ATP generation via a sodium-dependent ATP synthase. This process is reversible when the concentration of NADH is greater than that of ferredoxin [64]. The Rnf cluster is common across many Clostridia, including C. tetani, C. kluyveri, C. difficile, C. phytofermentans, and C. botulinum [60]; however, the Clostridia-type is not sodium-dependent and may involve a transmembrane proton gradient [59]. While gene arrangement within the Rnf cluster differs across prokaryotes, in Acetobacterium woodii the Rnf genes (rnfCDGEAB) are polycistronic and co-transcribed [60]. All sequenced Acetobacterium genomes encode an Rnf complex (Supplemental File). Based upon the presence and conservation of the Rnf complex in sequenced Acetobacterium spp., we propose that energy conservation mechanisms are similar across the genus and that Rnf is required.
Some microorganisms, such as Azotobacter vinelandii, encode two Rnf complexes, one of which is linked to nitrogen fixation and the other of which is expressed independently of nitrogen source [73]. Acetobacterium woodii encodes only a single Rnf complex [22], while Acetobacterium wieringae and Acetobacterium sp. MES1 were found to encode two [14,74]. The function of the second Rnf complex in Acetobacterium is unknown, but closer examination of the remaining Acetobacterium genomes revealed that the majority encode a second rnfCDGEAB cluster (Supplemental File). The genome architecture surrounding the second Rnf complex was also well conserved (Supplemental data), with a predicted hydroxymethylpyrimidine transporter, hydroxyethylthiazole kinase, and thiamin-phosphate pyrophosphorylase directly adjacent to the Rnf2 complex, suggesting a potential role in thiamin pyrophosphate (vitamin B1) synthesis and nitrogen metabolism [75]. The genomes encoding only the first Rnf complex retained the genes that surround the second Rnf complex, supporting a horizontal gene transfer (HGT) event. Phylogenetic analysis supports these findings, as the second Rnf cluster forms a distinct clade with the Clostridia Rnf cluster (Figure 4).
Therefore, the first Rnf cluster is most likely a paralog from a Eubacteriaceae ancestor, while the second copy is a potential ortholog from an HGT event.
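Rnf copy number can be tallied from an ordered gene annotation by looking for the rnfCDGEAB arrangement. The sketch below counts contiguous occurrences of that gene order on either strand in a list of gene names for one contig; the function name and the assumption that genes are already labeled with rnf gene names are hypothetical, since real annotations usually require a homology search first.

```python
RNF_ORDER = ["rnfC", "rnfD", "rnfG", "rnfE", "rnfA", "rnfB"]


def count_rnf_clusters(gene_order, cluster=RNF_ORDER):
    """Count non-overlapping, contiguous occurrences of the cluster in a
    list of gene names ordered along a contig. The reversed order is
    accepted as well, to cover clusters encoded on the opposite strand."""
    size = len(cluster)
    reverse = cluster[::-1]
    count = 0
    i = 0
    while i + size <= len(gene_order):
        window = gene_order[i:i + size]
        if window == cluster or window == reverse:
            count += 1
            i += size  # skip past the matched cluster
        else:
            i += 1
    return count


# Hypothetical contig with one cluster on each strand.
toy_contig = (
    ["geneX"] + RNF_ORDER + ["geneY"] + RNF_ORDER[::-1]
)
toy_rnf_copies = count_rnf_clusters(toy_contig)
```

A count of two for a genome would then prompt inspection of the flanking genes, as done in the text for the thiamin-related genes adjacent to the second cluster.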

Conservation of the Sodium-dependent ATPase.
Acetobacterium woodii encodes an integral membrane Na+ F1F0 ATP synthase that generates ATP via the sodium ion gradient and contains both V-type and F-type rotor subunits [76]. All sequenced Acetobacterium genomes encode the F-type and V-type ATP synthase (Supplemental File), each contained in a separate operon and with amino acid sequence similarity to Acetobacterium woodii ranging from 56% (AtpB, A. fimetarium) to 98% (AtpE2, A. bakii). In particular, the c-subunits of the F1F0 ATP synthase are responsible for ion translocation across the membrane. Using three c-subunits from Acetobacterium woodii (Awo_c02160-c02180), which is known to use a sodium ion gradient, we queried the remaining 12 Acetobacterium genomes for c-subunits, and specifically, the Na+-binding motif. All strains encode at least two subunits with binding motifs specific for Na+ (Supplemental Figure 3). The presence of the F-type and V-type ATP synthase operons, gene synteny, and sequence similarity across all strains, in addition to the presence of Na+-binding motifs, suggests that all Acetobacterium strains utilize this ATP synthase for ATP production via a sodium gradient, similar to what is observed in A. woodii.

Alternative Electron Donors/Acceptors utilized by the Acetobacterium genus.
Acetobacterium species can use a wide range of substrates for carbon and energy, indicating a generalist lifestyle for the genus despite the perception that acetogens are specialists. This could provide an advantage in anoxic environments where competition for limited substrates is high [77,78]. As an example, most acetogens metabolize alcohols as an alternative to autotrophic acetogenic growth [79]. Acetobacterium woodii can utilize 1,2-propanediol as the sole carbon and energy source for growth [30]. In anaerobic environments, formation of 1,2-propanediol results from the degradation of fucose and rhamnose, constituents of bacterial exopolysaccharides and plant cell walls [80]. The 1,2-propanediol degradation pathway is encoded by the pduABCDEGHKL gene cluster, which in A. woodii (Awo_c25930-Awo_c25740) contains 20 genes with similarity to the pdu cluster of Salmonella enterica [30]. The presence, homology, and gene arrangement of the pdu gene cluster in each of the other 12 Acetobacterium genomes suggest that 1,2-propanediol degradation is conserved across the Acetobacterium genus (Supplemental File). Furthermore, all strains contain a histidine kinase and response regulator upstream of the 1,2-propanediol cluster, suggesting that the regulatory and expression mechanisms of this pathway are conserved. In A. woodii, this two-component system was proposed to sense alcohols (chain length > 2) or aldehyde intermediates, a mechanism we hypothesize is employed by all Acetobacterium species.
tundrae are capable of caffeate reduction via this pathway.

Potential adaptations for enhanced surface colonization.
Many of the Acetobacterium genomes/MAGs encoded for portions of the Widespread Colonization Island (WCI), which mediates non-specific adherence to surfaces and biofilm formation [93]. The products of the WCI are responsible for assembly and secretion of bundled pili [94] and may be important for colonization of diverse environments [95]. The general structure of the WCI genomic region is similar for all sequenced Acetobacterium strains with the exception of A. woodii, and includes 1 to 4 small hypothetical proteins (~56 a.a), followed by TadZ malicum and an isolate most closely related to A. wieringae more effectively extract electrons from solid Fe(0) coupons compared to A. woodii [37]. Additionally, A. sp.
Closer examination of the strains that were phylogenetically most-closely related tundrae) was evident, as a heme-binding protein (pfam03928) located between pduO and pduP was missing.
One adaptation that psychrophiles employ to survive in cold environments is to increase protein flexibility and stability [98]. As a result, psychrophiles often contain a higher proportion of hydrophobic amino acid residues, such as alanine and glycine [99,100]. The psychrophilic strains are the only sequenced Acetobacterium strains to lack an alanine degradation pathway and glycine cleavage system. One possible explanation for the loss of these pathways in the psychrophilic Acetobacterium strains is an increased utilization of alanine and glycine in cold-adapted protein synthesis [99].
Interestingly, the alanine degradation pathway encodes alanine dehydrogenase, which may also play an important role in NH4+ assimilation [101]. Furthermore, glutamate dehydrogenase, another enzyme shown to be important for nitrogen assimilation, was also absent from the psychrophilic strains (Supplemental File). Thus, the psychrophilic strains may have adapted to obtain nitrogen from alternative sources, using an alternative pathway.
To further determine what functional genome attributes may separate the psychrophilic Acetobacterium species from their non-psychrophilic counterparts, we identified gene families unique to the psychrophilic species (Supplemental File). Our analyses revealed 40 annotated gene families found only in the psychrophilic clade, including a calcium-translocating P-type ATPase, sodium:proton antiporter, cupin, ribulose 1,5-bisphosphate carboxylase, and SDR family oxidoreductase, among others (Supplemental File). The presence of a unique calcium-translocating P-type ATPase and sodium:proton antiporter suggests modified abilities to transport calcium and sodium.
The ribulose 1,5-bisphosphate carboxylase (RuBisCO) and cupin may be involved in the methionine salvage pathway, which has been observed in other bacteria [102,103].
SDR family oxidoreductases have a wide range of activities, including metabolism of amino acids, cofactors, and carbohydrates, and may play a role in redox sensing [104].
More work is needed to determine the exact function of each protein in the psychrophilic strains, but these findings provide a framework for targeted functional analyses of the roles these unique proteins play in the metabolism of psychrophilic Acetobacterium strains and in their persistence in cold environments.

Conclusions.
Acetogens are a phylogenetically diverse group of microorganisms capable of converting CO2 into acetate. Acetobacterium woodii has been used as a model organism to study the WLP and accessory components required for energy conservation in acetogens lacking cytochromes ([23], and references therein). In this study we sequenced four Acetobacterium isolates from the culture collection (ATCC) (A. Figure 5). Divergent metabolisms found in a subset of genomes included caffeate reduction, 2,3-butanediol oxidation, ethanol oxidation, alanine metabolism, glycine cleavage system, and methanol oxidation. Notably, the psychrophilic strains encode unique amino acid transport and utilization, and ion transport, which may have evolved for survival in low-temperature environments, in addition to posttranscriptional modifications [32]. Members of the A. wieringae clade encode unique diguanylate cyclases and a unique methylene-THF reductase, which may aid in attachment to and colonization of solid surfaces. Overall, the comparative genomic analysis performed on the Acetobacterium genus provides a framework to understand the conserved metabolic processes across the genus, as well as identifying divergent metabolisms that can be exploited for targeted biotechnological applications using Acetobacterium strains.

Authors and Contributors.
DER performed analyses, analyzed data, and wrote the manuscript. CWM analyzed data and wrote the manuscript. HM, RSN, and DG edited the manuscript.

Conflict of interest.
The authors declare no conflict of interest.

Funding.
Funding was provided by the U.S. Department of Energy, Advanced Research Project Agency-Energy (award DE-AR0000089).

Acknowledgement.
This work was performed in support of the [48] and the phylogenomic tree was constructed with the Maximum Likelihood method using the Bootstrap method with 1000 replications for test of phylogeny. The substitution method was Jones-Taylor-Thornton with uniform rates among sites. Clustering and tree construction were performed with MEGA 6.06 [47]. Varied genome architecture of predicted protein sequences immediately surrounding each Rnf operon is shown (not drawn to scale).