Defining Genomic and Predicted Metabolic Features of the Acetobacterium Genus

Acetogens are anaerobic bacteria capable of fixing CO2 or CO to produce acetyl-CoA and ultimately acetate using the Wood-Ljungdahl pathway (WLP). This autotrophic metabolism plays a major role in the global carbon cycle and, if harnessed, can help reduce greenhouse gas emissions. Overall, the data presented here provide a framework for examining the ecology and evolution of the Acetobacterium genus and highlight the potential of these species as a source for production of fuels and chemicals from CO2 feedstocks.

The WLP is the genus' defining feature, and Acetobacterium woodii has been extensively studied. The genus Acetobacterium contains Gram-positive, non-spore-forming, homoacetogenic bacteria and was first described as a genus by Balch and coworkers, with the type strain Acetobacterium woodii WB1 (ATCC 29683) (3). Members of the genus Acetobacterium have been found in diverse environments, including sulfate-reducing permeable reactive zones (4), anoxic bottom waters of a volcanic subglacial lake (5,6), seagrass rhizosphere (7), high-temperature gas-petroleum reservoirs (8), anaerobic granular sludge from fruit-processing wastewater (9), and biocathode communities (10-14). Sulfate-reducing bacteria (SRB) are often found in these environments as well, with several studies suggesting a syntrophic partnership between SRB and acetogens (7,15). In particular, a combination of Desulfovibrio, Sulfurospirillum, and Acetobacterium was proposed to cooperatively participate in microbially induced corrosion (MIC) of steel (15), has been found in production waters from a biodegraded oil reservoir (16), and can be detected in natural subsurface CO2 reservoirs (17). Moreover, these three microorganisms were the most abundant members of the biocathode community responsible for electrode-driven production of acetate and hydrogen from CO2 (18-20).
Outside the type strain, little is known about the Acetobacterium genus. The genus appears to have varied metabolic capabilities, including low-temperature growth (29), debromination of polybrominated diphenyl ethers (PBDEs) (30), hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX) degradation (31,32), electrode-mediated acetogenesis (33), enhanced iron corrosion (34), and isoprene degradation (35). To provide a more robust genome data set, we sequenced and manually curated the genomes of four Acetobacterium strains, including A. malicum and three psychrophilic strains (A. fimetarium, A. paludosum, and A. tundrae). We also conducted a comparative genomic study to shed light on the phylogenetic relatedness and functional potential of the Acetobacterium genus. Our results suggest that while the major metabolic pathways are well conserved (e.g., the Wood-Ljungdahl pathway and accessory components), certain predicted metabolisms and genome features were markedly different from what is known from Acetobacterium woodii. Results described here provide insight into the defining common features of the Acetobacterium genus as well as ancillary pathways that may operate to provide various strains the ability to survive in diverse environments and potentially be exploited for biotechnological applications, such as the production of fuels and chemicals from CO2 feedstocks (18,20).

RESULTS AND DISCUSSION
Phylogeny of the Acetobacterium genus. As a first step toward identifying the taxonomic placement of newly sequenced Acetobacterium genomes, we collected 16S rRNA gene sequences from the NCBI database, and for metagenome-assembled genomes (MAGs) not in the database, we attempted to extract 16S rRNA gene sequences from metagenome data. Multiple MAGs did not contain a 16S rRNA gene sequence, potentially due to metagenome assembly and binning issues. Acetobacterium dehalogenans contained a truncated 16S rRNA gene that was less than 400 bp and was not included in the 16S rRNA tree. Therefore, nine 16S rRNA gene sequences of the 13 sequenced Acetobacterium genomes were utilized. We reconstructed an updated phylogenetic tree of available 16S rRNA gene sequences (Fig. 1). The Acetobacterium genus forms a distinct subgroup with cluster XV of the Clostridium subphylum (36). Notably, Willems and Collins (36) found that the psychrophilic strains were not more closely related to one another than were nonpsychrophilic strains. Our findings suggest the opposite, in agreement with the work of Shin and coworkers (29), and show distinct clustering of the known psychrophiles A. tundrae, A. paludosum, A. fimetarium, and A. bakii (Fig. 1). Other Acetobacterium strains with sequenced genomes clustered close together, including A. wieringae and Acetobacterium sp. strain MES1.
Genome characteristics of the Acetobacterium genus. To date, there are 14 publicly available sequenced representatives of the Acetobacterium genus. Thirteen of the 14 sequenced Acetobacterium strains are draft genome assemblies, with the genome of Acetobacterium woodii as the only complete and closed genome (contained in one contiguous sequence). The 13 draft genomes contained a varying number of contigs ranging from 62 (A. wieringae) to 1,582 (Acetobacterium sp. strain UBA11218). The genome sizes ranged from ~2.2 Mbp (Acetobacterium sp. strain UBA11218) to ~4.1 Mbp (A. bakii), and the GC% ranged from 39.7 (A. tundrae) to 44.8 (Acetobacterium sp. strain UBA11218) (Table 1). Each genome, including A. woodii, was examined for completeness based upon the Eubacteriaceae marker gene set from CheckM (37). Acetobacterium woodii and Acetobacterium sp. strain KB-1 were 100% complete. The remaining 12 Acetobacterium genomes/MAGs were >98% complete with the exception of Acetobacterium sp. UBA6819 (94% complete, missing 11 marker genes), Acetobacterium sp. strain UBA5558 (85% complete, missing 31 marker genes), and Acetobacterium sp. UBA11218 (57% complete, missing 93 marker genes). Furthermore, Acetobacterium sp. UBA5558 was only 2.61 Mbp, well below the average genome size (3.69 Mbp), and had a genome coverage of 13×. Likewise, Acetobacterium sp. UBA11218 was only 2.22 Mbp with 6.1× genome coverage. The use of nonclosed genomes/MAGs provides access to a larger and potentially more diverse genome set, but gaps between contigs and missing genes present potential errors in analyses. Therefore, the three most incomplete MAGs were excluded from our pan-genome analyses in order to more accurately describe the core components of the genus, while minimizing erroneous results, and all genomes/MAGs, excluding Acetobacterium sp. UBA11218 (57% complete), were included in specific pathway identification (e.g., WLP, Rnf, hydrogenase).
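The two inclusion rules described above (a completeness cutoff for the pan-genome set, and exclusion of only the least complete MAG for pathway identification) can be illustrated with a minimal sketch. It is not the pipeline used in the study. The function name `select_genomes` and its parameters are hypothetical, and only the completeness values quoted in the text are listed; genomes reported simply as ">98% complete" are omitted because their exact values are not given here.

```python
# Completeness (%) as quoted in the text. Genomes described only as ">98%"
# are left out because their exact values are not stated in this section.
completeness = {
    "A. woodii": 100.0,
    "Acetobacterium sp. KB-1": 100.0,
    "Acetobacterium sp. UBA6819": 94.0,
    "Acetobacterium sp. UBA5558": 85.0,
    "Acetobacterium sp. UBA11218": 57.0,
}


def select_genomes(completeness, pan_cutoff=98.0,
                   pathway_excluded=("Acetobacterium sp. UBA11218",)):
    """Return (pan_genome_set, pathway_set) as sorted lists of genome names.

    pan_cutoff and pathway_excluded are illustrative parameters that mirror
    the rules stated in the text; they are not taken from the authors' code.
    """
    pan = sorted(g for g, c in completeness.items() if c > pan_cutoff)
    pathway = sorted(g for g in completeness if g not in pathway_excluded)
    return pan, pathway


pan_set, pathway_set = select_genomes(completeness)
print("pan-genome set:", pan_set)
print("pathway set:", pathway_set)
```

Keeping the two rules as separate parameters reflects the trade-off stated in the text: a strict cutoff protects core-genome estimates, while a looser rule retains more genomes for presence/absence checks of individual pathways.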
To further examine the relatedness of the Acetobacterium genus, we calculated the pairwise average nucleotide identity (ANI) and average amino acid identity (AAI) for each genome and constructed phylogenies based on whole-genome alignments (Fig. 2). The closest species to Acetobacterium woodii were A. malicum, A. dehalogenans, and Acetobacterium sp. KB-1, with ANI and AAI of 80% and 79%, respectively. While these represent distinct species, many of the sequenced Acetobacterium genomes were highly similar to one another, potentially representing subspecies or strains. Specifically, Acetobacterium sp. strain UBA6819 was closely related to Acetobacterium sp. KB-1 (99% ANI); A. malicum and A. dehalogenans (97% ANI) could be considered different strains of the same species; Acetobacterium sp. MES1, A. wieringae, Acetobacterium sp. UBA5558, and Acetobacterium sp. strain UBA5834 are all likely the same species (97 to 100% ANI); and A. tundrae and A. paludosum (95% ANI) had high sequence similarity across their genomes (Fig. 2A) (38). These findings suggest that, for example, three potential subspecies groups exist (brackets in Fig. 2A). While many of the genomes/MAGs were above the suggested species-level cutoff for ANI of 95% (39-41), we cannot definitively define particular genomes/MAGs as subspecies, and phenotypic characterization should accompany genomic data to confirm these findings.
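As an illustration of how pairwise ANI values translate into candidate same-species groups, the following minimal sketch applies single-linkage grouping at the 95% cutoff. It is an assumption-laden example rather than the method used in the study: the function `group_by_ani` is hypothetical, treating exactly 95% as meeting the cutoff is a choice made here, and only pairwise values stated explicitly in the text are included (the A. wieringae group is omitted because its individual pairwise values are given only as a range).

```python
def group_by_ani(pairs, threshold=95.0):
    """Single-linkage grouping: join genomes linked by any pair with ANI >= threshold.

    pairs maps (genome_a, genome_b) -> ANI in percent. Returns a sorted list
    of groups (each a sorted list of names); singletons are not reported.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ani in pairs.items():
        root_a, root_b = find(a), find(b)
        if ani >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for genome in list(parent):
        groups.setdefault(find(genome), set()).add(genome)
    multi = [sorted(members) for members in groups.values() if len(members) > 1]
    return sorted(multi, key=lambda members: members[0])


# Pairwise ANI values quoted in the text (percent).
ani_pairs = {
    ("Acetobacterium sp. UBA6819", "Acetobacterium sp. KB-1"): 99.0,
    ("A. malicum", "A. dehalogenans"): 97.0,
    ("A. tundrae", "A. paludosum"): 95.0,
    ("A. woodii", "A. malicum"): 80.0,
}

for group in group_by_ani(ani_pairs):
    print(group)
```

Single linkage is the simplest choice and can chain distant genomes together through intermediates; with a full ANI matrix, a stricter criterion (for example, requiring every pair in a group to meet the cutoff) may be preferable.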
The Acetobacterium pan-genome. Utilizing predicted protein sequences, whole-genome comparisons between Acetobacterium genomes were performed. While the presence of a particular pathway does not definitively prescribe function, genetic comparisons reveal evolutionary conservation or divergence and provide valuable information about the functional potential encoded in each genome. Pan-genome analyses utilized gene families or amino acid sequences that clustered together at a defined sequence similarity cutoff. We identified the common set of gene families (core genome), the unique set of gene families (unique genome), and the ancillary set of gene families, which were found in at least two, but not all, genomes (accessory genome). In order to obtain the most accurate pan-genome and core genome estimates, only genomes with >98% completeness were utilized (11 in total).
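The core, accessory, and unique definitions given above amount to counting, for each gene family, how many genomes contain it. A minimal sketch of that partitioning follows. The function `partition_pan_genome` and the toy gene family and genome names are hypothetical and serve only to illustrate the definitions, not the clustering software or cutoff used in the study.

```python
def partition_pan_genome(presence):
    """Split gene families into core, accessory, and unique sets.

    presence maps gene_family -> set of genomes encoding that family.
    core: present in all genomes; unique: present in exactly one genome;
    accessory: present in at least two, but not all, genomes.
    """
    genomes = set().union(*presence.values())
    n_genomes = len(genomes)
    core, accessory, unique = [], [], []
    for family, members in presence.items():
        count = len(members)
        if count == n_genomes:
            core.append(family)
        elif count == 1:
            unique.append(family)
        else:
            accessory.append(family)
    return sorted(core), sorted(accessory), sorted(unique)


# Toy presence/absence data (hypothetical families and genomes).
toy_presence = {
    "fam_wlp": {"genome_1", "genome_2", "genome_3"},
    "fam_rnf2": {"genome_2", "genome_3"},
    "fam_adhE": {"genome_1"},
    "fam_atp": {"genome_1", "genome_2", "genome_3"},
}

core, accessory, unique = partition_pan_genome(toy_presence)
print("core:", core)
print("accessory:", accessory)
print("unique:", unique)
```

Because a family missing from even one genome drops out of the core set, incomplete assemblies deflate the core estimate, which is the stated reason for restricting this analysis to the most complete genomes.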
Whole-genome comparisons validated phylogenetic distances calculated using 16S rRNA genes and ANI/AAI groupings (Fig. 2), though phylogenomic relationships varied from marker gene clustering. For example, A. fimetarium was relatively closer to the psychrophilic strains in the pan-genome tree and ANI/AAI groupings, while A. bakii was relatively closer to the psychrophilic strains than A. fimetarium in the 16S rRNA tree. While 16S rRNA gene-based phylogeny has been extensively used for the Acetobacterium genus, improved phylogenetic resolution is achievable with our whole-genome analyses (42,43).

FIG 2 (B) Pan-genome distribution of gene families found in 1, some (2 to 10), or all genomes (determined using predicted amino acid sequences). (C) Pan-genome phylogeny of 11 Acetobacterium strains. Potential subspecies denoted in brackets. Blue bar = presence of Rnf2 complex, green bar = potential for 2,3-butanediol metabolism, purple bar = A. woodii-type WLP cluster I, black bar = potential for methanol oxidation, orange bar = potential for caffeate metabolism, red bar = potential for ethanol oxidation. (D) COG classification of predicted protein sequences of the core genome, accessory genome, and unique genome of 11 Acetobacterium strains.
Inclusion of the 11 most complete Acetobacterium genomes revealed a total of 10,126 gene families (40,203 total amino acid sequences). When calculated as the total number of amino acid sequences per category, the percentage of core sequences was 38.5%, while the accessory, unique, and exclusively absent sequences were 43.6%, 11.9%, and 6.1%, respectively. Within the pan-genome, the core genome, accessory genome, and unique genome contained a total of 1,406 (13.9%), 3,964 (39.1%), and 4,756 (47.0%) gene families, respectively (Fig. 2B and Table 2). The pan-genome partitioning of gene families suggests that the unique genome contains the most functional diversity, followed by the accessory genome and then the core genome. Phylogenomic trees utilizing the pan-genome and core genome sequences revealed clustering of the psychrophilic strains A. tundrae, A. paludosum, and A. fimetarium (Fig. 2C). Other clusters included Acetobacterium wieringae, Acetobacterium sp. MES1, and Acetobacterium sp. UBA5834, and A. malicum and A. dehalogenans, consistent with the ANI and AAI results. Gene families unique to the psychrophilic clade or the A. wieringae clade were examined further (see Data Set S1 in the supplemental material).
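The gene family percentages reported above follow directly from the counts, as the short worked check below shows. It uses only the numbers stated in the text; the variable names are illustrative.

```python
# Gene family counts per pan-genome category, as reported in the text.
families = {"core": 1406, "accessory": 3964, "unique": 4756}

total = sum(families.values())
shares = {name: round(100 * count / total, 1) for name, count in families.items()}

print("total gene families:", total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The three categories sum to the reported total of 10,126 gene families, so the per-family percentages are internally consistent.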
Protein sequences from core, accessory, and unique gene families were classified with the Clusters of Orthologous Groups (COG) database (44). The core genome had the highest percentage of sequences from amino acid transport & metabolism (category E, 9.9%), translation, ribosomal structure & biogenesis (category J, 9.7%), and energy production and conversion (category C, 7.7%) (Fig. 2D). The highest percentages of sequences in the accessory and unique genomes were from transcription (category K, 11.5% and 12.4%, respectively) and signal transduction mechanisms (category T, 10.9% and 9.0%, respectively). The third most represented categories for the accessory and unique gene families were category E (amino acid transport and metabolism) and category L (replication, recombination, and repair), respectively.
The Wood-Ljungdahl (acetyl-CoA) carbon fixation pathway in Acetobacterium spp. The Wood-Ljungdahl pathway (WLP) (or the reductive acetyl-CoA pathway) is thought to be derived from an ancient carbon dioxide-fixing metabolic pathway (45). The WLP is coupled to substrate-level phosphorylation, but the net ATP gain is zero; thus, a chemiosmotic mechanism is employed for energy conservation (46). A transmembrane ion gradient drives ATP synthesis by an F1F0 ATP synthase, and generation of this gradient relies upon either the Rhodobacter nitrogen fixation (Rnf) complex or the energy-converting ferredoxin-dependent hydrogenase complex (Ech), depending upon the species. Acetobacterium woodii has served as a model organism for understanding the mechanisms of energy conservation in Rnf-type acetogens (21,22,47-50), and we utilized the A. woodii genome as a template to examine the WLP of the other 12 Acetobacterium genomes/MAGs.
(i) Cluster I of the WLP is divergent across Acetobacterium spp. In Acetobacterium woodii, the WLP is organized into three separate and distinct gene clusters (Fig. 3) (21). Cluster I contains genes that encode the hydrogen-dependent CO2 reductase (HDCR), which catalyzes the first step of the WLP methyl branch (CO2 to formate) (51). Specifically, cluster I in Acetobacterium woodii encodes two formate dehydrogenase (FDH) isoenzymes (one of which is a selenocysteine-containing version), a formate dehydrogenase accessory protein, a putative FeS-containing electron transfer protein, and an [FeFe]-hydrogenase (FdhF1, HycB1, FdhF2, HycB2, FdhD, HycB3, and HydA2) (Fig. 3) (21). The presence, sequence similarity, and gene arrangement of cluster I were determined for the remaining 12 genomes/MAGs. Only A. bakii, A. fimetarium, A. paludosum, and A. tundrae encoded cluster I with high sequence similarity and a gene arrangement identical to A. woodii (Fig. 3; Data Set S2). Of these, A. bakii and A. fimetarium contain [...] Contrary to recent reports of the conservation of the WLP (38), we found that cluster I was markedly different and more divergent in the remaining genomes, with low sequence identity (<50%) and similarity (Fig. 3 and Data Set S2). Examination of formate dehydrogenase (FDH) protein alignments revealed approximately 220 extra amino acids at the beginning (N terminus) of the non-A. woodii-type FDH that contained multiple conserved cysteine residues, with predicted [2Fe-2S] and [4Fe-4S] motifs (Table S2; Text S2). The genome architecture surrounding the formate dehydrogenase had identical synteny for various groups of strains with four distinct gene patterns, which corresponded to phylogenetic clustering of all FDH proteins from A. woodii and non-A. woodii FDH clusters (Fig. 3; Table S2). While the A. woodii FDH cluster I was quite different from the non-A. woodii FDH clusters, the non-A. woodii FDH cluster IA, for example, contained proteins with conserved residues for coordination of iron-sulfur clusters and molybdopterin cofactors, with similarities to the A. woodii HDCR (Table S2). Based upon the genome architecture surrounding each FDH, and phylogenetic placement of the FDH, we propose four types of FDH cluster I: the A. woodii FDH cluster I that contains FdhF1 and FdhF2, and three non-A. woodii FDH clusters (IA, IB, and IC) (Fig. 3). The A. woodii HDCR (A. woodii FDH cluster I) uses H2 as an electron donor for CO2 reduction (52). More work is needed to determine the electron donor for the non-A. woodii FDH clusters, but if H2 is not used directly, other potential alternative electron donors, such as NAD(P)H and reduced ferredoxin, could be utilized in a mechanism similar to Clostridium autoethanogenum (53).

FIG 3 (B) Genetic representation of the WLP cluster I (formate dehydrogenase), cluster II (methyl branch), and cluster III (carbonyl branch). Lines denote contiguous sequences, while the absence of a line represents a discontinuous operon. The numbers represent the protein-encoding gene (peg) numbers generated by RAST. Consecutive numbers are synonymous with contiguous sequences. (C) Gene arrangement of cluster I of the Wood-Ljungdahl pathway. The formate dehydrogenase subunits are denoted in purple. (D) Phylogenetic tree of formate dehydrogenases from Acetobacterium spp. and closely related proteins. Four distinct FDH branches are highlighted, corresponding to the four types of FDH cluster I.
The variability in FDH could be a result of the availability of and requirement for tungsten. In Campylobacter jejuni and Eubacterium acidaminophilum, formate dehydrogenase activity is dependent upon tungsten availability, as tupA mutants exhibited markedly decreased FDH activity (54,55). We found that the strains encoding the non-A. woodii FDH cluster I also encode a tungsten-specific ABC transporter (tupABC). TupA has a high affinity for tungsten and can selectively bind tungsten over molybdate (56). The tupABC operon was absent in the strains encoding the A. woodii FDH cluster I, though the ability to transport tungsten in these strains may rely upon the modABC operon, as all sequenced genomes encode ModA, which is a high-affinity molybdate ABC transporter that binds both molybdate and tungstate (54,57).
(ii) Conservation of WLP cluster II and cluster III of the Acetobacterium genus. Cluster II (methyl branch) encodes enzymes responsible for the conversion of formate to methyltetrahydrofolate (methyl-THF) (Fhs1, FchA, FolD, RnfC2, MetV, and MetF). Unlike cluster I, cluster II is highly conserved across all Acetobacterium genomes (Data Set S2). The exceptions were the dihydrolipoamide dehydrogenase (LpdA1) and glycine cleavage system H protein (GcvH1), which are found downstream of methylenetetrahydrofolate reductase (MetF) in A. woodii but dispersed throughout the genomes of the other Acetobacterium strains (Fig. 3). Other variations in cluster II are found in A. tundrae and A. paludosum, which contain two formate tetrahydrofolate-ligase (Fhs1) subunits separated by a small hypothetical protein (39 amino acids [aa]). In A. paludosum the two copies are identical, while in A. tundrae there is one amino acid difference between the two Fhs1 sequences, suggesting a gene duplication event in these two strains.
Cluster III (carbonyl branch) encodes the CO dehydrogenase/acetyl-CoA synthase complex (AcsABCD), methyltransferase (AcsE), and accessory proteins (CooC and AcsV), for conversion of methyl-THF to acetyl-CoA. The carbonyl branch is the best-conserved portion of the WLP in the Acetobacterium genus, with high sequence similarity and identical gene arrangement across all genomes. The one exception is a hypothetical protein and an additional AcsB directly downstream of AcsB in A. bakii, A. dehalogenans, A. paludosum, A. tundrae, A. fimetarium, A. malicum, and Acetobacterium sp. KB-1 (Fig. 3) (29).
Genes essential for energy conservation. Energy conservation in acetogens requires the generation of a sodium or proton gradient across the cytoplasmic membrane, which involves an energy-converting ferredoxin:NAD+ reductase complex (known as the Rhodobacter nitrogen fixation [Rnf] complex) or an energy-converting ferredoxin-dependent hydrogenase complex (Ech), together with a sodium-dependent or proton-dependent ATPase (1). In Acetobacterium woodii, Acetobacterium bakii, and potentially all other members of the genus, a sodium ion gradient is generated via the Rnf complex and used by the sodium-dependent ATPase.
(i) Conservation of the ferredoxin-NAD+ oxidoreductase (Rnf complex) and variability in Rnf copy number. The multisubunit integral membrane ferredoxin-NAD+ oxidoreductase called the Rhodobacter nitrogen fixation (Rnf) complex is an energy-coupled transhydrogenase responsible for transfer of electrons from reduced ferredoxin to NAD+, which in Acetobacterium woodii generates a transmembrane sodium ion gradient to drive ATP generation via sodium-dependent ATP synthase (58-60). This process is reversible when the concentration of NADH is greater than that of ferredoxin (22). The Rnf cluster is common across many Clostridia including Clostridium tetani, Clostridium kluyveri, Clostridium difficile, Clostridium phytofermentans, and Clostridium botulinum (47); however, the Clostridia type is not sodium dependent and may involve a transmembrane proton gradient (61). While gene arrangement within the Rnf complex differs across prokaryotes, in Acetobacterium woodii the genes of the Rnf complex (rnfCDGEAB) are polycistronic and cotranscribed (59). All sequenced Acetobacterium genomes encode an Rnf complex (Data Set S2). Based upon the presence and conservation of the Rnf complex in sequenced Acetobacterium spp., we propose that energy conservation mechanisms are similar across the genus and that the Rnf complex is required.
Some microorganisms, such as Azotobacter vinelandii, encode two Rnf complexes, one of which is linked to nitrogen fixation and the other of which is expressed independently of nitrogen source (62). Acetobacterium woodii encodes only a single Rnf complex (21), but Acetobacterium wieringae and Acetobacterium sp. MES1 were found to encode two (14,63,64). The function of the second Rnf complex in Acetobacterium is unknown, but closer examination of the remaining Acetobacterium genomes revealed that the majority of Acetobacterium genomes encode a second rnfCDGEAB cluster (Data Set S2). The genome architecture surrounding the second Rnf complex was also well conserved, with a predicted hydroxymethylpyrimidine transporter, hydroxyethylthiazole kinase, and thiamine-phosphate pyrophosphorylase directly adjacent to the Rnf2 complex (Fig. 4; Data Set S2). Phylogenetic analysis of the two Rnf complexes revealed two distinct clusters: the first Rnf complex clusters with Eubacterium limosum, and the second Rnf complex forms a distinct clade with the Clostridia Rnf complex (Fig. 4).
Sodium pumping by the Rnf complex has been extensively studied in A. woodii, and sodium-dependent autotrophic growth has been experimentally verified in A. bakii, suggesting a role of the Rnf complex similar to that of A. woodii (29,58,59). In A. woodii, predicted Na+-translocating amino acid residues were found in the transmembrane subunits RnfA (E88), RnfD (D250 and G300), and RnfE (D129) (59). It is not known if the other Acetobacterium spp. utilize a Na+ gradient or H+ gradient for energy conservation. To probe this question, we examined the amino acid sequence similarity between the Rnf complexes and found that despite sequence divergence, amino acid residues predicted to be involved in Na+ binding were conserved across all Acetobacterium Rnf complexes. For example, RnfA1 and RnfA2 from Acetobacterium sp. MES1 were only 51% identical with 73% positives, yet both retained the conserved glutamic acid (E88). Similarly, RnfE1 and RnfE2 were divergent (53% identity, 69% positives), but aspartic acid at position 129 (D129) was conserved. While experimental validation is needed, we hypothesize that Acetobacterium Rnf complexes perform Na+-specific pumping.
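Checking whether a residue such as E88 of RnfA is retained in a divergent homolog requires mapping the reference position onto a multiple-sequence alignment, because gaps shift column indices. The sketch below shows one way to do this under stated assumptions: the functions `alignment_column` and `residue_conserved` are hypothetical, and the short aligned sequences are toy data, not real Rnf sequences. In practice, the aligned RnfA, RnfD, and RnfE sequences and the positions given in the text (E88, D250, G300, D129) would be supplied instead.

```python
def alignment_column(ref_aligned, ref_position):
    """Return the 0-based alignment column for a 1-based, ungapped reference position."""
    seen = 0
    for column, residue in enumerate(ref_aligned):
        if residue != "-":
            seen += 1
            if seen == ref_position:
                return column
    raise ValueError("reference position lies beyond the end of the sequence")


def residue_conserved(aligned, ref_name, ref_position, expected):
    """Report, for each aligned sequence, whether it carries the expected residue."""
    column = alignment_column(aligned[ref_name], ref_position)
    return {name: seq[column] == expected for name, seq in aligned.items()}


# Toy alignment (hypothetical sequences of equal aligned length).
toy_alignment = {
    "reference": "MK-LEAG",   # ungapped: M1 K2 L3 E4 A5 G6
    "homolog_a": "MKTLEAG",   # retains E at the mapped column
    "homolog_b": "MK-LDAG",   # carries D instead of E
}

print(residue_conserved(toy_alignment, "reference", 4, "E"))
```

Counting only non-gap characters in the reference row is what keeps the numbering anchored to the reference protein, so the same call works regardless of how many insertions the homologs introduce.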
(ii) Conservation of the sodium-dependent ATPase. Acetobacterium woodii encodes an integral membrane Na+ F1F0 ATP synthase that generates ATP via a sodium ion gradient and contains both V-type and F-type rotor subunits (65). All sequenced Acetobacterium genomes encode the F-type and V-type ATP synthase (Data Set S2), each contained in a separate operon and with amino acid sequence similarity to Acetobacterium woodii ranging from 56% (AtpB, A. fimetarium) to 98% (AtpE2, A. bakii).
In particular, the c-subunits of the F1F0 ATP synthase are responsible for ion translocation across the membrane. Using three c-subunits from Acetobacterium woodii (Awo_c02160 to -c02180), which is known to use a sodium ion gradient, we queried the remaining 12 Acetobacterium genomes for c-subunits and, specifically, the Na+-binding motif. All strains encode at least two subunits with binding motifs specific for Na+ (Fig. S1). The presence of the F-type and V-type ATP synthase operons, gene synteny, and sequence similarity across all strains, in addition to the presence of Na+-binding motifs, suggests that all Acetobacterium strains utilize this ATP synthase for ATP production via a sodium gradient, similar to what is observed in A. woodii and A. bakii.

FIG 4 [...] (105), and the phylogenomic tree was constructed with the Maximum Likelihood method using the bootstrap method with 1,000 replications for test of phylogeny. The substitution method was Jones-Taylor-Thornton with uniform rates among sites. Clustering and tree construction were performed with MEGA 6.06 (104). Varied genome architecture of predicted protein sequences immediately surrounding each Rnf operon is shown (not drawn to scale).

Electron bifurcation.
The transmembrane sodium-ion gradient is generated by the Rnf complex, but a number of accompanying enzyme complexes, including an electron-bifurcating hydrogenase, ferredoxins, and electron-transfer flavoproteins (ETF), are also essential for overcoming thermodynamically unfavorable reactions of the WLP (1).
(i) Conservation of the flavin-based electron-bifurcating hydrogenase HydABC. The key hydrogenase responsible for providing reducing equivalents for electron bifurcation in Acetobacterium woodii is encoded by hydA1, hydB, hydD, hydE, and hydC (Awo_c26970-Awo_c27010). This electron-bifurcating hydrogenase, HydABCDE, couples the thermodynamically unfavorable reduction of oxidized ferredoxin with the favorable reduction of NAD+ (21). With the exception of Acetobacterium sp. UBA5558, HydABCDE was well conserved across the Acetobacterium genus (>77% identity and 85% coverage) (Data Set S2). A phylogenetic tree of concatenated proteins HydABCDE revealed clustering identical to that of the whole-genome tree, with tight clustering of Acetobacterium sp. MES1, Acetobacterium sp. UBA5834, and A. wieringae and clustering of the psychrophilic strains (Fig. S2). This high degree of conservation across the genus highlights the essential function of this enzyme complex in Acetobacterium spp.
(ii) Variability in the number of ferredoxins encoded by Acetobacterium genomes. Ferredoxins are soluble cellular redox compounds and, in acetogens, play a major role in autotrophic CO2 fixation (1). Of the electron carriers involved in the WLP, only ferredoxins have been demonstrated to provide the reducing potential required for the reduction of CO2 to CO in the carbonyl branch. The genome of Acetobacterium woodii encodes 17 ferredoxins, the most of any Acetobacterium genome to date. The number of ferredoxins encoded by the other Acetobacterium genomes ranged from 6 (Acetobacterium sp. KB-1 and Acetobacterium sp. UBA6819) to 16 (A. bakii). Specifically, A. malicum and A. dehalogenans encode 12 ferredoxins, members of the A. wieringae clade encode 8 to 10, and Acetobacterium sp. KB-1 and Acetobacterium sp. UBA6819 encode 6. The psychrophilic clade was more varied, as A. fimetarium encodes 8, A. paludosum encodes 11, A. tundrae encodes 13, and A. bakii encodes 16. The overall variability in the number of ferredoxins encoded by each genome may be a by-product of broader gene loss in the genome or an indication that a given strain has fewer or more specialized electron transport functions to carry out.
(iii) Variability in electron transfer flavoprotein (ETF) copy number. Electron transfer flavoprotein (EtfAB) is the electron bifurcation module in Acetobacterium woodii (66). A. woodii encodes two separate EtfAB modules. One is adjacent to CarABC and part of the caffeate reduction operon (CarDE), while the other is directly upstream of a lactate dehydrogenase (LctD), a potential L-lactate permease (LctE), and a potential lactate racemase (LctF) and is involved in lactate metabolism (67). Examination of the remaining 12 genomes revealed that, with the exception of Acetobacterium sp. UBA5558, all genomes encode at least one copy of EtfAB. Specifically, A. tundrae encodes four copies of EtfAB; A. bakii and A. paludosum encode three copies; A. woodii, A. malicum, A. fimetarium, and A. dehalogenans each encode two copies; and the remaining strains have a single copy. The gene architecture downstream of the single EtfAB module encoded by all strains (except Acetobacterium sp. UBA5558) was identical and included machinery for lactate metabolism. The four EtfAB modules of A. tundrae were not well conserved, with amino acid identities of EtfA ranging from 39% to 55% and of EtfB ranging from 43% to 48%, suggesting they are not duplications but potentially four different gene transfer events. Interestingly, one of the EtfAB modules found only in A. tundrae, A. paludosum, and A. bakii was adjacent to a butyryl-CoA dehydrogenase (acyl-CoA dehydrogenase), an L-carnitine dehydratase (CoA transferase), and a butyryl-CoA dehydrogenase (acyl-CoA dehydrogenase), suggesting that other electron acceptors besides caffeate, potentially crotonyl-CoA, can be utilized (68).
Other genome features: alternative electron donors/acceptors utilized by the Acetobacterium genus. Acetobacterium species can use a wide range of substrates for carbon and energy, indicating a generalist lifestyle for the genus despite the perception that acetogens are specialists (Table 3). This could provide an advantage in anoxic environments where competition for limited substrates is high (69,70). As an example, encoded in the genomes of most Acetobacterium species are carbon utilization pathways for 1,2-propanediol, 2,3-butanediol, ethanol, lactate, alanine, methanol, glucose, fructose, and glycine betaine, and utilization of various electron acceptors, including caffeate (Data Set S2). Combining data of growth phenotypes of Acetobacterium isolates with in silico genomic evidence of metabolic pathways, we hypothesize predicted metabolisms of the Acetobacterium genus, in particular strains that have not been isolated (MAGs) or isolates that have yet to be tested for a particular phenotype (Table 3; Data Set S3).
(i) 1,2-Propanediol. Acetogens metabolize alcohols as an alternative to autotrophic acetogenic growth, including Acetobacterium carbinolicum, Acetobacterium woodii, and Acetobacterium wieringae (71). Acetobacterium woodii can utilize 1,2-propanediol as the sole carbon and energy source for growth (27). In anaerobic environments, formation of 1,2-propanediol results from the degradation of fucose and rhamnose, constituents of bacterial exopolysaccharides and plant cell walls (72). The 1,2-propanediol degradation pathway is encoded by the pduABCDEGHKL gene cluster, which in A. woodii (Awo_c25930 to Awo_c25740) contains 20 genes with similarity to the pdu cluster of Salmonella enterica (27). The presence, homology, and gene arrangement of the pdu gene cluster in each of the other 12 Acetobacterium genomes suggest that 1,2-propanediol degradation is conserved across the Acetobacterium genus (Data Set S2). Furthermore, all strains contain a histidine kinase and response regulator upstream of the 1,2-propanediol cluster, suggesting that the regulatory and expression mechanisms of this pathway are conserved. In A. woodii, this two-component system was proposed to sense alcohols (chain length >2) or aldehyde intermediates, a mechanism we hypothesize is employed by all Acetobacterium species.
(ii) 2,3-Butanediol. The 2,3-butanediol oxidation pathway is encoded by the acoRABCL operon in Acetobacterium woodii, and unlike 1,2-propanediol degradation, the WLP accepts reducing equivalents from 2,3-butanediol oxidation to generate acetate (26). AcoR is a putative transcriptional activator, AcoA is a thiamine PPi (TPP)-dependent acetoin dehydrogenase (alpha subunit), AcoB is a TPP-dependent acetoin dehydrogenase [...]

TABLE 3 Growth phenotypes and predicted metabolisms of Acetobacterium spp. (a)
(a) Growth phenotypes were compiled from experimental data with observed growth (+), no growth (-), or not determined (ND) (reference 109 and references therein). Metabolisms were predicted as present (dark blue) or absent (yellow). Light blue represents pathways that are complete but in which one or two predicted proteins have low sequence identity. Dark boxes represent inconsistencies between experimentally validated phenotypes and predicted metabolisms from genome data. Growth phenotypes are unavailable for the five metagenome-assembled genomes on the right (Acetobacterium sp. MES1, Acetobacterium sp. UBA5558, Acetobacterium sp. UBA5834, Acetobacterium sp. UBA6819, and Acetobacterium sp. KB-1).
(iii) Ethanol. In acetogens, the oxidation of primary aliphatic alcohols, such as ethanol, is coupled to the reduction of CO2 (75). The ethanol oxidation pathway has been elucidated in A. woodii and contains a bifunctional acetaldehyde-CoA/alcohol dehydrogenase (AdhE) that is upregulated during growth on ethanol (23), suggesting that this is the primary enzyme responsible for ethanol oxidation. Examination of the other sequenced Acetobacterium genomes revealed that Acetobacterium sp. KB-1 and Acetobacterium sp. UBA6819 were the only strains to encode a complete AdhE with 87% sequence identity to A. woodii (Data Set S2). Despite the lack of AdhE in the other species of Acetobacterium, species like A. wieringae are capable of growth on ethanol (Table 3) (64,71,75-77). It is likely that A. wieringae and the other Acetobacterium strains that do not encode the A. woodii-type bifunctional acetaldehyde-CoA/alcohol dehydrogenase employ an alternative pathway for ethanol oxidation. Biochemical analyses are needed to confirm the pathway, but one hypothesis is that the conversion of ethanol to acetate proceeds via acetaldehyde by the activities of alcohol dehydrogenase (converts ethanol to acetaldehyde) and aldehyde:ferredoxin oxidoreductase (converts acetaldehyde directly to acetate), similar to Thermacetogenium phaeum (78). The genome of Acetobacterium sp. MES1 encodes an alcohol dehydrogenase (OXS24683.1) upstream of a tungsten-containing aldehyde:ferredoxin oxidoreductase (AFO) (OXS24682.1), both of which are also found in the A. wieringae genome. A. woodii does encode a similar alcohol dehydrogenase (Adh3; alcohol dehydrogenase, iron-type [AFA47416]) but does not encode a similar tungsten-containing AFO.
(v) Alanine. A. woodii encodes an alanine degradation pathway for utilization of alanine as a sole carbon and energy source (24). The alanine degradation pathway consists of a pyruvate:ferredoxin (flavodoxin) oxidoreductase (PFO) (AWO_RS12520), a sodium:alanine symporter family protein (AWO_RS12525), alanine dehydrogenase (AWO_RS12530), and the Lrp/AsnC family transcriptional regulator (AWO_RS12535). Examination of this operon revealed that it was well conserved across the Acetobacterium genus, with the exception of A. fimetarium, A. paludosum, and A. tundrae, which lacked the sodium:alanine symporter, alanine dehydrogenase, and the transcriptional regulator, the details of which are discussed below (Data Set S2).
(vi) Caffeate. Some acetogenic bacteria utilize phenyl acrylates as alternative electron acceptors (82). One such phenyl acrylate, caffeate, is produced during lignin degradation and may be an available substrate in environments containing vegetation (83). Recently, it was observed that A. woodii has the ability to couple caffeate reduction with ATP synthesis (25). The caffeate reduction operon in A. woodii is encoded by carA2, carB2, carC, carD, and carE (Awo_c15700 to Awo_c15740). Specifically, CarA is a hydrocaffeyl-CoA:caffeate CoA transferase (25), CarB is an ATP-dependent acyl-CoA synthetase (84), CarC is a caffeyl-CoA reductase, and CarDE is an electron transfer protein (49). Examination of the other 12 Acetobacterium genomes revealed only BLAST hits with low similarity (<53% identity), with the exception of those from three psychrophilic strains (A. paludosum, A. fimetarium, and A. tundrae) and CarC, CarD, and CarE from Acetobacterium bakii (74%, 73%, and 73% amino acid sequence identity and 88%, 85%, and 86% amino acid sequence coverage, respectively) (Data Set S2). The lack of similar and congruous sequences suggests that only A. woodii, A. fimetarium, A. paludosum, and A. tundrae are capable of caffeate reduction via this pathway.
Potential adaptations for enhanced surface colonization. Many of the Acetobacterium genomes/MAGs encoded portions of the Widespread Colonization Island (WCI), which mediates nonspecific adherence to surfaces and biofilm formation (85). The products of the WCI are responsible for assembly and secretion of bundled pili (86) and may be important for colonization of diverse environments (87). The general structure of the WCI genomic region is similar for all sequenced Acetobacterium strains with the exception of A. woodii and includes 1 to 4 small hypothetical proteins (~56 aa), followed by TadZ, Von Willebrand factor type A, RcpC/CpaB, TadZ/CpaE, TadA/VirB11/CpaF, TadB, and TadC (Data Set S2). The small hypothetical proteins from Acetobacterium sp. MES1 show significant similarity (100% coverage, 98% identity) with the Flp/Fap pilin component from A. wieringae (OFV69504.1 to OFV69507.1). Likewise, immediately upstream of the Flp/Fap pilin components is a hypothetical protein with high similarity (100% coverage, 79% identity) to a prepilin peptidase from A. dehalogenans (WP_026393143.1). The presence of this potentially biofilm-enhancing WCI in many species of Acetobacterium but not in A. woodii highlights the importance of expanding genome-informed biochemical analyses to species and strains beyond the type strain of a genus. One intriguing hypothesis resulting from the discovery of this WCI is that it may enhance surface attachment on insoluble electron donors like metallic iron or electrodes. A recent study suggests A. malicum and an isolate most closely related to A. wieringae more effectively extract electrons from solid Fe(0) coupons than does A. woodii (34). Additionally, Acetobacterium sp. MES1 is capable of colonizing cathodes in microbial electrosynthesis systems (14,33), but A. woodii has failed in all such attempts (88,89).
Closer examination of the strains that were phylogenetically most closely related to Acetobacterium wieringae, which include Acetobacterium sp. MES1 (capable of electroacetogenesis), revealed 42 proteins unique to this clade (Data Set S1). Of these unique protein sequences, several were of potential relevance for possible surface colonization and electron transport, including cardiolipin synthase for the production of membrane phospholipids, the global regulator diguanylate cyclase (Fig. S3), and methylenetetrahydrofolate reductase (methylene-THF reductase) (Data Set S1).
Potential adaptations to a psychrophilic lifestyle. Psychrophilic microorganisms have adapted to survive and grow in cold environments by varying membrane fluidity, optimizing transcription and translation (e.g., overexpression of RNA helicases, posttranscriptional regulation of RNA), and expressing cold shock proteins and cold-adapted enzymes (90). Acetobacterium bakii employs posttranscriptional regulation for cold adaptation, and at low temperatures a lipid biosynthesis pathway (ABAKI_c35860 to -c35970), cold shock protein CspL (ABAKI_c09820 and ABAKI_c26430), and a DEAD-box helicase (ABAKI_c00160 to -c00180) were upregulated (29). Examination of these pathways across the Acetobacterium genus revealed that the lipid biosynthesis pathway was conserved, but for the psychrophilic strains the genome architecture surrounding this cluster was different. Furthermore, two of the four psychrophilic strains (A. tundrae and A. paludosum) encoded twice as many cold shock proteins (4 total) as the other Acetobacterium strains (1 to 2 total).
Other minor variations were observed in the psychrophilic strains, in particular in the sodium-dependent ATPase and the 1,2-propanediol operon. The subunits of the sodium-dependent ATPase that constitute the membrane-bound motor were less well conserved than the catalytic subunits, perhaps indicative of the varying membranes across Acetobacterium (Data Set S2). Furthermore, A. tundrae, A. paludosum, and A. fimetarium were the only strains that do not encode the c1 subunit. Minor variability in the 1,2-propanediol operon structures of three psychrophilic strains (A. fimetarium, A. paludosum, and A. tundrae) was evident, as a heme-binding protein (pfam03928) located between pduO and pduP was missing.
One adaptation to survival in cold environments that psychrophiles employ is to increase protein flexibility and stability (90). As a result, psychrophiles often contain a higher proportion of hydrophobic amino acid residues, such as alanine and glycine (91,92). The psychrophilic strains are the only sequenced Acetobacterium strains to lack an alanine degradation pathway and glycine cleavage system. One possible explanation for the loss of these pathways in the psychrophilic Acetobacterium strains is an increased utilization of alanine and glycine in cold-adapted protein synthesis (91). Interestingly, the alanine degradation pathway encodes alanine dehydrogenase, which may also play an important role in NH4+ assimilation (93). Glutamate dehydrogenase, another enzyme shown to be important for nitrogen assimilation, was also absent from the psychrophilic strains (Data Set S2). Thus, the psychrophilic strains may have adapted to obtain nitrogen from alternative sources, using an alternative pathway.
To further determine what functional genome attributes may separate the psychrophilic Acetobacterium species from their nonpsychrophilic counterparts, we identified gene families unique to the psychrophilic species (Data Set S1). Our analyses revealed 40 annotated gene families found only in the psychrophilic clade, including a calcium-translocating P-type ATPase, sodium:proton antiporter, cupin, ribulose 1,5-bisphosphate carboxylase, and short-chain dehydrogenase (SDR) family oxidoreductase, among others (Data Set S1). The presence of a unique calcium-translocating P-type ATPase and sodium:proton antiporter suggests modified abilities to transport calcium and sodium. The ribulose 1,5-bisphosphate carboxylase (RuBisCO) and cupin may be involved in the methionine salvage pathway, which has been observed in other bacteria (94,95). SDR family oxidoreductases have a wide range of activities, including metabolism of amino acids, cofactors, and carbohydrates, and may play a role in redox sensing (96). More work is needed to determine the exact function of each protein in the psychrophilic strains, but these findings provide a framework for targeted functional analyses of the roles these unique proteins play in the metabolism of psychrophilic Acetobacterium strains and their persistence in cold environments.
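The clade-specific screen described above amounts to a set comparison on a gene family presence/absence table. The following is a minimal sketch of that idea, not the BPGA workflow used in this study; the function name, the `require_all` option, the clade membership, and the toy family names are all assumptions introduced for illustration.

```python
def clade_unique_families(families, clade, require_all=False):
    """Return names of gene families found only in genomes of `clade`.

    families: dict mapping a family name to the set of genomes encoding it.
    clade: set of genome names defining the clade of interest.
    require_all: if True, the family must occur in every clade member
                 (an assumption; the text does not state which rule applies).
    """
    unique = []
    for name, genomes in sorted(families.items()):
        # Skip empty families and families with any member outside the clade.
        if not genomes or not genomes <= clade:
            continue
        if require_all and genomes != clade:
            continue
        unique.append(name)
    return unique


# Toy presence/absence data (hypothetical, not taken from Data Set S1).
psychrophiles = {"A. fimetarium", "A. paludosum", "A. tundrae"}
toy_families = {
    "fam_001": {"A. fimetarium", "A. paludosum", "A. tundrae"},
    "fam_002": {"A. tundrae", "A. woodii"},
    "fam_003": {"A. paludosum", "A. tundrae"},
}

print(clade_unique_families(toy_families, psychrophiles))
print(clade_unique_families(toy_families, psychrophiles, require_all=True))
```

Whether a family must be present in every clade member or in only some of them changes the count, so the rule is exposed as a parameter rather than fixed.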
Conclusions. Acetogens are a phylogenetically diverse group of microorganisms capable of converting CO2 into acetate. Acetobacterium woodii has been used as a model organism to study the WLP and accessory components required for energy conservation in Rnf-type acetogens. We sequenced four Acetobacterium isolates from the culture collection (ATCC) (A. fimetarium, A. malicum, A. paludosum, and A. tundrae) and performed pan-genome analysis on the 11 most complete genomes/MAGs. We examined the functional potential of the available Acetobacterium genomes to shed light on the diverse genome attributes, gene arrangement and architecture, and potential metabolic capabilities of the Acetobacterium genus. Using the type strain of the genus (A. woodii) as a framework for pathway identification and comparison, we found that the common and conserved pathways included the WLP and accessory components (electron-bifurcating hydrogenase, ATP synthase, Rnf complex, ferredoxin, and electron transfer flavoprotein [ETF]), glycolysis/gluconeogenesis, and 1,2-propanediol degradation. Figure 5 presents a metabolic overview of the Acetobacterium genus. Interestingly, the hydrogen-dependent CO2 reductase (HDCR) is unevenly distributed across the genus. Based upon presence/absence, gene arrangement, and sequence similarity of many pathways, we hypothesize that divergent metabolisms found in a subset of genomes included caffeate reduction, 2,3-butanediol oxidation, ethanol oxidation, alanine metabolism, glycine cleavage system, and methanol oxidation. Notably, the psychrophilic strains encode unique amino acid transport and utilization functions and ion transport functions, which may have evolved for survival in low-temperature environments. Members of the A. wieringae clade encode over 40 unique predicted protein sequences, six of which were annotated as diguanylate cyclases (DGCs), enzymes that catalyze the production of the secondary messenger cyclic-di-GMP known to induce biofilm formation (97) and increase tolerance to reactive oxygen species (98). These unique DGCs may play a role in the environments which these strains inhabit, potentially aiding in attachment to various surfaces, including carbon-based electrodes. Overall, the comparative genomic analysis performed on the Acetobacterium genus provides a framework to understand the conserved metabolic processes across the genus, as well as identifying divergent genomic features (e.g., surface attachment) that can be exploited for targeted biotechnological applications using Acetobacterium strains, such as CO2-based hydrogen storage (19), microbial electrosynthesis, and conversion of CO or syngas to acetate or other commodity chemicals.

MATERIALS AND METHODS
DNA extraction, DNA sequencing, and genome assembly of four Acetobacterium isolates. For A. fimetarium, A. malicum, A. paludosum, and A. tundrae, freeze-dried cells were obtained from ATCC and reconstituted on minimal freshwater medium growing autotrophically on H2:CO2 (12). Chromosomal DNA was extracted using the AllPrep DNA/RNA minikit (Qiagen) according to the manufacturer's protocol. Extracted DNA was processed for Illumina sequencing using the Nextera XT protocol (Illumina). Samples were barcoded to enable multiplex sequencing of four samples using a single MiSeq V3 kit (2 × 301). Raw paired-end sequences were quality trimmed with CLC Genomics Workbench (Qiagen) with a quality score cutoff of Q30, resulting in a total of 297, 911, 471, and 548 million bp for A. fimetarium, A. malicum, A. paludosum, and A. tundrae, respectively. Trimmed paired-end reads were assembled with SPAdes (v. 3.7.0) using the --careful flag to reduce mismatches and short indels (99). Assembly of trimmed paired-end reads resulted in four "high-quality" draft genomes, a designation which is based upon minimum mandatory genome reporting standards of >90% completion, <5% contamination, the presence of 23S, 16S, and 5S rRNA genes, and at least 18 tRNAs (100) (Table 1; also see Table S1 in the supplemental material). The SPAdes assembled contigs were assessed for quality using Quast (101) and CheckM (v. 1.0.7) (37).
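The "high-quality" draft criteria listed above can be written as a single predicate. This is a minimal sketch under stated assumptions: the `AssemblyStats` container, its field names, and the example values are hypothetical and do not correspond to the output format of CheckM or Quast; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Hypothetical container for per-genome summary statistics."""
    completeness: float   # percent completion
    contamination: float  # percent contamination
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int


def is_high_quality_draft(stats: AssemblyStats) -> bool:
    """Apply the thresholds quoted in the text: >90% completion,
    <5% contamination, 23S/16S/5S rRNA genes present, and >=18 tRNAs."""
    return (
        stats.completeness > 90.0
        and stats.contamination < 5.0
        and stats.has_23s
        and stats.has_16s
        and stats.has_5s
        and stats.trna_count >= 18
    )


# Illustrative values only; see Table 1 and Table S1 for the real statistics.
example = AssemblyStats(98.5, 1.2, True, True, True, 45)
print(is_high_quality_draft(example))
```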
Manual genome curation and pathway identification. The Acetobacterium woodii genome (https://www.ncbi.nlm.nih.gov/nuccore/379009891?report=genbank) was utilized for pathway identification, gene synteny, and sequence similarity. Fasta files of all genomes were downloaded from the NCBI genome database (August 2018) and uploaded to RAST (102, 103) for annotation. The RASTtk pipeline was used to analyze each genome. The BLAST algorithm in RAST was used to find specific gene sequences and to assess gene synteny of operons (e.g., rnfCDGEAB). Sequence similarity of predicted protein sequences is presented in Data Set S2. To determine what cutoffs should be utilized in assessing pathway conservation, we analyzed RecA across all sequenced Acetobacterium strains and found that the sequence identity ranged from 86% to 100%. Thus, well-conserved sequences should share at least 86% identity. Pathways were categorized as highly conserved (high sequence similarity [86 to 100%] and near-identical gene arrangement), conserved (high to medium sequence similarity [50 to 86%] with variation in gene arrangement), or divergent (low sequence similarity [0 to 50%] with variation in gene arrangement). RAST protein-encoding gene (peg) identifiers reveal the synteny of genes that encode each protein in an operon, and where available, NCBI accession numbers are provided.
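The three conservation categories can be expressed as a small decision rule. The sketch below is illustrative only: the function name and the boolean synteny argument are hypothetical, and because the stated ranges share their endpoints (50% and 86%), the assignment of boundary values to the higher category is an assumption rather than a rule given in the text.

```python
def classify_conservation(percent_identity: float, synteny_conserved: bool) -> str:
    """Assign a pathway to one of the three categories described in the text.

    percent_identity: amino acid identity to the A. woodii reference (0-100).
    synteny_conserved: True if gene arrangement is near identical.
    Assumption: boundary values (exactly 50 or 86) go to the higher category.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity >= 86.0 and synteny_conserved:
        return "highly conserved"
    if percent_identity >= 50.0:
        return "conserved"
    return "divergent"


# Illustrative inputs, not values from Data Set S2.
for identity, synteny in [(92.0, True), (92.0, False), (74.0, True), (41.0, False)]:
    print(identity, synteny, classify_conservation(identity, synteny))
```

A pathway with high identity but rearranged genes falls into the "conserved" category here, which follows from requiring near-identical arrangement for the top category.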
Concatenated and single protein trees. Predicted protein sequences encoded by the hydABDEC operon, Rnf operon, or individual diguanylate cyclases (DGCs) were identified in each genome. Proteins encoded by each operon were manually concatenated into a single contiguous amino acid sequence. Concatenated protein sequences were aligned in MEGA6.06 (104) using MUSCLE (105) with the following parameters: gap open penalty (-2.9), gap extend penalty (-0.01), hydrophobicity multiplier (1.2), and UPGMB clustering method with a minimum diagonal length (lambda) of 24. A protein tree was constructed using the Maximum Likelihood method with the following parameters: test of phylogeny = bootstrap method, number of bootstrap replications = 1,000, substitutions type = amino acid, model/method = Jones-Taylor-Thornton (JTT) model, rates among sites = uniform rates, gap/missing data treatment = partial deletion, ML heuristic method = nearest-neighbor-interchange (NNI), and branch swap filter = very strong.
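Concatenation was performed manually in this study; the following sketch only illustrates the operation, assuming each genome's operon proteins are available as a mapping from gene name to amino acid sequence. The function name, the gene order list, and the toy sequences are hypothetical.

```python
def concatenate_operon(proteins: dict, gene_order: list) -> str:
    """Join the proteins of one operon, in a fixed gene order, into a
    single contiguous amino acid sequence for one genome."""
    missing = [gene for gene in gene_order if gene not in proteins]
    if missing:
        # A genome lacking a subunit cannot be placed in the concatenated tree.
        raise KeyError(f"missing subunits: {missing}")
    return "".join(proteins[gene] for gene in gene_order)


# Gene order follows the operon name hydABDEC; sequences are toy fragments.
hyd_order = ["hydA", "hydB", "hydD", "hydE", "hydC"]
toy_genome = {
    "hydA": "MKTL",
    "hydB": "MSEV",
    "hydC": "MARQ",
    "hydD": "MNPI",
    "hydE": "MGLF",
}
print(concatenate_operon(toy_genome, hyd_order))
```

Using the same gene order for every genome keeps homologous subunits in the same columns of the downstream alignment.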
Pan-genome analysis. Pan-genomics was performed with the Bacterial Pan Genome Analysis Tool (BPGA) version 1.0.0 (106). Initially, default BPGA parameters were utilized, which include protein clustering at 50% sequence identity cutoff with USEARCH (107). Further analysis was performed at various clustering cutoffs ranging from 10% to 99% to examine the pan-genome partitioning (e.g., total gene families, core gene families, accessory gene families, and unique genes). Resulting pan-genome and core genome trees were visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree/). The representative Acetobacterium core gene sequences from BPGA were uploaded to BlastKOALA (108) to evaluate KEGG pathway predictions.
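The partitioning reported by BPGA can be illustrated with a presence/absence table of gene families. This is a minimal sketch of the general idea, not BPGA's implementation: the function name, the input structure, and the toy data are assumptions, and the definitions used (core = present in all genomes, unique = present in exactly one, accessory = everything in between) are the conventional ones.

```python
def partition_pan_genome(families: dict, genomes: set) -> dict:
    """Split gene families into core, accessory, and unique sets.

    families: dict mapping a family name to the set of genomes encoding it.
    genomes: set of all genome names included in the analysis.
    """
    parts = {"core": [], "accessory": [], "unique": []}
    total = len(genomes)
    for name, members in sorted(families.items()):
        count = len(members & genomes)
        if count == 0:
            continue  # family not found in any analyzed genome
        if count == total:
            parts["core"].append(name)
        elif count == 1:
            parts["unique"].append(name)
        else:
            parts["accessory"].append(name)
    return parts


# Toy data with three genomes (hypothetical, not from Data Set S1).
toy_genomes = {"g1", "g2", "g3"}
toy_table = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2"},
    "famC": {"g3"},
}
print(partition_pan_genome(toy_table, toy_genomes))
```

Because the families themselves depend on the clustering identity cutoff, rerunning such a partition at several cutoffs, as described above, shows how sensitive the core and accessory counts are to that choice.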
Genotype/phenotype analysis. Observed growth phenotypes on various substrates from the work of Simankova and coworkers (109), and references therein, were overlaid with predicted metabolisms from genome data. Predicted metabolisms were determined as described above and also determined with KofamKOALA (110). Amino acid fasta files from RAST were uploaded to the KofamKOALA website (www.genome.jp/tools/kofamkoala) and analyzed using a 0.01 E value threshold. KofamKOALA data were compiled, and all pathway modules were examined (Data Set S3).
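Overlaying observed phenotypes on predicted metabolisms reduces to comparing two per-substrate calls for each strain. The sketch below assumes phenotypes are encoded as "+", "-", or "ND" as in Table 3 and predictions as booleans; the function name and the toy values are hypothetical, and flagging mismatches in both directions is an assumption about how inconsistencies were defined.

```python
def find_inconsistencies(observed: dict, predicted: dict) -> list:
    """List substrates where the observed growth phenotype disagrees with
    the genome-based prediction for one strain.

    observed: substrate -> "+", "-", or "ND" (not determined).
    predicted: substrate -> True if the pathway is predicted to be present.
    """
    flagged = []
    for substrate in sorted(observed):
        growth = observed[substrate]
        if growth == "ND" or substrate not in predicted:
            continue  # nothing to compare
        if (growth == "+") != predicted[substrate]:
            flagged.append(substrate)
    return flagged


# Toy calls for a single strain (illustrative, not the contents of Table 3).
toy_observed = {"ethanol": "+", "caffeate": "-", "methanol": "ND", "alanine": "+"}
toy_predicted = {"ethanol": False, "caffeate": False, "methanol": True, "alanine": True}
print(find_inconsistencies(toy_observed, toy_predicted))
```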

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.03 MB.