Comparative Genomics Reveals Thermal Adaptation and a High Metabolic Diversity in “Candidatus Bathyarchaeia”

ABSTRACT “Candidatus Bathyarchaeia” is a phylogenetically diverse and widely distributed lineage often in high abundance in anoxic submarine sediments; however, their evolution and ecological roles in terrestrial geothermal habitats are poorly understood. In the present study, 35 Ca. Bathyarchaeia metagenome-assembled genomes (MAGs) were recovered from hot spring sediments in Tibet and Yunnan, China. Phylogenetic analysis revealed all MAGs of Ca. Bathyarchaeia can be classified into 7 orders and 15 families. Among them, 4 families have been first discovered in the present study, significantly expanding the known diversity of Ca. Bathyarchaeia. Comparative genomics demonstrated Ca. Bathyarchaeia MAGs from thermal habitats to encode a large variety of genes related to carbohydrate degradation, which are likely a metabolic adaptation of these organisms to a lifestyle at high temperatures. At least two families are potential methanogens/alkanotrophs, indicating a potential for the catalysis of short-chain hydrocarbons. Three MAGs from Family-7.3 are identified as alkanotrophs due to the detection of an Mcr complex. Family-2 contains the largest number of genes relevant to alkyl-CoM transformation, indicating the potential for methylotrophic methanogenesis, although their evolutionary history suggests the ancestor of Ca. Bathyarchaeia was unable to metabolize alkanes. Subsequent lineages have acquired the ability via horizontal gene transfer. Overall, our study significantly expands our knowledge and understanding of the metabolic capabilities, habitat adaptations, and evolution of Ca. Bathyarchaeia in thermal environments. IMPORTANCECa. Bathyarchaeia MAGs from terrestrial hot spring habitats are poorly revealed, though they have been studied extensively in marine ecosystems. In this study, we uncovered the metabolic capabilities and ecological role of Ca. Bathyarchaeia in hot springs and give a comprehensive comparative analysis between thermal and nonthermal habitats to reveal the thermal adaptability of Ca. Bathyarchaeia. Also, we attempt to determine the evolutionary history of methane/alkane metabolism in Ca. Bathyarchaeia, since it appears to be the first archaea beyond Euryarchaeota which contains the mcrABG genes. The reclassification of Ca. Bathyarchaeia and significant genomic differences among different lineages largely expand our knowledge on these cosmopolitan archaea, which will be beneficial in guiding the future studies.

IMPORTANCE Ca. Bathyarchaeia MAGs from terrestrial hot spring habitats are poorly revealed, though they have been studied extensively in marine ecosystems. In this study, we uncovered the metabolic capabilities and ecological role of Ca. Bathyarchaeia in hot springs and give a comprehensive comparative analysis between thermal and nonthermal habitats to reveal the thermal adaptability of Ca. Bathyarchaeia. Also, we attempt to determine the evolutionary history of methane/alkane metabolism in Ca. Bathyarchaeia, since it appears to be the first archaea beyond Euryarchaeota which contains the mcrABG genes. The reclassification of Ca. Bathyarchaeia and significant genomic differences among different lineages largely expand our knowledge on these cosmopolitan archaea, which will be beneficial in guiding the future studies.  Completeness and contamination were estimated by CheckM (Parks et al. [65]).
Alkane Metabolism in Thermophilic Ca. Bathyarchaeia and soil (32,37) environments (Data Set S1). The wide distribution and broad phylogenetic diversity of these Ca. Bathyarchaeia MAGs suggest a high level of genomic and/ or phenotypic plasticity, which may allow them to occupy a wide range of habitats.
Studies of Ca. Bathyarchaeia diversity relying on the 16S rRNA gene identified up to 23 subgroups (16) (Fig. 1a). In our study, 12 of these 23 subgroups were identified as having representative genomes ( Fig. 1a and b), while 13 of the 95 Ca. Bathyarchaeia genomes (14%) could not be categorized within the existing subgroups due to the lack of 16S rRNA genes. This is not surprising, as MAGs commonly lack 16S rRNA gene sequences due to assembly technique biases or poor genome completeness (33). While traditionally 16S rRNA-based classification has been useful to classify taxonomic groups such as the Ca. Bathyarchaeia, limited metabolic information can be inferred from analysis of this single gene. Therefore, reclassification of this group based on MAGs is necessary. To do this, we used the GTDB genome classification software as an objective measure of taxonomic assignment (38). GTDB is a genome-based taxonomy with phylogenetic consistency that provides rank-normalized classifications for genomes from domain to genus (28), which has previously been missing in the classification of the phylogenetically diverse Ca. Bathyarchaeia. By applying this strategy, all 95 Ca. Bathyarchaeia MAGs were assigned to seven orders and 15 families (Fig. 1b). Order-1, -2, -3, -4, and -5 were found to correspond to Subgroup-21, -22, -17, -18, and -15, previously utilized by Zhou et al. (16) (Fig. 1a), which had the lowest amino acid identity (AAI) (40% to 51%) between each of the order groupings. The MAG ex4484_135 was found to be the sole representative of the sixth order. The remaining subgroups form the seventh order had the lowest intra-order AAI, at 44%. Of the 35 MAGs newly assembled in this study, they organized into five orders covering nine families (Family-1.1, -1.2, -1.3, -3.1, -4.1, -4.3, -5, -7.2 and -7.4). Six MAGs were assigned to Subgroup-7 and represent the first genomes in this group (Fig. 1b). Likewise, four families, Family-1.1, -1.2, -3.1 and -4.1, represent novel lineages solely represented by MAGs from the Tibet and Tengchong hot springs. An obvious outcome of this comparative analysis was the high level of congruence between the 16S rRNA and GTDB trees ( Fig. 1a and b). Apart from the inversion of Order-3 and -4, along with MAGs within Family-7.3, there were little differences between these taxonomies. These results also suggest further work needs to be done to link existing 16S rRNA and MAGs and identify more Ca. Bathyarchaeia MAGs and 16S rRNA genes from environmental samples.
Potential metabolic capabilities of hot spring Ca. Bathyarchaeia. Analysis of open reading frames for central metabolic pathways in the hot spring-associated Ca. Bathyarchaeia MAGs Family-3.1, -4.1, and -4.3 revealed the presence of a complete Embden-Meyerhof-Parnas (EMP) glycolysis pathway, whereas other Ca. Bathyarchaeia families only contain a partial EMP pathway (Fig. 2, Data Set S2). Genomes in Order-2, -5, and -7 and Family-1.2, -1.3, -4.1, and -4.3 harbor genes for the gluconeogenesis pathway for synthesizing glucose-6-phosphate (or fructose-6-phosphate) from the non-carbohydrate-precursor oxaloacetate (Fig. 3). All the Ca. Bathyarchaeia lineages appear to utilize the pentose phosphate pathway (PPP) to generate pentoses and ribose 5-phosphate (R5P), a precursor for the biosynthesis of nucleotides and aromatic amino acids, such as histidine. The MAGs from Family-4.2 and -7.2 have the potential to further convert the R5P into phosphoribosyl pyrophosphate (PRPP) after entering the nonoxidative phase of PPP. However, most other lineages, including Order-1 and -2 and Family-3.1, -4.1, -4.3, -7.3, and -7.4 lack genes for the nonoxidative phase and would likely employ a reverse ribulose monophosphate pathway to perform this same function (39) (Fig. 2 and 3, Data Set S2). Interestingly, Order-5 contains both gene sets, indicating flexibility in the conversion of PRPP and further promoting the synthesis of amino acids and nucleotides (Fig. 3). Consequently, the core metabolic capabilities associated with central carbon processing appear to be similar between these thermophilic Ca. Bathyarchaeia MAGs.
It appears that the Ca. Bathyarchaeia MAGs are able to metabolize acetate via two distinct pathways. Specifically, the archaeal-type acetate-forming gene encoded by ADP-forming acetyl-CoA synthetase (acd) is found in many lineages, including Order-2, -3, and -5 and Family-4.1, -4.3, -7.2, -7.3, and -7.4 ( Fig. 2 and 3). Alternatively, the bacterial-type acetate-forming genes phosphate acetyltransferase (pta) and acetate kinase (ack) were detected in Family-1.3, -2, and -7.2 MAGs, which are exclusively distributed in hydrothermal vent environments. This result indicates the ability to produce acetate by Ca. Bathyarchaeia can be lineage, or at least environment, specific. Also, Family-4.3 from hot springs may have the potential to form acetate, since all four MAGs in this lineage contain the ack gene (Data Set S2). Ca. Bathyarchaeia utilization of acetate has been demonstrated previously in 13 C-labeled acetate incubations of estuarine sediments (41)(42)(43). Furthermore, the presence of alcohol dehydrogenases among MAGs from Family-1.2, -1.3, -4.1, -4.3, and -5 and Order-7 suggests they may have the ability to ferment other small organic compounds (18) (Fig. 3).
Previously, it has been suggested that Ca. Bathyarchaeia perform carbon fixation via bacterial and archaeal type Wood-Ljungdahl (WL) pathways (22). In the 35 Ca.  2) from hot springs possess the complete archaeal-type WL pathway with tetrahydromethanopterin (H 4 MPT) as C 1 -carrier (Fig. 2). The ability to fix carbon dioxide via the WL pathway and form acetate has been predicted previously in Ca. Bathyarchaeia and supported by the activity of heterologously expressed Ca. Bathyarchaeia acetate kinase in vitro (17). The nearly complete archaealtype of WL pathway in Family-3.1 and -7.4 suggests they may also have the capacity to fix carbon dioxide. However, the 5,10-methylenetetrahydromethanopterin reductase (mer) was not recovered in MAGs from these two families. This suggests they may utilize an alternative and unknown complex to perform this same function. Also, by using tetrahydrofolate as C 1 -carrier, some Ca. Bathyarchaeia can fix carbon dioxide via the bacterial type of WL pathway (44). In particular, Family-5 harbors all genes necessary for the bacteria-type WL pathway, including the key enzyme formate dehydrogenase  (fdhA), which converts CO 2 to formate, with the exception of methylenetetrahydrofolate reductase (metF), which is absent (44). Ca. Bathyarchaeia are presumably able to take advantage of both WL pathway types to conserve energy, promoting their adaptability to thrive across several environments. Neither an ATP-citrate lyase nor citrate synthase were detected in any of the 35 hot spring-derived MAGs, suggesting the inability of the tricarboxylic acid cycle (TCA cycle) and reductive tricarboxylic acid cycle (rTCA cycle) to function in Ca. Bathyarchaeia. The ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCO) gene, an important component of the Calvin-Benson-Bassham (CBB) cycle, was observed in Order-1, Family-3.1, -4.1, -4.3, and -5, but not Family-7.2 and -7.4 (Data Set S2). Phylogenetic analysis showed that most RuBisCO gene sequences affiliated with the form-III type enzymes (Fig. S2), which may be involved in the pathway for AMP metabolism. We rule out the possibility of a carbon fixation pathway using the CBB cycle due to the lack of phosphoribulokinase in all Ca. Bathyarchaeia genomes (45) (Data Set S2). The wide detection of genes involved in the AMP metabolism pathway, including adenine phosphoribosyltransferase, AMP phosphorylase, and ribose 1,5-bisphosphate isomerase, gives further supporting evidence for the role of Ca. Bathyarchaeia RuBisCO enzymes in the metabolism of AMP. The end product of this AMP pathway, glycerate-3-phosphate (G3P), would then enter the previously mentioned EMP glycolysis pathway.
Comparative genomics of Ca. Bathyarchaeia from thermal and nonthermal environments. Although Ca. Bathyarchaeia have been reported to have an evolutionary origin in hot environments (22), attributes specific to thermophilic Ca. Bathyarchaeia remain unknown. Comparative genomics was leveraged to reveal genomic features that are specific to the thermophilic lineages. The similar completeness of the Ca. Bathyarchaeia MAGs from thermal and nonthermal environments, 88.40% and 86.70% mean estimated completeness, respectively, ensures the validity of the comparison between these groups (Fig. 5a). The average genome size of the thermophilic MAGs was 1.42 Mbp and was significantly less than nonthermophilic MAGs at 1.65 Mbp (Mann-Whitney U test, P = 0.004362; Fig. 5b). Consequently, the thermophilic MAGs on average contained fewer putative open reading frames at 1,578, compared to the nonthermophilic MAGs at 1,918 open reading frames (Mann-Whitney U test, P = 2.51 Â 10 24 ; Fig. 5c). However, this decreased number of open reading frames was slightly offset by a higher coding density in the thermophilic MAGs, of 88.54% compared to 85.55% in the nonthermophilic MAGs (Mann-Whitney U test, P = 6.68 Â 10 25 ; Fig. 5d). A possible explanation for these differences is that thermophilic microorganisms favor small genomes from reduced noncoding regions driven by the so-called genome streamlining process (46)(47)(48). A whole-genome comparison of all the 77 MAGs based on KEGG Orthology (KO) showed them clustering into four distinct groups (the ANOSIM test, R = 0.57, P = 0.001; Fig. 5e). It appears the taxonomic lineage and the habitat type work in concert to shape the clustering pattern of these MAGs (Fig. 5e). Notably, only one of the four KO clusters is solely constituted by MAGs from thermal environments and comprises Family-1.1, -1.2, -1.3, -7.1, and -7.2. While the thermal and nonthermal clusters for the three families of Order-4 were well distinguished based on their phylogenetic relationships (Fig. 5f), thermal and nonthermal MAGs in Family-7.4 were not well separated (Fig. 5g). KEGG Orthology (KO) functional gene profiling revealed significant differences between thermal-and non-thermal-derived MAGs (Fig. 5h), with genes related to cell growth and death, signal transduction, replication and repair, energy metabolism, and metabolism of terpenoids and polyketides being more enriched in nonthermal MAGs (Fig. 5h). In contrast, genes relevant to carbohydrate metabolism are more abundant in the thermal MAGs, which appears to be mainly driven by the previously discussed CAZymes in Family-1.2, -1.3, and -7.2 (Fig. 4).
To further explore the distribution of carbohydrate metabolism capabilities in the Ca. Bathyarchaeia MAGs, CAZy-based annotation of these genes was conducted. Consistent with the above results (Fig. 4), the thermal-derived MAGs from Family-1.2, -1.3, and -7.2 harbor a significantly higher number of CAZymes related to carbohydrate degradation compared to nonthermal lineages (least significance difference test, all P values , 0.05; Fig. 5i). Specifically, genes associated with the degradation of cellulose (b-glucosidase, endoglucanase), hemicellulose (a-L-fucosidase), pectin (a-L-rhamnosidases), oligosaccharides, and other polysaccharides are more enriched in these three families ( Fig. 5j; Data Set S3). Family-1.1 and -7.1, both exclusively from thermal habitats, do show a higher number of CAZymes than most nonthermal lineages, although this result was not significant. The likely ability to utilize a wide range of carbohydrates by Family-1.2, -1.3, and -7.2 is a likely driver for the evolutionary differentiation and may be a vital strategy for survival in thermal environments. Hence, we propose a plausible evolutionary scenario where these thermophilic Ca. Bathyarchaeia utilize a variety of polysaccharides as part of a generalized heterotrophic metabolism in nutrient-poor and extreme geothermal environments, as previously suggested for other saccharolytic thermophiles (40).
While carbohydrate utilization is one mechanism associated with some thermophilic Ca. Bathyarchaeia, other thermophilic MAGs also included diverse molecular chaperones, including heat shock proteins (HtpX and Hsp20) and DNA repair enzymes (RadAB) (Fig. S3). These mechanisms are similar to those commonly utilized by other thermophilic microbes to deal with heat stress (49). However, these genes were found across all Ca. Bathyarchaeia MAGs, suggesting they were retained from the thermophilic ancestor of the Ca. Bathyarchaeia. One key determinant of thermophily is reverse gyrase (rgy), which was detected only in thermal-derived bathyarchaeial families, including Family-1.2, -1.3, -2, -4.1, -6, -7.2, and -7.4. Further phylogenetic analysis reveals a complicated evolutionary history of the rgy gene, with frequent HGTs detected (Fig. 6). It is likely that Ca. Bathyarchaeia MAGs containing the rgy genes are    obligate thermophilic organisms. This result is further supported by the absence of the DnaK-DnaJ-GrpE chaperone system in the thermal Ca. Bathyarchaeia, as this system is considered to be important to mesophiles (50). The widespread presence of the DnaK-DnaJ-GrpE genes in mesophilic MAGs and in those thermophilic MAGs without rgy suggests this may be the case. Interestingly, two genomes (M10_bin185 and DRTY-6_2_bin_141) contain both the rgy gene and DnaK-DnaJ-GrpE chaperone system, suggesting they may have the ability to grow and/or survive in a wider range of temperatures. Evolution of methane and alkane metabolism in Ca. Bathyarchaeia. While many Ca. Bathyarchaeia appear to be heterotrophs, two Ca. Bathyarchaeia MAGs were recently shown to possess genes for the methyl-coenzyme M reductase (mcrABG) complex, a key enzyme involved in methane/alkane metabolism. These two MAGs were the first archaea outside of the classic methanogen/methanotroph lineages to be identified with these genes (23). Their Mcr complexes fall into a cluster that also contains mcr genes from the Halobacteriota (per GTDB), including Ca. Syntrophoarchaeum (25), Archaeoglobi (51), Ca. Methanoliparia (52), and Ca. Hadarchaeota (27,53), along with members from the Ca. Helarchaeales (54). Phylogenetic analysis places all of these mcrABG sequences into a lineage distant from traditional archaeal methanogens and methanotrophs (Fig. S4) and suggests they are involved in the oxidation of short chain alkanes such as butane/propane, rather than methane metabolism (25). While it has been suggested that these Ca. Bathyarchaeia may conduct alkane oxidation, the detection of genes for b-oxidation and acetyl-CoA oxidation pathways suggest these mcrcontaining Ca. Bathyarchaeia may have metabolic capabilities similar to those proposed for the alkane-oxidizing Ca. Syntrophoarchaeum (25). Given the multiple copies of Mcr genes in Ca. Syntrophoarchaeum, and phylogenetically diverse taxa containing related Mcr complexes, we speculate that the Mcr complexes in Ca. Bathyarchaeia, Ca. Hadarchaeota, and Ca. Helarchaeales are likely to have been derived from Ca. Syntrophoarchaeum via HGT events (Fig. S4). This result would be consistent with other suggestions that HGT is a driver of the transfer of this gene complex (54).
Along with the Mcr complex, a variety of other methanogenesis-related genes were detected in the Ca. Bathyarchaeia MAGs. Most methanogens encode the membranebound tetrahydromethanopterin S-methyltransferase (mtrABCDEFGH) complex, catalyzing the energy-conserving (Na 1 -translocating) methyl transfer from methyltetrahydromethanopterin (H 4 MPT) to coenzyme M (CoM-SH). For the Ca. Bathyarchaeia, Family-2 appears to be the only lineage that possesses genes that would encode a nearly complete MTR complex (Data Set S4), and the reason for their presence in these MAGs remains unclear. While mtrAH genes were detected in 6 of 35 terrestrial thermophilic Ca. Bathyarchaeia, it has been reported previously that mtrAH genes in Nitrososphaeria would likely encode for corrinoid and methyltransferase proteins that would allow for the assimilation of unknown methylated compounds (23,27,55). Given the widespread nature of these mtrAH genes in the Ca. Bathyarchaeia, phylogenetic trees of mtrA genes were generated to better understand their evolution (Fig. 7a). This analysis showed that mtrA genes from nonthermophilic Ca. Bathyarchaeia clustered into a single group, suggesting an ancient HGT event with the bacterial phylum Actinobacteriota as the potential donor (Fig. 7b). Alternatively, the thermophilic Ca. Bathyarchaeia mtrA sequences form two separate clusters and appear to have two independent evolutionary histories (Fig. 7). Likewise, cluster 3 from the thermophilic lineages is relatively conserved, though their ancestor may have been transferred from Halobacteriota or Methanobacteriota (Fig. 7d). While this result appears clear, the presence of Asgardarchaeota mtrA genes with low bootstrap values suggests the placements of these Ca. Bathyarchaeia mtrA sequences may change as additional sequences are discovered. No obvious HGTs are detected for the ancestor of the mtrA genes derived from terrestrial thermal habitats in cluster 2, given that Thermoproteia sequences form a sister lineage (Fig. 7c). It also needs to be mentioned that the Ca. Bathyarchaeia from Family-2 are exclusively from hydrothermal habitats and again suggests thermophilic mtrA genes from terrestrial (cluster 2) and marine (cluster 3) ecosystems have different evolutionary histories.
Amalgamated likelihood estimation (ALE) analyses consolidate the inference that a speciation event, rather than other gene acquisition, occurred for mtrA genes in Family-4.2 (Fig. 8). These results suggest the mtrA gene homologs in Order-4 have a thermal origin in the ancestor of Order-4 and may have acquired this gene horizontally, while it is preserved in Family-4.1 and -4.3 but lost in Family-4.2. Furthermore, to sustain the ability to metabolize unknown methylated substrates, Family-4.2 mtrA genes evolved to function in nonthermal habitats. While the Ca. Bathyarchaeia in nonthermal environments and terrestrial hot springs contain genes for a partial Mtr complex (mtrAH), the Ca. Bathyarchaeia in submarine hydrothermal vents harbor genes that would generate a nearly complete Mtr complex. Furthermore, sequence identities among all mtrA genes confirm the independent evolutionary trajectories among the three groups (Fig. 7e). In contrast to the clear evolutionary pattern for mtrA genes, the mtrH subunit shows a much more complicated evolutionary history, where HGT signals are frequently observed among different lineages (Fig. S5).
To better understand the origin and evolution of the suggested alkane metabolism in Ca. Bathyarchaeia, all related genes were recruited into an ALE analysis. Results suggest that the Ca. Bathyarchaeia ancestor may have synthesized acetyl-CoA from acetate, fixed carbon dioxide via the WL pathway, and harbored many genes for electron transport and energy conservation, including etfAB, hdrABC, hdrD, glcD, frhB, fpo-like complex genes, and V/A-type ATPase genes (Fig. 8). This analysis also suggests the ancestor did not have the ability to oxidize alkanes due to the lack of the Mcr complex, which is consistent with the inference that it was a HGT event from a Ca. Syntrophoarchaeum or similar alkane-oxidizing archaeon into the Family-7.3 MAGs (Fig. 8). Not surprisingly, this Ca. Bathyarchaeia lineage contains 24 of the 38 methanogenesis marker proteins that have been reported previously (52). Likewise, there is a clear pattern from the ALE analysis that shows the ancestor of all Ca. Bathyarchaeia was unable to metabolize methylated compounds, such as those for methanol and (di/ tri)-methylamine utilization via mtaA, mtbB, mtmB, and mttBC genes in Family-2 (Data Set S4), and have only been picked up by certain lineages. Also, it is likely that gene duplication of these methyl group utilization genes is another driver of metabolic diversity. For the acyl-CoA oxidation module, b-oxidation and acetate production via the acd gene or pta-ack pathway have been acquired via HGT, with the pta-ack pathway appearing to have only been acquired by MAGs from hydrothermal vent environments (Fig. 8).
b-Oxidation appears in many lineages, including Family-7.3, which is consistent with the function of Family-7.3 predicted to perform butane/propane oxidation using a similar mechanism to that suggest in Ca. Syntrophoarchaeum (25). However, this remains unproven due to the lack of experimental confirmation. Taken together, these results suggest that the ability of Family-7.3 MAGs to potentially oxidize alkanes is the result of HGT rather than vertical decent from a common ancestor. However, we still cannot rule out the possibility that the common ancestor of Family-7.3 and Family-2 may have had this ability, due to the prevalence of genes for alkyl-CoA transformation described previously (53) and several methanogenesis marker genes (Fig. 8) in Family-2.
Regarding the coenzyme recycling and energy conservation module, most genes undergo frequent HGT, accompanied by substantial gene duplication events. For example, both ech and mrp complexes in Family-2 are acquired horizontally, providing sufficient energy for the conversion of methylated compounds (Fig. 8). Similar to Methanomethylicia (24) (formerly Ca. Verstraetearchaeota phylum) and Korarchaeia (52) (formerly Ca. Korarchaeota phylum), Ca. Bathyarchaeia harbor F 420 H 2 :phenazine oxidoreductase (fpo), but lack the fpoFO subunits (Data Set S4), suggesting the inability to reoxidize reduced F 420 for energy conservation seen in some methanogens (56). Instead, they may employ the membrane-bound heterodisulfide reductase subunit D (hdrD) to form an energy-converting ferredoxin:heterodisulfide oxidoreductase, and concomitantly to generate a proton motive force across the cytoplasmic membrane; this mechanism has been predicted previously in H 2 -dependent methylotrophic methanogens (57). A total of 22 MAGs within nine families among the 35 MAGs from hot spring sediments harbor the fpo-like complex and different families show quite divergent cluster topologies (Fig. 9). Family-4.3 and -5 show similar fpo operon structures as predicted methanogens from the Methanomethylicia and Korarchaeia, except for the insertion of a fpoM copy in the operon from Family-5 (#1 and #5 in Fig. 9). Family-1.3, -3.1, and -4.1 contain all subunits but with rearranged operon structures (#2 to #4). Also, genome reduction in the thermophilic MAGs described previously (Fig. 5c) may play a role in some fpo operons, as Family-1.1, -1.2, -4.3, -5, -7.2, and -7.4 appear to have lost at least one subunit from this operon (#6 to #10). Conversely, subunits fpoBDHL may likely be indispensable (58), as they were always present (Fig. 9). Non-fpo genes were also identified within this operon, where hdrB2 was found in Family-4.3 and suggests alternative sources of electron transfer, such as the reoxidation of coenzyme M-coenzyme B heterodisulfide bonds (CoM-S-S-CoB) (#8) (59). Also, we observed the nuoEFG genes in only one (JZ_bin_32) of the 35 thermal MAGs, which may allow this complex to bind and oxidize NADH to NAD 1 rather than electron carriers such as F 420 , ferredoxin, or CoM-S-S-CoB predicted in other operons (Fig. 10a). While phylogenetic analyses of these nuoEFG genes place them within the phylum Chloroflexota and suggest these genes may have been obtained via HGT (Fig. 10b), these subunits are fragmented and contain several termination codons, suggesting that these genes may not be functional. Interestingly, the sequence coverage depth of the nuoEFG-containing scaffold is high at ;80Â across the assembled scaffold (Fig. 10c).
In conclusion, the present study taken as a whole largely expands the current diversity of Ca. Bathyarchaeia MAGs and shows that many of the hot spring-associated lineages are able to fix carbon dioxide and heterotrophically degrade a variety of carbohydrates. Also, it appears that these thermophilic Ca. Bathyarchaeia MAGs have evolved a greater number of genes related to carbohydrate degradation and their genomes have undergone genome streamlining consistent with these environments. Furthermore, we show that two lineages may have the ability to metabolize methane/alkane due to the wide detection of genes related to methanogenesis and/or alkane oxidation. Evidence also shows that the acquisition of these metabolic capabilities is likely the result of HGT rather than vertical inheritance. Overall, this study largely expands the understanding of metabolic capacities and evolution of Ca. Bathyarchaeia lineages, providing clues for the further discovery of lineages and future isolation of this widespread archaea, while shedding light on the metabolic capabilities of early life on earth.
Metagenomic assembly and genome binning. Raw metagenomic reads were generated on an Illumina Hiseq 4000 sequencer were quality filtered to obtain quality reads as described previously (61). Clean reads from each sample were assembled independently using SPAdes v3.9.0 (62) with the following parameters: -k 21,33,55,77,99,127 -meta. The assembled scaffolds with lengths of .2,500bp were kept for further analysis. Genome binning was conducted using Metabat v2.12.1 (63) and ESOM (Emergent Self-Organizing map) v1.1 (64). Specifically, sequence depth was calculated by mapping quality reads from each sample to the assembled scaffolds separately using BBMap v38.85 (http:// sourceforge.net/projects/bbmap/) with parameters as follows: k = 15 minid = 0.9 build = 1. MAGs were generated using Metabat by considering both the sequence depth and tetranucleotide frequency (TNF) information. The genome completeness, contamination, and strain heterogeneity of each MAG were evaluated using CheckM v1.0.5 (65). Scaffolds from all MAGs were sheared into short fragments (5 to 10 kb) and were visualized using ESOM based on their TNF. Low-quality MAGs were manually investigated and scaffolds with abnormal coverage information and discordant positions in the ESOM map were removed, as previously described (27). Finally, cleaned reads for each MAG were recruited using BBMap (the same parameters as mentioned above) and were reassembled using SPAdes with the following parameters: -careful -k 21,33,55,77,99,127. A total of 35 genome bins belonging to Ca. Bathyarchaeia were obtained from this process for further analysis.
Phylogenetic analysis. In total, 95 Ca. Bathyarchaeia MAGs including 35 from this study and 60 from currently available public databases (NCBI refSeq and IMG databases, downloaded 7 June 2019) with more than 50% genomic completeness and less than 10% genomic contamination were collected for the phylogenomic tree reconstruction. The 95 Ca. Bathyarchaeia MAGs were incorporated into the GTDB-tk v0.2.2 (82), an open-source toolkit for the taxonomic classification of genome and MAG assemblies with 122 concatenated archaeal single-copy marker protein sequences. The concatenated alignment was used to generate phylogeny by applying IQ-TREE v1.6.10 (83) with the mixture model of LG1F1R8 and with 1,000 ultrafast bootstrapping. The best model was determined by ModelFinder (84), which is well supported by Bayesian information criterion (BIC).
A phylogenetic tree based on 16S rRNA gene was generated using RNAmmer v1.2 (85) to identify nearly complete 16S rRNA genes in the 95 Ca. Bathyarchaeia genomes, with the parameters as following: -S arc -multi -m ssu. MAG DNA sequences were searched using the BLASTn program (86) against RDP database (87) (downloaded 18 October 2018) to detect partial 16S rRNA genes not detected by RNAmmer. Only sequences with lengths .300 bp were taken into consideration. The 16S rRNA gene sequences of 23 Ca. Bathyarchaeia subgroups classified by Zhou et al. (16) and Feng et al. (22) were used as phylogenetic anchors. All sequences were aligned together using the MAFFT v6.864b (88) online server with the strategy as follows: iterative refinement method "FFT -NS -I." The poorly aligned regions were trimmed by TrimAl v1.4.rev22 (89) with the parameters as follows: -gt 0.05 -cons 50. The 16S rRNA gene phylogeny was generated using IQ-TREE by iterating 1,000 times and the best-fit model was SYM1R10.
For the gene taxonomies of mtrA, rgy, mtrH, and nuoEFG, amino acid sequences were downloaded from NCBI by applying BLAST searches with the corresponding sequences from present Ca. Bathyarchaeia MAGs as inputs. Gene sequences of mcrABG from previous studies were used for analyses (27). Individual genes were aligned using MUSCLE v3.8.31 (90) by iterating 100 times. Gene complexes, such as nuoEFG and mcrABG, were concatenated. Poorly aligned regions were eliminated using TrimAl with the same parameters as above. Phylogenies were reconstructed using IQ-TREE with ultrafast bootstrapping (-bb 1000), as well as Shimodaira-Hasegawa-like approximate likelihood-ratio test (SH-aLRT, -alrt 1000). The best models are reported in the corresponding figure legends.
Comparative genomics between thermal environments and nonthermal environments. Genomes with completeness of $70% were picked for comparative genomics analysis, resulting in 77 of the 95 Ca. Bathyarchaeia MAGs being selected, and were further classified into thermal and nonthermal groups based on their habitat temperature. PCA clustering analyses based on Euclidean distances were performed for selected genomes annotated with the KEGG database using vegan package v2.5-6 (92). Significant differences among groups were examined using the analysis of similarities (ANOSIM). The differences of relative abundances of KEGG categories were examined between two groups using the Wilcoxon rank-sum test. All P values were adjusted using the "BH" correction in R. The comparisons of CAZyme number in each Ca. Bathyarchaeia family pair was analyzed by the least significant difference (LSD) test.
Evolutionary history reconstruction of alkane and methane metabolism. Genes related to alkane activation, alkyl-CoM transformation, acyl-CoA oxidation, carbon fixation via Wood-Ljungdahl pathway, and coenzyme cycling and energy conservation were selected for the present analysis. Putative proteincoding sequences were picked from all 95 Ca. Bathyarchaeia genomes and aligned using MUSCLE. Poorly aligned regions were eliminated using TrimAl and phylogenies of selected genes were generated using IQ-TREE. To infer the potential evolution scenarios of gene gain, duplication, transfer, and loss (DTL), amalgamated likelihood estimation (ALE) was used to calculate the likelihood for each of the 98 gene families encoded by 95 Ca. Bathyarchaeia genomes. The species tree was constructed on a concatenation of 122 conserved single-copy genes, as described above. The rates of DTL were inferred from the data using maximum likelihood optimization and reconciliations between gene trees and species tree were conducted using the ALEml_undated algorithm in the ALE package (93).
Data availability. Metagenome-assembled genomes described in this study have been deposited to NCBI under the BioProject PRJNA544494: BioSample id SAMN18244059, SAMN18244060, SAMN18253264, SAMN18253267, SAMN18253270, SAMN18838809 to SAMN18838815, and the accession numbers are JAGTQA000000000 to JAGTQZ000000000, JAGTRA000000000 to JAGTRI000000000. The data sets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. DATA SET S1, XLS file, 0.05 MB.