Microbial diversity in an early earth analogue: From ancient novelty to modern commonality

Life emerged and diversied in the absence of molecular oxygen 1 . The prevailing anoxia and unique sulfur chemistry in the Paleo-, Meso- and Neoarchean, and early Proterozoic eons may have supported microbial communities that are drastically different than those currently thriving on the earth’s surface 2– 4 . Zodletone spring in southwestern Oklahoma represents a unique habitat where spatial sampling could substitute for geological eons: from the anoxic, surcial light-exposed sediments simulating a preoxygenated earth, to overlaid water column where air exposure simulates the relentless oxygen intrusion during the Neo-Proterozoic 5 . We discovered a remarkably diverse microbial community in the spring sediments, with two thirds (340/516) of the metagenomic assembled genomes belonging to 200 bacterial and archaeal families that were either previously undescribed or are extremely rare elsewhere on earth. Such diversity is underpinned by the widespread occurrence of sulte-, thiosulfate-, tetrathionate-, and sulfur-reduction, in contrast with a paucity of sulfate-reduction metabolism in those taxa. This greatly expands the diversity of lineages mediating reductive sulfur cycling processes in the tree of life. In the overlaying water community oxygen intrusion leads to the establishment of a signicantly less diverse community dominated by well-characterized lineages and the prevalence of oxidative sulfur cycling processes. Such transition from ancient novelty to modern commonality underscores the profound impact of the great oxygenation event on the earth’s surcial anoxic community.. It also suggests that novel and rare lineages encountered in current anaerobic habitats could represent taxa once thriving on the anoxic earth that have failed to adapt to the progressive oxygenation. sequencing HiSeq paired-end reads for using FastQC followed . kmer Kmer Bowtie2


Main
Sulfur is one of the most abundant elements on earth, exhibiting a wide range of oxidation states (-2 to + 6). Microorganisms have evolved a plethora of genes and pathways for exploiting sulfur-redox reactions for energy generation. Thermodynamic considerations limit reductive sulfur processes to habitats where oxygen is limited. Sulfate is highly abundant on the current earth, and hence sulfate-reduction dominates reductive processes in the global sulfur cycle in permanently and seasonally anoxic and hypoxic habitats in marine 6-8 , freshwater 9 , terrestrial 10 , and subsurface 11 ecosystems. However, during the rst two billion years of its history the earth's surface was completely anoxic, and the availability and speciation of various sulfur species greatly differed from current values. Sulfate levels were signi cantly lower (estimates of < 200 µM-1mM from the Archean up to the Paleoproterozoic; 2.3 Gya) 2,4,5,12 , while sulfurcycle intermediates (SCI) appear to have played an important role in shaping the ancient sulfur cycle 3 .
Modeling suggests that mM levels of SO 3 2− were attained in the Archean anoxic shallow sur cial aquifers as a result of dissolution of the volcanic SO 2 prevailing in aquatic habitats 12 . Isotopic studies have demonstrated the importance of elemental sulfur, sul te, and thiosulfate reduction in the Archean 3,13 .
The evolution of life (3.8 to 4.0 Gya) in the early Archean era and the subsequent evolution of major microbial clades in the Archean and early Proterozoic 1 occurred within this background of anoxia and sulfur-chemistry. As such, it has been speculated that organisms using intermediate forms of sulfur were likely more common than sulfate-reducing organisms in pre-oxygenated earth 3 . However, while isotopic fractionation, modeling, and microscopic studies provide clues on prevailing sulfur speciation patterns and biological processes, the identity of microorganisms mediating such processes is unknown. This is due to constraints on preservation of nucleic acids and other biological macromolecules, with the oldest successful DNA sequenced sample being only 1.2 M years old 14 .
Investigation of modern ecosystems with conditions resembling those prevailing on the ancient earth could provide important clues to the nature, identity, and evolutionary trajectory of microorganisms that once thrived in these eons. In Zodletone spring in southwestern Oklahoma, the prevailing conditions are analogous to those predominant on the earth's surface in the late Archean and early Proterozoic eons (S. text, Figure S1). At the source of the spring, anoxic, sur cial, light-exposed conditions are maintained in the sediments by constant emergence of sul de-saturated water at the spring source from anoxic underground water formations in the Anadarko basin, along with gaseous hydrocarbons, which occur in seeps in the general vicinity. These sur cial anoxic conditions also support a sulfur chemistry characterized by high levels of sul de, sul te, sulfur (soluble polysul de), thiosulfate, and a low level of sulfate (S. text). Further, the sediments at the source of the spring are overlaid by an air-exposed water column (60 cm), where oxygen intrusion leads to a vertical oxygen gradient ranging from oxygen saturation in the top 1 µm, to hypoxic in the middle, to anoxic in deeper layers overlaying the sediments.
We combined metagenomic, metatranscriptomic, and amplicon-based approaches to characterize the microbial communities and sulfur cycling processes in Zodletone spring. The anoxic sediments served as a proxy for the late Archean and early Proterozoic communities, while the overlaid hypoxic water pool served as a proxy for oxygen intrusion during the Neoproterozoic. We binned 516 metagenomeassembled genomes (MAGs) from 281 Gbp raw sequence data from the anoxic spring source (Table S1).
In contrast, only 114 genomes were binned from 323 Gbp of raw data from the overlaid planktonic water community. Genomes recovered from the water column belonged to a lower number of families (n = 79) and phyla (n = 27) ( Table S1). The community exhibited a much lower proportion of previously undescribed and LRD taxa when compared to the sediment community ( Fig. 2a-b). Sixty-two genomes belonged to shared families with the sediment community, and 52 were water-speci c. Water-speci c genomes mostly belonged to well-characterized microbial lineages within the Alphaproteobacteria, Gammaproteobacteria, Camplylobacterota, Firmicutes/Firmicutes_A, Bacteroidota, and Spriochaetota (S. text, Fig. 1). As such, it appears that oxygen intrusion limits the growth of many of the unique microbes prevalent in the sediment, and triggered the proliferation of cosmopolitan lineages within the bacterial tree of life.
Analysis of predicted metabolic capacities and energy conservation pathways in all obtained sediment genomes revealed prevalence of reductive sulfur processes (149/564 genomes) in Zodletone spring sediments communities (S. text, Table S2). Strict fermentative capacities were also widespread, being observed in (100/564 genomes). Strict fermentative lineages mediated the utilization of a wide range of organic substrates, e.g. sugars, amino acids, short chain fatty acids, complex carbohydrates, long chain fatty acids, and short chain alkanes, producing numerous fermentative end products including lactate, formate, acetate, ethanol, succinate, and hydrogen (S. text, Table S2). Primary productivity in the sediments appears to be mostly mediated via hydrogen utilization (69 genomes) coupled to either sulfurcycle intermediates (SCI) reduction, or to CO 2 xation by hydrogenotrophic methanogens and acetogens (S. text, Table S2). On the other hand, aerobic, nitrate, or Fe 3+ respiration, chemolithotrophic nitri cation, and photosynthesis potentials were largely absent ( Fig. 1, S. text, Table S2).
To investigate sulfur-transformation processes in the spring sediments, we mapped the distribution of key sulfur-cycling genes, and inferred capacities in genomes via the occurrence of entire pathways (Fig. 3). Predicted capacities were subsequently substantiated by examining operon organization and phylogenetic analysis (Fig. 4, S4-S6). Further, time-series metatranscriptomic data were used to identify the metabolically active fraction of the community and document the expression of key pathways ( Figure   S7). Within the anoxic sediments, a total of 149 genomes (29 % of all genomes), belonging to 32 phyla and 97 families were involved in at least one reductive sulfur processes, while only 21 (4% of all genomes) encoded at least one sulfur oxidation pathway (Fig. 3 Table S3). Finally, 20 genomes encoded psrABC genes for polysul de reduction ( Fig. 3, S6, S. text, Table S3). Within lineages mediating reductive sulfur processes in Zodletone sediments, a wide range of substrates supporting sul dogenesis were identi ed (Table S2, Fig. 3, S. text). Metatranscriptomic analysis identi ed all S-species reduction/ disproportionation genes discussed above, with transcripts belonging to 51 different phyla identi ed (S. text, Figure S7).
In contrast, oxidative sulfur processes dominated the water community, while reductive sulfur-processes were extremely sparse (S. text, Fig. 3, Table S3). Pathways encoding sul de, sulfur, thiosulfate, tetrathionate, and/or sul te oxidation to sulfate present in 59/114 genomes (52% of all water genomes) belonging to 43 families in 13 phyla were identi ed. Encoded sul de and sulfur cycle intermediates (SCI) oxidation pathways included the SOX system (mediating oxidation of a wide range of reduced sulfurspecies to sulfate) in the well-characterized families Acidithiobacillaceae, Burkholderiaceae,  Table S3).
Collectively, phylogenomic analysis documents a unique microbial community in the spring anoxic sediments dominated by previously unencountered or extremely rare taxa on the current earth. Metabolic analysis suggests that such unique community is sustained by respiration and disproportionation of a wide range of SCI abundant in the spring. We posit that the high level of diversity and the abundance of previously undescribed and LRD lineages within Zodletone spring sediment S-reducing microbial (SRM) community could be attributed to two factors. First, the availability of a wide range of sulfur cycle intermediates selects for a more diverse community of SRM in the spring, when compared to predominantly sulfate-driven marine and freshwater ecosystems. The prevailing conditions in Zodletone spring (anaerobic, sur cial, light-exposed, sul dic, with abundance of SCI) remarkably mimic the conditions that prevailed in the late Archean and early Proterozoic eons. Due to the sensitivity and expected lack of adaptive mechanisms to cope with atmospheric oxygen in multiple strict anaerobes, as well as the chemical instability of multiple S species in an oxygenated atmosphere, the Great Oxidation Event (GOE) exerted a profound negative impact on anaerobic sur cial life forms (the oxygen catastrophe) leading to the rst and arguably most profound extinction event in earth's history. This study infers that the microbial communities presumably thriving in anoxic sur cial earth were extremely diverse with an abundance of SRM lineages. We argue that many such lineages were driven to extreme rarity in current environments, a re ection of their failure to adapt to the rise of atmospheric oxygen and the subsequent associated changes in earth's redox status and sulfur chemistry.
In addition to suppressing anaerobiosis in atmospherically-exposed habitats, the GOE also led to a signi cant change in the S cycle, from one based on atmospheric inputs (volcanic SO 2 release and dissolution in aqueous habitats) to one dependent on oxidative weathering. Such transition has led to the release of large amounts of sulfate derived from the oxidation of pyrite and the dissolution of sulfate minerals 29 , hitherto a minor byproduct of Archean abiotic and biotic reactions 3,30 . Our analysis comparing anoxic sediment communities to overlaid water hypoxic communities suggest that oxygen intrusion and loss of niches associated with geological transformations have triggered a mass extinction/con nement of a wide swath of branches within the bacterial tree, and greatly altered the microbial community mediating reductive sulfur transformation processes on earth.
In summary, by examining microbial diversity in Zodletone spring, we greatly expand the overall diversity within the tree of life via the discovery and characterization of a numerous previously uncharacterized lineages; and signi cantly enrich representation of a multitude of LRD lineages. We also describe a unique sulfur-cycling community in the spring that is largely dependent on sul te, thiosulfate, sulfur, and tetrathionate, rather than sulfate, as an electron acceptor. Given the inferred similarity to conditions prevailing prior to the GOE, we consider the spring an invaluable portal to investigate microbial communities and processes thriving on the earth's surface during these eons. Furthermore, we posit that the GOE precipitated the near extinction of a wide range of phylogenetically distinct oxygen-sensitive lineages and drastically altered the microbial reductive sulfur-cycling community from sul te, sulfur, and thiosulfate reducers to predominantly sulfate reduction.

Methods
Site description and geochemistry. Zodletone spring is located in the Anadarko Basin of western Oklahoma (N34.99562° W98.68895°). The spring arises from underground, where water is pumped out slowly along with sediments. Sediments settled at the source of the spring, a boxed square 1m 2 ( Figure  S1) are overlaid with water that collects and settles in a concrete pool erected in the early 1900s. The settled water is 50-cm deep above the sediments and is exposed to atmospheric air. Water and sediments originating from the spring source are highly reduced due to the high dissolved sul de levels (8-10 mM) in the spring sediments. Microsensor measurements show a completely anoxic (oxygen levels < 0.1 μM) and highly reduced source sediments. Oxygen levels slowly increase in the overlaid water column from 2-4 μM at the 2 mm above the source to complete oxygen exposure on the top of the water column 36 . The spring geochemistry has regularly been monitored during the last two decades [36][37][38]  GTGKCAGCMGCCGCGGTAA, for Crenarchaeota) 42 , 515F-Nano (5'GTGGCAGYCGCCRCGGKAA, for Nanoarchaeota) 42 , 515F-TM7 (5' GTGCCAGCMGCCGCGGTCA for TM7/Saccharibacteria) 43 as forward mix and 805RB (5' GGACTACNVGGGTWTCTAAT) 44 and 805R-Nano (5'GGAMTACHGGGGTCTCTAAT, for Nanoarchaeota) 42 as reverse mix. Puri ed barcoded amplicon libraries were sequenced on an Illumina MiSeq instrument (Illumina Inc., San Diego, CA) using a v2 500 cycle kit, according to manufacturer's protocol. Demultiplexed forward and reverse reads were imported as paired fastq les into QIIME2 v.
2020.8 45 for analysis. The DADA2 plugin was used to trim, denoise, pair, purge chimeras and select amplicon sequence variants (ASVs), using the command "qiime dada2 denoise-paired". Between 44k and 194k non-chimeric sequences were obtained for the individual samples. The ASVs were taxonomically classi ed in QIIME2 using a trained classi er built based on Silva-138-99 rRNA sequence database. The ASVs were assigned to 1643 taxonomic categories corresponding to taxonomic level 7 (species and above) and to 932 genera (level 6). There were no dominating species or genera in either the water or sediment: in the water sample only three taxa reached 3-5% relative abundance, while in the sediment, only three taxa accounted for 2-4% of the community, with 80% of the species being less that 0.1% of the community. Alpha rarefaction curves indicated saturation of observed sequence features (ASVs) at a sequencing depth of 70-80k sequences, the combined number of sequences being 514510 for water and 309383 for sediment.
Metagenome sequencing, assembly, and binning. Metagenomic sequencing was conducted using the services of a commercial provider (Novogene, Beijing, China) using two lanes of the Illumina HiSeq 2500 system for each of the water and sediment samples. Transcriptomic sequencing using Illumina HiSeq 2500 2 × 150bp paired-end technology was conducted using the services of a commercial provider (Novogene Corporation, Beijing, China). Metagenomic reads were assessed for quality using FastQC followed by quality ltering and trimming using Trimmomatic v0.38 46 . High quality reads were assembled into contigs using MegaHit (v.1.1.3) with minimum Kmer of 27, maximum kmer of 127, Kmer step of 10, and minimum contig length of 1000 bp. Bowtie2 was used to calculate sequencing coverage of each contig by mapping the raw reads back to the contigs. Assembled contigs were searched for ribosomal protein S3 (rpS3) sequences using a custom hidden Markov model (HMM) built from Uniprot reference sequences assigned to the Kegg Orthologies K02982, and K02984 (corresponding to the bacterial, and archaeal RPS3, respectively) using hmmbuild (HMMER 3.1b2). rpS3 Sequences were clustered at 99% ID using CD-HIT as previously suggested for a putative species cutoff for rpS3 data 47 . Taxonomic a liations of (rpS3) groups were identi ed using Diamond Blast against the GTDB r95 database 48 .
Contigs from the sediment and water assemblies were binned into draft genomes using both Metabat 49 and MaxBin2 50 . DasTool was used to select the highest quality bins from each metagenome assembly 51 . CheckM was used for estimation of genome completeness, strain heterogeneity, and contamination 52 . Genomic bins showing contamination levels higher than 10%, were further re ned based on the taxonomic a liations of the binned contigs, as well as the GC content, tetranucleotide frequency, and coverage levels using Re neM 53 . Low quality bins (>10% contamination) were cleaned by removal of the identi ed outlier contigs, and the percentage completeness and contamination were again re-checked using CheckM.
Genomes classi cation, annotation, and metabolic analysis. Taxonomic classi cations followed the Genome Taxonomy Database (GTDB) release r95 48 , and were carried out using the classify_work ow in GTDB-Tk (v1.1.0) 32 . Phylogenomic analysis utilized the concatenated alignment of a set of 120 singlecopy bacterial genes, and 122 single-copy archaeal genes 48 generated by the GTDB-Tk. Maximumlikelihood phylogenomic tree was constructed in FastTree using the default parameters 31 .
Annotation and metabolic analysis. Protein-coding genes in genomic bins were predicted using Prodigal 54 . GhostKOALA 55 was used for the functional annotation of every predicted open reading frame in every genomic bin and to assign protein-coding genes to KEGG orthologies (KOs).
Analysis of sulfur cycling genes. To identify taxa mediating key sulfur-transformation processes in the spring sediments, we mapped the distribution of key sulfur-cycling genes in all genomes and deduced capacities in individual genomes by documenting the occurrence of entire pathways (as explained below in details). This was subsequently con rmed by phylogenetic analysis and examining contiguous genes organization in processes requiring multi-subunit and/or multi-gene. Further, expression data was used from three time points to identify the fraction of the community that is metabolically actively involved in the process. Analysis of S cycling capabilities was conducted on individual genomic bins by building and scanning hidden markov model (HMM) pro les as explained below. To build the sulfur-genes HMM pro les, Uniprot reference sequences for all genes with an assigned KO number were downloaded, aligned using Clustal-omega 56 , and the alignment was used to build an HMM pro le using hmmbuild (HMMER 3.1b2) 57 . For genes not assigned a KO number (e.g. otr, tsdA, tetH), a representative protein was compared against the KEGG Genes database using Blastp and signi cant hits (those with e-values < e-80) were downloaded and used to build HMM pro les as explained above. The custom-built HMM pro les were then used to scan the analyzed genomes for signi cant hits using hmmscan (HMMER 3.1b2) 57 with the option -T 100 to limit the results to only those pro les with an alignment score of at least 100. Further con rmation was achieved through phylogenetic assessment and tree building procedures, in which potential candidates identi ed by hmmscan were aligned to the reference sequences used to build the custom HMM pro les using Clustal-omega 56 , followed by maximum likelihood phylogenetic tree construction using FastTree 31 . Only candidates clustering with reference sequences were deemed true hits and were assigned to the corresponding KO. for electron transfer, the enzyme dissimilatory sul te reductase [DsrAB; EC:1.8.99.5] and its co-substrate DsrC for dissimilatory sul te reduction to sul de, and the sul te reduction-associated membrane complex DsrMKJOP for linking cytoplasmic sul te reduction to energy conservation.
Sul te-reduction. Sul te could be utilized by most sulfate-reducing microorganisms 58 . Dedicated sul tereduction capacity was assessed by the presence of the dissimilatory sul te reductase system explained above 59,60 with the lack of sulfate-activation (Sat) and reduction (Apr) genes. In addition, sul tereduction was assessed via the sole or co-occurrence of the anaerobic sul te reductase (AsrABC) system 61 , along with the membrane-bound associated complex (HdrABC) for transfer of electrons to the AsrC subunit 62 . The Asr enzyme has been shown to function in the cytoplasm in Salmonella typhimurium to reduce the sul te released from respiratory reduction of tetrathionate and thiosulfate 61 . However, a scenario where the Asr enzyme is involved in sul te respiration is possible via electron transfer from a membrane-bound associated complex to AsrC (the physiological partner of AsrAB). A plausible candidate for this membrane complex is the heterodisul de reductase-related enzymes (HdrABC), analogous to what was suggested for DsrC (the physiological partner of DsrAB) in organisms lacking the sul te reduction-associated membrane complex DsrMKJOP 62 .
Polysul de reduction: In addition to sulfate and sul te, Zodletone spring is euxinic with extremely high levels of zero valent sulfur, available as soluble polysul de. Respiratory polysul de reduction was assessed via the identi cation of the membrane-bound molybdoenzyme complex PsrABC, which reduces polysul des with electrons obtained from either a hydrogenase or a formate dehydrogenase through a quinone electron carrier 63 . In addition to the membrane-bound Psr system, representatives of the cytolpasmic sulfurhydrogenase I (HydABCD system), and/or II (ShyABCD system) were identi ed.
However, although these enzymes have been shown to be dissimilatory in the archaeon Pyrococcus furiosus 64,65 , their involvement in an ETS-associated respiration is currently unclear.
Thiosulfate reduction/ disproportionation: Thiosulfate occurs in natural environments as a result of the reaction of sul te with bisul de (HS -) 66 . Thiosulfate is relatively stable at neutral pH and is present in high levels in Zodletone spring, Thiosulfate contains two sulfur atoms: a sulfone-sulfur (oxidation state +5), and a sulfane-sulfur (oxidation state -1). As such, thiosulfate can be disproportionated where the sulfone-sulfur is reduced (serves as an electron acceptor), and the sulfane-sulfur is oxidized (serves as an electron donor), with the products being hydrogen sul de, and sul te, respectively. We searched for genes encoding the three known pathways for thiosulfate-disproportionation. Following the disproportionation of thiosulfate to sul te and hydrogen sul de, microorganisms differ in the fate of the produced sul te. Some microorganisms reduce the released sul te to sul de via a Dsr or Asr dissimilatory sul te reductase 75 ), leading to complete reduction of one thiosulfate molecule to two sul des (thiosulfate-reduction). Others oxidize the released sul te to sulfate via the reversal of the sulfate reduction pathway 74,82 , or via the sul te dehydrogenases SorAB or SoeABC 83 , leading to the nal conversion of one thiosulfate molecule to one sul de and one sulfate molecules. The distribution of all thiosulfate disproportionation capacities were assessed by the occurrence of one of the three pathways described above, and the fate of sul te in genomes mediating the initial disproportionation steps was assessed as described above.
Tetrathionate reduction: Tetrathionate has two sulfur atoms in oxidation state of 0 while the other two are in oxidation state of +5. In nature, tetrathionate is formed via the biotic or abiotic oxidation of thiosulfate under anoxic conditions 66 . Some microorganisms are capable of tetrathionate respiration via membranebound tetrathionate reductases that will reduce tetrathionate to thiosulfate serving as the terminal oxidase in a short electron transport system. Enzymes mediating such process include octaheme tetrathionate reductase (otr) 84 , as well as the guanylyl molybdenum cofactor-containing tetrathionate reductase (ttrABC) 85 . The produced thiosulfate could be metabolized through disproportionation as described above.
Oxidative sulfur processes. The versatile sulfur oxidation (SOX enzyme complex) system was assessed in all genomes. Prodigal-predicted genes from all genomes using Kallisto with default settings 90 . The calculated transcripts per million (TPM) were used to obtain total transcription levels for genes identi ed from genomic analysis as involved in S cycling in the spring.
Additional metabolic analysis. For all other non-sulfur related functional predictions, combined GhostKOALA outputs of all genomes belonging to a certain order (for orders with 5 genomes or less; n=206), or family (for orders with more than 5 genomes; n=85) were checked for the presence of groups of KOs constituting metabolic pathways (additional le 1). The list of these 291 lineages is shown in  nodes with >70% support. Tracks around the tree represent (from innermost to outermost): cultured status at the order level, abundance in Gtdb based on the number of available genomes (abundant, > 5 genomes; rare, 5 genomes; novel, no genomes in Gtdb), percentage database enrichment (number of genomes belonging to a certain order binned in the current study / number of genomes belonging to the same order in Gtdb), energy conservation capabilities depicted by colored circles as shown in the gure legend, and the number of MAGs belonging to each order binned from the sediment (blue bars) and the water (orange bars). For orders with 20 genomes, the family-level delineation is shown in Figure S3. ZNC = novel class; ZNO = novel order.