Sequence similarity network and protein structure prediction offer insights into the evolution of microbial pathways for ferrous iron oxidation

ABSTRACT Dissimilatory ferrous iron [Fe(II)] oxidation is a well-established microbial energy generation strategy. This study comprehensively investigates the distribution and evolution of recognized Fe(II) oxidation pathways through comparative analysis. We discovered a wide range of taxonomic groups that harbor homologs of known Fe(II) oxidation proteins. The presence of these homologs among phylogenetically distant lineages and their frequent association with mobile genetic elements strongly suggest horizontal gene transfer events involving Fe(II) oxidation proteins, such as the rus operon of Acidithiobacillus and Cyc572 from Leptospirillum. Lineages belonging to classes Gammaproteobacteria and Betaproteobacteria are often present at the hub positions of the protein sequence similarity networks, from which homologs of other taxa are derived. In addition, RoseTTAFold predictions have provided valuable insights into the structural characteristics of Fe(II) oxidation components whose structures were previously unknown. Despite limited sequence identity, a significant number of acknowledged proteins involved in different Fe(II) oxidation pathways, including Cyc2 and Cyc572, exhibit close structural similarities. Collectively, this study enhances our understanding of the distribution and evolution of microbial ferrous iron oxidation pathways.

IMPORTANCE Microbial Fe(II) oxidation is a crucial process that harnesses and converts the energy available in Fe, contributing significantly to global element cycling. However, many aspects of this process remain unexplored. In this study, we combined comparative genomics, sequence similarity network analysis, and artificial intelligence-driven structure modeling to address the lack of structural information on Fe(II) oxidation proteins and offer a comprehensive perspective on the evolution of Fe(II) oxidation pathways. Our findings suggest that several currently known microbial Fe(II) oxidation pathways may have originated within classes Gammaproteobacteria and Betaproteobacteria.

Iron is the fourth most abundant element in the Earth's crust, with an estimated abundance of around 5% (1). The ability to perform dissimilatory ferrous iron [Fe(II)] oxidation has been discovered and characterized in various lineages of Bacteria and Archaea. This process has had a significant impact on global biogeochemical cycling processes, including the formation and dissolution of iron ore. The switch between the two most common iron redox states, ferrous iron [Fe(II)] and ferric iron [Fe(III)], transfers just one electron. Despite this simplicity, it has become one of the oldest and most widespread energy-producing strategies. In fact, there is a hypothesis that the Fe(II)-oxidizing chemoautotrophic microbes may have played a key role in the formation of banded iron formations, which are the most prominent sedimentary iron mineral deposits on Earth and were formed during the late Archean and early Proterozoic Eon (2).
Research on microbial Fe(II) oxidation has primarily focused on acidophilic microorganisms, such as Acidithiobacillus (At.) ferrooxidans (3) and Leptospirillum (L.) ferrooxidans (4), that rely on the oxidation of iron minerals, such as ferrous sulfide (FeS) and pyrite (FeS2), for energy and growth. In acidic environments like acid mine drainage, oxygen (O2) serves as the most abundant and bioenergetically favorable terminal electron acceptor for Fe(II) oxidation by these microorganisms (5,6). However, certain neutrophilic bacterial lineages, such as Sideroxydans (S.) lithotrophicus (7) and Mariprofundus (M.) ferrooxydans (8), have evolved the capability to carry out oxygen-dependent Fe(II) oxidation at the iron-rich microoxic interface through specialized electron transport chains, even at circumneutral pH. These Fe(II) oxidation processes generate ATP and NAD(H), which are essential for sustaining microbial metabolism, including carbon assimilation and biosynthesis.
There are currently at least eight distinct microbial pathways/models for Fe(II) oxidation that have been deciphered (summarized in Table 1). Additionally, an increasing number of studies are focused on identifying novel genes or gene clusters involved in Fe(II) oxidation and validating the iron oxidation capabilities conferred by homologs of these pathways within a wider range of taxa (9-11).
Although the pathways of Fe(II) oxidation vary significantly among different iron oxidizers, there are certain similarities in their overall organizational patterns (33):
1. Oxidation of Fe(II) occurs either in the periplasmic space or at the outer membrane.
2. The abundance of functional redox proteins is relatively high (especially in Gram-negative bacteria), even though the difference in redox potential between Fe(II)/Fe(III) and O2/H2O is small.
3. Cytochromes generally act as the Fe(II) oxidase. c-Type cytochromes (C-Cyt), as the main Fe(II) oxidases, contain a heme prosthetic group for electron transport (34). They are typically found on the outer membrane of bacteria. These enzymes exhibit a high affinity for Fe(II) and function as electron carriers, shuttling electrons from Fe(II) to an electron acceptor, such as molecular oxygen or other terminal electron acceptors, thus driving the oxidation reaction as a whole. The Fe(II) oxidation process mediated by c-type cytochromes consists of several steps (35). Initially, the cytochrome binds the Fe(II) ion, facilitating the transfer of electrons from Fe(II) to the heme group within the cytochrome. Binding of Fe(II) induces a conformational change in the cytochrome, enabling interaction with other enzymes or proteins involved in electron transfer. As a result, the electrons are transferred through the electron transport chain, facilitating the reduction of electron acceptors and the generation of energy for microbial growth. Additionally, the oxidase activity of c-type cytochromes can be regulated by various factors, such as oxygen concentration, pH, and the presence of other redox-active metals (35-38). These enzymes display diverse structural and functional properties, enabling microbial communities to adapt to various environmental conditions.
4. The transfer of electrons from Fe(II) to the cytoplasmic membrane is facilitated by vertically organized electron shuttles that connect the external medium with the cytoplasm. These shuttles are distinct from the laterally arranged redox proteins found in most respiratory chains.
The vertical organization of the iron respiratory chain components offers several advantages (33):
1. It prevents the precipitation of Fe(III) in the neutral-pH cytoplasm by sequestering iron outside the cell.
2. It separates the respiratory chain from oxidative stress caused by the Fenton reaction.
3. It helps maintain a neutral pH in the cytoplasm, which is essential for the survival of acidophiles, by consuming protons through the reduction of O2 to H2O.
An example from the best-studied case, the model microorganism At. ferrooxidans (13), highlights these processes: electrons are initially extracted from extracellular Fe(II) by the outer membrane cytochrome c Cyc2 and then transferred to the periplasmic blue copper protein rusticyanin. From the "branch point" protein rusticyanin, the electrons can then flow either downstream, reducing O2 to water via cytochrome c4 Cyc1 and the aa3-type cytochrome oxidase complex, or upstream, utilizing the proton motive force across the inner membrane to overcome the unfavorable thermodynamic gradient and transfer electrons to the NADH1 complex by passing through the cytochrome c4 CycA1, the bc1 complex, and membrane-associated quinones. The Fe(II)/Fe(III) couple has a higher redox potential at acidic pH, where both species remain soluble, than at neutral pH. For example, in the presence of sulfate, which is the most common complexing agent in acid mine/rock drainage waters, the redox potential at pH 3 can reach approximately +0.720 V (5).
Previous studies have primarily investigated individual components of particular Fe(II) oxidation models or specific details within the Fe(II) oxidation pathway of certain species. Consequently, there is currently a deficiency in our overall understanding of the distribution and progression of these fascinating Fe(II) oxidation pathways. Furthermore, although various Fe(II) oxidizers have been proposed, our comprehension of their molecular-level Fe(II) oxidation pathways remains significantly lacking. Meanwhile, several Fe(II) oxidation components lack structural information, impeding further investigations into their molecular mechanisms.
We aimed to provide a comprehensive overview of the distribution and evolution of experimentally characterized microbial ferrous iron oxidation pathways. We achieved this objective by employing sequence similarity network (SSN) analysis, phylogenetic analysis, and genomic comparisons to offer a holistic perspective. Additionally, we utilized RoseTTAFold (39), an artificial intelligence (AI)-driven method, to generate high-quality three-dimensional (3D) structures of canonical proteins involved in microbial Fe(II) oxidation. These proteins have been under-studied and lack available structural data. Subsequently, we conducted fold recognition/comparison, conservation analysis, and electron hopping pathway prediction to gain insights into their mechanisms. The SSN analysis is a computational approach used to investigate the relationships between sequences based on their similarity (40). Unlike traditional phylogenetic tree construction, SSN analysis clusters sequences into groups called "network communities" based on sequence similarity, allowing for the detection of potential evolutionary relationships even among distantly related organisms (41). In our study, we employed the SSN analysis to explore the clustering patterns of proteins involved in iron oxidation across different organisms. By visualizing the network communities, we aimed to identify potential horizontal gene transfer (HGT) events or evolutionary connections that might have shaped the distribution and evolution of these iron metabolic cytochromes (42).
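To make the SSN idea concrete, the following minimal Python sketch builds a network by thresholding pairwise similarity scores and then groups sequences into connected components. It is an illustration only and not the pipeline used in this study: the protein labels, the scores, the threshold of 50, and the use of connected components as a simple stand-in for "network communities" are all assumptions.

```python
# Toy pairwise similarity scores (hypothetical labels and values).
TOY_SCORES = {
    ("Cyc2_A", "Cyc2_B"): 120,
    ("Cyc2_A", "Cyc2_C"): 95,
    ("Cyc2_B", "Cyc2_C"): 40,
    ("Cyc2_C", "Porin_X"): 12,
    ("Porin_X", "Porin_Y"): 88,
}


def build_ssn(pairwise_scores, threshold):
    """Return an adjacency map keeping only edges with score >= threshold."""
    graph = {}
    for (a, b), score in pairwise_scores.items():
        # Every sequence becomes a node, even if all its edges are dropped.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def communities(graph):
    """Connected components, used here as a simplified notion of community."""
    seen = set()
    groups = []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


def hub(graph):
    """Node with the most neighbors (ties broken alphabetically)."""
    return max(sorted(graph), key=lambda n: len(graph[n]))


if __name__ == "__main__":
    ssn = build_ssn(TOY_SCORES, threshold=50)
    print(communities(ssn))
    print(hub(ssn))
```

In this toy network, Cyc2_A has the highest degree and would therefore occupy a hub position. Real analyses typically rely on dedicated SSN tools and more refined community detection than connected components.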

The overall models of Fe(II) oxidation
We first utilized FeGenie (43) to identify candidate genes associated with iron oxidation in the genomes of reported iron oxidizers. From the FeGenie results, we obtained the representative sequences of known Fe(II) oxidation proteins as the initial query sequences for subsequent analyses (see Supplementary Material at https://doi.org/10.6084/m9.figshare.23652387.v1). We predicted the structures of Fe(II) oxidation proteins with unknown 3D structures using RoseTTAFold (39), as presented and discussed in detail in the following sections. We then constructed models that demonstrate the roles and arrangements of various Fe(II) oxidation proteins, as indicated in previous studies. These models offer a comprehensive view of the diverse Fe(II) oxidation pathways (Fig. 1). It is important to note that the predictions of RoseTTAFold (39), which are shown below, are primarily de novo. This means that they do not rely on any template that shares more than 30% identity with the query or covers more than 35% of the sequence. The structural predictions provide valuable insights that can help fill the gaps in our understanding of the functional roles and molecular details of these proteins in the whole Fe(II) oxidation-related system.
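The de novo criterion stated above can be written as a simple check over the templates available for a query. The sketch below is a hedged illustration: the `TemplateHit` record and the example values are hypothetical, and only the two thresholds (30% identity, 35% coverage) come from the text.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    """Hypothetical summary of one structural template found for a query."""
    identity_pct: float   # sequence identity to the query, in percent
    coverage_pct: float   # fraction of the query covered, in percent


def is_de_novo(hits, max_identity=30.0, max_coverage=35.0):
    """True if no template exceeds either threshold from the text."""
    return not any(
        h.identity_pct > max_identity or h.coverage_pct > max_coverage
        for h in hits
    )


if __name__ == "__main__":
    weak_templates = [TemplateHit(12.0, 20.0), TemplateHit(25.0, 30.0)]
    close_template = [TemplateHit(65.0, 90.0)]
    print(is_de_novo(weak_templates))   # no template passes either cutoff
    print(is_de_novo(close_template))   # a close template exists
```

A query with no templates at all is also treated as de novo here, which is a design choice of this sketch rather than something specified in the text.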
We further conducted SSN analyses on individual genes in the rus operon to visualize the distribution and evolution of these genes and their putative homologs (Fig. 3). The first gene of the rus operon of Acidithiobacillus ferrooxidans encodes the cytochrome c protein AfCyc2, which serves as the initial electron acceptor located on the outer membrane. The second gene encodes a cytochrome c4 protein (AfCyc1), which associates with rusticyanin (Rus) and a copper-containing protein (Cup) to transfer electrons from AfCyc2 to the cytochrome aa3 oxidase complex (CoxBACD).
Adjacent to the cox cluster in microbial iron oxidizer genomes are genes (ctaAB) that participate in the biosynthesis of heme, the cofactor essential for cytochrome aa3 oxidase (Fig. 2). SSN analysis revealed that homologs of CtaA and CtaB (heme A/O synthases) are widely distributed in lineages from Proteobacteria, Terrabacteria group, and even Eukaryota. From the sub-SSN illustrating the immediate neighbors of AfCtaA and AfCtaB, it is observed that AfCtaA and AfCtaB are closely related to homologs from Alphaproteobacteria lineages, including Hyphomicrobiales, Rhodospirillales, and Sphingomonadales. This suggests the occurrence of cross-class HGT (see Fig. S3 at https://doi.org/10.6084/m9.figshare.20079053.v4, dotted black frame).
The periplasmic proteins rusticyanin and AfCyc1 have previously been crystallized and characterized (45). According to RoseTTAFold, the structure of the undetermined protein AfCyc2, predicted to be a β-barrel porin-cytochrome fusion protein, aligns most closely with Protein Data Bank (PDB) entry 5O78 (phosphoporin PhoE), with a TM-score of 0.80, root-mean-square deviation (RMSD) of 3.4 Å, and sequence identity of 10% when queried with the Dali server (46). This finding is consistent with a prior investigation (47). Average amino acid identities (AAIs) of the genes within the operons are indicated in blue.

The Cyc572-Cyc579 system
Another group of acidophilic and widely applied iron-oxidizing bacteria, Leptospirillum spp. (phylum Nitrospirae), has been found to utilize the outer membrane cytochrome c Cyc572 as a direct oxidant and first electron acceptor of Fe(II). Subsequently, they transfer electrons through the periplasmic cytochrome c Cyc579 and the cytochrome bc1, ultimately reaching a cbb3-type cytochrome c oxidase (Fig. 1b) (16-18). The genes encoding these components are not found to be clustered in the genomes of Leptospirillum spp. Interestingly, we have detected strong HGT signals for these iron metabolism genes, particularly the cytochrome c Cyc572 in Leptospirillum spp. Significant variability can be observed in the genome contexts of Leptospirillum's Cyc572, particularly in terms of gene G+C content, when compared to their respective genomes (see Fig. 4; Fig. S7c at https://doi.org/10.6084/m9.figshare.20079053.v4). Moreover, numerous mobile genetic elements (MGEs), including transposase and phage integrase, which are indicators of HGT, are found flanking the Cyc572 in Leptospirillum spp. These MGEs are present at an average of three to five per genome. Additionally, several mismatch-repair proteins that may modulate the efficiency of HGT are also found in this region (50) (Fig. 4). As illustrated in the SSN diagram, Leptospirillum's Cyc572 is derived from a dense cluster containing homologs from Acidobacteria and Nitrospira, which are putative HGT donors (see Fig. S8a at https://doi.org/10.6084/m9.figshare.20079053.v4). The SSN diagram also reveals similar results for other iron metabolism genes of Leptospirillum spp., although their adjacent taxa appear to differ. For example, the taxa Candidatus Brocadiales (phylum Planctomycetes), Desulfobacterota (previously Deltaproteobacteria), and Nitrospinales (phylum Nitrospinae) are dominant in the first neighbors of Leptospirillum's cytochrome bc1. On the other hand, Leptospirillum's cytochrome cbb3 oxidase is closely linked to homologs from cross-phylum lineages, including Bacillales (phylum Firmicutes) and Burkholderiales (phylum Proteobacteria) (see Fig. S8 at https://doi.org/10.6084/m9.figshare.20079053.v4). The predicted 3D model of Cyc572 has few close structural homologs when searched against the PDB database using DALI (46). It shows the highest alignment score with structures of PorB porin proteins (PDB entries 4AUI, 3WI4) with identities of about 10% and RMSDs of around 3.0 Å. PorB belongs to a group of channel-forming outer membrane proteins that take up small solutes (51). Like PorB, Cyc572 is predicted to have a 16-stranded β-barrel topology with connecting turns and surface-exposed, inter-strand loops on the extracellular side. However, electrostatic surface potential analysis reveals strong electronegative charges on the surface of the funnel approaching the pore and the periplasmic periphery of Cyc572, in sharp contrast to the predominantly electropositive surface of PorB (see Fig. S9 at https://doi.org/10.6084/m9.figshare.20079053.v4). Interestingly, several key residues previously reported are also conserved in Cyc572 (see Fig. S9c
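The G+C comparison mentioned above for Cyc572 and its host genome can be illustrated with a short Python sketch. The sequences here are toy strings, not data from this study, and the choice to ignore ambiguous bases such as N is an assumption of the example.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc_deviation(gene_seq, genome_seq):
    """Gene G+C minus genome G+C, in percentage points."""
    return gc_content(gene_seq) - gc_content(genome_seq)


if __name__ == "__main__":
    toy_gene = "GGCCGGCC"      # hypothetical gene fragment
    toy_genome = "ATGCATGCAT"  # hypothetical genome fragment
    print(gc_deviation(toy_gene, toy_genome))
```

A large deviation of a gene from its genomic background is one commonly used hint of recent acquisition, although it is not conclusive on its own.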
In another neutrophilic bacterium, Sideroxydans lithotrophicus, Fe(II) oxidation is achieved through a Fe(II)-oxidizing decaheme cytochrome c protein called MtoA, nested within the outer membrane protein MtoB. Fe(II) is oxidized at the bacterial surface, and electrons are transferred to the quinone pool via the tetraheme cytochrome c protein CymA and a short cytochrome c protein MtoD (7,22). The SSN diagrams show that the homologs of these components are common in the phylum Proteobacteria, with lineages from classes Gammaproteobacteria and Betaproteobacteria often dominating the hub-like cluster (see Fig. S11 at https://doi.org/10.6084/m9.figshare.20079053.v4). Interestingly, homologs of PioB (Uniprot: A0A0S8L100) and CymA (Uniprot: A0A0S8BIT0/A0A0S8GBH4) belonging to Acidithiobacillales, the aforementioned model iron-oxidizing acidophile, are also detected, for the first time as far as we know.
The predicted models of Rhodopseudomonas' PioAB and Sideroxydans' MtoAB show the best PDB hits to MtrAB from the iron-reducing bacterium Shewanella baltica (55) (PDB 6R2Q). This suggests that they may traverse the bacterial outer membrane as an electrical bridge, connecting the extracellular and intracellular electron transfer networks. The PioA and MtoA polypeptides, like MtrA, exhibit very few regular secondary structures, consisting mainly of flexible loops (60.6% and 64.3%, respectively), with 37.8%/34.2% helices and 1.6%/1.5% sheets. Additionally, PioA proteins from Rhodopseudomonas species have a unique extended N-terminal irregular helix-loop region (residues 1-270) and no heme-binding motif (see Fig. S12 at https://doi.org/10.6084/m9.figshare.20079053.v4). This region is rarely found in PioA homologs from lineages outside of Rhodopseudomonas. It is proposed that the periplasmic apo-PioA undergoes a proteolytic process for better heme incorporation. On the other hand, the non-heme holo-PioA C is necessary for the proper incorporation of PioB into the outer membrane. Any mutation in either the signal peptide residue or the PioA C region will result in the loss of phototrophic Fe(II) oxidation ability (20).
Ten conserved CxxCH motifs and distal histidines that encode well-aligned and connected heme coordinating positions are identified in proteins PioA/MtoA/MtrA. These motifs form an electron transfer bridge (see Fig. S12 at https://doi.org/10.6084/m9.figshare.20079053.v4) and suggest evolutionary connections. PioB and MtoB are proposed to associate with PioA/MtoA, forming a porin:cytochrome complex similar to the MtrAB complex (PDB 6R2Q) (55). The predicted 3D models of PioB and MtoB components exhibit porin-like topology, with 26 antiparallel β strands dominated by hydrophobic residues that confer solubility of the Pio/Mto complex in the lipidic outer membrane. PioB and MtoB are proposed to orient and entrap PioA/MtoA in a manner similar to MtrB's interaction with MtrA, ensuring that the heme chain is perpendicular to the membrane for optimized electron transfer routes (Fig. 1c and d).
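Locating CxxCH heme-binding motifs in a protein sequence is a simple pattern search. The sketch below uses Python's `re` module with a lookahead so that overlapping matches are not missed; the peptide is an invented toy sequence, not PioA, MtoA, or MtrA.

```python
import re

# C, any two residues, C, H. The lookahead allows overlapping motifs.
HEME_MOTIF = re.compile(r"(?=(C..CH))")


def find_cxxch(protein_seq):
    """Return (1-based start position, matched motif) for each CxxCH site."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HEME_MOTIF.finditer(seq)]


if __name__ == "__main__":
    toy_peptide = "MKACAACHGGSCLLCHP"  # hypothetical sequence with two motifs
    for position, motif in find_cxxch(toy_peptide):
        print(position, motif)
```

Counting the returned sites gives the number of candidate heme-binding positions; identifying the distal histidine ligands requires structural information and is outside the scope of this sketch.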
The 3D model of Rhodoferax ferrireducens FoxE, resembling Rhodobacter ferrooxidans FoxE (PDB 5MAB, with 65% sequence identity and RMSD 1.8 Å), is predicted to have a di-heme topology dominated by alpha helices (Fig. 6a; see Fig. S16 at https://doi.org/10.6084/m9.figshare.20079053.v4). The R. ferrireducens FoxE polypeptide chain starts with a short protruding helix at the N-terminus, followed by a hydrophobic structural core that contains two hemes covalently coordinated by conserved residues Cys37-Cys40-His41-Met208 and Met94-Cys147-Cys150-His151, respectively, with methionines occupying the distal axial positions of heme I and II. In addition, heme I and II of R. ferrireducens FoxE differ in the order of the heme-binding motif CxxCH and the axial methionine ligand: for heme II, the methionine precedes the CxxCH motif, whereas for heme I, the CxxCH motif lies far upstream of Met208 in the turn-back C-terminus. Furthermore, a disulfide bridge between Cys58 and Cys236 (corresponding to Cys80 and Cys258 of PDB 5MAB) seems to supply an auxiliary covalent bond that sustains the conformation of the turn-back C-terminus (Fig. 6a).
The predicted 3D structure of FoxY is a compact propeller-like super-barrel with eight topologically identical propeller blades positioned radially around the protein center in pseudo-eightfold symmetry (Fig. 6b). Each β-propeller blade comprises four antiparallel β strands (A-D), with the short A strand closest to the pseudo-eightfold axis and the D strand on the subunit surface. A typical tryptophan-docking (W) motif for structure stabilization is located on each strand D, except in W7, where Phe308 replaces the common tryptophan residue (Fig. 6b and c). These aromatic residues putatively contribute to the electron transfer channel, as predicted by eMap (59) (see Fig. S17 at https://doi.org/10.6084/m9.figshare.20079053.v4). The topological features mentioned above also exist in peripheral membrane proteins, including cytochrome cd1 nitrite reductase (60) (PDB 1QKS, with RMSD 2.4 Å), methanol dehydrogenase (61) (PDB 1H4I, with RMSD 3.0 Å), and BamB (62) (PDB 4HDJ, with RMSD 2.8 Å against FoxY) (see Fig. S18 at https://doi.org/10.6084/m9.figshare.20079053.v4). The similarity in structure between the iron respiratory component FoxY and the redox cytochrome cd1 protein is intriguing in light of their putative evolutionary connection. Upon detailed comparison, it is observed that FoxY lacks the extended N-terminus loop region that harbors the heme-binding motif (CxxCH) seen in cytochrome cd1 (PDB 1QKS), and the other loops of FoxY are generally much shorter than those of cytochrome cd1. Furthermore, the overall surface charge of FoxY appears to be less negative than that of cytochrome cd1 (Fig. 6b). In addition, the strands A-C of FoxY blade W8 (closure of the overall ring structure) derive from the C-terminus, while the D strand comes from the N-terminus. However, cytochrome cd1 has a distinct blade W8 constitution that is made up of strands B-D coming from the N-terminus, plus an A strand from the C-terminus (Fig. 6b, highlighted with red dotted circles).
The overall predicted structure of FoxZ exhibits the basket-like DMT superfamily fold (Fig. 6c). A structural homolog search using the 3D model of FoxZ revealed significant structural resemblance to distant DMT superfamily inner membrane proteins, e.g., Starkeya novella YddG (63) (PDB 5I20) and Streptococcus pneumoniae LicB (64) (PDB 7B0K), which are involved in the transport of cationic compounds. These proteins superimpose with an RMSD of about 2.2 Å and have a sequence identity of approximately 21.0% (see Fig. S19 at https://doi.org/10.6084/m9.figshare.20079053.v4). Similar to other DMT family proteins, the 3D model of FoxZ consists of 10 transmembrane α-helices (TM), with both N- and C-termini situated on the cytoplasmic side and adopting an outward-open conformation. The overall topology of FoxZ comprises two inverted and repeated halves, with the N-terminal TM1 to TM5 domains and the C-terminal TM6 to TM10 parts arranged in twofold antiparallel pseudo-symmetry surrounding a central cavity. TM5 and TM10, together with TM4 and TM9, form a bundle that contributes to the central cavity. It is observed that TM3, TM4, TM8, and TM9 of FoxZ are interrupted by short loops, forming short helical segments (e.g., TM4a, TM4b). Note that TM3 of YddG (PDB 5I20) and TM8 of LicB (PDB 7B0K) are continuous. In addition, various conserved and essential aromatic residues previously reported to form an "aromatic box" surrounding the central cavity and participate in ligand coordination are also identified in similar positions of the FoxZ structure. These residues include Trp14 (TM1), Phe78 (TM3), and Trp163 (TM6) of FoxZ in correspondence to Trp17, Tyr78, and Trp163 of YddG (PDB 5I20) and Trp17, Tyr78, and Trp163 of LicB (PDB 7B0K). Moreover, conserved hydrophilic residues in the central cavity, like Ser246 (TM9) and Ser254 (TM9) of FoxZ (in reference to Ser244 and Ser251 of YddG), are also found, which may supply binding sites for the hydrophilic groups of ligands (64) (Fig. 6c).
The bioenergetic pathways of the acidophilic archaeon Ferroplasma spp. appear to be relatively simple. These pathways consist of a fusion protein of heme/copper-type oxidase subunits I and III, as well as a stand-alone subunit II. Additionally, there is a copper protein known as sulfocyanin that associates with the oxidase subunit II (32). Sulfocyanin is proposed to serve as the primary electron acceptor from Fe(II) and directly interacts with the substrate at the cell surface (Fig. 1h). The predicted 3D model of sulfocyanin displays an RMSD of 2.9 Å and a weak sequence identity (14%) to rusticyanin (PDB 1CUR). The fusion protein of putative oxidase subunits I and III bears a close resemblance to Mycobacterium tuberculosis cytochrome c oxidase subunit 1 (PDB 7E1V) with RMSD 1.6 Å and 13% sequence identity. Conversely, subunit II shares homology with the quinol oxidase subunit CyoA of Escherichia coli (PDB 1CYX) with RMSD 2.2 Å and 28% sequence identity.

Distribution of Fe(II) oxidation proteins
The SSN diagrams above illustrate that homologs of characterized iron oxidation proteins are often found across a wide range of taxa, extending beyond model organisms. It is possible that these organisms also possess similar Fe(II) oxidation functions. However, we have limited knowledge about the extent to which these Fe(II) oxidation proteins diversify. To gain more insight into the overall evolution of microbial iron oxidation pathways, we assessed and summarized the distribution of homologs of the previously mentioned Fe(II) oxidation proteins. We queried the non-redundant UniRef90 database (69) (E-value 1e-20). Our results revealed a widely extended taxon profile, in which the components of the rus, pet, pio, mto, and foxEYZ operons are sparsely distributed overall but highly enriched in classes Gammaproteobacteria (with a total of 2,084 hits), Betaproteobacteria (with a total of 1,684 hits), and Alphaproteobacteria (with a total of 1,168 hits) (Fig. 7, highlighted with a red dotted rectangle). According to Timetree (70), these taxa are predicted to have originated around 2,621 million years ago (MYA), before the Great Oxidation Event (GOE, ~2,400 MYA) (Fig. 7, left). Surprisingly, homologs of the Mob protein are particularly enriched in the classes Sphingobacteriia (59 hits) and Cytophagia (44 hits), whereas considerable PioA/MtoA homologs are also identified in classes Acidobacteriia (27 hits) and the deep-branching bacterial extremophile Aquificae (seven hits) (Fig. 7).
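The tallying step described above, filtering homolog hits by E-value and counting them per taxonomic class, can be sketched as follows. The hit records are invented for illustration, the dictionary field names are hypothetical, and treating the 1e-20 cutoff as inclusive is an assumption.

```python
from collections import Counter

# Hypothetical search hits; real input would come from a database search.
TOY_HITS = [
    {"query": "Cyc2", "tax_class": "Gammaproteobacteria", "evalue": 1e-45},
    {"query": "Cyc2", "tax_class": "Betaproteobacteria", "evalue": 3e-30},
    {"query": "Cyc2", "tax_class": "Gammaproteobacteria", "evalue": 1e-22},
    {"query": "Cyc2", "tax_class": "Bacilli", "evalue": 1e-5},
]


def count_hits_by_class(hits, max_evalue=1e-20):
    """Count hits per taxonomic class, keeping only E-values <= max_evalue."""
    counts = Counter(
        hit["tax_class"] for hit in hits if hit["evalue"] <= max_evalue
    )
    # Most frequent classes first.
    return counts.most_common()


if __name__ == "__main__":
    for tax_class, n_hits in count_hits_by_class(TOY_HITS):
        print(tax_class, n_hits)
```

With the toy data, the weak Bacilli hit is discarded and Gammaproteobacteria ranks first, mirroring in miniature the kind of enrichment profile summarized in Fig. 7.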

DISCUSSION
In this study, we present a comprehensive analysis of the distribution and evolution of experimentally characterized microbial ferrous iron oxidation pathways. Our approach includes SSN analysis, phylogenetic analysis, genomic analysis, and structural comparisons. Through these methods, we have discovered a diverse range of taxa that possess homologs to known iron oxidation proteins. This finding suggests the potential of our method for the identification and characterization of novel Fe(II) oxidizers.
To summarize, our study yields two main findings regarding the evolution of microbial Fe(II) oxidation pathways: (a) Quite a number of acknowledged proteins involved in diverse Fe(II) oxidation pathways are structural homologs, exhibiting similar three-dimensional structures and functional characteristics to proteins involved in other Fe(II) oxidation pathways (Table 2).
AI-driven accurate protein structure prediction, such as RoseTTAFold (39), was selected by the journal Science as the top breakthrough of the year (71). With this approach, we have observed a striking pattern among Fe(II) oxidizers, which utilize a porin protein as the initial component of the iron respiratory chain to interact with the external environment/substrate. This may be attributed to convergent evolution. Cyc2 from At. ferrooxidans/M. ferrooxydans, Cyc572 from Leptospirillum spp., and PioB/MtoB/MtrB from neutrophilic bacteria all display a β-barrel porin-like topology, with pairwise TM-align scores ranging between 0.50 and 0.74 (a TM-score above 0.5 is generally taken to indicate a similar fold), despite the low sequence identities observed (6%-16%).
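For readers unfamiliar with the TM-score, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It is not a reimplementation of TM-align, which additionally searches for the alignment and superposition that maximize the score; the function names and the handling of very short chains are choices made for this example.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is undefined for chains of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        # This sketch simply rejects very short chains.
        raise ValueError("chain too short for a positive d0")
    return d0


def tm_score(aligned_distances, target_length):
    """TM-score for one superposition.

    aligned_distances: distances (Å) between aligned residue pairs.
    target_length: length of the protein used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


def same_fold(score, cutoff=0.5):
    """Rule of thumb quoted in the text: TM-score above 0.5."""
    return score > cutoff


if __name__ == "__main__":
    length = 100
    perfect = tm_score([0.0] * length, length)
    print(perfect, same_fold(perfect))
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, unaligned residues lower the score, which is why the measure is less sensitive to local deviations than RMSD.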
Porins are integral membrane proteins primarily found in the outer membranes of Gram-negative bacteria and mitochondria. They typically assume a β-barrel topology (72). In the context of Fe(II) oxidation, porin proteins serve as the initial component of the iron respiratory chain (73). They function by facilitating the transport of Fe(II) ions across the bacterial outer membrane and into the periplasmic space. The interaction between Fe(II) oxidizers and their external environment/substrate is critical for their survival and energy production. By utilizing the porin protein, the microorganisms establish a direct connection between their cellular machinery and the external iron source, facilitating the extracellular electron transfer from the Fe(II) substrate to the subsequent components of the iron respiratory chain. Cyc2, MtoAB, and PioAB are all involved in iron oxidation in neutrophilic bacteria, while MtoAB and PioAB have not been reported in acidophilic bacteria. These proteins play roles in extracellular electron transfer and iron mineralization processes (73). The functional similarity also suggests a potential evolutionary relationship among these proteins. The β-barrel cytochrome-porin fusion proteins, such as Cyc2 and Cyc572, appear to represent the more primitive evolutionary product of outer membrane porin proteins that have occasionally incorporated a heme-binding motif through gene fusion events. These fusion proteins are efficient in acidophilic iron oxidizers, such as Acidithiobacillus, and have gradually become dominant in nature (27). We propose that this cytochrome-porin iron-oxidizing fusion protein later evolved into the complex form (MtoAB/PioAB) observed in neutrophilic iron oxidizers. This proposal is supported by several factors. Firstly, the cumulative number of protein folds generally increases over time (74). We observed that the number of β strands increased from approximately 17 in Cyc2 to 26 in PioB/MtoB/MtrB, and the porin radius increased from 38.2 Å in Cyc2 to 50.4 Å in PioB/MtoB/MtrB. This is consistent with the proposal that acidophilic iron oxidizers emerged prior to the neutrophilic iron oxidizers inhabiting fresh and marine waters (75). Secondly, the enlargement of the porin size of PioB/MtoB/MtrB may be necessary to accommodate the trapped, large, and elongated cytochrome/filament that contains abundant closely stacked hemes, such as PioA/MtoA/MtrA, which have evolved from c552-family cytochromes (76). Microbial Fe(II) oxidation at neutral pH is typically mediated by extracellular electron transfer (EET) mechanisms (35,76,77), which the above configuration makes possible. EET involves the transfer of electrons from Fe(II) to extracellular electron acceptors, such as cytochromes, which are often associated with the outer membrane of microorganisms. This process may help the neutrophilic iron oxidizers to counteract the insolubility of Fe(III) at neutral pH, where ferric hydroxide precipitates can clog around or inside the cell (35).
Resemblance in the folding structure is also observed between high-potential [4Fe-4S] iron-sulfur proteins (HiPIP), such as Iro from At. ferrooxidans and PioC from Rh. palustris (pairwise TM-align score 0.78), despite their low sequence identity of 15%. The conserved cysteine residues, Cys24, Cys27, Cys36, and Cys49 (Iro numbering), are crucial for ligating the [4Fe-4S] cluster. Additionally, the aromatic residues Tyr14, Phe30, and Tyr48 (replaced by Trp86 in PioC), which play a significant role in protein stabilization and electron transfer, have been identified (78). The pioC gene is the final gene in the pio operon and may have been acquired through HGT and fused with pioAB (79). This hypothesis is supported by the presence of an operon homologous to pioAB in S. lithotrophicus, known as the mtoAB cluster, which lacks a gene encoding HiPIP and instead carries the cymA gene encoding a membrane-anchored cytochrome c (Fig. S3B) (7). Additionally, the pioC in-frame deletion mutant showed only a partial defect in Fe(II) oxidation, leading to the proposal that PioC could be replaced by other unknown small soluble electron carriers (19–21). The presence of HiPIP with redox potentials ranging from +0.05 to +0.5 V may have allowed for adaptation to the elevated oceanic redox potential resulting from oxygenation events (80).
Another example of resemblance in folding structure is observed among rusticyanin from At. ferrooxidans, sulfocyanin from F. acidarmanus, and Mco from Metallosphaera spp. These proteins share a common core topology, consisting of a β-barrel/sandwich core and two mixed β-sheets. The pairwise TM-align scores for these proteins range from 0.47 to 0.64. Despite a low sequence identity of 11%-14%, these proteins conserve key residues involved in copper ion binding. Specifically, these residues in rusticyanin are His85/His143/Cys138/Met148, while sulfocyanin has Thr114 in place of Ser86. These residues are crucial for maintaining acid stability and modulating redox potential by facilitating hydrogen bond interactions (81,82).
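To make the reported TM-align scores easier to interpret, the following minimal Python sketch evaluates the standard TM-score formula for one fixed superposition, given the distances (in Å) between aligned residue pairs. It is an illustration only: TM-align additionally searches for the alignment and superposition that maximize this value, which is not reproduced here, and the function names and toy distances are hypothetical rather than taken from this study.

```python
def tm_d0(ref_length):
    """Length-dependent distance scale d0 of the TM-score, in angstroms."""
    d0 = 1.24 * (ref_length - 15) ** (1.0 / 3.0) - 1.8
    if ref_length <= 15 or d0 <= 0:
        # Very short chains need special handling that this sketch does not cover.
        raise ValueError("reference chain too short for this sketch")
    return d0


def tm_score(distances, ref_length):
    """TM-score of a fixed superposition.

    distances  -- distance in angstroms for each aligned residue pair
    ref_length -- length of the reference chain used for normalization
    """
    d0 = tm_d0(ref_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / ref_length


# Hypothetical toy input: 80 aligned pairs in a 100-residue reference chain,
# every pair 2.0 angstroms apart after superposition.
print(round(tm_score([2.0] * 80, 100), 3))
```

Because each aligned pair contributes at most 1 and the sum is divided by the reference length, the score lies between 0 and 1 and is not inflated by chain length; values above roughly 0.5 are commonly read as indicating the same overall fold.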
The significant fold homology but low sequence identity observed in multiple iron oxidation proteins is similar to the characteristics observed in heat shock proteins and G protein-coupled receptors (83–85). There are several possible scenarios that may account for this similarity:
1. Ancient origin hypothesis: The iron oxidation gene cassette could have evolved in the last common ancestor. Although there is limited direct evidence, this hypothesis is supported by shared characteristics and conserved genes involved in iron oxidation among diverse microbial groups. In their review article titled "Geomicrobiology of Iron," Kappler et al. (86) provide an overview of iron oxidation in various microbial lineages and discuss the potential ancient origin of these pathways.
2. HGT and later divergence: An alternative hypothesis suggests that the ability to oxidize ferrous iron arose independently in different lineages through horizontal gene transfer events. This scenario proposes that genes associated with iron oxidation were acquired through HGT, leading to the establishment of iron oxidation pathways in various microbial groups after their divergence. In the study conducted by He et al. (35), a thorough analysis of neutrophilic Fe(II) oxidizer genomes was carried out to investigate the potential acquisition of genes related to iron oxidation through HGT. Consistent with this scenario, both the iron-sulfur (Fe/S) protein and cytochrome c domains were found to have appeared in Bacteria and later been transferred to Archaea (via HGT) and Eukarya (via endosymbiosis) (87–89). Moreover, the giant mobile genetic elements known as "Borgs" frequently contain a variety of multi-copper oxidases, cupredoxins, and multi-heme cytochromes (90).
3. Interplay of vertical and horizontal evolution: It is possible that both vertical inheritance and HGT have contributed to the diversity of iron oxidation pathways. This theoretical scenario proposes that the last common ancestor possessed a primitive iron oxidation pathway which underwent subsequent modification and diversification through both vertical evolution and the acquisition of novel genes via HGT. This idea is supported by a study conducted by Barco et al. (28), which investigates the proteomic profile of an obligate iron-oxidizing chemolithoautotroph and highlights the interplay between vertical and horizontal evolution in shaping iron oxidation systems.
(b) Lineages belonging to classes Gammaproteobacteria and Betaproteobacteria are frequently found at the central positions of SSNs of Fe(II) oxidation proteins, from which homologs of other taxa are derived.
SSN analysis divides a protein family into clusters based on sequence similarity, providing a more intuitive and rapid visualization of the taxonomic distribution and functional space of the target protein family (91). The hub cluster in an SSN may represent a more primitive form of the proteins from which other clades (homologs) are derived (92). Based on this assumption and our findings, it is proposed that many known microbial Fe(II) oxidation pathways may have originated within classes Gammaproteobacteria and Betaproteobacteria.
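As a minimal illustration of how a hub position could be quantified, the Python sketch below ranks the nodes of an SSN by degree, assuming the network is available as a plain undirected edge list. Node degree is only one simple proxy for centrality, and the node names, taxa, and helper functions are hypothetical toy examples rather than data or code from this study.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the distinct neighbors of every node in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-hits
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def hub_nodes(edges, taxon_of, top_n=1):
    """Return (node, taxon, degree) for the top_n highest-degree nodes.

    Ties are broken alphabetically by node name so the result is deterministic.
    """
    degrees = node_degrees(edges)
    ranked = sorted(degrees, key=lambda n: (-degrees[n], n))
    return [(n, taxon_of[n], degrees[n]) for n in ranked[:top_n]]


# Hypothetical toy network: one central sequence linked to four others.
toy_edges = [
    ("seq1", "seq2"), ("seq1", "seq3"), ("seq1", "seq4"),
    ("seq1", "seq5"), ("seq2", "seq3"),
]
toy_taxa = {
    "seq1": "Gammaproteobacteria",
    "seq2": "Betaproteobacteria",
    "seq3": "Gammaproteobacteria",
    "seq4": "Acidobacteriia",
    "seq5": "Nitrospira",
}
print(hub_nodes(toy_edges, toy_taxa))
```

In a real SSN, the same ranking could be aggregated per cluster or per taxon to ask which lineages occupy the most connected positions.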
There are two additional pieces of evidence to complete the picture: (i) a significantly larger number of homologs of the known Fe(II) oxidation-related proteins are identified in classes Gammaproteobacteria (2,084 hits) and Betaproteobacteria (1,684 hits) than in other taxa, and among these hits, several Fe(II) oxidation proteins (e.g., PioAB/MtoAB and CymA) appear to be more widely distributed than the others; (ii) classes Gammaproteobacteria and Betaproteobacteria are predicted by Timetree (70) to have diversified from their last common ancestor (~2,621 MYA) prior to the GOE (~2,400 MYA, Fig. 7, left), whereas most other taxa that have sparsely distributed Fe(II)-oxidizing components are predicted to have originated after the GOE (Fig. 7). For example, class Acidobacteriia, which contains PioA/MtoA homologs, is suggested to have originated at around 2,009.3 MYA, and classes Sphingobacteriia and Cytophagia, which harbor homologs of the Mob protein, are proposed to have been present at about 920.1 MYA (70).
These findings provide support for the hypothesis that most key Fe(II) oxidation proteins originated within the Proteobacteria classes Gammaproteobacteria and Betaproteobacteria, which likely emerged in the anoxic oceanic environment where abundant soluble Fe(II) was available due to the continuing weathering of the continental crust along with deep-sea hydrothermal convection prior to the GOE. In addition, these Fe(II) oxidation proteins likely performed reversible anoxygenic Fe(II) oxidation utilizing nitrate/nitrite as an electron acceptor, which was abundant in the primordial seawater and produced through lightning-catalyzed nitrogen conversion (93–95). This mechanism is effective as the nitrogen oxide gases produced are continuously removed. Bacterial species that couple the oxidation of Fe(II) to nitrate reduction have been isolated from a wide range of habitats (96,97). They oxidize both soluble and insoluble Fe(II) (98,99). Oxidation of Fe(II) may also serve as an important detoxification strategy against toxic reactive nitrogen species in both photosynthetic and nitrate-reducing bacteria (100,101). The standard membrane-bound enzyme Nar, supposed to predate the last universal common ancestor, was reported to serve as the combined Fe(II) oxidase and nitrate reductase (99,102). Moreover, it is reported that the iron-oxidizing multi-copper oxidase and rusticyanin share an evolutionary origin with nitrite reductase (NiR), which could use the accumulating nitrite as an oxidant (103). Nitrite reductase (NrfB) was also found to share conserved heme orientation with the iron-metabolizing Mtr complex (76). Therefore, nitrate-dependent Fe(II) oxidation may represent the most ancient dissimilatory Fe(II) metabolism. Consistently, we found that the acknowledged Fe(II) oxidation proteins such as CymA, FoxY, and FoxD showed significant structural homologies to cytochrome c nitrite reductase NrfH (PDB 2VR0), cytochrome cd1 nitrite reductase (PDB 1QKS), and nitric oxide reductase (PDB 3AYF), respectively. Moreover, nitrate-dependent Fe(II) oxidizers are still widespread in both bacterial and archaeal lineages nowadays (73,98,99), and several aerobic Fe(II) oxidizers such as Acidithiobacillus species (104) and other biometallurgical strains (105) have retained the ability to perform Fe(II) oxidation coupled with sulfate (analog of nitrate) reduction under anaerobic conditions.
From these findings, it can be inferred that these Fe(II) oxidation proteins may have been vertically transmitted for a relatively long period within Gammaproteobacteria and Betaproteobacteria, allowing large numbers of homologs to accumulate, and subsequently adapted to the shift to oxygen as the terminal electron acceptor. As previously observed, the GOE promoted the innovation and diversification of metabolic pathways (106). This vertical transmission was probably followed by horizontal transfers of these homologs to other distantly related taxa after the GOE, as evidenced by the G + C content deviation, the frequent presence of flanking mobile genetic elements, and the sporadic and irregular distribution of the Fe(II) oxidation homologs in their genomes (Fig. 2 and 4).
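The G + C content deviation used above as an HGT indicator can be illustrated with a short Python sketch that compares the G + C fraction of a candidate region with that of its host genome. This is a simplified illustration under stated assumptions: the function names and toy sequences are hypothetical, no significance threshold is implied, and it is not the procedure used to generate the figures in this study.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)


def gc_deviation(region, genome):
    """G + C fraction of a region minus the G + C fraction of the whole genome."""
    return gc_content(region) - gc_content(genome)


# Hypothetical toy sequences: a GC-rich region inside a genome with balanced composition.
toy_genome = "ATGC" * 100
toy_region = "GGGCGCGCAT"
print(round(gc_deviation(toy_region, toy_genome), 2))
```

A region whose composition departs markedly from the genomic background is one common hint of recent acquisition, although compositional signals fade over time as transferred genes adapt to the host genome, so they are best read together with the other indicators listed above.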
Although these findings offer valuable insights into the topic, investigations on the evolution of microbial iron oxidation pathways remain open. Ongoing studies continue to contribute to our understanding and may offer alternative perspectives.

MATERIALS AND METHODS
We utilized FeGenie (43) with default parameters to identify candidate genes associated with iron oxidation in the genomes of reported iron oxidizers (see Supplementary Material at https://doi.org/10.6084/m9.figshare.23652387.v1). FeGenie (43) is a computational tool designed for identifying putative iron-related genes from genomic and metagenomic data sets. It utilizes a combination of gene identification and annotation algorithms, along with specific criteria related to iron-associated functions, to predict and retrieve genes potentially involved in iron metabolism and homeostasis. The representative sequences of known Fe(II) oxidation proteins were obtained from the FeGenie results as the initial query sequences for the following analyses. SSNs of the targeted gene families were calculated via the Enzyme Similarity Tool (EFI-EST) (107), with a BLAST query E-value of 1e-20 against the non-redundant UniRef_90 database (69) for retrieval of homologous sequences and an E-value of 1e-30 for the BLAST comparisons used to calculate the sequence similarities that define edge values. The alignment score threshold for generating the final SSN was set at the score corresponding to 35% sequence identity, which is also indicated in the illustrated SSN diagrams. The UniRef_90 database was built by clustering the original UniProt sequences at cutoffs of 90% sequence identity and 80% overlap with the longest sequence in the cluster (the seed sequence) (69). SSNs were visualized with the "Organic layout" in Cytoscape v. 3.7.1 (108). The Genome Neighborhood Tool (EFI-GNT) (107) was used to analyze the gene context in genomes. Sequences were aligned with MUSCLE (111) and trimmed with Gblocks (112) prior to tree construction. The phylogenetic tree based on protein sequences was built using PhyML (109) with the maximum likelihood (ML) method (1,000 bootstrap replicates), followed by visualization with iTOL (110).
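The edge-filtering logic behind an SSN can be sketched in a few lines of Python. The example below is a simplified stand-in for what EFI-EST does internally, not its actual implementation: it assumes pairwise hits are given as (query, subject, E-value, alignment score) tuples, keeps only edges that pass an E-value cutoff and an alignment score threshold, and groups the surviving nodes into clusters. The hit values, the score threshold of 40, and all function names are hypothetical.

```python
def build_ssn_edges(hits, max_evalue=1e-30, min_score=0.0):
    """Keep pairwise hits that pass both cutoffs; return a set of undirected edges."""
    edges = set()
    for query, subject, evalue, score in hits:
        if query != subject and evalue <= max_evalue and score >= min_score:
            edges.add(tuple(sorted((query, subject))))
    return edges


def ssn_clusters(nodes, edges):
    """Group nodes into connected components, i.e., the clusters of the SSN."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical toy hits: (query, subject, E-value, alignment score).
toy_hits = [
    ("A", "B", 1e-40, 60),
    ("B", "C", 1e-35, 45),
    ("C", "D", 1e-25, 30),  # fails the E-value cutoff
    ("D", "E", 1e-50, 20),  # fails the score threshold
    ("E", "F", 1e-45, 55),
]
toy_edges = build_ssn_edges(toy_hits, max_evalue=1e-30, min_score=40)
print(ssn_clusters("ABCDEF", toy_edges))
```

Raising the score threshold removes weaker edges and splits the network into more, tighter clusters, which is why the threshold chosen for the final SSN is reported alongside each diagram.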

Conclusions
Fe(II) has likely served as an energy substrate for microbial metabolism for billions of years. This study aims to provide a comprehensive understanding of the distribution and evolution of microbial Fe(II) oxidation pathways by integrating SSN analysis and protein structural comparisons. Examining the non-redundant database revealed a surprisingly broad range of taxa, including classes Gammaproteobacteria (2,084 hits) and Betaproteobacteria (1,684 hits), harboring homologs of the known Fe(II) oxidation proteins. Additionally, evidence of HGT was found for many Fe(II) oxidation proteins. Notably, classes Gammaproteobacteria and Betaproteobacteria often occupy the hub positions of the protein SSNs from which homologs of other taxa are derived. The RoseTTAFold predictions also provide insights into structurally uncharacterized Fe(II) oxidation components, such as FoxY and FoxZ. Many proteins involved in diverse Fe(II) oxidation pathways exhibit close structural homology, suggesting convergent evolution. Still, the current Fe(II) oxidation models are far from complete. With the increasing number of pure isolates, genomic/structural data, and biochemical validations available, it is expected that additional Fe(II) oxidation mechanisms and evolutionary details will be uncovered.

FIG 3 Protein SSNs of representative proteins encoded by the Fe(II) oxidation operons of Acidithiobacillus for the visualization of the distribution and evolution diagram of these proteins and their putative homologs: (a) Cyc2, (b) Cyc1, (c) CoxA, (d) CoxB, and (e) CycA. The total number of nodes/edges and alignment score threshold (corresponding to 35% sequence identity) for final SSN construction are shown. The sub-SSN showing the first neighbors of the query sequence is marked with a black dotted rectangle frame.

FIG 4 Genome context comparisons of Cyc572 from Leptospirillum spp. as a typical case of an Fe(II) oxidation gene with highly variable gene contexts and abundant flanking MGEs (e.g., transposase and phage integrase) as HGT indicators. The color of each gene is assigned based on domain classification (with the same domains showing the same color). Average AAIs of the genes within the operons are indicated in blue.

FIG 5 Protein SSNs of representative proteins encoded by the Fe(II) oxidation pioABC operon for the visualization of the distribution and evolution diagram of these proteins and their putative homologs: (a) PioA, (b) PioB, and (c) PioC. The sub-SSN showing the first neighbors of the query sequence is marked with a black dotted rectangle frame.

FIG 6 Predicted structures of proteins encoded by operon foxEYZ from Rhodoferax ferrooxidans. (a) The bihemic cytochrome c protein FoxE; (b) FoxY, a propeller-like super-barrel with the girdle of tryptophan residues in the "W" motifs involved in docking the β-sheets together (top) in comparison with cytochrome cd1 nitrite reductase (PDB 1QKS, bottom). The electrostatic potential surfaces are calculated with the Adaptive Poisson-Boltzmann Solver (APBS); (c) FoxZ with the drug/metabolite transporter (DMT) superfamily fold and comparisons among FoxZ, Starkeya novella YddG (PDB 5I20), and Streptococcus pneumoniae LicB (PDB 7B0K). The overall fold of the polypeptide chain is ramp-colored from red (N-terminal) to purple (C-terminal).

FIG 7 Distribution of Fe(II) oxidation proteins at phylum level assessed by querying the non-redundant UniRef_90 database (E-value 1e-20). Data on asteroid impacts, solar luminosity, and fluctuations in atmospheric oxygen and carbon dioxide levels provided by Timetree (http://www.timetree.org) are displayed synchronously with divergence times in the form of time panels. The estimated time of the GOE (~2,400 MYA) is marked with a blue dotted line.

TABLE 2 Structure commonality in iron oxidation proteins from various taxa
Sulfocyanin; Ferroplasma; primary electron acceptor from Fe(II)