Genome mining strategies for metallophore discovery

Many bacteria use small-molecule chelators called metallophores to acquire trace metals from their environment. These molecules play a central role in interactions between bacteria, plants, and animals. Hence, knowing their full diversity is key to combatting infectious diseases as well as harnessing beneficial microbial communities. Metallophore discovery has been streamlined by advances in genome mining, where genomes are scanned for genes involved in metallophore biosynthesis. This review highlights recent trends and advances in predicting the presence and structure of metallophores based solely on genomic information. Recent work suggests new families of metallophores remain hidden from current homology-based approaches. Their discovery will require new genome mining approaches that move beyond biosynthesis to consider metallophore transporters, regulation, and evolution.


Introduction
Microbes are often in competition for a limited pool of trace metals. In response to metal scarcity, many bacteria produce metallophores, low-molecular-weight organic compounds that bind metal ions with high affinity and selectivity (Figure 1a) [1]. The metal-metallophore complex then enters the cell by active transport, and the metal is released for use in metalloenzymes. The most diverse and well-studied metallophores are the iron(III)-binding siderophores [2], with hundreds of unique structures characterized to date. Siderophores have been found to shape microbial interactions with the environment, other microbes, and multicellular life. Pathogens rely on siderophores to steal iron from their hosts [2], while beneficial microbial siderophores in the rhizosphere encourage plant growth and defend against pathogens [3,4]. Lying at the interface of chemistry and biology, siderophore-based technologies are used in medicine, agriculture, biosensing, and bioremediation [5]. A number of other metallophore classes have been reported, including chalkophores (Cu), zincophores (Zn), molybdophores (Mo), nickelophores (Ni), and lanthanophores (lanthanides) [1,6]. Although siderophores are the best-studied metallophores by a large margin, other metallophores play equally crucial roles in diverse natural environments and the human host [1,7,8]. The chemistry and biology of a metallophore are often highly specific [1,4,9], and thus biotechnological applications require an understanding of natural metallophore systems [5].
The discovery and characterization of new metallophores have been accelerated by genome mining, where genomes are scanned for gene families of interest. Genes encoding metallophore biosynthesis, transport, and utilization are generally colocalized on the genome, forming biosynthetic gene clusters (BGCs, Figure 1b). The presence of a putative BGC not only provides evidence that a metallophore is being produced, but also can be used to predict the chemical structure of the molecule and dereplicate it against known compounds. Existing genome mining tools are well suited for finding variations of known metallophores; however, they do not facilitate straightforward identification of entirely novel families in silico. To date, the first example of each metallophore class was discovered in the wet lab. Understudied and unculturable taxa may produce novel metallophores with important natural roles and useful applications, but without a technique for genomic discovery, progress will be slow. This review first highlights recent studies that use current genome mining strategies to find novel metallophore BGCs and predict the resulting chemical structure. We focus on bacterial metallophores and direct interested readers to a recently published chapter on fungal siderophore bioinformatics [10]. We also look toward the future of metallophore discovery and discuss strategies for de novo detection of metallophore families that are invisible to current techniques.

Genome mining for metallophore biosynthetic pathways
Metallophore genome mining generally involves searching for homologs of genes known to encode metallophore biosynthesis. The majority of known siderophores (and some other metallophores) are synthesized by one of two widespread pathways: nonribosomal peptide synthetases (NRPSs) and NRPS-independent siderophore (NIS) synthetases, though many siderophore and metallophore pathways belong to neither [2]. NRPSs are large, multidomain enzymes that also assemble many other classes of peptidic specialized metabolites. Metallophore NRPSs are often distinguished from other NRPSs based on the presence of accessory genes in the associated BGC that code for the biosynthesis of the metal-chelating moieties. The NIS synthetase is putatively siderophore-specific, although an NIS-like lanthanophore was recently proposed [6].
Several platforms have been developed for the automated detection of BGCs in a genome; two of the most popular are antiSMASH and PRISM [11,12]. Both are general-purpose, rule-based tools that scan genomes with profile hidden Markov models (pHMMs) to identify (combinations of) enzyme-coding genes that are signatures for certain classes of BGCs. As of antiSMASH 6.0 and PRISM 4, both are quite limited in metallophore prediction. AntiSMASH and PRISM can detect NIS synthetases, but neither tool can separate NRPS metallophore clusters from other NRPS clusters, and the smaller families of metallophore BGCs are not detected at all. Efforts to improve antiSMASH metallophore prediction are currently underway. The FeGenie tool detects a variety of iron metabolism pathways, including siderophore synthesis [13]; however, in our experience, the biosynthetic pHMMs produce many false positives. Thus, a manual inspection is still generally required to accurately detect a metallophore cluster.
Figure 1. Metallophores and current genome-mining techniques. (a) Metallophore-mediated metal acquisition. Left: Metallophore-biosynthesis genes are expressed when the intracellular metal concentration drops, as sensed by a metal-binding metalloregulator. Middle: Metallophores are exported into the environment, where they can encounter metal ions and chelate them with high affinity. Coelichelin from Streptomyces coelicolor A3(2) is a typical peptidic siderophore. Right: Metallophore complexes are recognized and transported into the cell by membrane proteins, and the metal is released for metabolic use. (b) A representative metallophore BGC, containing genes for siderophore biosynthesis and transport. (c) Multiple BGCs can be organized and dereplicated using sequence-similarity networks. (d) Phylogenetic analysis of biosynthetic enzymes, combined with structural information from known products, can reveal new biosynthetic traits.
Genome mining can generate lists of thousands of putative BGCs; however, many of them will be nearly identical, and many produce known compounds. BGCs can be dereplicated and prioritized for further study by organizing them into gene cluster families (Figure 1c). The BiG-SCAPE software [14] performs whole-BGC comparisons and constructs networks where each BGC is represented by a node. A strict similarity cutoff can be used to differentiate nearly identical BGCs: a BiG-SCAPE analysis of nocobactin-like BGCs in Nocardia revealed 11 distinct subfamilies and identified several novel compounds [15]. Alternatively, a relaxed cutoff can be used to identify metallophore BGCs among a broader network containing other classes of natural products. The photoxenobactins were discovered following a pan-analysis of Xenorhabdus and Photorhabdus; the novel BGC family had only slight similarity to known siderophores [16]. These and other analyses benefit from a database of known BGCs for comparison. The Minimum Information about a Biosynthetic Gene Cluster (MIBiG) repository is currently the most comprehensive public database of BGCs with known products [17]. Genomic data from MIBiG have been integrated into BiG-SCAPE [14], antiSMASH's KnownClusterBlast [11], and custom siderophore genomics workflows [18,19]. Unfortunately, the current version of MIBiG (2.0) contains only 40 bacterial metallophore BGCs, a small portion of those described in the literature.
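The effect of a strict versus a relaxed similarity cutoff can be illustrated with a minimal sketch. It assumes each BGC is reduced to a set of protein-domain labels and that similarity is a plain Jaccard index; BiG-SCAPE itself uses a more elaborate distance metric, so this is not its algorithm. The BGC names, domain labels, and cutoff values are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets of protein-domain labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_families(bgcs, cutoff):
    """Group BGCs into families: the connected components of a network
    whose edges join every pair of BGCs with similarity >= cutoff."""
    parent = {name: name for name in bgcs}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(bgcs, 2):
        if jaccard(bgcs[x], bgcs[y]) >= cutoff:
            parent[find(x)] = find(y)

    families = {}
    for name in bgcs:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical BGCs described by made-up domain labels (assumption).
toy_bgcs = {
    "bgc_A": {"C", "A", "T", "TE", "hydroxylase", "TonB_receptor"},
    "bgc_B": {"C", "A", "T", "TE", "hydroxylase", "ABC_permease"},
    "bgc_C": {"C", "A", "T", "TE", "methyltransferase"},
    "bgc_D": {"IucA_IucC", "decarboxylase", "monooxygenase"},
}

strict = cluster_families(toy_bgcs, cutoff=0.7)   # separates close variants
relaxed = cluster_families(toy_bgcs, cutoff=0.4)  # merges distant relatives
print(strict)
print(relaxed)
```

With the strict cutoff, only the two nearly identical toy clusters share a family; the relaxed cutoff also pulls in the more distant NRPS-like cluster, while the unrelated cluster stays separate at both settings.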

Biosynthetic genes in new contexts
Genes from known metallophore pathways can be used as handles to search genome databases for homologous BGCs and reveal new biosynthetic diversity. For example, three novel biscatechol siderophores were found by scanning Acinetobacter proteomes for homologs of the vibriobactin condensation domain VibH using phmmer [20,21]. The ethylenediaminesuccinic acid hydroxyarginine siderophore cluster was found using Multi-GeneBlast, which allowed for an entire operon to be used as a BLAST query [22,23]. Comprehensively mapping the sequence diversity of an enzyme family can give a more complete picture of the associated biosynthetic space. Such a map can serve as a roadmap for future studies by identifying new gene cluster families involving unprecedented combinations of enzyme-coding genes, which may or may not be detected by current genome mining tools. An exhaustive 2013 study of methanobactin BGCs defined five families based on operon content and phylogeny [24]; today, methanobactins are the best-studied non-iron metallophores, and a recent genome mining study perfectly predicted the structure of a novel methanobactin [25]. The painstaking contextualization of gene families has been semiautomated using the Enzyme Function Initiative's Enzyme Similarity Tool (EFI-EST) and Genome Neighborhood Tool (EFI-GNT) for sequence similarity networking of protein-coding genes and their surrounding genomic loci, respectively [26]. Soon after the novel chelating amino acid graminine was reported [3], the biosynthesis gene grbD was used as a query for EFI-EST/EFI-GNT, guiding the isolation of three additional graminine-containing siderophores [27].

Advances in metallophore structural predictions
Despite recent advances in genome mining, perfectly predicting metallophore structures from their BGCs remains difficult. Predictive power can be increased by splitting an enzyme family into phylogenetic clades, each with distinct reactivity (Figure 1d). This strategy is well developed among NRPS domains and NIS synthetases [30-32]. A genomic analysis of NRPS siderophore aspartyl β-hydroxylases delineated two distinct subtypes, allowing for the position and stereochemistry of β-hydroxyaspartate residues to be predicted [33]. The phylogeny revealed a mismatch between the cupriachelin genomic prediction and reported structure, leading to stereochemical reassignment upon reisolation. An independent study found the same phylogenetic division [18]; however, enzymatic studies are still lacking.
Improved understanding of metallophore biosynthesis has allowed researchers to hypothesize the existence of a 'missing' metallophore chemical structure, envision the biosynthesis, and scan genomes to find a producing strain. Bioinformatic and enzymatic studies of three opine-like zincophores revealed two binary sources of structural diversity [34]. One combination had not been observed; targeted genome mining enabled the discovery of the fourth structural variant, bacillopaline. Similarly, BGCs were proposed for the hypothetical L-diastereomers of the related cyclic siderophores trichrysobactin and trivanchrobactin, which contain D-Lys and D-Arg, respectively; genomes were scanned for the corresponding requisite genes, leading to the isolation of frederiksenibactin and ruckerbactin [35,36]. These studies are not merely for the sake of completion, but also provide natural systems to study the impact of slight structural changes on metallophore chemistry and biology.

Moving beyond biosynthesis-based metallophore biosynthetic gene cluster detection
Each of the genome mining studies highlighted above relied on homology to known metallophore biosynthesis pathways; however, the continued discovery of new pathways [3,37,38] suggests more pathways remain hidden to current genomic techniques. Metallophores have two key characteristics besides metal chelation: metallophore biosynthesis is repressed by the chelated metal, and the metal-metallophore complex is actively transported into the cell [1]. The genomic markers for these two traits are far more universal among known metallophores than any single biosynthetic pathway, and a few recent studies show that it is possible to use them to detect metallophore BGCs, perhaps forming the core of future pathway-agnostic metallophore detection algorithms.

Regulation
Bacterial metallophore production is generally controlled at the transcriptional level. Under metal-replete conditions, global regulators block transcription by binding to DNA recognition sites upstream of metal acquisition genes (Figures 1a and 2a) [39,40]. Spohn et al. discovered a metallophore BGC undetectable by antiSMASH using a strategy called Identification of Natural compound Biosynthesis pathways by Exploiting Knowledge of Transcriptional regulation (INBEKT) [41]. Amycolatopsis japonicum produces the orphan zincophore ethylenediamine disuccinate (EDDS). Zinc-dependent regulator binding sites were identified in the genome based on motifs from other Actinobacteria. A four-gene zinc-regulated operon was identified, and biochemical studies confirmed that the cluster is responsible for EDDS production. In this case, cluster identification was aided by a known metallophore structure. However, Zur regulons characterized to date only have 10-30 genes [42], so the INBEKT workflow could significantly narrow the search for new zincophore BGCs with no structural information. The approach is limited to cases where a regulator binding site motif can be identified, and may miss metallophore biosyntheses controlled by intermediate pathway-specific regulators or post-transcriptional regulation [39]. A broadly applicable tool would likely require equally comprehensive data on metalloregulator binding sites, which will be easier to obtain for some bacterial taxa compared with others.
Figure 2. Approaches for metallophore genome mining that are not reliant on known biosynthetic pathways. (a) A list of known metalloregulator binding sites can be used to construct a conserved binding site motif; scanning a genome for the motif can reveal genes that respond to low-metal conditions [41]. (b) Metallophore-specific transporter families, rarely found in other classes of BGCs, can predict metallophore activity [19]. (c) Metallophore 'cheaters' frequently arise that lose biosynthesis genes while retaining the ability to use foreign metallophores for metal acquisition. Complex patterns of metallophore gene transfers and deletions can be seen in species phylogenies. The sudden deletion of a metallophore pathway may indicate that a different metallophore is being produced [45]. (d) A proposed de novo metallophore genome mining workflow. Genetic loci are successively filtered to produce a small set of potential metallophore BGCs for experimental validation.
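The motif-scanning idea behind a regulation-based search can be sketched as follows. This is a generic position weight matrix scan, not the INBEKT implementation: it assumes a uniform base composition, scans a single strand only, and uses made-up binding sites, a made-up upstream sequence, and an arbitrary score threshold.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix from aligned, equal-length sites,
    assuming a uniform background frequency of 0.25 per base."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, window, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned regulator binding sites (made up for illustration).
known_sites = ["TGATAATG", "TGAAAATG", "TGATAACG", "TGATTATG"]
pwm = build_pwm(known_sites)

# Hypothetical upstream region with one planted site at position 6.
upstream = "CCGGCC" + "TGATAATG" + "GGCCGC"
hits = scan(upstream, pwm, threshold=8.0)
for position, window, score in hits:
    print(position, window, round(score, 2))
```

In practice the threshold would need calibration against shuffled or non-regulated sequence, and both strands would be scanned; genes downstream of high-scoring windows would then be candidates for metal-responsive regulation.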

Transport
Nearly every report of a new siderophore BGC includes an analysis of genes encoding ferric-siderophore import. Banfield and colleagues expanded this approach with a comprehensive study of transporter genes in characterized BGCs in MIBiG [19]. Genes encoding TonB-dependent receptors and two ABC transporter components were found to be highly specific to siderophore BGCs (Figure 2b). Similarly, in a recent preprint, TonB-dependent receptors were used to identify siderophore-like NRPS clusters in weathered-granite-associated metagenomes [43]. Several were colocalized with lanthanide-dependent XoxF3 systems, suggesting they may encode new lanthanophores. This exciting approach currently has several caveats. Siderophores imported by other pathways and/or by transporters located elsewhere in the genome cannot be detected. Several false positives were also found; phylogeny-based dissection of the transporter families into siderophore-specific subfamilies may improve their predictive potential. Such custom-built siderophore transporter pHMMs are used in FeGenie [13] and in the unpublished tool SideroScanner, which detects iron-regulated outer membrane receptors in pathogens (TD Stanton, URL: https://github.com/tomdstanton/sideroscanner). Neither tool focuses on novel siderophores, even though their pHMM libraries may prove useful for that purpose. Perfectly accurate pHMMs may not be feasible if metallophore transport is generally para- or polyphyletic, as was observed among actinobacterial siderophore receptors [44]. Additionally, the technique was only tested on antiSMASH-detectable clusters [19]. Genome-wide scans for siderophore transporter genes would also find a number of loci with no biosynthetic genes, as many bacteria have transporters for xeno-metallophores that they cannot produce themselves [9,45].

An evolving approach toward holistic metallophore detection
Metallophore families often have complicated evolutionary histories, and a de novo metallophore detection algorithm might use comparative and pan-genomic approaches to identify novel BGCs with similar evolutionary patterns. A comparative analysis of Salinispora revealed three clades that lost the genus' ancestral desferrioxamine BGC and became 'cheaters' that retained only the transporters (Figure 2c) [45]. Surprisingly, these strains each contained a replacement siderophore BGC for the novel salinichelins. Thus, strains that lost a known siderophore pathway may be prime targets for finding novel BGCs (Figure 2c). Existing metallophore families are generally scattered across their taxonomic range: for example, opine-like metallophores possibly predate the division of bacterial phyla, but are now quite rare [29], while graminine genes are constrained to just Burkholderiaceae and are likewise uncommon among the family [27]. This pattern seems near-universal among metallophores, and therefore constitutively present gene clusters can likely be eliminated. The most challenging aspect of a de novo metallophore detection algorithm will likely be the identification of the novel biosynthetic genes themselves. Machine-learning approaches for BGC detection are improving but still have a high false-positive rate [46]. A phylogeny-aware approach such as EvoMining [47] may find genes that have diverged from primary metabolism for new biosynthetic functions. Each of these strategies in isolation would likely produce many false positives; however, successive filters might leave just a small number of highly promising potential metallophore BGCs (Figure 2d). For example, one might thus look for genes encoding siderophore-associated transporter families that are also colocalized with (any type of) biosynthetic genes as well as metal-associated cis-regulatory elements, and then use sequence similarity networking to dereplicate and prioritize the resulting hits to yield a set of high-potential candidate gene clusters for experimental characterization of likely new metallophore biosynthetic pathways.
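The successive-filter idea can be made concrete with a small sketch. It assumes each genomic locus has already been annotated with the relevant evidence by upstream tools; the `Locus` fields, the filter order, and the 0.9 conservation threshold are illustrative assumptions rather than parts of any published workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    has_transporter: bool        # siderophore-associated transporter family present
    has_biosynthesis: bool       # any type of biosynthetic gene colocalized
    has_metal_motif: bool        # metal-associated cis-regulatory element upstream
    fraction_of_genomes: float   # share of related genomes carrying the locus


# Each filter is a (label, predicate) pair; the 0.9 threshold is hypothetical.
FILTERS = [
    ("transporter", lambda locus: locus.has_transporter),
    ("biosynthesis", lambda locus: locus.has_biosynthesis),
    ("metal motif", lambda locus: locus.has_metal_motif),
    ("patchy distribution", lambda locus: locus.fraction_of_genomes < 0.9),
]


def successive_filter(loci, filters=FILTERS):
    """Apply filters in order; return survivors and the count after each step."""
    remaining = list(loci)
    counts = [("input", len(remaining))]
    for label, keep in filters:
        remaining = [locus for locus in remaining if keep(locus)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Hypothetical loci, each failing a different filter except the first.
toy_loci = [
    Locus("locus_1", True, True, True, 0.2),
    Locus("locus_2", True, False, True, 0.1),   # transporter-only 'cheater' locus
    Locus("locus_3", False, True, True, 0.3),
    Locus("locus_4", True, True, False, 0.2),
    Locus("locus_5", True, True, True, 1.0),    # constitutively present
]

candidates, counts = successive_filter(toy_loci)
for label, n in counts:
    print(f"{label}: {n}")
print([locus.name for locus in candidates])
```

Reporting the count after every step shows which filter removes the most loci, which helps when tuning thresholds. The surviving candidates would then be dereplicated by sequence similarity networking before experimental work.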

Conclusions
Metallophore genome mining is built on decades of chemical and biological studies that have connected scores of metallophores to their biosyntheses. In return, genome mining can aid the natural product chemist by predicting the presence and structure of novel metallophores made by homologous BGCs. Comparative genomics of metallophore BGCs can prevent undesired reisolation of known compounds, reveal taxa with untapped structural diversity, and provide new insights into metallophore biosynthesis and evolution (Figure 1). We expect that such comprehensive, large-scale analyses will also be required to answer one of the biggest outstanding questions in metallophore research: when and how they evolved. Unfortunately, large-scale analyses are hampered by a lack of automated techniques for metallophore prediction. User-friendly tools such as antiSMASH or PRISM cannot detect the majority of metallophores, and thus accurate structural prediction and dereplication are often constrained to manual curation by experts in natural product biosynthesis. Current genome mining techniques are also limited to experimentally characterized metallophore families due to a reliance on known biosynthetic pathways, yet novel classes of compounds surely remain undiscovered. Hundreds of known metallophores have diverse biosyntheses and structures, but they are united by their biological function in metal acquisition. De novo discovery of metallophore BGCs will require a holistic approach that extends beyond biosynthetic genes. Transporter genes, metalloregulator binding sites, horizontal gene transfer, and other genomic markers of metallophore activity can all be combined to highlight the most promising BGCs for experimental characterization (Figure 2). In the meantime, genome mining will continue to streamline the discovery of new metallophores and lay the foundation for understanding and harnessing microbial competition for trace metals.
