Global biosynthetic analysis reveals the novelty, distribution, and diversity of archaeal SMs
Intensive studies of secondary metabolites (SMs) have built up extensive knowledge of natural product chemistry and biosynthetic machinery, which has greatly contributed to the development of genome mining tools such as antiSMASH23. To explore the biosynthetic potential of Archaea, we applied antiSMASH 6.0 to 7,157 de-replicated archaeal genomes obtained from the NCBI database, including RefSeq datasets and archaeal metagenome-assembled genomes (MAGs) from GenBank. In total, 2,790 genomes were found to harbor 5,496 biosynthetic gene clusters (BGCs), with the number of BGCs per genome ranging from 1 to 26. Although archaea harbored relatively few BGCs per genome compared with the better-studied bacterial domain, they encoded most of the known classes of secondary metabolites (Fig. 1a, b). Further global biosynthetic analysis of the archaeal domain revealed diverse BGCs, including 2,079 ribosomally synthesized and post-translationally modified peptides (RiPPs), 1,951 terpenes, 354 non-ribosomal peptides (NRPs), 282 polyketides (PKs), 178 siderophores, and 652 other metabolites. Specifically, archaea contained a relatively high abundance of RiPP and terpene BGCs, in contrast to the dominance of NRP and PK BGCs in bacteria revealed by a recent global analysis of ~190,000 bacterial genomes24.
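As a quick consistency check on the class counts reported above, the following minimal sketch sums the per-class BGC numbers and expresses each as a share of the total. The dictionary keys are illustrative labels, not identifiers from antiSMASH output.

```python
# Per-class BGC counts as reported in the text (labels are illustrative).
bgc_counts = {
    "RiPPs": 2079,
    "terpenes": 1951,
    "NRPs": 354,
    "PKs": 282,
    "siderophores": 178,
    "other": 652,
}

# Total number of BGCs across all classes.
total_bgcs = sum(bgc_counts.values())

# Share of each class as a percentage of all BGCs.
class_shares = {
    name: 100.0 * count / total_bgcs for name, count in bgc_counts.items()
}

for name, share in sorted(class_shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12}: {share:5.1f}%")
print(f"total BGCs: {total_bgcs}")
```

The class counts add up to the reported total of 5,496 BGCs, with RiPPs and terpenes together accounting for roughly three quarters of all clusters.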
To gain insight into the novelty and diversity of archaeal BGCs, we extracted BGC features using BiG-SLiCE25 and grouped them based on an all-to-all cosine distance among BGCs26. We identified 2,391 non-redundant gene cluster families (GCFs) and 92 gene cluster clans (GCCs), based on BGC distance cutoffs of 0.2 and 0.8, respectively (Fig. 1c), indicating the high diversity of archaeal BGCs. We further compared the 5,496 BGCs to the reference known BGCs described in the 'Minimum Information about a Biosynthetic Gene cluster' (MIBiG) repository27. Only 2.5% of the BGCs were related, and then only remotely, to characterized BGCs, leaving the vast majority completely unknown. This highlights the potential for discovering novel chemistry from the untapped archaeal domain. The high diversity and novelty of archaeal BGCs suggest that further genome sequencing of archaea will continue to uncover additional novel BGC families. Notably, archaea harbored a high diversity of RiPPs, which made up 55 of the 92 GCCs (Fig. 1c). In contrast, terpene BGCs, although comparably abundant, were relatively conserved: 98.3% (1,917/1,951) were grouped into a single GCC, even though they were widely distributed across the archaeal domain. In particular, the class Haloarchaea, an extremely halophilic branch of the archaeal domain, harbored a relatively high abundance of terpene BGCs (Fig. 1b, c and Supplementary Fig. 1). As archaeal terpenes have been proposed to regulate membrane permeability to enhance resistance to environmental stress28, 29, our findings suggest that terpene biosynthesis could be a universal strategy used by archaea to maintain membrane plasticity in response to environmental changes.
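The grouping step described above can be illustrated with a minimal sketch that computes cosine distances between feature vectors and merges BGCs whose distance falls at or below a cutoff. This is not the BiG-SLiCE algorithm: the single-linkage merging, the function names, and the three-dimensional toy feature vectors are all assumptions made for illustration. Only the cosine distance and the 0.2 and 0.8 cutoffs come from the text.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def group_by_cutoff(features, cutoff):
    """Single-linkage grouping: link any pair with distance <= cutoff."""
    names = list(features)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if cosine_distance(features[a], features[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy feature vectors (real BiG-SLiCE features are much larger).
toy_features = {
    "bgc_a": [1.0, 0.0, 0.0],
    "bgc_b": [0.9, 0.1, 0.0],
    "bgc_c": [0.5, 0.5, 0.0],
    "bgc_d": [0.0, 0.0, 1.0],
}

families = group_by_cutoff(toy_features, cutoff=0.2)  # GCF-like grouping
clans = group_by_cutoff(toy_features, cutoff=0.8)     # GCC-like grouping
print(families)
print(clans)
```

In this toy case the tighter cutoff keeps bgc_c apart from bgc_a and bgc_b, while the looser cutoff merges all three into one clan, mirroring how families nest within clans.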
We then mapped the phylogenomic distribution of archaeal biosynthetic potential to identify BGC-rich taxa (Fig. 1b and Supplementary Figs. 1-3). The distribution of BGC classes at the phylum or class level varied significantly across the archaeal domain. Four archaeal clades exhibited distinct biosynthetic potential patterns (Fig. 1b and Supplementary Fig. 4). Euryarchaeota, well known for their diversity in taxonomy, appearance, and metabolic potential1, 4, were the most BGC-rich phylum, harboring diverse RiPP and terpene BGCs. Specifically, Haloarchaea and Theionarchaea represented the most BGC-enriched archaeal taxa (Fig. 1b and Supplementary Figs. 2-3). The TACK clade, composed of four archaeal phyla, was relatively rich in RiPPs, while another clade, Asgard, was richer in polyketides than the other clades, as exemplified by the phylum Heimdallarchaeota. In summary, our global analysis of biosynthetic potential revealed that archaeal SMs, particularly RiPPs, are highly diverse and widely distributed throughout the archaeal domain.
Archaea harbor diverse uncharacterized RiPPs BGCs
The widespread and diverse RiPP BGCs were of particular interest, as they typically encode antibiotics, antifungals, and siderophores, which are envisioned to mediate social and competitive interactions within the microbial community30, 31. For instance, bacteria usually employ antimicrobial RiPPs as chemical weapons for defense and competition31. The manifold chemical space of genetically encoded RiPPs is determined by the highly variable sequences of precursor peptides and numerous post-translational modifications, leading to their high diversity and complexity. Thus, we sought to further investigate the novelty and diversity of archaeal RiPPs. We found that unclassified RiPP BGCs that may encode novel RiPPs, such as RiPP recognition element (RRE)-containing BGCs and RiPP-like BGCs, contributed about 45.3% of the detected RiPP BGCs (942/2,076), suggesting novel biosynthetic logic lurking in archaeal RiPP biosynthesis. In addition to unclassified RiPPs, archaea also harbored diverse lanthipeptides, lasso peptides, radical S-adenosylmethionine (rSAM)-dependent enzyme-modified RiPPs32 (i.e., ranthipeptides and sactipeptides), and RiPPs related to the thioamide-forming enzyme YcaO30 (i.e., thioamitides, LAPs, and thiopeptides) (Fig. 2a and Supplementary Figs. 5-6). Intriguingly, Theionarchaea of Euryarchaeota exhibited remarkable genetic potential for RiPPs, encoding up to 23 rSAM-related RiPP BGCs per MAG, as exemplified by Theionarchaea Yap2000 (Supplementary Fig. 5). Bacterial RiPP-related rSAMs often catalyze chemically challenging reactions that form thioether crosslinks between a donor Cys sulfur and the α, β, or γ carbon of an acceptor amino acid32, 33. Similar to their bacterial counterparts, the archaeal rSAM-related RiPP BGCs also contained genes encoding precursor peptides with conserved cysteine residues, rSAM enzymes, PqqD proteins, peptidases, and transporters.
Most of the rSAM enzymes encoded in RiPP BGCs from Yap2000 were phylogenetically related to the known rSAM enzymes of sactipeptides or ranthipeptides, sharing a typical SPASM domain (Supplementary Fig. 5). Notably, rSAM-related RiPP BGCs in Theionarchaea contained up to 12 rSAM genes in one cluster, an unusual feature for rSAM-related RiPPs (Fig. 2b). Numerous rSAM-related RiPP BGCs within a single MAG harbored multiple rSAM enzyme genes, which is rare in bacteria but ubiquitous in Theionarchaea (Fig. 2b and Supplementary Fig. 5). These findings indicate the unique biosynthetic machinery of archaeal RiPPs. However, without a cultivated isolate of Theionarchaea, the biosynthetic machinery of rSAM-related RiPPs remains to be explored.
Unlike rSAM-related RiPPs, which were found in all four clades of archaea, lanthipeptides were identified exclusively in the phylum Euryarchaeota, with a total of sixty-eight BGCs, of which sixty-three were classified as class II lanthipeptide BGCs from the class Haloarchaea (Supplementary Tables 1 and 2). Phylogenetic analysis of the hallmark class II lanthipeptide synthetase LanM showed that archaeal LanMs were closely related to LanMs from Proteobacteria (Supplementary Fig. 7), suggesting that they may share a common ancestor. Notably, archaeal LanMs clustered together and thus were more closely related to each other. Based on our phylogenetic analysis of archaeal LanM, the class II lanthipeptide BGCs were further divided into four groups: A-D (Fig. 2c and Supplementary Fig. 8). To explore the sequence space of lanthipeptide precursors, the small peptide genes adjacent to the lanM genes were retrieved using Prodigal-short34. In total, 63 candidate precursors were identified based on the presence of both Ser/Thr and Cys residues in their C-terminal core peptides, which are indispensable for forming the characteristic thioether crosslinks of lanthipeptides. Notably, none of these archaeal class II lanthipeptide precursors clustered with those of bacterial origin in a sequence similarity network (SSN) analysis (Fig. 2d). Each BGC group harbored diverse precursor peptides, indicating the chemical diversity of archaeal lanthipeptides (Fig. 2d and Supplementary Fig. 8). Additionally, most of the adjacent genes encoded proteins of unknown function. Altogether, our results highlight the potential for discovering novel chemistry from archaeal lanthipeptide BGCs.
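The precursor selection criterion above (both Ser/Thr and Cys present in the core peptide) can be expressed as a simple filter. The sketch below is an illustration only, not the pipeline used in this study: it assumes the core peptide region has already been delimited, and the function name and the two negative-control sequences are hypothetical. The three positive examples are the core peptides reported in this article for alnα, alnβ, and alnγ.

```python
def is_candidate_core(core_peptide):
    """True if the core has at least one Ser/Thr and at least one Cys.

    Ser/Thr are the residues that can be dehydrated, and Cys supplies the
    thiol for the thioether crosslink, so both are needed for (Me)Lan rings.
    """
    has_ser_or_thr = any(residue in "ST" for residue in core_peptide)
    has_cys = "C" in core_peptide
    return has_ser_or_thr and has_cys


core_peptides = {
    "aln_alpha": "GCGFTCSPFSSW",      # core peptide reported for alnα
    "aln_beta": "GLPSASMYSFEHCC",     # core peptide reported for alnβ
    "aln_gamma": "CSCYTRICCDIDS",     # core peptide reported for alnγ
    "no_cys_control": "GAGFTAAPFSSW",     # hypothetical: lacks Cys
    "no_ser_thr_control": "GCGFACAPFAAW",  # hypothetical: lacks Ser/Thr
}

candidates = [
    name for name, core in core_peptides.items() if is_candidate_core(core)
]
print(candidates)
```

A residue-presence filter of this kind is deliberately permissive. It only rules out peptides that cannot form a lanthionine ring at all, so downstream evidence such as genomic context and MS data is still needed to confirm a precursor.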
Linking the biosynthetic loci to metabolites
Our genome mining offered insights into the biosynthetic capacity of archaeal RiPPs. We next sought to chemically investigate archaeal RiPPs and study their roles in antagonistic interactions. We selected halophilic archaea from the class Haloarchaea for a fermentation-based discovery of SMs for three reasons: (i) Haloarchaea harbor diverse RiPP BGCs, especially lanthipeptide BGCs (Supplementary Fig. 9); (ii) Haloarchaea are well known for their antagonistic interactions in the halophilic niche, although the origin of this antagonism remains unknown11, 12, 13, 14, 15, 16; and (iii) Haloarchaea are an exceptionally well-suited model for the study of archaeal biology35. Six representative isolates from the halophilic niche were selected accordingly, cultured, and subjected to matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry (MS) analysis. Mass spectrometry showed that three strains (Haloferax larsenii JCM13917, Halorussus salinus YJ-37-H, and Halomicrobium mukohataei DSM12286) exhibited diverse typical peptide signals (MW > 800) (Supplementary Figs. 10-12), hinting at the secondary metabolic potential of Haloarchaea species. Further analysis of the metabolites produced by these strains was performed using high-resolution (HR) LC-MS and MS/MS-based molecular networking36 via the Global Natural Product Social (GNPS) platform37. Halorussus salinus YJ-37-H, isolated from a marine solar saltern38, was found to be a prolific producer, particularly rich in diverse peptidic metabolites (Fig. 3a and Supplementary Fig. 13).
Next, we attempted to link metabolites to their biosynthetic loci by mapping the observed HRMS signals to calculated m/z values of bioinformatically predicted core peptides with anticipated modifications. These modifications included dehydration, methylation, and dehydrogenation (e.g., disulfide crosslink). We found three clusters of peptidic HRMS signals that matched well with the core peptides of the anticipated lanthipeptides of three BGCs: alnα, alnβ, and alnγ (Fig. 3 and Supplementary Table 3). Additionally, the HR-MS/MS analysis of compound 1 (m/z: 621.7) revealed the same SW motif observed at the C-terminus of the core peptide (GCGFTCSPFSSW) encoded by alnα. Similarly, the HR-MS/MS analysis of compounds 2 (m/z: 755.3) and 3 (m/z: 729.3) revealed the same linear GLP motif found at the N-terminus of the alnβ core peptide (GLPSASMYSFEHCC) and the same linear DIDS motif observed at the C-terminus of the alnγ core peptide (CSCYTRICCDIDS), respectively (Fig. 3g, f and Supplementary Table 3). Altogether, we propose that compounds 1-3 and their corresponding analogues (1a, 2a-2d, and 3a) are RiPPs encoded by the class II lanthipeptide BGCs alnα, alnβ, and alnγ, respectively. Guided by the bioinformatically identified precursor peptide sequences and accompanying MS data, we linked these putative lanthipeptides to their biosynthetic loci with confidence.
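The mapping of HRMS signals to predicted core peptides rests on a simple mass calculation, sketched below under stated assumptions. The function name and the modification labels are illustrative. The residue, water, and proton masses are standard monoisotopic values. The modification counts for each compound (two dehydrations for 1; two dehydrations plus one methylation for 2; two dehydrations, one dehydrogenation, and one methylation for 3) are those assigned in the structure elucidation of archalans α-γ in this article.

```python
# Standard monoisotopic residue masses (Da) for the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added once for the free N- and C-termini
PROTON = 1.007276   # mass of a proton (charge carrier)

# Mass shifts (Da) of the anticipated modifications.
MOD_SHIFT = {
    "dehydration": -18.010565,     # -H2O
    "methylation": 14.015650,      # +CH2
    "dehydrogenation": -2.015650,  # -H2, e.g. disulfide formation
}


def predicted_mz(core_peptide, mod_counts, charge=2):
    """Predicted m/z of [M + charge*H]^(charge+) for a modified core peptide."""
    neutral = sum(RESIDUE_MASS[residue] for residue in core_peptide) + WATER
    neutral += sum(MOD_SHIFT[mod] * n for mod, n in mod_counts.items())
    return (neutral + charge * PROTON) / charge


mz_1 = predicted_mz("GCGFTCSPFSSW", {"dehydration": 2})
mz_2 = predicted_mz("GLPSASMYSFEHCC", {"dehydration": 2, "methylation": 1})
mz_3 = predicted_mz(
    "CSCYTRICCDIDS",
    {"dehydration": 2, "dehydrogenation": 1, "methylation": 1},
)
print(f"{mz_1:.4f} {mz_2:.4f} {mz_3:.4f}")
```

Under these assumptions, the predicted doubly charged ions fall within a few ppm of the observed signals at m/z 621.7, 755.3, and 729.3, which is the kind of agreement used to tie each signal cluster to its BGC.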
Discovery of three new lanthipeptides from archaea: archalan α-γ
We next sought to elucidate the structures of compounds 1-3 by MS/MS and NMR analysis. The mass signal at m/z 621.7382 [M+2H]2+, indicative of the molecular formula C57H71N13O15S2 (Δ +1.29 ppm) for compound 1, matched the predicted core peptide of BGC alnα with two dehydration modifications, which was further supported by MS/MS fragmentation analysis. The MS/MS fragmentation pattern matched well with the core peptide, except that Cys2 and Cys6 appeared as Dha with a mass loss of 34 Da, while the Thr5 and Ser10 residues were observed with a mass increase of 16 Da, due to the breaking of the Cγ-S bond in Cys residues during fragmentation (Fig. 3d and Supplementary Fig. 14). These results suggested the installation of the thioether-bridged amino acids lanthionine (Lan), crosslinking Cys6 and Ser10, and methyllanthionine (MeLan), crosslinking Cys2 and Thr5 (Fig. 3d, e). To fully characterize the structure of 1, we purified 2.0 mg of 1 from a 20 L culture and elucidated its structure using extensive NMR analysis (Fig. 3e, f, Supplementary Figs. 15-17 and Supplementary Table 4). HMBC correlations found within the MeLan and Lan subunits further supported the installation of methyllanthionine and lanthionine motifs crosslinking Cys2 and Thr5, and Cys6 and Ser10, respectively (Fig. 3f). The planar structure of 1 confirmed the dehydration of the genetically encoded Thr5 and Ser10 and the subsequent addition of Cys2 and Cys6 to the transient Dhb5 and Dha10, respectively. To determine the absolute configurations of all amino acid residues, we conducted advanced Marfey's analysis39 for 1 using L/D-FDLA (1-fluoro-2,4-dinitrophenyl-5-L-leucinamide) (Supplementary Table 5). The results showed that all unmodified amino acids had the L-configuration.
For the two thioether rings, "D before L" was observed for the FDLA derivatives of MeLan and "L before D" was observed for the FDLA derivatives of Lan in 1, which was consistent with the previously reported Marfey's analysis of Lan (LL) and MeLan (DL)40, 41. Taken together, we assigned a DL absolute configuration for MeLan and an LL configuration for Lan. To the best of our knowledge, this newly identified class II lanthipeptide, named archalan α (1), is the first lanthipeptide identified from archaea.
Because of the limited amounts of 2 and 3, we next resorted to a combination of chemical derivatization (e.g., DTT reaction and desulfurization by NiCl2 and NaBH4/NaBD4) and MSn analysis to infer the planar structures of 2, encoded by BGC alnβ (Supplementary Figs. 18-22), and 3, encoded by BGC alnγ (Supplementary Figs. 23-28). Compound 2 gave a prominent doubly charged [M+2H]2+ peak (m/z 755.3012) by HRMS for C66H92N16O19S3 (Δ +0.53 ppm). Compared to the unmodified core peptide, the observed m/z of 2 (m/z 755.3012 [M+2H]2+) showed a mass loss of 22.0064 Da, matching well with two dehydrations (-2H2O, -36.0211 Da) and one methylation (+CH2, +14.0157 Da) at the N-terminus of Gly1 (Fig. 3b), which were further supported by the tandem MS analysis of 2 (Supplementary Fig. 19). The assignment of methylation at the N-terminus of Gly1 was also supported by the presence of a methyltransferase gene in BGC alnβ. The two dehydrations due to the formation of C-S crosslinks were further supported by a tandem MS analysis of the full reduction-desulfurization product (Supplementary Fig. 20). To further identify its ring topology, 2 was partially reduced and desulfurized following an established protocol40, generating two different partially desulfurized products (Supplementary Figs. 21 and 22). According to MS/MS analysis, the first one showed an intact Ser6/Cys14 crosslink and a deuterated Ala reduced from Ser4, and the other showed an intact Ser4/Cys13 crosslink and a deuterated Ala reduced from Cys14. Together, 2 was identified as a class II lanthipeptide, namely archalan β, containing one methylation at the N-terminus of Gly1 and two C-S crosslinks with intertwined topology (Fig. 3g). Compound 3 gave a prominent doubly charged [M+2H]2+ peak (m/z 729.2666) by HRMS for C58H88N16O20S4 (Δ +3.98 ppm).
Compared to the predicted core peptide, the m/z of 3 showed a mass loss of 24.0826 Da, matching two dehydrations (-2H2O, -36.0211 Da), one dehydrogenation (-H2, -2.0157 Da), and one methylation (+CH2, +14.0157 Da) (Fig. 3b). Guided by the core peptide sequence, we carried out DTT reduction, reduction-desulfurization, and subsequent LC-MS/MS analysis to identify one N-terminal methylation, one S-S crosslink, and two C-S crosslinks in 3 (Supplementary Figs. 23-28). Altogether, 3 was identified as a class II lanthipeptide, namely archalan γ, which contained a Cys1-Cys8 disulfide crosslink and two lanthionine crosslinks, between Ser2 and Cys3 and between Thr5 and Cys9. The structure elucidation is described in Supplementary Fig. 23.
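The ppm deviations quoted for the three molecular formulae can be approximated from the formula alone. The following sketch computes the monoisotopic mass of each formula, derives the theoretical [M+2H]2+ m/z, and reports the deviation from the observed value. The helper names are hypothetical, and the sign convention (calculated minus observed) is an assumption chosen because it yields positive values like those reported. Small differences in the last digit relative to the published values are expected, since the exact atomic masses and rounding used by the instrument software are not stated.

```python
import re

# Standard monoisotopic atomic masses (Da).
ATOM_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
PROTON = 1.007276  # mass of a proton (charge carrier)


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C57H71N13O15S2'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(ATOM_MASS[element] * int(count or 1) for element, count in tokens)


def ppm_error(observed_mz, formula, charge=2):
    """Deviation in ppm, defined here as (calculated - observed) / calculated."""
    calculated_mz = (formula_mass(formula) + charge * PROTON) / charge
    return (calculated_mz - observed_mz) / calculated_mz * 1e6


# Observed [M+2H]2+ values and formulae as reported in the text.
reported = {
    "1": (621.7382, "C57H71N13O15S2"),
    "2": (755.3012, "C66H92N16O19S3"),
    "3": (729.2666, "C58H88N16O20S4"),
}
ppm_by_compound = {
    name: ppm_error(mz, formula) for name, (mz, formula) in reported.items()
}
for name, ppm in ppm_by_compound.items():
    print(f"compound {name}: {ppm:+.2f} ppm")
```

All three deviations stay below 5 ppm under these assumptions, which is in line with the values quoted for compounds 1-3.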
With the chemical structures of the archalans in hand, we bioinformatically analyzed the chemical features and diversity of archaeal lanthipeptides based on different characteristics, such as the diversity of precursor peptides, the number of putative dehydrations, and the number, size, and topology of the rings. Sequence analysis of the 65 lanthipeptide precursor peptides from the class Haloarchaea revealed three features that distinguish them from bacterial lanthipeptides. Firstly, the precursors and the core peptides are significantly shorter than their known bacterial counterparts (Supplementary Fig. 29). Secondly, their leader peptide regions were highly diverse, with conserved KxxYDxxF motifs in groups A and B, or KxxFDxxF in group C (Supplementary Fig. 8). Thirdly, their core regions were also highly diverse and distinct from their bacterial counterparts, exhibiting different topologies compared to other known lanthipeptides. In particular, the ring systems of lanthipeptides, even those from the same host, were diverse, ranging from simple non-overlapping 'bicycle' rings, exemplified by 1, to highly complex, intertwined topologies, exemplified by 2 and 3 (Supplementary Figs. 30-31). In brief, archaeal lanthipeptides were short in peptide length but highly diverse in the amino acid residues of the core peptide. Additionally, the ability to form diverse lanthionine rings, even within the same producing host, further expanded the structural diversity of archaeal lanthipeptides.
Archalan exhibits anti-archaeal activity against closely related haloarchaea
We next sought to investigate whether these archaeal lanthipeptides exert biological activities similar to those of their bacterial counterparts and whether they contribute to archaeal antagonistic interactions. We tested purified archalan α (1) in two assays: an anti-archaeal assay against phylogenetically related Haloarchaea and an antibacterial assay against selected bacteria. No significant activity was observed in the antibacterial assay, with inhibition of less than 30% at 100 μg mL-1 (Fig. 4a). In contrast, archalan α showed significant inhibitory activity against several phylogenetically closely related Haloarchaea species. In particular, 1 was potent against the extremely halophilic isolates H. argentinensis and H. larsenii, with IC50 values of 32-37 μg mL-1 (Fig. 4b). To the best of our knowledge, archalan α represents the first class of anti-archaeal lantibiotics identified. As a product of haloarchaea isolated from halophilic environments dominated by archaea42, archalan α exhibited narrow-spectrum anti-archaeal activity against phylogenetically related halophilic archaea (Supplementary Table 6). These results suggest that archalans may play a role in antagonistic interactions in the halophilic niche. Although the ecological function of these compounds is not yet fully understood, our genomics-guided discovery of an anti-archaeal lanthipeptide paves the way for the discovery of more archaeal SMs and for the investigation of their function in microbial interactions.