Heterologous Expression of the Unusual Terreazepine Biosynthetic Gene Cluster Reveals a Promising Approach for Identifying New Chemical Scaffolds

Here, we provide evidence that Aspergillus terreus encodes a biosynthetic gene cluster containing a repurposed indoleamine 2,3-dioxygenase (IDO) dedicated to secondary metabolite synthesis. The discovery of this neofunctionalized IDO not only enabled discovery of a new compound with an unusual chemical scaffold but also provided insight into the numerous strategies fungi employ for diversifying and protecting themselves against secondary metabolites. The observations in this study set the stage for further in-depth studies into the function of duplicated IDOs present in fungal biosynthetic gene clusters and presents a strategy for accessing the biosynthetic potential of gene clusters containing duplicated primary metabolic genes.

F ungal natural products (secondary metabolites) are an invaluable source of inspiration for pharmaceuticals that act against myriad conditions, including infectious diseases, cancer, and hyperlipidemia (1)(2)(3)(4). Indeed, the antibiotics penicillin and cephalosporin, the cholesterol-lowering lovastatin, and the immunosuppressant cyclosporine are derived from fungi (5,6), and the reservoir of novel scaffolds continues to grow each year (7). Although numerous drugs derived from fungi exist on the market today, genome sequencing has revealed that fungi possess the biosynthetic capacity to produce a far greater number of secondary metabolites than currently accessed (8). Recent studies spanning nearly 600 fungal genomes suggest that a mere 3% of molecules encoded by fungal biosynthetic gene clusters (BGCs) have been explored (8).
To access this biosynthetic potential, an innovative discovery pipeline was recently developed to systematically annotate the biosynthetic abilities of fungi using comparative metabolomics and heterologous gene expression (9)(10)(11)(12). With this platform, fungal genomic DNA fragments containing intact BGCs are inserted into fungal artificial chromosomes (FACs) and are used to transform a fungal host to discover new chemical scaffolds (10)(11)(12). This pipeline uses a metabolite scoring system to identify heterologously expressed metabolites from the thousands of signals originating from the host. By enabling facile linkage between secondary metabolites and their corresponding BGCs, the FAC-metabolite scoring pipeline facilitates prioritization of target compounds most likely to contain novel scaffolds. Using structural clues provided by BGC data, it is possible to target compounds originating from BGCs containing unusual biosynthetic machinery (Fig. 1).
Aromatic amino acids are fundamental for growth and development across phylogenetic kingdoms. Additionally, catabolism of aromatic amino acids leads to the production of nonproteinogenic amino acids, such as the tryptophan-derived kynurenine, which regulates inflammation and immune responses (13,14). Kynurenine and its derivatives are biosynthetic intermediates of numerous secondary metabolites, including sibiromycin (15), mycemycin C (16), nidulanin A (17), nidulanin B and nidulanin D (18), daptomycin (19), and quinomycin peptide antibiotics (20). Recently, the malpikynines were discovered from Mortierella alpina as degradation products of the fungal surfactant malpinins, resulting from an oxidative conversion of tryptophan into kynurenine (21). Incorporation of kynurenine into secondary metabolites enables differential specificity toward enzyme receptors and targets (22). Daptomycin, for example, shows decreased antimicrobial efficacy when kynurenine is mutated to tryptophan (23,24). One tactic for creating secondary metabolites with novel scaffolds is to recruit primary metabolic enzymes that modify common precursors into nonproteinogenic precursors into BGCs (20). For example, a tryptophan 2,3-dioxygenase (TDO) gene located adjacent to the daptomycin-producing nonribosomal peptide synthetase (NRPS) gene encodes an enzyme that supplies the kynurenine for daptomycin synthesis. This TDO diverges from related proteins in the same genus (29% sequence identity), suggesting it is a paralogous enzyme dedicated to secondary metabolite biosynthesis (19).
In a large-scale analysis of 56 FACs, an unknown metabolite from heterologous expression of a BGC from Aspergillus terreus ATCC 20542 (located on the FAC AtFAC7O19; Fig. 2A; also see Table S1 in the supplemental material) was identified with an m/z value of 310.1188 and a molecular formula of C 17 (10). This compound was found in both the parent strain and the AtFAC7O19-transformed Aspergillus nidulans, but not in the empty vector control. Interestingly, the BGC encoding this metabolite contained an indoleamine 2,3-dioxygenase (IDO), which is involved in tryptophan degradation via kynurenine production (25). While most aspergilli contain three IDOs, A. terreus contains four (see Fig. S1 in the supplemental material). Given that gene duplication is often utilized as a strategy to "repurpose" genes for secondary metabolism, the presence of this fourth IDO suggested that it may serve to supply kynurenine for the formation of the unknown metabolite. Here, we employ the FAC-metabolite scoring strategy to identify the biosynthetic product of this unusual gene cluster and probe its biosynthesis.

RESULTS
To determine the structure of the target compound, ϳ1.5 mg of material was purified from FAC-transformed A. nidulans extracts and subjected to tandem mass spectrometry (MS 2 ) analysis, 1 H and 13 C nuclear magnetic resonance (NMR) spectroscopy, and two-dimensional (2D) correlation approaches, including correlation spectroscopy (COSY), heteronuclear single quantum correlation (HSQC), and heteronuclear multiple bond correlation (HMBC) (see Table S2 and Fig. S2 and S3 in the supplemental material). Structural analysis revealed an unusual secondary metabolite backbone, a 3,4-dihydro-1H-1-benzazepine-2,5-dione, resulting from the unusual cyclization of kynurenine. The metabolite's structure matches that of a previously synthesized kynurenine derivative, 2-amino-N-(2,3,4,5-tetrahydro-2,5-dioxo-1H-1-benzazepin-3-yl) benzamide (26). On the basis of its structure and the parent organism, it was given a common name of "terreazepine." To determine the stereochemical configuration of terreazepine, R and S enantiomers were synthesized, each with an enantiomeric excess of Ն95% (Fig. S4). Each enantiomer and the purified natural compound were acylated to enable separation using supercritical fluid chromatography. Interestingly, natural terreazepine was found to be a 2:1 mixture of S-R enantiomers by comparing against synthetic standards separated by chiral chromatography (Fig. S4). During the course of this work, Li et al. reported the independent discovery of (S)-terreazepine (nanangelenin B) as an intermediate in the biosynthesis of the related compound nanangelenin A (27). Given that our discovery of terreazepine was completed prior to the publication by Li et al., the strategies outlined herein nonetheless provide promise for the discovery of novel chemical scaffolds.
To probe terreazepine's biosynthesis, we grew A. terreus (ATCC 20542) using media containing isotopically labeled biosynthetic precursors. Labeling with [ 13 C 6 ]anthranilate resulted in an m/z shift of ϩ6 Da (Fig. 2B), supporting incorporation of anthranilate into the molecule (Fig. 2C). Consistent with terreazepine's chemical structure, labeling with L-Trp-D 5 did not result in the expected shift of ϩ5 in the mass spectrum, instead resulting in a mass shift of ϩ4 (Fig. 2B). Given the existence of an IDO in the AtFAC7O19 BGC, these data provide support that tryptophan is converted into kynurenine prior to incorporation into terreazepine. For further confirmation of the IDO activity in terreazepine biosynthesis, a FAC deletion mutant was produced lacking the IDO tzpB by RedET-mediated recombineering (10). Mass spectral analysis of the FAC deletion mutant revealed no terreazepine production (Fig. S5).
Homology-based annotation of the FAC-encoded NRPS revealed a domain structure consisting of two adenylation (A), two condensation (C), and three thiolation (T) Heterologous Expression of Terreazepine Gene Cluster ® domains, giving the domain sequence A 1 -T 1 -C 1 -A 2 -T 2 -C 2 -T 3 . To investigate the function of the seemingly extraneous T 3 domain, we constructed FAC truncation mutants either lacking the C 2 T 3 domains (ΔC 2 T 3 ) or only the T 3 domain (ΔT 3 ). These constructs were transformed into A. nidulans, and extracted metabolites were subjected to liquid chromatography-mass spectrometry (LC-MS) analysis. A very small amount of the target compound was detected in ΔC 2 T 3 extracts (5,000-fold lower than control), suggesting that terreazepine formation can occur slowly without catalysis. We were unable to detect the presence of any offloaded intermediates. Interestingly, ΔT 3 extracts contained terreazepine levels close to that of the intact NRPS (Fig. 2D). Given that analyses focused on endpoint abundance of terreazepine, it is possible that the T 3 domain increases the catalytic efficiency of product formation. This is in contrast to recent findings by Li et al. in which NanA, the TzpA ortholog involved in nanangelenin A biosynthesis, requires the T 3 domain for product formation (27). The reasons for these differences are unknown but may involve differences in protein sequence or other ancillary protein interactions. Using heterologous expression, stable isotope feeding studies, and NRPS backbone deletions, we propose a biosynthetic scheme for terreazepine (Fig. 2E). In this scheme, N-formyl-kynurenine is formed through the catabolism of tryptophan by TzpB, an IDO. TzpB shares 41% sequence identity to Aspergillus fumigatus IdoA and 45% identity to IdoB, and only 26% identity to IdoC. Enzymatic studies using Aspergillus oryzae suggest that only ido␣ and ido␤ (orthologs of idoA and idoB, respectively) encode enzymes that participate in tryptophan catabolism (28,29). Because most aspergilli contain three IDOs, TzpB, a fourth IDO in the parent organism A. terreus, may no longer play a role in primary metabolism and instead represent a duplicated enzyme dedicated to terreazepine biosynthesis (Fig. S1). This is reminiscent of daptomycin biosynthesis in Streptomyces roseosporus, in which the TDO DptJ supplies kynurenine for daptomycin formation (19). The biosynthesis of terreazepine mirrors that of its relative nanangelenin A, where tzpA and tzpB orthologs in Aspergillus nanangensis (nanA and nanC) encode enzymes with near identical activity.
As the next step in the proposed biosynthesis, TzpA, a two-module NRPS, utilizes anthranilate and kynurenine to assemble terreazepine. The first adenylation domain (TzpA-A 1 ) loads anthranilate onto the T 1 domain, while TzpA-A 2 loads kynurenine, generated through spontaneous nonenzymatic deformylation of the TzpB-supplied N-formyl-kynurenine. The substrate-binding residues of TzpA-A 1 resemble those of other fungal adenylation domains that recognize anthranilate (Table S3). TzpA-A 2 , responsible for incorporating kynurenine, has a new pocket code quite dissimilar from other kynurenine-binding A-domains (Table S3). However, this disparity may be attributable to evolutionary distance between source organisms and the unstudied nature of kynurenine incorporation into fungal secondary metabolites. Given that the isolated terreazepine was a 2:1 mixture of S-R enantiomers, we can speculate that TzpA-A 2 accepts both D-and L-kynurenine. The peptide bond formation between the tethered amino acids is catalyzed by the first condensation domain, TzpA-C 1 , between anthranilate's carbonyl carbon and kynurenine's aliphatic primary amine. The second C domain (TzpA-C 2 ) catalyzes the final cyclization event between the aromatic amine of kynurenine and the tethered carbonyl carbon, yielding the final terreazepine product.
While the role of the terminal TzpA-T 3 domain remains uncertain, we can glean insight by looking at related NRPSs. For example, the unusual NRPS domain structure of TzpA mirrors that of GliP, the NRPS involved in gliotoxin biosynthesis (30,31). When studied in vitro, GliP mutants show behavior mirroring that of TzpA deletants: truncated GliP ΔT 3 mutants retain dipeptide synthetase activity, while ΔC 2 T 3 mutants show reduced activity (30,31). However, in vivo, GliP ΔT 3 loses activity, suggesting that the in vivo pathway involves transfer of the dipeptidyl-S intermediate from T 2 to T 3 (30). In light of these two possible pathways of cyclization from T 2 and T 3 , as well as a slow reported rate of approximately 1 per h, it has been suggested that T 3 facilitates interaction with downstream tailoring enzymes (30,31). Given that terreazepine biosynthesis does not appear to require the activity of downstream tailoring enzymes, both cyclization pathways may exist. Like the T domains of GliP, TzpA-T 2 and T 3 possess the predicted active site residue (S1937 and S2473, respectively), suggesting that they are both functional (Table S3). Similarly, TzpA-C 2 possesses the purported catalytic histidine at position H2137. However, the adjacent residue sequence diverges from the conserved SHXXXDXX(S/T) sequence shared by diketopiperazine-forming NRPSs such as GliP and HasD (30), and slightly from the SHXXXD sequence of NanA (27), suggesting it may have different cyclization requirements (Table S3). More studies will be required to elucidate the nuanced functions of terminal T domains in natural product biosynthesis.
The discovery of terreazepine and its BGC revealed that fungal IDOs can play a role in secondary metabolite biosynthesis and that kynurenine incorporation into secondary metabolites can yield novel chemical scaffolds. This suggests that targeted efforts to characterize fungal BGCs containing IDOs may facilitate the discovery of completely new molecules with unique chemical scaffolds and their derivatives. To explore this possibility, we searched sequences of 1,037 fungal genomes from GenBank and the Joint Genome Institute and located BGCs containing IDOs. Of the ϳ38,000 BGCs contained within these genomes, 118 contain an IDO. We then grouped IDO-containing BGCs into gene cluster families (GCFs) based on sequence identity and the fraction of protein domains shared between BGC pairs, anticipating that a single GCF groups BGCs that produce similar metabolites. Of the 118 IDO-containing BGCs, 68 were sorted into 16 GCFs. The remaining 50 BGCs represent singletons that had no similar BGC pairs (Fig. 3A).
Several interesting insights can be gleaned from this analysis. First, many BGCs originate from phylogenetically diverse aspergilli, an NRPS-containing subset of which are illustrated in Fig. 3B. BGCs from two Aspergillus GCFs in particular were identified as putative terreazepine clusters. The first GCF includes the terreazepine BGC itself, which exists in Aspergillus terreus and A. pseudoterreus. The second GCF contains BGCs from Aspergillus thermomutatus, A. funiculosus, and A. lentulus. The NRPSs in this GCF follow the same unusual domain sequence of ATCATCT (with the exception of A. lentulus which lacks the terminal T domain). Adenylation domain specificity codes bear remarkable similarity to those of TzpA-A 1 and TzpA-A 2 ( Table S3), suggesting that these NRPSs biosynthesize terreazepine. Unlike the terreazepine BGC, however, the BGCs in this family contain several tailoring enzymes expected to diversify the terreazepine scaffold, raising the possibility that the shared NRPS T 3 facilitates interaction with downstream enzymes in these pathways. The tailoring enzymes present in these BGCs differ from those present in the nanangelenin A cluster in A. nanangensis, indicating that a variety of terreazepine/ nanangelenin analogs may exist (27). Moreover, IDO-containing BGCs from Aspergillus ibericus and A. homomorphus may encode yet undiscovered dipeptide scaffolds containing kynurenine (Fig. 3B). Interestingly, the IDOs contained in these three GCFs represent a distinct clade of duplicated IDOs with moderate sequence homology (ϳ40%) to both A. fumigatus IdoA and IdoB (Fig. S1).
Perhaps even more remarkable is the degree to which IDO-containing BGCs span the kingdom of fungi, encompassing five taxonomic classes and two phyla (Fig. S6). Particularly interesting is the presence of several NRPS-containing BGCs originating from Basidiomycetes, given the rare and unstudied nature of NRPSs in this phylum (32). Taken together, these results reveal the rich biosynthetic potential of IDO-containing BGCs that has only just begun to be explored.

DISCUSSION
The work described herein has elucidated a clear biosynthetic role for two FACencoded enzymes, TzpA, an NRPS, and TzpB, an IDO, in the biosynthesis of terreazepine. A closer look at the genes surrounding tzpA and tzpB, however, reveals additional genes that could play a role in biosynthesis. Indeed, adjacent genes include an alcohol dehydrogenase (ATEG_07354), a transcription factor (ATEG_07357), and a P450 (ATEG_07362) ( Fig. 2A; see Table S1 in the supplemental material). While we were unable to determine whether these additional genes encode regulatory elements or biosynthetic players involved in terreazepine formation, a few possibilities warrant discussion.
First, it is possible that the tailoring enzymes present in the gene cluster (i.e., the P450 and the alcohol dehydrogenase) remain silent and that the gene cluster itself is only partially expressed. Given that only (S)-terreazepine is formed as an intermediate in nanangelenin A production in A. nanangensis (27), a second possibility is that these tailoring enzymes play a role in the epimerization of (S)-terreazpine to (R)-terreazepine. This epimerization could occur through a two-step mechanism similar to that of the conversion of (S)-reticuline to (R)-reticuline in Saccharomyces cerevisiae (33). Should this be the case, it would follow that TzpA produces (S)-terreazepine which is then oxidized from an amine to an enamine by the P450. This oxidized product could then be reduced by the alcohol dehydrogenase/oxidoreductase to form the S-R mixture that we observed. Of course, the activities of these enzymes are at this point only speculative, and future studies will be required to confirm what, if any, roles these enzymes play in terreazepine biosynthesis.
The discovery of terreazepine provides another example of how fungi repurpose primary metabolism genes for secondary metabolism. On the basis of this and other  examples, we propose two major strategies fungi employ for such repurposing: type I repurposing into biosynthetic enzymes and type II repurposing into resistance genes (Fig. 4). One of the earliest discoveries of type I repurposing is that of the important fungal toxin sterigmatocystin. Evaluation of the sterigmatocystin biosynthetic pathway revealed the presence of two fatty acid synthase (FAS) genes, stcJ and stcK located within the sterigmatocystin gene cluster. Indeed, disruption of these genes in A. nidulans resulted in strains that did not produce sterigmatocystin but were morphologically identical to wild-type strains (34). Another important example of type I repurposing is the duplicated isopropyl-malate synthase (IPMS) involved in echinocandin biosynthesis in Emericella rugulosa. Similar to the provision of kynurenine by TzpB, this duplicated IPMS serves to provide the nonproteinogenic amino acid homotyrosine for incorporation into echinocandin B (Fig. 4) (35).
In addition to repurposing duplicated primary metabolism genes to have a biosynthetic role, fungi also utilize duplicated genes from primary metabolism as a form of self-resistance (36,37). This type II repurposing represents a particularly attractive avenue for drug discovery, as the duplicated gene will often provide insight into the mechanism of action of the encoded secondary metabolite. Several examples of such type II repurposing have been discovered by targeting clusters with duplicate resistance targets. The proteasome inhibitor fellutamide B, for example, was discovered due to the presence of a duplicated proteasome subunit within its BGC (38). Similarly, the BGC encoding the methionine aminopeptidase inhibitor fumagillin contains both type I and type II methionine aminopeptidase genes in the gene cluster (Fig. 4) (39). While it is likely that many of the IDOs contained within the BGCs depicted in Fig. 3 and Fig. S6 represent type I biosynthetic enzymes that provide kynurenine for secondary metabolite synthesis, it is also possible that they represent type II duplicated gene targets that serve to protect the producing organism against the biosynthetic product. Indeed, when we began this research, we suspected that terreazepine might possess IDO inhibitory activity and show promise as an anticancer agent (40). When tested against A. fumigatus IDO mutants, however, no growth inhibitory activity was observed (data not shown). Future studies aimed to elucidate the biosynthetic products of additional IDO-containing BGCs in fungi offer exciting opportunities not only to discover new molecular scaffolds but also to identify anticancer metabolites with known mechanisms of action. In sum, this work has shown the feasibility of using FACs to discover secondary metabolites from unusual BGCs. With this approach, we identified, reconstructed, cloned, and heterologously expressed an unusual BGC encoding the novel metabolite terreazepine. Thus far, IDO involvement in secondary metabolite biosynthesis in fungi has only just begun to be explored, suggesting that future efforts targeting BGCs including IDOs and other unusual biosynthetic players may prove fruitful. By mapping BGC relatedness across the fungal kingdom, we also uncover evidence of numerous IDO-containing BGCs in phylogenetically diverse fungi that may encode metabolites with novel chemical scaffolds or IDO inhibitory activity. This approach shows promise for focusing natural product discovery efforts toward uncharted biosynthetic space, yielding a diverse array of novel molecular scaffolds.

MATERIALS AND METHODS
FAC Generation, transformation, growth, and extraction. AtFAC7O19 was produced in a prior study through the unbiased random shear of Aspergillus terreus ATCC 20542 genomic DNA (10). Using previously described methods, targeted metabolomics and FAC metabolite scoring identified a target compound with m/z 310.1188 that was likely to be produced from this FAC (10). AtFAC7O19 was then maintained in Escherichia coli strain DH10B and used to transform the Red/ET-inducible E. coli strain SW012 prior to DNA preparation and transformation of A. nidulans, as described in a previous study (11). Following transformation of A. nidulans, growth and extraction of FAC-containing and control strains were completed as previously reported (10).
LC-MS analysis. Dried metabolite extracts were resuspended in methanol to a concentration of 10 mg/ml. Samples were sonicated for 10 min and then centrifuged for 10 min at 21,000 ϫ g. Samples were analyzed by LC-MS using a Thermo Q Exactive mass spectrometer with an inline Agilent 1290 ultrahigh performance liquid chromatograph (UHPLC) with a Phenomenex Kinetix 1.3-m-particle-size column with dimensions 2 ϫ 100 mm. Five percent acetonitrile in water with 0.01% formic acid was used as buffer A, and 100% acetonitrile with 0.01% formic acid was used as buffer B. An 8 min, 0 to 100% buffer B, gradient was used. The MS heated electrospray ionization (HESI) source was set at 275°C, with sheath gas set at 60 arbitrary units and spray voltage set at 3.5 kV. Data were collected in data-dependent mode, with the top five ions from each scan selected for higher-energy collisional dissociation (HCD) MS 2 fragmentation analysis with a normalized collision energy of 25.0%. A resolution of 17,500 was used for both MS 1 and MS 2 .
Stable isotope labeling studies. Isotopically labeled media were prepared by adding 1 mM (each) [ 13 C 6 ]anthranilate or L-tryptophan-D 5 to plates of minimal medium containing glucose. The plates were inoculated with concentrated spore stocks of A. terreus and incubated at 30°C for 5 days. Metabolites were extracted from the plates using Oasis hydrophilic lipophilic balance solid phase extraction cartridges, with elution using 100% methanol. Extracts were dried in vacuo and analyzed by LC-MS as described above.
Sequence analysis of AtFAC7019. To compare the AtFAC7019 DNA sequence to that of the A. terreus reference strain NIH 2624, a barcoded Illumina next-generation sequencing library of AtFAC7O19 was generated using a True-Seq kit and pooled for sequencing with other samples at the University of Wisconsin-Madison Biotechnology Center DNA Sequencing Facility. A Ͼ1,000ϫ coverage of the AtFAC7O19 DNA sequence was achieved. A single contig (114.911 kb, completed and finished FAC clone sequence) was obtained using the DNAStar next-generation sequence assembly program. The entire FAC sequence was confirmed by FAC end sequences, PmeI and NotI digestion, and contour-clamped homogenous-electric field gel electrophoresis. From this sequencing, we identified 35,769 kb of extra sequence toward the telomere in AtFAC7O19 that was not present in the reference genome (NIH 2624). However, the terreazepine gene cluster is well conserved in both strains. The AtFAC7O19 sequence is available through GenBank under accession number MT376588. Genes have been annotated in Table S1A in the supplemental material.
Gene cluster editing and production of gene and domain deletants. Gene and domain deletions were made with a modified method as previously described for whole-genome deletions (10,11). Briefly, we used a GalK-gpdA-p gene cassette for overexpression of the transcription factor (TF) gene (ATEG_07357), a kanamycin gene cassette for deleting both ATEG_07358 (NRPS, tzpA) and ATEG_07359 (IDO, tzpB), and a GalK-apramycin (Ap) gene cassette for domain deletions as previously reported for whole-gene deletions. For example, to delete the T 3 or C 2 T 3 domains of tzpA, a pair of primers for the GalK and apramycin resistance gene cassette (24 bases forward and 23 bases reverse shown in Table S1B as lowercase letters) with an additional 50 bp (uppercase letters shown in Table S1B) of either T 3 or C 2 T 3 flanking to generate the GalK-Ap PCR amplicons. Gel-purified PCR amplicons were used to transform RedET-inducible E. coli strain SW102. Recombineered FAC colonies were then selected on LB medium with apramycin, and second-step recombineering transformation using either the AtFAC7O19-58ΔT or AtFAC7O19-58-ΔCT oligonucleotide enabled excision of the GalK-Ap gene cassette. All FAC recombinants were confirmed by PCR, restriction digestion, pulsed-field gel electrophoresis, and sequencing as needed. The primers for deletions are listed in Table S1B.
Purification and structural analysis of terreazepine. AtFAC7O19-transformed A. nidulans were grown on 576 plates and extracted as described, yielding 400 mg of dried extract. The crude extract Heterologous Expression of Terreazepine Gene Cluster ® was subjected to normal-stage flash chromatography using a CombiFlash NextGen 300 system (Teledyne-ISCO) and examined using UV absorbance at 254 and 280 nm. The extract was separated using a silica 12-g gold column (Teledyne-ISCO) at a flow rate of 30 ml/min with a 50-min hexane/CHCl 3 /methanol (MeOH) gradient. Of the 12 resulting simplified fractions, fraction 6 contained the highest concentration of the target metabolite and was subjected to reversed-phase separation using an Agilent 1200 HPLC and analyzed using OpenLAB CDS Chemstation software (version number 1.8.1, Agilent Technologies). Fraction 6 (56.9 mg) was separated on a Kinetix C 18 semipreparatory column (5 m; 100 Å; 250 ϫ 10.0 mm; Phenomenex) at a flow rate of 2.5 ml/min. Approximately 200 l of a 10-mg/ml solution was injected per run, and subsequent fractions were pooled together for analysis. The 45-min run began at 20:80 CH 3 CN-H 2 O and was increased to 40:60 over 20 min. The gradient was increased to 100% CH 3 CN over the next 20 min and held at 100% for the remainder of the run. Terreazepine eluted from 16 to 17.5 min (1.5 mg, 90% purity). All solvents used for chromatographic separation were acquired through Sigma-Aldrich and were HPLC-grade. MS 2 spectra for terreazepine are shown in Fig. S2.
Nuclear magnetic resonance (NMR) spectra for the target compound, including 1 H, 13 C, correlation spectroscopy (COSY), heteronuclear multiple bond correlation (HMBC), and heteronuclear single quantum correlation (HSQC) spectra, were obtained using a Bruker Avance III 500-MHz system equipped with a DCH CryoProbe at 298.2 K. NMR spectra from the natural compound match those of the synthesized compound (see "Total synthesis and stereochemical analysis of terreazepine" below and Fig. S3). NMR spectra of the synthetic compound in DMSO-d 6 are the most complete and were used for full structural characterization (Table S2 and Fig. S3A to S3G). 1 H and 13 C NMR spectra are also provided for the synthetic compound in methanol-d 4 ( Fig. S3H and S3I). 1  The 1 H NMR spectrum (Fig. S3C) enabled detection of two CH 2 protons (␦ 3.02 and 3.24), one CH proton (␦ 4.99), two amide protons (␦ 8.42 and 10.38), two NH 2 protons (␦ 6.38), and eight aromatic protons (␦ 6.54 -7.76). This is consistent with the proposed molecular formula of C 17 H 15 N 3 O 3 . The presence of eight proton signals in the aromatic region is consistent with the presence of two di-substituted benzene rings, while the three protons in the aliphatic region are consistent with the CH and CH 2 species located in the seven-membered azepane-2,5-dione substructure. The downfield shift at H-11 (␦ 4.99) is congruent with its adjacency to an amide nitrogen at position 12. The amide proton shift at ␦ 10.38 was readily assigned to the proton at position 2 given its adjacency to the aromatic ring system, while the remaining amide proton at H-12 was assigned to the ␦ 8.42 shift.
The 13 C spectrum (Fig. S3D) contains only 16 signals, but one signal (␦ 128.47) corresponds to two carbons, further supporting the proposed molecular formula. Three of the signals corresponded to carbonyls, including two amides and one ketone (␦ 168.63, 171.25, and 197.76, respectively), one was a CH 2 (␦ 45.75), one was an N-CH (␦ 46.14), and 12 were aromatic CH or quaternary signals (8 CHs and 4 quaternaries, ␦ 113.87-149.70). 2D experiments were used to correlate aromatic 1 H and 13 C bonds, as well as to determine the connectivity of the molecule.
The HSQC experiment enabled linkage of protons to their corresponding carbons (Fig. S3F), while the COSY and HMBC spectra helped to build connections through the molecule. Three distinct portions of the molecule are evident using COSY correlations linking protons H-4 through H-7, protons H-10 through H-12, and protons H-15 through H-18 (Fig. S3E). The remainder of the molecule could be pieced together using HMBC correlations (Fig. S3G). We determined that the site of cyclization was at position 1 due to the correlation between the amide proton at H-2 and the carbonyl carbon C-1.
Total synthesis and stereochemical analysis of terreazepine. Detailed descriptions and schemes for the total synthesis of (R)-and (S)-terreazepines as well as methods for the determination of their stereochemical configuration are provided in Text S1 in the supplemental material.
Bioinformatics analysis. (i) Creation of gene cluster family networks. We downloaded a data set of 1,037 fungal genomes from GenBank and the Joint Genome Institute Genome Portal, excluding Saccharomycotina as this subphylum contains a large number of genomes with very few gene clusters. Biosynthetic gene clusters were detected using the command-line version of antiSMASH 4 with default parameters (41). For creation of gene cluster families, we used an approach similar to those mentioned (42,43). Protein domains were detected using the HMMER suite hmmsearch, resulting in protein domain arrays for each gene cluster. For each pair of gene clusters, the median sequence identity of backbone enzyme domains and the Jaccard similarity of protein domain sets was calculated. These two similarity metrics were combined for a single similarity score: similarity score ϭ ͱ (0.8 ϫ sequence identity) ϩ (0.2 ϫ Jaccard similarity) Caesar et al.

®
A final clustering step was performed using DBSCAN clustering with an epsilon value of 0.55 (corresponding to a similarity score of 0.45). This process thus resulted in a network of gene clusters, where subgraphs correspond to gene cluster families.
All gene cluster family dendrograms ( Fig. 3 and Fig. S6) were created using unweighted-pair group method using average linkages (UPGMA) clustering based on this same cluster similarity score.
(ii) Sequence alignments. All sequence alignments were performed using MUSCLE with default parameters in MEGA. For the data in Table S3, adenylation domain sequences from IDO-containing gene clusters were aligned together with known adenylation domain sequences from MIBiG. Putative substrate-binding residues were extracted using gramicidin S synthetase as a reference.
(iii) Phylogenetic analyses. A maximum likelihood phylogenetic tree was constructed using default parameters in MEGAX (44) to produce a tree containing IDOs in 15 Aspergillus species. The tree was constructed using 500 bootstrap iterations.
(iv) Determining IDO counts per genome. All Aspergillus genomes included in this study were analyzed using the HMMER suite tool hmmsearch, using the IDO Pfam model (PF01231) as a query model, a minimum subject sequence length of 350 residues, and a maximum E value of 1EϪ30.
Data availability. The AtFAC7O19 sequence is available through GenBank under accession number MT376588. Experimental details (Text S1) and other supporting data, including Fig. S1 to S6 (phylogenetic trees, structural elucidation data, stereochemical evaluation data, and expanded details of IDOcontaining fungal gene clusters), and Tables S1 to S3 (genome sequence alignments and primers, NMR summary, adenylation domain specificity codes, and C and T domain alignments) are included in the supplemental material.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.1 MB.

ACKNOWLEDGMENTS
We acknowledge K. Clevenger and P. Gao for their contributions to developing the FAC-metabolite scoring platform. R. Betori and J. Zhu from K. Scheidt's group are also acknowledged for their assistance with supercritical fluid chromatography analysis.
This work was funded by the U.S. National Institutes of Health SBIR grant R44AT009158 to C.C.W., J.W.B., and N.L.K., grant R01AI065728 to N.P.K., grant F32GM132679 to L.K.C. Part of this work was performed by the Northwestern University ChemCore, which is funded by Cancer Center support grant P30CA060553 from the National Cancer Institute awarded to the Robert H. Lurie Comprehensive Cancer Center and the Chicago Biomedical Consortium with support from the Searle Funds at the Chicago Community Trust. This work made use of the IMSERC and Northwestern University, which has received support from the Soft and Hybrid Nanotechnology Experimental (SHyNE) Resource (NSF ECCS-1542205), the State of Illinois, and the International Institute for Nanotechnology (IIN).
M.S., M.N.I., R.Y., and C.C.W. are employees of Intact Genomics that sells bacterial artificial chromosome (BAC) and FAC libraries, in addition to genome discovery and DNA research services, BAC cloning, DNA end repairing, E. coli competent cells, and other DNA and protein research kits.