Taxonogenomic Analysis of Marine-Derived Streptomyces sp. N11-50 and the Profile of NRPS and PKS Gene Clusters

Abstract: Streptomyces sp. N11-50 was isolated from deep-sea water and found to produce diketopiperazine (DKP) compounds such as albonoursin and cyclo(Phe-Leu). This study aimed to clarify the taxonomic position of the strain and then to reveal its potential to synthesize diverse nonribosomal peptide and polyketide compounds as secondary metabolites other than DKPs. Strain N11-50 was identified as Streptomyces albus, as it showed 100% 16S rRNA gene sequence similarity and 95.5% DNA-DNA relatedness to S. albus NBRC 13014 T. We annotated the nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) gene clusters in the genome. Consequently, five NRPS, one hybrid PKS/NRPS, five type-I PKS and one type-II PKS gene clusters were observed, and we predicted their products through bioinformatic analysis. These gene clusters were well conserved in strains belonging to S. albus whose whole-genome sequences (WGSs) have been published. On the other hand, our taxonogenomic analysis revealed that three WGS-published strains registered as S. albus were not S. albus. Two of the three should be classified as Streptomyces albidoflavus, and the remaining one was likely a new genomospecies. After reclassifying these strains appropriately, we demonstrated species-specific profiles of the NRPS and PKS gene clusters with little strain-level diversity.


Introduction
Secondary metabolites produced by actinomycetes are a promising resource for the pharmaceutical industry. Members of the genus Streptomyces are recognized as a rich source of structurally diverse secondary metabolites with useful bioactivities. Recent genome analyses revealed that each strain harbors a few dozen biosynthetic gene clusters (BGCs) for secondary metabolites in its genome. Secondary metabolites are classified according to their chemical structures and biosynthetic pathways. Polyketide and nonribosomal peptide compounds are the major secondary metabolites in the genus Streptomyces because half to three quarters of the secondary metabolite-biosynthetic gene clusters (smBGCs) in a streptomycetal genome encode polyketide synthases (PKSs) and/or nonribosomal peptide synthetases (NRPSs) [1,2]. PKSs synthesize a polyketide chain from acyl-CoA molecules as the building blocks, whereas NRPSs assemble peptide chains from amino acids in a similar manner. These chains are modified through various mechanisms, such as reduction, cyclization, methylation, and epimerization, to yield the final products. The diversity of the chemical structures is due to differences in chain length and in the combination of building blocks, in addition to these modifications. Type-I PKSs and NRPSs are large modular enzymes with multiple catalytic domains. As chain elongation by these enzymes follows the co-linearity rule of assembly lines [3], the chemical structures of the chains can be predicted through bioinformatic analysis of the domain organizations. In contrast, type-II and type-III PKSs iteratively catalyze polyketide chain elongations. Type-II PKSs are

Materials and Methods
Streptomyces sp. N11-50 was isolated from deep-sea water collected in Toyama, Japan [7]. This strain, preserved at Toyama Prefectural University as TP-A0906, was deposited in the NBRC Culture Collection and is available as NBRC 113679. EzBioCloud was used to search for taxonomic neighbors based on 16S rRNA gene sequences [16]. Multilocus sequence analysis (MLSA) was conducted using the concatenated gene sequences of atpD, gyrB, recA, rpoB and trpB, as recommended by Rong and Huang [17]. Genomic DNA for whole-genome sequencing was prepared from cultured cells using the method of Saito and Kimura [18]. Subsequently, library preparation and whole-genome de novo sequencing were performed by the Kazusa DNA Research Institute using a single-molecule real-time (SMRT) strategy. Sequencing was performed using the BluePippin system (Sage Science, MA, USA) with a SMRTbell Template Prep Kit 1.0 and a SMRTbell Damage Repair Kit (Pacific Biosciences, CA, USA), via the Sequel system with Sequel SMRT cell 1M versions 2 and 3, Sequel Sequencing Kits 2.1 and 3.0, a Sequel Binding Kit 2.0, and a Sequel Binding and Internal Ctrl Kit 3.0 (Pacific Biosciences). The resulting reads were assembled using SMRT Link version 6.0 (Pacific Biosciences) and Prokka 1.13.3. The accession numbers of the draft genome sequence are BNEJ01000001-BNEJ01000031. Digital DNA-DNA hybridization (dDDH) was carried out using the Genome-to-Genome Distance Calculator (GGDC) [19]. The DDH estimate (GLM-based) of Formula 2 (identities/HSP length), which is recommended in GGDC, was used as the DNA-DNA relatedness. WGS-published S. albus strains were searched for on the NCBI website. Nucleotide BLAST (blastn) was used to search for WGS-published strains identified as Streptomyces sp. and showing >99.9% 16S rRNA gene sequence similarity to strain N11-50. Phylogenetic and phylogenomic trees were reconstructed using ClustalX 2.1 and the TYGS server [19], respectively.
NRPS and PKS gene clusters in genomes were surveyed using antiSMASH [20], and then manually annotated as reported previously [15].
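The MLSA step described above (concatenating the five housekeeping loci and estimating an evolutionary distance between two strains) can be illustrated with a minimal sketch. It assumes that each locus has already been aligned between the two strains and that the distance is computed with the Kimura 2-parameter model; the text does not state which substitution model was used, so that choice, along with the function names `concatenate` and `k2p_distance`, is an assumption made for illustration only.

```python
import math

# Loci named in the Methods; the fixed order is an arbitrary choice for this sketch.
MLSA_GENES = ["atpD", "gyrB", "recA", "rpoB", "trpB"]
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def concatenate(loci):
    """Join the aligned loci of one strain in a fixed gene order."""
    return "".join(loci[gene] for gene in MLSA_GENES)


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Assumed model (not stated in the paper). Columns containing gaps
    or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

In practice the two strains' concatenated alignments would be passed to `k2p_distance`, and the result compared with the species-delineation threshold used in the Results.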

Taxonomic Position of Streptomyces sp. N11-50 and Related WGS-Published Strains
Streptomyces sp. N11-50 showed 100% 16S rRNA gene sequence similarity to Streptomyces albus NBRC 13014 T, its closest species. The second most similar species was Streptomyces reniochalinae, but the similarity was 98.6%, which is below the cut-off (99.0%) for species delineation recognized in actinomycetes [21]. This suggests that the strain differs from all species other than S. albus.
The WGSs of 21 S. albus strains are published in GenBank/ENA/DDBJ at present. Among them, the taxonomic positions of eighteen strains have already been reported, but the remaining three S. albus strains, G153, INA 01303 and NRRL B-2238, have not been studied [22]. Additionally, the WGSs of four strains showing >99.9% 16S rRNA gene sequence similarity to S. albus N11-50, namely Streptomyces sp. NRRL F-5639, Streptomyces sp. NRRL F-5917, Streptomyces sp. HPH0547 and Streptomyces sp. PHES57 51, are also published under the accession numbers JOGK01000000, JOHQ01000000, ATCE01000000 and JAINRF01000000, respectively. These four strains have not been classified at the species level. Thus, we included the three S. albus strains and the four Streptomyces strains as well as Streptomyces sp. N11-50 in our analysis. Strains G153, INA 01303 and NRRL B-2238 showed 100%, 99.7% and 100% 16S rRNA gene sequence similarities, respectively, to the type strain of Streptomyces albidoflavus. Streptomyces koyangensis was the next closest species, with 99.3-99.4% similarities. Type strains of the other species, including S. albus, did not show >99.0% similarities to the three strains. A phylogenetic tree of these members, in addition to strain N11-50, based on 16S rRNA gene sequences was reconstructed with type strains showing >99.0% sequence similarities (Figure 1). Streptomyces violascens strains ATCC 27968 and NBRC 12920 T were included in the tree because we noticed that S. violascens ATCC 27968 was closely related to strain INA 01303 but was not the type strain. Streptomyces sp. N11-50, Streptomyces sp. NRRL F-5639, Streptomyces sp. NRRL F-5917, Streptomyces sp. HPH0547 and Streptomyces sp. PHES57 51 formed a clade with S. albus NBRC 13014 T. In contrast, G153 and NRRL B-2238 were not included in that clade but formed a clade with S. albidoflavus. Similarly, INA 01303 formed a clade with S. violascens ATCC 27968, which is not closely related to the type strain of S. violascens.
These results suggest that strains G153, NRRL B-2238 and INA 01303 were incorrectly identified as S. albus. These species names registered in the databases must be properly curated.
Streptomyces strains cannot be classified at the species level by 16S rRNA gene sequence analysis alone [23]. MLSA [24] and/or dDDH [23] are recommended for molecular classification. We therefore sequenced the whole genome of Streptomyces sp. N11-50 to classify the strain. The genome size and G + C content were 8.29 Mb and 72.8%, respectively. The genome was 0.7 Mb larger than that of Streptomyces albus NBRC 13014 T (7.59 Mb), whereas the G + C contents were almost the same (S. albus NBRC 13014 T, 72.7%).
We reconstructed MLSA-based phylogenetic and phylogenomic trees. The phylogenetic relationships in these trees were similar to those in the tree based on the 16S rRNA gene sequences (Figures 2 and 3). We estimated the evolutionary distance in MLSA and DNA-DNA relatedness in dDDH (Table 1). Streptomyces sp. N11-50, Streptomyces sp. HPH0547, Streptomyces sp. PHES57 51, Streptomyces sp. NRRL F-5639 and Streptomyces sp. NRRL F-5917 showed an evolutionary distance of <0.001 and DNA-DNA relatedness of >90% to the type strain of S. albus. As the thresholds of MLSA evolutionary distance and DNA-DNA relatedness for species delineation are 0.007 and 70%, respectively, these five strains were identified as S. albus. In contrast, as the corresponding values of strains G153, NRRL B-2238 and INA 01303 to the type strain of S. albus were >0.15 and <24%, respectively, these strains were not identified as S. albus. Strains G153 and NRRL B-2238 were identified as S. albidoflavus because their evolutionary distances and DNA-DNA relatedness to the type strain were 0.001-0.002 and 91.7-92.1%, respectively. On the other hand, although S. albus INA 01303 and S. violascens ATCC 27968 belong to the same genomospecies based on the evolutionary distance (0.003) and DNA-DNA relatedness (93.5%), they were considered a putative new species, because S. violascens ATCC 27968 is not the type strain of S. violascens and they could not be classified as any known species.
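The decision rule applied in this paragraph can be written down as a small sketch. The two thresholds (MLSA evolutionary distance 0.007 and DNA-DNA relatedness 70%) come from the text; how a value lying exactly on a threshold is treated is an assumption here, and the function name `same_species` is hypothetical. In the example calls, 0.003/93.5% and 95.5% are values reported in the paper, whereas 0.0005, 0.16 and 23.0 are placeholders chosen only to fall inside the reported bounds (<0.001, >0.15 and <24%).

```python
# Species-delineation thresholds quoted in the text.
MLSA_DISTANCE_THRESHOLD = 0.007
DDDH_THRESHOLD_PERCENT = 70.0


def same_species(mlsa_distance, dddh_percent):
    """Return True when both criteria place two strains in one species.

    Assumption: distance strictly below 0.007 and dDDH at or above 70%
    count as conspecific; the paper does not discuss boundary values.
    """
    return (mlsa_distance < MLSA_DISTANCE_THRESHOLD
            and dddh_percent >= DDDH_THRESHOLD_PERCENT)


# INA 01303 vs. S. violascens ATCC 27968 (values reported in the text).
print(same_species(0.003, 93.5))
# N11-50 vs. S. albus type strain (distance is a placeholder below 0.001).
print(same_species(0.0005, 95.5))
# G153 vs. S. albus type strain (placeholders within >0.15 and <24%).
print(same_species(0.16, 23.0))
```

Requiring both criteria to agree is a conservative choice for the sketch; discordant cases would need manual inspection rather than an automatic call.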
Figure 3. Phylogenomic tree reconstructed with the TYGS server. The tree was inferred with FastME 2.1.6.1 [25] from GBDP distances calculated from genome sequences. The branch lengths are scaled in terms of GBDP distance formula d5. The numbers above branches are GBDP pseudo-bootstrap support values >60% from 100 replications, with an average branch support of 63.4%. E. scabrispora DSM 41855 T was used as the outgroup (not shown) to show the root.

NRPS and Hybrid PKS/NRPS Gene Clusters in S. albus N11-50
S. albus N11-50 harbored five NRPS gene clusters and one hybrid PKS/NRPS gene cluster, as recorded in Table 2. NRPS gene cluster 2 (nrps-2) and nrps-3 were identified as BGCs of dudomycin (1) [26] and enteromycin (2) [27], respectively (Figure 4), because their domain organizations were identical to those of the reported BGCs. In contrast, the others were orphan clusters, whose products have not been identified. Therefore, we predicted their products bioinformatically. The product of nrps-1 was predicted to be a tripeptide compound derived from threonine, valine and serine residues. Nrps-4 seemed to synthesize a compound derived from a dipeptide, but the amino acid residues could not be predicted. The product of nrps-5 was predicted to be a tetrapeptide with two cysteine residues. Hybrid PKS/NRPS gene cluster-1 (pks/nrps-1) includes four PKS modules and two NRPS modules, one of which was responsible for incorporating an asparagine residue. Thus, its product was predicted to be a tetraketide compound with an asparagine residue.
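The kind of prediction described above rests on the co-linearity rule: each NRPS module adds one residue, so the number and order of modules give the length and sequence of the peptide. The sketch below illustrates only that counting step, under the assumption that the substrate of each module has already been predicted (for example from adenylation-domain analysis) and that modules act in the listed order. The function name `predict_peptide` is hypothetical, and the module order used for nrps-1 simply follows the order in which the residues are named in the text.

```python
# Greek prefixes for short peptides; longer chains fall back to "N-mer".
PREFIXES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def predict_peptide(module_substrates):
    """Describe the peptide implied by an ordered list of NRPS modules.

    Each list entry is the predicted amino acid of one module, or None
    when the substrate could not be predicted.
    """
    if not module_substrates:
        raise ValueError("at least one module is required")
    n = len(module_substrates)
    name = PREFIXES[n] + "peptide" if n in PREFIXES else f"{n}-mer peptide"
    residues = "-".join(s if s is not None else "?" for s in module_substrates)
    return f"{name} ({residues})"


# nrps-1: three modules; residue order is assumed from the text.
print(predict_peptide(["Thr", "Val", "Ser"]))
# nrps-4: two modules whose substrates could not be predicted.
print(predict_peptide([None, None]))
```

Real annotation also has to account for tailoring domains, iterative or skipped modules, and trans-acting enzymes, none of which this sketch models.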

Type-I and Type-II PKS Gene Clusters in S. albus N11-50
S. albus N11-50 harbored five type-I PKS and one type-II PKS gene clusters, as recorded in Table 3. Although t1pks-2 was identified as a BGC of tambjamine BE-18591 (3), the others were orphan. The product of t1pks-1 could not be predicted because t1pks-1 included only one PKS module and the ORF did not show high sequence similarities to PKSs with identified products. T1pks-3 was predicted to be a BGC for an ibomycin congener based on its domain organization, which resembles that of ibomycin (4a) [28]. The polyketide chain synthesized by the PKSs of t1pks-3 (4b) was predicted, as shown in Figure 5. T1pks-4 was predicted to synthesize a congener of lactomycins (5a) [29] and phoslactomycin (5b) [30] with the polyketide backbone shown as 5c in Figure 5, based on their similar domain organizations. T1pks-5 was predicted to synthesize an enediyne compound because its domain organization is KS-AT-KR-DH-ACP, which is specific to PksE, the enzyme responsible for synthesis of the enediyne moiety [31]. Type-II PKS gene cluster-1 (t2pks-1) was predicted to synthesize an aromatic compound like xantholipin (6) because its KSα and KSβ (CLF) showed 93% (89%) and 85% (87%) amino acid sequence similarities (identities) to those of xantholipin [32], respectively.
Table 3 (footnotes). 1, shown by locus tag such as TPA0909_14380; 2, encoded in the complementary strand. Abbreviations are as follows: AmT, aminotransferase; ATem, AT for ethylmalonyl-CoA; CLF, chain length factor; DH, dehydratase; ER, enoyl reductase; t1pks, type-I PKS gene cluster; t2pks, type-II PKS gene cluster. The other abbreviations are the same as those in Table 2. Chemical structures of 3 to 6 are shown in Figure 5.
Figure 5 (caption). 3, tambjamine BE-18591; 4a, polyketide chain synthesized by PKSs for ibomycin; 4b, predicted polyketide chain synthesized by PKSs of t1pks-3; 5a, lactomycin C; 5b, polyketide backbone synthesized by PnA to PnF (PKSs for phoslactomycin) before post-PKS processing; 5c, predicted backbone synthesized by PKS of t1pks-4. R = H or OH; 6, xantholipin.

Distribution of NRPS and PKS Gene Clusters Found in S. albus N11-50
We investigated whether the twelve NRPS and PKS gene clusters of S. albus N11-50 are conserved in WGS-published S. albus strains that have been confirmed to be S. albus in the report of Vela Gurovic et al. [22], three WGS-published S. albus strains that were reclassified as other species and four WGS-published Streptomyces strains that were identified as S. albus in Section 3.1. All the NRPS and PKS gene clusters were conserved in S. albus strains except for the type strain, which lacked nrps-3, as shown in Table 4. S. albus NRRL F-5917 harbored an extra NRPS, as stated in the footnote of Table 4. In contrast, strains G153, NRRL B-2238 and INA 01303, which are registered as S. albus but revealed not to be S. albus, did not harbor these gene clusters.
Figure 6 (caption fragment). ... Tables 2 and 3. Orphan gene clusters of the other strains were not numbered, but gene clusters whose products were predicted are indicated with the gene name or product. Gene clusters specific to Streptomyces sp. INA 01303/ATCC 27968 or S. albidoflavus are shown as circles filled with a light color (b). The same gene clusters are connected by dashed lines. Gene clusters specific to a strain are shown by circles filled with a dark color. All the gene clusters of S. albidoflavus G153 were conserved in S. albidoflavus DSM 40455 T, but the data are not shown here because the WGS of S. albidoflavus DSM 40455 T is a draft composed of 66 contig sequences. ant, antimycin [33]; enc, enterocin [34]; fdm, fredericamycin [35]; flk, cyclofaulknamycin [36]; fsc, candicidin [37]; PTMs, polycyclic tetramate macrolactams [38]; sur, surugamide [39]; tot, totopotensamides [40].

The positions of these gene clusters in each chromosome are shown in a diagram using S. albus strains DSM 40763, N11-50 and CAS922 as examples (Figure 6a), because strains DSM 40763 and CAS922 have been confirmed to be S. albus [22] and their WGSs are complete. Although the WGS of S. albus N11-50 is composed of 31 contig sequences, it is more complete than those of the remaining WGS-published S. albus strains. Similarly, strains INA 01303 and ATCC 27968, which were identified as the same new genomospecies in this study, shared twelve gene clusters different from those of S. albus, although INA 01303 and ATCC 27968 harbored one extra PKS gene cluster (filled in red) and one extra NRPS gene cluster (filled in blue), respectively, and the position of one PKS gene cluster differed between the strains (Figure 6b, upper). In contrast, only seven to eight of the gene clusters of the two strains were conserved in S. albidoflavus, and five to six and four gene clusters were specific (filled in a light color) to the putative new species INA 01303/ATCC 27968 (indicated as Streptomyces sp. in Figure 6b) and S. albidoflavus, respectively, although these two species showed approximately 65% DNA-DNA relatedness (Table 1) and are taxonomically close.
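The conservation comparison in this section amounts to comparing presence/absence profiles of gene clusters between strains. A minimal sketch using Python sets is given below. The twelve cluster names for N11-50, the absence of nrps-3 in the type strain and the extra NRPS cluster in NRRL F-5917 are taken from the text; the label "extra-nrps", the helper names `compare_profiles` and `jaccard`, and the use of a Jaccard index as a summary statistic are assumptions introduced for illustration and are not part of the reported analysis.

```python
# Twelve NRPS/PKS gene clusters reported for S. albus N11-50.
N11_50 = frozenset(
    [f"nrps-{i}" for i in range(1, 6)]
    + ["pks/nrps-1"]
    + [f"t1pks-{i}" for i in range(1, 6)]
    + ["t2pks-1"]
)

# Profiles of two other S. albus strains as described in the text.
TYPE_STRAIN = N11_50 - {"nrps-3"}          # type strain lacks nrps-3
NRRL_F_5917 = N11_50 | {"extra-nrps"}      # hypothetical label for the extra NRPS


def compare_profiles(reference, other):
    """Return (shared, missing_in_other, extra_in_other) cluster sets."""
    return reference & other, reference - other, other - reference


def jaccard(a, b):
    """Jaccard similarity of two cluster profiles (illustrative metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


shared, missing, extra = compare_profiles(N11_50, TYPE_STRAIN)
print(len(shared), sorted(missing), sorted(extra))
print(round(jaccard(N11_50, NRRL_F_5917), 3))
```

Such a comparison is only meaningful once orthologous clusters have been matched across genomes, which in the study was done by manual annotation rather than by name.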

Discussion
Our isolate, N11-50, was identified as S. albus, and its genome encoded twelve NRPS and PKS gene clusters. These twelve gene clusters were well conserved in WGS-published S. albus strains. Strain-level diversity in the profile of these gene clusters was low within S. albus: the only differences observed were the lack of nrps-3 in the type strain of S. albus and the presence of an extra NRPS gene cluster in S. albus NRRL F-5917. Apart from such extra gene clusters, these conserved NRPS and PKS gene clusters in principle limit the structural diversity of the nonribosomal peptide and polyketide skeletons that can be synthesized by S. albus to twelve, as elucidated in this study. However, S. albus can be expected to produce new nonribosomal peptide and polyketide compounds because nine clusters, all except nrps-2, nrps-3 and t1pks-2, were orphan and seemed to be BGCs for unknown compounds. Seipke reported that smBGCs are diverse among the strains within S. albus [41], but the author studied not S. albus but S. albidoflavus J1074 and its phylogenetically close, unidentified strains, which were likely neither S. albus nor S. albidoflavus [42]. Type strains were not included. Therefore, that report unfortunately did not actually examine strain-level diversity within S. albus. Very recently, Vela Gurovic et al. reported the core secondary metabolome of S. albus, in which ten NRPS and PKS gene clusters were described [22]. However, the data on these gene clusters were based solely on antiSMASH analysis, and the authors did not carefully review and annotate the gene clusters. Although one PKS gene cluster was reasonably annotated as a BGC of xantholipin, the ibomycin BGC was mistakenly annotated as a hybrid oligosaccharide/T1PKS gene cluster. The metabolites of the other PKS and NRPS gene clusters were not assigned at all. Two type-I PKS gene clusters were inappropriately assigned as hybrids with other types of gene clusters. Nor did the report describe what the extra gene clusters are.
In contrast, we carefully and manually annotated the NRPS and PKS gene clusters, predicted the chemical structures of the products, as shown in Tables 2 and 3 and Figures 4 and 5, and then investigated the strain-level diversity of the NRPS and PKS gene clusters within S. albus. Thus, this is in fact the first report to examine the strain-level diversity of NRPS and PKS gene clusters. Additionally, we investigated the taxonomic positions of three WGS-published S. albus strains that were not studied by Vela Gurovic et al. [22]. We do not include the taxonomic positions of the other WGS-published S. albus strains reported by Vela Gurovic et al. [22] in our present study. Consequently, it was revealed that S. albus G153 and NRRL B-2238 are S. albidoflavus, whereas S. albus INA 01303 belongs to a new genomospecies together with S. violascens ATCC 27968. Had we not confirmed the taxonomic positions of these WGS-published strains, we might have concluded that the NRPS and PKS gene clusters are diverse among strains even within a single species. Hence, species names need to be updated according to the latest criteria for classification.
We also revealed that strains NRRL F-5639, NRRL F-5917, HPH0547 and PHES57 51, which are registered as Streptomyces sp., belong to S. albus. The recent availability of type strains' WGSs enabled us to conduct taxonogenomic classification more easily than before. On the other hand, many WGSs were not complete but draft sequences with several dozen to thousands of contig sequences due to short-read sequencing. Although they can be used for dDDH, they are not appropriate for the analysis of PKS and NRPS gene clusters because many gene clusters are not completely sequenced but fragmented into several contigs. In contrast, since we completely sequenced all twelve gene clusters, this study is free of such issues. As used in this study, long-read sequencing such as PacBio would be better suited to analyzing BGCs encoding large modular enzymes.
We analyzed NRPS and PKS gene clusters in strains of not only S. albus but also S. albidoflavus and a putative new genomospecies with a complete WGS available. Our present study demonstrated that strains belonging to the same species share the same or similar sets of NRPS and PKS gene clusters, and that strains classified as different species do not share similar sets of these gene clusters even if the strains are phylogenetically and/or taxonomically close, as shown in Figure 6. These results strongly support the idea proposed in our previous studies [42]. Many researchers seem to believe that there is no correlation between taxonomic species and secondary metabolites. By accumulating and publishing more examples from Streptomyces strains with updated species names, it can be further clarified whether our idea is widely applicable to the genus Streptomyces.

Conclusions
An albonoursin- and cyclo(Phe-Leu)-producing strain, Streptomyces sp. N11-50, was classified as S. albus and revealed to possess twelve NRPS and PKS gene clusters. These gene clusters were well conserved in the WGS-published strains that belonged to S. albus. Our taxonogenomic analysis revealed that S. albus G153 and NRRL B-2238 were S. albidoflavus; that Streptomyces sp. HPH0547, PHES57 51, NRRL F-5639 and NRRL F-5917 were S. albus; and that S. albus INA 01303 and S. violascens ATCC 27968 represented a new genomospecies. By reclassifying the WGS-published strains appropriately, the species-specific profiles of the NRPS and PKS gene clusters, with little strain-level diversity, were clearly demonstrated.