Genome sequence data of Streptomyces sp. SS52, an endophytic strain for daidzein biosynthesis

We report the biosynthesis of daidzein in Streptomyces sp. SS52, the genome sequence of this strain, and an analysis of the genome to identify putative genes involved in daidzein biosynthesis. Streptomyces sp. SS52 was isolated from the plant Phyllanthus urinaria in Tra Vinh province, Vietnam. This endophytic strain produces the isoflavone daidzein in the culture medium. Streptomyces sp. SS52 possesses a linear genome of 8,184,045 bp with a GC content of 72.5%. Preliminary genome analysis identified homologs of genes involved in the de novo biosynthesis of daidzein. Sequencing the genome of Streptomyces sp. SS52 was essential for studying the biosynthesis of daidzein in Streptomyces bacteria.



Data
Streptomyces sp. SS52 was isolated from Phyllanthus urinaria in Tra Vinh province, Vietnam, and this endophytic strain produced daidzein in the culture medium. The presence of daidzein in the culture medium was confirmed by chromatographic techniques and NMR spectroscopy. Compound 1 was isolated from the culture of Streptomyces sp. SS52 as an amorphous powder. The ¹H NMR spectrum of 1 revealed a 1,2,4-trisubstituted benzene ring with aromatic protons at δH 7.96 (1H, d, J = 8.5 Hz), δH 6.96 (1H, dd, J = 8.5, 2.0 Hz) and δH 7.65 (1H, d, J = 2.0 Hz); a 1,4-disubstituted benzene ring assigned from two ortho-coupled protons at δH 7.37 (1H, d, J = 8.5 Hz) and δH 6.79 (1H, d, J = 8.5 Hz); a singlet olefinic proton at δH 8.28; and a hydroxy group at δH 9.51. The ¹³C NMR spectrum exhibited 13 carbon signals, consisting of one carbonyl carbon (δC 174.6), eight sp² methine carbons, and five substituted sp² carbons in the range of 122–165 ppm (Table 1). These spectroscopic data closely matched those of daidzein [1], indicating that 1 is daidzein.
Streptomyces sp. SS52 was selected for genome sequencing to identify putative genes involved in the daidzein biosynthetic pathway. The assembled genome of Streptomyces sp. SS52 is 8,184,045 bp in size, with a GC content of 72.5% and 156-fold coverage. The complete genome has Average Nucleotide Identity (ANI) values of 99.98% with Streptomyces sp. CC71, 99.97% with Streptomyces rochei NRRL B-2410, and 99.81% with Streptomyces sp. CCM_MD2014. Gene prediction and annotation by the NCBI Prokaryotic Genome Annotation Pipeline predicted a total of 7320 genes, including 6843 protein-coding genes, 67 tRNA genes, 3 ncRNA genes, and 18 rRNA genes (six each of 5S, 16S, and 23S). This total also includes 389 predicted pseudogenes (Table 2).
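As a quick consistency check on the annotation summary above, the reported feature counts can be tallied and compared with the stated total. The sketch below uses only the numbers given in the text. The variable and function names are illustrative, and the estimate of sequenced bases assumes that the 156-fold coverage means total read bases divided by genome size, which the text does not specify.

```python
# Feature counts as reported in the text for Streptomyces sp. SS52.
ANNOTATION = {
    "protein-coding genes": 6843,
    "tRNA genes": 67,
    "ncRNA genes": 3,
    "rRNA genes": 18,  # 5S (6), 16S (6), 23S (6)
    "pseudogenes": 389,
}
REPORTED_TOTAL_GENES = 7320
GENOME_SIZE_BP = 8_184_045
COVERAGE_FOLD = 156


def total_features(counts):
    """Sum all annotated feature classes."""
    return sum(counts.values())


def approx_sequenced_bases(genome_size_bp, coverage_fold):
    """Rough base count, ASSUMING coverage = total read bases / genome size."""
    return genome_size_bp * coverage_fold


if __name__ == "__main__":
    total = total_features(ANNOTATION)
    print(f"Sum of feature classes: {total}")
    print(f"Matches reported total: {total == REPORTED_TOTAL_GENES}")
    bases = approx_sequenced_bases(GENOME_SIZE_BP, COVERAGE_FOLD)
    print(f"Approximate sequenced bases: {bases / 1e9:.2f} Gbp")
```

The reported total of 7320 is reached only when the 389 pseudogenes are counted together with the coding and RNA genes.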
Specification Table: Subject, Biology; Specific subject area, Microbiology, Genomics, Biotechnology.
In plants, daidzein is synthesized via the phenylpropanoid pathway [2]. In this pathway, phenylalanine is first converted to cinnamate by phenylalanine ammonia lyase. Cinnamate is then transformed by cinnamate 4-hydroxylase to p-coumarate, which is next converted to p-coumaroyl-CoA by 4-coumarate-CoA ligase. The p-coumaroyl-CoA starter unit is condensed by chalcone synthase and modified by chalcone reductase to give 4,2′,4′-trihydroxychalcone, which is then converted to 7,4′-dihydroxyflavanone by chalcone isomerase. Finally, 7,4′-dihydroxyflavanone is converted to 7,4′-dihydroxyisoflavone (daidzein) by isoflavone synthase [3]. A homology search of the Streptomyces sp. SS52 genome using BLAST identified genes encoding proteins analogous to the phenylalanine ammonia lyase, cinnamate 4-hydroxylase, 4-coumarate-CoA ligase, chalcone synthase, chalcone reductase, chalcone isomerase, and isoflavone synthase of plants and of Streptomyces clavuligerus (Table 3). Phenylalanine ammonia lyase from the plant Stylosanthes humilis was used as a query against the Streptomyces sp. SS52 genome, and one matching protein, histidine ammonia lyase, was found. This 512-amino-acid protein is encoded by hutH (locus tag E5N77_22775 in the Streptomyces sp. SS52 genome) and shares 31% amino acid identity (48% conserved residues) with the plant phenylalanine ammonia lyase over the whole protein sequence. Similarly, cinnamate 4-hydroxylase from Glycine max has an analogous protein in Streptomyces sp. SS52, a cytochrome P450, with 24% identity (39% functionally conserved amino acids). This cytochrome P450 is encoded by the gene with locus tag E5N77_23955 in the Streptomyces sp. SS52 genome.
The highest scores for amino acid identity and functionally conserved amino acids were found between the 4-coumarate-CoA ligase of Nicotiana tabacum and a 4-coumarate-CoA ligase family protein of Streptomyces sp. SS52. The scores were 42% for amino acid   [4]. The protein analogous to SCLAV_5491 in Streptomyces sp. SS52 is also a cytochrome P450, encoded by the gene with locus tag E5N77_05760. These two cytochrome P450 proteins of S. clavuligerus and Streptomyces sp. SS52 share 34% amino acid identity (48% conserved residues). Finally, a search with G. max isoflavone synthase as the query identified a cytochrome P450 in Streptomyces sp. SS52 with 23% amino acid identity (40% conserved residues) over the whole protein. This cytochrome P450 is encoded by the gene with locus tag E5N77_23955. The genome sequence of Streptomyces sp. SS52 has been deposited in GenBank under the accession number NZ_CP039123.
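The homology search results reported above can be collected into a small table and grouped by locus tag, which makes it easy to see where different query enzymes hit the same gene. The sketch below records only the hits for which the text gives a locus tag together with identity and conserved-residue percentages; the 4-coumarate-CoA ligase hit is left out because those details are incomplete in the text. The `Hit` type and the function names are illustrative and are not part of any BLAST tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query_enzyme: str    # enzyme used as the BLAST query
    query_source: str    # organism the query came from
    locus_tag: str       # matching gene in Streptomyces sp. SS52
    identity_pct: int    # amino acid identity (%)
    conserved_pct: int   # conserved residues (%)


# Values transcribed from the text above.
HITS = [
    Hit("phenylalanine ammonia lyase", "Stylosanthes humilis", "E5N77_22775", 31, 48),
    Hit("cinnamate 4-hydroxylase", "Glycine max", "E5N77_23955", 24, 39),
    Hit("cytochrome P450 SCLAV_5491", "Streptomyces clavuligerus", "E5N77_05760", 34, 48),
    Hit("isoflavone synthase", "Glycine max", "E5N77_23955", 23, 40),
]


def queries_by_locus(hits):
    """Map each SS52 locus tag to the query enzymes that matched it."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.locus_tag].append(hit.query_enzyme)
    return dict(grouped)


def shared_loci(hits):
    """Return only the locus tags matched by more than one query enzyme."""
    return {tag: qs for tag, qs in queries_by_locus(hits).items() if len(qs) > 1}


if __name__ == "__main__":
    for tag, queries in shared_loci(HITS).items():
        print(f"{tag} matched by: {', '.join(queries)}")
```

With the values above, the cinnamate 4-hydroxylase and isoflavone synthase queries both map to the cytochrome P450 at E5N77_23955.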

Experimental design, materials and methods
For isolation and identification of daidzein, Streptomyces sp. SS52 was cultured on SS agar medium [5] at 28 °C for 5 days. The culture medium was then extracted with ethyl acetate (EA), and the solvent was evaporated under vacuum to obtain the EA extract. The EA extract was subsequently re-extracted using solvents of increasing polarity (n-hexane, n-hexane–ethyl acetate 1:1, and ethyl acetate) to afford the corresponding extracts H, HEA, and EA. Extract HEA was applied to normal-phase silica gel column chromatography (CC) and eluted with a solvent system of n-hexane–chloroform–ethyl acetate–acetone–acetic acid (isocratic, 350:100:40:25:10, v/v/v/v/v) to afford seven fractions, HEA1–7. Fraction HEA4 was purified by CC with the same solvent system to afford compound 1. ¹H and ¹³C NMR spectra were acquired on a Bruker AM-500 MHz spectrometer. Chemical shifts in ppm are referenced to the residual solvent signal (DMSO-d6: δH = 2.50, δC = 39.5).
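The five-component eluent ratio given above can be converted into per-solvent volumes for any batch size. The following is a minimal sketch of that arithmetic; the batch volumes are hypothetical examples, and the function name is illustrative. It treats the v/v ratio as simple additive parts, ignoring any volume change on mixing.

```python
# Eluent ratio from the text (v/v/v/v/v).
SOLVENT_PARTS = {
    "n-hexane": 350,
    "chloroform": 100,
    "ethyl acetate": 40,
    "acetone": 25,
    "acetic acid": 10,
}


def component_volumes(parts, total_volume_ml):
    """Split a total volume (mL) among solvents in proportion to their parts."""
    total_parts = sum(parts.values())
    return {
        name: part * total_volume_ml / total_parts
        for name, part in parts.items()
    }


if __name__ == "__main__":
    # Hypothetical batch size chosen for illustration only.
    for name, volume in component_volumes(SOLVENT_PARTS, 1050).items():
        print(f"{name}: {volume:.1f} mL")
```

Because the parts sum to 525, a 525 mL batch uses the ratio values directly as millilitres.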
For genome sequencing using NGS technologies, Streptomyces sp. SS52 was cultured in a baffled Erlenmeyer flask containing Tryptic Soy Broth at 28 °C, with shaking at 180 rpm for 3 days. The mycelium was harvested and washed with distilled water to remove residual medium before DNA extraction. Genomic DNA was extracted using the Qiagen MagAttract HMW kit (Qiagen) according to the manufacturer's instructions. Library preparation and informatics were carried out by SNPSaurus (Eugene, OR). Genomic DNA was converted into sequencing libraries using the PacBio Multiplex kit and protocol (Pacific Biosciences, Menlo Park, CA) and sequenced on a PacBio Sequel using Sequencing Reagent Kit v2.1 at the University of Oregon GC3F facility. DNA was also converted to Illumina libraries using the Illumina Nextera DNA Flex kit (Illumina, San Diego, CA) and sequenced on a HiSeq4000 with paired-end 150 bp reads (Oregon GC3F facility). PacBio Sequel reads were assembled with Canu 1.7 [6] with a genome size of 8 Mbp and the option corOutCoverage = 60.
The Canu assembly was polished with the PacBio raw reads using the Arrow program from PacBio. This consensus was then polished with the Illumina reads using Pilon [7]. Pairwise average nucleotide identity (ANI) between Streptomyces sp. SS52 and other Streptomyces strains in the database was calculated using the JSpecies web server [8]. Gene prediction and annotation were carried out using the NCBI Prokaryotic Genome Annotation Pipeline (http://www.ncbi.nlm.nih.gov/genome/annotation_prok). BLAST was used to search for putative genes encoding proteins analogous to the enzymes participating in daidzein biosynthesis via the phenylpropanoid pathway.
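The assembly and short-read polishing steps described above can be sketched as command lines built in Python. This is an illustration under stated assumptions, not the exact invocation used by the service provider. All file names, the output prefix, and the path to the Pilon jar are hypothetical. Only the Canu settings genomeSize=8m and corOutCoverage=60 come from the text. The Arrow polishing step and the mapping of Illumina reads to the assembly (Pilon expects a sorted, indexed BAM file) are not shown.

```python
import shlex


def canu_command(pacbio_reads, prefix="ss52", out_dir="canu_ss52"):
    """Canu assembly of raw PacBio reads with the settings stated in the text."""
    return [
        "canu",
        "-p", prefix,            # hypothetical output prefix
        "-d", out_dir,           # hypothetical output directory
        "genomeSize=8m",         # genome size of 8 Mbp (from the text)
        "corOutCoverage=60",     # option given in the text
        "-pacbio-raw", pacbio_reads,
    ]


def pilon_command(assembly_fasta, illumina_bam,
                  prefix="ss52_pilon", pilon_jar="pilon.jar"):
    """Pilon polishing with paired-end Illumina reads aligned to the assembly."""
    return [
        "java", "-jar", pilon_jar,   # hypothetical jar location
        "--genome", assembly_fasta,
        "--frags", illumina_bam,     # sorted, indexed BAM of paired-end reads
        "--output", prefix,
    ]


if __name__ == "__main__":
    # Commands are printed rather than executed; file names are placeholders.
    print(shlex.join(canu_command("ss52_subreads.fastq.gz")))
    print(shlex.join(pilon_command("ss52_arrow.fasta", "ss52_illumina.sorted.bam")))
```

Printing the commands keeps the sketch runnable without the tools installed; passing the same lists to `subprocess.run` would execute them.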