Complete genome sequence of Streptomyces formicae KY5, the formicamycin producer

Here we report the complete genome of the new species Streptomyces formicae KY5, isolated from Tetraponera fungus-growing ants. S. formicae was sequenced using the PacBio and 454 platforms to generate a single linear chromosome with terminal inverted repeats. Illumina MiSeq sequencing was used to correct base changes resulting from the high error rate associated with PacBio. The genome is 9.6 Mbp, has a GC content of 71.38% and contains 8162 protein coding sequences. Predictive analysis shows that this strain encodes at least 45 gene clusters for the biosynthesis of secondary metabolites, including a type 2 polyketide synthase encoding cluster for the antibacterial formicamycins. Streptomyces formicae KY5 is a new, taxonomically distinct Streptomyces species and this complete genome sequence provides an important marker in the genus Streptomyces.

Streptomyces formicae KY5 is a new species, isolated from the African plant ant Tetraponera penzigi (Qin et al., 2017; Seipke et al., 2013). These ants nest inside specialised hollow swellings called domatia in their host Acacia plants, where they grow a fungus as food (Blatrix et al., 2012). In return for housing, they protect their host plants from large herbivores, including elephants (Palmer et al., 2008). S. formicae produces an unidentified antifungal compound, which is active against the multidrug-resistant human pathogen Lomentospora prolificans, and a group of pentacyclic polyketides, called formicamycins, which have potent antibacterial activity against clinical MRSA and VRE isolates. S. formicae has biosynthetic gene clusters (BGCs) encoding at least 45 additional natural products. It is amenable to genetic engineering using Cas9-mediated genome editing, which was used to remove the entire ∼40 kbp formicamycin BGC (Qin et al., 2017).
High molecular weight genomic DNA was prepared using the salting-out method (Kieser et al., 2000). Sequencing was performed at the Earlham Institute (Norwich, UK) using Pacific Biosciences (PacBio) RSII SMRT technology. Assembly using the HGAP2 pipeline gave one large contig of 9.3 Mbp (unitig 9) and three smaller fragments of 179 kbp (unitig 1), 111 kbp (unitig 2) and 27 kbp (unitig 10). Roche 454 sequencing was also performed at the Earlham Institute, yielding 615 contigs (N50 31,695 bp) with the Newbler assembler v 2.3. Overlapping PacBio and 454 contigs were then aligned using BLAST (Camacho et al., 2009), and fragments were merged to form a single contig and to extend the terminal inverted repeats (Fig. S1).
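As an illustration of the N50 statistic quoted for the assemblies above, the following is a minimal sketch rather than the Newbler or HGAP2 implementation. The function name `n50` is hypothetical, and the unitig lengths are the rounded values given in the text, not the exact contig sizes.

```python
def n50(lengths):
    """Return the N50: the length of the contig at which the running
    total of contigs, sorted longest first, reaches half the assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Rounded PacBio unitig sizes taken from the text (assumption: exact
# lengths differ slightly from these rounded figures).
pacbio_unitigs = [9_300_000, 179_000, 111_000, 27_000]
print(n50(pacbio_unitigs))

# Toy contig set to show the behaviour on a more fragmented assembly.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

For the PacBio assembly the single large unitig dominates, so the N50 is simply its length; the statistic is more informative for fragmented assemblies such as the 615-contig 454 assembly.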
Illumina sequencing was then carried out at the DNA Sequencing Facility, Department of Biochemistry, University of Cambridge, UK, using TruSeq PCR-free and Nextera Mate Pair libraries and a MiSeq 600 sequencer. Reads were mapped to the single contig generated above, and variants were called using breseq software (Barrick et al., 2014). Errors in the single contig, which likely arose through the higher error rate of PacBio (as seen for the S. leeuwenhoekii genome (Gomez-Escribano et al., 2015)), were predicted as variants by breseq and manually corrected. In total, 124 errors were corrected (Table S1). Most were associated with runs of Gs or Cs (83 additions and 37 deletions of bases), and 4 were associated with small repeat sequences. The resulting genome sequence was annotated by calling the open reading frames (ORFs) using Prodigal (Hyatt et al., 2010), and then further annotated with RAST (Aziz et al., 2008) (Fig. S2). The tRNA and rRNA genes were also predicted using RAST. The fully annotated genome sequence was submitted to NCBI GenBank and assigned the accession number CP022685. The 454 contigs, the initial PacBio assembly, and the PacBio and Illumina reads (Mate Pair and TruSeq) were deposited at the NCBI Sequence Read Archive with accession number SRP117343.
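Because most corrections fell in runs of Gs or Cs, it can be useful to locate such runs in a sequence before inspecting variant calls. The sketch below is illustrative only and is not part of breseq or of the workflow described here. The function name `gc_homopolymer_runs` and the `min_len=5` threshold are assumptions; only the correction counts come from the text.

```python
import re


def gc_homopolymer_runs(seq, min_len=5):
    """Yield (start, end, base) for each run of G or C that is at least
    min_len long. Coordinates are 0-based and end-exclusive.
    min_len=5 is an arbitrary illustrative threshold."""
    pattern = re.compile(r"G{%d,}|C{%d,}" % (min_len, min_len))
    for match in pattern.finditer(seq.upper()):
        yield match.start(), match.end(), match.group()[0]


# Toy sequence, not taken from the S. formicae genome.
toy = "ATGGGGGGCATCCCCCTTGGGA"
print(list(gc_homopolymer_runs(toy)))

# Correction counts as reported in the text; the three categories
# account for all 124 corrected errors.
corrections = {"additions": 83, "deletions": 37, "small_repeats": 4}
print(sum(corrections.values()))
```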
The final polished assembly represents the complete genome sequence of S. formicae KY5. It comprises a single linear chromosome of 9,611,874 bp with a G + C content of 71.38% (Fig. 1 and Table 1). The origin of DNA replication, oriC, is centrally located between the dnaA and dnaN genes (4,801,506-4,802,428 bp). Terminal inverted repeats (TIRs) of 35,482 bp are present at the ends of the chromosome, with each TIR ending in a terminal associated helicase ttrA gene (Huang et al., 2003). The genome contains 8162 predicted ORFs and encodes six rRNA operons and 65 tRNAs. Genes predicted to encode ORFs were given identifiers of "KY5" followed by 4 digits. The rRNAs and tRNAs were given identifiers "KY5_rRNA" and "KY5_tRNA" respectively, followed by 4 digits.
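The summary statistics above can be reproduced from a sequence with a few lines of code. The following is a minimal sketch, not the method used for the published figures. The helper names are hypothetical, `tir_length` assumes a perfect (mismatch-free) inverted repeat, and the sequence is a toy example. Only the oriC coordinates and the chromosome length come from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Return the G + C content of seq as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tir_length(seq):
    """Length of the perfect terminal inverted repeat: the longest prefix
    of seq that equals the reverse complement of its suffix.
    Assumption: no mismatches are tolerated, unlike in real TIRs."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = 0
    limit = len(seq) // 2
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Toy linear "chromosome": an 8 bp arm, a unique core, and the
# reverse complement of the arm at the other end.
arm = "CCGGATCG"
toy = arm + "TTTTAAAACG" + reverse_complement(arm)
print(tir_length(toy))
print(gc_content(toy))

# Relative position of oriC, using the coordinates given in the text.
genome_length = 9_611_874
oric_midpoint = (4_801_506 + 4_802_428) / 2
print(oric_midpoint / genome_length)
```

The last value is close to 0.5, consistent with the description of oriC as centrally located.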
To assess the secondary metabolite biosynthetic potential, we used antiSMASH v 4.0, which predicts 34 BGCs, although manual inspection suggests that several of these may be islands comprising multiple BGCs, taking the minimum number to 45 (Table 2). There is a single type 2 PKS gene cluster, which is responsible for biosynthesis of the formicamycins (Qin et al., 2017), and BGCs encoding three type 1 PKSs, eleven NRPSs and a number of hybrid BGCs. Six BGCs encode terpenes, including the antibiotic albaflavenone made by Streptomyces coelicolor (Challis, 2013), and there are at least two siderophore BGCs, including one for desferrioxamine B.

Fig. 1. Map of the Streptomyces formicae KY5 genome. The outer scale is numbered in intervals of 0.5 Mbp. Circles 1 and 2 display the ORFs on the forward strand (blue) and reverse strand (green) respectively. Circle 3 displays the TIRs (grey). Circle 4 displays the tRNA genes (teal). Circle 5 displays the rRNA genes (pink). Circle 6 displays the GC percentage plot (black above average, red below average). Circle 7 displays the GC skew (lime green above average, purple below average). The origin of replication is marked oriC. The genome map was made using DNAPlotter v 10.2 (Carver et al., 2009). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2. The 34 BGCs predicted by antiSMASH v 4.0 are numbered, but manual inspection revealed that some of these are islands of two or more BGCs, giving 45 BGCs in total. BGCs within the antiSMASH called clusters are annotated a, b, c, d.

Conflicts of interest
The authors declare no conflicts of interest.