Improved De Novo Draft Genome Sequence of the Nocavionin-Producing Type Strain Nocardia terpenica IFM 0706 and Comparative Genomics with the Closely Related Strain Nocardia terpenica IFM 0406

We report an improved de novo draft genome sequence of the human-pathogenic strain Nocardia terpenica IFM 0706T. Resequencing revealed that the genome is larger than the earlier assembly indicated, significantly reduced the number of contigs, and provides a basis for comparison with the closely related strain N. terpenica IFM 0406.

Strain IFM 0706T (= JCM 13033T = DSM 44935T = NBRC 100888T) was isolated in the 1990s from a nocardiosis patient and was originally identified as a Nocardia brasiliensis strain. Together with the strain N. brasiliensis IFM 0406 (1), it was recognized as a new species and reclassified as Nocardia terpenica, with IFM 0706 representing the corresponding type strain of this new species (2). Recently, IFM 0706T was shown to produce the antibiotic nocavionin (3).
In the course of our genome-driven investigations of Nocardia strains (4-6), we noted that a genome sequence of IFM 0706T was available under its synonymous designation N. terpenica NBRC 100888T (GenBank accession number BAGI00000000.1). However, this sequence lacked an annotation and was highly fragmented. In addition, based on a comparison with closely related strains (4, 7), we hypothesized that the reported genome size of 8.63 Mbp might be too small. To close the significant genomic gaps and to increase the genomic resolution, with a focus on secondary metabolism, we resequenced the genome of strain IFM 0706T.
For genomic DNA isolation, a ZR Quick-DNA fungal/bacterial DNA miniprep kit (Zymo Research, Irvine, CA, USA) was used according to the manufacturer's protocol, except that the vortexing step was reduced from 15 to 5 min and conducted at maximum speed. The DNA was sheared using a Covaris g-TUBE, and the genomic library was prepared according to the standard PacBio 6-kb multiplex protocol, followed by size selection with the BluePippin size selection system (Sage Science, Inc.). The library was sequenced on a PacBio Sequel instrument using v3.0 chemistry, including Sequel Polymerase v3.0 and one single-molecule real-time (SMRT) cell v3, resulting in 321,329 reads with a median read length of 4,523 bp. No quality filtering was conducted; however, subreads shorter than 50 bp were discarded. The remaining PacBio long reads were assembled using SMRTLink v7.0.1 and HGAP4 (8, 9). All software settings were kept at their defaults, except for the HGAP4 genome size estimate parameter, which was set to 9 Mbp. Overall, the reads were assembled to a 9,269,950-nucleotide draft genome at 142-fold coverage. The resulting sequence consists of 5 contigs with a G+C content of 68.52%. Gene functional annotation using PGAP v4.11 (10) identified 8,402 coding genes.
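As an illustration only, summary figures of the kind quoted above (contig count, total length, G+C content, and fold coverage) can be recomputed from an assembly FASTA file with a few lines of Python. The sketch below is not part of the published workflow. The function names `parse_fasta`, `assembly_stats`, and `fold_coverage` are hypothetical, and it assumes that G+C content is calculated over all bases in the assembly.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a positional name if the header is empty.
            current = fields[0] if fields else f"record_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def assembly_stats(contigs: Dict[str, str]) -> Dict[str, float]:
    """Return contig count, total length (nt), and G+C content (%)."""
    total = sum(len(seq) for seq in contigs.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    return {
        "contigs": len(contigs),
        "total_length": total,
        # Assumption: G+C is computed over all bases, including any Ns.
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


def fold_coverage(total_read_bases: int, assembly_length: int) -> float:
    """Average fold coverage = sequenced bases / assembly length."""
    return total_read_bases / assembly_length


if __name__ == "__main__":
    # Toy placeholder sequences, not real data.
    toy = ">contig1 example\nGGCC\nAT\n>contig2\nATAT\n"
    print(assembly_stats(parse_fasta(toy)))
```

Applied to the deposited assembly, such a function would be expected to give values close to those reported here, although tools that exclude ambiguous bases from the G+C calculation may differ slightly.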
In summary, the resequencing of strain IFM 0706T enabled us to increase the quantity (from 8.63 Mbp to 9.27 Mbp) and quality of genomic information, to significantly reduce the number of contigs (from 4,460 down to 5), to correct the G+C content (from 68.30 to 68.52%), and to provide the annotation.
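The size of these improvements follows directly from the figures given in the summary. The short sketch below simply restates that arithmetic. The dictionary keys and the function name `compare_assemblies` are hypothetical, and the input values are the ones quoted in the text.

```python
from typing import Dict

# Values quoted in the text for the earlier and the improved assembly.
OLD = {"size_mbp": 8.63, "contigs": 4460, "gc_percent": 68.30}
NEW = {"size_mbp": 9.27, "contigs": 5, "gc_percent": 68.52}


def compare_assemblies(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Summarize the change between two sets of assembly statistics."""
    gain = new["size_mbp"] - old["size_mbp"]
    return {
        "size_gain_mbp": round(gain, 2),
        "size_gain_percent": 100.0 * gain / old["size_mbp"],
        "contig_fold_reduction": old["contigs"] / new["contigs"],
        "gc_shift_points": round(new["gc_percent"] - old["gc_percent"], 2),
    }


if __name__ == "__main__":
    for key, value in compare_assemblies(OLD, NEW).items():
        print(f"{key}: {value:.2f}")
```

With these inputs, the assembly gains about 0.64 Mbp (roughly 7% of the earlier size), the contig count falls 892-fold, and the G+C content shifts by 0.22 percentage points.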
Data availability. This whole-genome sequencing project has been deposited at DDBJ/ENA/GenBank under the accession number JABMCZ000000000. The corresponding raw sequencing data set has been registered in the NCBI SRA database under the accession number SRR11861893.