High quality draft genome of Nakamurella lactea type strain, a rock actinobacterium, and emended description of Nakamurella lactea

Nakamurella lactea DLS-10T, isolated from rock in Korea, is one of the four type strains of the genus Nakamurella. In this study, we describe the high quality draft genome of N. lactea DLS-10T and its annotation. A summary of phenotypic data collected from previously published studies is also included. The genome of strain DLS-10T has a size of 5.82 Mbp, 5100 protein-coding genes, and a G + C content of 68.9%. Based on the genome analysis, an emended description of N. lactea in terms of G + C content is also proposed.


Introduction
The genus Nakamurella belongs to the order Nakamurellales [1] and is one of the rare genera in the class Actinobacteria [2]. It is the sole and type genus of the family Nakamurellaceae, which replaced the family Microsphaeraceae [2] in 2004 [3]. The genus and family names were assigned in honour of the microbiologist Kazunori Nakamura [4].
N. lactea was originally described as Saxeibacter lacteus [9], the type species of one of the three genera comprised in the family Nakamurellaceae. Then, in light of 16S rRNA and rpoB gene sequence similarities and chemotaxonomic features [6], the species was reclassified into the genus Nakamurella. Nakamurella lactea is represented by the type strain DLS-10 T (= DSM 19367 T = JCM 16024 T = KCTC 19285 T).
The availability of the genome of one more species in the genus will provide vital baseline information for a better understanding of the ecology of these rare actinobacteria and their potential as a source of bioactive natural products. In the present study, we summarise the phenotypic, physiological and chemotaxonomic features of N. lactea DLS-10 T together with the genomic data.

Organism information
Classification and features
N. lactea DLS-10 T was isolated from a rock collected on the parasitic volcano Darangshi Oreum at 300 m above sea level on Jeju Island, Republic of Korea (latitude 33.51, longitude 126.52) [9]. Lee et al. [9] and Kim et al. [4,6] showed that its cells are aerobic, non-motile, non-spore-forming and non-mycelium-forming short rods, 0.4-0.7 μm in diameter and 0.9-1.0 μm in length (Fig. 1), that produce cream-coloured colonies on TSA medium. A summary of the classification and general features of N. lactea strain DLS-10 T is presented in Table 1. Additional phenotypic features can be found in Lee et al. and Kim et al. [6,9].
Only four species are currently classified in the genus: N. panacisegetis and N. flavida, isolated from soil; N. lactea, isolated from rock; and N. multipartita, isolated from sludge.
Owing to this limited number of characterised species, the ecological diversity and the biotechnological potential of the members of the genus Nakamurella remain to be studied in depth.
Phylogenies based on 16S rRNA gene sequences included in this manuscript were inferred using the GGDC web server [11] implementation of the DSMZ phylogenomics pipeline [12]. The multiple alignment was created with MUSCLE [13], and maximum likelihood (ML) and maximum parsimony (MP) trees were inferred from it with RAxML [14] and TNT [15], respectively. For ML, rapid bootstrapping in conjunction with the autoMRE bootstopping criterion [16] and a subsequent search for the best tree were used; for MP, 1000 bootstrapping replicates were used in conjunction with tree-bisection-and-reconnection branch swapping and ten random sequence addition replicates. This analysis shows the family Nakamurellaceae [4] as the sister group of the families Cryptosporangiaceae, Sporichthyaceae, and Geodermatophilaceae. The monophyly of the genus Nakamurella was supported by (close to) maximum bootstrap values under ML and MP (Fig. 2).

Genome sequencing information
Genome project history
N. lactea DLS-10 T (DSM 19367 T ) was selected for sequencing on the basis of its phylogenetic position [17,18], and is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project [19], a follow-up of the Genomic Encyclopedia of Bacteria and Archaea pilot project [20], which aims at increasing the sequencing coverage of key reference microbial genomes and at generating a large genomic basis for the discovery of genes encoding novel enzymes [21]. KMG-I is the first of the production phases of the "Genomic Encyclopedia of Bacteria and Archaea: sequencing a myriad of type strains" initiative [22] and a Genomic Standards Consortium project [23]. The project and the genome sequence are deposited in the Genome OnLine Database [24] and GenBank under the accession number AUFT00000000.1. Table 2 summarises the genome sequencing project.

Table 1 Classification and general features of Nakamurella lactea strain DLS-10 T , according to the MIGS recommendations [36] as developed by [22]
MIGS ID | Property | Term | Evidence code a
 | Classification | Domain Bacteria | TAS [39]
 | | Phylum Actinobacteria | TAS [40]
 | | Class Actinobacteria | TAS [2]
 | | Order Nakamurellales | TAS [1]
 | | Family Nakamurellaceae | TAS [41]
 | | Genus Nakamurella | TAS [3,41]
 | | Species Nakamurella lactea |
 | | Type strain DLS-10 | TAS [6,9]
 | Gram stain | Positive | TAS [6,9]
 | Cell shape | Rod | TAS [6,9]
 | Motility | Non-motile | TAS [6,9]
 | Sporulation | Non-sporulating | NAS [6,9]
 | Temperature range | 4-37°C | TAS [6,9]
 | Optimum temperature | 25°C | TAS [6,9]
 | pH range | 5.1-9.1 | TAS [6,9]
 | pH | |
a Evidence codes are from the Gene Ontology project [42]. TAS: traceable author statement (i.e., a direct report exists in the literature); NAS: non-traceable author statement

Growth conditions and genomic DNA preparation
A culture of N. lactea DLS-10 T was prepared in DSM medium 65 [25] at 28°C. Genomic DNA was extracted using the MasterPure™ Gram Positive DNA Purification Kit (Epicentre MGP04100) following the standard protocol provided by the manufacturer, modified by incubation on ice overnight on a shaker, the use of an additional 1 μl proteinase K, and the addition of 7.5 units achromopeptidase, 7.5 μg/μl lysostaphin, 1050.0 units lysozyme, and 7.5 units mutanolysin. DNA is available from DSMZ through the DNA Bank Network [26].

Genome sequencing and assembly
The draft genome of N. lactea DLS-10 T was generated at the DOE Joint Genome Institute (JGI) using Illumina technology [27]. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 13,910,936 reads totalling 2,086.6 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw Illumina sequence data was passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artefacts (unpublished results). The following steps were then performed for assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04) [28], (2) 1-3 kb simulated paired end reads were created from Velvet contigs using wgsim (https://github.com/lh3/wgsim), (3) Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r42328) [29]. Parameters for assembly steps were

Genome annotation
The draft genome sequence was annotated using the JGI Prokaryotic Automatic Annotation Pipeline [30] with additional manual review using the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [31]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [32] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [33]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [34]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [35,36] developed by the Joint Genome Institute, Walnut Creek, CA, USA [37].

Genome properties
The genome of N. lactea DLS-10 T is 5,820,860 bp in size and contains 5100 protein-coding genes, 3 rRNA genes (5S, 16S and 23S rRNA) and 59 tRNA genes. A G + C content of 68.9% was calculated. More genome details are listed in Tables 3 and 4.
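As a rough consistency check on the figures reported for sequencing and assembly, the mean read length and the raw sequencing depth can be estimated from the read count, the total sequence yield and the genome size. The sketch below is illustrative only: the function and constant names are our own, and the depth is computed from raw (unfiltered) data, so it is an upper bound rather than the depth of the final assembly.

```python
# Figures quoted in the text; names are hypothetical and for illustration only.
TOTAL_READS = 13_910_936        # Illumina reads generated
TOTAL_BASES = 2_086.6e6         # 2,086.6 Mb of raw sequence
GENOME_SIZE = 5_820_860         # bp, size of the draft genome


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def raw_coverage(total_bases: float, genome_size: int) -> float:
    """Raw sequencing depth (fold coverage) before any read filtering."""
    return total_bases / genome_size


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_BASES, TOTAL_READS):.0f} bp")
    print(f"raw coverage: {raw_coverage(TOTAL_BASES, GENOME_SIZE):.0f}x")
```

With the reported values this gives a mean read length of about 150 bp and a raw depth of roughly 358-fold.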

Conclusion
The genome of N. lactea will be used to study, for the first time, its potential as a source of bioactive natural products and the correlation between these rare soil bacteria and their habitat. According to [38], the within-species deviation in genomic G + C content is at most 1%. The range of 70.4-74.3% given by Kim et al. [6] is thus both too broad and too far from the 68.9% calculated from the genome sequence, as is the value of 74.3% provided by Lee et al. [9]. This calls for an emendation of the species description [38].
Emended description of Nakamurella lactea (Lee et al. [9]) Kim et al. [6]
The properties are as given in the species description by Kim et al. [6] with the following emendation. Based on the genomic data, the G + C content is 68.9%.
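The reasoning behind the emendation can be illustrated with a short sketch that compares the previously published G + C values with the genomic value, using the 1% within-species threshold from [38]. The helper names are hypothetical, and `gc_content` is a generic calculation over unambiguous bases, not necessarily the method used by the annotation pipeline.

```python
GENOMIC_GC = 68.9      # % G + C calculated from the draft genome
MAX_DEVIATION = 1.0    # % maximum within-species deviation, after [38]


def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def needs_emendation(reported: float,
                     genomic: float = GENOMIC_GC,
                     tolerance: float = MAX_DEVIATION) -> bool:
    """True if a reported value deviates from the genomic one by more than the tolerance."""
    return abs(reported - genomic) > tolerance


# Previously published values discussed in the text.
reported_values = {
    "Lee et al. [9]": 74.3,
    "Kim et al. [6], lower bound": 70.4,
    "Kim et al. [6], upper bound": 74.3,
}

if __name__ == "__main__":
    for source, value in reported_values.items():
        print(f"{source}: {value}% -> emendation needed: {needs_emendation(value)}")
```

All three published values differ from 68.9% by more than one percentage point (1.5 to 5.4 points), which is consistent with the emendation proposed here.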