Genome sequence of the halotolerant bacterium Corynebacterium halotolerans type strain YIM 70093T (= DSM 44683T)

Corynebacterium halotolerans Chen et al. 2004 is a member of the genus Corynebacterium, which contains Gram-positive bacteria with a high G+C content. C. halotolerans, isolated from a saline soil, belongs to the non-lipophilic, non-pathogenic corynebacteria. It displays a high tolerance to salts (up to 25%) and is related to the pathogenic corynebacteria C. freneyi and C. xerosis. As this is a type strain in a subgroup of Corynebacterium without complete genome sequences, this project, which describes the 3.14 Mbp chromosome and the 86.2 kbp plasmid pCha1 with their 2,865 protein-coding and 65 RNA genes, will aid the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain YIM 70093T (= DSM 44683T) is the type strain of the species Corynebacterium halotolerans [1] and was originally isolated from saline soil in Xinjiang Province in western China. The genus Corynebacterium comprises Gram-positive bacteria with a high G+C content. It currently contains over 80 members [2] isolated from diverse sources such as human clinical samples [3] and animals [4], but also from soil [5] and ripening cheese [6]. Within this diverse genus, C. halotolerans has been proposed to form a subclade together with C. freneyi and C. xerosis [1]. Data concerning salt tolerance are not available for most corynebacteria, but C. halotolerans YIM 70093T displays the highest resistance to salt (up to 25%) described for Corynebacterium so far. Here we present a summary classification and a set of features for C. halotolerans YIM 70093T, together with a description of the genomic sequencing and annotation.

Classification and features
A representative genomic 16S rRNA sequence of C. halotolerans YIM 70093T was compared to the Ribosomal Database Project database [7], confirming the initial taxonomic classification. Addition of the recently published species C. maris Coryn-1T [8], C. marinum 7015T [9] and C. humireducens MFC-5T [10], as well as C. diphtheriae NCTC 11397T [11], indicates that C. halotolerans YIM 70093T, together with C. maris, C. marinum, and C. humireducens, forms a distinct subclade within the genus Corynebacterium. Interestingly, C. xerosis and C. freneyi do not group closely with this subclade when C. diphtheriae is added to the comparison. Figure 1 shows the phylogenetic neighborhood of C. halotolerans in a 16S rRNA-based tree. The sequences of the four identical 16S rRNA gene copies in the genome differ by eight nucleotides from the previously published 16S rRNA sequence (AY226509), which contains two ambiguous bases.
C. halotolerans YIM 70093T is Gram-positive and cells are rod-shaped, 0.5-1 μm long and 0.25-0.5 μm wide (Table 1 and Figure 2). It is described as non-motile [1], which is consistent with a complete lack of genes associated with 'cell motility' (functional category N). Optimal growth of YIM 70093T was shown to occur at 28°C, pH 7.2 and 100 g/l KCl, although the strain tolerates a wide range of salinity, between 0 and 250 g/l KCl, NaCl, and MgCl2 [1]. Carbon sources utilized by strain YIM 70093T include glucose, galactose, sucrose, arabinose, mannose, mannitol, maltose, xylose, ribose, salicin, dextrin, and starch [1], although the latter is doubtful as C. halotolerans cannot hydrolyze starch [1].
Figure 1. 16S rRNA-based tree showing the phylogenetic neighborhood of C. halotolerans … [1]. In addition, the recently described C. maris, C. marinum, and C. humireducens were added, as they were shown to be closely related. Furthermore, the type strain of the genus, C. diphtheriae [11], was included. Species with at least one publicly available genome sequence (not necessarily the type strain) are highlighted in bold face. The tree is based on sequences aligned by the RDP aligner, utilizes the Jukes-Cantor corrected distance model to construct a distance matrix based on alignment model positions without the use of alignment inserts, and uses a minimum comparable position of 200. The tree is built with RDP Tree Builder, which uses Weighbor [12] with an alphabet size of 4 and length size of 1,000. The building of the tree also involves a bootstrapping process repeated 100 times to generate a majority consensus tree [13]. Rhodococcus equi (X80614) was used as an outgroup.
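The Jukes-Cantor correction named in the tree description converts the observed fraction of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The following Python sketch illustrates that calculation only; it is not the RDP implementation. The function names, the toy sequences and the handling of gaps and ambiguous bases (such columns are simply skipped) are assumptions made for illustration. The `min_comparable` parameter mirrors the minimum comparable position of 200 stated above.

```python
import math


def p_distance(seq_a: str, seq_b: str, min_comparable: int = 200) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are
    skipped (an assumption of this sketch, not a documented RDP rule).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    # Treat RNA 'U' as 'T' so rRNA and rDNA sequences compare cleanly.
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    for a, b in zip(a_norm, b_norm):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared < min_comparable:
        raise ValueError("too few comparable positions")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, far shorter than a real 16S gene).
    a = "ACGTACGTAC-GTACGTACGT"
    b = "ACGTACGAAC-GTACGTTCGT"
    p = p_distance(a, b, min_comparable=1)
    print(f"p = {p:.3f}, Jukes-Cantor d = {jukes_cantor(p):.3f}")
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site; the difference is small for closely related 16S rRNA sequences.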

Genome sequencing and annotation
Genome project history
C. halotolerans YIM 70093T was selected for sequencing as part of a project to define the core genome and pan genome of the non-pathogenic corynebacteria due to its phylogenetic position and interesting capabilities, i.e. high salt tolerance. Although it is not part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project [26], sequencing of the type strain will nonetheless aid the GEBA effort. The genome project is deposited in the Genomes On Line Database [27] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the Center of Biotechnology (CeBiTec). A summary of the project information is shown in Table 2.
Standards in Genomic Sciences

Growth conditions and DNA isolation
C. halotolerans strain YIM 70093T, DSM 44683, was grown aerobically in CASO broth (Carl Roth GmbH, Karlsruhe, Germany) at 30°C. DNA was isolated from ~10^8 cells using the protocol described by Tauch et al. 1995 [28].

Genome sequencing and assembly
The genome was sequenced using a 454 sequencing platform. A standard 3k paired-end sequencing library was prepared according to the manufacturer's protocol (Roche). Pyrosequencing reads were assembled using the Newbler assembler v2.3 (Roche). The initial Newbler assembly consisted of 81 contigs in six scaffolds with an additional 26 lone contigs. Analysis of the six scaffolds revealed that one was an extrachromosomal element (plasmid pCha1) and four made up the chromosome, while the remaining one contained the four copies of the rrn operon, which caused the scaffold breaks. The scaffolds were ordered based on alignments to the complete genomes of C. glutamicum [29] and C. efficiens [30], with subsequent verification by restriction digestion, Southern blotting and hybridization with a 16S rDNA-specific probe. The Phred/Phrap/Consed software package [31][32][33][34] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, gaps between contigs were closed by editing in Consed (for repetitive elements) and by PCR with subsequent Sanger sequencing (IIT Biotech GmbH, Bielefeld, Germany). A total of 61 additional reactions were necessary to close gaps not caused by repetitive elements. To raise the quality of the assembled sequence, Illumina reads were used to correct potential base errors and increase consensus quality. A WGS library was prepared using the Illumina-Compatible Nextera DNA Sample Prep Kit (Epicentre, WI, U.S.A.) according to the manufacturer's protocol. The library was sequenced in an 80 bp single-read GAIIx run, yielding 1,497,321 total reads. Together, the Illumina and 454 sequencing platforms provided 46.0× coverage of the genome.
Table footnote (fragment): … not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [25].
Table 2 (fragment): Project relevance: Industrial, GEBA
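As a rough plausibility check on the coverage figure, sequencing depth can be estimated as (number of reads × read length) / genome size. The sketch below applies this to the Illumina run only. It assumes that every read is a full 80 bp and that all reads contribute to the assembly, so the result is an upper bound; the genome size is approximated from the rounded chromosome and plasmid sizes given in the abstract. The source reports only the combined 46.0× value, so the per-platform split is not stated there and is not derived here.

```python
def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


# Approximate genome size from the rounded values in the abstract
# (3.14 Mbp chromosome + 86.2 kbp plasmid); an assumption of this sketch.
GENOME_SIZE_BP = 3_140_000 + 86_200

# Illumina GAIIx run: 1,497,321 single reads of 80 bp.
illumina_depth = expected_coverage(1_497_321, 80, GENOME_SIZE_BP)

if __name__ == "__main__":
    print(f"Upper-bound Illumina depth: {illumina_depth:.1f}x")
```

Under these assumptions the Illumina run alone would account for most of the reported depth, which fits its stated role of correcting base errors in the 454-based assembly.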

Genome properties
The genome includes one plasmid, for a total size … [… Figure 4]. For the main chromosome, 2,856 genes were predicted, 2,791 of which are protein-coding genes. Of the protein-coding genes, 1,632 (57%) were assigned a putative function, with the remaining genes annotated as hypothetical proteins. A total of 1,914 protein-coding genes belong to 396 paralogous families in this genome, corresponding to a gene content redundancy of 66.8%. The properties and statistics of the genome are summarized in Tables 3, 4 and 5.
Table footnote: a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
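The percentages in this section can be reproduced with simple arithmetic. The sketch below assumes that the denominator is the 2,865 protein-coding genes of the whole genome (chromosome plus plasmid) reported in the abstract, rather than the 2,791 chromosomal genes; this assumption is not stated explicitly in the text, but it yields the reported 57% and 66.8%. The variable and function names are illustrative.

```python
# Gene counts taken from the text; the choice of denominator is an assumption.
TOTAL_PROTEIN_CODING = 2865   # whole genome, from the abstract
WITH_PUTATIVE_FUNCTION = 1632
IN_PARALOGOUS_FAMILIES = 1914


def percent_of_total(count: int, total: int = TOTAL_PROTEIN_CODING) -> float:
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)


function_share = percent_of_total(WITH_PUTATIVE_FUNCTION)
redundancy = percent_of_total(IN_PARALOGOUS_FAMILIES)

if __name__ == "__main__":
    print(f"Genes with putative function: {function_share}%")
    print(f"Gene content redundancy: {redundancy}%")
```

With the chromosomal count of 2,791 as denominator, the share of genes with a putative function would instead come out near 58.5%, which is why the whole-genome total is assumed here.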