Draft genome sequence of Halopiger salifodinae KCY07-B2T, an extremely halophilic archaeon isolated from a salt mine

Halopiger salifodinae strain KCY07-B2T, isolated from a salt mine in Kuche county, Xinjiang province, China, belongs to the family Halobacteriaceae. It is a strictly aerobic, pleomorphic, rod-shaped, Gram-negative and extremely halophilic archaeon. In this work, we report the features of the type strain KCY07-B2T, together with the draft genome sequence and its annotation. The draft genome sequence consists of 83 contigs spanning 4,350,718 bp with a G + C content of 65.41 %, and contains 4204 protein-coding genes and 50 RNA genes.


Introduction
The genus Halopiger, which belongs to the family Halobacteriaceae, was established in 2007 by Gutiérrez et al. [1]. The type species of the genus is Halopiger xanaduensis SH-6T. To date, the genus comprises three validly published species and two effectively but not validly published species: H. xanaduensis [1], Halopiger aswanensis [2], Halopiger salifodinae [3], Halopiger djelfamassiliensis [4] and Halopiger goleamassiliensis [5]. These species were isolated from hypersaline environments such as salt lake sediment [1,4,5], hypersaline soil [2] and a salt mine [3]. All are Gram-negative, strictly aerobic and extremely halophilic [1][2][3][4][5]. Three genome sequences from this genus have been reported in Standards in Genomic Sciences [4][5][6]: the finished genome of H. xanaduensis SH-6T and the draft genomes of H. djelfamassiliensis IIH2T and H. goleamassiliensis IIH3T. No genome sequence is available for H. aswanensis 56T, which shows the highest 16S rRNA gene similarity to H. xanaduensis SH-6T (99.1 %). Here we present a summary of the classification and a set of features of strain H. salifodinae KCY07-B2T, together with a description of the non-contiguous finished genomic sequencing and annotation.

Classification and features
A representative genomic 16S rRNA gene sequence of H. salifodinae KCY07-B2T was compared with sequences deposited in the GenBank database using BLASTN [7]. The 16S rRNA gene sequence analysis showed that H. salifodinae KCY07-B2T shared the highest sequence identity with H. xanaduensis SH-6T (95.8 %), followed by H. aswanensis 56T (95.5 %), H. djelfamassiliensis IIH2T (94.9 %) and H. goleamassiliensis IIH3T (94.8 %), and shared lower sequence similarities (<94.8 %) with species of other genera. The phylogenetic tree was reconstructed by the neighbor-joining method using MEGA 5, with Kimura's 2-parameter model for distance calculation [8,9]. The tree was assessed by bootstrapping with 1000 replications, and the consensus tree is shown in Fig. 1.
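The Kimura 2-parameter distance used for the tree can be illustrated with a minimal Python sketch. It is not the MEGA 5 implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous sites are assumptions made for illustration. With P the proportion of transitions and Q the proportion of transversions among compared sites, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Assumption for this sketch: sites with gaps or ambiguous bases in either
    sequence are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = 0
    transversions = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two aligned 16S rRNA gene sequences, `k2p_distance(seq1, seq2)` gives one entry of the pairwise distance matrix from which a neighbor-joining tree would be built.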
H. salifodinae KCY07-B2T can tolerate high salinity (5.4 M NaCl) and high temperature (50°C) [3]. Cells lyse in distilled water. Optimal growth of strain KCY07-B2T occurred in medium NOM-3 with 2.9-3.4 M NaCl [3]. The optimum temperature was 37-45°C. The optimum pH was 7.0, with a growth range of pH 6.0-8.0 [3]. Cells of strain KCY07-B2T are strictly aerobic, non-motile and pleomorphic rod-shaped (Fig. 2). Several sugars, organic acids and amino acids can serve as sole carbon and energy sources, and amino acids are not required in the growth medium [3]. The features of H. salifodinae KCY07-B2T are listed in Table 1.

Genome project history
This genome was selected for sequencing on the basis of its phylogenetic position and 16S rRNA sequence similarity to other members of the genus Halopiger. The whole genome shotgun project of strain H. salifodinae KCY07-B2T was deposited at DDBJ/EMBL/GenBank under the accession number JROF00000000. The sequence consists of 83 contigs, which further assembly joined into 81 scaffolds. Table 2 shows the project information and its association with MIGS version 2.0 compliance [10].

Genome sequencing and assembly
The genome of H. salifodinae KCY07-B2T was sequenced using Solexa paired-end sequencing technology (HiSeq 2000 system, Illumina, Inc., USA) [12]. Two paired-end shotgun libraries were constructed: one with a 500 bp span (~500 Mb available reads, ~130-fold genome coverage) and one with a 2000 bp span (~250 Mb available reads, ~65-fold genome coverage). The sequence data were assembled with SOAPdenovo v.1.05 [13][14][15]. The final assembly contained 83 contigs in 81 scaffolds (minimum length 523 bp), giving a genome size of 4.35 Mb. The quality of the sequencing read data was assessed by correlation analysis of G + C content and sequencing depth.
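A correlation analysis of G + C content and sequencing depth can be sketched as follows. The exact procedure used for this genome is not described in the text, so this is only an illustration under stated assumptions: G + C content is computed in fixed, non-overlapping windows, the per-window mean depth is supplied by the caller, and the two are compared with a Pearson coefficient. The function names `gc_fraction`, `window_gc` and `pearson` are hypothetical.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def window_gc(seq: str, window: int) -> list[float]:
    """G + C fraction in consecutive non-overlapping windows.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Given a contig sequence and a list of mean depths for the same windows, `pearson(window_gc(contig, 500), depths)` would indicate whether coverage varies systematically with G + C content; a strong correlation can point to sequencing bias or contamination. The window size of 500 is illustrative only.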

Genome annotation
The tRNAs and rRNAs were identified using tRNAscan-SE [16], RNAmmer [17] and the Rfam database [18]. Open reading frames were predicted, and the translated ORFs functionally annotated, using the online RAST server [19,20]. Predicted genes and pathways were classified using the COGs [21,22] and KEGG [23][24][25] databases. In addition, we used the CRISPRs web server [26] to predict CRISPRs, and InterPro [27,28] with the Pfam database [29] to obtain the GO annotation.
To estimate the mean level of nucleotide sequence similarity at the genome level between strain KCY07-B2T and the Halopiger genomes available to date (H. xanaduensis SH-6T, H. djelfamassiliensis IIH2T and H. goleamassiliensis IIH3T), we compared ORFs only, using the sequence-based comparison tool of the RAST server [19] at a query coverage of ≥60 % and a minimum nucleotide length of 100 bp.
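The two thresholds above can be made concrete with a small Python sketch. It does not reproduce the RAST comparison itself; the `OrfHit` record, its field names and the decision to apply the 100 bp minimum to the aligned length are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class OrfHit:
    """One ORF-to-ORF nucleotide match (hypothetical record layout)."""
    query_id: str
    query_length: int        # length of the query ORF in bp
    aligned_length: int      # length of the aligned region in bp
    percent_identity: float  # nucleotide identity over the aligned region


def passes_thresholds(hit: OrfHit,
                      min_coverage: float = 60.0,
                      min_length: int = 100) -> bool:
    """Keep hits with query coverage >= 60 % and aligned length >= 100 bp."""
    coverage = 100.0 * hit.aligned_length / hit.query_length
    return coverage >= min_coverage and hit.aligned_length >= min_length


def mean_identity(hits: Iterable[OrfHit]) -> Optional[float]:
    """Mean percent identity over the hits that pass both thresholds."""
    kept = [h.percent_identity for h in hits if passes_thresholds(h)]
    if not kept:
        return None
    return sum(kept) / len(kept)
```

Applied to the hits between KCY07-B2T and one reference genome, `mean_identity(hits)` would give a single summary value per genome pair; hits that are too short or cover too little of the query ORF do not contribute.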

Genome properties
The draft genome sequence of H. salifodinae KCY07-B2T revealed a genome size of 4,350,718 bp (scaffold length) with a 65.41 % G + C content. Of the 4254 predicted genes, 4204 were protein-coding genes and 50 were RNA genes. There were one 16S rRNA gene, two 23S rRNA genes and two 5S rRNA genes. A total of 2887 genes (68.67 %) were assigned a putative function (Table 3). Table 4 shows the distribution of genes into COG functional categories.
Table footnote: Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [40,41].
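The gene counts above can be cross-checked with a few lines of Python. The sketch only restates numbers given in the text; the variable names are illustrative. It shows that the reported 68.67 % corresponds to the number of protein-coding genes as the denominator, not the total number of predicted genes.

```python
# Counts taken from the text above.
protein_coding_genes = 4204
rna_genes = 50
genes_with_function = 2887

total_predicted_genes = protein_coding_genes + rna_genes


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of genes with a putative function, relative to each possible total.
share_of_protein_coding = percent(genes_with_function, protein_coding_genes)
share_of_all_predicted = percent(genes_with_function, total_predicted_genes)

print(total_predicted_genes)    # matches the 4254 predicted genes in the text
print(share_of_protein_coding)  # matches the 68.67 % reported in the text
print(share_of_all_predicted)   # lower, since RNA genes enlarge the denominator
```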

Insights from the genome sequence
Strain H. salifodinae KCY07-B2T was isolated from a salt mine sample. Experiments showed that this strain grew optimally at 2.9-3.4 M NaCl and that the cells lysed in distilled water. The analysis of the genome sequence therefore focused on the mechanisms by which halophilic archaea adapt to hypersaline environments. Strain H. salifodinae KCY07-B2T mainly uses the "salt-in" strategy to maintain osmotic balance. According to the genome annotation, Trk system potassium uptake proteins, which are responsible for K+ uptake and transport, are present, encoded by 9 copies of TrkH genes and 5 copies of TrkA genes. Five copies of Kef-type K+ transport proteins, one copy of the glutathione-regulated potassium-efflux protein KefB and 8 pH adaptation potassium efflux system proteins, all related to K+ efflux, were also found, as were 8 copies of potassium channel proteins. In addition, the genome contains 13 copies of Na+/H+ antiporter proteins.
Table footnote: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Conclusions
Strain KCY07-B2T is the third member of the genus Halopiger to be described and the fourth for which a genome sequence report is available. These data provide a new perspective on how microorganisms adapt to hypersaline environments, and may also provide a pool of functional enzymes that work at high salinity.