Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. This genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.


Introduction
Halorubrum lacusprofundi is an extremely halophilic archaeon belonging to the class Halobacteria within the phylum Euryarchaeota. The species is represented by the type strain, ACAM 34 (= DSM 5036 = ATCC 49239 = JCM 8891), and a second strain, ACAM 32, both of which were isolated from Deep Lake, Antarctica [1]. This organism was first described as Halobacterium lacusprofundi but was later transferred to the genus Halorubrum [2]. Members of the genus Halorubrum have been found not only in Antarctica, but also in Africa [3], Asia [4], and North America [5], where they are usually found in saline lakes or salterns. Most members of the genus are neutrophiles, but some are haloalkaliphiles [6,7]. H. lacusprofundi (Fig. 1) was proposed for sequencing as part of a 2006 Joint Genome Institute Community Sequencing Program project because of its ability to grow at low temperature and its phylogenetic distance from other halophiles with sequenced genomes (Fig. 2).

Classification and features
Halorubrum lacusprofundi ACAM 34 was isolated from a water-sediment sample from Deep Lake, Antarctica [1]. The water-sediment sample was incubated in the light at 18°C, and after 3 months developed a reddish color. H. lacusprofundi was isolated from the sample by streaking on Deep Lake vitamin agar, which was composed of Deep Lake water with 1 g/L yeast extract, 15 g/L agar, and vitamin solution. The physiological characteristics of H. lacusprofundi were described as follows [1]. Cells were pleomorphic. Motility was not observed, and no flagella were present. Cells grew over a temperature range of −1°C to 40°C with an optimal growth temperature of 36°C [8]. Growth was observed at NaCl concentrations of 1.5 M to 4.5 M with an optimum salt concentration of 3.5 M. Cells lysed in distilled water. The optimum magnesium concentration for growth was 0.1 M. No growth was observed at magnesium concentrations of 0 M or 1.0 M. Ammonium could not be used as a nitrogen source; a complex medium component such as yeast extract or peptone was required. Growth was stimulated by addition of glucose, galactose, mannose, ribose, lactose, glycerol, succinate, lactate, formate, acetate, propionate, and ethanol. Growth was not stimulated by addition of glycine. Acid was not produced from sugars.

Genome sequencing information
Genome project history
H. lacusprofundi was selected for sequencing based upon its phylogenetic position relative to other haloarchaea and its cold tolerance (Table 1). It is part of a 2006 Joint Genome Institute Community Sequencing Program project that included six diverse archaeal genomes. Sequencing was done at the JGI Production Genomics Facility. Finishing was done at Los Alamos National Laboratory. Annotation was done at Oak Ridge National Laboratory and JGI. The complete genome sequence was finished in September 2008 and was released to the public in GenBank in February 2009. A summary of the project information is shown in Table 2.
The DNA extraction method was modified from [9]. Cells were grown to OD600 = 0.8, collected by centrifugation at 8000 rpm for 10 min at 4°C, resuspended in 1/20 volume basal salts, and lysed by addition of 2 volumes of deionized water and mixing at room temperature. Next, proteinase K was added to a final concentration of 100 μg/ml, and the lysate was mixed gently and incubated for 1 h at 37°C. The lysate was extracted with an equal volume of phenol, mixed gently by inversion at room temperature for 5 min, and then centrifuged at 8000 × g for 15 min at 4°C. The aqueous phase and interphase were collected and the phenol extraction was repeated twice more. The aqueous phase and interphase were then dialyzed against TE overnight at 4°C with one change of buffer. The dialyzed solution was collected and RNase A was added to a final concentration of 50 μg/ml; the solution was mixed and incubated for 2 h at 37°C with gentle shaking. Proteinase K was added to a final concentration of 100 μg/ml, and the solution was mixed and incubated for an additional hour at 37°C. The RNase A and proteinase K steps were repeated. The DNA was then dialyzed overnight against TE at 4°C with one buffer change.
Fig. 1 Photomicrograph of H. lacusprofundi type strain ACAM 34 cells. The cells were grown in Franzmann et al. [1] medium. The image was taken using a phase microscope (Nikon Labphot) with 1000× magnification. The scale bar represents 10 μm.
Fig. 2 Phylogenetic tree of DNA-directed RNA polymerase subunit A' of select haloarchaea. Sequence alignment and tree construction were carried out with Clustal W [39]. The tree was visualized with njplot [40]. Positions with gaps were excluded during tree construction. Methanosarcina acetivorans was used as the outgroup. The numbers indicate bootstrap values based on 1000 replicates.

Genome sequencing and assembly
The genome of H. lacusprofundi was sequenced at the Joint Genome Institute using a combination of 3 kb, 8 kb, and fosmid DNA libraries. All general aspects of library construction and sequencing were performed at the JGI [10]. Draft assemblies were based on 40,800 total reads. Together, the libraries provided 12.5× coverage. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment [11–13]. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [14] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walking, or PCR amplification (Roche Applied Science, Indianapolis, IN). A total of 1722 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The completed genome sequence of H. lacusprofundi contains 54,250 reads, achieving average per-base coverage of 11.8× and 13.8× on the chromosomes, with an error rate of less than 1 in 50,000 bp.
(Table footnote: Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [38].)
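The error rate quoted for the finished sequence can be put on the Phred quality scale, Q = −10·log10(P), which is the standard definition used by Phred-style base callers. The following is a minimal sketch of that conversion, together with a generic mean-coverage helper (reads × mean read length / genome length). The function names are illustrative, and the coverage example uses made-up numbers because the mean read length is not stated in this report.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality score."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def mean_coverage(n_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return n_reads * mean_read_length / genome_length


# Error rate stated for the finished genome: less than 1 error in 50,000 bp.
finished_error_rate = 1 / 50_000
finished_q = phred_quality(finished_error_rate)
print(f"Phred-scaled quality of the finished sequence: at least Q{finished_q:.1f}")

# Hypothetical numbers only, to show how the coverage helper is used.
example_depth = mean_coverage(n_reads=1000, mean_read_length=500, genome_length=100_000)
print(f"Example depth: {example_depth:.1f}x")
```

Because the stated error rate is an upper bound, the computed value is a lower bound on the consensus quality.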

Genome annotation
Protein-coding genes were identified using a combination of CRITICA [15] and Glimmer [16] followed by a round of manual curation using the JGI GenePRIMP pipeline [17]. GenePRIMP points out cases where gene start sites may be incorrect based on alignment with homologous proteins. It also highlights genes that appear to be broken into two or more pieces, due to a premature stop codon or frameshift, and genes that are disrupted by transposable elements. All of these types of broken and interrupted genes are labeled as pseudogenes. Genes that may have been missed by the gene calling programs are also identified in intergenic regions. The predicted CDSs were translated and used to search the National Center for Biotechnology Information nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Signal peptides were identified with SignalP [18], and transmembrane helices were determined with TMHMM [19]. CRISPR elements were identified with the CRISPR Recognition Tool [20]. Paralogs are hits of a protein against another protein within the same genome with an e-value of 10⁻² or lower. The tRNAscan-SE tool [21] was used to find tRNA genes. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) [22] and HaloWeb [23] platforms.
(Table footnote: The total is based on the total number of protein-coding genes in the annotated genome.)
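The paralog criterion stated above (a hit against a different protein in the same genome with an e-value of 10⁻² or lower) can be illustrated with a short filter. This is only a sketch of the stated rule, not the pipeline actually used for the annotation; the hit tuples, gene names, and function name below are hypothetical.

```python
from collections import defaultdict

# Cutoff taken from the text: e-value of 1e-2 or lower.
E_VALUE_CUTOFF = 1e-2


def find_paralogs(hits, cutoff=E_VALUE_CUTOFF):
    """Group within-genome hits into paralog sets.

    `hits` is an iterable of (query_id, subject_id, e_value) tuples from an
    all-against-all protein search of a single genome (assumed input format).
    """
    paralogs = defaultdict(set)
    for query, subject, e_value in hits:
        if query == subject:
            continue  # a protein matching itself is not a paralog
        if e_value <= cutoff:
            paralogs[query].add(subject)
    return dict(paralogs)


# Hypothetical hits, for illustration only.
example_hits = [
    ("geneA", "geneA", 0.0),    # self hit, ignored
    ("geneA", "geneB", 1e-30),  # passes the cutoff
    ("geneA", "geneC", 0.5),    # too weak, rejected
    ("geneB", "geneA", 1e-28),  # reciprocal hit
]
paralogs = find_paralogs(example_hits)
print(paralogs)
```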

Genome properties
The genome of H. lacusprofundi consists of two chromosomes of length 2,735,295 bp (Chromosome 1) and 525,943 bp (Chromosome 2 or pHL500) and one plasmid of length 431,338 bp (pHL400) (Table 3). The map of the genome is available on HaloWeb [24]. Partial sequence was obtained from a second, smaller plasmid, but it appeared to be present in a minority of the cells and its complete sequence could not be determined. The GC content of the large chromosome (67 %) is higher than those of the small chromosome (57 %) and the plasmid (55 %). There are 2801 genes on the large chromosome, 522 genes on the smaller chromosome, and 402 genes on the plasmid. Two of the ribosomal RNA operons are on the large chromosome and one is found on the smaller chromosome. The properties and statistics of the genome are summarized in Table 4, and genes belonging to COG functional categories are listed in Table 5.
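As a worked check of the replicon figures above, the following sketch sums the replicon lengths and gene counts and computes a length-weighted GC content. It uses only the values quoted in this section. The GC percentages are rounded, so the weighted average is approximate, and the derived totals may differ slightly from those reported in Table 4. The partially sequenced small plasmid is not included.

```python
# Replicon values as quoted in the text (GC percentages are rounded).
replicons = {
    "Chromosome 1": {"length_bp": 2_735_295, "gc_percent": 67, "genes": 2801},
    "Chromosome 2 (pHL500)": {"length_bp": 525_943, "gc_percent": 57, "genes": 522},
    "pHL400": {"length_bp": 431_338, "gc_percent": 55, "genes": 402},
}

total_length = sum(r["length_bp"] for r in replicons.values())
total_genes = sum(r["genes"] for r in replicons.values())

# Length-weighted mean GC content across the three finished replicons.
weighted_gc = (
    sum(r["length_bp"] * r["gc_percent"] for r in replicons.values()) / total_length
)

print(f"Total finished sequence: {total_length:,} bp")
print(f"Total genes on finished replicons: {total_genes}")
print(f"Approximate genome-wide GC content: {weighted_gc:.1f} %")
```

The three finished replicons sum to 3,692,576 bp and 3725 genes, and the large chromosome accounts for roughly three quarters of the sequence, which is why the weighted GC content sits closer to 67 % than to the values for the smaller replicons.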

Conclusions
The Halorubrum lacusprofundi genome sequence is the first established from a cold-adapted haloarchaeon. The genome has features typical of halophilic Archaea, including high G+C content, large extrachromosomal replicons, and eukaryotic-like DNA replication and transcription genes. Encoded proteins are highly acidic, with properties that suggest looser packing and greater flexibility, both important for function at cold temperatures [25–28].
H. lacusprofundi co-exists in a community of three major haloarchaea in Deep Lake, Antarctica [29,30].