Complete genome sequence analysis of Archaeoglobus fulgidus strain 7324 (DSM 8774), a hyperthermophilic archaeal sulfate reducer from a North Sea oil field

Archaeoglobus fulgidus is the type species of genus Archaeoglobus Stetter 1998, a hyperthermophilic sulfate-reducing group within the class Archaeoglobi of the phylum Euryarchaeota. Members of this genus grow heterotrophically or chemolithoautotrophically with sulfate or thiosulfate as electron acceptors. Except for A. fulgidus strain 7324 and the candidate species “Archaeoglobus lithotrophicus”, which both originate from deep oil fields, all members of this genus have been recovered from marine hydrothermal systems. Here we describe the features of the A. fulgidus strain 7324 genome as compared to that of the A. fulgidus type strain VC16T. The 2.3 Mbp genome sequence of strain 7324 shares about 93.5% sequence identity with that of strain VC16T but is about 138 Kbp longer, mostly due to two large ‘insertions’ carrying one extra cdc6 (cell-cycle control protein 6) gene, extra CRISPR elements and mobile genetic elements, a high-GC ncRNA gene (hgcC) and a large number of genes of hypothetical function. A comparison with four other Archaeoglobus spp. genomes identified 1001 core Archaeoglobus genes and more than 2900 pan-genome orthologous genes.


Introduction
Archaeoglobus fulgidus strain 7324 was recovered from hot oil-field water originating from a deep oil well in the North Sea [1]. It shares many features with the A. fulgidus type strain VC16T, e.g. dissimilatory sulfate reduction, utilization of lactate and pyruvate as carbon sources, irregular coccoid to disc-shaped cells, and blue-green fluorescence under the UV microscope due to the presence of coenzyme F420. Strain VC16T was isolated from a shallow marine hydrothermal system at Vulcano island, Italy [2]. The complete genome sequence of strain VC16T was reported in 1997 as the third archaeal genome to be fully sequenced [3], and A. fulgidus has since served as a prototype for studies of archaeal and hyperthermophilic sulfate reduction [4]. Here we summarize the features of A. fulgidus strain 7324, describe the complete sequencing and annotation of its genome, and compare it with the genomes of the A. fulgidus type strain and other Archaeoglobus spp.

Classification and features
Genus Archaeoglobus comprises five validly published species: A. fulgidus [2], Archaeoglobus profundus [5], Archaeoglobus veneficus [6], Archaeoglobus infectus [7] and Archaeoglobus sulfaticallidus [8], plus one candidate species termed "Archaeoglobus lithotrophicus" [9]. All are hyperthermophilic sulfate reducers capable of heterotrophic growth or chemolithoautotrophic growth on H2 and CO2. The Archaeoglobus 'clade' also encompasses a few non-sulfate-reducing anaerobic hyperthermophiles: Geoglobus acetivorans [10] and "Geoglobus ahangari" [11,12], which are both Fe(III) reducers, and Ferroglobus placidus, which is capable of using ferrous iron, H2 and sulfide as electron donors with nitrate as electron acceptor [13]. Fig. 1 shows the phylogenetic affiliation of all current members of the Archaeoglobaceae family, including strain 7324. All Archaeoglobus species form small, irregularly shaped cells. A scanning electron micrograph of A. fulgidus strain 7324 is shown in Fig. 2, revealing a cell shape similar to that originally determined by transmission electron microscopy [1]. Strain 7324 has not previously been characterized phylogenetically by 16S rRNA gene sequencing, but a wet-lab genomic DNA-DNA hybridization with A. fulgidus strain Z, which, like the type strain, was recovered from Vulcano island [14], revealed a genome hybridization value of 100% [1]. This close relationship has now been confirmed via digital DNA-DNA hybridization [15] between strains VC16T and 7324, with a GLM-based DDH estimate of 93.9%. All three A. fulgidus strains share common physiological characteristics, such as growth from 60°C to above 84°C, use of sulfate and thiosulfate as electron acceptors, optimal growth with lactate or pyruvate as carbon sources, and production of trace amounts of methane. Although the optimal growth temperature of strain 7324 was initially determined to be 76°C, the strain grows well at 80°C and has been routinely cultivated at that temperature in our labs for the last two decades.
In contrast to the other isolates, strain 7324 rapidly lyses after the stationary phase [1]. The main features of the organism are listed in Table 1.

Genome sequencing information
Genome project history
A. fulgidus strain 7324 was chosen for whole genome sequencing because it was isolated from a deep and hot oil reservoir while the closely related type strain, VC-16, was isolated from a shallow marine hot vent. A genome comparison might reveal particular adaptations of strain 7324 to the deep biosphere. The genome project information is given in the Genomes OnLine Database (Gp0102124). The genome sequence is deposited in GenBank (CP006577.1). A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
A. fulgidus strain 7324 was obtained from our own collection at the University of Bergen. It was cultivated in anaerobic medium containing lactate and sulfate as described previously [1]. The incubation temperature was 80°C. Genomic DNA was isolated using a modification of the cetyl trimethylammonium bromide method as described [16].

Genome sequencing and assembly
The genome was sequenced using a combination of the Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [17]. The initial assembly of the 454 raw data suggested contamination of the sequenced sample. Using BLAST searches, all contigs (>500 nt in length) could be assigned either to A. fulgidus or to Thermococcus litoralis, an archaeon that shares the same habitat [18]. To overcome this issue, two additional BLAST searches including all contigs longer than 500 nt were performed: one against the previously sequenced genome of A. fulgidus VC-16T (NCBI/GenBank: AE000782) and one against all genomic sequences of Thermococcus species available in GenBank (Dec. 2010). Only contigs sharing more than 90% sequence identity with A. fulgidus VC-16T and having no hits in the Thermococcus BLAST database were kept. A total of 84 Newbler contigs could be assigned to A. fulgidus. Illumina raw reads were assembled into 223 contigs. Both draft assemblies were merged in a hybrid approach using the phred/phrap/consed pipeline [19]. After manual curation, a total of 27 ordered gaps were closed by bridging PCRs at LGC Genomics (Berlin). The final consensus sequence represents a single circular chromosomal element (103× coverage).
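The contig triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run: it assumes the BLAST hits were exported in the standard tab-separated tabular format (query ID in column 1, percent identity in column 3), it interprets "more than 90% sequence identity" as the best hit per contig, and the function names and contig identifiers are hypothetical.

```python
import csv
import io


def best_identity(tabular_text):
    """Map each query contig to its highest percent identity.

    Assumes BLAST tabular output: column 1 = query ID, column 3 = % identity.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, pident = row[0], float(row[2])
        best[query] = max(pident, best.get(query, 0.0))
    return best


def select_contigs(contig_lengths, afu_hits, thermo_hits,
                   min_len=500, min_ident=90.0):
    """Keep contigs that are long enough, match A. fulgidus VC-16 well,
    and have no hit at all in the Thermococcus search."""
    afu = best_identity(afu_hits)
    thermo = best_identity(thermo_hits)
    return sorted(
        name
        for name, length in contig_lengths.items()
        if length > min_len
        and afu.get(name, 0.0) > min_ident
        and name not in thermo
    )
```

Given a dictionary of contig lengths and the two hit tables as text, `select_contigs` returns the names of the contigs that pass all three criteria; the thresholds are exposed as parameters so that stricter or looser cut-offs can be tried.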

Genome annotation
Coding genes were predicted by GeneMark [20] as part of the genome annotation pipeline in the Integrated Microbial Genomes Expert Review system [21]. The tRNA genes were identified by tRNAscan-SE 1.23 [22], while ribosomal RNA genes were predicted using RNAmmer [23]. Other non-coding RNA genes were predicted using Infernal [24]. CRISPR elements were identified by the program CRT [25]. Manual functional annotation was performed within the IMG platform [21] and the Artemis Genome Browser [26].

Genome properties
The genome of A. fulgidus strain 7324 comprises one circular chromosome with a total size of 2,316,287 bp, which is 137,887 bp larger than that of A. fulgidus VC16T (DSM 4304) [3]. The mole percent G + C is 48.08, which is slightly higher than the 47% value estimated previously by thermal denaturation [1] and slightly lower than that of the type strain DSM 4304 (48.6%); in either case the value lies within the 1% threshold relative to the species' type strain VC16T sensu Meier-Kolthoff et al. [27]. No plasmids were detected. The strain 7324 genome is the largest among the genome-sequenced Archaeoglobus species, the smallest being the A. profundus genome with a total size of 1.56 Mbp [28]. Out of the total of 2615 genes annotated in the 7324 genome, 2558 were identified as protein-coding genes and 56 as RNA genes (Table 3). Only 67.29% of the genes could be assigned to COG functional categories, as listed in Table 4. Five CRISPR repeat regions were identified, as compared with only three in strain VC16T (AE000782). There is only one rRNA operon (Fig. 3). As for VC16T, there is no apparent GC skew in the genome, which could indicate the presence of multiple DNA replication origins and explain previous difficulties in precisely mapping the replication origin(s) in this species using a marker rescue analysis approach [29,30].
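For readers who want to reproduce this kind of summary statistic, the following minimal Python sketch shows how G + C content and a cumulative GC-skew curve are commonly computed from a nucleotide sequence. It is not the software used in this study; the function names and the window size are illustrative assumptions, and the skew definition (G − C)/(G + C) per window is the conventional one. The two genome-size figures are taken from the text above.

```python
# Sizes quoted in the text above.
STRAIN_7324_BP = 2_316_287
SIZE_DIFFERENCE_BP = 137_887
# Size of the VC16 genome implied by those two numbers.
VC16_IMPLIED_BP = STRAIN_7324_BP - SIZE_DIFFERENCE_BP


def gc_percent(seq):
    """Percent G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def cumulative_gc_skew(seq, window=1000):
    """Running sum of (G - C) / (G + C) over consecutive windows.

    In genomes with a single replication origin the curve usually shows
    one clear minimum and maximum; a flat or irregular curve corresponds
    to 'no apparent GC skew'.
    """
    seq = seq.upper()
    total, curve = 0.0, []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if (g + c) else 0.0
        curve.append(total)
    return curve
```

Applied to a chromosome sequence loaded as a plain string, `gc_percent` gives the mole percent G + C, and plotting the output of `cumulative_gc_skew` against window position gives the skew curve discussed above.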

Insights from the genome sequence
Genes encoding central metabolic pathways such as dissimilatory sulfate reduction, lactate oxidation, a complete TCA cycle and the acetyl-CoA pathway were identified in strain 7324, confirming the previous physiological characterization and the similarity with strain VC16T [1]. A genome alignment between strains VC16T and 7324 revealed a large degree of genomic similarity and sequence synteny, interrupted mostly by two large additional regions of about 64 and 109 Kbp (InsI and InsII, respectively) in strain 7324 as compared to VC16T (Fig. 4). Both of these regions are flanked by a disrupted tRNA gene, which implies that InsI and InsII represent genomic insertions. They also possess a considerably lower G + C content (42%) than the genome average (Fig. 3), suggesting that they were acquired through recombination with an AT-rich organism. This is further supported by the identification in InsII of a non-coding high-GC RNA gene belonging to the hgcC family of ncRNA (RFAM v12 accession code RF00062), which is typically found in AT-rich hyperthermophiles (Figs. 3 and 5b). This ncRNA family was originally identified in the genomes of Pyrococcus furiosus and Methanocaldococcus jannaschii [31], but its function is still unresolved. InsII also contains a gene encoding an extra homologue of the Orc1/Cdc6 family of replication initiation control proteins, in addition to the two other cdc6 homologues present in both the 7324 and VC16T genomes. The closest homologue identified by a BLAST search is from A. veneficus (62% amino acid sequence identity). The majority of the other genes in InsII are hypothetical or have a general function prediction only. InsI carries two CRISPR repeat regions and 14 genes encoding CRISPR-associated proteins, including a Cas6 homologue (Fig. 5a). The rest of this insert mostly contains hypothetical genes.
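A simple way to look for regions of atypically low G + C content, such as the two insertions described above, is a windowed scan of the chromosome. The Python sketch below is a hedged illustration of that idea, not the analysis performed here: the function names, the 5 Kbp window and the 0.04 (four percentage points) cut-off are arbitrary assumptions chosen for the example.

```python
def gc_fraction(seq):
    """Fraction of G + C in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq) if seq else 0.0


def low_gc_regions(seq, window=5000, drop=0.04):
    """Return (start, end) spans whose G + C lies at least `drop`
    below the whole-sequence G + C; adjacent windows are merged."""
    threshold = gc_fraction(seq) - drop
    regions = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        if gc_fraction(seq[start:end]) <= threshold:
            if regions and regions[-1][1] == start:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

The coordinates are zero-based and end-exclusive. Candidate regions found this way would still need to be checked against a genome alignment and flanking features, such as the disrupted tRNA genes mentioned above, before being called insertions.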
A Venn diagram shows that A. fulgidus strains VC16T and 7324 share a large number of genes (2292) (Fig. 6a), reflecting the high degree of genome similarity. The 263 genes unique to strain 7324 include about 200 hypothetical genes or genes of uncharacterized function, most of them belonging to the large insertions, as well as the CRISPR-associated genes of InsI. The Venn diagram including all five genome-sequenced strains revealed an Archaeoglobus core genome of 1001 genes (Fig. 6b), most of which encode energy-yielding, biosynthetic and regulatory functions. About 200 of the core genes (roughly 20%) belong to the hypothetical/uncharacterized category in the EggNog database [32]. This is considerably lower than the 32% fraction of unassigned genes in the entire strain 7324 genome, but it underscores that a large part of the central gene functions in this genus remain to be disclosed. About 2900 genes belong to the Archaeoglobus pan-genome, being unique to one of the genomes or shared by 2 to 4 of the species.
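The core and pan-genome counts above amount to set operations on ortholog groups. The following Python sketch illustrates the principle with plain sets; it is not the tool used to prepare Fig. 6, and the function names and the input layout (one set of ortholog-group identifiers per genome) are assumptions made for the example. The "accessory" set corresponds to the category described above as genes unique to one genome or shared by only some of them.

```python
def core_pan_accessory(genomes):
    """Split ortholog groups into core, pan and accessory sets.

    genomes: dict mapping genome name -> set of ortholog-group IDs.
    core      = groups present in every genome
    pan       = groups present in at least one genome
    accessory = groups present in some but not all genomes
    """
    sets = list(genomes.values())
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan, pan - core


def unique_to(genomes, name):
    """Ortholog groups found only in the named genome."""
    others = set().union(
        *(groups for other, groups in genomes.items() if other != name)
    )
    return genomes[name] - others
```

With two genomes, the size of the intersection gives the shared gene count of a two-way Venn diagram and `unique_to` gives the strain-specific counts; with five genomes, `core_pan_accessory` gives the core and accessory totals.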
Archaeoglobus fulgidus strain 7324 has been reported to degrade starch [33], and several enzymes involved in starch degradation have been purified from cells grown on starch. These enzymes include cyclodextrin glucanotransferase, cyclodextrinase, maltodextrin phosphorylase, phosphoglucomutase, ADP-dependent glucokinase, ADP-dependent ... appears also to be encoded by the Thermococcus strain genome. The purity of the original A. fulgidus 7324 isolate was not assessed by 16S rRNA gene sequencing prior to deposition at DSMZ [1], and it is not known whether the Thermococcus contamination was present in the original culture or was introduced at a later stage. The genome analysis of this Thermococcus strain, which appears to represent a novel Thermococcus species, will be published separately.

Conclusions
The complete genome of A. fulgidus strain 7324, recovered from hot water produced from an oil well in the North Sea, was sequenced and annotated. After that of the A. fulgidus type strain VC16, isolated from a shallow hot vent in the Mediterranean, this is the second A. fulgidus genome to be characterized. The two strains share 93.5% genome sequence similarity and differ mostly by two large insertions of 64 and 109 Kbp in strain 7324 that seem to have originated from an AT-rich archaeon. The insertions carry two additional CRISPR elements, an extra cdc6 gene, a variety of mobile genetic elements and a large number of hypothetical and unassigned genes. Based on comparison with four other Archaeoglobus spp. genomes, the Archaeoglobus core genome was estimated to comprise 1001 genes. No particular traits indicating adaptation to the petroleum reservoir subsurface environment could be identified.
The Venn diagrams (Fig. 6) were prepared using 'jvenn' [49] as implemented in EzBioCloud's Comparative Genomics Database [50].