Draft genome sequence of Thermovorax subterraneus 70BT, a thermophile isolated from a geothermally active underground mine that produces hydrogen

Thermovorax subterraneus 70BT is a thermophile isolated from a geothermally active underground mine. Strain 70BT belongs to the class Clostridia, order Thermosediminibacterales, and family Thermosediminibacteraceae. It has remained the only type strain in the genus since the genus was discovered more than 10 years ago. Previous studies compared strain 70BT with strains from other genera phenotypically, chemotaxonomically, and phylogenetically (16S rRNA gene). However, the genome sequence of this strain has not been described. Here we describe the genome sequence of strain 70BT. The assembled genome has a size of 2,451,552 bp in 44 contigs, with a coverage of 445X, an N50 of 86,294 bp, and a GC content of 43.8%. The genome encodes 2,540 genes, comprising 2,431 protein-coding sequences, 46 pseudogenes, and 63 RNA genes. Clusters of Orthologous Groups (COGs) analysis functionally assigned 2,404 protein-coding genes to COGs. Among the members of the family Thermosediminibacteraceae, strain 70BT is most closely related to Caldanaerovirga acetigignens JW/SA-NV4T based on the genome-to-genome comparison indexes (i.e., ANI, dDDH, AAI, and POCP). An earlier study reported that strain 70BT can produce hydrogen, and our gene mining analysis identified genes encoding an [FeFe] hydrogenase. These genome data can serve as a reference for future research on the genus Thermovorax and the family Thermosediminibacteraceae.


© 2022 The Author(s)

Value of the Data
• The first draft genome sequence of Thermovorax subterraneus 70BT can provide insight into the genetic diversity of the genus, species, and family Thermosediminibacteraceae.
• These genome data provide useful information for scientists looking to further explore the genus Thermovorax, as well as for species delineation if other closely related strains are discovered in the future.
• The genome sequence of Thermovorax subterraneus 70BT can be used to discover enzymes and gene clusters involved in hydrogen production.

Data Description
Thermovorax subterraneus 70BT (= DSM 21563T = JCM 15541T) was isolated from a geothermally active underground mine located in Japan [1]. Strain 70BT is a Gram-positive, rod-shaped, motile thermophile that grows optimally at 71 °C and pH 7.0-7.5. The strain survives heat treatment at 95 °C for 25 minutes, indicating that it is heat resistant. An earlier study examined strain 70BT phenotypically, chemotaxonomically, and phylogenetically (16S rRNA gene) against closely related strains from other genera [1]. Strain 70BT is currently classified under the class Clostridia, the order Thermosediminibacterales, and the family Thermosediminibacteraceae. Since its discovery more than 10 years ago, strain 70BT has remained the only type strain in the genus. The aims of this sequencing project are to provide the missing genome data for the type strain and to spark scientific interest in an underexplored genus.
The sequencer generated a total of 1.2 Gb in 4.0 million paired-end reads. Following adapter trimming and low-quality read filtering, the sequence data were assembled into 44 contigs, with a total size of 2,451,552 bp, a coverage of 445X, an N50 of 86,294 bp, and an average GC content of 43.8%. Based on the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) v5.30 annotation, the genome of strain 70BT contains 2,540 genes, comprising 2,431 protein-coding sequences, 46 pseudogenes, and 63 RNA genes (50 tRNAs, four 5S rRNAs, four 16S rRNAs, one 23S rRNA, and four noncoding RNA genes). The genome map of strain 70BT is shown in Fig. 1, and Table 1 compares its genome characteristics to those of other genomes in the family Thermosediminibacteraceae. The protein-coding sequences of strain 70BT were classified into Clusters of Orthologous Groups (COGs) categories. Table 2 shows the resulting COG distribution together with that of the other members of the family Thermosediminibacteraceae. In strain 70BT and the other members, the largest number of protein-coding sequences falls in the [S]-Function unknown category. This is followed by [C]-Energy production and conversion, [E]-Amino acid transport and metabolism, [J]-Translation, ribosomal structure, and biogenesis, [K]-Transcription, and [L]-Replication, recombination, and repair. Each of these categories contains more than 150 protein-coding sequences.
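The assembly statistics reported above (N50, GC content, and coverage) follow standard definitions, which can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used for strain 70BT: the function names and the toy contigs are hypothetical, and the coverage figure is a simple estimate (sequenced bases divided by assembly size) computed from the raw 1.2 Gb yield, whereas the source does not state how the reported 445X was derived.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the cumulative size of the
    longest-first sorted contigs first reaches half the assembly size."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for non-empty input


def gc_percent(sequences: Iterable[str]) -> float:
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    gc = 0
    total = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / total


def mean_coverage(sequenced_bases: float, assembly_size: int) -> float:
    """Simple depth estimate: total sequenced bases / assembly size."""
    return sequenced_bases / assembly_size


# Hypothetical toy contigs, NOT the strain 70BT assembly.
toy_contigs = ["ATGCGC", "ATAT", "GG"]
toy_n50 = n50(len(c) for c in toy_contigs)
toy_gc = gc_percent(toy_contigs)

# Figures taken from the text: 1.2 Gb of raw reads, 2,451,552 bp assembly.
raw_coverage = mean_coverage(1.2e9, 2_451_552)

print(toy_n50, toy_gc, round(raw_coverage))
```

With the raw yield, this estimate comes out somewhat higher than the reported 445X. That would be consistent with reads being removed during trimming and filtering, although this explanation is an assumption rather than something stated in the source.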
In Table 3, we compared strain 70BT to other strains of the family Thermosediminibacteraceae with available genomes. The average nucleotide identity (ANI) values range from 68.9% to 92.6%, and the digital DNA-DNA hybridization (dDDH) values range from 18.6% to 48.6%. At the amino acid level, the similarity between strain 70BT and its closest relatives is 67.7-93.9% for average amino acid identity (AAI) and 71.3-86.2% for percentage of conserved proteins (POCP). A phylogenetic tree of the family Thermosediminibacteraceae based on 16S rRNA genes and a phylogenomic tree of all the available genomes are shown in Fig. 2A and B, respectively. According to our analysis, strain 70BT is most closely related to Caldanaerovirga acetigignens JW/SA-NV4T [2].
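Of the four indexes, POCP has the simplest closed form, so it makes a convenient worked example. The sketch below assumes the commonly used definition, POCP = (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of conserved proteins found in each genome and T1 and T2 are the total numbers of proteins, and it assumes the usual thresholds for calling a protein conserved (e-value below 1e-5, identity above 40%, aligned region above 50% of the query length). The source does not state which tool or thresholds were used, and all counts other than the 2,431 protein-coding sequences of strain 70BT are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BestHit:
    """Best protein-vs-protein hit of one query (hypothetical record type)."""
    query_length: int
    aligned_length: int
    identity_pct: float
    evalue: float


def is_conserved(hit: BestHit,
                 min_identity: float = 40.0,
                 min_aligned_fraction: float = 0.5,
                 max_evalue: float = 1e-5) -> bool:
    """Assumed thresholds: e-value < 1e-5, identity > 40%,
    aligned region > 50% of the query length."""
    return (hit.evalue < max_evalue
            and hit.identity_pct > min_identity
            and hit.aligned_length / hit.query_length > min_aligned_fraction)


def pocp(conserved_1: int, conserved_2: int, total_1: int, total_2: int) -> float:
    """POCP = (C1 + C2) / (T1 + T2) * 100."""
    if total_1 + total_2 == 0:
        raise ValueError("both proteomes are empty")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


# 2,431 is the protein-coding count reported for strain 70BT; the other
# three numbers are hypothetical and chosen only for illustration.
example_pocp = pocp(conserved_1=1900, conserved_2=1950, total_1=2431, total_2=2500)

good_hit = BestHit(query_length=300, aligned_length=280, identity_pct=62.0, evalue=1e-40)
weak_hit = BestHit(query_length=300, aligned_length=100, identity_pct=62.0, evalue=1e-40)

print(round(example_pocp, 1), is_conserved(good_hit), is_conserved(weak_hit))
```

Because both genomes contribute to the numerator and the denominator, the value is symmetric with respect to which genome is treated as the query.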
According to the research group that first isolated strain 70BT, the bacterium is able to produce hydrogen gas [1]. Compared to mesophilic bacteria, thermophilic bacteria generate a higher yield of biohydrogen in dark fermentation [7]. Strain 70BT ferments glucose primarily into H2, CO2, acetate, ethanol, and lactate [1]; however, it is unknown whether this strain is able to utilize other substrates (e.g., starch waste). Hydrogenase enzymes in bacteria catalyze the reduction of protons to hydrogen during the anaerobic decomposition of organic compounds. Gene MCF6095860.1 was found to encode a transcription factor (TF) that acts as a hydrogenase system regulator. This protein shares 46% sequence identity with the TF TM1266 from Thermotoga maritima, whose crystal structure is deposited in the PDB database (PDB ID: 2NZC). In addition, a group A-type [FeFe] hydrogenase (MCF6096026.1 and MCF6097398.1) is encoded in the genome. These hydrogenases contain a di-iron center, and their active site is termed the H-cluster. Several genes necessary for the formation of H-clusters are present, including several copies of the 4Fe-4S and 2Fe-2S binding proteins, carbon monoxide dehydrogenases (MCF6097400.1 and MCF6097401.1), as well as HydE, HydG, and the GTPase HydF (MCF6095948.1, MCF6096188.1 and MCF6096065.1). No gene encoding a [NiFe] hydrogenase is present in this bacterium.
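A figure such as the 46% sequence identity quoted above depends on how identity is counted over the alignment. The sketch below shows one common convention, identical residues divided by the number of aligned (non-gap) columns, applied to a made-up pair of aligned fragments. The function name and the toy sequences are hypothetical, they are not the MCF6095860.1 or TM1266 sequences, and the source does not specify which convention or alignment tool was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 6 ungapped columns, 4 of them identical.
toy_identity = percent_identity("MKV-LSA", "MRVTLTA")
print(round(toy_identity, 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different tools are not always directly comparable.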

Ethics Statements
This work did not involve human subjects, animal experiments, or data collected from social media platforms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.