Methylome and Complete Genome Sequence of Parageobacillus toebii DSM 14590T, a Thermophilic Bacterium

Here, we present the first complete genome assembly of the thermophilic bacterium Parageobacillus toebii DSM 14590T. The P. toebii DSM 14590T genome consists of a 3,270,071-bp circular chromosome and a 52,989-bp native plasmid.

Parageobacillus is a recently defined genus of Gram-positive, facultative thermophiles; many Parageobacillus species were formerly classified as Geobacillus species (1). Members of both Geobacillus and Parageobacillus have potential for biotechnological applications and can produce thermostable enzymes. The aerobic bacterium Parageobacillus toebii DSM 14590T was isolated from hay compost in South Korea (2). The genome sequence and methylome of P. toebii DSM 14590T will inform approaches for genetic engineering of the strain and provide resources for studying Parageobacillus species in general.
P. toebii DSM 14590T was acquired from the German Collection of Microorganisms and Cell Cultures (DSMZ) and was grown in LB liquid medium with 3 g/liter beef extract at 55°C. We used the 100/G Genomic tip extraction kit and bacterial protocol (Qiagen, Valencia, CA, USA) to isolate genomic DNA from P. toebii DSM 14590T. The DNA was not sheared or size selected. Long-read sequencing data for P. toebii DSM 14590T were generated at the DOE Joint Genome Institute (JGI). A PacBio SMRTbell library was constructed and sequenced on the PacBio RS II platform (Menlo Park, CA, USA) (3). Sequencing generated 379,771 filtered subreads totaling 769,802,073 bp; PacBio filtering removes reads if the quality score is <0.75 or the length is <50 bp and trims hairpin adapters from sequences, splitting them into subreads. The reads were assembled using the Hierarchical Genome Assembly Process 3 (HGAP3) v2.3.0 (4) with default parameters.
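The read-filtering rule described above can be illustrated with a minimal sketch. This is not the PacBio filtering code; the `Read` record, its `quality` field, and the toy reads are hypothetical, and only the two thresholds (0.75 and 50 bp) and the subread totals come from the text.

```python
from dataclasses import dataclass

MIN_QUALITY = 0.75    # threshold stated in the text
MIN_LENGTH_BP = 50    # threshold stated in the text


@dataclass
class Read:
    """Hypothetical read record used only for this illustration."""
    name: str
    quality: float    # assumed per-read quality score between 0 and 1
    sequence: str


def passes_filter(read: Read) -> bool:
    """Keep a read unless its quality is < 0.75 or its length is < 50 bp."""
    return read.quality >= MIN_QUALITY and len(read.sequence) >= MIN_LENGTH_BP


def filter_reads(reads):
    """Return only the reads that pass both thresholds."""
    return [r for r in reads if passes_filter(r)]


def mean_length(total_bp: int, n_reads: int) -> float:
    """Mean read length implied by a total base count and a read count."""
    return total_bp / n_reads


toy_reads = [
    Read("r1", 0.90, "A" * 120),  # kept
    Read("r2", 0.60, "A" * 500),  # dropped: quality below 0.75
    Read("r3", 0.80, "A" * 30),   # dropped: shorter than 50 bp
    Read("r4", 0.75, "A" * 50),   # kept: both values sit exactly on the thresholds
]
kept_names = [r.name for r in filter_reads(toy_reads)]
print(kept_names)

# Mean subread length implied by the reported totals.
reported_mean = mean_length(769_802_073, 379_771)
print(f"{reported_mean:.0f} bp")
```

The last two lines derive the mean filtered subread length from the reported totals; the sketch does not model adapter trimming or the splitting of reads into subreads.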
The genome of P. toebii DSM 14590T contains a circular chromosome 3,270,071 bp long and a circular plasmid of 52,989 bp. Long-read sequencing data provided 350× coverage when mapped back to the assembled genome and over 1,800× coverage for the plasmid, suggesting an average plasmid copy number of approximately 5 (1,800/350 ≈ 5.1). The genome was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (5), which predicted 3,375 chromosomal genes, including 3,253 protein-coding, 28 rRNA, and 90 tRNA genes. Additionally, the chromosome is predicted to encode four noncoding RNAs. The P. toebii plasmid, pDSM14590, is predicted to encode 61 genes, all of which are predicted to be protein coding.
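The copy number estimate and the gene tally above are simple arithmetic, sketched here for transparency. The function and variable names are illustrative; the numbers are those reported in the text, and the plasmid coverage is treated as a lower bound because it is given as "over 1,800×".

```python
CHROMOSOME_COVERAGE = 350    # reported coverage of the assembled genome
PLASMID_COVERAGE = 1800      # reported as "over 1,800x", so a lower bound


def estimate_copy_number(plasmid_cov: float, chromosome_cov: float) -> float:
    """Estimate plasmid copies per chromosome as the ratio of read coverages."""
    if chromosome_cov <= 0:
        raise ValueError("chromosome coverage must be positive")
    return plasmid_cov / chromosome_cov


copy_number = estimate_copy_number(PLASMID_COVERAGE, CHROMOSOME_COVERAGE)
print(f"Estimated plasmid copy number: {copy_number:.2f}")

# Chromosomal gene categories reported by PGAP in the text.
chromosomal_genes = {
    "protein_coding": 3253,
    "rRNA": 28,
    "tRNA": 90,
    "noncoding_RNA": 4,
}
total_genes = sum(chromosomal_genes.values())
print(f"Total chromosomal genes: {total_genes}")
```

A coverage ratio is only a rough proxy for copy number, since it assumes that plasmid and chromosomal DNA were extracted and sequenced with equal efficiency.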
JGI also performed DNA modification detection and motif analysis using the PacBio SMRT analysis platform v5.0.1.9585 with default parameters. Modified sites were then identified and grouped into motifs using MotifFinder (6), with motifs representing recognition sequences of active methyltransferase genes (7). One methylated motif, garaAtt, was found (100% of occurrences modified). No restriction-modification (RM) systems were predicted within the P. toebii DSM 14590T genome, but all the necessary genes involved in the bacteriophage exclusion (BREX) system were predicted (DER53_11070 to DER53_11085, DER53_12455, and DER53_16290). Like RM systems, BREX is a methylation-based bacterial defense system wherein a nonpalindromic sequence is methylated on the genome and infection by unmethylated phage DNA is hindered, though the mechanism is not fully elucidated (8). No prophage regions were predicted within the P. toebii DSM 14590T genome using VirSorter (9) with default parameters.
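Occurrences of a degenerate motif such as the one reported above can be located on both strands with a short script. The sketch below is not MotifFinder and does not detect methylation; it only finds sequence matches. It assumes the motif is read as the IUPAC string GARAATT (the printed motif in uppercase, with R meaning A or G) and uses a made-up toy sequence rather than the P. toebii genome.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Reverse complement of a sequence or IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_sites(sequence: str, motif: str):
    """Return 0-based start positions of the motif on the forward strand and
    of its reverse complement (sites on the opposite strand), both given in
    forward-strand coordinates."""
    seq = sequence.upper()
    forward = [m.start() for m in motif_regex(motif).finditer(seq)]
    reverse = [m.start() for m in motif_regex(reverse_complement(motif)).finditer(seq)]
    return forward, reverse


MOTIF = "GARAATT"  # assumption: uppercase IUPAC reading of the printed motif
toy_sequence = "CCGAAAATTGGGAGAATTCCAATTCTCGG"  # illustrative only
forward_hits, reverse_hits = find_sites(toy_sequence, MOTIF)
print("forward strand starts:", forward_hits)
print("reverse strand starts:", reverse_hits)
```

Because the motif is not its own reverse complement, forward and reverse hits are reported separately, which matches the nonpalindromic recognition sequences typical of BREX systems.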
Data availability. The accession number for the P. toebii DSM 14590T chromosome is CP049703, and the plasmid pDSM14590 is available under the accession number CP049704. The BioProject accession number is PRJNA455457, the BioSample accession number is SAMN09062732, and the reads have been deposited in the NCBI SRA under accession numbers SRX4823098 and SRX4823099.

ACKNOWLEDGMENTS
This manuscript has been authored by UT-Battelle, LLC, under contract number DE-AC05-00OR22725 with the U.S. Department of Energy.
This work was supported by the Center for Bioenergy Innovation, U.S. DOE Bioenergy Research Center, supported by the Office of Biological and Environmental Research in the DOE Office of Science. The Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. DOE under contract DE-AC05-00OR22725. The PacBio DNA sequencing work was conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231. The data were generated for JGI proposal 502982. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.