Complete genome sequence of the archetype bile acid 7α-dehydroxylating bacterium, Clostridium scindens VPI12708, isolated from human feces, circa 1980

ABSTRACT Clostridium scindens strain VPI12708 serves as a model organism for studying bile acid 7α-dehydroxylation pathways. The closed circular genome of C. scindens VPI12708 was obtained by PacBio sequencing. The genome is 3,983,052 bp long with a G + C content of 47.59%, and 3,707 coding DNA sequences are predicted.

The donated C. scindens VPI12708 strain was reconstituted from a cryostock culture, originating from clinical fecal samples, that had been stored in 30% glycerol at −80°C at USDA-ARS-NCAUR. The strain was initially cultured in anaerobic trypticase soy broth (TSB) (2), then diluted and plated on TSB with 10% sheep's blood. Uniform colonies were observed, and the characteristic phenotypes of this strain were confirmed (13, 14). Genomic DNA was isolated as previously described (15). A DNA concentration of 46.8 ng·μL⁻¹ (A260/A280 = 1.87) was determined by NanoDrop, and two high-molecular-weight bands were observed above the 12-kb band of a 1-kb DNA ladder (Quick-Load 1 kb DNA Ladder, NEB) on a 0.5% agarose gel in 1× TAE stained with GelRed.
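The NanoDrop reading rests on the textbook conversion that an A260 of 1.0 (1-cm path) corresponds to roughly 50 ng·μL⁻¹ of double-stranded DNA, with an A260/A280 ratio near 1.8 usually taken as a sign of protein-free DNA. The Python below is a minimal sketch of that arithmetic; the 50 ng·μL⁻¹ factor is the standard dsDNA constant, while the 1.8 to 2.0 acceptance window is an assumed, commonly used threshold rather than one stated by the authors. The instrument performs this conversion internally, so the sketch only shows what the reported numbers imply.

```python
# Textbook conversion factor: A260 = 1.0 (1-cm path) ~ 50 ng/uL double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0

# Assumed acceptance window for A260/A280 of DNA (not stated by the authors).
PURITY_WINDOW = (1.8, 2.0)


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/uL) from a blank-corrected A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def a260_for(conc_ng_per_ul: float) -> float:
    """Invert the conversion: the A260 expected for a given dsDNA concentration."""
    return conc_ng_per_ul / DSDNA_NG_PER_UL_PER_A260


def purity_ok(a260: float, a280: float) -> bool:
    """True if the A260/A280 ratio falls inside the assumed purity window."""
    low, high = PURITY_WINDOW
    return low <= a260 / a280 <= high


# The reported values: 46.8 ng/uL with A260/A280 = 1.87.
a260 = a260_for(46.8)
a280 = a260 / 1.87
print(round(a260, 3))          # 0.936
print(purity_ok(a260, a280))   # True
```

Under these assumptions, the reported 46.8 ng·μL⁻¹ corresponds to an undiluted A260 of about 0.936, and the 1.87 ratio sits inside the assumed window.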
The genomic DNA was sheared with a Megaruptor 3 to an average fragment length of 13 kb. The sheared gDNA was converted into a library with the SMRTbell Express Template Prep Kit 3.0 at the Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign. The library was sequenced on one SMRT Cell 8M on a PacBio Sequel IIe in circular consensus sequencing (CCS) mode with a 30-h movie time. CCS analysis was performed on-instrument with SMRT Link v11.0 using the following parameters: --ccs --min-passes 3 --min-rq 0.99.
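The two CCS thresholds can be read as a simple per-read filter: a consensus read is kept only if it was built from at least three full passes and its predicted accuracy (PacBio's `rq` value, a number between 0 and 1) is at least 0.99, which corresponds to Phred Q20. The Python below is a minimal sketch of that filtering logic under those assumptions; it does not reimplement consensus calling, and the `CcsRead` type and the read names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CcsRead:
    name: str     # hypothetical read identifier
    passes: int   # number of full subread passes used for the consensus
    rq: float     # predicted read accuracy in [0, 1]


def rq_to_phred(rq: float) -> float:
    """Convert a predicted accuracy into a Phred-scaled quality value."""
    if rq >= 1.0:
        return math.inf
    return -10.0 * math.log10(1.0 - rq)


def passes_ccs_filters(read: CcsRead, min_passes: int = 3, min_rq: float = 0.99) -> bool:
    """Apply the same two thresholds as --min-passes 3 --min-rq 0.99."""
    return read.passes >= min_passes and read.rq >= min_rq


reads = [
    CcsRead("zmw/1/ccs", passes=12, rq=0.999),
    CcsRead("zmw/2/ccs", passes=2, rq=0.995),   # too few passes
    CcsRead("zmw/3/ccs", passes=8, rq=0.985),   # below Q20
]
kept = [r.name for r in reads if passes_ccs_filters(r)]
print(kept)                          # ['zmw/1/ccs']
print(round(rq_to_phred(0.99), 6))   # 20.0
```

Expressing the accuracy cutoff in Phred units makes it easier to compare with other quality thresholds: 0.99 is Q20, 0.999 is Q30.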
PacBio sequencing generated a total of 310,426 reads, with minimum and maximum read lengths of 1,246 bp and 34,778 bp, respectively. The mean read length was 9,743 bp (read N50, 10,501 bp). Read quality was assessed with FastQC v0.11.9 (16), and reads shorter than 10 kb were removed with cutadapt v4.0 (17). To reduce coverage to approximately 50-fold, 17,000 of the remaining reads were then randomly subsampled with seqtk v1.2-r94 (https://github.com/lh3/seqtk). The 17,000 reads used for genome assembly had a mean read length of 12,564 bp and a read N50 of 12,420 bp.
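The read statistics and preprocessing steps above reduce to a few lines of arithmetic. The Python sketch below mirrors the logic of the length filter (done with cutadapt) and the fixed-count random subsample (done with seqtk) on plain lists of read lengths; the function names and the seed are illustrative assumptions, not the authors' commands. It also computes read N50 and the expected fold coverage, which for 17,000 reads with a 12,564-bp mean length over a 3,983,052-bp genome comes to about 54-fold, in line with the ~50-fold target.

```python
import random


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one read length")


def length_filter(lengths, min_len=10_000):
    """Keep reads of at least min_len bases; shorter reads are discarded."""
    return [x for x in lengths if x >= min_len]


def subsample(lengths, n_keep=17_000, seed=11):
    """Uniform random draw of n_keep reads without replacement (seed is arbitrary)."""
    if len(lengths) <= n_keep:
        return list(lengths)
    return random.Random(seed).sample(lengths, n_keep)


def fold_coverage(total_bases, genome_size):
    """Expected mean depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Worked check with the reported numbers: 17,000 reads x 12,564 bp mean length.
print(round(fold_coverage(17_000 * 12_564, 3_983_052), 1))  # 53.6
```

Subsampling to a fixed read count, rather than a fraction, makes the target depth explicit: dividing the desired bases (about 50 × genome size) by the mean length of the filtered reads gives the number of reads to draw.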
Genome assembly was performed with Canu v2.2 (18), and assembly completeness was assessed with BUSCO v5.3.1 (19). The assembly consisted of a single circular contig of 3,983,052 bp with a G + C content of 47.59%. Annotation was performed with the rapid prokaryotic genome annotation tool Prokka v1.14.5 (20). Default parameters were used for all software unless otherwise noted. Predicted genes cover approximately 91.78% of the genome, which encodes 3,707 coding DNA sequences, 12 ribosomal RNA genes, 56 transfer RNA genes, 1 tmRNA gene, and 1 pseudogene.
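Both summary figures above, G + C content and the fraction of the genome covered by predicted genes, can be derived from the assembled sequence and the gene coordinates in the annotation. The Python below sketches one plausible way to compute them; how the authors obtained the 91.78% value (for example, whether RNA genes were counted) is not stated, so treat this as an illustration. It assumes 1-based inclusive coordinates, merges overlapping genes so shared bases count once, and does not handle genes that span the origin of the circular chromosome.

```python
def gc_content(seq: str) -> float:
    """Percent G + C over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


def coding_fraction(features, genome_len: int) -> float:
    """Percent of genome positions covered by at least one (start, end) feature.

    Coordinates are 1-based and inclusive; overlapping or adjacent features
    are merged so each base is counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(features):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_len


print(round(gc_content("ATGCGCNN"), 2))                        # 66.67
print(coding_fraction([(1, 10), (5, 20), (31, 40)], 100))       # 30.0
```

In practice the sequence would come from the assembly FASTA and the coordinates from the annotation output; the small inputs here are made up for illustration.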

DATA AVAILABILITY
This whole genome shotgun project has been deposited in GenBank under the accession no. CP113781; project data are available under BioProject accession number PRJNA902789, with BioSample accession number SAMN31775693 and SRA accession number SRR23686740.