Four Draft Single-Cell Genome Sequences of Novel, Nearly Identical Kiritimatiellaeota Strains Isolated from the Continental Deep Subsurface

The recently proposed bacterial phylum Kiritimatiellaeota represents a globally distributed monophyletic clade distinct from other members of the Planctomycetes, Verrucomicrobia, and Chlamydiae (PVC) superphylum. Here, we present four phylogenetically distinct single-cell genome sequences from within the Kiritimatiellaeota lineage sampled from deep continental subsurface aquifer fluids of the Death Valley Regional Flow System in the United States.

strainer (Becton Dickinson, Franklin Lakes, NJ, USA), and incubated with SYTO-9 DNA stain (Thermo Fisher Scientific, Waltham, MA, USA) at a final concentration of 5 µM for 10 to 60 min. Fluorescence-activated cell sorting was performed with a BD InFlux Mariner flow cytometer equipped with a 488-nm laser and a 70-µm nozzle orifice (Becton Dickinson). The cytometer was triggered on side scatter, and the "single-1 drop" mode was used for maximal sort purity. The sort gate was defined based on particle green fluorescence, light side scatter, and the ratio of green versus red fluorescence (for improved discrimination of cells from detrital particles). For each sample, individual cells were deposited into 384-well plates containing 600 nl per well of 1× TE buffer and stored at −80°C prior to subsequent processing. Of the 384 wells, 317 wells were dedicated to single particles, 64 wells were used as negative controls (no droplet deposition), and 3 wells received 10 particles each to serve as positive controls. Cells were lysed, and their DNA was denatured with 5 freeze-thaw cycles, the addition of 700 nl of lysis buffer (0.4 M KOH, 10 mM EDTA, and 100 mM dithiothreitol), and a subsequent 10-min incubation at 20°C. Lysis was terminated by the addition of 700 nl of 1 M Tris-HCl at pH 4.
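The plate layout and per-well reagent volumes given above can be checked with a few lines of arithmetic. The following is only a sketch: the variable and function names are illustrative, and it assumes that volumes are simply additive and that the sorted droplet contributes negligible volume, neither of which is stated in the text.

```python
# Plate layout as described in the text: wells per category on a 384-well plate.
PLATE_WELLS = 384
layout = {
    "single_particle": 317,
    "negative_control": 64,      # no droplet deposition
    "positive_control_10x": 3,   # 10 particles per well
}
assert sum(layout.values()) == PLATE_WELLS


def diluted_concentration(stock: float, added_nl: float, total_nl: float) -> float:
    """Concentration of a reagent after adding `added_nl` of stock to reach `total_nl`."""
    return stock * added_nl / total_nl


# Per-well volumes from the text (nl). Additivity is an assumption of this sketch.
te_nl, lysis_nl, tris_nl = 600, 700, 700

volume_during_lysis_nl = te_nl + lysis_nl                       # 1300 nl
volume_after_neutralization_nl = volume_during_lysis_nl + tris_nl  # 2000 nl

# KOH stock is 0.4 M in the lysis buffer; Tris-HCl stock is 1 M.
koh_during_lysis_molar = diluted_concentration(0.4, lysis_nl, volume_during_lysis_nl)
tris_final_molar = diluted_concentration(1.0, tris_nl, volume_after_neutralization_nl)

print(f"KOH during lysis: {koh_during_lysis_molar:.3f} M")
print(f"Tris-HCl after neutralization: {tris_final_molar:.2f} M")
```

Under these assumptions, KOH is at roughly 0.22 M during lysis, and the neutralized well holds 2,000 nl with Tris-HCl at 0.35 M.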
Sequencing libraries were created for each single cell with the Nextera XT DNA library preparation kit (Illumina, San Diego, CA, USA) with the following modifications: purification was performed with column cleanup kits (Qiagen, Venlo, the Netherlands), and library selection was performed with BluePippin (Sage Science, Beverly, MA, USA) with a target sequence size of 500 ± 50 bp. Libraries were sequenced with the NextSeq 500 platform (Illumina) and V1 reagents (2 × 150-bp paired-end sequencing). Raw sequencing reads for each single amplified genome (SAG) were quality trimmed with Trimmomatic v0.32 (8), reads with 95% or greater nucleotide identity with the Homo sapiens reference genome assembly (GRCh38) were removed, and low-complexity reads (containing less than 5% of any nucleotide) were removed as described previously (7). Quality-filtered reads were normalized in silico with kmernorm 1.05 (http://sourceforge.net/projects/kmernorm) using the settings -k 21 -t 30 -c 3 and subsequently assembled into contigs with SPAdes v3.9.0 (9) with the following settings: --careful --sc --phred-offset 33. Contig ends (100 bp) were trimmed, and contigs of fewer than 2,000 bp were discarded. Genome completeness and potential contamination were estimated with CheckM v1.0.8 (10). Predicted genome size was calculated by dividing assembly size by estimated genome completeness. Assembly quality for each SAG was determined according to minimum information about a single amplified genome (MISAG) standards (11). Protein-encoding regions were identified with the Rapid Annotations using Subsystems Technology (RAST) server (12), and genes were annotated with Koala (KEGG) (13) and InterProScan 5 (14). Average nucleotide identity (ANI) of reciprocal hits between genome assemblies was calculated using the online ANI calculator (http://enve-omics.ce.gatech.edu/ani/) (15). Assembly statistics are shown in Table 1.
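Three of the filtering and estimation steps described above are simple enough to sketch in Python. This is an illustration, not the pipeline actually used: the function names are hypothetical, "low complexity" is interpreted as any of the four canonical bases making up less than 5% of a read, and the 2,000-bp cutoff is assumed to apply to contig length after end trimming (the text does not specify the order).

```python
from collections import Counter
from typing import Iterable, List


def is_low_complexity(read: str, min_fraction: float = 0.05) -> bool:
    """True if any of A, C, G, or T makes up less than `min_fraction` of the read."""
    read = read.upper()
    if not read:
        return True
    counts = Counter(read)
    return any(counts.get(base, 0) / len(read) < min_fraction for base in "ACGT")


def trim_and_filter_contigs(
    contigs: Iterable[str], trim: int = 100, min_length: int = 2000
) -> List[str]:
    """Remove `trim` bases from both contig ends, then drop contigs under `min_length`.

    Assumption: the length cutoff is applied to the trimmed sequence.
    """
    trimmed = (seq[trim:len(seq) - trim] for seq in contigs)
    return [seq for seq in trimmed if len(seq) >= min_length]


def predicted_genome_size(assembly_bp: float, completeness_percent: float) -> float:
    """Assembly size divided by estimated completeness (given as a percentage)."""
    if completeness_percent <= 0:
        raise ValueError("completeness must be greater than zero")
    return assembly_bp / (completeness_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(is_low_complexity("A" * 150))            # homopolymer read
    print(is_low_complexity("ACGT" * 30))          # balanced read
    print(len(trim_and_filter_contigs(["A" * 2300, "C" * 2100, "G" * 50])))
    print(predicted_genome_size(2_000_000, 50.0))
```

Because the predicted genome size is a ratio, its uncertainty grows quickly as the completeness estimate approaches zero, so estimates from very incomplete assemblies should be read with caution.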
Based on the detection of conserved single-copy marker genes in the 3 most complete SAG assemblies, we predict that BLM1 Kiritimatiellaeota genome sequences contain 3.8 to 4.2 Mbp. The CheckM-based predicted genome size of the smallest SAG (AH-151-K23) was 3 times higher than values for the other SAGs. CheckM estimates genome completeness and contamination of genome assemblies based on the presence and location of lineage-specific marker genes selected from the phylogenetic placement (based on single-copy marker genes in the assembly) of the assembly in a built-in reference genome tree (10). Of the 104 marker genes used by CheckM to assess genome completeness and contamination for AH-151-K23, only 2 genes were found in the assembly (threonylcarbamoyl adenosine biosynthesis protein TsaE [accession no. PF02367] and Holliday junction DNA helicase RuvA [accession no. TIGR00084]), ultimately resulting in 3.4% estimated genome completeness. Furthermore, compared to the other SAGs, the largest contig size (30 kb) and N50 value (7.7 kb) associated with AH-151-K23 were ~2 to 5 times lower. A combination of the absence of phylogenetically informative marker genes in the assembly, low genome recovery (small assembly), and relatively short contigs contributed to very low genome completeness and high genome size predictions for this SAG.
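For readers unfamiliar with the N50 statistic cited above, the following Python sketch shows one common way to compute it: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The function name and the example contig lengths are hypothetical, and assembly tools may differ slightly in how they handle ties.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Contig length at which the running total (longest first) reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise RuntimeError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths (bp), not taken from Table 1.
    example = [30_000, 12_000, 9_000, 7_700, 5_000, 3_000, 2_500]
    print(n50(example))
```

A fragmented assembly made of many short contigs yields a low N50, which is consistent with the description of AH-151-K23 above.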
All four SAGs had identical 16S rRNA genes and shared greater than 99% average nucleotide identity. The 16S rRNA gene sequence has 82.7% sequence identity with K. glycovorans L21-Fru-ABT (GenBank accession no. KC665948) (1), suggesting that these 2 organisms belong to genetically distinct lineages. The SAGs encode a variety of glycosyl hydrolases, including cellulases (GH5), β-xylosidases (GH39), D-4,5-unsaturated β-glucuronyl hydrolases (GH88), glucoamylases (GH97), and endo-α-N-acetylgalactosaminidases (GH101) as well as many uncharacterized sulfatases. These results suggest that these organisms may have the capacity for degradation of complex polysaccharides and glycoproteins to obtain carbon, amino acids, and sulfur, as has been previously suggested for members of this phylum (1). Comprehensive reconstruction of the metabolic pathways encoded in the SAGs will further deepen our understanding of the ecology of these unique Kiritimatiellaeota strains in the deep continental subsurface.
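As a rough illustration of what a pairwise sequence identity value such as the 82.7% above expresses, the sketch below computes percent identity from two sequences that have already been aligned. It is a simplification and not the method used here: the function name is hypothetical, columns containing a gap in either sequence are ignored (other conventions count them as mismatches), and the ANI calculator cited in the methods instead works on reciprocal hits between genome fragments.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of matching positions among alignment columns without gaps.

    Both inputs must come from the same pairwise alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption of this sketch: gapped columns are skipped
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    print(percent_identity("ACGT-A", "ACCTTA"))
```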
Data availability. Raw sequencing reads and genome assemblies for the four SAGs have been deposited in the EMBL ENA under project no. PRJEB30981. SAG-specific accession numbers are listed in Table 1.