Complete genome sequencing of Dehalococcoides sp. strain UCH007 using a differential reads picking method

A novel Dehalococcoides sp. strain, UCH007, was isolated from groundwater polluted with chlorinated ethenes in Japan. This strain is capable of dechlorinating trichloroethene, cis-1,2-dichloroethene and vinyl chloride to ethene. Dehalococcoides bacteria are difficult to cultivate, so sequencing their genomes has been challenging. In this study, we developed a differential reads picking method for mixed genomic DNA obtained from a co-culture, and applied it to the sequencing of strain UCH007. The genome of strain UCH007 consists of a 1,473,548-bp chromosome that encodes 1,509 coding sequences, including 29 putative reductive dehalogenase genes. Strain UCH007 is the first strain in the Victoria subgroup found to possess the pceA, tceA and vcrA genes.


Introduction
Chloroethenes such as tetrachloroethene (PCE), trichloroethene (TCE), cis-1,2-dichloroethene (cis-1,2-DCE) and vinyl chloride (VC) in contaminated soil and groundwater can be removed by reductive dechlorination mediated by anaerobic bacteria. Under anaerobic conditions, dehalorespiring bacteria dechlorinate chloroethenes by mediating the step-wise replacement of chlorine with hydrogen, resulting in the sequential conversion of PCE to TCE, DCE isomers, VC and ethene. Among the many dehalorespiring bacterial isolates, only a few strains of the genus Dehalococcoides completely convert chloroethenes to nontoxic ethene; hence, they are indispensable for successful bioremediation applications [1][2][3][4][5][6][7][8][9][10]. The reductive dehalogenases (RDases) are essential enzymes for the dehalorespiring activities of Dehalococcoides spp.; however, the complement of RDase genes varies significantly among strains, resulting in varied dechlorination activities. Among the RDase genes, vcrA and bvcA, whose products dechlorinate VC to ethene, are essential for complete dechlorination.
In our previous report, we constructed a chloroethene-dechlorinating microbial consortium derived from chloroethene-polluted groundwater in Japan, and identified some operational taxonomic units that were assigned to Dehalococcoides by amplicon sequencing of 16S rRNA genes [11]. In this report, we describe a Dehalococcoides bacterium designated strain UCH007, isolated from the consortium, and present its complete genome sequence. Strain UCH007, the first Dehalococcoides strain isolated in Japan, was phylogenetically affiliated with the Victoria subgroup of Dehalococcoides.

Classification and features
A cis-1,2-DCE-to-ethene dechlorinating enrichment culture was obtained from the microbial consortium [11] by sequential transfers to fresh media amended with acetate plus H2-CO2 (80 %:20 %, vol/vol) in the headspace and cis-1,2-DCE as the electron acceptor. Following repeated transfers to cis-1,2-DCE amended media in the presence of ampicillin or 2-bromoethanesulfonate, several series of dilution-to-extinction culturing and several agar shake processes were performed, and strain UCH007 was obtained in pure culture.
The cells of strain UCH007 were non-motile, non-spore-forming and had a disc-shaped morphology with a diameter of 0.1-0.3 μm (Fig. 1). The temperature range for growth of strain UCH007 was between 15 and 35°C, with optimum growth between 25 and 30°C. The pH range for growth of strain UCH007 was between 6.2 and 7.7, with an optimum pH between 7.0 and 7.3. The range of NaCl concentrations that allowed for growth of strain UCH007 was 0-1.5 %, with an optimum concentration of 0.3-0.5 %.
Strain UCH007 is a strictly anaerobic bacterium, and its growth depends on the presence of hydrogen as an electron donor, reductive dechlorination substrates such as TCE, cis-1,2-DCE, 1,1-DCE and VC as electron acceptors, and acetate as a carbon source. Vitamin B12 is essential for growth. The strain was observed to accumulate varying amounts of VC during TCE (or cis-1,2-DCE)-to-ethene dechlorination, but growth tended to be coupled with the reductive dechlorination of VC.
Dehalococcoides strains isolated to date share more than 98 % 16S rRNA gene sequence similarity with each other, and are grouped into three subgroups designated the Pinellas, Victoria and Cornell subgroups [1]. Phylogenetic analysis based on 16S rRNA gene sequences showed that strain UCH007 belongs to the Victoria subgroup, and that the most closely related strain is D. mccartyi strain VS, with 99.92 % similarity (Fig. 2). The most distantly related strain is D. mccartyi strain CBDB1, with 98.91 % similarity.
Fig. 2 [...] [33]. Dehalogenimonas lykanthroporepellens BL-DC-9T was used as an outgroup

Genome project history
Strain UCH007 is the first Dehalococcoides isolate from Japan and is one of the few strains found to convert toxic chloroethenes to nontoxic ethene. It was selected for sequencing on the basis of its rarity and importance in bioremediation. Table 1 presents the project information and its association with MIGS version 2.0 compliance [12]. A summary of the project information is shown in Table 2.

Growth conditions and genomic DNA preparation
Strain UCH007 was grown in pure culture in 300 mL of bicarbonate-buffered medium supplemented with 10 μM of cis-1,2-DCE for 47 days [3]; however, the number of cells was insufficient for genome sequencing using next-generation sequencers. Therefore, whole-genome amplification (WGA) was performed with the pure culture as a template, using the REPLI-g Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions.
Strain UCH007 was also co-cultured with Sulfurospirillum cavolei UCH003 [13] in bicarbonate-buffered medium for 36 days. Cells were harvested from 100 mL of the culture by centrifugation (12,000 × g, 15 min, 4°C). Total DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer's instructions. The effects of strain UCH003 on the growth of strain UCH007 will be described in a separate report (manuscript in preparation).

Genome sequencing and assembly
It was difficult to obtain sufficient genomic DNA for direct shotgun sequencing from the pure culture of strain UCH007. It was also difficult to construct a complete genome sequence using reads generated by WGA because of the high abundance of chimeric reads. Therefore, direct shotgun sequencing was performed using the mixed genomic DNA obtained from the co-culture. The differential reads picking method (Fig. 3) was then applied to pick out the reads that originated from strain UCH007.
The DNA obtained by WGA was sequenced using a 454 GS FLX Titanium pyrosequencer (Roche, Basel, Switzerland), generating 85,621 reads (WGA reads). The mixed genomic DNA extracted from the co-culture was directly sequenced using 454 GS FLX and Illumina MiSeq (Illumina, San Diego, CA, USA) sequencers, generating 213,427 reads and 3,332,948 reads (251-bp paired-end sequencing), respectively (DS reads). The reads from the MiSeq were trimmed using sickle software with default parameters [14]. After the DS reads from the 454 GS FLX were assembled using Newbler 2.6 (Roche) (Fig. 3; Step 1), the WGA reads were mapped to the resulting contigs using Newbler 2.8 (Fig. 3; Step 2). The DS reads from the 454 GS FLX that were contained in the mapped contigs were recovered and considered to originate from strain UCH007, yielding 47,262 reads (29,841,879 bp) (Fig. 3; Step 3). Next, these reads were assembled together with 2.5 million paired-end reads and 8,414 single-end reads from the MiSeq (approximately 100× coverage against the D. mccartyi VS genome) using Newbler 2.6 (Fig. 3; Step 4). The MiSeq reads that co-assembled with the 454 GS FLX reads were then picked, yielding 620,022 paired-end reads and 1,874 single-end reads (144,540,399 bp and 383,354 bp, respectively) (Fig. 3; Step 5). Finally, the picked DS reads from both the 454 GS FLX and the MiSeq were re-assembled, yielding 13 contigs (Fig. 3; Step 6). Genome closure was accomplished by manual adjustment of the assembly (Fig. 3; Step 7).
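The picking logic of Steps 3 and 5 can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that the assembler output has already been parsed into a mapping from contig names to the read IDs placed in each contig, and the function name `pick_reads` and all read and contig IDs are hypothetical. A contig is kept if it contains at least one "seed" read (WGA reads in Step 3, picked 454 reads in Step 5), and the non-seed reads in kept contigs are picked.

```python
def pick_reads(contig_members, seed_reads):
    """Return the non-seed reads found in any contig that contains a seed read.

    contig_members: dict mapping a contig name to the read IDs placed in it
                    (hypothetical structure; real assembler output must be
                    parsed into this form first).
    seed_reads:     read IDs already attributed to the target strain.
    """
    seeds = set(seed_reads)
    picked = set()
    for members in contig_members.values():
        members = set(members)
        if members & seeds:           # contig is supported by at least one seed
            picked |= members - seeds  # keep the other reads in that contig
    return picked


# Step 3 analogue: WGA reads act as seeds for picking 454 DS reads.
contigs_454 = {
    "c1": ["wga1", "ds1", "ds2"],  # target-strain contig
    "c2": ["ds3"],                 # no WGA support: co-culture partner
    "c3": ["wga2", "ds4"],
}
picked_454 = pick_reads(contigs_454, ["wga1", "wga2", "wga3"])

# Step 5 analogue: picked 454 reads act as seeds for picking MiSeq reads.
contigs_mixed = {
    "k1": ["ds1", "m1", "m2"],
    "k2": ["m3"],
}
picked_miseq = pick_reads(contigs_mixed, picked_454)

print(sorted(picked_454))
print(sorted(picked_miseq))
```

Using the same function for both steps reflects that each round only needs contig membership and a set of trusted reads; chimeric WGA reads serve as markers and are never assembled themselves.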

Genome annotation
The complete sequence of the chromosome was analyzed using MiGAP [15], which uses MetaGeneAnnotator [16] for predicting protein-coding genes, tRNAscan-SE [17] for tRNA genes and RNAmmer [18] for rRNA genes. The functions of the predicted protein-coding genes were assigned based on information in the UniProt [19], InterPro [20], HAMAP [21] and KEGG [22] databases, and in an in-house database composed of manually curated microbial genome sequences, as reported previously [23]. Genes in internal clusters were detected using BLASTclust with thresholds of 70 % covered length and 30 % sequence identity [24]. Signal peptides and transmembrane helices were predicted using SignalP [25] and TMHMM [26], respectively.

Genome properties
The genome of strain UCH007 consisted of a circular chromosome of 1,473,548 bp with a 46.91 % G+C content. The chromosome was predicted to contain 1,509 protein coding genes, 47 tRNA genes and 3 rRNA genes (Table 3 and Fig. 4). The distribution of protein coding genes into COG functional categories is shown in Table 4.
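As a back-of-the-envelope illustration, the figures reported above can be combined in a short Python sketch. The helper names `gc_content` and `mean_depth` are hypothetical, the toy sequence is not from strain UCH007, and the depth is only a rough estimate that assumes all picked bases map evenly onto the finished chromosome.

```python
GENOME_BP = 1_473_548      # chromosome length reported for strain UCH007
PROTEIN_CODING_GENES = 1509

# Bases in the picked DS reads (Steps 3 and 5 of the read-picking method).
PICKED_BP = {
    "454": 29_841_879,
    "miseq_paired": 144_540_399,
    "miseq_single": 383_354,
}


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def mean_depth(total_bases, genome_length):
    """Average sequencing depth, assuming uniform coverage."""
    return total_bases / genome_length


depth = mean_depth(sum(PICKED_BP.values()), GENOME_BP)
genes_per_kb = PROTEIN_CODING_GENES / (GENOME_BP / 1000)

print(f"toy G+C content: {gc_content('GGCCAATT'):.2f} %")
print(f"approximate depth of picked reads: {depth:.1f}x")
print(f"protein-coding genes per kb: {genes_per_kb:.3f}")
```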

Insights from the genome sequence
Average nucleotide identity (ANI) is becoming widely accepted as a method to delineate bacterial species, with an ANI value of 95-96 % corresponding to 70 % DNA relatedness [27,28]. Löffler et al. noted that strains BAV1, CBDB1 and GT (Pinellas subgroup) showed lower ANI values, 86-87 %, to strain VS (Victoria subgroup) and strain 195 (Cornell subgroup) [1]. However, they proposed only one species, D. mccartyi, to accommodate all six isolates belonging to the three different subgroups because of the high similarity of their gene contents and of their morphological and physiological characteristics. To make full use of the accumulating genomic sequences of Dehalococcoides, we recalculated ANI values based on ANIb using the JSpecies program with default settings. The results showed that strain UCH007 was closely related to strains GY50, CG1 and VS (Victoria subgroup), with ANI values of 98.52, 97.99 and 97.07 %, respectively (Additional file 1: Table S1), which are above the species threshold [27]. By comparison, strain UCH007 and the other members of the Victoria subgroup were more distantly related to strains 195T and CG4 (Cornell subgroup), with ANI values of 89.20-89.40 %, and to the other strains (Pinellas subgroup). Altogether, every strain in each of the three subgroups, each of which consists of at least two strains, showed ANI values below the 95-96 % threshold to all strains in the other two subgroups (Additional file 1: Table S1). These results suggest that the three subgroups of Dehalococcoides should be considered three separate species [27].
Fig. 3 The scheme of the differential reads picking method for sequencing of Dehalococcoides sp. strain UCH007
Fig. 4 Graphical circular map of the genome of Dehalococcoides sp. strain UCH007. The map was drawn using ArcWithColor [38]. From outside to the center: genes on the forward strand, genes on the reverse strand, rdhA genes (pceA gene, red; tceA gene, blue; vcrA gene, green), RNA genes (rRNAs, red; tRNAs, black), GC content, GC skew
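The threshold comparison in this section can be written as a small Python sketch. It only re-applies the 95-96 % species boundary to the ANI values quoted in the text; it does not compute ANI, and the function name `same_species` is hypothetical. For the Cornell subgroup, the two ends of the reported 89.20-89.40 % range are used as illustrative inputs.

```python
# Lower and upper ends of the 95-96 % ANI species boundary cited in the text.
THRESHOLDS = (95.0, 96.0)

# ANI values (%) between strain UCH007 and other Victoria subgroup strains,
# as quoted in the text.
VICTORIA_ANI = {"GY50": 98.52, "CG1": 97.99, "VS": 97.07}

# Ends of the range reported between Victoria and Cornell subgroup strains.
CORNELL_ANI_RANGE = (89.20, 89.40)


def same_species(ani, threshold):
    """True if an ANI value (%) meets the given species threshold."""
    return ani >= threshold


for threshold in THRESHOLDS:
    within = all(same_species(v, threshold) for v in VICTORIA_ANI.values())
    across = any(same_species(v, threshold) for v in CORNELL_ANI_RANGE)
    print(f"threshold {threshold} %: Victoria strains conspecific = {within}, "
          f"Victoria-Cornell conspecific = {across}")
```

The conclusion is the same at either end of the boundary, which is why the exact choice of 95 % or 96 % does not affect the interpretation.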
The genome of strain UCH007 harbors 29 rdhA and rdhB gene clusters, and four of these 29 RdhA proteins (UCH007_00760, UCH007_09900, UCH007_09930 and UCH007_13640) showed low similarities (<55 %) to those in other strains. High plasticity regions (HPRs) have been designated on the genomes of strains within the genus Dehalococcoides [9,29,30], and three and 22 rdhA genes in strain UCH007 are located in HPR1 and HPR2, respectively (Fig. 4). Strain BTF08, belonging to the Pinellas subgroup, was the first strain reported to contain the pceA, tceA and vcrA genes, which encode key enzymes in the reductive dechlorination of chloroethenes [9]. Strain UCH007 also contains orthologues of pceA (UCH007_13880), tceA (UCH007_12670) and vcrA (UCH007_12960), and is the first example of a strain containing these genes in the Victoria subgroup (Additional file 2: Table S2). The vcrA gene of strain UCH007 was detected in a genomic island located downstream of the ssrA gene, as is the case with other Dehalococcoides strains [9,31].
Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated genes were detected in HPR2 in the genome of strain UCH007 (UCH007_13260-13330), and 40 spacer regions (start position: 1,300,493 bp, end position: 1,302,966 bp) were predicted using the online CRISPRfinder program [32]. CRISPR-associated genes have previously been found only in the Pinellas subgroup strains CBDB1, DCMB5 and GT [9,29,30], so this is the first report of a CRISPR region in the Victoria subgroup. A bidirectional BLASTP search of the CRISPR-associated proteins showed sequence identity of more than 73 % between strain UCH007 and the other strains (Additional file 3: Table S3). The direct repeat was 29 bp in length, and the consensus sequence (5′-GTATTCCCCACGCgTGTGGGGGTGAACCG-3′) was conserved among the four strains, with the exception of the base shown in lowercase [32]. Therefore, these CRISPRs seem to share a common evolutionary origin.
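To illustrate how the 29-bp direct repeat delimits spacers, the following Python sketch scans a sequence for the consensus, allowing any base at the single lowercase position. It is a simplified stand-in for CRISPRfinder, not a reimplementation of it; the function names and the toy array (two made-up 10-bp spacers) are hypothetical.

```python
import re

# Direct repeat consensus from the text; the lowercase base is the one
# position that is not conserved among the four strains.
CONSENSUS = "GTATTCCCCACGCgTGTGGGGGTGAACCG"


def repeat_pattern(consensus):
    """Regex in which uppercase bases match exactly and lowercase bases match any base."""
    return re.compile(
        "".join(base if base.isupper() else "[ACGT]" for base in consensus)
    )


def find_spacers(sequence, consensus=CONSENSUS):
    """Return the sequences lying between consecutive direct repeats."""
    hits = list(repeat_pattern(consensus).finditer(sequence))
    return [sequence[a.end():b.start()] for a, b in zip(hits, hits[1:])]


# Toy array: three repeats (one with a variant base) around two spacers.
repeat_g = CONSENSUS.upper()
repeat_a = CONSENSUS.replace("g", "A")
toy_array = ("TTT" + repeat_g + "ACGTACGTAC" + repeat_a
             + "GGGTTTAAAC" + repeat_g + "CCC")

spacers = find_spacers(toy_array)
print(len(CONSENSUS), spacers)
```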

Conclusions
Here we report the isolation and complete genome sequence of Dehalococcoides strain UCH007, which can dechlorinate chloroethenes to ethene. The genome sequence showed that strain UCH007 is the first strain in the Victoria subgroup of Dehalococcoides found to possess the pceA, tceA and vcrA genes on the chromosome. As this strain is currently being considered for use in the bioaugmentation of chloroethene-contaminated groundwater, this information will be useful for monitoring and improving the bioaugmentation process through, for example, metagenomic and metatranscriptomic analyses.