Complete genome sequence of Thauera aminoaromatica strain MZ1T

Thauera aminoaromatica strain MZ1T, an isolate belonging to the genus Thauera, the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and to degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774.


Introduction
Strain MZ1T (= DSM 25461 = MTCC 11151 = LMG 26735), a Gram-negative bacterium, was isolated from activated sludge samples from the industrial wastewater treatment facility of Eastman Chemical Company, Kingsport, Tennessee [1]. It is related to the genera Azoarcus and Zoogloea, the latter being another prominent community member of activated sludge. Strain MZ1T was identified as a significant component of microbial clusters formed during viscous bulking that resulted in poor sludge dewaterability and increased costs for dewatering, incineration and disposal [2]. Subsequently, MZ1T was found to produce a novel exopolysaccharide which contributed to the viscous bulking phenomenon. The genus Thauera is named after the German microbiologist Rudolf Thauer and was described by Macy et al. [3]. Currently, this genus consists of nine species with validly published names. These species have been isolated from a wide range of environments including wastewater activated sludge, water and soil, and typically degrade aromatic compounds such as benzoic acid or toluene under anaerobic conditions [3][4][5][6][7][8]. Here we present a summary classification and a set of features for T. aminoaromatica MZ1T, along with a description of the complete genome sequencing and annotation.

Classification and features
Strain MZ1T was originally identified as belonging to the genus Thauera based on 16S rRNA phylogenetic analysis [1]. The sequences of the four 16S rRNA gene copies in the genome do not differ from each other. However, they differ from the previously published 16S rRNA sequence (AF110005), which contains one gap and eleven ambiguous base calls. Figure 1 shows the phylogenetic relationship of T. aminoaromatica MZ1T to other Thauera species in a 16S rRNA based tree. In this tree, strain MZ1T groups closely with T. aminoaromatica S2, T. phenylacetica B4P and T. selenatis, and the cluster of these four strains is well separated from strains of T. aromatica, T. chlorobenzoica, T. mechernichensis, T. terpenica, T. butanivorans and T. linaloolentis. DNA-DNA hybridization between strain MZ1T and T. selenatis ATCC 55363, T. phenylacetica B4P DSM 14743 and T. aminoaromatica S2 DSM 14742 was performed by Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ) (Braunschweig, Germany). These studies showed that MZ1T was 100% similar to strain S2, 78.9% similar to strain B4P and 59.6% similar to T. selenatis ATCC 55363. When the recommended threshold value of 70% DNA-DNA similarity is used for the definition of bacterial species [4], MZ1T does not belong to the same species as T. selenatis ATCC 55363 but does belong to the same species as strain S2. Based on these results we recommend that MZ1T be classified as Thauera aminoaromatica strain MZ1T.

Morphologically, cells of strain MZ1T are Gram-negative short rods (0.5 x 1.1-1.8 µm) that are motile by means of a polar flagellum (Figure 2). Colonies are slimy and creamy white in color; optimal growth occurs at 30 ºC and pH 7.2. Strain MZ1T grows aerobically in Stoke's medium at 30 ºC with shaking at 150 rpm and produces copious quantities of extracellular polysaccharide from relatively simple short-chain fatty acids at early stationary phase [2]. However, when grown on agar plates, no obvious exopolysaccharide is observed. Under aerobic conditions, benzoate, succinate, aspartate, glutamate, proline, leucine, serine and alanine are utilized. Under anaerobic conditions MZ1T is capable of growth on benzoate with nitrate as the terminal electron acceptor. The characteristic features of the organism are listed in Table 1.

Table 1. Current classification (property, term, evidence code)
Phylum     'Proteobacteria'          TAS [7]
Class      Betaproteobacteria        TAS [8,9]
Order      Rhodocyclales             TAS [8,10]
Family     Rhodocyclaceae            TAS [8,11]
Genus      Thauera                   TAS [3,12]
Species    Thauera aminoaromatica
Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [15]. If the evidence code is IDA, the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Standards in Genomic Sciences

Genome project history
This organism was selected for sequencing under the DOE Joint Genome Institute (JGI) Community Sequencing Program (CSP). The genome project is deposited in the Genomes OnLine Database (GOLD) [16] and the complete genome sequence is deposited in GenBank (CP001281). Sequencing, finishing and annotation were performed by the DOE JGI. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
Strain MZ1T was grown aerobically in Stoke's medium at 30 ºC with shaking at 150 rpm [2]. Genomic DNA was extracted using a modified cetyl trimethyl ammonium bromide (CTAB) DNA extraction protocol [17]. Briefly, 100 ml of overnight culture was used for DNA isolation. After incubation with CTAB extraction buffer at 60 ºC for 1 h, cells were lysed and proteins precipitated using an equal volume of chloroform-isoamyl alcohol (24:1). The aqueous phase was separated, and one half volume of 5 M NaCl was added to it, followed by two volumes of cold ~95% ethanol to precipitate the DNA. The DNA was dissolved in Tris-EDTA (TE) overnight at 4 to 6 ºC. After RNase treatment followed by phenol/chloroform extraction, 1/10 volume of 2 M sodium acetate and 2 volumes of absolute ethanol were added to re-precipitate the DNA. Finally, the DNA was dissolved in TE. The purity, quality and size of the bulk gDNA preparation were assessed by JGI according to DOE-JGI guidelines.

Genome sequencing and assembly
The genome of T. aminoaromatica strain MZ1T was sequenced at the JGI using a combination of 8 kb and 40 kb fosmid DNA libraries. In addition to Sanger sequencing, 454 pyrosequencing was done to a depth of 20 × coverage. All general aspects of library construction and sequencing performed by JGI can be found at the JGI website [18]. Draft assemblies were based on 47,422 total reads. The combined libraries provided 9.0 × coverage. The Phred/Phrap/Consed software package [19] was used for sequence assembly and quality assessment [20][21][22]. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher [23] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walks or PCR amplification (Roche Applied Science, Indianapolis, IN). A total of 2,230 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The completed genome sequence of T. aminoaromatica strain MZ1T contains 49,771 reads in the chromosome and 2,819 reads in the plasmid, giving an average per-base coverage of 9.3 × for the chromosome and 29.8 × for the plasmid, with an error rate of 0 in 100,000.
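The coverage figures above can be sanity-checked with simple arithmetic. The sketch below assumes the common approximation coverage = (number of reads × mean read length) / replicon length, which is not stated in the text, and ignores read trimming and the 454 data. The function names are illustrative only. Under that assumption, the read counts and coverages reported above imply a mean read length in the range expected for Sanger reads.

```python
# Minimal sketch: relate read count, mean read length and fold coverage.
# Assumption (not from the text): coverage = n_reads * mean_read_bp / replicon_bp.

def fold_coverage(n_reads: int, mean_read_bp: float, replicon_bp: int) -> float:
    """Average per-base coverage for a replicon of length replicon_bp."""
    return n_reads * mean_read_bp / replicon_bp


def implied_mean_read_length(coverage: float, replicon_bp: int, n_reads: int) -> float:
    """Mean read length implied by a reported coverage and read count."""
    return coverage * replicon_bp / n_reads


# Values reported in the text for the finished assembly.
chromosome_read_bp = implied_mean_read_length(9.3, 4_496_212, 49_771)
plasmid_read_bp = implied_mean_read_length(29.8, 78_374, 2_819)

print(f"chromosome: implied mean read length ~{chromosome_read_bp:.0f} bp")
print(f"plasmid:    implied mean read length ~{plasmid_read_bp:.0f} bp")
```

Both replicons give an implied mean read length of roughly 830 to 840 bp, so the reported read counts and coverages are mutually consistent under this approximation.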

Genome annotation
The genes were annotated through the Oak Ridge National Laboratory genome annotation pipeline using Prodigal [24], followed by a round of manual curation using the JGI GenePRIMP pipeline [25]. Predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG and InterPro databases. Data sources were then combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [26], RNAmmer [27], Rfam [28], TMHMM [29] and SignalP [30].

Genome properties
The genome contains one chromosome and one plasmid for a total genome size of 4.5 Mb (Table 3, Figure 3A and Figure 3B). The circular chromosome is 4,496,212 bp in length with a coding density of 89%, a GC content of 68%, 4,071 protein-coding genes, 71 structural RNA genes, 93 pseudogenes and 4 copies each of the 5S, 16S and 23S rRNA genes. About 62% of predicted genes begin with ATG, 30% begin with TTG, and 7% begin with GTG. Table 4 shows the distribution of genes in COG categories. The plasmid (pTha01) is 78,374 bp in size and has a GC content of 62%, a coding density of 77%, 75 protein-coding genes, 4 pseudogenes and no structural RNA genes.
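Summary statistics of this kind can be recomputed directly from sequence data. The following is a minimal sketch, not the pipeline used for the annotation: the helper names and the toy sequences are hypothetical, GC content is computed over unambiguous bases only, and start-codon usage is taken from the first three bases of each coding sequence.

```python
# Minimal sketch of genome summary statistics; toy inputs are hypothetical.
from collections import Counter


def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def start_codon_percent(cds_list: list[str]) -> dict[str, float]:
    """Percentage of coding sequences that begin with each start codon."""
    counts = Counter(cds[:3].upper() for cds in cds_list if len(cds) >= 3)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Replicon lengths reported in the text.
CHROMOSOME_BP = 4_496_212
PLASMID_BP = 78_374
total_genome_bp = CHROMOSOME_BP + PLASMID_BP

# Hypothetical toy coding sequences, for illustration only.
toy_cds = ["ATGGCC", "ATGCCG", "TTGGCG", "GTGCGC"]

print(f"total genome size: {total_genome_bp:,} bp")
print(f"GC of toy sequence: {gc_percent('GGCCAT'):.1f}%")
print(f"start codon usage: {start_codon_percent(toy_cds)}")
```

In practice the inputs would be the replicon sequences and the predicted CDS set from the GenBank record (CP001281), parsed with a sequence library of the reader's choice.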

Insights from the genome
Annotation of the genome indicated that strain MZ1T has complete glycolytic and citric acid cycle pathways along with two complete acetate assimilation pathways, with the key enzymes being acetate-CoA ligase and acetate kinase-phosphate acetyl transferase, respectively, thereby allowing MZ1T to utilize acetate as a carbon source [31]. Three putative gene clusters responsible for exopolysaccharide biosynthesis, polymerization and export were found. The discovery of the wzy gene in one of the clusters implicates a Wzy-dependent pathway of polysaccharide synthesis and export in MZ1T [32][33][34]. Unlike other related Thauera spp. [35][36][37], MZ1T does not appear to have genes for anaerobic toluene or phenol degradation; however, genes for both anaerobic and aerobic benzoate degradation are present. The genome of MZ1T contains a total of six sigma factors controlling global gene regulation. These include the housekeeping sigma factor σ70, the nitrogen regulator σ54, the heat shock sigma factor σ32, as well as three copies of extracytoplasmic function (ECF) sigma factor [38]. MZ1T has a large number of genes encoding diverse transporter proteins and proteins involved in chemotaxis. More than ten copies of two-component regulatory systems, as well as genes known to be related to toxin-antitoxin plasmid addiction systems, replication-partition systems and stabilization factors such as Par-like systems, were found distributed across both the plasmid and the chromosome. Additionally, genes encoding efflux pumps for resistance to the heavy metals arsenic, cadmium, lead, silver and zinc, but not selenium, were found on the plasmid. Furthermore, both the plasmid and the chromosome contain numerous transposases, integrases and recombinases, which indicates that genetic rearrangement occurs widely in this strain.

Figure 3B. Graphical circular map of the T. aminoaromatica MZ1T plasmid pTha01. The outermost two circles (circles 1 and 2) show the genes in the forward and reverse strands, respectively; different colors indicate different function categories. The next circle (circle 3) shows RNA genes (tRNAs green, rRNAs red, other RNAs black); circle 4 shows the GC content, and circle 5 shows the GC skew.

In liquid culture, MZ1T grows as planktonic cells until late log phase, during which it forms characteristic flocs or cell clusters and then settles out. It was hypothesized that this phenotype may be related to a quorum sensing mechanism. Genes with possible roles in quorum sensing were identified, including an acyl-acyl-carrier protein synthase and luxR response regulator (12 copies). However, neither an N-acyl-homoserine lactone synthetase nor a homologue was found, which does not support the hypothesis that quorum sensing is one of the mechanisms involved in floc formation. The genome also encodes adhesion-related proteins which could be linked to exopolysaccharide production, quorum sensing or "clumping". Therefore, we speculate that the response of MZ1T to changing environmental conditions involves a complex system that includes exopolysaccharide production and flocculation once the cells reach adequate density. Thus, the complete genome sequence of strain MZ1T provides an opportunity to study the biology of important adaptive factors.