Genome sequence and description of the anaerobic lignin-degrading bacterium Tolumonas lignolytica sp. nov.

Tolumonas lignolytica BRL6-1T sp. nov. is the type strain of T. lignolytica sp. nov., a proposed novel species of the genus Tolumonas. This strain was isolated from tropical rainforest soil based on its ability to use lignin as its sole carbon source. Cells of T. lignolytica BRL6-1T are mesophilic, non-spore-forming, Gram-negative rods that are oxidase- and catalase-negative. The genome of this isolate was sequenced and assembled into seven unique contigs totaling 3.6 Mbp, enabling the characterization of several putative pathways for lignin breakdown. In particular, we found an extracellular peroxidase involved in lignin depolymerization, as well as several enzymes involved in cleaving β-aryl ether bonds, the most abundant linkage between lignin monomers. We also found genes for enzymes involved in the metabolism of ferulic acid, a common product of lignin breakdown. By characterizing pathways and enzymes employed in the bacterial breakdown of lignin in anaerobic environments, this work should assist in engineering efficient biofuel production from lignocellulosic material.


Introduction
The exponential increase in anthropogenic greenhouse gas emissions since the industrial revolution has drastically affected Earth's climate, creating a need for clean, renewable energy that can mitigate the consequences of burning fossil fuels. Second-generation biofuels are a promising source of sustainable energy because they are derived from lignocellulose, the most abundant renewable biomass on Earth. However, this material is highly recalcitrant because lignin occludes the cellulose, and the microbial pathways for lignin degradation are not yet well understood.
Lignin is a complex aromatic heteropolymer present in the cell walls of plants, where it comprises 10-30 % of cell wall material [1]. Lignin forms intricate associations with cellulose, the most abundant component of the cell wall, and serves as a defense for plants by blocking the access of cellulase enzymes and thereby resisting microbial breakdown.
Consequently, the production of biofuels from plant biomass is physically and chemically hindered by lignin and its links to cellulose [1]. Aerobic lignin degradation has been studied extensively in fungi, and this work suggests that lignolytic extracellular peroxidases and laccases play a significant role in the mineralization of lignin in soil [2,3]. Recent studies of bacterial breakdown and modification of lignin have found that members of the Alphaproteobacteria, Gammaproteobacteria, Firmicutes and Actinomycetes are major players in lignin degradation, in both soil and insect guts [4]. Among bacterial degraders of lignin or phenolic compounds, Sphingomonas paucimobilis SYK-6 produces a β-aryl etherase [5], Rhodococcus sp. RHA1 contains a β-ketoadipate pathway [6], and Kocuria and Staphylococcus species likely also degrade aromatic compounds derived from lignocellulose [7]. Although many lignolytic bacteria grow in oxygen-depleted environments [8], it has been suggested that they employ oxygen-requiring peroxidases similar to those used by fungi [9].
To address the need for more efficient removal of lignin from lignocellulose and thereby streamline biofuel production, we isolated anaerobic bacteria from tropical rainforest soil in Puerto Rico. Humid tropical forest soils, such as those at the Long Term Ecological Research site in the Luquillo Experimental Forest, Puerto Rico, have been shown to have among the fastest rates of plant litter decomposition globally [10], despite their low and fluctuating redox potential [11]. Frequent episodes of anoxia in the Luquillo Forest inhibit fungal growth [12], suggesting that bacteria are responsible for the observed litter decomposition; these soils are therefore an optimal environment for isolating bacteria involved in the anaerobic decomposition of plant litter, including cellulose and lignin. Bacteria are more amenable to genetic modification than fungi and are thus more easily incorporated into biofuel processing technology, for instance through metabolic engineering. Additionally, bacteria capable of metabolizing lignin anaerobically are attractive for industrial biofuel production, since current technology relies on anaerobic digesters to process plant waste into biofuels [13]. With this in mind, we isolated and characterized a bacterium capable of anaerobic lignin degradation, Tolumonas lignolytica BRL6-1T sp. nov., and here provide a summary of its genome sequence and annotation.

Classification and features
Tolumonas lignolytica BRL6-1T was isolated from soil collected at the Bisley watershed site of the El Yunque National Forest, part of the Luquillo Experimental Forest in Luquillo, Puerto Rico, USA (Table 1; Additional file 1). Soils were diluted in water and used to inoculate anaerobic roll tubes containing a modified CCMA medium consisting of 2.8 g L⁻¹ NaCl, 0.1 g L⁻¹ KCl, 27 mM MgCl2, 1 mM CaCl2, 1.25 mM NH4Cl, 9.76 g L⁻¹ MES, 1.1 ml L⁻¹ K2HPO4, 12.5 ml L⁻¹ trace minerals [14,15], and 1 ml L⁻¹ Thauer's vitamins [16], with alkali lignin added as the sole source of carbon.
Tubes were incubated at ambient temperature for 3 months before colonies were picked and characterized.
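The molar components of the modified CCMA medium can be converted to weighable masses with simple arithmetic. The sketch below is illustrative only: it assumes anhydrous MgCl2, CaCl2 and NH4Cl and the MES free acid (the recipe does not state hydration states), and the helper names are hypothetical. As a cross-check, 9.76 g L⁻¹ of MES free acid corresponds to roughly 50 mM buffer.

```python
# Molar masses (g/mol). Anhydrous salts and MES free acid are ASSUMED;
# the recipe does not say which hydrates were used.
MOLAR_MASS = {"MgCl2": 95.21, "CaCl2": 110.98, "NH4Cl": 53.49, "MES": 195.24}

# Components given as molar concentrations (mM) in the recipe.
RECIPE_MM = {"MgCl2": 27.0, "CaCl2": 1.0, "NH4Cl": 1.25}


def grams_per_litre(conc_mm: float, molar_mass: float) -> float:
    """Mass (g) of a solute needed per litre to reach conc_mm millimolar."""
    return conc_mm / 1000.0 * molar_mass


def millimolar(grams_per_l: float, molar_mass: float) -> float:
    """Millimolar concentration produced by grams_per_l of a solute."""
    return grams_per_l / molar_mass * 1000.0


def batch_masses(volume_l: float) -> dict:
    """Grams of each mM-specified salt for a batch of volume_l litres (3 d.p.)."""
    return {salt: round(grams_per_litre(mm, MOLAR_MASS[salt]) * volume_l, 3)
            for salt, mm in RECIPE_MM.items()}


print(batch_masses(1.0))                               # grams per litre
print(round(millimolar(9.76, MOLAR_MASS["MES"]), 1))   # MES buffer strength, mM
```

If hydrated salts were used instead, only the entries in MOLAR_MASS need to change.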
Sanger sequencing of the small subunit ribosomal RNA (16S rRNA) gene was performed using the universal primers 27F and 1492R [17]. BLAST analysis showed 97 % identity to the full-length 16S rRNA gene of the Tolumonas auensis type strain TA 4, suggesting that BRL6-1T represents a potentially novel species of the genus Tolumonas, within the family Aeromonadaceae of the Gammaproteobacteria (Fig. 2a). Because the 16S rRNA gene sequence alone is not sufficient to clearly resolve the evolutionary history of this cluster of the Gammaproteobacteria, a hierarchical clustering of whole genomes based on COG profiles was also constructed [18] (Fig. 2b). This clustering supports the placement of T. lignolytica BRL6-1T as a novel species within the genus Tolumonas.
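The whole-genome clustering step can be illustrated with a minimal sketch of average-linkage (UPGMA-style) agglomerative clustering on COG count profiles. This is not the IMG implementation: the Euclidean distance between relative COG frequencies, the function names, and the toy counts are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def normalise(profile: dict) -> dict:
    """Convert raw COG counts into relative frequencies."""
    total = sum(profile.values())
    return {cog: n / total for cog, n in profile.items()}


def distance(p: dict, q: dict) -> float:
    """Euclidean distance between two normalised COG profiles (assumed metric)."""
    cogs = set(p) | set(q)
    return math.sqrt(sum((p.get(c, 0.0) - q.get(c, 0.0)) ** 2 for c in cogs))


def average_linkage(profiles: dict) -> str:
    """Agglomerative clustering with average linkage (UPGMA topology).

    Returns a Newick-style topology string without branch lengths.
    """
    norm = {name: normalise(prof) for name, prof in profiles.items()}
    clusters = {name: [name] for name in norm}  # label -> member genomes

    def mean_distance(a: str, b: str) -> float:
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(distance(norm[x], norm[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters whose members are closest on average.
        a, b = min(combinations(clusters, 2), key=lambda pair: mean_distance(*pair))
        label = "(" + ",".join(sorted((a, b))) + ")"
        clusters[label] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters)) + ";"


# Hypothetical COG counts for three genomes, for illustration only.
toy = {
    "A": {"COGx": 10, "COGy": 5, "COGz": 1},
    "B": {"COGx": 9, "COGy": 6, "COGz": 1},
    "C": {"COGx": 1, "COGy": 5, "COGz": 10},
}
print(average_linkage(toy))
```

With real data, each profile would hold counts of every COG assigned in a genome, and the resulting topology could be compared with the 16S rRNA tree.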

Genome project history
Tolumonas lignolytica BRL6-1T was selected for sequencing based on its ability to use lignin as its sole carbon source. Genome sequencing was performed by the JGI and completed on February 22, 2013, and the genome was made publicly available on IMG M/ER on August 28, 2013. Table 2 presents the project information and its association with MIGS version 2.0 compliance [19].

Growth conditions and genomic DNA preparation
For genomic DNA extraction, strain BRL6-1T was grown overnight in 10 % tryptic soy broth (TSB) at 30 °C with shaking at 225 rpm. Genomic DNA for sequencing was obtained following a modified cetyltrimethylammonium bromide (CTAB) extraction protocol established by the DOE Joint Genome Institute, with the following modifications: 1) overnight cultures were resuspended to an OD600 of 0.5 instead of 1.0; 2) lysozyme incubation was carried out at 37 °C for 30 min; 3) proteinase K incubation was carried out for 3 h; and 4) the concentration of proteinase K was doubled. The extracted DNA was quantified with the Invitrogen™ Quant-iT™ PicoGreen® dsDNA Assay Kit using the PicoGreen fluorescence protocol on a SpectraMax M2 microplate reader (Molecular Devices). Before being shipped to JGI for genome sequencing, genomic DNA samples were verified as strain BRL6-1T by 16S rRNA gene sequencing.

Genome sequencing and assembly
The draft genome of strain BRL6-1T was generated at the DOE Joint Genome Institute using both Illumina and Pacific Biosciences (PacBio) technologies. An Illumina standard shotgun library and a long-insert mate-pair library were constructed and sequenced on the Illumina HiSeq 2000 platform [20]. The standard shotgun library generated 64,682,509 reads totaling 9702.4 Mb, and the long-insert mate-pair library generated 45,878,643 reads totaling 4175.0 Mb. A PacBio SMRTbell™ library was constructed and sequenced on the PacBio RS platform; 41,131 raw PacBio reads yielded 41,162 adapter-trimmed, quality-filtered subreads totaling 118.4 Mb. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI that removes known Illumina sequencing and library preparation artifacts [21]. Filtered Illumina and PacBio reads were assembled using AllpathsLG as previously described [22]. The total size of the genome is 3.6 Mb. The final draft assembly contained 9 contigs in 9 scaffolds, and was based on 9669.
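The read counts and yields reported above imply the mean read length and raw sequencing depth of each library. The sketch below assumes the 3.6 Mb assembled genome size as the denominator; because the reported yields are rounded, and not all reads were necessarily used in the final assembly, the results are approximate input depths rather than assembly coverage.

```python
GENOME_SIZE_MB = 3.6  # assembled genome size reported in the text

# (number of reads, total yield in Mb) for each library, as reported above
LIBRARIES = {
    "Illumina standard shotgun": (64_682_509, 9702.4),
    "Illumina long-insert mate pair": (45_878_643, 4175.0),
    "PacBio filtered subreads": (41_162, 118.4),
}


def mean_read_length_bp(n_reads: int, total_mb: float) -> float:
    """Average read length in base pairs."""
    return total_mb * 1e6 / n_reads


def fold_coverage(total_mb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Raw sequencing depth: total bases divided by genome size."""
    return total_mb / genome_mb


for name, (n, mb) in LIBRARIES.items():
    print(f"{name}: ~{mean_read_length_bp(n, mb):.0f} bp/read, "
          f"~{fold_coverage(mb):.0f}x raw depth")
```

For the standard shotgun library this works out to about 150 bp per read and roughly 2,700-fold raw depth.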

Genome annotation
Genes were identified using Prodigal [23], followed by a round of manual curation using GenePRIMP [24]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [25] was used to find tRNA genes, whereas ribosomal RNA genes were found by searching against models of ribosomal RNA genes built from SILVA [26]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and of RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [27]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [28] developed by the Joint Genome Institute, Walnut Creek, CA, USA [29].

Fig. 2 Phylogenetic tree highlighting the position of Tolumonas lignolytica BRL6-1T among the Aeromonadales. a The phylogenetic tree based on 16S ribosomal RNA gene sequences was inferred using the Neighbor-Joining method [54] within MEGA6 [55]. Bootstrap values from 1000 replicate trees are shown at the branches [56]. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Jukes-Cantor method [57] and are in units of the number of base substitutions per site. All positions containing gaps and missing data were eliminated, leaving a total of 1234 positions in the final dataset. GenBank accession numbers are shown in parentheses after strain numbers. Type strains are indicated with a superscript T. Organisms with available genomes are indicated by an asterisk before the name. b Whole genomes were hierarchically clustered based on COG profiles using tools in IMG [58]. T. lignolytica BRL6-1T is indicated in bold in both trees.
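The distance calculation described in the Fig. 2 caption, which removes every alignment column that contains a gap or missing data and then applies the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) to the proportion p of differing sites, can be sketched as follows. This is a simplified stand-in for what MEGA6 does internally; the symbols treated as gaps or missing data ('-', '?', 'N') and the function names are assumptions.

```python
import math

GAP_OR_MISSING = set("-?N")  # assumed gap / missing-data symbols


def complete_deletion(alignment: list[str]) -> list[str]:
    """Drop every column that has a gap or missing base in any sequence."""
    keep = [i for i in range(len(alignment[0]))
            if not any(seq[i].upper() in GAP_OR_MISSING for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    diffs = sum(1 for x, y in zip(a, b) if x.upper() != y.upper())
    return diffs / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, in base substitutions per site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


aln = ["ACGT-ACGTA", "ACGTTACGAA", "ACGTTACNTA"]  # toy alignment
trimmed = complete_deletion(aln)                 # columns 5 and 8 are removed
print(trimmed, jukes_cantor(p_distance(trimmed[0], trimmed[1])))
```

If the 97 % BLAST identity were taken as p = 0.03, the corrected distance would be about 0.031 substitutions per site; the correction matters more as sequences diverge.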

Genome properties
One chromosomal origin of replication was identified, at positions 1,760,186-1,760,733 of contig 1, suggesting that the genome contains a single chromosome of at least 3.187 Mbp (Fig. 3). The origin was located with Ori-Finder software [33], which applies the Z-curve method [30-32], a unique three-dimensional representation of the genome built from base-composition disparities. Although nine contigs are presented in the GenBank record for the genome (AZUK00000000.1), contigs 5 and 9 were direct repeats of sequences contained in other contigs; we therefore hypothesized that they are repeated scaffolds and excluded them from our analyses (Table 3). Because of remarkably high repeat content at the contig ends, we were unable to close the gaps between contigs using standard sequencing methods. The remaining contigs may be part of the chromosome, but a plasmid extraction indicated the presence of at least one plasmid. A search of the PATRIC database of plasmid sequences, using a maximum E-value of 1e−5, showed that contigs 2-8 (excluding contig 5) all have homology to known plasmid sequences [34]. Furthermore, contigs 3, 4, 7, and 8 carry annotated genes commonly found on plasmids, such as toxin-antitoxin genes, prevent-host-death family genes, and plasmid maintenance and stabilization protein genes, making them likely plasmid candidates. Of the 3427 predicted genes, 3323 were identified as protein-coding genes and 131 as RNA genes. Of the protein-coding genes, 75.02 % were assigned a putative function. The properties and statistics of the genome are summarized in Tables 3 and 4.
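The Z-curve idea can be illustrated with a short sketch. Its three cumulative components are x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n), where A_n, C_n, G_n and T_n count each base in the first n positions; the cumulative G - C disparity equals (x_n - y_n)/2, and in many bacteria the origin lies near its minimum. Ori-Finder combines such curves with other evidence, such as DnaA box motifs, so the hypothetical function below is only a first-pass locator on a linear sequence, not a reimplementation of Ori-Finder.

```python
def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Cumulative Z-curve components (x_n, y_n, z_n) after each position."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base in counts:  # ambiguous bases (e.g. N) add nothing
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t),   # purine vs pyrimidine
                       (a + c) - (g + t),   # amino vs keto
                       (a + t) - (g + c)))  # weak vs strong hydrogen bonding
    return points


def gc_disparity(points: list[tuple[int, int, int]]) -> list[int]:
    """Cumulative G - C disparity, which equals (x_n - y_n) / 2."""
    return [(x - y) // 2 for x, y, _ in points]


def candidate_origin(seq: str) -> int:
    """1-based position of the minimum of the cumulative G - C curve."""
    disparity = gc_disparity(z_curve(seq))
    return disparity.index(min(disparity)) + 1


print(candidate_origin("CCCCGGGGGG"))  # C-rich then G-rich: minimum after the C run
```

On a real chromosome the curve would be computed over millions of bases and the extreme points inspected alongside gene context and DnaA box clusters.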

Genome to genome comparisons
Once the genome of strain BRL6-1T was sequenced, we compared it to the genome of its closest relative, T. auensis. The two genomes have an average nucleotide identity (ANI) of 84 %, far below the 95 % threshold for species delineation [35]. The genome-to-genome distance calculator (GGDC), a tool developed by the DSMZ, relates genome sequence comparisons to a reference dataset of DNA-DNA hybridization (DDH) values [36]. It estimates the DDH between these two genomes to be 23.10 % ± 2.37 %, again far below the species threshold of 70 %. The GGDC also uses logistic regression to estimate the probability that DDH > 70 %, i.e., that the two genomes belong to the same species; it calculated a < 0.01 % chance that DDH > 70 % between the genomes of strain BRL6-1T and T. auensis. These data support the conclusion that strain BRL6-1T represents a novel species.
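ANI values such as the 84 % above are typically computed by fragmenting one genome and averaging the identities of its best matches in the other. The sketch below follows the filtering rule of the BLAST-based ANIb approach (1020-nt fragments; a match is kept if its identity exceeds 30 % over at least 70 % of the fragment), simplified so that identity is taken over the aligned region only. The FragmentHit records would come from a BLASTN search that is not shown; all names here are hypothetical, and in practice ANI is often computed in both directions and the two values compared or averaged.

```python
from dataclasses import dataclass
from statistics import mean

FRAGMENT_LEN = 1020  # fragment size used by the ANIb approach


@dataclass
class FragmentHit:
    """Best BLASTN hit of one query fragment in the other genome (hypothetical input)."""
    identity_pct: float  # percent identity over the aligned region
    aligned_len: int     # length of the aligned region, in nt


def anib(hits: list[FragmentHit], frag_len: int = FRAGMENT_LEN) -> float:
    """Mean identity of hits with >30 % identity over >=70 % of the fragment."""
    kept = [h.identity_pct for h in hits
            if h.identity_pct > 30.0 and h.aligned_len >= 0.7 * frag_len]
    if not kept:
        raise ValueError("no fragments passed the filters")
    return mean(kept)


def same_species(ani_pct: float, ddh_pct: float) -> bool:
    """Apply the 95 % ANI and 70 % DDH species thresholds quoted in the text."""
    return ani_pct >= 95.0 and ddh_pct >= 70.0


hits = [FragmentHit(85.0, 1000), FragmentHit(83.0, 980), FragmentHit(90.0, 500)]
print(anib(hits))               # the 500-nt alignment is too short and is dropped
print(same_species(84.0, 23.10))
```

Applied to the values reported above (84 % ANI, 23.10 % estimated DDH), both criteria place the two strains in different species.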

Oxidative enzymes
To facilitate physiological comparisons among the species within the genus Tolumonas, T. auensis and T. osonensis were acquired from the DSMZ culture collection in Germany. Oxidase tests confirmed that all three Tolumonas species, including Tolumonas lignolytica BRL6-1T, are negative for oxidation of cytochrome c, as colonies did not change color when applied to BD BBL™ DrySlide™ test strips. However, oxidative enzyme assays showed that all three organisms are capable of oxidizing 3,4-dihydroxy-L-phenylalanine (L-DOPA), a compound used as a lignin analog (Fig. 4a). A catalase test was performed on each Tolumonas organism by dropping 3 % hydrogen peroxide onto cultures that had been incubated at 30 °C for 20 h. T. lignolytica and T. auensis showed very weak positive reactions, whereas T. osonensis exhibited a much stronger reaction. However, all three organisms showed relatively strong peroxidase activity (Fig. 4b). A search of the genomes of T. auensis and T. lignolytica showed that each possesses a single copy of the same gene encoding an enzyme with putative catalase activity (EC:1.11.1.21), whereas each genome contains several peroxidase genes. The genome of T. osonensis has not been sequenced, so we could not search it for catalase or peroxidase genes.

Carbon utilization
Based on the genome sequence of T. lignolytica BRL6-1T, we expected this organism to readily use fructose, glucose, mannitol, sorbitol, sucrose, and trehalose, because phosphotransferase system (PTS) genes specific to these carbon sources are annotated in the genome. Biolog phenotypic arrays were used to test the carbon sources used by strain BRL6-1T under anaerobic conditions, with a modified version of DSMZ medium 500 (referred to as mGV medium) from which FeSO4·7H2O, Na2S·9H2O, yeast extract, and the selenite/tungstate solution were omitted. Plates were run in duplicate, and a reaction was considered positive only if the mean signal minus its standard error exceeded 20 units. Of the 96 carbon sources tested, strain BRL6-1T used 26 under anaerobic conditions (Table 6), including the carbon sources predicted from the genome sequence. Table 7 summarizes the carbon sources that are differentially used among the three Tolumonas organisms under anaerobic conditions [37,38]. We predicted that strain BRL6-1T would be able to ferment lactose, as its genome contains four copies of lactose-specific PTS genes, more than for any other carbon source. T. lignolytica BRL6-1T grows well on aerobic plates of eosin methylene blue (EMB) agar and produces metallic green colonies after 2 days of incubation, suggesting vigorous lactose fermentation, a common characteristic of members of the family Enterobacteriaceae. Considering that enteric bacteria are common inhabitants of soils [39], it is plausible that this phenotype was acquired through horizontal gene transfer. The type strains of T. auensis and T. osonensis grew slowly on aerobic EMB plates but did not produce metallic green colonies. Therefore, colony morphology on EMB plates can be used to readily distinguish T. lignolytica BRL6-1T from the other Tolumonas species.
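The positive-call rule for the Biolog plates can be written as a short test on replicate readings. In this sketch the standard error is taken as the sample standard deviation divided by the square root of the number of replicates, and the readings are assumed to be already corrected against the no-carbon control; both are assumptions, as are the function names and the well labels.

```python
from math import sqrt
from statistics import mean, stdev

THRESHOLD = 20.0  # Biolog signal units, from the criterion stated in the text


def is_positive(replicates: list[float], threshold: float = THRESHOLD) -> bool:
    """Call a well positive if (mean - standard error) exceeds the threshold."""
    m = mean(replicates)
    se = stdev(replicates) / sqrt(len(replicates))
    return m - se > threshold


def positive_sources(readings: dict) -> list[str]:
    """Names of carbon sources whose replicate readings pass the rule."""
    return sorted(name for name, reps in readings.items() if is_positive(reps))


# Hypothetical duplicate readings for two wells
readings = {"source_A": [30.0, 34.0], "source_B": [18.0, 30.0]}
print(positive_sources(readings))
```

Note that a well with duplicates of 18 and 30 units has a mean of 24 but is still called negative, because its standard error of 6 units pulls the lower bound below 20.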

Lignocellulose degradation
The genome of Tolumonas lignolytica BRL6-1T contains four putative peroxidase genes, which may be important in depolymerizing lignin in the environment [40]. The genome also contains homologues of ligD and ligF, genes characterized in Sphingomonas paucimobilis SYK-6 [41] that encode enzymes responsible for cleaving β-aryl ether bonds. This type of bond comprises approximately 50 % of the linkages between lignin monomers [9], so its cleavage is crucial to lignocellulose breakdown. Furthermore, strain BRL6-1T possesses homologues of genes in the pathway that transforms ferulic acid, a common lignin breakdown product, into vanillate, then into protocatechuate, and finally into β-ketoadipate (summarized in Table 8) [42,43].

Fig. 5 Growth curve of Tolumonas lignolytica BRL6-1T in mGV medium. Solid lines depict the growth (left axis) of the organism with and without lignin amendment, with error bars showing the standard error of triplicate samples. The dashed line shows the lignin concentration (right axis) throughout the growth curve. Lignin concentration was measured by removing 1 ml of culture from anaerobic septum bottles, diluting it 1:10 in distilled water, filtering out cells, and measuring the absorbance at 310 nm. These values were compared to a standard curve of known lignin concentrations in mGV medium measured at the same wavelength.
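Converting the A310 readings described in the Fig. 5 caption into lignin concentrations amounts to fitting a linear standard curve and inverting it. The sketch assumes that absorbance is linear in concentration over the measured range, that the standard concentrations refer to what is in the cuvette (so the value read off the curve is multiplied by the dilution factor), and that a 1:10 dilution means a tenfold correction; the function names, concentration units, and example values are hypothetical.

```python
from statistics import mean

DILUTION_FACTOR = 10  # ASSUMED meaning of the 1:10 dilution before reading A310


def fit_standard_curve(concs: list[float], absorbances: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line A310 = slope * concentration + intercept."""
    mx, my = mean(concs), mean(absorbances)
    sxx = sum((x - mx) ** 2 for x in concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs, absorbances))
    slope = sxy / sxx
    return slope, my - slope * mx


def lignin_concentration(a310: float, slope: float, intercept: float,
                         dilution: float = DILUTION_FACTOR) -> float:
    """Lignin concentration in the undiluted culture, read off the curve."""
    return (a310 - intercept) / slope * dilution


# Hypothetical standards (mg/L in the cuvette) and their absorbances
slope, intercept = fit_standard_curve([0, 10, 20, 40], [0.02, 0.12, 0.22, 0.42])
print(lignin_concentration(0.32, slope, intercept))  # culture concentration, mg/L
```

Samples whose A310 falls outside the range of the standards would need a different dilution rather than extrapolation.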
The genome also contains several cytochrome oxidase genes, which may be implicated in using lignin as an electron acceptor for dissimilatory respiration, as was observed for 'Enterobacter lignolyticus' SCF1 [40], an organism obtained in the same isolation effort as BRL6-1T [44]. Preliminary data supporting this hypothesis are shown in Fig. 5, which depicts the growth of strain BRL6-1T in a further simplified version of mGV medium (from which resazurin, sodium bicarbonate, and cysteine were also omitted), with 0.2 % glucose supplied as a readily oxidized carbon source. Adding lignin to the medium increased both the growth rate and the maximum optical density achieved by strain BRL6-1T. Additionally, the decrease in lignin concentration coincides with the exponential growth phase, suggesting that BRL6-1T uses lignin as an additional carbon source and/or as an electron acceptor, which may enhance the organism's ability to use the more labile glucose as a carbon source.
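The growth-rate comparison in Fig. 5 can be quantified by fitting a line to ln(OD) during exponential phase: the slope is the specific growth rate μ (per hour), and the doubling time is ln 2 / μ. In this sketch the exponential window is chosen by the user and the function names are hypothetical; it is not the analysis used to produce Fig. 5.

```python
import math
from statistics import mean


def specific_growth_rate(times_h: list[float], ods: list[float]) -> float:
    """Least-squares slope of ln(OD) versus time, in h^-1."""
    logs = [math.log(od) for od in ods]
    mt, ml = mean(times_h), mean(logs)
    num = sum((t - mt) * (l - ml) for t, l in zip(times_h, logs))
    den = sum((t - mt) ** 2 for t in times_h)
    return num / den


def doubling_time(mu: float) -> float:
    """Doubling time in hours for a specific growth rate mu."""
    return math.log(2) / mu


def summarise(times_h: list[float], ods: list[float], start: int, end: int) -> tuple[float, float]:
    """Growth rate over indices [start, end) (the chosen exponential window) and maximum OD."""
    mu = specific_growth_rate(times_h[start:end], ods[start:end])
    return mu, max(ods)


# Hypothetical curve that doubles every hour, then levels off
times = [0, 1, 2, 3, 4]
ods = [0.05, 0.1, 0.2, 0.4, 0.45]
mu, od_max = summarise(times, ods, 0, 4)
print(mu, doubling_time(mu), od_max)
```

Running summarise on the lignin-amended and unamended curves with the same exponential window would give directly comparable μ values and maximum ODs.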

Conclusions
Based on biochemical characterization and genome analysis, we formally propose the creation of Tolumonas lignolytica sp. nov., of which BRL6-1T is the type strain. Its 3.6 Mbp genome contains a suite of genes encoding proteins involved in the breakdown of lignocellulosic material. These characteristics highlight its applicability to the industrial production of biofuels from plant biomass.
The G + C content of the genome is 47.56 %. The type strain BRL6-1T (=DSM 100457 =ATCC [in submission]) was isolated from tropical rainforest soil using lignin as the sole carbon source.