Microbial diversity of thermophiles with biomass deconstruction potential in a foliage‐rich hot spring

Abstract The ability of thermophilic microorganisms and their enzymes to decompose biomass has attracted attention because of their fast reaction rates, thermostability, and reduced risk of contamination. Exploitation of efficient thermostable glycoside hydrolases (GHs) could accelerate the industrialization of biofuels and biochemicals. However, the full spectrum of thermophiles and enzymes that are important for biomass degradation at high temperatures has not yet been thoroughly studied. We examined the Y-shaped Sungai Klah hot spring in Malaysia, which is located within a wooded area. The fallen foliage that formed a thick biomass bed under the heated water of this hot spring provided an ideal environment for discovering and analyzing microbial biomass-decay communities. We sequenced the hypervariable regions of bacterial and archaeal 16S rRNA genes using total community DNA extracted from the hot spring. The data suggested that 25 phyla, 58 classes, 110 orders, 171 families, and 328 genera inhabited this hot spring. Among the detected genera, members of Acidimicrobium, Aeropyrum, Caldilinea, Caldisphaera, Chloracidobacterium, Chloroflexus, Desulfurobacterium, Fervidobacterium, Geobacillus, Meiothermus, Melioribacter, Methanothermococcus, Methanotorris, Roseiflexus, Thermoanaerobacter, Thermoanaerobacterium, Thermoanaerobaculum, and Thermosipho were the main thermophiles encoding various GHs that play important roles in cellulose and hemicellulose breakdown. Collectively, the results suggest that the microbial community in this hot spring is a good source for isolating efficient biomass-degrading thermophiles and thermozymes.

Thermostable biomass-acting enzymes are promising due to their suitability for industrial applications (Duan & Feng, 2010).
Other advantages of thermophiles and their enzymes have been reported (Taylor et al., 2009; Vishnivetskaya et al., 2015). Heated environments such as hot springs are potential sources of thermophiles and thermozymes (Urbieta et al., 2015; Zhao et al., 2017).
Because of their high temperatures, most known hot springs lack plant material. In one report, the microbial community in heated (68°C) sediments surrounding a vegetated (Juncus tweedyi) wetland in Obsidian Pool (site OBP10) in Yellowstone National Park (YNP) mainly comprised the phyla Firmicutes, Proteobacteria, Aquificae, Deinococcus-Thermus, Spirochaetes, and Verrucomicrobia, together with a large proportion of unclassified bacteria. Most Firmicutes members, including lignocellulolytic degraders, belonged to Clostridium, Anaerobacter, Caloramator, Caldicellulosiruptor, and Thermoanaerobacter (Vishnivetskaya et al., 2015). When OBP10 samples were inoculated with various lignocellulose-related substrates, including Avicel, switchgrass, Populus, and xylan, and incubated at 55-85°C under anaerobic laboratory conditions, the main bacteria after three rounds of culturing were Thermoanaerobacter, Caloramator, Caldicellulosiruptor, Clostridium, Dictyoglomus, and Fervidobacterium; their distributions varied with experimental parameters such as temperature and substrate type (Vishnivetskaya et al., 2015).
Another site that lacks lignocellulosic plant material is the Great Boiling Spring (GBS) in Nevada (77-85°C) (Peacock et al., 2013). In that study, the microbial diversity of GBS water-sediment samples was compared with that of man-made in situ enrichments containing ammonia fiber explosion-treated corn stover and aspen shavings. The microbial community attached to the supplemented biomass consisted of potential biomass degraders, sugar fermenters, and hydrogenotrophs, including Thermotoga, Dictyoglomus, Desulfurococcales, and Archaeoglobales. The microbial flora of the biomass-enriched samples differed from that of the indigenous GBS samples. Peacock et al. (2013) therefore suggested that the added lignocellulosic biomass stimulated the growth of potent biomass degraders in a natural environment.
One of the quickest approaches for examining microbial populations is 16S rRNA amplicon sequencing. The genera involved in high-temperature biomass degradation have been studied in laboratory setups with a predefined medium or type of biomass (Park et al., 2012; Eichorst et al., 2013; Peacock et al., 2013; Xia et al., 2014; Vishnivetskaya et al., 2015; Yu et al., 2015). In this study, we analyzed the microbial diversity in a Malaysian hot spring (60-90°C, mean 68°C, pH 8.6) using 16S rRNA amplicon-based sequencing.
The Y-shaped Sungai Klah (SK-Y) hot spring was chosen because its submerged foliage bed makes it a natural "biomass-degrading bioreactor." The data obtained add to the list of thermophiles important for biomass degradation at high temperatures and suggest that the microbial populations involved in biomass degradation in natural environments are far more complex than those in laboratory setups.

| Sample collection and water analysis
The Y-shaped Sungai Klah hot spring (SK-Y) (3°59′50.50″N, 101°23′35.51″E) is located in Perak, Malaysia. Previously, we conducted a microbial diversity analysis of the main water source of the Sungai Klah (SK) hot spring (Chan, Chan, Tay, Chua, & Goh, 2015). In this work, samples of the trapped heated spring water were taken from SK-Y, which is located approximately 10 m from the SK hot spring studied by Chan et al. (2015).
Sampling was performed on March 24, 2016. A clean stainless steel sampling dipper was used to collect foliage-free water samples at four spots approximately 5 m apart. Water was stored in sterile glass Schott bottles, transported to the laboratory within 2.5 h, and stored at 4°C overnight. On the following day, water analysis was conducted by MyTest Lab Sdn Bhd (Malaysia) using American Public Health Association standard protocols. At least 20 pieces of submerged foliage with no apparent biofilm were collected with a sampling dipper and transferred to polypropylene Ziploc bags using tweezers. Submerged foliage with green biofilm was collected and stored separately in 1 L Schott bottles. Degraded foliage was collected at the base of SK-Y, where green biofilm was present, and nondegraded plant litter was carefully removed in situ.

| Total community DNA extraction
Unless otherwise specified, all samples were kept at 4°C, and DNA extraction was completed within 2 days of sample collection. To study the microbial diversity of SK-Y, total community DNA was extracted from the following samples: (1) pooled water from the four sites, mixed in equal volumes; (2) submerged foliage with no apparent biofilm; (3) submerged foliage with green biofilm; and (4) degraded foliage collected at the base of SK-Y.
Four liters of pooled water were filtered through a membrane filter with a 0.22 μm pore size (Sartorius, Göttingen, Germany).
Approximately 100 g of each foliage sample type (foliage with no apparent biofilm, foliage with green biofilm, or degraded foliage) was placed separately in a 500 ml autoclaved glass bottle containing sterile PBS (pH 7.4) with 0.05% Tween 20 (4 ml PBS per 1 g of leaf) and sonicated (Branson Ultrasonics, Danbury, CT, U.S.) for 1 min at 25°C. The preparations were then shaken vigorously by hand for 30 s, leaf debris was discarded, and the remaining liquid was aliquoted into 50 ml tubes and centrifuged at 14,800 × g for 10 min at 4°C. The pellet was subjected to the aforementioned conventional total community DNA extraction.
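As a quick check on the buffer ratio stated above, the following worked calculation assumes that the full ~100 g of foliage of one sample type goes into a single bottle:

```latex
V_{\mathrm{PBS}} = 4\,\mathrm{ml\,g^{-1}} \times 100\,\mathrm{g} = 400\,\mathrm{ml} \;<\; 500\,\mathrm{ml}
```

The buffer alone therefore occupies about 80% of the bottle's nominal volume; the volume displaced by the leaves themselves is not included in this estimate.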
To improve the purity of the extracted total community DNA, inhibitors such as humic acids were removed using the Agencourt AMPure XP system (Beckman Coulter, Brea, CA, U.S.). The quality and yield of the purified total community DNA were examined using 1% (w/v) agarose gel electrophoresis, a NanoDrop™ 1000 spectrophotometer (Thermo Scientific, Waltham, MA, U.S.), and a Qubit® 2.0 Fluorometer (Invitrogen, Merelbeke, Belgium).

| Library construction and 16S rRNA amplicon-based sequencing
The Illumina 16S rRNA Metagenomic Sequencing Library Preparation Guide was followed to prepare the libraries. Two primer sets were used to target hypervariable regions of the bacterial and archaeal 16S rRNA genes (Klindworth et al., 2012): (1)

| Sequence analysis
The raw sequence reads generated by the Illumina sequencer were processed in CLC Genomics Workbench 7.0 (CLC Bio, Aarhus, Denmark). Adapter sequences were trimmed, and reads were filtered to retain those with an average Phred score of at least 20. Paired-end reads were merged in CLC Genomics Workbench 7.0 (mismatch cost = 2; gap cost = 3; maximum unaligned end mismatches = 0; minimum score = 8). The assembled reads were then subjected to chimera filtering and taxonomic classification using the Quantitative Insights Into Microbial Ecology (QIIME) pipeline (Caporaso et al., 2010). Analyses performed with QIIME (version 1.9.1) used default parameters unless otherwise stated. Briefly, the steps included removing chimeric sequences, picking operational taxonomic units (OTUs) with an open-reference clustering approach using UCLUST, and assigning taxonomy using BLAST against the National Center for Biotechnology Information (NCBI) 16S Microbial database with an e-value cutoff of 0.001. The NCBI database was selected because it is large and diverse and can provide a greater depth of information during taxonomic profiling than the RDP, Greengenes, or SILVA databases. All samples were randomly subsampled to the same sequencing depth prior to analysis. Alpha diversity was assessed in QIIME using rarefaction analysis, the number of observed OTUs per sample, and the Shannon-Wiener and Simpson indices. Beta diversity between all samples was calculated using UniFrac distances (Lozupone & Knight, 2005) as implemented in QIIME. Principal coordinates analysis (PCoA) was performed on the weighted UniFrac distance matrix, which accounts for both the membership and the relative abundance of OTUs in each community. The resulting sequencing data were submitted to the NCBI SRA under BioProject PRJNA353967.
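The read-filtering, subsampling, and alpha-diversity steps above can be sketched in plain Python. This is a simplified illustration, not the CLC Genomics Workbench or QIIME 1.9.1 implementation: it assumes Phred+33 quality encoding and a simple mean-quality cutoff of 20, rarefies by sampling reads without replacement, and computes the Shannon-Wiener index with log base 2 and the Simpson index as 1 - Σp², which we believe correspond to the QIIME/scikit-bio defaults. The function names, OTU names, and counts are hypothetical.

```python
import math
import random
from collections import Counter


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_quality(quality: str, min_mean: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score is at least `min_mean`."""
    return mean_phred(quality) >= min_mean


def rarefy(counts: dict, depth: int, seed: int = 0) -> Counter:
    """Subsample `depth` reads without replacement from per-OTU read counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    return Counter(random.Random(seed).sample(pool, depth))


def shannon(counts: dict, base: float = 2.0) -> float:
    """Shannon-Wiener index H = -sum(p_i * log(p_i)); zero counts are skipped."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values() if n > 0)


def simpson(counts: dict) -> float:
    """Simpson diversity, 1 - sum(p_i ** 2)."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical OTU tables for two samples with unequal sequencing depth.
water = {"OTU_1": 600, "OTU_2": 300, "OTU_3": 100}
decay = {"OTU_1": 200, "OTU_2": 200, "OTU_4": 200, "OTU_5": 150}
depth = min(sum(water.values()), sum(decay.values()))  # rarefy to the shallower sample
for name, table in {"water": water, "decay": decay}.items():
    sub = rarefy(table, depth)
    print(name, "observed OTUs:", len(sub),
          "Shannon:", round(shannon(sub), 3), "Simpson:", round(simpson(sub), 3))
```

Rarefying every sample to the depth of the shallowest one keeps richness and diversity values comparable, because observed OTU counts tend to increase with sequencing effort.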

| Carbohydrate-active gene prediction
After taxonomic assignment in QIIME, taxa with a relative abundance of ≥0.85% in at least one sample were individually checked against the complete genome information available in the Carbohydrate-Active EnZymes (CAZy) database to determine the number and types of GH families encoded by these taxa.
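The abundance cutoff used for this shortlist can be illustrated with a short sketch. It assumes that the ≥0.85% threshold is applied to each sample separately and that a taxon is retained if it passes in at least one sample; the function names and the genus-level read counts are hypothetical and are not data from this study.

```python
def relative_abundance(counts: dict) -> dict:
    """Convert per-taxon read counts to percentages of the sample total."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def shortlist(samples: dict, threshold: float = 0.85) -> list:
    """Taxa reaching `threshold` percent relative abundance in at least one sample."""
    keep = set()
    for counts in samples.values():
        for taxon, pct in relative_abundance(counts).items():
            if pct >= threshold:
                keep.add(taxon)
    return sorted(keep)


# Hypothetical genus-level read counts (not data from this study).
samples = {
    "water": {"Geobacillus": 190, "Meiothermus": 9, "Thermosipho": 1},  # Thermosipho 0.5%
    "decay": {"Geobacillus": 98, "Thermosipho": 2},                     # Thermosipho 2%
    "biofilm": {"Geobacillus": 999, "RareGenus": 1},                    # RareGenus 0.1%
}
print(shortlist(samples))  # → ['Geobacillus', 'Meiothermus', 'Thermosipho']
```

Thermosipho is retained because it passes the cutoff in one sample, whereas the placeholder "RareGenus" never reaches 0.85% and is dropped.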

| General site descriptions
More than a dozen hot spring sites are present in the Sungai Klah hot spring park. Previously, we performed 16S rRNA amplicon and shotgun sequencing of samples obtained from one of the SK hot springs. Although that site is located in a wooded area, plant litter does not accumulate in the hot spring because the stream flows rapidly. Approximately 10 m away from the previously studied location, a man-made drainage channel, 30 m long and 0.5 m deep, was built to trap heated spring water (Figure 1a). Because the drainage is Y-shaped, we named the samples obtained from this site SK-Y to distinguish the current work from our earlier study of the SK hot spring. The temperature at the SK-Y spring head was approximately 90°C, but it was lower (60-70°C; mean 68°C) near the water's surface and farther from the spring head. The pH of SK-Y ranged between 7.5 and 8.6. The most interesting feature of SK-Y is

| Physicochemical analysis of water
Water analysis was performed to determine the physicochemical conditions of SK-Y (Table S1). The temperature and pH during sampling were 68°C and 8.6, respectively. The color of the SK-Y water was 68 true color units (TCU). Aluminum (0.96 mg/L) and iron (0.65 mg/L) were detected in the SK hot spring, but neither metal was detected in the SK-Y water sample. SK-Y had higher fluoride (6 mg/L), nitrate (0.29 mg/L), and zinc (0.17 mg/L) concentrations than the SK hot spring (1.1 mg/L, <0.1 mg/L, and <0.02 mg/L, respectively). The sulfur and sulfide contents of SK-Y were 0.5 mg/L and 12.3 mg/L, respectively. Other metals, including mercury, cadmium, chromium, lead, manganese, nickel, silver, aluminum, barium, and strontium, were below quantifiable limits in SK-Y.
and Adenanthera) were not found in the literature to date, except for the report by Codron, Lee-Thorp, Sponheimer, and Codron (2007) on Vitex sp. foliage, which contained 6.0% lignin, 7.2% cellulose, and 8.1% hemicellulose.

| 16S rRNA gene sequencing data analysis
Total community DNA was extracted from four different types of samples. These sample types were (1) Total community DNA extracts for the water, nondecay, green biofilm, and decay samples underwent 16S rRNA amplicon-based sequencing using primer pairs specific for bacteria and archaea. After quality filtering and adapter trimming of the raw reads, high-quality assembled reads were analyzed using the QIIME pipeline (Table 2). To assess microbial phylogenetic beta diversity, we used the weighted UniFrac distance, which indicates the extent of phylogenetic similarity among the microbial communities (Lozupone, Hamady, Kelley, & Knight, 2007). PCoA using weighted UniFrac revealed that the bacterial communities (Figure S2A) and nondecay samples, whereas a clear difference was revealed between the SK-Y water, the decay samples, and the cluster of green biofilm and nondecay samples. The same pattern was also found in the archaeal communities (Figure S2B).
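The weighted UniFrac and PCoA steps can be made concrete with a toy example. This sketch is not the QIIME implementation: it assumes a small hypothetical four-OTU tree stored as branch lengths plus the OTUs below each branch, computes the unnormalized weighted UniFrac distance u = Σ b_i |A_i/A_T - B_i/B_T| (branch length times the difference in the fraction of each sample's reads descending from that branch), and then runs classical PCoA by double-centering the squared distance matrix and eigendecomposing it with NumPy. The OTU counts assigned to the four sample types are invented for illustration.

```python
import numpy as np

# Hypothetical rooted tree over four OTUs: (branch_length, OTUs below that branch).
BRANCHES = [
    (0.1, {"otu1"}), (0.1, {"otu2"}), (0.3, {"otu1", "otu2"}),
    (0.2, {"otu3"}), (0.2, {"otu4"}), (0.3, {"otu3", "otu4"}),
]


def weighted_unifrac(a: dict, b: dict, branches=BRANCHES) -> float:
    """Unnormalized weighted UniFrac: sum of branch length x |A_i/A_T - B_i/B_T|."""
    a_total, b_total = sum(a.values()), sum(b.values())
    dist = 0.0
    for length, tips in branches:
        a_frac = sum(a.get(t, 0) for t in tips) / a_total
        b_frac = sum(b.get(t, 0) for t in tips) / b_total
        dist += length * abs(a_frac - b_frac)
    return dist


def pcoa(dist: np.ndarray):
    """Classical PCoA: double-center -0.5 * D**2, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # largest axis first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-12                     # drop null and negative axes
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    explained = eigvals[positive] / eigvals[positive].sum()
    return coords, explained


# Hypothetical OTU counts for the four SK-Y sample types.
samples = {
    "water": {"otu1": 50, "otu2": 30, "otu3": 15, "otu4": 5},
    "nondecay": {"otu1": 5, "otu2": 10, "otu3": 45, "otu4": 40},
    "biofilm": {"otu1": 8, "otu2": 12, "otu3": 40, "otu4": 40},
    "decay": {"otu1": 30, "otu2": 50, "otu3": 10, "otu4": 10},
}
names = list(samples)
dmat = np.array([[weighted_unifrac(samples[x], samples[y]) for y in names]
                 for x in names])
coords, explained = pcoa(dmat)
for i, name in enumerate(names):
    print(name, np.round(coords[i, :2], 3))
print("share of positive-eigenvalue variance:", np.round(explained, 3))
```

Because the branch lengths weight each abundance difference, samples dominated by OTUs in distant clades (here, the otu1/otu2 clade versus the otu3/otu4 clade) end up far apart on the first principal coordinate.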

| Archaeal diversity analysis
In addition to bacteria, we wanted to understand the involvement of archaea in biomass degradation. An archaea-specific primer pair was used to target locus-specific sequences of archaea in the four samples taken from SK-Y (Figure 2a) were present in all the samples (Figure 2b). In contrast, the genus Caldisphaera was found in all samples except the nondecay sample.

| Thermophiles and thermozymes involved in foliage degradation
Most detected OTUs were confidently assigned to the genus level with a BLAST e-value cutoff of 0.001. Genera with a relative abundance of ≥0.85% in at least one SK-Y sample were shortlisted and searched against the literature and databases. Table S2 summarizes the previously sequenced representative strains of these genera with complete genome information. We found that the following thermophilic bacterial genera harbor abundant GH-encoding genes (61 GH families in total across the shortlisted genera; Table S2): Acidimicrobium, Caldilinea, Chloracidobacterium, Chloroflexus, Desulfurobacterium, Fervidobacterium, Geobacillus, Meiothermus, Melioribacter, Roseiflexus, Thermoanaerobacter, Thermoanaerobacterium, Thermoanaerobaculum, and Thermosipho. Among archaeal genera, Aeropyrum, Caldisphaera, Methanotorris, and Methanothermococcus are potential lignocellulosic biomass degraders, although their genomes encode relatively few GHs compared with the bacterial genera (Table S2).
A comparison of the major biomass degraders found in this study and selected literature is shown in Table 3.
Microbial diversity in Octopus and Mushroom Springs at YNP was revisited using high-throughput next-generation sequencing (Thiel et al., 2016). Most of the studied hot springs lack lignocellulosic plant material.
Although the SK and SK-Y hot springs are approximately 10 m apart, their dominant microbial taxa differed, probably owing to factors such as physicochemical and geochemical conditions, temperature, dissolved oxygen level, and the quantity of plant litter. Abiotic factors are known to collectively shape the dynamics of microbial populations (Chan et al., 2017). Compared with some known acidic hot springs (Lombard, Ramulu, Drula, Coutinho, & Henrissat, 2014; Sharp et al., 2014), SK-Y showed rich microbial diversity. Microbial diversity tends to be higher in circumneutral or slightly alkaline hot springs than in acidic sites (Sharp et al., 2014).
SK-Y was studied in this work because it represents a natural biomass-degrading bioreactor. The top surface of the submerged foliage bed was covered by a green biofilm. Microbial diversity analysis showed that Cyanobacteria (14.7%), Proteobacteria (14.4%), and Chloroflexi (13.1%) were the three main phyla in the green biofilm community. Cyanobacteria and many Chloroflexi are chlorophyll-based phototrophs, and their growth rates are strongly affected by temperature, pH, sulfide concentration, sunlight, and other factors (Klatt et al., 2013). A substantial amount of sulfide (12.3 mg/L), among other chemical constituents, was present in SK-Y (Table S1) (sulfide, 0.2 mg/L) with a lower detected abundance of chlorophototrophs. (Bhalla et al., 2013; Eichorst et al., 2013; De Maayer, Brumm, Mead, & Cowan, 2014; Cobucci-Ponzano et al., 2015; Vishnivetskaya et al., 2015). Eichorst et al. (2013) suggested that Firmicutes are the primary degraders of cellulose in laboratory enrichment experiments.
In the decomposition of sugarcane bagasse waste at 50°C, the predominant phylum was Proteobacteria (Mhuantong et al., 2015).
Thus, these reports indicate that Firmicutes and Proteobacteria are important phyla for biomass degradation at high temperatures. Another important bacterial component in SK-Y was the Acidobacteria phylum, particularly the genus Thermoanaerobaculum (7.3%), a chemo-organotroph that thrives in anaerobic habitats (Losey et al., 2013). According to Fan et al. (2011), Acidobacteria exclusively or preferentially use organic substrates (in this work, plant litter) as an energy source.
Some of the thermophiles in SK-Y have also been found in other natural geothermal environments or heated laboratory setups related to biomass degradation (Table 3). Nevertheless, our data suggest that the thermophilic communities involved in biomass degradation under natural conditions are far more complex than those in laboratory setups. In one report, only six orders were involved in the anaerobic digestion of sludge enriched with microcrystalline cellulose at 55°C (Xia et al., 2014), whereas SK-Y contained more than 100 orders. In other studies, fewer than 10 genera were reported in each case (Table 3). Because these analyses were conducted in laboratory setups with predetermined nutrients or cultivation conditions (Park et al., 2012; Eichorst et al., 2013; Xia et al., 2014; Yu et al., 2015), the few genera that grow well under those conditions would be expected to eventually dominate the culture.
In addition, our data agree with the observations of Peacock et al. (2013): the microbiota in the SK-Y water sample differed from the populations in the green biofilm attached to the foliage and from those in the degraded plant litter. Phylum composition may also change during the biomass decomposition process (Eichorst et al., 2013; Yu et al., 2015).

| Thermozymes for biomass degradation
To identify the dominant SK-Y genera capable of degrading biomass, all genera with a relative abundance of ≥0.85% were listed and individually searched against the CAZy complete genome database.
The 18 genera, encoding a total of 61 GH families, are summarized in Table S2. Genera lacking complete genome data or with an optimum growth temperature below 40°C, OTUs that could not be confidently classified to the genus level, and OTUs accounting for less than 0.85% of the total population were excluded from the analyses.
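The exclusion criteria above can be expressed as a single filter. In this sketch the `GenusRecord` fields, genus names, and GH family assignments are all hypothetical placeholders. Counting each GH family once across the retained genera (a set union) is our assumption about how a per-genus table could yield one total; the text does not state how the total of 61 was tallied.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenusRecord:
    """Hypothetical per-genus summary, used only for this illustration."""
    genus: Optional[str]        # None when the OTU could not be assigned to a genus
    max_rel_abundance: float    # highest relative abundance (%) in any sample
    complete_genome: bool       # complete genome available in CAZy
    optimum_temp_c: float       # reported optimum growth temperature (°C)
    gh_families: set = field(default_factory=set)


def retained(rec: GenusRecord, min_abundance: float = 0.85,
             min_temp_c: float = 40.0) -> bool:
    """True if the record passes all four exclusion criteria."""
    return (rec.genus is not None
            and rec.max_rel_abundance >= min_abundance
            and rec.complete_genome
            and rec.optimum_temp_c >= min_temp_c)


def distinct_gh_families(records):
    """Retained records plus the union of their GH families (each family counted once)."""
    kept = [r for r in records if retained(r)]
    return kept, set().union(*(r.gh_families for r in kept))


records = [
    GenusRecord("Genus_A", 5.2, True, 70.0, {"GH1", "GH5", "GH10"}),
    GenusRecord("Genus_B", 1.1, True, 65.0, {"GH5", "GH43"}),
    GenusRecord("Genus_C", 3.0, True, 30.0, {"GH3"}),    # optimum < 40 °C: excluded
    GenusRecord("Genus_D", 0.4, True, 80.0, {"GH57"}),   # < 0.85%: excluded
    GenusRecord("Genus_E", 2.0, False, 75.0, {"GH13"}),  # no complete genome: excluded
    GenusRecord(None, 4.0, False, 0.0),                  # unclassified OTU: excluded
]
kept, families = distinct_gh_families(records)
print([r.genus for r in kept], len(families))  # → ['Genus_A', 'Genus_B'] 4
```

GH5 is shared by the two retained placeholder genera, so it contributes only once to the family count.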
Some of the candidates listed in Table S2 have been well characterized, such as enzymes from Melioribacter and Thermoanaerobacterium (Currie et al., 2014). Most of the dominant bacterial genera in SK-Y encode GH enzymes, supporting the view that the genera listed in Table S2 are generally important for biomass degradation at circumneutral pH and high temperature.
Additionally, we infer that bacteria, rather than archaea, play the more important role in the biomass-degrading consortium, because bacteria rather than archaea dominated the biomass degradation process in most reported studies (Table 3). Moreover, the genomes of thermophilic archaea harbor fewer GH families than those of thermophilic bacteria (CAZy database). Archaeal genomes are generally smaller than bacterial genomes, and many GH genes have probably been lost in archaea through genome streamlining (Urbieta et al., 2015). (Lončar, Božić, & Vujčić, 2016). We were not able to rule out the presence in SK-Y of fungi or fungal enzymes that may assist with partial lignin removal. Nevertheless, as the average temperature of SK-Y is relatively high, the presence of fungi and the stability of their enzymes are questionable.

| CONCLUSIONS
The work presented here describes the microbiota within a heated

| CONFLICT OF INTEREST
The authors declare that they have no competing interests linked to the data presented in this manuscript. All the authors consented to the publication of this work.