Profiling soil microbial communities with next-generation sequencing: the influence of DNA kit selection and technician technical expertise

Structure and diversity of microbial communities are an important research topic in biology, since microbes play essential roles in the ecology of various environments. Different DNA isolation protocols can lead to data bias and can affect results of next-generation sequencing. To evaluate the impact of protocols for DNA isolation from soil samples and also the influence of individual handling of samples, we compared results obtained by two researchers (R and T) using two different DNA extraction kits: (1) MO BIO PowerSoil® DNA Isolation kit (MO_R and MO_T) and (2) NucleoSpin® Soil kit (MN_R and MN_T). Samples were collected from six different sites on Okinawa Island, Japan. For all sites, differences in the results of microbial composition analyses (bacteria, archaea, fungi, and other eukaryotes), obtained by the two researchers using the two kits, were analyzed. For both researchers, the MN kit gave significantly higher yields of genomic DNA at all sites compared to the MO kit (ANOVA; P < 0.006). In addition, operational taxonomic units for some phyla and classes were missed in some cases: Micrarchaea were detected only in the MN_T and MO_R analyses; the bacterial phylum Armatimonadetes was detected only in MO_R and MO_T; and WIM5 of the phylum Amoebozoa of eukaryotes was found only in the MO_T analysis. Our results suggest the possibility of handling bias; therefore, it is crucial that replicated DNA extraction be performed by at least two technicians for thorough microbial analyses and to obtain accurate estimates of microbial diversity.


INTRODUCTION
Determining microbial community structures of environmental samples by means of amplicon next-generation sequencing (NGS) is an important technique in fields such as agriculture, ecology, and human health. Deep sequencing and the capacity to sequence multiple samples make metagenomic sequencing technologies very attractive for exploring microbial species diversity (Hamady et al., 2008; Pinto & Raskin, 2012; Sogin et al., 2006). However, for all NGS approaches, the first crucial step is the isolation of DNA, since any bias introduced in this step will affect the final results. Additional biases can also be introduced subsequently by different sequencing protocols, databases, and data analysis algorithms.
Microbial communities in soil participate in diverse ecological interactions between organisms and in biogeochemical processes of nutrient mobilization, decomposition, and gas fluxes (Urbanova, Picek & Barta, 2011). Therefore, metagenomic studies of soil communities are very important to understand these processes. However, compared to aquatic environments, DNA isolation from soil is particularly challenging due to its physicochemical and biological properties, as well as the presence of compounds that inhibit the polymerase chain reaction (Hata et al., 2011; Iker et al., 2013). Three factors need to be considered for a full metagenomic analysis of soils: soil sampling, DNA extraction from microbes in the soil, and data analysis (Bakken, 1985; Lombard et al., 2011). In principle, there are two approaches to DNA isolation. The indirect method first isolates the microorganisms and then extracts DNA from the isolates. In the direct method, DNA extraction is conducted without prior isolation of the target organisms. Direct DNA extraction from soils is faster and more accurate than indirect extraction (Knauth, Schmidt & Tippkotter, 2013); therefore, it is now used exclusively. Further improvements of current techniques are important for at least two reasons. First, metagenomics-based community studies must be reproducible within the same laboratory and between different laboratories in order for results to be comparable. Second, even small differences in community composition need to be reproducible, because many bacterial, archaeal, fungal, and other eukaryotic species have yet to be discovered (Taberlet et al., 2012). Hence, bias resulting from DNA isolation must be minimized.
Several studies on this topic have been published recently. Most have analyzed only the quantity and quality of the DNA isolated by various methods (Dineen et al., 2010; Knauth, Schmidt & Tippkotter, 2013; Mahmoudi, Slater & Fulthorpe, 2011; Tanase et al., 2015). Two studies demonstrated that different isolation methods and the use of different commercial kits can influence sequencing results and community analysis, but they focused on bacterial 16S rRNA genes (Bag et al., 2016; Zielińska et al., 2017). We have considerably extended those investigations by assessing not only the quality and quantity of the isolated DNA but also the sequencing outcome and the results of the final bioinformatics analysis of community structure. Furthermore, we have analyzed not only bacterial communities, but also archaea, fungi, and other eukaryotes.
This study evaluated the effectiveness of two commercial DNA isolation kits (MO BIO PowerSoil® DNA Isolation kit and NucleoSpin® Soil kit) and also variation in results attributable to skill level differences between technicians (R and T). These factors were evaluated to identify potential bias resulting from different kits and their handling, in order to optimize protocols for analysis of soil microbial communities.

PCR amplifications and sequences
PCR amplifications employed primer sets (with Illumina flow cell adapters attached) that targeted the 16S rRNA gene of bacteria and archaea, an internal transcribed spacer (ITS) region of fungi, and the 18S rRNA gene of other eukaryotes (Table 2). PCR amplification was carried out in a total volume of 20 µL containing 40 ng (10 ng/µL) microbial template genomic DNA, 0.6 µL (10 µM) each of forward and reverse primers, 4.8 µL PCR-grade water, and 10 µL 2× KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Boston, MA, USA). PCR conditions were as follows: 95 °C for 5 min (initial denaturing step), 30 cycles of 20 s at 98 °C, 20 s at 58 °C, and 30 s at 72 °C, followed by a final extension step at 72 °C for 5 min. Amplicons were quality-tested and size-selected using gel electrophoresis
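
As a quick consistency check on the reaction setup described above, the following Python sketch adds up the stated component volumes, derives the final primer concentration, and totals the programmed hold times. The function names are hypothetical and chosen only for illustration; the numeric values are taken from the text, and ramping time between temperatures is not included.

```python
# Sanity check of the PCR setup described in the text.
# Function names are illustrative assumptions, not part of any published protocol.

def reaction_volume_ul(components_ul):
    """Total reaction volume (µL) from a dict of component volumes."""
    return sum(components_ul.values())

def final_concentration(stock_conc, stock_ul, total_ul):
    """Concentration after diluting stock_ul of stock into total_ul."""
    return stock_conc * stock_ul / total_ul

def programme_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times (ramping between steps not included)."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s

components = {
    "template_dna": 40.0 / 10.0,   # 40 ng at 10 ng/µL
    "forward_primer": 0.6,
    "reverse_primer": 0.6,
    "pcr_grade_water": 4.8,
    "kapa_hifi_readymix_2x": 10.0,
}

total = reaction_volume_ul(components)
primer_um = final_concentration(10.0, 0.6, total)      # 10 µM stock, 0.6 µL
hold_s = programme_seconds(5 * 60, [20, 20, 30], 30, 5 * 60)

print(f"total volume: {total:.1f} uL")
print(f"final primer concentration: {primer_um:.2f} uM each")
print(f"programmed hold time: {hold_s / 60:.0f} min")
```

With the values given, the components sum to the stated 20 µL, each primer ends up at 0.3 µM, and the programmed holds add up to 45 min.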

Data analyses
Analysis of variance (ANOVA) was performed using IBM SPSS v21.0.0, with a significance level of P < 0.05 for differences in DNA concentrations and purities derived from the two kits (MO and MN) and two researchers (R and T). We created four groups (MNR, MNT, MOR, MOT) of raw read sequences for the ANOVA test. We used FastQC v0.11.4 (Andrews, 2010) to assess the quality of raw fastq data files produced by the MiSeq. High-throughput sequences were imported into CLC Genomics Workbench v8.5.1 (QIAGEN, Aarhus A/S, http://www.clcbio.com) according to quality scores of Illumina pipeline 1.8. In order to achieve the highest quality sequences for clustering, paired reads were merged in CLC Microbial Genomics Module v1.1 using default settings (mismatch cost = 1; minimum score = 40; gap cost = 4; and maximum unaligned end mismatch = 5). Primer sequences were trimmed from merged reads using default parameters (trim using quality scores = 0.05 and trim ambiguous nucleotides = 2), and samples were filtered according to the number of reads. Sequences were clustered and chimeric sequences detected using CLC Microbial Genomics Module v1.1 at a level of similarity 97% of operational taxonomic unit (OTU
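
To illustrate the kind of comparison made across the four groups (MNR, MNT, MOR, MOT), the sketch below computes a one-way ANOVA F statistic in plain Python. It is a simplified stand-in, not the SPSS procedure used in the study: the exact model specification is not given in the text, the function name is hypothetical, and the concentration values are invented placeholders rather than data from Table 1. Obtaining a P value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of observations, one inner list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical DNA concentrations (ng/µL) for six sites per group;
# placeholder numbers for illustration only, NOT the study's measurements.
example_groups = {
    "MNR": [52.0, 48.5, 60.1, 47.3, 55.8, 58.2],
    "MNT": [49.7, 50.2, 57.4, 49.0, 51.6, 54.9],
    "MOR": [21.3, 18.9, 25.4, 17.8, 22.6, 24.0],
    "MOT": [15.2, 14.8, 19.9, 13.5, 17.1, 18.4],
}

f_stat, df_b, df_w = one_way_anova_f(list(example_groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A design that separates the kit effect from the researcher effect would call for a two-factor model; the one-way form is shown only because it is the shortest self-contained illustration.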

RESULTS
For all locations except B and D, both researchers obtained higher DNA yields with the MN kit than with the MO kit (ANOVA, p < 0.005) (Table 1). With the MO kit, researcher R extracted more DNA than researcher T for all samples (Table 1). With the MN kit, DNA concentrations varied between researchers R and T across samples: researcher R obtained greater DNA yields from locations A, C, E, and F, whereas researcher T obtained higher yields from locations B and D (Table 1), but these differences were not significant (ANOVA, p < 0.50). DNA quality, as judged by the 260/280 nm absorption ratio, showed relatively small and insignificant differences between kits (MN and MO) (ANOVA, p < 0.50) and between researchers (R and T) (ANOVA, p < 0.50) for all sample locations (Table 1). Differences in the final number of archaeal sequence reads were significant between researchers R and T (ANOVA, p < 0.005), but insignificant between the two kits (MN and MO) (ANOVA, p < 0.50).
Interestingly, we found a high percentage of no-blast hits for fungal communities for researcher R using both kits at locations D (46.4%), E (99.9%), and F (99.5%), and at two locations for researcher T when the MN kit was used (C = 45.5%; F = 51.5%) (Fig. 3). Micrarchaea were detected only by researcher T (with both kits). OTUs of WIM5 (Amoebozoa) in the eukaryotic communities were also detected only by researcher T, and only with the MO kit. In addition, OTUs of Armatimonadetes were detected only by researcher T and only with the MO kit. Overall, the taxonomic diversity  (Chao1), with the number of sequences. In general, species richness was similar across researchers and kits. In particular, the archaeal and fungal communities were the same with both kits (MN and MO) but differed between researchers, whereas for bacterial and eukaryotic communities the alpha diversity rarefaction curves were relatively similar for both researchers but differed between kits (Fig. 5). Principal component analysis (PCA) showed clusters of each sample for bacteria, with slight differences between kits and researchers (Fig. 6A). However, archaea, fungi, and other eukaryotes showed clustering differences among most of the soil samples in the present study (Figs. 6B-6D).
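
For readers unfamiliar with the Chao1 richness estimator mentioned above, the following Python sketch shows the commonly used bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed exactly once and exactly twice. The function name and the example counts are hypothetical; the study itself computed diversity measures in CLC, whose exact implementation is not described in the text.

```python
from collections import Counter

def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)            # number of OTUs actually seen
    freq = Counter(observed)
    f1 = freq[1]                     # singletons: OTUs seen exactly once
    f2 = freq[2]                     # doubletons: OTUs seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical read counts per OTU for one sample (illustrative only).
sample_counts = [10, 5, 1, 1, 1, 2, 2, 0]
print(chao1(sample_counts))
```

Many singletons relative to doubletons push the estimate above the observed OTU count, which is why samples with fewer reads can still suggest substantial undetected richness.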

DISCUSSION
Selection of a DNA extraction kit and protocol is crucial to achieving consistent results for microbial community analysis using NGS technology. Many studies have examined the composition of microbial taxonomic groups in soils and have claimed that unbiased DNA extraction kits and methods are necessary to obtain accurate results (Claassen et al., 2013; Cruaud et al., 2014; Deiner et al., 2015; McOrist, Jackson & Bird, 2002; Tang et al., 2008; Vishnivetskaya et al., 2014). In this study, we investigated the impact of handling methods and DNA extraction kits on four microbial communities (bacteria, archaea, fungi, and other eukaryotes). DNA yield differed clearly between kits (MO and MN) and between researchers (R and T). The MN kit produced a higher DNA yield overall. This result may be due to the bead-beating protocol, the type of beads, and differences in the chemical reagents of the two kits. Knauth, Schmidt & Tippkotter (2013) and Finley et al. (2016) reported that for soil, the MN kit yielded more DNA than other kits (FastDNA® SPIN kit (MP Biomedicals, Solon, OH, USA), the NucleoSpin® Soil kit (Macherey-Nagel, Düren, Germany), and the Innu-SPEED soil DNA kit (Analytik Jena AG, Jena, Germany)). In addition, researcher T obtained lower DNA yields than researcher R for most locations using both kits. We found that the type of kit and handling both affect the DNA yield from soil samples. Some previous studies on soils and feces have shown that the type of DNA isolation kit used significantly affected the results of microbial community analysis and that higher yields of genomic DNA produced a more comprehensive picture of microbial communities (Knauth, Schmidt & Tippkotter, 2013; Claassen et al., 2013; Ariefdjohan, Savaiano & Nakatsu, 2010).
In contrast, our findings using the Illumina MiSeq platform showed that the MO kit yielded a greater abundance of OTUs. Mackenzie, Waite & Taylor (2015) reported that the most effective DNA extraction kit for the human gut microbiome is MO, because of the quality of the DNA it produces. Our results differ from those of some previous studies, possibly due to differences between the Denaturing Gradient Gel Electrophoresis (DGGE) and MiSeq techniques (Knauth, Schmidt & Tippkotter, 2013; Claassen et al., 2013; Ariefdjohan, Savaiano & Nakatsu, 2010). According to its DNA isolation protocol, the MN kit has two different spin columns: a red ring spin column to remove inhibitors such as humic substances, and a green ring spin column to bind and wash DNA. Thus, the richness of OTU profiles of microbial communities obtained with the two kits may differ depending upon the spin column type. Pooling DNA extractions from individual soil samples increased OTU richness (Song et al., 2015). Triplicate DNA extractions using different handling methods for replicates with the same kit have been recommended to avoid biases in NGS analysis and to enhance richness by isolating unique OTUs. In our results, both DNA extraction kits yielded similar DNA purity among samples and relatively similar OTU compositions. However, the OTUs of the bacterial phylum Armatimonadetes and WIM5 of eukaryotes were obtained only with the MO kit. Therefore, we assume that the MO kit is the more appropriate of the two for DNA extraction from soil. DNA extraction protocols influence the structure of environmental soil microbial communities (Morgan, Darling & Eisen, 2010; Zielińska et al., 2017). Some studies recommend testing several DNA extraction kits on environmental soil samples at the beginning of a study (Morgan, Darling & Eisen, 2010; Zielińska et al., 2017). However, we assume that some laboratories cannot apply many kits for DNA extraction because of funding limitations and other factors. Therefore, our results suggest that having different people perform triplicate DNA extractions may be one way to arrive at a better protocol for soil metagenomics studies.
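
The effect of pooling replicate extractions performed by different people can be sketched with simple set operations. In the Python example below, each replicate is represented as the set of OTU identifiers detected in it; the function names, OTU labels, and replicate contents are all hypothetical and serve only to show how pooled richness and person-specific OTUs could be tallied.

```python
def pooled_otus(replicates):
    """Union of OTU identifiers over a collection of replicate OTU sets."""
    return set().union(*replicates)

def otus_unique_to(replicates_by_person, person):
    """OTUs found by one person in any replicate but by nobody else."""
    own = pooled_otus(replicates_by_person[person])
    others = pooled_otus(
        rep
        for name, reps in replicates_by_person.items()
        if name != person
        for rep in reps
    )
    return own - others

# Hypothetical triplicate extractions by two people (illustrative labels only).
replicates_by_person = {
    "R": [{"OTU1", "OTU2"}, {"OTU1", "OTU3"}, {"OTU2", "OTU3"}],
    "T": [{"OTU1", "OTU4"}, {"OTU2", "OTU4"}, {"OTU1", "OTU5"}],
}

all_reps = [rep for reps in replicates_by_person.values() for rep in reps]
print("richness, R only:", len(pooled_otus(replicates_by_person["R"])))
print("richness, R and T pooled:", len(pooled_otus(all_reps)))
print("unique to T:", sorted(otus_unique_to(replicates_by_person, "T")))
```

In this toy data set, pooling across both people recovers OTUs that neither set of triplicates captures alone, which mirrors the rationale for involving more than one person.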
Community composition data produced in microbial ecology using metagenomics sequencing are not only important for a specific study but are also valuable for meta-analyses that compare results obtained by different research groups. The problem of possible bias in such data introduced by differences in sample handling and methodology was recognized early, and several studies have therefore analyzed the issue of standardization in microbial ecology (reviewed in Philippot et al., 2012). One needs to distinguish between standardization in the sense of general experimental standards for sample handling and analysis, and strictly standardized procedures such as those defined by the International Organization for Standardization (ISO). To achieve the latter, Philippot and colleagues developed and validated a protocol for directly extracting DNA from soil samples (Petric et al., 2011; Philippot et al., 2012), which was accepted by the ISO and is now known as the ISO-11063 method. This standard has been further evaluated in a series of follow-up studies (Plassart et al., 2012; Terrat et al., 2015). Although such strictly defined standards seem to be the most promising way to achieve inter-laboratory comparability of data, there are several issues connected to them. Firstly, it may be necessary to modify an established standard to improve recovery of certain microbial communities and to adapt it to advances in sequencing technology. For example, one of the follow-up studies of the ISO-11063 procedure demonstrated that the protocol gave good results for bacterial soil communities but was less efficient for fungal diversity (Terrat et al., 2015). Therefore, the authors recommended modifying the original procedure by changing the cell lysis step, which is the very first step of all DNA extraction methods. However, if an ISO standard regularly needs modification, its usefulness for comparing data collected over longer periods of time will be weakened.
Secondly, it seems unrealistic to expect research groups all over the world to accept a strict standard that does not allow them to use commercial extraction kits of their own choice, which would also limit the potential for improving kit efficiency. The most important issue, however, is that even the most strictly defined standard cannot avoid handling bias. A comprehensive study of inter- and intra-laboratory variations in microbial community analyses demonstrated the possibility of significant differences in the results even when samples were processed in the same laboratory but by different investigators (Pan et al., 2010), which is consistent with our results. This type of bias appears to be independent of the DNA extraction protocol applied. In our opinion, it is very difficult in practice to introduce truly strict standards that are accepted by the research community. It might be more successful to define a set of standards of good experimental practice. Good practice in microbial community analysis of soils and other environments should include having different people perform triplicate DNA extractions in order to minimize handling bias. This recommendation is also important when considering the design of any standard protocol in microbial ecology.

CONCLUSIONS
Our findings indicate that the type of DNA isolation kit used and laboratory handling of samples both influence the results of microbial soil community analysis. However, the yield of extracted DNA and the number of raw reads sequenced have a significant impact on the number of OTUs across all communities. Provided that the amount of soil sample and the initial DNA yield are not very low, we recommend that microbial DNA isolation be performed in triplicate by at least two people to obtain more accurate results from amplicon sequencing (Illumina MiSeq).