Evaluation of the Effects of Library Preparation Procedure and Sample Characteristics on the Accuracy of Metagenomic Profiles

ABSTRACT Shotgun metagenomic sequencing has transformed our understanding of microbial community ecology. However, preparing metagenomic libraries for high-throughput DNA sequencing remains a costly, labor-intensive, and time-consuming procedure, which in turn limits the utility of metagenomes. Several library preparation procedures have recently been developed to offset these costs, but it is unclear how these newer procedures compare to current standards in the field. In particular, it is not clear if all such procedures perform equally well across different types of microbial communities or if features of the biological samples being processed (e.g., DNA amount) impact the accuracy of the approach. To address these questions, we assessed how five different shotgun DNA sequence library preparation methods, including the commonly used Nextera Flex kit, perform when applied to metagenomic DNA. We measured each method’s ability to produce metagenomic data that accurately represent the underlying taxonomic and genetic diversity of the community. We performed these analyses across a range of microbial community types (e.g., soil, coral associated, and mouse gut associated) and input DNA amounts. We find that the type of community and amount of input DNA influence each method’s performance, indicating that careful consideration may be needed when selecting between methods, especially for low-complexity communities. However, the cost-effective preparation methods that we assessed are generally comparable to the current gold-standard Nextera DNA Flex kit for high-complexity communities. Overall, the results from this analysis will help expand and even facilitate access to metagenomic approaches in future studies. IMPORTANCE Metagenomic library preparation methods and sequencing technologies continue to advance rapidly, allowing researchers to characterize microbial communities in previously underexplored environmental samples and systems. However, widely accepted standardized library preparation methods can be cost-prohibitive. Newly available approaches may be less expensive, but their efficacy in comparison to standardized methods remains unknown. In this study, we compared five different metagenomic library preparation methods. We evaluated each method across a range of microbial communities varying in complexity and quantity of input DNA. Our findings demonstrate the importance of considering sample properties, including community type, composition, and DNA amount, when choosing the most appropriate metagenomic library preparation method.

R ecent advancements in high-throughput sequencing have revolutionized genomic discovery and unlocked new insights regarding the diversity and function of microbial communities (1)(2)(3)(4). For example, shotgun metagenomic sequencing has clarified how the functional capacity of the gut microbiome links to human health (5)(6)(7)(8), improved the efficacy of antibiotic resistance gene discovery (9)(10)(11)(12), identified beneficial soil microbes for agricultural use (13)(14)(15), and uncovered novel, medically relevant biosynthetic gene clusters in marine microbes (16)(17)(18). However, while metagenomes offer rich opportunities to transform discovery, the financial cost of producing metagenomic data limits their application. Because much of this cost is associated with the preparation of metagenomic DNA for high-throughput sequencing, there is hope that emergent economical products and procedures can expand the scope of metagenomic investigations.
Illumina's Nextera XT and DNA Flex kits (the latter now known as "Illumina DNA Prep") have been the most widely used methods for preparing metagenomic libraries and have effectively served as industry-standard approaches. Indeed, Illumina DNA sequencing platforms remain the most widely utilized for generating genomic and metagenomic data, and their library preparation kits are accordingly used to prepare samples for sequencing. Due to their frequent use, these kits are subject to extensive evaluation and refinement. For example, Illumina recently released an updated version of their "gold-standard" Nextera XT kit, which was rebranded as Nextera DNA Flex (and now Illumina DNA Prep). This new kit allows greater flexibility across a wider range of genomes, from small genomes (microbial and amplicons) to more complex genomes found in eukaryotic and human systems. The Flex kit also resolved sequencing biases identified in the Nextera XT kit that occur in genomic regions with extreme GC content (19,20). These features of the Nextera DNA Flex kit have contributed to its broad adoption in metagenomic investigations.
One downside to the Nextera DNA Flex kit is its relatively high price, which presently costs roughly $46 per sample. While this cost may be reasonable considering the demand for the product and its observed efficacy, it is high enough that it limits the scale of many metagenomic investigations. For example, studies performing highthroughput analyses on hundreds or thousands of samples may be forced to utilize nonmetagenomic approaches (e.g., 16S rRNA gene sequencing) due to the library preparation expense. In an effort to circumvent this challenge, several alternative and competitive genomic library preparation methods have recently been developed and applied to metagenomic investigations. These approaches fall into two categories: methods that increase the economy of Illumina Nextera by modifying various aspects of the manufacturing protocols (e.g., see reference 21) and those that use entirely different technologies (e.g., seqWell plexWell96). These approaches hold great promise to improve the throughput of metagenomic investigations by reducing library preparation costs. For example, the recent method known as "Hackflex" achieves an 11-fold decrease in per-sample reagent costs compared to the Illumina kit protocols (22).
Although several alternative library preparation approaches have been assessed from the perspective of whole-genome sequencing, very little is known about their accuracy and precision when applied to metagenomic investigations. It is crucial that the performance of novel library preparation procedures be specifically assessed in diverse metagenomic communities as different community types provide unique sequencing challenges not common to traditional whole-genome sequencing. For example, metagenomic communities vary in complexity, with some communities having few distinct taxa (e.g., insect gut) and others being very highly diverse (e.g., soil). Library preparation procedures may vary in their abilities to unbiasedly sample DNA across the different genomes present in the community, whether due to amplification bias in regions with extreme GC content, kit-specific library tagmentation strategies (e.g., enzymatic versus tagmentation), or biases specific to bacterial species present in a community (20). Biological samples vary in their biomass, which affects the amount of wholecommunity DNA that is subject to the library preparation approach. The sensitivity of these approaches to the amount of input DNA may hence impact study outcomes (23)(24)(25)(26).
To advance the utility of low-cost metagenomic library preparation methods, we quantified the performances of five recently developed approaches. Our investigation assessed how different features of metagenomic samples, including community complexity and biomass, impact the performance of these procedures. In particular, we compared Illumina Nextera DNA Flex, a modified DNA Flex protocol (21), Qiagen QIASeq FX DNA, PerkinElmer NextFlex Rapid DNA-Seq 2.0, and seqWell plexWell96 library preparation methods using community-acquired DNA obtained from three different types of microbial communities: low-complexity communities (represented by Acropora hyacinthus microbiomes), moderately complex communities (represented by Mus musculus fecal microbiomes), and a highly complex community (represented by a soil microbiome). We also evaluated how each approach performs on a commercially available mock community comprised of 10 microbial species. Our analysis clarifies the performances of these approaches across these different sample conditions, and the results will assist investigators in identifying appropriate approaches for their metagenomic investigations.
Metagenomic read characteristics were also significantly impacted by the examined variables. For example, the GC content of reads was significantly dependent on the community type [F (3,60) = 2.89 Â 10 4 ; P = 9.38 Â 10 295 ]. GC content was also affected by Collectively, these findings indicate that metagenomic library preparation procedures yield distinct library characteristics for different community types. Different library preparation methods result in similar taxonomic profiles of a standardized mock community. While library preparation procedures vary in the resulting metagenomic library and sequence characteristics, it is unclear if this variation results in different downstream assessments of community composition. To address this question, we quantified how each library preparation method predicted the taxonomic composition of a defined mock community. We compared the taxonomic composition generated by each library preparation to the ZymoBIOMICS microbial community standard's defined taxonomic composition of the mock community. Strong correlations (r = 0.93 to 0.97; P = 1.29 Â 10 26 to P , 2.2 Â 10 216 ; FDR [false discovery rate] , 1.0 Â 10 25 ) were observed between the MetaPhlAn2-inferred taxonomic abundances and the theoretical taxonomic abundances of taxa present in the mock community ( Fig. 1; see also Table S1 in the supplemental material). To confirm that this was not due to bias in the MetaPhlAn2 database, we also compared the inferred taxonomic abundances using Kraken2 and observed similar taxonomic associations (r = 0.93 to 0.96; all P , 2.2 Â 10 216 ; FDR , 2.2 Â 10 216 ) and abundance profiles (Fig. S1).
The MetaPhlAn2 results showed that the Nextera Flex full and Nextera Flex reduced methods, which are widely used in metagenomic studies, had the lowest correlations (r = 0.93 to 0.96; P = 1.29 Â 10 26 to 2.66 Â 10 28 ) with the theoretical composition of the mock community. The lower correlations produced by these two library preparation procedures are driven by an underestimation of the abundance of Lactobacillus fermentum and an overestimation of the abundances of Staphylococcus aureus and Enterococcus faecalis. The strongest correlations were observed with the QIASeq Effects of Library Preparation on Metagenomic Profiles FX (r = 0.97; P = 1.21 Â 10 28 to 5.79 Â 10 29 ), plexWell96 (r = 0.96 to 0.97; P = 2.58 Â 10 28 to 1.24 Â 10 28 ), and NextFlex Rapid (r = 0.96 to 0.97; P = 1.02 Â 10 27 to 2.38 Â 10 28 ) methods (Fig. 1). Together, these data indicate that library preparation methods subtly influence some taxonomic estimates but that all methods examined overall performed well at recapitulating simple, defined microbial communities.
Community taxonomic profiles are significantly impacted by library preparation procedure and input concentration. Next, we quantified the impact of library preparation methods and input concentrations on the species-level taxonomic profiles of soil, coral, mock, and fecal metagenomes. The library preparation procedure was significantly associated with the resulting taxonomic microbiome profiles as measured by permutational multivariate analysis of variance (PERMANOVA) in coral (R 2 = 0.87; P = 2.00 Â 10 24 ) and soil (R 2 = 0.33; P = 5.00 Â 10 23 ) but not in fecal (R 2 = 0.32; P = 0.06) or mock (R 2 = 0.26; P = 0.15) communities ( Fig. 2A). In the fecal and mock communities, no association was found between the input concentration and microbiome beta-diversity. However, in coral (R 2 = 0.02; P = 0.02) and soil (R 2 = 0.13; P = 5.4 Â 10 23 ) communities, we identified a significant association between diversity and input. We also identified a significant interaction effect between input concentration and metagenomic preparation method in coral (R 2 = 0.07; P = 1.4 Â 10 23 ). The taxonomic abundance profiles of each library were highly correlated (r = 0.63 to 1; FDR , 2.2 Â 10 216 ) across library preparation methods (Fig. 2B), suggesting that the preparation methodology associates with distinct community profiles and offering a high level of profile prediction precision between each preparation method.
To quantify how variable individual replicates were across different library preparation methods, we examined the dissimilarity in taxonomic beta-diversity within each library preparation method. Methods that yield low intraconcentration (i.e., 1-ng/ml input) dissimilarities indicate high levels of reproducibility, while methods with low interconcentration dissimilarities (i.e., all concentrations) would indicate that the taxonomic profiles generated using this method are robust to variation in the library input concentration. We found that variation in intraconcentration taxonomic dissimilarity was low, and modest significant differences were observed only in fecal communities (H = 10.2; P = 0.04) across library preparation methodologies (Fig. 2C). The interconcentration dissimilarity was also low in all communities but varied significantly in fecal (H = 12.3; P = 0.02), mock (H = 11.6; P = 0.02), and soil (H = 15.6; P = 3.6 Â 10 23 ) samples but across preparation methods potentially due to variance in dissimilarity for the Nextera Flex reduced and plexWell96 libraries (Fig. 2D). Together, these data suggest that the taxonomic profiles generated using the methods under investigation are similarly reproducible, but the robustness varied across methods.

DISCUSSION
Taxonomic and functional profile predictions are similar across methodologies. Although the Nextera kits are widely used and considered the gold standard for metagenomic sample preparation, their cost can limit researchers from conducting expansive project aims. As applications for metagenomic sequencing continue to increase, researchers are left with the difficult task of balancing the need for high-quality data with the cost of their generation. The development of new protocols that modify the standard Nextera kit protocol as well as several new economical library preparation kits has the potential to dramatically alter the field by expanding the accessibility of shotgun metagenomics. However, the quality of libraries prepared using more economical methods varies substantially (19). While previous studies have demonstrated that different library preparation procedures can affect metagenome characteristics (27)(28)(29)(30), these studies did not evaluate contemporary procedures, nor did they consider the sensitivity of the approaches to different metagenome sample types. Here, we demonstrate that library quality as well as taxonomic and functional profiles vary as a function of environmental community type and biomass. Our findings suggest that while researchers need to be aware of differences between kits, overall, the taxonomic and functional profiles produced are similar and grant comparable precision among the kits (Fig. 5).
Several investigations have identified key differences in library characteristics across metagenomic library preparation procedures, often by incorporating multiple study designs. These variations can result in substantial changes in the quality of the metagenomic library and are important considerations in preparation method selection. For example, Baym et al. demonstrated that a custom Nextera XT protocol yielded a substantially reduced insert fragment length (21). Smaller fragments have higher proportions of adapter contamination in reads, while fragments that are too large may be preferentially lost during the Illumina cluster generation process (31). We observed significant effects of community type, library preparation procedure, and input DNA concentration on fragment size. In our hands, the Nextera Flex protocols generated the largest library insert sizes, while the QIASeq FX and plexWell96 procedures consistently produced the smallest. However, the Nextera procedures also produced libraries with the lowest average GC content compared to the other procedures examined. This reduced representation of GC content could impact the representation of genes with high GC content and skew both taxonomic and functional profiles (19,32). The interacting effects of library preparation procedure, community type, and DNA input on GC content further indicate that specific library preparation procedures may have distinct insertion site biases.
Comparing library characteristics across environmental sample types, samples with low relative diversity (i.e., coral) had both a high percentage of duplicate reads and a high number of reads filtered and removed from the resulting libraries regardless of the input concentration. This high level of filtering is likely due to the extreme levels of host DNA contaminants relative to the other sample types and additionally may point to the larger issue of host sequence contamination, regardless of the library preparation method, in similar research studies. However, coral samples also had similar levels of precision across kit types, with the exception of lower precision with QIASeq FX, demonstrating that different kit types may still be viable options in other low-complexity study systems.
Samples with moderate (fecal) and high (soil) microbial diversity had much lower respective average percentages of sequencing reads filtered than coral samples but with a higher average GC content across libraries than coral samples. Fecal sample libraries had the highest variability in precision between kit types, with the exception of Nextera Flex reduced, likely due to the more complex community composition. However, our study also had a relatively low average sequencing depth across all samples due to sample number and financial constraints; the high intracommunity variability in precision that we observed may be resolved with higher sequencing coverage (33). It is also possible that longer insert fragment sizes introduce greater variability due to lower base quality in the produced community composition regardless of sample type (34), although this was not the case for coral, soil, or mock libraries with the longest fragment sizes.
In successfully recapitulating the taxonomic profiles of a mock microbial community, all library preparation methods performed similarly overall; however, variation in taxonomic profiles for the environmental sample types showed subtle differences between methods. While higher levels of intracommunity variation per method could again be due to low sequencing depth, our results of higher variation for the coral sample types with lower relative diversity are consistent with previous findings that library coverage is increased for highly complex microbial communities. Furthermore, while it may appear that all preparation methods perform poorly in both taxonomic and functional resolution for low (coral)-and high (soil)-diversity sample types, it must be noted that these profiles may only be as complete as the reference databases used for assignment, and it is well known that these databases are preferentially curated with human microbiome sequences and studies in mind (35).
Financial and opportunity costs of metagenomic preparation methods differ. Decreasing the costs of kits and reagents associated with library preparation improves access to metagenomic approaches. The Nextera DNA Flex full preparation actualized cost remains the most expensive of the five methods tested, with NextFlex Rapid XP and QIASeq FX in the median relative expense range and Nextera Flex reduced prep and plexWell96 as the most economical choices for metagenomic library generation. However, due to the above-noted effect of the specificity of the environmental sample type on the performance of the preparation method, neither the most economical choice nor the most expensive may necessarily suit every study or generate the highest-quality libraries. Due to the effects of preparation procedure, community type, and DNA input on fragment size and both taxonomic and functional profiles of metagenomic samples, comparing communities across multiple study designs may require additional covariates in statistical design. For future studies, we recommend incorporating the library preparation technique as a potential covariate in statistical design to account for these known differences and potential biases.
Finally, one important limitation of our study is the lack of biological variation (i.e., only one sample per community type), which makes it challenging to determine whether technical variation inherently matters to a particular study. For some applications, such as biomarker discovery, technical variance may contribute to decreased sensitivity and specificity if technical biases exist for a library preparation procedure. On the other hand, technical variation is expected to have more modest impacts on studies of differential gene abundance across case and control groups because biological variance will likely overwhelm technical variance. Thus, the interpretation of the impact of the technical variation on future analyses should be carefully considered within the context of the biological variation of their specific application.
Conclusion. Collectively, these findings demonstrate that no single metagenomic library preparation approach performed the best across all community types and conditions evaluated. Rather, the performance of approaches varied as a function of the sample and the amount of input DNA. Consequently, researchers should consider these variables when selecting library preparation approaches, especially when attempting to optimize data quality, accuracy, and precision. To aid in this effort, we provide Fig. 5 as a reference guide to aid in choosing preparation methods with cost and performance in mind. We hope that this information helps improve the accessibility and utility of metagenomic investigations. Further study is needed to determine what community properties (e.g., GC content and taxonomic diversity, etc.) dictate these differences in library procedure performance in order to generate more generalizable guidance for procedure selection. That said, our results show that the different approaches generally produced relatively consistent taxonomic and gene family diversity profiles, which indicates that selecting approaches based on cost and ease of implementation may be appropriate for some studies (namely, those in which the loss of accuracy and precision is tolerable). However, we recommend careful consideration of the community type and the amount of input DNA when selecting a metagenomic library preparation procedure to ensure optimal performance.

MATERIALS AND METHODS
Genomic DNA extraction. Prior to metagenomic library construction, genomic DNA was extracted from environmental samples originating from soil from the North American Project To Evaluate Soil Health Measurements (36), coral (Acropora hyacinthus), and mammalian feces (Mus musculus; C57BL/6), using methods outlined below. In addition to environmental samples, we used the ZymoBIOMICS microbial community DNA standard (catalog number D6306, lot number ZRC193008) to more efficiently assess bias associated with library preparation methods on a standard mock community.
For the coral slurry, coral nubbins preserved in RNA/DNA Shield (ZymoBIOMICS) were vortexed in 15-ml tubes with a combination of ceramic and garnet bead lysing matrices at ;2,500 rpm for 25 min. DNA was extracted from 300 ml of the resulting coral slurry using the ZymoBIOMICS DNA/RNA miniprep kit (Zymo Research Corp., Irvine, CA, USA) following an additional 2-step enzyme incubation to increase the recovery of bacterial DNA: (i) the addition of 30 ml chicken egg white lysozyme (10 mg/ml; Novagen), 1.8 ml mutanolysin (50,000 units/ml from Streptomyces globisporus ATCC 21553; Sigma-Aldrich), and 1.8 ml lysostaphin (4 KU/ml from Staphylococcus staphylolyticus; Sigma-Aldrich) with incubation at 37°C for 1 h and (ii) 1 h of incubation at 50°C following the addition of 15 ml proteinase K (20 mg/ml; Thermo Scientific) and 30 ml proteinase K digestion buffer (0.1 M NaCl, 10 mM Tris [pH 9.0], 1 mM EDTA, 0.5% SDS, nuclease-free water). Following digestion, 1 volume of kit-specific DNA/RNA lysis buffer was added in order to proceed with the manufacturer's recommended extraction protocol.
For soil, the sample was taken on 27 February 2019 at the Virginia Tech Eastern Shore Agricultural Research and Extension Center. Samples were collected as 12 composite knife slices of soil to a depth of 15 cm, and each of the 12 slices was passed through an 8-mm filter. Detailed sampling methods were described previously by Norris et al. (36). Following collection, 0.25-g aliquots of soil were stored at 280°C after overnight shipment from the collection site. Soil aliquots were then extracted according to the Earth Microbiome Project protocol (37) using a KingFisher Flex kit (Thermo Fisher).
For mouse feces, DNA was isolated from a single fecal pellet using the DNeasy PowerSoil isolation kit (Qiagen) according to the manufacturer's instructions. An additional 10-min incubation step at 65°C directly before bead beating was added to enhance microbial cell lysis. The samples were then homogenized using Vortex-Genie 2 and a vortex adapter (Qiagen) at the highest setting for 10 min.
Metagenomic library preparation and sequencing. Environmental DNA samples were prepared for metagenomic sequencing according to the manufacturers' protocols using the following four commercially available kits: (i) the Illumina Nextera DNA Flex library kit, (ii) the Qiagen QIASeq FX DNA library kit, (iii) PerkinElmer NextFlex Rapid DNA-Seq kit 2.0, and (iv) seqWell plexWell96. In addition, we included a fifth preparation method using the modified "reduced" protocol established by Baym et al. to increase the number of libraries that Nextera DNA Flex could generate (21). Genomic DNA was quantified using a Qubit 1Â high-sensitivity (HS) double-stranded DNA (dsDNA) assay kit for soil, fecal, and coral communities. The mock community DNA concentration was not quantified as ZymoBIOMICS manufacturer information provided a known concentration of 100 ng/ml. Following quantification, all samples prepared using Nextera Flex full, QIASeq FX, and NextFlex Rapid XP were diluted with water to 0.2 ng/ml. To determine how the DNA input affected library generation, each standardized DNA concentration was then added to obtain the respective 0.5-ng, 1.0-ng, and 5.0-ng inputs. Samples prepared using plexWell96 were diluted to 0.25 ng/ml with water, and appropriate additive volumes were made to obtain 0.5-ng and 1.0-ng input concentrations. For samples with 5.0-ng inputs, samples were diluted to 1.25 ng/ml, and 4 ml of the sample was then used to obtain a 5.0-ng input concentration. For the Nextera Flex reduced reaction, all samples were diluted to 5.0 ng/ml with nuclease-free water. A 1-ml aliquot of this dilution was used for 5.0-ng input libraries. For 0.5-ng and 1.0-ng input libraries, sufficient water was added to the 1-ml dilution to bring the respective concentrations to 0.5 ng/ml and 1.0 ng/ml. The library insert size was assessed for the Nextera DNA Flex (full and reduced) and plexWell96 methods using Agilent TapeStation 4200 high-sensitivity D5000 DNA ScreenTape. The insert size for the QIASeq FX and NextFlex Rapid XP methods was quantified using the Agilent Bioanalyzer 2100 high-sensitivity DNA chip as these libraries are more prone to having adapter dimers, which are poorly resolved using the TapeStation. The library concentration was assessed with the Qubit 1Â high-sensitivity dsDNA quantification kit (Thermo Fisher). The resulting libraries were normalized to the lowest concentration for each library prep kit based on molarity using the Qubit concentration and Bioanalyzer/TapeStation median fragment sizes (except for plexWell96, which pools early on in the library prep procedure). All 20 of the normalized samples from each kit were then pooled, and each of the 5 pools was verified using quantitative PCR (qPCR). These 5 pools were normalized and pooled just prior to sequencing for pairedend reads of 150 bp on a single lane with the Illumina HiSeq3000 system.
Microbial community gene family abundance and taxonomic diversity. Quality filtering, adapter removal, and host read filtering were performed using shotcleaner v0.1 (38) with default parameters. For mouse fecal samples, host reads were removed by alignment to the mouse reference genome (GRCm38). A similar procedure was used for coral samples except that these reads were filtered against a concatenated version of the coral (Acropora millepora; GenBank accession number QTZP00000000.1 [39]) and symbiont (Symbiodiniaceae sp. clade A MAC-Cass KB8 [UniProt taxon identifier 671378]) genomes. Quality-controlled sequence reads were input into HUMAnN 2.0 (40) for taxonomic and functional classification using the UniRef90 database and default parameters. HUMAnN 2.0 outputs for each community type were combined and renormalized to counts per million using HUMAnN 2.0 utility scripts before downstream analysis. High-quality reads were also taxonomically classified using Kraken2 v2.0.8-beta (41,42) and a custom reference database that included sequences from all human, mouse (GRCm38), UniVec core, bacterial, archaeal, viral, fungal, and protozoal sequences in the NCBI RefSeq database (accessed 8 October 2019) as well as the Symbiodiniaceae sp. clade A MAC-Cass KB8 and A. millepora genomes. Taxonomic data derived from Kraken2 were normalized by dividing the number of taxonomic annotations within a given hierarchy by the total number of overall reads annotated (i.e., relative abundances of different bacterial species, genera, or phyla).
Statistical analyses. Independent linear models (R::stats::lm) were used to determine how community type, library preparation method, and input DNA concentration affect the variance of the resulting library characteristics, including the number of reads generated, median fragment size, library concentration, library molarity, mean read length, read duplication rate, mean read GC content, and total reads filtered and removed. Since we reasoned that it was likely that interactions between the predictors exist, we employed a model selection procedure to identify the most parsimonious model for each characteristic examined. For each characteristic, we built a set of models of increasing complexity: (i) a reduced model with only additive effects (equation 1) and (ii) a model with interaction terms for community type, library preparation procedure, and DNA input concentration (equation 2).
We then used the Akaike information criterion (AIC) to select the most parsimonious model and analysis of variance (ANOVA) to determine the significance of each term in the selected model [F(df 1 = K 2 1, df 2 = n 2 K, a = 0.05)]. Because sequencing libraries produced from distinct samples are pooled as part of the plexWell library preparation protocol, a single value is available for median fragment size, sequence library concentration, and sequence library molarity for this method.
The similarity between taxonomic profiles generated by the library preparation methods and the known taxonomic composition of the ZymoBIOMICS microbial community DNA standard was assessed by Pearson's correlation test (R::stats::cor.test). To measure the variation in species-level taxonomic profiles generated by different library preparation methods across soil, coral, and fecal communities, we calculated Pearson's correlation coefficient of the generated taxonomic abundance profiles for each pair of samples. This analysis was conducted using Kraken2, a sensitive read-binning tool, and MetaPhlAn2, a marker-gene-based abundance estimation tool to eliminate the possibility that mammalian biases in marker gene databases would skew results in environmental samples. We accounted for the effects of multiple correlation tests using the false discovery rate (R::stats::p.adjust, method = fdr).
The additive and interactive statistical effects of library preparation and DNA input concentration on the microbiome composition, as measured by the Bray-Curtis dissimilarity metric, were evaluated using PERMANOVA (R::vegan::adonis, permutations = 5,000, method= bray) and visualized using an ordination of principal-component analysis (PCA) for each community. Differences in the Bray-Curtis dissimilarity of species-level taxonomic abundance profiles within and across library preparation methods were measured using Kruskal-Wallis tests (R::stats::kruskal.test) with a post hoc pairwise Wilcoxon test (R::stats::pairwise.wilcox.test). A Holm correction was used to control Wilcoxon test family-wise error rates.
Shannon entropy and gene richness were calculated for HUMAnN 2.0 gene abundance profiles using R and vegan. Linear regression quantified associations between gene-level alpha-diversity and library preparation and input DNA concentration for each community type. Associations with gene-level Bray-Curtis dissimilarity, preparation method, and DNA input were quantified using PERMANOVA (R::vegan:: adonis, permutations = 5,000, method = bray). Differences in gene abundances and metagenomic dissimilarity were quantified as described above for taxonomy.
Data availability. Raw sequence data are available from the NCBI under BioProject accession number PRJNA747032. The GitHub code repository for reference of processing and analyzing raw data is available at https://github.com/chrisgaulke/ht_metagenomes.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. FIG S1, EPS file, 0.4 MB. TABLE S1, CSV file, 0.02 MB.