Environmental DNA metabarcoding as a tool for biodiversity assessment and monitoring: reconstructing established fish communities of north‐temperate lakes and rivers

To evaluate the ability of precipitation‐based environmental DNA (eDNA) sample collection and mitochondrial 12S metabarcoding sequencing to reconstruct well‐studied fish communities in lakes and rivers. Specific objectives were to 1) determine correlations between eDNA species detections and known community composition based on conventional field sampling, 2) compare efficiency of eDNA to detect fish biodiversity among systems with variable morphologies and trophic states, and 3) determine if species habitat preferences predict eDNA detection.


| INTRODUCTION
Obtaining basic community structure and diversity estimates is critical for effective biomonitoring and conservation of biodiversity (Dudgeon et al., 2006; Maxwell & Jennings, 2005; Tilman, 1999). In aquatic systems, community composition can be used as an indicator of system health (Branton & Richardson, 2011), ecosystem function (Humbert & Dorigo, 2005) and even physical structure (Dodson et al., 2000). Therefore, describing and monitoring the native community of a system before significant change occurs can help inform future mitigation of anthropogenic stressors and the loss of endemic species (Banks et al., 2010; Dudgeon et al., 2006). Unfortunately, streams and lakes around the world often lack basic biodiversity data despite their critical importance to local communities (Fluet-Chouinard et al., 2018; Lynch et al., 2016). Environmental DNA (eDNA) metabarcoding could allow ecologists and conservation biologists to efficiently describe community composition in systems that may be impractical to sample with conventional techniques (Bohmann et al., 2014; Evans et al., 2016; Olds et al., 2016).
The promise of eDNA metabarcoding for describing and monitoring freshwater communities has led to a rapid increase in eDNA studies. In particular, eDNA has been increasingly used to study fish communities, which have conventionally been surveyed with nets and traps that each target different subsets of the community (e.g. Aylagas, Borja, Irigoien, & Rodríguez-Ezpeleta, 2016; Yamamoto et al., 2017). Many of these studies have directly assessed how eDNA-based community estimates compare with those from conventional sampling (Hänfling et al., 2016; Sard et al., 2019). These comparisons indicate that eDNA metabarcoding often produces community estimates that are similar to, but slightly different from, those of conventional sampling (Fujii et al., 2019; Shaw et al., 2016). In some cases, eDNA metabarcoding requires less sampling effort (Sard et al., 2019) and identifies species that went undetected with conventional sampling (Valentini et al., 2016). One potential reason for such novel detections is the life-history-related sampling bias of conventional techniques (Clark et al., 2007; Murphy & Willis, 1996). The growing evidence that eDNA metabarcoding can complement conventional sampling of fish communities has encouraged many agencies to consider how to incorporate eDNA sampling into local monitoring and management efforts (Bohmann et al., 2014; Valentini et al., 2016). Understanding how and why eDNA metabarcoding detections differ from those of conventional methods is therefore critical for the design and interpretation of eDNA sampling.
One region where eDNA metabarcoding assessments could expand conservation biology is the Great Lakes region of North America. This region was glaciated until 12,000 years before present (YBP) and contains thousands of natural lakes with complex fish communities (Downing & Duarte, 2006). Natural resource managers have conducted intensive sampling and monitoring efforts throughout the region; however, most lakes have never been formally surveyed or have no reported biodiversity data. Moreover, where surveys have been conducted, they have focused on game species rather than total biodiversity. In Wisconsin, USA, 14,973 lakes are listed in the Wisconsin Department of Natural Resources database, of which 3,426 (23%) have reported fisheries data from between 1990 and 2020 (Zachary Feiner, Wisconsin Department of Natural Resources, personal communication, July 21, 2020).
The ability to quickly survey the community composition of previously unsurveyed systems could help identify rare or endangered species (Ficetola et al., 2008), estimate diversity, or identify dominant fish communities to help predict broad-scale patterns of community diversity (Sard et al., 2019).
Sampling protocols can strongly influence eDNA metabarcoding results (Creer et al., 2016; Dickie et al., 2018). Recent studies have commonly used filtration protocols, in which a predetermined volume of water (usually 1-2 litres) is collected per waterbody and pumped through sterilized filters. Filters can then be stored in preservative or frozen until extraction (Li et al., 2018). Although filtration techniques have been used successfully, excessive particulate matter can clog filters and greatly increase the processing time per sample. One method that eliminates filtration, and therefore shortens per-sample processing time, is the use of a centrifuge to pellet organic matter and eDNA (Ficetola et al., 2008). A version of this protocol has been used successfully in single-species eDNA detection studies of invasive bighead and silver carp (Hypophthalmichthys nobilis and H. molitrix) in the Mississippi drainage, which often inhabit areas with a high degree of suspended particulate matter (Erickson et al., 2016; Merkes et al., 2014), as well as for the detection of endangered species (Lor et al., 2020). Because large volumes can be difficult to centrifuge, the centrifugal method relies on a greater number of smaller water samples (50 ml), from which eDNA can be extracted directly from centrifuge-generated pellets without filtration. Because lakes in Wisconsin range from extremely eutrophic, with a high density of suspended particulate matter, to oligotrophic, with a low density of suspended particulate matter (Birge & Juday, 1934), a single sampling technique that can successfully sample both extremes is desirable.
Here, we apply small-volume, high-sample-number eDNA sampling with centrifugal sample processing and metabarcoding to describe the fish communities of nine reference waterbodies, seven in Wisconsin and two on the Mississippi River between Illinois and Iowa, that vary in productivity and whose community compositions are well characterized by conventional sampling methods. We incorporate more than 25 years of data collected on fish community structure from the Mississippi wisc.edu/). We evaluated the efficacy of centrifugal eDNA sampling for describing lake and river fish communities in these nine systems using these monitoring data and a detailed habitat/substrate preference database for detected fish species (Becker, 1983; Frimpong & Angermeier, 2009; Froese & Pauly, 2019; USGS, 2005). The objectives of our study were threefold: a) determine correlations between eDNA species detections and current and historical catch data, b) compare the efficiency of eDNA for detecting fish biodiversity among systems with different known community compositions, and c) identify habitat/substrate preferences that predict whether a species can be detected using eDNA.

| Waterbody selection
We selected six inland lakes (Crystal Lake, McDermott Lake, Lake Mendota, Sparkling Lake, Trout Lake and Lake Wingra) and one bog (Trout Bog) located in Wisconsin, as well as two navigation pools of the Upper Mississippi River System (UMR-13 and UMR-19) located between Illinois and Iowa (Table 1). We selected waterbodies from this region based on the amount of available fish community data and to represent a wide range of latitudes (40.398-46.041° N), mean depths (3-14 m), surface areas (1-1,565 ha), trophic states (oligotrophic to eutrophic) and species richness (Table 1). The fish communities of each waterbody had previously been well characterized by long-term routine sampling with conventional methods (nets, traps, boat electrofishing, etc.).

| eDNA sampling
We collected 50-ml water samples at 10-50 sites in each waterbody. Samples were collected at the LTER sites over three days (July 23-25, 2018; Table S1), and UMR-19 and UMR-13 were sampled on June 19 and June 20, 2018, respectively. The number of samples was approximately proportional to the surface area of each waterbody, resulting in a total of 327 water samples processed for eDNA metabarcoding across all nine systems. Within each waterbody, samples were divided 1:1 between nearshore and offshore sites (defined as less than or greater than 10 m from shore, respectively) and were spread approximately evenly across the waterbody's surface area. Nearshore sites were selected to be representative of the waterbody's major nearshore habitat types (e.g. rock, sand, macrophytes). For thermally unstratified waterbodies (n = 5), all samples were collected from the surface by hand-dipping a 50-ml centrifuge tube just below the water surface. Sterile gloves were worn to prevent contamination and were changed between samples. For thermally stratified waterbodies (n = 4), all nearshore samples were collected at the surface, and offshore samples were divided 2:1 between surface water and at-depth water.

TABLE 1 Waterbody structure and detection thresholds. Known richness was calculated based on all available data for a waterbody between 2014 and 2020, with years of data shown in parentheses. Read counts for Threshold 1 were calculated based on laboratory blanks (n = 18), and read counts for Threshold 2 were calculated from field blank samples, where the number of samples (n) was 10% of the total samples collected at a given waterbody.
At-depth samples were taken from haphazardly selected depths within the hypolimnion using a Van Dorn sampler, which was soaked in a 20% bleach solution for 30 min and rinsed with tap water between waterbodies to prevent DNA cross-contamination. Negative field blanks containing 50 ml of ultrapure water were filled in the laboratory; during sampling at each waterbody, they were opened to the air for ~5 s and their lower ends were submerged in the water. The number of negative field blanks differed by waterbody (1-5 samples) but was approximately 10% of the total number of samples for each waterbody. Samples were immediately placed on ice and remained refrigerated for no more than 48 hr before being delivered to the molecular laboratory at the U.S. Geological Survey Upper Midwest Environmental Sciences Center (UMESC), where they were frozen at −80°C until centrifugation and DNA extraction.
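The allocation rules described above (a 1:1 nearshore/offshore split, a 2:1 surface/at-depth split of offshore samples in stratified waterbodies, and field blanks at roughly 10% of samples) can be sketched in Python. This is a hypothetical helper, not the authors' code: the name `split_samples` and the integer-rounding choices are our assumptions, and the per-waterbody total (10-50 samples, roughly proportional to surface area) is simply taken as an input.

```python
def split_samples(n_samples, stratified):
    """Hypothetical helper: divide a waterbody's water samples among the
    collection categories described in the text and size the field blanks.

    Rounding choices (odd totals give the extra sample to offshore; 2/3 of
    offshore samples rounded to the nearest integer) are assumptions.
    """
    nearshore = n_samples // 2            # 1:1 nearshore/offshore
    offshore = n_samples - nearshore
    if stratified:
        # ~2/3 of offshore samples at the surface, the rest at depth
        offshore_surface = (2 * offshore + 1) // 3
    else:
        # unstratified waterbodies: every sample is taken at the surface
        offshore_surface = offshore
    offshore_depth = offshore - offshore_surface
    field_blanks = max(1, (n_samples + 5) // 10)  # ~10%, at least one
    return {
        "nearshore_surface": nearshore,
        "offshore_surface": offshore_surface,
        "offshore_depth": offshore_depth,
        "field_blanks": field_blanks,
    }


print(split_samples(30, stratified=True))
# {'nearshore_surface': 15, 'offshore_surface': 10, 'offshore_depth': 5, 'field_blanks': 3}
```

With totals between 10 and 50 samples this rule yields 1 to 5 field blanks, matching the range reported above.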

| DNA extraction and library preparation
All water samples, including field blanks, were processed according to Merkes et al. (2014). In short, each 50-ml sample was centrifuged at 5,000 × g for 30 min at 4°C. The supernatant was decanted, and DNA was extracted from the remaining pellet using the gMax Mini Genomic DNA Extraction Kit (IBI Scientific, Peosta, IA, USA) according to the manufacturer's instructions, with a final elution volume of 100 µl. Two extraction blanks were processed alongside the water samples and carried through sequencing and data analysis to account for potential contamination during extraction.
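Centrifugation speed is reported as relative centrifugal force (5,000 × g), whereas many centrifuges are set in rpm. The standard conversion, RCF = 1.118 × 10⁻⁵ × r × rpm² with r the rotor radius in cm, can be inverted as in the short sketch below; the 10 cm radius is a hypothetical value, not the rotor used in this study.

```python
import math


def rpm_for_rcf(rcf_g, rotor_radius_cm):
    """Rotor speed (rpm) giving a target relative centrifugal force,
    from the standard relation RCF = 1.118e-5 * r_cm * rpm**2."""
    return math.sqrt(rcf_g / (1.118e-5 * rotor_radius_cm))


# Hypothetical 10 cm rotor radius; substitute the radius of your own rotor.
print(round(rpm_for_rcf(5000, 10)))  # 6688
```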
Illumina MiSeq libraries were prepared using vertebrate-specific primers that amplify a region of the mitochondrial 12S gene (Riaz et al., 2011). These primers have been used successfully in similar metabarcoding studies to amplify this region in fish (Sard et al., 2019; Gehri et al., 2020). To ensure that sufficient DNA was sequenced, samples were amplified using a two-step PCR protocol. We first amplified each sample in 50 µl PCR reactions using

| Bioinformatics and sequence processing
Sequence data were processed using protocols outlined in Gehri et al. (2020). Raw reads were demultiplexed, and primer sequences and putative adapter contamination were then trimmed from each sequence using cutadapt version 2.01 (Martin, 2011). Cleaned reads were then processed in DADA2 version 1.16 (Callahan et al., 2016) to remove all sequences shorter than 100 bp or longer than 125 bp and reads with a quality score (truncQ) ≤ 2 or expected errors (maxEE) > 2. Next, putative chimeras were identified and removed, also using DADA2. As the final step in the DADA2 pipeline, forward and reverse reads were merged and a sequence identical or similar 12S sequences, we assigned sequences with ambiguous species- or genus-level assignments to the lowest common taxonomic unit (i.e. family or genus). We used custom R scripts to export feature tables containing all samples and read counts for the lowest taxonomic unit, and these tables were subsequently filtered to avoid false-positive detections and account for sample contamination.
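The length, truncQ and maxEE criteria can be illustrated with a simplified pure-Python read filter. This is not DADA2's implementation: in R these criteria are applied through `filterAndTrim()` (arguments `minLen`, `maxLen`, `truncQ`, `maxEE`), and the function names and exact order of operations below are our assumptions. Expected errors are computed in the usual way, as the sum of per-base error probabilities 10^(-Q/10).

```python
def expected_errors(quals):
    """Expected number of errors in a read: sum of 10^(-Q/10) over Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals, min_len=100, max_len=125, trunc_q=2, max_ee=2.0):
    """Simplified filter on a read's Phred quality scores (the base calls
    themselves do not affect these criteria)."""
    # Reject over-long reads before any truncation.
    if len(quals) > max_len:
        return False
    # Truncate at the first base whose quality is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            quals = quals[:i]
            break
    # Minimum length is checked after truncation.
    if len(quals) < min_len:
        return False
    return expected_errors(quals) <= max_ee


good = [30] * 110                        # 110 bp, EE = 0.11
noisy = [10] * 110                       # 110 bp, EE = 11
truncated = [30] * 50 + [2] + [30] * 59  # cut back to 50 bp at the Q = 2 base
print(passes_filter(good), passes_filter(noisy), passes_filter(truncated))
# True False False
```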
Field blanks, extraction blanks and NTCs were used to account for contamination following procedures similar to those outlined in Sard et al. (2019) and . First, putative fish detections were grouped into two categories based on the number of independent samples from a waterbody with a non-zero read count for a particular taxon. Any putative detection that occurred in two or more samples (5%-20% of the total samples for a lake, depending on sample size) was considered a true detection, and the taxon was counted as present in that system. Any putative detection that occurred in only a single sample was subjected to two additional read count thresholds based on the number of reads in extraction blanks, NTCs and field blanks. Threshold 1 was determined from the read counts assigned to fish taxa in NTCs and extraction blanks: read counts from both were combined into a single "laboratory blank" group (n = 18), and the average number of reads assigned to any fish taxon was used as the baseline threshold needed to classify a taxon as "detected" if its reads were identified in only a single sample from a waterbody. Threshold 2 was defined as the mean number of reads assigned to fish taxa in field blanks from a particular waterbody and was applied to all water samples from that waterbody. Therefore, a taxon found in only a single water sample from a waterbody was retained only if its read count exceeded both Threshold 1 and Threshold 2; otherwise it was considered putative contamination and removed from further analysis. All downstream data processing and figure construction were conducted in R version 3.6 (R Core Team, 2019) using custom scripts and the ggsci, ggpubr and tidyverse packages (Kassambara, 2020; Wickham et al., 2019; Xiao, 2018), based on the eDNA species detection matrix (Table S3).
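The detection rule above can be written as a small decision function. This is a minimal sketch under stated assumptions, not the authors' R scripts: the name `is_detected` is hypothetical, each blank is summarized by its total number of fish reads, and "exceeded" is implemented as strictly greater than. The blank read counts in the usage example are invented so that Threshold 1 equals the 52 reads reported in the results.

```python
from statistics import mean


def is_detected(sample_reads, lab_blank_reads, field_blank_reads):
    """Decide whether a taxon counts as present in one waterbody.

    sample_reads:      reads assigned to the taxon in each water sample
    lab_blank_reads:   total fish reads in each laboratory blank (extraction + NTC)
    field_blank_reads: total fish reads in each field blank from this waterbody
    """
    positives = [r for r in sample_reads if r > 0]
    if len(positives) >= 2:   # observed in two or more independent samples
        return True
    if not positives:         # never observed
        return False
    threshold_1 = mean(lab_blank_reads)    # laboratory-blank baseline
    threshold_2 = mean(field_blank_reads)  # waterbody-specific field blanks
    # A single-sample observation must exceed both thresholds.
    return positives[0] > threshold_1 and positives[0] > threshold_2


lab = [0, 104]     # hypothetical laboratory blanks; mean = 52
field = [0, 0, 30]  # hypothetical field blanks; mean = 10
print(is_detected([0, 12, 3, 0], lab, field))  # True  (two positive samples)
print(is_detected([0, 60, 0, 0], lab, field))  # True  (60 > 52 and 60 > 10)
print(is_detected([0, 40, 0, 0], lab, field))  # False (fails Threshold 1)
```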

| Obj 1. Community assessment
We created a matrix of known fish communities based on routine long-term monitoring data or intensive short-term sampling using conventional netting and electrofishing methods (Table S2).

| Obj 2. Efficiency of eDNA among systems with different known community compositions
To determine whether eDNA metabarcoding could efficiently identify waterbody-level differences in community composition, we conducted non-metric multidimensional scaling (NMDS) using the known community composition and eDNA detection matrices. We conducted the NMDS three times, based on family-, genus- and species-level detections, using Bray-Curtis dissimilarity and two ordination axes.
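The NMDS ordinations rest on pairwise Bray-Curtis dissimilarities. A minimal sketch of that dissimilarity for two presence/absence vectors is shown below (the taxon vectors are hypothetical); in R, the full analysis corresponds to vegan's `vegdist(x, method = "bray")` and `metaMDS(x, distance = "bray", k = 2)`. With 0/1 data, Bray-Curtis dissimilarity equals the Sørensen dissimilarity.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i).

    With 0/1 presence-absence data this equals the Sorensen dissimilarity.
    Two empty communities are treated as identical (an assumed convention).
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


# Hypothetical presence (1) / absence (0) of four taxa in one waterbody
known = [1, 1, 0, 1]
edna = [1, 0, 1, 1]
print(round(bray_curtis(known, edna), 3))  # 0.333
```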
The similarities between matrices were evaluated using analysis of similarities (ANOSIM) available in the vegan R package (Oksanen et al., 2019).
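ANOSIM summarizes group separation with the statistic R = (r̄_B − r̄_W) / (M/2), where r̄_B and r̄_W are the mean ranks of between-group and within-group dissimilarities and M = n(n − 1)/2 is the number of pairs. The sketch below computes R only; vegan's `anosim()` additionally derives a p-value by permuting group labels. The distance matrix and "eDNA"/"known" labels are hypothetical, and each group must contain at least two members.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R from a symmetric dissimilarity matrix and group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


dist = [[0.0, 0.1, 0.5, 0.6],
        [0.1, 0.0, 0.7, 0.8],
        [0.5, 0.7, 0.0, 0.2],
        [0.6, 0.8, 0.2, 0.0]]
print(anosim_r(dist, ["eDNA", "eDNA", "known", "known"]))  # 1.0
```

R near 1 indicates that the groups are well separated, and R near 0 indicates that within- and between-group dissimilarities are similar.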

| Obj 3. Relationship between habitat/substrate preference traits and species detection
To determine whether habitat/substrate preference influenced the probability of detection with eDNA, we conducted a discriminant analysis of principal components (DAPC; Jombart, 2008) with five retained principal components, using a basic habitat/substrate preference matrix as the predictor matrix (Table S4). Habitat preference was a binary variable, whereby 1 indicated that a species has frequently been found in the habitat and 0 indicated a general absence from the habitat. We included 12 habitat/substrate types in our analysis: bedrock, boulders, clay/silt, cobble, gravel, woody debris, muck, organic debris, pelagic water, lotic environments, sand and vegetation.
DAPC is a multivariate method that describes differences between predefined groups by identifying the axes that maximize between-group variance while minimizing within-group variance. DAPC first transforms the data matrix using principal component analysis (PCA) and then performs a discriminant analysis on the user-retained principal components. Group differences are described by discriminant function loadings, such that species with dissimilar habitat preferences will have dissimilar loadings.
By comparing the loadings of the species known to inhabit each system, grouped by whether or not they were detected through eDNA metabarcoding, we were able to determine whether eDNA metabarcoding detected a biased subset of fish communities and, if so, to identify traits or sets of traits that may predict eDNA detection.
The R package adegenet was used to conduct eight DAPCs in total, one for each waterbody, as well as a final test in which all waterbodies were combined; each analysis included the first five principal components (Jombart, 2008). Significant bias in habitat/substrate preference was inferred if the reassignment accuracy of eDNA-detected species to the eDNA-detected group was at least 90%.
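The two DAPC steps and the reassignment criterion can be sketched with NumPy for the two-group case (detected vs. undetected). This is an illustrative sketch, not adegenet's `dapc()`, which the study used and which differs in detail (e.g. its discriminant step and posterior group-membership probabilities). The function name, the small ridge term, the midpoint cut-off (equal priors) and the three-habitat example matrix are all our assumptions.

```python
import numpy as np


def dapc_reassignment(X, detected, n_pcs=5, ridge=1e-6):
    """Two-group DAPC-style sketch: PCA, then Fisher's linear discriminant
    on the retained principal components.

    X: species x habitat matrix of 0/1 preferences
    detected: True where the species was detected with eDNA
    Returns (discriminant scores, share of detected species reassigned
    to the detected group).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(detected, dtype=bool)
    # 1) PCA via SVD of the column-centred matrix; keep up to n_pcs axes.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_pcs, vt.shape[0])
    pcs = Xc @ vt[:k].T
    # 2) Fisher discriminant w = (Sw + ridge*I)^-1 (m1 - m0); the ridge keeps
    #    Sw invertible when a binary trait is constant within a group.
    m1, m0 = pcs[y].mean(axis=0), pcs[~y].mean(axis=0)
    d1, d0 = pcs[y] - m1, pcs[~y] - m0
    sw = d1.T @ d1 + d0.T @ d0
    w = np.linalg.solve(sw + ridge * np.eye(k), m1 - m0)
    scores = pcs @ w
    # 3) Reassign each species to the nearer projected group mean.
    cut = (m1 @ w + m0 @ w) / 2
    predicted = scores > cut
    return scores, float(predicted[y].mean())


# Hypothetical habitats (columns): lotic, vegetation, sand
X = [[1, 0, 1], [1, 0, 0], [1, 0, 1],
     [0, 1, 1], [0, 1, 0], [0, 1, 1]]
detected = [True, True, True, False, False, False]
scores, acc = dapc_reassignment(X, detected)
print(acc)  # 1.0
```

In this contrived example, lotic preference perfectly separates the groups, so reassignment is 100% (above the 90% criterion). The overlapping preferences reported in the results would instead produce low reassignment.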

| Sequence summary
Twelve samples contained less than the minimum recommended concentration for sequencing (2 nM) and were removed in the final steps of library preparation. Another sample was removed after sequencing because it contained zero sequence reads after sequence filtering. Of all reads that remained following sequence filtering in DADA2, a total of 267 amplicon sequence variants were identified that could be assigned to 74 unique taxa, of which 53 were fish taxa. Each sample contained between 0 and 17,059 reads assigned to a fish taxon (mean = 475, median = 172). Field and laboratory blanks contained between 0 and 5,438 reads assigned to fish taxa (mean = 317, …).

FIGURE 1 Sequential loss of per-sample sequence reads during filtering and assignment for each waterbody. Density distributions show the total number of sequence reads present after each step of the analysis pipeline, beginning with the total reads present in demultiplexed fastq files (raw reads), followed by quality filtering and removal of chimeras in DADA2 (filtered reads), and, finally, reads that were successfully assigned to a fish taxon with 98% accuracy (assigned reads). Triangles show the read counts of the 1-5 field blanks at each filtering step. Unfiltered reads in one of the field blanks from Crystal Lake fall outside the x-axis range because its read count exceeded that of most samples.

To account for false-positive eDNA detections in the water samples (detections of species in waterbodies where there is no resident population), we set thresholds based on the number of samples with positive read counts for a particular taxon and two additional read count thresholds based on the number of reads assigned to each taxon in field and laboratory blanks. First, taxa detected in two or more samples from the same waterbody were considered true detections and retained (1,008 of 1,054 positive read count observations).
Next, the 46 positive read count observations that occurred in only one sample from a waterbody were subjected to the two-part threshold based on read counts from laboratory and field blanks. Of these 46 observations, 39 failed Threshold 1 because they contained fewer than 52 total reads, the mean read count in laboratory blanks. Another four observations passed Threshold 1 but failed the waterbody-specific Threshold 2 (Table 1). This resulted in 1,011 observations that were considered "detections": 1,008 observations made in at least two samples from a particular waterbody, plus three observations made only once at a particular waterbody that passed read count Thresholds 1 and 2. Positive fish taxon detections were identified in 278 of 314 samples and represented 44 unique fish taxa (Table S4).

| Obj. 1: Intersection between eDNA species detection and known fish communities
We detected 15% of known species in our highest-diversity waterbody (UMR-13), 100% of species in our lowest-diversity waterbody (Trout Bog) and an average of 39% of the known community across all waterbodies with eDNA metabarcoding (Table 2). Because many closely related species share identical or almost identical 12S sequences, species-level designations using a single gene were not possible in many cases. However, overlap between known community composition and eDNA detections increased markedly when taxa were grouped by genus (average = 54%) or family (average = 67%) rather than by species (Table 2). Common species had a higher rate of detection than the total community (Table 3): on average, eDNA detected 52% of the species observed in every one of the last 5 years of conventional sampling surveys.
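The increase in overlap from species to genus to family can be illustrated by collapsing both the known and the eDNA species lists to a coarser rank before intersecting them. The sketch below uses a small hypothetical taxonomy table and detection list and, for simplicity, assumes every eDNA detection is resolved to species. In the study, some assignments stopped at genus or family.

```python
# Hypothetical lookup: species -> (genus, family)
TAXONOMY = {
    "Lepomis macrochirus": ("Lepomis", "Centrarchidae"),
    "Lepomis gibbosus": ("Lepomis", "Centrarchidae"),
    "Micropterus salmoides": ("Micropterus", "Centrarchidae"),
    "Perca flavescens": ("Perca", "Percidae"),
    "Sander vitreus": ("Sander", "Percidae"),
}


def percent_known_detected(known, detected, level):
    """Share (%) of known taxa at `level` that also appear in eDNA detections."""
    if level not in ("species", "genus", "family"):
        raise ValueError("level must be 'species', 'genus' or 'family'")

    def collapse(species_list):
        if level == "species":
            return set(species_list)
        idx = 0 if level == "genus" else 1
        return {TAXONOMY[s][idx] for s in species_list}

    known_taxa = collapse(known)
    return 100 * len(known_taxa & collapse(detected)) / len(known_taxa)


known = list(TAXONOMY)
detected = ["Lepomis macrochirus", "Perca flavescens"]
for level in ("species", "genus", "family"):
    print(level, percent_known_detected(known, detected, level))
# species 40.0
# genus 50.0
# family 100.0
```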
Species not present in the known community of a particular waterbody were detected with eDNA eight times across four of the nine waterbodies sampled. Three novel species were detected in Trout Lake (Fundulus diaphanus, Pimephales promelas and Umbra limi). Of these three species, only P. promelas has been detected in Trout Lake historically (outside of the most recent five years of monitoring).

| Obj 3. Relationship between habitat/substrate preference traits and species detection
Trout Bog had low known species diversity, so it was not used in the habitat/substrate preference analysis. Discriminant analysis of principal components was used to determine whether the habitat/substrate preferences shared by species detected by eDNA metabarcoding differed substantially from those shared by species not detected by eDNA metabarcoding. We predicted that if species with certain habitat/substrate preferences were more easily detected than others, the loading scores of detected and undetected species along Discriminant Function 1 should diverge. However, we observed high overlap in species loading scores along Discriminant Function 1 (Figure 3), indicating that detected and undetected species shared many of the same habitat/substrate preferences. The degree of overlap between groups can be summarized using reassignment accuracy: if habitat/substrate preferences are strongly diverged between groups, reassignment accuracy should be high. In our analysis, the reassignment accuracy for detected and undetected species was low, both overall when detections from all lakes were combined (28%) and within each waterbody (8%-66%; mean = 42%). The explanatory contribution of the habitat variables (i.e. the proportion of variance along Discriminant Function 1 explained by preference for a certain habitat/substrate) was inconsistent among waterbodies (Figure 4). Preference for lotic environments explained the most variance in eDNA detection somewhat consistently, but its contribution exceeded 20% in only 4 of 7 sites.
Other habitat preferences explained similarly high amounts of variance (>20%) in certain waterbodies. However, because the contribution of any single habitat preference was rarely high in more than one waterbody, we concluded that habitat preference was not a good predictor of whether a species would be detected with eDNA metabarcoding.

| DISCUSSION
Freshwater systems hold a disproportionate share of global biodiversity relative to their surface area and face substantial anthropogenic stresses, including climate change, pollution and species introductions (Collingsworth et al., 2017; Strayer & Dudgeon, 2010).
However, biodiversity and fisheries data for freshwater systems can be limited (Bower et al., 2020; Strayer & Dudgeon, 2010). Our study shows that centrifugal DNA isolation followed by 12S eDNA metabarcoding can provide basic biodiversity information for a broad range of freshwater systems. Using 50-ml water samples collected from waterbodies with a wide range of physical and trophic characteristics, we detected 30%-50% of the known fish community and 50%-60% of common fish species in a single eDNA sampling event, using sequence data from a single gene region and publicly available reference sequences. The sampling and sample processing effort for this study represents a fraction of the effort involved in establishing the known fish community, which is based on 5 years of data and hundreds of hours of sampling with beach seines, electrofishing, gillnetting, fyke netting and minnow traps (NTL-LTER, LTRM). Additional eDNA surveys would very likely detect more species, but these results indicate that even single-pass metabarcoding could provide useful information about biodiversity in understudied systems.
Using eDNA metabarcoding, we reconstructed the broad patterns of known community structure and successfully differentiated lake and river pool communities, as well as communities of lakes with different trophic states. These results indicate that eDNA metabarcoding data could be useful for coarse but consistent biodiversity assessments across large spatial scales to elucidate how communities change in space and time. Our results echo those of other eDNA metabarcoding studies that have found that eDNA can provide useful community information for relatively low effort, but they also highlight that some type I and type II errors (false positives and false negatives) should be expected (Ficetola et al., 2016; Hänfling et al., 2016). Although species-specific assays or high-volume water filtration methods may be more appropriate when detecting rare species is a priority, we show that a single sampling event using a low-volume, high-sample-number design combined with centrifugation and metabarcoding can reconstruct basic fish community structure in a variety of freshwater systems.

Note: Information not available is denoted with na.

TABLE 3 Intersection between the known community of common species identified through conventional net sampling conducted over the last 5 years in each waterbody (2014-2018) and eDNA detections from 12S metabarcoding for species-, genus- and family-level taxonomic assignment. Common taxa are defined as any species identified in every year of the most recent 5 years of survey data (2014-2018) for long-term monitoring systems, or in all years of sampling for short-term intensive sampling systems.

| Optimal sampling strategy and evaluation of centrifugal methods
Our study is one of the few demonstrations of non-filtration-based water sampling for fish metabarcoding assessments (but see Deiner et al., 2015; Ficetola et al., 2008; Thomsen et al., 2012). Although many reviews consider both techniques (e.g. Bohmann et al., 2014; Goldberg et al., 2015), the field of aquatic eDNA metabarcoding has increasingly moved towards filtration approaches.
However, precipitation-based extractions can yield DNA concentrations similar to those obtained by filtration (Deiner et al., 2015). In our study, avoiding filtration enabled field crews to collect a large number of water samples from many lakes without returning to the laboratory to filter them. Excluding travel time between lakes, collecting water samples generally took 1-3 hr per lake. In addition, sampling required minimal field experience, so sample collectors could be trained quickly, which further reduced the time required to collect samples.
Although the sampling strategy could be completed quickly, the low per-sample volume and low sequencing depth likely both contributed to lower species detection rates than in similar metabarcoding studies with higher-coverage sampling (Hänfling et al., 2016; Sard et al., 2019). Although read counts in control samples were low (often zero), read counts in many water samples were also low; therefore, many taxonomic assignments were based on fewer than 100 reads. A larger sample volume or additional 50-ml samples could have boosted total read counts, and additional field and laboratory negatives could have improved our ability to distinguish true detections from contamination. Another major factor contributing to low total read counts was the use of a MiSeq Reagent v2 Nano kit. When possible, future studies should consider sequencing at greater depth by using a standard MiSeq Reagent kit, which can produce six times the reads of a Nano kit, or other sequencing platforms, such as NovaSeq, to obtain even higher coverage. Higher overall read counts should help to discriminate between negative control contamination and true sample detections and increase detection rates of low-abundance species.
FIGURE 2 Non-metric multidimensional scaling (NMDS) of waterbodies based on eDNA and known community data sets, with detections at three taxonomic levels: species (top), genus (middle) and family (bottom). Ellipses show the standard deviation around the centroid of each group. Shapes group waterbodies by basic system physical structure.

The number of control samples collected at each waterbody is another important consideration. Only one to five field blanks were collected per waterbody, which made outliers difficult to identify. For example, one of the three field control samples from Crystal Lake contained over 4,000 Lepomis macrochirus reads, while the other two contained 0 reads. Had more field control samples been available for Crystal Lake, anomalies such as this one might have been confidently identified and removed as outliers. However, even when read counts were high in some control samples, the large number of water samples collected per waterbody still made it possible to confidently identify taxa in each system. This suggests that our approach could be a successful means of eDNA capture, especially when field efficiency and a high sample size per system are desired. Many filtration protocols collect fewer water samples per waterbody than we did, even when a larger total volume is sampled (Dickie et al., 2018). By collecting many independent samples per waterbody, we could use the number of water samples in which a taxon was observed as a control for false-positive detection, in addition to overall read counts. This approach is similar to that of other studies that collected a large number of independent samples per waterbody and has been suggested as a balanced approach for handling putative false-positive and false-negative errors (Evans, Li, et al., 2017). This additional filter parameter made it possible to

| Reconstruction of known community structure
Our results complement much of the existing eDNA metabarcoding literature in finding that eDNA metabarcoding can detect a subset of the known fish community across a broad range of life histories and habitat preferences (Hänfling et al., 2016; Sard et al., 2019). In addition, our data likely underestimate species detection because only a single gene was used for taxonomic assignment. Many eDNA metabarcoding studies use multiple genes, which can increase both the number of species detected and the number of species-level taxonomic assignments (e.g. Sard et al., 2019; Shaw et al., 2016). We chose to use a single gene because most taxa can often be identified with a single marker and our goal was to evaluate the ability of a very basic eDNA monitoring protocol to reconstruct known fish communities. Future studies could include additional metabarcoding markers, such as 16S or COI, to increase species-level assignments and detections.
We collected samples from a broad range of habitats within each waterbody, including below the thermocline in stratified lakes. This may have helped to ensure the detection of species with diverse habitat preferences. Some studies have found that eDNA is often well mixed throughout a system (Zhang et al., 2020), whereas others have found that DNA can clump together (Furlan et al., 2016;Williams, Huyvaert, Vercauteren, Davis, Piaggio, 2018). An optimal sampling strategy should therefore consider both spatial coverage, which could be important in large waterbodies or in stagnant systems with little mixing, and total sampled volume, which could mitigate the effects of DNA clumping. Conventional sampling approaches often differ in sampling efficiency among species (Murphy & Willis, 1996); therefore, multiple sampling methods are necessary to fully describe local fish communities (Clark et al., 2007). For example, the data sets used to construct our known lake communities were derived from a combination of different conventional sam- (Wisz et al., 2008).
While we found limited sampling bias, careful interpretation of eDNA sequence data is still crucial for accurate community assessments (Valentini et al., 2016) because substantial laboratory and bioinformatic biases do occur (Kelly et al., 2019;Piggott, 2016). For example, S. vitreus was a common species in 6 of the 9 waterbodies we sampled, yet we observed few S. vitreus detections across all samples, and the detections that did occur had low read counts. This suggests potential primer bias against S. vitreus, although walleye (S. vitreus) have been successfully detected with these same primers in other studies (Gehri et al., 2020). The primers used in this study are general vertebrate primers known to amplify a broad range of vertebrate taxa, including the dominant taxonomic groups of fish common to lakes and rivers in our study area (Riaz et al., 2011;Gehri et al., 2020;Sard et al., 2019). Still, given the low sequencing depth, which may compound the effects of primer bias, and the lack of a second genetic marker, which would help to identify variance in sequence abundance, we cannot rule out that some taxa are over- or under-represented in our data set. Primer bias is especially problematic when metabarcoding data are used to estimate relative abundance (Piñol et al., 2019). We used eDNA data for detection rather than for quantifying relative abundance, which means that the primers would need to fail to amplify sequences from a given taxon for it to be excluded from our analysis. However, presence-only data can still provide important information on species occurrence that is necessary to guide conservation action (Hefley et al., 2017;McDonald et al., 2013).
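A toy calculation illustrates why using metabarcoding data for detection is less sensitive to primer bias than using it for relative abundance. The sketch assumes a deliberately simple, hypothetical model in which reads are proportional to template DNA copies multiplied by a taxon-specific amplification efficiency; real PCR and sequencing biases are more complex, and the efficiencies and taxon names below are invented.

```python
def amplify(template, efficiency):
    """Hypothetical model: reads = template copies x taxon-specific efficiency."""
    return {taxon: copies * efficiency[taxon] for taxon, copies in template.items()}


def relative_abundance(reads):
    """Proportion of total reads assigned to each taxon."""
    total = sum(reads.values())
    return {taxon: r / total for taxon, r in reads.items()}


def detected(reads):
    """Presence/absence: the set of taxa with at least one read."""
    return {taxon for taxon, r in reads.items() if r > 0}


# Equal template DNA for three taxa, with invented amplification efficiencies.
template = {"taxon_A": 100, "taxon_B": 100, "taxon_C": 100}
efficiency = {"taxon_A": 1.0, "taxon_B": 0.1, "taxon_C": 0.0}
reads = amplify(template, efficiency)

print(relative_abundance(template))  # each taxon is 1/3 of the template
print(relative_abundance(reads))     # taxon_A ~0.91, taxon_B ~0.09, taxon_C 0.0
print(sorted(detected(reads)))       # ['taxon_A', 'taxon_B']
```

Under this model a tenfold efficiency difference badly distorts relative abundance but leaves taxon_B detected; only taxon_C, which the primers fail to amplify at all, drops out of the presence/absence result.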
False-positive detections due to contamination or misassignment are a common concern in eDNA studies (Lahoz-Monfort et al., 2016;Wilcox et al., 2013). Here, we identified eight species in three waterbodies that had not been previously reported. Two of these novel detections in Trout Lake and one in Lake Wingra are of species never observed in either lake in 40 years of routine monitoring. Given the extensive conventional sampling of both lakes, it is unlikely that these detections came from DNA shed by individuals residing in either waterbody. In certain cases, such as the detection of F. diaphanus in Lake Wingra, it is possible that the species recently recolonized the lake and that the detection is true (Willink et al., 2018). Unfortunately, it is very difficult to say conclusively whether these detections resulted from contamination, misassignment, or DNA that originated in adjacent unsampled streams or wetlands or was transferred to the lake by some other means. Additionally, Cyprinidae DNA closely related to Ctenopharyngodon idella 12S reference sequences was identified in Trout Bog, which has only one known fish species (U. limi). The molecular laboratory at UMESC where the water samples were processed frequently works with Asian carp, including C. idella, which points to contamination as the likely source of this detection, although it could have originated elsewhere. Misassignments could also result from an incomplete reference database; we mitigated this concern by building a reference database that included as many of the species known to inhabit local lakes as possible and by conducting GenBank BLASTn searches on all unknown sequences. Nonetheless, GenBank lacked 12S sequences for 15 species that have previously been observed at one of our survey sites. If DNA from any of these species was present in our samples, its sequences were likely assigned to a closely related species or to a higher taxonomic level (e.g. genus or family).
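When a species is missing from the reference database, its sequences may match several related references equally well. One common way to handle such ties is a lowest-common-ancestor (LCA) rule, which assigns the sequence to the lowest rank shared by all tied best hits. The sketch below is an illustrative assumption, not necessarily the rule used in this study's pipeline; the function `lca_assign` and the three-rank lineage tuples are hypothetical simplifications. Note that LCA only resolves ties: it cannot stop a sequence from a missing species being assigned to a single closely related reference that happens to match best.

```python
RANKS = ("family", "genus", "species")


def lca_assign(best_hits):
    """Assign a query to the lowest rank shared by all tied best hits.

    best_hits is a list of (family, genus, species) tuples for the reference
    sequences that tie for the best match. Returns (rank, name), or None if
    the hits do not agree even at family level (or the list is empty).
    """
    assignment = None
    for i, rank in enumerate(RANKS):
        names = {lineage[i] for lineage in best_hits}
        if len(names) != 1:
            break  # hits disagree at this rank; keep the previous assignment
        assignment = (rank, names.pop())
    return assignment


# Two sunfish references tie for the best match: resolved only to genus.
print(lca_assign([
    ("Centrarchidae", "Lepomis", "Lepomis macrochirus"),
    ("Centrarchidae", "Lepomis", "Lepomis gibbosus"),
]))  # ('genus', 'Lepomis')

# References from different genera tie: resolved only to family.
print(lca_assign([
    ("Cyprinidae", "Cyprinus", "Cyprinus carpio"),
    ("Cyprinidae", "Carassius", "Carassius auratus"),
]))  # ('family', 'Cyprinidae')
```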
Until a complete archive of barcode sequences is available for all known species in a region, taxa may be misassigned or left unassigned because of missing reference data. Extensive knowledge of the fish communities in our study systems allowed us to make informed decisions about unexpected detections; however, when study systems lack previous survey data, it will be important to acknowledge type I (false-positive) errors when describing communities and to plan for false-positive corrections (Ficetola et al., 2016), since these detections could bias future species distribution studies (Gormley et al., 2011).
We successfully identified established differences and similarities among fish communities using the subset of the total fish community detected with eDNA. A great deal of community ecology research aims to quantify and describe patterns of diversity across time and space (Lomolino & Rosenzweig, 1996). However, such comparisons require consistent methodology and effort among systems (Gotelli, 2008). It can be difficult to conduct biodiversity surveys consistently at multiple locations because of differences in gear, sampling effort or scientific expertise (Bried & Hinchliffe, 2019). Our study highlights the capability of eDNA metabarcoding to sample a variety of systems consistently, allowing simple cross-system comparison. eDNA detections described broad community structure similar to that described by the most recent 5 years of routine long-term monitoring or 1-2 years of intensive short-term monitoring with conventional methods. Waterbody groupings in the NMDS mirrored established differences in fish communities, such as strong differences between river and lake communities and smaller but still significant differences between eutrophic and oligotrophic lakes. These cross-system differences can help to identify at-risk systems or differences in ecosystem health among systems (Argillier et al., 2013). Our results and those of others suggest that eDNA metabarcoding could facilitate community surveys at larger scales (10s to 100s of waterbodies; Hänfling et al., 2016;Li et al., 2019;Valentini et al., 2016).
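Ordinations such as the NMDS in Figure 2 start from a matrix of pairwise dissimilarities among waterbodies. The distance metric is not specified in this section, so the sketch below assumes Jaccard dissimilarity on presence/absence detections, a common choice for incidence data; the function names, waterbody names and taxon sets are invented for illustration. The ordination step itself (for example, `metaMDS` in the R package vegan) is not shown.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard dissimilarity, 1 - |A & B| / |A | B|, between two sets of taxa."""
    union = a | b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1 - len(a & b) / len(union)


def pairwise_distances(detections):
    """Dissimilarity for every pair of waterbodies (pairs ordered by name)."""
    names = sorted(detections)
    return {
        (x, y): jaccard_distance(detections[x], detections[y])
        for x, y in combinations(names, 2)
    }


# Invented presence/absence detections for three waterbodies.
detections = {
    "lake_1": {"taxon_A", "taxon_B", "taxon_C"},
    "lake_2": {"taxon_A", "taxon_B", "taxon_D"},
    "river_1": {"taxon_A", "taxon_E", "taxon_F"},
}
for pair, d in pairwise_distances(detections).items():
    print(pair, round(d, 2))
# ('lake_1', 'lake_2') 0.5
# ('lake_1', 'river_1') 0.8
# ('lake_2', 'river_1') 0.8
```

In this toy example the two lakes are more similar to each other than either is to the river, which is the kind of structure an NMDS summarizes visually.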

| Implications for biodiversity monitoring and conservation management in the Anthropocene
Describing diversity and community structure continues to be a crucial area of research for conservation (Dudgeon et al., 2006). Establishing baseline community composition and consistently monitoring a diverse range of systems is one step toward meeting future conservation goals (Strayer & Dudgeon, 2010). Many freshwater systems throughout the world are currently experiencing increased stress (Fluet-Chouinard et al., 2018;Heino et al., 2009), and a lack of fish community data makes the sustainable management of these systems much more difficult (Beard et al., 2011;Bower et al., 2020). Our study indicates that a simple eDNA metabarcoding assessment using a high number of low-volume samples, centrifugation and a single genetic marker can provide quality information about fish community composition that could be incorporated into monitoring regimes and allow sampling at larger scales. As these methods are adopted, sampling and sample processing can be optimized to increase the sensitivity and accuracy of assessments. We liken an eDNA metabarcoding approach such as ours to a fundamental limnological tool, the Secchi disc. Though a single Secchi disc measurement is coarse and imperfect, its simplicity and low effort allow it to be scaled easily through time and space, which makes it a powerful tool for understanding freshwater ecosystems (Tyler, 1968). Likewise, eDNA metabarcoding offers the opportunity to collect coarse but consistent biodiversity estimates across large spatial and temporal scales.

ACKNOWLEDGEMENTS
We thank Matthew Hoogland for assistance with library preparation and sequencing as well as Nick Sard and Becky Gehri for guidance and protocols used in laboratory and bioinformatic pipelines.
We also thank biologists at the Illinois Natural History Survey, especially James Lamer, for providing data and insights on the Upper Mississippi River fishery. This research was funded by the U.S. Geological Survey Invasive Species Program and supported by the Research Computing Clusters at Old Dominion University. Any use of trade, product or company names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

CONFLICT OF INTEREST
There is no conflict of interest declared in this article.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13253.

DATA AVAILABILITY STATEMENT
Annotated R scripts can be found on GitHub at https://github.com/peuclide/eDNA_metabarcoding_pipeline. Sequencing and associated metadata are available via ScienceBase https://doi.org/10.5066/P9CORN8Q.