Executing multi-taxa eDNA ecological assessment via traditional metrics and interactive networks

assessment


H I G H L I G H T S
• Environmental DNA biological quality elements enable effective environmental assessment using high throughput sequencing • Non-traditional 18S biological indicators provide a finer scale assessment of environmental conditions • Network based analyses enable quantification of changes in biotic and abiotic associations across multiple taxonomic groups • Transitivity network properties enable direct assessment with currently existing environmental policy metrics

G R A P H I C A L A B S T R A C T
a b s t r a c t a r t i c l e i n f o

Introduction
A major challenge for the 21st century is ensuring natural resources are well managed for current and future generations (Lampert, 2019). Increased awareness of the negative impacts of human interactions, paired with continued technological development and budgetary constraints, calls for a practical reassessment of how we implement environmental regulation and the strengthening of links between assessment methods and underlying ecological principles. Freshwater ecosystem management is immensely important as access to freshwater via lakes and rivers is essential for human societies. Freshwater ecosystems are highly sensitive and are regularly exposed to multiple stressors including toxic chemicals, pharmaceuticals, and thermal profiles, which compromise their stability and ecosystem function. Instability of natural waterways imposes a negative impact on human livelihood, particularly with regards to growth, development and sustainment of our populations and communities. Whereas direct chemical assessment would intuitively seem the most logical means to evaluate natural systems for contamination, it often lacks temporal and spatial reliability resulting in false positive or negative associations (Morse et al., 2007). Additionally, with the ever increasing number of manufactured compounds, the ability to detect all possible point effects and the innumerable interactions of potential environmental stressors is impractical (Gavrilescu et al., 2015). Ecological assessment, whereby observations of living populations or communities within and among sampling locations is used to assess natural resource health, is a well-established practice and serves as the basis for freshwater regulation such as that required by the Water Framework Directive (WFD) (Morse et al., 2007).
The WFD is a key piece of legislation within the European Union (EU) with the primary objective of ensuring sustainable aquatic resources for future generations. Implementation of the WFD requires a system for evaluating the health of aquatic ecosystems, as a means of identifying those in need of remediation, and to ensure no deterioration of ecosystems currently in a healthy state (referred to as "good ecological status") (Carvalho et al., 2019). The WFD requires characterization of biological communities, along with physiochemical and hydromorphological conditions. The ecological status of a given freshwater body is determined by the assessment of biological quality elements (BQEs); including fish, benthic macroinvertebrates, phytobenthos, macrophytes and phytoplankton, each requiring unique sampling, analysis and computational approaches (e.g. metrics) (Simboura et al., 2005). Benthic macroinvertebrate metrics, such as the Whalley, Hawkes, Paisley & Trigg (WHPT) metric (Water Framework Directive, 2014b), function by assigning taxonomically identifiable groups to scores based on their pollution tolerance, which are then used to generate ecological assessment metrics, including Average Score per Taxon (ASPT) (Mandaville, 2002), from which ecological status can be derived. Similarly, the Trophic Diatom Index (TDI) assigns scores based on pollution tolerance to diatom species (Kelly, 1998). Furthermore, current biomonitoring metrics rely heavily on the assumption of environmental sorting of communities, which may induce false positive or negative associations of the sampled individuals if other community factors are not considered, including species interactions and dispersal dynamics (Cadotte and Tucker, 2017). While increased replication would allow for greater accuracy under traditional protocols, current budgetary and manpower constraints hinder effective implementation.
Traditional biomonitoring protocols also rely heavily on identification of organisms by skilled analysts and can be time and resource intensive if a centralized identification database is not used to cross reference variants (Baird and Hajibabaei, 2012). Alternatively, utilizing a molecular based sampling and taxonomic assignment protocol that relies on a centralized taxon database and computer assignment may allow for a standardized biomonitoring protocol as well as furthering the scope and depth of the indicator groups used to associate pollution or environmental perturbation, thereby allowing finer scale assessment of any environmental changes that have occurred at target sites (Pawlowski et al., 2018).
Environmental DNA (eDNA) is DNA isolated from environmental samples and has been shown to be a reliable means for assessing biodiversity in freshwater environments (Bohmann et al., 2014;Seymour, 2019). Utilizing eDNA for community biomonitoring is not yet fully recognized, however eDNA based methods are increasingly being used across government and industry sections to assess individual species populations that are of conservation concern (Biggs et al., 2015), or pose a threat as invasive species (Jerde et al., 2013). Environmental DNA assessment methods facilitate rapid sampling and can be analyzed faster than traditional methods, thereby providing greater spatial and replication scope when sampling a given environment. The eDNAbased assessment approach also facilitates a standardized sampling and assessment protocol across broad taxonomic space, providing several orders of magnitude more diversity data generated per sample than traditional assessment methods. With regards to traditional community biomonitoring, eDNA-based assessment still needs to be validated as a viable surrogate for current biomonitoring protocols, but offers several potential advantages with increased sensitivity, spatial and temporal scope, and reduced manpower requirements (Cordier et al., 2017;Pawlowski et al., 2018). Currently, eDNA based biomonitoring studies have shown good agreement between traditional and eDNA based methods (Gibson et al., 2015), but many are limited in their spatial scale, having been restricted to mesocosm experiments often with narrow taxonomic breadth, and do not link to an existing ecological assessment strategy (though see recent efforts regarding diatom eDNA biomonitoring; Vasselon et al., 2018).
In contrast to current biomonitoring data, environmental DNAderived data also has the potential to include a much wider range of taxa and indicator groups that are not currently included due to limitations of traditional taxonomic identification. This may allow for a finer scale assessment, particularly in assessing differences among neighboring sites or in evaluating moderate changes in environmental conditions (Cordier et al., 2017). While there is a need to ensure continuity between traditional and eDNA based assessment methods, there should also be a forward thinking approach to utilizing the broader eDNA data to derive comparable biometric data that encompass not only the concept of environmental filtering on individual indicator taxa, but also the interactive qualities that are also linked to the stability and ecological complexity of habitats (Cadotte and Tucker, 2017). Cooccurrence networks are one way to assess taxa interactions across deep diversity pools, and have been linked to several aspects of metacommunity dynamics that are directly relatable to biomonitoring principles (Araújo et al., 2011). From co-occurrence networks, several network properties can be used to summarize the interactive relationship between abiotic (effects of the environment on the individual indicators) and the biotic (strength of the relationship between sets of indicators). Linkage density is a surrogate measure of community complexity, whereas transitivity provides a measure of network connectivity. Modularity indicates network redundancy, and cohesion indicates the sensitivity of the community to changes in community structure (Karimi et al., 2017).
This study assesses the reliability of applying eDNA based ecological assessment metrics using matched traditional and eDNA based sampling from a national biomonitoring program. We compared biomonitoring assessment approaches for benthic macroinvertebrates and diatoms using both traditional and eDNA-derived data to determine whether eDNA can be used as a surrogate for traditional methods. We further assessed the ability of using eDNA genera groups as indicators by comparing environmental association between the 18S nuclear small subunit (nSSU) universal molecular marker and molecular markers that targets traditional biomonitoring groups (rbcL, 12S and COI). The expectation being that traditional biomonitoring groups are the preferred monitoring biological units to determine inter-site variations based on environmental differences, since biomonitoring methods were originally designed for the traditional indicator groups. Finally, we assessed the ability of co-occurrence networks of environmentally sensitive genera to infer ecological status of rivers, with the expectation that network properties associated with connectivity among genera (i.e. network vertices) should link to differences in ecological status.

Sampling
The Glastir Monitoring and Evaluation Programme (GMEP) was a Wales-wide environmental assessment program that included assessment of freshwater biodiversity (Emmett et al., 2017). The GMEP assessments were based on structured, unbiased reporting of ongoing national trends in widespread habitat, soil and landscape types across the rural and peri-urban landscape of Wales, based on a survey of 300 squares (1 × 1 km) sampled over a 4-year period between 2012 and 2016. The survey squares used for sampling were chosen by random sampling within Institute of Terrestrial Ecology (ITE) land classes to provide a good representation of widespread broad habitats and wider countryside. Headwater streams are typically found in 50-60% of the randomly selected GMEP squares. In the 75 squares sampled in 2016, 35 included rivers for which eDNA samples were collected, and eDNA successfully analyzed for 31 of the collected samples for this study (Fig. 1). The streams sampled for eDNA were also analyzed for chemical, physical and biological metrics including diatoms, macrophytes, macroinvertebrates and habitat suitability. For each sampling event (i.e. site) we collected eDNA and traditional invertebrate kick-net samples on the same day. Each site was visited on a separate day with separate sampling kits to avoid contamination among sites. Kits consisted of single replicate sampling material prepared in sterile plastic bags and have been found to be highly effective in avoiding cross contamination in a previous 300 replicate experiment (Seymour et al., 2018). Water samples for eDNA analysis (1 L) were collected, in triplicate, from each stream and filtered through enclosed 0.22 μm Sterivex filter units (EMD Millipore Corporation, Billerica, USA) using a Geopump TM Series II peristaltic pump (Geotech, Denver, USA). Filters were immediately preserved in lysis buffer (buffer ATL; Qiagen, Venlo, The Netherlands) and stored at room temperature during transit to the laboratory for further processing. Invertebrate communities were sampled using a standardized 3-minute kick sampling protocol, with a kick/hand net (500 μm mesh gauge). Both bank margins and riffle habitats were sampled during this timed sampling period. Invertebrates were preserved in absolute ethanol (99.8%; VWR International, Lutterworth, UK) after collection. On return to the laboratory, invertebrates were sorted from other collected material and identified to the lowest practical level. Benthic diatom samples were collected from submerged surface structures; cobble when available, otherwise large pebbles and small boulders were used. At least five separate surface structures with obvious diatom film were sampled per site. Structures were placed in a tray with 50 mL of stream water and brushed vigorously with a stiff toothbrush. After each structure had been scrubbed in the same tray, the suspension was transferred to sample bottles for transport back to the lab for identification.

Extraction and sequencing
DNA from filters was extracted using a modified QIAGEN DNA blood and tissue extraction protocol (Spens et al., 2017). Extracts were then cleaned for impurities using QIAGEN Power Clean kit and frozen at −20°for subsequent analyses. Four sequence libraries were created using a twostep protocol (e.g. Bista et al., 2017) with unique matching index ends (manufactured by Integrated DNA Technologies, IDT) and the following metabarcode gene regions: 1) 18S primers TAReuk454FWD1 (5′-463 CCAGCA(G/C)C(C/T) GCGGTAATTCC-3′) and TAReukREV3 (5′-464 ACTTTCGTTCTTGAT (C/T)(A/G)A-3′) (Stoeck et al., 2010); 2) COI primers m1COIintF (5′-GGWACWGGWTGAACWGTWTAYCCYCC-3′) and jgHCO2198 (5′-TAIACYTCIGGRTGICCRAARAAYCA-3′) (Leray et al., 2013); 3) rbcL rbcL646F (5′-ATGCGTTGGAGAGARCGTTTC-3) and rbcL998R (5′-GATCACCTTCTAATTTACCWACAACTG-3) (Glover, 2019); 4) 12S MiFish-U forward (5′-GTCGGTAAAACTCGTGCCAGC-3) and reverse (5′-CATAGTGGGGTATCTAATCCCAGTTTG-3) (Miya et al., 2015). Each sample library was amplified using each primer set in triplicates which were then pooled and index labeled with assistance of a Gilson pipette max liquid handler. For the 18S, rbcL and 12S barcodes we used New England Biolab's Q5 mastermix. For the COI barcodes we utilized Thermo Scientific's Ampli-gold mastermix, due to the higher number of inosine in the COI barcodes, which would not amplify when using the Q5 master mix. For the 2nd round index PCR, Q5 master mix was used for all reactions. All PCRs were run in 25 μL reactions. For 18S barcodes, PCR reactions consisted of 1.5 μL DNA template, 12.5 μL mastermix, 1 μL (10 ng/ μL)of each primer pair and 9 μL PCR grade water, and amplified using 98°C for 30 s followed by 15 cycles of 98°C for 10 s, 50°C for 30 s and 72°C for 30 s followed by a final annealing step at 72°C for 10 min. COI barcodes were generated using a reaction mix of 12.5 μL mastermix, 2 μL DNA template, 1 μL of each primer and 8 μL nuclease free water and amplified using an initial 95°C for 5 min then 25 cycles of 95°C for 30 s, 54°C for 30 s and 72°C for 60 s followed by a 72°C final annealing for 10 min. RbcL barcode reactions were setup using 12.5 μL master mix, 1.5 μL DNA template and 9 μL nuclease free water and amplified using 98°C at 30 s with 23 cycles of 98°C for 5 s, 54°C for 10 s and 72°C for 20 s followed by 72°C for 5 min. 12S barcode reactions were initiated with 12.5 μL mastermix, 2 μL DNA template, 1 μL of each primer and 9 μL nuclease free water and amplified using 98°C for 30 s, followed by 15 cycles of 98°C for 10 s, 54°C for 30 s, 72°C for 30 s with a final annealing step of 72°C for 5 min. Libraries were then shipped to the University of Birmingham's Genomic sequencing facility for quality control and sequencing. For quality control amplicons were purified using High Prep PCR magnetic beads (Auto Q Biosciences) and quantitated using a 200 pro plate reader (TECAN) using qubit dsDNA HS solution (Invitrogen). Final library amplicons were mixed in equimolar quantities (at a final concentration of 12 pmol) using a Biomek FXp liquid handling robot (Beckman Coulter). Pool molarity was confirmed using a HS D1000 tapestation screentape (Agilent) prior to 2 × 250 bp paired end sequencing on an Illumina HiSeq platform to obtain ≥100,000 reads per sample.

Bioinformatics
Bioinformatics up to taxonomic assignments were performed by the University of Birmingham. Amplicon reads were demultiplexed by the sequencing facility. Quality based trimming of the fastq reads were performed using SolexaQA++ v.3.1.7.1 (Cox et al., 2010). Paired ends were merged using Flash (Magoč and Salzberg, 2011) under the default parameters. Primer sequences were removed by TagCleaner, whereby up to 3 mismatches per primer sequence was allowed (Schmieder et al., 2010;Miya et al., 2020). Only sequences with both forward and reverse primers were retained for further analyses. Amplicon sequence variants (ASVs) were obtained using Unoise3 using default parameters (Edgar, 2016). Diatom taxonomy was assigned to representative ASVs using the Diat.barcode data based at 97% similarity threshold (Rimet et al., 2019). COI and 12S taxonomy was assigned to representative ASVs using the non-redundant nucleotide database (NCBI) at 97% similarity threshold (Alberdi et al., 2018). 18S taxonomy was assigned to representative ASVs using the SILVA v128 database at 97% similarity (Quast et al., 2013). Taxonomic assignments, and their associate ASVs that returned incomplete taxonomy or unknown identifiers were excluded from further analyses. Amplicon sequence variants were rarified to the lowest replicate level to normalize the diversity of the samples. Mean read numbers for each amplicon sequence variant (ASV) were calculated across the rarified sequences before being matched to their taxonomic identifier. Amplicon sequence variants that did not occur in at least two of the three site replicates were discarded from subsequent analyses.

Statistics
All statistical analyses were performed using R version 3.6.1(R Core Team, 2019). Whalley, Hawkes, Paisley & Trigg (WHPT) average score per taxa (ASPT) was based on WFD river invertebrate metric scores to the identified family groups from the traditional and eDNA based sampling (Water Framework Directive, 2014a). WHPT is an abundancebased method that assigns metric scores based on raw or log scale abundance groups. To adapt the traditional metric to eDNA we used proportional read numbers within each sample to assign metric scores to each group, such that 0.1, 1, 10, N10 translated to 1, 10, 100, N100 class groupings in the traditional metric protocol (Water Framework Directive, 2014a). Correlations were assessed between traditional and eDNA based ASPT numbers for each site using type II linear regressions using a major axis regression. Diatom biomonitoring scores were calculated, following existing protocols, as the sum of the percentages for all taxa (counts for traditional sampling and read numbers for eDNA) in each sample using the TDI4 taxon scores for rivers (Kelly et al., 2008). Correlations were assessed between traditional and eDNA based TDI values for each site using type II linear regressions using a major axis regression.

Networks
Co-occurrence networks were constructed in a two-step process that utilized the unique genera found across all four markers (Table S1). First, Genera sensitivity to environmental conditions across the sites was assessed against the first principle component axis of the PCA characterizing habitat variation across the sites (listed below). We used logistic regression and maximum likelihood to determine whether the presence or absence of each genera was significant (p b 0.05) against the response variable (first PCA axis characterizing environmental variation) across the sites. To account for unequal weighting due to study design sampling and unequal groupings of environmental variation across sites, an environmental weighting was used based on the density of the environmental values across the sites (King and Zeng, 2001). Additionally, each genus had to occur in at least 3 sites to be statistically tested. Second, genera co-occurrence (across all sites) networks were constructed from the genera who significantly (p b 0.05) associated with the PCA derived environmental gradient, with links between genera (edges) determined using the proportion of shared sites between each genera (vertices) as a threshold for linking (edges) genera. Due to the general nature of connectivity networks to be sensitive to threshold values, we took a holistic approach and calculated all networks for thresholds between 1% and 80% (0.5% increments; 159 resulting networks) co-occurrence, whereby connections (edges) between genera (vertices) were connected (non-directional) if they cooccurred at or above the threshold value (proportion of shared sites).
For each site we isolated the site-specific subnetwork (N = 31) for each of the co-occurrence networks created (N = 159). Subnetwork properties; including linkage density, modularity, transitivity and cohesion were calculated to compare changes in networks structure across sites for each threshold via the igraph package in R (Csardi and Nepusz, 2006). We assessed the relationship between each biomonitoring metric and network property vs the habitat quality score (HQS) and habitat modification score (HMS) using linear regression across all network co-occurrence threshold values, to assess overall relationships in subnetwork properties to water framework environmental quality measures.
Across all sites we found 1157 unique universal (18S) genera, 219 unique macroinvertebrate (COI) genera, 138 unique diatom (rbcL) genera and 9 unique fish (12S) genera. We found an average of 66.12 macroinvertebrate genera across all sites (SD = 20.33) via COI eDNA, and 22.84 genera (SD = 8.57) with traditional kick netting. We found an average of 47.93 diatom (rbcL) genera (SD = 17.95) per site using eDNA and an average of 12.5 genera (SD = 4.67) per site using traditional sampling. The universal 18S marker found 412.23 genera (SD = 94.73) on average per site using eDNA. Fish (12S) diversity was on average 2 genera (SD = 1.43) across all sites using eDNA.

Indicator genera assessment and co-occurrence networks
Indicator genera were identified from 18S, COI and rbcL assigned genera; however, no 12S assigned taxa were significantly associated with environmental variation among sites, and were excluded from further analyses. We found 214 indicator genera for 18S, 26 genera for COI and 12 for rbcL (Fig. 3). Main networks derived from the indicator genera varied in linkage across the threshold values ranging from 6 vertices at 80% co-occurrence to 249 at 0.1% (mean = 197, SD = 77). Network degree (the average connections per network vertex) ranged from 1.7 at 80% co-occurrence to 221.6 at 0.1% (mean 79.2, SD = 70.6). Across all networks, vertices (i.e. genera) with similar environmental sensitivity (Fig. 3) consistently clustered with genera of similar environmental sensitivity (Fig. 4).
Subnetwork properties (i.e. sites) varied across threshold values for modularity, transitivity, linkage density and cohesion (Fig. 5). Modularity was not significantly (p N 0.05) associated with HMS or HQA for any of the co-occurrence thresholds and remained close to 0 for most subnetworks (mean across threshold = 0.139, SD = 0.066), indicating random network associations. We found significant (p b 0.05) positive associations between transitivity and HMS for all levels between 17 and 50% as well as thresholds between 66 and 80%. Transitivity was not consistently associated with HQA but was significantly negatively associated with HQA for thresholds of 59 to 63% (Fig. 5). Transitivity ranged from 0.1 to 1 across all subnetworks and threshold with mean transitivity across the thresholds being 0.791 (SD = 0.146). Linkage density was not associated with HQA, but did correlate significantly (negatively) with HMS between thresholds of 24-66%. Linkage density ranged from 0.07 to 1 across all subnetworks and thresholds with means across the thresholds was 0.554 (SD = 0.315). Cohesion was significantly (p b 0.05) negatively associated with HMS sporadically between thresholds of 7% to 79% and was not associated with HQA for any threshold values. Cohesion ranged from 0 to 144 with a mean of 19.08 (SD = 30.28).

Discussion
We have shown that eDNA based biomonitoring reveals similar trends at a national scale to those produced via traditional methods for benthic macroinvertebrate and diatom biomonitoring metrics. We also show that 18S-derived biodiversity holds greater potential as an eDNA biomonitoring marker compared to COI or rbcL and that eDNA in general captured more biodiversity compared to traditional sampling methods. Lastly, we show that multiple levels of eDNA-based biomonitoring data can be screened for environmental sensitivity and, when combined with co-occurrence networks, distill additional community based environmental assessment that incorporates environmental filtering and indicator taxa interactions, which will undoubtedly impact future regulatory protocols.
Traditional biomonitoring metric scores such as WHPT and BMWP, from which ASPT are derived, and TDI are based on indicator taxa likelihood occurrence for sites along pollution or stressor gradients. The exact measures used for benthic macroinvertebrates are rather cryptic in the corresponding government reports, but have been distilled to relate to testing of macroinvertebrate group persistence for two sites for the UK and the broader EU (Hawkes, 1998;Water Framework Directive, 2014b). Diatom biometric scores are more transparent and are based on individual species occurrences across a set of environmental conditions using weighted averages; however the individual metric scores are not accessible in the corresponding report and need to be obtained as part of a software program called DARLEQ2 (Kelly et al., 2008). Comparisons between eDNA and traditional metrics show positive covariation between approaches for both diatoms and invertebrates, with diatom metrics calculated from eDNA data showing higher values compared to values calculated using traditional methods, whereas the macroinvertebrate eDNA metric was slightly lower than the traditional metric. Taxa assignment overlap between eDNA and traditional methods, and was 47.83% for macroinvertebrate indicator groups versus 15.57% for genera, with eDNA generally yielding higher diversity than traditional methods for each set of metrics, though not for all sites. The disparity in overlap between indicator group and genera between eDNA and traditional assignment is largely due to the mixed class identification that is utilized for traditional invertebrate biomonitoring, and is often limited to family level (Morse et al., 2007), therefore many indicator taxa are not identified to the genus level. Diatom overlap was 11% for indicator groups versus 33% for genera, which may reflect the extensive disparity of diatom taxonomy names between existing diatom studies and databases (Mann, 1999). Of note was the inability of eDNA to adequately sample macroinvertebrates from 2 sites, which led to poor eDNA assessment. As such, the poor eDNA taxonomic coverage was not noticeable until after sequencing, while a poor kick net sample can be identified during the field sampling itself. Overall, the few studies that have assessed eDNA biomonitoring, including the present study, show that eDNA based methods generally produce higher diversity, but do vary in the taxonomic groups they identify, which can be linked to disparity in the sampling methods used (Gibson et al., 2015;Elbrecht et al., 2017;Hajibabaei et al., 2019).
Taxonomic annotation for any method (e.g. traditional or eDNA) is inherently linked to the sequence database or identification key used to assign the final taxonomic designation, and is expected to fall short of fully cataloging the biodiversity of any given site. For traditional kick-net sampling, taxonomic assignment is often limited to family or genera level assignments and often omits/simplifies small or difficult to distinguish groups that are key environmental indicators, such as Rotifers, Oligochaeta or Chironomidae (Furse et al., 2009;Elbrecht et al., 2017). With regards to efficiency, standard kick-net sampling is estimated to return between 40 and 60% of the expected biodiversity for a given sample (Bradley and Ormerod, 2002;Furse et al., 2009), with an estimated 30% routinely misidentified due to human error (Haase et al., 2010). For metabarcoding based taxonomic assignment, taxonomic identification is linked to the utilized marker and the reference database. This study utilized four unique markers and three unique databased, COI and 12S utilized NCBI, whereas 18S relied on Silva and rbcL utilized the Diat.barcode database. Currently, the coverage of the existing databases are estimated at 14.6% for diatoms (rbcL) 64.5% for freshwater macroinvertebrates (COI), 87.9% for freshwater fish (12S) (Weigand et al., 2019) and 40% for prokaryotes (18S) (Louca et al., 2019). Taxonomic assignment for high throughput sequence data is expected to improve with the ongoing research efforts steadily improving sequence database catalogues, with current assignments still being highly valuable for assessing general ecological trends, including among site environmental assessment (Bista et al., 2017;Cordier et al., 2017).
Environmental filtering/selection of genera (Fig. 2) for the networks found 214 18S genera, associated with environmental variation across the sampling sites, compared to 26 COI (macroinvertebrate) and 12 rbcL (diatom) genera, with no environmental filtering present across the 12S dataset (fish). The reasons behind the disparity in observed indicator groups partially stems from the increased diversity in the 18S derived data whereby 29,948 unique genera were detected across all sites, compared to 171, 138 and 9 for COI, rbcL and 12S respectively. Additionally, the increased phylogenetic depth represented in the 18S derived data allows for increased ecological and evolutionary breadth, potentially making this a better marker for eDNA-based biomonitoring. The finding that 18S provides a better eDNA based biomonitoring marker is in-line with recent findings of other highly diverse molecular markers, including 16S, which offer a more powerful assessment of communities across environmental conditions (Pawlowski et al., 2016). Additionally, the COI marker, while widely used for eDNA studies, is not optimal for eDNA based identification likely due to the loss of reads through sequencing of non-target taxa (Collins et al., 2019).
Our co-occurrence analyses found strong associations between genera that co-occurred across sites as descriptors of environmental variation across the network. The high connectivity and degree values associated with a few key genera within the network also highlights the importance of observing not only those taxa associated with a particular environment, but the expected associations of the taxa to other indicator taxa as a means to determine whether an environment is experiencing environmental change (Germain et al., 2018). The network properties associated with community complexity and network connectivity were significantly positively related with habitat modification score (HMS), which relates to highly modified sites having subnetworks that are more centralized in the network, thereby increasing their connectivity properties. This is most evident in the transitivity scores, with the probability that the connecting genera in a network are also connected ecologically. Likewise, the greater linkage density, which is the proportion of realized vs. potential links in the subnetworks is intrinsically higher at more centralized subnetworks (Kantarci and Labatut, 2013). Both these measures highlight the importance of unmodified habitat in protecting not only unique species, but unique communities (Muotka et al., 2002). Additionally, both co-occurrence analyses highlight the potential importance of key individual indicator taxa within their respective networks via cohesion, which indicates the number of vertices that can be removed prior to the network collapsing (White and Harary, 2001). The resulting networks also provide information regarding which key indicator interactions could provide a better indication of change in environmental condition of a site, rather than only presence/absence of the indicator taxa themselves, which may occur stochastically across a range of environmental conditions (Beauchard et al., 2017).
Overall, this study shows the application of eDNA metabarcoding as a viable surrogate for traditional biomonitoring methods and also shows the ability to incorporate a wide range of taxa that have been overlooked using traditional sampling and biomonitoring protocols. We emphasize the need for future studies to evaluate communities for co-occurrence to ensure that false positives from random associations between indicators and local random/stochastic factors are not associated with co-occurring indicator groups. This approach should provide a more sensitive assessment of community response to environmental variation.
Supplementary data to this article can be found online at https://doi. org/10.1016/j.scitotenv.2020.138801.   4. Subset of the main co-occurrence networks constructed. Each node is a unique genus and the color represents the mean gradient value for the node (genera) across the study sites (yellow = low, blue = high). Here we show four of the constructed networks (N = 157) for A) 10% co-occurrence B) 30% C) 50% D) 70%. Fig. 5. Sensitivity analyses of the co-occurrence network threshold (x-axis) and their associated subnetworks (i.e. sampling site) properties (top: modularity, 2nd row: transitivity, 3rd row: linkage density, bottom: cohesion) tested against HQA (first column) and HMS (second column). The red line indicates the 0.05 significance value of the x-axis (p-value for the respective test). Points on or below the line signify a significant association between the network property and the ecological assessment metric (HQA or HMS).

Declaration of competing interest
None.