Marine water environmental DNA metabarcoding provides a comprehensive fish diversity assessment and reveals spatial patterns in a large oceanic area

Abstract Current methods for monitoring marine fish (including bony fishes and elasmobranchs) diversity mostly rely on trawling surveys, which are invasive, costly, and time‐consuming. Moreover, these methods are selective, targeting a subset of species at the time, and can be inaccessible to certain areas. Here, we used environmental DNA (eDNA), the DNA present in the water column as part of shed cells, tissues, or mucus, to provide comprehensive information about fish diversity in a large marine area. Further, eDNA results were compared to the fish diversity obtained in pelagic trawls. A total of 44 5 L‐water samples were collected onboard a wide‐scale oceanographic survey covering about 120,000 square kilometers in Northeast Atlantic Ocean. A short region of the 12S rRNA gene was amplified and sequenced through metabarcoding generating almost 3.5 million quality‐filtered reads. Trawl and eDNA samples resulted in the same most abundant species (European anchovy, European pilchard, Atlantic mackerel, and blue whiting), but eDNA metabarcoding resulted in more detected bony fish and elasmobranch species (116) than trawling (16). Although an overall correlation between fishes biomass and number of reads was observed, some species deviated from the common trend, which could be explained by inherent biases of each of the methods. Species distribution patterns inferred from eDNA metabarcoding data coincided with current ecological knowledge of the species, suggesting that eDNA has the potential to draw sound ecological conclusions that can contribute to fish surveillance programs. Our results support eDNA metabarcoding for broad‐scale marine fish diversity monitoring in the context of Directives such as the Common Fisheries Policy or the Marine Strategy Framework Directive.


| INTRODUC TI ON
Monitoring of marine biodiversity provides a baseline for policy implementation toward a sustainable use of the marine environment and its resources. Among the traditional methods for surveying marine fauna, trawling has been widely used, as identification and quantification of large volumes of organisms are considered a reliable method for monitoring bony fishes and elasmobranchs (hereafter fishes) and other marine animal populations (ICES, 2015;Massé, Uriarte, Angélico, & Carrera, 2018). Fish surveys using trawls are conditioned by the gear's own characteristics (e.g., mesh size, area of opening) and deployment parameters (e.g., towing speed, depth, and diel variation) (Heino et al., 2011). Consequently, besides being invasive and time-consuming, fish trawling in pelagic environments can be largely selective affecting diversity estimates and knowledge of species composition (Fraser, Greenstreet, & Piet, 2007;ICES, 2004).
For instance, due to their large body size, fast swimming speed, and in some cases, scarcity, many elasmobranch species are not thoroughly surveyed (Rago, 2004). Therefore, alternative methods are needed, and advances in DNA sequencing and bioinformatics have opened new avenues to assess marine biodiversity in a noninvasive manner (Danovaro et al., 2016;Rees, Maddison, Middleditch, Patmore, & Gough, 2014).
In particular, the analysis of environmental DNA (eDNA), that is, the genetic material shed and excreted by organisms to the environment, to characterize the biological communities present in an environment (Taberlet, Coissac, Hajibabaei, & Rieseberg, 2012) is gaining increasing attention for monitoring aquatic environments (Thomsen & Willerslev, 2015). Community composition can be inferred from eDNA samples through metabarcoding, whereby the eDNA is collected from the water column through filtering, selectively amplified through PCR using primers targeting a given barcode from a particular taxonomic group and sequenced (Taberlet, Coissac, Pompanon, Brochmann, & Willerslev, 2012). The resulting sequences are then compared against a reference database to perform biodiversity inventories (Deiner, Bik, & Mächler, 2017). Besides the inherent biases of metabarcoding (Aylagas, Borja, Irigoien, & Rodríguez-Ezpeleta, 2016), the use of eDNA adds additional biases due to the complex ecology of this molecule (Barnes & Turner, 2016) that might interfere with its potential use for biodiversity assessment. Thus, additional research is required to better understand the utility of eDNA for fish monitoring. Most studies using eDNA metabarcoding for monitoring fish communities are based on freshwater environments and have shown that eDNA metabarcoding provides overall estimates that are equivalent or superior to traditional methods such as visual surveys, trawling, or electrofishing (Hänfling et al., 2016;Minamoto, Yamanaka, Takahara, Honjo, & Zi, 2012;Pont et al., 2018).
As opposed to freshwater systems, the marine environment has in general a larger water volume to fish biomass ratio and is influenced by currents, implying that the eDNA is less concentrated and disperses quicker (Hansen, Bekkevold, Clausen, & Nielsen, 2018).
This, coupled with a higher sympatric marine fish diversity, suggests that monitoring fish diversity through eDNA sampling could be particularly challenging in the marine environment. Indeed, only a handful of studies have applied eDNA metabarcoding for monitoring fish in natural marine environments (e.g., O'Donnell et al., 2017;Stat et al., 2017). Among them, only a few have compared eDNA and other traditional surveying methods and are based on a very small area of a few square kilometers either in ports (Jeunen et al., 2019;Sigsgaard et al., 2017;Thomsen et al., 2012) or in coastal areas (Andruszkiewicz et al., 2017;DiBattista et al., 2017;Yamamoto et al., 2017) or have performed comparisons at family level taxonomic assignments . Thus, although these studies envision eDNA metabarcoding as a promising method for noninvasive, faster, more efficient, and reliable marine surveys, this needs still to be tested in the context of a fishery survey covering a broad marine area.
The Bay of Biscay is a biogeographical area in the North Atlantic Region covering more than 220,000 km 2 , at which the main eco-  (ICES, 2018). Fish diversity in the Bay of Biscay has been accounted using mainly observational methods, fish trawling, and acoustic surveys; thus, there is scope for incorporating and assessing the performance of eDNA-based surveys. This paper aims to test the potential of eDNA metabarcoding to assess the fish community composition in a large marine area, such as the Bay of Biscay. For that aim, we have compared eDNA metabarcoding-based biodiversity estimates with those derived from fishing trawls catches and have related eDNA metabarcoding-based estimates with the known spatial distribution and ecological patterns of the species in the area.

| Sample collection
Fish and elasmobranchs catches and water samples were collected during the BIOMAN 2017 survey (Santos, Ibaibarriaga, Louzao, Korta, & Uriarte, 2018) between May 5 and May 29, 2017, covering the area of about 120,000 km 2 between the French continental shelf and the Spanish shelf ( Figure 1) on board the Emma Bardán and Ramón Margalef research vessels. Fish catches were obtained on board the R/V Emma Bardán pelagic trawler. The trawl had an 8 mm mesh size cod end, and towing time and speed were 40 min and 4 knots, respectively. A total of 44 stations were used for trawling. Although station depths varied between 26 and 3,000 m, the maximum fishing depth was 156 m. Onboard, fish were morphologically identified to species level or, when doubt, to the smallest taxonomic rank (e.g., family or genus). Biomass estimates were standardized as Kg caught per taxa and per station. In 44 additional stations (Figure 1), water samples were collected on board the R/V Ramón Margalef research vessel using the continuous circuit intake of the ship at 4.4 m depth, transferred to 5-L plastic bottles and filtered through Sterivex 0.45 µm pore size enclosed filters (Millipore) with a peristaltic pump, using a 6 μm mesh size net in the incoming tube to avoid clogging. All material used for filtering, including tubes, net, and bottles were decontaminated by rinsing them once with 10% bleach solution, three times with Milli-Q water and three times with the sampling water to be filtered. Filters were kept at −20°C until further processing.

| DNA extraction and amplicon library preparation
DNA extractions were performed in a dedicated pre-PCR laboratory using the DNeasy ® blood and tissue kit (Q iagen) following the modified protocol for DNA extraction from Sterivex filters without preservation buffer by Spens, Evans, and Halfmaerten (2017). DNA concentration was measured with the Quant-iT dsDNA HS assay kit using a Qubit ® 2.0 Fluorometer (Life Technologies, California, USA). DNA from all 44 samples was amplified with the teleo_F/telo_R primer pair (hereafter "teleo"), targeting a region (~60 bp) of the mitochondrial 12S rRNA gene, combined with the human blocking primer teleo_blk (Valentini et al., 2016). PCR mixtures were prepared under the hood in the pre-PCR laboratory using dedicated micropipettes and disposable plastic ware that were previously decontaminated under the UV light, and all postamplification steps were carried out in the post-PCR primer (final concentration of 0.2 µM), 4 µl of teleo_blk (final concentration of 2 µM), 3.2 µl of Milli-Q water, and 2 µl of 10 ng/µl template DNA. Samples from 4 stations were also amplified (a) using the same procedure but without the blocking primer, and (b) using the mlCOI-intF/dgHCO2198 primer pair (hereafter "mlCOI"), targeting a region (~310 bp) of the COI gene (Leray et al., 2013;Meyer, 2003). The thermocycling profile for PCR amplification included 3 min at 98°C; 40 or 35 cycles (for "teleo" and "mlCOI" as indicated in Valentini et al. (2016) and Leray et al. (2013), respectively) of 10 s at 98°C, 30 s at 55, or 46°C (for "teleo" and "mlCOI," respectively) and 45 s at 72°C, and finally, 10 min at 72°C. Replicate PCR products were combined and purified

| Reference database
Two reference databases were created for the "teleo" barcode. A only those covering the "teleo" region were selected and aligned.
The tree was visually inspected, and the records corresponding to misplaced species were removed from the database. This "local" reference database contains "teleo" region sequences of 612 species.
For the "mlCOI" barcode, the reference database consisted in the COI sequences and their corresponding taxonomy obtained from the BOLD (Ratnasingham & Hebert, 2007) database.

| Read preprocessing, clustering, and taxonomic assignment
Overall quality of raw demultiplexed reads was verified with FASTQC (Andrews, 2010 cutadapt (Martin, 2011) allowing a maximum error rate of 20%, discarding read pairs that do not contain the two primer sequences and retaining only those reads longer than 30 nucleotides. Paired reads were merged using pear (Zhang, Kobert, Flouri, & Stamatakis, 2014) with a minimum overlap of 20 nucleotides. Pairs with average quality lower than 25 Phred score were removed using Trimmomatic (Bolger, Lohse, & Usadel, 2014). mothur (Schloss et al., 2009) was used to remove reads (a) not covering the target region, (b) shorter than 40 or 313 nucleotides, for "teleo" and "mlCOI," respectively,

| Biodiversity analyses
Analyses were performed in R v3.6.1 with the packages Phyloseq

| Data quality and overall taxonomic composition
We obtained a total of 4,640,913 raw "teleo" reads from which 3,366,264 (72%) were retained after quality check for downstream analyses. The average number of "teleo" reads per sample F I G U R E 3 (a) Relative read abundance (%) of taxa classified to Subphylum, and (b) specifically classes within Chordata and families within Actinopterygii, respectively, from the four samples sequenced with the "mlCOI" primers

(b)
As for the four samples amplified with "mlCOI" primers, we obtained 389,665 raw reads from which, 324,731 (83%) were retained for downstream analyses. The average number of "mlCOI" reads per sample retained after quality filtering is 81,183 (Table 3). Using the BOLD database, 89.86% of the reads were classified into Phylum, 80.87% of which were metazoans, and among them 47.88% were classified as arthropods and 2.51% as chordates (Figure 3). Within chordates, 74.56% of the reads were classified as Actinopterygii (1.87% of the overall reads), resulting in only seven taxa classified into species (Figure 3).

| Comparison with fish trawling
Trawling operations during the BIOMAN survey resulted in a total of 18 taxa caught, from which lanternfishes (Fam. Myctophidae) and mullets (Mugil sp.) were the only ones not classified into species level. Qualitatively, a total of 10 species were identified both from the eDNA and trawling catches ( Figure 4a)  There was an overall correlation between fish biomass and number of reads per species although not significantly different from 0 at p < .05 (Figure 4b). E. encrasicolus was the most abundant species for both methods, while the relative abundance for some species like Dicentrarchus labrax, M. poutassou, and S. pilchardus was higher when using eDNA. In contrast, the relative abundance of M. merluccius, S. scombrus, and Trachurus spp. was higher in catches than when using eDNA ( Figure 4b; Table 4). At a local scale, no significant correlation between eDNA and trawling-based abundances was found (Mantel test, r = −0.04 p = .646). In fact, eDNA data showed a more constant abundance of the three most abundant species (E. encrasicolus, S. pilchardus, and S. scombrus), compared to trawl data, which showed in general a higher number of species per station, except for those eight stations were E. encrasicolus was dominant (>94% of the catch) ( Figure 5).

| Species distribution patterns
We found that correlation between compositional dissimilarities and geographic distances between stations was weak for both eDNA (R 2 = .38 p < .01) and trawling stations (R 2 = .20 p < .01). In both cases, pairs of stations that are less than about 100 nautical miles apart cover the full range of Bray-Curtis distances (Figure 6), whereas more distant stations differ more in taxonomic composition. This is particularly evident for eDNA samples, for which pairs of stations that are more than 200 nautical miles apart are available. Comparisons between samples within same or distinct depth  (shallow, medium, deep) or within same or distinct sampling methods (eDNA, trawling) had no effect over the observed patterns ( Figure 7).
The overall compositional pattern of our data showed significant differences between species occurrence and sampling sites according to their zone (e.g., shallow, medium, and deep stations) (PERMANOVA F 2,43 = 2.24, p < .05) (Figure 8)

| D ISCUSS I ON
This study shows how eDNA metabarcoding provides a comprehensive overview of the fish diversity in a large-scale marine area. Compared to fish trawling, eDNA metabarcoding was able to "capture" a larger number of fish species. Both, eDNA and trawling-based estimates (in number of reads and biomass, respectively) indicate that E. encrasicolus represents half of the abundance, which is consistent to the known large and stable anchovy population in the Bay of Biscay (Erauskin-Extramiana et al., 2019;Santos, Uriarte, Boyra, & Ibaibarriaga, 2018;Uriarte, Prouzet, & Villamor, 1996)  The following three species were caught during fish trawling but were absent from eDNA data despite being present in the reference database, Z. faber, S. sprattus, and C. aper. One possible explanation for this false-negative detection could be the little abundance of this species' DNA in the water, as suggested by the small and reduced number of catches (2.07 Kg in 3 sites, 1.07 kg in 2 sites, and 0.34 Kg in 2 sites, respectively). In fact, a small number of reads, that is, 591, was also detected for Solea solea, a species from which 0.05 kg were caught in a single station. If this is the case, filtering larger volumes of water and increasing sequencing depth could improve detection. Alternatively, reference sequences for Z. faber, S. sprattus, and C. aper could be undetected errors in the reference database (Li et al., 2018) or correspond to alternative intraspecific variants.
On the other hand, in accordance with previous studies, eDNA data resulted in about 100 more species (35 with more than 10 reads) than trawling data collected simultaneously (Thomsen et al., 2012Yamamoto et al., 2017). For example, species such as D. sargus, P. acarne, M. mola, D. labrax, L. piscatorius, Chelon ramada, A. dubius, G. niger, Ctenolabrus ruperstris, A. silus, S. microcephalus, and B. affinis were not found in catches, but were more abundant in eDNA reads than the 5th most abundant species (M. merluccius) in catches. The fact that eDNA results in a higher number of species could be partially attributed to the efficiency of the method to detect benthic or coastal species, difficult to catch by pelagic trawling nets, focused on small and medium-size pelagic species. To check to what extent eDNA is able to detect in surface waters (4 m) demersal species, we compared the results with the ICES International Bottom F I G U R E 8 Nonmetric multidimensional scaling (NMDS) plot, with a stress of 0.15, showing the similarity of species from each sample based on their relative abundance. The ellipse shows the 95% distance based on the centroid of the three sampling zones groups (shallow, medium, and deep stations). Spatial patterns of the species with >1,000 reads are shown   stations suggests that stations further apart tend to be more different. The amount, quality, and stability of DNA molecules are largely affected by the production rate from each organism, diffusion of the molecules in the water, and its inherent degradation (Barnes & Turner, 2016;Collins et al., 2018;Murakami et al., 2019;Thomsen et al., 2012). But also, PCR amplification stochasticity and sequencing depth are known to affect the number of reads obtained from an eDNA sample Zinger et al., 2019). previous studies, our data support eDNA as a potential mechanism for detecting and studying the distribution of elusive and deep-water species, which normally go undetected in fish trawl surveys, for example, elasmobranchs . In any case, eDNA results also revealed an ecological pattern for elasmobranchs, for instance R. undulata, which has a high-site fidelity occurred only in shallow waters (ICES, 2014), while large sharks as S. microcephalus, P. glauca and Lamna nasus predominantly occurred in deeper sites.
Interestingly, these differences were observed even when collecting water from the surface. we found incorrectly annotated sequences at all taxonomic levels.
As environmental samples contain highly complex DNA signal from various organisms, primer choice is critical for species-level identification (Collins et al., 2019). We found that for our samples, the eukaryote universal COI primers result in a very small proportion of reads assigned to Actinopterygii. This is due to the fact that the primers target a large number of taxonomic groups, so larger coverage is needed for producing robust data (Alberdi, Aizpurua, Gilbert, Bohmann, & Mahon, 2018;Corse et al., 2019;Gunther, Knebelsberger, Neumann, Laakmann, & Martinez Arbizu, 2018;Stat et al., 2017). The use of more specific primers in our study allowed the specific detection of both Actinopterygii and Elasmobranchii. (Kelly, Port, Yamahara, & Crowder, 2014;Miya et al., 2015). Yet the amount of reads attributed to Elasmobranchii is small as "teleo" primers were not specifically designed for this taxa, for example, Kelly et al. (2014), and recent developments on elasmobranch-specific primers (Miya et al., 2015) could potentially be a powerful tool to increase the elasmobranch diversity in future marine surveys. In addition, for closely related species such as Alosa alosa and Alosa fallax, the target barcode was exactly the same, so being cautious we consider them as Alosa spp. Another crucial methodological F I G U R E 1 0 Venn diagrams showing fish caught in the ICES Bottom Trawling Survey carried out (a) between 2003 and 2019 and (b) in October 2018 available from ices.dk/marine-data/data-portals/ and (c) in the 2017 Pélagiques Gascogne (PELGAS) integrated survey compared to the fish species detected through eDNA metabarcoding step is the clustering method. We showed that using a clustering method (i.e., vsearch and swarm) decreased the number of identified species, probably because the algorithm merged similar sequences from different species into singular OTUs. Recent studies have suggested that clustering techniques and the use of percentages of similarities specially in short (<100 bp) sequences might mislead diversity estimates (Calderón-Sanou, Münkemüller, Boyer, Zinger, & Thuiller, 2019;Callahan et al., 2017;Xiong & Zhan, 2018). Thus, procuring a taxonomically comprehensive database with good quality sequences and accurate data curation steps is crucial for producing robust and reproducible ecological conclusions from eDNA metabarcoding methods (Collins et al., 2019;Weigand et al., 2019). Including a human-specific blocking primer in our samples had little effect, as we indeed detect, although a small percentage (<0.01%), reads identified as H. sapiens. The use of blocking primers in metabarcoding analysis has been previously used to block dominant taxa in a specific samples, for instance host DNA from diet analysis (Jakubavičiūtė, Bergström, Eklöf, Haenel, & Bourlat, 2017), or human DNA from ancient samples (Boessenkool et al., 2012). Our results suggest that our samples held very little contamination from external sources such as human manipulation, air, or input from land.
Alternative ways to survey marine biodiversity and unbiased evaluations of the ecosystem components are needed as these provide the baseline for policy implementation in the context of global marine directives (e.g., Common Fisheries Policy or the Marine Strategy Framework Directive). eDNA metabarcoding is becoming a more accessible method that generates reliable information for ecosystem surveillance and invites its application on regular marine monitoring programs (Bohmann et al., 2014;Lacoursière-Roussel, Rosabal, & Bernatchez, 2016;Takahara, Minamoto, Yamanaka, Doi, & Zi, 2012). However, there is still discussion on whether eDNA-based approaches can be used to manage fisheries, and there is a demand of continuous research to build confidence in eDNA-based results as evidence (Jerde, 2019). This study has shown that eDNA samples provide information on fish diversity in a broad-scale marine area such as the Bay of Biscay, detecting almost ten times more fish species compared with pelagic trawling, including some considered elusive or difficult to capture with traditional fishing methods. These results show that, despite its inherent uncertainties, eDNA metabarcoding has the potential to become a routine technique for fisheries management as it can provide information on fish diversity and distribution in large oceanic areas, including less accessible locations and targeting rare and elusive species, in a cost-effective and noninvasive manner. This is particularly relevant in a context of global change, where establishing efficient management actions based on numerous, continuous, and accurate biodiversity assessments is paramount.

ACK N OWLED G M ENTS
Authors are grateful to the crew of R/V Ramon Margalef and R/V Emma Bardán for their support during filtering and collection of sam- Innovation and Universities (project CTM2017-89500-R). This is contribution number 976 from the Marine Research Division (AZTI).

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Raw sequencing reads are available at the NCBI SRA under Biosample accession numbers SAMN13489000-SAMN13489051.