Estimating sampling effort for early detection of non-indigenous benthic species in the Toledo Harbor Region of Lake Erie

Toledo Harbor (Maumee River and Maumee Bay) is a “port of concern” for introduction of non-indigenous species into the Great Lakes due to the large amounts of ballast water from outside the Great Lakes discharged at the port, the amenable habitat for many potential invasives, and the large amount of ballast water transported from Toledo to other Great Lakes ports, making Toledo a potential source of invasives throughout the entire region. To estimate sampling intensity needed to detect rare or new non-indigenous species, 27 benthic grab samples from 13 locations near Toledo Harbor were collected during autumn, 2010. Benthic organisms were identified, and sampling intensity needed to detect rare or new non-indigenous species was evaluated via a Chao asymptotic richness estimator. Morphological taxonomic criteria and cytochrome oxidase I (COI) sequence barcodes identified 29 different taxa (20 to species level) in the samples, including six non-indigenous taxa (Branchiura sowerbyi, Bithynia tentaculata, Corbicula fluminea, Dreissena polymorpha, Dreissena bugensis, Lipiniella sp.). While all the non-indigenous species had previously been reported in Lake Erie or nearby Ohio waters, several North American species are not previously listed in Ohio. Richness estimates indicate that >75% of the benthic species in the area were encountered and that 90% of the species could be detected with less than a doubling of collecting effort. Since sampling for this study occurred only in the autumn and detectable life stages of benthic organisms may vary seasonally, additional species may be observed with more extensive sampling over a broader seasonal range.


Introduction
Invasions of non-indigenous species (NIS) are among the most important problems facing the Great Lakes. Beginning in the 1800s, the introduction of NIS into North America has had overwhelmingly negative impacts on human health, ecosystems, and economic activities including social, cultural, recreational and industrial use of Great Lakes waters, tributaries, harbors, and coastal regions. The St. Lawrence Seaway accelerated these introductions by providing direct migration routes from the oceans and mediating the entry of foreign ships that discharge large amounts of ballast water into ports in the Great Lakes. As a result, NIS are among the most significant threats to Great Lakes ecosystems.
Non-indigenous species have entered the Great Lakes through ballast water (over half of all damaging introductions), aquaculture-associated introductions (e.g., Asian carp, which have arrived at the "doorstep" of the Great Lakes near Chicago and the headwaters of the Wabash River), and trade in live organisms. Prior to the application of stricter permitting and regulations to ballast water management in 2006, a new NIS was discovered in the Great Lakes on average every 28 weeks (Ricciardi 2006). Estimated annual costs in the United States associated with aquatic NIS are >$5 billion due to fish, $1 billion for dreissenid mussels and Asiatic clams, $100 million for aquatic plants, and $40 million for green crabs (Pimentel 2005). The costs of allowing such trends to continue are potentially enormous, so it is wise to invest in preventing NIS introductions and limiting their spread through early detection programs, which are crucial to managers and to early response programs.
Economic analyses of prevention, detection, and control costs have indicated that detecting NIS early in an invasion may decrease the ultimate cost of subsequent control measures (Mehta et al. 2007). For example, early estimates of costs associated with zebra mussels as high as $4 billion per year (Morton 1997) have been reduced substantially (Ram and Palazzolo 2008). Some of the reductions in costs have been due to widespread but locally managed use of dreissenid detection strategies, which enable managers to anticipate the arrival of mussels and to avoid unnecessary treatment when mussels are not present (Connelly et al. 2007). Understanding the benefits of early detection of NIS has led to studying strategies to detect NIS in the Great Lakes.
Toledo Harbor (Maumee River and Maumee Bay) has been characterized by the United States Environmental Protection Agency (EPA) as "the port of greatest concern…" for new ballast water mediated introductions throughout the Great Lakes due to the large amount of ballast water discharged there from outside the Great Lakes and its highly suitable habitat for many potential NIS (US Environmental Protection Agency 2008). Although this conclusion by the EPA was based on the assumption that data analyzed for 2006-2007 were representative of relative ballast discharge patterns over several years, a recent doubling of the size of the Toledo Harbor seaport (Toledo Lucas County Port Authority 2014) probably means that, if anything, the risk of introductions has grown further. Another consideration is that the low flood plain between the Wabash River (Mississippi River watershed) and the Maumee River, on which the port of Toledo is located, makes the Maumee River a potential entry point for NIS from the Mississippi watershed during high water events (Hebert 2010). Also, the nearby large population centers of Toledo, Cleveland, and Detroit increase the risk of introductions from the trade in live organisms, including bait.
Previous studies by EPA's Office of Research and Development (EPA-ORD) in Duluth-Superior Harbor (DSH) have shown that intensive survey methods and careful taxonomic analysis are effective for discovering previously undetected NIS (Trebitz et al. 2009; Trebitz et al. 2010). DNA analysis methods were used when morphological characters proved to be inadequate (Grigorovich et al. 2008). These EPA-ORD studies identified 19 species of non-indigenous benthic invertebrates, including 8 that had not previously been detected in DSH (Trebitz et al. 2010). The present study applies similar methods to those used by EPA-ORD in DSH, complemented by a more intensive application of molecular identification methods, to predict the sampling intensity that may be required for efficient detection of new NIS in Toledo Harbor.

Methods and materials
Sampling sites ranged from riverine (i.e., in the Maumee River itself) to open bay (beyond the mouth of the Maumee River). Figure 1 shows the location of the 13 collecting sites at which benthic samples were collected on one to three of the following dates (as detailed in the results) during early autumn, 2010: September 24, October 4, and/or October 5, 2010. Sediments were collected with a bottom dredge (Ben Meadows, 25 lb, bottom dredge; cat. # 125006) with an effective sampling area of 213 cm², which is about 10% smaller than the petite ponar grab sampler (area 236 cm²) used by Trebitz et al. (2009). Depths (range 0.6 m to 3.6 m), GPS coordinates, and vegetative cover were recorded for each collected sample.
Sediments were sieved in the field with a 500 μm screen (Cole-Parmer, brass, #35, cat. no. YO-59990-09) and preserved in 90% ethanol on ice for subsequent laboratory analysis. After resieving on a 500 μm sieve in the lab, samples were stored in 90% ethanol at 4 °C until sorting or other processing. Samples were searched visually and under the dissecting microscope for organisms, using a quick scan approach in which benthic samples are processed by visually scanning for as-yet unseen species rather than enumerating all, taking care that the smallest organisms possible, down to 500 μm, were not missed. Representative unique organisms, including those that were represented by as few as a single specimen and potentially identifiable molluscan shells, were selected from each sample. Voucher samples have been retained in 90% ethanol at 4 °C.
Specimens used as positive controls. Previously collected or laboratory grown organisms that were preserved in ethanol and for which the taxonomic identification is unambiguous were used as positive controls for methods development and quality assurance tests. Such specimens included adult zebra mussels and quagga mussels, Daphnia spp. obtained from Dr. Donna Kashian and Dr. Christopher Steiner (Dept. of Biological Sciences, Wayne State University), and specimens for which taxonomic identification is assured by biological supply companies (e.g., Lumbriculus variegatus from Carolina Biological Supply).
Taxonomic analysis. Gross-level identification and tabulation of easily recognized taxa (e.g., Dreissenidae, Amphipoda, Oligochaeta, Diptera, other) were performed during a quick visual scan, sorting, and selection step. The selected representative organisms from each sample and several positive control organisms (blinded; i.e., not identified as already known) were individually photographed and shipped one to a vial in 90% ethanol to EcoAnalysts, Inc., a professional taxonomic services company, for identification according to classical morphological criteria. Organisms identified by EcoAnalysts were returned to the Ram laboratory either in ethanol in their original vial, or, in the case of oligochaetes, permanently slide-mounted.
DNA barcoding by the Canadian Centre for DNA Barcoding (CCDB). Small tissue samples (about 2 mm in diameter each) from organisms were submitted in 90% ethanol in 96-well plates, according to Standard Operating Procedures required by CCDB. These organisms included tissue from specimens identified by EcoAnalysts, additional oligochaete specimens (since the slide-mounted oligochaetes could not be used), and various positive controls and other specimens, as detailed further in the results. All organisms from which the tissue samples were taken were photographed. Upon receipt of the preserved tissues, CCDB extracted DNA and analyzed sequences for the mitochondrial cytochrome oxidase I (COI) "barcode" region of each sample. Control experiments tested that CCDB obtained identical barcode sequences for DNA extracted by the Ram laboratory (DNA extracted and purified using the DNeasy Blood and Tissue Kit, Qiagen cat. no. 69506, Valencia, CA) and submitted in addition to blinded tissue samples from the same organisms. Sequences for selected specimens are given in the supplement (Appendix S1).
Taxa accumulation analysis. Taxa accumulation curves (i.e., a curve showing how many additional species types are identified with increasing numbers of samples assessed) were plotted to provide a means of assessing the likelihood that all possible species in the sampled habitats had been encountered. If an accumulation of taxa plotted against the number of samples yields an ascending curve without reaching an asymptote, then it is highly probable that additional taxa remain to be found. The species incidence data (i.e., the number of sediment samples containing particular species or other taxonomic classification) were then analyzed by the Chao asymptotic richness estimator (Chao et al. 2009; Colwell 2009) to estimate the total number of species likely to be present in the sampled habitat.
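To illustrate the inputs to this analysis, the following minimal Python sketch builds a sample-based accumulation curve (averaged over random sample orders) and tallies the incidence counts used by richness estimators: the number of observed taxa, the number found in exactly one sample, and the number found in exactly two. The sample contents, taxon labels, and function names are hypothetical and are not the Toledo Harbor data; averaging over random orderings is an assumed convention, since the study does not state how sample order was handled.

```python
import random

def accumulation_curve(samples, n_perm=200, seed=0):
    """Mean cumulative number of taxa versus number of samples,
    averaged over n_perm random orderings of the samples."""
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_perm):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample              # add taxa found in this sample
            totals[i] += len(seen)
    return [total / n_perm for total in totals]

def incidence_counts(samples):
    """Return (S_obs, Q1, Q2): taxa observed, taxa occurring in exactly
    one sample (uniques), and taxa occurring in exactly two (duplicates)."""
    freq = {}
    for sample in samples:
        for taxon in sample:
            freq[taxon] = freq.get(taxon, 0) + 1
    s_obs = len(freq)
    q1 = sum(1 for count in freq.values() if count == 1)
    q2 = sum(1 for count in freq.values() if count == 2)
    return s_obs, q1, q2

# Hypothetical presence data: one set of taxon labels per grab sample.
toy_samples = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "D"},
    {"B", "E"},
    {"A", "E", "F"},
]

curve = accumulation_curve(toy_samples)
counts = incidence_counts(toy_samples)
print(curve)
print(counts)
```

A curve that is still climbing at the last sample, together with a large number of uniques relative to duplicates, is the pattern that suggests undetected taxa remain.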

Taxonomic analysis
The 27 benthic samples varied in numbers of organismal types identified by the quick scan method from as few as one unique organism per sample to as many as seven. EcoAnalysts identified 25 different taxa from the 142 animals or shells sent to them (Table 1). Photographs (one view only) of each of the 25 different types of organisms identified by EcoAnalysts are shown in the supplement in Figure S1. Of the 25 different organism types, 19 were identified to species, and the others were identified to the genus or family. Sixteen of the organism types are molluscs; five are annelids; and four are arthropods. Among the annelids, 17 oligochaetes were identified as Limnodrilus hoffmeisteri, two as Limnodrilus udekemianus, one as Branchiura sowerbyi, and five as unidentifiable fragments. Other annelids were leeches, identified as Helobdella elongata (three specimens) and Helobdella stagnalis (two specimens). In the 27 samples collected, seven of the 25 species were encountered in only one sample while four species were encountered in only two samples. Several Daphnia pulex/pulicaria sent to EcoAnalysts as blind positive controls were correctly identified but one sample was said to be Daphnia catawba.
Note to Table 1: Location-date format: The letter refers to sites on the map in Figure 1. The number refers to one of three collection dates in 2010: 1, September 26; 2, October 4; or 3, October 5. Identifications are according to EcoAnalysts, Inc. Authority and year of each taxon are from http://zipcodezoo.com and cross-checked on http://www.marinespecies.org/. Synonyms and different opinions about the valid name are indicated as "also known as".

Barcode molecular analysis by CCDB
Out of 105 samples sent to CCDB (seven positive controls and 98 "unknowns"), CCDB obtained quality COI sequences from all seven positive controls and from 81 of the unknowns.
For the positive controls, CCDB obtained 100% matches to the correct organism for purified DNA submitted as blind samples from Dreissena polymorpha and Dreissena bugensis. Purified DNA from a portion of two different chironomids and the rest of each organism submitted as separate blind samples gave identical DNA sequences with respect to which organism the DNA was from. DNA extracted from a leech was correctly identified as being from the genus Helobdella despite a >15% divergence of the sequence from previously known leech sequences.
Among the 98 unknowns, 37 were from specimens that had also been analyzed by EcoAnalysts. Of these, six had matches (identical in >97% of the sequence) in the Genbank or CCDB reference DNA databases at the genus or species level: Lipiniella sp. (99.7%), two specimens of Bithynia tentaculata (both 99.7%), two specimens of Dreissena (99.7% and 100% match to D. polymorpha), and Hexagenia limbata (99.4%). Newly identified barcodes (i.e., organisms identified to species by EcoAnalysts for which no previous COI barcode had been identified; see supplement Appendix S1 sequences 1 and 2) include Pisidium compressum (five specimens) and Musculium transversum (one specimen). Among specimens that had not been analyzed by EcoAnalysts, species sequence matches in the reference databases were obtained for Branchiura sowerbyi (two specimens, 100% match), Chironomus cf. decorus (99.7%), Helobdella elongata (97.9%), and Corbicula fluminea (100%). Seven organisms identified by EcoAnalysts as Coelotanypus sp. had identical barcode sequences (Appendix S1 sequence 3) and have 100% matches to reference sequences from chironomids. Unfortunately, none of these sequences have been identified by CCDB at a level of genus or species. The nearest species matches in the reference databases differ from these sequences at more than 10% of their bases. The same sequence was also obtained for 3 other chironomids that were submitted to CCDB without prior classification by EcoAnalysts.
Similarly, 14 specimens identified by EcoAnalysts as Chironomus sp. had nearly identical sequences to each other (no differences within the group of more than 1%), matched 100% to sequences in the reference databases that were identified to family as Chironomidae, and differed from all previously identified genus or species barcodes by greater than 10%. An additional three chironomid specimens that had not been classified by EcoAnalysts also had sequences identical to this group. A representative sequence for these Chironomus sp. specimens is given in Appendix S1 (sequence 4).
EcoAnalysts identified three species of oligochaetes: Limnodrilus hoffmeisteri, Limnodrilus udekemianus, and Branchiura sowerbyi. Due to the difficulty of extracting DNA from the mounted specimens, similar but unclassified oligochaetes were submitted to CCDB for barcoding. The morphology of Branchiura sowerbyi is distinct, and two such specimens were correctly predicted to have that barcode (100% match). Among the other oligochaetes submitted to CCDB, all of the barcodes differed by more than 10% from previously identified genera or species in Genbank or the CCDB database. These sequences fell into four barcode groups (see Appendix S1, sequences 5a-d), one containing 18 specimens, another of three specimens, another of two specimens, and one with one specimen.
One leech that was identified by EcoAnalysts as Helobdella stagnalis differed from previously barcoded H. stagnalis sequences by more than 15% (see supplement, sequence 6). In fact, the sequence seen in this single leech specimen was identical to the sequence obtained from the single leech specimen submitted to CCDB as an annelid positive control (see above).
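The matching criterion used in this section (identical in >97% of the sequence) can be illustrated with a short Python sketch. It assumes the two sequences are already aligned and of equal length, which is a simplification; the sequences shown are made-up placeholders rather than sequences from Appendix S1, the function names are hypothetical, and the sketch does not represent the CCDB or Genbank search procedure.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

def is_match(seq_a, seq_b, threshold=97.0):
    """True when identity exceeds the threshold (the >97% criterion in the text)."""
    return percent_identity(seq_a, seq_b) > threshold

# Placeholder 100-base sequences (not real COI barcodes).
reference = "ACGT" * 25
query_close = "TT" + reference[2:]          # 2 substitutions
query_far = "T" * 16 + reference[16:]       # 12 substitutions

print(percent_identity(reference, query_close), is_match(reference, query_close))
print(percent_identity(reference, query_far), is_match(reference, query_far))
```

Under this rule a specimen such as the divergent leech, which differed from reference sequences by more than 15%, would not be assigned to any barcoded species.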

Taxa accumulation
Figure 2 illustrates the accumulation of the 25 taxa identified by EcoAnalysts. The curve is still rising, indicating that more intensive sampling by the same methods would likely yield more species. Analyzing the taxa incidence data in Table 1 with the Chao asymptotic richness estimator, the number of taxa present is estimated to be approximately 31, suggesting that approximately 80% of the taxa present in this environment have been detected by this sampling regime. Additional calculations estimate that to encounter 100% of the taxa present would require approximately 100 more similar samples to be collected and analyzed. However, the calculations indicate that by collecting only 15 more sediment samples than those analyzed in this study, 90% of all taxa present may be captured. A caveat is that not all of the organisms were identified to species level. Taking that into account could change these numbers significantly.
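The arithmetic behind these estimates can be sketched as follows, assuming the incidence-based Chao2 estimator with the (t-1)/t small-sample factor and the extrapolation formula for additional sampling effort given by Chao et al. (2009). Whether exactly these forms were used here is an assumption, and the function names are hypothetical. The inputs are the counts reported above: 25 observed taxa, seven found in only one sample, four found in only two samples, and 27 samples.

```python
import math

def chao2(s_obs, q1, q2, t):
    """Chao2 richness estimate from incidence data.
    s_obs: taxa observed; q1: taxa in exactly one sample;
    q2: taxa in exactly two samples; t: number of samples."""
    if q2 == 0:
        raise ValueError("this simple form requires at least one duplicate taxon")
    return s_obs + ((t - 1) / t) * q1 ** 2 / (2 * q2)

def additional_samples(s_obs, q1, q2, t, g):
    """Approximate number of extra samples needed to observe a fraction g
    of the estimated richness (assumed formula from Chao et al. 2009)."""
    s_est = chao2(s_obs, q1, q2, t)
    shortfall = g * s_est - s_obs
    if shortfall <= 0:
        return 0.0  # target fraction already reached
    numerator = math.log(1 - (t / (t - 1)) * (2 * q2 / q1 ** 2) * shortfall)
    denominator = math.log(1 - 2 * q2 / ((t - 1) * q1 + 2 * q2))
    return numerator / denominator

# Counts reported for the morphological (EcoAnalysts) dataset.
s_est = chao2(s_obs=25, q1=7, q2=4, t=27)
fraction_detected = 25 / s_est
extra_for_90 = additional_samples(s_obs=25, q1=7, q2=4, t=27, g=0.90)

print(round(s_est, 1), round(fraction_detected, 2), round(extra_for_90))
```

With these inputs the sketch gives an estimate of about 31 taxa, about 80% detected, and roughly 15 additional samples for 90% detection, in line with the values reported above. The formula cannot return a finite answer for g = 1, so the 100% figure is not reproduced by this sketch.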
These calculations can also be performed taking into account the greater species richness indicated by the CCDB molecular barcoding data. The main effect of the molecular data is to enable the differentiation of several additional identifiable taxonomic units among groups that could not be distinguished by EcoAnalysts. Thus, among the oligochaetes, instead of just three species, the molecular analysis indicates at least five oligochaete species are likely present. Several additional chironomid species may also be differentiated. One specimen that had a barcode of Chironomus cf. decorus was clearly different from sequences of other chironomid specimens and brought the total number of taxa identified to species to 20. Taking the molecular data into account indicates that the number of species sampled was at least 29, while the number of unique (seen in only one sample) and duplicate (observed in just two samples) taxa were nine and six, respectively. With these values, the total number of species in the sampled environment is estimated to be 35, indicating that approximately 82% of them have been encountered. To encounter 100% of the species would require 44 samples; 90% should be encountered with five more samples. The taxa accumulation curve (not shown) is similar to Figure 2 and had R² = 0.952.

Discussion
This study used a combination of classical taxonomic analysis and molecular taxonomic methods based on the mitochondrial COI barcode region in a search for rare, novel, or non-indigenous benthic organisms in Toledo Harbor. The detected taxa were compared to the NatureServe and Integrated Taxonomic Information System (ITIS) databases of known species in North America (http://www.natureserve.org/index.jsp and http://www.itis.gov) and to various lists of NIS, including those published by the United States Geological Survey (http://nas.er.usgs.gov/), the National Exotic Marine and Estuarine Species Information System (NEMESIS; http://invasions.si.edu/nemesis/browseDB/searchTaxa.jsp?taxon=branchiura; see Fofonoff et al. 2003), the Great Lakes and Mississippi River Interbasin Study (GLMRIS; http://glmris.anl.gov/documents/ans/index.cfm; see also Veraldi et al. 2011), the Global Invasive Species Database (http://www.issg.org/database/welcome), and the EPA (US Environmental Protection Agency 2008). Pisidium compressum and Musculium transversum are North American species that had not previously been reported in Ohio waters, according to the NatureServe and ITIS databases. Non-indigenous species in the EcoAnalysts dataset (Table 1) include Branchiura sowerbyi, Bithynia tentaculata (Kipp and Benson 2011), Corbicula fluminea, Dreissena polymorpha, and Dreissena bugensis. Lipiniella sp., usually described as a European species but also reported elsewhere in North America, was also found. These were all confirmed by CCDB DNA barcodes.
All of the NIS had previously been reported in Lake Erie or nearby Ohio waters, including several that are comparatively rare (Branchiura sowerbyi and Lipiniella sp. accounted for fewer than 1% of the identified specimens). Part of the difficulty in identifying new NIS is the lack of information about the species already present. The sequences for many of the annelids had no matches in reference COI databases, likely due to the lack of prior investment in getting those organisms sequenced. For example, a leech identified as Helobdella stagnalis and another leech with an identical sequence both differed by >15% from previously sequenced H. stagnalis and from all previous leech sequences. Generally, organisms in the same species differ in barcode sequences by less than 3%. However, leech barcodes that vary by as much as 7% between different populations of H. stagnalis are nevertheless still considered to be from the same species (Oceguera-Figueroa et al. 2010). We encountered several of these specimens, so they may be fairly common. Whether they represent a new introduction or a new but cryptic species not previously named remains for future work, possibly including sequencing of nuclear genes to confirm these divergences. Similarly, little is known about COI barcode sequences for oligochaetes and chironomids. Adding molecular analysis to classical taxonomic identification increased the numbers of species detected and may also reveal cryptic, previously unrecognized indigenous and non-indigenous taxa.
For taxonomic identifications, this project used commercial taxonomy services, such as EcoAnalysts and CCDB, in part, to determine if such services were sufficiently accurate for future early detection surveys of non-indigenous organisms.We assessed the quality of their results with various blinded positive controls.In general, these vendors did well, although EcoAnalysts identified one Daphnia pulex as Daphnia catawba.Transcription errors can also occur: By examining internal consistency of data entries (e.g., does the phylum agree with the indicated genus?) and comparing various entries in vendor datasheets with photographs and voucher specimens, we identified several such errors.These companies do not guarantee 100% accuracy (e.g., EcoAnalysts QA documents indicate that >90% agreement between independent taxonomists meets their quality standard).Such errors have been corrected when detected.The use of two methods (barcodes and morphology) for species identification provides a further double-check on identifications.Such issues reinforce the need for photographic documentation and retention of archival specimens whenever possible.
The present survey was similar in methods to the study by Trebitz et al. (2009) in Duluth-Superior Harbor.Although Trebitz et al. (2009) had a greater collecting intensity (77 benthic samples) and identified a larger number of benthic taxa (158 taxa), their accumulation curves, like ours, were still rising.Their estimate is that they had detected only 80% (158 out of 197 taxa) of the taxa predicted to be in the system by the Chao asymptotic richness estimator.Altogether, approximately 8% (13 out of 158 taxa) of the benthic taxa identified by Trebitz et al. (2009) were NIS.
In comparison, approximately 20% of the taxa detected in the Toledo Harbor area in the present study were NIS. The Chao estimator similarly estimated for the present study that about 80% of the taxa in the sampled environment had been detected despite the much lower number of samples (27) collected in Toledo Harbor than in Duluth-Superior Harbor. However, as has recently been pointed out (Lopez et al. 2012), richness estimators (Chao and others) consistently underestimate the total number of taxa when sample sizes are small. Applying their suggested correction formula, $S_{\mathrm{est,corrected}} = S_{\mathrm{est}}(1 + P^{2})$, where $P$ is the proportion of singleton or unique taxa in the samples, adjusts the number of benthic taxa predicted to be present in Toledo Harbor upward by about 10%. This lowers the estimate of the proportion of total taxa detected by the 27 samples to about 75% and increases estimates of the sampling effort that would be required to achieve 90% detection.
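As a worked check of this adjustment, the sketch below applies the correction to the estimates reported in the Results. It assumes that P is the number of unique taxa divided by the number of observed taxa (7 of 25 for the morphological data, 9 of 29 when the molecular data are included); the text does not state which dataset was used for the adjustment, so both are shown, and the function name is hypothetical.

```python
def corrected_richness(s_est, n_unique, s_obs):
    """Apply S_est * (1 + P^2), with P = unique taxa / observed taxa (assumed)."""
    p = n_unique / s_obs
    return s_est * (1 + p ** 2)

# Reported values: (estimated richness, unique taxa, observed taxa).
datasets = {
    "morphological": (31, 7, 25),
    "with molecular data": (35, 9, 29),
}

results = {}
for name, (s_est, n_unique, s_obs) in datasets.items():
    s_corr = corrected_richness(s_est, n_unique, s_obs)
    results[name] = (s_corr, s_obs / s_corr)
    print(name, round(s_corr, 1), round(100 * s_obs / s_corr, 1))
```

Under these assumptions either dataset gives an upward adjustment of roughly 8 to 10% and a detected proportion close to 75%, consistent with the figures quoted above.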
A further consideration is that the samples for this study were collected on three days in late September/early October, a time when rooted vegetation, known to be present earlier in the season, had already disappeared from collecting sites. Trebitz et al. (2009) had also collected during a short time period, but it was in late summer when vegetation was still present in about a third of their collecting sites. Since Trebitz et al. (2009) detected significantly more rare and non-indigenous species in shallow vegetated areas, the lack of this habitat could also have decreased the number and types of non-indigenous and rare species detected in the present study. The limited collecting periods may also have resulted in missing various benthic organisms whose numbers may vary seasonally. The results thus apply to a limited range of substrates and may be seasonally specific as well. Potentially, a more extensive sampling regimen that includes more types of habitat substrates and a broader seasonal distribution than in this study will reveal additional species and substrate types that favor detection of rare or non-native species.
To develop management programs for specific ports, studies like this can provide a guide to the collecting effort, and therefore the likely costs, needed for effective early detection of rare or non-indigenous species in the area, which may differ from port to port. The taxonomic complexity and predicted number of samples needed for an effective survey of Toledo Harbor appear to be lower than observed in Duluth-Superior Harbor by Trebitz et al. (2009). The EPA, in its 2010 Great Lakes Restoration Initiative call for proposals, suggested that an appropriate oversampling strategy for early detection of NIS should be to capture and identify roughly 90% or more of all taxa present in the biological component of the system being sampled. To achieve >90% detection of all species present and an increased likelihood of detecting NIS will require a substantial increase in the number of samples collected and should include a broader seasonal range and habitats such as vegetated sites. Nevertheless, the experience gained from navigating the area and sampling these sites near the Port of Toledo should enable resource managers to conduct future surveys with greater efficiency and appropriately increased sampling effort.

Figure 1. Sites A-M along the Maumee River and Maumee Bay at which benthic samples were collected during September and October 2010. The site collection area is located in the box in the inset.

Figure 2. Accumulation of taxa incidence as a function of number of samples analyzed. These data are based on EcoAnalysts' identifications of unique organisms in 27 sediment samples.

Table 1. Sites at which 25 different organism types or their shells were collected in Toledo Harbor (Maumee River and Maumee Bay).