A Framework for Aquatic Invasive Species Surveillance Site Selection and Prioritization in the US waters of the Laurentian Great Lakes

Risk-based prioritization for early detection monitoring is of utmost importance to prevent and mitigate invasive species impacts and is especially needed for large ecosystems where management resources are not sufficient to survey all locations susceptible to invasion. In this paper we describe a spatially-explicit and quantitative approach for identifying the highest risk sites for aquatic invasive species (AIS) introduction into the United States’ waters of the Laurentian Great Lakes, a vast inland sea with a surface area of 246,049 square km and a shoreline length of 16,431 km. We compiled data from geospatial metrics available across all of the US waters of the Great Lakes as surrogates for propagule pressure from the dominant AIS pathways. Surrogates were weighted based on the observed or expected contribution of each pathway to past (historic) and predicted future invasions. Weighted surrogate data were combined to generate “invasion risk” scores for plants, invertebrates, fish, and all taxa combined at 3,487 management units (9 km × 9 km). The number of sites with invasion risk scores > 0 is: for plants (490), for invertebrates (220), for fish (436), and for all taxa (403). The rank order of sites with the highest risk scores varies by taxa, but in general the top thirty highest risk sites are the same across all groups. For all taxonomic groups, we show that the “top 30” sites account for at least 50% of predicted propagule pressure to the basin from all pathways. Many of the highest risk sites are located in western Lake Erie, southern Lake Michigan, and the St. Clair-Detroit River System. This framework provides a starting point for objective surveillance planning and implementation that can be adaptively improved.


Introduction
The management of biological invasions is one of twenty targets included in the United Nations' Convention on Biological Diversity Strategic Plan, the foremost guidance for national strategies to conserve and sustainably use global biodiversity. Aichi Target 9, concerning invasive species, identifies prioritization of species and pathway risks as a key element of invasive species management (UNEP 2011). Site prioritization, though not explicitly mentioned in Aichi Target 9, has been recognized as a critical third focus area for comprehensive invasive species prioritization (McGeoch et al. 2016). Species prioritization efforts are numerous and have been applied at various scales from global to regional (Roy et al. 2014; Nentwig et al. 2016), and pathway prioritization examples are increasing as pathway research clarifies terminology and links pathways with "real-world" data (Hulme et al. 2008; Essl et al. 2015). By contrast, prioritization efforts that consider the interaction of species, pathways, and sites (i.e. "integrated prioritization," sensu McGeoch et al. 2016) are less common. Here we consider the combined risk of multiple species, from multiple pathways, across multiple sites to inform aquatic invasive species (AIS) surveillance and early detection efforts in the Laurentian Great Lakes, one of the most heavily invaded aquatic systems in the world (Mills et al. 1993; Ricciardi 2006). We describe a spatially explicit and quantitative approach for identifying the highest risk sites for AIS introduction based on the cumulative risk of new introductions (including range expansions) from a range of pathways and associated non-native species across sites spanning the US waters of the Great Lakes basin.
Policy changes appear to have slowed the rate of invasion in the Great Lakes (Bailey et al. 2011a). Increased regulation and monitoring of ballast water transport by transoceanic vessels likely accounts for some of the observed decline in non-native species introductions, as the shipping pathway has accounted for the majority (~70%) of new species introductions to the Great Lakes in the last sixty years (Holeck et al. 2004). However, four new non-native plankton species have been detected in the basin since 2015. Vectors of introduction are not known for these species, but some of them were likely introduced via contaminated ballast water from foreign ports (e.g., Thermocyclops crassus, Connolly et al. 2017; Brachionus leydigii, Connolly et al. 2018; Diaphanosoma fluviatile, Whitmore et al. 2019), whereas introduction of Mesocyclops pehpeiensis into Lake Erie was probably related to the ornamental aquatic plant trade or aquaculture. Thus, management of the ballast pathway, while robust, does not provide complete protection against biological invasion, and imperfect management of non-shipping vectors leaves the Great Lakes vulnerable to new introductions. Several potentially invasive species are predicted to arrive in the Great Lakes over the next few decades (Pagnucco et al. 2015).
A recent analysis of historical Great Lakes AIS detection data found that through time, detections are increasingly associated with population centers and less associated with maritime traffic, highlighting the growing importance of introduction pathways other than shipping (O'Malia et al. 2018). For invaders that are already established in North America, authorized and unauthorized release of AIS and spread via canals and natural aquatic connections are two key vectors for the Great Lakes (Rothlisberger and Lodge 2013).
Recognizing the continued and imminent threat of AIS to the Great Lakes, the US Environmental Protection Agency (EPA) called for the development of a comprehensive program for AIS early detection and the establishment of a coordinated, multi-species early detection network (USEPA 2014). In 2014, the Great Lakes states of Illinois, Indiana, Michigan, Minnesota, New York, Ohio, Pennsylvania and Wisconsin formed an Early Detection Rapid Response (EDRR) Team to collaborate on the development of tools and guiding documents to support state AIS management actions. Under the leadership of the Michigan Department of Environmental Quality, the EDRR Team secured a Great Lakes Restoration Initiative (GLRI) grant from US Fish and Wildlife Service (USFWS) and invited partners representing state and federal agencies, academic institutions, and non-governmental organizations to develop a watch list of species of concern and a surveillance site selection and prioritization method as a first step towards developing a comprehensive program for AIS early detection in the Great Lakes. This paper is a key product of those efforts.
The framework we describe here relies heavily on the predictive power of propagule pressure and history of invasion as indicators of invasion success (Williamson and Fitter 1996;Ricciardi et al. 2011;Kumschick et al. 2015;Davis and Darling 2017;O'Malia et al. 2018). Our aim was to derive a relative "index of cumulative invasion pressure" based on estimates of propagule pressure from the dominant pathways of AIS introduction and secondary spread in the Great Lakes. Separate index scores were developed for fish, invertebrates, plants, and all taxa combined and are expressed across standardized management units for the US waters of the Great Lakes. This framework provides a useful starting point for surveillance and prevention planning that can be adaptively improved.

Materials and methods
A systematic spatial (geo-referenced) prioritization method was developed for attributing weighted indices of invasion pressure to each of 3,487 9 km × 9 km grid squares across the Great Lakes Basin (Figure 1). The method consisted of the following six steps:
1) Attribute: We selected geospatial metrics (hereafter, "surrogates") representing the dominant pathways of AIS introduction to the Great Lakes. Selected surrogate data were then attributed to each of the 3,487 sites using an existing spatial framework for the US waters of the Great Lakes.
2) Rescale: We rescaled all surrogate data layers to values between 0 and 100, with 0 being no value and 100 being the highest value across all sites for each surrogate.
3) Weight: We derived weighting factors for all combinations of taxa and pathways (i.e. surrogates) based on existing knowledge of pathway associations for both past and predicted future invaders. In all, forty weighting factors were derived: 5 pathway surrogates × 4 taxonomic groups (fish, plants, invertebrates and all taxa combined) × 2 time periods (historic and future). We then multiplied the rescaled data layers by the assigned weights.
4 & 5) Combine & Average: For every combination of taxa and time period, the rescaled and weighted data layers for a given site were combined to generate a risk score (e.g. a "historic fish" score). Thus, eight risk scores were generated for each site. We then averaged the "historic" and "future" risk scores within each taxonomic group of interest to generate the final index of invasion pressure scores for each site (e.g. a "fish index of invasion pressure," or an "all taxa combined" index of invasion pressure).
6) Rank: We ranked all sites from highest to lowest risk (by taxonomic group) according to the index score for each grid square.
Additional detail for each step is provided below.
Figure 1. Conceptual diagram showing the systematic spatial prioritization method for attributing a weighted "index of invasion pressure" score for each 9 km × 9 km grid cell.

Spatial framework
We used the Great Lakes Aquatic Habitat Framework (GLAHF) 9,000-meter grid (Wang et al. 2015) as our underlying spatial framework. Using ArcGIS version 10.3 (ESRI 2015), the original raster grid was converted to a polygon layer and cells (9 km per side) were attributed with country, state/province, and lake basin based on the location of cell centroids. This grid was subsequently attributed with data from surrogates related to the dominant pathways for the introduction and secondary spread of aquatic non-native species in the Great Lakes, namely shipping, recreational boating, the trade in live organisms (including release and escape), cultivation or stocking, and canals (Mills et al. 1993; Holeck et al. 2004; Ricciardi 2006; Pagnucco et al. 2015; Hatton et al. 2019). We selected these surrogates because data for each were available over the entire geography of interest and because evidence in the literature indicates that each is a reasonable proxy for propagule pressure for the taxa and pathways of interest (ship visits and marina size, O'Malia et al. 2018 (fish, invertebrates); human population, Copp et al. 2010 (fish), Davis and Darling 2017 (all taxa), O'Malia et al. 2018 (fish, invertebrates); ponds and natural dispersal, Marchetti et al. 2004 (fish), Woodford et al. 2013 (fish)).
Grid cells were attributed according to features occurring locally in the grid cell. Coastal cells that included a river mouth were also attributed with the features in upstream contributing areas (watersheds). The grid was restricted to waters of the Great Lakes, connecting channels, and inland streams up to the first major barrier. The first major barrier was identified using a draft version of the FishWorks hydrography and barriers data layers (Moody et al. 2017).

Surrogate data
The data representing surrogates for pathways of AIS invasion were acquired from multiple sources (Table 1). The data were attributed to grid squares as follows: Most point datasets originated as tabular data and were converted to geospatial layer points using latitude and longitude coordinates contained in the data using ArcGIS version 10.3 (ESRI 2015). Census population and land cover for the Great Lakes Basin were acquired from the Great Lakes Aquatic Habitat Framework dataset (GLAHF; https://www.glahf.org/). We used the Dasymetric Mapping Toolbox tools (USEPA 2017) to apportion the census unit population data to appropriate land covers to get a more refined geospatial representation of population across the basin. This tool apportions the census block unit population to those areas within that block that have "developed" land uses (30 m cell size resolution). By apportioning the data in this way, waterbodies and undeveloped land get little to no population assigned and the developed areas get most of the population. For our work, which quantifies population in every watershed, this provides a more accurate assessment of the population in each watershed. These refined population data were then attributed to GLAHF watersheds and our grid cells. The Chicago metropolitan area is situated mostly outside of the basin, but because of the artificial connections created by the Chicago Area Waterway System, much of the population is effectively connected to the basin. We therefore included the population within two 8-digit hydrologic units (07120003 and 07120004), which are hydrologically connected to Lake Michigan, to more accurately account for the population risk in the Chicago area.
Data located within the boundaries of a Great Lake or along the coastline were assigned to the grid cell in which they occur and attributed with a count of the feature in that grid cell (e.g., population size) or a total amount of an attribute of the feature (e.g., total number of marina boat slips). Data located inland were first attributed to watershed polygons developed as part of the GLAHF, and then transferred to the appropriate grid cell using the outlet pour point of those watersheds that intersected the grid. To create risk scores that did not overemphasize a single pathway, we combined the rescaled data for marina size and boat launch size (both surrogates for the natural dispersal and bait release pathways) into a single variable.
The shipping surrogate data layer is a combination of the number of ship visits to a given port and open water discharge events, with the latter being treated as equivalent to a ship visit. There is no evidence that the risk of introduction from these two events is equal, just as the risk of introduction is not equivalent between any two ship visits. That is, not all port visits result in ballast discharge, and even where there is a discharge event, volume and risk are not equal. Drake et al. (2015) have shown that ballast water volume has minimal accuracy as a proxy variable for species' invasion at new locations, and propagules can also be introduced by other mechanisms like hull fouling (Drake and Lodge 2007; Bailey et al. 2011b). Therefore, we treated ship visits and open water discharge events as equal based on the assumption that the introduction of potentially invasive propagules is possible with any single event of either kind. We selected ship visits as a primary measure because O'Malia et al. (2018) found that out of eight pathway metrics for maritime commerce (including commercial cargo tonnage and ballast water discharge volume), commercial vessel trips were the best predictor of AIS presence across ports in the Great Lakes over a five-decade period (from 1970 to present). However, using ship visits alone would suggest that the risk from shipping exists only in ports. Because data on open water discharge events were also available, we chose to include those events so that invasion risk could also be characterized for "offshore" sites exposed to some risk from shipping.
The canal data layer consisted of the point locations for smaller headwater or large canal inter-basin connections. The connection points were assigned values between 1 and 100. Perennial connections where an aquatic pathway is maintained at all times regardless of flow were assigned a value of 100. Connections that are intermittent in nature and only establish an aquatic pathway under high flow conditions (i.e. during a rainfall event that has a one or ten percent probability of being equaled or exceeded in any given year) were assigned values based on perceived risk derived from an existing risk assessment of Great Lakes and Mississippi River inter-basin connections by the US Army Corps of Engineers (USACE 2013). USACE assessed risk of inter-basin transfer of AIS for eighteen inter-basin connections. Eight of these intermittent connections were evaluated as high or medium-risk pathways. The other ten intermittent connections were deemed low risk. For our analysis, low-risk connections were assigned a value of 1, and medium and high-risk connections were assigned a value of 10. As with the other risk variables, upstream connections were summed to the drainage outlet. Grid cells with no connections were assigned a value of zero.
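The connection scoring rules can be illustrated with a minimal Python sketch. The category labels, the `CONNECTION_VALUE` lookup, and the `canal_score` helper are hypothetical names introduced here for illustration; only the assigned values (100, 10, 1, and 0) follow the text.

```python
# Values assigned to each connection type, following the rules in the text.
# The dictionary keys are illustrative labels, not identifiers from USACE (2013).
CONNECTION_VALUE = {
    "perennial": 100,          # aquatic pathway maintained at all times
    "intermittent_high": 10,   # intermittent, assessed as high risk
    "intermittent_medium": 10, # intermittent, assessed as medium risk
    "intermittent_low": 1,     # intermittent, assessed as low risk
}


def canal_score(connections):
    """Sum the values of all connections draining to one grid cell.

    `connections` is a list of connection-type labels for the cell,
    including upstream connections summed to the drainage outlet.
    An empty list (no connections) yields zero.
    """
    return sum(CONNECTION_VALUE[c] for c in connections)


# Hypothetical grid cells
no_connection = canal_score([])
perennial_cell = canal_score(["perennial"])
mixed_cell = canal_score(["intermittent_low", "intermittent_medium"])
```

Because the values are summed per cell, a cell receiving several intermittent connections accumulates a score larger than any single connection would give it.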

Rescaling data
Maximum values for the population, shipping, boating, and ponds surrogates varied widely (e.g. max. population = 6,670,986; max. ponds = 18,517), whereas all surrogates had a minimum value of 0. Therefore, surrogates were normalized to combine data layers without overemphasizing data measured on a scale with a higher maximum value. We rescaled the surrogates using a min-max normalization approach (Schneider 2009). The surrogate value for each site was divided by the maximum value for that surrogate from across all sites and the quotient was multiplied by 100. The min-max method had only a small effect on the underlying distribution for each surrogate because maximum values did not greatly exceed other values. For the connections data layer, we retained the relative 0, 1, 10, 100 values from our created index, since the aggregate value of all connections (based on the created index) did not exceed 100 for any of the sites with connections.
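As a minimal sketch of this rescaling step (not the GIS workflow itself), the Python below divides each site's value by the basin-wide maximum and multiplies by 100. The function name `rescale_surrogate` and the five-cell example lists are hypothetical; only the two maximum values are taken from the text. Because every surrogate has a minimum of 0, this is equivalent to min-max normalization.

```python
def rescale_surrogate(values):
    """Rescale raw surrogate values to the range 0-100.

    Each value is divided by the maximum across all sites and multiplied
    by 100. A layer that is zero everywhere stays zero.
    """
    max_value = max(values)
    if max_value == 0:
        return [0.0 for _ in values]
    return [100.0 * v / max_value for v in values]


# Hypothetical raw values for five grid cells; the maxima match the text.
population = [0, 1200, 6670986, 250000, 0]
ponds = [0, 18517, 40, 900, 3]

population_rescaled = rescale_surrogate(population)
ponds_rescaled = rescale_surrogate(ponds)
```

After rescaling, the cell with the largest population and the cell with the most ponds both score 100, so neither layer dominates the sum simply because of its units.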

Weighting data
Rescaled surrogates were weighted to modify each data layer's relative influence in the cumulative index of invasion pressure. Retrospective and prospective analyses of non-native species introductions to the Great Lakes demonstrate that the pathways most responsible for introduction of one taxonomic group can be different from those for another taxon and the relative importance of the major pathways as vectors for introduction is changing (Pagnucco et al. 2015;Ricciardi 2006;O'Malia et al. 2018). For our approach, weights represent the relative importance of pathways for each of the taxonomic groups of interest (fish, invertebrates, plants, and all taxa combined) and the known or expected contribution of pathways to past (historic) and predicted future invasions, respectively.  Table S1).
A priori exclusions included viruses, bacteria, marine and tropical species, species established in all five lake basins, and species with no known history of invasion or impacts (among other criteria). We then used the Great Lakes Aquatic Nonindigenous Risk Assessment (GLANSRA), a semi-quantitative questionnaire-based methodology developed by Davidson et al. (2017), to evaluate invasion potential and to assign a pathway (or pathways) of introduction for every species on the candidate list. The risk assessment method from Davidson et al. (2017) scores risk for each of three "assessment components" (introduction, establishment, and impact) based on the results of a literature review and expert judgement. Having a priori excluded marine, estuarine, tropical, and sub-tropical species from the candidate list, we assumed all remaining species were capable of establishment in the Great Lakes if introduced. We therefore used only the introduction and impact components of the GLANSRA to identify the final list of future invaders, based on the following criteria: for introduction, we excluded species if their probability of introduction was assessed as "unlikely" (i.e. pathway risk score = 0) with high confidence (i.e. zero unknowns); for impact, we excluded species with low or unknown impact scores. The final list of "future invaders" was comprised of the 147 species that met these criteria. The future invaders risk weighting factor for each surrogate (i.e. pathway) was then derived based on the relative proportion of all future invaders assigned to each pathway (Table 2). In cases where a surrogate is associated with more than one pathway (e.g. US population accounts for intentional release and escape from culture pathways) the weighting factor was the sum of proportions from all pathways combined (and could therefore exceed 1).
Table 2. "Future invaders" risk weighting factors. The risk weighting factors were calculated as the proportion of species in each taxonomic group that is predicted to arrive by the pathway(s) specified by Davidson et al. 2017. When more than one pathway was indicated for a surrogate (e.g. INT + ESC for US population), the risk weighting factor is the sum of all pathways combined. The corresponding surrogate to which the weights were applied is indicated. "Inverts" = invertebrates.
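The proportion-based weighting can be sketched in Python as follows. This is an illustrative reading of the rule, assuming each species may be assigned to more than one pathway. The species names, the "SHIP" code, and the helper `pathway_weights` are hypothetical; "INT" and "ESC" are the pathway codes mentioned in the Table 2 caption.

```python
from collections import Counter


def pathway_weights(species_pathways, surrogate_pathways):
    """Derive one weight per surrogate from species-to-pathway assignments.

    species_pathways: dict mapping species name -> set of pathway codes.
    surrogate_pathways: dict mapping surrogate name -> list of pathway codes
        that the surrogate represents.
    The weight for a surrogate is the sum, over its pathways, of the
    proportion of species assigned to each pathway.
    """
    n_species = len(species_pathways)
    counts = Counter(p for paths in species_pathways.values() for p in paths)
    proportions = {p: c / n_species for p, c in counts.items()}
    return {
        surrogate: sum(proportions.get(p, 0.0) for p in paths)
        for surrogate, paths in surrogate_pathways.items()
    }


# Hypothetical candidate list of four species
species = {
    "species_a": {"INT", "ESC"},
    "species_b": {"INT", "ESC"},
    "species_c": {"INT"},
    "species_d": {"SHIP"},
}
surrogates = {"population": ["INT", "ESC"], "shipping": ["SHIP"]}

weights = pathway_weights(species, surrogates)
```

In this toy example the population surrogate receives 0.75 (INT) + 0.50 (ESC) = 1.25, which shows how a weight can exceed 1 when a surrogate stands in for several pathways.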
2. "Historic invaders" risk weighting factor: In this approach, a pathway of introduction was assigned to every Nonindigenous and Range Expander species currently established in the Great Lakes (GLANSIS 2017). Pathways were assigned per the pathway categories defined by GLANSIS using the GLANSIS Species List Generator Tool (e.g. aquaculture, aquarium release, bait release, canals, etc.; see Table 3). The Species List Generator generates custom lists of nonindigenous species for a specified geographic area, species category, taxonomic group, species status, and pathway. For our analysis we sorted species to each of the GLANSIS pathways using the following search criteria:
Table 3. "Historic invaders" risk weighting factors. The risk weighting factors were calculated as the proportion of species in each taxonomic group for which introduction has been assigned to a given pathway(s) based on the GLANSIS assessment (for "Nonindigenous + Range Expanders" species). When more than one pathway was indicated (e.g. aquarium release, pet release, stocked, and planted for US population), the weighting factor is the sum of all pathways combined. The corresponding surrogate to which the weights were applied is indicated. "Inverts" = invertebrates.

Combining data
We summed the rescaled, weighted data to generate multiple risk scores for each site (eight in all; a "historic" fish, invertebrate, plant, or all taxa combined score and a "future" fish, invertebrate, plant, or all taxa combined score). Combining data layers in this way assumes that data are independent. We used the non-parametric Mann-Kendall test to test for correlation between data layers. The analysis was conducted using the Kendall package (McLeod 2011) in R (R Core Team 2019).
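A minimal Python sketch of the summation and of a pairwise rank correlation check is given below. The weights and surrogate values are hypothetical, and `kendall_tau_b` is a plain-Python stand-in written for illustration; it is not the routine from the R Kendall package used in the analysis.

```python
from itertools import combinations
from math import sqrt


def risk_score(rescaled, weights):
    """Sum of rescaled surrogate values (0-100), each multiplied by its weight."""
    return sum(weights[name] * value for name, value in rescaled.items())


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences, allowing ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from all counts
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical rescaled values for one site and hypothetical "historic" weights
site = {"shipping": 20.0, "population": 50.0, "boating": 10.0,
        "ponds": 0.0, "canals": 100.0}
weights_historic = {"shipping": 0.25, "population": 0.5, "boating": 0.25,
                    "ponds": 0.5, "canals": 0.125}
historic_score = risk_score(site, weights_historic)

# Hypothetical rescaled layers across five sites, compared pairwise
population_layer = [0.0, 10.0, 20.0, 100.0, 5.0]
shipping_layer = [0.0, 0.0, 50.0, 100.0, 0.0]
tau = kendall_tau_b(population_layer, shipping_layer)
```

Running the same summation with the "future" weights gives the second score for the site; repeating both for each taxonomic group yields the eight scores described above.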

Averaging data
Summed data for each site from the historic and future risk models were averaged to produce a final index of invasion pressure score for each taxonomic group. To determine the strength of the correlation between scores for each site based on the "historic invaders" versus "future invaders" weighting factors, we used rank order correlation, averaging ties (Spearman's rho; Systat 13). A non-parametric test was chosen because about half the risk scores were zero, and thus the data were not normally distributed.
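The averaging step and the rank correlation with averaged ties can be sketched as follows. This is a plain-Python illustration with hypothetical scores; the helpers `average_ranks` and `spearman_rho` are assumptions made for this example and do not reproduce the Systat routine.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / sqrt(var_x * var_y)


# Hypothetical historic and future risk scores for five sites (many zeros)
historic = [0.0, 0.0, 12.5, 40.0, 3.0]
future = [0.0, 0.0, 15.0, 38.0, 5.0]

# Final index of invasion pressure: mean of the two scores at each site
final_index = [(h + f) / 2 for h, f in zip(historic, future)]
rho = spearman_rho(historic, future)
```

Averaging tied ranks matters here because many sites share a score of zero; without it, the arbitrary ordering of those sites would affect the coefficient.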

Ranking data
After the final index score was calculated for each taxonomic group, all 3,487 sites were ranked from highest to lowest risk score. The rank order facilitated an analysis of the proportion of total propagule pressure that could be accounted for by sampling an increasingly greater number of sites.
Within each taxonomic group we derived the "proportion of propagule pressure" measure for each pathway surrogate independently and on average across all pathways based on surveillance effort of up to the maximum number of sites (i.e. 3,487). As an example, for fish and the population surrogate: First, we ranked all sites based on final risk score for fish and sorted the ranked sites from highest to lowest. Then, the population values for the n highest ranked sites were summed to give the cumulative population accounted for in ranked sites 1 to n. For every iteration (n = 1 to 3,487), this rank-accumulated population was divided by the total combined population for all 3,487 sites to yield the proportion of propagule pressure (from pathways associated with the population surrogate) accounted for within the top "n" sites.
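The cumulative calculation can be sketched in a few lines of Python. The function names and the five-site example are hypothetical; the logic follows the fish and population example: sort sites by final risk score, accumulate the surrogate value, and divide by the basin-wide total.

```python
def cumulative_proportion(final_scores, surrogate_values):
    """Proportion of a surrogate's total accounted for by the top n sites.

    Sites are ordered from highest to lowest final risk score. The returned
    list has one entry per n (n = 1 .. number of sites).
    """
    order = sorted(range(len(final_scores)),
                   key=lambda i: final_scores[i], reverse=True)
    total = sum(surrogate_values)
    if total == 0:
        return [0.0 for _ in order]
    running = 0.0
    proportions = []
    for i in order:
        running += surrogate_values[i]
        proportions.append(running / total)
    return proportions


def sites_needed(proportions, threshold):
    """Smallest n whose cumulative proportion reaches the threshold, else None."""
    for n, p in enumerate(proportions, start=1):
        if p >= threshold:
            return n
    return None


# Hypothetical final fish risk scores and raw population values for five sites
fish_scores = [50.0, 10.0, 80.0, 0.0, 30.0]
population = [400, 50, 500, 0, 50]

curve = cumulative_proportion(fish_scores, population)
n_for_50 = sites_needed(curve, 0.50)
n_for_95 = sites_needed(curve, 0.95)
```

In this toy example the single highest ranked site holds half of the population, so one site reaches the 50% threshold and three sites reach 95%. Note that the curve need not rise fastest at the start for every surrogate, because sites are ordered by the final risk score and not by the surrogate itself.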

Results

Surrogates
Pathway surrogate values vary across the basin (Figures S1-S5), but most grid squares have low values for each surrogate, relative to basin-wide maximum values (Table 4). All surrogate combinations are significantly correlated but Mann-Kendall tau values are low (mean ± SD = 0.19 ± 0.18), indicating that while there is some monotonic relationship between each pairwise set of surrogates, any one surrogate is not necessarily a good predictor of another (Table 5). We found that site rank order based on values for each surrogate, from highest value (rank 1) to lowest value, varied across surrogates (Table 6).

Weights
Weights representing the proportion of either historic or predicted future invaders associated with each of the dominant pathways of AIS introduction to the Great Lakes varied by taxa (Tables 2, 3). The dominant pathways for introduction of fish and invertebrates did not vary over time (i.e. within each group the pathways responsible for introduction of historic invaders are the same as those for future invaders; Figure 2). The majority of fish have been or will likely be introduced through pathways associated with ponds (i.e. stocking; historic weight = 0.61; future weight = 0.67) and human population (i.e. organisms in trade; historic weight = 0.68; future weight = 0.57). Invertebrates are strongly associated with the shipping pathway, by a nearly 2:1 margin relative to any other pathway (historic weight = 0.67; future weight = 0.74). For plants, the dominant pathways for historic invaders differ from those for predicted future invaders (Figure 2). Hitchhiking or hull fouling associated with recreational watercraft is the pathway most responsible for past plant invasions (historic weight = 0.66), while pathways associated with ponds (i.e. cultivation or stocking; future weight = 1.08) and human population (i.e. intentional introduction or escape from the live trades; future weight = 0.98) are predicted to be the dominant pathways for future plant invasions.

Index scores and Rankings
The spatial framework comprised 3,487 sites in the US Great Lakes basin. The number of sites with index scores > 0.5 based on past invasions (i.e. historic risk score) was 442 for fish, 201 for invertebrates, 371 for plants, and 302 for all taxa combined. The number of sites with index scores > 0.5 based on predicted future invasions (i.e. future risk score) was 427 for fish, 232 for invertebrates, 601 for plants, and 474 for all taxa combined. The number of sites with invasion risk scores > 0.5 based on the average of historic and future risk scores (i.e. final risk score) was 436 for fish, 220 for invertebrates, 490 for plants, and 403 for all taxa combined. Historic and future invaders index scores were strongly correlated (r_s = 1.000 (fish); 0.999 (invertebrates); 0.999 (plants); 0.999 (all taxa)). Of the 3,487 scores, 85%, 75%, 78%, and 82% were within 25 ranks between the historic and future invaders indices for the fish, invertebrate, plant, and all taxa models, respectively. However, the distribution of rank differences (historic vs. future) indicates that rank order within taxa varied by model, with maximum differences ranging from 193 places for fish to 445 places for plants. A common result was that many sites ranked modestly higher in the "future invaders" model than in the "historic invaders" model (by 1 to 100 ranks), and these gains were offset by far fewer sites that had a large decrease in rank (Figure S6).
The highest risk sites vary by taxa. However, the same subset of sites consistently ranks among the top twenty-five highest risk sites for all taxonomic groups (Table 7). For fish, the high-risk sites are especially concentrated at the St. Clair-Detroit River System (SCDRS; from Port Huron, MI to Sandusky, OH), in the western basin of Lake Erie, and in southern Lake Michigan (Figure 3). These sites are characterized by moderate population density within their contributing catchments, large marinas and boat ramps, or in some cases moderate to high shipping activity. The highest risk sites for invertebrates are major ports, including Duluth-Superior, Toledo, Chicago, and Cleveland (Figure 4). High risk plant sites are concentrated in southern Lake Michigan, near large population centers at the mouth of the Chicago Area Waterway System and in western and central Lake Erie and the SCDRS, an area with relatively large boat launches and marinas (Figure 5). The composite index for "all taxa" highlights concentrated risk at a few discrete locations representing a similar subset of sites identified as high-risk sites for the other taxonomic groups (Figure 6).
The "proportion of propagule pressure" measure indicates that a relatively small number of sites account for a majority of predicted propagule pressure to the basin for any given pathway (Figure 7, Table 8). On average, across all pathways fewer than thirty sites represent at least 50% of propagule pressure (range 23-26 sites depending on taxa; Table 8). Sites with large values for any given surrogate substantially increase the proportion of propagule pressure that is accounted for within some pathways. For example, including Duluth, MN (25th highest risk site for fish introduction, but the largest shipping port in the Great Lakes) in a "portfolio of surveillance sites" increases the proportion of propagule pressure that is accounted for from the shipping pathway by 17%.

Discussion
Modeled historic and future risk scores for most sites are similar across all taxonomic groups (fish, invertebrates, plants, all taxa) despite the differences in underlying pathway surrogate weights, suggesting that our model predictions are relatively robust (i.e. small changes in surrogate weights or values do not substantially affect risk scores). Indeed, the top twenty-five highest risk sites across all taxonomic groups are comprised of a subset of only thirty-three sites (Table 7). These thirty-three sites represent a nexus of invasion pathways and collectively account for a major proportion of total propagule pressure. The model prediction, showing concentrated risk at a few discrete locations, is fundamentally similar to predictions from other analyses of Great Lakes' invasion risk (e.g. Grigorovich et al. 2003). This relative concentration of risk at a handful of sites around the basin means that monitoring a reasonable number of sites (i.e. fewer than 100) is likely to account for most of the existing risk. Here we examine the rationale for our model framework and some limitations of the framework design. We then discuss how the framework could be applied to improve surveillance efforts across the basin including implications for selecting priority sites for surveillance implementation and determining which taxa to target for surveillance at any given site. Finally, we suggest some areas of future inquiry for improving model predictions.
Figure 7. For each pathway, proportion of propagule pressure as a function of number of sites surveyed (based on final invasion risk scores for each taxonomic group, ranked 1 to n). Dashed lines intersect the curves at the minimum number of sites that must be surveyed to account for at least 50% or 95% of propagule pressure from each pathway. "Inverts" = invertebrates.
Table 8. Minimum number of sites that must be surveyed to account for at least 50% (95%) of propagule pressure from each pathway (and on average across all pathways), based on final invasion risk scores for each taxon (sites ranked 1 to n). "RecBoat" represents surrogates (marina size and boat launch size) associated with the recreational boating pathway. "Inverts" = invertebrates.

Framework methodology
Our use of surrogates (for AIS pathways) to predict propagule pressure is a commonly employed approach for modeling invasion risk (e.g. Compton et al. 2012;Leathwick et al. 2016;Davis and Darling 2017). Although invasion risk predictions are often based on a statistical approach that examines the relationship between surrogates and existing patterns of invasion, we chose to develop a simple additive model that describes propagule pressure based on the relative importance of key pathways of invasion for established and predicted future AIS. We did this in part because empirical validation of statistical models is hampered by insufficient or biased empirical data sets of non-native species distribution in the Great Lakes (Grigorovich et al. 2003). Biased or insufficient distribution data may explain why analyses of invasion success based on surrogates sometimes produce contradictory results (Wonham et al. 2013). Also, in a large connected water body like the Great Lakes, contemporary non-native species distribution patterns reflect natural dispersal and secondary spread (Sieracki et al. 2014;Beletsky et al. 2017) and points of initial introduction can be obscured (Davis and Darling 2017). Patterns of secondary spread are relevant for an effective regional surveillance program, but a key focus for our modeling effort was to identify the sites where novel AIS are most likely to be introduced into the Great Lakes. A limitation of our approach is that it reflects an understanding of the expected contribution of each pathway to Great Lakes' invasions that is subject to change. AIS invasion pathways are dynamic and future management actions may reduce propagule pressure from certain vectors (Bailey et al. 2011a). Thus, pathway surrogate weightings, while based on objective estimates of invasion pressure using a uniform risk assessment, are a source of uncertainty. 
However, the underlying model framework allows us to explore how changes in pathway dynamics may affect invasion risk. For example, compared to established non-native species (historic), a smaller proportion of future invaders are predicted to be introduced via the shipping pathway (all taxa pathway weights 0.43 historic vs. 0.31 future), which is consistent with the recent decline in new introductions attributed to this pathway (Bailey et al. 2011a). Instead, future invasion pressure is predicted to be concentrated in large cities, reflecting the increasing importance of the live organism trade pathway (all taxa pathway weights 0.67 future vs. 0.37 historic for the US population surrogate; Rixon et al. 2005; Pagnucco et al. 2015). Although risk scores based on historic pathways are highly correlated with risk scores based on predicted future pathways, differences like this underscore the added value of using the average of historic and future risk scores as the final index of invasion pressure. Retaining an element of a backward-looking model is important because current invasion pressure might still best be predicted by past trends in invasion pathways, owing to the expected lag in spread and establishment following initial introduction (Crooks 2005). But incorporating information about predicted future pathways is equally important because the relative importance of different invasion pathways is changing.
Another limitation of our model framework is that the pathways on which our model is based, while generally considered the dominant pathways for AIS invasion in the Great Lakes, are only a subset of all possible pathways, and the surrogates we have selected for each pathway are only a subset of possible surrogates. These represent additional sources of uncertainty, though as noted earlier, the surrogates that we use have been found to be reasonable proxies for invasion pressure across different taxa (e.g. human population, marina size and ship visits; O'Malia et al. 2018). While our model may not account for total risk from all possible pathways or pathway surrogates, the model is based on the reasonable assumption that every pathway has a unique potential to increase overall invasion risk for a given site. Model predictions for West Harbor/Marblehead (Lake Erie) are a good example. Whereas most surrogates in our model are strongly correlated (Table 5), values for the population and marina size surrogates at Marblehead diverge substantially. Total population at Marblehead is 6,743 (ranked 409th out of 1,781 sites with a human population value), but with 5,710 combined boat slips and boat ramp parking spaces it ranks as the single most popular boating and fishing area in the Great Lakes (a "destination watershed," sensu Davis and Darling 2017). If population alone were used to predict invasion pressure, Marblehead would be considered a low-risk site, but the high intensity of recreational boating traffic suggests that realized propagule pressure at the site is high.
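The additive structure described in this section can be illustrated with a minimal sketch. It is not the implementation used for the framework: the site names, surrogate values and pathway weights below are hypothetical placeholders, and rescaling each surrogate to a site's share of the basin-wide total is an assumption made here for illustration. The sketch computes a weighted sum of surrogate shares under a historic and a future weight set, then averages the two to obtain a final score.

```python
from typing import Dict

Surrogates = Dict[str, float]  # surrogate name -> value at one site


def basin_shares(sites: Dict[str, Surrogates]) -> Dict[str, Surrogates]:
    """Rescale each surrogate to a site's share of the basin-wide total.

    Assumption: share-of-total scaling is used for illustration only.
    """
    totals: Dict[str, float] = {}
    for values in sites.values():
        for name, value in values.items():
            totals[name] = totals.get(name, 0.0) + value
    return {
        site: {
            name: (value / totals[name] if totals[name] > 0 else 0.0)
            for name, value in values.items()
        }
        for site, values in sites.items()
    }


def additive_risk(shares: Surrogates, weights: Dict[str, float]) -> float:
    """Weighted sum of surrogate shares; unweighted surrogates contribute 0."""
    return sum(weights.get(name, 0.0) * share for name, share in shares.items())


def final_risk(shares: Surrogates, historic: Dict[str, float],
               future: Dict[str, float]) -> float:
    """Average of the historic-weighted and future-weighted risk scores."""
    return 0.5 * (additive_risk(shares, historic) + additive_risk(shares, future))


# Hypothetical sites and surrogate values (not data from the study).
sites = {
    "port": {"ship_visits": 300, "population": 10_000, "boat_slips": 100},
    "city": {"ship_visits": 100, "population": 80_000, "boat_slips": 400},
    "marina": {"ship_visits": 0, "population": 10_000, "boat_slips": 500},
}
# Hypothetical pathway weights (placeholders, not the study's weights).
historic_weights = {"ship_visits": 0.5, "population": 0.3, "boat_slips": 0.2}
future_weights = {"ship_visits": 0.3, "population": 0.5, "boat_slips": 0.2}

shares = basin_shares(sites)
scores = {
    site: final_risk(site_shares, historic_weights, future_weights)
    for site, site_shares in shares.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy example the "marina" site has a small share of population but a large share of boat slips, so it retains a nonzero score that a population-only model would understate, which mirrors the Marblehead case qualitatively.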

Framework application
One challenge for surveillance planning is to identify the locations where new introductions are most likely to occur. Management resources are finite. Hence, it is important that surveillance efforts concentrate on those sites with the highest risk of introduction (Lodge et al. 2006). Yet, current surveillance efforts for AIS in the Great Lakes basin are often implemented across very large priority areas spanning hundreds of kilometers (e.g. Green Bay, the SCDRS, and Western Lake Erie; USFWS 2014). In reality, risk is probably not spread evenly across such large areas and each location likely contains multiple sites at relatively high risk for AIS introduction. The spatial framework that we employed allows managers to sort surveillance priorities to specific "neighborhoods" within these larger geographies of risk based on a standardized 9 km × 9 km survey unit. Focusing on discrete patches in this way can increase detection sensitivity since limited surveillance resources are concentrated locally rather than spread across a much larger area of dispersed risk.
Our model reveals taxa-specific pathway associations which, when paired with pathway activity around the basin (from surrogates), show that spatial patterns of risk vary by taxa. Thus, model results are useful for prioritizing which taxa to target at a particular site. We found that the risk of invertebrate introduction is greatest at sites with high levels of shipping activity, whereas fish and plant introduction is most influenced by population density and density of ponds, respectively. Surveillance site selection for these taxa should be prioritized accordingly (see Table 7). Invertebrate surveillance should be directed to major ports like Duluth-Superior harbor (MN/WI), the busiest shipping port in the Great Lakes and one of the busiest in the United States in terms of total tonnage per annum (USDOT 2017). Conversely, Duluth-Superior ranks outside the top twenty highest risk sites for fish (25) and plants (39). Surveillance efforts for fish and plants should be directed to sites that are the nexus of pathways most associated with their introduction, places like Chicago (IL) and Toledo (OH).
We developed separate ranked lists of high-risk sites for fish, invertebrates and plants, in part because each taxonomic group is best sampled with taxon-specific gears and survey methods. Taxon-specific survey designs and gear specifications have already been developed for Great Lakes ports and coastal areas similar in size to our 9 km × 9 km grid, for fish (Hoffman et al. 2011, 2016), invertebrates (Trebitz et al. 2009, 2010), plants (Trebitz and Taylor 2007), and all taxa (Uzarski et al. 2017). These survey methods could be employed to maximize detection sensitivity for the most relevant taxa at any particular site.
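One way to use separate ranked lists when deciding which taxa to target at a site is sketched below. The site names and risk scores are hypothetical, and ordering taxa by a site's within-taxon rank is only one possible decision rule, assumed here for illustration. The sketch ranks sites within each taxonomic group (1 = highest risk) and then lists, for a chosen site, the taxa in order of that site's rank.

```python
from typing import Dict, List, Tuple

# taxon -> site -> invasion risk score (hypothetical values)
risk_scores: Dict[str, Dict[str, float]] = {
    "invertebrates": {"Port A": 0.30, "City B": 0.20, "Harbor C": 0.05},
    "fish": {"Port A": 0.10, "City B": 0.35, "Harbor C": 0.15},
    "plants": {"Port A": 0.05, "City B": 0.25, "Harbor C": 0.20},
}


def rank_sites(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank sites from highest (1) to lowest score; ties broken by site name."""
    ordered = sorted(scores, key=lambda site: (-scores[site], site))
    return {site: position for position, site in enumerate(ordered, start=1)}


def taxa_by_rank(site: str,
                 ranks: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """List (taxon, rank) pairs for one site, best-ranked taxon first."""
    pairs = [(taxon, taxon_ranks[site]) for taxon, taxon_ranks in ranks.items()]
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))


ranks = {taxon: rank_sites(scores) for taxon, scores in risk_scores.items()}
port_priorities = taxa_by_rank("Port A", ranks)
print(port_priorities)
```

In this toy data set the hypothetical "Port A" ranks first for invertebrates but last for fish and plants, so invertebrate gear would be the first choice there.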

Framework improvements
Our model focuses primarily on the likelihood of introduction as a function of propagule pressure and does not explicitly consider the influence of habitat suitability on invasion risk. We implicitly consider the probability of establishment at a broader basin scale because surrogate weightings are based on a list of species that are already established (historic invaders) or a subset of future potential invaders from temperate freshwater habitats. But future model iterations should more explicitly incorporate abiotic measures of habitat suitability (i.e. habitat invasibility, sensu Vander Zanden and Olden 2008). Data on abiotic conditions are increasingly being used to predict suitability of Great Lakes' waters to novel AIS based on published environmental tolerances (e.g. Kramer et al. 2017; Egly et al. 2019). The GLAHF now contains over 300 abiotic variables (Wang et al. 2015) and provides an excellent resource to assess environmental suitability for a species of concern. The Great Lakes Environmental Assessment and Mapping project (GLEAM; Allan et al. 2013), which provides spatially referenced measures of human disturbance across the Great Lakes, is another rich source of basin-wide spatially referenced data. GLEAM could be used to develop site specific measures of anthropogenic disturbance, a well-recognized correlate of invasibility (Marchetti et al. 2004; Havel et al. 2005; Clark and Johnston 2011). Our understanding of the distribution of non-native fish and invertebrates for the US waters of the Great Lakes continues to improve as the USFWS implements and expands their regional surveillance program (e.g. Harris et al. 2018). As the relevant data become available it should be possible to empirically identify the combination of abiotic habitat and human disturbance measures that best predict habitat suitability and to include a suitability measure as a component of overall invasion risk at a site.
Another important consideration for site prioritization is the concept of site irreplaceability and vulnerability (i.e. "site sensitivity," McGeoch et al. 2016). Preventing invasion at sites with exceptional ecological or economic value (e.g., uninvaded areas, areas supporting important fisheries, including large wetland nursery areas, or municipal water intakes), locations where rare or threatened species persist, areas set aside as parks or wilderness areas, and areas of high biodiversity is a relevant management concern that could be accounted for in the model (Vander Zanden and Olden 2008; Collier et al. 2017; Panlasigui et al. 2018).
Inclusion of site connectivity measures would also be relevant (Stewart-Koster et al. 2015), especially where a site has the potential to facilitate the spread of novel AIS. Thus, proximity of sites to key ballast water uptake zones or waterways that connect the Great Lakes or the Great Lakes Basin to other major catchments (e.g., Chicago Area Waterway System or the Erie Canal) could be a component of future prioritization models. Nearest neighbor analysis, network models, or particle transport models could be used to measure connectivity (Sieracki et al. 2014; Stewart-Koster et al. 2015; Beletsky et al. 2017; Kvistad et al. 2019).
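As a simple illustration of the nearest neighbor idea, the sketch below computes the straight-line distance from each site centroid to the closest connecting feature. All coordinates and feature names are hypothetical, coordinates are assumed to be in a projected system with units of kilometers, and straight-line distance is a simplification that ignores shorelines and currents; a network or particle transport model would be needed to capture those.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x_km, y_km) in an assumed projected system


def nearest_feature(site: Point, features: Dict[str, Point]) -> Tuple[str, float]:
    """Return the name of, and distance (km) to, the closest feature.

    Raises ValueError if `features` is empty.
    """
    name = min(features, key=lambda key: math.dist(site, features[key]))
    return name, math.dist(site, features[name])


# Hypothetical centroids of two adjacent 9 km x 9 km grid cells.
site_centroids: Dict[str, Point] = {
    "cell_1": (4.5, 4.5),
    "cell_2": (13.5, 4.5),
}
# Hypothetical connecting features (e.g. a canal entrance, a ballast uptake zone).
connections: Dict[str, Point] = {
    "canal_entrance": (4.5, 7.5),
    "ballast_zone": (16.5, 8.5),
}

proximity = {
    cell: nearest_feature(centroid, connections)
    for cell, centroid in site_centroids.items()
}
print(proximity)
```

A distance computed this way could then be converted into a connectivity term and combined with the propagule pressure score, although how to scale and weight such a term is left open here.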
Finally, recognizing that the Great Lakes are a shared resource and that a comprehensive regional surveillance program will require a binational approach consistent with the goals of the updated Great Lakes Water Quality Agreement (2012), we recommend that the model framework be extended to include Canadian waters of the Great Lakes. The current framework was developed with funding from the US government and thus was limited in scope to US waters. Surrogate data similar to what was used for the US framework are available for Canada and it should be possible to develop a comparable prioritization model for Canadian waters so that risk can be considered at a basin scale. We expect that patterns of risk in Canada are similar to those observed in US waters, with a high proportion of predicted propagule pressure coalescing around a small number of sites that are the nexus of multiple invasion pathways. Invasion risk at some Canadian sites is likely comparable to that of some of the highest ranked sites in US waters.

Conclusion
Models are useful for surveillance planning because they provide a framework that allows us to predict and compare future invasion risk based on existing pathway and species information and they can be updated as new information emerges (Wonham et al. 2013). Our model used available data for surrogates related to the dominant pathways for AIS introduction and estimates of the known or expected contribution of each pathway to past and predicted future Great Lakes' invasions to predict risk of AIS introduction for 3,487 sites spanning the US waters of the Great Lakes. Risk is concentrated in a few high-risk sites, but the relative risk from different taxa varies across these highest risk sites, allowing stakeholders to make decisions about which taxa to target at any given location. Recognizing that taxon-pathway associations are predicted to change over time, the model is designed so that surrogates and weightings can be easily updated as new information regarding potential AIS and associated pathways becomes available. The surrogates are geo-referenced, within a standardized 9 km × 9 km site, so priorities can be sorted to finer scale geographies. Surveillance can then be implemented by managers at a scale conducive to high detection sensitivity. The model should be considered a first step and a working model for Great Lakes surveillance site prioritization and planning that can be adaptively improved.

Supplementary material
The following supplementary material is available for this article:
Table S1. Sources used to compile a candidate inventory of "Future invaders".
Figure S1. Shipping data attributed to grid squares. Legend shows the number of ship visits (2004–2013) or in-lake discharge events (2004–2009).
Figure S2. Population data attributed to grid squares. Legend shows population count.
Figure S3. Ponds data attributed to grid squares. Legend shows number of ponds.
Figure S4. Marina and Boat Ramp Size data attributed to grid squares. Legend shows the combined number of marina boat slips and boat ramp parking spaces.
Figure S5. Canals and Headwater connections data attributed to grid squares. Legend shows the sum of risk weighted canal and headwater connections values in the upstream contributing area.
Figure S6. Frequency distribution showing the difference between ranks for each site (calculated as difference in ranks, future minus historic) by taxonomic group ("Inverts" = Invertebrates).