Ensemble habitat suitability modeling of vulnerable marine ecosystem indicator taxa to inform deep-sea fisheries management in the South Pacific Ocean

Resolutions of the United Nations General Assembly (UNGA) require states and competent authorities to protect vulnerable marine ecosystems (VMEs), ecologically important habitats in the deep sea that are considered to be especially at risk from anthropogenic disturbances such as fishing. The lack of data concerning the location and extent of VMEs poses a significant obstacle to their protection. Habitat suitability modeling is increasingly used in spatial management planning due to its ability to predict the distribution and niche of marine organisms based on limited input data. We generated broad-scale, medium-resolution (1 km) ensemble models for ten VME indicator taxa within the New Zealand Exclusive Economic Zone and a portion of the South Pacific Regional Fishery Management Organisation (SPRFMO) convention area. Ensemble models were constructed using a weighted average of three habitat suitability model types: Boosted Regression Trees, Maximum Entropy, and Random Forest. All models performed well, with area under the curve scores above 0.9, and ensemble models marginally outperformed any of the individual modeling approaches. Highly suitable habitat for each VME indicator taxa was predicted to occur in relatively small areas throughout the region, typically associated with elevated seafloor features with steep slopes. Determining the spatial distribution of VME indicator taxa is critical for assessing the current and historical extent of bottom trawling impacts on benthic communities, and for supporting the improved spatial management of fisheries in the South Pacific Ocean. Given the additional threats of climate change and ocean acidification to VME indicator taxa throughout the deep sea, habitat suitability modeling is likely to play an increasing role in designing effective, long-term protection measures for cumulative


Introduction
The demand for resources and advances in technology have greatly accelerated the rate of anthropogenic disturbances to even the remotest of deep-sea environments (reviewed in Ramirez-Llodra et al., 2011). Historically, the largest documented disturbances to deep-sea communities of the seafloor have occurred as the result of fishery activities (reviewed by Clark et al., 2016). The bulk of impacts have stemmed from bottom trawling practices (Puig et al., 2012;Pusceddu et al., 2014), which can cause extensive physical damage to benthic environments. However, long-line fishing gear has also been observed to disturb deep-sea habitats (Orejas et al., 2009). Bottom trawls were found to have severely damaged 30-50% of cold-water coral reef structures in some regions (Fosså et al., 2002), and have also been shown to considerably reduce the richness, diversity, and abundance of deep-sea soft sediment communities (Cryer et al., 2002). In addition to direct physical damage, bottom trawling resuspends sediments that can smother benthic filter feeding organisms even outside of the trawled area (Palanques et al., 2001).
In areas beyond national jurisdictions (the high seas) in the South Pacific Ocean, bottom trawl fisheries targeting deep-sea species such as orange roughy (Hoplostethus atlanticus), cardinalfish (Epigonus telescopus), black oreo (Allocyttus niger), and alfonsino (Beryx spp.) occur in several broad areas. The South Pacific Regional Fisheries Management Organisation (SPRFMO) is an intergovernmental agency established by nations obligated under United Nations General Assembly (UNGA) resolutions to ensure the sustainability of high seas fisheries and to safeguard marine ecosystems from adverse impacts. To meet these two objectives, management approaches have been designed to protect vulnerable marine ecosystems (VMEs) while simultaneously ensuring fishery access to areas suitable for sustainable fishing. VMEs are defined by their susceptibility to disturbance based on the fragility, functional significance, rarity, and life history traits of their component communities, populations, or species (FAO, 2009). Several VME indicator taxa have been identified for the South Pacific Ocean using these guidelines . The United Nations require that states and associated regional fisheries management organizations put in place adequate measures to prevent significant adverse impacts on VMEs, which can include closing areas to fishing (UNGA, 2007). In the SPRFMO convention area, two main interim management measures have been implemented for New Zealand-and Australian-flagged vessels to protect VMEs: a move-on rule that requires vessels to cease bottom trawling within 5 nautical miles of an encountered VME, and a network of large (20 min latitude by 20 min longitude) closures in previously unfished or low-impacted areas . However, the Australian and New Zealand implementations of the interim measures were different and not always complementary, and the efficacy of these measures has been questioned (Penney and Guinotte, 2013). The SPRFMO Commission has, therefore, indicated a desire for the interim measures to be replaced by a consistent and improved measure as soon as possible, principally through the design and implementation of more effective spatial closures. The design of spatial management measures requires information about the distribution or likely distribution of VME indicator taxa in the South Pacific Ocean.
Habitat suitability modeling (also known as species distribution modeling) is increasingly used to predict the niche and distribution of species based on limited data. This statistical approach relates field observations to environmental predictor variables, yielding predictions of the abundance or probability of presence of taxa and the underlying environmental drivers of their geographic distribution. Habitat suitability models are particularly useful for characterizing VMEs in the deep sea, where field surveys are logistically difficult and expensive and it is not feasible to directly observe the entire area of interest (Vierod et al., 2014). In this study, we build on previous habitat suitability modeling work (Anderson et al., 2016a(Anderson et al., , 2016bRowden et al., 2017), using an ensemble habitat suitability modeling approach to predict the niche and distribution of ten VME indicator taxa over an area that covers the fishing interests of both New Zealand and Australia in the western South Pacific. The model outputs were designed for use in a multi-country and multi-stakeholder spatial management planning process to design and implement new spatial closures in the region, for eventual adoption by SPRFMO.

Site description
The study area extends over a 21.6 million km 2 area of the South Pacific Ocean from approximately 24°S to 58°S and 143°E to 146°W. This area encompasses the New Zealand EEZ, and a large portion of the SPRFMO convention area that represents the main fishing areas of New Zealand and Australian fleets (Fig. 1). We set the spatial resolution of the models to 1 km 2 (see Table 1) to match that of the main sampling methods for the species presence records; trawls, sleds and dredgeswhich had a median tow length of about 0.8 km (Anderson et al., 2016a).

Occurrence sampling
Ten VME indicator taxa were selected for modeling. These included two sponge taxa; glass sponges (Hexactinellida) and demosponges (Demospongiae), and four species of framework-building scleractinian corals; Enallopsammia rostrata, Goniocorella dumosa, Madrepora oculata, and Solenosmilia variabilis. Soft corals (Alcyonacea), sea pens (Pennatulacea), and black corals (Antipatharia) were modeled as larger taxonomic groups rather than at the species level. Hydrozoans in the family Stylasteridae, commonly known as hydrocorals, were modeled as a single taxonomic group. These taxa have all been identified as VME indicator taxa based on their vulnerability to trawls (fragility, rareness, or low productivity), ecological significance (functionally significant via habitat creation or ecological services), and ability to be readily identified from trawl collections . Presence data for the ten modeled taxa were obtained from New Zealand and Australian museum/collection records, fisheries research databases, and online biodiversity databases. Over 24,000 individual records were compiled, but these were unequally distributed among the taxa with numbers ranging from 203 (Madrepora oculata) to 5345 (Alcyonacea) ( Table 2). Duplicate occurrences were removed prior to further analyses.

Background sampling
For the background (or pseudo-absence) data locations used in the models, we used a combination of the concepts of random selection and target-group background sampling (Phillips et al., 2009) to generate a spatially structured set of background points. While commonly not accounted for in habitat suitability models, spatial bias in the sampling of occurrence data considerably weakens the performance and interpretability of models (Phillips et al., 2009). However, it is possible to reduce the effects of sampling bias by selecting or creating a set of background data that reflect the same bias as the occurrence data (Phillips et al., 2009). We were unable to construct target-group background data directly (i.e., from the locations of a broad set of species presence records based on similar observation methods) due to the wide range of data sources supplying the presence data. Given the relatively high sampling bias observable in the observed presences, we generated a structured sample of background points for each taxon, based on the distribution of presence records. To preferentially generate background points in areas with more presence observations, we first created a two-dimensional kernel density estimate of sampling effort based on the presence locations for each taxon (e.g., Supplemental Fig.  S1a). This estimate created a probability grid from which 10,000 background points were sampled according to the probability grid weights (e.g., Supplemental Fig. S1b). Previous studies using a similar approach to reduce the influence of sampling bias found that model performance was significantly improved (Elith et al., 2010;Huang et al., 2011;Fitzpatrick et al., 2013).

Environmental variables
A suite of environmental variables (Table 1) that may potentially influence marine benthic organism distributions were generated using a similar approach to that described by Davies and Guinotte (2011), based on depth data derived from a recently updated bathymetry grid for the New Zealand region, which we gridded at a resolution of 1 km 2 (Mackay et al., 2015). Additional seafloor terrain metrics were also derived from this bathymetry using the Benthic Terrain Modeler in ArcGIS 10.3.1.1 (Wright et al., 2012). Locations were defined as being on a seamount if they were within the footprint of the deepest complete depth contour around any of the seamounts recorded in the NIWA Seamounts database (Rowden et al., 2008) or identified by Yesson et al. (2011). The percent mud and percent gravel layers for the region were developed from > 30,000 raw sediment sample data compiled in dbseabed (Jenkins, 1997), which were then imported into ArcGIS and interpolated using Inverse Distance Weighting.
Dissolved oxygen (Garcia et al., 2014b), apparent oxygen utilization (Garcia et al., 2014b), percent oxygen saturation (Garcia et al., 2014b), temperature , salinity , nitrate (Garcia et al., 2014a), phosphate (Garcia et al., 2014a), and silicate (Garcia et al., 2014a) data were obtained from the World Ocean Atlas (version 2, 2013). The saturation states of the aragonite and calcite polymorphs of calcium carbonate were obtained from Bostock et al. (2013). Oxygen and nutrient data were originally available at a S.E. Georgian et al. Fisheries Research 211 (2019) 256-274 resolution of 1°. Salinity and temperature were available at a resolution of 0.25°. Carbonate data were available at a resolution of 0.5°. These variables were obtained as depth-gridded data ranging from 0 to 5,500 m, and were transformed to match the resolution of the bathymetry data (1 km 2 ) using an upscaling approach that approximates conditions at the seafloor (sensu Davies and Guinotte, 2011). Particulate organic carbon (POC) flux at the seafloor was obtained from Lutz et al. (2007) at a native resolution of 0.08°. Surface productivity data were obtained from the Oregon State Ocean Productivity site (http://www.science.oregonstate.edu/ocean.productivity) as the Standard Vertically Generalized Production Model (VGPM; Behrenfeld and Falkowski, 1997), the Eppley-VGPM (Eppley, 1972), and the updated Carbon-based Production Model (CbPM, Westberry et al., 2008). All products were derived from NOAA's Aqua MODIS data ranging from mid-2002 through 2016, and calculated as the mean, minimum, maximum, and standard deviation over that period (Aqua MODIS, 2014). These layers had a native resolution of 0.167°, and were interpolated to a slightly higher resolution (0.05°) through kriging and then resampled to match the extent and resolution of the bathymetry data (1 km 2 ).

Variable selection
Our aim in selecting variables was to identify a base set of predictors for use in each model, modified by specific biological requirements of some taxa, and among which correlations were minimized. The inclusion of correlated environmental variables may inhibit model performance and interpretation (e.g., Huang et al., 2011). Therefore, highly correlated variables were removed based on their relationship with other variables and on their performance in preliminary random forest models. First, all variables were placed into ecologically relevant groupings (see Table 1): seafloor characteristics (terrain metrics and sediment type), water chemistry (nutrients, carbonate chemistry, temperature, salinity, sigma theta, and oxygen metrics), and productivity (POC export and surface productivity). Then a random forest model was constructed for each taxon using the variables in each grouping, and variable performance was assessed as the mean decrease in accuracy (% Inc. MSE) scaled to 100% (Supplemental Fig. S2). Among highly correlated (> 0.7 Pearson's Correlation; Supplemental Fig. S3) variables that clustered together (Supplemental Fig. S4), the variable with the highest performance across all taxa was retained. If performance was equivalent (i.e., within approximately 10%) within a group of highly correlated variables, we also took into account the clustering and correlation of each variable with the remaining variable set, as well as biological relevance. Equivalent variables were preferentially retained if they clustered separately and had lower correlations with the remaining variable set. Depth was removed prior to these preliminary S.E. Georgian et al. Fisheries Research 211 (2019) 256-274 model runs due to its lack of biological relevance apart from strong correlations with other variables. Aragonite saturation state was only included for taxa that incorporate calcium carbonate in the form of aragonite into skeletal materials (i.e., Stylasteridae, Enallopsammia rostrata, Goniocorella dumosa, Madrepora oculata, Pennatulacea, and Solenosmilia variabilis), and calcite saturation state was only included for taxa that incorporate calcite (Alcyonacea and Antipatharia). Given the strong correlation between aragonite and calcite (Pearson's correlation, r > 0.999, p < 0.001), these variables can be expected to perform identically in models, but were distinguished among taxa to make biological interpretations easier. Silicate was included for Hexactinellida and Demospongiae because these taxa incorporate silicate into their skeletons, but was omitted for all other taxa. Temperature was included in all models due to its biological relevance and high performance in previous cold-water coral modeling studies (e.g., Davies and Guinotte, 2011). The correlations and clustering of variables in the final retained set are shown in Fig. 2 (see Supplemental Table S1 for Pearson's correlation coefficient values). Supplemental Fig. S5 shows the relationship of occurrence points, targeted background points, and random background points with each environmental variable in the final set.

Boosted regression tree models
Boosted Regression Tree (BRT) is a presence-absence (as opposed to presence-background) data modeling approach, that can also be used to   model abundance, and is often used to predict distributions of deep-sea taxa (e.g., Rooper et al., 2017). The model calculates habitat suitability probabilities by incorporating recursive binary splits within a regression tree structure to explore the relationship between response and predictor variables . We fitted BRT models using a standard approach, including optimization of the learning rate and number of trees by internal cross validation , and adjustment of the learning rate as necessary to ensure a minimum number of 1000 trees. Tree-complexity was set to three to allow a moderate level of interactions between terms. A random selection of absence records, equal in number to the number of presence records, was selected from the sampling bias-corrected set of 10,000 background points generated for each taxon, following the recommendations of Barbet-Massin et al. (2012). The relative influence of each variable was determined from the number of times it was selected for splitting, weighted by the squared improvement to the model as a result of each split (using in-bag data) (Friedman and Meulman, 2003).

Maximum entropy models
The Maximum Entropy (Maxent) modeling algorithm (Phillips et al., 2006) is a machine learning, presence-only model that is based on the statistical assumption that within known environmental constraints, maximizing entropy is the optimal method to address remaining uncertainty in a distribution. Maxent has been shown to outperform other presence-only models (Elith et al., 2006;Tittensor et al., 2009) and presence-absence models (Reiss et al., 2011). All Maxent models were run in an R environment using the 'dismo' package (Hijmans et al., 2015). Models were built using default Maxent settings (convergent threshold = 10 −5 , prevalence = 0.5, regularization multiplier = 1) that have performed well in previous studies (see Phillips and Dudik, 2008). However, the number of maximum iterations was increased to 5000 to ensure model convergence, as preliminary models did not always converge within the default 500 iterations. In addition, instead of generating random background points, models were trained using the sampling bias-corrected set of 10,000 background points. The standard jackknifing procedure was used to determine the percent contribution of environmental variables to the model.

Random forest models
Random forest (RF) modeling (Breiman, 2001) is a classification and regression tree-based approach similar to boosted regression trees. However, it chooses a random selection of explanatory values at each split and employs an internal bootstrapping approach to produce and combine a large number of individual trees. Models were built using the 'randomForest' package (Liaw and Wiener, 2002) in R. In most cases, the error rate stabilized after approximately 100 trees, however 501 trees were generated for each taxon to ensure stabilization in all model runs. Optimal values of 'mtry', which determines the number of explanatory variables selected at each tree split, were explored using the 'tuneRF' function and preliminary models. Ultimately, the default value (mtry = 3, the square root of the number of explanatory variables) was used due to its consistent performance across all taxa. Classification trees are sensitive to imbalances in the response variable (Chen et al., 2004;Evans et al., 2011) so, as in the BRT models, the number of background points was set equal to the number of occurrences for each taxon, randomly selected from the 10,000 set of sampling bias-corrected background points. This sampling approach has been found to optimize model performance for both BRT and RF models (Barbet-Massin et al., 2012).

Spatial autocorrelation
Spatial autocorrelation occurs when there is a relationship between observations and geographic distance (Legendre, 1993). Environmental variables are frequently spatially autocorrelated (Lichstein et al., 2002), and observed species distributions often exhibit strong spatial  Table 2). See Supplemental Table S1 for actual Pearson's correlation coefficient values. S.E. Georgian et al. Fisheries Research 211 (2019) 256-274 autocorrelation due to sampling bias or natural processes such as dispersal limitations (Lennon, 2000). The majority of spatial analyses do not account for spatial autocorrelation (Dormann, 2007) despite the fact that it typically violates the underlying assumptions of many models (Kühn, 2007) and can considerably affect model performance and the importance of explanatory variables (Tognelli and Kelt, 2004).
To account for spatial autocorrelation, we calculated a residual autocovariate (RAC) variable for each taxon (sensu Crase et al., 2012). The RAC method derives a term representing the spatial autocorrelation present in the residuals of a preliminary model. This term is then added as an explanatory variable in subsequent models, accounting for spatial autocorrelation present in the initial occurrence data. The RAC approach has been used to correct for spatial autocorrelation in classification models including boosted regression trees and random forest (Crase et al., 2012;Rowden et al., 2017) as well as in Maxent (Václavík et al., 2012).
To calculate the RAC variable, preliminary models were built using environmental variables alone, and the model residuals were tested for spatial autocorrelation using Moran's I (Moran, 1948). Model residuals were rasterized at the same resolution of the prediction grids (1 km 2 ). Next, the autocovariate was calculated based on the focal mean of these residuals, in which the value for each output cell was the average of the input cells within the specified window size. The resulting RAC was then included in models as an explanatory variable. In Maxent models, RAC was calculated for E. rostrata, Hexactinellida, Pennatulacea, and S. variabilis using a 5 × 5 focal mean window (indicating the number of grid cells used to calculate the focal mean) rather than the default 3 × 3 window (Crase et al., 2012), because initial tests revealed highly inflated importance (∼80-90%) of the RAC variable in these models. Increasing the size of the focal mean window reduced the importance of RAC while still resulting in the reduction of spatial autocorrelation among model residuals. The same was true for BRT models, where a 3 × 3 window was used for Antipatharia, Hexactinellida, and Pennatulacea, a 7 × 7 window was used for E. rostrata and M. oculata, and a 5 × 5 window was used for the remaining taxa. For random forest models, a 3 × 3 focal window was used for all taxa, as the highest percent contribution of RAC was relatively low (∼30%) without adjusting the window size. In BRT and RF models, the inclusion of the RAC variable considerably reduced spatial autocorrelation in model residuals as measured by Moran's I. In Maxent models, spatial autocorrelation was originally lower and the RAC variable reduced spatial autocorrelation to a lesser extent than in BRT and RF models, consistent with previous studies showing that Maxent is naturally robust to underlying spatial structures in the input data (De Marco et al., 2008;Václavík et al., 2012).

Model performance
The BRT, Maxent, and RF models were verified using a ten-fold cross-validation procedure in which input data were randomly partitioned into training (70%) and testing datasets (30%) for ten model runs. The area under the curve (AUC) metric was calculated for each model and averaged across all ten runs. AUC is a threshold-independent assessment of the ability of the model to correctly rank occurrences above background locations. A random model has a theoretical AUC of 0.5, while AUC values greater than 0.7 indicate adequate performance, and values greater than 0.8 indicate excellent performance (Wiley et al., 2003;Hosmer et al., 2013). Final models were constructed using all available data. A separate AUC score was calculated for each final model using the entire training dataset to facilitate direct comparisons to the ensemble models (see below). As an additional measure to compare each model to the ensemble model, we also calculated the correlation between predicted habitat suitability and expected values based on the entire training dataset.
To ascertain the spatial variation in model uncertainty, we employed a bootstrap approach to generate coefficient of variation estimates for every cell within the study area (sensu Anderson et al., 2016b). For each modeling type, 200 models were run by randomly sampling occurrences and background points with replacement equal to the original number of occurrences and background points. The mean and standard deviation of each cell was calculated from this set of 200 models, and the coefficient of variation was calculated as the mean divided by standard deviation, reported as a percentage. This approach yields a spatial estimate of uncertainty based on how sensitive the models are to changes in the input presence and background data.

Ensemble modeling
An ensemble of models with fundamentally different approaches to the same problem is a practical way of avoiding dependence on a single model type or structural assumption, and enables a more robust characterization of the predicted spatial variation and uncertainties (Robert et al., 2016). To incorporate the predictions and underlying assumptions of BRT, Maxent, and RF into a single output grid, we produced ensemble models by calculating weighted averages of the three models (after Oppel et al., 2012;Anderson et al., 2016b;Rowden et al., 2017) as follows: where AUC B , AUC M , and AUC R are the model performance statistics (average cross-validated AUC values); X B , X M , X R , and X E the model predictions; and CV B , CV M , CV R , and CV E the bootstrap CVs from the BRT, Maxent, RF, and ensemble models respectively. The performance of ensemble models was calculated as AUC scores calculated using the entire training dataset rather than through a cross-validation procedure using test data, and as the correlation between expected (observed presence or assumed absence) and predicted values.

Model outputs
Model outputs were limited to depths of 300 to 3000 m, approximately equivalent to the combined bathymetric range of all modeled taxa, covering a reduced area of 4.5 million km 2 . For each modeling type (BRT, Maxent, RF, and ensemble), habitat suitability scores were calculated across the study area at a resolution of 1 km 2 . In the context of our modeling approach, these values represent the relative predicted suitability for each taxon ranging from 0 to 1, with higher values indicating a more suitable habitat. While these scores are commonly and at times erroneously referred to as the 'probability of occurrence' in the literature, it is important to note that our habitat suitability scores do not represent the actual probability of occurrence due to the lack of true absence data and unknown efficiency of the sampling gear. In the following text, we use the following arbitrary divisions for interpreting habitat suitability scores: < 0.4=low suitability, 0.4-0.8=moderate suitability, and > 0.8=high suitability.

Final variable set
Of the 30 variables considered for inclusion in models (Table 1), 13 were ultimately retained by the variable selection process (Fig. 2; Table 2). Nine of these variables: broad-scale bathymetric position index (BPI), dissolved oxygen, percent gravel, percent mud, particulate S.E. Georgian et al. Fisheries Research 211 (2019) 256-274 organic carbon flux to the seafloor (POC), occurrence on or off seamounts, standard deviation of slope (slope-SD) calculated with a window size of 15 × 15, temperature, and vector ruggedness measure (ruggedness) calculated with a window size of 7 × 7 were included in all models. The saturation states of aragonite and calcite were included only for coral taxa that incorporate one of these calcium carbonate polymorphs, and silicate was included only for sponge taxa that incorporate silicate into their skeleton (Table 2). In addition, the RAC variable, although not ecologically interpretable, was included in all models to reduce the spatial autocorrelation of model residuals. Therefore, each model was built using a total of 11 variables. The final set of variables generally had lower correlations than the initial variable set (see Supplemental Figs S3 and S4), however some of the retained variables still had correlations greater than 0.7 ( Fig. 2; Supplemental Table S1). Aragonite, calcite, and silicate were all highly correlated. However, only one of these variables was included in each model in accordance with its biological relevance for each taxon. Temperature was highly correlated with aragonite (Pearson's correlation, r = 0.960, p < 0.001), calcite (Pearson's correlation, r = 0.959, p < 0.001), and silicate (Pearson's correlation, r=-0.876, p < 0.001), but was retained in the final set due to its known biological relevance. Dissolved oxygen and POC were also correlated with other variables including aragonite, calcite, silicate, and temperature, but were retained because the correlations were lower (< 0.8), and because they not grouped together in the cluster analysis (Fig. 2).

Model performance
Cross-validation testing suggested that all models for all taxa had excellent performance, with AUC scores above 0.9. However, R 2 values were more variable, ranging from 0.14 to 0.64 for the ensemble models (Table 3). Unlike AUC, R 2 is dependent on scale and will tend to be lower where probabilities are close to zero. Accordingly, R 2 values were higher for taxa with more presence records (e.g., Alcyonacea and Antipatharia) than for taxa with fewer presences (e.g. M. oculata and G. dumosa). AUC scores for BRT models ranged from 0.961 to 0.998, with a mean of 0.978 ± 0.014 across all taxa. Maxent model AUC scores were consistently the lowest of the three modeling approaches, ranging from 0.913 to 0.990 with a mean of 0.964 ± 0.026 across all taxa. RF model AUC scores ranged from 0.951 to 0.993 with a mean of 0.976 ± 0.014 across all taxa. The worst performing model for an individual taxon across the three modeling approaches was Pennatulacea, with a minimum AUC of 0.913 and a mean of 0.948 ± 0.031. Models for S. variabilis consistently performed the best, with a mean AUC of 0.994 ± 0.004 across the three modeling approaches. The sample size (number of presences) for each taxon (Table 2) was not significantly correlated with AUC (Linear regression, R 2 = 0.03, p = 0.33).

Variable contributions
The BRT and RF models had similar percent contributions of predictor variables, while the Maxent models typically had higher percent contributions from variables associated with skeletal chemistry (i.e., calcite, aragonite, and silicate) (Table 4). Variables describing physical characteristics of the seafloor, the saturation state of aragonite or calcite, and RAC were typically the most important variables. While seafloor characteristics (BPI, seamount, slope SD, and ruggedness) generally did not contribute a large percentage of information to the models on their own, their combined contribution when considered together was substantial, ranging between 8.5% and 51.8%. Ruggedness was frequently the best predictor in this group, with percent contributions as high as 36.3%, followed by BPI with a maximum contribution of 23.7%. Taxa were generally located in areas with more complex, steep, and elevated terrain features than expected by chance (Supplemental Figs. S5, S11-S20). When the saturation state of calcite and aragonite where included in models, they typically had a large percent contribution, with species demonstrating a preference for higher saturation states (Supplemental Figs. S11-S20). Calcite contributed an average of 28.2% and 21.0% in models for Alcyonacea and Antipatharia, respectively, across all model types. The average percent contribution of aragonite, when included, ranged from 6.7 to 17.7% across all model types, and contributed an overall average of 12.0% across all taxa and model types. Models for the aragonite-forming taxa that had a relatively low percent contribution from aragonite often had an elevated percent contribution from temperature. Aragonite and temperature were highly correlated across the study region (Pearson's correlation, r = 0.96, p < 0.001; Fig. 2; Supplemental Table S1), making it difficult to tease apart their influence in models. Temperature contributed 4.0-18.3% of total explanatory power when averaged across all model types, and was most important for G. dumosa, Pennatulacea, and S. variabilis. Most taxa occurred predominantly within a narrower temperature range than expected by chance based on the available habitat space (Supplemental Fig. S5), with Demospongiae, Pennatulacea, and Stylasteridae having the largest observed range. For the two sponge taxa, silicate contributed an average of 19.8% (Demospongiae) and 14.0% (Hexactinellida), with both taxa preferentially located in areas with intermediate concentrations (Supplemental Figs. S5, S13, S16). However, detection of the real drivers of sponge distribution may be confounded by the relatively high correlation of silicate with dissolved oxygen (Pearson's correlation, r=−0.72, p < 0.001), POC (Pearson's correlation, r=−0.68, p < 0.001), or temperature (Pearson's correlation, r=−88, p < 0.001). Both dissolved oxygen and POC generally contributed between 3-7% when averaged across all model types. However, these variables were somewhat correlated with each other as well as with aragonite, calcite, and temperature (Fig. 2). In general, taxa occurred in areas with higher POC concentrations than expected by chance, and at intermediate dissolved oxygen concentrations (Supplemental Figs. S5, S11-S20). RAC contributed between 9.1-75.2% of information to the models. BRT models generally had the largest percent contribution of RAC, with a mean of 46.3% across all taxa and the largest percent contribution of 75.2%. Maxent and RF models both had lower contributions of RAC, with means of 24.8% and 22.6%, respectively. Substrate type, quantified as percent gravel and percent mud, generally contributed less than 10% of information to the models for all taxa. While taxa generally occurred across all values of these predictors (Supplemental Figs. S5, S11-S20), there was a general trend towards areas with a lower percentage of mud and a higher percentage of gravel than expected by chance. Being located on or off a seamount generally contributed negligibly to models, with an average percent contribution of 3.0% across all taxa and model types. Taxa were most frequently predicted to be distributed on or near seamounts, although this may partially reflect sampling bias in the occurrence data.

Table 3
Performance metrics for Boosted Regression Trees (BRT), Maximum Entropy (Maxent), and Random Forest (RF) for each taxon, assessed by the area under the curve (AUC) score measured using ten-fold cross-validation on a 30% test dataset withheld from model creation. Larger AUC scores (closer to 1) indicate better model performance.

Ensemble modeling
A single ensemble model was constructed using a weighted average of the BRT, Maxent, and RF models for each taxon (Figs. 3-6). The weighted ensemble models were generally a nearly equal mixture of the three model types for each taxon, as each approach yielded similar AUC values for each taxon (Table 3). The ensemble models produced similar patterns, as the three modeling approaches yielded comparable habitat suitability predictions, with the Pearson's correlation between model types for the same taxon generally being 0.7 or higher (Supplemental Table S2). In general, the BRT and Maxent models were more highly correlated with an average correlation of 0.894 across all taxa, and the RF and Maxent models were less correlated with an average correlation of 0.812. The three modeling approaches exhibited similar spatial patterns, with the biggest absolute differences between each approach typically occurring within areas of predicted high suitability (see Supplemental Fig. S9). However, the model types differed considerably in the scale of predicted habitat suitability scores (Supplemental Fig. S10). BRT consistently predicted higher suitability scores across all taxa, with more extensive areas predicted to have high suitability scores (> 0.8) than the other models (Fig. 3). Maxent generally predicted the lowest habitat suitability scores (Fig. 4), and RF frequently predicted a larger area of intermediate suitability (Fig. 5).
Spatial uncertainty (measured as the coefficient of variation, CV) differed somewhat among the three modeling approaches (see Supplemental Figs, S6-S8). While the models generally exhibited a similar spatial pattern in uncertainty (therefore leading to similar patterns in ensemble models), BRT models typically had the largest CVs, followed by RF and then Maxent. The ensemble model for each taxon generally outperformed any of the individual modeling approaches as assessed by AUC scores calculated using the entire dataset and the correlation between predicted and expected values. However, the RF models marginally outperformed the ensemble model in some cases (Table 5).

Spatial distributions
The ensemble models (Fig. 6) predicted that highly suitable (scores > 0.8) habitat for each taxon occurs in various regions throughout the New Zealand EEZ as well as in portions of the SPRFMO convention area that were included in the study area (Fig. 1). For most taxa, predicted concentrations of highly suitable habitat were located on areas of the Chatham Rise, Kermadec Ridge, Colville Ridge, South Tasman Rise, Macquarie Ridge, Norfolk and West Norfolk Ridges, Lord Howe Rise, Three Kings Ridge, Louisville Seamount Chain, and along much of the continental shelf edge of Australia and the east coast of New Zealand's North Island (Fig. 6). Sea pens (Pennatulacea) had the largest predicted distribution of highly suitable habitat (Fig. 6h), with suitability predicted to be above 0.6 in most of these areas, the Chatham Rise in particular. Soft corals (Alcyonacea) and black corals (Antipatharia) generally exhibited a spatial pattern comparable to sea pens, but with lower predicted habitat suitability scores (Fig. 6a,b and Supplemental Fig. S10) and slightly higher CVs (Fig. 7a,b). The two sponge taxa (Demospongiae and Hexactinellida) had the lowest CVs across the study area (Fig. 7c,f), with habitat suitability predictions similar to but not quite as high as the predictions for sea pens (Fig. 6h). Models for the four species of reef-forming stony corals (E. rostrata, G. dumosa, M. oculata, and S. variabilis) typically predicted lower habitat suitability than for other taxa (Supplemental Fig. S10), with highly suitable habitats clustered on the southeast corner of Chatham Rise, Kermadec Ridge, Colville Ridge, and West Norfolk Ridge (Fig. 6). On the Louisville Seamount Chain, stony corals typically had lower habitat suitability scores than other taxa, with predictions generally showing that only the upper flanks and tops of seamounts were moderately or highly suitable (Fig. 8). In addition, the stony coral taxa exhibited the highest CVs, although model predictions were typically more robust in areas predicted to be more suitable (Fig. 7). Hydrocorals (Stylasteridae) followed a similar spatial pattern to the stony corals, but with generally    S.E. Georgian et al. Fisheries Research 211 (2019) 256-274 higher suitability scores (Supplemental Fig. S10), especially along the Louisville Seamount Chain (Fig. 8j), and lower CVs (Fig. 7j).

Spatial distribution patterns and niche space
We produced habitat suitability models for ten VME indicator taxa in order to inform the spatial management of bottom trawl fisheries in the SPRFMO convention area. Areas of suitable habitat were predicted to occur throughout the region for each taxon, but the most highly suitable habitat was generally clustered in patches covering a relatively small fraction of the total area studied. The patchy distribution of suitability predicted for each taxon highlights the difficulties in drawing broad conclusions from limited field observations over vast areas of seafloor, and emphasizes the role of modeling to fill in key knowledge gaps.
While there were variations among the modeled taxa, habitat suitability maps generally predicted high suitability of at least one of the ten VME indicator taxa across a range of distinct areas, many of which support important bottom trawl fisheries, distributed across the SPRFMO convention area and the New Zealand EEZ. These areas included the Chatham Rise, Kermadec Ridge, Colville Ridge, South Tasman Rise, Macquarie Ridge, West Norfolk Ridge, and Louisville Seamount Chain. Terrain metrics, when considered in concert, generally contributed considerably to the models for each taxon. Topographical features have long been known to be important drivers of cold-water coral and sponge distributions, and many modeling studies have relied solely on terrain metrics as predictor variables (e.g., Dolan et al., 2008;Rengstorf et al., 2012;Tong et al., 2013;Rowden et al., 2017). As filter feeders, cold-water corals and sponges generally exhibit strong preferences for complex and elevated seafloor features that experience locally accelerated currents (Masson et al., 2003;Bryan and Metaxas, 2006). Amplified current speeds improve habitat quality by increasing food supply and capture rate (Thiem et al., 2006;Purser et al., 2010), larval delivery (Piepenburg and Müller, 2004), and the removal of sediments (Rogers, 1994). The availability of hard bottom substrate is generally considered critical for the successful recruitment of many coral species (Gass and Roberts, 2006). Therefore, it is surprising that the substrate variables (percent gravel and percent mud) were not more important in the models, as has been observed in previous studies (e.g., Georgian et al., 2014). However, it is possible that the use of broad scale, interpolated sediment data concealed the confluence of substrate and topographic features at finer scales, for example, small carbonate outcrops and boulders that are known to support cold-water corals (e.g., Baco et al., 2010;Ross and Howell, 2012). The 1 km 2 grid resolution used in our analysis may have also been too broad to capture small-scale variation in substrate type and topographical features, even in areas with high sampling intensity.
Particulate organic carbon (POC) export to the seafloor, the saturation state of aragonite and calcite, temperature, and dissolved oxygen all contributed a small to moderate amount of explanatory power to the models for each taxon. POC flux to the seafloor serves as a proxy for the supply of food to cold-water corals and sponges, which are most frequently found in highly productive waters (Roberts et al., 2009;Wagner et al., 2011). The saturation states of aragonite and calcite generally had moderate to high importance when included in models, particularly for the four species of stony corals, which tended to occur in areas with higher saturation states. High saturation states make it energetically easier for organisms to secrete calcium carbonate skeletal materials, and have been demonstrated to be vital for the growth and survival of cold-water corals in experimental (Georgian et al., 2016b;Kurman et al., 2017), field (Lunden et al., 2013;Georgian et al., 2016a), and modeling (Tittensor et al., 2009;Davies and Guinotte, 2011;Quattrini et al., 2013) studies. Given that most occurrence records occurred within a relatively narrow temperature range, as has been commonly observed in other cold-water coral Cohen et al., 2006;Georgian et al., 2014) and sponge (Leys et al., 2004) studies, it is not surprising that temperature was also an important driver of predicted distributions, especially notable in the four branching coral species. Our models also indicated a general preference for moderate to high levels of dissolved oxygen, indicating that regions with low dissolved oxygen concentrations may be untenable for recruitment or long-term survival. Previous studies have also found that cold-water corals and sponges are generally absent in regions with low concentrations of dissolved oxygen (Freiwald, 2002;Leys et al., 2004;Wisshak et al., 2005). Experimental observations confirm that coldwater corals are sensitive to changes in both temperature and dissolved oxygen (Dodds et al., 2007;Lunden et al., 2014;Naumann et al., 2014). However, it is still uncertain to what extent dissolved oxygen structures their spatial distribution, especially given the extensive cold-water coral populations that occur in oxygen minimum zones in the Gulf of Mexico (Georgian et al., 2016a) and the North Pacific (Baco, 2007). It is important to note however, that due to the relatively high correlations of each of these variables with each other, as well as with depth, it is difficult to discern their biological relevance based solely on field observations and modeling results.

Previous modeling studies in the region
The models presented here expand and improve on previous efforts to model VME indicator taxa in the South Pacific Ocean. Global habitat suitability models have predicted that extensive suitable habitat exists throughout the New Zealand region for stony corals (Tittensor et al., 2009;Davies and Guinotte, 2011), black corals (Yesson et al., 2017), and octocorals . On the regional scale, Tracey et al.

Table 5
Performance metrics for Boosted Regression Trees (BRT), Maximum Entropy (Maxent), Random Forest (RF), and ensemble models for each taxon, assessed by the area under the curve (AUC) score calculated using the entire occurrence and background dataset, and by the correlation (R 2 ) between expected and predicted values. Larger AUC and R 2 values (closer to 1) indicate better model performance. Rise, Tasman Rise, and Louisville Seamount Chain, a spatial pattern similar to our results for the same species. The most important predictors in their models were aragonite saturation state, dissolved oxygen, particulate organic carbon, and temperature, which also contributed moderate to large amounts of explanatory power to the models in our study. Interestingly, as also seen in our models, their BRT models predicted higher suitability scores than the Maxent models, but exhibited similar spatial patterns, suggesting that this may be a trait inherent to these modeling types and highlighting the benefits of using an ensemble modeling approach. However, the authors noted that errors in the SRTM30 bathymetry data, sampling approach, lack of substrate data, and spatial bias may have negatively affected their models -as demonstrated by subsequent field validation efforts. Building from that study, Anderson et al. (2016b) used BRT and Maxent to model suitable habitat for a wide range of VME indicator taxa in the New Zealand EEZ and immediately adjacent areas. Their habitat suitability models generally show similar spatial patterns as the results in this study for the same taxa, but typically predicted a much larger area of highly suitable habitat for most taxa. For example, highly suitable habitats for S. variabilis, E. rostrata, G. dumosa, M. oculata, Stylasteridae, and Antipatharia were predicted to occur extensively on and around the Campbell and Bounty plateaus, while in our study suitable habitats in this region were generally constrained to small patches. Interestingly, Pennatulacea also exhibited the lowest model performance in their study, while similarly having the largest predicted distribution of suitable habitat across the region. Previous studies have found that more generalized species with large areas of occupancy (in this case, likely driven by the ability of Pennatulacea to occupy soft bottom substrate on a variety of terrain features), corresponded to worse model performance (Araújo and Williams, 2000;Segurado and Araújo, 2004). However, neither Anderson et al. (2016b) or our study found a significant correlation between sample size and model performance, indicating that this result was not simply an artefact of our available data. An important difference between the two studies (which used a similar set of presence data) was the use of a much larger set of background/absence records by Anderson et al. (2016b) (10,000) based on true target-group observations. A more recent habitat suitability modeling study focused efforts on the Louisville Seamount Chain, an area historically and currently subjected to a bottom trawl fishery that largely targets orange roughy. Based on a camera and bathymetry survey, a suite of high-resolution (25 m) ensemble models were constructed for S. variabilis and two VME habitat indicator taxa, Brisingida and Crinoidea, for six of the chain's 80+ seamounts, using true presence-absence data (Rowden et al., 2017). The authors predicted that only small, rare patches (< 0.1% of the total modeled area) on the flanks of seamounts could be characterized as VMEs, based on the density thresholds specified by New Zealand's Environmental Protection Agency through domestic regulations (see Penney et al., 2009). The abundance and probability of presence of S. variabilis was predicted to be low across much of the seamounts. In contrast, the S. variabilis model produced in our study predicted extensive suitable habitat throughout the Louisville Seamount Chain, with the most suitable habitat frequently occurring on both the tops and sides of seamounts. This discrepancy may indicate that our models generally over-predict VME species occurrence on smaller scale topographical features and that, in general, habitat suitability values from models using pseudo-absence background data should be treated as relative rather than absolute quantities. It is also likely that their use of abundance data and true absence data contributed to an improved ability to classify an appropriate threshold for VME designation on the seamount scale. Previous models (Anderson et al., 2016a) also had apparent accuracy issues in the Louisville Seamount Chain based on subsequent field validation surveys, suggested to have resulted not only from errors in the bathymetry data but also from the lack of true absences, lack of substrate data, and the unique topography of the seamounts compared to the areas that predominantly influenced model training. In particular, the lack of high quality substrate data may be hampering modeling efforts at the seamount scale, as may be a lack of precision in upscaled transforms of critical predictor variables with low native resolutions (e.g., silicate (1°) and carbonate parameters (0.5°); Table 1). Field surveys have found that seamounts in the Louisville Seamount Chain frequently have relatively flat tops with sandy substrate that would not be suitable for larval recruitment of stony corals (Anderson et al., 2016a). While our models were trained using substrate data (percent gravel and percent mud), the raw data were generally concentrated within the New Zealand EEZ and interpolated elsewhere. Therefore, we likely failed to accurately depict substrate at a biologically relevant scale in the Louisville Seamount Chain. Regardless of the causal mechanism, these results highlight the importance of pairing broad-scale regional models with high-resolution models in order to better inform spatial management and produce protection recommendations that can be implemented at appropriate scales.
The habitat suitability modeling in this study represents a considerable improvement over previous regional efforts. In particular, many of the potential sources of error outlined in Anderson et al. (2016a) have been addressed, and we followed many of their recommendations for future studies. Environmental variables were constructed using new bathymetry data that is not prone to the errors and bias found in the SRTM30 data used in previous studies. In addition, we were able to include variables that characterize substrate, which is often a critical factor in determining the distribution of cold-water corals (e.g., Georgian et al., 2014). The modeling techniques used were also more sophisticated. We used a targeted background sampling approach designed to eliminate the deleterious influence of spatial bias in the occurrence datasets. Spatial bias reduces the performance and interpretability of models (Phillips et al., 2009), but can largely be accounted for using our targeted sampling approach (Elith et al., 2010). In addition, we adopted the residual autocovariate approach to reduce spatial autocorrelation (sensu Crase et al., 2012), a common feature of sampling collection that violates the assumption of independent data in many models and has been shown to increase error rates and make it difficult to ecologically interpret the influence of environmental variables (Dormann, 2007;Dormann et al., 2007). Finally, our use of ensemble models constructed using three algorithms (BRT, Maxent, and RF models) reduces the risk that structural inadequacies, inappropriate parameter specifications, or bias in each approach may unduly influence the final outputs (Robert et al., 2016). While each of the three individual modeling approaches predicted similar spatial patterns in habitat suitability, ensemble models are preferred for spatial management applications as they typically produce more robust predictions (Araújo and New, 2007) and can improve model accuracy outside of the primary training region (Randin et al., 2006). Despite these improvements, future studies in the region would benefit from using true absence and abundance data, modeling strategically important areas at a high resolution, and including higher quality substrate data. Furthermore, our models did not account for any uncertainty in the environmental predictor variables, and future modelling efforts should seek to address this issue.

The role of habitat suitability models in the conservation and management of vulnerable marine ecosystems
The predicted areas of suitable habitat for VME indicator taxa overlap with historically trawled regions or places of current fishery interest in the South Pacific Ocean (Clark and O'Driscoll, 2003;Clark et al., 2016;Black and Tilney, 2017;Roux et al., 2017). The apparent overlap of VMEs and prime fish habitat is unsurprising given the strong association of many demersal fish species with seamounts and other elevated seafloor features, and the complex structures that can be created by cold-water corals and sponges on such topography (Costello et al., 2005;Baillon et al., 2012;Kutti et al., 2014). This overlap presents a significant obstacle to limiting adverse impacts to benthic habitats while still maintaining sustainable fisheries. However, it is possible to optimize closures by spatially quantifying the trade-off between biodiversity protection and fishery catch, including with the combined use of habitat suitability models and decision-support tools (e.g., Leathwick et al., 2008;Penney and Guinotte, 2013;Lagasse et al., 2015).
Bottom trawling has known detrimental impacts on benthic communities (Watling and Norse, 1998;Fosså et al., 2002;Clark and Koslow, 2007;Sampaio et al., 2012;Pusceddu et al., 2014). Interim protection measures for VMEs in the SPRFMO convention area include the move-on rule that requires vessels to move bottom trawl operations at least five nautical miles away if a VME taxon is caught as bycatch. While limiting fishing activity in areas known to contain VMEs is paramount, move-on rules are increasingly being questioned due to the risk of expanding the footprint of trawling into new areas and the potential for damage by even a low level of trawling (Williams et al., 2010). Spatial closures are an important tool for effectively managing bottom-trawling fisheries and preventing significant adverse impacts to VMEs in the deep sea. However, designating the location and extent of closures is challenging because the distribution of VMEs is generally poorly known, and there is often industry and political opposition to restricting fishing opportunities. The models presented here will aid in assessments of the effectiveness of current and historical closures, and will inform the creation of improved, long-term management plans for the region. The spatial predictions generated by these models will allow managers to move beyond the current interim protections by enabling future selection of closed areas with habitats that are predicted to be highly suitable for VME indicator taxa. Given the broad range of taxa modeled, our results will also help protect areas that are predicted to be highly suitable for a large diversity of associated taxa, both acting to reduce VME indicator taxa bycatch in trawls and protect key biodiversity hotspots. In addition, models can be used to help ensure that future closures are representative of key seafloor features and biogeographic areas, ensuring adequate protection for all VME associated taxa .
Habitat suitability models also provide a framework for assessing and responding to other anthropogenic stressors that threaten VMEs in addition to bottom trawling. For example, climate change and ocean acidification are projected to impact the distribution of cold-water corals around the world (Guinotte et al., 2006;Tittensor et al., 2010). Cold-water corals in the Pacific Ocean already grow in waters with low aragonite and calcite saturation states (this study and Thresher et al., 2011;McCulloch et al., 2012;Jantzen et al., 2013;Baco et al., 2017). Ocean acidification is projected to significantly reduce aragonite and calcite saturation states (Feely et al., 2012), making it difficult or impossible for corals to continue to calcify and create reef structures (Turley et al., 2007). In concert with reduced pH, temperatures in the deep sea are projected to rise due to ongoing global warming (Mora et al., 2013). As deeper waters acidify and warm, cold-water corals and sponges may exhibit reduced calcification rates (Lunden et al., 2014;Georgian et al., 2016b), altered metabolic activity (Dodds et al., 2007), and changes in gene expression (Carreiro-Silva et al., 2014). Climate change is also expected to result in the decreased export of POC to the seafloor and reduced dissolved oxygen levels (Mora et al., 2013), which may hamper the ability of VME indicator taxa to metabolically compensate for other stressors (Georgian et al., 2016b). Thus, for spatial management measures to be able to accommodate the cumulative impact of fishing and climate change on VMEs, habitat suitability models that use environmental variables adjusted for future climate change scenarios should also be used to inform spatial management processes.

Conclusions
Given the likelihood that benthic communities in the deep sea will be adversely affected by several anthropogenic stressors, it is necessary to devise effective, science-based management strategies for ensuring their continued survival. Restoration of deep-sea environments is generally prohibitively expensive or impossible with current technology, making preemptive management activities critical for preserving these vulnerable habitats (Van Dover et al., 2014). While global stressors stemming from the effects of climate change will be extremely difficult to mitigate, efforts to reduce perturbations from acute stressors such as bottom trawling may prove to be both practical and fruitful. Recently, effort has been put into establishing closed areas for protecting deep-sea ecosystems from the impacts of trawling, and habitat suitability modeling can and has assisted in the design and evaluation of these conservation efforts (e.g., Knudby et al., 2013;Guinotte and Davies, 2014;Rooper et al., 2014). We recommend that regional and local habitat suitability models be used in conjunction to identify and characterize the distribution of VMEs and inform the spatial management planning of fisheries organizations.

Funding
This study was funded by the New Zealand Ministry for Primary Industries (MPI) as part of the project "Analytical support for development of a new SPRFMO bottom fishing measure" (SEA2016-22). MPI reviewed the manuscript prior to submission but did not have a role in the collection, analysis, or interpretation of our data.

Declarations of interest
None.