Improving the predictive capability of benthic species distribution models by incorporating oceanographic data – towards holistic ecological modelling of a submarine canyon

Submarine canyons are associated with increased biodiversity, including cold-water coral (CWC) colonies and reefs which are features of high conservation value that are under increasing anthropogenic pressure. Effective spatial management and conservation of these features requires accurate distribution maps and a deeper understanding of the processes that generate the observed distribution patterns. Predictive distribution modelling offers a powerful tool in the deep sea, where surveys are constrained by cost and technological capabilities. To date, predictive distribution modelling in canyons has focussed on integrating groundtruthed acoustically acquired datasets as proxies for environmental variables thought to influence faunal patterns. Physical oceanography is known to influence faunal patterns but has rarely been explicitly included in predictive distribution models of canyon fauna, thereby omitting key information required to adequately capture the species-environment relationships that form the basis of predictive distribution modelling. In this study, acoustic, oceanographic and biological datasets were integrated to undertake high-resolution predictions of benthic megafaunal diversity and CWC distribution within Whittard Canyon, North-East Atlantic. The main aim was to investigate which environmental variables best predict faunal patterns in canyons and to assess whether including oceanographic data improves predictive modelling. General additive models, random forests and boosted regression trees were used to build predictive maps for CWC occurrence, megafaunal abundance, species richness and biodiversity. To provide more robust predictions, ensemble techniques that summarise the variation in predictions and uncertainties between modelling approaches were applied to build final maps. Model performance improved with the inclusion of oceanographic data. Ensemble maps identified areas of elevated current speed that coincided with steep ridges and escarpment walls as the areas most likely to harbour CWCs and increased biodiversity, probably linked to local hydrodynamics interacting with topography to concentrate food resources. This study shows how incorporating oceanographic data into canyon models can broaden our understanding of processes generating faunal patterns and improve the mapping of features of conservation, supporting effective procedures for spatial ecosystem management.

including oceanographic data improves predictive modelling. General additive models, random forests and boosted regression trees were used to build predictive maps for CWC occurrence, megafaunal abundance, species richness and biodiversity. To provide more robust predictions, ensemble techniques that summarise the variation in predictions and uncertainties between modelling approaches were applied to build final maps. Model performance improved with the inclusion of oceanographic data.
Ensemble maps identified areas of elevated current speed that coincided with steep ridges and escarpment walls as the areas most likely to harbour CWCs and increased biodiversity, probably linked to local hydrodynamics interacting with topography to concentrate food resources. This study shows how incorporating oceanographic data into canyon models can broaden our understanding of processes generating faunal patterns and improve the mapping of features of conservation, supporting effective procedures for spatial ecosystem management.

Introduction
Submarine canyons are environmentally complex geomorphological features that incise continental margins and act as conduits between the shelf and the deep sea (Allen and Durrieu de Madron, 2009, Huvenne and Davies, 2014, Puig et al., 2014, Amaro et al., 2016, Fernandez-Arcaya et al., 2017.
Canyons are characterised by high spatial and temporal heterogeneity in environmental conditions (De Leo et al., 2014, Amaro et al., 2016, often resulting in enhanced regional and local productivity, biodiversity, and faunal abundance , Vetter et al., 2010, De Leo et al., 2014. Reefforming cold-water coral colonies (from here indicated as CWC) and reefs in particular represent features of high conservation value that can occur within canyons and are under increasing anthropogenic pressure (92/43/EEC, 1992, OSPAR, 2008. Accurate distribution maps of these features, in addition to an understanding of the processes that drive the observed spatial patterns, can support their effective spatial management and conservation (Huvenne and Davies, 2014, Buhl-Mortensen et al., 2015, Anderson et al., 2016a. In the deep sea, where surveys are constrained by costs and technological capabilities, predictive mapping offers a powerful tool for such studies. (Robert et al., 2015, Anderson et al., 2016a. Predictive mapping is based upon models of species-environment relationships that enable predictions of the likely occurrence of species beyond where they have been sampled Zimmermann, 2000, Guisan andThuiller, 2005). These techniques are based upon concepts of niche theory, whereby species' distributions are determined by the environmental dimensions of their ecological niche (Guisan and Zimmermann, 2000). Therefore, accurate predictions rely upon the incorporation of ecologically relevant environmental data collected at resolutions which capture the scale at which these variables influence species spatial patterns (Lecours et al., 2015, Miyamoto et al., 2017, Misiuk et al., 2018, Porskamp et al., 2018.
In submarine canyons, acoustically derived environmental variables (e.g., depth, slope) are routinely used as indirect proxies for direct and resource variables (sensu Guisan and Zimmermann, 2000) including, water mass characteristics (temperature, salinity, potential density, dissolved oxygen concentration, aragonite compensation level and pH), substratum, seafloor characteristics, current exposure and food supply (Wilson et al., 2007, Robert et al., 2015; all of which have been shown to act at multiple scales to influence faunal patterns in canyons (De Mol et al., 2011, Howell et al., 2011, Baker et al., 2012, De Leo et al., 2014, Bargain et al., 2018. For example, water mass characteristics tend to influence canyon fauna at spatial scales of 10 -1000 km (Dullo et al., 2008, Fabri et al., 2017 at which resolution they often co-vary with depth (Henry et al., 2014). On the other hand, spatial variation in seafloor characteristics and substratum are influential at finer resolutions of <1 -10 km (Howell et al., 2011, Robert et al., 2015, Fabri et al., 2017, which can be captured by terrain derivatives such as slope and rugosity (Wilson et al., 2007, Howell et al., 2011. Equally at this resolution, aspect can provide insights into areas that may be more exposed to currents (Wilson et al., 2007, Robert et al., 2015.
However, the sole use of indirect variables as proxies can hinder ecological interpretation, as a single proxy can be collinear with multiple direct and/or resource variables across varying scales (Wilson et al., 2007, Porskamp et al., 2018 and because the measured proxy does not influence organisms' distributions directly, it can lead to further predictive inaccuracies. In addition, environmental data are often acquired at low resolutions that reflect technological constraints rather than being ecologically meaningful (Verfaillie et al., 2009, Huvenne and Davies, 2014, Ismail et al., 2015, Lecours et al., 2015, Porskamp et al., 2018. These data are then incorporated into models at a pre-determined single fixed resolution as opposed to the increasingly advocated approach of incorporating data at multiple resolutions to then statistically identify the resolution that best captures the variability in the environment to which fauna are responding (Wilson et al., 2007, Fourniera et al., 2017, Porskamp et al., 2018. Consequently, the use of indirect variables together with the mismatch of resolution between ecological processes and data sampling represent key limitations of predictive model and map accuracy and precision (Brown et al., 2011, Lecours et al., 2015, Porskamp et al., 2018.
Physical characteristics of the water column and oceanographic processes are known to influence faunal patterns, including those of CWCs (Dullo et al., 2008, De Mol et al., 2011, Flögel et al., 2014, Fabri et al., 2017 but have rarely been included in predictive models of canyon fauna, one exception being Bargain et al (2018). In canyons supporting intense hydrodynamic processes Carter, 2011, Aslam et al., 2018) variability in faunal patterns has been observed and attributed to the increased heterogeneity in physical oceanography (Huvenne et al., 2011, Johnson et al., 2013. As such, canyons represent model systems for testing the role of physical oceanography in controlling faunal distribution patterns. Here we develop predictive distribution models for CWCs and epibenthic megafaunal biodiversity using a multiscale approach integrating bathymetric and oceanographic datasets and their derivatives in the Whittard Canyon (North-East Atlantic) to investigate which environmental variables best predict faunal patterns. Finally, we aim to assess how the inclusion of oceanographic variables affects model performance, testing the null hypothesis that the inclusion of physical oceanographic variables in distribution models has no effect on model accuracy or precision.

Study area
Whittard Canyon is located along the Celtic Margin, south-west of the British Isles in the Northern Bay of Biscay and extends >200 km (Figure 1). It is a dendritic canyon system comprised of four main tributaries, the Western-, Western Middle-, Eastern Middle-and Eastern-branches, incising the shelf edge at a depth of ~200 m and coalescing at ~3700 -3800 m water depth, then developing as Whittard Channel up to a depth of ~4500 m, where the it joins the Celtic Fan that leads onto the Porcupine Abyssal Plain (Hunter et al., 2013, Amaro et al., 2016. Intensified bottom currents and internal tides have been associated with the canyon, making it a good candidate for investigating the impact of physical oceanography on faunal patterns (Reid and Hamilton, 1990, Hall et al., 2017, Aslam et al., 2018. Within the canyon system are the Dangaard and Explorer Canyons that together constitute the only deep-sea marine conservation zone (MCZ) within English waters. The Canyons MCZ designation is based upon the presence of the 'Deep-sea bed' broadscale habitat and 'Cold-water coral reefs', 'Coral gardens' and 'Sea-pen and burrowing megafauna communities' habitat features of conservation interest (JNCC, 2013, DEFRA, 2019. Accurate predictive maps of these features based on key environmental predictors are essential to assist effective management of the MCZ. This study focuses on the Eastern branch of Whittard Canyon and the adjoining Dangaard and Explorer Canyons ( Figure 1). This region of the Whittard Canyon system was chosen as the Eastern branch has been identified as the most hydrodynamically energetic while the Dangaard and Explorer Canyons incise the Brenot Spur, which is postulated to be a generation site for the internal tide that propagates into the Eastern branch (Aslam et al., 2018).
Whittard Canyon exhibits heterogeneity in both physical and oceanographic attributes. The geomorphology and substrata of the canyon are complex, with variability observed along the canyon axis and between branches , Robert et al., 2015, Amaro et al., 2016, Ismail et al., 2018. The heads of the canyons are characterised by steep-sided walls and coarser substrata (outcropping bedrock, boulders and cobbles) (Carter et al., 2018). Where the branches coalesce, the Whittard Channel leads further downslope to the depositional fan comprised of finer grained substrata (fine sand, silt and hemiplegic ooze). Sediment dynamics within the canyon are poorly understood.
Although developing on a passive margin, Whittard Canyon does experience sediment dynamics (Amaro et al., 2016, Carter et al., 2018. Resuspension by intensified bottom currents and local slope failures within the canyon facilitate the availability of fine grained material (Reid and Hamilton, 1990, Amaro et al., 2015, Amaro et al., 2016, Hall et al., 2017, Carter et al., 2018 which is then transported via active down-slope transport in the form of turbidity currents and mud-rich sediment gravity flows (Cunningham et al., 2005, Amaro et al., 2016.
As it descends, the canyon intersects several water masses, including the Eastern North Atlantic Water (ENAW) (~100 -600 m), the Mediterranean Outflow Water (MOW) (800 -1200 m) and the Northeast Atlantic Deep Water (NEADW) (1500 -3000 m), within which occurs a core of Labrador Sea Water (LSW) (~1800 -2000 m) (Pollard et al., 1996, Van Aken, 2000. Mixing occurs along the water mass boundaries (Van Rooij et al., 2010). Barotropic tidal currents interact with the steep canyon topography converting some of the energy into baroclinic internal waves (Allen andDurrieu de Madron, 2009, Hall et al., 2017) and partly standing internal waves have been observed within the Eastern branch (Hall et al., 2017). Internal wave driven turbulent mixing is associated with increased concentrations of particulate organic matter (POM) and nepheloid layer production within the canyon (Wilson et al., 2015, Hall et al., 2017, Aslam et al., 2018.  , 1920 x 1080 pixels). Positional data were derived from the ROV's ultra-short baseline navigation system (USBL). A total of nine dives were completed in the Eastern branch ( Figure 1 and Table 1) at an average speed of ~0.08 m s -1 and an average camera height of 3 m from the seafloor (Robert et al., 2015). Video footage from the dives was analysed with all epibenthic megafauna >10 mm annotated and georeferenced, organism size was estimated from a laser scale with parallel beams positioned 10 cm apart. Due to limited species taxonomic knowledge for the area, fauna were identified to the lowest taxonomic level possible and identified as morphospecies (visually distinct taxa). To ensure consistency in nomenclature and improve comparability of annotations, the developed morphospecies catalogue was based upon the CATAMI nomenclature (Althaus et al., 2015) and cross-referenced against the Howell and Davies (2010) morphospecies catalogue for the North-East Atlantic Deep Sea. Those sections where the ROV altitude was >4 m for extended periods, prohibiting annotations, were noted by time and not considered in subsequent analysis. Video data annotations from the JC010, JC036 (previously annotated by Robert et al (2015)) and JC125 cruises were combined into a single data matrix with possible annotator bias in the combined dataset assessed following the protocol set out in Durden et al (2016) (see supplementary materials S1.1). Transects were subdivided into 50 m length sections and the morphospecies records within each section consolidated, with Species richness, Simpson's reciprocal index (1/D) (Simpson, 1949) and megafaunal abundance calculated for each 50 m section sample. These metrics were chosen as together they capture the key faunal responses to environmental heterogeneity Barry, 2010, Amaro et al., 2015). Presence-absences for three scleractinian reef forming species, Desmophyllum pertusum (formerly Lophelia pertusa), Madrepora oculata and Solenosmilia variabilis were combined to provide a CWC presence-absence value. This was recorded because reef forming scleractinians represent features of high conservation value that are often associated with increased diversity (OSPAR, 2008, 92/43/EEC, 1992. Additionally, as long-lived immobile filter feeders that are associated with sustained hydrodynamics (Dullo et al., 2008, Howell et al., 2011, Fabri et al., 2017, CWCs represent good candidates for investigating the role of physical oceanography on faunal distributions. All statistical analyses were conducted using the open source software R (R_Core_Team, 2014), packages "sp", "maptools", "rgeos", "vegan", "clustersim" and "MASS".  (Masson, 2009 and Kongsberg Simrad EM1002 MBES system of RV Celtic Explorer (MESH, Davies et al., 2008). Bathymetry data were processed utilising CARIS HIPS & SIPS v.8 and combined utilising the mosaic to new raster tool in ArcGIS 10.4.1, to produce a new grid at a resolution of 50 m (WGS1984, UTM Zone 29N).
The bathymetric position index is a derived metric of a cell's position and elevation relative to its surrounding landscape/cells within a user defined area (Wright, 2005). A combination of broad and fine scale BPI metrics were derived to enable features at varying scales to be identified (Wilson et al., 2007).
Broad scale BPI was calculated using a neighbourhood analysis based upon an annulus with an inner radius of 2 pixels and an outer radius of 20 pixels with a scale factor of 1000. Fine scale BPI was calculated using a neighbourhood analysis based upon an annulus with an inner radius of 1 pixel and an outer radius of 2 pixels with a scale factor of 100. Rugosity is a measure of the ratio of the surface area to the planar area and was calculated with a neighbourhood size of 3 x 3 pixels (Wilson et al., 2007). Other scales were also assessed (Supplementary S1.2). Slope is a measure of change in elevation, and aspect (subsequently converted to eastness and northness) measures the orientation of maximum change along the slope. Curvature is a measure of the shape of the slope, with values indicating whether a slope is convex or concave. Three types of curvature were calculated: profile, planar and general.
Each accentuates different aspects of slope shape and can provide indirect measures of different processes relating to flow, erosion and deposition within the canyon (Wilson et al., 2007).
To capture the range of spatial scales at which the terrain derivatives may affect faunal distributions, a multiscale approach was implemented, whereby terrain variables were derived from bathymetry gridded at 50, 100 and 500 m. Statistical modelling (following the same protocol to assess predictive value of variables as detailed in section 2.3) was then applied to identify the most ecologically meaningful resolution to use for each variable, identified as those derivatives contributing the greatest to variance explained. Terrain derivatives from bathymetry gridded at 50 m were found to be optimal (Supplementary S1.2), and were exported as rasters at 50 m resolution (Figures 2 and 3) for further modelling.
Bathymetric slope criticality to the dominant semi-diurnal internal tide was calculated (Supplementary S1.3) from the processed bathymetry gridded at 50 m and the potential density derived from a shipbased CTD cast acquired during JC125 ( Figure 1). Bathymetric slope criticality to the dominant semidiurnal internal tide (α) can identify potential areas within the canyon where up-slope propagating waves could be reflected back down-slope toward the canyon floor (supercritical, α > 1), be focussed toward the head of the canyon (subcritical, α < 1) or, become trapped (near-critical, α = 1) resulting in waves breaking and mixing (Hall et al., 2017).

Oceanographic data processing and derived environmental variables
Near bottom values for absolute salinity and conservative temperature were extracted from the Canyon region for 32 M 2 tidal cycles (Aslam et al., 2018). Both R.M.S baroclinic and barotropic current speed were calculated to differentiate between the influences of the two tides that exhibit different spatial patterns across the canyon system ( Figure 3).
In order to represent the physical oceanographic conditions experienced by the benthos and match the resolution of the depth and terrain derivatives, the oceanographic data were interpolated into rasters at 50 m resolution in ArcGIS ( Figure 3). Interpolation was based upon spatial variograms calculated in Golden Software Surfer V 8 and undertaken by kriging using the Spatial Analyst tool box in ArcGIS.
To account for discrepancies in bathymetric resolution between the physical oceanographic models and the bathymetry gridded at 50 m, bathymetry from the models was also exported and rasters created.
Depth discrepancies between the datasets were accounted for by extracting oceanographic and current values from the nearest corresponding depths to that of the bathymetry gridded at 50 m.   (Zuur et al., 2014a). For each group of correlated environmental variables, modelling using various techniques was undertaken (as described below) with a representative of each group added in turn to assess its predictive value by reviewing diagnostic plots of residuals and when model assumptions were met, retaining those that explained the greatest variance and gave the lowest Akaike's Information Criterion (AIC) score ( Table   2). The AIC score is commonly applied to compare model performance and measures the goodness of fit and model complexity reflecting the variance explained penalised by the number of explanatory variables. A lower AIC score indicates a better model fit (Zuur et al., 2014a). This resulted in four of the 12 environmental variables being retained. Random forests (RF) is a classification method that builds multiple trees based upon splitting rules that maximise homogeneity in response to predictors within branches, starting each time with a randomised subset of data points and predictor variables (Breiman, 2001, Prasad, 2007. RF was chosen because it makes no underlying assumption of the distribution of the response variable, is robust to overfitting, allows for interactions between environmental variables and nonlinear relationships between the response and environmental variables (Cutler et al., 2007, Prasad, 2007. RF was run in classification mode for CWC presence-absence data and regression mode for the continuous response variables. Abundance was log+1 transformed. Each random forest was run with 1500 trees and the number of variables chosen at each node split set to default (square root of the number of variables in the model for classification and two for regression) with the out of bag (OOB) settings set as default (Breiman and Cutler, 2018).
Boosted regression trees (BRT) is a combined classification and regression method that builds a sequence of regression trees, with the initial tree fitted to the entire dataset and subsequent trees added to fit the remaining residuals (Elith, 2008). BRT was chosen as this method is robust to differing resolutions of data input and accommodates interactions and nonlinear relationships (Elith, 2008). BRT models were developed with cross validation on data using a tree complexity of 3 and learning rate of 0.001 with the optimum number of trees determined using a step forward function using k-fold cross validation. These parameter settings were chosen to ensure a minimum of 1000 trees were created and that the models did not overfit the data (Elith, 2008, Elith andLeathwick, 2009). For CWC presenceabsence, a Bernoulli distribution was assumed, for species richness a Poisson distribution was assumed.
Abundance was log+1 transformed to improve normality and modelled with a Gaussian distribution.
Environmental variables were assessed using the inbuilt gbm.simplify function that specifies the optimum number of variables by dropping the least contributing variables and comparing deviance minimum error and model variance with and without that variable (Elith, 2008).
Generalized additive models (GAMs) are generalised models with smoothers and link functions based on an exponential relationship between the response variable and the environmental predictor variables (Zuur et al., 2014b). This method was chosen because it can accommodate nonlinear relationships and produces ecologically intuitive outputs (Zuur et al., 2014a). GAMs have successfully been applied to model the distribution of marine species and habitats (Robert et al., 2015). The degree of smoothing for the environmental variables was selected based on the generalized cross validation (GCV) method and a log link function was used for all models except CWC presence-absence where a logit link function was used for the binary response. For CWC presence-absence, a Binomial distribution was assumed.
For species richness and 1/D a Gamma distribution was assumed after exploring several alternative distributions (Gaussian, Poisson, quasi-Poisson and Negative-Binomial). Abundance was log+1 transformed to improve normality and modelled with a Gaussian distribution. Environmental variables were assessed by a backward-step selection, whereby the environmental variables resulting in the lowest deviance explained were dropped one at a time and the model refitted until only statistically significant (p value <0.05) variables remained in the models. Overall model fit was then compared and the most parsimonious model, identified as that containing those environmental variables that explained the maximal amount of variance whilst giving the lowest AIC score, was selected.

Model performance
Model performance was assessed using a cross-validation procedure in which models were trained using a random partition of data (70%) and tested against the remaining portion (30%) (Guisan and Zimmermann, 2000). Model accuracy was assessed in terms of the model fit to the training dataset using AIC scores, diagnostic plots and variance explained (Adjusted R 2 ). Predictive performance was assessed using the Area Under the Receiver operating Curve (AUC) score for CWC presence-absence (Elith and Leathwick, 2009). The AUC score indicates how well the model discriminates presences and absences. An AUC score <0.5 indicates that the model is no better than random and an AUC score >0.7 can be considered as adequately discriminating presences from absences (Lobo et al., 2008). Due to the equal weighting of misclassification errors by the AUC, measures of sensitivity and specificity were also used to assess performance. Sensitivity is the fraction of correctly predicted CWC presences, while specificity is the fraction of correctly predicted CWC absences (Lobo et al., 2008). Predictive performance for the remaining models was assessed with correlation coefficients (linear regression) between the predicted and observed values.

Ensemble Models
To provide more robust predictions, ensemble techniques that summarise the variation in predictions and uncertainties between modelling approaches were applied to build final maps. Ensemble models are important when optimal models cannot be identified. Ensemble model maps based upon weighted AUC scores or correlation coefficients of each of the algorithms were produced for each response variable.

Morphospecies and observed patterns in diversity
A total of 280 morphospecies were annotated from the video data. Xenophyophores (representing ~17% of individuals) were the most abundant morphospecies, followed by Acanthogorgia. sp (~10%), Brachiopoda sp. 1 (~9%), Pentametrocrinus atlanticus (~8%) and Cerianthidae (~7%). Due to poor video quality, Brachiopoda were not annotated from the data collected during JC010 and JC036 and so are omitted from further analysis. The predominant functional groups observed were suspension (filter) feeders, followed by detritivores and carnivores. Highest species richness (48)   with RF generally performing the best and GAM showing higher sensitivity in test datasets (Table 3).
Lower sensitivity values and similar specificity and AUC values suggest a degree of overprediction of CWC occurrences by the models (Table 3). Correlation coefficients (Adjusted R 2 ) between predicted and observed species richness, 1/D and abundance ranged between 0.17 -0.87 (training dataset) and 0.07 -0.46 (test dataset) and were highest for RF, followed by BRT and GAM (Table 4). The superior performance of RF could result from the inadequacy of available modelling distributions for the response variables assumed for BRT and GAM (Zuur et al., 2014a).

Variable contribution in the predictive models
The environmental variables used for optimal models of CWC presence-absence, species richness, 1/D and abundance are shown in Tables 3 and 4. The importance of the environmental variables varied between the modelling algorithms. The models for CWC presence-absence ranked depth, rugosity and R.M.S baroclinic current speed as important predictor variables ( Table 3). Models of species richness and abundance ranked depth as the most important predictor variable, whilst models for 1/D rank the predictor variables inconsistently ( Table 4  Forests (RF) and General Additive Models (GAMs)) that integrate variables including baroclinic current speed (BC_RMS) and excluding baroclinic current speed. Model performance was assessed using a cross-validation procedure in which models were trained using a random partition of data (70 %) and tested against the remaining portion (30 %). Model accuracy was assessed in terms of the model fit to the training dataset using variance explained (Adjusted R 2 ) and for GAMs the Akaike's Information Criterion score (AIC) and for RF the out of bag (OOB) test misclassification error rate. Predictive performance was assessed based upon the test dataset using measures of sensitivity, specificity and the Area under the receiver operating Curve (AUC).

Influence of oceanographic data
The physical oceanographic variables were highly collinear and only R.M.S baroclinic current speed was retained in the optimum models. Overall model performance was improved with the inclusion of R.M.S baroclinic current speed as an environmental predictor variable (Tables 3 -4). Spatial predictions from the ensemble model including R.M.S baroclinic current speed showed increased diversity and increased probability of CWCs in areas of elevated current speed that coincided with steep topography ( Figures 5 -8), while the extent of suitable CWC habitat predicted decreased ( Figure 5). For a CWC occurrence threshold >60%, the suitable habitat reduced from 387 km 2 to 174 km 2 , a decrease of 55%; for a threshold of >70% the habitat reduced from 125 km 2 to 13 km 2 , a decrease of 89 % (thresholds consistent with those applied by Bargain et al., (2018).

Model predictions
Model predictions were made across the full extent of available environmental rasters. However, as the models were trained from samples within the canyon branches, model predictions beyond this extent are deemed less reliable. Therefore, we limit further analysis of model predictions to within the canyon branches.
Ensemble models predicted increased probability of CWCs, and increased species richness, 1/D and abundance at specific depths in areas of increased terrain complexity that coincided with relatively elevated current speed of the internal (baroclinic) tide.
Abundance increased with depth although below 1600 m, the response became negative.
(Supplementary S3.1). The response of abundance to R.M.S baroclinic current speed was variable, becoming negative at speeds greater than 0.25 m s -1 whilst increased slope and seafloor ruggedness resulted in an overall positive abundance response. Ensemble models predicted increased abundance in association with greater terrain complexity. Peaks in abundance were predicted to occur on crests of the ridges between 800 -1600 m (Figure 8). In areas of low terrain complexity below ~2000 -2500 m, on the Southern flanks of the Explorer and Dangaard Canyons as well as at shallow depths along sections of the canyon axis, lower abundance was predicted by the ensemble model (Figure 8). CWCs is predicted on escarpments (1) and slopes of ridges (2) and lower probability is predicted in areas of low terrain complexity (3). Model predictions beyond canyon branches (i.e. on the interfluves and the shelf) are less reliable because training datasets did not include these environments. We have excluded them from our interpretation. (1) and the crests and south facing slopes of ridges (2) while lower species richness is predicted along sections of the canyon axis and of low terrain complexity (3). Model predictions beyond canyon branches (i.e. on the interfluves and the shelf) are less reliable because training datasets did not include these environments. We have excluded them from our interpretation.  . Highest abundance is predicted on the crests of ridges between 800 -1600 m (2) and lower abundance is predicted along sections of the canyon axis and of low terrain complexity (3). Model predictions beyond canyon branches (i.e. on the interfluves and the shelf) are less reliable because training datasets did not include these environments.
We have excluded them from our interpretation.

Environmental variables influencing faunal patterns in canyons
We have identified that depth, terrain complexity and hydrodynamics are important environmental factors influencing faunal patterns in submarine canyons and demonstrated that incorporating physical oceanographic data into predictive models improves their performance.
Spatial heterogeneity in these environmental conditions drives spatial patterns in fauna by providing a greater variety of niches with the potential to support increased species richness and diversity (Levin et al., 2010, De Leo et al., 2014.

Terrain complexity
Terrain complexity is a proxy of seafloor heterogeneity that is positively correlated with diversity (Levin et al., 2010, De Leo et al., 2014. The high terrain complexity of canyons generates spatial heterogeneity in sediment dynamics , Martín et al., 2011, Puig et al., 2017, substratum composition (Huvenne et al., 2011, Huvenne and Davies, 2014 and current exposure (Ismail et al., 2015). Filter feeders, including CWCs, show a preference for such increased terrain complexity (De Mol et al., 2011, Howell et al., 2011, Huvenne et al., 2011, Gori et al., 2013, Rengstorf et al., 2013, Robert et al., 2015, Pierdomenico et al., 2016, Fabri et al., 2017, Bargain et al., 2018. They colonise topographic highs to exploit local current regimes, and so increase food encounter rates (Mohn et al., 2014, Fabri et al., 2017. In our study, increased probability of CWC occurrence, species richness, 1/D and abundance were associated with areas of high terrain complexity (slope and rugosity) over similar spatial scales predicted for macrobenthic diversity in canyons off Hawaii (De Leo et al., 2014). These predictions are supported by previous studies within the canyon system that also predicted CWCs in areas of complex topography (Robert et al., 2015) and observed CWCs and increased epibenthic diversity and abundance in association with steep walls and topographic highs (Huvenne et al., 2011, Johnson et al., 2013, Robert et al., 2015. Our models predicted asymmetric distributions (where a higher prevalence of different taxa is predicted for one or the other canyon flank) between the opposing flanks of both Dangaard and Explorer Canyons. The flanks of the canyons differ in complexity, with higher species richness and probability of CWCs predicted for the more complex northern flanks. Unfortunately the spatial extent of predictive mapping in previous studies does not enable further confirmation of the asymmetric distributions predicted (Robert et al., 2015), but fauna are predicted and observed in association with complex terrain which would support our model predictions , Robert et al., 2015. In other canyons, asymmetric distributions have been attributed to the different geomorphology and hydrodynamics of canyon flanks, with one side more subject to intense hydrodynamics and the other dominated by depositional regimes (De Mol et al., 2011, Fabri et al., 2017, Pierdomenico et al., 2017. Our data suggest, more specifically, that it is the differences in terrain complexity between flanks that result from these processes, together with variation in baroclinic current speeds which generate the observed asymmetric patterns in fauna distribution.
Slope acts as a proxy for substratum type, which is correlated with faunal distributions. The steep slopes of Whittard Canyon are generally associated with hard substratum (Huvenne et al., 2011, Johnson et al., 2013, Robert et al., 2015, Robert et al., 2017, Carter et al., 2018, which is positively correlated with sessile epibenthic diversity as it provides a suitable surface for epifauna to adhere to (Baker et al., 2012). In addition, steep slopes prevent sediment deposition and subsequent smothering of epifauna in these environments affected by high sedimentation rates (Howell et al., 2011, Baker et al., 2012. Steep slopes may also provide refuge for fauna from anthropogenic disturbance caused by fishing gear (Huvenne et al., 2011, Johnson et al., 2013, Pierdomenico et al., 2016. A positive relationship between slope and diversity has been observed previously from Whittard and other canyons (Huvenne et al., 2011, Johnson et al., 2013, Robert et al., 2015, Chauvet et al., 2018. In our study, although highest diversity was recorded from vertical walls, some sections of the walls supported low diversity. This observation suggests that other processes and/or resources are acting together with terrain complexity to influence faunal distributions in canyons.

Food supply and the internal tide
Variability in quality and amount of food supply influences canyon faunal distributions , McClain and Barry, 2010, Cunha et al., 2011, Chauvet et al., 2018. Many benthic species within canyons rely on surface derived POM as their main food supply (Cunha et al., 2011, Miller et al., 2012. Generally, availability of surface derived POM decreases with depth (Lutz et al., 2007). However, in active canyons sediments can regularly be flushed to the deep. In parallel, local hydrodynamics (including internal tides) can cause resuspension of material and generate nepheloid layers at specific depths (Puig et al., 2014, Wilson et al., 2015. Nepheloid layers are concentrations of suspended material (including POM) that represent an important food resource for deep-sea fauna (Demopoulos et al., 2017).
Within Whittard Canyon, nepheloid layers and centres of resuspension have been previously observed 1) where the MOW interacts with areas of complex canyon topography resulting in baroclinic internal wave motion, causing turbulent mixing (Wilson et al., 2015), and 2) associated with the internal tide at depths of 400 -500 m, 900 -1600 m and 1700 -1800 m as well as where internal waves propagate at the boundary between the permanent thermocline 600 -900 m and upper boundary of the MOW (Wilson et al., 2015). In our study, high probability of CWCs occurrence and peaks in species richness, 1/D and abundance are predicted at depths of 800 -1600 m, coinciding with some of the above mentioned areas of resuspension and nepheloid layer production ( Figures 5 -8). Previous studies have also observed high diversity in association with nepheloid layers in Whittard Canyon (Huvenne et al., 2011, Johnson et al., 2013, Robert et al., 2015. The correlated spatial patterns between canyon fauna and nepheloid layer distributions support the importance of food availability, in the form of nepheloid layers, in influencing fauna distributions.
We found that internal tide dynamics correspond to an important factor influencing faunal patterns in canyons, contributing to increased spatial heterogeneity in environmental conditions. Faunal distributions are influenced by the internal tide both directly and indirectly. The internal tide directly influences fauna distributions by current speed and indirectly via its role in the production and distribution of nepheloid layers.
Current speeds exceeding 0.15 m s -1 can cause resuspension of material (Thomsen and Gust, 2000), an important stage in nepheloid layer production. In our study, increased probability of CWC occurrence, species richness, 1/D and abundance coincide with areas of elevated current speed for the internal tide.
CWC occurrences have been linked to intensified bottom currents in a number of settings (Davies, 2009, Howell et al., 2011, Mohn et al., 2014, Rengstorf et al., 2013, including canyons (Bargain et al., 2018). However, our data show that above 0.25 m s -1 species richness and abundance are predicted to decrease. Species vary in their feeding strategies and efficiency under different hydrodynamic regimes (Järnegren andAltin, 2006, van Oevelen et al., 2016). For filter feeders, increased current flow increases food encounter rate up to a limit after which the speed of the current exceeds that at which fauna can extract particles and/or causes physical disturbance (Johnson et al., 2013, Orejas et al., 2016. Our models predict low diversity on relatively flat sections of the canyon floor that experience current speeds exceeding (0.25 m s -1 ), located toward the canyon head of the Eastern branch, and also where the adjoining Dangaard and Explorer canyons intersect the main axis ( Figure 3, 6-8). These are areas expected to experience higher disturbance regimes as mobile sandy sediments are routinely reworked over the tidal cycle, forming an unsuitable substratum for colonisation and abrasing the lower canyon walls. Additionally stochastic/episodic turbidity currents and mud-rich sediment gravity flows travel along the canyon's axis representing major disturbance events (Puig et al., 2014, Amaro et al., 2016. Johnson et al (2013) also attributed low diversity toward the bottom of canyon walls to increasing disturbance toward the canyon floor. It is therefore likely that disturbance is restricting faunal patterns across the canyon floor, and could explain the negative relationship of species richness and abundance with high current speed.
As the internal tide wave propagates, it generates vertical displacement of the isopycnal surfaces and associated nepheloid layers (Hall et al., 2017). Internal waves, turbulent mixing and downslope displacement of water can generally be associated with enhanced resuspension of POM and can control the development of nepheloid layers (Allen and Durrieu de Madron, 2009, Hall et al., 2017, Aslam et al., 2018. Examples of the internal tide interacting with topography enhancing local hydrodynamics to form efficient food supply mechanisms to the benthos, have previously been documented in the Baltimore Canyon (Demopoulos et al., 2017). In other settings the reliance of CWCs on local current regimes to deliver food from the surface has been stressed (Rengstorf et al., 2013, Mohn et al., 2014, Davies, 2009, Mienis et al., 2009, Soetaert et al., 2016. It is probable that a similar process is occurring in Whittard Canyon. Our models predict high diversity and probability of CWCs in areas of complex terrain, especially steep slopes that are critical and supercritical to the dominant semi-diurnal internal tide and experience moderate internal tide current speeds. In their study of nepheloid layers within Whittard Canyon, Wilson et al. (2015) found the distribution of nepheloid layers was associated with the criticality of the slope to the dominant semidiurnal internal tide. Intermediate nepheloid layers were associated with critical conditions, whilst supercritical conditions, that reflect wave energy back down slope to suspend material, were linked to the formation of intermediate nepheloid layers at greater depths. These correlated spatial patterns between canyon fauna, nepheloid layer distributions and criticality support the theory of the interactive processes of the internal tide (local hydrodynamics) and topography in generating spatial heterogeneity in food supply to which fauna respond.

Physical oceanography in canyon modelling
Despite hydrodynamics having been related to epibenthic fauna distributions in canyons (Hargrave et al., 2004, Cunha et al., 2011, Huvenne et al., 2011, Johnson et al., 2013, Fabri et al., 2017, Bargain et al., 2018, there is a paucity of work which really quantifies this relationship as we have done here. Of the few studies that have incorporated hydrodynamics into predictive models, authors also found current speed to be an important environmental predictor (Bargain et al., 2018). In other studies the variable aspect, or its derivative components eastness and northness, used as a proxy for current exposure, have been identified as an important predictor variable .
Our work has shown that by integrating high-resolution hydrodynamic data into predictive models we are able to capture greater environmental heterogeneity beyond that solely represented by terrain proxies (specifically areas of resuspension and nepheloid layer production), and in turn improved the precision of the predicted distribution maps.
Future modelling efforts would benefit from incorporating physical oceanography data. However, highresolution hydrodynamic models have only been developed for a subset of canyons and previous studies that integrated oceanographic data at low resolutions found it difficult to discriminate different environmental conditions (Davies et al., 2008, Davies andGuinotte, 2011). Consequently, integrating oceanographic data at an appropriate scale currently represents the main challenge of high-resolution canyon mapping.

Model limitations
Field validations of deep-sea predictive models have demonstrated that caution should be applied not to over-interpret results (Anderson et al., 2016b). In particular, high spatial heterogeneity in environmental conditions and localised faunal distributions can be difficult to model accurately (Anderson et al., 2016b). As such, model results should be viewed as representing suitable locations rather than actual distributions. The outputs from models are constrained by the data inputs (Lecours et al., 2015, Miyamoto et al., 2017, Misiuk et al., 2018, Porskamp et al., 2018, as demonstrated by our results which differed depending upon the inclusion of hydrodynamics (Table 3 and  The dependence of model performance on data resolution represents a limitation for deep-sea models (Lecours et al., 2015, Miyamoto et al., 2017, Misiuk et al., 2018, Porskamp et al., 2018. In our study the environmental variables temperature and salinity were extracted and interpolated from the FOAMM model that outputted the data at 7 km, which is too coarse a grid size to resolve the fine-scale heterogeneity that influences species distributions in Whittard Canyon. As a result these variables were not retained in the models. The inclusion of finer resolution temperature and salinity data would enable environmental heterogeneity in water mass characteristics to be better characterised. Unfortunately, such fine-scale modelling outputs are rarely available for canyons. Additionally, incorporating oceanographic data metrics of higher temporal resolution that capture temporal variability in addition to mean values, could further improve the predictive value of oceanographic variables, since species distributions are often limited by environmental extremes (Vasseur et al., 2014, Stuart-Smith et al., 2017. Our results suggest that food supply is an important factor influencing species distributions, as such, the inclusion of environmental variables that capture variability in food availability could provide further insights and improve variance explained by models. Lastly, increasing the number of groundtruthed samples, from across the different canyon environments could reduce heterogeneity in the dataset and enable more accurate modelling of species-environment relationships, so improving prediction outside the originally sampled area.
Despite the limitations of predictive modelling, as mentioned above, and despite the limitations of our specific dataset in Whittard Canyon, the results of this study still provide new insights in the functioning of submarine canyons, and in the processes that drive benthic faunal distributions in canyons.

Conclusion
In conclusion, our study has shown that the inclusion of high-resolution oceanographic data into predictive models of CWCs and epibenthic megafaunal biodiversity improves their performance. Our work builds upon previous studies that solely used indirect variables to capture information regarding physical oceanography and provides further evidence within a statistical modelling framework for the role of hydrodynamics, and principally the internal tide, in influencing faunal patterns in canyons.
Highest probability of CWCs and epibenthic diversity occur in areas of complex terrain that are subject to elevated current speed. These areas coincide with areas of probable resuspension and nepheloid layer distribution that represent enriched food resources for epibenthic canyon fauna. Future predictive modelling efforts would benefit from incorporating physical oceanography data at ecologically meaningful resolutions, based upon prior multiscale analysis, helping to ensure accurate habitat mapping of features of conservation interest, which will facilitate effective spatial management.