Improving predictive mapping of deep-water habitats: Considering multiple 1 model outputs and ensemble techniques

36 In the deep sea, biological data are often sparse; hence models capturing relationships 37 between observed fauna and environmental variables (acquired via acoustic mapping 38 techniques) are often used to produce full coverage species assemblage maps. Many 39 statistical modelling techniques are being developed, but there remains a need to determine 40 the most appropriate mapping techniques. Predictive habitat modelling approaches 41 (redundancy analysis, maximum entropy and random forest) were applied to a heterogeneous 42 section of seabed on Rockall Bank, NE Atlantic, for which landscape indices describing the 43 spatial arrangement of habitat patches were calculated. The predictive maps were based on 44 remotely operated vehicle (ROV) imagery transects, high-resolution autonomous underwater 45 vehicle (AUV) sidescan backscatter maps and ship-based multibeam bathymetry. Area under 46 the curve (AUC) and accuracy indicated similar performances for the three models tested, but 47 performance varied by species assemblage, with the transitional species assemblage showing 48 the weakest predictive performances. Spatial predictions of habitat suitability differed 49 between statistical approaches, but niche similarity metrics showed redundancy analysis and 50 random forest predictions to be most similar. As one statistical technique could not be found 51 to outperform the others when all assemblages were considered, ensemble mapping 52 techniques, where the outputs of many models are combined, were applied. They showed 53 higher accuracy than any single model. Different statistical approaches for predictive habitat 54 modelling possess varied strengths and weaknesses and by examining the outputs of a range 55 of modelling techniques and their differences, more robust predictions, with better described 56 variation and areas of uncertainties, can be achieved. As improvements to prediction outputs 57 can be achieved without additional costly data collection, ensemble mapping approaches have 58 clear value for spatial management. 59

performance varied by species assemblage, with the transitional species assemblage showing 48 the weakest predictive performances. Spatial predictions of habitat suitability differed 49 between statistical approaches, but niche similarity metrics showed redundancy analysis and 50 random forest predictions to be most similar. As one statistical technique could not be found 51 to outperform the others when all assemblages were considered, ensemble mapping 52 techniques, where the outputs of many models are combined, were applied. They showed 53 higher accuracy than any single model. Different statistical approaches for predictive habitat 54 modelling possess varied strengths and weaknesses and by examining the outputs of a range 55 of modelling techniques and their differences, more robust predictions, with better described 56 variation and areas of uncertainties, can be achieved. As improvements to prediction outputs 57 can be achieved without additional costly data collection, ensemble mapping approaches have 58 clear value for spatial management. 59 KEYWORDS: Cold-water corals, Deep sea, Ensemble approaches, Habitat mapping, 60 Megabenthos 61 62 As the anthropogenic footprint extends deeper into our oceans, reliable descriptions of 63 the seafloor and the species present are required to devise appropriate management and 64 conservation measures. With very limited areas of seafloor mapped at comparable resolution 65 to terrestrial environments (Sandwell et al., 2006), quantitative spatial information regarding 66 distributions of marine biotic and abiotic components is needed to build benthic habitat maps 67 (Kostylev et al., 2001). Recent advances in acoustic techniques for seafloor mapping (Brown 68 et al., 2011) have made it possible to create detailed geomorphological maps more rapidly. 69

INTRODUCTION
However, the biological information needed to supplement complete coverage topographic 70 and geological maps has remained limited owing to the time-consuming process of specimen 71 collection and taxonomic identification (Przeslawski et al., 2011). 72 Full coverage biological sampling is often not an option, and hierarchical approaches 73 involving nested survey designs are often employed. They involve a combination of broader-74 scale geological map creation based on acoustic data, and detailed ground-truthing biological 75 studies covering smaller spatial extents, often taking the form of imagery transects (Elvenes 76 et al., 2014; Robert et al., 2015). These broader-scale geological maps can be used to define 77 habitat patches allowing the relationships between the spatial arrangement of these patches 78 within the surrounding landscape and their effect on species spatial patterns (Turner and 79 Gardner, 1991) to be examined, modelled and used to make biological predictions across the 80 larger extent covered by the acoustic surveys. The spatial arrangement of habitat patches can 81 be described using a variety of class and landscape metrics, the former used to describe 82 properties of patches from a single habitat type while the latter are used to characterise all 83 patches present within a landscape (McGarigal et al., 2012). Although such metrics have 84 been shown to help explain species spatial patterns (Teixidó et al., 2002), they have so far 85 rarely been employed for predictive mapping. 86 overlapping, but slightly extended area was put forth as candidate Special Area of 136 Conservation (cSAC) with the main aim of protecting stony and biogenic reefs (JNCC, 137 2010), a habitat listed under Annex I of the Habitat Directive (92/43/EEC). As such, AUV 138 mapping was conducted in areas outside of the Fisheries Closure, but still inside the cSAC 139 (M44 and M45) as well as inside both protected areas (M43), to identify the status of the 140 seabed habitats. ROV imagery transects were positioned to sample a variety of sediment 141 types within each of the three areas, including areas of high backscatter likely to harbour 142 cold-water corals. To reduce the influence of spatial autocorrelation, images were 143 systematically subsampled into 8 groups in which neighbouring pictures were located at a 144 distance of 40m ( Figure 2). 145 All individual organisms larger than 1 cm were counted and identified, using 146 morphospecies when species-level identification could not be achieved. 2009; JNCC, 2010) and taxonomic resources (Hayward and Ryland, 1995 ;Mortensen, 151 1927). Sponges were only described to morphological categories as outlined in Bell and 152 Barnes (2001). Parallel lasers (with 10 cm separation) were mounted on the ROVs to provide 153 a scale on all recorded images. Positioning was achieved using the ROVs' ultra-short 154 baseline (USBL) navigation systems. Only common species, which occurred in at least 10 155 images, were retained for the analysis, which was carried out with the images as sampling 156 images were located every 40m. Three statistical approaches were applied separately to each 172 partition and the results were evaluated using the partition whose images were halfway 173 (20m). For each partition, the three statistical approaches were combined to form ensemble 174 models. These steps were carried out for four species assemblages. 175 176 Environmental descriptors were derived from the sidescan backscatter maps 178 (EdgeTech FS2200, 410 kHz). These maps had been classified into sediment interpretation 179 maps (0.5x0.5 m pixel size) representing six seabed facies (soft and mixed sediments, hard 180 substratum, exposed bedrock as well as coral stand and rubble) using an unsupervised 181 classification (Robert et al., 2014). From the sediment interpretation maps, class and 182 landscape indices were derived to describe the shape, size, diversity and spatial arrangement 183  Table 1). Bathymetry 185 and CTD derived environmental variables were examined, but as they did not significantly 186 improve the models, they were not included and are not discussed further. The R package 'SDMTools' was used to compute the metrics and the package 'Snowfall' 193 was used to run the computations in parallel. On smaller datasets, these computations could 194 easily be accomplished on a regular desktop computer (see Appendix A for R code).

Predictive Modelling 233
Four species assemblages (A1-Parastichopus tremulus, A2-Munida sarsi and 234 associated species, A3-Reteporella sp. and various sponge spp., and A4-Lophelia pertusa 235 and associated species) were identified using K-mean classification, ANOSIM and 'species Habitat suitability predictions were made separately for each of the four species assemblages. 263 Presence/absence predictions were obtained by setting the threshold level to optimize 264 sensitivity and specificity. 265

Random Forest 266
Random Forest (RF) is a technique that allows for the building of multiple trees for a 267 dataset, hence the term forest (Breiman, 2001). Each tree is built based on a sub-sample of 268 the data and at each node the data are split based on the best predictor variable, selected out 269 of a smaller number of randomly selected variables. A probability estimate can be obtained 270 based on the number of votes given to each class for a given pixel. Forests were built using a 271 varying number of trees and environmental variables, but a forest containing 1,000 trees and 272 considering 15 environmental predictors per node was selected. 273

Model Evaluation 274
To minimize spatial autocorrelation between the training and testing datasets, 275 systematic data splitting was carried out. For each of the 8 data partitions, the dataset whose 276 images were located at a distance of 20m (for example models based on partition 3 were 277 assessed using images in partition 7, Figure 2 partitions. This measure can vary from 0 (no overlap) to 1 (identical niches). 290

Ensemble Predictions 291
Considering that different models are likely to produce different predictive outputs, 292 but with each containing separate information and areas of uncertainties, the idea of ensemble 293 predictions is to summarise a range of potential outcomes to produce more robust predictions 294 (Araújo and New, 2007). Using the same partitioning of training and test datasets as 295 previously described, for each partition, AUC values for the ensembles were calculated by 296 averaging probability maps from all three models for each species assemblage. Accuracy of 297 the ensemble predictions was calculated by first assigning, for each statistical approach and 298 partition, the species assemblage with the highest predicted probability of occurrence. 299 Subsequently, for each partition, majority voting was carried out based on the species 300 assemblage predicted by each statistical technique. To obtain a visual depiction of prediction 301 confidence, the number of models in agreement at each pixel was also calculated. 302

303
For the combined JC-060 and JC-073 datasets, a total of 11,268 individual organisms 304 were observed from 38 morphospecies (present in at least 10 images The three models showed differences in the maps of habitat suitability for the various 315 species assemblages, but measures of environmental niche indicated similarities between 316 model predictions (Table 3)    The areas of variability also differed between models ( Figure 3) and ensemble 386 predictions (Figure 4 and Appendix B) made by combining all three models exhibited a 387 slightly higher accuracy across species assemblages than could be obtained based on any 388 single model (Table 4)

395
By taking advantage of species-environment relationships, abiotic proxies can provide 396 direct applications for the management of natural resources by establishing representations of 397 biotic components via high resolution acoustic survey techniques. The spatial arrangement of 398 habitat patches was successfully included to predict the spatial patterns of four species 399 assemblages across a highly heterogeneous area of seabed. No single approach consistently 400 surpassed the others across species assemblages and although differences occurred between 401 spatial predictions of habitat suitability from the different statistical approaches, ensemble 402 models appeared as a meaningful improvement. 403

Model Predictions 404
Of the three models (RDA, RF and MaxEnt) compared in this study, similar AUC values 405 were obtained, but performance varied by species assemblage. As species turnover generally 406 occurs over a gradient, the predictions showed a similar pattern, and overlap between habitat 407 suitability predictions occurred, particularly between Assemblage A3 and A4. This is to be 408 expected as cold-water corals need hard substratum for attachment .  (Wilson, 1979) and in turn provide hard substratum to a number of species. Assemblage A2 427 appeared as a transition between the more defined hard substratum and soft sediment 428 associated fauna, and as such prediction performance for this assemblage generally tended to 429 be lower. Across models, areas of highest disagreement tended to occur at the edge of patches 430 and highlighted the difficulty associated in delineating hard boundaries for otherwise 431 continuous gradients of species assemblages. Albeit at a larger scale, higher levels of 432 discrepancies between modelling techniques have been shown to occur at the edge of a 433 species distribution (Grenouillet et al., 2011). Assemblage A2 tended to be found in 434 particularly complex areas where a high number of patches, of both hard and soft sediments, 435 appeared. On the other hand, Assemblage A1 was found in areas characterised by few large 436 patches in proximity to other large soft sediment patches. Assemblage A3 or even A4 were 437 generally found in regions of harder substratum, particularly if bedrock was present. 438 As these three statistical approaches are based on very different modelling strategies, 439 differences in their predictions are to be expected. Presence-absence models generally 440 provide more information about less suitable habitats (if adequate absences are available). As 441 this information is not available to presence only models, overestimation of suitable habitats 442 can occur (Brotons et al. 2004;Pearson et al. 2006). Results can also depend on species 443 characteristics, with generalist species being more difficult to predict accurately, and absence 444 data being more valuable for such species (Brotons et al., 2004;Marmion et al., 2009a). This 445 might be another reason why lower prediction performances were obtained for Assemblage 446 A2. Overall MaxEnt tended to show a lower niche similarity than RDA and RF, which may 447 be due to its different data requirement. In the case of RDA, classification into assemblages 448 was only conducted after predictions of individual species, and as such could be more 449 affected by difficulties associated with predicting rarer species. However, since species are 450 predicted instead of assemblages, it might also be possible to define potentially new assemblages as occurring in areas outside of the originally sampled locations (Ferrier and 452 Guisan, 2006). RF predictions for Assemblage A4 equalled those of the ensemble model. 453 Other studies have found RF to often equal ensemble approaches (Grenouillet et al., 2011;454 Marmion et al., 2009b), potentially because it already includes a consensus step and might be 455 less affected by species geographical attributes, such as prevalence, range and spatial 456 autocorrelation (Marmion et al., 2009a). On the other hand, Meynard and Quinn (2007)  457 found that although GAM tended to outperform classification trees under many simulated 458 scenarios, the latter were particularly effective at predicting species displaying threshold 459 (on/off) response curves to environmental variables. In the case of Assemblages A3 and A4, 460 a threshold response to the presence of hard substratum could be expected while 461 Assemblages A1 and A2 may be more likely to exhibit more continuous response curves. have the disadvantage of causing issues of spatial autocorrelation which need to be taken into account in order to adequately capture predictive ability (Hirzel and Guisan, 2002;Legendre 477 et al., 2002). In our study, this effect was mitigated through a subsampling scheme which 478 increased distances between sample images used for model building. 479

Ensemble Mapping for Conservation 480
Comparison of the statistical approaches showed differences in predictions, but a single 481 approach did not consistently outperform the others when multiple species assemblages were 482 considered. Instead, our results suggest that taking into account the output of many different 483 models may provide a valuable alternative. Ensembles can be created using an array of 484 approaches (Marmion et al., 2009b), but even the relatively simple approach taken in this 485 study was effective at optimizing different model strengths and increasing accuracy. All 486 three statistical approaches were included in the ensemble mapping of all four species 487 assemblages, but in other cases, the consideration of thresholds for the exclusion of lower 488 performing models may also be valuable. In any case, diversity in the type of approaches 489 selected is needed to increase the likelihood of obtaining better performing ensemble models 490 (Du et al., 2012). Identifying regions of prediction disagreement across models also provides 491 an easy to understand depiction of spatial uncertainties. uncertainties, may not be as important as having a consistent approach with minimum 498 deviation over time from which to monitor change (Strong, 2015). Employing multiple 499 models can increase the variability as some models may perform less adequately for certain 500 assemblages and make it more difficult to assess the degree of change across surveys. 501 However, this should still not preclude the examination of the data using multiple statistical 502 approaches, as one approach may be more sensitive to a given environmental variable and be 503 able to detect change earlier. Once prediction similarly has been ascertained, the final 504 measure of extent could still rely on one specific technique for consistency. 505 Cold-water corals can have a strong impact on local diversity and much effort is being 506 made to improve their conservation (Roberts and Hirshfield, 2004), but owing to limited data 507 spatial planning often must rely only on spatial predictions of habitat suitability. Even so, 508 these maps provide greater insights into their spatial distribution patterns, which helps in 509 understanding their ecology and supports adequate management better than single point 510 observation obtained from limited imagery transects or physical samples. As illustrated by 511 the case of Rockall Bank, different statistical approaches may provide different predictive 512 maps of coral suitability. Predictions of assemblage A4 (mostly composed of the cold-water 513 coral L. pertusa and associated filter-feeding species), the least common assemblage, were 514 particularly sensitive to changes in modelling approach. For example, if only random forest 515 had been considered, it would have been tempting to conclude that area M44 was as suitable 516 a conservation area as M45. However, M44 was only found to contain coral rubble in ROV 517 video surveys, likely resulting from past trawling activities. Ensemble models better 518 represented the spatial patterns observed in the video survey as they highlight areas where 519 predictions were consistent across at least two models. 520 Even though it is the broader-scale patterns in species distributions that may be of 521 interest for management purposes, it is the fine-scale habitat characterisation of the 522 environment, through high-resolution sidescan sonar mapping, that allowed the heterogeneity 523 of the region to be accurately captured and the driving processes identified. The ship-board 524 bathymetry survey carried out during JC-60 covered less than 10% of the 4,365 km 2 525 conservation zone and took approximately 2.3 days. Although of much higher resolution than other datasets available for the remainder of this area, compared to the even higher 527 resolutions obtained with the AUV, the ship-board dataset was of limited use in explaining 528 species distribution patterns for the extent covered in this survey (Robert et al. 2014). It is 529 clear that AUV mapping shows great promise for marine management; however there 530 remains a distinct trade-off between the resolution achieved and the extent that can be 531 covered. With current AUV technologies, Autosub6000 can be sent out from a ship to 532 autonomously map an area for ~30 hrs, covering a distance of ~150 km (the size of the 533 resulting area mapped will vary based on the acquired resolution) (Wynn et al., 2012). In 534 order to map the entirety of the conservation zone to the resolution acquired in this study, 535 >200 days would be required. This is well outside the scope of most scientific cruises or 536 conservation projects, but AUVs have been successfully employed to target certain features 537 in other conservations zones such as Haig Fras and the Darwin Mounds (Wynn et al., 2012). 538 The Marine Autonomous and Robotic Systems (MARS) facility is also currently working on 539 developing long-distance AUVs which could be deployed from shore to reach the closer 540 offshore conservation areas with the aims of eventually covering greater extents at high 541 resolutions and instituting repeat long-term monitoring of specific areas without the need for 542 expensive ship-based surveys. 543

Conclusion 544
Predictive habitat maps are of great use for marine management as they represent the 545 best available information to support decision making, but, as they are typically based on a 546 very limited amount of data, they should only serve as general guides until more data become 547 available. The presentation of uncertainty maps should help emphasize this point and can be 548 employed to help select target areas for which further biological sampling will be particularly 549 valuable. Uncritical reliance on a particular statistical method, without comparison with 550 others, may lead to decisions being biased by the chosen method since predictions made from 551 different modelling strategies have been shown to give differing outputs, but whose 552 combination into ensemble models can lead to increased accuracy. Comparison between 553 statistical methods showing one method to outperform the others may not always be 554 extendable to other habitats, species or assemblages, and similarly our results cannot be 555 perfunctorily generalized to all habitats. However, in cases where one statistical approach 556 cannot be identified as performing significantly better, ensemble approaches may provide an 557 elegant alternative. Although this approach can be more involved than other techniques, the 558 additional work requires no further costly sampling or access to specialized equipment and 559 potential increases in prediction performances are clearly of value for spatial planning. 560 561 ACKNOWLEDGEMENTS 562 We would like to thank the captain, crew, technicians and scientific party of cruises 563 JC-060 and JC-073. We would also like to acknowledge the following funding sources: HPC Facility, and associated support services at the University of Southampton. Finally, a 570 special thanks to Prof.Paul Tyler for helpful discussions and support throughout the PhD.