A field and video annotation guide for baited remote underwater stereo‐video surveys of demersal fish assemblages

Baited remote underwater stereo‐video systems (stereo‐BRUVs) are a popular tool to sample demersal fish assemblages and gather data on their relative abundance and body size structure in a robust, cost‐effective and non‐invasive manner. Given the rapid uptake of the method, subtle differences have emerged in the way stereo‐BRUVs are deployed and how the resulting imagery is annotated. These disparities limit the interoperability of datasets obtained across studies, preventing broadscale insights into the dynamics of ecological systems. We provide the first globally accepted guide for using stereo‐BRUVs to survey demersal fish assemblages and associated benthic habitats. Information on stereo‐BRUVs design, camera settings, field operations and image annotation is outlined. Additionally, we provide links to protocols for data validation, archiving and sharing. Globally, the use of stereo‐BRUVs is spreading rapidly. We provide a standardized protocol that will reduce methodological variation among researchers and encourage the use of Findable, Accessible, Interoperable and Reusable workflows to increase the ability to synthesize global datasets and answer a broad suite of ecological questions.


| INTRODUCTION
Our understanding of fish ecology and ability to manage populations appropriately require accurate data on species occurrence, abundance, body size, distribution and behaviour. Remote video-based sampling methods are increasingly being adopted due to: (a) their non-destructive nature, (b) ability to sample rare species (Goetze et al., 2019; Santana-Garcon, Goetze, Saunders, & Cappo, 2018) and to sample over broad depth ranges (Heyns-Veale et al., 2016; Wellington et al., 2018), (c) provision of a permanent record that can be reviewed to reduce interobserver variability (Cappo, De'ath, Stowar, Johansson, & Doherty, 2009), (d) ability to collect concomitant data on habitat (e.g. epibenthic cover and substrate; Bennett, Wilson, Shedrawi, McLean, & Langlois, 2016; Collins et al., 2017) and (e) provision of images for science communication.
Remote underwater video sampling methods are not subject to diver safety restrictions, nor do they suffer from the behavioural biases resulting from diver presence (Gray et al., 2016; Lindfield, Harvey, McIlwain, & Halford, 2014). Multiple remote systems can be deployed in the field concurrently to make efficient use of field time and enable spatially extensive sampling (Langlois, Radford, et al., 2012).
The use of bait with remote underwater video (BRUV) systems increases the relative abundance and diversity of fishes observed, particularly species targeted by fisheries, without precluding the sampling of fishes not attracted to bait (Coghlan, McLean, Harvey, & Langlois, 2017; Harvey, Cappo, Butler, Hall, & Kendrick, 2007; Speed, Rees, Cure, Vaughan, & Meekan, 2019). Biases associated with bait use have been discussed in various studies (Coghlan et al., 2017; Dorman, Harvey, & Newman, 2012; Goetze et al., 2015; Hardinge, Harvey, Saunders, & Newman, 2013). Variation in bait plume dispersal and in the sensitivity of different fish species to bait is poorly understood (Harvey et al., 2007) and species-specific (Bernard & Götz, 2012), with cryptic and sedentary species potentially under-represented (Stat et al., 2019; Watson, Harvey, Anderson, & Kendrick, 2005). Despite these limitations, BRUVs have been shown to provide relative measures of species richness and abundance for a range of species in a diverse array of conditions and habitats (Cappo, Harvey, & Shortis, 2006). BRUV systems with stereo-video cameras (stereo-BRUVs) enable precise measurements of body size (Harvey, Fletcher, & Shortis, 2001), which surpass estimates made by divers (Harvey et al., 2001). Both length and biomass distribution data are recognized as essential metrics for biodiversity conservation and fisheries management reporting.
Importantly, stereo-BRUVs provide comparable body size distribution data to fisheries-dependent methods such as trawls (Cappo, Speare, & De'ath, 2004), hook and line (Langlois, Fitzpatrick, et al., 2012) and trap fishing. Despite being considered unsuitable for estimating density, stereo-BRUVs provide a cost-effective and statistically powerful method to detect spatiotemporal changes in the relative abundance, length and biomass distribution of fish assemblages (Bornt et al., 2015; Harvey, Cappo, Kendrick, & McLean, 2013; Malcolm, Schultz, Sachs, Johnstone, & Jordan, 2015). However, in over 275 studies using stereo-BRUVs for a range of objectives (Supporting Information 1), Whitmarsh, Fairweather, and Huveneers (2017) found widespread variation in methodology, which may prevent interoperability of the data.
We provide a widely accepted protocol for the use of benthic stereo-BRUVs including information on design, field operation, image annotation, data validation, archiving and synthesis. By providing a standardized protocol for stereo-BRUVs surveys, we aim to reduce variation in methodologies among researchers, and encourage the use of Findable, Accessible, Interoperable and Reusable (FAIR, Wilkinson et al., 2016) workflows to increase the ability to synthesize datasets and answer broadscale ecological questions.

| STEREO-BRUVs DESIGN
Stereo-BRUV systems consist of a frame (Figure 1a), protecting two convergent video cameras inside waterproof housings, attached to a base bar (Figure 1b), with some form of baited container fixed in front of the cameras (Figure 1e). Systems are generally tethered by rope to surface buoys to facilitate relocation and retrieval.

| Cameras and photogrammetry
We recommend cameras with full high-definition resolution of at least 1,920 × 1,080 pixels (Harvey, Goetze, McLaren, Langlois, & Shortis, 2010) and a capture rate of at least 30 frames per second (note: some models of action cameras can overheat at high resolutions, e.g. 4K). Higher camera resolution will improve identification of fish and the pixel selection required for measurement. Higher frame rates reduce blur on fast-moving species. To maintain stereo-calibrations, cameras must have video stabilization disabled, and a fixed focal length can facilitate measurements both close to and far from the camera systems when correctly calibrated (Boutros, Shortis, & Harvey, 2015; Shortis, Harvey, & Abdo, 2009). The field of view should be standardized and chosen to limit distortion in the image (e.g. no more than a medium angle, ~95° H-FOV). When sampling demersal fish assemblages at typical maximum range (8 m) from the cameras, Boutros et al. (2015) suggested a camera separation <500 mm will result in a decrease in the accuracy of measurements, with measurement precision being a function of 1/(camera separation). Cameras are fixed to a rigid base bar to preserve the stereo-calibration required to calculate accurate length and range measurements (Boutros et al., 2015; Harvey & Shortis, 1995; Shortis et al., 2009). The stereo system pictured in Figure 1 uses two GoPro Hero 5 Black cameras, with camera housings separated by 700 mm with a 7° convergence angle on a steel base bar, although 500 mm with a 5° convergence angle is also common.
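The 1/(camera separation) relationship can be illustrated with the textbook range-error approximation for an idealized stereo pair. The sketch below is not the calibration model used by Boutros et al. (2015): it assumes parallel pinhole cameras (ignoring convergence angle, lens distortion and refraction at the housing port), and the function names and the 0.5-pixel matching error are hypothetical values chosen only for illustration.

```python
import math


def focal_length_px(image_width_px, hfov_deg):
    """Focal length in pixels of an ideal pinhole camera with this horizontal FOV."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def range_error_mm(range_mm, baseline_mm, focal_px, disparity_error_px=0.5):
    """Approximate range error: Z^2 / (f * B) * disparity error.

    Assumes a parallel pinhole stereo pair; the 0.5 px default is an
    illustrative guess, not a measured value.
    """
    return range_mm ** 2 / (focal_px * baseline_mm) * disparity_error_px


# Compare hypothetical camera separations at the typical 8 m maximum range.
f_px = focal_length_px(1920, 95)
for baseline_mm in (300, 500, 700):
    err = range_error_mm(8000, baseline_mm, f_px)
    print(f"separation {baseline_mm} mm: ~{err:.0f} mm range error at 8 m")
```

Under these assumptions, halving the camera separation doubles the expected range error at any given distance, which is consistent with the recommendation to avoid separations below 500 mm.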
Stereo-calibrations must be made both prior to and following a field campaign. Given the required tolerances involved with stereo-BRUVs design, we recommend seeking manufacture and calibration advice from recognized providers or adhering to strict specifications. Any changes in camera positioning (e.g. if a camera is dismounted during battery replacement) will disrupt the stereo-calibration, resulting in measurement error. For this reason, most 'off-the-shelf' housings remain unsuitable for stereo-BRUVs. Figure 1i provides an example of a camera that is secured to the housing faceplate to ensure stability.
Each housing and camera should be uniquely identified, ensuring the latter are only used on the system they are calibrated for.

| Bait
As a general rule, locally sourced, sardine-type oily bait is recommended (Dorman et al., 2012), as the oil disperses to attract fish.
Sourcing sardine bait locally from factory discards (e.g. fish heads, tails and guts) will reduce the survey's ecological footprint, cost of sampling and potential for disease translocation. We recommend 0.8-1 kg of roughly crushed bait, positioned between 1.2 and 1.5 m in front of the cameras with the mesh bait bag as close to the benthos as possible. Positioning further than 1.5 m from the camera will reduce the ability to identify and measure individuals. Bait should be replaced after each deployment.

| Deployment duration
Benthic stereo-BRUVs should be deployed for a standard duration.
Deployments of 30 min have been demonstrated to be sufficient for sampling particular species of finfish on shallow temperate reefs (Bernard & Götz, 2012;Harasti et al., 2015).

| Sampling design
FIGURE 1 Equipment required for baited remote underwater stereo-video system surveys, including (a) mild-steel galvanized frame and bridle, (b) stereo base-bar and camera housings, (c) rope with detachable float line and two floats, (d) storage container for equipment and bait, (e) PVC bait arm (reinforced with fiberglass rod) with mesh bait bag and supporting metal diode arm, (f) metal weights for deep-water or strong current, (g) long-armed glove for handling bait, (h) dry kit including spare cameras, spare batteries, battery charger, micro-SD card reader, micro-SD cards, standard tools, cable ties to secure bait bags, silicone grease for o-rings and (i) calibrated cameras securely fixed to face plates

Sampling strategies should be designed to ensure valid inferences and interpretations of resulting data (Smith, Anderson, & Pawley, 2017). We recommend spatially balanced statistical routines, such as the R package MBHdesign, which can incorporate environmental information and legacy sites to create sampling designs with known inclusion probabilities (Foster et al., 2017). Due to the need to revisit each site to retrieve stereo-BRUVs after deployment, spatially balanced designs may be inefficient for sampling large regions (>10 min transit time between samples) and clustered sampling designs may be preferred (Hill et al., 2018).
Individual stereo-BRUVs samples should be separated when set simultaneously to reduce the likelihood of non-independence due to individuals being concurrently sampled by adjacent stereo-BRUVs.
Separation distance will depend on the mobility of the species and the habitat being studied; for typical demersal fish assemblages, a minimum of 400 m for 1-hr deployments is recommended, or 250 m for 30-min deployments (Cappo et al., 2001).
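As an illustration of how the separation guidance above could be checked before a field day, the following sketch flags pairs of planned sites that fall closer together than the recommended minimum. The site names, coordinates and function names are hypothetical, distances use the haversine great-circle approximation, and the check only matters for systems set simultaneously.

```python
import math
from itertools import combinations

# Minimum separation (m) by deployment duration (min), from the text above.
MIN_SEPARATION_M = {60: 400.0, 30: 250.0}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def too_close(sites, duration_min):
    """Return pairs of site names closer together than the recommended minimum."""
    min_sep = MIN_SEPARATION_M[duration_min]
    return [
        (name_a, name_b)
        for (name_a, pos_a), (name_b, pos_b) in combinations(sites.items(), 2)
        if haversine_m(*pos_a, *pos_b) < min_sep
    ]


# Hypothetical planned sites as (latitude, longitude).
planned = {"A": (-32.000, 115.500), "B": (-32.001, 115.500), "C": (-32.010, 115.500)}
print(too_close(planned, duration_min=60))
```

Here sites A and B are roughly 110 m apart and would be flagged for a 1-hr deployment, whereas C is about 1 km from both.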

| Field logistics
Vessels fitted with a swinging davit arm, or pot-tipper and winch are ideal for deploying and retrieving stereo-BRUVs in deeper waters (Figure 2); however, light-weight stereo-BRUVs (Supporting  Myers, Harvey, Saunders, & Travers, 2016). When sampling in low light conditions, both blue (450-465 nm) and white (550-560 nm) lights can be used. White can provide the best imagery for identification (Birt, Stowar, Currey-Randall, McLean, & Miller, 2019), but blue has been found to avoid potential behavioural biases and reduce backscatter from plankton at night (Fitzpatrick, McLean, & Harvey, 2013). Field methodology checklists are provided in Supporting Information 3.

| Software
Software specifically designed to annotate and measure fish from stereo-video will substantially increase the cost-efficiency and con-

| Annotation metadata
Field metadata (Supporting Information 4) should be used to populate a unique code for each sample and annotation set. Time on the seabed should be annotated to provide a start time for the stereo-BRUVs deployment period. It is important that the link between annotations and imagery is maintained.

| Abundance estimates
We recommend all fish be identified to the lowest taxonomic level possible. The standard metric of abundance is MaxN, the maximum number of individuals of a given species present in a single video frame (Priede, Bagley, Smith, Creasey, & Merrett, 1994). MaxN is widely used for BRUVs (Whitmarsh et al., 2017), as it is conservative and ensures that no individual is counted more than once (Schobernd, Bacheler, & Conn, 2013). While it has frequently been suggested that MaxN underestimates both small- and large-bodied individuals, the only study so far to evaluate this has found MaxN provides a representative sample of size distributions (Coghlan et al., 2017). Synchronized and calibrated left and right cameras allow the analyst to determine the range of fish in the field of view and ensure they are within a predefined distance from the cameras. Typically, fish are counted within a maximum distance of 8 m, beyond which length estimates are likely to be inaccurate unless specialist calibrations have been conducted.
Annotations of the current MaxN may be updated when individual fish are more clearly visible, and therefore easier to measure, by taking photogrammetric measurements of individual body length at the last MaxN annotated.
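The MaxN definition above can be sketched in a few lines. This is a minimal illustration rather than the logic of any annotation software: it assumes that per-frame counts of each species have already been annotated as (time, species, count) records, and the record layout and example values are hypothetical.

```python
def max_n(frame_counts):
    """Return {species: (MaxN, time_s of the frame where MaxN first occurred)}.

    frame_counts is an iterable of (time_s, species, count) records, where
    count is the number of individuals of that species visible in one frame.
    """
    best = {}
    for time_s, species, count in frame_counts:
        # Keep the largest single-frame count seen so far for each species.
        if species not in best or count > best[species][0]:
            best[species] = (count, time_s)
    return best


# Hypothetical annotations from one deployment.
observations = [
    (120, "Chrysophrys auratus", 2),
    (300, "Chrysophrys auratus", 5),
    (900, "Chrysophrys auratus", 3),
    (300, "Pseudocaranx spp", 1),
]
print(max_n(observations))
```

Because only the single largest frame count is retained, individuals that leave and re-enter the field of view cannot be counted twice, which is the conservative property described above.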

| Body size measurements
Synchronized and calibrated stereo-video streams are used to accurately measure body size. All individuals of each species should be measured at their MaxN. We recommend measuring fork length rather than total length, as it is more easily definable across a range of species. Biomass estimates typically rely on total length, but fork length to total length conversions can be used to complete these calculations (Froese & Pauly, 2019). For species where total length can be unreliable or there is no definable fork, body size is estimated using other measures (e.g. disk measurements for rays). Photogrammetric length measurements are typically made with some degree of error, which can be minimized by measuring individuals when they are as close to the cameras as possible, with both the nose and the tail-fork clearly visible, still or slowly moving, at an angle <45° perpendicular to the cameras and straight (not bent from turning). Defining cut-offs for measurement error across projects will help to maintain accurate and precise body size estimates; we provide recommended stereo-measurement length rules for EventMeasure in Supporting Information 5. If fish cannot be measured within these parameters, a '3D point' may be used for annotation, which records the 3D location of the fish to ensure it is within the sampling area (Harvey, Fletcher, Shortis, & Kendrick, 2004). To create a relative abundance metric standardized to a consistent sample area, abundance should be summed from the lengths and 3D points at the MaxN for each species. For biomass estimates, 3D points provide a basis for extrapolating a median length value to fish that could not be measured (Wilson, Graham, Holmes, MacNeil, & Ryan, 2018).
When large tightly packed schools are encountered, fish that cannot be measured should have 3D points. When lengths or 3D points are not possible for every fish, multiple individuals can be assigned to a single length or 3D point, but care should be taken to represent the range of body sizes within a school.
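The abundance and biomass calculations described in this section can be sketched as follows. This is an illustrative example only: it assumes the standard length-weight relationship W = a × L^b with L in cm and W in g, that the coefficients a and b match the length type measured (otherwise a fork length to total length conversion is needed first), and the coefficient values and function name are hypothetical.

```python
from statistics import median


def summarise_species(lengths_mm, n_3d_points, a, b):
    """Return (relative abundance, biomass in g) for one species at its MaxN.

    Abundance is the number of length measurements plus the number of 3D
    points. Fish recorded only as 3D points are assigned the median of the
    measured lengths before weights are calculated.
    """
    abundance = len(lengths_mm) + n_3d_points
    if not lengths_mm:
        # No measured fish, so there is no median length to extrapolate from.
        return abundance, None
    filled_mm = list(lengths_mm) + [median(lengths_mm)] * n_3d_points
    biomass_g = sum(a * (length / 10) ** b for length in filled_mm)  # mm -> cm
    return abundance, biomass_g


# Three measured fish and two 3D points; a and b are made-up coefficients.
abundance, biomass_g = summarise_species([200, 300, 400], n_3d_points=2, a=0.01, b=3.0)
print(abundance, round(biomass_g, 1))
```

In practice, a and b would be taken from a published source for each species, and whether to use a per-deployment or pooled median length is a design choice left to the analyst.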

| Behaviour
A range of behavioural observations, including time of first arrival, time to first feed and minimum approach distance, may also be calculated (Coghlan et al., 2017;Goetze et al., 2017).

| Interoperable and reproducible annotations
Video imagery enables annotators to work collaboratively to ensure identifications are consistent. A library of reference images, such as that supported by EventMeasure, will assist with identification and training.
It is acknowledged that some genera cannot be consistently identified to species level from imagery, so individuals are recorded at genus or family level (e.g. flathead: Platycephalus spp.). For unidentified individuals, a common convention is that fish that are potentially identifiable later are annotated to Genus sp1-10; this permits a batch rename at a later stage if the species is successfully identified. Individuals that are clearly unidentifiable to species are annotated as Genus sp.

| Habitat classification
Information on relief, habitat types and benthic composition (e.g. percent cover of benthos types) should be recorded from each deployment (Bennett et al., 2016; Collins et al., 2017), to facilitate investigation of fish-habitat relationships and to enable the sampling field of view to be standardized or controlled for subsequent data analysis. It is important that these data are annotated consistently and it is recommended that they are mapped to the CATAMI classification scheme (Althaus et al., 2015) and a 0-5 estimate of benthic relief (Polunin & Roberts, 1993; Wilson, Graham, & Polunin, 2007). An example of habitat composition and relief annotation schema is provided in a GitHub repository (Langlois, 2017).

| Quality control and data curation
Quality control and data curation are vital to ensure FAIR data workflows (Wilkinson et al., 2016). All corrections should be made within the original annotation files to ensure data consistency over time.
We recommend the following approaches to ensure quality control:
• Annotators should complete 'training' videos where species IDs and MaxN are known and can be used to assess competency.
• A different annotator should complete the MaxN and length measurement annotations to provide an independent check of the species identifications.
• Quality assurance should be carried out by a senior video analyst or researcher and involves a random review of 10% of annotated videos and data within a project. If accuracy is below 95% for all identifications and estimates of MaxN, reannotation should be undertaken.
• Unique identifiers of annotators and dates of when imagery was annotated should be maintained to provide a data checking trail (see Supporting Information 4).
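The random 10% review and 95% accuracy threshold described above could be implemented along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the review sample is rounded up to at least one video, and accuracy is taken as the proportion of checked identifications and MaxN estimates that the reviewer agrees with, which is one possible reading of the criterion.

```python
import math
import random


def select_for_review(video_ids, percent=10, seed=None):
    """Randomly choose the given percentage of annotated videos (at least one)."""
    video_ids = list(video_ids)
    k = max(1, math.ceil(len(video_ids) * percent / 100))
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return sorted(rng.sample(video_ids, k))


def needs_reannotation(n_checked, n_correct, threshold=0.95):
    """True if reviewer agreement falls below the accuracy threshold."""
    return (n_correct / n_checked) < threshold


# Hypothetical project with 40 annotated videos.
to_review = select_for_review(range(1, 41), percent=10, seed=1)
print(to_review)
print(needs_reannotation(n_checked=200, n_correct=189))
```

Recording the seed alongside annotator identifiers and dates would allow the review sample itself to be reproduced as part of the data checking trail.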
R workflows and function packages are provided in a GitHub repository (Langlois, 2020) to enable validation with regional species lists and likely minimum and maximum sizes for each species.
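The kind of validation described here can be illustrated with a short, language-agnostic sketch. The example below is written in Python and does not reproduce the R workflows in the cited repository; the record layout, species list and size limits are hypothetical.

```python
def validate(annotations, regional_species, size_limits_mm):
    """Return a list of (row index, species, problem) for suspicious records.

    annotations: list of dicts with 'species' and optional 'length_mm'.
    regional_species: set of species names expected in the region.
    size_limits_mm: {species: (likely minimum, likely maximum)} in mm.
    """
    issues = []
    for i, row in enumerate(annotations):
        species = row["species"]
        if species not in regional_species:
            issues.append((i, species, "not in regional species list"))
        length = row.get("length_mm")
        limits = size_limits_mm.get(species)
        if length is not None and limits is not None:
            smallest, largest = limits
            if not smallest <= length <= largest:
                issues.append((i, species, f"length {length} mm outside {smallest}-{largest} mm"))
    return issues


# Hypothetical records: the second has an implausible length, the third an
# unexpected species name.
records = [
    {"species": "Chrysophrys auratus", "length_mm": 350},
    {"species": "Chrysophrys auratus", "length_mm": 3500},
    {"species": "Unlisted species", "length_mm": 200},
]
flagged = validate(records, {"Chrysophrys auratus"}, {"Chrysophrys auratus": (50, 1300)})
for issue in flagged:
    print(issue)
```

Flagged records are candidates for review against the original imagery rather than automatic deletion, in keeping with the recommendation that corrections be made within the original annotation files.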

| Data storage, discoverability and release
We encourage open data policies and recommend archiving and sharing stereo-BRUVs annotations on global biodiversity data repositories, such as Ocean Biogeographic Information System, Global Biodiversity Information Facility and the recently developed GlobalArchive (globalarchive.org). GlobalArchive is a centralized repository that allows open access and private sharing of fish image annotation data from stereo-BRUVs or similar imagery-based sampling techniques. GlobalArchive allows users to store data in a standardized and secure manner and makes metadata discoverable, thus encouraging collaboration and synthesis of datasets within the community of practice. We recommend all quality-controlled annotation data and any associated calibration, taxa and habitat data should be uploaded to GlobalArchive and we encourage that all data should be made publicly available via the public data option. As an example, the Australian standards for data management, discoverability and release are provided in Supporting Information 6.

| CONCLUSION
Globally, stereo-BRUVs usage is increasing rapidly. The standardization of stereo-BRUVs surveys and annotation will facilitate the synthesis of comparable data over continental and global scales and provide rich and interoperable data to inform natural resource management. Variation in methodology has constrained the interoperability of these data to date (Whitmarsh et al., 2017). We encourage researchers to standardize and share technical improvements and issues via an established on-line forum or working group (Supporting Information 7).
Achieving consistent field methodology and FAIR annotation, together with data archiving and sharing protocols, remains the greatest challenge to the globally consistent uptake and impact of stereo-BRUVs.
We provide a standardized protocol that will reduce methodological variation among researchers and encourage the use of FAIR workflows to increase the ability to synthesize datasets and answer a range of ecological questions.

ACKNOWLEDGEMENTS
The authors would like to thank James Seager (SeaGIS.com.au) for support with software and both James Seager and Ray Scott for

AUTHORS' CONTRIBUTIONS
All authors conceived the ideas and designed methodology; T.L. and J.G. led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13470.

DATA AVAILABILITY STATEMENT
No data were presented.