A fully-annotated imagery dataset of sublittoral benthic species in Svalbard, Arctic

Underwater imagery is widely used for a variety of applications in marine biology and environmental sciences, such as classification and mapping of seabed habitats, marine environment monitoring and impact assessment, and biogeographic reconstructions in the context of climate change. This approach is relatively simple and cost-effective, allowing the rapid collection of large amounts of data. However, due to the laborious and time-consuming manual analysis procedure, only a small part of the information stored in archives of underwater images is retrieved. Emerging deep learning methods open up the opportunity for more effective, accurate and rapid analysis of seabed images than ever before. We present annotated images of bottom macrofauna obtained from underwater video recorded in the European Arctic waters around Spitsbergen Island, Svalbard Archipelago. Our videos were filmed in both the photic and aphotic zones of polar waters, often influenced by melting glaciers. We used artificial lighting and shot close to the seabed (<1 m) to preserve natural colours and avoid the distorting effect of turbid water. The underwater video footage was captured using a remotely operated vehicle (ROV) and a drop-down camera. The footage was converted to 2D mosaic images of the seabed. The 2D mosaics were manually annotated by several experts using the Labelbox tool, and co-annotations were refined using the SurveyJS platform. This set of carefully annotated underwater images, linked to the original videos, can be used by marine biologists as a biological atlas, and by practitioners in the fields of machine vision, pattern recognition, and deep learning as training material for the development of tools for automatic analysis of underwater imagery.


Value of the Data
• The dataset presents annotated images of Arctic bottom macrofauna derived from underwater video. It can be useful both as a biological atlas and as training material for automatic segmentation solutions. Seabed imagery data can be used for multiple purposes in marine biology and environmental sciences, for example benthic habitat classification and mapping, marine environmental monitoring, impact assessment, and biogeographical reconstructions in the context of climate change.
• A set of carefully annotated underwater images, linked to source videos, is of great value both for marine biologists and for practitioners working in the fields of Machine Vision, Pattern Recognition, Machine Learning, and Deep Learning.
• The data will be used for the development of methods and tools for automatic identification of biological categories in underwater imagery, semantic image segmentation, object detection, and automatic characterization of the seabed. The data might also be used for validation of various machine vision applications (e.g. automatic identification of biological organisms in underwater imagery), educational purposes (e.g. training material for marine scientists), and other tasks.
• There is a lack of annotated underwater imagery datasets, with just a few recently published cases [1][2][3] featuring coarse categories from various camera angles. Liu and Fang [1] collected 2537 images with 16 categories (nautilus, squid, plant, coral, fish, jellyfish, dolphin, sea lion, Syngnathus, turtle, starfish, shrimp, octopus, seahorse, person, stone). The SUIM dataset [2] contains 1635 images with 7 categories (human diver, aquatic plant or sea-grass, wreck or ruins, robot, reef and invertebrates, fish and vertebrates, sea-floor or rock). Martin-Abadal et al. [3] annotated 483 images of Posidonia oceanica meadows.
• The most similar dataset to ours in terms of biological accuracy and seabed aspect is the coral reef study by King et al. [4], where 9511 cropped images of one object representing 10 categories (Acropora palmata, Orbicella spp., Siderastrea, Porites astreoides, Gorgonia ventalina, sea plume, sea rod, algae, rubble, sand) were prepared. More coral reef transects exist [5,6], as well as a web-based solution for coral reef analysis, CoralNet [7].
• Our video was captured in both the photic and aphotic zones of polar waters, often in the vicinity of melting glaciers. We used artificial lighting and shot close to the seabed to preserve natural colours and avoid the distorting effect of turbid waters.

Data Description
We present visual data of bottom macrofauna filmed in the sublittoral zone of the European Arctic (Svalbard). Some of the areas (Burgerbukta, Borebukta, Dahlbrebukta, St. Johnsfjorden, Trygghamna) are in the vicinity of melting glaciers; others are in ice-free areas (Adriabukta, Eidembukta, Gipsvika). The dataset [8] consists of three types of data:
a) Video samples. In total, 22 min 51 s of video footage was filmed and split into 10-30 s segments, resulting in 47 video samples; the frame rate was reduced to 3-5 fps.
b) 2D mosaics. All video samples were converted into still images (video mosaics), which were manually analysed by marine biologists, specialists in the Arctic biota, who identified visible biological objects at the lowest possible taxonomic level. Twelve taxa were targeted for annotation (see Fig. 1): brown alga (kelp) Laminaria sp., benthic trachymedusa Ptychogastria polaris, burrowing anemone Halcampa sp., tube anemone Ceriantharia sp., tube-dwelling Polychaeta, spider crab Hyas sp., shrimps, brittle stars Ophiuroidea, sea star Urasterias lincki, sea squirts Tunicata, snailfishes Liparidae, and flatfishes Pleuronectiformes.
c) Annotations. The annotation process, in which four experts performed manual pixel-wise segmentation (see Fig. 2) and a mask refinement survey (see Fig. 3), resulted in 2242 annotated objects, with the most frequent category being Ophiuroidea. The annotation outcome is summarized by listing the mosaics for each category label (see Table 1) and the category labels for each mosaic (see Table 2); a minimal summarization sketch is given after this list. An example of a 2D mosaic, the mosaic with masked objects, and their overlay is shown in Fig. 4.
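As an illustration only (not part of the published dataset tooling), the sketch below shows how per-category and per-mosaic summaries of the kind shown in Tables 1 and 2 could be derived from a flat annotation table; the file name and the column names "mosaic" and "category" are assumptions and may differ from the actual export.

```python
# Minimal sketch of summarizing annotations per category and per mosaic,
# assuming a flat table with one row per annotated object. The file name and
# the column names "mosaic" and "category" are hypothetical.
import pandas as pd

ann = pd.read_csv("annotations.csv")

# Table 1-style view: which mosaics contain each category label
mosaics_per_category = ann.groupby("category")["mosaic"].unique()

# Table 2-style view: which category labels occur in each mosaic
categories_per_mosaic = ann.groupby("mosaic")["category"].unique()

# Object counts per category (Ophiuroidea is reported as the most frequent)
print(ann["category"].value_counts())
```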

Underwater video
Underwater video data were acquired with an ROV equipped with a low-resolution analogue camera on a tilted unit for navigation and a primary camera. The primary camera was mounted vertically and had a 3-CCD sensor, Full HD (1920 × 1080) resolution, a high-quality Leica Dicomar lens, and 10× optical zoom. Its lighting system consisted of 16 bright LEDs in 4 × 4 stations. The ROV was used in Borebukta, Dahlbrebukta, Eidembukta, Gipsvika, St. Johnsfjorden and Trygghamna. The drop-down camera was equipped with an analogue camera of 700 TV lines (TVL) resolution for live view and a digital camera (Panasonic HX-A500) that recorded the material at high resolution (1280 × 720 px) on a memory card. The drop-down camera was used in Adriabukta and Burgerbukta. During filming, the camera moved at about 1 knot to avoid motion blur, and the camera altitude over the seabed was 0.4-0.5 m to ensure optimal lighting conditions. Stations near glaciers had very turbid water because of the inflow of glacial meltwater; at those stations, colours were slightly washed out due to light scattering on the suspended particles, but the imagery was still useful.

2D seabed mosaics
Video mosaicking is the process of converting a video sample into a single still image composed of overlapping video frames. For the pre-mosaicking process, raw videos were divided into 10-30 s video segments. The frame size was reduced, and the frame rate was lowered to 3-5 fps to shorten computing time. Each frame was enhanced for more accurate pair-wise registration, and the video mosaics were then produced from the original non-enhanced video footage using the pair-wise registration data. The algorithms for video mosaicking were developed by Rzhanov et al. [9,10]. Taxonomic identification of benthic species was carried out with the help of specialists using a digital catalog, in which more than 40 biological (fish, benthic invertebrates, algae, etc.) and physical (stones, substrate, burrows, footprints, etc.) categories were identified. For simplicity, we selected the 12 most prominent categories for annotation. No image post-processing was applied to the stitched mosaics, although we note that a large diversity of potentially useful water-effect removal methods exists: from enhancement-based to restoration-based and even deep-learning-based post-processing [11].
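The mosaicking software itself follows the algorithms of Rzhanov et al. [9,10] and is not reproduced here. Purely as an illustration of the pre-mosaicking idea, the sketch below shows how a video segment could be down-sampled to roughly 3-5 fps and how pair-wise registration between consecutive frames could be estimated with OpenCV; the file name, target frame rate, and feature/matching choices are assumptions, not the actual pipeline.

```python
# Illustrative pre-mosaicking sketch (NOT the mosaicking software of
# Rzhanov et al. [9,10]): down-sample a video segment to ~3-5 fps and
# estimate pair-wise homographies between consecutive frames with OpenCV.
# The file name and parameter values are placeholders.
import cv2
import numpy as np

def extract_frames(path, target_fps=4):
    """Read a video segment and keep frames at roughly target_fps."""
    cap = cv2.VideoCapture(path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(1, round(src_fps / target_fps))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

def pairwise_homographies(frames):
    """Estimate a homography for each pair of consecutive frames."""
    orb = cv2.ORB_create(2000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    homographies = []
    for prev, curr in zip(frames, frames[1:]):
        g1 = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
        kp1, des1 = orb.detectAndCompute(g1, None)
        kp2, des2 = orb.detectAndCompute(g2, None)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        homographies.append(H)
    return homographies

frames = extract_frames("segment_01.mp4", target_fps=4)  # hypothetical file name
registrations = pairwise_homographies(frames)
```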

Mosaic annotation
The 47 prepared large 2D mosaics were uploaded to the online collaborative annotation platform Labelbox [12]. A new project was created by configuring the default editor (video, image, and text annotation) to have 12 categories (termed OBJECTS in the interface) and inviting the team members to join. All mosaics were inspected, and the identified objects were segmented by four different marine biology experts using the polygon tool (see Fig. 2). Since all the experts had all mosaics available, there was an intentional overlap between many segmented objects. The annotation results, with URL links to the mosaics and the generated masks, were downloaded in .json and .csv formats.
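The exact export schema depends on the Labelbox version used. Assuming a .json export in which each entry carries the mosaic URL and a list of segmented objects with links to their binary masks, the masks could be fetched roughly as in the sketch below; all field names here are assumptions and may need to be adjusted to the actual export.

```python
# Sketch of reading a Labelbox-style .json export and downloading the mask
# images. The field names ("Labeled Data", "Label", "objects", "instanceURI",
# "value", "ID") are assumptions about the export schema.
import json
import urllib.request

with open("labelbox_export.json") as f:   # hypothetical export file name
    export = json.load(f)

for entry in export:
    mosaic_url = entry["Labeled Data"]            # URL of the 2D mosaic
    for i, obj in enumerate(entry["Label"]["objects"]):
        category = obj["value"]                   # e.g. "ophiuroidea"
        mask_url = obj["instanceURI"]             # URL of the binary mask image
        out_name = f"{entry['ID']}_{category}_{i}.png"
        urllib.request.urlretrieve(mask_url, out_name)
```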

Mask refinement
Expert annotations downloaded from Labelbox were later processed with an R script to form a survey covering all masks (both those overlapping between experts and unique ones) in .json format. The correctly formatted .json survey was uploaded to the SurveyJS platform [13] for serving and collecting expert responses on each annotated object (see Fig. 3 for a survey question example). The resulting .json structure for an example survey with a single question is detailed in Table 3.
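The original survey was generated with an R script, and its exact structure is given in Table 3. Purely as an illustrative sketch, the snippet below builds one such "picker-dropdown" question programmatically and serializes it to SurveyJS-compatible JSON; the question names, image URLs, and category list are placeholders.

```python
# Illustrative generation of a single "picker-dropdown" survey question in
# SurveyJS-compatible JSON (the original pipeline used an R script for this
# step). Question names, image URLs and the category list are placeholders.
import json

def make_question(object_id, mask_urls, categories):
    """Image picker for the candidate masks plus a dropdown to correct the category."""
    picker = {
        "type": "imagepicker",
        "name": f"mask_{object_id}",
        "title": f"Choose the best mask for object {object_id}",
        "choices": [
            {"value": f"mask_{i}", "imageLink": url}
            for i, url in enumerate(mask_urls)
        ],
    }
    dropdown = {
        "type": "dropdown",
        "name": f"category_{object_id}",
        "title": "Check the assigned category and change it if needed",
        "choices": categories,
        # shown only after a mask has been selected (cf. Table 3)
        "visibleIf": f"{{mask_{object_id}}} notempty",
    }
    return [picker, dropdown]

# More questions would be created by appending further picker-dropdown pairs
# to the "elements" array.
survey = {"elements": make_question(
    "obj001",
    ["https://i.imgur.com/example1.png", "https://i.imgur.com/example2.png"],
    ["Ophiuroidea", "Shrimps", "Laminaria sp."],
)}
print(json.dumps(survey, indent=2))
```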
Post-processing of annotation results was as follows (a minimal sketch of steps 2 and 9 is given after this list):
1. Find objects segmented by only a single expert.
2. Find objects segmented by several experts simultaneously; create a new synthetic mask as the union of the two masks with the highest overlap.
3. For each object, cut out its view from the mosaic to get a background image.
4. For each mask of the object, create an overlay to get overlaid images.
5. Upload the background and overlaid images to the free image hosting service Imgur.
6. For each object, make a survey question using an image picker and a dropdown (see Table 3).
7. Upload the generated .json structure to the SurveyJS platform for survey serving (see Fig. 3).
8. Share the survey link with the experts and ask them to fill out the survey, where they could:
   a. discard an object if all masks look inappropriate;
   b. choose the best mask for an object using the image picker;
   c. check the assigned category and change it using the dropdown if needed.
9. Download the survey results and choose the best mask by majority voting.
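The sketch below illustrates steps 2 and 9 only, assuming each expert mask is available as a boolean NumPy array of identical shape; the overlap measure (IoU) is our assumption for "highest overlap", and 2-2 ties are resolved separately as described in the following paragraph.

```python
# Minimal sketch of post-processing steps 2 and 9: build a synthetic mask as
# the union of the two most-overlapping expert masks, and pick the final mask
# by majority voting over the survey answers. Masks are assumed to be boolean
# NumPy arrays of identical shape; 2-2 ties are handled separately.
from collections import Counter
from itertools import combinations
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def synthetic_mask(masks):
    """Union of the two expert masks with the highest pairwise overlap."""
    i, j = max(combinations(range(len(masks)), 2),
               key=lambda p: iou(masks[p[0]], masks[p[1]]))
    return np.logical_or(masks[i], masks[j])

def majority_vote(answers):
    """Mask identifier chosen most often; None if all experts discarded the object."""
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None
```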
A few questions where each expert had chosen a different candidate mask were reviewed together to arrive at a consensus. There were also some questions where one mask was chosen by 2 experts and another mask was also chosen by 2 experts. This kind of tie was resolved by preferring a synthetic mask (if it existed among the choices made) or choosing between the two selected masks at random.

Table 3
Example JSON structure for a single survey question (columns: survey part, details, JSON code). More questions would be created by repeating the "picker-dropdown" sequence inside the elements array. Survey logic for the dropdown element was configured so that it becomes visible only after a mask is selected. The JSON code can be copied and pasted into the JSON Editor tab at https://surveyjs.io/create-survey and then previewed live in the Test Survey tab.

Declaration of Competing Interest
None.