Data for non-invasive (photo) individual fish identification of multiple species

This paper describes data from five studies focused on the individual fish identification of the same species. The lateral images of five fish species are present in the dataset. The dataset's primary purpose is to provide a data to develop a non-invasive and remote method of individual fish identification using fish skin patterns, which can serve as a substitute for the common invasive fish tagging. The lateral images of the whole fish body on the homogenous background for Sumatra barb, Atlantic salmon, Sea bass, Common carp and Rainbow trout are available with automatically extracted parts of the fish with skin patterns. A different number of individuals (Sumatra barb – 43, Atlantic salmon – 330, Sea bass – 300, Common carp – 32, Rainbow trout - 1849) were photographed by the digital camera Nikon D60 under controlled conditions. The photographs of only one side of the fish with several (from 3 to 20) repetitions were taken. Common carp, Rainbow trout and Sea bass were photographed out of the water. Atlantic salmon was photographed underwater, out of the water, and the eye of the fish was photographed by the microscope camera. Sumatra barb was photographed under the water only. For all species, except Rainbow trout, the data collection was repeated after a different period (Sumatra barb – four months, Atlantic salmon – six months, Sea bass – one month, Common carp – four months) to collect the data for a study of skin patter changes (ageing). The development of the method for photo-based individual fish identification was performed on all datasets. The identification accuracy for all species for all periods was 100% using the nearest neighbour classification. Different methods for skin pattern parametrization were used. The dataset can be used to develop remote and non-invasive individual fish identification methods. The studies focused on the discrimination power of the skin pattern can benefit from it. The changes of skin patterns due to fish ageing can be explored from the dataset.


a b s t r a c t
This paper describes data from five studies focused on the individual fish identification of the same species. The lateral images of five fish species are present in the dataset. The dataset's primary purpose is to provide a data to develop a non-invasive and remote method of individual fish identification using fish skin patterns, which can serve as a substitute for the common invasive fish tagging. The lateral images of the whole fish body on the homogenous background for Sumatra barb, Atlantic salmon, Sea bass, Common carp and Rainbow trout are available with automatically extracted parts of the fish with skin patterns. A different number of individuals (Sumatra barb -43, Atlantic salmon -330, Sea bass -300, Common carp -32, Rainbow trout -1849) were photographed by the digital camera Nikon D60 under controlled conditions. The photographs of only one side of the fish with several (from 3 to 20) repetitions were taken. Common carp, Rainbow trout and Sea bass were photographed out of the water. Atlantic salmon was photographed underwater, out of the water, and the eye of the fish was photographed by the microscope camera. Sumatra barb was photographed under the water only. For all species, except Rainbow trout, the data collection was repeated after a different period (Sumatra barb -four months, Atlantic salmon -six months, Sea bass -one month, Common carp -four months) to collect the data for a study of skin patter changes (ageing). The development of the method for photo-based individual fish identification was performed on all datasets. The identification accuracy for all species for all periods was 100% using the nearest neighbour classification. Different methods for skin pattern parametrization were used. The dataset can be used to develop remote and non-invasive individual fish identification methods. The studies focused on the discrimination power of the skin pattern can benefit from it. The changes of skin patterns due to fish ageing can be explored from the dataset. ©

Value of the Data
• The dataset is the largest (the highest number of fish and the longest period of data collection) data collection suitable for the development and testing of the methods for individual fish identification. No similar dataset exists. This data could be useful to the researchers who deal with the identification of fish individuals welfare indicators and disease monitoring systems. • The computer vision specialist and fish biologist can benefit from the data.
• The data can be mainly used by the developers to create an automatic system for individual fish identification. • The discrimination power of different skin patterns can also be studied due to the high number of individual fish. • The skin pattern changes due to the ageing can be studied because the dataset contains long-term data collection for Atlantic salmon.

Objective
The reason for creating this dataset was the non-existence of a freely accessible dataset useful for individual fish identification and to study the discrimination power of fish skin patterns. Several studies on fish identification based on fish skin patterns exist, but the data are not accessible and it was designed mainly for human-based processing. This dataset contains the images of five fish species, a high number of individuals and long-term data collections.

Data Description
We collected a large dataset of five fish species for the development and testing of the methods for individual fish identification of the same species. Those data sets were used for the development of our own approaches for the identification of fish individuals. The basic description of the datasets is presented in Table 1 . Table 2 contains information about collected raw images of the fish, and Table 3 summarizes automatically extracted fish skin patterns (ROI) for identification. The data are grouped based on the fish species. For each data collection and data processing, the protocol in Zenodo system exists. This protocol describes experimental conditions/data processing and contains relevant images. All codes published in Zenodo system [1] .
Dataset of five fish species was collected: Atlantic salmon Salmo salar , Sumatra barb Puntigrus tetrazona , European seabass Dicentrarchus labrax , common carp Cyprinus carpio and rainbow HOG [5] Common carp dataset A small number of fish, out-of-water, four data collections in four months. Suitable for a test of pattern changes due to the ageing of the fish without an obvious skin pattern.

HOG
Rainbow trout A high number of fish, out-of-water, one data collection. Suitable for a test of pattern discriminative power of the skin pattern.
HOG Not yet published. trout Oncorhynchus mykiss . Sumatra barb data and common carp datasets has one protocol each. European seabass data devided into two protocol based on a experiment types: short term and lond term datasets. Atlantic salmon datasets devided into three protocols by the data types: eye dataset, tent dataset and aquarium dataset.

Experimental Design, Materials and Methods
The digital camera Nikon D90 was used for data collection of lateral images of all fish species. The microscope camera Dino-Lite AM3113T was used to collect Atlantic salmon and Sea bass eye images. Lateral images of the fish with the head orientated to the left were photographed. Different fish rotations and positions in the image were used to provide a rich dataset. The camera was perpendicular to the fish plane. The green homogenous background and controlled white light were used to simplify automatic fish detection. All fish were anaesthetized using clove oil for data collection. The fish images were saved in RAW format to keep the highest data quality.
Different regions (ROI) of the fish body were automatically extracted for different species as a pattern for individual identification. The extraction is based on the detection of the green background using the know background colour. The largest object is then detected in the background as the fish. The ROI is then extracted based on the height and width of the detected fish to keep the same ROI position on the fish's body. The ROI is saved in PNG format in the dataset.

Atlantic Salmon [6-8]
A total of 328 fish were used for the first data collection, with an average length of 29.5 ± 2.5 cm. Thirty of the fish were tagged with PIT tags and used for the next three data collections performed over six months at two months intervals. Three types of data were taken in each session: lateral view images of the fish out of the water (in a photographic tent) and underwater (in a small aquarium) and iris of the fisheye. On average, eight images of each fish were photographed. Three ROIs were extracted from the images. A central part of the fish body was extracted from underwater images and out-of-water images. The bounding box of the eye was extracted from microscope images.

Sumatra Barb [9]
43 fish individuals of Sumatra barb were used for imaging, with an average size of 41 mm. Data were collected two times during two weeks for fish inside the aquarium. Tree images of each individual were taken in every data collection. Four ROIs were extracted for identification.

European Seabass [10,11]
Totally 300 sea bass were used for the first data collection. The initial fish size was 15 -17 cm for approximately one-year old fish. Ten images of each fish were collected. Randomly selected 32 fish individuals out of the 300 fish were tagged by PIT tags for second data collection. The second data collection was performed in two months. Four different ROI (regions of interest) were used for identification.

Common Carp [12]
Four data collections were performed over the four-month. Thirty-two fish individuals were used. Up to ten images were collected per each fish with an average length of 42 cm. The age of the carp was approximately two years. Three ROIs were extracted for identification.

Rainbow Trout
one data collection was done for 1849 fish individuals of rainbow trout. The size of the fish was around 7-10 cm. Fifteen fish individuals were photographed together. The image was automatically divided into sub-images containing individual fish. Three images of each fish were taken at different angles. One ROI was extracted for identification ( Fig. 1 ). The individual identification experiments were performed for all species. The results of those experiments are fully presented in our papers. The datasets are scaled and focused on the different aspects of fish identification. The first dataset contains images of Sumatra barb. The identification methods can be first tested on this dataset because of the obvious stripe pattern on the fish and data collection close to the real conditions. We reached the classification accuracy of 100% [2] . The long-term patterns stability was tested on Atlantic salmon data on a higher number of individuals. We reported 100% identification accuracy using specific skin dot parametrization [3] . The possibility of using a fish eye pattern was also tested on this dataset [4] with lower accuracy (81%) than for the skin dot pattern. The identification of the fish without an obvious pattern was tested on Sea bass and Common carp dataset. In both cases, we reached 100% accuracy [5] . The identification of the largest number of fish (1602 individuals) was done for Rainbow trout. The discrimination power of the pattern was tested for fish numbers close to commercial fish production. The identification accuracy was 100%.

Ethics Statements
The work did not involve the use of human subjects, animal experiments, nor data collected from social media platforms. The fish anaesthesia and fish tagging were done according to the standard operating procedures and regulations of the infrastructure where the data were collected. The animal experiments complied with the EU Directive 2010/63/EU for animal experiments.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.