Datasets for face and object detection in fisheye images

We present two new fisheye image datasets for training object and face detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway.


Data
We present two new datasets, VOC-360 and Wider-360, for visual analytics based on fisheye images. The datasets contain raw data files: JPG images (both datasets), XML annotations (VOC-360), and MAT file annotations (Wider-360). VOC-360 can be used to train machine learning models for object detection, classification, and segmentation. Wider-360 can be used to train face detectors. Links to the data are given in the specification table on the last page of the paper.
VOC-360 contains 39,575 images and the corresponding annotations. The images and annotations are derived from the VOC2012 dataset [1]. The data is organized into five directories: Annotations, fisheye, fisheye_class, fisheye_object, and ImageSets, as depicted in Fig. 1(a). Each fisheye image in the fisheye directory corresponds to one XML file with the same filename in the Annotations directory. The XML files provide all the annotations for each image, following the same structure as the original VOC2012 dataset. The fisheye_class and fisheye_object directories contain the fisheye image masks with pixel-wise segmentations giving the class of the object visible at each pixel. The ImageSets directory contains text files that list the images for training, testing, and validation.
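Because the XML files follow the VOC2012 structure, they can be read with a standard XML parser. The following is a minimal sketch, assuming the usual VOC2012 tags (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`); the sample annotation, its filename, and the helper name `read_voc_objects` are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the VOC2012 layout; the filename and box values are made up.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_objects(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        # Coordinates are parsed via float() first in case a file stores them as decimals.
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((name, coords))
    return objects


if __name__ == "__main__":
    for name, coords in read_voc_objects(SAMPLE_XML):
        print(name, coords)
```

For a file on disk, the same function can be applied to the text read from the XML file that shares its filename with the fisheye image.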
Wider-360 contains 63,897 images, of which 50,982 images are intended for training and 12,915 images for validation/test, as shown in Fig. 1(b). Following the original Wider Face dataset [2], the images are organized into 61 directories depending on the type of the scene, as shown in Fig. 1(c). The annotations are in the form of face bounding boxes and are stored in two MAT files, one for the training set and another for the validation set.
Sample images from VOC-360 and Wider-360 are shown in Figs. 2 and 3, respectively.

Experimental Design, Materials, and Methods
The images were obtained from existing public datasets, VOC2012 [1] and Wider Face [2], and transformed into fisheye-looking images. The corresponding annotations were also converted into the fisheye image coordinate system. Square patches were sampled from the original images and converted to fisheye-looking images using a generic two-step transformation [3]. Equation (1) converts the square patch to a circular patch. Here, $(x, y)$ are the normalized coordinates of the square patch, such that the center of the square patch is at $(0, 0)$, while the four corners have the coordinates $(\pm 1, \pm 1)$. The output coordinates $(x', y')$ refer to the produced circular patch. Equation (2) further squeezes the circular image towards the perimeter. Here, $r = \sqrt{(x')^2 + (y')^2}$ is the radial distance from the center of the circular patch, while the output coordinates $(x'', y'')$ refer to the final, fisheye-looking circular patch.
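The first step, mapping a square patch onto a circular patch, can be illustrated with a short sketch. The code below uses the elliptical grid mapping, a well-known square-to-disc mapping that matches the description above (center fixed, corners mapped onto the unit circle). Whether it coincides exactly with equation (1) is an assumption, and the second step (equation (2)) is not reproduced here. The function names `normalize` and `square_to_disc` are hypothetical.

```python
import math


def normalize(px, py, size):
    """Map pixel coordinates in a size x size patch to [-1, 1] x [-1, 1]."""
    half = (size - 1) / 2.0
    return (px - half) / half, (py - half) / half


def square_to_disc(x, y):
    """Elliptical grid mapping from the square [-1, 1]^2 to the unit disc.

    Assumed stand-in for equation (1); not taken from the dataset's Matlab code.
    """
    x_out = x * math.sqrt(1.0 - y * y / 2.0)
    y_out = y * math.sqrt(1.0 - x * x / 2.0)
    return x_out, y_out


if __name__ == "__main__":
    # A corner of a 101 x 101 patch lands on the unit circle.
    cx, cy = square_to_disc(*normalize(100, 100, 101))
    print(cx, cy, math.hypot(cx, cy))
```

Under this mapping every point of the square lands inside or on the unit circle, since $(x')^2 + (y')^2 = 1 - (1 - x^2)(1 - y^2) \le 1$ for $|x|, |y| \le 1$.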
An alternative method to generate a fisheye-looking image from a rectangular or square image was proposed in [4], although no public data was provided with that work. The method in [4] is based on equidistant projection and requires the user to specify the focal length of the camera. Fig. 4 shows three images: a real fisheye image taken by the Ricoh Theta V 360-degree camera (left), an image generated from a square original by the proposed method (middle), and an image generated from the same original by the method from [4] (right) using one of the suggested focal length values ($f = 242$). The image generated by our method (middle) is a closer approximation to a real fisheye image (left).
Fig. 4. Fisheye image from the Ricoh Theta V 360-degree camera (left), image generated by our proposed method (middle), and an image generated by the method from [4] (right).

Each ground-truth annotation was converted to the new coordinate system. For segmentation annotations, the segmentation mask was treated as an image, and the transformation specified by equations (1) and (2) was applied to generate the segmentation mask for the fisheye image, as shown in the right part of Fig. 2. For bounding-box annotations, eight points were selected around the bounding box: four corner points and four edge midpoints, as depicted in Fig. 5. The same transformation specified by equations (1) and (2) was applied to these eight points to map them to the fisheye image coordinate system. The new points define a polygon in the fisheye image coordinate system, as shown in the right part of Fig. 5. The minimum axis-aligned bounding rectangle of these eight new points, shown in green in Fig. 5, was then computed and stored as the new bounding-box annotation for the fisheye image.

Fig. 5. Conversion of a bounding-box annotation from square-patch coordinates to circular (fisheye) coordinates.
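The bounding-box conversion described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab implementation: box coordinates are assumed to be already normalized to $[-1, 1]$, and the default `transform` is the elliptical grid mapping, used here only as a stand-in for the full transformation of equations (1) and (2). The names `eight_points` and `convert_box` are hypothetical.

```python
import math


def square_to_disc(x, y):
    """Stand-in transformation (elliptical grid mapping).

    An assumption for illustration; it does not reproduce equations (1)-(2).
    """
    return x * math.sqrt(1.0 - y * y / 2.0), y * math.sqrt(1.0 - x * x / 2.0)


def eight_points(xmin, ymin, xmax, ymax):
    """Return the four corners and four edge midpoints of an axis-aligned box."""
    xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    return [
        (xmin, ymin), (xmid, ymin), (xmax, ymin),
        (xmax, ymid),
        (xmax, ymax), (xmid, ymax), (xmin, ymax),
        (xmin, ymid),
    ]


def convert_box(box, transform=square_to_disc):
    """Map the eight points and return their minimum axis-aligned bounding rectangle."""
    mapped = [transform(x, y) for x, y in eight_points(*box)]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A box covering the whole square patch, in normalized coordinates.
    print(convert_box((-1.0, -1.0, 1.0, 1.0)))
```

With this stand-in mapping, the corners of a box covering the whole patch move inward to about (±0.707, ±0.707), while the edge midpoints stay at distance 1 from the center. A rectangle computed from the corners alone would therefore be too small, which illustrates why the edge midpoints are included.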