Method for Automatic Segmentation of Vehicles in Digital Images

Introduction. Modern systems for active vehicle safety are designed to significantly reduce the number of road accidents. Sensors based on monocular cameras are increasingly being introduced by the world's leading automakers as an effective tool for improving traffic safety. Modern methods of localisation and classification, combined with semantic segmentation algorithms, allow an image to be divided into independent groups of pixels corresponding to each object. However, the problem of developing segmentation algorithms that ensure improved segmentation quality remains to be solved.
Aim. To develop an automatic method for segmenting a given object during image analysis.
Materials and methods. An automatic method for segmenting vehicles in an image is proposed. The method performs semantic segmentation of the object of interest based on a priori information about the bounding boxes that frame the objects in the image. Bounding box information is used to transform the image into a polar coordinate system, where the pixels of the image act as the vertices of a weighted graph. A closed contour around the object of interest is obtained by applying a shortest path search algorithm followed by the inverse transformation to the Cartesian coordinate system.
Results. The experiments confirmed the correctness of the area of interest selected by this algorithm. The Jaccard similarity coefficient for the Carvana open database is 85 %. Furthermore, the proposed method was applied to different classes of images from the Pascal VOC database, thus demonstrating the ability to segment objects of other classes.
Conclusion. The main contributions of the proposed method are as follows: 1) segmentation of the object of interest at the level of modern methods, and in some cases exceeding it; 2) a new approach to tracing object contours.


The use of sensors based on a monocular camera can solve a wide range of problems. For example, it is possible to estimate the size of a vehicle and the distance to it on the basis of data concerning the shape and area of the vehicle along with other vehicle design features. Using advanced classification, detection and segmentation algorithms, while also considering possible limitations of vehicle behaviour based on the theory of vehicle movement, it is additionally possible to predict vehicle behaviour. However, in order to solve these problems, it is necessary to describe the road scene as a composition of objects having a specific shape, area and mutually-defined location.
In 2012, the convolutional neural network AlexNet, trained by Krizhevsky, Sutskever and Hinton, won an image classification competition, prompting the world community to look at image analysis methods in a new way. The method developed by Krizhevsky et al. surpassed all the classical methods of computer vision presented at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [2]. Against this background, and owing to the general availability of digital cameras, the analysis of images based on colour information from a single camera has become one of the most widely developed areas of machine vision. Currently, systems based on convolutional neural networks are the most accurate approaches to image classification and object detection. Neural networks have also been successfully applied to various other types of problems, for example [3][4][5].
The vast majority of the existing object detectors are focused on two-dimensional localisation. A 2D object detection model provides information
It is necessary to perform segmentation for further analysis of the detected object. Segmentation is usually understood as the division of an image into a multitude of disjoint connected areas (segments). During segmentation, each image pixel is assigned a class according to some characteristic or calculated property, for example, colour, brightness, or texture. The result of image segmentation is a set of segments, which together cover the entire image. Image segmentation permits a description of the scene as an object composition comprising shape, area, relative position, brightness and texture parameters.
Currently, there are many methods of image segmentation, such as [11- level methods, -TurboPixels.
Methods. In this article, the objects of interest are motor vehicles. When segmenting detected objects in the relevant images, the position of the rectangle bounding the object is regarded as a priori information (Fig. 2, a).
To solve the problem of vehicle image segmentation, the author of this article has developed an image processing algorithm based on the search for the shortest path in a weighted graph represented in the polar coordinate system. The basic steps of the algorithm are as follows.
Step 1. Scaling up the image. The image is scaled by means of bilinear interpolation [14] such that the aspect ratio of the image is 1:1 (Fig. 2, b). Scaling allows the polar pole to be centred at each point (see Step 4 below).
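The scaling in Step 1 can be illustrated with a minimal sketch in plain Python. It assumes a greyscale crop stored as a list of rows and assumes that the shorter side is stretched to the length of the longer one; the source only states that the result has a 1:1 aspect ratio. The names `resize_bilinear` and `to_square` are hypothetical.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a greyscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column to a fractional source column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def to_square(img):
    """Assumption: stretch the shorter side so that the result is square."""
    side = max(len(img), len(img[0]))
    return resize_bilinear(img, side, side)


crop = [[0, 10, 20, 30],
        [40, 50, 60, 70]]   # 2 x 4 crop of a detected object
square = to_square(crop)    # 4 x 4 result
```

In practice a library routine would normally be used for this step; the loop form is shown only to make the interpolation explicit.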
Step 2. Image processing by the Canny edge detection operator [15]. The main stages of this algorithm:
1. The Gaussian filter is applied to the image: where A is the pixel image matrix.
2. Image gradient projections are calculated by coordinates: as well as the direction of the gradient:
4. Pixel selection, where the gradient is the local maximum relative to adjacent pixels. These pixels are considered as candidates for the formation of the object boundary.
5. Two-threshold filtering allows all selected pixels of an image fragment to be split into three sets: the set of pixels having gradient values exceeding the upper threshold; the set of pixels having gradient values less than the lower threshold; and the set of pixels having gradient values between the two thresholds.
The pixels of the first set belong to the object boundaries, while the pixels of the second set make up the non-boundary areas of the background or object. Decisions on the third set of pixels are made according to the results of further processing.
6. Pixels of the third set belong to the object boundary if they are adjacent to boundary pixels. If these pixels are surrounded only by non-boundary pixels, they are non-boundary pixels.
7. The boundaries are finally determined by the trace operation. In this case, the thickness of borders is reduced to one pixel, gaps are filled, and branches of borders are processed. Tracing is performed by a cumulative analysis of the surroundings of each boundary pixel. As a result, the Canny edge detector forms the final representation of the object boundaries of the original image (Fig. 3).
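Stages 5 and 6 (two-threshold filtering and the decision on the intermediate set) can be sketched as follows. This is an illustrative sketch only: it assumes the gradient magnitudes are already available as a list of rows, uses 8-connectivity, and lets intermediate pixels join the boundary through chains of other intermediate pixels, as is usual in hysteresis thresholding. The function name `hysteresis` and the threshold values are hypothetical.

```python
def hysteresis(grad, low, high):
    """Classify pixels by two thresholds; keep weak pixels that touch the boundary."""
    h, w = len(grad), len(grad[0])
    # First set: gradient above the upper threshold, always on the boundary.
    strong = [(i, j) for i in range(h) for j in range(w) if grad[i][j] > high]
    edges = [[False] * w for _ in range(h)]
    for i, j in strong:
        edges[i][j] = True
    stack = list(strong)
    while stack:
        i, j = stack.pop()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and not edges[ni][nj]:
                    # Third set: between the thresholds; it joins the boundary
                    # once it is adjacent to a pixel already on the boundary.
                    if low <= grad[ni][nj] <= high:
                        edges[ni][nj] = True
                        stack.append((ni, nj))
    return edges


grad = [[10, 60, 200, 60, 10],
        [10, 10, 10, 10, 10],
        [60, 10, 10, 10, 10]]
edges = hysteresis(grad, low=50, high=100)
```

In this example the two pixels with value 60 next to the strong pixel are kept, while the isolated pixel with value 60 in the last row is rejected.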
Step 3. Polar transformation. The following transformation is used to describe the position of the point $M(x, y)$ in polar coordinates $r$ and $\varphi$ around the centre of the rectangle bounding the detected object:
$$r = \sqrt{x^2 + y^2}, \tag{1}$$
$$\varphi = \arctan\frac{y}{x}, \tag{2}$$
where $r$ is the polar radius (the distance from the point $M$ to the origin); $\varphi$ is the angle formed by the ray $OM$ with the polar axis.
The origin of the coordinates is the centre of the rectangle bounding the detected object, i.e. (1) and (2) take the following form:
$$r = \sqrt{\left(x - \frac{w}{2}\right)^2 + \left(y - \frac{h}{2}\right)^2}, \tag{3}$$
$$\varphi = \arctan\frac{y - h/2}{x - w/2}, \tag{4}$$
where $h$ and $w$ are the height and width of the bounding rectangle, respectively. Using (3) and (4), let us convert the image from the Cartesian coordinate system to the polar one (Fig. 4). Following this transformation, the object contour is located in the region $0 \le \varphi \le 2\pi$.
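A minimal sketch of the polar transformation about the centre of the bounding rectangle is given below. It assumes pixel coordinates measured from the top-left corner of the rectangle, uses `atan2` so that the angle is resolved over the full circle, and resamples the image by nearest-neighbour lookup; the function names `to_polar` and `unwrap` and the parameter `n_phi` are hypothetical.

```python
import math


def to_polar(x, y, w, h):
    """Polar coordinates of pixel (x, y) about the centre of a w x h rectangle."""
    dx, dy = x - w / 2, y - h / 2
    r = math.hypot(dx, dy)
    phi = math.atan2(dy, dx) % (2 * math.pi)   # angle folded into [0, 2*pi)
    return r, phi


def unwrap(img, n_phi=360):
    """Resample a square image onto a (phi, r) grid; rows are angles, columns radii."""
    h, w = len(img), len(img[0])
    n_r = min(w, h) // 2
    polar = [[0] * n_r for _ in range(n_phi)]
    for a in range(n_phi):
        phi = 2 * math.pi * a / n_phi
        for r in range(n_r):
            # Inverse mapping: find the source pixel for each (phi, r) cell.
            x = round(w / 2 + r * math.cos(phi))
            y = round(h / 2 + r * math.sin(phi))
            polar[a][r] = img[min(y, h - 1)][min(x, w - 1)]
    return polar


img = [[0] * 8 for _ in range(8)]
img[4][6] = 9                      # a single bright pixel to the right of the centre
polar = unwrap(img, n_phi=4)       # polar[0][2] now holds that pixel
```

Because the image rows grow downwards, the angle here increases clockwise on screen; this is a convention of the sketch rather than something stated in the source.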

 
Step 4. Finding the shortest path in a weighted graph. Let us imagine the image as a graph, the vertices of which are the pixels of the image in the polar coordinate system. Based on the fact that the detected object occupies the largest part of the image, its outer contour is formed by pixels having the highest radius values. Then the weights of the edges separating the two pixels at the point with the coordinates φ, r can be represented as follows: is the intensity of the pixel at the coordinates φ, r.
Assuming that the outer object contour is located in the region $0 \le \varphi \le 2\pi$ and that the graph weights depend on the value of the polar radius and the intensity of pixels, the segmentation task can be represented as a search for the shortest path in the weighted graph.
Graph traversal starts at the vertex $(0, r)$ and ends at the vertex $(2\pi, r)$. From the existing variety of methods using graph theory, the A* shortest path search algorithm has been chosen [16]. This algorithm finds the path of the lowest cost from a given starting vertex to the target vertex (one of one or more possible targets).
A* follows the path with the lowest sum of the known cost and the heuristic estimate:
$$f(v) = g(v) + j(v),$$
where $v$ is the current vertex; $g(v)$ is the smallest distance from the starting vertex to the current position; $j(v)$ is the heuristic function (Manhattan distance) estimating the distance from the current position to the final target.
Fig. 5 presents the resulting path with the lowest weight, calculated according to the algorithm A* for the image from Fig. 4.
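The search in Step 4 can be sketched as a generic A* on a grid whose rows are angle indices and whose columns are radius indices. The edge weights used in the article are not reproduced here; the sketch assumes a hypothetical cost grid in which `cost[i][j] >= 1` is paid on entering a cell, which keeps the Manhattan heuristic consistent. The function name `a_star` is likewise hypothetical.

```python
import heapq


def a_star(cost, start, goal):
    """Lowest-cost 4-connected path on a grid (assumed costs >= 1 per cell)."""
    rows, cols = len(cost), len(cost[0])

    def j_est(v):
        # Heuristic j(v): Manhattan distance from v to the goal.
        return abs(v[0] - goal[0]) + abs(v[1] - goal[1])

    g = {start: 0}                       # g(v): best known cost from the start
    parent = {}
    frontier = [(j_est(start), start)]   # ordered by f(v) = g(v) + j(v)
    while frontier:
        _, v = heapq.heappop(frontier)
        if v == goal:
            path = [v]
            while v in parent:           # walk back to the start
                v = parent[v]
                path.append(v)
            return path[::-1], g[goal]
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (v[0] + di, v[1] + dj)
            if 0 <= n[0] < rows and 0 <= n[1] < cols:
                new_g = g[v] + cost[n[0]][n[1]]
                if new_g < g.get(n, float("inf")):
                    g[n] = new_g
                    parent[n] = v
                    heapq.heappush(frontier, (new_g + j_est(n), n))
    return None, float("inf")


# Rows are angle indices, columns are radius indices; low cost marks the contour.
cost = [[9, 1, 1],
        [9, 9, 1],
        [9, 1, 1],
        [9, 1, 9]]
path, total = a_star(cost, start=(0, 1), goal=(3, 1))
```

Here the start and goal share the same radius index, mirroring the requirement that the path begins at $(0, r)$ and ends at $(2\pi, r)$ so that the contour closes after the inverse transformation.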
Step 5. Conversion of the obtained path into the Cartesian coordinate system with subsequent filling. Using the inverse transformation of expressions (3), (4), let us translate the obtained path into the Cartesian coordinate system. Since the shortest path obtained in Step 4 is a closed path describing the object of interest, filling the area inside this path yields the object mask. Applying the mask to the original image, a segmented image is obtained (Fig. 6).
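One way to sketch the conversion and filling of Step 5 is to test, for every Cartesian pixel, whether it lies closer to the centre than the contour does at the same angle. This assumes the path has been reduced to a single radius per angle, which the source does not state, and the function name `mask_from_polar_contour` is hypothetical.

```python
import math


def mask_from_polar_contour(radius, w, h):
    """Fill the region enclosed by a contour with radius[k] at angle 2*pi*k/len(radius)."""
    n_phi = len(radius)
    cx, cy = w / 2, h / 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x + 0.5 - cx, y + 0.5 - cy          # offset of the pixel centre
            phi = math.atan2(dy, dx) % (2 * math.pi)
            k = int(phi / (2 * math.pi) * n_phi) % n_phi  # angular bin of the pixel
            # Assumption: one contour radius per angular bin.
            if math.hypot(dx, dy) <= radius[k]:
                mask[y][x] = 1
    return mask


# A circular contour of radius 2 in a 6 x 6 image.
mask = mask_from_polar_contour([2.0] * 8, w=6, h=6)
```

Multiplying the original image by such a mask pixel by pixel leaves only the object of interest.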

Results. In order to assess the quality of the developed algorithm, the segmentation results were compared with the results of three standard methods of image segmentation: K-Means [17], GrabCut [18], and Mask R-CNN [19].
K-Means is a clustering algorithm that divides a set of vector space elements into a predefined number of clusters while minimising the total squared deviation of the points of each cluster from its centre. Each iteration recalculates the centre of mass of each cluster obtained at the previous iteration. Following this, the elements of the vector space are again divided into clusters according to the closest distance to the new centres. The algorithm ends when the cluster centres remain unchanged at the next iteration.
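The iteration described above can be sketched on scalar values such as pixel intensities. This is a simplified illustration, not the implementation used in the comparison; the function name `k_means`, the initial centres and the sample data are assumptions.

```python
def k_means(points, centres, max_iter=100):
    """Lloyd's iterations on scalar values (for example, pixel intensities)."""
    for _ in range(max_iter):
        # Assign every point to its closest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda c: abs(p - centres[c]))
            clusters[nearest].append(p)
        # Recompute each centre as the mean of its cluster (empty clusters stay put).
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:      # stop when the centres no longer move
            break
        centres = new_centres
    return centres, clusters


intensities = [10, 12, 14, 200, 210, 220]
centres, clusters = k_means(intensities, centres=[0.0, 255.0])
```

With two clusters, the dark and bright intensities separate after a single update of the centres.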
GrabCut [18] is an image segmentation method based on the GraphCut algorithm [20]. GrabCut extends GraphCut to the processing of colour images. Initially, the sets of pixels inside and outside the detected object are each approximated by a mixture of Gaussians, representing the target object and the background respectively. The resulting model is used to build a Markov random field with an energy function that favours connected pixels of the same class. After that, an optimisation based on the minimum graph cut is run.
Mask R-CNN [19] is a modern neural network architecture for object segmentation in images. It can be presented as the following modules: a feature extractor, which forms a three-dimensional matrix of features of the input image using the ResNet-50 convolutional neural network [21]; a Region Proposal Network, which generates the regions in which objects are present; fully-connected layers, which form a network that cuts out the region-specific part of the feature matrix for each region and provides the object class and a refined rectangle describing the object; and a module that generates binary masks within the regions where objects are present.
In order to evaluate the proposed method and compare it with the above algorithms, the Carvana segmented image database [22] was used. This resource contains 5,088 vehicle images of different classes, as well as masks for each image. Each image was scaled to a uniform size of 500×500 pixels (Fig. 7, a). The rectangular area in which the object of interest is located was selected based on the binary mask of the image (Fig. 7, b). The image enclosed within the frame bounding this area was used as input for the segmentation algorithms.
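Selecting the rectangular area from a binary mask can be sketched as follows, assuming the mask is a list of rows of 0 and 1. The function names `bounding_box` and `crop` are hypothetical and are not taken from the source.

```python
def bounding_box(mask):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) containing all non-zero pixels."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return None                      # empty mask: no object present
    return cols[0], rows[0], cols[-1], rows[-1]


def crop(img, box):
    """Cut the rectangle (inclusive bounds) out of an image stored as a list of rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in img[y0:y1 + 1]]


mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0]]
box = bounding_box(mask)
patch = crop(mask, box)
```

The same `box` can be used to crop the colour image, so that every compared algorithm receives the same input region.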
A binary Jaccard similarity coefficient was used as a similarity coefficient of the masks obtained for segmented objects:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|},$$
where A and B are the binary masks obtained by the segmentation algorithm and provided with the initial images, respectively. The segmentation algorithms were applied to all images in the database. The results are presented in Table 1.
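The Jaccard coefficient of two binary masks is the number of pixels set in both masks divided by the number of pixels set in at least one of them. A minimal sketch, assuming masks of equal size stored as lists of rows and treating two empty masks as identical (a convention of this sketch); the function name `jaccard` is hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if pa and pb else 0   # pixel set in both masks
            union += 1 if pa or pb else 0    # pixel set in at least one mask
    # Assumption: two empty masks are considered a perfect match.
    return inter / union if union else 1.0


predicted = [[1, 1, 0],
             [1, 1, 0]]
reference = [[0, 1, 1],
             [0, 1, 1]]
score = jaccard(predicted, reference)   # 2 shared pixels out of 6 covered
```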
The binary images presented in Table 2 show the results of the algorithms of the automatic segmentation of the detected vehicles.

Fig. 1. Visualization of bounding boxes predicted by an object detector when the probability of finding an object in a rectangle is more than 0.5 (a) and more than 0.05 (b)

Fig. 2. Scaling a selected object: a, the original image of the selected object; b, the scaled image

Fig. 6. The result of the algorithm operation

The images in Table 3 represent the result of applying the proposed method to different object classes in the image.

Conclusion. The article presents a new method of automatic vehicle segmentation in the image. The efficiency and competitiveness of the method relative to the known segmentation algorithms K-Means, GrabCut and Mask R-CNN were verified by testing on the Carvana image database. The method was also successfully applied to the Pascal VOC image database, which demonstrates the possibility of segmenting objects of different classes.

Table 2. The results of the algorithm for segmentation of detected vehicles

Table 1. Comparative results of the segmentation of the Carvana database images

Table 3. The result of applying the algorithm to the images of the Pascal VOC database (columns: source VOC database image; segmented image; algorithm result)