Complex Traffic Scene Image Classification Based on Sparse Optimization Boundary Semantics Deep Learning

Abstract: With the rapid development of intelligent traffic information monitoring technology, accurate identification of vehicles, pedestrians and other objects on the road has become particularly important. To improve the recognition and classification accuracy of image objects in complex traffic scenes, this paper proposes a semantic redefinition segmentation method based on image boundary regions. First, the SegNet semantic segmentation model is used to obtain rough classification features of vehicle-road objects. The simple linear iterative clustering (SLIC) algorithm is then used to obtain an over-segmentation of the image, which determines the classification of the pixels in each super-pixel region and optimizes the target segmentation of boundaries and small areas in the vehicle-road image. Finally, the edge recovery ability of the conditional random field (CRF) is used to refine the image boundary. Experimental results show that, compared with FCN-8s and SegNet, the pixel accuracy of the proposed algorithm improves by 2.33% and 0.57%, respectively, and that, compared with Unet, the proposed algorithm performs better on multi-target segmentation.


Introduction
With the continuous growth in the number of private cars, serious traffic accidents occur frequently on expressways, causing economic losses. To address traffic safety problems and protect the lives and property of passengers, the concept of the Advanced Driver Assistance System (ADAS) has been proposed. An ADAS mainly uses a variety of on-board sensors to obtain environmental information inside and outside the vehicle; through appropriate information processing, analysis and fusion, it warns the driver when necessary or intervenes directly through the vehicle control systems to perform part of the driving operation and improve driving safety [1,2].
According to the statistics in the relevant literature, more than 90% of environmental information is acquired by visual means [3]. As one of the most effective perception methods in an ADAS, visual perception provides the most intuitive, reliable and abundant environmental information. At present, research on visual sensing of the traffic environment is divided into object detection algorithms and image segmentation algorithms. The segmentation of road images is one of the most basic and important research fields, and target segmentation algorithms are mainly deep-learning algorithms based on convolutional neural networks and traditional machine-learning algorithms.
Traditional image segmentation algorithms include methods based on image thresholds [4][5][6], edge detection [7][8][9][10] and region-based segmentation [11][12][13][14], etc. Ref. [4] proposed an adaptive gray enhancement and linear region threshold segmentation algorithm, which enhances the gray contrast between target and background, avoids the drawbacks of single-threshold segmentation, and improves the accuracy of target recognition and measurement. Ref. [5] proposed a multi-level threshold optimization algorithm incorporating Kapur entropy, which uses the whale optimization algorithm (WOA) to improve segmentation accuracy. Ref. [12] proposed an image segmentation method based on sector ring regions, by which the object and background in an image can be accurately separated. Ref. [15] divided the image into areas and used an improved Hough transform together with the tangent relationship of the lane-line model to recognize and reconstruct lane lines. However, due to the complexity of road scenes and the variety of target categories, these traditional segmentation methods cannot accurately distinguish target categories, and their probability of missed and false detection is high in practice. Moreover, these algorithms require long computation times, have poor real-time performance, and face significant limitations in real scenes. Since 2012, deep-learning algorithms have been rapidly applied to target recognition [16][17][18][19], target detection [20][21][22] and other tasks, achieving remarkable results. Deep learning is widely used in image semantic segmentation and intelligent vehicle assisted driving, providing reliable guidance and decision-making for assisted or active driving. The Fully Convolutional Network (FCN) proposed by Long et al. [23] is used for image semantic segmentation and pixel-level classification.
Because the fully connected layers of the traditional convolutional neural network (CNN) are replaced by convolutional layers, an image of any size can be input and classified by the fine-tuned model. However, the disadvantage of FCN is that its results are not precise enough and lack spatial consistency. The SegNet algorithm proposed by Badrinarayanan et al. [24] is used for semantic image segmentation. Its central idea is based on FCN, adopting a symmetrical encoder-decoder structure. The encoder uses the first 13 convolutional layers of VGG16, and each encoder layer corresponds to a decoder layer. Finally, the decoder output is fed to a soft-max classifier that generates a class probability for each pixel independently, which improves segmentation accuracy. Gan et al. [25] proposed an improved Unet model with a self-attention mechanism, which brought some improvement in segmenting similar regions. Qian et al. [26] improved the R-CNN network and introduced a feature pyramid network into the backbone, achieving higher accuracy in traffic sign segmentation. Ref. [27] fused polarization features and intensity images under low-visibility conditions and used a deep neural network to segment and identify the vehicle-road environment, ultimately improving perception under adverse weather. Although deep learning has developed rapidly in image segmentation, the difficulties of multi-target recognition and segmentation in complex, changing traffic environments lie in the variety of targets, mutual occlusion between targets, and the many similar regions; existing networks need higher accuracy and robustness.
Based on the above ideas, this paper proposes a target boundary optimization algorithm for complex vehicle-road images. The LF-SegNet [28] network model is used to classify different target categories in road images, but its modeling ability is insufficient, resulting in poor network stability, poor target classification and low recognition accuracy. Therefore, the classification must be further optimized to improve the overall performance of image classification. In this paper, two prior constraints are introduced using super-pixel features: first, adjacent pixels are correlated and likely to belong to the same category; second, the boundary information of the label map and of the original image is essentially the same. Under these two constraints, the road-image boundaries produced by the SegNet segmentation can be optimized.

Overall Design
The proposed image boundary optimization algorithm feeds the boundary and contour information of the original road image, extracted via super-pixels, back into the image classified by the convolutional neural network, further refining the preliminary model and achieving accurate classification of complex road targets. Firstly, the SegNet algorithm extracts pixel-level target features. Then, the simple linear iterative clustering (SLIC) [29] algorithm extracts the features and edge information of the super-pixel blocks, and the image boundary is optimized by combining the pixel-level and super-pixel-block features. Finally, the precise edge recovery capability of the conditional random field (CRF) is used to refine the segmentation results. In this way, the paper optimizes the boundaries of road images and corrects the false segmentation of small-area targets. The overall scheme is shown in Fig. 1.
From Fig. 1, the road image is semantically classified by the SegNet algorithm and super-pixel segmented by the SLIC algorithm, respectively. The boundary optimization algorithm proposed in this paper then optimizes the image boundary and corrects local false segmentation. However, this method is not effective for slender objects, so the boundary recovery ability of the conditional random field is finally used to restore the edge information of such objects and further improve the target classification.

Semantic Segmentation Algorithm Based on Super-Pixel Boundary Optimization
The boundary optimization algorithm based on super-pixels is mainly divided into three parts: 1) Pixel classification: a convolutional neural network is used to obtain image features and to classify and recognize each pixel.
2) Region segmentation: pixels with similar features in the street-scene image are combined into several representative regions to obtain an over-segmentation of the image, and the classification of the pixels in each region is determined in combination with the neural network.
3) Recognition of pixel categories: according to the pixel classification in each region, the pixels in the whole image are reclassified. The flow chart of the boundary optimization algorithm is shown in Fig. 2.

Semantic Segmentation Algorithm Based on Boundary Optimization
In this paper, the SegNet [28] network is used to classify images at the pixel level. SegNet consists mainly of an encoder and a decoder; the network model is shown in Fig. 3. The encoder is based on the pretrained VGG-16 network, retaining its 13 convolutional layers to extract image features and discarding only the last three fully connected layers, which greatly reduces the number of learned parameters. SegNet has 13 convolutional layers, 5 pooling layers, 13 deconvolution layers and 5 upsampling layers. Each pooling layer uses a 2×2 window with stride 2, so each pooling halves the resolution of the feature maps. During each max pooling, the location of the maximum value in each pooling window of the feature maps is recorded. The decoder uses these recorded max-pooling indices to upsample the feature maps produced by convolution and pooling. The upsampling processes of SegNet and FCN are shown in Fig. 4(a) and (b), respectively.
In Fig. 4(a), SegNet upsamples by mapping the feature values 1, 2, 3 and 4 to a new feature map using the previously saved max-pooling coordinates; in Fig. 4(b), FCN upsamples by deconvolving the feature values 1, 2, 3 and 4 and adding the result to the corresponding convolutional feature map.
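The SegNet-style index-preserving pooling and unpooling described above can be illustrated with a small sketch (a simplified NumPy illustration of the idea, not the paper's implementation; the function names are ours):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that records where each maximum came from,
    as SegNet's encoder does."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)  # flat index of each max in x
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            idx[i, j] = (i*k + r) * w + (j*k + c)
    return out, idx

def max_unpool(out, idx, shape):
    """SegNet-style upsampling: place each pooled value back at its
    recorded position; all other positions stay zero."""
    up = np.zeros(shape)
    up.flat[idx.ravel()] = out.ravel()
    return up
```

Unlike FCN's deconvolution-and-add scheme, the unpooled map is sparse: only the remembered maximum positions are filled, which is what preserves boundary locations. (In PyTorch the same pairing exists as `MaxPool2d(return_indices=True)` and `MaxUnpool2d`.)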
The test accuracy of the trained SegNet model is higher than that of the FCN algorithm, and it is likewise used for image segmentation of the vehicle-road environment. By restoring the original image details hierarchically with multiple decoders, better boundary accuracy can be obtained to a certain extent. The classification of each target is obtained through the convolution, pooling and deconvolution layers, and each pixel is classified by the Soft-max function:

S_i^p = e^{c_i^p} / Σ_{k=1}^{N} e^{c_k^p}

where S_i^p denotes the probability that pixel p belongs to class i, N represents the total number of categories, and c_i^p denotes the score of point p for class i in the score map. Then, according to the error between the predicted value and the real value, the cross-entropy loss function is constructed:

l^p = −Σ_{i=1}^{N} y_i^p ln S_i^p

where y_i^p denotes the true probability that pixel p belongs to class i, defined as y_i^p = 1 if pixel p belongs to class i and y_i^p = 0 otherwise. The cross-entropy loss thus represents the error between the predicted value and the real value, and the smaller l^p is, the higher the prediction accuracy. However, due to the complexity of road images and the ambiguity of some target boundaries, it is impossible to classify all targets correctly using the SegNet network alone.
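For a single pixel, the Soft-max classification and its cross-entropy loss can be sketched as follows (a minimal illustration; the function names are ours):

```python
import numpy as np

def pixel_softmax(scores):
    """scores: array of class scores c_i^p for one pixel p.
    Returns the probabilities S_i^p."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, true_class):
    """l^p = -sum_i y_i^p * ln S_i^p with a one-hot true label:
    only the -ln of the true class's probability survives."""
    return -np.log(probs[true_class])
```

Because y_i^p is one-hot, the loss for a pixel reduces to the negative log-probability the network assigns to the correct class, so a confident correct prediction gives a loss near zero.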

SLIC Algorithm
In this paper, the SLIC algorithm is used for image region segmentation. SLIC converts the image from the RGB color space to the CIE-LAB color space. The color values (l, a, b) and the (x, y) coordinates of each pixel constitute a five-dimensional vector V = [l, a, b, x, y], and the similarity of two pixels is measured by their vector distance: the larger the distance, the smaller the similarity. The detailed flow of the SLIC algorithm is as follows:
Step 1: Initialization. Set the number of super-pixels to K for the color image in the CIE-LAB color space. The cluster centers are initialized as C_i = (l_i, a_i, b_i, x_i, y_i)^T, and the grid step between super-pixel cluster centers is S = sqrt(N/K), where N is the total number of pixels in the image.
Step 2: Clustering. Class labels are assigned to the pixels in the neighborhood of each seed point. The distance of every pixel is initialized to infinity; then, for each pixel, the CIE-LAB color-space distance d_c and the spatial coordinate distance d_s to the seed point are calculated and combined into a comprehensive distance.
N_s is the maximum spatial distance within a class, defined as N_s = S, which applies to every cluster. The maximum color distance is N_c. The final distance metric D' is:

D' = sqrt( (d_c / N_c)^2 + (d_s / N_s)^2 )

Step 3: Iterative optimization. Repeat Step 2 until none of the cluster centers change.
Step 4: Remove outliers and enhance connectivity. Different numbers of input super-pixels K and maximum color distances m produce different segmentation effects, as shown in Fig. 5. The figure shows that by using hundreds or thousands of super-pixels instead of the massive raw pixel data, we can obtain accurate image boundaries and clear edge regions while greatly reducing the computational complexity of the pixel samples and improving computational efficiency, enabling efficient and flexible perception of road environment information.
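The combined distance D' of Step 2 can be sketched as follows (a minimal sketch of the metric only; a full SLIC implementation such as `skimage.segmentation.slic(img, n_segments=K, compactness=m)` also handles the gridding, iteration and connectivity steps, and the function name below is ours):

```python
import numpy as np

def slic_distance(v1, v2, S, m):
    """Combined SLIC distance D' between two 5-D vectors [l, a, b, x, y].

    d_c is the CIE-LAB colour distance and d_s the spatial distance;
    N_s = S (the cluster grid step) and N_c = m (the maximum colour
    distance, i.e. the compactness parameter)."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    d_c = np.linalg.norm(v1[:3] - v2[:3])   # colour term (l, a, b)
    d_s = np.linalg.norm(v1[3:] - v2[3:])   # spatial term (x, y)
    return np.sqrt((d_c / m) ** 2 + (d_s / S) ** 2)
```

A pixel one full grid step S away from a seed of identical colour has D' = 1, which makes the trade-off between colour similarity and spatial compactness explicit: larger m favours compact, regular super-pixels, smaller m favours colour adherence.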

Reclassification
For the reclassification of boundary and mis-segmented pixels in each region, the specific steps are as follows. Let all pixels in super-pixel S_p be C = {C_1, C_2, …, C_S}, and count the number of pixels carrying each class label in this super-pixel as n_p = {n_1, n_2, …, n_K}, where n_i indicates the number of pixels labeled class i in region P. The maximum value n_j in n_p is found, and the pixels in region P are classified as class j.
However, when the maximum count n_i is close to the sub-maximum count n_j in n_p, it is impossible to determine whether region P belongs to class i or class j; the neural network simply picks the class with the highest probability, which can cause false segmentation. We therefore define a threshold T.
In this paper, T = 0.2. If the difference between the proportions of the maximum and sub-maximum labels is larger than T, the pixels in region P are classified as class i; otherwise, they keep the classification given by the semantic segmentation of the convolutional neural network.

Specific Flow of Boundary Optimization Algorithms
There are many ways to optimize the image boundary; the method used in this paper combines super-pixels with a convolutional neural network. Firstly, the SegNet semantic segmentation algorithm based on the VGG-16 network model performs a rough segmentation of the image and extracts rough features. The SLIC algorithm then generates the super-pixel segmentation of the image, and the rough features are optimized using the boundaries of these super-pixel regions. This method improves the accuracy of object boundary segmentation to a degree. The key steps of the boundary optimization are given in Algorithm 1.
The flow of Algorithm 1 is as follows:
Step 1: Input the original image I and the rough feature map L extracted by the SegNet algorithm.
Step 2: Segment the original image with the SLIC algorithm to obtain K super-pixels S_p = {S_1, S_2, …, S_K}, and mark the area of each super-pixel with a label i.
Step 3: for i = 1:K
1) Let the pixels in super-pixel S_i be S_i = {C_1, C_2, …, C_N}, where C_j corresponds to a pixel assigned class j in the feature map;
2) Initialize the count of occurrences of the same label to n = 0;
3) for j = 1:N, save the feature label of C_j as L_Cj, count the number of pixels with the same label, and traverse the whole super-pixel. W_Cj is the proportion of pixels carrying the same label L_Cj.

4) Redistribution of labels.
If W_Cj > 0.8, mark the super-pixel with the label L_Cmax corresponding to W_Cj and jump to Step 4.
Else search for the maximum W_max and sub-maximum W_sub; if W_max − W_sub > 0.2, mark the super-pixel with the label L_Cmax corresponding to W_max, and jump to Step 4.
Else use the class of the segmentation result of DeepLabV2 to mark the super-pixel.
Step 4: Use L_Cmax to reassign the classification of the current super-pixel and output the image I'.
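Under the stated thresholds (0.8 for a dominant label, 0.2 for the gap W_max − W_sub), the per-super-pixel relabeling of Algorithm 1 can be sketched as follows (a simplified NumPy illustration, not the authors' implementation; the `fallback` argument is a hypothetical stand-in for the per-pixel network result that is kept when no label dominates):

```python
import numpy as np

def optimize_boundaries(labels, superpixels, fallback, t_high=0.8, t_gap=0.2):
    """Relabel each super-pixel from the label proportions W of its pixels.

    labels:      rough per-pixel feature map L from the network
    superpixels: SLIC label image (same shape)
    fallback:    per-pixel segmentation used when no label dominates
    """
    out = labels.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        vals, counts = np.unique(labels[mask], return_counts=True)
        w = counts / counts.sum()              # proportions W_Cj
        order = np.argsort(w)[::-1]            # labels sorted by proportion
        if w[order[0]] > t_high:               # one label clearly dominates
            out[mask] = vals[order[0]]
        elif len(order) > 1 and w[order[0]] - w[order[1]] > t_gap:
            out[mask] = vals[order[0]]         # W_max - W_sub > 0.2
        else:
            out[mask] = fallback[mask]         # keep the network's own result
    return out
```

The two thresholds play different roles: `t_high` commits a super-pixel to a clearly dominant label, while `t_gap` resolves two competing labels only when one is decisively more frequent; otherwise the pixel-level segmentation is trusted.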
This algorithm optimizes the boundaries of the road image and corrects small mis-segmented areas, which appear as small "spots" in the image. Through SLIC processing, these "spots" can be eliminated to a certain extent based on their similarity with the labels of the surrounding super-pixel blocks. The specific effect is shown in Fig. 6.
The two images on the left side of Fig. 6 superimpose the SegNet segmentation of the vehicle-road image with the super-pixel segmentation, before boundary optimization. As shown in the enlarged parts 1, 2, 3 and 4, the image boundary and some local areas are misclassified. After optimization by the algorithm in this chapter, the image boundary is improved and small mis-classified local areas are eliminated; Fig. 6 shows the effect of this local optimization. For example, enlarged parts 1 and 2 show that the boundary is optimized, and parts 3 and 4 show that some small "spots" are correctly classified. The specific optimization diagram is shown in Fig. 7.

Boundary Restoration
Although the super-pixel boundary optimization algorithm can optimize the road image boundary and the wrongly segmented areas, the optimized image boundary is no longer smooth, and slender objects such as poles and fences are difficult to segment into sub-blocks. Therefore, in this section a CRF model is used to refine the image boundary in a subsequent optimization step, restoring the boundary more accurately.
CRF [30] is a typical discriminative model, similar to a probabilistic undirected graph model. If each pixel label in the image is regarded as a node and the relationships between pixels as weighted edges, then for an image I with N input pixels and a labeling l ∈ L^N, the normalized conditional probability of (I, L) is

P(l | I) = (1/Z(I)) exp(−E(l | I))

where E(l | I) is the Gibbs energy of the labeling l ∈ L^N and Z(I) is the partition function.

The energy function of the fully connected CRF model is

E(l | I) = Σ_i ψ_i(l_i) + Σ_ij ψ_ij(l_i, l_j)

Here, Σ_i ψ_i(l_i) is the unary potential, giving the probability of pixel i taking label l_i; it comes from the optimized output of the front-end super-pixel stage. Σ_ij ψ_ij(l_i, l_j) is the binary potential of assigning labels l_i, l_j to pixels i, j at the same time: similar pixels are assigned the same label, while pixels with large differences are assigned different labels. In this section, the unary potential can be regarded as the boundary-optimized feature mapping, which helps improve the performance of the CRF model. The binary potential models the relationship between pixels and weights it by color similarity:

ψ_ij(l_i, l_j) = µ(l_i, l_j) · w · exp( −|P_i − P_j|² / (2σ_α²) − |I_i − I_j|² / (2σ_β²) )

where I_i and I_j are color vectors and P_i and P_j are pixel positions, so the value of the binary potential depends on pixel position and color information; σ_α and σ_β control the proximity and similarity between two pixels. As in the Potts model [31], µ(l_i, l_j) = 1 if l_i ≠ l_j and 0 otherwise, so assigning different labels to adjacent similar pixels is penalized and similar pixels end up with the same label.
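Assuming a single appearance kernel with weight w and widths σ_α, σ_β (the fully connected CRF of [31] also adds a smoothness kernel, omitted here for brevity), the binary potential for one pixel pair can be sketched as:

```python
import numpy as np

def pairwise_potential(Pi, Pj, Ii, Ij, li, lj, w=1.0, s_alpha=3.0, s_beta=10.0):
    """Binary potential psi_ij of the fully connected CRF for one pixel pair.

    Potts compatibility mu: 0 when the labels agree, 1 when they differ,
    so a penalty is only paid for labelling similar pixels differently.
    w, s_alpha, s_beta are illustrative values, not tuned parameters."""
    if li == lj:
        return 0.0  # Potts model: no penalty when labels agree
    Pi, Pj = np.asarray(Pi, float), np.asarray(Pj, float)
    Ii, Ij = np.asarray(Ii, float), np.asarray(Ij, float)
    pos = np.sum((Pi - Pj) ** 2)   # spatial proximity term (sigma_alpha)
    col = np.sum((Ii - Ij) ** 2)   # colour similarity term (sigma_beta)
    return w * np.exp(-pos / (2 * s_alpha ** 2) - col / (2 * s_beta ** 2))
```

Two nearby pixels of similar colour that receive different labels incur a penalty close to w, while distant or dissimilar pixels incur almost none, which is exactly the edge-preserving smoothing the section relies on.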
Here "distance" refers both to the distance in the relevant color space and to the actual spatial distance. Therefore, the accuracy of image boundary optimization can be improved by the CRF algorithm. Figure 8 shows an enlarged view of local details optimized by the CRF algorithm.
From Fig. 8, we can see that although the super-pixel boundary optimization algorithm can eliminate local mis-segmentation and optimize the image edges, edge optimization still needs strengthening for thin edges and complex overlapping areas. After adding the CRF algorithm, the image boundary details are better optimized and more effective information is recovered.

Experimental Results and Analysis
In this section, we evaluate and analyze the experimental results from the subjective and objective perspectives, respectively, to verify the effectiveness and performance of the proposed algorithm.

Evaluation and Analysis of Subjective Performance
In this paper, the KITTI data set is used to train and test the network. After many training iterations, the trained network is obtained, and the classification results are optimized by fusing super-pixels with the convolutional neural network. The segmentation results of road images tested with the trained model are shown in Fig. 9.
From Fig. 9, we can see that, visually, the segmentation results of our algorithm differ little from those of SegNet, because boundary pixels make up a relatively small share of the image. However, misjudgment in small areas has been significantly improved: the grassland on the right of the first-row image, the lane in the second row, the traffic signs in the third row and the grassland on the right of the fifth row all contain misjudged pixels, and with our algorithm some of these areas are classified correctly.

Evaluation and Analysis of Objective Performance
In image segmentation, many criteria are used to measure the accuracy of an algorithm; they are usually variations of pixel accuracy and intersection-over-union (IoU). In the evaluation formulas below, an image has k + 1 categories and p_ij denotes the number of pixels that belong to class i but are predicted as class j.
1) Pixel Accuracy (PA): the proportion of correctly classified pixels, PA = Σ_i p_ii / Σ_i Σ_j p_ij.
2) Mean Pixel Accuracy (MPA): the pixel accuracy computed per class and averaged over all classes, MPA = (1/(k+1)) Σ_i ( p_ii / Σ_j p_ij ).
3) Mean Intersection over Union (MIoU): the ratio of the intersection to the union of two sets, which in semantic segmentation are the ground truth and the predicted segmentation. The ratio can be rewritten as the number of true positives over the sum of true positives, false negatives and false positives (the union): MIoU = (1/(k+1)) Σ_i p_ii / ( Σ_j p_ij + Σ_j p_ji − p_ii ). IoU is calculated for each class and then averaged.
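Given the confusion-matrix convention above (p_ij = pixels of true class i predicted as class j), MIoU can be computed as follows (a small NumPy sketch; the function name is ours):

```python
import numpy as np

def mean_iou(conf):
    """Mean IoU from a (k+1) x (k+1) confusion matrix where conf[i, j]
    is the number of pixels of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)       # p_ii: correctly predicted
    fp = conf.sum(axis=0) - tp             # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp             # belong to the class, but missed
    iou = tp / (tp + fp + fn)              # per-class intersection / union
    return iou.mean()
```

Averaging the per-class IoU rather than pooling all pixels is what makes MIoU sensitive to small classes such as poles and traffic signs, which is why it is the headline metric in Tables 1 and 2.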
The overall performance indicators of the three different models studied in this paper are shown in Table 1. The model proposed in this paper consists of three parts: the SegNet model performs the rough segmentation, the SLIC algorithm refines the segmentation result at the super-pixel level, and the CRF algorithm finally restores the image boundary. To prove the effectiveness of the proposed model, the SegNet model and the SegNet+SLIC algorithm (without CRF) are compared with it; the experimental results are shown in Table 1. Table 1 shows that before adding CRF, the pixel accuracy of our algorithm is very close to that of the SegNet model, but after adding CRF, the pixel accuracy improves by 0.71% and the average pixel accuracy by 2.25%. Compared with SegNet, MIoU increases by 0.10% without CRF and by 0.57% with CRF. Compared with Unet, the MPA and MIoU of our algorithm are 18.15% and 10.08% higher, respectively. The experiments show that the proposed algorithm improves the overall performance of road image segmentation, although the segmentation accuracy of slender objects such as poles and fences still needs improvement.
The IoU of each class is tested on the KITTI data set, and the comparative values are shown in Table 2.
From Table 2, it can be seen that, compared with FCN-8s and SegNet, the IoU of road, sidewalk, building, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, rider, car, truck, bus, motorcycle and bicycle has improved, and the performance on wall remains unchanged. The performance on person and train has declined compared with FCN-8s: there are few human images in the data set, most of them are concentrated far away in the scene, and occlusion occurs, so the super-pixel-based boundary optimization algorithm easily classifies them into other categories. After adding the CRF algorithm for image boundary restoration, the IoU of essentially all target categories improves, which shows that the CRF algorithm has a strong ability to restore image boundary details. In short, the algorithm in this paper is helpful for image boundary optimization.
Comparing the proposed algorithm with the Unet model, although the total pixel accuracy of Unet is one percentage point higher than that of the proposed algorithm, the MPA and MIoU of Unet are significantly lower than those of the proposed model. Table 3 shows that although Unet has strong recognition ability for target categories that account for a large proportion of pixels, such as road and terrain, its performance on multi-target segmentation tasks is not as good as that of the proposed algorithm. The comparison experiment shows that our algorithm performs better on multi-target segmentation tasks.

Conclusion
This paper proposes a semantic redefinition segmentation method based on image boundary regions. First, the rough features of the target image are extracted using the SegNet model. Then, the SLIC algorithm extracts the contour information of the image edges at the super-pixel level, and the super-pixel features are applied to the edge information to improve the segmentation accuracy of the target image. In addition, the CRF algorithm is used to restore the image boundary, further refining the segmentation. The ablation experiment proves that the use of CRF improves the restoration of boundary details. With the proposed algorithm, object segmentation in the image is more accurate, and the recognition accuracy of each category is higher in multi-object segmentation scenes. The final experiments on the KITTI data set verify the effectiveness of the proposed method.