A method of cross-layer fusion multi-object detection and recognition based on an improved Faster R-CNN model in complex traffic environments
Introduction
Object detection has long been a key technology enabling vehicles to cope with complex scenes reasonably and safely, and it is one of the hot spots in computer vision research. Researchers have conducted a large number of studies on object detection methods. For example, Haar features with an AdaBoost classifier and a sliding window were used for face detection. Features extracted by the Histogram of Oriented Gradients (HOG) [1] were classified by a Support Vector Machine (SVM) [2] for pedestrian detection. For general object detection, HOG features were combined with the Deformable Part Model (DPM) algorithm. These methods use few hand-crafted features and offer good time efficiency; however, they have obvious limitations in accuracy.
In recent years, with the development of deep learning, convolutional neural networks have become significantly superior to traditional methods in accuracy and are the latest research hotspot. Girshick et al. [3] proposed the region-based R-CNN, which applies a high-capacity convolutional neural network (CNN) to bottom-up region proposals to locate and segment objects. Zhang et al. [4] improved object detection with deep convolutional networks via Bayesian optimization and structured prediction. Ren et al. [5] proposed Faster R-CNN and introduced the fully convolutional Region Proposal Network (RPN), which simultaneously predicts object boundaries and objectness scores at each position. Kong et al. [6] proposed HyperNet, which combines hyper features from the bottom, middle and upper layers to achieve better results on small objects. Zuo et al. [7] proposed traffic sign detection based on Faster R-CNN. Wang et al. [8] built on Fast R-CNN and introduced a GAN [9], [10] to generate highly difficult samples, improving the network's robustness to occlusion and deformation. Jian et al. [11] investigated salient-feature fusion strategies in the human visual attention mechanism for saliency detection. Jian et al. [12] proposed a computational model for saliency detection that integrates a holistic center-directional map with a principal local color contrast map. Jian et al. [13] proposed a framework for underwater image saliency detection that exploits the Quaternionic Distance Based Weber Descriptor. Jian et al. [14] proposed a video saliency-detection model based on the human attention mechanism and fully convolutional neural networks. Jian et al. [15] described a simple visual saliency-detection model based on the spatial position of salient objects and background cues. Chen et al. [16] carried out accurate class-level object detection with 3D object proposals from stereo images. Peng et al. [17] designed a concurrent softmax to handle multi-label problems in object detection and proposed a soft-sampling method with a hybrid training scheduler to deal with label imbalance. Li et al. [18] provided the first systematic analysis of the underperformance of state-of-the-art models on long-tail distributions. Gkioxari et al. [19] extended the Faster R-CNN model to detect and recognize human-object interactions. Hu et al. [20] proposed a deeply supervised salient object detection method with short connections.
The Faster R-CNN algorithm has high precision and strong scalability, and in recent years many researchers have proposed improvements based on it. To address the low accuracy and speed of multi-object detection in the current complex traffic environment, we propose a cross-layer fusion multi-object detection and recognition algorithm based on Faster R-CNN. Our main contributions are as follows:
(1) We use the five-layer structure of VGG16 to obtain richer feature information. A 1×1 convolution kernel is laterally embedded after the last convolutional layer of layers 1, 3 and 5; a max-pooling operation then downsamples layer 1 so it can be fused with layer 3, and a deconvolution operation upsamples layer 5 so it can also be fused with layer 3.
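The cross-layer fusion above can be sketched with a minimal NumPy example. The feature-map sizes (224×224, 56×56, 14×14), channel counts, nearest-neighbour upsampling standing in for deconvolution, and the `conv1x1`, `max_pool` and `upsample` helpers are all illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def conv1x1(x, out_ch, rng):
    # hypothetical 1x1 convolution: a per-pixel linear map over channels
    w = rng.standard_normal((x.shape[-1], out_ch)) * 0.01
    return x @ w

def max_pool(x, k):
    # non-overlapping k x k max pooling on an (H, W, C) map
    h, w, c = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

def upsample(x, k):
    # nearest-neighbour stand-in for the paper's deconvolution
    return x.repeat(k, axis=0).repeat(k, axis=1)

rng = np.random.default_rng(0)
conv1 = rng.standard_normal((224, 224, 64))   # layer-1 feature map (assumed sizes)
conv3 = rng.standard_normal((56, 56, 256))    # layer-3 feature map
conv5 = rng.standard_normal((14, 14, 512))    # layer-5 feature map

f1 = conv1x1(max_pool(conv1, 4), 256, rng)    # pool layer 1 down to layer-3 scale
f5 = conv1x1(upsample(conv5, 4), 256, rng)    # upsample layer 5 to layer-3 scale
fused = np.concatenate([f1, conv3, f5], axis=-1)  # pairwise fusion at layer-3 scale
```

The key point is only that pooling and deconvolution bring layers 1 and 5 to the spatial resolution of layer 3 so the three maps can be concatenated channel-wise.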
(2) To control the imbalance between difficult and easy samples, we use a weighted balanced multi-class cross-entropy loss function and Soft-NMS (soft non-maximum suppression).
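To illustrate how Soft-NMS differs from hard NMS, the sketch below decays the scores of overlapping boxes with a Gaussian penalty instead of discarding them outright; the `sigma` value and score threshold are illustrative defaults, not the paper's settings:

```python
import numpy as np

def iou(box, boxes):
    # intersection-over-union of one [x1, y1, x2, y2] box against many
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    scores = scores.astype(float).copy()
    keep, idxs = [], list(range(len(scores)))
    while idxs:
        m = max(idxs, key=lambda i: scores[i])   # highest remaining score
        keep.append(m)
        idxs.remove(m)
        if not idxs:
            break
        overlaps = iou(boxes[m], boxes[idxs])
        # Gaussian decay instead of hard suppression
        scores[idxs] *= np.exp(-overlaps ** 2 / sigma)
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep, scores

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep, final = soft_nms(boxes, scores)
```

In the example, the box heavily overlapping the top-scoring box is kept but demoted below the non-overlapping one, which is the behaviour that helps with occluded objects.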
(3) Considering the actual conditions of a complex traffic environment, we manually labeled a mixed dataset. Experimental results show that the proposed model achieves better performance than current mainstream object detection models.
The process of performing our method is shown in Fig. 1.
The rest of this paper is organized as follows. In Section 2, we briefly introduce the Faster R-CNN and RPN. Section 3 introduces the improved network structure based on the Faster R-CNN model and the weighted balanced multi-class cross entropy loss function. In Section 4, we describe the training process and the experimental contrast results. Section 5 gives the discussion and conclusion.
Section snippets
Faster R-CNN for object detection
Faster R-CNN used a Region Proposal Network (RPN) [21] to replace selective search (SS) [22] for generating candidate regions, which greatly improved detection speed. Faster R-CNN has been widely used in object detection and recognition, but its accuracy on small and occluded objects needs to be improved. As shown in Fig. 2, both occluded cars and distant cars were not recognized.
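To make concrete what the RPN replaces, the sketch below generates the dense grid of anchor boxes an RPN would score at every feature-map position; the stride, scales and aspect ratios follow common Faster R-CNN defaults but are assumptions here, and the `generate_anchors` helper is hypothetical:

```python
import numpy as np

def generate_anchors(fm_h, fm_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1, 2)):
    # one anchor per (scale, ratio) pair, centred on every feature-map
    # cell, expressed in input-image coordinates [x1, y1, x2, y2]
    anchors = []
    for y in range(fm_h):
        for x in range(fm_w):
            cx, cy = x * stride + stride / 2, y * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

# a 2x2 feature map yields 2 * 2 positions x 9 anchors = 36 candidates
a = generate_anchors(2, 2)
```

Unlike selective search, this enumeration is fixed and fully convolutional, so proposal generation shares computation with the backbone and costs almost nothing at inference time.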
Visual geometry group network
The Visual Geometry Group Network (VGG) is a deep convolutional neural network architecture [23], [24].
The improved faster R-CNN networks
In this paper, a cross-layer fusion multi-object detection and recognition algorithm is proposed. The five convolutional stages of VGG16 form the backbone, and small convolution kernels of different dimensions are added to the hidden layers at stages 1, 3 and 5. After pairwise cross fusion, the feature map is extracted, and classification and localization are then performed by the RPN and ROI pooling. The algorithm is divided into four parts; the first is the input of images of any size and angle,
Experimental environment
Experiments are carried out in the Caffe (Convolutional Architecture for Fast Feature Embedding) software environment under Ubuntu 18.04. The CPU is an Intel i7-8700K, and the GPU is a GTX 1070 Ti with 8 GB of memory.
Training process
In order to verify the influence of multi-scale fusion, the weighted balanced multi-class cross-entropy loss function and Soft-NMS on detection performance, the training process is divided into the following six steps. Algorithm 4 shows the pseudocode of the training process.
Loss function of training process
In Fig. 4, the initial value of
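Since only a snippet of this section survives, the following is a hedged NumPy sketch of one plausible form of a weighted multi-class cross-entropy loss, where per-class weights (the values here are assumptions, e.g. inverse class frequency) up-weight rare or difficult classes:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    # probs: (N, C) softmax outputs; labels: (N,) integer class ids
    # class_weights: (C,) per-class weights balancing rare classes
    n = len(labels)
    picked = probs[np.arange(n), labels]          # probability of the true class
    losses = -class_weights[labels] * np.log(picked + 1e-12)
    return losses.mean()

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
w = np.array([1.0, 2.0, 1.0])   # assumed weights: class 1 counted twice as heavily
loss = weighted_cross_entropy(probs, labels, w)
```

The weighting simply scales each sample's negative log-likelihood by its class weight before averaging, so misclassifying an under-represented class costs more than misclassifying a frequent one.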
Conclusions
In this paper, a VGG16-based improvement of Faster R-CNN was proposed for multi-object detection and recognition. Experimental results show that, compared with previous networks such as Fast R-CNN and Faster R-CNN built on the VGG16 backbone, the improved model integrates low-level and high-level image semantic features, allowing it to acquire more object pixel features, so positioning accuracy is improved, and the weighted multi-class
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant 61701060 and the Doctoral Talent Training Project of Chongqing University of Posts and Telecommunications under Grant BYJS202007.
References (27)
- et al., Multilingual scene character recognition with co-occurrence of histogram of oriented gradients, Pattern Recognit. (2016)
- et al., Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction, Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
- et al., Assessment of feature fusion strategies in visual attention mechanism for saliency detection, Pattern Recognit. Lett. (2019)
- et al., Saliency detection based on directional patches extraction and principal local color contrast, J. Vis. Commun. Image Represent. (2018)
- et al., Integrating QDWD with pattern distinctness and local contrast for underwater saliency detection, J. Vis. Commun. Image Represent. (2018)
- et al., Overcoming classifier imbalance for long-tail object detection with balanced group softmax, Conference on Computer Vision and Pattern Recognition (CVPR) (Seattle, 2020)
- et al., Bounding box regression with uncertainty for accurate object detection, Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
- et al., Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery, Sensors (2018)
- et al., Rich feature hierarchies for accurate object detection and semantic segmentation, Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
- et al., Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- HyperNet: towards accurate region proposal generation and joint object detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Traffic signs detection based on Faster R-CNN, International Conference on Distributed Computing Systems Workshops (ICDCSW)
- A-Fast-RCNN: hard positive generation via adversary for object detection, Conference on Computer Vision and Pattern Recognition (CVPR)
Editor: Yuxin Peng.