基于Fast R-CNN的车辆目标检测

曹诗雨; 刘跃虎; 李辛昭

发布时间： 2017-05-05
摘要点击次数： 8541
全文下载次数： 562
DOI: 10.11834/jig.160600
2017 | Volume 22 | Number 5

第十八届全国图像图形学术会议栏目
<< 上一篇
下一篇>>

基于Fast R-CNN的车辆目标检测

曹诗雨¹, 刘跃虎², 李辛昭²(1.西安交通大学软件学院, 西安 710049;2.西安交通大学人工智能与机器人研究所, 西安 710049)

摘要

目的在传统车辆目标检测问题中，需要针对不同图像场景选择适合的特征。为此提出一种基于快速区域卷积神经网络（Fast R-CNN）的场景图像车辆目标发现方法，避免传统车辆目标检测问题中需要设计手工特征的问题。方法该方法基于深度学习卷积神经网络思想。首先使用待检测车辆图像定义视觉任务。利用选择性搜索算法获得样本图像的候选区域，将候选区域坐标与视觉任务示例图像一起输入网络学习。示例图像经过深度卷积神经网络中的卷积层，池化层计算，最终得到深度卷积特征。在输入时没有规定示例图像的规格，此时得到的卷积特征规格不定。然后，基于Fast R-CNN网络结构，通过感兴趣区域池化层规格化特征，最后将特征输入不同的全连接分支，并行回归计算特征分类，以及检测框坐标值。经过多次迭代训练，最后得到与指定视觉任务强相关的目标检测模型，具有训练好的权重参数。在新的场景图像中，可以通过该目标检测模型检测给定类型的车辆目标。结果首先确定视觉任务包含公交车，小汽车两类，背景场景是城市道路。利用与视觉任务强相关的测试样本集对目标检测模型进行测试，实验表明，当测试样本场景与视觉任务相关度越高，且样本中车辆目标的形变越小，得到的车辆目标检测模型对车辆目标检测具有良好的检测效果。结论本文提出的车辆目标检测方法，利用卷积神经网络提取卷积特征代替传统手工特征提取过程，通过Fast R-CNN对由示例图像组成定义的视觉任务训练得到了效果良好的车辆目标检测模型。该模型可以对与视觉任务强相关新场景图像进行效果良好的车辆目标检测。本文结合深度学习卷积神经网络思想，利用卷积特征替代传统手工特征，避免了传统检测问题中特征选择问题。深层卷积特征具有更好的表达能力。基于Fast R-CNN网络，最终通过多次迭代训练得到车辆检测模型。该检测模型对本文规定的视觉任务有良好的检测效果。本文为解决车辆目标检测问题提供了更加泛化和简洁的解决思路。

关键词

快速区域卷积神经网络深度学习车辆视觉任务目标检测

Vehicle detection method based on fast R-CNN

Cao Shiyu¹, Liu Yuehu², Li Xinzhao²(1.School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China;2.Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China)

Abstract

Objective The traditional vehicle target detection problem is typically divided into two steps:the first step is generating assumptions, that is, the image may exist in the vehicle target, thus reducing the need to calculate the area; the second step is verifying the hypothesis, that is, testing to verify whether there is a vehicle target in the image. In the first step, different features must be designed for different scenes. Among the features commonly used in vehicle detection problems are symmetry, color, shadows, corners, edges, textures, and lights. In the second step, verifying the hypothesis is typically based on the template method or on the appearance of the characteristics of the method. In addition to the above basic features, HOG, Harris, SIFT, and other manual features are also typically used. Finally, the test results are obtained through the support vector machine and other classifiers that classify the feature matrix. The whole process appears to be very detrimental to the generalization of detection problems; thus, it is necessary to select suitable characteristics for the case of unreasonable samples. This paper proposes a vehicle detection method based on Fast R-CNN, which can find vehicle objects in scene images. Method The method is based on the idea of deep learning convolution neural network. First, define the visual task using the vehicle image to be detected. The candidate region of the sample image is obtained by the selective search algorithm, and the candidate region coordinates are inputted to the network learning together with the visual task sample image. The sample image is calculated by the convolution layer and the pool layer in the deep convolution neural network. Finally, the deep convolution feature is obtained. The specifications of the sample image are not specified at the time of input, and the convolution characteristics obtained at this time are variable. Subsequently, the feature is normalized by the pooling region of the region of interest based on the Fast R-CNN network structure. Finally, the feature is inputted into different full-connection branches, parallel regression calculation feature classification, and detection frame coordinate values. After several iterations and training, the target detection model, which is strongly related to the specified visual task is obtained, and the trained weight parameters are trained. In a new scene image, a given type of vehicle target can be detected by the target detection model. Result The method uses a test dataset that is relative to the vision task to test the object detection model. The experiment suggests that it will achieve effective detection result by the vehicle detection model in the situation where the scenes of test samples are strongly correlated to the vision task. Conclusion First, determine the visual tasks that include the bus and car as the two categories. The background scene is the city road. Experimental results show that when the correlation between the test sample scene and the visual task is higher and the deformation of the vehicle target in the sample is smaller, the vehicle target detection model is obtained for the vehicle. Target detection has a good detection effect. Conclusion The vehicle target detection method proposed in this paper uses the convolution neural network to extract the convolution feature instead of the traditional manual feature extraction process. Fast R-CNN obtained the vehicle target detection model with good effect on the visual task training defined by the sample image. The model can achieve well-performing vehicle target detection on new scene images that are strongly related to visual tasks. In this paper, the convolution characteristics are used to replace the traditional manual features in combination with the depth of learning convolution neural network to avoid the traditional detection problems in the feature selection problem. Deep convolution features have better expression ability. Based on the Fast R-CNN network, the vehicle detection model is obtained by several iterations. The detection model has a good detection effect on the visual task specified in this paper. This paper provides a more general and concise solution of vehicle detection problems.

Keywords

fast region convolutional neural network(Fast R-CNN) deep learning vehicle vision tasks object detection

在线采编平台

在线出版

年度会议

下载中心

年度信息