Bag-of-visual-phrases and hierarchical deep models for traffic sign detection and recognition in mobile laser scanning data

https://doi.org/10.1016/j.isprsjprs.2016.01.005

Abstract

This paper presents a novel algorithm for detecting and recognizing traffic signs in mobile laser scanning (MLS) data for intelligent transportation-related applications. The traffic sign detection task is accomplished on 3-D point clouds by using bag-of-visual-phrases representations, whereas the recognition task is achieved on 2-D images by using a Gaussian-Bernoulli deep Boltzmann machine-based hierarchical classifier. To exploit high-order feature encodings of feature regions, a deep Boltzmann machine-based feature encoder is constructed. For detecting traffic signs in 3-D point clouds, the proposed algorithm achieves an average recall, precision, quality, and F-score of 0.956, 0.946, 0.907, and 0.951, respectively, on the four selected MLS datasets. For on-image traffic sign recognition, a recognition accuracy of 97.54% is achieved by the proposed hierarchical classifier. Comparative studies with existing traffic sign detection and recognition methods demonstrate that our algorithm achieves reliable, high performance in both detecting traffic signs in 3-D point clouds and recognizing traffic signs on 2-D images.
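For reference, the short sketch below computes the four detection measures from true-positive, false-positive, and false-negative counts, using the definitions commonly adopted for point-cloud object detection (e.g., quality = TP / (TP + FP + FN)); these definitions and the example counts are assumptions for illustration, not quoted from the paper.

```python
# Minimal sketch (assumed metric definitions, not the paper's exact formulation).

def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision, quality, and F-score from match counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    quality = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision,
            "quality": quality, "f_score": f_score}

# Hypothetical example: 200 correctly detected signs, 11 false alarms, 9 misses.
print(detection_metrics(tp=200, fp=11, fn=9))
```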

Introduction

Traffic signs provide road users with correct, detailed road information, thereby helping them reach their destinations rapidly. Traffic signs also regulate and control traffic activities, thereby ensuring traffic safety and smooth traffic flow. To facilitate management and improve efficiency, transportation agencies urgently demand an effective, automated traffic sign recognition system to monitor the status and measure the usability of traffic signs. In addition, accurate information on the functionality and location of traffic signs provides important input to many intelligent transportation-related applications, such as driver assistance and safety warning systems (Zheng et al., 2004, Cheng et al., 2007) and autonomous driving (Choi et al., 2012, Broggi et al., 2013, Seo et al., 2015). Conversely, the absence or poor visibility of necessary traffic signs can inconvenience road users and, in some cases, lead to serious traffic accidents and casualties. Therefore, developing effective, automated traffic sign detection and recognition techniques is essential for transportation agencies to rapidly update traffic sign inventories and improve traffic quality and safety.

Traditionally, the measurement and maintenance of traffic signs were accomplished mainly through field work, in which field workers from transportation agencies conducted on-site inspections and maintenance on a regular basis. Such field work is time consuming, labor intensive, costly, and inefficient for large-scale, complicated road networks. Recently, with the advance of optical imaging techniques, mobile mapping systems (MMS) using digital or video cameras (Murray et al., 2011, Brogan et al., 2013) have emerged as an effective tool for a wide range of transportation applications. The images collected by a camera-based MMS provide a promising data source for rapid detection and recognition of traffic signs along roadways. However, MMS images suffer greatly from object distortions, motion blur, noise, and illumination variations. In addition, owing to viewpoint variations, traffic signs are sometimes fully or partially occluded by nearby objects (e.g., trees) in the images. Therefore, achieving high-quality, high-accuracy, and automated detection and recognition of traffic signs from MMS images remains a great challenge.

Over the last decade, benefiting from the integration of laser scanning and position and orientation technologies, mobile laser scanning (MLS) systems have been designed and extensively used in the fields of transportation, road feature inventory, computer games, cultural heritage documentation, and basic surveying and mapping (Williams et al., 2013). MLS systems can rapidly acquire highly dense and accurate 3-D point clouds along with color imagery. The 3-D point clouds provide accurate geometric and localization information about objects, whereas the color imagery provides detailed texture and content information. Therefore, by fusing imagery and 3-D point clouds, MLS systems provide a promising solution for traffic sign detection (based on 3-D point clouds) and recognition (based on imagery).

In this paper, we present a novel algorithm combining bag-of-visual-phrases (BoVPs) and hierarchical deep models for detecting and recognizing traffic signs from MLS data. The proposed algorithm includes three stages: visual phrase dictionary generation, traffic sign detection, and traffic sign recognition. At the visual phrase dictionary generation stage, the training MLS data are supervoxelized to construct feature regions for generating a visual word vocabulary and to construct spatial word patterns for generating a visual phrase dictionary. At the traffic sign detection stage, individual semantic objects are first segmented, supervoxelized, described with features, and quantized to form BoVP representations. Then, traffic signposts are detected based on similarity measures between the BoVPs of the query object and the semantic objects. Finally, traffic signs are located and segmented via percentile-based analysis. At the traffic sign recognition stage, a Gaussian-Bernoulli deep Boltzmann machine (DBM) based hierarchical classifier is applied to the registered traffic sign regions to recognize traffic signs.

The main contributions of this paper are as follows: (1) a DBM-based feature encoder is proposed to generate high-order feature encodings of feature regions; (2) a supervoxel-based BoVP model is proposed to depict point cloud objects; (3) an extended voxel-based normalized cut segmentation method is developed to segment overlapping semantic objects. In this paper, a “high-order feature” denotes the high-level abstraction of a set of features, i.e., a “feature of features,” whereas a “low-order feature” denotes a specific single feature.
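To make the notion of a high-order feature encoder more concrete, the sketch below shows a deterministic up-pass through one Gaussian-Bernoulli layer (the building block of a Gaussian-Bernoulli DBM), mapping a real-valued low-order feature vector of a feature region to hidden activations. The dimensions, weights, and input are placeholders chosen only to make the sketch runnable; they are not the authors' trained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions: a 64-D low-order feature vector per feature region and
# 32 hidden units. In practice the weights would come from training the
# Gaussian-Bernoulli model; random values are used here only for illustration.
n_visible, n_hidden = 64, 32
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_hidden = np.zeros(n_hidden)
sigma = np.ones(n_visible)          # per-dimension std. dev. of the Gaussian visible units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(v):
    """Mean-field up-pass: real-valued visible features -> hidden activations."""
    return sigmoid((v / sigma) @ W + b_hidden)

low_order = rng.normal(size=n_visible)   # e.g., a feature region's descriptor (synthetic)
high_order = encode(low_order)           # "feature of features" used as the encoding
print(high_order.shape)                  # (32,)
```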

Section snippets

Existing methods

In the following sections, we present a detailed review of existing methods for traffic sign detection based on MLS point clouds and traffic sign recognition based on images.

Visual phrase dictionary generation

In this section, we present the technical and implementation details for the supervoxel-based visual phrase dictionary generation from MLS point clouds. Such a visual phrase dictionary can be further used to construct BoVPs for depicting 3-D point cloud objects.
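As a loose illustration of this stage (not the authors' implementation), the sketch below clusters per-supervoxel feature vectors into a visual word vocabulary with k-means and then forms a simplified visual phrase dictionary from pairs of words occurring on adjacent supervoxels; the feature vectors, dictionary sizes, and adjacency relation are all synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Placeholder data: one 64-D feature vector per supervoxel, for 500 supervoxels.
features = rng.normal(size=(500, 64))

# 1) Visual word vocabulary: cluster supervoxel features into k visual words.
k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
word_ids = kmeans.labels_            # visual word assigned to each supervoxel

# 2) Visual phrase dictionary (simplified): unordered pairs of visual words that
#    co-occur on spatially adjacent supervoxels. Adjacency is faked with random
#    neighbor pairs here; in practice it would come from the supervoxel graph.
neighbor_pairs = rng.integers(0, 500, size=(2000, 2))
phrases = {tuple(sorted((word_ids[i], word_ids[j])))
           for i, j in neighbor_pairs if i != j}
print(len(phrases), "visual phrases")
```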

Traffic sign detection

In this section, we present a traffic sign detection framework that uses the generated visual phrase dictionary. For a search scene, semantic objects are first segmented through a combination of Euclidean distance clustering and extended voxel-based normalized cut segmentation. Then, the query object and each of the segmented semantic objects are supervoxelized, described with features, and quantized to form BoVP representations. Next, traffic signposts are detected based on the similarity measures between the BoVPs of the query object and the segmented semantic objects. Finally, traffic signs are located and segmented via percentile-based analysis.
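As an illustration of the matching step, the sketch below compares the BoVP histogram of a query object against that of a candidate object using cosine similarity and a fixed threshold; the histogram data, similarity measure, and threshold are assumptions for demonstration and do not reproduce the paper's exact formulation.

```python
import numpy as np

def bovp_histogram(phrase_ids, dictionary_size):
    """Normalized bag-of-visual-phrases histogram for one segmented object."""
    hist = np.bincount(phrase_ids, minlength=dictionary_size).astype(float)
    total = hist.sum()
    return hist / total if total else hist

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Synthetic example: a query traffic-sign object vs. one candidate object,
# both encoded over a 200-entry visual phrase dictionary.
rng = np.random.default_rng(2)
query = bovp_histogram(rng.integers(0, 200, size=300), 200)
candidate = bovp_histogram(rng.integers(0, 200, size=280), 200)

SIM_THRESHOLD = 0.5                      # placeholder decision threshold
if cosine_similarity(query, candidate) > SIM_THRESHOLD:
    print("candidate labeled as a traffic signpost")
else:
    print("candidate rejected")
```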

On-image traffic sign detection

Because MLS point clouds lack informative texture, the task of traffic sign recognition cannot be accomplished based on the point cloud data alone. Fortunately, along with the acquisition of 3-D point cloud data, MLS systems simultaneously capture image data using their on-board digital cameras. Therefore, in this paper, the images captured by the on-board cameras of the MLS system are used for traffic sign recognition. Based on the traffic sign point clouds detected in Section 4, on-image traffic sign regions are located on the registered images and then fed to the hierarchical classifier for recognition.
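For illustration, a detected 3-D sign point can be mapped into an image with a standard pinhole projection, as sketched below; the camera intrinsics, pose, and sign point are hypothetical values (loosely matching a 2452 × 2056 pixel camera) and do not reproduce the VMX-450 calibration or the paper's registration procedure.

```python
import numpy as np

def project_point(X_world, R, t, K):
    """Project a 3-D point (world frame) into pixel coordinates with a pinhole model.
    R, t: world-to-camera rotation and translation; K: camera intrinsic matrix.
    """
    X_cam = R @ X_world + t
    if X_cam[2] <= 0:                      # point behind the camera: no projection
        return None
    uvw = K @ X_cam
    return uvw[:2] / uvw[2]

# Placeholder calibration roughly matching a 2452 x 2056 pixel camera.
K = np.array([[2000.0, 0.0, 1226.0],
              [0.0, 2000.0, 1028.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

sign_point = np.array([1.5, -0.5, 12.0])   # hypothetical detected sign point (metres)
print(project_point(sign_point, R, t, K))  # approx. [1476.0, 944.7] pixels
```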

RIEGL VMX-450 system and MLS data sets

In this study, a RIEGL VMX-450 system was used to collect both point clouds and images in Xiamen City, China. The system is composed of (1) two RIEGL VQ-450 laser scanners, (2) four CS6 color cameras (2452 × 2056 pixels), and (3) an integrated IMU/GNSS/DMI position and orientation system. The two laser scanners achieve a maximum effective measurement rate of 1.1 million measurements per second, a line scan speed of up to 400 lines per second, and a maximum valid range of 800 m.

We collected four data sets with this system to evaluate the proposed algorithm.

Conclusion

In this paper, we have proposed a novel algorithm combining 3-D point clouds and 2-D images for detecting and recognizing traffic signs based on BoVPs and hierarchical deep models. The traffic sign detection task was accomplished on 3-D point clouds, whereas the recognition task was achieved on 2-D images. The proposed algorithm was evaluated on four data sets collected by a RIEGL VMX-450 system. For detecting traffic signs in 3-D point clouds, the proposed algorithm achieved an average recall, precision, quality, and F-score of 0.956, 0.946, 0.907, and 0.951, respectively; for on-image traffic sign recognition, the proposed hierarchical classifier achieved a recognition accuracy of 97.54%.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant 41471379, and in part by PAPD and CICAEET. The authors would like to thank the anonymous reviewers for their valuable comments.

References (71)

  • A. Broggi et al. Extensive tests of autonomous driving technologies. IEEE Trans. Intell. Transp. Syst. (2013)
  • G. Carneiro et al. Combining multiple dynamic models and deep learning architectures for tracking left ventricle endocardium in ultrasound data. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • B. Chen et al. Deep learning with hierarchical convolutional factor analysis. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • Chen, X., Kohlmeyer, B., Stroila, M., Alwar, N., Wang, R., Bach, J., 2009. Next generation map making: geo-referenced...
  • Chen, Y., Zhao, H., Shibasaki, R., 2007. A mobile system combining laser scanners and cameras for urban spatial objects...
  • H. Cheng et al. Interactive road situation analysis for driver assistance and safety warning systems: framework and algorithms. IEEE Trans. Intell. Transp. Syst. (2007)
  • Chigorin, A., Konushin, A., 2013. A system for large-scale automatic traffic sign recognition and mapping. In: ISPRS...
  • J. Choi et al. Environment-detection-and-mapping algorithm for autonomous driving in rural or off-road environment. IEEE Trans. Intell. Transp. Syst. (2012)
  • A. de la Escalera et al. Road traffic sign detection and classification. IEEE Trans. Ind. Electron. (1997)
  • C. Fang et al. Road sign detection and tracking. IEEE Trans. Veh. Technol. (2003)
  • Fleyeh, H., 2006. Shadow and highlight invariant colour segmentation algorithm for traffic signs. In: IEEE Conference...
  • H. Fleyeh et al. Eigen-based traffic sign recognition. IET Intell. Transp. Syst. (2011)
  • Golovinskiy, A., Funkhouser, T., 2009a. Min-cut based segmentation of point clouds. In: IEEE 12th International...
  • Golovinskiy, A., Kim, V.G., Funkhouser, T., 2009b. Shape-based recognition of 3D point clouds in urban environments....
  • H. Gómez-Moreno et al. Goal evaluation of segmentation algorithms for traffic sign recognition. IEEE Trans. Intell. Transp. Syst. (2010)
  • Á. González et al. Automatic traffic signs and panels inspection system using computer vision. IEEE Trans. Intell. Transp. Syst. (2011)
  • J. Greenhalgh et al. Real-time detection and recognition of road traffic signs. IEEE Trans. Intell. Transp. Syst. (2012)
  • J. Greenhalgh et al. Recognizing text-based traffic signs. IEEE Trans. Intell. Transp. Syst. (2015)
  • L. Hazelhoff et al. Exploiting street-level panoramic images for large-scale automated surveying of traffic signs. Mach. Vis. Appl. (2014)
  • J. Jin et al. Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Trans. Intell. Transp. Syst. (2014)
  • Y. Jiang et al. Randomized spatial context for object search. IEEE Trans. Image Process. (2015)
  • Körtgen, M., Park, G.J., Novotni, M., Klein, R., 2003. 3D shape matching with 3D shape contexts. In: 7th Central...
  • F. Larsson et al. Correlating Fourier descriptors of local patches for road sign recognition. IET Comput. Vis. (2011)
  • K. Lu et al. Sparse-representation-based graph embedding for traffic sign recognition. IEEE Trans. Intell. Transp. Syst. (2012)
  • Mathias, M., Timofte, R., Benenson, R., Van Gool, L., 2013. Traffic sign recognition – how far are we from the...