Bag-of-visual-phrases and hierarchical deep models for traffic sign detection and recognition in mobile laser scanning data

https://doi.org/10.1016/j.isprsjprs.2016.01.005

Abstract

This paper presents a novel algorithm for detecting and recognizing traffic signs in mobile laser scanning (MLS) data for intelligent transportation-related applications. The traffic sign detection task is accomplished on 3-D point clouds by using bag-of-visual-phrases representations, whereas the recognition task is achieved on 2-D images by using a Gaussian-Bernoulli deep Boltzmann machine-based hierarchical classifier. To exploit high-order feature encodings of feature regions, a deep Boltzmann machine-based feature encoder is constructed. For detecting traffic signs in 3-D point clouds, the proposed algorithm achieves an average recall, precision, quality, and F-score of 0.956, 0.946, 0.907, and 0.951, respectively, on the four selected MLS datasets. For on-image traffic sign recognition, a recognition accuracy of 97.54% is achieved by the proposed hierarchical classifier. Comparative studies with existing traffic sign detection and recognition methods demonstrate that our algorithm achieves reliable, high performance in both detecting traffic signs in 3-D point clouds and recognizing traffic signs on 2-D images.
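For reference, the short sketch below computes the four detection measures from true-positive, false-positive, and false-negative counts, using the definitions commonly adopted for point-cloud object detection (e.g., quality = TP / (TP + FP + FN)); these definitions and the example counts are assumptions for illustration, not quoted from the paper.

```python
# Minimal sketch (assumed metric definitions, not the paper's exact formulation).

def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision, quality, and F-score from match counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    quality = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision,
            "quality": quality, "f_score": f_score}

# Hypothetical example: 200 correctly detected signs, 11 false alarms, 9 misses.
print(detection_metrics(tp=200, fp=11, fn=9))
```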

Introduction

Traffic signs provide road users with correct, detailed road information, thereby helping them reach their destinations rapidly. Traffic signs also regulate and control traffic activities, thereby ensuring traffic safety and smooth traffic flow. To facilitate management and improve efficiency, transportation agencies urgently demand an effective, automated traffic sign recognition system to monitor the status and measure the usability of traffic signs. In addition, accurate information on the functionality and location of traffic signs provides important input to many intelligent transportation-related applications, such as driver assistance and safety warning systems (Zheng et al., 2004, Cheng et al., 2007) and autonomous driving (Choi et al., 2012, Broggi et al., 2013, Seo et al., 2015). Conversely, the absence or poor visibility of necessary traffic signs can inconvenience road users and, in some cases, lead to serious traffic accidents and casualties. Therefore, developing effective, automated traffic sign detection and recognition techniques is essential for transportation agencies to rapidly update traffic sign inventories and improve traffic quality and safety.

Traditionally, the measurement and maintenance of traffic signs were accomplished mainly through field work, in which field workers from transportation agencies conducted on-site inspections and maintenance on a regular basis. Such field work is time consuming, labor intensive, costly, and inefficient for large-scale, complicated road networks. Recently, with the advance of optical imaging techniques, mobile mapping systems (MMS) using digital or video cameras (Murray et al., 2011, Brogan et al., 2013) have emerged as an effective tool for a wide range of transportation applications. The images collected by a camera-based MMS provide a promising data source for rapid detection and recognition of traffic signs along roadways. However, MMS images suffer greatly from object distortions, motion blur, noise, and illumination variations. In addition, owing to viewpoint variations, traffic signs are sometimes fully or partially occluded by nearby objects (e.g., trees) in the images. Therefore, achieving high-quality, high-accuracy, and automated detection and recognition of traffic signs from MMS images remains a great challenge.

Over the last decade, benefiting from the integration of laser scanning and position and orientation technologies, mobile laser scanning (MLS) systems have been designed and extensively used in the fields of transportation, road feature inventory, computer games, cultural heritage documentation, and basic surveying and mapping (Williams et al., 2013). MLS systems can rapidly acquire highly dense and accurate 3-D point clouds along with color imagery. The 3-D point clouds provide accurate geometric and localization information about objects, whereas the color imagery provides detailed texture and content information. Therefore, by fusing imagery and 3-D point clouds, MLS systems provide a promising solution for traffic sign detection (based on 3-D point clouds) and recognition (based on imagery).

In this paper, we present a novel algorithm combining bag-of-visual-phrases (BoVPs) and hierarchical deep models for detecting and recognizing traffic signs from MLS data. The proposed algorithm includes three stages: visual phrase dictionary generation, traffic sign detection, and traffic sign recognition. At the visual phrase dictionary generation stage, the training MLS data are supervoxelized to construct feature regions for generating a visual word vocabulary and to construct spatial word patterns for generating a visual phrase dictionary. At the traffic sign detection stage, individual semantic objects are first segmented, supervoxelized, described with features, and quantized to form BoVP representations. Then, traffic signposts are detected based on similarity measures between the BoVPs of the query object and the semantic objects. Finally, traffic signs are located and segmented via percentile-based analysis. At the traffic sign recognition stage, a Gaussian-Bernoulli deep Boltzmann machine (DBM) based hierarchical classifier is applied to the registered traffic sign regions to recognize traffic signs.

The main contributions of this paper are as follows: (1) a DBM-based feature encoder is proposed to generate high-order feature encodings of feature regions; (2) a supervoxel-based BoVP model is proposed to depict point cloud objects; (3) an extended voxel-based normalized cut segmentation method is developed to segment overlapping semantic objects. In this paper, a “high-order feature” denotes the high-level abstraction of a set of features, i.e., a “feature of features,” whereas a “low-order feature” denotes a specific single feature.
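To make the notion of a high-order feature encoder more concrete, the sketch below shows a deterministic up-pass through one Gaussian-Bernoulli layer (the building block of a Gaussian-Bernoulli DBM), mapping a real-valued low-order feature vector of a feature region to hidden activations. The dimensions, weights, and input are placeholders chosen only to make the sketch runnable; they are not the authors' trained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions: a 64-D low-order feature vector per feature region and
# 32 hidden units. In practice the weights would come from training the
# Gaussian-Bernoulli model; random values are used here only for illustration.
n_visible, n_hidden = 64, 32
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_hidden = np.zeros(n_hidden)
sigma = np.ones(n_visible)          # per-dimension std. dev. of the Gaussian visible units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(v):
    """Mean-field up-pass: real-valued visible features -> hidden activations."""
    return sigmoid((v / sigma) @ W + b_hidden)

low_order = rng.normal(size=n_visible)   # e.g., a feature region's descriptor (synthetic)
high_order = encode(low_order)           # "feature of features" used as the encoding
print(high_order.shape)                  # (32,)
```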

Section snippets

Existing methods

In the following sections, we present a detailed review of existing methods for traffic sign detection based on MLS point clouds and traffic sign recognition based on images.

Visual phrase dictionary generation

In this section, we present the technical and implementation details for the supervoxel-based visual phrase dictionary generation from MLS point clouds. Such a visual phrase dictionary can be further used to construct BoVPs for depicting 3-D point cloud objects.
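As a loose illustration of this stage (not the authors' implementation), the sketch below clusters per-supervoxel feature vectors into a visual word vocabulary with k-means and then forms a simplified visual phrase dictionary from pairs of words occurring on adjacent supervoxels; the feature vectors, dictionary sizes, and adjacency relation are all synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Placeholder data: one 64-D feature vector per supervoxel, for 500 supervoxels.
features = rng.normal(size=(500, 64))

# 1) Visual word vocabulary: cluster supervoxel features into k visual words.
k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
word_ids = kmeans.labels_            # visual word assigned to each supervoxel

# 2) Visual phrase dictionary (simplified): unordered pairs of visual words that
#    co-occur on spatially adjacent supervoxels. Adjacency is faked with random
#    neighbor pairs here; in practice it would come from the supervoxel graph.
neighbor_pairs = rng.integers(0, 500, size=(2000, 2))
phrases = {tuple(sorted((word_ids[i], word_ids[j])))
           for i, j in neighbor_pairs if i != j}
print(len(phrases), "visual phrases")
```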

Traffic sign detection

In this section, we present a traffic sign detection framework that uses the generated visual phrase dictionary. For a search scene, semantic objects are first segmented through a combination of Euclidean distance clustering and extended voxel-based normalized cut segmentation. Then, the query object and each of the segmented semantic objects are supervoxelized, described with features, and quantized to form BoVP representations. Next, traffic signposts are detected based on the similarity measures between the BoVPs of the query object and the segmented semantic objects. Finally, traffic signs are located and segmented via percentile-based analysis.
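As an illustration of the matching step, the sketch below compares the BoVP histogram of a query object against that of a candidate object using cosine similarity and a fixed threshold; the histogram data, similarity measure, and threshold are assumptions for demonstration and do not reproduce the paper's exact formulation.

```python
import numpy as np

def bovp_histogram(phrase_ids, dictionary_size):
    """Normalized bag-of-visual-phrases histogram for one segmented object."""
    hist = np.bincount(phrase_ids, minlength=dictionary_size).astype(float)
    total = hist.sum()
    return hist / total if total else hist

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Synthetic example: a query traffic-sign object vs. one candidate object,
# both encoded over a 200-entry visual phrase dictionary.
rng = np.random.default_rng(2)
query = bovp_histogram(rng.integers(0, 200, size=300), 200)
candidate = bovp_histogram(rng.integers(0, 200, size=280), 200)

SIM_THRESHOLD = 0.5                      # placeholder decision threshold
if cosine_similarity(query, candidate) > SIM_THRESHOLD:
    print("candidate labeled as a traffic signpost")
else:
    print("candidate rejected")
```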

On-image traffic sign detection

Because MLS point clouds lack informative texture, the task of traffic sign recognition cannot be accomplished based on the point cloud data alone. Fortunately, along with the acquisition of 3-D point cloud data, MLS systems simultaneously capture image data using their on-board digital cameras. Therefore, in this paper, the images captured by the on-board cameras of the MLS system are used for traffic sign recognition. Based on the traffic sign point clouds detected in Section 4, on-image traffic sign regions are located on the registered images and then fed to the hierarchical classifier for recognition.
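For illustration, a detected 3-D sign point can be mapped into an image with a standard pinhole projection, as sketched below; the camera intrinsics, pose, and sign point are hypothetical values (loosely matching a 2452 × 2056 pixel camera) and do not reproduce the VMX-450 calibration or the paper's registration procedure.

```python
import numpy as np

def project_point(X_world, R, t, K):
    """Project a 3-D point (world frame) into pixel coordinates with a pinhole model.
    R, t: world-to-camera rotation and translation; K: camera intrinsic matrix.
    """
    X_cam = R @ X_world + t
    if X_cam[2] <= 0:                      # point behind the camera: no projection
        return None
    uvw = K @ X_cam
    return uvw[:2] / uvw[2]

# Placeholder calibration roughly matching a 2452 x 2056 pixel camera.
K = np.array([[2000.0, 0.0, 1226.0],
              [0.0, 2000.0, 1028.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

sign_point = np.array([1.5, -0.5, 12.0])   # hypothetical detected sign point (metres)
print(project_point(sign_point, R, t, K))  # approx. [1476.0, 944.7] pixels
```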

RIEGL VMX-450 system and MLS data sets

In this study, a RIEGL VMX-450 system was used to collect both point clouds and images in Xiamen City, China. The system is composed of (1) two RIEGL VQ-450 laser scanners, (2) four CS6 color cameras (2452 × 2056 pixels), and (3) an integrated IMU/GNSS/DMI position and orientation system. The two laser scanners achieve a maximum effective measurement rate of 1.1 million measurements per second, a line scan speed of up to 400 lines per second, and a maximum valid range of 800 m.

We collected four data sets with this system to evaluate the proposed algorithm.

Conclusion

In this paper, we have proposed a novel algorithm combining 3-D point clouds and 2-D images for detecting and recognizing traffic signs based on BoVPs and hierarchical deep models. The traffic sign detection task was accomplished on 3-D point clouds, whereas the recognition task was achieved on 2-D images. The proposed algorithm was evaluated on four data sets collected by a RIEGL VMX-450 system. For detecting traffic signs in 3-D point clouds, the proposed algorithm achieved an average recall, precision, quality, and F-score of 0.956, 0.946, 0.907, and 0.951, respectively; for on-image traffic sign recognition, the proposed hierarchical classifier achieved a recognition accuracy of 97.54%.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant 41471379, and in part by PAPD and CICAEET. The authors would like to thank the anonymous reviewers for their valuable comments.

References (71)

  • A. Broggi et al. Extensive tests of autonomous driving technologies. IEEE Trans. Intell. Transp. Syst. (2013)
  • G. Carneiro et al. Combining multiple dynamic models and deep learning architectures for tracking left ventricle endocardium in ultrasound data. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • B. Chen et al. Deep learning with hierarchical convolutional factor analysis. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • Chen, X., Kohlmeyer, B., Stroila, M., Alwar, N., Wang, R., Bach, J., 2009. Next generation map making: geo-referenced...
  • Chen, Y., Zhao, H., Shibasaki, R., 2007. A mobile system combining laser scanners and cameras for urban spatial objects...
  • H. Cheng et al. Interactive road situation analysis for driver assistance and safety warning systems: framework and algorithms. IEEE Trans. Intell. Transp. Syst. (2007)
  • Chigorin, A., Konushin, A., 2013. A system for large-scale automatic traffic sign recognition and mapping. In: ISPRS...
  • J. Choi et al. Environment-detection-and-mapping algorithm for autonomous driving in rural or off-road environment. IEEE Trans. Intell. Transp. Syst. (2012)
  • A. de la Escalera et al. Road traffic sign detection and classification. IEEE Trans. Ind. Electron. (1997)
  • C. Fang et al. Road sign detection and tracking. IEEE Trans. Veh. Technol. (2003)
  • Fleyeh, H., 2006. Shadow and highlight invariant colour segmentation algorithm for traffic signs. In: IEEE Conference...
  • H. Fleyeh et al. Eigen-based traffic sign recognition. IET Intell. Transp. Syst. (2011)
  • Golovinskiy, A., Funkhouser, T., 2009a. Min-cut based segmentation of point clouds. In: IEEE 12th International...
  • Golovinskiy, A., Kim, V.G., Funkhouser, T., 2009b. Shape-based recognition of 3D point clouds in urban environments....
  • H. Gómez-Moreno et al. Goal evaluation of segmentation algorithms for traffic sign recognition. IEEE Trans. Intell. Transp. Syst. (2010)
  • Á. González et al. Automatic traffic signs and panels inspection system using computer vision. IEEE Trans. Intell. Transp. Syst. (2011)
  • J. Greenhalgh et al. Real-time detection and recognition of road traffic signs. IEEE Trans. Intell. Transp. Syst. (2012)
  • J. Greenhalgh et al. Recognizing text-based traffic signs. IEEE Trans. Intell. Transp. Syst. (2015)
  • L. Hazelhoff et al. Exploiting street-level panoramic images for large-scale automated surveying of traffic signs. Mach. Vis. Appl. (2014)
  • J. Jin et al. Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Trans. Intell. Transp. Syst. (2014)
  • Y. Jiang et al. Randomized spatial context for object search. IEEE Trans. Image Process. (2015)
  • Körtgen, M., Park, G.J., Novotni, M., Klein, R., 2003. 3D shape matching with 3D shape contexts. In: 7th Central...
  • F. Larsson et al. Correlating Fourier descriptors of local patches for road sign recognition. IET Comput. Vis. (2011)
  • K. Lu et al. Sparse-representation-based graph embedding for traffic sign recognition. IEEE Trans. Intell. Transp. Syst. (2012)
  • Mathias, M., Timofte, R., Benenson, R., Van Gool, L., 2013. Traffic sign recognition – how far are we from the...