MedoidsFormer: A Strong 3D Object Detection Backbone by Exploiting Interaction With Adjacent Medoid Tokens

Publisher: IEEE

Abstract:

In this paper, we propose MedoidsFormer, a novel transformer-based backbone equipped with a self-attention mechanism tailored explicitly to LiDAR-based 3D object detection. Compared with 2D object detection, target objects in 3D object detection occupy a much smaller proportion of the input scene, and their distribution is significantly sparser. Given these observations, we introduce a new self-attention mechanism called Medoids Attention, which focuses on exploiting interactions within surrounding regions; this not only reduces computation and memory costs but also yields discriminative context information. Instead of aggregating tokens from adjacent areas, we present a dynamic, semantic-aware token mining process based on k-Medoids clustering that directly selects representative tokens for attention modeling. In extensive experiments, our proposed method shows consistent improvement over existing 3D object detectors and achieves state-of-the-art performance on the large-scale Waymo Open Dataset. We also conduct comprehensive ablation studies to verify the efficacy of the new self-attention mechanism and provide thorough insights.
Page(s): 5844 - 5854
Date of Publication: 21 March 2023
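
The abstract describes selecting representative tokens by k-Medoids clustering and attending to those tokens rather than to every token in a region. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: the function names `k_medoids` and `medoid_attention`, the squared Euclidean distance, the alternating assign/update heuristic, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np


def k_medoids(tokens, k, n_iter=10, seed=0):
    """Return indices of k medoid tokens (alternating assign/update heuristic)."""
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    # Pairwise squared Euclidean distances between token features, shape (n, n).
    diff = tokens[:, None, :] - tokens[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Start from k distinct, randomly chosen tokens.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every token to its nearest current medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids


def medoid_attention(tokens, k):
    """Each token attends only to the k medoid tokens (no learned projections)."""
    medoids = k_medoids(tokens, k)
    keys = tokens[medoids]                       # (k, d) representative tokens
    d = tokens.shape[1]
    scores = tokens @ keys.T / np.sqrt(d)        # (n, k) instead of (n, n)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys, medoids


if __name__ == "__main__":
    # Hypothetical region of 32 tokens with 8-dimensional features.
    region = np.random.default_rng(1).normal(size=(32, 8))
    out, idx = medoid_attention(region, k=4)
    print(out.shape, idx)
```

In this sketch the attention score matrix has shape (n, k) rather than (n, n), which is one way the reduced computation and memory cost mentioned in the abstract could arise. Unlike pooled or averaged summaries, each medoid is an actual input token, which matches the abstract's contrast between selecting tokens and aggregating them.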
