Computer Science > Computer Vision and Pattern Recognition
[Submitted on 27 Mar 2024]
Title:Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking
View PDF HTML (experimental)Abstract:RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet, it remains hindered by two major challenges: 1) the trade-off between performance and efficiency; 2) the scarcity of training data. To address the latter challenge, some recent methods employ prompts to fine-tune pre-trained RGB tracking models and leverage upstream knowledge in a parameter-efficient manner. However, these methods inadequately explore modality-independent patterns and disregard the dynamic reliability of different modalities in open scenarios. We propose M3PT, a novel RGB-T prompt tracking method that leverages middle fusion and multi-modal and multi-stage visual prompts to overcome these challenges. We pioneer the use of the middle fusion framework for RGB-T tracking, which achieves a balance between performance and efficiency. Furthermore, we incorporate the pre-trained RGB tracking model into the framework and utilize multiple flexible prompt strategies to adapt the pre-trained model to the comprehensive exploration of uni-modal patterns and the improved modeling of fusion-modal features, harnessing the potential of prompt learning in RGB-T tracking. Our method outperforms the state-of-the-art methods on four challenging benchmarks, while attaining 46.1 fps inference speed.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.