Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning

Wang, Qiang; Du, Junlong; Yan, Ke; Ding, Shouhong

arXiv:2308.04828 (cs)

[Submitted on 9 Aug 2023]

Abstract: Contrastive Language-Image Pre-training (CLIP) has recently shown remarkable generalization in "zero-shot" settings and has been applied to many downstream tasks. We explore the adaptation of CLIP to achieve a more efficient and generalized action recognition method. We propose that the key lies in explicitly modeling the motion cues flowing in video frames. To that end, we design a two-stream motion modeling block to capture motion and spatial information at the same time. The obtained motion cues are then used to drive a dynamic prompts learner to generate motion-aware prompts, which contain much semantic information concerning human actions. In addition, we propose a multimodal communication block to achieve collaborative learning and further improve performance. We conduct extensive experiments on the HMDB-51, UCF-101, and Kinetics-400 datasets. Our method outperforms most existing state-of-the-art methods by a significant margin on "few-shot" and "zero-shot" training. We also achieve competitive performance on "closed-set" training with extremely few trainable parameters and little additional computational cost.
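
The abstract gives no implementation details, so the following is only a rough illustration of two ideas it names: a motion cue computed across frames and CLIP-style matching of a video embedding against class text embeddings. Everything here is an assumption made for illustration: the use of simple frame differences as the motion cue, the function names, and the toy two-dimensional vectors. It is not the authors' two-stream block or prompts learner.

```python
import math

def frame_differences(frames):
    """Assumed motion cue: difference between consecutive frame feature vectors."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]

def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(video_embedding, text_embeddings):
    """Pick the label whose (hypothetical) text embedding is most similar."""
    return max(text_embeddings,
               key=lambda label: cosine(video_embedding, text_embeddings[label]))

# Toy per-frame features standing in for image-encoder outputs (hypothetical).
frames = [[1.0, 0.0], [2.0, 0.0], [4.0, 1.0]]
motion = mean_pool(frame_differences(frames))    # pooled motion cue
labels = {"run": [1.0, 0.0], "jump": [0.0, 1.0]}  # toy text embeddings
prediction = classify(motion, labels)
```

In a real pipeline the frame features and text embeddings would come from CLIP's image and text encoders; the toy vectors above only show the data flow.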

Comments:	Accepted by ACM MM 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.04828 [cs.CV]
	(or arXiv:2308.04828v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.04828

Submission history

From: Qiang Wang
[v1] Wed, 9 Aug 2023 09:33:45 UTC (581 KB)
