CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval

Liu, Yating; Li, Yaowei; Liu, Zimo; Yang, Wenming; Wang, Yaowei; Liao, Qingmin

arXiv:2309.09496 (cs)

[Submitted on 18 Sep 2023 (v1), last revised 2 Jan 2024 (this version, v2)]

Abstract: Text-based Person Retrieval (TPR) aims to retrieve images of a target person given a textual query. The primary challenge lies in bridging the substantial gap between the vision and language modalities, especially when large-scale datasets are scarce. In this paper, we introduce a CLIP-based Synergistic Knowledge Transfer (CSKT) approach for TPR. Specifically, to explore CLIP's knowledge on the input side, we first propose a Bidirectional Prompts Transferring (BPT) module built from text-to-image and image-to-text bidirectional prompts and coupling projections. Secondly, Dual Adapters Transferring (DAT) is designed to transfer knowledge on the output side of Multi-Head Attention (MHA) in both vision and language. This synergistic two-way collaborative mechanism promotes early-stage feature fusion and efficiently exploits the existing knowledge of CLIP. CSKT outperforms state-of-the-art approaches across three benchmark datasets while its trainable parameters account for only 7.4% of the entire model, demonstrating its efficiency, effectiveness, and generalization.
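To make the adapter idea concrete, here is a minimal, dependency-free sketch of a bottleneck adapter applied to one token vector coming out of an attention block, plus a helper for the trainable-parameter fraction. The abstract does not describe DAT's internal structure, so everything here is an assumption for illustration only: the down-project/ReLU/up-project layout, the zero-initialised up-projection, the omission of biases, and the names `BottleneckAdapter` and `trainable_fraction` are hypothetical and not taken from the paper.

```python
import random


def make_matrix(rows, cols, scale, rng):
    """Build a rows x cols matrix with small uniform random entries."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class BottleneckAdapter:
    """Hypothetical adapter: down-project, ReLU, up-project, residual add.

    Assumption: this generic layout stands in for the adapters named in
    the abstract; it is not the paper's actual DAT design.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = make_matrix(bottleneck, dim, 0.1, rng)
        # Assumed choice: zero up-projection, so the adapter starts as identity.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        """Count adapter weights (biases are omitted in this sketch)."""
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)

    def __call__(self, token):
        hidden = [max(0.0, h) for h in matvec(self.down, token)]
        delta = matvec(self.up, hidden)
        # Residual connection: the frozen block's output plus a learned delta.
        return [t + d for t, d in zip(token, delta)]


def trainable_fraction(trainable, frozen):
    """Share of parameters that are updated during fine-tuning."""
    return trainable / (trainable + frozen)


adapter = BottleneckAdapter(dim=8, bottleneck=2)
attention_output = [1.0] * 8  # stand-in for one token from a frozen MHA block
adapted = adapter(attention_output)
```

Under these assumptions the adapter initially passes its input through unchanged, so the frozen backbone's behaviour is preserved at the start of training, and only the small `down` and `up` matrices would be updated. A reported figure such as 7.4% corresponds to `trainable_fraction` evaluated on the real model's parameter counts, which are not given in the abstract.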

Comments:	Accepted at ICASSP 2024. Minor typo revisions compared to version 1 on arXiv
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2309.09496 [cs.CV]
	(or arXiv:2309.09496v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.09496

Submission history

From: Yating Liu
[v1] Mon, 18 Sep 2023 05:38:49 UTC (244 KB)
[v2] Tue, 2 Jan 2024 05:01:26 UTC (353 KB)
