Vision-Language Adaptive Mutual Decoder for OOV-STR

Hu, Jinshui; Liu, Chenyu; Yan, Qiandong; Zhu, Xuyang; Wu, Jiajia; Du, Jun; Dai, Lirong

arXiv:2209.00859 (cs)

[Submitted on 2 Sep 2022 (v1), last revised 30 Oct 2023 (this version, v2)]

Abstract: Recent works have shown the great success of deep learning models for common in-vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance, and SOTA recognition models usually perform poorly in OOV settings. Inspired by the intuition that a learned language prior has limited OOV performance, we design a framework named Vision Language Adaptive Mutual Decoder (VLAMD) to partly tackle OOV problems. VLAMD consists of three main components. First, we build an attention-based LSTM decoder with two adaptively merged visual-only modules, yielding a vision-language balanced main branch. Second, we add an auxiliary query-based autoregressive transformer decoding head for common visual and language prior representation learning. Finally, we couple these two designs with bidirectional training for more diverse language modeling, and perform mutual sequential decoding to obtain more robust results. Our approach achieved 70.31% and 59.61% word accuracy on the IV+OOV and OOV settings, respectively, of the Cropped Word Recognition Task of the OOV-ST Challenge at the ECCV 2022 TiE Workshop, where it took 1st place in both settings.
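
The two reported numbers are word accuracies over different subsets of the test words. As a minimal sketch of how such a split could be computed, the Python below assumes that a word counts as OOV when its ground-truth transcription does not appear in the training vocabulary, and that a prediction is correct only on an exact string match. The function name `word_accuracy`, its arguments, and the exact-match rule are illustrative assumptions; the challenge's actual evaluation protocol (for example, case or punctuation handling) is not described in the abstract.

```python
from typing import Dict, Iterable, Optional, Set


def word_accuracy(
    preds: Iterable[str],
    targets: Iterable[str],
    train_vocab: Set[str],
) -> Dict[str, Optional[float]]:
    """Exact-match word accuracy on the IV+OOV, IV, and OOV subsets.

    Assumption: a target is OOV if it is absent from `train_vocab`.
    Returns None for a subset that contains no words.
    """
    # Each entry holds [number correct, number of words].
    counts = {"iv+oov": [0, 0], "iv": [0, 0], "oov": [0, 0]}
    for pred, target in zip(preds, targets):
        subset = "iv" if target in train_vocab else "oov"
        for key in ("iv+oov", subset):
            counts[key][0] += int(pred == target)
            counts[key][1] += 1
    return {
        key: (correct / total if total else None)
        for key, (correct, total) in counts.items()
    }


if __name__ == "__main__":
    # Toy data, not taken from the challenge.
    vocab = {"hello", "world"}
    targets = ["hello", "world", "zyx", "qwf", "abc"]
    preds = ["hello", "world", "zyx", "qwe", "abd"]
    print(word_accuracy(preds, targets, vocab))
```

On the toy data, both IV words are recognized while only one of the three OOV words is, which mirrors the gap between the IV+OOV and OOV figures in the abstract: the OOV-only score is the harder of the two.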

Comments:	1st Place Solution to ECCV 2022 OOV-ST Challenge; ICIG 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2209.00859 [cs.CV]
	(or arXiv:2209.00859v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2209.00859

Submission history

From: Chenyu Liu
[v1] Fri, 2 Sep 2022 07:32:22 UTC (1,194 KB)
[v2] Mon, 30 Oct 2023 03:15:16 UTC (1,500 KB)
