HelixNet: Dual Helix Cooperative Decoders for Scene Text Removal

Liu, Kun; Lyu, Guangtao; Zhu, Anna

doi:10.1007/978-981-99-8540-1_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14431)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Scene text removal aims to erase text from scene images and fill the resulting gaps with plausible, realistic content. The task involves two sub-tasks: text perception and text removal. However, most existing methods either ignore this decomposition or split the task into two consecutive stages, without considering how the two sub-tasks can promote each other. With suitable transformations, better segmentation results can better guide text removal, and vice versa. The two sub-tasks can thus mutually promote and co-evolve, forming an intertwined, spiraling process similar to the double helix structure of deoxyribonucleic acid (DNA) molecules. In this paper, we propose a novel network, HelixNet, incorporating Dual Helix Cooperative Decoders for scene text removal. It is an end-to-end, one-stage model with one shared encoder and two interacting decoders for the text segmentation and text removal sub-tasks. Through dual-branch information interaction, the model fuses complementary information from each sub-task, so that scene text removal and segmentation inform each other. The proposed method is extensively evaluated on publicly available, commonly used real and synthetic datasets. The experimental results demonstrate the promotion effect of the specially designed decoders and show that HelixNet can achieve state-of-the-art performance.
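The data flow described in the abstract (one shared encoder, two decoders that exchange information) can be illustrated with a toy sketch. This is not the authors' network: every function name here (`shared_encoder`, `removal_step`, `segmentation_step`, `helix_forward`) is hypothetical, the learned layers are replaced by hand-written rules on a tiny grayscale image, and the alternating update order is an assumption made only to show how a mask can guide removal while the removal result in turn refines the mask.

```python
from typing import List, Tuple

Image = List[List[float]]  # tiny grayscale image, values in [0, 1]


def shared_encoder(image: Image) -> Image:
    """Hypothetical encoder: the 'features' are simply a copy of the pixels."""
    return [row[:] for row in image]


def removal_step(features: Image, mask: Image) -> Image:
    """Removal branch guided by the mask: repaint likely-text pixels with a
    background value estimated from the pixels the mask considers non-text."""
    weight = sum(1.0 - m for row in mask for m in row)
    total = sum(f * (1.0 - m)
                for f_row, m_row in zip(features, mask)
                for f, m in zip(f_row, m_row))
    background = total / weight if weight > 0 else 0.0
    return [[m * background + (1.0 - m) * f for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


def segmentation_step(features: Image, removed: Image) -> Image:
    """Segmentation branch guided by the removal result: pixels that the
    removal branch changed the most are scored as most likely to be text."""
    diff = [[abs(f - r) for f, r in zip(f_row, r_row)]
            for f_row, r_row in zip(features, removed)]
    peak = max(d for row in diff for d in row)
    if peak == 0:
        return [[0.0 for _ in row] for row in diff]
    return [[d / peak for d in row] for row in diff]


def helix_forward(image: Image, stages: int = 2) -> Tuple[Image, Image]:
    """Alternate the two branches so each stage consumes the other's output."""
    features = shared_encoder(image)
    mask = [row[:] for row in features]  # crude initial guess: bright = text
    removed = features
    for _ in range(stages):
        removed = removal_step(features, mask)       # mask guides removal
        mask = segmentation_step(features, removed)  # removal guides the mask
    return removed, mask


if __name__ == "__main__":
    # One bright "text" pixel on a darker, uniform background.
    toy = [[0.2, 0.2, 0.2],
           [0.2, 1.0, 0.2],
           [0.2, 0.2, 0.2]]
    removed, mask = helix_forward(toy, stages=2)
    print(removed[1][1], mask[1][1])
```

In this toy setting the loop shows the cooperative idea in miniature: a rough mask yields a first removal estimate, and the difference between input and removal estimate sharpens the mask for the next stage. A real model would replace both hand-written steps with learned decoder blocks.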

Acknowledgement

This work is supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 202200049).

Author information

Authors and Affiliations

School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
Kun Liu, Guangtao Lyu & Anna Zhu

Corresponding author

Correspondence to Anna Zhu.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Liu, K., Lyu, G., Zhu, A. (2024). HelixNet: Dual Helix Cooperative Decoders for Scene Text Removal. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_3

DOI: https://doi.org/10.1007/978-981-99-8540-1_3
Published: 25 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8539-5
Online ISBN: 978-981-99-8540-1
eBook Packages: Computer Science (R0)
