Abstract
Scene text removal aims to erase text from scene images and fill the resulting gaps with plausible, realistic content. The task comprises two sub-tasks: text perception and text removal. However, most existing methods either ignore this decomposition or simply split the task into two consecutive stages, without considering how the two sub-tasks can reinforce each other. With suitable transformations, better segmentation results can better guide text removal, and vice versa. The two sub-tasks can thus promote each other and co-evolve, forming an intertwined, spiraling process similar to the double helix structure of deoxyribonucleic acid (DNA) molecules. In this paper, we propose a novel network, HelixNet, which incorporates Dual Helix Cooperative Decoders for scene text removal. It is an end-to-end, one-stage model with one shared encoder and two interacting decoders for the text segmentation and text removal sub-tasks. Through information interaction between the two branches, the model fuses complementary information from each sub-task. We evaluate the proposed method extensively on commonly used, publicly available real and synthetic datasets. The experimental results demonstrate the benefit of the specially designed decoder and show that HelixNet can achieve state-of-the-art performance.
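The data flow described in the abstract (one shared encoder feeding two decoder branches that exchange information at every stage) can be illustrated with a toy sketch. This is not HelixNet's implementation: the abstract gives no layer details, so every function name, the 1-D "image", the sigmoid scoring rule and its constants are assumptions made purely for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def shared_encoder(image: Vector) -> Vector:
    """Toy stand-in for a shared encoder: centre the pixel values."""
    mean = sum(image) / len(image)
    return [p - mean for p in image]


def segmentation_step(features: Vector, removal_state: Vector) -> Vector:
    """Hypothetical segmentation stage.

    Scores each position as text using its own evidence (|f|) plus a hint
    from the removal branch (|f - r|, how much removal already changed it).
    """
    return [
        1.0 / (1.0 + math.exp(-4.0 * (abs(f) + abs(f - r) - 0.5)))
        for f, r in zip(features, removal_state)
    ]


def removal_step(removal_state: Vector, mask: Vector) -> Vector:
    """Hypothetical removal stage: suppress features where the mask is high."""
    return [r * (1.0 - m) for r, m in zip(removal_state, mask)]


def dual_branch_decode(image: Vector, num_stages: int = 3) -> Tuple[Vector, Vector]:
    """Alternate the two branches so that each stage uses the other's output."""
    features = shared_encoder(image)
    removal_state = list(features)
    mask = [0.0] * len(features)
    for _ in range(num_stages):
        mask = segmentation_step(features, removal_state)  # removal -> segmentation
        removal_state = removal_step(removal_state, mask)  # segmentation -> removal
    mean = sum(image) / len(image)
    restored = [r + mean for r in removal_state]
    return mask, restored


# A flat background with one bright "text" pixel at index 3.
image = [0.2, 0.2, 0.2, 1.0, 0.2, 0.2]
mask, restored = dual_branch_decode(image)
```

In this toy setting the mask sharpens from stage to stage because the removal branch's changes feed back into the segmentation score, while the removal branch suppresses more strongly wherever the mask grows. That mutual refinement is the only aspect of the proposed design the sketch is meant to convey.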
Acknowledgement
This work is supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (No. 202200049).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, K., Lyu, G., Zhu, A. (2024). HelixNet: Dual Helix Cooperative Decoders for Scene Text Removal. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_3
DOI: https://doi.org/10.1007/978-981-99-8540-1_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8539-5
Online ISBN: 978-981-99-8540-1
eBook Packages: Computer Science (R0)