TDAE: Text Detection with Affinity Areas and Evolution Strategies

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14192)

Abstract

Text detection in natural scenes has evolved considerably in recent years. Segmentation-based methods are widely used because they can robustly detect text of arbitrary shape. However, most previous works focus on word-level detection and neglect the regions between adjacent words, which are helpful when text instances lie very close together. In this paper, we propose a novel image feature, the affinity area, which exploits the area between two adjacent text instances to enhance detection capability. Because no open dataset provides affinity annotations, we design an affinity module that generates them from existing word-level annotations. With this module optimized, our segmentation-based network, TDAE, predicts both text regions and affinity regions, from which the final detection results are obtained. Inspired by evolution strategies (ES), our network also uses a novel additional fine-tuning step that updates the parameters by adding adaptive random perturbations, which differs markedly from traditional gradient descent. Competitive results on the ICDAR 2013, ICDAR 2015, ICDAR 2017, CTW-1500, and SynthText benchmarks demonstrate the effectiveness of TDAE.
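
To make the idea of updating parameters through random perturbations rather than gradients concrete, the following is a minimal sketch of a generic elitist evolution strategy with a simple step-size adaptation rule. It is not the fine-tuning procedure used in TDAE, whose details are not given in the abstract. The names es_fine_tune and toy_loss, the population size, and the adaptation factors 1.5 and 0.7 are all assumptions chosen for illustration, and the toy quadratic loss stands in for a real detection loss.

```python
import random


def es_fine_tune(params, loss_fn, steps=200, population=8, sigma=0.1, seed=0):
    """Gradient-free tuning: perturb the current parameters with Gaussian
    noise and keep the best candidate only if it lowers the loss."""
    rng = random.Random(seed)
    best = list(params)
    best_loss = loss_fn(best)
    for _ in range(steps):
        # Sample `population` candidates around the current best parameters.
        candidates = [
            [p + sigma * rng.gauss(0.0, 1.0) for p in best]
            for _ in range(population)
        ]
        losses = [loss_fn(c) for c in candidates]
        i = min(range(population), key=losses.__getitem__)
        if losses[i] < best_loss:
            # Improvement: accept the candidate and widen the search.
            best, best_loss = candidates[i], losses[i]
            sigma *= 1.5
        else:
            # No improvement: keep the old parameters and narrow the search.
            sigma *= 0.7
    return best, best_loss


def toy_loss(w):
    # Hypothetical stand-in for a detection loss: squared distance to a target.
    target = [1.0, -2.0, 0.5]
    return sum((a - b) ** 2 for a, b in zip(w, target))


start = [0.0, 0.0, 0.0]
tuned, tuned_loss = es_fine_tune(start, toy_loss)
```

Because a candidate is accepted only when it improves the loss, the loss never increases from one step to the next. The perturbation scale sigma adapts to recent success, which is one simple way to read "adaptive but random" in this toy setting.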

This work was supported by the National Natural Science Foundation of China (Grant No. 92270201).

Author information

Corresponding author

Correspondence to Zheng Huang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, K., Luo, Y., Huang, Z., Chen, K., Guo, J., Qiu, W. (2023). TDAE: Text Detection with Affinity Areas and Evolution Strategies. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14192. Springer, Cham. https://doi.org/10.1007/978-3-031-41731-3_2

  • DOI: https://doi.org/10.1007/978-3-031-41731-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41730-6

  • Online ISBN: 978-3-031-41731-3

  • eBook Packages: Computer Science, Computer Science (R0)
