Dynamic Feature Selection for Structural Image Content Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13834)

Abstract

Structural image content recognition (SICR) aims to transcribe a two-dimensional structural image (e.g., a mathematical expression, chemical formula, or music score) into a token sequence. Existing methods are mainly encoder-decoder based and overlook the importance of feature selection and spatial relation extraction in the feature map. In this paper, we propose DEAL (short for Dynamic fEAture seLection) for SICR, which contains a dynamic feature selector and a spatial relation extractor as its two cornerstone modules. Specifically, we propose a novel loss function and a random exploration strategy to dynamically select useful image cells for target sequence generation. Further, we consider the positional and surrounding information of cells in the feature map to extract spatial relations. We conduct extensive experiments to evaluate the performance of DEAL. Experimental results show that DEAL significantly outperforms other state-of-the-art methods.
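
The abstract does not spell out how cells are selected or how spatial relations are computed, so the following is only a minimal illustrative sketch, not DEAL's actual modules or loss. It assumes a hypothetical per-cell relevance score, a fixed threshold, a flip-the-decision form of random exploration, and an 8-neighbour mean as the "surrounding information"; all function and parameter names are invented for illustration.

```python
import random


def select_cells(scores, threshold=0.5, explore_prob=0.1, rng=None):
    """Return a boolean mask over feature-map cells.

    Assumption: a cell is kept when its score reaches `threshold`, and with
    probability `explore_prob` that decision is flipped, so currently
    rejected cells are still tried occasionally (random exploration).
    """
    rng = rng or random.Random()
    mask = []
    for row in scores:
        mask_row = []
        for score in row:
            keep = score >= threshold
            if rng.random() < explore_prob:
                keep = not keep
            mask_row.append(keep)
        mask.append(mask_row)
    return mask


def add_spatial_context(features, mask):
    """For each selected cell, return (row, col, value, neighbour_mean).

    Assumption: position is the (row, col) index and the surrounding
    information is the mean of the up-to-8 adjacent cells.
    """
    height, width = len(features), len(features[0])
    selected = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            neighbours = [
                features[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            mean = sum(neighbours) / len(neighbours) if neighbours else 0.0
            selected.append((r, c, features[r][c], mean))
    return selected
```

In a trained model the scores would come from a learned selector and the features would be vectors rather than scalars; scalars are used here only to keep the sketch self-contained.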

Notes

  1. We use "character" and "symbol" interchangeably in this paper.

Author information

Corresponding author

Correspondence to Ming Gao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fu, Y., Zheng, S., Cai, W., Gao, M., Jin, C., Zhou, A. (2023). Dynamic Feature Selection for Structural Image Content Recognition. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13834. Springer, Cham. https://doi.org/10.1007/978-3-031-27818-1_28

  • DOI: https://doi.org/10.1007/978-3-031-27818-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27817-4

  • Online ISBN: 978-3-031-27818-1

  • eBook Packages: Computer Science, Computer Science (R0)
