Dynamic Feature Selection for Structural Image Content Recognition

Fu, Yingnan; Zheng, Shu; Cai, Wenyuan; Gao, Ming; Jin, Cheqing; Zhou, Aoying

doi:10.1007/978-3-031-27818-1_28

Conference paper
First Online: 31 March 2023

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13834)

Abstract

Structural image content recognition (SICR) aims to transcribe a two-dimensional structural image (e.g., a mathematical expression, chemical formula, or music score) into a token sequence. Existing methods are mainly encoder-decoder based and overlook the importance of feature selection and spatial relation extraction in the feature map. In this paper, we propose DEAL (short for Dynamic fEAture seLection) for SICR, which contains a dynamic feature selector and a spatial relation extractor as its two cornerstone modules. Specifically, we propose a novel loss function and a random exploration strategy to dynamically select useful image cells for target sequence generation. Further, we consider the positional and surrounding information of cells in the feature map to extract spatial relations. We conduct extensive experiments to evaluate the performance of DEAL. Experimental results show that DEAL significantly outperforms other state-of-the-art methods.
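The abstract does not spell out how cells are selected, so the following is only a rough illustration of the general idea, not the authors' method. It assumes each cell of the feature map already has a usefulness score in [0, 1], keeps cells above a threshold, and keeps the remaining cells with a small probability as a stand-in for random exploration. The names `select_cells`, `with_position`, `threshold`, and `explore_prob` are hypothetical, and the paper's loss function is not modelled here.

```python
import random


def select_cells(scores, threshold=0.5, explore_prob=0.1, rng=None):
    """Return (row, col) positions of the cells kept for decoding.

    scores: 2-D list of per-cell usefulness scores in [0, 1] (an assumption).
    A cell is kept if its score reaches the threshold; otherwise it is
    still kept with probability explore_prob (random exploration).
    """
    rng = rng or random.Random()
    kept = []
    for r, row in enumerate(scores):
        for c, score in enumerate(row):
            if score >= threshold or rng.random() < explore_prob:
                kept.append((r, c))
    return kept


def with_position(kept, height, width):
    """Attach normalised (row, col) coordinates to each kept cell."""
    return [
        (r, c, r / max(height - 1, 1), c / max(width - 1, 1))
        for r, c in kept
    ]


# Toy 2 x 3 score map; exploration is switched off so the result is deterministic.
scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
]
kept = select_cells(scores, threshold=0.5, explore_prob=0.0)
print(kept)
print(with_position(kept, height=2, width=3))
```

With exploration disabled, only the two high-scoring cells survive; raising `explore_prob` lets low-scoring cells through occasionally, which is one simple way a selector could avoid committing too early to a fixed set of cells.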

Notes

1. We use the terms character and symbol interchangeably in this paper.

Author information

Authors and Affiliations

School of Data Science and Engineering, East China Normal University, Shanghai, China
Yingnan Fu, Shu Zheng, Ming Gao, Cheqing Jin & Aoying Zhou
Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
Ming Gao
Shanghai Hypers Data Technology Inc., Shanghai, China
Wenyuan Cai

Corresponding author

Correspondence to Ming Gao.

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

About this paper

Cite this paper

Fu, Y., Zheng, S., Cai, W., Gao, M., Jin, C., Zhou, A. (2023). Dynamic Feature Selection for Structural Image Content Recognition. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13834. Springer, Cham. https://doi.org/10.1007/978-3-031-27818-1_28

DOI: https://doi.org/10.1007/978-3-031-27818-1_28
Published: 31 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27817-4
Online ISBN: 978-3-031-27818-1
eBook Packages: Computer Science, Computer Science (R0)
