LRATNet: Local-Relationship-Aware Transformer Network for Table Structure Recognition

Yang, Guangjie; Zhong, Dajian; Xiong, Yu-jie; Zhan, Hongjian

doi:10.1007/978-3-031-53308-2_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14555)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Table structure recognition is a challenging task due to complex backgrounds and the wide variety of table styles. Existing methods address this challenge through adjacency relationship prediction, image-to-text generation, logical position prediction, and similar formulations. However, these methods adopt either Graph Convolutional Network (GCN) structures, which mainly focus on local context information, or Multi-Head Attention (MHA) structures, which mainly focus on global context information. Both ignore the correlation between local and global features. In this paper, we propose a Local-Relationship-Aware Transformer Network (LRATNet) for table structure recognition. LRATNet constructs a robust correlation between local and global information using the LRAT module. The LRAT module has three variants: Row-LRAT, Col-LRAT, and Spa-LRAT, which emphasize row, column, and spatial information, respectively, by exploring different adjacency relationships. This improves the performance of logical location prediction. Additionally, we develop a new loss function, L_stage, designed to improve the accuracy of logical position prediction. Experimental results demonstrate that our method outperforms existing approaches on three public datasets.
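
The abstract does not say how the LRAT module combines adjacency relationships with attention. The sketch below is only one plausible reading, not the authors' method: it builds row, column, and spatial adjacency matrices from cell centres and adds them as a bias to the logits of a single-head self-attention, so that every cell still attends globally while adjacent cells are favoured. The function names `adjacency` and `biased_attention`, the parameters `tol`, `radius`, and `bias`, and the use of cell centres as input are all assumptions made for illustration.

```python
import math

def adjacency(centres, kind, tol=0.5, radius=1.2):
    """Binary adjacency over table cells given their (x, y) centres.

    Hypothetical construction: "row" links cells with similar y,
    "col" links cells with similar x, "spa" links cells whose centres
    lie within `radius` of each other.
    """
    n = len(centres)
    adj = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centres):
        for j, (xj, yj) in enumerate(centres):
            if kind == "row":
                linked = abs(yi - yj) <= tol
            elif kind == "col":
                linked = abs(xi - xj) <= tol
            elif kind == "spa":
                linked = math.hypot(xi - xj, yi - yj) <= radius
            else:
                raise ValueError(f"unknown adjacency kind: {kind}")
            adj[i][j] = 1.0 if linked else 0.0
    return adj

def biased_attention(x, adj, bias=2.0):
    """Single-head self-attention with an additive adjacency bias.

    Assumption: queries, keys, and values are the raw features `x`
    (no learned projections), to keep the sketch short.
    """
    d = len(x[0])
    out, weights = [], []
    for i, q in enumerate(x):
        # Global term (scaled dot product) plus local term (adjacency bias).
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias * adj[i][j]
            for j, k in enumerate(x)
        ]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        out.append([sum(w * v[c] for w, v in zip(row, x)) for c in range(d)])
    return out, weights

# Toy 2x2 table: cells 0 and 1 form the top row, cells 2 and 3 the bottom row.
centres = [(0, 0), (1, 0), (0, 1), (1, 1)]
features = [[1.0] * 8 for _ in centres]
out, w = biased_attention(features, adjacency(centres, "row"))
```

Passing `"col"` or `"spa"` instead of `"row"` changes which neighbours receive the bias, which loosely mirrors the Row-LRAT, Col-LRAT, and Spa-LRAT distinction described in the abstract.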

Author information

Authors and Affiliations

Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China
Guangjie Yang & Hongjian Zhan
College of Information Engineering, Shanghai Maritime University, Shanghai, China
Dajian Zhong & Hongjian Zhan
Shanghai University of Engineering Science, Shanghai, China
Yu-jie Xiong

Corresponding author

Correspondence to Hongjian Zhan.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Yang, G., Zhong, D., Xiong, Yj., Zhan, H. (2024). LRATNet: Local-Relationship-Aware Transformer Network for Table Structure Recognition. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14555. Springer, Cham. https://doi.org/10.1007/978-3-031-53308-2_37

DOI: https://doi.org/10.1007/978-3-031-53308-2_37
Published: 28 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53307-5
Online ISBN: 978-3-031-53308-2
eBook Packages: Computer Science, Computer Science (R0)
