A Study on Reproducibility and Replicability of Table Structure Recognition Methods

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14188)

Abstract

Concerns about reproducibility in artificial intelligence (AI) have emerged as researchers report unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both the reproducibility and the replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying the locations of cells in tables in digital documents. We attempt to reproduce published results using the code and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Of the 16 papers studied, we reproduce results consistent with the original in only four. Two of those four papers are identified as replicable on the similar dataset under certain intersection-over-union (IoU) values. No paper is identified as replicable on the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on Code Ocean at https://codeocean.com/capsule/6680116/tree.
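To make the IoU criterion mentioned in the abstract concrete, the following is a minimal sketch of how predicted cell boxes might be scored against ground-truth boxes at a chosen IoU threshold. It is an illustration only, not the evaluation code used in the paper: the box format `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching, and the function names `iou` and `f1_at_threshold` are all assumptions introduced here.

```python
from typing import List, Tuple

# Assumed box format (hypothetical): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_threshold(
    pred: List[Box], truth: List[Box], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 with greedy one-to-one matching.

    A predicted cell counts as a true positive when its best unmatched
    ground-truth cell overlaps it with IoU >= threshold.
    """
    matched = set()  # indices of ground-truth boxes already used
    true_pos = 0
    for p in pred:
        best_j, best_score = -1, 0.0
        for j, t in enumerate(truth):
            if j in matched:
                continue
            score = iou(p, t)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (10, 0, 20, 10)]
    pred = [(0, 0, 10, 10), (11, 0, 20, 10)]  # second cell is slightly off
    for thr in (0.8, 0.95):
        print(thr, f1_at_threshold(pred, truth, thr))
```

In this toy example the second predicted cell overlaps its ground-truth cell with an IoU of 0.9, so it is accepted at a threshold of 0.8 but rejected at 0.95. This shows why a method can appear replicable at some IoU values and not at others.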

Acknowledgment

This work was partially supported by the Defense Advanced Research Projects Agency (DARPA) under cooperative agreement No. W911NF-19-2-0272. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Author information

Corresponding author

Correspondence to Kehinde Ajayi.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ajayi, K., Choudhury, M.H., Rajtmajer, S.M., Wu, J. (2023). A Study on Reproducibility and Replicability of Table Structure Recognition Methods. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_1

  • DOI: https://doi.org/10.1007/978-3-031-41679-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41678-1

  • Online ISBN: 978-3-031-41679-8

  • eBook Packages: Computer Science, Computer Science (R0)
