Investigating the Usage of Formulae in Mathematical Answer Retrieval

  • Conference paper
  • In: Advances in Information Retrieval (ECIR 2024)

Abstract

This work focuses on the task of Mathematical Answer Retrieval and studies the factors a recent Transformer-Encoder-based Language Model (LM) uses to assess the relevance of an answer to a given mathematical question. We investigate three factors in particular: (1) the general influence of mathematical formulae, (2) the use of the structural information of those formulae, and (3) the overlap of variable names between questions and answers. Our findings indicate that the LM for Mathematical Answer Retrieval relies mainly on shallow features such as the overlap of variables between questions and answers. Furthermore, we identified a harmful shortcut in the training data that hinders the use of structural information; removing this shortcut improved the overall accuracy. We want to foster future research on how LMs are trained for Mathematical Answer Retrieval and provide a basic evaluation setup (repository: https://github.com/AnReu/math_analysis) for existing models.
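As an illustration of the kind of shallow feature the abstract refers to, the following is a minimal sketch of measuring variable-name overlap between a question and a candidate answer. The regex-based variable extraction, the Jaccard measure, and all identifiers are our own illustrative assumptions, not the authors' implementation.

    import re

    def extract_variables(text: str) -> set[str]:
        """Hypothetical heuristic: treat single letters inside $...$ math
        spans as variable names; the paper's actual extraction may differ."""
        variables: set[str] = set()
        for formula in re.findall(r"\$(.+?)\$", text):
            variables.update(re.findall(r"[a-zA-Z]", formula))
        return variables

    def variable_overlap(question: str, answer: str) -> float:
        """Jaccard overlap of the two variable sets (an assumed measure)."""
        q_vars = extract_variables(question)
        a_vars = extract_variables(answer)
        if not (q_vars | a_vars):
            return 0.0
        return len(q_vars & a_vars) / len(q_vars | a_vars)

    question = "How do I solve $x^2 + bx = 0$ for $x$?"
    answer = "Factor out $x$: from $x(x + b) = 0$ we get $x = 0$ or $x = -b$."
    print(variable_overlap(question, answer))  # 1.0: both use exactly {x, b}

A model that scores relevance from a signal like this would rank answers highly whenever they reuse the question's variable names, regardless of mathematical correctness, which is precisely the behavior the paper probes.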


Notes

  1. https://pypi.org/project/transformers-interpret/.

  2. We use a custom tokenizer, e.g., … is tokenized as … (see the illustrative sketch below).
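As a purely hypothetical illustration of what such a math-aware tokenizer might do, keeping LaTeX commands intact instead of splitting them into generic subword pieces; the function name and regex below are our assumptions, not the authors' tokenizer.

    import re

    def tokenize_latex(formula: str) -> list[str]:
        # Hypothetical: split into LaTeX commands (e.g. \frac), braces,
        # operators, and single alphanumeric symbols.
        return re.findall(r"\\[a-zA-Z]+|[{}^_+\-=()]|[a-zA-Z0-9]", formula)

    print(tokenize_latex(r"\frac{a}{b}"))
    # ['\\frac', '{', 'a', '}', '{', 'b', '}']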


Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful feedback and comments. This work was supported by the DFG under Germany’s Excellence Strategy, Grant No. EXC-2068-390729961, Cluster of Excellence “Physics of Life” of TU Dresden. Furthermore, the authors are grateful for the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

Author information

Correspondence to Anja Reusch.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Reusch, A., Gonsior, J., Hartmann, C., Lehner, W. (2024). Investigating the Usage of Formulae in Mathematical Answer Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_15


  • DOI: https://doi.org/10.1007/978-3-031-56027-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56026-2

  • Online ISBN: 978-3-031-56027-9

  • eBook Packages: Computer Science; Computer Science (R0)
