Investigating the Usage of Formulae in Mathematical Answer Retrieval

  • Conference paper
  • In: Advances in Information Retrieval (ECIR 2024)

Abstract

This work focuses on the task of Mathematical Answer Retrieval and studies the factors a recent Transformer-Encoder-based Language Model (LM) uses to assess the relevance of an answer to a given mathematical question. We investigate three factors in particular: (1) the general influence of mathematical formulae, (2) the use of the structural information of those formulae, and (3) the overlap of variable names between questions and answers. Our findings indicate that the LM for Mathematical Answer Retrieval relies mainly on shallow features such as the overlap of variables between questions and answers. Furthermore, we identified a harmful shortcut in the training data that hinders the use of structural information; removing this shortcut improved the overall accuracy. We want to foster future research on how LMs are trained for Mathematical Answer Retrieval and provide a basic evaluation setup (repository: https://github.com/AnReu/math_analysis) for existing models.
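As an illustration of the kind of shallow feature the abstract refers to, the following is a minimal sketch of measuring variable-name overlap between a question and a candidate answer. The regex-based variable extraction, the Jaccard measure, and all identifiers are our own illustrative assumptions, not the authors' implementation.

    import re

    def extract_variables(text: str) -> set[str]:
        """Hypothetical heuristic: treat single letters inside $...$ math
        spans as variable names; the paper's actual extraction may differ."""
        variables: set[str] = set()
        for formula in re.findall(r"\$(.+?)\$", text):
            variables.update(re.findall(r"[a-zA-Z]", formula))
        return variables

    def variable_overlap(question: str, answer: str) -> float:
        """Jaccard overlap of the two variable sets (an assumed measure)."""
        q_vars = extract_variables(question)
        a_vars = extract_variables(answer)
        if not (q_vars | a_vars):
            return 0.0
        return len(q_vars & a_vars) / len(q_vars | a_vars)

    question = "How do I solve $x^2 + bx = 0$ for $x$?"
    answer = "Factor out $x$: from $x(x + b) = 0$ we get $x = 0$ or $x = -b$."
    print(variable_overlap(question, answer))  # 1.0: both use exactly {x, b}

A model that scores relevance from a signal like this would rank answers highly whenever they reuse the question's variable names, regardless of mathematical correctness, which is precisely the behavior the paper probes.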


Notes

  1. https://pypi.org/project/transformers-interpret/.

  2. We use a custom tokenizer, e.g., … is tokenized as … (see the illustrative sketch below).
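As a purely hypothetical illustration of what such a math-aware tokenizer might do, keeping LaTeX commands intact instead of splitting them into generic subword pieces; the function name and regex below are our assumptions, not the authors' tokenizer.

    import re

    def tokenize_latex(formula: str) -> list[str]:
        # Hypothetical: split into LaTeX commands (e.g. \frac), braces,
        # operators, and single alphanumeric symbols.
        return re.findall(r"\\[a-zA-Z]+|[{}^_+\-=()]|[a-zA-Z0-9]", formula)

    print(tokenize_latex(r"\frac{a}{b}"))
    # ['\\frac', '{', 'a', '}', '{', 'b', '}']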


Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful feedback and comments. This work was supported by the DFG under Germany’s Excellence Strategy, Grant No. EXC-2068-390729961, Cluster of Excellence “Physics of Life” of TU Dresden. Furthermore, the authors are grateful for the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

Author information

Correspondence to Anja Reusch.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Reusch, A., Gonsior, J., Hartmann, C., Lehner, W. (2024). Investigating the Usage of Formulae in Mathematical Answer Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_15


  • DOI: https://doi.org/10.1007/978-3-031-56027-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56026-2

  • Online ISBN: 978-3-031-56027-9

  • eBook Packages: Computer Science; Computer Science (R0)
