DOI: 10.1145/3534678.3539131
research-article

JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding

Published: 14 August 2022

ABSTRACT

This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM) for effectively understanding and representing mathematical problems. Unlike the text in standard NLP tasks, mathematical text is difficult to understand, since it involves mathematical terminology, symbols, and formulas in the problem statement. Moreover, solving mathematical problems typically requires complex mathematical logic and background knowledge.

Considering the complex nature of mathematical text, we design a novel curriculum pre-training approach to improve the learning of mathematical PLMs, consisting of both basic and advanced courses. Specifically, we first perform token-level pre-training based on a position-biased masking strategy, and then design logic-based pre-training tasks that aim to recover shuffled sentences and shuffled formulas, respectively. Finally, we introduce a more difficult pre-training task that requires the PLM to detect and correct errors in its generated solutions. We conduct extensive experiments on offline evaluation (covering nine math-related tasks) and an online A/B test. Experimental results demonstrate the effectiveness of our approach compared with a number of competitive baselines. Our code is available at: https://github.com/RUCAIBox/JiuZhang.
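To make the basic course more concrete, the minimal Python sketch below shows one plausible form of a position-biased masking strategy, in which tokens later in the problem statement are masked with a higher probability. The linear bias, the mask rates, and the function name position_biased_mask are illustrative assumptions, not the exact scheme used by JiuZhang.

    import random

    def position_biased_mask(tokens, mask_token="[MASK]",
                             base_rate=0.10, max_rate=0.30, seed=None):
        # Hypothetical sketch: the mask probability grows linearly with token
        # position, so later (often formula-heavy) tokens are masked more often.
        # The actual bias used in the paper may differ.
        rng = random.Random(seed)
        masked, labels = [], []
        n = max(len(tokens) - 1, 1)
        for i, tok in enumerate(tokens):
            rate = base_rate + (max_rate - base_rate) * (i / n)
            if rng.random() < rate:
                masked.append(mask_token)
                labels.append(tok)    # target token to recover during pre-training
            else:
                masked.append(tok)
                labels.append(None)   # position ignored by the masked-LM loss
        return masked, labels

    # Example on a tokenized Chinese math problem statement
    tokens = "设 函数 f ( x ) = x ^ 2 + 1 , 求 f ( 2 ) 的 值".split()
    masked, labels = position_biased_mask(tokens, seed=0)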


Supplemental Material

KDD2022-fp1642.mp4 (mp4, 154.9 MB)


    Published in

      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
      ISBN: 9781450393850
      DOI: 10.1145/3534678

      Copyright © 2022 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall acceptance rate: 1,133 of 8,635 submissions, 13%
