ABSTRACT
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM) for effectively understanding and representing mathematical problems. Unlike the texts handled in standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols, and formulas in the problem statement. Moreover, solving mathematical problems typically requires complex mathematical logic and background knowledge.
Considering the complex nature of mathematical texts, we design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses. Specifically, we first perform token-level pre-training based on a position-biased masking strategy, and then design logic-based pre-training tasks that aim to recover shuffled sentences and formulas, respectively. Finally, we introduce a more difficult pre-training task that requires the PLM to detect and correct errors in its generated solutions. We conduct extensive experiments on offline evaluation (covering nine math-related tasks) and an online A/B test. Experimental results demonstrate the effectiveness of our approach compared with a number of competitive baselines. Our code is available at: https://github.com/RUCAIBox/JiuZhang.
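To make the basic-course idea concrete, the sketch below shows one plausible form of position-biased masking for a tokenized math problem: tokens later in the statement (which typically carry the actual question) are masked with a higher probability. The abstract does not specify the exact bias, so the linear schedule, the rates, and the single `[MASK]` placeholder are illustrative assumptions rather than the paper's recipe.

```python
import random

MASK = "[MASK]"

def position_biased_mask(tokens, base_rate=0.15, max_rate=0.30, seed=0):
    """Mask tokens with a probability that grows linearly with position.

    Illustrative sketch only: the schedule and rates are assumptions,
    not the strategy described in the JiuZhang paper itself.
    """
    rng = random.Random(seed)
    n = len(tokens)
    masked, labels = [], []
    for i, tok in enumerate(tokens):
        # Later tokens (closer to the actual question) get a higher mask rate.
        rate = base_rate + (max_rate - base_rate) * (i / max(n - 1, 1))
        if rng.random() < rate:
            masked.append(MASK)
            labels.append(tok)   # the model must recover the original token
        else:
            masked.append(tok)
            labels.append(None)  # no prediction target at this position
    return masked, labels

# Example with a hypothetical tokenization of a short Chinese math problem.
tokens = "若 x + 2 = 5 , 求 x 的 值".split()
masked, labels = position_biased_mask(tokens)
print(masked)
```

In a full pre-training pipeline, the masked sequence and its labels would feed a standard masked-language-modeling loss; the advanced courses (sentence/formula deshuffling and solution error correction) would corrupt the same corpus in analogous ways.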