Chinese Spelling Error Detection Using a Fusion Lattice LSTM

Authors:
Hao Wang, North China University of Technology and CNONIX National Standard Application and Promotion Lab, Beijing, China (ORCID: 0000-0003-0896-080X)
Bin Wang, North China University of Technology and CNONIX National Standard Application and Promotion Lab, Beijing, China (ORCID: 0000-0001-7181-4157)
Jianyong Duan, North China University of Technology and CNONIX National Standard Application and Promotion Lab, Beijing, China
Jiajun Zhang, Institute of Automation, Chinese Academy of Sciences, Beijing, China

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 2, Article No. 28, pp. 1–11. https://doi.org/10.1145/3426882

Published: 05 May 2021

Abstract

Spelling error detection serves as a crucial preprocessing step in many natural language processing applications. Unlike English, where every word is typed directly on the keyboard, Chinese characters must be entered through an input method, and the pinyin input method is the most widely used. Intuitively, pinyin should therefore help in detecting spelling errors. However, most current methods ignore pinyin information when detecting spelling errors and adopt a pipeline framework that leads to error propagation. In this article, we propose a fusion lattice-LSTM model under an end-to-end framework that integrates character, word, and pinyin features for error detection. Experiments on the SIGHAN Bake-off 2015 dataset show that pinyin is a discriminating feature and that our end-to-end model clearly outperforms the baseline models.
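To make the three feature types concrete, the following is a minimal Python sketch of how character, pinyin, and lexicon-matched word features might be extracted from a sentence before being fed to a sequence model. It is not the authors' fusion lattice-LSTM. The pinyin table `TOY_PINYIN`, the lexicon `TOY_LEXICON`, and all function names are hypothetical and chosen only for illustration; a real system would presumably rely on a full pronunciation dictionary and a large lexicon.

```python
# Toy pinyin table (toneless); a hypothetical stand-in for a real
# pronunciation dictionary. It is NOT taken from the article.
TOY_PINYIN = {"我": "wo", "在": "zai", "再": "zai", "家": "jia", "见": "jian"}

# Toy lexicon; hypothetical, used only to illustrate word matching.
TOY_LEXICON = {"再见", "在家"}


def char_pinyin_features(sentence):
    """Pair each character with its pinyin ("<unk>" if not in the toy table)."""
    return [(ch, TOY_PINYIN.get(ch, "<unk>")) for ch in sentence]


def matched_words(sentence, lexicon, max_len=4):
    """Return (start, end, word) for every multi-character lexicon word
    found in the sentence. The end index is exclusive."""
    spans = []
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def same_pinyin(a, b):
    """True if both characters are in the toy table and share a syllable."""
    pa, pb = TOY_PINYIN.get(a), TOY_PINYIN.get(b)
    return pa is not None and pa == pb
```

In this toy setting, the misspelling 在见 (for 再见) matches no lexicon word, yet its pinyin sequence is identical to that of the correct word, which is the kind of signal the abstract suggests pinyin can contribute. Calling `matched_words("我在家", TOY_LEXICON)` returns the single span covering 在家.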

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 2
March 2021
313 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3454116
Editor: Imed Zitouni, Google, USA
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 May 2021
- Accepted: 1 September 2020
- Revised: 1 July 2020
- Received: 1 November 2019
Author Tags
Chinese spelling error
neural networks
spelling check
Qualifiers
- research-article
- Refereed