Abstract
Spelling error detection serves as a crucial preprocessing in many natural language processing applications. Unlike English, where every single word is directly typed by keyboard, we have to use an input method to input Chinese characters. The pinyin input method is the most widely used. By intuition, pinyin should be helpful in detecting spelling errors. However, when detect spelling errors, most of the current methods ignore the pinyin information and adopt a pipeline framework that leads to error propagation. In this article, we propose a fusion lattice-LSTM model under the end-to-end framework to integrate character, word, and pinyin features for error detection. Experiments on the SIGHAN Bake-off-2015 dataset show that pinyin is a discriminating feature, and our end-to-end model outperforms the baseline models obviously.
- Jill Burstein and Martin Chodorow. 1999. Automated essay scoring for nonnative English speakers. In Proceedings of the Symposium on Computer Mediated Language Assessment and Evaluation in Natural Language Processing. 68–75. Google ScholarDigital Library
- Chao-Huang Chang. 1995. A new approach for automatic Chinese spelling correction. In Proceedings of the Natural Language Processing Pacific Rim Symposium, Vol. 95. 278–283.Google Scholar
- Kuan-Yu Chen, Hung-Shin Lee, Chung-Han Lee, Hsin-Min Wang, and Hsin-Hsi Chen. 2013. A study of language modeling for Chinese spelling check. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 79–83.Google Scholar
- Hsun-Wen Chiu, Jian-Cheng Wu, and Jason S. Chang. 2013. Chinese spelling checker based on statistical machine translation. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 49–53.Google Scholar
- Jianfeng Gao, Xiaolong Li, Daniel Micol, Chris Quirk, and Xu Sun. 2010. A large scale ranker-based system for search query spelling correction. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10). 358–366 Google ScholarDigital Library
- Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28, 10 (2017), 2222–2232.Google ScholarCross Ref
- Dongxu Han and Baobao Chang. 2013. A maximum entropy approach to Chinese spelling check. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 74–78.Google Scholar
- Yu He and Guohong Fu. 2013. Description of HLJU Chinese spelling checker for SIGHAN Bakeoff 2013. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 84–87.Google Scholar
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780. Google ScholarDigital Library
- Yu-Ming Hsieh, Ming-Hong Bai, and Keh-Jiann Chen. 2013. Introduction to CKIP Chinese spelling check system for SIGHAN Bakeoff 2013 evaluation. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 59–63.Google Scholar
- Yu-Ming Hsieh, Ming-Hong Bai, Shu-Ling Huang, and Keh-Jiann Chen. 2015. Correcting Chinese spelling errors with word lattice decoding. ACM Transactions on Asian and Low-Resource Language Information Processing 14, 4 (2015), 18. Google ScholarDigital Library
- Jie Yang, Yue Zhang, and Shuailong Liang. 2019. Subword encoding in lattice LSTM for chinese word segmentation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Technologies (Volume 1: Long and Short Papers) (NAACL’19). 2720–2725.Google Scholar
- Diederik P. Kingma and Jimmy B. 2014. Adam: A method for stochastic optimization. arXIv:1412.6980Google Scholar
- John D. Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML’01). 282–289. Google ScholarDigital Library
- C.-L. Liu, M.-H. Lai, K.-W. Tien, Y.-H. Chuang, S.-H. Wu, and C.-Y. Lee. 2011. Visually and phonologically similar characters in incorrect Chinese words: Analyses, identification, and applications. ACM Transactions on Asian Language Information Processing 10 (2011), Article 10, 39 pages. Google ScholarDigital Library
- Xiaodong Liu, Kevin Cheng, Yanyan Luo, Kevin Duh, and Yuji Matsumoto. 2013. A hybrid Chinese spelling correction using language model and statistical machine translation with reranking. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 54–58.Google Scholar
- Deryle W. Lonsdale and Diane Strong-Krause. 2003. Automated rating of ESL essays. In Proceedings of the HTL-NAACL ’03 Workshop on Building Educational Applications Using Natural Language Processing (HLT-NAACL-EDUC’03). 61. Google ScholarDigital Library
- Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.Google ScholarCross Ref
- Bruno Martins and Mário J. Silva. 2004. Spelling correction for search engine queries. In Advances in Natural Language Processing. Lecture Notes in Computer Science, Vol. 3230. Springer. 372–383.Google Scholar
- Yuen-Hsien Tseng, Lung-Hao Lee, Li-Ping Chang, and Hsin-Hsi Chen. 2015. Introduction to SIGHAN 2015 Bake-off for Chinese spelling check. In Proceedings of the 8th SIGHAN Workshop on Chinese Language Processing. 32–37.Google ScholarCross Ref
- Andrew J. Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13 (1967), 260–269. Google ScholarDigital Library
- Dingmin Wang, Yan Song, Jing Li, Jialong Han, and Haisong Zhang. 2018. A hybrid approach to automatic corpus generation for Chinese spelling check. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.Google ScholarCross Ref
- Shih-Hung Wu, Chao-Lin Liu, and Lung-Hao Lee. 2013. Chinese spelling check evaluation at SIGHAN Bake-off 2013. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 35–42.Google Scholar
- Ting-Hao Yang, Yu-Lun Hsieh, Yu-Hsuan Chen, Michael Tsang, Cheng-Wei Shih, and Wen-Lian Hsu. 2013. Sinica-IASL Chinese spelling check system at SIGHAN-7. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 93–96.Google Scholar
- Jui-Feng Yeh, Sheng-Feng Li, Mei-Rong Wu, Wen-Yi Chen, and Mao-Chuan Su. 2013. Chinese word spelling correction based on n-gram ranked inverted index list. In Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing. 43–48.Google Scholar
- Junjie Yu and Zhenghua Li. 2014. Chinese spelling error detection and correction based on language model, pronunciation, and shape. In Proceedings of the 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing. 220–223.Google ScholarCross Ref
- Liang-Chih Yu, Lung-Hao Lee, Yuen-Hsien Tseng, and Hsin-Hsi Chen. 2014. Overview of SIGHAN 2014 Bake-off for Chinese spelling check. In Proceedings of the 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing.Google Scholar
- Yue Zhang and Jie Yang. 2018. Chinese NER using lattice LSTM. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.Google ScholarCross Ref
Index Terms
- Chinese Spelling Error Detection Using a Fusion Lattice LSTM
Recommendations
A Hybrid Model for Chinese Spelling Check
Spelling check for Chinese has more challenging difficulties than that for other languages. A hybrid model for Chinese spelling check is presented in this article. The hybrid model consists of three components: one graph-based model for generic errors ...
Correcting Chinese Spelling Errors with Word Lattice Decoding
Special Issue on Chinese Spell CheckingChinese spell checkers are more difficult to develop because of two language features: 1) there are no word boundaries, and a character may function as a word or a word morpheme; and 2) the Chinese character set contains more than ten thousand ...
A Probabilistic Framework for Chinese Spelling Check
Special Issue on Chinese Spell CheckingChinese spelling check (CSC) is still an unsolved problem today since there are many homonymous or homomorphous characters. Recently, more and more CSC systems have been proposed. To the best of our knowledge, language modeling is one of the major ...
Comments