research-article

Chinese Spelling Error Detection Using a Fusion Lattice LSTM

Published: 05 May 2021

Abstract

Spelling error detection is a crucial preprocessing step in many natural language processing applications. Unlike English, where every word is typed directly on the keyboard, Chinese characters must be entered through an input method, and the pinyin input method is the most widely used. Intuitively, pinyin should therefore be helpful in detecting spelling errors. However, most current methods ignore pinyin information when detecting spelling errors and adopt a pipeline framework that leads to error propagation. In this article, we propose a fusion lattice-LSTM model in an end-to-end framework that integrates character, word, and pinyin features for error detection. Experiments on the SIGHAN Bake-off 2015 dataset show that pinyin is a discriminative feature and that our end-to-end model clearly outperforms the baseline models.

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 20, Issue 2
      March 2021
      313 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3454116

      Copyright © 2021 Association for Computing Machinery.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 May 2021
      • Accepted: 1 September 2020
      • Revised: 1 July 2020
      • Received: 1 November 2019

      Qualifiers

      • research-article
      • Refereed
