Incorporating Syntactic Cognitive in Multi-granularity Data Augmentation for Chinese Grammatical Error Correction

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14452)

Abstract

Chinese grammatical error correction (CGEC) has recently attracted considerable attention because of its real-world value. Current mainstream approaches are all data-driven, but three flaws remain. First, high-quality training data covering complex and varied errors is scarce, so data-driven approaches often fail to improve performance significantly. Second, existing data augmentation methods for CGEC focus mainly on word-level augmentation and ignore syntactic-level information. Third, current data augmentation methods are heavily randomized, and few fit the cognitive patterns behind students' syntactic errors. In this paper, we propose a novel multi-granularity data augmentation method for CGEC. Based on a finely labeled syntactic structure treebank, we construct a syntactic error knowledge base for the error types Missing and Redundant Components, and syntactic conversion rules for the error type Improper Word Order. Additionally, we compile a knowledge base of character and word errors from actual student essays. We then design a data augmentation algorithm that incorporates character, word, and syntactic noise to build the training set. Extensive experiments show that our method reaches an \(F_{0.5}\) of 36.77% on the test set, a 6.2% improvement over the best model in the NLPCC Shared Task, which supports the validity of our method.
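To make the idea of noise at three granularities concrete, the following is a minimal sketch and not the authors' algorithm. The confusion dictionaries, the function names, and the noise rates are all hypothetical. In particular, the syntactic step here picks positions at random, whereas the abstract describes rules and a knowledge base derived from a treebank, so the sketch only illustrates the three error types (Missing, Redundant, Improper Word Order) and how a noisy source is paired with its correct target.

```python
import random

# Hypothetical toy knowledge bases (assumption). The paper builds its own
# from student essays and a labeled treebank; those are not reproduced here.
CHAR_CONFUSIONS = {"的": ["地", "得"], "在": ["再"]}
WORD_CONFUSIONS = {"因为": ["由于"], "但是": ["可是"]}


def char_noise(tokens, rng, rate):
    """Character level: swap a character for a confusable one."""
    out = []
    for tok in tokens:
        chars = [
            rng.choice(CHAR_CONFUSIONS[c])
            if c in CHAR_CONFUSIONS and rng.random() < rate else c
            for c in tok
        ]
        out.append("".join(chars))
    return out


def word_noise(tokens, rng, rate):
    """Word level: swap a whole word for a confusable word."""
    return [
        rng.choice(WORD_CONFUSIONS[t])
        if t in WORD_CONFUSIONS and rng.random() < rate else t
        for t in tokens
    ]


def syntactic_noise(tokens, rng, rate):
    """Syntactic level: apply one of three structural corruptions.

    Positions are chosen at random here (assumption); a faithful version
    would select components using syntactic annotations instead.
    """
    tokens = list(tokens)
    if len(tokens) < 2 or rng.random() >= rate:
        return tokens
    op = rng.choice(["missing", "redundant", "order"])
    i = rng.randrange(len(tokens) - 1)
    if op == "missing":        # drop a component
        del tokens[i]
    elif op == "redundant":    # duplicate a component
        tokens.insert(i, tokens[i])
    else:                      # swap two adjacent components
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens


def augment(tokens, seed=0, char_rate=0.1, word_rate=0.1, syn_rate=0.3):
    """Return a (noisy source, correct target) training pair."""
    rng = random.Random(seed)
    noisy = syntactic_noise(tokens, rng, syn_rate)
    noisy = word_noise(noisy, rng, word_rate)
    noisy = char_noise(noisy, rng, char_rate)
    return "".join(noisy), "".join(tokens)
```

Calling `augment(["因为", "下雨", "，", "我", "在", "家", "看", "书"], seed=1)` yields one pair. Varying the seed over a corpus of correct, pre-segmented sentences would produce a synthetic training set. Setting all three rates to 0 returns the sentence unchanged, which is a quick sanity check.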

Notes

  1. http://tcci.ccf.org.cn/conference/2018/dldoc/trainingdata02.tar.gz

  2. http://hsk.blcu.edu.cn/

  3. http://www.jubenwei.com/

  4. https://challenger.ai/datasets/translation

  5. https://github.com/pytorch/fairseq

  6. https://github.com/nusnlp/m2scorer

Acknowledgments

This work was supported by the Beijing Natural Science Foundation (Grant No. 4234081), the National Natural Science Foundation of China (Grant No. 62007004), the Major Program of Key Research Base of Humanities and Social Sciences of the Ministry of Education of China (Grant No. 22JJD740017), and the Scientific and Technological Project of Henan Province of China (Grant No. 232102210077).

Author information

Corresponding author

Correspondence to Tianbao Song.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, J., Peng, W., Xu, Z., Wang, S., Song, T., Song, J. (2024). Incorporating Syntactic Cognitive in Multi-granularity Data Augmentation for Chinese Grammatical Error Correction. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_27

  • DOI: https://doi.org/10.1007/978-981-99-8076-5_27

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8075-8

  • Online ISBN: 978-981-99-8076-5

  • eBook Packages: Computer Science, Computer Science (R0)
