Abstract
Chinese grammatical error correction (CGEC) has recently attracted considerable attention because of its real-world value. Current mainstream approaches are all data-driven, yet three shortcomings remain. First, high-quality training data that covers complex and varied errors is scarce, so data-driven approaches frequently fail to improve performance significantly. Second, existing data augmentation methods for CGEC focus mainly on word-level augmentation and ignore syntactic-level information. Third, current data augmentation methods are strongly randomized, and few of them fit the cognitive patterns behind students' syntactic errors. In this paper, we propose a novel multi-granularity data augmentation method for CGEC. We construct a syntactic error knowledge base for the error type Missing and Redundant Components, and syntactic conversion rules for the error type Improper Word Order based on a finely labeled syntactic structure treebank. Additionally, we compile a knowledge base of character and word errors from actual student essays. We then design a data augmentation algorithm that incorporates character, word, and syntactic noise to build the training set. Extensive experiments show that the \(F_{0.5}\) on the test set is 36.77%, a 6.2% improvement over the best model in the NLPCC Shared Task, which demonstrates the validity of our method.
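The idea of combining character, word, and syntactic noise can be illustrated with a minimal sketch. This is not the authors' algorithm: the two confusion dictionaries are hypothetical toy stand-ins for the knowledge bases described above, and the syntactic step uses simple delete, duplicate, and adjacent-swap operations as rough placeholders for the Missing Components, Redundant Components, and Improper Word Order error types. The function names and the pre-segmented token input are likewise assumptions made for illustration.

```python
import random

# Hypothetical toy knowledge bases (assumptions for illustration only).
# The paper compiles its own from student essays and a syntactic treebank.
CHAR_CONFUSIONS = {"的": ["得", "地"], "在": ["再"]}
WORD_CONFUSIONS = {"因为": ["由于"], "提高": ["提升"]}


def char_noise(tokens, rng):
    """Replace one confusable character, if the sentence contains any."""
    candidates = [(i, j) for i, tok in enumerate(tokens)
                  for j, ch in enumerate(tok) if ch in CHAR_CONFUSIONS]
    out = list(tokens)
    if not candidates:
        return out
    i, j = rng.choice(candidates)
    tok = out[i]
    out[i] = tok[:j] + rng.choice(CHAR_CONFUSIONS[tok[j]]) + tok[j + 1:]
    return out


def word_noise(tokens, rng):
    """Replace one whole word with a confusable word, if any is present."""
    candidates = [i for i, tok in enumerate(tokens) if tok in WORD_CONFUSIONS]
    out = list(tokens)
    if not candidates:
        return out
    i = rng.choice(candidates)
    out[i] = rng.choice(WORD_CONFUSIONS[out[i]])
    return out


def syntactic_noise(tokens, rng):
    """Apply one placeholder syntactic error: delete, duplicate, or swap."""
    out = list(tokens)
    if len(out) < 2:
        return out
    kind = rng.choice(["missing", "redundant", "word_order"])
    i = rng.randrange(len(out) - 1)
    if kind == "missing":
        del out[i]                                 # drop a component
    elif kind == "redundant":
        out.insert(i, out[i])                      # repeat a component
    else:
        out[i], out[i + 1] = out[i + 1], out[i]    # swap adjacent components
    return out


def augment(tokens, seed=0):
    """Return a (noisy source, clean target) pair for one segmented sentence."""
    rng = random.Random(seed)
    noisy = list(tokens)
    for add_noise in (char_noise, word_noise, syntactic_noise):
        noisy = add_noise(noisy, rng)
    return "".join(noisy), "".join(tokens)


if __name__ == "__main__":
    sentence = ["我", "因为", "生病", "的", "原因", "请假"]
    source, target = augment(sentence, seed=1)
    print(source, "->", target)
```

Each call yields one erroneous source sentence paired with its correct target, the form a sequence-to-sequence correction model trains on. A knowledge-driven method would replace the uniform random choices here with selections guided by observed student error patterns.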
Acknowledgments
This work was supported by the Beijing Natural Science Foundation (Grant No. 4234081), the National Natural Science Foundation of China (Grant No. 62007004), the Major Program of Key Research Base of Humanities and Social Sciences of the Ministry of Education of China (Grant No. 22JJD740017), and the Scientific and Technological Project of Henan Province of China (Grant No. 232102210077).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sun, J., Peng, W., Xu, Z., Wang, S., Song, T., Song, J. (2024). Incorporating Syntactic Cognitive in Multi-granularity Data Augmentation for Chinese Grammatical Error Correction. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14452. Springer, Singapore. https://doi.org/10.1007/978-981-99-8076-5_27
DOI: https://doi.org/10.1007/978-981-99-8076-5_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8075-8
Online ISBN: 978-981-99-8076-5
eBook Packages: Computer Science (R0)