An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13395)

Abstract

This paper develops a Chinese text classification workflow for educational settings, where an assigned grade can swing because of the reviewer's subjective cognitive load. This problem is often observed between the review comments on an academic paper and the grades assigned to it, which makes classifying these Chinese texts challenging. To address it, we extend the popular seed-word-based model into a complete workflow for Chinese text classification. The proposed method first converts texts into vectors using Chinese preprocessing. It then uses Bidirectional Encoder Representations from Transformers (BERT) to integrate contextualized features and applies a hierarchical attention network for classification. For this study, we collected 4,310 short review comments involving 140 universities in China. Because these texts carry noisy grades from experts, the method derives seed words for each category and uses them to generate pseudo labels, which weakly supervise network training in place of the noisy labels. Finally, we evaluated the workflow on this real-world dataset, where it achieved good performance in Chinese text classification compared with traditional models. The study offers insight into a real educational case in which a review grade can swing due to subjective cognitive load, and it provides a workflow for automatically grading Chinese expert comments, supporting a more precise academic evaluation system.
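The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the seed words, the category names `good` and `poor`, the helper `pseudo_label`, and the simple count-based voting rule are all assumptions made for illustration, and the comments are assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical seed words per grade category (illustrative only, not from the paper).
SEED_WORDS = {
    "good": {"创新", "严谨", "清晰"},
    "poor": {"不足", "错误", "混乱"},
}


def pseudo_label(tokens, seed_words=SEED_WORDS):
    """Return the category whose seed words occur most often in `tokens`.

    Returns None when no seed word matches or when the top two categories tie,
    so that ambiguous comments are left unlabeled rather than guessed.
    """
    counts = Counter()
    for tok in tokens:
        for label, seeds in seed_words.items():
            if tok in seeds:
                counts[label] += 1
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Pre-segmented review comments (assumed output of a Chinese word segmenter).
comments = [
    ["论文", "结构", "清晰", "论证", "严谨"],
    ["实验", "设计", "存在", "不足", "数据", "错误"],
    ["选题", "一般"],
]
labels = [pseudo_label(c) for c in comments]
```

In this sketch, the comments that receive a label would form the weakly supervised training set for the classifier, while comments that return `None` would be left out of training. A contextualized variant would additionally disambiguate seed words by their BERT embeddings before counting, which this sketch does not attempt.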

Acknowledgement

This study was funded in part by the National Natural Science Foundation of China (U1811262, 61802313, 61772426), the Key Research and Development Program of China (2020AAA0108500), the Reformation Research on Education and Teaching at Northwestern Polytechnical University (2021JGY31), the Higher Research Funding on International Talent Cultivation at Northwestern Polytechnical University (GJGZZD202202), and the Research Topic at the Chinese Society of Academic Degrees and Graduate Education (2020ZA1008).

Author information

Corresponding authors

Correspondence to Yupei Zhang or Xuequn Shang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Khan, M.S.I., Zhou, Y., Xiao, M., Shang, X. (2022). An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science, vol 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_15

  • DOI: https://doi.org/10.1007/978-3-031-13832-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13831-7

  • Online ISBN: 978-3-031-13832-4

  • eBook Packages: Computer Science, Computer Science (R0)
