An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding

Zhang, Yupei; Khan, Md Shahedul Islam; Zhou, Yaya; Xiao, Min; Shang, Xuequn

doi:10.1007/978-3-031-13832-4_15


Conference paper
First Online: 16 August 2022


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13395)

Abstract

This paper develops a Chinese text classification workflow for educational settings, where an assigned grade can swing because of subjective cognitive loads. The problem is often observed between the comments on an academic paper and the grade it receives, which makes classifying these Chinese texts challenging. To address it, we extend the popular seed-word-based model into an effective workflow. The proposed method first converts texts into vectors using Chinese preprocessing. It then uses Bidirectional Encoder Representations from Transformers (BERT) to integrate contextualized features and applies a hierarchical attention network for classification. For this study, we collected 4,310 short review comments involving 140 universities in China. Because these texts carry noisy grades from experts, the method derives seed words for each category and uses them to generate pseudo labels, which weakly supervise network training in place of the noisy labels. We evaluated the workflow on the real-world datasets, where it performed well in Chinese text classification compared with traditional models. The study offers insight into a real educational case in which a review grade can swing due to subjective cognitive loads, and it provides a workflow for automatically grading these Chinese expert comments, supporting a more precise academic evaluation system.
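The seed-word step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the seed words, the category names, and the `pseudo_label` helper are hypothetical, the input is assumed to be already segmented into words, and plain string matching stands in for the contextualized matching that the full workflow obtains from BERT.

```python
from collections import Counter

# Hypothetical seed words per grade category (illustrative only, not from the paper).
SEED_WORDS = {
    "good": {"创新", "严谨", "充分"},
    "poor": {"不足", "错误", "欠缺"},
}


def pseudo_label(tokens, seed_words=SEED_WORDS):
    """Return the category whose seed words occur most often in tokens.

    Returns None when no seed word matches, so the comment can be left
    out of the weakly supervised training set.
    """
    counts = Counter()
    for token in tokens:
        for label, seeds in seed_words.items():
            if token in seeds:
                counts[label] += 1
    if not counts:
        return None
    # On a tie, the category listed first in seed_words wins.
    return max(seed_words, key=lambda label: counts[label])


# Comments are assumed to be pre-segmented into words.
comment_a = ["论文", "选题", "创新", "论证", "充分"]
comment_b = ["实验", "不足", "存在", "错误"]
comment_c = ["论文", "结构"]

labels = [pseudo_label(c) for c in (comment_a, comment_b, comment_c)]
```

Labels produced this way would replace the noisy expert grades as training targets for the classifier; comments that match no seed word receive no pseudo label.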



Acknowledgement

This study was funded in part by the National Natural Science Foundation of China (U1811262, 61802313, 61772426), the Key Research and Development Program of China (2020AAA0108500), the Reformation Research on Education and Teaching at Northwestern Polytechnical University (2021JGY31), the Higher Research Funding on International Talent Cultivation at Northwestern Polytechnical University (GJGZZD202202), and the Research Topic at The Chinese Society of Academic Degrees and Graduate Education (2020ZA1008).

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, China
Yupei Zhang, Md Shahedul Islam Khan, Yaya Zhou & Xuequn Shang
MIIT Big Data Storage and Management Libraries, Xi’an, China
Yupei Zhang, Yaya Zhou & Xuequn Shang
Graduate School, Northwestern Polytechnical University, Xi’an, China
Min Xiao


Corresponding authors

Correspondence to Yupei Zhang or Xuequn Shang.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Xi’an Polytechnic University, Xi’an, China
Junfeng Jing
The University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Polytechnic of Bari, Bari, Italy
Vitoantonio Bevilacqua
Liverpool John Moores University, Liverpool, UK
Abir Hussain


About this paper

Cite this paper

Zhang, Y., Khan, M.S.I., Zhou, Y., Xiao, M., Shang, X. (2022). An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science, vol 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_15


DOI: https://doi.org/10.1007/978-3-031-13832-4_15
Published: 16 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13831-7
Online ISBN: 978-3-031-13832-4
eBook Packages: Computer Science, Computer Science (R0)
