Abstract
As an important task in natural language processing (NLP), text classification has flourished with the rise of deep learning. However, existing deep learning methods struggle as the length of the input text increases. Many long-text classification approaches rely on truncating the text or simply extracting keywords, which discards rich semantic and structural information. Moreover, there is strong demand for semi-supervised long-text classification, given the scarcity of labeled training data and the continuous generation of long texts in diverse styles. To alleviate these problems, we propose a heterogeneous attention network method based on a multi-semantic passing framework. In particular, we develop a flexible heterogeneous information graph that models long texts by extracting keywords, entities, titles, and their multiple interrelations. This graph effectively integrates semantic relationships and condenses global information, preserving the significant semantic and structural content of the text. Furthermore, we design a multi-semantic passing framework that extracts the semantic and structural information in the constructed heterogeneous information graph according to the semantic degree of specific structures. Experiments on four real-world datasets (ThuCNews, SougouNews, 20NG, and Ohsumed) yield outstanding results: accuracies of 98.13%, 98.69%, 87.62%, and 71.46%, respectively, outperforming existing methods.
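To make the graph-based idea in the abstract concrete, the following is a minimal, hedged sketch of heterogeneous attention aggregation over a long-text graph with keyword, entity, and title nodes. It is an illustrative toy, not the authors' implementation: the node features, edge structure, feature dimensions, and the single-head GAT-style attention are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous graph for one document: 3 keyword nodes, 2 entity
# nodes, 1 title node, each with a 4-dimensional feature vector.
# (Features are random here; in practice they would be word/entity embeddings.)
node_types = ["keyword"] * 3 + ["entity"] * 2 + ["title"]
X = rng.standard_normal((6, 4))

# Assumed edges: the title connects to all keywords and entities, and
# keywords connect to entities they co-occur with.
edges = [(5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (0, 3), (1, 4)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A += np.eye(6)  # self-loops so each node also attends to itself

def attention_layer(X, A, W, a):
    """One simplified graph-attention pass (single head, GAT-style)."""
    H = X @ W  # linear projection of node features
    n = H.shape[0]
    # Unnormalised attention logits e_ij = a^T [h_i || h_j], edges only.
    logits = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(n):
            if A[i, j] > 0:
                logits[i, j] = np.concatenate([H[i], H[j]]) @ a
    # Softmax over each node's neighbourhood, then weighted aggregation.
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ H)

W = rng.standard_normal((4, 4))
a = rng.standard_normal(8)
H_out = attention_layer(X, A, W, a)
print(H_out.shape)  # one updated embedding per graph node
```

In the paper's full framework, the attention weights would additionally depend on node types and the "semantic degree" of structures, and the title node's final embedding could serve as the document representation fed to a classifier; this sketch only shows the basic attention-weighted aggregation step.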
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61802444), the Changsha Natural Science Foundation (Grant No. kq2202294), the Research Foundation of the Education Bureau of Hunan Province of China (Grant Nos. 20B625, 18B196, and 22B0275), and the Research on Local Community Structure Detection Algorithms in Complex Networks (Grant No. 2020YJ009).
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ai, W., Wang, Z., Shao, H. et al. A multi-semantic passing framework for semi-supervised long text classification. Appl Intell 53, 20174–20190 (2023). https://doi.org/10.1007/s10489-023-04556-x