
A news classification applied with new text representation based on the improved LDA

Published in Multimedia Tools and Applications

Abstract

Recently, news classification has become an essential part of Natural Language Processing (NLP). The traditional Latent Dirichlet Allocation (LDA) model uses the generated “topic-document” matrix θ as the text representation feature to train a classifier, and this approach has achieved good results. However, some text information is lost when only the “topic-document” matrix θ is used as the text feature. In addition, the number of Gibbs sampling iterations in the traditional LDA model must be set in advance, which affects the algorithm’s speed. In this paper, the traditional LDA model is improved in two phases. In the first phase, a method for determining the convergence of the parameter-search process is proposed, enabling an adaptive iteration scheme. In the second phase, a new text representation (Cnew), obtained by multiplying the “topic-document” matrix θ by the “word-topic” matrix φ, is introduced. In the evaluation, the proposed method is tested on a news corpus from the field of metallurgy and on the THU Chinese News (THUCNews) corpus provided by the Natural Language Processing Laboratory of Tsinghua University. Compared with the traditional LDA, the proposed method improves classification accuracy and reduces the number of Gibbs sampling iterations.
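The two improvements described above can be sketched in a few lines. The matrix product Cnew = θ · φ follows directly from the abstract; the stopping rule shown here (relative change in log-likelihood over a sliding window, with a hypothetical `window` and `tol`) is only an assumed illustration of an adaptive convergence test, not the authors' exact criterion.

```python
import numpy as np

def c_new(theta, phi):
    """Combine the topic-document matrix theta (docs x topics) with the
    word-topic matrix phi (topics x vocab) into a document-word feature
    matrix: Cnew = theta . phi, as described in the abstract."""
    return theta @ phi

def gibbs_converged(loglik_history, window=5, tol=1e-3):
    """Hypothetical adaptive stopping rule for Gibbs sampling: stop when
    the relative change in model log-likelihood over the last `window`
    iterations falls below `tol`."""
    if len(loglik_history) < window + 1:
        return False
    old, new = loglik_history[-window - 1], loglik_history[-1]
    return abs(new - old) / max(abs(old), 1e-12) < tol

# Toy example: 2 documents, 2 topics, vocabulary of 3 words.
theta = np.array([[0.7, 0.3],
                  [0.2, 0.8]])          # rows sum to 1
phi = np.array([[0.5, 0.5, 0.0],
                [0.1, 0.1, 0.8]])       # rows sum to 1
C = c_new(theta, phi)                   # shape (2, 3); rows still sum to 1
```

Because each row of θ and each row of φ is a probability distribution, each row of Cnew is again a distribution over the vocabulary, so it can be fed to a classifier like any bag-of-words feature vector.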





Acknowledgements

This work was supported by the Postdoctoral Science Foundation of China (2016M592894XB), the Natural Science Foundation of China (61741112), and the Natural Science Foundation of Yunnan Province (2017FB098). We also appreciate the valuable comments from other members of our department.

Author information

Corresponding author

Correspondence to Dangguo Shao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article


Cite this article

Shao, D., Li, C., Huang, C. et al. A news classification applied with new text representation based on the improved LDA. Multimed Tools Appl 81, 21521–21545 (2022). https://doi.org/10.1007/s11042-022-12713-6

