An extractive text summarization approach using tagged-LDA based topic modeling

Rani, Ruby; Lobiyal, D. K.

doi:10.1007/s11042-020-09549-3


Published: 19 September 2020

Volume 80, pages 3275–3305, (2021)

Multimedia Tools and Applications

Ruby Rani¹ &
D. K. Lobiyal¹

1800 Accesses
36 Citations

Abstract

Automatic text summarization is the task of producing a condensed version of a text document that retains its salient information. Numerous statistical, linguistic, rule-based, and position-based summarization approaches have been explored for resource-rich languages. For under-resourced languages such as Hindi, automatic text summarization remains a challenging and largely unsolved problem, made harder by the lack of corpora and the inadequacy of processing tools. In this paper, we propose an extractive, lexical-knowledge-rich, topic-modeling-based summarization approach for Hindi novels and stories, implemented as four independent variants that use different sentence weighting schemes. Because no suitable corpus existed, we prepared a corpus of Hindi novels and stories. We apply a smoothing technique to obtain informative and diverse summaries, and we evaluate the generated summaries with three metrics: gist diversity, retention ratio, and ROUGE score. The results show that the proposed model produces concise, articulate, and coherent summaries. To further investigate its performance, we also run the experiments on an English dataset. Finally, we compare our models with baselines and with a traditional topic modeling approach, and show that the proposed model achieves the best results in these comparisons.
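The abstract describes topic-model-driven sentence weighting only at a high level; it does not define tagged-LDA, the four weighting schemes, the smoothing step, or the gist-diversity and retention-ratio metrics. The Python sketch below is therefore not the authors' method. Under stated assumptions, it illustrates the general idea of LDA-based extractive summarization: plain LDA fitted by collapsed Gibbs sampling (each sentence treated as one document), a hypothetical scoring rule that averages topic-weighted word probabilities over a sentence, selection of the top-scoring sentences in their original order, and a simplified ROUGE-1 recall (unigram overlap, no stemming or stop-word removal, not the official ROUGE package) for comparing a summary with a reference. The function names, the hyperparameters (`alpha`, `beta`, `iters`), and the scoring rule are all illustrative assumptions.

```python
import random
import re
from collections import Counter

# Punctuation stripped from token edges, including the Devanagari danda (।).
PUNCT = ".,!?;:\"'()।"


def sentences_of(text):
    """Split text into sentences after '.', '!', '?', or '।'."""
    return [s for s in re.split(r"(?<=[.!?।])\s+", text.strip()) if s]


def tokens_of(sentence):
    """Whitespace tokenization; keeps Devanagari vowel signs attached to words."""
    return [t for t in (w.strip(PUNCT).lower() for w in sentence.split()) if t]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA fitted by collapsed Gibbs sampling.

    docs: list of token lists. Returns (phi, theta), where phi[k][w] = P(w | topic k)
    and theta[d][k] = P(topic k | doc d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # total tokens assigned to each topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                ndk[d][k] -= 1          # remove the token's current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [{w: (nkw[k][wid[w]] + beta) / (nk[k] + V * beta) for w in vocab} for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return phi, theta


def summarize(text, num_sentences, num_topics=2, seed=0):
    """Hypothetical topic-weighted extractive summary (not the paper's schemes)."""
    sents = sentences_of(text)
    docs = [tokens_of(s) for s in sents]          # each sentence is one LDA "document"
    phi, theta = lda_gibbs(docs, num_topics, seed=seed)
    total = sum(len(d) for d in docs) or 1
    # Global weight of each topic: its expected share of all tokens in the text.
    topic_w = [sum(len(d) * th[k] for d, th in zip(docs, theta)) / total
               for k in range(num_topics)]

    def score(doc):
        # Average, over the sentence's words, of the topic-weighted word probability.
        if not doc:
            return 0.0
        return sum(topic_w[k] * phi[k][w] for w in doc for k in range(num_topics)) / len(doc)

    ranked = sorted(range(len(sents)), key=lambda i: score(docs[i]), reverse=True)
    return [sents[i] for i in sorted(ranked[:num_sentences])]  # keep original order


def rouge_1_recall(candidate, reference):
    """Unigram-overlap recall: clipped matching unigrams / unigrams in the reference."""
    cand, ref = Counter(tokens_of(candidate)), Counter(tokens_of(reference))
    return sum((cand & ref).values()) / max(sum(ref.values()), 1)


if __name__ == "__main__":
    text = ("The cat sat on the mat. The cat chased a mouse. "
            "Stock prices fell sharply today. The mat was red.")
    summary = summarize(text, num_sentences=2)
    print(summary)
    print(rouge_1_recall(" ".join(summary), "The cat sat on the mat."))
```

Because the sampler is seeded, repeated runs on the same text return the same summary. On realistic Hindi text, a stop-word list would likely be needed before fitting the topic model, since function words otherwise dominate the topic-word distributions.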


Fig. 3

Fig. 4




Acknowledgments

The authors are grateful to anonymous reviewers for their valuable comments.

Author information

Authors and Affiliations

School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi, India
Ruby Rani & D. K. Lobiyal


Corresponding author

Correspondence to Ruby Rani.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Rani, R., Lobiyal, D.K. An extractive text summarization approach using tagged-LDA based topic modeling. Multimed Tools Appl 80, 3275–3305 (2021). https://doi.org/10.1007/s11042-020-09549-3


Received: 18 February 2020
Revised: 26 June 2020
Accepted: 06 August 2020
Published: 19 September 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11042-020-09549-3

