An extractive text summarization approach using tagged-LDA based topic modeling

Multimedia Tools and Applications

Abstract

Automatic text summarization is the task of producing a condensed version of a text document that retains its salient content. Numerous statistical, linguistic, rule-based, and position-based text summarization approaches have been explored for resource-rich languages. For under-resourced languages such as Hindi, automatic text summarization remains a challenging and unsolved problem, made harder by the unavailability of corpora and the inadequacy of processing tools. In this paper, we propose an extractive, lexical knowledge-rich, topic-modeling-based text summarization approach for Hindi novels and stories, and implement four independent variants that use different sentence weighting schemes. Because no suitable corpus was available, we prepared a corpus of Hindi novels and stories. We apply a smoothing technique to obtain informative and diverse summaries, and evaluate the generated summaries against three metrics: gist diversity, retention ratio, and ROUGE score. The results show that the proposed model produces concise, articulate, and coherent summaries. To further investigate the performance of the proposed model, we repeat the experiments on an English dataset. We also compare our models with the baselines and a traditional topic modeling approach, and show that the proposed model achieves the best results in this comparison.
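Of the three evaluation metrics named above, ROUGE is the one with a widely used standard definition. As an illustration only, the following minimal sketch computes ROUGE-N recall (clipped n-gram overlap divided by the number of n-grams in the reference summary). It assumes whitespace tokenization, which is a simplification and not necessarily the preprocessing used in the paper; the function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram matches / n-grams in the reference.

    Assumption: both summaries are tokenized by whitespace.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate summary (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    generated = "the cat sat"
    gold = "the cat sat on the mat"
    print(rouge_n_recall(generated, gold, n=1))
    print(rouge_n_recall(generated, gold, n=2))
```

Because the sketch works on whitespace-separated tokens, it can be applied to Hindi text in the same way, although a real evaluation would normally add language-specific normalization such as stop-word removal or stemming.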

Acknowledgments

The authors are grateful to anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Ruby Rani.

About this article

Cite this article

Rani, R., Lobiyal, D.K. An extractive text summarization approach using tagged-LDA based topic modeling. Multimed Tools Appl 80, 3275–3305 (2021). https://doi.org/10.1007/s11042-020-09549-3

  • DOI: https://doi.org/10.1007/s11042-020-09549-3
