Abstract
Automatic text summarization is the task of producing a condensed version of a text document that retains its salient information. Numerous statistical, linguistic, rule-based, and position-based summarization approaches have been explored for resource-rich languages. For under-resourced languages such as Hindi, automatic text summarization remains a challenging and unsolved problem, compounded by the lack of corpora and the inadequacy of processing tools. In this paper, we propose an extractive, lexical-knowledge-rich, topic-modeling-based text summarization approach for Hindi novels and stories, and we implement four independent variants that use different sentence weighting schemes. Because no suitable corpus was available, we prepared a corpus of Hindi novels and stories. We apply a smoothing technique to obtain informative and diverse summaries, and then evaluate the generated summaries against three metrics: gist diversity, retention ratio, and ROUGE score. The results show that the proposed model produces concise, articulate, and coherent summaries. To investigate the performance of the proposed model further, we also run the experiments on an English dataset. Finally, we compare our models with the baselines and a traditional topic modeling approach, and show that the proposed model achieves the best results among them.
Acknowledgments
The authors are grateful to anonymous reviewers for their valuable comments.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rani, R., Lobiyal, D.K. An extractive text summarization approach using tagged-LDA based topic modeling. Multimed Tools Appl 80, 3275–3305 (2021). https://doi.org/10.1007/s11042-020-09549-3
DOI: https://doi.org/10.1007/s11042-020-09549-3