Abstract
With the rapid growth of digital content, there is a need for automatic text summarizers that produce a short text from a long document. Many research works have been presented on extractive text summarization (ETS). This article focuses on the graph-based ETS approach for multiple Telugu text documents. A modified Text-Rank algorithm is employed, with the noun and verb count of each sentence used as the initial score of the corresponding node. To obtain the optimal features, a novel feature selection algorithm called the improved Flamingo Search Algorithm is proposed. Although graph-based ETS is an important approach, the summaries it generates are redundant. To reduce this redundancy, maximum marginal relevance is combined with the modified Text-Rank. The proposed approach is tested with different word-embedding techniques: Fast-Text, Word2vec, TF-IDF, and one-hot encoding. Its performance is evaluated with BLEU and ROUGE in terms of F-measure, precision, and recall.
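As a rough illustration of the pipeline described above, the following Python sketch ranks sentences with a Text-Rank-style iteration seeded by per-sentence noun and verb counts, then selects sentences with maximum marginal relevance. It is a minimal sketch under assumptions, not the authors' implementation: the bag-of-words cosine similarity, the damping factor of 0.85, the trade-off weight `lam`, all function names, and the toy English tokens with hand-supplied noun and verb counts are hypothetical. The article's embeddings, Telugu POS tagging, and improved Flamingo Search Algorithm step are not reproduced.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def text_rank(sim, init, damping=0.85, iters=50):
    """Weighted PageRank-style iteration over a sentence similarity matrix.

    `init` holds the starting score of each node (assumed here to be the
    noun + verb count of the sentence); it is normalized to sum to 1.
    """
    n = len(sim)
    total = sum(init) or 1.0
    scores = [x / total for x in init]
    # Total edge weight leaving each node, excluding self-loops.
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def mmr_select(vectors, relevance, k, lam=0.5):
    """Greedy MMR: trade relevance against similarity to already-chosen items."""
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def summarize(token_lists, noun_verb_counts, k, lam=0.5):
    """Return indices of k sentences chosen by seeded Text-Rank plus MMR."""
    vectors = [Counter(tokens) for tokens in token_lists]
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    scores = text_rank(sim, noun_verb_counts)
    top = max(scores) or 1.0
    relevance = [s / top for s in scores]  # rescale to the range of cosine
    return mmr_select(vectors, relevance, k, lam)


if __name__ == "__main__":
    # Toy data: sentences 0 and 1 are duplicates; counts are made up.
    sentences = [
        ["rain", "floods", "city"],
        ["rain", "floods", "city"],
        ["city", "council", "meets"],
    ]
    print(summarize(sentences, noun_verb_counts=[3, 3, 2], k=2))
```

In the toy run, the duplicate sentence is penalized by the redundancy term, so the second pick is the distinct sentence rather than the repeat. Note that a plain power iteration tends to forget its starting vector as it converges, so seed scores matter most when few iterations are run; the abstract does not say how the article's modification uses them beyond initialization.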
Index Terms
- Extractive Summarization of Telugu Text Using Modified Text Rank and Maximum Marginal Relevance