research-article

Extractive Summarization of Telugu Text Using Modified Text Rank and Maximum Marginal Relevance

Published: 22 September 2023

Abstract

With the rapid growth of digital content, automatic text summarizers are needed to condense long documents into short summaries. Many approaches have been proposed for extractive text summarization (ETS). This article focuses on a graph-based ETS approach for multiple Telugu text documents. A modified Text-Rank algorithm is employed in which the noun and verb count of each sentence serves as the initial score of the corresponding node. To select optimal features, the article proposes a novel feature selection algorithm, the improved Flamingo Search Algorithm. Although graph-based ETS is an important approach, the summaries it generates often contain redundant sentences. To reduce this redundancy, maximum marginal relevance (MMR) is combined with the modified Text-Rank. The proposed approach is tested with several text representations: the word embeddings fastText and Word2vec, as well as TF-IDF and one-hot encoding. Summary quality is evaluated with BLEU and ROUGE in terms of precision, recall, and F-measure.
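The two ranking components named in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation, and several details are assumptions: sentences are whitespace-tokenised into term-count vectors (a stand-in for the embedding and TF-IDF/one-hot representations the article compares), per-sentence noun+verb counts are supplied precomputed (the POS-tagging step is out of scope), and edges are weighted by cosine similarity. Because the converged PageRank scores do not depend on the starting vector, the sketch also uses the normalised counts as the teleport (personalisation) distribution so that they actually influence the ranking. The function names `modified_textrank`, `mmr_select`, `summarize` and the defaults `d=0.85`, `lam=0.7` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def modified_textrank(vectors, nv_counts, d=0.85, max_iter=100, tol=1e-8):
    """Weighted PageRank over a sentence-similarity graph, biased by noun+verb counts.

    ASSUMPTION: the normalised counts are both the starting vector and the
    teleport distribution; as a starting vector alone they would not change
    the converged scores.
    """
    n = len(vectors)
    if n == 0:
        return [], []
    total = sum(nv_counts)
    prior = [c / total for c in nv_counts] if total else [1.0 / n] * n
    # Symmetric edge weights (cosine similarity), no self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]  # weighted out-degree of each node
    scores = prior[:]
    for _ in range(max_iter):
        # Score held by isolated sentences is redistributed through the prior,
        # so the scores keep summing to 1.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        new = [(1 - d) * prior[i]
               + d * (sum(sim[j][i] * scores[j] / out[j] for j in range(n) if out[j] > 0)
                      + dangling * prior[i])
               for i in range(n)]
        done = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if done:
            break
    return scores, sim


def mmr_select(scores, sim, k, lam=0.7):
    """Greedy MMR: trade rank score (relevance) against similarity to picks so far."""
    if not scores:
        return []
    top = max(scores) or 1.0
    rel = [s / top for s in scores]  # rescale to [0, 1] so it is comparable to cosine
    selected, candidates = [], list(range(len(scores)))

    def marginal(i):
        redundancy = max((sim[i][j] for j in selected), default=0.0)
        return lam * rel[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal)  # ties go to the earlier sentence
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # present the summary in document order


def summarize(sentences, nv_counts, k=2, lam=0.7):
    """Rank sentences with the biased TextRank, then pick k of them with MMR."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    scores, sim = modified_textrank(vectors, nv_counts)
    return [sentences[i] for i in mmr_select(scores, sim, k, lam)]


if __name__ == "__main__":
    # Toy English sentences and made-up noun+verb counts, for illustration only.
    docs = [
        "the river flooded the village",
        "the river flooded the village last night",
        "rescue teams reached the village",
        "schools will stay closed",
    ]
    counts = [3, 4, 3, 2]  # hypothetical POS-tagger output
    print(summarize(docs, counts, k=2))
```

Lowering `lam` shifts the selection toward diversity: with `lam=1.0` MMR reduces to picking the top-ranked sentences, while smaller values penalise sentences that closely repeat one already chosen. Another plausible reading of "initial score" is a truncated power iteration; the abstract alone does not settle which variant the article uses.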



      Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 9
      September 2023
      226 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3625383

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 September 2023
      • Online AM: 12 June 2023
      • Accepted: 23 April 2023
      • Revised: 13 July 2022
      • Received: 9 June 2022
