
From task to evaluation: an automatic text summarization review

Artificial Intelligence Review

Abstract

Automatic summarization is attracting increasing attention as one of the most promising research areas. In recent years, this technology has been applied in a variety of real-world settings with encouraging results. However, conventional evaluation metrics cannot keep up with rapidly evolving summarization task formats and the quality indicators they demand. Recent research requires that automatic summaries be not only readable and fluent but also informative and consistent with the source. Diversified application scenarios also bring new challenges for both generative language models and evaluation metrics. In this review, we analyze summarization tasks and their evaluation, focusing specifically on the gap between task formats and the metrics used to assess them.
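To make the contrast concrete, a minimal sketch of the kind of conventional n-gram overlap metric (in the spirit of ROUGE-1) that the review weighs against newer model-based metrics. The helper name `rouge1_scores` and the simplifications (no stemming, no stopword handling, single reference) are illustrative assumptions, not the official ROUGE implementation:

```python
from collections import Counter

def rouge1_scores(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision/recall/F1, in the spirit of ROUGE-1.

    Simplified illustration: real ROUGE adds stemming, stopword
    options, and multi-reference support.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1_scores("the cat sat on the mat",
                       "the cat is on the mat")
# 5 of 6 unigrams match on each side, so precision = recall ≈ 0.83
```

Because such a metric sees only surface token overlap, it cannot distinguish a fluent, factually consistent summary from a disfluent or contradictory one with the same word counts, which is precisely the mismatch between task requirements and metrics that the review examines.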


Figs. 1–5




Author information


Corresponding author

Correspondence to Guozi Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lu, L., Liu, Y., Xu, W. et al. From task to evaluation: an automatic text summarization review. Artif Intell Rev 56 (Suppl 2), 2477–2507 (2023). https://doi.org/10.1007/s10462-023-10582-5

