Abstract
Automatic summarization is attracting increasing attention as one of the most promising research areas. In recent years this technology has been applied in a variety of real-world settings and has been well received. However, conventional evaluation metrics cannot keep pace with rapidly evolving summarization task formats and the requirements that follow from them. Recent research demands that automatic summarization deliver not only readability and fluency, but also informativeness and consistency. Diversified application scenarios likewise pose new challenges for both generative language models and evaluation metrics. In this review, we analyze automatic summarization with a specific focus on the gap between task formats and evaluation metrics.
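The abstract contrasts task formats with the n-gram-overlap metrics conventionally used to score summaries. As a minimal illustration (not taken from the paper), the following sketch computes ROUGE-1 recall, the unigram-overlap measure from Lin's ROUGE family; in practice one would use a maintained implementation rather than this simplified version, which ignores stemming and stopword handling.

```python
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate
    (clipped counts, as in standard n-gram overlap metrics)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)

# Toy example: 4 of the 6 reference unigrams are recovered.
print(round(rouge1_recall("the cat sat on the mat",
                          "a cat sat on a mat"), 3))  # → 0.667
```

Metrics of this kind reward surface overlap, which is precisely why, as the review argues, they struggle to capture informativeness and factual consistency in newer task formats.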
Cite this article
Lu, L., Liu, Y., Xu, W. et al. From task to evaluation: an automatic text summarization review. Artif Intell Rev 56 (Suppl 2), 2477–2507 (2023). https://doi.org/10.1007/s10462-023-10582-5