Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation

Zeng, Xianfeng; Liu, Yijin; Meng, Fandong; Zhou, Jie

Computer Science > Computation and Language

arXiv:2308.03131 (cs)

[Submitted on 6 Aug 2023 (v1), last revised 10 Aug 2023 (this version, v4)]

Abstract: N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely used across a range of natural language generation (NLG) tasks. However, recent studies have shown that these matching-based metrics correlate only weakly with human evaluations, especially when compared with neural-based metrics like BLEURT. In this paper, we conjecture that the performance bottleneck of matching-based metrics may be caused by the limited diversity of references. To address this issue, we propose using multiple references to improve the consistency between these metrics and human evaluations. On the WMT Metrics benchmarks, we observe that multi-reference F200spBLEU improves accuracy by 7.2% over the conventional single-reference version. Remarkably, it also outperforms the neural-based BERTScore by 3.9% in accuracy. Moreover, we observe that our multi-reference metric largely mitigates the data leakage issue in large language models (LLMs). We release the code and data at this https URL
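The central idea above is that an n-gram metric should credit a hypothesis for matching any one of several valid references. As a hedged illustration (not the authors' released code), the sketch below implements sentence-level BLEU in the standard multi-reference form: each hypothesis n-gram count is clipped by its maximum count in any single reference, and the brevity penalty uses the reference length closest to the hypothesis length. It assumes pre-tokenized, whitespace-split input rather than the SentencePiece tokenization that the "sp" in F200spBLEU refers to, and it applies no smoothing. The function names `ngrams` and `multi_ref_bleu` are hypothetical.

```python
from collections import Counter
import math


# Hypothetical helper names; a sketch, not the paper's implementation.
def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def multi_ref_bleu(hypothesis, references, max_n=4):
    """Unsmoothed sentence-level BLEU against one or more references.

    hypothesis: list of tokens; references: list of token lists.
    Any zero n-gram precision makes the score 0.0 (no smoothing).
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip count of each n-gram = its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty against the closest reference length
    # (ties are broken toward the shorter reference).
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision_sum / max_n)


# Toy usage: a valid paraphrase scored with one vs. two references.
hyp = "the cat sat on the mat".split()
ref1 = "the cat is on the mat".split()
ref2 = "a cat sat on the mat".split()

print(multi_ref_bleu(hyp, [ref1]))                  # 0.0 (no shared 4-gram)
print(round(multi_ref_bleu(hyp, [ref1, ref2]), 4))  # 0.8409
```

With only the first reference, the hypothesis shares no 4-gram and scores 0.0; adding a second, differently worded reference lifts the score to about 0.84. This toy case shows the kind of reference-diversity effect the abstract describes, not the paper's reported numbers.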

Comments:	Work in progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2308.03131 [cs.CL]
	(or arXiv:2308.03131v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.03131

Submission history

From: Xianfeng Zeng
[v1] Sun, 6 Aug 2023 14:49:26 UTC (2,669 KB)
[v2] Tue, 8 Aug 2023 02:01:14 UTC (2,670 KB)
[v3] Wed, 9 Aug 2023 10:49:20 UTC (2,670 KB)
[v4] Thu, 10 Aug 2023 02:08:04 UTC (2,670 KB)