Abstract

In the automatic evaluation of translations, precision and recall are two indices that show how precisely (precision) and how completely (recall) a system recognizes the well-translated portions of a translation. Ideally, the two indices would be weighted equally in an evaluation system, since both accuracy and completeness are important criteria in the evaluation of human translation (HT). This is not easy, however, because the two indices are negatively correlated. Papineni et al. (2002), for example, opted for precision, whereas Lavie et al. (2005) used both indices but gave recall nine times more weight than precision. The aim of this work is to examine which of the two indices correlates better with the judgments of professional evaluators and how much weight should be assigned to precision and to recall, respectively. For this purpose, 459 translated texts were scored with precision, recall, F1 (the harmonic mean of precision and recall) and Fmean (recall weighted nine times more heavily than precision), and were also rated by professional evaluators. The results show that recall correlates better with human evaluation than precision in almost all cases, whereas Fmean does not correlate better than F1; the two were equivalent in all but one case. This indicates that recall is indeed the more important index, but that a weight as high as nine on recall is not ideal for HT evaluation.
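For reference, the standard formulas behind the four scores named above can be sketched as follows, assuming the usual matched-unit definitions of precision and recall; the Fmean form is the METEOR-style combination in which recall receives nine times the weight of precision, i.e. the general F-beta measure with beta^2 = 9. The exact matching procedure applied to the 459 texts is the paper's own and is not reproduced here.

% Precision and recall over matched translation units
\[
P = \frac{\lvert \text{matched units} \rvert}{\lvert \text{units in the candidate translation} \rvert},
\qquad
R = \frac{\lvert \text{matched units} \rvert}{\lvert \text{units in the reference translation} \rvert}
\]
% F1 (equal weights) and the recall-heavy Fmean used in METEOR
\[
F_1 = \frac{2PR}{P + R},
\qquad
F_{\text{mean}} = \frac{10PR}{R + 9P}
= \frac{(1+\beta^2)\,PR}{\beta^2 P + R}
\quad \text{with } \beta^2 = 9
\]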

Keywords

automatic evaluation, translation quality, precision, recall, F1, Fmean

References (19)

  1. [Book] 권철민 / 2020 / 파이썬 머신러닝 완벽 가이드 [Python Machine Learning Complete Guide] / 위키북스

  2. [Journal] 정혜연 / 2020 / 번역자동평가에서 풀리지 않은 과제 [Unresolved Issues in Automatic Translation Evaluation] / 번역학연구 21 (1) : 9 ~ 29

  3. [Report] 박혜주 / 2007 / 문학번역 평가 시스템 연구 [A Study on a Literary Translation Evaluation System]

  4. [Journal] 정혜연 / 2021 / 인간번역 자동평가에서 정답자와 평가자가 다르다면 [When the Reference Provider and the Evaluator Differ in Automatic Evaluation of Human Translation] / 독일언어문학 (93) : 75 ~ 95

  5. [Journal] 정혜연 / 2021 / 임베딩을 활용한 인간번역의 자동평가 - 기계가 의미를 평가할 수 있을까 [Automatic Evaluation of Human Translation Using Embeddings: Can Machines Evaluate Meaning?] / 통번역학연구 25 (3) : 141 ~ 162

  6. [Conference] 한국외대 번역평가인증 연구팀 [HUFS Translation Evaluation and Certification Research Team] / 2016 / 번역인증제도 (실무편) [The Translation Certification System (Practice)] / 한국외대 통번역연구소 학술대회 <언어, 통번역의 평가 및 인증> 발표집 [Proceedings of the HUFS Interpreting and Translation Research Institute Conference on Evaluation and Certification of Language, Interpreting and Translation] : 23 ~ 33

  7. [Conference] Banerjee, Satanjeev / 2005 / METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments / Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization : 65 ~ 72

  8. [Journal] Buckland, Michael / 1994 / The Relationship between Recall and Precision / Journal of the American Society for Information Science 45 (1) : 12 ~ 19

  9. [Journal] Chung, Hye-Yeon / 2020 / Automatische Evaluation der Humanübersetzung: BLEU vs. METEOR / Lebende Sprachen 65 (1) : 181 ~ 205

  10. [Conference] Han, Lifeng / 2018 / Machine Translation Evaluation Resources and Methods: A Survey / IPRC-2018 (Ireland Postgraduate Research Conference)

  11. [Journal] Kunilovskaya, Maria / 2015 / How Far Do We Agree on the Quality of Translation? / English Studies at NBU 1 (1) : 18 ~ 31

  12. [Journal] Lai, Tzu-Yun / 2011 / Reliability and Validity of a Scale-based Assessment for Translation Tests / Meta 56 (3) : 713 ~ 722

  13. [Online] Lavie, Alon / The Significance of Recall in Automatic Metrics for MT Evaluation

  14. [Conference] Papineni, Kishore / 2002 / BLEU: A Method for Automatic Evaluation of Machine Translation / Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL) : 311 ~ 318

  15. [Online] Sasaki, Yutaka / The Truth of the F-measure

  16. [Journal] Waddington, Christopher / 2001 / Should Translations Be Assessed Holistically or through Error Analysis? / HERMES Journal of Language and Communication in Business 26 : 15 ~ 37

  17. [Journal] Waddington, Christopher / 2001 / Different Methods of Evaluating Student Translations: The Question of Validity / Meta 46 (2) : 311 ~ 325

  18. [Book] van Rijsbergen, Cornelius / 1979 / Information Retrieval / Butterworth

  19. [Conference] Zhang, Tianyi / 2020 / BERTScore: Evaluating Text Generation with BERT / Conference Paper at ICLR 2020 : 1 ~ 14