A Fluency Error Categorization Scheme to Guide Automated Machine Translation Evaluation

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Abstract

Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequent evaluations tend to rely on the same texts, which do not necessarily reflect real MT use. In contrast, we are designing an automated MT evaluation system, intended for use by post-editors, purchasers and developers, that requires nothing but the raw MT output. Furthermore, our research is based on texts that reflect corporate use of MT. This paper describes our first step in system design: a hierarchical classification scheme of fluency errors in English MT output, to enable us to identify error types and frequencies, and guide the selection of errors for automated detection. We present results from the statistical analysis of 20,000 words of MT output, manually annotated using our classification scheme, and describe correlations between error frequencies and human scores for fluency and adequacy.
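
To make the abstract's final step concrete, the short Python sketch below illustrates one way to relate per-document error frequencies to human judgements. The error types, frequencies and scores are invented placeholders, not data from the paper, and the Spearman rank correlation via scipy is one plausible choice of statistic rather than necessarily the authors' method.

    # Minimal sketch: correlate per-document fluency-error frequencies with
    # human fluency scores. All values below are invented placeholders, not
    # data from the paper.
    from scipy.stats import spearmanr

    # Hypothetical documents: frequencies per 100 words for two assumed error
    # types, plus a human fluency score on a 1-5 scale.
    documents = [
        {"untranslated": 2.1, "word_order": 1.4, "fluency": 2.0},
        {"untranslated": 0.6, "word_order": 0.9, "fluency": 3.5},
        {"untranslated": 0.2, "word_order": 0.3, "fluency": 4.5},
        {"untranslated": 1.5, "word_order": 1.1, "fluency": 2.5},
        {"untranslated": 0.9, "word_order": 0.7, "fluency": 3.0},
    ]

    fluency_scores = [d["fluency"] for d in documents]
    for error_type in ("untranslated", "word_order"):
        frequencies = [d[error_type] for d in documents]
        rho, p_value = spearmanr(frequencies, fluency_scores)
        print(f"{error_type}: Spearman rho = {rho:.2f} (p = {p_value:.3f})")

A strong negative correlation for an error type in such an analysis would suggest it is a good candidate for automated detection, since its frequency tracks human fluency judgements.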

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elliott, D., Hartley, A., Atwell, E. (2004). A Fluency Error Categorization Scheme to Guide Automated Machine Translation Evaluation. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol. 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_8

  • DOI: https://doi.org/10.1007/978-3-540-30194-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23300-8

  • Online ISBN: 978-3-540-30194-3

  • eBook Packages: Springer Book Archive
