
ELEXR: Automatic Evaluation of Machine Translation Using Lexical Relationships

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8265)


Abstract

This paper proposes ELEXR, a novel metric for evaluating machine translation (MT). In the proposed method, we extract lexical co-occurrence relationships from a given reference translation (Ref) and its corresponding hypothesis sentence using a Hyperspace Analogue to Language (HAL) space matrix. Then, for each term appearing in these two sentences, we convert the co-occurrence information into a conditional probability distribution. Finally, we score the hypothesis by comparing, with Kullback-Leibler divergence, the conditional probability distributions of the words that Ref and the candidate sentence (Cand) hold in common. ELEXR can evaluate MT using only one Ref per Cand, without any semantically annotated resources such as WordNet. Our experiments on eight language pairs from the WMT 2011 submissions show that ELEXR outperforms the TER and BLEU baselines, on average, in system-level correlation with human judgments: it achieves an average Spearman's rho of about 0.78, Kendall's tau of about 0.66, and Pearson's correlation of about 0.84, improvements of about 0.04, 0.07, and 0.06, respectively, over BLEU, the stronger baseline.
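The abstract describes a three-step pipeline: build a HAL-style co-occurrence matrix over the reference and the candidate, turn each word's co-occurrence counts into a conditional probability distribution, and score the candidate by comparing the distributions of the words the two sentences share using Kullback-Leibler divergence. The Python sketch below only illustrates that idea; the window size, the add-epsilon smoothing, the 1/(1 + divergence) similarity mapping, the averaging over shared words, and names such as elexr_like_score are illustrative assumptions, not the paper's specification.

# Minimal sketch of the idea in the abstract; NOT the authors' implementation.
# Window size, smoothing, and score aggregation are assumed for illustration.
from collections import defaultdict
import math

def cooccurrence(tokens, window=5):
    """HAL-style co-occurrence counts within a sliding window (assumed size)."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1.0
    return counts

def conditional_dist(word_counts, vocab, eps=1e-6):
    """Convert one word's co-occurrence counts into a smoothed P(. | word)."""
    total = sum(word_counts.get(v, 0.0) + eps for v in vocab)
    return {v: (word_counts.get(v, 0.0) + eps) / total for v in vocab}

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def elexr_like_score(reference, candidate, window=5):
    """Score a candidate against one reference (hypothetical aggregation)."""
    ref, cand = reference.split(), candidate.split()
    vocab = sorted(set(ref) | set(cand))
    ref_counts, cand_counts = cooccurrence(ref, window), cooccurrence(cand, window)
    shared = set(ref) & set(cand)
    if not shared:
        return 0.0
    # Lower divergence between the two conditional distributions of a shared
    # word means more similar lexical context; map to a similarity and average.
    divs = [kl(conditional_dist(ref_counts[w], vocab),
               conditional_dist(cand_counts[w], vocab)) for w in shared]
    return sum(1.0 / (1.0 + d) for d in divs) / len(divs)

if __name__ == "__main__":
    print(elexr_like_score("the cat sat on the mat", "the cat is on the mat"))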

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahmoudi, A., Faili, H., Dehghan, M.H., Maleki, J. (2013). ELEXR: Automatic Evaluation of Machine Translation Using Lexical Relationships. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_32


  • DOI: https://doi.org/10.1007/978-3-642-45114-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45113-3

  • Online ISBN: 978-3-642-45114-0

  • eBook Packages: Computer Science, Computer Science (R0)
