
ELEXR: Automatic Evaluation of Machine Translation Using Lexical Relationships

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8265)


Abstract

This paper proposes ELEXR, a novel metric for evaluating machine translation (MT). In the proposed method, we extract lexical co-occurrence relationships from a given reference translation (Ref) and its corresponding hypothesis sentence using a Hyperspace Analogue to Language (HAL) space matrix. Then, for each term appearing in these two sentences, we convert the co-occurrence information into a conditional probability distribution. Finally, we score the hypothesis by comparing, with Kullback-Leibler divergence, the conditional probability distributions of the words that Ref and the candidate sentence (Cand) hold in common. ELEXR can evaluate MT using only one Ref per Cand, without any semantically annotated resources such as WordNet. Our experiments on eight language pairs from the WMT 2011 submissions show that ELEXR outperforms the TER and BLEU baselines, on average, in system-level correlation with human judgments: it achieves an average Spearman's rho of about 0.78, Kendall's tau of about 0.66, and Pearson's correlation of about 0.84, improvements of about 0.04, 0.07, and 0.06, respectively, over BLEU, the stronger baseline.
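The abstract describes a three-step pipeline: build a HAL-style co-occurrence matrix over the reference and the candidate, turn each word's co-occurrence counts into a conditional probability distribution, and score the candidate by comparing the distributions of the words the two sentences share using Kullback-Leibler divergence. The Python sketch below only illustrates that idea; the window size, the add-epsilon smoothing, the 1/(1 + divergence) similarity mapping, the averaging over shared words, and names such as elexr_like_score are illustrative assumptions, not the paper's specification.

# Minimal sketch of the idea in the abstract; NOT the authors' implementation.
# Window size, smoothing, and score aggregation are assumed for illustration.
from collections import defaultdict
import math

def cooccurrence(tokens, window=5):
    """HAL-style co-occurrence counts within a sliding window (assumed size)."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1.0
    return counts

def conditional_dist(word_counts, vocab, eps=1e-6):
    """Convert one word's co-occurrence counts into a smoothed P(. | word)."""
    total = sum(word_counts.get(v, 0.0) + eps for v in vocab)
    return {v: (word_counts.get(v, 0.0) + eps) / total for v in vocab}

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def elexr_like_score(reference, candidate, window=5):
    """Score a candidate against one reference (hypothetical aggregation)."""
    ref, cand = reference.split(), candidate.split()
    vocab = sorted(set(ref) | set(cand))
    ref_counts, cand_counts = cooccurrence(ref, window), cooccurrence(cand, window)
    shared = set(ref) & set(cand)
    if not shared:
        return 0.0
    # Lower divergence between the two conditional distributions of a shared
    # word means more similar lexical context; map to a similarity and average.
    divs = [kl(conditional_dist(ref_counts[w], vocab),
               conditional_dist(cand_counts[w], vocab)) for w in shared]
    return sum(1.0 / (1.0 + d) for d in divs) / len(divs)

if __name__ == "__main__":
    print(elexr_like_score("the cat sat on the mat", "the cat is on the mat"))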

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahmoudi, A., Faili, H., Dehghan, M.H., Maleki, J. (2013). ELEXR: Automatic Evaluation of Machine Translation Using Lexical Relationships. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_32


  • DOI: https://doi.org/10.1007/978-3-642-45114-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45113-3

  • Online ISBN: 978-3-642-45114-0

  • eBook Packages: Computer Science, Computer Science (R0)
