Searching for Poor Quality Machine Translated Text: Learning the Difference between Human Writing and Machine Translations

Carter, Dave; Inkpen, Diana

doi:10.1007/978-3-642-30353-1_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7310)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

As machine translation (MT) tools have become mainstream, machine-translated text has increasingly appeared on multilingual websites. Trustworthy multilingual websites are used as training corpora for statistical machine translation tools; large amounts of MT text in the training data may make such tools less effective. We performed three experiments to determine whether a support vector machine (SVM) could distinguish machine-translated text from human-written text (both original text and human translations). Machine-translated versions of the Canadian Hansard were detected with an F-measure of 0.999. Machine-translated versions of six Government of Canada websites were detected with an F-measure of 0.98. We validated these results with a decision tree classifier. An experiment to find MT text on Government of Ontario websites using Government of Canada training data was unfruitful, producing a high rate of false positives. Machine-translated text therefore appears to be learnable and detectable when the training corpus is similar to the text being examined.
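The abstract does not state which features or which SVM implementation were used. As a rough illustration of the general setup only, the sketch below assumes word unigram and bigram counts as features, labels of +1 for machine-translated and -1 for human-written text, and a linear SVM trained with the Pegasos stochastic sub-gradient method (a stand-in chosen for being short and dependency-free, not the authors' tool). It also computes the F-measure (F1) for the machine-translated class, the metric the results are reported in. The names `featurize`, `train_linear_svm`, `predict`, and `f_measure` are hypothetical.

```python
import random
from collections import Counter


def featurize(text):
    """Hypothetical feature set: counts of lowercase word unigrams and bigrams."""
    tokens = text.lower().split()
    feats = Counter(tokens)
    feats.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=10, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Basic variant with no bias term and no projection step.
    labels are +1 (machine-translated, assumed) or -1 (human-written).
    """
    xs = [featurize(d) for d in docs]
    w = {}
    rng = random.Random(seed)  # seeded so training is reproducible
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit every example once per epoch
        for i in order:
            t += 1
            x, y = xs[i], labels[i]
            margin = y * dot(w, x)  # margin under the current weights
            # Regularisation shrink: w <- (1 - eta * lam) * w, eta = 1 / (lam * t).
            scale = 1.0 - 1.0 / t
            for k in w:
                w[k] *= scale
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1.0:
                eta = 1.0 / (lam * t)
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (machine-translated) or -1 (human-written)."""
    return 1 if dot(w, featurize(text)) > 0 else -1


def f_measure(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (machine-translated) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for yt, yp in pairs if yt == positive and yp == positive)
    fp = sum(1 for yt, yp in pairs if yt != positive and yp == positive)
    fn = sum(1 for yt, yp in pairs if yt == positive and yp != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

On real data the two classes share most of their vocabulary, so the signal lies in the learned weights rather than in disjoint word lists. The regularisation constant `lam` and the number of epochs are illustrative defaults, not values taken from the paper.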

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada
Dave Carter & Diana Inkpen
Institute for Information Technology, National Research Council Canada, Canada
Dave Carter

Editor information

Editors and Affiliations

Faculty of Engineering and Computer Science, Department of Computer Science and Software Engineering, Concordia University, H3G 1M8, Montreal, QC, Canada
Leila Kosseim
Faculty of Engineering, School of Electrical Engineering and Computer Science, University of Ottawa, K1N 6N5, Ottawa, ON, Canada
Diana Inkpen

About this paper

Cite this paper

Carter, D., Inkpen, D. (2012). Searching for Poor Quality Machine Translated Text: Learning the Difference between Human Writing and Machine Translations. In: Kosseim, L., Inkpen, D. (eds) Advances in Artificial Intelligence. Canadian AI 2012. Lecture Notes in Computer Science, vol 7310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30353-1_5

DOI: https://doi.org/10.1007/978-3-642-30353-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30352-4
Online ISBN: 978-3-642-30353-1
eBook Packages: Computer Science, Computer Science (R0)
