Searching for Poor Quality Machine Translated Text: Learning the Difference between Human Writing and Machine Translations

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7310)

Abstract

As machine translation (MT) tools have become mainstream, machine-translated text has increasingly appeared on multilingual websites. Trustworthy multilingual websites are used as training corpora for statistical machine translation tools, so large amounts of MT text in the training data may make such tools less effective. We performed three experiments to determine whether a support vector machine (SVM) could distinguish machine-translated text from human-written text (both original text and human translations). Machine-translated versions of the Canadian Hansard were detected with an F-measure of 0.999. Machine-translated versions of six Government of Canada websites were detected with an F-measure of 0.98. We validated these results with a decision tree classifier. An experiment to find MT text on Government of Ontario websites using Government of Canada training data was unfruitful, with a high rate of false positives. Machine-translated text appears to be learnable and detectable when the training corpus is similar to the text being classified.
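
For readers unfamiliar with the metric, the F-measure reported in the abstract combines precision and recall into a single score. The following is a minimal sketch in Python, assuming the balanced F1 variant (beta = 1); the abstract does not say which weighting was used. The function name `f_measure` and all counts below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score computed from raw classification counts.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not specify the weighting).
    """
    if true_positives == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: 98 MT documents found, 2 false alarms, 2 missed.
in_domain = f_measure(true_positives=98, false_positives=2, false_negatives=2)
print(round(in_domain, 3))  # 0.98

# Hypothetical counts with many false positives, as might occur when the
# training corpus differs from the text being classified.
cross_domain = f_measure(true_positives=90, false_positives=90, false_negatives=10)
print(round(cross_domain, 3))  # 0.643
```

The second call shows why a high rate of false positives is costly: recall stays at 0.9, but precision drops to 0.5 and pulls the combined score down with it.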

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carter, D., Inkpen, D. (2012). Searching for Poor Quality Machine Translated Text: Learning the Difference between Human Writing and Machine Translations. In: Kosseim, L., Inkpen, D. (eds) Advances in Artificial Intelligence. Canadian AI 2012. Lecture Notes in Computer Science, vol 7310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30353-1_5

  • DOI: https://doi.org/10.1007/978-3-642-30353-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30352-4

  • Online ISBN: 978-3-642-30353-1

  • eBook Packages: Computer Science, Computer Science (R0)
