Not by chance. Russian aspect in rule-based machine translation

Not by chance. Rule-based machine translation of Russian verbal aspect

Russian Linguistics

Abstract

The aim of this paper is twofold: it illustrates the benefits of rule-based over statistical machine translation, and it provides a starting point for the machine translation of Russian aspect into English. Rule-based machine translation remains promising from both a computational and a theoretical point of view, because implementing rules on a computer allows theoretical assumptions about linguistic structures to be verified and improved. This is shown using the category of aspect, which is one of the main challenges for machine translation from Russian to English. A small corpus study of how human translators render Russian sentences with past tense verbs shows that three quarters of the Russian verbs (both imperfective and perfective) are translated by English simple past forms. While this results from language-internal markedness relations, translating the remaining 25 % requires an in-depth analysis of the various interpretations available for Russian aspect. We propose a semantic analysis from which rules for the interpretation and translation of Russian aspect in a machine translation system can be derived. Their implementation in the machine translation system ĖTAP is illustrated with two test cases.

Abstract (translated from Russian)

The aim of this article is twofold: it illustrates the benefits of rule-based machine translation compared with statistics-based machine translation, and it offers a starting point for the machine translation of Russian verbal aspect into English. Rule-based machine translation still has its advantages, from both a computational and a theoretical point of view, since implementing rules on a computer allows theoretical hypotheses about linguistic structures to be tested and improved. We show this using the example of verbal aspect, which is one of the main difficulties for machine translation from Russian into English. Examining a part of the parallel corpus of the Russian National Corpus, we study how Russian sentences with verbs in the past tense are translated into English by human translators. This study shows that three quarters of the Russian verbs in this corpus (both imperfective and perfective) are translated by English past simple (preterite) forms. While this is a consequence of language-internal markedness relations, the translation of the remaining 25 % requires an in-depth analysis of the various possible interpretations of Russian aspect. On the basis of the semantic analysis we propose, rules for the interpretation and translation of Russian aspect in a machine translation system can be derived. Their application in a machine translation system (in this case ĖTAP) is demonstrated in this article with two examples.

Notes

  1. Graham, Baldwin, Moffat, and Zobel (2014) examined the quality of about 40 MT systems for seven language pairs. For the translation of news articles from English to Spanish, they report the following results for the best SMT system: human evaluators rated fluency at almost 72 % (judging on a 100-point scale how much they agreed that “the text is fluent Spanish” (ibid., p. 445)) and adequacy at about 67 % (judging how much they agreed that the text “adequately expresses the meaning” (ibid., p. 447) of the translation by a human translator). The improvement between 2007 and 2012 amounted to about 12 percentage points. The difficulties involved in measuring the quality of machine translation output, in particular that of ĖTAP, are pointed out by Apresjan et al. (1989, p. 11). Referring to Kulagina (1979), they cite three criteria, similar to the ‘fluency’ and ‘adequacy’ of Graham et al. (2014) but different in detail, all of which have to be evaluated by an expert (ultimately a human): 1) degree of match in terms of content, 2) comprehensibility, 3) grammatical correctness. While criterion 3) seems to be measurable on largely objective grounds, 1) and 2) crucially rely on native speakers’ intuitions.

  2. Cf. “Naša cel’—ustanovit’ istinu ‘Our purpose is to establish the truth’ ” (Iomdin 2003, p. 255).

  3. Apresjan et al. (1989, p. 11) point out the difficulties related to aspect in MT from English to Russian. These problems result mainly from the fact that the adequate Russian aspect form cannot be ‘synthesised’ from the verbal form in the English original, since English, according to Apresjan et al., does not have an aspect category. Even though this lack of an aspect category is disputed in the present paper, Apresjan et al. are right in stating that there is no one-to-one correspondence between the English and the Russian system. That is, lexical semantic features of the verbal forms as well as context have to be taken into account. The same holds for translations from Russian into English, the topic of the present paper.

  4. http://www.ruscorpora.ru/search-para-en.html.

  5. By referring to the shimmering as such and by referring to the visual perception of something shimmering, respectively, the Russian original and the English translation deliver different descriptions of reality.

  6. This example thus illustrates the ‘secondary deictic’ nature of aspect pointed out by Padučeva (2006).

  7. These nine translations are different from example (2), which goes far beyond language-internal rules and is a free paraphrase.

  8. The question as to the difference between ipf and pf present perfect and pluperfect readings is beyond the scope of the present paper.

  9. Obviously, not all of these SMT translations using simple past were in accordance with the translations by human translators in ruscorpora. Only 64 out of the 84 translations of ipf verb forms were translated in accordance with ruscorpora (cf. the 77 translations with simple past in ruscorpora); i.e. in 20 cases Google used simple past where human translators did not, and in 13 cases Google did not use simple past where human translators did. Concerning translations of pf verbs, the figures are: 66 out of the 87 Google translations were in accordance with ruscorpora (cf. the 73 translations with simple past in ruscorpora); i.e. in 21 cases Google used simple past where human translators did not, and in 7 cases Google did not use simple past where human translators did. The correctness of Google translations that are not in accordance with translations by humans must still be evaluated by native speakers.

  10. On average, only about 25 % of translations should be affected.

  11. The differentiation of syntactic and semantic features in ĖTAP is mainly due to technical reasons.

  12. Cf. Table 1. For a classification of predicates cf. Apresjan (2006); his classification includes 17 classes, some of which exclude certain disambiguation possibilities and/or make others highly probable.

  13. This syntactic feature had to be introduced to ĖTAP because a similar already existing feature could not be used for our purposes.

  14. This line is necessary for sentences in which the predicate is given by a support verb construction, cf. (17) below.

  15. This is a simplified example from Bendixen et al. (2005–2012).

  16. Two further lexemes of the preposition po in ĖTAP govern the accusative and the prepositional case, respectively.

  17. This dictionary lists 20 lexemes of the preposition po that govern the dative case. In the course of developing the rules presented in this paper, po17 proved to be the lexeme that fits best here, contrary to what was assumed in Sonnenhauser and Zangenfeind (2013) and Zangenfeind and Sonnenhauser (2014), where po16 was used.

  18. The sentence in (17) is an example where this line is necessary because of the support verb construction.

  19. This line checks, for example, whether the preposition po corresponds to po17 of the Slovar’ russkogo jazyka (1983) if the preposition in question is po1 of ĖTAP. For further cases, adverbial expressions in the form of simple adverbs with the syntactic feature ‘regularity’ would also have to be considered. For simplicity’s sake, this has not been done yet.
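
The mismatch figures in note 9 follow from simple subtraction. A minimal Python sketch recomputes them from the three counts the note reports for each aspect: simple past translations by Google, simple past translations in ruscorpora, and cases where both agree. The function name `simple_past_mismatches` is an illustrative assumption and not part of the paper or of ĖTAP.

```python
def simple_past_mismatches(mt_count, human_count, agreed_count):
    """Return (MT-only, human-only) simple past counts.

    mt_count:     sentences the MT system translated with simple past
    human_count:  sentences the human translators translated with simple past
    agreed_count: sentences where both used simple past
    """
    mt_only = mt_count - agreed_count        # MT used simple past, humans did not
    human_only = human_count - agreed_count  # humans used simple past, MT did not
    return mt_only, human_only


# Counts taken from note 9 (Google vs. ruscorpora).
ipf = simple_past_mismatches(mt_count=84, human_count=77, agreed_count=64)
pf = simple_past_mismatches(mt_count=87, human_count=73, agreed_count=66)

print("imperfective:", ipf)  # (20, 13)
print("perfective:", pf)     # (21, 7)
```

The results match the 20 and 13 cases reported for imperfective verbs and the 21 and 7 cases reported for perfective verbs.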

References

  • Apresjan, Ju. D. (2006). Fundamental’naja klassifikacija predikatov. In Ju. D. Apresjan (Ed.), Jazykovaja kartina mira i sistemnaja leksikografija (pp. 75–110). Moskva.

  • Apresjan, Ju. D., et al. (1989). Lingvističeskoe obespečenie sistemy ĖTAP-2. Moskva.

  • Bendixen, B., et al. (2005–2012). Russisch aktuell. Wiesbaden.

  • Bickel, B. (1996). Aspect, mood, and time in Belhare: studies in the semantics-pragmatics interface of a Himalayan language. Zürich.

  • Dobrovol’skij, D. O., Kretov, A. A., & Šarov, S. A. (2005). Korpus parallel’nyx tekstov: arxitektura i vozmožnosti ispol’zovanija. In Nacional’nyj korpus russkogo jazyka: 2003–2005 (pp. 263–296). Moskva. Retrieved from http://ruscorpora.ru/sbornik2005/17dobrovolsky.pdf (1 March 2016).

  • Graham, Y., Baldwin, T., Moffat, A., & Zobel, J. (2014). Is machine translation getting better over time? In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL-14). Gothenburg, Sweden, April 26–30, 2014 (pp. 443–451). Gothenburg.

  • Iomdin, L. L. (2003). Purpose and idea: a lesson drawn from machine translation. In MTT 2003. First International Conference on Meaning-Text Theory. Paris, Ecole Normale Superieure (pp. 269–278). Paris.

  • Iomdin, L. (2008). A few lessons learned from rule-based machine translation. In G. Gross & K. U. Schulz (Eds.), Linguistics, computer science and language processing. Festschrift for Franz Guenthner on the occasion of his 60th birthday (Tributes, 6, pp. 171–187). London.

  • Klein, W. (1995). A time-relational analysis of Russian aspect. Language, 71(4), 669–695.

  • Kulagina, O. S. (1979). Issledovanija po mašinnomu perevodu. Moskva.

  • Mel’čuk, I. (2004). Verbes supports sans peine. Lingvisticæ investigationes, 27(2), 203–217.

  • Padučeva, E. V. (1992). Towards the problem of translating grammatical meanings. Meta: Journal des traducteurs, 37(1), 113–126.

  • Padučeva, E. V. (2006). Nabljudatel’: tipologija i vozmožnye traktovki. In N. I. Laufer, A. S. Narin’jani, & V. P. Selegej (Eds.), Komp’juternaja lingvistika i intellektual’nye texnologii: Trudy Meždunarodnoj konferencii “Dialog 2006”. Bekasovo, 31 maja–4 ijunja 2006 g. Moskva. Retrieved from http://www.dialog-21.ru/digests/dialog2006/materials/html/Paducheva.htm (1 March 2016).

  • Slovar’ russkogo jazyka (1983): Evgen’eva, A. P. (Ed.) (1983). Slovar’ russkogo jazyka (Vol. III). Moskva.

  • Sonnenhauser, B. (2006). Yet there’s method in it. Semantics, pragmatics, and the interpretation of the Russian imperfective aspect. München.

  • Sonnenhauser, B., & Zangenfeind, R. (2013). Towards machine translation of Russian aspect. In V. Apresjan, B. Iomdin, & E. Ageeva (Eds.), Proceedings of the 6th International Conference on Meaning-Text Theory. Prague, August 30–31, 2013 (pp. 192–201). Retrieved from http://meaningtext.net/mtt2013/proceedings_MTT13.pdf (1 March 2016).

  • Zangenfeind, R., & Sonnenhauser, B. (2014). Russian verbal aspect and machine translation. In V. P. Selegej et al. (Eds.), Komp’juternaja lingvistika i intellektual’nye texnologii: Po materialam ežegodnoj Meždunarodnoj konferencii “Dialog”. Bekasovo, 4–8 ijunja 2014 g. Vyp. 13(20). Moskva. Retrieved from http://www.dialog-21.ru/digests/dialog2014/materials/pdf/ZangenfeindRSonnenhauserB.pdf (1 March 2016).

Author information

Corresponding author

Correspondence to Barbara Sonnenhauser.

Cite this article

Sonnenhauser, B., Zangenfeind, R. Not by chance. Russian aspect in rule-based machine translation. Russ Linguist 40, 199–213 (2016). https://doi.org/10.1007/s11185-016-9169-6

  • DOI: https://doi.org/10.1007/s11185-016-9169-6
