ABSTRACT

Languages differ systematically from each other in many fundamental ways, the two most familiar categories being grammar and syntax on the one hand, and vocabulary structure and distribution on the other. Mastery of translation technologies is now a key professional requirement, and these technologies are developing at an unprecedented rate. The vast and growing amount of digital content being created, much of it time-sensitive, means that there is far more material to be translated than there are human translators available to do the work. Before the appearance of the digital computer, human translation was the only available option for overcoming language barriers. Rule-based machine translation (RBMT) remained the dominant paradigm until Nagao proposed that machine translation (MT) should instead make use of the growing repositories of parallel corpora, that is, previous translations stored digitally alongside their source texts, as in a translation memory (TM). Under this approach, the system automatically searches legacy translations for phrases and sections of text that can be reassembled into a new translation.