2022 Volume 29 Issue 4 Pages 1082-1105
Knowledge distillation is a representative approach to compressing a large neural machine translation (NMT) model into a lightweight one. It first trains a strong teacher model and then forces a more compact student model to imitate the teacher. Although constructing a strong teacher model is the key to successful knowledge distillation, even a teacher built on state-of-the-art NMT may remain inadequate owing to translation errors. Such an inadequate teacher severely degrades the student model through error propagation, especially for words important to the sentence meaning. To mitigate this degradation, we propose a knowledge distillation method that uses a lexical constraint as privileged information for NMT. The proposed method trains the teacher model with a lexical constraint: a list of words automatically extracted from the target sentence in the training data. We configure the lexical constraint according to the importance of words and the fallibility of NMT. Models trained with the proposed method yield better translations than those trained with a baseline method on English↔German and English↔Japanese translation tasks, without ensemble decoding or beam-search decoding.
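The teacher-student imitation summarized above is commonly realized as a word-level distillation loss in which the student matches the teacher's per-position token distributions. The following is a minimal sketch under that assumption, not the paper's exact formulation; the function names and the temperature parameter are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over one position's vocabulary logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def word_kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Word-level distillation: KL(teacher || student) at each target
    # position, averaged over the sequence. The teacher's softened
    # distribution serves as the soft target for the student.
    loss = 0.0
    for s_pos, t_pos in zip(student_logits, teacher_logits):
        p = softmax(t_pos, temperature)  # teacher distribution
        q = softmax(s_pos, temperature)  # student distribution
        loss += sum(pi * math.log(pi / qi)
                    for pi, qi in zip(p, q) if pi > 0.0)
    return loss / len(student_logits)
```

When the student reproduces the teacher's logits exactly, the loss is zero; any divergence between the two distributions makes it positive, which is what drives the student toward the teacher during training.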