Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Lexically Constrained Knowledge Distillation for Neural Machine Translation
Hideya Mino, Kazutaka Kinugawa, Hitoshi Ito, Isao Goto, Ichiro Yamada, Takenobu Tokunaga

2022 Volume 29 Issue 4 Pages 1082-1105

Abstract

Knowledge distillation is a representative approach in neural machine translation (NMT) for compressing a large model into a lightweight one. This approach first trains a strong teacher model and then forces a more compact student model to imitate the teacher. Although the key to successful knowledge distillation is constructing a strong teacher model, even a teacher model built with state-of-the-art NMT may remain inadequate owing to translation errors. Using such an inadequate teacher model severely degrades the student model through error propagation, especially for words important to sentence meaning. To mitigate this degradation problem, we propose a knowledge distillation method that uses a lexical constraint as privileged information for NMT. The proposed method trains the teacher model with a lexical constraint: a list of words automatically extracted from the target sentence of each training example. We configure the lexical constraint according to the importance of words and the fallibility of NMT. Models trained with our proposed method yield better translations than those trained with a baseline method on English↔German and English↔Japanese translation tasks, even without ensemble decoding or beam-search decoding.
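As a rough illustration of the idea described in the abstract, the following is a minimal sketch, assuming the lexical constraint is built from content words of the target sentence that a baseline NMT output fails to reproduce (a simple proxy for both word importance and NMT fallibility) and is then appended to the teacher's source input as privileged information. The function names, the stopword list, the separator token, and the selection heuristic are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch: building a lexical constraint and feeding it to a
# teacher model as privileged information. Names and heuristics are
# illustrative only.

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are"}


def extract_constraint(target_tokens, baseline_output_tokens, max_words=3):
    """Select content words (non-stopwords) from the target sentence that a
    baseline NMT output missed, i.e. words where NMT is likely fallible."""
    missed = [
        w for w in target_tokens
        if w.lower() not in STOPWORDS and w not in baseline_output_tokens
    ]
    return missed[:max_words]


def teacher_input(source_tokens, constraint_tokens, sep="<sep>"):
    """Append the constraint to the source sentence so the teacher encoder is
    conditioned on the extra tokens (one common way to inject constraints)."""
    if not constraint_tokens:
        return list(source_tokens)
    return list(source_tokens) + [sep] + list(constraint_tokens)


if __name__ == "__main__":
    src = "Er hat gestern das Buch gekauft".split()
    tgt = "He bought the book yesterday".split()
    baseline = "He purchased the book yesterday".split()   # misses "bought"
    constraint = extract_constraint(tgt, baseline)
    print(constraint)                      # ['bought']
    print(teacher_input(src, constraint))  # source tokens + <sep> + constraint

In this sketch the teacher sees target-side hints that the student never receives, which is what makes the constraint "privileged" information during distillation.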

© 2022 The Association for Natural Language Processing