Reducing Offensive Replies in Open Domain Dialogue Systems

Uchida, Naokazu; Homma, Takeshi; Iwayama, Makoto; Sogawa, Yasuhiro

doi:10.21437/Interspeech.2022-200

In recent years, a series of open-domain dialogue systems based on large-scale language models have been proposed. These systems are attracting business attention because they can hold remarkably natural and diverse dialogues with humans. However, it has been noted that they reflect gender, race, and other biases inherent in their training data and may generate offensive replies or replies that agree with offensive utterances. This study examines a dialogue system that outputs appropriate replies to offensive utterances. Specifically, our system incorporates multiple dialogue models, each specialized to suppress offensive replies in a specific category, and then selects the least offensive reply from the outputs of these models. We evaluated the utility of our system for suppressing offensive replies from DialoGPT. We confirmed that our system reduces the rate of offensive replies to less than 1%, whereas one of the state-of-the-art suppression methods reduces it only to 9.8%.
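
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how offensiveness is scored or which categories are used, so the function `select_least_offensive`, the category names, the toy models, and the word-list scorer below are all hypothetical stand-ins chosen for illustration only.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
# a dialogue model maps an utterance to a reply, and a scorer maps
# (utterance, reply) to an offensiveness score where lower is better.
ReplyModel = Callable[[str], str]
OffenseScorer = Callable[[str, str], float]


def select_least_offensive(
    utterance: str,
    models: Dict[str, ReplyModel],
    scorer: OffenseScorer,
) -> Tuple[str, str, float]:
    """Ask every category-specific model for a reply, then return
    (model name, reply, score) for the reply with the lowest score."""
    if not models:
        raise ValueError("at least one dialogue model is required")
    candidates = {name: model(utterance) for name, model in models.items()}
    scores = {name: scorer(utterance, reply) for name, reply in candidates.items()}
    best = min(scores, key=scores.__getitem__)
    return best, candidates[best], scores[best]


def toy_scorer(utterance: str, reply: str) -> float:
    """Toy stand-in for a real offensiveness classifier:
    the fraction of reply words found in a small flagged-word set."""
    flagged = {"stupid", "agree"}
    words = reply.lower().replace(".", " ").replace(",", " ").split()
    return sum(w in flagged for w in words) / max(len(words), 1)


# Toy stand-ins for fine-tuned dialogue models; category names are assumed.
toy_models: Dict[str, ReplyModel] = {
    "gender": lambda u: "I agree, that is stupid.",
    "race": lambda u: "I would rather not judge people that way.",
}

name, reply, score = select_least_offensive(
    "some offensive utterance", toy_models, toy_scorer
)
print(name, reply, score)
```

In a real system the lambdas would be replaced by fine-tuned dialogue models and `toy_scorer` by a trained offensiveness classifier; only the generate-then-select structure is meant to mirror the description above.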

Cite as: Uchida, N., Homma, T., Iwayama, M., Sogawa, Y. (2022) Reducing Offensive Replies in Open Domain Dialogue Systems. Proc. Interspeech 2022, 1076-1080, doi: 10.21437/Interspeech.2022-200

@inproceedings{uchida22_interspeech,
  author={Naokazu Uchida and Takeshi Homma and Makoto Iwayama and Yasuhiro Sogawa},
  title={{Reducing Offensive Replies in Open Domain Dialogue Systems}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1076--1080},
  doi={10.21437/Interspeech.2022-200}
}