Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
LSTM-Based Turn-Taking Estimation Using Lexical and Prosodic Information and Dialog History
劉 超然, 石井 カルロス, 石黒 浩
Journal, free access

2019, Volume 34, Issue 2, pp. C-I65_1-9

Abstract

Natural conversation involves rapid exchanges of turns. Taking turns with appropriate timing is therefore a requisite capability for a dialog system that acts as a conversation partner. We propose a Recurrent Neural Network (RNN) based model that takes the current utterance and the dialog history as input, classifies utterances into turn-taking related classes, and estimates the turn-taking timing. The dialog history is represented as a sequence of speaker-specified joint embeddings of lexical and prosodic content. To this end, we trained a neural network to embed the lexical and prosodic content into a joint embedding space. To learn a meaningful embedding space, the prosodic feature sequence of each utterance is mapped into a fixed-dimensional space using an RNN and combined with the utterance's lexical embedding. These joint embeddings are then shifted to different regions of the embedding space according to the speaker. Finally, the speaker-specified joint embeddings are used as the input to the proposed model. We tested the model on a spontaneous conversation dataset and confirmed that it outperformed conventional models that use lexical/prosodic features and dialog history without speaker information.
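
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and nearly everything in it is an assumption made for illustration: a plain tanh RNN stands in for the LSTM named in the title, the weights are random rather than trained, the dimensions and the three output classes are hypothetical, and the speaker-dependent shift is modeled as a fixed additive offset because the abstract does not specify how the shift is performed.

```python
import math
import random

# Hypothetical sizes chosen only for this sketch (not taken from the paper).
LEX_DIM, PROS_IN, PROS_HID, DIALOG_HID, N_CLASSES = 4, 3, 5, 6, 3
JOINT_DIM = LEX_DIM + PROS_HID

def make_matrix(rows, cols, rng):
    """Small random weight matrix; a stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_encode(frames, w_in, w_rec):
    """Run a plain tanh RNN over a sequence and return the last hidden state."""
    h = [0.0] * len(w_rec)
    for x in frames:
        a, b = matvec(w_in, x), matvec(w_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h

def init_params(seed=0):
    """Random parameters; the additive speaker offsets are an assumption."""
    rng = random.Random(seed)
    return {
        "w_in": make_matrix(PROS_HID, PROS_IN, rng),
        "w_rec": make_matrix(PROS_HID, PROS_HID, rng),
        "speaker_offset": {"A": [1.0] * JOINT_DIM, "B": [-1.0] * JOINT_DIM},
        "d_in": make_matrix(DIALOG_HID, JOINT_DIM, rng),
        "d_rec": make_matrix(DIALOG_HID, DIALOG_HID, rng),
        "w_out": make_matrix(N_CLASSES, DIALOG_HID, rng),
    }

def joint_embedding(lexical_vec, prosody_frames, speaker, params):
    """Fixed-size prosodic vector + lexical vector, shifted per speaker."""
    prosodic_vec = rnn_encode(prosody_frames, params["w_in"], params["w_rec"])
    joint = list(lexical_vec) + prosodic_vec
    offset = params["speaker_offset"][speaker]
    return [j + o for j, o in zip(joint, offset)]

def classify_turn(history, params):
    """Encode the dialog history with a second RNN and apply a softmax."""
    h = rnn_encode(history, params["d_in"], params["d_rec"])
    logits = matvec(params["w_out"], h)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Usage: two utterances of different lengths from speakers "A" and "B".
params = init_params()
utt1 = joint_embedding([0.1, 0.2, 0.3, 0.4],
                       [[0.5, 0.1, 0.0], [0.6, 0.2, 0.1]], "A", params)
utt2 = joint_embedding([0.0, -0.1, 0.2, 0.3], [[0.3, 0.3, 0.2]], "B", params)
print(classify_turn([utt1, utt2], params))  # probabilities over the classes
```

The sketch shows the structure only: variable-length prosodic sequences become fixed-size vectors, each utterance embedding carries speaker identity through its position in the space, and the classifier sees the whole history rather than a single utterance.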

© The Japanese Society for Artificial Intelligence 2019