ISCA Archive Interspeech 2008

Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection

Xiaodan Zhu, Xuming He, Cosmin Munteanu, Gerald Penn

This paper studies the automatic detection of topic transitions in recorded presentations. One way to do this is to match slide content directly against presentation transcripts using a similarity metric. Such literal matching, however, misses domain-specific knowledge and is sensitive to speech recognition errors. We therefore incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships between words, in particular domain-specific ones. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with literal matching. Experiments show that the proposed approach reduces errors in slide transition detection by 17.41% on manual transcripts and 27.37% on automatic transcripts.
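
The idea of combining literal matching with a similarity computed in topic space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract does not specify these details: the cosine metric, the linear interpolation weight `lam`, the toy `word_topics` table (standing in for per-word topic weights that would in practice come from an LDA model trained on domain text), and the averaging in `topic_vector`, which is a crude substitute for real LDA inference.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_vector(tokens, word_topics, num_topics):
    """Average the topic weights of known words.

    Assumption: this averaging only approximates what LDA inference
    would return for a document; it is not the paper's procedure.
    """
    totals = [0.0] * num_topics
    known = 0
    for token in tokens:
        if token in word_topics:
            for k, weight in enumerate(word_topics[token]):
                totals[k] += weight
            known += 1
    return {k: t / known for k, t in enumerate(totals)} if known else {}


def combined_similarity(slide, transcript, word_topics, num_topics, lam=0.5):
    """Interpolate literal (bag-of-words) and topic-space similarity.

    Assumption: linear interpolation with weight `lam` is one simple
    way to combine the two scores.
    """
    literal = cosine(Counter(slide), Counter(transcript))
    topical = cosine(
        topic_vector(slide, word_topics, num_topics),
        topic_vector(transcript, word_topics, num_topics),
    )
    return lam * literal + (1.0 - lam) * topical


# Hypothetical two-topic table; real weights would be learned from textbooks.
word_topics = {
    "dirichlet": [0.8, 0.2],
    "topic": [0.9, 0.1],
    "lda": [0.9, 0.1],
    "acoustic": [0.2, 0.8],
    "recognizer": [0.1, 0.9],
}

slide = ["dirichlet", "topic"]
related = ["lda"]                     # shares no word with the slide
unrelated = ["acoustic", "recognizer"]

score_related = combined_similarity(slide, related, word_topics, 2)
score_unrelated = combined_similarity(slide, unrelated, word_topics, 2)
```

In this toy setup the related transcript segment shares no words with the slide, so literal matching alone scores it zero, yet the topic-space term still ranks it above the unrelated segment.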


doi: 10.21437/Interspeech.2008-606

Cite as: Zhu, X., He, X., Munteanu, C., Penn, G. (2008) Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection. Proc. Interspeech 2008, 2443-2445, doi: 10.21437/Interspeech.2008-606

@inproceedings{zhu08c_interspeech,
  author={Xiaodan Zhu and Xuming He and Cosmin Munteanu and Gerald Penn},
  title={{Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2443--2445},
  doi={10.21437/Interspeech.2008-606}
}