A unified probabilistic generative framework for extractive spoken document summarization

Chen, Yi-Ting; Chiu, Hsuan-Sheng; Wang, Hsin-Min; Chen, Berlin

doi:10.21437/Interspeech.2007-723

In this paper, we consider extractive summarization of Chinese broadcast news speech. We propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, literal term matching and concept matching, are extensively investigated. We explore the hidden Markov model (HMM) and the relevance model (RM) for literal term matching, and the word topical mixture model (WTMM) for concept matching. To estimate the sentence prior probability, confidence scores, structural features, and a set of prosodic features are combined using the whole sentence maximum entropy (WSME) model. Experiments were performed on Chinese broadcast news collected in Taiwan, and the initial results are promising.
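
The ranking idea in the abstract, scoring each sentence S by its generative probability P(D|S) for the document D times its prior P(S), can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_sentences`, the interpolation weight `lam`, the unigram sentence model, and the use of the document itself as the background model are all assumptions made for illustration. The priors are passed in as plain numbers here, standing in for whatever a WSME-style model would estimate.

```python
import math
from collections import Counter


def rank_sentences(sentences, priors=None, lam=0.5):
    """Return sentence indices sorted by log P(D|S) + log P(S), best first.

    sentences: list of token lists making up one document (hypothetical input format).
    priors:    optional list of positive sentence prior probabilities P(S);
               uniform if omitted (assumption: the paper estimates these with WSME).
    lam:       interpolation weight between the sentence model and the
               background model (assumed value, not taken from the paper).
    """
    # Word counts of the whole document D; also used as the background model here.
    doc_counts = Counter(w for sent in sentences for w in sent)
    doc_len = sum(doc_counts.values())

    if priors is None:
        priors = [1.0 / len(sentences)] * len(sentences)

    scores = []
    for sent, prior in zip(sentences, priors):
        sent_counts = Counter(sent)
        sent_len = len(sent)
        log_lik = 0.0
        for word, count in doc_counts.items():
            # Unigram probability of the word under the sentence model.
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            # Background probability smooths words the sentence never contains.
            p_bg = count / doc_len
            log_lik += count * math.log(lam * p_sent + (1.0 - lam) * p_bg)
        scores.append(log_lik + math.log(prior))

    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    doc = [["a", "b"], ["c"], ["a", "b", "c", "a"]]
    print(rank_sentences(doc))
```

With uniform priors the ranking is driven entirely by how well each sentence predicts the document's words; a non-uniform `priors` list shifts the ranking toward sentences favored by the prior, which is the role the abstract assigns to confidence, structural, and prosodic features.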

Cite as: Chen, Y.-T., Chiu, H.-S., Wang, H.-M., Chen, B. (2007) A unified probabilistic generative framework for extractive spoken document summarization. Proc. Interspeech 2007, 2805-2808, doi: 10.21437/Interspeech.2007-723

@inproceedings{chen07d_interspeech,
  author={Yi-Ting Chen and Hsuan-Sheng Chiu and Hsin-Min Wang and Berlin Chen},
  title={{A unified probabilistic generative framework for extractive spoken document summarization}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2805--2808},
  doi={10.21437/Interspeech.2007-723}
}