Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines

Akita, Yuya; Saikou, Masahiro; Nanjo, Hiroaki; Kawahara, Tatsuya

doi:10.21437/Interspeech.2006-333

This paper presents two approaches to sentence boundary detection in spontaneous Japanese, one based on a statistical language model (SLM) and the other on support vector machines (SVM). In the SLM-based approach, linguistic likelihoods and the occurrence of pauses are used to determine sentence boundaries; to suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. In the SVM-based approach, SVMs are adopted to achieve classification that is robust to the wide variety of spoken expressions and to speech recognition errors. Detection is performed by an SVM-based text chunker that uses lexical and pause information as features. We evaluated both approaches on manual and automatic transcriptions of spontaneous lectures and speeches, achieving F-measures of 0.85 on the manual transcriptions and 0.78 on the automatic ones.
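The SLM-based decision and the F-measure evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy bigram table, the boundary token `<b>`, the linear pause term `pause_weight`, the `threshold`, and the use of end-of-sentence expressions as a hard filter on the preceding word are all hypothetical choices made for the example. The SVM-based chunker is not shown.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical sentence-boundary token inserted between words when scoring.
BOUNDARY = "<b>"

# Toy bigram model: (previous word, word) -> log-probability (illustrative values).
BigramLM = Dict[Tuple[str, str], float]


def logprob(lm: BigramLM, prev: str, word: str, floor: float = -10.0) -> float:
    """Bigram log-probability; unseen pairs fall back to a fixed floor value."""
    return lm.get((prev, word), floor)


def detect_boundaries(
    words: Sequence[str],
    pauses: Sequence[float],
    lm: BigramLM,
    end_expressions: Set[str],
    pause_weight: float = 1.0,
    threshold: float = 0.0,
) -> List[int]:
    """Return indices i such that a sentence boundary is placed after words[i].

    pauses[i] is the pause length (seconds) after words[i]; the final word is
    not scored because the end of the input is already a boundary.
    """
    boundaries: List[int] = []
    for i in range(len(words) - 1):
        left, right = words[i], words[i + 1]
        # Log-likelihood of "left <b> right" versus "left right".
        with_boundary = logprob(lm, left, BOUNDARY) + logprob(lm, BOUNDARY, right)
        without_boundary = logprob(lm, left, right)
        # Assumed linear combination of the likelihood ratio and the pause length.
        score = with_boundary - without_boundary + pause_weight * pauses[i]
        # Assumed heuristic filter: accept only after an end-of-sentence expression.
        if score > threshold and left in end_expressions:
            boundaries.append(i)
    return boundaries


def f_measure(predicted: Set[int], reference: Set[int]) -> float:
    """Harmonic mean of boundary precision and recall."""
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lm: BigramLM = {
        ("hare", "desu"): -0.5,
        ("desu", BOUNDARY): -0.2,
        (BOUNDARY, "ashita"): -1.0,
        ("desu", "ashita"): -4.0,
        ("ashita", "wa"): -0.7,
        ("wa", "ame"): -1.5,
        ("ame", "desu"): -0.6,
    }
    # Romanized toy input: "hare desu / ashita wa ame desu".
    words = ["hare", "desu", "ashita", "wa", "ame", "desu"]
    pauses = [0.0, 0.6, 0.0, 0.0, 0.0, 0.5]
    predicted = detect_boundaries(words, pauses, lm, {"desu", "masu"})
    print(predicted)  # [1]: boundary after the first "desu"
    print(f_measure(set(predicted), {1}))  # 1.0
```

The values of `pause_weight` and `threshold` here are arbitrary; in practice they would be tuned on held-out data to trade precision against recall, which is exactly what the F-measure summarizes.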

Cite as: Akita, Y., Saikou, M., Nanjo, H., Kawahara, T. (2006) Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines. Proc. Interspeech 2006, paper 1370-Tue2A2O.4, doi: 10.21437/Interspeech.2006-333

@inproceedings{akita06_interspeech,
  author={Yuya Akita and Masahiro Saikou and Hiroaki Nanjo and Tatsuya Kawahara},
  title={{Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1370-Tue2A2O.4},
  doi={10.21437/Interspeech.2006-333}
}