This paper presents two approaches to sentence boundary detection in spontaneous Japanese, one based on a statistical language model (SLM) and the other on support vector machines (SVM). In the SLM-based approach, linguistic likelihoods and the occurrence of pauses are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. The SVM-based approach, in contrast, is adopted to make classification robust against a wide variety of expressions and speech recognition errors. Detection is performed by an SVM-based text chunker using lexical and pause information as features. We evaluated these approaches on manual and automatic transcriptions of spontaneous lectures and speeches, and achieved F-measures of 0.85 and 0.78, respectively.
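As a rough illustration of the SLM-based idea, the sketch below hypothesizes a boundary only where a pause occurs, the preceding word matches an end-of-sentence expression, and a toy bigram model scores the word sequence higher with a boundary token inserted than without it. The abstract does not say how these cues are combined, so the bigram model, the add-k smoothing, the `margin` threshold, the romanized toy data, and all function names here are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

BOUNDARY = "</s>"  # hypothetical sentence-boundary token


def train_bigram(sentences):
    """Count context unigrams and bigrams; each sentence is wrapped in boundary tokens."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = [BOUNDARY] + list(words) + [BOUNDARY]
        unigrams.update(tokens[:-1])             # every token that acts as a context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    vocab_size = len(set(unigrams) | {BOUNDARY})
    return unigrams, bigrams, vocab_size


def logprob(model, prev, word, k=0.1):
    """Add-k smoothed bigram log-probability log P(word | prev)."""
    unigrams, bigrams, vocab_size = model
    return math.log((bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size))


def detect_boundaries(model, words, pauses, end_expressions, margin=0.0):
    """Return indices i where a boundary is hypothesized after words[i].

    pauses[i] is True when a pause follows words[i]. A boundary is accepted
    only if a pause is present, the word is a known end-of-sentence
    expression (assumed here to act as a hard filter against false alarms),
    and inserting the boundary token raises the bigram log-likelihood by
    more than `margin`.
    """
    found = []
    for i in range(len(words) - 1):
        if not pauses[i] or words[i] not in end_expressions:
            continue
        nxt = words[i + 1]
        with_b = logprob(model, words[i], BOUNDARY) + logprob(model, BOUNDARY, nxt)
        without_b = logprob(model, words[i], nxt)
        if with_b - without_b > margin:
            found.append(i)
    return found


# Toy usage with romanized words; real input would be a morpheme sequence
# with pause information from a transcript or recognizer output.
model = train_bigram([["kore", "wa", "hon", "desu"], ["sore", "wa", "pen", "desu"]])
words = ["kore", "wa", "hon", "desu", "sore", "wa", "pen", "desu"]
pauses = [False, True, False, True, False, False, False, False]
found = detect_boundaries(model, words, pauses, end_expressions={"desu"})
print(found)
```

In this toy run, the pause after "wa" is rejected because "wa" is not an end-of-sentence expression, which mirrors the role the abstract gives to heuristic patterns in suppressing false alarms.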
Cite as: Akita, Y., Saikou, M., Nanjo, H., Kawahara, T. (2006) Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines. Proc. Interspeech 2006, paper 1370-Tue2A2O.4, doi: 10.21437/Interspeech.2006-333
@inproceedings{akita06_interspeech,
  author={Yuya Akita and Masahiro Saikou and Hiroaki Nanjo and Tatsuya Kawahara},
  title={{Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1370-Tue2A2O.4},
  doi={10.21437/Interspeech.2006-333}
}