Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions

Chen, Qian; Ling, Zhen-Hua; Yang, Chen-Yu; Dai, Li-Rong

doi:10.21437/Interspeech.2015-367

This paper presents an automatic phrase boundary labeling method for speech synthesis database annotation using context-dependent hidden Markov models (CD-HMMs) and n-gram prior distributions. At the training stage, CD-HMMs are built to describe the conditional distribution of acoustic features given the phonetic labels and phrase boundaries. In addition, n-gram models are estimated to represent the prior distributions of the phrase boundaries to be predicted. At the decoding stage, the CD-HMMs and n-gram models are combined to predict the phrase boundaries by Viterbi decoding under the maximum a posteriori (MAP) criterion. In our experiments, compared with the method using only acoustic models, the proposed method utilizing context-dependent bigram prior distributions improved the F-score of phrase boundary labeling from 72.2% to 79.6% on the Boston University Radio News Corpus (BURNC) and from 69.6% to 81.0% on the Blizzard Challenge 2007 database.
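The decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each word juncture has already been reduced to a pair of acoustic log-likelihoods (one for "no boundary", one for "boundary"), whereas the paper scores full CD-HMM state sequences, and it uses a plain bigram prior rather than the context-dependent one. The function name `viterbi_map` and all toy probabilities are hypothetical.

```python
import math

def viterbi_map(acoustic_loglik, log_prior_init, log_prior_bigram):
    """Return the MAP boundary label sequence.

    acoustic_loglik[t][b]  : log p(o_t | b) at juncture t (assumed given)
    log_prior_init[b]      : log P(b_1 = b)
    log_prior_bigram[a][b] : log P(b_t = b | b_{t-1} = a)
    Labels: 0 = no phrase boundary, 1 = phrase boundary.
    """
    n_labels = len(log_prior_init)
    if not acoustic_loglik:
        return []
    # Initialise with prior plus acoustic score at the first juncture.
    score = [log_prior_init[b] + acoustic_loglik[0][b] for b in range(n_labels)]
    backpointers = []
    for t in range(1, len(acoustic_loglik)):
        new_score, ptr = [], []
        for b in range(n_labels):
            # Best predecessor label under the bigram prior.
            best_a = max(range(n_labels),
                         key=lambda a: score[a] + log_prior_bigram[a][b])
            new_score.append(score[best_a] + log_prior_bigram[best_a][b]
                             + acoustic_loglik[t][b])
            ptr.append(best_a)
        backpointers.append(ptr)
        score = new_score
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda b: score[b])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Toy inputs (made up for illustration only).
log = math.log
acoustic = [[log(0.3), log(0.7)],
            [log(0.4), log(0.6)],
            [log(0.9), log(0.1)]]
uniform_init = [log(0.5), log(0.5)]
uniform_bigram = [[log(0.5), log(0.5)], [log(0.5), log(0.5)]]
# A prior that makes two consecutive boundaries unlikely.
bigram = [[log(0.7), log(0.3)], [log(0.95), log(0.05)]]

acoustic_only = viterbi_map(acoustic, uniform_init, uniform_bigram)
with_prior = viterbi_map(acoustic, uniform_init, bigram)
```

In this toy case the acoustic scores alone weakly favour a boundary at the first two junctures, while the bigram prior suppresses the second of two adjacent boundaries. This is the kind of correction a boundary prior is meant to provide.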

Cite as: Chen, Q., Ling, Z.-H., Yang, C.-Y., Dai, L.-R. (2015) Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions. Proc. Interspeech 2015, 1581-1585, doi: 10.21437/Interspeech.2015-367

@inproceedings{chen15i_interspeech,
  author={Qian Chen and Zhen-Hua Ling and Chen-Yu Yang and Li-Rong Dai},
  title={{Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={1581--1585},
  doi={10.21437/Interspeech.2015-367}
}