Dynamic conditional random fields (DCRF) have been shown to outperform linear-chain conditional random fields (L-CRF) for punctuation prediction on conversational speech texts. In this paper, we integrate lexical, prosodic, and modified n-gram score features into the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using L-CRF or a maximum entropy (MaxEnt) model. We show the importance of various features using DCRF, L-CRF, MaxEnt, and a hidden-event n-gram model (HEN), respectively. In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features gives ~20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by L-CRF, MaxEnt, and HEN.
Index Terms: punctuation, dynamic conditional random fields, sentence boundary detection
Cite as: Wang, X., Ng, H.T., Sim, K.C. (2012) Dynamic conditional random fields for joint sentence boundary and punctuation prediction. Proc. Interspeech 2012, 1384-1387, doi: 10.21437/Interspeech.2012-398
@inproceedings{wang12h_interspeech, author={Xuancong Wang and Hwee Tou Ng and Khe Chai Sim}, title={{Dynamic conditional random fields for joint sentence boundary and punctuation prediction}}, year=2012, booktitle={Proc. Interspeech 2012}, pages={1384--1387}, doi={10.21437/Interspeech.2012-398} }