ISCA Archive Interspeech 2012

Dynamic conditional random fields for joint sentence boundary and punctuation prediction

Xuancong Wang, Hwee Tou Ng, Khe Chai Sim

Dynamic conditional random fields (DCRF) have been shown to outperform linear-chain conditional random fields (L-CRF) for punctuation prediction on conversational speech texts. In this paper, we combine lexical, prosodic, and modified n-gram score features in the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using L-CRF or a maximum entropy (MaxEnt) model. We show the importance of various features for each of DCRF, L-CRF, MaxEnt, and the hidden-event n-gram model (HEN). In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features gives ~20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by L-CRF, MaxEnt, and HEN.
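
The abstract reports results as an F1-measure and as a relative error reduction. As a minimal sketch of how the two quantities are commonly computed, the Python below may help. It assumes that "error" means 1 - F1, which the abstract does not state, and the function names and input values are hypothetical illustrations, not figures from the paper.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical scores, for illustration only (not results from the paper).
baseline_f1 = f1_measure(precision=0.80, recall=0.70)
improved_f1 = f1_measure(precision=0.84, recall=0.76)

# Assumption: error is defined as 1 - F1.
reduction = relative_error_reduction(1 - baseline_f1, 1 - improved_f1)
print(f"relative error reduction: {reduction:.1%}")
```

Under this assumed definition, a system whose error falls from 0.25 to 0.20 would show a 20% relative error reduction, even though the absolute gain is only five points.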

Index Terms: punctuation, dynamic conditional random fields, sentence boundary detection


doi: 10.21437/Interspeech.2012-398

Cite as: Wang, X., Ng, H.T., Sim, K.C. (2012) Dynamic conditional random fields for joint sentence boundary and punctuation prediction. Proc. Interspeech 2012, 1384-1387, doi: 10.21437/Interspeech.2012-398

@inproceedings{wang12h_interspeech,
  author={Xuancong Wang and Hwee Tou Ng and Khe Chai Sim},
  title={{Dynamic conditional random fields for joint sentence boundary and punctuation prediction}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={1384--1387},
  doi={10.21437/Interspeech.2012-398}
}