Multimedia content may be supplemented with time-aligned closed captions for accessibility. Often these captions are created manually by professional editors, an expensive and time-consuming process. In this paper, we present a novel approach to automatically creating a well-formatted, readable transcript for a video from closed captions or ASR output. Our approach uses acoustic and lexical features extracted from the video and from the raw transcription/caption files. We compare our approach against two standard baselines: a) silence-segmented transcripts and b) text-only segmented transcripts. We show that our approach outperforms both baselines on both subjective and objective metrics.
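To illustrate the kind of baseline the abstract refers to, here is a minimal sketch of silence-based segmentation: a stream of time-stamped words is broken into segments wherever the pause between consecutive words exceeds a threshold. The function name, data layout, and threshold value are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a silence-segmented baseline: split time-stamped
# words into segments at pauses longer than a threshold. The 0.5 s value
# is an assumption for illustration, not the paper's setting.

def silence_segment(words, pause_threshold=0.5):
    """words: list of (token, start_sec, end_sec) tuples.
    Returns a list of segments, each a list of tokens."""
    segments, current = [], []
    prev_end = None
    for token, start, end in words:
        # Start a new segment when the inter-word pause is long enough.
        if prev_end is not None and start - prev_end > pause_threshold:
            segments.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        segments.append(current)
    return segments
```

A pause-based rule like this needs no lexical information, which is why the paper pairs it with a text-only baseline for comparison.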
Cite as: Pappu, A., Stent, A. (2015) Automatic formatted transcripts for videos. Proc. Interspeech 2015, 2514-2518, doi: 10.21437/Interspeech.2015-542
@inproceedings{pappu15_interspeech,
  author={Aasish Pappu and Amanda Stent},
  title={{Automatic formatted transcripts for videos}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2514--2518},
  doi={10.21437/Interspeech.2015-542}
}