ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks

Pelloin, Valentin; Dary, Franck; Hervé, Nicolas; Favre, Benoit; Camelin, Nathalie; LAURENT, Antoine; Besacier, Laurent

doi:10.21437/Interspeech.2022-352

We aim to improve spoken language modeling (LM) using a very large amount of automatically transcribed speech. We leverage the INA (French National Audiovisual Institute) collection and obtain 19 GB of text after applying ASR to 350,000 hours of diverse TV shows. From this, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or by training an LM from scratch. The new models (FlauBERT-Oral) are shared with the community and evaluated on three downstream tasks: spoken language understanding, classification of TV shows, and speech syntactic parsing. Results show that FlauBERT-Oral can be beneficial compared to the original FlauBERT, demonstrating that, despite its inherently noisy nature, ASR-generated text can be used to build spoken language models.

Cite as: Pelloin, V., Dary, F., Hervé, N., Favre, B., Camelin, N., LAURENT, A., Besacier, L. (2022) ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks. Proc. Interspeech 2022, 3453-3457, doi: 10.21437/Interspeech.2022-352

@inproceedings{pelloin22_interspeech,
  author={Valentin Pelloin and Franck Dary and Nicolas Hervé and Benoit Favre and Nathalie Camelin and Antoine LAURENT and Laurent Besacier},
  title={{ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3453--3457},
  doi={10.21437/Interspeech.2022-352},
  issn={2308-457X}
}