ISCA Archive ICSLP 1990

Design considerations and text selection for BREF, a large French read-speech corpus

Jean-Luc Gauvain, Lori F. Lamel, Maxine Eskenazi

BREF, a large read-speech corpus in French, has been designed with several aims: to provide enough speech data to develop dictation machines, to provide data for evaluating continuous speech recognition systems (both speaker-dependent and speaker-independent), and to provide a corpus of continuous speech for studying phonological variations. This paper presents some of the design considerations of BREF, focusing on the text analysis and the selection of text materials. The texts to be read were selected from 5 million words of the French newspaper Le Monde. In total, 11,000 texts were selected, with an emphasis on maximizing the number of distinct triphones. Separate text materials were selected for the training and test corpora. The goal is to obtain about 10,000 words (approximately 60-70 min.) of speech from each of 100 speakers, from different French dialects.
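
The abstract does not state how the texts were chosen beyond the emphasis on distinct triphones. As a rough illustration only, the sketch below uses a greedy coverage heuristic, which is an assumption and not necessarily the authors' procedure: at each step it picks the text that adds the most triphones not yet covered. The names `triphones` and `greedy_select` are hypothetical, and each text is assumed to be already transcribed into a sequence of phone symbols (single characters stand in for phones in the toy data).

```python
def triphones(units):
    """Return the set of consecutive 3-unit windows in a phone sequence."""
    return {tuple(units[i:i + 3]) for i in range(len(units) - 2)}


def greedy_select(texts, budget):
    """Pick up to `budget` texts, each time taking the text that adds
    the most not-yet-covered triphones (assumed heuristic, not from the paper)."""
    covered = set()
    chosen = []
    remaining = {i: triphones(t) for i, t in enumerate(texts)}
    while remaining and len(chosen) < budget:
        # Largest gain wins; ties go to the lower index.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining text contributes a new triphone
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Toy data: each character stands in for one phone symbol.
toy_texts = ["abcd", "bcde", "abcde", "xyz"]
chosen, covered = greedy_select(toy_texts, budget=2)
print(chosen, len(covered))
```

On the toy data the third text is taken first because it alone covers three triphones, after which only the last text adds anything new. A real selection would also have to keep training and test texts separate, as the abstract notes.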


doi: 10.21437/ICSLP.1990-287

Cite as: Gauvain, J.-L., Lamel, L.F., Eskenazi, M. (1990) Design considerations and text selection for BREF, a large French read-speech corpus. Proc. First International Conference on Spoken Language Processing (ICSLP 1990), 1097-1100, doi: 10.21437/ICSLP.1990-287

@inproceedings{gauvain90_icslp,
  author={Jean-Luc Gauvain and Lori F. Lamel and Maxine Eskenazi},
  title={{Design considerations and text selection for BREF, a large French read-speech corpus}},
  year=1990,
  booktitle={Proc. First International Conference on Spoken Language Processing (ICSLP 1990)},
  pages={1097--1100},
  doi={10.21437/ICSLP.1990-287}
}