BREF, a large vocabulary spoken corpus for French

Lamel, Lori F.; Gauvain, Jean-Luc; Eskenazi, Maxine

doi:10.21437/Eurospeech.1991-126

This paper presents some of the design considerations of BREF, a large read-speech corpus for French. BREF was designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and for the study of phonological variations. The texts to be read were selected from 5 million words of the French newspaper Le Monde. In total, 11,000 texts were selected, with selection criteria that emphasized maximizing the number of distinct triphones. Separate text materials were selected for the training and test corpora. Ninety speakers have been recorded, each providing between 5,000 and 10,000 words (approximately 40-70 min) of speech.
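
The abstract does not describe the selection procedure itself. As an illustration only, one common way to favor texts that maximize the number of distinct triphones is a greedy coverage heuristic, sketched below. The names `triphones` and `greedy_select`, the input format (each text already transcribed as a list of phone symbols), and the stopping rule are all assumptions made for this sketch, not details taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Triphone = Tuple[str, ...]


def triphones(phones: List[str]) -> Set[Triphone]:
    """Return the set of distinct 3-phone windows in a phone sequence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def greedy_select(texts: Dict[str, List[str]], max_texts: int) -> List[str]:
    """Greedily pick texts that add the most not-yet-covered triphones.

    `texts` maps a (hypothetical) text identifier to its phone sequence.
    Selection stops after `max_texts` picks or when no remaining text
    contributes a new triphone.
    """
    covered: Set[Triphone] = set()
    remaining = {name: triphones(p) for name, p in texts.items()}
    chosen: List[str] = []
    while remaining and len(chosen) < max_texts:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        if not remaining[best] - covered:
            break  # nothing new to gain from any remaining text
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen
```

For example, calling `greedy_select` on a handful of transcribed sentences with `max_texts=2` would return the identifiers of the two sentences that together cover the most distinct triphones under this heuristic. A greedy pass does not guarantee an optimal selection, but it is simple and scales to large candidate pools.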

Cite as: Lamel, L.F., Gauvain, J.-L., Eskenazi, M. (1991) BREF, a large vocabulary spoken corpus for French. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 505-508, doi: 10.21437/Eurospeech.1991-126

@inproceedings{larnel91_eurospeech,
  author={Lori F. Lamel and Jean-Luc Gauvain and Maxine Eskenazi},
  title={{BREF, a large vocabulary spoken corpus for French}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={505--508},
  doi={10.21437/Eurospeech.1991-126}
}