Abstract
This paper presents a new method for automatic speaker recognition. The principle is to split the whole spectral domain into partial frequency subbands, apply a recognizer independently to each subband, and recombine the outputs to yield a global recognition decision. In this article, we discuss in particular the selection of the most critical subbands for the speaker recognition task and the choice of an optimal division of the frequency domain.
Speaker recognition experiments are conducted on different subbands for the 630-speaker population of TIMIT using second-order statistical methods. Large differences in identification performance between subbands are observed. In particular, the low-frequency subbands (under 600 Hz) and the high-frequency subbands (over 2000 Hz) are more speaker-specific than the middle-frequency ones. An appropriate selection of the most critical subbands shows that very good performance is still obtained with only half of the frequency domain.
Finally, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for the speaker recognition task. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, efficient recombination procedures need to be investigated to improve speaker identification results.
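As a rough illustration of the subband principle summarized above, the following Python sketch scores each subband independently with a covariance-based measure and recombines the per-band scores by summation. It is not the authors' implementation. The abstract does not say which second-order measure, band layout, or recombination rule was used, so the arithmetic-harmonic sphericity measure, the contiguous equal-width bands, the sum rule, and every function name here are assumptions made for illustration only.

```python
import numpy as np


def split_into_subbands(n_channels, band_size):
    """Group filterbank channel indices into contiguous subbands (assumed layout)."""
    return [list(range(start, min(start + band_size, n_channels)))
            for start in range(0, n_channels, band_size)]


def subband_covariances(frames, bands):
    """Estimate one covariance matrix per subband.

    frames: array of shape (n_frames, n_channels), e.g. log filterbank energies.
    Correlations between channels of different subbands are discarded here.
    """
    return [np.atleast_2d(np.cov(frames[:, band], rowvar=False)) for band in bands]


def sphericity_distance(cov_x, cov_y):
    """Arithmetic-harmonic sphericity measure between two covariance matrices.

    Chosen here as one example of a second-order statistical measure (assumption).
    It is zero when the two matrices are proportional and positive otherwise.
    """
    p = cov_x.shape[0]
    a = np.trace(cov_y @ np.linalg.inv(cov_x))
    b = np.trace(cov_x @ np.linalg.inv(cov_y))
    return float(np.log(a * b / p ** 2))


def identify(test_frames, speaker_models, bands):
    """Return the speaker whose summed per-subband distance is smallest.

    speaker_models: dict mapping speaker name to a list of per-subband covariances.
    Summing the distances is a simple recombination rule (assumption).
    """
    test_covs = subband_covariances(test_frames, bands)
    totals = {
        name: sum(sphericity_distance(model, test)
                  for model, test in zip(covs, test_covs))
        for name, covs in speaker_models.items()
    }
    return min(totals, key=totals.get)
```

Restricting `bands` to a subset of the subbands gives a simple way to experiment with subband selection. Because each covariance matrix covers only the channels of one subband, this sketch also shows how cross-band correlations are lost once the frequency domain is divided.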
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Besacier, L., Bonastre, JF. (1997). Subband approach for automatic speaker recognition: Optimal division of the frequency domain. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015996
DOI: https://doi.org/10.1007/BFb0015996
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive