Abstract
Speaker Identification (SI) has numerous real-world applications. Earlier SI systems used traditional classifiers such as Gaussian Mixture Models (GMM), Support Vector Machines (SVM), and Hidden Markov Models (HMM). For these, features such as Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) must be generated first. These approaches do not perform well when the audio data are captured through multiple devices and recorded in different environments, i.e., under mismatch conditions. Machine Learning (ML) algorithms, by contrast, usually provide better accuracy and have hence become more popular. Restricted Boltzmann Machines (RBM), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNN) are some of the ML approaches applied to SI. In this paper, a CNN is used for automatic feature extraction and speaker classification on the noisy IITG-MV dataset. The CNN performs better than the GMM, especially in the device-mismatch case.
Acknowledgements
This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the TEQIP-II, PURSE-II, and UPE-II projects of the Govt. of India. Subhadip Basu is partially supported by the Research Award (F.30-31/2016(SA-II)) from UGC, Government of India. This work is also supported by the project sponsored by SERB (Government of India, order no. SB/S3/EECE/054/2016, dated 25/11/2016).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chakraborty, T., Barai, B., Chatterjee, B., Das, N., Basu, S., Nasipuri, M. (2020). Closed-Set Device-Independent Speaker Identification Using CNN. In: Bhateja, V., Satapathy, S., Zhang, YD., Aradhya, V. (eds) Intelligent Computing and Communication. ICICC 2019. Advances in Intelligent Systems and Computing, vol 1034. Springer, Singapore. https://doi.org/10.1007/978-981-15-1084-7_28
DOI: https://doi.org/10.1007/978-981-15-1084-7_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1083-0
Online ISBN: 978-981-15-1084-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)