Closed-Set Device-Independent Speaker Identification Using CNN

  • Conference paper
Intelligent Computing and Communication (ICICC 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1034)

Abstract

Speaker Identification (SI) has numerous real-world applications. Traditional classifiers such as Gaussian Mixture Models (GMM), Support Vector Machines (SVM), and Hidden Markov Models (HMM) were used earlier for SI, and they require features such as Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) to be generated first. However, these approaches do not perform well when the audio data is captured through multiple devices and recorded in different environments, i.e., under mismatch conditions. Machine Learning (ML) algorithms usually provide better accuracy and have therefore become more popular. Restricted Boltzmann Machines (RBM), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN) are some of the ML approaches applied to SI. In this paper, a CNN is used for automatic feature extraction and speaker classification on the IITG-MV noisy dataset. The CNN performs better than GMM, especially in the device-mismatch case.

Acknowledgements

This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the TEQIP-II, PURSE-II, and UPE-II projects of the Govt. of India. Subhadip Basu is partially supported by the Research Award (F.30-31/2016(SA-II)) from UGC, Government of India. This work is also supported by the project sponsored by SERB (Government of India, order no. SB/S3/EECE/054/2016, dated 25/11/2016).

Author information

Corresponding author

Correspondence to Tapas Chakraborty.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chakraborty, T., Barai, B., Chatterjee, B., Das, N., Basu, S., Nasipuri, M. (2020). Closed-Set Device-Independent Speaker Identification Using CNN. In: Bhateja, V., Satapathy, S., Zhang, YD., Aradhya, V. (eds) Intelligent Computing and Communication. ICICC 2019. Advances in Intelligent Systems and Computing, vol 1034. Springer, Singapore. https://doi.org/10.1007/978-981-15-1084-7_28
