Spin-Image Descriptors for Text-Independent Speaker Recognition

Mohammed, Suhaila N.; Jabir, Adnan J.; Abbas, Zaid Ali

doi:10.1007/978-3-030-33582-3_21

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1073)

Included in the following conference series:

International Conference of Reliable Information and Communication Technology

Abstract

A system that identifies individuals from recordings of their speech has applications in diverse areas, such as telephone shopping, voice mail and security control. Building such a system is difficult, however, because human voices vary so widely, which makes the selection of strong features crucial. This paper therefore proposes a speaker recognition system based on new spin-image descriptors (SISR). In the proposed system, circular windows (spins) are extracted from the frequency domain of the spectrogram image of the sound, and a run length matrix is built for each spin to serve as the basis for feature extraction. Five different descriptors are generated from the run length matrix within each spin, and the final feature vector is fed to a deep belief network for classification. The proposed SISR system is evaluated on the English Language Speech Database for Speaker Recognition (ELSDSR). In the experiments it achieved an accuracy of 96.46%, outperforming the systems reported in related recent work in terms of recognition accuracy.
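
The abstract does not give implementation details, so the following is only a minimal Python sketch of the general idea of building a run length matrix inside one circular window and deriving descriptors from it. Everything specific here is an assumption for illustration: the function names, the number of gray levels, the horizontal run direction, and the choice of the five classic run-length statistics (short/long run emphasis, gray-level and run-length non-uniformity, run percentage). The paper's actual five descriptors may differ.

```python
import numpy as np

def quantise(patch, levels):
    """Map a real-valued patch to integer gray levels 0..levels-1 (assumed step)."""
    lo, hi = patch.min(), patch.max()
    scaled = (patch - lo) / (hi - lo) if hi > lo else np.zeros_like(patch, dtype=float)
    return np.minimum((scaled * levels).astype(int), levels - 1)

def circular_mask(size, radius):
    """Boolean mask of a disc centred in a size x size window (one 'spin')."""
    c = (size - 1) / 2.0
    yy, xx = np.ogrid[:size, :size]
    return (yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2

def run_length_matrix(window, mask, levels):
    """Horizontal run length matrix restricted to the masked pixels.

    rlm[g, r - 1] counts runs of length r at gray level g.
    """
    rlm = np.zeros((levels, window.shape[1]), dtype=np.int64)
    for row, keep in zip(window, mask):
        run_level, run_len = 0, 0
        for value, inside in zip(row, keep):
            value = int(value)
            if inside and run_len and value == run_level:
                run_len += 1  # the current run continues
            else:
                if run_len:  # close the run that just ended
                    rlm[run_level, run_len - 1] += 1
                run_level, run_len = (value, 1) if inside else (0, 0)
        if run_len:  # close a run that reaches the end of the row
            rlm[run_level, run_len - 1] += 1
    return rlm

def run_length_descriptors(rlm):
    """Five classic run-length statistics (an assumed choice, not the paper's)."""
    rlm = rlm.astype(float)
    n_runs = rlm.sum()
    lengths = np.arange(1, rlm.shape[1] + 1)
    n_pixels = (rlm * lengths).sum()
    return {
        "short_run_emphasis": (rlm / lengths ** 2).sum() / n_runs,
        "long_run_emphasis": (rlm * lengths ** 2).sum() / n_runs,
        "gray_level_nonuniformity": (rlm.sum(axis=1) ** 2).sum() / n_runs,
        "run_length_nonuniformity": (rlm.sum(axis=0) ** 2).sum() / n_runs,
        "run_percentage": n_runs / n_pixels,
    }

if __name__ == "__main__":
    # Random data standing in for a spectrogram patch (hypothetical input).
    rng = np.random.default_rng(0)
    patch = quantise(rng.random((9, 9)), levels=8)
    spin = circular_mask(9, radius=4)
    print(run_length_descriptors(run_length_matrix(patch, spin, levels=8)))
```

In a full pipeline, the descriptors from all spins would presumably be concatenated into the final feature vector that is passed to the classifier; how the spins are placed on the spectrogram is not stated in the abstract.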

Author information

Authors and Affiliations

Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq
Suhaila N. Mohammed & Adnan J. Jabir
Department of Electrical and Electronic Engineering, Faculty of Engineering, Universiti Putra Malaysia, 43300, Seri Kembangan, Selangor, Malaysia
Zaid Ali Abbas

Corresponding author

Correspondence to Suhaila N. Mohammed.

Editor information

Editors and Affiliations

College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Faisal Saeed
School of Computing, Universiti Utara Malaysia (UUM), Sintok, Kedah Darul Aman, Malaysia
Fathey Mohammed
Management of Information Systems Department, College of Business Administration, Taibah University, Yanbu, Saudi Arabia
Nadhmi Gazem

Cite this paper

Mohammed, S.N., Jabir, A.J., Abbas, Z.A. (2020). Spin-Image Descriptors for Text-Independent Speaker Recognition. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_21

DOI: https://doi.org/10.1007/978-3-030-33582-3_21
Published: 02 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33581-6
Online ISBN: 978-3-030-33582-3
eBook Packages: Intelligent Technologies and Robotics (R0)
