Abstract
This research investigates the application of automatic speech recognition in taxi call services. Compared with traditional query handling channels such as live agents, Interactive Voice Response systems, type-based websites, and mobile applications, speech recognition, a recent trend in artificial intelligence, allows conversations to be conducted in a more natural way. The open-source speech recognition toolkits Kaldi and CMUSphinx were used to develop, train, and test the system. Approximately 4 h of Azerbaijani speech data were processed with both tools. Testing was carried out in two ways: recognizing a dataset from unknown speakers, and recognizing a shuffled dataset. In these tests, variance and speed were examined along with accuracy. Kaldi showed accuracy between 97.3 and 99.6, with variance ranging from 0.03 to 4.8. CMUSphinx attained accuracy between 95.6 and 97.8, with variance values of 0.2 and 3.8, in relatively less training time. The results were compared and used to define appropriate parameters for the investigated models.
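The two test protocols and the reported statistics can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the utterance layout (`speaker` and `text` fields), the helper names, the use of word-level accuracy, and the choice of population variance are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from statistics import mean, pvariance


def speaker_disjoint_split(utterances, test_speakers):
    """Unknown-speaker test: hold out every utterance of the chosen speakers."""
    train = [u for u in utterances if u["speaker"] not in test_speakers]
    test = [u for u in utterances if u["speaker"] in test_speakers]
    return train, test


def shuffled_split(utterances, test_fraction, seed):
    """Shuffled test: mix all utterances, then cut off a test portion.

    Speakers may appear in both the training and the test part.
    """
    pool = list(utterances)
    random.Random(seed).shuffle(pool)
    n_test = int(len(pool) * test_fraction)
    return pool[n_test:], pool[:n_test]


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_accuracy(pairs):
    """Percent word accuracy over (reference, hypothesis) transcript pairs.

    Assumption: accuracy = 100 * (1 - word errors / reference words).
    """
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return 100.0 * (1.0 - errors / words)


def summarize(accuracies):
    """Mean and population variance of accuracy across repeated test runs."""
    return mean(accuracies), pvariance(accuracies)
```

In such a setup, each protocol would be repeated over several splits, the recognizer's output scored with `word_accuracy`, and the per-run scores passed to `summarize` to obtain the mean and variance that are compared between toolkits.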
Acknowledgment
This work was carried out at the Center for Data Analytics Research at ADA University and at the Research and Development Laboratory at ATL Tech.
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Rustamov, S., Akhundova, N., Valizada, A. (2019). Automatic Speech Recognition in Taxi Call Service Systems. In: Miraz, M., Excell, P., Ware, A., Soomro, S., Ali, M. (eds) Emerging Technologies in Computing. iCETiC 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 285. Springer, Cham. https://doi.org/10.1007/978-3-030-23943-5_18
DOI: https://doi.org/10.1007/978-3-030-23943-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23942-8
Online ISBN: 978-3-030-23943-5
eBook Packages: Computer Science, Computer Science (R0)