Abstract
In this paper, we present a software pipeline that automates the creation of speech recognition training datasets from the desired unlabeled audio for low-resource languages and domain-specific areas. As speech recognition becomes a commodity, more teams are building domain-specific models as well as models for local languages. At the same time, the lack of training datasets for low- to middle-resource languages significantly limits the ability to exploit the latest achievements and frameworks in speech recognition, and it restricts the range of software engineers who can work on speech recognition problems. This problem is even more critical for domain-specific datasets. The pipeline was tested on building speech recognition for the Ukrainian language, which confirmed that the design is adaptable to different data source formats and can be extended to integrate with existing frameworks.
Acknowledgments
This scientific work was partially supported by RAMECS and self-determined research funds of CCNU from the colleges’ primary research and operation of MOE (CCNU19TS022). The research team is grateful to Ender Turing OÜ for defining the business problem, comments, corrections, inspiration, and computational resources.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Romanovskyi, O., Iosifov, I., Iosifova, O., Sokolov, V., Kipchuk, F., Sukaylo, I. (2021). Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education IV. ICCSEEA 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-030-80472-5_3
DOI: https://doi.org/10.1007/978-3-030-80472-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80471-8
Online ISBN: 978-3-030-80472-5
eBook Packages: Intelligent Technologies and Robotics (R0)