Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition

Romanovskyi, O.; Iosifov, I.; Iosifova, O.; Sokolov, V.; Kipchuk, F.; Sukaylo, I.

doi:10.1007/978-3-030-80472-5_3

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 83)

Included in the following conference series:

International Conference on Computer Science, Engineering and Education Applications

Abstract

In this paper, we present a software pipeline that automates the creation of speech recognition training datasets from unlabeled audio chosen by the user, targeting low-resource languages and domain-specific areas. As speech recognition becomes a commodity, more teams are building domain-specific models as well as models for local languages. At the same time, the lack of training datasets for low- to middle-resource languages significantly limits the ability to exploit the latest achievements and frameworks in speech recognition, and restricts the range of software engineers who can work on speech recognition problems. This problem is even more critical for domain-specific datasets. The pipeline was tested by building speech recognition for the Ukrainian language, which confirmed that the design is adaptable to different data source formats and can be extended to integrate with existing frameworks.

Acknowledgments

This scientific work was partially supported by RAMECS and self-determined research funds of CCNU from the colleges’ primary research and operation of MOE (CCNU19TS022). The research team is grateful to Ender Turing OÜ for defining the business problem, comments, corrections, inspiration, and computational resources.

Author information

Authors and Affiliations

Ender Turing OÜ, Tallinn, Estonia
O. Romanovskyi, I. Iosifov & O. Iosifova
Borys Grinchenko Kyiv University, Kyiv, Ukraine
V. Sokolov, F. Kipchuk & I. Sukaylo

Corresponding author

Correspondence to V. Sokolov.

Editor information

Editors and Affiliations

School of Educational Information Technology, Central China Normal University, Wuhan, China
Zhengbing Hu
Mechanical Engineering Research Institute of the Russian Academy of Sciences, Moscow, Russia
Sergey Petoukhov
Faculty of Applied Mathematics, National Technical University of Ukraine “Igor Sikorsky Kiev Polytechnic Institute”, Kiev, Ukraine
Ivan Dychka
Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Plantation, FL, USA
Matthew He

About this paper

Cite this paper

Romanovskyi, O., Iosifov, I., Iosifova, O., Sokolov, V., Kipchuk, F., Sukaylo, I. (2021). Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education IV. ICCSEEA 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-030-80472-5_3

DOI: https://doi.org/10.1007/978-3-030-80472-5_3
Published: 21 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80471-8
Online ISBN: 978-3-030-80472-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
