
Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition

Conference paper in Advances in Computer Science for Engineering and Education IV (ICCSEEA 2021)

Abstract

In this paper, we present a software pipeline that automates the creation of speech recognition training datasets from unlabeled audio for low-resource languages and domain-specific areas. As speech recognition becomes commoditized, more teams are building domain-specific models as well as models for local languages. At the same time, the lack of training datasets for low- to middle-resource languages significantly limits the ability to exploit the latest achievements and frameworks in speech recognition, and it restricts the range of software engineers who can work on speech recognition problems. This problem is even more critical for domain-specific datasets. We tested the pipeline by building Ukrainian speech recognition and confirmed that the design adapts to different data source formats and can be extended to integrate with existing frameworks.
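The abstract does not describe the pipeline's internals. Purely as an illustration of the two properties it claims (adaptability to different source formats and extensibility), here is a minimal Python sketch of a stage-based design: format-specific adapters normalize raw inputs into one common record, and the pipeline then runs a list of composable stages. Everything here is a hypothetical assumption, not the authors' implementation: the `Utterance` schema, the adapter names `call_center` and `podcast`, the duration thresholds, and the `pseudo_label` stage, which accepts any transcription callable (a stand-in is used in the example).

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Utterance:
    """Common record that every source adapter produces (hypothetical schema)."""
    audio_path: str
    duration_s: float
    transcript: Optional[str] = None
    meta: Dict[str, str] = field(default_factory=dict)


# A stage takes a batch of utterances and returns a new batch.
Stage = Callable[[List[Utterance]], List[Utterance]]

# Registry mapping a source-format name to an adapter that yields Utterances.
ADAPTERS: Dict[str, Callable[[Iterable[dict]], List[Utterance]]] = {}


def adapter(fmt: str):
    """Decorator that registers an adapter for a named source format."""
    def register(fn):
        ADAPTERS[fmt] = fn
        return fn
    return register


@adapter("call_center")
def from_call_center(rows: Iterable[dict]) -> List[Utterance]:
    # Hypothetical input format: {"file": str, "ms": int}
    return [Utterance(r["file"], r["ms"] / 1000.0, meta={"src": "call_center"})
            for r in rows]


@adapter("podcast")
def from_podcast(rows: Iterable[dict]) -> List[Utterance]:
    # Hypothetical input format: {"path": str, "seconds": str | float}
    return [Utterance(r["path"], float(r["seconds"]), meta={"src": "podcast"})
            for r in rows]


def duration_filter(min_s: float, max_s: float) -> Stage:
    """Keep only clips whose duration lies in [min_s, max_s]."""
    def stage(batch: List[Utterance]) -> List[Utterance]:
        return [u for u in batch if min_s <= u.duration_s <= max_s]
    return stage


def pseudo_label(transcribe: Callable[[str], str]) -> Stage:
    """Attach a transcript from any pluggable ASR callable (path -> text)."""
    def stage(batch: List[Utterance]) -> List[Utterance]:
        return [replace(u, transcript=transcribe(u.audio_path)) for u in batch]
    return stage


def run_pipeline(fmt: str, rows: Iterable[dict], stages: List[Stage]) -> List[Utterance]:
    """Normalize raw rows with the registered adapter, then apply stages in order."""
    if fmt not in ADAPTERS:
        raise ValueError(f"no adapter registered for {fmt!r}")
    batch = ADAPTERS[fmt](rows)
    for stage in stages:
        batch = stage(batch)
    return batch


def to_manifest(batch: List[Utterance]) -> List[str]:
    """Export tab-separated lines: path, duration in seconds, transcript."""
    return [f"{u.audio_path}\t{u.duration_s:.2f}\t{u.transcript or ''}" for u in batch]


if __name__ == "__main__":
    def fake_asr(path: str) -> str:
        # Stand-in for a real ASR model; replace with actual inference.
        return f"transcript of {path}"

    rows = [{"file": "a.wav", "ms": 4200}, {"file": "b.wav", "ms": 45000}]
    batch = run_pipeline("call_center", rows,
                         [duration_filter(1.0, 30.0), pseudo_label(fake_asr)])
    print(to_manifest(batch))  # ['a.wav\t4.20\ttranscript of a.wav']
```

In this sketch, supporting a new source format means registering one more adapter, and adding a processing step (for example, a confidence filter) means appending one more function to the stage list; neither change touches existing code.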



Acknowledgments

This scientific work was partially supported by RAMECS and self-determined research funds of CCNU from the colleges’ primary research and operation of MOE (CCNU19TS022). The research team is grateful to Ender Turing OÜ for defining the business problem and for comments, corrections, inspiration, and computational resources.

Author information

Correspondence to V. Sokolov.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Romanovskyi, O., Iosifov, I., Iosifova, O., Sokolov, V., Kipchuk, F., Sukaylo, I. (2021). Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education IV. ICCSEEA 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-030-80472-5_3
