Abstract
In this paper we present the results of our experiments on building a binary classifier to identify stammering in Hindi speech data. We train several Sequential CNN models, varying parameters such as colour mode, image size, and training-data shape to tune classification performance. Our experimental pipeline converts speech samples into spectrograms using Librosa and trains the Sequential CNN classifier on the resulting image data using TensorFlow Lite. Our classification models achieve more than 95% accuracy on this task.
Data availability
This article forms part of an ongoing doctoral project. To avoid data breaches and to uphold intellectual property safeguards, the researchers plan to make the research dataset available upon reasonable request (https://shivamdwivedi.com/resources) once the thesis is complete.
Acknowledgements
First and foremost, we wish to express our profound gratitude to our research subjects. Their dedication and active involvement not only made the data collection drive a success but also enriched this research with invaluable speech data. Their unwavering support has been the bedrock upon which this work stands. Equally essential to the completion of this work was the guidance of Dr. Anil Thakur. His insights and direction have been instrumental in shaping our research. Additionally, we are deeply indebted to Dr. Sukomal Pal, whose discerning critiques and constructive feedback have been invaluable in refining our approach and processes. To all of you who have been part of this journey with us, we extend our heartfelt thanks.
Ethics declarations
Disclosures
Research participants were recruited for this study on a voluntary basis, and none of them received monetary compensation for their involvement. The project garnered no external funding; all research-related expenses were borne by the authors themselves. The authors declare that they have no conflict of interest pertaining to this research. It is further confirmed that external entities had no involvement in the study's design, data collection, analysis, interpretation of results, or the decision to publish. The findings presented here are derived solely from the data collected and analyzed by the authors, independent of any external influence.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dwivedi, S., Ghosh, S. & Dwivedi, S. Binary classifier for identification of stammering instances in Hindi speech data. Int J Speech Technol 26, 765–774 (2023). https://doi.org/10.1007/s10772-023-10046-9