Enhanced Sound Recognition and Classification Through Spectrogram Analysis, MEMS Sensors, and PyTorch: A Comprehensive Approach

Spournias, Alexandros; Nanos, Nikolaos; Faliagka, Evanthia; Antonopoulos, Christos; Voros, Nikolaos; Keramidas, Giorgos

doi:10.1007/978-3-031-54521-4_1

Alexandros Spournias (ORCID: orcid.org/0000-0001-6706-6635)

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 561)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing


Abstract

The importance of sound recognition and classification systems in various fields has led researchers to seek innovative methods to address these challenges. In this paper, the authors propose a concise yet effective approach for sound recognition and classification by combining spectrogram analysis, Micro-Electro-Mechanical Systems (MEMS) sensors, and the PyTorch deep learning framework. This method utilizes the rich information in audio signals to develop a robust and accurate sound recognition and classification system.

The authors outline a three-stage process: data acquisition, feature extraction, and classification. MEMS sensors are employed for data acquisition, offering advantages such as reduced noise, low power consumption, and enhanced sensitivity compared to traditional microphones. The acquired audio signals are then preprocessed and converted into spectrograms, visually representing the audio data’s frequency, amplitude, and temporal attributes.
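The conversion from a preprocessed audio signal to a spectrogram can be sketched as follows. This is a minimal illustration using a short-time Fourier transform in NumPy, not the authors' pipeline: the function name `spectrogram_db`, the 16 kHz sample rate, the 256-sample Hann window, and the 128-sample hop are all assumptions chosen for the example, and a synthetic 1 kHz tone stands in for a MEMS microphone recording.

```python
import numpy as np


def spectrogram_db(signal, n_fft=256, hop=128):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    Hypothetical helper: window length and hop size are illustrative choices.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One real FFT per frame; transpose so rows are frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Convert amplitude to decibels; the small offset avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10)


sample_rate = 16000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz test tone

spec = spectrogram_db(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index to frequency in Hz
```

Each column of `spec` describes one time frame and each row one frequency bin, so the array carries the frequency, amplitude, and temporal attributes mentioned above in a form that an image-style classifier can consume.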

During feature extraction, the spectrograms are analyzed to extract significant features conducive to sound recognition and classification. The classification task is performed using a custom deep learning model in PyTorch, leveraging modern neural networks’ pattern recognition capabilities. The model is trained and validated on a diverse dataset of audio samples, ensuring its proficiency in recognizing and classifying various sound types.
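A classifier of this kind might look like the sketch below. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the class name `SpectrogramCNN`, the two convolutional layers, the four output classes, and the random input batch. Only the use of PyTorch, a softmax over class scores, and a cross-entropy loss are suggested by the source and its reference list.

```python
import torch
from torch import nn


class SpectrogramCNN(nn.Module):
    """Hypothetical small CNN that maps a spectrogram to class scores."""

    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            # Global average pooling makes the model independent of input size.
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, x):
        # x has shape (batch, 1, freq_bins, time_frames).
        return self.classifier(self.features(x).flatten(1))


torch.manual_seed(0)
model = SpectrogramCNN(n_classes=4)

# Random stand-in for a batch of two single-channel spectrograms.
batch = torch.randn(2, 1, 129, 124)
labels = torch.tensor([0, 3])  # made-up class labels

logits = model(batch)
# CrossEntropyLoss applies log-softmax internally, so it takes raw logits.
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()  # gradients for one training step

# Softmax turns logits into per-class probabilities at inference time.
probs = torch.softmax(logits, dim=1)
```

In a real training loop the backward pass would be followed by an optimizer step and repeated over labelled batches, with a held-out split used for validation.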

The experimental results demonstrate the effectiveness of the proposed method, surpassing existing techniques in sound recognition and classification performance. By integrating spectrogram analysis, MEMS sensors, and PyTorch, the authors present a compact yet powerful sound recognition system with potential applications in numerous domains, such as predictive maintenance, environmental monitoring, and personalized voice-controlled devices.

Acknowledgment

This work was funded by the European Union under Grant Agreement No. 101087257. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Author information

Authors and Affiliations

Electrical and Computer Engineering Department, University of the Peloponnese, Patra, Greece
Alexandros Spournias, Nikolaos Nanos, Evanthia Faliagka, Christos Antonopoulos & Nikolaos Voros
School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Giorgos Keramidas

Corresponding author

Correspondence to Alexandros Spournias.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Xi’an Jiaotong-Liverpool University, Suzhou, China
Xinheng Wang
University of Peloponnese, Patra, Greece
Nikolaos Voros

About this paper

Cite this paper

Spournias, A., Nanos, N., Faliagka, E., Antonopoulos, C., Voros, N., Keramidas, G. (2024). Enhanced Sound Recognition and Classification Through Spectrogram Analysis, MEMS Sensors, and PyTorch: A Comprehensive Approach. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-54521-4_1

DOI: https://doi.org/10.1007/978-3-031-54521-4_1
Published: 23 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54520-7
Online ISBN: 978-3-031-54521-4
eBook Packages: Computer Science, Computer Science (R0)
