A Robust, Real-Time Voice Activity Detection Algorithm for Embedded Mobile Devices

Wu, Bian; Ren, Xiaolin; Liu, Chongqing; Zhang, Yaxin

doi:10.1007/s10772-005-2165-7

Published: June 2005

Volume 8, pages 133–146 (2005)

International Journal of Speech Technology

Abstract

When an Automatic Speech Recognition (ASR) system is applied in noisy environments, Voice Activity Detection (VAD) is crucial to the performance of the overall system. Employing VAD for ASR on embedded mobile systems minimizes physical distractions and makes the system convenient to use. Conventional VAD algorithms are either of high complexity, which makes them unsuitable for embedded mobile devices, or of low robustness, which limits their use in noisy mobile environments. In this paper, we propose a robust VAD algorithm specifically designed for ASR on embedded mobile devices. The architecture of the proposed algorithm is based on a two-level decision-making strategy, in which a lower, feature-based level interacts with subsequent decision logic based on a finite-state machine. Many discriminating features are employed in the lower level to improve the robustness of the VAD. The two-level decision strategy allows different features to be used in different states and reduces the cost of the algorithm, which makes the proposed algorithm suitable for embedded mobile devices. The evaluation experiments show that the proposed VAD algorithm is robust and contributes to the overall performance gain of the ASR system in various acoustic environments.
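
The two-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the features nor the states, so everything here is an assumption made for illustration, including the two features (short-time log energy and zero-crossing rate), the four states (SILENCE, ONSET, SPEECH, HANGOVER), the class name TwoLevelVad, and all threshold values. The sketch only shows how a finite-state machine can decide which features the lower level computes in each state, so that the costlier check runs only while an onset is being confirmed.

```python
import math
from enum import Enum, auto


class State(Enum):
    # Hypothetical state set; the abstract does not list the actual states.
    SILENCE = auto()
    ONSET = auto()
    SPEECH = auto()
    HANGOVER = auto()


def log_energy(frame):
    """Lower-level feature 1: short-time log energy in dB."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-10)


def zero_crossing_rate(frame):
    """Lower-level feature 2: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


class TwoLevelVad:
    """Upper-level decision logic: a finite-state machine over per-frame features."""

    def __init__(self, energy_db=-30.0, max_zcr=0.35,
                 onset_frames=3, hangover_frames=5):
        # All thresholds are illustrative assumptions, not values from the paper.
        self.energy_db = energy_db
        self.max_zcr = max_zcr
        self.onset_frames = onset_frames
        self.hangover_frames = hangover_frames
        self.state = State.SILENCE
        self.count = 0

    def step(self, frame):
        """Consume one frame; return True if it is labelled as speech."""
        loud = log_energy(frame) > self.energy_db  # computed in every state
        if self.state is State.SILENCE:
            if loud:
                self.state, self.count = State.ONSET, 1
        elif self.state is State.ONSET:
            # The second feature is evaluated only while confirming an onset.
            if loud and zero_crossing_rate(frame) < self.max_zcr:
                self.count += 1
                if self.count >= self.onset_frames:
                    self.state = State.SPEECH
            else:
                self.state = State.SILENCE
        elif self.state is State.SPEECH:
            if not loud:
                self.state, self.count = State.HANGOVER, 1
        else:  # State.HANGOVER: bridge short pauses before returning to silence
            if loud:
                self.state = State.SPEECH
            else:
                self.count += 1
                if self.count > self.hangover_frames:
                    self.state = State.SILENCE
        return self.state in (State.SPEECH, State.HANGOVER)
```

To use the sketch, create one TwoLevelVad and call step once per fixed-length frame of samples scaled to the range -1 to 1. With the assumed defaults, a speech decision is delayed by the onset confirmation and held for a few frames after the energy drops, which is a common way to avoid clipping word endings.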

Author information

Authors and Affiliations

Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, 200030, People's Republic of China
Bian Wu
Motorola Labs China Research Center, 38F, CITIC Square, 1168 Nanjing Rd., W. Shanghai, 200041, People's Republic of China
Xiaolin Ren
Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, 200030, People's Republic of China
Chongqing Liu
Motorola Labs China Research Center, 38F, CITIC Square, 1168 Nanjing Rd., W. Shanghai, 200041, People's Republic of China
Yaxin Zhang

Corresponding author

Correspondence to Bian Wu.

About this article

Cite this article

Wu, B., Ren, X., Liu, C. et al. A Robust, Real-Time Voice Activity Detection Algorithm for Embedded Mobile Devices. Int J Speech Technol 8, 133–146 (2005). https://doi.org/10.1007/s10772-005-2165-7

Issue Date: June 2005
DOI: https://doi.org/10.1007/s10772-005-2165-7
