Topic-Dependent Language Model Switching for Embedded Automatic Speech Recognition

  • Conference paper

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 153)

Abstract

Embedded devices take on new applications in different domains every day thanks to their increasing computational power. Many of these applications have a voice interface that uses Automatic Speech Recognition (ASR). When the language model is highly complex, recognition is commonly offloaded to an external server, at the cost of certain limitations (network availability, latency, etc.). This paper presents a new proposal for using the language model more efficiently in a multi-domain recognizer. The idea is to select an appropriate language model for each domain within the ASR system.
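The abstract does not describe how the domain is detected, so the following is only a minimal sketch of the general idea of language model switching, assuming a simple keyword-overlap topic classifier. The domain names, keyword sets, model file names, and both function names are hypothetical and are not taken from the paper.

```python
# Hypothetical per-domain keyword sets; a real system would learn topic
# evidence from training data rather than hard-code it.
DOMAIN_KEYWORDS = {
    "weather": {"weather", "rain", "temperature", "forecast", "sunny"},
    "travel": {"flight", "hotel", "ticket", "airport", "depart"},
}

# Hypothetical file names standing in for small per-domain language models,
# plus a general fallback model.
DOMAIN_MODELS = {
    "weather": "weather.lm",
    "travel": "travel.lm",
    "general": "general.lm",
}


def classify_topic(text, default="general"):
    """Return the domain whose keywords overlap most with the text."""
    words = text.lower().split()
    scores = {
        domain: sum(1 for word in words if word in keywords)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the general model when no domain keyword was seen.
    return best if scores[best] > 0 else default


def select_language_model(previous_utterance):
    """Pick the language model to load for the next recognition pass."""
    return DOMAIN_MODELS[classify_topic(previous_utterance)]


if __name__ == "__main__":
    print(select_language_model("what is the forecast for rain tomorrow"))
    print(select_language_model("book a flight and a hotel"))
    print(select_language_model("hello there"))
```

In this sketch the recognizer only ever holds one small domain model at a time, which is the kind of saving that makes on-device recognition plausible; the classifier itself could be replaced by any topic detector.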

Author information

Corresponding author

Correspondence to Marcos Santos-Pérez.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos-Pérez, M., González-Parada, E., Cano-García, J.M. (2012). Topic-Dependent Language Model Switching for Embedded Automatic Speech Recognition. In: Novais, P., Hallenborg, K., Tapia, D., Rodríguez, J. (eds) Ambient Intelligence - Software and Applications. Advances in Intelligent and Soft Computing, vol 153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28783-1_30

  • DOI: https://doi.org/10.1007/978-3-642-28783-1_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28782-4

  • Online ISBN: 978-3-642-28783-1

  • eBook Packages: Engineering, Engineering (R0)
