ASR Hypothesis Reranking Using Prior-Informed Restricted Boltzmann Machine

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Abstract

Discriminative language models (DLMs) are widely used to rerank competing hypotheses produced by an Automatic Speech Recognition (ASR) system. Because existing DLMs suffer from limited generalization power, we propose a novel DLM based on a discriminatively trained Restricted Boltzmann Machine (RBM). The hidden layer of the RBM improves generalization and makes it possible to incorporate additional prior knowledge, such as pre-trained parameters and entity-related priors. Our approach outperforms the single-layer perceptron (SLP) reranking model; fusing the two achieves up to a 1.3% absolute reduction in Word Error Rate (WER), a 180% relative improvement in WER reduction over the SLP reranker alone. In particular, the proposed prior-informed RBM reranker achieves its largest error reduction (3.1% absolute WER) on content words.
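The paper's full text is not reproduced on this page, so the following is only a minimal, hedged sketch of the general technique the abstract describes: score each n-best hypothesis with the free energy of an RBM over bag-of-n-gram features and interpolate that score with the ASR system score before picking the best hypothesis. The class name RBMReranker, the feature scheme, and the interpolation weight lam are illustrative assumptions, not the authors' implementation; in the paper the RBM is trained discriminatively and can be initialized from pre-trained parameters, whereas here the weights are random placeholders.

    import numpy as np

    class RBMReranker:
        """Hypothetical RBM-based hypothesis scorer (illustrative only)."""

        def __init__(self, n_visible, n_hidden, seed=0):
            rng = np.random.default_rng(seed)
            # Random placeholder weights; the "pre-trained parameters" prior
            # in the abstract would correspond to initializing W from a
            # generatively trained RBM instead.
            self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            self.b_v = np.zeros(n_visible)  # visible (feature) biases
            self.b_h = np.zeros(n_hidden)   # hidden biases

        def free_energy(self, v):
            # F(v) = -b_v.v - sum_j log(1 + exp(b_h_j + v.W_:j));
            # lower free energy means higher unnormalized probability.
            return -(v @ self.b_v) - np.sum(np.logaddexp(0.0, self.b_h + v @ self.W))

        def score(self, v, asr_score, lam=0.5):
            # Interpolate the ASR score with the negated RBM free energy;
            # lam would be tuned on held-out data.
            return lam * asr_score - (1.0 - lam) * self.free_energy(v)

    def rerank(nbest, reranker, lam=0.5):
        # nbest: list of dicts with "text", "features" (np.array), "asr_score".
        return max(nbest, key=lambda h: reranker.score(h["features"], h["asr_score"], lam))

    # Toy usage with a 5-dimensional bag-of-n-grams feature space.
    reranker = RBMReranker(n_visible=5, n_hidden=3)
    nbest = [
        {"text": "recognize speech", "features": np.array([1.0, 0, 1, 0, 0]), "asr_score": -12.3},
        {"text": "wreck a nice beach", "features": np.array([0.0, 1, 0, 1, 1]), "asr_score": -12.9},
    ]
    print(rerank(nbest, reranker)["text"])

The entity-related prior mentioned in the abstract could, under the same assumptions, be realized as extra visible units carrying named-entity features for each hypothesis; that extension is omitted here.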


Notes

  1. http://kaldi.sourceforge.net/.

  2. http://cmusphinx.sourceforge.net/2013/01/a-new-english-language-model-release/.

  3. http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm.


Acknowledgements

This work was conducted within the Rolls-Royce@NTU Corp Lab with support from the National Research Foundation Singapore under the Corp Lab@University Scheme.

Author information

Correspondence to Yukun Ma.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, Y., Cambria, E., Bigot, B. (2018). ASR Hypothesis Reranking Using Prior-Informed Restricted Boltzmann Machine. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science, Computer Science (R0)
