Compact WFSA Based Language Model and Its Application in Statistical Machine Translation

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 333)

Abstract

The authors explore fast query techniques for n-gram language models (LMs) in statistical machine translation (SMT) and propose a compact LM based on a weighted finite-state automaton (WFSA), motivated by the contextual features that arise during model queries. They demonstrate that WFSA-based queries effectively avoid redundant lookups and accelerate querying, and that a simple caching technique speeds up queries further. Experimental results show that the method speeds up LM queries by 75% in relative terms. As the LM order increases, the performance benefit of the WFSA becomes more significant.
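The abstract does not spell out the authors' automaton, so the following is only a minimal sketch of the general idea, assuming a standard back-off n-gram LM viewed as a WFSA: each state is an n-gram context, each word arc carries a log-probability, and a back-off arc leads to the shorter context. The class `BackoffWFSA`, its methods, the `UNK_LOGPROB` constant, and all toy probabilities are hypothetical and are not taken from the paper. Because each query returns the next state, the decoder does not have to look the context up again for the following word, and a plain dictionary serves as the simple cache.

```python
from typing import Dict, List, Tuple

State = Tuple[str, ...]      # a state is the n-gram context it represents
UNK_LOGPROB = -10.0          # assumed penalty for words unseen even as unigrams


class BackoffWFSA:
    """Toy back-off LM stored as states, word arcs, and back-off arcs."""

    def __init__(self) -> None:
        # state -> word -> (log-probability, next state)
        self.arcs: Dict[State, Dict[str, Tuple[float, State]]] = {(): {}}
        # state -> back-off weight; the back-off target is the shorter context
        self.backoff: Dict[State, float] = {}
        # (state, word) -> (log-probability, next state)
        self.cache: Dict[Tuple[State, str], Tuple[float, State]] = {}
        self.lookups = 0     # number of arc-table probes, for illustration

    def add_arc(self, state: State, word: str, logprob: float, nxt: State) -> None:
        self.arcs.setdefault(state, {})[word] = (logprob, nxt)
        self.arcs.setdefault(nxt, {})

    def add_backoff(self, state: State, weight: float) -> None:
        self.backoff[state] = weight

    def step(self, state: State, word: str) -> Tuple[float, State]:
        """Consume one word from `state`; return (log-probability, next state)."""
        key = (state, word)
        if key in self.cache:            # repeated query: no automaton traversal
            return self.cache[key]
        total, cur = 0.0, state
        while True:
            self.lookups += 1
            arc = self.arcs.get(cur, {}).get(word)
            if arc is not None:          # matching word arc found
                result = (total + arc[0], arc[1])
                break
            if not cur:                  # empty context and still no arc
                result = (total + UNK_LOGPROB, ())
                break
            total += self.backoff.get(cur, 0.0)   # follow the back-off arc
            cur = cur[1:]
        self.cache[key] = result
        return result


def score(lm: BackoffWFSA, words: List[str]) -> float:
    """Score a word sequence, carrying the automaton state between queries."""
    state: State = ()
    total = 0.0
    for w in words:
        logprob, state = lm.step(state, w)
        total += logprob
    return total


# Toy bigram model with made-up log-probabilities.
lm = BackoffWFSA()
lm.add_arc((), "the", -1.0, ("the",))
lm.add_arc((), "cat", -2.0, ("cat",))
lm.add_arc((), "sat", -2.5, ("sat",))
lm.add_arc(("the",), "cat", -0.5, ("cat",))
lm.add_backoff(("the",), -0.3)
lm.add_backoff(("cat",), -0.2)

first = score(lm, ["the", "cat", "sat"])
lookups_after_first = lm.lookups
second = score(lm, ["the", "cat", "sat"])   # answered entirely from the cache
```

In this sketch, the second call to `score` leaves `lm.lookups` unchanged, which illustrates how caching removes repeated traversals; how the paper's compact WFSA is actually laid out in memory is not described in the abstract and is not modeled here.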

This work was supported by the 863 Program of China (No. 2011AA01A207).

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fu, X., Wei, W., Lu, S., Ke, D., Xu, B. (2012). Compact WFSA Based Language Model and Its Application in Statistical Machine Translation. In: Zhou, M., Zhou, G., Zhao, D., Liu, Q., Zou, L. (eds) Natural Language Processing and Chinese Computing. NLPCC 2012. Communications in Computer and Information Science, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34456-5_15

  • DOI: https://doi.org/10.1007/978-3-642-34456-5_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34455-8

  • Online ISBN: 978-3-642-34456-5

  • eBook Packages: Computer Science, Computer Science (R0)
