Improving statistical MT by coupling reordering and decoding

Crego, Josep Maria; Mariño, José B.

doi:10.1007/s10590-007-9024-z

Published: 12 July 2007

Volume 20, pages 199–215 (2006)

Machine Translation

Abstract

In this paper we describe an elegant and efficient approach to coupling reordering and decoding in statistical machine translation, in which the n-gram translation model is also employed as the distortion model. The reordering search problem is tackled through a set of linguistically motivated rewrite rules, which are used to extend a monotonic search graph with reordering hypotheses. The extended graph is traversed during the global search, when a fully informed decision can be taken. Further experiments show that the n-gram translation model can be used successfully as a reordering model when it is estimated with reordered source words. Experiments are reported on the Europarl task (Spanish–English and English–Spanish). Results are presented for translation accuracy and computational efficiency, showing significant improvements in translation quality over monotonic search for both translation directions at a very low computational cost.
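
The idea of extending a monotonic search graph with reordering hypotheses can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`extend_graph`, `paths`), the rule format (a part-of-speech pattern paired with a permutation of its positions), the tag labels `NC`, `AQ`, `CC`, and the example phrase are all assumptions made for illustration. Each rule that matches a tag sequence adds one alternative path to the graph, and a decoder would then score every path during the global search.

```python
from itertools import count


def extend_graph(words, tags, rules):
    """Return arcs (from_node, to_node, word) of a monotonic graph
    extended with one extra path per rule match (illustrative only)."""
    n = len(words)
    # Monotonic path: node i --words[i]--> node i + 1
    arcs = [(i, i + 1, words[i]) for i in range(n)]
    fresh = count(n + 1)  # ids for newly created intermediate nodes
    for pattern, order in rules:
        k = len(pattern)
        for start in range(n - k + 1):
            if tuple(tags[start:start + k]) != tuple(pattern):
                continue
            # Add a path from node `start` to node `start + k` that
            # reads the matched words in the permuted order.
            prev = start
            for step, offset in enumerate(order):
                nxt = start + k if step == k - 1 else next(fresh)
                arcs.append((prev, nxt, words[start + offset]))
                prev = nxt
    return arcs


def paths(arcs, node, goal):
    """Enumerate every word sequence from `node` to `goal`."""
    if node == goal:
        yield []
        return
    for src, dst, word in arcs:
        if src == node:
            for rest in paths(arcs, dst, goal):
                yield [word] + rest


# Hypothetical Spanish input and a hypothetical noun-adjective swap rule.
words = ["programa", "ambicioso", "y", "realista"]
tags = ["NC", "AQ", "CC", "AQ"]
rules = [(("NC", "AQ"), (1, 0))]

arcs = extend_graph(words, tags, rules)
for hypothesis in paths(arcs, 0, len(words)):
    print(" ".join(hypothesis))
```

Because the extra paths only rejoin the monotonic path further to the right, the graph stays acyclic, and the number of hypotheses grows with the number of rule matches rather than with all permutations of the sentence.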

Author information

Authors and Affiliations

Department of Signal Theory and Communications, TALP Research Center, Universitat Politècnica de Catalunya, Campus Norte UPC, Edificio D5, C/Jordi Girona 1-3, 08034, Barcelona, Spain
Josep Maria Crego & José B. Mariño

Corresponding author

Correspondence to Josep Maria Crego.

About this article

Cite this article

Crego, J.M., Mariño, J.B. Improving statistical MT by coupling reordering and decoding. Machine Translation 20, 199–215 (2006). https://doi.org/10.1007/s10590-007-9024-z

Received: 11 October 2006
Accepted: 24 May 2007
Published: 12 July 2007
Issue Date: September 2006
DOI: https://doi.org/10.1007/s10590-007-9024-z
