Improving a statistical MT system with automatically learned rewrite patterns

Authors:
Fei Xia, IBM T. J. Watson Research Center, Yorktown Heights, NY
Michael McCord, IBM T. J. Watson Research Center, Yorktown Heights, NY

COLING '04: Proceedings of the 20th international conference on Computational Linguistics, August 2004, Pages 508–es
https://doi.org/10.3115/1220355.1220428

Published: 23 August 2004

ABSTRACT

Current clump-based statistical MT systems have two limitations with respect to word ordering. First, they lack a mechanism for expressing and using generalizations that account for reorderings of linguistic phrases. Second, the ordering of target words in such systems does not respect linguistic phrase boundaries. To address these limitations, we propose to use automatically learned rewrite patterns to preprocess the source sentences so that they have a word order similar to that of the target language. Our system is a hybrid one. The basic model is statistical, but we use broad-coverage rule-based parsers in two ways: during training, to learn rewrite patterns, and at runtime, to reorder the source sentences. Our experiments show a 10% relative improvement in BLEU score.
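
The runtime reordering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the pattern table format (parent label plus child labels mapped to a permutation), and the toy noun-phrase pattern are all assumptions made for illustration, and the abstract does not specify how patterns are represented or learned.

```python
from typing import Dict, List, Tuple


class Node:
    """A parse-tree node (hypothetical): leaves carry a word, internal nodes carry children."""

    def __init__(self, label, word=None, children=None):
        self.label = label
        self.word = word
        self.children = children or []


# Hypothetical pattern table: (parent label, child labels in source order)
# maps to the order in which the children should be emitted.
Patterns = Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]]


def reorder(node: Node, patterns: Patterns) -> List[str]:
    """Return the words under `node`, reordering children wherever a pattern matches."""
    if not node.children:
        return [node.word]
    key = (node.label, tuple(child.label for child in node.children))
    # With no matching pattern, keep the original child order.
    order = patterns.get(key, tuple(range(len(node.children))))
    words: List[str] = []
    for index in order:
        words.extend(reorder(node.children[index], patterns))
    return words


# Toy example (not from the paper): move the adjective after the noun.
patterns: Patterns = {("NP", ("Det", "Adj", "N")): (0, 2, 1)}
tree = Node("NP", children=[
    Node("Det", word="the"),
    Node("Adj", word="red"),
    Node("N", word="car"),
])
print(" ".join(reorder(tree, patterns)))  # the car red
```

In this sketch the reordered source sentence would then be passed to the statistical translation model unchanged; subtrees with no matching pattern keep their original order.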

Published in

COLING '04: Proceedings of the 20th international conference on Computational Linguistics
August 2004, 1411 pages
Publisher: Association for Computational Linguistics, United States
Publication History: Published 23 August 2004
Qualifiers: Article
