ABSTRACT
Current clump-based statistical MT systems have two limitations with respect to word ordering. First, they lack a mechanism for expressing and using generalizations that account for reorderings of linguistic phrases. Second, the ordering of target words in such systems does not respect linguistic phrase boundaries. To address these limitations, we propose using automatically learned rewrite patterns to preprocess the source sentences so that their word order is similar to that of the target language. Our system is a hybrid: the basic model is statistical, but we use broad-coverage rule-based parsers in two ways, during training to learn the rewrite patterns and at runtime to reorder the source sentences. Our experiments show a 10% relative improvement in BLEU score.
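The reordering step described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the tree encoding, the pattern table, the category labels, and the example phrase are all assumptions made for illustration. A rewrite pattern is modeled here as a mapping from a parent label and its sequence of child labels to a permutation of those children, applied recursively to a source-side parse tree.

```python
from typing import Dict, List, Tuple, Union

# A node is either a word (str) or a (label, children) pair.
# This encoding is a hypothetical simplification of a parser's output.
Tree = Union[str, Tuple[str, List["Tree"]]]

# Hypothetical rewrite patterns:
# (parent label, child label sequence) -> new order of the children.
Patterns = Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]]


def label(node: Tree) -> str:
    """Return a node's category label; a bare word is labeled by itself."""
    return node if isinstance(node, str) else node[0]


def reorder(node: Tree, patterns: Patterns) -> Tree:
    """Permute the children of every node whose label sequence matches a pattern."""
    if isinstance(node, str):
        return node
    parent, children = node
    # Reorder the subtrees first, then the children of this node.
    children = [reorder(child, patterns) for child in children]
    key = (parent, tuple(label(child) for child in children))
    order = patterns.get(key)
    if order is not None:
        children = [children[i] for i in order]
    return (parent, children)


def leaves(node: Tree) -> List[str]:
    """Read the words off the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


# Illustrative pattern: move a post-nominal adjective in front of the noun.
example_patterns: Patterns = {("NP", ("DET", "N", "ADJ")): (0, 2, 1)}

# Illustrative source-side parse of the phrase "la voiture rouge".
example_tree: Tree = (
    "NP",
    [("DET", ["la"]), ("N", ["voiture"]), ("ADJ", ["rouge"])],
)

reordered_words = leaves(reorder(example_tree, example_patterns))
# reordered_words == ["la", "rouge", "voiture"]
```

In this sketch the reordered source words would then be passed to the statistical model in place of the original sentence. In the approach described above the patterns are learned automatically from parsed training data; here they are written by hand only to keep the example short.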