ABSTRACT
Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik's soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.
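The MIRA algorithm the abstract refers to can be illustrated with a minimal sketch of a single passive-aggressive update (the clipped form from Crammer et al. 2006). This is not the paper's full parallelized, forest-based implementation; the sparse feature vectors, the loss value, and the cap `C` below are illustrative stand-ins.

```python
# Minimal sketch of one MIRA-style (passive-aggressive) weight update,
# assuming sparse feature vectors represented as dicts. The update moves
# the weights toward an oracle (low-loss) translation just enough to make
# its score exceed the model-best hypothesis by the hypothesis's loss.

def mira_update(weights, feats_oracle, feats_hyp, loss, C=0.01):
    """One clipped passive-aggressive update.

    weights      -- dict: feature name -> weight (updated in place)
    feats_oracle -- sparse feature vector of the oracle translation
    feats_hyp    -- sparse feature vector of the model-best translation
    loss         -- loss of the hypothesis relative to the oracle
                    (e.g. a Bleu-based loss); illustrative here
    C            -- aggressiveness cap on the step size
    """
    # Difference vector between oracle and model-best features.
    delta = {f: feats_oracle.get(f, 0.0) - feats_hyp.get(f, 0.0)
             for f in set(feats_oracle) | set(feats_hyp)}
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return weights  # identical feature vectors; nothing to do

    # Margin violation: loss minus the current score difference.
    margin = loss - sum(weights.get(f, 0.0) * v for f, v in delta.items())
    if margin <= 0.0:
        return weights  # constraint already satisfied; no update

    tau = min(C, margin / norm_sq)  # clipped step size
    for f, v in delta.items():
        weights[f] = weights.get(f, 0.0) + tau * v
    return weights
```

Because the step size is clipped by both the margin violation and `C`, each update is "ultraconservative": it changes the weights as little as possible while correcting the current example.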
REFERENCES
- Abhishek Arun and Philipp Koehn. 2007. Online learning methods for discriminative training of phrase based statistical machine translation. In Proc. MT Summit XI.
- Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. A discriminative latent variable model for statistical machine translation. In Proc. ACL-08: HLT.
- Fabiano C. Botelho, Yoshiharu Kohayakawa, and Nivio Ziviani. 2005. A practical minimal perfect hashing method. In Proc. 4th International Workshop on Efficient and Experimental Algorithms (WEA 2005).
- Daniel Cer, Daniel Jurafsky, and Christopher D. Manning. 2008. Regularization and search for minimum error rate training. In Proc. Third Workshop on Statistical Machine Translation.
- Colin Cherry. 2008. Cohesive phrase-based decoding for statistical machine translation. In Proc. ACL-08: HLT.
- David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL 2005.
- David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2).
- Michael Collins. 2002. Discriminative training methods for Hidden Markov Models: Theory and experiments with perceptron algorithms. In Proc. EMNLP 2002.
- Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951--991.
- Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551--585.
- Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proc. HLT/NAACL 2004, companion volume.
- Markus Dreyer, Keith Hall, and Sanjeev Khudanpur. 2007. Comparing reordering constraints for SMT using efficient Bleu oracle computation. In Proc. 2007 Workshop on Syntax and Structure in Statistical Translation.
- Kevin Duh and Katrin Kirchhoff. 2008. Beyond log-linear models: Boosted minimum error rate training for n-best re-ranking. In Proc. ACL-08: HLT, Short Papers.
- Yoav Freund and Robert E. Schapire. 1999. Large margin classification using the perceptron algorithm. Machine Learning, 37:277--296.
- Dan Klein and Christopher D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002).
- Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. HLT-NAACL 2003.
- Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proc. EMNLP 2004.
- Gregor Leusch, Evgeny Matusov, and Hermann Ney. 2008. Complexity of finding the BLEU-optimal hypothesis in a confusion network. In Proc. EMNLP 2008. This volume.
- Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar. 2006. An end-to-end discriminative approach to machine translation. In Proc. COLING-ACL 2006.
- Chin-Yew Lin and Franz Josef Och. 2004. ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In Proc. COLING 2004.
- Yuval Marton and Philip Resnik. 2008. Soft syntactic constraints for hierarchical phrase-based translation. In Proc. ACL-08: HLT.
- Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proc. ACL 2005.
- Haitao Mi, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proc. ACL-08: HLT.
- Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proc. ACL 2002.
- Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proc. ACL 2003.
- John C. Platt. 1998. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 195--208. MIT Press.
- David A. Smith and Jason Eisner. 2006. Minimum risk annealing for training log-linear models. In Proc. COLING/ACL 2006, Poster Sessions.
- David Talbot and Thorsten Brants. 2008. Randomized language models via perfect hash functions. In Proc. ACL-08: HLT.
- Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proc. EMNLP 2004, pages 1--8.
- Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. 2005. Learning structured prediction models: A large margin approach. In Proc. ICML 2005.
- Christoph Tillmann and Tong Zhang. 2006. A discriminative global training algorithm for statistical MT. In Proc. COLING-ACL 2006.
- Joseph Turian, Benjamin Wellington, and I. Dan Melamed. 2007. Scalable discriminative learning for natural language parsing and translation. In Advances in Neural Information Processing Systems 19 (NIPS 2006).
- Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proc. EMNLP 2007.
- Dekai Wu. 1996. A polynomial-time algorithm for statistical machine translation. In Proc. ACL 1996, pages 152--158.
- Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. A tree sequence alignment-based tree-to-tree translation model. In Proc. ACL-08: HLT.