ABSTRACT
Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik's soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.
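The MIRA algorithm the abstract refers to can be illustrated with a minimal sketch of a single passive-aggressive update (the clipped form from Crammer et al. 2006). This is not the paper's full parallelized, forest-based implementation; the sparse feature vectors, the loss value, and the cap `C` below are illustrative stand-ins.

```python
# Minimal sketch of one MIRA-style (passive-aggressive) weight update,
# assuming sparse feature vectors represented as dicts. The update moves
# the weights toward an oracle (low-loss) translation just enough to make
# its score exceed the model-best hypothesis by the hypothesis's loss.

def mira_update(weights, feats_oracle, feats_hyp, loss, C=0.01):
    """One clipped passive-aggressive update.

    weights      -- dict: feature name -> weight (updated in place)
    feats_oracle -- sparse feature vector of the oracle translation
    feats_hyp    -- sparse feature vector of the model-best translation
    loss         -- loss of the hypothesis relative to the oracle
                    (e.g. a Bleu-based loss); illustrative here
    C            -- aggressiveness cap on the step size
    """
    # Difference vector between oracle and model-best features.
    delta = {f: feats_oracle.get(f, 0.0) - feats_hyp.get(f, 0.0)
             for f in set(feats_oracle) | set(feats_hyp)}
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return weights  # identical feature vectors; nothing to do

    # Margin violation: loss minus the current score difference.
    margin = loss - sum(weights.get(f, 0.0) * v for f, v in delta.items())
    if margin <= 0.0:
        return weights  # constraint already satisfied; no update

    tau = min(C, margin / norm_sq)  # clipped step size
    for f, v in delta.items():
        weights[f] = weights.get(f, 0.0) + tau * v
    return weights
```

Because the step size is clipped by both the margin violation and `C`, each update is "ultraconservative": it changes the weights as little as possible while correcting the current example.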
REFERENCES
- Abhishek Arun and Philipp Koehn. 2007. Online learning methods for discriminative training of phrase based statistical machine translation. In Proc. MT Summit XI.
- Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. A discriminative latent variable model for statistical machine translation. In Proc. ACL-08: HLT.
- Fabiano C. Botelho, Yoshiharu Kohayakawa, and Nivio Ziviani. 2005. A practical minimal perfect hashing method. In Proc. 4th International Workshop on Efficient and Experimental Algorithms (WEA 2005).
- Daniel Cer, Daniel Jurafsky, and Christopher D. Manning. 2008. Regularization and search for minimum error rate training. In Proc. Third Workshop on Statistical Machine Translation.
- Colin Cherry. 2008. Cohesive phrase-based decoding for statistical machine translation. In Proc. ACL-08: HLT.
- David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL 2005.
- David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2).
- Michael Collins. 2002. Discriminative training methods for Hidden Markov Models: Theory and experiments with perceptron algorithms. In Proc. EMNLP 2002.
- Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951--991.
- Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551--585.
- Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proc. HLT/NAACL 2004, companion volume.
- Markus Dreyer, Keith Hall, and Sanjeev Khudanpur. 2007. Comparing reordering constraints for SMT using efficient Bleu oracle computation. In Proc. 2007 Workshop on Syntax and Structure in Statistical Translation.
- Kevin Duh and Katrin Kirchhoff. 2008. Beyond log-linear models: Boosted minimum error rate training for n-best re-ranking. In Proc. ACL-08: HLT, Short Papers.
- Yoav Freund and Robert E. Schapire. 1999. Large margin classification using the perceptron algorithm. Machine Learning, 37:277--296.
- Dan Klein and Christopher D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002).
- Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. HLT-NAACL 2003.
- Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proc. EMNLP 2004.
- Gregor Leusch, Evgeny Matusov, and Hermann Ney. 2008. Complexity of finding the BLEU-optimal hypothesis in a confusion network. In Proc. EMNLP 2008. This volume.
- Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar. 2006. An end-to-end discriminative approach to machine translation. In Proc. COLING-ACL 2006.
- Chin-Yew Lin and Franz Josef Och. 2004. ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In Proc. COLING 2004.
- Yuval Marton and Philip Resnik. 2008. Soft syntactic constraints for hierarchical phrase-based translation. In Proc. ACL-08: HLT.
- Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proc. ACL 2005.
- Haitao Mi, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proc. ACL-08: HLT.
- Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proc. ACL 2002.
- Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proc. ACL 2003.
- John C. Platt. 1998. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 195--208. MIT Press.
- David A. Smith and Jason Eisner. 2006. Minimum risk annealing for training log-linear models. In Proc. COLING/ACL 2006, Poster Sessions.
- David Talbot and Thorsten Brants. 2008. Randomized language models via perfect hash functions. In Proc. ACL-08: HLT.
- Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proc. EMNLP 2004, pages 1--8.
- Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. 2005. Learning structured prediction models: A large margin approach. In Proc. ICML 2005.
- Christoph Tillmann and Tong Zhang. 2006. A discriminative global training algorithm for statistical MT. In Proc. COLING-ACL 2006.
- Joseph Turian, Benjamin Wellington, and I. Dan Melamed. 2007. Scalable discriminative learning for natural language parsing and translation. In Advances in Neural Information Processing Systems 19 (NIPS 2006).
- Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proc. EMNLP 2007.
- Dekai Wu. 1996. A polynomial-time algorithm for statistical machine translation. In Proc. ACL 1996, pages 152--158.
- Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. A tree sequence alignment-based tree-to-tree translation model. In Proc. ACL-08: HLT.