DOI: 10.5555/1613715.1613747
research-article

Online large-margin training of syntactic and structural translation features

Published: 25 October 2008

ABSTRACT

Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik's soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.
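
To make the kind of update involved more concrete, here is a minimal sketch of a single-constraint, MIRA-style step in the passive-aggressive form (the PA-I update of Crammer et al., 2006). It is a simplification for illustration only: the paper itself is described as using parallel processing and more of the parse forest, which this sketch does not attempt. The function name `mira_update`, the sparse-dictionary feature representation, and the step-size cap `C` are assumptions introduced here, not details taken from the paper.

```python
from typing import Dict

Vec = Dict[str, float]  # sparse feature vector: feature name -> value (assumed representation)


def dot(a: Vec, b: Vec) -> float:
    """Sparse dot product of two feature dictionaries."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def mira_update(weights: Vec, oracle_feats: Vec, hyp_feats: Vec,
                loss: float, C: float = 0.01) -> Vec:
    """One clipped, single-constraint large-margin update (hypothetical helper).

    Moves the weights just far enough that the oracle translation outscores
    the hypothesis by a margin of `loss`, with the step size capped at `C`.
    """
    # Feature difference between the oracle and the model's hypothesis.
    keys = set(oracle_feats) | set(hyp_feats)
    delta = {k: oracle_feats.get(k, 0.0) - hyp_feats.get(k, 0.0) for k in keys}

    norm_sq = dot(delta, delta)
    if norm_sq == 0.0:
        return dict(weights)  # identical feature vectors: nothing to learn

    # How far the current margin falls short of the required loss.
    violation = loss - dot(weights, delta)
    if violation <= 0.0:
        return dict(weights)  # constraint already satisfied: stay passive

    tau = min(C, violation / norm_sq)  # clipped step size
    updated = dict(weights)
    for k, v in delta.items():
        updated[k] = updated.get(k, 0.0) + tau * v
    return updated
```

With a loose cap (`C=1.0`), one step on a single violated constraint closes the margin exactly; a small `C` makes each step more conservative, which is one way such online learners stay stable when many feature weights are trained at once.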

References

  1. Abhishek Arun and Philipp Koehn. 2007. Online learning methods for discriminative training of phrase based statistical machine translation. In Proc. MT Summit XI.
  2. Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. A discriminative latent variable model for statistical machine translation. In Proc. ACL-08: HLT.
  3. Fabiano C. Botelho, Yoshiharu Kohayakawa, and Nivio Ziviani. 2005. A practical minimal perfect hashing method. In 4th International Workshop on Efficient and Experimental Algorithms (WEA05).
  4. Daniel Cer, Daniel Jurafsky, and Christopher D. Manning. 2008. Regularization and search for minimum error rate training. In Proc. Third Workshop on Statistical Machine Translation.
  5. Colin Cherry. 2008. Cohesive phrase-based decoding for statistical machine translation. In Proc. ACL-08: HLT.
  6. David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL 2005.
  7. David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2).
  8. Michael Collins. 2002. Discriminative training methods for Hidden Markov Models: Theory and experiments with perceptron algorithms. In Proc. EMNLP 2002.
  9. Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991.
  10. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. 2006. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585.
  11. Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proc. HLT/NAACL 2004. Companion volume.
  12. Markus Dreyer, Keith Hall, and Sanjeev Khudanpur. 2007. Comparing reordering constraints for SMT using efficient Bleu oracle computation. In Proc. 2007 Workshop on Syntax and Structure in Statistical Translation.
  13. Kevin Duh and Katrin Kirchoff. 2008. Beyond log-linear models: Boosted minimum error rate training for n-best re-ranking. In Proc. ACL-08: HLT, Short Papers.
  14. Yoav Freund and Robert E. Schapire. 1999. Large margin classification using the perceptron algorithm. Machine Learning, 37:277–296.
  15. Dan Klein and Chris D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002).
  16. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. HLT-NAACL 2003.
  17. Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proc. EMNLP 2004.
  18. Gregor Leusch, Evgeny Matusov, and Hermann Ney. 2008. Complexity of finding the BLEU-optimal hypothesis in a confusion network. In Proc. EMNLP 2008. This volume.
  19. Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar. 2006. An end-to-end discriminative approach to machine translation. In Proc. COLING-ACL 2006.
  20. Chin-Yew Lin and Franz Josef Och. 2004. ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In Proc. COLING 2004.
  21. Yuval Marton and Philip Resnik. 2008. Soft syntactic constraints for hierarchical phrased-based translation. In Proc. ACL-08: HLT.
  22. Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proc. ACL 2005.
  23. Haitao Mi, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proc. ACL-08: HLT.
  24. Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proc. ACL 2002.
  25. Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proc. ACL 2003.
  26. John C. Platt. 1998. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 195–208. MIT Press.
  27. David A. Smith and Jason Eisner. 2006. Minimum risk annealing for training log-linear models. In Proc. COLING/ACL 2006, Poster Sessions.
  28. David Talbot and Thorsten Brants. 2008. Randomized language models via perfect hash functions. In Proc. ACL-08: HLT.
  29. Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proc. EMNLP 2004, pages 1–8.
  30. Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. 2005. Learning structured prediction models: A large margin approach. In Proc. ICML 2005.
  31. Christoph Tillmann and Tong Zhang. 2006. A discriminative global training algorithm for statistical MT. In Proc. COLING-ACL 2006.
  32. Joseph Turian, Benjamin Wellington, and I. Dan Melamed. 2007. Scalable discriminative learning for natural language parsing and translation. In Advances in Neural Information Processing Systems 19 (NIPS 2006).
  33. Taro Watanabe, Jun Suzuki, Hajime Tsukuda, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proc. EMNLP 2007.
  34. Dekai Wu. 1996. A polynomial-time algorithm for statistical machine translation. In Proc. 34th Annual Meeting of the Association for Computational Linguistics, pages 152–158.
  35. Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. A tree sequence alignment-based tree-to-tree translation model. In Proc. ACL-08: HLT.

Published in

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages

Publisher

Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 73 of 234 submissions, 31%
