ABSTRACT
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source-language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source-language side of the translation system. The goal of this step is to recover an underlying word order that is closer to target-language word order than the original string is. The reordering approach is applied as a preprocessing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from a 25.2% Bleu score for a baseline system to 26.8% for the system with reordering, a statistically significant improvement.
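The abstract does not list the individual transformations, so the following is only a minimal Python sketch of the general idea under stated assumptions: the `Leaf`/`Node` tree classes, the constituent labels (`S-SUB`, `SUBJ`, `V`) and the single rule `move_verb_after_subject` are all hypothetical, chosen to show how a rewrite on the parse tree changes the surface string of a German verb-final subordinate clause toward English order. It is not the authors' rule set or implementation. The function `preprocess` stands for the step that would be applied to source sentences in both training and decoding.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    tag: str   # part-of-speech tag (hypothetical tag set)
    word: str


@dataclass
class Node:
    label: str  # constituent label (hypothetical label set)
    children: List[Union[Node, Leaf]] = field(default_factory=list)


Tree = Union[Node, Leaf]


def label_of(t: Tree) -> str:
    return t.label if isinstance(t, Node) else t.tag


def move_verb_after_subject(node: Node) -> Node:
    """Hypothetical rule: in a verb-final subordinate clause ('S-SUB'),
    move verb children ('V') to directly after the subject ('SUBJ')."""
    if node.label != "S-SUB":
        return node
    verbs = [c for c in node.children if label_of(c) == "V"]
    rest = [c for c in node.children if label_of(c) != "V"]
    # Insert right after the subject if there is one, else after the first child.
    idx = next((i + 1 for i, c in enumerate(rest) if label_of(c) == "SUBJ"),
               min(1, len(rest)))
    return Node(node.label, rest[:idx] + verbs + rest[idx:])


def reorder(t: Tree) -> Tree:
    """Apply the transformation bottom-up to every constituent."""
    if isinstance(t, Leaf):
        return t
    children = [reorder(c) for c in t.children]
    return move_verb_after_subject(Node(t.label, children))


def yield_words(t: Tree) -> List[str]:
    """Read the surface string off the (possibly reordered) tree."""
    if isinstance(t, Leaf):
        return [t.word]
    return [w for c in t.children for w in yield_words(c)]


def preprocess(parse: Tree) -> str:
    """Reordered source string, used for training and test sentences alike."""
    return " ".join(yield_words(reorder(parse)))


# "dass er das Buch liest" ("that he reads the book")
clause = Node("S-SUB", [
    Leaf("C", "dass"),
    Node("SUBJ", [Leaf("PPER", "er")]),
    Node("OBJ", [Leaf("ART", "das"), Leaf("NN", "Buch")]),
    Leaf("V", "liest"),
])
print(preprocess(clause))  # dass er liest das Buch
```

Because `reorder` works bottom-up, a subordinate clause embedded in a main clause is rewritten before its parent is visited, so a single pass handles nested clauses. A fuller system would presumably apply several such rules, in a fixed order, to the output of a real parser.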