DOI: 10.3115/1220355.1220428

Improving a statistical MT system with automatically learned rewrite patterns

Published: 23 August 2004

ABSTRACT

Current clump-based statistical MT systems have two limitations with respect to word ordering. First, they lack a mechanism for expressing and using generalizations that account for reorderings of linguistic phrases. Second, the ordering of target words in such systems does not respect linguistic phrase boundaries. To address these limitations, we propose to use automatically learned rewrite patterns to preprocess the source sentences so that they have a word order similar to that of the target language. Our system is a hybrid one. The basic model is statistical, but we use broad-coverage rule-based parsers in two ways: during training for learning rewrite patterns, and at runtime for reordering the source sentences. Our experiments show a 10% relative improvement in the Bleu measure.
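
To make the idea of a rewrite pattern concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a toy constituency-style tree and a single hand-written pattern; the `Node` class, the `PATTERNS` table, and the labels `Det`, `Adj`, and `Noun` are all hypothetical names chosen for illustration. The abstract describes patterns that are learned automatically from parsed training data, a step this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A toy parse-tree node: leaves carry a word, inner nodes carry children."""
    label: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


# Hypothetical rewrite patterns: (parent label, child labels) -> new child order.
# This entry moves an adjective after its noun inside a noun phrase.
PATTERNS: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("NP", ("Det", "Adj", "Noun")): (0, 2, 1),
}


def reorder(node: Node) -> Node:
    """Return a copy of the tree with every matching pattern applied bottom-up."""
    children = [reorder(child) for child in node.children]
    key = (node.label, tuple(child.label for child in children))
    order = PATTERNS.get(key)
    if order is not None:
        children = [children[i] for i in order]
    return Node(node.label, node.word, children)


def words(node: Node) -> List[str]:
    """Read the words off the tree from left to right."""
    if not node.children:
        return [node.word] if node.word is not None else []
    return [w for child in node.children for w in words(child)]


tree = Node("S", children=[
    Node("NP", children=[
        Node("Det", "the"), Node("Adj", "red"), Node("Noun", "car"),
    ]),
    Node("VP", children=[Node("Verb", "stops")]),
])

print(" ".join(words(reorder(tree))))
```

In this sketch the reordered source sentence would then be passed to the statistical translation model, which corresponds to the runtime preprocessing step mentioned in the abstract.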

Published in

    COLING '04: Proceedings of the 20th international conference on Computational Linguistics
    August 2004
    1411 pages

    Publisher: Association for Computational Linguistics, United States

    Acceptance Rates

    COLING '04 Paper Acceptance Rate: 1,411 of 1,411 submissions, 100%. Overall Acceptance Rate: 1,537 of 1,537 submissions, 100%.
