Bilingual Segmentation for Alignment and Translation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Abstract

We propose a method for bilingually segmenting sentences in languages that have no clear word-boundary delimiters. Our model first casts the search for a segmentation as a sequential tagging problem, which admits a polynomial-time dynamic-programming solution, and incorporates a control that balances the available monolingual and bilingual information. The resulting bilingual segmentation algorithm, which integrates a monolingual language model with a statistical translation model, is designed to tokenize sentences in a way better suited to bilingual applications such as word alignment and machine translation. Empirical results show that bilingually motivated segmenters outperform a purely monolingual one in both word alignment (12% reduction in error rate) and translation (5% improvement in BLEU), suggesting that monolingual segmentation is useful in some respects but is not designed for bilingual research.
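The idea of segmentation as tagging with a dynamic-programming search can be illustrated with a minimal sketch. It is not the authors' model: the scoring function, the interpolation weight `lam`, the smoothing constant `FLOOR`, and the toy probability tables `MONO` and `TRANS` are all assumptions made for illustration. Each character receives a tag B (begins a word) or I (inside a word), and each candidate word is scored by a weighted sum of a monolingual log-probability and its best translation log-probability against the target sentence.

```python
import math

FLOOR = 1e-9  # hypothetical smoothing floor for unseen words and word pairs

def segment(chars, target_words, mono_prob, trans_prob, lam=0.5, max_len=4):
    """Return (words, tags) maximizing an interpolated score.

    lam = 0 uses only the monolingual table; lam = 1 uses only the
    translation table. Runs in O(len(chars) * max_len * len(target_words)).
    """
    n = len(chars)
    best = [float("-inf")] * (n + 1)  # best[i]: best score of chars[:i]
    back = [0] * (n + 1)              # back[i]: start of the last word in chars[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = chars[start:end]
            mono = math.log(max(mono_prob.get(word, 0.0), FLOOR))
            # Best translation probability against any target-side word.
            bi = math.log(max([trans_prob.get((word, e), 0.0)
                               for e in target_words] + [FLOOR]))
            score = best[start] + (1 - lam) * mono + lam * bi
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow back-pointers to recover the words, then derive B/I tags.
    words, end = [], n
    while end > 0:
        words.append(chars[back[end]:end])
        end = back[end]
    words.reverse()
    tags = [t for w in words for t in "B" + "I" * (len(w) - 1)]
    return words, tags

# Toy tables (invented numbers, for illustration only).
MONO = {"北京大学": 0.05, "北京": 0.02, "大学": 0.02}
TRANS = {("北京", "Peking"): 0.9, ("大学", "University"): 0.9,
         ("北京大学", "Peking"): 0.1, ("北京大学", "University"): 0.1}

for lam in (0.0, 0.9):
    print(lam, segment("北京大学", ["Peking", "University"], MONO, TRANS, lam=lam))
```

With these toy numbers, the monolingual-only setting (`lam=0.0`) keeps 北京大学 as a single token, while a bilingually weighted setting (`lam=0.9`) splits it into 北京 and 大学, each of which pairs with one English word. This is the kind of trade-off the balancing control is meant to expose.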

Editor information

Alexander Gelbukh

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, CC., Chen, WT., Chang, J.S. (2008). Bilingual Segmentation for Alignment and Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_38

  • DOI: https://doi.org/10.1007/978-3-540-78135-6_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78134-9

  • Online ISBN: 978-3-540-78135-6

  • eBook Packages: Computer Science, Computer Science (R0)
