
PEACook: Post-editing Advancement Cookbook

Conference paper
Machine Translation (CCMT 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1671)


Abstract

Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE model training has made great progress since 2015; however, whether APE models truly perform well on domain-specific samples remains an open question, and achieving this is still a hard task. This paper provides a mobile-domain APE corpus with 50.1 TER / 37.4 BLEU for the En-Zh language pair. This corpus is much more practical than those provided in the WMT 2021 APE tasks (18.05 TER / 71.07 BLEU for En-De, 22.73 TER / 69.2 BLEU for En-Zh) [1]. To investigate the presented corpus more comprehensively, this paper provides two mainstream models as Cookbook baselines: (1) an autoregressive translation APE model (AR-APE) based on HW-TSC APE 2020 [2], the SOTA model of the WMT 2020 APE task, and (2) a non-autoregressive translation APE model (NAR-APE) based on the well-known Levenshtein Transformer [3]. Experiments show that both the mainstream AR and NAR models effectively improve APE quality. The corpus has been released in the CCMT 2022 APE evaluation task, and the baseline models will be open-sourced.
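The TER/BLEU figures above are corpus-level scores of the raw MT output against the human post-edits: a higher TER (more edits needed to reach the post-edit) and a lower BLEU mean the MT baseline leaves more realistic headroom for an APE model, which is the sense in which the 50.1 TER corpus is more practical than the near-saturated WMT 2021 data. As a minimal sketch of how such figures can be computed, assuming the standard sacrebleu metrics (the paper does not name its scoring tool; the sample sentences and tokenization settings below are illustrative assumptions):

# Minimal sketch: corpus-level TER/BLEU of MT output against human
# post-edits with sacrebleu. The mini-corpus and tokenization settings
# are illustrative assumptions, not the PEACook data or the authors'
# actual scoring configuration.
from sacrebleu.metrics import BLEU, TER

# Hypothetical En-Zh sample: raw MT hypotheses and their human post-edits.
mt_output = ["手机 将 在 三 秒 后 重启 。"]
post_edits = ["手机 将 在 3 秒 后 重新 启动 。"]

bleu = BLEU(tokenize="zh")      # character-level tokenization for Chinese
ter = TER(asian_support=True)   # TER with Asian-language tokenization

# sacrebleu expects a list of reference streams, hence the extra brackets.
print(f"TER:  {ter.corpus_score(mt_output, [post_edits]).score:.2f}")
print(f"BLEU: {bleu.corpus_score(mt_output, [post_edits]).score:.2f}")

On a full corpus, the same two calls over all segment pairs yield the single corpus-level TER and BLEU numbers quoted in the abstract.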


References

  1. Akhbardeh, F., et al.: Findings of the 2021 conference on machine translation (WMT21). In: Proceedings of the Sixth Conference on Machine Translation, pp. 1–88. Association for Computational Linguistics, November 2021 (Online)


  2. Yang, H., et al.: HW-TSC’s participation at WMT 2020 automatic post editing shared task. In: Proceedings of the Fifth Conference on Machine Translation, pp. 797–802. Association for Computational Linguistics, November 2020 (Online)


  3. Gu, J., Wang, C., Zhao, J.: Levenshtein transformer. In: Advances in Neural Information Processing Systems, vol. 32 (2019)


  4. Bojar, O., et al.: Findings of the 2015 workshop on statistical machine translation. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 1–46. Association for Computational Linguistics, September 2015


  5. Junczys-Dowmunt, M.: Are we experiencing the golden age of automatic post-editing? In: Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing, Boston, MA, pp. 144–206. Association for Machine Translation in the Americas, March 2018


  6. Akhbardeh, F., et al.: Findings of the 2021 conference on machine translation (WMT21). In: Proceedings of the Sixth Conference on Machine Translation, pp. 1–88 (2021)


  7. Chatterjee, R., Federmann, C., Negri, M., Turchi, M.: Findings of the WMT 2020 shared task on automatic post-editing. In: Barrault, L., et al. (eds.) Proceedings of the Fifth Conference on Machine Translation, WMT@EMNLP 2020, 19–20 November 2020, pp. 646–659. Association for Computational Linguistics (2020, Online)


  8. Gu, J., Bradbury, J., Xiong, C., Li, V.O.K., Socher, R.: Non-autoregressive neural machine translation. In: International Conference on Learning Representations (2018)


  9. Snover, M., Dorr, B.J., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, Cambridge, MA, pp. 223–231 (2006)


  10. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318. Association for Computational Linguistics, July 2002


  11. Chatterjee, R., Freitag, M., Negri, M., Turchi, M.: Findings of the WMT 2020 shared task on automatic post-editing. In: Proceedings of the Fifth Conference on Machine Translation, pp. 646–659. Association for Computational Linguistics, November 2020 (Online)


  12. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017, pp. 5998–6008 (2017)


  13. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics (2019)


  14. Lopes, A.V., Farajian, M.A., Correia, G.M., Trénous, J., Martins, A.F.: Unbabel's submission to the WMT2019 APE shared task: BERT-based encoder-decoder for automatic post-editing. CoRR abs/1905.13068 (2019)


  15. Gu, J., Wang, C., Zhao, J.: Levenshtein transformer. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019, pp. 11179–11189 (2019)


  16. Chollampatt, S., Susanto, R.H., Tan, L., Szymanska, E.: Can automatic post-editing improve NMT? arXiv preprint arXiv:2009.14395 (2020)

  17. Wang, M., et al.: HW-TSC’s participation at WMT 2020 quality estimation shared task. In: Proceedings of the Fifth Conference on Machine Translation, pp. 1056–1061. Association for Computational Linguistics, November 2020 (Online)


  18. Yang, H., et al.: HW-TSC's submissions to the WMT21 biomedical translation task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 879–884 (2021)


  19. Ng, N., Yee, K., Baevski, A., Ott, M., Auli, M., Edunov, S.: Facebook FAIR's WMT19 news translation task submission. In: Bojar, O., et al. (eds.) Proceedings of the Fourth Conference on Machine Translation, WMT 2019, Florence, Italy, 1–2 August 2019 - Volume 2: Shared Task Papers, Day 1, pp. 314–319. Association for Computational Linguistics (2019)



Author information


Corresponding author

Correspondence to Hao Yang.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Tao, S. et al. (2022). PEACook: Post-editing Advancement Cookbook. In: Xiao, T., Pino, J. (eds) Machine Translation. CCMT 2022. Communications in Computer and Information Science, vol 1671. Springer, Singapore. https://doi.org/10.1007/978-981-19-7960-6_1


  • DOI: https://doi.org/10.1007/978-981-19-7960-6_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7959-0

  • Online ISBN: 978-981-19-7960-6

  • eBook Packages: Computer Science, Computer Science (R0)
