
Models in the Wild: On Corruption Robustness of Neural NLP Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11955)

Abstract

Natural Language Processing models lack a unified approach to robustness testing. In this paper we introduce WildNLP, a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspellings occur. We compare the robustness of deep learning models across four popular NLP tasks (Q&A, NLI, NER, and Sentiment Analysis) by testing their performance on the corruption aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art contextualized text representations and non-contextualized word embeddings. To improve robustness, we perform adversarial training on selected aspects and check whether the resulting robustness transfers to other corruption types. We find that high model performance does not guarantee sufficient robustness, although modern embedding techniques help to improve it. We release the code of the WildNLP framework for the community.
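
To illustrate the kind of corruption aspect the framework applies, below is a minimal Python sketch of a keyboard-error ("Qwerty") corruption. It is a hypothetical illustration rather than the actual WildNLP API: the neighbour map, function name, and corruption probability are assumptions introduced here for exposition only.

import random

# Hypothetical sketch of a keyboard-error corruption aspect.
# NOT the WildNLP API: the neighbour map, function name and default
# probability are illustrative assumptions.
QWERTY_NEIGHBOURS = {
    "q": "wa", "w": "qes", "e": "wrd", "r": "etf", "t": "ryg",
    "y": "tuh", "u": "yij", "i": "uok", "o": "ipl", "p": "o",
    "a": "qsz", "s": "adwx", "d": "sfec", "f": "dgrv", "g": "fhtb",
    "h": "gjyn", "j": "hkum", "k": "jli", "l": "ko",
    "z": "xas", "x": "zcsd", "c": "xvdf", "v": "cbfg",
    "b": "vngh", "n": "bmjh", "m": "nkj",
}

def qwerty_corrupt(text, prob=0.05, seed=None):
    """Replace each letter with an adjacent key with probability `prob`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        neighbours = QWERTY_NEIGHBOURS.get(ch.lower())
        if neighbours and rng.random() < prob:
            repl = rng.choice(neighbours)
            out.append(repl.upper() if ch.isupper() else repl)
        else:
            out.append(ch)
    return "".join(out)

# Example: corrupt an evaluation sentence before feeding it to a model.
print(qwerty_corrupt("The quick brown fox jumps over the lazy dog.", prob=0.1, seed=42))

In such a setup, the corruption is applied to the evaluation data of each task and the model's score on the corrupted input is compared with its score on clean input to quantify robustness.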

B. Rychalska and D. Basaj—Equal contribution.


Notes

  1. https://github.com/MI2DataLab/WildNLP/.

  2. https://en.wikipedia.org/wiki/Commonly_misspelled_English_words.

  3. https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/Homophones.

  4. https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.

  5. https://github.com/huggingface/pytorch-pretrained-BERT.

Acknowledgements

Barbara Rychalska and Dominika Basaj were financially supported by grant no. 2018/31/N/ST6/02273 funded by the National Science Centre, Poland. Our research was partially supported as part of the RENOIR Project by the European Union's Horizon 2020 Research and Innovation Programme under Marie Skłodowska-Curie grant agreement No. 691152 and by the Ministry of Science and Higher Education (Poland), grant No. W34/H2020/2016.

Author information

Corresponding author

Correspondence to Barbara Rychalska.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Rychalska, B., Basaj, D., Gosiewska, A., Biecek, P. (2019). Models in the Wild: On Corruption Robustness of Neural NLP Systems. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol. 11955. Springer, Cham. https://doi.org/10.1007/978-3-030-36718-3_20

  • DOI: https://doi.org/10.1007/978-3-030-36718-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36717-6

  • Online ISBN: 978-3-030-36718-3

  • eBook Packages: Computer Science, Computer Science (R0)
