Unsupervised Layer-Wise Score Aggregation for Textual OOD Detection

Authors

  • Maxime Darrin International Laboratory on Learning Systems MILA - Quebec AI Institute McGill University Université Paris-Saclay
  • Guillaume Staerman Université Paris-Saclay CNRS INRIA, CEA, Paris
  • Eduardo Dadalto Camara Gomes Université Paris-Saclay Laboratoire signaux et systèmes CNRS CentraleSupelec
  • Jackie C. K. Cheung McGill University MILA - Quebec AI Institute Canada CIFAR AI Chair, Mila
  • Pablo Piantanida International Laboratory on Learning Systems MILA - Quebec AI Institute Université Paris-Saclay CNRS
  • Pierre Colombo Université Paris-Saclay CentraleSupelec Equal, Paris MICS

DOI:

https://doi.org/10.1609/aaai.v38i16.29742

Keywords:

NLP: Safety and Robustness, NLP: Ethics -- Bias, Fairness, Transparency & Privacy, NLP: Text Classification

Abstract

Out-of-distribution (OOD) detection is a rapidly growing field due to new robustness and security requirements driven by an increased number of AI-based systems. Existing OOD textual detectors often rely on anomaly scores (e.g., Mahalanobis distance) computed on the embedding output of the last layer of the encoder. In this work, we observe that OOD detection performance varies greatly depending on the task and layer output. More importantly, we show that the usual choice (the last layer) is rarely the best one for OOD detection and that far better results can be achieved, provided that an oracle selects the best layer. We propose a data-driven, unsupervised method to leverage this observation to combine layer-wise anomaly scores. In addition, we extend classical textual OOD benchmarks by including classification tasks with a larger number of classes (up to 150), which reflects more realistic settings. On this augmented benchmark, we show that the proposed post-aggregation methods achieve robust and consistent results comparable to using the best layer according to an oracle while removing manual feature selection altogether.
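
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes a Mahalanobis anomaly score on each layer's embeddings and then combines the per-layer scores without labels. The abstract does not specify the aggregation rule, so the average of standardized scores used here is an assumption chosen for illustration. The function names and the synthetic Gaussian "embeddings" are likewise hypothetical stand-ins for real encoder outputs.

```python
import numpy as np

def fit_gaussian(feats):
    """Estimate the mean and inverse covariance of in-distribution features."""
    mean = feats.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x to the fitted Gaussian."""
    d = x - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def layerwise_scores(layers_ref, layers_test):
    """Return an (n_test, n_layers) matrix of per-layer anomaly scores."""
    cols = []
    for ref, test in zip(layers_ref, layers_test):
        mean, cov_inv = fit_gaussian(ref)
        cols.append(mahalanobis_score(test, mean, cov_inv))
    return np.stack(cols, axis=1)

def aggregate(scores_ref, scores_test):
    """Assumed aggregator: standardize each layer's scores using the
    reference set, then average across layers (no labels needed)."""
    mu = scores_ref.mean(axis=0)
    sigma = scores_ref.std(axis=0) + 1e-12
    return ((scores_test - mu) / sigma).mean(axis=1)

# Synthetic stand-ins for encoder embeddings: 3 layers of dimension 8.
rng = np.random.default_rng(0)
n_layers, dim = 3, 8
layers_ref = [rng.normal(size=(200, dim)) for _ in range(n_layers)]
layers_id = [rng.normal(size=(50, dim)) for _ in range(n_layers)]
layers_ood = [rng.normal(loc=3.0, size=(50, dim)) for _ in range(n_layers)]

# Score the reference set against itself to calibrate the aggregator.
ref_scores = layerwise_scores(layers_ref, layers_ref)
agg_id = aggregate(ref_scores, layerwise_scores(layers_ref, layers_id))
agg_ood = aggregate(ref_scores, layerwise_scores(layers_ref, layers_ood))

print("mean aggregated score, in-distribution:", agg_id.mean())
print("mean aggregated score, out-of-distribution:", agg_ood.mean())
```

In this toy setup the shifted samples receive higher aggregated scores than the in-distribution ones. With a real encoder, each entry of `layers_ref` would hold the hidden states of one layer for the training set, and a more expressive unsupervised aggregator could replace the simple average.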

Published

2024-03-24

How to Cite

Darrin, M., Staerman, G., Dadalto Camara Gomes, E., Cheung, J. C. K., Piantanida, P., & Colombo, P. (2024). Unsupervised Layer-Wise Score Aggregation for Textual OOD Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17880-17888. https://doi.org/10.1609/aaai.v38i16.29742

Section

AAAI Technical Track on Natural Language Processing I