Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention

Authors

  • Zhen Tan, Arizona State University
  • Tianlong Chen, University of North Carolina at Chapel Hill
  • Zhenyu Zhang, University of Texas at Austin
  • Huan Liu, Arizona State University

DOI:

https://doi.org/10.1609/aaai.v38i19.30160

Keywords:

General

Abstract

Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension, occasionally falling short in providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Codes are provided in supplements.

Published

2024-03-24

How to Cite

Tan, Z., Chen, T., Zhang, Z., & Liu, H. (2024). Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21619-21627. https://doi.org/10.1609/aaai.v38i19.30160

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track