Who Knows the Answer? Finding the Best Model and Prompt for Each Query Using Confidence-Based Search

Authors

  • Walter Gerych MIT CSAIL
  • Yara Rizk IBM Research
  • Vatche Isahagian IBM Research
  • Vinod Muthusamy IBM Research
  • Evelyn Duesterwald IBM Research
  • Praveen Venkateswaran IBM Research

DOI:

https://doi.org/10.1609/aaai.v38i16.29763

Keywords:

NLP: (Large) Language Models

Abstract

There are increasingly many large language models (LLMs) available to the public. While these LLMs have exhibited impressive abilities on a variety of tasks, any individual LLM in particular may do well on some tasks and worse on others. Additionally, the performance of these models is heavily dependent on the choice of prompt template used. For instance, they exhibit sensitivity to the few-shot examples chosen or brittleness to the wording of instructions. Moreover, a prompt template that makes a model perform well for one input may not be the optimal template for another input. This necessitates an approach for adaptively selecting LLM and prompt template pairs for each input. Recent work has shown that the accuracy of an LLM's responses is correlated with the LLM's confidence in the response. Thus, a natural choice for selecting which model and prompt template to use is to select the pair that is most confident in its response. However, existing confidence metrics are expensive to calculate, necessitating multiple calls to each LLM and prompt pair. We thus propose an approach to predict the confidence of each pair using an auxiliary regression model that is inexpensive to run. Using this auxiliary model, we select the LLM and prompt template with the highest predicted confidence for a given input. Results on a range of benchmark datasets show that our confidence-based instance-level prompt search method consistently improves the performance of LLMs.
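
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vector, the linear form of the auxiliary regressors, the weights, and the names `select_pair`, `linear_predictor`, `model_a`, `model_b`, `few_shot`, and `zero_shot` are all assumptions made for illustration. The only idea taken from the abstract is that a cheap predictor estimates the confidence of each LLM and prompt template pair, and the pair with the highest predicted confidence is chosen for the input.

```python
from typing import Callable, Dict, Sequence, Tuple

# A candidate is a (model name, prompt template name) pair.
# Both names are hypothetical placeholders.
Pair = Tuple[str, str]
ConfidencePredictor = Callable[[Sequence[float]], float]


def linear_predictor(weights: Sequence[float], bias: float) -> ConfidencePredictor:
    """Build a toy linear regressor standing in for an auxiliary model.

    Assumption: the real auxiliary regressor could be any inexpensive model;
    a linear form is used here only to keep the sketch self-contained.
    """
    def predict(features: Sequence[float]) -> float:
        return bias + sum(w * x for w, x in zip(weights, features))
    return predict


def select_pair(
    features: Sequence[float],
    predictors: Dict[Pair, ConfidencePredictor],
) -> Tuple[Pair, Dict[Pair, float]]:
    """Return the pair with the highest predicted confidence, plus all scores."""
    scores = {pair: predict(features) for pair, predict in predictors.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up weights: one predictor per (model, prompt template) pair.
predictors = {
    ("model_a", "few_shot"): linear_predictor([0.5, -0.2], 0.1),
    ("model_a", "zero_shot"): linear_predictor([0.1, 0.3], 0.2),
    ("model_b", "few_shot"): linear_predictor([-0.3, 0.6], 0.3),
}

# Two hypothetical inputs, each already encoded as a feature vector.
best_first, scores_first = select_pair([1.0, 0.0], predictors)
best_second, scores_second = select_pair([0.0, 1.0], predictors)
```

In this toy setup the two inputs are routed to different pairs, which is the instance-level behaviour the abstract motivates: no LLM is called during selection, only the inexpensive predictors are evaluated.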

Published

2024-03-24

How to Cite

Gerych, W., Rizk, Y., Isahagian, V., Muthusamy, V., Duesterwald, E., & Venkateswaran, P. (2024). Who Knows the Answer? Finding the Best Model and Prompt for Each Query Using Confidence-Based Search. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18065-18072. https://doi.org/10.1609/aaai.v38i16.29763

Issue

Section

AAAI Technical Track on Natural Language Processing I