Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization

Authors

  • Beier Zhu, Nanyang Technological University
  • Yulei Niu, Columbia University
  • Saeil Lee, HMGICS AIR Center
  • Minhoe Hur, AIRS Company, Hyundai Motor Group
  • Hanwang Zhang, Nanyang Technological University

DOI:

https://doi.org/10.1609/aaai.v37i3.25496

Keywords:

CV: Bias, Fairness & Privacy, CV: Language and Vision, ML: Bias and Fairness, ML: Classification and Regression

Abstract

We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg). Unlike traditional fine-tuning, which easily overfits to the downstream task data, ProReg uses the prediction obtained by prompting the pretrained model to regularize the fine-tuning. The motivation is that, by prompting the large model with “a photo of a [CLASS]”, the fill-in answer depends only on the encyclopedic knowledge acquired during pretraining and is independent of the task data distribution, which is usually biased. Specifically, for each training sample prediction during fine-tuning, we first calculate the Kullback-Leibler loss with respect to the prompt prediction and the cross-entropy loss with respect to the ground-truth label, and then combine them with a proposed sample-wise adaptive trade-off weight, which automatically adjusts the transfer between the pretrained and downstream domains. On various out-of-distribution benchmarks, we show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompting, prompt tuning, and other state-of-the-art methods.
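
The objective described in the abstract, a per-sample cross-entropy loss on the ground-truth label combined with a KL term toward the frozen zero-shot prompt prediction through a sample-wise adaptive weight, can be sketched roughly as follows. This is a minimal illustrative sketch rather than the authors' released code: the function name proreg_loss is hypothetical, and the confidence-based weight alpha merely stands in for the paper's adaptive trade-off weight, whose exact form is not given here.

```python
import torch
import torch.nn.functional as F

def proreg_loss(logits_ft, logits_prompt, labels):
    """Sketch of a ProReg-style objective: cross-entropy to the ground-truth
    label plus KL to the frozen zero-shot prompt prediction, mixed per sample."""
    ce = F.cross_entropy(logits_ft, labels, reduction="none")        # per-sample CE loss
    log_p_ft = F.log_softmax(logits_ft, dim=-1)                      # fine-tuned model distribution (log-probs)
    p_prompt = F.softmax(logits_prompt, dim=-1).detach()             # zero-shot prompt prediction (frozen)
    kl = F.kl_div(log_p_ft, p_prompt, reduction="none").sum(dim=-1)  # per-sample KL divergence
    # Illustrative sample-wise trade-off: lean on the prompt prediction more
    # when it is confident (NOT the paper's exact weighting scheme).
    alpha = p_prompt.max(dim=-1).values
    return ((1 - alpha) * ce + alpha * kl).mean()

if __name__ == "__main__":
    logits_ft = torch.randn(4, 10, requires_grad=True)   # tunable model outputs
    logits_prompt = torch.randn(4, 10)                    # frozen prompted-model outputs
    labels = torch.randint(0, 10, (4,))
    proreg_loss(logits_ft, logits_prompt, labels).backward()
```

Gradients flow only through the fine-tuned model's logits; the prompted prediction is detached, so it acts purely as a regularization target.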

Published

2023-06-26

How to Cite

Zhu, B., Niu, Y., Lee, S., Hur, M., & Zhang, H. (2023). Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3834-3842. https://doi.org/10.1609/aaai.v37i3.25496

Section

AAAI Technical Track on Computer Vision III