Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Oh, Changdae; Lim, Hyesu; Kim, Mijoo; Choo, Jaegul; Hauptmann, Alexander; Cheng, Zhi-Qi; Song, Kyungwoo

Computer Science > Computer Vision and Pattern Recognition

arXiv:2311.01723 (cs)

[Submitted on 3 Nov 2023 (v1), last revised 12 Feb 2024 (this version, v4)]

Abstract: Robust fine-tuning aims to ensure performance on out-of-distribution (OOD) samples, which is sometimes compromised by pursuing adaptation on in-distribution (ID) samples. However, another criterion for reliable machine learning, confidence calibration, has been overlooked despite the increasing demand for it in real-world high-stakes applications, e.g., autonomous driving. We raise concerns about the calibration of fine-tuned vision-language models (VLMs) under distribution shift by showing that naive fine-tuning and even state-of-the-art robust fine-tuning hurt the calibration of pre-trained VLMs, especially on OOD datasets. We first show that the OOD calibration error is bounded from above by the ID calibration error and the domain discrepancy between ID and OOD. Based on this analysis, we propose CaRot, a calibrated robust fine-tuning method that incentivizes ID calibration and robust prediction across domains to reduce the upper bound of the OOD calibration error. Extensive experiments on three types of distribution shifts (natural, synthetic, and adversarial) on ImageNet-1K classification demonstrate the effectiveness of CaRot across diverse environments. We justify the empirical success of CaRot through our theoretical analysis.
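
For readers unfamiliar with calibration error, the sketch below computes the expected calibration error (ECE), a common way to quantify how far a classifier's confidence is from its accuracy. The abstract does not say which calibration metric the paper uses, so the choice of ECE, the equal-width binning, and the names `expected_calibration_error`, `confidences`, `correct`, and `n_bins` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (illustrative; not taken from the paper).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     whether each prediction matched the true label.
    Returns the sample-weighted mean of |accuracy - confidence| over bins.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += 1.0 if ok else 0.0
        count[k] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[k] / count[k]
        avg_acc = hit_sum[k] / count[k]
        ece += (count[k] / n) * abs(avg_acc - avg_conf)
    return ece


if __name__ == "__main__":
    # A model that is always fully confident but right only half the time.
    print(expected_calibration_error([1.0] * 4, [True, False, True, False]))
```

Under this definition, a lower value means confidence tracks accuracy more closely. Comparing the value on an ID test set with the value on a shifted test set is one simple way to observe the kind of OOD calibration degradation the abstract describes.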

Comments:	Presented at the NeurIPS 2023 Workshop on Distribution Shifts (DistShift)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.01723 [cs.CV]
	(or arXiv:2311.01723v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.01723

Submission history

From: Changdae Oh
[v1] Fri, 3 Nov 2023 05:41:25 UTC (2,774 KB)
[v2] Mon, 6 Nov 2023 17:52:05 UTC (2,774 KB)
[v3] Thu, 30 Nov 2023 00:07:54 UTC (2,774 KB)
[v4] Mon, 12 Feb 2024 02:57:26 UTC (3,951 KB)
