Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning

Authors

  • Shuai Shao, Zhejiang Lab
  • Yu Bai, Zhejiang Lab; China University of Petroleum (East China)
  • Yan Wang, Beihang University
  • Baodi Liu, China University of Petroleum (East China)
  • Bin Liu, Zhejiang Lab

DOI:

https://doi.org/10.1609/aaai.v38i5.28275

Keywords:

CV: Object Detection & Categorization, CV: Multi-modal Vision, ML: Ensemble Methods, ML: Multimodal Learning

Abstract

Open-World Few-Shot Learning (OFSL) is a crucial research field dedicated to accurately identifying target samples in scenarios where data is limited and labels are unreliable, a setting with significant practical relevance to real-world applications. Recently, foundation models such as CLIP and DINO have showcased robust representation capabilities even in resource-constrained settings with scarce data. This realization has brought about a transformative shift in focus, moving away from “building models from scratch” towards “effectively harnessing the potential of foundation models to extract pertinent prior knowledge suitable for OFSL and utilizing it sensibly”. Motivated by this perspective, we introduce the Collaborative Consortium of Foundation Models (CO3), which leverages CLIP, DINO, GPT-3, and DALL-E to collectively address the OFSL problem. CO3 comprises four key blocks: (1) the Label Correction Block (LC-Block) corrects unreliable labels, (2) the Data Augmentation Block (DA-Block) enhances the available data, (3) the Feature Extraction Block (FE-Block) extracts multi-modal features, and (4) the Text-guided Fusion Adapter (TeFu-Adapter) integrates the multiple features while mitigating the impact of noisy labels through semantic constraints. Only the adapter's parameters are trainable; the foundation models remain frozen. Through collaboration among these foundation models, CO3 effectively unlocks their potential and unifies their capabilities to achieve state-of-the-art performance on multiple benchmark datasets. Code is available at https://github.com/The-Shuai/CO3.
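To give a rough sense of the fusion-adapter idea described in the abstract, below is a minimal PyTorch-style sketch of a text-guided adapter that combines frozen CLIP and DINO image features and scores them against frozen CLIP text embeddings of the class names. The module name, feature dimensions, and the weighted-fusion scheme are illustrative assumptions, not the authors' implementation; consult the linked repository for the actual CO3 code.

# Minimal sketch (assumed names and dimensions, not the paper's code):
# only the adapter parameters below would be trained; the backbones that
# produce clip_feat, dino_feat, and text_feat are kept frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeFuAdapterSketch(nn.Module):
    """Fuses frozen CLIP and DINO image features and classifies them
    against frozen CLIP text embeddings of the class names."""

    def __init__(self, clip_dim=512, dino_dim=768, shared_dim=512):
        super().__init__()
        # shared_dim is kept equal to the CLIP text-embedding dimension
        # so the cosine-similarity logits below are well defined.
        self.clip_adapter = nn.Linear(clip_dim, shared_dim)
        self.dino_adapter = nn.Linear(dino_dim, shared_dim)
        # Learnable scalar balancing the two visual streams.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.logit_scale = nn.Parameter(torch.tensor(4.0))

    def forward(self, clip_feat, dino_feat, text_feat):
        # clip_feat: (B, clip_dim), dino_feat: (B, dino_dim),
        # text_feat: (C, shared_dim) -- one CLIP text embedding per class.
        v_clip = F.normalize(self.clip_adapter(clip_feat), dim=-1)
        v_dino = F.normalize(self.dino_adapter(dino_feat), dim=-1)
        fused = F.normalize(self.alpha * v_clip + (1 - self.alpha) * v_dino, dim=-1)
        text = F.normalize(text_feat, dim=-1)
        # Cosine-similarity logits against the class text embeddings act as
        # the text-guided (semantic) constraint on the fused visual feature.
        return self.logit_scale.exp() * fused @ text.t()

# Usage with random stand-ins for the frozen backbone outputs:
if __name__ == "__main__":
    adapter = TeFuAdapterSketch()
    clip_feat = torch.randn(8, 512)   # frozen CLIP image features
    dino_feat = torch.randn(8, 768)   # frozen DINO image features
    text_feat = torch.randn(10, 512)  # frozen CLIP text features (10 classes)
    logits = adapter(clip_feat, dino_feat, text_feat)
    print(logits.shape)  # torch.Size([8, 10])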

Published

2024-03-24

How to Cite

Shao, S., Bai, Y., Wang, Y., Liu, B., & Liu, B. (2024). Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4740-4747. https://doi.org/10.1609/aaai.v38i5.28275

Issue

Vol. 38 No. 5 (2024)

Section

AAAI Technical Track on Computer Vision IV