Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance

Authors

  • Dong Chen, Zhejiang University
  • Yueting Zhuang, Zhejiang University
  • Shuo Zhang, Zhejiang University
  • Jinfeng Liu, Ant Group
  • Su Dong, Ant Group
  • Siliang Tang, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v38i10.29003

Keywords:

ML: Multimodal Learning, CV: Large Vision Models, NLP: (Large) Language Models

Abstract

Pretrained large models, particularly large language models, have garnered increasing attention, as they have demonstrated remarkable abilities through in-context learning, and they are increasingly recognized as fundamental tools for solving various tasks. However, the substantial computational demands of large models have dissuaded most product teams and individuals from running them. In such scenarios, to leverage the exceptional performance of large models, one must rely solely on costly APIs, further burdening product teams and individuals. On the other hand, despite the overall inferior performance of small models compared to large models, there are certain distributions on which small models can achieve comparable or even superior results. For instance, during training, a small model may settle into a local optimum that is specific to certain distributions, yielding superior performance on them. Hence, we propose Data Shunt (DS), a general paradigm for the collaboration of small and large models. DS not only substantially reduces the cost associated with deploying large models but also effectively enhances overall performance. Specifically, DS determines the shunting direction by evaluating the confidence of the small model: when the confidence falls below a specific threshold, the input data is forwarded to the large model. To further exploit the respective advantages of small and large models, we introduce Prompt Pruning (PP) and 2-Stage Confidence Distillation (2CD), which facilitate mutual collaboration, leading to better results at lower cost. The remarkable performance across diverse modalities and tasks demonstrates the superiority of the proposed DS over using large models alone. For instance, ChatGPT achieves an accuracy of 94.43% on Amazon Product sentiment analysis, whereas DS achieves 95.64% while reducing the cost to only 31.18%. The code for the proposed method is provided for research purposes at https://github.com/Anfeather/Data-Shunt.
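
The abstract describes the core routing rule: answer with the small model when it is confident, and forward the input to the large model otherwise. The following Python snippet is a minimal sketch of that confidence-threshold shunt only; the function names, stub models, and the threshold value of 0.9 are illustrative assumptions, not the paper's actual implementation (which additionally uses Prompt Pruning and 2-Stage Confidence Distillation).

```python
def small_model_predict(text):
    """Hypothetical small model: returns (label, confidence).
    In practice this would be a cheap, locally hosted classifier."""
    # Placeholder heuristic so the sketch runs without a trained model.
    confidence = 0.5 + 0.05 * (len(text) % 10)
    return "positive", confidence

def large_model_predict(text):
    """Hypothetical stand-in for a costly large-model API call (e.g. ChatGPT)."""
    return "positive"

def data_shunt(text, threshold=0.9):
    """Keep the small model's answer if its confidence meets the threshold;
    otherwise shunt the input to the large model."""
    label, confidence = small_model_predict(text)
    if confidence >= threshold:
        return label, "small"
    return large_model_predict(text), "large"

if __name__ == "__main__":
    reviews = [
        "Great product, works exactly as described.",
        "Arrived broken and support never replied.",
    ]
    for review in reviews:
        label, route = data_shunt(review)
        print(f"{route:>5} -> {label}: {review}")
```

Under this scheme, the fraction of inputs routed to the large model (and hence the API cost) is controlled by the confidence threshold: raising it sends more data to the large model, lowering it keeps more data on the small model.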

Published

2024-03-24

How to Cite

Chen, D., Zhuang, Y., Zhang, S., Liu, J., Dong, S., & Tang, S. (2024). Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11249-11257. https://doi.org/10.1609/aaai.v38i10.29003

Section

AAAI Technical Track on Machine Learning I