MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts

Authors

  • Zhitian Xie Ant Group
  • Yinger Zhang Zhejiang University
  • Chenyi Zhuang Ant Group
  • Qitao Shi Ant Group
  • Zhining Liu Ant Group
  • Jinjie Gu Ant Group
  • Guannan Zhang Ant Group

DOI:

https://doi.org/10.1609/aaai.v38i14.29539

Keywords:

ML: Deep Learning Algorithms

Abstract

Mixture-of-experts (MoE) models are gaining popularity because of their ability to improve a model's performance. In an MoE structure, the gate layer plays a significant role in distinguishing and routing input features to different experts, which enables each expert to specialize in processing its corresponding sub-task. However, the gate's routing mechanism also gives rise to "narrow vision": each individual expert fails to use more samples in learning its allocated sub-task, which in turn limits the MoE's ability to further improve its generalization. To address this, we propose a method called Mixture-of-Distilled-Expert (MoDE), which applies moderate mutual distillation among experts so that each expert can pick up more features learned by the other experts and gain a more accurate perception of its allocated sub-task. We conduct extensive experiments on tabular, NLP, and CV datasets, which show MoDE's effectiveness, universality, and robustness. Furthermore, we develop a parallel study by constructing "expert probing" to show experimentally why MoDE works: moderately distilling knowledge from other experts can improve each individual expert's test performance on its assigned task, leading to an improvement in the MoE's overall performance.
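The idea of adding a mutual distillation term to an MoE objective can be illustrated with a minimal sketch. The abstract does not state the exact form of the loss, so everything here is an assumption made for illustration: scalar expert outputs, a squared-error task loss, a pairwise squared-difference distillation term, and the hypothetical names `mode_loss` and `alpha`. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_output(gate_scores, expert_outputs):
    """Gate-weighted sum of scalar expert outputs for one sample."""
    weights = softmax(gate_scores)
    return sum(w * y for w, y in zip(weights, expert_outputs))


def mutual_distillation_loss(expert_outputs):
    """Mean squared disagreement over all unordered expert pairs.

    Assumed form: the abstract only says experts distill from each other.
    """
    k = len(expert_outputs)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if not pairs:
        return 0.0
    total = sum((expert_outputs[i] - expert_outputs[j]) ** 2 for i, j in pairs)
    return total / len(pairs)


def mode_loss(gate_scores, expert_outputs, target, alpha=0.1):
    """Squared-error task loss plus an alpha-weighted distillation term.

    alpha is a hypothetical knob for how strongly experts are pulled together.
    """
    task = (moe_output(gate_scores, expert_outputs) - target) ** 2
    return task + alpha * mutual_distillation_loss(expert_outputs)


# Two experts with equal gate scores: the mixture hits the target exactly,
# so only the disagreement between the experts contributes to the loss.
loss = mode_loss([0.0, 0.0], [1.0, 3.0], target=2.0, alpha=0.1)
```

In this sketch, `alpha=0` reduces to a plain MoE objective, while a very large `alpha` would push all experts toward identical outputs and erase their specialization. That trade-off is one plausible reading of why the abstract stresses "moderate" distillation.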

Published

2024-03-24

How to Cite

Xie, Z., Zhang, Y., Zhuang, C., Shi, Q., Liu, Z., Gu, J., & Zhang, G. (2024). MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts. Proceedings of the AAAI Conference on Artificial Intelligence, 38(14), 16067-16075. https://doi.org/10.1609/aaai.v38i14.29539

Issue

Vol. 38 No. 14 (2024)

Section

AAAI Technical Track on Machine Learning V