PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Authors

  • Yuxuan Sun (College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China)
  • Chenglu Zhu (Research Center for Industries of the Future and School of Engineering, Westlake University, China)
  • Sunyi Zheng (Research Center for Industries of the Future and School of Engineering, Westlake University, China)
  • Kai Zhang (Department of Computer Science and Engineering, The Ohio State University, USA)
  • Lin Sun (School of Computer and Computing Science, Hangzhou City University, China)
  • Zhongyi Shui (College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China)
  • Yunlong Zhang (College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China)
  • Honglin Li (College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China)
  • Lin Yang (Research Center for Industries of the Future and School of Engineering, Westlake University, China)

DOI:

https://doi.org/10.1609/aaai.v38i5.28308

Keywords:

CV: Language and Vision, CV: Medical and Biological Imaging, CV: Multi-modal Vision, NLP: (Large) Language Models, NLP: Applications, NLP: Language Grounding & Multi-modal NLP

Abstract

As large language models (LLMs) and multimodal techniques continue to mature, the development of general-purpose multimodal large language models (MLLMs) has surged, with significant applications in interpreting natural images. Pathology, however, has remained largely untapped, particularly with respect to gathering high-quality data and designing comprehensive model frameworks. To bridge this gap in pathology MLLMs, we present PathAsst, a multimodal generative foundation AI assistant intended to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities. First, we collect over 207K high-quality pathology image-text pairs from authoritative sources. Using ChatGPT, we generate over 180K instruction-following samples. We also devise additional instruction-following data specifically tailored for invoking eight pathology-specific sub-models that we prepared, so that PathAsst can collaborate effectively with these models and improve its diagnostic ability. Second, using the collected data, we construct PathCLIP, a pathology-dedicated CLIP, to enhance PathAsst's ability to interpret pathology images. Finally, we integrate PathCLIP with Vicuna-13B and use pathology-specific instruction-tuning data to enhance the multimodal generation capacity of PathAsst and strengthen its interaction with the sub-models. The experimental results show the potential of AI-powered generative foundation models to improve pathology diagnosis and treatment processes. We open-source our dataset, as well as a comprehensive toolkit for extensive pathology data collection and preprocessing, at https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology.
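
The abstract does not state the objective used to build PathCLIP. As a rough illustration only, the sketch below shows the symmetric contrastive loss that CLIP-style models are generally trained with, applied to a batch of paired image and text embeddings. The function names, the toy embeddings, and the temperature value of 0.07 are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    The temperature is an illustrative value, not one reported by the authors.
    """
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Transposing gives the text-to-image direction.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs (hypothetical data).
    image_batch = [[1.0, 0.0], [0.0, 1.0]]
    matched_text = [[1.0, 0.0], [0.0, 1.0]]
    swapped_text = [[0.0, 1.0], [1.0, 0.0]]
    print("matched pairs:", clip_contrastive_loss(image_batch, matched_text))
    print("swapped pairs:", clip_contrastive_loss(image_batch, swapped_text))
```

In this toy setup the loss is close to zero when each image embedding lines up with its own caption and large when the captions are swapped. Pulling matched pairs together and pushing mismatched pairs apart is the behavior a contrastively trained image-text encoder is optimized for.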

Published

2024-03-24

How to Cite

Sun, Y., Zhu, C., Zheng, S., Zhang, K., Sun, L., Shui, Z., Zhang, Y., Li, H., & Yang, L. (2024). PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 5034-5042. https://doi.org/10.1609/aaai.v38i5.28308

Issue

Section

AAAI Technical Track on Computer Vision IV