PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Yuxuan Sun; Chenglu Zhu; Sunyi Zheng; Kai Zhang; Lin Sun; Zhongyi Shui; Yunlong Zhang; Honglin Li; Lin Yang

doi:10.1609/aaai.v38i5.28308

Authors

Yuxuan Sun: College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China
Chenglu Zhu: Research Center for Industries of the Future and School of Engineering, Westlake University, China
Sunyi Zheng: Research Center for Industries of the Future and School of Engineering, Westlake University, China
Kai Zhang: Department of Computer Science and Engineering, The Ohio State University, USA
Lin Sun: School of Computer and Computing Science, Hangzhou City University, China
Zhongyi Shui: College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China
Yunlong Zhang: College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China
Honglin Li: College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China
Lin Yang: Research Center for Industries of the Future and School of Engineering, Westlake University, China

DOI:

https://doi.org/10.1609/aaai.v38i5.28308

Keywords:

CV: Language and Vision, CV: Medical and Biological Imaging, CV: Multi-modal Vision, NLP: (Large) Language Models, NLP: Applications, NLP: Language Grounding & Multi-modal NLP

Abstract

As large language models (LLMs) and multimodal techniques continue to mature, the development of general-purpose multimodal large language models (MLLMs) has surged, offering significant applications in interpreting natural images. However, the field of pathology has largely remained untapped, particularly in gathering high-quality data and designing comprehensive model frameworks. To bridge the gap in pathology MLLMs, we present PathAsst, a multimodal generative foundation AI assistant to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities. First, we collect over 207K high-quality pathology image-text pairs from authoritative sources. Using ChatGPT, we generate over 180K instruction-following samples. Furthermore, we devise additional instruction-following data specifically tailored for invoking eight pathology-specific sub-models we prepared, allowing PathAsst to collaborate effectively with these models and enhancing its diagnostic ability. Second, by leveraging the collected data, we construct PathCLIP, a pathology-dedicated CLIP, to enhance PathAsst's capabilities in interpreting pathology images. Finally, we integrate PathCLIP with Vicuna-13B and use pathology-specific instruction-tuning data to enhance the multimodal generation capacity of PathAsst and bolster its synergistic interactions with sub-models. The experimental results of PathAsst show the potential of harnessing AI-powered generative foundation models to improve pathology diagnosis and treatment processes. We open-source our dataset, as well as a comprehensive toolkit for extensive pathology data collection and preprocessing, at https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology.
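The abstract does not state the training objective used for PathCLIP. As a rough illustration of what adapting a CLIP model to image-text pairs typically involves, the sketch below assumes the standard CLIP symmetric contrastive loss: matched image and text embeddings in a batch are pulled together, and mismatched ones are pushed apart. The function names, the toy embeddings, and the temperature value are hypothetical and are not taken from the paper or its toolkit.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Assumption: pair i of image_embs matches pair i of text_embs, as in
    standard CLIP training. The temperature is an illustrative default.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should select its own column.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_t2i = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n

    return (loss_i2t + loss_t2i) / 2


# Toy batch: two image embeddings and their captions' embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = clip_contrastive_loss(images, aligned_texts)
shuffled_loss = clip_contrastive_loss(images, shuffled_texts)
```

In this toy batch, the correctly paired embeddings give a loss close to zero, while the shuffled pairing gives a much larger loss. That gap is the signal a contrastive objective would use to align pathology images with their descriptions.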
