Joint Foundation Model Caching and Inference of Generative AI Services for Edge Intelligence

Xu, Minrui; Niyato, Dusit; Zhang, Hongliang; Kang, Jiawen; Xiong, Zehui; Mao, Shiwen; Han, Zhu

Abstract: With the rapid development of artificial general intelligence (AGI), various multimedia services based on pretrained foundation models (PFMs) need to be effectively deployed. With edge servers that have cloud-level computing power, edge intelligence can extend the capabilities of AGI to mobile edge networks. However, compared with cloud data centers, resource-limited edge servers can cache and execute only a small number of PFMs, which typically consist of billions of parameters and require intensive computing power and GPU memory during inference. To address this challenge, we propose a joint foundation model caching and inference framework that balances the tradeoff among inference latency, accuracy, and resource consumption by efficiently managing cached PFMs and user requests during the provisioning of generative AI services. Specifically, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to model the freshness and relevance between examples in past demonstrations and current service requests. Based on the AoC, we propose a least context caching algorithm that manages cached PFMs at edge servers using historical prompts and inference results. Numerical results demonstrate that the proposed algorithm reduces system costs compared with existing baselines by effectively utilizing contextual information.
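The abstract does not give the AoC formula or the details of the caching algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a hypothetical freshness weight in which each past demonstration counts linearly less as it ages and counts for nothing after a `vanish` interval. On a cache miss with insufficient GPU memory, the cached model with the lowest freshness-weighted context score is evicted. The names `CachedModel`, `context_score`, `LeastContextCache`, and `vanish` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CachedModel:
    name: str
    size_gb: float
    # Arrival times of past demonstrations (prompt/result pairs) for this model.
    demo_times: list = field(default_factory=list)


def context_score(model, now, vanish=10.0):
    """Hypothetical freshness-weighted context count (an assumption, not the
    paper's AoC definition): a demonstration's weight decays linearly with
    its age and reaches zero once it is `vanish` time units old."""
    return sum(max(0.0, 1.0 - (now - t) / vanish) for t in model.demo_times)


class LeastContextCache:
    """Evicts the cached model with the smallest context score on overflow."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.models = {}

    def used_gb(self):
        return sum(m.size_gb for m in self.models.values())

    def request(self, name, size_gb, now):
        """Serve one request; return True on a cache hit, False on a miss."""
        hit = name in self.models
        if not hit:
            if size_gb > self.capacity_gb:
                raise ValueError("model does not fit in the cache")
            # Evict least-context models until the new model fits.
            while self.used_gb() + size_gb > self.capacity_gb:
                victim = min(self.models.values(),
                             key=lambda m: context_score(m, now))
                del self.models[victim.name]
            self.models[name] = CachedModel(name, size_gb)
        # Record this request as a new demonstration for the model.
        self.models[name].demo_times.append(now)
        return hit
```

For example, with a 10 GB cache holding model A (6 GB, requested at times 0 and 1) and model B (4 GB, requested at time 2), a request at time 3 for a 4 GB model C evicts B, because B has accumulated less fresh context than A.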

Subjects:	Networking and Internet Architecture (cs.NI)
Cite as:	arXiv:2305.12130 [cs.NI]
	(or arXiv:2305.12130v1 [cs.NI] for this version)
	https://doi.org/10.48550/arXiv.2305.12130
