SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Song, Jiwon; Oh, Kyungseok; Kim, Taesu; Kim, Hyungjun; Kim, Yulhwa; Kim, Jae-Joon


arXiv:2402.09025 (cs)

[Submitted on 14 Feb 2024]



Abstract: Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB successfully accelerates LLM inference without compromising the linguistic capabilities of these models, making it a promising technique for optimizing the efficiency of LLMs. The code is available at: this https URL
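The block-level redundancy described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how SLEB verifies redundancy, so the cosine-similarity score, the greedy single-block removal rule, and every name below (`make_block`, `neighbor_similarities`, `drop_most_redundant`) are assumptions made purely for illustration. The "blocks" are hypothetical residual functions on plain Python lists, not real transformer layers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_block(scale, shift):
    """Hypothetical residual block: h -> h + scale * tanh(h + shift)."""
    def block(h):
        return [x + scale * math.tanh(x + shift) for x in h]
    return block


def neighbor_similarities(blocks, h):
    """Similarity between each block's input (the previous block's
    output) and its own output, computed in one forward pass."""
    sims = []
    for block in blocks:
        out = block(h)
        sims.append(cosine(h, out))
        h = out
    return sims


def drop_most_redundant(blocks, h):
    """Remove the block whose output is most similar to its input.
    Assumed greedy rule for illustration only."""
    sims = neighbor_similarities(blocks, h)
    idx = max(range(len(sims)), key=sims.__getitem__)
    return blocks[:idx] + blocks[idx + 1:], idx, sims


# Toy stack: the second block (scale 0.01) barely changes its input.
blocks = [
    make_block(1.0, 0.3),
    make_block(0.01, 0.0),
    make_block(0.8, -0.5),
    make_block(0.5, 0.0),
]
h0 = [1.0, -2.0, 0.5, 3.0]

pruned, removed_idx, sims = drop_most_redundant(blocks, h0)
print("similarities:", [round(s, 6) for s in sims])
print("removed block index:", removed_idx)
```

Removing a whole block shortens the sequential forward pass itself, which is why the abstract ties block-level pruning to end-to-end inference speedup. A real method would also need to check the effect of each removal on model quality, which this sketch does not attempt.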

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2402.09025 [cs.CL]
	(or arXiv:2402.09025v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.09025

Submission history

From: Yulhwa Kim
[v1] Wed, 14 Feb 2024 09:01:13 UTC (8,750 KB)
