Status | Forthcoming |
GreenLLM: Towards Efficient Large Language Model via Energy-aware Pruning
Tian, Chunlin1; Qin, Xinpeng2; Li, Li1
2024
Conference Name | 32nd IEEE/ACM International Symposium on Quality of Service, IWQoS 2024 |
Source Publication | IEEE International Workshop on Quality of Service, IWQoS |
Pages | 202971 |
Conference Date | 19 June 2024 through 21 June 2024 |
Conference Place | Guangzhou |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Abstract | This paper proposes GreenLLM, a framework that deploys generative Large Language Models (LLMs) on resource-limited edge devices while meeting memory and timing constraints with minimal energy consumption. Specifically, GreenLLM employs an energy estimation scheme based on physical hardware to guide a pruning-ratio generator that incorporates space, weight, and power (SWaP) constraints and produces an optimal pruning ratio. For each layer, a dependency-aware, energy-efficient pruner operates in a task-agnostic manner, preserving as much of the LLM's functionality as possible. Finally, the pruned model is fine-tuned on downstream datasets to recover performance. |
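The abstract outlines a three-stage pipeline: hardware-based energy estimation, SWaP-constrained pruning-ratio generation, and dependency-aware layer-wise pruning followed by fine-tuning. The sketch below is only an illustrative reconstruction of that flow under simplifying assumptions (a MAC-count energy proxy, magnitude-based structured pruning, and hypothetical function names such as estimate_layer_energy and pruning_ratios); it is not the authors' implementation.

```python
# Hypothetical sketch of a GreenLLM-style energy-aware pruning pipeline.
# All names and the energy model are illustrative assumptions, not the paper's code.
import numpy as np

def estimate_layer_energy(weight: np.ndarray, energy_per_mac: float = 1e-12) -> float:
    """Proxy energy of one forward pass through a dense layer:
    (#multiply-accumulates) x energy-per-MAC measured on the target hardware."""
    macs = weight.shape[0] * weight.shape[1]
    return macs * energy_per_mac

def pruning_ratios(layer_energies, energy_budget):
    """Assign per-layer pruning ratios so the total post-pruning energy fits the
    budget; layers with a larger energy share are pruned more aggressively."""
    total = sum(layer_energies)
    if total <= energy_budget:
        return [0.0] * len(layer_energies)
    excess = 1.0 - energy_budget / total           # global fraction to remove
    shares = [e / total for e in layer_energies]   # spread removal by energy share
    return [min(0.9, excess * s * len(layer_energies)) for s in shares]

def prune_layer(weight: np.ndarray, ratio: float) -> np.ndarray:
    """Structured pruning: drop the output rows with the smallest L2 norm,
    so dependent dimensions stay consistent within the layer."""
    keep = max(1, int(round(weight.shape[0] * (1.0 - ratio))))
    importance = np.linalg.norm(weight, axis=1)
    kept_rows = np.sort(np.argsort(importance)[-keep:])
    return weight[kept_rows]

# Usage: three toy layers, energy budget set to ~60% of the original cost.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((256, 256)) for _ in range(3)]
energies = [estimate_layer_energy(w) for w in layers]
ratios = pruning_ratios(energies, 0.6 * sum(energies))
pruned = [prune_layer(w, r) for w, r in zip(layers, ratios)]
print([w.shape for w in pruned], [f"{r:.2f}" for r in ratios])
```

In the actual framework the fine-tuning stage on downstream datasets would follow this pruning step to recover any lost accuracy.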
Keyword | Edge Device; Energy-aware Pruning; Large Language Model |
DOI | 10.1109/IWQoS61813.2024.10682928 |
Language | English |
Scopus ID | 2-s2.0-85206351058 |
Document Type | Conference paper |
Collection | THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU) |
Affiliation | 1. University of Macau, IOTSC, Macao; 2. University of Electronic Science and Technology of China, China |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Tian, Chunlin, Qin, Xinpeng, Li, Li. GreenLLM: Towards Efficient Large Language Model via Energy-aware Pruning[C]. Institute of Electrical and Electronics Engineers Inc., 2024: 202971. |
APA | Tian, Chunlin, Qin, Xinpeng, & Li, Li. (2024). GreenLLM: Towards Efficient Large Language Model via Energy-aware Pruning. IEEE International Workshop on Quality of Service, IWQoS, 202971. |
Files in This Item: | There are no files associated with this item. |