Residential College | false |
Status | Forthcoming |
AQLoRA: An Adaptive Quantization-Based Efficient Fine-Tuning Method for LLMs | |
Huang, Xingchen1; Huo, Yujia1,2; Wong, Derek F.2,3; Wang, Yao1; Cai, Liqiong1; Jiang, Yonghong4 | |
2025 | |
Conference Name | 13th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2024 |
Source Publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 15360 LNAI |
Pages | 268-280 |
Conference Date | 1 November 2024 to 3 November 2024 |
Conference Place | Hangzhou, China |
Publisher | Springer Science and Business Media Deutschland GmbH |
Abstract | Large language models (LLMs) have shown exceptional performance on composite artificial intelligence tasks, offering a preliminary glimpse of the potential of general artificial intelligence. However, fine-tuning LLMs demands significant computational resources, often exceeding what standard consumer-grade GPUs can provide. To address this, we introduce Adaptive Quantization Low-Rank Adaptation (AQLoRA), a fine-tuning method that reduces memory demands during fine-tuning by combining quantization with pruning techniques. This dual strategy reduces memory usage while preserving accuracy. AQLoRA refines the original Low-Rank Adaptation (LoRA) method by efficiently quantizing LLM weights, prioritizing computational resource allocation according to weight importance, and effectively integrating the quantized model with the auxiliary weights after fine-tuning. Applying AQLoRA to the ChatGLM2-6B model, we demonstrate its effectiveness on both natural language generation (NLG) and natural language understanding (NLU) tasks across diverse fine-tuning datasets and scenarios. Our findings show that AQLoRA strikes a balance between performance and memory efficiency, reducing memory consumption by 25% on NLG tasks. On NLU tasks, it improves performance by 10% and reduces memory consumption by 10% compared with state-of-the-art methods. |
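The abstract describes a quantize-then-adapt fine-tuning pipeline. The paper's AQLoRA code is not attached to this record, so the snippet below is only a minimal, generic sketch of the underlying quantization-plus-LoRA pattern using Hugging Face transformers, peft, and bitsandbytes; the rank, alpha, dropout, and target-module choices are illustrative assumptions, and AQLoRA's adaptive quantization, importance-based budget allocation, and pruning steps are not implemented here.

```python
# Illustrative sketch only: a generic quantized-LoRA setup (QLoRA-style).
# It shows the "quantize the frozen base model, train low-rank adapters"
# pattern the abstract builds on, NOT the paper's AQLoRA method.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "THUDM/chatglm2-6b"  # base model used in the paper's experiments

# 4-bit quantization of the frozen base weights (bitsandbytes NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Low-rank adapters: the trainable "auxiliary weights" that are merged back
# after fine-tuning. Hyperparameters below are placeholders, not the paper's.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # ChatGLM-style fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter parameters are trainable
```

With this setup, only the small adapter matrices are updated during training while the 4-bit base weights stay frozen, which is the memory-saving idea the abstract's quantization-plus-LoRA strategy extends.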
Keyword | Fine-tuning; Large Language Model; LoRA |
DOI | 10.1007/978-981-97-9434-8_21 |
Language | English |
Scopus ID | 2-s2.0-85210099466 |
Document Type | Conference paper |
Collection | University of Macau |
Affiliation | 1. School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, China; 2. NLP²CT Lab, University of Macau, China; 3. Department of Computer and Information Science, University of Macau, China; 4. Guizhou SiSo Electronics Co., Ltd., Guiyang, China |
Recommended Citation GB/T 7714 | Huang, Xingchen, Huo, Yujia, Wong, Derek F., et al. AQLoRA: An Adaptive Quantization-Based Efficient Fine-Tuning Method for LLMs[C]. Springer Science and Business Media Deutschland GmbH, 2025: 268-280. |
APA | Huang, Xingchen, Huo, Yujia, Wong, Derek F., Wang, Yao, Cai, Liqiong, & Jiang, Yonghong (2025). AQLoRA: An Adaptive Quantization-Based Efficient Fine-Tuning Method for LLMs. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 15360 LNAI, 268-280. |
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.