Residential College | false |
Status | Published |
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | |
Chen, Zhongzhi [1,2]; Sun, Xingwu [2]; Jiao, Xianfeng [2]; Lian, Fengzong [2]; Kang, Zhanhui [2]; Wang, Di [2]; Xu, Cheng Zhong [3] | |
2024-03-25 | |
Conference Name | 38th AAAI Conference on Artificial Intelligence, AAAI 2024 |
Source Publication | Proceedings of the AAAI Conference on Artificial Intelligence |
Volume | 38 |
Issue | 19 |
Pages | 20967-20974 |
Conference Date | 20-27 February 2024 |
Conference Place | Vancouver |
Country | Canada |
Abstract | Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset. |
Keyword | General |
DOI | 10.1609/aaai.v38i19.30087 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods |
WOS ID | WOS:001239984900008 |
Scopus ID | 2-s2.0-85189634541 |
Document Type | Conference paper |
Collection | Faculty of Science and Technology; Department of Computer and Information Science |
Corresponding Author | Chen, Zhongzhi; Sun, Xingwu |
Affiliation | 1. Beihang University, China; 2. Tencent Inc., China; 3. University of Macau, Macao |
Recommended Citation GB/T 7714 | Chen, Zhongzhi,Sun, Xingwu,Jiao, Xianfeng,et al. Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning[C], 2024, 20967-20974. |
APA | Chen, Zhongzhi., Sun, Xingwu., Jiao, Xianfeng., Lian, Fengzong., Kang, Zhanhui., Wang, Di., & Xu, Cheng Zhong (2024). Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 20967-20974. |