Residential College | false |
Status | Published |
Title | Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning |
Authors | Wang, Siying (1); Chen, Wenyu (1); Hu, Jian (2); Hu, Siyue (3); Huang, Liwei (1,4) |
Publication Date | 2022-08-02 |
Source Publication | Mathematics |
ISSN | 2227-7390 |
Volume | 10 | Issue | 15 | Pages | 2728 |
Abstract | Leveraging global state information to enhance policy optimization is a common approach in multi-agent reinforcement learning (MARL). Even with this additional state information, agents still suffer from insufficient exploration during training. Moreover, training with batch-sampled examples from the replay buffer induces a policy overfitting problem: multi-agent proximal policy optimization (MAPPO) may not perform as well as independent PPO (IPPO), even with the additional information in the centralized critic. In this paper, we propose a novel noise-injection method to regularize the policies of agents and mitigate the overfitting issue. We analyze the cause of policy overfitting in actor–critic MARL and design two specific patterns of noise injection that apply random Gaussian noise to the advantage function to stabilize training and improve performance. Experimental results on the Matrix Game and StarCraft II show the higher training efficiency and superior performance of our method, and ablation studies indicate that our method maintains higher entropy in the agents' policies during training, which leads to more exploration. |
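The abstract describes injecting random Gaussian noise into the advantage function before the PPO policy update; the paper itself defines the two specific injection patterns, so the sketch below only illustrates the general mechanism. The function names (noisy_advantages, ppo_clip_objective), the noise scale sigma, and the "additive"/"multiplicative" modes are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def noisy_advantages(advantages, sigma=0.1, mode="additive", rng=None):
    """Regularize advantage estimates with random Gaussian noise.

    Illustrative sketch only: 'additive' perturbs each advantage with
    zero-mean noise, while 'multiplicative' scales it by a factor
    centred at 1. The paper's two injection patterns may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=np.shape(advantages))
    if mode == "additive":
        return advantages + noise
    if mode == "multiplicative":
        return advantages * (1.0 + noise)
    raise ValueError(f"unknown mode: {mode}")

def ppo_clip_objective(ratio, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Example: advantages from a centralized critic, probability ratios
# from one agent's current vs. old policy.
adv = np.array([0.5, -0.2, 1.3, -0.7])
ratio = np.array([1.05, 0.97, 1.10, 0.90])
objective = ppo_clip_objective(ratio, noisy_advantages(adv, sigma=0.1))
```

The design intuition, as stated in the abstract, is that perturbing the advantage values acts as a regularizer on the policy gradient, keeping policy entropy higher during training and thereby encouraging exploration.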
Keywords | Advantage Function; Exploration; Multi-agent Reinforcement Learning; Noise Injection; Proximal Policy Optimization |
DOI | 10.3390/math10152728 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Mathematics |
WOS Subject | Mathematics |
WOS ID | WOS:000839905800001 |
Publisher | MDPI, ST ALBAN-ANLAGE 66, CH-4052 BASEL, SWITZERLAND |
Scopus ID | 2-s2.0-85136796852 |
Document Type | Journal article |
Collection | THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU) |
Corresponding Author | Hu, Jian |
Affiliation | 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China; 2. Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, 106, Taiwan; 3. Department of Computer Science & Information Engineering, National Taiwan University, Taipei, 106, Taiwan; 4. The State Key Laboratory of IoTSC, University of Macau, Taipa, 999078, Macao |
Recommended Citation GB/T 7714 | Wang, Siying, Chen, Wenyu, Hu, Jian, et al. Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning[J]. Mathematics, 2022, 10(15): 2728. |
APA | Wang, Siying, Chen, Wenyu, Hu, Jian, Hu, Siyue, & Huang, Liwei. (2022). Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning. Mathematics, 10(15), 2728. |
MLA | Wang, Siying, et al. "Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning." Mathematics 10.15 (2022): 2728. |