Residential College | false |
Status | Published |
Title | Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning |
Authors | Wang, Siying (1); Chen, Wenyu (1); Hu, Jian (2); Hu, Siyue (3); Huang, Liwei (1,4) |
Publication Date | 2022-08-02 |
Source Publication | Mathematics |
ISSN | 2227-7390 |
Volume | 10 | Issue | 15 | Pages | 2728 |
Abstract | Leveraging global state information to enhance policy optimization is a common approach in multi-agent reinforcement learning (MARL). Even with this additional state information, agents still suffer from insufficient exploration during training. Moreover, training with batch-sampled examples from the replay buffer induces a policy overfitting problem: multi-agent proximal policy optimization (MAPPO) may not perform as well as independent PPO (IPPO), even with the additional information in the centralized critic. In this paper, we propose a novel noise-injection method to regularize the policies of agents and mitigate the overfitting issue. We analyze the cause of policy overfitting in actor–critic MARL and design two specific patterns of noise injection that apply random Gaussian noise to the advantage function to stabilize training and improve performance. Experimental results on the Matrix Game and StarCraft II show the higher training efficiency and superior performance of our method, and ablation studies indicate that our method maintains higher entropy in the agents' policies during training, which leads to more exploration. |
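The abstract describes injecting random Gaussian noise into the advantage function before the PPO policy update; the paper itself defines the two specific injection patterns, so the sketch below only illustrates the general mechanism. The function names (noisy_advantages, ppo_clip_objective), the noise scale sigma, and the "additive"/"multiplicative" modes are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def noisy_advantages(advantages, sigma=0.1, mode="additive", rng=None):
    """Regularize advantage estimates with random Gaussian noise.

    Illustrative sketch only: 'additive' perturbs each advantage with
    zero-mean noise, while 'multiplicative' scales it by a factor
    centred at 1. The paper's two injection patterns may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=np.shape(advantages))
    if mode == "additive":
        return advantages + noise
    if mode == "multiplicative":
        return advantages * (1.0 + noise)
    raise ValueError(f"unknown mode: {mode}")

def ppo_clip_objective(ratio, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Example: advantages from a centralized critic, probability ratios
# from one agent's current vs. old policy.
adv = np.array([0.5, -0.2, 1.3, -0.7])
ratio = np.array([1.05, 0.97, 1.10, 0.90])
objective = ppo_clip_objective(ratio, noisy_advantages(adv, sigma=0.1))
```

The design intuition, as stated in the abstract, is that perturbing the advantage values acts as a regularizer on the policy gradient, keeping policy entropy higher during training and thereby encouraging exploration.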
Keywords | Advantage Function; Exploration; Multi-agent Reinforcement Learning; Noise Injection; Proximal Policy Optimization |
DOI | 10.3390/math10152728 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Mathematics |
WOS Subject | Mathematics |
WOS ID | WOS:000839905800001 |
Publisher | MDPI, ST ALBAN-ANLAGE 66, CH-4052 BASEL, SWITZERLAND |
Scopus ID | 2-s2.0-85136796852 |
Document Type | Journal article |
Collection | THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU) |
Corresponding Author | Hu, Jian |
Affiliation | 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China; 2. Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, 106, Taiwan; 3. Department of Computer Science & Information Engineering, National Taiwan University, Taipei, 106, Taiwan; 4. The State Key Laboratory of IoTSC, University of Macau, Taipa, 999078, Macao |
Recommended Citation GB/T 7714 | Wang, Siying, Chen, Wenyu, Hu, Jian, et al. Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning[J]. Mathematics, 2022, 10(15): 2728. |
APA | Wang, Siying, Chen, Wenyu, Hu, Jian, Hu, Siyue, & Huang, Liwei. (2022). Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning. Mathematics, 10(15), 2728. |
MLA | Wang, Siying, et al. "Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning." Mathematics 10.15 (2022): 2728. |