Status: Published
Inference-Based Posteriori Parameter Distribution Optimization
Wang, Xuesong (1,2,3); Li, Tianyi (1,2,3); Cheng, Yuhu (1,2,3); Chen, C. L. Philip (4,5)
2022-05-01
Source Publication: IEEE Transactions on Cybernetics
ABS Journal Level: 3
ISSN: 2168-2267
Volume: 52, Issue: 5, Pages: 3006-3017
Abstract

Encouraging an agent to explore has long been an important and challenging problem in reinforcement learning (RL). Representing network parameters or value functions as distributions is usually an effective way to improve the exploration ability of an RL agent. However, directly changing the representation of network parameters from fixed values to function distributions may cause algorithm instability and low learning efficiency. Therefore, to accelerate and stabilize parameter distribution learning, a novel inference-based posteriori parameter distribution optimization (IPPDO) algorithm is proposed. From the perspective of solving the evidence lower bound of probability, we design inference-based objective functions for parameter distribution optimization in continuous-action and discrete-action tasks, respectively. To alleviate overestimation of the value function, we use multiple neural networks to estimate value functions with Retrace, and the smaller estimate is used in the network parameter update; in this way, the network parameter distribution can be learned. We then design a method for sampling weights from the network parameter distribution by applying an activation function to the standard deviation of the parameter distribution, which allows adaptive adjustment between fixed values and distributions. Furthermore, IPPDO is an off-policy deep RL (DRL) algorithm, which means that it can improve data efficiency by using off-policy techniques such as experience replay. We compare IPPDO with other prevailing DRL algorithms on the OpenAI Gym and MuJoCo platforms. Experiments on both continuous-action and discrete-action tasks indicate that IPPDO explores more of the action space, obtains higher rewards faster, and maintains algorithm stability.
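
Two ideas in the abstract can be illustrated with a minimal sketch: sampling a weight from a parameter distribution whose standard deviation passes through an activation function, and using the smaller of several value estimates. The sketch is not the authors' implementation. The abstract does not name the activation function, so softplus is assumed here as a stand-in, and the names `softplus`, `sample_weight`, `pessimistic_value`, `mu`, and `rho` are hypothetical. Retrace itself is not shown.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); non-negative for every x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_weight(mu, rho, rng):
    """Draw one weight from a Gaussian with mean mu and std softplus(rho).

    ASSUMPTION: softplus is an illustrative activation on the raw
    standard-deviation parameter rho; the abstract does not say which
    activation IPPDO uses.
    """
    sigma = softplus(rho)       # activation keeps the std non-negative
    eps = rng.gauss(0.0, 1.0)   # standard normal noise
    return mu + sigma * eps     # reparameterized sample


def pessimistic_value(estimates):
    """Return the smallest of several value estimates to curb overestimation."""
    return min(estimates)


rng = random.Random(0)
w_near_fixed = sample_weight(0.5, -20.0, rng)  # std is close to 0, so w stays close to mu
w_spread = sample_weight(0.5, 1.0, rng)        # larger std, so the weight is noisier
q_target = pessimistic_value([1.7, 1.2])       # the smaller estimate is kept
```

Under this assumption, a very negative `rho` drives the standard deviation toward zero, so the sampled weight behaves like a fixed value, while a larger `rho` restores a spread-out distribution. This is one plausible reading of the "adaptive adjustment between fixed values and distributions" described above.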

Keywords: Exploration; Inference; Parameter Distribution; Reinforcement Learning (RL)
DOI: 10.1109/TCYB.2020.3023127
Indexed By: SCIE
Language: English
WOS Research Area: Automation & Control Systems; Computer Science
WOS Subject: Automation & Control Systems; Computer Science, Artificial Intelligence; Computer Science, Cybernetics
WOS ID: WOS:000798227800039
Scopus ID: 2-s2.0-85113205295
Document Type: Journal article
Collection: Faculty of Science and Technology
Department of Computer and Information Science
Corresponding Author: Cheng, Yuhu
Affiliation:
1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
2. Xuzhou Key Laboratory of Artificial Intelligence and Big Data, China University of Mining and Technology, Xuzhou, 221116, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
4. Faculty of Science and Technology, The University of Macau, 99999, Macao
5. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Recommended Citation
GB/T 7714: Wang, Xuesong, Li, Tianyi, Cheng, Yuhu, et al. Inference-Based Posteriori Parameter Distribution Optimization[J]. IEEE Transactions on Cybernetics, 2022, 52(5): 3006-3017.
APA: Wang, X., Li, T., Cheng, Y., & Chen, C. L. P. (2022). Inference-Based Posteriori Parameter Distribution Optimization. IEEE Transactions on Cybernetics, 52(5), 3006-3017.
MLA: Wang, Xuesong, et al. "Inference-Based Posteriori Parameter Distribution Optimization." IEEE Transactions on Cybernetics 52.5 (2022): 3006-3017.