Residential College | false
Status | Published
Title | Inference-Based Posteriori Parameter Distribution Optimization
Author | Wang, Xuesong1,2,3; Li, Tianyi1,2,3; Cheng, Yuhu1,2,3; Chen, C. L. Philip4,5
Date Issued | 2022-05-01
Source Publication | IEEE Transactions on Cybernetics
ABS Journal Level | 3 |
ISSN | 2168-2267 |
Volume | 52
Issue | 5
Pages | 3006-3017
Abstract | Encouraging the agent to explore has always been an important and challenging topic in reinforcement learning (RL). A distributional representation of network parameters or value functions is usually an effective way to improve the exploration ability of an RL agent. However, directly changing the representation of network parameters from fixed values to function distributions may cause algorithm instability and low learning efficiency. Therefore, to accelerate and stabilize parameter distribution learning, a novel inference-based posteriori parameter distribution optimization (IPPDO) algorithm is proposed. From the perspective of solving the evidence lower bound of probability, we design inference-based objective functions for parameter distribution optimization for continuous-action and discrete-action tasks, respectively. To alleviate overestimation of the value function, we use multiple neural networks to estimate value functions with Retrace, and only the smaller estimate participates in the network parameter update; thus, the network parameter distribution can be learned. We then design a method for sampling weights from the network parameter distribution that applies an activation function to the standard deviation of the parameter distribution, which achieves adaptive adjustment between fixed values and a distribution. Furthermore, IPPDO is an off-policy deep RL (DRL) algorithm, so it can effectively improve data efficiency through off-policy techniques such as experience replay. We compare IPPDO with other prevailing DRL algorithms on the OpenAI Gym and MuJoCo platforms. Experiments on both continuous-action and discrete-action tasks indicate that IPPDO explores more of the action space, obtains higher rewards faster, and ensures algorithm stability.
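The abstract's two central mechanisms, sampling network weights from a learned parameter distribution with an activation applied to the standard deviation, and keeping the smaller of several value estimates to curb overestimation, can be sketched compactly. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian parameter distribution and a softplus activation (the abstract says only "an activation function"), it uses PyTorch, the names `DistributionalLinear`, `min_value_estimate`, and `critics` are hypothetical, and the Retrace target computation is omitted.

```python
import torch
import torch.nn.functional as F


class DistributionalLinear(torch.nn.Module):
    """Linear layer whose weights are drawn from a learned Gaussian
    N(mu, sigma^2) instead of being stored as fixed values."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.mu = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))
        # rho parameterizes the standard deviation through an activation;
        # initializing it very negative makes sigma ~ 0, so the layer starts
        # out nearly deterministic (fixed weights).
        self.rho = torch.nn.Parameter(torch.full((out_features, in_features), -5.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activation on the standard deviation keeps sigma > 0; as rho -> -inf,
        # sigma -> 0 and the sampled weights collapse to mu, giving the adaptive
        # adjustment between fixed values and a distribution.
        sigma = F.softplus(self.rho)
        eps = torch.randn_like(sigma)          # reparameterization trick
        weight = self.mu + sigma * eps         # fresh weight sample per forward pass
        return F.linear(x, weight)


def min_value_estimate(critics, state, action):
    """Pessimistic value estimate: evaluate several critic networks and keep
    the smallest value, mitigating overestimation of the value function."""
    values = torch.stack([critic(state, action) for critic in critics])
    return values.min(dim=0).values
```

Because `rho` can be driven arbitrarily negative during training, `sigma` can shrink toward zero, letting each layer interpolate between deterministic weights and a genuinely distributional representation, which matches the adaptive adjustment the abstract describes.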
Keyword | Exploration ; Inference ; Parameter Distribution ; Reinforcement Learning (RL)
DOI | 10.1109/TCYB.2020.3023127 |
Indexed By | SCIE |
Language | English
WOS Research Area | Automation & Control Systems ; Computer Science |
WOS Subject | Automation & Control Systems ; Computer Science, Artificial Intelligence ; Computer Science, Cybernetics |
WOS ID | WOS:000798227800039 |
Scopus ID | 2-s2.0-85113205295 |
Document Type | Journal article |
Collection | Faculty of Science and Technology ; Department of Computer and Information Science
Corresponding Author | Cheng, Yuhu |
Affiliation | 1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China
2. Xuzhou Key Laboratory of Artificial Intelligence and Big Data, China University of Mining and Technology, Xuzhou, 221116, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
4. Faculty of Science and Technology, The University of Macau, Macao
5. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
Recommended Citation GB/T 7714 | Wang, Xuesong, Li, Tianyi, Cheng, Yuhu, et al. Inference-Based Posteriori Parameter Distribution Optimization[J]. IEEE Transactions on Cybernetics, 2022, 52(5): 3006-3017.
APA | Wang, Xuesong, Li, Tianyi, Cheng, Yuhu, & Chen, C. L. Philip (2022). Inference-Based Posteriori Parameter Distribution Optimization. IEEE Transactions on Cybernetics, 52(5), 3006-3017.
MLA | Wang, Xuesong, et al. "Inference-Based Posteriori Parameter Distribution Optimization." IEEE Transactions on Cybernetics 52.5 (2022): 3006-3017.
Files in This Item: | There are no files associated with this item. |