Status: Published
Proximal Policy Optimization With Policy Feedback
Gu, Yang1,2; Cheng, Yuhu1,2; Chen, C. L.P.3; Wang, Xuesong1,2
2022-07
Source Publication: IEEE Transactions on Systems, Man, and Cybernetics: Systems
ABS Journal Level: 3
ISSN: 2168-2216
Volume: 52; Issue: 7; Pages: 4600-4610
Abstract

Proximal policy optimization (PPO) is a deep reinforcement learning algorithm based on the actor-critic (AC) architecture. In the classic AC architecture, the critic (value) network estimates the value function while the actor (policy) network optimizes the policy according to the estimated value function. The efficiency of the classic AC architecture is limited because the policy does not directly participate in the value function update; the resulting value estimates are inaccurate, which degrades the performance of the PPO algorithm. To address this, we designed a novel AC architecture with policy feedback (AC-PF) that introduces the policy into the update process of the value function, and further proposed PPO with policy feedback (PPO-PF). For the AC-PF architecture, the policy-based expected (PBE) value function and discounted reward formulas are designed by drawing inspiration from expected Sarsa. To enhance the sensitivity of the value function to policy changes and to improve the accuracy of PBE value estimation in the early learning stage, we proposed a policy update method based on a clipped discount factor. Moreover, we specifically defined the loss functions of the policy network and value network to ensure that the policy update of PPO-PF satisfies the unbiased estimation of the trust region. Experiments on Atari games and control tasks show that, compared to PPO, PPO-PF converges faster and achieves higher reward with smaller reward variance.
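The full PPO-PF update is defined in the paper itself; as a rough, illustrative sketch of the two standard ingredients the abstract builds on, namely the PPO clipped surrogate objective and an expected-Sarsa-style, policy-weighted value target, one might write the following. All function names and signatures here are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Standard PPO clipped surrogate (Schulman et al., 2017):
    # min(r * A, clip(r, 1 - eps, 1 + eps) * A), to be maximized
    # over the policy parameters. "ratio" is pi_new(a|s) / pi_old(a|s).
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def expected_sarsa_target(reward, next_q, next_pi, gamma=0.99):
    # Expected-Sarsa-style one-step backup: the next-state value is the
    # policy-weighted expectation over action values, so the target
    # depends explicitly on the current policy (the "policy feedback"
    # idea the abstract draws on).
    return reward + gamma * float(np.dot(next_pi, next_q))

# Illustrative use: a ratio of 1.5 is clipped to 1.2 for a positive advantage.
obj = ppo_clip_objective(1.5, 1.0)

# A target weighted by a uniform two-action policy.
target = expected_sarsa_target(1.0, np.array([0.0, 1.0]),
                               np.array([0.5, 0.5]))
```

The key contrast with plain PPO is in `expected_sarsa_target`: a standard critic regresses on a target that ignores the action distribution, whereas a policy-weighted target changes whenever the policy does.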

Keywords: actor-critic (AC); clipped discount factor; estimation; games; optimization; policy feedback; proximal policy optimization (PPO); space exploration; task analysis; training; trajectory; value function
DOI: 10.1109/TSMC.2021.3098451
Indexed By: SCIE
Language: English
WOS Research Area: Automation & Control Systems; Computer Science
WOS Subject: Automation & Control Systems; Computer Science, Cybernetics
WOS ID: WOS:000732102700001
Scopus ID: 2-s2.0-85112651578
Document Type: Journal article
Collection: Faculty of Science and Technology, Department of Computer and Information Science
Corresponding Author: Wang, Xuesong
Affiliations:
1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
2. College of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
3. Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China
Recommended Citation
GB/T 7714: Gu, Yang, Cheng, Yuhu, Chen, C. L. P., et al. Proximal Policy Optimization With Policy Feedback[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(7): 4600-4610.
APA: Gu, Yang, Cheng, Yuhu, Chen, C. L. P., & Wang, Xuesong (2022). Proximal Policy Optimization With Policy Feedback. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 52(7), 4600-4610.
MLA: Gu, Yang, et al. "Proximal Policy Optimization With Policy Feedback." IEEE Transactions on Systems, Man, and Cybernetics: Systems 52.7 (2022): 4600-4610.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.