Status | Published
Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
Cheng, Yuhu (1,2); Chen, Lin (1,2); Chen, C. L. Philip (3,4); Wang, Xuesong (1,2)
2021-12-18
Source Publication | IEEE Transactions on Cognitive and Developmental Systems |
ISSN | 2379-8920 |
Volume | 13
Issue | 4
Pages | 1023-1032
Abstract | As an important machine learning method, deep reinforcement learning (DRL) has developed rapidly in recent years and has achieved breakthrough results in many fields, such as video games, natural language processing, and robot control. However, due to the inherent trial-and-error learning mechanism of reinforcement learning and the time-consuming training of deep neural networks, the convergence speed of DRL is very slow, which limits its real-world applications. In this article, aiming to improve the convergence speed of DRL, we propose a novel Steffensen value iteration (SVI) method by applying Steffensen iteration to the value-function iteration of off-policy DRL from the perspective of fixed-point iteration. The proposed SVI is theoretically proved to be convergent and to converge faster than Bellman value iteration. SVI is versatile and can be easily combined with existing off-policy RL algorithms. In this article, we propose two speedy off-policy DRL algorithms by combining SVI with DDQN and TD3, respectively, namely, SVI-DDQN and SVI-TD3. Experiments on several discrete-action and continuous-action tasks from the Atari 2600 and MuJoCo platforms demonstrate that the proposed SVI-based DRL algorithms achieve higher average reward in a shorter time than the comparative algorithms.
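For illustration of the fixed-point idea the abstract describes, below is a minimal sketch of Steffensen (Aitken delta-squared) acceleration applied to tabular value iteration. The random MDP, the elementwise extrapolation, and all names in the code are illustrative assumptions for this sketch only, not the paper's exact SVI-DDQN/SVI-TD3 updates.

```python
import numpy as np

# Minimal sketch: Steffensen (Aitken delta-squared) acceleration of tabular
# value iteration, viewed as a fixed-point iteration V <- T(V).
# Assumptions (not from the record): a small random MDP and elementwise
# extrapolation; the paper's SVI applied to deep off-policy RL may differ.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.standard_normal((n_states, n_actions))                    # R[s, a]

def bellman(V):
    """Bellman optimality operator: (TV)(s) = max_a [R(s, a) + gamma * E[V(s')]]."""
    return (R + gamma * P @ V).max(axis=1)

def steffensen_vi(V, iters=50, eps=1e-12):
    """Each sweep does two Bellman backups, then extrapolates elementwise."""
    for _ in range(iters):
        V1 = bellman(V)
        V2 = bellman(V1)
        denom = V2 - 2.0 * V1 + V
        safe = np.where(np.abs(denom) > eps, denom, 1.0)
        # Aitken step where the denominator is safe; plain double backup elsewhere.
        V = np.where(np.abs(denom) > eps, V - (V1 - V) ** 2 / safe, V2)
    return V

print("accelerated V*:", np.round(steffensen_vi(np.zeros(n_states)), 3))
```

Falling back to the plain double backup where the Aitken denominator vanishes keeps the update well defined near the fixed point, where standard and accelerated iterates coincide.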
Keyword | Convergence Speed; Deep Reinforcement Learning (DRL); Off-Policy; Steffensen Iteration; Value Iteration (VI)
DOI | 10.1109/TCDS.2020.3034452 |
Indexed By | SCIE |
Language | English
WOS Research Area | Computer Science ; Neurosciences & Neurology ; Robotics |
WOS Subject | Computer Science, Artificial Intelligence ; Robotics ; Neurosciences |
WOS ID | WOS:000728925200028 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID | 2-s2.0-85096098282 |
Document Type | Journal article |
Collection | Faculty of Science and Technology; Department of Computer and Information Science
Affiliation | 1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou Key Laboratory of Artificial Intelligence and Big Data, Xuzhou, 221116, China
2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
3. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
4. Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao
Recommended Citation GB/T 7714 | Cheng, Yuhu, Chen, Lin, Chen, C. L. Philip, et al. Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration[J]. IEEE Transactions on Cognitive and Developmental Systems, 2021, 13(4): 1023-1032.
APA | Cheng, Yuhu, Chen, Lin, Chen, C. L. Philip, & Wang, Xuesong (2021). Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 1023-1032.
MLA | Cheng, Yuhu, et al. "Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration." IEEE Transactions on Cognitive and Developmental Systems 13.4 (2021): 1023-1032.
Files in This Item | There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.