Status | Published
Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
Cheng, Yuhu (1,2); Chen, Lin (1,2); Chen, C. L. Philip (3,4); Wang, Xuesong (1,2)
2021-12-18
Source Publication | IEEE Transactions on Cognitive and Developmental Systems |
ISSN | 2379-8920 |
Volume | 13
Issue | 4
Pages | 1023-1032
Abstract | As an important machine learning method, deep reinforcement learning (DRL) has developed rapidly in recent years and has achieved breakthrough results in many fields, such as video games, natural language processing, and robot control. However, due to the inherent trial-and-error learning mechanism of reinforcement learning and the time-consuming training of deep neural networks, the convergence speed of DRL is very slow, which limits its real-world applications. In this article, aiming to improve the convergence speed of DRL, we propose a novel Steffensen value iteration (SVI) method by applying Steffensen iteration to the value-function iteration of off-policy DRL from the perspective of fixed-point iteration. The proposed SVI is theoretically proved to be convergent and to converge faster than Bellman value iteration. SVI is versatile and can be easily combined with existing off-policy RL algorithms. In this article, we propose two speedy off-policy DRL algorithms by combining SVI with DDQN and TD3, respectively, namely, SVI-DDQN and SVI-TD3. Experiments on several discrete-action and continuous-action tasks from the Atari 2600 and MuJoCo platforms demonstrate that the proposed SVI-based DRL algorithms achieve higher average reward in a shorter time than the comparative algorithms.
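For illustration of the fixed-point idea the abstract describes, below is a minimal sketch of Steffensen (Aitken delta-squared) acceleration applied to tabular value iteration. The random MDP, the elementwise extrapolation, and all names in the code are illustrative assumptions for this sketch only, not the paper's exact SVI-DDQN/SVI-TD3 updates.

```python
import numpy as np

# Minimal sketch: Steffensen (Aitken delta-squared) acceleration of tabular
# value iteration, viewed as a fixed-point iteration V <- T(V).
# Assumptions (not from the record): a small random MDP and elementwise
# extrapolation; the paper's SVI applied to deep off-policy RL may differ.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.standard_normal((n_states, n_actions))                    # R[s, a]

def bellman(V):
    """Bellman optimality operator: (TV)(s) = max_a [R(s, a) + gamma * E[V(s')]]."""
    return (R + gamma * P @ V).max(axis=1)

def steffensen_vi(V, iters=50, eps=1e-12):
    """Each sweep does two Bellman backups, then extrapolates elementwise."""
    for _ in range(iters):
        V1 = bellman(V)
        V2 = bellman(V1)
        denom = V2 - 2.0 * V1 + V
        safe = np.where(np.abs(denom) > eps, denom, 1.0)
        # Aitken step where the denominator is safe; plain double backup elsewhere.
        V = np.where(np.abs(denom) > eps, V - (V1 - V) ** 2 / safe, V2)
    return V

print("accelerated V*:", np.round(steffensen_vi(np.zeros(n_states)), 3))
```

Falling back to the plain double backup where the Aitken denominator vanishes keeps the update well defined near the fixed point, where standard and accelerated iterates coincide.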
Keyword | Convergence Speed; Deep Reinforcement Learning (DRL); Off-Policy; Steffensen Iteration; Value Iteration (VI)
DOI | 10.1109/TCDS.2020.3034452 |
Indexed By | SCIE |
Language | English
WOS Research Area | Computer Science ; Neurosciences & Neurology ; Robotics |
WOS Subject | Computer Science, Artificial Intelligence ; Robotics ; Neurosciences |
WOS ID | WOS:000728925200028 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID | 2-s2.0-85096098282 |
Document Type | Journal article |
Collection | Faculty of Science and Technology; Department of Computer and Information Science
Affiliation | 1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou Key Laboratory of Artificial Intelligence and Big Data, Xuzhou, 221116, China
2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
3. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
4. Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao
Recommended Citation GB/T 7714 | Cheng, Yuhu, Chen, Lin, Chen, C. L. Philip, et al. Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration[J]. IEEE Transactions on Cognitive and Developmental Systems, 2021, 13(4): 1023-1032.
APA | Cheng, Yuhu, Chen, Lin, Chen, C. L. Philip, & Wang, Xuesong (2021). Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 1023-1032.
MLA | Cheng, Yuhu, et al. "Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration." IEEE Transactions on Cognitive and Developmental Systems 13.4 (2021): 1023-1032.
Files in This Item | There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.