Status: Published
Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations
Liang, Weixiang; Liu, Yinlong; Wang, Jikun; Yang, Zhi Xin
2024-12
Source Publication: IEEE Transactions on Automation Science and Engineering
ISSN: 1545-5955
Abstract

Training robots by model-free deep reinforcement learning (DRL) to carry out robotic manipulation tasks without sufficient successful experiences is challenging. Hindsight experience replay (HER) was introduced to enable DRL agents to learn from failed experiences. However, HER-enabled model-free DRL still suffers from limited training performance due to its uniform sampling strategy and the scarcity of reward information in the task environment. Inspired by the progress incentive mechanism in human psychology, we propose Progress Intrinsic Motivation-based HER (P-HER) in this work to overcome these difficulties. First, the Trajectory Progress-based Prioritized Experience Replay (TPPER) module is developed to prioritize the sampling of valuable trajectory data, thereby achieving more efficient training. Second, the Progress Intrinsic Reward (PIR) module is introduced into agent training to add extra intrinsic rewards that encourage the agents throughout their exploration of the task space. Experiments on challenging robotic manipulation tasks demonstrate that our P-HER method outperforms the original HER and state-of-the-art HER-based methods in training performance. The code for P-HER and videos of its experiments in both virtual and real environments are available at https://github.com/weixiang-smart/P-HER.

Note to Practitioners - This work is motivated by the need for a fast and effective learning method for intelligent robotic manipulation in typical industrial tasks, including pushing, picking, and placing workpieces, which are essential and fundamental process-planning activities for robotic machining and assembly applications in smart manufacturing. Reinforcement learning enables robots to learn manipulation tasks autonomously, saving engineers the effort of teaching or hard-programming the robot and also reducing labor costs.
However, existing HER-based reinforcement learning algorithms suffer from low training efficiency and performance due to uniform sampling and scant task reward. Inspired by human learning, this work introduces a progress incentive mechanism to identify valuable trajectory data for effective training. In addition, a novel rewarding method that applies additional intrinsic rewards for agents learning the valuable trajectory space results in fast and robust learning. The paper gives the settings of the important weight parameters in the rewarding method, providing a practical reference for applying the proposed algorithm. The average success rates on two actual manipulation tasks in simulated and real robotic manipulation environments are 96% and 92.5%, respectively, demonstrating that the method is effective in both environments, with an average success-rate drop of 3.5% from simulation to reality due to the inherent mismatches between the two. The high success rate demonstrated in the real Workpieces-sorting task exemplifies the potential of the trained policies for application in industrial scenarios.
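The two mechanisms described in the abstract can be illustrated with a minimal sketch. This is a hedged simplification, not the authors' implementation: the progress measure (reduction in distance to the goal), the sampling rule, the function names, and the weight `beta` are all assumptions made for illustration only.

```python
import random

def trajectory_progress(goal_dists):
    """Progress of one trajectory: how much the distance to the
    goal shrank from the first step to the closest approach."""
    return max(0.0, goal_dists[0] - min(goal_dists))

def prioritized_sample(trajectories, k, eps=1e-6):
    """TPPER-style idea: sample k trajectories with probability
    proportional to their progress (eps keeps zero-progress
    trajectories sampleable)."""
    weights = [trajectory_progress(t) + eps for t in trajectories]
    return random.choices(trajectories, weights=weights, k=k)

def intrinsic_reward(prev_dist, cur_dist, beta=0.5):
    """PIR-style idea: a small extra reward whenever the agent
    moves closer to the goal, densifying the sparse task reward."""
    return beta * max(0.0, prev_dist - cur_dist)
```

In a HER-style training loop, `prioritized_sample` would replace uniform replay sampling, and `intrinsic_reward` would be added to the environment's sparse reward at each step.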

Keywords: Hindsight Experience Replay; Progress Intrinsic Motivation; Deep Reinforcement Learning; Robotic Manipulations
DOI: 10.1109/TASE.2024.3513354
Indexed By: SCIE
Language: English
WOS Research Area: Automation & Control Systems
WOS Subject: Automation & Control Systems
WOS ID: WOS:001377375800001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID: 2-s2.0-85212286263
Document Type: Journal article
Collection: Faculty of Science and Technology
THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
DEPARTMENT OF ELECTROMECHANICAL ENGINEERING
Corresponding Author: Yang, Zhi Xin
Affiliation: University of Macau, State Key Laboratory of Internet of Things for Smart City, Department of Electromechanical Engineering, Macau, Macao
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Liang, Weixiang, Liu, Yinlong, Wang, Jikun, et al. Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations[J]. IEEE Transactions on Automation Science and Engineering, 2024.
APA Liang, Weixiang, Liu, Yinlong, Wang, Jikun, & Yang, Zhi Xin (2024). Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations. IEEE Transactions on Automation Science and Engineering.
MLA Liang, Weixiang, et al. "Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations." IEEE Transactions on Automation Science and Engineering (2024).
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.