Residential College | false |
Status | Published |
Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations | |
Liang, Weixiang; Liu, Yinlong; Wang, Jikun; Yang, Zhi Xin | |
2024-12 | |
Source Publication | IEEE Transactions on Automation Science and Engineering |
ISSN | 1545-5955 |
Abstract | Training robots by model-free deep reinforcement learning (DRL) to carry out robotic manipulation tasks without sufficient successful experiences is challenging. Hindsight experience replay (HER) was introduced to enable DRL agents to learn from failure experiences. However, HER-enabled model-free DRL still suffers from limited training performance due to its uniform sampling strategy and the scarcity of reward information in the task environment. Inspired by the progress incentive mechanism in human psychology, we propose Progress Intrinsic Motivation-based HER (P-HER) in this work to overcome these difficulties. First, the Trajectory Progress-based Prioritized Experience Replay (TPPER) module is developed to prioritize the sampling of valuable trajectory data, thereby achieving more efficient training. Second, the Progress Intrinsic Reward (PIR) module is introduced in agent training to add extra intrinsic rewards that encourage the agents throughout the exploration of the task space. Experiments on challenging robotic manipulation tasks demonstrate that our P-HER method outperforms the original HER and state-of-the-art HER-based methods in training performance. Our code for P-HER and its experimental videos in both virtual and real environments are available at https://github.com/weixiang-smart/P-HER. Note to Practitioners - This work is motivated by the need for a fast and effective learning method for intelligent robotic manipulation in typical industrial tasks, including pushing, picking, and placing workpieces, which are essential and fundamental process-planning activities for accomplishing robotic machining and assembly applications toward smart manufacturing. Reinforcement learning enables robots to learn manipulation tasks autonomously, which spares engineers the effort of teaching or hard-programming the robot and also reduces labor costs.
However, existing HER-based reinforcement learning algorithms suffer from low training efficiency and performance due to uniform sampling and scant task reward. Inspired by human learning, this work introduces a progress incentive mechanism to identify valuable trajectory data for effective training. In addition, a novel rewarding method that applies additional intrinsic rewards while agents learn in valuable trajectory space results in fast and robust learning. The settings of the important weight parameters in the rewarding method are given in the paper, providing a practical reference for applying the proposed algorithm. The average success rates on two actual manipulation tasks in simulated and real robotic manipulation environments are 96% and 92.5%, respectively, which demonstrates that the method is effective in both environments; the 3.5% average drop in success rate from simulation to reality is due to the inherent mismatches between the two. The high success rate demonstrated in the real workpiece-sorting task exemplifies the potential of the trained policies for application in industrial scenarios. |
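As a rough illustration of the two mechanisms the abstract describes, the sketch below shows HER-style final-goal relabeling and trajectory-progress-weighted sampling. All function names, the sparse-reward convention, and the specific progress metric (reduction in distance to the original goal over a trajectory) are illustrative assumptions; the paper's actual TPPER and PIR formulations are not spelled out in this record.

```python
import random
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    # Sparse task reward common in HER setups: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved - goal) < tol else -1.0

def relabel_with_final_goal(trajectory):
    # HER "final" strategy: treat the last achieved state as the goal,
    # so even a failed trajectory yields useful reward signal.
    new_goal = trajectory[-1]["achieved"]
    return [
        {**step, "goal": new_goal,
         "reward": sparse_reward(step["achieved"], new_goal)}
        for step in trajectory
    ]

def trajectory_progress(trajectory):
    # Illustrative progress score (assumption): how much the trajectory
    # reduced the distance between the achieved state and the original goal.
    start = np.linalg.norm(trajectory[0]["achieved"] - trajectory[0]["goal"])
    end = np.linalg.norm(trajectory[-1]["achieved"] - trajectory[-1]["goal"])
    return max(start - end, 0.0)

def sample_prioritized(buffer, k, eps=1e-3):
    # Sample trajectories with probability proportional to their progress
    # score, instead of uniformly as in vanilla HER; eps keeps zero-progress
    # trajectories sampleable.
    weights = [trajectory_progress(t) + eps for t in buffer]
    return random.choices(buffer, weights=weights, k=k)
```

In the same spirit, a PIR-like bonus would be added on top of `sparse_reward` during training (e.g. `reward + beta * intrinsic_bonus`), with the weight `beta` among the parameters the paper recommends settings for.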
Keyword | Hindsight Experience Replay; Progress Intrinsic Motivation; Deep Reinforcement Learning; Robotic Manipulations |
DOI | 10.1109/TASE.2024.3513354 |
URL | View the original |
Indexed By | SCIE |
Language | English |
WOS Research Area | Automation & Control Systems |
WOS Subject | Automation & Control Systems |
WOS ID | WOS:001377375800001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141 |
Scopus ID | 2-s2.0-85212286263 |
Fulltext Access | |
Citation statistics | |
Document Type | Journal article |
Collection | Faculty of Science and Technology; THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU); DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING; DEPARTMENT OF ELECTROMECHANICAL ENGINEERING |
Corresponding Author | Yang, Zhi Xin |
Affiliation | University of Macau, State Key Laboratory of Internet of Things for Smart City, Department of Electromechanical Engineering, Macau, Macao |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Liang, Weixiang,Liu, Yinlong,Wang, Jikun,et al. Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations[J]. IEEE Transactions on Automation Science and Engineering, 2024. |
APA | Liang, Weixiang., Liu, Yinlong., Wang, Jikun., & Yang, Zhi Xin (2024). Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations. IEEE Transactions on Automation Science and Engineering. |
MLA | Liang, Weixiang,et al."Trajectory Progress-Based Prioritizing and Intrinsic Reward Mechanism for Robust Training of Robotic Manipulations".IEEE Transactions on Automation Science and Engineering (2024). |
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.