Status | Published |
Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation | |
Suo, Yucheng [1]; Zheng, Zhedong [2]; Wang, Xiaohan [1]; Zhang, Bang [3]; Yang, Yi [1] | |
2024-03 | |
Source Publication | ACM Transactions on Multimedia Computing, Communications and Applications |
ISSN | 1551-6857 |
Volume | 20 ; Issue: 6 ; Pages: 185 |
Abstract | Sign language provides a way for differently-abled individuals to express their feelings and emotions. However, learning sign language can be challenging and time consuming. An alternative approach is to animate user photos using sign language videos of specific words, which can be achieved using existing image animation methods. However, the finger motions in the generated videos are often not ideal. To address this issue, we propose the Structure-aware Temporal Consistency Network (STCNet), which jointly optimizes the prior structure of humans with temporal consistency to produce sign language videos. We use a fine-grained skeleton detector to acquire knowledge of body structure and introduce both short- and long-term cycle loss to ensure the continuity of the generated video. The two losses and keypoint detector network are optimized in an end-to-end manner. Quantitative and qualitative evaluations on three widely used datasets, namely LSA64, Phoenix-2014T, and WLASL-2000, demonstrate the effectiveness of the proposed method. It is our hope that this work can contribute to future studies on sign language production. |
Keyword | Sign Language ; Jointly Training ; Motion Transfer ; Video Generation |
DOI | 10.1145/3648368 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Information Systems ; Computer Science, Software Engineering ; Computer Science, Theory & Methods |
WOS ID | WOS:001208681800035 |
Publisher | ASSOC COMPUTING MACHINERY, 1601 Broadway, 10th Floor, NEW YORK, NY 10019-7434 |
Scopus ID | 2-s2.0-85189454373 |
Document Type | Journal article |
Collection | Faculty of Science and Technology ; Institute of Collaborative Innovation ; Department of Computer and Information Science |
Corresponding Author | Yang, Yi |
Affiliation | 1. College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Xihu District, Hangzhou, Zhejiang, 310027, China ; 2. Faculty of Science and Technology, Institute of Collaborative Innovation, University of Macau, University Boulevard, Taipa, 999078, Macao ; 3. DAMO Academy, Alibaba Group, 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang, 311121, China |
Recommended Citation GB/T 7714 | Suo, Yucheng, Zheng, Zhedong, Wang, Xiaohan, et al. Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20(6), 185. |
APA | Suo, Yucheng., Zheng, Zhedong., Wang, Xiaohan., Zhang, Bang., & Yang, Yi (2024). Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation. ACM Transactions on Multimedia Computing, Communications and Applications, 20(6), 185. |
MLA | Suo, Yucheng, et al. "Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation." ACM Transactions on Multimedia Computing, Communications and Applications 20.6 (2024): 185. |