Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

doi:10.1145/3648368

Status	Published
Title	Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation
Author	Suo, Yucheng 1; Zheng, Zhedong 2; Wang, Xiaohan 1; Zhang, Bang 3; Yang, Yi 1
Date	2024-03
Source Publication	ACM Transactions on Multimedia Computing, Communications and Applications
ISSN	1551-6857
Volume	20
Issue	6
Pages	185
Abstract	Sign language provides a way for differently-abled individuals to express their feelings and emotions. However, learning sign language can be challenging and time-consuming. An alternative approach is to animate user photos using sign language videos of specific words, which can be achieved using existing image animation methods. However, the finger motions in the generated videos are often not ideal. To address this issue, we propose the Structure-aware Temporal Consistency Network (STCNet), which jointly optimizes the prior structure of humans with temporal consistency to produce sign language videos. We use a fine-grained skeleton detector to acquire knowledge of body structure and introduce both short- and long-term cycle loss to ensure the continuity of the generated video. The two losses and keypoint detector network are optimized in an end-to-end manner. Quantitative and qualitative evaluations on three widely used datasets, namely LSA64, Phoenix-2014T, and WLASL-2000, demonstrate the effectiveness of the proposed method. It is our hope that this work can contribute to future studies on sign language production.
Keyword	Sign Language; Jointly Training; Motion Transfer; Video Generation
DOI	10.1145/3648368
Indexed By	SCIE
Language	English
WOS Research Area	Computer Science
WOS Subject	Computer Science, Information Systems ; Computer Science, Software Engineering ; Computer Science, Theory & Methods
WOS ID	WOS:001208681800035
Publisher	ASSOC COMPUTING MACHINERY, 1601 Broadway, 10th Floor, NEW YORK, NY 10019-7434
Scopus ID	2-s2.0-85189454373
Document Type	Journal article
Collection	Faculty of Science and Technology INSTITUTE OF COLLABORATIVE INNOVATION DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author	Yang, Yi
Affiliation	1. College of Computer Science and Technology, Zhejiang University, Hangzhou, 38 Zheda Road, Xihu District, Zhejiang, 310027, China
	2. Faculty of Science and Technology, Institute of Collaborative Innovation, University of Macau, Taipa University Boulevard, 999078, Macao
	3. DAMO Academy, Alibaba Group, Hangzhou, 969 Wenyi West Road, Yuhang District, Zhejiang, 311121, China
Recommended Citation GB/T 7714	Suo, Yucheng, Zheng, Zhedong, Wang, Xiaohan, et al. Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20(6), 185.
APA	Suo, Y., Zheng, Z., Wang, X., Zhang, B., & Yang, Y. (2024). Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation. ACM Transactions on Multimedia Computing, Communications and Applications, 20(6), 185.
MLA	Suo, Yucheng, et al. "Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation." ACM Transactions on Multimedia Computing, Communications and Applications 20.6 (2024): 185.
