PIXEL: Prompt-based Zero-shot Hashing via Visual and Textual Semantic Alignment

doi:10.1145/3627673.3679747


Status	Published
	PIXEL: Prompt-based Zero-shot Hashing via Visual and Textual Semantic Alignment
	Dong, Zeyu 1; Long, Qingqing 2; Zhou, Yihang 2; Wang, Pengfei 2; Zhu, Zhihong 3; Luo, Xiao 4; Wang, Yidong 3; Wang, Pengyang 5; Zhou, Yuanchun 2
	2024-11
Conference Name	33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Source Publication	International Conference on Information and Knowledge Management, Proceedings
Pages	487-496
Conference Date	21-25 October 2024
Conference Place	Boise, Idaho
Country	USA
Publisher	Association for Computing Machinery
Abstract	Zero-Shot Hashing (ZSH), which aims to encode semantic information into hash codes without needing labeled training samples of unseen classes, has attracted significant attention due to its efficiency and generalizability in multi-modal retrieval scenarios. In addition to the commonly used images as visual semantics and class labels as global semantics, the corresponding attribute descriptions contain critical local semantics with detailed information. However, most existing methods focus on leveraging the extracted numerical attribute values without exploring the textual semantics in attribute descriptions. To bridge this gap, we propose Prompt-based zero-shot hashing via vIsual and teXtual sEmantic aLignment, namely PIXEL. Concretely, we design an attribute prompt template based on attribute descriptions so that the model captures the corresponding local semantics. Then, after obtaining the textual and visual embeddings, we propose an alignment module to model the intra- and inter-class contrastive distances. In addition, attribute-wise and class-wise constraints are used to collaboratively learn the hash code, image representation, and visual attributes more effectively. Finally, extensive experimental results demonstrate the superiority of PIXEL.
Keyword	Attribute Hashing; Image Prompting; Semantic Alignment; Zero-shot Hashing
DOI	10.1145/3627673.3679747
Language	English
Scopus ID	2-s2.0-85210037495
Document Type	Conference paper
Collection	THE STATE KEY LABORATORY OF INTERNET OF THINGS FOR SMART CITY (UNIVERSITY OF MACAU)
Affiliation	1. Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; 2. Computer Network Information Center, Chinese Academy of Sciences, University of the Chinese Academy of Sciences, Beijing, China; 3. Peking University, Beijing, China; 4. University of California, Los Angeles, Los Angeles, United States; 5. University of Macau, Macau, Macao
Recommended Citation GB/T 7714	Dong, Zeyu, Long, Qingqing, Zhou, Yihang, et al. PIXEL: Prompt-based Zero-shot Hashing via Visual and Textual Semantic Alignment[C]: Association for Computing Machinery, 2024, 487-496.
APA	Dong, Zeyu., Long, Qingqing., Zhou, Yihang., Wang, Pengfei., Zhu, Zhihong., Luo, Xiao., Wang, Yidong., Wang, Pengyang., & Zhou, Yuanchun (2024). PIXEL: Prompt-based Zero-shot Hashing via Visual and Textual Semantic Alignment. International Conference on Information and Knowledge Management, Proceedings, 487-496.
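
The abstract mentions two ingredients that can be illustrated generically: an attribute prompt template built from attribute descriptions, and hash codes used for retrieval. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The template wording, the function names, and the sign-based binarization are all hypothetical choices, since the abstract does not specify them; the alignment module and the attribute-wise and class-wise constraints are not modeled here.

```python
from typing import List, Sequence


def attribute_prompt(class_name: str, attributes: Sequence[str]) -> str:
    """Build a text prompt from attribute descriptions.

    Hypothetical template: the abstract does not give the actual wording.
    """
    return f"a photo of a {class_name}, which has {', '.join(attributes)}."


def to_hash_code(embedding: Sequence[float]) -> List[int]:
    """Binarize a real-valued embedding by sign.

    Assumption: sign thresholding is a common hashing step, used here
    only for illustration.
    """
    return [1 if x >= 0 else 0 for x in embedding]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


if __name__ == "__main__":
    print(attribute_prompt("zebra", ["black stripes", "four legs"]))
    query = to_hash_code([0.3, -1.2, 0.0, 2.5])
    database = [[0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 0, 1]]
    print(rank_by_hamming(query, database))
```

In a real system the embeddings would come from trained image and text encoders; here they are hand-written lists so the retrieval step can be followed by hand.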
