Residential College | false
Status | Published
Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval
Lei Liao [1]; Meng Yang [1,2]; Bob Zhang [3]
2022-09-07
Source Publication | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY |
ISSN | 1051-8215 |
Volume | 33
Issue | 2
Pages | 920-934
Abstract | Cross-modal retrieval tasks, which are more natural and challenging than traditional retrieval tasks, have attracted increasing interest from researchers in recent years. Although different modalities with the same semantics have some potential relevance, the feature space heterogeneity still seriously weakens the performance of cross-modal retrieval models. To solve this problem, common space-based methods in which multimodal data is projected into a learned common space for similarity measurement have become the mainstream approach for cross-modal retrieval tasks. However, current methods entangle the modality style and semantic content in the common space and neglect to fully explore the semantic and discriminative representation/reconstruction of the semantic content. This often results in an unsatisfactory retrieval performance. To solve these issues, this paper proposes a new Deep Supervised Dual Cycle Adversarial Network (DSDCAN) model based on common space learning. It is composed of two cross-modal cycle GANs, one for the image and one for the text. The proposed cycle GAN model disentangles the semantic content and modality style features by making the data of one modality well reconstructed from the extracted modal style feature and the content feature of the other modality. Then, a discriminative semantic and label loss is proposed by fully considering the category, sample contrast, and label supervision to enhance the semantic discrimination of the common space representation. Besides this, to make the data distribution between two modalities similar, a second-order similarity is presented as a distance measurement of the cross-modal representation in the common space. Extensive experiments have been conducted on the Wikipedia, Pascal Sentence, NUS-WIDE-10k, PKU XMedia, MSCOCO, NUS-WIDE, Flickr30k and MIRFlickr datasets. The results demonstrate that the proposed method can achieve a higher performance than the state-of-the-art methods.
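As a rough illustration of the second-order similarity idea mentioned in the abstract (a sketch only, not the authors' exact formulation; the use of cosine similarity and all names below are assumptions for illustration), one way to compare cross-modal representations at second order is to compare each modality's within-batch similarity structure rather than the raw common-space embeddings:

import numpy as np

def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    # Pairwise cosine similarities within one modality's batch of embeddings (n x d).
    x_norm = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x_norm @ x_norm.T

def second_order_distance(img_emb: np.ndarray, txt_emb: np.ndarray) -> float:
    # Hypothetical second-order distance: instead of comparing paired image/text
    # embeddings directly (first order), compare how similarly each modality
    # relates its own samples to one another, then penalize the mismatch.
    s_img = cosine_similarity_matrix(img_emb)  # image-image similarity structure
    s_txt = cosine_similarity_matrix(txt_emb)  # text-text similarity structure
    return float(np.mean((s_img - s_txt) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 128))  # toy image embeddings in the common space
    txt = rng.normal(size=(8, 128))  # toy text embeddings in the common space
    print(second_order_distance(img, txt))

In a training setting, such a term would typically be added to the overall objective so that the two modalities' distributions in the common space become structurally aligned, which is the stated motivation for the second-order measure.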
Keyword | Cross-modal Retrieval; Dual Cycle Generative Adversarial Networks; Deep Supervised Learning
DOI | 10.1109/TCSVT.2022.3203247 |
URL | View the original |
Indexed By | SCIE |
Language | English
WOS Research Area | Engineering |
WOS Subject | Engineering, Electrical & Electronic |
WOS ID | WOS:000941726100035 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141
Scopus ID | 2-s2.0-85137932863 |
Document Type | Journal article |
Collection | Faculty of Science and Technology; DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author | Meng Yang |
Affiliation | 1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; 2. Key Laboratory of Machine Intelligence and Advanced Computing (SYSU), Ministry of Education, Guangzhou, China; 3. Department of Computer and Information Science, PAMI Research Group, University of Macau, Macau, China
Recommended Citation GB/T 7714 | Lei Liao, Meng Yang, Bob Zhang. Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 33(2): 920-934.
APA | Lei Liao, Meng Yang, & Bob Zhang (2022). Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 33(2), 920-934.
MLA | Lei Liao, et al. "Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval." IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 33.2 (2022): 920-934.
Files in This Item: | There are no files associated with this item. |