Residential College | false |
Status | Published |
Unifying Image Processing as Visual Prompting Question Answering | |
Yihao Liu1,2; Xiangyu Chen1,2,3; Xianzheng Ma1; Xintao Wang4; Jiantao Zhou3; Yu Qiao1,2; Chao Dong1,2 | |
2024-02 | |
Conference Name | Proceedings of the 41st International Conference on Machine Learning |
Source Publication | Proceedings of Machine Learning Research |
Volume | 235 |
Pages | 30873 - 30891 |
Conference Date | July 21 through July 27, 2024. |
Conference Place | Vienna, Austria. |
Publisher | ML Research Press |
Abstract | Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model to deal with various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc. Our proposed framework, named PromptGIP, unifies these diverse image processing tasks within a universal framework. Inspired by NLP question answering (QA) techniques, we employ a visual prompting question answering paradigm. Specifically, we treat the input-output image pair as a structured question-answer sentence, thereby reprogramming the image processing task as a prompting QA problem. PromptGIP can undertake diverse cross-domain tasks using provided visual prompts, eliminating the need for task-specific finetuning. Capable of handling up to 15 different image processing tasks, PromptGIP represents a versatile and adaptive approach to general image processing. |
Language | English |
Scopus ID | 2-s2.0-85203824374 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Chao Dong |
Affiliation | 1. Shanghai Artificial Intelligence Laboratory; 2. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; 3. University of Macau; 4. ARC Lab, Tencent PCG |
Recommended Citation GB/T 7714 | Yihao Liu, Xiangyu Chen, Xianzheng Ma, et al. Unifying Image Processing as Visual Prompting Question Answering[C]. ML Research Press, 2024: 30873-30891. |
APA | Liu, Y., Chen, X., Ma, X., Wang, X., Zhou, J., Qiao, Y., & Dong, C. (2024). Unifying Image Processing as Visual Prompting Question Answering. Proceedings of Machine Learning Research, 235, 30873-30891. |