T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models

doi:10.1609/aaai.v38i5.28226

Residential College	false
Status	Published
Title	T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models
Author	Mou, Chong 1,2; Wang, Xintao 2; Xie, Liangbin 2,3,4; Wu, Yanze 2; Zhang, Jian 1; Qi, Zhongang 2; Shan, Ying 2
Date	2024-03-25
Conference Name	38th AAAI Conference on Artificial Intelligence, AAAI 2024
Source Publication	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	38
Issue	5
Pages	4296-4304
Conference Date	20 February 2024 through 27 February 2024
Conference Place	Vancouver
Abstract	The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated strong power of learning complex structures and meaningful semantics. However, relying solely on text prompts cannot fully take advantage of the knowledge learned by the model, especially when flexible and accurate controlling (e.g., structure and color) is needed. In this paper, we aim to "dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly. Specifically, we propose to learn low-cost T2I-Adapters to align internal knowledge in T2I models with external control signals, while freezing the original large T2I models. In this way, we can train various adapters according to different conditions, achieving rich control and editing effects in the color and structure of the generation results. Further, the proposed T2I-Adapters have attractive properties of practical value, such as composability and generalization ability. Extensive experiments demonstrate that our T2I-Adapter has promising generation quality and a wide range of applications.
Keyword	CV: Computational Photography, Image & Video Synthesis; CV: Multi-modal Vision
DOI	10.1609/aaai.v38i5.28226
Language	English
Scopus ID	2-s2.0-85189556876
Document Type	Conference paper
Collection	Faculty of Science and Technology; The State Key Laboratory of Internet of Things for Smart City (University of Macau); Department of Computer and Information Science
Corresponding Author	Xie, Liangbin
Affiliation	1. Peking University Shenzhen Graduate School, China; 2. ARC Lab, Tencent PCG, China; 3. University of Macau, Macao; 4. Shenzhen Institute of Advanced Technology, China
Corresponding Author Affiliation	University of Macau
Recommended Citation GB/T 7714	Mou, Chong,Wang, Xintao,Xie, Liangbin,et al. T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models[C], 2024, 4296-4304.
APA	Mou, C., Wang, X., Xie, L., Wu, Y., Zhang, J., Qi, Z., & Shan, Y. (2024). T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4296-4304.
