UNIFIED DISCRETE DIFFUSION FOR SIMULTANEOUS VISION-LANGUAGE GENERATION
Author | Hu, Minghui |
Author | Zheng, Chuanxia |
Author | Zheng, Heliang |
Author | Cham, Tat-Jen |
Author | Wang, Chaoyue |
Author | Yang, Zuopeng |
Author | Tao, Dacheng |
Author | Suganthan, Ponnuthurai N |
Date Available | 2025-01-19T10:05:08Z |
Date Published | 2023 |
Publication Name | 11th International Conference on Learning Representations, ICLR 2023 |
Source | Scopus |
Identifier | http://dx.doi.org/10.48550/arXiv.2211.14842 |
Abstract | The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling multi-modality signals. In this work, we harness these traits and present a unified multimodal generation model that can conduct both the "modality translation" and "multi-modality generation" tasks using a single model, performing text-based, image-based, and even vision-language simultaneous generation. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix. Moreover, we design a mutual attention module with a fused embedding layer and a unified objective function to emphasise the inter-modal linkages, which are vital for multi-modality generation. Extensive experiments indicate that our proposed method can perform comparably to the state-of-the-art solutions in various generation tasks. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved. |
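The abstract's "unified transition matrix" refers to the corruption kernel of a discrete diffusion process applied jointly to text and image tokens. As a minimal sketch (not the paper's actual construction), an absorbing-state transition matrix in the style of common discrete diffusion models can be built over a single index space that concatenates the two vocabularies; the vocabulary sizes and the mask-rate `beta` below are illustrative assumptions:

```python
import numpy as np

def absorbing_transition_matrix(vocab_size, beta):
    """One-step transition matrix Q for absorbing-state discrete diffusion:
    each token keeps its value with probability (1 - beta) and moves to a
    shared [MASK] state (last index) with probability beta."""
    K = vocab_size + 1               # extra absorbing [MASK] state
    Q = np.eye(K) * (1.0 - beta)     # stay on the current token
    Q[:, -1] += beta                 # probability mass flows into [MASK]
    Q[-1, -1] = 1.0                  # [MASK] is absorbing
    return Q

# Hypothetical unified setup: text and image tokens share one diffusion
# process by living in a single concatenated vocabulary.
text_vocab, image_vocab = 5, 7
Q = absorbing_transition_matrix(text_vocab + image_vocab, beta=0.1)
assert np.allclose(Q.sum(axis=1), 1.0)  # each row is a valid distribution
```

Because both modalities share one transition matrix, a single model can denoise text tokens, image tokens, or both at once, which is what enables the simultaneous vision-language generation described above.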
Language | en |
Publisher | International Conference on Learning Representations, ICLR |
Subject | Diffusion model; Diffusion process; Discrete diffusion; Embeddings; Image-based; Language generation; Multi-modal; Single models; Transition matrices; Image generation; Image caption |
Type | Conference |