• English
    • العربية
  • العربية
  • Login
  • QU
  • QU Library
  •  Home
  • Communities & Collections
  • Help
    • Item Submission
    • Publisher policies
    • User guides
    • FAQs
  • About QSpace
    • Vision & Mission
View Item 
  •   Qatar University Digital Hub
  • Qatar University Institutional Repository
  • Academic
  • Research Units
  • KINDI Center for Computing Research
  • Network & Distributed Systems
  • View Item
  • Qatar University Digital Hub
  • Qatar University Institutional Repository
  • Academic
  • Research Units
  • KINDI Center for Computing Research
  • Network & Distributed Systems
  • View Item
  •      
  •  
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    UNIFIED DISCRETE DIFFUSION FOR SIMULTANEOUS VISION-LANGUAGE GENERATION

    Thumbnail
    View/Open
    1852_unified_discrete_diffusion_for.pdf (16.31Mb)
    Date
    2023
    Author
    Hu, Minghui
    Zheng, Chuanxia
    Zheng, Heliang
    Cham, Tat-Jen
    Wang, Chaoyue
    Yang, Zuopeng
    Tao, Dacheng
    Suganthan, Ponnuthurai N
    ...show more authors ...show less authors
    Metadata
    Show full item record
    Abstract
    The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling the multi-modality signals. In this work, we harness these traits and present a unified multimodal generation model that can conduct both the"modality translation" and"multi-modality generation" tasks using a single model, performing text-based, image-based, and even vision-language simultaneous generation. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix. Moreover, we design a mutual attention module with fused embedding layer and a unified objective function to emphasise the inter-modal linkages, which are vital for multi-modality generation. Extensive experiments indicate that our proposed method can perform comparably to the state-of-the-art solutions in various generation tasks. 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
    DOI/handle
    http://dx.doi.org/10.48550/arXiv.2211.14842
    http://hdl.handle.net/10576/62244
    Collections
    • Network & Distributed Systems [‎142‎ items ]

    entitlement


    Qatar University Digital Hub is a digital collection operated and maintained by the Qatar University Library and supported by the ITS department

    Contact Us | Send Feedback
    Contact Us | Send Feedback | QU

     

     

    Home

    Submit your QU affiliated work

    Browse

    All of Digital Hub
      Communities & Collections Publication Date Author Title Subject Type Language Publisher
    This Collection
      Publication Date Author Title Subject Type Language Publisher

    My Account

    Login

    Statistics

    View Usage Statistics

    About QSpace

    Vision & Mission

    Help

    Item Submission Publisher policiesUser guides FAQs

    Qatar University Digital Hub is a digital collection operated and maintained by the Qatar University Library and supported by the ITS department

    Contact Us | Send Feedback
    Contact Us | Send Feedback | QU

     

     

    Video