Improving text-to-image generation with object layout guidance

Zakraoui J.; Saleh M.; Al-Maadeed, Somaya; Jaam J.M.

Author	Zakraoui J.
Author	Saleh M.
Author	Al-Maadeed, Somaya
Author	Jaam J.M.
Date available	2022-05-19T10:23:07Z
Publication date	2021
Publication name	Multimedia Tools and Applications
Source	Scopus
Identifier	http://dx.doi.org/10.1007/s11042-021-11038-0
URI	http://hdl.handle.net/10576/31089
Abstract	The automatic generation of realistic images directly from a story text is a very challenging problem, as it cannot be addressed using a single image generation approach due mainly to the semantic complexity of the story text constituents. In this work, we propose a new approach that decomposes the task of story visualization into three phases: semantic text understanding, object layout prediction, and image generation and refinement. We start by simplifying the text using a scene graph triple notation that encodes semantic relationships between the story objects. We then introduce an object layout module to capture the features of these objects from the corresponding scene graph. Specifically, the object layout module aggregates individual object features from the scene graph as well as averaged or likelihood object features generated by a graph convolutional neural network. All these features are concatenated to form semantic triples that are then provided to the image generation framework. For the image generation phase, we adopt a scene graph image generation framework as stage-I, which is refined using a StackGAN as stage-II conditioned on the object layout module and the generated output image from stage-I. Our approach renders object details in high-resolution images while keeping the image structure consistent with the input text. To evaluate the performance of our approach, we use the COCO dataset and compare it with three baseline approaches, namely, sg2im, StackGAN and AttnGAN, in terms of image quality and user evaluation. According to the obtained assessment results, our object layout guidance-based approach significantly outperforms the above-mentioned baseline approaches in terms of the accuracy of semantic matching and realism of the generated images representing the story text sentences.
Sponsor	This work was made possible by NPRP grant #10-0205-170346 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Language	en
Publisher	Springer
Subject	Air navigation; Convolutional neural networks; Quality control; Semantics; Automatic Generation; High resolution image; Image generations; Image Structures; Individual objects; Semantic matching; Semantic relationships; User evaluations; Image enhancement
Title	Improving text-to-image generation with object layout guidance
Type	Article
Pages	27423-27443
Issue	18
Volume	80
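
The abstract describes encoding a sentence as scene graph triples and then concatenating each object's own features with averaged features drawn from the graph. The sketch below only illustrates that general idea and is not the authors' implementation: `Triple`, `toy_embedding`, `neighbours`, `mean_vector` and `layout_features` are hypothetical names, the embedding is a deterministic stand-in for learned features, and a plain mean over neighbouring objects is swapped in for the graph convolutional network mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, predicate, object) relation from a sentence."""
    subject: str
    predicate: str
    obj: str


def toy_embedding(word, dim=4):
    # Assumption: stand-in for a learned embedding. Values are derived
    # from character codes so the example is deterministic.
    total = sum(ord(c) for c in word)
    return [((total * (i + 1)) % 10) / 10.0 for i in range(dim)]


def neighbours(triples):
    # Undirected adjacency between the objects named in the triples.
    adj = {}
    for t in triples:
        adj.setdefault(t.subject, set()).add(t.obj)
        adj.setdefault(t.obj, set()).add(t.subject)
    return adj


def mean_vector(vectors):
    # Element-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def layout_features(triples, dim=4):
    # Concatenate each object's own features with the mean of its
    # neighbours' features (a simplification, not a trained GCN).
    adj = neighbours(triples)
    own = {name: toy_embedding(name, dim) for name in adj}
    features = {}
    for name, nbrs in adj.items():
        averaged = mean_vector([own[n] for n in sorted(nbrs)])
        features[name] = own[name] + averaged
    return features


# "A boy is riding a horse on the grass" as two triples.
triples = [Triple("boy", "riding", "horse"), Triple("horse", "on", "grass")]
features = layout_features(triples)
```

Each entry of `features` is a vector of length `2 * dim`: the first half is the object's own embedding and the second half is the neighbourhood average. In a real system both halves would come from trained networks rather than from character codes.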

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collections

Computer Science and Engineering [2211 items]
