Text-to-Image Generation Model Based on Single-Stage GANs - AET - Application of Electronic Technique

Text-to-Image Generation Model Based on Single-Stage GANs

Information Technology and Network Security

Hu Tao1, Li Jinlong2

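(1. School of Data Science, University of Science and Technology of China, Hefei 230026, Anhui, China; 2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China)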

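Abstract: Text-conditioned image generation currently tends to suffer from poor image quality and unstable training. To address these problems, a model is proposed that generates high-quality images with a single-stage generative adversarial network (GANs). Specifically, an attention mechanism is introduced into the GAN generator to produce fine-grained images, while local-global language representations are added to the discriminator so that it can accurately distinguish generated images from real ones. Through the adversarial game between the generator and the discriminator, high-quality images are ultimately generated. Experimental results on benchmark datasets show that, compared with recent models built on multi-stage frameworks, this model generates more realistic images and achieves the highest IS (Inception Score) reported to date, making it well suited to scenarios in which images are generated from text descriptions.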

Key words: text-to-image generation; generative adversarial networks; attention mechanism

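CLC number: TP391
Document code: A
DOI: 10.19358/j.issn.2096-5133.2021.06.009
Citation: Hu Tao, Li Jinlong. Text-to-image generation model based on single-stage GANs[J]. Information Technology and Network Security, 2021, 40(6): 50-55.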

Text to image generation based on single-stage GANs

Hu Tao1, Li Jinlong2

(1. School of Data Science, University of Science and Technology of China, Hefei 230026, China; 2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China)

Abstract: Text-conditioned image generation currently suffers from poor image quality and unstable training. To address these problems, a model that generates high-quality images with single-stage generative adversarial networks (GANs) is proposed. Specifically, an attention mechanism is introduced into the generator to generate fine-grained images, and local-global language representations are added to the discriminator so that it can accurately distinguish generated images from real ones. Through the adversarial game between the generator and the discriminator, high-quality images are finally generated. Experimental results on the benchmark datasets show that, compared with recent models built on multi-stage frameworks, the model generates more realistic images and achieves the highest IS (Inception Score) to date, so it can be readily applied to scenarios in which images are generated from text descriptions.

Key words: text-to-image generation; generative adversarial networks; attention mechanism
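The abstract reports results in terms of IS (Inception Score). For reference, IS is defined as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the class distribution a pretrained Inception network assigns to a generated image x and p(y) is the marginal of those distributions over all generated images; higher values indicate images that are individually confident and collectively diverse. The Python sketch below computes this quantity from class-probability vectors that the caller is assumed to have already obtained from an Inception classifier. It omits the common practice of splitting the samples into several groups (often 10) and reporting a mean and standard deviation, and the function name `inception_score` is illustrative rather than taken from the paper.

```python
import math


def inception_score(probs):
    """Compute IS = exp(E_x[KL(p(y|x) || p(y))]).

    probs: list of rows, one per generated image; each row is a probability
    distribution over classes (assumed to come from a pretrained classifier).
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]
    # Mean KL divergence between each conditional p(y|x) and the marginal p(y).
    # Terms with p == 0 contribute nothing; whenever p > 0, q >= p / n > 0.
    mean_kl = 0.0
    for row in probs:
        kl = sum(p * math.log(p / q) for p, q in zip(row, marginal) if p > 0)
        mean_kl += kl / n
    return math.exp(mean_kl)
```

For instance, two images classified with full confidence into two different classes give an IS of 2, while a set of images that all receive the same prediction gives the minimum value of 1, because every conditional distribution then equals the marginal.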

0 Introduction

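Generating high-resolution, photorealistic images conditioned on a given text description has become a challenging task in both computer vision (CV) and natural language processing (NLP). The task has a variety of potential applications, such as art creation, photo editing, and video games.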

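Because generative adversarial networks (GANs) [1] have recently achieved strong results in image generation, Reed et al. first proposed in 2016 to generate plausible images from text descriptions using conditional generative adversarial networks (cGANs) [2][3]. In 2017, Zhang et al. proposed the StackGAN++ [4] model, which stacks multiple generators and discriminators and was the first to generate images at 256×256 resolution. Currently, almost all text-to-image models are based on StackGAN: they use multiple generator-discriminator pairs, feed the text embedding and random noise into the first generator to produce an initial image, and then refine that image in subsequent generators until a high-resolution image is produced. For example, AttnGAN [5] introduces a cross-modal attention mechanism into each generator to help it synthesize more detailed images; MirrorGAN [6] regenerates the text description from the generated image to enforce text-image semantic consistency; and DM-GAN [7] introduces a dynamic memory network [8] to address the unstable training of the stacked architecture.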
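The cross-modal attention used by AttnGAN, which is also the kind of mechanism the abstract says this paper adds to its single-stage generator, can be illustrated with AttnGAN's word-level formulation: each image sub-region feature h_j scores every word feature e_i by a dot product, the scores are normalized with a softmax over the words, and the resulting weights beta_{j,i} combine the word features into a word-context vector c_j = sum_i beta_{j,i} e_i. The pure-Python sketch below assumes the word features have already been projected into the image-feature space. It is not the paper's implementation, whose exact attention design is not described in this excerpt, and the names `softmax` and `word_context` are illustrative.

```python
import math


def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_context(region_feats, word_feats):
    """For every image sub-region, attend over the words of the description.

    region_feats: list of N vectors h_j (image sub-region features, length D)
    word_feats:   list of T vectors e_i (word features, assumed already projected to length D)
    Returns (contexts, weights) where contexts[j] = sum_i weights[j][i] * e_i.
    """
    dim = len(word_feats[0])
    contexts, weights = [], []
    for h in region_feats:
        # Dot-product relevance of each word to this sub-region.
        scores = [sum(a * b for a, b in zip(h, e)) for e in word_feats]
        beta = softmax(scores)
        # Attention-weighted sum of word features.
        c = [sum(beta[i] * word_feats[i][d] for i in range(len(word_feats)))
             for d in range(dim)]
        contexts.append(c)
        weights.append(beta)
    return contexts, weights


# Two toy words and two sub-regions, each region aligned with one word.
ctx, w = word_context([[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In the toy call, the first sub-region places almost all of its attention on the first word and the second sub-region on the second word, so each context vector is close to the matching word feature. In an AttnGAN-style generator, each c_j is then combined (by concatenation) with h_j before the next generation step, so that each region picks up details from the words most relevant to it.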

Download the full text of this article: http://www.chinaaet.com/resource/share/2000003600


Originality statement: This content is original to the AET website; reproduction without authorization is prohibited.
