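Abstract: Experiments on the CUB dataset involved multi-object text-to-image generation with highly semantically correlated descriptions. In these experiments, many of the generated bird images showed "multiple heads" and "multiple feet". To improve image generation, this paper adds contrastive learning to the MA-GAN (generative adversarial network with a multi-stage attention mechanism) model. It also uses a feature interpolation method to enhance certain image features, which improves semantic consistency and text discriminability. Experiments on the CUB and COCO datasets show that the improved model raises the IS (Inception Score) by 0.11 and 2.58, respectively, and the R-precision by 1.98 and 1.37, respectively. These results demonstrate that the improved model can address the problems of image quality and semantic consistency.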
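The abstract does not state the exact form of the contrastive objective added to MA-GAN. The sketch below assumes a symmetric InfoNCE loss between the image and sentence embeddings of a batch. This is a common choice in contrastive text-to-image work, but it is not confirmed as this paper's loss. Row `i` of each matrix is taken to be a matched image-text pair. The names `info_nce` and `temperature` are hypothetical.

```python
import numpy as np


def info_nce(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss (hypothetical sketch, not the paper's code).

    Row i of img_emb and row i of txt_emb are assumed to be a matched pair;
    every other row in the batch acts as a negative.
    """
    # L2-normalise so that dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) similarity matrix

    def diag_cross_entropy(l: np.ndarray) -> float:
        # numerically stable row-wise log-softmax; the targets are the diagonal
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # average the image->text and text->image directions
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))
    texts = images + 0.1 * rng.normal(size=(8, 16))  # noisy matched captions
    print("matched:", info_nce(images, texts))
    print("shuffled:", info_nce(images, texts[::-1]))
```

The loss is low when each image is most similar to its own caption and high when the captions are mismatched. Under this assumption, that is the pressure that discourages images whose content, such as an extra head, drifts away from the text. A real model would compute this on learned encoder outputs, using a framework such as PyTorch.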
Keywords: text-to-image generation; contrastive learning; text feature representation; feature interpolation
CLC number: TP393
Document code: A
Text-to-Image Generation Based on Contrastive Learning
ZHOU Gang, LI Handong, CHEN Yeye
(School of Electrical Engineering, Guizhou University, Guiyang 550025, China)
1101808591@qq.com; 470394668@qq.com; zgsrkl@126.com
Abstract: In experiments on the CUB dataset, where the input texts describe multiple objects and are highly semantically correlated, many generated bird images were found to have "multiple heads" or "multiple feet". To improve image generation, this paper adds contrastive learning to the MA-GAN (Multi-stage Attention Mechanism Generative Adversarial Network) model. In addition, a feature interpolation method is used to enhance certain image features, which improves semantic consistency and text discriminability. Experiments on the CUB and COCO datasets show that the improved model increases the Inception Score (IS) by 0.11 and 2.58, respectively, and the R-precision (R score) by 1.98 and 1.37, respectively. These results demonstrate that the modified model addresses the problems of image quality and semantic consistency.
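The abstract does not specify which features are interpolated or how. A minimal sketch, assuming plain linear interpolation between two feature vectors, is z = (1 - α)·base + α·target. Here α in [0, 1] blends the two vectors, and α > 1 extrapolates past `target`, which is one simple way to strengthen a particular feature direction. The function names below are hypothetical and are not taken from the paper.

```python
from typing import Sequence

import numpy as np


def interpolate_features(base: np.ndarray, target: np.ndarray, alpha: float) -> np.ndarray:
    """Return (1 - alpha) * base + alpha * target (hypothetical sketch).

    alpha in [0, 1] blends the two feature vectors; alpha > 1 extrapolates
    beyond target, exaggerating the direction (target - base).
    """
    if base.shape != target.shape:
        raise ValueError("feature vectors must have the same shape")
    return (1.0 - alpha) * base + alpha * target


def interpolation_path(base: np.ndarray, target: np.ndarray, alphas: Sequence[float]) -> np.ndarray:
    """Stack interpolated features for several alpha values, one row per alpha."""
    return np.stack([interpolate_features(base, target, a) for a in alphas])


if __name__ == "__main__":
    sentence_feat = np.array([0.2, 0.8, 0.1])   # e.g. an original text feature
    attribute_feat = np.array([0.9, 0.8, 0.1])  # e.g. a feature stressing one attribute
    print(interpolation_path(sentence_feat, attribute_feat, [0.0, 0.5, 1.0, 1.5]))
```

In a generator, the interpolated vector would replace the original conditioning feature. Sweeping α is a quick way to check whether the corresponding image attribute becomes more pronounced.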
Keywords: text-to-image generation; contrastive learning; text feature representation; feature interpolation