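Abstract: To address the excessive computational complexity of Transformer models on image tasks, an improved Swin Transformer image classification and recognition method is proposed. First, Swin Transformer processes image feature maps as patches, which greatly reduces computational complexity and improves model performance. Second, a global information interaction module is added on top of Swin Transformer to strengthen the representation of cross-modal feature information, giving the model higher image classification accuracy and faster convergence. Experimental results show that the model reaches a classification accuracy of 84.2% on the public ImageNet dataset, an improvement of 2.8% over the Swin Transformer image classification method.
Keywords: image classification; computational complexity; information interaction; model convergence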
CLC number: TP391
Document code: A
An Improved Swin Transformer Image Classification and Recognition Method
CHEN Cheng¹, GENG Xiaozhong², LIU Baijin¹, WANG Linen¹, HU Weixin²
(1. School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China; 2. School of Computer Technology and Engineering, Changchun Institute of Technology, Changchun 130012, China)
2468295244@qq.com; dq_gxz@ccit.edu.cn; 1692797200@qq.com; 3172876826@qq.com; 1051090429@qq.com
Abstract: This paper proposes an improved Swin Transformer image classification and recognition method to address the excessive computational complexity of Transformer models on image tasks. First, Swin Transformer processes image feature maps as patches, which greatly reduces computational complexity and improves model performance. Second, a global information interaction module is added to Swin Transformer to strengthen the representation of cross-modal feature information, so that the model achieves higher classification accuracy and converges faster. Experimental results show that the model reaches a classification accuracy of 84.2% on the public ImageNet dataset, 2.8% higher than the Swin Transformer image classification method.
Keywords: image classification; computational complexity; information interaction; model convergence