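Abstract: To address problems in document image quality assessment networks such as insufficient image feature extraction and inappropriate evaluation metrics, a Transformer-based dual-stream document image quality assessment algorithm is proposed. First, a Transformer is used to extract image features and to compute attention across feature channels. Second, a weighting module predicts the OCR (optical character recognition) accuracy of the document image as its document image quality score, while a CNN (convolutional neural network) extracts global features of the document and, after a fully connected layer, predicts the natural image score of the image. Finally, the two scores are combined into the predicted quality score of the image. Experimental results show that the Transformer-based dual-stream document image quality assessment algorithm achieves a Pearson linear correlation coefficient (PLCC) of 0.9045 and a Spearman rank-order correlation coefficient (SROCC) of 0.8775 on the dataset, indicating that the algorithm can predict document image quality scores that better match human visual standards.
Keywords: image quality assessment; document image; Transformer; neural network
CLC number: TP391
Document code: A

Foundation item: Supported by the National Natural Science Foundation of China (62172132)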
Research on Dual-stream Document Image Quality Assessment Algorithm Based on Transformer
JIAO Shuheng, ZHANG Shanqing
(School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China)
yidachuanshuzi@hdu.edu.cn; sqzhang@hdu.edu.cn
Abstract: To address issues such as insufficient feature extraction and inappropriate evaluation metrics in document image quality assessment networks, this paper proposes a Transformer-based Dual-stream Document Image Quality Assessment (DSDIQA) algorithm. First, a Transformer is employed to extract image features and compute attention across feature channels. Second, a weighting module predicts the OCR (Optical Character Recognition) accuracy of the document image as its document image quality score, while a CNN (Convolutional Neural Network) extracts the global features of the document and, after a fully connected layer, predicts the natural image score of the image. Finally, the two scores are combined to form the predicted quality score of the image. Experimental results show that the DSDIQA algorithm achieves a Pearson Linear Correlation Coefficient (PLCC) of 0.9045 and a Spearman Rank Order Correlation Coefficient (SROCC) of 0.8775 on the dataset, demonstrating that the algorithm can predict document image quality scores that align more closely with human visual standards.
Keywords: image quality assessment; document image; Transformer; neural network
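
For readers unfamiliar with the two evaluation metrics reported in the abstract, the following minimal sketch shows how PLCC and SROCC are commonly computed between predicted and ground-truth quality scores. It is not the authors' evaluation code: the function names (`plcc`, `average_ranks`, `srocc`) and the sample scores are hypothetical and chosen only for illustration. SROCC is computed here as the Pearson correlation of the rank-transformed scores, with tied values sharing their average rank.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Hypothetical scores, for illustration only (not data from the paper).
predicted = [0.62, 0.71, 0.35, 0.90, 0.48]
ground_truth = [0.60, 0.75, 0.30, 0.95, 0.55]

print(f"PLCC  = {plcc(predicted, ground_truth):.4f}")
print(f"SROCC = {srocc(predicted, ground_truth):.4f}")
```

PLCC measures how linearly the predictions track the reference scores, whereas SROCC depends only on their ordering, so a model can rank images perfectly (SROCC of 1) while still having a PLCC below 1.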