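Abstract: Text images in natural scenes contain complex information and backgrounds, and natural scene text detection based on CNN (Convolutional Neural Networks) has low robustness. To address these problems, a text detection and recognition method is proposed that combines an improved Faster RCNN (Region-based Convolutional Neural Networks) model with a character association model based on the multi-head attention mechanism. The method first uses the improved Faster RCNN model to detect character features in the image, then obtains semantic association information between characters through the character association module and the multi-head attention module, and finally generates the recognition result through the character output module. Experimental results show that the method is robust and can effectively use the association information between characters and the contextual semantic information to decode character sequences, performing especially well on irregular text.
Keywords: scene text recognition; improved Faster RCNN; robustness; attention mechanism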
CLC number: TP391
Document code: A
Text Detection and Recognition Method Based on Multi-Head Attention Mechanism
GONG Yu¹, ZHANG Yunhua²
(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
202120501003@mails.zstu.edu.cn; xxyjs@zstu.edu.cn
Abstract: Text images in natural scenes carry complex information and backgrounds, and text detection based on Convolutional Neural Networks (CNN) shows low robustness on such images. To address these problems, this paper proposes a text detection and recognition method that combines an improved Faster RCNN (Region-based Convolutional Neural Networks) model with a character association model based on the multi-head attention mechanism. The method first uses the improved Faster RCNN model to detect character features in an image. It then applies the character association module and the multi-head attention module to obtain semantic association information between characters. Finally, the character output module generates the recognition result. Experimental results show that the method is robust and effectively uses both the association information between characters and the contextual semantic information to decode character sequences, and that it performs particularly well on irregular text.
Keywords: scene text recognition; improved Faster RCNN; robustness; attention mechanism