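Abstract: To address the reliance on implicit updates when updating queries in scene text detection, a deep learning-based text detection model with enhanced updates is proposed. The model first initializes its queries by modeling the control points of bounding boxes. During decoding, it uses not only the decoder's attention mechanism but also prediction information from the current and subsequent decoder layers to guide more precise, enhanced query updates. In addition, a prediction aggregation module is introduced that aggregates similar control-point predictions, improving detection robustness. Experiments on the Total-Text dataset show that Recall improves by 0.7% and F-measure by 0.3%, validating the effectiveness of the approach.
Keywords: text detection; enhanced updates; deep learning; prediction aggregation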
CLC number: TP391
Document code: A
Funding: Zhejiang Provincial Science and Technology Program (2024C01181)
A Text Detection Model with Enhanced Updates Based on Deep Learning
ZHANG Hanshuo, JIANG Ming, ZHANG Min
(School of Computer, Hangzhou Dianzi University, Hangzhou 310018, China)
zhs1316168044@163.com; jmzju@163.com; hz_andy@163.com
Abstract: To address the challenge that query updates in scene text detection rely on implicit updating, this paper proposes a deep learning-based text detection model with enhanced updates. The model first initializes queries by modeling the control points of bounding boxes. During decoding, it leverages not only the attention mechanism of the decoder but also prediction information from both the current and subsequent decoder layers to guide more precise, enhanced query updates. Additionally, a prediction aggregation module is introduced to aggregate predictions of similar control points, thereby improving detection robustness. Experiments on the Total-Text dataset demonstrate the effectiveness of the proposed method, with recall improving by 0.7% and F-measure by 0.3%.
Keywords: text detection; enhanced updates; deep learning; prediction aggregation
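
To make the idea of aggregating similar control-point predictions concrete, the following is a minimal illustrative sketch and not the paper's actual module. It assumes that "similar" means the mean distance between corresponding control points falls below a threshold, and that similar predictions are merged by a score-weighted average. The function names, the `threshold` parameter, and the greedy grouping strategy are all hypothetical choices made for illustration; scores are assumed to be positive.

```python
from math import hypot


def mean_point_distance(a, b):
    """Average Euclidean distance between corresponding control points."""
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def aggregate_predictions(predictions, scores, threshold=2.0):
    """Greedily merge predictions whose control points nearly coincide.

    predictions: list of control-point lists [(x, y), ...], all of equal length
    scores:      one positive confidence score per prediction
    threshold:   hypothetical distance (pixels) below which predictions count as similar
    Returns a list of (merged_points, score) pairs.
    """
    # Visit predictions from most to least confident.
    order = sorted(range(len(predictions)), key=lambda i: scores[i], reverse=True)
    used = set()
    merged = []
    for i in order:
        if i in used:
            continue
        # Collect every unused prediction close to the current one (including itself).
        group = [
            j for j in order
            if j not in used
            and mean_point_distance(predictions[i], predictions[j]) < threshold
        ]
        used.update(group)
        total = sum(scores[j] for j in group)
        # Score-weighted average of each control point across the group.
        points = [
            (
                sum(scores[j] * predictions[j][k][0] for j in group) / total,
                sum(scores[j] * predictions[j][k][1] for j in group) / total,
            )
            for k in range(len(predictions[i]))
        ]
        merged.append((points, max(scores[j] for j in group)))
    return merged
```

In this sketch, two nearly coincident detections of the same text instance collapse into a single averaged set of control points, while a distant detection is left unchanged. How the proposed model actually measures similarity and combines predictions is not specified in the abstract.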