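Abstract: To address the problems the target bearing manufacturer faces with handwritten raw-material forms (difficult storage, low reuse, and slow, error-prone manual registration), a recognition system for standardized raw-material forms is designed on the basis of morphological detection and Tesseract-OCR character recognition. The system preprocesses the handwritten form by binarization, noise reduction, and tilt correction, extracts the table frame by morphological detection, segments cells by means of a dynamic mask and corner detection, and uses the jTessBoxEditor tool to train a character library, thereby recognizing the handwritten form. Experimental results show that the system needs only 6.88 s to recognize an image and achieves an accuracy of 96%, giving it high application and practical value.
Keywords: preprocessing; morphological detection; Tesseract-OCR; table frame; dynamic mask
CLC number: TP391
Document code: A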
Recognition and Digital Processing of Handwritten Forms for Factory Inspection
FANG Haodong, BAO Min
(School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China)
18375398223@163.com; mbao@zstu.edu.cn
Abstract: Handwritten raw-material forms at the target bearing manufacturer are difficult to store, are rarely reused, and are registered manually, which is slow and error-prone. To address these problems, this paper designs a recognition system for standardized raw-material forms based on morphological detection and Tesseract-OCR character recognition. The system preprocesses handwritten forms through binarization, noise reduction, and tilt correction, and extracts the form frame by morphological detection. Cells are segmented using a dynamic mask and corner detection, and the jTessBoxEditor tool is used to train the character library, which completes the recognition of the handwritten form. Experimental results show that the system takes only 6.88 s to recognize an image and reaches an accuracy of 96%, indicating high practical value.
Keywords: preprocessing; morphological detection; Tesseract-OCR; table frame; dynamic mask