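Abstract: Named entity recognition (NER) is a technique for extracting meaningful entities from large unstructured datasets. It has a wide range of applications, for example extracting dates, trains, platforms, and other entity information from the massive operation-control logs produced by rail transit trains for further data analysis. In recent years, learning-based methods have become mainstream; however, these algorithms rely heavily on manual annotation, and when the training set is small they overfit and fail to achieve the expected generalization. To address these problems, this paper proposes a co-training framework based on reinforcement learning that, given only a small amount of labeled data, uses a large amount of unlabeled data to improve model performance automatically, without human involvement. In experiments on corpora from two different domains, the model's F1 score improves by 10% in both cases, demonstrating the effectiveness and generality of the proposed method. In comparison with traditional co-training methods, the F1 score of the proposed method is 5% higher than that of the other methods, and the experimental results indicate that the proposed method is more intelligent.
Keywords: reinforcement learning; co-training; named entity recognition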
CLC number: TP391.1
Document code: A
Funding: Project of the National Key Research and Development Program of China (2017YFB1201001).
Named Entity Recognition Method Based on Co-training of Reinforcement Learning
CHENG Zhonghui, CHEN Ke, CHEN Gang, XU Shize, FU Dingli 1,2,3
(1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Key Laboratory of Big Data Intelligent Computing of Zhejiang Province, Hangzhou 310027, China; 3. Zhejiang Huayun Electric Power Engineering Design & Consultation Co., Ltd., Hangzhou 310027, China)
Abstract: Named entity recognition (NER) is a technique for extracting meaningful entities from large unstructured datasets. NER has a wide range of applications; one example is extracting date, train, platform, and other entity information from the large volume of operation logs produced by rail transit trains for advanced data analysis. In recent years, learning-based methods have become the mainstream approach to this task. However, these algorithms rely heavily on manual labeling: when the training set is small, they tend to overfit and cannot achieve the expected generalization. In this paper, we propose a novel method, Reinforced Co-Training. With only a small amount of labeled data, it automatically improves the performance of the NER model by using a large amount of unlabeled data. We evaluate our framework on corpora from two different domains; the results show that the F1 score increases by 10% on both, which demonstrates the effectiveness and generality of the method. We also compare our method with traditional co-training methods; its F1 score is 5% higher than theirs, which shows that our method is more intelligent.
Keywords: reinforcement learning; co-training; named entity recognition