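Abstract: In named entity recognition for Chinese electronic medical records (CNER), Chinese text has no delimiters marking word boundaries, and some existing methods struggle to capture long-distance interdependent features. This paper therefore proposes a named entity recognition method for CNER that uses a pre-trained model (BERT-Transformer-CRF, BTC). First, BERT (Bidirectional Encoder Representations from Transformers) is used to extract text features. Second, a Transformer captures the dependencies between characters, a step that does not need to take the distance between characters into account. In addition, because the term dictionary information and radical information of Chinese characters carry deeper semantic information, term dictionary and radical features are incorporated into the model to improve its performance. Finally, a CRF decodes the predicted labels. Experimental results show that the proposed model reaches F1 scores of 96.22% and 84.65% on the CCKS2017 and CCKS2021 datasets, respectively, outperforming current mainstream named entity recognition models.
Keywords: Chinese named entity recognition; radical feature; Transformer; BERT
CLC number: TP391
Document code: A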
Research on Chinese Clinical Named Entity Recognition Method Based on Radical Feature and BERT-Transformer-CRF
YAO Lei, JIANG Mingfeng, FANG Xian, WEI Bo, LI Yang
(School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
202030504181@mails.zstu.edu.cn; m.jiang@zstu.edu.cn; xianfang@zstu.edu.cn; weibo@zstu.edu.cn; yangli@zstu.edu.cn
Abstract: In Chinese Clinical Named Entity Recognition (CNER), Chinese text lacks delimiters that mark word boundaries, and some existing methods struggle to capture long-distance dependencies. This paper therefore proposes a CNER method built on a pre-trained model, BERT-Transformer-CRF (BTC). First, BERT (Bidirectional Encoder Representations from Transformers) extracts text features. Next, a Transformer captures the dependencies between characters regardless of the distance between them. In addition, because the term dictionary and radical information of Chinese characters carry deeper semantic information, term dictionary and radical features are incorporated into the model to improve its performance. Finally, a CRF (Conditional Random Field) decodes the predicted labels. Experimental results show that the proposed model achieves F1 scores of 96.22% and 84.65% on the CCKS2017 and CCKS2021 datasets, respectively, outperforming current mainstream named entity recognition models.
Keywords: Chinese Clinical Named Entity Recognition; radical feature; Transformer; BERT
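The final CRF decoding step named in the abstract can be illustrated with a minimal Viterbi sketch in plain Python. This is not the authors' implementation: the BIO label set, the function name `viterbi_decode`, and all scores below are assumptions chosen for illustration. In the BTC model the per-character emission scores would come from the BERT and Transformer layers and the transition scores would be learned during training.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y] is the score of label y at character t;
    transitions[(p, y)] is the score of moving from label p to label y
    (missing pairs default to 0.0).
    """
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []  # back[t-1][y]: best predecessor of y at t
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-character input (illustration only).
LABELS = ["O", "B", "I"]
EMISSIONS = [
    {"O": 1.0, "B": 0.8, "I": 0.1},
    {"O": 0.2, "B": 0.3, "I": 1.5},
    {"O": 1.0, "B": 0.0, "I": 0.2},
]
# A strong penalty discourages the invalid transition O -> I.
TRANSITIONS = {("O", "I"): -10.0}

print(viterbi_decode(EMISSIONS, TRANSITIONS, LABELS))  # ['B', 'I', 'O']
```

Picking the top-scoring label for each character independently would give O, I, O here, which starts an entity with an inside tag. Scoring the whole sequence with transition terms is what lets the CRF layer prefer the well-formed B, I, O.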