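Abstract: Existing extraction methods achieve low accuracy on Chinese resumes and extract incomplete information. To address these problems, a joint entity and relation extraction model for Chinese resumes based on a partition filter network is proposed. The model first uses the Chinese-RoBERTa-wwm-ext pretrained language model to preprocess the input sequence. It then combines the output of a multi-head self-attention mechanism with that of the gating mechanism of the partition filter network to extract features from the input sequence precisely, and concatenates the extracted features with a global representation for multi-label prediction. Experimental results on a Chinese resume dataset show that the precision, recall, and F1 score of the model are 95.86%, 97.02%, and 96.44%, respectively. Compared with the previous best model, these three metrics improve by 3.12, 2.83, and 2.98 percentage points, respectively, which shows that the model effectively improves the accuracy of entity and relation extraction from Chinese resumes.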
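Keywords: Chinese resumes; joint entity and relation extraction; multi-head self-attention mechanism; partition filter network; global representation
CLC number: TP391.1
Document code: A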
Funding: Doctoral Research Start-up Fund of Shaanxi University of Science and Technology (2022BJ-20)
Research on Joint Extraction of Chinese Resume Entity Relationships Based on Partition Filter Network
WEI Jie, YANG Yuexin, WANG Changhao
(School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China)
221612123@sust.edu.cn; yangyuexin@sust.edu.cn; wangchanghao@sust.edu.cn
Abstract: To address the low accuracy and incomplete information extraction of existing methods on Chinese resumes, this paper proposes a joint entity and relation extraction model based on a partition filter network. The model first preprocesses the input sequence using the Chinese-RoBERTa-wwm-ext pretrained language model. Next, it combines the output of the multi-head self-attention mechanism with that of the gating mechanism of the partition filter network to extract features from the input sequence precisely. The extracted features are then concatenated with a global representation for multi-label prediction. Experimental results on a Chinese resume dataset show that the model achieves a precision, recall, and F1 score of 95.86%, 97.02%, and 96.44%, respectively. Compared with the previous best model, these three metrics improve by 3.12, 2.83, and 2.98 percentage points, respectively, demonstrating that the proposed model effectively improves the accuracy of entity and relation extraction from Chinese resumes.
Keywords: Chinese resumes; joint entity and relation extraction; multi-head self-attention mechanism; partition filter network; global representation
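
As a quick consistency check on the figures reported in the abstract, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` and the variable names are assumptions introduced here, and the baseline scores are derived by subtracting the reported gains from the reported values; they are not quoted from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision, recall, reported_f1 = 95.86, 97.02, 96.44
computed_f1 = f1_score(precision, recall)

# Reported gains over the previous best model, in percentage points.
gains = (3.12, 2.83, 2.98)

# Baseline scores implied by the gains (derived here, not quoted from the paper).
baseline = tuple(
    round(value - gain, 2)
    for value, gain in zip((precision, recall, reported_f1), gains)
)

print(f"computed F1 = {computed_f1:.2f} (reported: {reported_f1:.2f})")
print("implied baseline (P, R, F1) =", baseline)
```

Under these assumptions, the recomputed F1 agrees with the reported 96.44% to two decimal places, and the implied baseline F1 is likewise consistent with the implied baseline precision and recall.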