Abstract: News text classification is an important task in natural language processing; this paper classifies news texts using their headlines. With the rise of the BERT pre-trained model, BERT has achieved good results on many NLP (Natural Language Processing) tasks and has been applied to news classification. To improve the main task of news classification, this paper introduces an auxiliary task of judging whether two news items belong to the same category, and fine-tunes the pre-trained BERT model on both the auxiliary task and the main task. Experiments on the THUCNews dataset show that the BERT news classification model with the auxiliary task outperforms the original BERT model.
Keywords: news text classification; BERT; auxiliary task
CLC number: TP391
Document code: A

Funding: Provincial College Students' Innovation and Entrepreneurship Training Program of Taiyuan Institute of Technology (S202114101016).
|
Research on BERT Chinese News Text Classification Based on Auxiliary Tasks
CUI Jianqing, QIU Cehao
|
(Department of Computer Engineering, Taiyuan Institute of Technology, Taiyuan 030008, China)
initgraph@163.com; 2694266017@qq.com
|
Abstract: News text classification is an important task in the field of natural language processing. This paper uses news headlines for text classification. With the rise of the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model, BERT has achieved good results on many NLP (Natural Language Processing) tasks and has also been applied to news classification. To improve the main task of news classification, an auxiliary task is introduced to judge whether two news items belong to the same category, and the pre-trained BERT model is fine-tuned on both the auxiliary task and the main task. Experiments are carried out on the THUCNews dataset. The results show that the BERT news classification model with the auxiliary task outperforms the original BERT model.
Keywords: news text classification; BERT; auxiliary task