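Abstract: To address the ways in which civil judgment documents differ from news texts in text structure and in the distribution of important information, a BERT-based method for generating structured summaries of civil judgment documents is proposed that combines coarse-grained and fine-grained extraction. A coarse-grained extraction method first extracts the important modules of a judgment document so as to preserve the text structure. A BERT-based sequence labeling method is then used to build a fine-grained extractive summarization model, which further extracts information from the important modules at the sentence level to construct the final summary. Experiments show that the proposed method achieves better summary generation performance than either coarse-grained extraction alone or fine-grained extraction alone.
Keywords: judicial field; judgment documents; extractive text summarization; sequence labeling
CLC number: TP399
Document code: A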
Research on Extractive Judgment Document Abstract Generation Method Based on BERT
WEI Xinyang1, TANG Xianghong1,2
(1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China)
604564607@qq.com; xhtang@gzu.edu.cn
Abstract: Civil judgment documents differ from news texts in both their text structure and the distribution of important information. To account for these features, this paper proposes a structured summary generation method for civil judgment documents based on BERT (Bidirectional Encoder Representations from Transformers) that combines coarse-grained and fine-grained extraction. First, a coarse-grained extraction step extracts the important modules of the judgment document so that the text structure is preserved. Then, a BERT-based sequence labeling method is used to build a fine-grained extractive summarization model, which further extracts information from the important modules at the sentence level to construct the final summary. Experiments show that the proposed method achieves better summary generation performance than either coarse-grained extraction or fine-grained extraction alone.
Keywords: judicial field; judgment documents; extractive text summarization; sequence labeling
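
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the module markers, the function names, and the toy document are all hypothetical, and a simple keyword rule stands in for the BERT-based sentence tagger so that the example runs without a trained model. It only shows how a coarse module selection followed by per-sentence 0/1 labeling yields a summary that keeps the document's structure.

```python
import re
from typing import Callable, Dict, List

# Hypothetical module markers (an assumption for illustration only);
# real judgment documents use their own section cues.
MODULE_MARKERS = {
    "claims": "Plaintiff claims:",
    "facts": "The court finds:",
    "reasoning": "The court holds:",
    "ruling": "Judgment:",
}


def coarse_extract(document: str, keep: List[str]) -> Dict[str, str]:
    """Stage 1: split the document at module markers and keep selected modules."""
    # Locate every marker that occurs in the text, in document order.
    hits = sorted(
        (document.find(marker), name, marker)
        for name, marker in MODULE_MARKERS.items()
        if marker in document
    )
    modules: Dict[str, str] = {}
    for i, (start, name, marker) in enumerate(hits):
        # A module runs from its marker to the next marker (or the end).
        end = hits[i + 1][0] if i + 1 < len(hits) else len(document)
        if name in keep:
            modules[name] = document[start + len(marker):end].strip()
    return modules


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def fine_extract(
    modules: Dict[str, str],
    labeler: Callable[[List[str]], List[int]],
) -> Dict[str, List[str]]:
    """Stage 2: label each sentence 0/1 within a module and keep those labeled 1."""
    summary: Dict[str, List[str]] = {}
    for name, text in modules.items():
        sentences = split_sentences(text)
        labels = labeler(sentences)  # one label per sentence
        summary[name] = [s for s, y in zip(sentences, labels) if y == 1]
    return summary


def keyword_labeler(sentences: List[str]) -> List[int]:
    """Placeholder for a trained BERT tagger: label 1 if any cue word occurs."""
    cues = ("ordered", "shall", "owes", "contract")
    return [int(any(c in s.lower() for c in cues)) for s in sentences]


doc = (
    "Plaintiff claims: The defendant owes 5000 yuan under a loan contract. "
    "The plaintiff also described the weather that day. "
    "The court finds: The parties signed a loan contract in 2019. "
    "The hearing room was on the second floor. "
    "Judgment: The defendant shall repay 5000 yuan within ten days."
)

modules = coarse_extract(doc, keep=["claims", "ruling"])
summary = fine_extract(modules, keyword_labeler)
```

In this sketch `keyword_labeler` is the only piece that would change in practice: any function mapping a list of sentences to a list of 0/1 labels, such as a fine-tuned BERT sentence classifier, can be passed to `fine_extract` in its place.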