Software Engineering

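Cite this article:

LIU Jun, WANG Xiulai. FPC-Kmeans++ Patent Clustering Analysis and Technical Topic Recognition: A Case Study of the UAV Field [J]. Software Engineering, 2024, 27(5): 14-20.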

FPC-Kmeans++ Patent Clustering Analysis and Technical Topic Recognition: A Case Study of the UAV Field

LIU Jun¹, WANG Xiulai¹,²

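(1. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China;
2. Jinling Hospital Affiliated to Nanjing University, Nanjing, Jiangsu 210016, China)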
20211249335@nuist.edu.cn; wangxiulai@126.com

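Abstract: To address the low efficiency and high difficulty of patent technical topic recognition, this paper proposes FPC-Kmeans++ (Kmeans plus plus with feature phrase clusters), a patent clustering analysis and technical topic recognition method whose key innovation is to use feature phrases, instead of conventional word segmentation results, as the basis for patent data analysis. The method is tested empirically on unmanned aerial vehicle (UAV) patents. Experimental results show that, compared with the traditional Kmeans++ (Kmeans plus plus) and LDAKmeans++ (Kmeans plus plus with Latent Dirichlet Allocation) methods, the proposed method determines the optimal number of topics more accurately and yields clusters with a more distinct hierarchical structure, demonstrating its advantages for patent topic recognition. In addition, the proposed NER-FPP (Named Entity Recognition with Feature Phrase Probability) algorithm performs best among the compared algorithms in extracting patent feature phrases, achieving the highest F1 score, 93.36%.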
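The abstract gives no implementation details, so what follows is only a minimal Python sketch of the general pipeline it describes, under loudly stated assumptions: each patent has already been reduced to a list of feature phrases (the NER-FPP extraction step itself is not reproduced), phrases are weighted with TF-IDF (one of the listed keywords), patents are clustered with k-means++ seeding followed by Lloyd iterations, and the number of topics is picked by the silhouette coefficient. The abstract does not say which criterion selects the optimal topic number; silhouette is a stand-in. The smoothed IDF formula, all function names, and the sample phrases are hypothetical, not the authors' FPC-Kmeans++ code.

```python
import math
import random
from collections import Counter

# Sketch only: smoothed IDF and the silhouette criterion are assumptions, not the paper's method.

def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors over feature phrases (smoothed IDF, as in scikit-learn's default)."""
    vocab = sorted({p for d in docs for p in d})
    n = len(docs)
    df = Counter(p for d in docs for p in set(d))
    idf = {p: math.log((1 + n) / (1 + df[p])) + 1 for p in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d)  # raw phrase counts
        v = [tf[p] * idf[p] for p in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vocab, vecs

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(p)
                break
        else:  # float rounding, or every point already a centre: take the farthest point
            centers.append(points[max(range(len(points)), key=d2.__getitem__)])
    return centers

def kmeans(points, k, seed=0, n_init=10, max_iter=100):
    """Lloyd iterations from k-means++ seeds; returns labels of the lowest-inertia restart."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, math.inf
    for _restart in range(n_init):
        centers = kmeanspp_init(points, k, rng)
        for _it in range(max_iter):
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            new = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # an emptied cluster keeps its previous centre
                new.append([sum(col) / len(members) for col in zip(*members)] if members else centers[j])
            if new == centers:
                break
            centers = new
        inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        others = [c for c in clusters if c != labels[i]]
        if not own or not others:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
                for c in others)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(points, k_values, seed=0):
    """Pick the topic count with the highest silhouette (a stand-in selection criterion)."""
    scores = {k: silhouette(points, kmeans(points, k, seed=seed)) for k in k_values}
    return max(scores, key=scores.get), scores

# Hypothetical feature phrases for four UAV patents (illustrative only).
docs = [
    ["flight controller", "attitude sensor", "flight controller"],
    ["flight controller", "attitude sensor", "gimbal"],
    ["battery pack", "charging dock"],
    ["battery pack", "charging dock", "landing pad"],
]
vocab, vecs = tfidf_vectors(docs)
k, scores = best_k(vecs, range(2, 4))
print(k, kmeans(vecs, k))
```

Normalising the vectors before clustering means squared Euclidean distance equals 2 minus twice the cosine similarity, so this k-means effectively groups patents by the cosine similarity of their phrase profiles.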

Keywords: topic recognition; patent clustering; NER; TF-IDF
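The 93.36% reported for NER-FPP is an F1 score, the harmonic mean of precision and recall. The abstract does not state how extracted phrases are matched against the annotations, so the sketch below assumes exact string matching of phrase sets, micro-averaged over patents; the function name `micro_prf` and the sample phrases are hypothetical.

```python
def micro_prf(pairs):
    """Micro-averaged precision, recall and F1 over (predicted, gold) phrase lists.

    Assumption: a predicted phrase is correct only if it exactly matches a gold phrase.
    """
    tp = n_pred = n_gold = 0
    for predicted, gold in pairs:
        pred_set, gold_set = set(predicted), set(gold)
        tp += len(pred_set & gold_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical extractions for two patents: one spurious phrase, one missed phrase.
pairs = [
    (["flight controller", "gimbal", "rotor"], ["flight controller", "gimbal"]),
    (["battery pack"], ["battery pack", "charging dock"]),
]
precision, recall, f1 = micro_prf(pairs)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.75 R=0.75 F1=0.75
```

Micro-averaging pools true positives across all patents before dividing, so patents with many phrases weigh more than under a per-patent (macro) average.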

CLC number: TP391.1    Document code: A

Funding: 2022 General Project of the National Social Science Fund of China (22BGL282)

