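Cite this article: LIU Jun, WANG Xiulai. Research on FPC-Kmeans++ Patent Clustering Analysis and Technical Topic Recognition: A Case Study of the Unmanned Aerial Vehicle Field[J]. Software Engineering, 2024, 27(5): 14-20.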
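Research on FPC-Kmeans++ Patent Clustering Analysis and Technical Topic Recognition: A Case Study of the Unmanned Aerial Vehicle Field
LIU Jun1, WANG Xiulai1,2
(1. School of Computer, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China;
2. Jinling Hospital Affiliated to Nanjing University, Nanjing, Jiangsu 210016, China)
20211249335@nuist.edu.cn; wangxiulai@126.com
Abstract: To address the low efficiency and high difficulty of identifying technical topics in patents, this paper proposes FPC-Kmeans++ (Kmeans plus plus with feature phrase clusters), a method for patent clustering analysis and technical topic recognition that uses feature phrases, rather than conventional word segmentation results, as the basis for patent data analysis. The method is validated empirically on unmanned aerial vehicle (UAV) patents. The experimental results show that, compared with the traditional Kmeans++ (Kmeans plus plus) and LDAKmeans++ (Kmeans plus plus with Latent Dirichlet Allocation) methods, it determines the optimal number of topics more accurately and produces more clearly delineated clusters, demonstrating its advantage in patent topic recognition. In addition, the proposed NER-FPP (Named Entity Recognition with Feature Phrase Probability) algorithm outperforms the other compared algorithms in extracting patent feature phrases, achieving the highest F1 score, 93.36%.
Keywords: topic recognition; patent clustering; NER; TF-IDF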
Chinese Library Classification number: TP391.1    Document code: A
Funding: General Program of the National Social Science Fund of China, 2022 (22BGL282)
Research on Patent Clustering Analysis and Technical Topic Recognition Based on FPC-Kmeans++ in the Unmanned Aerial Vehicle Field
LIU Jun1, WANG Xiulai1,2
(1. School of Computer, Nanjing University of Information Science and Technology, Nanjing 210044, China;
2.Nanjing Jinling Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing 210016, China)
20211249335@nuist.edu.cn; wangxiulai@126.com
Abstract: In view of the low efficiency and high difficulty of patent technical topic recognition, this paper proposes an FPC-Kmeans++ (Kmeans Plus Plus with Feature Phrase Clusters) patent clustering analysis and technical topic recognition method, which uses feature phrases instead of traditional word segmentation results as the basis for patent data analysis. The method is tested empirically on Unmanned Aerial Vehicle (UAV) patents. The experimental results show that, compared with the traditional Kmeans++ and LDAKmeans++ (Kmeans Plus Plus with Latent Dirichlet Allocation) methods, the proposed method determines the optimal number of topics more accurately and yields more clearly delineated clusters, demonstrating its advantages in patent topic recognition. Furthermore, compared with the other baseline algorithms, the proposed NER-FPP (Named Entity Recognition with Feature Phrase Probability) algorithm performs best in extracting patent feature phrases, achieving the highest F1 score, 93.36%.
Keywords: topic recognition; patent clustering; NER; TF-IDF
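
The abstract does not spell out the FPC-Kmeans++ pipeline, so the following is only a minimal sketch of the general idea it names: represent each patent by a list of feature phrases (rather than segmented words), weight the phrases with TF-IDF, and cluster the resulting vectors with k-means++ seeding. Everything specific here is an assumption made for illustration: the toy phrase lists, the smoothed IDF formula, the function names (`tfidf_vectors`, `kmeanspp_init`, `lloyd`, `kmeans_pp`), and the use of several restarts. It does not reproduce the authors' NER-FPP extraction step or their procedure for choosing the number of topics.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Map each document (a non-empty list of feature phrases) to an
    L2-normalised TF-IDF vector. The smoothed IDF is an assumption."""
    vocab = sorted({p for doc in docs for p in doc})
    n = len(docs)
    df = Counter(p for doc in docs for p in set(doc))
    idf = {p: math.log((1 + n) / (1 + df[p])) + 1.0 for p in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[p] / len(doc) * idf[p] for p in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first centre is uniform at random; each later
    centre is drawn with probability proportional to its squared distance
    from the nearest centre chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with an existing centre
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def lloyd(points, centers, max_iter=100):
    """Lloyd iterations from the given centres; returns labels and inertia."""
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def kmeans_pp(points, k, n_init=10, seed=0):
    """Run seeding + Lloyd n_init times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        labels, inertia = lloyd(points, kmeanspp_init(points, k, rng))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


# Hypothetical feature phrases for four UAV patents (not from the paper).
patents = [
    ["flight control", "attitude estimation", "flight control"],
    ["flight control", "attitude estimation", "path planning"],
    ["battery management", "wireless charging"],
    ["battery management", "wireless charging", "charging station"],
]
vocab, vectors = tfidf_vectors(patents)
labels, inertia = kmeans_pp(vectors, k=2)
print(labels)
```

Treating a multi-word phrase such as "flight control" as a single vocabulary entry is what distinguishes this from word-level clustering: the two flight-control patents share whole phrases, so they sit close together in the vector space, while they share nothing with the charging patents. Cluster numbering is arbitrary, so only the grouping of the labels is meaningful.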

