Software Engineering (软件工程)

Cite this article:

ZHANG Bo. A Semantic Similarity Calculation Based on the Features of Wikipedia Links [J]. Software Engineering, 2019, 22(10): 36-43.


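Word Semantic Similarity Computation Based on Wikipedia Link Features

ZHANG Bo (张波)

(School of Mathematics and Computer Science, Hezhou University, Hezhou, Guangxi 542899, China)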

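Abstract: Existing Wikipedia-based similarity measures require cumbersome preprocessing and heavy computation. To address this, the paper treats Wikipedia as an ontology, introduces feature-based word semantic computation, and proposes a fast Wikipedia-based method for computing word similarity. Exploiting the link structure of Wikipedia pages, the method takes a page's incoming and outgoing links as its feature values to build a feature vector model, and computes the semantic similarity of two words from the correlation coefficient of the feature vectors of their corresponding pages. The paper also improves the Wikipedia disambiguation algorithm: when handling polysemous words, it reduces interference from sense pages with low social recognition, which further improves accuracy. Tests on the Miller & Charles (MC30) and Rubenstein & Goodenough (RG65) test sets show that the link-feature-based method is feasible for computing similarity, and they support the soundness of the computation strategy and the improved disambiguation algorithm.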

Keywords: semantic similarity; Wikipedia; link-based; feature-based

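CLC number: TP391    Document code: A

Funding: Guangxi Universities Scientific Research Project, "Research on an Ontology Model of Educational Technology Standards Based on Description Logic" (Grant No. ZD2014129).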

A Semantic Similarity Calculation Based on the Features of Wikipedia Links

ZHANG Bo

(School of Mathematics & Computer Science, Hezhou University, Hezhou 542899, China)

Abstract: Measuring semantic similarity is a fundamental research problem in natural language processing. Because Wikipedia is openly editable, covers a huge vocabulary, and is updated rapidly, a growing body of research and applications draws on it. This paper proposes a page-link approach to computing word semantic similarity that uses Wikipedia as the data resource. The approach improves on the Wikipedia Link Vector Model (WLVM), which takes outgoing links as the feature vector, by using both the incoming and outgoing links of a page as feature values; it then computes the semantic similarity between two words by measuring the similarity of the feature sets of the corresponding pages. The method also improves the handling of disambiguation pages by reducing interference from pages with low social recognition. Tests on the Miller & Charles (MC30) and Rubenstein & Goodenough (RG65) benchmarks verify the validity of the method for measuring word semantic similarity.

Keywords: word similarity; Wikipedia; link-based; feature-based
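
As a rough illustration of the link-feature idea summarized in the abstract, the Python sketch below represents each page by the set of its incoming and outgoing links and scores two pages by the cosine similarity of the resulting binary feature vectors. The page names, link sets, and function names are hypothetical, and cosine similarity is an assumption standing in for the paper's correlation coefficient, which the abstract does not define. The improved disambiguation step is not modeled here.

```python
from math import sqrt

# Hypothetical toy link graph: for each page, the pages linking to it ("in")
# and the pages it links to ("out"). Real data would come from Wikipedia.
PAGES = {
    "Car":    {"in": {"Vehicle", "Road", "Engine"}, "out": {"Wheel", "Engine", "Vehicle"}},
    "Truck":  {"in": {"Vehicle", "Road"},           "out": {"Wheel", "Engine", "Cargo"}},
    "Banana": {"in": {"Fruit"},                     "out": {"Fruit", "Plant"}},
}


def link_features(page):
    """Return a page's binary feature set.

    In-links and out-links are tagged separately so that a link from X
    and a link to X count as two different features.
    """
    links = PAGES[page]
    return {("in", p) for p in links["in"]} | {("out", p) for p in links["out"]}


def cosine_similarity(a, b):
    """Cosine similarity of two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similarity(word1, word2):
    """Similarity of two words via the link features of their pages."""
    return cosine_similarity(link_features(word1), link_features(word2))


if __name__ == "__main__":
    print(round(similarity("Car", "Truck"), 4))
    print(similarity("Car", "Banana"))
```

Because the features are binary, the dot product reduces to the size of the set intersection, so no explicit vectors need to be materialized; this keeps preprocessing light, which is in the spirit of the fast method the abstract describes.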
