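Abstract: This paper studies resource selection in support of search result diversification. It analyzes the shortcomings of existing work and proposes using word vectors to extract the semantic features of text, on which document modeling and resource selection are then built. An experimental platform was constructed on the ClueWeb12b-13 dataset and used for the experiments. Evaluation with the R-method shows that the proposed method outperforms the existing methods GLS and MnStD and that its performance is stable under different conditions.
Keywords: federated search; resource selection; search result diversification; word vectors
CLC number: TP391.3
Document code: A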
Word Representation-Based Resource Selection for Search Result Diversification in Federated Search
WANG Yarong,LI Liang,WU Shengli
(Jiangsu University, Zhenjiang 212013, China)
Abstract: This article studies resource selection in support of search result diversification, analyzes the shortcomings of existing research, and proposes using distributed word representations to extract the semantic features of text. Document modeling and resource selection are built on these features. The experimental platform is constructed on the ClueWeb12b-13 dataset. Evaluation results based on the R-method show that the proposed algorithm outperforms the existing GLS and MnStD methods and remains stable across a variety of conditions.
Keywords: federated search; resource selection; search result diversification; distributed word representation
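
The abstract describes the pipeline only at a high level: word vectors give semantic features, documents are modeled from them, and resources are then selected. As a rough illustration of that general idea, and not the algorithm evaluated in the paper, the sketch below models each document as the mean of its word vectors and ranks resources by the average cosine similarity between the query and each resource's sampled documents. The function names, the toy two-dimensional embeddings, and the averaging and scoring rules are all assumptions made for this example; diversification over query subtopics is not modeled.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the embeddings of known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_resources(query_tokens, sampled_docs, embeddings):
    """Rank resources by mean query-document similarity over their sampled documents.

    sampled_docs maps a resource name to a list of tokenized documents.
    This scoring rule is an assumption for illustration only.
    """
    query_vec = mean_vector(query_tokens, embeddings)
    scores = {}
    for resource, docs in sampled_docs.items():
        sims = []
        for doc in docs:
            doc_vec = mean_vector(doc, embeddings)
            if query_vec is not None and doc_vec is not None:
                sims.append(cosine(query_vec, doc_vec))
        scores[resource] = sum(sims) / len(sims) if sims else 0.0
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: 2-d "embeddings" and two resources with sampled documents.
TOY_EMBEDDINGS = {
    "jaguar": [0.9, 0.1],
    "car": [1.0, 0.0],
    "engine": [0.8, 0.2],
    "cat": [0.0, 1.0],
    "forest": [0.1, 0.9],
}
TOY_SAMPLES = {
    "autos": [["car", "engine"], ["engine"]],
    "wildlife": [["cat", "forest"]],
}

ranking = rank_resources(["jaguar", "car"], TOY_SAMPLES, TOY_EMBEDDINGS)
print(ranking)
```

In practice the toy dictionary would be replaced by pretrained word vectors, and the per-resource samples by documents drawn from each source's index; both choices are outside what the abstract specifies.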