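Abstract: To extract important events from large volumes of microblog posts and predict how they will develop, a multi-level method for clustering and indexing microblogs is proposed, based on their geographic and temporal features. The method uses X-means clustering, builds an index from the keywords entered by the user, and automatically estimates the number of clusters from that index. In addition, microblogs are clustered by sentiment features, producing two clusters: one of negative-sentiment microblogs and one of positive-sentiment microblogs. Experimental results show that the proposed indexing mechanism both simplifies search and benefits retrieval tasks. Compared with other microblog clustering methods, the proposed method performs better on both the DBI index and the S coefficient, and its time complexity is lower than that of traditional methods, being proportional to the logarithm of the input data volume.
Keywords: microblog retrieval; temporal features; geographic features; sentiment features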
CLC number: TP391
Document code: A
Funding: Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ191016).
Research on Multi-level Microblog Retrieval Method Based on Multiple Features
FAN Yimin
(College of Computer Information and Engineering, Nanchang Institute of Technology, Nanchang 330044, China)
rowan521@163.com
Abstract: To extract important events from large volumes of microblog posts and predict how they will develop, this paper proposes a multi-level method for clustering and indexing microblogs based on their geographic and temporal characteristics. The method uses X-means clustering, builds an index from the keywords entered by the user, and automatically estimates the number of clusters from that index. In addition, microblogs are clustered by emotional characteristics, producing two clusters: one of negative-emotion microblogs and one of positive-emotion microblogs. Experimental results show that the proposed indexing mechanism both simplifies search and benefits retrieval tasks. Compared with other microblog clustering methods, the proposed method performs better on both the Davies-Bouldin index (DBI) and the S coefficient. Its time complexity is also lower than that of traditional methods, being proportional to the logarithm of the input data volume.
Keywords: microblog retrieval; temporal characteristics; geographic characteristics; emotional characteristics
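
For readers unfamiliar with the DBI indicator mentioned in the abstract, the following is a minimal sketch of how a Davies-Bouldin index can be computed for a given clustering. It is not the paper's implementation: the function names, the use of plain Euclidean distance, and the toy 2-D points (standing in for coordinates of geotagged microblogs) are all assumptions made for illustration. Lower values indicate more compact, better separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (needs at least 2).

    Each cluster is a list of numeric tuples. Euclidean distance is an
    assumption of this sketch, not something the abstract specifies.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical toy data: two clusterings with the same within-cluster
# spread, but different distances between the cluster centroids.
well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_together = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]

print(davies_bouldin(well_separated))
print(davies_bouldin(close_together))
```

Under these assumptions the well separated clustering yields the smaller index, which is the sense in which a lower DBI is taken as a better result when clustering methods are compared.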