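Abstract: To address the sensitivity of the K-means algorithm to initial cluster centers and isolated points, we propose an improved K-means algorithm that refines both the density and the distance computations. The algorithm introduces feature weights and uses neighbor density to eliminate the influence of isolated points while determining the initial cluster centers. In the distance computation, it adopts a measure that integrates intra-cluster and inter-cluster distances to improve clustering quality. Experimental results show that the algorithm improves clustering performance by more than 10% over the traditional clustering algorithm.
Keywords: clustering; K-means; feature weighting; neighbor density; isolated points
CLC number: TP311
Document code: A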
A K-means Clustering Algorithm Based on Density and Distance
LUO Junfeng, HONG Dandan
(Network Information Center, Xi'an Jiaotong University, Xi'an 710049, China)
luojf@xjtu.edu.cn; ddhong@xjtu.edu.cn
Abstract: To reduce the sensitivity of the K-means algorithm to initial cluster centers and outliers, an improved K-means algorithm based on density and distance is proposed. The algorithm introduces feature weights and uses neighbor density to remove the influence of outliers while also determining the initial cluster centers. In the distance calculation, a method that integrates intra-cluster and inter-cluster distances is introduced to improve the clustering result. Experimental results show that the algorithm improves clustering performance by more than 10% compared with the traditional clustering algorithm.
Keywords: clustering; K-means; feature weighting; neighbor density; isolated points
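
The pipeline summarized in the abstract (feature-weighted distance, neighbor density, removal of isolated points, density-based choice of initial centers) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' formulation: the function names are hypothetical, density is taken as the inverse of the mean distance to the k nearest neighbors, isolated points are those below a fraction of the median density, and centers are seeded greedily by density times distance to the nearest chosen center. The integrated intra-cluster and inter-cluster distance is not shown, since the abstract does not define it.

```python
import math
from statistics import median


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance between two points."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def neighbor_density(points, weights, k):
    """Assumed density: inverse of the mean distance to the k nearest neighbors."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(
            weighted_distance(p, q, weights)
            for j, q in enumerate(points)
            if j != i
        )
        densities.append(1.0 / (sum(dists[:k]) / k + 1e-12))
    return densities


def drop_isolated_points(points, weights, k=3, ratio=0.5):
    """Keep (point, density) pairs with density >= ratio * median density.

    The median-based cutoff is an illustrative rule, not taken from the paper.
    """
    dens = neighbor_density(points, weights, k)
    cutoff = ratio * median(dens)
    return [(p, d) for p, d in zip(points, dens) if d >= cutoff]


def initial_centers(kept, weights, n_clusters):
    """Greedy seeding: densest point first, then dense points far from chosen centers.

    Assumes n_clusters does not exceed the number of kept points.
    """
    centers = [max(kept, key=lambda pd: pd[1])[0]]
    while len(centers) < n_clusters:
        def score(pd):
            point, density = pd
            nearest = min(weighted_distance(point, c, weights) for c in centers)
            return density * nearest
        centers.append(max(kept, key=score)[0])
    return centers
```

The centers returned by `initial_centers` could then be passed to any standard K-means loop as its starting centers. Uniform weights (all 1.0) reduce `weighted_distance` to the ordinary Euclidean distance.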